<a href="https://colab.research.google.com/github/EventHopper/ReviewsScraping/blob/master/Eventbrite_App_Review.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Eventbrite Competitive Review**


---


### **Application**
**Eventbrite**

Details:

- Size: 15 MB
- Installs: 10M+
- Requires Android 5+

---



### **Review Rationale**

*We want to review feedback for this app on the Google Play Store and AppStore. Both positive and negative reviews are useful, but negative reviews can reveal missing critical features or significant bugs, especially when the same complaint appears frequently.*

### **Packages Used**

**Google Play Store Related Packages:**

 - [google-play-scraper](https://github.com/JoMingyu/google-play-scraper)

**AppStore Related Packages:**

- TBA (to be added)

# Project Init

In [0]:
!pip install -qq google-play-scraper


In [0]:
!pip install -qq -U watermark

In [0]:
%reload_ext watermark
%watermark -v -p pandas,matplotlib,seaborn,google_play_scraper

CPython 3.6.9
IPython 5.5.0

pandas 1.0.4
matplotlib 3.2.1
seaborn 0.10.1
google_play_scraper 0.0.2.6




In [0]:
import json
import time
import pandas as pd
from tqdm import tqdm

import seaborn as sns
import matplotlib.pyplot as plt

from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter

from google_play_scraper import Sort, reviews, app

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

# Data Wrangling

## Google Play Store Scraping

Step Summary:

- Scrape Google Play app information
- Scrape user reviews for Google Play apps
- Save the dataset to CSV files

Package name of the app: `com.eventbrite.attendee`, obtained with the following ADB shell command:

```
$ adb shell "pm list packages -f eventbrite"
```
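
With `-f`, each matching line has the form `package:<apk path>=<package name>`, so the package name is whatever follows the last `=`. A minimal sketch of extracting it in Python; `parse_package_line` is a hypothetical helper, and the sample line (including its APK path) is made up for illustration:

```python
def parse_package_line(line):
    """Return the package name from one line of `pm list packages` output."""
    line = line.strip()
    if not line.startswith('package:'):
        raise ValueError(f'unexpected line: {line!r}')
    body = line[len('package:'):]
    # With -f the body is '<apk path>=<name>'; the path may itself contain '=',
    # so only the text after the last '=' is taken. Without -f there is no '='
    # and rpartition returns the whole body as the third element.
    return body.rpartition('=')[2]

# Hypothetical sample line; the real APK path differs per device.
sample = 'package:/data/app/com.eventbrite.attendee-1/base.apk=com.eventbrite.attendee'
print(parse_package_line(sample))
```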

In [0]:
app_packages = [
  'com.eventbrite.attendee'
]

The package name is consumed by the `for` loop below. Additional package names could be added to `app_packages` and would be scraped the same way.

In [0]:
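app_infos = []

for ap in tqdm(app_packages):
  # Fetch the store listing metadata for this package
  info = app(ap, lang='en', country='us')
  # Drop the bulky sample comments; pop() avoids a KeyError if the key is absent
  info.pop('comments', None)
  app_infos.append(info)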

100%|██████████| 1/1 [00:00<00:00,  7.59it/s]


We got the info for the app. Let's write a helper function that prints JSON objects a bit better:

In [0]:
def print_json(json_object):
  json_str = json.dumps(
    json_object, 
    indent=2, 
    sort_keys=True, 
    default=str
  )
  print(highlight(json_str, JsonLexer(), TerminalFormatter()))
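
The `default=str` argument is what lets the helper cope with values that `json.dumps` cannot encode natively, such as `datetime` objects: instead of raising `TypeError`, they are converted with `str()`. A small sketch of that behavior on a made-up record (the keys resemble review fields, but the values are invented):

```python
import json
from datetime import datetime

# Made-up record for illustration only.
record = {'score': 1, 'at': datetime(2020, 1, 2, 3, 4, 5)}

try:
    json.dumps(record)
except TypeError as err:
    print('without default:', err)

# default=str is called for any value json cannot encode on its own.
json_str = json.dumps(record, indent=2, sort_keys=True, default=str)
print(json_str)
```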

Here is the app information for the first (and, here, only) entry in the list:

In [0]:
print_json(app_infos[0])

{
  "adSupported": null,
  "androidVersion": "5.0",
  "androidVersionText": "5.0 and up",
  "appId": "com.eventbrite.attendee",
  "containsAds": null,
  "contentRating": "Everyone",
  "contentRatingDescription": null,
  "currency": "USD",
  "description": "Discover upcoming events near you and get personalized recommendations. Stay up on the latest for popular events like concerts, festivals, yoga classes, holiday events on New Year\u2019s Eve or Halloween and networking events. Find something fun to hit up by date, time and location. Buy tix and keep \u2018em handy on your mobile device to make check-in nice and easy. Ready to explore?\r\n\r\nWith the Eventbrite app you can:\r\n\r\n\u20

This contains lots of information, including the number of ratings, the number of reviews, and the rating count for each score (1 to 5). Let's set that aside for now and look at the app icon:

In [0]:
def format_title(title):
  sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
  if sep_index != -1:
    title = title[:sep_index]
  return title[:10]

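# One row with one column per app. The original 2 x (n // 2) grid yields
# zero axes when there is a single app; squeeze=False keeps axs a 2-D
# array so axs.flat works for any number of apps.
fig, axs = plt.subplots(1, len(app_infos), figsize=(14, 5), squeeze=False)

for ax, ai in zip(axs.flat, app_infos):
  # Reading from a URL works with the matplotlib 3.2.1 listed above;
  # newer releases may require downloading the image first.
  img = plt.imread(ai['icon'])
  ax.imshow(img)
  ax.set_title(format_title(ai['title']))
  ax.axis('off')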
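
To see what `format_title` does to a long store title, here is a short standalone check. The function body is copied from the cell above; the titles are illustrative strings, not values taken from the scraped data:

```python
def format_title(title):
    # Cut at the first ':' if present, otherwise at the first '-'.
    sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
    if sep_index != -1:
        title = title[:sep_index]
    # Cap at 10 characters so the label fits above an icon.
    return title[:10]

# Illustrative titles only.
for t in ['Eventbrite - Discover events', 'Maps: Navigate', 'Calendar']:
    print(repr(format_title(t)))
```

Note that whitespace before the separator is not stripped, so a short title such as `'Maps - X'` would keep its trailing space.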

In [0]:
timestr = time.strftime("%Y%m%d-%H%M%S")
app_infos_df = pd.DataFrame(app_infos)
app_infos_df.to_csv('app_details'+timestr+'.csv', index=None, header=True)
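
The timestamp in the file name keeps repeated runs from overwriting each other. If the same pattern is needed in several places, it could be wrapped in a small function. A sketch, where `timestamped_name` is a hypothetical helper and the `-` separator before the timestamp is a choice made here, not something the cell above does:

```python
import time

def timestamped_name(prefix, ext='csv', now=None):
    """Build '<prefix>-YYYYmmdd-HHMMSS.<ext>' (hypothetical helper)."""
    # `now` is an optional time.struct_time; defaults to the current local time.
    stamp = time.strftime('%Y%m%d-%H%M%S', now or time.localtime())
    return f'{prefix}-{stamp}.{ext}'

print(timestamped_name('app_details'))
```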

### Scraping App Reviews

In an ideal world, we would get all the reviews. But there are lots of them and we're scraping the data. That wouldn't be very polite. What should we do?

We want:

- Balanced dataset - roughly the same number of reviews for each score (1-5)
- A representative sample of the reviews for each app

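We can satisfy the first requirement with the scraper's `filter_score_with` option, which lets us request each score separately. For the second, we sort by relevance (`Sort.MOST_RELEVANT`), which surfaces the reviews Google Play considers most helpful, and we also pull a sample of the newest reviews (`Sort.NEWEST`). Note that the code requests 200 reviews for score 3 and 100 for every other score, per sort order, so 3-star reviews are sampled more heavily than the rest: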

In [0]:
app_reviews = []

for ap in tqdm(app_packages):
  for score in range(1, 6):
    for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
      # reviews() returns a pair; only the list of review dicts is used here
      rvs, _ = reviews(
        ap,
        lang='en',
        country='us',
        sort=sort_order,
        count=200 if score == 3 else 100,
        filter_score_with=score
      )
      # Tag each review with how it was sampled and which app it belongs to
      for r in rvs:
        r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
        r['appId'] = ap
      app_reviews.extend(rvs)

100%|██████████| 1/1 [00:04<00:00,  4.97s/it]


In [0]:
print_json(app_reviews[0])

{
  "appId": "com.eventbrite.attendee",
  "at": "2020-05-23 22:31:05",
  "content": "Bought two tix from a Facebook post that had the incorrect date. First card hung at the last part of the transaction. So I purchased with a different card. Both cards were charged. When I requested a refund there was never a response. Re-requested multiple times with no response or update. Had to reach out to my bank for one them and I was refunded one transaction. Still outstanding on the other bank.",
  "repliedAt": null,
  "replyContent": null,
  "reviewCreatedVersion": "7.2.1",
  "reviewId": "gp:AOqpTOFwlZY0oEoPCb9g4ue9tGkfHtmtGfco9rwbIEOUhAeJu3zObhqufAo4Aifcr9QYKPDbxj_ZjPGXIGF1lg",
  "score": 1,
  "sortOrder"

In [0]:
len(app_reviews)

1200
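
The total of 1,200 is exactly what the request plan asks for: five scores, two sort orders each, with 200 reviews requested for score 3 and 100 for the others. A quick check of that arithmetic, mirroring the loop above without touching the network:

```python
scores = range(1, 6)
sort_orders = ['most_relevant', 'newest']

def requested_count(score):
    # Same rule as the scraping loop: 3-star reviews are requested twice as often.
    return 200 if score == 3 else 100

total = sum(requested_count(s) for s in scores for _ in sort_orders)
print(total)  # 1200
```

Getting the full 1,200 back suggests that every score bucket had at least as many reviews available as were requested.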

In [0]:
timestr = time.strftime("%Y%m%d-%H%M%S")
app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('EventbriteReviews-'+timestr+'.csv', index=None, header=True)
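
Since every score is fetched under two sort orders, a review can in principle appear in both the most-relevant and the newest sample. This notebook does not check whether that happened. A minimal sketch of de-duplicating on `reviewId`, shown on made-up records rather than the scraped data; `dedupe_reviews` is a hypothetical helper:

```python
def dedupe_reviews(review_dicts):
    """Keep the first occurrence of each reviewId, preserving order."""
    seen = set()
    unique = []
    for r in review_dicts:
        if r['reviewId'] not in seen:
            seen.add(r['reviewId'])
            unique.append(r)
    return unique

# Made-up records for illustration only.
sample = [
    {'reviewId': 'a', 'sortOrder': 'most_relevant'},
    {'reviewId': 'b', 'sortOrder': 'most_relevant'},
    {'reviewId': 'a', 'sortOrder': 'newest'},
]
unique = dedupe_reviews(sample)
print(len(unique))
```

On the DataFrame, `app_reviews_df.drop_duplicates(subset='reviewId')` would have the same effect.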

## AppStore Scraping

In [0]:
# TBA

# EDA

Next, we're going to use the reviews for sentiment analysis with BERT. But first, we'll have to do some text preprocessing!

In [0]:
# TBA
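
The preprocessing steps are not decided yet. As a placeholder, here is a minimal sketch of one plausible first pass, assuming we only want to collapse whitespace (the scraped text contains `\r\n` sequences) and drop empty reviews. `clean_text` is a hypothetical helper and the eventual pipeline may differ:

```python
import re

def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r'\s+', ' ', text or '').strip()

# Made-up inputs, including a missing and an empty review.
samples = ['  Great app!\r\n\r\nLove it. ', None, '']
cleaned = [c for c in (clean_text(s) for s in samples) if c]
print(cleaned)
```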

# Summary

TBA

# References


- [Brilliant tutorial for Google Play Store Scraping](https://www.curiousily.com/posts/create-dataset-for-sentiment-analysis-by-scraping-google-play-app-reviews-using-python/)