<a href="https://colab.research.google.com/github/MarkAvilin1/DS-and-ML/blob/main/GooglePLayParcer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Task
The goal of this issue is to collect a minimum of 1000 reviews for any 10 cryptocurrency-related applications (100 each). These could be apps for blockchain wallets, crypto custodians, or any other crypto projects.

- Collect reviews for the apps of your choice and make sure that you define where the reviews are coming from: Apple App Store or Google Play Store.
- Identify a sentiment score for each review using one of the existing sentiment analysis tools.
- Identify either the geolocation or the language for each review.

For the final deliverable, create:

- A CSV file with the following structure: app name, username, timestamp, app review, text, sentiment, score, country (language), marketplace*
- A short report with graphs (based on the reviews that you collected) and basic descriptive statistics

*marketplace: Apple App Store or Google Play Store

Upon completion of the task:

- Share your CSV file and report with challenge@inca.digital.
- Leave a comment in the issue saying you're done.

In [None]:
!pip install google-play-scraper

Collecting google-play-scraper
  Downloading google-play-scraper-1.0.5.tar.gz (52 kB)
Building wheels for collected packages: google-play-scraper
  Building wheel for google-play-scraper (setup.py) ... done
  Created wheel for google-play-scraper: filename=google_play_scraper-1.0.5-py3-none-any.whl size=24484 sha256=92f884fc32660acaa800aa4b3a7a3a30ee9f05b3ce7b02a5066dd824c15e969f
  Stored in directory: /root/.cache/pip/wheels/4a/26/18/48fda51f20c9e550c735fa6f3a6887dc8836f8d709a3cf8a9c
Successfully built google-play-scraper
Installing collected packages: g

In [None]:
from google_play_scraper import app, Sort, reviews
from textblob import TextBlob
import pandas as pd
import numpy as np

In [None]:
links = ['com.binance.dev', 'io.metamask', 
        'co.mona.android', 'com.kubi.kucoin', 
        'com.coinbase.android', 'com.coindcx.btc', 
        'io.cex.app.prod', 'co.bitx.android.wallet', 
        'com.coinmarketcap.android', 'com.coingecko.coingeckoapp']

num_links = len(links)
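
The entries in `links` are Google Play package IDs rather than full URLs. As a quick sanity check, a sketch like the following turns each ID into its store page URL so the list can be verified by hand. `store_url` is a hypothetical helper, not part of google-play-scraper, and the `hl`/`gl` query parameters are assumed to mirror the `lang`/`country` arguments used later in the notebook.

```python
from urllib.parse import urlencode

def store_url(app_id, lang='en', country='us'):
  """Build the Google Play store page URL for a package ID (hypothetical helper)."""
  query = urlencode({'id': app_id, 'hl': lang, 'gl': country})
  return f'https://play.google.com/store/apps/details?{query}'

# Print the store page for a couple of the IDs used in this notebook
for app_id in ['com.binance.dev', 'io.metamask']:
  print(store_url(app_id))
```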

In [None]:
def get_title_app_rev(links):
  """Return the app titles and the review counts reported by app() for the given app IDs"""
  title = []
  app_review = []

  result = [app(link, lang='en', country='us') for link in links]

  for details in result:
    title.append(details['title'])
    app_review.append(details['reviews'])

  return title, app_review

In [None]:
def get_data(links):
  """Fetch the 100 newest five-star reviews for each app ID"""
  result = []
  for link in links:
    # filter_score_with=5 keeps only five-star reviews
    result.append(reviews(link, lang='en', country='us', sort=Sort.NEWEST, count=100, filter_score_with=5))

  return result
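
A scraper call can fail or return fewer reviews than requested, and a single bad app ID would abort the whole loop. The following is a minimal sketch of a more defensive loop, assuming a `fetch` callable that returns a `(review_list, continuation_token)` tuple like `reviews` does. `fetch_reviews_safely` and the stub `fake_fetch` are hypothetical names used only so the sketch runs offline.

```python
def fetch_reviews_safely(app_ids, fetch, count=100):
  """Call fetch(app_id, count) per app and record apps that fail or come up short."""
  results, problems = {}, {}
  for app_id in app_ids:
    try:
      review_list, _token = fetch(app_id, count)
    except Exception as exc:  # keep going if one app fails
      problems[app_id] = f'error: {exc}'
      continue
    results[app_id] = review_list
    if len(review_list) < count:
      problems[app_id] = f'only {len(review_list)} of {count} reviews'
  return results, problems


def fake_fetch(app_id, count):
  """Stub standing in for the real scraper call (assumption, for illustration)."""
  if app_id == 'bad.app':
    raise ValueError('app not found')
  available = 3 if app_id == 'small.app' else count
  return [{'content': f'review {i}'} for i in range(available)], None


results, problems = fetch_reviews_safely(['ok.app', 'small.app', 'bad.app'], fake_fetch, count=5)
print({app_id: len(review_list) for app_id, review_list in results.items()})
print(problems)
```

In the notebook, `fetch` could be a small wrapper such as `lambda app_id, count: reviews(app_id, lang='en', country='us', sort=Sort.NEWEST, count=count, filter_score_with=5)`.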

In [None]:
def sentiment_state(text):
  """Return a sentiment label plus the TextBlob polarity as a percentage, e.g. 'Positive 70.0 %'"""
  states = ['Positive', 'Neutral', 'Negative']
  blob = TextBlob(text)
  state = blob.sentiment.polarity
  if state > 0:
    return f'{states[0]} {round(state * 100, 2)} %'
  elif state < 0:
    return f'{states[2]} {round(state * 100, 2)} %'
  return f'{states[1]} {round(state * 100, 2)} %'
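
The labeling rule itself does not depend on TextBlob: any polarity in [-1, 1] is mapped to a label by its sign. The sketch below isolates that step so it can be checked on known values. `label_polarity` is a hypothetical helper that mirrors the thresholds in `sentiment_state`; it is not part of TextBlob.

```python
def label_polarity(polarity):
  """Map a polarity in [-1, 1] to a 'Label percent %' string by its sign."""
  if polarity > 0:
    label = 'Positive'
  elif polarity < 0:
    label = 'Negative'
  else:
    label = 'Neutral'
  return f'{label} {round(polarity * 100, 2)} %'

# One value for each branch: positive, neutral, negative
for value in (0.5, 0.0, -0.25):
  print(label_polarity(value))
```

Note that negative polarities keep their sign in the percentage, so the label and the number carry the same information twice.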

In [None]:
def collect_data(links):
  """Collect review data from the Google Play store for the given app IDs.

  Returns a NumPy array with one row per field, in this order:
  app name, username, timestamp, app review, text, sentiment, score,
  country (language), marketplace.
  """
  titles, review_counts = get_title_app_rev(links)
  fetched = get_data(links)  # one (review_list, continuation_token) tuple per app

  app_name, username, timestamp, app_review = [], [], [], []
  text, sentiment, score = [], [], []

  for title, review_count, (app_reviews, _) in zip(titles, review_counts, fetched):
    for review in app_reviews:
      # Replace line breaks so that each review stays in exactly one row
      content = (review['content'] or '').replace('\n', ' ')
      app_name.append(title)
      username.append(review['userName'])
      timestamp.append(str(review['at']))
      app_review.append(review_count)
      text.append(content)
      sentiment.append(sentiment_state(content))
      score.append(str(review['score']))

  # Every review was requested with lang='en' and country='us' from Google Play
  country = ['US-en'] * len(text)
  marketplace = ['Google play store'] * len(text)

  data = np.array([app_name, username, timestamp, app_review, text, sentiment, score, country, marketplace])

  return data

In [None]:
# Get all needed data 
data = collect_data(links)

In [None]:
# Check that all fields have the same length
print(*(row.shape for row in data))

(1000,) (1000,) (1000,) (1000,) (1000,) (1000,) (1000,) (1000,) (1000,)


In [None]:
columns = [str(i) for i in range(data.shape[1])]  # one column per collected review
index = ['App name', 'Username', 'Timestamp', 'App review', 'Text', 'Sentiment', 'Score', 'Country (language)', 'Marketplace']
df = pd.DataFrame(data, index=index, columns=columns)

In [None]:
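# This writes the untransposed frame (fields as rows); the export after the transpose overwrites it
df.to_csv('App_Reviews.csv', encoding='utf-8')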

In [None]:
df = df.T

In [None]:
df.head(10)

Unnamed: 0,App name,Username,Timestamp,App review,Text,Sentiment,Score,Country (language),Marketplace
0,Binance: Buy BTC & 600+ Crypto,AHMAD KGN,2022-05-07 13:54:02,7556,good application,Positive 70.0 %,5,US-en,Google play store
1,Binance: Buy BTC & 600+ Crypto,Mouctar Mohamadou,2022-05-07 13:00:23,7556,I am very happy with this platform,Positive 100.0 %,5,US-en,Google play store
2,Binance: Buy BTC & 600+ Crypto,Mohammad Afaque Aslam,2022-05-07 12:56:37,7556,good,Positive 70.0 %,5,US-en,Google play store
3,Binance: Buy BTC & 600+ Crypto,Sammuel Okon,2022-05-07 12:35:56,7556,Cool Morra*****,Positive 35.0 %,5,US-en,Google play store
4,Binance: Buy BTC & 600+ Crypto,France Kayl,2022-05-07 12:32:58,7556,Wow❣️❣️,Neutral 0.0 %,5,US-en,Google play store
5,Binance: Buy BTC & 600+ Crypto,Mosam Shah,2022-05-07 12:20:44,7556,balance is a good place ☺️ but I don't see the...,Positive 70.0 %,5,US-en,Google play store
6,Binance: Buy BTC & 600+ Crypto,GAME ARENA,2022-05-07 12:19:45,7556,Easy to Use and Security No 1,Positive 43.33 %,5,US-en,Google play store
7,Binance: Buy BTC & 600+ Crypto,Asad awis,2022-05-07 11:59:13,7556,nice,Positive 60.0 %,5,US-en,Google play store
8,Binance: Buy BTC & 600+ Crypto,Damir Turk,2022-05-07 11:54:03,7556,good app with good security,Positive 70.0 %,5,US-en,Google play store
9,Binance: Buy BTC & 600+ Crypto,adzi fhm,2022-05-07 11:52:12,7556,Hiya hiya hiyaaa,Neutral 0.0 %,5,US-en,Google play store


In [None]:
df.to_csv('App_Reviews.csv', encoding='utf-8')
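
Because the `Sentiment` column stores the label and the percentage in one string (for example `Positive 70.0 %`), it has to be split before computing the descriptive statistics the task asks for. The following is a minimal sketch using only the standard library, with the ten sentiment values from the preview above as sample input. `parse_sentiment` is a hypothetical helper, and the same idea could be applied to the full column with pandas.

```python
from collections import Counter
from statistics import mean

def parse_sentiment(value):
  """Split a string like 'Positive 70.0 %' into ('Positive', 70.0)."""
  label, number, _percent = value.split()
  return label, float(number)

# Sentiment values of the first ten rows shown in the preview
sample = ['Positive 70.0 %', 'Positive 100.0 %', 'Positive 70.0 %', 'Positive 35.0 %',
          'Neutral 0.0 %', 'Positive 70.0 %', 'Positive 43.33 %', 'Positive 60.0 %',
          'Positive 70.0 %', 'Neutral 0.0 %']

parsed = [parse_sentiment(value) for value in sample]
label_counts = Counter(label for label, _ in parsed)
mean_polarity = mean(number for _, number in parsed)

print(label_counts)
print(round(mean_polarity, 2))
```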