This script scrapes **reviews** from the **Apple Podcasts** website.

First, connect this Colab notebook to your Google Drive and define the directory for input and output data.

In [None]:
## mount drive
from google.colab import drive
drive.mount("/content/drive")
directory="/content/drive/My Drive/Colab_NLP_UM/"
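
If the project folder does not exist yet, opening a file inside it later will fail. A minimal sketch of creating the folder up front is shown here. A temporary folder stands in for the mounted Drive so the example runs anywhere, and `checked_directory` is a hypothetical name chosen so that it does not overwrite `directory`.

```python
import tempfile
from pathlib import Path

# Stand-in for "/content/drive/My Drive/" (assumption: in Colab you would
# use the mounted Drive path instead of a temporary folder)
base = Path(tempfile.mkdtemp())
project_dir = base / "Colab_NLP_UM"

# Create the folder and any missing parents; do nothing if it already exists
project_dir.mkdir(parents=True, exist_ok=True)

# Keep the trailing slash so that checked_directory + "file.csv" is a valid path
checked_directory = str(project_dir) + "/"
print(checked_directory, project_dir.is_dir())
```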

Next, we need to install additional packages. The most important one is the **App Store Scraper**, which retrieves the reviews.

In [None]:
## install packages that are not part of Python's standard distribution

!pip install app_store_scraper 

Now we can import the packages and run the actual code. To scrape data from a specific podcast, we need the podcast name and the app id. Both appear in the podcast URL.

Example of an Apple podcast URL used in this script: https://podcasts.apple.com/us/podcast/black-girl-gone-a-true-crime-podcast/id1556267741?see-all=reviews
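
Instead of copying the name and id by hand, they can be read from the URL with a regular expression. The following is a minimal sketch: `parse_podcast_url` is a hypothetical helper (not part of App Store Scraper), and it assumes the URL follows the same layout as the example above.

```python
import re

def parse_podcast_url(url):
    """Return (country, app_name, app_id) from an Apple Podcasts URL.

    Hypothetical helper; assumes the layout
    podcasts.apple.com/<country>/podcast/<name>/id<digits>
    """
    pattern = r"podcasts\.apple\.com/([a-z]{2})/podcast/([^/]+)/id(\d+)"
    match = re.search(pattern, url)
    if match is None:
        raise ValueError(f"Not a recognised Apple Podcasts URL: {url}")
    return match.groups()

url = ("https://podcasts.apple.com/us/podcast/"
       "black-girl-gone-a-true-crime-podcast/id1556267741?see-all=reviews")
country, app_name, app_id = parse_podcast_url(url)
print(country, app_name, app_id)
# prints: us black-girl-gone-a-true-crime-podcast 1556267741
```

The three values can then be passed on as `country`, `app_name` and `app_id`.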

Each review comes with its star rating, date and reviewer name. As a first step, every review is stored as a single entry in the "review" column of a data frame.

In [None]:
# import packages

import pandas as pd
import numpy as np
import csv

from app_store_scraper import AppStore

# app_name = podcast name from URL
# app_id = podcast id from URL

podcast = AppStore(country='us', app_name='black-girl-gone-a-true-crime-podcast', app_id = '1556267741') 

podcast.review(how_many=7000) # change number if necessary

reviews=podcast.reviews

print(f"This podcast has {len(reviews)} reviews.")

# keys in each review dict = date, review, rating, isEdited, userName, title

# one row per review; the "review" column holds the complete review dict
podcastdf = pd.DataFrame(np.array(reviews), columns=['review'])

display(podcastdf)
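
The cell above keeps each review as one dict in a single column. If you would rather inspect the fields side by side, pandas can expand a list of dicts into one column per key. This is a sketch with two made-up sample records that only imitate the keys listed in the comment above; with real data you would pass `podcast.reviews` instead.

```python
import datetime
import pandas as pd

# Made-up sample records (assumption: real reviews carry these same keys)
sample_reviews = [
    {"date": datetime.datetime(2022, 3, 1, 12, 0), "review": "Great show.",
     "rating": 5, "isEdited": False, "userName": "listener_one", "title": "Love it"},
    {"date": datetime.datetime(2022, 3, 2, 9, 30), "review": "Good, but short.",
     "rating": 3, "isEdited": True, "userName": "listener_two", "title": "Okay"},
]

# A list of dicts becomes one row per review and one column per key
flat_df = pd.DataFrame(sample_reviews)

print(flat_df.columns.tolist())
print(flat_df["rating"].mean())  # average star rating of the sample
```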

Now we can write the individual fields of each review from the data frame to a CSV file on Google Drive for further processing.

In [None]:
# change file name for each podcast you scrape
# newline='' is what the csv module recommends; utf-8 keeps emojis and accents intact
with open(directory + 'podcast_reviews.csv', 'w', newline='', encoding='utf-8') as data_file:
    csv_writer = csv.writer(data_file)

    for x in range(len(reviews)):
        p_dict = podcastdf.loc[x, 'review']
        # print(p_dict)  # optional output to check progress
        if x == 0:
            # write the column names once, taken from the first review
            csv_writer.writerow(p_dict.keys())
        csv_writer.writerow(p_dict.values())

# the with-block closes the file automatically
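
The loop above relies on every review dict listing its keys in the same order. An alternative that fixes the column order explicitly is `csv.DictWriter` from the standard library. This sketch uses made-up sample records and a temporary file so that it runs on its own; with real data you would pass `reviews` and a path on Google Drive.

```python
import csv
import os
import tempfile

# Made-up sample records (assumption: same keys as the scraped reviews)
sample_reviews = [
    {"date": "2022-03-01 12:00:00", "review": "Great show.", "rating": 5,
     "isEdited": False, "userName": "listener_one", "title": "Love it"},
    {"date": "2022-03-02 09:30:00", "review": "Good, but short.", "rating": 3,
     "isEdited": True, "userName": "listener_two", "title": "Okay"},
]

fieldnames = ["date", "review", "rating", "isEdited", "userName", "title"]
out_path = os.path.join(tempfile.mkdtemp(), "podcast_reviews_sample.csv")

# Write the header once, then one row per review in the order of fieldnames
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(sample_reviews)

# Read the file back to check the result; all values come back as strings
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows), rows[1]["review"])
```

Reading the file back is a quick way to confirm that commas inside review texts were quoted correctly.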

Script provided by Monika Barget, Maastricht University, March 2022