# Apple Podcast Review Scraping with app-store-scraper

This program is a wrapper for scraping Apple Podcasts reviews with the **app-store-scraper** package (thank you, Eric Lim; see https://pypi.org/project/app-store-scraper/, MIT license). It was adapted for use in teaching at Maastricht University by Monika Barget and Arnoud Wils in 2023.

The main script is kept as lean as possible to make it easy to use for students without previous coding experience. To use the script, carefully read and follow the instructions below.

## Install and import modules

This section installs and imports everything the script needs. Select the grey box below and click the black arrow in the toolbar. Wait for the completion message before you continue!

In [18]:
!pip install --upgrade pip
!pip install urllib3
!pip install app-store-scraper

from pprint import pprint
import os
import re
import pandas as pd
import urllib3
import numpy as np

print("Installations and package import complete!")

Installations and package import complete!


## Define data input

In this section, you may need to adjust a few things, depending on your research project. The first grey box below imports the country codes for the Apple Store. Using all available country codes is recommended; if you want to limit them, do so in the separate applestore_country_codes.py file. The second grey box defines the podcasts whose reviews will be checked and collected. Here, you may need to insert the app IDs and app names of your own selected podcasts. The existing values can be used to test whether the script works.

In [19]:
# import Apple Store country codes from separate file
from applestore_country_codes import select_countries 

countries = select_countries()

# display items in list
print("The country codes have been successfully loaded: ", countries)

The country codes have been successfully loaded:  ['DZ', 'AO', 'AI', 'AR', 'AM', 'AU', 'AT', 'AZ', 'BH', 'BB', 'BY', 'BE', 'BZ', 'BM', 'BO', 'BW', 'BR', 'VG', 'BN', 'BG', 'CA', 'KY', 'CL', 'CN', 'CO', 'CR', 'HR', 'CY', 'CZ', 'DK', 'DM', 'EC', 'EG', 'SV', 'EE', 'FI', 'FR', 'DE', 'GH', 'GB', 'GR', 'GD', 'GT', 'GY', 'HN', 'HK', 'HU', 'IS', 'IN', 'ID', 'IE', 'IL', 'IT', 'JM', 'JP', 'JO', 'KE', 'KW', 'LV', 'LB', 'LT', 'LU', 'MO', 'MG', 'MY', 'ML', 'MT', 'MU', 'MX', 'MS', 'NP', 'NL', 'NZ', 'NI', 'NE', 'NG', 'NO', 'OM', 'PK', 'PA', 'PY', 'PE', 'PH', 'PL', 'PT', 'QA', 'MK', 'RO', 'RU', 'SA', 'SN', 'SG', 'SK', 'SI', 'ZA', 'KR', 'ES', 'LK', 'SR', 'SE', 'CH', 'TW', 'TZ', 'TH', 'TN', 'TR', 'UG', 'UA', 'AE', 'US', 'UY', 'UZ', 'VE', 'VN', 'YE']
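The file applestore_country_codes.py is not shown in this notebook. As a rough illustration of how limiting the countries could work, here is a minimal sketch. The shortened list and the `only` parameter are assumptions for this example; the real `select_countries()` is called without arguments above and may be written differently.

```python
# Hypothetical sketch of applestore_country_codes.py; the real file may differ.
ALL_COUNTRIES = ["DZ", "AO", "AI", "AR", "US"]  # shortened for illustration


def select_countries(only=None):
    """Return all country codes, or only those listed in `only` (assumed parameter)."""
    if only is None:
        return list(ALL_COUNTRIES)
    wanted = {code.upper() for code in only}
    # Keep the order of the master list
    return [code for code in ALL_COUNTRIES if code in wanted]


print(select_countries())
print(select_countries(only=["us", "ar"]))
```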


In [20]:
# Define a list of App Store items with app_id and app_name for scraping
# remove or add lines within the podcast list if needed
podcasts = [
    {"app_id": 1453181438, "app_name": 'black-women-talk-tech-podcast'},
    {"app_id": 1437754426, "app_name": 'sh-t-women-think-about'},
    {"app_id": 1537830674, "app_name": 'holistic-womens-health-hormones-endometriosis-pcos'}
]

# URL structure of typical App Store item:
# https://podcasts.apple.com/us/podcast/black-women-talk-tech-podcast/id1453181438
# copy ID and podcast name from your URL

# Standard URL for Apple Podcasts
base_url = "https://podcasts.apple.com/us/podcast/"

# Important: country codes will be selected from the list above

# Set output path
path_out = "../data/output/"

print("Podcasts defined!")

Podcasts defined!
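Copying the ID and name from a URL by hand is easy to get wrong. The sketch below extracts both with a regular expression, assuming the URL follows the structure shown in the comment above. `parse_podcast_url` is a hypothetical helper and is not part of the original scripts.

```python
import re

# Pattern for URLs shaped like
# https://podcasts.apple.com/<country>/podcast/<app_name>/id<app_id>
PODCAST_URL = re.compile(
    r"https://podcasts\.apple\.com/(?P<country>[a-z]{2})/podcast/"
    r"(?P<app_name>[^/]+)/id(?P<app_id>\d+)"
)


def parse_podcast_url(url):
    """Return a dict with app_id and app_name, or raise ValueError."""
    match = PODCAST_URL.match(url)
    if match is None:
        raise ValueError(f"Not a recognised Apple Podcasts URL: {url}")
    return {"app_id": int(match["app_id"]), "app_name": match["app_name"]}


print(parse_podcast_url(
    "https://podcasts.apple.com/us/podcast/black-women-talk-tech-podcast/id1453181438"
))
```

The returned dictionary has the same keys as the entries in the `podcasts` list, so it could be appended to that list directly.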


## Validate data and collect reviews

Here, you only need to run the code below and monitor the output. No changes to the script are required. If you encounter an error, let your tutor know. A common cause is that the external scripts imported here (verify_countries and scrape_reviews) are not in the right place and cannot be found.
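To check in advance whether Python can locate the helper scripts, a small test along these lines may help. `missing_modules` is a hypothetical helper written for this example; it only reports whether each module can be found, without running it.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# verify_countries and scrape_reviews are the helper scripts imported in the next cell
missing = missing_modules(["verify_countries", "scrape_reviews"])
if missing:
    print("Could not be found:", missing)
else:
    print("All helper scripts found.")
```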

In [21]:
# Loop through the podcasts and country codes

from verify_countries import pool_checks
from scrape_reviews import scrape_reviews

for podcast in podcasts:
    app_id = podcast['app_id']
    app_name = podcast['app_name']

    # Construct full URL for each podcast
    podcast_url = f"{base_url}{app_name}/id{app_id}"

    # Output the URL to check if it's correct
    print(f"Scraping URL: {podcast_url}")

    # Create the filename for each podcast
    filename_csv = f'{app_name}_reviews_table.csv'
    file_csv = path_out + filename_csv

    # Print the output file path
    print(f"Saving reviews to: {file_csv}")

    # Check which countries have reviews for this podcast
    countries_reviewed = pool_checks(podcast_url, countries)

    print("The following countries have reviews:", countries_reviewed)
    
    # run function to collect reviews for selected countries
    all_reviews = scrape_reviews(countries_reviewed, app_name, app_id)
    
    print("All reviews collected for ", podcast, "!")

    # NOTE: the review count seen on the landing page differs from the actual number of reviews fetched.
    # This is simply because only some users who rated the app also leave reviews.
    

Scraping URL: https://podcasts.apple.com/us/podcast/black-women-talk-tech-podcast/id1453181438
Saving reviews to: ../data/output/black-women-talk-tech-podcast_reviews_table.csv
The following countries have reviews: ['PK', 'US']
Scraping reviews for black-women-talk-tech-podcast in country PK


2024-09-19 20:24:54,215 [INFO] Base - Initialised: Podcast('pk', 'black-women-talk-tech-podcast', 1453181438)
2024-09-19 20:24:54,216 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/pk/podcast/black-women-talk-tech-podcast/id1453181438
2024-09-19 20:24:54,428 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:24:54,429 [INFO] Base - [id:1453181438] Fetched 0 reviews (0 fetched in total)
2024-09-19 20:24:54,492 [INFO] Base - Initialised: Podcast('us', 'black-women-talk-tech-podcast', 1453181438)
2024-09-19 20:24:54,493 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/us/podcast/black-women-talk-tech-podcast/id1453181438


No. of reviews found for country PK: 0
No reviews found for country PK.
Scraping reviews for black-women-talk-tech-podcast in country US


2024-09-19 20:24:54,648 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:24:54,649 [INFO] Base - [id:1453181438] Fetched 0 reviews (0 fetched in total)


No. of reviews found for country US: 0
No reviews found for country US.
All reviews collected for  {'app_id': 1453181438, 'app_name': 'black-women-talk-tech-podcast'} !
Scraping URL: https://podcasts.apple.com/us/podcast/sh-t-women-think-about/id1437754426
Saving reviews to: ../data/output/sh-t-women-think-about_reviews_table.csv
The following countries have reviews: ['PK', 'US']
Scraping reviews for sh-t-women-think-about in country PK


2024-09-19 20:25:02,008 [INFO] Base - Initialised: Podcast('pk', 'sh-t-women-think-about', 1437754426)
2024-09-19 20:25:02,009 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/pk/podcast/sh-t-women-think-about/id1437754426
2024-09-19 20:25:02,173 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:02,174 [INFO] Base - [id:1437754426] Fetched 0 reviews (0 fetched in total)
2024-09-19 20:25:02,261 [INFO] Base - Initialised: Podcast('us', 'sh-t-women-think-about', 1437754426)
2024-09-19 20:25:02,262 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/us/podcast/sh-t-women-think-about/id1437754426


No. of reviews found for country PK: 0
No reviews found for country PK.
Scraping reviews for sh-t-women-think-about in country US


2024-09-19 20:25:02,458 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:02,459 [INFO] Base - [id:1437754426] Fetched 0 reviews (0 fetched in total)


No. of reviews found for country US: 0
No reviews found for country US.
All reviews collected for  {'app_id': 1437754426, 'app_name': 'sh-t-women-think-about'} !
Scraping URL: https://podcasts.apple.com/us/podcast/holistic-womens-health-hormones-endometriosis-pcos/id1537830674
Saving reviews to: ../data/output/holistic-womens-health-hormones-endometriosis-pcos_reviews_table.csv


2024-09-19 20:25:09,220 [INFO] Base - Initialised: Podcast('ca', 'holistic-womens-health-hormones-endometriosis-pcos', 1537830674)
2024-09-19 20:25:09,221 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/ca/podcast/holistic-womens-health-hormones-endometriosis-pcos/id1537830674


The following countries have reviews: ['CA', 'PK', 'ZA', 'US']
Scraping reviews for holistic-womens-health-hormones-endometriosis-pcos in country CA


2024-09-19 20:25:09,426 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:09,426 [INFO] Base - [id:1537830674] Fetched 0 reviews (0 fetched in total)


No. of reviews found for country CA: 0
No reviews found for country CA.
Scraping reviews for holistic-womens-health-hormones-endometriosis-pcos in country PK


2024-09-19 20:25:09,827 [INFO] Base - Initialised: Podcast('pk', 'holistic-womens-health-hormones-endometriosis-pcos', 1537830674)
2024-09-19 20:25:09,828 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/pk/podcast/holistic-womens-health-hormones-endometriosis-pcos/id1537830674
2024-09-19 20:25:10,005 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:10,005 [INFO] Base - [id:1537830674] Fetched 0 reviews (0 fetched in total)
2024-09-19 20:25:10,071 [INFO] Base - Initialised: Podcast('za', 'holistic-womens-health-hormones-endometriosis-pcos', 1537830674)
2024-09-19 20:25:10,072 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/za/podcast/holistic-womens-health-hormones-endometriosis-pcos/id1537830674


No. of reviews found for country PK: 0
No reviews found for country PK.
Scraping reviews for holistic-womens-health-hormones-endometriosis-pcos in country ZA


2024-09-19 20:25:10,228 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:10,229 [INFO] Base - [id:1537830674] Fetched 0 reviews (0 fetched in total)
2024-09-19 20:25:10,293 [INFO] Base - Initialised: Podcast('us', 'holistic-womens-health-hormones-endometriosis-pcos', 1537830674)
2024-09-19 20:25:10,294 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/us/podcast/holistic-womens-health-hormones-endometriosis-pcos/id1537830674


No. of reviews found for country ZA: 0
No reviews found for country ZA.
Scraping reviews for holistic-womens-health-hormones-endometriosis-pcos in country US


2024-09-19 20:25:10,451 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:25:10,452 [INFO] Base - [id:1537830674] Fetched 0 reviews (0 fetched in total)


No. of reviews found for country US: 0
No reviews found for country US.
All reviews collected for  {'app_id': 1537830674, 'app_name': 'holistic-womens-health-hormones-endometriosis-pcos'} !
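Every fetch in the log above ends with `Expecting value: line 1 column 1 (char 0)`. This is the message Python's `json` module produces when it is asked to parse an empty string or text that is not JSON. The log alone does not prove what the server sent back, but it suggests the scraper received something other than JSON. The sketch below reproduces the message; `try_parse_json` is a hypothetical helper for illustration.

```python
import json


def try_parse_json(text):
    """Return (parsed_value, None) on success or (None, error_message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return None, str(err)


print(try_parse_json('{"reviews": []}'))   # valid JSON
print(try_parse_json(""))                  # empty body
print(try_parse_json("<!DOCTYPE html>"))   # HTML instead of JSON
```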


In [22]:
## DEBUGGING SECTION TO TEST WHY NO REVIEWS CAN BE COLLECTED ALTHOUGH COUNTRIES ARE CORRECT

# TEST PODCAST WITH MANY INTERNATIONAL REVIEWS:
# https://podcasts.apple.com/gb/podcast/crime-junkie/id1322200189

# get reviews for selected countries
import pandas as pd
from app_store_scraper import Podcast
from pprint import pprint

app_name="crime-junkie"
app_id="1322200189"
countries_reviewed=["us", "gb"]

def scrape_reviews(countries_reviewed, app_name, app_id):
    all_reviews = []
    for c in countries_reviewed:
        print(f"Scraping reviews for {app_name} in country {c}")
        sysk = None  # stays None if the Podcast object cannot be created

        try:
            sysk = Podcast(country=c, app_name=app_name, app_id=app_id)
            sysk.review()
            # _response is a private attribute of the scraper and may change between versions
            response = sysk._response
            print("Response Status Code:", response.status_code)

            # Collect the reviews for this country, if any were fetched
            if sysk.reviews_count > 0:
                podcastdf = pd.DataFrame.from_dict(sysk.reviews)
                print(f"Retrieved {len(podcastdf)} reviews for country {c}.")
                all_reviews.append(podcastdf)
            else:
                print(f"No reviews found for country {c}.")

        except Exception as e:
            print(f"An error occurred while scraping for country {c}: {e}")
            if sysk is not None and isinstance(e, ValueError):
                print(f"Response text: {sysk._response.text}")  # Print response for debugging

    return all_reviews

scrape_reviews(countries_reviewed, app_name, app_id)




Scraping reviews for crime-junkie in country us


2024-09-19 20:27:12,702 [INFO] Base - Initialised: Podcast('us', 'crime-junkie', 1322200189)
2024-09-19 20:27:12,703 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/us/podcast/crime-junkie/id1322200189
2024-09-19 20:27:13,177 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:27:13,178 [INFO] Base - [id:1322200189] Fetched 0 reviews (0 fetched in total)


Response Status Code: 401
Scraping reviews for crime-junkie in country gb


2024-09-19 20:27:14,391 [INFO] Base - Initialised: Podcast('gb', 'crime-junkie', 1322200189)
2024-09-19 20:27:14,393 [INFO] Base - Ready to fetch reviews from: https://podcasts.apple.com/gb/podcast/crime-junkie/id1322200189
2024-09-19 20:27:14,555 [ERROR] Base - Something went wrong: Expecting value: line 1 column 1 (char 0)
2024-09-19 20:27:14,556 [INFO] Base - [id:1322200189] Fetched 0 reviews (0 fetched in total)


Response Status Code: 401


[]
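The debugging run reports status code 401 for both countries. In HTTP, 401 means "Unauthorized": the server refused the request because it lacked valid authentication. Why the scraper's requests are rejected is not established by this output. As a small aid for reading such codes, Python's standard library can translate them; `describe_status` is a hypothetical helper.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short label such as '401 Unauthorized' for an HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Raised when the code is not a registered HTTP status
        return f"{code} (unknown status)"


print(describe_status(401))
print(describe_status(200))
```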

In [23]:
## CHECK IF URLS CAN BE CALLED VIA REQUESTS

import requests

country = 'us'
app_name = 'sh-t-women-think-about'
app_id = '1437754426'

url = f"https://podcasts.apple.com/{country}/podcast/{app_name}/id{app_id}"
response = requests.get(url)

print("Manual Request Status Code:", response.status_code)
print("Manual Request Response Text:", response.text)


Manual Request Status Code: 200
Manual Request Response Text: <!DOCTYPE html>
<html dir="ltr" lang="en-US">
    <head>
        <meta charset="utf-8" />
        <meta http-equiv="X-UA-Compatible" content="IE=edge" />
        <meta name="viewport" content="width=device-width,initial-scale=1" />
        <meta name="applicable-device" content="pc,mobile" />
        <meta name="referrer" content="strict-origin" />

        <link
            rel="apple-touch-icon"
            sizes="180x180"
            href="/assets/favicon/favicon-180.png"
        />
        <link
            rel="icon"
            type="image/png"
            sizes="32x32"
            href="/assets/favicon/favicon-32.png"
        />
        <link
            rel="icon"
            type="image/png"
            sizes="16x16"
            href="/assets/favicon/favicon-16.png"
        />
        <link
            rel="mask-icon"
            href="/assets/favicon/favicon.svg"
            color="#7e50df"
        />
        <link
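The cell above prints the full page source, which is long. A minimal sketch of a shorter check follows. `summarize_response` is a hypothetical helper, and the stand-in object only mimics the `status_code`, `headers`, and `text` attributes of a `requests` response so that the example runs without network access.

```python
from types import SimpleNamespace


def summarize_response(response, limit=80):
    """Return status, content type and a short preview of the body."""
    content_type = response.headers.get("Content-Type", "unknown")
    preview = response.text[:limit].replace("\n", " ")
    return f"{response.status_code} | {content_type} | {preview}"


# Stand-in object with the same attribute names as a requests.Response
fake = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "text/html"},
    text="<!DOCTYPE html>\n<html>",
)
print(summarize_response(fake))
```

With a real response, `summarize_response(response)` could replace the two `print` calls. A content type of HTML rather than JSON would fit the picture that the public podcast page is reachable while the review data is not.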

## Export to CSV

In [None]:
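# Export to .csv
# Note: all_reviews and file_csv hold the values from the last podcast processed in the loop above
if all_reviews:
    df_final = pd.concat(all_reviews)
    print("Your final dataframe has", len(df_final), "rows.")

    # Tab-separated output, although the file name ends in .csv
    df_final.to_csv(file_csv, index=False, sep="\t")
    print(f'Exported to {file_csv}')
else:
    print("No reviews were collected, so there is nothing to export.")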
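Because the export uses `sep="\t"`, the resulting file is tab-separated even though its name ends in .csv, so it has to be read back with the same delimiter (in pandas: `pd.read_csv(path, sep="\t")`). The sketch below shows the round trip with the standard library only. The sample rows and column names are made up for illustration; the real columns depend on what the scraper returns.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; real column names depend on what the scraper returns
rows = [
    {"title": "Great show", "rating": "5"},
    {"title": "Too many ads", "rating": "2"},
]

path = os.path.join(tempfile.mkdtemp(), "sample_reviews_table.csv")

# Write tab-separated text, mirroring sep="\t" in the export cell
with open(path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=["title", "rating"], delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Read it back with the same delimiter
with open(path, newline="", encoding="utf-8") as handle:
    loaded = list(csv.DictReader(handle, delimiter="\t"))

print(loaded)
```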