# App Info Retriever

This notebook collects app information from the Google Play Store and the Apple App Store using SerpApi. To get access to SerpApi, do the following:

1. Register at https://serpapi.com/users/sign_up
2. Retrieve your API key at https://serpapi.com/manage-api-key
3. Save your API key to an environment variable using `export SERP_API_KEY=<your_key>` (or assign it directly to the variable `serp_api_key` below)
4. Install the serpapi Python package: `pip install serpapi`
5. Specify your search query and other search parameters according to your needs
6. Run the notebook

Results are written as JSON files to the **data** directory, with a separate file for each search query and app provider.
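
Once the notebook has run, the saved files can be read back for analysis. The following is a minimal sketch, assuming the files sit in the `data` directory and each contains a JSON list of results; the helper name `load_results` is hypothetical and not part of the notebook.

```python
import json
from pathlib import Path


def load_results(data_dir='data'):
    """Return a dict mapping each JSON file name in data_dir to its parsed content."""
    loaded = {}
    # sorted() keeps the order of files stable between runs
    for path in sorted(Path(data_dir).glob('*.json')):
        with open(path, encoding='utf-8') as f:
            loaded[path.name] = json.load(f)
    return loaded
```

Calling `load_results()` returns one entry per output file, so `len(load_results()['<file name>'])` gives the number of apps stored in that file.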

In [2]:
import sys
import os
import json
from datetime import datetime

import serpapi

## Setup

In [None]:
serp_api_key = os.environ['SERP_API_KEY']
print(f'Using SerpApi key ending in ...{serp_api_key[-4:]}')  # Avoid printing the full secret into the notebook output

In the next cell fill in your desired search queries and change other parameters according to your needs.

In [26]:
queries = ['query_1', 'query_2', '...']
engines = ['apple_app_store', 'google_play']
countries = ['de']
languages = ['de-de']
disallow_explicit = False
max_pages = 50  # Maximum number of pages to scrape per search (important for the Google Play Store because of its infinite scroll)
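
Because the scraper loops over every combination of query, engine, country and language, the number of page requests can grow quickly. The sketch below estimates an upper bound before running anything; `max_requests` is a hypothetical helper, and whether each page request counts against your SerpApi plan is an assumption you should check against your account.

```python
def max_requests(queries, engines, countries, languages, max_pages):
    """Upper bound on the number of page requests for a configuration."""
    combinations = len(queries) * len(engines) * len(countries) * len(languages)
    # Every combination is paged at most max_pages times
    return combinations * max_pages


# Example values only: 2 queries * 2 engines * 1 country * 1 language * 50 pages
print(max_requests(
    queries=['query_1', 'query_2'],
    engines=['apple_app_store', 'google_play'],
    countries=['de'],
    languages=['de-de'],
    max_pages=50,
))  # 200
```

The real number is usually lower, since paging stops as soon as a search has no further results.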

### Helper functions

In [27]:
def configure_params(query, engine, country, language, disallow_explicit=False, num=20, page=0):
    params = {
        'api_key': serp_api_key,       
        'engine': engine,           
        'device': 'mobile', 
        'country': country, 
        'lang': language, 
        'disallow_explicit': disallow_explicit,
        'num': num,                  
        'page': page,    
        'q': query,    # Used for Google Play Store
        'term': query  # Used for Apple App Store
    }
    return params

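def save_results(results, query, engine, timestamp):
    # Create the output directory if it does not exist yet
    os.makedirs('data', exist_ok=True)
    output_fp = os.path.join('data', f'{engine}.{query}.{timestamp}.json')
    # Write UTF-8 explicitly because ensure_ascii=False keeps non-ASCII characters as-is
    with open(output_fp, 'w', encoding='utf-8') as f:
        json.dump(results, f, indent=4, ensure_ascii=False)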

## Run Scraper

In [None]:
client = serpapi.Client(api_key=serp_api_key)

# Iterate over queries, engines, countries and languages
for q in queries: 
    for e in engines:
        for c in countries:
            for l in languages:
                print(f'Retrieving results for query={q}, engine={e}, country={c}, language={l}')
                params = configure_params(query=q, engine=e, country=c, language=l)
                results = []
                page_idx = 1

                # Iterate over pages until `max_pages` is reached
                while page_idx <= max_pages:
                    page_idx += 1
                    print(f'Scraping page {params["page"]}...')
                    search = client.search(params)
                    new_page_results = search.as_dict()     # JSON -> Python dict
                    # A search without hits may lack 'organic_results', so fall back to an empty list
                    results.extend(new_page_results.get('organic_results', []))
                
                    if 'next' in new_page_results.get('serpapi_pagination', {}):
                        params['page'] += 1
                    else:
                        break
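                print('Saving results...')
                print('------------------')
                # Use a compact timestamp without spaces or colons so the file name is valid on all platforms
                save_results(results, query=q, engine=e, timestamp=datetime.now().strftime('%Y%m%dT%H%M%S'))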
print('Done.')
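
The paging logic of the loop above can be tried without spending API requests. The following sketch isolates it in a hypothetical helper, `collect_pages`, and replaces the real `client.search(...).as_dict()` call with a made-up `fake_fetch` function that returns three pages of placeholder data. It only assumes the response shape used in the loop above (`organic_results` and `serpapi_pagination` with a `next` entry).

```python
def collect_pages(fetch_page, max_pages):
    """Collect 'organic_results' from consecutive pages.

    Stops after max_pages requests or when a response has no 'next'
    entry in 'serpapi_pagination'.
    """
    results = []
    page = 0
    for _ in range(max_pages):
        response = fetch_page(page)
        results.extend(response.get('organic_results', []))
        if 'next' not in response.get('serpapi_pagination', {}):
            break
        page += 1
    return results


def fake_fetch(page):
    """Hypothetical stand-in for the API call: pages 0 and 1 link to a next page, page 2 does not."""
    response = {'organic_results': [f'app_{page}']}
    if page < 2:
        response['serpapi_pagination'] = {'next': 'placeholder'}
    return response


print(collect_pages(fake_fetch, max_pages=50))  # ['app_0', 'app_1', 'app_2']
print(collect_pages(fake_fetch, max_pages=2))   # ['app_0', 'app_1']
```

The second call shows the effect of the page cap: collection stops after two requests even though a further page is available.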