Various tools used to collect data for AppGoblin's free ASO app marketing tools and app SDK lists. These tools are used to build AppGoblin. This is a large monorepo that combines other popular tools with a database schema for storing the data they collect.
Scrapers:
- Pull apps from the App Store and Google Play Store top lists, using digitalmethodsinitiative/itunes-app-scraper and facundoolano/google-play-scraper.
- Pull apps from some 3rd party stores.
- Unzip/decompile Android APKs and iOS IPAs to look for 3rd party tracking/advertising tools. Requires manual setup of iBotPeaches/apktool and majd/ipatool.
- Crawl app-ads.txt files based on the Interactive Advertising Bureau's Tech Lab specs: https://iabtechlab.com/ads-txt/
- Run the Waydroid emulator with MITM to capture HTTPS API traffic (the headless implementation requires manual steps; please reach out if you need help).
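To make the app-ads.txt step more concrete, here is a minimal sketch of parsing the contents of one file into records, following the comma-separated layout in the IAB Tech Lab spec (ad system domain, publisher account ID, relationship, optional certification authority ID). The names `AdsTxtRecord` and `parse_app_ads_txt` are hypothetical and are not taken from the adscrawler code. Fetching the file, handling variable lines such as `contact=`, and storing results are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdsTxtRecord:
    """One data record from an app-ads.txt file (hypothetical container)."""

    ad_domain: str
    publisher_id: str
    relationship: str  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str] = None


def parse_app_ads_txt(text: str) -> list[AdsTxtRecord]:
    """Parse app-ads.txt content, skipping comments, variables, and bad lines."""
    records = []
    for raw_line in text.splitlines():
        # Everything after '#' is a comment.
        line = raw_line.split("#", 1)[0]
        # Drop trailing extension fields after ';' (ignored in this sketch).
        line = line.split(";", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        # Data records need at least three fields; variable lines such as
        # "contact=..." have fewer and are skipped here.
        if len(fields) < 3:
            continue
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records


sample = """# example file
google.com, pub-1234567890, DIRECT, f08c47fec0942fa0
example-exchange.com, 98765, reseller
contact=ads@example.com
"""

for record in parse_app_ads_txt(sample):
    print(record)
```

The sample yields two records; the comment and the `contact=` line are ignored.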
Note: This project is not a one click setup but feel free to reach out for help.
- Requires Object Storage / S3
- PostgreSQL: 17/18
  - Set up your database. The files here use the database name `madrone`.
  - Add a password to your default db user if you don't have one yet: `ALTER USER postgres WITH PASSWORD 'xxx';`
- Python environment: Python 3.13
  - Set up the Python environment: `python3.13 -m venv .virtualenv && source .virtualenv/bin/activate`
  - `uv pip install -r pyproject.toml`
  - `cp example_config.toml ~/.config/adscrawler/config.toml` and edit any needed values. To run everything locally, the main things to change are the `xxx` placeholder for the Postgres password and the S3 host.
  - In your virtualenv, initialize the database: `python db_init.py`. This initializes materialized views (MVs) and inserts 3m+ apps' store_ids from https://github.com/ddxv/appgoblin-data
- Google Play app ranks require NodeJS: `npm install --save google-play-scraper`
- An S3 bucket, used by app ranks, APK/IPA download, and MITM
- From your environment, run `python main.py` to check the setup. This also helps verify that the database connection is working.
- `-l, --limit-processes`: Ensure only one instance of adscrawler runs. If another instance is detected, the script will not run. The check includes some options, such as `-p apple`.
- `-t, --use-ssh-tunnel`: Include to use SSH port forwarding to connect to a remote PostgreSQL database based on your `~/.config/adscrawler/config.toml`.
- `-p, --platforms`: Specify platforms to target. Can be `"google"`, `"apple"`, or both. Can be repeated multiple times. Default: `[]`.
- `-u, --update-app-store-details`: Scrape app stores for app details (e.g., downloads, ratings). Requires existing store IDs.
- `--country-priority-group`: Default is 1 for US. Other groups for crawling can be configured in db.
- `--workers`: Number of workers to use for updating app store details. Default: `1`.
- `-n, --new-apps-check`: This crawls app rank data and stores to S3. It is also the source of new apps. Crawl the iTunes and Play Store front pages to discover new apps. Checks top apps for each category and collection.
- `--daily-s3-imports`: This processes and imports data stored in S3. App Ranks and app metrics.
- `-d, --new-apps-check-devs`: Crawl developers' pages to find new apps from those developers.
- `--limit-query-rows`: Number of rows per run, default 200,000.
- `-a, --app-ads-txt-scrape`: Crawl developer URLs for `app-ads.txt` files. Requires store IDs and app details to be scraped first.
- `--download-apks`: Download APK files for Android apps or IPA files for iOS apps.
- `--process-sdks`: Process APKs, IPAs, and manifest files to extract SDK information.
- `-k, --crawl-keywords`: Crawl keywords from app stores.
- `-w, --waydroid`: Run apps using Waydroid.
- `-s, --store-id`: Waydroid specific. Launch a specific store ID in Waydroid.
- `--timeout-waydroid`: Waydroid specific. Timeout in seconds for Waydroid to run an app. Default: `180`.
- `--redownload-geo-dbs`: Waydroid specific. Redownload geo databases.
- `--creative-scan-all-apps`: Scan the MITM files of all apps in an S3 bucket for creatives.
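As a reading aid for the options above, the sketch below declares a few of them with Python's standard `argparse`, using the defaults documented in the list. It is an illustration under assumptions, not adscrawler's actual parser: the `build_parser` name, the argument types, and the use of `action="append"` for the repeatable `-p` flag are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-l", "--limit-processes", action="store_true")
    # Repeatable flag: each "-p value" appends to the list, default is [].
    parser.add_argument(
        "-p",
        "--platforms",
        action="append",
        choices=["google", "apple"],
        default=[],
    )
    parser.add_argument("-u", "--update-app-store-details", action="store_true")
    parser.add_argument("--country-priority-group", type=int, default=1)
    parser.add_argument("--workers", type=int, default=1)
    parser.add_argument("--limit-query-rows", type=int, default=200_000)
    parser.add_argument("-w", "--waydroid", action="store_true")
    parser.add_argument("-s", "--store-id")
    parser.add_argument("--timeout-waydroid", type=int, default=180)
    return parser


# Equivalent to: python main.py -p google -p apple -u --workers 4
args = build_parser().parse_args(
    ["-p", "google", "-p", "apple", "-u", "--workers", "4"]
)
print(args.platforms, args.update_app_store_details, args.workers)
```

Passing an explicit argument list to `parse_args` keeps the example runnable without touching the real command line.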