diff --git a/.gitignore b/.gitignore index 68bc17f..f1e94a0 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,5 @@ +Pipfile* + # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] diff --git a/README.md b/README.md index 2801cc0..65d6714 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,13 @@ Jupyter Notebooks | **Notebook** | **Description** | **Notebook links** | **Tags** | | ------------ | ------------------------------------------------- | ------------------------------------------- | ---------------------------------------- | -| Holehe | A tool to find accounts associated with an email | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/holehe.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fholehe.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/holehe.ipynb) | `community`, `digital-footprint-tracing` | -| Maigret | A tool to find accounts associated with a username | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/maigret.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fmaigret.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/maigret.ipynb) | `community`, `digital-footprint-tracing` | -| Deepface | A tool to do facial comparison and analysis | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/deepface.ipynb) 
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fdeepface.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/deepface.ipynb) | `community`, `ai`, `image analysis` | +| Holehe | A tool to find accounts associated with an email | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/holehe.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fholehe.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/holehe.ipynb) | `community`, `digital-footprint-tracing` | +| Maigret | A tool to find accounts associated with a username | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/maigret.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fmaigret.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/maigret.ipynb) | `community`, `digital-footprint-tracing` | +| Deepface | A tool for facial comparison and analysis | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/deepface.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fdeepface.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/deepface.ipynb) | `community`, `ai`, `image analysis` | +| Wayback Google Analytics | Uncover
historical Google Analytics IDs via the Wayback Machine | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/bellingcat/wayback-google-analytics.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fbellingcat%2Fwayback-google-analytics.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/bellingcat/wayback-google-analytics.ipynb) | `bellingcat`, `wayback-machine`, `google-analytics` | + + + + +[colab-badge]: https://colab.research.google.com/assets/colab-badge.svg \ No newline at end of file diff --git a/notebooks/bellingcat/wayback-google-analytics.ipynb b/notebooks/bellingcat/wayback-google-analytics.ipynb new file mode 100644 index 0000000..3c5ba3d --- /dev/null +++ b/notebooks/bellingcat/wayback-google-analytics.ipynb @@ -0,0 +1,137 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Wayback Google Analytics\n", + "Gather historical Google Analytics codes (UA, GA and GTM) from a list of website URLs.\n", + "\n", + "Read [more about the tool and Google Analytics codes](https://github.com/bellingcat/wayback-google-analytics/blob/main/README.md#about-the-project).\n", + "\n", + "#### [Read the article on bellingcat.com](https://www.bellingcat.com/resources/2024/01/09/using-the-wayback-machine-and-google-analytics-to-uncover-disinformation-networks/)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 1 - Install the Python package" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "bat" + } + }, + "outputs": [], + "source": [ + "!pip install wayback-google-analytics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2 - Get a full command-line description by calling `--help`\n", + "This
will list every command-line option and what it does." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "bat" + } + }, + "outputs": [], + "source": [ + "!wayback-google-analytics --help" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 3 - Call the tool\n", + "Using the options above, let's call the CLI to collect all Google Analytics IDs from 4 different websites:\n", + "- `https://yapatriot.ru`\n", + "- `https://zanogu.com`\n", + "- `https://whoswho.com.ua`\n", + "- `https://adamants.ru`\n", + "\n", + "We start on 1 January 2015 (`-s 01/01/2015`), check for changes at a yearly frequency (`-f yearly`) and save the results to an Excel file (`-o xlsx`).\n", + "\n", + "Progress is shown in the console, and the results are printed in the last line of the console output. The Excel file (or a CSV, TXT or JSON file, if you change `-o` accordingly) is written to the `output/` folder."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "bat" + } + }, + "outputs": [], + "source": [ + "!wayback-google-analytics -u https://yapatriot.ru https://zanogu.com https://whoswho.com.ua https://adamants.ru -s 01/01/2015 -f yearly -o xlsx" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's an example output for one of the sites (`yapatriot.ru`), which has 2 historical UA codes: `UA-65087228-1`, the 1st ID assigned to the analytics account `UA-65087228`, and `UA-53176102-14`, the 14th ID assigned to the `UA-53176102` analytics account:\n", + "```\n", + "[{\n", + "\t'https://yapatriot.ru': {\n", + "\t\t'archived_UA_codes': {\n", + "\t\t\t'UA-65087228-1': {\n", + "\t\t\t\t'first_seen': '20/01/2017:03:55',\n", + "\t\t\t\t'last_seen': '30/06/2019:05:32'\n", + "\t\t\t},\n", + "\t\t\t'UA-53176102-14': {\n", + "\t\t\t\t'first_seen': '15/06/2015:19:36',\n", + "\t\t\t\t'last_seen': '15/06/2015:19:36'\n", + "\t\t\t}\n", + "\t\t},\n", + "\t\t'archived_GA_codes': {},\n", + "\t\t'archived_GTM_codes': {}\n", + "\t}\n", + "}\n", + "...\n", + "]\n", + "```\n", + "You can then pivot to other tools such as https://spyonweb.com/ and see that [`UA-53176102`](https://spyonweb.com/ua-53176102) is used on at least 4 other sites (`material-evidence.com`, `news-region.ru`, `syriainform.com` and `whoswho.com.ua`), very likely belonging to the same owner. Note that between 2016 and 2017 this site switched to `UA-65087228` (the code you would find on the live website), a fresh UA ID that would not reveal the connection we uncovered through the historical UA code.\n", + "\n", + "This is a good example of how the tool can help investigations by digging up historical IDs. To be even more thorough, you can change the frequency to `-f daily` or even `-f hourly`, but that makes the tool much slower to run."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "open-source-research-notebooks-4sg58OrJ", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
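The example output in the new `wayback-google-analytics.ipynb` notebook rests on how UA IDs are structured: `UA-<account>-<n>`, where the suffix numbers the IDs assigned under one analytics account, so two sites sharing the same `UA-<account>` prefix likely share an owner. Below is a minimal Python sketch of that pivot, assuming results shaped like the printed example (a list of `{url: {'archived_UA_codes': {...}, ...}}` dicts). The helper names, the loose regex patterns for GA (`G-...`) and GTM (`GTM-...`) IDs and the `https://example.org` entry are hypothetical illustrations, not part of the tool.

```python
import re
from collections import defaultdict

# Hypothetical, loose patterns for the three ID families the notebook mentions;
# these are assumptions, not the tool's own matching logic.
UA_RE = re.compile(r"^UA-(\d+)-(\d+)$")
GA_RE = re.compile(r"^G-[A-Z0-9]+$")
GTM_RE = re.compile(r"^GTM-[A-Z0-9]+$")


def classify_code(code: str) -> str:
    """Return 'UA', 'GA' or 'GTM' for a tracking ID, else 'unknown'."""
    for label, pattern in (("UA", UA_RE), ("GA", GA_RE), ("GTM", GTM_RE)):
        if pattern.match(code):
            return label
    return "unknown"


def ua_account(code: str) -> tuple[str, int]:
    """Split a UA ID into its account prefix and ID number.

    'UA-53176102-14' -> ('UA-53176102', 14)
    """
    match = UA_RE.match(code)
    if match is None:
        raise ValueError(f"not a UA ID: {code!r}")
    account, number = match.groups()
    return f"UA-{account}", int(number)


def group_by_ua_account(results: list[dict]) -> dict[str, set[str]]:
    """Map each UA account prefix to every site whose archived UA codes use it."""
    accounts: dict[str, set[str]] = defaultdict(set)
    for entry in results:
        for site, data in entry.items():
            for code in data.get("archived_UA_codes", {}):
                account, _ = ua_account(code)
                accounts[account].add(site)
    return dict(accounts)


# The yapatriot.ru entry is copied from the notebook's example output;
# the example.org entry is a hypothetical second site for illustration only.
results = [
    {"https://yapatriot.ru": {
        "archived_UA_codes": {
            "UA-65087228-1": {"first_seen": "20/01/2017:03:55", "last_seen": "30/06/2019:05:32"},
            "UA-53176102-14": {"first_seen": "15/06/2015:19:36", "last_seen": "15/06/2015:19:36"},
        },
        "archived_GA_codes": {},
        "archived_GTM_codes": {},
    }},
    {"https://example.org": {
        "archived_UA_codes": {"UA-53176102-3": {}},
        "archived_GA_codes": {},
        "archived_GTM_codes": {},
    }},
]

# Keep only accounts seen on more than one site: these are the leads to follow up.
shared = {
    account: sorted(sites)
    for account, sites in group_by_ua_account(results).items()
    if len(sites) > 1
}
print(shared)
# {'UA-53176102': ['https://example.org', 'https://yapatriot.ru']}
```

Grouping on the account prefix rather than the full ID is the choice that matters here: `UA-53176102-14` and `UA-53176102-3` are different IDs, but both point at the same analytics account.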