new notebook: wayback-google-analyics
msramalho committed Jan 11, 2024
1 parent d949782 commit 8656737
Showing 3 changed files with 148 additions and 3 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
Pipfile*

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
12 changes: 9 additions & 3 deletions README.md
@@ -13,7 +13,13 @@ Jupyter Notebooks

| **Notebook** | **Description** | **Notebook links** | **Tags** |
| ------------ | ------------------------------------------------- | ------------------------------------------- | ---------------------------------------- |
| Holehe | A tool to find accounts associated with an email | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/holehe.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fholehe.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/holehe.ipynb) | `community`, `digital-footprint-tracing` |
| Maigret | A tool to find accounts associated with a username | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/maigret.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fmaigret.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/maigret.ipynb) | `community`, `digital-footprint-tracing` |
| Deepface | A tool to do facial comparison and analysis | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/deepface.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fdeepface.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/deepface.ipynb) | `community`, `ai`, `image analysis` |
| Holehe | A tool to find accounts associated with an email | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/holehe.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fholehe.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/holehe.ipynb) | `community`, `digital-footprint-tracing` |
| Maigret | A tool to find accounts associated with a username | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/maigret.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fmaigret.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/maigret.ipynb) | `community`, `digital-footprint-tracing` |
| Deepface | A tool to do facial comparison and analysis | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/community/deepface.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fcommunity%2Fdeepface.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/community/deepface.ipynb) | `community`, `ai`, `image analysis` |
| Wayback Google Analytics | Uncover historical analytics IDs via the Wayback Machine | [![Colab][colab-badge]](https://colab.research.google.com/github/bellingcat/open-source-research-notebooks/blob/main/notebooks/bellingcat/wayback-google-analytics.ipynb) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/bellingcat/open-source-research-notebooks/main?labpath=notebooks%2Fbellingcat%2Fwayback-google-analytics.ipynb) [![Jupyter Notebook](https://img.shields.io/badge/jupyter-.ipynb%20file-orange)](notebooks/bellingcat/wayback-google-analytics.ipynb) | `bellingcat`, `wayback-machine`, `google-analytics` |



<!-- MARKDOWN LINKS & IMAGES -->
<!-- https://www.markdownguide.org/basic-syntax/#reference-style-links -->
[colab-badge]: https://colab.research.google.com/assets/colab-badge.svg
137 changes: 137 additions & 0 deletions notebooks/bellingcat/wayback-google-analytics.ipynb
@@ -0,0 +1,137 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Wayback Google Analytics\n",
"Gather historical Google Analytics codes (UA, GA, and GTM) from a collection of website URLs.\n",
"\n",
"Read [more about the tool and Google Analytics codes](https://github.com/bellingcat/wayback-google-analytics/blob/main/README.md#about-the-project).\n",
"\n",
"#### [Read the article on bellingcat.com](https://www.bellingcat.com/resources/2024/01/09/using-the-wayback-machine-and-google-analytics-to-uncover-disinformation-networks/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1 - Install the Python package"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "bat"
}
},
"outputs": [],
"source": [
"!pip install wayback-google-analytics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2 - Get a full command-line description by calling `--help`\n",
"This shows all the command-line options we can use and what each of them does."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "bat"
}
},
"outputs": [],
"source": [
"!wayback-google-analytics --help"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3 - Call the tool\n",
"Using the options from above, let's call the CLI to get all Google Analytics IDs from 4 different websites:\n",
"- `https://yapatriot.ru`\n",
"- `https://zanogu.com`\n",
"- `https://whoswho.com.ua`\n",
"- `https://adamants.ru`\n",
"\n",
"The command below starts on 1 January 2015 (`-s 01/01/2015`), checks for changes at a yearly frequency (`-f yearly`), and saves the results to an Excel file (`-o xlsx`).\n",
"\n",
"Progress is printed to the console, with the results in the final line of the console output. The Excel file (or `csv`, `txt`, or `json`, if you change the `-o` option) is written to the `output/` folder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "bat"
}
},
"outputs": [],
"source": [
"!wayback-google-analytics -u https://yapatriot.ru https://zanogu.com https://whoswho.com.ua https://adamants.ru -s 01/01/2015 -f yearly -o xlsx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is example output for one of the sites (`yapatriot.ru`), which has two historical UA codes: `UA-65087228-1`, the 1st ID assigned under the analytics account `UA-65087228`, and `UA-53176102-14`, the 14th ID assigned under the analytics account `UA-53176102`:\n",
"```\n",
"[{\n",
"\t'https://yapatriot.ru': {\n",
"\t\t'archived_UA_codes': {\n",
"\t\t\t'UA-65087228-1': {\n",
"\t\t\t\t'first_seen': '20/01/2017:03:55',\n",
"\t\t\t\t'last_seen': '30/06/2019:05:32'\n",
"\t\t\t},\n",
"\t\t\t'UA-53176102-14': {\n",
"\t\t\t\t'first_seen': '15/06/2015:19:36',\n",
"\t\t\t\t'last_seen': '15/06/2015:19:36'\n",
"\t\t\t}\n",
"\t\t},\n",
"\t\t'archived_GA_codes': {},\n",
"\t\t'archived_GTM_codes': {}\n",
"\t}\n",
"}\n",
"...\n",
"]\n",
"```\n",
"You can then turn to other tools such as https://spyonweb.com/ and see that [`UA-53176102`](https://spyonweb.com/ua-53176102) is used on at least 4 other sites (`material-evidence.com`, `news-region.ru`, `syriainform.com`, `whoswho.com.ua`), which very likely belong to the same owner. Note that between 2016 and 2017 this site started using `UA-65087228` instead (this would be the code on the live website), a fresh UA ID that would not reveal the connection we uncovered with the historical UA code.\n",
"\n",
"That's a good example of how this tool can help investigations by digging through historical IDs. If you want to be even more thorough, you can change the frequency parameter to `-f daily` or even `-f hourly`, but that will make the tool run much more slowly."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "open-source-research-notebooks-4sg58OrJ",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
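
The notebook's last cell explains that a UA code such as `UA-53176102-14` combines an analytics account (`UA-53176102`) with the index of an ID assigned under it (`14`), and that sites sharing an account likely share an owner. A minimal Python sketch of that split follows. The helper names `split_ua_code` and `group_sites_by_account` are hypothetical and not part of the `wayback-google-analytics` package, and the input shape is assumed from the example output printed in the notebook; the tool's saved files may be structured differently.

```python
import re
from collections import defaultdict

# UA codes look like "UA-<account number>-<property index>".
UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")


def split_ua_code(code):
    """Split a UA code into (account id, property index). Hypothetical helper."""
    match = UA_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a UA code: {code!r}")
    account_number, property_index = match.groups()
    return f"UA-{account_number}", int(property_index)


def group_sites_by_account(results):
    """Map each analytics account id to the set of URLs that ever used it.

    Assumes `results` is a list of dicts keyed by URL, shaped like the
    example output shown in the notebook.
    """
    accounts = defaultdict(set)
    for entry in results:
        for url, codes in entry.items():
            for code in codes.get("archived_UA_codes", {}):
                account, _ = split_ua_code(code)
                accounts[account].add(url)
    return dict(accounts)


# Data copied from the notebook's example output for yapatriot.ru.
example_results = [{
    "https://yapatriot.ru": {
        "archived_UA_codes": {
            "UA-65087228-1": {
                "first_seen": "20/01/2017:03:55",
                "last_seen": "30/06/2019:05:32",
            },
            "UA-53176102-14": {
                "first_seen": "15/06/2015:19:36",
                "last_seen": "15/06/2015:19:36",
            },
        },
        "archived_GA_codes": {},
        "archived_GTM_codes": {},
    }
}]

print(split_ua_code("UA-53176102-14"))
print(group_sites_by_account(example_results))
```

With results for several sites, any account id that maps to more than one URL is a candidate shared-owner link worth checking by hand.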
