Commit
new notebook: wayback-google-analyics
Showing 3 changed files with 148 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
Pipfile* | ||
|
||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Wayback Google Analytics\n", | ||
"Gather historical Google Analytics data (UA, GA, and GTM codes) from a collection of website URLs.\n", | ||
"\n", | ||
"Read [more about the tool and Google Analytics codes](https://github.com/bellingcat/wayback-google-analytics/blob/main/README.md#about-the-project).\n", | ||
"\n", | ||
"#### [Read the article on bellingcat.com](https://www.bellingcat.com/resources/2024/01/09/using-the-wayback-machine-and-google-analytics-to-uncover-disinformation-networks/)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Step 1 - Install the Python package" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "bat" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install wayback-google-analytics" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Step 2 - Get a full command line description by calling `--help`\n", | ||
"This shows every command line option available and what each of them does." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "bat" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!wayback-google-analytics --help" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Step 3 - Call the tool\n", | ||
"Using the options from above, let's call the CLI to get all Google Analytics IDs from 4 different websites:\n", | ||
"- `https://yapatriot.ru`\n", | ||
"- `https://zanogu.com`\n", | ||
"- `https://whoswho.com.ua`\n", | ||
"- `https://adamants.ru`\n", | ||
"\n", | ||
"The search starts on January 1st, 2015 (`-s 01/01/2015`), checks for changes at a yearly frequency (`-f yearly`), and saves the results to an Excel file (`-o xlsx`).\n", | ||
"\n", | ||
"Progress is printed to the console as the tool runs, and the results appear on the last line of the console output. The Excel file (or csv, txt, or json file, if you change the output option) is written to the `output/` folder." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "bat" | ||
} | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!wayback-google-analytics -u https://yapatriot.ru https://zanogu.com https://whoswho.com.ua https://adamants.ru -s 01/01/2015 -f yearly -o xlsx" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here's an example output for one of the sites (`yapatriot.ru`), which has 2 historical UA codes: `UA-65087228-1`, the 1st ID assigned to the analytics account `UA-65087228`, and `UA-53176102-14`, the 14th ID assigned to the analytics account `UA-53176102`:\n", | ||
"```\n", | ||
"[{\n", | ||
"\t'https://yapatriot.ru': {\n", | ||
"\t\t'archived_UA_codes': {\n", | ||
"\t\t\t'UA-65087228-1': {\n", | ||
"\t\t\t\t'first_seen': '20/01/2017:03:55',\n", | ||
"\t\t\t\t'last_seen': '30/06/2019:05:32'\n", | ||
"\t\t\t},\n", | ||
"\t\t\t'UA-53176102-14': {\n", | ||
"\t\t\t\t'first_seen': '15/06/2015:19:36',\n", | ||
"\t\t\t\t'last_seen': '15/06/2015:19:36'\n", | ||
"\t\t\t}\n", | ||
"\t\t},\n", | ||
"\t\t'archived_GA_codes': {},\n", | ||
"\t\t'archived_GTM_codes': {}\n", | ||
"\t}\n", | ||
"}\n", | ||
"...\n", | ||
"]\n", | ||
"```\n", | ||
"You can then jump into other tools such as https://spyonweb.com/ and see that [`UA-53176102`](https://spyonweb.com/ua-53176102) is used on at least 4 other sites (`material-evidence.com`, `news-region.ru`, `syriainform.com`, `whoswho.com.ua`) that very likely belong to the same owner. Note that between 2016 and 2017 this site started using `UA-65087228` instead (this would be the code on the live website), a fresh UA ID that would not reveal the connection we uncovered with the historical UA code.\n", | ||
"\n", | ||
"That's a good example of how this tool can help investigations by digging through historical IDs. If you want to be even more thorough, you can change the frequency parameter to `-f daily` or even `-f hourly`, but that will make the tool run much slower." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "open-source-research-notebooks-4sg58OrJ", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.12" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
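The notebook's last cell reads each UA code as an account ID plus the index of the ID assigned within that account, and uses the shared account to link sites. The sketch below shows one way to do that grouping in Python. It assumes the results are available as a list of dicts shaped like the example output above, with dates in `dd/mm/YYYY:HH:MM` form. The names `split_ua_code` and `accounts_by_site` are hypothetical helpers written for this illustration and are not part of the `wayback-google-analytics` package.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: dates look like '20/01/2017:03:55', as in the example output.
DATE_FORMAT = "%d/%m/%Y:%H:%M"
UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")


def split_ua_code(code):
    """Split 'UA-53176102-14' into the account ID and the ID's index."""
    match = UA_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a UA code: {code!r}")
    return f"UA-{match.group(1)}", int(match.group(2))


def accounts_by_site(results):
    """Map each analytics account to the sites and date ranges where it was seen."""
    accounts = defaultdict(list)
    for entry in results:
        for url, codes in entry.items():
            for code, seen in codes.get("archived_UA_codes", {}).items():
                account, index = split_ua_code(code)
                first = datetime.strptime(seen["first_seen"], DATE_FORMAT)
                last = datetime.strptime(seen["last_seen"], DATE_FORMAT)
                accounts[account].append((url, index, first, last))
    return dict(accounts)


# Sample data copied from the example output in the notebook.
results = [{
    "https://yapatriot.ru": {
        "archived_UA_codes": {
            "UA-65087228-1": {
                "first_seen": "20/01/2017:03:55",
                "last_seen": "30/06/2019:05:32",
            },
            "UA-53176102-14": {
                "first_seen": "15/06/2015:19:36",
                "last_seen": "15/06/2015:19:36",
            },
        },
        "archived_GA_codes": {},
        "archived_GTM_codes": {},
    }
}]

for account, sightings in accounts_by_site(results).items():
    for url, index, first, last in sightings:
        print(account, index, url, first.date(), last.date())
```

An account that appears under more than one URL in the returned mapping is a candidate for the kind of shared-owner link described above. It is worth confirming such a link with an independent lookup before relying on it.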