A set of scripts that collects the links of the addons available at https://addons.mozilla.org and https://addons.thunderbird.net to facilitate backup jobs. It currently gets the following URLs for Android, Firefox, Thunderbird and SeaMonkey addons:
- Information page.
- Reviews (only the paginated listing pages, not individual reviews).
- Images (preview), CSS and JS.
- The .xpi files of all the versions.
- User page.

The following table shows the links that are skipped or modified because they are considered redundant or unnecessary:

| Type | Example | Operation |
|---|---|---|
| src param | addons.mozilla.org/en-US/firefox/addon/someaddon/?src=cb-dl-name | Remove parameter |
| Individual review | addons.mozilla.org/en-US/firefox/addon/someaddon/reviews/34368935 | Skip (reviews are grouped in /reviews/?page=) |
| User reviews | addons.mozilla.org/en-US/firefox/addon/someaddon/reviews/user:3626 | Skip (the same as above) |
| Add review | addons.mozilla.org/en-US/firefox/addon/someaddon/reviews/add | Skip (unnecessary) |
| type:attachment | addons.mozilla.org/firefox/downloads/file/609267/type:attachment/someaddon.xpi | Saved only if the same link without this segment has not already been discovered; otherwise skipped |
| RSS | addons.mozilla.org/en-US/firefox/addon/someaddon/reviews/format:rss | Skip (unnecessary) |
| # link | addons.mozilla.org/en-US/firefox/addon/someaddon**#something** | Remove the fragment (# and everything after it) |
| Lang codes | addons.mozilla.org/de/firefox/addon/someaddon | Replace the language code with en-US |
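
The rules in the table can be sketched as one normalization function. This is only an illustration, not the code used by urlgetter.py: the function name `normalize_url`, the `seen` set and both regular expressions are assumptions, and the locale pattern is a simplification that only recognizes a language code followed by one of the four application names.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Review sub-pages that the table marks as "Skip" (hypothetical pattern).
_SKIP_REVIEW = re.compile(r"/reviews/(\d+|user:\d+|add|format:rss)/?$")
# Leading language code such as /de/ or /pt-BR/, followed by an application name.
_LOCALE = re.compile(
    r"^/[a-z]{2,3}(?:-[A-Za-z]{2})?/(?=(?:firefox|android|thunderbird|seamonkey)/)"
)


def normalize_url(url, seen):
    """Return the cleaned URL, or None if the link should be skipped.

    `seen` is a set of already accepted URLs; accepted URLs are added to it.
    """
    scheme, netloc, path, query, _fragment = urlsplit(url)

    # Individual reviews, user reviews, "add review" and RSS links are skipped.
    if _SKIP_REVIEW.search(path):
        return None

    # Drop the src parameter but keep the others (for example page=).
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
              if k != "src"]
    new_query = urlencode(params)

    # Replace the language code with en-US.
    path = _LOCALE.sub("/en-US/", path, count=1)

    # A type:attachment link is kept only if the plain link is not known yet.
    if "/type:attachment/" in path:
        plain_path = path.replace("/type:attachment/", "/")
        plain = urlunsplit((scheme, netloc, plain_path, new_query, ""))
        if plain in seen:
            return None

    # Passing an empty fragment removes "#something".
    cleaned = urlunsplit((scheme, netloc, path, new_query, ""))
    seen.add(cleaned)
    return cleaned
```

The function expects absolute URLs including the scheme (`https://...`); the scheme-less examples in the table would need one prepended before being passed in.
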

Requirements:

- BeautifulSoup 4
- requests
- sqlite3 command-line tool
- Only tested with Python 2, but it should also work with Python 3.

Usage:

- Copy results-example.json to results.json.
- Create url-crc.sqlite database with this command:
sqlite3 url-crc.sqlite < url-crc.sql
- Run addonlister.py to get the main page of each extension.
- Once the previous script finishes running, run urlgetter.py.
- When urlgetter.py has finished traversing all the addon pages, the full list of collected links can be saved to a file with the following command:
sqlite3 url-crc.sqlite "select url from url_crc where (url like '%addons.cdn.mozilla.net/%' or url like '%addons.mozilla.org/%' or url like '%addons.thunderbird.net/%') and url not like '%outgoing.prod.mozaws.net%';" > url-list.txt
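
The same export can also be done from Python with the standard sqlite3 module, which may be convenient where the sqlite3 command-line tool is not available. This is a minimal sketch: the table name `url_crc` and the column `url` are taken from the command above, while the function names `iter_urls` and `export_urls` are hypothetical and not part of the scripts.

```python
import sqlite3

# Same filter as the sqlite3 command: keep Mozilla and Thunderbird addon
# hosts, drop links that go through the outgoing redirector.
QUERY = """
SELECT url FROM url_crc
WHERE (url LIKE '%addons.cdn.mozilla.net/%'
    OR url LIKE '%addons.mozilla.org/%'
    OR url LIKE '%addons.thunderbird.net/%')
  AND url NOT LIKE '%outgoing.prod.mozaws.net%'
"""


def iter_urls(conn):
    """Yield every URL selected by QUERY from an open connection."""
    for (url,) in conn.execute(QUERY):
        yield url


def export_urls(db_path, out_path):
    """Write the selected URLs to out_path, one per line; return the count."""
    conn = sqlite3.connect(db_path)
    try:
        count = 0
        with open(out_path, "w") as out:
            for url in iter_urls(conn):
                out.write(url + "\n")
                count += 1
        return count
    finally:
        conn.close()
```

Calling `export_urls("url-crc.sqlite", "url-list.txt")` should produce the same file as the command above.
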