# Favorite Icons Of Internet

An art project that aims to depict the vastness and colorfulness of the internet.

You can see the result of all the crawling and image-crunching at FavoriteIconsOfInternet.com.

Our current goal is to bring the project to a state where we can keep a history of daily favicon changes for at least a million web sites.


## Workers

This project uses the Phantom Of The Cloud image to launch workers for the parallelizable steps (3, 4, 6, and eventually 8); AWS auto-scaling groups can be used to speed up or slow down processing.

## Processing Steps

### Step 1. Load domains

Updates the list of domains in the database; currently it takes a list of Alexa rankings.

Runs on the central box. See `steps_1_and_2.sh`
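The Alexa list is distributed as `rank,domain` CSV rows. The actual loader is a PHP script (`step1.php`); the sketch below is a hypothetical Python illustration of the parsing this step performs:

```python
import csv
import io

def load_alexa_rows(csv_text, limit=1_000_000):
    """Parse Alexa-style `rank,domain` rows into (rank, domain) tuples."""
    rows = []
    for rank, domain in csv.reader(io.StringIO(csv_text)):
        rows.append((int(rank), domain.strip().lower()))
        if len(rows) >= limit:
            break
    return rows

sample = "1,google.com\n2,youtube.com\n3,facebook.com\n"
print(load_alexa_rows(sample))
# [(1, 'google.com'), (2, 'youtube.com'), (3, 'facebook.com')]
```

In the real pipeline these tuples would be upserted into the database rather than printed.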

### Step 2. Get a list of domains to crawl

Gets the list of domains to crawl (currently only active Alexa domains) and uploads them to a queue in chunks for crawlers to pick up.

Runs on the central box. See `steps_1_and_2.sh`
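The chunking itself is simple; a Python sketch of the idea (the chunk size of 100 is a hypothetical value, not the project's actual setting):

```python
def chunk_domains(domains, chunk_size=100):
    """Split the crawl list into fixed-size chunks, one queue message each."""
    return [domains[i:i + chunk_size] for i in range(0, len(domains), chunk_size)]

domains = [f"site{n}.example" for n in range(250)]
messages = chunk_domains(domains, chunk_size=100)
# 250 domains at 100 per message -> 3 messages (100, 100, 50)
```

Each resulting chunk would be serialized and sent as one queue message, so a crawler worker picks up a whole batch of domains at once.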

### Step 3. Fetch icons

Listens for messages in a queue and crawls the sites in each message, finding favicons and comparing them to the existing versions to see if they have changed.

Runs on crawler workers. See `steps_3_and_4.sh`
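The change check amounts to comparing a fetched icon against the last version stored for that domain. The real crawler lives in `crawlerd` and `geticons.pl`; a hypothetical Python sketch of hash-based change detection:

```python
import hashlib

def icon_changed(new_bytes, known_hashes, domain):
    """Return True if the fetched icon differs from the stored hash for the domain.

    `known_hashes` maps domain -> hex digest of the last seen icon;
    the entry is updated in place so subsequent calls see the new version.
    """
    digest = hashlib.sha256(new_bytes).hexdigest()
    changed = known_hashes.get(domain) != digest
    known_hashes[domain] = digest
    return changed

known = {}
first = icon_changed(b"\x00\x01icon-v1", known, "example.com")  # unseen -> True
same = icon_changed(b"\x00\x01icon-v1", known, "example.com")   # identical -> False
```

Hashing the raw bytes avoids storing or shipping unchanged icons downstream; only changed ones need to flow into step 4.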

### Step 4. Convert icons to PNG

After all icons are fetched, converts them to PNG, calculates the average color, and uploads the results to storage together with a manifest describing which icons are new, which have changed, and so on.

Runs on crawler workers. See `steps_3_and_4.sh`
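The conversion itself is handled by `anyimage2png`; the average-color part is just a per-channel mean over the decoded pixels. A minimal Python sketch, assuming the icon has already been decoded into RGB tuples:

```python
def average_color(pixels):
    """Average a non-empty sequence of (r, g, b) tuples channel-wise."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) // n
    g = sum(p[1] for p in pixels) // n
    b = sum(p[2] for p in pixels) // n
    return (r, g, b)

# A 2x1 "image": pure red next to pure blue averages to dark magenta.
print(average_color([(255, 0, 0), (0, 0, 255)]))  # (127, 0, 127)
```

The averaged color gives each site a single representative swatch, which is what makes the large tile mosaics readable at a distance.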

### Step 5. Calculate tiles to be updated

Gathers all the results and updates the database. Calculates the list of tiles that need to be updated (currently all tiles with a predefined width/height, ordered by Alexa ranking) and puts each tile as a job into a queue.

Generates the HTML and the necessary JSON metadata.

Runs on the central box. See `step5.sh`
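Since tiles have a predefined width/height and icons are laid out by Alexa ranking, each ranking maps deterministically to a tile and a position within it. A Python sketch of that mapping (the 32x32 tile dimensions are hypothetical, not the project's actual values):

```python
def tile_for_rank(rank, tile_w=32, tile_h=32):
    """Map a 1-based ranking to (tile_index, x, y) in a row-major grid of tiles."""
    per_tile = tile_w * tile_h
    idx = (rank - 1) // per_tile
    offset = (rank - 1) % per_tile
    return idx, offset % tile_w, offset // tile_w

print(tile_for_rank(1))     # (0, 0, 0)
print(tile_for_rank(1025))  # (1, 0, 0): first icon of the second 32x32 tile
```

With this mapping, deciding which tiles need regeneration is just the set of tile indices touched by the changed rankings; each index becomes one job in the queue.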

### Step 6. Generate tiles 🔴

Grabs the images required for the tile (or syncs them all) and generates the tile. Optimizes the image using `smu.sh` and deploys it to a CDN.

Runs on tile workers. TBD (To Be Developed)

### Step 7. Move HTML and metadata to production 🔴

Once all tiles are done, moves the HTML and metadata chunks over to production!

Runs on the central box. TBD (To Be Developed)

### Step 8. Send emails, daily reports, etc. 🔴

Notifies users (if any), sends the daily newsletter, etc.

Runs on the central box (and SMTP workers if the load is high). TBD (To Be Developed)
