A set of scripts that builds a single-page website collecting WODs (Workout of the Day) from a number of public sites.
See wodscrape.com.au for the production site.
How it works
- index.html is created from a template plus all sources that are available in sources/.
- Each source in sources/ is an individual scraper that accesses its specified website, scrapes the WOD and emits a block of HTML.
- Each scraped block is placed into its own file in the production directory.
- All files in the production directory are compressed, then pushed to an S3 bucket.
- On load, AJAX requests are used to load individual WODs.
- Loaded WODs are kept in a never-expire cookie on the client to maintain state between loads.
- Stats provided through Google Analytics.
In production this is run from cron(8) three times a day to cover a full day across the globe.
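The build steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the template text, the `build_index` and `write_production` names, the `data-src` attribute and the choice to gzip each file under its original name are all assumptions made for the example.

```python
import gzip
from pathlib import Path
from string import Template

# Hypothetical template; the real index template is not shown in this README.
INDEX_TEMPLATE = Template("<html><body>\n$placeholders\n</body></html>\n")


def build_index(source_names):
    """Render an index page with one empty placeholder per source.

    Each placeholder carries the file name the page would fetch on load.
    """
    placeholders = "\n".join(
        f'<div class="wod" data-src="{name}.html"></div>'
        for name in sorted(source_names)
    )
    return INDEX_TEMPLATE.substitute(placeholders=placeholders)


def write_production(blocks, outdir):
    """Write the index plus one gzip-compressed file per scraped block.

    `blocks` maps a source name to the HTML block its scraper produced.
    Returns the sorted list of file names written.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    files = {"index.html": build_index(blocks)}
    files.update({f"{name}.html": html for name, html in blocks.items()})
    for filename, html in files.items():
        # Assumption: files keep their name and are served with gzip encoding.
        (outdir / filename).write_bytes(gzip.compress(html.encode("utf-8")))
    return sorted(files)
```

Calling `write_production({"gym-a": "<p>...</p>"}, "production")` would leave a compressed index.html and gym-a.html in the production directory, ready to be pushed to the bucket by whatever upload step is in use.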
Required for deploy
- Makefile: requires DESTBUCKET
- sources/adsense-index.html: requires adsense identifier
- sources/analytics-app.js: requires analytics identifier
- sources/analytics-index.html: requires analytics identifier
- S3 bucket configured as website
- Appropriately set up s3cmd configuration
Run make scrape to create the index, scrape all sources and create the production directory.
Run make deploy to scrape and push to an S3 bucket.
Adding new sources
- Add the new source to a CSV file:
- First field is name of source
- Second field is URL to scrape
- Run util/gen-sources.sh against the CSV to generate one scraper per CSV line; the generated scrapers are placed into sources/.
- Edit each new source file and adjust the selector to pick the right portion of the page.
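As an illustration of the generation step, the sketch below produces one scraper stub per CSV line from a shared template. It is not a port of util/gen-sources.sh: the template contents, the `slugify` and `generate_scrapers` helpers, the `.py` extension and the default `body` selector are hypothetical and chosen only to show the shape of the process.

```python
import csv
import io
import re
from string import Template

# Hypothetical scraper template; the project's real template is not shown here.
SCRAPER_TEMPLATE = Template(
    "# Scraper for $name\n"
    "URL = '$url'\n"
    "SELECTOR = 'body'  # adjust by hand to pick the right portion of the page\n"
)


def slugify(name):
    """Turn a source name into a safe file stem, e.g. 'My Gym' -> 'my-gym'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def generate_scrapers(csv_text):
    """Return {filename: scraper source}, one entry per CSV line (name, url)."""
    scrapers = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, url = row[0].strip(), row[1].strip()
        filename = f"{slugify(name)}.py"
        scrapers[filename] = SCRAPER_TEMPLATE.substitute(name=name, url=url)
    return scrapers
```

Every generated stub starts with the same placeholder selector, which is why each new source file still needs a manual edit afterwards.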
- Needs a rewrite.
- Each scraper is generated from the same template. For sites that require a couple of clicks to reach the WOD, or anything else custom, the default template falls short.
- Doesn't handle sites that generate their content over AJAX (e.g. React sites).
- Let's go with MIT.