College de France audio scraper

This is a scraper for pages containing audio material from the College de France website.

Purpose

It doesn't download any audio files. Instead, it stores metadata about them (lesson title, lecturer, date, etc.) in Google's Datastore so that statistics and further data extraction can be done.
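As a rough illustration of the kind of record this implies, here is a minimal sketch in plain Python. The class name, the field names and the `audio_url` field are assumptions made for the example; only title, lecturer and date are named above, and the actual schema used by scraper.py may differ.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class LessonMetadata:
    """Metadata for one audio page; the audio itself is never stored."""
    title: str
    lecturer: str
    lesson_date: date
    audio_url: str  # hypothetical field: a link to the audio, not its content


def to_entity_properties(lesson: LessonMetadata) -> dict:
    """Flatten a record into plain properties suitable for a datastore entity."""
    props = asdict(lesson)
    # Keep the date as an ISO 8601 string so it is easy to sort and export.
    props["lesson_date"] = lesson.lesson_date.isoformat()
    return props
```

A dictionary like this could then be copied onto a Datastore entity, which keeps the storage step separate from the page parsing.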

How to run

For devs, follow the gist.

For an actual production run, a Dockerfile is provided. Build and run it with:

docker build -t scraper .
docker run scraper

You'll have to create your own project in Google Compute Engine and pass in your project ID and valid JSON service account credentials via environment variables.
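The exact variable names are not listed here, so the sketch below assumes `GOOGLE_CLOUD_PROJECT` for the project ID and `GOOGLE_APPLICATION_CREDENTIALS` for the path to the JSON key file. Check scraper.py and the Dockerfile for the names actually used. The `load_settings` helper is hypothetical and only shows how a missing variable could be caught before any scraping starts.

```python
import os
from typing import Mapping

# Assumed variable names; the scraper may read different ones.
PROJECT_VAR = "GOOGLE_CLOUD_PROJECT"
CREDENTIALS_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the project ID and credentials path, failing early if one is unset."""
    missing = [name for name in (PROJECT_VAR, CREDENTIALS_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "project_id": env[PROJECT_VAR],
        "credentials_path": env[CREDENTIALS_VAR],
    }
```

With Docker, environment variables can be set on the container with the `-e NAME=value` option of `docker run`.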

Output

Once you run it, you'll see a few thousand entities in your dashboard.
