Onderwijsscrapers

Features

  • Export scraped items to Elasticsearch (ES)
  • Export scraped items as JSON files to disk
  • Data sources/scrapers:
    • SchoolVenstersOnline
      • General information (name, address, BRIN, etc.)
      • Indicator 2 ("Resultaten - Slaagpercentage")
    • DUO
      • General branch information ("02. Adressen alle vestigingen")
      • Students per branch by ZIP code ("02. Leerlingen per vestiging naar postcode leerling en leerjaar")
    • Onderwijsinspectie
      • All Voortgezet Onderwijs (VO) schools
        • General information (name, address, BRIN, etc.)
        • Rating history (per branch and education structure)
        • Reports (per branch and education structure)
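
The JSON export listed above could be done with a small Scrapy-style item pipeline. The following is only a minimal sketch, not the repository's actual implementation: the class name `JsonFilePipeline`, the `export_dir` parameter, and the use of a `brin` field as the file name are all assumptions made for illustration.

```python
import json
import os


class JsonFilePipeline:
    """Hypothetical item pipeline that writes one JSON file per scraped item."""

    def __init__(self, export_dir="export"):
        # Assumed default; the real project may configure this elsewhere.
        self.export_dir = export_dir

    def process_item(self, item, spider):
        # Keep the output of each crawler in its own directory.
        target_dir = os.path.join(self.export_dir, spider.name)
        os.makedirs(target_dir, exist_ok=True)

        # Assumption: every item carries a 'brin' field that can serve
        # as a file name.
        path = os.path.join(target_dir, "%s.json" % item["brin"])
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dict(item), f, ensure_ascii=False, indent=2)

        # Return the item so that later pipelines still receive it.
        return item
```

Scrapy calls `process_item` once for every item a spider yields, so a pipeline like this only needs to be enabled in the project's `ITEM_PIPELINES` setting.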

Install and run

$ pip install -r requirements.txt
$ cd onderwijsscrapers/
$ scrapy crawl <crawler_name>

Available crawlers: vo.owinsp.nl, schoolvo.nl, data.duo.nl.
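
To run all three crawlers in one go, the command above can be wrapped in a short script. This is a sketch under assumptions: the helper names `crawl_command` and `run_all` are hypothetical, and the script is assumed to be started from the repository root with `scrapy` on the `PATH`.

```python
import subprocess

# Crawler names as listed in this README.
CRAWLERS = ("vo.owinsp.nl", "schoolvo.nl", "data.duo.nl")


def crawl_command(name):
    """Build the `scrapy crawl` command for a known crawler."""
    if name not in CRAWLERS:
        raise ValueError("unknown crawler: %r" % name)
    return ["scrapy", "crawl", name]


def run_all(cwd="onderwijsscrapers"):
    """Run every crawler in turn; stop at the first one that fails."""
    for name in CRAWLERS:
        # cwd mirrors the `cd onderwijsscrapers/` step (assumed layout).
        subprocess.run(crawl_command(name), cwd=cwd, check=True)


if __name__ == "__main__":
    run_all()
```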

Todo

  • Replace this README with proper documentation
  • Document the schemas of the different sources
  • Add datetime information to items when scraped
  • Add ES index schemas
  • SchoolVenstersOnline:
    • Extend the scraper to collect more data, prioritizing data that is not available through DUO
  • DUO
    • Add other VO data
    • Add Primair Onderwijs (PO) data
  • Onderwijsinspectie
    • Add PO data
    • Also scrape the reports on the education sector overview page (see, for example, "Feliceum"); these reports are not included on the detail pages of a specific education structure or branch
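
One possible approach to the "datetime information" item in the list above is a pipeline that stamps each item as it passes through. This is a hedged sketch, not existing project code: the class name `ScrapedAtPipeline` and the field name `scraped_at` are hypothetical.

```python
from datetime import datetime, timezone


class ScrapedAtPipeline:
    """Hypothetical pipeline that records when an item was scraped."""

    def process_item(self, item, spider):
        # Assumption: the item accepts a 'scraped_at' key. A scrapy.Item
        # subclass would need to declare this field first; a plain dict
        # works as-is.
        item["scraped_at"] = datetime.now(timezone.utc).isoformat()
        return item
```

Storing the timestamp in UTC as an ISO 8601 string keeps it unambiguous in the JSON files and, under Elasticsearch's default dynamic mapping, it should be detected as a date.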