This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

P2 Crawling: Office of Career, Technical and Adult Education #4

Closed
6 tasks done
Daniellappv opened this issue Feb 25, 2020 · 0 comments

Daniellappv commented Feb 25, 2020

Description: Scrape metadata for https://www2.ed.gov/about/offices/list/ovae/index.html

Acceptance criteria

  • We have a data dump with all the resource metadata we can get from the target site

Task-list:

  • Crawl the site
  • Perfect the crawling to reach as many resources as possible
  • Integrate with the existing pipeline rules (provide an HTML response for the parser)
  • Test run with a dummy parser; it should collect datasets and dump them into JSON files
  • Push the code once it checks all the above criteria
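
For illustration, the dummy-parser step could look roughly like the sketch below. It is not the pipeline's actual parser: `DummyParser`, `parse_page`, `dump_dataset`, and the extension list are hypothetical names and values, and it assumes the crawler hands over the page HTML as a string together with the page URL.

```python
import json
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin

# Hypothetical list of extensions treated as downloadable resources.
RESOURCE_EXTENSIONS = (".csv", ".xls", ".xlsx", ".pdf", ".zip", ".doc", ".docx")


class DummyParser(HTMLParser):
    """Collects links on one page that look like downloadable resources."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Match on the extension only, ignoring case.
        if href and href.lower().endswith(RESOURCE_EXTENSIONS):
            # Resolve relative links against the URL of the crawled page.
            self.resources.append({"url": urljoin(self.base_url, href)})


def parse_page(html, page_url):
    """Turn one HTML response into a dataset-like dict."""
    parser = DummyParser(page_url)
    parser.feed(html)
    return {"source_url": page_url, "resources": parser.resources}


def dump_dataset(dataset, out_dir, name):
    """Write one dataset dict to <out_dir>/<name>.json and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.json"
    path.write_text(json.dumps(dataset, indent=2), encoding="utf-8")
    return path
```

A test run would call `parse_page` on each HTML response the crawler yields and pass the result to `dump_dataset`, so every crawled page with resources ends up as one JSON file.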

Jira card: https://open-data-ed.atlassian.net/browse/OD-500

nightsh changed the title from "P2 Scraping: Office of Career, Technical and Adult Education" to "P2 Crawling: Office of Career, Technical and Adult Education" Mar 2, 2020
higorspinto added a commit that referenced this issue Mar 6, 2020
nightsh closed this as completed Mar 11, 2020
3 participants