Feature/committee doc main #202

Merged: 9 commits, Mar 9, 2021

ayeshamk (Collaborator) commented Mar 4, 2021

No description provided.

@ayeshamk ayeshamk requested review from aih and leedavidr March 4, 2021 18:17
scrapers/crec_scrapers/crec_detail_scraper.py: outdated, resolved
scrapers/crec_scrapers/crec_scrape_urls.py: outdated, resolved
scrapers/crec_scrapers/crec_detail_scraper.py: outdated, resolved
aih (Collaborator) commented Mar 4, 2021

It looks like the URLs scraper, crec_scrape_urls.py, starts from the 104th Congress. Is there a way to start it from a specific Congress, or even better, from a specific date? We'll do the initial scraping, but after that, updates should only need the more recent documents.

ayeshamk (Collaborator, Author) commented Mar 4, 2021

Currently, it loads all of the committee documents data. Do we want to load by Congress instead?

aih (Collaborator) commented Mar 4, 2021

Yes, or at least load from the most recent to the earliest. Otherwise, we have no way to update without reloading everything.
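A minimal sketch of the newest-first, incremental loading discussed above, in Python (the scrapers' language). It assumes a Congress number can be derived from a date (each Congress begins on January 3 of an odd year) and that scraped documents arrive newest first as dicts with a `date` field. `congress_for_date`, `congresses_newest_first`, `new_documents`, and `EARLIEST_CONGRESS` are hypothetical names for illustration, not part of crec_scrape_urls.py.

```python
from datetime import date
from typing import Dict, Iterable, Iterator, List

# Hypothetical constant: the existing URL scraper starts at the 104th Congress.
EARLIEST_CONGRESS = 104


def congress_for_date(d: date) -> int:
    """Approximate Congress number for a date.

    Assumes each Congress starts on January 3 of an odd year, so
    January 1-2 of an odd year still belong to the previous Congress.
    """
    number = (d.year - 1789) // 2 + 1
    if d.year % 2 == 1 and (d.month, d.day) < (1, 3):
        number -= 1
    return number


def congresses_newest_first(since: date, today: date) -> List[int]:
    """Congress numbers to visit, most recent first, back to the one containing `since`."""
    first = max(congress_for_date(since), EARLIEST_CONGRESS)
    last = congress_for_date(today)
    return list(range(last, first - 1, -1))


def new_documents(docs_newest_first: Iterable[Dict], last_loaded: date) -> Iterator[Dict]:
    """Yield documents dated after `last_loaded`, stopping at the first older one."""
    for doc in docs_newest_first:
        if doc["date"] <= last_loaded:
            # Input is newest first, so everything from here on was
            # already loaded by a previous run.
            break
        yield doc
```

Stopping at the first already-loaded document is what makes newest-first ordering useful: an update run only touches the newest part of the data instead of reloading everything from the 104th Congress onward.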

ayeshamk (Collaborator, Author) commented Mar 5, 2021

Ok, working on it. Opened issue: #204

@ayeshamk ayeshamk linked an issue Mar 5, 2021 that may be closed by this pull request
aih (Collaborator) commented Mar 9, 2021

Tested and working (two screenshots attached).

@aih aih merged commit 1fda406 into main Mar 9, 2021
@aih aih deleted the feature/committee_doc_main branch March 9, 2021 19:32
Development

Successfully merging this pull request may close these issues.

Modify Committee documents scraper
2 participants