bugfix/fix-houston-scraper #143
cdp_scrapers/tests/test_scrapers.py
Outdated
3,
31,
"https://houston.novusagenda.com/agendapublic/"
"CoverSheet.aspx?ItemID=27102&MeetingID=566",
"CoverSheet.aspx?ItemID=26951&MeetingID=565",
These are valid changes to test answers. The old expected values appear to be for the 11/15/22 event. The new answer is for the 11/08/22 event, which is earlier in the date range than 11/15/22.
Oh, and the extra event is for a "Childhood and Youth Committee" on 11/09/22 that, for whatever reason, was not scraped previously.
Codecov Report
Attention:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #143      +/-   ##
==========================================
+ Coverage   15.05%   15.20%   +0.14%
==========================================
  Files          21       21
  Lines        2179     2217      +38
==========================================
+ Hits          328      337       +9
- Misses       1851     1880      +29
except (AttributeError, IndexError):
    # Assuming event is a tr from search results
    # and that the first td contains the committee name
    cell_text = event.find("td").text.strip()
Could have used our str_simplified. Next time haha.
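For reference, a whitespace-simplifying helper of the kind mentioned in this comment might look like the sketch below. This is not the project's `str_simplified`; `simplify_whitespace` and `committee_name_from_row` are hypothetical names, the sample row is made up for illustration, and the standard library's `xml.etree.ElementTree` stands in for whatever HTML parser the scraper actually uses.

```python
import re
import xml.etree.ElementTree as ET


def simplify_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def committee_name_from_row(row: ET.Element) -> str:
    """Return the cleaned text of the first <td> in a search-result row.

    Assumes, like the diff above, that the first cell holds the committee name.
    """
    first_cell = row.find("td")
    if first_cell is None:
        raise ValueError("row has no <td> cell")
    # itertext() also collects text nested inside child tags such as <a>.
    return simplify_whitespace("".join(first_cell.itertext()))


# Made-up sample row; real search results may be structured differently.
row = ET.fromstring(
    "<tr><td>\n  Childhood and   Youth\n Committee </td><td>11/09/2022</td></tr>"
)
name = committee_name_from_row(row)
```

Compared with a bare `.strip()`, this also normalizes line breaks and repeated spaces inside the cell text, which is presumably why the shared helper would have been preferred here.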
Merging this to fix all tests.
Link to Relevant Issue
https://github.com/CouncilDataProject/cdp-scrapers/actions/runs/6150355264/job/16688128572#step:6:223
Description of Changes
The main web page for Houston has changed enough since we last worked on it to break the scraper.
Instead of parsing elements for some date/year on that web page, we now issue a search query for each date in the date range passed to get_events. The search results web page is similar enough that existing code can be used without significant changes.
The scraper can also handle events whose agendas are attached as a PDF rather than as web page HTML.
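The one-query-per-date approach described here could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `iter_dates` and `build_search_url` are hypothetical helpers, and `SEARCH_URL` and the `date` query parameter are placeholders, since the real search endpoint and its parameter names are not shown in this conversation.

```python
from datetime import date, datetime, timedelta
from typing import Iterator
from urllib.parse import urlencode

# Placeholder only; the actual search endpoint is not shown in this PR.
SEARCH_URL = "https://example.invalid/search"


def iter_dates(begin: datetime, end: datetime) -> Iterator[date]:
    """Yield every calendar date from begin to end, inclusive."""
    current = begin.date()
    last = end.date()
    while current <= last:
        yield current
        current += timedelta(days=1)


def build_search_url(day: date) -> str:
    """Build one search query URL for a single date (parameter name is assumed)."""
    query = urlencode({"date": day.strftime("%m/%d/%Y")})
    return f"{SEARCH_URL}?{query}"


# One query per date in the requested range.
urls = [
    build_search_url(day)
    for day in iter_dates(datetime(2022, 11, 8), datetime(2022, 11, 9))
]
```

Each resulting URL would then be fetched and its results page handed to the existing row-parsing code. Querying day by day issues more requests than parsing a single listing page, but it no longer depends on the layout of the main page.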