bugfix/fix-houston-scraper #143

Merged: 6 commits, Dec 18, 2023

@dphoria dphoria commented Sep 18, 2023

Link to Relevant Issue

https://github.com/CouncilDataProject/cdp-scrapers/actions/runs/6150355264/job/16688128572#step:6:223

Description of Changes

The main web page for Houston has changed enough since we last worked on it to break the scraper.

  • Instead of parsing that web page for elements matching some date/year, we now submit a search query for each date in the date range passed to get_events. The search results page is similar enough to the old page that the existing parsing code can be reused without significant changes.

  • Handle events whose agendas are attached as PDFs rather than as HTML web pages.
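
For reviewers, here is a rough sketch of the two ideas above; it is not the code in this PR. It walks each date in a range, builds one search URL per date, and checks whether an agenda link points at a PDF. The base URL comes from the test data in this PR. The function names and the `start_date`/`end_date` query parameters are hypothetical placeholders, not the site's actual search interface, and detecting a PDF by file extension is an assumption.

```python
from datetime import date, datetime, timedelta
from typing import Iterator
from urllib.parse import urlencode, urlparse

# Base URL as it appears in the test data of this PR.
BASE_URL = "https://houston.novusagenda.com/agendapublic/"


def dates_in_range(begin: datetime, end: datetime) -> Iterator[date]:
    """Yield every calendar date from begin to end, inclusive."""
    day = begin.date()
    while day <= end.date():
        yield day
        day += timedelta(days=1)


def search_url_for(day: date) -> str:
    """Build a search URL that covers a single date.

    ASSUMPTION: "start_date" and "end_date" are hypothetical parameter
    names used for illustration only.
    """
    formatted = day.strftime("%m/%d/%Y")
    query = urlencode({"start_date": formatted, "end_date": formatted})
    return f"{BASE_URL}?{query}"


def is_pdf_link(url: str) -> bool:
    """Guess whether an agenda link is a PDF from its path extension.

    ASSUMPTION: PDF agendas can be recognized by a ".pdf" suffix.
    """
    return urlparse(url).path.lower().endswith(".pdf")


# One search URL per date in the requested range.
urls = [
    search_url_for(day)
    for day in dates_in_range(datetime(2022, 11, 8), datetime(2022, 11, 15))
]
```

Querying one date at a time trades a few extra requests for a page layout that the existing parser already understands.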

@dphoria dphoria added bug Something isn't working enhancement New feature or request labels Sep 18, 2023
@dphoria dphoria self-assigned this Sep 18, 2023
Comment on lines 27 to 30
3,
31,
"https://houston.novusagenda.com/agendapublic/"
"CoverSheet.aspx?ItemID=27102&MeetingID=566",
"CoverSheet.aspx?ItemID=26951&MeetingID=565",

These changes to the expected test values are valid. The old expected values appear to belong to the 11/15/22 event; the new values are for the 11/08/22 event, which falls earlier in the date range.

Oh, and the extra event is for a "Childhood and Youth Committee" on 11/09/22 that, for whatever reason, was not scraped previously.

codecov bot commented Sep 18, 2023

Codecov Report

Attention: 49 lines in your changes are missing coverage. Please review.

Comparison is base (80400a4) 15.05% compared to head (579a408) 15.20%.

Files                              Patch %  Lines
cdp_scrapers/instances/houston.py  0.00%    49 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #143      +/-   ##
==========================================
+ Coverage   15.05%   15.20%   +0.14%     
==========================================
  Files          21       21              
  Lines        2179     2217      +38     
==========================================
+ Hits          328      337       +9     
- Misses       1851     1880      +29     

dphoria commented Sep 18, 2023

@Shak2000

@dphoria dphoria marked this pull request as ready for review September 18, 2023 03:27
except (AttributeError, IndexError):
    # Assuming event is a tr from search results
    # and that the first td contains the committee name
    cell_text = event.find("td").text.strip()

Could have used our str_simplified. Next time haha.
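
To illustrate why a simplifying helper might be preferable to a bare `.strip()` here: `.strip()` only trims the ends of the cell text, so line breaks or runs of spaces inside a committee name survive. The sketch below uses `simplify_text`, a hypothetical stand-in written for this example; the actual behavior of the project's str_simplified is not shown in this PR and may differ.

```python
import re


def simplify_text(text: str) -> str:
    """Collapse every run of whitespace into one space and trim the ends.

    ASSUMPTION: hypothetical stand-in, not the project's str_simplified.
    """
    return re.sub(r"\s+", " ", text).strip()


# Cell text as it might come out of an HTML table cell, with stray whitespace.
raw_cell_text = "\n   Childhood and   Youth\tCommittee \n"

stripped_only = raw_cell_text.strip()      # interior whitespace is kept
simplified = simplify_text(raw_cell_text)  # interior whitespace is collapsed

print(repr(stripped_only))  # 'Childhood and   Youth\tCommittee'
print(repr(simplified))     # 'Childhood and Youth Committee'
```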

dphoria commented Dec 18, 2023

Merging this to fix all tests.

@dphoria dphoria merged commit 167612f into CouncilDataProject:main Dec 18, 2023
8 checks passed
@dphoria dphoria deleted the fix-houston branch December 19, 2023 01:55