Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knox County TN Scraper/Notifier #50

Merged
merged 3 commits into from
Jun 22, 2017
Merged

Conversation

Rigdon
Copy link
Collaborator

@Rigdon Rigdon commented Jun 21, 2017

  • Add notifier for Knox Co
  • Add Scraper for Knox Co
  • Move DB session configuration to common.py for tests
  • Add tests for new scraper

* Add Scraper for Knox Co
* Move DB sessions configuratio to `common.py` for tests
* Add tests for new scraper
@Rigdon
Copy link
Collaborator Author

Rigdon commented Jun 21, 2017

The doc name format isn't exactly as described in the story mainly because I was able to reuse some existing code. Instead of

June 28, 2017 (BZA Agenda)

it'll be

June 28, 2017: BZA Agenda

I can change the format though if desired.

However having just the date be the link may be a little more problematic without some changes to the Document class itself.

@anaulin
Copy link
Collaborator

anaulin commented Jun 21, 2017

Thanks for the PR and the notes, @Rigdon. I'll have a look at this probably ~tomorrow (Thursday), since I have a full plate today.

@klertmen feel free to chime in if you have the bandwidth to review and merge this.

@klertmen
Copy link
Contributor

@anaulin - this looks good to me.

@Rigdon - thanks for working on this, and for the quick turnaround! Let us know if you have any suggestions for re-organizing the code to make it easier to add new sites. Do you think we should use the 'responses' library you added in the other test cases too?

@Rigdon
Copy link
Collaborator Author

Rigdon commented Jun 22, 2017 via email

KnoxCoTNAgendaScraper.MEETING_SCHEDULE_URL,
body=self.page_str,
status=200
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

try:
with session.begin_nested():
session.add(doc)
except IntegrityError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice way to only commit what is new. I am not familiar with begin_nested(), I'm wondering if you know if this causes a round-trip to the DB on each add? (Not that it matters, because in our case we usually don't have that many items any way, and this is all a cron job, so a few more roundtrips are not going to matter.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is going to be a separate roundtrip to the DB but it's likely not much (if at all) slower than doing the query/insert method since the data set is so small.

begin_nested() starts a savepoint in PostgreSQL so that you can work in smaller chunks within a larger transaction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks for explaining.

@anaulin
Copy link
Collaborator

anaulin commented Jun 22, 2017

This looks great, works fine, etc. I'm merging it.

Thanks again, @Rigdon !

@anaulin anaulin merged commit c3214a5 into RagtagOpen:master Jun 22, 2017
@Rigdon Rigdon deleted the knox-tn-scraper branch June 26, 2017 03:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants