Knox County TN Scraper/Notifier #50
* Add scraper for Knox Co
* Move DB sessions configuration to `common.py` for tests
* Add tests for new scraper
The doc name format isn't exactly as described in the story, mainly because I was able to reuse some existing code, so it won't be `June 28, 2017 (BZA Agenda)`. I can change the format if desired, though. However, having just the date be the link may be a little more problematic without some changes to the Document class itself.
I added it for that test so that I could test the `scrape` method fully.
Responses is a great library for this type of thing where you already have
consistent examples of what your responses will generally look like.
Unrelated to bidwire, it's also great for testing API integrations.
On Wed, Jun 21, 2017, 9:03 PM klertmen wrote:
@anaulin <https://github.com/anaulin> - this looks good to me.
@Rigdon <https://github.com/rigdon> - thanks for working on this, and for
the quick turnaround! Let us know if you have any suggestions for
re-organizing the code to make it easier to add new sites. Do you think we
should use the 'responses' library you added in the other test cases too?
--
- Ryan Rigdon
    KnoxCoTNAgendaScraper.MEETING_SCHEDULE_URL,
    body=self.page_str,
    status=200
)
Very nice.
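For readers unfamiliar with the pattern, the fragment above registers a canned page body so that `scrape` can be exercised without touching the network. The following is a rough, self-contained sketch of that idea, not the PR's actual test: it swaps in the standard library's `unittest.mock` and `urllib` in place of the `responses` library so that it runs without extra dependencies. The URL, the HTML, and both function names are hypothetical.

```python
import io
import re
import urllib.request
from unittest import mock

# Hypothetical URL and markup, invented for this sketch.
SCHEDULE_URL = "https://example.invalid/meeting-schedule"
PAGE_STR = """
<ul>
  <li><a href="/agendas/2017-06-28.pdf">June 28, 2017</a></li>
  <li><a href="/agendas/2017-07-26.pdf">July 26, 2017</a></li>
</ul>
"""


def scrape(url):
    """Fetch the page at `url` and return (link text, href) pairs."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8")
    pairs = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
    return [(text, href) for href, text in pairs]


def scrape_with_canned_page():
    """Run scrape() against PAGE_STR instead of the real site."""
    fake_response = io.BytesIO(PAGE_STR.encode("utf-8"))
    # Replace urlopen for the duration of the call so no request is sent.
    with mock.patch("urllib.request.urlopen",
                    return_value=fake_response) as fake_urlopen:
        results = scrape(SCHEDULE_URL)
    # The scraper should have asked for exactly the schedule URL.
    fake_urlopen.assert_called_once_with(SCHEDULE_URL)
    return results
```

Because the page body is fixed, the test can assert on the exact documents the scraper extracts, which is what makes it possible to cover the whole `scrape` path.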
try:
    with session.begin_nested():
        session.add(doc)
except IntegrityError:
This is a nice way to only commit what is new. I am not familiar with `begin_nested()`; do you know if this causes a round-trip to the DB on each `add`? (Not that it matters, because in our case we usually don't have that many items anyway, and this is all a cron job, so a few more round-trips are not going to matter.)
It is going to be a separate round-trip to the DB, but it's likely not much (if at all) slower than doing the query/insert method since the data set is so small. `begin_nested()` starts a savepoint in PostgreSQL so that you can work in smaller chunks within a larger transaction.
Cool. Thanks for explaining.
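The savepoint behaviour discussed in this thread can be seen without SQLAlchemy or PostgreSQL. Below is a minimal sketch, assuming a hypothetical `documents` table with a unique `url` column; it swaps in the standard library's `sqlite3` and explicit `SAVEPOINT` statements in place of `session.begin_nested()`, so it illustrates the mechanism rather than the PR's code. The function name `add_new_documents` is also made up for the example.

```python
import sqlite3


def add_new_documents(conn, urls):
    """Insert each URL inside its own savepoint; return the URLs that were new.

    `conn` must be in autocommit mode (isolation_level=None) so that the
    transaction boundaries below are the only ones in play.
    """
    added = []
    conn.execute("BEGIN")  # one outer transaction for the whole batch
    for url in urls:
        conn.execute("SAVEPOINT add_doc")  # one extra statement per document
        try:
            conn.execute("INSERT INTO documents (url) VALUES (?)", (url,))
        except sqlite3.IntegrityError:
            # Duplicate: undo only this insert, keep the earlier ones.
            conn.execute("ROLLBACK TO SAVEPOINT add_doc")
        else:
            added.append(url)
        conn.execute("RELEASE SAVEPOINT add_doc")
    conn.execute("COMMIT")
    return added


def make_connection():
    """Create an in-memory database with the hypothetical documents table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE documents (url TEXT PRIMARY KEY)")
    return conn
```

Each document costs a `SAVEPOINT` and a `RELEASE` on top of the `INSERT`, which is the extra round-trip mentioned above; in exchange, a duplicate only rolls back its own insert instead of the whole batch.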
This looks great, works fine, etc. I'm merging it. Thanks again, @Rigdon!