Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uses priority queueing when updating prod, adds database and checksum optimizations #145

Merged
merged 44 commits into from
Apr 30, 2021

Conversation

spacemansteve
Copy link
Contributor

removes unused tweak code to modify solr records
use upsert to update metrics database
general code clean for pep8 compliance

removes unused tweak code to modify solr records
general code clean for pep8 compliance
Copy link
Contributor

@marblestation marblestation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good! I have some suggestions, maybe the biggest one is transforming the status field into three independent ones so that we do not overwrite that information (now that things will happen in parallel, chances are higher). Let me know what you think!

run.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
renamed config variable from update_timestamps to set_processed_timestamp
because it controlled setting the processed timestamps set when we
update a production data store, not the update timestamps that track
when data is received from other pipelines
Copy link
Contributor

@marblestation marblestation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second closer inspection, I found issues with the priorities (they are not setting any priority, actually), bugs (changes in behavior with respect to what we have in HEAD), and code that can be deduplicated and simplified. It was easier for me to reason by modifying code to see if what I had in mind made sense and could be done, it looks like it is possible. See #148, that PR illustrates what I have in mind. I did not test anything in that PR, it is just for us to talk and maybe incorporate the patterns and changes if they are indeed valid.

adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
SpacemanSteve added 5 commits January 28, 2021 10:39
removes unused tweak code to modify solr records
general code clean for pep8 compliance
renamed config variable from update_timestamps to set_processed_timestamp
because it controlled setting the processed timestamps set when we
update a production data store, not the update timestamps that track
when data is received from other pipelines
Copy link
Contributor

@marblestation marblestation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested some more tweaks. Also, it is hard to make run.py consistent with so many argument flags, but let's see if these suggestion make the --update-processed more consistent.

adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
adsmp/app.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
run.py Outdated Show resolved Hide resolved
scripts/reindex.py Outdated Show resolved Hide resolved
@coveralls
Copy link

Coverage Status

Coverage increased (+3.01%) to 76.186% when pulling 8f6dc0e on spacemansteve:PR145conflicts into 6686aab on adsabs:master.

@spacemansteve spacemansteve changed the title adds specialized queues to update prod data stores uses priority queueing when updating prod, adds database and checksum optimizations Feb 16, 2021
Copy link
Contributor

@marblestation marblestation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one major behavior change (i.e., always update the processed field) and one small request (sys.exit(1) if reindex fails). The former is key, otherwise we will not index only the records that need to be indexed.

EDIT: And another logger.error to exception request too!

scripts/reindex.py Outdated Show resolved Hide resolved
self.logger.exception('Failed posting individual bibcode %s to metrics', failed_bibcode)
failed_bibcodes.append(failed_bibcode)
if failed_bibcodes and update_processed:
self.mark_processed(failed_bibcodes, checksums=None, type='metrics', status='metrics-failed')
except Exception as e:
trans.rollback()
self.logger.error('DB failure: %s', e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use self.logger.exception instead of self.logger.error.

adsmp/app.py Outdated Show resolved Hide resolved
SpacemanSteve added 3 commits February 22, 2021 14:08
and sys.exit on reindex fail
and added sys.exit on reindex fail
fix mock call to datetime, previously it returned a mock not a datetime
Copy link
Contributor

@marblestation marblestation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are ready to go! There is only one minor forgotten change that was not addressed, where I was asking to change:

self.logger.error('DB failure: %s', e)

with:

self.logger.exception('DB failure') 

But apart from that, we should be good to merge. Thanks for all the work done here!

@spacemansteve spacemansteve merged commit 1893ea1 into adsabs:master Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants