uses priority queueing when updating prod, adds database and checksum optimizations #145

spacemansteve · 2020-12-15T15:27:13Z

removes unused tweak code that modified solr records
uses upsert to update the metrics database
general code cleanup for PEP 8 compliance
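
For readers unfamiliar with the upsert pattern mentioned above, here is a minimal sketch. It is not the code from this PR: it uses the standard-library sqlite3 module (SQLite 3.24 or newer is assumed for ON CONFLICT support) as a stand-in for the real metrics database, and the table name, column names, and bibcode are hypothetical.

```python
import sqlite3


def upsert_metrics(conn, bibcode, citation_num):
    """Insert a metrics row, or update it in place if the bibcode already exists."""
    conn.execute(
        """
        INSERT INTO metrics (bibcode, citation_num)
        VALUES (?, ?)
        ON CONFLICT(bibcode) DO UPDATE SET citation_num = excluded.citation_num
        """,
        (bibcode, citation_num),
    )


# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (bibcode TEXT PRIMARY KEY, citation_num INTEGER)")

# The second call hits the primary-key conflict and updates the existing row.
upsert_metrics(conn, "2020ApJ...000..001A", 3)
upsert_metrics(conn, "2020ApJ...000..001A", 5)

rows = conn.execute("SELECT bibcode, citation_num FROM metrics").fetchall()
```

The point of the pattern is that one statement covers both the insert and the update case, so the caller does not need a separate existence query first.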

marblestation

This looks very good! I have some suggestions. The biggest one is probably splitting the status field into three independent fields so that we do not overwrite that information (now that updates will happen in parallel, the chances of an overwrite are higher). Let me know what you think!
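
A rough sketch of the suggestion above, assuming one status field per production data store. The class, the field names, the store names, and the 'success' value are all hypothetical; only the 'metrics-failed' value appears in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordStatus:
    # One field per data store (names are assumptions), so that concurrent
    # updates to different stores do not overwrite each other's result.
    solr_status: Optional[str] = None
    metrics_status: Optional[str] = None
    links_status: Optional[str] = None


def set_status(record, store, status):
    """Set the status for a single store, leaving the other fields untouched."""
    field_name = store + "_status"
    if not hasattr(record, field_name):
        raise ValueError("unknown data store: %s" % store)
    setattr(record, field_name, status)


record = RecordStatus()
set_status(record, "metrics", "metrics-failed")
set_status(record, "solr", "success")  # does not clobber the metrics failure
```

With a single shared status field, the second call would have replaced 'metrics-failed', which is the overwrite the comment warns about.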

run.py

adsmp/app.py

renamed config variable from update_timestamps to set_processed_timestamp because it controls whether the processed timestamps are set when we update a production data store, not the update timestamps that track when data is received from other pipelines

marblestation

After a second, closer inspection, I found issues with the priorities (they are not actually setting any priority), bugs (changes in behavior with respect to what we have in HEAD), and code that can be deduplicated and simplified. It was easier for me to reason by modifying the code to see whether what I had in mind made sense and could be done, and it looks like it is possible. See #148; that PR illustrates what I have in mind. I did not test anything in that PR. It is just for us to talk about, and maybe to incorporate the patterns and changes if they are indeed valid.
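
To illustrate the point about priorities, here is a minimal sketch of priority queueing using the standard-library queue.PriorityQueue as a stand-in. It does not reproduce the pipeline's actual message queue, and the function names and bibcode strings are hypothetical. The idea it shows is that a priority only has an effect if it is attached to each message when it is enqueued.

```python
import itertools
import queue

# Monotonic counter used as a tie-breaker so equal priorities stay first-in, first-out.
_sequence = itertools.count()


def enqueue(q, bibcode, priority=0):
    """Queue a bibcode; a larger priority value is served earlier."""
    # PriorityQueue returns the smallest item first, so the priority is negated.
    q.put((-priority, next(_sequence), bibcode))


def drain(q):
    """Return all queued bibcodes in the order they would be processed."""
    processed = []
    while not q.empty():
        _, _, bibcode = q.get()
        processed.append(bibcode)
    return processed


q = queue.PriorityQueue()
enqueue(q, "routine-1")              # default priority
enqueue(q, "urgent", priority=10)    # explicitly prioritized
enqueue(q, "routine-2")

order = drain(q)
```

If every message is queued with the default priority, the queue degrades to plain first-in, first-out ordering, which is the situation the review describes.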

adsmp/app.py

run.py

marblestation

I suggested some more tweaks. Also, it is hard to make run.py consistent with so many argument flags, but let's see if these suggestions make --update-processed more consistent.

adsmp/app.py

run.py

scripts/reindex.py

coveralls · 2021-02-16T15:54:47Z

Coverage increased (+3.01%) to 76.186% when pulling 8f6dc0e on spacemansteve:PR145conflicts into 6686aab on adsabs:master.

marblestation

I have one major behavior change (i.e., always update the processed field) and one small request (sys.exit(1) if reindex fails). The former is key, otherwise we will not index only the records that need to be indexed.

EDIT: One more request: change another logger.error call to logger.exception!
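
As an illustration of the sys.exit(1) request, a minimal sketch follows. The functions reindex_step and main are hypothetical stand-ins and are not taken from scripts/reindex.py.

```python
import sys


def reindex_step():
    """Hypothetical stand-in for the real reindex work; always fails here."""
    raise RuntimeError("solr rebuild did not complete")


def main(step=reindex_step):
    """Run the reindex step and exit with a non-zero status if it fails."""
    try:
        step()
    except Exception as exc:
        print("reindex failed: %s" % exc, file=sys.stderr)
        # A non-zero exit code lets cron jobs or CI detect the failure.
        sys.exit(1)
```

Calling main() with the failing step raises SystemExit with code 1, while a step that succeeds returns normally and the process exits with status 0.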

scripts/reindex.py

marblestation · 2021-02-22T13:13:29Z

adsmp/app.py

+                            self.logger.exception('Failed posting individual bibcode %s to metrics', failed_bibcode)
+                            failed_bibcodes.append(failed_bibcode)
+                    if failed_bibcodes and update_processed:
+                        self.mark_processed(failed_bibcodes, checksums=None, type='metrics', status='metrics-failed')
                except Exception as e:
                    trans.rollback()
                    self.logger.error('DB failure: %s', e)


Use self.logger.exception instead of self.logger.error.
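
A small standard-library sketch of the difference behind this request. The logger name and the simulated error are hypothetical; the two logging calls mirror the ones discussed in this review.

```python
import io
import logging

# Capture log output in memory so the two calls can be compared.
stream = io.StringIO()
logger = logging.getLogger("sketch")
logger.setLevel(logging.ERROR)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

try:
    raise ValueError("connection reset")
except Exception as e:
    # Logs only the formatted message; the traceback is lost.
    logger.error("DB failure: %s", e)
error_output = stream.getvalue()

# Reset the buffer before the second call.
stream.seek(0)
stream.truncate(0)

try:
    raise ValueError("connection reset")
except Exception:
    # Logs at ERROR level and appends the current exception's traceback.
    logger.exception("DB failure")
exception_output = stream.getvalue()
```

Both calls log at ERROR level; logger.exception additionally records the traceback, which is usually what is needed to diagnose a database failure after the fact.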

adsmp/app.py

marblestation

I think we are ready to go! Only one minor requested change was not addressed, where I had asked to replace:

ADSMasterPipeline/adsmp/app.py

Line 412 in e425021

self.logger.error('DB failure: %s', e)

with:

self.logger.exception('DB failure')

But apart from that, we should be good to merge. Thanks for all the work done here!

adds specialized queues to update prod data stores

a3573af

removes unused tweak code to modify solr records general code clean for pep8 compliance

spacemansteve requested a review from marblestation December 15, 2020 15:27

make priority a command line option

0d61491

marblestation suggested changes Dec 22, 2020

marblestation mentioned this pull request Jan 7, 2021

Ideas and suggestions for PR #145 #148

Closed

marblestation suggested changes Jan 7, 2021

SpacemanSteve added 5 commits January 28, 2021 10:39

adds specialized queues to update prod data stores

0262fef

removes unused tweak code to modify solr records general code clean for pep8 compliance

includes merge

d87aaca

changes missed in merge

fd24afb

improve and simplified unit tests for indexing

7ed0109

marblestation suggested changes Feb 9, 2021

SpacemanSteve and others added 11 commits February 10, 2021 15:15

changegs based on comments

b56bdf9

tweak test

908343e

updated python version used for travis

951818b

test travis build without enum class

fc2c325

revert minor python version

0cabb11

remove incompatible enum

1c3c68c

we can not use the enum library because it conflicts with enum34

include recent unit chanegs to test coverage versions

85ef514

in dev-requirements.txt

pick up additional changes for coveralls on travis

6e075cc

move workflows directory to right place

bd4d39a

file cleanup

509abe3

Merge branch 'master' into PR145conflicts

8f6dc0e

spacemansteve changed the title ~~adds specialized queues to update prod data stores~~ uses priority queueing when updating prod, adds database and checksum optimizations Feb 16, 2021

marblestation suggested changes Feb 22, 2021

SpacemanSteve added 3 commits February 22, 2021 14:08

clean up app.mark_processed

275cde0

and sys.exit on reindex fail

clean up app.mark_procesed

1300d82

and added sys.exit on reindex fail

fix reindex test

b6134a8

fix mock call to datetime, previously it returned a mock not a datetime

SpacemanSteve and others added 5 commits February 25, 2021 08:44

try to improve travis error reporting

35ba606

another try at getting log info

1b741ae

cleanup travis

c545d5a

fix reindex test code

1ef3db6

Merge branch 'master' into PR145conflicts

e425021

spacemansteve requested a review from marblestation February 28, 2021 18:27

marblestation approved these changes Mar 2, 2021

SpacemanSteve added 17 commits March 26, 2021 11:18

pickup latest changes for new solr

3cbad6f

improve --help message for solr collection

894bc88

fix bug in rebuild solr index

f9ccc80

fix bug in code that detected when solr index updating was complete also wait for queue writing to solr to empty

improve logging during index rebuild

8054912

change suggested by Roman in solr index rebuild

d052ed0

increase time for solr index rebuild

2c17a59

aff_raw not in solr

7c75646

pick up Roman's changes to remove aff_raw from master

merge with latest version of master

60770e1

fix bug for checksum error

b7c8a91

handle case when no metrics info is available

rebuild solr index now uses batch size parameter

958704f

make exception catch python 2/3 compatible

37f0ba2

fix conflicts

69930e7

merge changes in master branch to PR145conflicts

0b22082

pickup changed github workflow

7d45511

merege update

c0c6235

Merge branch 'PR145conflicts' of https://github.com/spacemansteve/ADS…

18b97d8

…MasterPipeline into PR145conflicts

more merge

da08833

spacemansteve merged commit 1893ea1 into adsabs:master Apr 30, 2021
