Skip to content

Conversation

@ehenneken
Copy link
Member

A possible speedup in the check in psql_session.py to see if a record already exists is to check just the essential data. The histograms can be added to the excluded_fields list, for example. Probably just a marginal speedup compared to the rest of the process.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 54.615% when pulling 8ee4a25 on ehenneken:github61 into caf33b6 on adsabs:master.

@jonnybazookatone
Copy link
Contributor

I might be wrong, but this should cycle through every bibcode with no skipping, regardless of it needing to be updated, so the speed should be the same.

Also, it looks like it'll touch every record https://github.com/adsabs/adsdata/blob/master/adsdata/psql_session.py#L64, regardless of whether it needs to be modified.

Finally, even though tests pass in this module, just a heads up that there is very limited testing on anything other than a few unit tests, and there have never been any functional tests on the code you're modifying. Not to scare you, just so there's no illusions of safety 😉.

@ehenneken
Copy link
Member Author

I think

https://github.com/adsabs/adsdata/blob/master/adsdata/psql_session.py#L61

will take care of records that did not change. In that case like 64 will never be reached. I think that the list excluded_fields can even be extended to include all histograms and some other fields; this should make the comparison in line 61 faster (I don't know if that means significantly faster).

@jonnybazookatone
Copy link
Contributor

Ah yeah, you're right.

@jonnybazookatone
Copy link
Contributor

Go ahead and merge whenever, I commited the changes that were left lying around.

@ehenneken ehenneken merged commit 6331e79 into adsabs:master Apr 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants