Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README.md #5

Merged
merged 28 commits into from
Nov 30, 2022
Merged

Update README.md #5

merged 28 commits into from
Nov 30, 2022

Conversation

YojanaGadiya
Copy link
Contributor

@YojanaGadiya YojanaGadiya commented Oct 13, 2022

Closes #4

This PR adds total compound counts via the SQLite dump, which implicitly documents for which versions SQLite is available.

Add ChEMBL version the code base has been tested on.
@cthoyt
Copy link
Owner

cthoyt commented Oct 13, 2022

Not sure a note like this is super valuable, here's how I'd go about this:

  1. Write a simple SQL query that extracts one or more high-level statistics that is likely to work even if the chembl database schema changed
  2. Write a python script that iterates from latest version of chembl back to 1
  3. Run the query with chembl_downloader.query() (https://github.com/cthoyt/chembl-downloader#run-a-query-and-get-a-pandas-dataframe)
  4. Build up table using tabulate and export the GitHub format that shows for each version of chembl if the query was possible, if not, where it failed, and if so, the results
  5. Include the script in the repo
  6. Optional: have this script automatically re-write the README. Otherwise manually copy-paste the resulting table

@YojanaGadiya
Copy link
Contributor Author

I see. Okay. It is going to take some time, but I will try and send a PR soon.

@YojanaGadiya
Copy link
Contributor Author

@cthoyt for the 1st point with regard to statistics, do you have anything specific in mind or should I just print the total no.of chemicals and then add it to the tabulate?

@cthoyt
Copy link
Owner

cthoyt commented Oct 21, 2022

Total number of chemicals sounds good

@YojanaGadiya
Copy link
Contributor Author

@cthoyt check out the commit. It writes to the README directly.. but I am sure you would want to error to be more cleaner.. any pointers for me?

README.md Outdated Show resolved Hide resolved
@cthoyt
Copy link
Owner

cthoyt commented Oct 25, 2022

Please run tox -e lint,flake8 and clean up the code then I can take another look.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
cthoyt added a commit that referenced this pull request Oct 26, 2022
Related to #4 and #5
@cthoyt
Copy link
Owner

cthoyt commented Oct 26, 2022

In fffd13c, I added the following functionality:

  1. Handle old archived versions of ChEBML
  2. Create a chart of all version release dates in chembl_downloader.history

README.md Outdated Show resolved Hide resolved
@cthoyt
Copy link
Owner

cthoyt commented Oct 26, 2022

Please don't worry about automatically re-writing the README, this is not so important

@cthoyt
Copy link
Owner

cthoyt commented Oct 26, 2022

@YojanaGadiya all you need to do now is re-run the script

@YojanaGadiya
Copy link
Contributor Author

@cthoyt I think the PR is ready to be merged. Please check if this is according to what you expected.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@YojanaGadiya
Copy link
Contributor Author

YojanaGadiya commented Nov 2, 2022

@cthoyt the table is now updated. Please check if this is how you planned.

README.md Outdated Show resolved Hide resolved
try:
path = chembl_downloader.download_extract_sqlite(version=version_name)
except Exception as e:
return version_name, 'No', str(e), '-'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better than str(e) is just to write that this version does not distribute SQLite, or even better, since this is tons of redundant information, leave the third column blank and put a note above the table.

Copy link
Owner

@cthoyt cthoyt Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also combine the last three columns since they're redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Will just have 2 columns then. With the version and the Yes/No running status as you mentioned.

@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2022

Codecov Report

Merging #5 (388a6c9) into main (ceda5e6) will decrease coverage by 1.62%.
The diff coverage is 3.57%.

@@            Coverage Diff             @@
##             main       #5      +/-   ##
==========================================
- Coverage   22.95%   21.32%   -1.63%     
==========================================
  Files           6        7       +1     
  Lines         305      333      +28     
  Branches       65       68       +3     
==========================================
+ Hits           70       71       +1     
- Misses        232      259      +27     
  Partials        3        3              
Impacted Files Coverage Δ
src/chembl_downloader/downloader_checker.py 0.00% <0.00%> (ø)
src/chembl_downloader/queries.py 60.86% <100.00%> (+1.77%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@YojanaGadiya
Copy link
Contributor Author

Does anything need to be done with respect to coverage?

@cthoyt
Copy link
Owner

cthoyt commented Nov 3, 2022

No don't worry about coverage but the rest are required

@YojanaGadiya
Copy link
Contributor Author

Please check the latest commits. Is this similar to what you expected?

@cthoyt
Copy link
Owner

cthoyt commented Nov 28, 2022

@YojanaGadiya better, but there's no need for a duplicate yes/no column at this point since there is either a number or a dash that says the same information.

@cthoyt
Copy link
Owner

cthoyt commented Nov 28, 2022

If you can fix that and pass tox, this is ready to merge.

@YojanaGadiya
Copy link
Contributor Author

@cthoyt I fixed the README and flake8 on my end. Please can you check/approve the workflow to confirm. Thank you.

@YojanaGadiya
Copy link
Contributor Author

Apologies for spamming. Hopefully the last commit to the PR. Please approve @cthoyt

@cthoyt
Copy link
Owner

cthoyt commented Nov 30, 2022

Note that before merging, I deduplicated a lot of code. And combine the new table with the existing one. Thanks @YojanaGadiya.

@cthoyt cthoyt merged commit 2894d5f into cthoyt:main Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repo status
3 participants