No PDB accessions matched and Retrieving no protein structure files #111

chagas98 · 2023-03-28T17:20:58Z

I tried to run the basic commands from the README and documentation. As my primary goal is to retrieve the PDB files, I started by creating a local database with:

cazy_webscraper <e-mail> --families GH -o GH.db

Next, I ran a command to get the PDB structures:

cw_get_pdb_structures GH.db --classes GH pdb

However, I got the following output:

Using default CAZy class synonyms
Applying CAZy class filter(s)
Retrieving GenBank accessions for selected CAZy classes:   0%| | 0/1 [00:00<?, ?Retrieving CAZymes for CAZy class GH
Retrieving GenBank accessions for selected CAZy classes: 100%|█| 1/1 [00:44<00:0
Retrieving GenBank accessions for selected CAZy families: 0it [00:00, ?it/s]
Applying no taxonomic filters
Loading existing PDB db records: 0it [00:00, ?it/s]
Loading existing Genbank_Pdbs db records: 0it [00:00, ?it/s]
No PDB accessions matched the criteria provided.
Retrieving no protein structure files from PDB

I ran different settings with PL, GH, and GT, and I got the same result.

My system configuration:

Linux 5.19.0-35-generic x86_64
Ubuntu 22.04.2 LTS
conda 23.1.0


HobnobMancer · 2023-03-28T17:53:38Z

Hi! Thanks for using cazy_webscraper.

After building the local CAZyme database with records downloaded from CAZy, did you retrieve the PDB accessions from UniProt using cazy_webscraper?

(Semi-shameless plug of our paper incoming ;) ) To summarise a chunk of the paper (where it's explained better): when building the local CAZyme database, cazy_webscraper parses data from a plain-text file dump that's available from CAZy. The text file contains only:

NCBI protein version accessions
taxonomic kingdoms
source organisms
CAZy family annotations

Therefore, the resulting database contains only that data. You can check this by querying the database with sqlite3; the following query returns nothing:

sqlite3 -header GH.db "SELECT * FROM Pdbs"
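The same check can be scripted. Below is a minimal Python sketch, assuming the database file and the Pdbs table name used in the query above. The helper name count_rows is hypothetical and is not part of cazy_webscraper.

```python
import sqlite3


def count_rows(db_path: str, table: str) -> int:
    """Return the number of rows in `table` of the SQLite database at `db_path`.

    Hypothetical helper for illustration; not part of cazy_webscraper.
    """
    # Table names cannot be bound as SQL parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    connection = sqlite3.connect(db_path)
    try:
        (n_rows,) = connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    finally:
        connection.close()
    return n_rows


# Example usage (file and table names taken from the commands in this thread):
#   n_pdbs = count_rows("GH.db", "Pdbs")
#   if n_pdbs == 0:
#       print("No PDB accessions stored yet")
```

A count of zero for Pdbs would be consistent with the "No PDB accessions matched the criteria provided." message reported above.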

cw_get_pdb_structures retrieves the structure files from the PDB database for PDB accessions that are in the local CAZyme database.

So you first need to populate the local CAZyme database with PDB accessions from UniProt, using the cw_get_uniprot_data command. Hence, the note in the documentation stating:

Note: PDB structure files are retrieved for the PDB accessions in a local CAZyme database created using cazy_webscraper.

I'll add an additional note to the documentation to make this clearer - I can see how it isn't obvious.


chagas98 · 2023-03-29T18:15:57Z

@HobnobMancer Thanks for the help with this dumb error! btw I realized that the paper explains this workflow very well. Sorry for that.

However, the cw_get_uniprot_data command gives me an error. In the file ~/.local/lib/python3.10/site-packages/bioservices/uniprot.py, the call batch = self.services.http_get(link, frmt="txt") returns an int where a string is expected. The only workaround I found was to replace the line batch = batch.split("\n")[1:] with batch = str(batch).split("\n")[1:] so that the retrieval from UniProt doesn't stop. Even so, it still gives me a warning like

 WARNING [bioservices.UniProt:596]:  status is not ok with Forbidden

Even with this warning, I could get the PDB IDs and the PDB structures as well. Thanks again!
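To illustrate what the str(batch) workaround does: if the returned value is an int (assumed here to be an HTTP status code such as 403, which would fit the "Forbidden" warning), then str(batch).split("\n")[1:] evaluates to an empty list, so the failed batch is skipped silently. Below is a minimal Python sketch of a check that surfaces the failure instead. The name parse_batch is hypothetical and is not bioservices code.

```python
def parse_batch(batch):
    """Split a text response into lines, dropping the first (header) line.

    Hypothetical helper for illustration; not part of bioservices.
    Assumes a non-string value (e.g. an int status code) means the request failed.
    """
    if not isinstance(batch, str):
        raise RuntimeError(f"request failed, got {batch!r} instead of text")
    return batch.split("\n")[1:]


# The workaround, by contrast, turns a failed request into "no rows":
#   str(403).split("\n")[1:]  evaluates to []
```

Raising here is a design choice for the sketch: it makes an incomplete download visible rather than leaving some records without UniProt data.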

HobnobMancer · 2023-03-30T07:43:14Z

Don't apologise! It should have been more obvious in the documentation :)

I'm glad you got it working!

Reproduction

I can't reproduce this error. Using the following commands produces no errors for me:

cazy_webscraper <email> --families PL20 -o cazy_db
cw_get_uniprot_data cazy_db <email> --families --pdb --sequence --ec

And the data was downloaded and inserted into the local CAZyme database correctly - checked using:

sqlite3 cazy_db.db "SELECT * FROM Uniprots"
sqlite3 cazy_db.db "SELECT * FROM Ecs"
sqlite3 cazy_db.db "SELECT * FROM Pdbs"

Bioservices

The lines of code you are quoting are from bioservices. For reproducibility of work/research, I wouldn't recommend altering the code base of widely used packages such as bioservices. If you're having issues, I would recommend raising an issue in the respective GitHub repo.

The Bioservices error 596 typically arises from issues with the new UniProt API (updated last year), as discussed in issue #100. You need to be running bioservices version >= 1.10.4. cazy_webscraper should be handling this.
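One way to confirm the installed version is a short Python check. This is a sketch only: the helper names are hypothetical, and the parsing is simplified to the first three numeric components, ignoring pre-release suffixes.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a release string like '1.10.4' into a tuple of ints, e.g. (1, 10, 4)."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str = "1.10.4") -> bool:
    """Compare numerically, so that '1.9.0' is correctly older than '1.10.4'."""
    return parse_version(installed) >= parse_version(minimum)


def installed_bioservices_version():
    """Return the installed bioservices version string, or None if not installed."""
    try:
        return metadata.version("bioservices")
    except metadata.PackageNotFoundError:
        return None
```

For example, meets_minimum(installed_bioservices_version() or "0") reports whether the environment satisfies the requirement. Comparing tuples of ints avoids the pitfall of comparing version strings lexicographically.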

You might want to check out bioservices issue 224.

HobnobMancer self-assigned this Mar 28, 2023

HobnobMancer added the documentation Improvements or additions to documentation label Mar 28, 2023

HobnobMancer added a commit that referenced this issue Mar 28, 2023

add not on cw_get_uniprot before cw_get_pdb

39b7d57

as referenced in issues #111

HobnobMancer mentioned this issue Mar 28, 2023

add not on cw_get_uniprot before cw_get_pdb #112

Merged

HobnobMancer added a commit that referenced this issue Mar 28, 2023

add not on cw_get_uniprot before cw_get_pdb

e744313

issue #111

HobnobMancer added a commit that referenced this issue Mar 28, 2023

add not on cw_get_uniprot before cw_get_pdb

7022c23

issue #111

chagas98 mentioned this issue Mar 31, 2023

Error in bioservices/uniprot.py cokelaer/bioservices#254

Open

HobnobMancer mentioned this issue Apr 25, 2023

Unexpected error message when retrieving AA UniProt sequences #114

Closed

HobnobMancer linked a pull request Apr 25, 2023 that will close this issue

Issue 111 + 112 uniprot #115

Merged

HobnobMancer closed this as completed in #115 May 23, 2023
