Protein Data Crystallization Extraction V1.0.1 #4
Njantang1
announced in
Announcements
🎉 Protein Crystallization Data Extraction (PCDE) V1.0.1
Protein Crystallization Data Extraction Tool
by Nana Njantang Ruth · ORCID 0000-0002-6003-7521
Overview
This is the first stable release of the automated pipeline for retrieving, filtering, and analyzing protein crystallization conditions from the RCSB Protein Data Bank (PDB). Starting from an amino acid sequence, the pipeline produces structured, analysis-ready datasets and publication-quality figures.
What's included in this release
🗂️ Input & FASTA Export
- Output directory: output/{seq_type_name}/
🔍 Step 1: RCSB Sequence Search (rcsb_sequence_identity.py)
- Queries the RCSB search endpoint (/rcsbsearch/v2/query) using sequence similarity scoring
- Sequence targets: pdb_protein_sequence, pdb_dna_sequence, pdb_rna_sequence
- Entry endpoint: /rest/v1/core/entry/{pdb_id}
- Columns: PDB_ID, Entity, Score, Seq_id (sequence identity), E-value
- Output file: {seq_type_name}_rcsb_hits.csv
🧹 Step 2: Crystallization Data Retrieval & Filtering (PDB_searchAPI.py)
- Filtering via filter_experimental_conditions()
🧪 Compound Annotation (extract_structures.py)
- Fills the COMPOUND column using structures.pkl
🔗 Step 3: Merge RCSB + Crystallization Data
- Merge key: PDB_ID
- Output file: {seq_type_name}_merged_results.csv
📊 Step 4: High-Resolution Visualization (plot.py)
- run_plot() generates 300 dpi analytical scatter plots from the merged CSV
📄 Consolidated PDF Report
- Cryst_cocktail_Table.pdf: a colored summary table compiling PDB IDs, alignment metrics, ligands, and complete chemical cocktails into a single laboratory resource
Pipeline Flow
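As a rough illustration of the join at the centre of this flow, the sketch below merges Step 1 hits with Step 2 crystallization rows on PDB_ID using only the standard library. The function name merge_on_pdb_id, the placeholder IDs, and the crystallization columns (pH, Temp_K) are assumptions made for the example; they are not taken from the pipeline's code, which lists pandas among its dependencies and may perform the merge differently.

```python
from collections import defaultdict


def merge_on_pdb_id(hits, conditions):
    """Inner-join two lists of dict rows on the shared PDB_ID key."""
    # Index the crystallization rows by upper-cased PDB ID.
    by_id = defaultdict(list)
    for row in conditions:
        by_id[row["PDB_ID"].upper()].append(row)

    merged = []
    for hit in hits:
        key = hit["PDB_ID"].upper()
        for cond in by_id.get(key, []):
            row = {**hit, **cond}
            row["PDB_ID"] = key  # keep a single, normalized identifier
            merged.append(row)
    return merged


# Placeholder rows. The hit columns follow the release notes;
# the crystallization columns are hypothetical.
hits = [
    {"PDB_ID": "1ABC", "Entity": "1", "Score": 0.98, "Seq_id": 0.95, "E-value": 1e-50},
    {"PDB_ID": "2XYZ", "Entity": "2", "Score": 0.71, "Seq_id": 0.62, "E-value": 1e-20},
]
conditions = [
    {"PDB_ID": "1abc", "pH": 7.5, "Temp_K": 293.0},
]

merged = merge_on_pdb_id(hits, conditions)
for row in merged:
    print(row)
```

Hits without a matching crystallization row are dropped here (an inner join); whether the pipeline keeps or drops such hits is not stated in these notes.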
Output Structure
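The file names mentioned in this release can be assembled into paths as sketched below. The helper output_paths is hypothetical, and placing all three files directly inside output/{seq_type_name}/ is an assumption; the repository's actual layout may differ.

```python
from pathlib import Path


def output_paths(seq_type_name, root="output"):
    """Return the per-sequence output directory and the files named in the notes."""
    out_dir = Path(root) / seq_type_name
    return {
        "dir": out_dir,
        "hits": out_dir / f"{seq_type_name}_rcsb_hits.csv",
        "merged": out_dir / f"{seq_type_name}_merged_results.csv",
        # Assumed location: the notes only give this file's name.
        "report": out_dir / "Cryst_cocktail_Table.pdf",
    }


paths = output_paths("lysozyme")  # "lysozyme" is a placeholder sequence name
for label, path in paths.items():
    print(f"{label:>7}: {path.as_posix()}")
```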
Module Overview
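- main.py
- rcsb_sequence_identity.py
- PDB_searchAPI.py
- extract_structures.py
- plot.py
Dependencies
- Third-party packages: requests, pandas, openpyxl, matplotlib
- Python standard library: concurrent.futures, tempfile
Install all dependencies with pip from the repository's requirements.txt, as shown under How to Install.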
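After installation, a quick check like the one below can confirm that the listed modules are importable. It is a minimal sketch, not part of the pipeline: the helper missing_modules is a hypothetical name, and the check relies only on importlib.util.find_spec from the standard library.

```python
import importlib.util

# Names taken from the dependency list; the last two ship with Python itself.
REQUIRED = [
    "requests",
    "pandas",
    "openpyxl",
    "matplotlib",
    "concurrent.futures",
    "tempfile",
]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


missing = missing_modules(REQUIRED)
print("Missing:", ", ".join(missing) if missing else "none")
```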
📦 Download Assets
The following pre-built archives are available for this release:
- Source code (zip)
- Source code (tar.gz)
Direct download links:
How to Install
Option 1: From ZIP Archive
Option 2: Clone from Git
    git clone -b V1.0.0 https://github.com/RitAreaSciencePark/Protein_Crystallization_Data_Extraction.git
    cd Protein_Crystallization_Data_Extraction
    pip install -r requirements.txt
Quick Start
Web Application
    cd protein_crystallization_app
    python manage.py migrate
    python manage.py runserver
Then visit http://127.0.0.1:8000
Command-Line (Single Sequence)
Command-Line (FASTA Batch)
    cd src_fasta_file
    python main.py input.fasta
How to Cite
If you use this pipeline in your research, please cite:
Or use the CITATION.cff file included in this repository.
Known Limitations
- Compound annotation depends on the structures.pkl reference file
License
MIT © 2025 RitAreaSciencePark
This project is released under the MIT License. See the LICENSE file for details.
Support & Feedback
For issues, feature requests, or questions:
Thank you for using PCDE! 🧬
This discussion was created from the release Protein Data Crystallization Extraction V1.0.1.