In [25]:
from pickle import loads
import blosc
import json

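# The database file is a pickled object compressed with blosc,
# so it has to be decompressed before it is unpickled.
with open('jcvi.prost.db.pkl','rb') as f:
    db = loads(blosc.decompress(f.read()))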
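
Judging only from how the later cells index it, each `db` value appears to be a five-element sequence: protein metadata, the best homolog, GO rows from the three tools, UniProt GO rows, and homolog rows. The sketch below names those positions. The constant names and the helper `jcvi_number` are illustrative and not part of the original notebook; the metadata values are the ones the notebook output shows for `JCVISYN3A_0001`, and the empty lists are placeholders.

```python
# Positions inside one db record, inferred from how the notebook indexes it.
PROTEIN, BEST_HOMOLOG, GO_ROWS, UNIPROT_GO_ROWS, HOMOLOG_ROWS = range(5)

# First six field positions inside record[PROTEIN] (names are assumptions).
NCBI_ID, JCVI_TAG, MYCOIDES_HOMOLOG, FUNCTION, TIGRFAM_CLASS, CATEGORY = range(6)


def jcvi_number(record):
    """Return the numeric part of the JCVI tag, e.g. '0001'."""
    return record[PROTEIN][JCVI_TAG].split('_')[1]


# Hypothetical record: real metadata values, placeholder lists elsewhere.
example = [
    ['AVX54569.1', 'JCVISYN3A_0001', 'Q6MUM7',
     'Chromosomal replication initiator protein', '5=Equivalog', 'Essential'],
    [],  # best homolog: accession, P-score, identity, function, source, RMSD
    [],  # GO rows: source, GO id, description
    [],  # UniProt GO rows: GO id, description
    [],  # homolog rows: source, accession, description, four score columns
]

print(jcvi_number(example))
```

Named positions like these make the long `info[0][...]` chains in the page-building cell easier to audit, at the cost of one more thing to keep in sync with the database layout.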

In [4]:
intro='''Minimal Organism JCVI-Syn3a PROST Results
------------------------------------------

This page presents the PROST analysis of the genes of the J. Craig Venter Institute (JCVI) minimal organism _Syn3a_ [\[1\]](https://doi.org/10.1126/science.aad6253)[\[2\]](https://doi.org/10.7554/eLife.36842). Each link in the _NCBI ID_ column opens a detail page for that protein. Homologs for each protein were found with three different tools: PROST [\[3\]](https://www.biorxiv.org/content/10.1101/2022.03.10.483778v1), BLAST [\[4\]](https://doi.org/10.1016/S0022-2836(05)80360-2), and Foldseek [\[5\]](https://doi.org/10.1101/2022.02.07.479398). The structural alignment tool FATCAT [\[6\]](https://doi.org/10.1093/nar/gkaa443) was then used to assess the significance of the structural alignment between each minimal organism protein and its homologs. The most significant structural homolog, referred to as the best homolog, is shown in this table together with its function for reference. Each column is explained below:

*   **NCBI ID:** NCBI identification tag for minimal organism protein. Opens a detail page for that protein.
*   **JCVI ID:** JCVI identification tag for the same protein.
*   **Function:** The function assigned to the minimal organism protein at the time of publication.
*   **Classification:** Initial classification of minimal organism proteins using the TIGRfam database by its authors [\[1\]](https://doi.org/10.1126/science.aad6253).
*   **Best Homolog:** The most significant structural homolog for the minimal organism protein.
*   **Homolog Function:** The function of the most significant structural homolog.
*   **FATCAT P-Score:** The statistical significance of the structural similarity, as reported by the FATCAT [\[6\]](https://doi.org/10.1093/nar/gkaa443) structural alignment tool. This value indicates how similar the JCVI-Syn3a protein is to its closest homolog.
*   **Sequence Identity:** The sequence identity computed from a global alignment using the ProtSub matrix [\[7\]](https://doi.org/10.1002/prot.26050) with a gap opening penalty of 5 and a gap extension penalty of 1.
*   **Homolog Source:** The tool, or combination of tools, that found the most significant structural homolog.
*   **# PROST Hom.:** The number of homologs found by PROST.
*   **# BLAST Hom.:** The number of homologs found by BLAST.
*   **# Foldseek Hom.:** The number of homologs found by Foldseek.'''
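
The sequence identity column is derived from a global alignment. As a rough illustration of the last step only, the sketch below computes identity from two already-aligned strings that use `-` for gaps. It does not perform the alignment itself and does not use the ProtSub matrix. The function name and the choice of denominator (aligned, non-gap columns) are assumptions; the pipeline behind this page may normalize differently.

```python
def sequence_identity(aligned_query, aligned_target, gap='-'):
    """Fraction of aligned (non-gap) columns whose residues are identical.

    Both inputs must be rows of the same pairwise alignment, so they
    must have equal length.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError('aligned sequences must have the same length')

    aligned_columns = 0
    identical_columns = 0
    for q, t in zip(aligned_query, aligned_target):
        if q == gap or t == gap:
            continue  # columns with a gap on either side are not counted
        aligned_columns += 1
        if q == t:
            identical_columns += 1

    return identical_columns / aligned_columns if aligned_columns else 0.0


# Hypothetical alignment: five aligned columns, four of them identical.
identity = sequence_identity('MKT-LV', 'MRTALV')
print(identity)  # 0.8
```

Other common conventions divide by the full alignment length or by the shorter sequence length, which gives lower values whenever the alignment contains gaps.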

In [39]:
website = {
    'md:intro':intro,
    'table:results': {
        'columns': ['l:NCBI ID','JCVI ID','Function','Classification','l:Best Homolog','Homolog Function','FATCAT P-Score','Sequence Identity','Homolog Source','# PROST Hom.','# BLAST Hom.','# Foldseek Hom.'],
        'rows':[]
    },
    'navpage:disc':{
        'p:info':'For documents and software available from this server, we do not warrant or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, product, or process disclosed. We do not endorse or recommend any commercial products, processes, or services. Some pages may provide links to other Internet sites for the convenience of users. We are not responsible for the availability or content of these external sites, nor do we endorse, warrant, or guarantee the products, services, or information described or offered at these other Internet sites. Information that is created by this site is within the public domain. It is not the intention to provide specific medically related advice but rather to provide users with information for better understanding. However, it is requested that in any subsequent use of this work, PROST be given appropriate acknowledgment. We do not collect any personally identifiable information (PII) about visitors to our Web sites.'
    }
}

# Readable labels for the source codes stored in the database.
SOURCE_LABELS = {
    '1. PBF': 'PROST+BLAST+Foldseek',
    '2. PF': 'PROST+Foldseek',
    '3. BF': 'BLAST+Foldseek',
    '4. PB': 'PROST+BLAST',
    '5. P': 'Only PROST',
    '6. F': 'Only Foldseek',
    '7. B': 'Only BLAST',
}

def tr_source(sr):
    """Translate a source code such as '1. PBF' into a label; unknown codes give 'NA'."""
    return SOURCE_LABELS.get(sr, 'NA')

summary = []
for p in db:
    info = db[p]
    jcviID = info[0][1].split('_')[1]
    website['table:results']['rows'].append(['/'+jcviID+'@'+p,                                               #NCBI ID
                                             jcviID,                                                         #JCVI ID
                                             info[0][3],                                                     #Function
                                             info[0][4],                                                     #TIGRfam classification
                                             info[0][5],                                                     #Category
                                             'https://www.uniprot.org/uniprot/'+info[1][0]+'@'+info[1][0],   #l:Best Homolog
                                             info[1][3],                                                     #Homolog Function
                                             info[1][1],                                                     #FATCAT P-Score
                                             info[1][2],                                                     #Sequence Identity
                                             tr_source(info[1][4]),                                          #Homolog Source
                                             info[0][19],                                                    # # PROST Hom.
                                             info[0][20],                                                    # # BLAST Hom.
                                             info[0][21]])                                                   # # Foldseek Hom.
    website['page:'+jcviID] = {
        'md:info':f'''Summary
-------

The _Literature_ section presents prior knowledge about this protein. There have been five separate efforts to annotate the minimal organism genome, and their annotations are listed in that section for completeness.
        ''',
        'md:s1:4':f'''##### {info[0][0]}

###### {info[0][1]}

{info[0][3]}.  
_M. mycoides_ homolog: {info[0][2]}.  
TIGRfam Classification: {info[0][4]}.  
Category: {info[0][5]}.
        ''',
        'md:s2:4':f'''##### Statistics

Total GO Annotation: {info[0][6]}  
Unique PROST GO: {info[0][7]}  
Unique BLAST GO: {info[0][8]}  
Unique Foldseek GO: {info[0][9]}

Total Homologs: {info[0][10]}  
Unique PROST Homologs: {info[0][11]}  
Unique BLAST Homologs: {info[0][12]}  
Unique Foldseek Homologs: {info[0][13]}
        ''',
        'md:s3:4':f'''##### Literature

* Danchin and Fang [\[1\]](https://doi.org/10.1111/1751-7915.12384): {info[0][14]}  
* Yang and Tsui [\[2\]](https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00262): {info[0][15]}  
* Antczak et al. [\[3\]](https://doi.org/10.1038/s41467-019-10837-2): {info[0][16]}  
* Zhang et al. [\[4\]](https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00359?ref=pdf): {info[0][17]} 
* Bianchi et al. [\[5\]](https://pubs.acs.org/doi/full/10.1021/acs.jpcb.2c04188): {info[0][18]}
        ''',
        'md:a1':f'''Structures and Sequence Alignment
---------------------------------

The best structural homolog, found by {info[1][4]}, was [{info[1][0]}](https://www.uniprot.org/uniprot/{info[1][0]}) ({info[1][3]}), with a FATCAT **P-value: {info[1][1]}** and an RMSD of **{info[1][5]} angstrom**.  
The structural alignment is shown below. In the alignment, the query protein {info[0][0]} is colored red and the homolog {info[1][0]} is colored blue. The query protein {info[0][0]} is also shown at the top right and the homolog {info[1][0]} at the bottom right, both colored by secondary structure.
       ''',  
        "alnpdb:a2":{
        "pdb1":f"https://raw.githubusercontent.com/MesihK/minweb/master/static/results/{info[0][0]}/query.pdb",
        "name1":info[0][0],
        "pdb2":f"https://raw.githubusercontent.com/MesihK/minweb/master/static/results/{info[0][0]}/target.pdb",
        "name2":info[1][0],
        "alnpdb":f"https://raw.githubusercontent.com/MesihK/minweb/master/static/results/{info[0][0]}/aln.pdb",
        "lineLen":120,
        "info":""
      },
      'md:go1':f'''GO Annotations
--------------

**1\. PBF** indicates GO terms found by all three tools: PROST, BLAST, and Foldseek.  
**2\. PF** indicates GO terms found by PROST and Foldseek but not by BLAST.  
**3\. BF** indicates GO terms found by BLAST and Foldseek but not by PROST.  
**4\. PB** indicates GO terms found by PROST and BLAST but not by Foldseek.  
**5\. P** indicates GO terms found only by PROST.  
**6\. F** indicates GO terms found only by Foldseek.  
**7\. B** indicates GO terms found only by BLAST.
      ''',
      'table:go1': {
        'columns': ['Source','l:GO','Description'],
        'rows':[]
      },
      'md:go2':f'''UniProt GO Annotations
----------------------
      ''',
      'table:go2': {
        'columns': ['l:GO','Description'],
        'rows':[]
      },
      'md:hom':f'''Homologs
--------

**1\. PBF** indicates homologs found by all three tools: PROST, BLAST, and Foldseek.  
**2\. PF** indicates homologs found by PROST and Foldseek but not by BLAST.  
**3\. BF** indicates homologs found by BLAST and Foldseek but not by PROST.  
**4\. PB** indicates homologs found by PROST and BLAST but not by Foldseek.  
**5\. P** indicates homologs found only by PROST.  
**6\. F** indicates homologs found only by Foldseek.  
**7\. B** indicates homologs found only by BLAST.
      ''',
      'table:hom': {
        'columns': ['Source','l:Homolog','Description','FATCAT P-value','PROST E-value','BLAST E-value','Foldseek TM-Score'],
        'rows':[]
      }
    }
    for g in info[2]:
        website['page:'+jcviID]['table:go1']['rows'].append([g[0],f'http://amigo.geneontology.org/amigo/term/{g[1]}@{g[1]}',g[2]])
    for g in info[3]:
        website['page:'+jcviID]['table:go2']['rows'].append([f'http://amigo.geneontology.org/amigo/term/{g[0]}@{g[0]}',g[1]])
    for h in info[4]:
        website['page:'+jcviID]['table:hom']['rows'].append([h[0],f'https://www.uniprot.org/uniprot/{h[1]}@{h[1]}',h[2],h[3],h[4],h[5],h[6]])
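
Each `table:` entry pairs a `columns` list with `rows`, and nothing in the cell above checks that the two stay in step. In that cell, each `table:results` row appears to be built from 13 values while `columns` lists 12 names, so a check along the following lines may be worth running before export. This is a minimal sketch: the function name `find_ragged_rows` and the demo dictionary are illustrative, and how the site renderer treats surplus cells is not known from the notebook.

```python
def find_ragged_rows(site):
    """Yield (location, row_index, n_cells, n_columns) for every table row
    whose number of cells differs from the number of column names."""
    def walk(node, path):
        if not isinstance(node, dict):
            return  # strings and lists carry no nested tables
        for key, value in node.items():
            here = f'{path}/{key}' if path else key
            if key.startswith('table:') and isinstance(value, dict):
                n_columns = len(value['columns'])
                for i, row in enumerate(value['rows']):
                    if len(row) != n_columns:
                        yield (here, i, len(row), n_columns)
            else:
                yield from walk(value, here)

    yield from walk(site, '')


# Hypothetical miniature site with one row that is too wide.
demo = {
    'md:intro': 'text',
    'table:results': {'columns': ['a', 'b'], 'rows': [[1, 2], [1, 2, 3]]},
    'page:0001': {
        'table:go2': {'columns': ['l:GO', 'Description'], 'rows': [['x@x', 'y']]},
    },
}

problems = list(find_ragged_rows(demo))
print(problems)  # [('table:results', 1, 3, 2)]
```

Calling `list(find_ragged_rows(website))` after the loop would list every mismatching row in the real structure, including those on the per-protein pages.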

In [24]:
website['page:0001']

{'md:info': 'Summary\n-------\n\nThe _literature_ section presents previous knowledge about this protein. There have been five different efforts to annotate minimal organism genome. The annotations from those efforts are given in the _Literature_ section for completeness.\n\n##### AVX54569.1\n\n###### JCVISYN3A_0001\n\nChromosomal replication initiator protein.  \n_M. mycoides_ homolog: Q6MUM7.  \nTIGRfam Classification: 5=Equivalog.  \nCategory: Essential.\n\n##### Statistics\n\nTotal GO Annotation: 44  \nUnique PROST Go: 4  \nUnique BLAST Go: 0  \nUnique Foldseek Go: 23\n\nTotal Homologs: 833  \nUnique PROST Homologs: 17  \nUnique BLAST Homologs: 2  \nUnique Foldseek Homologs: 209\n\n##### Literature\n\nDanchin and Fang [\\[1\\]](https://doi.org/10.1111/1751-7915.12384): NA  \nYang and Tsui [\\[2\\]](https://pubs.acs.org/doi/10.1021/acs.jproteome.8b00262): NA  \nAntczak et al. [\\[3\\]](https://doi.org/10.1038/s41467-019-10837-2): dnaA; Chromosomal replication initiator protein  \nZh

In [40]:
with open('jsonwp/PROST-MinimalOrganismV1.json','w') as f:
    json.dump(website, f)


In [42]:
!gzip jsonwp/PROST-MinimalOrganismV1.json
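
The compression step can also be done from Python, which avoids relying on a shell path that has to match the file written earlier. The following is a minimal sketch using only the standard library; the helper names, the demo payload, and the temporary output path are illustrative assumptions, not part of the original workflow.

```python
import gzip
import json
import os
import tempfile


def write_gzipped_json(obj, path):
    """Serialize obj as JSON and write it gzip-compressed to path."""
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(obj, f)


def read_gzipped_json(path):
    """Read a gzip-compressed JSON file back into Python objects."""
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return json.load(f)


# Hypothetical miniature payload written to a temporary directory.
demo_site = {'md:intro': 'hello', 'table:results': {'columns': ['a'], 'rows': [[1]]}}
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.json.gz')

write_gzipped_json(demo_site, demo_path)
round_trip = read_gzipped_json(demo_path)
print(round_trip == demo_site)  # True
```

Reading the file back, as the last lines do, is a cheap way to confirm that the export is valid JSON before it is published.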
