In [1]:
!date

Tue Jun 26 15:37:26 PDT 2018


### Tuesday, June 26, 2018 ###

This notebook documents my use of the MetaGOmics web application (http://www.yeastrc.org/metagomics/home.do) to perform a peptide-based functional and taxonomic analysis of my metaproteomic data, as described in [Riffle et al., 2018](https://www.ncbi.nlm.nih.gov/pubmed/29280960). The MetaGOmics algorithm is close to Unipept in its approach: a peptide-based lowest common ancestor (LCA) evaluation tied to the UniProtKB database. It also does a similar evaluation of protein functionality using GO terms from the protein annotations, and it uses spectral counts to add some quantitation to these analyses.

The input needed for MetaGOmics:

- the queried sequence database in FASTA format. The Riffle paper notes that the FASTA file only needs to contain the proteins matched by any of the uploaded peptides, so trimming the database makes the analysis much quicker. To achieve this I'm uploading the FASTA exported from PEAKS 8.5 Spider searches. This was simple for PEAKS output, and apparently there are some tips for this trimming in the MetaGOmics GitHub repo.

- a text file containing the identified peptides and their corresponding spectral counts. Spectral areas should work just as well as spectral counts, since it's a relative comparison: a peptide's spectral area would be compared to the total area, just as a peptide's spectral count is compared to the total spectral count for all peptides.
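
To make the "relative comparison" concrete, here is a minimal sketch of that normalization. It is my own illustration, not MetaGOmics code; the function name `relative_abundance` is hypothetical, and the counts are three rows copied from the input file shown later in this notebook.

```python
# Spectral counts for three peptides, copied from the input file in this notebook.
counts = {
    "IVVGGPYSSVSDAASVLDGSQK": 7,
    "LPQVEGTGGDVQPSQDLVR": 13,
    "AGAGDDEVNAGSGDDIVR": 3,
}


def relative_abundance(values):
    """Return each peptide's share of the summed values (counts or areas)."""
    total = sum(values.values())
    return {peptide: value / total for peptide, value in values.items()}


shares = relative_abundance(counts)
for peptide, share in shares.items():
    print(f"{peptide}\t{share:.3f}")
```

The same function would accept spectral areas in place of counts, because only each value's share of the total is used.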

I went to MetaGOmics, uploaded ja2-proteins.fasta in the web application, and named it _2017 P2 100 m suspended (JA2)_. I chose the Swiss-Prot (UniProt) database for the GO annotations, and for the BLAST filters I selected an E-value cutoff of 1E-10, using the top hit only. I received an email several seconds later with a link to a page where I could access the database I'd set up and load peptide lists. It looks like this: ![metagomics database page screengrab](https://raw.githubusercontent.com/MeganEDuffy/2017-etnp/master/images/06.26.18-metgomics-ss.png)

Then I needed to upload the peptide sequences and their corresponding spectral counts (or areas; I chose counts to start). I took the Spider output .xlsx file I'd made from the PEAKS exported .csv and made a new Excel file containing only the sequence and # spectra columns, with no headers. It looked like this:


In [2]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3


Note that I needed to save it as tab-delimited text (.txt). When I tried saving it as a .csv, I didn't get any meaningful output (no error messages, just no matched GO terms for any peptides). Then I got an email saying I could view my data. You can see what the two uploads (one failed .csv, one successful .txt) look like in the screengrab above. When I click on the GO annotation download button for the successful one (100 m suspended), I get this: ![metagomics results download window](https://raw.githubusercontent.com/MeganEDuffy/2017-etnp/master/images/06.27.18-metagomics-results-ss.png)
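
Since the .csv upload failed silently, a quick local format check before uploading might help. The sketch below is a hypothetical helper, not part of MetaGOmics. It assumes each line should be `SEQUENCE<TAB>count`. The rule that sequences contain only capital letters is my own assumption about what the app expects.

```python
import re

# Assumption: a clean peptide sequence is made of capital letters only.
PLAIN_SEQUENCE = re.compile(r"^[A-Z]+$")


def check_peptide_lines(lines):
    """Return (line_number, reason) for each line that is not 'SEQUENCE<TAB>count'."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append((number, "expected 2 tab-separated fields"))
        elif not PLAIN_SEQUENCE.match(fields[0]):
            problems.append((number, "sequence has non-letter characters"))
        elif not fields[1].isdigit():
            problems.append((number, "count is not a whole number"))
    return problems


sample = [
    "IVVGGPYSSVSDAASVLDGSQK\t7\n",    # well formed
    "LGSDSGM(+15.99)LAFEPSNIK\t2\n",  # modification left in the sequence
    "AGAGDDEVNAGSGDDIVR,3\n",         # comma instead of tab
]
problems = check_peptide_lines(sample)
for number, reason in problems:
    print(f"line {number}: {reason}")
```

To check a real file, the open file handle can be passed in place of `sample`, since the function only iterates over lines.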


I don't understand why I don't have GO images. Downloading the GO report, I get this below. Plenty of content! I made a new `metagomics` directory in `analyses` and renamed the report with a more descriptive title. See here:

In [8]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/go_report_ja2_06.26.18.txt

# MetaGOmics GO report
# MetaGOmics version: 0.1.1
# Run date: Tue Jun 26 21:21:21 PDT 2018
GO acc	GO aspect	GO name	count	total count	ratio
unknownprc	biological_process	unknown biological process	1960	2961	0.6619385342789598
GO:0038061	biological_process	NIK/NF-kappaB signaling	2	2961	6.754474839581223E-4
unknownfun	molecular_function	unknown molecular function	2225	2961	0.751435325903411
GO:0009733	biological_process	response to auxin	4	2961	0.0013508949679162446
GO:0032392	biological_process	DNA geometric change	2	2961	6.754474839581223E-4
unknowncmp	cellular_component	unknown cellular component	1982	2961	0.6693684566024991
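
The report is tab-delimited with `#` comment lines at the top, so it can be loaded with the standard library. This is a minimal sketch, with a few of the rows above pasted in as a string so it runs on its own; `read_report` is a hypothetical helper name, and the assumption that real GO terms always start with `GO:` is inferred from the rows shown here.

```python
import csv

# A few lines of the GO report shown above, pasted in so the sketch is self-contained.
REPORT = (
    "# MetaGOmics GO report\n"
    "GO acc\tGO aspect\tGO name\tcount\ttotal count\tratio\n"
    "unknownprc\tbiological_process\tunknown biological process\t1960\t2961\t0.6619385342789598\n"
    "GO:0038061\tbiological_process\tNIK/NF-kappaB signaling\t2\t2961\t6.754474839581223E-4\n"
    "GO:0009733\tbiological_process\tresponse to auxin\t4\t2961\t0.0013508949679162446\n"
)


def read_report(text):
    """Parse a tab-delimited report into dicts, skipping '#' comment lines."""
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(data_lines, delimiter="\t"))


rows = read_report(REPORT)

# Keep only rows with a real GO accession and list the most frequent terms first.
annotated = [row for row in rows if row["GO acc"].startswith("GO:")]
annotated.sort(key=lambda row: int(row["count"]), reverse=True)
for row in annotated:
    print(row["GO acc"], row["GO name"], row["count"], sep="\t")
```

In these rows the `ratio` column equals `count` divided by `total count`, which gives a quick sanity check on the parse.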


The taxonomy report looks like this:

In [7]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/taxonomy_report_ja2_06.26.18.txt

# MetaGOmics taxonomy report
# MetaGOmics version: 0.1.1
# Run date: Tue Jun 26 21:24:25 PDT 2018
GO acc	GO aspect	GO name	taxon name	NCBI taxonomy id	taxonomy rank	PSM count	ratio of GO	ratio of run
unknownprc	biological_process	unknown biological process	root	1	no rank	51	0.026020408163265306	0.017223910840932118
unknownprc	biological_process	unknown biological process	Bacteria	2	superkingdom	32	0.0163265306122449	0.010807159743329957
unknownprc	biological_process	unknown biological process	Hominidae	9604	family	2	0.0010204081632653062	6.754474839581223E-4
unknownprc	biological_process	unknown biological process	Cyanothecaceae	1892249	family	13	0.0066326530612244895	0.004390408645727794
unknownprc	biological_process	unknown biological process	Rodentia	9989	order	2	0.0010204081632653062	6.754474839581223E-4
unknownprc	biological_process	unknown biological process	Homo	9605	genus	2	0.0010204081632653062	6.754474839581223E-4
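
The two ratio columns are not explained in the report itself. Judging only from the numbers above (this is an inference, not something I found documented), `ratio of GO` looks like the PSM count divided by the count for that GO term in the GO report, and `ratio of run` looks like the PSM count divided by the run's total count. A small arithmetic check using values copied from the two reports:

```python
import math

# Values copied from the reports shown in this notebook.
psm_count = 51     # taxonomy report: 'root' row for unknownprc
go_count = 1960    # GO report: count for unknownprc
run_total = 2961   # GO report: total count

ratio_of_go = psm_count / go_count
ratio_of_run = psm_count / run_total

# Compare against the values printed in the taxonomy report.
print(math.isclose(ratio_of_go, 0.026020408163265306))
print(math.isclose(ratio_of_run, 0.017223910840932118))
```

The Bacteria row (32 PSMs) follows the same pattern, which supports the interpretation but does not prove it.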


There were some modifications in parentheses in my peptide input, so I need to fix that step of the process before continuing. I also need to figure out why I'm not getting any images from the MetaGOmics app.

In [10]:
#What sequence modifications are there?

!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3
IVVGGPYSSVSDASSGLDGSQK	3
YKDNFTVTAPEIALNEGFDR	3
SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)	10
TNYFGIQGTDNGNLTNSFAESELER	3
TNYFGLQGTDNGNLTNSFAESELER	3
AIQQQIE(+21.98)NPLAQQILSGELVPGK	1
LGSDSGM(+15.99)LAFEPSNIK	2
LPQVEGTGGDVQPSQ(+.98)DLVR	1
VIGQNEAVDAVSNAIR	19
ALTTGVDYAQGLVALGGDDK	3
ILSTINDADLNASAAEIK	2
GGQPLFFGEGTYANLSQTAR	3
TNYFGIQGTDNGNLTNSFAESELERA	6
TNYFGLQGTDNGNLTNSFAESELERA	6
TGEIGDGKIFISPVDSVVR	2
DNFTVTAPEIALNEGFDR	2
TNQNVGLDPETLALATPAR	3
YNSGEGGC(+57.02)FYSVDTIEAPWNSGR	2
SGGGGGGGLGSGGSIR	1
AGAGDDEVNAGS(-15.99)GDDIVR	1
LPQVEGTGGD(+21.98)VQPSQDLVR	2
AISSGAGNSDFVVGPER	3
LGSDAGM(+15.99)LAFEPSNIK	3
L(+28.03)GEHNIDVLEGNEQFINAAK	1
GGSGNDTINGGGGDDLIR	2
IVVGGPYSSVSDAASSLDSSQKSVFDK	2
LVPGFEA

In [14]:
# remove '(+15.99)', '(+57.02)', and '(+.98)' from sequences (dots escaped so they match literally)
!sed -e 's/(+15\.99)//g' -e 's/(+57\.02)//g' -e 's/(+\.98)//g' \
/Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt \
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics--nomod-input.txt

In [15]:
!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics--nomod-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3
IVVGGPYSSVSDASSGLDGSQK	3
YKDNFTVTAPEIALNEGFDR	3
SCAAAGTECLISGWGNTK(+28.03)	10
TNYFGIQGTDNGNLTNSFAESELER	3
TNYFGLQGTDNGNLTNSFAESELER	3
AIQQQIE(+21.98)NPLAQQILSGELVPGK	1
LGSDSGMLAFEPSNIK	2
LPQVEGTGGDVQPSQDLVR	1
VIGQNEAVDAVSNAIR	19
ALTTGVDYAQGLVALGGDDK	3
ILSTINDADLNASAAEIK	2
GGQPLFFGEGTYANLSQTAR	3
TNYFGIQGTDNGNLTNSFAESELERA	6
TNYFGLQGTDNGNLTNSFAESELERA	6
TGEIGDGKIFISPVDSVVR	2
DNFTVTAPEIALNEGFDR	2
TNQNVGLDPETLALATPAR	3
YNSGEGGCFYSVDTIEAPWNSGR	2
SGGGGGGGLGSGGSIR	1
AGAGDDEVNAGS(-15.99)GDDIVR	1
LPQVEGTGGD(+21.98)VQPSQDLVR	2
AISSGAGNSDFVVGPER	3
LGSDAGMLAFEPSNIK	3
L(+28.03)GEHNIDVLEGNEQFINAAK	1
GGSGNDTINGGGGDDLIR	2
IVVGGPYSSVSDAASSLDSSQKSVFDK	2
LVPGFEAPVNLVYSQGNR	4
LVPGFEAPVNLVYSK	3
AYGGETVTAG

In [None]:
# More modifications remain to strip, e.g. (+28.03), (+21.98), and (-15.99). Looking at some old notebooks where I did this.
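
One way to catch every modification at once, rather than listing each mass shift, is a single regular expression. This is a minimal sketch, not the method from my old notebooks. `strip_modifications` and `collapse_counts` are hypothetical names. Summing the counts of peptides that become identical after stripping is my own assumption; I have not confirmed how MetaGOmics handles duplicate sequences.

```python
import re
from collections import Counter

# Matches a parenthesized signed mass shift such as (+57.02), (+.98) or (-15.99).
MOD_PATTERN = re.compile(r"\([+-]\d*\.?\d+\)")


def strip_modifications(sequence):
    """Remove every parenthesized mass shift from a peptide sequence."""
    return MOD_PATTERN.sub("", sequence)


def collapse_counts(lines):
    """Strip modifications, then sum counts of peptides that become identical."""
    totals = Counter()
    for line in lines:
        sequence, count = line.rstrip("\n").split("\t")
        totals[strip_modifications(sequence)] += int(count)
    return totals


# Rows copied from the input file shown above.
sample = [
    "LPQVEGTGGDVQPSQDLVR\t13\n",
    "LPQVEGTGGDVQPSQ(+.98)DLVR\t1\n",
    "LPQVEGTGGD(+21.98)VQPSQDLVR\t2\n",
    "SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)\t10\n",
    "AGAGDDEVNAGS(-15.99)GDDIVR\t1\n",
]
totals = collapse_counts(sample)
for sequence, count in totals.items():
    print(f"{sequence}\t{count}")
```

If duplicates should stay as separate rows, `strip_modifications` can be applied line by line without the `Counter`.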