In [1]:
!date

Tue Jun 26 15:37:26 PDT 2018


### Tuesday, June 26, 2018 ###

This notebook documents my use of the MetaGOmics web application (http://www.yeastrc.org/metagomics/home.do) to perform a peptide-based functional and taxonomic analysis of my metaproteomic data, as described in [Riffle et al., 2018](https://www.ncbi.nlm.nih.gov/pubmed/29280960). The MetaGOmics algorithm is close to Unipept in its approach to a peptide-based lowest common ancestor (LCA) evaluation tied to the UniProtKB database, but it also performs a similar evaluation of protein function using GO terms from the protein annotations, and it uses spectral counts to add some quantitation to these analyses.

The input needed for MetaGOmics:

- the queried sequence database in FASTA format. The Riffle paper notes that one only needs a FASTA file containing the proteins matched by any of the uploaded peptides, so trimming the database makes the analysis much quicker. To achieve this I'm uploading the FASTA exported from PEAKS 8.5 Spider searches. This was simple for PEAKS output, and apparently there are some tips for this trimming in the MetaGOmics GitHub repo.

- a text file containing the identified peptides and their corresponding spectral counts. It should work just as well to use spectral areas instead of spectral counts, since it's a relative comparison: a peptide's spectral area would be compared to the total area, just as the number of spectral counts attributed to a peptide is compared to the total number of spectral counts for all peptides.
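
To make the "relative comparison" point concrete, here is a minimal sketch in plain Python (not MetaGOmics code) of turning per-peptide values into fractions of the total. The function name `relative_abundance` is hypothetical, and the three counts are taken from the JA2 peptide list shown further down; the same arithmetic would apply if the values were spectral areas.

```python
def relative_abundance(values):
    """Return each peptide's share of the summed values (counts or areas)."""
    total = sum(values.values())
    if total == 0:
        raise ValueError("total is zero; nothing to normalize")
    return {peptide: value / total for peptide, value in values.items()}

# Three peptides and their spectral counts from the JA2 input file
counts = {
    "IVVGGPYSSVSDAASVLDGSQK": 7,
    "IVVGGPYSSVSDAASSLDSSQK": 6,
    "AIQQQIENPLAQQILSGELVPGK": 6,
}
shares = relative_abundance(counts)
for peptide, share in shares.items():
    print(f"{peptide}\t{share:.3f}")
```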

I went to MetaGOmics, uploaded ja2-proteins.fasta in the web application, and named it _2017 P2 100 m suspended (JA2)_. I chose the SwissProt (UniProt) database for the GO annotations and, for the BLAST filters, selected 1E-10, using the top hit only. I received an email several seconds later with a link to a page where I could access the database I'd set up and load peptide lists. It looks like this: ![metagomics database page screengrab](https://raw.githubusercontent.com/MeganEDuffy/2017-etnp/master/images/06.26.18-metgomics-ss.png)

Then I needed to upload the peptide sequences and their corresponding spectral counts (or areas; I chose counts to start). I took the Spider output xlsx file I'd made from the PEAKS exported .csv and made a new Excel file containing just the sequence and # spectra columns, with no headers. It looked like this:


In [2]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3


Note that I needed to save it as tab-delimited text (.txt). When I tried saving it as a .csv, I didn't get any meaningful output (no error messages, just no matched GO terms for any peptides). Then I got an email saying I could view my data. You can see what the two uploads (one failed .csv, one successful .txt) look like in the screengrab above. When I click on the GO annotation download button for the successful one (100 m suspended), I get this: ![metagomics results download window](https://raw.githubusercontent.com/MeganEDuffy/2017-etnp/master/images/06.27.18-metagomics-results-ss.png)
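
The Excel step could also be scripted. The sketch below is an illustration under assumptions, not the procedure used above: it supposes the PEAKS export has header columns named `Peptide` and `#Spec` (check the actual file), the function name `peaks_csv_to_metagomics` is made up, and the score values in the example text are invented. It writes the two columns tab-separated with no header row.

```python
import csv
import io

def peaks_csv_to_metagomics(csv_text, seq_col="Peptide", count_col="#Spec"):
    """Convert PEAKS-style CSV text to headerless, tab-delimited text.

    seq_col and count_col are assumed column names; adjust to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow([row[seq_col], row[count_col]])
    return out.getvalue()

# Invented CSV text: the middle column and its values are placeholders
example = (
    "Peptide,-10lgP,#Spec\n"
    "LPQVEGTGGDVQPSQDLVR,80.1,13\n"
    "VIGQNEAVDAVSNAIR,75.3,19\n"
)
converted = peaks_csv_to_metagomics(example)
print(converted, end="")
```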


I don't understand why I don't have GO images. Downloading the GO report, I get this below. Plenty of content! I made a new `metagomics` directory in `analyses` and renamed the report with a more descriptive title. See here:

In [16]:
!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/go_report_ja2_06.26.18.txt

# MetaGOmics GO report
# MetaGOmics version: 0.1.1
# Run date: Tue Jun 26 21:21:21 PDT 2018
GO acc	GO aspect	GO name	count	total count	ratio
unknownprc	biological_process	unknown biological process	1960	2961	0.6619385342789598
GO:0038061	biological_process	NIK/NF-kappaB signaling	2	2961	6.754474839581223E-4
unknownfun	molecular_function	unknown molecular function	2225	2961	0.751435325903411
GO:0009733	biological_process	response to auxin	4	2961	0.0013508949679162446
GO:0032392	biological_process	DNA geometric change	2	2961	6.754474839581223E-4
unknowncmp	cellular_component	unknown cellular component	1982	2961	0.6693684566024991
GO:0009735	biological_process	response to cytokinin	102	2961	0.034447821681864235
GO:0009737	biological_process	response to abscisic acid	6	2961	0.002026342451874367
GO:0032403	molecular_function	protein complex binding	16	2961	0.005403579871664978
GO:0009741	biological_process	response to brassinosteroid	10	2961	0.003377237419790611
GO:0051591	bio
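
The report is tab-delimited, with `#` comment lines ahead of the header row, so it is easy to load programmatically. A minimal sketch, assuming only the layout visible in the `head` output above (the function name `read_go_report` is hypothetical); it also checks that `ratio` looks like `count` divided by `total count`:

```python
import csv
import math

def read_go_report(text):
    """Parse MetaGOmics-style GO report text into a list of dict rows."""
    # Drop blank lines and the leading '#' comment lines
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        row["count"] = int(row["count"])
        row["total count"] = int(row["total count"])
        row["ratio"] = float(row["ratio"])
    return rows

# Two data rows copied from the report above
sample = (
    "# MetaGOmics GO report\n"
    "GO acc\tGO aspect\tGO name\tcount\ttotal count\tratio\n"
    "unknownprc\tbiological_process\tunknown biological process\t1960\t2961\t0.6619385342789598\n"
    "GO:0009735\tbiological_process\tresponse to cytokinin\t102\t2961\t0.034447821681864235\n"
)
rows = read_go_report(sample)
for row in rows:
    # ratio appears to be count divided by total count
    assert math.isclose(row["ratio"], row["count"] / row["total count"])
    print(row["GO acc"], row["count"], f"{row['ratio']:.4f}")
```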

The taxonomy report looks like this:

In [7]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/taxonomy_report_ja2_06.26.18.txt

# MetaGOmics taxonomy report
# MetaGOmics version: 0.1.1
# Run date: Tue Jun 26 21:24:25 PDT 2018
GO acc	GO aspect	GO name	taxon name	NCBI taxonomy id	taxonomy rank	PSM count	ratio of GO	ratio of run
unknownprc	biological_process	unknown biological process	root	1	no rank	51	0.026020408163265306	0.017223910840932118
unknownprc	biological_process	unknown biological process	Bacteria	2	superkingdom	32	0.0163265306122449	0.010807159743329957
unknownprc	biological_process	unknown biological process	Hominidae	9604	family	2	0.0010204081632653062	6.754474839581223E-4
unknownprc	biological_process	unknown biological process	Cyanothecaceae	1892249	family	13	0.0066326530612244895	0.004390408645727794
unknownprc	biological_process	unknown biological process	Rodentia	9989	order	2	0.0010204081632653062	6.754474839581223E-4
unknownprc	biological_process	unknown biological process	Homo	9605	genus	2	0.0010204081632653062	6.754474839581223E-4
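
The two ratio columns seem to follow directly from the counts. From the numbers shown, `ratio of GO` matches the PSM count divided by that GO term's count in the GO report (1960 for `unknownprc`), and `ratio of run` matches the PSM count divided by the run total (2961). This is inferred from the output above, not taken from MetaGOmics documentation. A quick check in Python for the `root` row:

```python
import math

psm_count = 51        # PSM count on the root row for unknownprc
go_term_count = 1960  # unknownprc count from the GO report
run_total = 2961      # total count from the GO report

ratio_of_go = psm_count / go_term_count
ratio_of_run = psm_count / run_total

# Compare against the values printed in the taxonomy report
assert math.isclose(ratio_of_go, 0.026020408163265306)
assert math.isclose(ratio_of_run, 0.017223910840932118)
print(ratio_of_go, ratio_of_run)
```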


There were some modifications in parentheses in my peptide input, so I need to fix that step of the process before continuing. I also need to figure out why I'm not getting any images from the MetaGOmics app.

In [10]:
#What sequence modifications are there?

!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3
IVVGGPYSSVSDASSGLDGSQK	3
YKDNFTVTAPEIALNEGFDR	3
SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)	10
TNYFGIQGTDNGNLTNSFAESELER	3
TNYFGLQGTDNGNLTNSFAESELER	3
AIQQQIE(+21.98)NPLAQQILSGELVPGK	1
LGSDSGM(+15.99)LAFEPSNIK	2
LPQVEGTGGDVQPSQ(+.98)DLVR	1
VIGQNEAVDAVSNAIR	19
ALTTGVDYAQGLVALGGDDK	3
ILSTINDADLNASAAEIK	2
GGQPLFFGEGTYANLSQTAR	3
TNYFGIQGTDNGNLTNSFAESELERA	6
TNYFGLQGTDNGNLTNSFAESELERA	6
TGEIGDGKIFISPVDSVVR	2
DNFTVTAPEIALNEGFDR	2
TNQNVGLDPETLALATPAR	3
YNSGEGGC(+57.02)FYSVDTIEAPWNSGR	2
SGGGGGGGLGSGGSIR	1
AGAGDDEVNAGS(-15.99)GDDIVR	1
LPQVEGTGGD(+21.98)VQPSQDLVR	2
AISSGAGNSDFVVGPER	3
LGSDAGM(+15.99)LAFEPSNIK	3
L(+28.03)GEHNIDVLEGNEQFINAAK	1
GGSGNDTINGGGGDDLIR	2
IVVGGPYSSVSDAASSLDSSQKSVFDK	2
LVPGFEA

In [14]:
# remove '(+15.99)' and '(+57.02)', (+.98) from sequences
!sed 's/(+15.99)//g' /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt \
| sed 's/(+57.02)//g'\
| sed 's/(+.98)//g'\
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics--nomod-input.txt

In [15]:
!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics--nomod-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3
IVVGGPYSSVSDASSGLDGSQK	3
YKDNFTVTAPEIALNEGFDR	3
SCAAAGTECLISGWGNTK(+28.03)	10
TNYFGIQGTDNGNLTNSFAESELER	3
TNYFGLQGTDNGNLTNSFAESELER	3
AIQQQIE(+21.98)NPLAQQILSGELVPGK	1
LGSDSGMLAFEPSNIK	2
LPQVEGTGGDVQPSQDLVR	1
VIGQNEAVDAVSNAIR	19
ALTTGVDYAQGLVALGGDDK	3
ILSTINDADLNASAAEIK	2
GGQPLFFGEGTYANLSQTAR	3
TNYFGIQGTDNGNLTNSFAESELERA	6
TNYFGLQGTDNGNLTNSFAESELERA	6
TGEIGDGKIFISPVDSVVR	2
DNFTVTAPEIALNEGFDR	2
TNQNVGLDPETLALATPAR	3
YNSGEGGCFYSVDTIEAPWNSGR	2
SGGGGGGGLGSGGSIR	1
AGAGDDEVNAGS(-15.99)GDDIVR	1
LPQVEGTGGD(+21.98)VQPSQDLVR	2
AISSGAGNSDFVVGPER	3
LGSDAGMLAFEPSNIK	3
L(+28.03)GEHNIDVLEGNEQFINAAK	1
GGSGNDTINGGGGDDLIR	2
IVVGGPYSSVSDAASSLDSSQKSVFDK	2
LVPGFEAPVNLVYSQGNR	4
LVPGFEAPVNLVYSK	3
AYGGETVTAG

In [17]:
#I have more to get. Looking at some old notebooks where I did this. Maybe I'll try printing out all the mods
!awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt > /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-propeptides-mods.txt

In [19]:
!head -50 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-propeptides-mods.txt

+57.02
+57.02
+28.03
+21.98
+15.99
+.98
+57.02
-15.99
+21.98
+15.99
+28.03
+14.02
+15.99
+14.02
+14.02
+57.02
+57.02
+15.99
+57.02
+21.98
+57.02
+57.02
+57.02
-17.03
+21.98
+15.99
+15.99
+15.99
+21.98
+15.99
+15.99
+37.95
-15.99
+21.98
-17.03
-17.03
+57.02
+57.02
+57.02
+57.02
+15.99
+37.95
+57.02
+57.02
+.98
+28.03
+15.99
+57.02
+21.98
+15.99
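
Since `head -50` only shows the first fifty annotations, a tally of the distinct ones is a quicker way to see everything that has to be removed. A minimal Python sketch (the function name `tally_annotations` is hypothetical), demonstrated on a few lines copied from the input file above:

```python
import re
from collections import Counter

# Capture the text between each pair of parentheses
PAREN = re.compile(r"\(([^)]*)\)")

def tally_annotations(lines):
    """Count each distinct parenthesized annotation across peptide lines."""
    counts = Counter()
    for line in lines:
        counts.update(PAREN.findall(line))
    return counts

lines = [
    "SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)\t10",
    "LGSDSGM(+15.99)LAFEPSNIK\t2",
    "LPQVEGTGGDVQPSQ(+.98)DLVR\t1",
    "AGAGDDEVNAGS(-15.99)GDDIVR\t1",
]
tally = tally_annotations(lines)
for annotation, n in tally.most_common():
    print(f"{annotation}\t{n}")
```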


In [89]:
# remove '(+15.99)' and '(+57.02)', (+.98), all other mods from sequences
# I went through and manually found all the mods to remove them, checking that none were left
!sed 's/(+15.99)//g' /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-input.txt \
| sed 's/(+57.02)//g'\
| sed 's/(+.98)//g'\
| sed 's/(+21.98)//g'\
| sed 's/(+28.03)//g'\
| sed 's/(+37.95)//g'\
| sed 's/(-17.03)//g'\
| sed 's/(+14.02)//g'\
| sed 's/(-15.99)//g'\
| sed 's/(+42.01)//g'\
| sed 's/(sub Q)//g'\
| sed 's/(+71.04)//g'\
| sed 's/(+55.92)//g'\
| sed 's/(sub A)//g'\
| sed 's/(+30.01)//g'\
| sed 's/(+3.99)//g'\
| sed 's/(+58.01)//g'\
| sed 's/(+43.01)//g'\
| sed 's/(+72.02)//g'\
| sed 's/(-18.01)//g'\
| sed 's/(sub D)//g'\
| sed 's/(sub G)//g'\
| sed 's/(+27.99)//g'\
| sed 's/(sub K)//g'\
| sed 's/(+37.96)//g'\
| sed 's/(+43.99)//g'\
| sed 's/(+21.97)//g'\
| sed 's/(+31.99)//g'\
| sed 's/(sub I)//g'\
| sed 's/(sub S)//g'\
| sed 's/(sub N)//g'\
| sed 's/(+15.01)//g'\
| sed 's/(-.98)//g'\
| sed 's/(+114.04)//g'\
| sed 's/(sub Y)//g'\
| sed 's/(sub T)//g'\
| sed 's/(del Q)//g'\
| sed 's/(-48.00)//g'\
| sed 's/(+100.02)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(+79.97)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(sub L)//g'\
| sed 's/(+53.92)//g'\
| sed 's/(sub V)//g'\
| sed 's/(sub M)//g'\
| sed 's/(+15.00)//g'\
| sed 's/(+150.04)//g'\
| sed 's/(+60.00)//g'\
| sed 's/(+119.04)//g'\
| sed 's/(+41.03)//g'\
| sed 's/(+29.02)//g'\
| sed 's/(+79.96)//g'\
| sed 's/(+282.05)//g'\
| sed 's/(del N)//g'\
| sed 's/(+228.11)//g'\
| sed 's/(sub E)//g'\
| sed 's/(+27.05)//g'\
| sed 's/(+87.03)//g'\
| sed 's/(-15.01)//g'\
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-nomod-input.txt

In [90]:
!head -20 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja1-ja6/MED_ETNP_JA2_uwpr201704_SPIDER_19/ja2-metagomics-nomod-input.txt

IVVGGPYSSVSDAASVLDGSQK	7
IVVGGPYSSVSDAASSLDSSQK	6
AIQQQIENPLAQQILSGELVPGK	6
LPQVEGTGGDVQPSQDLVR	13
QAVSADSSGSFIGGAELASLK	7
LGEHNIDVLEGNEQFINAAK	11
YLGSTGGLLNSAETEEK	6
YIGSTGGLLNSAETEEK	6
AISADSSGGFIGGAELSQLK	3
AGAGDDEVNAGSGDDIVR	3
IVVGGPYSSVSDASSGLDGSQK	3
YKDNFTVTAPEIALNEGFDR	3
SCAAAGTECLISGWGNTK	10
TNYFGIQGTDNGNLTNSFAESELER	3
TNYFGLQGTDNGNLTNSFAESELER	3
AIQQQIENPLAQQILSGELVPGK	1
LGSDSGMLAFEPSNIK	2
LPQVEGTGGDVQPSQDLVR	1
VIGQNEAVDAVSNAIR	19
ALTTGVDYAQGLVALGGDDK	3
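
Listing every modification by hand works, but the list has to be extended for each new sample. An alternative that does not depend on knowing the annotations in advance is to delete anything in parentheses with one regular expression. This is a sketch of a different technique from the `sed` chain above, assuming every annotation in the file has the form `(...)` with no nesting; `strip_annotations` is a hypothetical name.

```python
import re

# Match an opening parenthesis, anything up to the next ')', and the ')'
PAREN = re.compile(r"\([^)]*\)")

def strip_annotations(line):
    """Remove every parenthesized annotation, e.g. (+57.02) or (sub Q)."""
    return PAREN.sub("", line)

lines = [
    "SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)\t10",
    "AIQQQIE(+21.98)NPLAQQILSGELVPGK\t1",
    "LGEHNIDVLEGNEQFINAAK\t11",
]
cleaned = [strip_annotations(line) for line in lines]
# Confirm nothing parenthesized is left behind
assert not any("(" in line or ")" in line for line in cleaned)
print("\n".join(cleaned))
```

The shell equivalent would be a single `sed -E 's/\([^)]*\)//g'` in place of the chain.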


Now I'm using the input file without modifications to search in MetaGOmics. It completed in a matter of minutes. Again, no downloadable images, so I guess this means there were no significant terms for that GO aspect? I saved the files with today's date, 06.28.18.

In [97]:
!ls /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/

go_report_ja2_06.26.18.txt             taxonomy_report_ja2_06.26.18.txt
go_report_ja2nomods_06.28.18.txt       taxonomy_report_ja2nomods_06.28.18.txt


In [98]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/go_report_ja2nomods_06.28.18.txt  

# MetaGOmics GO report
# MetaGOmics version: 0.1.1
# Run date: Thu Jun 28 14:58:29 PDT 2018
GO acc	GO aspect	GO name	count	total count	ratio
unknownprc	biological_process	unknown biological process	1518	2509	0.6050219210840972
GO:0038061	biological_process	NIK/NF-kappaB signaling	2	2509	7.971303308090873E-4
unknownfun	molecular_function	unknown molecular function	1752	2509	0.6982861697887605
GO:0009733	biological_process	response to auxin	4	2509	0.0015942606616181746
GO:0032392	biological_process	DNA geometric change	2	2509	7.971303308090873E-4
unknowncmp	cellular_component	unknown cellular component	1545	2509	0.6157831805500199


In [99]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja2/taxonomy_report_ja2nomods_06.28.18.txt

# MetaGOmics taxonomy report
# MetaGOmics version: 0.1.1
# Run date: Thu Jun 28 15:01:39 PDT 2018
GO acc	GO aspect	GO name	taxon name	NCBI taxonomy id	taxonomy rank	PSM count	ratio of GO	ratio of run
unknownprc	biological_process	unknown biological process	root	1	no rank	68	0.04479578392621871	0.027102431247508968
unknownprc	biological_process	unknown biological process	Bacteria	2	superkingdom	49	0.032279314888010543	0.019529693104822637
unknownprc	biological_process	unknown biological process	Hominidae	9604	family	2	0.0013175230566534915	7.971303308090873E-4
unknownprc	biological_process	unknown biological process	Cyanothecaceae	1892249	family	16	0.010540184453227932	0.006377042646472699
unknownprc	biological_process	unknown biological process	Rodentia	9989	order	2	0.0013175230566534915	7.971303308090873E-4
unknownprc	biological_process	unknown biological process	Homo	9605	genus	2	0.0013175230566534915	7.971303308090873E-4


Now uploading the ja14 protein FASTA (ja14-proteins-spider72.fasta) and repeating the same process for that sample (1000 m suspended). I opened the csv file from the Spider search (72), extracted just the peptide sequence column and the # spectra column, deleted the headers, and saved the file as tab-delimited text, all in Excel. It now looks like this:

In [100]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja9-ja16/MED_ETNP_JA14_11.07.17_SPIDER_72/ja14-metagomics-input.txt

SC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)	4
AIQQQIENPLAQQILSGELVPGK	3
LPQVEGTGGDVQPSQDLVR	6
AIQQQIE(+21.98)NPLAQQILSGELVPGK	1
VATVSLPRSC(+57.02)AAAGTEC(+57.02)LISGWGNTK(+28.03)	1
S(+28.03)SGSSYPSLLQC(+57.02)LK	2
LPQVEGTGGD(+37.96)VQPSQDLVR	1
LPQVEGTGGDVQ(+.98)PSQDLVR	2
SSGSSYPSLLQC(+57.02)LK(+28.03)APVLSDSSC(+57.02)K	1
LGEHNIDVLEGNEQFINAAK	3


In [112]:
# Now using the code from the previous file to remove modifications. 
# There will probably be some here I need to weed out as they didn't occur in JA2.
# Update: there were only 3! Many fewer peptides, period, though.
!sed 's/(+15.99)//g' /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja9-ja16/MED_ETNP_JA14_11.07.17_SPIDER_72/ja14-metagomics-input.txt \
| sed 's/(+57.02)//g'\
| sed 's/(+.98)//g'\
| sed 's/(+21.98)//g'\
| sed 's/(+28.03)//g'\
| sed 's/(+37.95)//g'\
| sed 's/(-17.03)//g'\
| sed 's/(+14.02)//g'\
| sed 's/(-15.99)//g'\
| sed 's/(+42.01)//g'\
| sed 's/(sub Q)//g'\
| sed 's/(+71.04)//g'\
| sed 's/(+55.92)//g'\
| sed 's/(sub A)//g'\
| sed 's/(+30.01)//g'\
| sed 's/(+3.99)//g'\
| sed 's/(+58.01)//g'\
| sed 's/(+43.01)//g'\
| sed 's/(+72.02)//g'\
| sed 's/(-18.01)//g'\
| sed 's/(sub D)//g'\
| sed 's/(sub G)//g'\
| sed 's/(+27.99)//g'\
| sed 's/(sub K)//g'\
| sed 's/(+37.96)//g'\
| sed 's/(+43.99)//g'\
| sed 's/(+21.97)//g'\
| sed 's/(+31.99)//g'\
| sed 's/(sub I)//g'\
| sed 's/(sub S)//g'\
| sed 's/(sub N)//g'\
| sed 's/(+15.01)//g'\
| sed 's/(-.98)//g'\
| sed 's/(+114.04)//g'\
| sed 's/(sub Y)//g'\
| sed 's/(sub T)//g'\
| sed 's/(del Q)//g'\
| sed 's/(-48.00)//g'\
| sed 's/(+100.02)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(+79.97)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(sub L)//g'\
| sed 's/(+53.92)//g'\
| sed 's/(sub V)//g'\
| sed 's/(sub M)//g'\
| sed 's/(+15.00)//g'\
| sed 's/(+150.04)//g'\
| sed 's/(+60.00)//g'\
| sed 's/(+119.04)//g'\
| sed 's/(+41.03)//g'\
| sed 's/(+29.02)//g'\
| sed 's/(+79.96)//g'\
| sed 's/(+282.05)//g'\
| sed 's/(del N)//g'\
| sed 's/(+228.11)//g'\
| sed 's/(sub E)//g'\
| sed 's/(+27.05)//g'\
| sed 's/(+87.03)//g'\
| sed 's/(-15.01)//g'\
| sed 's/(-1.03)//g'\
| sed 's/(sub F)//g'\
| sed 's/(del A)//g'\
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja9-ja16/MED_ETNP_JA14_11.07.17_SPIDER_72/ja14-metagomics-nomod-input.txt

In [111]:
!head -2000 /Users/meganduffy/Documents/git-repos/2017-etnp/data/ja9-ja16/MED_ETNP_JA14_11.07.17_SPIDER_72/ja14-metagomics-nomod-input.txt

SCAAAGTECLISGWGNTK	4
AIQQQIENPLAQQILSGELVPGK	3
LPQVEGTGGDVQPSQDLVR	6
AIQQQIENPLAQQILSGELVPGK	1
VATVSLPRSCAAAGTECLISGWGNTK	1
SSGSSYPSLLQCLK	2
LPQVEGTGGDVQPSQDLVR	1
LPQVEGTGGDVQPSQDLVR	2
SSGSSYPSLLQCLKAPVLSDSSCK	1
LGEHNIDVLEGNEQFINAAK	3
VIGQNEAVDAVSNAIR	4
LPQVEGTGGDVQPSQDLVR	1
LGEHNIDVLEGNEQFINAAK	1
LPQVEGTGGDVQPSQDLVR	1
IITHPNFNGNTLDNDIMLIK	1
VIGQNEAVDAVSNAIR	1
VIGQNEAVDAVSNAIR	2
VATVSLPR	1
SSGSSYPSLLQCLK	2
GQNEAVDAVSNAIR	1
SCAAAGTECLISGWGNTK	3
SCAAAGTECLISGWGNTK	1
VIGQNEAVDAVSNAIR	1
VIGQNEAVDAVSNAIR	1
SSGSSYPSLLQCLK	1
AIDLIDEAASSIR	7
AIDLIDEAASSLR	7
VIGQNEAVDAVSNAIR	1
VTDAEIAEVLAR	6
SCAAAGTECLISGWGNTK	1
EEEVGLDIAQNGER	1
SCAAAGTECLISGWGNTK	2
LGEHNIDVLEGNEQFINAAK	1
DLYGNQADASWNPR	1
ADIKAVKDK	1
LSSPATLNSR	2
TQFYNDEPEAIEYGENFIVHR	1
VIGQNEAVDAVSNAIR	1
SCAAAGTECLISGWGNTK	1
VTVEEPFYVRPEEHPGAI	1
VIGQNEPVDAVSNAIR	1
PGGYQANALAQR	1
LPQVEGTGGDVQPSQDLVR	2
ADEVVAAYDSGR	1
NNPVLIGEPGVGK	4
SSGSSYPSLLQCLK	1
VIGQN
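
Once the modifications are stripped, the same sequence shows up on several lines (`LPQVEGTGGDVQPSQDLVR` appears with counts 6, 1, 2 and more above). Whether MetaGOmics sums repeated sequences itself is not established here, so collapsing them first is just one option. A minimal sketch, with `collapse_peptides` as a hypothetical helper:

```python
def collapse_peptides(lines):
    """Sum spectral counts for identical sequences, keeping first-seen order."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        sequence, count = line.split("\t")
        totals[sequence] = totals.get(sequence, 0) + int(count)
    return totals

# A few lines from the stripped JA14 file above
lines = [
    "SCAAAGTECLISGWGNTK\t4",
    "AIQQQIENPLAQQILSGELVPGK\t3",
    "LPQVEGTGGDVQPSQDLVR\t6",
    "AIQQQIENPLAQQILSGELVPGK\t1",
    "LPQVEGTGGDVQPSQDLVR\t1",
]
collapsed = collapse_peptides(lines)
for sequence, total in collapsed.items():
    print(f"{sequence}\t{total}")
```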

I ran the MetaGOmics search with the same (default) parameters as before; the results are below. Again, no images available to download.

In [113]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja14/go_report_ja14_06.28.18.txt

# MetaGOmics GO report
# MetaGOmics version: 0.1.1
# Run date: Thu Jun 28 16:13:24 PDT 2018
GO acc	GO aspect	GO name	count	total count	ratio
GO:0016817	molecular_function	hydrolase activity, acting on acid anhydrides	17	138	0.12318840579710146
GO:0031328	biological_process	positive regulation of cellular biosynthetic process	15	138	0.10869565217391304
unknownprc	biological_process	unknown biological process	103	138	0.7463768115942029
GO:0016818	molecular_function	hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides	17	138	0.12318840579710146
unknownfun	molecular_function	unknown molecular function	114	138	0.8260869565217391
GO:0031329	biological_process	regulation of cellular catabolic process	2	138	0.014492753623188406


In [115]:
!head /Users/meganduffy/Documents/git-repos/2017-etnp/analyses/metagomics/susvsink/ja14/taxonomy_report_ja14_06.28.18.txt

# MetaGOmics taxonomy report
# MetaGOmics version: 0.1.1
# Run date: Thu Jun 28 16:13:31 PDT 2018
GO acc	GO aspect	GO name	taxon name	NCBI taxonomy id	taxonomy rank	PSM count	ratio of GO	ratio of run
GO:0016817	molecular_function	hydrolase activity, acting on acid anhydrides	root	1	no rank	17	1.0	0.12318840579710146
GO:0016817	molecular_function	hydrolase activity, acting on acid anhydrides	Eukaryota	2759	superkingdom	2	0.11764705882352941	0.014492753623188406
GO:0031328	biological_process	positive regulation of cellular biosynthetic process	root	1	no rank	15	1.0	0.10869565217391304
GO:0016818	molecular_function	hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides	root	1	no rank	17	1.0	0.12318840579710146
GO:0016818	molecular_function	hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides	Eukaryota	2759	superkingdom	2	0.11764705882352941	0.014492753623188406
unknownfun	molecular_function	unknown molecular function	Rhizo

Now for the sinking particles. Since they were run in duplicate, I'm going to combine both the protein FASTA files and the peptide files for each sample. I'll just do this here.

In [125]:
!ls /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/

264+265-proteins.fasta      268+269-proteins.fasta
264-metagomics-input.txt    268-metagomics-input.txt
264-proteins-spider37.fasta 268-proteins-spider54.fasta
265-metagomics-input.txt    269-metagomics-input.txt
265-proteins-spider41.fasta 269-proteins-spider58.fasta


In [120]:
# Concatenating the two FASTA files from each sample
!cat /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264-proteins-spider37.fasta /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/265-proteins-spider41.fasta \
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-proteins.fasta

In [122]:
!cat /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268-proteins-spider54.fasta /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/269-proteins-spider58.fasta \
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268+269-proteins.fasta
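
Plain `cat` keeps every record from both files, so a protein matched in both duplicates ends up in the combined FASTA twice. It is not established here whether that matters to MetaGOmics; if it does, one option is to merge while keeping only the first record for each header line. A minimal sketch (hypothetical function `merge_fasta`, shown on made-up records rather than the real files):

```python
def merge_fasta(*fasta_texts):
    """Concatenate FASTA texts, keeping the first record seen per header."""
    seen = set()
    merged = []
    for text in fasta_texts:
        keep = False
        for line in text.splitlines():
            if line.startswith(">"):
                # A new record starts; keep it only if the header is new
                keep = line not in seen
                seen.add(line)
            if keep and line:
                merged.append(line)
    return "\n".join(merged) + "\n"

# Made-up records for illustration only
a = ">protA\nMKV\nLLT\n>protB\nGGA\n"
b = ">protB\nGGA\n>protC\nTTS\n"
combined = merge_fasta(a, b)
print(combined, end="")
```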

In [126]:
# Now trying to concatenate the peptide data
# Concatenating the two MetaGOmics input files from each sample
!cat /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264-metagomics-input.txt /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/265-metagomics-input.txt \
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-input.txt

In [127]:
# Should have worked...
!head /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-input.txt

LPQVEGTGGDVQPSQDLVR	7
LPQVEGTGGDVQ(+.98)PSQDLVR	1
VIGQNEAVDAVSNAIR	4
AIQQQIENPLAQQILSGELVPGK	2
VIGQNEAVDAVSN(+.98)AIR	1
AIDLIDEAASSIR	7
AIDLID(+21.98)EAASSIR	1
NNPVLLGE(+21.98)PGVGK	2
NNPVLIGE(+21.98)PGVGK	2
NN(+.98)PVLIGEPGVGK	2


In [128]:
# Did it? Counting the number of lines
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264-metagomics-input.txt

     297     602    4573 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264-metagomics-input.txt


In [129]:
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/265-metagomics-input.txt

     291     589    4201 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/265-metagomics-input.txt


In [130]:
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-input.txt

     588    1190    8774 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-input.txt


In [131]:
297+291

588

Great!
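
The same line-count check can be wrapped in a small function so it is easy to repeat for each pair of files. A minimal sketch with hypothetical helper names (`count_lines`, `concatenation_is_complete`), demonstrated on temporary files rather than the real data:

```python
import os
import tempfile

def count_lines(path):
    """Count newline characters in a file, like `wc -l`."""
    with open(path, "rb") as handle:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: handle.read(65536), b""))

def concatenation_is_complete(part_paths, combined_path):
    """True when the combined file has as many lines as its parts together."""
    return sum(count_lines(p) for p in part_paths) == count_lines(combined_path)

# Self-contained demo on temporary files (not the real data)
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in ("a.txt", "b.txt", "ab.txt")]
    contents = ["AAA\t1\nCCC\t2\n", "DDD\t3\n"]
    contents.append("".join(contents))  # third file is the concatenation
    for path, text in zip(paths, contents):
        with open(path, "w") as handle:
            handle.write(text)
    ok = concatenation_is_complete(paths[:2], paths[2])
    print(ok)
```

Matching line counts do not prove the content is right. In particular, if the first file lacks a final newline, `cat` joins its last line to the first line of the second file.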

In [132]:
# Concatenating 268 and 269 (965 m suspended)
!cat /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268-metagomics-input.txt /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/269-metagomics-input.txt \
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268+269-metagomics-input.txt

In [133]:
# Did it? Counting the number of lines
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268-metagomics-input.txt

     459     922    7228 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268-metagomics-input.txt


In [134]:
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/269-metagomics-input.txt

     691    1410   11762 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/269-metagomics-input.txt


In [135]:
!wc /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268+269-metagomics-input.txt

    1150    2331   18990 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/268+269-metagomics-input.txt


In [139]:
# Now I need to get all the modifications out. Using the long bit of code from above to start.
# There will probably be some here I need to weed out as they didn't occur in JA2 or JA14.
# Update: there was only 1! Many fewer peptides, period, though.
# BUT! The concatenation
!sed 's/(+15.99)//g' /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-input.txt \
| sed 's/(+57.02)//g'\
| sed 's/(+.98)//g'\
| sed 's/(+21.98)//g'\
| sed 's/(+28.03)//g'\
| sed 's/(+37.95)//g'\
| sed 's/(-17.03)//g'\
| sed 's/(+14.02)//g'\
| sed 's/(-15.99)//g'\
| sed 's/(+42.01)//g'\
| sed 's/(sub Q)//g'\
| sed 's/(+71.04)//g'\
| sed 's/(+55.92)//g'\
| sed 's/(sub A)//g'\
| sed 's/(+30.01)//g'\
| sed 's/(+3.99)//g'\
| sed 's/(+58.01)//g'\
| sed 's/(+43.01)//g'\
| sed 's/(+72.02)//g'\
| sed 's/(-18.01)//g'\
| sed 's/(sub D)//g'\
| sed 's/(sub G)//g'\
| sed 's/(+27.99)//g'\
| sed 's/(sub K)//g'\
| sed 's/(+37.96)//g'\
| sed 's/(+43.99)//g'\
| sed 's/(+21.97)//g'\
| sed 's/(+31.99)//g'\
| sed 's/(sub I)//g'\
| sed 's/(sub S)//g'\
| sed 's/(sub N)//g'\
| sed 's/(+15.01)//g'\
| sed 's/(-.98)//g'\
| sed 's/(+114.04)//g'\
| sed 's/(sub Y)//g'\
| sed 's/(sub T)//g'\
| sed 's/(del Q)//g'\
| sed 's/(-48.00)//g'\
| sed 's/(+100.02)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(+79.97)//g'\
| sed 's/(+17.03)//g'\
| sed 's/(sub L)//g'\
| sed 's/(+53.92)//g'\
| sed 's/(sub V)//g'\
| sed 's/(sub M)//g'\
| sed 's/(+15.00)//g'\
| sed 's/(+150.04)//g'\
| sed 's/(+60.00)//g'\
| sed 's/(+119.04)//g'\
| sed 's/(+41.03)//g'\
| sed 's/(+29.02)//g'\
| sed 's/(+79.96)//g'\
| sed 's/(+282.05)//g'\
| sed 's/(del N)//g'\
| sed 's/(+228.11)//g'\
| sed 's/(sub E)//g'\
| sed 's/(+27.05)//g'\
| sed 's/(+87.03)//g'\
| sed 's/(-15.01)//g'\
| sed 's/(-1.03)//g'\
| sed 's/(sub F)//g'\
| sed 's/(del A)//g'\
| sed 's/(+29.04)//g'\
> /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-nomod-input.txt

In [140]:
!head -2000 /Users/meganduffy/Documents/git-repos/2017-etnp/data/264-269_netwash/combined-dups/264+265-metagomics-nomod-input.txt

LPQVEGTGGDVQPSQDLVR	7
LPQVEGTGGDVQPSQDLVR	1
VIGQNEAVDAVSNAIR	4
AIQQQIENPLAQQILSGELVPGK	2
VIGQNEAVDAVSNAIR	1
AIDLIDEAASSIR	7
AIDLIDEAASSIR	1
NNPVLLGEPGVGK	2
NNPVLIGEPGVGK	2
NNPVLIGEPGVGK	2
NNPVLLGEPGVGK	2
LPQVEGTGGDVQPSQDLVR	2
LPQVEGTGGDVQPSQDLVR	1
LGEHNIDVLEGNEQFINAAK	3
VTDAEIAEVLAR	1
AIDLIDEAASSIR	1
AIDLIDEAASSIR	1
VATVSLPR	1
GQNEAVDAVSNAIR	1
VTDAEIAEVLAR	5
NNPVLIGEPGVGK	4
NNPVLLGEPGVGK	4
QSVEPVLENAIK	1
VIGQNEAVDAV	1
QNEAVDAVSNAIR	1
LPQVEGTGGDVQPSQDLVR	1
LPQVEGTGNDVQPSQDLVR	1
NEAVDAVSNAIR	1
DAYVGDEAQSKR	1
VTDAEIAEVLAR	1
VIGANEAVDAVSNAIR	2
VIGQNEAVDAVSNAIR	1
LSSPATLNSR	2
LIKLSSPATLNSR	1
VIGQNEAVDAVSNAIR	1
VATVSLPR	3
LATVLSPR	2
QPLVEVARK	1
VIGQNEAVDAVSNAIR	1
VTDAEIAEVLAR	1
GDGDAAVAKTA	1
AGFAGDDAPR	1
QKVVDVR	2
NKVTDAEIAEVLAR	1
VTDAEIAEVLAR	1
SVEPVLENAIK	2
TAKKADPK	1
AITHFEK	8
ALTHFEK	8
NPVLIGEPGVGK	1
NPVLLGEPGVGK	1
LSDALAAIDLR	1
DDGQPLVEVAR	1
LDDGQPLVEVAR	1
GGGGDDLIR	1
GGGGDDLLR	