# Dead Man’s Teeth. Introduction to metagenomics analysis.

## Part 1. Amplicon sequencing.


### 1. QIIME2 installation


### 2. Importing data.


`qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path manifest.tsv --output-path sequences.qza --input-format SingleEndFastqManifestPhred33V2`

_Imported manifest.tsv as SingleEndFastqManifestPhred33V2 to sequences.qza_
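
The manifest passed to `--input-path` is a tab-separated file; for `SingleEndFastqManifestPhred33V2` it carries a `sample-id` column and an `absolute-filepath` column. A minimal sketch of building such a file follows. The sample IDs and FASTQ paths are hypothetical placeholders, not the files used in this analysis, and the helper name `build_manifest` is made up for illustration.

```python
import csv
import io

# Hypothetical sample IDs and FASTQ locations; replace with the real ones.
samples = {
    "sample-a": "/path/to/sample-a.fastq.gz",
    "sample-b": "/path/to/sample-b.fastq.gz",
}

def build_manifest(samples):
    """Return manifest text in the single-end V2 layout (tab-separated)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header expected by the single-end V2 manifest format.
    writer.writerow(["sample-id", "absolute-filepath"])
    for sample_id, path in samples.items():
        writer.writerow([sample_id, path])
    return buffer.getvalue()

manifest_text = build_manifest(samples)
print(manifest_text)
```

Writing `manifest_text` to `manifest.tsv` would give a file in the layout the import command reads.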

`qiime tools validate sequences.qza`

_Result sequences.qza appears to be valid at level=max._

### 3. Demultiplexing and QC

`qiime demux summarize   --i-data sequences.qza   --o-visualization sequences.qzv`

_Saved Visualization to: sequences.qzv_

### 4. Feature table construction (and more QC)


`qiime dada2 denoise-single   --i-demultiplexed-seqs sequences.qza   --p-trim-left 32 --p-trunc-len 100 --o-representative-sequences rep-seqs.qza --o-table table.qza --o-denoising-stats stats.qza`

_Saved FeatureTable[Frequency] to: table.qza_

_Saved FeatureData[Sequence] to: rep-seqs.qza_

_Saved SampleData[DADA2Stats] to: stats.qza_
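
With `--p-trunc-len 100` each read is cut at position 100 and reads shorter than that are discarded, while `--p-trim-left 32` removes the first 32 bases, so surviving reads should be 68 bases long. The sketch below reproduces only this length arithmetic, not DADA2's quality filtering or denoising. The function name and the toy reads are made up for illustration.

```python
def trim_read(seq, trim_left=32, trunc_len=100):
    """Mimic the length handling of trim-left / trunc-len on one read.

    Returns None when the read is shorter than trunc_len (it would be
    discarded), otherwise the bases from trim_left up to trunc_len.
    """
    if len(seq) < trunc_len:
        return None
    return seq[trim_left:trunc_len]

# Toy reads, not real data: one long enough, one too short.
long_read = "ACGT" * 30   # 120 bases
short_read = "ACGT" * 20  # 80 bases

kept = trim_read(long_read)
dropped = trim_read(short_read)
print(len(kept), dropped)
```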

`qiime metadata tabulate   --m-input-file stats.qza   --o-visualization stats.qzv`

| sample | percentage of input passed filter (numeric) |
|---|---|
| bone | - |
| calculus | - |
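
The "percentage of input passed filter" column is the share of input reads that remain after the filtering step. A small sketch of that calculation, assuming read counts per sample are available; the counts below are hypothetical and are not the values from `stats.qzv`.

```python
def percent_passed(input_reads, filtered_reads):
    """Percentage of input reads that survived the filtering step."""
    if input_reads == 0:
        return 0.0
    return round(100.0 * filtered_reads / input_reads, 2)

# Hypothetical counts for illustration only, not values from stats.qzv.
example = percent_passed(input_reads=10000, filtered_reads=8500)
print(example)
```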

### 5. FeatureTable and FeatureData summaries


`qiime feature-table summarize   --i-table table.qza   --o-visualization table.qzv   --m-sample-metadata-file sample-metadata.tsv`

`qiime feature-table tabulate-seqs  --i-data rep-seqs.qza   --o-visualization rep-seqs.qzv`

### 6. Taxonomic analysis

`qiime feature-classifier classify-sklearn   --i-classifier gg-13-8-99-nb-classifier.qza   --i-reads rep-seqs.qza   --o-classification taxonomy.qza`

_Saved FeatureData[Taxonomy] to: taxonomy.qza_

`qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv`
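
The Greengenes classifier reports each feature's taxonomy as a single semicolon-separated string with rank prefixes (`k__`, `p__`, ... `s__`). A rough sketch of splitting such a string into named ranks, which can help when looking for a genus of interest in the tabulated taxonomy; the helper name is hypothetical and the lineage string is an illustrative example, not a record from this dataset.

```python
RANKS = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}

def parse_taxon(taxon):
    """Split a Greengenes-style lineage into a {rank: name} dict.

    Ranks with an empty name (e.g. 's__') are skipped.
    """
    lineage = {}
    for part in taxon.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in RANKS and name:
            lineage[RANKS[prefix]] = name
    return lineage

# Illustrative lineage string, not taken from this dataset.
example = (
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
    "f__Streptococcaceae; g__Streptococcus; s__"
)
lineage = parse_taxon(example)
print(lineage)
```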



`qiime taxa barplot \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots.qzv`
  
_Saved Visualization to: taxa-bar-plots.qzv_ 

## Part 2. Shotgun sequencing.

### 1-3: Problems with MetaPhlAn

### 4. Comparison with the ancient _Tannerella forsythia_ genome


In [7]:
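file_path = '/media/data/prakt_7/test_1/G12_out.txt'
gene_lines = []  # GFF3 lines whose source/type columns are RefSeq/gene
with open(file_path, 'r') as input_file:
    for line in input_file:
        # Match the tab-separated "source<TAB>type" pair of a gene record
        if 'RefSeq\tgene' in line:
            gene_lines.append(line)
cnt = len(gene_lines)  # number of matching gene records
output_text = ''.join(gene_lines)  # matching lines joined for printing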

In [8]:
cnt

172
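
The substring test in cell 7 keeps any line that contains `RefSeq\tgene`. A slightly stricter sketch splits a GFF3 line into its nine tab-separated columns and reads the `key=value` attributes, which gives access to the locus tag and gene length. The helper name is hypothetical, and the sample line is the first gene record of `G12_out.txt` as printed in this section.

```python
def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict with typed coordinates."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated GFF3 columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Column 9 holds semicolon-separated key=value pairs.
    attributes = dict(
        item.split("=", 1) for item in attrs.split(";") if "=" in item
    )
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end), "strand": strand,
        "attributes": attributes,
    }

line = (
    "NC_016610.1\tRefSeq\tgene\t106544\t107707\t.\t+\t.\t"
    "ID=gene-BFO_RS00425;Name=BFO_RS00425;gbkey=Gene;"
    "gene_biotype=protein_coding;locus_tag=BFO_RS00425;old_locus_tag=BFO_0094"
)
gene = parse_gff3_line(line)
# GFF3 coordinates are 1-based and inclusive, hence the +1.
gene_length = gene["end"] - gene["start"] + 1
print(gene["attributes"]["locus_tag"], gene_length)
```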

In [9]:
print(output_text)

NC_016610.1	RefSeq	gene	106544	107707	.	+	.	ID=gene-BFO_RS00425;Name=BFO_RS00425;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00425;old_locus_tag=BFO_0094
NC_016610.1	RefSeq	gene	119304	119939	.	+	.	ID=gene-BFO_RS00480;Name=BFO_RS00480;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00480;old_locus_tag=BFO_0103
NC_016610.1	RefSeq	gene	119936	121180	.	+	.	ID=gene-BFO_RS00485;Name=BFO_RS00485;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00485;old_locus_tag=BFO_0104
NC_016610.1	RefSeq	gene	121228	121548	.	+	.	ID=gene-BFO_RS00490;Name=BFO_RS00490;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00490
NC_016610.1	RefSeq	gene	121611	122735	.	+	.	ID=gene-BFO_RS00495;Name=BFO_RS00495;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00495;old_locus_tag=BFO_0105
NC_016610.1	RefSeq	gene	125074	127296	.	+	.	ID=gene-BFO_RS00510;Name=BFO_RS00510;gbkey=Gene;gene_biotype=protein_coding;locus_tag=BFO_RS00510;old_locus_tag=BFO_0109
NC_016610.1	RefSeq	gene	127293	12