Reference: https://www.metagenomics.wiki/tools/fastq/ncbi-ftp-genome-download

In [1]:
import importlib
import MicroRG as mrg
importlib.reload(mrg)

<module 'MicroRG' from '/grain/wl61/github/MicroRG/src/MicroRG.py'>

**Initialization**

When you create a `MicroRG` object, it checks the current working directory for the assembly summary report and downloads it automatically if it is missing.

In [2]:
microbe_rg = mrg.MicroRG()
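
The check-then-download behaviour described above can be sketched roughly as follows. This is not MicroRG's own code: the function name `ensure_assembly_summary`, the `fetch` parameter, and the choice of the RefSeq summary URL are assumptions made for illustration, and the library may use a different report or file name.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: the RefSeq-wide summary is the report in question.
SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt"


def ensure_assembly_summary(directory=".", url=SUMMARY_URL, fetch=urlretrieve):
    """Return the local path of the report, downloading it only when missing."""
    # The local file keeps the name of the remote file.
    target = Path(directory) / url.rsplit("/", 1)[-1]
    if not target.exists():
        # `fetch` takes (url, filename), like urllib.request.urlretrieve.
        fetch(url, str(target))
    return target
```

Calling `ensure_assembly_summary()` with no arguments would look in the current working directory and hit the network only on the first run. The `fetch` argument exists only so the download step can be swapped out, for example in tests.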

**Query**

Take the fruit fly (Drosophila melanogaster) as an example.

In [3]:
results = microbe_rg.search(["Drosophila melanogaster"])

Search results: 1 / 1
Number of strain: 1
Number of unfound species: 0


In [4]:
results

({'drosophila melanogaster': [('',
    'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT')]},
 [])
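
Judging from the output above, `search` returns a pair: a dictionary that maps each lower-cased species name to a list of `(strain, ftp_directory)` tuples, and a list of names that were not found. The sketch below walks that structure using the value shown above. Reading the first tuple element as a strain label that may be empty is an interpretation of the output, not something the library documents here.

```python
# Return value copied from the output above: (found, unfound).
results = (
    {
        "drosophila melanogaster": [
            (
                "",
                "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/"
                "GCF_000001215.4_Release_6_plus_ISO1_MT",
            )
        ]
    },
    [],
)

found, unfound = results

for species_name, assemblies in found.items():
    for strain, ftp_dir in assemblies:
        # An empty string appears to mean that no strain label is recorded.
        label = strain or "(no strain recorded)"
        print(f"{species_name}\t{label}\t{ftp_dir}")

print(f"unfound: {len(unfound)}")
```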

**Example**

In [5]:
import pandas as pd

In [6]:
dir_file = "/grain/wl61/github/cancer_microbiome/data/genome_coverage/sp_list/sp_list.txt"
df_sp = pd.read_csv(dir_file, sep="\t")
df_sp.head()

| Unnamed: 0 | D | P | C | O | F | G | S |
|---|---|---|---|---|---|---|---|
| 0 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | Rothia mucilaginosa |
| 1 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | Rothia dentocariosa |
| 2 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | Rothia aeria |
| 3 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | Rothia kristinae |
| 4 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | Rothia nasimurium |


In [7]:
species = list(df_sp["S"])

In [8]:
results, unfound = microbe_rg.search(species, full_genome=True, strain_specific=False)

Search results: 5831 / 7927
Number of strain: 180072
Number of unfound species: 2096


In [9]:
unfound

['pseudomonas virus ab09',
 'lactococcus virus ascc532',
 'blautia sp. lzlj-3',
 'citrobacter virus sh3',
 'alkalihalobacillus pseudofirmus',
 'klebsiella virus zckp1',
 'escherichia virus f',
 'culex pipiens associated tunisia virus',
 'geobacillus sp. 44c',
 'nocardiopsis sp. 90127',
 'tetragenococcus muriaticus',
 'klebsiella sp. wp8-s18-esbl-06',
 'vibrio sp. sm1977',
 'nerine yellow stripe virus',
 'solitalea canadensis',
 'actinoplanes friuliensis',
 'bacillus vini',
 'spiroplasma taiwanense',
 'desulfopila sp. imcc35004',
 'gordonia virus wizard',
 'desulfotalea psychrophila',
 'bat hp-betacoronavirus zhejiang2013',
 'oscillibacter sp. nsj-62',
 'pseudomonas virus pakp1',
 'acholeplasma brassicae',
 'mycobacterium virus angelica',
 'filifactor alocis',
 'thermodesulfobium narugense',
 'phenylobacterium zucineum',
 'maridesulfovibrio hydrothermalis',
 'arthrobacter virus arv1',
 'gordonia virus ruthy',
 'escherichia virus biff',
 'mycobacterium virus gadjet',
 'mycobacterium viru
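
Many of the unfound names above are viruses or unnamed isolates (`sp.`). A quick way to see how the misses break down is to bucket them with a simple string heuristic. The sketch below uses a few names copied from the output above. The `classify` helper and its categories are illustrative only and will misfile unusual names.

```python
from collections import Counter

# A few names copied from the `unfound` output above.
unfound_sample = [
    "pseudomonas virus ab09",
    "blautia sp. lzlj-3",
    "alkalihalobacillus pseudofirmus",
    "culex pipiens associated tunisia virus",
    "bat hp-betacoronavirus zhejiang2013",
    "geobacillus sp. 44c",
]


def classify(name):
    """Rough heuristic: bucket a species name by the kind of entry it looks like."""
    if "virus" in name:
        return "virus"
    if "sp." in name.split():
        return "unnamed species (sp.)"
    return "named species"


counts = Counter(classify(name) for name in unfound_sample)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Running the same loop over the full `unfound` list would show which group accounts for most of the 2096 misses.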

In [10]:
results["eubacterium limosum"]

[('atcc 8486',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/807/675/GCF_000807675.2_ASM80767v2'),
 ('sa11',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/481/725/GCF_001481725.1_ASM148172v1'),
 ('8486cho',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/003/182/515/GCF_003182515.1_ASM318251v1'),
 ('dfi.6.107',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/559/625/GCF_020559625.1_ASM2055962v1'),
 ('b2',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/023/520/755/GCF_023520755.1_ASM2352075v1'),
 ('',
  'https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/900/683/775/GCF_900683775.1_ASM90068377v1')]
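
Each directory URL ends in a folder named `<accession>_<assembly name>`, so the assembly accession (for example `GCF_000807675.2`) can be recovered from the URL alone. A minimal sketch, assuming the directory layout visible in the URLs above; the helper name `assembly_accession` is hypothetical.

```python
import re

# Last path component: GCA_/GCF_ + nine digits + ".version", then "_<assembly name>".
ACCESSION_RE = re.compile(r"/(GC[AF]_\d{9}\.\d+)_[^/]*$")


def assembly_accession(ftp_dir):
    """Return the accession embedded in an NCBI genomes directory URL, or None."""
    match = ACCESSION_RE.search(ftp_dir.rstrip("/"))
    return match.group(1) if match else None


# Two entries copied from the output above.
strains = [
    ("atcc 8486",
     "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/807/675/GCF_000807675.2_ASM80767v2"),
    ("sa11",
     "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/481/725/GCF_001481725.1_ASM148172v1"),
]

for strain, ftp_dir in strains:
    print(strain, assembly_accession(ftp_dir))
```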

**Download the microbial reference genome (fna & gff) based on the query**

Shell file output (`as_shell_file=True`, written to `dir_shell_output`):

In [11]:
microbe_rg.download_ref_genome(
    "../raw/RefSeq/Micro+Virus/",
    dir_shell_output="../raw/RefSeq/download_shell_scripts/",
    as_shell_file=True,
)
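
For orientation, here is one way a download script could be derived from the directory URLs returned by `search`. It relies on the NCBI naming convention in which the files inside an assembly directory repeat the directory name (`<dir>/<name>_genomic.fna.gz`, `<dir>/<name>_genomic.gff.gz`). The helpers `genome_file_urls` and `wget_script` are hypothetical, and the scripts that MicroRG actually writes may look different.

```python
import shlex


def genome_file_urls(ftp_dir, suffixes=("genomic.fna.gz", "genomic.gff.gz")):
    """Build the fna/gff file URLs for one assembly directory URL."""
    ftp_dir = ftp_dir.rstrip("/")
    basename = ftp_dir.rsplit("/", 1)[-1]  # e.g. GCF_000807675.2_ASM80767v2
    return [f"{ftp_dir}/{basename}_{suffix}" for suffix in suffixes]


def wget_script(ftp_dirs, dest_dir):
    """Return the text of a bash script that fetches every file with wget."""
    dest = shlex.quote(dest_dir)
    lines = ["#!/bin/bash", f"mkdir -p {dest}"]
    for ftp_dir in ftp_dirs:
        for url in genome_file_urls(ftp_dir):
            # -nc skips files that already exist; -P sets the target directory.
            lines.append(f"wget -nc -P {dest} {shlex.quote(url)}")
    return "\n".join(lines) + "\n"


script = wget_script(
    ["https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/807/675/GCF_000807675.2_ASM80767v2"],
    "../raw/RefSeq/Micro+Virus/",
)
print(script)
```

Generating a script instead of downloading directly makes it easy to inspect the commands first or to hand them to a job scheduler.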