# Download NGS data from ENA

The ENA API user guide is available at https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access/advanced-search.html

The Swagger documentation is available at https://www.ebi.ac.uk/ena/portal/api/

An example search for raw read entries in ENA:
https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=country=%22United%20Kingdom%22%20AND%20host_tax_id=9913%20AND%20host_body_site=%22rumen%22

**NOTE**: adding the query parameter `format=json` returns the output in JSON format.
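
The example URL above is percent-encoded by hand, which is easy to get wrong. As a minimal sketch, the same URL can be assembled from readable parts with the standard library. The helper name `build_search_url` is hypothetical and not part of the ENA API, and no request is sent here. Note that this encoder also escapes `=` inside the query string as `%3D`, which decodes to the same value.

```python
from urllib.parse import quote, urlencode

ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_search_url(result, query, **extra):
    """Return a percent-encoded ENA search URL (hypothetical helper, no request sent)."""
    params = {"result": result, "query": query, **extra}
    # quote_via=quote encodes spaces as %20 rather than '+'
    return ENA_SEARCH_URL + "?" + urlencode(params, quote_via=quote)


url = build_search_url(
    "read_run",
    'country="United Kingdom" AND host_tax_id=9913 AND host_body_site="rumen"',
    format="json",
)
print(url)
```

When using `requests`, passing the same dictionary through the `params=` argument of `requests.get` achieves an equivalent encoding.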

In [1]:
import requests

## Fetch taxonomic information

Species taxonomic information can be fetched by scientific name among other attributes. We are interested in fetching the taxonomy identifier for subsequent queries.

The taxonomy API is documented at https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access/taxon-api.html



In [2]:
requests.get("https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/Severe%20acute%20respiratory%20syndrome%20coronavirus%202").json()

[{'taxId': '2697049',
  'scientificName': 'Severe acute respiratory syndrome coronavirus 2',
  'formalName': 'false',
  'rank': 'no rank',
  'division': 'VRL',
  'lineage': 'Viruses; Riboviria; Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Cornidovirineae; Coronaviridae; Orthocoronavirinae; Betacoronavirus; Sarbecovirus; ',
  'geneticCode': '1',
  'submittable': 'true'}]

In [3]:
requests.get("https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/Influenza%20A%20virus").json()

[{'taxId': '11320',
  'scientificName': 'Influenza A virus',
  'formalName': 'true',
  'rank': 'species',
  'division': 'VRL',
  'lineage': 'Viruses; Riboviria; Orthornavirae; Negarnaviricota; Polyploviricotina; Insthoviricetes; Articulavirales; Orthomyxoviridae; Alphainfluenzavirus; ',
  'geneticCode': '1',
  'submittable': 'true'}]

In [4]:
requests.get("https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/Influenza%20B%20virus").json()

[{'taxId': '11520',
  'scientificName': 'Influenza B virus',
  'formalName': 'true',
  'rank': 'species',
  'division': 'VRL',
  'lineage': 'Viruses; Riboviria; Orthornavirae; Negarnaviricota; Polyploviricotina; Insthoviricetes; Articulavirales; Orthomyxoviridae; Betainfluenzavirus; ',
  'geneticCode': '1',
  'submittable': 'false'}]
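
The three lookups above follow the same pattern: encode the scientific name into the URL, then read `taxId` from the returned list. A minimal sketch of that pattern is given below. The helper names `taxonomy_url` and `extract_tax_ids` are hypothetical, and the sample record is an abridged copy of the Influenza A output above, so nothing is fetched from the network.

```python
from urllib.parse import quote

TAXONOMY_URL = "https://www.ebi.ac.uk/ena/taxonomy/rest/scientific-name/"


def taxonomy_url(scientific_name):
    """Build the lookup URL, percent-encoding spaces in the name (hypothetical helper)."""
    return TAXONOMY_URL + quote(scientific_name, safe="")


def extract_tax_ids(records):
    """Map scientific name -> integer taxId from a parsed taxonomy response."""
    return {rec["scientificName"]: int(rec["taxId"]) for rec in records}


# Abridged copy of the Influenza A response shown above (no network call)
sample = [{"taxId": "11320", "scientificName": "Influenza A virus", "rank": "species"}]

print(taxonomy_url("Influenza A virus"))
print(extract_tax_ids(sample))
```

In the notebook, `records` would be the result of `requests.get(taxonomy_url(name)).json()`.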

## Fetch the identifiers for the NGS raw data entries

In [5]:
# without a limit parameter the search returns 100000 SARS-CoV-2 entries, the default limit
list_runs = requests.get("https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=tax_eq(2697049)&format=json").json()
len(list_runs)

100000

In [6]:
# adding limit=0 removes the cap and fetches all SARS-CoV-2 entries
list_runs = requests.get("https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=tax_eq(2697049)&limit=0&format=json").json()
len(list_runs)

152927
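
Fetching more than 150000 records in one response can be slow, so paging may be preferable. The sketch below only computes the `limit`/`offset` pairs needed to cover a known total; it sends no requests. It assumes the search endpoint accepts an `offset` parameter alongside `limit`, which should be checked against the Swagger documentation before use. The helper name `page_params` is hypothetical.

```python
def page_params(total, page_size):
    """Yield limit/offset pairs that together cover `total` records (hypothetical helper)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        # The last page is shortened so that limit is never 0,
        # because limit=0 means "no limit" for this API
        yield {"limit": min(page_size, total - offset), "offset": offset}


pages = list(page_params(152927, 50000))
for page in pages:
    print(page)
```

Each dictionary could then be merged into the `params=` argument of `requests.get` together with `result`, `query` and `format`.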

In [7]:
list_runs[0:5]

[{'run_accession': 'ERR4080473', 'description': 'MinION sequencing'},
 {'run_accession': 'ERR4080474', 'description': 'MinION sequencing'},
 {'run_accession': 'ERR4080475', 'description': 'MinION sequencing'},
 {'run_accession': 'ERR4080476', 'description': 'MinION sequencing'},
 {'run_accession': 'ERR4080477', 'description': 'MinION sequencing'}]

In [8]:
# this lists all the fields that can be returned for read_run results
requests.get("https://www.ebi.ac.uk/ena/portal/api/returnFields?result=read_run&format=json").json()

[{'columnId': 'study_accession', 'description': 'study accession number'},
 {'columnId': 'secondary_study_accession',
  'description': 'secondary study accession number'},
 {'columnId': 'sample_accession', 'description': 'sample accession number'},
 {'columnId': 'secondary_sample_accession',
  'description': 'secondary sample accession number'},
 {'columnId': 'experiment_accession',
  'description': 'experiment accession number'},
 {'columnId': 'run_accession', 'description': 'run accession number'},
 {'columnId': 'submission_accession',
  'description': 'submission accession number'},
 {'columnId': 'tax_id', 'description': 'taxonomic ID'},
 {'columnId': 'scientific_name', 'description': 'scientific name'},
 {'columnId': 'instrument_platform',
  'description': 'instrument platform used in sequencing experiment'},
 {'columnId': 'instrument_model',
  'description': 'instrument model used in sequencing experiment'},
 {'columnId': 'library_name', 'description': 'sequencing library name'},
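
The `returnFields` listing can also be used to check field names before they are sent in a search. The following is a minimal sketch under that assumption: the helper `build_fields_param` is hypothetical, and `sample_fields` is an abridged copy of the output above rather than a live response.

```python
def build_fields_param(requested, return_fields):
    """Join requested field names with commas after checking each is known."""
    known = {entry["columnId"] for entry in return_fields}
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(f"unknown fields: {', '.join(unknown)}")
    return ",".join(requested)


# Abridged copy of the returnFields output above (no network call)
sample_fields = [
    {"columnId": "run_accession", "description": "run accession number"},
    {"columnId": "tax_id", "description": "taxonomic ID"},
    {"columnId": "scientific_name", "description": "scientific name"},
]

print(build_fields_param(["run_accession", "tax_id"], sample_fields))
```

The returned string is in the comma-separated form expected by the `fields` query parameter of the search endpoint.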


In [9]:
# the fields parameter adds metadata columns, including the FTP URL of the FASTQ files
requests.get("https://www.ebi.ac.uk/ena/portal/api/search?result=read_run&query=tax_eq(2697049)&limit=5&fields=fastq_ftp,host_tax_id,host_sex,lat,lon,country&format=json").json()

[{'run_accession': 'ERR4080473',
  'sample_accession': 'SAMEA6798401',
  'fastq_ftp': 'ftp.sra.ebi.ac.uk/vol1/fastq/ERR408/003/ERR4080473/ERR4080473_1.fastq.gz',
  'host_tax_id': '9606',
  'host_sex': '',
  'lat': '',
  'lon': '',
  'country': 'Denmark'},
 {'run_accession': 'ERR4080474',
  'sample_accession': 'SAMEA6798402',
  'fastq_ftp': 'ftp.sra.ebi.ac.uk/vol1/fastq/ERR408/004/ERR4080474/ERR4080474_1.fastq.gz',
  'host_tax_id': '9606',
  'host_sex': '',
  'lat': '',
  'lon': '',
  'country': 'Denmark'},
 {'run_accession': 'ERR4080475',
  'sample_accession': 'SAMEA6798403',
  'fastq_ftp': 'ftp.sra.ebi.ac.uk/vol1/fastq/ERR408/005/ERR4080475/ERR4080475_1.fastq.gz',
  'host_tax_id': '9606',
  'host_sex': '',
  'lat': '',
  'lon': '',
  'country': 'Denmark'},
 {'run_accession': 'ERR4080476',
  'sample_accession': 'SAMEA6798404',
  'fastq_ftp': 'ftp.sra.ebi.ac.uk/vol1/fastq/ERR408/006/ERR4080476/ERR4080476_1.fastq.gz',
  'host_tax_id': '9606',
  'host_sex': '',
  'lat': '',
  'lon': '',
 