# Fetching data from EVA RS ID releases
## API swagger documentation - [Identifiers_API](https://www.ebi.ac.uk/eva/webservices/identifiers/swagger-ui.html)
We can use the API to query clustered data from EVA RS ID releases. This is variant data that has been submitted directly to the EVA, accessioned to receive an SS ID, then clustered into an RS ID.
In this example, our RS ID of interest is rs379920406, which we use as the query input.

In [1]:
import requests
import json

In [2]:
# Query parameter - only the numeric part of the ID is required
rsid = "379920406"

In [3]:
request_url = f'https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants/{rsid}/submitted'
response = requests.get(request_url)
output = response.json()
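
Because only the numeric part of the RS ID goes into the URL, it can help to normalise the input before building the request. The following is a minimal sketch of one way to do that. `normalise_rsid` and `build_submitted_url` are hypothetical helper names introduced here for illustration (they are not part of the EVA API), and the URL pattern is simply the one used in the request above.

```python
BASE_URL = "https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants"


def normalise_rsid(raw_id):
    """Return the numeric part of an RS ID given as 'rs123', 'RS123', '123' or 123."""
    text = str(raw_id).strip()
    if text.lower().startswith("rs"):
        text = text[2:]
    if not text.isdigit():
        raise ValueError(f"Not a valid RS ID: {raw_id!r}")
    return text


def build_submitted_url(raw_id):
    """Build the URL listing the submitted variants clustered under an RS ID."""
    return f"{BASE_URL}/{normalise_rsid(raw_id)}/submitted"


print(build_submitted_url("rs379920406"))
```

The resulting URL can be passed to `requests.get` as before. Calling `response.raise_for_status()` before `response.json()` is one way to surface HTTP errors early rather than failing later on unexpected content.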

In [4]:
print(json.dumps(output[0], indent=2))
print('Number of submitted variant=' + str(len(output)))

{
  "accession": 7571130482,
  "version": 1,
  "data": {
    "referenceSequenceAccession": "GCA_002263795.2",
    "taxonomyAccession": 9913,
    "projectAccession": "PRJEB46861",
    "contig": "CM008173.2",
    "start": 85411136,
    "referenceAllele": "C",
    "alternateAllele": "T",
    "clusteredVariantAccession": 379920406,
    "supportedByEvidence": true,
    "assemblyMatch": true,
    "allelesMatch": true,
    "validated": false,
    "mapWeight": null,
    "createdDate": "2021-09-08T13:08:25.551",
    "remappedFrom": null,
    "remappedDate": null,
    "remappingId": null,
    "backPropagatedVariantAccession": null
  }
}
Number of submitted variant=23


The response provides a wealth of information for each record, including the date the ID was created and whether the variant has been remapped to a newer assembly.
It also tells us the number of submitted variant records that have been clustered under this RS ID (23).
For this task, we want to find the associated clustered SS IDs, associated projects, species/assembly, and variant location.
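
Before extracting individual fields, it can be useful to check how the records are distributed across assemblies and how many carry remapping information. The sketch below assumes each record has the `data.referenceSequenceAccession` and `data.remappedFrom` fields shown in the JSON output above. `summarise_records` is a hypothetical helper, and the second sample record uses placeholder values that do not come from the API.

```python
from collections import Counter


def summarise_records(records):
    """Count records per reference assembly and how many have remapping info."""
    per_assembly = Counter(r["data"]["referenceSequenceAccession"] for r in records)
    remapped = sum(1 for r in records if r["data"].get("remappedFrom") is not None)
    return per_assembly, remapped


# Illustrative records that mimic the structure of the API output
sample_records = [
    {"accession": 7571130482,
     "data": {"referenceSequenceAccession": "GCA_002263795.2", "remappedFrom": None}},
    # Placeholder record: accession and remappedFrom values are made up
    {"accession": 1,
     "data": {"referenceSequenceAccession": "GCA_002263795.2", "remappedFrom": "GCA_EXAMPLE.1"}},
]

per_assembly, remapped = summarise_records(sample_records)
print(dict(per_assembly))
print("Remapped records:", remapped)
```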

We can parse the JSON output to find the data of interest and assign it to variables.

In [5]:
# SS IDs and project accessions for every submitted variant record
ss_list = [record['accession'] for record in output]
prjeb_list = [record['data']['projectAccession'] for record in output]
# The remaining fields are read from the first record only
first = output[0]['data']
assembly = first['referenceSequenceAccession']
chrom = first['contig']
locat = first['start']
species = first['taxonomyAccession']
ref = first['referenceAllele']
alt = first['alternateAllele']
rsid = first['clusteredVariantAccession']
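
The printed SS ID and project lists further down contain some values more than once, so a de-duplicated view may be easier to read. This is a minimal sketch, assuming each record has the `accession` and `data.projectAccession` fields shown in the JSON output above. `unique_in_order` is a hypothetical helper, and the sample records are a hand-picked illustration rather than a full API response.

```python
def unique_in_order(values):
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(values))


# Illustrative records that mimic the structure of the API output
sample_output = [
    {"accession": 7571130482, "data": {"projectAccession": "PRJEB46861"}},
    {"accession": 683318726, "data": {"projectAccession": "TZ_TUM_43_FV_SEQ SNPS"}},
    {"accession": 683318726, "data": {"projectAccession": "TZ_TUM_43_FV_SEQ SNPS"}},
]

unique_ss = unique_in_order(r["accession"] for r in sample_output)
unique_projects = unique_in_order(r["data"]["projectAccession"] for r in sample_output)

print(f"{len(sample_output)} records, {len(unique_ss)} unique SS IDs")
print(unique_ss)
print(unique_projects)
```

`dict.fromkeys` is used instead of `set` so that the order of first appearance is preserved.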

In [6]:
# Species metadata endpoint: maps assembly accessions and taxonomy IDs to names
species_list_url = 'https://www.ebi.ac.uk/eva/webservices/rest/v1/meta/species/list'
list_res = requests.get(species_list_url)
species_list = list_res.json()['response'][0]['result']
for entry in species_list:
    if entry['assemblyAccession'] == assembly:
        assembly_code = entry['assemblyName']
    if entry['taxonomyId'] == species:
        taxonomy_code = entry['taxonomyScientificName']
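
If no entry matches, the lookup above leaves `assembly_code` or `taxonomy_code` undefined, and a later cell would fail with a `NameError`. One possible alternative is a lookup that returns a default value instead. In this sketch, `first_match` is a hypothetical helper and the sample entry only reuses the field names and values already seen in this notebook. Note that it returns the first matching entry, whereas the loop above keeps the last one.

```python
def first_match(entries, key, value, field, default=None):
    """Return entry[field] for the first entry where entry[key] == value."""
    return next((e[field] for e in entries if e.get(key) == value), default)


# Illustrative entry using the field names from the species list response
sample_species_list = [
    {
        "assemblyAccession": "GCA_002263795.2",
        "assemblyName": "ARS-UCD1.2",
        "taxonomyId": 9913,
        "taxonomyScientificName": "Bos taurus",
    },
]

assembly_name = first_match(
    sample_species_list, "assemblyAccession", "GCA_002263795.2", "assemblyName", default="unknown"
)
species_name = first_match(
    sample_species_list, "taxonomyId", 9913, "taxonomyScientificName", default="unknown"
)
missing = first_match(sample_species_list, "taxonomyId", -1, "taxonomyScientificName", default="unknown")

print(assembly_name, species_name, missing)
```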

As the API uses INSDC nomenclature, we can use another endpoint to convert the accessions to names.

In [7]:
insdc_contig = f'https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/genbank/{chrom}'
contig_res = requests.get(insdc_contig)
chromosomes = contig_res.json()['_embedded']['chromosomeEntities']
for z in chromosomes:
    if z['insdcAccession'] == chrom:
        chrom_num = z['genbankSequenceName']
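
The same conversion can also be expressed as a dictionary lookup, which makes it easy to fall back to the raw accession when no name is available. This is a sketch under the assumption that the contig-alias response has the `_embedded.chromosomeEntities` structure used above. `contig_name_map` is a hypothetical helper and `sample_response` is a hand-made stand-in for the real response.

```python
def contig_name_map(contig_response):
    """Map INSDC accession -> GenBank sequence name from a contig-alias response."""
    entities = contig_response.get("_embedded", {}).get("chromosomeEntities", [])
    return {e["insdcAccession"]: e["genbankSequenceName"] for e in entities}


# Hand-made stand-in for the contig-alias response
sample_response = {
    "_embedded": {
        "chromosomeEntities": [
            {"insdcAccession": "CM008173.2", "genbankSequenceName": "6"},
        ]
    }
}

names = contig_name_map(sample_response)
contig = "CM008173.2"
start = 85411136
# Fall back to the accession itself when no name is known
location = f"{names.get(contig, contig)}:{start}"
print(location)  # 6:85411136
```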

In [8]:
print(f'RS_ID = rs{rsid}')
print(f'SS_ID(s) = {ss_list}')
print(f'Submission_project(s) ={prjeb_list}')
print(f'Species/Assembly = {taxonomy_code}/{assembly_code}')
print(f'Variant_location = {chrom_num}:{locat}')
print(f'REF/ALT = {ref}/{alt}')

RS_ID = rs379920406
SS_ID(s) = [7571130482, 7626432460, 7217211020, 683318726, 1451546628, 1088931533, 758509205, 2126697835, 1515067307, 1956275935, 1088931533, 683318726, 1404048026, 1451546628, 1404048026, 1829370055, 2126697835, 830915657, 758509205, 1956275935, 1515067307, 1829370055, 830915657]
Submission_project(s) =['PRJEB46861', 'PRJEB47999', 'PRJEB38336', 'TZ_TUM_43_FV_SEQ SNPS', 'WBARENDSE_BRA2 - SNP', 'PRJEB7061', '1000_BULL_GENOMES_1000_BULL_GENOMES_RUN2', 'BIOPOP_WHOLE_GENOME_SNP_ASSAY', 'PRJEB6119', 'AU-MBG-MOLGEN_WHOLEGENOME_SNP_DISCOVERY', 'PRJEB7061', 'TZ_TUM_43_FV_SEQ SNPS', 'HAFL_QUALITAS_20140601_SLIG', 'WBARENDSE_BRA2 - SNP', 'HAFL_QUALITAS_20140601_SLIG', 'MARC_GPE_BULL_GENEX', 'BIOPOP_WHOLE_GENOME_SNP_ASSAY', 'IZ-PIB_SNP_INDEL_DISCOVERY', '1000_BULL_GENOMES_1000_BULL_GENOMES_RUN2', 'AU-MBG-MOLGEN_WHOLEGENOME_SNP_DISCOVERY', 'PRJEB6119', 'MARC_GPE_BULL_GENEX', 'IZ-PIB_SNP_INDEL_DISCOVERY']
Species/Assembly = Bos taurus/ARS-UCD1.2
Variant_location = 6:85411136
REF