# Fetching data from EVA RS ID releases
## API swagger documentation - [Identifiers_API](https://www.ebi.ac.uk/eva/webservices/identifiers/swagger-ui.html)
We can use the API to query clustered data from EVA RS ID releases. This is variant data that has been submitted directly to the EVA, accessioned to receive an SS ID, then clustered into an RS ID.
We will use the **RS ID** as the input.

In [1]:
import requests
import json

In [2]:
# Query parameter - only the numeric string of the ID is required
rsid = "379920406"
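
Because only the numeric part of the ID is accepted, it can help to normalise the input before building the request URL. The following is a minimal sketch: `normalise_rsid` and `build_submitted_url` are hypothetical helper names, not part of the EVA API, and the URL template is the one used in the next cell.

```python
# Hypothetical helpers; only the URL template comes from this notebook.
BASE_URL = "https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants"


def normalise_rsid(raw):
    """Return the numeric part of an RS ID given as 'rs123', 'RS123', '123' or 123."""
    text = str(raw).strip()
    if text.lower().startswith("rs"):
        text = text[2:]
    if not text.isdigit():
        raise ValueError(f"Not a valid RS ID: {raw!r}")
    return text


def build_submitted_url(raw):
    """Build the URL that lists the submitted variants clustered under an RS ID."""
    return f"{BASE_URL}/{normalise_rsid(raw)}/submitted"


print(build_submitted_url("rs379920406"))
```

Rejecting anything that is not purely numeric after the optional prefix keeps malformed input out of the URL.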

In [3]:
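# Ask for every submitted variant (SS ID) clustered under this RS ID
request_url = f'https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants/{rsid}/submitted'
response = requests.get(request_url)
response.raise_for_status()  # stop here if the request failed
output = response.json()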

In [4]:
print(output)

[{'accession': 7626432460, 'version': 1, 'data': {'referenceSequenceAccession': 'GCA_002263795.2', 'taxonomyAccession': 9913, 'projectAccession': 'PRJEB47999', 'contig': 'CM008173.2', 'start': 85411136, 'referenceAllele': 'C', 'alternateAllele': 'T', 'clusteredVariantAccession': 379920406, 'supportedByEvidence': True, 'assemblyMatch': True, 'allelesMatch': True, 'validated': False, 'mapWeight': None, 'createdDate': '2021-10-12T11:33:46.097', 'remappedFrom': None, 'remappedDate': None, 'remappingId': None, 'backPropagatedVariantAccession': None}}, {'accession': 7571130482, 'version': 1, 'data': {'referenceSequenceAccession': 'GCA_002263795.2', 'taxonomyAccession': 9913, 'projectAccession': 'PRJEB46861', 'contig': 'CM008173.2', 'start': 85411136, 'referenceAllele': 'C', 'alternateAllele': 'T', 'clusteredVariantAccession': 379920406, 'supportedByEvidence': True, 'assemblyMatch': True, 'allelesMatch': True, 'validated': False, 'mapWeight': None, 'createdDate': '2021-09-08T13:08:25.551', 'r

Our general query returns a lot of information, including the date the ID was created and whether the variant has been remapped to a newer assembly. For this task we want the associated clustered SS IDs, the species and assembly, and the variant location.
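
As a small illustration of reading one of these fields, the sketch below splits records by whether `remappedFrom` is set. It assumes that a `None` value means the record was not remapped from another assembly, which is a reading of the output above rather than something stated in the API documentation. `split_by_remapping` is a hypothetical helper, the sample records are trimmed to the fields used, and the second record is invented for the example.

```python
def split_by_remapping(records):
    """Return (original, remapped) lists of SS accessions.

    Assumption: a record whose 'remappedFrom' is None was not remapped.
    """
    original, remapped = [], []
    for record in records:
        is_original = record["data"].get("remappedFrom") is None
        (original if is_original else remapped).append(record["accession"])
    return original, remapped


# Trimmed sample input; the second record is made up for illustration only.
sample = [
    {"accession": 7626432460, "data": {"remappedFrom": None}},
    {"accession": 1, "data": {"remappedFrom": "GCA_EXAMPLE.1"}},
]

original_ids, remapped_ids = split_by_remapping(sample)
print(original_ids, remapped_ids)
```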

We can parse the JSON output to find the data of interest and assign it to variables.

In [5]:
# Collect the SS ID (accession) of every record returned for the cluster
ss_list = [record['accession'] for record in output]

# The remaining fields are read from the first record
first = output[0]['data']
assembly = first['referenceSequenceAccession']
chrom = first['contig']
locat = first['start']
species = first['taxonomyAccession']
ref = first['referenceAllele']
alt = first['alternateAllele']
rsid = first['clusteredVariantAccession']

As the API uses INSDC nomenclature, we can use other endpoints to convert the accessions to names.

In [6]:
species_list_url = 'https://www.ebi.ac.uk/eva/webservices/rest/v1/meta/species/list'
list_res = requests.get(species_list_url)
species_list = list_res.json()['response'][0]['result']
# Look up the assembly name and the scientific name in a single pass
for entry in species_list:
    if entry['assemblyAccession'] == assembly:
        assembly_code = entry['assemblyName']
    if entry['taxonomyId'] == species:
        taxonomy_code = entry['taxonomyScientificName']
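
If no entry matches, the loop above leaves the target variable undefined and a later cell fails with a `NameError`. One possible alternative, sketched here with a hypothetical helper `find_first`, returns `None` instead. It takes the first match, whereas the loop keeps the last one. The sample entry uses only field names and values that appear in this notebook.

```python
def find_first(entries, key, value, field):
    """Return `field` from the first entry where entry[key] == value, else None."""
    return next((e[field] for e in entries if e.get(key) == value), None)


# Sample shaped like one entry of the species list used above
sample_species = [
    {
        "assemblyAccession": "GCA_002263795.2",
        "assemblyName": "ARS-UCD1.2",
        "taxonomyId": 9913,
        "taxonomyScientificName": "Bos taurus",
    }
]

assembly_name = find_first(sample_species, "assemblyAccession", "GCA_002263795.2", "assemblyName")
species_name = find_first(sample_species, "taxonomyId", 9913, "taxonomyScientificName")
missing = find_first(sample_species, "taxonomyId", 0, "taxonomyScientificName")
print(assembly_name, species_name, missing)
```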

In [7]:
insdc_contig = f'https://www.ebi.ac.uk/eva/webservices/contig-alias/v1/chromosomes/genbank/{chrom}'
contig_res = requests.get(insdc_contig)
chromosomes = contig_res.json()['_embedded']['chromosomeEntities']
for z in chromosomes:
    if z['insdcAccession'] == chrom:
        chrom_num = z['genbankSequenceName']

In [8]:
print(f'RS_ID = rs{rsid}')
print(f'SS_ID(s) = {ss_list}')
print(f'Species/Assembly = {taxonomy_code}/{assembly_code}')
print(f'Variant_location = {chrom_num}:{locat}')
print(f'REF/ALT = {ref}/{alt}')

RS_ID = rs379920406
SS_ID(s) = [7626432460, 7571130482, 7217211020, 1088931533, 683318726, 1451546628, 1956275935, 758509205, 2126697835, 1515067307, 1088931533, 683318726, 1451546628, 1829370055, 2126697835, 830915657, 758509205, 1956275935, 1515067307, 1829370055, 830915657, 1404048026, 1404048026]
Species/Assembly = Bos taurus/ARS-UCD1.2
Variant_location = 6:85411136
REF/ALT = C/T
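
The SS ID list above contains repeats (1088931533, for instance, appears twice), which suggests the endpoint returns more than one record for some SS IDs. If a list of unique IDs is preferred, one option is to deduplicate while keeping the original order. This is a sketch on a shortened copy of the list, not part of the original workflow.

```python
# Shortened copy of the SS ID list printed above, including repeated values
ss_ids = [7626432460, 7571130482, 1088931533, 683318726, 1088931533, 683318726]

# dict keys are unique and keep insertion order, so this drops later repeats
unique_ss_ids = list(dict.fromkeys(ss_ids))
print(unique_ss_ids)
```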
