# Demo of endpoint rarest_variants

Documentation: http://geco.deib.polimi.it/popstudy/api/ui/#/default/server.api.rarest_variants

Requirements to run this demo: https://github.com/tomalf2/data_summarization_1KGP/blob/master/demo/README_requirements.txt

In this demo, we ask for the rarest variants found in a population of healthy female individuals from East Asian countries who carry the following two variants, each written as a tuple (chromosome)-(start)-(reference allele)-(alternative allele):

 1-13271-G-C

 1-10176--C

 Both are aligned on assembly hg19. The second variant has an empty reference allele, which is why two hyphens appear in a row.
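The tuple notation maps directly onto the dictionaries used in the request body further down. A minimal sketch of that conversion follows, assuming the four fields are always separated by single hyphens and that numeric chromosome names should become integers. `parse_variant` is a hypothetical helper written for this demo, not part of the API.

```python
def parse_variant(text):
    """Convert 'chrom-start-ref-alt' into the dict form used in the request body."""
    # An empty reference allele yields an empty string between two hyphens.
    chrom, start, ref, alt = text.strip().split('-')
    # Assumption: numeric chromosomes are sent as integers, others (e.g. 'X') as strings.
    chrom = int(chrom) if chrom.isdigit() else chrom
    return {'chrom': chrom, 'start': int(start), 'ref': ref, 'alt': alt}


variants = [parse_variant(v) for v in ('1-13271-G-C', '1-10176--C')]
print(variants)
```

A string with more or fewer than four fields raises a `ValueError` at the unpacking step, which is a reasonable failure mode for malformed input.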

A few considerations: the population considered in this example contains 22 individuals (you can check this by querying the endpoint /donor_distribution with the same request body). Since each individual carries about 4.5 million variants, we can estimate that our population includes about 99 million variants. As you can imagine, finding the rarest variants in this set takes some time; in this case, answering the request takes about 3m 10s (execution time can be roughly estimated as 10 s + 8 s × population size). If you wish to reduce the population size further, you can add constraints on the country of origin (for example, selecting only the BEB, Bangladesh, population) and on the DNA source type (for example, only blood), or add more region constraints.

In [8]:
import json
param = {
    'having_meta': {
        'health_status': "true",
        'super_population': ['EAS'],
        'gender': 'female',
        'assembly': 'hg19'
        },
    'having_variants': {
        'with': [
            {'chrom': 1, 'start': 10176, 'ref': '', 'alt': 'C'},
            {'chrom': 1, 'start': 13271, 'ref': 'G', 'alt': 'C'},
            ]
    },
    'filter_output': {
        'limit': 30,
        'min_frequency': 0.0099
        }
}
body = json.dumps(param)
print(body)

{"having_meta": {"health_status": "true", "super_population": ["EAS"], "gender": "female", "assembly": "hg19"}, "having_variants": {"with": [{"chrom": 1, "start": 10176, "ref": "", "alt": "C"}, {"chrom": 1, "start": 13271, "ref": "G", "alt": "C"}]}, "filter_output": {"limit": 30, "min_frequency": 0.0099}}


POST to the endpoint /rarest_variants with the prepared JSON parameter. Since this operation is very demanding, it can take some time. For a population of 7 individuals it takes about 65 seconds. In general, the execution time is about 10 s + 8.16 s × (number of individuals) for populations of up to 149 individuals, and roughly 45 to 55 minutes for larger populations.
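The linear rule of thumb can be turned into a small helper for deciding whether a request is worth waiting for. This is only a sketch of the stated estimate: `estimate_seconds` is a hypothetical name, and the function refuses populations above 149 individuals because the text gives only a broad range for those.

```python
def estimate_seconds(population_size):
    """Rough execution time in seconds, valid for up to 149 individuals."""
    if population_size > 149:
        raise ValueError('the linear estimate only applies up to 149 individuals')
    return 10 + 8.16 * population_size


for n in (7, 22):
    minutes, seconds = divmod(round(estimate_seconds(n)), 60)
    print('{} individuals: about {}m {:02d}s'.format(n, minutes, seconds))
```

For the 22 individuals of this demo the estimate comes to about 190 seconds, in line with the 3m 10s quoted earlier.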

In [6]:
import requests

r = requests.post('http://geco.deib.polimi.it/popstudy/api/rarest_variants', json=param)
print('response status code: {}'.format(r.status_code))
response_body = r.json()

response status code: 200


# Inspect response data
The response includes the rarest variants (from each data source) found in the individuals of the selected population. The table below shows 10 of them.

In [7]:
import pandas as pd
from matplotlib import pyplot as plt
columns = response_body['columns']
rows = response_body['rows']
df = pd.DataFrame.from_records(rows, columns=columns)
df.fillna(value='', inplace=True)   # replace Nones with empty values

df

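|   | CHROM | START | REF | ALT | POPULATION_SIZE | POSITIVE_DONORS | OCCURRENCE_OF_VARIANT | FREQUENCY_OF_VARIANT |
|---|-------|-------|-----|-----|-----------------|-----------------|-----------------------|----------------------|
| 0 | 1 | 76836 | T | G | 22 | 1 | 1 | 0.022727 |
| 1 | 1 | 77872 | G | A | 22 | 1 | 1 | 0.022727 |
| 2 | 1 | 72524 | A | G | 22 | 1 | 1 | 0.022727 |
| 3 | 1 | 74790 | G | A | 22 | 1 | 1 | 0.022727 |
| 4 | 1 | 72296 |   | TAT | 22 | 1 | 1 | 0.022727 |
| 5 | 1 | 77864 | C | T | 22 | 1 | 1 | 0.022727 |
| 6 | 1 | 64929 | G | A | 22 | 1 | 1 | 0.022727 |
| 7 | 1 | 68594 | T | G | 22 | 1 | 1 | 0.022727 |
| 8 | 1 | 74788 | C | G | 22 | 1 | 1 | 0.022727 |
| 9 | 1 | 60349 | A | G | 22 | 1 | 1 | 0.022727 |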
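The frequency values in the table appear consistent with dividing the number of occurrences by twice the population size, that is, counting two allele copies per individual on chromosome 1. This reading is inferred from the numbers shown here, not taken from the API documentation, and `variant_frequency` is a hypothetical helper used only to check the arithmetic.

```python
def variant_frequency(occurrence, population_size, copies_per_individual=2):
    """Occurrences divided by the total number of allele copies in the population."""
    # Assumption: each individual contributes two copies on an autosome.
    return occurrence / (copies_per_individual * population_size)


# Values taken from the OCCURRENCE_OF_VARIANT and POPULATION_SIZE columns.
frequency = variant_frequency(occurrence=1, population_size=22)
print(round(frequency, 6))
```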
