# Run the server script from package data_summarization_1KGP

Run the module `main.py` from the base project directory with the program arguments `server <db_user> <db_password>`, replacing `<db_user>` and `<db_password>` with the appropriate values.

Prepare the request parameter that selects only the female individuals from Bangladesh having the following variants, aligned on assembly hg19 and written as (chromosome)-(start)-(reference allele)-(alternative allele):

 1-13271-G-C

 1-10176--C (the reference allele is empty)

The population defined this way contains 7 individuals (you can check this by querying the endpoint /donor_distribution with the same request body). Since each individual carries about 4.5 million variants, we can estimate that our population includes about 31.5 million variants (7 × 4.5 million).

In [34]:
import json
param = {
    'meta': {
        'health_status': "true",
        'population': ['BEB'],
        'gender': 'female',
        'assembly': 'hg19'
        },
    'variants': {
        'with': [
            {'chrom': 1, 'start': 10176, 'ref': '', 'alt': 'C'},
            {'chrom': 1, 'start': 13271, 'ref': 'G', 'alt': 'C'},
            ]
    },
    'filter_output': {
        'limit': 30,
        'min_frequency': 0.0099
        }
}
body = json.dumps(param)
print(body)

{"meta": {"health_status": "true", "population": ["BEB"], "gender": "female", "assembly": "hg19"}, "variants": {"with": [{"chrom": 1, "start": 10176, "ref": "", "alt": "C"}, {"chrom": 1, "start": 13271, "ref": "G", "alt": "C"}]}, "filter_output": {"limit": 30, "min_frequency": 0.0099}}
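
To check the population size mentioned above, the same body can be sent to the donor_distribution endpoint. The following is a minimal sketch using only the standard library. The helper name `build_json_post` is hypothetical, the host and port are assumed to be those of the server used in this notebook, and the structure of the response is not described here, so the sketch only builds the request and leaves sending it as a commented step.

```python
import json
from urllib import request

BASE_URL = 'http://localhost:51992'   # assumed: host and port of the running server


def build_json_post(endpoint, payload, base_url=BASE_URL):
    """Hypothetical helper: build (but do not send) a JSON POST request."""
    data = json.dumps(payload).encode('utf-8')
    return request.Request(
        url='{}/{}'.format(base_url, endpoint.strip('/')),
        data=data,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Same request parameter as the one prepared above.
example_param = {
    'meta': {
        'health_status': "true",
        'population': ['BEB'],
        'gender': 'female',
        'assembly': 'hg19'
    },
    'variants': {
        'with': [
            {'chrom': 1, 'start': 10176, 'ref': '', 'alt': 'C'},
            {'chrom': 1, 'start': 13271, 'ref': 'G', 'alt': 'C'},
        ]
    },
    'filter_output': {'limit': 30, 'min_frequency': 0.0099}
}

donor_req = build_json_post('donor_distribution', example_param)
print(donor_req.get_method(), donor_req.full_url)

# To actually send it (requires the server to be running):
# with request.urlopen(donor_req) as resp:
#     print(json.loads(resp.read().decode('utf-8')))
```

Building the request separately from sending it makes it possible to inspect the URL and body before contacting the server.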


POST the prepared JSON parameter to the endpoint /rarest_variants. This operation is very demanding, so it can take some time: for a population of 7 individuals it takes about 65 seconds. In general, the execution time is about 10 s + 8.16 s × (number of individuals) for populations of up to 149 individuals, and roughly 45 to 55 minutes for larger populations.

In [35]:
import requests

r = requests.post('http://localhost:51992/rarest_variants', json=param)
print(' response status code: {}'.format(r.status_code))
response_body = r.json()

response status code: 200
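
The timing rule quoted above can be turned into a rough planning aid before launching a request. The sketch below only illustrates that rule: the function name `estimate_runtime_seconds` is hypothetical, and the constants are the figures stated in the text, not new measurements.

```python
def estimate_runtime_seconds(n_individuals):
    """Return a (low, high) runtime estimate in seconds for a population size.

    Uses the figures quoted in the text: 10 s + 8.16 s per individual for up
    to 149 individuals, and 45 to 55 minutes for larger populations.
    """
    if n_individuals < 1:
        raise ValueError('the population must contain at least one individual')
    if n_individuals <= 149:
        seconds = 10 + 8.16 * n_individuals
        return (seconds, seconds)
    return (45 * 60, 55 * 60)


low, high = estimate_runtime_seconds(7)
print('estimated time for 7 individuals: {:.0f} s'.format(low))
```

For 7 individuals the rule gives about 67 seconds, in line with the roughly 65 seconds reported above.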


# Inspect response data:
The response includes the 10 rarest mutations (from each data source) found in the individuals of the selected population.

In [36]:
import pandas as pd
from matplotlib import pyplot as plt
columns = response_body['columns']
rows = response_body['rows']
df = pd.DataFrame.from_records(rows, columns=columns)
df.fillna(value='', inplace=True)   # replace Nones with empty values

df

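| | CHROM | START | REF | ALT | POPULATION_SIZE | POSITIVE_DONORS | OCCURRENCE | FREQUENCY |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 86190 | G | A | 7 | 1 | 1 | 0.071429 |
| 1 | 1 | 88708 | C | G | 7 | 1 | 1 | 0.071429 |
| 2 | 1 | 86026 | T | C | 7 | 1 | 1 | 0.071429 |
| 3 | 1 | 86063 | G | C | 7 | 1 | 1 | 0.071429 |
| 4 | 1 | 82607 | C | G | 7 | 1 | 1 | 0.071429 |
| 5 | 1 | 87407 | C | T | 7 | 1 | 1 | 0.071429 |
| 6 | 1 | 13114 | T | G | 7 | 1 | 1 | 0.071429 |
| 7 | 1 | 64511 | G | | 7 | 1 | 1 | 0.071429 |
| 8 | 1 | 74790 | G | A | 7 | 1 | 1 | 0.071429 |
| 9 | 1 | 72524 | A | G | 7 | 1 | 1 | 0.071429 |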
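
The FREQUENCY values above appear consistent with OCCURRENCE divided by twice POPULATION_SIZE, that is, with counting two alleles per individual. This is an inference from the numbers shown, not a documented definition, and the helper name below is hypothetical. A quick check under that assumption:

```python
def allele_frequency(occurrence, population_size, ploidy=2):
    """Hypothetical helper: occurrences divided by the total number of alleles.

    Assumes each individual contributes `ploidy` alleles (2 for an autosome
    such as chromosome 1); this is an assumption, not the server's stated rule.
    """
    return occurrence / (ploidy * population_size)


freq = allele_frequency(occurrence=1, population_size=7)
print(round(freq, 6))   # 0.071429
```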
