## GA4GH 1000 Genomes Variant Service Example
This example illustrates how to use the different calls implemented by the variant service.

### Initialize the client
In this step we create a client object, which is used to communicate with the server. It is initialized with the server URL.

In [2]:
import ga4gh.client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

### Search variant sets method
This call returns the variant sets hosted by the server, printed in the following format.
Observe that we are using the dataset id obtained in the metadata service example.

In [3]:
for variant_sets in c.search_variant_sets(dataset_id="WyIxa2dlbm9tZXMiXQ"):
    print "Data Set Id: {},\nVariant Set Name: {},\tReference Set Id: {},\nVariant Id: {}\n".format(
        variant_sets.dataset_id, variant_sets.name, variant_sets.reference_set_id, variant_sets.id)

Data Set Id: WyIxa2dlbm9tZXMiXQ,
Variant Set Name: phase3-release,	Reference Set Id: WyJOQ0JJMzciXQ,
Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0

Data Set Id: WyIxa2dlbm9tZXMiXQ,
Variant Set Name: functional-annotation,	Reference Set Id: WyJOQ0JJMzciXQ,
Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd



###### Note: The previous call prints only the descriptive fields of each variant set, not every element returned. The remaining elements are shown in the next call, get by id.
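The ids printed above look opaque, but in this output they appear to be URL-safe base64 strings (with the trailing `=` padding removed) wrapping a JSON list. The sketch below decodes three of them for orientation only. The helper `decode_id` is hypothetical and not part of the client, and the encoding is an observation about these particular strings that a client should not rely on.

```python
import base64
import json


def decode_id(compound_id):
    """Decode an id, assuming it is unpadded URL-safe base64 of a JSON list."""
    # Restore the '=' padding that the base64 decoder expects.
    padded = compound_id + "=" * (-len(compound_id) % 4)
    raw = base64.urlsafe_b64decode(padded.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Ids copied from the output above.
dataset_id = "WyIxa2dlbm9tZXMiXQ"
reference_set_id = "WyJOQ0JJMzciXQ"
variant_set_id = "WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0"

for label, value in [("dataset", dataset_id),
                     ("reference set", reference_set_id),
                     ("variant set", variant_set_id)]:
    print("{}: {}".format(label, decode_id(value)))
```

Read this way, the variant set id names its dataset and variant set, which matches the `Variant Set Name` printed next to it.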

### Get variant set by id method
The following request returns a single variant set, given its id.

In [15]:
variant_set = c.get_variant_set(variant_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0")
print "Name: {},\tData Id: {},\tReference Set Id: {},\nVariant Id: {}\n".format(
    variant_set.name, variant_set.dataset_id, variant_set.reference_set_id, variant_set.id)
for metadata in variant_set.metadata:
    print "\tMetadata Key: {},\tValue: {},\tType: {}\n\tDescription: {}\n".format(
        metadata.key, metadata.value, metadata.type, metadata.description)

Name: phase3-release,	Data Id: WyIxa2dlbm9tZXMiXQ,	Reference Set Id: WyJOQ0JJMzciXQ,
Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0

	Metadata Key: version,	Value: VCFv4.1,	Type: String
	Description: 

	Metadata Key: INFO.CIEND,	Value: ,	Type: Integer
	Description: Confidence interval around END for imprecise variants

	Metadata Key: INFO.CIPOS,	Value: ,	Type: Integer
	Description: Confidence interval around POS for imprecise variants

	Metadata Key: INFO.CS,	Value: ,	Type: String
	Description: Source call set.

	Metadata Key: INFO.END,	Value: ,	Type: Integer
	Description: End coordinate of this variant

	Metadata Key: INFO.IMPRECISE,	Value: ,	Type: Flag
	Description: Imprecise structural variation

	Metadata Key: INFO.MC,	Value: ,	Type: String
	Description: Merged calls.

	Metadata Key: INFO.MEINFO,	Value: ,	Type: String
	Description: Mobile element info of the form NAME,START,END<POLARITY; If there is only 5' OR 3' support for this call, will be NULL NULL for START and E

###### Observe that this call shows all of the metadata elements.

### Search variants method
Using the variant set id returned by one of the previous calls, we can search for variants within a region of a reference sequence.

In [73]:
counter = 15
for variant in c.search_variants(variant_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0", reference_name="1", start=10176, end=40176):
    if counter <= 0:
        break
    else:
        counter -= 1
        print "Variant Id: {}\n\tVariant Set Id: {}\n\tNames:{}\tReference Chromosome: {}\n\tStart: {},\tEnd: {},\n\tReference Bases: {}\tAlternate Bases: {}\n".format(
            variant.id, variant.variant_set_id, variant.names, variant.reference_name, variant.start, variant.end, variant.reference_bases, variant.alternate_bases)

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEwMTc2IiwiZDAxNmM0ZTFhZGNhZDVkMWJjODljMmNhNGFkYmEzYTgiXQ
	Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0
	Names:[u'rs367896724']	Reference Chromosome: 1
	Start: 10176,	End: 10177,
	Reference Bases: A	Alternate Bases: [u'AC']

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEwMjM0IiwiMGNlMzUwNzI0NDYxNGMzNzA1ZjVlMmFhMmQxMGFmMjUiXQ
	Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0
	Names:[u'rs540431307']	Reference Chromosome: 1
	Start: 10234,	End: 10235,
	Reference Bases: T	Alternate Bases: [u'TA']

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEwMzUxIiwiMGNlMzUwNzI0NDYxNGMzNzA1ZjVlMmFhMmQxMGFmMjUiXQ
	Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0
	Names:[u'rs555500075']	Reference Chromosome: 1
	Start: 10351,	End: 10352,
	Reference Bases: T	Alternate Bases: [u'TA']

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEwNTA0I

###### Observe that the informational fields have been omitted, so that only the essential information needed to request each listed variant individually is shown. The data returned is richer, as illustrated in the following example.
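The `start` and `end` values above are consistent with zero-based, half-open coordinates: each single-base reference allele spans `end - start = 1`. The sketch below converts a one-based VCF position to that convention. The function name `vcf_to_interval` is hypothetical, and the VCF position 10177 used for rs367896724 is an assumption inferred from the `start` of 10176 shown above.

```python
def vcf_to_interval(pos, reference_bases):
    """Convert a 1-based VCF POS to a 0-based, half-open (start, end) interval."""
    start = pos - 1                      # shift from 1-based to 0-based
    end = start + len(reference_bases)   # end is exclusive
    return start, end


# Assumed VCF position for rs367896724 (reference bases "A").
start, end = vcf_to_interval(10177, "A")
print("start={}, end={}".format(start, end))
```

The same rule explains why a search with `start=10176` includes this variant as its first result.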

### Get variant by id method
We can retrieve a single variant by making a get variant call, provided we know its unique identifier. We can use one of the ids obtained above for the query.

In [62]:
# variant is the loop variable left over from the search above; its id is used here to get a single variant
single_variant = c.get_variant(variant_id=variant.id)
print "Variant Id: {}\n\tVariant Set Id: {}\n\tNames: {}\tReference Name: {}\n\tStart: {},\tEnd: {}\n\tReference Bases: {},\tAlternate Bases: {}\n".format(
    single_variant.id, single_variant.variant_set_id, single_variant.names, single_variant.reference_name, single_variant.start, single_variant.end, single_variant.reference_bases, single_variant.alternate_bases)
for info in single_variant.info:
    print "\tKey: {},\tValues: {}".format(info, single_variant.info[info].values[0].string_value)

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEzMTA5IiwiNDVjNGNmMDZjYjVjZGFiYTg1MTJhMTYxZTczMmI1YzIiXQ
	Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0
	Names: [u'rs540538026']	Reference Name: 1
	Start: 13109,	End: 13110
	Reference Bases: G,	Alternate Bases: [u'A']

	Key: EUR_AF,	Values: 0.0566999986768
	Key: SAS_AF,	Values: 0.0439999997616
	Key: AC,	Values: 134
	Key: AA,	Values: g|||
	Key: AF,	Values: 0.0267571993172
	Key: AFR_AF,	Values: 0.00529999984428
	Key: AMR_AF,	Values: 0.0359999984503
	Key: AN,	Values: 5008
	Key: VT,	Values: SNP
	Key: EAS_AF,	Values: 0.00200000009499
	Key: NS,	Values: 2504
	Key: DP,	Values: 23422


##### Note: The data returned by the get variant method contains other elements, which are shown at the end of this notebook once the call set methods they relate to have been introduced. The `Key` and `Values` entries are the informational fields returned by this call.
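The informational fields can be cross-checked against each other. Assuming they follow the usual VCF meanings (`AC` is the alternate allele count, `AN` the total number of called alleles, `AF` the allele frequency, `NS` the number of samples), the values above should satisfy AF ≈ AC / AN and, for diploid calls, AN = 2 × NS. In this sketch the values are copied by hand from the output above and entered as numbers, rather than fetched from the server.

```python
# Values copied from the output above.
info = {"AC": 134, "AN": 5008, "NS": 2504, "AF": 0.0267571993172}

# Alternate alleles divided by all called alleles.
computed_af = info["AC"] / float(info["AN"])
print("computed AF: {:.6f}".format(computed_af))
print("reported AF: {:.6f}".format(info["AF"]))

# The reported value appears to carry rounding error, so compare with a tolerance.
af_matches = abs(computed_af - info["AF"]) < 1e-6
# Two alleles per sample, assuming every sample is diploid at this site.
an_matches = info["AN"] == 2 * info["NS"]
print(af_matches, an_matches)
```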

### Search call sets method
This call returns the call sets of a variant set. Each call set groups the calls for one sample, and each call states whether that individual carries a given variant.

In [69]:
counter = 15
list_of_callset_ids = []  # Used near the end of this notebook
for calls in c.search_call_sets(variant_set_id=single_variant.variant_set_id):
    if counter <= 0:
        break
    else:
        counter -= 1
        list_of_callset_ids.append(calls.id)
        print "Call Set Name: {},\nId: {}\nBio Sample Id: {}\nVariant Set Id: {}\n".format(
            calls.name, calls.id, calls.bio_sample_id, calls.variant_set_ids)

Call Set Name: HG00096,
Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5NiJd
Bio Sample Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5NiJd
Variant Set Id: [u'WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0']

Call Set Name: HG00097,
Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5NyJd
Bio Sample Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5NyJd
Variant Set Id: [u'WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0']

Call Set Name: HG00099,
Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5OSJd
Bio Sample Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5OSJd
Variant Set Id: [u'WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0']

Call Set Name: HG00100,
Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDEwMCJd
Bio Sample Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDEwMCJd
Variant Set Id: [u'WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0']

Call Set Name: HG00101,
Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDEwMSJd
Bio Sample Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDEwMSJd
Variant Set Id:

##### Only a small number of elements were chosen for display; this request returns more. Each of them carries the same fields as those shown above.
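The counter-and-break pattern used above to limit the output can also be written with `itertools.islice`, which stops consuming an iterator after a fixed number of items. The sketch below uses a stand-in generator, `fake_search_call_sets`, which is hypothetical and only mimics a search method that yields results lazily. With the real client, the object returned by the search call would take its place, assuming it behaves like an ordinary Python iterator, as the loops above suggest.

```python
import itertools


def fake_search_call_sets(total):
    """Hypothetical stand-in for a search method that yields results one by one."""
    for i in range(total):
        yield "call-set-{}".format(i)


# Take at most 15 results without an explicit counter.
first_fifteen = list(itertools.islice(fake_search_call_sets(100), 15))
print(len(first_fifteen))
print(first_fifteen[0], first_fifteen[-1])
```

One difference from the loop above: `islice` does not fetch a sixteenth item before stopping, so no extra result is left in the loop variable.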

### Get call set by id
Using one of the ids obtained in the previous call we can get a single call set element.

In [68]:
call_set = c.get_call_set(call_set_id=calls.id)
print call_set

id: "WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDExMyJd"
name: "HG00113"
bio_sample_id: "WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDExMyJd"
variant_set_ids: "WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0"



### Making a search variants request with callset ids
By passing a list of call set ids to the search variants method, we get back a set of variants,
each with the calls that belong to the requested call sets.

In [81]:
for variants_with_callsets in c.search_variants(call_set_ids=list_of_callset_ids, variant_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0", reference_name="1", start=10176, end=10502):
    print "Variant Id: {}\n\tVariant Set Id: {}\n\tNames:{}\tReference Chromosome: {}\n\tStart: {},\tEnd: {},\n\tReference Bases: {}\tAlternate Bases: {}\n".format(
        variants_with_callsets.id, variants_with_callsets.variant_set_id, variants_with_callsets.names, variants_with_callsets.reference_name, variants_with_callsets.start, variants_with_callsets.end, variants_with_callsets.reference_bases, variants_with_callsets.alternate_bases)
    for variant_calls in variants_with_callsets.calls:
        print "\tCall Set Name: {}\tGenotype: {},\tPhaset: {}\n\tCall Set Id: {}".format(
            variant_calls.call_set_name, variant_calls.genotype, variant_calls.phaseset, variant_calls.call_set_id)

Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiMSIsIjEwMTc2IiwiZDAxNmM0ZTFhZGNhZDVkMWJjODljMmNhNGFkYmEzYTgiXQ
	Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIl0
	Names:[u'rs367896724']	Reference Chromosome: 1
	Start: 10176,	End: 10177,
	Reference Bases: A	Alternate Bases: [u'AC']

	Call Set Name: HG00096	Genotype: [1, 0],	Phaset: True
	Call Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5NiJd
	Call Set Name: HG00097	Genotype: [0, 1],	Phaset: True
	Call Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5NyJd
	Call Set Name: HG00099	Genotype: [0, 1],	Phaset: True
	Call Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDA5OSJd
	Call Set Name: HG00100	Genotype: [1, 0],	Phaset: True
	Call Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDEwMCJd
	Call Set Name: HG00101	Genotype: [0, 0],	Phaset: True
	Call Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsInBoYXNlMy1yZWxlYXNlIiwiSEcwMDEwMSJd
	Call Set Name: HG00102	Genotype: [1, 0],

###### Observe that each variant is returned with the calls for the call set ids that were provided. If more call set ids had been passed to the search, more calls would have been returned with each variant.
##### The get variant method also returns this information, for all of the available calls.
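Each value in a call's `genotype` list is an index into the variant's alleles: `0` stands for the reference bases and `1`, `2`, and so on for the entries of `alternate_bases` in order. The sketch below applies that rule to the first variant shown above. `genotype_to_alleles` is a hypothetical helper, the inputs are copied by hand from the output, and missing genotype values are not handled.

```python
def genotype_to_alleles(reference_bases, alternate_bases, genotype):
    """Map genotype indices to bases: 0 is the reference, n is alternate n-1."""
    # Assumes every index is non-negative, i.e. no missing genotype values.
    alleles = [reference_bases] + list(alternate_bases)
    return [alleles[index] for index in genotype]


# Values copied from the first variant and its complete calls in the output above.
reference_bases = "A"
alternate_bases = ["AC"]
calls = {
    "HG00096": [1, 0],
    "HG00097": [0, 1],
    "HG00099": [0, 1],
    "HG00100": [1, 0],
    "HG00101": [0, 0],
}

for name in sorted(calls):
    print(name, genotype_to_alleles(reference_bases, alternate_bases, calls[name]))

# Count alternate alleles among the five calls listed here.
alt_count = sum(index > 0 for genotype in calls.values() for index in genotype)
print("alternate alleles: {} of {}".format(alt_count, 2 * len(calls)))
```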