# Pandas

* Created by Wes McKinney  
* Built on top of NumPy
* Similar to R's data.frame

* Pandas has two main data structures: series and dataframes

## Import Pandas

Importing ```Series``` and ```DataFrame``` directly brings them into the local namespace, which is convenient because they are among the most commonly used parts of Pandas.

In [3]:
from pandas import Series, DataFrame

In [4]:
import pandas as pd

## Read in files

```read_table``` reads delimited text files (the default delimiter is tab, but another can be specified with ```sep```).
There are other functions for other file types (```read_csv```, ```read_excel```, ```read_sql```, etc.).

You can limit the number of rows read with the ```nrows``` argument.
You can read the file in pieces with the ```chunksize``` argument, which sets the number of rows per chunk.

In [55]:
gene_expression = pd.read_table('GD462.GeneQuantRPKM.50FN.samplename.resk10.txt')

In [6]:
gene_expression

,TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
0,ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,...,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
1,ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,...,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308
2,ENSG00000139269.2,ENSG00000139269.2,12,57846106,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,...,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954
3,ENSG00000169129.8,ENSG00000169129.8,10,116164515,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,...,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565
4,ENSG00000134602.11,ENSG00000134602.11,X,131157293,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,...,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957
5,ENSG00000136237.12,ENSG00000136237.12,7,22396763,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,...,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834
6,ENSG00000259425.1,ENSG00000259425.1,15,23096869,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,...,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376
7,ENSG00000242284.2,ENSG00000242284.2,X,134953994,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,...,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862
8,ENSG00000235027.1,ENSG00000235027.1,11,1781578,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,...,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499
9,ENSG00000228169.3,ENSG00000228169.3,10,116450393,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,...,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429
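
The ```nrows``` and ```chunksize``` arguments described above can be sketched on a small in-memory table. The data here is hypothetical and only stands in for the real expression file; ```io.StringIO``` is used so the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical tab-delimited text standing in for the real file.
text = (
    "TargetID\tChr\tHG00096\n"
    "gene1\t5\t0.10\n"
    "gene2\t7\t8.18\n"
    "gene3\t12\t1.20\n"
    "gene4\t10\t0.83\n"
)

# nrows: read only the first two data rows.
first_two = pd.read_table(io.StringIO(text), nrows=2)

# chunksize: returns an iterator that yields DataFrames of up to 3 rows each.
chunk_lengths = [len(chunk) for chunk in pd.read_table(io.StringIO(text), chunksize=3)]
```

With ```chunksize``` the file is never held in memory all at once, which can help with large files, at the cost of having to process each chunk in a loop.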


## Accessing Columns

You can access a column as an attribute using ```dataframe.column_name```.

In [7]:
gene_expression.TargetID

0      ENSG00000152931.6
1      ENSG00000183696.9
2      ENSG00000139269.2
3      ENSG00000169129.8
4     ENSG00000134602.11
5     ENSG00000136237.12
6      ENSG00000259425.1
7      ENSG00000242284.2
8      ENSG00000235027.1
9      ENSG00000228169.3
10     ENSG00000260083.1
11     ENSG00000247157.2
12     ENSG00000158482.8
13     ENSG00000146072.5
14    ENSG00000183814.10
...
23707    ENSG00000151092.12
23708     ENSG00000243680.1
23709     ENSG00000157045.4
23710    ENSG00000172058.10
23711     ENSG00000143994.9
23712     ENSG00000087095.7
23713     ENSG00000261559.1
23714     ENSG00000162144.4
23715     ENSG00000129473.5
23716     ENSG00000261205.1
23717     ENSG00000235472.1
23718    ENSG00000114423.14
23719     ENSG00000243312.2
23720     ENSG00000257337.1
23721     ENSG00000177494.5
Name: TargetID, Length: 23722, dtype: object
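
Attribute access is a shortcut for bracket indexing, and it only works when the column name is a valid Python identifier that does not collide with an existing dataframe attribute. A minimal sketch with a hypothetical dataframe (the column names ```sample 1``` and ```mean``` are made up to show the two failure cases):

```python
import pandas as pd

# Hypothetical data; "sample 1" and "mean" are chosen to show the edge cases.
df = pd.DataFrame({
    "TargetID": ["gene1", "gene2"],
    "sample 1": [0.1, 8.2],
    "mean": [1.0, 2.0],
})

# Attribute access and bracket access return the same Series.
same = df.TargetID.equals(df["TargetID"])

# A name containing a space can only be reached with brackets.
spaced = df["sample 1"]

# df.mean is the DataFrame method, not the column, so brackets are needed here too.
mean_is_method = callable(df.mean)
mean_column = df["mean"]
```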

## Filtering

You can filter rows with a boolean condition and store the result as a new dataframe.

In [8]:
chr22 = gene_expression[gene_expression.Chr=="22"]

In [9]:
chr22

,TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
22,ENSG00000249263.2,ENSG00000249263.2,22,17140518,0.340656,0.318942,-0.009145,0.231503,0.089713,0.482984,...,0.032117,0.213629,0.225474,0.134216,0.128749,0.060841,0.298061,-0.011292,0.058276,-0.013384
29,ENSG00000224688.1,ENSG00000224688.1,22,21496660,4.194827,3.369440,2.335470,4.477910,3.641758,3.296741,...,4.669164,3.918440,4.977372,3.253683,3.322997,6.613617,3.438039,3.768840,3.248447,4.300825
45,ENSG00000075240.12,ENSG00000075240.12,22,46971909,3.531803,3.635541,1.251434,3.007745,3.574070,4.569758,...,4.057426,3.736969,3.156168,6.785470,5.646243,2.418243,4.678466,4.626435,5.101994,4.100622
81,ENSG00000099937.6,ENSG00000099937.6,22,21128167,0.519054,0.399216,0.078965,0.145628,0.446993,0.217271,...,0.621082,0.414906,1.047612,0.353794,0.253339,0.492650,0.449070,0.081118,0.164197,0.287428
85,ENSG00000099998.12,ENSG00000099998.12,22,24641110,0.073630,0.041109,0.017493,-0.020552,0.000059,-0.007782,...,0.045730,0.003584,0.027056,0.007583,0.063843,0.059052,-0.008820,0.017421,0.010471,0.105646
104,ENSG00000093072.10,ENSG00000093072.10,22,17739125,13.346947,13.498284,2.517852,8.609994,15.017319,14.754466,...,5.203947,7.996095,5.481392,15.924745,19.423264,10.146381,18.068740,35.694934,8.627495,10.176125
179,ENSG00000185838.9,ENSG00000185838.9,22,19842462,2.333713,2.693720,3.923889,12.928828,5.288832,4.620438,...,2.442761,11.477810,6.053884,12.084015,6.758018,4.904289,12.989474,3.966684,6.080704,10.641336
286,ENSG00000226085.2,ENSG00000226085.2,22,40271293,80.484540,93.860069,52.572732,91.116719,102.703260,99.355324,...,86.684368,96.602414,99.933326,117.720169,97.735271,105.324828,91.384174,87.710241,98.425496,100.751823
315,ENSG00000100427.11,ENSG00000100427.11,22,50524331,0.638312,-0.031282,0.538379,0.150230,0.413206,0.287332,...,0.460002,0.253051,0.350348,1.293598,0.796821,0.637531,0.922845,0.932327,0.863170,0.694758
386,ENSG00000100344.6,ENSG00000100344.6,22,44319619,0.568136,0.199058,0.653004,0.774334,0.574502,0.294539,...,0.577487,0.159212,0.695736,0.947237,0.528150,0.426467,0.267442,0.770510,0.108943,0.371218
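
The filter above works in two steps: the comparison produces a boolean Series, and indexing with that Series keeps only the rows where it is true. A small sketch with hypothetical values, which also shows how two conditions might be combined:

```python
import pandas as pd

# Hypothetical rows; Chr is stored as text because it can hold labels such as "X".
df = pd.DataFrame({
    "Chr": ["5", "22", "X", "22"],
    "Coord": [59783540, 17140518, 131157293, 21496660],
    "HG00096": [0.10, 0.34, 27.65, 4.19],
})

# Step 1: the comparison yields one True/False value per row.
mask = df.Chr == "22"

# Step 2: indexing with the mask keeps the rows where it is True.
chr22 = df[mask]

# Conditions are combined with & (and) or | (or); each needs its own parentheses.
chr22_expressed = df[(df.Chr == "22") & (df.HG00096 > 1.0)]
```

Note that the filtered dataframe keeps the original row labels (here 1 and 3), which is why the index in the output above is not consecutive.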


## Adding and Removing Columns

You can add a column by assigning a value to a new column name.

You can use ```del``` to remove a column.

In [10]:
gene_expression['pop']=1

In [11]:
gene_expression

,TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,...,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828,pop
0,ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,...,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186,1
1,ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,...,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308,1
2,ENSG00000139269.2,ENSG00000139269.2,12,57846106,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,...,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954,1
3,ENSG00000169129.8,ENSG00000169129.8,10,116164515,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,...,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565,1
4,ENSG00000134602.11,ENSG00000134602.11,X,131157293,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,...,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957,1
5,ENSG00000136237.12,ENSG00000136237.12,7,22396763,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,...,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834,1
6,ENSG00000259425.1,ENSG00000259425.1,15,23096869,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,...,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376,1
7,ENSG00000242284.2,ENSG00000242284.2,X,134953994,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,...,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862,1
8,ENSG00000235027.1,ENSG00000235027.1,11,1781578,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,...,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499,1
9,ENSG00000228169.3,ENSG00000228169.3,10,116450393,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,...,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429,1


In [12]:
del gene_expression['pop']

In [13]:
gene_expression

,TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
0,ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,...,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
1,ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,...,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308
2,ENSG00000139269.2,ENSG00000139269.2,12,57846106,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,...,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954
3,ENSG00000169129.8,ENSG00000169129.8,10,116164515,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,...,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565
4,ENSG00000134602.11,ENSG00000134602.11,X,131157293,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,...,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957
5,ENSG00000136237.12,ENSG00000136237.12,7,22396763,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,...,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834
6,ENSG00000259425.1,ENSG00000259425.1,15,23096869,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,...,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376
7,ENSG00000242284.2,ENSG00000242284.2,X,134953994,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,...,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862
8,ENSG00000235027.1,ENSG00000235027.1,11,1781578,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,...,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499
9,ENSG00000228169.3,ENSG00000228169.3,10,116450393,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,...,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429
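
As a compact illustration of adding and removing columns, the sketch below uses a hypothetical two-row dataframe. The ```doubled``` column and the use of ```drop``` are additions for illustration and are not part of the session above.

```python
import pandas as pd

# Hypothetical two-row dataframe.
df = pd.DataFrame({"TargetID": ["gene1", "gene2"], "HG00096": [0.1, 8.2]})

# Assigning a scalar fills every row with that value.
df["pop"] = 1

# Assigning a Series creates a column computed from existing ones.
df["doubled"] = df["HG00096"] * 2
cols_after_add = list(df.columns)

# del removes the column from df in place.
del df["pop"]

# drop returns a new dataframe and leaves df unchanged.
trimmed = df.drop(columns=["doubled"])
```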


You can use ```dataframe.shape``` to get the dimensions of the dataframe as a (rows, columns) tuple.

In [14]:
gene_expression.shape

(23722, 466)

## Reshaping Dataframes

You can use ```reindex``` to select rows or columns (or both) by label.

In [17]:
gene_expression.reindex([0,1])

,TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
0,ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.07811,0.048981,0.118597,0.004035,0.010925,...,0.088601,0.24001,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
1,ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,...,13.428205,6.0945,12.536,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308


In [18]:
gene_expression.reindex(columns=['TargetID','Chr','Coord','HG00096','HG00097'])

,TargetID,Chr,Coord,HG00096,HG00097
0,ENSG00000152931.6,5,59783540,0.101858,0.078110
1,ENSG00000183696.9,7,48128225,8.183805,5.686911
2,ENSG00000139269.2,12,57846106,1.199910,1.573572
3,ENSG00000169129.8,10,116164515,0.831940,0.069778
4,ENSG00000134602.11,X,131157293,27.646422,24.395572
5,ENSG00000136237.12,7,22396763,3.788503,2.050963
6,ENSG00000259425.1,15,23096869,0.054059,0.112185
7,ENSG00000242284.2,X,134953994,0.351716,0.444540
8,ENSG00000235027.1,11,1781578,0.200791,0.190138
9,ENSG00000228169.3,10,116450393,96.182178,101.179262
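
One detail worth knowing about ```reindex``` is that labels which do not exist are not an error: they come back as rows or columns filled with ```NaN```. A minimal sketch on hypothetical data (the labels ```5``` and ```HG00097``` are deliberately absent):

```python
import pandas as pd

# Hypothetical three-row dataframe with the default integer row labels 0, 1, 2.
df = pd.DataFrame({"Chr": ["5", "7", "12"], "HG00096": [0.10, 8.18, 1.20]})

# Existing labels: behaves like a selection.
subset = df.reindex([0, 1])

# Row label 5 does not exist, so that row is filled with NaN.
with_missing = df.reindex([0, 5])

# Columns are reordered as requested; the absent HG00097 becomes an all-NaN column.
cols = df.reindex(columns=["HG00096", "Chr", "HG00097"])
```

If missing labels should raise an error instead, label-based selection with ```.loc``` is the stricter choice.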


## index_col

You can use ```index_col``` to select the column in the file to use as the row index.

In [19]:
gene_expression=pd.read_table('GD462.GeneQuantRPKM.50FN.samplename.resk10.txt',index_col='TargetID')

In [20]:
gene_expression

TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,-0.000901,...,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,7.348876,...,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308
ENSG00000139269.2,ENSG00000139269.2,12,57846106,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,0.675305,...,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954
ENSG00000169129.8,ENSG00000169129.8,10,116164515,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,1.259393,...,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565
ENSG00000134602.11,ENSG00000134602.11,X,131157293,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,27.881116,...,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957
ENSG00000136237.12,ENSG00000136237.12,7,22396763,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,2.500333,...,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834
ENSG00000259425.1,ENSG00000259425.1,15,23096869,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,0.105538,...,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376
ENSG00000242284.2,ENSG00000242284.2,X,134953994,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,2.954486,...,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862
ENSG00000235027.1,ENSG00000235027.1,11,1781578,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,0.104912,...,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499
ENSG00000228169.3,ENSG00000228169.3,10,116450393,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,70.680737,...,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429


You can also pass a list of columns to use as a multi-level index.

In [21]:
gene_expression2 = pd.read_table('GD462.GeneQuantRPKM.50FN.samplename.resk10.txt',index_col=[0,1,2,3])

In [22]:
gene_expression2

TargetID,Gene_Symbol,Chr,Coord,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00104,HG00105,HG00106,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
ENSG00000152931.6,ENSG00000152931.6,5,59783540,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,-0.000901,-0.006706,0.098863,0.045285,...,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
ENSG00000183696.9,ENSG00000183696.9,7,48128225,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,7.348876,8.180940,8.721889,8.169477,...,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308
ENSG00000139269.2,ENSG00000139269.2,12,57846106,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,0.675305,3.817395,2.561376,1.231049,...,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954
ENSG00000169129.8,ENSG00000169129.8,10,116164515,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,1.259393,0.734784,1.479124,1.548653,...,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565
ENSG00000134602.11,ENSG00000134602.11,X,131157293,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,27.881116,27.194117,28.579857,27.226416,...,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957
ENSG00000136237.12,ENSG00000136237.12,7,22396763,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,2.500333,4.509277,2.344625,2.358093,...,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834
ENSG00000259425.1,ENSG00000259425.1,15,23096869,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,0.105538,0.058953,0.062852,0.028203,...,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376
ENSG00000242284.2,ENSG00000242284.2,X,134953994,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,2.954486,0.726429,0.033625,-0.013543,...,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862
ENSG00000235027.1,ENSG00000235027.1,11,1781578,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,0.104912,0.134485,0.264147,0.165510,...,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499
ENSG00000228169.3,ENSG00000228169.3,10,116450393,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,70.680737,103.441921,90.962547,99.270960,...,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429
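
The effect of ```index_col``` is easier to see on a tiny table. The sketch below reads hypothetical tab-delimited text from memory, once with a single index column given by name and once with several given by position:

```python
import io
import pandas as pd

# Hypothetical tab-delimited text standing in for the real file.
text = (
    "TargetID\tChr\tCoord\tHG00096\n"
    "gene1\t5\t59783540\t0.5\n"
    "gene2\t7\t48128225\t8.25\n"
)

# A single column, given by name, becomes the row index.
single = pd.read_table(io.StringIO(text), index_col="TargetID")

# A list of column positions builds a multi-level row index.
multi = pd.read_table(io.StringIO(text), index_col=[0, 1, 2])

# With a labelled index, rows can be looked up by name.
value = single.loc["gene2", "HG00096"]
```

Columns used for the index are no longer regular columns, so ```multi``` is left with only the sample column.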


In [23]:
sv_genotypes = pd.read_table('ALL.wgs.mergedSV.v3.20130502.svs.genotypes.vcf.noinfo.10000')

In [24]:
sv_genotypes

,#CHROM,POS,ID,REF,ALT,QUAL,FILTER,INFO,FORMAT,HG00096,...,NA21128,NA21129,NA21130,NA21133,NA21135,NA21137,NA21141,NA21142,NA21143,NA21144
0,1,645710,ALU_umary_ALU_2,A,<INS:ME:ALU>,.,.,"TSD=null;SVTYPE=ALU;MEINFO=AluYa4_5,1,223,-;SV...",GT:GL:AVGPOST,"0|0:-0.0,-0.0,-0.0:0.993",...,"0|0:0.0,-0.68,-2.68:0.998","0|0:0.0,-1.96,-10.0:0.9999","0|0:-0.0,-0.0,-0.0:0.9911","0|0:0.0,-1.58,-10.0:0.9998","0|0:0.0,-1.25,-5.34:0.9997","0|0:0.0,-3.43,-10.0:1","0|0:0.0,-2.34,-9.13:1","0|0:0.0,-1.19,-4.58:0.9995","0|0:0.0,-1.1,-4.39:0.9989","0|0:0.0,-2.75,-10.0:1"
1,1,668630,DUP_delly_DUP20532,G,<CN2>,.,PASS,"SVTYPE=DUP;SVLEN=181574;IMPRECISE;CIEND=-150,1...",GT:FT:CN:GL:CNL:AVGPOST,"0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...",...,"0|0:PASS:2:0.0,-8.26,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.63,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.33,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-7.45,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-7.2,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-8.91,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-5.33,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.42,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-5.3,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-3.63,-10.0:-300.0,-300.0,-0.0,..."
2,1,713044,DUP_gs_CNV_1_713044_755966,C,"<CN0>,<CN2>",.,PASS,SVTYPE=CNV;END=755966;CS=DUP_gs,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,"0|0:2:-1000.0,-16.51,-0.0,-5.4,-16.27:-1000.00...",...,0|0:.:.:.:.:.:.:.:0.9838,"0|0:2:-1000.0,-39.82,-0.0,-8.05,-28.05:-1000.0...","0|0:2:-1000.0,-30.36,-0.0,-7.66,-24.81:-1000.0...","0|0:2:-1000.0,-34.9,-0.0,-7.38,-25.32:-1000.00...","0|0:2:-1000.0,-41.59,-0.0,-7.1,-26.36:-1000.00...","0|0:2:-1000.0,-55.63,-0.0,-13.1,-43.37:-1000.0...","2|0:3:-1000.0,-99.81,-10.85,-0.0,-8.67:-1000.0...","0|0:2:-1000.0,-33.72,-0.0,-6.77,-23.65:-1000.0...","2|0:3:-1000.0,-105.4,-9.81,-0.0,-11.63:-1000.0...","0|2:3:-1000.0,-88.5,-9.12,-0.0,-8.43:-1000.00,..."
3,1,738570,UW_VH_21763,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=3801;CIEND=0,354;CIPOS=-348,0...",GT,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
4,1,766600,UW_VH_5595,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=2842;CIEND=0,403;CIPOS=-385,0...",GT,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
5,1,773090,DUP_gs_CNV_1_773090_852664,T,"<CN0>,<CN2>",.,PASS,SVTYPE=CNV;END=852664;CS=DUP_gs,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,"0|0:2:-1000.0,-44.48,-0.0,-12.43,-39.08:-1000....",...,0|0:.:.:.:.:.:.:.:0.9869,"0|0:2:-1000.0,-87.59,0.0,-23.85,-75.56:-1000.0...","0|0:2:-1000.0,-69.73,0.0,-25.35,-74.47:-1000.0...","0|0:2:-1000.0,-83.23,0.0,-23.63,-73.97:-1000.0...","0|0:2:-1000.0,-93.63,0.0,-26.24,-82.44:-1000.0...","0|0:2:-1000.0,-131.74,0.0,-33.89,-109.19:-1000...","0|0:2:-1000.0,-74.38,0.0,-23.62,-71.72:-1000.0...","0|0:2:-1000.0,-65.11,0.0,-24.18,-70.67:-1000.0...","0|0:2:-1000.0,-81.91,0.0,-30.31,-88.67:-1000.0...","0|0:2:-1000.0,-64.49,0.0,-23.86,-69.8:-1000.00..."
6,1,775292,YL_CN_IBS_6,T,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=-16677;CIEND=-500,1000;CIPOS=...",GT,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
7,1,794496,BI_GS_DEL1_B5_P0001_52,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=-5051;CIEND=-8,8;CIPOS=-8,8;E...",GT,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
8,1,812283,L1_umary_LINE1_1,G,<INS:ME:LINE1>,.,.,"TSD=null;SVTYPE=LINE1;MEINFO=LINE1,2926,3363,+...",GT:GL:AVGPOST,"0|0:0.0,-1.2,-10.0:0.9995",...,"0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-9.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-8.43,-10.0:1"
9,1,813866,L1_umary_LINE1_2,C,<INS:ME:LINE1>,.,.,"TSD=null;SVTYPE=LINE1;MEINFO=LINE1,4049,4300,+...",GT:GL:AVGPOST,"0|0:0.0,-5.42,-10.0:1",...,"0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-8.97,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.62,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.58,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.57,-10.0:1"


In [25]:
sv_genotypes = pd.read_table('ALL.wgs.mergedSV.v3.20130502.svs.genotypes.vcf.noinfo.10000',index_col='ID')

In [26]:
sv_genotypes

ID,#CHROM,POS,REF,ALT,QUAL,FILTER,INFO,FORMAT,HG00096,HG00097,...,NA21128,NA21129,NA21130,NA21133,NA21135,NA21137,NA21141,NA21142,NA21143,NA21144
ALU_umary_ALU_2,1,645710,A,<INS:ME:ALU>,.,.,"TSD=null;SVTYPE=ALU;MEINFO=AluYa4_5,1,223,-;SV...",GT:GL:AVGPOST,"0|0:-0.0,-0.0,-0.0:0.993","0|0:-0.0,-0.0,-0.0:0.9905",...,"0|0:0.0,-0.68,-2.68:0.998","0|0:0.0,-1.96,-10.0:0.9999","0|0:-0.0,-0.0,-0.0:0.9911","0|0:0.0,-1.58,-10.0:0.9998","0|0:0.0,-1.25,-5.34:0.9997","0|0:0.0,-3.43,-10.0:1","0|0:0.0,-2.34,-9.13:1","0|0:0.0,-1.19,-4.58:0.9995","0|0:0.0,-1.1,-4.39:0.9989","0|0:0.0,-2.75,-10.0:1"
DUP_delly_DUP20532,1,668630,G,<CN2>,.,PASS,"SVTYPE=DUP;SVLEN=181574;IMPRECISE;CIEND=-150,1...",GT:FT:CN:GL:CNL:AVGPOST,"0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-10.0,-10.0:-300.0,-300.0,-0.0,...",...,"0|0:PASS:2:0.0,-8.26,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.63,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.33,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-7.45,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-7.2,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-8.91,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-5.33,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.42,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-5.3,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-3.63,-10.0:-300.0,-300.0,-0.0,..."
DUP_gs_CNV_1_713044_755966,1,713044,C,"<CN0>,<CN2>",.,PASS,SVTYPE=CNV;END=755966;CS=DUP_gs,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,"0|0:2:-1000.0,-16.51,-0.0,-5.4,-16.27:-1000.00...","0|0:2:-1000.0,-39.76,-0.0,-9.23,-30.7:-1000.00...",...,0|0:.:.:.:.:.:.:.:0.9838,"0|0:2:-1000.0,-39.82,-0.0,-8.05,-28.05:-1000.0...","0|0:2:-1000.0,-30.36,-0.0,-7.66,-24.81:-1000.0...","0|0:2:-1000.0,-34.9,-0.0,-7.38,-25.32:-1000.00...","0|0:2:-1000.0,-41.59,-0.0,-7.1,-26.36:-1000.00...","0|0:2:-1000.0,-55.63,-0.0,-13.1,-43.37:-1000.0...","2|0:3:-1000.0,-99.81,-10.85,-0.0,-8.67:-1000.0...","0|0:2:-1000.0,-33.72,-0.0,-6.77,-23.65:-1000.0...","2|0:3:-1000.0,-105.4,-9.81,-0.0,-11.63:-1000.0...","0|2:3:-1000.0,-88.5,-9.12,-0.0,-8.43:-1000.00,..."
UW_VH_21763,1,738570,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=3801;CIEND=0,354;CIPOS=-348,0...",GT,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
UW_VH_5595,1,766600,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=2842;CIEND=0,403;CIPOS=-385,0...",GT,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
DUP_gs_CNV_1_773090_852664,1,773090,T,"<CN0>,<CN2>",.,PASS,SVTYPE=CNV;END=852664;CS=DUP_gs,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,"0|0:2:-1000.0,-44.48,-0.0,-12.43,-39.08:-1000....","0|0:2:-1000.0,-84.91,0.0,-26.6,-81.06:-1000.00...",...,0|0:.:.:.:.:.:.:.:0.9869,"0|0:2:-1000.0,-87.59,0.0,-23.85,-75.56:-1000.0...","0|0:2:-1000.0,-69.73,0.0,-25.35,-74.47:-1000.0...","0|0:2:-1000.0,-83.23,0.0,-23.63,-73.97:-1000.0...","0|0:2:-1000.0,-93.63,0.0,-26.24,-82.44:-1000.0...","0|0:2:-1000.0,-131.74,0.0,-33.89,-109.19:-1000...","0|0:2:-1000.0,-74.38,0.0,-23.62,-71.72:-1000.0...","0|0:2:-1000.0,-65.11,0.0,-24.18,-70.67:-1000.0...","0|0:2:-1000.0,-81.91,0.0,-30.31,-88.67:-1000.0...","0|0:2:-1000.0,-64.49,0.0,-23.86,-69.8:-1000.00..."
YL_CN_IBS_6,1,775292,T,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=-16677;CIEND=-500,1000;CIPOS=...",GT,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
BI_GS_DEL1_B5_P0001_52,1,794496,G,<CN0>,100,PASS,"SVTYPE=DEL;SVLEN=-5051;CIEND=-8,8;CIPOS=-8,8;E...",GT,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
L1_umary_LINE1_1,1,812283,G,<INS:ME:LINE1>,.,.,"TSD=null;SVTYPE=LINE1;MEINFO=LINE1,2926,3363,+...",GT:GL:AVGPOST,"0|0:0.0,-1.2,-10.0:0.9995","0|0:0.0,-6.02,-10.0:1",...,"0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-9.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-8.43,-10.0:1"
L1_umary_LINE1_2,1,813866,C,<INS:ME:LINE1>,.,.,"TSD=null;SVTYPE=LINE1;MEINFO=LINE1,4049,4300,+...",GT:GL:AVGPOST,"0|0:0.0,-5.42,-10.0:1","0|0:0.0,-8.39,-10.0:1",...,"0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-8.97,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.62,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.58,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-9.57,-10.0:1"


## Selecting rows and columns

You can select columns by indexing with the column name.
You can select rows by label using ```dataframe.loc[row_label]``` (older versions of Pandas also offered ```.ix```, which was removed in Pandas 1.0).
You can also select a single cell by giving both the row and column labels.

In [27]:
sv_genotypes['HG00096']

ID
ALU_umary_ALU_2                                        0|0:-0.0,-0.0,-0.0:0.993
DUP_delly_DUP20532            0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...
DUP_gs_CNV_1_713044_755966    0|0:2:-1000.0,-16.51,-0.0,-5.4,-16.27:-1000.00...
UW_VH_21763                                                                 0|0
UW_VH_5595                                                                  0|0
DUP_gs_CNV_1_773090_852664    0|0:2:-1000.0,-44.48,-0.0,-12.43,-39.08:-1000....
YL_CN_IBS_6                                                                 0|0
BI_GS_DEL1_B5_P0001_52                                                      0|0
L1_umary_LINE1_1                                      0|0:0.0,-1.2,-10.0:0.9995
L1_umary_LINE1_2                                          0|0:0.0,-5.42,-10.0:1
UW_VH_6483                                                                  0|0
DEL_pindel_5                               0|0:1,0:-0.000130308,-3.52288,-5.0:1
UW_VH_22703                          

In [28]:
sv_genotypes.loc['DEL_pindel_5']

#CHROM                                                     1
POS                                                   939918
REF        AAGCAGCAGCACCAGGCTGGTGCCACTGCCACCCCACCTGCATGCC...
ALT                                                        A
QUAL                                                       .
FILTER                                                  PASS
INFO           SVTYPE=DEL;SVLEN=-50;END=939968;CS=DEL_pindel
FORMAT                                      GT:AD:GL:AVGPOST
HG00096                 0|0:1,0:-0.000130308,-3.52288,-5.0:1
HG00097                             0|0:14,0:0.0,-5.0,-5.0:1
HG00099                              0|0:7,0:0.0,-5.0,-5.0:1
HG00100                             0|0:11,0:0.0,-5.0,-5.0:1
HG00101                     0|0:2,0:-4.34316e-05,-4.0,-5.0:1
HG00102                              0|0:5,0:0.0,-5.0,-5.0:1
HG00103                              0|0:4,0:0.0,-5.0,-5.0:1
...
NA21123     0|0:9,0:0.0,-5.0,-5.0:1
NA21124     0|0:6,0:0.0,-5.0,-5.0:1
NA21125  

In [29]:
sv_genotypes.loc['DEL_pindel_5','HG00096']

'0|0:1,0:-0.000130308,-3.52288,-5.0:1'
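
The three kinds of selection can be put side by side on a small example. This sketch uses a hypothetical dataframe with made-up IDs and the label-based ```.loc``` indexer; ```.at``` and ```.iloc``` are shown as related options and are not used in the session above.

```python
import pandas as pd

# Hypothetical genotype table indexed by a made-up ID column.
df = pd.DataFrame(
    {"POS": [645710, 668630], "HG00096": ["0|0", "0|1"]},
    index=pd.Index(["sv_a", "sv_b"], name="ID"),
)

# Column by name: returns a Series indexed by ID.
column = df["HG00096"]

# Row by label: returns a Series indexed by column name.
row = df.loc["sv_b"]

# Single cell by row label and column name.
cell = df.loc["sv_b", "HG00096"]

# .at is a scalar-only accessor for the same lookup.
cell_fast = df.at["sv_b", "HG00096"]

# .iloc selects by integer position instead of label.
first_row = df.iloc[0]
```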

## Summary statistics

Pandas dataframes have methods to calculate a variety of statistics including mean, median, standard deviation and others.

By default these are calculated per column; many can be calculated per row instead by passing ```axis=1```. If a method doesn't take an ```axis``` option, you can transpose the dataframe using ```dataframe.T```.


In [66]:
gene_expression.mean(numeric_only=True)

Coord      74053040.639406
HG00096          34.801341
HG00097          34.401585
HG00099          28.925257
HG00100          36.830319
HG00101          34.313935
HG00102          35.189094
HG00103          33.357601
HG00104          35.126686
HG00105          34.005152
HG00106          36.515449
HG00108          35.791778
HG00109          35.697977
HG00110          34.521691
HG00111          35.912282
...
NA20805    33.884594
NA20806    34.839451
NA20807    35.703652
NA20808    36.120202
NA20809    35.819684
NA20810    35.684760
NA20811    35.558003
NA20812    34.302086
NA20813    34.489164
NA20814    34.523332
NA20815    33.162467
NA20816    34.046450
NA20819    35.812278
NA20826    34.043210
NA20828    35.529893
Length: 463, dtype: float64
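
To make the column-versus-row distinction concrete, here is a minimal sketch on a hypothetical dataframe. It assumes a recent Pandas version, where ```numeric_only=True``` is needed to skip text columns, and it restricts the row means to the sample columns so that annotation values do not distort them.

```python
import pandas as pd

# Hypothetical expression values for two genes and two samples.
df = pd.DataFrame(
    {"Chr": ["5", "7"], "HG00096": [1.0, 3.0], "HG00097": [2.0, 6.0]},
    index=["gene1", "gene2"],
)

# Default axis: one mean per column; numeric_only skips the text column Chr.
col_means = df.mean(numeric_only=True)

# Keep only the sample columns before averaging across each row.
samples = df[["HG00096", "HG00097"]]
row_means = samples.mean(axis=1)

# Transposing and using the default axis gives the same per-gene values.
row_means_via_T = samples.T.mean()
```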

In [30]:
gene_expression.mean(axis=1, numeric_only=True)

TargetID
ENSG00000152931.6     129122.189635
ENSG00000183696.9     103954.654489
ENSG00000139269.2     124939.874540
ENSG00000169129.8     250896.544537
ENSG00000134602.11    283302.495214
ENSG00000136237.12     48375.560929
ENSG00000259425.1      49885.317670
ENSG00000242284.2     291477.882645
ENSG00000235027.1       3848.101684
ENSG00000228169.3     251613.522603
ENSG00000260083.1      66710.759187
ENSG00000247157.2      25272.265296
ENSG00000158482.8      46205.911943
ENSG00000146072.5     102113.058692
ENSG00000183814.10    489200.444096
...
ENSG00000151092.12     55822.831685
ENSG00000243680.1     113901.374640
ENSG00000157045.4      32746.028823
ENSG00000172058.10    151626.416724
ENSG00000143994.9      59064.185270
ENSG00000087095.7      56958.306142
ENSG00000261559.1      75145.543623
ENSG00000162144.4     132060.054437
ENSG00000129473.5      51342.081110
ENSG00000261205.1     163164.916968
ENSG00000235472.1      63042.358972
ENSG00000114423.14    228066.432300
ENSG00000243312

In [31]:
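# Keep only the sample columns (label-based slice from 'HG00096' to the last column).
# .loc replaces the removed .ix indexer.
gene_expression_data = gene_expression.loc[:, 'HG00096':]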

In [32]:
gene_expression_data

Unnamed: 0_level_0,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00104,HG00105,HG00106,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
TargetID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ENSG00000152931.6,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,-0.000901,-0.006706,0.098863,0.045285,...,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.029204,0.024423,0.044816,0.139186
ENSG00000183696.9,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,7.348876,8.180940,8.721889,8.169477,...,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,4.052882,1.570378,4.900372,6.737308
ENSG00000139269.2,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,0.675305,3.817395,2.561376,1.231049,...,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,1.004250,3.003316,1.984362,1.684954
ENSG00000169129.8,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,1.259393,0.734784,1.479124,1.548653,...,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,0.826377,1.021201,0.952502,0.740565
ENSG00000134602.11,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,27.881116,27.194117,28.579857,27.226416,...,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,26.462331,25.624009,25.707741,22.824957
ENSG00000136237.12,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,2.500333,4.509277,2.344625,2.358093,...,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,2.145022,3.557598,4.152063,1.216834
ENSG00000259425.1,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,0.105538,0.058953,0.062852,0.028203,...,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.081050,0.070438,0.049859,0.017376
ENSG00000242284.2,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,2.954486,0.726429,0.033625,-0.013543,...,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,0.053560,-0.099780,0.447343,0.002862
ENSG00000235027.1,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,0.104912,0.134485,0.264147,0.165510,...,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.207399,0.288634,0.125602,0.141499
ENSG00000228169.3,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,70.680737,103.441921,90.962547,99.270960,...,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,94.643402,114.259370,96.073384,105.326429


In [33]:
gene_expression_data.mean(axis=1)

TargetID
ENSG00000152931.6       0.073163
ENSG00000183696.9       6.017377
ENSG00000139269.2       2.285523
ENSG00000169129.8       1.266495
ENSG00000134602.11     25.459489
ENSG00000136237.12      2.427944
ENSG00000259425.1       0.071604
ENSG00000242284.2       0.575031
ENSG00000235027.1       0.201472
ENSG00000228169.3     101.012912
ENSG00000260083.1       1.165592
ENSG00000247157.2       0.205264
ENSG00000158482.8       4.381449
ENSG00000146072.5       1.526352
ENSG00000183814.10      4.838996
...
ENSG00000151092.12     31.257727
ENSG00000243680.1     194.892766
ENSG00000157045.4      24.870876
ENSG00000172058.10     14.153557
ENSG00000143994.9       0.077446
ENSG00000087095.7       5.441003
ENSG00000261559.1      12.479865
ENSG00000162144.4      30.377065
ENSG00000129473.5       7.325874
ENSG00000261205.1       0.172200
ENSG00000235472.1      33.857585
ENSG00000114423.14     13.770898
ENSG00000243312.2       0.643022
ENSG00000257337.1       5.199455
ENSG00000177494.5       2.3432

In [34]:
gene_expression_data.std(axis=1)

TargetID
ENSG00000152931.6      0.062742
ENSG00000183696.9      2.505384
ENSG00000139269.2      1.261633
ENSG00000169129.8      2.717542
ENSG00000134602.11     4.070719
ENSG00000136237.12     1.431972
ENSG00000259425.1      0.152495
ENSG00000242284.2      0.718110
ENSG00000235027.1      0.074831
ENSG00000228169.3     12.712324
ENSG00000260083.1      0.365180
ENSG00000247157.2      0.224509
ENSG00000158482.8      1.431686
ENSG00000146072.5      1.741669
ENSG00000183814.10     3.267470
...
ENSG00000151092.12     8.755830
ENSG00000243680.1     41.232212
ENSG00000157045.4      5.606588
ENSG00000172058.10     3.462172
ENSG00000143994.9      0.071614
ENSG00000087095.7      0.659892
ENSG00000261559.1      3.700453
ENSG00000162144.4     11.965967
ENSG00000129473.5      3.411215
ENSG00000261205.1      0.092325
ENSG00000235472.1      4.304333
ENSG00000114423.14     5.194206
ENSG00000243312.2      0.254310
ENSG00000257337.1      1.577778
ENSG00000177494.5      2.383513
Length: 23722, dtype: float64

In [35]:
gene_expression_data.var(axis=1)

TargetID
ENSG00000152931.6       0.003937
ENSG00000183696.9       6.276949
ENSG00000139269.2       1.591718
ENSG00000169129.8       7.385035
ENSG00000134602.11     16.570751
ENSG00000136237.12      2.050544
ENSG00000259425.1       0.023255
ENSG00000242284.2       0.515682
ENSG00000235027.1       0.005600
ENSG00000228169.3     161.603186
ENSG00000260083.1       0.133357
ENSG00000247157.2       0.050404
ENSG00000158482.8       2.049724
ENSG00000146072.5       3.033410
ENSG00000183814.10     10.676363
...
ENSG00000151092.12      76.664554
ENSG00000243680.1     1700.095305
ENSG00000157045.4       31.433827
ENSG00000172058.10      11.986635
ENSG00000143994.9        0.005129
ENSG00000087095.7        0.435458
ENSG00000261559.1       13.693354
ENSG00000162144.4      143.184357
ENSG00000129473.5       11.636387
ENSG00000261205.1        0.008524
ENSG00000235472.1       18.527284
ENSG00000114423.14      26.979781
ENSG00000243312.2        0.064674
ENSG00000257337.1        2.489384
ENSG00000177494.

In [36]:
gene_expression_data.corr()

Unnamed: 0,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00104,HG00105,HG00106,...,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20816,NA20819,NA20826,NA20828
HG00096,1.000000,0.977666,0.295469,0.955398,0.990047,0.956303,0.987966,0.906630,0.977152,0.979747,...,0.982624,0.980512,0.988884,0.954015,0.952729,0.980860,0.967849,0.943306,0.966035,0.938119
HG00097,0.977666,1.000000,0.363809,0.969242,0.970921,0.965421,0.972996,0.891733,0.978876,0.980893,...,0.976353,0.975788,0.972010,0.963634,0.963047,0.975033,0.973983,0.957075,0.963886,0.948473
HG00099,0.295469,0.363809,1.000000,0.305989,0.289100,0.323338,0.293967,0.295968,0.368270,0.317390,...,0.319494,0.241056,0.307183,0.346788,0.304273,0.402153,0.337069,0.284690,0.328068,0.352665
HG00100,0.955398,0.969242,0.305989,1.000000,0.953408,0.936164,0.942655,0.839483,0.941429,0.945902,...,0.949535,0.956343,0.944533,0.940629,0.920458,0.942890,0.941660,0.924753,0.933075,0.903385
HG00101,0.990047,0.970921,0.289100,0.953408,1.000000,0.944966,0.977866,0.883980,0.967598,0.972179,...,0.979151,0.974937,0.987588,0.957140,0.940822,0.976235,0.961027,0.938103,0.959284,0.930312
HG00102,0.956303,0.965421,0.323338,0.936164,0.944966,1.000000,0.964189,0.902065,0.978041,0.976338,...,0.976058,0.963728,0.953624,0.936728,0.948938,0.943275,0.982256,0.953349,0.986004,0.982876
HG00103,0.987966,0.972996,0.293967,0.942655,0.977866,0.964189,1.000000,0.922696,0.980886,0.980176,...,0.988509,0.979348,0.987505,0.959505,0.959060,0.976647,0.976777,0.946645,0.969152,0.952684
HG00104,0.906630,0.891733,0.295968,0.839483,0.883980,0.902065,0.922696,1.000000,0.915206,0.931143,...,0.911417,0.911736,0.902232,0.883691,0.928204,0.901609,0.909250,0.879239,0.912149,0.907617
HG00105,0.977152,0.978876,0.368270,0.941429,0.967598,0.978041,0.980886,0.915206,1.000000,0.986701,...,0.986454,0.970943,0.977098,0.955641,0.959678,0.972102,0.985202,0.959664,0.980382,0.965956
HG00106,0.979747,0.980893,0.317390,0.945902,0.972179,0.976338,0.980176,0.931143,0.986701,1.000000,...,0.983624,0.985810,0.978382,0.959090,0.969748,0.968944,0.983001,0.963216,0.978369,0.964524


In [37]:
sv_genotypes.T

ID,ALU_umary_ALU_2,DUP_delly_DUP20532,DUP_gs_CNV_1_713044_755966,UW_VH_21763,UW_VH_5595,DUP_gs_CNV_1_773090_852664,YL_CN_IBS_6,BI_GS_DEL1_B5_P0001_52,L1_umary_LINE1_1,L1_umary_LINE1_2,...,UW_VH_19471,DUP_gs_CNV_2_228240601_228258237,ALU_umary_ALU_2032,BI_GS_DEL1_B4_P0479_67,BI_GS_DEL1_B5_P0479_487,ALU_umary_ALU_2033,BI_GS_DEL1_B3_P0479_33,CINV_delly_INV00063055,DEL_pindel_6601,UW_VH_14812
#CHROM,1,1,1,1,1,1,1,1,1,1,...,2,2,2,2,2,2,2,2,2,2
POS,645710,668630,713044,738570,766600,773090,775292,794496,812283,813866,...,228237555,228240601,228258186,228268940,228383167,228466216,228604432,228608595,228631053,228651316
REF,A,G,C,G,G,T,T,G,G,C,...,G,G,A,C,C,T,T,G,TAATTATTTCCATTTTGTAGATAAAGAAAACAAGGCGCACAGAGGC...,G
ALT,<INS:ME:ALU>,<CN2>,"<CN0>,<CN2>",<CN0>,<CN0>,"<CN0>,<CN2>",<CN0>,<CN0>,<INS:ME:LINE1>,<INS:ME:LINE1>,...,<CN0>,<CN2>,<INS:ME:ALU>,<CN0>,<CN0>,<INS:ME:ALU>,<CN0>,<INV>,T,<CN0>
QUAL,.,.,.,100,100,.,100,100,.,.,...,100,.,.,100,100,.,100,.,.,100
FILTER,.,PASS,PASS,PASS,PASS,PASS,PASS,PASS,.,.,...,PASS,PASS,.,PASS,PASS,.,PASS,PASS,PASS,PASS
INFO,"TSD=null;SVTYPE=ALU;MEINFO=AluYa4_5,1,223,-;SV...","SVTYPE=DUP;SVLEN=181574;IMPRECISE;CIEND=-150,1...",SVTYPE=CNV;END=755966;CS=DUP_gs,"SVTYPE=DEL;SVLEN=3801;CIEND=0,354;CIPOS=-348,0...","SVTYPE=DEL;SVLEN=2842;CIEND=0,403;CIPOS=-385,0...",SVTYPE=CNV;END=852664;CS=DUP_gs,"SVTYPE=DEL;SVLEN=-16677;CIEND=-500,1000;CIPOS=...","SVTYPE=DEL;SVLEN=-5051;CIEND=-8,8;CIPOS=-8,8;E...","TSD=null;SVTYPE=LINE1;MEINFO=LINE1,2926,3363,+...","TSD=null;SVTYPE=LINE1;MEINFO=LINE1,4049,4300,+...",...,"SVTYPE=DEL;SVLEN=5941;CIEND=0,153;CIPOS=-130,0...",SVTYPE=DUP;END=228258237;CS=DUP_gs,"TSD=null;SVTYPE=ALU;MEINFO=AluUndef,1,281,-;SV...","SVTYPE=DEL;SVLEN=-1146;CIEND=-42,42;CIPOS=-42,...","SVTYPE=DEL;SVLEN=-613;CIEND=-2,2;CIPOS=-2,2;EN...","TSD=null;SVTYPE=ALU;MEINFO=AluUndef,25,281,-;S...","SVTYPE=DEL;SVLEN=-8777;CIEND=-58,58;CIPOS=-58,...","SVTYPE=INV;SVLEN=4820;IMPRECISE;CIEND=-172,172...",SVTYPE=DEL;SVLEN=-139;END=228631192;CS=DEL_pindel,"SVTYPE=DEL;SVLEN=6971;CIEND=0,287;CIPOS=-136,0..."
FORMAT,GT:GL:AVGPOST,GT:FT:CN:GL:CNL:AVGPOST,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,GT,GT,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,GT,GT,GT:GL:AVGPOST,GT:GL:AVGPOST,...,GT,GT:CN:CNL:CNP:CNQ:GP:GQ:PL:AVGPOST,GT:GL:AVGPOST,GT,GT,GT:GL:AVGPOST,GT,GT,GT:AD:GL:AVGPOST,GT
HG00096,"0|0:-0.0,-0.0,-0.0:0.993","0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...","0|0:2:-1000.0,-16.51,-0.0,-5.4,-16.27:-1000.00...",0|0,0|0,"0|0:2:-1000.0,-44.48,-0.0,-12.43,-39.08:-1000....",0|0,0|0,"0|0:0.0,-1.2,-10.0:0.9995","0|0:0.0,-5.42,-10.0:1",...,0|0,"0|0:2:-1000.0,-29.29,-0.0,-9.68,-29.1:-1000.00...","0|0:0.0,-1.2,-10.0:0.9995",0|0,0|0,"0|0:0.0,-1.2,-10.0:0.9706",0|0,0|0,"0|0:6,0:-0.000130308,-3.52288,-5.0:1",0|0
HG00097,"0|0:-0.0,-0.0,-0.0:0.9905","0|0:PASS:2:0.0,-10.0,-10.0:-300.0,-300.0,-0.0,...","0|0:2:-1000.0,-39.76,-0.0,-9.23,-30.7:-1000.00...",0|0,0|0,"0|0:2:-1000.0,-84.91,0.0,-26.6,-81.06:-1000.00...",0|0,0|0,"0|0:0.0,-6.02,-10.0:1","0|0:0.0,-8.39,-10.0:1",...,0|0,"0|0:2:-1000.0,-75.53,-0.0,-15.64,-54.06:-1000....","0|0:0.0,-9.62,-10.0:1",0|0,0|0,"0|0:0.0,-9.02,-10.0:1",0|0,0|0,"0|0:7,0:-4.34316e-05,-4.0,-5.0:1",0|0


In [50]:
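# Correlate every gene with one reference gene across samples (.loc replaces the removed .ix indexer).
gene_expression_data.T.corrwith(gene_expression_data.loc['ENSG00000152931.6'])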

TargetID
ENSG00000152931.6     1.000000
ENSG00000183696.9    -0.019021
ENSG00000139269.2     0.010370
ENSG00000169129.8     0.002388
ENSG00000134602.11   -0.036752
ENSG00000136237.12    0.022918
ENSG00000259425.1    -0.039332
ENSG00000242284.2     0.041292
ENSG00000235027.1    -0.093954
ENSG00000228169.3     0.072138
ENSG00000260083.1     0.038304
ENSG00000247157.2    -0.044422
ENSG00000158482.8    -0.026646
ENSG00000146072.5    -0.007551
ENSG00000183814.10    0.019022
...
ENSG00000151092.12    0.041370
ENSG00000243680.1     0.035233
ENSG00000157045.4     0.000987
ENSG00000172058.10   -0.030164
ENSG00000143994.9    -0.007899
ENSG00000087095.7    -0.003103
ENSG00000261559.1    -0.009301
ENSG00000162144.4    -0.113226
ENSG00000129473.5     0.001165
ENSG00000261205.1     0.022513
ENSG00000235472.1     0.060102
ENSG00000114423.14   -0.083344
ENSG00000243312.2    -0.061556
ENSG00000257337.1    -0.005940
ENSG00000177494.5    -0.019077
Length: 23722, dtype: float64
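The ```corrwith``` call above can be hard to read at first, so here is a small sketch of the same pattern on hypothetical toy data. Transposing puts genes in the columns, and ```corrwith``` then correlates each column with the reference series, matching values by sample name:

```python
import pandas as pd

# Hypothetical toy data: rows are genes, columns are samples.
toy = pd.DataFrame(
    {
        "s1": [1.0, 2.0, 4.0],
        "s2": [2.0, 4.0, 3.0],
        "s3": [3.0, 6.0, 1.0],
        "s4": [4.0, 8.0, 0.0],
    },
    index=["gene_1", "gene_2", "gene_3"],
)

reference = toy.loc["gene_1"]              # Series indexed by sample name
correlations = toy.T.corrwith(reference)   # one Pearson correlation per gene

print(correlations)
```

Here ```gene_2``` is an exact multiple of ```gene_1```, so it correlates perfectly, while ```gene_3``` falls as ```gene_1``` rises and gets a negative value.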

In [67]:
gene_expression_data.columns

Index([u'HG00096', u'HG00097', u'HG00099', u'HG00100', u'HG00101', u'HG00102', u'HG00103', u'HG00104', u'HG00105', u'HG00106', u'HG00108', u'HG00109', u'HG00110', u'HG00111', u'HG00112', u'HG00114', u'HG00115', u'HG00116', u'HG00117', u'HG00118', u'HG00119', u'HG00120', u'HG00121', u'HG00122', u'HG00123', u'HG00124', u'HG00125', u'HG00126', u'HG00127', u'HG00128', u'HG00129', u'HG00130', u'HG00131', u'HG00132', u'HG00133', u'HG00134', u'HG00135', u'HG00136', u'HG00137', u'HG00138', u'HG00139', u'HG00141', u'HG00142', u'HG00143', u'HG00145', u'HG00146', u'HG00148', u'HG00149', u'HG00150', u'HG00151', u'HG00152', u'HG00154', u'HG00155', u'HG00156', u'HG00157', u'HG00158', u'HG00159', u'HG00160', u'HG00171', u'HG00173', u'HG00174', u'HG00176', u'HG00177', u'HG00178', u'HG00179', u'HG00180', u'HG00181', u'HG00182', u'HG00183', u'HG00185', u'HG00186', u'HG00187', u'HG00188', u'HG00189', u'HG00231', u'HG00232', u'HG00233', u'HG00234', u'HG00235', u'HG00236', u'HG00238', u'HG00239', u'HG00240

In [73]:
sv_genotypes[sv_genotypes.columns.intersection(gene_expression_data.columns)]

Unnamed: 0_level_0,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00105,HG00106,HG00108,...,NA20809,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20819,NA20826,NA20828
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ALU_umary_ALU_2,"0|0:-0.0,-0.0,-0.0:0.993","0|0:-0.0,-0.0,-0.0:0.9905","0|0:-0.0,-0.0,-0.0:0.9914","0|0:0.0,-3.06,-10.0:1","0|0:0.0,-2.24,-10.0:1","0|0:0.0,-3.43,-10.0:1","0|0:-0.0,-0.0,-0.0:0.9939","0|0:0.0,-3.59,-10.0:1","0|0:-0.0,-0.0,-0.0:0.9905","0|0:-0.0,-0.0,-0.0:0.9955",...,"0|0:0.0,-2.97,-10.0:1","0|0:0.0,-1.35,-8.14:0.9993","0|0:0.0,-1.38,-5.67:0.9996","0|0:0.0,-0.95,-3.85:0.9992","0|0:-0.0,-0.0,-0.0:0.9929","0|0:0.0,-1.19,-4.68:0.9994","0|0:-0.0,-0.0,-0.0:0.9936","0|0:0.0,-0.83,-3.23:0.999","0|0:0.0,-1.86,-7.25:0.9999","0|0:0.0,-0.76,-2.96:0.9988"
DUP_delly_DUP20532,"0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-10.0,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.91,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-9.45,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.8,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-8.56,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-10.0,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.85,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.5,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-10.0,-10.0:-300.0,-300.0,-0.0,...",...,"0|0:PASS:2:0.0,-9.86,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-9.95,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-4.26,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-5.8,-10.0:-300.0,-300.0,-0.0,-...","0|0:PASS:2:0.0,-9.54,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-9.39,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-9.35,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-7.78,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.82,-10.0:-300.0,-300.0,-0.0,...","0|0:PASS:2:0.0,-8.75,-10.0:-300.0,-300.0,-0.0,..."
DUP_gs_CNV_1_713044_755966,"0|0:2:-1000.0,-16.51,-0.0,-5.4,-16.27:-1000.00...","0|0:2:-1000.0,-39.76,-0.0,-9.23,-30.7:-1000.00...","0|0:2:-1000.0,-36.85,-0.0,-6.68,-24.23:-1000.0...","0|0:2:-1000.0,-37.42,0.0,-17.08,-47.77:-1000.0...","0|0:2:-1000.0,-28.54,-0.0,-6.72,-22.24:-1000.0...","0|0:2:-1000.0,-22.99,-0.0,-7.89,-23.49:-1000.0...","0|0:2:-307.28,-12.66,-0.0,-7.81,-20.73:-311.62...","0|0:2:-1000.0,-27.1,-0.0,-7.05,-22.63:-1000.00...",0|0:.:.:.:.:.:.:.:0.9887,"0|0:2:-1000.0,-20.39,-0.0,-5.63,-17.75:-1000.0...",...,"0|0:2:-1000.0,-33.66,-0.0,-8.43,-27.38:-1000.0...","0|0:2:-1000.0,-31.12,-0.0,-6.99,-23.5:-1000.00...","2|0:3:-1000.0,-69.28,-8.12,-0.0,-5.12:-1000.00...","2|0:3:-1000.0,-94.6,-13.11,-0.0,-3.97:-1000.00...","0|0:2:-1000.0,-21.27,-0.0,-5.6,-17.9:-1000.00,...","0|0:2:-1000.0,-24.88,-0.0,-4.19,-15.65:-1000.0...","0|0:2:-1000.0,-21.29,-0.0,-5.39,-17.45:-1000.0...","0|0:2:-1000.0,-36.54,-0.0,-4.7,-19.69:-1000.00...","0|0:2:-1000.0,-23.91,-0.0,-4.71,-16.57:-1000.0...",0|0:.:.:.:.:.:.:.:0.9788
UW_VH_21763,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
UW_VH_5595,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
DUP_gs_CNV_1_773090_852664,"0|0:2:-1000.0,-44.48,-0.0,-12.43,-39.08:-1000....","0|0:2:-1000.0,-84.91,0.0,-26.6,-81.06:-1000.00...","0|0:2:-1000.0,-90.62,0.0,-16.27,-59.24:-1000.0...","0|0:2:-1000.0,-110.92,0.0,-38.04,-113.31:-1000...","0|0:2:-1000.0,-63.06,0.0,-17.07,-54.15:-1000.0...","0|0:2:-1000.0,-69.55,-0.0,-12.37,-45.22:-1000....","0|0:2:-1000.0,-50.87,-0.0,-12.55,-40.94:-1000....","0|0:2:-1000.0,-55.9,0.0,-18.65,-55.92:-1000.00...",0|0:.:.:.:.:.:.:.:0.9886,"0|0:2:-1000.0,-41.75,-0.0,-15.78,-45.93:-1000....",...,"0|0:2:-1000.0,-83.6,0.0,-24.57,-76.18:-1000.00...","0|0:2:-1000.0,-68.81,0.0,-25.35,-74.22:-1000.0...","0|0:2:-1000.0,-52.69,-0.0,-14.77,-46.4:-1000.0...","0|0:2:-1000.0,-57.57,0.0,-19.13,-57.43:-1000.0...","0|0:2:-1000.0,-62.92,-0.0,-12.9,-44.74:-1000.0...","0|0:2:-1000.0,-50.33,0.0,-16.28,-49.2:-1000.00...","0|0:2:-1000.0,-51.08,0.0,-16.1,-48.99:-1000.00...","0|0:2:-1000.0,-74.01,0.0,-21.4,-66.65:-1000.00...","0|0:2:-1000.0,-69.54,-0.0,-10.96,-42.03:-1000....",0|0:.:.:.:.:.:.:.:0.9848
YL_CN_IBS_6,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,...,0|0,0|0,0|0,0|1,0|0,0|0,0|0,0|0,0|0,0|0
BI_GS_DEL1_B5_P0001_52,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,...,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0,0|0
L1_umary_LINE1_1,"0|0:0.0,-1.2,-10.0:0.9995","0|0:0.0,-6.02,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.02,-10.0:1","0|0:0.0,-4.21,-10.0:1","0|0:0.0,-3.01,-10.0:1","0|0:0.0,-3.61,-10.0:1","0|0:0.0,-9.04,-10.0:1","0|0:0.0,-7.21,-10.0:1",...,"0|0:0.0,-7.22,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-4.82,-10.0:1","0|0:0.0,-7.83,-10.0:1","0|0:0.0,-4.21,-10.0:1","0|0:0.0,-4.21,-10.0:1","0|0:0.0,-9.63,-10.0:1","0|0:0.0,-9.63,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.02,-10.0:1"
L1_umary_LINE1_2,"0|0:0.0,-5.42,-10.0:1","0|0:0.0,-8.39,-10.0:1","0|0:0.0,-4.8,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-7.82,-10.0:1","0|0:0.0,-6.61,-10.0:1","0|0:0.0,-3.61,-10.0:1","0|0:0.0,-6.02,-10.0:1","0|0:0.0,-4.2,-10.0:1","0|0:0.0,-9.57,-10.0:1",...,"0|0:0.0,-10.0,-10.0:1","0|0:0.0,-7.8,-10.0:1","0|0:0.0,-7.78,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-7.82,-10.0:1","0|0:0.0,-6.62,-10.0:1","0|0:0.0,-7.71,-10.0:1","0|0:0.0,-10.0,-10.0:1","0|0:0.0,-6.0,-10.0:1","0|0:0.0,-8.96,-10.0:1"


In [75]:
gene_expression_data[sv_genotypes.columns.intersection(gene_expression_data.columns)]

Unnamed: 0_level_0,HG00096,HG00097,HG00099,HG00100,HG00101,HG00102,HG00103,HG00105,HG00106,HG00108,...,NA20809,NA20810,NA20811,NA20812,NA20813,NA20814,NA20815,NA20819,NA20826,NA20828
TargetID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
ENSG00000152931.6,0.101858,0.078110,0.048981,0.118597,0.004035,0.010925,-0.000901,0.098863,0.045285,0.076937,...,0.015805,0.088601,0.240010,0.137175,0.148494,0.038643,0.088509,0.024423,0.044816,0.139186
ENSG00000183696.9,8.183805,5.686911,2.434653,3.830894,6.612288,4.709646,7.348876,8.721889,8.169477,4.090099,...,5.311783,13.428205,6.094500,12.536000,2.217262,3.573394,7.583364,1.570378,4.900372,6.737308
ENSG00000139269.2,1.199910,1.573572,0.521616,1.447225,3.565791,1.982681,0.675305,2.561376,1.231049,1.269440,...,2.284592,3.225880,1.996067,2.854923,2.267343,1.331201,2.187895,3.003316,1.984362,1.684954
ENSG00000169129.8,0.831940,0.069778,0.931086,0.620941,1.660668,0.570481,1.259393,1.479124,1.548653,1.010198,...,0.229538,1.023381,1.127852,0.774409,1.495854,0.895342,1.513521,1.021201,0.952502,0.740565
ENSG00000134602.11,27.646422,24.395572,16.445374,24.806650,25.113349,19.233988,27.881116,28.579857,27.226416,25.384714,...,25.412772,25.079490,28.725528,24.450520,27.264069,26.912814,29.509210,25.624009,25.707741,22.824957
ENSG00000136237.12,3.788503,2.050963,4.000313,3.271619,1.798216,1.516688,2.500333,2.344625,2.358093,2.729958,...,3.679770,2.909393,1.921176,5.083873,2.866573,1.297788,2.888316,3.557598,4.152063,1.216834
ENSG00000259425.1,0.054059,0.112185,0.003592,0.000500,0.029398,0.031266,0.105538,0.062852,0.028203,0.009935,...,-0.001498,0.022056,0.010224,0.000204,0.059104,0.066048,0.013943,0.070438,0.049859,0.017376
ENSG00000242284.2,0.351716,0.444540,0.227708,0.714112,0.450912,0.491438,2.954486,0.033625,-0.013543,0.427457,...,2.125361,0.816645,1.682329,0.686780,1.207540,0.088764,0.962397,-0.099780,0.447343,0.002862
ENSG00000235027.1,0.200791,0.190138,0.092925,0.108790,0.232448,0.250905,0.104912,0.264147,0.165510,0.135443,...,0.162897,0.221223,0.254004,0.294359,0.172155,0.135213,0.121265,0.288634,0.125602,0.141499
ENSG00000228169.3,96.182178,101.179262,58.783063,105.483527,105.818192,136.140843,70.680737,90.962547,99.270960,64.743452,...,98.608328,107.094412,103.412712,81.928224,106.792474,100.898778,100.945145,114.259370,96.073384,105.326429


## Writing files

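You can use the ```to_*``` methods to save dataframes. Just like the read methods, there are options for different file types (```to_csv```, ```to_excel```, ```to_sql``` etc). There is no ```to_table``` method; for tab-delimited output, use ```to_csv``` with ```sep='\t'```.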
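A minimal sketch of writing a dataframe to a tab-delimited file and reading it back. The toy data, the file name and the temporary directory are assumptions made for illustration only:

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data with a named index, mimicking the TargetID index used above.
toy = pd.DataFrame(
    {"sample_a": [0.5, 1.5], "sample_b": [2.5, 3.5]},
    index=pd.Index(["gene_1", "gene_2"], name="TargetID"),
)

# Write to a temporary location so the example leaves no files behind in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "toy_expression.txt")
toy.to_csv(out_path, sep="\t")   # the index is written as the first column by default

# Read the file back, restoring the index from the TargetID column.
round_trip = pd.read_csv(out_path, sep="\t", index_col="TargetID")
print(round_trip)
```

Passing ```index=False``` to ```to_csv``` leaves the index out of the file, which is useful when it is just the default row numbers.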