# Tutorial 4: Multi-level column indices (`MultiIndex`)

A MultiIndex is an index with multiple, hierarchical levels. We use a multiindex for the column headers of some 
dataframes. This allows each column to have multiple keys associated with it: one key for each level. 

The multiindex is a native part of the `pandas` package. For more documentation, see their [MultiIndex / advanced indexing page](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html).
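
As a minimal sketch of the idea, the block below builds a small dataframe with two-level column headers directly in `pandas`. The values and the patient labels `P1` and `P2` are synthetic and only for illustration; they are not CPTAC data.

```python
import numpy as np
import pandas as pd

# Synthetic example: each column is identified by a (Name, Site) pair
columns = pd.MultiIndex.from_tuples(
    [("AAAS", "S495"), ("AAAS", "S541"), ("AACS", "S618")],
    names=["Name", "Site"],
)
toy = pd.DataFrame(
    [[-0.2, np.nan, -0.88], [0.56, -0.05, np.nan]],
    index=pd.Index(["P1", "P2"], name="Patient_ID"),
    columns=columns,
)

print(toy.columns.nlevels)            # number of levels in the column index
print(list(toy.columns.names))        # the level names
print(toy["AAAS"].columns.tolist())   # the Site keys recorded under one Name key
```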

MultiIndexes make it easy to associate multiple identifiers with a single column. To see what this looks like, we'll look at the phosphoproteomics data in the endometrial cancer dataset.

In [1]:
# Throughout the tutorial we will be using the endometrial and colon datasets
import cptac
cptac.download(dataset="endometrial", version="latest")
cptac.download(dataset="colon", version="latest")


True

In [2]:
en = cptac.Endometrial()
phospho = en.get_phosphoproteomics()
phospho.head()


Name,AAAS,AAAS,AAAS,AACS,AAED1,AAGAB,AAGAB,AAK1,AAK1,AAK1,...,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3
Site,S495,S541,Y485,S618,S12,S310,S311,S14,S18,S20,...,S397,S411,S420,S424,S426,S468,S89,T415,T418,Y399
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,,,,-0.881,-1.81,,,,-0.242,-0.242,...,0.184,,,,-0.205,,,,,
C3L-00008,,,,,0.084,,,-1.11,-0.383,-1.09,...,-0.171,,,-0.393,-0.171,,0.29,,0.1605,-0.0635
C3L-00032,-0.202,,,,-1.88,,,,0.382,-0.0416,...,,,,,,,,,,
C3L-00090,-0.002,,-0.407,,,,,,,-0.555,...,0.1397,,,,-0.559,,,,,0.298
C3L-00098,0.556,-0.0461,,,0.941,,0.429,0.362,0.697,-0.0529,...,-0.15875,,,0.196,0.06175,,,,,-0.29


The first level of keys is the name of the gene associated with the column of data. The second level is the site of phosphorylation. Each column has its own Name and Site associated with it. 
When pandas renders the dataframe, keys that repeat for adjacent columns are normally shown only once, to make the table easier to read. For example, the first three columns in the table above are all for the AAAS gene, so in the rendered table the label appears only over the first of them. 

# Join functions with multiindices 
The join functions have been written to handle multiindices. More information on the join functions can be found in the joining_dataframes tutorial. 
An example of joining a multiindexed dataframe (in this case phosphoproteomics) with a non-multiindexed dataframe (in this case CNV) is below. 

In [3]:
phospho_and_CNV = en.join_omics_to_omics(df1_name="CNV", df2_name="phosphoproteomics")
phospho_and_CNV.head()



Name,A1BG_CNV,A1BG-AS1_CNV,A1CF_CNV,A2M_CNV,A2M-AS1_CNV,A2ML1_CNV,A2MP1_CNV,A3GALT2_CNV,A4GALT_CNV,A4GNT_CNV,...,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics
Site,,,,,,,,,,,...,S397,S411,S420,S424,S426,S468,S89,T415,T418,Y399
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,0.0,0.0,-0.01,0.0,0.0,0.0,0.0,0.0,0.01,-0.01,...,0.184,,,,-0.205,,,,,
C3L-00008,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,-0.171,,,-0.393,-0.171,,0.29,,0.1605,-0.0635
C3L-00032,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,,,,,,,,,,
C3L-00090,-0.04,-0.04,0.36,-0.05,-0.05,-0.05,-0.05,-0.04,-0.04,-0.05,...,0.1397,,,,-0.559,,,,,0.298
C3L-00098,0.85,0.85,0.16,-0.41,-0.41,-0.41,-0.41,-0.12,0.15,0.19,...,-0.15875,,,0.196,0.06175,,,,,-0.29


Since the CNV dataframe doesn't have a "Site" level, that level is left empty for the CNV columns so that the dataframe can be joined to the phosphoproteomics dataframe. 
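
To make the padding step concrete, here is a rough plain-`pandas` sketch of joining a single-level dataframe to a multiindexed one. It is not the `cptac` implementation: the toy values, the way the `_CNV` suffix is added, and the choice of an empty string as the filler key are all assumptions made for illustration.

```python
import pandas as pd

patients = pd.Index(["P1", "P2"], name="Patient_ID")

# Single-level columns, like the CNV table (synthetic values)
cnv = pd.DataFrame({"A1BG": [0.0, 0.01], "A2M": [0.0, -0.05]}, index=patients)

# Two-level columns, like the phosphoproteomics table (synthetic values)
phospho_toy = pd.DataFrame(
    [[0.184, -0.205], [-0.171, 0.29]],
    index=patients,
    columns=pd.MultiIndex.from_tuples(
        [("ZZZ3", "S397"), ("ZZZ3", "S426")], names=["Name", "Site"]
    ),
)

# Give the single-level table a matching "Site" level, filled with a placeholder
cnv_padded = cnv.copy()
cnv_padded.columns = pd.MultiIndex.from_arrays(
    [[f"{gene}_CNV" for gene in cnv.columns], [""] * cnv.shape[1]],
    names=["Name", "Site"],
)

# Both tables now have the same number of column levels and can sit side by side
joined = pd.concat([cnv_padded, phospho_toy], axis=1)
print(joined.columns.tolist())
```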

# How to select from multiindex


## Selecting based on all levels
We can select single columns by passing the proper keys for all levels of the multiindex. For example, to get the acetylproteomics for A2M at site K1176, we'd do the following:

In [4]:
acetyl = en.get_acetylproteomics()
all_levels_selection = acetyl["A2M"]["K1176"]
all_levels_selection.head(10)

Patient_ID
C3L-00006    1.080
C3L-00008    0.477
C3L-00032      NaN
C3L-00090   -0.608
C3L-00098    1.630
C3L-00136   -0.210
C3L-00137   -0.175
C3L-00139   -0.223
C3L-00143   -1.120
C3L-00145      NaN
Name: K1176, dtype: float64
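
The chained lookup above walks through the levels one at a time. Plain `pandas` also accepts the full key as a tuple, which makes the same selection in a single step. The sketch below compares the forms on a synthetic dataframe (the site `K1168` and all values are made up for illustration).

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("A2M", "K1168"), ("A2M", "K1176"), ("AACS", "K391")],
    names=["Name", "Site"],
)
toy = pd.DataFrame(
    [[0.1, 1.08, 0.3], [0.2, 0.477, 0.4]],
    index=pd.Index(["P1", "P2"], name="Patient_ID"),
    columns=columns,
)

chained = toy["A2M"]["K1176"]            # two lookups, one level at a time
direct = toy[("A2M", "K1176")]           # one lookup with the full key
via_loc = toy.loc[:, ("A2M", "K1176")]   # the same selection written with .loc

print(chained.name)  # only the innermost key is kept as the Series name
print(direct.name)   # the full tuple is kept as the Series name
```

All three return the same values; the only visible difference is the name attached to the resulting Series.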

## Selecting based on one level
We can easily select multiple columns from our multiindex dataframe, based on just the "Name" level of the multiindex:

In [5]:
# Build a boolean filter that is True for every column whose "Name" key starts with "AA"
gene1_filter = acetyl.columns.get_level_values("Name").str.startswith("AA")
gene1_data = acetyl.loc[:, gene1_filter]
gene1_data.head()

Name,AACS,AAGAB,AAK1,AARS,AASDH,AASS,AASS,AASS,AATF,AATF
Site,K391,K290,K201,K338,K765,K138,K539,K93,K380,K74
Patient_ID,,,,,,,,,,
C3L-00006,,0.461,,,,,,,1.3,
C3L-00008,,1.77,,,,,,,0.652,
C3L-00032,,-0.815,-0.00573,,,,,-0.305,0.398,
C3L-00090,,,,,,,,,,
C3L-00098,,0.205,,,,,,,0.244,


## Selecting based on a different level of the multiindex
We can also select based on one of the inner levels of the multiindex. For example, to get data for all tyrosine phosphorylation sites:

In [6]:
y_site_filter = phospho.columns.get_level_values("Site").str.contains("Y") # Create a boolean filter selecting all columns where the Site level contains a "Y"

y_sites = phospho.loc[:, y_site_filter] # Select the columns
y_sites.head()

Name,AAAS,AAK1,ABCC1,ABI1,ABI2,ABI3,ABL1,ABL2,ABLIM1,ABLIM1,...,ZNF799,ZNF839,ZNF860,ZNRF2,ZRANB2,ZRANB2,ZRSR2,ZYX,ZZEF1,ZZZ3
Site,Y485,Y687,Y920,Y213,Y213,Y361,Y204,Y80,Y357,Y439,...,Y430,Y283,Y49,Y111,Y114,Y124,Y347,Y172,Y1795,Y399
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,,,0.434,-0.591,,,-0.759,,,0.312,...,,,,,,0.181,0.788,-0.378,,
C3L-00008,,,-0.511,,,,-1.21,0.321,,,...,,0.0684,,,,-0.929,,-1.23,,-0.0635
C3L-00032,,,,,,,,,,,...,,,,,,-0.283,0.675,0.7,,
C3L-00090,-0.407,,1.268,,,,-0.173,,,,...,,,,,,-0.123,-0.026,-1.02,,0.298
C3L-00098,,-0.4315,-0.0685,,,,-0.965,0.657,,,...,,,,-0.058,,0.109,,-0.636,,-0.29
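
Two variations on inner-level selection may be useful, sketched here on synthetic data. `str.startswith` is stricter than `str.contains`, since it only matches keys that begin with the given letter; whether the difference matters depends on how site labels are written in a given table. `DataFrame.xs` selects an exact key on a named level. The toy dataframe and its values are assumptions for illustration only.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("AAAS", "S495"), ("AAAS", "Y485"), ("AAK1", "Y687"), ("ZZZ3", "S397")],
    names=["Name", "Site"],
)
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0]],
    index=pd.Index(["P1"], name="Patient_ID"),
    columns=columns,
)

# Boolean mask over the "Site" level: True where the key begins with "Y"
y_mask = toy.columns.get_level_values("Site").str.startswith("Y")
y_only = toy.loc[:, y_mask]
print(y_only.columns.tolist())

# Exact match on one inner-level key; by default xs drops the matched level
one_site = toy.xs("Y485", axis=1, level="Site")
print(one_site.columns.tolist())
```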


# How to use `cptac.utils.reduce_multiindex()`
To make it easier to work with multi-level indices, we provide the `reduce_multiindex` function, available for import from the `cptac.utils` submodule. It can both drop levels from a multiindex, and "flatten" a multi-level index into a single-level index by concatenating the keys from multiple levels into a single key for each column.

In [7]:
import cptac.utils as ut

## Dropping Levels
We can drop levels by index or by name, and we can drop one level or several at once. 
Note that `reduce_multiindex` will warn you if dropping levels produces duplicate column key combinations. 

### Dropping by index or name

In [8]:
ut.reduce_multiindex(df=phospho, levels_to_drop="Site").head()



Name,AAAS,AAAS,AAAS,AACS,AAED1,AAGAB,AAGAB,AAK1,AAK1,AAK1,...,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,,,,-0.881,-1.81,,,,-0.242,-0.242,...,0.184,,,,-0.205,,,,,
C3L-00008,,,,,0.084,,,-1.11,-0.383,-1.09,...,-0.171,,,-0.393,-0.171,,0.29,,0.1605,-0.0635
C3L-00032,-0.202,,,,-1.88,,,,0.382,-0.0416,...,,,,,,,,,,
C3L-00090,-0.002,,-0.407,,,,,,,-0.555,...,0.1397,,,,-0.559,,,,,0.298
C3L-00098,0.556,-0.0461,,,0.941,,0.429,0.362,0.697,-0.0529,...,-0.15875,,,0.196,0.06175,,,,,-0.29
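
For comparison, plain `pandas` offers `DataFrame.droplevel`, which likewise takes either a level name or a level position. The sketch below is not how `reduce_multiindex` is implemented; it only illustrates, on synthetic data, that the two ways of naming a level pick out the same one.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("AAAS", "S495"), ("AAAS", "S541"), ("AACS", "S618")],
    names=["Name", "Site"],
)
toy = pd.DataFrame([[1.0, 2.0, 3.0]], columns=columns)

by_name = toy.droplevel("Site", axis=1)   # refer to the level by its name
by_position = toy.droplevel(1, axis=1)    # refer to the same level by position (0-based)

print(by_name.columns.tolist())
print(by_name.columns.equals(by_position.columns))
```

Unlike `reduce_multiindex`, `droplevel` does not warn about the duplicate `AAAS` headers it leaves behind.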


### Dropping single or multiple levels at once
By passing a list (or array-like) to `levels_to_drop`, we can drop multiple levels of the multiindex at the same time. Note that we must leave at least one existing level. 

We will show this with the colon data.

In [9]:
colon = cptac.Colon()
phospho = colon.get_phosphoproteomics()
phospho.head()


Name,AAAS,AAAS,AAAS,AAED1,AAGAB,AAGAB,AAK1,AAK1,AAK1,AAK1,...,ZZEF1,ZZEF1,ZZEF1,ZZEF1,ZZEF1,ZZZ3,ZZZ3,ZZZ3,ZZZ3,ZZZ3
Site,S495,S525,S541,S12,S310,S311,S20,S21,S26,S618,...,S1501,S1518,S1537,S1540,T1521,S113,S391,S606,S90,S91
Database_ID,Q9NRG9,Q9NRG9,Q9NRG9,Q7RTV5,Q6PD74,Q6PD74,Q2M2I8,Q2M2I8,Q2M2I8,Q2M2I8,...,O43149,O43149,O43149,O43149,O43149,Q8IYH5,Q8IYH5,Q8IYH5,Q8IYH5,Q8IYH5
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,


In [10]:
# Drop every level except Database_ID
drop = ["Name", "Site"]
ut.reduce_multiindex(df=phospho, levels_to_drop=drop).head()



Database_ID,Q9NRG9,Q9NRG9,Q9NRG9,Q7RTV5,Q6PD74,Q6PD74,Q2M2I8,Q2M2I8,Q2M2I8,Q2M2I8,...,O43149,O43149,O43149,O43149,O43149,Q8IYH5,Q8IYH5,Q8IYH5,Q8IYH5,Q8IYH5
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,


## Combining levels (Flattening)

We can combine the levels of a multiindexed dataframe. By default, the combined keys are separated by an underscore. We can specify a different separator using the `sep` parameter.

In [11]:
ut.reduce_multiindex(df=phospho, flatten=True).head()

Name,AAAS_S495_Q9NRG9,AAAS_S525_Q9NRG9,AAAS_S541_Q9NRG9,AAED1_S12_Q7RTV5,AAGAB_S310_Q6PD74,AAGAB_S311_Q6PD74,AAK1_S20_Q2M2I8,AAK1_S21_Q2M2I8,AAK1_S26_Q2M2I8,AAK1_S618_Q2M2I8,...,ZZEF1_S1501_O43149,ZZEF1_S1518_O43149,ZZEF1_S1537_O43149,ZZEF1_S1540_O43149,ZZEF1_T1521_O43149,ZZZ3_S113_Q8IYH5,ZZZ3_S391_Q8IYH5,ZZZ3_S606_Q8IYH5,ZZZ3_S90_Q8IYH5,ZZZ3_S91_Q8IYH5
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,


When flattening levels, NaNs and empty strings will automatically be dropped.

In [12]:
phospho_and_CNV = en.join_omics_to_omics(df1_name="CNV", df2_name="phosphoproteomics")
phospho_and_CNV.head()

# Note that the CNV columns all have empty strings in the "Site" level of the columns,
# since the CNV data doesn't have any values for that.



Name,A1BG_CNV,A1BG-AS1_CNV,A1CF_CNV,A2M_CNV,A2M-AS1_CNV,A2ML1_CNV,A2MP1_CNV,A3GALT2_CNV,A4GALT_CNV,A4GNT_CNV,...,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics,ZZZ3_phosphoproteomics
Site,,,,,,,,,,,...,S397,S411,S420,S424,S426,S468,S89,T415,T418,Y399
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,0.0,0.0,-0.01,0.0,0.0,0.0,0.0,0.0,0.01,-0.01,...,0.184,,,,-0.205,,,,,
C3L-00008,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,-0.171,,,-0.393,-0.171,,0.29,,0.1605,-0.0635
C3L-00032,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,,,,,,,,,,
C3L-00090,-0.04,-0.04,0.36,-0.05,-0.05,-0.05,-0.05,-0.04,-0.04,-0.05,...,0.1397,,,,-0.559,,,,,0.298
C3L-00098,0.85,0.85,0.16,-0.41,-0.41,-0.41,-0.41,-0.12,0.15,0.19,...,-0.15875,,,0.196,0.06175,,,,,-0.29


In [13]:
ut.reduce_multiindex(df=phospho_and_CNV, flatten=True).head()
# Notice that the empty strings have been dropped

Name,A1BG_CNV,A1BG-AS1_CNV,A1CF_CNV,A2M_CNV,A2M-AS1_CNV,A2ML1_CNV,A2MP1_CNV,A3GALT2_CNV,A4GALT_CNV,A4GNT_CNV,...,ZZZ3_phosphoproteomics_S397,ZZZ3_phosphoproteomics_S411,ZZZ3_phosphoproteomics_S420,ZZZ3_phosphoproteomics_S424,ZZZ3_phosphoproteomics_S426,ZZZ3_phosphoproteomics_S468,ZZZ3_phosphoproteomics_S89,ZZZ3_phosphoproteomics_T415,ZZZ3_phosphoproteomics_T418,ZZZ3_phosphoproteomics_Y399
Patient_ID,,,,,,,,,,,,,,,,,,,,,
C3L-00006,0.0,0.0,-0.01,0.0,0.0,0.0,0.0,0.0,0.01,-0.01,...,0.184,,,,-0.205,,,,,
C3L-00008,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,-0.171,,,-0.393,-0.171,,0.29,,0.1605,-0.0635
C3L-00032,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,,,,,,,,,,
C3L-00090,-0.04,-0.04,0.36,-0.05,-0.05,-0.05,-0.05,-0.04,-0.04,-0.05,...,0.1397,,,,-0.559,,,,,0.298
C3L-00098,0.85,0.85,0.16,-0.41,-0.41,-0.41,-0.41,-0.12,0.15,0.19,...,-0.15875,,,0.196,0.06175,,,,,-0.29
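
The flattening behaviour described above can be approximated in plain `pandas`. The helper `flatten_columns` below is hypothetical (it is not part of `cptac` and is not how `reduce_multiindex` is written); it simply joins each column's keys with a separator and skips keys that are NaN or empty strings. All data is synthetic.

```python
import numpy as np
import pandas as pd

def flatten_columns(df, sep="_"):
    """Join each column's level keys into one string, skipping NaN and empty keys."""
    flat = [
        sep.join(str(key) for key in keys if not pd.isna(key) and key != "")
        for keys in df.columns  # iterating a MultiIndex yields one tuple per column
    ]
    result = df.copy()
    result.columns = pd.Index(flat, name=df.columns.names[0])
    return result

columns = pd.MultiIndex.from_tuples(
    [("A1BG_CNV", ""), ("A2M_CNV", np.nan), ("ZZZ3_phosphoproteomics", "S397")],
    names=["Name", "Site"],
)
toy = pd.DataFrame([[0.0, 0.0, 0.184]], columns=columns)

flat = flatten_columns(toy)
print(flat.columns.tolist())

# A different separator, analogous to the `sep` parameter mentioned above
dotted = flatten_columns(toy, sep=".")
print(dotted.columns.tolist())
```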


## Getting a single level index of tuples

You can also use `reduce_multiindex` to turn the multi-level column index into a single level index of tuples, with each value in a column's tuple corresponding to the column's value for that level of the index:

In [14]:
ut.reduce_multiindex(df=phospho, tuples=True).head()

,"(AAAS, S495, Q9NRG9)","(AAAS, S525, Q9NRG9)","(AAAS, S541, Q9NRG9)","(AAED1, S12, Q7RTV5)","(AAGAB, S310, Q6PD74)","(AAGAB, S311, Q6PD74)","(AAK1, S20, Q2M2I8)","(AAK1, S21, Q2M2I8)","(AAK1, S26, Q2M2I8)","(AAK1, S618, Q2M2I8)",...,"(ZZEF1, S1501, O43149)","(ZZEF1, S1518, O43149)","(ZZEF1, S1537, O43149)","(ZZEF1, S1540, O43149)","(ZZEF1, T1521, O43149)","(ZZZ3, S113, Q8IYH5)","(ZZZ3, S391, Q8IYH5)","(ZZZ3, S606, Q8IYH5)","(ZZZ3, S90, Q8IYH5)","(ZZZ3, S91, Q8IYH5)"
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,
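
Plain `pandas` has a comparable operation, `MultiIndex.to_flat_index`, which returns a single-level index whose entries are tuples. The sketch below uses synthetic values and is meant only to show what such an index looks like; it is not taken from the `reduce_multiindex` source.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("AAAS", "S495", "Q9NRG9"), ("AAED1", "S12", "Q7RTV5")],
    names=["Name", "Site", "Database_ID"],
)
toy = pd.DataFrame([[0.725, -0.46]], columns=columns)

tupled = toy.copy()
tupled.columns = toy.columns.to_flat_index()  # one level, each key is a tuple

print(tupled.columns.nlevels)
print(tupled.columns[0])
```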


## Turning off warnings

If your multiindex operation creates duplicate column headers, or has no effect, `reduce_multiindex` will warn you. You can silence these warnings by passing `True` to the `quiet` parameter:

In [15]:
# Dropping these levels can leave duplicate Site keys, which triggers a warning
ut.reduce_multiindex(df=phospho, levels_to_drop=["Name", "Database_ID"]).head()



Site,S495,S525,S541,S12,S310,S311,S20,S21,S26,S618,...,S1501,S1518,S1537,S1540,T1521,S113,S391,S606,S90,S91
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,


In [16]:
# No warning will be issued
ut.reduce_multiindex(df=phospho, levels_to_drop=["Name", "Database_ID"], quiet=True).head()

Site,S495,S525,S541,S12,S310,S311,S20,S21,S26,S618,...,S1501,S1518,S1537,S1540,T1521,S113,S391,S606,S90,S91
Patient_ID,,,,,,,,,,,,,,,,,,,,,
01CO005,,,-0.24,-0.46,,,-0.231,,,,...,,-0.675,-1.404,-1.404,,-0.572,,0.205,,
01CO006,-0.365,,,-0.424,-0.015,-0.015,,-0.485,,,...,,-0.2875,0.222,0.222,-0.701,0.624,,,,
01CO008,0.725,0.137,0.137,,,,,,,,...,-0.147,-0.147,,,,,,,,-0.03
01CO013,0.2265,,,-1.278,0.403,0.075,-0.223,-0.701,,,...,-0.041,0.043,0.554,0.554,0.127,1.263,,,,
01CO014,0.56,,,-0.382,,,,-0.259,,,...,,-0.129,-0.919,-0.919,,0.032,,0.026,,
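
If warnings are silenced, it can still be worth checking for duplicate headers yourself. One way to do that in plain `pandas`, sketched here on synthetic data, is `Index.duplicated`. The toy dataframe is an assumption for illustration: it has the same site label under two different genes, so dropping the "Name" level leaves two columns with the same key.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("AAAS", "S495"), ("AAK1", "S20"), ("ZZZ3", "S20")],
    names=["Name", "Site"],
)
toy = pd.DataFrame([[1.0, 2.0, 3.0]], columns=columns)

site_only = toy.droplevel("Name", axis=1)

# keep=False marks every member of a duplicated group, not just the repeats
dup_mask = site_only.columns.duplicated(keep=False)
dup_labels = site_only.columns[dup_mask].unique().tolist()

print(site_only.columns.is_unique)
print(dup_labels)
```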
