## **Pandas Library - Continuation**



In [14]:
import numpy as np
import pandas as pd
from numpy.random import randn

* **Building a data structure**

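The code below zips two lists into `(outer, inner)` pairs and passes them to `pd.MultiIndex.from_tuples`, which turns them into a hierarchical index (`MultiIndex`) in pandas: `'p1'`/`'p2'` form the outer level and `1, 2, 3` the inner level.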

In [30]:
outs = ['p1', 'p1', 'p1', 'p2', 'p2','p2']
incs = [1, 2, 3, 1, 2, 3]
index = list(zip(outs, incs))  # [('p1', 1), ('p1', 2), ('p1', 3), ('p2', 1), ('p2', 2), ('p2', 3)]
index = pd.MultiIndex.from_tuples(index)

In [31]:
index

MultiIndex([('p1', 1),
            ('p1', 2),
            ('p1', 3),
            ('p2', 1),
            ('p2', 2),
            ('p2', 3)],
           )
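`from_tuples` is not the only constructor. As a minimal sketch reusing the same two lists, `pd.MultiIndex.from_arrays` takes the lists directly (no `zip` needed), and `pd.MultiIndex.from_product` builds every combination of the unique outer and inner labels. All three produce an equal index here:

```python
import pandas as pd

outs = ['p1', 'p1', 'p1', 'p2', 'p2', 'p2']
incs = [1, 2, 3, 1, 2, 3]

# Same hierarchical index built three ways
from_tuples = pd.MultiIndex.from_tuples(list(zip(outs, incs)))
from_arrays = pd.MultiIndex.from_arrays([outs, incs])
# from_product forms the Cartesian product of the unique labels
from_product = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]])

print(from_tuples.equals(from_arrays))   # True
print(from_tuples.equals(from_product))  # True
print(from_tuples.nlevels)               # 2
print(list(from_tuples.levels[0]))       # ['p1', 'p2']
```

`from_product` is the shortest option when every outer label has the same set of inner labels, as in this example.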

In [33]:
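# 6 rows x 2 columns of standard-normal random numbers (values change on every run)
df = pd.DataFrame(randn(6, 2), index=index, columns=['Z', 'Y'])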
df


|    |   | Z | Y |
|----|---|---|---|
| p1 | 1 | -0.02587 | 0.615694 |
| p1 | 2 | 1.266278 | -0.628414 |
| p1 | 3 | 0.468007 | 0.241864 |
| p2 | 1 | -0.743729 | -0.183771 |
| p2 | 2 | 0.24835 | 0.862016 |
| p2 | 3 | -1.044059 | 0.608146 |
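Because `randn` draws new numbers on every run, the values in the table above will not match a rerun. A minimal sketch for reproducible data, assuming NumPy's `default_rng` generator is acceptable in place of `randn` (the seed `42` is an arbitrary choice):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]])

# Seeded generator: seed 42 is arbitrary, used only so reruns match
rng = np.random.default_rng(42)
data = rng.standard_normal((6, 2))

# Keyword arguments make the roles of index and columns explicit
df = pd.DataFrame(data, index=index, columns=['Z', 'Y'])
print(df.shape)  # (6, 2)

# Re-seeding with the same value reproduces exactly the same numbers
df_again = pd.DataFrame(np.random.default_rng(42).standard_normal((6, 2)),
                        index=index, columns=['Z', 'Y'])
print(df.equals(df_again))  # True
```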


In [None]:
# Select the outer label 'p1', then the inner label 3 from that sub-frame
df.loc['p1'].loc[3]

Z    0.468007
Y    0.241864
Name: 3, dtype: float64
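The chained `.loc['p1'].loc[3]` above does the lookup in two steps. A single `.loc` call with a `(outer, inner)` tuple reaches the same row in one step. For checkable output, this sketch replaces the random data with the integers 0 to 11 (an illustration-only assumption):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]])
# Deterministic stand-in data (0..11) instead of randn, so results are checkable
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

chained = df.loc['p1'].loc[3]   # two steps: outer level, then inner level
tupled = df.loc[('p1', 3)]      # one step: a full (outer, inner) key
print(chained.tolist())         # [4, 5]
print(tupled.tolist())          # [4, 5]

# Adding a column label returns a single scalar
print(df.loc[('p1', 3), 'Z'])   # 4
```

The two results differ only in their `name` (`3` versus `('p1', 3)`).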

In [48]:
df.index.names = ['Groups', 'List']
df

| Groups | List | Z | Y |
|--------|------|---|---|
| p1 | 1 | -0.02587 | 0.615694 |
| p1 | 2 | 1.266278 | -0.628414 |
| p1 | 3 | 0.468007 | 0.241864 |
| p2 | 1 | -0.743729 | -0.183771 |
| p2 | 2 | 0.24835 | 0.862016 |
| p2 | 3 | -1.044059 | 0.608146 |
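Assigning to `df.index.names` modifies `df` in place. As an alternative sketch, `DataFrame.rename_axis` returns a renamed copy, and the names can also be passed when the index is first built (integer data 0..11 is an illustration-only stand-in for `randn`):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

# rename_axis returns a new DataFrame instead of modifying df in place
named = df.rename_axis(['Groups', 'List'])
print(list(named.index.names))  # ['Groups', 'List']
print(list(df.index.names))     # [None, None]  (original unchanged)

# Names can also be given when the index is constructed
index2 = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]],
                                    names=['Groups', 'List'])
print(list(index2.names))       # ['Groups', 'List']
```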


In [49]:
df.loc['p1']

| List | Z | Y |
|------|---|---|
| 1 | -0.02587 | 0.615694 |
| 2 | 1.266278 | -0.628414 |
| 3 | 0.468007 | 0.241864 |


In [51]:
df.loc['p2']

| List | Z | Y |
|------|---|---|
| 1 | -0.743729 | -0.183771 |
| 2 | 0.24835 | 0.862016 |
| 3 | -1.044059 | 0.608146 |
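Note that `df.loc['p2']` returns a frame indexed only by `List`: selecting a single outer label drops the `Groups` level. Passing a list of labels instead keeps both levels, as this sketch shows (integer data 0..11 stands in for `randn` for illustration only):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]],
                                   names=['Groups', 'List'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

# A scalar label drops the outer level from the result ...
print(list(df.loc['p2'].index.names))    # ['List']
# ... while a list of labels keeps both levels
print(list(df.loc[['p2']].index.names))  # ['Groups', 'List']
print(df.loc['p2']['Z'].tolist())        # [6, 8, 10]
```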


In [52]:
df.loc['p2'].loc[2]

Z    0.248350
Y    0.862016
Name: 2, dtype: float64

In [53]:
df

| Groups | List | Z | Y |
|--------|------|---|---|
| p1 | 1 | -0.02587 | 0.615694 |
| p1 | 2 | 1.266278 | -0.628414 |
| p1 | 3 | 0.468007 | 0.241864 |
| p2 | 1 | -0.743729 | -0.183771 |
| p2 | 2 | 0.24835 | 0.862016 |
| p2 | 3 | -1.044059 | 0.608146 |


* **`df.xs()` (cross section):** selects the data at a given label of a specific level of a `MultiIndex`.
It lets you quickly "slice" rows (or, with `axis=1`, columns) by index value, and by default it drops the selected level from the result. For example, `df.xs('p1', level='Groups')` selects all rows where the index level **"Groups" equals "p1"**.
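A short sketch of the `xs` options mentioned above: selecting by level name, keeping the selected level with `drop_level=False`, and cross-sectioning columns with `axis=1`. The integer data 0..11 is an illustration-only stand-in for the random values:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]],
                                   names=['Groups', 'List'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

# Select by the named outer level; 'Groups' is dropped by default
p1 = df.xs('p1', level='Groups')
print(list(p1.index))  # [1, 2, 3]

# drop_level=False keeps the selected level in the result
p1_full = df.xs('p1', level='Groups', drop_level=False)
print(list(p1_full.index.names))  # ['Groups', 'List']

# axis=1 cross-sections the columns instead of the rows
print(df.xs('Z', axis=1).tolist())  # [0, 2, 4, 6, 8, 10]
```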

In [55]:
# Another function: df.xs ("cross section")
# Without parentheses the method is not called; Jupyter only displays the bound method and df
df.xs

<bound method NDFrame.xs of                     Z         Y
Groups List                    
p1     1    -0.025870  0.615694
       2     1.266278 -0.628414
       3     0.468007  0.241864
p2     1    -0.743729 -0.183771
       2     0.248350  0.862016
       3    -1.044059  0.608146>

In [58]:
df.xs('p1')

| List | Z | Y |
|------|---|---|
| 1 | -0.02587 | 0.615694 |
| 2 | 1.266278 | -0.628414 |
| 3 | 0.468007 | 0.241864 |
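With no `level` argument, `df.xs('p1')` looks up the outer level, so it returns the same frame as `df.loc['p1']`. `xs` also accepts a full `(outer, inner)` tuple to fetch one row. A minimal sketch, again with integers 0..11 standing in for the random data:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]],
                                   names=['Groups', 'List'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

# Default level is the outermost one, matching .loc with a single label
print(df.xs('p1').equals(df.loc['p1']))  # True

# A tuple key walks both levels and returns a single row as a Series
row = df.xs(('p1', 2))
print(row.tolist())  # [2, 3]
```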


In [61]:
# Rename the index levels: the inner level 'List' becomes 'Num'
df.index.names = ['Groups', 'Num']
df

| Groups | Num | Z | Y |
|--------|-----|---|---|
| p1 | 1 | -0.02587 | 0.615694 |
| p1 | 2 | 1.266278 | -0.628414 |
| p1 | 3 | 0.468007 | 0.241864 |
| p2 | 1 | -0.743729 | -0.183771 |
| p2 | 2 | 0.24835 | 0.862016 |
| p2 | 3 | -1.044059 | 0.608146 |


In [62]:
# Cross-section on the inner level: every row whose 'Num' value is 1
df.xs(1, level='Num')

| Groups | Z | Y |
|--------|---|---|
| p1 | -0.02587 | 0.615694 |
| p2 | -0.743729 | -0.183771 |
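Selecting on an inner level is where `xs` is most convenient; with `.loc` alone you cannot simply pass the inner label. As a sketch for comparison, the same rows can be obtained with a boolean mask on `get_level_values` or with `pd.IndexSlice`; both keep the `Num` level, so it is dropped afterwards to match `xs`. Integer data 0..11 stands in for the random values:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['p1', 'p2'], [1, 2, 3]],
                                   names=['Groups', 'Num'])
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=['Z', 'Y'])

via_xs = df.xs(1, level='Num')

# Boolean mask on the inner level's values; the mask keeps 'Num', so drop it
mask = df.index.get_level_values('Num') == 1
via_mask = df[mask].droplevel('Num')

# IndexSlice: ':' means every label on the outer level, 1 on the inner level
via_slice = df.loc[pd.IndexSlice[:, 1], :].droplevel('Num')

print(via_xs.equals(via_mask), via_xs.equals(via_slice))  # True True
print(via_xs['Z'].tolist())  # [0, 6]
```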
