# Advanced Pandas Techniques

This notebook covers advanced Pandas features and techniques for more complex data manipulation and analysis.

In [1]:
import pandas as pd
import numpy as np

print("Pandas imported successfully!")
print(f"Pandas version: {pd.__version__}")

Pandas imported successfully!
Pandas version: 2.3.0


## Advanced Indexing with MultiIndex

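A `MultiIndex` (hierarchical index) gives a single axis two or more levels of labels. This lets a two-dimensional DataFrame hold grouped or higher-dimensional data, which you can then select by level with `.loc` and `.xs()` and reshape with `stack()` and `unstack()`.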
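The `In [2]` cell uses random numbers, so its printed values change on every run. The sketch below shows the same selection rules on fixed values so each result can be checked by eye. The values and the names `demo_index`, `demo`, `outer_a`, `row`, `inner_one` and `by_first` are illustrative assumptions, not part of the notebook. It builds the two-level index with `pd.MultiIndex.from_product`, which takes the Cartesian product of the label lists and yields the same labels as the `from_arrays` and `from_tuples` calls in that cell.

```python
import pandas as pd

# Cartesian product of the two label lists -> (A, one), (A, two), (B, one), (B, two)
demo_index = pd.MultiIndex.from_product([["A", "B"], ["one", "two"]],
                                        names=["first", "second"])

# Fixed (hypothetical) values instead of random ones, so results are predictable
demo = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0],
                     "Y": [10.0, 20.0, 30.0, 40.0]}, index=demo_index)

# Each level can be read back as a flat Index
print(demo_index.get_level_values("first"))

# Selecting an outer label drops that level: rows are indexed by 'second' only
outer_a = demo.loc["A"]
print(outer_a)

# A full (first, second) tuple selects one row and returns a Series
row = demo.loc[("B", "two")]
print(row)

# xs() selects on an inner level by name; that level is dropped from the result
inner_one = demo.xs("one", level="second")
print(inner_one)

# Aggregate per outer level directly, without reset_index()
by_first = demo.groupby(level="first").mean()
print(by_first)
```

Because `loc` with a partial key drops the matched level, `outer_a` is a plain DataFrame indexed by `second`, while a complete key collapses to a single row.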

In [2]:
# MultiIndex Examples
import pandas as pd
import numpy as np

# Create MultiIndex from arrays
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['X', 'Y'])
print("DataFrame with MultiIndex:")
print(df)

# Create MultiIndex from tuples
tuples = [('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')]
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df_tuples = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['X', 'Y'])
print("\nDataFrame with MultiIndex from tuples:")
print(df_tuples)

# Accessing data with MultiIndex
print("\nAccessing level 0 'A':")
print(df.loc['A'])

print("\nAccessing specific combination ('A', 'one'):")
print(df.loc[('A', 'one')])

# Cross-section with xs()
print("\nCross-section for second level 'one':")
print(df.xs('one', level='second'))

# MultiIndex for columns
columns = pd.MultiIndex.from_arrays([['Math', 'Math', 'Science', 'Science'], ['Midterm', 'Final', 'Midterm', 'Final']])
df_multi_col = pd.DataFrame(np.random.randint(50, 100, (3, 4)), columns=columns)
print("\nDataFrame with MultiIndex columns:")
print(df_multi_col)

# Accessing MultiIndex columns
print("\nAccessing Math scores:")
print(df_multi_col['Math'])

print("\nAccessing Midterm scores for all subjects:")
print(df_multi_col.xs('Midterm', axis=1, level=1))

# Stacking and unstacking: stack() moves the column labels (X, Y) into a new innermost index level; unstack() reverses it
stacked = df.stack()
print("\nStacked DataFrame:")
print(stacked)

unstacked = stacked.unstack()
print("\nUnstacked DataFrame:")
print(unstacked)

# Grouping by the former index levels (each (first, second) pair is unique here, so the mean returns the original values)
df_multi = df.reset_index()
grouped = df_multi.groupby(['first', 'second']).mean()
print("\nGrouped by MultiIndex levels:")
print(grouped)

DataFrame with MultiIndex:
                     X         Y
first second                    
A     one    -0.410269 -0.434187
      two    -0.831213  1.315004
B     one    -0.554495 -0.293842
      two     0.243295  0.448708

DataFrame with MultiIndex from tuples:
                     X         Y
first second                    
A     one     1.055299 -2.013790
      two    -0.065415 -0.628388
B     one    -1.778786 -0.902640
      two     1.028609  1.390480

Accessing level 0 'A':
               X         Y
second                    
one    -0.410269 -0.434187
two    -0.831213  1.315004

Accessing specific combination ('A', 'one'):
X   -0.410269
Y   -0.434187
Name: (A, one), dtype: float64

Cross-section for second level 'one':
              X         Y
first                    
A     -0.410269 -0.434187
B     -0.554495 -0.293842

DataFrame with MultiIndex columns:
     Math       Science      
  Midterm Final Midterm Final
0      70    81      98    82
1      85    55      59    58
2