#### Pandas Tutorial - Part 44

This notebook covers several Series and DataFrame methods:
- Dropping levels with `droplevel()`
- Handling missing values with `dropna()`
- Interpolating missing values with `interpolate()`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### Working with MultiIndex Levels

A MultiIndex (hierarchical index) gives a pandas axis several levels of labels, so each row or column is identified by a tuple of labels rather than a single one.

###### The `droplevel()` Method

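The `droplevel()` method removes one or more levels from a MultiIndex and returns a new object. Levels can be named or given by integer position, and on a DataFrame the `axis` argument chooses between the row index (`axis=0`, the default) and the columns (`axis=1`).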

In [2]:
# Create a DataFrame with MultiIndex
df = pd.DataFrame([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
]).set_index([0, 1]).rename_axis(['a', 'b'])

# Set MultiIndex for columns
df.columns = pd.MultiIndex.from_tuples([
    ('c', 'e'), ('d', 'f')
], names=['level_1', 'level_2'])

print("DataFrame with MultiIndex:")
print(df)

DataFrame with MultiIndex:
level_1   c   d
level_2   e   f
a b            
1 2       3   4
5 6       7   8
9 10     11  12


In [3]:
# Drop level from index
df_drop_a = df.droplevel('a')
print("DataFrame after dropping level 'a' from index:")
print(df_drop_a)

DataFrame after dropping level 'a' from index:
level_1   c   d
level_2   e   f
b              
2         3   4
6         7   8
10       11  12


In [4]:
# Drop level from columns
df_drop_level2 = df.droplevel('level_2', axis=1)
print("DataFrame after dropping level 'level_2' from columns:")
print(df_drop_level2)

DataFrame after dropping level 'level_2' from columns:
level_1   c   d
a b            
1 2       3   4
5 6       7   8
9 10     11  12


In [5]:
# Create a Series with MultiIndex
midx = pd.MultiIndex.from_tuples([('a', 'x'), ('a', 'y'), ('b', 'x'), ('b', 'y')],
                                names=['level_1', 'level_2'])
s = pd.Series([1, 2, 3, 4], index=midx)
print("Series with MultiIndex:")
print(s)

Series with MultiIndex:
level_1  level_2
a        x          1
         y          2
b        x          3
         y          4
dtype: int64


In [6]:
# Drop level from Series index
s_drop_level1 = s.droplevel('level_1')
print("Series after dropping level 'level_1':")
print(s_drop_level1)

Series after dropping level 'level_1':
level_2
x    1
y    2
x    3
y    4
dtype: int64


In [7]:
# Drop level by position
s_drop_pos = s.droplevel(0)
print("Series after dropping level at position 0:")
print(s_drop_pos)

Series after dropping level at position 0:
level_2
x    1
y    2
x    3
y    4
dtype: int64
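The cells above drop a single level from a two-level index. As a further sketch (the three-level index and its labels are made up for illustration), `droplevel()` also accepts a list of levels, the surviving labels may repeat, and pandas refuses to drop every level:

```python
import pandas as pd

# Hypothetical three-level MultiIndex used only for illustration
midx = pd.MultiIndex.from_tuples(
    [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "y", 4)],
    names=["level_1", "level_2", "level_3"],
)
s3 = pd.Series([10, 20, 30, 40], index=midx)

# Drop two levels in one call by passing a list of names
s_level2 = s3.droplevel(["level_1", "level_3"])
print(s_level2)

# The surviving labels repeat, so label-based selection can return several rows
print(s_level2.loc["x"])

# At least one level must remain; dropping all three raises ValueError
try:
    s3.droplevel([0, 1, 2])
except ValueError as exc:
    print("ValueError:", exc)
```

When only one level is left, the result has a plain `Index` (here named `level_2`) rather than a MultiIndex.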


##### Handling Missing Values

Missing values usually appear as `NaN` in pandas, and both Series and DataFrames provide methods to remove them.

###### The `dropna()` Method

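The `dropna()` method returns a copy with the missing values removed; pass `inplace=True` to modify the object directly. On a DataFrame it drops whole rows (`axis=0`, the default) or whole columns (`axis=1`), either when any value is missing (`how='any'`, the default) or only when all values are (`how='all'`).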

In [8]:
# Create a Series with missing values
ser = pd.Series([1., 2., np.nan, 4., np.nan, 6.])
print("Series with missing values:")
print(ser)

Series with missing values:
0    1.0
1    2.0
2    NaN
3    4.0
4    NaN
5    6.0
dtype: float64


In [9]:
# Drop NA values
ser_no_na = ser.dropna()
print("Series with NA values dropped:")
print(ser_no_na)

Series with NA values dropped:
0    1.0
1    2.0
3    4.0
5    6.0
dtype: float64


In [10]:
# Drop NA values in-place
ser_inplace = ser.copy()
ser_inplace.dropna(inplace=True)
print("Series after in-place dropping NA values:")
print(ser_inplace)

Series after in-place dropping NA values:
0    1.0
1    2.0
3    4.0
5    6.0
dtype: float64
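With `inplace=True`, `dropna()` modifies the object and returns `None`, so assigning its result back to the variable is a common slip. A minimal sketch of the difference, reusing the values from the series above:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])

# Returns a new Series; the original keeps all six entries
cleaned = ser.dropna()
print(len(ser), len(cleaned))

# Modifies the copy in place and returns None
ser_copy = ser.copy()
result = ser_copy.dropna(inplace=True)
print(result)  # None, so avoid writing ser_copy = ser_copy.dropna(inplace=True)
print(len(ser_copy))
```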


In [11]:
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, np.nan]
})
print("DataFrame with missing values:")
print(df)

DataFrame with missing values:
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  4.0  4.0  NaN


In [12]:
# Drop rows with any NA values
df_no_na = df.dropna()
print("DataFrame with rows containing any NA values dropped:")
print(df_no_na)

DataFrame with rows containing any NA values dropped:
     A    B    C
1  2.0  2.0  2.0


In [13]:
# Drop only rows in which every value is NA (no such row here, so all rows remain)
df_all_na = df.dropna(how='all')
print("DataFrame with rows containing all NA values dropped:")
print(df_all_na)

DataFrame with rows containing all NA values dropped:
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  4.0  4.0  NaN


In [14]:
# Drop columns with any NA values (every column has one, so the result is empty)
df_cols_na = df.dropna(axis=1)
print("DataFrame with columns containing any NA values dropped:")
print(df_cols_na)

DataFrame with columns containing any NA values dropped:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
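Besides `axis` and `how`, `dropna()` accepts `thresh` (the minimum number of non-missing values a row needs to be kept) and `subset` (the columns to inspect when deciding). A short sketch on the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, 3, 4],
    "C": [1, 2, 3, np.nan],
})

# Keep rows with at least 3 non-missing values (only row 1 qualifies)
df_thresh = df.dropna(thresh=3)
print(df_thresh)

# Look only at columns A and C: rows 2 and 3 are dropped, row 0 is kept
df_subset = df.dropna(subset=["A", "C"])
print(df_subset)
```

Recent pandas versions raise a `TypeError` if `how` and `thresh` are passed together, so pick one of the two.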


##### Interpolating Missing Values

Instead of discarding missing values, you can estimate them from the surrounding data by interpolation.

###### The `interpolate()` Method

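The `interpolate()` method fills missing values by interpolation. The default, `method='linear'`, ignores the index and treats the values as equally spaced. Methods such as `'nearest'`, `'zero'`, `'slinear'`, `'quadratic'`, and `'cubic'` are passed to SciPy's `interp1d` and require SciPy to be installed.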

In [15]:
# Create a Series with missing values
s = pd.Series([0, 1, np.nan, 3])
print("Series with missing values:")
print(s)

Series with missing values:
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64


In [16]:
# Linear interpolation (default)
s_linear = s.interpolate()
print("Series with linear interpolation:")
print(s_linear)

Series with linear interpolation:
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64
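Because the default `'linear'` method ignores the index, it can mislead when the index is unevenly spaced. A small sketch with a made-up numeric index compares it with `method='index'`, which uses the index values as the x-coordinates:

```python
import numpy as np
import pandas as pd

# Hypothetical unevenly spaced index: 0, 1, 10
s_uneven = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' treats the three points as equally spaced, giving the midpoint 5.0
by_position = s_uneven.interpolate(method="linear")
print(by_position)

# 'index' interpolates along the index: 0 + (10 - 0) * (1 - 0) / (10 - 0) = 1.0
by_index = s_uneven.interpolate(method="index")
print(by_index)
```

For time series with a `DatetimeIndex`, `method='time'` plays the same role.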


In [17]:
# Create a Series with multiple missing values
s_multi = pd.Series([0, 1, np.nan, np.nan, 4, 5, np.nan, 7])
print("Series with multiple missing values:")
print(s_multi)

Series with multiple missing values:
0    0.0
1    1.0
2    NaN
3    NaN
4    4.0
5    5.0
6    NaN
7    7.0
dtype: float64


In [18]:
# Linear interpolation
s_multi_linear = s_multi.interpolate()
print("Series with linear interpolation:")
print(s_multi_linear)

Series with linear interpolation:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64


In [19]:
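# Compare interpolation methods; all except 'linear' need SciPy.
# The known points all lie on y = x, so the spline methods
# ('slinear', 'quadratic', 'cubic') reproduce the linear result.
methods = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic']
for method in methods:
    try:
        s_interp = s_multi.interpolate(method=method)
        print(f"\nInterpolation method: {method}")
        print(s_interp)
    except ImportError:
        # Raised when SciPy is not installed
        print(f"\nMethod {method} requires scipy")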


Interpolation method: linear
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Interpolation method: nearest
0    0.0
1    1.0
2    1.0
3    4.0
4    4.0
5    5.0
6    5.0
7    7.0
dtype: float64

Interpolation method: zero
0    0.0
1    1.0
2    1.0
3    1.0
4    4.0
5    5.0
6    5.0
7    7.0
dtype: float64

Interpolation method: slinear
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Interpolation method: quadratic
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Interpolation method: cubic
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64


In [20]:
# Limit the number of consecutive NaNs to fill
s_limit = s_multi.interpolate(limit=1)
print("Series with interpolation limited to 1 consecutive NaN:")
print(s_limit)

Series with interpolation limited to 1 consecutive NaN:
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64


In [21]:
# limit_direction sets which end of each NaN gap the limit is counted from
s_forward = s_multi.interpolate(limit=1, limit_direction='forward')
print("Series with forward interpolation:")
print(s_forward)

s_backward = s_multi.interpolate(limit=1, limit_direction='backward')
print("\nSeries with backward interpolation:")
print(s_backward)

s_both = s_multi.interpolate(limit=1, limit_direction='both')
print("\nSeries with both-direction interpolation:")
print(s_both)

Series with forward interpolation:
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Series with backward interpolation:
0    0.0
1    1.0
2    NaN
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Series with both-direction interpolation:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64


In [22]:
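# limit_area='inside' fills only NaNs surrounded by valid values
s_inside = s_multi.interpolate(limit=1, limit_area='inside')
print("Series with interpolation limited to inside:")
print(s_inside)

# limit_area='outside' fills only NaNs before the first or after the last
# valid value; s_multi has none, so nothing is filled
s_outside = s_multi.interpolate(limit=1, limit_area='outside')
print("\nSeries with interpolation limited to outside:")
print(s_outside)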

Series with interpolation limited to inside:
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64

Series with interpolation limited to outside:
0    0.0
1    1.0
2    NaN
3    NaN
4    4.0
5    5.0
6    NaN
7    7.0
dtype: float64
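To see `limit_area='outside'` actually fill something, the series needs NaNs before its first or after its last valid value. A sketch with made-up values: the default `limit_direction='forward'` still skips the leading NaN, and under `'linear'` the outer NaNs take the nearest endpoint value:

```python
import numpy as np
import pandas as pd

# Illustrative series: positions 0 and 4 are "outside", position 2 is "inside"
s_edges = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default limit_direction='forward' fills only the trailing NaN
out_forward = s_edges.interpolate(limit_area="outside")
print(out_forward)

# limit_direction='both' also fills the leading NaN; the inside NaN stays
out_both = s_edges.interpolate(limit_area="outside", limit_direction="both")
print(out_both)
```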


In [23]:
# Interpolate in-place
s_inplace = s_multi.copy()
s_inplace.interpolate(inplace=True)
print("Series after in-place interpolation:")
print(s_inplace)

Series after in-place interpolation:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
dtype: float64


In [24]:
# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [np.nan, 2, 3, 4],
    'C': [1, 2, 3, np.nan]
})
print("DataFrame with missing values:")
print(df)

DataFrame with missing values:
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  NaN  3.0  3.0
3  4.0  4.0  NaN


In [25]:
# Interpolate each column independently (axis=0 by default)
df_interp = df.interpolate()
print("DataFrame with interpolated values:")
print(df_interp)

DataFrame with interpolated values:
     A    B    C
0  1.0  NaN  1.0
1  2.0  2.0  2.0
2  3.0  3.0  3.0
3  4.0  4.0  3.0
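In the output above, `B` keeps its leading NaN because the default `limit_direction='forward'` never fills before the first valid value, while the trailing NaN in `C` takes the last valid value (3.0). A sketch passing `limit_direction='both'` fills the leading gap as well:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [np.nan, 2, 3, 4],
    "C": [1, 2, 3, np.nan],
})

# Fill from both ends; under 'linear', outer NaNs take the nearest valid value
df_both = df.interpolate(limit_direction="both")
print(df_both)
```

Here `B` at row 0 becomes 2.0 (copied from row 1), which is a constant extension rather than a true linear extrapolation.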


##### Conclusion

In this notebook, we explored several Series and DataFrame methods in pandas:

1. Removing levels from a MultiIndex with `droplevel()`, by name or by position, on either axis.
2. Removing missing values with `dropna()`, including whole rows or columns of a DataFrame via `axis` and `how`.
3. Filling missing values with `interpolate()`, choosing the method and constraining the fill with `limit`, `limit_direction`, and `limit_area`.

Together these cover common cleaning steps: simplifying an index, discarding incomplete records, and estimating missing values from neighboring data.