#### Pandas Part 69: Advanced DataFrame Operations

This notebook covers various advanced DataFrame operations including reshaping, combining/joining/merging, time series operations, and kernel density estimation (KDE) plotting.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### 1. Reshaping and Pivot Operations

Pandas provides several methods to reshape DataFrames:

In [None]:
# Create a sample DataFrame with MultiIndex
arrays = [
    ['A', 'A', 'B', 'B'],
    [1, 2, 1, 2]
]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
df
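The MultiIndex frame above is a natural input for `unstack`, which accepts a level name or position to move into the columns. Here is a minimal sketch that rebuilds the same frame:

```python
import pandas as pd

# Rebuild the MultiIndex frame from the cell above
index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2]], names=('first', 'second')
)
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# Move the 'second' index level into the columns.
# Rows are now 'A' and 'B'; columns are ('value', 1) and ('value', 2).
wide = df.unstack('second')
print(wide)

# Selecting 'value' drops the outer column level, leaving columns 1 and 2
print(wide['value'])

# stack() on the 'second' column level restores the original long shape
print(wide.stack('second'))
```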

### Stack and Unstack

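`stack` moves the column labels into a new innermost index level (for single-level columns the result is a Series), while `unstack` does the opposite, moving an index level (the innermost by default) back into the columns.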

In [None]:
# Create a simple DataFrame ('A' holds unique labels so it can serve as the index)
df2 = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd'],
    'B': [1, 2, 3, 4],
    'C': [10, 20, 30, 40]
})
df2

In [None]:
# Set 'A' as index
df2_indexed = df2.set_index('A')
df2_indexed

In [None]:
# Stack the DataFrame
stacked = df2_indexed.stack()
stacked

In [None]:
# Unstack the stacked Series back into a DataFrame
stacked.unstack()
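`unstack` needs every (row label, column label) pair to be unique; if the index contains duplicates it raises `ValueError`. The sketch below uses unique labels to show a clean round trip, then triggers the error on a deliberately duplicated index (`dup` is a hypothetical example, not part of the data above):

```python
import pandas as pd

# Same values as df2, with unique labels in the 'A' index
wide = pd.DataFrame({'B': [1, 2, 3, 4], 'C': [10, 20, 30, 40]},
                    index=pd.Index(['a', 'b', 'c', 'd'], name='A'))

long = wide.stack()              # Series indexed by (A, original column label)
print(long)
print(long.loc[('c', 'C')])      # the value that sat in row 'c', column 'C'

restored = long.unstack()        # innermost index level goes back to columns
print(restored)

# Duplicate (row, column) pairs cannot be placed into a single cell
dup = pd.Series([1, 2], index=pd.MultiIndex.from_tuples([('a', 'B'), ('a', 'B')]))
failed = False
try:
    dup.unstack()
except ValueError as err:
    failed = True
    print("unstack failed:", err)
```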

### Melt

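`melt` unpivots a DataFrame from wide to long format: the `id_vars` columns stay as identifiers, and each column listed in `value_vars` contributes one row per original row, with the column name in `var_name` and its value in `value_name`.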

In [None]:
# Create a wide format DataFrame
wide_df = pd.DataFrame({
    'name': ['John', 'Mary', 'Bob'],
    'math': [90, 85, 92],
    'science': [88, 95, 85],
    'history': [76, 82, 89]
})
wide_df

In [None]:
# Melt the DataFrame to long format
melted_df = wide_df.melt(id_vars=['name'], value_vars=['math', 'science', 'history'],
                         var_name='subject', value_name='score')
melted_df
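The section heading also mentions pivoting: `pivot` reverses the melt above, turning the long table back into one row per name and one column per subject. This minimal sketch also omits `value_vars`, in which case `melt` uses every column not listed in `id_vars`:

```python
import pandas as pd

wide_df = pd.DataFrame({
    'name': ['John', 'Mary', 'Bob'],
    'math': [90, 85, 92],
    'science': [88, 95, 85],
    'history': [76, 82, 89],
})

# No value_vars: all non-id columns are melted (3 names x 3 subjects = 9 rows)
long_df = wide_df.melt(id_vars='name', var_name='subject', value_name='score')

# pivot is the inverse: 'name' becomes the index, each subject a column
back = long_df.pivot(index='name', columns='subject', values='score')
print(back)
```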

### Explode

`explode` turns each element of a list-like cell into its own row, repeating the original index label and the values of the other columns.

In [None]:
# Create a DataFrame with list values
df_list = pd.DataFrame({
    'A': [[1, 2, 3], [4, 5], [6]],
    'B': 1
})
df_list

In [None]:
# Explode column A
df_list.explode('A')
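Two details of `explode` are worth knowing: an empty list becomes a single NaN row, and the exploded column keeps `object` dtype. A short sketch, with an extra empty-list row added for illustration:

```python
import pandas as pd

df_list = pd.DataFrame({'A': [[1, 2, 3], [4, 5], [6], []], 'B': 1})

# Each list element gets its own row; the original index label is repeated
exploded = df_list.explode('A')
print(exploded)

# The empty list produced a NaN row, and 'A' is still object dtype
print(exploded['A'].dtype)

# ignore_index=True renumbers rows 0..n-1; drop the NaN row and cast to int
clean = (df_list.explode('A', ignore_index=True)
                .dropna(subset=['A'])
                .astype({'A': int}))
print(clean)
```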

##### 2. Combining/Joining/Merging

Pandas provides several methods to combine DataFrames:

In [None]:
# Create two sample DataFrames
df1 = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

df2 = pd.DataFrame({
    'id': [1, 2, 3, 5],
    'salary': [50000, 60000, 70000, 80000]
})

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)

### Merge

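`merge` combines DataFrames on common columns or indices, much like a SQL join. The `how` argument decides which keys survive: `'inner'` keeps keys present in both frames, `'left'` and `'right'` keep every key from that side, and `'outer'` keeps the union, filling missing values with NaN.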

In [None]:
# Inner merge
pd.merge(df1, df2, on='id', how='inner')

In [None]:
# Left merge
pd.merge(df1, df2, on='id', how='left')

In [None]:
# Right merge
pd.merge(df1, df2, on='id', how='right')

In [None]:
# Outer merge
pd.merge(df1, df2, on='id', how='outer')
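When comparing join types it helps to see where each row came from. `indicator=True` adds a `_merge` column, and `validate` checks the key relationship before merging. A sketch reusing the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'name': ['Alice', 'Bob', 'Charlie', 'David']})
df2 = pd.DataFrame({'id': [1, 2, 3, 5],
                    'salary': [50000, 60000, 70000, 80000]})

# indicator=True adds '_merge' with values 'both', 'left_only' or 'right_only'
outer = pd.merge(df1, df2, on='id', how='outer', indicator=True)
print(outer)

# validate='one_to_one' raises MergeError if 'id' repeats on either side
checked = pd.merge(df1, df2, on='id', how='inner', validate='one_to_one')
print(len(checked))
```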

### Join

`join` adds the columns of another DataFrame by aligning on the index; by default it performs a left join.

In [None]:
# Set index for both DataFrames
df1_indexed = df1.set_index('id')
df2_indexed = df2.set_index('id')

# Join the DataFrames
df1_indexed.join(df2_indexed)
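`join` also accepts `how`, and when both frames share a column name it needs `lsuffix`/`rsuffix` to tell them apart (otherwise it raises `ValueError`). The `other` frame below is made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David']},
                    index=pd.Index([1, 2, 3, 4], name='id'))
right = pd.DataFrame({'salary': [50000, 60000, 70000, 80000]},
                     index=pd.Index([1, 2, 3, 5], name='id'))

# Default how='left': every left row is kept, salary is NaN for id 4
joined = left.join(right)
print(joined)

# how='inner' keeps only ids present in both indexes
inner = left.join(right, how='inner')
print(inner)

# Both frames have a 'name' column, so suffixes are required
other = pd.DataFrame({'name': ['A.', 'B.']}, index=pd.Index([1, 2], name='id'))
suffixed = left.join(other, lsuffix='_full', rsuffix='_short')
print(suffixed)
```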

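### Append

`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0. To add the rows of one DataFrame to another, use `pd.concat` instead.

In [None]:
# Create two DataFrames with the same columns
df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_b = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Append the rows of df_b to df_a (replacement for the removed df_a.append(df_b))
pd.concat([df_a, df_b])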
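By default `pd.concat` keeps each input's original row labels, so stacking these two frames gives the index 0, 1, 0, 1. Two common variations:

```python
import pandas as pd

df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_b = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# ignore_index=True discards the original labels and builds 0..n-1
stacked_rows = pd.concat([df_a, df_b], ignore_index=True)
print(stacked_rows)

# keys= records which input each row came from as an outer index level
labelled = pd.concat([df_a, df_b], keys=['a', 'b'])
print(labelled.loc['b'])
```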

##### 3. Time Series Operations

Pandas provides several methods for time series data:

In [None]:
# Create a time series DataFrame
dates = pd.date_range('2023-01-01', periods=6)
ts_df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
ts_df

### Shift

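`shift` moves the values down (positive periods) or up (negative periods) by the given number of rows while the index stays fixed; the vacated positions are filled with NaN.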

In [None]:
# Shift data forward by 2 periods
ts_df.shift(2)

In [None]:
# Shift data backward by 1 period
ts_df.shift(-1)
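A typical use of `shift` is comparing each row with the previous one, for example to compute day-over-day change. The prices below are made-up values; the last line shows that passing `freq=` moves the index instead of the data:

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range('2023-01-01', periods=4))

# shift(1) lines each value up with the previous day's value on the same row
prev = prices.shift(1)
returns = prices / prev - 1          # first entry is NaN: no previous day
print(returns)

# Matches the built-in helper on the rows where both are defined
print(np.allclose(returns.iloc[1:], prices.pct_change().iloc[1:]))

# With freq='D' the timestamps move forward one day and no NaN is introduced
print(prices.shift(1, freq='D').index[0])
```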

### Resample

`resample` groups rows into time bins at a new frequency; an aggregation such as `mean()` or `sum()` then produces one row per bin.

In [None]:
# Resample to 2-day frequency
ts_df.resample('2D').mean()

In [None]:
# Resample to month-end frequency ('M'; pandas 2.2+ deprecates it in favor of 'ME')
ts_df.resample('M').sum()
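Each `resample` bin is labelled by its left edge, and `agg` can compute several statistics per bin at once. This sketch uses a simple 0 to 5 series so the bin means are easy to check by hand:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2023-01-01', periods=6)
ts = pd.Series(np.arange(6, dtype=float), index=dates)  # 0.0 .. 5.0, one per day

# 2-day bins start at Jan 1, Jan 3 and Jan 5; each holds two daily values
binned = ts.resample('2D').agg(['mean', 'max', 'count'])
print(binned)
```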

### asfreq

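`asfreq` conforms a time series to a new frequency by selecting the values at the new timestamps; unlike `resample`, it does not aggregate. Here 2023-01-01 is a Sunday, so converting to business days (`'B'`) keeps only January 2 to 6.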

In [None]:
# Convert to business day frequency
ts_df.asfreq('B')
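The difference from `resample` is clearest when upsampling: `asfreq` only reindexes, so new timestamps get NaN unless a fill `method` is given. A sketch with made-up observations every other day:

```python
import pandas as pd

# Observations every other day (illustrative values)
s = pd.Series([1.0, 3.0, 5.0],
              index=pd.date_range('2023-01-01', periods=3, freq='2D'))

# Daily frequency: Jan 2 and Jan 4 are new timestamps with no data, so NaN
daily = s.asfreq('D')
print(daily)

# method='ffill' copies the previous observation into the new timestamps
filled = s.asfreq('D', method='ffill')
print(filled)
```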

##### 4. Kernel Density Estimation (KDE) Plotting

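KDE is a non-parametric way to estimate the probability density function of a random variable: it places a smooth kernel (a Gaussian, in pandas) on every observation and averages them. `plot.kde` requires SciPy, because pandas delegates the estimate to `scipy.stats.gaussian_kde`.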

In [None]:
# Create a Series for KDE plotting
s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
s

In [None]:
# Plot KDE with default settings
ax = s.plot.kde()
plt.title('KDE Plot with Default Bandwidth')

In [None]:
# Plot KDE with small bandwidth (potential over-fitting)
ax = s.plot.kde(bw_method=0.3)
plt.title('KDE Plot with Small Bandwidth (0.3)')

In [None]:
# Plot KDE with large bandwidth (potential under-fitting)
ax = s.plot.kde(bw_method=3)
plt.title('KDE Plot with Large Bandwidth (3)')
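To see what `bw_method` controls, here is a NumPy-only sketch. It assumes SciPy's documented convention: a scalar `bw_method` is used as the factor that multiplies the data's sample standard deviation, and the default is Scott's rule, n^(-1/5) in one dimension. For the 7-point series above that gives a default bandwidth of about 0.90 (0.678 × 1.323), versus about 0.40 for `bw_method=0.3` and 3.97 for `bw_method=3`. The helper `gaussian_kde_1d` is hypothetical, written for illustration; it is not a pandas or SciPy function.

```python
import numpy as np


def gaussian_kde_1d(data, grid, factor=None):
    """Evaluate a 1-D Gaussian KDE of `data` at each point of `grid`.

    `factor` mimics a scalar bw_method: the kernel standard deviation is
    factor * (sample std of data). None falls back to Scott's rule, n ** (-1/5).
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    if factor is None:
        factor = n ** (-1.0 / 5.0)        # Scott's rule in one dimension
    h = factor * data.std(ddof=1)         # kernel bandwidth
    # Standardised distance from every grid point to every sample
    u = (grid[:, None] - data[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Average of n kernels, each scaled by 1/h so it integrates to 1
    return kernels.sum(axis=1) / (n * h)


data = [1, 2, 2.5, 3, 3.5, 4, 5]
grid = np.linspace(-30, 36, 6601)         # wide enough for the bandwidth of 3
step = grid[1] - grid[0]

peaks = {}
for factor in (None, 0.3, 3):
    density = gaussian_kde_1d(data, grid, factor)
    area = density.sum() * step           # Riemann sum, should be close to 1
    peaks[factor] = density.max()
    print(f"factor={factor}: area={area:.4f}, peak={density.max():.3f}")
```

A smaller factor gives a taller, bumpier curve that follows individual points; a larger one spreads the mass into a single broad hump.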

In [None]:
# Evaluate the KDE only at x = 1..5 (the plotted line just connects these points)
ax = s.plot.kde(ind=[1, 2, 3, 4, 5])
plt.title('KDE Plot with Specific Evaluation Points')

### KDE with DataFrame

`DataFrame.plot.kde` draws one KDE curve per numeric column on the same axes.

In [None]:
# Create a DataFrame for KDE plotting
df_kde = pd.DataFrame({
    'x': [1, 2, 2.5, 3, 3.5, 4, 5],
    'y': [4, 4, 4.5, 5, 5.5, 6, 6],
})
df_kde

In [None]:
# Plot KDE for DataFrame with default settings
ax = df_kde.plot.kde()
plt.title('KDE Plot for DataFrame with Default Bandwidth')

In [None]:
# Plot KDE for DataFrame with small bandwidth
ax = df_kde.plot.kde(bw_method=0.3)
plt.title('KDE Plot for DataFrame with Small Bandwidth (0.3)')

In [None]:
# Plot KDE for DataFrame with large bandwidth
ax = df_kde.plot.kde(bw_method=3)
plt.title('KDE Plot for DataFrame with Large Bandwidth (3)')

In [None]:
# Plot KDE for DataFrame with specific evaluation points
ax = df_kde.plot.kde(ind=[1, 2, 3, 4, 5, 6])
plt.title('KDE Plot for DataFrame with Specific Evaluation Points')