#### Pandas Tutorial - Part 59: DataFrame Methods (cumsum, describe, equals, eval)

This notebook covers four DataFrame methods:
- `cumsum()` - Return cumulative sum over a DataFrame axis
- `describe()` - Generate descriptive statistics
- `equals()` - Test whether two objects contain the same elements
- `eval()` - Evaluate a string describing operations on DataFrame columns

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### 1. DataFrame.cumsum()

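The `cumsum()` method returns the cumulative sum over a DataFrame or Series axis: a result of the same shape in which each entry is the running total of all values up to and including that position. By default it runs down each column (`axis=0`) and skips missing values (`skipna=True`).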

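As a minimal sketch of that definition, the running total can be rebuilt with a plain loop and compared with `Series.cumsum()`. The helper `running_total` is hypothetical, written only for illustration, and assumes a Series without missing values.

```python
import pandas as pd

def running_total(values):
    """Hypothetical helper: return the running sum of an iterable as a list."""
    total = 0
    out = []
    for v in values:
        total += v          # add the current value to everything seen so far
        out.append(total)
    return out

s = pd.Series([3, 1, 4, 1, 5])
manual = running_total(s)
built_in = s.cumsum().tolist()

print(manual)    # [3, 4, 8, 9, 14]
print(built_in)  # [3, 4, 8, 9, 14]
```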
In [None]:
# Create a Series with some values
s = pd.Series([2, np.nan, 5, -1, 0])
print("Original Series:")
s

In [None]:
# Default skipna=True: the NaN position stays NaN, but the running total continues past it
print("Cumulative sum (skipna=True):")
s.cumsum()

In [None]:
# With skipna=False the NaN propagates: every value from the NaN onward becomes NaN
print("Cumulative sum (skipna=False):")
s.cumsum(skipna=False)

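The two modes can be checked directly on the same Series: with `skipna=True` every non-missing position matches what you would get by treating NaN as 0 (`fillna(0).cumsum()`), while with `skipna=False` everything from the first NaN onward is NaN. A short sketch of both checks:

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, 5, -1, 0])

skipped = s.cumsum()                 # [2.0, NaN, 7.0, 6.0, 6.0]
propagated = s.cumsum(skipna=False)  # [2.0, NaN, NaN, NaN, NaN]

# Outside the NaN position, skipna=True behaves like treating NaN as 0
filled = s.fillna(0).cumsum()
mask = s.notna()
print((skipped[mask] == filled[mask]).all())  # True

# With skipna=False, everything from the first NaN onward is NaN
print(propagated.iloc[1:].isna().all())  # True
```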
In [None]:
# Create a DataFrame with some values
df = pd.DataFrame([
    [2.0, 1.0],
    [3.0, np.nan],
    [1.0, 0.0]
], columns=list('AB'))
print("Original DataFrame:")
df

In [None]:
# Cumulative sum down each column (axis=0, the default)
print("Cumulative sum along index (axis=0):")
df.cumsum()

In [None]:
# Cumulative sum across each row, from left to right (axis=1)
print("Cumulative sum along columns (axis=1):")
df.cumsum(axis=1)

In [None]:
# Create a DataFrame with more data
df2 = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [-1, -2, -3, -4, -5]
})
print("DataFrame with more data:")
print(df2)

print("\nCumulative sum along index:")
print(df2.cumsum())

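A common practical use is turning a column of per-period changes into a running level, such as an account balance. The transaction amounts and the `opening_balance` value below are made-up illustration data, not part of the original example.

```python
import pandas as pd

# Hypothetical transactions: deposits are positive, withdrawals negative
tx = pd.DataFrame({
    'day': pd.date_range('2024-01-01', periods=5),
    'amount': [100, -30, 50, -20, 10],
})
opening_balance = 500  # hypothetical starting balance

# Each row's balance is the opening balance plus all amounts up to and including that row
tx['balance'] = opening_balance + tx['amount'].cumsum()
print(tx)
```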
In [None]:
# Visualize the cumulative sum
df2.cumsum().plot(figsize=(10, 6))
plt.title('Cumulative Sum')
plt.xlabel('Index')
plt.ylabel('Cumulative Sum')
plt.grid(True)
plt.show()

##### 2. DataFrame.describe()

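The `describe()` method generates descriptive statistics that summarize the central tendency, dispersion, and shape of a dataset's distribution. For numeric columns it reports `count`, `mean`, `std`, `min`, the 25th/50th/75th percentiles and `max`; for object (text) columns it reports `count`, `unique`, `top` and `freq`.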

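For a numeric column, the rows `describe()` reports can be reproduced with ordinary Series methods, which helps pin down what each one means: `std` is the sample standard deviation (`ddof=1`), and the percentile rows come from `quantile()` with its default linear interpolation. A minimal sketch that checks this on one column:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], name='A')

summary = s.describe()

# Rebuild the same statistics by hand
manual = pd.Series({
    'count': s.count(),
    'mean': s.mean(),
    'std': s.std(),            # sample standard deviation (ddof=1)
    'min': s.min(),
    '25%': s.quantile(0.25),
    '50%': s.quantile(0.50),
    '75%': s.quantile(0.75),
    'max': s.max(),
}, dtype=float)

print(summary)
print((summary - manual).abs().max() < 1e-12)  # True
```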
In [None]:
# Create a DataFrame with numeric data
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 50, 10, 20, 30]
})
print("DataFrame with numeric data:")
df

In [None]:
# Generate descriptive statistics
print("Descriptive statistics:")
df.describe()

In [None]:
# Create a DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [True, False, True, True, False],
    'D': pd.date_range('20200101', periods=5)
})
print("DataFrame with mixed data types:")
df_mixed

In [None]:
# By default, describe() summarizes only numeric columns ('A'); pandas 2.x also includes datetime columns ('D')
print("Default describe():")
df_mixed.describe()

In [None]:
# Include all columns
print("describe() with include='all':")
df_mixed.describe(include='all')

In [None]:
# Include only object columns
print("describe() with include=['object']:")
df_mixed.describe(include=['object'])

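For text columns such as `B`, the `top` and `freq` rows are easiest to understand with a repeated value, because in `df_mixed['B']` every value is unique and `top` is just one of the tied values. The sketch below uses a small made-up Series and checks each row against `count()`, `nunique()` and `value_counts()`:

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'red', 'green', 'red'])  # hypothetical data

summary = colors.describe()
counts = colors.value_counts()  # sorted by frequency, most common first

print(summary)
# count, unique, top and freq can each be derived by hand
print(summary['count'] == colors.count())     # True
print(summary['unique'] == colors.nunique())  # True
print(summary['top'] == counts.index[0])      # True ('red')
print(summary['freq'] == counts.iloc[0])      # True (3)
```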
In [None]:
# Include only numeric columns (explicitly)
print("describe() with include=[np.number]:")
df_mixed.describe(include=[np.number])

In [None]:
# Exclude numeric columns
print("describe() with exclude=[np.number]:")
df_mixed.describe(exclude=[np.number])

In [None]:
# Customize percentiles
print("describe() with custom percentiles:")
df.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95])

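Each entry in `percentiles` becomes a row labelled with the percentage (for example `0.05` becomes `5%`), and its values should agree with `DataFrame.quantile()` at the same level. A quick check, recreating the numeric frame from this section so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 50, 10, 20, 30],
})

stats = df.describe(percentiles=[0.05, 0.25, 0.5, 0.75, 0.95])

# The '5%' and '95%' rows agree with quantile() at the same levels
print(np.allclose(stats.loc['5%'], df.quantile(0.05)))   # True
print(np.allclose(stats.loc['95%'], df.quantile(0.95)))  # True
```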
##### 3. DataFrame.equals()

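The `equals()` method tests whether two Series or DataFrames have the same shape and elements, returning a single boolean. NaNs in the same location count as equal, corresponding columns must share a dtype, and row/column labels only need to have equal values, not the same type.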

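The practical difference from the `==` operator shows up with missing values: `==` compares element by element and treats `NaN == NaN` as `False`, while `equals()` accepts NaNs in matching positions. A short illustration:

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

# Element-wise comparison: the NaN position compares as False
print((a == b).tolist())   # [True, False, True]
print((a == b).all())      # False

# equals() treats NaNs in matching positions as equal
print(a.equals(b))         # True
```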
In [None]:
# Create a DataFrame
df = pd.DataFrame({1: [10], 2: [20]})
print("Original DataFrame:")
df

In [None]:
# Create an identical DataFrame
exactly_equal = pd.DataFrame({1: [10], 2: [20]})
print("Identical DataFrame:")
print(exactly_equal)

# Test equality: True (same shape, labels, dtypes and values)
print("\nAre they equal?", df.equals(exactly_equal))

In [None]:
# Create a DataFrame whose column labels are floats (1.0, 2.0) instead of ints
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
print("DataFrame with float column labels:")
print(different_column_type)

# Test equality: True, because the labels 1 == 1.0 and 2 == 2.0 compare equal
print("\nAre they equal?", df.equals(different_column_type))

In [None]:
# Create a DataFrame with the same values stored as float64 instead of int64
different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
print("DataFrame with different data types:")
print(different_data_type)

# Test equality: False, because corresponding columns must share a dtype
print("\nAre they equal?", df.equals(different_data_type))

In [None]:
# Create a DataFrame with one different value (11 instead of 10)
different_values = pd.DataFrame({1: [11], 2: [20]})
print("DataFrame with different values:")
print(different_values)

# Test equality: False
print("\nAre they equal?", df.equals(different_values))

In [None]:
# Create a DataFrame with the same values but a different row label (1 instead of 0)
different_index = pd.DataFrame({1: [10], 2: [20]}, index=[1])
print("DataFrame with different index:")
print(different_index)

# Test equality: False, because the row labels differ
print("\nAre they equal?", df.equals(different_index))

In [None]:
# Test equality with Series
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])
s3 = pd.Series([1, 2, 4])

print("s1 equals s2:", s1.equals(s2))
print("s1 equals s3:", s1.equals(s3))

##### 4. DataFrame.eval()

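The `eval()` method evaluates a string describing operations on DataFrame columns. Column names can be used directly as variables in the expression, and local Python variables can be referenced with an `@` prefix.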

In [None]:
# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 60, 10),
    'C': range(100, 600, 100)
})
print("Original DataFrame:")
df

In [None]:
# Simple arithmetic expression
print("A + B:")
df.eval('A + B')

In [None]:
# Mixed arithmetic: * binds tighter than +, so this computes A + (B * C)
print("A + B * C:")
df.eval('A + B * C')

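As a sanity check on what the expression string computes, the sketch below rebuilds the frame and compares `eval('A + B * C')` with the same arithmetic written using ordinary column indexing. Which form to prefer is mostly a readability choice; the pandas performance guide suggests `eval()` mainly pays off in speed on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 60, 10),
    'C': range(100, 600, 100),
})

via_eval = df.eval('A + B * C')
via_python = df['A'] + df['B'] * df['C']   # multiplication binds tighter than addition

print(via_eval.tolist())  # [1001, 4002, 9003, 16004, 25005]
print(via_eval.tolist() == via_python.tolist())  # True
```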
In [None]:
# Comparison expression
print("A < B:")
df.eval('A < B')

In [None]:
# Assignment expression (create a new column)
print("Create new column 'D' = A + B:")
df.eval('D = A + B', inplace=True)
df

In [None]:
# Assignment expression (modify existing column)
print("Modify column 'D' = A * B:")
df.eval('D = A * B', inplace=True)
df

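The cells above use `inplace=True`. Without it, an assignment expression leaves the original DataFrame untouched and returns a modified copy, which is often easier to reason about in a notebook because re-running a cell does not change earlier state. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Without inplace, eval() returns a new DataFrame with the extra column
result = df.eval('D = A + B')

print(list(df.columns))      # ['A', 'B']  (original unchanged)
print(list(result.columns))  # ['A', 'B', 'D']
print(result['D'].tolist())  # [11, 22, 33]
```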
In [None]:
# Reference a local Python variable with the @ prefix
x = 10
print(f"Using local variable x = {x}:")
df.eval('A + @x')

In [None]:
# Multiple assignments in one call: one expression per line inside a triple-quoted string
print("Multiple expressions:")
df.eval("""
E = A + C
F = B - A
""", inplace=True)
df

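A multi-line assignment like the one above can also be written with `DataFrame.assign()`, which takes one keyword argument per new column; passing a lambda lets each new column be computed from the frame. A sketch of the equivalent call on a freshly built frame, using `pandas.testing.assert_frame_equal` to confirm the two results match:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 60, 10),
    'C': range(100, 600, 100),
})

with_eval = df.eval("""
E = A + C
F = B - A
""")

with_assign = df.assign(
    E=lambda d: d['A'] + d['C'],
    F=lambda d: d['B'] - d['A'],
)

# Raises AssertionError if values, dtypes, labels or column order differ
pd.testing.assert_frame_equal(with_eval, with_assign)
print("identical")
```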
In [None]:
# Using boolean operators
print("Boolean expression:")
df.eval('(A > 3) & (B < 50)')

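A boolean result from `eval()` can be used directly as a row mask. For the sample frame only the row with `A == 4` satisfies both conditions (the row with `A == 5` fails because `B < 50` is false for `B = 50`). `DataFrame.query()` accepts the same kind of expression and does the filtering in one step:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 60, 10),
    'C': range(100, 600, 100),
})

mask = df.eval('(A > 3) & (B < 50)')
filtered = df[mask]

# query() takes the same expression and returns the filtered rows directly
queried = df.query('(A > 3) & (B < 50)')

print(filtered)
print(filtered.equals(queried))  # True
```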
##### Summary

In this notebook, we've explored several important DataFrame methods:

1. **cumsum()**: Returns the cumulative sum over a DataFrame or Series axis
2. **describe()**: Generates descriptive statistics summarizing the central tendency, dispersion, and shape of a dataset's distribution
3. **equals()**: Tests whether two objects have the same shape, elements and dtypes, treating NaNs in the same positions as equal
4. **eval()**: Evaluates a string describing operations on DataFrame columns

Together they cover running totals, quick summaries of a dataset's distribution, strict equality checks between pandas objects, and concise column arithmetic written as expression strings.