#### Pandas Tutorial - Part 58: DataFrame Methods (between_time, bfill, bool, boxplot, cummin, cumprod)

This notebook covers several important DataFrame methods including:
- `between_time()` - Select values between particular times of the day
- `bfill()` - Backward fill missing values
- `bool()` - Return the boolean value of a single element
- `boxplot()` - Create box plots from DataFrame columns
- `cummin()` - Return cumulative minimum over a DataFrame axis
- `cumprod()` - Return cumulative product over a DataFrame axis

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

##### 1. DataFrame.between_time()

The `between_time()` method selects values between particular times of the day (e.g., 9:00-9:30 AM).

In [None]:
# Create a DataFrame with DatetimeIndex
i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
print("DataFrame with DatetimeIndex:")
ts

In [None]:
# Select values between 0:15 and 0:45
print("Values between 0:15 and 0:45:")
ts.between_time('0:15', '0:45')

In [None]:
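# Pass a start time later than the end time to select rows outside the 0:15-0:45 window
# (the window wraps past midnight; both endpoints are still included by default)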
print("Values NOT between 0:15 and 0:45:")
ts.between_time('0:45', '0:15')
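
Both endpoints are included by default. The sketch below illustrates how the `inclusive` argument changes that; it assumes pandas 1.4 or later, where `inclusive` accepts the strings `'both'`, `'neither'`, `'left'`, and `'right'`. The `demo` frame is an illustrative copy of the four timestamps used above.

```python
import pandas as pd

# Same four timestamps as the example above, written out explicitly.
idx = pd.to_datetime([
    '2018-04-09 00:00', '2018-04-10 00:20',
    '2018-04-11 00:40', '2018-04-12 01:00',
])
demo = pd.DataFrame({'A': [1, 2, 3, 4]}, index=idx)

# Default: rows at exactly 0:20 and 0:40 are both kept.
both = demo.between_time('0:20', '0:40')
# Keep the start time but drop the end time.
left = demo.between_time('0:20', '0:40', inclusive='left')
# Drop both endpoints.
neither = demo.between_time('0:20', '0:40', inclusive='neither')

print(len(both), len(left), len(neither))
```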

In [None]:
# Create a more detailed DataFrame with different times
i = pd.date_range('2018-04-09', periods=24, freq='h')  # the uppercase 'H' alias is deprecated since pandas 2.2
ts2 = pd.DataFrame({'A': range(24)}, index=i)
print("More detailed DataFrame (first 5 rows):")
ts2.head()

In [None]:
# Select values between 9:00 and 17:00 (business hours)
print("Business hours (9:00-17:00):")
ts2.between_time('9:00', '17:00')

In [None]:
# Select values between 22:00 and 5:00 (night hours)
print("Night hours (22:00-5:00):")
ts2.between_time('22:00', '5:00')
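
To make the wrap-around behaviour concrete, here is a minimal sketch that rebuilds the night-hours selection by hand from the index's time-of-day values and compares it with `between_time()`. The helper `in_window` and the `demo` frame are illustrative names, not part of pandas, and the sketch assumes the default behaviour of including both endpoints.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hourly index for one day ('60min' is used to stay version-neutral).
idx = pd.date_range('2018-04-09', periods=24, freq='60min')
demo = pd.DataFrame({'A': range(24)}, index=idx)


def in_window(t, start, end):
    """Hypothetical helper: is time t inside [start, end], wrapping past midnight?"""
    if start <= end:
        return start <= t <= end
    # start later than end: the window spans midnight
    return t >= start or t <= end


start, end = dt.time(22, 0), dt.time(5, 0)
mask = np.array([in_window(t, start, end) for t in demo.index.time])

manual = demo[mask]
builtin = demo.between_time('22:00', '5:00')
print(manual.equals(builtin))
```

Rows keep their original order, so the early-morning hours appear before 22:00 and 23:00 in both results.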

##### 2. DataFrame.bfill()

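The `bfill()` method fills each missing value with the next valid observation. It is equivalent to `fillna(method='bfill')`, but the `method` argument of `fillna()` is deprecated since pandas 2.1, so calling `bfill()` directly is preferred.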

In [None]:
# Create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan, 5],
    'B': [np.nan, 2, np.nan, 4, np.nan]
})
print("Original DataFrame with NaN values:")
df

In [None]:
# Fill NaN values using backward fill (bfill)
print("DataFrame after bfill():")
df.bfill()

In [None]:
# Compare with forward fill (ffill)
print("DataFrame after ffill():")
df.ffill()

In [None]:
# Fill across columns within each row (axis=1): a NaN takes the value from the next column to its right
print("DataFrame after bfill(axis=1):")
df.bfill(axis=1)

In [None]:
# Limit the number of consecutive fills
df2 = pd.DataFrame({
    'A': [1, np.nan, np.nan, np.nan, 5],
    'B': [np.nan, 2, np.nan, np.nan, np.nan]
})
print("Original DataFrame with consecutive NaN values:")
print(df2)

print("\nDataFrame after bfill(limit=1):")
print(df2.bfill(limit=1))
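
One consequence of filling from the next valid observation is that trailing NaN values have nothing to copy from and stay missing. The sketch below shows this and one possible remedy, chaining `ffill()` after `bfill()`; whether that remedy is appropriate depends on the data. The `demo` frame is an illustrative copy of the first example.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'A': [1, np.nan, 3, np.nan, 5],
    'B': [np.nan, 2, np.nan, 4, np.nan],
})

# Backward fill alone: the last value in 'B' has no later observation.
back = demo.bfill()

# Backward fill, then forward fill whatever is still missing.
both = demo.bfill().ffill()

print(back)
print(both)
```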

##### 3. DataFrame.bool()

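The `bool()` method returns the boolean value of a single-element pandas object. The object must contain exactly one element, and that element must be boolean; otherwise a `ValueError` is raised. Note that `bool()` is deprecated since pandas 2.1 and is no longer available in pandas 3.0, so the cells below need an older pandas version to run.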

In [None]:
# Create a single-element DataFrame with a boolean value
df_true = pd.DataFrame([True])
print("Single-element DataFrame with True:")
df_true

In [None]:
# Get the boolean value
print("Boolean value:")
bool_value = df_true.bool()
print(bool_value)
print("Type:", type(bool_value))

In [None]:
# Create a single-element DataFrame with a False value
df_false = pd.DataFrame([False])
print("Single-element DataFrame with False:")
print(df_false)
print("Boolean value:", df_false.bool())

In [None]:
# This will raise a ValueError because the DataFrame has more than one element
try:
    df = pd.DataFrame([True, False])
    df.bool()
except ValueError as e:
    print(f"Error: {e}")

In [None]:
# This will raise a ValueError because the element is not boolean
try:
    df = pd.DataFrame([1])
    df.bool()
except ValueError as e:
    print(f"Error: {e}")
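
For code that should not depend on `bool()` being available, the same two checks (exactly one element, and that element is boolean) can be written by hand. This is a minimal sketch, not the pandas implementation; `single_bool` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def single_bool(obj):
    """Hypothetical helper: return the lone boolean in a one-element DataFrame or Series."""
    if obj.size != 1:
        raise ValueError("expected exactly one element")
    value = obj.to_numpy().ravel()[0]
    if not isinstance(value, (bool, np.bool_)):
        raise ValueError("the single element must be boolean")
    return bool(value)


print(single_bool(pd.DataFrame([True])))
print(single_bool(pd.Series([False])))

# Both failure cases from above raise ValueError here as well.
for bad in (pd.DataFrame([True, False]), pd.DataFrame([1])):
    try:
        single_bool(bad)
    except ValueError as e:
        print(f"Error: {e}")
```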

##### 4. DataFrame.boxplot()

The `boxplot()` method creates box plots from DataFrame columns, optionally grouped by some other columns.

In [None]:
# Create a DataFrame for box plotting
df = pd.DataFrame({
    'A': np.random.normal(0, 1, 100),
    'B': np.random.normal(1, 2, 100),
    'C': np.random.normal(-1, 1.5, 100)
})
print("DataFrame for box plotting (first 5 rows):")
df.head()

In [None]:
# Create a basic box plot
df.boxplot()
plt.title('Basic Box Plot')
plt.ylabel('Values')
plt.show()

In [None]:
# Create a box plot for specific columns
df.boxplot(column=['A', 'C'])
plt.title('Box Plot for Columns A and C')
plt.ylabel('Values')
plt.show()

In [None]:
# Create a DataFrame with a categorical column
df2 = pd.DataFrame({
    'value': np.concatenate([np.random.normal(0, 1, 50), np.random.normal(2, 1, 50)]),
    'group': np.repeat(['A', 'B'], 50)
})
print("DataFrame with categorical column (first 5 rows):")
df2.head()

In [None]:
# Create a box plot grouped by a categorical column
df2.boxplot(column='value', by='group')
plt.title('Box Plot Grouped by Category')
plt.suptitle('')  # Remove the default suptitle
plt.ylabel('Values')
plt.show()

In [None]:
# Customize the box plot
df.boxplot(grid=False, rot=45, fontsize=10, figsize=(10, 6))
plt.title('Customized Box Plot')
plt.ylabel('Values')
plt.show()
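
A box plot is a drawing of a handful of summary statistics, and it can help to see them as numbers. The sketch below computes the quartiles, the interquartile range, and the 1.5 × IQR fences that matplotlib's default whisker setting is based on. The seeded `demo` data and the `summary` column names are illustrative assumptions, and the drawn whiskers stop at the most extreme data points inside the fences rather than at the fences themselves.

```python
import numpy as np
import pandas as pd

# Seeded random data so the numbers are reproducible.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'A': rng.normal(0, 1, 100),
    'C': rng.normal(-1, 1.5, 100),
})

q = demo.quantile([0.25, 0.5, 0.75])
iqr = q.loc[0.75] - q.loc[0.25]

summary = pd.DataFrame({
    'q1': q.loc[0.25],                        # bottom edge of the box
    'median': q.loc[0.5],                     # line inside the box
    'q3': q.loc[0.75],                        # top edge of the box
    'iqr': iqr,                               # height of the box
    'lower_fence': q.loc[0.25] - 1.5 * iqr,   # points below are drawn as outliers
    'upper_fence': q.loc[0.75] + 1.5 * iqr,   # points above are drawn as outliers
})
print(summary)
```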

##### 5. DataFrame.cummin()

The `cummin()` method returns the cumulative minimum over a DataFrame or Series axis.

In [None]:
# Create a Series with some values
s = pd.Series([2, np.nan, 5, -1, 0])
print("Original Series:")
s

In [None]:
# Calculate cumulative minimum (by default, NA values are ignored)
print("Cumulative minimum (skipna=True):")
s.cummin()

In [None]:
# Calculate cumulative minimum without skipping NA values (once a NaN appears, every later result is NaN)
print("Cumulative minimum (skipna=False):")
s.cummin(skipna=False)

In [None]:
# Create a DataFrame with some values
df = pd.DataFrame([
    [2.0, 1.0],
    [3.0, np.nan],
    [1.0, 0.0]
], columns=list('AB'))
print("Original DataFrame:")
df

In [None]:
# Calculate cumulative minimum along index (rows)
print("Cumulative minimum along index (axis=0):")
df.cummin()

In [None]:
# Calculate cumulative minimum along columns
print("Cumulative minimum along columns (axis=1):")
df.cummin(axis=1)

##### 6. DataFrame.cumprod()

The `cumprod()` method returns the cumulative product over a DataFrame or Series axis.

In [None]:
# Using the same Series as above
print("Original Series:")
s

In [None]:
# Calculate cumulative product (by default, NA values are ignored)
print("Cumulative product (skipna=True):")
s.cumprod()

In [None]:
# Calculate cumulative product without skipping NA values (once a NaN appears, every later result is NaN)
print("Cumulative product (skipna=False):")
s.cumprod(skipna=False)

In [None]:
# Using the same DataFrame as above
print("Original DataFrame:")
df

In [None]:
# Calculate cumulative product along index (rows)
print("Cumulative product along index (axis=0):")
df.cumprod()

In [None]:
# Calculate cumulative product along columns
print("Cumulative product along columns (axis=1):")
df.cumprod(axis=1)

In [None]:
# Create a DataFrame with positive and negative values
df2 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [1, -1, 1, -1]
})
print("DataFrame with positive and negative values:")
print(df2)

print("\nCumulative product:")
print(df2.cumprod())
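
A common use of a cumulative product is compounding: multiplying successive growth factors to get the running total growth. The sketch below uses made-up per-period returns (the numbers are purely illustrative) and also repeats the sign-flipping column from the example above to show how the running product alternates.

```python
import numpy as np
import pandas as pd

# Hypothetical per-period returns: +10%, -5%, +2%.
returns = pd.Series([0.10, -0.05, 0.02])

# Convert returns to growth factors and compound them.
growth = (1 + returns).cumprod()
print(growth)

# Each -1 flips the sign of the running product.
signs = pd.Series([1, -1, 1, -1]).cumprod()
print(signs.tolist())
```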

##### Summary

In this notebook, we've explored several important DataFrame methods:

1. **between_time()**: Selects values between specific times of day from a time-indexed DataFrame
2. **bfill()**: Fills missing values using the next valid observation (backward fill)
3. **bool()**: Returns the boolean value of a single-element DataFrame
4. **boxplot()**: Creates box plots from DataFrame columns for visualizing data distributions
5. **cummin()**: Returns the cumulative minimum over a DataFrame or Series axis
6. **cumprod()**: Returns the cumulative product over a DataFrame or Series axis

Together, these methods cover time-of-day filtering, filling missing data, visualizing distributions, and computing cumulative statistics in pandas.