# Pandas Tutorial - Part 62: DataFrame Methods (max, mean, nlargest, notna, notnull)

This notebook covers several important DataFrame methods including:
- `max()` - Return the maximum of the values for the requested axis
- `mean()` - Return the mean of the values for the requested axis
- `nlargest()` - Return the first n rows ordered by columns in descending order
- `notna()` - Detect existing (non-missing) values
- `notnull()` - Alias of notna()

In [None]:
import pandas as pd
import numpy as np

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

## 1. DataFrame.max()

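The `max()` method returns the maximum of the values for the requested axis. It is available on both Series and DataFrame; the first examples below use a Series with a MultiIndex.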

In [None]:
# Create a Series with MultiIndex
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']
], names=['blooded', 'animal'])

s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
print("Series with MultiIndex:")
s

In [None]:
# Find the maximum value
print("Maximum value:")
s.max()

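In [None]:
# Find the maximum value within each group of the 'blooded' level.
# The `level` argument of max() was deprecated in pandas 1.3 and removed in 2.0,
# so group by the index level instead.
print("Maximum value by level 'blooded':")
s.groupby(level='blooded').max()

In [None]:
# The level can also be given by position
print("Maximum value by level 0:")
s.groupby(level=0).max()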
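
As a quick check, the per-level maxima for this Series can be worked out by hand: the 'cold' group holds 0 and 8, and the 'warm' group holds 4 and 2. The sketch below rebuilds the same Series and uses `groupby(level=...)`, which works on current pandas releases. The names `by_name` and `by_position` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same MultiIndex Series as in the cells above
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']
], names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)

# Group by the index level, referenced by name and by position
by_name = s.groupby(level='blooded').max()
by_position = s.groupby(level=0).max()

print(by_name)
print("Same result by name and by position:", by_name.equals(by_position))
```

Referring to the level by name is usually easier to read than a position, especially when the index has more than two levels.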

In [None]:
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 50, 10, 20, 30]
})
print("DataFrame:")
df

In [None]:
# Find the maximum value for each column
print("Maximum value for each column:")
df.max()

In [None]:
# Find the maximum value for each row
print("Maximum value for each row:")
df.max(axis=1)
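
To make the `axis` argument concrete, the sketch below recomputes both results for the same DataFrame. With the default `axis=0` the result has one entry per column; with `axis=1` it has one entry per row. The names `col_max` and `row_max` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [100, 50, 10, 20, 30]
})

# axis=0 (default): reduce down the rows, one maximum per column
col_max = df.max()

# axis=1: reduce across the columns, one maximum per row
row_max = df.max(axis=1)

print(col_max.to_dict())   # {'A': 5, 'B': 50, 'C': 100}
print(row_max.tolist())    # [100, 50, 30, 40, 50]
```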

In [None]:
# Create a DataFrame with NaN values
df_with_na = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, np.nan, 30, 40, 50],
    'C': [100, 50, 10, np.nan, 30]
})
print("DataFrame with NaN values:")
df_with_na

In [None]:
# Find the maximum value for each column (skipna=True by default)
print("Maximum value for each column (skipna=True):")
df_with_na.max()

In [None]:
# Find the maximum value for each column (skipna=False)
print("Maximum value for each column (skipna=False):")
df_with_na.max(skipna=False)
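
The effect of `skipna` can be verified directly. In this DataFrame every column contains one `NaN`, so `skipna=False` should yield `NaN` for every column, while the default ignores the missing entries. This is a minimal sketch; the names `ignoring_nan` and `propagating_nan` are illustrative.

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, np.nan, 30, 40, 50],
    'C': [100, 50, 10, np.nan, 30]
})

# Default: missing values are skipped before taking the maximum
ignoring_nan = df_with_na.max()

# skipna=False: a single NaN in a column makes that column's result NaN
propagating_nan = df_with_na.max(skipna=False)

print(ignoring_nan.to_dict())          # {'A': 5.0, 'B': 50.0, 'C': 100.0}
print(propagating_nan.isna().all())    # True
```

Note that the results are floats rather than integers, because a column that holds `NaN` is stored with a floating-point dtype.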

## 2. DataFrame.mean()

The `mean()` method returns the mean of the values for the requested axis.

In [None]:
# Using the same DataFrame
print("DataFrame:")
df

In [None]:
# Calculate the mean for each column
print("Mean for each column:")
df.mean()

In [None]:
# Calculate the mean for each row
print("Mean for each row:")
df.mean(axis=1)

In [None]:
# Using the DataFrame with NaN values
print("DataFrame with NaN values:")
df_with_na

In [None]:
# Calculate the mean for each column (skipna=True by default)
print("Mean for each column (skipna=True):")
df_with_na.mean()

In [None]:
# Calculate the mean for each column (skipna=False)
print("Mean for each column (skipna=False):")
df_with_na.mean(skipna=False)
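
With `skipna=True`, the mean divides the sum of the non-missing values by the number of non-missing values, not by the number of rows. For column `A` that is (1 + 2 + 4 + 5) / 4 = 3.0. The sketch below checks this against `sum()` and `count()`; the name `manual_mean` is illustrative.

```python
import numpy as np
import pandas as pd

df_with_na = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, np.nan, 30, 40, 50],
    'C': [100, 50, 10, np.nan, 30]
})

# sum() skips NaN by default and count() counts only non-missing entries
manual_mean = df_with_na.sum() / df_with_na.count()

print(df_with_na.mean().to_dict())   # {'A': 3.0, 'B': 32.5, 'C': 47.5}
print(manual_mean.to_dict())         # same values

# With skipna=False every column here contains a NaN, so every mean is NaN
print(df_with_na.mean(skipna=False).isna().all())   # True
```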

In [None]:
# Create a DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [True, False, True, True, False]
})
print("DataFrame with mixed data types:")
df_mixed

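In [None]:
# Calculate the mean with the default numeric_only=False (pandas 2.0+).
# The string column 'B' makes this raise a TypeError.
print("Mean (default numeric_only=False):")
try:
    df_mixed.mean()
except TypeError as exc:
    print(f"TypeError: {exc}")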

In [None]:
# Calculate the mean (numeric_only=True)
print("Mean (numeric_only=True):")
df_mixed.mean(numeric_only=True)
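
With `numeric_only=True`, the string column is dropped and the boolean column is kept, with `True` counted as 1 and `False` as 0. For column `C` that gives 3 / 5 = 0.6. A minimal sketch, with `numeric_mean` as an illustrative name:

```python
import pandas as pd

df_mixed = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [True, False, True, True, False]
})

# Only numeric and boolean columns take part in the reduction
numeric_mean = df_mixed.mean(numeric_only=True)

print(list(numeric_mean.index))   # ['A', 'C']
print(numeric_mean['A'])          # 3.0
print(numeric_mean['C'])          # 0.6, the share of True values
```

The mean of a boolean column is a convenient way to get the fraction of rows that satisfy a condition.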

## 3. DataFrame.nlargest()

The `nlargest()` method returns the first n rows ordered by columns in descending order.

In [None]:
# Create a DataFrame for countries
df = pd.DataFrame({
    'population': [59000000, 65000000, 434000, 434000, 37800000],
    'GDP': [1937894, 2583560, 12128, 17036, 1493000],
    'alpha-2': ['IT', 'FR', 'BN', 'TL', 'CA']
}, index=['Italy', 'France', 'Brunei', 'Timor-Leste', 'Canada'])

print("Countries DataFrame:")
df

In [None]:
# Get the 3 largest countries by population
print("3 largest countries by population:")
df.nlargest(3, 'population')

In [None]:
# Get the 3 largest countries by GDP
print("3 largest countries by GDP:")
df.nlargest(3, 'GDP')

In [None]:
# Get the 3 largest countries by multiple columns
print("3 largest countries by population and GDP:")
df.nlargest(3, ['population', 'GDP'])

In [None]:
# Get the 3 largest countries by GDP and population
print("3 largest countries by GDP and population:")
df.nlargest(3, ['GDP', 'population'])
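
Brunei and Timor-Leste have the same population, so it matters how ties are resolved once `n` reaches them. The sketch below asks for four rows and compares the `keep` options of `nlargest()` with a second column used as a tie-breaker. It also checks the top three rows against a full sort followed by `head(3)`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'population': [59000000, 65000000, 434000, 434000, 37800000],
    'GDP': [1937894, 2583560, 12128, 17036, 1493000],
    'alpha-2': ['IT', 'FR', 'BN', 'TL', 'CA']
}, index=['Italy', 'France', 'Brunei', 'Timor-Leste', 'Canada'])

# For the top three rows there are no ties, so this matches sort + head
top3 = df.nlargest(3, 'population')
sorted_top3 = df.sort_values('population', ascending=False).head(3)
print(list(top3.index) == list(sorted_top3.index))

# Four rows: the fourth slot is contested by the two tied countries
keep_first = df.nlargest(4, 'population')               # keep='first' is the default
keep_last = df.nlargest(4, 'population', keep='last')
keep_all = df.nlargest(4, 'population', keep='all')     # may return more than n rows

# A second column breaks the tie by GDP instead of by row order
tie_by_gdp = df.nlargest(4, ['population', 'GDP'])

print(keep_first.index[-1])   # Brunei
print(keep_last.index[-1])    # Timor-Leste
print(len(keep_all))          # 5
print(tie_by_gdp.index[-1])   # Timor-Leste
```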

In [None]:
# Create a Series
s = pd.Series([3, 2, 1, 5, 4])
print("Series:")
print(s)

# Get the 3 largest values
print("\n3 largest values:")
print(s.nlargest(3))

## 4. DataFrame.notna() and DataFrame.notnull()

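The `notna()` and `notnull()` methods detect existing (non-missing) values and return a boolean mask of the same shape as the input. `notnull()` is an alias of `notna()`. `None`, `NaN`, and `NaT` count as missing; empty strings and `numpy.inf` do not.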

In [None]:
# Create a DataFrame with missing values
df = pd.DataFrame({
    'age': [5, 6, np.nan],
    'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    'name': ['Alfred', 'Batman', ''],
    'toy': [None, 'Batmobile', 'Joker']
})

print("DataFrame with missing values:")
df

In [None]:
# Detect non-missing values using notna()
print("Non-missing values (notna):")
df.notna()

In [None]:
# Detect non-missing values using notnull()
print("Non-missing values (notnull):")
df.notnull()

In [None]:
# Verify that notna() and notnull() return the same result
print("Are notna() and notnull() results equal?")
print(df.notna().equals(df.notnull()))

In [None]:
# Create a Series with missing values
ser = pd.Series([5, 6, np.nan])
print("Series with missing values:")
print(ser)

In [None]:
# Detect non-missing values in the Series
print("\nNon-missing values in Series:")
print(ser.notna())

In [None]:
# Use notna() to filter a DataFrame
print("Filtering DataFrame to keep only rows where 'age' is not NA:")
df[df['age'].notna()]

In [None]:
# Count non-missing values in each column
print("Count of non-missing values in each column:")
df.notna().sum()

In [None]:
# Check if all values in a row are non-missing
print("Rows where all values are non-missing:")
df[df.notna().all(axis=1)]
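
The results of the cells above can be checked against the data by hand. The sketch below rebuilds the same DataFrame and confirms three points: the empty string in `name` is treated as present, `notna()` is the element-wise negation of `isna()`, and only the second row (index 1) is complete. The names `mask` and `complete_rows` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [5, 6, np.nan],
    'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    'name': ['Alfred', 'Batman', ''],
    'toy': [None, 'Batmobile', 'Joker']
})

mask = df.notna()

# The empty string in 'name' is not a missing value
print(mask['name'].tolist())          # [True, True, True]

# notna() is the inverse of isna()
print(mask.equals(~df.isna()))        # True

# Non-missing count per column
print(mask.sum().to_dict())           # {'age': 2, 'born': 2, 'name': 3, 'toy': 2}

# Only the row at index 1 has no missing values
complete_rows = df[mask.all(axis=1)]
print(list(complete_rows.index))      # [1]
```

If empty strings should count as missing in a given dataset, one option is to replace them with `np.nan` before calling `notna()`.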

## Summary

In this notebook, we've explored several important DataFrame methods:

1. **max()**: Returns the maximum of the values for the requested axis. For hierarchical indices, combine it with `groupby(level=...)` to get per-level maxima; the `level` parameter of `max()` was removed in pandas 2.0.

2. **mean()**: Returns the mean of the values for the requested axis. It can be used with the `skipna` parameter to control how missing values are handled.

3. **nlargest()**: Returns the first n rows ordered by columns in descending order. It's useful for quickly finding the largest values in a DataFrame.

4. **notna()** and **notnull()**: Detect existing (non-missing) values in a DataFrame or Series. `notnull()` is an alias of `notna()`. These methods are essential for identifying and handling missing data.

Together, these methods cover common tasks in data analysis: computing summary statistics, ranking rows, and handling missing data in pandas.