# Pandas Tutorial - Part 63: DataFrame Methods (nsmallest, rank)

This notebook covers two important DataFrame methods:
- `nsmallest()` - Return the first n rows ordered by columns in ascending order
- `rank()` - Compute numerical data ranks along the specified axis

In [None]:
import pandas as pd
import numpy as np

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)

## 1. DataFrame.nsmallest()

The `nsmallest()` method returns the `n` rows with the smallest values in the given column(s), sorted in ascending order. The result matches `df.sort_values(columns, ascending=True).head(n)`, but is typically faster because it does not need to sort the whole DataFrame.

In [None]:
# Create a DataFrame for countries
df = pd.DataFrame({
    'population': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
    'alpha-2': ["IT", "FR", "MT", "MV", "BN", "IS", "NR", "TV", "AI"]
}, index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"])

print("Countries DataFrame:")
df

In [None]:
# Get the 3 smallest countries by population
print("3 smallest countries by population:")
df.nsmallest(3, 'population')

In [None]:
# Get the 3 smallest countries by GDP
print("3 smallest countries by GDP:")
df.nsmallest(3, 'GDP')

In [None]:
# Order by population first; GDP only breaks ties between equal populations
print("3 smallest countries by population and GDP:")
df.nsmallest(3, ['population', 'GDP'])

In [None]:
# Reverse the priority: order by GDP first; population only breaks GDP ties
print("3 smallest countries by GDP and population:")
df.nsmallest(3, ['GDP', 'population'])
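
To make the tie-breaking rule concrete, the sketch below rebuilds the countries table (without the `alpha-2` column) and compares a single-column call with a two-column call. The names `countries`, `by_pop`, and `by_pop_gdp` are illustrative and not part of the notebook above.

```python
import pandas as pd

countries = pd.DataFrame({
    'population': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
}, index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"])

# Single column: the three rows tied at 11300 come back in their original order
by_pop = countries.nsmallest(3, 'population')
print(list(by_pop.index))      # ['Nauru', 'Tuvalu', 'Anguilla']

# Two columns: GDP orders the rows that share the same population
by_pop_gdp = countries.nsmallest(3, ['population', 'GDP'])
print(list(by_pop_gdp.index))  # ['Tuvalu', 'Nauru', 'Anguilla']
```

Both calls select the same three countries here; only the order within the tie changes.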

In [None]:
# Demonstrate the 'keep' parameter with duplicate values
# Note that there are duplicate population values (434000 and 11300)

# Keep='first' (default)
print("nsmallest with keep='first' (default):")
df.nsmallest(3, 'population', keep='first')

In [None]:
# Keep='last'
print("nsmallest with keep='last':")
df.nsmallest(3, 'population', keep='last')

In [None]:
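# keep='all': keeps every row tied with the n-th smallest value, so it can return more than n rows.
# Here the three rows with population 11300 fill exactly 3 slots, so the result still has 3 rows.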
print("nsmallest with keep='all':")
df.nsmallest(3, 'population', keep='all')
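
The effect of `keep='all'` only becomes visible when a tie straddles the cutoff. As a minimal sketch, the example below uses the same population figures as a Series and asks for `n=2`, which cuts through the three-way tie at 11300. The names `population`, `first_two`, and `all_ties` are illustrative.

```python
import pandas as pd

population = pd.Series(
    [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"],
)

# keep='first' stops at exactly n rows, taking tied rows in order of appearance
first_two = population.nsmallest(2, keep='first')
print(list(first_two.index))  # ['Nauru', 'Tuvalu']

# keep='all' returns every row tied with the 2nd smallest value, so 3 rows come back
all_ties = population.nsmallest(2, keep='all')
print(list(all_ties.index))   # ['Nauru', 'Tuvalu', 'Anguilla']
```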

In [None]:
# nsmallest() is also available on a Series (no column argument is needed)
s = pd.Series([3, 2, 1, 5, 4])
print("Series:")
print(s)

# Get the 3 smallest values
print("\n3 smallest values:")
print(s.nsmallest(3))

In [None]:
# Compare nsmallest with sort_values().head()
print("Using nsmallest(3, 'GDP'):")
print(df.nsmallest(3, 'GDP'))

print("\nUsing sort_values('GDP').head(3):")
print(df.sort_values('GDP').head(3))
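
Instead of comparing the two printouts by eye, the equivalence can be checked programmatically. The sketch below assumes the GDP values are all distinct (they are in this dataset), so neither approach has to break a tie. The names `countries`, `via_nsmallest`, and `via_sort` are illustrative.

```python
import pandas as pd

countries = pd.DataFrame({
    'population': [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    'GDP': [1937894, 2583560, 12011, 4520, 12128, 17036, 182, 38, 311],
}, index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"])

via_nsmallest = countries.nsmallest(3, 'GDP')
via_sort = countries.sort_values('GDP').head(3)

# DataFrame.equals compares values, dtypes, and index labels
print(via_nsmallest.equals(via_sort))  # True
print(list(via_nsmallest.index))       # ['Tuvalu', 'Nauru', 'Anguilla']
```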

## 2. DataFrame.rank()

The `rank()` method computes numerical data ranks (1 through n) along the specified axis. By default, tied values each receive the average of the ranks they span, and NaN values are left unranked (their rank is NaN).

In [None]:
# Create a DataFrame with some duplicate values and NaN
df = pd.DataFrame(data={
    'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
    'Number_legs': [4, 2, 4, 8, np.nan]
})

print("Animals DataFrame:")
df

In [None]:
# Default rank (method='average')
df['default_rank'] = df['Number_legs'].rank()
print("Default rank (method='average'):")
df
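
The default ranks can be worked out by hand, which helps when reading the table above. The following sketch repeats the `Number_legs` values as a standalone Series; the names `legs`, `avg_rank`, and `max_rank` are illustrative.

```python
import numpy as np
import pandas as pd

legs = pd.Series([4, 2, 4, 8, np.nan])

# Sorted order is 2, 4, 4, 8:
#   2 takes position 1
#   the two 4s occupy positions 2 and 3, so each gets (2 + 3) / 2 = 2.5
#   8 takes position 4
#   NaN is not ranked
avg_rank = legs.rank()
print(avg_rank.tolist())  # [2.5, 1.0, 2.5, 4.0, nan]

# method='max' gives each tied value the highest position in its group (3 for the 4s)
max_rank = legs.rank(method='max')
print(max_rank.tolist())  # [3.0, 1.0, 3.0, 4.0, nan]
```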

In [None]:
# Rank with method='max'
df['max_rank'] = df['Number_legs'].rank(method='max')
print("Rank with method='max':")
df

In [None]:
# Rank with na_option='bottom'
df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
print("Rank with na_option='bottom':")
df

In [None]:
# Rank with pct=True: ranks are scaled to fractions in (0, 1] (percentile form)
df['pct_rank'] = df['Number_legs'].rank(pct=True)
print("Rank with pct=True (percentile rank):")
df
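
As a rough check on how the last two options behave for this data, the sketch below compares `pct=True` with the default rank divided by the number of non-NaN values, and shows where `na_option='bottom'` places the missing value. The names `legs`, `n_valid`, `pct_rank`, and `bottom_rank` are illustrative.

```python
import numpy as np
import pandas as pd

legs = pd.Series([4, 2, 4, 8, np.nan])
n_valid = legs.count()  # 4 non-NaN values

# With the default method, the percentile rank is the average rank over the valid count
pct_rank = legs.rank(pct=True)
print(pct_rank.tolist())  # [0.625, 0.25, 0.625, 1.0, nan]
print(np.allclose(pct_rank, legs.rank() / n_valid, equal_nan=True))  # True

# na_option='bottom' ranks NaN after all valid values instead of leaving it NaN
bottom_rank = legs.rank(na_option='bottom')
print(bottom_rank.tolist())  # [2.5, 1.0, 2.5, 4.0, 5.0]
```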

In [None]:
# Create a DataFrame to demonstrate different ranking methods
df_methods = pd.DataFrame({
    'values': [1, 2, 2, 3, 3, 3, 4, 5]
})

print("DataFrame with duplicate values:")
df_methods

In [None]:
# Demonstrate all ranking methods
methods = ['average', 'min', 'max', 'first', 'dense']

for method in methods:
    df_methods[f'rank_{method}'] = df_methods['values'].rank(method=method)

print("Comparison of different ranking methods:")
df_methods

In [None]:
# Explanation of each method
print("Explanation of ranking methods:")
print("- average: average rank of the group (default)")
print("- min: lowest rank in the group")
print("- max: highest rank in the group")
print("- first: ranks assigned in order they appear in the array")
print("- dense: like 'min', but rank always increases by 1 between groups")
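
To tie the descriptions above to concrete numbers, the sketch below lists the ranks each method is expected to produce for the same values and checks them against `rank()`. The `expected` dictionary was worked out by hand from the definitions and is an illustration, not output copied from the notebook.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 3, 3, 4, 5])

# The two 2s span positions 2-3 and the three 3s span positions 4-6
expected = {
    'average': [1.0, 2.5, 2.5, 5.0, 5.0, 5.0, 7.0, 8.0],
    'min':     [1.0, 2.0, 2.0, 4.0, 4.0, 4.0, 7.0, 8.0],
    'max':     [1.0, 3.0, 3.0, 6.0, 6.0, 6.0, 7.0, 8.0],
    'first':   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'dense':   [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 5.0],
}

results = {}
for method, want in expected.items():
    results[method] = values.rank(method=method).tolist()
    print(f"{method:>7}: {results[method]} matches: {results[method] == want}")
```

Note that `min` and `max` leave gaps after a tie (the rank after the 3s jumps to 7), while `dense` never skips a rank.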

In [None]:
# Demonstrate ranking with ascending=False
df_methods['rank_desc'] = df_methods['values'].rank(ascending=False)
print("Ranking with ascending=False:")
df_methods[['values', 'rank_desc']]
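
With the default `method='average'` and no missing values, ascending and descending ranks mirror each other: for `n` values they sum to `n + 1`. The sketch below checks this on the same data; the names `values`, `asc`, and `desc` are illustrative.

```python
import pandas as pd

values = pd.Series([1, 2, 2, 3, 3, 3, 4, 5])
n = len(values)

asc = values.rank()                  # smallest value gets rank 1
desc = values.rank(ascending=False)  # largest value gets rank 1

print(desc.tolist())         # [8.0, 6.5, 6.5, 4.0, 4.0, 4.0, 2.0, 1.0]
print((asc + desc).tolist())  # every entry is n + 1 = 9.0
```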

In [None]:
# Demonstrate ranking along different axes in a DataFrame
df_axis = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 4, 3, 2],
    'C': [3, 3, 2, 1]
})

print("DataFrame for axis demonstration:")
df_axis

In [None]:
# Rank along axis=0 (default, rank within each column)
print("Rank along axis=0 (within each column):")
df_axis.rank(axis=0)

In [None]:
# Rank along axis=1 (rank within each row)
print("Rank along axis=1 (within each row):")
df_axis.rank(axis=1)
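
The difference between the two axes is easiest to see in the tied entries. As a small sketch, the code below recomputes both results for the same table and prints them as nested lists; the names `table`, `by_column`, and `by_row` are illustrative.

```python
import pandas as pd

table = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 4, 3, 2],
    'C': [3, 3, 2, 1]
})

# axis=0: each column is ranked on its own; the two 3s in column C share rank 3.5
by_column = table.rank(axis=0)
print(by_column['C'].tolist())   # [3.5, 3.5, 2.0, 1.0]

# axis=1: each row is ranked on its own; in the third row A and B are both 3
by_row = table.rank(axis=1)
print(by_row.values.tolist())
# [[1.0, 3.0, 2.0], [1.0, 3.0, 2.0], [2.5, 2.5, 1.0], [3.0, 2.0, 1.0]]
```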

In [None]:
# Create a DataFrame with mixed data types
df_mixed = pd.DataFrame({
    'numeric': [1, 2, 3, 4],
    'string': ['a', 'b', 'c', 'd']
})

print("DataFrame with mixed data types:")
df_mixed

In [None]:
# Rank with numeric_only=True
try:
    print("Rank with numeric_only=True:")
    df_mixed.rank(numeric_only=True)
except Exception as e:
    print(f"Error: {e}")
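
A quick way to confirm what `numeric_only=True` does is to inspect the columns of the result. The sketch below rebuilds the mixed-type table; the names `mixed` and `ranked` are illustrative.

```python
import pandas as pd

mixed = pd.DataFrame({
    'numeric': [1, 2, 3, 4],
    'string': ['a', 'b', 'c', 'd']
})

ranked = mixed.rank(numeric_only=True)

# Only the numeric column survives; the string column is excluded from the output
print(list(ranked.columns))        # ['numeric']
print(ranked['numeric'].tolist())  # [1.0, 2.0, 3.0, 4.0]
```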

## Summary

In this notebook, we've explored two important DataFrame methods:

1. **nsmallest()**: Returns the `n` rows with the smallest values in the given column(s), sorted in ascending order. The result matches `df.sort_values(columns, ascending=True).head(n)`, but is typically faster. The `keep` parameter ('first', 'last', or 'all') decides which rows are returned when values tie at the cutoff.

2. **rank()**: Computes numerical data ranks along the specified axis. This method offers several options for handling ties (equal values) through the `method` parameter:
   - 'average': average rank of the group (default)
   - 'min': lowest rank in the group
   - 'max': highest rank in the group
   - 'first': ranks assigned in order they appear in the array
   - 'dense': like 'min', but rank always increases by 1 between groups
   
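   Additional parameters include:
   - `axis`: Whether to rank within each column (0, the default) or within each row (1)
   - `na_option`: How to handle NaN values ('keep', 'top', or 'bottom')
   - `ascending`: Whether to rank in ascending order (default) or descending order
   - `pct`: Whether to return ranks as fractions between 0 and 1 (percentile form)
   - `numeric_only`: Whether to rank only the numeric columns of a DataFrame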

These methods are valuable for data analysis, particularly when you need to identify the smallest values in a dataset or assign ranks to values for statistical analysis.
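
The two methods are closely related. As a sketch under the assumption that there are no missing values, selecting rows whose `rank(method='min')` is at most `n` picks the same rows as `nsmallest(n, keep='all')`. The names `population`, `via_nsmallest`, and `via_rank` are illustrative.

```python
import pandas as pd

population = pd.Series(
    [59000000, 65000000, 434000, 434000, 434000, 337000, 11300, 11300, 11300],
    index=["Italy", "France", "Malta", "Maldives", "Brunei", "Iceland", "Nauru", "Tuvalu", "Anguilla"],
)

same_rows = {}
for n in (2, 4, 5):
    via_nsmallest = set(population.nsmallest(n, keep='all').index)
    via_rank = set(population[population.rank(method='min') <= n].index)
    same_rows[n] = (len(via_nsmallest), via_nsmallest == via_rank)
    print(n, same_rows[n])
# 2 (3, True)
# 4 (4, True)
# 5 (7, True)
```

The row order differs between the two approaches (`nsmallest()` sorts by value, the boolean mask keeps the original order), which is why the comparison uses sets of index labels.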