# Range
The range is the simplest measure of dispersion: it shows how spread out the values in a dataset are, and is computed as the difference between the highest and lowest values.

**Why Use Range?**

Advantages:
- Quick overview of data spread
- Easy to interpret and explain
- Can hint at outliers or data-entry errors when the range is unexpectedly large
- Good for preliminary data analysis

Limitations:
- Misleading with outliers: A single extreme value can distort the picture
- Ignores data distribution: Doesn't show how values are distributed between min and max
- Not robust: Only uses two data points from the entire dataset
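
The limitations above can be illustrated with a minimal sketch. The helper `value_range` and the small datasets are hypothetical, made up for this illustration only. Two datasets share the same range although their values are distributed very differently, and a single extreme value inflates the range of an otherwise tight dataset.

```python
import statistics

def value_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    return max(values) - min(values)

# Hypothetical datasets: same min and max, different shapes in between
clustered = [10, 49, 50, 50, 50, 50, 51, 90]
spread = [10, 20, 30, 40, 60, 70, 80, 90]

# The range cannot tell these two apart
print(f"Range (clustered): {value_range(clustered)}")
print(f"Range (spread):    {value_range(spread)}")

# The population standard deviation uses every value and does differ
print(f"Std dev (clustered): {statistics.pstdev(clustered):.2f}")
print(f"Std dev (spread):    {statistics.pstdev(spread):.2f}")

# One extreme value dominates the range of an otherwise tight dataset
tight = [48, 49, 50, 51, 52]
with_outlier = tight + [500]
print(f"Range (tight):        {value_range(tight)}")
print(f"Range (with outlier): {value_range(with_outlier)}")
```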


## Implementation

### 1. Basic Range Calculation

In [2]:
# Sample dataset
data = [23, 45, 67, 34, 89, 56, 42, 12, 78, 91]

# Calculate range manually
data_min = min(data)
data_max = max(data)
data_range = data_max - data_min

print(f"Dataset: {data}")
print(f"Minimum: {data_min}")
print(f"Maximum: {data_max}")
print(f"Range: {data_range}")  # 91 - 12 = 79

Dataset: [23, 45, 67, 34, 89, 56, 42, 12, 78, 91]
Minimum: 12
Maximum: 91
Range: 79


### 2. Using NumPy

In [3]:
import numpy as np

data = [23, 45, 67, 34, 89, 56, 42, 12, 78, 91]

# Using NumPy functions
data_min = np.min(data)
data_max = np.max(data)
data_range = np.ptp(data)  # "peak to peak" - equivalent to max-min

print(f"NumPy Min: {data_min}")
print(f"NumPy Max: {data_max}")
print(f"NumPy Range (ptp): {data_range}")
print(f"Manual Range: {data_max - data_min}")

NumPy Min: 12
NumPy Max: 91
NumPy Range (ptp): 79
Manual Range: 79
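
For two-dimensional data, `np.ptp` also accepts an `axis` argument, so the range can be taken per column or per row. The sketch below assumes a small made-up array of sensor readings; the array and variable names are illustrative only.

```python
import numpy as np

# Hypothetical readings: 3 days (rows) x 4 sensors (columns)
readings = np.array([
    [12, 30, 7, 21],
    [15, 28, 9, 40],
    [11, 35, 8, 22],
])

overall_range = np.ptp(readings)              # range over every value
range_per_sensor = np.ptp(readings, axis=0)   # one range per column
range_per_day = np.ptp(readings, axis=1)      # one range per row

print(f"Overall range: {overall_range}")
print(f"Range per sensor: {range_per_sensor}")
print(f"Range per day: {range_per_day}")
```

Using the function form `np.ptp(array)` rather than an array method keeps the example independent of the NumPy version in use.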


### 3. Using pandas (for DataFrames)

In [4]:
import pandas as pd

# Create a DataFrame
data = {
    'temperature': [22, 25, 19, 30, 18, 35, 17, 28, 23, 40],
    'sales': [100, 150, 80, 200, 75, 180, 70, 160, 120, 250],
    'customer_rating': [4, 5, 3, 4, 5, 4, 3, 5, 4, 5]
}

df = pd.DataFrame(data)

print("DataFrame:")
print(df)

# Range for each column
print("\nRange for each column:")
for column in df.columns:
    col_min = df[column].min()
    col_max = df[column].max()
    col_range = col_max - col_min
    print(f"{column}: {col_range} ({col_min} to {col_max})")

# Using describe() for comprehensive statistics
print("\nComprehensive Statistics:")
print(df.describe())

DataFrame:
   temperature  sales  customer_rating
0           22    100                4
1           25    150                5
2           19     80                3
3           30    200                4
4           18     75                5
5           35    180                4
6           17     70                3
7           28    160                5
8           23    120                4
9           40    250                5

Range for each column:
temperature: 23 (17 to 40)
sales: 180 (70 to 250)
customer_rating: 2 (3 to 5)

Comprehensive Statistics:
       temperature       sales  customer_rating
count    10.000000   10.000000        10.000000
mean     25.700000  138.500000         4.200000
std       7.572611   60.094832         0.788811
min      17.000000   70.000000         3.000000
25%      19.750000   85.000000         4.000000
50%      24.000000  135.000000         4.000000
75%      29.500000  175.000000         5.000000
max      40.000000  250.000000         5.000000
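
The per-column loop can also be written without explicit iteration. The sketch below rebuilds the same DataFrame and uses `df.max() - df.min()`, which subtracts the column-wise minimum from the column-wise maximum and returns one range per column. The `summary` table layout is just one possible arrangement.

```python
import pandas as pd

df = pd.DataFrame({
    'temperature': [22, 25, 19, 30, 18, 35, 17, 28, 23, 40],
    'sales': [100, 150, 80, 200, 75, 180, 70, 160, 120, 250],
    'customer_rating': [4, 5, 3, 4, 5, 4, 3, 5, 4, 5],
})

# Column-wise max minus column-wise min gives a Series of ranges
column_ranges = df.max() - df.min()
print(column_ranges)

# Collect min, max and range in one table (one row per column of df)
summary = df.agg(['min', 'max']).T
summary['range'] = summary['max'] - summary['min']
print(summary)
```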

## Advanced Usage

### 1. Handling Outliers in Range Calculation

In [5]:
import numpy as np

def robust_range(data, outlier_threshold=1.5):
    """
    Calculate the range after dropping outliers with the IQR rule.

    Values outside [Q1 - k*IQR, Q3 + k*IQR], with k = outlier_threshold,
    are treated as outliers. Always returns a dict.
    """
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1

    # Define outlier bounds
    lower_bound = q1 - outlier_threshold * iqr
    upper_bound = q3 + outlier_threshold * iqr

    # Filter out outliers
    filtered_data = [x for x in data if lower_bound <= x <= upper_bound]

    # Fall back to the full dataset if the filter removes everything
    # (for example with a very small or negative threshold)
    if len(filtered_data) == 0:
        filtered_data = list(data)

    return {
        'original_range': np.ptp(data),
        'robust_range': np.max(filtered_data) - np.min(filtered_data),
        'outliers_removed': len(data) - len(filtered_data),
        'filtered_data_size': len(filtered_data)
    }

# Test with data containing outliers
data_with_outliers = [10, 12, 13, 14, 15, 16, 17, 18, 19, 50]  # 50 is an outlier

result = robust_range(data_with_outliers)
print(f"Original data: {data_with_outliers}")
print(f"Original range: {result['original_range']}")  # 40
print(f"Robust range: {result['robust_range']}")      # 9
print(f"Outliers removed: {result['outliers_removed']}")

Original data: [10, 12, 13, 14, 15, 16, 17, 18, 19, 50]
Original range: 40
Robust range: 9
Outliers removed: 1


### 2. Range for Multiple Dataset Comparison

In [6]:
import pandas as pd
import numpy as np

def compare_datasets_ranges(datasets_dict):
    """
    Compare ranges across multiple datasets
    """
    comparison = {}
    
    for name, data in datasets_dict.items():
        data_array = np.array(data)
        comparison[name] = {
            'min': np.min(data_array),
            'max': np.max(data_array),
            'range': np.ptp(data_array),
            'mean': np.mean(data_array),
            'size': len(data_array)
        }
    
    # Create comparison DataFrame
    comparison_df = pd.DataFrame(comparison).T
    comparison_df = comparison_df.sort_values('range')
    
    return comparison_df

# Example with different datasets
datasets = {
    'Student_Heights': [160, 165, 170, 175, 168, 172, 169, 171],
    'Product_Weights': [100, 102, 101, 103, 100, 104, 102, 101],
    'Monthly_Sales': [5000, 15000, 8000, 12000, 7000, 20000, 9000, 18000],
    'Response_Times': [0.1, 0.2, 0.15, 0.3, 0.12, 0.25, 0.18, 5.0]  # With outlier
}

comparison_result = compare_datasets_ranges(datasets)
print("Dataset Range Comparison:")
print(comparison_result)

Dataset Range Comparison:
                    min      max    range        mean  size
Product_Weights   100.0    104.0      4.0    101.6250   8.0
Response_Times      0.1      5.0      4.9      0.7875   8.0
Student_Heights   160.0    175.0     15.0    168.7500   8.0
Monthly_Sales    5000.0  20000.0  15000.0  11750.0000   8.0
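
The datasets in this comparison use different units and scales, so sorting by raw range mostly reflects scale rather than relative variability. One possible unitless alternative, which is not part of the comparison above and is shown here only as a sketch, is the coefficient of range, `(max - min) / (max + min)`. The helper name `coefficient_of_range` is hypothetical, and the formula assumes positive values.

```python
def coefficient_of_range(values):
    """Unitless spread: (max - min) / (max + min).

    Assumes positive values, so that max + min is never zero.
    """
    lowest, highest = min(values), max(values)
    return (highest - lowest) / (highest + lowest)

datasets = {
    'Student_Heights': [160, 165, 170, 175, 168, 172, 169, 171],
    'Product_Weights': [100, 102, 101, 103, 100, 104, 102, 101],
    'Monthly_Sales': [5000, 15000, 8000, 12000, 7000, 20000, 9000, 18000],
    'Response_Times': [0.1, 0.2, 0.15, 0.3, 0.12, 0.25, 0.18, 5.0],
}

relative = {name: coefficient_of_range(v) for name, v in datasets.items()}

# Print from least to most relative spread
for name, value in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.3f}")
```

With this measure `Response_Times` ranks as the most variable dataset, even though its raw range is among the smallest.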


**Key Python Functions Summary**

| Method                        | Library  | Usage                | Best For                         |
|-------------------------------|----------|----------------------|----------------------------------|
| `max(data) - min(data)`       | Built-in | Manual calculation   | Simple cases                     |
| `np.ptp(data)`                | NumPy    | Array operations     | Numerical arrays                 |
| `np.max(data) - np.min(data)` | NumPy    | Explicit calculation | When you need min/max separately |
| `df.max() - df.min()`         | pandas   | DataFrame operations | Data analysis workflows          |
| `df.describe()`               | pandas   | Comprehensive stats  | Full data summary                |


**Practical Insights**

- Small range → Data points are close together (consistent)
- Large range → Data points are spread out (variable)
- Always check for outliers before interpreting range
- Use with other measures, such as standard deviation, for a complete picture
- Good for quick data quality checks: a very large range might indicate data errors
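
As a minimal sketch of pairing the range with other measures, the hypothetical helper `spread_summary` below reports the range, the interquartile range (IQR) and the sample standard deviation side by side, using only the standard library. The `clean` dataset is made up for comparison. Note that `statistics.quantiles` uses a different default quartile convention than `np.percentile`, so quartile values may differ slightly between the two.

```python
import statistics

def spread_summary(values):
    """Return range, IQR and sample standard deviation for a dataset."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        'range': max(values) - min(values),
        'iqr': q3 - q1,
        'stdev': statistics.stdev(values),
    }

clean = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20]
with_outlier = [10, 12, 13, 14, 15, 16, 17, 18, 19, 50]

clean_summary = spread_summary(clean)
outlier_summary = spread_summary(with_outlier)

for label, summary in [('clean', clean_summary), ('with outlier', outlier_summary)]:
    print(f"{label}: range={summary['range']}, "
          f"iqr={summary['iqr']:.2f}, stdev={summary['stdev']:.2f}")
```

Here the single extreme value quadruples the range and inflates the standard deviation, while the IQR stays the same, which is why the measures are more informative together than alone.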

The range gives a quick first read on how spread out your data is, but it should always be complemented with other dispersion measures for a complete analysis.

