# Quartiles and Interquartile Range

Quartiles are statistical measures that divide a dataset into four equal parts, providing insights into the distribution and spread of the data. Each quartile represents a specific percentile in the dataset, allowing for a better understanding of how the data is distributed around the median.

## Definition of Quartiles

There are three quartiles in a dataset:

- **First Quartile (Q1)**: This is the value below which 25% of the data falls. It marks the upper boundary of the lowest quarter of the dataset.

- **Second Quartile (Q2)**: This is the median of the dataset, dividing it into two equal halves. It is the value below which 50% of the data falls.

- **Third Quartile (Q3)**: This is the value below which 75% of the data falls; the highest 25% of the dataset lies above it.

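The quartiles are calculated after sorting the data in ascending order. Under the $(n+1)$ convention used here, the positions of the quartiles in a sorted dataset with $ n $ items are:

- $ Q1 = \left(\frac{n+1}{4}\right) $th item
- $ Q2 = \left(\frac{n+1}{2}\right) $th item
- $ Q3 = \left(\frac{3(n+1)}{4}\right) $th item

When a position is not a whole number, the quartile is interpolated between the two neighbouring items. Other conventions exist and can give slightly different values.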
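As a quick check of the position formulas, the sketch below applies them to a hypothetical seven-item dataset (the values and the helper name `quartile_positions` are illustrative assumptions, not part of the original notebook). With $ n = 7 $ every position is a whole number, so each quartile is simply one of the sorted items.

```python
def quartile_positions(n):
    """Return the 1-based positions of Q1, Q2 and Q3 under the (n + 1) rule."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4

# Hypothetical sample, already sorted in ascending order
sample = [3, 5, 7, 8, 12, 13, 14]
positions = quartile_positions(len(sample))

# Positions are 1-based, so subtract 1 to index the Python list
quartiles = [sample[int(p) - 1] for p in positions]

print(positions)  # (2.0, 4.0, 6.0)
print(quartiles)  # [5, 8, 13]
```

This shortcut only works when the positions are whole numbers; fractional positions need interpolation between neighbouring items.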

## Interquartile Range (IQR)

The **Interquartile Range (IQR)** is a measure of statistical dispersion that describes the range within which the central 50% of the data lies. It is calculated as the difference between the third quartile and the first quartile:

$$
\text{IQR} = Q3 - Q1
$$

The IQR is particularly useful for identifying outliers and understanding the variability of the dataset. A larger IQR indicates a wider spread of the middle 50% of the data, while a smaller IQR suggests that the data points are closer to each other.
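A minimal sketch of the calculation, assuming NumPy 1.22 or later (needed for the `method` argument) and the ten-value sample used in the code cells of this notebook. The choice of `method='weibull'` is an assumption made here so that the result follows the $(n+1)$ position formulas; NumPy's default method would give a different IQR.

```python
import numpy as np

# Ten-value sample; any 1-D numeric sequence would work
sample = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# method='weibull' places each quantile at position q * (n + 1)
q1, q3 = np.percentile(sample, [25, 75], method='weibull')
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")  # Q1 = 27.5, Q3 = 82.5, IQR = 55.0
```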

### Importance of Quartiles and IQR

- **Data Distribution**: Quartiles provide a summary of the data distribution, allowing for comparisons between different datasets.

- **Outlier Detection**: The IQR can be used to detect outliers. A common rule is that any data point below $ Q1 - 1.5 \times \text{IQR} $ or above $ Q3 + 1.5 \times \text{IQR} $ is considered an outlier.

- **Box Plots**: Quartiles are often visualized using box plots, which graphically represent the median, quartiles, and potential outliers, making it easier to analyze the data's distribution.
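The outlier rule from the list above can be sketched as follows. The function name `iqr_outliers`, the parameter `k`, and the sample values are hypothetical, and the quartiles here come from NumPy's default (linear) method, so the fences may differ slightly from ones built on the $(n+1)$ formulas.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])  # NumPy's default linear method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return arr[(arr < lower) | (arr > upper)]

# Hypothetical sample with one obviously extreme value
sample = [10, 12, 13, 14, 15, 16, 18, 95]
print(iqr_outliers(sample))  # [95.]
```

For this sample, Q1 is 12.75 and Q3 is 16.5, so the fences sit at 7.125 and 22.125 and only 95 falls outside them.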

In summary, quartiles and the interquartile range are essential tools in statistics for understanding the distribution and variability of data and for identifying outliers.



In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd

# Sample data
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]



In [3]:

# Number of data points
n = len(data)
n

10

In [4]:

# Calculate the 1-based quartile positions using the (n + 1) rule
Q1_pos = (n + 1) / 4
Q2_pos = (n + 1) / 2
Q3_pos = 3 * (n + 1) / 4


In [5]:
Q1_pos, Q2_pos, Q3_pos

(2.75, 5.5, 8.25)

In [6]:

# Calculate quartiles manually
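def interpolate(position, data):
    """Linearly interpolate the value at a 1-based, possibly fractional position in sorted data."""
    position = min(max(position, 1), len(data))        # keep the position inside the data
    lower_index = int(position) - 1                    # 0-based index of the item at or below the position
    upper_index = min(lower_index + 1, len(data) - 1)  # next item, clamped to the last index
    fraction = position - int(position)                # fractional part of the position
    return data[lower_index] + fraction * (data[upper_index] - data[lower_index])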

# Sort data
sorted_data = sorted(data)

# Calculate quartiles
Q1 = interpolate(Q1_pos, sorted_data)
Q2 = interpolate(Q2_pos, sorted_data)  # Median
Q3 = interpolate(Q3_pos, sorted_data)


In [7]:

# Display calculated quartiles
print("Calculated Quartiles:")
print(f"Q1 (25th percentile): {Q1}")
# Cross-check Q1 with NumPy; method='weibull' uses the same (n + 1) positions (NumPy >= 1.22)
print("numpy Q1", np.percentile(data, 25, method='weibull'))
print(f"Q2 (Median, 50th percentile): {Q2}")
print(f"Q3 (75th percentile): {Q3}")


Calculated Quartiles:
Q1 (25th percentile): 27.5
numpy Q1 27.5
Q2 (Median, 50th percentile): 55.0
Q3 (75th percentile): 82.5


In [8]:

# Create a DataFrame for the data
df = pd.DataFrame(data, columns=['Values'])

# Use pandas describe to get descriptive statistics
desc = df.describe()

# Display descriptive statistics
print("\nDescriptive Statistics using describe():")
print(desc)


Descriptive Statistics using describe():
           Values
count   10.000000
mean    55.000000
std     30.276504
min     10.000000
25%     32.500000
50%     55.000000
75%     77.500000
max    100.000000
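
The 25% and 75% rows above (32.5 and 77.5) differ from the manually calculated 27.5 and 82.5 because `describe()` uses linear interpolation, which places a quantile at position $ 1 + q(n-1) $ rather than $ q(n+1) $. The sketch below, assuming NumPy 1.22 or later, reproduces both sets of values by switching the `method` argument of `np.percentile`; the variable names are illustrative.

```python
import numpy as np

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
probs = [25, 50, 75]

# NumPy's default method; matches the describe() output for this data
linear = np.percentile(data, probs, method='linear')
# The (n + 1) position rule used in the manual calculation
weibull = np.percentile(data, probs, method='weibull')

print("linear :", linear)   # [32.5 55.  77.5]
print("weibull:", weibull)  # [27.5 55.  82.5]
```

Both results are valid quartiles; they simply follow different conventions, and the median is the same under both.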


In [10]:
Q1, Q2, Q3

(27.5, 55.0, 82.5)