### **Descriptive and Graphical Data Analysis**

#### **Numerical Descriptive Statistics:**
**1 - Measures of Central Tendency:**

Measures of central tendency indicate the center of the data, the value around which most observations cluster.

In [9]:
# Importing Necessary Libraries
import numpy as np
import matplotlib.pyplot as plt

- ``Mean`` (Arithmetic Mean): The sum of values divided by their count.

In [8]:
# Calculating mean in Python
data = [10, 20, 30, 40, 50]
mean_value = np.mean(data)
print(mean_value)

30.0


- ``Weighted Mean``: A mean that gives different weights to each value.

In [12]:
# Calculating weighted mean in Python
values = [10, 20, 30]
weights = [0.2, 0.5, 0.3]
weighted_mean = np.average(values, weights=weights)
print(weighted_mean)

21.0
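
As a cross-check, the weighted mean can also be computed directly from its definition: the sum of each value times its weight, divided by the sum of the weights. The sketch below reuses the values and weights from the cell above; the variable names are illustrative only.

```python
import numpy as np

values = np.array([10, 20, 30])
weights = np.array([0.2, 0.5, 0.3])

# Weighted mean by hand: sum of (weight * value) divided by the sum of the weights
manual_weighted_mean = np.sum(weights * values) / np.sum(weights)

# Same quantity via NumPy's built-in helper
numpy_weighted_mean = np.average(values, weights=weights)

print(manual_weighted_mean, numpy_weighted_mean)
```

Dividing by the sum of the weights means the weights do not have to add up to 1.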


- ``Median``: The middle value when the data is sorted.

In [None]:
# Calculating median
data = [10, 20, 30, 40, 50]
median_value = np.median(data)
print(median_value)

30.0
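
Two details are easy to miss here. With an even number of values there is no single middle element, so `np.median` averages the two middle ones. The median also barely reacts to a single extreme value. A small sketch with made-up samples (both lists are illustrative assumptions):

```python
import numpy as np

# Hypothetical even-length sample: no single middle value exists
even_data = [10, 20, 30, 40]
even_median = np.median(even_data)  # average of the two middle values, 20 and 30

# Hypothetical sample with one extreme value
skewed_data = [10, 20, 30, 40, 500]
skewed_mean = np.mean(skewed_data)
skewed_median = np.median(skewed_data)

print(even_median, skewed_mean, skewed_median)
```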


- ``Mode``: The most frequently occurring value.

In [33]:
# Calculating mode (the keepdims argument requires SciPy 1.9 or later)
from scipy import stats

data = [10, 20, 20, 30, 40]
mode_result = stats.mode(data, keepdims=False)
mode_value = mode_result.mode
mode_count = mode_result.count
print(mode_result)

ModeResult(mode=20, count=2)
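
`stats.mode` reports a single mode. When several values tie for the highest frequency, the standard library's `statistics.multimode` (Python 3.8+) returns all of them. A minimal sketch on a made-up sample:

```python
from statistics import multimode

# Hypothetical sample in which 20 and 30 both occur twice
data = [10, 20, 20, 30, 30]

# multimode returns every most-frequent value, in order of first appearance
modes = multimode(data)
print(modes)
```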



**2 - Measures of Dispersion:**

**Why Study Dispersion?**

Measures of central tendency (mean, median) show only the center of data, not its spread.

**Example:** If you are told that the average depth of a river is 1 meter, this does not mean that the depth is uniform.

Dispersion allows for comparing variability across multiple datasets.
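
The river example can be made concrete with two made-up depth profiles that share the same mean. The numbers below are purely illustrative, not measurements.

```python
import numpy as np

# Hypothetical depth measurements in meters (illustrative values only)
uniform_river = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
uneven_river = np.array([0.2, 0.3, 0.5, 1.0, 3.0])

for name, depths in [("uniform", uniform_river), ("uneven", uneven_river)]:
    # Both profiles average about 1 m, but the deepest point differs a lot
    print(name, "mean:", np.mean(depths), "min:", np.min(depths), "max:", np.max(depths))
```

Both rivers have an average depth of about 1 meter, yet the second one reaches 3 meters at its deepest point.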

- ``Range``: Difference between the maximum and minimum values.

In [10]:
# Calculating range
data = [10, 20, 30, 40, 50]
data_range = np.max(data) - np.min(data)
print(data_range)

40
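
NumPy also provides the range in a single call through `np.ptp` ("peak to peak"). A short equivalent of the cell above:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# np.ptp returns max - min in one call
data_range = np.ptp(data)
print(data_range)
```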


- ``Mean Absolute Deviation`` (MAD): Average of the absolute deviations from the mean. It measures the typical distance between a value and the mean, and it is less sensitive to extreme values than the standard deviation.

In [12]:
# Calculating MAD (a NumPy array makes the element-wise subtraction explicit)
data = np.array([10, 20, 30, 40, 50])
mad_value = np.mean(np.abs(data - np.mean(data)))
print(mad_value)

12.0
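
Written out for this data set, with $\bar{x} = 30$ and $n = 5$, the calculation is:

$$
\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert
= \frac{\lvert 10-30 \rvert + \lvert 20-30 \rvert + \lvert 30-30 \rvert + \lvert 40-30 \rvert + \lvert 50-30 \rvert}{5}
= \frac{60}{5} = 12
$$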


- ``Variance`` (σ²): Average of squared deviations from the mean.

- ``Standard Deviation`` (σ): Square root of variance.

In [14]:
# Calculating variance and standard deviation
# (NumPy defaults to the population formulas, i.e. division by n)
data = [10, 20, 30, 40, 50]
variance_value = np.var(data)
std_dev_value = np.std(data)
print(variance_value)
print(std_dev_value)

200.0
14.142135623730951
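
`np.var` and `np.std` divide by the number of values `n` by default (population formulas). When the data is a sample from a larger population, the usual estimate divides by `n - 1` instead, which NumPy selects with `ddof=1`. A brief sketch on the same data:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Population formulas (divide by n); this is NumPy's default, ddof=0
population_variance = np.var(data)
population_std = np.std(data)

# Sample formulas (divide by n - 1), selected with ddof=1
sample_variance = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print(population_variance, population_std)
print(sample_variance, sample_std)
```

The sample versions are always slightly larger, and the gap shrinks as the number of values grows.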


Dispersion measures how far the values in a data set deviate from the mean.
- Use ``MAD`` if your data may contain outliers and you want them to have less influence.
- Use ``Standard Deviation`` if you want a measure that emphasizes outliers, since squaring amplifies large deviations.

In [16]:
# Normal data and data with an extreme value (arrays allow element-wise subtraction)
data_normal = np.array([10, 20, 30, 40, 50])
data_extreme = np.array([10, 20, 30, 40, 500])

mad_normal = np.mean(np.abs(data_normal - np.mean(data_normal)))
std_normal = np.std(data_normal)

mad_extreme = np.mean(np.abs(data_extreme - np.mean(data_extreme)))
std_extreme = np.std(data_extreme)

print("MAD (Normal):", mad_normal)
print("Standard Deviation (Normal):", std_normal)
print("MAD (Extreme):", mad_extreme)
print("Standard Deviation (Extreme):", std_extreme)

MAD (Normal): 12.0
Standard Deviation (Normal): 14.142135623730951
MAD (Extreme): 152.0
Standard Deviation (Extreme): 190.26297590440447
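
One way to read these numbers is to compare how much each measure grows once 50 is replaced by 500. The sketch below computes those growth factors; the helper `mean_abs_dev` is a name introduced here for illustration.

```python
import numpy as np

data_normal = np.array([10, 20, 30, 40, 50])
data_extreme = np.array([10, 20, 30, 40, 500])

def mean_abs_dev(x):
    # Mean absolute deviation around the mean
    return np.mean(np.abs(x - np.mean(x)))

# Factor by which each measure grows when the extreme value is introduced
mad_growth = mean_abs_dev(data_extreme) / mean_abs_dev(data_normal)
std_growth = np.std(data_extreme) / np.std(data_normal)

print(f"MAD grows by a factor of {mad_growth:.2f}")
print(f"Standard deviation grows by a factor of {std_growth:.2f}")
```

Both measures increase sharply, but the standard deviation grows by the larger factor because the extreme deviation is squared before averaging.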


**3 - Skewness and Kurtosis:**

``Skewness``: Measures the asymmetry of the distribution.
- Positive: the right tail is longer.
- Negative: the left tail is longer.

In [54]:
import numpy as np

data_normal = np.array([10, 20, 30, 40, 50])      
data_extreme = np.array([10, 20, 30, 40, 500])

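def interpret_skewness(data):
    # Moment-based (population) skewness: mean of the cubed z-scores
    skew = np.mean(((data - np.mean(data)) / np.std(data)) ** 3)

    # Automatic interpretation: treat |skew| < 0.1 as approximately symmetrical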
    if abs(skew) < 0.1:
        interpretation = "Symmetrical"
    elif skew > 0:
        interpretation = "Positive Skewness"
    else:
        interpretation = "Negative Skewness"
    
    return skew, interpretation

skew_normal, interpretation_normal = interpret_skewness(data_normal)
skew_extreme, interpretation_extreme = interpret_skewness(data_extreme)

print(f"Skewness (Normal): {skew_normal:.4f} ; {interpretation_normal}")
print(f"Skewness (Extreme): {skew_extreme:.4f} ; {interpretation_extreme}")

Skewness (Normal): 0.0000 ; Symmetrical
Skewness (Extreme): 1.4897 ; Positive Skewness
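
The manual formula can be cross-checked against `scipy.stats.skew`, which by default (`bias=True`) uses the same population moments. A minimal sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy import stats

data_extreme = np.array([10, 20, 30, 40, 500])

# Manual moment-based skewness, as in the cell above
manual_skew = np.mean(((data_extreme - np.mean(data_extreme)) / np.std(data_extreme)) ** 3)

# scipy.stats.skew with its default bias=True uses the same population formula
scipy_skew = stats.skew(data_extreme)

print(f"{manual_skew:.4f} {scipy_skew:.4f}")
```

Passing `bias=False` applies a small-sample correction, so that result would differ from the manual value.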


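``Kurtosis``: Measures how heavy the tails of the distribution are, often described loosely as its “peakedness”. The values below refer to excess kurtosis (kurtosis minus 3).

- Leptokurtic (excess kurtosis > 0): Sharp peak with heavy tails (frequent extreme values).
- Platykurtic (excess kurtosis < 0): Flat peak with thin tails (few extreme values).
- Mesokurtic (excess kurtosis ≈ 0): Normal distribution (moderate peak and tails).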


In [55]:
import numpy as np

# Sample data
data_normal = np.array([10, 20, 30, 40, 50])
data_extreme = np.array([10, 20, 30, 40, 500])

# Calculating Kurtosis (Excess Kurtosis)
kurt_normal = np.mean(((data_normal - np.mean(data_normal)) / np.std(data_normal)) ** 4) - 3
kurt_extreme = np.mean(((data_extreme - np.mean(data_extreme)) / np.std(data_extreme)) ** 4) - 3

# Displaying results
print(f"Kurtosis (Normal): {kurt_normal:.4f}")
print(f"Kurtosis (Extreme): {kurt_extreme:.4f}")

Kurtosis (Normal): -1.3000
Kurtosis (Extreme): 0.2362
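
These values can be reproduced with `scipy.stats.kurtosis`, whose defaults (`fisher=True`, `bias=True`) return the excess kurtosis from population moments. The sketch below also attaches the three labels from the list above; `classify_kurtosis` and its tolerance `tol` are illustrative assumptions, not a standard cutoff.

```python
import numpy as np
from scipy import stats

data_normal = np.array([10, 20, 30, 40, 50])
data_extreme = np.array([10, 20, 30, 40, 500])

def classify_kurtosis(excess, tol=0.1):
    # tol is an arbitrary illustrative threshold, not a standard cutoff
    if abs(excess) < tol:
        return "Mesokurtic"
    return "Leptokurtic" if excess > 0 else "Platykurtic"

# fisher=True (default) subtracts 3; bias=True (default) uses population moments
kurt_normal = stats.kurtosis(data_normal)
kurt_extreme = stats.kurtosis(data_extreme)

print(f"Kurtosis (Normal): {kurt_normal:.4f} ; {classify_kurtosis(kurt_normal)}")
print(f"Kurtosis (Extreme): {kurt_extreme:.4f} ; {classify_kurtosis(kurt_extreme)}")
```

With only five values these labels are very rough; kurtosis estimates are unstable on small samples.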
