## Hands-on - Probability Distributions and Variability

In [263]:
# Import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
from scipy.stats import binom, norm, ttest_1samp, mannwhitneyu, chi2_contingency

# Load dataset from GitHub URL
file_path = "https://raw.githubusercontent.com/Hamed-Ahmadinia/DASP-2025/main/epa-sea-level.csv"  # URL link to the dataset stored on GitHub

# Read the dataset into a pandas dataframe
df = pd.read_csv(file_path)  # Load the dataset as a pandas DataFrame

# Display the first few rows of the dataframe to confirm the data has been loaded correctly
print("Dataset Preview:")  # Print a label for context
print(df.head(5))  # Display the first 5 rows of the dataset

Dataset Preview:
   Year  CSIRO Adjusted Sea Level  Lower Error Bound  Upper Error Bound  \
0  1880                  0.000000          -0.952756           0.952756   
1  1881                  0.220472          -0.732283           1.173228   
2  1882                 -0.440945          -1.346457           0.464567   
3  1883                 -0.232283          -1.129921           0.665354   
4  1884                  0.590551          -0.283465           1.464567   

   NOAA Adjusted Sea Level  
0                      NaN  
1                      NaN  
2                      NaN  
3                      NaN  
4                      NaN  


## **Exercise 1: Understanding the Dataset**
🔹 **Question:** Display key statistics about the dataset using `.describe()`.

In [264]:
df.describe()

Statistic,Year,CSIRO Adjusted Sea Level,Lower Error Bound,Upper Error Bound,NOAA Adjusted Sea Level
count,134.0,134.0,134.0,134.0,21.0
mean,1946.5,3.650341,3.204666,4.096016,7.363746
std,38.826537,2.485692,2.663781,2.312581,0.691038
min,1880.0,-0.440945,-1.346457,0.464567,6.297493
25%,1913.25,1.632874,1.07874,2.240157,6.84869
50%,1946.5,3.312992,2.915354,3.71063,7.488353
75%,1979.75,5.587598,5.329724,5.845472,7.907365
max,2013.0,9.326772,8.992126,9.661417,8.546648


## **Exercise 2: Identifying Outliers**
🔹 **Question:** Use the **interquartile range (IQR)** method to detect outliers in the **CSIRO Adjusted Sea Level** column.


In [265]:
# Detect outliers with the 1.5 * IQR rule
sea_level = df['CSIRO Adjusted Sea Level']

# Quartiles and interquartile range
Q1 = sea_level.quantile(0.25)
Q3 = sea_level.quantile(0.75)
IQR = Q3 - Q1

# Values outside these fences count as outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# A boolean mask selects the outliers without an explicit loop
outliers = sea_level[(sea_level < lower_bound) | (sea_level > upper_bound)]

if outliers.empty:
    print(f'No outliers outside the range [{lower_bound}, {upper_bound}]')
else:
    print(outliers.tolist())

No outliers outside the range [-4.299212594249999, 11.51968502775]


## **Exercise 3: Exploring Distributions**
🔹 **Question:** Create a **boxplot** of sea level rise grouped by decade (e.g., 1880-1889, 1890-1899, etc.).

In [266]:
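# Floor each year to the start of its decade, e.g. 1887 -> 1880
df['decade'] = (df['Year'] // 10) * 10
# Summary statistics of the CSIRO series within each decade
sea_level_rise = df.groupby('decade')['CSIRO Adjusted Sea Level']
sea_level_rise.describe()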

decade,count,mean,std,min,25%,50%,75%,max
1880,10.0,0.198425,0.33218,-0.440945,0.054134,0.259843,0.418307,0.590551
1890,10.0,0.659449,0.323039,0.30315,0.447835,0.586614,0.747047,1.338583
1900,10.0,1.214173,0.166746,0.984252,1.114173,1.198819,1.269685,1.606299
1910,10.0,1.732677,0.257442,1.271654,1.560039,1.793307,1.854331,2.106299
1920,10.0,1.915748,0.105416,1.712598,1.864173,1.929134,1.997047,2.047244
1930,10.0,2.376378,0.244758,2.047244,2.239173,2.36811,2.501969,2.826772
1940,10.0,3.141732,0.294724,2.61811,2.992126,3.098425,3.343504,3.562992
1950,10.0,4.01378,0.252591,3.598425,3.884843,3.968504,4.229331,4.358268
1960,10.0,4.514173,0.169887,4.169291,4.459646,4.494094,4.593504,4.751968
1970,10.0,5.227559,0.28227,4.677165,5.062992,5.332677,5.399606,5.555118
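
The cell above tabulates per-decade statistics, but the exercise asks for a boxplot. A minimal sketch of one way to draw it follows, assuming decades are formed by floor division as in the cell above. The function name `plot_sea_level_by_decade` is hypothetical and not part of the original notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sea_level_by_decade(df, column="CSIRO Adjusted Sea Level"):
    """Draw one box per decade and return the Matplotlib Axes."""
    # Work on a copy so the caller's DataFrame is left unchanged
    data = df[["Year", column]].dropna().copy()
    # Floor each year to the start of its decade, e.g. 1887 -> 1880
    data["decade"] = ((data["Year"] // 10) * 10).astype(int)

    fig, ax = plt.subplots(figsize=(12, 5))
    # DataFrame.boxplot draws one box per distinct value of the `by` column
    data.boxplot(column=column, by="decade", ax=ax, grid=False)
    ax.set_xlabel("Decade (start year)")
    ax.set_ylabel(column)
    ax.set_title("Sea level by decade")
    fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
    return ax
```

Calling `plot_sea_level_by_decade(df)` followed by `plt.show()` would render the figure. A steady upward shift of the boxes from decade to decade would indicate a long-term rise.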


## **Exercise 4: Hypothesis Testing on Trends**
🔹 **Question:** Perform a **t-test** to check if the mean sea level in the 21st century (2000-2013, the last year in the dataset) is significantly higher than in the 20th century (1900-1999).  


In [267]:
# Your code here:
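
One possible approach is a one-sided Welch t-test, sketched below with `scipy.stats.ttest_ind`. Welch's variant is a choice made here because the two groups differ greatly in size; the exercise does not prescribe it. The function name `compare_centuries` and the `alpha` default are hypothetical. Yearly sea levels follow a trend and are not independent observations, so the p-value should be treated as indicative only.

```python
import pandas as pd
from scipy import stats


def compare_centuries(df, column="CSIRO Adjusted Sea Level", alpha=0.05):
    """One-sided Welch t-test: is the 21st-century mean above the 20th-century mean?"""
    values = df[["Year", column]].dropna()
    century_20 = values.loc[values["Year"].between(1900, 1999), column]
    century_21 = values.loc[values["Year"] >= 2000, column]

    # equal_var=False selects Welch's test (no equal-variance assumption).
    # alternative="greater" tests H1: mean(21st century) > mean(20th century).
    t_stat, p_value = stats.ttest_ind(
        century_21, century_20, equal_var=False, alternative="greater"
    )
    return {
        "mean_20th": float(century_20.mean()),
        "mean_21st": float(century_21.mean()),
        "t_statistic": float(t_stat),
        "p_value": float(p_value),
        "reject_h0": bool(p_value < alpha),
    }
```

Running `compare_centuries(df)` returns both group means along with the test statistic. The null hypothesis of "no higher mean" is rejected when `reject_h0` is `True`.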

## **Exercise 5: Probability Distributions**
🔹 **Question:** Fit a **Poisson distribution** to the sea level rise data.


In [268]:
# Your code here:
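
A Poisson distribution describes non-negative integer counts, whereas the sea level values are continuous and include negative readings. The sketch below therefore makes a loud assumption: values are rounded to the nearest integer and clipped at zero before fitting. The function name `fit_poisson` is hypothetical. The rate is estimated by the sample mean, which is the maximum-likelihood estimate for a Poisson.

```python
import numpy as np
import pandas as pd
from scipy import stats


def fit_poisson(values):
    """Fit a Poisson rate to values after rounding them to non-negative integers."""
    arr = pd.Series(values).dropna().to_numpy(dtype=float)
    # ASSUMPTION: rounding and clipping turn the continuous measurements
    # into counts; this is an illustrative choice, not part of the dataset.
    counts = np.clip(np.rint(arr), 0, None).astype(int)

    # The maximum-likelihood estimate of the Poisson rate is the sample mean
    lam = counts.mean()

    # Observed relative frequencies versus fitted Poisson probabilities
    support = np.arange(counts.max() + 1)
    observed = np.bincount(counts, minlength=support.size) / counts.size
    expected = stats.poisson.pmf(support, lam)

    return {
        "lambda": float(lam),
        "variance": float(counts.var(ddof=1)),
        "support": support,
        "observed": observed,
        "expected": expected,
    }
```

A Poisson variable has equal mean and variance, so comparing `lambda` with `variance` gives a quick plausibility check. The `.describe()` output in Exercise 1 reports a mean of about 3.65 and a standard deviation of about 2.49 (variance about 6.18) for the raw CSIRO values, which suggests a Poisson model may fit poorly.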

## **Exercise 6: Correlation Analysis**
🔹 **Question:** Check if there is a correlation between **CSIRO Adjusted Sea Level** and **NOAA Adjusted Sea Level**.  

In [269]:
# Your code here:
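
A minimal sketch using the Pearson correlation from `scipy.stats.pearsonr` follows. The NOAA column has only 21 non-missing values according to the `.describe()` output, so the sketch restricts the comparison to rows where both series are present. The function name `sea_level_correlation` is hypothetical.

```python
import pandas as pd
from scipy import stats


def sea_level_correlation(
    df, x="CSIRO Adjusted Sea Level", y="NOAA Adjusted Sea Level"
):
    """Pearson correlation between two columns over rows where both are present."""
    # NOAA values are missing for most years, so keep only complete pairs
    paired = df[[x, y]].dropna()
    r, p_value = stats.pearsonr(paired[x], paired[y])
    return {"n": len(paired), "r": float(r), "p_value": float(p_value)}
```

`sea_level_correlation(df)` reports the number of paired years together with `r` and its p-value. Both series trend upward over time, so a high `r` on its own does not show that the two measurements agree year by year.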

## **Exercise 7: Summary Reflection**
🔹 **Question:** Summarize your findings from this analysis.  

In [270]:
# Your code here: