# **Distribution Analysis with the `Dist` Class**
**Tolga Barış Terzi – 2025**

This notebook demonstrates fitting a **continuous probability distribution** to univariate data using the `Dist` class from the `pydrght` package.  
The class allows you to:

- Fit SciPy continuous distributions.
- Compute PDF, CDF, inverse CDF (PPF).
- Perform goodness-of-fit tests (KS, AD).
- Compute information criteria (AIC, BIC).
- Calculate return levels for specified return periods.

---

## **Required Packages**


In [1]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import pydrght

---
## **Load the Data**

The example dataset contains monthly values of:

- **Streamflow**  
- **Precipitation**  

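The data are from the **Seyhan River Basin, Turkey**, covering the period **October 1964 – September 2011**, which corresponds to hydrological years **1965–2011**. As the preview shows, the file also contains temperature columns (`MINT`, `MAXT`, `MEANT`) and a `PET` column; only `PRECIPITATION` is used in this notebook.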

In [2]:
df = pd.read_csv("data.csv", index_col=0, parse_dates=True)
df.head()

| DATE | STREAMFLOW | PRECIPITATION | MINT | MAXT | MEANT | PET |
|---|---|---|---|---|---|---|
| 1964-10-01 | 10.5 | 46.67 | 2.0 | 22.7 | 12.0 | 95.471 |
| 1964-11-01 | 11.0 | 99.7 | -1.7 | 12.3 | 5.0 | 43.318 |
| 1964-12-01 | 12.5 | 64.7 | -4.1 | 4.1 | -0.4 | 22.32 |
| 1965-01-01 | 12.3 | 41.0 | -5.0 | 3.0 | -1.4 | 22.607 |
| 1965-02-01 | 15.8 | 104.5 | -6.9 | 3.4 | -2.1 | 29.121 |


---
## **Initialize the Distribution Object**
Here we create a `Dist` object using a Gamma distribution and fix the location parameter at 0 (`floc0=True`).  
Fixing the location places the lower bound of the fitted distribution at zero, a common choice for non-negative variables such as precipitation.

In [3]:
# Select the precipitation column
precip = df["PRECIPITATION"]
# Initialize Dist class
Dist = pydrght.Dist(precip, stats.gamma, floc0=True)

---
## **View Fitted Parameters**
Once the distribution is fitted, we can examine the estimated parameters:

- **Shape**: Determines the form of the distribution.
- **Loc**: Location parameter (fixed at 0 here).
- **Scale**: Scale factor for the distribution.

In [4]:
print("Fitted parameters:")
print("Shape:", Dist.shape)
print("Loc:", Dist.loc)
print("Scale:", Dist.scale)

Fitted parameters:
Shape: [0.9141604377597456]
Loc: 0
Scale: 56.84749648197027
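
For readers who want to see what a fit with a fixed location looks like in plain SciPy, here is a minimal sketch. It assumes that `Dist` relies on SciPy's maximum-likelihood `fit` with `floc=0`, which the source does not confirm. The sample is synthetic (drawn from a Gamma distribution with parameters close to those printed above), not the Seyhan data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the precipitation series (NOT the Seyhan data):
# 564 draws from a Gamma distribution with parameters close to the printed fit.
rng = np.random.default_rng(0)
sample = stats.gamma.rvs(a=0.914, loc=0, scale=56.85, size=564, random_state=rng)

# Maximum-likelihood fit with the location parameter held fixed at 0.
shape, loc, scale = stats.gamma.fit(sample, floc=0)

print("Shape:", shape)
print("Loc:", loc)
print("Scale:", scale)

# For a Gamma distribution with loc = 0, the mean equals shape * scale.
print("Implied mean:", shape * scale)
print("Sample mean:", sample.mean())
```

With the location fixed at 0, the product of the fitted shape and scale gives the mean of the fitted distribution. For the parameters printed above, that is roughly 52 in the units of the precipitation column.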


---
## **Goodness-of-Fit Test**

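We perform the Kolmogorov-Smirnov (KS) test to assess how well the fitted Gamma distribution matches the observed data.
The null hypothesis is that the data follow the fitted distribution, so a small p-value (for example, below 0.05) is evidence against the fit, while a large p-value only means the test found no significant disagreement. For the output below, the p-value of about 0.006 rejects the fitted Gamma distribution at the 5% significance level.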

In [5]:
ks_stat, ks_p = Dist.ks_test()
print("KS test statistic:", ks_stat)
print("KS test p-value:", ks_p)

KS test statistic: 0.07156335123835356
KS test p-value: 0.0058824587882349635
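
The same kind of test can be sketched directly with `scipy.stats.kstest`. This is an illustration on synthetic data, under the assumption that `ks_test()` compares the sample with the fitted CDF in a similar way. The variable names and the 0.05 threshold are choices made for this example.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the precipitation series (NOT the Seyhan data).
rng = np.random.default_rng(1)
sample = stats.gamma.rvs(a=0.914, loc=0, scale=56.85, size=564, random_state=rng)

# Fit a Gamma distribution with the location fixed at 0.
shape, loc, scale = stats.gamma.fit(sample, floc=0)

# One-sample KS test of the sample against the fitted Gamma CDF.
result = stats.kstest(sample, stats.gamma(shape, loc=loc, scale=scale).cdf)
ks_stat, ks_p = result.statistic, result.pvalue

# Compare the p-value with a chosen significance level.
alpha = 0.05
decision = "reject" if ks_p < alpha else "do not reject"
print(f"KS statistic = {ks_stat:.4f}, p-value = {ks_p:.4f} -> {decision} at alpha = {alpha}")
```

One caveat applies: the standard KS p-value assumes the reference distribution was specified in advance. When the parameters are estimated from the same sample, the p-value tends to be too large, so the test is conservative.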


---
## **Information Criteria**

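We compute AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) for the fitted distribution.
Both criteria balance goodness of fit against the number of parameters. The values are meaningful only in comparison: among candidate distributions fitted to the same data, the one with the lowest AIC or BIC is preferred.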

In [6]:
print("AIC:", Dist.aic())
print("BIC:", Dist.bic())

AIC: 5587.250543656458
BIC: 5600.255706410952
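
The standard definitions are AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n the number of observations, and ln L the maximized log-likelihood. The sketch below applies them to synthetic data. The helper functions are hypothetical and not part of `pydrght`. The last lines check an inference rather than a documented fact: the gap between the printed BIC and AIC matches k = 3 and n = 564 (47 years of monthly data, assuming no missing months).

```python
import math

import numpy as np
from scipy import stats


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Synthetic stand-in for the precipitation series (NOT the Seyhan data).
rng = np.random.default_rng(2)
sample = stats.gamma.rvs(a=0.914, loc=0, scale=56.85, size=564, random_state=rng)

shape, loc, scale = stats.gamma.fit(sample, floc=0)
log_l = float(np.sum(stats.gamma.logpdf(sample, shape, loc=loc, scale=scale)))

k = 3  # assumption: shape, loc and scale are all counted as parameters
n = sample.size
aic_value = aic(log_l, k)
bic_value = bic(log_l, k, n)
print("AIC:", aic_value)
print("BIC:", bic_value)

# BIC - AIC depends only on k and n: k * (ln(n) - 2).
printed_gap = 5600.255706410952 - 5587.250543656458  # values printed above
expected_gap = 3 * (math.log(564) - 2)
print("Printed BIC - AIC:", printed_gap)
print("k * (ln n - 2) for k = 3, n = 564:", expected_gap)
```

If the location were not counted because it is fixed, k would be 2 and the gap would be smaller. When comparing distributions, use the same counting convention for every candidate.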


---
## **Cumulative Distribution Function (CDF)**
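The CDF gives the probability that a random variable is less than or equal to a given value.
This is useful for estimating the probability that precipitation does not exceed a certain threshold. Called without arguments, as below, `cdf()` returns one value per observation; the first value corresponds to the first precipitation entry (46.67).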

In [7]:
cdf_values = Dist.cdf()
print("CDF values:")
display(cdf_values.head())

CDF values:


0    0.600475
1    0.849206
2    0.713896
3    0.555842
4    0.861812
dtype: float64
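
To evaluate the CDF at an arbitrary threshold, the parameters printed earlier can be plugged into `scipy.stats.gamma` directly. This is a sketch outside the `Dist` API. The 100-unit threshold is a hypothetical value chosen for illustration, expressed in the units of the precipitation column.

```python
from scipy import stats

# Fitted parameters as printed earlier in this notebook.
shape, loc, scale = 0.9141604377597456, 0.0, 56.84749648197027
fitted = stats.gamma(shape, loc=loc, scale=scale)

# CDF at the first observed precipitation value (46.67 in the data preview).
p_first = fitted.cdf(46.67)
print("P(X <= 46.67):", p_first)

# Probability that monthly precipitation does not exceed a chosen threshold.
threshold = 100.0  # hypothetical threshold, in the units of the data
p_below = fitted.cdf(threshold)
print("P(X <= 100):", p_below)

# The PPF (inverse CDF) maps that probability back to the threshold.
recovered = fitted.ppf(p_below)
print("PPF of that probability:", recovered)
```

The first probability should agree with the first entry of the `cdf()` output above, which is a quick way to confirm that the printed parameters describe the same fitted distribution.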

---
## **Return Period Analysis**

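A return level is the value expected to be exceeded, on average, once per return period. For example, the 200-year return level of precipitation is the amount expected to be exceeded once every 200 years on average.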

In [8]:
return_level = Dist.return_period(T=200, interarrival=5)
print("Return level for 200 years:", return_level)

Return level for 200 years: 199.28898784924868
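
One common way to define a return level is as the quantile whose exceedance probability equals the mean interarrival time divided by the return period, that is, F(x) = 1 − interarrival / T. The sketch below applies that definition to the parameters printed earlier. It is an assumption about how `return_period` works, not something the source states, and `return_level` is a hypothetical helper that is not part of `pydrght`.

```python
from scipy import stats

# Fitted parameters as printed earlier in this notebook.
shape, loc, scale = 0.9141604377597456, 0.0, 56.84749648197027
fitted = stats.gamma(shape, loc=loc, scale=scale)


def return_level(dist, T, interarrival=1.0):
    """Quantile whose exceedance probability is interarrival / T.

    Hypothetical helper for illustration; T and interarrival must share
    the same time unit.
    """
    if not 0 < interarrival < T:
        raise ValueError("interarrival must be positive and smaller than T")
    return float(dist.ppf(1.0 - interarrival / T))


# Non-exceedance probability: 1 - 5/200 = 0.975.
level = return_level(fitted, T=200, interarrival=5)
print("Return level (T=200, interarrival=5):", level)
```

Under this assumption the quantile lands close to the value printed above, which suggests, without confirming, that the library uses the same relation.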


---
## **References**

- Kolmogorov, A. (1933). *Sulla determinazione empirica di una legge di distribuzione.* Giornale dell’Istituto Italiano degli Attuari, 4, 83–91.
- Smirnov, N. (1948). *Table for estimating the goodness of fit of empirical distributions.* Annals of Mathematical Statistics, 19, 279–281.
- Anderson, T. W., & Darling, D. A. (1952). *Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes.* Annals of Mathematical Statistics, 23, 193–212.
- Akaike, H. (1974). *A new look at the statistical model identification.* IEEE Transactions on Automatic Control, 19(6), 716–723.
- Schwarz, G. (1978). *Estimating the dimension of a model.* Annals of Statistics, 6(2), 461–464.
- Katz, R. W., Parlange, M. B., & Naveau, P. (2002). *Statistics of extremes in hydrology.* Advances in Water Resources, 25(8-12), 1287–1304.
- Coles, S. (2001). *An Introduction to Statistical Modeling of Extreme Values.* Springer Series in Statistics.
