# Using the dcnorm module

* This notebook introduces the basic functionality of the `dcnorm` module.

* Make sure the `NCGR` package (https://github.com/adirkson/sea-ice-timing) is installed before running this notebook.

As a first step, we'll import the modules used in this notebook.

In [1]:
from NCGR.dcnorm import dcnorm_gen
import numpy as np
import matplotlib.pyplot as plt

Before instantiating `dcnorm_gen`, we need to create variables for the **minimum** and **maximum** values that the DCNORM distribution takes (i.e. its support):

In [None]:
a=120
b=273

When instantiating `dcnorm_gen`, these bounds are the only arguments we need to pass:

In [None]:
dcnorm = dcnorm_gen(a=a, b=b)

We can now create a DCNORM distribution object with `a` and `b` fixed and some arbitrary values for the parameters $\mu$ and $\sigma$ (note that $\mu$ must satisfy $a\leq \mu \leq b$).

In [None]:
# mu variable for the DCNORM distribution
m=132.
# sigma variable for the DCNORM distribution
s=20.

# instantiate a dcnorm distribution object
rv = dcnorm(m,s)
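For reference, here is the density we assume a DCNORM random variable with parameters $\mu$ and $\sigma$ has. This is our reading of the name (a normal distribution $N(\mu, \sigma^2)$ doubly censored at $a$ and $b$), not a formula copied from the `dcnorm` source: all probability below $a$ is collected as a point mass at $a$, all probability above $b$ as a point mass at $b$, and the normal density is kept unchanged in between. Here $\phi$ and $\Phi$ are the standard normal PDF and CDF, and $\delta$ is the Dirac delta.

$$
f(x) = \Phi\!\left(\frac{a-\mu}{\sigma}\right)\delta(x-a) + \frac{1}{\sigma}\,\phi\!\left(\frac{x-\mu}{\sigma}\right)\mathbb{1}_{(a,b)}(x) + \left[1-\Phi\!\left(\frac{b-\mu}{\sigma}\right)\right]\delta(x-b)
$$

Under this assumption, `rv.pdf(a)` and `rv.pdf(b)` return the probabilities attached to the two endpoints rather than density values.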

`rv` is now a frozen object representing a random variable described by the DCNORM distribution with the given parameters (try some others, too!). It has several methods (see the documentation for `dcnorm_gen` and `scipy.stats.rv_continuous`); we'll go through the main ones now.

For instance, its PDF can be plotted simply with:

In [None]:
x = np.linspace(a, b, 1000) # discretize the range from a to b
x_sub = x[(x!=a)&(x!=b)] # keep only the interior points, a < x < b
plt.figure()
plt.plot(x_sub, rv.pdf(x_sub), color='r') # plot for a<x<b
plt.plot(a, rv.pdf(a)*1e-1, 'o', color='r') # point mass at a (re-scale by 1/10 for plotting)
plt.plot(b, rv.pdf(b)*1e-1, 'o', color='r') # point mass at b (re-scale by 1/10 for plotting)
plt.show()

Note that the circles (the point masses at `a` and `b`) have been scaled down by a factor of 10 to make the shape of the PDF easier to see. Plotting the components of the PDF separately is not actually necessary; it was done purely for cosmetic reasons. We could equally have typed
```python 
plt.plot(x,rv.pdf(x))
```
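To see why the endpoint circles live on a different scale from the curve, the sketch below evaluates a hypothetical `dcnorm_pdf_sketch` function (not part of `NCGR`), again assuming DCNORM is a normal distribution censored at `a` and `b`. The values at `a` and `b` are probabilities, while the interior values are densities; only the two point masses plus the integral of the interior density add up to 1.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_density(x, mu, sigma):
    """Ordinary N(mu, sigma^2) density, evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * math.sqrt(2.0 * math.pi))

def dcnorm_pdf_sketch(x, mu, sigma, a, b):
    """Hypothetical DCNORM 'pdf', assuming a normal censored at a and b.

    Point masses (probabilities) at x == a and x == b, the normal density
    for a < x < b, and zero outside [a, b].
    """
    x = np.asarray(x, dtype=float)
    out = np.where((x > a) & (x < b), normal_density(x, mu, sigma), 0.0)
    out = np.where(x == a, std_normal_cdf((a - mu) / sigma), out)
    out = np.where(x == b, 1.0 - std_normal_cdf((b - mu) / sigma), out)
    return out

a, b, mu, sigma = 120.0, 273.0, 132.0, 20.0
mass_a = float(dcnorm_pdf_sketch(a, mu, sigma, a, b))
mass_b = float(dcnorm_pdf_sketch(b, mu, sigma, a, b))

# Trapezoid rule for the continuous part on [a, b]
x = np.linspace(a, b, 100001)
y = normal_density(x, mu, sigma)
cont_mass = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

total = mass_a + cont_mass + mass_b
print("mass at a:", mass_a, "mass at b:", mass_b, "total:", total)
```

With the notebook's parameters and under this assumption, roughly a quarter of the probability sits at `a` (since $\Phi(-0.6) \approx 0.274$), whereas the interior density never exceeds about 0.02, which is why the circle at `a` would dwarf the curve if it were not rescaled.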

A random sample of size 20 can be drawn from the distribution using the `rv.rvs` method:

In [None]:
X = rv.rvs(size=20)
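Under the same censored-normal assumption, one simple way to generate such a sample is to draw from $N(\mu, \sigma^2)$ and clip the draws to $[a, b]$. The hypothetical `dcnorm_rvs_sketch` below does exactly that with NumPy's `Generator.normal` and `np.clip`; we don't know whether `rv.rvs` is implemented this way, only that clipping yields draws from a doubly censored normal.

```python
import numpy as np

def dcnorm_rvs_sketch(mu, sigma, a, b, size, seed=None):
    """Hypothetical sampler: draw from N(mu, sigma^2) and censor to [a, b].

    Draws below a are set to a and draws above b are set to b, which
    produces the point masses at the two endpoints.
    """
    rng = np.random.default_rng(seed)
    return np.clip(rng.normal(mu, sigma, size=size), a, b)

X_demo = dcnorm_rvs_sketch(132.0, 20.0, 120.0, 273.0, size=20, seed=0)
print(X_demo)
print("fraction of draws exactly at a:", np.mean(X_demo == 120.0))
```

Because many draws land exactly on `a`, a histogram of such a sample shows a spike in its leftmost bin.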

We'll now plot this sample alongside the true PDF:

In [None]:
plt.figure()
plt.hist(X, density=True, color='b', alpha=0.5)
plt.plot(x_sub, rv.pdf(x_sub), color='r') # plot for a<x<b
plt.plot(a, rv.pdf(a)*1e-1, 'o', color='r') # point mass at a (re-scale by 1/10 for plotting)
plt.plot(b, rv.pdf(b)*1e-1, 'o', color='r') # point mass at b (re-scale by 1/10 for plotting)
plt.title('PDF')
plt.show()

Next, we'll use the `dcnorm.fit` method to fit the sample of data to a DCNORM distribution:

In [None]:
m_fit, s_fit = dcnorm.fit(X) # fit parameters to data
rv_fit = dcnorm(m_fit, s_fit) # create new distribution object with fitted parameters
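The notebook doesn't say how `dcnorm.fit` finds its estimates. As a sketch of the idea, assuming maximum likelihood for a censored normal, the log-likelihood combines point-mass probabilities for observations sitting at `a` or `b` with normal densities for the rest. The hypothetical `fit_grid_sketch` below simply maximizes it over a coarse grid of $(\mu, \sigma)$ values; a practical implementation would more likely use a numerical optimizer.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dcnorm_loglik_sketch(X, mu, sigma, a, b):
    """Log-likelihood of X, assuming DCNORM is N(mu, sigma^2) censored at a and b."""
    X = np.asarray(X, dtype=float)
    n_a = int(np.sum(X <= a))     # observations piled up at the lower bound
    n_b = int(np.sum(X >= b))     # observations piled up at the upper bound
    inner = X[(X > a) & (X < b)]  # uncensored observations
    z = (inner - mu) / sigma
    ll = float(np.sum(-0.5 * z**2)) - inner.size * math.log(sigma * math.sqrt(2.0 * math.pi))
    p_a = std_normal_cdf((a - mu) / sigma)
    p_b = 1.0 - std_normal_cdf((b - mu) / sigma)
    for count, p in ((n_a, p_a), (n_b, p_b)):
        if count > 0:
            # A zero-probability endpoint makes these observations impossible
            ll += count * math.log(p) if p > 0.0 else -math.inf
    return ll

def fit_grid_sketch(X, a, b, mus, sigmas):
    """Hypothetical fit: return the (mu, sigma) grid point with the largest log-likelihood."""
    best_mu, best_sigma, best_ll = None, None, -math.inf
    for mu in mus:
        for sigma in sigmas:
            ll = dcnorm_loglik_sketch(X, mu, sigma, a, b)
            if ll > best_ll:
                best_mu, best_sigma, best_ll = float(mu), float(sigma), ll
    return best_mu, best_sigma

a, b = 120.0, 273.0
rng = np.random.default_rng(0)
X_demo = np.clip(rng.normal(132.0, 20.0, size=1000), a, b)  # synthetic censored sample
m_hat, s_hat = fit_grid_sketch(X_demo, a, b,
                               mus=np.linspace(a, b, 154),         # step of 1 in mu
                               sigmas=np.linspace(1.0, 60.0, 60))  # step of 1 in sigma
print("fitted mu, sigma:", m_hat, s_hat)
```

The grid keeps $\mu$ inside $[a, b]$, matching the constraint on $\mu$ noted earlier.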

Now we recreate the previous plot, this time also including the fitted distribution:

In [None]:
plt.figure()
plt.hist(X, density=True, color='b', alpha=0.5, label='data')
plt.plot(x_sub, rv.pdf(x_sub), color='r', label='true dist.') # plot for a<x<b
plt.plot(a, rv.pdf(a)*1e-1, 'o', color='r') # point mass at a (re-scale by 1/10 for plotting)
plt.plot(b, rv.pdf(b)*1e-1, 'o', color='r') # point mass at b (re-scale by 1/10 for plotting)

plt.plot(x_sub, rv_fit.pdf(x_sub), color='b', label='fitted dist.') # plot for a<x<b
plt.plot(a, rv_fit.pdf(a)*1e-1, 'o', color='b') # point mass at a (re-scale by 1/10 for plotting)
plt.plot(b, rv_fit.pdf(b)*1e-1, 'o', color='b') # point mass at b (re-scale by 1/10 for plotting)

plt.legend()
plt.title('PDF')
plt.show()

We can make an analogous plot for the CDFs as well, using the `dcnorm.ecdf` method to compute the CDF of the data (i.e. the empirical CDF).

In [None]:
plt.figure()
plt.plot(x, dcnorm.ecdf(x,X), color='b', alpha=0.5, label='data')
plt.plot(x, rv.cdf(x), color='r', label='true dist.') # CDF over the full range [a, b]
plt.plot(x, rv_fit.cdf(x), color='b', label='fitted dist.') # CDF over the full range [a, b]
plt.legend()
plt.title('CDF')
plt.show()
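The empirical CDF at a value $x$ is simply the fraction of observations less than or equal to $x$. The hypothetical `ecdf_sketch` below computes it with `np.searchsorted` on the sorted sample (we assume, but haven't confirmed, that `dcnorm.ecdf` follows the same definition), and the hypothetical `dcnorm_cdf_sketch` gives the CDF we'd expect for a normal censored at `a` and `b`: zero below `a`, a jump of $\Phi((a-\mu)/\sigma)$ at `a`, and exactly 1 from `b` onward.

```python
import math
import numpy as np

def ecdf_sketch(x, sample):
    """Empirical CDF: fraction of `sample` that is <= each value in x."""
    s = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts observations tied with x as being <= x
    return np.searchsorted(s, np.asarray(x, dtype=float), side="right") / s.size

# Elementwise standard normal CDF built from math.erf
_std_normal_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def dcnorm_cdf_sketch(x, mu, sigma, a, b):
    """CDF assuming a normal censored at a and b."""
    x = np.asarray(x, dtype=float)
    inner = _std_normal_cdf((x - mu) / sigma)
    return np.where(x < a, 0.0, np.where(x >= b, 1.0, inner))

sample = np.array([120.0, 120.0, 130.0, 150.0])
print(ecdf_sketch([119.0, 120.0, 140.0, 150.0], sample))  # fractions 0, 0.5, 0.75, 1
print(dcnorm_cdf_sketch([100.0, 120.0, 200.0, 273.0], 132.0, 20.0, 120.0, 273.0))
```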

Finally, we'll compute the statistical moments of the distribution (mean, variance, skewness, kurtosis); note that the mean and variance are calculated using closed-form expressions.

In [2]:
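# use new names so the parameters m and s defined earlier are not overwritten
mean, var, skew, kurt = rv.stats('mvsk')
print("mean, variance, skewness, kurtosis:", (mean, var, skew, kurt))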
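For a normal censored at $a$ and $b$, the first two moments do have closed forms. Writing $\alpha = (a-\mu)/\sigma$ and $\beta = (b-\mu)/\sigma$, the mean is $a\Phi(\alpha) + b[1-\Phi(\beta)] + \mu[\Phi(\beta)-\Phi(\alpha)] + \sigma[\phi(\alpha)-\phi(\beta)]$. The hypothetical `dcnorm_mean_var_sketch` below evaluates this together with the analogous second moment (giving the variance as $E[X^2] - E[X]^2$) and compares both with a large clipped-normal sample. We assume, without having checked the `dcnorm` source, that its closed-form mean and variance are equivalent to these.

```python
import math
import numpy as np

def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def _phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def dcnorm_mean_var_sketch(mu, sigma, a, b):
    """Closed-form mean and variance, assuming N(mu, sigma^2) censored at a and b."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    Pa, Pb = _Phi(alpha), _Phi(beta)
    pa, pb = _phi(alpha), _phi(beta)
    dP = Pb - Pa  # probability of landing strictly inside (a, b)
    mean = a * Pa + b * (1.0 - Pb) + mu * dP + sigma * (pa - pb)
    second = (a**2 * Pa + b**2 * (1.0 - Pb) + mu**2 * dP
              + 2.0 * mu * sigma * (pa - pb)
              + sigma**2 * (dP + alpha * pa - beta * pb))
    return mean, second - mean**2

mean_cf, var_cf = dcnorm_mean_var_sketch(132.0, 20.0, 120.0, 273.0)

# Monte Carlo check with a clipped (censored) normal sample
rng = np.random.default_rng(0)
draws = np.clip(rng.normal(132.0, 20.0, size=1_000_000), 120.0, 273.0)
print("mean:", mean_cf, "vs sample", draws.mean())
print("variance:", var_cf, "vs sample", draws.var())
```

Censoring at `a` pushes the mean above $\mu$ here, because the mass below $a$ is moved up to $a$.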