# Confidence Intervals Code Appendix

## Student's t Distribution

### Coding Key Words:

- rvs: random variates
- pdf: probability density function
- cdf: cumulative distribution function
- ppf: percent point function or percentile (inverse of cdf)

### Required parameters:

- df: degrees of freedom, n - 1
- size: size of the simulated dataset
- x: location for probability calculation
- q: lower tail probability for percentile calculation

``` Python
# Import dependency
from scipy import stats

# Create the parameter values
df = 34
x = 1
q = 0.75

# Draw 1000 random values from a t distribution
t = stats.t.rvs(df=df, size=1000)

# Calculate the PDF at x
tx = stats.t.pdf(x=x, df=df)

# Calculate the probability of a value less than or equal to x
txx = stats.t.cdf(x=x, df=df)

# Calculate the 75th percentile
tpct = stats.t.ppf(q=q, df=df)
```
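
Because `ppf` is the inverse of `cdf`, passing a percentile back through `cdf` should return the original lower-tail probability. A minimal sketch of that round trip, reusing the parameter values from the block above (the names `percentile` and `recovered_q` are illustrative, not from the original):

```python
from scipy import stats

df = 34
q = 0.75

# ppf maps a lower-tail probability to a point on the t axis
percentile = stats.t.ppf(q=q, df=df)

# cdf maps that point back to the lower-tail probability
recovered_q = stats.t.cdf(x=percentile, df=df)

print(f"75th percentile: {percentile:.4f}")
print(f"cdf at that percentile: {recovered_q:.4f}")
```

The second printed value should agree with `q` up to floating-point error.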

In [1]:
# Import the stats module from scipy
from scipy import stats

In [2]:
# Set the degrees of freedom
df = 20

# Draw 1000 random values from the t distribution
t = stats.t.rvs(df=df, size=1000)

print(f"Student's t Random Variable: \n{t}\n")

Student's t Random Variable: 
[-3.21545394e-01 -1.96574763e+00 -1.77821278e-01  5.30471136e-01
 -1.27590862e+00 -2.19775559e-02  4.98928007e-01  4.36126331e-01
 -8.54266366e-01 -5.13850610e-01  3.39344479e-01 -5.07999319e-01
  5.20395991e-01 -1.85464003e-01  2.09419801e+00  1.14780385e+00
 -7.78464503e-01 -3.13710978e-01 -1.41115268e+00 -3.03940709e-02
 -5.95616321e-01  2.86825162e-01  3.08511636e-01  1.34564598e+00
  1.10119545e+00  1.45864858e+00  7.65060187e-01  1.30014713e+00
  1.06240968e+00 -1.01191091e+00 -2.58299854e-01 -2.08291307e+00
 -1.12842020e+00 -7.10747173e-01  1.51115464e+00  2.45948719e-01
 -9.05075756e-01  3.11197537e-01 -4.28515605e-01  1.24667568e-01
 -8.64692220e-01  9.83038673e-01  6.14177629e-01 -7.75098039e-01
  6.28337858e-01  1.27545485e-01  5.37323644e-01  5.48685049e-02
  1.74953997e+00  5.47309071e-01 -1.08792324e+00  6.46682064e-01
 -9.12561022e-01  3.97324246e-01 -9.11292694e-02  5.45963481e-01
  1.40691595e+00  5.28348178e-01  8.74399180e-01  5.22190933

In [3]:
# Calculate the pdf at x = 1
stats.t.pdf(x=1, df=df)

0.23604564912670092

In [4]:
# Calculate the cdf at x = 1
stats.t.cdf(x=1, df=df)

0.8353717114141455

In [5]:
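# Calculate the 50th percentile (the median) of the t distribution
# The t distribution is symmetric about 0, so the result is 0 up to floating-point error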
stats.t.ppf(q=0.5, df=df)

6.72145054561635e-17

## Confidence Intervals for Student's t Distribution

``` python
# Import stats module
from scipy import stats

# Calculate the confidence interval
stats.t.interval(
    confidence=0.9,        # Confidence level
    df=df,                 # Degrees of freedom
    loc=sample_mean,       # Sample mean
    scale=standard_error   # Estimated standard error of the mean
)
```
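
The call above assumes `df`, `sample_mean`, and `standard_error` already exist. One way to obtain them from raw data is sketched below. The `sample` values are made up for illustration, and the confidence level is passed positionally so the call does not depend on whether the installed SciPy names that argument `confidence` or `alpha`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, for illustration only
sample = np.array([14.2, 15.1, 16.3, 13.8, 15.9, 14.7, 15.4, 16.0])

n = len(sample)
df = n - 1                          # degrees of freedom
sample_mean = sample.mean()         # point estimate of the population mean
standard_error = stats.sem(sample)  # s / sqrt(n), where s uses ddof=1

# 90% confidence interval for the mean
lower, upper = stats.t.interval(
    0.9,
    df=df,
    loc=sample_mean,
    scale=standard_error,
)

print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```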

### Confidence Intervals for the Sampling Distribution of the Mean

When the population standard deviation is known, use the standard normal (z) distribution instead of the t distribution.

``` Python
# Import stats module
from scipy import stats

# Calculate the confidence interval 
stats.norm.interval(
    confidence=0.9,       # Confidence level
    loc=sample_mean,      # Sample mean
    scale=standard_error  # Standard error of the sample mean
)
```
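
With a known population standard deviation, the standard error is the population value divided by the square root of the sample size, rather than an estimate from the sample. A minimal sketch, assuming hypothetical values for `sigma`, `n`, and `sample_mean`; the t interval is computed alongside only to show how the two compare at the same standard error:

```python
import math
from scipy import stats

sigma = 6.0         # assumed known population standard deviation
n = 25              # assumed sample size
sample_mean = 15.0  # assumed sample mean

# Standard error of the mean when sigma is known
standard_error = sigma / math.sqrt(n)

# 90% interval from the normal distribution
z_lower, z_upper = stats.norm.interval(0.9, loc=sample_mean, scale=standard_error)

# 90% interval from the t distribution with n - 1 degrees of freedom, for comparison
t_lower, t_upper = stats.t.interval(0.9, df=n - 1, loc=sample_mean, scale=standard_error)

print(f"z interval: ({z_lower:.3f}, {z_upper:.3f})")
print(f"t interval: ({t_lower:.3f}, {t_upper:.3f})")
```

The t interval comes out wider because the t distribution has heavier tails than the normal distribution.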

In [6]:
# Import the stats module from scipy
from scipy import stats

In [8]:
# Calculate the 95% confidence interval
sample_mean = 15
standard_error = 1.2

stats.t.interval(
    confidence=0.95,
    df=df,
    loc=sample_mean,
    scale=standard_error
)

(12.496843863280997, 17.503156136719003)
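
The interval returned by `stats.t.interval` is the sample mean plus or minus a critical t value times the standard error. The sketch below reproduces the result above by hand, assuming `df = 20` as set in cell In [2]; the names `t_crit`, `margin`, `manual`, and `library` are illustrative.

```python
from scipy import stats

df = 20
sample_mean = 15
standard_error = 1.2
confidence = 0.95

# Two-sided interval: place (1 - confidence) / 2 in each tail
t_crit = stats.t.ppf(q=1 - (1 - confidence) / 2, df=df)
margin = t_crit * standard_error

manual = (sample_mean - margin, sample_mean + margin)
library = stats.t.interval(confidence, df=df, loc=sample_mean, scale=standard_error)

print(f"Critical t value: {t_crit:.4f}")
print(f"Manual interval:  {manual}")
print(f"Library interval: {library}")
```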

## Bootstrap

The bootstrap is a resampling method: it draws new samples, with replacement, from the observed data and uses them for statistical inference.

``` Python
from sklearn.utils import resample

data = [...]

boot = resample(data,              # original data set
                replace=True,      # resampling with replacement
                n_samples=10,      # number of observations to draw
                random_state=123   # random seed to ensure consistent result
)

print(f'Bootstrap Sample: \n{boot}')
```
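
To connect the bootstrap back to confidence intervals, the resampling step can be repeated many times and the percentiles of the resulting statistic used as interval bounds (the percentile bootstrap). The sketch below is an assumption about how the pieces might fit together, not part of the original notebook. It uses NumPy's `Generator.choice` in place of `sklearn.utils.resample`, and the data, `n_boot`, and seed are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical data: 1000 draws from a t distribution with 20 degrees of freedom
data = rng.standard_t(df=20, size=1000)

n_boot = 2000                  # number of bootstrap resamples (assumed)
boot_means = np.empty(n_boot)

for i in range(n_boot):
    # Resample with replacement, keeping the original sample size
    sample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = sample.mean()

# Percentile bootstrap: the middle 95% of the resampled means
lower, upper = np.percentile(boot_means, [2.5, 97.5])

print(f"Sample mean: {data.mean():.4f}")
print(f"95% bootstrap CI for the mean: ({lower:.4f}, {upper:.4f})")
```

Each resample here has the same size as the original data, which is the usual choice when the goal is to approximate the sampling distribution of a statistic.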

In [9]:
# Import resample from sklearn.utils
from sklearn.utils import resample

In [10]:
# Draw one bootstrap sample of 100 values from the random variable t
boot = resample(t,
                replace=True,
                n_samples=100,
                random_state=777
)

print(f'Bootstrap Sample: \n{boot}')

Bootstrap Sample: 
[ 7.11921401e-02  9.83178146e-02  6.60803641e-01  1.50708547e-03
 -5.42114639e-01  7.12797519e-01  1.19623426e+00  1.14881622e+00
 -1.27045705e+00  4.97696915e-01  7.38260153e-01 -3.44718652e-01
  7.49429953e-01 -5.57169002e-01  9.37826443e-01 -5.69400861e-02
  1.20844628e+00  1.86893860e-01  2.35132377e-01 -1.12842020e+00
  1.15555451e+00 -2.65501452e-01  3.04677635e-01  6.14451321e-01
 -8.21326719e-01 -2.25868379e+00  1.38520745e+00 -1.17556770e+00
  9.56337762e-01  7.11921401e-02  1.08798511e+00  3.04677635e-01
  4.04061892e-01  1.03108421e-01 -2.57797805e+00 -2.02193205e+00
 -5.34798173e-01  8.19550936e-01  8.02576077e-01  4.50683962e-02
  1.20111575e+00  1.20291917e+00 -2.57797805e+00  6.32956931e-01
 -1.29189333e+00 -5.54491961e-01 -7.81493713e-01  4.85726226e-01
  9.48822604e-01 -1.33959709e+00 -7.78464503e-01  7.76594671e-01
 -7.96420615e-01 -1.03065924e+00  6.47400852e-01 -8.52060626e-01
 -1.26436211e+00  1.09940072e+00  6.79514209e-01  3.92804849e-01
  9.853