## Student T Distribution 
The Student's t-distribution, also known simply as the t-distribution, is a family of distributions that look almost identical to the normal distribution curve, only a bit shorter and fatter. The t-distribution is used instead of the normal distribution when you have small samples (for more on this, see: t-score vs. z-score). The larger the sample size, the more the t-distribution looks like the normal distribution. In fact, for sample sizes larger than about 30 (i.e. more degrees of freedom), the t-distribution is almost exactly like the normal distribution. The t-distribution has heavier tails than the normal distribution.
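The heavier tails and the convergence toward the normal curve can be checked numerically. The sketch below assumes only `scipy.stats` and an arbitrarily chosen cutoff of 2. It compares the upper-tail probability P(T > 2) for several degrees of freedom against the standard normal value.

```python
from scipy.stats import norm, t

cutoff = 2.0  # arbitrary cutoff, chosen only for illustration
dofs = [2, 5, 10, 30, 100]

# Upper-tail area beyond the cutoff for each t-distribution
t_tails = [float(t.sf(cutoff, df)) for df in dofs]
# Same area under the standard normal curve
normal_tail = float(norm.sf(cutoff))

for df, area in zip(dofs, t_tails):
    print(f"df = {df:>3}: P(T > {cutoff}) = {area:.4f}")
print(f"normal  : P(Z > {cutoff}) = {normal_tail:.4f}")
```

The tail area shrinks as the degrees of freedom grow but stays above the normal value, which is what "heavier tails" means here.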


In [34]:
""" 
scipy.stats.t() represents a student's t continuous random variable. 
It is inherited from the generic methods as an instance of the rv_continuous class. 
The rv_continuous class in scipy.stats provides a framework for defining and working
with continuous random variables. 
"""
import numpy as np 
from scipy.stats import t

degrees_of_freedom, location = 10, 0

# Freeze a t-distribution with the given degrees of freedom and location
rv = t(degrees_of_freedom, location)

# Generate random values from the t-distribution
# Replace 5 with the desired number of random values
random_values = rv.rvs(size=5) 
print(f"Random Values:  {random_values}\nmean: {np.mean(random_values)}")


Random Values:  [ 1.35071269 -0.62996986 -0.76952433  0.97646792  0.12664593]
mean: 0.21086646974315332


### T - Distribution Anatomy 
As the plot below shows, the Student's t-distribution is flatter and wider than the standard normal distribution, so more of the area under the t-distribution is pushed out toward the tails. That means the standard deviation of the t-distribution is larger than the standard deviation of the standard normal distribution (with more than 2 degrees of freedom it equals √(df / (df - 2)), which is always greater than 1).<br>
<br>
Keep in mind there isn't one generic t-distribution. The specific shape of a t-distribution, and therefore the t-scores we find in the t-table, depends on the number of <b>degrees of freedom</b>, which for a single sample of size n is given by <b>n - 1</b>.<br>
<br>
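To make the link between degrees of freedom and spread concrete, here is a minimal sketch that asks scipy for the standard deviation of several t-distributions and compares it with the closed-form value sqrt(df / (df - 2)). The list of degrees of freedom is an arbitrary choice for illustration, and the formula only applies when df is greater than 2.

```python
import numpy as np
from scipy.stats import t

dofs = [3, 5, 10, 28, 100]  # arbitrary illustrative values, all > 2

# Standard deviation reported by scipy for each t-distribution
t_stds = [float(t.std(df)) for df in dofs]
# Closed-form value sqrt(df / (df - 2)), valid only for df > 2
formula_stds = [float(np.sqrt(df / (df - 2))) for df in dofs]

for df, sd in zip(dofs, t_stds):
    print(f"df = {df:>3}: standard deviation = {sd:.4f}")
```

Every value is above 1, the standard deviation of the standard normal distribution, and the gap closes as the degrees of freedom increase.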


In [1]:
import numpy as np 
from scipy.stats import t
from IPython.display import display, Math 
import sys
sys.path.insert(0, '..')
import resources.datum as datum 
import resources.glyph as glyph

data = datum.Data()
dash = glyph.Glyph(title='Z and T Distributions - 100 Samples')

mu = 0
sigma = 1
N = 31

mu_t = 0 
sigma_t = 2
N_t = 29


x = np.linspace(-3, 3, N)
y = data.get_normal_dist(x = x, mu = mu, sigma = sigma)

degrees_of_freedom, location = (N_t - 1), 0
x_t = np.linspace(-3, 3, N_t)
# Use a new name so the imported scipy `t` object is not overwritten
t_pdf = t.pdf(x_t, degrees_of_freedom, location)
print(t_pdf[0])
y_t = t_pdf*6 # I scaled up by 6 for visual representation only 

# Plot the standard normal and Student's t curves
dash.make_line(x = x, y = y, width = 2, label = 'standard normal')
dash.make_line(x = x_t, y = y_t, width = 2, color='firebrick', label = 'student t')


dash.show()

msg = '\\displaystyle \\color{dodgerblue} \\mu : %s\\\\~\\\\'
msg = msg + '\\sigma: %s \\\\~\\\\'
msg = msg + 'N: %s \\\\~\\\\'
msg = msg + '\\color{firebrick} \\text{Degrees of Freedom: %s}\\\\~\\\\'
msg = msg + '\\text{Student T N: %s}'

display(Math(msg%(mu, sigma, N, degrees_of_freedom, N_t)))


0.006948638448784028


<IPython.core.display.Math object>

In [4]:
import sys
sys.path.insert(0, '..')
import resources.datum as datum

data = datum.Data()

t = data.get_t_auc(list(x_t), degrees_of_freedom, location)

Invalid Argument passed


InvalidParamEntry: 

Regardless of sample size, and hence the degrees of freedom, the t-distribution is always symmetric and bell-shaped around its mean. But the larger the sample size or degrees of freedom, the taller and narrower the t-distribution gets. The smaller the sample size, the shorter and wider the t-distribution gets. In other words, as the sample size gets larger, the t-distribution becomes more tightly clustered around the mean, its standard deviation gets smaller, and it approaches the standard normal distribution. <br>
<br>
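The "taller and narrower" behaviour can be seen in the height of the curve at its centre. This is a small sketch, assuming only `scipy.stats` and an arbitrary set of degrees of freedom, that evaluates each density at 0 and compares it with the peak of the standard normal curve.

```python
from scipy.stats import norm, t

dofs = [1, 5, 10, 30, 100]  # arbitrary illustrative values

# Height of each t-distribution at its centre (x = 0)
peaks = [float(t.pdf(0, df)) for df in dofs]
# Height of the standard normal curve at its centre
normal_peak = float(norm.pdf(0))

for df, peak in zip(dofs, peaks):
    print(f"df = {df:>3}: density at 0 = {peak:.4f}")
print(f"normal  : density at 0 = {normal_peak:.4f}")
```

The peak rises with the degrees of freedom and approaches, without exceeding, the normal peak.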

### The Application Of T Distribution
T-distributions are used in inferential statistics when the sample size is small and the population standard deviation is unknown. <br>
Use when: <br>

* The sample size is less than or equal to 30
* The population standard deviation is unknown
* The population distribution is unimodal and not heavily skewed (approximately normal)
<br>

Some statisticians prefer to use the t-distribution exclusively because it is very close to the z-distribution for samples of 30 and greater and more accurate for samples below 30, but many choose to use the t-distribution for samples below 30 and the z-distribution for samples of 30 and above. We will follow the latter.<br>
<br>
ℹ️NOTE: <br>
You will also use the t-distribution when the sample size is greater than 30 if you do not know the population standard deviation.<br>
<br>
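As a rough illustration of why the choice matters for small samples, the sketch below builds a 95% confidence interval for a mean twice, once with a t critical value and once with a z critical value. The sample values are hypothetical, invented only for this example, and the 95% level is an assumed choice.

```python
import numpy as np
from scipy.stats import norm, t

# Hypothetical measurements, invented purely for illustration
sample = np.array([12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9])

n = sample.size
mean = sample.mean()
s = sample.std(ddof=1)               # sample standard deviation (sigma unknown)
standard_error = s / np.sqrt(n)

confidence = 0.95                    # assumed confidence level
upper_q = 1 - (1 - confidence) / 2   # cumulative probability for the upper cut-off

t_crit = float(t.ppf(upper_q, df=n - 1))
z_crit = float(norm.ppf(upper_q))

t_margin = t_crit * standard_error
z_margin = z_crit * standard_error

print(f"t critical value: {t_crit:.4f}, interval: {mean - t_margin:.3f} to {mean + t_margin:.3f}")
print(f"z critical value: {z_crit:.4f}, interval: {mean - z_margin:.3f} to {mean + z_margin:.3f}")
```

With only 8 observations the t interval is noticeably wider, reflecting the extra uncertainty from estimating the standard deviation from the sample.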

### Finding Values In The T-Table 
You need to know the degrees of freedom (n - 1) to find values in the t-table, and either the upper-tail probability or the confidence level.<br>
<br>
Let's say we know the upper-tail (right-tail) probability is 0.01 and the sample size is n = 20.<br>

* degrees of freedom = n - 1 = 20 - 1 = 19
* probability is 0.01 (scipy's `t.ppf` takes the cumulative left-tail probability, so pass 1 - 0.01 = 0.99)

In [21]:

from scipy.stats import t

q = 1-0.01 # q = cumulative (left-tail) probability, so the right-tail area is 0.01
dof = 19

print(f'The right tail t-table value for a probability of 0.01 and {dof} degrees of freedom is {t.ppf(q=q, df = dof)}')


The right tail t-table value for a probability of 0.01 and 19 degrees of freedom is 2.539483190622288


In [4]:
import sys
sys.path.insert(0, '..')
import resources.datum as datum 

n = 20
q = 0.01
df = n - 1 
tail = 'right'

data = datum.Data()

data.get_t_critical_value(tail = tail, q = q, df = df, std_out='Y')

<IPython.core.display.Math object>

To find the left-tail t-table value for a probability of 0.01 with a sample size of n = 20:<br>

* degrees of freedom = 20 - 1 = 19
* probability is 0.01 (scipy left tail format 0.01)

In [3]:
from scipy.stats import t
import numpy as np 

q = 0.01 # q = cumulative (left-tail) probability
dof = 19

# t.ppf returns a negative value in the left tail; np.abs gives the positive t-table entry
print(f'The left tail t-table value for a probability of 0.01 and {dof} degrees of freedom is {np.abs(t.ppf(q=q, df = dof)):.4f}')

The left tail t-table value for a probability of 0.01 and 19 degrees of freedom is 2.5395


In [5]:
import sys
sys.path.insert(0, '..')
import resources.datum as datum 

n = 20
q = 0.01
df = n - 1 
tail = 'left'

data = datum.Data()

data.get_t_critical_value(tail = tail, q = q, df = df, std_out='Y')

<IPython.core.display.Math object>

In [1]:
import sys
sys.path.insert(0, '..')
import resources.datum as datum 

n = 20
q = 0.01
df = n - 1 
tail = 'to'

data = datum.Data()

data.get_t_critical_value(tail = tail, q = q, df = df, std_out='Y')

InvalidParamEntry: tail value (to) is not a valid tail option.
valid tail options are lower, upper, two.
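The error message lists `two` as a valid tail option. Independent of the `datum` helper, a two-tailed lookup can be sketched with scipy alone: split the total tail probability evenly between both tails and look up the upper cut-off. The 95% confidence level is an assumed example value, and the sample size of 20 follows the earlier examples.

```python
from scipy.stats import t

n = 20
df = n - 1
confidence = 0.95        # assumed confidence level for illustration
alpha = 1 - confidence   # total probability in both tails

# Each tail holds alpha / 2, so the upper cut-off sits at 1 - alpha / 2
two_tail_value = float(t.ppf(1 - alpha / 2, df))

print(f"Two-tailed t-table value for {confidence:.0%} confidence and {df} degrees of freedom: {two_tail_value:.3f}")
```

Because the distribution is symmetric, the lower cut-off is simply the negative of this value.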