# Descriptive and Inferential Statistics

#### Inferential statistics tries to uncover attributes of a larger population, often based on a sample. It is often misunderstood and is less intuitive than descriptive statistics. Often we are interested in studying a group that is too large to observe in full (e.g., the average height of adolescents in North America), and we have to use only a few members of that group to infer conclusions about the whole. As you can guess, this is not easy to get right. After all, we are trying to represent a population with a sample that may not be representative. We will explore these caveats along the way.

## Coefficient of Variation
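#### A helpful tool for measuring spread is the coefficient of variation: the standard deviation divided by the mean. Because it expresses spread relative to the mean, it lets us compare how spread out two distributions are even when they are on different scales.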
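
#### As a minimal sketch, the coefficient of variation can be computed with NumPy. The two datasets and the helper name `coefficient_of_variation` are hypothetical, invented only for illustration; the sample standard deviation (`ddof=1`) is an assumption about which variant is wanted.

```python
import numpy as np

# Hypothetical measurements, invented purely for illustration
heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights_kg = np.array([50.0, 60.0, 70.0, 80.0, 90.0])


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return np.std(values, ddof=1) / np.mean(values)


cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.3f}")
print(f"CV of weights: {cv_weights:.3f}")
```

#### Both datasets have the same standard deviation, but the weights have a smaller mean, so their spread is larger relative to it.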

## Confidence Intervals

#### You may have heard the term “confidence interval,” which often confuses statistics newcomers and students. A confidence interval is a range, calculated from a sample, that we believe with a stated level of confidence (such as 95%) contains the true population mean (or other population parameter).
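
#### One way to read a confidence level is as a long-run success rate: if we repeated the sampling many times, about that share of the resulting intervals would contain the true mean. The simulation below is a sketch of that idea. The population parameters, sample size, and number of trials are all hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)

# Hypothetical population and sampling setup (assumptions for illustration)
true_mean, true_std = 42.0, 8.0
n, trials, confidence_level = 30, 2000, 0.95

# Two-sided t critical value for the chosen confidence level
t_critical = t.ppf((1 + confidence_level) / 2, n - 1)

hits = 0
for _ in range(trials):
    sample = rng.normal(true_mean, true_std, n)
    margin = t_critical * sample.std(ddof=1) / np.sqrt(n)
    # Count the interval as a hit when it contains the true mean
    if abs(sample.mean() - true_mean) <= margin:
        hits += 1

coverage = hits / trials
print(f"Intervals containing the true mean: {coverage:.1%}")
```

#### The printed share should land close to the chosen confidence level, though it will vary slightly with the random seed.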

#### Note that in the margin of error formula, the larger n becomes, the narrower the confidence interval becomes. This makes sense: the sample size n appears under a square root in the denominator, so a larger sample lets us place the population mean within a smaller range at the same level of confidence.
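
#### To make the effect of n concrete, the sketch below assumes the common margin of error formula E = z * s / sqrt(n), where z is the critical value and s is the standard deviation. The standard deviation and sample sizes are hypothetical.

```python
import numpy as np
from scipy.stats import norm

confidence_level = 0.95
sigma = 8.0  # hypothetical standard deviation

# Two-sided z critical value for the chosen confidence level
z_critical = norm.ppf((1 + confidence_level) / 2)

# Each sample size is four times the previous one
sample_sizes = [10, 40, 160, 640]
margins = [z_critical * sigma / np.sqrt(n) for n in sample_sizes]

for n, margin in zip(sample_sizes, margins):
    print(f"n = {n:4d}  margin of error = {margin:.3f}")
```

#### Because n sits under a square root, quadrupling the sample size only halves the margin of error.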

## Exercises
#### A manufacturer says the Z-Phone smart phone has a mean consumer life of 42 months with a standard deviation of 8 months. Assuming a normal distribution, what is the probability a given random Z-Phone will last between 20 and 30 months?

In [2]:
from scipy.stats import norm
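mean = 42  # mean consumer life in months
std = 8    # standard deviation in months

# Convert both bounds to z-scores on the standard normal distribution
z_score1 = (20 - mean) / std
z_score2 = (30 - mean) / std

# Cumulative probability of lasting less than 20 and less than 30 months
prob_less_20 = norm.cdf(z_score1)
prob_less_30 = norm.cdf(z_score2)

# P(20 <= X <= 30) is the difference between the two cumulative probabilities
print(prob_less_30 - prob_less_20)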


0.0638274380338035


## Exercise 3
#### I am skeptical that my 3D printer filament is 1.75 mm in average diameter as advertised. I took 34 measurements with my tool. The sample mean is 1.715588 mm and the sample standard deviation is 0.029252 mm. What is the 99% confidence interval for the mean of my entire spool of filament?

In [3]:
from scipy.stats import t
import numpy as np

# Given values
n = 34
mean = 1.715588
std_dev = 0.029252
confidence_level = 0.99

# Degrees of freedom
df = n - 1

# t-critical value for 99% confidence level
t_critical = t.ppf((1 + confidence_level) / 2, df)

# Standard error of the mean
sem = std_dev / np.sqrt(n)

# Confidence interval calculation
margin_of_error = t_critical * sem
confidence_interval = (mean - margin_of_error, mean + margin_of_error)

print(f"99% Confidence Interval: {confidence_interval}")


99% Confidence Interval: (1.7018760349925839, 1.729299965007416)


## What Does Standard Error Tell Us?

#### Standard error measures how much a sample mean varies around the population mean. It is the standard deviation of the sample means you would obtain by repeating the experiment many times under the same conditions, and it is estimated as the sample standard deviation divided by the square root of the sample size.
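
#### The "repeat the experiment" idea can be checked with a small simulation. This is only a sketch: the population parameters and the number of repeats are hypothetical, and the population is assumed to be normal.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population and experiment setup (assumptions for illustration)
population_mean, population_std = 42.0, 8.0
n, repeats = 34, 10000

# Each row is one repeated experiment of n measurements
samples = rng.normal(population_mean, population_std, size=(repeats, n))
sample_means = samples.mean(axis=1)

# Spread of the sample means versus the standard error formula
observed = sample_means.std(ddof=1)
theoretical = population_std / np.sqrt(n)

print(f"Observed spread of sample means: {observed:.3f}")
print(f"std / sqrt(n):                   {theoretical:.3f}")
```

#### The two printed values should be close, which is why std / sqrt(n) is used as the standard error of the mean.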

## What is Margin of Error?
#### The margin of error in a confidence interval calculation is the distance the interval extends on either side of the sample mean: the critical value multiplied by the standard error. A smaller margin of error indicates a more precise estimate.

## Exercise 4

#### Your marketing department has started a new advertising campaign and wants to know if it affected sales, which in the past averaged $10,345 a day with a standard deviation of $552. The new advertising campaign ran for 45 days and averaged $11,641 in sales. Did the campaign affect sales? Why or why not? (Use a two-tailed test for more reliable significance.)

In [6]:
from scipy.stats import norm

mean = 10345
std_dev = 552

p1 = 1.0 - norm.cdf(11641, mean, std_dev)

# Take advantage of symmetry
p2 = p1

# P-value of both tails
# I could have also just multiplied by 2
p_value = p1 + p2

print("Two-tailed P-value", p_value)
if p_value <= .05:
    print("Passes two-tailed test")
else:
    print("Fails two-tailed test")

Two-tailed P-value 0.01888333596496139
Passes two-tailed test
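
#### The calculation above compares the campaign figure against the spread of individual daily sales. Since $11,641 is an average over 45 days, an alternative reading is to compare it against the standard error of a 45-day mean. The sketch below takes that approach; it assumes the daily sales are independent and that the past standard deviation still applies. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

past_mean = 10345      # past average daily sales
past_std = 552         # past standard deviation of daily sales
n_days = 45            # length of the campaign
campaign_mean = 11641  # average daily sales during the campaign

# Standard error of a mean taken over n_days observations
standard_error = past_std / np.sqrt(n_days)

# Number of standard errors between the campaign mean and the past mean
z_score = (campaign_mean - past_mean) / standard_error

# Two-tailed p-value; sf (survival function) keeps precision in the far tail
p_value = 2 * norm.sf(abs(z_score))

print("z-score:", z_score)
print("Two-tailed p-value:", p_value)
```

#### Under these assumptions the p-value is far smaller than 0.05, so the conclusion that the campaign affected sales is the same, only with much stronger evidence.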
