

### **Statistics 705: Day 1 – Introduction to Statistics**

#### **What is Statistics?**
Statistics is the science of collecting, organizing, analyzing, and interpreting data to make decisions. In simpler terms, it helps us understand patterns, trends, and relationships in data.

---

### **Types of Statistics**
There are two main branches of statistics:
1. **Descriptive Statistics**: These help us describe, summarize, and organize data. Examples include calculating the average of a dataset or understanding how spread out the data is.
2. **Inferential Statistics**: These allow us to make predictions or inferences about a larger group (population) based on a smaller sample of data.

---

### **Populations vs. Samples**
- **Population**: This is the entire group we are interested in studying. For example, if we want to know the average height of all students at a university, the population would be *all students*.
- **Sample**: Since it's hard to study an entire population, we often take a small portion called a sample. For example, we might select 100 students from the university to estimate the average height.

---

### **Key Concepts**
1. **Variable**: A variable is any characteristic or attribute that can take different values. For example, height, age, and income are all variables.
   - **Quantitative Variables**: Variables that can be measured numerically (e.g., height in centimeters).
   - **Qualitative Variables**: Variables that describe qualities or categories (e.g., favorite color, gender).

2. **Data**: Data are the values of the variables. For instance, if height is the variable, the data might be 170 cm, 165 cm, etc., for different individuals.

---

### **Types of Data**
1. **Nominal Data**: This is categorical data without any order. For example, gender (male, female) or types of pets (dog, cat).
2. **Ordinal Data**: This is categorical data with a meaningful order, but the differences between values are not measurable or necessarily equal. For example, a satisfaction rating (poor, average, good).
3. **Interval Data**: This is numerical data where the differences between values are meaningful, but there is no true zero point. For example, temperature in Celsius (0°C does not mean "no temperature").
4. **Ratio Data**: This is numerical data with a true zero point, meaning zero represents the absence of the quantity. For example, weight (0 kg means no weight).

---

### **Measures of Central Tendency**
These help us summarize the center of a dataset:
1. **Mean (Average)**: The mean is the sum of all the data values divided by the number of data points.
   \[
   \text{Mean} = \frac{\text{Sum of all data values}}{\text{Number of data values}}
   \]
   - **Example**: Suppose we have the ages of 5 people: 20, 25, 30, 35, and 40. The mean is:
   \[
   \text{Mean} = \frac{20 + 25 + 30 + 35 + 40}{5} = 30
   \]
   
2. **Median**: The median is the middle value when the data is arranged in order. If there is an even number of data points, the median is the average of the two middle values.
   - **Example**: For the same ages (20, 25, 30, 35, 40), the median is 30 (the middle value). If we had another age, say 45, the median would be the average of 30 and 35:
   \[
   \text{Median} = \frac{30 + 35}{2} = 32.5
   \]

3. **Mode**: The mode is the value that appears most frequently in the data.
   - **Example**: If we have the following data for shoe sizes: 7, 8, 7, 9, 10, the mode is 7 because it appears twice, while all others appear once.
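
As a quick check of the three measures above, here is a minimal Python sketch using the standard-library `statistics` module. The variable names (`ages`, `shoe_sizes`, and so on) are illustrative and simply reuse the example data from this section.

```python
import statistics

ages = [20, 25, 30, 35, 40]
shoe_sizes = [7, 8, 7, 9, 10]

mean_age = statistics.mean(ages)               # sum of values / number of values
median_odd = statistics.median(ages)           # middle value of the sorted data
median_even = statistics.median(ages + [45])   # average of the two middle values
mode_size = statistics.mode(shoe_sizes)        # most frequent value

print(mean_age, median_odd, median_even, mode_size)
```

The results should agree with the hand calculations: a mean of 30, medians of 30 and 32.5, and a mode of 7.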

---

### **Measures of Spread (Variability)**
These tell us how spread out or concentrated the data is:
1. **Range**: The difference between the largest and smallest values in the dataset.
   - **Example**: For the ages 20, 25, 30, 35, 40, the range is:
   \[
   \text{Range} = 40 - 20 = 20
   \]

2. **Variance**: The variance measures how far each data point is from the mean. It’s calculated as the average of the squared differences from the mean.
   \[
   \text{Variance} = \frac{\sum (X - \text{Mean})^2}{n}
   \]
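   where \(X\) represents each data point, and \(n\) is the number of data points. This is the population variance; when the data are a sample used to estimate a population's variance, the sum is usually divided by \(n - 1\) instead.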

3. **Standard Deviation**: This is the square root of the variance and provides a measure of spread in the same units as the data.
   \[
   \text{Standard Deviation} = \sqrt{\text{Variance}}
   \]
   - **Example**: For a dataset where the mean age is 30, if the variance is 25, then the standard deviation is:
   \[
   \text{Standard Deviation} = \sqrt{25} = 5
   \]
   This means that the ages typically differ from the mean by about 5 years.
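
To see the three measures of spread computed together, the sketch below applies them to the ages 20, 25, 30, 35, 40 from the range example. It follows the divide-by-\(n\) variance formula given above and compares it with `statistics.pvariance` from the standard library; the variable names are illustrative.

```python
import math
import statistics

ages = [20, 25, 30, 35, 40]

# Range: largest value minus smallest value.
data_range = max(ages) - min(ages)

# Variance: average of the squared differences from the mean (divide by n).
mean_age = sum(ages) / len(ages)
variance = sum((x - mean_age) ** 2 for x in ages) / len(ages)

# Standard deviation: square root of the variance, in the same units as the data.
std_dev = math.sqrt(variance)

# The library's population variance should match the hand-rolled formula.
library_variance = statistics.pvariance(ages)

print(data_range, variance, round(std_dev, 2), library_variance)
```

For this dataset the range is 20, the variance is 50, and the standard deviation is about 7.07 years.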

---

### **Example Recap**
Let's use a small dataset of exam scores: 60, 70, 80, 90, 100.

1. **Mean**: 
   \[
   \text{Mean} = \frac{60 + 70 + 80 + 90 + 100}{5} = 80
   \]
   
2. **Median**: When ordered (60, 70, 80, 90, 100), the middle value is 80.

3. **Mode**: Since all values are unique, there is no mode.

4. **Range**: 
   \[
   \text{Range} = 100 - 60 = 40
   \]

5. **Variance**: First, find the squared differences from the mean (80):
   \[
   (60 - 80)^2 = 400, \quad (70 - 80)^2 = 100, \quad (80 - 80)^2 = 0
   \]
   \[
   (90 - 80)^2 = 100, \quad (100 - 80)^2 = 400
   \]
   Then, average these values:
   \[
   \text{Variance} = \frac{400 + 100 + 0 + 100 + 400}{5} = 200
   \]

6. **Standard Deviation**:
   \[
   \text{Standard Deviation} = \sqrt{200} \approx 14.14
   \]

We'll build on this knowledge in the coming days with more advanced topics. Does this approach work for you? Let me know your thoughts!

### **Statistics 705: Day 2 – Introduction to Probability**

On Day 2, we'll begin introducing probability, a fundamental concept in statistics. Understanding probability is essential because it helps us quantify uncertainty and make predictions based on data. Here’s a detailed lesson plan, along with simple examples.

---

### **What is Probability?**
Probability is a measure of how likely an event is to occur. It ranges from 0 to 1:
- A probability of **0** means the event is impossible.
- A probability of **1** means the event is certain.
- A probability of **0.5** means the event is equally likely to happen or not happen.

---

### **Key Terminology in Probability**
Before we dive into calculating probabilities, let’s define a few important terms:

1. **Experiment**: Any process that generates a set of results. For example, rolling a die or flipping a coin.
   
2. **Outcome**: A possible result of an experiment. For instance, getting a 6 when rolling a die is an outcome.

3. **Sample Space (S)**: The set of all possible outcomes of an experiment. For example, when rolling a die, the sample space is:
   \[
   S = \{1, 2, 3, 4, 5, 6\}
   \]

4. **Event (E)**: A subset of the sample space. An event is a specific outcome or set of outcomes we're interested in. For example, the event of rolling an even number is:
   \[
   E = \{2, 4, 6\}
   \]

---

### **Basic Probability Formula**
When all outcomes in the sample space are equally likely, the probability of an event \(E\) is the ratio of the number of favorable outcomes (outcomes in the event) to the total number of possible outcomes (the sample space).

\[
P(E) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes in sample space}}
\]

---

### **Example 1: Rolling a Die**
Let’s consider the simple experiment of rolling a fair six-sided die.
- The sample space \(S\) is \{1, 2, 3, 4, 5, 6\}.
- Suppose we want to find the probability of rolling a 4.

   There is only **one favorable outcome** (rolling a 4), and there are **six possible outcomes** in total, so:
   \[
   P(\text{rolling a 4}) = \frac{1}{6}
   \]

---

### **Example 2: Flipping a Coin**
Now, let’s consider flipping a fair coin. The sample space is:
\[
S = \{\text{Heads (H), Tails (T)}\}
\]

- Suppose we want to know the probability of getting heads. There is **one favorable outcome** (heads), and there are **two possible outcomes**, so:
   \[
   P(\text{Heads}) = \frac{1}{2}
   \]

---

### **Types of Events**
1. **Simple Event**: An event with only one outcome. For example, rolling a 2 on a die.
   
2. **Compound Event**: An event that includes two or more outcomes. For example, rolling an even number (which includes outcomes 2, 4, and 6).

3. **Mutually Exclusive Events**: Events that cannot happen at the same time. For example, rolling a 2 and rolling a 5 on a single roll of a die are mutually exclusive because only one of these can happen.

4. **Independent Events**: Two events are independent if the outcome of one does not affect the outcome of the other. For example, flipping a coin and rolling a die are independent events.

---

### **Example 3: Probability of Compound Events**
Let’s consider the event of rolling an even number on a die. The possible outcomes are 2, 4, and 6. So the probability is:

\[
P(\text{Even number}) = \frac{3}{6} = \frac{1}{2}
\]
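
The counting formula used in the die examples can be sketched in a few lines of Python. The helper name `probability` is hypothetical, and the sketch assumes every outcome in the sample space is equally likely (as with a fair die).

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = favorable outcomes / total outcomes, assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}

p_four = probability({4}, die)         # simple event: rolling a 4
p_even = probability({2, 4, 6}, die)   # compound event: rolling an even number

print(p_four, p_even)  # 1/6 1/2
```

Using `Fraction` keeps the answers exact, so 3/6 is reduced to 1/2 automatically.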

---

### **Addition Rule of Probability**
If two events are **mutually exclusive** (cannot occur together), the probability that either one of the events happens is the sum of their individual probabilities. The formula is:

\[
P(A \text{ or } B) = P(A) + P(B)
\]

#### **Example 4:**
If we roll a die, what’s the probability of rolling either a 3 or a 5?
- \(P(\text{3}) = \frac{1}{6}\)
- \(P(\text{5}) = \frac{1}{6}\)

Since rolling a 3 and rolling a 5 are mutually exclusive (both can’t happen at the same time), we can add the probabilities:
\[
P(\text{3 or 5}) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}
\]
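
One way to convince yourself of the addition rule is to count outcomes directly and compare. The short sketch below does that for a fair die; the names are illustrative, and the rule as written only applies because the two events share no outcomes.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}

def probability(event):
    # Equally likely outcomes on a fair die.
    return Fraction(len(event & die), len(die))

three, five = {3}, {5}

# Mutually exclusive: the two events have no outcome in common.
are_exclusive = three.isdisjoint(five)

p_either_counted = probability(three | five)            # count outcomes in "3 or 5"
p_either_rule = probability(three) + probability(five)  # addition rule

print(are_exclusive, p_either_counted, p_either_rule)  # True 1/3 1/3
```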

---

### **Multiplication Rule of Probability**
For **independent events**, the probability that both events \(A\) and \(B\) happen is the product of their individual probabilities:

\[
P(A \text{ and } B) = P(A) \times P(B)
\]

#### **Example 5:**
What is the probability of flipping heads on a coin and rolling a 4 on a die?
- \(P(\text{Heads}) = \frac{1}{2}\)
- \(P(\text{4}) = \frac{1}{6}\)

Since these events are independent, we can multiply the probabilities:
\[
P(\text{Heads and 4}) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12}
\]
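
The multiplication rule can also be checked by listing every combined outcome. This sketch builds the joint sample space of a coin flip and a die roll with `itertools.product` and counts the single favorable pair; the variable names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Joint sample space: every (coin, die) pair, 2 * 6 = 12 equally likely outcomes.
joint_space = list(product(coin, die))

# Count the outcomes where the coin shows heads and the die shows 4.
favorable = [pair for pair in joint_space if pair == ("H", 4)]
p_joint_counted = Fraction(len(favorable), len(joint_space))

# Multiplication rule for independent events.
p_joint_rule = Fraction(1, 2) * Fraction(1, 6)

print(p_joint_counted, p_joint_rule)  # 1/12 1/12
```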

---

### **Complementary Events**
The complement of an event \(E\) is the event that \(E\) does not happen. The sum of the probabilities of an event and its complement is always 1:
\[
P(E) + P(E') = 1
\]
Where \(E'\) is the complement of \(E\).

#### **Example 6:**
If the probability of it raining tomorrow is 0.3, then the probability that it will not rain is:
\[
P(\text{Not Rain}) = 1 - P(\text{Rain}) = 1 - 0.3 = 0.7
\]

---

### **Conditional Probability**
Sometimes, we want to find the probability of an event happening given that another event has already occurred. This is called conditional probability. The formula is:
\[
P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}
\]
This reads as "the probability of \(A\) given \(B\)" is equal to the probability of both \(A\) and \(B\) happening, divided by the probability of \(B\).
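
As a small sketch of this formula, consider a fair die and ask: given that the roll is even, what is the probability that it is a 2? The helper names `probability` and `conditional` are hypothetical, and the sketch assumes equally likely outcomes.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}

def probability(event):
    # Equally likely outcomes on a fair die.
    return Fraction(len(event & die), len(die))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    return probability(a & b) / probability(b)

rolled_two = {2}
rolled_even = {2, 4, 6}

p_two_given_even = conditional(rolled_two, rolled_even)  # (1/6) / (1/2)

print(p_two_given_even)  # 1/3
```

Knowing the roll is even shrinks the sample space to three outcomes, which is why the probability of a 2 rises from 1/6 to 1/3.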

#### **Example 7:**
Imagine we have a deck of 52 cards, and we draw a card. What is the probability that the card is an ace given that it’s a spade?
- There are 13 spades in the deck, and 1 of them is an ace.
- The probability of drawing a spade is \(P(\text{Spade}) = \frac{13}{52} = \frac{1}{4}\).
- The probability of drawing an ace and a spade is \(P(\text{Ace and Spade}) = \frac{1}{52}\).

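Using the conditional probability formula:
\[
P(\text{Ace} \mid \text{Spade}) = \frac{P(\text{Ace and Spade})}{P(\text{Spade})} = \frac{1/52}{13/52} = \frac{1}{13}
\]

Next, we’ll move on to probability distributions, which help describe the likelihood of different outcomes in more complex situations. Let me know if this format works, and feel free to suggest any changes!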

### **Statistics 705: Day 3 – Probability Distributions**

On Day 3, we’ll build on our understanding of probability by introducing probability distributions. These allow us to describe how probabilities are distributed over different outcomes in more complex situations. This lesson will cover both **discrete** and **continuous** probability distributions.

---

### **1. What is a Probability Distribution?**
A **probability distribution** is a function or rule that assigns probabilities to all possible outcomes of a random variable. It helps us understand the likelihood of each possible outcome in an experiment.

A **random variable** represents numerical outcomes of an experiment. There are two types:
- **Discrete random variables**: These have a countable number of outcomes (e.g., number of heads when flipping a coin multiple times).
- **Continuous random variables**: These can take any value within a given range, so the possible values cannot be listed one by one (e.g., height, temperature).

---

### **2. Discrete Probability Distributions**

A **discrete probability distribution** shows the probabilities associated with each possible value of a discrete random variable. Two commonly used discrete distributions are **Binomial Distribution** and **Poisson Distribution**.

---

#### **2.1 Binomial Distribution**

A binomial distribution is used when an experiment has:
1. A fixed number of trials \(n\).
2. Only two possible outcomes in each trial (usually called success and failure).
3. The probability of success \(p\) is the same in each trial.
4. The trials are independent.

The **binomial probability formula** is:

\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}
\]

Where:
- \(X\) is the number of successes in \(n\) trials.
- \(k\) is the specific number of successes.
- \(n\) is the total number of trials.
- \(p\) is the probability of success on a single trial.
- \(\binom{n}{k}\) is the binomial coefficient, which is calculated as:
\[
\binom{n}{k} = \frac{n!}{k!(n - k)!}
\]

---

#### **Example 1:**
Suppose you flip a fair coin 4 times. What’s the probability of getting exactly 3 heads?

- \(n = 4\) (total flips).
- \(p = 0.5\) (probability of heads).
- \(k = 3\) (we want 3 heads).

The binomial probability formula is:
\[
P(X = 3) = \binom{4}{3} (0.5)^3 (1 - 0.5)^{4-3}
\]
First, calculate the binomial coefficient:
\[
\binom{4}{3} = \frac{4!}{3!1!} = 4
\]
Now, apply the values:
\[
P(X = 3) = 4 \times (0.5)^3 \times (0.5)^1 = 4 \times 0.125 \times 0.5 = 0.25
\]
So, the probability of getting exactly 3 heads is 0.25 or 25%.
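
The same calculation can be sketched in Python with `math.comb` for the binomial coefficient. The function name `binomial_pmf` is an illustrative choice, not a library function.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 4 flips of a fair coin.
p_three_heads = binomial_pmf(3, 4, 0.5)

# The probabilities over all possible head counts (0 to 4) should sum to 1.
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))

print(p_three_heads, total)
```

Summing over every possible value of \(k\) is a handy sanity check that the formula has been typed in correctly.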

---

#### **2.2 Poisson Distribution**

A Poisson distribution is used when you want to find the probability of a given number of events happening in a fixed interval of time or space, assuming the events occur with a constant rate and independently of each other.

The **Poisson probability formula** is:

\[
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
\]

Where:
- \(X\) is the number of events.
- \(\lambda\) is the average number of events that happen in the interval.
- \(k\) is the number of events we want to find the probability for.
- \(e\) is the mathematical constant approximately equal to 2.71828.

---

#### **Example 2:**
Suppose on average 3 customers enter a store every hour. What is the probability that exactly 5 customers enter the store in the next hour?

- \(\lambda = 3\) (average number of customers per hour).
- \(k = 5\) (we want the probability of 5 customers).

Using the Poisson formula:
\[
P(X = 5) = \frac{3^5 e^{-3}}{5!}
\]
First, calculate the factorial \(5!\):
\[
5! = 5 \times 4 \times 3 \times 2 \times 1 = 120
\]
Then:
\[
P(X = 5) = \frac{243 \times e^{-3}}{120} \approx \frac{243 \times 0.04979}{120} \approx \frac{12.1}{120} \approx 0.1008
\]
So, the probability of exactly 5 customers is approximately 0.1008 or 10.08%.
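
Here is a minimal Python sketch of the Poisson formula applied to the store example. The function name `poisson_pmf` is hypothetical; only `math.exp` and `math.factorial` from the standard library are used.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) when events occur at an average rate of lam per interval."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Exactly 5 customers in an hour when the average is 3 per hour.
p_five = poisson_pmf(5, 3)

print(round(p_five, 4))  # 0.1008
```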

---

### **3. Continuous Probability Distributions**

Continuous probability distributions are used for continuous random variables. Since continuous variables can take an infinite number of values, we don’t use exact probabilities for specific values (since the probability of one exact value is 0). Instead, we look at the probability of the variable falling within a range of values.

---

#### **3.1 Normal Distribution (Gaussian Distribution)**

The **normal distribution** is one of the most important probability distributions in statistics. It’s symmetric and bell-shaped, and many real-world measurements are approximately normally distributed (e.g., heights of people, test scores).

A normal distribution is described by two parameters:
- **Mean** (\(\mu\)): The center of the distribution.
- **Standard deviation** (\(\sigma\)): How spread out the values are.

The probability density function (PDF) of the normal distribution is:

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]

Where:
- \(x\) is a value of the random variable.
- \(\mu\) is the mean.
- \(\sigma\) is the standard deviation.
- \(e\) is the mathematical constant (approx. 2.71828).

---

#### **Properties of the Normal Distribution**:
1. **Symmetry**: The distribution is symmetric around the mean.
2. **68-95-99.7 Rule** (Empirical Rule): For a normal distribution:
   - About 68% of the data falls within 1 standard deviation of the mean.
   - About 95% of the data falls within 2 standard deviations of the mean.
   - About 99.7% of the data falls within 3 standard deviations of the mean.

---

#### **Example 3:**
Suppose the heights of students in a class are normally distributed with a mean height of 170 cm and a standard deviation of 10 cm. What is the probability that a randomly selected student is taller than 180 cm?

We can standardize this using the **z-score** formula:

\[
z = \frac{x - \mu}{\sigma}
\]

Where:
- \(x\) is the value we’re interested in (180 cm).
- \(\mu\) is the mean (170 cm).
- \(\sigma\) is the standard deviation (10 cm).

So:
\[
z = \frac{180 - 170}{10} = 1
\]

A z-score of 1 means the value is 1 standard deviation above the mean. Using a standard normal table, we find that the probability of a z-score greater than 1 is approximately 0.1587.

Thus, the probability that a student is taller than 180 cm is approximately 15.87%.
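
If a table is not at hand, the same probability can be found with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch using the numbers from the example; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)

# z-score of 180 cm: how many standard deviations above the mean.
z = (180 - heights.mean) / heights.stdev

# P(X > 180) is 1 minus the cumulative probability up to 180.
p_taller = 1 - heights.cdf(180)

# Same answer from the standard normal distribution using the z-score.
p_from_z = 1 - NormalDist().cdf(z)

# Empirical rule check: share of heights within 1 standard deviation of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)

print(z, round(p_taller, 4), round(within_one_sd, 4))
```

The last line also illustrates the first part of the 68-95-99.7 rule: roughly 68% of heights fall between 160 cm and 180 cm.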

---

#### **3.2 Uniform Distribution**

In a **uniform distribution**, all outcomes are equally likely within a given range. The continuous uniform distribution is used when every value within an interval has an equal probability.

The probability density function for a uniform distribution between \(a\) and \(b\) is:

\[
f(x) = \frac{1}{b - a}, \quad a \leq x \leq b
\]
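
Because the density is flat, the probability of landing in a sub-interval from \(c\) to \(d\) is just the area of a rectangle: width \(d - c\) times height \(\frac{1}{b - a}\). The sketch below wraps that in a hypothetical helper, `uniform_interval_probability`, and tries it on made-up numbers (a value uniform on 0 to 10, interval 2 to 5).

```python
def uniform_interval_probability(a, b, c, d):
    """P(c <= X <= d) for X uniform on [a, b], assuming a <= c <= d <= b."""
    if not (a < b and a <= c <= d <= b):
        raise ValueError("expected a < b and a <= c <= d <= b")
    # Rectangle area: width (d - c) times height 1 / (b - a).
    return (d - c) / (b - a)

p = uniform_interval_probability(0, 10, 2, 5)

print(p)  # 0.3
```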

---

#### **Example 4:**
Suppose a bus arrives at a bus stop sometime between 10:00 AM and 10:30 AM, and the arrival time is uniformly distributed. What’s the probability that the bus will arrive between 10:10 AM and 10:20 AM?

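Here, \(a = 0\) minutes and \(b = 30\) minutes (because we’re measuring the time interval from 10:00 AM to 10:30 AM). The probability that the bus arrives between 10:10 AM and 10:20 AM (which is from 10 to 20 minutes after 10:00 AM) is:

\[
P(10 \leq X \leq 20) = \frac{20 - 10}{30 - 0} = \frac{10}{30} = \frac{1}{3} \approx 0.333
\]

---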

This wraps up Day 3’s lesson on probability distributions! Tomorrow, we’ll delve deeper into working with the normal distribution, including the central limit theorem and its importance in statistics.

### **Statistics 705: Day 4 – The Normal Distribution and Central Limit Theorem**

Welcome to Day 4! Today’s lesson will dive deeper into the **normal distribution**, one of the most important concepts in statistics, and introduce the **Central Limit Theorem (CLT)**, which is fundamental for understanding sampling distributions. This will help you see how even non-normally distributed data can lead to normal-like results under certain conditions.

---

### **1. The Normal Distribution: A Quick Recap**

The **normal distribution** is a continuous probability distribution that is symmetric and bell-shaped. It is described by two parameters:
- **Mean** (\(\mu\)): Determines the center of the distribution.
- **Standard deviation** (\(\sigma\)): Determines the spread or width of the distribution.

#### **Probability Density Function (PDF)**:
The PDF of the normal distribution is given by:

\[
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]

Where:
- \(x\) is a value of the random variable.
- \(\mu\) is the mean of the distribution.
- \(\sigma\) is the standard deviation.
- \(e\) is the mathematical constant approximately equal to 2.71828.

---

### **2. Properties of the Normal Distribution**
Let’s highlight the key properties:
1. **Symmetry**: The distribution is symmetric around the mean, meaning the left and right sides are mirror images.
2. **The total area under the curve** equals 1.
3. **68-95-99.7 Rule (Empirical Rule)**:
   - About 68% of the data lies within 1 standard deviation of the mean (\(\mu \pm \sigma\)).
   - About 95% of the data lies within 2 standard deviations of the mean (\(\mu \pm 2\sigma\)).
   - About 99.7% of the data lies within 3 standard deviations of the mean (\(\mu \pm 3\sigma\)).

---

### **3. Z-Scores and Standard Normal Distribution**

To make any normal distribution easier to work with, we can standardize it into what’s called the **standard normal distribution**, which has a mean of 0 and a standard deviation of 1. We do this using **z-scores**.

#### **Z-Score Formula**:
The **z-score** tells us how many standard deviations a data point is from the mean:

\[
z = \frac{x - \mu}{\sigma}
\]

Where:
- \(x\) is the value we’re interested in.
- \(\mu\) is the mean of the distribution.
- \(\sigma\) is the standard deviation of the distribution.

The z-score allows us to compare values from different normal distributions or find probabilities from standard normal distribution tables.

---

#### **Example 1:**
Suppose we have a class of students, and their exam scores are normally distributed with a mean of 70 and a standard deviation of 10. What’s the probability that a randomly selected student scores more than 85?

Step 1: Standardize using the z-score formula:
\[
z = \frac{85 - 70}{10} = 1.5
\]
Step 2: Using a standard normal table (or calculator), find the probability for \(z > 1.5\). From the table, the probability is approximately 0.0668.

Thus, the probability that a student scores more than 85 is about 6.68%.

---

### **4. The Central Limit Theorem (CLT)**

Now let’s introduce one of the most powerful concepts in statistics: the **Central Limit Theorem (CLT)**.

#### **4.1 What is the Central Limit Theorem?**
The Central Limit Theorem states that **the distribution of the sample mean** will be approximately **normally distributed**, regardless of the shape of the original population’s distribution, as long as the sample size is sufficiently large (a common rule of thumb is \(n \geq 30\)) and the population has a finite variance.

This theorem is incredibly important because it allows us to use normal distribution methods to make inferences about population means, even when the original data is not normally distributed.

---

#### **4.2 Why is the CLT Important?**
- It **justifies the use of the normal distribution** for inference (such as confidence intervals and hypothesis tests) when we work with sample means.
- Even if the population is not normally distributed, the sampling distribution of the sample mean becomes approximately normal as the sample size increases.
- This gives us the ability to make probabilistic statements about sample means.

---

#### **4.3 The Sampling Distribution of the Sample Mean**
Let’s explain this concept in more detail:
- **Population Mean (\(\mu\))**: The true mean of the entire population.
- **Sample Mean (\(\bar{x}\))**: The mean of a sample taken from the population.
- **Sampling Distribution of \(\bar{x}\)**: If we take multiple samples of size \(n\) from a population and calculate the mean of each sample, the distribution of these sample means forms a sampling distribution.

According to the CLT, for a large sample size \(n\), the sampling distribution of \(\bar{x}\):
- Has a **mean** equal to the population mean \(\mu\).
- Has a **standard deviation** (called the **standard error**) equal to:

\[
\text{Standard Error} = \frac{\sigma}{\sqrt{n}}
\]

Where:
- \(\sigma\) is the standard deviation of the population.
- \(n\) is the sample size.

---

#### **4.4 How the CLT Works**

Imagine we’re studying the heights of people in a city. The population distribution is not normal; maybe it’s skewed. However, if we take random samples of heights, say 30 people at a time, and calculate the average height for each sample, the distribution of those sample means will be approximately normal, even if the original data is skewed.
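
A small simulation can make this concrete. The sketch below is only an illustration under stated assumptions: it uses an exponential distribution (mean 1, standard deviation 1) as a stand-in for a skewed population, not real height data, and a fixed random seed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable

n = 30                   # size of each sample
num_samples = 5000       # how many samples we draw

# Skewed population (assumed): exponential with rate 1, so mean = 1 and sd = 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to 1
sd_of_means = statistics.pstdev(sample_means)    # should be close to 1 / sqrt(30)
predicted_se = 1 / n ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_se, 3))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the individual values come from a strongly right-skewed distribution.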

---

#### **Example 2:**
Suppose the weight of apples in a farm is skewed to the right (heavier apples are rarer). The mean weight is 200 grams with a standard deviation of 50 grams. What’s the probability that the average weight of a random sample of 36 apples will be less than 190 grams?

Step 1: Calculate the standard error:
\[
\text{Standard Error} = \frac{50}{\sqrt{36}} = \frac{50}{6} = 8.33
\]

Step 2: Standardize using the z-score formula:
\[
z = \frac{190 - 200}{8.33} = \frac{-10}{8.33} \approx -1.2
\]

Step 3: Using a standard normal table (or calculator), find the probability for \(z < -1.2\). From the table, the probability is approximately 0.1151.

Thus, the probability that the average weight of the sample of apples is less than 190 grams is about 11.51%.
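
The three steps of the apple example can be reproduced with `statistics.NormalDist`. This sketch relies on the CLT approximation described above (the sample mean is treated as normal with standard deviation \(\sigma / \sqrt{n}\)); the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma, n = 200, 50, 36

# Step 1: standard error of the sample mean.
standard_error = sigma / n ** 0.5

# Step 2: z-score of a sample mean of 190 grams.
z = (190 - mu) / standard_error

# Step 3: P(sample mean < 190) under the approximate normal sampling distribution.
sampling_dist = NormalDist(mu, standard_error)
p_below_190 = sampling_dist.cdf(190)

print(round(standard_error, 2), round(z, 2), round(p_below_190, 4))
```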

---

### **5. Applying the Central Limit Theorem in Real Life**

- **Surveying populations**: If you survey a random sample of 100 people about their opinions, the sample mean opinion score will be approximately normally distributed, even if individual opinions are skewed.
- **Manufacturing quality control**: Checking the average diameter of screws produced in a factory. If each sample is large enough, the sample mean diameter will approximately follow a normal distribution, even if individual diameters vary unpredictably.

---

### **6. Conditions for the Central Limit Theorem**
For the CLT to hold, certain conditions should be met:
- The sample size should be **large enough** (\(n \geq 30\)).
- The samples must be **independent** (one sample doesn’t affect another).
- If the population is highly skewed or has outliers, a larger sample size is needed for the CLT to apply.

---

### **Summary of Day 4**:
1. We reviewed the **normal distribution** and its importance in statistics.
2. We learned about **z-scores** to standardize any normal distribution into a standard normal distribution.
3. We introduced the **Central Limit Theorem (CLT)**, which explains how sample means from any population become approximately normally distributed when the sample size is large enough.

---

That concludes Day 4! In our next class, we’ll move on to **confidence intervals** and see how we can estimate population parameters using sample data.

### **Statistics 705: Day 5 – Confidence Intervals**

Welcome to Day 5! Today’s lesson will focus on **confidence intervals**, one of the most commonly used methods in statistics for estimating unknown population parameters. We’ll explain how to construct confidence intervals for the population mean, what they mean, and how to interpret them.

---

### **1. Introduction to Confidence Intervals**

A **confidence interval (CI)** provides a range of values that, with a certain level of confidence, is believed to contain the population parameter (such as the mean). The idea is that, instead of estimating the population mean (\(\mu\)) with a single value, we estimate it with a range.

#### **Key Components of a Confidence Interval**:
1. **Point estimate**: A single statistic calculated from sample data (such as the sample mean \(\bar{x}\)).
2. **Margin of error**: The range of uncertainty around the point estimate. It accounts for the variability in the sample data.
3. **Confidence level**: The long-run proportion of intervals, built by the same method from repeated samples, that contain the population parameter. Common confidence levels are 90%, 95%, and 99%.

---

#### **1.1 Confidence Interval Formula for Population Mean**

If the population standard deviation (\(\sigma\)) is known and the sample size is large (\(n \geq 30\)), the formula for the confidence interval for the population mean is:

\[
\text{CI} = \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)
\]

Where:
- \(\bar{x}\) is the sample mean.
- \(z_{\alpha/2}\) is the **z-value** corresponding to the desired confidence level (e.g., for 95% confidence, \(z_{\alpha/2} = 1.96\)).
- \(\sigma\) is the population standard deviation (if known).
- \(n\) is the sample size.
- \(\frac{\sigma}{\sqrt{n}}\) is called the **standard error**.

---

#### **1.2 Common Z-values for Different Confidence Levels**:
- **90% confidence**: \(z_{\alpha/2} = 1.645\)
- **95% confidence**: \(z_{\alpha/2} = 1.96\)
- **99% confidence**: \(z_{\alpha/2} = 2.576\)

---

#### **Example 1: Confidence Interval for a Population Mean (Known \(\sigma\))**

Suppose a company wants to estimate the average amount of time employees spend commuting to work. They randomly sample 100 employees and find a sample mean commute time of 30 minutes. The population standard deviation is known to be 8 minutes. Calculate a 95% confidence interval for the population mean commute time.

**Step 1: Identify the values**:
- \(\bar{x} = 30\) (sample mean)
- \(\sigma = 8\) (population standard deviation)
- \(n = 100\) (sample size)
- For 95% confidence, \(z_{\alpha/2} = 1.96\)

**Step 2: Calculate the standard error**:
\[
\text{Standard Error} = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{100}} = \frac{8}{10} = 0.8
\]

**Step 3: Calculate the margin of error**:
\[
\text{Margin of Error} = z_{\alpha/2} \times \text{Standard Error} = 1.96 \times 0.8 = 1.568
\]

**Step 4: Calculate the confidence interval**:
\[
\text{CI} = 30 \pm 1.568 = (30 - 1.568, 30 + 1.568) = (28.432, 31.568)
\]

So, the 95% confidence interval for the population mean commute time is **(28.43, 31.57)** minutes. This means that we are 95% confident that the true population mean commute time falls within this range.
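
The four steps can be collected into one small function. This is a sketch, assuming a known population standard deviation; the name `z_confidence_interval` is hypothetical, and the z-value is looked up with `NormalDist().inv_cdf` from the standard library instead of a table.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sd (sigma) is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    margin = z_crit * sigma / n ** 0.5             # z * standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(30, 8, 100)

print(round(low, 2), round(high, 2))  # 28.43 31.57
```

Changing `confidence` to 0.90 or 0.99 reproduces the other z-values listed earlier (about 1.645 and 2.576) and shows how the interval narrows or widens.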

---

### **2. Interpreting Confidence Intervals**

#### **What does a confidence interval mean?**
If we say we are "95% confident" that the true mean is between 28.43 and 31.57 minutes, it means that if we took many samples and calculated a confidence interval for each one, about 95% of those intervals would contain the true population mean. **It does not mean there’s a 95% chance that the population mean is within any one specific interval**.
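
The "many samples" interpretation can be illustrated with a simulation. The sketch below invents a population whose true mean is known (normal, mean 30, standard deviation 8, chosen to echo the commute example), draws many samples of 100, and counts how often the 95% interval captures the true mean. All of these settings are assumptions for illustration.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)                 # fixed seed so the run is repeatable
true_mean, sigma, n = 30, 8, 100       # assumed "true" population for the simulation

z_crit = NormalDist().inv_cdf(0.975)   # critical value for 95% confidence
margin = z_crit * sigma / n ** 0.5

trials = 2000
covered = 0
for _ in range(trials):
    sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
    xbar = fmean(sample)
    # Does this sample's interval contain the true mean?
    if xbar - margin <= true_mean <= xbar + margin:
        covered += 1

coverage = covered / trials
print(coverage)
```

The printed coverage should land near 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the method's long-run success rate.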

#### **Key points to remember**:
1. **Wider intervals** provide more certainty but less precision.
2. **Narrower intervals** provide more precision but less certainty.
3. A higher confidence level (e.g., 99%) will result in a wider interval because you are being more conservative.
4. A larger sample size decreases the margin of error and results in a narrower confidence interval.

---

### **3. Confidence Interval for Population Mean (Unknown \(\sigma\))**

If the population standard deviation \(\sigma\) is unknown, we use the sample standard deviation \(s\) instead, and the **t-distribution** replaces the **z-distribution**.

#### **Formula for Confidence Interval (Unknown \(\sigma\))**:

\[
\text{CI} = \bar{x} \pm t_{\alpha/2, df} \left( \frac{s}{\sqrt{n}} \right)
\]

Where:
- \(t_{\alpha/2, df}\) is the **t-value** from the t-distribution, based on the confidence level and degrees of freedom (\(df = n - 1\)).
- \(s\) is the sample standard deviation.
- The rest of the terms are the same as before.

---

#### **Example 2: Confidence Interval for a Population Mean (Unknown \(\sigma\))**

A sample of 25 people gives a sample mean weight of 150 pounds and a sample standard deviation of 15 pounds. Calculate a 95% confidence interval for the population mean.

**Step 1: Identify the values**:
- \(\bar{x} = 150\) (sample mean)
- \(s = 15\) (sample standard deviation)
- \(n = 25\) (sample size)
- For 95% confidence and \(df = 24\), the t-value \(t_{\alpha/2, 24}\) is approximately 2.064 (from a t-table).

**Step 2: Calculate the standard error**:
\[
\text{Standard Error} = \frac{s}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3
\]

**Step 3: Calculate the margin of error**:
\[
\text{Margin of Error} = t_{\alpha/2, df} \times \text{Standard Error} = 2.064 \times 3 = 6.192
\]

**Step 4: Calculate the confidence interval**:
\[
\text{CI} = 150 \pm 6.192 = (150 - 6.192, 150 + 6.192) = (143.808, 156.192)
\]

So, the 95% confidence interval for the population mean weight is **(143.81, 156.19)** pounds.
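
The four steps of Example 2 can be reproduced in a few lines of Python. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value is passed in from the t-table rather than computed, since Python's standard library has no t-distribution.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit must be looked up for the chosen confidence level and df = n - 1.
    """
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error
    return xbar - margin, xbar + margin


# Values from Example 2; 2.064 is the t-table value for 95% confidence, df = 24.
lower, upper = t_interval(xbar=150, s=15, n=25, t_crit=2.064)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # 95% CI: (143.81, 156.19)
```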

---

### **4. Confidence Intervals for Proportions**

Sometimes, we are interested in estimating a population proportion rather than a mean. In this case, we use a similar method for constructing a confidence interval, but the formula is slightly different.

#### **Formula for Confidence Interval for a Proportion**:

\[
\text{CI} = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
\]

Where:
- \(\hat{p}\) is the sample proportion (e.g., the proportion of successes in the sample).
- \(n\) is the sample size.
- \(z_{\alpha/2}\) is the z-value corresponding to the desired confidence level.

---

#### **Example 3: Confidence Interval for a Population Proportion**

A survey of 200 people finds that 120 people support a new policy. Estimate the 95% confidence interval for the proportion of the population that supports the policy.

**Step 1: Identify the values**:
- \(\hat{p} = \frac{120}{200} = 0.60\)
- \(n = 200\)
- For 95% confidence, \(z_{\alpha/2} = 1.96\)

**Step 2: Calculate the standard error**:
\[
\text{Standard Error} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.60(1 - 0.60)}{200}} = \sqrt{\frac{0.24}{200}} = \sqrt{0.0012} \approx 0.03464
\]

**Step 3: Calculate the margin of error**:
\[
\text{Margin of Error} = 1.96 \times 0.03464 \approx 0.0679
\]

**Step 4: Calculate the confidence interval**:
\[
\text{CI} = 0.60 \pm 0.0679 = (0.60 - 0.0679, 0.60 + 0.0679) = (0.5321, 0.6679)
\]

So, the 95% confidence interval for the proportion of the population that supports the policy is **(0.532, 0.668)**, or between 53.2% and 66.8%.
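
As a quick check of Example 3, the same calculation can be sketched in Python. The function name `proportion_interval` is an assumption for illustration; the z-value comes from the standard library's `NormalDist` instead of a table, so it is 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Survey from Example 3: 120 supporters out of 200 respondents.
lower, upper = proportion_interval(120, 200)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.532, 0.668)
```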

---

### **5. Summary of Key Concepts**

- A **confidence interval** is a range of values that we believe contains the true population parameter.
- The **wider the interval**, the more confident we are that it contains the population parameter, but with less precision.
- **Higher confidence levels** lead to wider intervals, while **larger sample sizes** lead to narrower intervals.
- We use the **t-distribution** when the population standard deviation is unknown and must be estimated by the sample standard deviation.

---

This concludes Day 5! Feel free to reach out if you have any questions or need additional explanations on any of the concepts discussed today.

### **Statistics 705: Day 6 – Hypothesis Testing**

Welcome to Day 6! Today, we’ll dive into **hypothesis testing**, a fundamental concept in statistics that helps us make inferences about population parameters based on sample data. We'll cover the steps involved in hypothesis testing, types of errors, and practical examples.

---

### **1. Introduction to Hypothesis Testing**

**Hypothesis testing** is a statistical method that allows us to make decisions about a population based on sample data. We start with a claim (hypothesis) about a population parameter and use sample data to test whether there is enough evidence to support that claim.

#### **Key Terms**:
- **Null Hypothesis (\(H_0\))**: The statement we want to test. It typically represents no effect or no difference. 
- **Alternative Hypothesis (\(H_a\) or \(H_1\))**: The statement we want to support. It represents an effect or a difference.
- **Significance Level (\(\alpha\))**: The probability of rejecting the null hypothesis when it is actually true. Common values are 0.05, 0.01, and 0.10.

---

### **2. Steps in Hypothesis Testing**

1. **State the Hypotheses**:
   - Formulate the null hypothesis (\(H_0\)) and alternative hypothesis (\(H_a\)).
   
2. **Choose a Significance Level (\(\alpha\))**:
   - Decide how much risk of a Type I error (rejecting \(H_0\) when it is true) is acceptable.

3. **Collect Data**:
   - Gather sample data relevant to the hypotheses.

4. **Calculate the Test Statistic**:
   - Determine the appropriate statistical test (e.g., t-test, z-test) and calculate the test statistic.

5. **Determine the Critical Value(s) or P-value**:
   - Compare the test statistic to critical values or calculate the P-value.

6. **Make a Decision**:
   - Reject \(H_0\) if the test statistic is in the critical region or if the P-value is less than \(\alpha\). Otherwise, do not reject \(H_0\).

7. **Draw a Conclusion**:
   - Interpret the results in the context of the problem.

---

### **3. Types of Hypotheses**

#### **3.1 One-Tailed vs. Two-Tailed Tests**:
- **One-Tailed Test**: Tests for the possibility of the relationship in one direction (either greater than or less than).
  - Example: \(H_0: \mu \leq 50\) vs. \(H_a: \mu > 50\)
  
- **Two-Tailed Test**: Tests for the possibility of the relationship in both directions (either greater than or less than).
  - Example: \(H_0: \mu = 50\) vs. \(H_a: \mu \neq 50\)

---

### **4. Types of Errors**

1. **Type I Error** (\(\alpha\)): Rejecting \(H_0\) when it is true. The probability of making this error is the significance level (\(\alpha\)).
  
2. **Type II Error** (\(\beta\)): Failing to reject \(H_0\) when it is false. The probability of making this error depends on the true parameter value and the sample size.
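
To see why the Type I error rate equals \(\alpha\), one can simulate many tests in a situation where \(H_0\) really is true. The sketch below makes several assumptions for illustration: a two-tailed z-test with known \(\sigma\), hypothetical values `mu0=50`, `sigma=10`, `n=30`, and a fixed seed.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=50.0, sigma=10.0, n=30, alpha=0.05, trials=4000, seed=1):
    """Share of two-tailed z-tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean mu0, so H0: mu = mu0 holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Share of false rejections: {rate:.3f}")
```

The printed share should be close to 0.05, the chosen significance level.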

---

### **5. Example of Hypothesis Testing**

#### **Example 1: Testing a Population Mean**

Suppose a manufacturer claims that the average lifetime of a light bulb is 1,000 hours. A consumer group believes that the actual average lifetime is less than this. To test this claim, we will conduct a hypothesis test.

**Step 1: State the Hypotheses**:
- \(H_0: \mu = 1000\) hours (the manufacturer's claim)
- \(H_a: \mu < 1000\) hours (the consumer group’s belief)

**Step 2: Choose a Significance Level**:
- Let’s choose \(\alpha = 0.05\).

**Step 3: Collect Data**:
- A sample of 30 light bulbs is tested, and the sample mean lifetime is found to be 980 hours, with a sample standard deviation of 50 hours.

**Step 4: Calculate the Test Statistic**:
- We will use a t-test because the population standard deviation is unknown and must be estimated from the sample (\(n = 30\), so \(df = 29\)).

The formula for the t-test statistic is:
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
\]
Where:
- \(\bar{x} = 980\)
- \(\mu_0 = 1000\)
- \(s = 50\)
- \(n = 30\)

Calculating:
\[
t = \frac{980 - 1000}{50/\sqrt{30}} = \frac{-20}{9.1287} \approx -2.19
\]

**Step 5: Determine the Critical Value**:
- For a one-tailed test with \(df = n - 1 = 29\) and \(\alpha = 0.05\), the critical t-value from the t-table is approximately \(-1.699\).

**Step 6: Make a Decision**:
- Since \(-2.19 < -1.699\), we reject \(H_0\).

**Step 7: Draw a Conclusion**:
- There is sufficient evidence at the 0.05 significance level to conclude that the average lifetime of the light bulbs is less than 1,000 hours.
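
The calculation and decision in this example can be sketched in Python. The function name `one_sample_t` is hypothetical, and the critical value is taken from the t-table as in Step 5 rather than computed.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """t statistic for testing a population mean with unknown sigma."""
    return (xbar - mu0) / (s / math.sqrt(n))


t_stat = one_sample_t(xbar=980, mu0=1000, s=50, n=30)
t_crit = -1.699  # t-table value for a left-tailed test, alpha = 0.05, df = 29
reject_h0 = t_stat < t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")  # t = -2.19, reject H0: True
```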

---

### **6. Example of Hypothesis Testing for Proportions**

#### **Example 2: Testing a Population Proportion**

A survey shows that 60 out of 100 voters support a new policy. We want to test if more than 50% of the voters support it.

**Step 1: State the Hypotheses**:
- \(H_0: p = 0.5\) (50% support)
- \(H_a: p > 0.5\) (more than 50% support)

**Step 2: Choose a Significance Level**:
- Let’s choose \(\alpha = 0.05\).

**Step 3: Collect Data**:
- \(\hat{p} = \frac{60}{100} = 0.60\)
- \(n = 100\)

**Step 4: Calculate the Test Statistic**:
The formula for the z-test statistic for proportions is:
\[
z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}
\]
Where \(p_0 = 0.5\).

Calculating:
\[
z = \frac{0.60 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{100}}} = \frac{0.10}{0.05} = 2.00
\]

**Step 5: Determine the Critical Value**:
- For a one-tailed test at \(\alpha = 0.05\), the critical z-value is approximately \(1.645\).

**Step 6: Make a Decision**:
- Since \(2.00 > 1.645\), we reject \(H_0\).

**Step 7: Draw a Conclusion**:
- There is sufficient evidence at the 0.05 significance level to conclude that more than 50% of the voters support the policy.
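
The same test can be run through the P-value route mentioned in the general steps. This is a minimal sketch with a hypothetical function name, using `NormalDist` from the standard library for the normal tail area.

```python
import math
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Right-tailed z-test for H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


z, p_value = proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # z = 2.00, p-value = 0.0228
```

Because the P-value is below \(\alpha = 0.05\), this route leads to the same decision as the critical-value comparison.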

---

### **7. Summary of Key Concepts**

- **Hypothesis testing** involves making claims about population parameters and using sample data to validate those claims.
- The process involves stating hypotheses, collecting data, calculating test statistics, and making decisions based on significance levels and critical values or P-values.
- Understanding the concepts of Type I and Type II errors is crucial for interpreting results correctly.

In the next class, we’ll cover **power analysis**, which helps us determine the sample size needed for a study to detect an effect if it exists.

---

This concludes Day 6! If you have any questions or need clarification on any topic, please feel free to ask!

### **Statistics 705: Day 7 – Power Analysis**

Welcome to Day 7! Today, we’ll explore **power analysis**, an important concept in hypothesis testing that helps us determine the sample size needed for a study to detect an effect if it exists. Understanding power is crucial for designing experiments and interpreting results effectively.

---

### **1. What is Power Analysis?**

**Power** is the probability of correctly rejecting the null hypothesis when it is false. In other words, it measures the likelihood that a test will detect an effect or difference when there is one.

#### **Key Terms**:
- **Power (1 - β)**: The probability of rejecting \(H_0\) when it is false.
- **Type I Error (α)**: The probability of rejecting \(H_0\) when it is true.
- **Type II Error (β)**: The probability of failing to reject \(H_0\) when it is false.
- **Effect Size**: A measure of the magnitude of a phenomenon. It helps to determine how big of a difference we want to detect.
- **Sample Size (n)**: The number of observations in the study.

---

### **2. Importance of Power Analysis**

Conducting a power analysis before collecting data can help researchers:
- Determine the **minimum sample size** required to detect an effect.
- Ensure that the study has a **sufficient chance of detecting an effect**, minimizing the risk of a Type II error.
- Optimize resource allocation by not oversampling or undersampling.

---

### **3. Factors Affecting Power**

Several factors can influence the power of a statistical test:

1. **Sample Size (n)**: Larger sample sizes increase power because they shrink the standard error of the estimate, so the same effect stands out more clearly from sampling noise.
  
2. **Effect Size**: Larger effect sizes are easier to detect, thus increasing power. For example, detecting a difference in means of 10 is easier than detecting a difference of 1.

3. **Significance Level (α)**: Increasing \(\alpha\) (e.g., from 0.01 to 0.05) increases power because it makes it easier to reject \(H_0\), but it also increases the risk of a Type I error.

4. **Variability in the Data**: Lower variability (smaller standard deviation) in the data increases power. More consistent data allows for clearer detection of effects.
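
These four factors can be explored numerically. The sketch below assumes a two-sided, two-sample comparison and uses a normal approximation, so it is a rough guide and not an exact t-test power calculation. The function name `approx_power` is hypothetical. Variability enters through the standardized effect size (difference in means divided by the standard deviation), so a smaller standard deviation means a larger effect size.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    effect_size is the standardized difference in means (Cohen's d).
    Uses the normal approximation and ignores the tiny chance of
    rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    return NormalDist().cdf(shift - z_crit)


# Power rises with sample size for a fixed medium effect (d = 0.5).
for n in (20, 63, 100):
    print(n, round(approx_power(n, 0.5), 3))

# Power also rises with a larger effect size or a larger alpha.
print(round(approx_power(63, 0.8), 3), round(approx_power(63, 0.5, alpha=0.10), 3))
```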

---

### **4. Conducting a Power Analysis**

#### **4.1 Example: Determining Sample Size**

Suppose we want to conduct a study to determine if a new teaching method is more effective than the traditional method. We expect a medium effect size (0.5) and want to use a significance level of \(\alpha = 0.05\).

**Step 1: Define Parameters**:
- Effect Size (Cohen’s d): 0.5 (medium)
- Significance Level (\(\alpha\)): 0.05
- Desired Power: 0.80 (80% chance of detecting an effect)

**Step 2: Use a Power Analysis Formula**:
For a two-sample t-test, a common normal-approximation formula for the sample size (\(n\)) per group is:
\[
n = \left( \frac{(Z_{\alpha/2} + Z_{1-\beta})^2 \cdot (2\sigma^2)}{d^2} \right)
\]

Where:
- \(Z_{\alpha/2}\) = z-score corresponding to the significance level
- \(Z_{1-\beta}\) = z-score corresponding to the desired power
- \(\sigma\) = standard deviation (assumed or estimated)
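- \(d\) = the difference in means we want to detect, in the same units as \(\sigma\) (with \(\sigma = 1\), this equals Cohen’s d)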

Using standard values:
- For \(\alpha = 0.05\), \(Z_{\alpha/2} \approx 1.96\)
- For \(1 - \beta = 0.80\), \(Z_{1-\beta} \approx 0.84\)

**Step 3: Plug in Values**:
Assuming \(\sigma = 1\) for simplicity:
\[
n = \left( \frac{(1.96 + 0.84)^2 \cdot (2 \cdot 1^2)}{0.5^2} \right) = \left( \frac{(2.8)^2 \cdot 2}{0.25} \right) \approx \left( \frac{7.84 \cdot 2}{0.25} \right) \approx 62.72
\]
This means we would need approximately **63 participants per group** (total of 126).
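
The sample-size formula can be sketched as a small Python function. The name `sample_size_per_group` and its parameter names are assumptions for illustration. It uses unrounded z-values from `NormalDist`, so the intermediate result is about 62.79 instead of 62.72, which still rounds up to the same answer.

```python
import math
from statistics import NormalDist


def sample_size_per_group(diff, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison.

    diff is the difference in means to detect, in the same units as sigma.
    Based on the normal approximation, so exact t-based tools may give a
    slightly larger answer.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / diff ** 2
    return math.ceil(n)  # always round up to a whole participant


n = sample_size_per_group(diff=0.5, sigma=1.0)
print(n)  # 63
```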

---

### **5. Software for Power Analysis**

Power analysis can be complex and is often conducted using statistical software or specific tools. Popular options include:

- **G*Power**: A widely used free tool for conducting power analyses for various tests.
- **R and Python**: Both programming languages have packages (e.g., `pwr` in R, `statsmodels` in Python) for power analysis.

---

### **6. Conclusion and Importance of Power Analysis**

Power analysis is essential for:
- Ensuring that studies are adequately designed to detect meaningful effects.
- Avoiding waste of resources on studies that cannot provide definitive results.
- Enhancing the credibility of research findings by reducing the risk of Type II errors.

In our next class, we will discuss **regression analysis**, which helps us understand relationships between variables.

---

This concludes Day 7! If you have any questions or need further clarification, please don’t hesitate to ask!

### **Statistics 705: Day 8 – Introduction to Regression Analysis**

Welcome to Day 8! Today, we’ll explore **regression analysis**, a powerful statistical method used to understand relationships between variables. We'll cover the basics of linear regression, including the concepts of dependent and independent variables, how to interpret regression coefficients, and how to evaluate model fit.

---

### **1. What is Regression Analysis?**

**Regression analysis** is a statistical technique used to model the relationship between a dependent variable (outcome) and one or more independent variables (predictors). It helps us understand how changes in the independent variables affect the dependent variable.

#### **Key Terms**:
- **Dependent Variable (Response Variable)**: The outcome we are trying to predict or explain (e.g., sales, test scores).
- **Independent Variable (Predictor Variable)**: The variable(s) used to predict the dependent variable (e.g., advertising budget, study hours).

---

### **2. Simple Linear Regression**

**Simple linear regression** involves one dependent variable and one independent variable. The goal is to find the best-fitting line that describes the relationship between these variables.

#### **2.1 The Regression Equation**

The equation for a simple linear regression model is:
\[
Y = \beta_0 + \beta_1X + \epsilon
\]
Where:
- \(Y\) = dependent variable
- \(X\) = independent variable
- \(\beta_0\) = y-intercept (the predicted value of \(Y\) when \(X = 0\))
- \(\beta_1\) = slope of the line (the change in \(Y\) for a one-unit change in \(X\))
- \(\epsilon\) = error term (the deviation of an observed \(Y\) from the value given by the line \(\beta_0 + \beta_1X\))

---

### **3. Interpreting the Coefficients**

1. **Intercept (\(\beta_0\))**: The predicted value of the dependent variable when the independent variable is zero. 
   - Example: If \(\beta_0 = 5\), when \(X = 0\), \(Y\) is predicted to be 5.

2. **Slope (\(\beta_1\))**: Indicates the direction of the relationship and how much the dependent variable is expected to change for a one-unit change in the independent variable.
   - Example: If \(\beta_1 = 2\), for every one-unit increase in \(X\), \(Y\) is expected to increase by 2 units.

---

### **4. Example of Simple Linear Regression**

#### **Example: Predicting Sales Based on Advertising Spend**

Suppose a company wants to predict its sales based on its advertising expenditure. Here’s the data collected for the past five months:

| Month | Advertising Spend (X) | Sales (Y) |
|-------|------------------------|-----------|
| 1     | 1000                   | 15000     |
| 2     | 2000                   | 25000     |
| 3     | 3000                   | 35000     |
| 4     | 4000                   | 45000     |
| 5     | 5000                   | 55000     |

**Step 1: Fit the Regression Model**
Using statistical software or calculations, we fit a simple linear regression model. Because these five points lie exactly on a straight line, the least-squares fit gives:
- \(\beta_0 = 5000\) (intercept)
- \(\beta_1 = 10\) (slope)

**Step 2: Write the Regression Equation**
\[
Y = 5000 + 10X
\]

**Step 3: Interpret the Results**
- The intercept (5000) means that if no advertising is done (X=0), the model predicts sales of 5,000. Since \(X = 0\) lies outside the observed range of 1,000 to 5,000, this prediction is an extrapolation.
- The slope (10) indicates that for every additional dollar spent on advertising, predicted sales increase by $10.
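
One way to check a fit like this is to compute the least-squares line directly from the table. The sketch below uses the standard formulas (slope = \(S_{xy}/S_{xx}\), intercept = \(\bar{y} - \text{slope} \cdot \bar{x}\)); the function name `fit_line` is hypothetical.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data from the table above.
spend = [1000, 2000, 3000, 4000, 5000]
sales = [15000, 25000, 35000, 45000, 55000]

intercept, slope = fit_line(spend, sales)
print(intercept, slope)  # 5000.0 10.0
```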

---

### **5. Evaluating the Model Fit**

To assess how well the model fits the data, we commonly use the following metrics:

#### **5.1 Coefficient of Determination (R²)**

The **R²** value indicates the proportion of variance in the dependent variable that can be explained by the independent variable(s). It ranges from 0 to 1:
- \(R² = 0\): The independent variable does not explain any of the variability in the dependent variable.
- \(R² = 1\): The independent variable explains all the variability in the dependent variable.

#### **5.2 Residuals**

**Residuals** are the differences between observed values and predicted values:
\[
\text{Residual} = Y - \hat{Y}
\]
Where \(\hat{Y}\) is the predicted value from the regression equation.

Analyzing residuals helps to check assumptions of the regression model (e.g., linearity, homoscedasticity, and normality).
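
Residuals and \(R^2\) can be computed by hand for a fitted line. The sketch below is illustrative: the function names are assumptions, and the second dataset (`hours`, `scores`) is made up purely to show a fit that is not perfect. Its line, \(2.2 + 0.6X\), is the least-squares fit for those five made-up points.

```python
def residuals(xs, ys, intercept, slope):
    """Observed minus predicted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals(xs, ys, intercept, slope))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Advertising data from Section 4: the points lie exactly on Y = 5000 + 10X.
spend = [1000, 2000, 3000, 4000, 5000]
sales = [15000, 25000, 35000, 45000, 55000]
print(residuals(spend, sales, 5000, 10))  # [0, 0, 0, 0, 0]
print(r_squared(spend, sales, 5000, 10))  # 1.0

# Hypothetical noisy data, for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
noisy_r2 = r_squared(hours, scores, 2.2, 0.6)
print(round(noisy_r2, 2))
```

For the made-up data the line explains roughly 60% of the variance, and the residuals are no longer all zero, which is the more typical situation in practice.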

---

### **6. Multiple Linear Regression**

When there are multiple independent variables, we can extend simple linear regression to **multiple linear regression**. The equation for multiple linear regression is:
\[
Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_kX_k + \epsilon
\]
Where \(X_1, X_2, ..., X_k\) are the independent variables.

This allows us to assess the effect of several predictors simultaneously. The interpretation of coefficients remains similar, indicating the effect of each independent variable while holding others constant.

---

### **7. Conclusion and Importance of Regression Analysis**

Regression analysis is a valuable tool for:
- Understanding relationships between variables.
- Making predictions based on data.
- Guiding decision-making in various fields, including business, healthcare, and social sciences.

In our next class, we will delve deeper into **assumptions of regression analysis**, including how to check them and why they are important.

---

This concludes Day 8! If you have any questions or need further clarification, please don’t hesitate to ask!