# Basics of Statistics: Questions and Answers

## Question No.1 Explain the different types of data (qualitative and quantitative) and provide examples of each. Discuss nominal, ordinal, interval, and ratio scales.
### Types of Data: Qualitative vs. Quantitative

Data can generally be divided into two main categories: **qualitative** and **quantitative** data.

#### 1. **Qualitative Data (Categorical Data)**
- **Definition:** This type of data describes qualities or characteristics. It’s used to categorize things based on some attribute or characteristic. Qualitative data doesn't deal with numbers but instead with labels or categories.
- **Examples:**
  - **Colors**: Red, blue, green (categorizes items based on color)
  - **Names of cities**: Paris, New York, Tokyo
  - **Types of fruits**: Apple, banana, orange
  - **Genres of music**: Pop, rock, jazz

##### Qualitative Data is further divided into:
- **Nominal Data**: This is data that represents categories with no particular order or ranking. The categories are just different from each other but don't have a logical order.
  - **Examples:** Gender (male, female), hair color (black, brown, blonde), car brands (Toyota, Ford, Honda)

- **Ordinal Data**: This data also represents categories, but unlike nominal data, there is a clear order or ranking between the categories. However, the difference between these ranks isn’t necessarily equal.
  - **Examples:** 
    - **Education level**: High school, Bachelor's degree, Master's degree, Ph.D. (There’s an order, but the difference between each level isn't the same.)
    - **Movie ratings**: 1 star, 2 stars, 3 stars, etc. (There’s a rank, but we can’t say the difference between 1 and 2 stars is the same as 3 and 4 stars.)

#### 2. **Quantitative Data (Numerical Data)**
- **Definition:** This type of data is expressed in numbers and can be measured or counted. Because the values are numeric, arithmetic on them is meaningful: differences can always be compared, and for ratio-scale data (described below) so can ratios. It's used when you want to quantify things.
- **Examples:**
  - **Age**: 25 years, 30 years, 45 years
  - **Height**: 160 cm, 175 cm, 180 cm
  - **Income**: $20,000, $50,000, $100,000
  
##### Quantitative Data is further divided into:
- **Interval Data**: This data has ordered values, and the differences between values are meaningful. However, it does not have a true zero point (the "zero" is arbitrary and doesn't mean "nothing").
  - **Example**: Temperature in Celsius or Fahrenheit (0°C does not mean there is no temperature; it's just a point on the scale).

- **Ratio Data**: This data has ordered values with meaningful differences between them, and it also has a true zero point, meaning "zero" means none or nothing.
  - **Example**: Height (0 cm means no height), weight (0 kg means no weight), income (0 dollars means no income). Since ratio data has a true zero, you can perform all arithmetic operations, such as multiplying or dividing.
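The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below is only an illustration: the two temperature readings are made up, and the Kelvin scale (whose zero is absolute zero) is brought in as an example of a ratio scale for temperature.

```python
# Interval vs. ratio scale: the same two temperatures on both scales.
KELVIN_OFFSET = 273.15  # 0 K is absolute zero, a true zero point

def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return celsius + KELVIN_OFFSET

cool_c, warm_c = 10.0, 20.0  # hypothetical readings
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences agree on both scales, so they are meaningful on an interval scale.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios disagree: "twice as hot" only makes sense where zero means "none".
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.1f} °C vs {diff_k:.1f} K")
print(f"ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The difference is the same on both scales, but the ratio changes with the choice of zero, which is why 20°C is not "twice as hot" as 10°C.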

### Summary

- **Qualitative Data** is about **categories** and includes:
  - **Nominal**: No order (e.g., gender, car brands)
  - **Ordinal**: Ordered categories (e.g., education level, rankings)
  
- **Quantitative Data** is about **numbers** and includes:
  - **Interval**: Ordered numbers with meaningful differences but no true zero (e.g., temperature)
  - **Ratio**: Ordered numbers with meaningful differences and a true zero (e.g., height, weight, income)

Understanding these types of data is crucial for choosing the right methods to analyze and interpret them in research or surveys.

## Question No.2 What are the measures of central tendency, and when should you use each? Discuss the mean, median, and mode with examples and situations where each is appropriate.
### Measures of Central Tendency: Mean, Median, and Mode

In statistics, **measures of central tendency** are ways to describe the "center" or average of a data set. These measures help summarize a large amount of data with a single value that represents the middle or typical value. There are three main measures of central tendency: **mean**, **median**, and **mode**.

#### 1. **Mean (Average)**
- **Definition**: The **mean** is the sum of all the values in a data set divided by the number of values. It is the most commonly used measure of central tendency and is often referred to as the "average."
  
  **Formula**:  
  \[
  \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}
  \]

- **Example**:  
  Imagine you have the following test scores: 85, 90, 92, 88, and 94. To find the mean:
  \[
  \text{Mean} = \frac{85 + 90 + 92 + 88 + 94}{5} = \frac{449}{5} = 89.8
  \]
  So, the average test score is 89.8.

- **When to use the Mean**:  
  The **mean** is best used when the data is roughly **symmetric** (not skewed) and has no extreme values (outliers) that could distort the average.

  - **Good for**: Heights, test scores, and other measurements that cluster around a central value.
  - **Not good for**: Data with outliers or strong skew (such as incomes, where a few very high values pull the mean upward).

#### 2. **Median**
- **Definition**: The **median** is the middle value of a data set when the values are arranged in ascending or descending order. If there is an odd number of values, the median is the exact middle value. If there is an even number, the median is the average of the two middle values.

- **Example**:  
  Using the same test scores (85, 90, 92, 88, and 94), first arrange them in order:  
  85, 88, 90, 92, 94  
  Since there is an odd number of values, the median is the middle value:  
  The **median** is 90.

  For an even number of values, let’s say we had these scores: 85, 90, 92, 88. Arrange them:  
  85, 88, 90, 92  
  The median would be the average of the two middle values:  
  \[
  \text{Median} = \frac{88 + 90}{2} = 89
  \]
  So, the median is 89.

- **When to use the Median**:  
  The **median** is useful when the data has **outliers** or is **skewed** (not symmetrical). Because it depends only on the middle of the ordered data, it describes a typical value better than the mean when extreme values are present.

  - **Good for**: Income data (where a few very high incomes could skew the mean), home prices, age distribution, etc.
  - **Not good for**: Situations where every value should contribute to the result (for example, when the average will be used to work out a total) and the data is fairly symmetric.

#### 3. **Mode**
- **Definition**: The **mode** is the value that appears most frequently in a data set. There can be no mode, one mode, or multiple modes if several values occur with the same highest frequency.

- **Example**:  
  Let’s look at the following numbers: 5, 7, 7, 9, 10, 10, 10  
  Here, the **mode** is 10, because it appears more often than any other number (three times).

  - If you have the set: 1, 2, 2, 3, 4, 5, the **mode** is 2 because it appears twice.

- **When to use the Mode**:  
  The **mode** is best when you want to know which value occurs the most frequently, especially with categorical or qualitative data, or when you need to identify the most common occurrence.

  - **Good for**: Fashion trends (most common colors, sizes), the most common responses to a survey question, the most frequent category in a set of data.
  - **Not good for**: Numerical data with few repeated values.

### Summary of When to Use Each Measure

1. **Mean**: Use when the data is evenly distributed without extreme values. It gives a good overall average when outliers aren’t a concern.
   - **Example**: Average score of students in a class with no extreme outliers.

2. **Median**: Use when the data is skewed or has outliers, as it is barely affected by extreme values. The median gives a better "middle" value in these cases.
   - **Example**: Median income (because a few super-rich people can distort the mean).

3. **Mode**: Use when you want to know the most common value in a set of data. It’s particularly useful for categorical data.
   - **Example**: Most common shoe size sold in a store or most popular genre of music.

### Quick Recap:
- **Mean**: Average of all values (best for symmetric data).
- **Median**: Middle value when the data is ordered (best for skewed data with outliers).
- **Mode**: Most frequent value (best for identifying the most common category).
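The three measures can be checked with Python's standard `statistics` module. This is a minimal sketch that reuses the test scores and the mode example from this answer; the extra score of 0 is a hypothetical outlier added only to show how differently the mean and median react.

```python
import statistics

scores = [85, 90, 92, 88, 94]          # test scores from the mean example
mode_data = [5, 7, 7, 9, 10, 10, 10]   # numbers from the mode example

mean_score = statistics.mean(scores)      # 449 / 5
median_score = statistics.median(scores)  # middle value after sorting
most_common = statistics.mode(mode_data)  # most frequent value

# Hypothetical outlier: one student scores 0. The mean drops sharply,
# while the median barely moves.
with_outlier = scores + [0]
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_score, median_score, most_common)
print(round(mean_outlier, 2), median_outlier)
```

With the outlier included, the mean falls from 89.8 to about 74.8, while the median only moves from 90 to 89.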

## Question No.3 Explain the concept of dispersion. How do variance and standard deviation measure the spread of data?  
### Concept of Dispersion

**Dispersion** in statistics refers to how spread out or how "varied" the data is. In other words, it measures how far the individual data points are from the average (mean) value. When data is tightly packed together, the dispersion is low. When data points are spread out over a wide range, the dispersion is high.

Dispersion helps us understand if the values in a data set are generally similar to each other or if they are very different. This is important because two sets of data can have the same average, but they might be very different in terms of how spread out the individual values are.
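As a small illustration of two data sets that share an average but differ in spread, the sketch below uses two made-up lists (both are assumptions for the example) and the standard `statistics` module.

```python
import statistics

tight = [48, 49, 50, 51, 52]    # hypothetical values packed near 50
spread = [10, 30, 50, 70, 90]   # hypothetical values scattered around 50

for name, data in (("tight", tight), ("spread", spread)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean}, standard deviation={sd:.2f}")
```

Both lists have a mean of 50, yet the second one has a standard deviation about twenty times larger.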

### Common Measures of Dispersion

There are several ways to measure dispersion, but the two most commonly used are **variance** and **standard deviation**. Both of these measure the spread of data, but in slightly different ways. Let’s break them down:

#### 1. **Variance**
- **Definition**: Variance is the average of the squared differences between each data point and the mean. The formula below treats the data as a complete population; when the data is a sample, the sum is usually divided by \(n - 1\) instead of \(n\).
  
  **Formula for Variance**:  
  \[
  \text{Variance} (\sigma^2) = \frac{\sum (x_i - \mu)^2}{n}
  \]
  Where:
  - \(x_i\) = Each individual data point
  - \(\mu\) = Mean (average) of the data
  - \(n\) = Total number of data points

- **How It Works**: 
  To calculate the variance:
  1. Find the mean of the data.
  2. Subtract the mean from each data point to get the difference.
  3. Square each of these differences (so that negative numbers don’t cancel out positive ones).
  4. Find the average of these squared differences.

- **Example**:
  Let’s say you have the following test scores: 4, 6, 8, and 10.

  1. First, find the **mean**:  
     \[
     \frac{4 + 6 + 8 + 10}{4} = 7
     \]
  2. Now, subtract the mean (7) from each score:
     - \(4 - 7 = -3\)
     - \(6 - 7 = -1\)
     - \(8 - 7 = 1\)
     - \(10 - 7 = 3\)
  3. Square each difference:
     - \((-3)^2 = 9\)
     - \((-1)^2 = 1\)
     - \(1^2 = 1\)
     - \(3^2 = 9\)
  4. Find the average of these squared differences:
     \[
     \frac{9 + 1 + 1 + 9}{4} = 5
     \]
  So, the **variance** is 5.

- **When to Use Variance**: Variance is useful for understanding the spread of data, but it is in "squared" units, which makes it hard to interpret directly. It is often used in more advanced statistical analysis.
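The four steps of the variance calculation can be sketched in a few lines of Python. The variable names are illustrative; `statistics.pvariance` is the standard-library function for the population variance and is used here only as a cross-check.

```python
import statistics

scores = [4, 6, 8, 10]

mean = sum(scores) / len(scores)            # step 1: find the mean
deviations = [x - mean for x in scores]     # step 2: subtract the mean
squared = [d ** 2 for d in deviations]      # step 3: square each difference
variance = sum(squared) / len(squared)      # step 4: average the squares

print(deviations, squared, variance)
print(variance == statistics.pvariance(scores))  # cross-check with the library
```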

#### 2. **Standard Deviation**
- **Definition**: The **standard deviation** is the square root of the variance. It gives a measure of the spread of the data in the **same units** as the data itself, making it easier to understand compared to variance.

  **Formula for Standard Deviation**:  
  \[
  \text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}}
  \]

- **How It Works**:  
  Standard deviation is simply the square root of the variance. It tells you roughly how far a typical data point lies from the mean. A higher standard deviation means the data points are spread out more from the mean, and a lower standard deviation means the data points are closer to the mean.

- **Example**:
  Using the previous variance of 5, we can calculate the standard deviation by taking the square root:
  \[
  \text{Standard Deviation} = \sqrt{5} \approx 2.24
  \]
  So, the standard deviation is approximately 2.24.

- **When to Use Standard Deviation**:  
  Standard deviation is often more useful than variance because it is expressed in the same units as the data, making it easier to interpret. For example, if you are measuring people's heights in centimeters, the standard deviation will also be in centimeters, which is easier to understand than a variance measured in square centimeters.
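The standard deviation of the same four scores can be computed directly. In this sketch, `statistics.pstdev` matches the formula used in this answer (dividing by the number of values); `statistics.stdev`, which divides by one less than the number of values, is shown for comparison and is an addition not covered in the text above.

```python
import math
import statistics

scores = [4, 6, 8, 10]

population_sd = statistics.pstdev(scores)  # divides by n, as in the formula above
sample_sd = statistics.stdev(scores)       # divides by n - 1 (sample version)

print(round(population_sd, 2))
print(round(sample_sd, 2))
print(math.isclose(population_sd, math.sqrt(5)))  # square root of the variance
```

The sample version is always slightly larger, and the gap shrinks as the number of values grows.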

### Key Differences Between Variance and Standard Deviation
- **Units**: 
  - Variance is in **squared units** (e.g., squared centimeters, squared dollars).
  - Standard deviation is in the **same units** as the original data (e.g., centimeters, dollars).
  
- **Interpretability**: 
  - Standard deviation is generally easier to interpret because it gives a sense of how much the data points typically differ from the mean in the original units.
  - Variance is more abstract because it is in squared units.

### Summary

- **Variance** tells you how spread out the data is, but in **squared units**, which makes it a bit harder to understand directly.
- **Standard Deviation** also tells you how spread out the data is, but in the **same units** as the original data, making it easier to interpret.
  
Both **variance** and **standard deviation** measure the **spread** of data, but standard deviation is more commonly used because it is more intuitive and gives you a clearer idea of how much variation exists in the data set. 

### Example in Context:
- **Small Standard Deviation**: If you measured the heights of a group of people, and the standard deviation was small (e.g., 2 cm), it means most people are close to the average height.
- **Large Standard Deviation**: If the standard deviation was large (e.g., 15 cm), it means people’s heights vary a lot from the average. Some people might be much shorter or taller than others.

In conclusion, dispersion (variance and standard deviation) helps us understand how much the data varies from the average, which is important when analyzing data for consistency, reliability, or variability.

## Question No.4 What is a box plot, and what can it tell you about the distribution of data?  
### What is a Box Plot?

A **box plot** (also known as a **box-and-whisker plot**) is a graphical representation of a data set that shows the **distribution** of the data, its **spread**, and where most of the values lie. It is especially useful for comparing multiple data sets and identifying **outliers** (extreme values that differ significantly from other data points).

A box plot displays:
- The **minimum** value
- The **first quartile (Q1)**: The median of the lower half of the data
- The **median (Q2)**: The middle value of the data
- The **third quartile (Q3)**: The median of the upper half of the data
- The **maximum** value
- **Outliers** (if any) — values that are far away from most of the other data points

### Components of a Box Plot

A box plot typically consists of the following parts:

1. **Box**: The rectangular box in the middle of the plot represents the **interquartile range (IQR)**, which is the range between the first quartile (Q1) and the third quartile (Q3). This box contains the **middle 50%** of the data.
   - **Q1** (first quartile): The value that marks the 25th percentile of the data (25% of data points are below this value).
   - **Q3** (third quartile): The value that marks the 75th percentile (75% of data points are below this value).

2. **Median**: Inside the box, there is a line that marks the **median (Q2)**, which is the middle value of the data when sorted. Half of the data points are below this line, and half are above it.

3. **Whiskers**: The lines extending from both ends of the box are called "whiskers." They cover the data outside the IQR.
   - The whiskers extend to the smallest and largest values that are not outliers. When there are no outliers, they reach the **minimum** and **maximum** values.

4. **Outliers**: These are data points that fall far outside the range of most other data points. They are often shown as individual dots or symbols beyond the whiskers. A common rule flags any value below \(Q1 - 1.5 \times \text{IQR}\) or above \(Q3 + 1.5 \times \text{IQR}\) as an outlier.

### What a Box Plot Tells You About the Distribution of Data

A box plot provides several key pieces of information about the data's distribution:

1. **Median** (Q2): It shows where the center of the data lies. If the median line is close to the middle of the box, the central part of the data is likely **symmetrical**. (The directions below assume a horizontal plot with values increasing to the right.)
   - If the median is **to the left** of the middle (closer to Q1), the data might be **skewed right** (positively skewed).
   - If the median is **to the right** of the middle (closer to Q3), the data might be **skewed left** (negatively skewed).

2. **Spread (Range)**: The length of the box (from Q1 to Q3) shows the **interquartile range (IQR)**, which represents the **middle 50%** of the data. The whiskers, together with any outlier points, show the **full range** (from the minimum to the maximum), giving you an idea of the **overall spread** of the data.
   - A **longer box and whiskers** mean the data is more spread out.
   - A **shorter box and whiskers** mean the data is more tightly grouped.

3. **Skewness**:
   - **Symmetrical Distribution**: If the box and whiskers are roughly the same length on both sides of the median, the data is symmetrical.
   - **Right-Skewed (Positively Skewed)**: If the right whisker is longer than the left, or the median is closer to Q1 than Q3, the data is **skewed right**, meaning there are a few large values pulling the data to the right.
   - **Left-Skewed (Negatively Skewed)**: If the left whisker is longer than the right, or the median is closer to Q3 than Q1, the data is **skewed left**, meaning there are a few small values pulling the data to the left.

4. **Outliers**: Outliers are any data points that fall outside of the typical range defined by the whiskers. These are points that are significantly different from the rest of the data.
   - **Outliers** might represent **errors** in data collection, or they could indicate **rare events** that are important to investigate further.

### Example of a Box Plot

Imagine you have test scores from 10 students:  
**60, 62, 70, 73, 75, 80, 85, 88, 90, 92**

A box plot of this data might look like this:

- **Minimum**: 60
- **Q1**: 70 (first quartile)
- **Median (Q2)**: 77.5 (the average of the two middle scores, 75 and 80)
- **Q3**: 88 (third quartile)
- **Maximum**: 92

This means:
- The middle half of the scores lies between 70 and 88.
- The middle score (the median) is 77.5.
- The overall range of scores is from 60 to 92, with no outliers.

If the box plot had a very long whisker on the right (towards the maximum), this would indicate a **right-skewed distribution**, showing that a few students scored much higher than the others.
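The five values a box plot is built from can be computed with a short helper. This is a sketch under one assumption: quartiles are taken as the medians of the lower and upper halves, as described in this answer. Other quartile conventions exist and can give slightly different values. The function name `five_number_summary` is illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                # values below the middle
    upper = data[len(data) - half:]    # values above the middle
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

scores = [60, 62, 70, 73, 75, 80, 85, 88, 90, 92]
minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # length of the box

print(minimum, q1, median, q3, maximum, iqr)
```

With an even number of values, the median is the average of the two middle values, so it does not have to be one of the original scores.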

### When to Use a Box Plot

Box plots are particularly helpful when you want to:

1. **Compare multiple data sets**: You can quickly compare the spread, center, and outliers of different groups.
2. **Understand the distribution of data**: Box plots show if the data is skewed or symmetric, and if there are outliers.
3. **Identify outliers**: They make it easy to see if any data points are far away from the rest.

### Conclusion

A **box plot** is a powerful tool that helps you visualize the **distribution** of data, showing the spread (how wide or narrow the data is), the central value (median), and whether there are any outliers. It’s especially useful for comparing different sets of data and quickly understanding key characteristics, like the **spread**, **center**, and presence of **outliers**.

## Question No.5  Discuss the role of random sampling in making inferences about populations.
### Role of Random Sampling in Making Inferences About Populations

**Random sampling** is a technique used in statistics where each member of a population has an equal chance of being selected for a sample. This approach plays a crucial role in **making inferences** (drawing conclusions) about larger populations based on smaller sample data. By selecting a representative sample through random sampling, we can avoid bias and make more reliable generalizations about the population as a whole.

Let’s break down how random sampling works and why it’s so important for making inferences:

### 1. **What is Random Sampling?**
Random sampling means that the selection of individual participants or items from a population is done purely by chance, without any systematic pattern or bias. The idea is that every element in the population has an equal likelihood of being included in the sample.

There are several methods for achieving random sampling, including:
- **Simple Random Sampling**: Every individual in the population has an equal chance of being selected (e.g., drawing names from a hat).
- **Stratified Random Sampling**: The population is divided into different groups (strata), and random samples are taken from each group.
- **Systematic Random Sampling**: Starting from a randomly chosen point, you select every nth individual from a list or a sequence.
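The three methods can be sketched with Python's standard `random` module. Everything about the population here is hypothetical: 100 people with made-up IDs, split into two groups "A" and "B" that stand in for strata. The stratified draw uses proportional allocation, which is one common choice rather than the only one.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 people, labelled by ID and split into two strata.
population = [{"id": i, "group": "A" if i < 60 else "B"} for i in range(100)]

# Simple random sampling: every person has the same chance of selection.
simple = random.sample(population, 10)

# Stratified random sampling: draw separately within each group,
# here in proportion to group size (6 from A, 4 from B).
stratified = []
for group, size in (("A", 6), ("B", 4)):
    members = [p for p in population if p["group"] == group]
    stratified.extend(random.sample(members, size))

# Systematic sampling: random starting point, then every 10th person.
step = 10
start = random.randrange(step)
systematic = population[start::step]

print([p["id"] for p in simple])
print([p["id"] for p in stratified])
print([p["id"] for p in systematic])
```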

### 2. **Why is Random Sampling Important?**

Random sampling is crucial because it helps create a sample that is **representative** of the entire population. When you use random sampling, you are reducing the chances of bias influencing the results, which is critical for making **valid inferences**. Let’s explore why random sampling is so important:

#### a) **Reduces Bias**:
When you select a sample randomly, every individual has an equal chance of being chosen. This means there is less chance of favoring a certain group, characteristic, or outcome, which would lead to **biased results**. Without random sampling, the sample might not reflect the diversity of the entire population, and conclusions drawn from such a sample could be misleading.

- **Example**: If a researcher only surveys people who live in a particular area, the sample will likely only reflect the characteristics of people in that area, and not the broader population. Random sampling avoids such errors.

#### b) **Ensures Representativeness**:
The goal of random sampling is to select a group that mirrors the population as closely as possible. When the sample is representative, it becomes easier to make valid inferences about the population as a whole.

- **Example**: If you want to know the average income in a city, random sampling allows you to select a diverse group of individuals from different age groups, income levels, and neighborhoods. This diversity ensures that your sample is a good reflection of the population’s income distribution.

#### c) **Enables Statistical Inference**:
Random sampling is the foundation for **statistical inference**—the process of using data from a sample to make generalizations about a larger population. By taking random samples and calculating statistics like the **mean**, **standard deviation**, or **proportions**, we can make predictions or draw conclusions about the entire population, even though we haven't measured every individual.

- **Example**: If we randomly sample 100 students in a school to find the average test score, we can use that sample to infer the average test score of all students in the school.

#### d) **Keeps Sampling Error Manageable**:
Sampling error refers to the natural variability that occurs when using a sample to estimate population parameters. Random sampling does not eliminate this error, but it makes the error purely a matter of chance: it can be quantified (for example, with a margin of error) and it shrinks as the sample size grows. A deliberately chosen sample adds systematic bias on top, which no increase in sample size removes.

- **Example**: If you randomly select 1,000 people from a large city to estimate the percentage of people who support a specific political candidate, different random samples will give slightly different estimates that cluster around the true percentage. If you deliberately chose certain groups of people (e.g., only city center residents), the estimates could be consistently off, no matter how many people you asked.

### 3. **How Random Sampling Affects Inferences**

Inferences are conclusions we draw about a population based on data from a sample. Random sampling directly impacts the accuracy and reliability of these inferences.

#### a) **Estimating Population Parameters**:
By using random sampling, you can estimate **population parameters** (such as the average height, income, or age of a population) from sample statistics (like the sample mean, sample median, etc.). The law of large numbers states that as the sample size increases, the sample mean tends to get closer to the true population mean, making the inferences more accurate.

- **Example**: Suppose you randomly sample 50 households to estimate the average monthly electricity bill in a city. The larger your sample (say, 500 households), the more confident you can be that your sample mean is close to the true population mean.
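A small simulation can illustrate this idea. The sketch below invents a population of 10,000 monthly bills (the range of 40 to 200 is an arbitrary assumption) and compares random samples of different sizes against the population mean. In any single run a larger sample is not guaranteed to land closer, but on average the error shrinks as the sample grows.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 monthly electricity bills between 40 and 200.
bills = [random.uniform(40, 200) for _ in range(10_000)]
true_mean = statistics.mean(bills)

errors = {}
for n in (50, 500, 5000):
    sample = random.sample(bills, n)  # simple random sample of n households
    errors[n] = abs(statistics.mean(sample) - true_mean)
    print(f"n={n}: sample mean is off by {errors[n]:.2f}")
```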

#### b) **Estimating Proportions**:
Random sampling allows you to estimate proportions (like the percentage of people who support a policy, use a product, or prefer a service). By analyzing the sample proportion, you can make inferences about the population proportion.

- **Example**: In a random survey of 200 people, you find that 60% of them support a new policy. You can then infer, with some margin of error, that around 60% of the entire population might support the policy.
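The "margin of error" in this example can be put into rough numbers. The sketch below assumes the usual normal-approximation formula for a proportion and a 95% confidence level (critical value 1.96); neither is specified in the example, so treat the result as an illustration rather than the only way to compute it.

```python
import math

n = 200        # sample size from the example
p_hat = 0.60   # sample proportion from the example
z = 1.96       # assumed: critical value for roughly 95% confidence

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
low, high = p_hat - margin, p_hat + margin

print(f"{p_hat:.0%} ± {margin:.1%}  ->  ({low:.1%}, {high:.1%})")
```

Under these assumptions the margin is a little under 7 percentage points, so the population value plausibly lies somewhere between about 53% and 67%.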

#### c) **Generalizing Findings**:
Because random sampling helps ensure the sample is representative, you can more confidently **generalize** your findings from the sample to the population. This generalization is the core of statistical inference, allowing researchers and organizations to make decisions based on sample data.

- **Example**: A political poll that randomly samples 1,000 voters can make predictions about how the entire electorate will vote, which helps political candidates make informed decisions about their campaigns.

### 4. **Limitations of Random Sampling**
While random sampling is a powerful tool, it's not without limitations:
- **Practical Challenges**: Sometimes it’s difficult to achieve perfect randomness. For example, not everyone in a population might be easily accessible, or it may be difficult to create a truly random sample if the population is poorly defined.
- **Sampling Errors**: Even with random sampling, there’s always some degree of error due to chance. Larger sample sizes reduce this error, but it never completely disappears.
- **Cost and Time**: Random sampling can sometimes be expensive and time-consuming, especially in large populations or when it’s hard to access data.

### Conclusion

**Random sampling** is a vital tool in statistics that enables us to make **accurate and reliable inferences** about a **population** based on a **sample**. It helps ensure that the sample is representative, reduces bias, and allows for statistical generalizations. Whether you're estimating averages, proportions, or making decisions based on sample data, random sampling is the foundation for drawing valid conclusions about larger groups or populations.

## Question No.6 Explain the concept of skewness and its types. How does skewness affect the interpretation of data?
### What is Skewness?

**Skewness** is a statistical term that describes the **asymmetry** or **lopsidedness** of a data set. In simpler terms, it tells you whether the data is evenly distributed or if it has a long tail on one side. When data is **skewed**, it means that most of the values are clustered on one side of the mean (average), and the distribution is not symmetric.

Skewness can affect how we interpret the **center** of the data (e.g., mean, median) and can influence how we summarize and analyze the data.

### Types of Skewness

There are three main types of skewness:

#### 1. **Positive Skewness (Right Skewness)**
- **Definition**: In a **positively skewed** distribution, the **right tail** (higher values) is longer or fatter than the left tail. This means that most of the data points are clustered on the **left side**, and there are a few extremely high values pulling the mean to the right.

- **Visual Example**: Imagine a graph where the peak of the data is on the left, and the tail stretches out towards the right.

- **Characteristics**:
  - Typically **Mean > Median > Mode** (the mean is pulled to the right because of the long tail).
  - The **right tail** (larger values) is longer.

- **Example**: 
  - **Income** is often right-skewed because most people earn average or lower incomes, but a few people earn extremely high incomes, which pulls the average (mean) upwards.
  
#### 2. **Negative Skewness (Left Skewness)**
- **Definition**: In a **negatively skewed** distribution, the **left tail** (lower values) is longer or fatter than the right tail. This means that most of the data points are clustered on the **right side**, and there are a few extremely low values pulling the mean to the left.

- **Visual Example**: In a negatively skewed graph, the peak is towards the **right**, and the tail stretches out towards the **left**.

- **Characteristics**:
  - Typically **Mean < Median < Mode** (the mean is pulled to the left because of the long tail).
  - The **left tail** (smaller values) is longer.

- **Example**:
  - **Age at retirement** might be negatively skewed, with most people retiring around the age of 60 or 65, but a few retire much earlier (in their 30s or 40s), pulling the average age of retirement down.

#### 3. **Zero Skewness (Symmetric Distribution)**
- **Definition**: A distribution with **zero skewness** is symmetric, meaning the data is evenly distributed around the mean. In this case, the left and right tails are roughly equal in length.

- **Visual Example**: A perfectly symmetrical bell curve, like a **normal distribution**, has zero skewness.

- **Characteristics**:
  - The **mean** and **median** are approximately equal.
  - No long tail on either side.

- **Example**: 
  - **Heights of people** often have a normal distribution, where most people are of average height, and very few are extremely tall or short.
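Skewness can also be expressed as a single number whose sign matches the three types above. The sketch below uses the moment coefficient of skewness (third central moment divided by the variance raised to the power 1.5), which is one common definition and is an addition to the text; the three data sets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population version)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]                # hypothetical: balanced

for name, data in (("right", right_skewed), ("left", left_skewed),
                   ("symmetric", symmetric)):
    print(name, round(skewness(data), 2))
```

A positive result indicates a longer right tail, a negative result a longer left tail, and a value near zero a roughly symmetric data set.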

### How Skewness Affects the Interpretation of Data

Skewness can affect how we interpret data, particularly when it comes to **measures of central tendency** (mean, median, mode), **spread**, and **the overall shape of the data distribution**.

#### 1. **Mean vs. Median**
- In **positively skewed** data (right-skewed), the **mean** will be **greater** than the **median**, because the long right tail pulls the mean toward the higher values.
  - **Example**: If most people earn $30,000 a year but a few earn $1,000,000, the average (mean) income will be much higher than the median income, which reflects a more typical person.
  
- In **negatively skewed** data (left-skewed), the **mean** will be **less** than the **median**, because the long left tail pulls the mean toward the lower values.
  - **Example**: If most people retire at age 65, but a few retire early at age 30, the average retirement age (mean) will be lower than the typical retirement age (median).
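Both examples can be reproduced with small made-up data sets. The group sizes below (nine typical values and one extreme value) are assumptions chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical data shaped like the two examples above.
incomes = [30_000] * 9 + [1_000_000]   # right-skewed: one very high earner
retirement_ages = [65] * 9 + [30]      # left-skewed: one very early retiree

for name, data in (("income", incomes), ("retirement age", retirement_ages)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name}: mean={mean}, median={median}")
```

The mean income comes out at 127,000 against a median of 30,000, while the mean retirement age is 61.5 against a median of 65. In both cases the mean is pulled toward the tail.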

#### 2. **Impact on Data Interpretation**
- **Positive Skewness**:
  - Since the mean is higher than the median, we know that the data has a **few large values** that are skewing the distribution to the right.
  - This can be important for understanding phenomena where extreme high values have more influence, such as in **income distribution** or **real estate prices**.
  
- **Negative Skewness**:
  - Since the mean is lower than the median, it suggests that there are **some extreme low values** pulling the distribution to the left.
  - This is important when interpreting data such as **age at retirement** or **scores on an easy exam**, where a small number of very low values can drag the mean down.

#### 3. **Shape of the Distribution**
- The **shape** of the distribution gives us important clues about how the data is structured. For example:
  - A **positively skewed** distribution indicates that most data points are clustered at lower values, with a few outliers on the higher end.
  - A **negatively skewed** distribution shows that most data points are at higher values, with a few outliers on the lower end.

#### 4. **Outliers**
- Skewness can indicate the presence of **outliers**, which are extreme values far from the rest of the data. These outliers can greatly influence the mean, making it an unreliable measure of central tendency for skewed distributions.
  - In a **right-skewed** distribution, a few very high values are outliers.
  - In a **left-skewed** distribution, a few very low values are outliers.

### Summary

- **Skewness** refers to the asymmetry of the data distribution, and it can be **positive**, **negative**, or **zero** (symmetrical).
- **Positive skewness** (right skew) means the tail on the right is longer, and the mean is greater than the median.
- **Negative skewness** (left skew) means the tail on the left is longer, and the mean is less than the median.
- **Zero skewness** means the data is symmetric, and the mean and median are approximately the same.
- **Skewness affects the interpretation of the data** by influencing how we use the mean and median. In highly skewed data, the median often provides a better measure of central tendency than the mean, because the mean can be distorted by extreme values (outliers).

Understanding skewness helps us interpret data more accurately, particularly when deciding which measure of central tendency (mean or median) best represents the data.

## Question No.7 What is the interquartile range (IQR), and how is it used to detect outliers?
### What is the Interquartile Range (IQR)?

The **Interquartile Range (IQR)** is a measure of **spread** in a data set, and it tells us how the middle 50% of the data is distributed. It’s calculated by finding the difference between the **third quartile (Q3)** and the **first quartile (Q1)**. In simple terms, it represents the range within which the central half of the data lies.

- **Q1 (First Quartile)**: The median of the lower half of the data (25th percentile). It’s the value below which 25% of the data falls.
- **Q3 (Third Quartile)**: The median of the upper half of the data (75th percentile). It’s the value below which 75% of the data falls.

The formula for the **IQR** is:

\[
\text{IQR} = Q3 - Q1
\]

### Example of How to Calculate IQR

Let’s say you have the following data set (sorted in ascending order):

**4, 7, 8, 10, 15, 18, 21, 25, 28, 30**

1. **Find Q1 and Q3**:
   - **Q1 (First Quartile)**: The median of the lower half (4, 7, 8, 10, 15) is 8.
   - **Q3 (Third Quartile)**: The median of the upper half (18, 21, 25, 28, 30) is 25.

2. **Calculate the IQR**:
   \[
   \text{IQR} = Q3 - Q1 = 25 - 8 = 17
   \]

So, the IQR for this data set is **17**.
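
The calculation above can be sketched in Python. The helper `quartiles_median_of_halves` is a hypothetical name introduced here; it implements the "median of each half" convention used in this example. Other quartile conventions (for example, interpolation-based methods in statistical libraries) can give slightly different values for Q1 and Q3.

```python
import statistics

def quartiles_median_of_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of points, skip the middle value itself.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return statistics.median(lower), statistics.median(upper)

data = [4, 7, 8, 10, 15, 18, 21, 25, 28, 30]
q1, q3 = quartiles_median_of_halves(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 8 25 17
```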

### How the IQR is Used to Detect Outliers

The IQR is useful for detecting **outliers**—data points that are unusually far away from the rest of the data. Outliers can skew the results and give misleading interpretations of the data.

To detect outliers using the IQR, we use the following rule:

1. **Find the "Lower Bound"**: Any data point below this value is considered a potential outlier.
   \[
   \text{Lower Bound} = Q1 - 1.5 \times \text{IQR}
   \]

2. **Find the "Upper Bound"**: Any data point above this value is considered a potential outlier.
   \[
   \text{Upper Bound} = Q3 + 1.5 \times \text{IQR}
   \]

Any data point **below the lower bound** or **above the upper bound** is flagged as a potential **outlier**.

### Example: Detecting Outliers

Continuing with the same data set:

**4, 7, 8, 10, 15, 18, 21, 25, 28, 30**

- We already know that **Q1 = 8** and **Q3 = 25**, and the IQR is **17**.

1. **Calculate the Lower Bound**:
   \[
   \text{Lower Bound} = Q1 - 1.5 \times \text{IQR} = 8 - 1.5 \times 17 = 8 - 25.5 = -17.5
   \]
   Any data point **below -17.5** would be an outlier. Since all data points are greater than -17.5, there are no outliers on the lower side.

2. **Calculate the Upper Bound**:
   \[
   \text{Upper Bound} = Q3 + 1.5 \times \text{IQR} = 25 + 1.5 \times 17 = 25 + 25.5 = 50.5
   \]
   Any data point **above 50.5** would be an outlier. Since all data points are below 50.5, there are no outliers on the upper side.

In this case, **there are no outliers** in this data set because all the data points fall between -17.5 and 50.5.
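
The same check can be written as a short function. This is a sketch, assuming the quartiles are supplied by the caller; `iqr_outliers` and its parameter `k` are illustrative names, and the extra value 60 is a hypothetical data point added only to show what a flagged value looks like.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return (lower_bound, upper_bound, outliers) using the k * IQR rule."""
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    outliers = [v for v in values if v < lower_bound or v > upper_bound]
    return lower_bound, upper_bound, outliers

data = [4, 7, 8, 10, 15, 18, 21, 25, 28, 30]
lower, upper, outliers = iqr_outliers(data, q1=8, q3=25)
print(lower, upper, outliers)  # -17.5 50.5 []

# Hypothetical extra value; the quartiles are held fixed here for simplicity,
# although in practice they would be recomputed for the enlarged data set.
_, _, flagged = iqr_outliers(data + [60], q1=8, q3=25)
print(flagged)  # [60]
```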

### Summary

- The **Interquartile Range (IQR)** measures the spread of the middle 50% of the data, calculated as \( Q3 - Q1 \).
- The **IQR** is useful for identifying **outliers**. Outliers are data points that lie **below** \( Q1 - 1.5 \times \text{IQR} \) or **above** \( Q3 + 1.5 \times \text{IQR} \).
- **Outliers** can be unusual or extreme values that don’t fit the general pattern of the data and might need further investigation.

In short, the IQR helps you understand the **typical spread** of the data and **identify values that don't belong**.

## Question No.8 Discuss the conditions under which the binomial distribution is used.
### Conditions for Using the Binomial Distribution

The **binomial distribution** is used to model situations where there are **two possible outcomes** for each trial, often referred to as **success** and **failure**. It describes the number of successes in a **fixed number of independent trials**, where each trial has the same probability of success. For the binomial distribution to be appropriate, certain conditions must be met. 

### The 4 Key Conditions for the Binomial Distribution

1. **Two Possible Outcomes (Success or Failure)**:
   - Each trial has only **two possible outcomes**: one that is considered a **success** and one that is considered a **failure**. These outcomes should be mutually exclusive, meaning if one happens, the other cannot.
   - **Example**: Flipping a coin—either you get **heads (success)** or **tails (failure)**.

2. **Fixed Number of Trials**:
   - The number of trials (or experiments) is fixed and determined in advance. For example, you might decide to flip the coin **10 times**.
   - **Example**: You conduct a survey with **100 people**, asking whether they like a product or not. You ask all 100 people, so the number of trials is fixed.

3. **Independent Trials**:
   - Each trial is **independent**, meaning the outcome of one trial does not affect the outcome of another. Knowing the result of one trial tells you nothing about the result of the next.
   - **Example**: If you're flipping a fair coin, the outcome of the first flip doesn’t affect the outcome of the second flip. Each flip is independent.

4. **Constant Probability of Success**:
   - The probability of success (denoted as \( p \)) is **the same** for every trial. Similarly, the probability of failure (denoted as \( 1 - p \)) is also constant.
   - **Example**: In a survey, the probability that a person likes the product is always the same, say **60%** (or \( p = 0.6 \)) for every person surveyed.

### Key Formula for the Binomial Distribution

If the above conditions are met, the **binomial distribution** can be used to calculate the probability of getting exactly **k successes** in **n trials**. The formula is:

\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}
\]

Where:
- \( P(X = k) \) is the probability of getting exactly **k successes**.
- \( n \) is the number of trials.
- \( k \) is the number of successes you want.
- \( p \) is the probability of success on a single trial.
- \( \binom{n}{k} \) is the **binomial coefficient**, which represents the number of ways to choose **k successes** from **n trials**.

### Examples of When to Use the Binomial Distribution

1. **Coin Tosses**:
   - **Scenario**: You flip a coin 10 times and want to know the probability of getting exactly 6 heads.
   - **Conditions**: There are two outcomes (heads or tails), a fixed number of flips (10), independent flips, and a constant probability of heads (50% for a fair coin).

2. **Product Preferences**:
   - **Scenario**: A survey asks 100 people whether they like a new product. The probability that a person likes the product is 0.7. You want to know the probability that exactly 75 people like the product.
   - **Conditions**: Two outcomes (like or not like), fixed number of survey participants (100), independent responses, and constant probability of liking the product (0.7).

3. **Quality Control**:
   - **Scenario**: A factory produces 200 light bulbs per day, and each light bulb has a 95% chance of being non-defective. You want to know the probability that exactly 5 light bulbs are defective in a day's production.
   - **Conditions**: Two outcomes (defective or not defective), fixed number of light bulbs produced (200), independent production of each bulb, and constant probability of a bulb being defective (5%).
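
The three scenarios above can be evaluated with the binomial formula. This is a minimal sketch using `math.comb` from the standard library; `binomial_pmf` is a helper name introduced here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

coin = binomial_pmf(6, 10, 0.5)      # exactly 6 heads in 10 fair flips
survey = binomial_pmf(75, 100, 0.7)  # exactly 75 of 100 people like the product
bulbs = binomial_pmf(5, 200, 0.05)   # exactly 5 defective bulbs out of 200

print(f"coin:   {coin:.4f}")  # coin:   0.2051
print(f"survey: {survey:.4f}")
print(f"bulbs:  {bulbs:.4f}")
```

For the coin example the value is exactly \( 210 / 1024 \), since there are \( \binom{10}{6} = 210 \) ways to place 6 heads among 10 flips.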

### Conclusion

The **binomial distribution** is used when you have a **fixed number of trials**, each trial has **two possible outcomes**, the trials are **independent**, and the probability of success is **the same** for each trial. If these conditions are met, the binomial distribution can be used to calculate the probability of a specific number of successes in those trials.

## Question No.9 Explain the properties of the normal distribution and the empirical rule (68-95-99.7 rule).
### Properties of the Normal Distribution

The **normal distribution** is a very important concept in statistics. It’s often used to model things like heights, weights, test scores, and other naturally occurring data that tends to cluster around an average. The **normal distribution** is also known as the **bell curve** because of its characteristic shape.

Here are the key properties of the **normal distribution**:

1. **Symmetry**:
   - The normal distribution is **symmetric** around its mean. This means that if you fold the graph in half at the mean, the two halves will be identical. There’s an equal amount of data on both sides of the mean.
   - **Example**: In a population of people, the number of people shorter than average is approximately equal to the number of people taller than average.

2. **Bell-Shaped Curve**:
   - The graph of the normal distribution is a smooth, **bell-shaped curve**. Most of the data points are close to the mean, and fewer data points are found as you move farther away from the mean.
   - **Example**: If you measure the heights of adults, most people will have a height close to the average, with fewer being extremely tall or extremely short.

3. **Mean, Median, and Mode are Equal**:
   - In a perfectly normal distribution, the **mean**, **median**, and **mode** all coincide at the same point at the center of the distribution. This is because of the symmetry of the normal curve.
   - **Example**: In a normal distribution of test scores, the average score, the middle score, and the most frequent score are all the same.

4. **Tails Approach Zero**:
   - The tails of the normal distribution curve go on forever, but they approach **zero** as they move further from the mean. This means there’s a very small chance of observing values far away from the mean.
   - **Example**: In a normal distribution of IQ scores, a few individuals have extremely high or low IQs, and such individuals become increasingly rare the further you move from the mean.

5. **Area under the Curve**:
   - The total area under the normal distribution curve is equal to **1** (or 100%). This represents the total probability of all possible outcomes.
   - **Example**: If the normal distribution is used to model test scores, the area under the curve represents the **total probability** of all students' scores.

---

### The Empirical Rule (68-95-99.7 Rule)

The **empirical rule** is a simple way to understand the spread of data in a normal distribution. It tells you about the proportion of data that falls within certain distances from the mean. The rule is also known as the **68-95-99.7 rule** because it provides the percentages of data within 1, 2, and 3 standard deviations from the mean.

Here’s how the empirical rule works:

1. **68% of the Data**:
   - About **68%** of the data in a normal distribution falls within **1 standard deviation** from the mean. This means that most of the data points are close to the average.
   - **Example**: If the average height of adults is 5'7" with a standard deviation of 3 inches, about 68% of adults will have a height between 5'4" and 5'10".

2. **95% of the Data**:
   - About **95%** of the data falls within **2 standard deviations** from the mean. This means that nearly all of the data is within this range, with only a small percentage of extreme values beyond this range.
   - **Example**: Using the same example of adult heights, about 95% of adults will have a height between 5'1" and 6'1".

3. **99.7% of the Data**:
   - About **99.7%** of the data falls within **3 standard deviations** from the mean. This means that almost all of the data is contained within this range, and only a very small percentage of extreme values lie outside this range.
   - **Example**: In the height example, 3 standard deviations is 9 inches, so nearly all adults will have a height between 4'10" and 6'4".

### Visualizing the Empirical Rule

If you look at a normal distribution curve, the empirical rule can be visualized like this:

- **68%** of the data is within 1 standard deviation of the mean (this covers the central part of the curve).
- **95%** of the data is within 2 standard deviations of the mean (this covers a broader area of the curve).
- **99.7%** of the data is within 3 standard deviations of the mean (this covers nearly all the data).
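
These percentages can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later) and then applies the rule to the height example, with the mean of 5'7" written as 67 inches.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")  # 0.6827, 0.9545, 0.9973

# Height example: mean 67 inches (5'7"), standard deviation 3 inches.
mean_in, sd_in = 67, 3
ranges = {k: (mean_in - k * sd_in, mean_in + k * sd_in) for k in (1, 2, 3)}
print(ranges)  # {1: (64, 70), 2: (61, 73), 3: (58, 76)}
```

In feet and inches, 58 and 76 inches correspond to 4'10" and 6'4".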

### Why is the Empirical Rule Useful?

The **empirical rule** is useful because it gives you a quick way to understand the spread of data in a normal distribution. It helps you know:

- **How much data is close to the average** (about 68% is within 1 standard deviation).
- **How much data lies in the wider, still typical range** (about 95% is within 2 standard deviations).
- **How rare extreme values are** (about 99.7% is within 3 standard deviations, so only about 0.3% falls outside).

This rule is especially helpful in areas like:

- **Quality control**: In manufacturing, to see if products fall within acceptable limits.
- **Test scores**: To understand how many students fall within a certain range of scores.
- **Medical measurements**: To identify if a measurement is unusually high or low compared to typical values.

### Summary

- The **normal distribution** is a symmetric, bell-shaped curve where the mean, median, and mode are equal. It has tails that go to infinity and the area under the curve is 1.
- The **empirical rule** (68-95-99.7 rule) helps you understand the spread of data in a normal distribution:
  - **68%** of the data falls within **1 standard deviation** of the mean.
  - **95%** of the data falls within **2 standard deviations** of the mean.
  - **99.7%** of the data falls within **3 standard deviations** of the mean.

This rule gives us a quick and easy way to interpret and understand data that follows a normal distribution.

## Question No.10 Provide a real-life example of a Poisson process and calculate the probability for a specific event.
### What is a Poisson Process?

A **Poisson process** is a type of statistical model that describes **events happening randomly** over a fixed period of time or space, where the events occur at a **constant average rate** and independently of each other. The key points to remember are:
- The events happen **one at a time**.
- They happen **randomly** but at a known average rate.
- The **number of events** in a given time period follows the **Poisson distribution**, while the **time between events** follows an exponential distribution.

### Real-Life Example of a Poisson Process: Calls at a Call Center

Let’s consider a **call center** that receives calls throughout the day. Suppose, on average, the call center receives **3 calls per hour**. We can use a Poisson process to model this situation.

#### Scenario:
The call center receives calls **randomly**, but on average, there are **3 calls per hour**. We want to calculate the probability that the call center will receive **exactly 4 calls** in the next hour.

### Poisson Distribution Formula

The **Poisson distribution** gives the probability of a given number of events (k) happening in a fixed interval of time, when the events happen at a constant rate (λ). The formula is:

\[
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
\]

Where:
- \( P(X = k) \) is the probability of getting exactly **k** events.
- \( \lambda \) is the average number of events in the given time period (the rate).
- \( k \) is the actual number of events we want to calculate the probability for.
- \( e \) is approximately **2.71828** (Euler’s number).

### Step-by-Step Calculation

In our example:
- \( \lambda = 3 \) (the average number of calls per hour).
- \( k = 4 \) (we want to find the probability of receiving exactly 4 calls).
- \( e \) is approximately **2.71828**.

Now, plug these values into the Poisson formula:

\[
P(X = 4) = \frac{3^4 e^{-3}}{4!}
\]

#### Step 1: Calculate \( 3^4 \)
\[
3^4 = 81
\]

#### Step 2: Calculate \( e^{-3} \)
\[
e^{-3} \approx 0.0498
\]

#### Step 3: Calculate \( 4! \) (the factorial of 4)
\[
4! = 4 \times 3 \times 2 \times 1 = 24
\]

#### Step 4: Put everything together
\[
P(X = 4) = \frac{81 \times 0.0498}{24}
\]

\[
P(X = 4) = \frac{4.034}{24} \approx 0.1681
\]

### Interpretation of the Result

So, the probability that the call center will receive **exactly 4 calls** in the next hour is approximately **0.1681**, or **16.81%**.
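
The same calculation can be sketched in Python with the standard `math` module; `poisson_pmf` is a helper name introduced here for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

p_four_calls = poisson_pmf(4, 3)
print(f"P(X = 4) = {p_four_calls:.3f}")  # P(X = 4) = 0.168
```

Carrying full precision for \( e^{-3} \) gives about 0.1680; the hand calculation lands on 0.1681 because \( e^{-3} \) was rounded to 0.0498 before multiplying.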

### Summary

- A **Poisson process** models events happening randomly over time or space, with a constant average rate.
- In the call center example, with an average of **3 calls per hour**, we used the **Poisson distribution** to calculate the probability of receiving exactly **4 calls** in the next hour.
- The probability of receiving exactly 4 calls is about **16.81%**.

This example shows how the Poisson distribution is useful for predicting the likelihood of a specific number of random events occurring within a fixed time period, especially in situations like customer service, traffic accidents, or emails arriving at a specific rate.

## Question No.11 Explain what a random variable is and differentiate between discrete and continuous random variables.
### What is a Random Variable?

A **random variable** is a **numerical outcome** of a random process or experiment. It represents a quantity that can take on different values, depending on the outcome of an uncertain event or process. In other words, a random variable is a function that maps outcomes of a random experiment to numerical values.

For example:
- In a dice roll, the outcome (the number rolled) can be represented by a random variable.
- In a survey, the number of people who prefer a certain product can also be represented as a random variable.

### Types of Random Variables

Random variables can be classified into two types:
1. **Discrete Random Variables**
2. **Continuous Random Variables**

---

### 1. **Discrete Random Variables**

A **discrete random variable** is one that can take only a **finite** or **countably infinite** number of values. These values are typically distinct and separate, meaning you can count them individually. Discrete random variables often arise from counting processes.

#### Characteristics of Discrete Random Variables:
- They can only take specific, distinct values (no in-between values).
- They are often associated with counting something.
- The values can be **counted** (e.g., number of heads in coin tosses, number of students in a classroom).

#### Examples:
- **Number of children in a family**: A family can have 0, 1, 2, 3, or more children, but not 2.5 children.
- **Number of cars passing a checkpoint in an hour**: You can count the number of cars, such as 0, 1, 2, 3, etc., but not 2.5 cars.
- **Number of goals scored in a soccer match**: A team can score 0, 1, 2, etc., but not 2.3 goals.

#### Probability Distribution:
- A discrete random variable has a **probability mass function (PMF)** that assigns a probability to each possible value.
- The sum of all probabilities must equal **1**.

---

### 2. **Continuous Random Variables**

A **continuous random variable** is one that can take an **infinite number of values** within a given range. These values are not countable because they can be any real number within a certain interval. Continuous variables are usually associated with measuring something, like height, weight, or time.

#### Characteristics of Continuous Random Variables:
- They can take any value within a continuous range (not just distinct values).
- They are associated with measurements, like length, weight, or time.
- You cannot count the exact number of values they can take because there are infinite possibilities.

#### Examples:
- **Height of a person**: A person’s height could be 170 cm, 170.1 cm, 170.01 cm, and so on. There are infinitely many possible values.
- **Time to run a race**: The time it takes for someone to run a race can be any positive number, such as 10.23 seconds, 10.235 seconds, etc.
- **Temperature**: Temperature can be any value within a range, such as 25.3°C, 25.35°C, 25.355°C, etc.

#### Probability Distribution:
- A continuous random variable has a **probability density function (PDF)**. Instead of assigning probabilities to exact values, the probability is described by areas under the curve of the PDF.
- For continuous variables, the probability of any exact value occurring is always **0**. Instead, probabilities are described over intervals (e.g., the probability that a person's height is between 170 cm and 180 cm).
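
The contrast between a PMF and a PDF can be illustrated with a short sketch. The fair die is a standard discrete example; the normal model for height (mean 175 cm, standard deviation 7 cm) is an assumption made purely for illustration.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: PMF of a fair six-sided die; each face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
total_probability = sum(die_pmf.values())
print(total_probability)  # 1

# Continuous: heights modeled as Normal(mean=175 cm, sd=7 cm).
# These parameters are assumed for illustration only.
height = NormalDist(mu=175, sigma=7)

# Probability over an interval is the area under the PDF, computed via the CDF.
p_interval = height.cdf(180) - height.cdf(170)
print(f"P(170 <= height <= 180) = {p_interval:.3f}")

# The "area" over a single point is zero.
p_exact = height.cdf(175) - height.cdf(175)
print(p_exact)  # 0.0
```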

---

### Key Differences Between Discrete and Continuous Random Variables

| **Feature**                     | **Discrete Random Variable**               | **Continuous Random Variable**            |
|----------------------------------|--------------------------------------------|-------------------------------------------|
| **Nature of Values**            | Takes specific, countable values.          | Takes an infinite number of values within a range. |
| **Examples**                     | Number of cars, number of students, number of goals. | Height, weight, time, temperature. |
| **Values**                       | Finite or countably infinite.              | Infinite (uncountable) values within a range. |
| **Probability Distribution**     | Probability Mass Function (PMF).           | Probability Density Function (PDF). |
| **Probability of Exact Value**   | Can assign a non-zero probability to exact values. | Probability of a specific value is 0; probabilities are over intervals. |
| **Measurement Type**             | Associated with **counting**.              | Associated with **measuring**. |

### Examples to Illustrate

#### 1. Discrete Random Variable:
- **Example**: **Rolling a Die**
  - Random Variable \( X \) = the number rolled.
  - Possible values: \( X = 1, 2, 3, 4, 5, 6 \).
  - Since the outcome is one of six distinct values, the random variable is **discrete**.

#### 2. Continuous Random Variable:
- **Example**: **Time to Complete a Task**
  - Random Variable \( Y \) = time taken to finish a task.
  - Possible values: \( Y = 12.3 \) seconds, \( Y = 12.31 \) seconds, \( Y = 12.314 \) seconds, and so on.
  - Since the value can be any number within a range, the random variable is **continuous**.

---

### Conclusion

- **Random variables** are used to represent the outcomes of random processes.
- A **discrete random variable** can only take distinct, countable values (like the number of cars passing a point), while a **continuous random variable** can take any value within a range (like the height of a person).
- Understanding the distinction between **discrete** and **continuous** random variables is essential because they are handled differently in statistical analysis.

## Question No.12 Provide an example dataset, calculate both covariance and correlation, and interpret the results.
### What Are Covariance and Correlation?

Both **covariance** and **correlation** are measures that help us understand the relationship between two variables. Let’s break down their meanings:

- **Covariance** tells you how two variables change together. If both variables tend to increase together, covariance will be positive. If one increases while the other decreases, covariance will be negative. If there is no consistent relationship, covariance will be close to zero.

- **Correlation** is a standardized version of covariance, which gives you a measure of how strong the relationship is, and it also tells you whether the relationship is positive or negative. The value of correlation ranges from **-1** (perfect negative correlation) to **+1** (perfect positive correlation). A correlation close to **0** means there is little or no linear relationship between the variables.

---

### Example Dataset

Let's consider the following dataset, which shows the number of **hours studied** and the **exam scores** of 5 students:

| Student | Hours Studied (X) | Exam Score (Y) |
|---------|-------------------|----------------|
| 1       | 2                 | 50             |
| 2       | 3                 | 55             |
| 3       | 5                 | 70             |
| 4       | 6                 | 75             |
| 5       | 8                 | 90             |

We will calculate the **covariance** and **correlation** between the two variables: **Hours Studied (X)** and **Exam Score (Y)**.

---

### Step 1: Calculate Covariance

Covariance is calculated using the following formula:

\[
\text{Cov}(X, Y) = \frac{\sum (X_i - \overline{X})(Y_i - \overline{Y})}{n}
\]

Where:
- \( X_i \) and \( Y_i \) are the individual values of the variables \( X \) and \( Y \),
- \( \overline{X} \) and \( \overline{Y} \) are the means (averages) of \( X \) and \( Y \),
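- \( n \) is the number of data points (in this case, 5). Dividing by \( n \) gives the **population** covariance; the **sample** covariance divides by \( n - 1 \) instead.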

#### Step 1.1: Find the Means of \( X \) and \( Y \)

\[
\overline{X} = \frac{2 + 3 + 5 + 6 + 8}{5} = \frac{24}{5} = 4.8
\]

\[
\overline{Y} = \frac{50 + 55 + 70 + 75 + 90}{5} = \frac{340}{5} = 68
\]

#### Step 1.2: Calculate the Differences from the Mean and Multiply

Now, we’ll calculate the differences between each data point and the mean, and multiply them for each pair:

| Student | \( X_i \) | \( Y_i \) | \( X_i - \overline{X} \) | \( Y_i - \overline{Y} \) | \( (X_i - \overline{X})(Y_i - \overline{Y}) \) |
|---------|----------|----------|--------------------------|--------------------------|-----------------------------------------------|
| 1       | 2        | 50       | 2 - 4.8 = -2.8           | 50 - 68 = -18            | (-2.8)(-18) = 50.4                           |
| 2       | 3        | 55       | 3 - 4.8 = -1.8           | 55 - 68 = -13            | (-1.8)(-13) = 23.4                           |
| 3       | 5        | 70       | 5 - 4.8 = 0.2            | 70 - 68 = 2              | (0.2)(2) = 0.4                               |
| 4       | 6        | 75       | 6 - 4.8 = 1.2            | 75 - 68 = 7              | (1.2)(7) = 8.4                               |
| 5       | 8        | 90       | 8 - 4.8 = 3.2            | 90 - 68 = 22             | (3.2)(22) = 70.4                              |

#### Step 1.3: Sum of the Products

Now, sum the products of the differences:

\[
\text{Sum of Products} = 50.4 + 23.4 + 0.4 + 8.4 + 70.4 = 153
\]

#### Step 1.4: Calculate Covariance

Now, divide the sum by the number of data points (5) to get the covariance:

\[
\text{Cov}(X, Y) = \frac{153}{5} = 30.6
\]

So, the covariance between **Hours Studied** and **Exam Score** is **30.6**.
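
The steps above can be sketched in plain Python. The variable names are illustrative; the code also prints the sample covariance (dividing by \( n - 1 \)) for comparison, which is not the version used in this walkthrough.

```python
hours = [2, 3, 5, 6, 8]
scores = [50, 55, 70, 75, 90]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Sum of products of deviations from the means.
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))

cov_population = sum_products / n    # divides by n, as in the steps above
cov_sample = sum_products / (n - 1)  # divides by n - 1 (sample covariance)

print(round(cov_population, 2))  # 30.6
print(round(cov_sample, 2))      # 38.25
```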

---

### Step 2: Calculate Correlation

Correlation is calculated using the formula:

\[
\text{Correlation}(X, Y) = \frac{\text{Cov}(X, Y)}{ \sigma_X \sigma_Y }
\]

Where:
- \( \text{Cov}(X, Y) \) is the covariance we just calculated,
- \( \sigma_X \) is the standard deviation of \( X \) (Hours Studied),
- \( \sigma_Y \) is the standard deviation of \( Y \) (Exam Score).

#### Step 2.1: Calculate Standard Deviations of \( X \) and \( Y \)

The standard deviation is the square root of the variance. The formula for the variance is:

\[
\text{Variance} = \frac{\sum (X_i - \overline{X})^2}{n}
\]

##### Standard Deviation of \( X \) (Hours Studied):

\[
\text{Variance}_X = \frac{(-2.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2 + (3.2)^2}{5} = \frac{7.84 + 3.24 + 0.04 + 1.44 + 10.24}{5} = \frac{22.8}{5} = 4.56
\]

\[
\sigma_X = \sqrt{4.56} \approx 2.1354
\]

##### Standard Deviation of \( Y \) (Exam Score):

\[
\text{Variance}_Y = \frac{(-18)^2 + (-13)^2 + 2^2 + 7^2 + 22^2}{5} = \frac{324 + 169 + 4 + 49 + 484}{5} = \frac{1030}{5} = 206
\]

\[
\sigma_Y = \sqrt{206} \approx 14.3527
\]

#### Step 2.2: Calculate Correlation

Now that we have the covariance and standard deviations, we can calculate the correlation:

\[
\text{Correlation}(X, Y) = \frac{30.6}{2.1354 \times 14.3527} \approx \frac{30.6}{30.65} \approx 0.998
\]

So, the correlation between **Hours Studied** and **Exam Score** is approximately **0.998**, which is very close to **1**.

---

### Interpretation of Results

- **Covariance** = **30.6**: This is a positive covariance, which means that as the number of hours studied increases, the exam scores tend to increase as well. The magnitude of the covariance (30.6) doesn’t tell us how strong the relationship is; we need to look at the correlation for that.

- **Correlation** = **0.998**: This is a very strong positive correlation, very close to **1**. It indicates that there is a very strong, nearly perfect linear relationship between the number of hours studied and the exam score. In other words, the more hours a student studies, the higher their exam score tends to be.

### Conclusion

- **Covariance** helps us understand the direction of the relationship (positive or negative), but the **correlation** gives us a clearer idea of how strong that relationship is.
- In this case, the positive covariance and the very strong correlation suggest that studying more hours is closely associated with higher exam scores.
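
The correlation for this dataset can be checked with a short sketch in plain Python. `pearson_r` is a helper name introduced here; it computes the correlation directly from the sums of squared deviations, which avoids rounding the standard deviations along the way.

```python
import math

hours = [2, 3, 5, 6, 8]
scores = [50, 55, 70, 75, 90]

def pearson_r(xs, ys):
    """Pearson correlation: covariance over the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors cancel, so r is the same for population or sample formulas.
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # r = 0.998
```

Because the divisor cancels between the numerator and the denominator, the correlation does not depend on whether the covariance was computed with \( n \) or \( n - 1 \).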