# Q1. What are the three measures of central tendency?

The three measures of central tendency are:

1. **Mean**: The arithmetic average of a dataset, calculated by summing all the data points and dividing by the number of data points. It provides a measure of the central value in a dataset.

2. **Median**: The middle value of a dataset when the data points are arranged in ascending or descending order. If there is an even number of data points, the median is the average of the two middle values. It is particularly useful for datasets with outliers or skewed distributions.

3. **Mode**: The value that appears most frequently in a dataset. A dataset may have one mode, more than one mode (bimodal or multimodal), or no mode if all values are unique. It is especially useful for categorical data.

# Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

The **mean, median, and mode** are all measures of central tendency, but they each capture different aspects of a dataset's central point. Here’s a breakdown of the differences and how they are used:

### **1. Mean (Arithmetic Average)**
- **Definition**: The mean is calculated by adding up all the data points in a dataset and then dividing by the number of data points.
- **Calculation**: 
  \[
  \text{Mean} = \frac{\sum X_i}{N}
  \]
  Where \(X_i\) represents each data point and \(N\) is the total number of data points.
- **Use**: 
  - **Strengths**: The mean is useful for datasets with a symmetrical distribution, where all values are fairly evenly spread around the central point. It takes into account every data point in the dataset.
  - **Weaknesses**: The mean is sensitive to outliers (extremely high or low values) which can skew the average and make it unrepresentative of the typical value in the dataset.
  - **Example**: In a dataset of salaries [30,000, 35,000, 40,000, 50,000, 500,000], the mean salary is 131,000, which is higher than four of the five salaries. The single very high salary pulls the average up and gives a misleading impression of the typical salary.

### **2. Median**
- **Definition**: The median is the middle value in a dataset when the data points are arranged in ascending or descending order. If there is an even number of data points, the median is the average of the two middle values.
- **Use**: 
  - **Strengths**: The median is robust against outliers and skewed data. It provides a better measure of central tendency in datasets that are not symmetrically distributed.
  - **Weaknesses**: The median does not take into account the exact values of all data points, just their relative order.
  - **Example**: In the same salary dataset [30,000, 35,000, 40,000, 50,000, 500,000], the median salary would be 40,000, which might be a more accurate reflection of the typical salary than the mean.

### **3. Mode**
- **Definition**: The mode is the value that occurs most frequently in a dataset. A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).
- **Use**: 
  - **Strengths**: The mode is especially useful for categorical data or for identifying the most common value in a dataset. It is the only measure of central tendency that can be used with nominal data (categories without a numerical order).
  - **Weaknesses**: The mode might not exist in a dataset where all values are unique, and it is not as informative when there are multiple modes.
  - **Example**: In a dataset of shoe sizes [7, 8, 8, 9, 10], the mode is 8, indicating that size 8 is the most common.

### **How They Measure Central Tendency**:
- **Mean**: Measures the central value by considering all data points and balancing them around the arithmetic average. It gives a sense of the overall level of data but can be skewed by outliers.
- **Median**: Measures the central point by focusing on the middle value, providing a more accurate central tendency for skewed distributions or datasets with outliers.
- **Mode**: Measures the most frequent value, giving insight into the most common or typical case in the dataset, especially useful for categorical or discrete data.

### **When to Use Each Measure**:
- **Use the Mean**: When the data is symmetrically distributed without outliers, and you want an overall average.
- **Use the Median**: When the data is skewed or contains outliers, and you want a central value that is not affected by extreme values.
- **Use the Mode**: When dealing with categorical data or when you need to identify the most common value in a dataset.
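
As a rough illustration of the comparison above, the salary and shoe-size examples can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
import statistics

# Salary example from the text: one very large value acts as an outlier.
salaries = [30_000, 35_000, 40_000, 50_000, 500_000]
salary_mean = statistics.mean(salaries)      # pulled upward by 500,000
salary_median = statistics.median(salaries)  # middle value of the sorted data

# Shoe-size example from the text: 8 is the only repeated value.
shoe_sizes = [7, 8, 8, 9, 10]
shoe_mode = statistics.mode(shoe_sizes)

print(salary_mean, salary_median, shoe_mode)
```

The mean lands far above the median here, which is the pattern to expect whenever a few extreme values sit on one side of the data.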

# Q3. Compute the three measures of central tendency for the given height data:
## [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [1]:
import numpy as np
import pandas as pd


In [9]:
df = pd.DataFrame([178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5])
df.head()

|   | 0 |
|---|---|
| 0 | 178.0 |
| 1 | 177.0 |
| 2 | 176.0 |
| 3 | 177.0 |
| 4 | 178.2 |


In [10]:
mean = df.mean()

In [11]:
mean

0    177.01875
dtype: float64

In [12]:
median = df.median()

In [13]:
median

0    177.0
dtype: float64

In [14]:
(180 + 175)/2

177.5

In [15]:
from scipy import stats

In [16]:
stats.mode(df)

ModeResult(mode=array([[177.]]), count=array([[3]]))
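
One detail the single value reported above does not show is whether any other height is tied for most frequent. A small sketch using the standard library's `statistics.multimode` (Python 3.8+) on the same heights makes any tie visible; the variable names are illustrative.

```python
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns every value that shares the highest count.
modes = statistics.multimode(heights)
counts = {m: heights.count(m) for m in modes}

print(sorted(modes), counts)
```

Both 177 and 178 occur three times, so this dataset is bimodal even though only 177 appears in the result above.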

# Q4. Find the standard deviation for the given data:
## [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [17]:
std = df.std()
std

0    1.847239
dtype: float64
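
Note that pandas' `std()` divides by n - 1 by default (the sample standard deviation), whereas a population formula divides by N. As a sketch, the two conventions can be compared on the same heights with the standard library's `statistics.stdev` and `statistics.pstdev`:

```python
import statistics

heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
           178.9, 176.2, 177, 172.5, 178, 176.5]

sample_std = statistics.stdev(heights)       # divides by n - 1, like df.std()
population_std = statistics.pstdev(heights)  # divides by N

print(round(sample_std, 6), round(population_std, 6))
```

The sample value agrees with the pandas result above; the population value is slightly smaller because the same sum of squared deviations is divided by 16 instead of 15.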

# Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Measures of dispersion, such as **range**, **variance**, and **standard deviation**, are used to describe how spread out the data points in a dataset are. These measures provide insights into the variability, consistency, and predictability of the data. Here’s how each measure works and how they are used:

### **1. Range**
- **Definition**: The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset.
- **Calculation**:
  \[
  \text{Range} = \text{Maximum Value} - \text{Minimum Value}
  \]
- **Use**: 
  - The range provides a quick snapshot of the spread of data, showing the extent between the smallest and largest values. 
  - However, it only considers the two extreme values and does not provide information about the distribution of the other data points.
- **Example**: 
  - In a dataset of exam scores: [45, 50, 65, 70, 90], the range is 90 - 45 = 45.
  - This tells us that the difference between the highest and lowest scores is 45 points.

### **2. Variance**
- **Definition**: Variance measures the average squared deviation of each data point from the mean of the dataset. It quantifies how much the data points differ from the mean.
- **Calculation**:
  \[
  \text{Variance} (\sigma^2) = \frac{\sum (X_i - \mu)^2}{N}
  \]
  Where \(X_i\) is each data point, \(\mu\) is the mean of the dataset, and \(N\) is the number of data points.
- **Use**:
  - Variance provides a measure of how much the data points are spread out around the mean. A higher variance indicates that the data points are more dispersed.
  - It is particularly useful for comparing the spread of two or more datasets.
- **Example**:
  - Consider two datasets of exam scores:
    - Dataset A: [50, 60, 70, 80, 90]
    - Dataset B: [70, 72, 74, 76, 78]
  - Dataset A has a mean of 70 and Dataset B has a mean of 74, but Dataset A has a much larger variance (200 versus 8, using the population formula above), indicating that the scores in Dataset A are far more spread out around their mean.

### **3. Standard Deviation**
- **Definition**: The standard deviation is the square root of the variance. It can be read as the typical distance of a data point from the mean (strictly, the root-mean-square deviation), expressed in the same units as the original data.
- **Calculation**:
  \[
  \text{Standard Deviation} (\sigma) = \sqrt{\text{Variance}}
  \]
- **Use**:
  - Standard deviation is widely used because it provides a clear interpretation of the spread in the same units as the data. 
  - It tells us how much the data points typically deviate from the mean. A smaller standard deviation indicates that the data points are closely clustered around the mean, while a larger standard deviation indicates greater spread.
  - Standard deviation is also crucial for understanding the reliability and predictability of data, as well as for making inferences about a population from a sample.
- **Example**:
  - Using the same datasets from the variance example:
    - Dataset A will have a larger standard deviation than Dataset B, reflecting the greater spread of scores around the mean in Dataset A.

### **Example Illustration**:
Let's say we have two classes, A and B, with the following test scores:

- **Class A**: [70, 75, 80, 85, 90]
- **Class B**: [60, 70, 80, 90, 100]

- **Range**:
  - Class A: 90 - 70 = 20
  - Class B: 100 - 60 = 40
  - **Interpretation**: Class B has a wider range, indicating a greater spread between the lowest and highest scores compared to Class A.

- **Variance**:
  - Variance for Class A will be smaller than that for Class B, as Class B has scores that are farther from the mean.

- **Standard Deviation**:
  - Class A will have a smaller standard deviation, indicating that the scores are more closely clustered around the mean, whereas Class B's larger standard deviation indicates more variability in scores.

### **Summary of Usage**:
- **Range**: Useful for a quick understanding of the spread between the extreme values.
- **Variance**: Provides a deeper understanding of the overall spread of the data around the mean, useful for comparing variability between datasets.
- **Standard Deviation**: Offers a practical measure of spread in the same units as the data, widely used for assessing consistency, predictability, and making inferences.

Together, these measures help to describe the variability in a dataset, which is essential for understanding the overall distribution and making informed decisions based on the data.
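
The Class A and Class B illustration can be worked through numerically. The sketch below assumes the population formulas given earlier (dividing by N) and uses a hypothetical helper named `spread`:

```python
import statistics

class_a = [70, 75, 80, 85, 90]
class_b = [60, 70, 80, 90, 100]

def spread(scores):
    """Return (range, population variance, population standard deviation)."""
    value_range = max(scores) - min(scores)
    variance = statistics.pvariance(scores)  # divides by N
    std_dev = statistics.pstdev(scores)      # square root of the population variance
    return value_range, variance, std_dev

range_a, var_a, std_a = spread(class_a)
range_b, var_b, std_b = spread(class_b)

print(range_a, var_a, round(std_a, 2))
print(range_b, var_b, round(std_b, 2))
```

Both classes share a mean of 80, yet Class B's variance is four times that of Class A and its standard deviation is twice as large, which matches the qualitative interpretation above.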

# Q6. What is a Venn diagram?

A **Venn diagram** is a graphical representation used to show the relationships between different sets or groups of items. It consists of overlapping circles, where each circle represents a set, and the areas where the circles overlap represent the elements that are common to those sets.

### **Key Features of a Venn Diagram:**

1. **Circles Representing Sets**: Each circle in the diagram represents a set, which is a collection of items that share a common characteristic.

2. **Overlapping Areas**: The areas where the circles overlap represent the intersection of the sets, meaning the elements that are common to both or all sets.

3. **Non-Overlapping Areas**: The areas of the circles that do not overlap represent elements that are unique to each set.

4. **Universal Set (Sometimes Included)**: In some Venn diagrams, a rectangle surrounding all the circles represents the universal set, which includes all possible elements under consideration.

### **Uses of Venn Diagrams:**

- **Illustrating Relationships**: Venn diagrams are commonly used to visually illustrate the relationships between different sets, such as showing commonalities and differences.
  
- **Problem Solving**: They are used in mathematics and logic to solve problems involving set theory, probability, and logic.

- **Comparison**: Venn diagrams help in comparing and contrasting different groups or categories.

### **Example of a Venn Diagram:**

Consider three sets:
- **Set A**: People who like apples.
- **Set B**: People who like bananas.
- **Set C**: People who like cherries.

A Venn diagram would have three overlapping circles, each representing one of these sets:
- The area where all three circles overlap represents people who like apples, bananas, and cherries.
- The area where only circles A and B overlap represents people who like both apples and bananas, but not cherries.
- The non-overlapping part of circle A represents people who only like apples, and so on.

Venn diagrams are a simple yet powerful tool to visualize the logical relationships between different sets.

# Q7. For the two given sets A = (2,3,4,5,6,7) & B = (0,2,6,8,10). Find:
## (i) A ∩ B
## (ii) A ∪ B

(i) (2,6)


(ii) (0,2,3,4,5,6,7,8,10)
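
As a quick check, the same two operations can be sketched with Python's built-in `set` type, where `&` is intersection and `|` is union:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # elements that are in both A and B
union = A | B         # elements that are in A, in B, or in both

print(sorted(intersection), sorted(union))
```

Every element of B, including 8, belongs to the union, while only 2 and 6 are shared by both sets.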

# Q8. What do you understand about skewness in data?

**Skewness** refers to the asymmetry or lack of symmetry in the distribution of data. It measures the degree to which a distribution departs from symmetry around its center; a symmetric distribution such as the normal distribution has zero skewness. Skewness helps in understanding the direction and extent to which the data is skewed, providing insights into the shape of the data distribution.

### **Types of Skewness:**

1. **Positive Skewness (Right-Skewed)**
   - **Description**: In a positively skewed distribution, the tail on the right side of the distribution (the higher values) is longer or fatter than the left side. Most of the data points are concentrated on the left side of the distribution.
   - **Characteristics**:
     - The mean is greater than the median.
     - The majority of the data values are lower, with a few higher values pulling the tail to the right.
   - **Example**: Income distribution in many populations is often positively skewed, with a large number of people earning lower to middle incomes and a smaller number of people earning very high incomes.

2. **Negative Skewness (Left-Skewed)**
   - **Description**: In a negatively skewed distribution, the tail on the left side of the distribution (the lower values) is longer or fatter than the right side. Most of the data points are concentrated on the right side of the distribution.
   - **Characteristics**:
     - The mean is less than the median.
     - The majority of the data values are higher, with a few lower values pulling the tail to the left.
   - **Example**: Exam scores where most students score high but a few score much lower could result in a negatively skewed distribution.

3. **No Skewness (Symmetrical Distribution)**
   - **Description**: A perfectly symmetrical distribution has no skewness, meaning the left and right sides of the distribution are mirror images. The mean, median, and mode are all equal.
   - **Example**: A perfect bell curve or normal distribution is an example of a distribution with no skewness.

### **Implications of Skewness:**
- **Mean vs. Median**: In skewed distributions, the mean is pulled in the direction of the skew (towards the tail), while the median remains more central, providing a better measure of central tendency in such cases.
- **Data Analysis**: Skewness is important in data analysis because many statistical methods assume approximately normal data or, as in linear regression, normally distributed errors. If data is strongly skewed, transformations or non-parametric methods may be needed.
- **Decision Making**: Understanding skewness helps in making informed decisions, especially in areas like finance or economics, where skewed distributions can have significant implications.

### **Example:**
- **Right-Skewed**: Consider a distribution of house prices in a city. If most houses are moderately priced but a few luxury houses are extremely expensive, the distribution would be right-skewed.
- **Left-Skewed**: Consider a distribution of ages at retirement in a company where most employees retire at an older age, but a few retire much earlier. This would result in a left-skewed distribution.

In summary, skewness provides valuable information about the direction and extent of asymmetry in a dataset, which is crucial for accurate data interpretation and analysis.
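
To make the sign of skewness concrete, here is a minimal sketch that computes the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5) in plain Python. The choice of this particular coefficient and the two small datasets are assumptions made for illustration; they are not taken from the text above.

```python
import statistics

def skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical house prices (in thousands): two very expensive homes form a right tail.
house_prices = [200, 210, 220, 230, 240, 250, 900, 1500]
# Hypothetical retirement ages: two early retirements form a left tail.
retirement_ages = [45, 50, 63, 64, 65, 65, 66, 67]

right_skew = skewness(house_prices)
left_skew = skewness(retirement_ages)

print(right_skew > 0, statistics.mean(house_prices) > statistics.median(house_prices))
print(left_skew < 0, statistics.mean(retirement_ages) < statistics.median(retirement_ages))
```

In both toy datasets the sign of the coefficient agrees with the mean-versus-median rule described above: positive skew with mean above median, negative skew with mean below median.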

# Q9. If data is right-skewed, what is the position of the median with respect to the mean?

If a dataset is **right-skewed** (positively skewed), the **mean** will be greater than the **median**.

### **Explanation:**
- In a right-skewed distribution, the tail on the right side of the distribution is longer, which means there are some extreme values that are much higher than the rest of the data.
- These higher values pull the mean toward the right (higher values), while the median, being the middle value, is less affected by the extreme values and remains closer to the majority of the data points.
- As a result, the mean is positioned to the right of the median in a right-skewed distribution.

### **Visual Representation:**
- **Mean** > **Median** > **Mode** (in a typical right-skewed distribution).

For example, consider the distribution of income in a population where most people earn a moderate income, but a few people earn very high incomes. The mean income will be higher than the median income because the very high incomes will increase the average, but they won't affect the middle (median) as much.

# Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

**Covariance** and **correlation** are both measures that describe the relationship between two variables, but they have different meanings, interpretations, and uses in statistical analysis. Here’s how they differ and how they are used:

### **1. Covariance**

- **Definition**: Covariance measures the degree to which two variables change together. It indicates whether an increase in one variable tends to result in an increase (or decrease) in the other variable.
  
- **Calculation**:
  \[
  \text{Cov(X, Y)} = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{N}
  \]
  Where \(X_i\) and \(Y_i\) are the data points for the variables \(X\) and \(Y\), \(\mu_X\) and \(\mu_Y\) are the means of \(X\) and \(Y\), and \(N\) is the number of data points.

- **Interpretation**:
  - **Positive Covariance**: Indicates that as one variable increases, the other tends to increase as well.
  - **Negative Covariance**: Indicates that as one variable increases, the other tends to decrease.
  - **Magnitude**: The magnitude of covariance is not standardized, so it depends on the scale of the variables, making it difficult to compare across different datasets or variables.

- **Use**: Covariance is used to understand the direction of the linear relationship between two variables. However, due to its scale dependency, it’s more commonly used as an intermediate step in calculating other metrics like correlation.

### **2. Correlation**

- **Definition**: Correlation is a standardized measure of the relationship between two variables that indicates both the strength and direction of the linear relationship between them. The most common measure is the Pearson correlation coefficient.

- **Calculation**:
  \[
  \text{Correlation (r)} = \frac{\text{Cov(X, Y)}}{\sigma_X \sigma_Y}
  \]
  Where \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of variables \(X\) and \(Y\).

- **Interpretation**:
  - **Range**: Correlation values range from -1 to +1.
    - **+1**: Perfect positive linear relationship.
    - **0**: No linear relationship.
    - **-1**: Perfect negative linear relationship.
  - **Standardization**: Unlike covariance, correlation is unitless, making it easier to compare across different datasets and variables.

- **Use**: Correlation is widely used to quantify the strength and direction of the relationship between two variables. It’s particularly useful in fields like finance, economics, and social sciences to understand associations and predict outcomes.

### **Key Differences:**

1. **Scale Dependency**:
   - **Covariance**: Depends on the scale of the variables. It can take any positive or negative value and is influenced by the units of measurement.
   - **Correlation**: Standardized measure, independent of the scale of the variables. It always ranges between -1 and +1.

2. **Interpretability**:
   - **Covariance**: Indicates only the direction of the relationship (positive or negative) but does not provide a clear sense of the strength of the relationship.
   - **Correlation**: Provides both the direction and the strength of the relationship, making it more interpretable.

3. **Usage**:
   - **Covariance**: Often used as a preliminary step to calculate correlation or in certain types of financial models.
   - **Correlation**: More commonly used in statistical analysis to assess the strength and direction of relationships between variables.

### **Example:**
Suppose you are analyzing the relationship between the number of hours studied and the scores obtained in an exam.

- **Covariance**: If the covariance is positive, it suggests that more hours studied are associated with higher scores. However, the magnitude of the covariance might be hard to interpret on its own.

- **Correlation**: If the correlation coefficient is 0.8, it not only tells you that there is a positive relationship (more study hours are associated with higher scores) but also that this relationship is strong. Correlation alone does not show that studying causes the higher scores.

### **Summary:**
- **Covariance** gives you an idea of the direction of the relationship between two variables, but its scale dependency makes it less interpretable.
- **Correlation** provides a more intuitive and standardized measure of both the strength and direction of the linear relationship between two variables, making it a more commonly used metric in statistical analysis.
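
The scale dependency of covariance can be seen in a small sketch. The code below implements the population formulas given above in plain Python; the hours and scores are hypothetical values invented for illustration, and the function names are not from any library.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations (divides by N)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]
minutes = [h * 60 for h in hours]  # same information, different unit

cov_hours = covariance(hours, scores)
cov_minutes = covariance(minutes, scores)
r_hours = correlation(hours, scores)
r_minutes = correlation(minutes, scores)

print(cov_hours, cov_minutes, round(r_hours, 3), round(r_minutes, 3))
```

Switching the unit from hours to minutes multiplies the covariance by 60, while the correlation coefficient stays the same, which is why correlation is the easier quantity to compare across datasets.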

# Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

The **sample mean** is the average of all the data points in a sample. It is calculated by summing up all the observations in the sample and then dividing by the number of observations.

### **Formula for Sample Mean**:
\[
\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}
\]
Where:
- \(\bar{X}\) is the sample mean.
- \(X_i\) represents each individual observation in the sample.
- \(n\) is the number of observations in the sample.
- \(\sum_{i=1}^{n} X_i\) is the sum of all the observations in the sample.

### **Example Calculation**:

Suppose we have a dataset representing the number of hours five students studied for an exam: [4, 6, 8, 5, 7].

To calculate the sample mean:

1. **Step 1: Sum of the Observations**:
   \[
   4 + 6 + 8 + 5 + 7 = 30
   \]

2. **Step 2: Number of Observations (n)**:
   \[
   n = 5
   \]

3. **Step 3: Calculate the Sample Mean**:
   \[
   \bar{X} = \frac{30}{5} = 6
   \]

### **Result**:
The sample mean \(\bar{X}\) is **6**. This means that, on average, the students studied for 6 hours.
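
The three steps translate directly into a few lines of Python (variable names are illustrative):

```python
study_hours = [4, 6, 8, 5, 7]

total = sum(study_hours)   # Step 1: sum of the observations
n = len(study_hours)       # Step 2: number of observations
sample_mean = total / n    # Step 3: divide the sum by n

print(total, n, sample_mean)
```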

# Q12. For normally distributed data, what is the relationship between its measures of central tendency?

For a **normal distribution**, the measures of central tendency—**mean**, **median**, and **mode**—are all **equal**. This is one of the key properties of a normal distribution.

### **Key Points:**
- **Mean**: The average of all data points in the distribution.
- **Median**: The middle value when the data points are arranged in ascending or descending order.
- **Mode**: The value that appears most frequently in the distribution.

### **Relationship in a Normal Distribution:**
- In a perfectly normal distribution, the curve is symmetrical around the center.
- The **mean**, **median**, and **mode** all lie at the peak of the curve, at the center of the distribution.
- Therefore, in a normal distribution:
  \[
  \text{Mean} = \text{Median} = \text{Mode}
  \]

### **Visual Representation**:
Imagine a bell-shaped curve (the normal distribution curve):
- The highest point of the curve represents the **mean**, **median**, and **mode**.
- The left and right sides of the curve are mirror images, meaning that the distribution of values is symmetric around this central point.

### **Example**:
If a dataset of exam scores follows a normal distribution, and the mean score is 75, then:
- The median score will also be 75.
- The most common score (mode) will also be 75.

### **Summary**:
In a normal distribution, the equality of the mean, median, and mode reflects the perfect symmetry of the data around the center. This property is essential for many statistical methods that assume normality in data distribution.
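
A simulation can illustrate this property approximately. The sketch below draws scores from a normal distribution with an assumed mean of 75 and an assumed standard deviation of 10 (the standard deviation is not given in the text) and compares the sample mean and median. A finite sample only approximates the theoretical equality, and the mode is omitted because every value in a continuous sample is essentially unique.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated exam scores: mean 75, standard deviation 10 (assumed for illustration).
scores = [random.gauss(75, 10) for _ in range(100_000)]

sample_mean = statistics.fmean(scores)
sample_median = statistics.median(scores)

print(round(sample_mean, 2), round(sample_median, 2))
```

With this many draws, the two values should agree to within a small fraction of a point and sit close to 75.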

# Q13. How is covariance different from correlation?

**Covariance** and **correlation** are both measures that describe the relationship between two variables, but they differ in their definitions, interpretations, and uses. Here’s how they compare:

### **1. Covariance**

- **Definition**: Covariance measures the degree to which two variables change together. It indicates whether an increase in one variable tends to result in an increase or decrease in another variable.

- **Calculation**:
  \[
  \text{Cov(X, Y)} = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{N}
  \]
  Where \(X_i\) and \(Y_i\) are the individual observations for variables \(X\) and \(Y\), \(\mu_X\) and \(\mu_Y\) are the means of \(X\) and \(Y\), and \(N\) is the number of observations.

- **Interpretation**:
  - **Positive Covariance**: Indicates that as one variable increases, the other variable also tends to increase.
  - **Negative Covariance**: Indicates that as one variable increases, the other variable tends to decrease.
  - **Magnitude**: The magnitude of covariance is not standardized, so it can be difficult to interpret the strength of the relationship without context.

- **Use**: Covariance is often used to understand the direction of the relationship between two variables. It is also a component of the calculation for correlation.

### **2. Correlation**

- **Definition**: Correlation measures the strength and direction of the linear relationship between two variables. The most common type is the Pearson correlation coefficient.

- **Calculation**:
  \[
  \text{Correlation (r)} = \frac{\text{Cov(X, Y)}}{\sigma_X \sigma_Y}
  \]
  Where \(\sigma_X\) and \(\sigma_Y\) are the standard deviations of \(X\) and \(Y\).

- **Interpretation**:
  - **Range**: Correlation values range from -1 to +1.
    - **+1**: Perfect positive linear relationship.
    - **0**: No linear relationship.
    - **-1**: Perfect negative linear relationship.
  - **Standardization**: Correlation is a standardized measure, making it easier to interpret and compare across different datasets or variables.

- **Use**: Correlation provides both the strength and direction of the relationship between two variables. It is widely used in statistical analysis to quantify how strongly two variables are related.

### **Key Differences:**

1. **Scale Dependency**:
   - **Covariance**: Dependent on the units of measurement of the variables, which makes its magnitude difficult to interpret on its own. It can take any value, positive or negative.
   - **Correlation**: Standardized, so it is unitless and ranges between -1 and +1. This makes it easier to interpret and compare the strength and direction of relationships.

2. **Interpretability**:
   - **Covariance**: Indicates the direction of the relationship but not its strength in a standardized way. It is less interpretable without additional context.
   - **Correlation**: Provides a clear and standardized measure of both the strength and direction of the linear relationship between two variables.

3. **Usage**:
   - **Covariance**: Often used in multivariate statistics and in the calculation of correlation. It is also used in portfolio theory to understand how asset returns move together.
   - **Correlation**: Commonly used to understand and quantify the relationship between two variables in various fields, including finance, social sciences, and more.

### **Example:**

Suppose you are analyzing the relationship between hours studied and exam scores:

- **Covariance**: If the covariance is positive, it suggests that as the number of hours studied increases, exam scores tend to increase as well. However, the actual magnitude of the covariance might not be very informative on its own.

- **Correlation**: If the correlation coefficient is 0.8, it indicates a strong positive linear relationship between hours studied and exam scores. This standardized measure provides a clearer sense of how strongly the two variables are related.

In summary, while both covariance and correlation measure the relationship between variables, correlation provides a more interpretable and standardized measure of that relationship.

# Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

**Outliers** are data points that significantly differ from the other observations in a dataset. They can have a considerable impact on both measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation). Here's how outliers affect these measures:

### **1. Measures of Central Tendency**

- **Mean**: The mean is particularly sensitive to outliers. Because it is calculated by summing all the data points and dividing by the number of observations, extreme values can skew the mean significantly. A single outlier can dramatically change the mean, especially in smaller datasets.

  - **Example**: Consider a dataset of test scores: [50, 52, 53, 55, 60, 100]. 
    - Without the outlier, the mean is:
      \[
      \text{Mean} = \frac{50 + 52 + 53 + 55 + 60}{5} = 54
      \]
    - With the outlier (100), the mean becomes:
      \[
      \text{Mean} = \frac{50 + 52 + 53 + 55 + 60 + 100}{6} = \frac{370}{6} \approx 61.67
      \]
    - The outlier (100) increases the mean from 54 to about 61.67, showing the sensitivity of the mean to outliers.

- **Median**: The median, which is the middle value in a sorted dataset, is less affected by outliers. It only depends on the order of the values and not their magnitude, so extreme values have little effect on the median.

  - **Example**: Using the same dataset: [50, 52, 53, 55, 60, 100]
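    - Without the outlier, the five sorted values are [50, 52, 53, 55, 60], so the median is the middle (third) value:
      \[
      \text{Median} = 53
      \]
    - With the outlier (100), there are six values, so the median is the average of the two middle values:
      \[
      \text{Median} = \frac{53 + 55}{2} = 54
      \]
    - The median moves only from 53 to 54, even though the added value is 40 points above the previous maximum, showing its robustness to outliers.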

- **Mode**: The mode is the most frequently occurring value in the dataset. Outliers do not generally affect the mode unless they are frequent enough to become the most common value.

### **2. Measures of Dispersion**

- **Range**: The range is the difference between the maximum and minimum values. Outliers can greatly increase the range, as they affect the extreme values of the dataset.

  - **Example**: Using the test scores: [50, 52, 53, 55, 60] (no outliers)
    - Range = 60 - 50 = 10
    - With the outlier (100), the range becomes:
      \[
      \text{Range} = 100 - 50 = 50
      \]
    - The outlier increases the range from 10 to 50.

- **Variance** and **Standard Deviation**: Both variance and standard deviation measure the spread of data points around the mean. Because they are based on squared deviations from the mean, outliers can disproportionately increase these measures, reflecting a higher dispersion than what might be representative of the majority of the data.

  - **Example**: For the dataset [50, 52, 53, 55, 60], let’s calculate variance and standard deviation:
    - Mean = 54
    - Variance:
      \[
      \text{Variance} = \frac{(50 - 54)^2 + (52 - 54)^2 + (53 - 54)^2 + (55 - 54)^2 + (60 - 54)^2}{5}
      \]
      \[
      \text{Variance} = \frac{16 + 4 + 1 + 1 + 36}{5} = 11.6
      \]
    - Standard Deviation = \(\sqrt{11.6} \approx 3.41\)

    - With the outlier (100):
      - Mean = \(370 / 6 \approx 61.67\)
      - Variance:
        \[
        \text{Variance} = \frac{(50 - 61.67)^2 + (52 - 61.67)^2 + (53 - 61.67)^2 + (55 - 61.67)^2 + (60 - 61.67)^2 + (100 - 61.67)^2}{6}
        \]
        \[
        \text{Variance} = \frac{136.11 + 93.44 + 75.11 + 44.44 + 2.78 + 1469.44}{6} \approx 303.6
        \]
      - Standard Deviation = \(\sqrt{303.6} \approx 17.4\)

    - The variance rises from 11.6 to about 303.6 and the standard deviation from about 3.41 to about 17.4, reflecting how strongly a single outlier inflates these measures.

### **Summary:**
- **Mean**: Sensitive to outliers; can be skewed significantly.
- **Median**: Less affected by outliers; robust measure of central tendency.
- **Mode**: Generally unaffected by outliers unless the outlier is frequent.
- **Range**: Highly affected by outliers; reflects the spread between extreme values.
- **Variance and Standard Deviation**: Increase with outliers due to their squared deviation from the mean, showing greater dispersion.

Understanding the impact of outliers is crucial for accurate data analysis and interpretation, as they can distort statistical measures and lead to misleading conclusions.
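
The effect of the outlier on each measure can be computed directly. This is a minimal sketch using the standard `statistics` module with the population variance (dividing by N); `summarize` is a hypothetical helper name.

```python
import statistics

def summarize(scores):
    """Return the measures discussed above, using population variance (divides by N)."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "range": max(scores) - min(scores),
        "variance": statistics.pvariance(scores),
        "std_dev": statistics.pstdev(scores),
    }

without_outlier = summarize([50, 52, 53, 55, 60])
with_outlier = summarize([50, 52, 53, 55, 60, 100])

# Print each measure before and after adding the outlier (100).
for name in without_outlier:
    print(f"{name}: {without_outlier[name]:.2f} -> {with_outlier[name]:.2f}")
```

The median barely moves, while the mean, range, variance, and standard deviation all change substantially once the value 100 is included.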