### Q1. What are the three measures of central tendency?

Ans:

The **three main measures of central tendency** are:

### 1. Mean

* The **average** of all values in a dataset.

**Example:**
For data: 2, 4, 6
Mean = (2 + 4 + 6) / 3 = 4

Used when data has no extreme outliers.

### 2. Median

* The **middle value** when data is arranged in ascending or descending order.
* If the number of values is even, take the average of the two middle numbers.

**Example:**

For data: 1, 3, 5 → Median = 3

For data: 1, 3, 5, 7 → Median = (3 + 5)/2 = 4

Best used when data contains **outliers**.

### 3. Mode

* The **most frequently occurring value** in a dataset.

**Example:**
For data: 2, 3, 3, 5 → Mode = 3

Useful for **categorical data**.

---

### Summary

| Measure | Meaning             | Best Used When               |
| ------- | ------------------- | ---------------------------- |
| Mean    | Average             | No extreme values            |
| Median  | Middle value        | Data has outliers            |
| Mode    | Most frequent value | Categorical or repeated data |



### Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

Ans:

The **mean, median, and mode** are all measures of **central tendency**, but they differ in how they are calculated and when they are most useful.

## 1. Mean (Average)

**Definition:**
The mean is the sum of all values divided by the total number of values.

**Key Characteristics:**

* Uses **all data values**
* Affected by **extreme values (outliers)**
* Best for **numerical data** that is roughly symmetric, with no extreme outliers

**Example:**
Data: 10, 20, 30
Mean = (10 + 20 + 30) / 3 = 20

If data becomes: 10, 20, 100
Mean = (10 + 20 + 100) / 3 ≈ 43.33 (pulled sharply upward by 100)

## 2. Median (Middle Value)

**Definition:**
The median is the middle number when data is arranged in order.

**Key Characteristics:**

* Not affected much by **outliers**
* Best for **skewed data**
* Useful in income, salary, or property price data

**Example:**
Data: 10, 20, 30 → Median = 20
Data: 10, 20, 100 → Median = 20

Even though 100 is large, the median stays stable.
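The comparison above can be reproduced with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative assumptions, not part of the original answer.

```python
import statistics

# Same values as in the examples above: 30 is replaced by the outlier 100
original = [10, 20, 30]
with_outlier = [10, 20, 100]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    mean_value = statistics.mean(data)      # uses every value, so it shifts
    median_value = statistics.median(data)  # uses only the middle position
    print(f"{label}: mean = {mean_value:.2f}, median = {median_value}")
```

Only the mean moves when the largest value changes; the median depends on the ordering, not on how large the extreme value is.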

## 3. Mode (Most Frequent Value)

**Definition:**
The mode is the value that appears most frequently.

**Key Characteristics:**

* Can be used for **categorical data**
* A dataset can have **one mode (unimodal)**, **two modes (bimodal)**, **more than two modes (multimodal)**, or **no mode** (when no value repeats)

**Example:**
Data: 2, 4, 4, 5, 6
Mode = 4
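As a rough sketch, the cases listed above can be explored with `statistics.multimode`, which returns every value tied for the highest frequency. All lists except the first are hypothetical examples added for illustration.

```python
import statistics

unimodal = [2, 4, 4, 5, 6]                  # the example above: 4 appears twice
bimodal = [1, 1, 2, 3, 3]                   # hypothetical: 1 and 3 tie
all_unique = [7, 8, 9]                      # hypothetical: no value repeats
colours = ["red", "blue", "red", "green"]   # hypothetical categorical data

modes_unimodal = statistics.multimode(unimodal)
modes_bimodal = statistics.multimode(bimodal)
# When nothing repeats, multimode returns every value (all tie at one
# occurrence); by convention such a dataset is said to have no mode.
modes_unique = statistics.multimode(all_unique)
mode_colour = statistics.mode(colours)      # mode also works on non-numeric data

print(modes_unimodal, modes_bimodal, modes_unique, mode_colour)
```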

## How They Measure Central Tendency

All three measures try to find the **center** of a dataset:

* **Mean** → Gives overall average position
* **Median** → Shows central position in ordered data
* **Mode** → Shows most common value

In a **perfectly symmetric, unimodal dataset**, mean = median = mode.

In a **skewed dataset**, they differ: the mean is pulled furthest toward the long tail, while the median moves less.


### Q3. Compute the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [1]:
import statistics

# Given height data
heights = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

# Mean
mean_value = statistics.mean(heights)

# Median
median_value = statistics.median(heights)

# Mode (this dataset is bimodal)
mode_value = statistics.multimode(heights)

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)

Mean: 177.01875
Median: 177.0
Mode: [178, 177]


### Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [2]:
import statistics

# Given height data
heights = [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

# Population Standard Deviation
population_std = statistics.pstdev(heights)

# Sample Standard Deviation
sample_std = statistics.stdev(heights)

print("Population Standard Deviation:", population_std)
print("Sample Standard Deviation:", sample_std)

Population Standard Deviation: 1.7885814036548633
Sample Standard Deviation: 1.8472389305844188
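For reference, the two results differ only in the divisor. The notation below is an assumption added for illustration: $x_i$ are the values, $n$ is their count, $\mu$ is the population mean, and $\bar{x}$ is the sample mean (numerically the same value here).

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
```

With $n = 16$, the sample value is larger than the population value by a factor of $\sqrt{16/15} \approx 1.033$, which matches the two printed results.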


### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

Ans:

Measures of **dispersion** describe how much the data values are **spread out** or scattered around the center (mean/median).

They help us understand whether the dataset is **consistent (low spread)** or **highly variable (high spread)**.

## 1. Range

-> Definition:

The difference between the **maximum and minimum** values.


-> Use:

* Quick measure of total spread
* Very sensitive to outliers

-> Example:

Data: 10, 12, 15, 18, 20

Range = 20 − 10 = **10**

If we change 20 → 50

New Range = 50 − 10 = **40**

Range changes drastically due to one extreme value.

## 2. Variance

-> Definition:

The **average of squared deviations** from the mean.

-> Use:

* Measures how far each value is from the mean
* Larger variance → more spread
* Units are squared (e.g., cm²), so less intuitive

## 3. Standard Deviation

-> Definition:

Square root of variance.

-> Use:

* Most commonly used measure of spread
* Same unit as original data
* Tells how much data typically deviates from mean

# Complete Example

Consider two datasets:

Dataset A: 10, 12, 14, 16, 18
Dataset B: 5, 10, 15, 20, 25

Their means are close:

Mean of A = 70 / 5 = 14
Mean of B = 75 / 5 = 15

Now compare dispersion (population variance and standard deviation):

| Measure            | Dataset A | Dataset B |
| ------------------ | --------- | --------- |
| Range              | 8         | 20        |
| Variance           | 8         | 50        |
| Standard Deviation | ≈ 2.83    | ≈ 7.07    |

Dataset B has a much greater spread even though the two means are nearly the same.
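The table's entries can be checked with a short sketch using Python's standard `statistics` module. It assumes the population formulas (`pvariance`, `pstdev`); the helper name `describe_spread` is illustrative, not part of the original answer.

```python
import statistics

dataset_a = [10, 12, 14, 16, 18]
dataset_b = [5, 10, 15, 20, 25]

def describe_spread(data):
    """Return (range, population variance, population standard deviation)."""
    data_range = max(data) - min(data)
    variance = statistics.pvariance(data)
    std_dev = statistics.pstdev(data)
    return data_range, variance, std_dev

spread_a = describe_spread(dataset_a)
spread_b = describe_spread(dataset_b)

for name, spread in [("A", spread_a), ("B", spread_b)]:
    data_range, variance, std_dev = spread
    print(f"Dataset {name}: range = {data_range}, "
          f"variance = {variance}, std dev = {std_dev:.2f}")
```

Using `statistics.variance` and `statistics.stdev` instead would give the sample versions, which divide by n − 1 and are therefore slightly larger.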



### Q6. What is a Venn diagram?

Ans:

A Venn diagram is a visual representation used to show the relationships between different sets using overlapping circles.

## Main Features

-> Each circle represents a set (a collection of items).

-> The overlapping region shows common elements between sets.

-> Non-overlapping parts show elements unique to a set.

### Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:

(i) A intersection B

(ii) A union B


In [3]:
# Given sets
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

# Intersection
intersection = A.intersection(B)
# or A & B

# Union
union = A.union(B)
# or A | B

print("A ∩ B =", intersection)
print("A ∪ B =", union)

A ∩ B = {2, 6}
A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
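As a small extension, the regions of the corresponding Venn diagram can be sketched with Python's set operators. The variable names are illustrative assumptions.

```python
# Same sets as in the question
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

only_a = A - B            # left circle only: in A but not in B
only_b = B - A            # right circle only: in B but not in A
both = A & B              # overlapping region: the intersection
either_not_both = A ^ B   # symmetric difference: everything outside the overlap

print("Only in A:", sorted(only_a))
print("Only in B:", sorted(only_b))
print("In both:", sorted(both))
print("In exactly one:", sorted(either_not_both))
```

The three disjoint regions (`only_a`, `both`, `only_b`) together make up the union A ∪ B.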


### Q8. What do you understand about skewness in data?

Ans:

Skewness is a measure of the asymmetry of a dataset around its mean.

It tells us whether the data is:

1) Symmetrically distributed

2) Skewed to the right (positively skewed)

3) Skewed to the left (negatively skewed)
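One common way to quantify the three cases is the moment coefficient of skewness, $m_3 / m_2^{3/2}$, where $m_k$ is the average of $(x - \text{mean})^k$. The sketch below is a minimal illustration of that formula; the function name and the datasets are assumptions, and other skewness definitions exist.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    mean_value = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean_value) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean_value) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]                      # hypothetical data
right_skewed = [1, 2, 2, 3, 3, 3, 20]            # long tail toward high values
left_skewed = [-x for x in right_skewed]         # mirror image: tail toward low values

print("symmetric:", skewness(symmetric))         # zero for symmetric data
print("right-skewed:", skewness(right_skewed))   # positive
print("left-skewed:", skewness(left_skewed))     # negative
```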

### Q9. If data is right-skewed, what is the position of the median with respect to the mean?

Ans:

In a right-skewed distribution, the tail of the distribution extends toward the right side (higher values).

### Position of Mean and Median

In right-skewed data, the few large values in the tail pull the mean upward, so typically:

Mean > Median > Mode

So,

The median lies to the left of the mean.
Or, Median < Mean
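A quick check on a small, hypothetical right-skewed dataset illustrates this ordering. It is a sketch, not a proof: the ordering is typical for unimodal right-skewed data rather than guaranteed for every dataset.

```python
import statistics

# Hypothetical data: most values are small, one large value forms the right tail
right_skewed = [2, 3, 3, 4, 5, 6, 20]

mean_value = statistics.mean(right_skewed)
median_value = statistics.median(right_skewed)
mode_value = statistics.mode(right_skewed)

print(f"Mean = {mean_value:.2f}, Median = {median_value}, Mode = {mode_value}")
```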

### Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

Ans:

## Difference Between Covariance and Correlation

Both **covariance** and **correlation** measure the **relationship between two variables**, but they differ in interpretation and scale.

### 1. Covariance

-> Definition:

Covariance measures the **direction** of the relationship between two variables.

-> Interpretation:

* **Positive covariance** → Variables move in the same direction
* **Negative covariance** → Variables move in opposite directions
* **Zero covariance** → No linear relationship

-> Limitation:

* Value depends on units (not standardized)
* Hard to interpret magnitude

### 2. Correlation

-> Definition:

Correlation measures both the **strength and direction** of the relationship.

-> Interpretation:

* Value ranges from **−1 to +1**
* **+1** → Perfect positive relationship
* **−1** → Perfect negative relationship
* **0** → No linear relationship

On a scatter plot of the two variables:

* Points rising upward → Positive correlation
* Points sloping downward → Negative correlation
* Random scatter → No correlation

---

### Key Differences

| Feature        | Covariance       | Correlation          |
| -------------- | ---------------- | -------------------- |
| Measures       | Direction        | Strength + Direction |
| Range          | −∞ to +∞         | −1 to +1             |
| Unit           | Depends on units | Unit-free            |
| Interpretation | Difficult        | Easy                 |

---

### How They Are Used in Statistical Analysis

✔ Identify relationships between variables

✔ Used in regression analysis

✔ Portfolio risk analysis (finance)

✔ Feature selection in machine learning

✔ Detect multicollinearity
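The unit dependence of covariance can be illustrated with a minimal sketch in plain Python. The helper functions use the sample (n − 1) form and the Pearson definition of correlation; the height and weight figures are hypothetical.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

heights_cm = [150, 160, 170, 180, 190]       # hypothetical heights in cm
weights_kg = [50, 58, 65, 74, 80]            # hypothetical weights in kg
heights_m = [h / 100 for h in heights_cm]    # same heights, converted to metres

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)

print(f"Covariance:  {cov_cm:.2f} (cm) vs {cov_m:.2f} (m)")
print(f"Correlation: {corr_cm:.4f} (cm) vs {corr_m:.4f} (m)")
```

Changing the unit rescales the covariance by a factor of 100, while the correlation stays the same, which is why correlation is easier to compare across datasets.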



### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

Ans:

The **sample mean** is the average of all observations in a sample.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\text{sum of values}}{\text{number of observations}}$$

### Example Calculation

Suppose we have the following dataset:

Data: 5, 8, 12, 15, 10

-> Step 1: Find the sum of values

$$5 + 8 + 12 + 15 + 10 = 50$$

-> Step 2: Count the number of observations

$$n = 5$$

-> Step 3: Apply the formula

$$\bar{x} = \frac{50}{5} = 10$$

The **sample mean** of the dataset is: 10
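The same three steps can be written as a short Python sketch (the variable names are illustrative):

```python
data = [5, 8, 12, 15, 10]

total = sum(data)        # Step 1: sum of values
n = len(data)            # Step 2: number of observations
sample_mean = total / n  # Step 3: apply the formula

print("Sum:", total)
print("n:", n)
print("Sample mean:", sample_mean)
```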


### Q12. For normally distributed data, what is the relationship between its measures of central tendency?

Ans:

In a normal distribution (bell-shaped curve), the data is perfectly symmetrical around the center.

## Relationship:

Mean = Median = Mode
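This can be checked with `statistics.NormalDist` from the standard library (Python 3.8+). The parameters `mu=170` and `sigma=10` are hypothetical values chosen for illustration.

```python
import statistics
from statistics import NormalDist

# Hypothetical normal distribution, e.g. heights with mean 170 and std dev 10
height_dist = NormalDist(mu=170, sigma=10)

# For the theoretical distribution, all three measures coincide exactly
print(height_dist.mean, height_dist.median, height_dist.mode)

# A finite random sample only matches approximately
samples = height_dist.samples(10_000, seed=42)
sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
print(f"Sample mean = {sample_mean:.2f}, sample median = {sample_median:.2f}")
```

The equality is exact for the theoretical curve; real samples drawn from it show a mean and median that are close but rarely identical.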


### Q13. How is covariance different from correlation?

Ans:

### Difference Between Covariance and Correlation

Both covariance and correlation measure the relationship between two variables, but they differ in scale and interpretation.

| Feature        | Covariance | Correlation          |
| -------------- | ---------- | -------------------- |
| Measures       | Direction  | Strength + Direction |
| Range          | −∞ to +∞   | −1 to +1             |
| Unit           | Has units  | Unit-free            |
| Interpretation | Difficult  | Easy                 |


Covariance measures only the direction of the relationship between two variables and depends on units, whereas correlation measures both the strength and direction of the relationship in a standardized form ranging from −1 to +1.

### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

Ans:

### Effect of Outliers on Central Tendency and Dispersion

An **outlier** is an extreme value that is very different from other observations in a dataset.

Outliers can significantly affect statistical measures.

#### 1. Effect on Measures of Central Tendency

#### a) Mean

* Highly affected by outliers.
* Extreme values pull the mean toward them.

#### b) Median

* Not much affected.
* Depends only on the middle position.

#### c) Mode

* Usually not affected unless the outlier repeats frequently.

##### Example

Dataset A: 10, 12, 14, 16, 18

Mean = 14

Median = 14

Now add an outlier (100):

Dataset B: 10, 12, 14, 16, 18, 100

New Mean = $\frac{170}{6} \approx 28.33$

New Median = $\frac{14 + 16}{2} = 15$

The **mean increased drastically**, but the **median changed only slightly**.


#### 2. Effect on Measures of Dispersion

#### a) Range

* Strongly affected (since it depends on minimum and maximum).

#### b) Variance

* Increases significantly (because deviations are squared).

#### c) Standard Deviation

* Also increases due to large deviation from the mean.


#### Conclusion

* **Mean, range, variance, and standard deviation are highly sensitive to outliers.**
* **Median is more robust (resistant) to outliers.**
* In skewed data with outliers, the **median is preferred over the mean**.
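To see these effects side by side, the following sketch compares Dataset A and Dataset B from the example using Python's standard `statistics` module. It assumes the population forms of variance and standard deviation; the helper name `summarize` is illustrative.

```python
import statistics

dataset_a = [10, 12, 14, 16, 18]
dataset_b = dataset_a + [100]   # same data plus the outlier

def summarize(data):
    """Collect central tendency and dispersion measures in one dictionary."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),
        "std_dev": statistics.pstdev(data),
    }

summary_a = summarize(dataset_a)
summary_b = summarize(dataset_b)

# Print each measure before and after adding the outlier
for name in summary_a:
    print(f"{name}: {summary_a[name]:.2f} -> {summary_b[name]:.2f}")
```

The median moves by one unit, while the mean, range, variance, and standard deviation all jump, consistent with the conclusion above.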
