### Q1. What are the three measures of central tendency?

A1. The three measures of central tendency are:
1. **Mean**
2. **Median**
3. **Mode**

### Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?

A2.
- **Mean**: The mean is the arithmetic average of a dataset, calculated by adding all the values and dividing by the number of values. It is used to measure the central tendency when the data is symmetrically distributed without outliers.

- **Median**: The median is the middle value when a dataset is ordered from smallest to largest. If the number of values is even, the median is the average of the two middle numbers. It is used to measure the central tendency for skewed distributions or when outliers are present.

- **Mode**: The mode is the value that appears most frequently in a dataset. There can be more than one mode if multiple values have the highest frequency. It is used to measure the central tendency for categorical data or to identify the most common value in a dataset.

### Q3. Compute the three measures of central tendency for the given height data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]


In [8]:
#A3.
import statistics
import numpy as np

# Given height data
height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# Mean
mean_height = np.mean(height_data)

# Median
median_height = np.median(height_data)

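# Mode (177 and 178 each occur three times; on Python 3.8+ statistics.mode
# returns the first such value encountered in the data)
mode_height = statistics.mode(height_data)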

print(f"Mean: {mean_height}")
print(f"Median: {median_height}")
print(f"Mode: {mode_height}")


Mean: 177.01875
Median: 177.0
Mode: 178
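
The height data has two values tied for the highest frequency, so a single mode hides part of the picture. A minimal sketch, assuming Python 3.8 or later, that lists every mode with `statistics.multimode`:

```python
import statistics

height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
               178.9, 176.2, 177, 172.5, 178, 176.5]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes = statistics.multimode(height_data)

print(f"Modes: {modes}")  # Modes: [178, 177]
```

Reporting both values makes it clear that the dataset is bimodal.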


### Q4. Find the standard deviation for the given data:
[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

In [9]:
#A4.
import numpy as np

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# np.std defaults to ddof=0, i.e. the population standard deviation (divides by n)
std_deviation = np.std(data)

std_deviation

np.float64(1.7885814036548633)
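
The value above is the population standard deviation. If the heights are treated as a sample from a larger population (an assumption, since the question does not say), the sample standard deviation, which divides by n - 1, may be the more appropriate choice. A short sketch comparing the two with the standard library:

```python
import statistics

data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175,
        178.9, 176.2, 177, 172.5, 178, 176.5]

# Population standard deviation: divides by n (matches np.std(data))
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides by n - 1 (matches np.std(data, ddof=1))
sample_sd = statistics.stdev(data)

print(f"Population SD: {population_sd:.4f}")
print(f"Sample SD:     {sample_sd:.4f}")
```

The sample value is always slightly larger, and the gap shrinks as the number of observations grows.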

### Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.

A5. 
- **Range**: The difference between the maximum and minimum values in a dataset. It provides a quick sense of the spread but is sensitive to outliers.
  Example: For the data [2, 5, 8, 10], the range is \(10 - 2 = 8\).

- **Variance**: The average of the squared differences from the mean. Because the differences are squared, points far from the mean contribute disproportionately, so the variance is sensitive to outliers.
  Example: For the data [2, 5, 8, 10], the mean is 6.25, so the population variance is \(\frac{\sum (x - \bar{x})^2}{n} = \frac{36.75}{4} = 9.1875\).

- **Standard Deviation**: The square root of the variance. It is in the same units as the data and describes the typical distance of a data point from the mean.
  Example: For the data [2, 5, 8, 10], the standard deviation is \(\sqrt{9.1875} \approx 3.03\).
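
The three measures for the example data can be checked with a few lines of Python. This is a sketch that uses the population formulas (dividing by n), matching the formula in the variance example:

```python
import statistics

data = [2, 5, 8, 10]

data_range = max(data) - min(data)      # largest value minus smallest value
variance = statistics.pvariance(data)   # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)       # square root of the variance

print(f"Range: {data_range}")                   # Range: 8
print(f"Variance: {variance}")                  # Variance: 9.1875
print(f"Standard deviation: {std_dev:.3f}")     # Standard deviation: 3.031
```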

### Q6. What is a Venn diagram?

A6. A **Venn diagram** shows all possible logical relations between a finite collection of sets, usually as overlapping circles. It is used to illustrate how sets relate to each other, including their intersections, unions, and differences.

### Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} & B = {0, 2, 6, 8, 10}, find:
(i) A intersection B
(ii) A union B

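A7. (i) A ∩ B = {2, 6}, the elements common to both sets.
(ii) A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}, the elements that appear in at least one of the sets.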
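
The same result can be checked with Python's built-in `set` type, as a quick sketch:

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B   # elements in both A and B
union = A | B          # elements in A, in B, or in both

print(sorted(intersection))  # [2, 6]
print(sorted(union))         # [0, 2, 3, 4, 5, 6, 7, 8, 10]
```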

### Q8. What do you understand about skewness in data?

A8. Skewness refers to the asymmetry of the distribution of values in a dataset. A distribution with a longer tail on the left is left-skewed (negative skew), and one with a longer tail on the right is right-skewed (positive skew). A perfectly symmetrical distribution has zero skewness.

### Q9. If data is right-skewed, what will be the position of the median with respect to the mean?

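A9. If data is right-skewed, the median typically lies to the left of the mean, that is, the mean is greater than the median. This is because the long tail on the right side pulls the mean towards higher values, while the median depends only on the middle of the ordered data.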
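
A small illustration of this, using a made-up right-skewed dataset (the values are hypothetical and chosen only for the example). The skewness here is the moment coefficient, one of several common definitions:

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is much larger.
values = [20, 22, 25, 27, 30, 35, 120]

mean = statistics.mean(values)
median = statistics.median(values)

# Moment coefficient of skewness: third central moment divided by
# the second central moment raised to the power 1.5.
n = len(values)
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n
skewness = m3 / m2 ** 1.5

print(f"Mean: {mean:.2f}")
print(f"Median: {median}")
print(f"Skewness: {skewness:.2f}")
```

The mean exceeds the median and the skewness comes out positive, which is consistent with a right-skewed distribution.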

### Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?

A10.
- **Covariance**: Measures the degree to which two variables change together. It can be positive (both variables tend to increase together) or negative (one variable tends to increase as the other decreases). Its magnitude depends on the units of the variables.

- **Correlation**: Standardizes covariance by dividing it by the product of the two standard deviations, giving a unitless value between -1 and 1 that measures the strength and direction of the linear relationship between two variables.

In statistical analysis, correlation is used to assess the strength and direction of a linear relationship between variables. Covariance indicates the direction, but because its size depends on the units of measurement it is hard to interpret as a measure of strength.
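
The effect of units can be seen in a short sketch. The helper functions `covariance` and `correlation` and the height and weight values below are hypothetical, written only for this illustration; they use the sample formulas (dividing by n - 1):

```python
def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical paired measurements
height_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weight_kg = [55.0, 60.0, 63.0, 70.0, 74.0]

# The same heights expressed in centimetres
height_cm = [h * 100 for h in height_m]

print("metres:     ", covariance(height_m, weight_kg), correlation(height_m, weight_kg))
print("centimetres:", covariance(height_cm, weight_kg), correlation(height_cm, weight_kg))
```

Changing the unit of height multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is easier to compare across datasets.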

### Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.

A11. The formula for the sample mean is:

\[
\text{Sample Mean} (\bar{x}) = \frac{1}{n} \sum_{i=1}^{n} x_i
\]

Where:
- \( \bar{x} \) is the sample mean.
- \( n \) is the number of observations in the sample.
- \( x_i \) represents each individual observation.

Example:


In [10]:
# Sample dataset
data = [12, 15, 20, 22, 30]

# Calculate the sample mean
sample_mean = sum(data) / len(data)

# Print the result
print("Sample Mean:", sample_mean)

Sample Mean: 19.8



### Q12. For normally distributed data, what is the relationship between its measures of central tendency?

A12. For a normal distribution, the mean, median, and mode are all equal. This is because the distribution is perfectly symmetrical around its center.
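
This can be checked approximately by simulation. The sketch below draws values from a normal distribution with hypothetical parameters (mean 170, standard deviation 10) and compares the sample mean and median. The mode is left out because a sample of continuous values rarely repeats exactly:

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical parameters: mean 170, standard deviation 10
samples = [random.gauss(170, 10) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)

print(f"Mean:   {sample_mean:.2f}")
print(f"Median: {sample_median:.2f}")
```

With this many samples the two values should agree closely, although they will not match exactly because of sampling variation.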

### Q13. How is covariance different from correlation?

A13. **Covariance**: Measures the degree to which two variables change together. It is not standardized and can take any value. Positive covariance indicates that the variables tend to increase together, while negative covariance indicates that one variable tends to increase as the other decreases.
**Correlation**: Measures the strength and direction of a linear relationship between two variables. It is standardized to range between -1 and 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship. Correlation is a scaled version of covariance.

### Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.

A14. Outliers can significantly affect measures of central tendency and dispersion:

- **Mean**: Outliers can skew the mean, making it higher or lower than the central value of the bulk of the data. For example, in the dataset [1, 2, 3, 4, 100], the mean is 22, which does not represent the central tendency of most data points.

- **Median**: Outliers have little effect on the median because it is based on the middle value. For the same dataset [1, 2, 3, 4, 100], the median is 3.

- **Mode**: Outliers generally do not affect the mode, as it is the most frequent value. For the dataset [1, 1, 2, 3, 100], the mode is 1.

- **Range**: Outliers increase the range, making it less representative of the spread of the majority of the data. For the dataset [1, 2, 3, 4, 100], the range is 99.

- **Variance and Standard Deviation**: Outliers increase these measures, indicating greater dispersion. For the dataset [1, 2, 3, 4, 100], the population variance is 1522 and the standard deviation is about 39.0, compared with 1.25 and about 1.12 for the same data without the outlier, [1, 2, 3, 4].
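
The figures above can be reproduced with a short sketch. The `summarize` helper is a hypothetical name introduced for this example, and it uses the population formulas:

```python
import statistics


def summarize(data):
    """Return common measures of central tendency and dispersion for a dataset."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }


without_outlier = summarize([1, 2, 3, 4])
with_outlier = summarize([1, 2, 3, 4, 100])

for name in without_outlier:
    print(f"{name:>8}: {without_outlier[name]:>8.2f} -> {with_outlier[name]:>8.2f}")
```

The median barely moves when the outlier is added, while the mean, range, variance, and standard deviation all change substantially.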
