# Statistics basics 2

**Q1. What are the three measures of central tendency?**

The three primary measures of central tendency are:
- Mean: The average of a dataset.
- Median: The middle value when data is sorted.
- Mode: The most frequent value in a dataset.

**Q2. What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?**
- Mean: Calculated by summing all values and dividing by the number of values. Sensitive to outliers.
- Median: The middle value when data is sorted. Less affected by outliers.
- Mode: The most frequent value. Useful for categorical data.

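These measures help identify the typical or central value of a dataset. The choice of measure depends on the type of data and the shape of its distribution: the mean suits roughly symmetric numeric data, the median suits skewed data or data with outliers, and the mode suits categorical data.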
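To illustrate why the mode is the natural choice for categorical data, here is a minimal sketch using Python's built-in `statistics` module on a hypothetical list of shirt colors. A mean or median of such labels has no meaning, but the most frequent label is well defined.

```python
import statistics

# Hypothetical categorical data: colors of shirts sold in one day
shirt_colors = ["red", "blue", "red", "green", "red", "blue"]

# The mode is the most frequent label; no averaging of labels is needed
most_common = statistics.mode(shirt_colors)
print("Mode color:", most_common)  # Mode color: red
```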

**Q3. Calculate the three measures of central tendency for the given height data: [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

Here is how to calculate these measures in Python, using NumPy for the mean and median and the built-in `statistics` module for the mode:

```python
import statistics

import numpy as np

height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

mean_height = np.mean(height_data)
median_height = np.median(height_data)
# np.bincount only accepts non-negative integers, so it fails on these float heights.
# statistics.multimode returns every value that shares the highest frequency.
mode_heights = statistics.multimode(height_data)

print("Mean height:", mean_height)
print("Median height:", median_height)
print("Mode height(s):", mode_heights)
```

```
Mean height: 177.01875
Median height: 177.0
Mode height(s): [178, 177]
```

This dataset is bimodal: 177 and 178 each occur three times, so both are modes.


**Q4. Find the standard deviation for the given data:**
**[178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]**

```python
import numpy as np

height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

# np.std divides by n by default (ddof=0), i.e. it returns the population standard deviation
std_deviation = np.std(height_data)
print("Standard deviation:", std_deviation)
```

```
Standard deviation: 1.7885814036548633
```
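The standard deviation can also be computed directly from its formula, the square root of the sum of squared deviations from the mean divided by n (population) or by n - 1 (sample). The helper `std_dev` below is a hypothetical name for this sketch; with `ddof=0` it should match `np.std` (about 1.789 here), and with `ddof=1` it gives the sample standard deviation (about 1.847).

```python
import math

height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

def std_dev(values, ddof=0):
    """Square root of the sum of squared deviations divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))

population_sd = std_dev(height_data)       # divides by n, like np.std's default
sample_sd = std_dev(height_data, ddof=1)   # divides by n - 1
print("Population SD:", round(population_sd, 4))
print("Sample SD:", round(sample_sd, 4))
```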


**Q5. How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.**

Measures of dispersion describe how spread out the data is.
- Range: The difference between the highest and lowest values. Gives a basic idea of spread.
- Variance: The sum of the squared differences from the mean, divided by n for a population or by n - 1 for a sample. Measures how far data points typically lie from the mean.
- Standard Deviation: The square root of the variance. Easier to interpret than variance as it's in the same units as the data.

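Example: Consider two datasets:
- Dataset A: [1, 2, 3, 4, 5]
- Dataset B: [1, 10, 1, 10, 1]

Dataset A has a mean of 3 and dataset B a mean of 4.6, but the bigger difference is in spread: dataset B has a much larger range (9 vs. 4) and standard deviation (about 4.41 vs. about 1.41, using the population formula), indicating far more spread in the data.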
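A short NumPy sketch that computes the mean, range, variance, and standard deviation of datasets A and B. It uses the population formulas (NumPy's default `ddof=0`); passing `ddof=1` to `np.var` and `np.std` would give sample estimates instead.

```python
import numpy as np

dataset_a = np.array([1, 2, 3, 4, 5])
dataset_b = np.array([1, 10, 1, 10, 1])

for name, data in [("A", dataset_a), ("B", dataset_b)]:
    data_range = data.max() - data.min()  # highest value minus lowest value
    variance = np.var(data)               # population variance (ddof=0)
    std_dev = np.std(data)                # square root of the variance
    print(f"Dataset {name}: mean={data.mean():.2f}, range={data_range}, "
          f"variance={variance:.2f}, std={std_dev:.2f}")
```

Dataset A comes out with a range of 4 and a standard deviation of about 1.41, while dataset B has a range of 9 and a standard deviation of about 4.41.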

**Q6. What is a Venn diagram?**
A Venn diagram is a visual representation of sets and their relationships. It uses overlapping circles to show elements that belong to one or more sets.

**Q7. For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:**
**(i) A ∩ B**
**(ii) A ∪ B**

- A ∩ B (Intersection): The set of elements that are common to both A and B.
- A ∩ B = {2, 6}
- A ∪ B (Union): The set of all elements that belong to A, to B, or to both.
- A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
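Python's built-in `set` type supports these operations directly, which gives a quick way to check the answers: `&` computes the intersection and `|` the union.

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # same as A.intersection(B)
union = A | B         # same as A.union(B)

# Sets are unordered, so sort them for readable output
print("A ∩ B =", sorted(intersection))  # A ∩ B = [2, 6]
print("A ∪ B =", sorted(union))         # A ∪ B = [0, 2, 3, 4, 5, 6, 7, 8, 10]
```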

**Q8. What do you understand about skewness in data?**

Skewness is a measure of the asymmetry of a probability distribution about its mean. A positively (right-) skewed distribution has a longer tail to the right, a negatively (left-) skewed distribution has a longer tail to the left, and a symmetric distribution has zero skewness.
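One common way to quantify skewness is the moment-based (Fisher-Pearson) coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. This is a minimal sketch of that one definition; other, bias-corrected versions exist, and the function name `skewness` is our own.

```python
import numpy as np

def skewness(values):
    """Fisher-Pearson moment coefficient of skewness, g1 = m3 / m2**1.5."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (population variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))            # 0.0 (symmetric)
print(skewness([1, 2, 2, 3, 3, 3, 4, 20]))  # positive (long right tail)
```

Mirroring the second dataset (negating every value) flips the sign of the result, giving a negatively skewed sample.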

**Q9. If data are right-skewed, where does the median lie relative to the mean?**

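In a right-skewed distribution, the median is typically less than the mean: the long right tail pulls the mean toward larger values, while the median depends only on the middle of the sorted data. This is a rule of thumb rather than a strict law, and some right-skewed distributions violate it.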
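A small illustration with a hypothetical right-skewed sample (incomes in thousands, with one large value), using Python's built-in `statistics` module. The mean works out to 46.25 while the median is 31.0, so the median sits well below the mean.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is large
incomes = [25, 28, 30, 30, 32, 35, 40, 150]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print("Mean:", mean_income)
print("Median:", median_income)
```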

**Q10. Explain the difference between covariance and correlation. How are these measures used in statistical analysis?**

Covariance measures how two variables vary together. It is positive when they tend to move in the same direction and negative when they tend to move in opposite directions. However, its magnitude depends on the units of measurement, so it is hard to compare across datasets.

Correlation (Pearson's r) is a standardized measure of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear correlation.

Both covariance and correlation are used to understand the relationship between variables. Correlation is often preferred because it's standardized and easier to interpret.
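The sketch below, using hypothetical study-time and test-score data, shows the practical difference with NumPy's `np.cov` and `np.corrcoef`: re-expressing study time in minutes instead of hours multiplies the covariance by 60 but leaves the correlation unchanged.

```python
import numpy as np

# Hypothetical paired measurements
hours_studied = np.array([1, 2, 3, 4, 5])
score = np.array([52, 55, 61, 64, 70])

# np.cov returns the 2x2 sample covariance matrix; entry [0, 1] is cov(x, y)
cov_hours = np.cov(hours_studied, score)[0, 1]
corr_hours = np.corrcoef(hours_studied, score)[0, 1]

# Changing units from hours to minutes rescales the covariance...
cov_minutes = np.cov(hours_studied * 60, score)[0, 1]
# ...but the correlation stays the same
corr_minutes = np.corrcoef(hours_studied * 60, score)[0, 1]

print(f"cov (hours) = {cov_hours:.2f}, cov (minutes) = {cov_minutes:.2f}")
print(f"corr (hours) = {corr_hours:.4f}, corr (minutes) = {corr_minutes:.4f}")
```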

**Q11. What is the formula for calculating the sample mean? Provide an example calculation for a dataset.**

The formula for calculating the sample mean is:
`mean = (sum of all values) / (number of values)`

Example:
For the dataset [2, 4, 6, 8], the mean is (2+4+6+8)/4 = 5.
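The same formula as a short Python function; the name `sample_mean` is illustrative, and the function rejects an empty list because the mean is undefined when there are no values.

```python
def sample_mean(values):
    """Sample mean: the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("sample_mean() requires at least one value")
    return sum(values) / len(values)

print(sample_mean([2, 4, 6, 8]))  # 5.0
```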

**Q12. For normally distributed data, what is the relationship between its measures of central tendency?**

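In a normal distribution, the mean, median, and mode are all equal and sit at the center (peak) of the symmetric bell curve. In a finite sample drawn from a normal distribution, they are approximately, but usually not exactly, equal.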
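A quick simulation sketch, assuming NumPy's `default_rng` with an arbitrary seed and hypothetical parameters (mean 170, standard deviation 7): the sample mean and median land close together near the true center. The mode of continuous data would need binning or density estimation, so it is left out here.

```python
import numpy as np

# Draw a large sample from a normal distribution (hypothetical parameters)
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=170.0, scale=7.0, size=100_000)

mean_value = np.mean(sample)
median_value = np.median(sample)
print(f"mean = {mean_value:.3f}, median = {median_value:.3f}")
```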

**Q13. How is covariance different from correlation?**

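Covariance shows the direction of the linear relationship between two variables, but its magnitude depends on their units, so it does not measure strength. Correlation measures both the direction and the strength of the linear relationship.

Correlation is the covariance divided by the product of the two standard deviations. This standardization makes it unitless and bounded between -1 and 1, so it is easier to interpret than covariance.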
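To make the standardization concrete, this sketch computes Pearson's r by hand as the sample covariance divided by the product of the sample standard deviations, then checks it against `np.corrcoef`. The data are made up for illustration; for them r works out to 0.8.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 1, 4, 3, 5])

cov_xy = np.cov(x, y)[0, 1]  # sample covariance (divides by n - 1)
s_x = np.std(x, ddof=1)      # sample standard deviations, also divided by n - 1
s_y = np.std(y, ddof=1)

r_manual = cov_xy / (s_x * s_y)  # correlation = covariance / (s_x * s_y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)  # both approximately 0.8
```

Using `ddof=1` for both the covariance and the standard deviations matters: the n - 1 factors cancel only when they match.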

**Q14. How do outliers affect measures of central tendency and dispersion? Provide an example.**

Outliers can strongly affect the mean and all common measures of dispersion (range, variance, standard deviation). The median is much less sensitive, and the mode is usually unaffected unless the outlying value itself repeats.

For example, in a dataset of house prices, an extremely expensive house (outlier) would increase the mean price but might not significantly affect the median. Similarly, it would increase the range and standard deviation.   
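A minimal sketch of the house-price example with hypothetical prices (in thousands of dollars), using Python's `statistics` module. Adding one very expensive house moves the mean from 242 to 535 but the median only from 250 to 255, while the range jumps from 80 to 1800.

```python
import statistics

# Hypothetical house prices in thousands of dollars
prices = [200, 220, 250, 260, 280]
prices_with_outlier = prices + [2000]  # one extremely expensive house

for label, data in [("without outlier", prices), ("with outlier", prices_with_outlier)]:
    print(f"{label}: mean={statistics.mean(data):.1f}, "
          f"median={statistics.median(data):.1f}, "
          f"range={max(data) - min(data)}, "
          f"stdev={statistics.pstdev(data):.1f}")  # population standard deviation
```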

