# **Q1**

**What are the three measures of central tendency?**

**Answer:**

 * Mean
 * Median
 * Mode


# **Q2**

**What is the difference between the mean, median, and mode? How are they used to measure the central tendency of a dataset?**


**Answer:**

The mean, median, and mode are three measures of central tendency used to describe the typical or central value of a dataset. The mean is the arithmetic average, obtained by summing all values and dividing by the number of data points; it is sensitive to outliers. The median is the middle value when the data is ordered, or the average of the two middle values when the number of data points is even; it is less affected by extreme values and is therefore useful for skewed distributions. The mode is the most frequently occurring value in the dataset; it identifies the most common observation and is the only one of the three that also applies to categorical data. Comparing the three is informative: when they roughly agree, the distribution is close to symmetric, and when the mean sits well away from the median, the data is skewed or contains outliers.
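
As a minimal sketch of how the three measures behave, the snippet below uses Python's standard `statistics` module on a small list of salaries. The values are hypothetical and were chosen only so that one large value pulls the mean upward.

```python
import statistics

# Hypothetical salaries (in thousands); 200 is a deliberately extreme value
salaries = [30, 32, 32, 35, 38, 40, 200]

mean_salary = statistics.mean(salaries)      # sum of values / number of values
median_salary = statistics.median(salaries)  # middle value of the sorted data
mode_salary = statistics.mode(salaries)      # most frequently occurring value

print("Mean:", round(mean_salary, 2))
print("Median:", median_salary)
print("Mode:", mode_salary)
```

Here the mean (about 58.14) lies above six of the seven values, while the median (35) and the mode (32) stay close to the bulk of the data.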

# **Q3**

**Compute the three measures of central tendency for the given height data:**

       [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]


**Answer:**


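In [1]:

    import numpy as np

    height_data = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

    # Arithmetic mean
    mean_height = np.mean(height_data)
    print("Mean height:", mean_height)

    # Middle value of the sorted data
    median_height = np.median(height_data)
    print("Median height:", median_height)

    # Keep every value that reaches the highest count; np.argmax alone
    # would return only the first of several tied values
    unique_values, counts = np.unique(height_data, return_counts=True)
    mode_height = unique_values[counts == counts.max()]
    print("Mode height(s):", mode_height)

Output:

    Mean height: 177.01875
    Median height: 177.0
    Mode height(s): [177. 178.]

Both 177 and 178 occur three times, so the data is bimodal.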


# **Q4**

**Find the standard deviation for the given data:**

    [178,177,176,177,178.2,178,175,179,180,175,178.9,176.2,177,172.5,178,176.5]

**Answer:**



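In [3]:

    import numpy as np

    heights = [178, 177, 176, 177, 178.2, 178, 175, 179, 180, 175, 178.9, 176.2, 177, 172.5, 178, 176.5]

    # np.std computes the population standard deviation by default (ddof=0);
    # pass ddof=1 to get the sample standard deviation instead
    standard_deviation = np.std(heights)
    print("Standard Deviation:", standard_deviation)

Output:

    Standard Deviation: 1.7885814036548633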


# **Q5**

**How are measures of dispersion such as range, variance, and standard deviation used to describe the spread of a dataset? Provide an example.**

**Answer:**

Measures of dispersion, such as range, variance, and standard deviation, are used to describe the spread or variability of a dataset. They provide valuable insights into how the data points are dispersed around the central tendency (mean, median, or mode). By understanding the spread of the data, analysts can assess the data's consistency, variability, and potential outliers.

**Example:**
Let's consider the following dataset representing the ages of individuals in a group:
[25, 28, 30, 27, 24, 31, 29]

 * Range:
The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values in the dataset. In this example, the range is 31 - 24 = 7, indicating that the ages span 24 to 31.

 * Variance:
The variance is the average squared deviation of the data points from the mean. Because it uses every value rather than only the two extremes, it describes the spread more fully than the range. Treating the seven ages as the whole population, the calculation has three steps:

 * Calculate the mean:

        (25 + 28 + 30 + 27 + 24 + 31 + 29) / 7 = 194 / 7 ≈ 27.71

 * Sum the squared differences of each data point from the mean:

        (25-27.71)^2 + (28-27.71)^2 + (30-27.71)^2 + (27-27.71)^2 + (24-27.71)^2 + (31-27.71)^2 + (29-27.71)^2 ≈ 39.43

 * Divide the sum of squared differences by the number of data points:

        39.43 / 7 ≈ 5.63

   So, the variance is approximately 5.63. (Dividing by n - 1 = 6 instead gives the sample variance, approximately 6.57.)

 * Standard Deviation:
The standard deviation is the square root of the variance. It indicates how far data points typically lie from the mean, expressed in the original units of the dataset. In this example, the standard deviation is √5.63 ≈ 2.37.
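
The calculation for the ages dataset can be checked with a short sketch using Python's standard `statistics` module. The choice of module is an assumption made for illustration: `pvariance` and `pstdev` divide by n (population formulas), while `variance` divides by n - 1 (sample formula).

```python
import statistics

ages = [25, 28, 30, 27, 24, 31, 29]

age_range = max(ages) - min(ages)            # 31 - 24 = 7
pop_variance = statistics.pvariance(ages)    # divides by n; about 5.63
pop_std = statistics.pstdev(ages)            # square root of pop_variance; about 2.37
sample_variance = statistics.variance(ages)  # divides by n - 1; about 6.57

print("Range:", age_range)
print("Population variance:", round(pop_variance, 2))
print("Population standard deviation:", round(pop_std, 2))
print("Sample variance:", round(sample_variance, 2))
```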

# **Q6**

**What is a Venn diagram?**

**Answer:**

A Venn diagram is a graphical representation used to show the relationships between different sets of data or elements. It consists of overlapping circles, where each circle represents a set, and the overlapping areas indicate the intersection of those sets. Venn diagrams are commonly used to visualize the similarities and differences between various groups or categories. They help in understanding the relationships and overlaps among the elements being compared, making complex data more easily comprehensible at a glance.

# **Q7**

**For the two given sets A = {2, 3, 4, 5, 6, 7} and B = {0, 2, 6, 8, 10}, find:**


    (i) A ∩ B
    (ii) A ∪ B


**Answer:**

    (i) A ∩ B = {2, 6}
    (ii) A ∪ B = {0, 2, 3, 4, 5, 6, 7, 8, 10}
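
The same result can be reproduced with Python's built-in `set` type, where `&` is intersection and `|` is union. This is only a quick check of the answer.

```python
A = {2, 3, 4, 5, 6, 7}
B = {0, 2, 6, 8, 10}

intersection = A & B  # elements that are in both A and B
union = A | B         # elements that are in A, in B, or in both

print("Intersection:", sorted(intersection))
print("Union:", sorted(union))
```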

# **Q8**

**What do you understand about skewness in data?**

**Answer:**

Skewness in data refers to the measure of asymmetry in the distribution of values. In a symmetric distribution, the data is evenly distributed around the central tendency, such as the mean, median, or mode. However, in a skewed distribution, the data is not evenly distributed, and there is a tail on one side of the central tendency that is longer or more spread out than the other side. Skewness can be either positive (right-skewed), where the tail extends towards the right, or negative (left-skewed), where the tail extends towards the left. Skewness is an essential concept in data analysis as it can significantly impact the interpretation and choice of appropriate statistical methods for analysis. Understanding skewness helps identify patterns and potential outliers in the data, ensuring accurate and meaningful insights are drawn.
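
One common way to put a number on skewness is the moment coefficient, the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes that definition (others exist) and uses small hypothetical datasets. The helper name `skewness` is introduced here for illustration only.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]     # long tail towards large values
left_skewed = [-x for x in right_skewed]     # mirror image: long tail towards small values

print("Symmetric:", skewness(symmetric))
print("Right-skewed:", round(skewness(right_skewed), 2))
print("Left-skewed:", round(skewness(left_skewed), 2))
```

A value near zero indicates a roughly symmetric distribution, a positive value a right tail, and a negative value a left tail.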


# **Q9**

**If a dataset is right-skewed, what is the position of the median relative to the mean?**

**Answer:**

If a dataset is right-skewed, the median is typically less than the mean. The long tail on the right side of the distribution pulls the mean towards higher values, while the median, which depends only on the order of the values, is affected much less.

# **Q10**

**Explain the difference between covariance and correlation. How are these measures used in statistical analysis?**

**Answer:**

Covariance measures the extent to which two variables vary together. Positive covariance indicates that the variables increase or decrease together, while negative covariance shows an inverse relationship. Correlation is a standardized version of covariance, ranging between -1 and 1, indicating the strength and direction of the linear relationship between variables. Both measures are used in statistical analysis to understand the relationship between two variables, with correlation providing a more interpretable and comparable value across different datasets.
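
Written out, the sample versions of the two measures look as follows. The notation is an assumption added here: x̄ and ȳ are the sample means, and s_X and s_Y are the sample standard deviations.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{XY} = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y}
```

Dividing by the two standard deviations cancels the units of X and Y, which is what confines the correlation to the interval from -1 to 1.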

# **Q11**

**What is the formula for calculating the sample mean? Provide an example calculation for a dataset.**

**Answer:**

The formula for calculating the sample mean (x̄) is:

    x̄ = (Sum of all data points) / (Number of data points)

Example:
Let's calculate the sample mean for the following dataset: [10, 15, 20, 25, 30]

**Sample Mean**

    x̄ = (10 + 15 + 20 + 25 + 30) / 5 = 100 / 5 = 20

So, the sample mean for the given dataset is 20.
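
The same calculation as a minimal Python sketch, using only built-in functions:

```python
data = [10, 15, 20, 25, 30]

# (sum of all data points) / (number of data points)
sample_mean = sum(data) / len(data)

print("Sample mean:", sample_mean)
```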

# **Q12**

**For normally distributed data, what is the relationship between its measures of central tendency?**

**Answer:**

For a normal distribution, the measures of central tendency (mean, median, and mode) are equal. In a perfectly symmetrical normal distribution, the peak (mode) is at the center, and the mean and median are also at the same point, creating a balanced distribution. This characteristic is a key feature of a normal distribution and reflects its symmetrical bell-shaped curve.
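
A small deterministic illustration, not a true normal sample: the hypothetical dataset below is symmetric and bell-shaped (values 0 to 4 with frequencies 1, 4, 6, 4, 1), and its mean, median, and mode coincide.

```python
import statistics

# Hypothetical symmetric, bell-shaped data: value v appears with frequency 1, 4, 6, 4, 1
values = [0] * 1 + [1] * 4 + [2] * 6 + [3] * 4 + [4] * 1

mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```

With real samples drawn from a normal distribution, the three values agree only approximately because of sampling variation.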


# **Q13**

**How is covariance different from correlation?**

**Answer:**

Covariance measures the extent to which two variables vary together, indicating the direction of their relationship (positive or negative). However, correlation is a standardized version of covariance, ranging between -1 and 1, representing both the strength and direction of the linear relationship between variables. While covariance provides a raw measure of the relationship, correlation provides a more interpretable and comparable value, as it is not influenced by the scale of the variables.
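
The scale dependence can be seen in a short sketch. The height and weight values are hypothetical, and the helper functions `sample_covariance` and `pearson_correlation` are written here for illustration rather than taken from a library.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_correlation(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))

# Hypothetical measurements for five people
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 63, 70, 74]
heights_m = [h / 100 for h in heights_cm]  # same heights expressed in metres

cov_cm = sample_covariance(heights_cm, weights_kg)
cov_m = sample_covariance(heights_m, weights_kg)
r_cm = pearson_correlation(heights_cm, weights_kg)
r_m = pearson_correlation(heights_m, weights_kg)

print("Covariance (cm, kg):", round(cov_cm, 4))
print("Covariance (m, kg):", round(cov_m, 4))
print("Correlation (cm, kg):", round(r_cm, 4))
print("Correlation (m, kg):", round(r_m, 4))
```

Changing the unit of height from centimetres to metres divides the covariance by 100, while the correlation stays the same.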

# **Q14**

**How do outliers affect measures of central tendency and dispersion? Provide an example.**

**Answer:**


Outliers can significantly affect measures of central tendency and dispersion. When outliers are present in the data, the mean can be skewed towards extreme values, making it less representative of the typical values in the dataset. Additionally, outliers can increase the spread of the data, leading to larger values for measures of dispersion like the variance and standard deviation.

Example:
Consider the following dataset representing exam scores:

    [85, 90, 92, 88, 89, 87, 91, 1000]

The outlier 1000 dominates the mean: without it the mean is about 88.9, but with it the mean jumps to 202.75, a value that none of the actual scores comes close to. The median barely moves (from 89 to 89.5), which is why it is often preferred for data with outliers. The outlier also inflates the range, variance, and standard deviation, so they no longer reflect the spread of the majority of the data.
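
The effect can be checked with a brief sketch using Python's standard `statistics` module. The helper `summarize` is a name introduced here for illustration, and the population standard deviation (`pstdev`) is an assumed choice.

```python
import statistics

def summarize(data):
    """Return the mean, median, and population standard deviation of data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "std": statistics.pstdev(data),
    }

scores = [85, 90, 92, 88, 89, 87, 91, 1000]
scores_without_outlier = [s for s in scores if s != 1000]

with_outlier = summarize(scores)
without_outlier = summarize(scores_without_outlier)

for label, summary in [("With outlier", with_outlier), ("Without outlier", without_outlier)]:
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

The mean and standard deviation change drastically when the outlier is included, whereas the median shifts by only half a point.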