**Q-2.** Consider a dataset containing the heights (in centimeters) of 1000 individuals. The
mean height is 170 cm with a standard deviation of 10 cm. The dataset is approximately
normally distributed, and its skewness is approximately zero. Based on this information,
answer the following questions:

**a.** What percentage of individuals in the dataset have heights between 160 cm
and 180 cm?

**b.** If we randomly select 100 individuals from the dataset, what is the probability
that their average height is greater than 175 cm?

**c.** Assuming the dataset follows a normal distribution, what is the z-score
corresponding to a height of 185 cm?

**d.** We know that 5% of the dataset has heights below a certain value. What is
the approximate height corresponding to this threshold?

**e.** Calculate the coefficient of variation (CV) for the dataset.

**f.** Calculate the skewness of the dataset and interpret the result.

**Ans:**

**2a.** To determine the percentage of individuals with heights between 160 cm and 180 cm, we need to calculate the z-scores for these heights and then find the corresponding area under the normal distribution curve.

The z-score formula is given by: z = (x - μ) / σ, where x is the height, μ is the mean height, and σ is the standard deviation.

For 160 cm:
z1 = (160 - 170) / 10 = -1

For 180 cm:
z2 = (180 - 170) / 10 = 1

To find the percentage of individuals between these heights, we can use a standard normal distribution table or statistical software. Because the heights are approximately normally distributed (with skewness close to zero), the standardized z-scores approximately follow the standard normal distribution.

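Using a standard normal distribution table, the area between z1 and z2 is 0.8413 - 0.1587 = 0.6826 (about 0.6827 before the table entries are rounded). This means that approximately 68.3% of individuals in the dataset have heights between 160 cm and 180 cm, in line with the empirical (68-95-99.7) rule.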

In [1]:
import scipy.stats as stats

# Given information
mean_height = 170
std_dev = 10
sample_size = 100
threshold_percentile = 0.05

# a. Percentage of individuals with heights between 160 cm and 180 cm
z_160 = (160 - mean_height) / std_dev
z_180 = (180 - mean_height) / std_dev
percentage_between_160_180 = stats.norm.cdf(z_180) - stats.norm.cdf(z_160)
percentage_between_160_180 *= 100

print(f"a. Percentage of individuals with heights between 160 cm and 180 cm: {percentage_between_160_180}%")

a. Percentage of individuals with heights between 160 cm and 180 cm: 68.26894921370858%
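
As a cross-check, the same area can be computed on the height scale without standardizing by hand. This is a minimal sketch that assumes the heights are exactly N(170, 10²) and uses a frozen `scipy.stats.norm` distribution; the names `height_dist`, `fraction_between`, and `expected_count` are illustrative and do not come from the original notebook.

```python
import scipy.stats as stats

# Assumption: heights follow a normal distribution with mean 170 and sd 10
height_dist = stats.norm(loc=170, scale=10)

# Area between 160 cm and 180 cm, computed directly on the height scale
fraction_between = height_dist.cdf(180) - height_dist.cdf(160)

# Expected number of the 1000 individuals falling in that range
expected_count = 1000 * fraction_between

print(f"Fraction between 160 and 180 cm: {fraction_between:.4f}")
print(f"Expected count out of 1000: {expected_count:.0f}")
```

Passing `loc` and `scale` lets SciPy do the standardization internally, so the result should agree with the z-score approach above.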


**Ans:**

**2b.** The distribution of the sample means is also approximately normal, with the same mean as the population mean but with a standard deviation equal to the population standard deviation divided by the square root of the sample size.

To calculate the probability that the average height of a random sample of 100 individuals is greater than 175 cm, we need to calculate the z-score for 175 cm using the population mean and the standard error of the mean, σ / sqrt(n).

The z-score formula is given by: z = (x - μ) / (σ / sqrt(n)), where x is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.

In this case, x = 175, μ = 170, σ = 10, and n = 100.

z = (175 - 170) / (10 / sqrt(100)) = 5 / 1 = 5

A z-score of 5 lies beyond the range of most standard normal tables; the upper-tail probability is P(Z > 5) ≈ 2.87 × 10^-7. Therefore, the probability that the average height of a random sample of 100 individuals is greater than 175 cm is practically zero.

In [2]:
# Standard error of the mean for a sample of 100
se = std_dev / (sample_size ** 0.5)
z_175 = (175 - mean_height) / se
# sf (survival function) returns the upper tail directly and is more accurate than 1 - cdf
probability_greater_than_175 = stats.norm.sf(z_175)

print(f"b. Probability that average height > 175 cm for a sample of 100 individuals: {probability_greater_than_175:.2e}")

b. Probability that average height > 175 cm for a sample of 100 individuals: 2.87e-07
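
The claim that sample means have standard deviation σ / sqrt(n) can be checked by simulation. The sketch below is only an illustration: it assumes independent draws from N(170, 10²) rather than sampling from the actual 1000 records (which are not available here), and the seed, the number of repetitions, and all variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Assumption: individual heights are independent draws from N(170, 10^2)
n_samples, n = 10_000, 100
sample_means = rng.normal(loc=170, scale=10, size=(n_samples, n)).mean(axis=1)

# The spread of the sample means should be close to sigma / sqrt(n) = 1
std_of_means = sample_means.std(ddof=1)
print(f"Std of sample means: {std_of_means:.3f}")

# Share of simulated sample means above 175 cm
frac_above_175 = (sample_means > 175).mean()
print(f"Fraction of sample means above 175 cm: {frac_above_175}")
```

With a tail probability on the order of 10^-7, a simulation of this size is expected to produce essentially no sample means above 175 cm.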


**Ans:**

**2c.** To calculate the z-score corresponding to a height of 185 cm, we can use the formula mentioned earlier.

z = (x - μ) / σ

In this case, x = 185, μ = 170, and σ = 10.

z = (185 - 170) / 10 = 15 / 10 = 1.5

The z-score corresponding to a height of 185 cm is 1.5.

In [3]:
# c. Z-score corresponding to a height of 185 cm

z_185 = (185 - mean_height) / std_dev

print(f"c. Z-score corresponding to a height of 185 cm: {z_185}")

c. Z-score corresponding to a height of 185 cm: 1.5
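
To interpret this z-score, it can be converted into a percentile. This goes slightly beyond what part c asks and assumes the normal model holds exactly; `fraction_below` is an illustrative name.

```python
import scipy.stats as stats

z_185 = (185 - 170) / 10  # z-score from part c

# Fraction of a standard normal distribution lying below z = 1.5
fraction_below = stats.norm.cdf(z_185)

print(f"Fraction of heights below 185 cm: {fraction_below:.4f}")
```

Under these assumptions, a height of 185 cm sits at roughly the 93rd percentile.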


**Ans:**

**2d.** To find the approximate height corresponding to a threshold of 5%, we need to determine the z-score that corresponds to this percentile and then convert it back to the original height using the z-score formula.

The z-score associated with a percentile of 5% is approximately -1.645. Using the z-score formula, we can find the height:

z = (x - μ) / σ

In this case, z = -1.645, μ = 170, and σ = 10.

-1.645 = (x - 170) / 10

-16.45 = x - 170

x = 170 - 16.45

x ≈ 153.55

Therefore, the height below which 5% of the dataset falls is approximately 153.55 cm.

In [4]:
# Approximate height corresponding to the 5% threshold

threshold_z = stats.norm.ppf(threshold_percentile)

threshold_height = mean_height + threshold_z * std_dev

print(f"d. Approximate height corresponding to the 5% threshold: {threshold_height:.2f} cm")

d. Approximate height corresponding to the 5% threshold: 153.55 cm
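
The inverse lookup can also be done in a single call and then verified with a round trip through the CDF. This sketch assumes heights are exactly N(170, 10²); `height_5pct` and `recovered_prob` are illustrative names.

```python
import scipy.stats as stats

# Assumption: heights follow N(170, 10^2); loc and scale set the mean and sd
height_5pct = stats.norm.ppf(0.05, loc=170, scale=10)

# Round trip: the CDF at that height should return the 5% we started from
recovered_prob = stats.norm.cdf(height_5pct, loc=170, scale=10)

print(f"5th percentile height: {height_5pct:.2f} cm")
print(f"CDF at that height: {recovered_prob:.4f}")
```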


**Ans:**

**2e.** The coefficient of variation (CV) is a measure of relative variability and is calculated as the ratio of the standard deviation to the mean, expressed as a percentage.

CV = (σ / μ) * 100

In this case, σ = 10 and μ = 170.

CV = (10 / 170) * 100 ≈ 5.88%

The coefficient of variation for the dataset is approximately 5.88%.

In [5]:
# Coefficient of Variation (CV) for the dataset

cv = (std_dev / mean_height) * 100

print(f"e. Coefficient of Variation (CV) for the dataset: {cv:.2f}%")

e. Coefficient of Variation (CV) for the dataset: 5.88%
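
If the raw heights were available, the CV could be computed from the data rather than from the summary statistics. The sketch below uses simulated heights as a hypothetical stand-in for the real dataset, so its result will only be close to 5.88%, not equal to it.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the raw data: 1000 simulated heights
simulated_heights = rng.normal(loc=170, scale=10, size=1000)

# scipy.stats.variation returns std / mean as a fraction, so multiply by 100
cv_percent = stats.variation(simulated_heights) * 100

print(f"CV of simulated heights: {cv_percent:.2f}%")
```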


**Ans:**

**2f.** Skewness is a measure of the asymmetry of a distribution. A value close to zero indicates an approximately symmetric distribution, a positive value indicates a longer right tail, and a negative value indicates a longer left tail.

Only summary statistics are given, so the skewness cannot be computed from raw data here. The problem states that it is approximately zero, which means the heights are distributed roughly symmetrically around the mean of 170 cm, consistent with the approximately normal shape.

In [6]:
# f. Skewness of the dataset

skewness = 0  # Given that the skewness is approximately zero

print(f"f. Skewness of the dataset: {skewness} (indicating a symmetric distribution)")

f. Skewness of the dataset: 0 (indicating a symmetric distribution)
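
With access to the raw heights, the skewness could be estimated directly instead of being taken as given. The following sketch uses simulated data as a hypothetical stand-in for the real dataset and `scipy.stats.skew`, which returns the sample skewness; the seed and variable names are arbitrary.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the raw data: 1000 simulated heights
simulated_heights = rng.normal(loc=170, scale=10, size=1000)

# Sample skewness; values near 0 indicate an approximately symmetric distribution
sample_skewness = stats.skew(simulated_heights)

print(f"Sample skewness of simulated heights: {sample_skewness:.3f}")
```

For normally distributed data of this size the estimate should land near zero, although it will rarely be exactly zero because of sampling variation.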
