

# Chapter 3 Quiz Instructions

Read Chapter 3 of *Essential Math for Data Science* by Thomas Nield, which is available to you through UMGC's Library. Study the questions at the end of the chapter, then compare them with the solutions in the appendix. When you are ready, work through this notebook to answer the quiz questions. Once you are satisfied with your answers, go to the classroom to submit them. You have two attempts, so if you miss a question, come back here to work out another answer.

# Question 1

1.	A carpenter wants to know the average length of the wooden planks in their workshop. They measure 12 planks and get the following lengths in inches: 24, 36, 18, 30, 24, 30, 36, 18, 24, 36, 30, 18. Calculate the mean and the sample standard deviation (to 3 decimal places) for this set of values.

Enter the sizes of the planks below.

In [1]:
import statistics

# create a list of the plank lengths
plank_lengths = [24, 36, 18, 30, 24, 30, 36, 18, 24, 36, 30, 18]

# calculate the mean
mean_length = statistics.mean(plank_lengths)

# calculate the sample standard deviation
stdev_length = statistics.stdev(plank_lengths)

# print the results
print("Mean length: ", mean_length)
print("Sample standard deviation: %.3f" %(stdev_length))

Mean length:  27
Sample standard deviation: 7.006
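
As a cross-check, the sample standard deviation can also be computed by hand. The sketch below assumes the usual definition: the sum of squared deviations from the mean divided by n - 1 (rather than n), then the square root. The names `squared_devs`, `sample_std`, and `population_std` are illustrative and not part of the quiz code.

```python
import math
import statistics

plank_lengths = [24, 36, 18, 30, 24, 30, 36, 18, 24, 36, 30, 18]

n = len(plank_lengths)
mean_length = sum(plank_lengths) / n

# sum of squared deviations from the mean
squared_devs = sum((x - mean_length) ** 2 for x in plank_lengths)

# sample standard deviation divides by n - 1
sample_std = math.sqrt(squared_devs / (n - 1))

# population standard deviation divides by n, shown only for contrast
population_std = math.sqrt(squared_devs / n)

print("Sample standard deviation: %.3f" % sample_std)
print("Population standard deviation: %.3f" % population_std)
print("Matches statistics.stdev:", math.isclose(sample_std, statistics.stdev(plank_lengths)))
```

The question asks for the sample value, so the n - 1 version (the one `statistics.stdev` returns) is the one to report.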


# Question 2

2.	A clothing store sells t-shirts in two sizes: small and medium. The store manager claims that 70% of their customers buy a medium-sized t-shirt. Assuming a binomial distribution, what is the probability that, out of 100 customers, 60 or more will buy a medium-sized t-shirt? (Show the answer to 2 decimal places.)

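Hint: we are using the `binom` distribution from `scipy.stats` and its cumulative distribution function (cdf). The cdf gives the probability of *at most* a given count, so "60 or more" is 1 minus the probability of 59 or fewer.

Enter the probability, the number of trials, and the threshold number of shirts.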

In [3]:
from scipy.stats import binom

# probability of success (buying a medium-sized t-shirt)
p = 0.7

# number of trials (customers)
n = 100

# threshold number of medium shirts (the question asks for 60 or more)
x = 60

# probability of getting 60 or more medium-sized t-shirts
prob = 1 - binom.cdf(x-1, n, p)

print("Probability of 60 or more medium-sized t-shirts: %.2f" %(prob))

Probability of 60 or more medium-sized t-shirts: 0.99
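
The upper-tail probability can be checked in two other ways. This is a minimal sketch, not part of the quiz solution: it uses `binom.sf` (the survival function, which is 1 minus the cdf) and a direct sum of `binom.pmf` over 60 through 100. The variable names `prob_complement`, `prob_sf`, and `prob_sum` are illustrative.

```python
from scipy.stats import binom

n, p, x = 100, 0.7, 60

# complement of the cdf at 59, as in the cell above
prob_complement = 1 - binom.cdf(x - 1, n, p)

# survival function: P(X > 59), which equals P(X >= 60)
prob_sf = binom.sf(x - 1, n, p)

# direct sum of the probability mass from 60 through 100
prob_sum = sum(binom.pmf(k, n, p) for k in range(x, n + 1))

print("1 - cdf: %.4f" % prob_complement)
print("sf:      %.4f" % prob_sf)
print("pmf sum: %.4f" % prob_sum)
```

All three agree up to floating-point error. Note the `x - 1`: passing 60 instead of 59 would drop the case of exactly 60 customers.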


# Question 3


3.	A coffee shop owner wants to estimate the average time their customers spend in the shop. They sample 40 customers and find that the sample mean time is 45 minutes with a sample standard deviation of 10 minutes. What is the 95% confidence interval for the mean time that all customers spend in the shop?

Enter the sample mean and standard deviation below. Enter the number of customers.

What should the alpha be if your confidence interval is 95%?

In [4]:
import scipy.stats as stats

sample_mean = 45
sample_std = 10
n = 40
alpha = 0.05

t_value = stats.t.ppf(1 - alpha/2, n-1)

lower = sample_mean - t_value * sample_std / (n ** 0.5)
upper = sample_mean + t_value * sample_std / (n ** 0.5)

print(f"The 95% confidence interval is [{lower:.2f}, {upper:.2f}]")

The 95% confidence interval is [41.80, 48.20]
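
The same interval can be obtained from `stats.t.interval`, which takes the confidence level, the degrees of freedom, a center (`loc`), and a scale. The sketch below assumes the scale should be the standard error of the mean, s / sqrt(n); `standard_error` is an illustrative name.

```python
import math
from scipy import stats

sample_mean = 45
sample_std = 10
n = 40

# standard error of the mean
standard_error = sample_std / math.sqrt(n)

# 95% two-sided t interval with n - 1 degrees of freedom
lower, upper = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)

print(f"The 95% confidence interval is [{lower:.2f}, {upper:.2f}]")
```

The confidence level is passed positionally here because the keyword for that argument has changed between SciPy versions.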


# Question 4

4.	A company claims that their new software can process data on average in less than 5 seconds with a standard deviation of 1.5 seconds. A software engineer tests the software on 25 different occasions and finds that the sample mean time to process data is 4.2 seconds. Is there evidence to support the company's claim? (Use a one-tailed test with a significance level of 0.05.)

What are the t-value, p-value, and critical t-value?

Enter the values provided in the question into the code below.

In [5]:
import math
from scipy.stats import t

sample_mean = 4.2
hypothesized_mean = 5
sample_std = 1.5
sample_size = 25
significance_level = 0.05
degrees_of_freedom = sample_size - 1

# Calculate the t-value
t_value = (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(sample_size))

# Calculate the p-value (left tail, because the claim is "less than 5 seconds")
p_value = t.cdf(t_value, degrees_of_freedom)

# Calculate the critical t-value
critical_t_value = t.ppf(significance_level, degrees_of_freedom)

# Compare the t-value to the critical t-value
if t_value < critical_t_value:
    print("There is evidence to support the company's claim.")
else:
    print("There is not enough evidence to support the company's claim.")

# Print the t-value, p-value, and critical t-value
print("t-value: {:.2f}".format(t_value))
print("p-value: {:.4f}".format(p_value))
print("critical t-value: {:.2f}".format(critical_t_value))
if p_value < significance_level:
    print("p-value < alpha, so we reject H0")
else:
    print("p-value >= alpha, so we do not reject H0")

There is evidence to support the company's claim.
t-value: -2.67
p-value: 0.0067
critical t-value: -1.71
p-value < alpha, so we reject H0
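
The critical-value comparison and the p-value comparison are two views of the same decision. The sketch below wraps the calculation in a hypothetical helper, `left_tailed_t_test`, and checks that both rules agree. It assumes, as the cell above does, that the 1.5 seconds is treated as a sample standard deviation, with H0: mean >= 5 and H1: mean < 5.

```python
import math
from scipy.stats import t

def left_tailed_t_test(sample_mean, hypothesized_mean, sample_std, sample_size, alpha=0.05):
    """Return (t_value, p_value, critical_t_value) for a left-tailed one-sample t-test."""
    df = sample_size - 1
    standard_error = sample_std / math.sqrt(sample_size)
    t_value = (sample_mean - hypothesized_mean) / standard_error
    p_value = t.cdf(t_value, df)        # area to the left of t_value
    critical_t_value = t.ppf(alpha, df)  # cutoff with alpha area to its left
    return t_value, p_value, critical_t_value

alpha = 0.05
t_value, p_value, critical_t_value = left_tailed_t_test(4.2, 5, 1.5, 25, alpha)

reject_by_critical = t_value < critical_t_value
reject_by_p = p_value < alpha

print("Reject H0 by critical value:", reject_by_critical)
print("Reject H0 by p-value:", reject_by_p)
```

Because the cdf is increasing, `t_value < critical_t_value` holds exactly when `p_value < alpha`, so the two printed decisions always match.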


# Question 5

5.	A bakery wants to know how many cupcakes they should prepare each day to meet the demand. They have collected data for the past 30 days and found that the daily demand for cupcakes follows a normal distribution with a mean of 120 and a standard deviation of 15. What is the minimum number of cupcakes the bakery should prepare each day to meet the demand 90% of the time?

Enter the values provided in the question above into the Python code below. What should your p value be?
Hint: mu and sigma are Greek letters. What do they represent?

In [6]:
from scipy.stats import norm
import math

mu = 120
sigma = 15
p = 0.9

# Calculate the demand level at the 90th percentile of the normal distribution
x = norm.ppf(p, loc=mu, scale=sigma)

# Print the percentile and round up, since the bakery can only make whole cupcakes
print("x = %.2f" %x)
print("The bakery should prepare at least", math.ceil(x), "cupcakes each day to meet the demand 90% of the time.")

x = 139.22
The bakery should prepare at least 140 cupcakes each day to meet the demand 90% of the time.
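
To confirm that rounding up is the right choice, the result can be fed back through `norm.cdf`. This is a minimal check under the question's assumption of normally distributed demand; `cupcakes`, `coverage_at`, and `coverage_below` are illustrative names.

```python
import math
from scipy.stats import norm

mu = 120
sigma = 15
target = 0.9

# smallest whole number of cupcakes at or above the 90th percentile
cupcakes = math.ceil(norm.ppf(target, loc=mu, scale=sigma))

# probability that daily demand is at most this many cupcakes
coverage_at = norm.cdf(cupcakes, loc=mu, scale=sigma)

# probability if the bakery prepared one fewer
coverage_below = norm.cdf(cupcakes - 1, loc=mu, scale=sigma)

print("Cupcakes:", cupcakes)
print("Coverage at %d: %.4f" % (cupcakes, coverage_at))
print("Coverage at %d: %.4f" % (cupcakes - 1, coverage_below))
```

Preparing one fewer cupcake falls just short of the 90% target, which is why the percentile is rounded up rather than to the nearest integer.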
