In [1]:
import scipy.stats as stats
from scipy.special import comb, perm
import numpy as np

### Chapter 5

    22. You arrive at a bus stop at 10 o’clock, knowing that the bus will arrive at some
    time uniformly distributed between 10 and 10:30. What is the probability that
    you will have to wait longer than 10 minutes? If at 10:15 the bus has not yet
    arrived, what is the probability that you will have to wait at least an additional
    10 minutes?

In [2]:
dist = stats.uniform(loc=0, scale=30)

In [3]:
#Longer Than Ten Minutes
1 - dist.cdf(10)

0.6666666666666667

In [4]:
# P(T > 25 | T > 15): at least 10 more minutes, given no bus by 10:15
(1 - dist.cdf(25)) / (1 - dist.cdf(15))

0.33333333333333326

In [5]:
# For comparison: an exponential waiting time instead of a uniform one.
# scale=30 gives a mean wait of 30 minutes.
dist = stats.expon(loc=0, scale=30)
# P(wait > 10 minutes)
1 - dist.cdf(10)

0.7165313105737893

In [6]:
# The exponential is memoryless, so P(T > 25 | T > 15) equals P(T > 10)
1 - dist.cdf(10)

0.7165313105737893
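
The memoryless property used in the cell above can be checked directly by computing the conditional probability as a ratio of survival functions. This is a minimal sketch; `scale=30` is the same illustrative mean wait used above and is not part of the original problem, and `expon_dist` is a name introduced here.

```python
import numpy as np
import scipy.stats as stats

# Assumed for illustration: exponential waiting time with mean 30 minutes.
expon_dist = stats.expon(loc=0, scale=30)

# P(T > 25 | T > 15) = P(T > 25) / P(T > 15); sf(x) is 1 - cdf(x).
conditional = expon_dist.sf(25) / expon_dist.sf(15)
unconditional = expon_dist.sf(10)

print(conditional, unconditional)
```

The two printed values should agree up to floating-point rounding, unlike the uniform case, where conditioning on the first 15 minutes changes the answer.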

    23. If X is a normal random variable with parameters µ = 10, σ² = 36, compute:

In [7]:
dist = stats.norm(10, 6)

In [8]:
#(a) P{X > 5};
1 - dist.cdf(5)

0.7976716190363569

In [9]:
#(b) P{4 < X < 16};
dist.cdf(16) - dist.cdf(4)

0.6826894921370859

In [10]:
#(c) P{X < 8};
dist.cdf(8)

0.36944134018176367

In [11]:
#(d) P{X < 20};
dist.cdf(20)

0.9522096477271853

In [12]:
#(e) P{X > 16}.
1 - dist.cdf(16)

0.15865525393145707

    24. The Scholastic Aptitude Test mathematics test scores across the population of
    high school seniors follow a normal distribution with mean 500 and standard
    deviation 100. If five seniors are randomly chosen, find the probability that
    (a) all scored below 600 and (b) exactly three of them scored above 640.

In [13]:
dist = stats.norm(500, 100)

In [14]:
# (a) All five scored below 600: C(5,5) * P(X < 600)^5
np.power(dist.cdf(600), 5) * comb(5,5)

0.42157023045754516

In [15]:
# (b) Exactly three of five scored above 640: C(5,3) * P(X <= 640)^2 * P(X > 640)^3
np.power(dist.cdf(640), 2) * np.power(1 - dist.cdf(640), 3) * comb(5, 3)

0.004450368968234036
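
Both parts of this problem are binomial probabilities whose success probability comes from the normal distribution, so they can be cross-checked with `stats.binom.pmf`. The sketch below is one way to do that; the variable names (`score_dist`, `p_above`, and so on) are introduced here for illustration.

```python
import numpy as np
import scipy.stats as stats

score_dist = stats.norm(500, 100)

# Probability that a single senior scores above 640.
p_above = score_dist.sf(640)

# (b) The number of seniors (out of 5) above 640 is Binomial(5, p_above).
exactly_three = stats.binom.pmf(3, 5, p_above)

# (a) All 5 below 600 is the k = n case of Binomial(5, P(X < 600)).
all_below_600 = stats.binom.pmf(5, 5, score_dist.cdf(600))

print(all_below_600, exactly_three)
```

The printed values should match the two hand-built products above up to rounding.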

    25. The annual rainfall (in inches) in a certain region is normally distributed with
    µ = 40, σ = 4. What is the probability that in 2 of the next 4 years the rainfall
    will exceed 50 inches? Assume that the rainfalls in different years are independent

In [16]:
dist = stats.norm(40, 4)
# Exactly 2 of 4 years exceed 50 inches: C(4,2) * P(X <= 50)^2 * P(X > 50)^2
np.power(dist.cdf(50), 2) * np.power(1 - dist.cdf(50), 2) * comb(4,2)

0.00022849524983804603

    26. The width of a slot of a duralumin forging is (in inches) normally distributed with
    µ = .9000 and σ = .0030. The specification limits were given as .9000±.0050.
    What percentage of forgings will be defective? What is the maximum allowable
    value of σ that will permit no more than 1 in 100 defectives when the widths are
    normally distributed with µ = .9000 and σ?

In [17]:
dist = stats.norm(0.9, 0.003)
# Fraction of defective forgings (multiply by 100 for the percentage, about 9.56%)
dist.cdf(.9 - .005) + (1 - dist.cdf(.9 + .005))

0.09558070454562914

In [18]:
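# Step sigma down from 0.003 until the defect rate is at most 1 in 100.
# The result is only accurate to the step size of 0.00001.
sigma = 0.00300
dist = stats.norm(0.9, sigma)
defect_rate = dist.cdf(.9 - .005) + (1 - dist.cdf(.9 + .005))
while defect_rate > 0.01:
    sigma -= 0.00001
    dist = stats.norm(0.9, sigma)
    defect_rate = dist.cdf(.9 - .005) + (1 - dist.cdf(.9 + .005))
print('Maximum Standard Deviation', sigma, defect_rate)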

Maximum Standard Deviation 0.0019399999999999973 0.009956984381494834
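
The step search above can be replaced by a closed form. By symmetry the defect rate is 2 · (1 − Φ(0.005/σ)), so requiring it to be at most 0.01 gives σ ≤ 0.005 / z, where z is the 0.995 quantile of the standard normal. A minimal sketch, with names (`tolerance`, `max_defect_rate`, `sigma_max`) chosen here for illustration:

```python
import scipy.stats as stats

tolerance = 0.005        # half-width of the specification band
max_defect_rate = 0.01   # at most 1 in 100 defective

# Defect rate = 2 * (1 - Phi(tolerance / sigma)); solve for sigma.
z = stats.norm.ppf(1 - max_defect_rate / 2)
sigma_max = tolerance / z

# Check: the defect rate at sigma_max should equal the target.
check = 2 * stats.norm.sf(tolerance / sigma_max)
print(sigma_max, check)
```

This gives a value just above 0.00194, consistent with the search result, which is limited by its step size.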


    27. A certain type of lightbulb has an output that is normally distributed with mean
    2,000 end foot candles and standard deviation 85 end foot candles. Determine
    a lower specification limit L so that only 5 percent of the lightbulbs produced
    will be defective. (That is, determine L so that P{X ≥ L} = .95, where X is the
    output of a bulb.)

In [19]:
dist = stats.norm(2000, 85)
dist.ppf(0.05)

1860.1874417091249

    28. A manufacturer produces bolts that are specified to be between 1.19 and
    1.21 inches in diameter. If its production process results in a bolt’s diameter
    being normally distributed with mean 1.20 inches and standard deviation .005,
    what percentage of bolts will not meet specifications?

In [20]:
dist = stats.norm(1.2, 0.005)
dist.cdf(1.19) + (1 - dist.cdf(1.21))

0.045500263896358195

### Chapter 6

    10. A tobacco company claims that the amount of nicotine in its cigarettes is a random
    variable with mean 2.2 mg and standard deviation .3 mg. However, the sample
    mean nicotine content of 100 randomly chosen cigarettes was 3.1 mg. What is the
    approximate probability that the sample mean would have been as high or higher
    than 3.1 if the company’s claims were true?

In [21]:
dist = stats.norm(2.2, 0.3 / np.sqrt(100))
1 - dist.cdf(3.1)

0.0
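
The printed `0.0` is a floating-point artifact rather than a true zero: 3.1 is about 30 standard errors above 2.2, so `dist.cdf(3.1)` rounds to exactly 1.0 and the subtraction loses the tail. A sketch of one way around this is to use the survival function `sf`, which evaluates the upper tail directly; `mean_dist`, `z_score`, and `tail` are names introduced here.

```python
import numpy as np
import scipy.stats as stats

# Sampling distribution of the mean under the company's claim.
mean_dist = stats.norm(2.2, 0.3 / np.sqrt(100))

z_score = (3.1 - 2.2) / (0.3 / np.sqrt(100))  # about 30 standard errors
tail = mean_dist.sf(3.1)                      # upper tail computed directly

print(z_score, tail)
```

The tail probability is positive but astronomically small, which supports the same conclusion: the observed sample mean is not plausible if the claim is true.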

    11. The lifetime (in hours) of a type of electric bulb has expected value 500 and
    standard deviation 80. Approximate the probability that the sample mean of n
    such bulbs is greater than 525 when


In [22]:
#(a) n = 4;
dist = stats.norm(500, 80/np.sqrt(4))
1 - dist.cdf(525)

0.26598552904870054

In [23]:
#(b) n = 16;
dist = stats.norm(500, 80/np.sqrt(16))
1 - dist.cdf(525)

0.10564977366685535

In [24]:
#(c) n = 36;
dist = stats.norm(500, 80/np.sqrt(36))
1 - dist.cdf(525)

0.030396361765261393

In [25]:
#(d) n = 64.
dist = stats.norm(500, 80/np.sqrt(64))
1 - dist.cdf(525)

0.006209665325776159
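
The four cells above differ only in n, so the calculation can be wrapped in a small helper. This is a sketch under the same central limit approximation; `prob_mean_exceeds` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
import scipy.stats as stats

def prob_mean_exceeds(threshold, mu, sigma, n):
    """CLT approximation of P(sample mean > threshold) for sample size n."""
    return stats.norm(mu, sigma / np.sqrt(n)).sf(threshold)

sample_sizes = [4, 16, 36, 64]
probs = [prob_mean_exceeds(525, 500, 80, n) for n in sample_sizes]
for n, p in zip(sample_sizes, probs):
    print(n, p)
```

The probabilities shrink as n grows because the standard error 80/√n shrinks, pushing 525 further into the tail.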

    12. An instructor knows from past experience that student exam scores have mean
    77 and standard deviation 15. At present the instructor is teaching two separate
    classes — one of size 25 and the other of size 64.

    a) Approximate the probability that the average test score in the class of size 25
    lies between 72 and 82.

In [26]:
dist = stats.norm(77, 15/np.sqrt(25))
dist.cdf(82) - dist.cdf(72)

0.9044192954543706

    b) Repeat part (a) for a class of size 64.

In [27]:
dist = stats.norm(77, 15/np.sqrt(64))
dist.cdf(82) - dist.cdf(72)

0.9923392388648204

    c) What is the approximate probability that the average test score in the class of
    size 25 is higher than that of the class of size 64?

In [28]:
np.random.seed(42)
dist_a = stats.norm(77, 15/np.sqrt(25))
dist_b = stats.norm(77, 15/np.sqrt(64))
total_count = 0
for i in range(10000):
    if dist_a.rvs() > dist_b.rvs():
        total_count += 1
print(total_count / 10000)

0.5018


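The exact answer is 50%, and the simulation above lands close to it. The two class averages are independent and approximately normal with the same mean of 77, so their difference is approximately normal with mean 0 and is positive with probability 1/2. The class of 25 has a wider sampling distribution, but the extra spread is symmetric about the shared mean, so it does not favor either class.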
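
The 50% figure can also be obtained without simulation. Under the normal approximation, the difference of the two independent class averages is normal with mean 0 and variance equal to the sum of the two squared standard errors. A minimal sketch, with `se_25`, `se_64`, and `diff_dist` as names introduced here:

```python
import numpy as np
import scipy.stats as stats

se_25 = 15 / np.sqrt(25)   # standard error for the class of 25
se_64 = 15 / np.sqrt(64)   # standard error for the class of 64

# Difference of independent normals is normal; the variances add.
diff_dist = stats.norm(77 - 77, np.sqrt(se_25**2 + se_64**2))

# P(mean_25 > mean_64) = P(difference > 0)
p_higher = diff_dist.sf(0)
print(p_higher)
```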

    d) Suppose the average scores in the two classes are 76 and 83. Which class, the
    one of size 25 or the one of size 64, do you think was more likely to have
    averaged 83?

In [29]:
dist_a.pdf(83)

0.017996988837729353

In [30]:
dist_b.pdf(83)

0.0012715137074479149

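The class of 25 is the more likely one to have averaged 83: its sampling distribution has a much higher density at 83 (about 0.018 versus 0.0013). These are densities, not probabilities, but their relative size is what matters here. This matches intuition: the average of 25 scores has a larger standard error (3 versus 1.875), so a class average 6 points above 77 is far less unusual for the smaller class.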
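
A slightly fuller comparison uses both observed averages. The sketch below compares the joint density of the two possible assignments (83 to the class of 25 and 76 to the class of 64, or the reverse), assuming the two class averages are independent and approximately normal. The names `dist_25`, `dist_64`, `like_a`, and `like_b` are introduced here.

```python
import numpy as np
import scipy.stats as stats

dist_25 = stats.norm(77, 15 / np.sqrt(25))
dist_64 = stats.norm(77, 15 / np.sqrt(64))

# Scenario A: class of 25 averaged 83, class of 64 averaged 76.
like_a = dist_25.pdf(83) * dist_64.pdf(76)
# Scenario B: class of 25 averaged 76, class of 64 averaged 83.
like_b = dist_25.pdf(76) * dist_64.pdf(83)

# A ratio above 1 favors scenario A.
print(like_a / like_b)
```

The ratio comes out well above 1, so the same conclusion holds when the score of 76 is taken into account.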

    13. If X is binomial with parameters n = 150, p = .6, compute the exact value of
    P{X ≤ 80} and compare with its normal approximation both (a) making use of
    and (b) not making use of the continuity correction.

In [31]:
binom_dist = stats.binom(n=150, p=0.6)
binom_dist.cdf(80)

0.05745956249718806

In [32]:
norm_dist = stats.norm(binom_dist.mean(), binom_dist.std())
# (a) Normal approximation with continuity correction
norm_dist.cdf(80.5)

0.05667275460976292

In [33]:
# (b) Normal approximation without continuity correction
norm_dist.cdf(80)

0.0477903522728147
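
To make the comparison explicit, the two approximations can be wrapped in one function and their absolute errors against the exact binomial value printed side by side. This is a sketch; `normal_approx_binom_cdf` is a hypothetical helper, not part of SciPy.

```python
import numpy as np
import scipy.stats as stats

def normal_approx_binom_cdf(k, n, p, continuity=True):
    """Normal approximation to P(X <= k) for X ~ Binomial(n, p)."""
    mu = n * p
    sigma = np.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k   # continuity correction shifts by 1/2
    return stats.norm.cdf(x, mu, sigma)

exact = stats.binom.cdf(80, 150, 0.6)
with_cc = normal_approx_binom_cdf(80, 150, 0.6, continuity=True)
without_cc = normal_approx_binom_cdf(80, 150, 0.6, continuity=False)

# Absolute error of each approximation against the exact value.
print(abs(exact - with_cc), abs(exact - without_cc))
```

For these parameters the continuity-corrected value is much closer to the exact probability, in line with the three outputs above.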