I would not like my problem and solution published

# Report 1

From http://study.com/academy/lesson/bayes-theorem-practice-problems.html

Widgets are manufactured in three factories: A, B, and C. The proportions of defective widgets from each factory are as follows:

Factory A: .01

Factory B: .04

Factory C: .02

Factories A and B each produce 30% of the widgets, and the remaining 40% come from Factory C. Imagine that an upset customer returns a defective widget to your company. As the manager, you need to figure out the probability that the defective widget came from each factory. Although we have three factories, not two, we can still use the basic form of Bayes' theorem, where Z represents the event that a widget is defective.

In [2]:
from __future__ import print_function, division

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import math
import numpy as np
from scipy.special import gamma

from thinkbayes2 import Pmf, Suite
import thinkplot

In [3]:
factories = Pmf(['A','B','C'])

factories['A'] = .01
factories['B'] = .04
factories['C'] = .02

factories.Print()

A 0.01
B 0.04
C 0.02


In [4]:
factories['A'] *= 30
factories['B'] *= 30
factories['C'] *= 40

factories.Print()

A 0.3
B 1.2
C 0.8


In [5]:
factories.Normalize()

factories.Print()

A 0.130434782609
B 0.521739130435
C 0.347826086957
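As a cross-check, the same posterior can be sketched in plain Python without thinkbayes2. The variable names below are illustrative and not part of the original notebook; the prior is each factory's share of production and the likelihood is its defect rate, both taken from the problem statement.

```python
# Prior: share of total production per factory (from the problem statement)
prior = {'A': 0.30, 'B': 0.30, 'C': 0.40}
# Likelihood: probability that a widget from each factory is defective
defect_rate = {'A': 0.01, 'B': 0.04, 'C': 0.02}

# Bayes' theorem: posterior is proportional to prior * likelihood
unnormalized = {f: prior[f] * defect_rate[f] for f in prior}
total = sum(unnormalized.values())  # P(Z), the overall defect probability
posterior = {f: unnormalized[f] / total for f in unnormalized}

for f in sorted(posterior):
    print(f, round(posterior[f], 4))
```

The rounded values agree with the normalized Pmf above: roughly 0.13 for A, 0.52 for B, and 0.35 for C.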


Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of Bowl #1?

In [6]:
cookies = Pmf(['Bowl 1', 'Bowl 2'])

cookies["Bowl 1"] *= .75
cookies["Bowl 2"] *= .5

cookies.Normalize()
cookies.Print()

Bowl 1 0.6
Bowl 2 0.4


The blue M&M was introduced in 1995. Before then, the color mix in a bag of plain M&Ms was (30% Brown, 20% Yellow, 20% Red, 10% Green, 10% Orange, 10% Tan). Afterward it was (24% Blue, 20% Green, 16% Orange, 14% Yellow, 13% Red, 13% Brown).

A friend of mine has two bags of M&Ms, and he tells me that one is from 1994 and one from 1996.  He won't tell me which is which, but he gives me one M&M from each bag.  One is yellow and one is green.  What is the probability that the yellow M&M came from the 1994 bag?

In [7]:
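mm = Pmf([1994, 1996])

# Hypothesis 1994: the yellow M&M came from the 1994 bag (20% yellow)
# and the green one from the 1996 bag (20% green).
mm[1994] *= .2*.2
# Hypothesis 1996: the yellow M&M came from the 1996 bag (14% yellow)
# and the green one from the 1994 bag (10% green).
mm[1996] *= .14*.1

mm.Normalize()
mm.Print()

1994 0.740740740741
1996 0.259259259259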


Elvis Presley had a twin brother who died at birth.  What is the probability that Elvis was an identical twin?

In [8]:
twin = Pmf(dict(identical=8, fraternal=92))

twin['identical'] *= 1
twin['fraternal'] *= .5

twin.Normalize()
twin.Print()

fraternal 0.851851851852
identical 0.148148148148


According to the CDC, "Compared to nonsmokers, men who smoke are about 23 times more likely to develop lung cancer and women who smoke are about 13 times more likely."
If you learn that a woman has been diagnosed with lung cancer, and you know nothing else about her, what is the probability that she is a smoker?

In [9]:
# Prior: 14.8% of women are assumed to be smokers, so 85.2% are nonsmokers
smoke = Pmf(dict(smoker=14.8, nonsmoker=85.2))

smoke['smoker'] *= 13
smoke['nonsmoker'] *= 1

smoke.Normalize()
smoke.Print()

nonsmoker 0.306916426513
smoker 0.693083573487


Suppose you are on Let's Make a Deal and you are playing the Monty Hall Game, with one twist.  Before you went on the show you analyzed tapes of previous shows and discovered that Monty has a tell: when the contestant picks the correct door, Monty is more likely to blink.

Of the 18 shows you watched, the contestant chose the correct door 5 times, and Monty blinked three of those times.  Of the other 13 times, Monty blinked three times. 

Assume that you choose Door A.  Monty opens door B and blinks.  What should you do, and what is your chance of winning?


In [10]:
from sympy import symbols
p = symbols('p')

door = Pmf(['a', 'b', 'c'])

door['a'] *= p
door['b'] *= 0
door['c'] *= 1

door.Normalize()
door.Print()

a 0.333333333333333*p/(0.333333333333333*p + 0.333333333333333)
b 0
c 0.333333333333333/(0.333333333333333*p + 0.333333333333333)
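The symbolic result can be turned into a number by filling in the likelihoods from the observed shows. The sketch below rests on two assumptions that the problem does not state outright: Monty picks between doors B and C at random when the car is behind A, and his blink probabilities equal the observed frequencies (3/5 when the contestant is right, 3/13 when wrong). The variable names are illustrative.

```python
from fractions import Fraction

# Blink frequencies estimated from the 18 taped shows (assumption: these
# observed frequencies are used directly as probabilities)
p_blink_if_correct = Fraction(3, 5)
p_blink_if_wrong = Fraction(3, 13)

# Uniform prior over where the car is
prior = {door: Fraction(1, 3) for door in 'abc'}

# Likelihood of "Monty opens B and blinks" given that we picked A
likelihood = {
    'a': Fraction(1, 2) * p_blink_if_correct,  # opens B half the time, we are correct
    'b': Fraction(0),                          # Monty never opens the car's door
    'c': Fraction(1) * p_blink_if_wrong,       # must open B, we are wrong
}

unnormalized = {d: prior[d] * likelihood[d] for d in prior}
total = sum(unnormalized.values())
posterior = {d: unnormalized[d] / total for d in unnormalized}

for d in sorted(posterior):
    print(d, posterior[d], float(posterior[d]))
```

Under these assumptions the posterior for door A is 13/23 (about 0.565), so sticking with door A is the better choice. This corresponds to p = 13/10 in the symbolic expression above, where p is the ratio of the likelihood under door A to the likelihood under door C.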


## If a person begins to type a word that begins with the letter 't', what are the chances that the next letter will be a vowel?

To get started with this problem, it helps to create a helper function that finds all words starting with certain letters. I will be using the word list that is included in Ubuntu and a regular expression.

In [148]:
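import re

def search_words(letters):
    # Read the system word list and lowercase every entry
    with open('/usr/share/dict/words', 'r') as words:
        s = "".join(word.lower() for word in words)
    # Match words that begin with `letters` followed by at least one more character
    x = re.findall(r'\b' + letters + r'\w+', s)
    return set(x)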
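To see what the regular expression matches without needing the system word list, here is a small sketch that applies the same pattern to an in-memory string. The function `search_text` and the sample text are hypothetical and exist only for illustration.

```python
import re

def search_text(letters, text):
    """Return the set of lowercase words in `text` that start with `letters`."""
    # Same pattern as search_words, with the prefix escaped for safety
    return set(re.findall(r'\b' + re.escape(letters) + r'\w+', text.lower()))

# One word per line, like /usr/share/dict/words
sample = "Tree\ntoy\nat\ntom's\nthe\nT\n"
matches = search_text('t', sample)
print(sorted(matches))  # ['the', 'tom', 'toy', 'tree']
```

Two behaviors are worth noting: the possessive "tom's" is cut off at the apostrophe because `\w` does not match it, and the single letter "T" is skipped because the pattern requires at least one character after the prefix.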

Now we can get started on the problem and define our hypotheses based on our prior knowledge.

Our hypotheses are:
* the next letter is a vowel
* the next letter is a consonant

There are 6 vowels (counting *y*) among the 26 letters, so the prior probabilities for our hypotheses are:
* the next letter is a vowel has a probability of 6/26
* the next letter is a consonant has a probability of 20/26

In [149]:
probability_vowel = 6/26
probability_consonant = 20/26

Next, we can define a class letter_prediction that inherits from Suite. Since it is a Suite, we have to define the Likelihood function, which is called when the Update function is used. The Update function calls the Likelihood function once for each hypothesis.

* When the hypothesis is that the next letter is a vowel and the next letter is a vowel, we want the probability to increase.
* When the hypothesis is that the next letter is a consonant and the next letter is a consonant, we want the probability to increase.

From this, we can write an if-elif statement based on the above points. It sits inside a for loop that goes through the list of words that is passed in. To finish it, we wrap it in a try-except statement in case some words are too short to have a letter at the position we want to check.

In [150]:
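class letter_prediction(Suite):
    def Likelihood(self, data, hypo):
        words, letters, position = data
        # Count the words whose letter at `position` agrees with the hypothesis
        probability = 0
        for word in words:
            try:
                if word[position] in letters and hypo == 'vowel':
                    probability += 1
                elif word[position] not in letters and hypo == 'consonant':
                    probability += 1
            except IndexError:
                # The word is too short to have a letter at `position`
                pass
        # Fraction of words consistent with the hypothesis
        return probability/len(words)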
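The counting logic in Likelihood can be tried on a tiny word list without thinkbayes2. The sketch below is a standalone re-implementation for illustration only: the function `likelihood`, the toy word list, and the manual normalization step are assumptions, not part of the notebook's class.

```python
def likelihood(words, letters, position, hypo):
    """Fraction of words whose letter at `position` agrees with `hypo`."""
    hits = 0
    for word in words:
        if len(word) <= position:
            continue  # too short to have a letter at this position
        is_vowel = word[position] in letters
        if (hypo == 'vowel') == is_vowel:
            hits += 1
    return hits / len(words)

# Hypothetical toy word list; second letters are r, o, h, a, w
toy_words = ['tree', 'toy', 'the', 'tap', 'two']
vowels = 'aeiouy'

prior = {'vowel': 6 / 26, 'consonant': 20 / 26}
like = {h: likelihood(toy_words, vowels, 1, h) for h in prior}

# Manual Bayesian update: multiply prior by likelihood, then normalize
unnormalized = {h: prior[h] * like[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: unnormalized[h] / total for h in unnormalized}

print(like)
print({h: round(p, 4) for h, p in posterior.items()})
```

With this toy list, two of the five second letters are vowels, so the likelihoods are 0.4 and 0.6, and the posterior probability of a vowel comes out to about 0.167.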

With the class defined, we can create a prediction object with our hypotheses.

In [151]:
prediction = letter_prediction(dict(vowel = probability_vowel, consonant = probability_consonant))
prediction.Print()

consonant 0.769230769231
vowel 0.230769230769


We can define the words that start with 't' by using the function that we wrote earlier.

At the same time, we can define the letters that we are searching for as vowels. For now, I will consider *y* to be a vowel.

In [152]:
search = 't'
words = search_words(search)
vowels = 'aeiouy'

We can then update the prediction based on the set of words we have, the letters that we are looking for, and the position of the next letter in the word. 

In [153]:
prediction.Update((words, vowels, len(search)))

prediction.Print()

consonant 0.693825887206
vowel 0.306174112794


The probability that the next letter is a vowel is about 30.6%.

We can also see how the probability changes if we consider *y* to be a consonant.

For this we need to redefine our probabilities and vowel string:
* the next letter is a vowel has a probability of 5/26
* the next letter is a consonant has a probability of 21/26

In [154]:
probability_vowel = 5/26
probability_consonant = 21/26

prediction = letter_prediction(dict(vowel = probability_vowel, consonant = probability_consonant))
vowels = 'aeiou'
prediction.Update((words, vowels, len(search)))

prediction.Print()

consonant 0.757338823475
vowel 0.242661176525


The percentage goes down from 30.6% to about 24.3% when *y* is not considered a vowel. The letter_prediction class could be used to predict the chances of the next letter in a word being a vowel given some starting characters.