# Galvanize DSI Self Assessment v3.0

* This document is a non-exhaustive study guide for applicants to the Galvanize Data Science Immersive (DSI). It is meant to help prepare you for the Technical Interview. 
* You may have been asked to complete the problems below in order to continue your application to the DSI. If this is the case, you will attach the completed notebook in an email to your Enrollment Officer. Once your work is graded, you will receive a link to book your next Technical Interview.
* Completion of this document is not mandatory if you have not been asked to do so. However, you are expected to be able to solve every type of problem in this document during the Technical Interview for the Galvanize DSI. 
* This document is not exhaustive in terms of the skills you should be developing prior to starting the DSI. Though not covered in this document or in the Technical Interview, you will want to develop your understanding of:
    * SQL
    * Hypothesis Testing for One and Two samples
    * Basic Linear Algebra
* Questions may be directed to: admissions@galvanize.com

<hr>

<hr>

## Contents
* Section A: Python
    * SA:Q1: Functions and For Loops
    * SA:Q2: More Functions
    * SA:Q3: Dictionaries and Strings
* Section B: Probability and Statistics
    * SB:Q1: Descriptive Statistics
    * SB:Q2: Probability and Bayes Theorem
    * SB:Q3: Probability Distributions


<hr>

<hr>

<hr>



# Section A: Python
* complete each problem below according to the instructions

<hr>

<hr>

### SA:Q1 Functions and For Loops
* In general, you want to write code that atomizes and encapsulates procedures within functions. In the problems below, you will write a procedure without a function, and then encapsulate that same procedure within a function. 

<hr>

#### a. Write a for loop that sums the numbers in `list_num`
* Print the output once the for loop is completed
    * expected result: `219`

In [None]:
list_num = [5,3,9,7,23,108,64]
######## Enter your code below ##########
sum_n = 0
for n in list_num:
    sum_n += n
print(sum_n)

<hr>

#### b. Write a function `sum_nums()` that returns the sum of the numbers in list_num
* Print the value returned by the function
* expected result: `219`

In [None]:
def sum_nums(list_num):
    return sum(list_num)

##### Enter your code above, testing code below #####
list_num = [5,3,9,7,23,108,64]
print(sum_nums(list_num))

<hr>


#### c. Write a for loop that concatenates all strings in `list_strings` into a single string
* There are multiple ways to do this, please use a for loop in this implementation
* Print the output once the for loop is completed
    * expected result: `hot coffee milk cinnamon nutmeg ginger`
        * note that the strings are separated by spaces

In [None]:
list_strings = ['hot coffee', 'milk', 'cinnamon', 'nutmeg', 'ginger']
######## Enter your code below ##########
string = ''
for word in list_strings:
    string += word + ' '
# remove the trailing space added after the last word
string = string.rstrip(' ')
print(string)

<hr>

#### d. Write a function `concat_strings` that concatenates all strings passed in as an argument into a single string
* Please use a for loop in your function
* Print the output returned by the function
    * expected result: `hot coffee milk cinnamon nutmeg ginger`
        * note that the strings are separated by spaces

In [None]:

def concat_strings(list_strings):
    string = ''
    for word in list_strings:
        string += word + ' '
    # remove the trailing space added after the last word
    return string.rstrip(' ')

##### Enter your code above, test your code below #####
list_strings = ['hot coffee', 'milk', 'cinnamon', 'nutmeg', 'ginger']
print(concat_strings(list_strings))

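Because the loop adds a space after every word, it has to trim the last one off. For comparison, here is a minimal sketch of the idiomatic approach using the built-in `str.join`, which places the separator only between items; `concat_strings_join` is a hypothetical name for illustration, and the exercise itself still asks for a for loop.

```python
def concat_strings_join(list_strings):
    # str.join inserts ' ' only *between* items, so there is no leading or trailing space
    return ' '.join(list_strings)


list_strings = ['hot coffee', 'milk', 'cinnamon', 'nutmeg', 'ginger']
print(concat_strings_join(list_strings))
```

Besides skipping the cleanup step, `join` builds the result in one pass instead of potentially copying the growing string on every `+=`.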
<hr>

<hr>

### SA:Q2 More Functions
* Seriously, we want you to use functions, a lot.
* Why?
    * Well-named functions make your code easier to read and organize.
    * Functions are reusable.
    * Functions encapsulate an atomic operation.
    * Functions are easier to run tests on.

<hr>

#### a. Write a function `plus_ten()` that takes a list of numbers and returns a list with 10 added to each number
* Print the value returned by the function
* expected result: `[22, 12, 15, 17, 19, 24, 25]`

In [None]:
def plus_ten(list_nums):
    ten_list = []
    for num in list_nums:
        ten_list.append(num+10)
    return ten_list
    

##### Enter your code above, testing code below #####
list_nums = [12, 2, 5, 7, 9, 14, 15]
print(plus_ten(list_nums))

<hr>


#### b. Write a function `list_of_word_lens()` that takes in a list of strings and returns a list containing the length of each word
* Print the value returned by the function
* expected result: `[10, 4, 8, 6, 6]`

In [None]:
def list_of_word_lens(list_words):
    word_len = []
    for word in list_words:
        word_len.append(len(word))
    return word_len
    
##### Enter your code above, testing code below #####
list_words = ['hot coffee', 'milk', 'cinnamon', 'nutmeg', 'ginger']
print(list_of_word_lens(list_words))

<hr>

<hr>

### SA:Q3 Dictionaries and Strings
* This challenge involves writing functions that use dictionaries
* To learn more about dictionaries, [Read the Python 3 docs on dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)

<hr>

#### a. Write a function `lower_first_rest_upper()` that takes a string as input and returns a string where the first letter of each word is lowercase, and the rest of the word is uppercase
* You will likely want to use `string` methods:
    * `string.lower()`
    * `string.upper()`
* Print the result of your function call
* expected result: `fAVARTIA pERITA iS a sPECIES oF sEA sNAIL`

In [None]:
string_to_mod = 'Favartia perita is a species of sea snail'

#########  Complete the Code Below ###########
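def lower_first_rest_upper(string_to_mod):
    words = string_to_mod.split()
    modded = []
    for word in words:
        # lowercase the first letter, uppercase the rest of the word
        modded.append(word[0].lower() + word[1:].upper())
    # join with single spaces so there is no trailing space
    return ' '.join(modded)

# the same result can be obtained with string_to_mod.title().swapcase()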

######## Test your Code Below ###########
print(lower_first_rest_upper(string_to_mod))

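The `.title().swapcase()` shortcut mentioned in the comment above works for this sentence, but the two approaches disagree when a word contains an apostrophe, because `str.title()` treats any non-letter as a word boundary. A small sketch of the difference (the variable names are illustrative only):

```python
string_to_mod = 'Favartia perita is a species of sea snail'

# title() capitalizes each word, then swapcase() flips every letter's case
shortcut = string_to_mod.title().swapcase()
print(shortcut)

# with an apostrophe, title() also capitalizes the letter after it ("Don'T")
tricky = "don't"
print(tricky.title().swapcase())
# splitting on spaces treats "don't" as a single word
print(' '.join(w[0].lower() + w[1:].upper() for w in tricky.split()))
```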
<hr>

#### b. Write a function `letter_counter()` that takes a string as input and returns a dictionary associating letters with the count of that letter's occurrence
* Note that we are using a DOCSTRING to describe the function's behavior
* Print the dictionary returned by the function
* expected result: `{'a': 36, 'e': 50, 'i': 30, 'o': 31, 'u': 9, 'n': 28}`

In [None]:
paragraph_to_read = '''
Kepler-20f (also known by its Kepler Object of Interest 
designation KOI-70.05) is an exoplanet orbiting the 
Sun-like star Kepler-20, the second outermost of five 
such planets discovered by NASA's Kepler spacecraft. 
It is located approximately 929 light-years 
(285 parsecs, or about 8.988*1015 km) from Earth in 
the constellation Lyra. The exoplanet was found by 
using the transit method, in which the dimming effect 
that a planet causes as it crosses in front of its star 
is measured. The planet is notable as it has the 
closest radius to Earth known so far.
'''

#########  Complete the Function Code Below ###########
def letter_counter(paragraph, letters_to_count):
    '''
    Returns the number of times a list of specified
    letters appear in a string. The count should be
    case-insensitive
    
    PARAMETERS
    ----------
    paragraph: str
        A potentially multi-line string
    letters_to_count: list of strings
        A list of letters
        
    RETURNS
    -------
    letter_dict: dict
        - key: letter
        - value: the count of that letter in the 
                 paragraph
                 
    EXAMPLE
    -------
    ```
    example_string = 'This is the string of interest. Count the vowels!'
    ```
    >>> letter_counter(example_string, ['a', 'e', 'i', 'o', 'u'])
    {'a': 0, 'e': 5, 'i': 4, 'o': 3, 'u': 1}
    '''
    
    # start every requested letter at zero
    letter_dict = {letter: 0 for letter in letters_to_count}
    # count case-insensitively, using the paragraph that was passed in
    for let in paragraph.lower():
        if let in letter_dict:
            letter_dict[let] += 1
    return letter_dict
                


######## Test your Code Below ###########
print(letter_counter(paragraph_to_read, ['a','e','i','o','u', 'n']))

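The standard library's `collections.Counter` performs the same tally in a single call. The sketch below (with the hypothetical name `letter_counter_counter`) is one way to check a hand-written loop against it; it lowercases the text first so the count stays case-insensitive, as the docstring requires.

```python
from collections import Counter


def letter_counter_counter(paragraph, letters_to_count):
    # Counter tallies every character of the lower-cased text in one pass
    counts = Counter(paragraph.lower())
    # a Counter returns 0 for missing keys, so absent letters still appear
    return {letter: counts[letter] for letter in letters_to_count}


example_string = 'This is the string of interest. Count the vowels!'
print(letter_counter_counter(example_string, ['a', 'e', 'i', 'o', 'u']))
```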
<hr>

#### c. (OPTIONAL EXTRA CREDIT) Write a function `word_permutations_dict()` that takes a string as input and returns a dictionary where the keys are every word in the input string, and the values are lists containing strings of every permutation of letters of that word
* **DO NOT use an import of a permutations method here, implement an algorithm!**
    * You will likely want to use [Heap's Algorithm](https://en.wikipedia.org/wiki/Heap%27s_algorithm) to generate your permutations, although there are multiple other algorithms to accomplish this task.
* You may want to use a helper function
* When you test your function, it will be more helpful to print the lengths of the lists for each key in the dictionary instead of printing the lists themselves.
    * expected result:
           moths : 120 permutations
           are : 6 permutations
           insect : 720 permutations
           teddy : 60 permutations
           bears : 120 permutations
    * Hint:
        * What happens if you have repeated letters in a word?

In [None]:
test_string = 'moths are insect teddy bears'

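#########  Complete your Function Code Below ###########

def heap_permutations(letters):
    '''Yield every ordering of the list `letters` (iterative Heap's algorithm).'''
    n = len(letters)
    c = [0] * n            # per-position swap counters
    yield ''.join(letters)
    i = 1
    while i < n:
        if c[i] < i:
            # even i: swap with position 0; odd i: swap with position c[i]
            if i % 2 == 0:
                letters[0], letters[i] = letters[i], letters[0]
            else:
                letters[c[i]], letters[i] = letters[i], letters[c[i]]
            yield ''.join(letters)
            c[i] += 1
            i = 1
        else:
            c[i] = 0
            i += 1

def word_permutations_dict(input_string):
    '''
    Returns a dictionary containing lists of permutations of every word of an
    input string. Assume the length of the permutation is all letters in that
    given word. Assume no punctuation in the string, that words are separated
    by spaces, and that all letters are lower case.
    
    PARAMETERS
    ----------
    input_string: str
            
    RETURNS
    -------
    perms_dict: dict
        - key: word
        - value: sorted list of the unique permutations of that word
                 
    EXAMPLE
    -------
    ```
    some_string = 'art bib'
    ```
    >>> word_permutations_dict(some_string)
    {'art': ['art', 'atr', 'rat', 'rta', 'tar', 'tra'],
     'bib': ['bbi', 'bib', 'ibb']}
    '''
    perms_dict = {}
    for word in input_string.split():
        # a set drops the duplicates that repeated letters produce (e.g. 'teddy')
        perms_dict[word] = sorted(set(heap_permutations(list(word))))
    return perms_dict
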
######## Test your Code Below ###########
perms_dict = word_permutations_dict(test_string)
for k,v in perms_dict.items():
    print('{} : {} permutations'.format(k, len(v)))

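The hint about repeated letters comes down to counting: a word of n letters has n! orderings, but swapping two identical letters yields the same string, so the number of distinct permutations is n! divided by k! for every letter that appears k times. A short sketch of that formula (`distinct_permutation_count` is a hypothetical helper) reproduces the expected counts without generating any permutations:

```python
from collections import Counter
from math import factorial


def distinct_permutation_count(word):
    # n! orderings, divided by k! for each letter repeated k times,
    # because swapping identical letters does not create a new string
    total = factorial(len(word))
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total


for word in 'moths are insect teddy bears'.split():
    print(word, ':', distinct_permutation_count(word), 'permutations')
```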
<hr>

<hr>

<hr>



# Section B: Probability and Statistics
* complete each problem below according to the instructions

<hr>

<hr>

### SB:Q1 Descriptive Statistics
* In this section, you will implement functions that describe input data.
* [Khan Academy Course on Summarizing Quantitative Data](https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data)
* **DO NOT USE SPECIFIC IMPORTED FUNCTIONS**
    * You are welcome to use vectorized math with the [numpy.array object](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html), which is completely fine and encouraged. However, you should implement any measurement functions yourself and not import any specific, precoded metric functions.

<hr>

#### a. Write a function `central_tendencies` that returns the mean, median and mode of a list of numbers
* Print the output from the function:
    * expected result: `(5.933333333333334, 6.0, 1.0)`

In [None]:
def central_tendencies(num_list):
    '''
    Returns the mean, median and mode
    of a list of numbers
    
    PARAMETERS
    ----------
    num_list: list of ints
        
    RETURNS
    -------
    central_vals: tuple of len=3
        mean: float
            the arithmetic mean of the num_list

        median: float
            the median value in the num_list

        mode: float
            the most common value in the num_list
            assume only one mode in the distribution
        
    EXAMPLE
    -------
    ```
    nums = [2, 2, 3, 4, 5]
    ```
    >>> central_tendencies(nums)
    (3.2, 3.0, 2.0)
    '''
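    n = len(num_list)
    mean = sum(num_list) / n

    # the median needs the values in sorted order
    sorted_nums = sorted(num_list)
    mid = n // 2
    if n % 2 == 1:
        median = float(sorted_nums[mid])
    else:
        # even count: average the two middle values
        median = (sorted_nums[mid - 1] + sorted_nums[mid]) / 2

    # mode: tally each value, then keep the value with the highest count
    counts = {}
    for num in num_list:
        counts[num] = counts.get(num, 0) + 1
    mode = float(max(counts, key=counts.get))

    return mean, median, mode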
    
    


######## Test your Code Below ###########
nums = [1,1,1,2,3,4,5,6,7,8,9,9,10,11,12]
print(central_tendencies(nums))

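The exercise rules out imported metric functions in the solution itself, but the standard-library `statistics` module is handy for checking the results afterwards. A quick sanity check, assuming Python 3's `statistics.mean`, `statistics.median` and `statistics.mode`:

```python
import statistics

nums = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12]
# the values should agree with central_tendencies(nums)
print(statistics.mean(nums), statistics.median(nums), statistics.mode(nums))
```

For this odd-length list of ints, `statistics.median` and `statistics.mode` return ints, so the types may differ from the expected `(5.933333333333334, 6.0, 1.0)` even though the values match.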
<hr>

#### b. Write a function `distr_spread()` that returns the range, variance, and standard deviation of a list of numbers
* Hints: 
    * You can call your previous function 
    * You may want to import the python `math` module 
        * read about the `math` module [here](https://docs.python.org/3/library/math.html)
* Print the output from the function:
    * expected result: `(11, 14.638095238095238, 3.8259763770958175)`

In [None]:
import math

def distr_spread(num_list):
    '''
    Returns the range, sample variance, and sample standard deviation
    
    PARAMETERS
    ----------
    num_list: list of ints
        
    RETURNS
    -------
    spread_vals: tuple of len=3
        samp_range: number
            the span of values in the data (max - min)

        samp_variance: float
            the sample variance of the num_list
            (squared deviations divided by n - 1)

        samp_std: float
            the sample standard deviation, i.e. the
            square root of the sample variance
        
    EXAMPLE
    -------
    ```
    nums = [2, 2, 3, 4, 5]
    ```
    >>> distr_spread(nums)
    (3, 1.7, 1.3038404810405297)
    '''
    mean = central_tendencies(num_list)[0]
    samp_range = max(num_list) - min(num_list)

    # sum of squared deviations from the mean
    sq_dev_sum = 0
    for item in num_list:
        sq_dev_sum += (item - mean)**2
    # divide by n - 1 (Bessel's correction) for the sample variance
    samp_variance = sq_dev_sum / (len(num_list) - 1)

    samp_std = samp_variance**0.5

    return samp_range, samp_variance, samp_std


######## Test your Code Below ###########
nums = [1,1,1,2,3,4,5,6,7,8,9,9,10,11,12]
print(distr_spread(nums))

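Likewise, `statistics.variance` and `statistics.stdev` use the sample (n - 1) formula, so they can verify `distr_spread`; `statistics.pvariance` divides by n instead and gives a smaller number. A check along those lines:

```python
import statistics

nums = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12]
print(max(nums) - min(nums))          # range
print(statistics.variance(nums))      # sample variance, divides by n - 1
print(statistics.stdev(nums))         # sample standard deviation
print(statistics.pvariance(nums))     # population variance, divides by n
```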

<hr>

#### c. Write a function `get_mse()` that returns the Mean Squared Error between two lists of numbers
* Note: We are referring to MSE in the context of model evaluation: the mean (average) of the squared differences between actual observations and the observations predicted by a model.
* Print the output from the function:
    * expected result: `2.2666666666666666`

In [None]:
def get_mse(actual, predictions):
    '''
    Returns the Mean Squared Error between
    two lists
    
    PARAMETERS
    ----------
    actual: list of ints
    
    predictions: list of ints
        
    RETURNS
    -------
    mse: float
        Representing the average of the sum
        of squared errors between two lists
        
    EXAMPLE
    -------
    ```
    nums1 = [2, 2, 3, 4, 5]
    nums2 = [3, 3, 3, 5, 5]
    ```
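    >>> get_mse(nums1, nums2)
    0.6
    '''
    if len(actual) != len(predictions):
        raise ValueError('actual and predictions must have the same length')

    sq_err_sum = 0
    for act, pred in zip(actual, predictions):
        sq_err_sum += (act - pred)**2
    # divide once, after the loop has summed every squared error
    mse = sq_err_sum / len(actual)
    return mse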
    

######## Test your Code Below ###########
actual = [1,1,1,2,3,4,5,6,7,8,9,9,10,11,12]
predictions = [2,2,2,3,4,5,6,7,8,9,10,10,8,14,15]
print(get_mse(actual, predictions))

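The section allows vectorized math with `numpy.array`, and MSE is a natural fit for it. A minimal sketch, assuming NumPy is installed (`get_mse_np` is a hypothetical name): subtract the arrays elementwise, square, and average.

```python
import numpy as np


def get_mse_np(actual, predictions):
    actual = np.asarray(actual, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if actual.shape != predictions.shape:
        raise ValueError('actual and predictions must have the same length')
    # elementwise difference, squared, then averaged over all observations
    return float(np.mean((actual - predictions) ** 2))


actual = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12]
predictions = [2, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 8, 14, 15]
print(get_mse_np(actual, predictions))
```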

<hr>

#### d. Write a function `sigmoid_logistic()` that returns $f(x)$ for the sigmoid/logistic function $f(x)=\frac{1}{1+e^{-x}}$,  as well as 0 or 1 based on the setting of a threshold 
* The Logistic Function will come up a lot in terms of Logistic Regression.
* You can get Euler's number $e$ from the `math` module.
* Print the output from the function:
    * expected result: `(0.7310585786300049, 1)`

In [None]:
import math

def sigmoid_logistic(x, threshold):
    '''
    Returns the output of the sigmoid/logistic function
    and 0 or 1 based on a given threshold.
    
    PARAMETERS
    ----------
    x: number
        Input to the sigmoid/logistic function
    
    threshold: float
        cut-off between 0 and 1; outputs at or
        above it are labeled 1, below it 0
        
    RETURNS
    -------
    tuple:
        f_of_x: float
            Representing the output of the 
            sigmoid function
            
        threshed: int
            0 or 1, based on the given
            result of f_of_x being above
            or below the threshold
        
    EXAMPLE
    -------
    ```
    x = -9
    threshold = 0.4
    ```
    >>> sigmoid_logistic(x, threshold)
    (0.00012339457598623172, 0)
    '''
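    f_of_x = 1 / (1 + (math.e**-x))
    # compare the sigmoid output (not the threshold itself) to the threshold;
    # a value exactly at the threshold counts as 1
    threshed = 1 if f_of_x >= threshold else 0
    return f_of_x, threshed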
            
######## Test your Code Below ###########
x = 1
threshold = 0.73
print(sigmoid_logistic(x, threshold))

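In logistic regression the sigmoid is usually applied to many inputs at once. A hedged vectorized sketch with NumPy (`sigmoid_logistic_np` is a hypothetical name, and treating a value exactly equal to the threshold as 1 is an assumption):

```python
import numpy as np


def sigmoid_logistic_np(x, threshold=0.5):
    # apply f(x) = 1 / (1 + e^(-x)) to every element at once
    f_of_x = 1 / (1 + np.exp(-np.asarray(x, dtype=float)))
    # boolean comparison per element, converted to 0/1 labels
    threshed = (f_of_x >= threshold).astype(int)
    return f_of_x, threshed


f, labels = sigmoid_logistic_np([-9, 0, 1, 4], threshold=0.73)
print(f, labels)
```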
<hr>

<hr>

### SB:Q2 Probability and Bayes Theorem
* In this section, you will fill in markdown cells with responses to probability problems.
    * You should also write a brief justification for your numerical response.
* You should perform any calculations using Python in the code cells immediately above the markdown cell.
* For a review of probability, see 
    * [Khan Academy](https://www.khanacademy.org/math/precalculus/prob-comb)
    * [Interactive Mathematics](https://www.intmath.com/counting-probability/counting-probability-intro.php)

<hr>

#### a. Permutations
* [Related KA Practice Problems](https://www.khanacademy.org/math/precalculus/prob-comb/combinatorics-precalc/e/permutations_1)
* [Related IM Practice Problems](https://www.intmath.com/counting-probability/3-permutations.php)

1. How many ways can you arrange the numbers 1, 2, 3, 4 and 5?

In [None]:
# perform any calculations here
def factorial(n):
    fact = 1
    for i in range(2, n + 1):
        fact *= i
    return fact
print(factorial(5))

```
120. There are 5 choices for the 1st spot, 4 for the 2nd, 3 for the 3rd, 2 for the 4th, and 1 for the 5th: 5! = 5 × 4 × 3 × 2 × 1 = 120.

```

2. How many ways can you arrange 1, 1, 2, 3, 4?


In [None]:
# perform any calculations here
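def permutations(n, k):
    # ordered arrangements of k items chosen from n distinct items
    return factorial(n) // factorial(n - k)
# 5! orderings of the five digits, divided by 2! because swapping
# the two identical 1s does not produce a new arrangement
print(permutations(5, 5) // factorial(2))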

```
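60. Treating all five digits as distinct gives 5! = 120 orderings, but each arrangement is counted twice (once for each order of the two 1s), so there are 120 / 2! = 60 distinct arrangements.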

```

3. How many ways can you arrange two 3s and three 5s?

In [None]:
# perform any calculations here
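# 5! orderings, divided by 2! for the identical 3s and 3! for the identical 5s
print(factorial(5) // (factorial(2) * factorial(3)))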

```
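10. There are 5! = 120 orderings of five positions; dividing by 2! for the interchangeable 3s and 3! for the interchangeable 5s gives 120 / (2 × 6) = 10 (equivalently, choose which 2 of the 5 positions hold the 3s: C(5, 2) = 10).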
```
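For small cases like these you can confirm the counts by brute force: `itertools.permutations` treats every position as distinct, and collecting the results in a `set` merges arrangements that differ only by swapping identical digits. This is a checking aid only, not the counting method the questions ask for:

```python
from itertools import permutations

# set() collapses arrangements that only differ by swapping identical digits
print(len(set(permutations([1, 2, 3, 4, 5]))))
print(len(set(permutations([1, 1, 2, 3, 4]))))
print(len(set(permutations([3, 3, 5, 5, 5]))))
```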

<hr>

#### b. Combinations
* [Related KA Practice Problems](https://www.khanacademy.org/math/precalculus/prob-comb/combinations/e/combinations_1)
* [Related IM Practice Problems](https://www.intmath.com/counting-probability/4-combinations.php)

1. How many different poker hands (5 cards) can you have? A deck holds 52 cards

In [None]:
# perform any calculations here
def combinations(n,k):
    return factorial(n) // (factorial(n-k)*factorial(k))
print(combinations(52,5))

```
2598960. A poker hand is an unordered set of 5 cards drawn from 52, so the count is the number of combinations C(52, 5) = 52! / (47! × 5!) = 2,598,960. Dividing by 5! removes the different orderings of the same 5 cards, which all describe the same hand.

```

2. There are five flavors of ice cream: Chocolate, Vanilla, Pistachio, Strawberry, and Mint. How
many three scoop ice-creams can you make if all the scoops must be different flavors?

In [None]:
# perform any calculations here
# combinations() is defined in the poker-hand cell above
print(combinations(5,3))

```
10. We choose 3 distinct flavors out of 5, and the order of the scoops does not matter, so the count is C(5, 3) = 5! / (2! × 3!) = 10.
```
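Python 3.8+ provides `math.comb` and `math.perm`, which can double-check these answers. The sketch also shows why dividing by 5! is the step that discards order: the number of ordered 5-card deals divided by the 5! orderings of each hand gives the number of hands.

```python
import math

ordered_deals = math.perm(52, 5)                  # 52 * 51 * 50 * 49 * 48
print(ordered_deals)
print(ordered_deals // math.factorial(5))         # remove the order within a hand
print(math.comb(52, 5))                           # same count directly
print(math.comb(5, 3))                            # three distinct ice-cream flavors
```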

<hr>

#### c. Probability of an Event
* [Related IM Practice Problems 1](https://www.intmath.com/counting-probability/6-probability-event.php)
* [Related IM Practice Problems 2](https://www.intmath.com/counting-probability/poker.php)

1. In a deck of cards (52 cards), what’s the probability of picking a queen? A heart? Of picking a
card that’s not a queen nor a heart?

In [None]:
# perform any calculations here
print(4/52)
print(13/52)
print(36/52)

```
queen: 0.07692307692307693, there are 4 queens in a 52 card deck
heart: 0.25, there are 13 hearts in a 52 card deck
not a queen nor a heart: 0.6923076923076923; the 13 hearts plus the 3 queens that are not hearts give 16 cards that are a queen or a heart, so the probability is 1 - 16/52 = 36/52
```

2. If I do not replace the cards, what is the probability of picking 2 kings? 4 diamonds? How do these
probabilities evolve if I replace the cards after each draw?

In [None]:
# perform any calculations here
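print((4/52) * (3/51))                           # 2 kings, no replacement
print((13/52) * (12/51) * (11/50) * (10/49))     # 4 diamonds, no replacement
print((4/52) ** 2)                               # 2 kings, with replacement
print((13/52) ** 4)                              # 4 diamonds, with replacement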

```
2 kings (no replacement): ≈ 0.00452 (= 1/221); the 1st king is 4/52, and since it is not replaced the 2nd king is 3/51. Multiply them.
4 diamonds (no replacement): ≈ 0.00264 (= 11/4165); the 1st diamond is 13/52, the 2nd is 12/51, the 3rd is 11/50, the 4th is 10/49. Multiply them.
With replacement, every draw sees the full deck again, so both probabilities rise: 2 kings = (4/52)^2 ≈ 0.00592 (= 1/169) and 4 diamonds = (13/52)^4 ≈ 0.00391 (= 1/256).
```
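Floating-point output hides the fact that these are simple fractions. A small sketch using the standard-library `fractions.Fraction` keeps every step exact, which makes it easy to see how replacement changes the denominators:

```python
from fractions import Fraction as F

# without replacement: the deck shrinks by one card after every draw
two_kings = F(4, 52) * F(3, 51)
four_diamonds = F(13, 52) * F(12, 51) * F(11, 50) * F(10, 49)

# with replacement: every draw sees the full 52-card deck
two_kings_repl = F(4, 52) ** 2
four_diamonds_repl = F(13, 52) ** 4

print(two_kings, four_diamonds, two_kings_repl, four_diamonds_repl)
```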

<hr>

#### d. Independent and Dependent Events
* [Related IM Practice Problem](https://www.intmath.com/counting-probability/8-independent-dependent-events.php)

1. The table below represents the number of kids dressed as pumpkins or ghosts on Halloween
night and the amount of candy they received:

| Costume  | Less than 10 pieces | 10-20 pieces | 20-30 pieces | Greater than 30 pieces |
|----------|:-------------------:|:------------:|:------------:|:----------------------:|
| Pumpkins |          5          |      10      |      60      |           25           |
| Ghosts   |         15          |      40      |      80      |           15           |

> 1a.  What is the probability that a kid dressed as a pumpkin gets 20 or more pieces of candy?
What about if they dress like a ghost?

In [None]:
# perform any calculations here
# P(A|B) = P(AB) / P(B), with 250 kids in total (100 pumpkins + 150 ghosts)
# P(B) = kid is a pumpkin = 100 / 250
# P(AB) = kid is a pumpkin AND got 20 or more pieces = (60 + 25) / 250 = 85 / 250
def dep(Pb, Pab):
    return Pab / Pb
print(dep(100/250, 85/250))
print(dep(150/250, 95/250))

```
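pumpkin: 0.85 (85 of the 100 pumpkins got 20 or more pieces)
ghost: 0.6333333333333334 (95 of the 150 ghosts got 20 or more pieces)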

```

> 1b.  What is the probability that a kid obtains less than 10 pieces of candy?

In [None]:
# perform any calculations here
#P(A) = amount of kids that got < 10 pieces / total amount of kids
def indep(a,b):
    return a / b
print(indep(20, 250))

```
0.08 (5 pumpkins + 15 ghosts = 20 of the 250 kids)

```

> 1c.   What is the probability that two siblings, one dressed as a ghost and one dressed as a
pumpkin, each receive 20 to 30 pieces of candy?

In [None]:
# perform any calculations here
print(dep(100/250, 60/250)*dep(150/250, 80/250))

```
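0.32; P(20-30 | pumpkin) = 60/100 = 0.6 and P(20-30 | ghost) = 80/150 ≈ 0.533. Treating the two siblings' hauls as independent, multiply: 0.6 × 0.533 = 0.32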

```

2. You toss a fair die twice. What is the probability of getting less than 3 on the first toss and an
even number on the second?

In [None]:
# perform any calculations here
print((2/6) * (3/6))

```
0.16666666666666666; P(less than 3) = 2/6 (a 1 or a 2) and P(even) = 3/6. The two tosses are independent, so multiply: 2/6 × 3/6 = 1/6

```

<hr>

#### e. Bayes' Theorem
* Related Links:
    * [Bayes Intuition](https://betterexplained.com/articles/an-intuitive-and-short-explanation-of-bayes-theorem/)
    * [Interactive Bayes](http://rpsychologist.com/d3/bayes/)
    * [Related IM Practice Problems](https://www.intmath.com/counting-probability/10-bayes-theorem.php)
    * [Related KA Practice Problems](https://www.khanacademy.org/math/ap-statistics/probability-ap/stats-conditional-probability/e/calculating-conditional-probability)

1. You run the appliance department at your local ACME Superstore. Your toaster
inventory is evenly split between those produced by the ACME corporation and General
Products. ACME-branded toasters have a history of failing 9% of the time before the end
of the warranty, whereas the General Products’ toasters fail 3% of the time before the
end of the warranty.

> 1a. A customer walks in with a broken toaster that they bought at your store, what is
the probability that their toaster was made by ACME?

In [None]:
# perform any calculations here
#P(A | B) = (P(B | A) * P(A)) / P(B)
#P(A | B) = made by acme given it's broken
#P(B | A) = broken given from acme 
#P(A) = from acme
#P(B) = it is broken = P(A)*P(B|A) + P(not A)*P(B|not A) = (from acme * broken given acme) + (not acme * broken given not acme)
def bayes(Pa, Pba, Pna, Pbna):
    Pb = (Pa * Pba) + (Pna * Pbna)
    return (Pba * Pa) / Pb
print(bayes(0.50, 0.09, 0.50, 0.03))

```
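0.75; P(broken) = 0.5 × 0.09 + 0.5 × 0.03 = 0.06, and the ACME share of that is 0.5 × 0.09 = 0.045, so 0.045 / 0.06 = 0.75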

```

> 1b. You make changes to your inventory such that you now have 2 ACME toasters
for every 3 toasters made by General Products. Using the same scenario as
question 1, what is the probability that the toaster brought in by the customer was
made by ACME?

In [None]:
# perform any calculations here
print(bayes(0.40, 0.09, 0.60, 0.03))

```
0.6666666666666667; P(ACME) = 2/5 = 0.4, so 0.4 × 0.09 / (0.4 × 0.09 + 0.6 × 0.03) = 0.036 / 0.054 = 2/3

```

2. They say life is a box of chocolates. On Valentine’s day you get two boxes of chocolates,
both made by the ACME corporation. The first box has 8 chocolates filled with Caramel,
4 filled with Toothpaste, and 3 filled with Cream. The second box has 10 chocolates filled
with Caramel, 2 filled with Toothpaste, 4 filled with Cream and 4 filled with Shaving
Cream.

> 2a. What is the probability that, after you pick a box at random, you would get a
nasty-filled chocolate (either filled with shaving cream or toothpaste)?

In [None]:
# perform any calculations here
print((4/15)*0.50 + (6/20)*0.50)

```
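0.2833333333333333; by the law of total probability, P(nasty) = P(box 1) × 4/15 + P(box 2) × 6/20 = 0.5 × 0.2667 + 0.5 × 0.3 ≈ 0.283. This is not the same as pooling both boxes, (4 + 6) / 35 ≈ 0.286, because each box is picked with probability 1/2 regardless of its size.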
```

>2b. You take a bite of one of the chocolates and it turns out to be Caramel (you lucky
duck), what is probability that the chocolate came from the second box?

In [None]:
# perform any calculations here
# A = box 2, B = caramel: P(A) = 0.5, P(B|A) = 10/20, P(not A) = 0.5, P(B|not A) = 8/15
print(bayes(0.50, 10/20, 0.50, 8/15))

```
0.48387096774193555; (0.5 × 10/20) / (0.5 × 10/20 + 0.5 × 8/15) = 0.25 / 0.5167 = 15/31

```

3. The Umbrella Corporation is developing a blood test to detect early-onset zombification.
In their medical ward, their patients traditionally have a ⅓ chance of actually having the
zombie virus. Through their experiments the Umbrella Corporation scientists have
determined that their test has a specificity and sensitivity of 95% 
    * [Wikipedia Definition of Sensitivity and Specificity](https://en.wikipedia.org/wiki/Sensitivity_and_specificity).

>3a. If a patient tests positive, what is the chance that they actually have the zombie
virus? Calculate the positive predictive value (PPV) of this test.

In [None]:
# perform any calculations here
# bayes(P(virus), P(+ | virus) = sensitivity, P(no virus), P(+ | no virus) = 1 - specificity)
print(bayes(1/3, .95, 2/3, .05))

```
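0.9047619047619048; PPV = (1/3 × 0.95) / (1/3 × 0.95 + 2/3 × 0.05), where 0.05 = 1 - specificity is the false-positive rate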

```

>3b. The Umbrella Corporation is unhappy with their current test and develops two
new tests:

|       metric      | Test A | Test B |
|-------------|:------:|-------:|
| Sensitivity |   97%  |    93% |
| Specificity |   93%  |    97% |


Which test is superior? Defend your decision. (Hint: you can’t tell just by looking
at information above, you will need to do some calculations)

In [None]:
# perform any calculations here
# P(A | B) = (P(B | A) * P(A)) / P(B)
# P(A | B) = actually have virus given test positive
# P(B | A) = test positive given actually have virus (sensitivity)
# P(A) = has virus
# P(B) = positive = P(A)*P(B|A) + P(not A)*P(B|not A)
# P(B | not A) = false-positive rate = 1 - specificity
# signature: bayes(Pa, Pba, Pna, Pbna)
print(bayes(1/3, 0.97, 2/3, 0.07))  # Test A
print(bayes(1/3, 0.93, 2/3, 0.03))  # Test B

```
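Test A: PPV = 0.8738738738738738 

Test B: PPV = 0.9393939393939393; Test B is superior for confirming infection. Because 2/3 of patients are healthy, Test B's lower false-positive rate (3% vs. 7%) outweighs Test A's higher sensitivity, so a positive result from Test B is more trustworthy. Test A does miss fewer infected patients, so it could be preferred if false negatives are judged costlier than false positives.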

```
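Comparing the tests on PPV alone leaves out the other side of the trade-off. A minimal sketch (the `predictive_values` helper is hypothetical) that computes both the positive predictive value and the negative predictive value, P(no virus | negative), from prevalence, sensitivity and specificity:

```python
def predictive_values(prevalence, sensitivity, specificity):
    # joint probabilities of each (true status, test result) pair
    tp = prevalence * sensitivity                  # infected, tests positive
    fn = prevalence * (1 - sensitivity)            # infected, tests negative
    tn = (1 - prevalence) * specificity            # healthy, tests negative
    fp = (1 - prevalence) * (1 - specificity)      # healthy, tests positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


for name, sens, spec in [('A', 0.97, 0.93), ('B', 0.93, 0.97)]:
    ppv, npv = predictive_values(1/3, sens, spec)
    print(f'Test {name}: PPV = {ppv:.4f}, NPV = {npv:.4f}')
```

With these inputs Test B has the higher PPV (about 0.939 vs. 0.874) while Test A has the higher NPV (about 0.984 vs. 0.965), so which test counts as superior depends on whether false positives or false negatives are costlier.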

<hr>

<hr>

### SB:Q3 Probability Distributions
* In this section, you will fill in markdown cells with responses to problems related to Probability Distributions.
* You should perform any calculations using Python in the code cells immediately above the markdown cell.
* For a survey of probability distributions, see [http://www.jbstatistics.com](http://www.jbstatistics.com)

<hr>

#### a. Uniform Distribution

1. Let the random variable X be the angle of a slice of pizza. The angle X has a uniform distribution
on the interval [0, 90]. What is the probability that your slice of pizza will have an angle between
30° and 40°?


In [None]:
# perform any calculations here
# density: f(x) = 1/(b-a) = 1/90 on [0, 90]
# CDF: F(x) = (x-a)/(b-a), so P(30 < X < 40) = F(40) - F(30)
def uniform_distr(a, b, x):
    # returns the CDF F(x) = P(X <= x) for X uniform on [a, b]
    return (x - a) / (b - a)
print(uniform_distr(0, 90, 40))
print(uniform_distr(0, 90, 30))
print(uniform_distr(0, 90, 40) - uniform_distr(0, 90, 30))

```
≈ 0.1111 (= (40 - 30) / (90 - 0) = 1/9)

```

2. X is uniform on the interval [a,b], can you derive the expected value E(X)? The variance V(X)?

In [None]:
# perform any calculations here
def see_below():
    return 'see below'
print(see_below())

```
The expected value is the midpoint of a and b, E(X) = (a+b)/2. Because the density is constant on [a, b], this is both the mean and the median.
The variance is V(X) = E(X**2) - E(X)**2, where E(X**2) is the integral of x**2 * f(x) over [a, b] with the constant density f(x) = 1/(b-a). Integrating and simplifying gives
V(X) = ((b-a)**2)/12
```
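The integration and algebra summarized in the answer above can be written out as follows; this is the standard derivation for the uniform density $f(x) = \frac{1}{b-a}$ on $[a, b]$:

$$
\begin{aligned}
E[X] &= \int_a^b \frac{x}{b-a}\,dx = \frac{b^2-a^2}{2(b-a)} = \frac{a+b}{2} \\
E[X^2] &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3-a^3}{3(b-a)} = \frac{a^2+ab+b^2}{3} \\
V(X) &= E[X^2] - E[X]^2 = \frac{4(a^2+ab+b^2) - 3(a+b)^2}{12} = \frac{a^2-2ab+b^2}{12} = \frac{(b-a)^2}{12}
\end{aligned}
$$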

<hr>

#### b. Geometric Distributions
Suppose you have an unfair coin, with a 68% chance of getting tails. What is the probability that the first
head will be on the 3rd trial?

In [None]:
# perform any calculations here
def geometric_pmf_before(p, k):
    # probability of k failures (tails) before the first success (head)
    return ((1-p)**k)*p
print(geometric_pmf_before(.32, 2))

```
0.147968; the first head on the 3rd trial means two tails followed by a head: 0.68**2 * 0.32 = 0.147968
```

<hr>

#### c. Poisson Distribution
On average 20 taxis drive past your office every 30 minutes. What is the probability that 30 taxis will drive
by in 1 hour?

In [None]:
# perform any calculations here
from math import e
def poisson_pmf(lmbda, k):
    return (lmbda**k)*(e**-lmbda)/factorial(k)
print(poisson_pmf(40, 30))

```
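0.01846547096073411; 20 taxis per 30 minutes scales to a rate of lambda = 40 per hour, so P(X = 30) = 40**30 * e**-40 / 30!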

```
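For larger counts, `lmbda**k` can become too large to convert to a float before the terms cancel. A hedged sketch, assuming only the standard `math` module, computes the same pmf in log space, using `math.lgamma(k + 1)` for ln(k!); `poisson_pmf_log` is a hypothetical helper, not part of the exercise:

```python
import math


def poisson_pmf_log(lmbda, k):
    # log P(X = k) = k*ln(lmbda) - lmbda - ln(k!), with ln(k!) = lgamma(k + 1)
    log_p = k * math.log(lmbda) - lmbda - math.lgamma(k + 1)
    return math.exp(log_p)


rate_per_30_min = 20
lmbda = rate_per_30_min * 2   # the rate scales with the interval: 40 taxis per hour
print(poisson_pmf_log(lmbda, 30))
```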

<hr>

#### d. Binomial Distribution

1. Fair coin: Imagine you were to flip a fair coin 10 times. What would be the probability of getting 5
heads?

In [None]:
# perform any calculations here
def binomial_pmf(n, k, p=0.5):
    return combinations(n, k) * (p**k) * ((1 - p)**(n - k))
print(binomial_pmf(10, 5, 0.5))

```
0.24609375 = C(10, 5) / 2**10 = 252 / 1024

```

2. Unfair coin: You have a coin with which you are 2 times more likely to get heads than tails. You
flip the coin 100 times. What is the probability of getting 20 tails? What is the probability of getting
at least one heads?

In [None]:
# perform any calculations here
# binomial_pmf() is defined in the previous cell
# heads are twice as likely as tails, so P(tails) = 1/3 (not 0.33)
p_tails = 1/3
print(binomial_pmf(100, 20, p_tails))

# P(at least one head) = 1 - P(all 100 flips are tails)
print(1 - p_tails**100)

```
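P(20 tails) ≈ 0.00126, using P(tails) = 1/3 because heads are twice as likely as tails
P(at least one head) = 1 - (1/3)**100, which is 1 to within floating-point precision (the complement is about 1.9e-48)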
```

<hr>

#### e. Normal Distribution

1. Suppose X has a standard normal distribution (Mean = 0, Std. Dev. = 1). Compute $P(X > 9)$, $P(1 < X < 3)$ and $P(X > -3)$.

In [None]:
# perform any calculations here
from scipy.stats import norm
# norm.sf(x) = 1 - norm.cdf(x), but without the round-off that turns 1 - cdf(9) into 0.0
print(norm.sf(9, loc=0, scale=1))
print(norm.cdf(3, loc=0, scale=1) - norm.cdf(1, loc=0, scale=1))
print(norm.sf(-3, loc=0, scale=1))

```
P(X > 9) ≈ 1.13e-19, i.e. effectively 0 (computing `1 - norm.cdf(9)` rounds to exactly 0.0)
P(1 < X < 3) = 0.15730535589982697
P(X > −3) = 0.9986501019683699
```

2. The weight in pounds of individuals in a population of interest has a normal distribution, with a
mean of 150 and a standard deviation of 40. What is the expected range of values that describe
the weight of 68% of the population (Hint: use the empirical rule)? Of the people who weigh more
than 170 pounds, what percent weigh more than 200 pounds? (Hint: this is conditional probability)

In [None]:
# perform any calculations here
from scipy.stats import norm
def empirical_rule(mean, std):
    return mean - 1*std, mean + 1*std
print(empirical_rule(150, 40))
#mean - 1*std, mean + 1*std...68%
#mean - 2*std, mean + 2*std...95%
#mean - 3*std, mean + 3*std...99.7%
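# P(X > 200 | X > 170) = P(X > 200) / P(X > 170), because weighing over 200 implies weighing over 170
print(norm.sf(200, loc=150, scale=40) / norm.sf(170, loc=150, scale=40))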

```
110 - 190: this is 1 std on either side of the mean, which covers about 68% of the population
weigh more than 200 given they weigh more than 170: ≈ 34.2% (P(X > 200) ≈ 0.1056 and P(X > 170) ≈ 0.3085)
```
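If SciPy is not available, the same tail probabilities can be computed with the standard library, since for a normal distribution P(X > x) = 0.5 * erfc((x - mean) / (std * sqrt(2))). A minimal sketch (`normal_sf` is a hypothetical helper name) that reproduces the conditional probability and the extreme tail from the previous question:

```python
import math


def normal_sf(x, mean, std):
    # survival function P(X > x) via the complementary error function;
    # erfc stays accurate far in the tail, where 1 - cdf would round to 0
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2)))


# {X > 200} is contained in {X > 170}, so P(both) = P(X > 200)
p_gt_200 = normal_sf(200, 150, 40)
p_gt_170 = normal_sf(170, 150, 40)
print(p_gt_200 / p_gt_170)

print(normal_sf(9, 0, 1))   # P(X > 9) for a standard normal
```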

<hr>

#### f. Exponential Distribution

1. Let X, the number of years a computer works, be a random variable that follows an exponential
distribution with a lambda of 3 years. You just bought a computer; what is the probability that the
computer will still be working after 8 years?

In [None]:
# perform any calculations here
from math import e
def exp_distr_pdf_cdf(lam, x):
    # despite the name, this returns the survival function P(X > x) = 1 - CDF(x) = e**(-lam*x)
    return e**-(lam*x)
print(exp_distr_pdf_cdf(1/3, 8))

# PDF = lam*e**(-lam*x) for x >= 0; mean = 1/lam; var = 1/(lam**2);
# std = 1/lam = mean; CDF = 1 - e**(-lam*x)


```
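≈ 0.06948 (= e**(-8/3)); reading "a lambda of 3 years" as a mean lifetime of 3 years, the rate is 1/3 per year and P(X > 8) = e**(-8/3)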

```

2. Let X be a random variable that now follows an exponential distribution with a half-life of 6 years.
Find the parameter of the exponential distribution and the probability P(X > 10).

In [None]:
# perform any calculations here
import math
lam = math.log(2) / 6   # half-life h: e**(-lam*h) = 1/2, so lam = ln(2)/h
print(lam, exp_distr_pdf_cdf(lam, 10))

```
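the parameter is lambda = ln(2)/6 ≈ 0.1155 per year (the half-life is the median lifetime, so e**(-6*lambda) = 1/2)
P(X > 10) = e**(-10*ln(2)/6) = 2**(-10/6) ≈ 0.315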

```

3. What is the conditional probability P (X > 20 | X > 10)?

In [None]:
# perform any calculations here
from math import log
print(exp_distr_pdf_cdf(log(2) / 6, 20) / exp_distr_pdf_cdf(log(2) / 6, 10))  # memoryless: equals P(X > 10)

```
≈ 0.315; the exponential distribution is memoryless, so P(X > 20 | X > 10) = P(X > 20) / P(X > 10) = P(X > 10)

```
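To tie the last two answers together, here is a small sketch (the names `half_life`, `lam` and `exp_survival` are illustrative) that derives the rate from a half-life and checks memorylessness numerically:

```python
import math

half_life = 6
# the half-life is the median: P(X > h) = e**(-lam*h) = 1/2, so lam = ln(2)/h
lam = math.log(2) / half_life


def exp_survival(lam, x):
    # P(X > x) for an exponential distribution with rate lam
    return math.exp(-lam * x)


p_gt_10 = exp_survival(lam, 10)
p_20_given_10 = exp_survival(lam, 20) / exp_survival(lam, 10)
print(lam, p_gt_10, p_20_given_10)
```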