# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## TaskList 08: Probability 1.

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com). 

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

***

### Intro

In this tasklist we will deal with the basics of Probability Theory as presented in Sessions 08 and 09. 
This tasklist is meant as a refresher of what was discussed in the sessions. It is not meant to improve your Python coding skills, except for very basic things in NumPy and SciPy, all in relation to Probability Theory alone. 

**00.** There are 15 red, 20 blue, and 10 yellow marbles in a bowl. The statistical experiment that we wish to study is a random draw of a marble from the bowl. Use a Numpy vector to represent the event space of this experiment (i.e. create a "bowl" that contains 15 red, 20 blue, and 10 yellow marbles; you can perhaps use numbers to represent different colors).

In [91]:
import numpy as np

reds = np.repeat(1,repeats=15, axis=0)
blues = np.repeat(2,repeats=20,axis=0)
yellows = np.repeat(3,repeats=10,axis=0)
event_space = np.concatenate([reds, blues, yellows])
event_space


array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3])

**01.** Use Numpy to compute $P(Blue)$, $P(Red)$, and $P(Yellow)$. What is their sum?

In [92]:
p_blue = event_space[event_space==2].size/event_space.size
p_red = event_space[event_space==1].size/event_space.size
p_yellow = event_space[event_space==3].size/event_space.size  # divide by the bowl size, not by the array itself
sum_all = (p_blue+p_red+p_yellow)
p_blue

0.4444444444444444
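The task also asks for the sum of the three probabilities, which the cell above does not display. A minimal sketch, assuming the same encoding as in task 00 (1 = red, 2 = blue, 3 = yellow); the names `bowl`, `colors`, `counts`, and `probs` are illustrative and not part of the original notebook:

```python
import numpy as np

# Rebuild the bowl with the assumed encoding: 1 = red, 2 = blue, 3 = yellow
bowl = np.repeat([1, 2, 3], [15, 20, 10])

# Count each color once, then turn the counts into relative frequencies
colors, counts = np.unique(bowl, return_counts=True)
probs = counts / bowl.size

for color, prob in zip(colors, probs):
    print(color, prob)

# The three events are mutually exclusive and exhaust the bowl,
# so their probabilities sum to 1 (up to floating-point rounding)
print(probs.sum())
```

Because every marble has exactly one of the three colors, the sum is 1 regardless of how many marbles of each color the bowl holds.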

**02.** Compute $P(Blue \cup Red)$

In [93]:


# Blue and Red are mutually exclusive (a marble has exactly one color),
# so P(Blue ∪ Red) = P(Blue) + P(Red)

p_blue_red = p_blue+p_red
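The addition rule can be cross-checked by counting the marbles that are blue or red directly. A small sketch under the same assumed encoding (1 = red, 2 = blue, 3 = yellow); `bowl`, `p_blue_or_red`, and `p_sum` are illustrative names:

```python
import numpy as np

# Assumed encoding: 1 = red, 2 = blue, 3 = yellow
bowl = np.repeat([1, 2, 3], [15, 20, 10])

# Direct count: fraction of marbles that are either red or blue
p_blue_or_red = np.isin(bowl, [1, 2]).mean()

# Addition rule for mutually exclusive events
p_sum = (bowl == 2).mean() + (bowl == 1).mean()

print(p_blue_or_red, p_sum)
```

Both values equal 35/45, since 35 of the 45 marbles are blue or red.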

**03.** Assume that we have already drawn five blue, two red, and six yellow marbles from the bowl. Following each draw we **did not return** the marble back into the bowl. What is now $P(Blue)$, the probability to draw a blue marble at random?

In [94]:
blue_deleted = 5
red_deleted = 2
yellow_deleted = 6

new_event_space_size = event_space.size - (blue_deleted+red_deleted+yellow_deleted)
new_number_of_blues = event_space[event_space==2].size - blue_deleted

new_blue_p = new_number_of_blues/new_event_space_size


new_blue_p



0.46875
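As a cross-check, the reduced bowl can also be built explicitly and the blue marbles counted in it. This is only a sketch, assuming the same color encoding as before (1 = red, 2 = blue, 3 = yellow); `remaining` and `p_blue_after` are illustrative names:

```python
import numpy as np

# Marbles left after removing 2 red, 5 blue, and 6 yellow ones
remaining = np.repeat([1, 2, 3], [15 - 2, 20 - 5, 10 - 6])

# Fraction of blue marbles among those still in the bowl
p_blue_after = (remaining == 2).mean()

print(remaining.size, p_blue_after)
```

There are 32 marbles left, 15 of them blue, which gives 15/32.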

**04.** I have a tricky coin with $P(Head)=.78$. What is the probability that I obtain sixty-five $Heads$ from one hundred tosses of my tricky coin? (I like tricky things, indeed). Use Scipy!

Binomial it is: we are looking for k hits in n independent trials of the same hit/miss experiment.

In [95]:
import scipy
from scipy.stats import binom
rng=np.random.default_rng(seed=10012)

x = 65
n = 100
p = 0.78

outcome = binom.pmf(x,n,p)
outcome

print(f'The probability of obtaining 65 Heads from one hundred tosses is {outcome}')

The probability of obtaining 65 Heads from one hundred tosses is 0.0010241976052498668
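To see what `binom.pmf` computes, the binomial probability mass function can be written out by hand as $\binom{n}{k} p^k (1-p)^{n-k}$. A minimal sketch using only the standard library for the manual part; `manual` is an illustrative name:

```python
from math import comb
from scipy.stats import binom

n, k, p = 100, 65, 0.78

# Binomial pmf written out: C(n, k) * p^k * (1 - p)^(n - k)
manual = comb(n, k) * p**k * (1 - p) ** (n - k)

# The two values should agree up to floating-point rounding
print(manual, binom.pmf(k, n, p))
```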


**05.** And what is the probability to obtain sixty-five **or more than sixty-five** Heads? **N.B. Tricky**. 

In [96]:
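# "65 or more" includes 65 itself, so we need the complement of P(X <= 64),
# not the complement of P(X = 65). binom.sf(k, n, p) returns P(X > k).
outcome_final = binom.sf(x - 1, n, p)

outcome_final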
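The tricky part is that the complement of "sixty-five or more Heads" is "sixty-four or fewer Heads", not "exactly sixty-five Heads". The sketch below computes $P(X \geq 65)$ in three equivalent ways so they can be compared; the variable names are illustrative:

```python
import numpy as np
from scipy.stats import binom

n, p = 100, 0.78

# P(X >= 65) as the complement of P(X <= 64)
p_complement = 1 - binom.cdf(64, n, p)

# The survival function sf(k) returns P(X > k), so sf(64) is P(X >= 65)
p_sf = binom.sf(64, n, p)

# Brute force: add up the pmf over every outcome from 65 to 100
p_sum = binom.pmf(np.arange(65, n + 1), n, p).sum()

print(p_complement, p_sf, p_sum)
```

All three values should agree up to floating-point rounding. For probabilities very close to 1, `sf` is generally the numerically safer choice than `1 - cdf`.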

**06.** My cat scratches me - accidentally or intentionally - 9.97 times monthly on the average. What is the probability that the darned animal will scratch me 11 times in the following month?

I think this one calls for the Poisson distribution, because we have:

λ (lambda), the expected number of occurrences in a given time interval. SciPy calls this parameter `mu`.

k, the actual observed count of occurrences (e.g., how many events actually happened).

**mu = λ (expected occurrences per interval).**

In [104]:
from scipy.stats import poisson
mu = 9.97
k =11
animal_scratches = poisson.pmf(k,mu)

animal_scratches

0.11339007402506597
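The same number can be reproduced from the Poisson probability mass function, $P(X = k) = e^{-\lambda} \lambda^k / k!$. A minimal sketch; `manual` is an illustrative name:

```python
from math import exp, factorial
from scipy.stats import poisson

mu, k = 9.97, 11

# Poisson pmf written out: exp(-mu) * mu^k / k!
manual = exp(-mu) * mu**k / factorial(k)

# The two values should agree up to floating-point rounding
print(manual, poisson.pmf(k, mu))
```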

**07.** And what is the probability that the cat will scratch me five or less than five times (on a good month)?

$P(X \leq 5)$

- We need to sum P(X = 0) + P(X = 1) + ... + P(X = 5).

- Instead of summing manually, we use Poisson's cumulative distribution function (poisson.cdf)

In [109]:


animal_scratches_v2 = poisson.cdf(k=5,mu=9.97)
animal_scratches_v2

0.0682295077107305
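The manual summation mentioned above can be sketched as follows, to confirm that it matches `poisson.cdf`; `p_sum` and `p_cdf` are illustrative names:

```python
import numpy as np
from scipy.stats import poisson

mu = 9.97

# Sum P(X = 0) + P(X = 1) + ... + P(X = 5) explicitly
p_sum = poisson.pmf(np.arange(0, 6), mu).sum()

# The cumulative distribution function gives P(X <= 5) in one call
p_cdf = poisson.cdf(5, mu)

print(p_sum, p_cdf)
```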

**08.** The probability that any of my bank's customer support agents answer my phone call on weekends is $P(Answer)=.05$ (**N.B.** the website says it's 24h support on any given day; they just don't give a d*). What is the probability that my phone call will be answered on my 10th attempt?

**N.B.** If you wonder if Scipy can handle a Geometric distribution... [scipy.stats.geom](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.geom.html)

In [113]:
p = 0.05
N = 10

from scipy.stats import geom

probability = geom.pmf(N,p)

probability

0.03151247048623045
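For the Geometric distribution as SciPy parameterizes it, $k$ counts the attempts up to and including the first success, so $P(X = k) = (1-p)^{k-1} p$: nine unanswered calls followed by one answered call. A minimal sketch; `manual` is an illustrative name:

```python
from scipy.stats import geom

p, k = 0.05, 10

# Nine failures in a row, then a success on the tenth attempt
manual = (1 - p) ** (k - 1) * p

# The two values should agree up to floating-point rounding
print(manual, geom.pmf(k, p))
```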

**09.** And what is the probability that my call will be answered in the first ten attempts, including the tenth attempt?

In [115]:
probability_ten_less = geom.cdf(N,0.05)
probability_ten_less

0.4012630607616211
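Being answered within the first ten attempts is the complement of ten unanswered calls in a row, so $P(X \leq 10) = 1 - (1-p)^{10}$. The sketch below compares this closed form with `geom.cdf` and with a simulation; the seed and the sample size are arbitrary choices made for this illustration:

```python
import numpy as np
from scipy.stats import geom

p, n_attempts = 0.05, 10

# Closed form: one minus the probability of n_attempts failures in a row
closed_form = 1 - (1 - p) ** n_attempts

# Simulation: each draw is the number of attempts up to and including
# the first answered call
rng = np.random.default_rng(seed=42)  # arbitrary seed for this sketch
attempts = rng.geometric(p, size=200_000)
simulated = (attempts <= n_attempts).mean()

print(closed_form, geom.cdf(n_attempts, p), simulated)
```

The simulated frequency should land close to the exact value, though it varies with the seed and the sample size.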

**10.** As [Alex](https://www.linkedin.com/in/alegzndr/) explained in Session08, we consider finite, countably infinite, and uncountably infinite event sets in Probability Theory. The set of all natural numbers, $\mathbb{N}$, is countably infinite, and its cardinal number - meaning its size - is $\aleph_0$ (aleph-zero). A set is uncountably infinite if it cannot be brought into a 1:1 correspondence with the set of natural numbers, $\mathbb{N}$, and of course $\mathbb{R}$ - the set of real numbers - is thus uncountably infinite. We say that the cardinal number of $\mathbb{R}$ is $\aleph_1$ (aleph-one). Prove that there are no sets whose cardinality lies between $\aleph_0$ and $\aleph_1$. Please make sure to keep the proof concise, elegant, and easy to read. You can use OpenAI's [ChatGPT](https://openai.com/blog/chatgpt) to solve this task if you prefer.

Some background material is provided in [Continuum hypothesis|English Wikipedia](https://en.wikipedia.org/wiki/Continuum_hypothesis).