## Tutorial: Mathematics and Linear Algebra in Python

Tackle probability and statistics in Python: learn more about combinations and permutations, dependent and independent events, and expected value.

Data scientists create **machine learning models** to make predictions and optimize decisions. In **online poker**, the options are whether to bet, call, or fold. You aren't allowed to use software to make those decisions though. 

That's where most online poker sites draw the line in the rules. Since you can't train a machine learning model, you must train your brain. This requires an endless stream of equity calculations away from the poker table, which use many different probability and statistics concepts.

More specifically, you'll cover the following topics:
* Probability Theory: An Introduction
* Key Concepts
* Calculating Probability
* Probability with Combinations and Permutations
* Independent versus Dependent Events
* Multiple Events
* Mutually Exclusive Events
* Non-Mutually Exclusive Events
* Intersection of Independent Events
* Intersection of Dependent Events
* Expected Value
* Types of Random Numbers

---------------------------------------------------------------------
You can follow me on:
> ###### [GitHub](https://github.com/mjbahmani)
> ###### [LinkedIn](https://www.linkedin.com/in/bahmani/)
> ###### [Kaggle](https://www.kaggle.com/mjbahmani/)

-------------------------------------------------------------------------------------------------------------
 **I hope you find this kernel helpful and some upvotes would be very much appreciated**
 
 -----------

 <a id="0"></a> <br>
**Notebook Content**
1. [Installation](#1)
    1. [Windows](#2)
    1. [Linux](#3)
    1. [Jupyter notebook](#4)
    1. [Kaggle Kernel](#5)
    1. [Colab notebook](#6)
    1. [What browsers are supported?](#7)
    1. [Is it free to use?](#8)
    1. [What is the difference between Jupyter and Colaboratory?](#9)
1. [Loading Packages](#10)
1. [Introduction](#11)
1. [Probability](#12)
1. [Combinations and Permutations](#13)
    1. [Permutations](#14)
1. [Independent versus Dependent Events](#15)
1. [Expected Value](#16)
1. [Random](#17)
1. [Generate a list of Random Numbers](#18)
1. [Generating Random Strings or Passwords with Python](#19)
1. [Random Integer Numbers](#20)
1. [Conclusion](#21)
1. [References](#22)

 <a id="1"></a> <br>
## 1-Installation
 <a id="2"></a> <br>
#### Windows:

* Anaconda (from https://www.continuum.io) is a free Python distribution for SciPy stack. It is also available for Linux and Mac.
* Canopy (https://www.enthought.com/products/canopy/) is available as free as well as commercial distribution with full SciPy stack for Windows, Linux and Mac.
* Python (x,y) is a free Python distribution with SciPy stack and Spyder IDE for Windows OS. (Downloadable from http://python-xy.github.io/)
 <a id="3"></a> <br>
#### Linux
Package managers of respective Linux distributions are used to install one or more packages in SciPy stack.

For Ubuntu Users:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

 <a id="4"></a> <br>
## 1-1 Jupyter notebook
I strongly recommend installing **Python** and **Jupyter** using the **[Anaconda Distribution](https://www.anaconda.com/download/)**, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science.

First, download Anaconda. We recommend downloading Anaconda’s latest Python 3 version.

Second, install the version of Anaconda which you downloaded, following the instructions on the download page.

Congratulations, you have installed Jupyter Notebook! To run the notebook, run the following command at the Terminal (Mac/Linux) or Command Prompt (Windows):

> jupyter notebook
> 

 <a id="5"></a> <br>
## 1-2 Kaggle Kernel
A Kaggle kernel is an environment much like a Jupyter notebook. It is an **extension** of the Jupyter notebook in which you can carry out all the functions of Jupyter notebooks, plus some added tools such as forking.

 <a id="6"></a> <br>
## 1-3 Colab notebook
**Colaboratory** is a research tool for machine learning education and research. It’s a Jupyter notebook environment that requires no setup to use.
<a id="7"></a> <br>
### 1-3-1 What browsers are supported?
Colaboratory works with most major browsers, and is most thoroughly tested with desktop versions of Chrome and Firefox.
<a id="8"></a> <br>
### 1-3-2 Is it free to use?
Yes. Colaboratory is a research project that is free to use.
<a id="9"></a> <br>
### 1-3-3 What is the difference between Jupyter and Colaboratory?
Jupyter is the open source project on which Colaboratory is based. Colaboratory allows you to use and share Jupyter notebooks with others without having to download, install, or run anything on your own computer other than a browser.

<a id="10"></a> <br>
## 2- Loading Packages
In this kernel we are using the following packages:

 <img src="http://s8.picofile.com/file/8338227868/packages.png">
 Now we import all of them 

In [58]:
# packages to load
# Check the versions of libraries
import warnings
warnings.filterwarnings('ignore')
# Python version
import sys
print('Python: {}'.format(sys.version))
# scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# matplotlib
import matplotlib
import matplotlib.pyplot as plt
print('matplotlib: {}'.format(matplotlib.__version__))
%matplotlib inline
# numpy
import numpy as np # linear algebra
print('numpy: {}'.format(np.__version__))
# pandas
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
print('pandas: {}'.format(pd.__version__))
# seaborn
import seaborn as sns
print('seaborn: {}'.format(sns.__version__))
sns.set(color_codes=True)
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# Importing metrics for evaluation
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
# Input data files are available in the "../input/" directory.
import os


Python: 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
scipy: 0.19.0
matplotlib: 2.0.2
numpy: 1.12.1
pandas: 0.20.1
seaborn: 0.7.1
sklearn: 0.18.1


 <a id="11"></a> <br>
## 3- Introduction
Before you get your hands dirty, it's time to consider what probability theory is and why it's important to learn about it when you're getting into data science. Additionally, you'll learn some key concepts that will be handy to consider throughout the tutorial and you'll learn how to calculate the probability of single events.

You'll often wonder in real-life situations what the probabilities are of some event occurring, such as winning the lottery, the victory of your soccer team or a discount on your favorite pair of shoes. "What are the chances..." is an expression you probably use very often. Determining the chances of an event occurring is called "probability".

 <a id="12"></a> <br>
## 4- probability
There are 52 cards in a standard deck, and 4 of those cards are Aces. To find the probability of drawing an Ace, divide the number of favorable outcomes (4) by the size of the sample space (52):

P(A)=4/52
<img src="https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Probability+%26+Statistics+Python/image7.png"></img>
Note how A represents the event of "drawing an Ace".

Now, determine the probability of drawing an Ace with the help of Python:

In [59]:
# Sample Space
cards = 52

# Outcomes
aces = 4

# Divide possible outcomes by the sample set
ace_probability = aces / cards

# Print probability rounded to two decimal places
print(round(ace_probability, 2))


0.08


The probability of drawing an Ace from a standard deck is approximately 0.08 (4/52 ≈ 0.077). To express a probability as a percentage, simply multiply by 100.

In [60]:
# Ace Probability Percent Code
ace_probability_percent = ace_probability * 100

# Print probability percent rounded to the nearest whole number
print(str(round(ace_probability_percent, 0)) + '%')

8.0%


As a percentage, the probability of drawing an Ace is about 8%.
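
Rounding hides some precision here: 4/52 is closer to 0.077 than to 0.08. As a small sketch (the variable name `ace_probability_exact` is illustrative and not part of the original notebook), Python's standard `fractions` module can keep the exact value:

```python
from fractions import Fraction

# Exact probability of drawing an Ace: 4 favorable outcomes out of 52
ace_probability_exact = Fraction(4, 52)

print(ace_probability_exact)                         # reduces to 1/13
print(round(float(ace_probability_exact), 4))        # 0.0769
print(round(float(ace_probability_exact) * 100, 1))  # 7.7
```

Working with exact fractions and rounding only when printing avoids carrying rounding error into later calculations.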

Now that you have seen two examples where you calculated probabilities, it's easy to assume that you might build out your probability calculations to determine, for example, the probability of drawing a card that is a Heart, a face card (such as Jacks, Queens, or Kings), or a combination of both, such as a Queen of Hearts.

In such cases, you might want to create a User-Defined Function (UDF) event_probability() to which you pass the event_outcomes and the sample_space to find the probability of an event in percentage form, since you'll be reusing a lot of the code:

In [61]:
# Create function that returns probability percent rounded to one decimal place
def event_probability(event_outcomes, sample_space):
    probability = (event_outcomes / sample_space) * 100
    return round(probability, 1)

# Sample Space
cards = 52

# Determine the probability of drawing a heart
hearts = 13
heart_probability = event_probability(hearts, cards)

# Determine the probability of drawing a face card
face_cards = 12
face_card_probability = event_probability(face_cards, cards)

# Determine the probability of drawing the queen of hearts
queen_of_hearts = 1
queen_of_hearts_probability = event_probability(queen_of_hearts, cards)

# Print each probability
print(str(heart_probability) + '%')
print(str(face_card_probability) + '%')
print(str(queen_of_hearts_probability) + '%')

25.0%
23.1%
1.9%


These results probably don't surprise you: as you expected, the chances of drawing a Queen of Hearts are much smaller than the chances of drawing a regular face card or a Heart.
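
One way to sanity-check such calculations is to simulate the draws and compare the observed frequencies with the theoretical values. The following is only a sketch: the deck representation and the helper `estimate_probability` are assumptions made for illustration, and the estimates will vary slightly with the seed and the number of trials.

```python
import random

# Build a 52-card deck as (rank, suit) tuples
ranks = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K', 'A']
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
deck = [(rank, suit) for suit in suits for rank in ranks]

def estimate_probability(is_event, trials=100_000, seed=0):
    """Estimate P(event) by drawing one card at random `trials` times."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(is_event(rng.choice(deck)) for _ in range(trials))
    return hits / trials

heart_estimate = estimate_probability(lambda card: card[1] == 'Hearts')
face_estimate = estimate_probability(lambda card: card[0] in ('J', 'Q', 'K'))

print(round(heart_estimate * 100, 1))  # should be close to 25.0
print(round(face_estimate * 100, 1))   # should be close to 23.1
```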

 <a id="13"></a> <br>
## 5- Combinations and Permutations
You have seen in the previous section that determining the size of your sample space is key to calculating probabilities. However, this can sometimes prove to be a challenge!

Fortunately, there are ways to make the counting task easier. Two of these ways are permutations and combinations. In this section, you'll see what both of these concepts exactly mean and how you can use them to calculate the size of your sample space!

 <a id="14"></a> <br>
## 5-1 Permutations
Permutations are the number of ways a subset of a specified size can be arranged from a given set, generally without replacement. An example of this would be a 4-digit PIN with no repeated digits. The number of such PINs can be calculated as follows:

10 x 9 x 8 x 7

In [62]:
# Permutations Code
import math
n = 10  # digits available (0-9)
k = 4   # digits in the PIN

# Determine permutations, n! / (n - k)!, and print result
Permutations = math.factorial(n) / math.factorial(n - k)
print(Permutations)

5040.0


Combinations count selections in which order does not matter. To determine the number of combinations, simply divide the number of permutations by the factorial of the size of the subset. Try finding the number of starting-hand combinations that can be dealt in Texas Hold’em.

In [63]:
# Combinations Code
n = 52
k = 2

# Determine Permutations
Permutations = math.factorial(n) / math.factorial(n - k)

# Determine Combinations and print result
Combinations = Permutations / math.factorial(k)
print(Combinations)

1326.0
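
As a cross-check, newer Python versions offer these counts directly. The sketch below assumes Python 3.8 or later, where `math.perm` and `math.comb` are available (the notebook itself ran on Python 3.6, where they are not), and uses `itertools.permutations` to confirm the PIN count by brute force:

```python
import itertools
import math

# Ordered 4-digit PINs with no repeated digit: 10 * 9 * 8 * 7
pin_count = math.perm(10, 4)

# Unordered two-card starting hands from a 52-card deck
hand_count = math.comb(52, 2)

# Cross-check the PIN count by enumerating every arrangement
enumerated_pins = sum(1 for _ in itertools.permutations(range(10), 4))

print(pin_count, hand_count, enumerated_pins)  # 5040 1326 5040
```

Both functions return integers, so there is no floating-point division involved.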


 <a id="15"></a> <br>
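## 6- Independent versus Dependent Events
Two events are independent when the outcome of one does not change the probability of the other, and dependent when it does. Drawing cards without replacement produces dependent events: the probability of drawing an Ace on the second draw depends on which card was removed by the first draw.
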
In [64]:
# Sample Space
cards = 52
cards_drawn = 1 
cards = cards - cards_drawn 

# Determine the probability of drawing an Ace after drawing a King on the first draw
aces = 4
ace_probability1 = event_probability(aces, cards)

# Determine the probability of drawing an Ace after drawing an Ace on the first draw
aces_drawn = 1
aces = aces - aces_drawn
ace_probability2 = event_probability(aces, cards)

# Print each probability
print(ace_probability1)
print(ace_probability2)

7.8
5.9
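
The conditional probabilities above can be chained to get the probability that both draws are Aces. This is a minimal sketch of the multiplication rule P(A and B) = P(A) × P(B | A); the variable names are illustrative and not from the original notebook.

```python
from fractions import Fraction

# P(first card is an Ace)
p_first_ace = Fraction(4, 52)
# P(second card is an Ace | first card was an Ace): 3 Aces left in 51 cards
p_second_ace_given_first = Fraction(3, 51)

# Multiplication rule for dependent events
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)                         # 1/221
print(round(float(p_two_aces) * 100, 2))  # 0.45

# With replacement the draws would be independent: P(A) * P(A)
p_two_aces_with_replacement = p_first_ace * p_first_ace
print(p_two_aces_with_replacement)        # 1/169
```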


There are a few situations common to poker which are relevant to the concept of dependent events.

But before you get started, a little background info is in order. The game is Texas Hold’em. Played with a standard 52 card deck, Texas Hold’em is the most popular of all the poker variations. Each player is dealt two cards to start the hand and will make the best five-card hand possible by using their two cards combined with the five community cards that are dealt throughout the hand. Cards are dealt in four rounds:

* Pre-Flop: Each player is dealt two cards, known as "hole cards"
* Flop: Three community cards are dealt
* Turn: One community card is dealt
* River: Final community card is dealt

In [65]:
# Sample Space
cards = 52
hole_cards = 2
turn_community_cards = 4
cards = cards - (hole_cards + turn_community_cards)

# Outcomes
diamonds = 13
diamonds_drawn = 4
# In poker, cards that complete a draw are known as "outs"
outs = diamonds - diamonds_drawn

#Determine river flush probability
river_flush_probability = event_probability(outs, cards)
print(river_flush_probability)

19.6


There is roughly a 20% chance of hitting your Flush draw on the River. Here’s another one:



You're on the Turn and you have an open-ended Straight draw. A Straight is another strong hand, made of five cards in sequential order. The Straight draw is open-ended because any Eight (8, 9, 10, Jack, Queen) or any King (9, 10, Jack, Queen, King) will complete the Straight.

What's the probability that the River card completes the Straight?

In [66]:
# Sample Space
cards = 52
hole_cards = 2
turn_community_cards = 4
cards = cards - (hole_cards + turn_community_cards)

# Outcomes
eights = 4
kings = 4
outs = eights + kings

# Determine river straight probability
river_straight_probability = event_probability(outs, cards)
print(river_straight_probability)

17.4


There is roughly a 17% chance of hitting your Straight draw on the River.
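
Both draws follow the same pattern: count the outs, count the unseen cards, and divide. A possible way to capture this in one place is sketched below; `river_probability` is a hypothetical helper, not part of the original notebook.

```python
def river_probability(outs, known_cards, deck_size=52):
    """Percent chance that the next card is one of `outs`, given the
    number of cards already visible to you (hole cards plus board)."""
    unseen_cards = deck_size - known_cards
    return round(outs / unseen_cards * 100, 1)

# On the Turn you can see 2 hole cards + 4 community cards = 6 cards
print(river_probability(outs=13 - 4, known_cards=6))  # flush draw: 19.6
print(river_probability(outs=4 + 4, known_cards=6))   # straight draw: 17.4
```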

 <a id="16"></a> <br>
## 7- Expected Value
When playing a game such as poker, you care about questions such as "how much do I gain, or lose, on average if I repeatedly play this game?". This matters even more if you're a professional poker player!

Now, if the possible outcomes of the game and their associated probabilities can be described by a random variable, then you can answer the above question by computing its expected value, which is equal to a weighted average of the outcomes where each outcome is weighted by its probability.

In the simplest case, where you either win a fixed amount or win nothing, you multiply the Total Value by the probability of winning to get your Expected Value:

**ExpectedValue=TotalValue×Probability**

What is the expected value if there is $100 (Total Value) in the pot, and your probability of winning the pot is 0.75?

**ExpectedValue=$100×0.75**

In [67]:
# Initialize `pot` and `probability` variables
pot = 100
probability = 0.75

# Determine expected value
expected_value = pot * probability
print(expected_value)

75.0
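
The weighted-average definition of expected value given above also covers games with more than two outcomes. The following is a sketch only; the function `expected_value` and its list-of-pairs input format are assumptions made for illustration.

```python
def expected_value(outcomes):
    """Weighted average of (value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in outcomes)

# The pot example: win $100 with probability 0.75, otherwise win nothing
print(expected_value([(100, 0.75), (0, 0.25)]))  # 75.0

# A fair six-sided die: each face has probability 1/6
die_ev = expected_value([(face, 1 / 6) for face in range(1, 7)])
print(round(die_ev, 2))  # 3.5
```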


Your opponent has decided to be helpful and show you his cards, and has a set of 2s.

To win the hand on the River, you must hit any Diamond except a Jack or 2. The Jack or 2 of Diamonds would give your opponent a better hand, a full house and four of a kind respectively. 

You have to call $20 to stay in the hand, and if you win the hand you win $60. If your expected value is greater than $20 you should call the bet, and if not you should fold.
Figure out if you should call the bet:

In [68]:
# Sample Space
cards = 52
hole_cards = 2
# Your opponent provided you information... use it!
opponents_hole_cards = 2 

turn_community_cards = 4
cards = cards - (hole_cards + opponents_hole_cards + turn_community_cards)

# Outcomes
diamonds = 13
diamonds_drawn = 4

# You can't count the two diamonds that won't help you win
diamond_non_outs = 2 

outs = diamonds - diamonds_drawn - diamond_non_outs

# Determine win probability
win_probability = outs / cards

# Determine expected value
pot = 60
ev = pot * win_probability

# Print ev and appropriate decision
call_amount = 20
if ev >= call_amount:
    print(round(ev, 2), 'Call')
else:
    print(round(ev, 2), 'Fold')

9.55 Fold
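
The same decision can be framed as a break-even probability: under the rule used above (call when the pot times the win probability is at least the call amount), the minimum win probability that justifies a call is the call amount divided by the pot. A short sketch, with illustrative variable names:

```python
from fractions import Fraction

pot = 60
call_amount = 20

# Smallest winning probability for which pot * p >= call_amount
break_even_probability = Fraction(call_amount, pot)

# 7 outs among the 44 unseen cards, as in the cell above
win_probability = Fraction(7, 44)

print(break_even_probability)            # 1/3
print(round(float(win_probability), 3))  # 0.159
print('Call' if win_probability >= break_even_probability else 'Fold')  # Fold
```

Here you would need to win at least one time in three to call, and roughly 16% falls well short of that.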


<a id="17"></a> <br>
## 8- Random
The random function of the random module returns a pseudo-random float in the range from 0.0 (included) to 1.0 (not included):

In [69]:
import random
random_number = random.random()
print(random_number)

0.6862428719937888


We will show an alternative and secure approach in the following example, in which we use the SystemRandom class of the random module. It uses a different random number generator, one that draws on sources provided by the operating system: /dev/urandom on Unix and CryptGenRandom on Windows. The random method of the SystemRandom class generates a float in the range from 0.0 (included) to 1.0 (not included):



In [70]:
from random import SystemRandom
crypto = SystemRandom()
print(crypto.random())

0.42772577512146404


<a id="18"></a> <br>
## 8-1 Generate a list of Random Numbers


In [71]:
import random
def random_list(n, secure=True):
    random_floats = []
    if secure:
        crypto = random.SystemRandom()
        random_float = crypto.random
    else:
        random_float = random.random
    for _ in range(n):
        random_floats.append(random_float())
    return random_floats
print(random_list(10, secure=False))

[0.8425831513922636, 0.14033345794575625, 0.8658748039031625, 0.31181989672926835, 0.9352852459925426, 0.34941993528249116, 0.5739736528064628, 0.14237336364933806, 0.664295256193561, 0.26419764765154097]


The "simple" random function of the random module is a lot faster as we can see in the following:



In [72]:
%%timeit
random_list(100)

10000 loops, best of 3: 76.5 µs per loop


In [73]:
%%timeit
random_list(100, secure=False)

100000 loops, best of 3: 11.1 µs per loop
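
The `%%timeit` magic only works inside a notebook. Outside of one, a comparable measurement can be sketched with the standard `timeit` module. The helper `make_floats` is an assumption made for this example, and the absolute timings depend entirely on the machine.

```python
import random
import timeit

secure_rng = random.SystemRandom()

def make_floats(random_float, n=100):
    """Build a list of n floats from the given zero-argument generator."""
    return [random_float() for _ in range(n)]

# Time 1,000 calls of each variant
secure_seconds = timeit.timeit(lambda: make_floats(secure_rng.random), number=1000)
simple_seconds = timeit.timeit(lambda: make_floats(random.random), number=1000)

print('SystemRandom: {:.1f} µs per call'.format(secure_seconds / 1000 * 1e6))
print('random.random: {:.1f} µs per call'.format(simple_seconds / 1000 * 1e6))
```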


Alternatively, you can use a list comprehension to create a list of random float numbers:

In [74]:
crypto = random.SystemRandom()
[crypto.random() for _ in range(10)]


[0.5959011690512823,
 0.022162397575714587,
 0.12747530799182438,
 0.04520581551405267,
 0.6886491550407245,
 0.8891902779607284,
 0.5928668636657934,
 0.7396973571943954,
 0.26053951523579455,
 0.5695619586457665]




In [75]:
%%timeit
[crypto.random() for _ in range(100)]

10000 loops, best of 3: 68.4 µs per loop


The fastest and most efficient way is to use the random submodule of NumPy:



In [76]:
import numpy as np
np.random.random(10)

array([ 0.74835928,  0.66818196,  0.27471614,  0.926003  ,  0.98778592,
        0.47055631,  0.4301672 ,  0.00727917,  0.99220369,  0.33521366])


In [77]:
%%timeit
np.random.random(100)

The slowest run took 10.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
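
Newer NumPy releases also provide a generator-based interface. The sketch below assumes NumPy 1.17 or later, where `np.random.default_rng` exists (the notebook itself ran on NumPy 1.12, which does not have it); the seed value is arbitrary.

```python
import numpy as np

# Seeded generator so the sequence is reproducible from run to run
rng = np.random.default_rng(seed=42)

# 10 floats in the half-open interval [0.0, 1.0)
random_floats = rng.random(10)

print(random_floats.shape)  # (10,)
print(bool(((random_floats >= 0.0) & (random_floats < 1.0)).all()))  # True
```

Passing a seed is useful when results need to be repeatable, for example in tests.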


## 8-2 Random Numbers Satisfying the Sum-to-One Condition
To get random numbers that sum to one, divide each value by the sum of all values:

In [78]:
import numpy as np
list_of_random_floats = np.random.random(100)
sum_of_values = list_of_random_floats.sum()
print(sum_of_values)
normalized_values = list_of_random_floats / sum_of_values
print(normalized_values.sum())

47.9888935779
1.0
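
The normalization step can be wrapped in a small function. This is a sketch under stated assumptions: `random_probabilities` is a hypothetical helper, and it uses `np.random.RandomState` so that a seed can be passed in.

```python
import numpy as np

def random_probabilities(n, seed=None):
    """Return n non-negative random floats that sum to one."""
    rng = np.random.RandomState(seed)
    values = rng.random_sample(n)
    return values / values.sum()

weights = random_probabilities(5, seed=0)
print(weights)
# Floating-point sums are compared with a tolerance, not with ==
print(np.isclose(weights.sum(), 1.0))  # True
```

A vector like this can serve as a set of probability weights, for example as the `p` argument of `np.random.choice`.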


<a id="19"></a> <br>
## 9- Generating Random Strings or Passwords with Python
We assume that you don't use and don't like weak passwords like "123456", "password", "qwerty" and the like. Believe it or not, these passwords always rank among the 10 most commonly used. So you are looking for a safe password, and you want to create it with Python? Be careful with the functions that rank in the top 10 of the search results, because many of them rely on the random function of the random module, which is not designed for security purposes.

We will define a strong random password generator, which uses the SystemRandom class. This class uses, as we have already mentioned, a cryptographically strong random number generator:


In [79]:
from random import SystemRandom
sr = SystemRandom() # create an instance of the SystemRandom class

def generate_password(length,
                      valid_chars=None):
    """ generate_password(length, valid_chars) -> password
        length: the length of the created password
        valid_chars: a string of the characters allowed in the password;
                     defaults to ASCII letters and digits
    """
    if valid_chars is None:
        valid_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        valid_chars += valid_chars.lower() + "0123456789"

    password = ""
    counter = 0
    while counter < length:
        # draw a random ASCII code point and keep it only if it is allowed
        rnum = sr.randint(0, 127)
        char = chr(rnum)
        if char in valid_chars:
            password += char
            counter += 1
    return password
print("Automatically generated password by Python: " + generate_password(15))

Automatically generated password by Python: pQ1Y35ANAY9PkWu
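
Since Python 3.6 the standard library also includes the `secrets` module, which is intended for exactly this kind of task. The following is an alternative sketch, not the method used in this notebook; the function name `generate_password_with_secrets` is made up for the example.

```python
import secrets
import string

def generate_password_with_secrets(length, valid_chars=None):
    """Build a password by picking each character with secrets.choice."""
    if valid_chars is None:
        valid_chars = string.ascii_letters + string.digits
    return ''.join(secrets.choice(valid_chars) for _ in range(length))

print(generate_password_with_secrets(15))
```

Picking directly from the allowed characters avoids the loop that draws and discards characters outside the valid set.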


<a id="20"></a> <br>
## 10- Random Integer Numbers
Everybody is familiar with creating random integer numbers without computers. If you roll a die, you create a random number between 1 and 6. In terms of probability theory, we would call "the rolling of the die" an experiment with a result from the set of possible outcomes {1, 2, 3, 4, 5, 6}. This set is also called the sample space of the experiment.

How can we simulate the rolling of a die in Python? We don't need NumPy for this purpose. "Pure" Python and its random module are enough.

In [80]:
import random
outcome = random.randint(1,6)
print(outcome)

5


Let's roll our virtual die 10 times:



In [81]:
import random
[ random.randint(1, 6) for _ in range(10) ]

[5, 6, 2, 1, 6, 6, 5, 4, 1, 3]



In [82]:
# We can accomplish this more easily with the random module of NumPy:

import numpy as np
outcome = np.random.randint(1, 7, size=10)
print(outcome)

[6 3 4 2 3 4 3 6 6 6]


You may have noticed that we used 7 instead of 6 as the second parameter. randint from numpy.random uses a "half-open" interval, unlike randint from the Python random module, which uses a closed interval!

The formal definition:

numpy.random.randint(low, high=None, size=None)

This function returns random integers from 'low' (inclusive) to 'high' (exclusive). In other words: randint returns random integers from the "discrete uniform" distribution in the "half-open" interval ['low', 'high'). If 'high' is None or not given in the call, the results will range from [0, 'low'). The parameter 'size' defines the shape of the output. If 'size' is None, a single int will be the output. Otherwise the result will be an array. The parameter 'size' defines the shape of this array. So size should be a tuple. If size is defined as an integer n, this is considered to be the tuple (n,).

The following examples will clarify the behavior of the parameters:

In [83]:
import numpy as np
print(np.random.randint(1, 7))
print(np.random.randint(1, 7, size=1))
print(np.random.randint(1, 7, size=10))
print(np.random.randint(1, 7, size=(10,))) # the same as the previous one
print(np.random.randint(1, 7, size=(5, 4)))

1
[1]
[6 1 2 2 6 3 5 2 6 1]
[4 6 3 3 2 4 2 3 5 6]
[[5 4 4 3]
 [5 2 1 1]
 [2 6 6 6]
 [4 3 4 3]
 [6 3 6 4]]
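
To tie the die rolls back to probability, a simulation can estimate how often each face appears. This is only a sketch: the seed and the number of rolls are arbitrary choices, and the frequencies will differ slightly between runs with other seeds.

```python
import random
from collections import Counter

rng = random.Random(1)  # seeded so the run is reproducible
rolls = [rng.randint(1, 6) for _ in range(60_000)]

# Relative frequency of each face; each should be near 1/6 ≈ 0.167
counts = Counter(rolls)
for face in range(1, 7):
    print(face, round(counts[face] / len(rolls), 3))
```

With more rolls the relative frequencies move closer to the theoretical probability of 1/6 for each face.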


-----------------

<a id="21"></a> <br>
## 11- Conclusion

Congrats, you have made it to the end of this tutorial on probability theory with Python! This concludes Part 1 of the tutorial. You learned about several core probability concepts including Independent/Dependent events, Permutations/Combinations, Multiple events, Expected Values, and how to calculate each of them.

In Part 2, you will apply these concepts to actual poker hands that I played during my career.

*One card is dealt face down, known as the Burn card, before the Flop, Turn, and River. Since the card is dealt face down, and no player knows what it is, it does not count as a trial.


<a id="22"></a> <br>
## 12- References
* [1] [DataCamp](https://www.datacamp.com/community/tutorials/statistics-python-tutorial-probability-1)