# Lecture 2: Introduction to Machine Learning tasks and intuition


## Quick recap:

<b> The goal of AI is to build a being that reasons like a human. </b>

Although in many tasks these algorithms already surpass human abilities, they are bad at generalising, because each is trained to do a single thing only. But we are getting better.

### Machine learning definition

Machine learning is the process in which a model learns from experience E to perform a task T, with its performance evaluated by a measure P.


### Model development process

The whole process of developing machine learning solutions is complex and involves multiple steps.

<img src='img/flow.jpg' width=550/>

## Types of machine learning algorithms

<img src='img/typesofml.jpg' />

### Supervised learning

Definition: Learning from labeled data, where the input-output pairs are provided.

Goal: Predict output values for new data.

Types:

* Regression: Predict continuous outputs.
    * Example: Predicting house prices.
* Classification: Predict categorical outputs.
    * Example: Email spam detection.
        
### Unsupervised learning

Definition: Learning from data that is not labeled, finding patterns or structure in the data.

Goal: Group data points, reduce data dimensions, or discover associations.

Types:

* Clustering: Group similar data points together.
    * Example: Customer segmentation.
* Association: Find rules that describe large portions of the data.
    * Example: Market basket analysis.
* Dimensionality Reduction: Reduce the number of variables describing the data to a minimum.
    * Example: Decide which variables are the most important to predict who will buy certain products in a shop.

## Probability: at the core of Machine Learning

### What is probability?

Probability is a way of measuring the likelihood of an event happening.
It ranges between 0 (impossible event) and 1 (certain event).

The probability of something that happens half the time is 0.5 (or 50%, but we prefer the decimal form).

### Random variable

A random variable is a variable that can take different values, each with a certain probability.

For example: a die roll. It can be 1, 2, 3, 4, 5 or 6. The random variable X is the outcome of the roll. For a fair die every outcome x is equally likely, therefore

P(X=x) = 1/6

In [1]:
import numpy as np

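In [2]:
# 100 random integers from 1 to 9 (the upper bound 10 is exclusive)
data = list(np.random.randint(1, 10, 100))

In [26]:
# Must be defined before it is used below, otherwise Python raises a NameError
number_of_datapoints = len(data)

In [3]:
# Estimate P(X = 9) as the share of datapoints equal to 9
counter = 0
for x in data:
    if x == 9:
        counter += 1
print(counter/number_of_datapoints)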

In [14]:
a = (1,2,3,4,5) #declare variable

In [15]:
a[0] = 10

TypeError: 'tuple' object does not support item assignment

In [33]:
type(a)

tuple

In [21]:
a = list(a)  # tuples are immutable, so convert to a list before changing an element
a[0] = 10

In [40]:
a

[10, 2, 3, 4, 5]

In [15]:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact

# Function to simulate dice rolls and plot the distribution
def plot_dice_rolls(num_rolls=10):
    # Simulate dice rolls (values between 1 and 6)
    rolls = np.random.randint(1, 7, size=num_rolls)
    
    # Calculate the frequency of each outcome (1 to 6)
    counts, bins = np.histogram(rolls, bins=np.arange(1, 8), density=True)
    
    # Plot the histogram
    plt.figure(figsize=(10, 6))
    plt.bar(bins[:-1], counts, width=0.6, color='blue', alpha=0.7, edgecolor='black')
    plt.xticks(np.arange(1, 7))
    
    # Plotting the theoretical uniform distribution line for comparison
    plt.axhline(1/6, color='red', linestyle='--', label='Theoretical Probability (1/6)')
    
    # Adding labels and title
    plt.title(f'Distribution of Dice Rolls (n={num_rolls})')
    plt.xlabel('Dice Face')
    plt.ylabel('Probability')
    plt.legend()
    plt.grid(True, axis='y', alpha=0.3)
    plt.show()

# Creating an interactive slider for the number of dice rolls
interact(plot_dice_rolls, 
         num_rolls=widgets.IntSlider(min=1, max=10_000, step=10, value=1, description='Number of Rolls'));



### Other examples of probability

We find probability to be useful in many examples, such as games. 

Probability of getting a pair in five-card poker: 42.26%

    Getting a three of a kind: 2.11%
    Full House: 0.144%
    Flush: 0.198%
    Four of a kind: 0.024%
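
These percentages can be reproduced by counting hands. Below is a minimal sketch, assuming a standard 52-card deck and a five-card hand; the variable names are only illustrative. Note that the 0.024% figure corresponds to four of a kind, and the flush figure includes straight flushes.

```python
from math import comb

# Total number of distinct 5-card hands from a 52-card deck
total_hands = comb(52, 5)

# Count the hands of each type: choose the ranks first, then the suits
hand_counts = {
    'One pair': 13 * comb(4, 2) * comb(12, 3) * 4**3,
    'Three of a kind': 13 * comb(4, 3) * comb(12, 2) * 4**2,
    'Full house': 13 * comb(4, 3) * 12 * comb(4, 2),
    'Flush (including straight flushes)': 4 * comb(13, 5),
    'Four of a kind': 13 * 48,
}

# Probability = favourable hands / all possible hands
poker_probabilities = {name: count / total_hands for name, count in hand_counts.items()}

for name, p in poker_probabilities.items():
    print(f'{name}: {p:.3%}')
```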

And the probability of winning the lottery (where you have to correctly guess 6 out of 49 numbers) is...

In [45]:
import scipy.special

# Probability of guessing all 6 numbers out of 49, printed as a percentage
x = 1/(scipy.special.binom(49, 6))
print(f'P(x)={x*100:.10f}%')

P(x)=0.0000071511%


Probability is **why the casino always wins**. Let's say we have a roulette wheel:

<img src='img/roulette.jpg' />

We have 37 fields: 18 red, 18 black, and 1 green.

The probability (in %) of winning when betting everything on red:

In [47]:
(18/37)*100

48.64864864864865

Hence, the probability (in %) that the casino wins:

In [49]:
(19/37)*100

51.35135135135135

Casinos use probability to earn money. **The odds are always in favor of the casino.**
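
To see what this small edge means in money terms, we can compute the expected gain of a single bet. This is a minimal sketch that assumes an even-money bet (the payout rule is an assumption, it is not stated above): the player gains 1 unit if red comes up and loses 1 unit otherwise.

```python
from fractions import Fraction

# Roulette wheel as described above: 18 red, 18 black, 1 green field
p_player_wins = Fraction(18, 37)
p_casino_wins = Fraction(19, 37)

# Assumption: even-money bet, +1 unit on red and -1 unit on anything else
expected_gain_per_bet = p_player_wins * 1 + p_casino_wins * (-1)

print(expected_gain_per_bet)         # -1/37
print(float(expected_gain_per_bet))  # roughly -0.027 units per unit staked
```

On average the player loses 1/37 of the stake on every bet, so the longer the game goes on, the more certain the casino's profit becomes.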

### Probability distributions

A probability distribution tells us the probability of each possible outcome for a random variable.

Example: Rolling a fair die follows a uniform distribution, where each outcome (1 to 6) is equally likely.

In [22]:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact

# Function to plot different distributions as bar plots based on user selection
def plot_distribution(distribution, param1, param2):
    plt.figure(figsize=(10, 6))
    
    # Choose the distribution to plot
    if distribution == 'Normal':
        # Generate random samples from a normal distribution
        samples = np.random.normal(param1, param2, 1000)
        # Plot histogram as bar plot
        plt.hist(samples, bins=20, density=True, alpha=0.7, color='blue', edgecolor='black')
        plt.title(f'Normal Distribution\nMean = {param1}, Std Dev = {param2}')
        plt.xlabel('Value')
        plt.ylabel('Probability Density')
        
    elif distribution == 'Uniform':
        # Generate random samples from a uniform distribution
        samples = np.random.uniform(param1, param2, 1000)
        # Plot histogram as bar plot
        plt.hist(samples, bins=20, density=True, alpha=0.7, color='blue', edgecolor='black')
        plt.title(f'Uniform Distribution\nRange = [{param1}, {param2}]')
        plt.xlabel('Value')
        plt.ylabel('Probability Density')
        
    elif distribution == 'Binomial':
        # Generate random samples from a binomial distribution
        samples = np.random.binomial(param1, param2, 1000)
        # Plot histogram as bar plot
        plt.hist(samples, bins=np.arange(0, param1+2) - 0.5, density=True, alpha=0.7, color='blue', edgecolor='black')
        plt.title(f'Binomial Distribution\nTrials = {param1}, Probability = {param2}')
        plt.xlabel('Number of Successes')
        plt.ylabel('Probability')
        
    elif distribution == 'Poisson':
        # Generate random samples from a Poisson distribution
        samples = np.random.poisson(param1, 1000)
        # Plot histogram as bar plot
        # Bin edges run from -0.5 to max+0.5 so that the largest sample gets its own bar
        plt.hist(samples, bins=np.arange(0, np.max(samples)+2) - 0.5, density=True, alpha=0.7, color='blue', edgecolor='black')
        plt.title(f'Poisson Distribution\nLambda = {param1}')
        plt.xlabel('Number of Events')
        plt.ylabel('Probability')
        
    # Show the plot
    plt.xlim(0,100)
    plt.grid(True, axis='y', alpha=0.3)
    plt.show()

# Interactive widgets to switch between distributions and adjust parameters
distribution_dropdown = widgets.Dropdown(
    options=['Normal', 'Uniform', 'Binomial', 'Poisson'],
    value='Normal',
    description='Distribution'
)

# Parameters for the different distributions
param1_slider = widgets.FloatSlider(min=0, max=10, step=0.1, value=5, description='Param1')
param2_slider = widgets.FloatSlider(min=0.1, max=5, step=0.1, value=1, description='Param2')

# Update the sliders based on the chosen distribution
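def update_parameters(distribution):
    # Show the second slider again in case a previous Poisson selection hid it
    param2_slider.layout.display = None
    if distribution == 'Normal':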
        param1_slider.description = 'Mean'
        param2_slider.description = 'Std Dev'
        param1_slider.min = -10
        param1_slider.max = 10
        param2_slider.min = 0.1
        param2_slider.max = 5
    elif distribution == 'Uniform':
        param1_slider.description = 'Min'
        param2_slider.description = 'Max'
        param1_slider.min = -10
        param1_slider.max = 10
        param2_slider.min = -10
        param2_slider.max = 10
    elif distribution == 'Binomial':
        param1_slider.description = 'Trials'
        param2_slider.description = 'Probability'
        param1_slider.min = 1
        param1_slider.max = 100
        param2_slider.min = 0.01
        param2_slider.max = 1
    elif distribution == 'Poisson':
        param1_slider.description = 'Lambda'
        param2_slider.description = 'Unused'
        param1_slider.min = 0  # Lambda must not be negative
        param1_slider.max = 10
        param2_slider.value = 0  # Not used in Poisson, but needed for the function
        param2_slider.layout.display = 'none'  # Hide the second parameter slider

# Function to handle changes in distribution dropdown
def update_distribution(change):
    update_parameters(change['new'])

# Link the update function to the dropdown change event
distribution_dropdown.observe(update_distribution, names='value')

# Display interactive plot with widgets
interact(plot_distribution, 
         distribution=distribution_dropdown,
         param1=param1_slider,
         param2=param2_slider);



### Normal distribution

The normal distribution (also called the Gaussian distribution) is the most commonly encountered distribution. It describes how likely a random quantity is to take values near to or far from its average.

The **mean** (average) defines the center of the distribution, while the **standard deviation** defines the spread or width of the curve.

Many natural phenomena are normally distributed:

* Many real-world data sets approximately follow a normal distribution, such as heights, test scores, or measurement errors.
* Machine learning often works with natural data, which makes normality a common assumption in many models.

Central Limit Theorem (CLT):

The Central Limit Theorem states that the sum or average of a large number of independent random variables with finite variance, regardless of their original distribution, will be approximately normally distributed.
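
We can check this numerically. The sketch below averages many rolls of a fair die (a uniform, clearly non-normal distribution) and compares the averages with what the theorem predicts. The seed and the sample sizes are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility

rolls_per_average = 50       # how many die rolls go into each average
number_of_averages = 10_000  # how many averages we compute

# Each row is one experiment: 50 rolls of a fair die (values 1 to 6)
rolls = rng.integers(1, 7, size=(number_of_averages, rolls_per_average))
sample_means = rolls.mean(axis=1)

# Theory: a single roll has mean 3.5 and variance 35/12,
# so an average of n rolls has standard deviation sqrt(35/12 / n)
expected_mean = 3.5
expected_std = np.sqrt(35 / 12 / rolls_per_average)

# For a normal distribution roughly two thirds of the values
# lie within one standard deviation of the mean
within_one_std = np.mean(np.abs(sample_means - expected_mean) <= expected_std)

print(f'Mean of averages: {sample_means.mean():.3f} (theory: {expected_mean})')
print(f'Std of averages:  {sample_means.std():.3f} (theory: {expected_std:.3f})')
print(f'Share within one std dev: {within_one_std:.3f}')
```

A histogram of `sample_means` would show the familiar bell shape, even though a single roll is uniformly distributed.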


Outlier Detection:

When we want to check whether our data is **correct** and we <i>know</i> that it is normally distributed, we can use the distribution to assess whether the data contains incorrect measurements.
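
One common way to do this is the z-score: how many standard deviations a value lies from the mean. The sketch below uses made-up height data with one faulty reading added on purpose; the threshold of 3 standard deviations is a rule of thumb assumed here, not a fixed rule.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility

# Hypothetical data: 200 heights in cm, plus one faulty reading of 250 cm
heights = rng.normal(loc=170, scale=8, size=200)
measurements = np.append(heights, 250.0)

mean = measurements.mean()
std_dev = measurements.std()

# z-score: distance from the mean measured in standard deviations
z_scores = (measurements - mean) / std_dev

# In a normal distribution about 99.7% of values lie within 3 standard deviations,
# so anything further away is worth a second look
suspicious = measurements[np.abs(z_scores) > 3]
print(suspicious)
```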

In [24]:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import ipywidgets as widgets
from ipywidgets import interact

# Function to calculate and plot the probability density function (PDF)
def plot_normal_distribution(mean=0, std_dev=1):
    # Generate a range of x values
    x = np.linspace(mean - 4*std_dev, mean + 4*std_dev, 1000)
    
    # Calculate the PDF using the mean and standard deviation
    y = norm.pdf(x, mean, std_dev)
    
    # Plot the distribution
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, label=f'Normal Distribution\nMean = {mean}, Std Dev = {std_dev}')
    plt.fill_between(x, y, alpha=0.2)
    
    # Highlighting the area under the curve between -1 std and +1 std
    x_fill = np.linspace(mean - std_dev, mean + std_dev, 1000)
    y_fill = norm.pdf(x_fill, mean, std_dev)
    plt.fill_between(x_fill, y_fill, color='red', alpha=0.3, label='1 Std Dev Range')
    
    # Adding labels and title
    plt.title('Probability Density Function (Normal Distribution)')
    plt.xlabel('X')
    plt.ylabel('Probability Density')
    plt.legend()
    plt.xlim(-10,10)
    plt.grid(True)
    plt.show()

# Creating interactive sliders for mean and standard deviation
interact(plot_normal_distribution, 
         mean=widgets.FloatSlider(min=-10, max=10, step=0.1, value=0, description='Mean'),
         std_dev=widgets.FloatSlider(min=0.1, max=5, step=0.1, value=1, description='Std Dev'));



### Joint, Marginal, and Conditional Probability

Joint probability is the probability that two events happen at the same time.
    
    P(A∩B) is the probability that both A and B occur.

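For example, the probability of answering a test question correctly by picking at random (let's say there are four options: A, B, C, D) is P(x) = 1/4 = 25%

The guesses are independent, so their probabilities multiply. The probability (in %) of getting two questions right at the same time is: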

In [51]:
(1/4 * 1/4)*100

6.25

The probability (in %) of getting 50 questions right at random is:

In [52]:
(1/4**50)*100

7.888609052210118e-29

Marginal probability is the probability of a single event happening, regardless of any other event.

Conditional probability is the probability of an event occurring given that another event has occurred.

    P(A∣B) is the probability of event A happening, given that B has occurred.
    
Let's say we have the sequence of symbols:

[1 2 1 2 1 3]

What is the probability that after 1, 2 will appear?
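
One way to answer this is to count: look at every position where a 1 is followed by another symbol and check how often that next symbol is a 2. A minimal sketch (the variable names are only illustrative):

```python
sequence = [1, 2, 1, 2, 1, 3]

# Pair each symbol with the one that follows it:
# (1, 2), (2, 1), (1, 2), (2, 1), (1, 3)
pairs = list(zip(sequence, sequence[1:]))

# Condition on "the current symbol is 1": keep only what follows a 1
followers_of_one = [nxt for current, nxt in pairs if current == 1]

# P(next = 2 | current = 1) = (times 2 follows 1) / (times anything follows 1)
p_two_given_one = followers_of_one.count(2) / len(followers_of_one)

print(followers_of_one)  # [2, 2, 3]
print(p_two_given_one)   # 2/3, i.e. about 0.667
```

Without the condition, the share of 2s among the five "next" symbols would be only 2/5, so knowing the previous symbol changes the probability.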

### Monty Hall problem

Given three doors, a contestant has to pick one. One of them has a car behind it, the other two hide goats.

What is the probability of selecting the right door?

1/3, because 1 of the 3 doors hides the prize.

You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.

You are then given a choice: do you want to stay or do you want to switch? What should you do?


When the player first chooses a door, there is a 2/3 chance that the car is behind a door **NOT** chosen. This probability stays the same when another door is opened.

When the wrong door was revealed, you *learned* that this 2/3 chance now lies entirely on the other door, door No. 2, which you didn't pick at the start.

**It feels counterintuitive**, because we feel that it does not matter which one we pick, as there is always a 1/3 chance we get the door right.

But the host revealed the information that behind one of the doors not chosen there is a goat. This changed the marginal probability into a conditional probability:

Initially there is:

    P(Car behind Door 1) = 1/3
    P(Car behind Door 2) = 1/3
    P(Car behind Door 3) = 1/3

But when the host opened Door 3, the probabilities changed to:

     P(Car behind Door 1|host opens Door 3) = 1/3
     P(Car behind Door 2|host opens Door 3) = 2/3
     
**New information updates probabilities**
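
The same numbers can be derived exactly, without any simulation, using Bayes' theorem (introduced later in this lecture). The sketch below assumes that the player picked door 1 and that, when the host has two goat doors to choose from, he picks one of them at random.

```python
from fractions import Fraction

# Prior: before anything is revealed the car is equally likely behind each door
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood: probability that the host opens door 3, given where the car is.
# The host never opens the player's door (door 1) or the door hiding the car.
likelihood_host_opens_3 = {
    1: Fraction(1, 2),  # car behind door 1: host picks door 2 or 3 at random (assumption)
    2: Fraction(1, 1),  # car behind door 2: host is forced to open door 3
    3: Fraction(0, 1),  # car behind door 3: host can never open it
}

# Law of total probability: P(host opens door 3)
p_host_opens_3 = sum(likelihood_host_opens_3[d] * prior[d] for d in prior)

# Bayes' theorem: posterior probability for each door
posterior = {d: likelihood_host_opens_3[d] * prior[d] / p_host_opens_3 for d in prior}

for door, p in posterior.items():
    print(f'P(car behind door {door} | host opens door 3) = {p}')
```

The posterior for door 1 stays at 1/3, while door 2 rises to 2/3.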


If you still don't believe me, let's simulate it:

In [9]:
import random

def monty_hall_simulation(trials, switch=True):
    wins = 0
    
    for _ in range(trials):
        # Randomly assign the car (1) and the goats (0) behind the doors
        doors = [0, 0, 1]
        random.shuffle(doors)
        
        # Contestant makes a random choice
        initial_choice = random.randint(0, 2)
        
        # Host opens a door that has a goat (not the contestant's choice and not the car)
        available_doors = [i for i in range(3) if i != initial_choice and doors[i] == 0]
        host_opens = random.choice(available_doors)
        
        # If the contestant switches, choose the remaining door
        if switch:
            final_choice = [i for i in range(3) if i != initial_choice and i != host_opens][0]
        else:
            final_choice = initial_choice
        
        # Count wins
        if doors[final_choice] == 1:
            wins += 1
    
    return wins / trials

In [24]:
# Parameters
trials = 100_000

# Simulate both strategies
win_rate_with_switch = monty_hall_simulation(trials, switch=True)
win_rate_without_switch = monty_hall_simulation(trials, switch=False)

print(f"Win rate when switching: {win_rate_with_switch:.2%}")
print(f"Win rate when not switching: {win_rate_without_switch:.2%}")


Win rate when switching: 66.41%
Win rate when not switching: 33.17%


## Probability theory in ML

Probability theory plays a fundamental role in machine learning by providing a framework for modeling uncertainty and making predictions based on data. It lets us predict an outcome by choosing the one that happens with the highest probability.

### Modeling Uncertainty

Machine learning often deals with uncertain outcomes, such as predicting whether an email is spam or not. Probability allows us to quantify this uncertainty. Models do not give a definite answer; instead they provide a probability score, e.g. a 70% chance that the email is spam.

### Probabilistic models

Many machine learning models are probabilistic by nature, meaning they use probability distributions to make decisions.


### Prior knowledge

Models incorporate prior knowledge (experience E) to estimate the probability of something happening. This is how they "learn" to predict certain outcomes. 

### Other uses of probability theory in machine learning

The concept of likelihood—a key probabilistic idea—is used to estimate the best parameters for a model and to optimise it.
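
As a small illustration of how likelihood picks the best parameter, the sketch below estimates the probability of heads for a coin from made-up flips. It tries many candidate values and keeps the one under which the observed data is most likely. The data and the grid of candidates are assumptions chosen for this example.

```python
import numpy as np

# Hypothetical data: 10 coin flips, 1 = heads, 0 = tails (7 heads in total)
flips = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Candidate values for the model parameter p = P(heads): 0.01, 0.02, ..., 0.99
candidate_p = np.linspace(0.01, 0.99, 99)

# Log-likelihood of the data for each candidate p, assuming independent flips
heads = flips.sum()
tails = len(flips) - heads
log_likelihood = heads * np.log(candidate_p) + tails * np.log(1 - candidate_p)

# The maximum likelihood estimate is the candidate with the highest log-likelihood
best_p = candidate_p[np.argmax(log_likelihood)]
print(f'Maximum likelihood estimate of p: {best_p:.2f}')
```

The result matches the observed share of heads (7 out of 10), which is what we would expect for this simple model.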

## Bayes' Theorem

We know that Steve is a very shy and tidy person. What is higher:

* Probability that he is a librarian?
* Probability that he is a farmer?

We know that Linda is actively engaged in the feminist movement. Which is more probable:

* Linda is a bank teller?
* Linda is a bank teller and is active in the feminist movement?
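
The question about Linda can be checked against the definition of joint probability. Whatever the actual numbers are, a joint probability can never exceed the probability of one of its events, because a conditional probability is at most 1:

$$
P(\text{teller} \cap \text{feminist}) = P(\text{teller}) \cdot P(\text{feminist} \mid \text{teller}) \le P(\text{teller})
$$

So "bank teller" alone is always at least as probable as "bank teller and active in the feminist movement", even if the description of Linda makes the second option feel more natural.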

Let's see about Steve:

We are given the information that in the population there are 10 librarians and 200 farmers. What would you say now?

And we got another piece of information:

There is a 40% probability that a librarian is shy and tidy, and a 10% probability that a farmer fits this description. What about now?

### Bayes' Theorem in practice

Bayes' Theorem helps us find the probability of an event given prior knowledge of related conditions. It is a simple yet powerful equation, used widely in machine learning. The equation below is the basis for multiple machine learning models:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Where:

* P(A∣B) is the posterior probability: the probability of event A occurring given that B is true.
* P(B∣A) is the likelihood: the probability of event B occurring given that A is true.
* P(A) is the prior probability: the initial probability of event A.
* P(B) is the marginal probability: the probability of event B occurring.

Bayes' theorem provides a way to reason about probabilities in machine learning, especially in situations where we have to deal with uncertainty.

Classifiers built on it are very quick and have a lot of predictive power. Bayes' theorem is used, for example, for:

* Filtering spam emails
* Updating model beliefs given new information
* Classification applications in general
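
To make the spam example concrete, here is a minimal sketch of a single Bayes update. All numbers are hypothetical and chosen only for illustration, as is the trigger word "prize".

```python
# All numbers below are hypothetical and chosen only for illustration
p_spam = 0.2                  # prior: share of all emails that are spam
p_word_given_spam = 0.5       # likelihood: share of spam emails containing "prize"
p_word_given_not_spam = 0.02  # share of legitimate emails containing "prize"

# Marginal probability of seeing the word, by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_not_spam * (1 - p_spam)

# Posterior: probability that an email is spam given that it contains the word
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f'P(spam | word) = {p_spam_given_word:.3f}')
```

Seeing the word raises the spam probability from the prior of 0.2 to roughly 0.86, which is the kind of belief update described in the list above.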

Going back to the example of Steve, we know that

    P(A|B) - is a librarian given that he is shy and tidy
    P(B|A) - is shy and tidy given that he is a librarian
    P(B) - is shy and tidy
    P(A) - is a librarian

    P(B|A) - 40% of librarians are shy and tidy
    P(B) - 10% of farmers are shy and tidy while 40% of librarians are shy and tidy
    P(A) - 10 out of 210 people are librarians

In [None]:
# P(B|A) - is shy and tidy given that he is a librarian

PBA = 0.4

In [61]:
#P(A) - is librarian
PA = 10/(200+10)
print(PA)

0.047619047619047616


In [62]:
#P(~A) - is farmer
PnA = 200/(200+10)
print(PnA)

0.9523809523809523


In [64]:
#P(B) - is shy (All shy people in the population)
PB = 0.4*PA + 0.1*PnA
print(PB)

0.11428571428571428


In [66]:
# P(A|B) - is a librarian given that he is shy and tidy (Bayes' theorem)

PAB = (PBA * PA)/PB
print(PAB)

0.16666666666666669


**There is only a 16.7% probability that Steve is a librarian!**

In [73]:
# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact

# Function to calculate and visualize Bayes' Theorem for the given scenario
def visualize_bayes_theorem(num_librarians=10, num_farmers=200, prob_shy_librarian=0.4, prob_shy_farmer=0.1):
    # Calculate the total number of people
    total_people = num_librarians + num_farmers
    
    # Calculate P(Librarian) and P(Farmer)
    P_librarian = num_librarians / total_people
    P_farmer = num_farmers / total_people
    
    # Calculate P(Shy|Librarian) and P(Shy|Farmer)
    P_shy_given_librarian = prob_shy_librarian
    P_shy_given_farmer = prob_shy_farmer
    
    # Calculate P(Shy) using the law of total probability
    P_shy = (P_shy_given_librarian * P_librarian) + (P_shy_given_farmer * P_farmer)
    
    # Calculate P(Librarian|Shy) using Bayes' Theorem
    P_librarian_given_shy = (P_shy_given_librarian * P_librarian) / P_shy
    
    # Plotting the results
    plt.figure(figsize=(20, 6))
    
    # Bar plot showing probabilities
    categories = ['Librarian', 'Farmer']
    base_rates = [P_librarian, P_farmer]
    shy_given_category = [P_shy_given_librarian, P_shy_given_farmer]
    posterior = [P_librarian_given_shy, 1 - P_librarian_given_shy]
    
    # Plotting base rates
    plt.subplot(1, 3, 1)
    plt.bar(categories, base_rates, color='blue', alpha=0.6, edgecolor='black')
    plt.title('Base Rates (P(Librarian), P(Farmer))')
    plt.ylim(0, 1)
    plt.ylabel('Probability')
    
    # Plotting conditional probabilities (P(Shy|Librarian) and P(Shy|Farmer))
    plt.subplot(1, 3, 2)
    plt.bar(categories, shy_given_category, color='green', alpha=0.6, edgecolor='black')
    plt.title('P(Shy|Librarian) and P(Shy|Farmer)')
    plt.ylim(0, 1)
    
    # Plotting posterior probabilities (P(Librarian|Shy) and P(Farmer|Shy))
    plt.subplot(1, 3, 3)
    plt.bar(categories, posterior, color='red', alpha=0.6, edgecolor='black')
    plt.title('Posterior Probabilities (P(Librarian|Shy), P(Farmer|Shy))')
    plt.ylim(0, 1)
    
#     plt.tight_layout(2)
    plt.show()

# Interactive sliders and inputs
interact(visualize_bayes_theorem,
         num_librarians=widgets.IntSlider(min=1, max=100, step=1, value=10, description='Num Librarians'),
         num_farmers=widgets.IntSlider(min=1, max=500, step=10, value=200, description='Num Farmers'),
         prob_shy_librarian=widgets.FloatSlider(min=0.0, max=1.0, step=0.01, value=0.4, description='P(Shy|Librarian)'),
         prob_shy_farmer=widgets.FloatSlider(min=0.0, max=1.0, step=0.01, value=0.1, description='P(Shy|Farmer)'));

