# Lecture 5: Practical approach to using machine learning with Python


## Let's focus on the practical use of machine learning tools with Python (and without)

<b>First, we will see how to understand probability through simple calculations</b>

### Probability of an event based on the available data

We are given data in which some numbers were drawn from a basket. The drawing results are as follows:

In [2]:
import numpy as np
drawing = np.random.randint(1, 10, 100)  # 100 integers from 1 to 9 (the upper bound 10 is exclusive)
drawing

array([9, 3, 7, 4, 3, 8, 2, 5, 7, 8, 3, 4, 9, 2, 1, 6, 8, 6, 8, 5, 4, 7,
       3, 1, 3, 5, 7, 9, 8, 6, 3, 6, 7, 8, 9, 6, 3, 1, 4, 3, 8, 8, 4, 7,
       2, 3, 4, 9, 8, 3, 8, 4, 2, 6, 1, 4, 1, 5, 7, 6, 9, 4, 6, 1, 1, 4,
       2, 9, 7, 7, 7, 3, 6, 5, 2, 2, 7, 2, 9, 4, 8, 5, 3, 3, 7, 3, 4, 5,
       9, 1, 1, 1, 4, 5, 6, 7, 2, 7, 9, 3])

Let's calculate the probability of drawing 9:
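A minimal sketch of the manual calculation: count how often 9 appears and divide by the number of draws. It assumes a seeded generator as a stand-in for the notebook's `drawing` array, so the numbers will not match the output shown above.

```python
import numpy as np

# Assumption: a seeded generator replaces the unseeded call in the notebook,
# so this sketch is reproducible but uses different numbers.
rng = np.random.default_rng(seed=0)
drawing = rng.integers(1, 10, size=100)  # integers 1..9 (upper bound excluded)

# Count the draws equal to 9, then divide by the total number of draws.
count_nine = 0
for number in drawing:
    if number == 9:
        count_nine += 1

probability_nine = count_nine / len(drawing)
print(f"9 was drawn {count_nine} times out of {len(drawing)}")
print(f"Estimated probability of drawing 9: {probability_nine:.2f}")
```

With nine equally likely numbers, the estimate should land near 1/9, but a sample of 100 draws will rarely hit it exactly.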

We can do it more simply with NumPy:
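One possible NumPy version, again assuming a seeded stand-in for `drawing`: comparing the array with 9 gives a boolean array, and its mean is the fraction of `True` values. `np.unique` extends the idea to every number at once.

```python
import numpy as np

# Assumption: seeded stand-in for the notebook's `drawing` array.
rng = np.random.default_rng(seed=0)
drawing = rng.integers(1, 10, size=100)

# The mean of a boolean array is the fraction of True entries.
probability_nine = np.mean(drawing == 9)
print(f"Estimated probability of drawing 9: {probability_nine:.2f}")

# Relative frequency of every distinct value in one call.
values, counts = np.unique(drawing, return_counts=True)
probabilities = counts / drawing.size
for value, probability in zip(values, probabilities):
    print(f"{value}: {probability:.2f}")
```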

Now let's look at a more complex example using pandas.
We have lottery drawings from the past few weeks:

In [7]:
import pandas as pd
import random

In [26]:
num_drawings = 100
lottery_drawings = []

for _ in range(num_drawings):
    drawing = sorted(random.sample(range(1, 50), 6))  # Draw 6 unique numbers
    lottery_drawings.append(drawing)
df = pd.DataFrame(lottery_drawings, columns=[f"Number_{i+1}" for i in range(6)])
df

Unnamed: 0,Number_1,Number_2,Number_3,Number_4,Number_5,Number_6
0,2,3,9,10,16,25
1,6,26,29,30,34,49
2,4,15,19,21,24,38
3,22,23,25,27,42,45
4,6,7,14,38,43,48
...,...,...,...,...,...,...
95,9,20,31,32,37,47
96,1,8,11,20,41,47
97,1,2,10,29,30,41
98,1,8,22,23,24,26


Let's calculate the probability of drawing each number and visualize it:

In [27]:
# calculate

In [28]:
# import matplotlib

In [30]:
# plot here
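The three placeholder cells above could be filled in along these lines. This is a sketch, not the lecture's own solution: it regenerates the drawings with a fixed seed (an assumption, so the table will differ from the one shown) and defines the probability of a number as the share of drawings in which it appears.

```python
import random

import matplotlib.pyplot as plt
import pandas as pd

# Assumption: fixed seed so the sketch is reproducible.
random.seed(0)
num_drawings = 100
lottery_drawings = [sorted(random.sample(range(1, 50), 6)) for _ in range(num_drawings)]
df = pd.DataFrame(lottery_drawings, columns=[f"Number_{i+1}" for i in range(6)])

# Flatten all six columns into one Series and count how often each number appears.
all_numbers = pd.Series(df.to_numpy().ravel())
counts = all_numbers.value_counts().reindex(list(range(1, 50)), fill_value=0)

# A number appears at most once per drawing, so count / num_drawings is the
# fraction of drawings that contained it.
probabilities = counts / num_drawings

fig, ax = plt.subplots(figsize=(12, 4))
ax.bar(probabilities.index, probabilities.values)
ax.axhline(6 / 49, color="red", linestyle="--", label="theoretical 6/49")
ax.set_xlabel("Number")
ax.set_ylabel("Share of drawings containing the number")
ax.legend()
plt.show()
```

The dashed line marks 6/49, the chance that any given number is among the six drawn; the bars should scatter around it.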

### Let's use Python to make a simulation of some event

This way, we can simulate even very complex situations with Python and estimate the probability of their outcomes.

Let's start with coin toss:
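A minimal coin-toss simulation, assuming a fair coin encoded as 0 (tails) and 1 (heads) and a fixed seed. It also tracks the running estimate to show how it settles as the number of tosses grows.

```python
import numpy as np

# Assumption: fair coin, fixed seed for reproducibility.
rng = np.random.default_rng(seed=42)
n_tosses = 10_000
tosses = rng.integers(0, 2, size=n_tosses)  # 0 = tails, 1 = heads

# The mean of the 0/1 outcomes is the observed fraction of heads.
p_heads = tosses.mean()
print(f"Estimated P(heads) after {n_tosses} tosses: {p_heads:.4f}")

# Running estimate after each toss: cumulative heads / tosses so far.
running_estimate = np.cumsum(tosses) / np.arange(1, n_tosses + 1)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} tosses: {running_estimate[n - 1]:.4f}")
```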

Now, let's calculate the probability of winning in roulette while always betting on black.

There are 37 slots, 18 of which are black.
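With 18 black slots out of 37, the exact probability is 18/37 ≈ 0.486. The sketch below compares that value with a simulated estimate. Labelling slots 0 to 17 as black is an arbitrary assumption made only for counting; it does not reflect the real wheel layout.

```python
import numpy as np

n_slots = 37
n_black = 18

# Exact probability of landing on black.
p_black_exact = n_black / n_slots
print(f"Exact probability:     {p_black_exact:.4f}")

# Assumption: fixed seed; slots are labelled 0..36 and 0..17 count as black.
rng = np.random.default_rng(seed=1)
n_spins = 100_000
spins = rng.integers(0, n_slots, size=n_spins)
wins = spins < n_black

p_black_simulated = wins.mean()
print(f"Simulated probability: {p_black_simulated:.4f}")
```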

Let's simulate our winnings over time:
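One way to sketch this, assuming a flat bet of 1 unit per spin and an even-money payout (win +1, lose -1). Both are assumptions, since the lecture does not fix a stake.

```python
import numpy as np

# Assumptions: fixed seed, flat 1-unit bet on black, even-money payout.
rng = np.random.default_rng(seed=2)
n_spins = 1_000
bet = 1
p_black = 18 / 37

# Each spin wins with probability 18/37 and loses otherwise.
wins = rng.random(n_spins) < p_black
payoffs = np.where(wins, bet, -bet)

# Running balance after each spin.
balance = np.cumsum(payoffs)

# Expected result per spin: +1 with prob. 18/37, -1 with prob. 19/37.
expected_per_spin = bet * p_black - bet * (1 - p_black)
print(f"Expected result per spin: {expected_per_spin:.4f}")
for n in (10, 100, 1_000):
    print(f"balance after {n:>5} spins: {balance[n - 1]}")
```

The expected result per spin is -1/37, so the balance drifts downward in the long run even though short winning streaks occur. Passing `balance` to `matplotlib.pyplot.plot` would show the trajectory.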

## I hope that using Python to calculate probabilities is clear now. 

### Let's move on to linear regression with the scikit-learn library in Python

First, we import the library.

Let's define some example data:

In [33]:
data = {'X': [1, 2, 3, 4, 5], 'Y': [1.5, 1.7, 3.1, 4.5, 5.2]}
df = pd.DataFrame(data)

Now we can fit the model:
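A minimal sketch of the fit with scikit-learn's `LinearRegression`, using the example data defined above. scikit-learn expects the features as a 2-D table, hence `df[['X']]` rather than `df['X']`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [1.5, 1.7, 3.1, 4.5, 5.2]})

# Features must be 2-D (a DataFrame); the target can be 1-D (a Series).
X = df[['X']]
y = df['Y']

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
print(f"Y = {slope:.2f} * X + {intercept:.2f}")  # Y = 1.02 * X + 0.14
```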

Let's plot the model
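A possible plot, assuming matplotlib is installed: the data points as a scatter plot and the fitted line drawn through the model's predictions. The data and the fit are repeated so the cell runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [1.5, 1.7, 3.1, 4.5, 5.2]})
model = LinearRegression().fit(df[['X']], df['Y'])

# Predictions at the observed X values trace out the fitted line.
fitted = model.predict(df[['X']])

fig, ax = plt.subplots()
ax.scatter(df['X'], df['Y'], label='data')
ax.plot(df['X'], fitted, color='red', label='fitted line')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.legend()
plt.show()
```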

Let's make predictions using the model!
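A short sketch of predicting for unseen inputs. The new values 6 and 7 are chosen only for illustration; they are passed as a DataFrame with the same column name that was used for fitting.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [1.5, 1.7, 3.1, 4.5, 5.2]})
model = LinearRegression().fit(df[['X']], df['Y'])

# New inputs (illustrative values) in the same tabular shape as the training features.
new_X = pd.DataFrame({'X': [6, 7]})
predictions = model.predict(new_X)

for x_value, y_value in zip(new_X['X'], predictions):
    print(f"X = {x_value} -> predicted Y = {y_value:.2f}")
```

Both inputs lie outside the range of the training data, so these are extrapolations and should be treated with more caution than predictions between 1 and 5.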

We can also assess our model's performance using one of the standard metrics:
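Two common choices from `sklearn.metrics` are the mean squared error (lower is better) and the coefficient of determination R² (closer to 1 is better). The sketch below computes both on the same data the model was fitted on, which tends to flatter the model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

df = pd.DataFrame({'X': [1, 2, 3, 4, 5], 'Y': [1.5, 1.7, 3.1, 4.5, 5.2]})
model = LinearRegression().fit(df[['X']], df['Y'])
y_pred = model.predict(df[['X']])

# Mean squared error: average of the squared residuals.
mse = mean_squared_error(df['Y'], y_pred)

# R^2: share of the variance in Y explained by the model.
r2 = r2_score(df['Y'], y_pred)

print(f"MSE: {mse:.4f}")
print(f"R^2: {r2:.4f}")
```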

### Now let's build a slightly more complex model: multiple linear regression

In [34]:
# Sample dataset
data = {
    'Bedrooms': [2, 3, 3, 4, 5, 3, 4, 4, 5, 2],
    'SquareFootage': [850, 900, 1200, 1500, 2000, 1100, 1600, 1800, 2100, 800],
    'Age': [10, 5, 20, 15, 8, 30, 5, 12, 3, 25],
    'Price': [150000, 200000, 230000, 250000, 300000, 220000, 275000, 280000, 320000, 140000]
}

# Convert to DataFrame
df = pd.DataFrame(data)
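Fitting a multiple linear regression uses the same `LinearRegression` class; only the feature table has more columns. A sketch on the sample dataset above, fitted on all ten rows:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = {
    'Bedrooms': [2, 3, 3, 4, 5, 3, 4, 4, 5, 2],
    'SquareFootage': [850, 900, 1200, 1500, 2000, 1100, 1600, 1800, 2100, 800],
    'Age': [10, 5, 20, 15, 8, 30, 5, 12, 3, 25],
    'Price': [150000, 200000, 230000, 250000, 300000, 220000, 275000, 280000, 320000, 140000]
}
df = pd.DataFrame(data)

# Three feature columns, one target column.
X = df[['Bedrooms', 'SquareFootage', 'Age']]
y = df['Price']

model = LinearRegression().fit(X, y)

# One coefficient per feature, plus the intercept.
for name, coefficient in zip(X.columns, model.coef_):
    print(f"{name}: {coefficient:.2f}")
print(f"Intercept: {model.intercept_:.2f}")
print(f"R^2 on the training data: {model.score(X, y):.3f}")
```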

In practice, however, we should split our data into <b>training and testing</b> sets.

Why is that?
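A model scored on the same rows it was trained on has already seen the answers, so the score says little about how it handles new data. Holding some rows back gives a more honest estimate. A sketch with `train_test_split`, where the 80/20 ratio and `random_state=42` are assumptions chosen for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

data = {
    'Bedrooms': [2, 3, 3, 4, 5, 3, 4, 4, 5, 2],
    'SquareFootage': [850, 900, 1200, 1500, 2000, 1100, 1600, 1800, 2100, 800],
    'Age': [10, 5, 20, 15, 8, 30, 5, 12, 3, 25],
    'Price': [150000, 200000, 230000, 250000, 300000, 220000, 275000, 280000, 320000, 140000]
}
df = pd.DataFrame(data)

X = df[['Bedrooms', 'SquareFootage', 'Age']]
y = df['Price']

# Assumption: hold out 20% of the rows; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training rows: {len(X_train)}, test rows: {len(X_test)}")
```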

Using the held-out test data, we can now evaluate our model more reliably.
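A sketch of that evaluation: fit on the training rows only, then measure the error on the test rows. The split settings are illustrative assumptions. With just two test rows the numbers are very noisy, so read them as a demonstration of the workflow rather than a verdict on the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

data = {
    'Bedrooms': [2, 3, 3, 4, 5, 3, 4, 4, 5, 2],
    'SquareFootage': [850, 900, 1200, 1500, 2000, 1100, 1600, 1800, 2100, 800],
    'Age': [10, 5, 20, 15, 8, 30, 5, 12, 3, 25],
    'Price': [150000, 200000, 230000, 250000, 300000, 220000, 275000, 280000, 320000, 140000]
}
df = pd.DataFrame(data)
X = df[['Bedrooms', 'SquareFootage', 'Age']]
y = df['Price']

# Assumption: 80/20 split with a fixed random_state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only; the test rows stay unseen until evaluation.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Mean absolute error on the test set: {mae:,.0f}")
print(f"Root mean squared error on the test set: {rmse:,.0f}")
```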

### Let's do an even more complex example with a larger dataset of California housing prices

In [35]:
import pandas as pd
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset
california = fetch_california_housing()
# Convert to DataFrame for easier handling
df = pd.DataFrame(california.data, columns=california.feature_names)
df['MedHouseVal'] = california.target  # Target variable (median house value)


Let's explore the dataset first
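A first look usually covers the size of the table, a few sample rows, column types, and missing values. The helper below is a sketch; `summarize_dataset` is a hypothetical name, and it takes the DataFrame as an argument so it works on any table.

```python
import pandas as pd


def summarize_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Print the shape and first rows, and return a per-column overview.

    Hypothetical helper for a first inspection of a DataFrame.
    """
    print(f"Rows: {df.shape[0]}, columns: {df.shape[1]}")
    print(df.head())

    # One row per column: data type, missing values, and value range.
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "min": df.min(numeric_only=True),
        "max": df.max(numeric_only=True),
    })
```

In the notebook, `summarize_dataset(df)` would be called on the `df` loaded above; `df.info()` and `df.describe()` give similar overviews.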

First, we have to define which column is the target and which are the features.

Then we split the data.
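These two steps can be sketched together. `split_features_target` is a hypothetical helper; it assumes the target column is named `MedHouseVal`, as in the loading cell, and the 80/20 split with `random_state=42` is an illustrative choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_features_target(df, target_col="MedHouseVal", test_size=0.2, random_state=42):
    """Separate features from the target and split both into train and test parts.

    Hypothetical helper; the default arguments are assumptions for illustration.
    Returns X_train, X_test, y_train, y_test.
    """
    X = df.drop(columns=[target_col])  # every column except the target is a feature
    y = df[target_col]                 # the column we want to predict
    return train_test_split(X, y, test_size=test_size, random_state=random_state)
```

In the notebook this would be used as `X_train, X_test, y_train, y_test = split_features_target(df)`.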

Next, we can analyse the data to understand it better, using statistical measures to describe it.
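Two simple measures are per-column summary statistics and the correlation of each feature with the target. The sketch below wraps both in a hypothetical helper, `describe_and_correlate`, assuming all columns are numeric and the target is named `MedHouseVal`.

```python
import pandas as pd


def describe_and_correlate(df, target_col="MedHouseVal"):
    """Return summary statistics and feature-target correlations.

    Hypothetical helper; assumes every column of `df` is numeric.
    """
    # Mean, spread, and range of every column (one row per column).
    stats = df.describe().T[["mean", "std", "min", "50%", "max"]]

    # Pearson correlation of each feature with the target, strongest first.
    correlations = (
        df.corr()[target_col]
        .drop(target_col)
        .sort_values(key=abs, ascending=False)
    )
    return stats, correlations
```

Correlation only captures linear relationships between one feature and the target, so a low value does not prove that a feature is useless.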

Finally, we develop, train and evaluate our model:
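A sketch of this final step using the `LinearRegression`, `mean_squared_error`, and `r2_score` imports from the loading cell. `train_and_evaluate` is a hypothetical helper that expects the four parts of a train/test split.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def train_and_evaluate(X_train, X_test, y_train, y_test):
    """Fit a linear regression on the training part and score it on the test part.

    Hypothetical helper; returns the fitted model and a dict of test metrics.
    """
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predictions for rows the model has not seen during training.
    y_pred = model.predict(X_test)

    metrics = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, metrics
```

In the notebook, `model, metrics = train_and_evaluate(X_train, X_test, y_train, y_test)` would run it on the split California data.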

In tasks like this, we should also try to <b>interpret</b> our results. This is very important!
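For a linear model, one starting point is the coefficients: each one is the predicted change in the target when that feature increases by one unit and the others stay fixed. The helper below is a sketch with a hypothetical name, `coefficient_table`, that pairs each coefficient with its feature name.

```python
import pandas as pd


def coefficient_table(model, feature_names):
    """Return the fitted coefficients as a Series, largest magnitude first.

    Hypothetical helper; `model` is assumed to be a fitted linear model
    exposing `coef_` (for example scikit-learn's LinearRegression).
    """
    coefficients = pd.Series(model.coef_, index=list(feature_names), name="coefficient")
    return coefficients.sort_values(key=abs, ascending=False)
```

Features measured in different units are not directly comparable by coefficient size: a small coefficient on a feature with a large range can matter more than a large coefficient on a feature with a narrow range.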

## Can I use machine learning without knowing Python?

<b>Yes!</b> Although they are less flexible, there are many *no-code* platforms where you can use ML algorithms without knowing any programming language.

These include:

AutoML Platforms

    Google AutoML, DataRobot, and Microsoft Azure ML offer automated machine learning. You simply upload data, choose the task (like prediction or classification), and let the platform handle the complex model-building process.
    
No-Code/Low-Code ML Platforms

    Teachable Machine by Google and Lobe enable users to train image or audio recognition models without writing code. Just upload examples, and the tool manages the rest.
    
ML in Business Intelligence Tools

    Microsoft Power BI and Tableau now integrate ML for tasks like forecasting and anomaly detection. Users can apply ML insights directly to data within the BI platform through intuitive menu options and settings.
    
Graphical ML Software

    Tools like KNIME, RapidMiner, and Orange use drag-and-drop interfaces where each component represents an ML task (data input, model training, etc.). Connect components to design workflows, explore data, and build models visually.
    
    
<b>However, it is important to understand how machine learning works, and this course aims to give you that knowledge.</b> Without it, using no-code platforms is just blind experimentation.