## Introduction

### 1. Introduction to ML

A user wants to find the right price to sell a car.

![image.png](attachment:4b979386-bda9-440b-b8cf-87d5e972119c.png) 

How can we help the user select the best price? We know a lot about a car, such as its year, manufacturer, and mileage.

![image.png](attachment:7861ea59-be8f-40c0-83e3-b1fc155f59bd.png)

Given this information, experts have determined prices in the past.

![image.png](attachment:b5257db7-2d63-460b-8f00-2772cc42d35f.png)

And **if experts can determine this price, so can models**.

![image.png](attachment:cb0cf243-87cb-4283-998c-d14406073952.png)

The ML process takes features (what we know about the cars) and targets (what we want to predict) to derive a model.

![image.png](attachment:674dd5db-c870-46a9-9f47-a22754cdfbc3.png)

This model can then be used to predict prices for cars whose price we don't know.

![image.png](attachment:df87d49e-a710-4b36-bf76-6c65c78df658.png)

In summary, the input to the ML process is the features and the target variable, and the output is a model that can predict the target from input features.

![image.png](attachment:bc45e439-8118-4661-8e55-8ff1d5876b44.png)
![image.png](attachment:f7c09045-c40e-4ea5-ba07-1c68b40a95db.png)

### 2. Machine Learning vs. Classical Programming

Suppose we want a product that classifies messages as either SPAM or NOT SPAM.

![image.png](attachment:f821ce80-bcfb-44e0-99d1-2208a434d674.png)

So we analyze emails and come up with a couple of rules that determine if a message is SPAM, such as:

- If sender = promotions@online.com.
- If title contains "tax review" and sender contains "online".

So, we write a simple program that uses these rules to classify our emails.

![image.png](attachment:b07e1f1b-43e5-42e8-91d6-60c238a55e67.png)
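
A minimal sketch of such a rule-based program, using only the two rules listed above. The function name `is_spam` and the lowercasing of inputs are assumptions made for illustration; the actual program in the image may differ.

```python
def is_spam(sender: str, title: str) -> bool:
    """Classify an email with two hand-written rules (illustrative only)."""
    # Assumption: compare case-insensitively
    sender = sender.lower()
    title = title.lower()

    # Rule 1: exact sender match
    if sender == "promotions@online.com":
        return True
    # Rule 2: suspicious title combined with a suspicious sender
    if "tax review" in title and "online" in sender:
        return True
    return False


# Hypothetical example emails
print(is_spam("promotions@online.com", "Big sale"))          # True
print(is_spam("refunds@online-gov.org", "Your tax review"))  # True
print(is_spam("friend@example.com", "Lunch tomorrow?"))      # False
```

Every new kind of spam means another `if` branch, which is the maintenance problem described in this section.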

Now, we get a new type of email:

![image.png](attachment:af810837-06b8-4974-a3f1-258a181786e4.png)

So, we add another rule to our code.

![image.png](attachment:060d25c3-87eb-46d6-a543-1621807ec5ca.png)

But we don't want to misclassify food emails, such as:

![image.png](attachment:ed7c9db9-a74c-4381-8adc-affb4bf8ec71.png)

We add another rule... and you start to see the pattern. These rules can be hard to find, some may not work, and maintaining the program takes a lot of effort.

So, what can we do with Machine Learning? We gather data that we can use as features and targets.

![image.png](attachment:2824bfcb-b249-47c9-b4df-64553e0f7170.png)

But how do we obtain the features? They can come from the rules we already have. However, we do not decide how much each rule affects the prediction; the ML system determines that.

![image.png](attachment:39576df0-ece6-40c2-a39c-735ad15ad9a9.png)

This way, we will have a set of features and a target for each of our emails.

![image.png](attachment:cfd2dd38-70b6-49ef-8f13-7187587e31a1.png)

This is how we obtain the dataset (one row per email) that we can use to train a model.

![image.png](attachment:a97a49b7-3df1-4c49-9cf5-c09aa1cb82d1.png)
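
The idea of turning rules into features can be sketched as follows. Each rule becomes a 0/1 column, and each email becomes one row. The helper `extract_features`, the choice of three features, and the example emails are all hypothetical.

```python
import numpy as np


def extract_features(sender: str, title: str) -> list:
    """Turn one email into binary features derived from the earlier rules."""
    sender = sender.lower()
    title = title.lower()
    return [
        int(sender == "promotions@online.com"),  # feature 0
        int("tax review" in title),              # feature 1
        int("online" in sender),                 # feature 2
    ]


# Hypothetical labeled emails: (sender, title, target) with 1 = SPAM
emails = [
    ("promotions@online.com", "Big sale", 1),
    ("refunds@online-gov.org", "Your tax review", 1),
    ("friend@example.com", "Lunch tomorrow?", 0),
]

X = np.array([extract_features(s, t) for s, t, _ in emails])  # feature matrix
y = np.array([label for _, _, label in emails])               # target vector

print(X)
# [[1 0 1]
#  [0 1 1]
#  [0 0 0]]
print(y)  # [1 1 0]
```

Unlike the rule-based program, nothing here says how much each feature matters; that is left to the model during training.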

We can later use this model to classify our emails (get the predicted targets).

![image.png](attachment:6c842a94-cfb8-4d8a-b6b5-3104e37c0cd9.png)

![image.png](attachment:2be74182-b986-437a-ad99-afcd4586b3dc.png)

The difference between ML and classical programming can be summarized in the following image.

![image.png](attachment:d434d9d4-4301-48a2-a4ae-f36dfa8f9260.png)

### 3. Supervised Machine Learning

In supervised machine learning we have a **target variable** that we want to predict using certain **features**. The model is given by

$g(X) \approx y$,

where $X$ is the feature matrix, $y$ is the target variable, and $g$ is the model function.

The process of obtaining $g$ from our features and target is called "training": we collect many examples of $X$ and $y$ and find the $g$ for which $g(X) \approx y$.
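
As a rough illustration of training, the sketch below assumes that $g$ is a linear function and fits it with least squares via `np.linalg.lstsq`. Both the linear form and the toy car data are assumptions for this example only; the notes do not prescribe a particular model here.

```python
import numpy as np

# Hypothetical toy data: columns are [age in years, mileage in 10,000 km]
X = np.array([
    [1.0, 1.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [8.0, 12.0],
    [10.0, 15.0],
])
y = np.array([27.2, 22.4, 16.8, 8.3, 2.4])  # price in thousands (made up)

# "Training": find weights w that minimize ||X_b @ w - y||^2
X_b = np.column_stack([np.ones(len(X)), X])  # prepend a bias column of ones
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)


def g(X_new):
    """The trained model: maps features to predicted targets."""
    X_new = np.atleast_2d(X_new)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ w


print(g(X).round(1))           # predictions on the training examples, close to y
print(g([4.0, 5.0]).round(1))  # prediction for a car the model has not seen
```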

Within supervised learning applications we have:

1. Classification: target variable is a category
    1. Multiclass (ex. cat, dog, car)
    1. Binary (ex. SPAM/NOT SPAM)
    ![image.png](attachment:3b887978-717c-40e3-b264-4c25c4090584.png)
1. Regression: target is a continuous number (ex. price of a car) ![image.png](attachment:cfa2966d-4ed4-42b7-9519-3d87e257a253.png)
1. Ranking: used in recommender systems (ex. product recommendations) ![image.png](attachment:4d91777b-0adb-45b8-a523-bac7d49b2430.png)

### 4. CRISP-DM ML Process

Six steps to develop an ML project.

![image.png](attachment:8f2d1a8c-0742-4aea-a075-c4bba4d8dc13.png)

- Business understanding: do we actually need ML? We need to understand the business problem (and to what extent it is a problem) and how to solve it. We also need a measurable goal (ex. reduce the amount of spam messages by 50%).
- Data understanding: identify data sources (ex. we have a "report spam" button). Is the data reliable? Is it good enough? Is it large enough? Do we need more? How are we tracking it? This step may influence the goal.
- Data preparation: transform the data so it can be put into an ML model (feature extraction).

    ![image.png](attachment:e92941d9-0c80-4c0b-a16b-6678bd26bc0c.png)
    ![image.png](attachment:06f36729-6145-4c3c-bff6-686f813c1319.png)

- Modeling: try different models and select the best one (logistic regression, decision trees, neural networks). Sometimes after modeling we may go back to data preparation to add new features or fix data.
- Evaluation: Is the model good enough? Have we reached the goal? Do our metrics improve? After this, we can go back and adjust the goal, roll the model to more users, or stop working on it.
- Deployment: often evaluation and deployment happen together. We usually test models on real users (*online evaluation*). In this step we care about monitoring, ensuring quality, maintainability, etc.

Now, we **iterate**! That's why it's a cycle: ML projects require many iterations.

### 5. Model Selection

Which model should we choose? We typically split the data into training, validation, and testing datasets.
- Training: (heuristically 60% of the data) used to train the model.
- Validation: (heuristically 20% of the data) data that the model has not seen during training, which we use to compare models and pick the one that best generalizes the patterns in the dataset.
- Testing: (heuristically 20% of the data) the final held-out data, used to make sure the chosen model didn't just get lucky on the validation dataset. It also guards against human overfitting (when we change model hyperparameters after observing performance on the validation dataset).

![image.png](attachment:5373268a-6c46-401c-9688-60be90616cd4.png)

**Note:** To avoid throwing away the validation data, you can combine it with the training data after model selection and train a final model on both.
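
A 60/20/20 split can be sketched with NumPy by shuffling the row indices and slicing them. The dataset below is a hypothetical placeholder, and the seed is arbitrary; it only makes the shuffle reproducible.

```python
import numpy as np

n = 10                              # hypothetical number of examples
X = np.arange(n * 2).reshape(n, 2)  # placeholder feature matrix
y = np.arange(n)                    # placeholder target

# Shuffle the row indices so each split is a random sample
rng = np.random.default_rng(seed=2)
idx = rng.permutation(n)

n_val = int(n * 0.2)
n_test = int(n * 0.2)
n_train = n - n_val - n_test

train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_val), len(X_test))  # 6 2 2
```

Computing `n_train` as the remainder ensures that the three parts always add up to `n`, even when 20% of `n` is not a whole number.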

### 6. Intro to NumPy

In [None]:
import numpy as np

In [None]:
# ARRAYS
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [None]:
np.full(10, 2.5)

array([2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5])

In [None]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [None]:
a[2]

3

In [None]:
a[2] = 4
a

array([1, 2, 4])

In [None]:
np.arange(3, 10) # (inclusive, exclusive)

array([3, 4, 5, 6, 7, 8, 9])

In [None]:
np.linspace(0,100,11)

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In [None]:
# MULTIDIMENSION ARRAY
np.zeros((5, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [None]:
n = np.array(
    [
        [1,2,3],
        [4,5,6]
    ]
)
n

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
n[0,1]

2

In [None]:
n[1] # row 1

array([4, 5, 6])

In [None]:
n[:,1] # column 1

array([2, 5])

In [None]:
# RANDOMLY GENERATE ARRAY
np.random.rand(5,2) # standard uniform distribution

array([[0.16117142, 0.05990175],
       [0.02912512, 0.62509369],
       [0.86578782, 0.34314786],
       [0.9183946 , 0.18985838],
       [0.49808458, 0.70632803]])

In [None]:
np.random.seed(2) # make results reproducible
np.random.rand(5,2)

array([[0.4359949 , 0.02592623],
       [0.54966248, 0.43532239],
       [0.4203678 , 0.33033482],
       [0.20464863, 0.61927097],
       [0.29965467, 0.26682728]])

In [None]:
np.random.randint(low = 0, high = 100, size = (5,2))

array([[37, 39],
       [67,  4],
       [42, 51],
       [38, 33],
       [58, 67]])

In [None]:
# ELEMENT WISE OPERATION
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [None]:
a + 2 # Broadcasting: the scalar 2 is stretched to match the shape of a

array([2, 3, 4, 5, 6])

In [None]:
a * 2

array([0, 2, 4, 6, 8])

In [None]:
b = (10 + a * 2)**2/100
a + b

array([1.  , 2.44, 3.96, 5.56, 7.24])
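
Broadcasting also works between arrays of different shapes, not only between an array and a scalar. The sketch below uses illustrative arrays `m`, `row`, and `col` that are not part of the original notes.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])         # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

# The row is repeated for every row of m
print(m + row)
# [[11 22 33]
#  [14 25 36]]

# The column is repeated for every column of m
print(m + col)
# [[101 102 103]
#  [204 205 206]]
```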

In [None]:
# COMPARISON
a >= 2

array([False, False,  True,  True,  True])

In [None]:
a > b

array([False, False,  True,  True,  True])

In [None]:
a[a > b]

array([2, 3, 4])

In [None]:
# SUMMARIZING OPERATIONS
a.mean()

2.0

In [None]:
a.std()

1.4142135623730951
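
For multidimensional arrays, summarizing operations accept an `axis` argument. A short sketch, reusing the same values as the 2-D array `n` defined earlier:

```python
import numpy as np

n = np.array([[1, 2, 3],
              [4, 5, 6]])

print(n.sum())          # 21, over all elements
print(n.sum(axis=0))    # [5 7 9], one value per column
print(n.mean(axis=1))   # [2. 5.], one value per row
print(n.max(axis=0))    # [4 5 6], column-wise maximum
```

`axis=0` collapses the rows (giving one result per column), and `axis=1` collapses the columns (giving one result per row).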

### 7. Linear Algebra Basics

In [None]:
v = np.arange(2,7)
v

array([2, 3, 4, 5, 6])

In [None]:
# Multiplication by a scalar (elementwise)
2 * v

array([ 4,  6,  8, 10, 12])

In [None]:
u = np.full(5, 2)
u

array([2, 2, 2, 2, 2])

In [None]:
# Sum vectors (elementwise)
u + v

array([4, 5, 6, 7, 8])

In [None]:
# Vector-vector multiplication (dot product): vectors need to be the same size
np.dot(u,v)

40

In [None]:
# What is happening in the dot product?
def vector_mult(u,v):
    assert u.shape[0] == v.shape[0]

    n = u.shape[0]

    result = 0.0

    for i in range(n):
        result = result + u[i]*v[i]

    return result

In [None]:
vector_mult(u,v)

40.0

In [None]:
# Matrix-vector multiplication:
a = np.array(
    [
        [1,1,1,1,1],
        [2,2,2,2,2]
    ]
)

In [None]:
np.matmul(a, u)

array([10, 20])

In [None]:
# What is happening in multiplication?
def matrix_vec_mult(a, v):
    assert a.shape[1] == v.shape[0]

    num_rows = a.shape[0]

    result = np.zeros(num_rows)

    for i in range(num_rows):
        result[i] = vector_mult(a[i],v)

    return result

In [None]:
matrix_vec_mult(a, u)

array([10., 20.])

In [None]:
# Matrix-matrix multiplication
a = np.array(
    [
        [1,1,1,1,1],
        [2,2,2,2,2]
    ]
)
b = np.ones((5,2))
b

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [None]:
ab = np.matmul(a, b)
ab

array([[ 5.,  5.],
       [10., 10.]])

In [None]:
def matrix_mult(a,b):
    assert a.shape[1] == b.shape[0]

    num_rows = a.shape[0]
    num_cols = b.shape[1]

    result = np.zeros((num_rows, num_cols))

    for i in range(num_cols):
        bi = b[:, i]
        abi = matrix_vec_mult(a, bi)
        result[:, i] = abi

    return result

In [None]:
matrix_mult(a,b)

array([[ 5.,  5.],
       [10., 10.]])
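
The three kinds of multiplication above can also be written with Python's `@` operator, which gives the same results as `np.matmul` for these 1-D and 2-D arrays. The sketch redefines `a`, `b`, and `u` with the same values as before so that it runs on its own.

```python
import numpy as np

a = np.array([[1, 1, 1, 1, 1],
              [2, 2, 2, 2, 2]])   # shape (2, 5)
b = np.ones((5, 2))               # shape (5, 2)
u = np.full(5, 2)                 # shape (5,)

ab = a @ b      # matrix-matrix: (2, 5) @ (5, 2) -> (2, 2)
au = a @ u      # matrix-vector: (2, 5) @ (5,)   -> (2,)
uu = u @ u      # vector-vector (dot product)    -> scalar

print(ab.shape, au, uu)   # (2, 2) [10 20] 20

# The inner dimensions must match: (5, 2) @ (5, 2) is not defined
try:
    b @ b
except ValueError as err:
    print("shape mismatch:", err)
```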

In [None]:
# SPECIAL MATRICES:
# Identity Matrix:
iden = np.eye(2)
iden

array([[1., 0.],
       [0., 1.]])

In [None]:
np.matmul(b, iden)

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [None]:
b

array([[1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 1.]])

In [None]:
# Inverse matrix:
v = np.array([
    [1, 2],
    [3, 4]
])
v

array([[1, 2],
       [3, 4]])

In [None]:
v_inv = np.linalg.inv(v)
v_inv

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

In [None]:
np.dot(v, v_inv).round(2)

array([[1., 0.],
       [0., 1.]])
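
One common use of the inverse is solving a linear system $Vx = b$, since $x = V^{-1}b$. The sketch below compares that with `np.linalg.solve`, which solves the system directly without forming the inverse. The right-hand side `b` is a made-up example.

```python
import numpy as np

V = np.array([[1, 2],
              [3, 4]])
b = np.array([5, 6])   # hypothetical right-hand side

x_inv = np.linalg.inv(V) @ b     # via the explicit inverse
x_solve = np.linalg.solve(V, b)  # direct solve

print(x_solve)                       # [-4.   4.5]
print(np.allclose(x_inv, x_solve))   # True
print(np.allclose(V @ x_solve, b))   # True: the solution satisfies V x = b
```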

### 8. Intro to pandas

In [None]:
import pandas as pd

In [None]:
data = [
    ['Nissan', 'Stanza', 1991, 138, 4,'MANUAL','sedan', 2000],
    [ 'Hyundai', 'Sonata', 2017, None, 4, 'AUTOMATIC', 'Sedan', 27150],
    ['Lotus', 'Elise', 2010, 218, 4, 'MANUAL', 'convertible', 54990],
    ['GMC','Acadia', 2017, 194, 4, 'AUTOMATIC' , '4dr SUV', 34450],
    ['Nissan', 'Frontier', 2017, 261, 6, 'MANUAL','Pickup', 32340],
]
columns = [
    'Make',
    'Model',
    'Year',
    'Engine HP',
    'Engine Cylinders',
    'Transmission Type',
    'Vehicle_Style',
    'MSRP',
]

In [None]:
pd.DataFrame(data)

Unnamed: 0,0,1,2,3,4,5,6,7
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150
2,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990
3,GMC,Acadia,2017,194.0,4,AUTOMATIC,4dr SUV,34450
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340


In [None]:
df = pd.DataFrame(data, columns = columns)
df

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150
2,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990
3,GMC,Acadia,2017,194.0,4,AUTOMATIC,4dr SUV,34450
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340


In [None]:
# Other ways of creating dfs:
data = [
    {
        "Make": "Nissan",
        "Model": "Stanza",
        "Year": 1991,
        "Engine HP": 138.0,
        "Engine Cylinders": 4,
        "Transmission Type": "MANUAL",
        "Vehicle_Style": "sedan",
        "MSRP": 2000
    },
    {
        "Make": "Hyundai",
        "Model": "Sonata",
        "Year": 2017,
        "Engine HP": None,
        "Engine Cylinders": 4,
        "Transmission Type": "AUTOMATIC",
        "Vehicle_Style": "Sedan",
        "MSRP": 27150
    },
    {
        "Make": "Lotus",
        "Model": "Elise",
        "Year": 2010,
        "Engine HP": 218.0,
        "Engine Cylinders": 4,
        "Transmission Type": "MANUAL",
        "Vehicle_Style": "convertible",
        "MSRP": 54990
    },
    {
        "Make": "GMC",
        "Model": "Acadia",
        "Year": 2017,
        "Engine HP": 194.0,
        "Engine Cylinders": 4,
        "Transmission Type": "AUTOMATIC",
        "Vehicle_Style": "4dr SUV",
        "MSRP": 34450
    },
    {
        "Make": "Nissan",
        "Model": "Frontier",
        "Year": 2017,
        "Engine HP": 261.0,
        "Engine Cylinders": 6,
        "Transmission Type": "MANUAL",
        "Vehicle_Style": "Pickup",
        "MSRP": 32340
    }
]
df = pd.DataFrame(data)
df

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150
2,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990
3,GMC,Acadia,2017,194.0,4,AUTOMATIC,4dr SUV,34450
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340


In [None]:
df.head(2)

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150


In [None]:
# SERIES (each column of a dataframe is a pandas series)
df.Make

0     Nissan
1    Hyundai
2      Lotus
3        GMC
4     Nissan
Name: Make, dtype: object

In [None]:
df['Engine HP']

0    138.0
1      NaN
2    218.0
3    194.0
4    261.0
Name: Engine HP, dtype: float64

In [None]:
df[['Make','Model','MSRP']]

Unnamed: 0,Make,Model,MSRP
0,Nissan,Stanza,2000
1,Hyundai,Sonata,27150
2,Lotus,Elise,54990
3,GMC,Acadia,34450
4,Nissan,Frontier,32340


In [None]:
df['id'] = [1,2,3,4,5] # Create a new column

In [None]:
df.id

0    1
1    2
2    3
3    4
4    5
Name: id, dtype: int64

In [None]:
del df['id'] # Delete a column
df.head(2)

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150


In [None]:
# INDEX
df.index

RangeIndex(start=0, stop=5, step=1)

In [None]:
df.loc[1] # Row

Make                   Hyundai
Model                   Sonata
Year                      2017
Engine HP                  NaN
Engine Cylinders             4
Transmission Type    AUTOMATIC
Vehicle_Style            Sedan
MSRP                     27150
Name: 1, dtype: object

In [None]:
df.index = ['a', 'b', 'c', 'd', 'e']

In [None]:
df.loc[['a','c']]

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
a,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
c,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990


In [None]:
df.iloc[[0,2]] # Positional index

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
a,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
c,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990
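
The difference between `loc` (labels) and `iloc` (positions) also shows up when slicing: a `loc` slice includes its end label, while an `iloc` slice excludes its end position. A small sketch on a hypothetical one-column frame called `cars`:

```python
import pandas as pd

cars = pd.DataFrame(
    {"Make": ["Nissan", "Hyundai", "Lotus", "GMC", "Nissan"]},
    index=["a", "b", "c", "d", "e"],
)

by_label = cars.loc["a":"c"]    # labels a, b, c (end label included)
by_position = cars.iloc[0:2]    # positions 0, 1 (end position excluded)

print(by_label["Make"].tolist())     # ['Nissan', 'Hyundai', 'Lotus']
print(by_position["Make"].tolist())  # ['Nissan', 'Hyundai']
```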


In [None]:
df = df.reset_index(drop = True) # reset_index returns a new DataFrame, so we reassign it to df
df

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,Nissan,Stanza,1991,138.0,4,MANUAL,sedan,2000
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150
2,Lotus,Elise,2010,218.0,4,MANUAL,convertible,54990
3,GMC,Acadia,2017,194.0,4,AUTOMATIC,4dr SUV,34450
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340


In [None]:
# ELEMENT-WISE OPERATION
df['Engine HP'] / 100 # We can do everything we do in numpy (on series)

0    1.38
1     NaN
2    2.18
3    1.94
4    2.61
Name: Engine HP, dtype: float64

In [None]:
df['Year'] >= 2015

0    False
1     True
2    False
3     True
4     True
Name: Year, dtype: bool

In [None]:
#FILTERING
df[
    df['Year'] >= 2015
]

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
1,Hyundai,Sonata,2017,,4,AUTOMATIC,Sedan,27150
3,GMC,Acadia,2017,194.0,4,AUTOMATIC,4dr SUV,34450
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340


In [None]:
df[
    (df['Year'] >= 2015) &
    (df['Make'] == 'Nissan')
]

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
4,Nissan,Frontier,2017,261.0,6,MANUAL,Pickup,32340
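
Conditions can also be combined with `|` (or), negated with `~`, or expressed with `isin` for membership tests. The sketch uses a reduced, hypothetical copy of the car data named `cars` so that it runs on its own.

```python
import pandas as pd

cars = pd.DataFrame({
    "Make": ["Nissan", "Hyundai", "Lotus", "GMC", "Nissan"],
    "Year": [1991, 2017, 2010, 2017, 2017],
})

# OR: each condition needs its own parentheses, as with &
old_or_gmc = cars[(cars["Year"] < 2000) | (cars["Make"] == "GMC")]

# Membership test against a list of values
nissan_or_lotus = cars[cars["Make"].isin(["Nissan", "Lotus"])]

# NOT: invert a boolean mask
before_2015 = cars[~(cars["Year"] >= 2015)]

print(old_or_gmc.index.tolist())       # [0, 3]
print(nissan_or_lotus.index.tolist())  # [0, 2, 4]
print(before_2015.index.tolist())      # [0, 2]
```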


In [None]:
# STRING OPERATIONS
df['Vehicle_Style']

0          sedan
1          Sedan
2    convertible
3        4dr SUV
4         Pickup
Name: Vehicle_Style, dtype: object

In [None]:
df['Vehicle_Style'].str.lower()

0          sedan
1          sedan
2    convertible
3        4dr suv
4         pickup
Name: Vehicle_Style, dtype: object

In [None]:
df['Vehicle_Style'] = df['Vehicle_Style'].str.replace(' ', '_').str.lower()
df['Vehicle_Style']

0          sedan
1          sedan
2    convertible
3        4dr_suv
4         pickup
Name: Vehicle_Style, dtype: object

In [None]:
# SUMMARIZING OPERATIONS
df.MSRP.mean()

30186.0

In [None]:
df.MSRP.describe()

count        5.000000
mean     30186.000000
std      18985.044904
min       2000.000000
25%      27150.000000
50%      32340.000000
75%      34450.000000
max      54990.000000
Name: MSRP, dtype: float64

In [None]:
df.describe() # Only numerical values

Unnamed: 0,Year,Engine HP,Engine Cylinders,MSRP
count,5.0,4.0,5.0,5.0
mean,2010.4,202.75,4.4,30186.0
std,11.260551,51.29896,0.894427,18985.044904
min,1991.0,138.0,4.0,2000.0
25%,2010.0,180.0,4.0,27150.0
50%,2017.0,206.0,4.0,32340.0
75%,2017.0,228.75,4.0,34450.0
max,2017.0,261.0,6.0,54990.0


In [None]:
df.Make

0     Nissan
1    Hyundai
2      Lotus
3        GMC
4     Nissan
Name: Make, dtype: object

In [None]:
df.Make.nunique()

4

In [None]:
df.nunique()

Make                 4
Model                5
Year                 3
Engine HP            4
Engine Cylinders     2
Transmission Type    2
Vehicle_Style        4
MSRP                 5
dtype: int64

In [None]:
# MISSING VALUES
df.isnull()

Unnamed: 0,Make,Model,Year,Engine HP,Engine Cylinders,Transmission Type,Vehicle_Style,MSRP
0,False,False,False,False,False,False,False,False
1,False,False,False,True,False,False,False,False
2,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False


In [None]:
df.isnull().sum()

Make                 0
Model                0
Year                 0
Engine HP            1
Engine Cylinders     0
Transmission Type    0
Vehicle_Style        0
MSRP                 0
dtype: int64
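
Once missing values are located, two common options are to fill them or to drop the affected rows. The sketch below fills with the column mean, which is only one possible strategy and is an assumption here, not something the notes recommend. The small frame `cars` is hypothetical.

```python
import pandas as pd

cars = pd.DataFrame({
    "Make": ["Nissan", "Hyundai", "Lotus"],
    "Engine HP": [138.0, None, 218.0],
})

# mean() skips missing values by default: (138 + 218) / 2 = 178
mean_hp = cars["Engine HP"].mean()

filled = cars["Engine HP"].fillna(mean_hp)    # replace NaN with the mean
dropped = cars.dropna(subset=["Engine HP"])   # or drop rows with NaN instead

print(filled.tolist())   # [138.0, 178.0, 218.0]
print(len(dropped))      # 2
```

Both calls return new objects and leave `cars` unchanged.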

In [None]:
# GROUPING

```
SELECT
    transmission_type,
    avg(MSRP)
FROM 
    cars
GROUP BY
    transmission_type
```

In [None]:
df.groupby('Transmission Type').MSRP.mean()

Transmission Type
AUTOMATIC    30800.000000
MANUAL       29776.666667
Name: MSRP, dtype: float64
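
Several aggregates can be computed per group in one call with `agg`. The sketch rebuilds just the two relevant columns in a hypothetical frame named `cars`, using the same values as the data above.

```python
import pandas as pd

cars = pd.DataFrame({
    "Transmission Type": ["MANUAL", "AUTOMATIC", "MANUAL", "AUTOMATIC", "MANUAL"],
    "MSRP": [2000, 27150, 54990, 34450, 32340],
})

# One row per group, one column per aggregate
summary = cars.groupby("Transmission Type")["MSRP"].agg(["mean", "min", "max", "count"])
print(summary)

print(summary.loc["AUTOMATIC", "mean"])   # 30800.0
print(summary.loc["MANUAL", "count"])     # 3
```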

In [None]:
# NUMPY
df.MSRP.values

array([ 2000, 27150, 54990, 34450, 32340])

In [None]:
# GET DICTIONARY
df.to_dict(orient = 'records')

[{'Make': 'Nissan',
  'Model': 'Stanza',
  'Year': 1991,
  'Engine HP': 138.0,
  'Engine Cylinders': 4,
  'Transmission Type': 'MANUAL',
  'Vehicle_Style': 'sedan',
  'MSRP': 2000},
 {'Make': 'Hyundai',
  'Model': 'Sonata',
  'Year': 2017,
  'Engine HP': nan,
  'Engine Cylinders': 4,
  'Transmission Type': 'AUTOMATIC',
  'Vehicle_Style': 'sedan',
  'MSRP': 27150},
 {'Make': 'Lotus',
  'Model': 'Elise',
  'Year': 2010,
  'Engine HP': 218.0,
  'Engine Cylinders': 4,
  'Transmission Type': 'MANUAL',
  'Vehicle_Style': 'convertible',
  'MSRP': 54990},
 {'Make': 'GMC',
  'Model': 'Acadia',
  'Year': 2017,
  'Engine HP': 194.0,
  'Engine Cylinders': 4,
  'Transmission Type': 'AUTOMATIC',
  'Vehicle_Style': '4dr_suv',
  'MSRP': 34450},
 {'Make': 'Nissan',
  'Model': 'Frontier',
  'Year': 2017,
  'Engine HP': 261.0,
  'Engine Cylinders': 6,
  'Transmission Type': 'MANUAL',
  'Vehicle_Style': 'pickup',
  'MSRP': 32340}]

---

All images come from the videos in the [ML Zoomcamp YouTube playlist](https://youtube.com/playlist?list=PL3MmuxUbc_hIhxl5Ji8t4O6lPAOpHaCLR&si=6csM2ytZ_syPu36S). This notebook covers *section 1*.