## Homework 1: Introduction to Machine Learning for Machine Learning Zoomcamp 2025

### Importing the dependencies

In [250]:
import pandas as pd
import numpy as np

### Q1. Pandas version
What's the version of Pandas that you installed?

You can get the version information using the `__version__` field:

In [251]:
# Check my current pandas version
pd.__version__

'2.2.3'

### Getting the data
For this homework, we'll use the Car Fuel Efficiency dataset. Download it from the URL used in the cell below.

You can do it with wget, or read the CSV directly from its URL with pandas, as done here:

In [261]:
# Read the CSV data file
df = pd.read_csv('https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv')

In [262]:
# Show the first 5 rows of the data
df.head(5)

| | engine_displacement | num_cylinders | horsepower | vehicle_weight | acceleration | model_year | origin | fuel_type | drivetrain | num_doors | fuel_efficiency_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 170 | 3.0 | 159.0 | 3413.433759 | 17.7 | 2003 | Europe | Gasoline | All-wheel drive | 0.0 | 13.231729 |
| 1 | 130 | 5.0 | 97.0 | 3149.664934 | 17.8 | 2007 | USA | Gasoline | Front-wheel drive | 0.0 | 13.688217 |
| 2 | 170 | NaN | 78.0 | 3079.038997 | 15.1 | 2018 | Europe | Gasoline | Front-wheel drive | 0.0 | 14.246341 |
| 3 | 220 | 4.0 | NaN | 2542.392402 | 20.2 | 2009 | USA | Diesel | All-wheel drive | 2.0 | 16.912736 |
| 4 | 210 | 1.0 | 140.0 | 3460.87099 | 14.4 | 2009 | Europe | Gasoline | All-wheel drive | 2.0 | 12.488369 |


### Q2. Records count
How many records are in the dataset?

In [254]:
# Count the records in the dataset in three ways that should agree

print('The shape method: ', df.shape[0])
print('The count method: ', df[df.columns[0]].count())
print('The len method: ', len(df.index))


The shape method:  9704
The count method:  9704
The len method:  9704
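
The three counts agree here because the first column has no missing values. As a minimal sketch on a hypothetical toy frame (`toy` and its columns are made up for illustration, not taken from the homework dataset), `count()` skips missing values while `shape[0]` and `len()` count rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "a" has one missing value
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

n_rows_shape = toy.shape[0]      # number of rows
n_rows_len = len(toy.index)      # number of rows, same as shape[0]
n_non_null_a = toy["a"].count()  # non-null entries only, so NaN is skipped

print(n_rows_shape, n_rows_len, n_non_null_a)
```

For a row count, `shape[0]` or `len()` is therefore the safer choice when the column may contain missing values.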


### Q3. Fuel types
How many fuel types are present in the dataset?

In [255]:
# Count the distinct fuel types in df
print('The unique fuel types: ', df.fuel_type.unique())
print('The number of unique fuel types: ', df.fuel_type.nunique())


The unique fuel types:  ['Gasoline' 'Diesel']
The number of unique fuel types:  2
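
One detail worth keeping in mind: `nunique()` ignores missing values by default. The sketch below uses a hypothetical Series (`fuel` is illustrative only) to show the difference and to get per-type counts with `value_counts()`:

```python
import pandas as pd

# Hypothetical toy data with one missing entry
fuel = pd.Series(["Gasoline", "Diesel", None, "Gasoline"])

n_types = fuel.nunique()                       # missing values are not counted
n_types_with_nan = fuel.nunique(dropna=False)  # missing counts as its own value
counts = fuel.value_counts()                   # rows per fuel type, missing excluded

print(n_types, n_types_with_nan)
print(counts)
```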


### Q4. Missing values
How many columns in the dataset have missing values?

In [256]:
# Count the columns that contain at least one missing value

print('The number of columns with missing values: ', (df.isnull().sum() > 0).sum())


The number of columns with missing values:  4
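
To see which columns are affected rather than only how many, the per-column counts can be filtered. This is a sketch on a hypothetical toy frame (the `toy` data is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the three columns contain NaN
toy = pd.DataFrame({
    "hp": [100.0, np.nan, 120.0],
    "doors": [2.0, 4.0, np.nan],
    "year": [2001, 2002, 2003],
})

missing_per_column = toy.isnull().sum()                         # NaN count per column
cols_with_missing = missing_per_column[missing_per_column > 0]  # keep affected columns
n_cols_with_missing = toy.isnull().any().sum()                  # same count, via any()

print(cols_with_missing)
print(n_cols_with_missing)
```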


### Q5. Max fuel efficiency
What's the maximum fuel efficiency of cars from Asia?

In [257]:
# Find the maximum fuel efficiency of cars from Asia

print('The maximum fuel efficiency of cars from Asia: ', df[df.origin == 'Asia'].fuel_efficiency_mpg.max())


The maximum fuel efficiency of cars from Asia:  23.759122836520497
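
The same filter-then-aggregate pattern extends to every origin at once with `groupby`. A minimal sketch on hypothetical toy values (not the homework data):

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "origin": ["Asia", "Asia", "Europe", "USA"],
    "fuel_efficiency_mpg": [15.0, 18.5, 14.0, 16.2],
})

# Boolean mask for one region, then the maximum of the target column
asia_max = toy.loc[toy["origin"] == "Asia", "fuel_efficiency_mpg"].max()

# Maximum per region in one pass
max_by_origin = toy.groupby("origin")["fuel_efficiency_mpg"].max()

print(asia_max)
print(max_by_origin)
```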


### Q6. Median value of horsepower
1. Find the median value of horsepower column in the dataset.
2. Next, calculate the most frequent value of the same horsepower column.
3. Use fillna method to fill the missing values in horsepower column with the most frequent value from the previous step.
4. Now, calculate the median value of horsepower once again.

Has it changed?

In [264]:
# Note: the question asks about the median, but this cell computes the mean
# before and after filling, so the values printed below are means.

print('The mean value of horsepower column', df.horsepower.mean())

# value_counts() sorts by frequency, so index[0] is the most frequent value
print('The most frequent value of horsepower column: ', df.horsepower.value_counts().index[0])

# Fill the missing values in the horsepower column with the most frequent value
# via fillna, then recalculate the mean
print('The new mean value of the horsepower column: ', (df.horsepower.fillna(df.horsepower.value_counts().index[0])).mean())


The mean value of horsepower column 149.65729212983547
The most frequent value of horsepower column:  152.0
The new mean value of the horsepower column:  149.82821516900248
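
For the median that the question asks about, the same three steps can be sketched with `Series.median()` and `Series.mode()`. The Series `hp` below is hypothetical toy data, so its numbers say nothing about the homework dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing values
hp = pd.Series([100.0, 120.0, 140.0, 200.0, 200.0, np.nan, np.nan])

median_before = hp.median()    # NaN is skipped by default
most_frequent = hp.mode()[0]   # mode() returns a Series; take its first entry
median_after = hp.fillna(most_frequent).median()

print(median_before, most_frequent, median_after)
print("Median changed:", median_before != median_after)
```

Running the same three lines on `df.horsepower` would answer the question for the real data.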


### Q7. Sum of weights
1. Select all the cars from Asia
2. Select only columns vehicle_weight and model_year
3. Select the first 7 values
4. Get the underlying NumPy array. Let's call it X.
5. Compute matrix-matrix multiplication between the transpose of X and X. To get the transpose, use X.T. Let's call the result XTX.
6. Invert XTX.
7. Create an array y with values [1100, 1300, 800, 900, 1000, 1100, 1200].
8. Multiply the inverse of XTX with the transpose of X, and then multiply the result by y. Call the result w.
9. What's the sum of all the elements of the result?

> Note: You just implemented linear regression. We'll talk about it in the next lesson.
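
Written as a single formula, the steps above amount to the normal equation of linear regression. The shapes follow from the steps: X is 7×2, so XTX is 2×2, y has 7 elements, and w has 2 elements.

```latex
w = (X^{\top} X)^{-1} X^{\top} y,
\qquad X \in \mathbb{R}^{7 \times 2},\;
X^{\top} X \in \mathbb{R}^{2 \times 2},\;
y \in \mathbb{R}^{7},\;
w \in \mathbb{R}^{2}
```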

In [287]:
# Calculate the sum of all the elements of w after the 8 steps

"""
First: select all the cars from Asia
Second: select two columns: vehicle_weight and model_year
Third: select the first 7 rows
Fourth: save the rows to X as a NumPy array
Fifth: compute the matrix multiplication of transpose(X) and X, and save the result to XTX
Sixth: invert XTX and create an array y
Seventh: multiply inverse(XTX) * transpose(X) * y = w
"""

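def select_matrix(data,ori,cols):
    # Return the first 7 rows of the given columns for cars from one origin as a NumPy array
    if data is None:
        raise ValueError('data must be a DataFrame, got None')

    origin_cars = data[data.origin==ori]
    matrix = origin_cars[cols].head(7).to_numpy()

    return matrix
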
def matrix_matrix_multiplication(U,V):
    assert U.shape[1] == V.shape[0]

    num_Urows = U.shape[0]
    num_Ucols = U.shape[1]
    num_Vcols = V.shape[1]

    mm_result = np.zeros((num_Urows,num_Vcols))


    for i in range(num_Urows):
        for j in range(num_Vcols):
            for k in range(num_Ucols):
                mm_result[i][j] += U[i][k]*V[k][j]

    return mm_result

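def vector_vector_multiplication(u,v):
    # Dot product of two 1-D arrays; defined here so that this cell is
    # self-contained, since matrix_vector_multiplication below depends on it
    assert u.shape[0] == v.shape[0]

    vv_result = 0.0

    for i in range(u.shape[0]):
        vv_result += u[i]*v[i]

    return vv_result

def matrix_vector_multiplication(U,v):
    assert U.shape[1] == v.shape[0]

    num_rows = U.shape[0]

    mv_result = np.zeros(num_rows)

    # Each output element is the dot product of one row of U with v
    for i in range(num_rows):
        mv_result[i] = vector_vector_multiplication(U[i],v)

    return mv_result
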

X = select_matrix(df,'Asia',['vehicle_weight','model_year'])
X_transpose = X.T
XTX = matrix_matrix_multiplication(X_transpose,X)

y = np.array([1100, 1300, 800, 900, 1000, 1100, 1200])

XTX_inv = np.linalg.inv(XTX)

temp = matrix_matrix_multiplication(XTX_inv,X_transpose)

w = matrix_vector_multiplication(temp,y)

print('The sum of all the elements is: ',np.sum(w))

The sum of all the elements is:  0.5187709081074005


In [288]:
# Same computation using NumPy's dot() directly
# (XTX_inv is reused from the previous cell; XTX_new is computed only for comparison)

XTX_new = X_transpose.dot(X)
XTX_X_new = XTX_inv.dot(X_transpose)

w_new = XTX_X_new.dot(y)

print('The sum of all the elements is: ', np.sum(w_new))


The sum of all the elements is:  0.5187709081074007
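
The two results differ only in the last digit, which is ordinary floating-point rounding from a different order of operations. As an optional cross-check, the explicit inverse can be compared with `np.linalg.solve` and `np.linalg.lstsq`, which avoid forming the inverse. The sketch below uses a hypothetical random matrix `X_demo` in place of the homework's X, so its numbers are not the homework answer:

```python
import numpy as np

# Hypothetical stand-in for X: 7 rows, 2 feature columns
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(7, 2))
y_demo = np.array([1100, 1300, 800, 900, 1000, 1100, 1200], dtype=float)

# Normal equation with an explicit inverse, as in the steps above
w_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Solve (X^T X) w = X^T y without forming the inverse
w_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Least-squares solver working on X directly
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(np.allclose(w_inv, w_solve), np.allclose(w_inv, w_lstsq))
```

All three approaches should agree up to rounding when XTX is well conditioned.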
