## Homework

### Set up the environment

You need to install Python, NumPy, Pandas, Matplotlib and Seaborn. For that, you can use the instructions from
[06-environment.md](../../../01-intro/06-environment.md).

### Q1. Pandas version

What's the version of Pandas that you installed?

You can get the version information using the `__version__` field:

In [97]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [98]:
pd.__version__

'2.2.3'

### Getting the data 

For this homework, we'll use the Laptops Price dataset. Download it from 
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

In [99]:
%%bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv

--2025-01-25 00:39:34--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)[185.199.108.133]:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 298573 (292K) [text/plain]
Saving to: 'laptops.csv.1'

     0K .......... .......... .......... .......... .......... 17% 1.35M 0s
    50K .......... .......... .......... .......... .......... 34% 1.54M 0s
   100K .......... .......... .......... .......... .......... 51% 1.96M 0s
   150K .......... .......... .......... .......... .......... 68% 1.39M 0s
   200K .......... .......... .......... .......... .......... 85% 1.39M 0s
   250K .......... .......... .......... .......... .         100% 3.40M=0.2s

2025-01-25 00:39:35 (1.63 MB/s) - 'laptops.csv.1' saved [298573/298573]



Or just open it with your browser and click "Save as...".

Now read it with Pandas.

In [100]:
df = pd.read_csv('laptops.csv')

### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

In [101]:
df.shape

(2160, 12)

### Q3. Laptop brands

How many laptop brands are present in the dataset?

- 12
- 27
- 28
- 2160

In [102]:
df.Brand.nunique()

27

### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

In [103]:
df.isnull().sum()

Laptop             0
Status             0
Brand              0
Model              0
CPU                0
RAM                0
Storage            0
Storage type      42
GPU             1371
Screen             4
Touch              0
Final Price        0
dtype: int64
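The per-column counts above can also be reduced to a single number. The following is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up for illustration, not taken from the laptops dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to illustrate the counting step
toy = pd.DataFrame({
    "Brand": ["A", "B", "C"],
    "GPU": [None, "RTX", None],
    "Screen": [15.6, np.nan, 14.0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()

# Number of columns that have at least one missing value
columns_with_missing = int((missing_per_column > 0).sum())
print(columns_with_missing)
```

Applying the same expression to `df` should give the number of columns whose count in the output above is non-zero.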

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

In [104]:
df[df['Brand']=='Dell']['Final Price'].max()

np.float64(3936.0)


### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No


In [105]:
df['Screen'].median()

np.float64(15.6)

In [106]:
mode = df['Screen'].mode()[0]
mode

np.float64(15.6)

In [107]:
df['Screen'].isnull().sum()

np.int64(4)

In [108]:
# Assign the result back to the column; chained `inplace=True` will stop working in pandas 3.0
df['Screen'] = df['Screen'].fillna(mode)


In [109]:
df['Screen'].isnull().sum()

np.int64(0)

In [110]:
df['Screen'].median()

np.float64(15.6)
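The median stays the same here because the fill value (the mode) is equal to the median, so the new values land exactly in the middle of the sorted column. A minimal sketch of the same four steps on a hypothetical toy series (the values in `screen` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data with two missing values
screen = pd.Series([14.0, 15.6, 15.6, None, 17.3, 15.6, None])

median_before = screen.median()        # median ignores missing values
most_frequent = screen.mode()[0]       # mode() returns a Series; take the first entry
filled = screen.fillna(most_frequent)  # returns a new Series, the original is untouched
median_after = filled.median()

print(median_before, most_frequent, median_after)
```

If the mode were far from the median and many values were missing, the median could shift, so the answer depends on the data.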


### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.
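Written as a formula, steps 4 to 7 compute the least-squares weights through the normal equation. Here $X$ is the matrix from step 3 and $y$ is the array from step 6. No intercept column is added to $X$ in this exercise:

$$
w = (X^T X)^{-1} X^T y
$$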

- 0.43
- 45.29
- 45.58
- 91.30

In [111]:
df_Innjoo=df[df['Brand']=='Innjoo']

In [112]:
df_Innjoo[['RAM', 'Storage', 'Screen']]

|      | RAM | Storage | Screen |
|------|-----|---------|--------|
| 1478 | 8   | 256     | 15.6   |
| 1479 | 8   | 512     | 15.6   |
| 1480 | 4   | 64      | 14.1   |
| 1481 | 6   | 64      | 14.1   |
| 1482 | 6   | 128     | 14.1   |
| 1483 | 6   | 128     | 14.1   |


In [113]:
X=np.array(df_Innjoo[['RAM', 'Storage', 'Screen']])

In [114]:
XTX=(X.T).dot(X)

In [115]:
XTX

array([[2.52000e+02, 8.32000e+03, 5.59800e+02],
       [8.32000e+03, 3.68640e+05, 1.73952e+04],
       [5.59800e+02, 1.73952e+04, 1.28196e+03]])

In [116]:
XTX_inv=np.linalg.inv(XTX)

In [117]:
y = [1100, 1300, 800, 900, 1000, 1100]

In [118]:
w = (XTX_inv.dot(X.T)).dot(y)

In [119]:
sum(w)

np.float64(91.2998806299555)
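As a cross-check, the same weights can be obtained without forming the inverse explicitly. This is a minimal, self-contained sketch: the rows of `X` are copied from the Innjoo table above, and the names `w_solve` and `w_inv` are introduced here only for illustration.

```python
import numpy as np

# Rows copied from the Innjoo table above: RAM, Storage, Screen
X = np.array([
    [8, 256, 15.6],
    [8, 512, 15.6],
    [4, 64, 14.1],
    [6, 64, 14.1],
    [6, 128, 14.1],
    [6, 128, 14.1],
])
y = np.array([1100, 1300, 800, 900, 1000, 1100])

XTX = X.T @ X

# Solve the linear system XTX @ w = X.T @ y instead of inverting XTX
w_solve = np.linalg.solve(XTX, X.T @ y)

# The explicit-inverse route used in the cells above, for comparison
w_inv = np.linalg.inv(XTX) @ X.T @ y

print(w_solve.sum())
```

Both routes should agree up to floating-point error. Solving the system is usually preferred over computing an inverse because it tends to be more numerically stable.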

## Submit the results

* Submit your results here: https://courses.datatalks.club/ml-zoomcamp-2024/homework/hw01
* If your answer doesn't match options exactly, select the closest one