## Homework
You need to install Python, NumPy, Pandas, Matplotlib, and Seaborn. To set them up, follow the instructions in
[06-environment.md](../../../01-intro/06-environment.md).

```python
import pandas as pd
import numpy as np
```

### Q1. Pandas version

What's the version of Pandas that you installed?

You can get the version information using the `__version__` field:

```python
pd.__version__
```

```python
print(f"The version of pandas is {pd.__version__}")
```

```
The version of pandas is 2.2.2
```


### Run from the command line

```bash
python -c "import pandas as pd; print(pd.__version__)"
```

### Getting the data 

For this homework, we'll use the Laptops Price dataset. Download it from 
[here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```

Or just open it with your browser and click "Save as...".

Now read it with Pandas.

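```python
# `!` runs a shell command from a Jupyter/IPython cell.
# Re-running it saves numbered copies (laptops.csv.1, laptops.csv.2, ...) instead of overwriting;
# read_csv below always reads laptops.csv.
!wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/laptops.csv
```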


```python
df = pd.read_csv("laptops.csv")
df.head()
```

| | Laptop | Status | Brand | Model | CPU | RAM | Storage | Storage type | GPU | Screen | Touch | Final Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ASUS ExpertBook B1 B1502CBA-EJ0436X Intel Core... | New | Asus | ExpertBook | Intel Core i5 | 8 | 512 | SSD | | 15.6 | No | 1009.0 |
| 1 | Alurin Go Start Intel Celeron N4020/8GB/256GB ... | New | Alurin | Go | Intel Celeron | 8 | 256 | SSD | | 15.6 | No | 299.0 |
| 2 | ASUS ExpertBook B1 B1502CBA-EJ0424X Intel Core... | New | Asus | ExpertBook | Intel Core i3 | 8 | 256 | SSD | | 15.6 | No | 789.0 |
| 3 | MSI Katana GF66 12UC-082XES Intel Core i7-1270... | New | MSI | Katana | Intel Core i7 | 16 | 1000 | SSD | RTX 3050 | 15.6 | No | 1199.0 |
| 4 | HP 15S-FQ5085NS Intel Core i5-1235U/16GB/512GB... | New | HP | 15S | Intel Core i5 | 16 | 512 | SSD | | 15.6 | No | 669.01 |


### Q2. Records count

How many records are in the dataset?

- 12
- 1000
- 2160
- 12160

```python
print(f"The number of records in the dataset is {df.shape[0]}")
```

```
The number of records in the dataset is 2160
```
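As a quick sanity check, here is a minimal sketch on a hypothetical toy DataFrame (not the laptops data). It shows that `df.shape[0]` and `len(df)` both count rows, while `count()` counts only non-missing values per column, so it can report fewer than the number of records:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the laptops DataFrame
toy = pd.DataFrame({
    "Brand": ["Asus", "HP", "Dell"],
    "GPU": ["RTX 3050", np.nan, np.nan],
})

n_rows_shape = toy.shape[0]         # shape is (rows, columns)
n_rows_len = len(toy)               # len() of a DataFrame counts rows
non_null_per_column = toy.count()   # non-missing values per column
non_null_gpu = non_null_per_column["GPU"]

print(n_rows_shape, n_rows_len)     # → 3 3
print(non_null_gpu)                 # → 1
```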


### Q3. Laptop brands

How many laptop brands are present in the dataset?

- 12
- 27
- 28
- 2160

```python
print(f"The number of laptop brands in the dataset is {df['Brand'].nunique()}")
```

```
The number of laptop brands in the dataset is 27
```
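Here is a minimal sketch on a hypothetical toy Series showing how `nunique()` treats missing values. By default (`dropna=True`), NaN is not counted as a brand. In the laptops data `Brand` has 2160 non-null values, so both settings give the same answer there:

```python
import numpy as np
import pandas as pd

# Hypothetical toy Brand column with one missing value
brands = pd.Series(["Asus", "HP", "Asus", np.nan, "Dell"])

n_default = brands.nunique()               # NaN excluded (dropna=True) → 3
n_with_nan = brands.nunique(dropna=False)  # NaN counted as its own value → 4
n_from_unique = len(brands.unique())       # unique() keeps NaN in its result → 4
counts = brands.value_counts()             # records per brand, NaN dropped by default
```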


### Q4. Missing values

How many columns in the dataset have missing values?

- 0
- 1
- 2
- 3

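```python
# isnull().sum() counts NaNs per column; "> 0" flags columns with any NaN; the outer sum counts them
(df.isnull().sum() > 0).sum()
```

```
3
```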
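To see *which* columns have gaps, not just how many, one option is `isnull().any()` per column. Below is a minimal sketch on a hypothetical toy DataFrame. On the laptops data, the `df.info()` output in Q5 shows that the three columns are `Storage type`, `GPU`, and `Screen`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two columns
toy = pd.DataFrame({
    "RAM": [8, 16, 8],
    "GPU": [np.nan, "RTX 3050", np.nan],
    "Screen": [15.6, np.nan, 15.6],
})

missing_per_column = toy.isnull().sum()   # NaN count per column
has_missing = toy.isnull().any()          # True where a column has at least one NaN
cols_with_missing = has_missing[has_missing].index.tolist()
n_cols_with_missing = int(has_missing.sum())

print(cols_with_missing)     # → ['GPU', 'Screen']
print(n_cols_with_missing)   # → 2
```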

### Q5. Maximum final price

What's the maximum final price of Dell notebooks in the dataset?

- 869
- 3691
- 3849
- 3936

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2160 entries, 0 to 2159
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Laptop        2160 non-null   object 
 1   Status        2160 non-null   object 
 2   Brand         2160 non-null   object 
 3   Model         2160 non-null   object 
 4   CPU           2160 non-null   object 
 5   RAM           2160 non-null   int64  
 6   Storage       2160 non-null   int64  
 7   Storage type  2118 non-null   object 
 8   GPU           789 non-null    object 
 9   Screen        2156 non-null   float64
 10  Touch         2160 non-null   object 
 11  Final Price   2160 non-null   float64
dtypes: float64(2), int64(2), object(8)
memory usage: 202.6+ KB
```


```python
df.groupby('Brand')['Final Price'].agg(['min', 'mean', 'max'])
```

| Brand | min | mean | max |
|---|---|---|---|
| Acer | 264.14 | 1001.285766 | 3691.0 |
| Alurin | 239.0 | 484.701379 | 869.0 |
| Apple | 299.0 | 1578.227672 | 3849.0 |
| Asus | 239.25 | 1269.380699 | 5758.14 |
| Deep Gaming | 1334.0 | 1505.3775 | 1639.01 |
| Dell | 379.0 | 1153.839881 | 3936.0 |
| Denver | 329.95 | 329.95 | 329.95 |
| Dynabook Toshiba | 397.29 | 999.197895 | 1805.01 |
| Gigabyte | 799.0 | 1698.488958 | 3799.0 |
| HP | 210.14 | 952.628478 | 5368.77 |


```python
df[df['Brand'] == "Dell"]['Final Price'].max()
```

```
3936.0
```
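The cell above uses chained indexing (`df[mask]['Final Price']`), which is fine for reading a value. Here is a sketch of the single-step `.loc[mask, column]` form, plus `idxmax()` to find which laptop has the highest price. The toy rows, names, and prices are hypothetical:

```python
import pandas as pd

# Hypothetical toy rows; names and prices are made up
toy = pd.DataFrame({
    "Laptop": ["Dell A", "Dell B", "HP C"],
    "Brand": ["Dell", "Dell", "HP"],
    "Final Price": [500.0, 1200.0, 2000.0],
})

is_dell = toy["Brand"] == "Dell"
dell_prices = toy.loc[is_dell, "Final Price"]       # filter rows and pick the column in one step
dell_max = dell_prices.max()                        # → 1200.0
dell_top = toy.loc[dell_prices.idxmax(), "Laptop"]  # idxmax gives the row label of the max → "Dell B"
```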

### Q6. Median value of Screen

1. Find the median value of the `Screen` column in the dataset.
2. Next, calculate the most frequent value of the same `Screen` column.
3. Use the `fillna` method to fill the missing values in the `Screen` column with the most frequent value from the previous step.
4. Now, calculate the median value of `Screen` once again.

Has it changed?

> Hint: refer to existing `mode` and `median` functions to complete the task.

- Yes
- No

```python
# 1. Find the median value of the `Screen` column in the dataset.
median_screen = df['Screen'].median()
median_screen
```

```
15.6
```

```python
# 2. Next, calculate the most frequent value of the same `Screen` column.
mode_screen = df['Screen'].mode()[0]
mode_screen
```

```
15.6
```

```python
# 3. Fill the missing values in `Screen` with the most frequent value from the previous step.
# fillna returns a new Series, so assign it back; otherwise df is left unchanged
df['Screen'] = df['Screen'].fillna(mode_screen)
df['Screen']
```

```
0       15.6
1       15.6
2       15.6
3       15.6
4       15.6
        ... 
2155    17.3
2156    17.3
2157    17.3
2158    13.4
2159    13.4
Name: Screen, Length: 2160, dtype: float64
```

```python
# 4. Now, calculate the median value of `Screen` once again.
median_mode_screen = df['Screen'].median()
median_mode_screen
```

```
15.6
```

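No, it doesn't change: the median stays at 15.6. Only 4 of the 2160 `Screen` values are missing (2156 non-null in the `df.info()` output), and they are filled with the mode, 15.6, which already equals the median. Adding values equal to the median cannot shift the middle of the sorted values.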
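Here is a minimal sketch on a hypothetical toy `Screen` series that illustrates both points. Calling `fillna` without assigning the result leaves the original untouched. Filling with a value equal to the median leaves the median unchanged, whereas filling with a value far below it (0 here, purely for contrast) can move it:

```python
import numpy as np
import pandas as pd

# Hypothetical toy screen sizes with two missing values
screen = pd.Series([13.3, 14.0, 15.6, 15.6, 15.6, 17.3, np.nan, np.nan])

median_before = screen.median()       # NaN skipped by default → 15.6
mode_value = screen.mode()[0]         # most frequent value → 15.6

screen.fillna(mode_value)             # returns a new Series; `screen` itself is unchanged
still_missing = screen.isnull().sum() # → 2

median_after = screen.fillna(mode_value).median()  # fill value equals the median → 15.6
median_if_zero = screen.fillna(0).median()         # a low fill value shifts it, to about 14.8
```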

### Q7. Sum of weights

1. Select all the "Innjoo" laptops from the dataset.
2. Select only the columns `RAM`, `Storage`, `Screen`.
3. Get the underlying NumPy array. Let's call it `X`.
4. Compute the matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
5. Compute the inverse of `XTX`.
6. Create an array `y` with values `[1100, 1300, 800, 900, 1000, 1100]`.
7. Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
8. What's the sum of all the elements of the result?

> **Note**: You just implemented linear regression. We'll talk about it in the next lesson.
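Written as a formula, steps 4 to 7 compute the least-squares weights through the normal equation. Here `X` has one row per selected Innjoo laptop and three feature columns, so `y` must have exactly one entry per row:

$$
w = \left(X^T X\right)^{-1} X^T y, \qquad \text{answer} = \sum_{j=1}^{3} w_j
$$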

- 0.43
- 45.29
- 45.58
- 91.30


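```python
# 1. Select the Innjoo laptops
innjoo_laptops = df[df['Brand'] == "Innjoo"]
# 2-3. Keep RAM, Storage, Screen and get the NumPy array X
features = ["RAM", "Storage", "Screen"]
X = innjoo_laptops[features].to_numpy()
# 4. X^T X (equivalently: np.dot(X.T, X))
XTX = X.T @ X
# 5. Inverse of X^T X
XTX_inv = np.linalg.inv(XTX)
# 6. Target vector; its length must match the number of Innjoo rows
y = np.array([1100, 1300, 800, 900, 1000, 1100])
# 7. w = (X^T X)^-1 X^T y
w = XTX_inv @ X.T @ y
# 8. Sum of the weights, rounded to 2 decimals
sum_w = round(np.sum(w), 2)
sum_w
```

```
91.3
```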
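Forming the inverse explicitly matches the homework steps. For comparison only, two common alternatives are `np.linalg.solve`, which solves XᵀX w = Xᵀy without computing the inverse, and `np.linalg.lstsq`, which works on `X` directly. Both are usually preferred for numerical stability. The sketch below uses a hypothetical random 6×3 matrix, not the Innjoo data, and checks that all three give the same weights:

```python
import numpy as np

# Hypothetical 6x3 feature matrix (random, seeded for reproducibility)
rng = np.random.default_rng(0)
X = rng.uniform(1, 20, size=(6, 3))
y = np.array([1100, 1300, 800, 900, 1000, 1100], dtype=float)

XTX = X.T @ X
w_inv = np.linalg.inv(XTX) @ X.T @ y              # explicit inverse, as in the homework
w_solve = np.linalg.solve(XTX, X.T @ y)           # solve the normal equation directly
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares on X itself

print(np.allclose(w_inv, w_solve), np.allclose(w_solve, w_lstsq))
```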