### Import Libraries

In [3]:
import numpy as np
import pandas as pd


### Question 1

What's the version of NumPy that you installed? 

You can get the version information using the `__version__` field:

```python
np.__version__
```

In [4]:
np.__version__

'1.22.4'

### Question 2

What's the version of Pandas? 

In [5]:
pd.__version__

'1.4.3'

### Getting the data 

For this homework, we'll use the car price dataset.

Download it from [here](https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv).

You can do it with wget:

```bash
wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv
```

Or just open it with your browser and click "Save as...".


In [6]:
!wget https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv -O hw-1.csv

--2022-08-21 22:01:09--  https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/chapter-02-car-price/data.csv
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)[185.199.110.133]:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1475504 (1.4M) [text/plain]
Saving to: `hw-1.csv'

2022-08-21 22:02:42 (15.7 KB/s) - `hw-1.csv' saved [1475504/1475504]



> **Note**: I have wget installed separately. If you're on Windows without WSL, you will need to install wget yourself or download the file through your browser.
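
If wget is not available, the file can also be fetched from Python. The following is a minimal sketch using only the standard library; `download_csv` and `DATA_URL` are hypothetical names introduced for illustration, not part of the homework.

```python
import shutil
import urllib.request

# Same URL as in the wget command above.
DATA_URL = (
    "https://raw.githubusercontent.com/alexeygrigorev/"
    "mlbookcamp-code/master/chapter-02-car-price/data.csv"
)


def download_csv(url, dest):
    """Save the file at `url` to the local path `dest` (hypothetical helper)."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the response body to disk
    return dest


# Usage (requires network access):
# download_csv(DATA_URL, "hw-1.csv")
```

Alternatively, `pd.read_csv` accepts a URL directly, so `pd.read_csv(DATA_URL)` would skip the local file altogether.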

Now read it with Pandas. 

In [7]:
df = pd.read_csv('hw-1.csv')

Let's look at the first five rows of the dataset.

In [8]:
df.head()

Unnamed: 0,Make,Model,Year,Engine Fuel Type,Engine HP,Engine Cylinders,Transmission Type,Driven_Wheels,Number of Doors,Market Category,Vehicle Size,Vehicle Style,highway MPG,city mpg,Popularity,MSRP
0,BMW,1 Series M,2011,premium unleaded (required),335.0,6.0,MANUAL,rear wheel drive,2.0,"Factory Tuner,Luxury,High-Performance",Compact,Coupe,26,19,3916,46135
1,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Convertible,28,19,3916,40650
2,BMW,1 Series,2011,premium unleaded (required),300.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,High-Performance",Compact,Coupe,28,20,3916,36350
3,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,"Luxury,Performance",Compact,Coupe,28,18,3916,29450
4,BMW,1 Series,2011,premium unleaded (required),230.0,6.0,MANUAL,rear wheel drive,2.0,Luxury,Compact,Convertible,28,18,3916,34500


### Question 3

What's the average price of BMW cars in the dataset?

In [9]:
df[df.Make == 'BMW'].MSRP.mean()

61546.76347305389

### Question 4

Select a subset of cars after year 2015 (inclusive, i.e. 2015 and after). How many of them have missing values for Engine HP?

In [10]:
df[df.Year >= 2015]['Engine HP'].isnull().sum()

51

### Question 5

* Calculate the average "Engine HP" in the dataset. 
* Use the `fillna` method to fill the missing values in "Engine HP" with the mean value from the previous step. 
* Now, calculate the average of "Engine HP" again.
* Has it changed? 

Round both means before answering this question.

In [11]:
mean_hp = df['Engine HP'].mean()
mean_hp

249.38607007176023

In [12]:
df['Engine HP'].fillna(mean_hp).mean()

249.38607007176

Filling NAs with 0 changes the mean of "Engine HP":

In [13]:
df['Engine HP'].fillna(0).mean()

247.94174920261878
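
To answer the question as posed, round both means and compare them. The sketch below shows the pattern on a small made-up Series (`hp` is an illustrative stand-in, not the homework data).

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['Engine HP']; the values are made up for illustration.
hp = pd.Series([300.0, 230.0, np.nan, 335.0, np.nan])

mean_before = hp.mean()                     # NaNs are skipped by default
mean_after = hp.fillna(mean_before).mean()  # fill NaNs with the mean, then average again

# Compare after rounding, as the question asks.
print(round(mean_before), round(mean_after))
print(round(mean_before) == round(mean_after))
```

Any tiny difference between the two unrounded means comes from floating-point rounding, which is why the question asks for rounding first.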

### Question 6

* Select all the "Rolls-Royce" cars from the dataset.
* Select only columns "Engine HP", "Engine Cylinders", "highway MPG".
* Now drop all duplicated rows using the `drop_duplicates` method (you should get a dataframe with 7 rows).
* Get the underlying NumPy array. Let's call it `X`.
* Compute matrix-matrix multiplication between the transpose of `X` and `X`. To get the transpose, use `X.T`. Let's call the result `XTX`.
* Invert `XTX`.
* What's the sum of all the elements of the result?

In [14]:
df_rr = df[df.Make == "Rolls-Royce"]
df_rr = df_rr[["Engine HP", "Engine Cylinders", "highway MPG"]]
df_rr = df_rr.drop_duplicates()

In [15]:
X = df_rr.values
XTX = X.T.dot(X)

XTX_inv = np.linalg.inv(XTX)
XTX_inv.sum()

0.032212320677486125
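
A quick sanity check for a computed inverse is to multiply it back by the original matrix and compare the product with the identity. Here is a minimal sketch on a made-up 4x3 matrix (`A` is illustrative, not the Rolls-Royce data):

```python
import numpy as np

# Made-up 4x3 matrix standing in for X; not the Rolls-Royce data.
A = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])

ATA = A.T @ A                 # 3x3 matrix, equivalent to A.T.dot(A)
ATA_inv = np.linalg.inv(ATA)  # raises LinAlgError if ATA is singular

# A matrix times its inverse should equal the identity, up to rounding error.
print(np.allclose(ATA @ ATA_inv, np.eye(3)))
```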

### Question 7

* Create an array `y` with values `[1000, 1100, 900, 1200, 1000, 850, 1300]`.
* Multiply the inverse of `XTX` with the transpose of `X`, and then multiply the result by `y`. Call the result `w`.
* What's the value of the first element of `w`?

In [16]:
y = [1000, 1100, 900, 1200, 1000, 850, 1300]

In [17]:
w = XTX_inv.dot(X.T).dot(y)

In [18]:
w[0]

0.19989598183188978

> **Note**: we just implemented the normal equation:

$$w = (X^T X)^{-1} X^T y$$

We'll talk about it more next week (Machine Learning for Regression).
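
As a cross-check, the same least-squares problem can be solved without forming the inverse explicitly, for example with `np.linalg.lstsq`. The sketch below uses made-up data (`X_toy` and `y_toy` are illustrative, not the homework's `X` and `y`) and compares the two approaches:

```python
import numpy as np

# Made-up data: 5 observations, 2 features, with y = 1*x1 + 2*x2 exactly.
X_toy = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y_toy = np.array([5.0, 4.0, 11.0, 10.0, 15.0])

# Normal equation, written as in the formula above.
w_normal = np.linalg.inv(X_toy.T @ X_toy) @ X_toy.T @ y_toy

# Same problem solved by a least-squares routine, with no explicit inverse.
w_lstsq, *_ = np.linalg.lstsq(X_toy, y_toy, rcond=None)

print(w_normal)
print(np.allclose(w_normal, w_lstsq))
```

Both approaches should agree here; a solver such as `lstsq` tends to be the safer choice when $X^T X$ is close to singular.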

### Bonus

Floating-point arithmetic is not exact:

In [19]:
0.1 + 0.2

0.30000000000000004
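
Because of this, comparing floats with `==` can give surprising results. Comparing within a tolerance is a common workaround; here is a short sketch using `math.isclose` from the standard library and `np.isclose` from NumPy:

```python
import math

import numpy as np

total = 0.1 + 0.2

print(total == 0.3)              # False: the two values differ in the last bits
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance
print(np.isclose(total, 0.3))    # True: NumPy's tolerance-based comparison
```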

Adding the mean value doesn't change the resulting mean:

In [20]:
np.array([1, 2, 3, 4, 5, 6]).mean()

3.5

In [21]:
np.array([1, 2, 3, 4, 5, 6, 3.5, 3.5, 3.5]).mean()

3.5
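
This is why filling missing values with the mean in Question 5 leaves the mean unchanged, apart from floating-point rounding. For $n$ values $x_1, \dots, x_n$ with mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, appending $k$ copies of $\bar{x}$ gives:

$$\frac{\sum_{i=1}^{n} x_i + k\bar{x}}{n + k} = \frac{n\bar{x} + k\bar{x}}{n + k} = \bar{x}$$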