# Real Estate estimator

In this challenge, we want to estimate the **price** of a flat based on data from other flats.

In this exercise, pandas is forbidden ⚠️

The [NumPy documentation](https://docs.scipy.org/doc/numpy/reference/) will be your friend throughout this exercise, as NumPy is the only library you are allowed to import. You can also find help in this [NumPy cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf).

In [None]:
# Load the NumPy library


Given the 4 flats below, we want to find the relation between the `price` (in k$) and 3 criteria: `surface` (square feet), `bedrooms` and `floors`. These criteria are the **features** of the estimator.

|flats |surface|bedrooms|floors|price|
|------|-------------|--------|------|------------|
|flat1 |620|1|1|244|
|flat2 |3280|4|2|671|
|flat3 |1900|2|2|504|
|flat4 |1320|3|3|510|

A first approach is to find a linear relation between the `price` and the features by solving this system of equations:

$$\begin{cases}
    244 = \theta_0 + 620\theta_1 + 1\theta_2 + 1\theta_3 \\
    671 = \theta_0 + 3280\theta_1 + 4\theta_2 + 2\theta_3 \\
    504 = \theta_0 + 1900\theta_1 + 2\theta_2 + 2\theta_3 \\
    510 = \theta_0 + 1320\theta_1 + 3\theta_2 + 3\theta_3
\end{cases}$$

This can be written as a matrix equation:

$$Y = X\theta$$

where $Y$ is the vector of `price` values, $X$ is the matrix of features and $\theta$ (theta) is the vector of coefficients to be found.

## 1. Define the matrix `x` of features

_Hint: `x` should be a 4 by 3 `numpy.ndarray`_
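
As a minimal sketch of the kind of objects used in this step and the next, the snippet below builds a 2-D `numpy.ndarray` from nested lists and reshapes a 1-D array into a column vector. The values and the names `toy_features`, `toy_target` and `toy_column` are hypothetical and are not the flats' data.

```python
import numpy as np

# Hypothetical toy data: 3 samples described by 2 features each
toy_features = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
])
print(type(toy_features).__name__, toy_features.shape)  # ndarray (3, 2)

# A 1-D array holding 3 target values
toy_target = np.array([10.0, 20.0, 30.0])
print(toy_target.shape)  # (3,)

# reshape(-1, 1) turns it into a 3 by 1 column vector;
# -1 lets NumPy infer the number of rows
toy_column = toy_target.reshape(-1, 1)
print(toy_column.shape)  # (3, 1)
```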

In [None]:
# YOUR CODE HERE


## 2. Define the vector `Y` of `price`s

In [None]:
# Define Y here


In [None]:
# Make Y a 4 by 1 vector with the right NumPy method


## 3. Create the matrix `X` representing the linear system of equations

As you probably noticed, the linear system of equations includes a $\theta_0$ coefficient which appears in all 4 equations. This coefficient is here to represent an [affine relation](https://math.stackexchange.com/questions/275310/what-is-the-difference-between-linear-and-affine-function) rather than a strictly linear relation between the `price` and the features. As a result, we need to add one last _feature_ $x_0$, a column of ones, to the matrix $x$.
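
One possible way to prepend such a column is sketched below on hypothetical toy data (the names `toy_x`, `toy_x0` and `toy_X` are illustrative only). It assumes `numpy.ones` to create the column and `numpy.hstack` to glue it to the features; other NumPy functions could do the same job.

```python
import numpy as np

# Hypothetical toy feature matrix: 3 samples, 2 features
toy_x = np.array([
    [2.0, 7.0],
    [4.0, 8.0],
    [6.0, 9.0],
])

# Column of ones with one row per sample
toy_x0 = np.ones((toy_x.shape[0], 1))

# Horizontal stacking puts the column of ones in front of the features
toy_X = np.hstack((toy_x0, toy_x))
print(toy_X.shape)  # (3, 3)
print(toy_X)
```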

In [None]:
# Define x0 as a 4 by 1 vector filled with 1 with the right NumPy method


The complete matrix $X$ should look like:

$$\begin{bmatrix}
    1 & 620 & 1 & 1 \\
    1 & 3280 & 4 & 2 \\
    1 & 1900 & 2 & 2 \\
    1 & 1320 & 3 & 3
\end{bmatrix}$$

In [None]:
# Use x0 and x to define the matrix X with the right NumPy method


## 4. Find the solution of the system

Now is the time to find the vector of coefficients $\theta$!

The solution of the equation is:
 
$$Y = X\theta \Leftrightarrow X\theta = Y \Leftrightarrow X^{-1}X\theta = X^{-1}Y \Leftrightarrow \theta = X^{-1}Y$$

where $X^{-1}$ is the inverse of $X$.

In [None]:
# Compute the inverse of the matrix X with the right NumPy method


You can check that the inversion worked by testing:

$$X^{-1}X = I_4$$
where $I_4$ is the 4 by 4 identity matrix.

In [None]:
# Define I4 using the right NumPy method


Now compute $X^{-1}X$:

In [None]:
# YOUR CODE HERE


Does it look like $I_4$?

If not, you probably used the `*` operator to multiply $X^{-1}$ and $X$. On NumPy arrays, `*` performs an element-wise product. Here we want the matrix product, so you should find the right NumPy method to do so.

If so, you noticed that you do not get exact $0$s and $1$s in the resulting product, because of floating-point rounding. To be sure, you can use the [`numpy.allclose()`](https://numpy.org/doc/stable/reference/generated/numpy.allclose.html?highlight=allclose#numpy.allclose) function to check your result:
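
The difference between the two products can be sketched on a hypothetical 2 by 2 matrix (`toy_A` is not part of the exercise data). The sketch assumes the `@` operator for the matrix product and `numpy.eye` for the identity matrix.

```python
import numpy as np

# Hypothetical invertible 2 by 2 matrix
toy_A = np.array([
    [2.0, 1.0],
    [1.0, 1.0],
])
toy_A_inv = np.linalg.inv(toy_A)

# `*` multiplies entry by entry: this is NOT the matrix product
elementwise = toy_A * toy_A_inv

# `@` performs the matrix product
matrix_product = toy_A @ toy_A_inv

toy_identity = np.eye(2)
print(np.allclose(elementwise, toy_identity))     # False
print(np.allclose(matrix_product, toy_identity))  # True
```

`numpy.allclose` compares the arrays up to a small tolerance, which absorbs floating-point rounding in the computed inverse.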

In [None]:
# YOUR CODE HERE


You are finally able to find $\theta = X^{-1}Y$:
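
As an illustration of the formula on a smaller problem, the sketch below solves a hypothetical 2 by 2 system (`toy_A`, `toy_b` and `toy_theta` are illustrative names, not the flats' data). It also compares the result with `numpy.linalg.solve`, which solves the system without explicitly forming the inverse.

```python
import numpy as np

# Hypothetical system:
#   3 = theta_0 + 1 * theta_1
#   5 = theta_0 + 2 * theta_1
toy_A = np.array([
    [1.0, 1.0],
    [1.0, 2.0],
])
toy_b = np.array([
    [3.0],
    [5.0],
])

# theta = A^{-1} b, using the matrix product
toy_theta = np.linalg.inv(toy_A) @ toy_b
print(toy_theta)  # approximately [[1.], [2.]]

# Same system solved directly, without computing the inverse
toy_theta_direct = np.linalg.solve(toy_A, toy_b)
print(np.allclose(toy_theta, toy_theta_direct))  # True
```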

In [None]:
# Compute theta


What do you think about these coefficients? How does the `price` evolve as the `surface` increases? What about when the number of `bedrooms` or `floors` increases?

You can plot the `price` against one of the features to visualize this relation.

In [None]:
# YOUR PLOT HERE


## 5. Estimation

Now that you have solved the system and found $\theta$, you can estimate the `price` (in thousands of $) of a 5th flat with these characteristics:

- `surface`: 3000 $ft^2$
- `bedrooms`: 5
- `floors`: 1

with the following formula:

$$Y_{flat5} = X_{flat5}\theta$$
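
The mechanics of this formula can be sketched with hypothetical coefficients (`toy_theta` and `toy_new_row` are made-up values, not the solution of the exercise). The key point is that the new row must start with a 1 so that it multiplies $\theta_0$.

```python
import numpy as np

# Hypothetical coefficients: theta_0 = 1 (intercept), theta_1 = 2
toy_theta = np.array([
    [1.0],
    [2.0],
])

# New sample with a single feature equal to 4, preceded by the constant 1
toy_new_row = np.array([[1.0, 4.0]])

# Estimate = 1 * 1 + 4 * 2
toy_estimate = toy_new_row @ toy_theta
print(toy_estimate)  # [[9.]]
```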

In [None]:
# Define X5

# Compute Y5

# You should find a price of about 526,000 $


## 6. What if we know more than 4 flats?

Let's now imagine we discover that the actual price of this 5th flat is 700,000$\$$, and we want to take this new information into account in our model in order to estimate the price of a 6th flat.  
Update the linear system of equations $X\theta = Y$ accordingly.

In [None]:
# Create the new X


In [None]:
# Create new Y


Try to solve the equation in one line using [`numpy.linalg.solve`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html), in order to find the new $\theta$ to predict the price of a 6th flat. What can you conclude?

In [None]:
# Your code


<details>
    <summary>👉 Answer (hidden)</summary>

We can't reuse the same approach: our new matrix $X$ has shape (5, 4), so it is not square and therefore not invertible. With 5 equations and only 4 unknowns, the system is overdetermined and generally has no exact solution.
There is no **deterministic** mathematical formula to compute exactly the price of each new flat based only on these 4 features.

Instead of solving $X\theta = Y$, one thing we could do is try to find the $\hat{\theta}$ that minimizes the error $e = X\hat{\theta} - Y$. This approach is called a **linear regression model**. If the error is measured using the Euclidean distance, the approach is called an **ordinary least squares** regression, and it is one of the most common machine learning algorithms!
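
As a rough preview, the sketch below shows both behaviours on a hypothetical overdetermined system with 3 equations and 2 unknowns (`toy_X` and `toy_Y` are made-up values). It assumes `numpy.linalg.lstsq` as the least-squares solver; this is one option among others and not necessarily the method used in the next exercise.

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns
toy_X = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
])
toy_Y = np.array([
    [1.0],
    [2.0],
    [2.0],
])

# numpy.linalg.solve only accepts square matrices
try:
    np.linalg.solve(toy_X, toy_Y)
    solve_failed = False
except np.linalg.LinAlgError as error:
    solve_failed = True
    print("solve failed:", error)

# Least squares: find theta_hat minimizing the Euclidean norm of X @ theta_hat - Y
toy_theta_hat, residuals, rank, singular_values = np.linalg.lstsq(
    toy_X, toy_Y, rcond=None
)
print(toy_theta_hat)  # approximately [[1.1667], [0.5]]
```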

This new estimator can then be used to give an **approximate** estimate of the price of any new flat.
Let's try to build one together in our next exercise!

</details>