# Multivariate Regression

Also known as linear regression with multiple variables.

If we want to predict a home's price from its area, number of bedrooms, and age, we can model it as:

$$
price = m_1 * area + m_2 * bedrooms + m_3 * age + b
$$
where
- $area$, $bedrooms$ and $age$ are the independent variables, better known as **features**
- $price$ is the dependent variable
- $m_1$, $m_2$ and $m_3$ are the coefficients (weights) of the features
- $b$ is the bias term, or intercept
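As a quick illustration, the equation is a weighted sum of the features plus the bias, i.e. a dot product. The coefficient values below are made up for the example, not fitted from any data.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration (not fitted)
m = np.array([150.0, 10000.0, -500.0])  # m_1 (area), m_2 (bedrooms), m_3 (age)
b = 20000.0                              # bias / intercept

# One hypothetical house: area = 2000, bedrooms = 3, age = 10
x = np.array([2000.0, 3.0, 10.0])

# price = m_1*area + m_2*bedrooms + m_3*age + b, written as a dot product
price = np.dot(m, x) + b
print(price)  # 345000.0
```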

It can be generalized to $n$ features:

$$
y = m_1 * x_1 + m_2 * x_2 + ... + m_n * x_n + b
$$
The goal is to find the values of $m_1$, $m_2$, ..., $m_n$ and $b$ that minimize the prediction error, usually measured as the mean squared error between the predicted and actual values of $y$.
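A minimal sketch of how the coefficients can be found, assuming ordinary least squares (minimizing the sum of squared errors, which is equivalent to minimizing the mean squared error) and using NumPy's `np.linalg.lstsq`. The data is synthetic and noise-free, generated from known coefficients so the fit can be checked against them.

```python
import numpy as np

# Synthetic, noise-free data generated from known coefficients (illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))  # 50 samples, n = 3 features
true_m = np.array([2.0, -1.0, 0.5])
true_b = 4.0
y = X @ true_m + true_b

# Append a column of ones so the bias b is estimated as one more coefficient
X_aug = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: minimizes the sum of squared errors ||X_aug @ w - y||^2
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
m_hat, b_hat = w[:-1], w[-1]
print(m_hat, b_hat)  # close to true_m and true_b
```

Appending the column of ones is a common trick: it lets the same solver estimate $b$ alongside $m_1, \dots, m_n$.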

For this example we use a hiring dataset with each candidate's experience, test score, interview score, and salary. We will predict the salary from experience, test score, and interview score.

We will try to estimate the salaries for the following candidates:
1. experience: 2, test score: 9, interview score: 6
2. experience: 12, test score: 10, interview score: 10

```python
import pandas as pd
import numpy as np
from word2number import w2n
```

```python
data = pd.read_csv('../../../z - Attachments/02-hiring.csv')
data
```

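|   | experience | test_score(out of 10) | interview_score(out of 10) | salary($) |
|---|---|---|---|---|
| 0 | NaN | 8.0 | 9 | 50000 |
| 1 | NaN | 8.0 | 6 | 45000 |
| 2 | five | 6.0 | 7 | 60000 |
| 3 | two | 10.0 | 10 | 65000 |
| 4 | seven | 9.0 | 6 | 70000 |
| 5 | three | 7.0 | 10 | 62000 |
| 6 | ten | NaN | 7 | 72000 |
| 7 | eleven | 7.0 | 8 | 80000 |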


## Cleaning the data

The table above shows two problems: some values are missing (NaN in `experience` and `test_score`), and the experience values are stored as words ('five' instead of 5). We need to clean the data before we can use it.
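The missing values can also be counted directly instead of spotted by eye. This sketch rebuilds the table above inline (values copied from the output) so it runs without the CSV; on the loaded frame, `data.isna().sum()` does the same.

```python
import numpy as np
import pandas as pd

# Rebuild the table above inline so the snippet runs without the CSV file
data = pd.DataFrame({
    'experience': [np.nan, np.nan, 'five', 'two', 'seven', 'three', 'ten', 'eleven'],
    'test_score(out of 10)': [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, np.nan, 7.0],
    'interview_score(out of 10)': [9, 6, 7, 10, 6, 10, 7, 8],
    'salary($)': [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})

# Count missing values per column
print(data.isna().sum())
# 'experience' shows up as a text (non-numeric) column because it holds words
print(data.dtypes)
```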

We will also rename the columns to make it easier to work with.
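Assigning a list to `data.columns` depends on the CSV's column order. A sketch of an alternative that maps each original name to its short name with `DataFrame.rename`; passing `errors='raise'` makes it fail if a listed column does not exist, instead of silently skipping it. The two-row frame is only there to make the snippet self-contained.

```python
import pandas as pd

# Two rows with the original column names, just to demonstrate the rename
data = pd.DataFrame({
    'experience': ['five', 'two'],
    'test_score(out of 10)': [6.0, 10.0],
    'interview_score(out of 10)': [7, 10],
    'salary($)': [60000, 65000],
})

# Map each original name to its short name; column order no longer matters
data = data.rename(
    columns={
        'experience': 'exp',
        'test_score(out of 10)': 'test',
        'interview_score(out of 10)': 'interview',
        'salary($)': 'salary',
    },
    errors='raise',  # raise KeyError if any old name is missing
)
print(list(data.columns))  # ['exp', 'test', 'interview', 'salary']
```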

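```python
# rename the columns to exp, test, interview, salary
data.columns = ['exp', 'test', 'interview', 'salary']

def convert_experience(val):
    """Convert a number word such as 'five' to 5; keep missing values as NaN."""
    if pd.isna(val):
        return np.nan
    try:
        return w2n.word_to_num(val)
    except (ValueError, TypeError):
        # leave unrecognized values as-is; astype('int64') below will then raise
        return val

# treat missing experience as 0 years, then store the column as integers
data['exp'] = data['exp'].apply(convert_experience).fillna(0).astype('int64')
data['exp']
```
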
```
0     0
1     0
2     5
3     2
4     7
5     3
6    10
7    11
Name: exp, dtype: int64
```
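After this step `test` still has one missing value. A sketch of the remaining steps, under assumptions that are not from the original notes: the missing test score is filled with the column median, and the model is fitted with NumPy's least-squares solver (ordinary least squares) rather than any particular library's regression class. It then predicts salaries for the two candidates listed at the top of this section. The cleaned values are typed in inline so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Cleaned data: exp already converted (missing -> 0), columns renamed
data = pd.DataFrame({
    'exp': [0, 0, 5, 2, 7, 3, 10, 11],
    'test': [8.0, 8.0, 6.0, 10.0, 9.0, 7.0, np.nan, 7.0],
    'interview': [9, 6, 7, 10, 6, 10, 7, 8],
    'salary': [50000, 45000, 60000, 65000, 70000, 62000, 72000, 80000],
})

# Assumption: fill the missing test score with the median of the observed scores
data['test'] = data['test'].fillna(data['test'].median())

# Design matrix with a column of ones for the bias term b
features = ['exp', 'test', 'interview']
X = np.column_stack([data[features].to_numpy(dtype=float), np.ones(len(data))])
y = data['salary'].to_numpy(dtype=float)

# Ordinary least squares fit: w = [m_1, m_2, m_3, b]
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Candidates from the start of the section: (experience, test, interview)
candidates = np.array([[2, 9, 6], [12, 10, 10]], dtype=float)
predicted = candidates @ w[:-1] + w[-1]
print(predicted)
```

The median is less sensitive to outliers than the mean, which is one reason it is a common default for filling a single missing score; with only eight rows, the predictions should be read as rough estimates.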