# Linear Regression

A linear regression model is a supervised machine learning algorithm used to predict a continuous numerical output based on one or more input variables. It establishes a linear relationship between the input variables (also known as independent or predictor variables) and the output variable (also known as the dependent variable). In general, a linear regression model can be written as:

$$
y = f(x_1, x_2, x_3, \dots, x_n) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n \tag{1}
$$

Where,

- **$x_1, x_2, x_3, ..., x_n$** are the features (regressors) that describe the quantity of interest (regressand) **y**;
- **$\beta_1, \beta_2, \beta_3, ..., \beta_n$** are the coefficients of the features (regressors) $x_1, x_2, x_3, ..., x_n$;
- **$\alpha$** is the intercept (independent term), which measures the "starting level" of **y** when no regressor contributes.

Using vector notation, the regressors of **y** and their respective coefficients can be represented as follows:

$$
\vec{\beta} = [\beta_1, \beta_2, \beta_3, \dots, \beta_n] \quad \text{and} \quad \vec{x} = [x_1, x_2, x_3, \dots, x_n] \tag{2}
$$

Applying the concept of the **dot product** between two vectors, equation (1) can be written as:

$$
y = \alpha + \vec{\beta} \cdot \vec{x} \tag{3}
$$
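
As a quick check that equations (1) and (3) describe the same computation, the minimal sketch below evaluates both forms with NumPy. The values of `alpha`, `beta`, and `x`, and the helper name `predict`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def predict(alpha: float, beta: np.ndarray, x: np.ndarray) -> float:
    """Evaluate y = alpha + beta . x for a single observation (equation 3)."""
    return float(alpha + np.dot(beta, x))


# Hypothetical intercept, coefficients, and feature values (illustration only)
alpha = 1.0
beta = np.array([2.0, -1.0, 0.5])
x = np.array([3.0, 4.0, 2.0])

# Equation (1): explicit sum of coefficient * feature terms
y_sum = float(alpha + sum(b * xi for b, xi in zip(beta, x)))

# Equation (3): the same quantity written as a dot product
y_dot = predict(alpha, beta, x)

print(y_sum, y_dot)  # both forms give 4.0
```

The dot-product form is the one most numerical libraries work with, since it extends naturally to many observations stacked as rows of a matrix.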

To find the coefficients **$\beta_1, \beta_2, \beta_3, ..., \beta_n$**, linear regression packages commonly use a technique called **Ordinary Least Squares (OLS)**. Maximum likelihood estimation and the generalized method of moments are alternative approaches. OLS aims to minimize the sum of squared differences between the observed and predicted values.
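
To make the least-squares idea concrete, here is a minimal sketch that fits a linear model to synthetic data with `np.linalg.lstsq`. The data, the "true" parameters, and all variable names are assumptions made up for this example. It is not necessarily how any particular regression package implements OLS internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from known, made-up parameters
true_alpha = 2.0
true_beta = np.array([1.5, -0.7])
X = rng.normal(size=(200, 2))
y = true_alpha + X @ true_beta + rng.normal(scale=0.1, size=200)

# Prepend a column of ones so the intercept is estimated with the coefficients
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes the sum of squared residuals
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
alpha_hat, beta_hat = params[0], params[1:]

# Sum of squared differences between observed and predicted values
residuals = y - X_design @ params
sse = float(residuals @ residuals)

print(alpha_hat, beta_hat, sse)
```

Because the noise added to `y` is small, the estimates should land close to `true_alpha` and `true_beta`. Any other choice of parameters would give a larger sum of squared residuals on this data.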

To be continued...

# The Salto-SP Housing Dataset

**About the dataset**

This dataset was created to evaluate the prices of houses based on simple features. It is available on Kaggle at the following [link](https://www.kaggle.com/datasets/marcosangelo/saltosp-house-dataset-2022).

The dataset consists of 4 files:

- Houses for sale: real_estate_salto_sp_casa_venda_35.csv
- Apartments for sale: real_estate_salto_sp_apartamento_venda_35.csv
- Houses for rent: real_estate_salto_sp_casa_aluguel_3.csv
- Apartments for rent: real_estate_salto_sp_apartamento_aluguel_5.csv

They can be combined into a single file and analyzed together. Each file contains the following 7 columns:

- **nome_casa**: general description;
- **endereço_casa**: house address;
- **preço_casa**: house price;
- **metros_casa**: house area in square meters;
- **quartos_casa**: number of bedrooms;
- **banheiros_casa**: number of bathrooms;
- **vagas_casa**: number of parking spaces.

For more detail, I encourage enriching the data using the Google Maps APIs or IBGE to capture features such as zip code area, distance to downtown, and more.
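
Combining the four files into a single table could be sketched as follows. The helper `combine_listings` and the `listing_type` column are hypothetical names introduced for this example. The two in-memory CSVs are toy stand-ins with made-up rows, so that the snippet runs without the real files.

```python
import io

import pandas as pd


def combine_listings(sources: dict) -> pd.DataFrame:
    """Read each CSV source and stack them, tagging rows with their origin."""
    frames = []
    for label, source in sources.items():
        df = pd.read_csv(source)
        # Hypothetical helper column (not in the original files) recording
        # which file each row came from
        df["listing_type"] = label
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Toy in-memory stand-ins for two of the files (made-up values)
houses_sale = io.StringIO("quartos_casa,banheiros_casa\n3,2\n4,3\n")
apartments_rent = io.StringIO("quartos_casa,banheiros_casa\n2,1\n")

combined = combine_listings(
    {"house_sale": houses_sale, "apartment_rent": apartments_rent}
)
print(combined)
```

With the real data, the `io.StringIO` objects would be replaced by file paths such as `real_estate_salto_sp_casa_venda_35.csv`. Where those files live on disk, and whether their headers match the column names listed above, is not verified here.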

# Coding

```python
# Importing pandas library for data wrangling
import pandas as pd

# Importing numpy for algebra operations
import numpy as np
```

```python
# Reading SP-Salto Housing csv file
housing_df = pd.read_csv('../Data/real_estate_salto_sp.csv')
```

```python
# Displaying the loaded DataFrame
housing_df
```

| Unnamed: 0 | property_description | address | price | size_m2 | number_rooms | number_bathrooms | number_parking_spaces |
|---|---|---|---|---|---|---|---|
| 0 | Apartamento com 2 Quartos à Venda/Aluguel 76m2 | Rua Padre José de Anchieta, 86 - Vila Romão, S... | BRL 1.500 /month | 76 | 2 | 1 | 1 |
| 1 | Apartamento com 2 Quartos para Venda/Aluguel 62m2 | Jardim Sontag, Salto - SP | BRL 260.000 | 62 | 2 | 2 | 1 |
| 2 | Apartamento com 2 Quartos para Aluguel, 53m2 | Parque Bela Vista, Salto - SP | BRL 1.800 /month | 53 | 2 | 1 | 1 |
| 3 | Apartamento com 2 Quartos para Aluguel, 60m2 | Rua Estados Unidos, 195 - Guaraú, Salto - SP | BRL 290.000 | 60 | 2 | 2 | 1 |
| 4 | Apartamento com 2 Quartos para Aluguel, 75m2 | Da Estação, Salto - SP | BRL 1.550 /month | 75 | 2 | 2 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2506 | Casa com 4 Quartos à Venda, 360m2 | Rua Estado de Minas Gerais, 811 - Village Mout... | BRL 1.499.999 | 360 | 4 | 5 | 2 |
| 2507 | Casa com 3 Quartos à Venda, 219m2 | Rua Fortaleza, 10 - Jardim Panorama, Salto - SP | BRL 300.000 | 219 | 3 | 2 | 2 |
| 2508 | Casa com 3 Quartos à Venda, 180m2 | Itapecerica, Salto - SP | BRL 320.000 | 180 | 3 | 2 | 2 |
| 2509 | Casa com 4 Quartos à Venda, 300m2 | Rua Estado de Minas Gerais, 22 - Loteamento Te... | BRL 650.000 | 300 | 4 | 3 | 4 |
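
The `price` column in the preview above is stored as text such as `BRL 1.500 /month`, so it would need to be converted to a number before it can serve as the regression target. The sketch below is one possible way to do that. The helper `parse_price` and the new columns `price_brl` and `is_rent` are hypothetical names. It assumes that `.` is always a thousands separator and that rentals are always marked with `/month`, which holds for the rows shown but has not been checked against the full file.

```python
import pandas as pd


def parse_price(raw: str):
    """Split a string like 'BRL 1.500 /month' into (amount, is_rent)."""
    is_rent = "/month" in raw
    digits = raw.replace("BRL", "").replace("/month", "").strip()
    # Assumption: '.' is a thousands separator and no decimal part is present
    amount = float(digits.replace(".", ""))
    return amount, is_rent


# Price strings copied from the preview above
sample = pd.DataFrame(
    {"price": ["BRL 1.500 /month", "BRL 260.000", "BRL 1.499.999"]}
)

amounts, rent_flags = zip(*(parse_price(p) for p in sample["price"]))
sample["price_brl"] = list(amounts)   # numeric price in BRL
sample["is_rent"] = list(rent_flags)  # True for monthly rent, False for sale

print(sample)
```

Keeping a separate `is_rent` flag matters because monthly rents and sale prices are on very different scales and should probably not be mixed in a single regression target.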
