# Guided Practice - Regression IV

## Airbnb open data for New York City

#### Since 2008, guests and hosts have used Airbnb to expand their travel possibilities and to offer a more unique, personalized way of experiencing the world. This dataset describes the listing activity and metrics in NYC, NY for 2019.

### Contents

#### The file `'AB_NYC_2019.csv'` includes all the information needed to find out more about hosts and geographical availability, along with the metrics needed to make predictions and draw conclusions.

#### This [public](http://insideairbnb.com/) dataset is part of [Airbnb](https://www.airbnb.com.br/).

<img src="AirBNBNewYork.png" width="735" height="616" align="center"/>

#### Let's import the required libraries.

In [1]:
import pandas as pd
import numpy as np

#### We read the file and print the first rows.

In [2]:
df = pd.read_csv('AB_NYC_2019.csv')
df.head()
#df.columns

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.1,1,0


#### We create a function to carry out an exploratory data analysis (Exploratory Data Analysis - [EDA](https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15)).

In [3]:
def EDA(df):
    """Summarize every column of df: missing values, dtype, counts and basic statistics."""
    eda_df = {}
    eda_df['Amount_NaN'] = df.isnull().sum()
    eda_df['%_NaN'] = df.isnull().mean().round(2)
    eda_df['DType'] = df.dtypes
    eda_df['Amount_Data'] = df.count()
    
    # Number of distinct values per column, counting NaN as a value.
    # nunique() returns a Series indexed by column name, so it lines up with the
    # other entries by label (a plain list would be matched by position only).
    eda_df['Amount_Unique'] = df.nunique(dropna=False)
    
    # Coerce to numbers so the statistics also cover numeric data stored as object;
    # text becomes NaN (df.mean() on text columns raises in pandas >= 2.0).
    numeric_df = df.apply(pd.to_numeric, errors='coerce')
    eda_df['Mean'] = numeric_df.mean().round(2)
    eda_df['Median'] = numeric_df.median().round(2)
    
    eda_df['Max'] = df.max()
    eda_df['Min'] = df.min()
    eda_df['STD'] = numeric_df.std().round(2)
    
    # One row per column of df, sorted by column name
    return pd.DataFrame(eda_df).sort_index()
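
#### As a cross-check on the `Amount_Unique` column, pandas can count distinct values per column directly. The sketch below uses a small made-up frame (`toy` is hypothetical, not part of the dataset) and assumes a missing value should count as one more distinct value, which is what `len(df[col].unique())` does.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration
toy = pd.DataFrame({
    'room_type': ['Private room', 'Shared room', 'Private room', None],
    'price': [100, 80, 100, 120],
})

# Distinct values per column; dropna=False counts a missing value as a distinct value
unique_counts = toy.nunique(dropna=False)

# Same numbers computed column by column with unique()
manual_counts = pd.Series({col: len(toy[col].unique()) for col in toy.columns})

print(unique_counts)
print(manual_counts)
```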

#### We call the function defined above to build the summary `DataFrame` and display its first rows.

In [4]:
informacao_df = EDA(df)
informacao_df

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,int64,48895,366,112.78,45.0,365,0,131.62
calculated_host_listings_count,0,0.0,int64,48895,47,7.14,1.0,327,1,32.95
host_id,0,0.0,int64,48895,37457,67620010.65,30793816.0,274321313,2438,78610967.03
host_name,21,0.0,object,48874,11453,,,,,
id,0,0.0,int64,48895,48895,19017143.24,19677284.0,36487245,2539,10983108.39
last_review,10052,0.21,object,38843,1765,,,,,
latitude,0,0.0,float64,48895,19048,40.73,40.72,40.9131,40.4998,0.05
longitude,0,0.0,float64,48895,14718,-73.95,-73.96,-73.713,-74.2444,0.05
minimum_nights,0,0.0,int64,48895,109,7.03,3.0,1250,1,20.51
name,16,0.0,object,48879,47906,,,,,


### Initial cleaning

### Columns that stand out:

- last_review - to keep things simple, my suggestion is to drop it, since we have no reference date

- reviews_per_month - fill in the missing values, so as not to lose too much data

- name - we can drop the rows with missing data

- host_name - we can drop the rows with missing data

- id - unique values

#### We perform the cleaning.

In [5]:
del df['last_review']

df['reviews_per_month'] = df.reviews_per_month.fillna(df.reviews_per_month.mean())

df = df.dropna(subset = ['name', 
                         'host_name'
                        ]
              )

del df['id']
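
#### The two cleaning steps used above, mean imputation and dropping rows by column, can be seen on a small scale in the sketch below. The frame `toy` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing name and one missing rate
toy = pd.DataFrame({
    'name': ['Loft', None, 'Studio', 'Room'],
    'reviews_per_month': [1.0, 2.0, np.nan, 3.0],
})

# Mean imputation: the mean is computed over the non-missing values only
mean_reviews = toy['reviews_per_month'].mean()
toy['reviews_per_month'] = toy['reviews_per_month'].fillna(mean_reviews)

# Drop only the rows where 'name' is missing
toy = toy.dropna(subset=['name'])

print(mean_reviews)
print(toy)
```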

#### We check for changes in the `DataFrame`.

In [6]:
informacao_df2 = EDA(df)
informacao_df2

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,int64,48858,366,112.8,45.0,365,0,131.61
calculated_host_listings_count,0,0.0,int64,48858,47,7.15,1.0,327,1,32.96
host_id,0,0.0,int64,48858,37425,67631688.29,30791331.0,274321313,2438,78623888.99
host_name,0,0.0,object,48858,11450,,,현선,'Cil,
latitude,0,0.0,float64,48858,19039,40.73,40.72,40.9131,40.4998,0.05
longitude,0,0.0,float64,48858,14716,-73.95,-73.96,-73.713,-74.2444,0.05
minimum_nights,0,0.0,int64,48858,108,7.01,3.0,1250,1,20.02
name,0,0.0,object,48858,47884,,,"ﾏﾝﾊｯﾀﾝ､駅から徒歩4分でどこに行くのにも便利な場所!女性の方希望,ｷﾚｲなお部屋｡",1 Bed Apt in Utopic Williamsburg,
neighbourhood,0,0.0,object,48858,221,,,Woodside,Allerton,
neighbourhood_group,0,0.0,object,48858,5,,,Staten Island,Bronx,


#### After this initial cleaning, we keep simplifying the problem by removing the columns `'name'`, `'host_id'` and `'host_name'`.

In [7]:
del df['name']
del df['host_id']
del df['host_name']

#### We check for changes in the `DataFrame`.

In [8]:
informacao_df3 = EDA(df)
informacao_df3

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,int64,48858,366,112.8,45.0,365,0,131.61
calculated_host_listings_count,0,0.0,int64,48858,47,7.15,1.0,327,1,32.96
latitude,0,0.0,float64,48858,19039,40.73,40.72,40.9131,40.4998,0.05
longitude,0,0.0,float64,48858,14716,-73.95,-73.96,-73.713,-74.2444,0.05
minimum_nights,0,0.0,int64,48858,108,7.01,3.0,1250,1,20.02
neighbourhood,0,0.0,object,48858,221,,,Woodside,Allerton,
neighbourhood_group,0,0.0,object,48858,5,,,Staten Island,Bronx,
number_of_reviews,0,0.0,int64,48858,394,23.27,5.0,629,0,44.55
price,0,0.0,int64,48858,674,152.74,106.0,10000,0,240.23
reviews_per_month,0,0.0,float64,48858,938,1.37,1.22,58.5,0.01,1.5


#### Looking at the remaining categorical columns:

In [9]:
informacao_df3.loc[informacao_df3['DType'] == 'object']

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
neighbourhood,0,0.0,object,48858,221,,,Woodside,Allerton,
neighbourhood_group,0,0.0,object,48858,5,,,Staten Island,Bronx,
room_type,0,0.0,object,48858,3,,,Shared room,Entire home/apt,


#### We handle the categorical variables with the function [.get_dummies()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html), which converts categorical variables into dummy / indicator variables. We apply it to `'neighbourhood_group'` and `'room_type'`.

In [10]:
df = pd.get_dummies(df, columns = ['neighbourhood_group', 'room_type'])
df.head()

Unnamed: 0,neighbourhood,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,neighbourhood_group_Bronx,neighbourhood_group_Brooklyn,neighbourhood_group_Manhattan,neighbourhood_group_Queens,neighbourhood_group_Staten Island,room_type_Entire home/apt,room_type_Private room,room_type_Shared room
0,Kensington,40.64749,-73.97237,149,1,9,0.21,6,365,0,1,0,0,0,0,1,0
1,Midtown,40.75362,-73.98377,225,1,45,0.38,2,355,0,0,1,0,0,1,0,0
2,Harlem,40.80902,-73.9419,150,3,0,1.373221,1,365,0,0,1,0,0,0,1,0
3,Clinton Hill,40.68514,-73.95976,89,1,270,4.64,1,194,0,1,0,0,0,1,0,0
4,East Harlem,40.79851,-73.94399,80,10,9,0.1,1,0,0,0,1,0,0,1,0,0
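
#### With one indicator per category, the indicators of a variable add up to 1 in every row, which duplicates the intercept of a linear model (the so-called dummy variable trap) and can make the coefficients unstable. The sketch below, on a made-up frame (`toy` is hypothetical), shows the `drop_first` option of `pd.get_dummies`, one common way to avoid that. The notebook itself keeps all categories.

```python
import pandas as pd

# Hypothetical toy frame, only for illustration
toy = pd.DataFrame({
    'room_type': ['Private room', 'Entire home/apt', 'Shared room', 'Private room']
})

# One indicator column per category (what the cell above does)
full = pd.get_dummies(toy, columns=['room_type'])

# drop_first=True omits the first category, which then acts as the baseline
reduced = pd.get_dummies(toy, columns=['room_type'], drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())

# With the full encoding, the indicators of each row always sum to 1
print(full.sum(axis=1).tolist())
```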


#### We check for changes in the `DataFrame`.

In [11]:
informacao_df4 = EDA(df)
informacao_df4

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,int64,48858,366,112.8,45.0,365,0,131.61
calculated_host_listings_count,0,0.0,int64,48858,47,7.15,1.0,327,1,32.96
latitude,0,0.0,float64,48858,19039,40.73,40.72,40.9131,40.4998,0.05
longitude,0,0.0,float64,48858,14716,-73.95,-73.96,-73.713,-74.2444,0.05
minimum_nights,0,0.0,int64,48858,108,7.01,3.0,1250,1,20.02
neighbourhood,0,0.0,object,48858,221,,,Woodside,Allerton,
neighbourhood_group_Bronx,0,0.0,uint8,48858,2,0.02,0.0,1,0,0.15
neighbourhood_group_Brooklyn,0,0.0,uint8,48858,2,0.41,0.0,1,0,0.49
neighbourhood_group_Manhattan,0,0.0,uint8,48858,2,0.44,0.0,1,0,0.5
neighbourhood_group_Queens,0,0.0,uint8,48858,2,0.12,0.0,1,0,0.32


#### At this point we split our `dataset` into a training subset and a test subset. To do so, we import the function [sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), which splits arrays or matrices into random training and test subsets.


#### In addition, we isolate our target variable (`target`).

In [12]:
from sklearn.model_selection import train_test_split
y = df['price']
del df['price']

#### And we split the data into the two subsets.

In [13]:
x_treino, x_teste, y_treino, y_teste = train_test_split(df.values, y, test_size = 0.3)

In [14]:
x_treino.shape

(34200, 16)

In [15]:
x_teste.shape

(14658, 16)
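
#### The split above shuffles the rows differently on every run. A minimal sketch, on made-up data, of two optional variations: `random_state` makes the split reproducible, and passing a `DataFrame` instead of `.values` keeps the column names, dtypes and index. The names `toy_X` and `toy_y` and the seed 42 are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy features and target
toy_X = pd.DataFrame({
    'minimum_nights': range(10),
    'neighbourhood': list('aabbccddee'),
})
toy_y = pd.Series(range(100, 110), name='price')

# random_state fixes the shuffle so the split is reproducible (42 is arbitrary)
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=42
)

print(X_tr.shape, X_te.shape)
print(X_tr.dtypes)
```

#### Because the inputs are pandas objects, `X_tr` and `y_tr` share the same index, which makes it easy to verify that features and targets still refer to the same rows.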

#### The split between training and test returns arrays, so we convert them back into `dataframes` in order to apply a transformation.

In [16]:
x_teste_df = pd.DataFrame(x_teste, 
                          columns = df.columns.tolist()
                         )
x_treino_df = pd.DataFrame(x_treino, 
                           columns = df.columns.tolist()
                          )

#### We apply the following to the `'neighbourhood'` column of the training subset:

- [.value_counts()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html), which returns a Series with the counts of unique values (with `normalize = True`, their relative frequencies).

- `.index`, the attribute that holds the labels of that Series, here the neighbourhood names.

- [.tolist()](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html), which returns the values as a list.

#### We then create a new `dataframe`, `frequencia_df`, with a column of neighbourhoods, `'neighbourhood'`, and a column with their relative frequencies, `'frequencia_neighbourhood'`.

In [17]:
frequencia_df = pd.DataFrame()
frequencia_df['neighbourhood'] = x_treino_df['neighbourhood'].value_counts(normalize = True).index.tolist()
frequencia_df['frequencia_neighbourhood'] = x_treino_df['neighbourhood'].value_counts(normalize = True).tolist()
frequencia_df.head()

Unnamed: 0,neighbourhood,frequencia_neighbourhood
0,Williamsburg,0.080877
1,Bedford-Stuyvesant,0.075117
2,Harlem,0.054152
3,Bushwick,0.050409
4,Hell's Kitchen,0.041637


#### We join the `dataframes` `x_treino_df` and `frequencia_df` with the [.merge()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) method, which merges `DataFrames` or named Series objects with a database-style join.

In [18]:
x_treino_df = x_treino_df.merge(frequencia_df, 
                                on = 'neighbourhood'
                               )
x_treino_df.head()

Unnamed: 0,neighbourhood,latitude,longitude,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,neighbourhood_group_Bronx,neighbourhood_group_Brooklyn,neighbourhood_group_Manhattan,neighbourhood_group_Queens,neighbourhood_group_Staten Island,room_type_Entire home/apt,room_type_Private room,room_type_Shared room,frequencia_neighbourhood
0,Kips Bay,40.7436,-73.9806,6,4,0.08,1,0,0,0,1,0,0,1,0,0,0.009474
1,Kips Bay,40.7438,-73.9791,2,2,0.04,1,0,0,0,1,0,0,1,0,0,0.009474
2,Kips Bay,40.7426,-73.9791,2,35,3.79,3,267,0,0,1,0,0,0,1,0,0.009474
3,Kips Bay,40.7429,-73.9803,4,5,0.1,1,184,0,0,1,0,0,1,0,0,0.009474
4,Kips Bay,40.7405,-73.9819,3,1,1.0,1,263,0,0,1,0,0,1,0,0,0.009474
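
#### One caveat: in the output above the first rows all belong to the same neighbourhood, so the merge returned the rows in a different order than `x_treino`, while `y_treino` keeps the original order. Features and targets then no longer line up row by row. The sketch below, on made-up data, shows a frequency encoding with `Series.map` that leaves the row order untouched; passing `how = 'left'` to `.merge()` is another option. The names `toy_X` and `toy_y` are hypothetical.

```python
import pandas as pd

# Hypothetical toy training data
toy_X = pd.DataFrame({
    'neighbourhood': ['Harlem', 'Midtown', 'Harlem', 'Chelsea', 'Harlem']
})
toy_y = pd.Series([150, 225, 89, 300, 80], name='price')

# Relative frequency of each neighbourhood in the (toy) training data
freq = toy_X['neighbourhood'].value_counts(normalize=True)

# map looks up each row's neighbourhood in freq; rows stay where they are
toy_X['frequencia_neighbourhood'] = toy_X['neighbourhood'].map(freq)

print(toy_X)
```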


#### We call the EDA function we defined earlier once again.

In [19]:
informacao_df5 = EDA(x_treino_df)
informacao_df5

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,object,34200,366,112.28,44.0,365,0,131.59
calculated_host_listings_count,0,0.0,object,34200,47,7.16,1.0,327,1,33.07
frequencia_neighbourhood,0,0.0,float64,34200,113,0.03,0.02,0.0808772,2.92398e-05,0.03
latitude,0,0.0,object,34200,16454,40.73,40.72,40.9131,40.4998,0.05
longitude,0,0.0,object,34200,12894,-73.95,-73.96,-73.7169,-74.2444,0.05
minimum_nights,0,0.0,object,34200,94,6.99,2.0,1250,1,20.48
neighbourhood,0,0.0,object,34200,218,,,Woodside,Allerton,
neighbourhood_group_Bronx,0,0.0,object,34200,2,0.02,0.0,1,0,0.15
neighbourhood_group_Brooklyn,0,0.0,object,34200,2,0.41,0.0,1,0,0.49
neighbourhood_group_Manhattan,0,0.0,object,34200,2,0.44,0.0,1,0,0.5


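#### We also join the `dataframes` `x_teste_df` and `frequencia_df` with the [.merge()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) method. Here we pass `how = 'left'` so that every test row is kept, including those whose neighbourhood does not appear in the training subset; those rows get `NaN` in `'frequencia_neighbourhood'`.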

In [20]:
#x_teste_df = x_teste_df.merge(frequencia_df, on = 'neighbourhood')
x_teste_df = x_teste_df.merge(frequencia_df, 
                              how = 'left', 
                              on = 'neighbourhood'
                             )

In [21]:
x_teste_df.head()

Unnamed: 0,neighbourhood,latitude,longitude,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,neighbourhood_group_Bronx,neighbourhood_group_Brooklyn,neighbourhood_group_Manhattan,neighbourhood_group_Queens,neighbourhood_group_Staten Island,room_type_Entire home/apt,room_type_Private room,room_type_Shared room,frequencia_neighbourhood
0,Kensington,40.6402,-73.9717,2,1,0.03,1,0,0,1,0,0,0,0,1,0,0.003246
1,Bedford-Stuyvesant,40.6861,-73.9323,30,0,1.37322,27,331,0,1,0,0,0,0,1,0,0.075117
2,Chelsea,40.7445,-73.9994,30,0,1.37322,232,292,0,0,1,0,0,1,0,0,0.022573
3,South Slope,40.6684,-73.9865,2,83,3.51,3,201,0,1,0,0,0,1,0,0,0.006053
4,East Flatbush,40.6564,-73.9188,7,0,1.37322,9,365,0,1,0,0,0,0,1,0,0.010088


#### And once more we call the EDA function we defined earlier.

In [22]:
informacao_df5 = EDA(x_teste_df)
informacao_df5

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,object,14658,366,114.01,48.0,365,0,131.67
calculated_host_listings_count,0,0.0,object,14658,47,7.11,1.0,327,1,32.72
frequencia_neighbourhood,3,0.0,float64,14655,114,0.03,0.02,0.0808772,2.92398e-05,0.03
latitude,0,0.0,object,14658,10155,40.73,40.72,40.9048,40.5229,0.05
longitude,0,0.0,object,14658,8595,-73.95,-73.96,-73.713,-74.2124,0.05
minimum_nights,0,0.0,object,14658,74,7.06,3.0,999,1,18.9
neighbourhood,0,0.0,object,14658,209,,,Woodside,Allerton,
neighbourhood_group_Bronx,0,0.0,object,14658,2,0.02,0.0,1,0,0.15
neighbourhood_group_Brooklyn,0,0.0,object,14658,2,0.41,0.0,1,0,0.49
neighbourhood_group_Manhattan,0,0.0,object,14658,2,0.44,0.0,1,0,0.5


In [23]:
x_teste_df['frequencia_neighbourhood'] = x_teste_df['frequencia_neighbourhood'].fillna(0)
informacao_df5 = EDA(x_teste_df)
informacao_df5

Unnamed: 0,Amount_NaN,%_NaN,DType,Amount_Data,Amount_Unique,Mean,Median,Max,Min,STD
availability_365,0,0.0,object,14658,366,114.01,48.0,365,0,131.67
calculated_host_listings_count,0,0.0,object,14658,47,7.11,1.0,327,1,32.72
frequencia_neighbourhood,0,0.0,float64,14658,114,0.03,0.02,0.0808772,0,0.03
latitude,0,0.0,object,14658,10155,40.73,40.72,40.9048,40.5229,0.05
longitude,0,0.0,object,14658,8595,-73.95,-73.96,-73.713,-74.2124,0.05
minimum_nights,0,0.0,object,14658,74,7.06,3.0,999,1,18.9
neighbourhood,0,0.0,object,14658,209,,,Woodside,Allerton,
neighbourhood_group_Bronx,0,0.0,object,14658,2,0.02,0.0,1,0,0.15
neighbourhood_group_Brooklyn,0,0.0,object,14658,2,0.41,0.0,1,0,0.49
neighbourhood_group_Manhattan,0,0.0,object,14658,2,0.44,0.0,1,0,0.5
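
#### The missing frequencies filled in above come from neighbourhoods that appear in the test subset but not in the training subset. A minimal sketch of that situation on made-up data (`train_freq` and `toy_test` are hypothetical):

```python
import pandas as pd

# Hypothetical frequency table learned from training data
train_freq = pd.DataFrame({
    'neighbourhood': ['Harlem', 'Midtown'],
    'frequencia_neighbourhood': [0.75, 0.25],
})
# Hypothetical test rows; 'Woodrow' never occurred in training
toy_test = pd.DataFrame({'neighbourhood': ['Midtown', 'Woodrow', 'Harlem']})

# how='left' keeps every test row; 'Woodrow' is not in train_freq, so it gets NaN
merged = toy_test.merge(train_freq, how='left', on='neighbourhood')
n_unseen = int(merged['frequencia_neighbourhood'].isna().sum())

# A neighbourhood never seen in training has an observed training frequency of 0
merged['frequencia_neighbourhood'] = merged['frequencia_neighbourhood'].fillna(0)

print(n_unseen)
print(merged)
```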


#### We can now drop the `'neighbourhood'` column from the `dataframes` `x_treino_df` and `x_teste_df`.

In [24]:
del x_treino_df['neighbourhood']
del x_teste_df['neighbourhood']

#### For [preprocessing](https://towardsdatascience.com/data-preprocessing-concepts-fa946d11c825) we use the module [sklearn.preprocessing](https://scikit-learn.org/stable/modules/preprocessing.html), which provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the estimators.

#### We standardize our data with `StandardScaler`: `.fit()` computes the mean and standard deviation on the training set, and [.transform()](https://scikit-learn.org/stable/modules/preprocessing.html) then applies that same transformation to both the training set and the test set.

In [25]:
from sklearn import preprocessing

scaler = preprocessing.StandardScaler().fit(x_treino_df.values)

x_train_norm = scaler.transform(x_treino_df.values)
x_test_norm = scaler.transform(x_teste_df.values)

#x_test_norm[np.isnan(x_test_norm)] = np.median(x_test_norm[~np.isnan(x_test_norm)])

In [26]:
x_test_norm[0,:]

array([-1.62482638, -0.41987815, -0.24372638, -0.49778377, -0.88913444,
       -0.18638033, -0.8533303 , -0.15216973,  1.19792749, -0.89352723,
       -0.36026961, -0.08903849, -1.03815426,  1.088973  , -0.15614569,
       -1.07807911])

In [27]:
x_teste_df.head(1)

Unnamed: 0,latitude,longitude,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365,neighbourhood_group_Bronx,neighbourhood_group_Brooklyn,neighbourhood_group_Manhattan,neighbourhood_group_Queens,neighbourhood_group_Staten Island,room_type_Entire home/apt,room_type_Private room,room_type_Shared room,frequencia_neighbourhood
0,40.6402,-73.9717,2,1,0.03,1,0,0,1,0,0,0,0,1,0,0.003246
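
#### To make the roles of `.fit()` and `.transform()` concrete, here is a minimal sketch on made-up one-column arrays (`toy_train`, `toy_test` and `toy_scaler` are hypothetical names). The test values are scaled with the training mean and standard deviation, not with their own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with a single feature
toy_train = np.array([[1.0], [2.0], [3.0]])
toy_test = np.array([[2.0], [5.0]])

# fit() stores the training mean and (population) standard deviation
toy_scaler = StandardScaler().fit(toy_train)

# transform() applies (x - mean) / std using those stored training statistics
train_scaled = toy_scaler.transform(toy_train)
test_scaled = toy_scaler.transform(toy_test)

print(toy_scaler.mean_, toy_scaler.scale_)
print(train_scaled.ravel())
print(test_scaled.ravel())
```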


#### With the class [sklearn.linear_model.LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) we can fit an ordinary least squares linear regression.

#### The [.fit()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.fit) method fits the linear model; this is when the model learns, optimizing its parameters. The [.score()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.score) method returns the coefficient of determination $R^{2}$ of the prediction. The slope coefficients are available in the `.coef_` attribute.

In [28]:
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(x_train_norm, y_treino)

In [29]:
reg.score(x_train_norm, y_treino)

0.00037128812767339703
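
#### For reference, $R^{2} = 1 - SS_{res}/SS_{tot}$, so a value close to 0 means the model explains almost none of the variance of the target, about as much as always predicting the mean price. A value this low on the training data itself can point to a problem upstream, for example features and targets that are no longer aligned row by row. The sketch below computes $R^{2}$ by hand on made-up data and compares it with `.score()`; `X`, `y` and `toy_reg` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: y is roughly 2 * x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

toy_reg = LinearRegression().fit(X, y)
y_hat = toy_reg.predict(X)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the model
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, toy_reg.score(X, y))
```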

In [30]:
reg.coef_

array([ 2.24010837e+00, -3.49099642e+00, -5.99470146e-01, -2.03976826e-02,
       -1.48371447e+00, -1.31421909e+00, -7.08412949e-02,  6.94897734e+13,
        2.29858455e+14,  2.32144292e+14,  1.48991883e+14,  4.12745817e+13,
       -2.95391997e+13, -2.94528578e+13, -9.01158947e+12, -8.31580065e-02])

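#### We collect the coefficients in a `DataFrame` alongside the column names, which we turn into a list with the [.tolist()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.tolist.html) method. The very large coefficients on the indicator columns are typical of perfect collinearity: the indicators of each categorical variable are linearly dependent (before scaling they add up to 1 in every row), so their individual coefficients are not uniquely determined.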

In [31]:
importance_df = pd.DataFrame()
importance_df['colunas'] = x_teste_df.columns.tolist()
importance_df['importancia'] = list(reg.coef_)
importance_df

Unnamed: 0,colunas,importancia
0,latitude,2.240108
1,longitude,-3.490996
2,minimum_nights,-0.5994701
3,number_of_reviews,-0.02039768
4,reviews_per_month,-1.483714
5,calculated_host_listings_count,-1.314219
6,availability_365,-0.07084129
7,neighbourhood_group_Bronx,69489770000000.0
8,neighbourhood_group_Brooklyn,229858500000000.0
9,neighbourhood_group_Manhattan,232144300000000.0


#### The intercept is obtained from the `.intercept_` attribute.

In [32]:
reg.intercept_

153.5195122331136

#### We make our predictions with the [.predict()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict) method.

In [33]:
y_pred = reg.predict(x_test_norm)

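#### With the function [sklearn.metrics.mean_squared_error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) we can assess how well our model performs. Taking the square root gives the root mean squared error (RMSE), expressed in the same units as the price.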

In [34]:
from sklearn.metrics import mean_squared_error

(mean_squared_error(y_teste, y_pred)) ** 0.5

209.11020644224126
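
#### The calculation above can be written out by hand to see what the number means. A minimal sketch on made-up prices (`y_true` and `y_hat` are hypothetical):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices
y_true = np.array([100.0, 150.0, 200.0])
y_hat = np.array([110.0, 140.0, 230.0])

# MSE averages the squared errors; its square root (RMSE) is in the units of the target
mse = mean_squared_error(y_true, y_hat)
rmse = np.sqrt(mse)

# Same quantity written out by hand
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))

print(mse, rmse, rmse_manual)
```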
