# Samambaia House Price Prediction

## Table of contents

 1. Problem Definition<br>
     1.1 General objectives<br>
     1.2 Specific objectives
 2. Import Libraries
 3. Reading the dataset
 4. Filtering data<br>
     4.1 Filtering houses by price<br>
     4.2 Filtering houses by size
 5. Calculating house distance to train station
 6. Plotting figures and making analysis
 
 
 
## 1. Problem Definition

### 1.1 General objectives

My family wants to buy a new house in the Distrito Federal, Brazil. My uncle João (fictitious name) said that Samambaia is the best place for us to live. So, to help my family buy a good house in Samambaia, I will explore house listings on OLX and help them make the best decision based on data.

For my family, the two most important requirements for a house or apartment are the number of bedrooms (3 or more) and the price, which must be at most R$ 290.000,00. Another thing I consider important is living near the train station (if we can afford houses there). So, if we have to choose among houses of similar value, we will probably prefer those near the train station, even though they are a little more expensive on average.

So, the objective of this work is to understand how house prices vary within Samambaia, using OLX data. With that understanding, we will have a better idea of whether we can buy a good house in Samambaia given all the constraints.
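
The constraints described above (at least 3 bedrooms, a price of at most R$ 290.000,00) translate directly into a boolean filter. The sketch below is only an illustration on made-up listings; it assumes that the `n_rooms` column, which appears later in the dataset, counts bedrooms, and the constants `MAX_PRICE` and `MIN_BEDROOMS` are hypothetical names.

```python
import pandas as pd

MAX_PRICE = 290_000      # family budget in R$ (from the text above)
MIN_BEDROOMS = 3         # assumption: n_rooms counts bedrooms

# Made-up listings for illustration only
toy = pd.DataFrame({
    'new_house_price': [152_000.0, 290_000.0, 408_000.0, 275_000.0],
    'n_rooms': [1, 3, 4, 3],
})

# Keep listings that satisfy both constraints
candidates = toy[(toy['new_house_price'] <= MAX_PRICE) & (toy['n_rooms'] >= MIN_BEDROOMS)]
print(candidates)
```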

### 1.2 Specific objectives

* Compare Samambaia Norte vs Samambaia Sul house prices;
* Compare prices of houses near vs far from the train station;
* Compare house size vs house price;
* Compare prices of houses with vs without a condominium;
* Compare house price vs condominium value;
* Compare house price vs number of bedrooms;
* Compare prices of houses with vs without car parking;
* Compare house price vs number of parking spaces;
* Compare house price vs number of bathrooms.

## 2. Import Libraries

In [54]:
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.graph_objects as go

%matplotlib inline

# Column names
HOUSE_PRICE = 'new_house_price'
HOUSE_CATEGORY = 'house_category'
HOUSE_SIZE = 'house_size'
HOUSE_N_ROOMS = 'n_rooms'
HOUSE_REGION = 'Is_samambaia_norte'
HOUSE_HAS_CONDOMI = 'has_condominium'
HOUSE_CONDOMI_VALUE = 'value_condominium'
HOUSE_N_PARKING = 'n_parking'
HOUSE_HAS_PARKING = 'has_parking'
HOUSE_N_BATH = 'n_bathrooms'
HOUSE_CEP = 'CEP'
HOUSE_LOGRADOURO = 'Logradouro'
HOUSE_LINK = 'house_hyperlink'

# Figure constants
FIGURE_TITLE_SIZE = 20
FIGURE_SUBTITLE_SIZE = 15
FIGURE_TICKFONT_SIZE = 12
FIGURE_BG_COLOR = '#FBFBFB'
FIGURE_GRID_COLOR = '#e7e7e7'
FIGURE_AXES_COLOR = 'black'
FIGURE_COLOR_PALETTE = ['#363D45', '#6AB187', '#CED2CC', '#4CB5F5', '#D32D41']

## 3. Reading the dataset

In [55]:
# house data
df_samambaia = pd.read_csv('./data/samambaia_houses.csv', index_col=[0])

# street blocks near the train station
df_distances = pd.read_csv('./samambaia_metro_distances.csv')

df_samambaia.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2926 entries, 0 to 2925
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   new_house_price     2926 non-null   float64
 1   Is_samambaia_norte  2926 non-null   int64  
 2   n_rooms             2926 non-null   int64  
 3   has_condominium     2926 non-null   int64  
 4   value_condominium   2926 non-null   float64
 5   has_parking         2926 non-null   int64  
 6   n_parking           2926 non-null   int64  
 7   house_size          2926 non-null   float64
 8   house_hyperlink     2926 non-null   object 
 9   house_category      2498 non-null   object 
 10  n_bathrooms         2476 non-null   object 
 11  CEP                 2498 non-null   float64
 12  Logradouro          2436 non-null   object 
dtypes: float64(4), int64(5), object(4)
memory usage: 320.0+ KB


In [56]:
df_samambaia[HOUSE_REGION] = df_samambaia[HOUSE_REGION]\
                                .apply(lambda x: 'Samambaia norte' if x == 1 else 'Samambaia sul')

df_samambaia[HOUSE_HAS_CONDOMI] = df_samambaia[HOUSE_HAS_CONDOMI]\
                                    .apply(lambda x: 'Com condomínio' if x == 1 else 'Sem condomínio')
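
The two `apply` calls above turn 0/1 flags into display labels. A vectorized way to express the same mapping is sketched below on a made-up frame; `numpy.where` is a swapped-in alternative here, not what the notebook itself uses, and the sample values are invented.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same 0/1 flag columns as the dataset
toy = pd.DataFrame({
    'Is_samambaia_norte': [1, 0, 1],
    'has_condominium': [0, 1, 1],
})

# np.where keeps the same rule: 1 -> first label, anything else -> second label
toy['Is_samambaia_norte'] = np.where(
    toy['Is_samambaia_norte'] == 1, 'Samambaia norte', 'Samambaia sul')
toy['has_condominium'] = np.where(
    toy['has_condominium'] == 1, 'Com condomínio', 'Sem condomínio')

print(toy)
```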

## 4. Filtering data

### 4.1 Filtering houses by price

The first step is to look at the variables using the `describe` method. As we can see, there are houses listed at R$ 0,00 and some extremely expensive houses. This kind of data might distort our analysis.

In [57]:
df_samambaia.describe()

Unnamed: 0,new_house_price,n_rooms,value_condominium,has_parking,n_parking,house_size,CEP
count,2926.0,2926.0,2926.0,2926.0,2926.0,2926.0,2498.0
mean,698515.3,2.3838,341.821941,0.775461,1.3838,126.509228,72316570.0
std,6946811.0,0.935162,7724.161203,0.417349,1.187397,1071.153251,12523.34
min,0.0,0.0,0.0,0.0,0.0,0.0,72161200.0
25%,165000.0,2.0,0.0,1.0,1.0,34.0,72309300.0
50%,209000.0,2.0,0.0,1.0,1.0,54.0,72318210.0
75%,285000.0,3.0,0.0,1.0,2.0,126.0,72321010.0
max,150000000.0,5.0,280000.0,1.0,5.0,47600.0,72660310.0
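
In the summary above, the mean price (about 698,515) sits far above the median (209,000), which is a typical sign of a few extreme values. The toy example below, with made-up prices, illustrates how a single extreme listing pulls the mean and the standard deviation while the median barely moves.

```python
import pandas as pd

# Made-up prices: five ordinary listings plus one extreme value
prices = pd.Series([150_000, 180_000, 200_000, 220_000, 250_000])
with_outlier = pd.concat([prices, pd.Series([150_000_000])], ignore_index=True)

# Compare each statistic before and after adding the extreme listing
print('mean  :', prices.mean(), '->', with_outlier.mean())
print('median:', prices.median(), '->', with_outlier.median())
print('std   :', prices.std(), '->', with_outlier.std())
```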


Taking a closer look at `new_house_price`, we can see that both the mean and the standard deviation are very high. This happens when a few houses are far more expensive or far cheaper than the rest of the dataset; such values are known as outliers. So, we will apply thresholds to remove them. Before doing that, let's take a look at the cheapest and the most expensive houses:

In [58]:
# UNCOMMENT TO SEE THE RESULTS

# Cheapest houses
# df_samambaia[[HOUSE_PRICE, HOUSE_LINK]].sort_values(HOUSE_PRICE)[:30].values

# Most expensive houses
# df_samambaia[[HOUSE_PRICE, HOUSE_LINK]].sort_values(HOUSE_PRICE)[-30:].values

To keep this notebook clean, I commented out the code in the cell above. With the first line, we can see that some instances have a price of 0, which makes no sense at all; we can also see that real houses cost at least R$ 20.000,00 (even though some houses at this price might carry debt or something similar).

On the other hand, houses that cost more than R$ 1.000.000,00 are not that common in Samambaia.

For this analysis, our approach is to remove these data points by defining lower and upper price thresholds.

In [59]:
upper_threshold = 10**6
down_threshold = 20000

condition_up = df_samambaia[HOUSE_PRICE] < upper_threshold
condition_down = df_samambaia[HOUSE_PRICE] > down_threshold
df_samambaia_filtered = df_samambaia[condition_up & condition_down]
df_samambaia_filtered.describe()

Unnamed: 0,new_house_price,n_rooms,value_condominium,has_parking,n_parking,house_size,CEP
count,2849.0,2849.0,2849.0,2849.0,2849.0,2849.0,2428.0
mean,234381.443664,2.377325,348.955774,0.774658,1.365742,122.568621,72316650.0
std,115778.418628,0.922825,7827.607788,0.417881,1.16342,1081.663437,12592.65
min,25000.0,0.0,0.0,0.0,0.0,0.0,72161200.0
25%,168000.0,2.0,0.0,1.0,1.0,34.0,72309300.0
50%,209000.0,2.0,0.0,1.0,1.0,54.0,72318230.0
75%,280000.0,3.0,0.0,1.0,2.0,126.0,72321010.0
max,990000.0,5.0,280000.0,1.0,5.0,47600.0,72660310.0


It looks much better now, in terms of house prices.
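
The threshold step above can be wrapped in a small helper so the same bounds are easy to reuse or adjust. This is a minimal sketch on made-up prices; the function name `filter_price_range` is hypothetical and not part of the notebook.

```python
import pandas as pd

def filter_price_range(df, column, lower, upper):
    """Keep rows whose `column` value lies strictly between lower and upper."""
    mask = (df[column] > lower) & (df[column] < upper)
    return df[mask]

# Made-up listings, including a zero price and an implausibly expensive one
toy = pd.DataFrame(
    {'new_house_price': [0.0, 25_000.0, 209_000.0, 990_000.0, 150_000_000.0]})

kept = filter_price_range(toy, 'new_house_price', lower=20_000, upper=10**6)
print(kept['new_house_price'].tolist())
print(f'{len(toy) - len(kept)} rows removed')
```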

### 4.2 Filtering houses by size

Another thing we can see is that a few houses are far too big. Let's see the 15 biggest houses:

In [60]:
df_samambaia_filtered[HOUSE_SIZE].sort_values()[-15:]

775       450.0
754       480.0
2897      500.0
560       500.0
1706      548.0
2893     1000.0
1069     1000.0
302      1125.0
172      1508.0
2813     2000.0
1636     5257.0
2789     6000.0
2711     9999.0
2784    30000.0
2131    47600.0
Name: house_size, dtype: float64

In [61]:
df_samambaia_filtered[HOUSE_SIZE].size

2849

We will remove houses with a size of 1000 m² or more:

In [62]:
df_samambaia_filtered = df_samambaia_filtered[df_samambaia_filtered[HOUSE_SIZE] < 1000].copy()
df_samambaia_filtered.fillna(0, inplace=True)
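
Note that `fillna(0)` fills every column, so missing text values such as the category also become the integer `0`; the plotting cells later select uncategorized listings with `== 0` for that reason. The sketch below shows this on a made-up frame and, as an assumed alternative, fills each column with its own placeholder by passing a dict (the label `'Sem categoria'` is used here only as an illustration).

```python
import numpy as np
import pandas as pd

# Made-up rows; the text column is kept as object dtype on purpose
toy = pd.DataFrame({
    'house_category': pd.Series(['Casas', None, 'Apartamentos'], dtype=object),
    'CEP': [72318030.0, np.nan, 72302705.0],
})

# Same call as in the notebook: every missing value becomes 0, even in text columns
filled_all = toy.fillna(0)
print(filled_all['house_category'].tolist())

# Assumed alternative: one placeholder per column
filled_by_column = toy.fillna({'house_category': 'Sem categoria', 'CEP': 0})
print(filled_by_column['house_category'].tolist())
```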

In [63]:
total_size = df_samambaia.shape[0]
new_size = df_samambaia_filtered.shape[0]
percent = new_size/total_size
diff = total_size - new_size

print(f'The filtered dataset has {diff} fewer instances than the original dataset of {total_size} instances.')
print(f'The filtered dataset keeps {percent:.1%} of the original data. So, it was worth it!')

The filtered dataset has 87 fewer instances than the original dataset of 2926 instances.
The filtered dataset keeps 97.0% of the original data. So, it was worth it!


## 5. Calculating house distance to train station

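I want to classify each house by its distance to the train station and save that information in the dataframe for analysis. The code below does that using the `df_distances` dataframe, which lists the street blocks near the station in its `1 KM` and `2 KM` columns: a house gets 1 or 2 if its street block appears in the corresponding column, and 3 otherwise.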

In [64]:
metro_one_km = df_distances['1 KM'].values
metro_two_km = df_distances['2 KM'].values

def calc_metro_distance(logradouro):
    """Return 1 or 2 if the street block is in the 1 km or 2 km list, else 3."""
    logradouro = str(logradouro)

    try:
        # The street block is the first two tokens of the address, e.g. 'QR 116'
        street_block = ' '.join(logradouro.split()[:2])

        if street_block in metro_one_km:
            return 1
        elif street_block in metro_two_km:
            return 2
        else:
            return 3
    except Exception:
        return 3

df_samambaia_filtered['Metro Distance'] = df_samambaia_filtered[HOUSE_LOGRADOURO].apply(calc_metro_distance)
df_samambaia_filtered['Metro Distance'].unique()

array([1, 3, 2])
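
The lookup above reduces each address to its first two tokens (the street block) and checks which distance list contains it. A self-contained sketch of the same idea follows, using Python sets for the membership test; the function name `metro_distance_band` is hypothetical, and the block names in the two sets are illustrative stand-ins, not the actual contents of `samambaia_metro_distances.csv`.

```python
# Illustrative street blocks standing in for the '1 KM' and '2 KM' columns
BLOCKS_WITHIN_1_KM = {'QR 116', 'QR 204'}
BLOCKS_WITHIN_2_KM = {'QR 402'}

def metro_distance_band(logradouro):
    """Return 1 or 2 for blocks in the 1 km / 2 km sets, else 3."""
    # str() also covers missing values such as NaN or the 0 left by fillna
    street_block = ' '.join(str(logradouro).split()[:2])
    if street_block in BLOCKS_WITHIN_1_KM:
        return 1
    if street_block in BLOCKS_WITHIN_2_KM:
        return 2
    return 3

for address in ['QR 116 Conjunto 4-A Comércio', 'QR 402 Conjunto 29',
                'Quadra 301 Conjunto 2', 0]:
    print(address, '->', metro_distance_band(address))
```

Sets give constant-time membership checks, which matters little at this dataset size but keeps the intent of the lookup explicit.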

## 6. Plotting figures and making analysis

In [65]:
title='<b>Preço de casas e apartamentos na Samambaia</b>'

df = df_samambaia_filtered[[HOUSE_REGION,HOUSE_PRICE]].groupby(HOUSE_REGION).mean().reset_index()

fig = make_subplots(
        rows=1,
        cols=2,
        column_widths=[0.35, 0.65],
        subplot_titles=("Preço médio por região", "Preço de imóveis por categoria"))

# Figure 1 - Bar plot
fig.add_trace(
    go.Bar(
        x=df[HOUSE_REGION],
        y=df[HOUSE_PRICE], 
        name='Samambaia', 
        showlegend=False, 
        marker_color=FIGURE_COLOR_PALETTE[0:2],
        textposition='auto', 
        text=df[HOUSE_PRICE].apply(lambda x: '{:,.0f}'.format(x))),
    row=1, col=1)

# Figure 2: Boxplot - black boxplot
fig.add_trace(
    go.Box(
        x=df_samambaia_filtered[df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos'][HOUSE_REGION],
        y=df_samambaia_filtered[HOUSE_PRICE][df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos'],
        legendgroup='Apartamentos', 
        name='Apartamentos', 
        line_color=FIGURE_COLOR_PALETTE[0]),
    row=1, col=2)

# Figure 2: Boxplot - green boxplot
fig.add_trace(
    go.Box(
        x=df_samambaia_filtered[df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas'][HOUSE_REGION],
        y=df_samambaia_filtered[df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas'][HOUSE_PRICE],
        legendgroup='Casas', 
        name='Casas',
        line_color=FIGURE_COLOR_PALETTE[1]),
    row=1, col=2)

fig.update_layout(
    boxmode='group',
    title=title,
    title_font={'size':FIGURE_TITLE_SIZE},
    paper_bgcolor=FIGURE_BG_COLOR,
    plot_bgcolor=FIGURE_BG_COLOR,
    template='seaborn',
)

fig.update_xaxes(
    showgrid=False,
    tickfont_size=FIGURE_TICKFONT_SIZE,
    color=FIGURE_AXES_COLOR)

fig.update_yaxes(
    color=FIGURE_AXES_COLOR,
    showgrid=True,
    gridwidth=0.8, 
    gridcolor=FIGURE_GRID_COLOR,
    tickfont_size=FIGURE_TICKFONT_SIZE,
)

fig['layout']['xaxis']['title']='Região'
fig['layout']['xaxis2']['title']='Região'
fig['layout']['yaxis']['title']='Preço (R$)'
fig['layout']['yaxis2']['title']='Preço (R$)'

fig.show()

In [109]:
df_ = df_samambaia_filtered[['Metro Distance', HOUSE_PRICE]].groupby('Metro Distance').mean().reset_index()
title = '<b>Preço dos imóveis em relação à distância do metrô</b>'

fig = make_subplots(rows=1, cols=2, subplot_titles=['Preço médio', 'Preço dos imóveis'], column_widths=[0.4, 0.6])

# Figure Preço Médio
fig.add_trace(
    go.Bar(
        x=df_['Metro Distance'], 
        y=df_[HOUSE_PRICE], 
        showlegend=False,
        text=df_[HOUSE_PRICE].apply(lambda x: '{:,.0f}'.format(x)),
        hovertemplate='Preço médio: R$'+df_[HOUSE_PRICE].apply(lambda x: '{:,.2f}'.format(x)),
        name='',
        marker_color=FIGURE_COLOR_PALETTE[:3]),
    row=1, col=1)

# Figure Preço dos imóveis first boxplot
fig.add_trace(
    go.Box(
        y=df_samambaia_filtered[HOUSE_PRICE][df_samambaia_filtered['Metro Distance'] == 1], 
        name='1',
        showlegend=False,
        line_color=FIGURE_COLOR_PALETTE[0],
    ), 
    row=1, col=2)

# Figure Preço dos imóveis second boxplot
fig.add_trace(
    go.Box(
        y=df_samambaia_filtered[HOUSE_PRICE][df_samambaia_filtered['Metro Distance'] == 2], 
        name='2',
        showlegend=False,
        line_color=FIGURE_COLOR_PALETTE[1]), 
    row=1, col=2)

# Figure Preço dos imóveis third boxplot
fig.add_trace(
    go.Box(
        y=df_samambaia_filtered[HOUSE_PRICE][df_samambaia_filtered['Metro Distance'] == 3], 
        name='3',
        showlegend=False,
        line_color=FIGURE_COLOR_PALETTE[2]
    ), 
    row=1, col=2)

fig.update_layout(
    boxmode='group',
    title=title,
    title_font={'size':22},
    violinmode='group',
    paper_bgcolor='#f9f9f9',
    plot_bgcolor='#f9f9f9',
    template='seaborn',
)

fig.update_xaxes(
    showgrid=False,
    gridcolor='#eeeeee',
    tickfont_size=13,
)

fig.update_yaxes(
    showgrid=True,
    gridwidth=1.0, 
    gridcolor='#eeeeee',
    tickfont_size=13,
)

# fig.update_layout(hovermode="x")

fig['layout']['xaxis']['title'] = 'Distância do imóvel (km)'
fig['layout']['xaxis2']['title'] = 'Distância do imóvel (km)'
fig['layout']['yaxis']['title'] = 'Preço (R$)'
fig['layout']['yaxis2']['title'] = 'Preço (R$)'

fig.show()
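
The bar labels in the figure above come from a `groupby(...).mean()` followed by string formatting. The sketch below reproduces that preparation step on made-up prices so the intermediate table can be inspected; the numbers are illustrative only and the `label` column name is an assumption.

```python
import pandas as pd

# Made-up listings with a distance band like the one computed earlier
toy = pd.DataFrame({
    'Metro Distance': [1, 1, 2, 2, 3, 3],
    'new_house_price': [300_000.0, 260_000.0, 240_000.0,
                        200_000.0, 190_000.0, 170_000.0],
})

# Mean price per distance band, back to a flat two-column frame
mean_by_band = (
    toy.groupby('Metro Distance')['new_house_price']
       .mean()
       .reset_index()
)

# Thousands separator, no decimals: the same format used for the bar text
mean_by_band['label'] = mean_by_band['new_house_price'].apply(
    lambda x: '{:,.0f}'.format(x))

print(mean_by_band)
```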

In [150]:
# Listings with a condominium value between 1 and 2000 (exclusive)
condo_in_range = (df_samambaia_filtered[HOUSE_CONDOMI_VALUE] > 1) & (df_samambaia_filtered[HOUSE_CONDOMI_VALUE] < 2000)

# df_fig1 also drops rows without a category; df_fig2 also requires a positive size
df_fig1 = df_samambaia_filtered[(df_samambaia_filtered[HOUSE_CATEGORY] != 0) & condo_in_range]
df_fig2 = df_samambaia_filtered[condo_in_range & (df_samambaia_filtered[HOUSE_SIZE] > 0)]

title='<b>Tamanho x Preço do imóvel</b>'

fig = make_subplots(rows=1, cols=2)


fig.add_trace(
    go.Scatter(
        x=df_fig2[df_fig2[HOUSE_CATEGORY] == 'Casas'][HOUSE_SIZE], 
        y=df_fig2[df_fig2[HOUSE_CATEGORY] == 'Casas'][HOUSE_PRICE], 
        mode='markers',
        name = 'Casas',
        marker = {'color': FIGURE_COLOR_PALETTE[1]}
    ),
    row=1, col=1)

fig.add_trace(
    go.Scatter(
        x=df_fig2[df_fig2[HOUSE_CATEGORY] == 'Apartamentos'][HOUSE_SIZE], 
        y=df_fig2[df_fig2[HOUSE_CATEGORY] == 'Apartamentos'][HOUSE_PRICE], 
        mode='markers',
        name = 'Apartamentos',
        marker = {'color': FIGURE_COLOR_PALETTE[2]}
    ),
    row=1, col=2)

fig.update_layout(
    boxmode='group',
    title=title,
    title_font={'size':20},
    violinmode='group',
    paper_bgcolor='#f9f9f9',
    plot_bgcolor='#f9f9f9',
    template='seaborn',
)

fig.update_xaxes(
    color='black',
    showgrid=True,
    gridwidth=1.0, 
    gridcolor='#eeeeee',
    tickfont_size=11,
)

fig.update_yaxes(
    color='black',
    showgrid=True,
    gridwidth=1.0,
    gridcolor='#eeeeee',
    tickfont_size=11,
)

fig['layout']['xaxis']['title']='Tamanho do imóvel (m²)'
fig['layout']['xaxis2']['title']='Tamanho do imóvel (m²)'
fig['layout']['yaxis']['title']='Preço (R$)'
fig['layout']['yaxis2']['title']='Preço (R$)'

fig.update_traces(marker_size=8,marker_line_width=0.7)

fig.show()
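
A scatter plot shows the relationship visually; a correlation coefficient is one way to summarize it in a single number. This is an optional addition, not something the notebook computes: the sketch uses `Series.corr` (Pearson by default) on made-up sizes and prices.

```python
import pandas as pd

# Made-up sizes (m²) and prices (R$) for illustration only
toy = pd.DataFrame({
    'house_size': [34.0, 54.0, 65.0, 105.0, 126.0],
    'new_house_price': [150_000.0, 190_000.0, 230_000.0, 290_000.0, 350_000.0],
})

# Pearson correlation: +1 is a perfect increasing linear relation, 0 is none
pearson = toy['house_size'].corr(toy['new_house_price'])
print(f'Pearson correlation between size and price: {pearson:.3f}')
```

On the real data it would make sense to compute this separately for `Casas` and `Apartamentos`, mirroring the two panels of the figure.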

In [144]:
# With condominium vs without condominium
df_ = df_samambaia_filtered[[HOUSE_HAS_CONDOMI, HOUSE_CATEGORY, HOUSE_PRICE]].groupby([HOUSE_HAS_CONDOMI, HOUSE_CATEGORY]).mean().reset_index()
title = '<b>Preço de imóveis com condomínio vs sem condomínio</b>'

fig = make_subplots(
    rows = 1, 
    cols = 2,
    subplot_titles=['Preço médio dos imóveis', 'Preço dos imóveis'],
    column_widths=[0.6, 0.4]
)

fig.add_trace(
    go.Bar(
        x = df_[HOUSE_HAS_CONDOMI][df_[HOUSE_CATEGORY] == 'Apartamentos'],
        y = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Apartamentos'],
        marker_color = FIGURE_COLOR_PALETTE[0],
        text = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Apartamentos'].apply(lambda x: '{:,.0f}'.format(x)),
        hovertemplate='Preço médio: R$'+df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Apartamentos']\
                        .apply(lambda x: '{:,.2f}'.format(x))+'<br>Categoria: Apartamentos',
        name = 'Apartamentos'),
    row=1, col=1)

fig.add_trace(
    go.Bar(
        x = df_[HOUSE_HAS_CONDOMI][df_[HOUSE_CATEGORY] == 'Casas'],
        y = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Casas'],
        marker_color = FIGURE_COLOR_PALETTE[1],
        text = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Casas'].apply(lambda x: '{:,.0f}'.format(x)),
        hovertemplate='Preço médio: R$'+df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 'Casas']\
                        .apply(lambda x: '{:,.2f}'.format(x))+'<br>Categoria: Casas',
        name = 'Casas'),
    row=1, col=1)

fig.add_trace(
    go.Bar(
        x = df_[HOUSE_HAS_CONDOMI][df_[HOUSE_CATEGORY] == 0],
        y = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 0],
        marker_color = FIGURE_COLOR_PALETTE[2],
        text = df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 0].apply(lambda x: '{:,.0f}'.format(x)),
        hovertemplate='Preço médio: R$'+df_[HOUSE_PRICE][df_[HOUSE_CATEGORY] == 0]\
                        .apply(lambda x: '{:,.2f}'.format(x))+'<br>Categoria: Sem categoria',
        name = 'Sem categoria'),
    row=1, col=1)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        line_color=FIGURE_COLOR_PALETTE[0],
        showlegend=False,
        name = 'Apartamentos'
    ),
    row=1, col=2)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Apartamentos') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        line_color=FIGURE_COLOR_PALETTE[0],
        showlegend=False,
        name = 'Apartamentos'
    ),
    row=1, col=2)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        line_color=FIGURE_COLOR_PALETTE[1],
        showlegend=False,
        name = 'Casas'
    ),
    row=1, col=2)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 'Casas') &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        line_color=FIGURE_COLOR_PALETTE[1],
        showlegend=False,
        name = 'Casas'
    ),
    row=1, col=2)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 0) &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 0) &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Com condomínio')],
        line_color=FIGURE_COLOR_PALETTE[2],
        showlegend=False,
        name = 'Sem categoria'
    ),
    row=1, col=2)

fig.add_trace(
    go.Box(
        y = df_samambaia_filtered[HOUSE_PRICE][(df_samambaia_filtered[HOUSE_CATEGORY] == 0) &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        x = df_samambaia_filtered[HOUSE_HAS_CONDOMI][(df_samambaia_filtered[HOUSE_CATEGORY] == 0) &\
                (df_samambaia_filtered[HOUSE_HAS_CONDOMI] == 'Sem condomínio')],
        line_color=FIGURE_COLOR_PALETTE[2],
        showlegend=False,
        name = 'Sem categoria'
    ),
    row=1, col=2)

fig.update_layout(
    boxmode = 'group',
    barmode = 'group',
    title = title,
    title_font = {'size': FIGURE_TITLE_SIZE},
    paper_bgcolor = FIGURE_BG_COLOR,
    plot_bgcolor = FIGURE_BG_COLOR,
    template = 'seaborn'
)

fig.update_xaxes(
    color = FIGURE_AXES_COLOR,
    showgrid = False,
)

fig.update_yaxes(
    color = FIGURE_AXES_COLOR,
    gridwidth = 0.9,
    gridcolor = FIGURE_GRID_COLOR
)

fig.show()

In [70]:
# Listings with a condominium value between 1 and 2000 (exclusive)
condo_in_range = (df_samambaia_filtered[HOUSE_CONDOMI_VALUE] > 1) & (df_samambaia_filtered[HOUSE_CONDOMI_VALUE] < 2000)

# df_fig1 also drops rows without a category; df_fig2 also requires a positive size
df_fig1 = df_samambaia_filtered[(df_samambaia_filtered[HOUSE_CATEGORY] != 0) & condo_in_range]
df_fig2 = df_samambaia_filtered[condo_in_range & (df_samambaia_filtered[HOUSE_SIZE] > 0)]

title='<b>Condomínio x Preço do imóvel</b>'

fig = make_subplots(rows=1, cols=2)

fig.add_trace(
    go.Scatter(
        x=df_fig1[df_fig1[HOUSE_CATEGORY] == 'Casas'][HOUSE_CONDOMI_VALUE], 
        y=df_fig1[df_fig1[HOUSE_CATEGORY] == 'Casas'][HOUSE_PRICE], 
        mode='markers', 
        name='Casas',
        marker={'color': FIGURE_COLOR_PALETTE[1]}),
    row=1, col=1)

fig.add_trace(
    go.Scatter(
        x=df_fig1[df_fig1[HOUSE_CATEGORY] == 'Apartamentos'][HOUSE_CONDOMI_VALUE], 
        y=df_fig1[df_fig1[HOUSE_CATEGORY] == 'Apartamentos'][HOUSE_PRICE], 
        mode='markers', 
        name='Apartamentos',
        marker={'color': FIGURE_COLOR_PALETTE[2]}),
    row=1, col=2)

fig.update_layout(
    boxmode='group',
    title=title,
    title_font={'size':22},
    violinmode='group',
    paper_bgcolor='#f9f9f9',
    plot_bgcolor='#f9f9f9',
    template='seaborn',
)

fig.update_xaxes(
    color='black',
    showgrid=True,
    gridwidth=1.0, 
    gridcolor='#eeeeee',
    tickfont_size=11,
)

fig.update_yaxes(
    color='black',
    showgrid=True,
    gridwidth=1.0, 
    gridcolor='#eeeeee',
    tickfont_size=11,
)

fig['layout']['xaxis']['title']='Valor do condomínio'
fig['layout']['xaxis2']['title']='Valor do condomínio'
fig['layout']['yaxis']['title']='Valor da casa'
fig['layout']['yaxis2']['title']='Valor do apartamento'

fig.update_traces(marker_size=8,marker_line_width=0.7)

fig.show()

Unnamed: 0,new_house_price,Is_samambaia_norte,n_rooms,has_condominium,value_condominium,has_parking,n_parking,house_size,house_hyperlink,house_category,n_bathrooms,CEP,Logradouro,Metro Distance
0,152000.0,Samambaia sul,1,Com condomínio,329.0,0,0,38.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72302705.0,QR 116 Conjunto 4-A Comércio,1
1,408000.0,Samambaia sul,2,Com condomínio,432.0,1,1,65.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,2,72300533.0,Quadra 301 Conjunto 2,3
2,145000.0,Samambaia norte,2,Com condomínio,10.0,0,0,63.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72316080.0,QR 204,1
3,190000.0,Samambaia sul,2,Com condomínio,350.0,1,1,55.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72304051.0,QN 120 Conjunto 1,3
4,290000.0,Samambaia norte,2,Sem condomínio,0.0,1,1,105.0,https://df.olx.com.br/distrito-federal-e-regia...,Casas,1,72318030.0,QR 402 Conjunto 29,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2921,168000.0,Samambaia norte,2,Com condomínio,0.0,0,0,33.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72321000.0,QR 407,3
2922,168000.0,Samambaia norte,2,Com condomínio,0.0,1,1,33.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72321000.0,QR 407,3
2923,168000.0,Samambaia norte,2,Com condomínio,0.0,1,1,33.0,https://df.olx.com.br/distrito-federal-e-regia...,Apartamentos,1,72321000.0,QR 407,3
2924,212060.0,Samambaia sul,2,Com condomínio,0.0,1,1,42.0,https://df.olx.com.br/distrito-federal-e-regia...,0,0,0.0,0,3
