# Analysis Report II

## Types of Real Estate

### Listing the property types available in the data: removing duplicate values with drop_duplicates, rebuilding the index with range, and labeling the columns axis.

In [127]:
# let's import the pandas library under the alias pd
import pandas as pd

In [128]:
# read the semicolon-separated file 'aluguel.csv' into a DataFrame named dados
dados = pd.read_csv('dados/aluguel.csv', sep = ';')

# display the first 10 rows with the head method
# this is our dataset
dados.head(10)

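| | Tipo | Bairro | Quartos | Vagas | Suites | Area | Valor | Condominio | IPTU |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Quitinete | Copacabana | 1 | 0 | 0 | 40 | 1700.0 | 500.0 | 60.0 |
| 1 | Casa | Jardim Botânico | 2 | 0 | 1 | 100 | 7000.0 | NaN | NaN |
| 2 | Conjunto Comercial/Sala | Barra da Tijuca | 0 | 4 | 0 | 150 | 5200.0 | 4020.0 | 1111.0 |
| 3 | Apartamento | Centro | 1 | 0 | 0 | 15 | 800.0 | 390.0 | 20.0 |
| 4 | Apartamento | Higienópolis | 1 | 0 | 0 | 48 | 800.0 | 230.0 | NaN |
| 5 | Apartamento | Vista Alegre | 3 | 1 | 0 | 70 | 1200.0 | NaN | NaN |
| 6 | Apartamento | Cachambi | 2 | 0 | 0 | 50 | 1300.0 | 301.0 | 17.0 |
| 7 | Casa de Condomínio | Barra da Tijuca | 5 | 4 | 5 | 750 | 22000.0 | NaN | NaN |
| 8 | Casa de Condomínio | Ramos | 2 | 2 | 0 | 65 | 1000.0 | NaN | NaN |
| 9 | Conjunto Comercial/Sala | Centro | 0 | 3 | 0 | 695 | 35000.0 | 19193.0 | 3030.0 |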
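
The `sep = ';'` argument matters because `read_csv` splits on commas by default. A minimal sketch of the difference, using a made-up two-row sample (the text in `sample_text` is hypothetical and only mimics the layout of 'aluguel.csv'):

```python
import io
import pandas as pd

# Hypothetical sample in the same ';'-separated layout as the real file
sample_text = (
    "Tipo;Bairro;Quartos;Valor;Condominio\n"
    "Quitinete;Copacabana;1;1700;500\n"
    "Casa;Jardim Botânico;2;7000;\n"
)

# With the right separator, each field becomes its own column
with_sep = pd.read_csv(io.StringIO(sample_text), sep=';')

# With the default separator (a comma), every row collapses into one column
without_sep = pd.read_csv(io.StringIO(sample_text))

print(with_sep.shape)     # rows and columns when split on ';'
print(without_sep.shape)  # rows and columns when split on ','
print(with_sep['Condominio'].isna().sum())  # empty fields are read as NaN
```

The empty `Condominio` field in the second row is read as a missing value, which is why some cells in the table above show NaN.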


In [129]:
# selecting the 'Tipo' column shows which property types it contains
# note in the output that the length is 32,960 (rows)
dados['Tipo']

0                      Quitinete
1                           Casa
2        Conjunto Comercial/Sala
3                    Apartamento
4                    Apartamento
                  ...           
32955                  Quitinete
32956                Apartamento
32957                Apartamento
32958                Apartamento
32959    Conjunto Comercial/Sala
Name: Tipo, Length: 32960, dtype: object

In [130]:
# there is another way to display a column that's ... (attribute access), but I will keep using the form above (dados['Tipo'])
dados.Tipo

0                      Quitinete
1                           Casa
2        Conjunto Comercial/Sala
3                    Apartamento
4                    Apartamento
                  ...           
32955                  Quitinete
32956                Apartamento
32957                Apartamento
32958                Apartamento
32959    Conjunto Comercial/Sala
Name: Tipo, Length: 32960, dtype: object
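
One reason to prefer the bracket form is that attribute access only works for column names that are valid Python identifiers and do not clash with existing DataFrame attributes. A small sketch, assuming a made-up DataFrame (`df`, `'Valor Total'` and `'shape'` are hypothetical names chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Tipo': ['Casa', 'Flat'],
    'Valor Total': [7000.0, 2500.0],  # name contains a space
    'shape': ['L', 'U'],              # name clashes with DataFrame.shape
})

# For a simple name, both forms return the same Series
same = df['Tipo'].equals(df.Tipo)
print(same)

# A name with a space can only be reached with brackets
print(df['Valor Total'].tolist())

# df.shape is the DataFrame attribute, not the column
print(df.shape)
print(df['shape'].tolist())
```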

In [131]:
# the tipo_de_imovel variable receives the 'Tipo' column (dados['Tipo'])
tipo_de_imovel = dados['Tipo']

In [132]:
# checking the type of this variable, it is a Series
# documentation - https://pandas.pydata.org/pandas-docs/stable/reference/series.html
type(tipo_de_imovel)

pandas.core.series.Series

In [133]:
# drop_duplicates returns a new Series with the repeated values removed, keeping the first occurrence of each value and its original index label, look...
tipo_de_imovel.drop_duplicates()

0                          Quitinete
1                               Casa
2            Conjunto Comercial/Sala
3                        Apartamento
7                 Casa de Condomínio
16                    Prédio Inteiro
17                              Flat
29                        Loja/Salão
80           Galpão/Depósito/Armazém
83                    Casa Comercial
117                     Casa de Vila
159                   Terreno Padrão
207                      Box/Garagem
347                             Loft
589      Loja Shopping/ Ct Comercial
2157                         Chácara
3354           Loteamento/Condomínio
4379                           Sítio
4721                   Pousada/Chalé
6983                          Studio
9687                           Hotel
23614                      Indústria
Name: Tipo, dtype: object
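
To see on a small scale why the index above jumps (0, 1, 2, 3, 7, 16, ...), here is a sketch with a five-element Series (the values in `s` are invented for illustration). It also shows `Series.unique`, which returns only the distinct values without any index:

```python
import pandas as pd

s = pd.Series(['Casa', 'Flat', 'Casa', 'Loft', 'Flat'], name='Tipo')

# drop_duplicates keeps the first occurrence of each value,
# together with the index label it had in the original Series
deduped = s.drop_duplicates()
print(deduped.index.tolist())

# unique returns just the distinct values, in order of appearance
print(list(s.unique()))

# the original Series is not modified by either call
print(len(s))
```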

In [134]:
# the original Series has not changed: it still has 32,960 rows
tipo_de_imovel

0                      Quitinete
1                           Casa
2        Conjunto Comercial/Sala
3                    Apartamento
4                    Apartamento
                  ...           
32955                  Quitinete
32956                Apartamento
32957                Apartamento
32958                Apartamento
32959    Conjunto Comercial/Sala
Name: Tipo, Length: 32960, dtype: object

In [135]:
# inplace = True modifies tipo_de_imovel itself instead of returning a new Series
tipo_de_imovel.drop_duplicates(inplace = True)

In [136]:
# okay, now the variable itself holds the values with the duplicates removed
tipo_de_imovel

0                          Quitinete
1                               Casa
2            Conjunto Comercial/Sala
3                        Apartamento
7                 Casa de Condomínio
16                    Prédio Inteiro
17                              Flat
29                        Loja/Salão
80           Galpão/Depósito/Armazém
83                    Casa Comercial
117                     Casa de Vila
159                   Terreno Padrão
207                      Box/Garagem
347                             Loft
589      Loja Shopping/ Ct Comercial
2157                         Chácara
3354           Loteamento/Condomínio
4379                           Sítio
4721                   Pousada/Chalé
6983                          Studio
9687                           Hotel
23614                      Indústria
Name: Tipo, dtype: object
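
The same result can be obtained without `inplace`, by assigning the returned Series to a name. A minimal sketch, assuming a made-up three-row DataFrame (`df` and `tipos` are hypothetical names); it also checks that the source DataFrame keeps all its rows:

```python
import pandas as pd

df = pd.DataFrame({'Tipo': ['Casa', 'Flat', 'Casa']})

# Reassignment form: drop_duplicates returns a new Series
tipos = df['Tipo'].drop_duplicates()

print(tipos.tolist())  # distinct values only
print(len(df))         # the DataFrame still has every row
```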

In [137]:
# organizing how the property types are displayed
# I created a new DataFrame from the Series of property types
tipo_de_imovel = pd.DataFrame(tipo_de_imovel)
tipo_de_imovel

| | Tipo |
|---|---|
| 0 | Quitinete |
| 1 | Casa |
| 2 | Conjunto Comercial/Sala |
| 3 | Apartamento |
| 7 | Casa de Condomínio |
| 16 | Prédio Inteiro |
| 17 | Flat |
| 29 | Loja/Salão |
| 80 | Galpão/Depósito/Armazém |
| 83 | Casa Comercial |
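
Besides `pd.DataFrame(series)`, a Series can be converted with its own `to_frame` method. A short sketch with an invented two-element Series, checking that both routes give the same one-column DataFrame:

```python
import pandas as pd

s = pd.Series(['Casa', 'Flat'], name='Tipo')

via_constructor = pd.DataFrame(s)  # the approach used in this notebook
via_method = s.to_frame()          # equivalent Series method

print(via_constructor.equals(via_method))
print(list(via_method.columns))    # the Series name becomes the column name
```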


In [138]:
# the index labels are no longer sequential, as we can see above. Let's fix that
# first, checking the current index:
tipo_de_imovel.index

Int64Index([    0,     1,     2,     3,     7,    16,    17,    29,    80,
               83,   117,   159,   207,   347,   589,  2157,  3354,  4379,
             4721,  6983,  9687, 23614],
           dtype='int64')

In [139]:
# the number of distinct values (rows left after removing duplicates) is 22:
tipo_de_imovel.shape[0]

22
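
The count can also be taken directly from a Series with `nunique`. One caveat worth knowing: by default `nunique` ignores missing values, while `drop_duplicates` keeps a missing value as one of the entries. A sketch with an invented Series that contains one missing value:

```python
import pandas as pd

s = pd.Series(['Casa', 'Flat', 'Casa', None], name='Tipo')

n_default = s.nunique()                  # missing values are not counted
n_with_na = s.nunique(dropna=False)      # missing value counted as one entry
n_rows = s.drop_duplicates().shape[0]    # rows left after removing duplicates

print(n_default, n_with_na, n_rows)
```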

In [140]:
# the range function is very useful for generating a sequence of integers
# therefore, we can use it to rebuild our index
range(tipo_de_imovel.shape[0])

range(0, 22)

In [141]:
# looping with for prints the numbers 0 to 21, the labels our index will receive
# remember that counting starts at 0, so the first row is row 0 ...
for i in range(tipo_de_imovel.shape[0]):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21


In [142]:
# the index of 'tipo_de_imovel' receives the values of range(tipo_de_imovel.shape[0])
tipo_de_imovel.index = range(tipo_de_imovel.shape[0])

In [143]:
# let's check the index - note that it starts at 0, stops before 22 (so the last label is 21), and increases in steps of 1
# that is exactly what I needed
tipo_de_imovel.index

RangeIndex(start=0, stop=22, step=1)

In [144]:
# here is the result I wanted - the index numbered from 0, one by one, up to the last property type
tipo_de_imovel

| | Tipo |
|---|---|
| 0 | Quitinete |
| 1 | Casa |
| 2 | Conjunto Comercial/Sala |
| 3 | Apartamento |
| 4 | Casa de Condomínio |
| 5 | Prédio Inteiro |
| 6 | Flat |
| 7 | Loja/Salão |
| 8 | Galpão/Depósito/Armazém |
| 9 | Casa Comercial |
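
Assigning a `range` to the index works well; pandas also offers `reset_index(drop=True)`, which produces the same sequential index in a single call. A sketch comparing the two on an invented three-row DataFrame with gaps in its index (`by_range` and `by_reset` are hypothetical names):

```python
import pandas as pd

df = pd.DataFrame({'Tipo': ['Casa', 'Flat', 'Loft']}, index=[1, 17, 347])

# Approach used in this notebook: assign a range to the index
by_range = df.copy()
by_range.index = range(by_range.shape[0])

# Alternative: drop the old labels and number the rows from 0
by_reset = df.reset_index(drop=True)

print(by_reset.index.tolist())
print(by_range.equals(by_reset))
```

Without `drop=True`, `reset_index` would keep the old labels as a new column named `index`, which is not wanted here.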


In [145]:
# naming the columns axis 'Id' - the label is displayed in the top-left corner, above the index values
tipo_de_imovel.columns.name = 'Id'

In [146]:
# checking that the label 'Id' now appears above the index values
tipo_de_imovel

| Id | Tipo |
|---|---|
| 0 | Quitinete |
| 1 | Casa |
| 2 | Conjunto Comercial/Sala |
| 3 | Apartamento |
| 4 | Casa de Condomínio |
| 5 | Prédio Inteiro |
| 6 | Flat |
| 7 | Loja/Salão |
| 8 | Galpão/Depósito/Armazém |
| 9 | Casa Comercial |
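
Note that `columns.name` labels the columns axis; it does not add a column and does not name the index itself. A short sketch on an invented two-row DataFrame that makes the distinction visible, with `rename_axis` shown as one way to name the index instead (`named_index` is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({'Tipo': ['Casa', 'Flat']})

# Label the columns axis, as done in this notebook
df.columns.name = 'Id'

print(df.columns.name)   # the label of the columns axis
print(df.index.name)     # the index itself still has no name
print(list(df.columns))  # no new column was created

# Naming the index instead returns a new DataFrame
named_index = df.rename_axis('Id', axis='index')
print(named_index.index.name)
```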


In [147]:
# okay, we now have every property type found in the data, listed once each in the DataFrame
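
For reuse, the steps of this section can be gathered into one small function. This is only a sketch under assumptions: `unique_values_table` and its parameters are hypothetical names, and it uses `to_frame` and `reset_index(drop=True)` in place of the `pd.DataFrame(...)` and `range` steps shown in the cells.

```python
import pandas as pd

def unique_values_table(df, column, axis_label='Id'):
    """Return a one-column DataFrame with the distinct values of `column`."""
    table = (
        df[column]
        .drop_duplicates()        # keep the first occurrence of each value
        .to_frame()               # Series -> one-column DataFrame
        .reset_index(drop=True)   # number the rows 0, 1, 2, ...
    )
    table.columns.name = axis_label  # label shown above the index
    return table

# Usage with a made-up sample instead of the real file
sample = pd.DataFrame({
    'Tipo': ['Casa', 'Flat', 'Casa', 'Loft'],
    'Valor': [7000.0, 2500.0, 6500.0, 3000.0],
})
result = unique_values_table(sample, 'Tipo')
print(result)
```

With the real data, the call would be `unique_values_table(dados, 'Tipo')`.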