# INDEPENDENT_PRACTICE - Pandas 2.

## Introduction

### Context:

#### This dataset is a record of every building or building unit (apartment, etc.) sold in the New York City real estate market over a 12-month period.

### Content:

#### This dataset contains the location, address, type, sale price and sale date of the building units. Some notes on the fields follow:

* BOROUGH: A code for the borough in which the property is located:
    - Manhattan (1), 
    - Bronx (2), 
    - Brooklyn (3), 
    - Queens (4), 
    - Staten Island (5).

* BLOCK; LOT: The combination of borough, block and lot forms a unique key for a property in New York City, known as the BBL.

* BUILDING CLASS AT PRESENT and BUILDING CLASS AT TIME OF SALE: The type of building at various points in time. See the glossary below:

#### For further reference on individual fields, see the [Glossary of Terms](https://www1.nyc.gov/assets/finance/downloads/pdf/07pdf/glossary_rsf071607.pdf). For the building classification codes, see the Building Classifications Glossary at [NYC Property Sales](https://www.kaggle.com/new-york-city/nyc-property-sales).
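
As a small illustration of the field notes above, the sketch below maps the `BOROUGH` codes to their names and builds a combined key from borough, block and lot. The toy rows and the column names `BOROUGH_NAME` and `BBL` are hypothetical, and the zero-padded layout (1 digit + 5 digits + 4 digits) is an assumption about how the key is usually written, not something stated in this dataset.

```python
import pandas as pd

# Toy rows for illustration only.
toy = pd.DataFrame({
    "BOROUGH": [1, 3],
    "BLOCK": [392, 1237],
    "LOT": [6, 29],
})

# Borough codes as listed in the field notes.
borough_names = {
    1: "Manhattan",
    2: "Bronx",
    3: "Brooklyn",
    4: "Queens",
    5: "Staten Island",
}
toy["BOROUGH_NAME"] = toy["BOROUGH"].map(borough_names)

# Assumed layout: borough (1 digit) + block (5 digits) + lot (4 digits).
toy["BBL"] = (
    toy["BOROUGH"].astype(str)
    + toy["BLOCK"].astype(str).str.zfill(5)
    + toy["LOT"].astype(str).str.zfill(4)
)
print(toy)
```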

## We import the required packages and load the data.

### To install the seaborn package from within Jupyter itself, in case installing it elsewhere does not work

In [1]:
conda install seaborn 

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [2]:
import numpy as np
import pandas as pd
from scipy import stats, integrate
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks")

In [11]:
imoveis = pd.read_csv('nyc-rolling-sales_twentieth.csv', encoding = "UTF-8", sep = ",")

### Exercise 1: Assess the [types](https://realpython.com/python-data-types/#type-conversion) of the columns and make the necessary changes.

In [12]:
imoveis.head(10)

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ADDRESS,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,SALE PRICE,SALE DATE
0,4,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,392,6,,C2,153 AVENUE B,...,5,0,5,1633,6440,1900,2,C2,"6.625.000,00",2017-07-19 00:00:00
1,5,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,26,,C7,234 EAST 4TH STREET,...,28,3,31,4616,18690,1900,2,C7,-,2016-12-14 00:00:00
2,6,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,39,,C7,197 EAST 3RD STREET,...,16,1,17,2212,7803,1900,2,C7,-,2016-12-09 00:00:00
3,7,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2B,402,21,,C4,154 EAST 7TH STREET,...,10,0,10,2272,6794,1913,2,C4,"3.936.272,00",2016-09-23 00:00:00
4,8,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,404,55,,C2,301 EAST 10TH STREET,...,6,0,6,2369,4615,1900,2,C2,"8.000.000,00",2016-11-17 00:00:00
5,9,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,405,16,,C4,516 EAST 12TH STREET,...,20,0,20,2581,9730,1900,2,C4,-,2017-07-20 00:00:00
6,10,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2B,406,32,,C4,210 AVENUE B,...,8,0,8,1750,4226,1920,2,C4,"3.192.840,00",2016-09-23 00:00:00
7,11,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,407,18,,C7,520 EAST 14TH STREET,...,44,2,46,5163,21007,1900,2,C7,-,2017-07-20 00:00:00
8,12,1,ALPHABET CITY,08 RENTALS - ELEVATOR APARTMENTS,2,379,34,,D5,141 AVENUE D,...,15,0,15,1534,9198,1920,2,D5,-,2017-06-20 00:00:00
9,13,1,ALPHABET CITY,08 RENTALS - ELEVATOR APARTMENTS,2,387,153,,D9,629 EAST 5TH STREET,...,24,0,24,4489,18523,1920,2,D9,"16.232.000,00",2016-11-07 00:00:00


In [5]:
imoveis.columns.T

Index(['Unnamed: 0', 'BOROUGH', 'NEIGHBORHOOD', 'BUILDING CLASS CATEGORY',
       'TAX CLASS AT PRESENT', 'BLOCK', 'LOT', 'EASE-MENT',
       'BUILDING CLASS AT PRESENT', 'ADDRESS', 'APARTMENT NUMBER', 'ZIP CODE',
       'RESIDENTIAL UNITS', 'COMMERCIAL UNITS', 'TOTAL UNITS',
       'LAND SQUARE FEET', 'GROSS SQUARE FEET', 'YEAR BUILT',
       'TAX CLASS AT TIME OF SALE', 'BUILDING CLASS AT TIME OF SALE',
       'SALE PRICE', 'SALE DATE'],
      dtype='object')

In [8]:
imoveis.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16909 entries, 0 to 16908
Data columns (total 22 columns):
 #   Column                          Non-Null Count  Dtype 
---  ------                          --------------  ----- 
 0   Unnamed: 0                      16909 non-null  int64 
 1   BOROUGH                         16909 non-null  int64 
 2   NEIGHBORHOOD                    16909 non-null  object
 3   BUILDING CLASS CATEGORY         16909 non-null  object
 4   TAX CLASS AT PRESENT            16909 non-null  object
 5   BLOCK                           16909 non-null  int64 
 6   LOT                             16909 non-null  int64 
 7   EASE-MENT                       16909 non-null  object
 8   BUILDING CLASS AT PRESENT       16909 non-null  object
 9   ADDRESS                         16909 non-null  object
 10  APARTMENT NUMBER                16909 non-null  object
 11  ZIP CODE                        16909 non-null  int64 
 12  RESIDENTIAL UNITS               16909 non-null

### Let's change the types of some columns.

In [10]:
imoveis['SALE PRICE'] = imoveis['SALE PRICE'].replace(['-'], 1)
imoveis.head(3)

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ADDRESS,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,SALE PRICE,SALE DATE
0,4,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,392,6,,C2,153 AVENUE B,...,5,0,5,1633,6440,1900,2,C2,"6.625.000,00",2017-07-19 00:00:00
1,5,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,26,,C7,234 EAST 4TH STREET,...,28,3,31,4616,18690,1900,2,C7,-,2016-12-14 00:00:00
2,6,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,39,,C7,197 EAST 3RD STREET,...,16,1,17,2212,7803,1900,2,C7,-,2016-12-09 00:00:00


In [22]:
imoveis['PRECO DE VENDAS'] = float(imoveis['PRECO DE VENDAS'].replace(',','.'))

TypeError: cannot convert the series to <class 'float'>

In [23]:
imoveis['PRECO DE VENDAS'].astype(float)

ValueError: could not convert string to float: ' -  '

In [29]:
imoveis['PRECO DE VENDAS'] = imoveis['PRECO DE VENDAS'].apply(lambda x: float(x))

ValueError: could not convert string to float: ' -  '

In [19]:
imoveis['PRECO DE VENDAS'] = float(imoveis['PRECO DE VENDAS'])

TypeError: cannot convert the series to <class 'float'>

In [14]:
imoveis['SALE PRICE'] = imoveis['SALE PRICE'].astype('float64')

KeyError: 'SALE PRICE'
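
The attempts above fail for two different reasons: the built-in `float()` cannot take a whole Series, and `astype(float)` stops at the first value it cannot parse, here the placeholder `' -  '`. Below is a minimal sketch of one way around both. It assumes the prices are stored as strings in the `6.625.000,00` style seen in the first output, and that the dash placeholder means no price was recorded. The toy Series only mimics those formats.

```python
import pandas as pd

# Toy values mimicking the formats shown above; not read from the real file.
raw = pd.Series(["6.625.000,00", " -  ", "3.936.272,00", "712500"])

cleaned = (
    raw.str.strip()                          # drop the padding around the dash
       .str.replace(".", "", regex=False)    # remove thousands separators
       .str.replace(",", ".", regex=False)   # decimal comma -> decimal point
)

# errors="coerce" turns anything unparseable (the lone dash) into NaN
# instead of raising ValueError.
prices = pd.to_numeric(cleaned, errors="coerce")
print(prices)
```

Storing `NaN` rather than a stand-in number such as 1 keeps the unpriced sales out of later means. If the column already uses a decimal point (as in `6625000.00`), the two `str.replace` steps must be skipped, otherwise the decimal point would be stripped as if it were a thousands separator.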

In [13]:
# rename columns

imoveis.rename({'LOCATION': 'LOCAL', 'SALE PRICE': 'PRECO DE VENDAS', 'SALE DATE': 'DATA DE VENDA', 'ADDRESS': 'ENDERECO'}, axis =1, inplace= True)
imoveis.head(3)

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ENDERECO,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,PRECO DE VENDAS,DATA DE VENDA
0,4,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,392,6,,C2,153 AVENUE B,...,5,0,5,1633,6440,1900,2,C2,6625000.00,2017-07-19 00:00:00
1,5,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,26,,C7,234 EAST 4TH STREET,...,28,3,31,4616,18690,1900,2,C7,-,2016-12-14 00:00:00
2,6,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,39,,C7,197 EAST 3RD STREET,...,16,1,17,2212,7803,1900,2,C7,-,2016-12-09 00:00:00
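
Other columns may also benefit from a type change. As a hedged sketch, assuming the sale date is stored as text in the `2017-07-19 00:00:00` form shown above and that `BOROUGH` is better treated as a label than as a quantity, the conversion could look like this (the toy frame is for illustration only):

```python
import pandas as pd

toy = pd.DataFrame({
    "DATA DE VENDA": ["2017-07-19 00:00:00", "2016-12-14 00:00:00"],
    "BOROUGH": [1, 1],
})

# Parse the text timestamps into datetime64 values.
toy["DATA DE VENDA"] = pd.to_datetime(toy["DATA DE VENDA"])

# The borough code is an identifier, so a categorical dtype fits it
# better than an integer that could be averaged by mistake.
toy["BOROUGH"] = toy["BOROUGH"].astype("category")

print(toy.dtypes)
```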


### Let's drop the rows that contain `NaN` values.

In [59]:
imoveis[imoveis['PRECO DE VENDAS'].map(len) >1]

Unnamed: 0.1,Unnamed: 0,BOROUGH,NEIGHBORHOOD,BUILDING CLASS CATEGORY,TAX CLASS AT PRESENT,BLOCK,LOT,EASE-MENT,BUILDING CLASS AT PRESENT,ENDERECO,...,RESIDENTIAL UNITS,COMMERCIAL UNITS,TOTAL UNITS,LAND SQUARE FEET,GROSS SQUARE FEET,YEAR BUILT,TAX CLASS AT TIME OF SALE,BUILDING CLASS AT TIME OF SALE,PRECO DE VENDAS,DATA DE VENDA
0,4,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,392,6,,C2,153 AVENUE B,...,5,0,5,1633,6440,1900,2,C2,6625000.00,2017-07-19 00:00:00
1,5,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,26,,C7,234 EAST 4TH STREET,...,28,3,31,4616,18690,1900,2,C7,-,2016-12-14 00:00:00
2,6,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2,399,39,,C7,197 EAST 3RD STREET,...,16,1,17,2212,7803,1900,2,C7,-,2016-12-09 00:00:00
3,7,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2B,402,21,,C4,154 EAST 7TH STREET,...,10,0,10,2272,6794,1913,2,C4,3936272.00,2016-09-23 00:00:00
4,8,1,ALPHABET CITY,07 RENTALS - WALKUP APARTMENTS,2A,404,55,,C2,301 EAST 10TH STREET,...,6,0,6,2369,4615,1900,2,C2,8000000.00,2016-11-17 00:00:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16904,16908,1,UPPER WEST SIDE (79-96),10 COOPS - ELEVATOR APARTMENTS,2,1237,29,,D4,"201 WEST 89TH STREET, 9A",...,0,0,0,-,-,1925,2,D4,-,2017-02-08 00:00:00
16905,16909,1,UPPER WEST SIDE (79-96),10 COOPS - ELEVATOR APARTMENTS,2,1237,29,,D4,"201 WEST 89TH STREET, 10C",...,0,0,0,-,-,1925,2,D4,712500.00,2017-02-13 00:00:00
16906,16910,1,UPPER WEST SIDE (79-96),10 COOPS - ELEVATOR APARTMENTS,2,1237,29,,D4,"201 WEST 89 ST, 10D",...,0,0,0,-,-,1925,2,D4,740000.00,2017-02-13 00:00:00
16907,16911,1,UPPER WEST SIDE (79-96),10 COOPS - ELEVATOR APARTMENTS,2,1237,29,,D4,"201 WEST 89TH STREET, 7G",...,0,0,0,-,-,1925,2,D4,1800000.00,2017-04-27 00:00:00
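
The filter above keeps the dash rows because the placeholder is padded with spaces (`' -  '`) and is therefore longer than one character. A sketch of an alternative, assuming the price column is first converted so that the placeholder becomes `NaN`; the three rows are modeled on the output above and the name `sem_nan` is hypothetical:

```python
import pandas as pd

toy = pd.DataFrame({
    "ENDERECO": ["153 AVENUE B", "234 EAST 4TH STREET", "154 EAST 7TH STREET"],
    "PRECO DE VENDAS": ["6625000.00", " -  ", "3936272.00"],
})

# Convert to numbers; the padded dash cannot be parsed and becomes NaN.
toy["PRECO DE VENDAS"] = pd.to_numeric(
    toy["PRECO DE VENDAS"].str.strip(), errors="coerce"
)

# Drop only the rows where the price is missing.
sem_nan = toy.dropna(subset=["PRECO DE VENDAS"])
print(sem_nan)
```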


### What is the average price per square meter in NY?

In [100]:
imoveis['PRECO DE VENDAS']

0        6625000
1            -  
2            -  
3        3936272
4        8000000
          ...   
16904        -  
16905     712500
16906     740000
16907    1800000
16908        -  
Name: PRECO DE VENDAS, Length: 16909, dtype: object
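
One possible approach, sketched with made-up numbers: the dataset records areas in square feet (`GROSS SQUARE FEET`), so the sketch computes a price per square foot first and converts to square meters afterwards (1 square foot = 0.09290304 square meters). It assumes both columns are already numeric, with missing prices as `NaN`; the constant name `SQFT_TO_M2` is hypothetical.

```python
import numpy as np
import pandas as pd

SQFT_TO_M2 = 0.09290304  # one square foot expressed in square meters

# Made-up values for illustration.
toy = pd.DataFrame({
    "PRECO DE VENDAS": [1_000_000.0, 500_000.0, np.nan, 750_000.0],
    "GROSS SQUARE FEET": [1000.0, 500.0, 800.0, 0.0],
})

# Keep rows with a recorded price and a positive area, so the division
# never produces inf or NaN.
valid = toy[toy["PRECO DE VENDAS"].notna() & (toy["GROSS SQUARE FEET"] > 0)].copy()
valid["PRICE PER SQUARED FEET"] = valid["PRECO DE VENDAS"] / valid["GROSS SQUARE FEET"]

mean_per_sqft = valid["PRICE PER SQUARED FEET"].mean()
mean_per_m2 = mean_per_sqft / SQFT_TO_M2
print(mean_per_sqft, mean_per_m2)
```

This averages the per-sale ratios. Dividing the total price by the total area instead would weight large buildings more heavily, so the two figures generally differ.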

### What is the average price per square meter in each `BLOCK`? Sort the data to show which one is the most expensive.
Note: do the calculation with both groupby and pivot tables
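
A minimal sketch of both routes, assuming a numeric `PRICE PER SQUARED FEET` column has already been derived; the toy values are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "BLOCK": [392, 392, 399, 399],
    "PRICE PER SQUARED FEET": [1000.0, 1200.0, 400.0, 600.0],
})

# Route 1: groupby returns a Series indexed by BLOCK.
by_groupby = (
    toy.groupby("BLOCK")["PRICE PER SQUARED FEET"]
       .mean()
       .sort_values(ascending=False)
)

# Route 2: pivot_table returns a one-column DataFrame with the same numbers.
by_pivot = (
    toy.pivot_table(index="BLOCK", values="PRICE PER SQUARED FEET", aggfunc="mean")
       .sort_values("PRICE PER SQUARED FEET", ascending=False)
)

print(by_groupby)
print(by_pivot)
```

After sorting in descending order, the most expensive block is the first entry of either result.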

### In which `BLOCK` is the dispersion of prices per square meter greatest? Sort the values to identify the largest.

(Remember the coefficient of variation formula for measuring dispersion)

**Hints**
* The first option is to define a function that works on arrays and use `.apply()`

* The second is to generate two series: 
  - one with the `.std()` method, divided by another series generated with `mean()`
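
Both hints can be sketched as follows. The coefficient of variation is the standard deviation divided by the mean. The sketch assumes a numeric `PRICE PER SQUARED FEET` column and uses the sample standard deviation (`ddof=1`) in the NumPy version so that it matches the pandas default; the function name `coef_variation` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "BLOCK": [392, 392, 392, 399, 399, 399],
    "PRICE PER SQUARED FEET": [900.0, 1000.0, 1100.0, 200.0, 500.0, 800.0],
})
grouped = toy.groupby("BLOCK")["PRICE PER SQUARED FEET"]

# Option 1: a function over arrays, applied to each group.
def coef_variation(values):
    arr = np.asarray(values, dtype=float)
    return arr.std(ddof=1) / arr.mean()

cv_apply = grouped.apply(coef_variation).sort_values(ascending=False)

# Option 2: one series of standard deviations divided by one series of means.
cv_series = (grouped.std() / grouped.mean()).sort_values(ascending=False)

print(cv_apply)
print(cv_series)
```

The block at the top of either sorted series is the one with the greatest relative dispersion.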

### In which neighborhood are the apartments largest? 
Note: the calculation can be done with either groupby or pivot tables
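
A hedged sketch, assuming `GROSS SQUARE FEET` is an acceptable proxy for size and that its dash placeholders are converted to `NaN` first. Some of the toy values are made up, and the name `mean_size` is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "NEIGHBORHOOD": [
        "ALPHABET CITY", "ALPHABET CITY",
        "UPPER WEST SIDE (79-96)", "UPPER WEST SIDE (79-96)",
    ],
    "GROSS SQUARE FEET": ["6440", "18690", " -  ", "30000"],
})

# The padded dash cannot be parsed and becomes NaN, which mean() skips.
toy["GROSS SQUARE FEET"] = pd.to_numeric(
    toy["GROSS SQUARE FEET"].str.strip(), errors="coerce"
)

mean_size = (
    toy.groupby("NEIGHBORHOOD")["GROSS SQUARE FEET"]
       .mean()
       .sort_values(ascending=False)
)
largest = mean_size.idxmax()

# The same aggregation expressed as a pivot table.
mean_size_pivot = toy.pivot_table(
    index="NEIGHBORHOOD", values="GROSS SQUARE FEET", aggfunc="mean"
)

print(mean_size)
print(largest)
```

Since the area refers to the whole building, dividing it by `TOTAL UNITS` before averaging may give a figure closer to the size of an individual apartment.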

### In general, can you see any difference in the average price per square meter of the apartments according to their year of construction? What can you say about the relationship between the year of construction and their average total size in square feet?
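
One way to approach this, sketched with made-up values, is to group the sales by construction decade and compare the averages. The `DECADE` column is hypothetical, and treating a `YEAR BUILT` of 0 as "unknown" is an assumption to be checked against the data.

```python
import pandas as pd

toy = pd.DataFrame({
    "YEAR BUILT": [1900, 1905, 1920, 1925, 0],
    "PRICE PER SQUARED FEET": [1000.0, 1200.0, 800.0, 600.0, 500.0],
    "GROSS SQUARE FEET": [6000.0, 8000.0, 4000.0, 5000.0, 3000.0],
})

# Assumption: a year of 0 marks an unknown construction date.
known = toy[toy["YEAR BUILT"] > 0].copy()

# Bucket the years into decades so each group has enough sales to average.
known["DECADE"] = (known["YEAR BUILT"] // 10) * 10

by_decade = known.groupby("DECADE")[
    ["PRICE PER SQUARED FEET", "GROSS SQUARE FEET"]
].mean()
print(by_decade)
```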

### Build a `DataFrame` that aggregates the information on price per square foot (`PRICE PER SQUARED FEET`), residential units (`RESIDENTIAL UNITS`) and commercial units (`COMMERCIAL UNITS`) by `BLOCK` and neighborhood (`NEIGHBORHOOD`). Report the central tendency and the dispersion of both distributions.
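
A minimal sketch of such a summary, assuming the three columns are numeric and using mean and median for central tendency and the standard deviation for dispersion; the toy values are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "NEIGHBORHOOD": ["ALPHABET CITY"] * 4,
    "BLOCK": [392, 392, 399, 399],
    "PRICE PER SQUARED FEET": [1000.0, 1200.0, 400.0, 600.0],
    "RESIDENTIAL UNITS": [5, 7, 28, 16],
    "COMMERCIAL UNITS": [0, 2, 3, 1],
})

cols = ["PRICE PER SQUARED FEET", "RESIDENTIAL UNITS", "COMMERCIAL UNITS"]

# One row per (NEIGHBORHOOD, BLOCK); the columns form a two-level index:
# the original column name, then the statistic.
summary = toy.groupby(["NEIGHBORHOOD", "BLOCK"])[cols].agg(["mean", "median", "std"])
print(summary)
```

A single statistic can then be read with a tuple key, for example `summary.loc[("ALPHABET CITY", 392), ("RESIDENTIAL UNITS", "mean")]`.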