
# Understanding the problem


## Getting to know the data

Our goal is to price short-term property rentals.

Here we only prepare the data: we transform and manipulate the provided data so that a solution to the problem can be built on top of it later.

In [1]:
import pandas as pd
dados = pd.read_json('/content/dados_hospedagem.json')
dados.head()

| | info_moveis |
|---|---|
| 0 | {'avaliacao_geral': '10.0', 'experiencia_local... |
| 1 | {'avaliacao_geral': '10.0', 'experiencia_local... |
| 2 | {'avaliacao_geral': '10.0', 'experiencia_local... |
| 3 | {'avaliacao_geral': '10.0', 'experiencia_local... |
| 4 | {'avaliacao_geral': '10.0', 'experiencia_local... |


In [2]:
dados = pd.json_normalize(dados['info_moveis'])
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | [This clean and comfortable one bedroom sits r... | [Lower Queen Anne is near the Seattle Center (... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [Real Bed, Futon, Futon, Pull-out Sofa, Real B... | [{Internet,"Wireless Internet",Kitchen,"Free P... | [\$0, \$0, \$0, \$0, \$0, \$350.00, \$350.00, \$350.00... | [\$0, \$0, \$0, \$20.00, \$15.00, \$28.00, \$35.00, \$... | [\$110.00, \$45.00, \$55.00, \$52.00, \$85.00, \$50.... |
| 1 | 10.0 | -- | 10 | [Welcome to the heart of the 'Ballard Brewery ... | [--, Capital Hill is the heart of Seattle, bor... | [2, 3, 2, 3, 3, 3, 2, 1, 2, 2, 2] | [3, 4, 2, 3, 3, 3, 3, 3, 3, 4, 3] | [5, 6, 8, 3, 3, 5, 4, 5, 6, 7, 4] | [Real Bed, Real Bed, Real Bed, Real Bed, Real ... | [{TV,Internet,"Wireless Internet",Kitchen,"Fre... | [\$500.00, \$300.00, \$0, \$300.00, \$300.00, \$360.... | [\$125.00, \$100.00, \$85.00, \$110.00, \$110.00, \$... | [\$350.00, \$300.00, \$425.00, \$300.00, \$285.00, ... |
| 2 | 10.0 | -- | 11 | [New modern house built in 2013. Spectacular ... | [Upper Queen Anne is a charming neighborhood f... | [4] | [5] | [7] | [Real Bed] | [{TV,"Cable TV",Internet,"Wireless Internet","... | [\$1,000.00] | [\$300.00] | [\$975.00] |
| 3 | 10.0 | -- | 12 | [Our NW style home is 3200+ sq ft with 3 level... | [The Views from our top floor! Wallingford ha... | [3, 3, 3, 3, 3, 3, 3, 3] | [6, 6, 5, 5, 5, 5, 4, 4] | [6, 6, 7, 8, 7, 7, 6, 6] | [Real Bed, Real Bed, Real Bed, Real Bed, Real ... | [{Internet,"Wireless Internet",Kitchen,"Free P... | [\$500.00, \$500.00, \$500.00, \$500.00, \$500.00, ... | [\$225.00, \$300.00, \$250.00, \$250.00, \$250.00, ... | [\$490.00, \$550.00, \$350.00, \$350.00, \$350.00, ... |
| 4 | 10.0 | -- | 14 | [Perfect for groups. 2 bedrooms, full bathroom... | [Safeway grocery store within walking distance... | [2, 3] | [2, 6] | [3, 9] | [Real Bed, Real Bed] | [{TV,Internet,"Wireless Internet",Kitchen,"Fre... | [\$300.00, \$2,000.00] | [\$40.00, \$150.00] | [\$200.00, \$545.00] |

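The outputs above show that each raw record is a dictionary in which some fields are scalars (`avaliacao_geral`, `experiencia_local`, `max_hospedes`) and the others are parallel lists. The sketch below reproduces `pd.json_normalize` on a made-up miniature of that shape (the values are hypothetical) to show that every dictionary key becomes its own column while list values are kept intact.

```python
import pandas as pd

# Hypothetical miniature of the raw file: one dictionary per row, with
# scalar fields and list fields side by side
raw = pd.DataFrame({
    "info_moveis": [
        {"max_hospedes": "1", "quantidade_quartos": ["1", "1"], "preco": ["$110.00", "$45.00"]},
        {"max_hospedes": "2", "quantidade_quartos": ["2"], "preco": ["$1,000.00"]},
    ]
})

# json_normalize turns each dictionary key into a column; lists stay as cell values
flat = pd.json_normalize(raw["info_moveis"])
print(flat.columns.tolist())  # ['max_hospedes', 'quantidade_quartos', 'preco']
print(flat.loc[1, "preco"])   # ['$1,000.00']
```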

Whenever we work with a dataset, we need to know what information it carries; only then can we study and analyze it and design an appropriate analysis and treatment for it.

- `avaliacao_geral`: the average of the ratings given to stays at the property.
- `experiencia_local`: describes the experiences offered during the stay.
- `max_hospedes`: the maximum number of guests the property allows.
- `descricao_local`: description of the property.
- `descricao_vizinhanca`: description of the neighborhood around the property.
- `quantidade_banheiros`: number of bathrooms available.
- `quantidade_quartos`: number of bedrooms available.
- `quantidade_camas`: number of beds available.
- `modelo_cama`: type of bed offered.
- `comodidades`: amenities offered by the property.
- `taxa_deposito`: minimum security deposit required for the stay.
- `taxa_limpeza`: fee charged for the cleaning service.
- `preco`: base nightly price charged for the property.

## Numeric data

In [3]:
colunas = list(dados.columns)
colunas

['avaliacao_geral',
 'experiencia_local',
 'max_hospedes',
 'descricao_local',
 'descricao_vizinhanca',
 'quantidade_banheiros',
 'quantidade_quartos',
 'quantidade_camas',
 'modelo_cama',
 'comodidades',
 'taxa_deposito',
 'taxa_limpeza',
 'preco']

In [4]:
# Expand the list columns (descricao_local onward) into one row per listing
dados = dados.explode(colunas[3:])
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | This clean and comfortable one bedroom sits ri... | Lower Queen Anne is near the Seattle Center (s... | 1 | 1 | 1 | Real Bed | {Internet,"Wireless Internet",Kitchen,"Free Pa... | \$0 | \$0 | \$110.00 |
| 0 | 10.0 | -- | 1 | Our century old Upper Queen Anne house is loca... | Upper Queen Anne is a really pleasant, unique ... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | \$0 | \$0 | \$45.00 |
| 0 | 10.0 | -- | 1 | Cozy room in two-bedroom apartment along the l... | The convenience of being in Seattle but on the... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | \$0 | \$0 | \$55.00 |
| 0 | 10.0 | -- | 1 | Very lovely and cozy room for one. Convenientl... | Ballard is lovely, vibrant and one of the most... | 1 | 1 | 1 | Pull-out Sofa | {Internet,"Wireless Internet",Kitchen,"Free Pa... | \$0 | \$20.00 | \$52.00 |
| 0 | 10.0 | -- | 1 | The “Studio at Mibbett Hollow' is in a Beautif... | -- | 1 | 1 | 1 | Real Bed | {"Wireless Internet",Kitchen,"Free Parking on ... | \$0 | \$15.00 | \$85.00 |


In [5]:
# Exploded rows repeat their original index label; rebuild a sequential index
dados.reset_index(inplace=True, drop=True)
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | This clean and comfortable one bedroom sits ri... | Lower Queen Anne is near the Seattle Center (s... | 1 | 1 | 1 | Real Bed | {Internet,"Wireless Internet",Kitchen,"Free Pa... | \$0 | \$0 | \$110.00 |
| 1 | 10.0 | -- | 1 | Our century old Upper Queen Anne house is loca... | Upper Queen Anne is a really pleasant, unique ... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | \$0 | \$0 | \$45.00 |
| 2 | 10.0 | -- | 1 | Cozy room in two-bedroom apartment along the l... | The convenience of being in Seattle but on the... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | \$0 | \$0 | \$55.00 |
| 3 | 10.0 | -- | 1 | Very lovely and cozy room for one. Convenientl... | Ballard is lovely, vibrant and one of the most... | 1 | 1 | 1 | Pull-out Sofa | {Internet,"Wireless Internet",Kitchen,"Free Pa... | \$0 | \$20.00 | \$52.00 |
| 4 | 10.0 | -- | 1 | The “Studio at Mibbett Hollow' is in a Beautif... | -- | 1 | 1 | 1 | Real Bed | {"Wireless Internet",Kitchen,"Free Parking on ... | \$0 | \$15.00 | \$85.00 |

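The explode and `reset_index` steps above can be reproduced on a small hypothetical frame (the values are made up). It shows that exploding several columns at once keeps their lists aligned row by row, that the exploded rows repeat their original index label, and that `explode` refuses rows whose lists have different lengths.

```python
import pandas as pd

# Hypothetical frame shaped like the normalized data: a scalar column plus
# list columns whose lists have the same length within each row
df = pd.DataFrame({
    "max_hospedes": ["1", "2"],
    "quantidade_quartos": [["1", "1"], ["2"]],
    "preco": [["$110.00", "$45.00"], ["$1,000.00"]],
})

# Exploding several columns at once (pandas >= 1.3) keeps the lists aligned
exploded = df.explode(["quantidade_quartos", "preco"])
print(exploded.index.tolist())    # [0, 0, 1]: the original labels repeat

# drop=True discards the old labels instead of keeping them as a new column
exploded = exploded.reset_index(drop=True)
print(exploded.index.tolist())    # [0, 1, 2]

# Lists of different lengths in the same row cannot be aligned
mismatch_error = None
try:
    pd.DataFrame({"a": [[1, 2]], "b": [[1]]}).explode(["a", "b"])
except ValueError as err:
    mismatch_error = err
print(mismatch_error)
```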

In [6]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   avaliacao_geral       3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   object
 3   descricao_local       3818 non-null   object
 4   descricao_vizinhanca  3818 non-null   object
 5   quantidade_banheiros  3818 non-null   object
 6   quantidade_quartos    3818 non-null   object
 7   quantidade_camas      3818 non-null   object
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  taxa_deposito         3818 non-null   object
 11  taxa_limpeza          3818 non-null   object
 12  preco                 3818 non-null   object
dtypes: object(13)
memory usage: 387.9+ KB


In [7]:
import numpy as np

In [9]:
dados['max_hospedes'] = dados['max_hospedes'].astype(np.int64)

In [10]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   avaliacao_geral       3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   int64 
 3   descricao_local       3818 non-null   object
 4   descricao_vizinhanca  3818 non-null   object
 5   quantidade_banheiros  3818 non-null   object
 6   quantidade_quartos    3818 non-null   object
 7   quantidade_camas      3818 non-null   object
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  taxa_deposito         3818 non-null   object
 11  taxa_limpeza          3818 non-null   object
 12  preco                 3818 non-null   object
dtypes: int64(1), object(12)
memory usage: 387.9+ KB


In [11]:
col_numericas = ['quantidade_banheiros','quantidade_quartos','quantidade_camas']
dados[col_numericas] = dados[col_numericas].astype(np.int64)

In [12]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   avaliacao_geral       3818 non-null   object
 1   experiencia_local     3818 non-null   object
 2   max_hospedes          3818 non-null   int64 
 3   descricao_local       3818 non-null   object
 4   descricao_vizinhanca  3818 non-null   object
 5   quantidade_banheiros  3818 non-null   int64 
 6   quantidade_quartos    3818 non-null   int64 
 7   quantidade_camas      3818 non-null   int64 
 8   modelo_cama           3818 non-null   object
 9   comodidades           3818 non-null   object
 10  taxa_deposito         3818 non-null   object
 11  taxa_limpeza          3818 non-null   object
 12  preco                 3818 non-null   object
dtypes: int64(4), object(9)
memory usage: 387.9+ KB

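The `astype(np.int64)` casts above succeed because every value in those columns is a clean integer string. If a column could contain a placeholder such as the `'--'` seen in `experiencia_local`, `astype` raises `ValueError`. A more tolerant sketch, on hypothetical values, uses `pd.to_numeric(errors="coerce")` followed by pandas' nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing a non-numeric placeholder
s = pd.Series(["1", "2", "--"])

# A plain cast fails on '--'
raised = False
try:
    s.astype(np.int64)
except ValueError:
    raised = True

# errors="coerce" turns unparsable entries into NaN instead of raising
numbers = pd.to_numeric(s, errors="coerce")
print(numbers.tolist())        # [1.0, 2.0, nan]

# "Int64" (capital I) is a nullable integer dtype that can hold <NA>
as_int = numbers.astype("Int64")
print(as_int.isna().sum())     # 1
```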

In [13]:
dados['avaliacao_geral'] = dados['avaliacao_geral'].astype(np.float64)
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   avaliacao_geral       3162 non-null   float64
 1   experiencia_local     3818 non-null   object 
 2   max_hospedes          3818 non-null   int64  
 3   descricao_local       3818 non-null   object 
 4   descricao_vizinhanca  3818 non-null   object 
 5   quantidade_banheiros  3818 non-null   int64  
 6   quantidade_quartos    3818 non-null   int64  
 7   quantidade_camas      3818 non-null   int64  
 8   modelo_cama           3818 non-null   object 
 9   comodidades           3818 non-null   object 
 10  taxa_deposito         3818 non-null   object 
 11  taxa_limpeza          3818 non-null   object 
 12  preco                 3818 non-null   object 
dtypes: float64(1), int64(4), object(8)
memory usage: 387.9+ KB

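After this cast, the non-null count of `avaliacao_geral` drops from 3818 to 3162, so 656 entries became `NaN`. The `info()` output alone does not say why. One plausible cause, assumed here and not verified against `dados_hospedagem.json`, is that missing ratings were stored as text such as `'nan'`: while it is a string it counts as non-null, but it parses to `NaN`. The sketch reproduces that effect on made-up values; on the real data, `dados['avaliacao_geral'].isna().sum()` returns the count directly.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; 'nan' is a non-null string before the cast
s = pd.Series(["10.0", "9.5", "nan"], dtype=object)
print(s.notna().sum())     # 3

# After casting, the 'nan' string becomes a real missing value
f = s.astype(np.float64)
print(f.isna().sum())      # 1
```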

In [16]:
# Strip the dollar sign and thousands separator, then convert to float
dados['preco'] = dados['preco'].apply(lambda x: x.replace('$','').replace(',','').strip())
dados['preco'] = dados['preco'].astype(np.float64)
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   avaliacao_geral       3162 non-null   float64
 1   experiencia_local     3818 non-null   object 
 2   max_hospedes          3818 non-null   int64  
 3   descricao_local       3818 non-null   object 
 4   descricao_vizinhanca  3818 non-null   object 
 5   quantidade_banheiros  3818 non-null   int64  
 6   quantidade_quartos    3818 non-null   int64  
 7   quantidade_camas      3818 non-null   int64  
 8   modelo_cama           3818 non-null   object 
 9   comodidades           3818 non-null   object 
 10  taxa_deposito         3818 non-null   object 
 11  taxa_limpeza          3818 non-null   object 
 12  preco                 3818 non-null   float64
dtypes: float64(2), int64(4), object(7)
memory usage: 387.9+ KB


In [20]:
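# DataFrame.map applies the function to every cell (pandas >= 2.1; older versions call it applymap)
dados[['taxa_deposito','taxa_limpeza']] = dados[['taxa_deposito','taxa_limpeza']].map(lambda x: x.replace('$','').replace(',','').strip())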

In [21]:
dados[['taxa_deposito','taxa_limpeza']] = dados[['taxa_deposito','taxa_limpeza']].astype(np.float64)
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3818 entries, 0 to 3817
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   avaliacao_geral       3162 non-null   float64
 1   experiencia_local     3818 non-null   object 
 2   max_hospedes          3818 non-null   int64  
 3   descricao_local       3818 non-null   object 
 4   descricao_vizinhanca  3818 non-null   object 
 5   quantidade_banheiros  3818 non-null   int64  
 6   quantidade_quartos    3818 non-null   int64  
 7   quantidade_camas      3818 non-null   int64  
 8   modelo_cama           3818 non-null   object 
 9   comodidades           3818 non-null   object 
 10  taxa_deposito         3818 non-null   float64
 11  taxa_limpeza          3818 non-null   float64
 12  preco                 3818 non-null   float64
dtypes: float64(4), int64(4), object(5)
memory usage: 387.9+ KB

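The three monetary columns go through the same cleanup twice, once with `apply` and once with `map`. A minimal sketch that bundles it into one vectorized step is shown below; `parse_currency` is a hypothetical helper name and the sample values only mimic the formats seen in the data (`$0`, `$1,000.00`).

```python
import pandas as pd

def parse_currency(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like '$1,000.00' to floats."""
    # Remove '$' and ',' in a single vectorized regex replacement
    return col.str.replace(r"[$,]", "", regex=True).str.strip().astype(float)

# Made-up sample in the same formats as the notebook's columns
df = pd.DataFrame({
    "taxa_deposito": ["$0", "$1,000.00"],
    "taxa_limpeza": ["$15.00", "$300.00"],
    "preco": ["$110.00", "$2,000.00"],
})
money_cols = ["taxa_deposito", "taxa_limpeza", "preco"]

# DataFrame.apply calls the helper once per column
df[money_cols] = df[money_cols].apply(parse_currency)
print(df["preco"].tolist())   # [110.0, 2000.0]
```

Using `.str.replace` on the whole column avoids a Python-level lambda per cell, and the same call works for any number of columns.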

## Text data
Text tokenization

In [24]:
# Convert to lowercase
dados['descricao_local'] = dados['descricao_local'].str.lower()
dados['descricao_local']

| | descricao_local |
|---|---|
| 0 | this clean and comfortable one bedroom sits ri... |
| 1 | our century old upper queen anne house is loca... |
| 2 | cozy room in two-bedroom apartment along the l... |
| 3 | very lovely and cozy room for one. convenientl... |
| 4 | the “studio at mibbett hollow' is in a beautif... |
| ... | ... |
| 3813 | beautiful craftsman home in the historic wedgw... |
| 3814 | located in a very easily accessible area of se... |
| 3815 | this home is fully furnished and available wee... |
| 3816 | this business-themed modern home features: \*h... |


In [25]:
dados['descricao_local'][3169]

"built, run and supported by seattle tech and start up veterans, grokhome's focus is to create a supportive environment for smart people working on interesting projects, start ups and more. this listing is an upper bunk, in a 2-person shared room. *note: this fall, there will be major renovations happening on one kitchen and bathroom at a time. there will always be two other working kitchens and two working bathrooms in the house. we'll work to minimize the impact these renovations have on your stay. **this listing is only available to those working in the tech/science space. live in a hacker house, and immerse yourself in the seattle tech scene. you can expect to be surrounded by smart people solving big problems or working on something fun. we have frequent demo nights, and love when our guests share something they are passionate about. if you're new to the city, our deep ties to the seattle tech scene can help you get involved. expand your network, develop your ideas, and learn some

In [28]:
# Replace every character that is not a letter (a-z, A-Z), a digit (0-9), a hyphen (-) or an apostrophe (') with a space
dados['descricao_local'] = dados['descricao_local'].str.replace(r"[^a-zA-Z0-9\-']",' ',regex=True)
dados['descricao_local'][3169]

"built  run and supported by seattle tech and start up veterans  grokhome's focus is to create a supportive environment for smart people working on interesting projects  start ups and more  this listing is an upper bunk  in a 2-person shared room   note  this fall  there will be major renovations happening on one kitchen and bathroom at a time  there will always be two other working kitchens and two working bathrooms in the house  we'll work to minimize the impact these renovations have on your stay    this listing is only available to those working in the tech science space  live in a hacker house  and immerse yourself in the seattle tech scene  you can expect to be surrounded by smart people solving big problems or working on something fun  we have frequent demo nights  and love when our guests share something they are passionate about  if you're new to the city  our deep ties to the seattle tech scene can help you get involved  expand your network  develop your ideas  and learn some

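The character class used above can be checked on a short made-up string. The sketch writes the pattern as a raw string (`r"..."`), which keeps `\-` intact without the "invalid escape sequence" warning that recent Python versions emit for a plain string, and confirms that `re.sub` and the pandas `.str.replace` call behave the same. Note that punctuation is replaced by a space rather than deleted, which is why double spaces appear.

```python
import re
import pandas as pd

# Same character class as the notebook cell, written as a raw string
pattern = r"[^a-zA-Z0-9\-']"

# Hypothetical sample text
sample = "cozy room, 2-person shared! *note: we'll"

cleaned = re.sub(pattern, " ", sample)
cleaned_pd = pd.Series([sample]).str.replace(pattern, " ", regex=True)[0]
print(repr(cleaned))   # "cozy room  2-person shared   note  we'll"
```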
In [31]:
# Remove standalone hyphens (not attached to a word character on either side)
dados['descricao_local'] = dados['descricao_local'].str.replace(r'(?<!\w)-(?!\w)',' ',regex=True)
dados['descricao_local']

| | descricao_local |
|---|---|
| 0 | this clean and comfortable one bedroom sits ri... |
| 1 | our century old upper queen anne house is loca... |
| 2 | cozy room in two-bedroom apartment along the l... |
| 3 | very lovely and cozy room for one convenientl... |
| 4 | the studio at mibbett hollow' is in a beautif... |
| ... | ... |
| 3813 | beautiful craftsman home in the historic wedgw... |
| 3814 | located in a very easily accessible area of se... |
| 3815 | this home is fully furnished and available wee... |
| 3816 | this business-themed modern home features h... |

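The hyphen pattern relies on two lookarounds: `(?<!\w)` requires that the hyphen is not preceded by a word character and `(?!\w)` requires that it is not followed by one. The sketch below, on a hypothetical sentence, shows that a hyphen inside a compound word survives while isolated hyphens, including a `--` placeholder, are replaced and then disappear on `split()`.

```python
import re

# Not preceded by a word character, not followed by a word character
loose_hyphen = r"(?<!\w)-(?!\w)"

# Hypothetical text with a compound word, a lone hyphen and a '--' placeholder
sample = "two-bedroom apartment - close to downtown --"

tokens = re.sub(loose_hyphen, " ", sample).split()
print(tokens)   # ['two-bedroom', 'apartment', 'close', 'to', 'downtown']
```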

In [33]:
dados['descricao_local'] = dados['descricao_local'].str.split()
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | [this, clean, and, comfortable, one, bedroom, ... | Lower Queen Anne is near the Seattle Center (s... | 1 | 1 | 1 | Real Bed | {Internet,"Wireless Internet",Kitchen,"Free Pa... | 0.0 | 0.0 | 110.0 |
| 1 | 10.0 | -- | 1 | [our, century, old, upper, queen, anne, house,... | Upper Queen Anne is a really pleasant, unique ... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | 0.0 | 0.0 | 45.0 |
| 2 | 10.0 | -- | 1 | [cozy, room, in, two-bedroom, apartment, along... | The convenience of being in Seattle but on the... | 1 | 1 | 1 | Futon | {TV,Internet,"Wireless Internet",Kitchen,"Free... | 0.0 | 0.0 | 55.0 |
| 3 | 10.0 | -- | 1 | [very, lovely, and, cozy, room, for, one, conv... | Ballard is lovely, vibrant and one of the most... | 1 | 1 | 1 | Pull-out Sofa | {Internet,"Wireless Internet",Kitchen,"Free Pa... | 0.0 | 20.0 | 52.0 |
| 4 | 10.0 | -- | 1 | [the, studio, at, mibbett, hollow', is, in, a,... | -- | 1 | 1 | 1 | Real Bed | {"Wireless Internet",Kitchen,"Free Parking on ... | 0.0 | 15.0 | 85.0 |


In [35]:
# Delete the characters {, } and " (they are removed, not replaced by a space)
dados['comodidades'] = dados['comodidades'].str.replace(r'[{}"]','',regex=True)

In [36]:
dados['comodidades'] = dados['comodidades'].str.split(',')
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | [this, clean, and, comfortable, one, bedroom, ... | Lower Queen Anne is near the Seattle Center (s... | 1 | 1 | 1 | Real Bed | [Internet, Wireless Internet, Kitchen, Free Pa... | 0.0 | 0.0 | 110.0 |
| 1 | 10.0 | -- | 1 | [our, century, old, upper, queen, anne, house,... | Upper Queen Anne is a really pleasant, unique ... | 1 | 1 | 1 | Futon | [TV, Internet, Wireless Internet, Kitchen, Fre... | 0.0 | 0.0 | 45.0 |
| 2 | 10.0 | -- | 1 | [cozy, room, in, two-bedroom, apartment, along... | The convenience of being in Seattle but on the... | 1 | 1 | 1 | Futon | [TV, Internet, Wireless Internet, Kitchen, Fre... | 0.0 | 0.0 | 55.0 |
| 3 | 10.0 | -- | 1 | [very, lovely, and, cozy, room, for, one, conv... | Ballard is lovely, vibrant and one of the most... | 1 | 1 | 1 | Pull-out Sofa | [Internet, Wireless Internet, Kitchen, Free Pa... | 0.0 | 20.0 | 52.0 |
| 4 | 10.0 | -- | 1 | [the, studio, at, mibbett, hollow', is, in, a,... | -- | 1 | 1 | 1 | Real Bed | [Wireless Internet, Kitchen, Free Parking on P... | 0.0 | 15.0 | 85.0 |

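The amenities cleanup can be checked on a hypothetical value in the same `{A,"B C",D}` format. The second half is a caveat, not something observed in this dataset: if a quoted amenity name ever contained a comma, deleting the quotes and splitting on `,` would break it in two, whereas Python's standard `csv` module respects the quotes.

```python
import csv
import pandas as pd

# Hypothetical raw value in the same format as 'comodidades'
raw = pd.Series(['{TV,Internet,"Wireless Internet",Kitchen}'])

# [{}"] matches any one of {, } or " and deletes it; then split on commas
amenities = raw.str.replace(r'[{}"]', "", regex=True).str.split(",")
print(amenities[0])   # ['TV', 'Internet', 'Wireless Internet', 'Kitchen']

# Hypothetical value with a comma inside a quoted name
tricky = '{TV,"Washer, Dryer",Kitchen}'
naive = tricky.replace("{", "").replace("}", "").replace('"', "").split(",")
parsed = next(csv.reader([tricky.strip("{}")]))
print(naive)    # ['TV', 'Washer', ' Dryer', 'Kitchen']
print(parsed)   # ['TV', 'Washer, Dryer', 'Kitchen']
```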

In [41]:
dados['descricao_vizinhanca'] = dados['descricao_vizinhanca'].str.replace(r"[^a-zA-Z0-9\-']",' ',regex=True).str.lower()
dados['descricao_vizinhanca'].head()


| | descricao_vizinhanca |
|---|---|
| 0 | lower queen anne is near the seattle center s... |
| 1 | upper queen anne is a really pleasant unique ... |
| 2 | the convenience of being in seattle but on the... |
| 3 | ballard is lovely vibrant and one of the most... |
| 4 | -- |


In [44]:
# Remove standalone hyphens with a negative lookbehind (?<!\w), as done for descricao_local, then split into tokens
dados['descricao_vizinhanca'] = dados['descricao_vizinhanca'].str.replace(r'(?<!\w)-(?!\w)',' ',regex=True).str.split()
dados.head()

| | avaliacao_geral | experiencia_local | max_hospedes | descricao_local | descricao_vizinhanca | quantidade_banheiros | quantidade_quartos | quantidade_camas | modelo_cama | comodidades | taxa_deposito | taxa_limpeza | preco |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | -- | 1 | [this, clean, and, comfortable, one, bedroom, ... | [lower, queen, anne, is, near, the, seattle, c... | 1 | 1 | 1 | Real Bed | [Internet, Wireless Internet, Kitchen, Free Pa... | 0.0 | 0.0 | 110.0 |
| 1 | 10.0 | -- | 1 | [our, century, old, upper, queen, anne, house,... | [upper, queen, anne, is, a, really, pleasant, ... | 1 | 1 | 1 | Futon | [TV, Internet, Wireless Internet, Kitchen, Fre... | 0.0 | 0.0 | 45.0 |
| 2 | 10.0 | -- | 1 | [cozy, room, in, two-bedroom, apartment, along... | [the, convenience, of, being, in, seattle, but... | 1 | 1 | 1 | Futon | [TV, Internet, Wireless Internet, Kitchen, Fre... | 0.0 | 0.0 | 55.0 |
| 3 | 10.0 | -- | 1 | [very, lovely, and, cozy, room, for, one, conv... | [ballard, is, lovely, vibrant, and, one, of, t... | 1 | 1 | 1 | Pull-out Sofa | [Internet, Wireless Internet, Kitchen, Free Pa... | 0.0 | 20.0 | 52.0 |
| 4 | 10.0 | -- | 1 | [the, studio, at, mibbett, hollow', is, in, a,... | [] | 1 | 1 | 1 | Real Bed | [Wireless Internet, Kitchen, Free Parking on P... | 0.0 | 15.0 | 85.0 |

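Both description columns go through the same sequence (lowercase, replace disallowed characters, drop standalone hyphens, split). A minimal sketch bundling that sequence into one reusable function is below; `tokenize` is a hypothetical name, and the patterns are the ones used in the cells above.

```python
import pandas as pd

def tokenize(col: pd.Series) -> pd.Series:
    """Hypothetical helper chaining the text-cleaning steps used above."""
    return (
        col.str.lower()
        # replace anything that is not a letter, digit, hyphen or apostrophe
        .str.replace(r"[^a-zA-Z0-9\-']", " ", regex=True)
        # replace hyphens not attached to a word character on either side
        .str.replace(r"(?<!\w)-(?!\w)", " ", regex=True)
        # split on runs of whitespace
        .str.split()
    )

# Hypothetical sample: a normal description and the '--' placeholder
sample = pd.Series(["Ballard is lovely, vibrant - and walkable!", "--"])
result = tokenize(sample).tolist()
print(result)  # [['ballard', 'is', 'lovely', 'vibrant', 'and', 'walkable'], []]
```

With such a helper, each column would need a single line, for example `dados['descricao_local'] = tokenize(dados['descricao_local'])`, which keeps the two columns from drifting apart.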

### REGEX references
[Building REGEX](https://regex101.com/)

[Main use cases of REGEX for data processing](https://www.alura.com.br/artigos/principais-casos-uso-regex-para-tratamento-dados)

## Time data

### The datetime library

We can work with dates and times directly through the [datetime](https://docs.python.org/3/library/datetime.html) module, part of Python's standard library, which provides classes for handling dates and times. With it we can create date and time objects, perform time arithmetic, format dates and times in different ways, and more.


In [46]:
import datetime

# Create a datetime object with the current date and time
agora = datetime.datetime.now()

print("Current date and time:", agora)

Current date and time: 2025-07-04 21:21:33.078135


In [47]:
# Create a date object with today's date
hoje = datetime.date.today()

print("Today's date:", hoje)

Today's date: 2025-07-04


In [48]:
# Create two date objects with different dates
data_1 = datetime.date(2022, 1, 1)
data_2 = datetime.date(2023, 1, 1)

# Compute the difference between the two dates (a timedelta)
diferenca = data_2 - data_1

print("Difference between the two dates:", diferenca)

Difference between the two dates: 365 days, 0:00:00

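The difference printed above is a `datetime.timedelta`. The sketch below, using the same two dates, shows a few related standard-library operations: reading the number of days, adding a `timedelta` to a date, formatting with `strftime` and parsing text with `strptime`.

```python
import datetime

data_1 = datetime.date(2022, 1, 1)
data_2 = datetime.date(2023, 1, 1)

diferenca = data_2 - data_1                 # a datetime.timedelta
print(diferenca.days)                       # 365

# Adding a timedelta to a date gives another date
depois = data_1 + datetime.timedelta(days=30)
print(depois)                               # 2022-01-31

# strftime formats a date as text; strptime parses text into a datetime
texto = data_2.strftime("%d/%m/%Y")
print(texto)                                # 01/01/2023
parsed = datetime.datetime.strptime("2016-01-04", "%Y-%m-%d")
print(parsed.date())                        # 2016-01-04
```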

In [51]:
dt_data = pd.read_json('/content/moveis_disponiveis.json')
dt_data.head()

| | id | data | vaga_disponivel | preco |
|---|---|---|---|---|
| 0 | 857 | 2016-01-04 | False | |
| 1 | 857 | 2016-01-05 | False | |
| 2 | 857 | 2016-01-06 | False | |
| 3 | 857 | 2016-01-07 | False | |
| 4 | 857 | 2016-01-08 | False | |


In [52]:
dt_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   id               365000 non-null  int64 
 1   data             365000 non-null  object
 2   vaga_disponivel  365000 non-null  bool  
 3   preco            270547 non-null  object
dtypes: bool(1), int64(1), object(2)
memory usage: 11.5+ MB


In [54]:
dt_data['data'] = pd.to_datetime(dt_data['data'])
dt_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   id               365000 non-null  int64         
 1   data             365000 non-null  datetime64[ns]
 2   vaga_disponivel  365000 non-null  bool          
 3   preco            270547 non-null  object        
dtypes: bool(1), datetime64[ns](1), int64(1), object(1)
memory usage: 11.5+ MB

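`pd.to_datetime` worked here without extra arguments because every value follows the `YYYY-MM-DD` pattern. A slightly more defensive sketch, on hypothetical values, passes the format explicitly and uses `errors="coerce"` so that an invalid string becomes `NaT` instead of stopping the conversion.

```python
import pandas as pd

# Hypothetical column with one invalid entry
s = pd.Series(["2016-01-04", "2016-01-05", "not a date"])

# Explicit format: no inference; errors="coerce": unparsable values become NaT
datas = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")
print(datas.dtype)            # a datetime64 dtype
print(datas.isna().sum())     # 1
```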

In [55]:
dt_data.head()

| | id | data | vaga_disponivel | preco |
|---|---|---|---|---|
| 0 | 857 | 2016-01-04 | False | |
| 1 | 857 | 2016-01-05 | False | |
| 2 | 857 | 2016-01-06 | False | |
| 3 | 857 | 2016-01-07 | False | |
| 4 | 857 | 2016-01-08 | False | |


In [56]:
dt_data['data'].dt.strftime('%Y-%m')

| | data |
|---|---|
| 0 | 2016-01 |
| 1 | 2016-01 |
| 2 | 2016-01 |
| 3 | 2016-01 |
| 4 | 2016-01 |
| ... | ... |
| 364995 | 2016-12 |
| 364996 | 2016-12 |
| 364997 | 2016-12 |
| 364998 | 2017-01 |


In [58]:
subset = dt_data.groupby(dt_data['data'].dt.strftime('%Y-%m'))['vaga_disponivel'].sum()
subset

| data | vaga_disponivel |
|---|---|
| 2016-01 | 16543 |
| 2016-02 | 20128 |
| 2016-03 | 23357 |
| 2016-04 | 22597 |
| 2016-05 | 23842 |
| 2016-06 | 23651 |
| 2016-07 | 22329 |
| 2016-08 | 22529 |
| 2016-09 | 22471 |
| 2016-10 | 23765 |

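Summing a boolean column counts its `True` values, so each number above is the count of available listing-days in that month. The grouping key there is plain text produced by `strftime`. An alternative sketch, on made-up rows, groups by `dt.to_period("M")` instead, which keeps the month as a `Period` object that still supports time operations.

```python
import pandas as pd

# Hypothetical availability records
df = pd.DataFrame({
    "data": pd.to_datetime(["2016-01-04", "2016-01-20", "2016-02-01", "2017-01-01"]),
    "vaga_disponivel": [True, True, False, True],
})

# Group by calendar month as a Period; summing booleans counts the True values
mensal = df.groupby(df["data"].dt.to_period("M"))["vaga_disponivel"].sum()
print(mensal)
```

With these rows the result holds 2 for 2016-01, 0 for 2016-02 and 1 for 2017-01.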

In [73]:
# Fill missing prices with '0.0', strip '$' and ',' and convert to float
dt_data['preco'] = dt_data['preco'].fillna('0.0')
dt_data['preco'] = dt_data['preco'].astype(str).str.replace(r'[$,]', '', regex=True).astype(float)
dt_data

| | id | data | vaga_disponivel | preco |
|---|---|---|---|---|
| 0 | 857 | 2016-01-04 | False | 0.0 |
| 1 | 857 | 2016-01-05 | False | 0.0 |
| 2 | 857 | 2016-01-06 | False | 0.0 |
| 3 | 857 | 2016-01-07 | False | 0.0 |
| 4 | 857 | 2016-01-08 | False | 0.0 |
| ... | ... | ... | ... | ... |
| 364995 | 3279 | 2016-12-29 | True | 140.0 |
| 364996 | 3279 | 2016-12-30 | True | 140.0 |
| 364997 | 3279 | 2016-12-31 | True | 140.0 |
| 364998 | 3279 | 2017-01-01 | True | 140.0 |


In [72]:
dt_data.info()

<class 'pandas.core.frame.DataFrame'>
Index: 365000 entries, 0 to 364999
Data columns (total 4 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   id               365000 non-null  int64         
 1   data             365000 non-null  datetime64[ns]
 2   vaga_disponivel  365000 non-null  bool          
 3   preco            365000 non-null  float64       
dtypes: bool(1), datetime64[ns](1), float64(1), int64(1)
memory usage: 11.5 MB
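Filling missing prices with `'0.0'` makes every row convertible, but the earlier `info()` output shows 365000 - 270547 = 94453 rows without a price, and after the fill they are indistinguishable from a genuine price of zero. If the prices will later feed a pricing model, one option is to keep them as `NaN`, since pandas statistics such as `mean()` skip missing values. The sketch below uses made-up rows to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one missing price, two real ones
df = pd.DataFrame({
    "vaga_disponivel": [False, True, True],
    "preco": [np.nan, "$140.00", "$1,200.00"],
})

# Convert without filling: missing prices stay NaN
df["preco"] = pd.to_numeric(df["preco"].str.replace(r"[$,]", "", regex=True))

media_sem_zero = df["preco"].mean()              # NaN is skipped
media_com_zero = df["preco"].fillna(0.0).mean()  # the zero pulls the mean down
print(media_sem_zero)   # 670.0
print(media_com_zero)   # about 446.67
```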
