### In this notebook, I will show some practical uses of dates with pandas, such as:
1. Storing a column with a datetime type
2. Grouping dates by a unit of time such as day, month, or year

In [3]:
import pandas as pd
rides=pd.read_csv("../dados/capital-onebike.csv")
rides.head()

Unnamed: 0,Start date,End date,Start station number,Start station,End station number,End station,Bike number,Member type
0,2017-10-01 15:23:25,2017-10-01 15:26:26,31038,Glebe Rd & 11th St N,31036,George Mason Dr & Wilson Blvd,W20529,Member
1,2017-10-01 15:42:57,2017-10-01 17:49:59,31036,George Mason Dr & Wilson Blvd,31036,George Mason Dr & Wilson Blvd,W20529,Casual
2,2017-10-02 06:37:10,2017-10-02 06:42:53,31036,George Mason Dr & Wilson Blvd,31037,Ballston Metro / N Stuart & 9th St N,W20529,Member
3,2017-10-02 08:56:45,2017-10-02 09:18:03,31037,Ballston Metro / N Stuart & 9th St N,31295,Potomac & M St NW,W20529,Member
4,2017-10-02 18:23:48,2017-10-02 18:45:05,31295,Potomac & M St NW,31230,Metro Center / 12th & G St NW,W20529,Member


In [4]:
rides.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 290 entries, 0 to 289
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Start date            290 non-null    object
 1   End date              290 non-null    object
 2   Start station number  290 non-null    int64 
 3   Start station         290 non-null    object
 4   End station number    290 non-null    int64 
 5   End station           290 non-null    object
 6   Bike number           290 non-null    object
 7   Member type           290 non-null    object
dtypes: int64(2), object(6)
memory usage: 18.2+ KB


### Note that the date columns have dtype object (plain strings), not a datetime type. I will convert them to datetime with the to_datetime() function

In [6]:
# The dates are formatted as: Year-Month-Day Hour:Minute:Second
rides[["Start date", "End date"]].iloc[0]

Start date    2017-10-01 15:23:25
End date      2017-10-01 15:26:26
Name: 0, dtype: object

In [8]:
rides["Start date"]=pd.to_datetime(rides["Start date"], format="%Y-%m-%d %H:%M:%S")
rides["End date"]=pd.to_datetime(rides["End date"], format="%Y-%m-%d %H:%M:%S")
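
As a complement to the conversion above, here is a minimal sketch of two related options: parsing the dates while loading the file, and tolerating malformed values. The inline CSV text and the `sample`, `messy`, and `parsed` names are assumptions made for illustration; only two rows and three columns from the `head()` output are reproduced.

```python
import io

import pandas as pd

# Two rows copied from the head() output, inlined so the sketch runs without the CSV file.
csv_text = """Start date,End date,Member type
2017-10-01 15:23:25,2017-10-01 15:26:26,Member
2017-10-01 15:42:57,2017-10-01 17:49:59,Casual
"""

# parse_dates asks read_csv to convert the listed columns while loading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start date", "End date"])
print(sample.dtypes)

# errors="coerce" turns strings that do not match the format into NaT instead of raising.
messy = pd.Series(["2017-10-01 15:23:25", "not a date"])  # hypothetical bad value
parsed = pd.to_datetime(messy, format="%Y-%m-%d %H:%M:%S", errors="coerce")
print(parsed.isna().tolist())  # [False, True]
```

Parsing at load time avoids a separate conversion step, while `errors="coerce"` is useful when the file may contain values that do not follow the expected format.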

In [10]:
# The date columns now have the correct dtype
rides.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 290 entries, 0 to 289
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Start date            290 non-null    datetime64[ns]
 1   End date              290 non-null    datetime64[ns]
 2   Start station number  290 non-null    int64         
 3   Start station         290 non-null    object        
 4   End station number    290 non-null    int64         
 5   End station           290 non-null    object        
 6   Bike number           290 non-null    object        
 7   Member type           290 non-null    object        
dtypes: datetime64[ns](2), int64(2), object(4)
memory usage: 18.2+ KB


In [11]:
rides['Duration'] = rides['End date'] - rides['Start date']
print(rides['Duration'].head(5))

0   0 days 00:03:01
1   0 days 02:07:02
2   0 days 00:05:43
3   0 days 00:21:18
4   0 days 00:21:17
Name: Duration, dtype: timedelta64[ns]
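
To make the subtraction above concrete, the sketch below repeats it for a single pair of timestamps, taken from the first row of the table. The variable names are chosen here for illustration.

```python
import pandas as pd

# Start and end of the first ride shown in the head() output.
start = pd.Timestamp("2017-10-01 15:23:25")
end = pd.Timestamp("2017-10-01 15:26:26")

# Subtracting two timestamps gives a Timedelta.
duration = end - start
print(duration)                  # 0 days 00:03:01
print(duration.total_seconds())  # 181.0

# Dividing by another Timedelta expresses the duration in that unit (here, minutes).
print(duration / pd.Timedelta(minutes=1))
```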


### Descriptive statistics on durations behave differently: the result is itself a duration (Timedelta)

In [12]:
print(rides['Duration'].mean())
print(rides['Duration'].sum())

0 days 00:19:38.931034482
3 days 22:58:10


In [14]:
from datetime import timedelta

In [15]:
# Fraction of a 91-day window (roughly October through December 2017) that the bike was in use
rides['Duration'].sum() / timedelta(days=91)

0.04348417785917786
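
The ratio above can be reproduced by hand from the total printed earlier. This is a small sketch that assumes the same 91-day window; `total`, `window`, and `fraction` are illustrative names.

```python
from datetime import timedelta

import pandas as pd

# Total ride time as printed by rides['Duration'].sum() above.
total = pd.Timedelta("3 days 22:58:10")
window = timedelta(days=91)

# Dividing one duration by another yields a plain float (a unitless ratio).
fraction = total / window
print(round(fraction, 4))   # 0.0435
print(f"{fraction:.2%}")    # 4.35%
```

In other words, under this assumption the bike was in use for a little over 4% of the period.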

In [16]:
rides['Duration seconds'] = rides['Duration'].dt.total_seconds()

### We can group by date and compute a summary statistic for each group

In [17]:
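# 'M' groups rows by calendar month, labeled by month end (pandas 2.2+ deprecates this alias in favor of 'ME')
rides.resample('M', on='Start date')['Duration seconds'].mean()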

Start date
2017-10-31    1886.453704
2017-11-30     854.174757
2017-12-31     635.101266
Freq: M, Name: Duration seconds, dtype: float64
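
An alternative way to get monthly averages is to group by a monthly period instead of resampling. The sketch below uses a toy DataFrame: the first two rows come from the table above, and the third row is a hypothetical ride added only so that a second month appears.

```python
import pandas as pd

toy = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25",
        "2017-10-02 06:37:10",
        "2017-11-05 09:00:00",  # hypothetical row, not in the original data
    ]),
    "Duration seconds": [181.0, 343.0, 600.0],
})

# to_period("M") maps each timestamp to its calendar month (2017-10, 2017-11, ...).
month = toy["Start date"].dt.to_period("M")
monthly = toy.groupby(month)["Duration seconds"].mean()
print(monthly)
```

One difference worth keeping in mind: `resample` returns a row for every month in the range, even months with no rides, whereas `groupby` only returns the months that actually occur in the data.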

#### We can also extract components from each date, such as the year, month, day of the week, or the name of the day of the week

In [18]:
rides['Start date'].head(5).dt.year

0    2017
1    2017
2    2017
3    2017
4    2017
Name: Start date, dtype: int64

In [20]:
rides['Start date'].head(5).dt.day_name()

0    Sunday
1    Sunday
2    Monday
3    Monday
4    Monday
Name: Start date, dtype: object
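
These extracted components can also serve as grouping keys. As a minimal sketch, the code below rebuilds the five rides shown in the `head()` output and averages their duration by day of the week; the `toy` and `by_day` names are assumptions for illustration.

```python
import pandas as pd

# The five rides from the head() output above.
toy = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25", "2017-10-01 15:42:57", "2017-10-02 06:37:10",
        "2017-10-02 08:56:45", "2017-10-02 18:23:48",
    ]),
    "End date": pd.to_datetime([
        "2017-10-01 15:26:26", "2017-10-01 17:49:59", "2017-10-02 06:42:53",
        "2017-10-02 09:18:03", "2017-10-02 18:45:05",
    ]),
})
toy["Duration seconds"] = (toy["End date"] - toy["Start date"]).dt.total_seconds()

# Group by the weekday name of the start date and average the duration.
by_day = toy.groupby(toy["Start date"].dt.day_name())["Duration seconds"].mean()
print(by_day)
```

With only five rides the averages are dominated by individual trips (the two-hour ride on Sunday, for example), so the same grouping is more informative on the full dataset.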