## Python, NumPy and Pandas features for handling temporal data types
* Object for working with dates and times in Python and Pandas (datetime)
* Assorted functions for working with dates and times (dateutil)
    * The example below parses text into a datetime object

In [48]:
import numpy as np
import pandas as pd

## Python's datetime object

In [52]:
from datetime import datetime
t=datetime(year=2015, month=7, day=4, hour=11, minute=12, second=13)
print(t)
print(type(t))

2015-07-04 11:12:13
<class 'datetime.datetime'>
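Beyond construction, `datetime` objects support arithmetic with `datetime.timedelta`, and subtracting two datetimes yields a `timedelta`. A minimal sketch reusing the date above; the shift of 2 days and 3 hours is an arbitrary example value:

```python
from datetime import datetime, timedelta

t = datetime(year=2015, month=7, day=4, hour=11, minute=12, second=13)

# Adding a timedelta shifts the timestamp; the result is still a datetime
later = t + timedelta(days=2, hours=3)
print(later)                  # 2015-07-06 14:12:13

# Subtracting two datetimes returns a timedelta
delta = later - t
print(delta)                  # 2 days, 3:00:00
print(delta.total_seconds())  # 183600.0

# Individual components are plain attributes
print(t.year, t.month, t.day, t.hour)  # 2015 7 4 11
```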


## Date operations
* Parsing dates from text
* Accessing individual parts of a date
* Formatting a date for display with strftime

In [59]:
from dateutil import parser
date = parser.parse("5th of July, 2018")
print(date)
print(date.strftime('%A'))

dt=date.day
print(dt)

2018-07-05 00:00:00
Thursday
5
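`strftime` turns a datetime into text, and the standard library's `datetime.strptime` does the reverse when the exact format is known, without dateutil's guessing. A small sketch; the `'%d/%m/%Y'` format is just an example choice:

```python
from datetime import datetime
from dateutil import parser

date = parser.parse("5th of July, 2018")

# strftime formats a datetime as text using % codes
print(date.strftime('%d/%m/%Y'))        # 05/07/2018
print(date.strftime('%Y-%m-%d %H:%M'))  # 2018-07-05 00:00

# strptime parses text that follows an explicit format
same = datetime.strptime('05/07/2018', '%d/%m/%Y')
print(same == date)                     # True
```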


## NumPy support for date operations
* np.datetime64: the NumPy type that represents dates

In [62]:
datenp = np.array('2015-07-08', dtype=np.datetime64)
print(datenp)
date = parser.parse("4th of July, 2015")
datenp = np.array(date, dtype=np.datetime64)
print(datenp)
print(date)
print(type(date))
print(type(datenp))

2015-07-08
2015-07-04T00:00:00.000000
2015-07-04 00:00:00
<class 'datetime.datetime'>
<class 'numpy.ndarray'>
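Because `datetime64` stores a count of units, arithmetic is done in the dtype's unit: adding an integer to a day-resolution date adds days, and subtracting two dates gives a `timedelta64`. A short sketch with arbitrary example dates:

```python
import numpy as np

d = np.datetime64('2015-07-04')   # unit inferred from the string: days
print(d.dtype)                    # datetime64[D]

# Adding an integer adds units of the dtype (here, days)
print(d + 10)                     # 2015-07-14

# Subtracting two dates gives a timedelta64
diff = np.datetime64('2015-12-25') - d
print(diff)                       # 174 days
print(diff.astype(int))           # 174
```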


## Creating new dates with np.arange

In [65]:
datenp = np.array('2015-07', dtype=np.datetime64)
datenp + np.arange(12)

array(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
       '2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06'],
      dtype='datetime64[M]')
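`np.arange` can also take `datetime64` start and stop values directly, with an optional `timedelta64` step; as with integers, the stop value is excluded. A sketch with example dates:

```python
import numpy as np

# Daily dates from July 1 up to (not including) July 8
days = np.arange('2015-07-01', '2015-07-08', dtype='datetime64[D]')
print(days)          # 7 dates, 2015-07-01 through 2015-07-07

# A timedelta64 step of 2 days yields every other day
every_other = np.arange(np.datetime64('2015-07-01'), np.datetime64('2015-07-08'),
                        np.timedelta64(2, 'D'))
print(every_other)   # ['2015-07-01' '2015-07-03' '2015-07-05' '2015-07-07']
```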

## Representing dates at different levels of precision

In [73]:
dt=np.datetime64('2015-07-04 12:59:59.50', 'ns')
print(dt)
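# 'as' (attosecond) only spans about ±9.2 s around 1970-01-01, so this date overflows and prints a meaningless value
dt=np.datetime64('2015-07-04 12:59:59.50', 'as')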
print(dt)
dt=np.datetime64('2015-07-04 12:59:59.50', 's')
print(dt)
dt=np.datetime64('2015-07-04 12:59:59.50', 'h')
print(dt)
dt=np.datetime64('2015-07-04 12:59:59.50', 'D')
print(dt)
dt=np.datetime64('2015-07-04 12:59:59.50', 'M')
print(dt)
dt=np.datetime64('2015-07-04 12:59:59.50', 'Y')
print(dt)
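# 'W' counts weeks from 1970-01-01 (a Thursday), so Friday 2015-07-31 is truncated to Thursday 2015-07-30
dt=np.datetime64('2015-07-31 12:59:59.50', 'W')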
print(dt)

2015-07-04T12:59:59.500000000
1970-01-01T00:00:04.924576063353126912
2015-07-04T12:59:59
2015-07-04T12
2015-07-04
2015-07
2015
2015-07-30


## Table of date/time unit codes

Code | Unit
----|-------
Y	| Year
M	| Month
W	| Week
D	| Day
h	| Hour
m	| Minute
s	| Second
ms	| Millisecond
us	| Microsecond
ns	| Nanosecond
ps	| Picosecond
fs	| Femtosecond
as	| Attosecond

## Handling temporal data in Pandas

* Pandas builds on the objects shown above (datetime, dateutil and np.datetime64)
* The to_datetime function converts strings, or a whole column, to Pandas' Timestamp type

In [74]:
date = pd.to_datetime("4th of July, 2015")
print(date)
print(type(date))
print(date.strftime('%A'))

2015-07-04 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
Saturday
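The same function works on a whole column. A sketch with a hypothetical DataFrame `df` whose dates are stored as text; after conversion, the `.dt` accessor exposes date components element by element:

```python
import pandas as pd

# Hypothetical DataFrame with dates stored as strings
df = pd.DataFrame({'date': ['2015-07-04', '2015-07-05', '2015-08-01'],
                   'value': [10, 20, 30]})

# Convert the whole column to a datetime dtype
df['date'] = pd.to_datetime(df['date'])
print(pd.api.types.is_datetime64_any_dtype(df['date']))  # True

# The .dt accessor works on each element of the column
print(df['date'].dt.day_name().tolist())  # ['Saturday', 'Sunday', 'Saturday']
print(df['date'].dt.month.tolist())       # [7, 7, 8]
```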


## Generating temporal data
* to_timedelta converts numbers into Timedelta durations; adding them to a Timestamp generates a sequence of dates

In [76]:
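# Note: the 'M' (month) unit treats a month as a fixed 30.436875 days, which is why the
# times drift in the output below; this unit was deprecated and newer pandas versions reject it
date + pd.to_timedelta(np.arange(18), 'M')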



DatetimeIndex(['2015-07-04 00:00:00', '2015-08-03 10:29:06',
               '2015-09-02 20:58:12', '2015-10-03 07:27:18',
               '2015-11-02 17:56:24', '2015-12-03 04:25:30',
               '2016-01-02 14:54:36', '2016-02-02 01:23:42',
               '2016-03-03 11:52:48', '2016-04-02 22:21:54',
               '2016-05-03 08:51:00', '2016-06-02 19:20:06',
               '2016-07-03 05:49:12', '2016-08-02 16:18:18',
               '2016-09-02 02:47:24', '2016-10-02 13:16:30',
               '2016-11-01 23:45:36', '2016-12-02 10:14:42'],
              dtype='datetime64[ns]', freq=None)
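When the goal is real calendar months rather than fixed-length steps, `pd.DateOffset(months=...)` keeps the same day of the month. A sketch comparing it with fixed 30-day Timedelta steps (the 30-day step is an arbitrary example):

```python
import pandas as pd

date = pd.to_datetime("4th of July, 2015")

# Calendar-aware: each result falls on the 4th of the month at midnight
monthly = [date + pd.DateOffset(months=i) for i in range(4)]
print([d.strftime('%Y-%m-%d') for d in monthly])
# ['2015-07-04', '2015-08-04', '2015-09-04', '2015-10-04']

# Fixed-length durations drift relative to the calendar
fixed = date + pd.to_timedelta([0, 30, 60], unit='D')
print(fixed.strftime('%Y-%m-%d').tolist())
# ['2015-07-04', '2015-08-03', '2015-09-02']
```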

## Pandas lets a Series (a DataFrame column) be indexed by date

In [78]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

## Slicing with dates

In [80]:
data['2014-07-04':'2015-07-03']


2014-07-04    0
2014-08-04    1
dtype: int64

In [19]:
data

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

## Filtering by year alone with slicing

In [81]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
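The same selections can be written with `.loc`, which makes label-based indexing explicit; partial strings select every timestamp inside that year or month, and boolean masks on index attributes also work. A sketch on the same four-element Series:

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

print(data.loc['2015'].tolist())               # [2, 3]
print(data.loc['2014-08'].tolist())            # [1]
# A slice of partial strings includes the whole end month
print(data.loc['2014-08':'2015-07'].tolist())  # [1, 2]

# Boolean mask: every July, regardless of year
print(data[data.index.month == 7].tolist())    # [0, 2]
```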

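## Pandas time-series data structures

* Timestamp: a single point in time (index type: DatetimeIndex)
* Period: a span of time, such as a day or a month (index type: PeriodIndex)
* Timedelta: a duration (index type: TimedeltaIndex)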
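A sketch of the three scalar types and how they relate; the specific date, month and duration are example values:

```python
import pandas as pd

ts = pd.Timestamp('2015-07-04 10:30')   # a single point in time
per = pd.Period('2015-07', freq='M')    # the whole month of July 2015
td = pd.Timedelta(days=1, hours=6)      # a duration

# Timestamp + Timedelta -> Timestamp
print(ts + td)                               # 2015-07-05 16:30:00

# A Timestamp can be converted to the Period that contains it
print(ts.to_period('M') == per)              # True
print(per.start_time <= ts <= per.end_time)  # True
```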


In [89]:
dates = pd.to_datetime([datetime(2015, 7, 3), '4th of July, 2015',
                       '2015-Jul-6', '07-07-2015', '20150708', '20150709', '20150810', '20150811'])
dates

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-06', '2015-07-07',
               '2015-07-08', '2015-07-09', '2015-08-10', '2015-08-11'],
              dtype='datetime64[ns]', freq=None)

In [93]:
dates.to_period('H')

PeriodIndex(['2015-07-03 00:00', '2015-07-04 00:00', '2015-07-06 00:00',
             '2015-07-07 00:00', '2015-07-08 00:00', '2015-07-09 00:00',
             '2015-08-10 00:00', '2015-08-11 00:00'],
            dtype='period[H]', freq='H')

## Computing differences between dates

In [94]:
dates - dates[0]

TimedeltaIndex([ '0 days',  '1 days',  '3 days',  '4 days',  '5 days',
                 '6 days', '38 days', '39 days'],
               dtype='timedelta64[ns]', freq=None)

## Regular sequences: pd.date_range()

* Functions for creating regular sequences of dates:

* pd.date_range() takes a start date and either an end date or a number of periods to generate, plus an optional frequency (daily by default)
* pd.period_range() takes a start date, an end date or a number of periods, and a frequency such as monthly, weekly, etc.
* pd.timedelta_range() generates regular sequences of durations

* Example with days

In [95]:
pd.date_range('2015-07', periods=8)

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-04',
               '2015-07-05', '2015-07-06', '2015-07-07', '2015-07-08'],
              dtype='datetime64[ns]', freq='D')

* Example with months (freq='M' means month end; pandas 2.2+ prefers the alias 'ME')

In [97]:
pd.date_range('2015-07-03', periods=8, freq='M')

DatetimeIndex(['2015-07-31', '2015-08-31', '2015-09-30', '2015-10-31',
               '2015-11-30', '2015-12-31', '2016-01-31', '2016-02-29'],
              dtype='datetime64[ns]', freq='M')

In [98]:
# generates 3 weeks (weekly periods end on Sunday by default)
dt=pd.Series(pd.period_range('1/1/2011', freq='W', periods=3))
print(dt)

# generates 3 days
dt=pd.Series(pd.period_range('1/1/2011', freq='D', periods=3))
print(dt)

# generates 3 hours
dt=pd.Series(pd.period_range('1/1/2011', freq='H', periods=3))
print(dt)

0    2010-12-27/2011-01-02
1    2011-01-03/2011-01-09
2    2011-01-10/2011-01-16
dtype: period[W-SUN]
0    2011-01-01
1    2011-01-02
2    2011-01-03
dtype: period[D]
0    2011-01-01 00:00
1    2011-01-01 01:00
2    2011-01-01 02:00
dtype: period[H]
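`pd.timedelta_range()` is listed above but has no example. A sketch that, like `pd.date_range()`, is given a start, an end and a number of periods, so the values are evenly spaced between the two ends (the ranges are example values):

```python
import pandas as pd

# 5 evenly spaced durations from 0 to 1 day: 0, 6h, 12h, 18h, 1 day
tds = pd.timedelta_range(start='0 days', end='1 days', periods=5)
print(tds)

# Adding durations to a Timestamp produces a DatetimeIndex
print(pd.Timestamp('2011-01-01') + tds)

# date_range with start and end (daily frequency by default)
days = pd.date_range(start='2011-01-01', end='2011-01-05')
print(len(days))   # 5
```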


# Working with time-series datasets

## Example: Visualizing bicycle counts in Seattle

* Fremont Bridge, Seattle: an inductive sensor that counts the number of bicycles per hour
* The dataset and more details about it can be found at: http://data.seattle.gov/
* Direct link to the dataset: https://data.seattle.gov/Transportation/Fremont-Bridge-Bicycle-Counter/65db-xm6k/data



In [99]:
data = pd.read_csv('Fremont_Bridge_Bicycle_Counter.csv', index_col='Date', parse_dates=True)
data.head()

Date | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
2012-10-03 00:00:00 | 13.0 | 4.0 | 9.0
2012-10-03 01:00:00 | 10.0 | 4.0 | 6.0
2012-10-03 02:00:00 | 2.0 | 1.0 | 1.0
2012-10-03 03:00:00 | 5.0 | 2.0 | 3.0
2012-10-03 04:00:00 | 7.0 | 6.0 | 1.0


In [100]:
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 135614 entries, 2012-10-03 00:00:00 to 2020-08-31 23:00:00
Data columns (total 3 columns):
Fremont Bridge Total            135593 non-null float64
Fremont Bridge East Sidewalk    135593 non-null float64
Fremont Bridge West Sidewalk    135593 non-null float64
dtypes: float64(3)
memory usage: 4.1 MB


In [101]:
data.describe()

Statistic | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
count | 135593.0 | 135593.0 | 135593.0
mean | 113.146158 | 51.605415 | 61.540743
std | 143.810041 | 66.313631 | 89.549971
min | 0.0 | 0.0 | 0.0
25% | 14.0 | 6.0 | 7.0
50% | 61.0 | 28.0 | 30.0
75% | 148.0 | 69.0 | 75.0
max | 1097.0 | 698.0 | 850.0


# Changing the frequency of the data
* The original data are hourly counts, but they can be aggregated into daily, weekly or monthly totals, etc.
* This is done with the resample method
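How `resample` aggregates can be seen on a small synthetic series. A sketch with hypothetical hourly counts that are all 1, starting on Wednesday 2012-10-03 like the Fremont Bridge data; with `'W'`, bins are weeks ending on Sunday, so the first bin is a partial week:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two weeks of hourly counts, each equal to 1
idx = pd.date_range('2012-10-03', periods=24 * 14, freq='60min')
hourly = pd.Series(np.ones(len(idx)), index=idx)

# Hourly -> daily totals: 24 per full day
daily = hourly.resample('D').sum()
print(daily.iloc[0])                            # 24.0

# Hourly -> weekly totals; the first bin runs Wed 2012-10-03 to Sun 2012-10-07
weekly = hourly.resample('W').sum()
print(weekly.index[0].date(), weekly.iloc[0])   # 2012-10-07 120.0
print(weekly.tolist())                          # [120.0, 168.0, 48.0]
```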

In [105]:
from dateutil import parser
date = parser.parse("2012-10-03")
print(date)
print(date.strftime('%A'))


2012-10-03 00:00:00
Wednesday


In [102]:
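# 'W' bins are weeks ending on Sunday; 2012-10-03 was a Wednesday, so the first bin covers only 5 days
weekly = data.resample('W').sum()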
weekly.head()

Date | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
2012-10-07 | 28584.0 | 14594.0 | 13990.0
2012-10-14 | 33590.0 | 17358.0 | 16232.0
2012-10-21 | 31018.0 | 15892.0 | 15126.0
2012-10-28 | 26874.0 | 13802.0 | 13072.0
2012-11-04 | 24388.0 | 12816.0 | 11572.0


## Aggregating data by day

In [106]:
daily = data.resample('D').sum()
daily.head()

Date | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
2012-10-03 | 7042.0 | 3520.0 | 3522.0
2012-10-04 | 6950.0 | 3416.0 | 3534.0
2012-10-05 | 6296.0 | 3116.0 | 3180.0
2012-10-06 | 4012.0 | 2160.0 | 1852.0
2012-10-07 | 4284.0 | 2382.0 | 1902.0


In [112]:
# largest hourly count recorded in each year
yearly = data.resample('Y').max()
yearly

Date | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
2012-12-31 | 604.0 | 317.0 | 400.0
2013-12-31 | 831.0 | 626.0 | 593.0
2014-12-31 | 946.0 | 698.0 | 667.0
2015-12-31 | 808.0 | 584.0 | 571.0
2016-12-31 | 957.0 | 560.0 | 717.0
2017-12-31 | 913.0 | 517.0 | 696.0
2018-12-31 | 924.0 | 521.0 | 709.0
2019-12-31 | 1097.0 | 541.0 | 850.0
2020-12-31 | 730.0 | 381.0 | 572.0
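The cells above apply one aggregation (`sum` or `max`) per call. Passing a list to `.agg` computes several statistics per bin at once and returns one column per statistic. A sketch on a hypothetical two-day hourly series with values 0, 1, ..., 47:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series: two days of values 0, 1, ..., 47
idx = pd.date_range('2020-01-01', periods=48, freq='60min')
counts = pd.Series(np.arange(48, dtype=float), index=idx)

# One row per day, one column per statistic
summary = counts.resample('D').agg(['sum', 'max', 'mean'])
print(summary)
```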


## Grouping data
* Average traffic for each hour of the day


In [113]:
# average count for each time of day, across all days in the dataset
by_time = data.groupby(data.index.time).mean()

by_time

Time | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
00:00:00 | 11.304902 | 4.791541 | 6.51336
01:00:00 | 5.984074 | 2.680764 | 3.303309
02:00:00 | 3.938453 | 1.890032 | 2.048421
03:00:00 | 2.903557 | 1.493895 | 1.409662
04:00:00 | 6.487878 | 3.440807 | 3.047071
05:00:00 | 25.58963 | 15.619182 | 9.970448
06:00:00 | 89.945496 | 54.047779 | 35.897717
07:00:00 | 227.002831 | 140.801982 | 86.200849
08:00:00 | 322.147231 | 189.640241 | 132.50699
09:00:00 | 188.577093 | 102.221101 | 86.355992
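Grouping by `data.index.time` uses `datetime.time` objects as keys; grouping by `index.hour` gives plain integers 0 to 23 instead, which are easier to plot or select. A sketch on hypothetical data where each hourly value equals its hour:

```python
import pandas as pd

# Hypothetical data: 3 days of hourly values, each equal to its hour of day
idx = pd.date_range('2020-01-01', periods=72, freq='60min')
values = pd.Series(idx.hour.astype(float), index=idx)

# Group by the integer hour (0-23) and average across days
by_hour = values.groupby(values.index.hour).mean()
print(by_hour.head(3))
print(len(by_hour))   # 24
```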


## Grouping by day of the week

In [114]:
# dayofweek: Monday=0 ... Sunday=6
by_weekday = data.groupby(data.index.dayofweek).mean()
by_weekday

Day of week (0 = Monday) | Fremont Bridge Total | Fremont Bridge East Sidewalk | Fremont Bridge West Sidewalk
----|----|----|----
0 | 132.039395 | 58.638527 | 73.400867
1 | 140.619442 | 62.726267 | 77.893175
2 | 139.598217 | 62.306091 | 77.292126
3 | 132.737108 | 59.636035 | 73.101073
4 | 120.132101 | 53.894702 | 66.237399
5 | 64.290376 | 32.24272 | 32.047656
6 | 62.537484 | 31.765849 | 30.771635
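The numeric weekday index (0 = Monday) can be relabelled with day names for readability. A sketch on a hypothetical two-week daily series that is 1 on weekdays and 0 on weekends, starting on Monday 2020-01-06:

```python
import pandas as pd

# Hypothetical daily data: 1.0 on weekdays, 0.0 on weekends
idx = pd.date_range('2020-01-06', periods=14, freq='D')   # 2020-01-06 is a Monday
values = pd.Series((idx.dayofweek < 5).astype(float), index=idx)

by_weekday = values.groupby(values.index.dayofweek).mean()
# dayofweek runs Monday=0 ... Sunday=6, so the labels follow that order
by_weekday.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(by_weekday)
```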
