
*Study notes from the book "Python para análise de dados" (Python for Data Analysis), ch. 11*

# **Imports**

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from dateutil.parser import parse
from pandas.tseries.offsets import Day, Hour, Minute, MonthEnd
import pytz

# **Date and time data types and tools**

In [2]:
now = datetime.now()
now

datetime.datetime(2021, 9, 24, 16, 46, 40, 88213)

In [3]:
#now.year, now.month, now.day
now.day, now.month, now.year

(24, 9, 2021)

In [4]:
# timedelta is imported at the top of the notebook
# A timedelta represents the difference between two datetime objects

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)
delta

datetime.timedelta(days=926, seconds=56700)

In [5]:
delta.days

926

In [6]:
delta.seconds

56700

In [7]:
start = datetime(2011,1,7)
start + timedelta(12)

datetime.datetime(2011, 1, 19, 0, 0)

In [8]:
start - 2 * timedelta(12)


datetime.datetime(2010, 12, 14, 0, 0)
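
Note that `delta.seconds` holds only the seconds left over after the full days, not the whole duration. A minimal sketch, reusing the same two dates as above, of how the components relate and how `total_seconds()` returns the complete span:

```python
from datetime import datetime, timedelta

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# .days and .seconds are separate components of the same duration
leftover_hours = delta.seconds / 3600    # hours beyond the full days
total = delta.total_seconds()            # whole duration expressed in seconds

# The two components are enough to rebuild the same timedelta
rebuilt = timedelta(days=delta.days, seconds=delta.seconds)

print(leftover_hours, total, rebuilt == delta)
```

Using `total_seconds()` avoids the common mistake of reading `.seconds` as the full elapsed time.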

**Converting between strings and datetime**

In [9]:
stamp = datetime(2011, 1, 3)
str(stamp)

'2011-01-03 00:00:00'

In [10]:
#stamp.strftime('%Y-%m-%d')
stamp.strftime('%F')  # shorthand for '%Y-%m-%d' (platform-dependent: relies on the C library's strftime)

'2011-01-03'

In [11]:
value = '2011-01-03'

In [12]:
datetime.strptime(value, '%Y-%m-%d')

datetime.datetime(2011, 1, 3, 0, 0)

In [13]:
datestrs = ['9/19/2021','9/20/2020']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]

[datetime.datetime(2021, 9, 19, 0, 0), datetime.datetime(2020, 9, 20, 0, 0)]

In [14]:
# parse is imported at the top of the notebook
parse('21-09-2021')

datetime.datetime(2021, 9, 21, 0, 0)

In [15]:
parse('Sep 21, 2021 09:02 AM')

datetime.datetime(2021, 9, 21, 9, 2)

In [16]:
# Day before month, as in the Brazilian date format
parse('21/09/2021', dayfirst=True)

datetime.datetime(2021, 9, 21, 0, 0)
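
The `dayfirst` flag matters most when a date is ambiguous, that is, when both numbers could be a month. A small sketch (the sample string is made up for illustration) comparing the default parse, `dayfirst=True`, and an explicit `strptime` format:

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = '03/04/2021'

month_first = parse(ambiguous)                 # dateutil default: month before day
day_first = parse(ambiguous, dayfirst=True)    # day before month

# An explicit format string removes the guesswork entirely
explicit = datetime.strptime(ambiguous, '%d/%m/%Y')

print(month_first.date(), day_first.date(), explicit.date())
```

When the input format is known in advance, the explicit format is usually the safer choice.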

In [17]:
datastrs = ['2021-09-21 09:04:00', '2021-09-21 09:05:00']
pd.to_datetime(datastrs)

DatetimeIndex(['2021-09-21 09:04:00', '2021-09-21 09:05:00'], dtype='datetime64[ns]', freq=None)

In [18]:
# Handling missing values (None)
idx = pd.to_datetime(datestrs + [None])
idx

DatetimeIndex(['2021-09-19', '2020-09-20', 'NaT'], dtype='datetime64[ns]', freq=None)

In [19]:
# NaT (Not a Time) is the pandas null value for timestamp data
idx[2]

NaT

In [20]:
pd.isnull(idx)

array([False, False,  True])
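
`NaT` also shows up when a string cannot be parsed, provided `errors='coerce'` is passed. A minimal sketch with made-up input values, showing how to flag and drop the missing timestamps:

```python
import pandas as pd

raw = ['2021-09-19', 'not a date', None]

# errors='coerce' turns unparseable entries into NaT instead of raising
idx = pd.to_datetime(raw, errors='coerce')

# Boolean mask of missing timestamps, then keep only the valid ones
mask = idx.isna()
valid = idx[~mask]

print(mask)
print(valid)
```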

# **Time series basics**

In [21]:
# datetime is imported at the top of the notebook
dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7), datetime(2011, 1, 8), datetime(2011,1,10), datetime(2011, 1, 12)]

In [22]:
ts = pd.Series(np.random.randn(6), index=dates)
ts

2011-01-02   -0.577903
2011-01-05   -1.402747
2011-01-07   -1.131579
2011-01-08   -1.275769
2011-01-10    1.263197
2011-01-12   -0.104327
dtype: float64

In [23]:
ts.index

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [24]:
# [::2] selects every second element; dates missing from either operand yield NaN
ts + ts[::2]

2011-01-02   -1.155807
2011-01-05         NaN
2011-01-07   -2.263158
2011-01-08         NaN
2011-01-10    2.526394
2011-01-12         NaN
dtype: float64

In [25]:
ts.index.dtype

dtype('<M8[ns]')

In [26]:
stamp = ts.index[0]
stamp

Timestamp('2011-01-02 00:00:00')

**Indexing, selection, and subsetting**

In [27]:
stamp = ts.index[2]
ts[stamp]

-1.1315790361104112

In [28]:
# A string that can be interpreted as a date also works as a label
ts['1/10/2011']

1.2631970095270282

In [29]:
ts['20110110']

1.2631970095270282

In [30]:
# For longer time series, passing just a year, or a year and month, selects the matching slice of the Series
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts

2000-01-01    1.333187
2000-01-02   -0.616121
2000-01-03    2.692836
2000-01-04    1.608723
2000-01-05    1.838233
                ...   
2002-09-22   -0.311549
2002-09-23    1.059237
2002-09-24   -1.382216
2002-09-25    0.708477
2002-09-26   -0.385970
Freq: D, Length: 1000, dtype: float64

In [31]:
longer_ts['2001']

2001-01-01    1.720393
2001-01-02   -0.852683
2001-01-03    0.548805
2001-01-04    0.242733
2001-01-05    1.016667
                ...   
2001-12-27   -1.117608
2001-12-28   -0.848222
2001-12-29    1.238030
2001-12-30    0.045534
2001-12-31   -0.127413
Freq: D, Length: 365, dtype: float64

In [32]:
longer_ts['2001-05']

2001-05-01   -0.643270
2001-05-02    0.484707
2001-05-03   -0.327748
2001-05-04   -1.622745
2001-05-05   -0.444126
2001-05-06    0.469529
2001-05-07   -1.292709
2001-05-08   -0.358277
2001-05-09   -1.164088
2001-05-10    0.650132
2001-05-11   -0.521766
2001-05-12   -0.289935
2001-05-13   -0.768603
2001-05-14   -0.262173
2001-05-15    1.048643
2001-05-16   -0.908652
2001-05-17   -1.161935
2001-05-18    0.767444
2001-05-19   -0.616102
2001-05-20   -1.860177
2001-05-21    0.642098
2001-05-22    1.697910
2001-05-23    0.997767
2001-05-24   -0.097690
2001-05-25   -0.732954
2001-05-26    0.498671
2001-05-27    0.079296
2001-05-28   -0.648569
2001-05-29   -0.354568
2001-05-30   -0.042652
2001-05-31    0.869994
Freq: D, dtype: float64
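
Partial string selection can be checked with deterministic data. The sketch below rebuilds a series with the same index as `longer_ts` but with sequential values (an assumption made so the result is reproducible) and uses `.loc` for the selection:

```python
import numpy as np
import pandas as pd

# Sequential values so the selections are easy to verify
longer_ts = pd.Series(np.arange(1000),
                      index=pd.date_range('2000-01-01', periods=1000))

year_2001 = longer_ts.loc['2001']        # every day of 2001
may_2001 = longer_ts.loc['2001-05']      # every day of May 2001

print(len(year_2001), len(may_2001))
print(may_2001.index[0], may_2001.index[-1])
```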

In [33]:
ts[datetime(2011, 1, 7):]

2011-01-07   -1.131579
2011-01-08   -1.275769
2011-01-10    1.263197
2011-01-12   -0.104327
dtype: float64

In [34]:
ts

2011-01-02   -0.577903
2011-01-05   -1.402747
2011-01-07   -1.131579
2011-01-08   -1.275769
2011-01-10    1.263197
2011-01-12   -0.104327
dtype: float64

In [35]:
ts['1/6/2011':'1/11/2011']

2011-01-07   -1.131579
2011-01-08   -1.275769
2011-01-10    1.263197
dtype: float64

In [36]:
# truncate drops everything after (or before) the given date
ts.truncate(after='1/9/2011')

2011-01-02   -0.577903
2011-01-05   -1.402747
2011-01-07   -1.131579
2011-01-08   -1.275769
dtype: float64
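
Range slicing and `truncate` can produce the same result. A small sketch, using the same dates as `ts` but with integer values (chosen here only to make the output predictable), where neither bound exists in the index:

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7),
         datetime(2011, 1, 8), datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(range(6), index=dates)

# Neither bound is in the index; both ends of the range are inclusive
sliced = ts.loc['2011-01-06':'2011-01-11']

# truncate with both bounds keeps the same rows
truncated = ts.truncate(before='2011-01-06', after='2011-01-11')

print(sliced)
print(sliced.equals(truncated))
```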

In [37]:
dates = pd.date_range('1/1/2000', periods=100, freq='W-WED')
long_df = pd.DataFrame(np.random.randn(100,4), index=dates, columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [38]:
long_df.loc['5-2001']

            Colorado     Texas  New York      Ohio
2001-05-02  0.407468  1.613245  0.803362  0.523541
2001-05-09  0.971775  1.311985 -0.233080 -0.535736
2001-05-16  1.632667 -0.166932 -0.813172 -0.329271
2001-05-23 -0.368004  0.083205  1.445448 -1.487704
2001-05-30  1.532468 -1.164027  0.379398  0.947820


**Time series with duplicate indices**

In [39]:
dates = pd.DatetimeIndex(['1/1/2000', '1/2/2000', '1/2/2000', '1/2/2000', '1/3/2000'])
dup_ts = pd.Series(np.arange(5), index=dates)

In [40]:
dup_ts

2000-01-01    0
2000-01-02    1
2000-01-02    2
2000-01-02    3
2000-01-03    4
dtype: int64

In [41]:
dup_ts.index.is_unique

False

In [42]:
dup_ts['1/3/2000']

4

In [43]:
dup_ts['1/2/2000']

2000-01-02    1
2000-01-02    2
2000-01-02    3
dtype: int64

In [44]:
# Aggregate the values that share a timestamp by grouping on the index (level=0)
grouped = dup_ts.groupby(level=0)
grouped.mean()

2000-01-01    0
2000-01-02    2
2000-01-03    4
dtype: int64

In [45]:
grouped.count()

2000-01-01    1
2000-01-02    3
2000-01-03    1
dtype: int64
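
Aggregating is one way to deal with repeated timestamps; another is to keep a single observation per timestamp. A minimal sketch of the latter using `Index.duplicated`, with the same data as `dup_ts`:

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-02',
                          '2000-01-02', '2000-01-03'])
dup_ts = pd.Series(np.arange(5), index=dates)

# duplicated() flags repeats of an earlier label; ~ keeps the first of each
first_only = dup_ts[~dup_ts.index.duplicated(keep='first')]

print(first_only)
```

Whether to keep the first value, the last one, or an aggregate depends on what the duplicates mean in the data at hand.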

# **Date ranges, frequencies, and shifting**

In [46]:
ts

2011-01-02   -0.577903
2011-01-05   -1.402747
2011-01-07   -1.131579
2011-01-08   -1.275769
2011-01-10    1.263197
2011-01-12   -0.104327
dtype: float64

In [47]:
# The string 'D' means daily frequency; resample returns a lazy Resampler object, so nothing is computed until an aggregation is called
resampler = ts.resample('D')
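
The resampler on its own produces no output. A short sketch with a tiny made-up series, showing what happens once an aggregation is applied and how the gaps can be filled:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0],
               index=pd.to_datetime(['2011-01-02', '2011-01-05']))

resampler = ts.resample('D')     # no computation yet

daily = resampler.mean()         # days without observations become NaN
filled = daily.ffill()           # carry the last observation forward

print(daily)
print(filled)
```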


**Generating date ranges**

In [48]:
# pandas.date_range generates a DatetimeIndex of a given length at a given frequency
index = pd.date_range('2012-04-01', '2012-06-01')
index

DatetimeIndex(['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04',
               '2012-04-05', '2012-04-06', '2012-04-07', '2012-04-08',
               '2012-04-09', '2012-04-10', '2012-04-11', '2012-04-12',
               '2012-04-13', '2012-04-14', '2012-04-15', '2012-04-16',
               '2012-04-17', '2012-04-18', '2012-04-19', '2012-04-20',
               '2012-04-21', '2012-04-22', '2012-04-23', '2012-04-24',
               '2012-04-25', '2012-04-26', '2012-04-27', '2012-04-28',
               '2012-04-29', '2012-04-30', '2012-05-01', '2012-05-02',
               '2012-05-03', '2012-05-04', '2012-05-05', '2012-05-06',
               '2012-05-07', '2012-05-08', '2012-05-09', '2012-05-10',
               '2012-05-11', '2012-05-12', '2012-05-13', '2012-05-14',
               '2012-05-15', '2012-05-16', '2012-05-17', '2012-05-18',
               '2012-05-19', '2012-05-20', '2012-05-21', '2012-05-22',
               '2012-05-23', '2012-05-24', '2012-05-25', '2012-05-26',
      

In [49]:
# When only the start or the end date is given, the number of periods must also be passed
pd.date_range(start='2021-08-01', periods=20)

DatetimeIndex(['2021-08-01', '2021-08-02', '2021-08-03', '2021-08-04',
               '2021-08-05', '2021-08-06', '2021-08-07', '2021-08-08',
               '2021-08-09', '2021-08-10', '2021-08-11', '2021-08-12',
               '2021-08-13', '2021-08-14', '2021-08-15', '2021-08-16',
               '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20'],
              dtype='datetime64[ns]', freq='D')

In [50]:
pd.date_range(end='2021-09-01', periods=20)

DatetimeIndex(['2021-08-13', '2021-08-14', '2021-08-15', '2021-08-16',
               '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20',
               '2021-08-21', '2021-08-22', '2021-08-23', '2021-08-24',
               '2021-08-25', '2021-08-26', '2021-08-27', '2021-08-28',
               '2021-08-29', '2021-08-30', '2021-08-31', '2021-09-01'],
              dtype='datetime64[ns]', freq='D')

In [51]:
# 'BM' (business month end) yields the last business day of each month
pd.date_range('2020-01-01','2021-01-01', freq='BM')

DatetimeIndex(['2020-01-31', '2020-02-28', '2020-03-31', '2020-04-30',
               '2020-05-29', '2020-06-30', '2020-07-31', '2020-08-31',
               '2020-09-30', '2020-10-30', '2020-11-30', '2020-12-31'],
              dtype='datetime64[ns]', freq='BM')

In [52]:
pd.date_range('2021-07-03 12:56:31', periods=5)

DatetimeIndex(['2021-07-03 12:56:31', '2021-07-04 12:56:31',
               '2021-07-05 12:56:31', '2021-07-06 12:56:31',
               '2021-07-07 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [53]:
# normalize=True resets the time of day to midnight
pd.date_range('2021-08-02 12:56:31', periods=5, normalize=True)

DatetimeIndex(['2021-08-02', '2021-08-03', '2021-08-04', '2021-08-05',
               '2021-08-06'],
              dtype='datetime64[ns]', freq='D')

**Frequencies and date offsets**

In [54]:
# Hour and Minute are imported at the top of the notebook
hour = Hour()
hour

<Hour>

In [55]:
four_hours = Hour(4)
four_hours

<4 * Hours>

In [56]:
pd.date_range('2000-01-01', '2001-01-03 23:59', freq='4h')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 04:00:00',
               '2000-01-01 08:00:00', '2000-01-01 12:00:00',
               '2000-01-01 16:00:00', '2000-01-01 20:00:00',
               '2000-01-02 00:00:00', '2000-01-02 04:00:00',
               '2000-01-02 08:00:00', '2000-01-02 12:00:00',
               ...
               '2001-01-02 08:00:00', '2001-01-02 12:00:00',
               '2001-01-02 16:00:00', '2001-01-02 20:00:00',
               '2001-01-03 00:00:00', '2001-01-03 04:00:00',
               '2001-01-03 08:00:00', '2001-01-03 12:00:00',
               '2001-01-03 16:00:00', '2001-01-03 20:00:00'],
              dtype='datetime64[ns]', length=2214, freq='4H')

In [57]:
Hour(2) + Minute(30)

<150 * Minutes>

In [58]:
pd.date_range('2000-01-01', periods=10, freq='1h30min')

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90T')
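
A frequency can be given as a string, as an offset object, or as a `Timedelta`. The sketch below, under the assumption that all three spellings describe the same 90-minute step, checks that they generate identical indexes:

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

from_string = pd.date_range('2000-01-01', periods=4, freq='90min')
from_offset = pd.date_range('2000-01-01', periods=4, freq=Hour(1) + Minute(30))
from_delta = pd.date_range('2000-01-01', periods=4, freq=pd.Timedelta(minutes=90))

print(from_string)
print(from_string.equals(from_offset), from_string.equals(from_delta))
```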

**Week-of-month dates**

In [59]:
rng = pd.date_range('2020-01-01', '2020-09-01', freq='WOM-3FRI')
list(rng)

[Timestamp('2020-01-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-02-21 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-03-20 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-04-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-05-15 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-06-19 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-07-17 00:00:00', freq='WOM-3FRI'),
 Timestamp('2020-08-21 00:00:00', freq='WOM-3FRI')]
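
A quick sanity check on the `WOM-3FRI` result: every date should be a Friday, and a third Friday can only fall on days 15 through 21 of a month. A minimal sketch of that check:

```python
import pandas as pd

rng = pd.date_range('2020-01-01', '2020-09-01', freq='WOM-3FRI')

# Monday is 0, so Friday is 4
all_fridays = (rng.dayofweek == 4).all()

# The third Friday always falls on day 15 through 21 of the month
in_third_week = ((rng.day >= 15) & (rng.day <= 21)).all()

print(len(rng), all_fridays, in_third_week)
```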

**Shifting data (leading and lagging)**

In [60]:
ts = pd.Series(np.random.randn(4), index=pd.date_range('1/1/2020', periods=4, freq='M'))
ts

2020-01-31    2.073052
2020-02-29    0.186838
2020-03-31    0.215567
2020-04-30    0.012094
Freq: M, dtype: float64

In [61]:
ts.shift(2)

2020-01-31         NaN
2020-02-29         NaN
2020-03-31    2.073052
2020-04-30    0.186838
Freq: M, dtype: float64

In [62]:
ts.shift(-2)

2020-01-31    0.215567
2020-02-29    0.012094
2020-03-31         NaN
2020-04-30         NaN
Freq: M, dtype: float64

In [63]:
ts/ts.shift(1) - 1

2020-01-31         NaN
2020-02-29   -0.909873
2020-03-31    0.153766
2020-04-30   -0.943896
Freq: M, dtype: float64
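
The expression `ts / ts.shift(1) - 1` computes the percent change between consecutive observations. A small sketch with made-up values, comparing it against `Series.pct_change` (called with `fill_method=None` so that no missing values are filled beforehand):

```python
import numpy as np
import pandas as pd

ts = pd.Series([100.0, 110.0, 99.0],
               index=pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31']))

manual = ts / ts.shift(1) - 1
# fill_method=None: do not fill missing values before computing the change
builtin = ts.pct_change(fill_method=None)

print(manual)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

The first element is `NaN` in both cases because there is no previous observation to compare against.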

In [64]:
ts.shift(2,freq='M')

2020-03-31    2.073052
2020-04-30    0.186838
2020-05-31    0.215567
2020-06-30    0.012094
Freq: M, dtype: float64

In [65]:
ts.shift(3, freq='D')

2020-02-03    2.073052
2020-03-03    0.186838
2020-04-03    0.215567
2020-05-03    0.012094
dtype: float64

In [66]:
ts.shift(1, freq='90T')  # 'T' stands for minutes

2020-01-31 01:30:00    2.073052
2020-02-29 01:30:00    0.186838
2020-03-31 01:30:00    0.215567
2020-04-30 01:30:00    0.012094
dtype: float64
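
The difference between the two kinds of shift is easy to miss: without `freq` the values move and the index stays, while with `freq` the index moves and the values stay. A sketch with made-up values that makes the contrast explicit:

```python
import pandas as pd

ts = pd.Series([1.0, 2.0, 3.0],
               index=pd.to_datetime(['2020-01-31', '2020-02-29', '2020-03-31']))

shifted_data = ts.shift(1)               # values move, index stays, NaN appears
shifted_index = ts.shift(3, freq='D')    # index moves by 3 days, values stay

print(shifted_data)
print(shifted_index)
```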

**Shifting dates with offsets**

In [67]:
# Day and MonthEnd are imported at the top of the notebook
now = datetime(2011, 11, 17)
now + 3 * Day()

Timestamp('2011-11-20 00:00:00')

In [68]:
now + MonthEnd()

Timestamp('2011-11-30 00:00:00')

In [69]:
now + MonthEnd(2)

Timestamp('2011-12-31 00:00:00')

In [70]:
offset = MonthEnd()

In [71]:
offset.rollforward(now)

Timestamp('2011-11-30 00:00:00')

In [72]:
offset.rollback(now)

Timestamp('2011-10-31 00:00:00')

In [73]:
ts = pd.Series(np.random.randn(20), index=pd.date_range('1/15/2000', periods=20, freq='4d'))
ts

2000-01-15    0.376459
2000-01-19   -2.356795
2000-01-23   -1.364912
2000-01-27   -0.289286
2000-01-31    0.208085
2000-02-04   -1.458685
2000-02-08   -1.474612
2000-02-12   -0.234847
2000-02-16   -0.038196
2000-02-20   -1.804030
2000-02-24    0.642490
2000-02-28    0.090419
2000-03-03   -0.051233
2000-03-07    1.672150
2000-03-11    0.156591
2000-03-15   -0.928460
2000-03-19    1.420019
2000-03-23   -1.579748
2000-03-27    0.483882
2000-03-31   -0.445448
Freq: 4D, dtype: float64

In [74]:
ts.groupby(offset.rollforward).mean()

2000-01-31   -0.685290
2000-02-29   -0.611066
2000-03-31    0.090969
dtype: float64

In [75]:
ts.resample('M').mean()

2000-01-31   -0.685290
2000-02-29   -0.611066
2000-03-31    0.090969
Freq: M, dtype: float64
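
Grouping by `offset.rollforward` and resampling to month end give the same monthly means. The sketch below verifies this with sequential values instead of random ones; passing the `MonthEnd` object to `resample` rather than a string alias is a choice made here to avoid depending on how a given pandas version spells the alias:

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

ts = pd.Series(np.arange(20.0),
               index=pd.date_range('2000-01-15', periods=20, freq='4D'))

offset = MonthEnd()

# Each date is rolled forward to its month end, which becomes the group key
by_rollforward = ts.groupby(offset.rollforward).mean()

# Resampling to month-end frequency bins the same dates together
by_resample = ts.resample(offset).mean()

print(by_rollforward)
print(np.allclose(by_rollforward.values, by_resample.values))
```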

# **Time zone handling**

In [76]:
# pytz is imported at the top of the notebook
# Look up time zone names
pytz.common_timezones[-5:]

['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']

In [77]:
# Get a time zone object
tz = pytz.timezone('America/New_York')
tz

<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>

**Time zone localization and conversion**

In [78]:
# Time series are time zone naive by default
rng = pd.date_range('3/9/2012 09:30', periods =6, freq='D')
ts=pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-09 09:30:00   -0.129003
2012-03-10 09:30:00   -0.100823
2012-03-11 09:30:00   -1.492626
2012-03-12 09:30:00   -0.874743
2012-03-13 09:30:00    0.831298
2012-03-14 09:30:00    0.171795
Freq: D, dtype: float64

In [79]:
# The tz attribute of the index is None
print(ts.index.tz)

None


In [80]:
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00', '2012-03-16 09:30:00+00:00',
               '2012-03-17 09:30:00+00:00', '2012-03-18 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [81]:
ts

2012-03-09 09:30:00   -0.129003
2012-03-10 09:30:00   -0.100823
2012-03-11 09:30:00   -1.492626
2012-03-12 09:30:00   -0.874743
2012-03-13 09:30:00    0.831298
2012-03-14 09:30:00    0.171795
Freq: D, dtype: float64

In [82]:
ts_utc = ts.tz_localize('UTC')
ts_utc

2012-03-09 09:30:00+00:00   -0.129003
2012-03-10 09:30:00+00:00   -0.100823
2012-03-11 09:30:00+00:00   -1.492626
2012-03-12 09:30:00+00:00   -0.874743
2012-03-13 09:30:00+00:00    0.831298
2012-03-14 09:30:00+00:00    0.171795
Freq: D, dtype: float64

In [83]:
ts_utc.index

DatetimeIndex(['2012-03-09 09:30:00+00:00', '2012-03-10 09:30:00+00:00',
               '2012-03-11 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq='D')

In [84]:
# A localized series can be converted to another time zone
ts_utc.tz_convert('America/New_York')

2012-03-09 04:30:00-05:00   -0.129003
2012-03-10 04:30:00-05:00   -0.100823
2012-03-11 05:30:00-04:00   -1.492626
2012-03-12 05:30:00-04:00   -0.874743
2012-03-13 05:30:00-04:00    0.831298
2012-03-14 05:30:00-04:00    0.171795
Freq: D, dtype: float64

In [85]:
# Localize to US Eastern time, then convert to UTC (and to Berlin in the next cell)

ts_eastern = ts.tz_localize('America/New_York')
ts_eastern.tz_convert('UTC')

2012-03-09 14:30:00+00:00   -0.129003
2012-03-10 14:30:00+00:00   -0.100823
2012-03-11 13:30:00+00:00   -1.492626
2012-03-12 13:30:00+00:00   -0.874743
2012-03-13 13:30:00+00:00    0.831298
2012-03-14 13:30:00+00:00    0.171795
dtype: float64

In [86]:
ts_eastern.tz_convert('Europe/Berlin')

2012-03-09 15:30:00+01:00   -0.129003
2012-03-10 15:30:00+01:00   -0.100823
2012-03-11 14:30:00+01:00   -1.492626
2012-03-12 14:30:00+01:00   -0.874743
2012-03-13 14:30:00+01:00    0.831298
2012-03-14 14:30:00+01:00    0.171795
dtype: float64

In [87]:
ts.index.tz_localize('Asia/Shanghai')

DatetimeIndex(['2012-03-09 09:30:00+08:00', '2012-03-10 09:30:00+08:00',
               '2012-03-11 09:30:00+08:00', '2012-03-12 09:30:00+08:00',
               '2012-03-13 09:30:00+08:00', '2012-03-14 09:30:00+08:00'],
              dtype='datetime64[ns, Asia/Shanghai]', freq=None)
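
`tz_localize` and `tz_convert` are not interchangeable: the first attaches a zone to naive timestamps, the second re-expresses timestamps that already carry a zone. A minimal sketch of the distinction, including the `TypeError` raised when each is applied to the wrong kind of index:

```python
import pandas as pd

naive = pd.date_range('2012-03-09 09:30', periods=3, freq='D')

aware = naive.tz_localize('UTC')                # attach a zone; clock readings unchanged
in_ny = aware.tz_convert('America/New_York')    # same instants, new clock readings

errors = []
try:
    naive.tz_convert('UTC')       # a naive index has no zone to convert from
except TypeError as exc:
    errors.append(type(exc).__name__)
try:
    aware.tz_localize('UTC')      # an aware index is already localized
except TypeError as exc:
    errors.append(type(exc).__name__)

print(in_ny)
print(errors)
```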

**Operations with time zone-aware Timestamp objects**

In [88]:
stamp = pd.Timestamp('2011-03-12 04:00')
stamp_utc = stamp.tz_localize('utc')
stamp_utc.tz_convert('America/New_York')

Timestamp('2011-03-11 23:00:00-0500', tz='America/New_York')

In [89]:
# A time zone can also be passed when creating the Timestamp
stamp_moscow = pd.Timestamp('2011-03-12 04:00', tz='Europe/Moscow')
stamp_moscow

Timestamp('2011-03-12 04:00:00+0300', tz='Europe/Moscow')

In [90]:
# A Timestamp stores a UTC value in nanoseconds since the Unix epoch
stamp_utc.value

1299902400000000000

In [91]:
# The UTC value is unchanged by time zone conversion
stamp_utc.tz_convert('America/New_York').value

1299902400000000000

In [92]:
# Offsets respect daylight saving time (DST); Hour is imported at the top of the notebook
# 30 minutes before the transition to DST (2012-03-11 at 02:00 in US/Eastern)
stamp = pd.Timestamp('2012-03-11 01:30', tz='US/Eastern')
stamp

Timestamp('2012-03-11 01:30:00-0500', tz='US/Eastern')

In [93]:
stamp + Hour()

Timestamp('2012-03-11 03:30:00-0400', tz='US/Eastern')

In [94]:
# 90 minutes before the transition out of DST (2012-11-04 at 02:00 in US/Eastern)
stamp = pd.Timestamp('2012-11-04 00:30', tz='US/Eastern')
stamp

Timestamp('2012-11-04 00:30:00-0400', tz='US/Eastern')

In [95]:
stamp + 2* Hour()

Timestamp('2012-11-04 01:30:00-0500', tz='US/Eastern')

**Operations between different time zones**

In [96]:
rng = pd.date_range('3/7/2012 9:30', periods=10, freq='B')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2012-03-07 09:30:00   -0.402954
2012-03-08 09:30:00   -0.872877
2012-03-09 09:30:00   -0.191592
2012-03-12 09:30:00    1.204823
2012-03-13 09:30:00   -0.227866
2012-03-14 09:30:00   -1.202797
2012-03-15 09:30:00   -0.133517
2012-03-16 09:30:00    0.033403
2012-03-19 09:30:00   -1.303898
2012-03-20 09:30:00   -0.273748
Freq: B, dtype: float64

In [97]:
ts1 = ts[:7].tz_localize('Europe/London')
ts2 = ts1[2:].tz_convert('Europe/Moscow')
result = ts1 + ts2
result.index

DatetimeIndex(['2012-03-07 09:30:00+00:00', '2012-03-08 09:30:00+00:00',
               '2012-03-09 09:30:00+00:00', '2012-03-12 09:30:00+00:00',
               '2012-03-13 09:30:00+00:00', '2012-03-14 09:30:00+00:00',
               '2012-03-15 09:30:00+00:00'],
              dtype='datetime64[ns, UTC]', freq=None)
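
When two series in different time zones are combined, the result is expressed in UTC because alignment happens on the underlying UTC instants. A sketch with sequential values (an assumption made for reproducibility) that also shows where the `NaN` entries come from:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2012-03-07 09:30', periods=7, freq='B')
ts = pd.Series(np.arange(7.0), index=rng)

ts1 = ts.tz_localize('Europe/London')
ts2 = ts1.iloc[2:].tz_convert('Europe/Moscow')   # same instants, minus the first two

# Timestamps present in only one operand produce NaN
result = ts1 + ts2

print(result.index.tz)
print(result)
```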

# **Periods and period arithmetic**

In [98]:
# A Period represents a time span such as a day, a month, a quarter, or a year
p = pd.Period(2007, freq='A-DEC')
p

Period('2007', 'A-DEC')

In [99]:
p + 5

Period('2012', 'A-DEC')

In [100]:
p - 2

Period('2005', 'A-DEC')

In [101]:
pd.Period('2014', freq='A-DEC') - p

<7 * YearEnds: month=12>

In [102]:
# Ranges of periods are built with period_range
rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng

PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]', freq='M')

In [103]:
# PeriodIndex stores a sequence of periods and can serve as an axis index
pd.Series(np.random.randn(6), index=rng)

2000-01   -0.442838
2000-02   -0.075116
2000-03   -0.808194
2000-04    0.641455
2000-05   -0.304809
2000-06   -0.687378
Freq: M, dtype: float64

In [104]:
# A PeriodIndex can also be built from an array of strings

values = ['2001Q3', '2002Q2', '2003Q1']
index = pd.PeriodIndex(values, freq='Q-DEC')
index

PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]', freq='Q-DEC')
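
The same arithmetic works at any frequency. A minimal sketch with monthly periods (monthly is chosen here only for illustration), which also shows that a `Period` is a span with a start and an end rather than a single instant:

```python
import pandas as pd

p = pd.Period('2007-01', freq='M')

later = p + 5                                 # integers shift by whole months
gap = pd.Period('2007-06', freq='M') - p      # the difference is an offset object

# A Period covers a span of time with a start and an end
print(later, gap.n)
print(p.start_time, p.end_time.normalize())
```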

**Period frequency conversion**

In [105]:
p = pd.Period('2007', freq='A-DEC')
p

Period('2007', 'A-DEC')

In [106]:
p.asfreq('M', how='start')

Period('2007-01', 'M')

In [107]:
p.asfreq('M', how='end')

Period('2007-12', 'M')

In [108]:
# A fiscal year ending in a month other than December
p = pd.Period('2007', freq='A-JUN')
p

Period('2007', 'A-JUN')

In [109]:
p.asfreq('M', 'start')

Period('2006-07', 'M')

In [110]:
p.asfreq('M', 'end')

Period('2007-06', 'M')

In [111]:
p = pd.Period('Aug-2007', 'M')
p.asfreq('A-JUN')

Period('2008', 'A-JUN')

In [112]:
rng = pd.period_range('2006', '2009', freq='A-DEC')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts

2006   -0.060215
2007    1.087320
2008    2.146387
2009    0.521524
Freq: A-DEC, dtype: float64

In [113]:
ts.asfreq('M', how='start')

2006-01   -0.060215
2007-01    1.087320
2008-01    2.146387
2009-01    0.521524
Freq: M, dtype: float64

In [114]:
# For the last business day of each year
ts.asfreq('B', how='end')

2006-12-29   -0.060215
2007-12-31    1.087320
2008-12-31    2.146387
2009-12-31    0.521524
Freq: B, dtype: float64
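
`asfreq` works in both directions. Going to a higher frequency requires choosing the start or the end of the span, while going to a lower frequency just picks the enclosing period. A small sketch using month and day frequencies (picked here for illustration):

```python
import pandas as pd

month = pd.Period('2007-02', freq='M')

first_day = month.asfreq('D', how='start')
last_day = month.asfreq('D', how='end')

# Going from a higher to a lower frequency picks the enclosing period
back = pd.Period('2007-02-15', freq='D').asfreq('M')

print(first_day, last_day, back)
```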

**Quarterly period frequencies**