# Time Series

### Observations recorded at different points in time, at either a *fixed frequency* or *irregular* intervals.

#### Entities useful to think about concerning time series

Timestamps - specific instants in time

Intervals - spans of time marked by a starting and ending timestamp

Periods - special cases of Intervals, neatly bookended, like the month of January, the 1st Quarter, or the year 2016

Experiment or Elapsed time - time relative to an arbitrarily chosen $t_0$ starting timestamp.
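As a rough orientation, the four entities can be mapped onto Python and pandas objects as sketched below. The variable names are illustrative, and representing a period with `pd.Period` is an assumption about tooling rather than part of the definitions above.

```python
from datetime import datetime, timedelta
import pandas as pd

t0 = datetime(2016, 1, 1, 9, 0)            # timestamp: a specific instant
later = datetime(2016, 1, 1, 9, 30)        # another timestamp

interval = (t0, later)                     # interval: a (start, end) pair of timestamps
january = pd.Period('2016-01', freq='M')   # period: the whole month of January 2016

elapsed = later - t0                       # elapsed time relative to t0, as a timedelta
print(elapsed.total_seconds())             # 1800.0
print(january.start_time, january.end_time)
```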

### Standard Python Date and Time Data Types

Python has date and time functionality in its `datetime`, `time`, and `calendar` modules.

The most basic type is the `datetime` type from the `datetime` module, which can store a date and time down to the microsecond.

In [2]:
from datetime import datetime

#### the `datetime` type - a timestamp

The `now()` method returns the current local date and time and also accepts an optional time zone argument (`tz`).

In [3]:
now = datetime.now()
now

datetime.datetime(2017, 11, 12, 11, 7, 30, 155385)

In [4]:
type(now)

datetime.datetime
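The time zone argument mentioned above can be sketched with the standard library's `timezone.utc`. The names `naive` and `aware` are just labels chosen for this example.

```python
from datetime import datetime, timezone

naive = datetime.now()               # local wall-clock time, no time zone attached
aware = datetime.now(timezone.utc)   # current time in UTC, time zone attached

print(naive.tzinfo)   # None
print(aware.tzinfo)   # UTC
```

A naive and an aware `datetime` cannot be subtracted from each other, so it is usually simplest to pick one style and stick with it.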

The `today()` method also returns the current time but does not take any arguments.

In [5]:
today = datetime.today()
today

datetime.datetime(2017, 11, 12, 11, 8, 2, 482550)

In [6]:
type(today)

datetime.datetime

Both give precision down to the microsecond.  
0.001 = 1 millisecond = one one-thousandth of a second  
0.000001 = 1 microsecond = one one-millionth of a second  
0.000000001 = 1 nanosecond = one one-billionth of a second  

#### the `date` type - less precision than the `datetime` type

In [9]:
my_time = datetime.now()

`.date()` returns a `date` object without a time.  
It is called on an existing `datetime` object, not with integer arguments.

Two ways to do it:

In [11]:
my_time.date()

datetime.date(2017, 11, 12)

In [36]:
datetime.date(my_time)

datetime.date(2017, 11, 12)

In [14]:
my_date = my_time.date()
type(my_date)

datetime.date

#### Extracting pieces of a `datetime` object

In [46]:
my_time = datetime.now()
my_time

datetime.datetime(2017, 11, 12, 7, 16, 48, 466682)

`.year`, `.month`, `.day`, `.hour`, `.minute`, `.second`, `.microsecond`  
`.weekday()` returns the day of the week on a Mon=0, ..., Sun=6 scale

In [62]:
my_time.month

11

In [61]:
my_time.microsecond

466682

In [64]:
my_time.weekday()

6

#### use `datetime()` to manually create a datetime object

In [7]:
new_time = datetime(2017,10,31, 23,45,23, 234456)
new_time

datetime.datetime(2017, 10, 31, 23, 45, 23, 234456)

#### the `timedelta` type - an Interval - the difference in time between two timestamps

When a `datetime` object is subtracted from another, a `timedelta` object results.

In [79]:
diff = my_time - new_time
diff

datetime.timedelta(11, 27085, 232226)

the difference is expressed in days, seconds, microseconds

In [80]:
type(diff)

datetime.timedelta

Can also have `timedelta` expressed only in seconds with `.total_seconds()`

In [83]:
diff.total_seconds()

977485.232226

Can also extract pieces from a `timedelta` object.  
`.days`, `.seconds`, `.microseconds`

In [89]:
diff.days

11
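One thing that is easy to trip over: `.seconds` holds only the seconds left over after whole days are removed, while `.total_seconds()` covers the whole span. A small sketch, rebuilding the same difference by hand (the `divmod` breakdown is an illustration, not a `timedelta` feature):

```python
from datetime import timedelta

diff = timedelta(days=11, seconds=27085, microseconds=232226)  # same value as above

print(diff.seconds)          # 27085 (seconds left over after whole days)
print(diff.total_seconds())  # 977485.232226 (the whole span)

# Break the leftover seconds into hours, minutes, seconds.
hours, remainder = divmod(diff.seconds, 3600)
minutes, seconds = divmod(remainder, 60)
print(diff.days, hours, minutes, seconds)  # 11 7 31 25
```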

Cannot subtract a `date` object from a `datetime` object or vice versa.

In [103]:
my_time - my_date

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'datetime.date'

But can subtract one `date` object from another, resulting in a `timedelta` object of days.

In [16]:
new_date = new_time.date()  # make a new date object
new_date

datetime.date(2017, 10, 31)

In [21]:
test = my_date - new_date  # difference results in timedelta object
test.resolution  # class attribute: the smallest representable difference (1 microsecond), not the gap itself

datetime.timedelta(0, 0, 1)
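To see the actual gap between two dates, read `.days` from the result. A minimal sketch using the same two dates, rebuilt here with `date()` so the example stands alone:

```python
from datetime import date

my_date = date(2017, 11, 12)
new_date = date(2017, 10, 31)

gap = my_date - new_date   # date - date gives a timedelta of whole days
print(gap)                 # 12 days, 0:00:00
print(gap.days)            # 12
```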

#### Manually create intervals with the `timedelta()` constructor.

In [23]:
from datetime import timedelta

In [29]:
my_interval = timedelta(20, 5 * 3600 + 37 * 60 + 22)  # 20 days 5 hrs 37 min 22 seconds
my_interval

datetime.timedelta(20, 20242)
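The arithmetic for the seconds can be avoided by passing keyword arguments instead. A short sketch showing that both spellings give the same interval:

```python
from datetime import timedelta

positional = timedelta(20, 5 * 3600 + 37 * 60 + 22)
keyword = timedelta(days=20, hours=5, minutes=37, seconds=22)

print(keyword == positional)  # True
print(keyword)                # 20 days, 5:37:22
```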

#### Shift time with `timedelta()`

`timedelta` objects can be added to or subtracted from `datetime` or `date` objects, resulting in the same type shifted forward or backward in time.

In [26]:
start = datetime(2017,7,1)

In [37]:
start - timedelta(12)

datetime.datetime(2017, 6, 19, 0, 0)

In [30]:
start + my_interval

datetime.datetime(2017, 7, 21, 5, 37, 22)

In [35]:
type(my_date)  # try a timedelta with a date type

datetime.date

In [39]:
my_date  # reminder

datetime.date(2017, 11, 12)

In [38]:
my_date + timedelta(3, 100, 657)  # results in a date type, seconds and microseconds are lost in the process

datetime.date(2017, 11, 15)

A `timedelta` can also be multiplied by a number.

In [40]:
four_weeks_hence = datetime.now() + 4 * timedelta(7)
four_weeks_hence

datetime.datetime(2017, 12, 10, 11, 51, 20, 216229)

## Datetimes and strings

### Converting datetimes to string

In [42]:
stamp = datetime(2017, 11, 12, 14, 37, 33)
str(stamp)

'2017-11-12 14:37:33'

`str()` is convenient, but the result may not be in the format you want, or may include more than you want.  
For more control, use `strftime()`, mnemonic "string format time".  
It takes a `datetime` and a formatting string argument.

In [47]:
datetime.strftime(stamp, '%m-%d-%y')

'11-12-17'

Can achieve many different results using the formatting codes.

In [49]:
datetime.strftime(stamp, '%A, the %dth of %B, %Y')

'Sunday, the 12th of November, 2017'

See full formatting codes at Python docs https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
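The same formatting codes also work when `strftime()` is called as a method on the object, and inside f-strings. Note that the `th` in the example above is typed into the format string, so it would read wrong for the 1st, 2nd, or 3rd. A sketch of one way around that, where `ordinal_suffix` is a hypothetical helper written for this example:

```python
from datetime import datetime

stamp = datetime(2017, 11, 12, 14, 37, 33)

print(stamp.strftime('%m-%d-%y'))   # 11-12-17
print(f'{stamp:%Y-%m-%d %H:%M}')    # 2017-11-12 14:37

def ordinal_suffix(day):
    """Return 'st', 'nd', 'rd', or 'th' for a day of the month (hypothetical helper)."""
    if 11 <= day % 100 <= 13:       # 11th, 12th, 13th are irregular
        return 'th'
    return {1: 'st', 2: 'nd', 3: 'rd'}.get(day % 10, 'th')

print(f'the {stamp.day}{ordinal_suffix(stamp.day)}')   # the 12th
```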

## Converting strings to datetimes

Very often you will encounter a date represented as a string and need to convert it to `datetime` in order to perform calculations.

### `strptime()`

`strftime()`'s sister function, `strptime()`, takes a date string and a formatting string that describes where the date components sit in that string, and returns a `datetime` object.

In [50]:
datetime.strptime('3/14/17', '%m/%d/%y')

datetime.datetime(2017, 3, 14, 0, 0)

In [51]:
datetime.strptime('Dec-12-2017', '%b-%d-%Y')

datetime.datetime(2017, 12, 12, 0, 0)
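If the string does not match the formatting string, `strptime()` raises a `ValueError`. A small sketch of that, plus a round trip back through `strftime()` (the variable names are illustrative):

```python
from datetime import datetime

fmt = '%m/%d/%y'
parsed = datetime.strptime('3/14/17', fmt)
print(parsed.strftime(fmt))   # 03/14/17 (zero-padded on the way back out)

try:
    datetime.strptime('2017-03-14', fmt)   # layout does not match fmt
except ValueError as err:
    print('could not parse:', err)
```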

This works well, but you may get weary of always having to supply the formatting string.

### `parse()`

The `dateutil` package contains a powerful function, `parser.parse()`, which will parse most date strings into a `datetime`.

In [52]:
from dateutil.parser import parse
parse('2017-3-7')

datetime.datetime(2017, 3, 7, 0, 0)

In [53]:
parse('4/5/17')

datetime.datetime(2017, 4, 5, 0, 0)

In [55]:
parse('June 6th, 1984, 10:45 PM')

datetime.datetime(1984, 6, 6, 22, 45)

When you know you're dealing with international date formats that list the day before the month, pass `dayfirst=True` to `parse()`:

In [60]:
parse('4-5-17', dayfirst=True )  # May 4th

datetime.datetime(2017, 5, 4, 0, 0)

However, `parse()` is not built to handle arrays of dates.  You'd have to use a list comprehension to work with arrays.

In [62]:
[parse(x) for x in ['7-21-17', '7-22-17', '7-23-17']]

[datetime.datetime(2017, 7, 21, 0, 0),
 datetime.datetime(2017, 7, 22, 0, 0),
 datetime.datetime(2017, 7, 23, 0, 0)]

### Pandas `.to_datetime()`

`pd.to_datetime()` **is** built to handle arrays of dates and parses as well as `parse()`

In [75]:
import pandas as pd
datestrs = ['7/6/2017', '8-9-17', '12-Feb-17']
pd.to_datetime(datestrs)

DatetimeIndex(['2017-07-06', '2017-08-09', '2017-02-12'], dtype='datetime64[ns]', freq=None)

When `pd.to_datetime()` parses multiple date strings, the result is a Pandas DatetimeIndex.

In [70]:
pd.to_datetime('7/6/2017')

Timestamp('2017-07-06 00:00:00')

But when `pd.to_datetime()` parses a single date string, the result is a Pandas Timestamp.
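When the layout of the strings is known, passing it explicitly removes any guessing, and `errors='coerce'` turns entries that cannot be parsed into `NaT` instead of raising. A minimal sketch with made-up inputs; newer pandas versions may be stricter than the one used above about mixing several layouts in one list:

```python
import pandas as pd

# An explicit format removes any ambiguity about day/month order.
idx = pd.to_datetime(['7/6/2017', '8/9/2017'], format='%m/%d/%Y')
print(idx)

# errors='coerce' turns unparseable entries into NaT instead of raising.
messy = pd.to_datetime(['2017-07-06', 'not a date'], errors='coerce')
print(messy.isna())
```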

## Time Series Basics

The most basic kind of time series object in Pandas is, generically, a Series of data indexed by timestamps.  
The timestamps may arise from Python strings or `datetime` objects.

In [1]:
import pandas as pd
from pandas import Series
import numpy as np
from datetime import datetime

In [2]:
dates = [datetime(2017, 1, 2), datetime(2017, 1, 5), datetime(2017, 1, 7), datetime(2017, 1, 8), datetime(2017, 1, 10), datetime(2017, 1, 12)]
ts = Series(np.random.randn(6), index=dates)

In [3]:
type(ts)

pandas.core.series.Series

Under the hood, these `datetime` objects have been assembled into a Pandas DatetimeIndex and the variable `ts` is of type `Series`.  
Note: older documentation may refer to a `TimeSeries` type, but that was deprecated with Pandas 0.13.

As with all Series, arithmetic operations between differently indexed time series automatically align on the dates:

In [4]:
ts

2017-01-02    1.013992
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

In [5]:
ts[::2]  # every other

2017-01-02    1.013992
2017-01-07   -1.324754
2017-01-10    0.635246
dtype: float64

In [6]:
ts + ts[::2]  # numbers summed where indexes match, NaNs where the 2nd Series has nothing

2017-01-02    2.027985
2017-01-05         NaN
2017-01-07   -2.649507
2017-01-08         NaN
2017-01-10    1.270492
2017-01-12         NaN
dtype: float64
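If the NaNs are not wanted, `Series.add()` with `fill_value` treats the missing side as a given number instead. A sketch with small fixed values so the result is easy to check (the data here is made up for the example):

```python
import pandas as pd

dates = pd.to_datetime(['2017-01-02', '2017-01-05', '2017-01-07', '2017-01-08'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

plain = s + s[::2]                     # NaN on 2017-01-05 and 2017-01-08
filled = s.add(s[::2], fill_value=0)   # missing side treated as 0

print(plain)
print(filled)                          # 2.0, 2.0, 6.0, 4.0
```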

Scalar values from the DatetimeIndex are of Pandas type `Timestamp`.  A `Timestamp` can be substituted anywhere you'd use a `datetime` object.  
Additionally, it can store frequency information and understands how to do time zone conversions and other kinds of manipulations.

In [7]:
ts.index

DatetimeIndex(['2017-01-02', '2017-01-05', '2017-01-07', '2017-01-08',
               '2017-01-10', '2017-01-12'],
              dtype='datetime64[ns]', freq=None)

In [8]:
type(ts.index[0])

pandas._libs.tslib.Timestamp
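The substitution works because `Timestamp` is a subclass of `datetime`. A quick sketch of that, along with attaching a time zone (the choice of UTC is just for illustration):

```python
from datetime import datetime
import pandas as pd

stamp = pd.Timestamp('2017-01-07')

print(isinstance(stamp, datetime))     # True: Timestamp subclasses datetime
print(stamp == datetime(2017, 1, 7))   # True
print(stamp.tz_localize('UTC'))        # 2017-01-07 00:00:00+00:00
```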

## Indexing, Selecting, Subsetting

In [9]:
ts  # reminder

2017-01-02    1.013992
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

In [10]:
ts.index[2]

Timestamp('2017-01-07 00:00:00')

In [11]:
stamp = ts.index[2]

In [12]:
ts[stamp]

-1.3247536905615083

Can also pass a string that is interpretable as a date:

In [13]:
ts['Jan-7-2017']

-1.3247536905615083

In [14]:
ts['20170107']

-1.3247536905615083

For longer time series, a year only or a year and month can be passed to select data:

In [15]:
longer_ts = Series(np.random.randn(1000), index=pd.date_range('1/1/2014', periods=1000))

In [16]:
longer_ts

2014-01-01   -0.581059
2014-01-02   -0.626574
2014-01-03    0.285798
2014-01-04    0.436557
2014-01-05    0.249141
2014-01-06    0.248383
2014-01-07   -0.270371
2014-01-08   -1.209010
2014-01-09    0.004494
2014-01-10   -2.059895
2014-01-11   -0.916560
2014-01-12   -0.866008
2014-01-13    0.289443
2014-01-14    0.245898
2014-01-15   -0.131810
2014-01-16    0.488818
2014-01-17    0.291183
2014-01-18    0.186903
2014-01-19    0.212818
2014-01-20    1.240618
2014-01-21    0.774390
2014-01-22    0.904166
2014-01-23    0.167843
2014-01-24   -1.932781
2014-01-25   -0.089920
2014-01-26   -0.759111
2014-01-27   -0.072015
2014-01-28   -1.091258
2014-01-29    0.612128
2014-01-30   -0.171878
                ...   
2016-08-28    1.197473
2016-08-29   -0.521036
2016-08-30    0.622203
2016-08-31   -0.721521
2016-09-01   -1.106838
2016-09-02   -0.985234
2016-09-03   -0.864989
2016-09-04    2.253327
2016-09-05    0.274149
2016-09-06   -0.298985
2016-09-07    1.070865
2016-09-08    0.469468
2016-09-09 

In [17]:
longer_ts['2015']

2015-01-01   -1.978820
2015-01-02    0.386896
2015-01-03    0.801072
2015-01-04   -0.835593
2015-01-05   -0.128167
2015-01-06   -0.268469
2015-01-07   -0.744481
2015-01-08    2.451290
2015-01-09   -1.267607
2015-01-10   -1.336758
2015-01-11   -1.606489
2015-01-12   -0.124963
2015-01-13   -1.123109
2015-01-14   -2.002436
2015-01-15   -0.603219
2015-01-16    0.549910
2015-01-17    0.628098
2015-01-18   -0.807392
2015-01-19   -0.890333
2015-01-20   -0.332081
2015-01-21    1.008852
2015-01-22   -1.498122
2015-01-23   -0.720306
2015-01-24    0.327020
2015-01-25   -0.909572
2015-01-26   -0.135499
2015-01-27   -0.406672
2015-01-28    0.953059
2015-01-29    0.675853
2015-01-30   -0.692012
                ...   
2015-12-02   -0.200029
2015-12-03   -0.615186
2015-12-04    0.269072
2015-12-05   -1.035311
2015-12-06    1.697176
2015-12-07    0.644690
2015-12-08   -1.981519
2015-12-09    0.815312
2015-12-10   -0.752776
2015-12-11   -1.548760
2015-12-12    1.782599
2015-12-13   -0.524443
2015-12-14 

In [18]:
longer_ts['2016-Mar']

2016-03-01   -0.109365
2016-03-02    0.181461
2016-03-03   -0.175725
2016-03-04    0.547773
2016-03-05   -0.934731
2016-03-06    1.642307
2016-03-07    1.996190
2016-03-08    0.801441
2016-03-09    0.674438
2016-03-10   -0.114258
2016-03-11   -0.155783
2016-03-12   -1.439531
2016-03-13    1.282998
2016-03-14   -0.556539
2016-03-15    0.047362
2016-03-16    0.604573
2016-03-17   -0.447864
2016-03-18    0.710213
2016-03-19   -1.183996
2016-03-20   -0.134745
2016-03-21   -1.195419
2016-03-22   -0.009212
2016-03-23    0.368021
2016-03-24   -0.632013
2016-03-25   -1.304101
2016-03-26   -0.384647
2016-03-27   -0.244581
2016-03-28   -2.301828
2016-03-29   -0.819440
2016-03-30    2.412657
2016-03-31    0.617553
Freq: D, dtype: float64
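The same year and year-month selections can be written with `.loc`, which makes it explicit that rows are being selected by label. A sketch on a series of the same shape, filled with `np.arange` so the counts are reproducible (the name `s` is illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000), index=pd.date_range('2014-01-01', periods=1000))

print(len(s.loc['2015']))          # 365 rows, one per day of 2015
print(len(s.loc['2016-03']))       # 31 rows, one per day of March 2016
print(s.loc['2016-03'].index[0])   # 2016-03-01 00:00:00
```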

Slicing with dates works like label-based slicing on a regular Series, so both endpoints are included:

In [19]:
longer_ts[:'2014-Feb-15']

2014-01-01   -0.581059
2014-01-02   -0.626574
2014-01-03    0.285798
2014-01-04    0.436557
2014-01-05    0.249141
2014-01-06    0.248383
2014-01-07   -0.270371
2014-01-08   -1.209010
2014-01-09    0.004494
2014-01-10   -2.059895
2014-01-11   -0.916560
2014-01-12   -0.866008
2014-01-13    0.289443
2014-01-14    0.245898
2014-01-15   -0.131810
2014-01-16    0.488818
2014-01-17    0.291183
2014-01-18    0.186903
2014-01-19    0.212818
2014-01-20    1.240618
2014-01-21    0.774390
2014-01-22    0.904166
2014-01-23    0.167843
2014-01-24   -1.932781
2014-01-25   -0.089920
2014-01-26   -0.759111
2014-01-27   -0.072015
2014-01-28   -1.091258
2014-01-29    0.612128
2014-01-30   -0.171878
2014-01-31   -0.544346
2014-02-01    0.690956
2014-02-02   -1.544420
2014-02-03    0.666483
2014-02-04    0.595745
2014-02-05    0.435411
2014-02-06    0.627232
2014-02-07    0.251988
2014-02-08    1.481643
2014-02-09    0.515291
2014-02-10    0.232096
2014-02-11    0.930026
2014-02-12    1.409060
2014-02-13 

In [20]:
longer_ts['9/15/2016':]

2016-09-15   -0.092849
2016-09-16   -0.281034
2016-09-17    1.186484
2016-09-18    0.414335
2016-09-19   -0.836099
2016-09-20    0.806664
2016-09-21    0.801261
2016-09-22    0.543293
2016-09-23   -1.075291
2016-09-24    0.227980
2016-09-25   -0.344715
2016-09-26   -1.659978
Freq: D, dtype: float64

Because time series are usually ordered chronologically, you can slice using dates that aren't actually present in the index:

In [21]:
ts  # reminder

2017-01-02    1.013992
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

In [22]:
ts['1/3/2017':'1-11-17']

2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
dtype: float64
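Two details are worth checking on a small example: a date slice includes both endpoints when they exist in the index, and `truncate()` with `before` and `after` gives the same kind of selection. A sketch on the same six dates, with placeholder values:

```python
import pandas as pd

dates = pd.to_datetime(['2017-01-02', '2017-01-05', '2017-01-07',
                        '2017-01-08', '2017-01-10', '2017-01-12'])
s = pd.Series(range(6), index=dates)

both_ends = s.loc['2017-01-05':'2017-01-08']   # both endpoints exist and are included
print(len(both_ends))                           # 3

between = s.truncate(before='2017-01-03', after='2017-01-11')
print(between.index[0], between.index[-1])      # first and last dates kept
```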

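Remember, in the Pandas version used here, slicing returns a view on the original, so modifying the slice modifies the original. Newer Pandas versions with Copy-on-Write enabled treat the slice as a copy instead.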

In [44]:
sl = ts['1/1/2017':'1/6/2017']

In [45]:
sl['1/2/2017'] = 3.14159

In [46]:
ts

2017-01-02    3.141590
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

But adding a new entry to the slice does not add it to the original.

In [47]:
sl[datetime(2017,1,6)] = 2.781828

In [48]:
sl

2017-01-02    3.141590
2017-01-05   -0.544905
2017-01-06    2.781828
dtype: float64

In [49]:
ts

2017-01-02    3.141590
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

Note that when **adding** an entry to a time series, whether a slice or the original, you must supply a proper `datetime` (or `Timestamp`) key, not a string:

In [50]:
sl['1/3/2017'] = 1.11111

In [51]:
sl  # the string key is appended as-is, not parsed or placed chronologically

2017-01-02 00:00:00    3.141590
2017-01-05 00:00:00   -0.544905
2017-01-06 00:00:00    2.781828
1/3/2017               1.111110
dtype: float64
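A sketch of the safer pattern: use a `Timestamp` as the key, then sort the index, since new entries are appended at the end. The series is rebuilt here with the values shown above so the example stands alone.

```python
import pandas as pd

s = pd.Series([3.14159, -0.544905, 2.781828],
              index=pd.to_datetime(['2017-01-02', '2017-01-05', '2017-01-06']))

s.loc[pd.Timestamp('2017-01-03')] = 1.11111   # key is a real timestamp, not a string
s = s.sort_index()                             # new entries are appended, so re-sort

print(s)
```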

If you want, instead, to get a copy, not a view, use the awkwardly named `truncate()` method with no arguments.

In [52]:
ts2 = ts.truncate()  # makes a copy, can use arguments to restrict data before or after dates

In [53]:
ts2

2017-01-02    3.141590
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

In [54]:
ts2['1/2/2017'] = 4  # alter the copy

In [55]:
ts2

2017-01-02    4.000000
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64

In [56]:
ts  # doesn't affect the original

2017-01-02    3.141590
2017-01-05   -0.544905
2017-01-07   -1.324754
2017-01-08   -0.020392
2017-01-10    0.635246
2017-01-12    0.251412
dtype: float64
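`Series.copy()` is not used in the text above, but it is a more conventionally named way to get an independent copy. A minimal sketch with made-up values:

```python
import pandas as pd

original = pd.Series([1.0, 2.0, 3.0],
                     index=pd.to_datetime(['2017-01-02', '2017-01-05', '2017-01-07']))

duplicate = original.copy()   # independent copy of data and index
duplicate.iloc[0] = 4.0       # alter the copy

print(original.iloc[0])    # 1.0, the original is untouched
print(duplicate.iloc[0])   # 4.0
```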

### Slicing time series in DataFrames

The above holds true for DataFrames as well.

In [145]:
dates = pd.date_range('1/1/2014', periods=100, freq='W-WED')
dates

DatetimeIndex(['2014-01-01', '2014-01-08', '2014-01-15', '2014-01-22',
               '2014-01-29', '2014-02-05', '2014-02-12', '2014-02-19',
               '2014-02-26', '2014-03-05', '2014-03-12', '2014-03-19',
               '2014-03-26', '2014-04-02', '2014-04-09', '2014-04-16',
               '2014-04-23', '2014-04-30', '2014-05-07', '2014-05-14',
               '2014-05-21', '2014-05-28', '2014-06-04', '2014-06-11',
               '2014-06-18', '2014-06-25', '2014-07-02', '2014-07-09',
               '2014-07-16', '2014-07-23', '2014-07-30', '2014-08-06',
               '2014-08-13', '2014-08-20', '2014-08-27', '2014-09-03',
               '2014-09-10', '2014-09-17', '2014-09-24', '2014-10-01',
               '2014-10-08', '2014-10-15', '2014-10-22', '2014-10-29',
               '2014-11-05', '2014-11-12', '2014-11-19', '2014-11-26',
               '2014-12-03', '2014-12-10', '2014-12-17', '2014-12-24',
               '2014-12-31', '2015-01-07', '2015-01-14', '2015-01-21',
      

In [147]:
long_df = pd.DataFrame(np.random.randn(100,4), columns=['California', 'Texas', 'New York', 'Florida'], index=dates)

In [148]:
long_df

| | California | Texas | New York | Florida |
|---|---|---|---|---|
| 2014-01-01 | -0.670055 | 0.887765 | 0.046678 | 2.034517 |
| 2014-01-08 | 0.763195 | -1.009497 | 0.343186 | 0.808925 |
| 2014-01-15 | 0.567926 | -0.649508 | 0.301028 | -1.183059 |
| 2014-01-22 | 1.029831 | -0.357734 | -0.256060 | -0.789614 |
| 2014-01-29 | 0.441617 | 0.582626 | 0.546245 | -0.036859 |
| 2014-02-05 | 0.102730 | -0.013300 | -0.592911 | 1.399227 |
| 2014-02-12 | 0.324932 | -1.371970 | 0.430918 | -0.582621 |
| 2014-02-19 | -0.614318 | 0.519467 | 1.529626 | 0.525963 |
| 2014-02-26 | 1.050643 | -0.623419 | -1.511494 | -1.424813 |
| 2014-03-05 | -0.647939 | 0.360815 | 0.194534 | -0.006605 |


In [149]:
long_df['2015']

| | California | Texas | New York | Florida |
|---|---|---|---|---|
| 2015-01-07 | -0.253776 | -2.051296 | -0.953472 | -0.308618 |
| 2015-01-14 | -1.43119 | -1.731018 | 0.970498 | 0.916734 |
| 2015-01-21 | 0.831249 | -1.129123 | 0.114963 | -0.976705 |
| 2015-01-28 | -0.677658 | 0.303644 | -0.331522 | 0.657884 |
| 2015-02-04 | 2.541704 | -0.663811 | 1.373684 | 1.396331 |
| 2015-02-11 | -0.86703 | 1.726744 | -0.01852 | 0.996996 |
| 2015-02-18 | -0.583344 | -0.756376 | 0.154717 | -0.980414 |
| 2015-02-25 | -0.231362 | 0.566808 | -0.867567 | -1.128934 |
| 2015-03-04 | 0.021171 | 0.671597 | 0.389465 | 0.683615 |
| 2015-03-11 | 0.463642 | -0.965461 | 1.663242 | 1.122066 |


In [150]:
long_df['Aug-2015']

| | California | Texas | New York | Florida |
|---|---|---|---|---|
| 2015-08-05 | 0.544672 | -1.169498 | 0.596024 | 0.549084 |
| 2015-08-12 | 1.495104 | -0.582083 | 0.694864 | 0.778727 |
| 2015-08-19 | -1.562557 | 0.370357 | -0.037172 | 0.675588 |
| 2015-08-26 | -0.446703 | 0.731919 | -1.708521 | 0.564574 |


In [151]:
long_df['2014-01-06':'2014-02-06']

| | California | Texas | New York | Florida |
|---|---|---|---|---|
| 2014-01-08 | 0.763195 | -1.009497 | 0.343186 | 0.808925 |
| 2014-01-15 | 0.567926 | -0.649508 | 0.301028 | -1.183059 |
| 2014-01-22 | 1.029831 | -0.357734 | -0.25606 | -0.789614 |
| 2014-01-29 | 0.441617 | 0.582626 | 0.546245 | -0.036859 |
| 2014-02-05 | 0.10273 | -0.0133 | -0.592911 | 1.399227 |
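For DataFrames, writing the row selection with `.loc` keeps it from being confused with a column lookup, and newer pandas versions may require it for a single date string such as `'2015'`. A sketch on a frame of the same shape, filled with `np.arange` so the row counts are reproducible (the name `df` is illustrative):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2014-01-01', periods=100, freq='W-WED')
df = pd.DataFrame(np.arange(400).reshape(100, 4),
                  columns=['California', 'Texas', 'New York', 'Florida'],
                  index=dates)

print(len(df.loc['2015-08']))                   # 4 Wednesdays in August 2015
print(len(df.loc['2014-01-06':'2014-02-06']))   # 5 rows between the two dates
print(df.loc['2015-08', 'Texas'])               # rows and a column in one step
```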
