# Time Series

In [1]:
# Time series data is an important form of structured data in many different fields, such as finance, economics, ecology, neuroscience, and physics.
# Anything that is observed or measured at many points in time forms a time series.
# Many time series are fixed frequency, which is to say that data points occur at regular intervals according to some rule, such as every 15 seconds, every 5 minutes, or once per month.
# Time series can also be irregular, without a fixed unit of time or offset between units.
# How you mark and refer to time series data depends on the application, and you may have one of the following:
# Timestamps, specific instants in time
# Fixed periods, such as the month January 2007 or the full year 2010
# Intervals of time, indicated by a start and end timestamp. Periods can be thought of as special cases of intervals.
# Experiment or elapsed time; each timestamp is a measure of time relative to a particular start time (e.g., the diameter of a cookie baking each second since being placed in the oven)
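
As a rough illustration of the four kinds of time reference listed above, the sketch below maps each one to a pandas object. The choice of `pd.Timestamp`, `pd.Period`, `pd.Interval`, and `pd.Timedelta` is an assumption made for illustration, as are the variable names and the left-closed interval; the list itself does not prescribe any particular type.

```python
import pandas as pd

# Timestamp: a specific instant in time
instant = pd.Timestamp("2011-01-03 09:30")

# Fixed period: the whole month of January 2007
month = pd.Period("2007-01", freq="M")

# Interval: an explicit start and end timestamp (left-closed here by choice)
window = pd.Interval(pd.Timestamp("2011-01-01"), pd.Timestamp("2011-01-08"), closed="left")

# Elapsed time: an offset measured from some start time
elapsed = pd.Timedelta(seconds=90)

print(instant in window)                  # does the instant fall inside the interval?
print(month.start_time, month.end_time)   # a period spans a start and an end instant
print(instant + elapsed)                  # shifting an instant by an elapsed time
```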

# Date and Time Data Types and Tools

In [2]:
# The Python standard library includes data types for date and time data, as well as calendar-related functionality.
# The datetime, time, and calendar modules are the main places to start. The datetime.datetime type, or simply datetime, is widely used:
from datetime import datetime
now = datetime.now()
print(now)

2024-01-30 16:36:36.265225


In [3]:
now.year, now.month, now.day 

(2024, 1, 30)

In [4]:
# datetime stores both the date and time down to the microsecond. datetime.timedelta represents the temporal difference between two datetime objects:
delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15) # 2011-01-07 00:00:00 - 2008-06-24 08:15:00 = 926 days, 15:45:00

In [5]:
delta.days

926

In [6]:
delta.seconds

56700
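
Note that `days` and `seconds` are separate components of the same `timedelta`: `seconds` holds only what is left over after the whole days. The short sketch below, using the same two datetimes, breaks the leftover down into hours and minutes and uses `total_seconds()` to get the whole span as one number. The variable names are illustrative only.

```python
from datetime import datetime

delta = datetime(2011, 1, 7) - datetime(2008, 6, 24, 8, 15)

# seconds is the remainder after whole days, not the full duration
leftover_hours = delta.seconds // 3600
leftover_minutes = delta.seconds % 3600 // 60

# total_seconds() expresses days and seconds together as a single float
total = delta.total_seconds()

print(delta.days, leftover_hours, leftover_minutes)  # 926 15 45
print(total)                                         # 80063100.0
```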

In [7]:
# You can add (or subtract) a timedelta, or a multiple thereof, to a datetime object to yield a new shifted object:
from datetime import timedelta 
start = datetime(2011,1,7)

In [12]:
from datetime import datetime 
start + timedelta(12) # 2011-01-19 00:00:00 

datetime.datetime(2011, 1, 19, 0, 0)
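
The "multiple thereof" case can be sketched as follows. A bare positional argument to `timedelta` means days; the keyword form in the second example is just one way to make the units explicit, and the names `earlier` and `later` are illustrative.

```python
from datetime import datetime, timedelta

start = datetime(2011, 1, 7)

# Multiplying a timedelta scales it, so this steps back 24 days
earlier = start - 2 * timedelta(12)

# Keyword arguments spell out the units
later = start + timedelta(days=1, hours=12)

print(earlier)  # 2010-12-14 00:00:00
print(later)    # 2011-01-08 12:00:00
```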

In [13]:
# Types in the datetime module:
# date: stores a calendar date (year, month, day) using the Gregorian calendar
# time: stores the time of day as hours, minutes, seconds, and microseconds
# datetime: stores both date and time
# timedelta: represents the difference between two datetime values (as days, seconds, and microseconds)
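
To see how these four types relate, here is a small sketch that builds a `datetime` from a `date` and a `time` and then takes a difference to get a `timedelta`. The specific values are arbitrary examples.

```python
from datetime import date, time, datetime

d = date(2011, 1, 7)   # calendar date only
t = time(8, 15)        # time of day only

# combine() joins a date and a time into one datetime
dt = datetime.combine(d, t)

# Subtracting two datetimes yields a timedelta
span = dt - datetime(2011, 1, 1)

print(dt)                        # 2011-01-07 08:15:00
print(dt.date() == d, dt.time() == t)
print(span.days, span.seconds)   # 6 29700
```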

# Converting Between String and Datetime

datetime objects and pandas Timestamp objects can be formatted as strings using str or the strftime method, passing a format specification:

In [14]:
stamp = datetime.now()
str(stamp)

'2024-01-30 16:48:01.991575'

In [15]:
stamp.strftime('%Y-%m-%d')

'2024-01-30'

In [16]:
# The datetime format specification table later in this section lists the format codes. These same format codes can be used to convert strings to dates using datetime.strptime:
value = '2011-01-03'

In [17]:
datetime.strptime(value, '%Y-%m-%d') # datetime.datetime(2011, 1, 3, 0, 0) 

datetime.datetime(2011, 1, 3, 0, 0)

In [18]:
datestrs= ['7/6/2011', '8/6/2011']

In [19]:
datestrs 

['7/6/2011', '8/6/2011']

In [20]:
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs] # [datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)] 


[datetime.datetime(2011, 7, 6, 0, 0), datetime.datetime(2011, 8, 6, 0, 0)]
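
Because `strptime` and `strftime` share format codes, a parsed value can be written back out with the same specification. The sketch below also shows what happens when the format does not match the string: `strptime` raises `ValueError`. The variable names are illustrative.

```python
from datetime import datetime

value = "7/6/2011"

# Parse with a known format, then format back with the same codes
parsed = datetime.strptime(value, "%m/%d/%Y")
round_trip = parsed.strftime("%m/%d/%Y")  # output is zero-padded

# A format that does not match the string raises ValueError
try:
    datetime.strptime(value, "%Y-%m-%d")
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(parsed)      # 2011-07-06 00:00:00
print(round_trip)  # 07/06/2011
print(mismatch_error)
```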

In [21]:
# datetime.strptime is the best way to parse a date with a known format. However, it can be a bit annoying to have to 
# write a format spec each time, especially for common date formats. 
# In this case, you can use the parser.parse method in the third-party dateutil package (this is installed automatically when you install pandas): 
from dateutil.parser import parse 
parse('2011-01-03') # datetime.datetime(2011, 1, 3, 0, 0) 

datetime.datetime(2011, 1, 3, 0, 0)

In [22]:
# dateutil is capable of parsing most human-intelligible date representations: 
parse('Jan 31, 1997 10:45 PM') # datetime.datetime(1997, 1, 31, 22, 45) 

datetime.datetime(1997, 1, 31, 22, 45)

In [23]:
# In international locales, the day appearing before the month is very common, so you can pass dayfirst=True to indicate this:
parse('6/12/2011', dayfirst=True) # datetime.datetime(2011, 12, 6, 0, 0) 

datetime.datetime(2011, 12, 6, 0, 0)
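
The effect of `dayfirst` is easiest to see by parsing the same ambiguous string both ways. This is a minimal sketch assuming dateutil's default month-first reading of such strings; the variable names are illustrative.

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = "6/12/2011"

# Default reading: month first (June 12)
month_first = parse(ambiguous)

# With dayfirst=True: day first (6 December)
day_first = parse(ambiguous, dayfirst=True)

print(month_first)  # 2011-06-12 00:00:00
print(day_first)    # 2011-12-06 00:00:00
```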

In [24]:
# pandas is generally oriented toward working with arrays of dates, whether used as an axis index or a column in a DataFrame.
# The to_datetime method parses many different kinds of date representations. Standard date formats like ISO 8601 can be parsed very quickly: 
import pandas as pd 
datestrs = ['2011-07-06 12:00:00', '2011-08-06 00:00:00'] 

In [25]:
pd.to_datetime(datestrs) # DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06'], dtype='datetime64[ns]', freq=None) 

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00'], dtype='datetime64[ns]', freq=None)

In [26]:
# It also handles values that should be considered missing (None, empty string, etc.): 
idx = pd.to_datetime(datestrs + [None]) 

In [27]:
idx

DatetimeIndex(['2011-07-06 12:00:00', '2011-08-06 00:00:00', 'NaT'], dtype='datetime64[ns]', freq=None)
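
NaT (Not a Time) is the pandas null value for timestamp data. One way to locate and drop such entries is sketched below with `pd.isna`; the names `missing` and `valid` are illustrative.

```python
import pandas as pd

datestrs = ["2011-07-06 12:00:00", "2011-08-06 00:00:00"]
idx = pd.to_datetime(datestrs + [None])

# Boolean mask marking the missing entries
missing = pd.isna(idx)

# Keep only the entries that parsed to a real timestamp
valid = idx[~missing]

print(missing)  # [False False  True]
print(valid)
```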

In [28]:
# Datetime format specification (ISO C89 compatible) 
# %Y 4-digit year
# %y 2-digit year
# %m 2-digit month [01,12]
# %d 2-digit day [01,31]
# %H Hour (24-hour clock)[00,23]
# %I Hour (12-hour clock)[01,12]
# %M 2-digit minute [00,59]
# %S Second [00,61] (seconds 60,61 account for leap seconds)
# %w Weekday as integer [0(Sunday),6]
# %U Week number of the year [00,53]; Sunday is considered the first day of the week, and days before the first Sunday of the year are “week 0”
# %W Week number of the year [00,53]; Monday is considered the first day of the week, and days before the first Monday of the year are “week 0”
# %z UTC time zone offset as +HHMM or -HHMM; empty if time zone naive
# %F Shortcut for %Y-%m-%d (e.g., 2012-04-18)
# %D Shortcut for %m/%d/%y (e.g., 04/18/12)
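
A few of the codes from the table can be tried directly with `strftime`. The sketch below sticks to codes from the table other than the `%F` and `%D` shortcuts, whose availability can depend on the platform's C library. The second date, 2 January 2011, was a Sunday that fell before the first Monday of the year, so `%U` and `%W` disagree for it.

```python
from datetime import datetime

moment = datetime(2012, 4, 18, 15, 30, 5)  # a Wednesday

date_part = moment.strftime("%Y-%m-%d")   # '2012-04-18'
short_date = moment.strftime("%m/%d/%y")  # '04/18/12'
clock_24 = moment.strftime("%H:%M:%S")    # '15:30:05'
clock_12 = moment.strftime("%I:%M")       # '03:30'
weekday = moment.strftime("%w")           # '3' (Sunday is 0)

# Week numbers: %U counts weeks from the first Sunday, %W from the first Monday
first_sunday = datetime(2011, 1, 2)
week_sunday_based = first_sunday.strftime("%U")  # '01'
week_monday_based = first_sunday.strftime("%W")  # '00'

print(date_part, short_date, clock_24, clock_12, weekday)
print(week_sunday_based, week_monday_based)
```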

In [29]:
# datetime objects also have a number of locale-specific formatting options for systems in other countries or languages.
# For example, in German, the day appears before the month:
# 2011-03-12 04:00:00 PM -> 12.03.2011 16:00:00
# In pandas, dayfirst=True is a parsing hint for ambiguous day/month strings rather than a locale setting.
# datestrs[0] is already in unambiguous ISO 8601 form, so the hint leaves it unchanged:
pd.to_datetime(datestrs[0], dayfirst=True) # Timestamp('2011-07-06 12:00:00')

Timestamp('2011-07-06 12:00:00')

In [30]:
# Locale-specific date formatting
# %a Weekday as locale’s abbreviated name. Sun, Mon, ..., Sat (en_US); So, Mo, ..., Sa (de_DE)
# %A Weekday as locale’s full name. Sunday, Monday, ..., Saturday (en_US); Sonntag, Montag, ..., Samstag (de_DE)
# %b Month as locale’s abbreviated name. Jan, Feb, ..., Dec (en_US); Jan, Feb, ..., Dez (de_DE)
# %B Month as locale’s full name. January, February, ..., December (en_US); Januar, Februar, ..., Dezember (de_DE)
# %c Locale’s appropriate date and time representation. Tue Aug 16 21:30:00 1988 (en_US); Di 16 Aug 21:30:00 1988 (de_DE)
# %p Locale’s equivalent of either AM or PM. AM, PM (en_US); am, pm (de_DE)
# %x Locale’s appropriate date representation. 08/16/88 (None); 08/16/1988 (en_US); 16.08.1988 (de_DE)
# %X Locale’s appropriate time representation. 21:30:00 (en_US); 21:30:00 (de_DE)

# Time Series Basics 
The most basic kind of time series object in pandas is a Series indexed by timestamps, which are often represented outside of pandas as Python strings or datetime objects:

In [2]:
from datetime import datetime

dates = [datetime(2011,1,2), datetime(2011,1,5), datetime(2011,1,7), datetime(2011,1,8), datetime(2011,1,10), datetime(2011,1,12)]


In [3]:
from pandas import Series, DataFrame 
import numpy as np 
ts = Series(np.random.randn(6), index=dates)

In [4]:
ts

2011-01-02   -0.191806
2011-01-05    1.348229
2011-01-07    0.228457
2011-01-08    0.941104
2011-01-10    2.017281
2011-01-12    0.614095
dtype: float64

In [5]:
# Under the hood, these datetime objects have been put in a DatetimeIndex.
# The variable ts itself is an ordinary pandas Series:
type(ts) # pandas.core.series.Series

pandas.core.series.Series

In [6]:
ts.index 

DatetimeIndex(['2011-01-02', '2011-01-05', '2011-01-07', '2011-01-08',
               '2011-01-10', '2011-01-12'],
              dtype='datetime64[ns]', freq=None)

In [7]:
# Like other Series, arithmetic operations between differently indexed time series automatically align on the dates.
# ts[::2] selects every second element of ts, so the dates it skips come out as NaN:
ts + ts[::2]

2011-01-02   -0.383613
2011-01-05         NaN
2011-01-07    0.456914
2011-01-08         NaN
2011-01-10    4.034562
2011-01-12         NaN
dtype: float64
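
Since the values above are random, the alignment is easier to check with fixed numbers. The sketch below uses a shorter, made-up series and also shows `Series.add` with `fill_value=0` as one possible way to avoid the NaN results; whether filling with zero is appropriate depends on the data.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8)]
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# Dates missing from ts[::2] have no partner to add, so they become NaN
total = ts + ts[::2]

# Treat a missing partner as 0 instead of propagating NaN
filled = ts.add(ts[::2], fill_value=0)

print(total)
print(filled)
```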