# **S01: DATETIMES IN PYTHON**

Date and time data in Python comes in a few flavors:

- *Time stamps* reference particular moments in time (e.g., July 4th, 2015 at 7:00am).
- *Periods* reference a length of time between a particular beginning and end point; for example, the year 2015.
- *Time deltas* or *durations* reference an exact length of time (e.g., a duration of 22.56 seconds).

## Dates and Times in Python

The Python world has a number of available representations of dates, times, deltas, and timespans.
While the time series tools provided by Pandas tend to be the most useful for data science applications, it is helpful to see their relationship to other packages used in Python.

### Native Python dates and times: ``datetime`` and ``dateutil``

Python's basic objects for working with dates and times reside in the built-in ``datetime`` module.
Together with the third-party ``dateutil`` module, you can use it to quickly perform a host of useful operations on dates and times.
For example, you can manually build a date using the ``datetime`` type:

In [43]:
from datetime import datetime
a = datetime(2023, 11, hour=9, day=7, minute=6) 

In [44]:
a

datetime.datetime(2023, 11, 7, 9, 6)

In [45]:
type(a)

datetime.datetime

Or, using the ``dateutil`` module, you can parse dates from a variety of string formats:

In [46]:
from dateutil import parser
date = parser.parse("December 12th, 2024")
date

datetime.datetime(2024, 12, 12, 0, 0)

### The `strftime` method

The name of this method stands for "string format time". It is very useful for turning a `datetime` variable into a string laid out in whatever date and time format we want. All format codes are listed here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

In [47]:
date.strftime("%V") # %V is the ISO 8601 week number

'50'

In [48]:
date.strftime('%d-%B-%Y')

'12-December-2024'

In [49]:
date.strftime('%B, %d %Y')

'December, 12 2024'

In [50]:
date.strftime('%b %d')

'Dec 12'
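The reverse direction, from a formatted string back to a `datetime`, uses the same format codes through `datetime.strptime`. The following is a minimal sketch of that round trip; the variable names are illustrative, and the month name assumes an English locale.

```python
from datetime import datetime

# Parse a string with an explicit format (the inverse of strftime)
parsed = datetime.strptime("12-December-2024", "%d-%B-%Y")

# Format it back with the same codes; the round trip reproduces the string
round_trip = parsed.strftime("%d-%B-%Y")

print(parsed)
print(round_trip)
```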

## Dealing with timeseries in Pandas

This section will introduce the fundamental Pandas data structures for working with time series data:

- For *time stamps*, Pandas provides the ``Timestamp`` type. It is essentially a replacement for Python's native ``datetime``, but is based on the more efficient ``numpy.datetime64`` data type. The associated index structure is ``DatetimeIndex``.
- For *time periods*, Pandas provides the ``Period`` type. This encodes a fixed-frequency interval based on ``numpy.datetime64``. The associated index structure is ``PeriodIndex``.
- For *time deltas* or *durations*, Pandas provides the ``Timedelta`` type. ``Timedelta`` is a more efficient replacement for Python's native ``datetime.timedelta`` type, and is based on ``numpy.timedelta64``. The associated index structure is ``TimedeltaIndex``.

**So, in general, for date and time manipulation in pandas bear in mind `Timestamp`, `Period` and `Timedelta`**

In [51]:
import numpy as np
import pandas as pd

### Operating with `Timestamp` and `Period`

One useful thing we can do with datetimes in pandas is check whether a specific timestamp falls inside a specific period:

In [52]:
pd.Period('2025') # a Period covering the whole year 2025

Period('2025', 'Y-DEC')

In [53]:
p = pd.Period('2022-08-01') # represents the period of the day so beginning to end of day

timestamp = pd.Timestamp('2022-08-01 20:00') # you can specify the time for it too

p.start_time < timestamp < p.end_time

True

In [54]:
p.start_time, p.end_time

(Timestamp('2022-08-01 00:00:00'), Timestamp('2022-08-01 23:59:59.999999999'))
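The containment check above can be wrapped in a small helper. This is only a sketch: `in_period` is a hypothetical name, not a pandas function, and it treats both boundaries as inclusive. Converting the timestamp to a period of the same frequency with `Timestamp.to_period` and comparing gives an equivalent test.

```python
import pandas as pd

period = pd.Period("2022-08-01", freq="D")
inside = pd.Timestamp("2022-08-01 20:00")
outside = pd.Timestamp("2022-08-02 00:00")

def in_period(ts, period):
    """Return True when ts falls within period, boundaries included (hypothetical helper)."""
    return period.start_time <= ts <= period.end_time

print(in_period(inside, period))
print(in_period(outside, period))

# Equivalent check: convert the timestamp to a daily period and compare
print(inside.to_period("D") == period)
```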

### Creating datetimes with `pd.to_datetime` and `pd.to_timedelta` functions

`pd.to_datetime` tries to convert the provided input into pandas datetime objects. Its most common use is to convert a **formatted string** into a **datetime**. Without it, each string has to be parsed by hand:

In [55]:
dt_s = pd.Series(["2023-01-01", "2023-01-02"])

def parse_date(element):
    year = int(element.split("-")[0])
    month = int(element.split("-")[1])
    day = int(element.split("-")[2])
    return datetime(year, month, day)

dt_s_converted = dt_s.map(parse_date)

dt_s_converted

0   2023-01-01
1   2023-01-02
dtype: datetime64[ns]

In [56]:
print(dt_s_converted.dtype) # changed to datetime64
print(dt_s.dtype) # object

datetime64[ns]
object


Or, more directly, with `pd.to_datetime`:

In [57]:
dt_s = pd.Series(["2023-01-01", "2023-01-02"])

pd.to_datetime(dt_s) # this is a better way to convert to datetime

0   2023-01-01
1   2023-01-02
dtype: datetime64[ns]

In [58]:
# convert a string date into a pandas datetime
date = pd.to_datetime("23rd of July, 2024")
date

Timestamp('2024-07-23 00:00:00')

In [59]:
print(pd.to_datetime('July 23 2024 at 4:25PM')) # you can use various ways of writing it

2024-07-23 16:25:00


In [60]:
# convert an array of string dates into a pandas datetime array
date = pd.to_datetime(["24th of July, 2024", "25th of July, 2024"])
date

DatetimeIndex(['2024-07-24', '2024-07-25'], dtype='datetime64[ns]', freq=None)

In [61]:
# 'data.csv' and 'column name' are placeholders for your own file and column
pd.read_csv('data.csv', parse_dates=['column name']) # parse_dates converts the listed columns to datetime while reading
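The effect of `parse_dates` in `pd.read_csv` can be checked without a file on disk. The sketch below reads from an in-memory string; the CSV content and the column names `date` and `value` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "date,value\n2023-01-01,10\n2023-01-02,12\n"

# parse_dates converts the listed columns to datetime while reading
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df_csv.dtypes)
```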

In [None]:
from datetime import datetime

# create a list of datetimes with different formats
dates = pd.to_datetime([
    '4th of July, 2015',
    #'2015-Jul-6',
    datetime(2015, 7, 3),
    '07-07-2015',
    '20150708'
])

dates # pandas struggles to convert dates with different delimiters

The format is detected automatically, but detection sometimes fails. To be safe, we can pass the format explicitly with the **format** argument. This only works when every value uses the same format.

In [None]:
date_format = "%d/%m/%Y"
date_format_2 = '%d of %B, %Y'

# use the "format" argument to provide the datetime format 
dates = pd.to_datetime(['2 of December, 2024', "3 of December, 2024"], format=date_format_2) 
print(dates)

DatetimeIndex(['2024-12-02', '2024-12-03'], dtype='datetime64[ns]', freq=None)
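When an explicit format is given, values that do not match it raise an error by default. One way to handle messy input, sketched here with made-up strings, is to pass `errors="coerce"` so that unparseable entries become `NaT` instead.

```python
import pandas as pd

raw = ["02/12/2024", "03/12/2024", "not a date"]

# Entries that do not match the format are turned into NaT rather than raising
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
```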


In [None]:
dates.to_period("h") # with no time of day given, each date maps to its first hour


PeriodIndex(['2024-12-02 00:00', '2024-12-03 00:00'], dtype='period[h]')

Additionally, we can create timedeltas (time span) with the following code

In [None]:
# create a timedelta of 1.666667 hours (about 1 hour 40 minutes)
span = pd.to_timedelta(1.666667, unit="h") # without a unit, a number is read as nanoseconds
span


Timedelta('0 days 01:40:00.001200')

Timedeltas can be used to perform operations with datetime objects in pandas. For example:

In [None]:
pd.to_datetime("3rd of September, 2024")

Timestamp('2024-09-03 00:00:00')

In [None]:
pd.to_datetime("3rd of September, 2024") + span

Timestamp('2024-09-03 01:40:00.001200')

In [None]:
pd.to_datetime('2024-09-03 16:44') - span

Timestamp('2024-09-03 15:03:59.998800')
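The arithmetic also works in the other direction: subtracting one timestamp from another yields a `Timedelta`. A short sketch with arbitrary example timestamps:

```python
import pandas as pd

start = pd.Timestamp("2024-09-03 08:00")
end = pd.Timestamp("2024-09-04 09:30")

# The difference of two timestamps is a Timedelta
elapsed = end - start
print(elapsed)

# total_seconds() gives a plain float, convenient for unit conversions
hours = elapsed.total_seconds() / 3600
print(hours)
```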

The same with timedelta arrays

In [None]:
list(np.arange(12))

[np.int64(0),
 np.int64(1),
 np.int64(2),
 np.int64(3),
 np.int64(4),
 np.int64(5),
 np.int64(6),
 np.int64(7),
 np.int64(8),
 np.int64(9),
 np.int64(10),
 np.int64(11)]

In [None]:
spans = pd.to_timedelta(np.arange(12), unit='h')
spans


TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00', '0 days 10:00:00', '0 days 11:00:00'],
               dtype='timedelta64[ns]', freq=None)

In [None]:
datetimes = pd.to_datetime("23rd of July, 2024") + spans[3:6]

datetimes

DatetimeIndex(['2024-07-23 03:00:00', '2024-07-23 04:00:00',
               '2024-07-23 05:00:00'],
              dtype='datetime64[ns]', freq=None)

In [None]:
# create a dataframe and convert one column into the index

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

df.set_index("b")

| b | a |
|---|---|
| 3 | 1 |
| 4 | 2 |


### Indexing by Time

Where the Pandas time series tools really become useful is when you begin to *index data by timestamps*.
For example, we can construct a ``Series`` object that has time indexed data:

In [None]:
index = pd.DatetimeIndex([
    '2014-07-04',
    '2014-08-04',
    '2015-07-04',
    '2015-08-04'
])

data = pd.Series([0, 1, 2, 3], index=index)
data # especially powerful when datetime is the index

2014-07-04    0
2014-08-04    1
2015-07-04    2
2015-08-04    3
dtype: int64

Now that we have this data in a ``Series``, we can make use of any of the ``Series`` indexing patterns we discussed in previous sections, passing values that can be coerced into dates:

In [None]:
data['2014'] # partial-string indexing: every entry from 2014

2014-07-04    0
2014-08-04    1
dtype: int64

There are additional special date-only indexing operations, such as passing a year to obtain a slice of all data from that year:

In [None]:
data['2015']

2015-07-04    2
2015-08-04    3
dtype: int64
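Partial-string indexing is not limited to years. As a sketch on the same four-point series (rebuilt here so the block is self-contained), a month string selects that month, and a slice between two date strings includes both endpoints:

```python
import pandas as pd

index = pd.DatetimeIndex(["2014-07-04", "2014-08-04", "2015-07-04", "2015-08-04"])
data = pd.Series([0, 1, 2, 3], index=index)

# A year-month string selects every entry in that month
one_month = data.loc["2014-08"]

# Slicing with date strings is inclusive of both endpoints
date_slice = data.loc["2014-07-04":"2015-07-04"]

print(one_month)
print(date_slice)
```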

### Create sequences with `pd.date_range()`, `pd.period_range()` and `pd.timedelta_range()`

To make the creation of regular date sequences more convenient, Pandas offers a few functions for this purpose: ``pd.date_range()`` for timestamps, ``pd.period_range()`` for periods, and ``pd.timedelta_range()`` for time deltas.

In [None]:
# create a daily range between two dates
pd.date_range('2015-07-03', '2015-07-20', freq="D")

DatetimeIndex(['2015-07-03', '2015-07-04', '2015-07-05', '2015-07-06',
               '2015-07-07', '2015-07-08', '2015-07-09', '2015-07-10',
               '2015-07-11', '2015-07-12', '2015-07-13', '2015-07-14',
               '2015-07-15', '2015-07-16', '2015-07-17', '2015-07-18',
               '2015-07-19', '2015-07-20'],
              dtype='datetime64[ns]', freq='D')

In [None]:
# create an hourly range between two dates
pd.date_range('2015-07-03', '2015-07-10', freq="h")


DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00',
               '2015-07-03 08:00:00', '2015-07-03 09:00:00',
               ...
               '2015-07-09 15:00:00', '2015-07-09 16:00:00',
               '2015-07-09 17:00:00', '2015-07-09 18:00:00',
               '2015-07-09 19:00:00', '2015-07-09 20:00:00',
               '2015-07-09 21:00:00', '2015-07-09 22:00:00',
               '2015-07-09 23:00:00', '2015-07-10 00:00:00'],
              dtype='datetime64[ns]', length=169, freq='h')

Alternatively, the date range can be specified not with a start and endpoint, but with a startpoint and a number of periods:

In [None]:
pd.date_range('2015-07', periods=8)

DatetimeIndex(['2015-07-01', '2015-07-02', '2015-07-03', '2015-07-04',
               '2015-07-05', '2015-07-06', '2015-07-07', '2015-07-08'],
              dtype='datetime64[ns]', freq='D')

The spacing can be modified by altering the ``freq`` argument, which defaults to ``D``.
For example, here we will construct a range of hourly timestamps:

In [None]:
pd.date_range('2015-07-03', periods=8, freq='h')


DatetimeIndex(['2015-07-03 00:00:00', '2015-07-03 01:00:00',
               '2015-07-03 02:00:00', '2015-07-03 03:00:00',
               '2015-07-03 04:00:00', '2015-07-03 05:00:00',
               '2015-07-03 06:00:00', '2015-07-03 07:00:00'],
              dtype='datetime64[ns]', freq='h')

In [None]:
pd.date_range('2015-07-03', periods=8, freq='2D') 

DatetimeIndex(['2015-07-03', '2015-07-05', '2015-07-07', '2015-07-09',
               '2015-07-11', '2015-07-13', '2015-07-15', '2015-07-17'],
              dtype='datetime64[ns]', freq='2D')

To create regular sequences of ``Period`` or ``Timedelta`` values, the very similar ``pd.period_range()`` and ``pd.timedelta_range()`` functions are useful.
Here are some hourly and monthly periods:

In [None]:
pd.period_range('2015-07', periods=8, freq='h')


PeriodIndex(['2015-07-01 00:00', '2015-07-01 01:00', '2015-07-01 02:00',
             '2015-07-01 03:00', '2015-07-01 04:00', '2015-07-01 05:00',
             '2015-07-01 06:00', '2015-07-01 07:00'],
            dtype='period[h]')

In [None]:
pd.period_range('2015-07', periods=8, freq='M')

PeriodIndex(['2015-07', '2015-08', '2015-09', '2015-10', '2015-11', '2015-12',
             '2016-01', '2016-02'],
            dtype='period[M]')

And a sequence of durations increasing by an hour:

In [None]:
pd.timedelta_range(0, periods=10, freq='h')

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00'],
               dtype='timedelta64[ns]', freq='h')

All of these require an understanding of Pandas frequency codes, which we'll summarize in the next section.

#### Frequencies and Offsets

Fundamental to these Pandas time series tools is the concept of a frequency or date offset.
Just as we saw the day (``D``) and hour (``h``, spelled ``H`` before pandas 2.2) codes above, we can use such codes to specify any desired frequency spacing.
The following table summarizes the main codes available, using the aliases introduced in pandas 2.2 (the older spellings ``M``, ``Q``, ``A``, ``BM``, ``BQ``, ``BA``, ``H``, ``BH``, ``T``, ``S``, ``L``, ``U``, and ``N`` are deprecated):

| Code    | Description         | Code    | Description          |
|---------|---------------------|---------|----------------------|
| ``D``   | Calendar day        | ``B``   | Business day         |
| ``W``   | Weekly              |         |                      |
| ``ME``  | Month end           | ``BME`` | Business month end   |
| ``QE``  | Quarter end         | ``BQE`` | Business quarter end |
| ``YE``  | Year end            | ``BYE`` | Business year end    |
| ``h``   | Hours               | ``bh``  | Business hours       |
| ``min`` | Minutes             |         |                      |
| ``s``   | Seconds             |         |                      |
| ``ms``  | Milliseconds        |         |                      |
| ``us``  | Microseconds        |         |                      |
| ``ns``  | Nanoseconds         |         |                      |

The monthly, quarterly, and annual frequencies are all marked at the end of the specified period.
Replacing the trailing ``E`` with an ``S`` marks them at the beginning instead:

| Code    | Description            | Code    | Description            |
|---------|------------------------|---------|------------------------|
| ``MS``  | Month start            |``BMS``  | Business month start   |
| ``QS``  | Quarter start          |``BQS``  | Business quarter start |
| ``YS``  | Year start             |``BYS``  | Business year start    |

Additionally, you can change the month used to mark any quarterly or annual code by adding a three-letter month code as a suffix:

- ``QE-JAN``, ``BQE-FEB``, ``QS-MAR``, ``BQS-APR``, etc.
- ``YE-JAN``, ``BYE-FEB``, ``YS-MAR``, ``BYS-APR``, etc.

In the same way, the split-point of the weekly frequency can be modified by adding a three-letter weekday code:

- ``W-SUN``, ``W-MON``, ``W-TUE``, ``W-WED``, etc.

On top of this, codes can be combined with numbers to specify other frequencies.
For example, for a frequency of 2 hours 30 minutes, we can combine the hour (``h``) and minute (``min``) codes as follows:

In [None]:
pd.timedelta_range(0, periods=9, freq="2h30min")


TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150min')
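A few of the other codes can be tried in the same way. The sketch below uses `pd.date_range` with the business-day (``B``), month-start (``MS``), and Monday-anchored weekly (``W-MON``) codes; the start date is arbitrary (2015-07-03 was a Friday).

```python
import pandas as pd

# Business days skip Saturday and Sunday
business_days = pd.date_range("2015-07-03", periods=4, freq="B")

# Month starts: the first one on or after the start date
month_starts = pd.date_range("2015-07-03", periods=3, freq="MS")

# Weekly frequency anchored on Mondays
mondays = pd.date_range("2015-07-03", periods=3, freq="W-MON")

print(business_days)
print(month_starts)
print(mondays)
```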

## Resampling, Shifting, and Windowing

The ability to use dates and times as indices to intuitively organize and access data is an important piece of the Pandas time series tools.
The benefits of indexed data in general (automatic alignment during operations, intuitive data slicing and access, etc.) still apply, and Pandas provides several additional time series-specific operations.

We will take a look at a few of those here, using some stock price data as an example. Install the `yfinance` package (for example via ``pip install yfinance``), and download Google's stock price history:

In [None]:
!pip install yfinance


In [75]:
import yfinance as yf

goog = yf.download('GOOG', start='2023-01-01', end='2024-09-03')
goog.head()

| Date       | Close (GOOG) | High (GOOG) | Low (GOOG) | Open (GOOG) | Volume (GOOG) |
|------------|--------------|-------------|------------|-------------|---------------|
| 2023-01-03 | 89.378845    | 91.222228   | 88.70128   | 89.508385   | 20738500      |
| 2023-01-04 | 88.392403    | 90.913344   | 87.485665  | 90.684171   | 27046500      |
| 2023-01-05 | 86.459343    | 87.89419    | 86.250096  | 87.754692   | 23136100      |
| 2023-01-06 | 87.844376    | 88.153263   | 85.263644  | 87.047237   | 26612600      |
| 2023-01-09 | 88.482079    | 90.504809   | 88.262865  | 88.875661   | 22996700      |


For simplicity, we'll use just the closing price:

In [74]:
goog = goog['Close']

goog

| Date       | GOOG       |
|------------|------------|
| 2023-01-03 | 89.378845  |
| 2023-01-04 | 88.392403  |
| 2023-01-05 | 86.459343  |
| 2023-01-06 | 87.844376  |
| 2023-01-09 | 88.482079  |
| ...        | ...        |
| 2024-08-26 | 167.519180 |
| 2024-08-27 | 165.972977 |
| 2024-08-28 | 164.097565 |
| 2024-08-29 | 163.000259 |


In [None]:
import matplotlib.pyplot as plt

import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_dark"

In [62]:
px.line(goog, title="GOOG Stock")


### Resampling and converting frequencies

One common need for time series data is resampling at a higher or lower frequency.
This can be done using the ``resample()`` method, or the much simpler ``asfreq()`` method.
The primary difference between the two is that ``resample()`` is fundamentally a *data aggregation*, while ``asfreq()`` is fundamentally a *data selection*.
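The aggregation-versus-selection distinction can be seen on a small synthetic series, without downloading anything. This is a sketch with made-up values; it uses the month-start code ``MS`` so that the selected dates are guaranteed to exist in the index.

```python
import pandas as pd

# Synthetic daily values 0, 1, 2, ... covering January and February 2023
days = pd.date_range("2023-01-01", "2023-02-28", freq="D")
series = pd.Series(range(len(days)), index=days, dtype="float64")

# resample() aggregates: the mean of every value inside each month
monthly_mean = series.resample("MS").mean()

# asfreq() selects: only the value found at each month start
monthly_pick = series.asfreq("MS")

print(monthly_mean)
print(monthly_pick)
```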

Taking a look at the Google closing price, let's compare what the two return when we down-sample the data.
Here we will resample the data at the end of each month:

In [63]:
goog.head()

| Date       | GOOG      |
|------------|-----------|
| 2023-01-03 | 89.378845 |
| 2023-01-04 | 88.392403 |
| 2023-01-05 | 86.459343 |
| 2023-01-06 | 87.844376 |
| 2023-01-09 | 88.482079 |


In [64]:
goog_resample = goog.resample('ME').mean() # aggregate: mean of all closing prices in each month
goog_freq = goog.asfreq('ME') # select: the value on the last calendar day of each month (NaN if it was not a trading day)



In [65]:
goog_resample

| Date       | GOOG       |
|------------|------------|
| 2023-01-31 | 93.679403  |
| 2023-02-28 | 96.462349  |
| 2023-03-31 | 98.205833  |
| 2023-04-30 | 105.96767  |
| 2023-05-31 | 116.327705 |
| 2023-06-30 | 122.786908 |
| 2023-07-31 | 123.111149 |
| 2023-08-31 | 130.679587 |
| 2023-09-30 | 134.712464 |
| 2023-10-31 | 134.869493 |


In [66]:
goog_freq

| Date       | GOOG       |
|------------|------------|
| 2023-01-31 | 99.512451  |
| 2023-02-28 | 89.976707  |
| 2023-03-31 | 103.627663 |
| 2023-04-30 | NaN        |
| 2023-05-31 | 122.928314 |
| 2023-06-30 | 120.536896 |
| 2023-07-31 | 132.633438 |
| 2023-08-31 | 136.858261 |
| 2023-09-30 | NaN        |
| 2023-10-31 | 124.851402 |


In [67]:
import pandas as pd

# put both monthly results side by side; each is a one-column DataFrame
df = pd.concat([goog_resample, goog_freq], axis=1)
df.columns = ["resample", "as_freq"]

figure = px.line(df, line_dash_sequence=["dashdot"], title="GOOG Stock (Monthly)")
# overlay the daily closing price; px.line needs a DataFrame here, not a dict of DataFrames
daily = goog.rename(columns={"GOOG": "daily"})
figure.add_trace(px.line(daily, color_discrete_sequence=["green"]).data[0])
figure.show()

In this case we have down-sampled the time series.

For up-sampling, ``resample()`` and ``asfreq()`` are largely equivalent, though resample has many more options available.
In this case, the default for both methods is to leave the up-sampled points empty, that is, filled with NA values.
Just as with the ``fillna()`` method discussed previously, ``asfreq()`` accepts a ``method`` argument to specify how values are imputed.
Here, we will resample the business day data at a daily frequency (i.e., including weekends):

In [68]:
goog_d = goog.asfreq('D')
goog_d_fill = goog.asfreq('D', method='bfill') # fill the missing values with the next value

In [69]:
# px.line needs a DataFrame, so combine the two one-column frames first;
# the unfilled series is raised by 10 so that the two lines do not overlap
daily = pd.concat([goog_d + 10, goog_d_fill], axis=1)
daily.columns = ["empty_weekends", "filled_weekends"]
px.line(daily, title="GOOG Stock (Daily)")
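The fill behaviour of ``asfreq()`` when up-sampling is easiest to see on a tiny example. The sketch below uses two made-up prices on a Friday and the following Monday, and compares no fill, backward fill, and forward fill for the weekend in between.

```python
import pandas as pd

# A Friday and the following Monday, standing in for business-day data
bdays = pd.DatetimeIndex(["2023-01-06", "2023-01-09"])
prices = pd.Series([10.0, 20.0], index=bdays)

empty = prices.asfreq("D")                    # weekend rows are left as NaN
back = prices.asfreq("D", method="bfill")     # weekend takes Monday's value
forward = prices.asfreq("D", method="ffill")  # weekend takes Friday's value

print(pd.DataFrame({"empty": empty, "bfill": back, "ffill": forward}))
```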

### Time-shifts with `shift()`

Another common time series-specific operation is shifting data in time, which is done with the `shift()` method.

In [70]:
goog.head()

| Date       | GOOG      |
|------------|-----------|
| 2023-01-03 | 89.378845 |
| 2023-01-04 | 88.392403 |
| 2023-01-05 | 86.459343 |
| 2023-01-06 | 87.844376 |
| 2023-01-09 | 88.482079 |


In [71]:
goog_sh = goog.shift(periods = 1) # move values forward by one row: each date now holds the previous trading day's price

goog_sh.head()

| Date       | GOOG      |
|------------|-----------|
| 2023-01-03 | NaN       |
| 2023-01-04 | 89.378845 |
| 2023-01-05 | 88.392403 |
| 2023-01-06 | 86.459343 |
| 2023-01-09 | 87.844376 |


In [76]:
import pandas as pd
goog_df = pd.DataFrame(goog)

goog_df['goog_shift_1'] = goog_df['Close'].shift(1)

goog_df.corr()

|               | Close (GOOG) | High (GOOG) | Low (GOOG) | Open (GOOG) | Volume (GOOG) | goog_shift_1 |
|---------------|--------------|-------------|------------|-------------|---------------|--------------|
| Close (GOOG)  | 1.0          | 0.99902     | 0.999221   | 0.99779     | -0.409791     | 0.995508     |
| High (GOOG)   | 0.99902      | 1.0         | 0.999138   | 0.999037    | -0.397751     | 0.996884     |
| Low (GOOG)    | 0.999221     | 0.999138    | 1.0        | 0.999047    | -0.415188     | 0.996581     |
| Open (GOOG)   | 0.99779      | 0.999037    | 0.999047   | 1.0         | -0.403902     | 0.997464     |
| Volume (GOOG) | -0.409791    | -0.397751   | -0.415188  | -0.403902   | 1.0           | -0.408349    |
| goog_shift_1  | 0.995508     | 0.996884    | 0.996581   | 0.997464    | -0.408349     | 1.0          |


In [None]:
# goog and goog_sh are one-column DataFrames, so join them with concat
# (a dict of DataFrames cannot be passed to pd.DataFrame)
goog_df = pd.concat([goog, goog_sh], axis=1)
goog_df.columns = ["original", "original_shift_1"]

goog_df.corr()

In [80]:
# px.line needs a DataFrame, so combine the two one-column frames first
shifted = pd.concat([goog, goog_sh], axis=1)
shifted.columns = ["original", "shifted"]
px.line(shifted, title="GOOG Stock (Daily)")

This feature is very useful for computing the target variable in machine-learning forecasting problems.
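As a sketch of that use, with made-up prices and hypothetical column names: a positive shift produces a lagged feature (yesterday's value), while a negative shift pulls the next value back to form a forecasting target.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=5, freq="D")
frame = pd.DataFrame({"price": [10.0, 11.0, 13.0, 12.0, 15.0]}, index=idx)

# Lag feature: the previous day's price (first row has no predecessor)
frame["previous"] = frame["price"].shift(1)

# Target: the next day's price (last row has no successor)
frame["target"] = frame["price"].shift(-1)

# Day-over-day change computed from the lag
frame["change"] = frame["price"] - frame["previous"]

print(frame)
```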

### Rolling windows

Rolling statistics are a third type of time series-specific operation implemented by Pandas.
These can be accomplished via the ``rolling()`` method of ``Series`` and ``DataFrame`` objects, which returns a view similar to what we saw with the ``groupby`` operation.
This rolling view makes available a number of aggregation operations by default.

For example, here is the 30-day rolling mean and standard deviation of the Google stock prices:

In [81]:
goog

| Date       | Close (GOOG) | High (GOOG) | Low (GOOG) | Open (GOOG) | Volume (GOOG) |
|------------|--------------|-------------|------------|-------------|---------------|
| 2023-01-03 | 89.378845    | 91.222228   | 88.701280  | 89.508385   | 20738500      |
| 2023-01-04 | 88.392403    | 90.913344   | 87.485665  | 90.684171   | 27046500      |
| 2023-01-05 | 86.459343    | 87.894190   | 86.250096  | 87.754692   | 23136100      |
| 2023-01-06 | 87.844376    | 88.153263   | 85.263644  | 87.047237   | 26612600      |
| 2023-01-09 | 88.482079    | 90.504809   | 88.262865  | 88.875661   | 22996700      |
| ...        | ...          | ...         | ...        | ...         | ...           |
| 2024-08-26 | 167.519180   | 168.965645  | 165.913134 | 167.743636  | 11990300      |
| 2024-08-27 | 165.972977   | 167.833404  | 165.753514 | 167.199963  | 13718200      |
| 2024-08-28 | 164.097565   | 166.980494  | 162.880548 | 166.371986  | 15208700      |
| 2024-08-29 | 163.000259   | 167.219922  | 161.585729 | 165.653756  | 17133800      |


In [82]:
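# goog is the full download shown above, with (Price, Ticker) column levels;
# pick the closing price as a Series so it can be used as a DataFrame column
close = goog["Close"]["GOOG"]
rolling = close.rolling(30) # rolling window of 30 rows, i.e. 30 trading days

data = pd.DataFrame({
    'input': close,
    'rolling_mean': rolling.mean(),
    'rolling_std': rolling.std()
})

data # rolling mean and rolling std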

In [None]:
px.line(data)
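The window size can be given either as a number of rows or, for a datetime index, as a time span. The sketch below contrasts the two on made-up daily values; note that a row-count window yields NaN until it is full, whereas a time-based window uses whatever observations fall inside the span.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Window of 3 rows: the first two results are NaN (window not yet full)
by_count = values.rolling(3).mean()

# Window of 3 calendar days: partial windows at the start are averaged as-is
by_time = values.rolling("3D").mean()

print(pd.DataFrame({"by_count": by_count, "by_time": by_time}))
```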

## The `dt` attribute in Series

The `dt` attribute of a pandas Series is an accessor that exposes the datetime properties and methods of the Series' values, much like those of a `DatetimeIndex`. It provides many convenient functions for working with dates and times.

The `dt` attribute is only available for Series objects that contain datetime values. If the series does not contain datetime values, attempting to access the dt attribute will raise an `AttributeError`.
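A quick way to see this, sketched with made-up strings: a Series of date-like text has no usable `dt` accessor until it is converted with `pd.to_datetime`.

```python
import pandas as pd

text_dates = pd.Series(["2022-01-01", "2022-02-01"])  # strings, not datetimes

# Accessing .dt on non-datetime values raises AttributeError
try:
    text_dates.dt.year
    raised = False
except AttributeError:
    raised = True
print(raised)

# After conversion the accessor is available
converted = pd.to_datetime(text_dates)
years = converted.dt.year
print(years)
```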

In [83]:
today = pd.to_datetime("2024-09-05")

In [84]:
today.day_of_week

3

In [85]:
# Create a series with datetime values
s = pd.Series(['2022-01-01', '2022-02-01', '2022-03-01'], dtype='datetime64[ns]')

print(s.dt.day_of_week) # dt specifies that it is a datetime and applies the method to the series

df = pd.DataFrame(s)

df[0] = pd.to_datetime(df[0])

df['day_of_week'] = df[0].dt.day_of_week
#df['day_of_week'] = df[0].map(lambda x: x.day_of_week)
df

0    5
1    1
2    1
dtype: int32


|   | 0          | day_of_week |
|---|------------|-------------|
| 0 | 2022-01-01 | 5           |
| 1 | 2022-02-01 | 1           |
| 2 | 2022-03-01 | 1           |


The `dt` attribute provides access to the following properties:

- `year`: the year of the datetime
- `month`: the month of the datetime
- `day`: the day of the datetime
- `hour`: the hour of the datetime
- `minute`: the minute of the datetime
- `second`: the second of the datetime

In [86]:
# Get the year of each datetime
s.dt.year

0    2022
1    2022
2    2022
dtype: int32

In [87]:
# Get the month of each datetime
s.dt.month

0    1
1    2
2    3
dtype: int32

In [88]:
# Get the day of each datetime
s.dt.day

0    1
1    1
2    1
dtype: int32

In [89]:
df = pd.DataFrame({
    "date": pd.to_datetime(['2022-01-01', '2022-02-01', '2022-03-01']),
    "values": [12, 23, 435]
})

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["day_of_week"] = df["date"].dt.day_of_week

df.drop("date", axis=1)

|   | values | year | month | day | day_of_week |
|---|--------|------|-------|-----|-------------|
| 0 | 12     | 2022 | 1     | 1   | 5           |
| 1 | 23     | 2022 | 2     | 1   | 1           |
| 2 | 435    | 2022 | 3     | 1   | 1           |
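Besides these properties, the `dt` accessor also offers methods. A short sketch on the same three dates, using `day_name()`, `strftime()`, and a boolean filter on `day_of_week` (the variable names are illustrative, and the names returned assume the default English locale):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]))

# Weekday names instead of numbers
names = dates.dt.day_name()

# Vectorised strftime: format every value at once
labels = dates.dt.strftime("%b %Y")

# Keep only weekend dates (Saturday is 5, Sunday is 6)
weekend = dates[dates.dt.day_of_week >= 5]

print(names.tolist())
print(labels.tolist())
print(weekend)
```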
