### Working with Time Series in Pandas

Pandas has extensive tools for working with dates and times. The date/time data we work with comes in three main kinds:

1. **Timestamps**: Specific points in time (e.g., July 4th, 2021, at 7:00 a.m.).
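2. **Time Intervals/Periods**: A span of time between a particular start and end point (e.g., the entire month of June 2021). Periods are the special case where every span has the same length and the spans do not overlap (e.g., 24-hour days).
3. **Time Deltas/Durations**: An exact length of time, independent of any start date (e.g., a duration of 22.56 seconds).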
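
As a rough sketch of how these three kinds of data map onto pandas scalar types (the specific values are only illustrative):

```python
import pandas as pd

# A timestamp: one specific instant.
ts = pd.Timestamp("2021-07-04 07:00")

# A period: a span with a defined start and end (here, all of June 2021).
june = pd.Period("2021-06", freq="M")

# A time delta: an exact duration, not tied to any calendar date.
delta = pd.Timedelta(seconds=22.56)

print(ts)                              # 2021-07-04 07:00:00
print(june.start_time, june.end_time)  # first and last instant of the month
print(delta.total_seconds())
```

Each of these scalar types has a matching index type (`DatetimeIndex`, `PeriodIndex`, `TimedeltaIndex`), covered later in this section.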

### Dates and Times in Python

**Python's Native Tools**:
- **`datetime` Module**: Provides basic functions to work with dates and times.
  - Example: Create a specific date:

In [3]:
from datetime import datetime
dt = datetime(year=2021, month=7, day=4)
dt

datetime.datetime(2021, 7, 4, 0, 0)

- Example: Parse a date from a string using the `dateutil` module. Its `parser.parse` function accepts many common date string formats and returns a `datetime` object:

In [6]:
from dateutil import parser
dt = parser.parse("4th of July, 2021")
dt

datetime.datetime(2021, 7, 4, 0, 0)

- **Formatting**: Print the day of the week:

In [109]:
dt.strftime('%a')

'Sun'
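
When the layout of the input string is known in advance, the standard library's `datetime.strptime` can parse it with the same format codes that `strftime` uses. A minimal sketch; the input string and the two layouts are made up for illustration:

```python
from datetime import datetime

# Parse a string whose layout is known ahead of time.
dt = datetime.strptime("2021-07-04 07:00", "%Y-%m-%d %H:%M")

# Format it back out with a different layout.
# Day and month names depend on the active locale
# (an English locale gives "Sunday, 04 July 2021").
label = dt.strftime("%A, %d %B %Y")
print(label)
```

Unlike `dateutil.parser.parse`, `strptime` raises a `ValueError` if the string does not match the given layout, which can be useful when silent guessing is not wanted.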

**NumPy's `datetime64`**:

- **`datetime64` Type**: Efficiently handles arrays of dates and times.
  - Example: Create a `datetime64` object:

In [10]:
import numpy as np
date = np.array('2002-06-24', dtype=np.datetime64)
date

array('2002-06-24', dtype='datetime64[D]')

  - **Vectorized Operations**: Perform operations on arrays of dates:

In [12]:
date + np.arange(28)

array(['2002-06-24', '2002-06-25', '2002-06-26', '2002-06-27',
       '2002-06-28', '2002-06-29', '2002-06-30', '2002-07-01',
       '2002-07-02', '2002-07-03', '2002-07-04', '2002-07-05',
       '2002-07-06', '2002-07-07', '2002-07-08', '2002-07-09',
       '2002-07-10', '2002-07-11', '2002-07-12', '2002-07-13',
       '2002-07-14', '2002-07-15', '2002-07-16', '2002-07-17',
       '2002-07-18', '2002-07-19', '2002-07-20', '2002-07-21'],
      dtype='datetime64[D]')
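
Addition is not the only vectorized operation. The sketch below, using a small made-up array of dates, shows differences and comparisons working element-wise as well:

```python
import numpy as np

dates = np.array(["2002-06-24", "2002-07-01", "2002-07-15"],
                 dtype="datetime64[D]")

# Differences between consecutive dates are timedelta64 values (in days here).
gaps = np.diff(dates)
print(gaps.astype(int))  # [ 7 14]

# Comparisons are vectorized too, so boolean masks are easy to build.
in_july = dates >= np.datetime64("2002-07-01")
print(dates[in_july])    # ['2002-07-01' '2002-07-15']
```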

**Precision and Range**:

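- **Time Resolution**: `datetime64` stores each value as a 64-bit integer, so finer precision means a shorter representable range. The unit can be anything from years down to attoseconds.
  - The table below lists each unit code with the span it covers; the cells after it create minute-, nanosecond-, and day-precision values: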

| Code   | Meaning     | Time span (relative) | Time span (absolute)   |
|--------|-------------|----------------------|------------------------|
| ``Y``  | Year        | ± 9.2e18 years       | [9.2e18 BC, 9.2e18 AD] |
| ``M``  | Month       | ± 7.6e17 years       | [7.6e17 BC, 7.6e17 AD] |
| ``W``  | Week        | ± 1.7e17 years       | [1.7e17 BC, 1.7e17 AD] |
| ``D``  | Day         | ± 2.5e16 years       | [2.5e16 BC, 2.5e16 AD] |
| ``h``  | Hour        | ± 1.0e15 years       | [1.0e15 BC, 1.0e15 AD] |
| ``m``  | Minute      | ± 1.7e13 years       | [1.7e13 BC, 1.7e13 AD] |
| ``s``  | Second      | ± 2.9e11 years       | [2.9e11 BC, 2.9e11 AD] |
| ``ms`` | Millisecond | ± 2.9e8 years        | [2.9e8 BC, 2.9e8 AD]   |
| ``us`` | Microsecond | ± 2.9e5 years        | [290301 BC, 294241 AD] |
| ``ns`` | Nanosecond  | ± 292 years          | [1678 AD, 2262 AD]     |
| ``ps`` | Picosecond  | ± 106 days           | [1969 AD, 1970 AD]     |
| ``fs`` | Femtosecond | ± 2.6 hours          | [1969 AD, 1970 AD]     |
| ``as`` | Attosecond  | ± 9.2 seconds        | [1969 AD, 1970 AD]     |

In [15]:
np.datetime64('2015-07-04 12:00')

numpy.datetime64('2015-07-04T12:00')

In [16]:
np.datetime64('2021-07-04 12:59:59.50', 'ns')

numpy.datetime64('2021-07-04T12:59:59.500000000')

In [17]:
np.datetime64('2002-06-24 08:16','D')

numpy.datetime64('2002-06-24')

#### **Available Time Units**:
- **Time Units**: Different codes represent different levels of precision (e.g., 'Y' for year, 'D' for day).
  - Example: `'ns'` for nanoseconds can encode dates from 1678 AD to 2262 AD.
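
The spans in the table can be checked with a little arithmetic. This sketch assumes a year of 365.25 days and uses the fact that `datetime64` counts units in a signed 64-bit integer:

```python
import numpy as np

# Largest count of units a signed 64-bit integer can hold.
max_ticks = np.iinfo(np.int64).max

seconds_per_year = 365.25 * 24 * 3600

# Nanosecond precision: roughly 292 years either side of the 1970 epoch.
span_years_ns = max_ticks / 1e9 / seconds_per_year
print(round(span_years_ns))    # 292

# Second precision: roughly 2.9e11 years either side.
span_years_s = max_ticks / seconds_per_year
print(f"{span_years_s:.1e}")   # 2.9e+11
```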

**NOTES**:
- Use Python’s `datetime` and `dateutil` for basic date and time handling.
- For handling large arrays of dates efficiently, use NumPy’s `datetime64`.

### **Dates and Times in Pandas**

**Pandas Enhancements**:
- **`Timestamp`**: Combines the ease of Python's `datetime` with the efficiency of NumPy's `datetime64`.
- **`DatetimeIndex`**: A special type of index in Pandas that handles time-series data efficiently.

**Example**:
- Create a `Timestamp` from a string:

In [20]:
import pandas as pd
date = pd.to_datetime('2002-06-21')
date

Timestamp('2002-06-21 00:00:00')

Here’s a list of common format codes for `strftime`, used to format dates and times in Python:

- **`%Y`**: Year with century (e.g., 2024)
- **`%y`**: Year without century (e.g., 24 for 2024)
- **`%m`**: Month as a zero-padded decimal number (01 to 12)
- **`%d`**: Day of the month as a zero-padded decimal number (01 to 31)
- **`%H`**: Hour (24-hour clock) as a zero-padded decimal number (00 to 23)
- **`%I`**: Hour (12-hour clock) as a zero-padded decimal number (01 to 12)
- **`%M`**: Minute as a zero-padded decimal number (00 to 59)
- **`%S`**: Second as a zero-padded decimal number (00 to 59)
- **`%p`**: AM or PM
- **`%a`**: Abbreviated weekday name (e.g., Mon, Tue)
- **`%A`**: Full weekday name (e.g., Monday, Tuesday)
- **`%b`**: Abbreviated month name (e.g., Jan, Feb)
- **`%B`**: Full month name (e.g., January, February)
- **`%c`**: Locale’s appropriate date and time representation
- **`%x`**: Locale’s appropriate date representation
- **`%X`**: Locale’s appropriate time representation
- **`%z`**: UTC offset in the form ±HHMM (e.g., +0000)
- **`%Z`**: Time zone name (e.g., UTC, EST)



In [22]:
import pandas as pd

s = pd.Series(pd.to_datetime(['2002-06-24','2002-10-29','2019-12-21', '2024-08-09']))

days_of_week = s.dt.strftime('%A')
print(days_of_week)

0      Monday
1     Tuesday
2    Saturday
3      Friday
dtype: object
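
Formatting to strings is not always necessary. As a sketch of an alternative, the `.dt` accessor also exposes date components directly; the two dates below are taken from the series above:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2002-06-24", "2019-12-21"]))

print(s.dt.year.tolist())        # [2002, 2019]
print(s.dt.dayofweek.tolist())   # [0, 5]  (Monday is 0)
print(s.dt.day_name().tolist())  # ['Monday', 'Saturday']
```

Numeric components such as `dt.year` keep an integer dtype, which is often more convenient for grouping or filtering than the `object` strings returned by `strftime`.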


- Perform NumPy-style vectorized operations directly on the `Timestamp`:

In [24]:
import numpy as np
import pandas as pd
date = pd.to_datetime('2002-06-21')
date + pd.to_timedelta(np.arange(10),'D')

DatetimeIndex(['2002-06-21', '2002-06-22', '2002-06-23', '2002-06-24',
               '2002-06-25', '2002-06-26', '2002-06-27', '2002-06-28',
               '2002-06-29', '2002-06-30'],
              dtype='datetime64[ns]', freq=None)

### **Indexing with Time in Pandas**

- **Creating a Time-Indexed Series**:

In [26]:
index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([1, 2, 3, 4], index=index)
data

2014-07-04    1
2014-08-04    2
2015-07-04    3
2015-08-04    4
dtype: int64

- **Indexing by Date**:

In [28]:
data['2014-08-04':'2015-08-04']

2014-08-04    2
2015-07-04    3
2015-08-04    4
dtype: int64

In [29]:
data['2014']

2014-07-04    1
2014-08-04    2
dtype: int64
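
The same selections can be written with `.loc`, which makes it explicit that the lookup is by label. A small sketch that rebuilds the series above and selects by month:

```python
import pandas as pd

index = pd.DatetimeIndex(["2014-07-04", "2014-08-04",
                          "2015-07-04", "2015-08-04"])
data = pd.Series([1, 2, 3, 4], index=index)

# Partial-string selection: every entry in July 2015.
print(data.loc["2015-07"].tolist())            # [3]

# Slices on a DatetimeIndex include both endpoints.
print(data.loc["2014-08":"2015-07"].tolist())  # [2, 3]
```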


### **Pandas Time Series Data Structures**

1. **`Timestamp` and `DatetimeIndex`**:
   - **`Timestamp`**: For single time points.
   - **`DatetimeIndex`**: For indexing time series data.
     

2. **`Period` and `PeriodIndex`**:
   - **`Period`**: Represents fixed time intervals (e.g., months, years).
   - **`PeriodIndex`**: Index for time periods.
  

3. **`Timedelta` and `TimedeltaIndex`**:
   - **`Timedelta`**: Represents differences between dates (e.g., durations).
   - **`TimedeltaIndex`**: Index for time differences.


The most fundamental of these date/time objects are the ``Timestamp`` and ``DatetimeIndex`` objects.
While these class objects can be invoked directly, it is more common to use the ``pd.to_datetime()`` function, which can parse a wide variety of formats.
Passing a single date to ``pd.to_datetime()`` yields a ``Timestamp``; passing a series of dates by default yields a ``DatetimeIndex``:

In [35]:
dates = pd.to_datetime([datetime(2021, 7, 3), '4th of July, 2021', '2021-Jul-6', '07-07-2021', '20210708'])
dates

DatetimeIndex(['2021-07-03', '2021-07-04', '2021-07-06', '2021-07-07',
               '2021-07-08'],
              dtype='datetime64[ns]', freq=None)

- **Convert `DatetimeIndex` to `PeriodIndex`**:

In [46]:
dates.to_period("D")

PeriodIndex(['2021-07-03', '2021-07-04', '2021-07-06', '2021-07-07',
             '2021-07-08'],
            dtype='period[D]')

- **Calculate time differences**:

A ``TimedeltaIndex`` is created, for example, when a date is subtracted from another:

In [49]:
dates - dates[0]

TimedeltaIndex(['0 days', '1 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq=None)
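
To get plain numbers out of a `TimedeltaIndex`, its `days` attribute and `total_seconds()` method are convenient. A minimal sketch with a shortened, made-up version of the dates above:

```python
import pandas as pd

dates = pd.to_datetime(["2021-07-03", "2021-07-04", "2021-07-06"])
elapsed = dates - dates[0]

print(elapsed.days.tolist())             # [0, 1, 3]
print(elapsed.total_seconds().tolist())  # [0.0, 86400.0, 259200.0]
```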

#### **``pd.date_range()``**

To make the creation of regular date sequences more convenient, Pandas offers a few functions:
- ``pd.date_range()`` for timestamps
- ``pd.period_range()`` for periods
- ``pd.timedelta_range()`` for time deltas

We've seen that Python's ``range()`` and NumPy's ``np.arange()`` turn a start point, end point, and optional step size into a sequence.
Similarly, ``pd.date_range()`` accepts a start date, an end date, and an optional frequency code to create a regular sequence of dates.
By default, the frequency is one day:

In [61]:
pd.date_range('2002-06-24','2002-10-29')

DatetimeIndex(['2002-06-24', '2002-06-25', '2002-06-26', '2002-06-27',
               '2002-06-28', '2002-06-29', '2002-06-30', '2002-07-01',
               '2002-07-02', '2002-07-03',
               ...
               '2002-10-20', '2002-10-21', '2002-10-22', '2002-10-23',
               '2002-10-24', '2002-10-25', '2002-10-26', '2002-10-27',
               '2002-10-28', '2002-10-29'],
              dtype='datetime64[ns]', length=128, freq='D')

In [71]:
pd.date_range('2019-12-21',periods=12)

DatetimeIndex(['2019-12-21', '2019-12-22', '2019-12-23', '2019-12-24',
               '2019-12-25', '2019-12-26', '2019-12-27', '2019-12-28',
               '2019-12-29', '2019-12-30', '2019-12-31', '2020-01-01'],
              dtype='datetime64[ns]', freq='D')

In [75]:
pd.date_range('2019-12-21', periods=12, freq='h')

DatetimeIndex(['2019-12-21 00:00:00', '2019-12-21 01:00:00',
               '2019-12-21 02:00:00', '2019-12-21 03:00:00',
               '2019-12-21 04:00:00', '2019-12-21 05:00:00',
               '2019-12-21 06:00:00', '2019-12-21 07:00:00',
               '2019-12-21 08:00:00', '2019-12-21 09:00:00',
               '2019-12-21 10:00:00', '2019-12-21 11:00:00'],
              dtype='datetime64[ns]', freq='h')

To create regular sequences of ``Period`` or ``Timedelta`` values, the very similar ``pd.period_range()`` and ``pd.timedelta_range()`` functions are useful.
Here are some monthly periods:

In [119]:
pd.period_range('2019-12-21',periods=12,freq = 'M')

PeriodIndex(['2019-12', '2020-01', '2020-02', '2020-03', '2020-04', '2020-05',
             '2020-06', '2020-07', '2020-08', '2020-09', '2020-10', '2020-11'],
            dtype='period[M]')

And a sequence of durations increasing by an hour:

In [93]:
pd.timedelta_range(0, periods = 10, freq = 'h')

TimedeltaIndex(['0 days 00:00:00', '0 days 01:00:00', '0 days 02:00:00',
                '0 days 03:00:00', '0 days 04:00:00', '0 days 05:00:00',
                '0 days 06:00:00', '0 days 07:00:00', '0 days 08:00:00',
                '0 days 09:00:00'],
               dtype='timedelta64[ns]', freq='h')

### **Frequencies and Offsets**

Fundamental to these Pandas time series tools is the concept of a frequency or date offset.
Just as we saw the ``D`` (day) and ``h`` (hour) codes above, we can use such codes to specify any desired frequency spacing.
The following table summarizes the main codes. It lists the long-standing aliases; pandas 2.2 and later deprecate several of them in favor of new spellings (``h`` for ``H``, ``min`` for ``T``, ``s`` for ``S``, ``ms`` for ``L``, ``us`` for ``U``, ``ns`` for ``N``, and ``ME``, ``QE``, ``YE``, and ``YS`` for ``M``, ``Q``, ``A``, and ``AS``), so check the aliases against your installed version:

| Code   | Description         | Code    | Description            |
|--------|---------------------|---------|------------------------|
| ``D``  | Calendar day        | ``B``   | Business day           |
| ``MS`` | Month start         | ``BMS`` | Business month start   |
| ``QS`` | Quarter start       | ``BQS`` | Business quarter start |
| ``AS`` | Year start          | ``BAS`` | Business year start    |
| ``M``  | Month end           | ``BM``  | Business month end     |
| ``Q``  | Quarter end         | ``BQ``  | Business quarter end   |
| ``A``  | Year end            | ``BA``  | Business year end      |
| ``H``  | Hours               | ``BH``  | Business hours         |
| ``T``  | Minutes             |         |                        |
| ``S``  | Seconds             |         |                        |
| ``L``  | Milliseconds        |         |                        |
| ``U``  | Microseconds        |         |                        |
| ``N``  | Nanoseconds         |         |                        |
| ``W``  | Weekly              |         |                        |

Frequencies and offsets are essential components of Pandas' time series functionality, allowing you to work with dates and times at different granularities.

**Frequencies**

Frequencies define the regular spacing of a time series. They are written as short string aliases such as ``'D'`` or ``'h'``.

**Offsets**

Offsets represent calendar-aware steps in time. They are instances of the `pandas.tseries.offsets.DateOffset` class and its subclasses. Some common offsets include:

- `pd.DateOffset(days=1)`: one-day offset
- `pd.DateOffset(months=2)`: two-month offset
- `pd.tseries.offsets.BDay()`: business-day offset
- `pd.tseries.offsets.MonthEnd()`: month-end offset
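
A short sketch of what these offsets do when added to a `Timestamp`. The dates are chosen only for illustration; 2 July 2021 is a Friday:

```python
import pandas as pd
from pandas.tseries.offsets import BDay, MonthEnd

friday = pd.Timestamp("2021-07-02")

print(friday + pd.DateOffset(days=1))  # 2021-07-03 00:00:00
print(friday + BDay())                 # 2021-07-05 00:00:00 (skips the weekend)
print(friday + MonthEnd())             # 2021-07-31 00:00:00

# Month offsets follow the calendar rather than a fixed number of days.
print(pd.Timestamp("2021-01-31") + pd.DateOffset(months=1))  # 2021-02-28 00:00:00
```

The last line shows why offsets differ from a `Timedelta`: adding one month to 31 January lands on the last valid day of February instead of moving a fixed number of days.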

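Frequency codes can be combined with numbers to specify other spacings. For example, a frequency of 2 hours 30 minutes combines the hour and minute codes:

In [115]: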
pd.timedelta_range(0, periods=9, freq="2h30min")

TimedeltaIndex(['0 days 00:00:00', '0 days 02:30:00', '0 days 05:00:00',
                '0 days 07:30:00', '0 days 10:00:00', '0 days 12:30:00',
                '0 days 15:00:00', '0 days 17:30:00', '0 days 20:00:00'],
               dtype='timedelta64[ns]', freq='150min')
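
Frequency codes work the same way in `pd.date_range()`. As a sketch, the business-day code `B` and a numeric multiplier on `D` give the following sequences (start date chosen for illustration; 1 July 2021 is a Thursday):

```python
import pandas as pd

# Five consecutive business days: the weekend of 3-4 July is skipped.
bdays = pd.date_range("2021-07-01", periods=5, freq="B")
print(bdays.day.tolist())        # [1, 2, 5, 6, 7]

# A multiplier in front of the code changes the step: every second day.
every_other = pd.date_range("2021-07-01", periods=3, freq="2D")
print(every_other.day.tolist())  # [1, 3, 5]
```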