# Timedeltas and Date Ranges

## Timedeltas

- Timedeltas represent a duration: an increase or decrease in time. They are used primarily for calculating datetimes relative to another datetime.
- pd.to_timedelta takes two arguments:
    - The first arg is the number of time steps
    - The second arg is the unit. Possible values:
        - W
        - D, days, day
        - hours, hour, hr, h
        - m, minutes, minute, min, T
        - S, seconds, sec, second
        - ms, milliseconds, millisecond, milli, millis, L
        - us, microseconds, microsecond, micro, micros, U
        - ns, nanoseconds, nano, nanos, nanosecond, N
    - Note: the single-letter aliases T, S, L, U, and N are deprecated as of pandas 2.2; prefer min, s, ms, us, and ns.
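A minimal sketch of how the count and unit combine, using a few of the unit strings listed above. The variable names and the example Timestamp are illustrative only and are not part of the weather dataset.

```python
import pandas as pd

# Each call builds a pd.Timedelta from a count and a unit string.
one_week = pd.to_timedelta(1, unit="W")
three_days = pd.to_timedelta(3, unit="D")
ninety_min = pd.to_timedelta(90, unit="min")

print(one_week)    # 7 days 00:00:00
print(three_days)  # 3 days 00:00:00
print(ninety_min)  # 0 days 01:30:00

# Adding or subtracting a Timedelta shifts a Timestamp.
start = pd.Timestamp("2017-01-01")  # illustrative date
print(start - three_days)  # 2016-12-29 00:00:00
print(start + ninety_min)  # 2017-01-01 01:30:00
```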

In [3]:
# Return to Delhi weather dataset and convert 'date' feature to datetime

import pandas as pd
url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQcpVvVioO23cndDwr1UmKhndrSq6ES6ZUKZ4fkBBqIAavd1_coVPO_yeOye-Ub-cAWlkX3psJvOU8o/pub?output=csv"
df = pd.read_csv(url)
df['datetime'] = pd.to_datetime(df['date'])

In [4]:
# Set datetime column as index
df = df.set_index('datetime')

In [5]:
# Set time delta of 3 days
delta_3d = pd.to_timedelta(3, 'D')
delta_3d

Timedelta('3 days 00:00:00')

- Example question: What was the average wind speed from 3 days before to 3 days after the most humid day in the data?

In [6]:
# Find the date (index label) of the highest humidity reading
max_date = df['humidity'].idxmax()
# Calculate 3 days before
pre_max = max_date - delta_3d
pre_max

Timestamp('2016-12-29 00:00:00')

In [7]:
# Calculate 3 days after
post_max = max_date + delta_3d
post_max

Timestamp('2017-01-04 00:00:00')

In [8]:
# Using the pre and post dates, find the average for the range
mean_windspeed = df.loc[pre_max:post_max, 'wind_speed'].mean()
mean_windspeed

4.89791666675

In [11]:
# Inspect the individual values that went into the mean
mean_list = df.loc[pre_max:post_max, 'wind_speed']
mean_list

datetime
2016-12-29    6.000000
2016-12-30    6.266667
2016-12-31    7.325000
2017-01-01    0.000000
Name: wind_speed, dtype: float64

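- This list stops at 01-01 because that is where our data runs out: a .loc slice returns only the dates that exist within the range. Only 4 of the 7 requested days contribute to the mean. This will be important below.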
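Because a slice truncates silently, it can help to check how many days actually came back. The sketch below uses a small made-up Series (the values and the names wind, start, and end are hypothetical, not the Delhi data) to compare the number of rows returned against the number of days requested.

```python
import pandas as pd

# Hypothetical data: five daily readings ending on 2017-01-01.
idx = pd.date_range("2016-12-28", "2017-01-01", freq="D")
wind = pd.Series([3.0, 6.0, 6.5, 7.0, 0.0], index=idx, name="wind_speed")

start = pd.Timestamp("2016-12-29")
end = pd.Timestamp("2017-01-04")  # three days past the end of the data

# Label slicing on a sorted DatetimeIndex keeps whatever falls in the range.
window = wind.loc[start:end]

# The slice is inclusive on both ends, so add 1 to the day difference.
expected_days = (end - start).days + 1
print(len(window), "of", expected_days, "days present")  # 4 of 7 days present
print(window.index.max())  # 2017-01-01 00:00:00
```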

## Date Ranges

In [9]:
# Another option is to store the range of dates with pd.date_range
date_range = pd.date_range(pre_max, post_max)
date_range

DatetimeIndex(['2016-12-29', '2016-12-30', '2016-12-31', '2017-01-01',
               '2017-01-02', '2017-01-03', '2017-01-04'],
              dtype='datetime64[ns]', freq='D')

In [13]:
# Use pd.date_range to get mean
# df.loc[date_range, 'wind_speed'].mean()

- Uncommenting this line raises a KeyError, because the last three dates (through post_max) are not in the index of our data.
- When .loc receives an explicit list of labels, such as the DatetimeIndex from pd.date_range, it requires every label to exist in the index. A slice (pre_max:post_max) instead returns whichever dates are available within the range.
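A sketch of the failure and two possible workarounds, again on a small made-up Series rather than the Delhi data (the names wind, wanted, and available are hypothetical). Index.intersection keeps only the dates that exist, while reindex keeps all requested dates and fills the missing ones with NaN, which mean() skips by default.

```python
import pandas as pd

# Hypothetical data: five daily readings ending on 2017-01-01.
idx = pd.date_range("2016-12-28", "2017-01-01", freq="D")
wind = pd.Series([3.0, 6.0, 6.5, 7.0, 0.0], index=idx, name="wind_speed")

# Requested range extends three days past the available data.
wanted = pd.date_range("2016-12-29", "2017-01-04")

# Passing the full range to .loc fails because some labels are missing.
try:
    wind.loc[wanted]
except KeyError:
    print("KeyError: some requested dates are not in the index")

# Option 1: keep only the requested dates that exist in the index.
available = wind.index.intersection(wanted)
print(wind.loc[available].mean())  # 4.875

# Option 2: reindex to the full range; missing dates become NaN,
# and mean() ignores NaN by default.
print(wind.reindex(wanted).mean())  # 4.875
```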