* Timedeltas: calculating a change in time.
* `pd.date_range`: creating equally spaced dates.

# Pandas Timedeltas


In [1]:
import pandas as pd
url="https://docs.google.com/spreadsheets/d/e/2PACX-1vQcpVvVioO23cndDwr1UmKhndrSq6ES6ZUKZ4fkBBqIAavd1_coVPO_yeOye-Ub-cAWlkX3psJvOU8o/pub?output=csv"
df = pd.read_csv(url)
df['datetime'] = pd.to_datetime(df['date'])

In [2]:
# Set the datetime column as the index:
df = df.set_index("datetime")

Using `pd.to_timedelta`:
* The first argument is the number of time steps (e.g. 30).
* `unit`: the unit of time (e.g. "D").

Possible values for `unit`:
* ‘W’
* ‘D’ / ‘days’ / ‘day’
* ‘hours’ / ‘hour’ / ‘hr’ / ‘h’
* ‘m’ / ‘minute’ / ‘min’ / ‘minutes’ / ‘T’
* ‘S’ / ‘seconds’ / ‘sec’ / ‘second’
* ‘ms’ / ‘milliseconds’ / ‘millisecond’ / ‘milli’ / ‘millis’ / ‘L’
* ‘us’ / ‘microseconds’ / ‘microsecond’ / ‘micro’ / ‘micros’ / ‘U’
* ‘ns’ / ‘nanoseconds’ / ‘nano’ / ‘nanos’ / ‘nanosecond’ / ‘N’

Note: in recent pandas releases (2.2 and later) the single-letter aliases ‘T’, ‘S’, ‘L’, ‘U’, and ‘N’ are deprecated in favor of ‘min’, ‘s’, ‘ms’, ‘us’, and ‘ns’.
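As a quick illustration of the value-plus-unit pattern, here is a minimal sketch. The variable names are made up for this example and are not part of the lesson's notebook:

```python
import pandas as pd

# The same duration expressed with two different units.
three_days = pd.to_timedelta(3, unit="D")
seventy_two_hours = pd.to_timedelta(72, unit="h")
print(three_days == seventy_two_hours)

# Passing a list of numbers returns a TimedeltaIndex, one entry per value.
week_steps = pd.to_timedelta([1, 2, 3], unit="W")
print(week_steps)
```

Because both calls describe the same length of time, the two `Timedelta` objects compare equal regardless of the unit used to build them.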

In [6]:
# Make a 3-day time delta
delta_3d = pd.to_timedelta(3, unit="D")
delta_3d


Timedelta('3 days 00:00:00')

For the most humid day in the weather data, what was the average wind speed over the 3 days prior and the 3 days after the most humid day?


In [9]:
# Find the date with the highest humidity
max_date = df["humidity"].idxmax()
print(max_date)
# Calculate the date 3 days BEFORE
pre_max = max_date - delta_3d
pre_max

2017-01-01 00:00:00


Timestamp('2016-12-29 00:00:00')

In [10]:
# Calculate the date 3 days AFTER
post_max = max_date + delta_3d
post_max

Timestamp('2017-01-04 00:00:00')

Now that we have our pre and post dates, we can obtain the average wind speed by slicing the rows between those dates with `.loc` and calculating the mean:

In [11]:
mean_windspeed = df.loc[pre_max:post_max, "wind_speed"].mean()
mean_windspeed

4.89791666675
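To see the whole workflow in isolation, here is a small sketch on made-up data. The `toy` DataFrame and its values are hypothetical stand-ins for the weather data. It also shows that a label slice with `.loc` includes both endpoints:

```python
import pandas as pd

# Hypothetical daily data standing in for the weather DataFrame.
idx = pd.date_range("2020-03-01", periods=10, freq="D")
toy = pd.DataFrame(
    {
        "humidity": [50, 55, 60, 65, 99, 70, 60, 55, 50, 45],
        "wind_speed": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    },
    index=idx,
)

# Date of the highest humidity, then a window of 3 days on either side.
peak = toy["humidity"].idxmax()
window = pd.to_timedelta(3, unit="D")
subset = toy.loc[peak - window : peak + window, "wind_speed"]

print(peak)
print(len(subset))    # 3 days before + the peak day + 3 days after
print(subset.mean())
```

In this toy data every day in the window exists, so the slice returns seven rows. If the data stops before the end of the window, the slice simply returns fewer rows and the mean covers only the days that are present.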

# pd.date_range

Another option is to store our range of dates as a variable using `pd.date_range`, which returns a `DatetimeIndex` of equally spaced dates (daily by default).

In [13]:
## making a date range to cover the pre-max to post-max window
date_range = pd.date_range(pre_max, post_max)
date_range

DatetimeIndex(['2016-12-29', '2016-12-30', '2016-12-31', '2017-01-01',
               '2017-01-02', '2017-01-03', '2017-01-04'],
              dtype='datetime64[ns]', freq='D')
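Besides a start and an end, `pd.date_range` also accepts `periods` and `freq` to control how many dates are produced and how far apart they are. A minimal sketch, with dates chosen to mirror the window above and variable names that are only illustrative:

```python
import pandas as pd

start = pd.Timestamp("2016-12-29")

# Start and end: daily steps by default.
by_end = pd.date_range(start, "2017-01-04")

# Start and a number of periods instead of an end date.
by_periods = pd.date_range(start, periods=7, freq="D")

# A different spacing: every second day.
every_other = pd.date_range(start, "2017-01-04", freq="2D")

print(by_end.equals(by_periods))
print(list(every_other.strftime("%Y-%m-%d")))
```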

In [None]:
# This raises a KeyError: some dates in date_range are missing from df's index
df.loc[date_range,'wind_speed'].mean()

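Slicing with `.loc[pre_max:post_max]` did not raise an error because a slice only returns the rows whose dates fall between the two endpoints; it does not require every date in that span to exist. Passing `date_range` to `.loc` is different: it is treated as a list of exact labels, and each one must be present in the index. Our data ends on 2017-01-01, so the dates after it cannot be found. As the slice below shows, only four rows fall inside the window, which means the mean above covers the 3 days before and the most humid day itself, but none of the days after.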



In [14]:
df.loc[pre_max:post_max, "wind_speed"]

datetime
2016-12-29    6.000000
2016-12-30    6.266667
2016-12-31    7.325000
2017-01-01    0.000000
Name: wind_speed, dtype: float64
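If you do want to work from a stored date range, two possible workarounds are to keep only the dates that actually exist in the index, or to reindex so that missing dates become `NaN`. The sketch below uses a small hypothetical `toy` DataFrame (not the lesson's weather data) that, like ours, ends on 2017-01-01; it assumes pandas 1.0 or later, where missing labels in `.loc` raise a `KeyError`:

```python
import pandas as pd

# Hypothetical data that stops on 2017-01-01.
idx = pd.date_range("2016-12-28", "2017-01-01", freq="D")
toy = pd.DataFrame({"wind_speed": [3.0, 6.0, 6.0, 9.0, 0.0]}, index=idx)

wanted = pd.date_range("2016-12-29", "2017-01-04")

# Exact-label lookup fails because some wanted dates are missing.
try:
    toy.loc[wanted, "wind_speed"]
    raised = False
except KeyError:
    raised = True
print(raised)

# Option 1: keep only the wanted dates that exist in the index.
present = wanted.intersection(toy.index)
mean_present = toy.loc[present, "wind_speed"].mean()

# Option 2: reindex, so missing dates appear as NaN (mean skips NaN by default).
reindexed = toy["wind_speed"].reindex(wanted)
mean_reindexed = reindexed.mean()

print(mean_present, mean_reindexed)
```

Both options give the same mean here; reindexing has the advantage of making the missing days visible as `NaN` rows.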