# Introduction to Time - Practical

The following notebook will take you through the basics of handling time with Python using the `datetime`, `numpy` and `pandas` packages.

The notebook is split into three parts:

- Part 1: Datetime
- Part 2: Numpy
- Part 3: Pandas

Please take care to run cells in the correct order and create new objects (or "variables" if you prefer) as explained in the questions. Cells that require your input should contain the comment `# Your code here...`. Other cells give `# Examples` or are used for `# Checks`. You do not need to change these.

In [None]:
from datetime import datetime, timedelta, timezone
from dateutil import tz
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random

## Part 1: Introduction to Datetime

### Part 1a: Creating and Inspecting Datetime Objects

You are expecting to receive a package. The delivery company assures you that it will arrive at the datetime `delivery`. Note that this is a naive datetime: 'naive' means it carries no time zone information, so you don't need to worry about time zones here:

In [None]:
delivery = datetime(2035, 10, 19, 12, 30)

Use `delivery.weekday()` to get the weekday index and then `print()` a string of the weekday name.

`weekday()` returns 0 for Monday up to 6 for Sunday. You can use the given list `weekday_names` to help.
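As a quick illustration with a different date from the exercise, the index returned by `weekday()` can be used directly to look up a name in a list like `weekday_names`. `strftime("%A")` is an alternative that returns the name itself, although its output depends on the current locale (English under the default C locale):

```python
from datetime import datetime

weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# 1st January 2024 fell on a Monday, so weekday() returns 0
new_year = datetime(2024, 1, 1)
index = new_year.weekday()
print(index, weekday_names[index])  # 0 Monday

# isoweekday() counts from 1 (Monday) to 7 (Sunday) instead
print(new_year.isoweekday())  # 1

# strftime("%A") gives the full weekday name directly (locale dependent)
print(new_year.strftime("%A"))
```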

In [None]:
weekday_names = [
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday",
]
# Your code here...


The delivery arrives that day, but later than planned. Create a new datetime object called `actual_delivery` holding a plausible, later time for the actual delivery on the same day.

In [None]:
# Your code here...


Show that the **day** of `delivery` and `actual_delivery` are the same using the equality operator (`==`). Show that the `actual_delivery` **hour** is greater using the greater than operator (`>`).

Hint: You can access datetime attributes directly, e.g. `delivery.year`.
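Comparing a single attribute compares only that field. The sketch below, with made-up dates, shows that `.day` matches for the 5th of two different months, whereas `.date()` (which bundles year, month and day into a `date` object) only matches on the same calendar day:

```python
from datetime import datetime

# Two hypothetical datetimes, one month apart
first = datetime(2024, 1, 5, 9, 0)
second = datetime(2024, 2, 5, 17, 30)

print(first.day == second.day)        # True: both fall on the 5th of a month
print(first.date() == second.date())  # False: the calendar dates differ
print(second.hour > first.hour)       # True: 17 > 9
```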

In [None]:
# Your code here...


### Part 1b: Arithmetic and Time Differences

You ask for some compensation from the delivery company. They tell you that, in the small print, the original slot was 2 hours and 15 minutes long. Declare a new object called `latest_delivery` that represents the end of this slot.

Do not declare a new `datetime` object directly. Instead add a new `timedelta` object (representing 2 hours and 15 minutes) to the existing `delivery` datetime using the `+` operator.
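Here is the same `datetime + timedelta` pattern with made-up values. `timedelta` combines its keyword arguments into a single duration, and subtracting two datetimes gives a `timedelta` back:

```python
from datetime import datetime, timedelta

start = datetime(2024, 3, 5, 9, 0)
slot_length = timedelta(hours=1, minutes=30)  # keyword arguments are combined
end = start + slot_length
print(end)  # 2024-03-05 10:30:00

# Subtracting two datetimes gives a timedelta
print(end - start == slot_length)  # True
```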

In [None]:
# Your code here...


In [None]:
# Check:
if not isinstance(latest_delivery, datetime):
    print("latest_delivery is not a datetime object!")
if not latest_delivery > delivery:
    print("latest_delivery is not later than delivery!")
else:
    print("looks good")

Now check whether your `actual_delivery` was before or after `latest_delivery`.

In [None]:
# Your code here...


You decide to return your package.

You contact the delivery company to arrange a pick-up time slot. They tell you that the first available slot starts at `datetime(2035, 10, 22, 7, 30)` and that there are 6 slots, each starting 90 minutes after the previous one.

Create a list of the 6 start times.

Hint 1: You should use a single datetime object (already given above) and a single timedelta object.

Hint 2: To keep your answer succinct, you could try using a list comprehension, e.g. `[initial + (i * delta) for i in range(n)]`.
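To show how the comprehension in Hint 2 behaves, here is a sketch with made-up values (4 slots, 45 minutes apart). Multiplying a `timedelta` by an integer scales it, so slot `i` starts `i * delta` after `initial`:

```python
from datetime import datetime, timedelta

initial = datetime(2024, 3, 1, 9, 0)
delta = timedelta(minutes=45)
n = 4

starts = [initial + (i * delta) for i in range(n)]
for s in starts:
    print(s.strftime("%H:%M"))  # 09:00, 09:45, 10:30, 11:15 on separate lines
```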

In [None]:
# Your code here...


### Part 1c: Timezones

A friend is flying from Tokyo, Japan to visit. The flight departure time `departure` is given below in local Japanese time.

Create a new object called `departure_uk` with `departure` converted to London (UK) time using the `astimezone()` method.
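A minimal sketch of `astimezone()` with made-up values. To stay self-contained it uses a fixed-offset `timezone` from the standard library rather than `tz.gettz()`. That is adequate for Japan (UTC+9, no daylight saving time), but named zones such as `tz.gettz("Europe/London")` should be used for places that change their clocks. Note that `astimezone()` changes the wall-clock representation, not the instant in time:

```python
from datetime import datetime, timedelta, timezone

jst = timezone(timedelta(hours=9), "JST")  # fixed UTC+9 offset

meeting_tokyo = datetime(2024, 1, 15, 9, 0, tzinfo=jst)
meeting_utc = meeting_tokyo.astimezone(timezone.utc)

print(meeting_utc)                   # 2024-01-15 00:00:00+00:00
print(meeting_tokyo == meeting_utc)  # True: same instant, different wall clock
```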

In [None]:
departure = datetime(
    year=2030, month=7, day=11, hour=11, minute=38, tzinfo=tz.gettz("Asia/Tokyo")
)

# Your code here...


You agree to collect your friend from the airport in London.

- The flight duration is 14 hours and 47 minutes.

- You want to leave your place 90 minutes before the flight arrival in order to arrive on time.

Calculate `my_departure`, the time you will leave **your place** for the airport (in London local time). 

- departure time is `datetime(year=2030, month=7, day=11, hour=11, minute=38, tzinfo=tz.gettz("Asia/Tokyo"))`

- buffer time is 90 minutes

- journey time is 14 hours and 47 minutes
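Adding a `timedelta` to an aware datetime in Python shifts the wall-clock time and keeps the original `tzinfo`, which can be off by an hour if the interval crosses a daylight saving change in that zone. Converting to UTC first, doing the arithmetic there and then converting to the target zone avoids this. A sketch with made-up times, where fixed offsets (UTC+9 and UTC+1) stand in for named zones:

```python
from datetime import datetime, timedelta, timezone

origin_tz = timezone(timedelta(hours=9))       # hypothetical origin zone, UTC+9
destination_tz = timezone(timedelta(hours=1))  # hypothetical destination zone, UTC+1

takeoff = datetime(2024, 6, 1, 10, 0, tzinfo=origin_tz)
flight_time = timedelta(hours=12, minutes=30)
buffer = timedelta(minutes=60)

# Do the arithmetic in UTC, then express the result in the destination zone
landing_utc = takeoff.astimezone(timezone.utc) + flight_time
leave_home = (landing_utc - buffer).astimezone(destination_tz)
print(leave_home)  # 2024-06-01 13:30:00+01:00
```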

In [None]:
# Your code here...


### Part 1d: Parsing and Formatting

It is very common to need to parse time formatted as a string into datetimes. Unfortunately, in practice, formats can vary a lot.

Parse the following strings into datetimes using `datetime.strptime()`. The first example is completed for you.

Hint: use the [docs](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) to lookup new formatting codes.
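Two more made-up examples of format codes. Every literal character in the string (such as `/` or `-`) must appear in the same place in the format string:

```python
from datetime import datetime

# %Y four-digit year, %m month number, %d day, %H 24-hour clock, %M minutes, %S seconds
stamp = datetime.strptime("2024/03/05 07-08-09", "%Y/%m/%d %H-%M-%S")
print(stamp)  # 2024-03-05 07:08:09

# %b is an abbreviated month name and %y a two-digit year
short = datetime.strptime("5 Mar 24", "%d %b %y")
print(short)  # 2024-03-05 00:00:00
```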

In [None]:
# Example:
a_str = "September 04 2008 3:36PM"
a = datetime.strptime(a_str, "%B %d %Y %I:%M%p")

# Check:
assert a == datetime(year=2008, month=9, day=4, hour=15, minute=36)

In [None]:
b_str = "31-01-2022 23.59.59"
# Your code here...


In [None]:
# Check:
assert b == datetime(year=2022, month=1, day=31, hour=23, minute=59, second=59)

In [None]:
c_str = "12:30AM, 01 of March, 2022"
# Your code here...


In [None]:
# Check:
assert c == datetime(year=2022, month=3, day=1, hour=0, minute=30)

You can return the favour by creating your own bespoke formatted strings using `.strftime()`:

`datetime.now().strftime("%H:%M %a %b-%y")  # Output: '11:45 Fri Dec-23'`

But it is usually better practice (and kinder to whoever reads your output) to let datetime use its default string formatting, which is equivalent to `isoformat(" ")`:

`str(datetime.now())  # Output: '2023-12-08 11:47:32.480037'`
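A short sketch with a made-up time. ISO 8601 strings such as those from `isoformat()` are unambiguous and can be parsed back with `datetime.fromisoformat()` (Python 3.7+) without writing a format string, which is one reason to prefer them over bespoke formats:

```python
from datetime import datetime

moment = datetime(2024, 3, 5, 14, 7)

print(moment.strftime("%H:%M %a %d %b %Y"))  # bespoke, e.g. '14:07 Tue 05 Mar 2024'
print(str(moment))         # 2024-03-05 14:07:00
print(moment.isoformat())  # 2024-03-05T14:07:00

# ISO 8601 strings parse back without a hand-written format string
assert datetime.fromisoformat(moment.isoformat()) == moment
```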



### Example Task

You have spoken to your neighbours about parcels being late.

You go out periodically and find out how many parcels are late. You want to plot this time series data on a figure. The y axis will represent late parcels, the x axis will represent time. But at the moment it doesn't look good as there are no labels on the x axis:

In [None]:
# fake data
xs = list(range(8))
ys = [abs(int(5 * random.random())) for _ in xs]
# create figure
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(xs, ys)
ax.set_xticks(xs)
_ = ax.set_xticklabels([])

You need to create a list of time labels for the x axis called `x_labels`.

**Specification:**

- a list of length 8
- starting at midnight on 1st July 2020 (inclusive)
- with intervals/steps 9 hours long
- formatted as "HH:MM Day" with an abbreviated weekday name, e.g. `10:30 Mon`

Hint: string formatting [cheatsheet](https://strftime.org/)

Remember to call your list `x_labels`.

In [None]:
# Your code here...


In [None]:
# Some checks
assert len(x_labels) == 8
assert x_labels[-1] == "15:00 Fri"

# Testing the new figure
fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(xs, ys)
ax.set_xticks(xs)
_ = ax.set_xticklabels(x_labels)

## Part 2: Introduction to Numpy 

### Part 2a: Creating Numpy objects

You decide to start your own delivery business.

You are going to set up a lot of delivery slots. The first one will start at 7:15 am on 1st July 2030 (`2030-07-01 07:15:00`).

Create this first delivery time, called `first_delivery` as a `np.datetime64` object from the above time string.

In [None]:
# Your code here...


In [None]:
# Check
assert first_delivery == np.datetime64(datetime(2030, 7, 1, 7, 15))
# (See that we can also create a datetime64 from a datetime object)

`np.datetime64` accepts up to two arguments:

- The first is a `datetime` object or a string, as above.
- The second (optional) is the time unit code, which specifies the precision of the object.

Suppose we had instead created `first_delivery` using "D" (day) as the time unit code:

`first_delivery = np.datetime64("2030-07-01 07:15:00", "D")`

Why is this a problem? Examine what happens when using "D" for precision.
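One way to see what day precision does, without relying on how a string is parsed, is to cast a minute-precision value to `datetime64[D]` with `astype()`. In this sketch with a made-up time, the time of day is truncated and casting back to minutes does not recover it:

```python
import numpy as np

minute_precision = np.datetime64("2024-03-05T07:15", "m")
day_precision = minute_precision.astype("datetime64[D]")

print(day_precision)                          # 2024-03-05 (07:15 is discarded)
print(day_precision.astype("datetime64[m]"))  # 2024-03-05T00:00
print(np.datetime_data(day_precision.dtype))  # ('D', 1)
```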

In [None]:
# Your code here...


You can see the available codes [here](https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units). From now on we will use a precision of minutes:

`first_delivery = np.datetime64("2030-07-01 07:15:00", "m")`

We want to create an array of times for deliveries on this first day, we could use the same technique as for datetime, adding `timedelta64` to our first time:

`slots = np.array([np.datetime64("2030-07-01 07:15:00", "m") + (i * np.timedelta64(15, "m")) for i in range(10)])`

But this is clunky. Instead you can use `np.arange` to do this more efficiently.

Create an array called `slots`, starting from `first_delivery` and spanning 24 hours in 15-minute steps.

Hint: The docs for arange are [here](https://numpy.org/doc/stable/reference/generated/numpy.arange.html#numpy.arange).
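`np.arange` accepts a `datetime64` start and stop with a `timedelta64` step, and, as with integers, the stop value is excluded. A sketch with made-up values (one hour in 20-minute steps):

```python
import numpy as np

start = np.datetime64("2024-03-05T09:00", "m")
stop = start + np.timedelta64(1, "h")
step = np.timedelta64(20, "m")

times = np.arange(start, stop, step)  # the stop value is excluded
print(times)        # ['2024-03-05T09:00' '2024-03-05T09:20' '2024-03-05T09:40']
print(times.dtype)  # datetime64[m]
```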

In [None]:
# Your code here...


In [None]:
# Checks
if not len(slots) == 24 * 4:
    print(f"Expecting to see 24 * 4 = 96 slots, you have {len(slots)} slots")

duration = slots[-1] - slots[0]  # Note that we can do arithmetics as per datetime
if duration == (np.timedelta64(24, "h") - np.timedelta64(15, "m")):
    print("Ok - looks like an exclusive range (does not include the end)")
elif duration == np.timedelta64(24, "h"):
    print("Ok - looks like inclusive range (includes the end)")
else:
    print("Total duration looks incorrect")

### Part 2b: Numpy and Datetimes

Numpy doesn't support time zone operations. Creating a datetime64 from an aware datetime is currently supported but, at the time of writing, deprecated.

Show that this is the case by:
1. Creating an "aware" datetime
2. Converting it to datetime64
3. Inspecting the result
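If you do need to bring an aware time into numpy, one common workaround (assuming UTC is an acceptable reference for your data) is to convert it to UTC and drop the `tzinfo` yourself, so numpy only ever sees a naive value. This sketch sidesteps the deprecated path rather than demonstrating it:

```python
from datetime import datetime, timedelta, timezone

import numpy as np

aware = datetime(2024, 1, 15, 9, 0, tzinfo=timezone(timedelta(hours=9)))  # 09:00 at UTC+9

# Convert to UTC, then strip tzinfo so numpy receives a naive datetime
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)
as_numpy = np.datetime64(naive_utc, "m")
print(as_numpy)  # 2024-01-15T00:00
```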

In [None]:
# Your code here...


### Part 2c: Numpy Arithmetic and Time Units

Recall that Numpy uses a set of time unit codes for controlling precision of `np.datetime64` and `np.timedelta64` objects. We can use these codes to tell Numpy the unit of a time interval.

You want to establish three tiers of delivery slots:

1. `rough` - a 3 hour slot
2. `standard` - a 40 minute slot
3. `precise` - a 450 second slot

Create three `np.timedelta64` objects called `rough`, `standard` and `precise`.

Using timedelta64 arithmetic, show that 24 precise slots fit in one rough slot.

How many full standard slots would fit into a rough slot? (hint: use `//` not `/`)
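A sketch of the three operators with made-up durations. Numpy converts mixed units to a common unit first; `/` gives a float ratio, while `//` and `%` (supported between timedelta64 values in recent NumPy versions) count whole intervals and the remainder:

```python
import numpy as np

day = np.timedelta64(1, "D")
ninety_minutes = np.timedelta64(90, "m")
hundred_minutes = np.timedelta64(100, "m")

print(day / ninety_minutes)    # 16.0 (true division gives a float ratio)
print(day // hundred_minutes)  # 14 (floor division counts whole intervals)
print(day % hundred_minutes)   # 40 minutes left over
```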

In [None]:
# Your code here...


Time arithmetic with datetime and numpy (and also pandas) is pretty consistent. The main thing to remember with numpy is to be careful with precision. Numpy operations also work element-wise on arrays. In the example below we shift a range of datetimes 15 minutes later using a `timedelta64`:

In [None]:
# Example:
data = [datetime(2030, 1, 1, 7) + timedelta(hours=i) for i in range(12)]
hourly = np.array(data, dtype="datetime64[m]")

# Shift
shifted = hourly + np.timedelta64(15, "m")
shifted

For your new delivery company you decide to create more complex slot times. Maybe this will reduce complaints.

Rather than 15 minutes past every hour, you decide that the number of minutes past the hour should equal the hour itself. This should look like `07:07:00`, `08:08:00` and so on.

Add a new array to the existing `hourly` array to achieve this.

Call the new array `hours_plus`.

Hints:
- This will be an array_a + array_b type operation
- The arrays will need to be the same length (12)
- You can create a range of timedelta64 using `np.arange`, eg: `np.arange(0, 10, dtype="timedelta64[s]")`
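A sketch of the element-wise pattern the hints describe, using made-up values: a `datetime64` array plus a `timedelta64` array of the same length adds the offsets pairwise (here 0, 10 and 20 minutes):

```python
import numpy as np

base = np.array(
    ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-01T02:00"], dtype="datetime64[m]"
)

# One offset per element: 0, 10 and 20 minutes
offsets = np.arange(0, 3, dtype="timedelta64[m]") * 10

shifted = base + offsets  # element-wise datetime64 + timedelta64
print(shifted)  # ['2024-01-01T00:00' '2024-01-01T01:10' '2024-01-01T02:20']
```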

In [None]:
# Your code here...


In [None]:
# Check:
if not len(hours_plus) == 12:
    print(f"Expecting to see 12 slots, you have {len(hours_plus)} slots")
if not isinstance(hours_plus[0], np.datetime64):
    print("Expecting to see datetime64 objects")
if not hours_plus[-1] == np.datetime64("2030-01-01 18:18"):
    print(f"Expecting to see 18:18 as the last slot, you have {hours_plus[-1]}")
else:
    print("Looks good!")

## Part 3: Introduction to Pandas Datetime

The Pandas library builds on top of numpy and datetime to provide a rich portfolio of classes and functions for handling dates and times as part of Series and DataFrame objects.

The official user guide is available [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html) and documentation can be studied [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html).

### Part 3a: Creating Pandas Time Objects

Pandas supports initialising `Timestamp` objects from strings, Python objects or Numpy objects. A convenient way to create new `Timestamp` objects is the `pd.to_datetime` function:

In [None]:
# Examples:
a = pd.to_datetime(datetime(year=2017, month=11, day=6, hour=6, minute=44, second=13))
b = pd.to_datetime(np.datetime64("2017-11-06T06:44:13"))
c = pd.to_datetime(
    "6-11-17 6:44:13", dayfirst=True
)  # Sometimes we need to specify string formatting

assert a == b == c

`pd.to_datetime()` also works with Series, arrays, and ranges.
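A sketch with made-up strings showing how the return type follows the input type, and how an explicit `format` avoids any guessing about day and month order:

```python
import pandas as pd

strings = ["2024-03-05 07:15", "2024-03-06 08:30"]

as_index = pd.to_datetime(strings)              # list in -> DatetimeIndex out
as_series = pd.to_datetime(pd.Series(strings))  # Series in -> Series of datetimes out

# An explicit format avoids guessing (here day first, as in "05/03/2024")
explicit = pd.to_datetime(["05/03/2024 07:15"], format="%d/%m/%Y %H:%M")

print(type(as_index).__name__, as_series.dtype, explicit[0])
```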

The following uses `pd.to_datetime()` on the given array `sample_range`. This outputs a `DatetimeIndex` object:

In [None]:
# Example:
sample_range = np.arange("2030-07-01", "2031-07-01", dtype="datetime64[M]")
print(pd.to_datetime(sample_range))
# Output: DatetimeIndex(...)
# Note that you could also create a DatetimeIndex directly:
# idx = pd.DatetimeIndex(["1/1/2020 10:00:00+00:00", "2/1/2020 11:00:00+00:00"])

We can also use the `pd.date_range()` function to create a DatetimeIndex.

Use `pd.date_range()` to create an index called `month_starts` equivalent to `sample_range` defined above.

(1st day of the month from 2030-07-01 to 2031-07-01)
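A sketch with a different range: the `MS` (month start) alias produces the 1st of each month, both ends are included by default, and `inclusive="left"` (pandas 1.4+) drops the end point to match numpy's exclusive `arange`:

```python
import pandas as pd

# "MS" is the month-start alias; both ends are included by default
months = pd.date_range("2024-01-01", "2024-04-01", freq="MS")
print(len(months))  # 4 (January to April)

# inclusive="left" drops the end point
months_left = pd.date_range("2024-01-01", "2024-04-01", freq="MS", inclusive="left")
print(len(months_left))  # 3
```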

In [None]:
# Your code here...


In [None]:
# Check:
if not list(pd.to_datetime(sample_range))[:12] == list(month_starts)[:12]:
    print("range is not correct, check the docs for the 'inclusive' argument")
else:
    print("Good enough!")
    print("Note that date_range is 'inclusive' of the 'end' argument (arange is not)")
    print("You could fix this using 'pd.date_range(..., inclusive='left')'")

Pandas also supports the type `Period` to represent a fixed span of time and the type `PeriodIndex` for an array of these periods.

We are going to use a PeriodIndex for planning delivery slots for our new delivery business.

In [None]:
# Example Period with length of one day:
p = pd.Period("2012-1-1", freq="D")
print(p)
# We can check the start and end times of the period:
print(p.start_time, p.end_time)

# We can generate a range of periods using 'pd.period_range()' (very similar to the above pd.date_range):
months = pd.period_range("2030-07-01", "2031-07-01", freq="M")
print(months)
days = pd.period_range("2030-07-01", freq="D", periods=365)
print(days)

### Case Study:

You are concerned about your delivered packages sitting outside in the rain.

The following list `rainfall` is the expected amount of rainfall for the next 12 hours, starting `2030-07-01 07:00:00`. Create a pandas DataFrame called `data` with a column called 'rainfall' and a DatetimeIndex for the 12 hours.

- first create the index using `date_range` (hint: use freq "h" for hourly; older code often uses "H", which recent pandas versions deprecate)

- then combine into a `DataFrame`
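The two steps can be sketched with hypothetical hourly readings (a `temperature` column rather than the exercise's `rainfall`). Using `periods=len(...)` keeps the index the same length as the data:

```python
import pandas as pd

temperature = [11.5, 12.0, 13.25]  # hypothetical hourly readings

index = pd.date_range("2024-03-05 07:00", periods=len(temperature), freq="h")
readings = pd.DataFrame({"temperature": temperature}, index=index)
print(readings)
```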

In [None]:
rainfall = [
    70.0,
    34.8,
    78.0,
    69.0,
    16.6,
    14.4,
    12.8,
    10.6,
    14.4,
    2.0,
    2.0,
    0.6,
]

In [None]:
# Your code here...


You have some deliveries planned already. You want to be able to look up the expected rainfall for each of them.

To do this, convert your index to a PeriodIndex using `map`:
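Converting a `Timestamp` to a period gives the span that contains it, which is what makes lookups like "which hour does 07:35 fall in?" work. A sketch with made-up times; `DatetimeIndex.to_period()` does the same for a whole index in one call:

```python
import pandas as pd

stamp = pd.Timestamp("2024-03-05 07:35")
print(stamp.to_period("h"))  # 2024-03-05 07:00, the hour containing 07:35

# A whole DatetimeIndex can be converted in one call instead of using map()
index = pd.date_range("2024-03-05 07:00", periods=3, freq="h")
print(index.to_period("h"))
```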

In [None]:
# Example:
data.index = data.index.map(lambda x: x.to_period("h"))
data.head()

You can now index the rainfall data using time ranges:

In [None]:
# Example
data["2030-07-01 08:08":"2030-07-01 09:09"]

In [None]:
# Example:
data["2030-07-01 08:08":"2030-07-01 09:09"].rainfall.sum()
# we simply sum the rainfall column to get the total rainfall, but if you can think of a better way, go for it.

Calculate the `sum`, `mean` and `max` rainfall for the time slot `"2030-07-01 07:35":"2030-07-01 09:55"`:
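Several statistics can be computed in one call with `.agg()`. A sketch on a small made-up hourly `PeriodIndex` (note that label slices include both ends):

```python
import pandas as pd

# Hypothetical hourly values on a PeriodIndex
periods = pd.period_range("2024-03-05 07:00", periods=4, freq="h")
rain = pd.DataFrame({"rain": [1.0, 4.0, 2.0, 8.0]}, index=periods)

# Label slices are inclusive of both ends: the 08:00 and 09:00 periods
window = rain.loc["2024-03-05 08:00":"2024-03-05 09:00", "rain"]
stats = window.agg(["sum", "mean", "max"])
print(stats)  # sum 6.0, mean 3.0, max 4.0
```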

In [None]:
# Your code here...


### Part 3b: Frequency/Offsets

The `freq` argument indicates the frequency of `Period`, `PeriodIndex` and `DatetimeIndex` in Pandas.

The options for this argument are called _offset aliases_ with the full list in the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). Using these codes we can generate very specific and useful sequences of datetimes and periods.

For example, the alias `B` denotes business-day frequency. We can use it to calculate how many more workdays there are until Christmas Day.
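A check on a known week: `B` skips Saturdays and Sundays (but not public holidays; pandas provides `CustomBusinessDay` for those). Numpy's `np.busday_count` gives a similar count, with an exclusive end date:

```python
import numpy as np
import pandas as pd

# Monday 4th March 2024 to Sunday 10th March 2024
week = pd.date_range("2024-03-04", "2024-03-10", freq="B")
print(len(week))  # 5 business days (Saturday and Sunday are skipped)

# numpy's end date is exclusive, so the following Monday is passed
print(np.busday_count("2024-03-04", "2024-03-11"))  # 5
```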

In [None]:
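# Example:
today = pd.Timestamp("today")
# Timestamp for the next Christmas Day (rolls over to next year once this year's has passed):
xmas = pd.Timestamp(year=today.year, month=12, day=25)
if xmas < today:
    xmas = pd.Timestamp(year=today.year + 1, month=12, day=25)

len(pd.date_range(today, xmas, freq="B"))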

Using the same logic, calculate how many business **hours** there are until next Christmas:
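The business-hour alias is `bh` in recent pandas (older code uses `BH`), and by default it covers 09:00 to 17:00 on weekdays. A sketch over two made-up weekdays: each generated stamp marks the start of a business hour, so the latest hour seen is 16:

```python
import pandas as pd

# Monday 4th and Tuesday 5th March 2024; "bh" uses 09:00-17:00 by default
hours = pd.date_range("2024-03-04", "2024-03-06", freq="bh")
print(hours[:3])
print(sorted({ts.hour for ts in hours}))  # hours at which business-hour stamps start
```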

In [None]:
# Your code here...


### Part 3c: Reading Data and Plotting

When reading in datasets (for example with `read_csv`) we can choose to create our time based indexes after reading the data. Or we can use parsing arguments as per the [documentation for `pd.read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) to do this more succinctly.

Load the Leuchars dataset (it's weather data) using `pd.read_csv("./data/leuchars.csv")`, adding the arguments required to automatically create a `DatetimeIndex` from the month column.

Hint: You will need to use the `index_col` and `parse_dates` arguments. 

Store the new DataFrame as `weather`.

Then convert the index into a `PeriodIndex`.

Hint: We previously used `map` but we can more simply use `DatetimeIndex.to_period()` (docs [here](https://pandas.pydata.org/docs/reference/api/pandas.DatetimeIndex.to_period.html)).
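The same two steps on a tiny hypothetical CSV, read from a string with `io.StringIO` so the sketch runs without the data file. The `month` and `rain_mm` column names are assumptions for illustration; check the real file's header:

```python
import io

import pandas as pd

# A hypothetical two-row extract; the real file's columns may differ
csv_text = "month,rain_mm\n1970-01,50.1\n1970-02,30.0\n"

df = pd.read_csv(io.StringIO(csv_text), index_col="month", parse_dates=True)
print(type(df.index).__name__)  # DatetimeIndex

df.index = df.index.to_period("M")  # monthly PeriodIndex
print(df.index)
```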


In [None]:
# Your code here...


Plotting with datetime objects is very similar to plotting any other Pandas dataframe. Pandas automatically recognises `PeriodIndex` and `DatetimeIndex` for better x-axis formatting.

Here we plot the rainfall from 1970 to 1979:

In [None]:
ax = weather.rain_mm["1970-1":"1979-12"].plot(figsize=(8, 4))
ax.set_title("Monthly rainfall 1970-1979 (mm)")
plt.show()

The above chart is very busy. Find the yearly mean rainfall by resampling the `weather` DataFrame. Plot this new data.

Hint: Use [DataFrame.resample](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) and `mean()`

Bonus points:
- repeat for maximum rainfall and minimum rainfall each year
- can you plot these all on the same chart?
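The hint points to `resample`. An alternative that works the same way for a `PeriodIndex` or a `DatetimeIndex` is to group by the index's `year` and aggregate several statistics at once. A sketch on two years of made-up monthly values; calling `.plot()` on the resulting DataFrame would draw all three columns on one chart:

```python
import pandas as pd

# Two years of made-up monthly values: 0, 1, ..., 23
months = pd.period_range("1970-01", periods=24, freq="M")
rain = pd.Series(range(24), index=months, dtype=float, name="rain_mm")

# Group the monthly values by calendar year and aggregate each group
yearly = rain.groupby(rain.index.year).agg(["mean", "max", "min"])
print(yearly)  # 1970: mean 5.5, max 11.0, min 0.0; 1971: mean 17.5, max 23.0, min 12.0
```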

In [None]:
# Your code here...


### Final Challenge:

Let's explore a different dataset for the final challenge. The `goog.csv` file contains various information for the Google stock listing (GOOG) over 2017 and 2018. 

In [None]:
# Example:
goog = pd.read_csv("data/goog.csv", index_col="Date", parse_dates=True)
goog.head()

Plot the **weekly** stock **highs** and **lows**.
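`resample` can apply a different aggregation to each column: the weekly high is the maximum of the daily highs and the weekly low is the minimum of the daily lows. A sketch on two weeks of made-up prices; the `High` and `Low` column names are an assumption about the file's layout:

```python
import numpy as np
import pandas as pd

# Two weeks of made-up daily prices, Monday 4th to Sunday 17th March 2024
days = pd.date_range("2024-03-04", periods=14, freq="D")
prices = pd.DataFrame(
    {"High": np.arange(10, 24, dtype=float), "Low": np.arange(0, 14, dtype=float)},
    index=days,
)

# "W" bins end on Sundays by default; take the highest High and lowest Low per week
weekly = prices.resample("W").agg({"High": "max", "Low": "min"})
print(weekly)
```

`weekly.plot()` would then draw both series against the week-ending dates.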

In [None]:
# Your code here...
