## Time Series in Pandas

https://aeturrell.github.io/coding-for-economists/time-series.html

https://aeturrell.github.io/coding-for-economists/time-intro.html

## Part 1 - Some datetime and Pandas datetime

In [None]:
try:
    from rich import inspect
except ImportError:
    # Install rich if it is missing, then retry the import
    %pip install rich
    from rich import inspect

In [None]:
from datetime import datetime

now = datetime.now()
print(now)

In [None]:
# inspect the object using the rich library
inspect(now)

In [None]:
LDOS = datetime(2024, 4, 25)
print(LDOS)


In [None]:
now > LDOS

In [None]:
econ_grad_string = "15 May in 2024"

In [None]:
datetime.strptime(econ_grad_string, "%d %B in %Y")


In [None]:
econ_date = datetime.strptime(econ_grad_string, "%d %B in %Y")
econ_date

In [None]:
print("Economics graduation will be on:")
econ_date.strftime("%A, %B %d, %Y")
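The parse-then-format round trip in the cells above can be checked on a fixed string. This is only a minimal sketch: the names `parsed` and `pretty` are illustrative, and the day and month names produced by `%A` and `%B` assume an English locale.

```python
from datetime import datetime

# strptime: string -> datetime, using a format that mirrors the input text
parsed = datetime.strptime("15 May in 2024", "%d %B in %Y")

# strftime: datetime -> string, using whatever output format we like
pretty = parsed.strftime("%A, %B %d, %Y")

print(parsed)
print(pretty)
```

The format string passed to `strptime` has to match the input exactly, including the literal word "in"; the format passed to `strftime` is free to differ.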


In [None]:
days_left = econ_date - now
days_left

In [None]:
print(f"Only {days_left.days} days left until graduation!")

In [None]:
inspect(days_left)

In [None]:
# make a pandas now and LDOS
import pandas as pd
now_pd = pd.to_datetime(now)
LDOS_pd = pd.to_datetime(LDOS)
now_pd, LDOS_pd


In [None]:
# localize the naive pandas timestamp to Pacific time
now_pd_pacific = now_pd.tz_localize('US/Pacific')
now_pd_pacific

In [None]:
# now convert to singapore time
now_pd_singapore = now_pd_pacific.tz_convert('Asia/Singapore')
now_pd_singapore

In [None]:
# Both timestamps mark the same instant, so the difference is zero
sleepy = now_pd_singapore - now_pd_pacific
sleepy
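To see the difference between `tz_localize` and `tz_convert` on a reproducible value, here is a small sketch with a made-up timestamp. It uses "America/Los_Angeles", the IANA name for the zone the notebook calls "US/Pacific"; all variable names are illustrative.

```python
import pandas as pd

# A fixed, made-up wall-clock time so the result is reproducible
naive = pd.Timestamp("2024-05-15 09:00")

# tz_localize attaches a zone to a naive timestamp (the clock still reads 09:00)
pacific = naive.tz_localize("America/Los_Angeles")

# tz_convert keeps the instant and changes the clock reading
singapore = pacific.tz_convert("Asia/Singapore")

# Subtracting tz-aware timestamps compares instants
same_instant_gap = singapore - pacific

# Dropping the zones compares the two clock readings instead
wall_clock_gap = singapore.tz_localize(None) - pacific.tz_localize(None)

print(same_instant_gap)
print(wall_clock_gap)
```

The first difference is zero because both objects describe the same moment; the second is the gap between the two local clocks on that date.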

In [None]:
# Let's demo with the Econ graduation date
econ_date_pd = pd.to_datetime(econ_date)
econ_date_pd



In [None]:
# shift the date back by 1 day
get_ready = econ_date_pd - pd.Timedelta(days=1)
get_ready

In [None]:
vacation = econ_date_pd + pd.Timedelta(days=7)
vacation

In [None]:
get_job = econ_date_pd + pd.Timedelta(days=30)
get_job

## Part 2 - Time Series in Pandas

In [None]:
import requests
import pandas as pd

In [None]:
import matplotlib.pyplot as plt
import matplotlib as mpl

Now let’s see how to turn data that has been read in with a non-datetime type into a vector of datetimes. This happens all the time in practice. We’ll read in some data on job vacancies for information and communication jobs, ONS code UNEM-JP9P, and then try to wrangle the given “date” column into a pandas datetime column.

In [None]:
# Dataset from URL
url = "https://api.ons.gov.uk/timeseries/JP9P/dataset/UNEM/data"
# Get the data from the ONS API:
df = pd.DataFrame(pd.json_normalize(requests.get(url).json()["months"]))

In [None]:
df["value"] = pd.to_numeric(df["value"])
df = df[["date", "value"]]
df = df.rename(columns={"value": "Vacancies (ICT), thousands"})
df.head()

In [None]:
# write to a local CSV for use in other notebooks
df.to_csv("ONS_vacancies.csv", index=False)


In [None]:
%ls

In [None]:
df = pd.read_csv('ONS_vacancies.csv')
df

In [None]:
df.info()

In [None]:
df["date"] = pd.to_datetime(df["date"])
df["date"].head()

An aside on different types of date formats

In [None]:
small_df = pd.DataFrame({"date": ["1, '19, 22", "1, '19, 23"], "values": ["1", "2"]})
small_df["date"]

In [None]:
pd.to_datetime(small_df["date"], format="%m, '%y, %d")
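When a column mixes well-formed dates with bad values, an explicit `format` combined with `errors="coerce"` can be a safer starting point than letting pandas guess. The sketch below uses made-up strings (not the ONS data) and illustrative variable names.

```python
import pandas as pd

# Made-up inputs: two "year month" strings and one bad value
raw = pd.Series(["2001 Jun", "2001 Jul", "not a date"])

# An explicit format avoids guessing; errors="coerce" turns failures into NaT
parsed = pd.to_datetime(raw, format="%Y %b", errors="coerce")

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Because the format has no day component, each parsed date lands on the first of the month.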


In [None]:
df["date"] = df["date"] + pd.offsets.MonthEnd()
df.head()
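One detail of `pd.offsets.MonthEnd()` is worth checking before re-running the cell above: adding it to a date that is already a month end moves the date to the following month end. A minimal sketch with made-up dates and illustrative names:

```python
import pandas as pd

first_of_month = pd.Timestamp("2001-05-01")
already_month_end = pd.Timestamp("2001-05-31")

# MonthEnd() (n=1) always moves forward to the next month end
rolled_from_first = first_of_month + pd.offsets.MonthEnd()
rolled_from_end = already_month_end + pd.offsets.MonthEnd()

# MonthEnd(0) leaves a date that is already on a month end unchanged
kept_at_end = already_month_end + pd.offsets.MonthEnd(0)

print(rolled_from_first, rolled_from_end, kept_at_end)
```

So running the month-end shift twice on the same column would push every date a further month forward, unless `MonthEnd(0)` is used.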

In [None]:


print("Using `dt.day_name()`")
print(df["date"].dt.day_name().head())
print("Using `dt.isocalendar()`")
print(df["date"].dt.isocalendar().head())
print("Using `dt.month`")
print(df["date"].dt.month.head())

In [None]:
df = df.set_index("date")
df.head()

In [None]:
df.index[:5]


In [None]:
# "M" is the month-end alias; pandas 2.2+ deprecates it in favor of "ME"
df = df.asfreq("M")
df.index[:5]

In [None]:
df.plot();


In [None]:
# Remake the dataset as an annual dataset
# using the mean across months
# ("A" is the year-end alias; pandas 2.2+ deprecates it in favor of "YE")
df.resample("A").mean()


In [None]:
df.resample("5A").agg(["mean", "std"]).head()


In [None]:
df.resample("D").asfreq()


In [None]:
df.resample("D").interpolate(method="linear", limit_direction="forward", limit=3)[:6]
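The effect of upsampling and of the `limit` argument is easier to see on a tiny series. The sketch below uses made-up data with two observations ten days apart; the names `sparse`, `daily`, and `filled` are illustrative.

```python
import pandas as pd

# Made-up series with two observations ten days apart
sparse = pd.Series(
    [0.0, 10.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-11"]),
)

# Upsample to daily frequency: the new days appear as NaN
daily = sparse.resample("D").asfreq()

# Linear interpolation, filling at most 3 consecutive NaNs after a known value
filled = sparse.resample("D").interpolate(method="linear", limit=3)

print(daily.isna().sum(), "missing days before interpolation")
print(filled)
```

Only the first three missing days after 1 January are filled; the rest of the gap stays `NaN` until the next real observation.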


In [None]:
# Moving average over a 2-period window
df.rolling(2).mean()


In [None]:
# Exponentially weighted moving average
df.ewm(alpha=0.2).mean()
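To make the two smoothers concrete, here is a sketch on a four-point made-up series. It also contrasts the default `adjust=True` form of `ewm` with the recursive `adjust=False` form; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Simple moving average: the first window is incomplete, so it is NaN
sma = s.rolling(2).mean()

# adjust=True (the default): weighted average with weights (1 - alpha)**i
ewm_adjusted = s.ewm(alpha=0.5).mean()

# adjust=False: recursive form y_t = (1 - alpha) * y_{t-1} + alpha * x_t
ewm_recursive = s.ewm(alpha=0.5, adjust=False).mean()

print(pd.DataFrame({"sma": sma, "adjusted": ewm_adjusted, "recursive": ewm_recursive}))
```

The two `ewm` variants differ most in the first few rows and move closer together as more observations accumulate.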


In [None]:
lead = 12
lag = 3
orig_series_name = df.columns[0]
df[f"lead ({lead} months)"] = df[orig_series_name].shift(-lead)
df[f"lag ({lag} months)"] = df[orig_series_name].shift(lag)
df.head()
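The sign convention of `shift` is easy to mix up, so here is a minimal sketch on a made-up four-month series (names are illustrative): a negative shift pulls future values back to the current row (a lead), and a positive shift pushes past values forward (a lag).

```python
import pandas as pd

idx = pd.date_range("2024-01-31", periods=4, freq=pd.offsets.MonthEnd())
s = pd.Series([10, 20, 30, 40], index=idx, name="x")

shifted = pd.DataFrame({
    "x": s,
    "lead_1": s.shift(-1),  # next month's value, aligned to this month
    "lag_1": s.shift(1),    # last month's value, aligned to this month
})
print(shifted)
```

The rows with no neighbor to borrow from (the last row for the lead, the first for the lag) are filled with `NaN`.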

In [None]:
df.iloc[100:300, :].plot();


In [None]:
import statsmodels.api as sm
from statsmodels.graphics import tsaplots

In [None]:
fig = tsaplots.plot_acf(df["Vacancies (ICT), thousands"], lags=24)
plt.show()

In [None]:
fig = tsaplots.plot_pacf(df["Vacancies (ICT), thousands"], lags=24)
plt.show()
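As a rough cross-check on what an autocorrelation plot shows, individual lags can be computed with pandas' own `Series.autocorr`. The sketch below uses a made-up series with a pure 12-period cycle rather than the vacancies data, and the numbers may differ slightly from the statsmodels plot, which normalizes the estimate differently.

```python
import numpy as np
import pandas as pd

# Made-up "monthly" series with a pure 12-period cycle
t = np.arange(120)
seasonal = pd.Series(np.sin(2 * np.pi * t / 12))

# Series.autocorr(lag) is the Pearson correlation between
# the series and a copy of itself shifted by `lag` periods
acf_by_lag = {lag: seasonal.autocorr(lag) for lag in (1, 6, 12)}
print(acf_by_lag)
```

A full cycle later (lag 12) the series lines up with itself, while half a cycle later (lag 6) it is the mirror image, which is the pattern a seasonal series leaves in an ACF plot.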

## Adding examples from Pandas Documentation

In [None]:
URl2 = "https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/air_quality_no2_long.csv"
air_quality = pd.read_csv(URl2)
air_quality = air_quality.rename(columns={"date.utc": "datetime"})

air_quality.head()

In [None]:
air_quality.city.unique()

In [None]:
# What type of data is in the datetime column?
air_quality["datetime"]


In [None]:
air_quality["datetime"] = pd.to_datetime(air_quality["datetime"])
air_quality["datetime"]

In [None]:
# you can do it directly in the read_csv function
air_quality2 = pd.read_csv(URl2, parse_dates=["date.utc"])
air_quality2

In [None]:
air_quality2["date.utc"]

In [None]:
air_quality["datetime"].min(), air_quality["datetime"].max()

In [None]:
air_quality["datetime"].max() - air_quality["datetime"].min()

In [None]:
air_quality["month"] = air_quality["datetime"].dt.month
air_quality.head()

In pandas, the .dt accessor is used with Series objects containing datetime-like data. It provides access to a wide range of properties and methods to perform operations on the data as datetime objects. When you have a Series of datetime objects, using .dt allows you to extract information like the year, month, day, hour, and minute, or even perform more complex manipulations like time zone conversions.

Here's a brief overview of some of the properties and methods available via the .dt accessor:

Properties
- **date:** Returns the date part of each datetime.
- **time:** Returns the time part of each datetime.
- **year:** Returns the year of each datetime.
- **month:** Returns the month of each datetime.
- **day:** Returns the day of each datetime.
- **hour:** Returns the hour of each datetime.
- **minute:** Returns the minute of each datetime.
- **second:** Returns the second of each datetime.
- **microsecond:** Returns the microsecond of each datetime.
- **nanosecond:** Returns the nanosecond of each datetime.
- **dayofweek:** Returns the day of the week (Monday=0, Sunday=6).
- **dayofyear:** Returns the ordinal day of the year.
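- **isocalendar().week:** Returns the ISO week number of the year (a method call; it replaces `weekofyear`, which was removed from the `.dt` accessor in pandas 2.0).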
- **quarter:** Returns the quarter of the date.
- **is_month_start:** Returns True for elements that are the first day of the month.
- **is_month_end:** Returns True for elements that are the last day of the month.
- **is_quarter_start:** Returns True for elements that are the first day of the quarter.
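A few of these properties can be tried together on a tiny made-up Series. This is only a sketch with illustrative names; the ISO week number is read from `dt.isocalendar()`, which the notebook used earlier.

```python
import pandas as pd

# Made-up timestamps: a Monday afternoon and a month end
stamps = pd.Series(pd.to_datetime(["2019-05-20 14:30", "2019-05-31 23:00"]))

summary = pd.DataFrame({
    "day_name": stamps.dt.day_name(),
    "dayofweek": stamps.dt.dayofweek,
    "hour": stamps.dt.hour,
    "quarter": stamps.dt.quarter,
    "iso_week": stamps.dt.isocalendar().week,
    "is_month_end": stamps.dt.is_month_end,
})
print(summary)
```

Each `.dt` attribute returns a Series aligned with the original, so the results can be assigned straight back as new columns.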

**Analysis**
Let's look at the average reading for each sensor on each weekday using `groupby`.
To group on weekdays, we use the `weekday` property of the pandas Timestamp (Monday=0, Sunday=6), which is also available through the `.dt` accessor.

In [None]:
air_quality.groupby(
    [air_quality["datetime"].dt.weekday, "location"])["value"].mean()
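The same grouping can be traced by hand on a few made-up rows. In this sketch the locations "A" and "B" and all readings are invented for illustration, and the weekday Series is renamed so the resulting index level has a clear label.

```python
import pandas as pd

# Made-up long-format readings: one row per (timestamp, location)
readings = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-05-20 08:00", "2019-05-20 09:00",  # Monday
        "2019-05-21 08:00", "2019-05-21 08:00",  # Tuesday
    ]),
    "location": ["A", "A", "A", "B"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

# Group keys can mix a derived Series (weekday) with a column name
weekday = readings["datetime"].dt.weekday.rename("weekday")
by_weekday = readings.groupby([weekday, "location"])["value"].mean()
print(by_weekday)
```

The result is a Series with a two-level index (weekday, location); only combinations that actually occur in the data appear.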

**Graph**  Let's plot NO2 by hour of day across all stations, using the average reading for each hour.

In [None]:
fig, axs = plt.subplots(figsize=(12, 4))

air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
    kind='bar', rot=0, ax=axs)

## Use Pivot to make a table with one column per Sensor Location

In [None]:
no_2 = air_quality.pivot(index="datetime", columns="location", values="value")
no_2
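Here is a small sketch of the long-to-wide reshape on made-up data (locations "A" and "B" are invented). It also shows one thing to watch for: `pivot` raises a `ValueError` when an index/column pair appears more than once, whereas `pivot_table` aggregates the duplicates (using the mean by default).

```python
import pandas as pd

long_df = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-05-20", "2019-05-20", "2019-05-21", "2019-05-21"]),
    "location": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# One row per datetime, one column per location
wide = long_df.pivot(index="datetime", columns="location", values="value")
print(wide)

# Add a duplicate (datetime, location) pair: pivot cannot reshape it ...
dupes = pd.concat([long_df, long_df.iloc[[0]].assign(value=5.0)])
try:
    dupes.pivot(index="datetime", columns="location", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# ... whereas pivot_table aggregates the duplicates (mean by default)
wide_mean = dupes.pivot_table(index="datetime", columns="location", values="value")
print(wide_mean)
```

If the real data might contain repeated readings for the same sensor and timestamp, `pivot_table` with an explicit `aggfunc` may be the more forgiving choice.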

In [None]:
# Subset of time with the raw data plotted
no_2["2019-05-20":"2019-05-21"].plot(subplots=False, figsize=(12, 16));

In [None]:
monthly_max = no_2.resample("M").max()
monthly_max

In [None]:
weekly_max = no_2.resample("W").max()
weekly_max

In [None]:
# Daily average
no_2.resample("D").mean().plot(style="-o", figsize=(10, 5));
