# Lecture 4: Dates and time series analysis

_Please sign attendance sheet_

## [Time Out](https://www.sipa.columbia.edu/office-academic-affairs/instructional-support/time-out-funding)

## [Time series example](https://www.facebook.com/photo.php?fbid=586998827431846&set=a.145488218249578&type=3&mibextid=wwXIfr&rdid=O9MSDY6VBvIvVQwg&share_url=https%3A%2F%2Fwww.facebook.com%2Fshare%2F1CqjHKHZDh%2F%3Fmibextid%3DwwXIfr)

[Source of chart](https://fred.stlouisfed.org/series/IHLIDXUSTPSOFTDEVE)

- https://newsletter.pragmaticengineer.com/p/is-there-a-drop-in-software-engineer
- https://www.adpresearch.com/the-rise-and-fall-of-the-software-developer/

## Dates and time series analysis

From [Wikipedia](https://en.wikipedia.org/wiki/Time_series):

> A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time.

What time series have you seen or worked with?

Is the [311 data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9/data_preview) a time series?

## Import necessary packages

```python
import pandas as pd
```

### Example data

```python
cases = pd.DataFrame(
    {
        "date": ["12/30/2024", "12/31/2024", "01/01/2025", "01/02/2025"],
        "cases": [67, 29, 85, 66],
    }
)
cases
```

|   | date       | cases |
|---|------------|-------|
| 0 | 12/30/2024 | 67    |
| 1 | 12/31/2024 | 29    |
| 2 | 01/01/2025 | 85    |
| 3 | 01/02/2025 | 66    |


Get the minimum and maximum `date`:

```python
cases["date"].min()
```

```
'01/01/2025'
```

```python
cases["date"].max()
```

```
'12/31/2024'
```

Anything you notice about that?

The dates are stored as strings (`object`s):

```python
cases.dtypes
```

```
date     object
cases     int64
dtype: object
```
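Why `"01/01/2025"` comes out as the minimum: strings are compared character by character, so anything starting with `"0"` sorts before anything starting with `"1"`, no matter the year. A quick plain-Python check with the same four strings; the year-first (ISO 8601) list is added here only for comparison:

```python
# The date strings from the cases example, in chronological order
us_style = ["12/30/2024", "12/31/2024", "01/01/2025", "01/02/2025"]
# The same dates written year-first (ISO 8601), added for comparison
iso_style = ["2024-12-30", "2024-12-31", "2025-01-01", "2025-01-02"]

# String comparison is lexicographic: first character, then the next, and so on
print(min(us_style), max(us_style))  # 01/01/2025 12/31/2024
print(sorted(us_style))

# Year-first strings happen to sort in the same order as the dates they represent
print(sorted(iso_style) == iso_style)  # True
```

Converting to real datetimes is still the more robust fix, since it doesn't depend on how the strings happen to be written.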

## Convert columns to timestamps using [pandas' `to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps)

```python
cases["date"] = pd.to_datetime(cases["date"], format="%m/%d/%Y")
cases["date"]
```

```
0   2024-12-30
1   2024-12-31
2   2025-01-01
3   2025-01-02
Name: date, dtype: datetime64[ns]
```

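[More about the `format` string.](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) If you don't provide one, pandas has to infer the format, which can be slower and may guess wrong for ambiguous dates like `01/02/2025` (January 2 or February 1?).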
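A small sketch of why the format string matters: the same ambiguous string parsed with month-first (`%m/%d/%Y`, as above) and day-first (`%d/%m/%Y`) codes gives two different dates. It also shows `errors="coerce"`, which turns strings that don't match the format into `NaT` (missing) instead of raising an error.

```python
import pandas as pd

# %m = month, %d = day of month, %Y = four-digit year
month_first = pd.to_datetime("01/02/2025", format="%m/%d/%Y")
day_first = pd.to_datetime("01/02/2025", format="%d/%m/%Y")
print(month_first.date(), day_first.date())  # 2025-01-02 2025-02-01

# Strings that don't match the format become NaT instead of raising
parsed = pd.to_datetime(
    pd.Series(["12/31/2024", "not a date"]), format="%m/%d/%Y", errors="coerce"
)
print(parsed.isna().tolist())  # [False, True]
```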

```python
cases["date"].min()
```

```
Timestamp('2024-12-30 00:00:00')
```

```python
cases["date"].max()
```

```
Timestamp('2025-01-02 00:00:00')
```

## [Time/date components](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#time-date-components)

```python
cases["date"].dt.day_name()
```

```
0       Monday
1      Tuesday
2    Wednesday
3     Thursday
Name: date, dtype: object
```
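`day_name()` is one of several components available through the `.dt` accessor. A minimal sketch pulling a few others out of the same four dates (rebuilt here so the block runs on its own):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2024-12-30", "2024-12-31", "2025-01-01", "2025-01-02"]))

components = pd.DataFrame(
    {
        "year": dates.dt.year,
        "month": dates.dt.month,
        "day": dates.dt.day,
        # Monday=0 through Sunday=6
        "dayofweek": dates.dt.dayofweek,
        "month_name": dates.dt.month_name(),
    }
)
print(components)
```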

## [Congestion Pricing](https://congestionreliefzone.mta.info/) demo

### [Vehicle entries](https://data.ny.gov/Transportation/MTA-Congestion-Relief-Zone-Vehicle-Entries-Beginni/t6yz-b64h/about_data) per day

How would you do this, conceptually?

Has the number of entries plateaued?
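Conceptually, entries per day means truncating each timestamp to the start of its day, then adding up the counts within each day. A minimal sketch of that idea using `dt.floor()` and `groupby()` on a few made-up rows; the column names `toll_10_minute_block` and `crz_entries` are assumptions standing in for the real dataset's columns:

```python
import pandas as pd

# Hypothetical 10-minute-block counts (made-up numbers)
entries = pd.DataFrame(
    {
        "toll_10_minute_block": pd.to_datetime(
            [
                "2025-01-05 08:00",
                "2025-01-05 08:10",
                "2025-01-06 08:00",
                "2025-01-06 08:10",
                "2025-01-07 09:00",
            ]
        ),
        "crz_entries": [10, 12, 20, 25, 30],
    }
)

# Truncate each timestamp to midnight of its day, then sum within each day
day = entries["toll_10_minute_block"].dt.floor("D")
per_day = entries.groupby(day)["crz_entries"].sum()
print(per_day)
```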

## [Resampling](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html#resample-a-time-series-to-another-frequency)

From [the User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#resampling):

> `resample()` is a time-based `groupby`

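```python
# df is a DataFrame, "FREQUENCY" an offset alias (e.g. "D", "W"), "COLUMN" a datetime column;
# like groupby(), follow it with an aggregation such as .sum()
df.resample("FREQUENCY", on="COLUMN")
```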

[Frequency information](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#period-aliases)
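A minimal sketch of resampling on made-up 10-minute-block data, using the `"D"` (calendar day) and `"W"` (weekly, same as `"W-SUN"`) aliases. The column names `toll_10_minute_block` and `crz_entries` are assumptions for illustration, not necessarily the real dataset's names.

```python
import pandas as pd

# Hypothetical counts; 2025-01-05 is a Sunday
entries = pd.DataFrame(
    {
        "toll_10_minute_block": pd.to_datetime(
            [
                "2025-01-05 08:00",
                "2025-01-05 08:10",
                "2025-01-06 08:00",
                "2025-01-06 08:10",
                "2025-01-07 09:00",
            ]
        ),
        "crz_entries": [10, 12, 20, 25, 30],
    }
)

# One row per calendar day
daily = entries.resample("D", on="toll_10_minute_block").sum(numeric_only=True)

# One row per week; each bin ends on a Sunday and is labeled with that Sunday
weekly = entries.resample("W", on="toll_10_minute_block").sum(numeric_only=True)
print(daily)
print(weekly)
```

`resample()` also produces rows for periods with no data at all (their sum is 0), which keeps the time series continuous.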

### {% if id == "columbia" %}In-class exercise{% else %}Demo{% endif %}

Calculate the entries per week.

### Rolling average
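A rolling (moving) average smooths out short-term noise by averaging each value with the ones before it. A minimal sketch with `Series.rolling()` on made-up daily totals. With daily data a 7-day window is a common choice, since it evens out weekday/weekend patterns; a 3-day window here keeps the numbers easy to check.

```python
import pandas as pd

# Made-up daily totals indexed by date
daily = pd.Series(
    [10, 20, 30, 40, 50],
    index=pd.date_range("2025-01-05", periods=5, freq="D"),
)

# Mean of each day and the 2 days before it;
# the first 2 days don't have a full window yet, so they're NaN
rolling = daily.rolling(window=3).mean()
print(rolling)
```

Passing `min_periods=1` to `rolling()` would instead average whatever days are available at the start.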

### Resampling with other grouping

```python
entries.resample("W-SUN", on="toll_10_minute_block").sum(numeric_only=True)

# can be rewritten as

entries.groupby(
    [
        pd.Grouper(key="toll_10_minute_block", freq="W-SUN"),
    ]
).sum(numeric_only=True)
```

This means you can add other columns to group by. Let's do entries per day by `vehicle_class`.
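A minimal sketch of that combination on made-up rows: `pd.Grouper` buckets the timestamps by day while `vehicle_class` is grouped as a regular column. The `vehicle_class` values and the `crz_entries` column name are hypothetical.

```python
import pandas as pd

# Hypothetical rows (made-up classes and counts)
entries = pd.DataFrame(
    {
        "toll_10_minute_block": pd.to_datetime(
            ["2025-01-06 08:00", "2025-01-06 08:10", "2025-01-06 08:00", "2025-01-07 09:00"]
        ),
        "vehicle_class": ["Cars", "Cars", "Trucks", "Cars"],
        "crz_entries": [10, 12, 3, 30],
    }
)

# Group by day (via pd.Grouper) and by vehicle class at the same time
per_day_by_class = entries.groupby(
    [pd.Grouper(key="toll_10_minute_block", freq="D"), "vehicle_class"]
).sum(numeric_only=True)
print(per_day_by_class)
```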

## Resampling vs. time/date components

- Use resampling when you want to work with dates as continuous values, e.g. points in time
- Use date components when you want to work with dates as categorical values, e.g. month number, day of week
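To make the distinction concrete, a minimal sketch on two weeks of made-up daily counts: resampling gives one row per calendar week (continuous), while grouping by `dt.day_name()` pools all Mondays together, all Tuesdays together, and so on (categorical).

```python
import pandas as pd

# Made-up daily counts: Monday 2025-01-06 through Sunday 2025-01-19
daily = pd.DataFrame(
    {
        "date": pd.date_range("2025-01-06", periods=14, freq="D"),
        "cases": range(1, 15),
    }
)

# Continuous: one row per week (weeks end on Sunday)
per_week = daily.resample("W", on="date")["cases"].sum()

# Categorical: one row per day of the week, regardless of which week
per_weekday = daily.groupby(daily["date"].dt.day_name())["cases"].sum()

print(per_week)
print(per_weekday)
```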

You can also do math between datetime columns, such as calculating durations.
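A minimal sketch of a duration calculation: subtracting one datetime column from another gives a timedelta column, which `dt.total_seconds()` turns into a plain number. The `requests` DataFrame and its `created_date`/`closed_date` values are made up for illustration.

```python
import pandas as pd

# Hypothetical request data with start and end timestamps
requests = pd.DataFrame(
    {
        "created_date": pd.to_datetime(["2025-01-01 08:00", "2025-01-02 09:00"]),
        "closed_date": pd.to_datetime(["2025-01-01 10:30", "2025-01-03 09:00"]),
    }
)

# Subtracting datetime columns gives a timedelta64 column
requests["duration"] = requests["closed_date"] - requests["created_date"]

# Convert the durations to hours for analysis
requests["hours"] = requests["duration"].dt.total_seconds() / 3600
print(requests)
```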

## [Filtering timestamps](https://www.geeksforgeeks.org/select-pandas-dataframe-rows-between-two-dates/)

How many vehicle entries were there in the first week?

You can also [filter dates using strings](https://saturncloud.io/blog/how-to-filter-pandas-dataframe-by-time/), but using the [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html) class is safer: it makes the exact date explicit instead of relying on pandas to parse the string.
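A minimal sketch of filtering with `Timestamp`s, using the `cases` data from earlier (rebuilt so the block runs on its own) and keeping rows in the first week of 2025. It uses a half-open range (`>=` start, `<` end), so back-to-back weeks never share a day.

```python
import pandas as pd

cases = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["12/30/2024", "12/31/2024", "01/01/2025", "01/02/2025"], format="%m/%d/%Y"
        ),
        "cases": [67, 29, 85, 66],
    }
)

start = pd.Timestamp("2025-01-01")
end = pd.Timestamp("2025-01-08")  # exclusive: one week after start

# Boolean mask combining both conditions
in_first_week = (cases["date"] >= start) & (cases["date"] < end)
print(cases[in_first_week]["cases"].sum())  # 151

# Comparing against a string also works; pandas parses it into a Timestamp
by_string = cases[cases["date"] >= "2025-01-01"]
```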

## Final Project

1. Check feedback on your proposals, even if you got 100%.
1. [Once you start](https://python-public-policy.afeld.me/en/{{school_slug}}/final_project.html#once-you-start)

[Ask for help.](https://python-public-policy.afeld.me/en/{{school_slug}}/assignments.html#tips)

## [Homework 4](https://python-public-policy.afeld.me/en/{{school_slug}}/hw_4.html)