In [None]:
import pandas as pd

Generating a time series in pandas means creating a Series or DataFrame whose index is a `DatetimeIndex`. With a datetime index in place, pandas can slice, resample, and align the data by time. Here's how to build one:

**Using `pd.date_range()`**:
- The `pd.date_range()` function creates a range of datetime values based on specified parameters such as start date, end date, frequency, and periods.
- For example, `dates = pd.date_range(start='2022-01-01', end='2022-01-31', freq='D')` generates daily datetime values from January 1, 2022, to January 31, 2022.
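The `periods` parameter mentioned above can stand in for `end`. The sketch below is a minimal illustration, with variable names chosen only for this example, showing that the two ways of describing January 2022 produce the same index.

```python
import pandas as pd

# Fix the start and the number of periods; pandas computes the end date.
by_periods = pd.date_range(start="2022-01-01", periods=31, freq="D")

# Fix both endpoints; pandas computes the number of periods.
by_end = pd.date_range(start="2022-01-01", end="2022-01-31", freq="D")

# Both calls describe the same 31 days of January 2022.
print(by_periods.equals(by_end))
print(len(by_periods), by_periods[0].date(), by_periods[-1].date())
```

Using `periods` is handy when you know how many observations you need but not the exact end date.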

In [None]:
dates = pd.date_range(start="2022-01-01", end="2022-01-31", freq="D")

dates

**Creating a DataFrame with Time Index**:
- After generating datetime values, you can use them as the index for a DataFrame. For instance:

In [None]:
dates = pd.date_range(start="2022-01-01", end="2022-01-31", freq="D")
data = {"value": range(0, 155, 5)}
df = pd.DataFrame(data, index=dates)

df.tail()

**Resampling and Frequency Conversion**:
- You can resample the time series to change its frequency, such as aggregating daily data into monthly data using `resample()` and specifying the desired frequency.
    ```python
    monthly_data = df.resample('M').mean()
    ```
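- This calculates the mean of `value` for each month, producing a new DataFrame with one row per month. In pandas 2.2 and later the month-end alias is spelled `'ME'`, and `'M'` is deprecated. The cell below applies the same idea with weekly sums.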

In [None]:
df.resample("W").sum()
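To make the resampling behavior concrete, here is a small sketch on made-up data (the `daily` frame and its values are assumptions for illustration only). It covers two full Monday-to-Sunday weeks so the weekly bins are easy to check by hand.

```python
import pandas as pd

# Illustrative data: 14 daily values (1..14) starting on Monday, 2022-01-03.
dates = pd.date_range(start="2022-01-03", periods=14, freq="D")
daily = pd.DataFrame({"value": range(1, 15)}, index=dates)

# "W" creates weekly bins that end on Sunday; aggregate each bin two ways.
weekly_sum = daily.resample("W").sum()
weekly_mean = daily.resample("W").mean()

print(weekly_sum)   # one row per week, labeled by the Sunday that closes it
print(weekly_mean)
```

The choice of aggregation (`sum`, `mean`, `count`, and so on) depends on what the values represent: totals usually make sense for counts, averages for measurements.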

What does this mean in practice? If a time series might have _missing_ dates, you can fill the gaps by generating a complete date series, as we did in SQL.
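One simple way to do this, sketched here on made-up data, is to build the full range with `pd.date_range()` and reindex against it. The `observed` series and its values are hypothetical and only serve to show the idea.

```python
import pandas as pd

# Illustrative series with no observations on Jan 3 or Jan 4.
observed = pd.Series(
    [5, 2, 7],
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]),
    name="num_events",
)

# Generate every day between the first and last observation.
full_range = pd.date_range(observed.index.min(), observed.index.max(), freq="D")

# Reindex against the complete range; days with no data are filled with 0.
filled = observed.reindex(full_range, fill_value=0)
print(filled)
```

Filling with 0 is reasonable for counts; for measurements you may prefer to leave the gaps as `NaN` by omitting `fill_value`.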

In [None]:
import pandas as pd

alerts_df = pd.read_parquet("../../data/nps/nps_public_data_alerts.parquet")

alerts_df["alert_date"] = pd.to_datetime(alerts_df["lastIndexedDate"]).dt.date

In [None]:
# this df has date gaps
alerts_by_category = (
    alerts_df.groupby(["alert_date", "category"])["description"]
    .count()
    .reset_index()
    .sort_values("alert_date")
)

# this one doesn't; notice how we stack / unstack the index

alerts_df["alert_date"] = pd.to_datetime(alerts_df["lastIndexedDate"])

# Use pd.Grouper to bucket timestamps by calendar day, alongside category;
# days with no alerts are filled in by the resample step below

alerts_no_gaps = (
    alerts_df.set_index("alert_date")
    .groupby([pd.Grouper(freq="1D"), "category"])["description"]
    .count()
)

# Unstack the category index to columns, fill in missing dates, and fill in NaNs with 0

num_alerts_unstacked = (
    alerts_no_gaps.unstack()
    .resample("1D")
    .asfreq()[["Caution", "Danger", "Information", "Park Closure"]]
    .fillna(0)
)

# Stack the category index back into a column

num_alerts = (
    num_alerts_unstacked.stack().reset_index().rename(columns={0: "num_alerts"})
)
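The same group, unstack, fill, and stack pattern can be tried on a tiny made-up dataset where the result is easy to verify by eye. Everything in this sketch (the `events` frame, its column names, and `num_events`) is hypothetical and stands in for the alerts data.

```python
import pandas as pd

# Hypothetical events: nothing was recorded on Jan 2.
events = pd.DataFrame(
    {
        "event_date": pd.to_datetime(["2022-01-01", "2022-01-01", "2022-01-03"]),
        "category": ["Caution", "Danger", "Caution"],
        "description": ["a", "b", "c"],
    }
)

# Count descriptions per (day, category).
counts = (
    events.set_index("event_date")
    .groupby([pd.Grouper(freq="1D"), "category"])["description"]
    .count()
)

# Wide format: one column per category, one row per calendar day, NaN -> 0.
wide = counts.unstack().resample("1D").asfreq().fillna(0)

# Back to long format: one row per (day, category) pair.
long = wide.stack().reset_index().rename(columns={0: "num_events"})
print(long)
```

After the round trip, every day appears once per category, including the day with no events at all.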

We'll dig into date ranges more in our next lesson on pandas windows. A pandas Series can also be built from any Python `range`, not just from dates.

In [None]:
pd.DataFrame(
    {
        "one_to_hundred": pd.Series(range(1, 101)),
        "hundred_to_one": pd.Series(range(100, 0, -1)),
        "one_hundred_by_twos": pd.Series(range(2, 202, 2)),
        "one_hundred_squares": pd.Series(range(1, 101)) ** 2,
    }
)