# Basic time series

# 7.1 Timestamps
`pandas` represents an instant in time using the `pandas.Timestamp` class. For example:

In [3]:
import pandas as pd

# create a timestamp
pd.Timestamp(year=2020, month=10, day=18, hour=13, minute=1, second=15)

Timestamp('2020-10-18 13:01:15')
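A `Timestamp` can also be built from a date string, and its components can be read back as attributes. The following is a minimal sketch of that idea; the variable name `ts` is only illustrative.

```python
import pandas as pd

# Build the same instant from a date string instead of keyword arguments
ts = pd.Timestamp("2020-10-18 13:01:15")

# Individual components are available as attributes
print(ts.year, ts.month, ts.day)
print(ts.hour, ts.minute, ts.second)

# day_name() returns the weekday as a string
print(ts.day_name())
```

Both ways of constructing the timestamp should give the same object, so you can use whichever form matches how your dates are stored.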

When we store multiple `Timestamp` objects in a `pd.Series` (for example, when we have a column of dates), the data type of the column is set to `datetime64[ns]`:

In [4]:
# Notice the dtype of the column is datetime64
pd.Series([pd.Timestamp(2020, 10, 18),
           pd.Timestamp(2020, 10, 17),
           pd.Timestamp(2020, 10, 16)
          ])

0   2020-10-18
1   2020-10-17
2   2020-10-16
dtype: datetime64[ns]
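In practice, dates often arrive as text rather than as `Timestamp` objects. As a hedged sketch, `pd.to_datetime` can convert such a column to a datetime dtype, after which the `.dt` accessor exposes date components for the whole `Series`. The names `date_strings` and `dates` are illustrative, and the exact datetime resolution reported may depend on your pandas version.

```python
import pandas as pd

# Dates stored as text are not yet timestamps
date_strings = pd.Series(["2020-10-18", "2020-10-17", "2020-10-16"])

# Convert the text values to a datetime dtype
dates = pd.to_datetime(date_strings)
print(dates.dtype)

# The .dt accessor gives element-wise access to datetime components
print(dates.dt.day.tolist())
```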

This is enough to get us started!

# 7.2 Data: Precipitation in Boulder, CO
To illustrate some basic time series functionality, we’ll use hourly precipitation data from Boulder County, Colorado, covering 2000 to 2014. In September 2013, an unusual weather pattern led to *some of the most intense precipitation ever recorded in this region*, causing devastating floods throughout the Colorado Front Range. Our goal is to visualize the 2013 precipitation data and identify this unusual weather event.

This data was obtained via the [National Oceanic and Atmospheric Administration (NOAA) Climate Data Online service](https://www.ncdc.noaa.gov/cdo-web/). The dataset is a CSV file and can be accessed at [this link](https://www.ncei.noaa.gov/orders/cdo/3488381.csv). You can [view the full documentation here](https://www.ncei.noaa.gov/pub/data/cdo/documentation/PRECIP_HLY_documentation.pdf).

The following is a summary of the column descriptions:

- **STATION**: identification number identifying the station.
- **STATION_NAME**: optional field, name identifying the station location.
- **DATE**: the date and time of the record, written as the year (4 digits), month (2 digits), and day of the month (2 digits), followed by a space and the local time of observation as a two-digit hour, a colon (:), and a two-digit minute, which is always 00 in this dataset (for example, `20000101 01:00`). Note: each data value is for the hour ending at the time specified here. Hour 00:00 is listed as the first hour of each date; however, since the value is by definition an accumulation over the previous 60 minutes, that precipitation actually occurred on the previous day.
- **HPCP**: the amount of precipitation, in inches, recorded at the station for the hour ending at the time specified in DATE. The value 999.99 means the data value is missing. Hours with no precipitation are not shown.
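The column descriptions above translate fairly directly into pandas operations. The sketch below uses a small in-memory sample (three rows copied from the beginning of the dataset, with only some of its columns) rather than the full file. It assumes the `DATE` layout corresponds to the format string `%Y%m%d %H:%M`; the column name `HOUR_START` is hypothetical and added only for illustration.

```python
import io

import numpy as np
import pandas as pd

# A few rows in the documented layout (a small sample, not the full dataset)
sample = io.StringIO(
    "STATION,DATE,HPCP\n"
    "COOP:055881,20000101 00:00,999.99\n"
    "COOP:055881,20000101 01:00,0.0\n"
    "COOP:055881,20000102 20:00,0.0\n"
)
df = pd.read_csv(sample)

# Parse DATE using its documented layout: YYYYMMDD, a space, then HH:MM
df["DATE"] = pd.to_datetime(df["DATE"], format="%Y%m%d %H:%M")

# 999.99 is the missing-data code, so replace it with NaN
df["HPCP"] = df["HPCP"].replace(999.99, np.nan)

# Each value covers the hour ending at DATE; subtracting one hour
# gives the start of that interval (HOUR_START is a hypothetical name)
df["HOUR_START"] = df["DATE"] - pd.Timedelta(hours=1)

print(df)
```

Note how the 00:00 record on January 1, 2000 has an interval start of 23:00 on the previous day, which matches the note in the `DATE` description.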

# 7.3 Data preparation
Let’s start by reading in the data and taking a look at it.

In [6]:
# read in data 
url = 'https://raw.githubusercontent.com/carmengg/eds-220-book/main/data/boulder_colorado_2013_hourly_precipitation.csv'
precip = pd.read_csv(url)

# check df's head
precip.head()

| Unnamed: 0 | STATION | STATION_NAME | DATE | HPCP | Measurement Flag | Quality Flag |
|---|---|---|---|---|---|---|
| 0 | COOP:055881 | NEDERLAND 5 NNW CO US | 20000101 00:00 | 999.99 | ] | |
| 1 | COOP:055881 | NEDERLAND 5 NNW CO US | 20000101 01:00 | 0.0 | g | |
| 2 | COOP:055881 | NEDERLAND 5 NNW CO US | 20000102 20:00 | 0.0 | | q |
| 3 | COOP:055881 | NEDERLAND 5 NNW CO US | 20000103 01:00 | 0.0 | | q |
| 4 | COOP:055881 | NEDERLAND 5 NNW CO US | 20000103 05:00 | 0.0 | | q |
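The `Unnamed: 0` column in this output is the name pandas gives to a column whose header is empty. Assuming that is what happened here (the file's first column appears to be a saved row index with no header), one possible way to handle it is to pass `index_col=0` to `pd.read_csv`. The sketch below shows the difference on a small in-memory stand-in for the file; `csv_text`, `default`, and `indexed` are illustrative names.

```python
import io

import pandas as pd

# Stand-in for the CSV file: the header starts with a comma,
# meaning the first column has no name
csv_text = io.StringIO(
    ",STATION,DATE,HPCP\n"
    "0,COOP:055881,20000101 00:00,999.99\n"
    "1,COOP:055881,20000101 01:00,0.0\n"
)

# Default read: the unnamed column becomes a regular column called "Unnamed: 0"
default = pd.read_csv(csv_text)
print(default.columns.tolist())

# index_col=0 uses that first column as the row index instead
csv_text.seek(0)
indexed = pd.read_csv(csv_text, index_col=0)
print(indexed.columns.tolist())
```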
