# 08 Demo: Python Time
UW Geospatial Data Analysis  
CEE467/CEWA567  
David Shean  

## Introduction
* https://csit.kutztown.edu/~schwesin/fall20/csc223/lectures/Pandas_Time_Series.html
* Several libraries offer their own datetime types (Python, NumPy, Pandas); converting between them is easy
* https://en.wikipedia.org/wiki/Second

### Python `datetime`
* The built-in `datetime` module contains a `datetime` class (and a `timedelta` class); the module and the class share a name, which can be confusing
* https://docs.python.org/3/library/datetime.html

### NumPy `datetime64`
* https://numpy.org/doc/stable/reference/arrays.datetime.html
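
A minimal sketch of the NumPy types, assuming only that `numpy` is installed. The specific dates are arbitrary examples, not values from the course data.

```python
import numpy as np

# A single datetime64 value; the unit (days, "D") is inferred from the string
d = np.datetime64("2022-02-22")

# Subtracting two datetime64 values gives a timedelta64
delta = np.datetime64("2022-02-26") - d

# arange builds a regular array of dates (the end date is exclusive)
days = np.arange("2022-02-22", "2022-02-26", dtype="datetime64[D]")

print(d.dtype)
print(delta)
print(days)
```

Unlike the Python `datetime` class, a `datetime64` array stores every element with the same fixed unit, which is what makes vectorized date arithmetic possible.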

### Pandas `Timestamp`
* https://pandas.pydata.org/docs/user_guide/timeseries.html
* https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html
* https://pandas.pydata.org/docs/user_guide/timeseries.html#overview
* `DatetimeIndex`
* `pd.to_datetime()`
    * Accepts "int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like"
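
As a rough illustration of how flexible `pd.to_datetime()` is, the sketch below passes it a string, a Python `datetime`, a list, and an integer. The input values are made up for the example.

```python
from datetime import datetime

import pandas as pd

# A single string or datetime returns a scalar Timestamp
ts_from_str = pd.to_datetime("2022-02-22 12:00")
ts_from_dt = pd.to_datetime(datetime(2022, 2, 22, 12))

# A list of strings returns a DatetimeIndex
idx = pd.to_datetime(["2022-02-22", "2022-02-23", "2022-02-24"])

# Numbers are counted from the Unix epoch; `unit` says what they count
ts_from_int = pd.to_datetime(86400, unit="s")

print(ts_from_str, ts_from_dt)
print(idx)
print(ts_from_int)
```

Scalar inputs come back as a `Timestamp`, while list-like inputs come back as a `DatetimeIndex`.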

### xarray
* https://xarray.pydata.org/en/stable/user-guide/time-series.html

### Day of calendar year
* January 1 = 1
* January 2 = 2
* December 31 = 365 (366 in a leap year)
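
A short sketch of looking up the day of year with both the standard library and Pandas. The years 2022 (common) and 2020 (leap) are chosen only to show the difference on December 31.

```python
from datetime import datetime

import pandas as pd

# Standard library: tm_yday on the time tuple
doy_common = datetime(2022, 12, 31).timetuple().tm_yday
doy_leap = datetime(2020, 12, 31).timetuple().tm_yday

# Pandas: the dayofyear attribute on a Timestamp
doy_jan1 = pd.Timestamp("2022-01-01").dayofyear

print(doy_common, doy_leap, doy_jan1)
```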

### Water year
* Starts October 1, ends September 30 (US convention; labeled by the calendar year in which it ends)
* Southern hemisphere?
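
One possible way to assign a water year, sketched under the US convention that the water year is labeled by the calendar year in which it ends. The helper name `water_year` and its `start_month` parameter are hypothetical, added here so a different start month could be tried for other regions.

```python
import numpy as np
import pandas as pd


def water_year(ts, start_month=10):
    """Return the water year for a single timestamp (hypothetical helper)."""
    ts = pd.Timestamp(ts)
    # Dates on or after the start month belong to the next labeled year
    return ts.year + 1 if ts.month >= start_month else ts.year


# Vectorized version of the same rule for a DatetimeIndex
idx = pd.to_datetime(["2021-09-30", "2021-10-01", "2022-09-30", "2022-10-01"])
wy = np.where(idx.month >= 10, idx.year + 1, idx.year)

print(water_year("2021-10-01"))
print(wy)
```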

### Time zones
* Let Pandas handle this
* You will inevitably get a warning about timezone aware vs. naive Timestamp objects
    * Add time zone: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.tz_localize.html
    * Remove time zone: https://stackoverflow.com/a/34687479
* General advice (time and timestamps are messy): https://www.youtube.com/watch?v=-5wpm-gesOY&ab_channel=Computerphile
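
A minimal sketch of adding, converting, and removing a time zone on a Pandas `Timestamp`. The choice of UTC and `America/Los_Angeles` is an assumption for illustration.

```python
import pandas as pd

# A naive Timestamp carries no time zone information
naive = pd.Timestamp("2022-02-26 01:55:25")

# Add a time zone: declare that the naive value is in UTC
utc = naive.tz_localize("UTC")

# Convert to another zone: same instant, different wall-clock time
pacific = utc.tz_convert("America/Los_Angeles")

# Remove the time zone, keeping the local wall-clock time
back_to_naive = pacific.tz_localize(None)

print(naive.tz, utc, pacific, back_to_naive, sep="\n")
```

`tz_localize` attaches or strips a zone without shifting the clock reading, while `tz_convert` shifts the clock reading but keeps the same instant.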

## Discussion
* (t,x,y,z) records for one or more variables
* Pandas `Timestamp` vs. Python `datetime` vs. NumPy `datetime64`
    * Some functions across different modules play nicely with one and not the other
* Dealing with missing values in DataFrame
    * Sometimes sensors fail or datalogger fails, sometimes values are flagged as erroneous
    * Pandas has excellent support for missing values: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
    * `dropna()` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna
* Trajectories
    * Argo floats (https://argo.ucsd.edu/)
    * Weather balloons
    * GNSS tracks - vehicles, pedestrians, aircraft
        * Spatial and temporal derivatives
* Permanent stations
    * Stream gage
    * SNOTEL sites
* How big is too big for Pandas/GeoPandas?
    * https://github.com/toddwschneider/nyc-taxi-data
    * PostgreSQL/PostGIS
        * SQL - Structured Query Language, used for managing data in a relational database
* What to do with multiple variables for each timestamp?
    * xarray works well for multiple variables (e.g., snow depth and SWE for same site) for each station for each time
        * https://docs.xarray.dev/en/stable/
    * Separate 2D dataframes
        * One storing locations of all sites
        * One storing time series of some variable for all sites
        * Common station ID as key
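
The two-table layout and the missing-value handling discussed above can be sketched as follows. The station IDs, coordinates, and snow depth values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical site table: one row per station, indexed by station ID
sites = pd.DataFrame(
    {"lat": [46.78, 47.44], "lon": [-121.75, -121.42]},
    index=pd.Index(["A", "B"], name="station_id"),
)

# Hypothetical time series table: one column per station, indexed by time
times = pd.date_range("2022-02-22", periods=4, freq="D")
snow_depth = pd.DataFrame(
    {"A": [1.20, np.nan, 1.35, 1.30], "B": [0.80, 0.85, np.nan, np.nan]},
    index=times,
)

# Keep only time steps where every station reported a value
complete = snow_depth.dropna()

# Per-station mean (NaN is skipped by default), joined on station ID
summary = sites.join(snow_depth.mean().rename("mean_depth"))

print(complete)
print(summary)
```

The station ID is the index of `sites` and the column label of `snow_depth`, so it serves as the common key for the join.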

In [1]:
from datetime import datetime
import pandas as pd
import numpy as np

In [2]:
datetime?

Init signature: datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           /srv/conda/envs/notebook/lib/python3.9/datetime.py
Type:           type
Subclasses:     ABCTimestamp, _NaT


In [3]:
dt1 = datetime(2022, 2, 22)
dt2 = datetime.now()

In [4]:
print(dt2)

2022-02-26 01:55:25.626477


In [5]:
dt2

datetime.datetime(2022, 2, 26, 1, 55, 25, 626477)

In [6]:
dt2.year

2022

In [7]:
dt2.strftime?

Docstring: format -> strftime() style string.
Type:      builtin_function_or_method


#### Side note: formatting timestamp strings

In [8]:
# YYYYMMDD is best: sorting the strings as text also sorts them chronologically
dt2.strftime('%Y%m%d')

'20220226'

In [9]:
# MMDDYYYY won't sort chronologically when the strings are sorted as text
dt2.strftime('%m%d%Y')

'02262022'
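
To make the sorting point concrete, here is a small check with three arbitrary dates already in chronological order. Sorting the formatted strings as text preserves that order only for the year-first format.

```python
from datetime import datetime

# Three example dates, already in chronological order
dates = [datetime(2021, 12, 1), datetime(2022, 1, 15), datetime(2022, 2, 26)]

ymd = [d.strftime("%Y%m%d") for d in dates]
mdy = [d.strftime("%m%d%Y") for d in dates]

# True if a plain text sort leaves the chronological order unchanged
ymd_sorted_ok = sorted(ymd) == ymd
mdy_sorted_ok = sorted(mdy) == mdy

print(ymd_sorted_ok, mdy_sorted_ok)
```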

In [10]:
dt_diff = dt2 - dt1

In [11]:
dt_diff

datetime.timedelta(days=4, seconds=6925, microseconds=626477)

In [12]:
dt_diff.total_seconds()

352525.626477

#### How many seconds in a day?  In a year?
* A year is approximately `pi * 10^7` seconds
* What is a second anyway?

In [27]:
60*60*24

86400
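
The `pi * 10^7` rule of thumb can be checked the same way. This sketch assumes a mean year of 365.25 days, which is one of several reasonable definitions of a year.

```python
import math

seconds_per_day = 60 * 60 * 24

# Assumption: a mean (Julian) year of 365.25 days
seconds_per_year = 365.25 * seconds_per_day

# Ratio of the computed value to the pi * 10^7 approximation
ratio = seconds_per_year / (math.pi * 1e7)

print(seconds_per_year, ratio)
```

The approximation is within about half a percent of the computed value.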

In [14]:
dt2

datetime.datetime(2022, 2, 26, 1, 55, 25, 626477)

In [15]:
pd.to_datetime(dt2)

Timestamp('2022-02-26 01:55:25.626477')
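
A brief sketch of moving between the three representations. The variable names are illustrative only.

```python
from datetime import datetime

import numpy as np
import pandas as pd

py_dt = datetime(2022, 2, 26, 1, 55, 25)

# Python datetime -> Pandas Timestamp
ts = pd.Timestamp(py_dt)

# Pandas Timestamp -> NumPy datetime64 and back to Python datetime
np_dt64 = ts.to_datetime64()
back_to_py = ts.to_pydatetime()

# NumPy datetime64 -> Pandas Timestamp
ts_from_np = pd.Timestamp(np.datetime64("2022-02-26T01:55:25"))

print(type(ts), type(np_dt64), type(back_to_py))
```

`Timestamp` is a subclass of the Python `datetime` class, so `isinstance(ts, datetime)` is true and many functions that expect a `datetime` will accept a `Timestamp` directly.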

In [16]:
ts1 = pd.Timestamp('2019-02-01 12:00:00')

In [17]:
ts2 = pd.Timestamp('2019-02-06 00:00:00')

In [18]:
ts1

Timestamp('2019-02-01 12:00:00')

In [19]:
ts2

Timestamp('2019-02-06 00:00:00')

In [20]:
dt = ts2 - ts1

In [21]:
dt

Timedelta('4 days 12:00:00')

In [22]:
ts1

Timestamp('2019-02-01 12:00:00')

In [23]:
ts1 + dt

Timestamp('2019-02-06 00:00:00')

In [24]:
ts2 + dt

Timestamp('2019-02-10 12:00:00')

In [25]:
ts1 - pd.Timedelta(days=1)

Timestamp('2019-01-31 12:00:00')

In [26]:
dt.total_seconds()

388800.0
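
Besides `total_seconds()`, a `Timedelta` can be expressed in other units by dividing it by a reference `Timedelta`. The sketch below reuses the two timestamps from the cells above.

```python
import pandas as pd

ts1 = pd.Timestamp("2019-02-01 12:00:00")
ts2 = pd.Timestamp("2019-02-06 00:00:00")
dt = ts2 - ts1

# Dividing by a reference Timedelta gives a float in that unit
hours = dt / pd.Timedelta(hours=1)
days = dt / pd.Timedelta(days=1)

# The components are also available separately as integers
whole_days = dt.days
leftover_seconds = dt.seconds

print(hours, days, whole_days, leftover_seconds)
```

Note that `dt.seconds` holds only the part of the interval left over after whole days, so it is not the same as `dt.total_seconds()`.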