In [1]:
import pandas as pd
import numpy as np

## 1. A simple problem in hydrology

How many days elapsed between the start of Water Year 1996 (October 1, 1995) and the end of Water Year 2015 (September 30, 2015)?

### The hard way...

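There are 20 water years in our date range. Each has 365 days, except for leap years. A water year contains the February of the calendar year it is named for, so Water Year 1996 includes February 29, 1996. In our range the following were leap years: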
* 1996
* 2000 (a special case: century years are leap years only if divisible by 400)
* 2004
* 2008
* 2012

So our answer is:

$N_{days} = N_{years} \times 365 + 5 \times 1$

In [2]:
Ndays = 20*365 + 5*1
print(Ndays)

7305
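
As a sanity check on the hand count, the leap years can also be listed with Python's standard library. This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import calendar

# Water years 1996 through 2015 (range excludes its upper bound)
years = range(1996, 2016)

# calendar.isleap applies the Gregorian leap-year rule
leap_years = [y for y in years if calendar.isleap(y)]
print(leap_years)

# 365 days per year plus one extra day per leap year
n_days = len(years) * 365 + len(leap_years)
print(n_days)
```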


### A more generic way...
The above is the correct answer, but it is very specific and required that I count the leap years by hand, which is error prone. A more generic approach defines a function that determines how many days are in any given year based on whether the year is modulo divisible* by 4, 100, and 400:

\* Note that "modulo divisible" means that one whole number divides another with no remainder. In Python, `a % b` returns the remainder, so the test is `a % b == 0`.
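
To make the footnote concrete, here are a few remainders computed with `%`. The years are chosen purely for illustration; the Gregorian calendar's leap-year rule tests divisibility by 4, 100, and 400.

```python
# a % b gives the remainder left after dividing a by b
print(2015 % 4)    # remainder 3, so 2015 is not divisible by 4
print(2012 % 4)    # remainder 0, so 2012 is divisible by 4
print(1900 % 400)  # remainder 300, so 1900 is not divisible by 400
print(2000 % 400)  # remainder 0, so 2000 is divisible by 400

# Divisibility test: compare the remainder with zero
is_divisible = (2000 % 400 == 0)
print(is_divisible)
```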

In [5]:
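def DaysInYear(Year):
    # Gregorian rule: every 4th year is a leap year, except century
    # years, which are leap years only if divisible by 400
    if Year % 4 == 0:
        if Year % 100 == 0:
            if Year % 400 == 0:
                Ndays_year = 366  # e.g. 2000
            else:
                Ndays_year = 365  # e.g. 1900
        else:
            Ndays_year = 366
    else:
        Ndays_year = 365

    return Ndays_year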

StartYear = 1996
EndYear = 2015

Ndays = 0

for ThisYear in np.arange(StartYear,EndYear+1):
    Ndays += DaysInYear(ThisYear)
    
print(Ndays)

7305
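
Century years are where a leap-year function is most likely to go wrong, so they make good test cases. The sketch below uses a hypothetical helper, `days_in_year`, written for this example with the Gregorian rule (divisible by 4, except centuries not divisible by 400), and compares it with `calendar.isleap` from the standard library.

```python
import calendar

def days_in_year(year):
    """Hypothetical helper: days in a Gregorian calendar year."""
    is_leap = (year % 4 == 0) and (year % 100 != 0 or year % 400 == 0)
    return 366 if is_leap else 365

# Century years exercise every branch of the rule
for year in (1900, 2000, 2100, 2400):
    print(year, days_in_year(year))

# Cross-check against the standard library over a wide range of years
mismatches = [y for y in range(1600, 2501)
              if (days_in_year(y) == 366) != calendar.isleap(y)]
print(mismatches)
```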


This also produces the correct answer. It is also more generic, because the same function and loop give the number of days in any span of water years:

In [10]:
StartYear = 1950
EndYear = 2020

Ndays = 0

for ThisYear in np.arange(StartYear,EndYear+1):
    Ndays += DaysInYear(ThisYear)
    
print(f'Number of days in water years {StartYear} through {EndYear}: {Ndays}')

Number of days in water years 1950 through 2020: 25933


You can change the start and end year in the above to verify this works.
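
If this calculation is needed often, the loop can be wrapped in a function. The name `days_in_water_years` is hypothetical, and the sketch leans on `calendar.isleap` rather than the notebook's own `DaysInYear` so that it runs on its own.

```python
import calendar

def days_in_water_years(start_year, end_year):
    """Hypothetical helper: total days in water years start_year..end_year, inclusive."""
    # Water year Y includes February of calendar year Y, so it has
    # the same number of days as calendar year Y
    return sum(366 if calendar.isleap(y) else 365
               for y in range(start_year, end_year + 1))

print(days_in_water_years(1996, 2015))
print(days_in_water_years(1950, 2020))
```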

### The easier way...

![Someoneelse](someoneelse.gif)

Clearly this problem must be common among hydrologists, as well as anyone who works in a field where the specific dates and times of events are significant. Ecology, climate science, aerospace engineering, and finance are all such fields. So someone has surely developed tools for the things we need to do with time, right? **They have!!!**

The Pandas library has a suite of tools that operate on dates and times stored as `datetime64` values (a single value is wrapped in a pandas `Timestamp`). This allows us to do powerful things in comparatively few lines of code...

In [19]:
StartDate = pd.to_datetime('1995-10-01 00:00')
EndDate = pd.to_datetime('2015-10-01 00:00')

print(EndDate - StartDate)

7305 days 00:00:00


Wow! That was easy. Not only that, but we are working with something that is unambiguously a date and time. We passed the string `1995-10-01 00:00`, which is midnight on October 1, 1995, to the pandas function `pd.to_datetime()`, and it returned a `Timestamp` (backed by `datetime64`) representing that date. We did the same for midnight on October 1, 2015, the instant at which our range ends. Then we took the difference between the end and the beginning using the `-` operator, and pandas returned the elapsed time in a readily interpretable unit (days).
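
The result of the subtraction is a pandas `Timedelta`, which can be converted to plain numbers when needed. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

start = pd.to_datetime('1995-10-01 00:00')
end = pd.to_datetime('2015-10-01 00:00')

elapsed = end - start
print(type(elapsed).__name__)

# Whole days as an integer
n_days = elapsed.days
print(n_days)

# Dividing by another Timedelta converts to any unit, here hours
n_hours = elapsed / pd.Timedelta(hours=1)
print(n_hours)
```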

But this isn't the only way we could have done this...

In [24]:
DatesDuringRange = pd.date_range(start='1995-10-01',end='2015-09-30')

print(DatesDuringRange)

DatetimeIndex(['1995-10-01', '1995-10-02', '1995-10-03', '1995-10-04',
               '1995-10-05', '1995-10-06', '1995-10-07', '1995-10-08',
               '1995-10-09', '1995-10-10',
               ...
               '2015-09-21', '2015-09-22', '2015-09-23', '2015-09-24',
               '2015-09-25', '2015-09-26', '2015-09-27', '2015-09-28',
               '2015-09-29', '2015-09-30'],
              dtype='datetime64[ns]', length=7305, freq='D')


In [38]:
DatesDuringRange.size

7305
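
Note that `pd.date_range` includes both endpoints, which is why this range ends on September 30 while the subtraction above used October 1. The sketch below shows the off-by-one; variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('1995-10-01')
last_day = pd.Timestamp('2015-09-30')

# date_range counts both the first and the last day
n_dates = len(pd.date_range(start=start, end=last_day))

# Subtraction measures elapsed time between two instants (midnight to midnight)
n_elapsed = (last_day - start).days

print(n_dates, n_elapsed)
```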

One advantage of this approach is that we now have a `DatetimeIndex`, a vector of `datetime64` values, that we could associate with some measurement or quantity of interest, like discharge or groundwater level. We can query any date in that vector the way we would with any other vector...

In [25]:
DatesDuringRange[0]

Timestamp('1995-10-01 00:00:00', freq='D')

In [26]:
DatesDuringRange[500]

Timestamp('1997-02-12 00:00:00', freq='D')

In [27]:
DatesDuringRange[5624]

Timestamp('2011-02-23 00:00:00', freq='D')
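
To illustrate pairing these dates with a measurement, the sketch below builds a `pd.Series` indexed by the dates. The values are synthetic placeholders (a simple counter), not real discharge data, and the names are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.date_range(start='1995-10-01', end='2015-09-30')

# Synthetic stand-in for a daily measurement: 0, 1, 2, ...
values = np.arange(dates.size, dtype=float)
series = pd.Series(values, index=dates)

# Look up a value by date instead of by integer position
value_on_date = series.loc[pd.Timestamp('1997-02-12')]
print(value_on_date)

# Slice by a date range (both endpoints are included)
calendar_1996 = series.loc['1996-01-01':'1996-12-31']
print(len(calendar_1996))
```

Because the synthetic values equal the integer position, the lookup for February 12, 1997 returns the same position used in the positional query above.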

In [29]:
DatetimeDuringRange = pd.date_range(start='1995-10-01',end='2015-09-30',freq='1H')
print(DatetimeDuringRange)

DatetimeIndex(['1995-10-01 00:00:00', '1995-10-01 01:00:00',
               '1995-10-01 02:00:00', '1995-10-01 03:00:00',
               '1995-10-01 04:00:00', '1995-10-01 05:00:00',
               '1995-10-01 06:00:00', '1995-10-01 07:00:00',
               '1995-10-01 08:00:00', '1995-10-01 09:00:00',
               ...
               '2015-09-29 15:00:00', '2015-09-29 16:00:00',
               '2015-09-29 17:00:00', '2015-09-29 18:00:00',
               '2015-09-29 19:00:00', '2015-09-29 20:00:00',
               '2015-09-29 21:00:00', '2015-09-29 22:00:00',
               '2015-09-29 23:00:00', '2015-09-30 00:00:00'],
              dtype='datetime64[ns]', length=175297, freq='H')


In [58]:
# Group the hourly timestamps by calendar date, then count the distinct dates
len(DatetimeDuringRange.groupby(DatetimeDuringRange.date))

7305
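
Another way to count the distinct calendar dates in an hourly index, sketched here as an alternative to `groupby`, is to truncate each timestamp to midnight with `normalize()` and count the unique values. The frequency is passed as a `pd.Timedelta` to sidestep differences in the hourly alias string between pandas versions.

```python
import pandas as pd

hourly = pd.date_range(start='1995-10-01', end='2015-09-30',
                       freq=pd.Timedelta(hours=1))

# normalize() sets the time of day to 00:00 for every timestamp
n_unique_dates = hourly.normalize().nunique()

print(len(hourly), n_unique_dates)
```

The hourly range stops at midnight on September 30, so that final date contributes a single timestamp but still counts as one distinct date.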

In [34]:
MonthsDuringRange = pd.date_range(start='1995-10-01',end='2015-09-30',freq='1MS')
print(MonthsDuringRange)

DatetimeIndex(['1995-10-01', '1995-11-01', '1995-12-01', '1996-01-01',
               '1996-02-01', '1996-03-01', '1996-04-01', '1996-05-01',
               '1996-06-01', '1996-07-01',
               ...
               '2014-12-01', '2015-01-01', '2015-02-01', '2015-03-01',
               '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
               '2015-08-01', '2015-09-01'],
              dtype='datetime64[ns]', length=240, freq='MS')
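
The monthly index offers one more route to the same total: there are 20 × 12 = 240 month starts, and summing the length of each month recovers the number of days. A minimal sketch using the `days_in_month` attribute of a `DatetimeIndex`, with illustrative variable names:

```python
import pandas as pd

months = pd.date_range(start='1995-10-01', end='2015-09-30', freq='MS')

# One entry per month start: 20 water years x 12 months
n_months = len(months)

# days_in_month gives the length of each month; convert to NumPy and sum
n_days = int(months.days_in_month.to_numpy().sum())

print(n_months, n_days)
```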
