This repository and project have a few personal and functional objectives.

### Personal
1. Develop the habit of using git and version control while I'm developing and writing functions
2. Teach myself how to write useful unit tests
3. Learn about Flask and deploying an API

### Functional
1. Write an API that can return the number of days, weekdays and complete weeks between two arbitrary dates with
timezones and return the result in any specified time unit (seconds, minutes, hours, days, years)
2. Make the solution easy to deploy (possibly a Docker container)
3. Once written in Python, reimplement in PHP (separate repo)

The purpose of this Python notebook is to document the development process (outside of commits) and to hold small
tests of code as well as design decisions.

### The breakdown
It's always good to have a plan of how to tackle the problem and break it up into small chunks.

1. Core functionality. Finding days between two dates is easy; weekdays and complete weeks will need some thought
2. Write unit tests to validate the core functionality. Generate a set of dates and potential problem values,
check their results on some publicly available calculator (https://www.timeanddate.com/date/workdays.html), and use these
values as the correct behaviour
3. Write a parsing function to validate that the strings fed into the API generate valid aware datetime objects
4. Write unit tests to validate the parsing function with a few inputs to make sure it handles them gracefully
5. Implement the Flask API and function routing
6. Write up some documentation

I expect I will get stuck on determining an algorithm for step 1.

In [13]:
# First we'll define some arbitrary strings to test the function

time1 = '2020-09-14 00:00:00+08:30' # same local time in two different timezones
time2 = '2020-09-14 00:00:00+00:00' # same local time in two different timezones
time3 = '2016-02-29 00:00:00+00:00' # Leap year
time4 = '2017-11-23 13:00:50+00:00' # random date with a time

Let's check the web for what it thinks the values of these should be:
1. time1-time2 is 8 hours, 30 minutes
2. time1-time3 is 1658 days, 14 hours, 30 minutes
3. time1-time4 is 1025 days, 1 hour, 29 minutes and 10 seconds

In [14]:
import datetime as dt
# Let's write the function to calculate the days between the dates
def days_between_dates(date1,date2):
    date1 = dt.datetime.fromisoformat(date1)
    date2 = dt.datetime.fromisoformat(date2)
    timedelta = date2-date1
    return timedelta

print(days_between_dates(time1, time2))
print(days_between_dates(time1, time3))
print(days_between_dates(time1, time4))

8:30:00
-1659 days, 8:30:00
-1026 days, 21:30:50


So there are some issues to resolve and debug, but the start is there. We've got the start of our unit tests, as
well as something worth committing. We'll move this function to its own file now and just import it into the
notebook for testing.
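As a rough sketch of how the printed values above could become unit tests, the block below wraps two of them in a `unittest.TestCase`. The class name `TestDaysBetweenDates` is hypothetical and the function is repeated only so the example is self-contained. The expected values are the ones Python printed above, not the website's figures.

```python
import datetime as dt
import unittest


def days_between_dates(date1, date2):
    """Return date2 - date1 as a timedelta (same logic as the cell above)."""
    return dt.datetime.fromisoformat(date2) - dt.datetime.fromisoformat(date1)


class TestDaysBetweenDates(unittest.TestCase):  # hypothetical test class
    time1 = '2020-09-14 00:00:00+08:30'
    time2 = '2020-09-14 00:00:00+00:00'
    time3 = '2016-02-29 00:00:00+00:00'

    def test_offset_only(self):
        # Same wall-clock time, UTC offsets 8 hours 30 minutes apart
        self.assertEqual(days_between_dates(self.time1, self.time2),
                         dt.timedelta(hours=8, minutes=30))

    def test_earlier_second_date_is_negative(self):
        # Printed above as "-1659 days, 8:30:00"
        self.assertEqual(days_between_dates(self.time1, self.time3),
                         dt.timedelta(days=-1659, hours=8, minutes=30))


# Load and run explicitly so the sketch also works inside a notebook cell
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDaysBetweenDates)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Once the function lives in its own file, the same test class could import it instead of redefining it.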

In [15]:
from api_date_functions import *
print(time_between_dates(time1, time2))
print(time_between_dates(time1, time3))
print(time_between_dates(time1, time4))

(0, 0, 0)
(1658, 1185, 1652)
(1025, 732, 1022)


Modified the function to take abs(timedelta).

The values look right, except that they are off by an hour. This could be a daylight saving difference
between how the web version and Python calculate the result. At any rate, we're calculating the number of
days correctly now. Next is to establish how many weekdays fall within a timespan, so we'll set up another set
of test values and have a play around.
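The negative results printed earlier can look odd because `timedelta` normalises itself so that only the `days` field is negative. A small sketch of how `abs()` turns `-1659 days, 8:30:00` into the span that was compared against the website (the variable names here are illustrative):

```python
import datetime as dt

# The value printed earlier for time1 vs time3
raw = dt.timedelta(days=-1659, hours=8, minutes=30)

# timedelta keeps the seconds field non-negative, so "-1659 days + 8:30"
# is really 1658 days and 15:30 in the other direction
span = abs(raw)

print(raw)        # -1659 days, 8:30:00
print(span)       # 1658 days, 15:30:00
print(span.days)  # 1658
print(span.seconds // 3600, (span.seconds % 3600) // 60)  # 15 30
```

The website reported 14 hours, 30 minutes for the same pair, which is the one-hour gap noted above.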

In [16]:
weekday1 = '2020-11-23 00:00:00+00:00'  #Monday
weekday2 = '2020-11-22 00:00:00+00:00'  #Sunday
weekday3 = '2020-11-30 00:00:00+00:00'  #Next Monday
weekday4 = '2020-12-14 00:00:00+00:00'  #Three weeks
weekday5 = '2020-12-15 00:00:00+00:00'  #Three weeks + 1 day

def weekdays_between_dates(date1,date2):
    date1 = dt.datetime.fromisoformat(date1)
    date2 = dt.datetime.fromisoformat(date2)
    # Find out what days the start and end date are
    print(date1.weekday(),date2.weekday())
    timedelta = abs(date1-date2)
    return timedelta

weekdays_between_dates(weekday1,weekday2)
weekdays_between_dates(weekday1,weekday4)

0 6
0 0


datetime.timedelta(days=21)

OK, we can establish what day of the week a time interval will start and end on, as represented by an integer.
The number of weekdays can be established by considering the start day, end day and total number of days.
Start by making a 7 x 1 array, where each element is the count of each weekday. Every 7 days should add 1 to every element.
We only need to consider the remainder days after dividing total days by 7 and ensure those remainder days
are added to the right elements. Then we can sum up the elements of the array that don't match a mask corresponding
to weekends.
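Before working out the arithmetic, a brute-force count can serve as a reference to check any faster version against. This is only a sketch: the name `weekdays_brute_force`, the `weekend` parameter, and the convention of counting the start day but not the end day are assumptions. It also looks only at the date part of each string and ignores times and offsets.

```python
import datetime as dt


def weekdays_brute_force(date1, date2, weekend=(5, 6)):
    """Count non-weekend dates in [start, end), walking one day at a time."""
    # Only the YYYY-MM-DD prefix is used; time and offset are ignored
    start = dt.date.fromisoformat(date1[:10])
    end = dt.date.fromisoformat(date2[:10])
    if start > end:  # order the dates so the direction doesn't matter
        start, end = end, start
    count = 0
    current = start
    while current < end:
        if current.weekday() not in weekend:  # 0 = Monday ... 6 = Sunday
            count += 1
        current += dt.timedelta(days=1)
    return count


print(weekdays_brute_force('2020-11-23', '2020-11-30'))  # Monday to next Monday: 5
print(weekdays_brute_force('2020-11-23', '2020-12-15'))  # three weeks + 1 day: 16
```

It is slow for long spans, but it is simple enough to trust when generating expected values for unit tests.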

In [17]:
def weekdays_between_dates(date1,date2):
    date1 = dt.datetime.fromisoformat(date1)
    date2 = dt.datetime.fromisoformat(date2)
    # Find out what days the start and end date are
    start_day = date1.weekday()
    end_day = date2.weekday()
    # Make the weekday mask, this will add some extensibility if we need to define weekends differently
    # for an upcoming 4 day work week at some point
    weekdays_mask = [0, 1, 2, 3, 4]  #5 and 6 correspond to saturday and sunday
    timedelta = abs(date1-date2)
    total_days = timedelta.days
    # Initialise a dictionary to track number of each weekday
    weekday_count = {0 : 0, 1 : 0, 2 : 0, 3 : 0, 4 : 0, 5 : 0, 6 : 0}
    total_weeks = total_days // 7
    remainder_days = total_days % 7
    for day in weekday_count:
        weekday_count[day] += total_weeks
        offset_day = remainder_days - start_day + day - 1
        print(offset_day)
    # Now workout the offset and use it with start_day to add to the correct elements in the dictionary
    print(weekday_count)

weekdays_between_dates(weekday1,weekday2)
weekdays_between_dates(weekday1,weekday5)
weekdays_between_dates(time1, time3)

0
1
2
3
4
5
6
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}
0
1
2
3
4
5
6
{0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}
5
6
7
8
9
10
11
{0: 236, 1: 236, 2: 236, 3: 236, 4: 236, 5: 236, 6: 236}


The function is collecting the right number of weeks and total days, but we need to think about how the start
day, end day, remainder and offset work so that, as we loop through the dictionary, we only add 1 to the right
elements. Upon consideration, the total number of extra days can be at most 6, so a 7-element array with 0 or 1
indicating which days need to be added to the total will work. Then we just need to rotate the values of that array
until the first extra day lines up with the integer representation of the start day.
In [18]:
def rotate_array(array,value):
    #rotates an array or list to the left by value, use a negative value to rotate to the right
    array = array[value:] + array[:value]
    return array

test = [1,1,1,0,0,0,0]
rotate_array(test,-3)
value = 2
test = [ 1 if i < value else 0 for i in range(7) ]

In [19]:
def weekdays_between_dates(date1,date2):
    date1 = dt.datetime.fromisoformat(date1)
    date2 = dt.datetime.fromisoformat(date2)
    # Find out what days the start and end date are
    start_day = date1.weekday()
    # Make the weekday mask, this will add some extensibility if we need to define weekends differently
    # for an upcoming 4 day work week at some point
    weekdays_mask = [0, 1, 2, 3, 4]  #5 and 6 correspond to saturday and sunday
    timedelta = abs(date1-date2)
    total_days = timedelta.days
    # Initialise a dictionary to track number of each weekday
    weekday_count = {0 : 0, 1 : 0, 2 : 0, 3 : 0, 4 : 0, 5 : 0, 6 : 0}
    total_weeks = total_days // 7
    remainder_days = total_days % 7
    # Need to create array for remaining weekdays and rotate it to line up to start day
    extra_days = [1 if i < remainder_days else 0 for i in range(7)]
    extra_days = extra_days[-start_day:] + extra_days[:-start_day]
    for day, extra_day in zip(weekday_count,extra_days):
        weekday_count[day] += total_weeks + extra_day
    # Now just sum up days that match the mask and return total value
    total_weekdays = 0
    for weekday in weekdays_mask:
        total_weekdays += weekday_count[weekday]
    return total_days, total_weekdays, total_weeks

weekdays_between_dates(weekday1,weekday2)
weekdays_between_dates(weekday1,weekday5)

(22, 16, 3)

The function is now outputting the total number of days, weeks and weekdays. We'll need to add a section to convert
the output into various time units. We'll also need to add a flag to indicate whether the end date is
inclusive or not.
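One possible shape for the inclusive flag, sketched under assumptions: the helper name `count_days` and the parameters `inclusive_end` and `weekend` are hypothetical. The two dates are also put in order first, so the start day is always taken from the earlier date. Both inputs need to be either aware or naive, since Python cannot compare a mix of the two.

```python
import datetime as dt


def count_days(date1, date2, inclusive_end=False, weekend=(5, 6)):
    """Return (total_days, weekdays, complete_weeks) between two ISO strings."""
    start, end = sorted((dt.datetime.fromisoformat(date1),
                         dt.datetime.fromisoformat(date2)))
    total_days = (end - start).days + (1 if inclusive_end else 0)
    total_weeks, remainder_days = divmod(total_days, 7)
    # Every complete week contributes one of each non-weekend day
    weekdays = total_weeks * (7 - len(weekend))
    # The leftover days begin on the same weekday as the start date
    for offset in range(remainder_days):
        if (start.weekday() + offset) % 7 not in weekend:
            weekdays += 1
    return total_days, weekdays, total_weeks


start, end = '2020-11-23 00:00:00+00:00', '2020-12-15 00:00:00+00:00'
print(count_days(start, end))                      # (22, 16, 3)
print(count_days(start, end, inclusive_end=True))  # (23, 17, 3)
```

Adding one day to the total is the simplest way to include the end date; whether the time of day should also matter is left open here.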

In [20]:
import numpy as np

weekday1 = '2020-11-23'  #Monday
weekday2 = '2020-11-22'  #Sunday
weekday3 = '2020-11-30'  #Next Monday
weekday4 = '2020-12-14'  #Three weeks
weekday5 = '2020-12-15'

print(np.busday_count(weekday2,weekday3))

weekdays_between_dates(weekday2,weekday3)[1]

5


5

Did some refactoring and forgot to write in the journal. Wrote a function to handle the report units. Just testing
it here.

In [21]:
year = dt.timedelta(days=365)

def time_report_units(timedelta, units='days'):
    if units == 'seconds':
        return timedelta.total_seconds()
    elif units == 'minutes':
        return timedelta.total_seconds()//60
    elif units == 'hours':
        return timedelta.total_seconds()//3600
    elif units =='days':
        return timedelta.days
    elif units == 'years':
        return timedelta.total_seconds()/31536000

print(time_report_units(year, units='days'))
print(time_report_units(year, units='minutes'))
print(time_report_units(year, units='hours'))
print(time_report_units(year, units='seconds'))
print(time_report_units(year, units='years'))

365
525600.0
8760.0
31536000.0
1.0
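
A table-driven variant is one way to keep the unit handling in one place and to fail loudly on an unknown unit. This is a sketch rather than the refactored function: the `SECONDS_PER_UNIT` name and the choice to return a fractional float for every unit are assumptions, and a year is taken as 365 days as in the cell above.

```python
import datetime as dt

# Seconds in each supported unit; a year is assumed to be 365 days
SECONDS_PER_UNIT = {
    'seconds': 1,
    'minutes': 60,
    'hours': 3600,
    'days': 86400,
    'years': 31536000,
}


def time_report_units(timedelta, units='days'):
    """Convert a timedelta to the requested unit as a float."""
    try:
        divisor = SECONDS_PER_UNIT[units]
    except KeyError:
        raise ValueError(f'Unsupported unit: {units!r}') from None
    return timedelta.total_seconds() / divisor


print(time_report_units(dt.timedelta(days=365), 'years'))  # 1.0
print(time_report_units(dt.timedelta(hours=36), 'days'))   # 1.5
```

Unlike the version above, this does not round minutes, hours or days down to whole numbers, so the caller would need to decide where rounding belongs.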


Let's write a function to process the datetime strings and ensure they are aware.


In [22]:
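def date_string_processor(date):
    try:
        date = dt.datetime.fromisoformat(date)
    except (TypeError, ValueError):
        # fromisoformat raises these for non-string or badly formatted input
        return 'Input Error. Please format date strings as YYYY-MM-DD'
    else:
        print('Hello')
        if date.tzinfo is not None and date.tzinfo.utcoffset(date) is not None:
            print('Aware datetime object')
    finally:
        # Runs on every path, including after the early return above
        print(date)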


date_string_processor(weekday2)


Hello
2020-11-22 00:00:00


In [30]:
from pytz import timezone
import pytz

utc = pytz.utc
utc.zone
eastern = timezone('US/Eastern')
eastern.zone
fmt = '%Y-%m-%d %H:%M:%S %Z%z'

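def date_string_processor(date, timezone='utc'):
    # NOTE: the timezone argument is not used yet; naive inputs are assumed to be UTC
    date = dt.datetime.fromisoformat(date)
    if date.tzinfo is None:
        date = utc.localize(date)  # attach UTC to a naive datetime
    else:
        print(date.strftime(fmt))
    return date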

date_string_processor(weekday1)
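
For comparison, the same idea can be sketched with only the standard library, treating naive inputs as UTC. The function name `to_aware_datetime` is hypothetical, and assuming UTC for inputs without an offset is a design choice rather than something this notebook has settled on.

```python
import datetime as dt


def to_aware_datetime(date_string):
    """Parse an ISO 8601 string and return a timezone-aware datetime.

    Naive inputs (no UTC offset) are assumed to be in UTC.
    """
    try:
        parsed = dt.datetime.fromisoformat(date_string)
    except (TypeError, ValueError):
        raise ValueError(
            'Input Error. Please format date strings as YYYY-MM-DD'
        ) from None
    if parsed.tzinfo is None or parsed.tzinfo.utcoffset(parsed) is None:
        parsed = parsed.replace(tzinfo=dt.timezone.utc)
    return parsed


print(to_aware_datetime('2020-11-23'))                 # 2020-11-23 00:00:00+00:00
print(to_aware_datetime('2020-09-14 00:00:00+08:30'))  # 2020-09-14 00:00:00+08:30
```

Raising an exception instead of returning an error string keeps the return type consistent, which should make it easier for the API layer to map bad input to an error response.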