# "BC dates in Python - Part 3 - FlexiDate"
> "An approach to expressing BC dates in Python using a version of FlexiDate"

- toc: false
- branch: master
- badges: false
- comments: true
- categories: [datascience, history, python, time, pandas]
- image: images/placeholder.png
- hide: true
- search_exclude: true

## Recap: The problem
Python's [datetime module](https://docs.python.org/3/library/datetime.html) has a [MINYEAR](https://docs.python.org/3/library/datetime.html#datetime.MINYEAR) of 1 AD, so it can't represent BC dates at all. We'll need a different solution.
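
A quick standard-library check of that limit:

```python
import datetime

# The smallest year the datetime module accepts
print(datetime.MINYEAR)  # → 1

# Year 0 (1 BC in astronomical numbering) and anything earlier is rejected
try:
    datetime.date(0, 1, 1)
except ValueError as err:
    print(f"ValueError: {err}")
```

The exact error message differs between CPython's C and pure-Python implementations of `datetime`, but both raise a `ValueError`.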

What are our requirements for a good solution? What functionality are we looking for?
- Expressing BC as well as AD dates
- Creating dates from strings and/or numeric parameters
- Printing dates
- Getters for the date components
- Adding/subtracting time spans to/from dates
- Getting the time span between two dates
- Lightweight objects
- Usable in pandas?

```python
#hide

#Imports
import importlib
import modules.flexidate as fd
importlib.reload(fd)
```


## 3. Custom made date classes
The best custom-made date class I've found is this `FlexiDate`-inspired one [here](https://github.com/okfn/datautil/blob/master/datautil/date.py) (which was in turn inspired by [this post](https://rufuspollock.com/2009/06/18/flexible-dates-in-python/)). Someone else wrote some tests for it [here](https://github.com/datopian/flexidate/blob/master/flexidate/test_flexidate.py), which I found useful for working out the intended usage.

It's a custom date format that supports BC dates as well as "imprecise" dates: for example, you can specify dates as "ca. 1905" or "BC 41?" (meaning sometime in the 410s BC). That sounds pretty exciting for historical research, especially since it still supports sorting.

Now, right off the bat, there are a few issues:
- FlexiDate was written for Python 2.x, and both Python and `dateutil` have changed since, so the code doesn't run as-is:
    - `unicode` has been renamed to `str`
    - By default, dates are parsed day-first, which differs from NumPy/pandas, so I changed it to parse month-first
    - `dateutil.parser.parser._parse` now returns a tuple instead of just the parsing result
- `dateutil.parser` has a quirk: if the input contains only a two-digit year, it assumes you mean a year in the second half of the 20th century or the first half of the 21st. That isn't helpful when dealing with dates from 99 BC to 99 AD. I feel like I should just replace that parser with custom parsing.
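
As a rough idea of what that custom parsing could look like, here's a minimal sketch using regular expressions. The function name `parse_year_month_day` and the two accepted formats are my own assumptions, not part of `FlexiDate`. BC years come back as negative numbers, matching how `FlexiDate` prints `144 BC` as `-0144`, and because there's no century heuristic, `44 BC` stays 44 BC.

```python
import re
from typing import Optional, Tuple

# Hypothetical strict formats (assumptions, not FlexiDate's own):
#   "YYYY[-MM[-DD]]" with an optional leading "-" for BC, e.g. "-0400-01-02"
#   "<year> BC|BCE|AD|CE", e.g. "44 BC"
_ISO_RE = re.compile(r"^(-?)(\d{1,4})(?:-(\d{2})(?:-(\d{2}))?)?$")
_ERA_RE = re.compile(r"^(\d{1,4})\s*(BC|BCE|AD|CE)$", re.IGNORECASE)


def parse_year_month_day(text: str) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (year, month, day); BC years are negative, missing parts are None.

    Years are taken literally: there is no two-digit-year century guessing.
    """
    text = text.strip()
    match = _ERA_RE.match(text)
    if match:
        year = int(match.group(1))
        is_bc = match.group(2).upper() in ("BC", "BCE")
        return (-year if is_bc else year), None, None
    match = _ISO_RE.match(text)
    if match:
        sign, year, month, day = match.groups()
        return ((-int(year) if sign else int(year)),
                int(month) if month else None,
                int(day) if day else None)
    raise ValueError(f"Unrecognised date: {text!r}")


print(parse_year_month_day("44 BC"))        # → (-44, None, None)
print(parse_year_month_day("-0400-01-02"))  # → (-400, 1, 2)
```

It's deliberately strict: anything outside these two formats, including uncertain forms like `BC 41?`, would need extra patterns.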

I'm expecting to find more issues as I keep digging. For now I've copied `FlexiDate` [here](https://github.com/Wolololf/fast-blog/tree/master/_notebooks/modules/flexidate.py) so I can work on the incompatibilities. It's a work in progress: I'll iron out the remaining issues, address some of the TODOs and write another post when it's ready. I'm tempted to replace the fuzzy parser with regular datetime parsing and then add negative years and uncertainty back on top.

I also feel like I need to split `FlexiDate` into two classes: `FlexiDate`, which supports negative years, and `FlexiDateRange`, which builds on `FlexiDate` but supports ranges to express uncertainty. I'm tempted to split time spans similarly: `FlexiTimeSpan` would just hold the span as years, months and days, and `FlexiTimeSpanRange` would support a min/max range.
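
A minimal sketch of that split using dataclasses. All four classes here are hypothetical and only hold data; the field names (`earliest`, `latest`, `shortest`, `longest`) and the assumption that negative years mean BC are mine, not `FlexiDate`'s. Ordering the `FlexiDate` fields as year, month, day lets `order=True` provide chronological sorting, including for BC years.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class FlexiDate:
    """Hypothetical certain date; negative years are BC (assumption)."""
    year: int
    month: int = 1
    day: int = 1


@dataclass(frozen=True)
class FlexiDateRange:
    """Hypothetical uncertain date: the true date lies somewhere in [earliest, latest]."""
    earliest: FlexiDate
    latest: FlexiDate

    def __post_init__(self) -> None:
        # Reject inverted ranges up front
        if self.latest < self.earliest:
            raise ValueError("latest must not precede earliest")

    @property
    def is_certain(self) -> bool:
        return self.earliest == self.latest


@dataclass(frozen=True)
class FlexiTimeSpan:
    """Hypothetical certain span in calendar units."""
    years: int = 0
    months: int = 0
    days: int = 0


@dataclass(frozen=True)
class FlexiTimeSpanRange:
    """Hypothetical uncertain span between a shortest and a longest bound."""
    shortest: FlexiTimeSpan
    longest: FlexiTimeSpan


# "The 410s BC": BC years count down, so the range runs from 419 BC to 410 BC
the_410s_bc = FlexiDateRange(FlexiDate(-419, 1, 1), FlexiDate(-410, 12, 31))
print(the_410s_bc.is_certain)  # → False
```

Keeping the certain classes separate means code that never deals with uncertainty doesn't pay for min/max bookkeeping.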

#### FlexiDate: Creating BC/AD dates


```python
#collapse-show

# Example code for creating BC and AD dates

# Via parsing
ad_date_fd = fd.parse("2020-01-02")
print(ad_date_fd)

bc_date_fd = fd.parse("-0400-01-02")
print(bc_date_fd)

# Via the constructor
constructor_test = fd.FlexiDate(year=-123, month=5, day=6)
print(constructor_test)

# Parsing "<year> BC" strings
# Two-digit years hit dateutil's century heuristic, so this prints -2044, not -0044
bc_test = fd.parse("44 BC")
print(bc_test)

bc_test = fd.parse("144 BC")
print(bc_test)
```

```
2020-01-02
-0400-01-02
-0123-05-06
-2044
-0144
```


#### FlexiDate: Getters
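Getters are easy in `FlexiDate` since its internal representation has attributes for year, month and day. Note that these attributes are zero-padded strings (e.g. `-0400`, `01`) rather than integers. `FlexiDate` doesn't support time of day, so there are no getters for that.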

```python
#collapse-show

print(bc_date_fd.year)
print(bc_date_fd.month)
print(bc_date_fd.day)
```

```
-0400
01
02
```


#### FlexiDate: Time spans
Another major downside to `FlexiDate` is that it doesn't support time spans. There's plenty of code online for date arithmetic, and I'm sure some of it could be adapted to work with `FlexiDate`. The bigger challenge is uncertainty: how would subtracting two uncertain dates work? How would you add or subtract uncertain time spans?
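
For certain (non-fuzzy) dates, one well-known approach is to convert each date to a day count and subtract. This sketch adapts Howard Hinnant's `days_from_civil` algorithm, which stays correct for negative years thanks to Python's floor division. Two assumptions to flag: it counts days in the proleptic Gregorian calendar (the same convention `datetime` uses), not the Julian calendar usually used for BC dates, and it assumes negative years are historical BC years with no year 0, so `-1` (1 BC) maps to astronomical year 0. `to_astronomical` and `days_between` are hypothetical helpers.

```python
def days_from_civil(year: int, month: int, day: int) -> int:
    """Days since 1970-01-01 in the proleptic Gregorian calendar.

    `year` uses astronomical numbering (year 0 = 1 BC). Adapted from Howard
    Hinnant's days_from_civil; Python's floor division handles negative years.
    """
    y = year - 1 if month <= 2 else year      # count Jan/Feb as the end of the previous year
    era = y // 400                            # index of the 400-year Gregorian cycle
    yoe = y - era * 400                       # year of era, 0..399
    mp = (month + 9) % 12                     # month index with March = 0, February = 11
    doy = (153 * mp + 2) // 5 + day - 1       # day of the March-based year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day of era, 0..146096
    return era * 146097 + doe - 719468        # shift so that 1970-01-01 is day 0


def to_astronomical(year: int) -> int:
    """Assumption: negative years are historical BC years with no year 0."""
    return year + 1 if year < 0 else year


def days_between(start: tuple, end: tuple) -> int:
    """Signed number of days from `start` to `end`, each a (year, month, day) tuple."""
    sy, sm, sd = start
    ey, em, ed = end
    return (days_from_civil(to_astronomical(ey), em, ed)
            - days_from_civil(to_astronomical(sy), sm, sd))


print(days_between((-1, 12, 31), (1, 1, 1)))  # → 1
```

The usage line returns 1 because 31 December 1 BC is immediately followed by 1 January 1 AD.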

I think subtracting uncertain dates would need to yield a min/max range of time spans, e.g. `202?-199?` has a range of `[21, 39]`, since `202?` covers 2020 to 2029 and `199?` covers 1990 to 1999. Operating on two ranges would then work on their bounds: adding pairs min with min and max with max, e.g. `[21, 39] + [49, 61] = [70, 100]`, while subtracting pairs each bound with the opposite bound of the other range, e.g. `[49, 61] - [21, 39] = [10, 40]`. Combining a `FlexiTimeSpan` with a `FlexiDate` would return a `FlexiDate` if there was no uncertainty, or a tuple of `FlexiDate`s for the min/max results if either the date or the time span was uncertain.
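
The range arithmetic above is ordinary interval arithmetic. Here's a minimal sketch working in whole years only; `YearSpanRange` and its `from_pattern` helper are hypothetical names, and treating each `?` as an unknown digit from 0 to 9 is an assumption about how the fuzzy notation maps to bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearSpanRange:
    """Hypothetical min/max range in whole years; a certain value has lo == hi."""
    lo: int
    hi: int

    @classmethod
    def from_pattern(cls, pattern: str) -> "YearSpanRange":
        # Each '?' is an unknown digit 0-9; min/max keeps lo <= hi for negative years too
        a = int(pattern.replace("?", "0"))
        b = int(pattern.replace("?", "9"))
        return cls(min(a, b), max(a, b))

    def __add__(self, other: "YearSpanRange") -> "YearSpanRange":
        # Smallest sum = both minimums, largest sum = both maximums
        return YearSpanRange(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "YearSpanRange") -> "YearSpanRange":
        # Smallest difference pairs our min with their max, and vice versa
        return YearSpanRange(self.lo - other.hi, self.hi - other.lo)


print(YearSpanRange.from_pattern("202?") - YearSpanRange.from_pattern("199?"))
# → YearSpanRange(lo=21, hi=39)
print(YearSpanRange(21, 39) + YearSpanRange(49, 61))  # → YearSpanRange(lo=70, hi=100)
print(YearSpanRange(49, 61) - YearSpanRange(21, 39))  # → YearSpanRange(lo=10, hi=40)
```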

I think this could work, but it's just an outline and there's still work to be done.

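#### FlexiDate and pandas
I propose that `FlexiDate` interact with pandas the same way NumPy's `datetime64` and `timedelta64` do: through custom parsing functions that build dates either from a single `date` column or from multiple date-component columns, much like `pd.to_datetime` can parse a string column or assemble dates from `year`/`month`/`day` columns.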
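
A minimal sketch of the component-column route. Since pandas has no native dtype for BC dates, the values sit in an `object`-dtype column, and sorting falls back to the objects' own comparison methods. `SimpleDate` is a hypothetical stand-in for `FlexiDate` (negative years are BC), and the three events are just sample data.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True, order=True)
class SimpleDate:
    """Hypothetical stand-in for FlexiDate; negative years are BC."""
    year: int
    month: int = 1
    day: int = 1


events = pd.DataFrame({
    "event": ["Fall of the Western Roman Empire", "Assassination of Caesar", "Founding of Rome"],
    "year": [476, -44, -753],
    "month": [9, 3, 4],
    "day": [4, 15, 21],
})

# Build one object-dtype column from the date-component columns
events["date"] = [
    SimpleDate(int(y), int(m), int(d))
    for y, m, d in zip(events["year"], events["month"], events["day"])
]

# Sorting works because SimpleDate compares by (year, month, day)
print(events.sort_values("date")["event"].tolist())
```

The trade-off is that vectorised datetime tools such as the `.dt` accessor won't work on an `object` column like this; every operation goes through Python-level methods on each value.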

#### Conclusion
I love the idea of uncertainty in dates and time spans.