# EDTF support - demonstration and validation

This notebook demonstrates and validates `undate` support for specific parts of the [Extended Date/Time Format (EDTF)](https://www.loc.gov/standards/datetime/) specification.

This notebook follows the same structure as the Library of Congress specification and uses its examples to demonstrate parsing EDTF dates and formatting dates in EDTF syntax, for the parts of the specification that `undate` implements.

`undate` only handles dates and date intervals; time is not supported.

*Notebook authored by Rebecca Sutton Koeser, October 2024.*

## Level 0

Full support for **Date** and **Time Interval**; **Date and Time** is not supported.

### Date

```
complete representation:            [year][“-”][month][“-”][day]
Example 1          ‘1985-04-12’ refers to the calendar date 1985 April 12th with day precision.
reduced precision for year and month:   [year][“-”][month]
Example 2          ‘1985-04’ refers to the calendar month 1985 April with month precision.
reduced precision for year:  [year]
Example 3          ‘1985’ refers to the calendar year 1985 with year precision.
```
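
As a rough illustration of the three Level 0 forms above, the following standard-library sketch reports the precision of a date string. The name `level0_precision` and the regular expression are assumptions made for this example; this is not how `undate` parses EDTF, and it does not check that the month or day values are valid calendar values.

```python
import re

# Hypothetical pattern for illustration only (not undate's parser):
# a four-digit year, optionally followed by -MM, optionally followed by -DD.
LEVEL0_DATE = re.compile(
    r"(?P<year>\d{4})(?:-(?P<month>\d{2})(?:-(?P<day>\d{2}))?)?"
)


def level0_precision(value: str) -> str:
    """Return "day", "month", or "year" for a Level 0 EDTF date string."""
    match = LEVEL0_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Level 0 EDTF date: {value!r}")
    if match.group("day"):
        return "day"
    if match.group("month"):
        return "month"
    return "year"
```

With this sketch, `level0_precision("1985-04-12")` gives `"day"`, `level0_precision("1985-04")` gives `"month"`, and `level0_precision("1985")` gives `"year"`, matching the three examples from the specification.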

#### Parse EDTF format

Demonstrate that these EDTF strings can be parsed into `Undate` objects.

In [1]:
import datetime 

from undate import Undate, UndateInterval, DatePrecision

# Example 1: day
day = Undate.parse("1985-04-12", "EDTF")
assert day.precision == DatePrecision.DAY
assert day == datetime.date(1985, 4, 12)

# Example 2 : month
month = Undate.parse("1985-04", "EDTF")
assert month.year == "1985" and month.month == "04"
assert month.precision == DatePrecision.MONTH

# Example 3  : year
year = Undate.parse("1985", "EDTF")
assert year.year == "1985"
assert year.precision == DatePrecision.YEAR

#### Output in EDTF format

Demonstrate that initializing `Undate` objects and serializing them with the EDTF converter returns the expected values.

In [2]:
from undate.undate import Undate, DatePrecision
from undate.converters.edtf import EDTFDateConverter

# set default format to EDTF
Undate.DEFAULT_CONVERTER = "EDTF"

# Example 1: day
day = Undate(1985, 4, 12)
# confirm EDTF converter is being used
assert isinstance(day.converter, EDTFDateConverter)
# casting to str is now equivalent to day.format("EDTF")
assert str(day) == "1985-04-12"
assert day.precision == DatePrecision.DAY

# Example 2 : month
month = Undate(1985, 4)
assert str(month) == "1985-04"
assert month.precision == DatePrecision.MONTH

# Example 3  : year
year = Undate(1985)
assert str(year) == "1985"
assert year.precision == DatePrecision.YEAR

### Date and Time - not supported

### Time Interval

EDTF Level 0 adopts representations of a time interval where both the start and end are dates: start and end date only; that is, both start and duration, and duration and end, are excluded. Time of day is excluded.

```
    Example 1          ‘1964/2008’ is a time interval with calendar year precision, beginning sometime in 1964 and ending sometime in 2008.
    Example 2          ‘2004-06/2006-08’ is a time interval with calendar month precision, beginning sometime in June 2004 and ending sometime in August of 2006.
    Example 3          ‘2004-02-01/2005-02-08’ is a time interval with calendar day precision, beginning sometime on February 1, 2004 and ending sometime on February 8, 2005.
    Example 4          ‘2004-02-01/2005-02’ is a time interval beginning sometime on February 1, 2004 and ending sometime in February 2005. Since the start endpoint precision (day) is different than that of the end endpoint (month) the precision of the time interval at large is undefined.
    Example 5          ‘2004-02-01/2005’ is a time interval beginning sometime on February 1, 2004 and ending sometime in 2005. The start endpoint has calendar day precision and the end endpoint has calendar year precision. Similar to the previous example, the precision of the time interval at large is undefined.
    Example 6          ‘2005/2006-02’ is a time interval beginning sometime in 2005 and ending sometime in February 2006.
```
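
The note in Examples 4 and 5 about undefined interval precision can be sketched without `undate`. The helpers below (`split_interval`, `endpoint_precision`, and `interval_precision`) are hypothetical names for this example; they assume well-formed Level 0 dates with no negative years, and they return `None` when the two endpoints have different precision.

```python
from typing import Optional

# Precision is inferred from the number of hyphens in a Level 0 date:
# "1964" -> year, "2004-06" -> month, "2004-02-01" -> day.
PRECISIONS = ("year", "month", "day")


def split_interval(value: str) -> tuple[str, str]:
    """Split a Level 0 interval into its start and end date strings."""
    start, sep, end = value.partition("/")
    if not sep or not start or not end or "/" in end:
        raise ValueError(f"not a Level 0 EDTF interval: {value!r}")
    return start, end


def endpoint_precision(date: str) -> str:
    """Precision of one endpoint (assumes a well-formed Level 0 date)."""
    return PRECISIONS[date.count("-")]


def interval_precision(value: str) -> Optional[str]:
    """Shared precision of both endpoints, or None when they differ."""
    start, end = split_interval(value)
    start_precision = endpoint_precision(start)
    end_precision = endpoint_precision(end)
    return start_precision if start_precision == end_precision else None
```

For example, `interval_precision("1964/2008")` gives `"year"`, while `interval_precision("2004-02-01/2005-02")` gives `None` because a day-precision start is paired with a month-precision end.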

#### Parse EDTF format

In [3]:
# Example 1
year_range = Undate.parse("1964/2008", "EDTF")
assert isinstance(year_range, UndateInterval)
assert year_range.earliest == Undate(1964)
assert year_range.latest == Undate(2008)
# Example 2
month_range = Undate.parse("2004-06/2006-08", "EDTF")
assert isinstance(month_range, UndateInterval)
assert month_range.earliest == Undate(2004, 6)
assert month_range.latest == Undate(2006, 8)
# Example 3
day_range = Undate.parse("2004-02-01/2005-02-08", "EDTF")
assert isinstance(day_range, UndateInterval)
assert day_range.earliest == Undate(2004, 2, 1)
assert day_range.latest == Undate(2005, 2, 8)
# Example 4 
day_month_range = Undate.parse("2004-02-01/2005-02", "EDTF")
assert isinstance(day_month_range, UndateInterval)
assert day_month_range.earliest == Undate(2004, 2, 1)
assert day_month_range.latest == Undate(2005, 2)
assert day_month_range.earliest.precision == DatePrecision.DAY
assert day_month_range.latest.precision == DatePrecision.MONTH
# Example 5
day_year_range = Undate.parse("2004-02-01/2005", "EDTF")
assert isinstance(day_year_range, UndateInterval)
assert day_year_range.earliest == Undate(2004, 2, 1)
assert day_year_range.latest == Undate(2005)
assert day_year_range.earliest.precision == DatePrecision.DAY
assert day_year_range.latest.precision == DatePrecision.YEAR
# Example 6 
year_month_range = Undate.parse("2005/2006-02", "EDTF")
assert isinstance(year_month_range, UndateInterval)
assert year_month_range.earliest == Undate(2005)
assert year_month_range.latest == Undate(2006, 2)
assert year_month_range.earliest.precision == DatePrecision.YEAR
assert year_month_range.latest.precision == DatePrecision.MONTH


#### Output in EDTF format

In [4]:
# Example 1
assert UndateInterval(Undate(1964), Undate(2008)).format("EDTF") == "1964/2008"
# Example 2
assert UndateInterval(Undate(2004, 6), Undate(2006, 8)).format("EDTF") == "2004-06/2006-08"
# Example 3
assert UndateInterval(Undate(2004, 2, 1), Undate(2005, 2, 8)).format("EDTF") == "2004-02-01/2005-02-08"
# Example 4 
assert UndateInterval(Undate(2004, 2, 1), Undate(2005, 2)).format("EDTF") == "2004-02-01/2005-02"
# Example 5
assert UndateInterval(Undate(2004, 2, 1), Undate(2005)).format("EDTF") == "2004-02-01/2005"
# Example 6 
assert UndateInterval(Undate(2005), Undate(2006, 2)).format("EDTF") == "2005/2006-02"

## Level 1

### Letter-prefixed calendar year

'Y' may be used at the beginning of the date string to signify that the date is a year, when (and only when) the year exceeds four digits, i.e. for years later than 9999 or earlier than -9999.
```
    Example 1             'Y170000002' is the year 170000002
    Example 2             'Y-170000002' is the year -170000002
```
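
The "when (and only when)" rule above can be sketched in plain Python. The functions `format_edtf_year` and `parse_edtf_year` are hypothetical helpers for this example and are not part of `undate`; they only handle integer years.

```python
def format_edtf_year(year: int) -> str:
    """Format an integer year, adding the 'Y' prefix only beyond four digits."""
    digits = str(abs(year))
    sign = "-" if year < 0 else ""
    if len(digits) > 4:
        return f"Y{sign}{digits}"
    # Years of four digits or fewer are zero-padded and take no prefix.
    return f"{sign}{digits:0>4}"


def parse_edtf_year(value: str) -> int:
    """Parse a year string, accepting 'Y' only when the year exceeds four digits."""
    if value.startswith("Y"):
        year = int(value[1:])
        if len(str(abs(year))) <= 4:
            raise ValueError(f"'Y' prefix requires more than four digits: {value!r}")
        return year
    return int(value)
```

Under these assumptions, `format_edtf_year(170000002)` gives `"Y170000002"`, `format_edtf_year(-170000002)` gives `"Y-170000002"`, and `format_edtf_year(1985)` gives `"1985"` with no prefix.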


In [5]:
# Example 1
# parse
assert Undate.parse("Y170000002", "EDTF").year == "170000002"
# format
assert str(Undate(170000002)) == "Y170000002"

# Example 2
# parse
assert Undate.parse("Y-170000002", "EDTF").year == "-170000002"
# format
assert str(Undate(-170000002)) == "Y-170000002"

### Seasons - not supported


### Qualification of a date (complete) - not yet supported

The characters '?', '~' and '%' are used to mean "uncertain", "approximate", and "uncertain" as well as "approximate", respectively. These characters may occur only at the end of the date string and apply to the entire date.

```
    Example 1             '1984?'             year uncertain (possibly the year 1984, but not definitely)
    Example 2              '2004-06~'       year-month approximate
    Example 3        '2004-06-11%'          entire date (year-month-day) uncertain and approximate
```
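
Since `undate` does not yet support these qualifiers, the following is only a standard-library sketch of how a trailing qualifier could be separated from the date string before parsing. The name `split_qualifier` and the returned tuple layout are assumptions made for this example.

```python
# Map each qualifier character to (uncertain, approximate).
QUALIFIERS = {
    "?": (True, False),  # uncertain
    "~": (False, True),  # approximate
    "%": (True, True),   # uncertain and approximate
}


def split_qualifier(value: str) -> tuple[str, bool, bool]:
    """Return (date_string, uncertain, approximate) for a Level 1 date."""
    if value and value[-1] in QUALIFIERS:
        uncertain, approximate = QUALIFIERS[value[-1]]
        return value[:-1], uncertain, approximate
    # No trailing qualifier: the date is neither uncertain nor approximate.
    return value, False, False
```

For instance, `split_qualifier("2004-06-11%")` gives `("2004-06-11", True, True)`, and an unqualified string such as `"1985"` comes back unchanged with both flags `False`.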

### Unspecified digit(s) from the right 

The character 'X' may be used in place of one or more rightmost digits to indicate that the value of that digit is unspecified, for the following cases:
```
    A year with one or two (rightmost) unspecified digits in a year-only expression (year precision)
    Example 1       ‘201X’
    Example 2       ‘20XX’
    Year specified, month unspecified in a year-month expression (month precision)
    Example 3       ‘2004-XX’
    Year and month specified, day unspecified in a year-month-day expression (day precision)
    Example 4       ‘1985-04-XX’               
    Year specified, day and month unspecified in a year-month-day expression  (day precision)
    Example 5       ‘1985-XX-XX’              
```
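
One way to think about the earliest and latest values implied by unspecified digits is to substitute `0` and `9` for each `X` and then clamp the result to the valid range for that component. The sketch below uses only the standard library; `unspecified_range` is a hypothetical helper for this example and is not a description of how `undate` computes `earliest` and `latest`.

```python
import calendar


def unspecified_range(component: str, minimum: int, maximum: int) -> tuple[int, int]:
    """Earliest and latest integer values for a component containing 'X' digits."""
    earliest = int(component.replace("X", "0"))
    latest = int(component.replace("X", "9"))
    # Clamp to the valid range, e.g. 1-12 for months.
    return max(earliest, minimum), min(latest, maximum)


# Years: '201X' spans a decade, '20XX' spans a century.
decade = unspecified_range("201X", 0, 9999)
century = unspecified_range("20XX", 0, 9999)

# Months: 'XX' clamps to January through December.
months = unspecified_range("XX", 1, 12)

# Days: the upper bound depends on the month; April 1985 has 30 days.
april_days = unspecified_range("XX", 1, calendar.monthrange(1985, 4)[1])
```

Here `decade` is `(2010, 2019)`, `century` is `(2000, 2099)`, `months` is `(1, 12)`, and `april_days` is `(1, 30)`.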

In [6]:
# Example 1       ‘201X’
# parse
date = Undate.parse("201X", "EDTF")
assert date.year == "201X"
assert date.precision == DatePrecision.YEAR
# earliest/latest possible years
assert date.earliest.year == 2010
assert date.latest.year == 2019
# format
assert str(Undate("201X")) == "201X"

# Example 2       ‘20XX’
# parse
date = Undate.parse("20XX", "EDTF")
assert date.year == "20XX"
assert date.precision == DatePrecision.YEAR
# earliest/latest possible years
assert date.earliest.year == 2000
assert date.latest.year == 2099
# format
assert str(Undate("20XX")) == "20XX"

# Example 3       ‘2004-XX’
# parse
date = Undate.parse("2004-XX", "EDTF")
assert date.year == "2004"
assert date.month == "XX"
assert date.precision == DatePrecision.MONTH
# earliest/latest possible months
assert date.earliest.month == 1
assert date.latest.month == 12
# format
assert str(Undate(2004, "XX")) == "2004-XX"

# Example 4       ‘1985-04-XX’   
# parse
date = Undate.parse("1985-04-XX", "EDTF")
assert date.year == "1985"
assert date.month == "04"
assert date.day == "XX"
assert date.precision == DatePrecision.DAY
# earliest/latest possible days
assert date.earliest.day == 1
assert date.latest.day == 30
# format
assert str(Undate(1985, 4, "XX")) == "1985-04-XX"

# Example 5       ‘1985-XX-XX’      
# parse
date = Undate.parse("1985-XX-XX", "EDTF")
assert date.year == "1985"
assert date.month == "XX"
assert date.day == "XX"
assert date.precision == DatePrecision.DAY
# earliest/latest possible months
assert date.earliest.month == 1
assert date.latest.month == 12
# earliest/latest possible days
assert date.earliest.day == 1
assert date.latest.day == 31   # undate guesses maximum month length when month is unknown
# format
assert str(Undate(1985, "XX", "XX")) == "1985-XX-XX"

### Extended Interval (L1)

1. A null string may be used for the start or end date when it is unknown.
2. Double-dot (“..”) may be used when either the start or end date is not specified, either because there is none or for any other reason.
3. A modifier may appear at the end of the date to indicate "uncertain" and/or "approximate"

* * *

**NOTE:** `undate` does not currently distinguish between open intervals and intervals with an unknown start or end date.
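
To make the distinction in the specification concrete, this standard-library sketch labels each side of an interval string as a date, an open endpoint (`..`), or an unknown endpoint (empty string). The names `classify_endpoint` and `describe_interval` are assumptions for this example; `undate` itself does not currently preserve this distinction.

```python
def classify_endpoint(value: str) -> str:
    """Label one side of an EDTF interval string."""
    if value == "..":
        return "open"  # not specified: there is none, or for any other reason
    if value == "":
        return "unknown"  # null string: the date exists but is not known
    return "date"


def describe_interval(value: str) -> tuple[str, str]:
    """Return labels for the start and end of an interval string."""
    start, sep, end = value.partition("/")
    if not sep:
        raise ValueError(f"not an EDTF interval: {value!r}")
    return classify_endpoint(start), classify_endpoint(end)
```

With this sketch, `describe_interval("1985-04-12/..")` gives `("date", "open")` and `describe_interval("/1985")` gives `("unknown", "date")`.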

#### Open end time interval

`undate` supports open-ended time intervals, but does not currently distinguish between the null string and the double dot.


    Example 1          ‘1985-04-12/..’
    interval starting at 1985 April 12th with day precision; end open
    Example 2          ‘1985-04/..’
    interval starting at 1985 April with month precision; end open
    Example 3          ‘1985/..’
    interval starting at year 1985 with year precision; end open


In [7]:
import datetime

# Example 1          ‘1985-04-12/..’
# parse
interval = Undate.parse("1985-04-12/..", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == datetime.date(1985, 4, 12)
assert interval.earliest.precision == DatePrecision.DAY
assert interval.latest is None
# format
# NOTE: undate interval does not currently distinguish between double dot and null string
assert str(UndateInterval(Undate(1985, 4, 12), None)) == "1985-04-12/"

# Example 2          ‘1985-04/..’
# parse
interval = Undate.parse("1985-04/..", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == Undate(1985, 4)
assert interval.earliest.precision == DatePrecision.MONTH
assert interval.latest is None
# format
assert str(UndateInterval(Undate(1985, 4), None)) == "1985-04/"

# Example 3          ‘1985/..’
# parse
interval = Undate.parse("1985/..", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == Undate(1985)
assert interval.earliest.precision == DatePrecision.YEAR
assert interval.latest is None
# format
assert str(UndateInterval(Undate(1985), None)) == "1985/"

#### Open start time interval

    Example 4          ‘../1985-04-12’
    interval with open start; ending 1985 April 12th with day precision
    Example 5          ‘../1985-04’
    interval with open start; ending 1985 April with month precision
    Example 6          ‘../1985’
    interval with open start; ending at year 1985 with year precision

In [8]:
# Example 4          ‘../1985-04-12’
# parse
interval = Undate.parse("../1985-04-12", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == datetime.date(1985, 4, 12)
assert interval.latest.precision == DatePrecision.DAY
# format
# NOTE: undate interval does not currently distinguish between double dot and null string
assert str(UndateInterval(None, Undate(1985, 4, 12))) == "../1985-04-12"

# Example 5          ‘../1985-04’
# parse
interval = Undate.parse("../1985-04", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == Undate(1985, 4)
assert interval.latest.precision == DatePrecision.MONTH
# format
assert str(UndateInterval(None, Undate(1985, 4))) == "../1985-04"

# Example 6          ‘../1985’
# parse
interval = Undate.parse("../1985", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == Undate(1985)
assert interval.latest.precision == DatePrecision.YEAR
# format
assert str(UndateInterval(None, Undate(1985))) == "../1985"

#### Time interval with unknown end

    Example 7          ‘1985-04-12/’
    interval starting 1985 April 12th with day precision; end unknown
    Example 8          ‘1985-04/’
    interval starting 1985 April with month precision; end unknown
    Example 9          ‘1985/’
    interval starting year 1985 with year precision; end unknown


In [9]:
# Example 7          ‘1985-04-12/’
# parse
interval = Undate.parse("1985-04-12/", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == datetime.date(1985, 4, 12)
assert interval.earliest.precision == DatePrecision.DAY
assert interval.latest is None
# format
# NOTE: undate interval does not currently distinguish between double dot and null string
assert str(UndateInterval(Undate(1985, 4, 12), None)) == "1985-04-12/"

# Example 8          ‘1985-04/’
# parse
interval = Undate.parse("1985-04/", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == Undate(1985, 4)
assert interval.earliest.precision == DatePrecision.MONTH
assert interval.latest is None
# format
assert str(UndateInterval(Undate(1985, 4), None)) == "1985-04/"

# Example 9          ‘1985/’
# parse
interval = Undate.parse("1985/", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest == Undate(1985)
assert interval.earliest.precision == DatePrecision.YEAR
assert interval.latest is None
# format
assert str(UndateInterval(Undate(1985), None)) == "1985/"

#### Time interval with unknown start

    Example 10       ‘/1985-04-12’
    interval with unknown start; ending 1985 April 12th with day precision
    Example 11       ‘/1985-04’
    interval with unknown start; ending 1985 April with month precision
    Example 12       ‘/1985’
    interval with unknown start; ending year 1985 with year precision


In [10]:
# Example 10       ‘/1985-04-12’
# parse
interval = Undate.parse("/1985-04-12", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == datetime.date(1985, 4, 12)
assert interval.latest.precision == DatePrecision.DAY
# format
# NOTE: undate interval does not currently distinguish between double dot and null string
assert str(UndateInterval(None, Undate(1985, 4, 12))) == "../1985-04-12"

# Example 11       ‘/1985-04’
# parse
interval = Undate.parse("/1985-04", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == Undate(1985, 4)
assert interval.latest.precision == DatePrecision.MONTH
# format
assert str(UndateInterval(None, Undate(1985, 4))) == "../1985-04"

# Example 12       ‘/1985’
# parse
interval = Undate.parse("/1985", "EDTF")
assert isinstance(interval, UndateInterval)
assert interval.earliest is None
assert interval.latest == Undate(1985)
assert interval.latest.precision == DatePrecision.YEAR
# format
assert str(UndateInterval(None, Undate(1985))) == "../1985"

#### Negative calendar year

    Example 1       ‘-1985’

Note: ISO 8601 Part 1 does not support negative year. 

In [11]:
# Example 1       ‘-1985’
# parse
neg_year = Undate.parse("-1985", "EDTF")
assert neg_year.year == "-1985"
# format
assert str(Undate(-1985)) == "-1985"

## Level 2

The only part of L2 that `undate` currently supports is allowing an unspecified digit anywhere in the date.

#### Unspecified Digit

For level 2 the unspecified digit, 'X', may occur anywhere within a component.

    Example 1                 ‘156X-12-25’
    December 25 sometime during the 1560s
    Example 2                 ‘15XX-12-25’
    December 25 sometime during the 1500s
    Example 3                ‘XXXX-12-XX’
    Some day in December in some year
    Example 4                 ‘1XXX-XX’
    Some month during the 1000s
    Example 5                  ‘1XXX-12’
    Some December during the 1000s
    Example 6                  ‘1984-1X’
    October, November, or December 1984
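
When `X` can appear anywhere in a component, the matching values are not always a contiguous range, so it can help to enumerate them. The sketch below is a standard-library illustration only; `possible_values` is a hypothetical helper, not an `undate` function, and it assumes the component contains nothing but digits and `X`.

```python
import re


def possible_values(pattern: str, candidates: range) -> list[int]:
    """List the candidate values that match a component with 'X' digits."""
    if not re.fullmatch(r"[0-9X]+", pattern):
        raise ValueError(f"unexpected characters in component: {pattern!r}")
    # Each 'X' stands for any single digit.
    regex = re.compile(pattern.replace("X", r"\d"))
    width = len(pattern)
    # Zero-pad each candidate to the pattern's width before matching.
    return [n for n in candidates if regex.fullmatch(f"{n:0{width}d}")]


# Example 6: '1984-1X' leaves three possible months.
late_months = possible_values("1X", range(1, 13))

# A leading unspecified digit gives a non-contiguous result.
second_months = possible_values("X2", range(1, 13))
```

Here `late_months` is `[10, 11, 12]` (October, November, or December), while `second_months` is `[2, 12]`, which shows why a simple earliest/latest pair does not capture every case.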

In [12]:
# Example 1                 ‘156X-12-25’
# parse
december = Undate.parse("156X-12-25", "EDTF")
assert december.year == "156X"
assert december.month == "12"
assert december.day == "25"
assert december.precision == DatePrecision.DAY
assert december.earliest.year == 1560
assert december.latest.year == 1569
# format
assert str(Undate("156X", 12, 25)) == "156X-12-25"

# Example 2                 ‘15XX-12-25’
# parse
december = Undate.parse("15XX-12-25", "EDTF")
assert december.year == "15XX"
assert december.month == "12"
assert december.day == "25"
assert december.precision == DatePrecision.DAY
assert december.earliest.year == 1500
assert december.latest.year == 1599
# format
assert str(Undate("15XX", 12, 25)) == "15XX-12-25"

# Example 3                ‘XXXX-12-XX’
# parse
december = Undate.parse("XXXX-12-XX", "EDTF")
assert december.year == "XXXX"
assert december.month == "12"
assert december.day == "XX"
assert december.precision == DatePrecision.DAY
# TODO: these must be in a different branch...
# assert december.earliest.year == Undate.MIN_YEAR
# assert december.latest.year == Undate.MAX_YEAR
assert december.earliest.day == 1
assert december.latest.day == 31
# format
assert str(Undate("XXXX", 12, "XX")) == "XXXX-12-XX"

# Example 4                 ‘1XXX-XX’
# parse
some_month = Undate.parse("1XXX-XX", "EDTF")
assert some_month.year == "1XXX"
assert some_month.month == "XX"
assert some_month.precision == DatePrecision.MONTH
assert some_month.earliest.year == 1000
assert some_month.latest.year == 1999
# format
assert str(Undate("1XXX", "XX")) == "1XXX-XX"

# Example 5                  ‘1XXX-12’
# parse
some_december = Undate.parse("1XXX-12", "EDTF")
assert some_december.year == "1XXX"
assert some_december.month == "12"
assert some_december.precision == DatePrecision.MONTH
assert some_december.earliest.year == 1000
assert some_december.latest.year == 1999
# format
assert str(Undate("1XXX", 12)) == "1XXX-12"

# Example 6                  ‘1984-1X’
# parse
late_1984 = Undate.parse("1984-1X", "EDTF")
assert late_1984.year == "1984"
assert late_1984.month == "1X"
assert late_1984.precision == DatePrecision.MONTH
assert late_1984.earliest.month == 10
assert late_1984.latest.month == 12
# format
assert str(Undate(1984, "1X")) == "1984-1X"