Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISO 8601 week date edge cases #50

Open
trevorld opened this issue Dec 25, 2022 · 2 comments
Open

ISO 8601 week date edge cases #50

trevorld opened this issue Dec 25, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@trevorld
Copy link

I'm not sure how important the ISO week date parsing is but I'm observing what I suspect are bugs when the ISO week date "year" is different from the Gregorian date "year" when parsing ISO week dates

parttime::as.parttime("2009-W53-7") # 2010-01-03
Warning in strptime(paste(x[, "year"], x[, "week"], x[, "weekday"] - 1L,  :
  (0-based) yday 373 in year 2009 is invalid
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
parttime::as.parttime("2009-W01-1") # 2008-12-29
<partial_time<YMDhms+tz>[1]> 
[1] "2009-01-04" 
clock::iso_year_week_day(2009, 1, 1) |> as_year_month_day()
<year_month_day<day>[1]>
[1] "2008-12-29"
parttime::as.parttime("2009-W02") # although day is unknown year-month is definitely 2009-01
Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
  Date strings including week and excluding weekday can not be fully
represented. To avoid loss of datetime resolution, such partial dates
are best represented as timespans. See `?timespan`.
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
parttime::as.parttime("2009-W01") # unknown year since includes days in 2008 and 2009
Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
  Date strings including week and excluding weekday can not be fully
represented. To avoid loss of datetime resolution, such partial dates
are best represented as timespans. See `?timespan`.
<partial_time<YMDhms+tz>[1]> 
[1] "2009" 
@dgkf
Copy link
Owner

dgkf commented Jan 3, 2023

Thank you again for another thoughtful write-up!

The weekday parsing is definitely underdeveloped. It leans primarily on strptime to try to convert yearweeks. I must admit that this isn't a terribly familiar format for me, and reading through the spec more thoroughly I think I missed some things.

Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?

Bugs

Just to itemize the things that this brought to my attention:

  • These are currently parsed by strptime using %U (US convention, not %V, the ISO-8601 convention). ISO uses Monday as the start of the week and the US convention uses Sunday as the start of the week. If everything else was working properly, this is already off by 1.
  • Unfortunately the ISO parsing using %V is allowed but ignored, so a correct implementation would need to lean on another library or try to post-process the imputed date by handling the off-by-one issue.
  • Even thought strptime %U purports to use a range 00-53, it seems to parse the same values for both 00 and 01:
    strptime("2009-00-0", "%Y-%U-%w")
    ## [1] "2009-01-04 UTC"  # from ?strptime, the week 1 should be the first Sunday, expected 2008
    
    strptime("2009-01-0", "%Y-%U-%w")
    ## [1] "2009-01-04 UTC"
    
    strptime("2009-02-0", "%Y-%U-%w")
    ## [1] "2009-01-11 UTC"

Behaviors

Just using this example that you gave for discussion:

parttime::as.parttime("2009-W02") # although day is unknown year-month is definitely 2009-01
## Warning in warn_repr_data_loss(x, includes = "week", excludes = "weekday") :
##   Date strings including week and excluding weekday can not be fully
## represented. To avoid loss of datetime resolution, such partial dates
## are best represented as timespans. See `?timespan`.
## <partial_time<YMDhms+tz>[1]> 
## [1] "2009" 

Parsing to "2009-01" would be feasible. It would effectively parse first as a timespan (parsing the lower and upper bounds) and then coerce to a parttime. This would produce a month when the week is entirely within the month, or omit the month in cases where a week spans multiple months.

Since I wasn't aware of many use cases for this behavior, trying to be exhaustive here wasn't prioritized.

@dgkf dgkf added the bug Something isn't working label Jan 3, 2023
@trevorld
Copy link
Author

trevorld commented Jan 4, 2023

Before jumping into changes, I'm curious - do you have a use case where yearweeks are frequently used?

Although I have my Google calendar set to also show the ISO week number and I use ISO week dates in my bullet journal I don't have a use case where I need to parse them in R. If I did need to parse them I can first parse them with {datetimeoffset} (which uses {clock} in the background to convert to the Gregorian calendar) and if needed convert to {parttime}:

library(datetimeoffset)
library(parttime)
as_datetimeoffset("2009-W53-7") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2010-01-03" 
as_datetimeoffset("2009-W01-1") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2008-12-29" 
as_datetimeoffset("2009-W01") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] NA 
as_datetimeoffset("2009-W02") |> as.parttime()
<partial_time<YMDhms+tz>[1]> 
[1] "2009-01" 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants