Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of conversion of millisecond Unix time to Refine datetime #608

Closed
tfmorris opened this issue Oct 15, 2012 · 33 comments · Fixed by #4784
Closed

Support of conversion of millisecond Unix time to Refine datetime #608

tfmorris opened this issue Oct 15, 2012 · 33 comments · Fixed by #4784
Assignees
Labels
Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. grel The default expression language, GREL, could be improved in many ways! imported from old code repo Issue imported from Google Code in 2010 Priority: Medium Represents important issues that need to be addressed but are not urgent Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Milestone

Comments

@tfmorris
Copy link
Member

tfmorris commented Oct 15, 2012

Original author: to...@benmoshe.com (September 15, 2012 11:52:15)

It would be nice to have a function such as FORMAT_UTC_USEC to convert Unix time ( which is a number) to a Datatime format, and to extract only year/month/day

Original issue: http://code.google.com/p/google-refine/issues/detail?id=609

@tfmorris
Copy link
Member Author

From tfmorris on September 18, 2012 15:31:52:
Can you provide a reference for the definition of this function?

Since we already provide a variety of date formatting functions, I'm guessing that what you really need is to be able to convert from a microsecond Unix time (ie Unix time multiplied by 1,000,000) to a Refine DateTime data type. From there you would be able to use any of the standard date functions.

@tfmorris
Copy link
Member Author

From to...@benmoshe.com on September 18, 2012 19:53:24:
Yes, it is very common to get the timestamp in microsecond unix time, a google example can be seen at rbig query functions https://developers.google.com/bigquery/docs/query-reference#timestampfunctions, but there are many other places you use it...
Axtuwlly, I wanted to use refine as part of automatic pre processing of data I am getting, and was wondering on size of input, and do it all in thbwckgrounf before loading csv to big query...

@magdmartin
Copy link
Member

convert-unix-time offer an API that you can call using the add column by fetching URL feature in OpenRefine.

@ghost ghost assigned tfmorris Mar 23, 2013
@iosonosempreio
Copy link

Hey, for me those API worked fine. The steps to get the readable date are the followings:

  • select the column containing the unix date, then add a new column by fetching url using this code:
'http://www.convert-unix-time.com/api?timestamp='+ substring(value, 0, 10)+'&timezone=Rome&returnType=json'
  • then transform the new column with this:
value.parseJson().utcDate

Note that I took the utcDate since I didn't need any timezone conversion for my date. If you need it, change the &timezone=Rome in the first code with the one you like. After you've fetched the urls, transform with this new one:

value.parseJson().localDate
That's it!

@magdmartin
Copy link
Member

magdmartin commented May 2, 2014

Thanks for sharing!

@tfmorris tfmorris modified the milestones: 2.6, 2.7 Apr 30, 2015
@magdmartin magdmartin modified the milestone: 2.7 Feb 12, 2017
@thadguidry thadguidry assigned jackyq2015 and unassigned tfmorris Apr 29, 2018
@thadguidry thadguidry added help wanted An issue that we would love anyone to help us with grel The default expression language, GREL, could be improved in many ways! Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. labels Nov 9, 2018
@thadguidry
Copy link
Member

For whomever works on this... Here's a good primer, since OpenRefine now uses java.time.offsetdatetime https://stackoverflow.com/questions/21242110/convert-java-util-date-to-java-time-localdate

Our code for the GREL toDate() function is here: https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/expr/functions/ToDate.java

@ostephens
Copy link
Member

ostephens commented Nov 12, 2018

So I'm currently looking at toDate as per #1759 and @thadguidry reminded me of this (thanks Thad!).

At the moment I can think of two options for handling this:

Extend toDate to have a GREL specific 'format' option which is (something like) 'epochms' (epoch milliseconds) which tells toDate to use a long number (or a string which can be converted to a long number) as millisecond Unix time and use that to generate a date/time

The alternative I can see is to have a special new function such as 'epochToDate' which does this specifically.

The second approach avoids making the toDate function even more complex than it already is, but has the downside of introducing a new function rather than a single 'toDate' function handling all conversions to a date.

@thadguidry @magdmartin do you have thoughts on this?

@thadguidry
Copy link
Member

Summary:
I think I would choose the format option to make it painless for our users (we do the hard things, to make it easier for them).

Errata:
While it is painful and complex within our toDate() function, this is necessary to keep GREL syntax nice and tidy. This is also how other programming languages handle this convention, like Javascript (Moment.js), Python, Pandas, Ruby. Our toDate() already has an optional format argument. Where I could see this leading to is possibly extending to handle a 3rd argument for timezone or utc.

Moment.js does things so beautifully and should be our model https://momentjs.com/docs/#/parsing/unix-timestamp-milliseconds/
Ruby uses time.at http://ruby-doc.org/core-1.9.3/Time.html#method-c-at
Pandas uses Pandas.Timestamp https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Timestamp.html#

We're not looking to completely incorporate all methods in Ruby and Pandas.Timestamp into our toDate(). Those can be handled by Python/Jython instead of GREL. But for GREL, it makes sense to at least have a common Unix timestamp convertor built into our toDate() for our users.

@ostephens
Copy link
Member

ostephens commented Nov 13, 2018

Thanks @thadguidry. I can see a new format option working

Our toDate() already has an optional format argument. Where I could see this leading to is possibly extending to handle a 3rd argument for timezone or utc.

One of the complexities of the toDate function in code is that it has a set of optional arguments- including one boolean type (month first) and two string types (locale and repeatable format).

Adding another optional argument (presumably of type string) for timezone will make it even more challenging to make sure the arguments are being interpreted correctly

@thadguidry
Copy link
Member

@ostephens Then don't worry about timezone at this time. Just use something like:

Date date = Date.from( Instant.ofEpochSecond( timeStamp ) );

and we should be good. That will give us the UTC/Greenwich time.

  • Also ensure that we document on the Wiki this very important footnote:

OpenRefine internally uses java.time which converts a Unix Timestamp using The Java Time-Scale.
The Java Time-Scale divides each calendar day into exactly 86400 subdivisions, known as seconds. These seconds may differ from the SI second. It closely matches the de facto international civil time scale, the definition of which changes from time to time. More information here: https://docs.oracle.com/javase/8/docs/api/java/time/Instant.html#Time-scale

keerthybreddy added a commit to keerthybreddy/OpenRefine that referenced this issue Apr 22, 2022
@ostephens
Copy link
Member

Originally I said:

A possibility I still like is to add a GREL function like epochToDate (I'm sure we can come up with a better name!) which accepts a number as input, outputs a date, and has parameter which allows you to specify the units (seconds, milliseconds, nanoseconds)

Which @keerthybreddy has been working on. During the work the question has arisen as to whether epochToDate is an appropriate name. Suggested alternatives we have so far are:

  • epochToDate()
  • unixToDate()
  • posixToDate()

I don't love any of these, but I'm torn between a more appropriate name and keeping the name short and simple ('timeSinceUnixEpochToDate' seems a bit OTT!)

Anyone else have suggestions?
@antoine2711 @wetneb @thadguidry

@thadguidry
Copy link
Member

thadguidry commented Apr 22, 2022

My opinion would be to keep it as epochToDate() because "epoch" is fairly common short terminology within Unix/Linux documentation and would be typically known coming from many places (including man pages in some cases) and https://en.wikipedia.org/wiki/Unix_time to be "time since Unix epoch". It's understood that the "epoch" is really an interval and that's represented in an encoding sequence. I would avoid the host and standards terms "unix" and "posix". (I would recant my opinion in previous #608 (comment))

For the documentation (and within the function help), I definitely would point to that Wikipedia link which clears up lots of confusion for most folks.

@wetneb
Copy link
Member

wetneb commented Apr 22, 2022

On my side (as stated on the pull request) I would prefer "unix" or "posix" terms since I consider that "epoch" does not refer to an interval but a specific origin date.

@thadguidry
Copy link
Member

thadguidry commented Apr 22, 2022

I'm just saying that the representation of the scalar value in Unix systems is commonly called just "Unix time". But universally, we would say "epoch time" or just "epoch".

I guess most users would be more familiar with "Unix time" (even though "epoch" is indeed a short term often seen as I stated).
So maybe unixToDate is fine... which is a short short version of saying unixTimeToDate which is not that bad an alternative to type out with GREL. Pick one @ostephens ;-)

To clarify my preference originally, it is because of this:

Sometimes, Unix time is mistakenly referred to as Epoch time, because Unix time is based on an epoch and because of a common misunderstanding that the Unix epoch is the only epoch (often called "the Epoch"[2]).[6][7][8]

and so "epoch" and "the Epoch" (January 1, 1970 (midnight UTC/GMT) not counting leap seconds in the case of Unix/Posix standard systems) are different things. https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_16

@antoine2711
Copy link
Member

(...) Although strictly Unix time is always in seconds since the Unix epoch.

Unix and Posix also aren't date formats so I'm not so keen on posixToDate or unixToDate for the same reason as @wetneb doesn't like epochToDate.

How about the middle ground, unixEpochToDate()?

Regards, Antoine

@wetneb
Copy link
Member

wetneb commented Apr 23, 2022

"UNIX epoch" refers even more unambigously to 1970-01-01 I would say, so rather not for me :)

@antoine2711
Copy link
Member

antoine2711 commented Apr 25, 2022

  1. unixTimeSecondsSinceEpochToDate() might cover it, but it seam a bit long to me.
  2. unixTimeSinceEpochToDate() could be an alternative, though still a bit long.
  3. Even unixTimeEpochToDate() could make it…
  4. And timeSinceUnixEpochToDate(), the one mentionned earlier by @ostephens
    Your take, @wetneb?

Regards, Antoine

@ostephens
Copy link
Member

If we're going to go for accuracy over brevity I think we may as well be precise, so out of those my vote goes to timeSinceUnixEpochToDate()

Note that the function takes an argument to specify seconds, milliseconds or microseconds - so it isn't strictly the unix time or seconds - it's time (in seconds, milliseconds or microseconds) since the Unix epoch converted to a Java date/time. So I think "unixTimeSeconds" is definitely wrong, and as Unix Time is expressed in seconds, technically unixTime is also wrong

@keerthybreddy
Copy link
Contributor

keerthybreddy commented May 1, 2022

After reading through the article that @thadguidry linked, I think I agree with @ostephens 's suggestion of timeSinceUnixEpochToDate() for the name. If no one else has any objections, I can change the name of the function to this.

keerthybreddy added a commit to keerthybreddy/OpenRefine that referenced this issue May 5, 2022
keerthybreddy added a commit to keerthybreddy/OpenRefine that referenced this issue May 7, 2022
keerthybreddy added a commit to keerthybreddy/OpenRefine that referenced this issue May 16, 2022
wetneb pushed a commit that referenced this issue May 16, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
@wetneb wetneb added this to the 3.6 milestone May 16, 2022
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jun 21, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
antoine2711 pushed a commit that referenced this issue Jul 6, 2022
* fixes #608 Conversion of Unix time to Refine datetime

* fixes #608 epochToDate function by removing extra comment

* fixes #608 removes unnecessary import statements

* fixes #608 updated test case

* fixes #608 removes unnecessary print statments

* fixes #608 added more test cases for complete code coverage

* fixes #608 function name, message, and variable name changes

* fixes #608 registers TimeSinceUnixEpochToDate() function

* fixes #608 changes made to getDescription message

Co-authored-by: Keerthy Reddy <keerthyreddy@Keerthys-Air.attlocal.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. grel The default expression language, GREL, could be improved in many ways! imported from old code repo Issue imported from Google Code in 2010 Priority: Medium Represents important issues that need to be addressed but are not urgent Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
None yet