This repository was archived by the owner on Feb 6, 2024. It is now read-only.

Adds graphic for time-travel section of splash page #19

Merged
rdblue merged 4 commits into apache:main from samredai:time_travel
Jan 31, 2022

Conversation

@samredai
Contributor

@samredai samredai commented Jan 26, 2022

This adds a timeline graphic for the Time Travel section feature description on the splash page.

Screen Shot 2022-01-26 at 8 03 10 AM

@RussellSpitzer
Member

Although I like the intent here, I'm not sure we want to call out "rollback_to_timestamp" as the key use of time travel. I think we should center querying as of a certain time as "time travel", while rollback is more of a maintenance procedure. I feel like the graphic kind of implies that the key way to time travel is to roll back the whole table.

Also are we sure we want to add in a Spark specific command here?

@rdblue
Contributor

rdblue commented Jan 26, 2022

I have the same reaction as @RussellSpitzer. I like the visualization of snapshots, but I don't consider rollback to be time travel. Rollback alters the state of the table, while time travel actually reads older versions.

The trouble here is that there isn't a good SQL demonstration of time travel yet. We've added table names for time travel in 3.2, but we're waiting for Spark 3.3 to get the AS OF TIMESTAMP and AS OF VERSION syntax. Maybe we should use those anyway? Or maybe we should use Spark's dataframe syntax to demo time travel right now:

spark.read.option("as-of-timestamp", System.currentTimeMillis() - ONE_DAY_MS).load("db.table")

<ul class="timeline">

<!-- Item 1 -->
<li>
Contributor

Did you intend to use tabs instead of spaces? It makes this harder to read.

@samredai
Contributor Author

I'll update it to use the Spark line @rdblue provided. On @RussellSpitzer's point about not using a Spark-specific command: that has me thinking that maybe we should eventually just use SQL everywhere and not name any engine at all, so we don't give the impression that these features are fundamentally tied to one engine. They can be implemented in any engine, even if they haven't been yet. The question of which engines have yet to implement specific features feels like a distraction from describing what the features actually are.

@samredai
Contributor Author

samredai commented Jan 27, 2022

Changed this to be a termynal example. Does the sentence "Version rollback allows users to quickly correct problems by resetting tables to a good state." still fit in here or is it better to just remove it completely?

spark-time-travel.mp4

@rdblue
Contributor

rdblue commented Jan 27, 2022

I like the sentence about rollback. I'd probably update the heading to "Time travel and rollback"

@rdblue
Contributor

rdblue commented Jan 27, 2022

On the termynal example, I think there are a couple things we can do to improve it. For example, we could first do spark.read.load("nyc.taxis").count() and show like 2,000,000 or something. Then we could do spark.read.option("as-of-timestamp", 1526266800000L).load("nyc.taxis").count() and show a lower number. I think that's good to show that the data is changing, rather than relying on variable names. And I also think my earlier suggestion to use System.currentTimeMillis() - ONE_DAY_MS is a bad idea because it makes the code look way too long and complicated.
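
A minimal sketch of the count comparison described above, not the code that ended up on the splash page. It assumes a Spark session with the Iceberg runtime on the classpath and a catalog in which the table name resolves. The object and method names (`TimeTravelCounts`, `counts`) are hypothetical, and `format("iceberg")` is added here for explicitness.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper: names are illustrative and not part of this PR.
object TimeTravelCounts {
  // Returns (row count of the current table state, row count as of `asOfMs`).
  // `asOfMs` is an epoch timestamp in milliseconds.
  def counts(spark: SparkSession, table: String, asOfMs: Long): (Long, Long) = {
    // Plain read: sees the table's current snapshot.
    val current = spark.read.format("iceberg").load(table).count()

    // Time-travel read: sees the snapshot that was current at `asOfMs`.
    // This only reads older data; it does not change the table.
    val past = spark.read
      .format("iceberg")
      .option("as-of-timestamp", asOfMs)
      .load(table)
      .count()

    (current, past)
  }
}
```

In a Spark shell this could be called as `TimeTravelCounts.counts(spark, "nyc.taxis", 1526266800000L)`. The timestamp needs the `L` suffix because the literal does not fit in an `Int`.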

@samredai
Contributor Author

I like the sentence about rollback. I'd probably update the heading to "Time travel and rollback"

Done!

@samredai
Contributor Author

On the termynal example, I think there are a couple things we can do to improve it. For example, we could first do spark.read.load("nyc.taxis").count() and show like 2,000,000 or something. Then we could do spark.read.option("as-of-timestamp", 1526266800000L).load("nyc.taxis").count() and show a lower number. I think that's good to show that the data is changing, rather than relying on variable names. And I also think my earlier suggestion to use System.currentTimeMillis() - ONE_DAY_MS is a bad idea because it makes the code look way too long and complicated.

Updated this example!
time_travel_example

<span data-ty="input" data-ty-cursor="▋" data-ty-prompt="scala>">val NOW=System.currentTimeMillis()</span>
<span data-ty="input" data-ty-cursor="▋" data-ty-prompt="scala>">(spark</span>
<span data-ty="input" data-ty-cursor="▋" data-ty-prompt="">.read</span>
<span data-ty="input" data-ty-cursor="▋" data-ty-prompt="">.option("as-of-timestamp", NOW_MS - ONE_DAY_MS)</span>
Contributor

Since this line still wraps, is it possible to make a constant above? It would be better to wrap the val NOW line:

scala> val TUESDAY = System.currentTimeMillis() - ONE_DAY_MS;
scala> ...
>.option("as-of-timestamp", TUESDAY)
>...

We could call it something specific but short (Tuesday works for me) or we could call it YESTERDAY?
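
A small sketch of the constant approach suggested above, using only the JDK so it runs without Spark. The object and method names (`TimeTravelConstants`, `yesterday`, `describe`) are hypothetical; `ONE_DAY_MS` follows the name used in the thread, whose definition is not shown there.

```scala
import java.time.Instant
import java.util.concurrent.TimeUnit

// Hypothetical helper: names are illustrative and not part of this PR.
object TimeTravelConstants {
  // One day expressed in milliseconds.
  val ONE_DAY_MS: Long = TimeUnit.DAYS.toMillis(1)

  // Epoch-millisecond timestamp exactly 24 hours before `nowMs`.
  def yesterday(nowMs: Long = System.currentTimeMillis()): Long =
    nowMs - ONE_DAY_MS

  // Renders an epoch-millisecond value as an ISO-8601 UTC instant,
  // which helps when checking what a hard-coded literal points at.
  def describe(epochMs: Long): String =
    Instant.ofEpochMilli(epochMs).toString
}
```

With this in scope, `val YESTERDAY = TimeTravelConstants.yesterday()` gives a short name that can be passed to `.option("as-of-timestamp", YESTERDAY)` without the line wrapping.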

Contributor

Oh, this also fixes the slight problem that you used both NOW and NOW_MS.

Contributor Author

How about this, which gets rid of all line wraps?
time_travel_example

@rdblue rdblue merged commit 651bd25 into apache:main Jan 31, 2022
@samredai samredai deleted the time_travel branch February 4, 2022 19:03

3 participants