Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZEPPELIN-143: Git as a versioned notebook storage #497

Closed
wants to merge 13 commits into from

Conversation

bzz
Copy link
Member

@bzz bzz commented Dec 1, 2015

This is very basic implementation of the ZEPPELIN-143 at the backend.

It makes a local git repository our of your /notebook dir and commits a new revision for each save/update.

It does not:

  • add any remotes to the git repo. It is totally possible to do that manually though. It would be interesting to add this later, to be able to push the notebook to hostings like GH
  • have any GUI modifications. It is left as further work, to add the ability for a user to switch "versions" of the notebook, navigating between previous runs.

Feedback is very welcome!

@bzz
Copy link
Member Author

bzz commented Dec 1, 2015

\cc @Leemoonsoo @jongyoul @felixcheung @khalidhuseynov for a review

@bzz
Copy link
Member Author

bzz commented Dec 1, 2015

CI fails with something, that looks not very relevant to the changes in this PR. Can not reproduce on my local env too.

Results :

Failed tests: 
  NotebookTest.testAbortParagraphStatusOnInterpreterRestart:347 expected:<ABORT> but was:<RUNNING>

Tests run: 34, Failures: 1, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Zeppelin ........................................... SUCCESS [ 12.289 s]
[INFO] Zeppelin: Interpreter .............................. SUCCESS [01:18 min]
[INFO] Zeppelin: Zengine .................................. FAILURE [  8.411 s]

.....

Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.662 sec <<< FAILURE! - in org.apache.zeppelin.notebook.NotebookTest
testAbortParagraphStatusOnInterpreterRestart(org.apache.zeppelin.notebook.NotebookTest)  Time elapsed: 0.108 sec  <<< FAILURE!
java.lang.AssertionError: expected:<ABORT> but was:<RUNNING>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:144)
    at org.apache.zeppelin.notebook.NotebookTest.testAbortParagraphStatusOnInterpreterRestart(NotebookTest.java:347)

@khalidhuseynov
Copy link
Contributor

This CI problem should be solved by #468, may need to rebase for that

@@ -19,10 +19,27 @@ limitations under the License.
-->
### Notebook Storage

In Zeppelin there are two option for storage Notebook, by default the notebook is storage in the notebook folder in your local File System and the second option is S3.
Zeppelin a pluggable notebook storage mechanism with multiple implementations.
There are few Notebook storage options avaialble for a use,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have multiple storage simultaneously? Could we clarify in this doc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... multiple storage enabled simultaneously...

@felixcheung
Copy link
Member

looks good! thanks for working on this. looking forward to git push ;)

@jeffsteinmetz
Copy link
Contributor

Is there a document (or could one be added) that describes setting up the Git repo integration (git ssh or https path, connecting Zeppelin to a specific git repo, passwords, ssh keys, etc.)?

@bzz
Copy link
Member Author

bzz commented Dec 2, 2015

Guys, thanks for reviews, appreciate that a lot!

Will address each of them in next couple of days.

@felixcheung
Copy link
Member

@jeffsteinmetz I think this is committing to local only, no remote repo support yet

@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

@jeffsteinmetz @felixcheung well noted, this PR introduces automation to keep notebook dir inside the git repository in your local filesystem.

Right now, this repository does not have any 'remotes' setup, and Zeppelin does not push it anywhere, so there are no Zeppelin configuration needed (except documented property change).

It is possible and quite simple to sync this local repo manually by cd notebooks; git add remote ...; git push, as well as I can add this ability to Zeppelin itself, if people find it useful (in the separate PR).

@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

@khalidhuseynov thanks for looking into CI problem, that's a bummer!

This CI problem should be solved by #468, may need to rebase for that

It looks like it was merged 6 days ago, so current branch should be on top of it already, but just in case - I have synced with the latest master and it seems to pass! If that repeats - will file the jira issue with label 'flaky-test'.

@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

Have addressed all the reviews, please let me know if something is missing.

In 5d7ffea .close() lifecycle method was introduced for the NotebookRero to cleanup the resources that storage implementation might have occupied.
As this is an API change \cc @khalidhuseynov and @Leemoonsoo who worked on this before, as I recall, to take a quick look.

@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

New CI failure - cassandra interpreter tests (one more candidate for flaky-test label in JIRA)

[INFO] Zeppelin: Lens interpreter ......................... SUCCESS [  3.570 s]
[INFO] Zeppelin: Cassandra ................................ FAILURE [ 33.880 s]

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 21.414 sec <<< FAILURE! - in org.apache.zeppelin.cassandra.CassandraInterpreterTest
org.apache.zeppelin.cassandra.CassandraInterpreterTest  Time elapsed: 21.414 sec  <<< ERROR!
java.lang.ExceptionInInitializerError: null
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:269)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:183)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
    at info.archinnov.achilles.script.ScriptExecutor.executeScript(ScriptExecutor.java:58)
    at info.archinnov.achilles.embedded.AchillesInitializer.executeStartupScripts(AchillesInitializer.java:194)
    at info.archinnov.achilles.embedded.AchillesInitializer.initialize(AchillesInitializer.java:97)
    at info.archinnov.achilles.embedded.AchillesInitializer.initializeFromParameters(AchillesInitializer.java:60)
    at info.archinnov.achilles.embedded.CassandraEmbeddedServer.<init>(CassandraEmbeddedServer.java:76)
    at info.archinnov.achilles.junit.AchillesResource.initResource(AchillesResource.java:59)
    at info.archinnov.achilles.junit.AchillesResource.<init>(AchillesResource.java:52)
    at info.archinnov.achilles.junit.AchillesResourceBuilder.build(AchillesResourceBuilder.java:205)
    at org.apache.zeppelin.cassandra.CassandraInterpreterTest.<clinit>(CassandraInterpreterTest.java:62)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9267 (com.datastax.driver.core.OperationTimedOutException: [localhost/127.0.0.1:9267] Operation timed out))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:216)
    at com.datastax.driver.core.RequestHandler.access$900(RequestHandler.java:45)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

re-triggering CI build

@bzz bzz closed this Dec 3, 2015
@bzz bzz reopened this Dec 3, 2015
@bzz
Copy link
Member Author

bzz commented Dec 3, 2015

CI passes now, ready to be merged.

@khalidhuseynov
Copy link
Contributor

@bzz thanks for starting this feature! .close() is required in this case and is fine to add to API. LGTM

@Leemoonsoo
Copy link
Member

@bzz Tested and working really well. And the API looks fine.

Could you also take care zeppelin-distribution/src/bin_license/LICENSE for newly added dependency, jgit? jgit and it's transitive dependency looks like

+- org.eclipse.jgit:org.eclipse.jgit:jar:4.1.1.201511131810-r:compile
  +- com.jcraft:jsch:jar:0.1.53:compile
  +- com.googlecode.javaewah:JavaEWAH:jar:0.7.9:compile
  +- org.apache.httpcomponents:httpclient:jar:4.3.6:compile
  \- org.eclipse.jdt:org.eclipse.jdt.annotation:jar:1.1.0:compile

@bzz
Copy link
Member Author

bzz commented Dec 7, 2015

@Leemoonsoo thank you for kind reminder! Will do it and ping back

@bzz
Copy link
Member Author

bzz commented Dec 7, 2015

Thank all participants for kind reviews!

@Leemoonsoo could you check 468a858 and let me know if you think that is enough?

@Leemoonsoo
Copy link
Member

Thanks @bzz Looks good to me!

@bzz
Copy link
Member Author

bzz commented Dec 8, 2015

Thank you!
Merging if there is no more discussion

@asfgit asfgit closed this in b5e2e62 Dec 8, 2015
@bzz bzz deleted the add-git-notebook-repo branch December 8, 2015 02:01
@p0wl
Copy link

p0wl commented Jul 26, 2017

Thanks for the feature!

Any news on automatic github integration? Or a guidline how to setup github sync?

@herval
Copy link

herval commented Jul 26, 2017

@p0wl I started remote git support here: https://github.com/herval/zeppelin/tree/remote-git-support - it's a WIP, if someone wants to take over (it would work w/ github or any other git repo)

@Gauravshah
Copy link

@herval were you able to finish it ?

@herval
Copy link

herval commented Nov 23, 2017

Haven't had the time to look into that further. TBH, I don't think storing notebooks as text files is a good strategy - not in the environment we use Zeppelin, anyway. We're likely moving towards some sort of centralized storage (maybe a KV storage), in order to be able to scale horizontally

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants