
Leemoonsoo
Member

Leemoonsoo commented Jan 27, 2017

What is this PR for?

This PR implements a TableData abstraction that every interpreter can use to share data through the resource pool.

TableData.java - Definition of the TableData abstraction
TableDataProxy.java - Proxy object for TableData. This enables data access through the ResourcePool
InterpreterResultTableData - Converts a Table-type interpreter result into TableData. Each interpreter can implement its own TableData abstraction, for example JDBCTableData, DataFrameTableData, and so on.
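To make the roles above more concrete, here is a minimal Java sketch under stated assumptions. The TableData interface below is a deliberately simplified, hypothetical stand-in (column names plus a row iterator), not the interface added in this PR, and TextTableData is a hypothetical converter in the spirit of InterpreterResultTableData. It assumes the Table-type result text uses Zeppelin's %table layout: tab-separated columns, newline-separated rows, and the header on the first line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TableDataSketch {

  // Hypothetical, simplified stand-in for the TableData abstraction:
  // column names plus an iterator over rows. Not Zeppelin's actual interface.
  interface TableData {
    List<String> columns();
    Iterator<List<String>> rows();
  }

  // Hypothetical converter in the spirit of InterpreterResultTableData:
  // parses tab-separated %table text (first line = header) into TableData.
  static final class TextTableData implements TableData {
    private final List<String> columns;
    private final List<List<String>> rows = new ArrayList<>();

    TextTableData(String tableText) {
      String[] lines = tableText.split("\n");
      // Limit -1 keeps empty trailing cells instead of dropping them.
      columns = Arrays.asList(lines[0].split("\t", -1));
      for (int i = 1; i < lines.length; i++) {
        if (lines[i].isEmpty()) {
          continue; // skip blank lines between rows
        }
        rows.add(Arrays.asList(lines[i].split("\t", -1)));
      }
    }

    @Override
    public List<String> columns() {
      return columns;
    }

    @Override
    public Iterator<List<String>> rows() {
      return rows.iterator();
    }
  }

  public static void main(String[] args) {
    TableData data = new TextTableData("name\tage\nalice\t30\nbob\t25\n");
    System.out.println(data.columns());
    data.rows().forEachRemaining(System.out::println);
  }
}
```

An interpreter-specific implementation (such as the JDBCTableData example) would implement the same interface but read from its native result object instead of parsing text.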

What type of PR is it?

Feature

Todos

  • TableData
  • TableDataProxy
  • Unit tests

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-753

How should this be tested?

Unit tests included

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

Leemoonsoo force-pushed the ZEPPELIN-753 branch 2 times, most recently from 42aacf7 to 781284a on February 1, 2017 23:36
@1ambda
Member

1ambda commented Feb 2, 2017

@Leemoonsoo It's nice to have an abstracted table structure inside Zeppelin. Even from a quick glance, I think we can do a lot with this feature!

I will take a deeper look and comment soon.

@Leemoonsoo
Member Author

@1ambda do you have any comments?

@1ambda
Member

1ambda commented Feb 8, 2017

Sorry for the late reply. I was also looking at the recent resource-related commits to understand this fully.

  1. This PR is well abstracted and ships with enough unit tests.
  2. It makes it easy to share table-type interpreter results with other interpreters. The only thing an interpreter developer needs to do is create their own table data class (e.g. JDBCTableData or DataFrameTableData, as Leemoonsoo mentioned) from the interpreter-specific low-level result (e.g. JDBCResultSet, ...) and insert it into the pool.
  3. This also enables us to create a table data view for shared results in the pool.
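A minimal sketch of the sharing flow in point 2, assuming a hypothetical map-backed SimpleResourcePool rather than Zeppelin's real ResourcePool API: one interpreter stores its own TableData implementation under a name, and another reads it through a proxy that, like TableDataProxy, holds only a reference to the pool entry and resolves it on each access. All class names here are illustrative, not the PR's code.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TableDataPoolSketch {

  // Hypothetical, simplified TableData: column names plus a row iterator.
  interface TableData {
    List<String> columns();
    Iterator<List<String>> rows();
  }

  // Hypothetical in-memory stand-in for a resource pool, keyed by resource name.
  // Zeppelin's real ResourcePool API differs; this only illustrates the idea.
  static final class SimpleResourcePool {
    private final Map<String, Object> resources = new ConcurrentHashMap<>();

    void put(String name, Object value) {
      resources.put(name, value);
    }

    Object get(String name) {
      return resources.get(name);
    }
  }

  // Hypothetical in-memory TableData, e.g. what an interpreter might build
  // from its low-level result before sharing it.
  static final class ListTableData implements TableData {
    private final List<String> columns;
    private final List<List<String>> rows;

    ListTableData(List<String> columns, List<List<String>> rows) {
      this.columns = columns;
      this.rows = rows;
    }

    @Override public List<String> columns() { return columns; }
    @Override public Iterator<List<String>> rows() { return rows.iterator(); }
  }

  // Proxy in the spirit of TableDataProxy: keeps only the pool and the
  // resource name, and looks up the underlying TableData on every call.
  static final class PoolTableDataProxy implements TableData {
    private final SimpleResourcePool pool;
    private final String resourceName;

    PoolTableDataProxy(SimpleResourcePool pool, String resourceName) {
      this.pool = pool;
      this.resourceName = resourceName;
    }

    private TableData resolve() {
      Object value = pool.get(resourceName);
      if (!(value instanceof TableData)) {
        throw new IllegalStateException("No TableData named " + resourceName);
      }
      return (TableData) value;
    }

    @Override public List<String> columns() { return resolve().columns(); }
    @Override public Iterator<List<String>> rows() { return resolve().rows(); }
  }

  public static void main(String[] args) {
    SimpleResourcePool pool = new SimpleResourcePool();
    // "Producer" interpreter shares its result under a name.
    pool.put("sales", new ListTableData(
        List.of("region", "amount"),
        List.of(List.of("east", "10"), List.of("west", "20"))));
    // "Consumer" interpreter reads it only through the proxy.
    TableData shared = new PoolTableDataProxy(pool, "sales");
    System.out.println(shared.columns());
    shared.rows().forEachRemaining(System.out::println);
  }
}
```

Resolving on every call keeps the proxy cheap to pass around; the trade-off is that a missing or replaced entry is only detected when the data is actually read.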

@Leemoonsoo Now I have a question (maybe this is a future plan, but I'm just curious).

  • I can't imagine how one interpreter (e.g. JDBC) can use shared table results from another interpreter (e.g. spark.sql). I know we have APIs to get results from the pool, but getting a result doesn't mean using it (e.g. querying it) directly.

Thanks!

@Leemoonsoo
Member Author

Leemoonsoo commented Feb 8, 2017

@1ambda Thanks for taking a look.

Regarding querying TableData in the ResourcePool directly, it depends on each interpreter.
For example, SparkInterpreter can implement the DataSource API to load TableData from the ResourcePool. Then, when other interpreters store data as TableData in the ResourcePool, SparkInterpreter will be able to query that data.

The same approach can be applied to any other interpreter that supports some kind of pluggable data source layer.
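As a rough illustration of the consuming side, the sketch below plays the role of the "load TableData from the ResourcePool, then query it" step. It is not Spark's DataSource API: a plain Map stands in for the pool, the TableData interface is a simplified hypothetical one, and selectWhere is a hypothetical helper that evaluates a single equality filter, where a real interpreter would hand the rows to its own query engine.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class TableDataQuerySketch {

  // Hypothetical, simplified TableData: column names plus a row iterator.
  interface TableData {
    List<String> columns();
    Iterator<List<String>> rows();
  }

  // Hypothetical in-memory TableData implementation.
  static final class ListTableData implements TableData {
    private final List<String> columns;
    private final List<List<String>> rows;

    ListTableData(List<String> columns, List<List<String>> rows) {
      this.columns = columns;
      this.rows = rows;
    }

    @Override public List<String> columns() { return columns; }
    @Override public Iterator<List<String>> rows() { return rows.iterator(); }
  }

  // Hypothetical "data source" step: look up shared TableData by name and
  // keep only rows whose given column equals the given value.
  static TableData selectWhere(Map<String, TableData> pool, String name,
                               String column, String value) {
    TableData source = pool.get(name);
    if (source == null) {
      throw new IllegalArgumentException("No shared table named " + name);
    }
    int idx = source.columns().indexOf(column);
    if (idx < 0) {
      throw new IllegalArgumentException("Unknown column " + column);
    }
    List<List<String>> matched = new ArrayList<>();
    source.rows().forEachRemaining(row -> {
      if (value.equals(row.get(idx))) {
        matched.add(row);
      }
    });
    return new ListTableData(source.columns(), matched);
  }

  public static void main(String[] args) {
    // Pretend another interpreter shared this result under the name "users".
    Map<String, TableData> pool = Map.of("users", new ListTableData(
        List.of("name", "country"),
        List.of(List.of("alice", "KR"), List.of("bob", "US"), List.of("carol", "KR"))));
    TableData result = selectWhere(pool, "users", "country", "KR");
    result.rows().forEachRemaining(System.out::println);
  }
}
```

Because the filter only depends on the TableData interface, the same consumer code works no matter which interpreter produced the shared table.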

@Leemoonsoo
Member Author

@1ambda further discussion about "how an interpreter accesses data from other interpreters" can continue in https://issues.apache.org/jira/browse/ZEPPELIN-2019.

I'll merge this PR to master if there is no further discussion.

asfgit closed this in 9413323 on Feb 10, 2017