Add support for a python client #1744

Open
philippefutureboy opened this issue Jan 10, 2021 · 10 comments
Labels
enhancement New feature proposal help wanted Community contributions are welcome.

Comments

@philippefutureboy
Contributor

philippefutureboy commented Jan 10, 2021

Description

Cube.js would greatly benefit from having a Python client to ease adoption by the data science community. I'm opening this issue because I am interested in championing the development of such a client, starting at the end of February and delivering by the end of March at the latest.

Python is a different ecosystem from the JS ecosystem, and as such I'd suggest that a redesign of the API be considered to better integrate in the ecosystem/be more pythonic. As for what these changes imply, I do not have a clear opinion yet. Here are a few places to get started:

  1. Refactor to support Pythonic syntax and constructs. In Python, nearly everything is customizable, which gives the language great potential for powerful APIs (and there are many examples of them). Every class in Python implements a subset of the data model hooks, which give objects of that class specific behaviors such as operator logic (addition, multiplication, gte), key access (object[key], a bit like Proxy in JS), etc. Leveraging these would be key to a successful integration.
  2. Refactor to integrate properly with python's approach to concurrency (none vs asyncio, futures, etc).
  3. Integrate implementation with the data science ecosystem libraries (numpy, pandas, matplotlib, plotly, etc.).
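
As a rough illustration of point 1, here is a minimal sketch of how comparison and key-access hooks could be used to build Cube.js filters. The `Cube`, `Member`, and `Filter` classes are hypothetical and not part of any existing package; only the filter shape (`member`, `operator`, `values`) follows the Cube.js query format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Filter:
    """One filter entry of a Cube.js query (hypothetical helper)."""

    member: str
    operator: str
    values: tuple

    def to_dict(self) -> dict:
        # Cube.js expects filter values as a list of strings.
        return {
            "member": self.member,
            "operator": self.operator,
            "values": [str(v) for v in self.values],
        }


class Member:
    """A measure or dimension name such as "Orders.amount" (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        self.name = name

    # Data model hooks: comparisons return Filter objects instead of booleans.
    def __eq__(self, other: Any) -> Filter:  # type: ignore[override]
        return Filter(self.name, "equals", (other,))

    def __ge__(self, other: Any) -> Filter:
        return Filter(self.name, "gte", (other,))

    def __lt__(self, other: Any) -> Filter:
        return Filter(self.name, "lt", (other,))


class Cube:
    """Entry point that resolves members through key access (hypothetical helper)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __getitem__(self, key: str) -> Member:
        return Member(f"{self.name}.{key}")


orders = Cube("Orders")
amount_filter = orders["amount"] >= 100
print(amount_filter.to_dict())
```

Whether overloading `==` like this is desirable is an open design question: it reads well in queries, but it means `Member` objects can no longer be compared or hashed in the usual way.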

I am no Python expert; I only have ~6-8 months of experience in a data engineering, scripting & data science context, but I'd be happy to contribute to the best of my knowledge to the design of this API. To make this a resounding success, however, I'd suggest getting the opinion of some experienced Python data scientists on the API they would like to use, since they are your future customers.

It would also be possible (minus a few exceptions) to simply port the API as is, but I think there's much more power in integrating with the language and ecosystem. In the end, it's the team's prerogative to make a choice and I'll be happy to contribute either way :)

Cheers!
Philippe

@paveltiunov
Member

@philippefutureboy Hey Philippe! I believe it'll be a great contribution! I think the community can benefit if you prepare a proposal of what you're going to implement, and we'll try to draw some attention to it. What do you think?

@paveltiunov paveltiunov added the enhancement New feature proposal label Jan 12, 2021
@igorlukanin
Member

@philippefutureboy Hey! Thanks for sharing! I'd really like to learn more about the possible use cases for the data science community you're thinking about.

@philippefutureboy
Contributor Author

Hey @paveltiunov & @igorlukanin!

@paveltiunov: Noted! I'll do some research on my side to see what I can come up with.

@igorlukanin: I don't have a clear use case for the data science community. However, to me Cube.js is a marvelous tool to increase accessibility to your data, and I see it as a consumer/provider that inserts itself at some point in a data pipeline. At Arthur, Cube.js is our second-to-last layer before we present data to the client - we use a layer of Python on top to reconcile different data sources and prep the final data that allows us to generate our reports. To us the primary value of Cube.js is to provide an API that is really easy to interact with and that abstracts away the complexity of SQL via reusable, templatable, approachable queries. Think of it like an ORM for OLAP and analytics. Hence, our primary use for a client like this one is interoperability with the Python ecosystem to have a smooth transition to pandas and friends.

@philippefutureboy
Contributor Author

philippefutureboy commented Feb 17, 2021

In order to better design the Python client to match the needs of the Python data science/analytics/engineering community, I will be starting a few threads in major Python data projects. I'll update this message with links to the discussions in the various communities as I create them:

Pandas/PyData: https://groups.google.com/g/pydata/c/YpJVfI5HT20/m/9ik0iH1HBgAJ
Airflow: https://app.slack.com/client/TCQ18L22Z/CCY359SCV/thread/CCY359SCV-1609933583.050900
TensorFlow: Dropped
Plotly: plotly/dash#1573

Please come and say hi if you are coming from one of these threads! :) 👋
Also feel free to suggest other projects whose communities might be willing to contribute here :)

@philippefutureboy
Contributor Author

philippefutureboy commented Mar 14, 2021

Hey there!
Little update on my side:

I've looked into the ecosystem and your @cubejs/client-core package code a bit, and here are my conclusions so far:

  • I think that the google.cloud.bigquery client library could be a very good template to get started from
  • I'd most likely implement ResultSet as a wrapper around a pandas DataFrame
  • The concurrency/async model in python is fairly different from that of JS, which will require some modification in how the library's API is exposed. I need to investigate more how the concurrent.futures, asyncio stdlib packages and google.cloud.bigquery client library manage asynchronicity.
  • My understanding is that the google.cloud.bigquery client library manages asynchronicity by first issuing a call to the BQ API, then waiting until the .result() method is called to fetch the answer. The initial call is non-blocking, while the .result() call is blocking.
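
The "non-blocking call, blocking .result()" pattern described above can be sketched with the standard library's `concurrent.futures`. This is only an illustration under assumptions: `PendingResult`, `Client`, and the injected `fetch` callable are hypothetical names, and the sketch says nothing about how the BigQuery library is implemented internally.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


class PendingResult:
    """Handle for a query running in the background (hypothetical name)."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def done(self) -> bool:
        # Non-blocking check of whether the rows are ready.
        return self._future.done()

    def result(self, timeout: Optional[float] = None) -> list:
        # Blocks until the rows are available or the timeout expires.
        return self._future.result(timeout=timeout)


class Client:
    """Client whose query() returns immediately (hypothetical name)."""

    def __init__(self, fetch: Callable[[dict], list], max_workers: int = 4) -> None:
        # `fetch` is whatever actually talks to the API; injected for testability.
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def query(self, query: dict) -> PendingResult:
        # Schedule the fetch on a worker thread and hand back a handle.
        return PendingResult(self._pool.submit(self._fetch, query))

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def fake_fetch(query: dict) -> list:
    # Stand-in for a real API call.
    return [{"measure": name, "value": 1} for name in query["measures"]]


client = Client(fake_fetch)
pending = client.query({"measures": ["Orders.count"]})  # returns immediately
rows = pending.result(timeout=5)                        # blocks here
client.close()
print(rows)
```

An `asyncio` variant would expose `await client.query(...)` instead; which of the two fits the data science workflow better is still to be decided.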

These led to the following questions for the team:

  • Does Cube.js support an initial non-blocking call then a follow-up polling call to get the result?
  • Does Cube.js support loading the resultset via a cursor, row by row?

I'll read up on the async model as well as the impl of the BQ library this week and then the API should start to form.

Thanks!

@qianxuanyon

I hope there will be an implementation similar to this one:
https://github.com/DataBrewery/cubes

@qianxuanyon

Data processing is generally done by back-end developers, who may not be very familiar with JS.

The simplest way to support different languages would be a wrapper around the REST API.

At present, Python is the more popular language in the data field among data scientists, data analysts, developers, etc., so it would be good if Cube.js could enter this ecosystem.
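
A thin REST wrapper along those lines might look like the sketch below. The `CubeRestClient` class and the `http_get` helper are hypothetical names. The sketch assumes the `/load` endpoint under a base URL such as `http://localhost:4000/cubejs-api/v1`, a token passed in the `Authorization` header, and a `{"error": "Continue wait"}` body while a long-running query is still being processed; these should be checked against the REST API documentation for the Cube.js version in use.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable


def http_get(url: str, token: str) -> dict:
    """Default transport: GET the URL with the API token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": token})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class CubeRestClient:
    """Minimal wrapper around the REST API (hypothetical, not an official client)."""

    def __init__(
        self,
        base_url: str,
        token: str,
        transport: Callable[[str, str], dict] = http_get,
        poll_interval: float = 1.0,
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._token = token
        self._transport = transport  # injectable so the client can be tested offline
        self._poll_interval = poll_interval

    def load_url(self, query: dict) -> str:
        # The query is sent as a JSON-encoded `query` URL parameter.
        params = urllib.parse.urlencode({"query": json.dumps(query)})
        return f"{self._base_url}/load?{params}"

    def load(self, query: dict, max_attempts: int = 10) -> list:
        url = self.load_url(query)
        for _ in range(max_attempts):
            body = self._transport(url, self._token)
            if body.get("error") == "Continue wait":
                # Assumed behavior: the query is still running, so poll again.
                time.sleep(self._poll_interval)
                continue
            if "error" in body:
                raise RuntimeError(body["error"])
            return body["data"]
        raise TimeoutError(f"no result after {max_attempts} attempts")
```

The returned list of row dictionaries can be passed straight to `pandas.DataFrame(rows)` when pandas is available.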

@ivan-vdovin ivan-vdovin added the help wanted Community contributions are welcome. label Jul 28, 2022
@github-actions

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you.
If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines.
You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

@oleg-savko

Any news about the Python client?

It seems like Cube could be very helpful for data science and data engineering, and could be used as a feature store.

@paveltiunov
Member

As an alternative, the SQL API (https://cube.dev/docs/backend/sql/#sql-api) can be tried instead. It should already work with SQLAlchemy.
