Add support for Google Cloud Bigtable in SQLAlchemy #2762

Closed
arita37 opened this issue Nov 22, 2016 · 11 comments
Labels
api: bigtable Issues related to the Bigtable API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@arita37

arita37 commented Nov 22, 2016

It would be very useful to have Google Cloud Bigtable support in Python's SQLAlchemy.

Cloud MySQL seems fine, since SQLAlchemy supports it. However, there does not yet seem to be any SQLAlchemy support for NoSQL tables such as Google Cloud Bigtable.

Also, Google BigQuery should have an interface compliant with SQLAlchemy.
#2434

It would really help in developing new applications on Bigtable.

@daspecster daspecster added the api: bigtable and type: feature request labels Nov 22, 2016
@mbrukman mbrukman changed the title from "Support of SQLAlchemy for Google Big Table" to "Add support for Google Cloud Bigtable in SQLAlchemy" Nov 23, 2016
@mbrukman
Contributor

mbrukman commented Nov 23, 2016

The integration between SQLAlchemy and Google Cloud Bigtable would have to be done in SQLAlchemy. I was going to file a bug on SQLAlchemy on your behalf, but it looks like you've already filed a feature request, which was closed as wontfix:

unfortunately Google bigtable is non-relational and non-SQL, SQLAlchemy does not have support for key/value stores.

and a previous email thread on the sqlalchemy@ list about adding support for NoSQL databases like HBase (which is very similar to Bigtable) ended up without any answers.

Thus, I am afraid we won't be able to help you use SQLAlchemy together with Bigtable.


That said, as an alternative, consider using Apache Hue, which works with Apache HBase, and can be made to work similarly with Bigtable. We don't have a simple how-to for connecting Apache Hue to Cloud Bigtable yet, but I imagine it can be done in one of two ways:

  1. Apache Hue -> (a: Thrift API) -> Apache HBase Thrift proxy -> (b: gRPC API) -> Google Cloud Bigtable

    The first connection (a) should work out-of-the-box for Hue and HBase. The second connection (b) can use the Google Cloud Bigtable Java client for HBase. This is not as complicated as it looks, although there are several parts to connect together to make it all work.

  2. Apache Hue -> (gRPC API) -> Google Cloud Bigtable

    This could be done using the Google Cloud Bigtable Java client for HBase, but it requires Apache Hue to use the HBase 1.x API. I believe that is not yet the case (I think it uses the 0.9x API and/or Thrift), so I would recommend following option (1) above for now.

Hope this is helpful.

@arita37
Author

arita37 commented Nov 24, 2016

OK, thanks for the SQLAlchemy feedback.

What about integrating Google Cloud Bigtable with Blaze (Python)?
It uses an Impala connector like this one:

PY Impala
https://github.com/cloudera/impyla

Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines.
Features
HiveServer2 compliant; works with Impala and Hive, including nested data

Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python

Works with Kerberos, LDAP, SSL
SQLAlchemy connector
Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience
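
For readers unfamiliar with DB API 2.0 (PEP 249), here is a minimal sketch of the connect / cursor / execute / fetch pattern that the feature list refers to. It uses the standard library's sqlite3 module purely as a stand-in so that it runs without a cluster; the `events` table and the `fetch_rows` helper are hypothetical names made up for illustration. With impyla, the connection would instead come from that library's own `connect` function, and the rest of the pattern should look much the same.

```python
import sqlite3


def fetch_rows(conn, sql):
    """Run a query through any PEP 249 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# sqlite3 is only a stand-in for a HiveServer2-backed connection.
conn = sqlite3.connect(":memory:")

# Set up a small hypothetical table so the query has something to read.
setup = conn.cursor()
setup.execute("CREATE TABLE events (id INTEGER, name TEXT)")
# The "?" placeholder style is sqlite3's; PEP 249 lets each driver pick its own.
setup.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b")])
setup.close()
conn.commit()

rows = fetch_rows(conn, "SELECT id, name FROM events ORDER BY id")
print(rows)
```

Because `fetch_rows` only touches `cursor()`, `execute()`, `fetchall()`, and `close()`, it does not depend on which PEP 249 driver produced the connection.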

@mbrukman
Contributor

Integrating Apache Impala with Cloud Bigtable certainly makes sense, and should enable the other integrations you are looking for (Blaze in particular).

Looks like Apache Impala already works with Apache HBase so, in theory, it should also be compatible with Cloud Bigtable via the Cloud Bigtable HBase-compatible Java client library.

Please feel free to try using the Cloud Bigtable Java client for HBase with Apache Impala (and Apache Hive, since it provides the metadata store for Impala) and if you run into any issues, please let us know and also file a bug on Apache Impala and/or Apache Hive, as appropriate.

@arita37
Author

arita37 commented Nov 28, 2016 via email

@mbrukman mbrukman self-assigned this Dec 6, 2016
@mbrukman
Contributor

mbrukman commented Dec 6, 2016

@arita37, I think we're talking about the same thing, but perhaps I was unclear. Let me clarify what I think you're looking for and how to accomplish your goal.

As I understand it (please correct me if I'm wrong), you would like to connect the following set of components:

Blaze → Impyla → Apache Impala → Google Cloud Bigtable

What I am suggesting is that, to accomplish the last connection (Impala → Bigtable), since Impala is written in Java and integrates with Apache HBase, you need to:

  1. download the Cloud Bigtable HBase-compatible Java client library
  2. modify /etc/impala/conf/hbase-site.xml as described on this page

You don't need to write any Java code to do this, just change configuration in an XML file. We are not suggesting that you should switch from Python to Java for your development.
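
As a rough sketch of the kind of XML change involved, the snippet below generates an `hbase-site.xml` body from a Python dict. The property names, the connection class name, and the `my-project` / `my-instance` values are assumptions for illustration only; they vary with the version of the Cloud Bigtable HBase client, so take the real values from the page referenced in step 2 rather than from this sketch.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(properties):
    """Render a dict of settings as an HBase-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# ASSUMPTION: these keys and the class name are illustrative placeholders;
# check the client library's documentation for the ones your version expects.
settings = {
    "hbase.client.connection.impl":
        "com.google.cloud.bigtable.hbase1_x.BigtableConnection",
    "google.bigtable.project.id": "my-project",
    "google.bigtable.instance.id": "my-instance",
}

xml_text = build_hbase_site(settings)
print(xml_text)
```

The output would be merged into /etc/impala/conf/hbase-site.xml by hand; the script only shows the shape of the entries, not a full deployment step.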

I hope this helps. Let us know if you run into any issues with this setup, configuration, or performance.

@mbrukman
Contributor

@arita37, we recently launched support for BigQuery to query data stored in Cloud Bigtable, and BigQuery supports SQL. It removes the need to run Impala or Hive or any other system, so anything that can already talk to BigQuery can use this (read-only) bridge to query data in Cloud Bigtable.

Would this address your use case?
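
To make the BigQuery route concrete, here is a small sketch of what a read-only query could look like from Python. It assumes a BigQuery external table has already been defined over the Bigtable table, that it exposes a `rowkey` column, and that the client object offers `query(sql).result()` as the google-cloud-bigquery client does. The table name `my_dataset.bigtable_events`, the `list_row_keys` helper, and the fake client are all hypothetical; the fake exists only so the sketch runs without credentials.

```python
def list_row_keys(client, table, limit=10):
    """Return up to `limit` row keys from a Bigtable-backed BigQuery table."""
    sql = "SELECT rowkey FROM `{}` LIMIT {}".format(table, int(limit))
    return [row["rowkey"] for row in client.query(sql).result()]


class _FakeJob:
    """Stand-in for a query job: result() yields dict-like rows."""

    def __init__(self, rows):
        self._rows = rows

    def result(self):
        return iter(self._rows)


class FakeClient:
    """Stand-in for a BigQuery client; it records the SQL it was given."""

    def __init__(self, rows):
        self._rows = rows
        self.last_sql = None

    def query(self, sql):
        self.last_sql = sql
        return _FakeJob(self._rows)


# With real credentials, a google-cloud-bigquery Client would replace FakeClient.
client = FakeClient([{"rowkey": "user#1"}, {"rowkey": "user#2"}])
keys = list_row_keys(client, "my_dataset.bigtable_events", limit=2)
print(keys)
```

Since the bridge is read-only, the sketch stops at SELECT; writes would still go through a Bigtable client.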

@mbrukman mbrukman removed their assignment Mar 20, 2017
@mbrukman
Contributor

@daspecster, @dhermes -- not sure why it says I removed my assignment, I did not do that (only left a comment) and had no intention of doing this. However, I am now unable to reassign this issue to myself (looks like I don't have permission to do so). Please feel free to assign it back to me. Thanks!

@lukesneeringer
Contributor

I think you lost your assignment because you are not on a team with write access to this repository. We did a pretty thorough permissions scrub a few weeks ago.

Poke me on Hangouts if you actively need write access; otherwise, feel free to just treat this issue as if it is yours and send a pull request.

@mbrukman
Contributor

@lukesneeringer – I don't have a strong need for repo-wide write access; I just wanted to keep this assigned to me for clarity and tracking purposes. There isn't a specific PR to send at this time. If you have repo write access, maybe you can assign it to me?

@dhermes
Contributor

dhermes commented Mar 21, 2017

From Assigning issues and pull requests to other GitHub users:

If you have write access to a repository, you can assign issues and pull requests to yourself, collaborators on personal projects, or members of your organization with read permissions on the repository.

@lukesneeringer, I was under the impression that you needed write access to be assigned, but it looks like read access is sufficient?

@lukesneeringer lukesneeringer added the priority: p2 label Apr 19, 2017
@lukesneeringer lukesneeringer removed the priority: p2 label Aug 11, 2017
@lukesneeringer
Contributor

lukesneeringer commented Aug 11, 2017

I think at this point, I am going to close this issue. The real need now is to integrate BigQuery with SQLAlchemy, which is tracked in #3023.
