-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support of JOIN statement for queries #875
Comments
http://airbnb.io/caravel/faq.html#can-i-query-join-multiple-tables-at-one-time Can I query/join multiple tables at one time?Not directly no. A Caravel SQLAlchemy datasource can only be a single table or a view. When working with tables, the solution would be to materialize a table that contains all the fields needed for your analysis, most likely through some scheduled batch process. A view is a simple logical layer that abstract an arbitrary SQL queries as a virtual table. This can allow you to join and union multiple tables, and to apply some transformation using arbitrary SQL expressions. The limitation there is your database performance as Caravel effectively will run a query on top of your query (view). A good practice may be to limit yourself to joining your main large table to one or many small tables only, and avoid using GROUP BY where possible as Caravel will do its own GROUP BY and doing the work twice might slow down performance. Whether you use a table or a view, the important factor is whether your database is fast enough to serve it in an interactive fashion to provide a good user experience in Caravel. |
Though we may add support for joinable tables in the future it's not on the short term roadmap, it would only be for many to one left joins on static joining criteria. |
@mistercrunch I've seen this FAQ answer, of course. But it looks more confusing than helpful for me, because I understood it in a way, that I can create new views inside Caravel somehow rather than directly inside DB. OK, thank you. |
@ololobus @mistercrunch This is actually possible at the moment through some recently added functionality added for the work on additional datetimes. This method is way faster than using views (unless you're able to materialise), due to how sqlalchemy executes the query. Head into your table setup and click the edit button: Go to the "List Table Columns" Tab and click add give your column a name (in your case,
Save and then you should be able to group by this column. What you're effectively doing it added a composite column into the sqlalchemy model which is then injected directly into the query at runtime. |
@alanmcruickshank It worked, thank you! I've take a look on the produced SQL query in this case: SELECT (select languages.code from languages where languages.id =
documents.language_id) AS language_code, COUNT(*) AS count
FROM documents JOIN (SELECT (select languages.code from languages where languages.id =
documents.language_id) AS language_code__
FROM documents
WHERE created_at >= '2016-07-26 23:09:12.000000' AND created_at <= '2016-08-02 23:09:12.000000' GROUP BY (select languages.code from languages where languages.id =
documents.language_id) ORDER BY COUNT(*) DESC
LIMIT 50) AS anon_1 ON (select languages.code from languages where languages.id =
documents.language_id) = language_code__
WHERE created_at >= '2016-07-26 23:09:12.000000' AND created_at <= '2016-08-02 23:09:12.000000' GROUP BY (select languages.code from languages where languages.id =
documents.language_id) ORDER BY count DESC
LIMIT 50000 It results in a very tricky Anyway, I'm glad that I can achieve the same result directly from Caravel. Thank you again. P.S. It would be great to add such an example inside this FAQ answer, because it's frequent case, I guess :) |
I found this issue when I was searching for multiple tables support. In my application, I need to perform data analysis based on raw tables and then display it through superset (hopefully). My raw database used many small tables, while new table/view to be displayed could be stored in a materialized table for view purpose if needed. But I'm not sure if it's a proper use case for superset anymore. |
Using a view, materialized or not, is how you'd make that happen in Superset. It is a proper use case. |
Thanks a lot @mistercrunch Not sure I fully understand this. By "view" do you mean a class in views.py or just a SQL expression? |
I meant it in the database/SQL sense of the term, as in |
Appreciate it. Now I understand. |
@alanmcruickshank Not work at all.
I added |
@gclsoft
isn't supported by all database engines. Just create a view! |
@gclsoft - what database engine are you using? The other thing that might help you is since this original question, the release of the SQL Lab feature has made it much easier to create tables in superset which point at joined tables within your database. |
@alanmcruickshank Hi there, I'm trying to explore what you are saying (use SQL Lab to create tables based on sql statement, and so also table made of joins), but I've not been very successful to find a well defined process to do that. Only thing I figured out is to use SQL Lab -> run the query -> click on "visualize" ->this create "magically" a new datasource, which seems more a workaround than a proper process to create a new datasource from a query. |
@lilloraffa that's it, what do you have in mind for a more structured way to handle it? |
@xrmx well, since tables are created and managed in Sources->Tables, I would expect a functionality there to let me create tables based on sql statements. I mean, letting people create table using sql statement from a "visualize" function in SQL Lab seems more like a workaround to me. |
@lilloraffa i think the idea is that the people will explore the sql they need as datasource as they do for the data. And they don't have access to to a sql shell :) Your use case is that you already know the sql and have another shell to try it and you just want to copy and paste it in a new table right? That should be doable. |
@xrmx that is correct, in general I would like to have the flexibility to define table with sql statement so to join more table, even coming from different database. |
@lilloraffa it shouldn't take much to implement, feel free to open a PR :) |
The tricky part is getting the column metadata. In the To do this in the "Add Table" flow, we'd have to run a quick query to do this, which may time out or whatever and again we're just guessing the data types based on the result set which isn't ideal. Seems like it would be much easier to have another flow that would run a Though then we may need some way to allow users to |
Dear All, I have a table i joined in SQL Lab in Superset but only one of the Table is displaying. I have used LEFT and Right Join but same thing.. Any help please. |
@olokedeomotayo you already filed an issue, no need to repeat yourself here please. |
Does superset support |
Still no support for joins at the moment and in the near future. We still tell people to create a view and use that as a source. Many more advanced dbs with good query optmizers will in fact omit the join from the plan when left-joining on a field that has a unique constraint and no fields from the remote table are referenced in the query. My point being that there may be no extra cost for unused joins in views in some cases. |
Does superset support left join now? Instead of using views or only one table? |
seems still not support after my 1 hour research |
by this can we join table from multiple databases ? |
hey were you able to do that join table from multiple databases? |
@blackhawk28 |
… TS (apache#875) * chore: add types to all control panels * move nvd3 shared section to individual viz
… TS (apache#875) * chore: add types to all control panels * move nvd3 shared section to individual viz
… TS (apache#875) * chore: add types to all control panels * move nvd3 shared section to individual viz
… TS (apache#875) * chore: add types to all control panels * move nvd3 shared section to individual viz
Are there any plans to add JOINs support?
Here is a simple case: there are two tables
![document_languages](https://cloud.githubusercontent.com/assets/912790/17333871/bf2054fe-58dc-11e6-92d3-279172218cb4.PNG)
documents
andlanguages
, every document haslanguage_id
column, every language hasname
andcode
columns. Thus, now I can only display distribution bylanguage_id
which is not too informative.It could be solved andding a simple join statement with select of e.g.
languages.code
instead ofdocuments.language_id
.I would be glad to help somehow and even started reading the project code, but it's hard enough to navigate through all this code. I found that main functions for queries performing are placed here https://github.com/airbnb/caravel/blob/master/caravel/models.py
P.S. Sorry, if I missed something and there is already a possibility to do it somehow.
The text was updated successfully, but these errors were encountered: