Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import / export of the dashboards. #1197

Merged
merged 13 commits into from
Oct 12, 2016
Merged

Import / export of the dashboards. #1197

merged 13 commits into from
Oct 12, 2016

Conversation

bkyryliuk
Copy link
Member

@bkyryliuk bkyryliuk commented Sep 26, 2016

Provides import export functionality for the dashboards using pickle library to serialize the objects. It requires the tables to have the same names to be able to hook slices up to them.
Import overrides the existing dashboards and slices. Newly added metrics and table column will stay, old definition will be overridden.

Git Issues: #1237. #1173

Exports:

  • table datasources
  • metrics
  • column definitions
  • dashboards
  • slices

Code:

  • stored additional metadata for slices, datasources and dashboards remote_id, database_name to be able to locate and recreated the dependencies
  • creates the migration to add metadata field to the tables
  • implements Import mixin with copy and override functions
  • removes table_columns relation
    • uses subqueryload instead of joinedload as it's more efficient

Future works:

  • support druid / generic datasource
  • make alter_params mixin function too, requires standard params for the models

Tested:

  • multiple cases manually
  • extensive unit testing

Reviewers:

CC:

screen shot 2016-09-26 at 9 12 17 am

screen shot 2016-09-26 at 9 13 05 am

@ascott
Copy link
Contributor

ascott commented Sep 27, 2016

what is the use case for importing/exporting slices and dashboards?

@bkyryliuk bkyryliuk changed the title Dashboards export. Work in progress Dashboards export. Sep 28, 2016
<script>
window.onload = function() {
window.open(window.location += '&action=go');
window.location = '{{ dashboards_url }}';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we do this redirect in the views.py rather than in an html template?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the challenge here is to download a file and redirect right after, flask doesn't provide an easy way to do that :(

<div class="container">
<title>Import the dashboards.</title>
<h1>Import the dashboards.</h1>
<form action="" method=post enctype=multipart/form-data>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since action="" you can leave it out altogether and it will still post to the current url.

<h1>Import the dashboards.</h1>
<form action="" method=post enctype=multipart/form-data>
<p><input type=file name=file>
<input type=submit value=Upload>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing closing </p> tag

session = db.session
make_transient(slc)
slc.dashboards = []
id_pos = slc.perm.find('(id') if slc.perm else -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a typo? find('(id') or find('id')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment describing how the perm string is created.

@bkyryliuk bkyryliuk changed the title Dashboards export. Import / export of the dashboards. Sep 28, 2016
appbuilder.add_link(
'Import Dashboards',
href='/caravel/import_dashboards',
icon="fa-flask",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fa-flask is for SQL Lab, here should be some file-import icon?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changed the icon.
Here is the list: http://fontawesome.io/icons/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, missed your comment @bkyryliuk 👍

make_transient(slc)
slc.dashboards = []
# perms are named [{db_name}].[{datasource_name}](id:{id})
id_pos = slc.perm.find('(id') if slc.perm else -1
Copy link
Member

@mistercrunch mistercrunch Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add a perm_slice_id property to Slice, I think some other place in the code used a regex to do something like this, let's always use that new property instead.

That property should raise if the perm is malformed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored in a way that there is no need to use the permission anymore


perm_prefix = slc.perm[:id_pos]
datasource_class = SourceRegistry.sources[slc.datasource_type]
datasources = session.query(datasource_class).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter_by missing?,

we may want a shortcut function SourceRegistry.get_datasource(type, id, session=None)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter by doesn't work for lookups by name, created function SourceRegistry.get_datasource_by_name

datasource_class = SourceRegistry.sources[slc.datasource_type]
datasources = session.query(datasource_class).all()
if not datasources:
logging.warning('Datasource {} was not found for the slice {}.'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we assume that the datasource should always exist we sould just raise.

return None

params_dict = {}
if slc.params:
Copy link
Member

@mistercrunch mistercrunch Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a new method Slice.alert_params(**kwargs) could be useful here, resulting in a simple call:
slc.alert_params(remote_id=slc.id, import_time=import_time)

# find if the slice was already imported
slc_to_override = (
session.query(Slice)
.filter(Slice.params.ilike('%"remote_id": {}%'.format(slc.id)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems brittle, you should at least have a comma after the {} otherwise slice_id=5 would match with 5.*

return session.query(models.Dashboard).filter_by(
id=dash_id).first()

def get_slc(name, table, slc_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_slice

def test_import_dashboard(self):
session = db.session

def get_dashboard(title, id=0, slcs=[]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_dashboard

slug='{}_imported'.format(title.lower()),
)

def find_dash(dash_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_dash


def find_dash(dash_id):
return session.query(models.Dashboard).filter_by(
id=dash_id).first()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.one()

imported_dash_id_4 = models.Dashboard.import_dashboard(
dash_with_2_slices_new, import_time=1992)
imported_dash_4 = find_dash(imported_dash_id_4)
self.assertEquals('dash_with_2_slices_new',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert imported_dash_4.dashboard_title == imported_dash_2.dashboard_title instead of hard coding the title twice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imported_dash_2 doesn't exist when this check is done, sqlalchemy magic.

@bkyryliuk
Copy link
Member Author

@mistercrunch - please take another look. PR is getting bigger :(
There is quite some code duplication, but I can't find an easy way to tackle that in this PR.

@@ -283,6 +310,48 @@ def get_viz(self, url_params_multidict=None):
slice_=self
)

def alter_params(self, **kwargs):
params_dict = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d =  self.params_dict
d.update(kwargs)
self.params = json.dumps(d)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch - done


# find if the slice was already imported
slc_to_override = None
for s in session.query(Slice).all():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT (optional): by convention s is used for short lived strings

slc_to_override = s

slc.id = None
params = json.loads(slc.params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params = slc.params_dict

def get_datasource_by_name(cls, session, datasource_type, datasource_name,
schema, database_name):
datasource_class = SourceRegistry.sources[datasource_type]
datasources = session.query(datasource_class).all()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why returning everything and filtering on the server? You should just use a filter in session.query and use .one()

Copy link
Member Author

@bkyryliuk bkyryliuk Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish that was that easy. I need to use properties for filtering, otherwise I have to implement 2 separate queries, as druid and table reference databases using different attributes. @mistercrunch

@@ -26,6 +29,8 @@
from flask_appbuilder.models.sqla.filters import BaseFilter

from sqlalchemy import create_engine
from werkzeug import secure_filename
from werkzeug.datastructures import ImmutableMultiDict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dup with line 35


appbuilder.add_view(
DashboardModelView,
"Dashboards",
label=__("Dashboards"),
label=__("Dashboards List"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dashboard List

self.assertEquals(
2, len([c for c in [t.columns for t in exported_tables_2]]))
self.assertEquals(
2, len([m for m in [t.metrics for t in exported_tables_2]]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the outer comprehension doesn't do anything

return session.query(models.SqlaTable).filter_by(
id=table_id).first()

# Case 1. Import table with no metrics and columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I personally prefer having a smaller atomicity of tests and no number-suffixed variables. Shared items can be refactored into the class or module scope, methods won't run if it doesn't start with test_.

If case 5 fails here there's just more stuff to look at, understand whether case 5 is tangled with other tests or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, in this case test cases are build upon each other. I'll try to separate them.

"""Testing export import functionality for dashboards"""

examples_loaded = False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requires_examples should be set to True here as you assume examples have been loaded

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftovers, sorry

{'new_metric1', 'metric1', 'metric2'},
set([m.metric_name for m in imported_table_5.metrics]))

# cleanup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would if make sense to cleanup at the beginning to setup to test? if the test fails half way, it's nice to be able to assume that you can just re-run the test itself (not the full suite) to fix the state. Idempotent, atomic tests ftw!

return {}

@classmethod
def import_obj(cls, datasource_to_import, import_time=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot in common in all these import_obj methods. It'd be nice to distill some of it into re-usable chunks, maybe adding a CaravelModel base class and adding a get_or_create @classmethod. The second answer here is inspiring:
http://stackoverflow.com/questions/2546207/does-sqlalchemy-have-an-equivalent-of-djangos-get-or-create

And maybe a generic import_obj if that's reasonable though that might be tricky with relationships, so not a requirement for this PR...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch - making query to retrieve the object that have remote id is not possible as "remote_id": {id} could be in the end of the json and there will be no comma after it. Json doesn't support trailing commas. Because of it I loop through all slices / dashboards to find the object to override.
I see 2 possible solutions:

  • use "remote_id": {id}{prefix}instead
  • add remote_id field to the model

What would you suggest?
Would it be fine to have it in the separate PR ?

@mistercrunch
Copy link
Member

LGTM!

@bkyryliuk
Copy link
Member Author

thanks @mistercrunch !

@bkyryliuk bkyryliuk merged commit 73cd2ea into apache:master Oct 12, 2016
@dpgaspar dpgaspar mentioned this pull request Dec 9, 2019
12 tasks
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.12.0 labels Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants