Allow users to import CSV as datasource #2381

Closed
wants to merge 57 commits into from
Changes from 35 commits
Commits
57 commits
71665f7
Testing gitlab remote repo
Nov 14, 2016
fe81595
test commit
Ryan4815 Nov 14, 2016
c16bd58
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Nov 16, 2016
668f443
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Nov 23, 2016
6ddcb22
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Nov 28, 2016
452e9df
Merge remote-tracking branch 'origin/master' into csv-import
Nov 28, 2016
87937e5
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Nov 29, 2016
3f3ab33
trying out various ways of uploading xls/csv with flask
Nov 29, 2016
39c2aba
Merge remote-tracking branch 'origin/master' into csv-import
Nov 29, 2016
9ffbf9a
revert previous changes
Ryan4815 Nov 29, 2016
dcb0f14
Add CSV button placed on database page
Ryan4815 Nov 29, 2016
a539090
Changed upload csv to use same layout/template as import dashboards
Ryan4815 Dec 1, 2016
b5222fc
Starting to add csv to db form (to expose panda api)
Ryan4815 Dec 2, 2016
38762b0
CsvToDatabaseView updated to include most of the pandas api read_csv …
Ryan4815 Dec 5, 2016
ae1e9d5
Exposed the pandas .to_sql API in the upload csv form. Removed the se…
Ryan4815 Dec 6, 2016
ff37bd7
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Dec 7, 2016
0474253
File select part of csvtodatabaseview form
Ryan4815 Dec 7, 2016
fb186f2
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Dec 8, 2016
0dc844e
If cause to set the header column to row 0 if name is none
Ryan4815 Dec 8, 2016
f0e534d
Merge remote-tracking branch 'origin/master' into csv-import
Ryan4815 Dec 8, 2016
96ebb82
Used flask-babel lazy_gettext to ready the new strings for translation.
Ryan4815 Dec 8, 2016
7425b65
Moved Csv form from views.py to forms.py and removed unused imports
Ryan4815 Dec 9, 2016
3781a60
Changed BooleanField to use BetterBooleanField
Ryan4815 Dec 9, 2016
1b4d62f
Added FAQ entry on uploading csv
Ryan4815 Dec 12, 2016
8ebf1f8
Merge branch 'master' of https://github.com/airbnb/superset
Ryan4815 Dec 12, 2016
38c54c3
Merge remote-tracking branch 'origin/master' into csv-import
Ryan4815 Dec 12, 2016
1623bf8
Merge Ryan4815's fork into my fork
Jan 23, 2017
2c7eed7
Fix merge conflict
Jan 24, 2017
3f8ecfb
Add tests to CSV-import
Mar 8, 2017
e560c5c
Change the default of replace to append
Mar 9, 2017
40707a0
Add additional form validations
Mar 9, 2017
fb18459
Allow for large csv files exceeding machine memory
Mar 9, 2017
82c5281
Fix filepath bug
Mar 9, 2017
bc08de0
Fix linting errors
Mar 10, 2017
e62ad5e
fix linting errors
Mar 10, 2017
ea0fcfa
Remove extraneous csv file after use
Mar 21, 2017
5a2c593
Remove extraneous file at the end of test
Apr 3, 2017
d7da6e3
Fix more linting errors
Apr 3, 2017
8aef7e2
Change port back to default
Apr 3, 2017
0995cdf
Fix linting errors
Apr 3, 2017
295f5f6
Fix linting error
Apr 7, 2017
cb9b707
Add babel files to gitignore
Jun 1, 2017
58e4db4
remove babel files from gitignore
Jun 1, 2017
b93b1b7
Add localizations and finish merge
Jun 1, 2017
edf024f
Fix issue remaining from previous merge
Jun 1, 2017
3636429
Fix previous merge issues.
Jun 2, 2017
49c4baa
Add missing file.
Jun 2, 2017
3959fe1
Merge branch 'master' of https://github.com/airbnb/superset into csv-…
Jun 2, 2017
4e50ef9
Add unicodecsv to setup.py
Jun 2, 2017
011a2fa
test commit
Jun 3, 2017
ed13c1e
Fix codeclimate and travis issues
Jun 8, 2017
4f57e05
Merge branch 'master' of https://github.com/airbnb/superset into csv-…
Jun 8, 2017
b64f293
Merge branch 'csv-import-travis' into csv-import
Jun 8, 2017
0992d19
Fix merge conflicts in translation files
SalehHindi Jul 29, 2017
fd2ec17
Merge remote-tracking branch 'upstream/master' into csv-import
SalehHindi Aug 1, 2017
ab43c77
Add newly imported csv table to the list of table models.
SalehHindi Aug 24, 2017
61a217c
Sync fork with upstream
SalehHindi Sep 6, 2017
1,539 changes: 978 additions & 561 deletions babel/messages.pot

Large diffs are not rendered by default.

9 changes: 9 additions & 0 deletions docs/faq.rst
@@ -45,6 +45,15 @@ visualizations.
https://github.com/airbnb/superset/issues?q=label%3Aexample+is%3Aclosed


Can I upload and visualize CSV data?
-------------------------------------

Yes, using the ``Add CSV Table to Database`` button on the ``Sources -> Database``
page. This opens a form where you configure how the CSV file is converted into
a dataframe and choose the database to which the table is added. The table can
then be loaded like any other on the ``Sources -> Tables`` page.


Why are my queries timing out?
------------------------------

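As a rough illustration of the conversion the upload form automates, here is a stdlib-only sketch that loads CSV text into a SQLite table. It is not the PR's code, which goes through pandas `read_csv()` and `DataFrame.to_sql()`; the helper names `csv_to_table` and `_quote`, the all-`TEXT` column types, and the use of `sqlite3` are assumptions made to keep the example self-contained. The `if_exists` handling mimics the Fail/Replace/Append choices offered by the form.

```python
import csv
import io
import sqlite3


def _quote(identifier):
    # Double embedded quotes so any name is a valid SQLite identifier.
    return '"' + identifier.replace('"', '""') + '"'


def csv_to_table(csv_text, conn, table, if_exists="fail", sep=","):
    """Load CSV text whose first row is a header into `table`.

    Hypothetical helper: every column is created as TEXT, and if_exists
    accepts the same three choices as the upload form.
    """
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("if_exists must be 'fail', 'replace' or 'append'")
    reader = csv.reader(io.StringIO(csv_text), delimiter=sep)
    header = next(reader)
    rows = list(reader)

    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone() is not None

    if exists and if_exists == "fail":
        raise ValueError("Table %r already exists." % table)
    if exists and if_exists == "replace":
        conn.execute("DROP TABLE " + _quote(table))
        exists = False
    if not exists:
        columns = ", ".join(_quote(name) + " TEXT" for name in header)
        conn.execute("CREATE TABLE %s (%s)" % (_quote(table), columns))

    # Append (or fill the freshly created table) positionally.
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO %s VALUES (%s)" % (_quote(table), placeholders), rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    csv_to_table("name,qty\napple,3\npear,5\n", conn, "fruit")
    csv_to_table("name,qty\nplum,7\n", conn, "fruit", if_exists="append")
    print(conn.execute("SELECT COUNT(*) FROM fruit").fetchone()[0])  # 3
```

Unlike pandas, this sketch does no type inference, so numeric columns stay as text.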
4 changes: 4 additions & 0 deletions superset/config.py
@@ -152,6 +152,10 @@
ENABLE_CORS = False
CORS_OPTIONS = {}

# Allowed format types for upload on Database view
# TODO: Add processing of other spreadsheet formats (xls, xlsx etc)
ALLOWED_EXTENSIONS = set(['csv'])


# ---------------------------------------------------
# List of viz_types not allowed in your environment
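`ALLOWED_EXTENSIONS` only declares the permitted suffixes; this hunk does not show where it is checked (the upload form separately hard-codes `FileAllowed(['csv'], ...)`). A minimal sketch of such a check, following the common Flask upload pattern, is shown below. The helper name `allowed_file` and the case-insensitive comparison are assumptions, and the setting is redefined locally so the block runs on its own.

```python
# Redefined here so the sketch runs standalone; mirrors superset/config.py.
ALLOWED_EXTENSIONS = {"csv"}


def allowed_file(filename, allowed=ALLOWED_EXTENSIONS):
    """Return True if `filename` ends in an extension listed in `allowed`.

    Hypothetical helper. rpartition splits on the last dot, so
    "data.tar.csv" yields "csv"; a name with no dot is rejected.
    """
    _, dot, ext = filename.rpartition(".")
    return bool(dot) and ext.lower() in allowed


if __name__ == "__main__":
    print(allowed_file("sales.CSV"))   # True
    print(allowed_file("sales.xlsx"))  # False
```

Checking against a set keeps the lookup independent of how many formats the TODO eventually adds.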
148 changes: 147 additions & 1 deletion superset/forms.py
@@ -10,13 +10,21 @@
import math

from flask_babel import lazy_gettext as _

from flask_appbuilder.fieldwidgets import BS3TextFieldWidget
from flask_appbuilder.forms import DynamicForm

from flask_wtf.file import FileField, FileAllowed, FileRequired

from wtforms import (
Form, SelectMultipleField, SelectField, TextField, TextAreaField,
-BooleanField, IntegerField, HiddenField, DecimalField)
+BooleanField, IntegerField, HiddenField, DecimalField, StringField)
from wtforms import validators, widgets
from wtforms.validators import DataRequired, Optional, NumberRange

from superset import app


config = app.config

TIMESTAMP_CHOICES = [
@@ -1150,3 +1158,141 @@ def add_to_form(attrs):
'description': _("Time related form attributes"),
},) + tuple(QueryForm.fieldsets)
return QueryForm


class CsvToDatabaseForm(DynamicForm):
    # These are the fields exposed by Pandas read_csv()
    csv_file = FileField(_('CSV File'),
        description=_('Select a CSV file to be uploaded to a database.'),
        validators=[FileRequired(), FileAllowed(['csv'], _('CSV Files Only!'))])
    sep = StringField(_('Delimiter'),
        description=_('Delimiter used by CSV file (for whitespace use \\s+).'),
        validators=[DataRequired()],
        widget=BS3TextFieldWidget())
    # No "x or None" filter on header/index_col: an empty optional
    # IntegerField already yields None, and the filter would discard 0.
    header = IntegerField(_('Header Row'),
        description=_('Row containing the headers to use as column names (0 is '
            'the first line of data). Leave empty if there is no header row.'),
        validators=[Optional(), NumberRange(0, 1E+20)],
        widget=BS3TextFieldWidget())
    names = StringField(_('Column Names'),
        description=_('List of comma-separated column names to use if no header '
            'row is specified above. Leave empty if Header Row is populated.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    index_col = IntegerField(_('Index Column'),
        description=_('Column to use as the row labels of the dataframe. '
            'Leave empty if there is no index column.'),
        validators=[Optional(), NumberRange(0, 1E+20)],
        widget=BS3TextFieldWidget())
    squeeze = BetterBooleanField(_('Squeeze'),
        description=_('Parse the data as a series (specify this option if the '
            'data contains only one column).'))
    prefix = StringField(_('Prefix'),
        description=_('Prefix to add to column numbers when there is no header '
            '(e.g. "X" for "X0, X1").'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    mangle_dupe_cols = BetterBooleanField(_('Mangle Duplicate Columns'),
        description=_('Rename duplicate columns as "X", "X.1", '
            '"X.2", etc.'))
    skipinitialspace = BetterBooleanField(_('Skip Initial Space'),
        description=_('Skip spaces after delimiter.'))
    skiprows = IntegerField(_('Skip Rows'),
        description=_('Number of rows to skip at start of file.'),
        validators=[Optional(), NumberRange(0, 1E+20)],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    nrows = IntegerField(_('Rows to Read'),
        description=_('Number of rows of file to read.'),
        validators=[Optional(), NumberRange(0, 1E+20)],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    skip_blank_lines = BetterBooleanField(_('Skip Blank Lines'),
        description=_('Skip blank lines rather than '
            'interpreting them as NaN values.'))
    parse_dates = BetterBooleanField(_('Parse Dates'),
        description=_('Parse date values.'))
    infer_datetime_format = BetterBooleanField(_('Infer Datetime Format'),
        description=_('Use Pandas to interpret the '
            'datetime format '
            'automatically.'))
    dayfirst = BetterBooleanField(_('Day First'),
        description=_('Use DD/MM (European/International) date format.'))
    thousands = StringField(_('Thousands Separator'),
        description=_('Character used to separate thousands.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    decimal = StringField(_('Decimal Character'),
        description=_('Character to interpret as decimal point.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or '.'])
    quotechar = StringField(_('Quote Character'),
        description=_('Character used to denote the start and end of a '
            'quoted item.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        # Fall back to pandas' own default quote character (").
        filters=[lambda x: x or '"'])
    escapechar = StringField(_('Escape Character'),
        description=_('Character used to escape a quoted item.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    comment = StringField(_('Comment Character'),
        description=_('Character used to denote the start of a comment.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    encoding = StringField(_('Encoding'),
        description=_('Encoding to use when reading the file (e.g. "utf-8").'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    error_bad_lines = BetterBooleanField(_('Error On Bad Lines'),
        description=_('Error on bad lines (e.g. a line with '
            'too many commas). If false, these bad '
            'lines are dropped from '
            'the resulting dataframe.'))

    # These are the fields exposed by Pandas .to_sql()
    name = StringField(_('Table Name'),
        description=_('Name of the table to be created from the CSV data.'),
        validators=[DataRequired()],
        widget=BS3TextFieldWidget())
    con = StringField(_('Database URI'),
        description=_('URI of the database in which to create the table above.'),
        validators=[DataRequired()],
        widget=BS3TextFieldWidget())
    schema = StringField(_('Schema'),
        description=_('Specify a schema (if the database flavour supports this).'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    if_exists = SelectField(_('Table Exists'),
        description=_('If the table exists, do one of the following: Fail '
            '(raise an error), Replace (drop and recreate the table) or '
            'Append (insert the data).'),
        choices=[('fail', _('Fail')),
                 ('replace', _('Replace')),
                 ('append', _('Append'))],
        validators=[DataRequired()])
    index = BetterBooleanField(_('Dataframe Index'),
        description=_('Write dataframe index as a column.'))
    index_label = StringField(_('Column Label(s)'),
        description=_('Column label for the index column(s). If left empty '
            'and Dataframe Index is checked, the index names are used.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
    chunksize = IntegerField(_('Chunksize'),
        description=_('If empty, all rows are written at once. Otherwise, '
            'rows are written in batches of this '
            'many rows at a time.'),
        validators=[Optional()],
        widget=BS3TextFieldWidget(),
        filters=[lambda x: x or None])
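The form fields correspond one-to-one to keyword arguments of pandas `read_csv()` and `DataFrame.to_sql()`. A hypothetical helper like the one below could split the submitted values into the two keyword-argument dicts. It drops only values that are `None`, so pandas falls back to its own defaults for fields left empty, while `0` and `False` are passed through (a filter such as `lambda x: x or None` would turn `0` into `None`). The function name, the grouping constants, and the splitting of Column Names into a list are assumptions for illustration, not the PR's view code.

```python
# Field names grouped as in CsvToDatabaseForm (hypothetical constants).
READ_CSV_FIELDS = (
    "sep", "header", "names", "index_col", "squeeze", "prefix",
    "mangle_dupe_cols", "skipinitialspace", "skiprows", "nrows",
    "skip_blank_lines", "parse_dates", "infer_datetime_format",
    "dayfirst", "thousands", "decimal", "quotechar", "escapechar",
    "comment", "encoding", "error_bad_lines",
)
TO_SQL_FIELDS = (
    "name", "con", "schema", "if_exists", "index", "index_label",
    "chunksize",
)


def split_form_data(data):
    """Split a dict of form values into (read_csv_kwargs, to_sql_kwargs)."""
    def pick(fields):
        # Keep 0 and False; omit only fields that are missing or None.
        return {f: data[f] for f in fields
                if f in data and data[f] is not None}

    read_kwargs, sql_kwargs = pick(READ_CSV_FIELDS), pick(TO_SQL_FIELDS)
    # The form takes column names as one comma-separated string, whereas
    # read_csv(names=...) expects a list.
    if "names" in read_kwargs:
        read_kwargs["names"] = [n.strip()
                                for n in read_kwargs["names"].split(",")
                                if n.strip()]
    return read_kwargs, sql_kwargs
```

With pandas and SQLAlchemy available, the two dicts could then be used roughly as `pandas.read_csv(path, **read_kwargs).to_sql(**sql_kwargs)`, after replacing the `con` URI string with `sqlalchemy.create_engine(uri)`. Note that several of these `read_csv()` options, for example `squeeze` and `error_bad_lines`, were removed in pandas 2.0, so recent pandas versions reject them.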
10 changes: 10 additions & 0 deletions superset/templates/superset/widgets/list.html
@@ -0,0 +1,10 @@
{% import "appbuilder/general/lib.html" as lib %}
{% extends "appbuilder/general/widgets/list.html" %}

{% block list_header %}
{{ super() }}
<a href="/csvtodatabaseview/form" class="btn btn-sm btn-primary" id="uploadbutton"
title="Add a new database table based on a local CSV file" rel="tooltip" data-toggle="tooltip">
<i class="fa fa-table"></i> Add CSV Table to Database
</a>
{% endblock %}
Binary file modified superset/translations/es/LC_MESSAGES/messages.mo
Binary file not shown.