COPY TO for data obs queries - Part 2 use fetch #592

Merged · 15 commits · Apr 11, 2019

Conversation

@alrocar (Contributor) commented Apr 5, 2019

Review after #591 has been merged

Closes #565

@alrocar alrocar changed the base branch from master to 565_data_obs April 5, 2019 11:14
@alrocar alrocar changed the title Use fetch in data observatory queries COPY TO for data obs queries - Part 2 use fetch Apr 5, 2019
@andy-esch (Contributor) left a comment

Looks great! Just a few minor things we need to figure out.

warn(
'{s0} was augmented as {s1} because of name '
'collision'.format(s0=suggested, s1=names[suggested])
)
else:
names[suggested] = suggested

# drop description columns to lighten the query
Contributor:

❤️
It's also worth exploring the minimum set of metadata we need to send. My guess is that only half of the metadata columns are needed to uniquely define the measure.

@@ -334,6 +344,8 @@ def clean_dataframe_from_carto(df, table_columns, decode_geom=False):
for column_name in table_columns:
if table_columns[column_name]['type'] == 'date':
df[column_name] = pd.to_datetime(df[column_name], errors='ignore')
elif table_columns[column_name]['type'] == 'boolean':
Contributor:

Good catch, I wasn't aware that bools weren't properly converted. Maybe the second bullet item in the docstring should be expanded to mention that date and boolean columns are processed.

columns = get_columns(self.cc, query).keys()

if exclude and isinstance(exclude, list):
columns = list(set(columns) - set(exclude))
Contributor:

We should ensure that exclude is a list or tuple and not a str because:

>>> exclude = 'the_geom_webmercator'
>>> set(exclude)
{'_', 'a', 'b', 'c', 'e', 'g', 'h', 'm', 'o', 'r', 't', 'w'}
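A defensive normalization could guard against that pitfall; a sketch (the helper is hypothetical, not part of cartoframes):

```python
def normalize_exclude(exclude):
    # set('the_geom_webmercator') would explode a bare string into
    # individual characters, so wrap strings in a one-element set first.
    if exclude is None:
        return set()
    if isinstance(exclude, str):
        return {exclude}
    return set(exclude)

columns = ['the_geom', 'the_geom_webmercator', 'pop_2015']
kept = [c for c in columns if c not in normalize_exclude('the_geom_webmercator')]
```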

Contributor (author):

Yep, we're already checking that it's an instance of list. Anyway, we'll add proper docs once all these new methods are consolidated.

Contributor:

Ha, I missed the isinstance(exclude, list) above 🤦‍♂️ Sorry for the noise!

Contributor (author):

np 😅

)['fields'].keys()
# get column names except the_geom_webmercator
dataset = Dataset(self, table_name)
table_columns = dataset.get_table_column_names(exclude=['the_geom_webmercator'])
Contributor:

FYI, the DEFAULT_SQL_ARGS are how cartoframes calls show up in the logs so we can monitor performance, etc. They're set up so that each method gets registered once despite making multiple calls.

Contributor (author):

Hmm, is this still true? I see DEFAULT_SQL_ARGS only having a do_post attribute.

Contributor:

Good question. Are POST request args searchable in kibana?


Contributor:

The POST arguments are not saved by default. The only things we can check in Kibana about POST requests to the SQL API are the queries.

Contributor:

Two options:

  • Use the User-Agent header, which is designed for exactly this (and it's how we do it in carto-python)
  • Add a SQL comment to the queries (seems hacky)

I'd go with the User-Agent header: it should not be hard to do, and it makes the calls searchable everywhere: Kibana, Rollbar, etc.
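A sketch of what tagging requests with a custom User-Agent could look like, using only the standard library (the URL shape and version string are assumptions, not the actual cartoframes implementation):

```python
from urllib.parse import urlencode
from urllib.request import Request

USER_AGENT = 'cartoframes/1.0.0'  # hypothetical client version string

def sql_api_request(base_url, query, api_key):
    # Build a POST to the SQL API tagged with a custom User-Agent,
    # so the calls become searchable in Kibana, Rollbar, etc.
    data = urlencode({'q': query, 'api_key': api_key}).encode()
    return Request(
        base_url + '/api/v2/sql',
        data=data,
        headers={'User-Agent': USER_AGENT},
    )

req = sql_api_request('https://user.carto.com', 'SELECT 1', 'key')
```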

Contributor:

In case it ends up being solved with a User-Agent header, I created issue #601 to handle it.

Contributor (author):

I added the user-agent thing here

Contributor:

Also (CartoDB/carto-python#111).

Nice, well solved 👍

Contributor:

👏

@alrocar alrocar requested a review from oleurud April 8, 2019 08:49
@@ -1099,8 +1099,7 @@ def data_boundaries(self, boundary=None, region=None, decode_geom=False,
boundaries in `region` (or the world if `region` is ``None``)
"""
# TODO: create a function out of this?
-        if (isinstance(region, collections.Iterable)
-                and not isinstance(region, str)):
+        if (isinstance(region, collections.Iterable) and not isinstance(region, str)):

Contributor:

Maybe check isinstance(region, str) first, so we don't need the negative check? e.g.:

if isinstance(region, str):
    ...
elif isinstance(region, collections.Iterable):
    ...
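That reordering could be sketched as a small helper (the function name is hypothetical, and it uses collections.abc as on modern Python):

```python
from collections.abc import Iterable

def normalize_region(region):
    # Check str first: the positive test reads better than the negated
    # isinstance check, and str is itself iterable so order matters.
    if isinstance(region, str):
        return [region]
    elif isinstance(region, Iterable):
        return list(region)
    raise ValueError('region must be a string or an iterable of strings')
```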

@@ -1294,7 +1292,7 @@ def data_discovery(self, region, keywords=None, regex=None, time=None,
except ValueError:
# TODO: make this work for general queries
# see if it's a table
-            self.sql_client.send(
+            self.batch_sql_client.create_and_wait_for_completion(

Contributor:

Does batch SQL work with EXPLAIN?

Contributor (author):

Mm, the point is that the result of the sql_client call wasn't being stored anywhere, so I assume the query is just there to check whether the table exists; the batch_sql_client works for that purpose as well.

Contributor:

It seems overkill to use the batch API (which might take a while to run due to scheduling). I'd use the normal SQL API if the change is easy.


result = self.fetch(query)
if persist_as:
self.write(result, persist_as, overwrite=True)

Contributor:

Maybe we should be smarter: if you need to store the result, just do a Batch API SELECT INTO and then download it (or not even download it)?
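The persist path suggested above could be sketched as wrapping the query for the Batch SQL API to run (the helper and the quoting are assumptions, not the actual implementation):

```python
def select_into(query, table_name):
    # Materialize the query's result server-side as a new table instead
    # of streaming rows back to the client; the Batch SQL API would then
    # execute this statement asynchronously.
    return 'SELECT * INTO "{table}" FROM ({query}) AS _subquery'.format(
        table=table_name, query=query)
```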

Contributor (author):

Makes sense, I've added a Dataset.from_query method that creates a table from the query. Then we use Dataset.download to get the result.

@oleurud (Contributor) left a comment

Blessings

@alrocar alrocar changed the base branch from 565_data_obs to master April 11, 2019 21:09
@alrocar alrocar merged commit 7c039a3 into master Apr 11, 2019
@Jesus89 Jesus89 deleted the 565_data_obs_part_2 branch September 30, 2019 16:48