Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/first stage enrichment polygons #1016

Merged
merged 11 commits into from
Sep 24, 2019

Conversation

alejandrohall
Copy link
Contributor

@alejandrohall alejandrohall commented Sep 23, 2019

agg_operators = {variable: agg_operators for variable in variables}

elif agg_operators is None:
variables_sql = variables + ['ST_Area(ST_Intersection(geo_table.geom, data_table.{data_geom_column})) / ST_area(data_table.{data_geom_column}) AS measures_proportion'.format(data_geom_column=data_geom_column)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (217 > 120 characters)

grouper = 'group by data_table.{enrichment_id}'.format(enrichment_id=enrichment_id)

if agg_operators:
variables_sql = ['{operator}({variable} * (ST_Area(ST_Intersection(enrichment_geo_table.geom, data_table.{data_geom_column})) / ST_area(data_table.{data_geom_column}))) as {variable}'.format(variable=variable, data_geom_column=data_geom_column, operator=agg_operators[variable]) for variable in variables]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (313 > 120 characters)

Copy link
Contributor

@alrocar alrocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments.

  • The one about delete_table is important when integrating with the token API.
  • The second one is about refactoring, there's some functions duplicated that could be extracted to a common file and imported in both points and polygons enrichment. Also the enrich functions have the same structure and we could do some refactor work there.

This is a great first step towards our enrichment API, I propose to merge this PR as it is so we can unblock next steps.

Let's prioritize having something working and do a refactor effort, step by step on upcoming PRs.


return response

def delete_table(self, tablename, project, dataset):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not decided yet how to delete tables, but at the moment the role we'll use in BQ does not have delete permission.

That means when we integrate with the token API, we might have to disable deletion of tables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add TTL when uploading table and avoid deleting at the end of the process

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No needed, we want to deal with this in a separate issue. Don't worry about that by now.

Copy link
Contributor

@oleurud oleurud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things and a +1 to the refactor of the 2 enrichment methods (but no in this one)

{filters};
'''.format(enrichment_id=enrichment_id, variables=', '.join(variables),
enrichment_table=enrichment_table, enrichment_geo_table=enrichment_geo_table,
user_workspace=user_workspace.replace('-', '_'), working_project=working_project,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A detail: I think .replace('-', '_') it is already done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

credentials = credentials or get_default_credentials()
bq_client = bigquery_client.BigQueryClient(credentials)

user_workspace = credentials.username.replace('-', '_')
Copy link
Contributor

@oleurud oleurud Sep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment: getting the user dataset name, should it be a responsibility of credentials class? If yes, I will add it to https://github.com/CartoDB/data-observatory/issues/147 issue

A detail: should we call it dataset? avoiding using several names for the same thing

return data_copy


def __prepare_sql(enrichment_id, variables, enrichment_table, enrichment_geo_table, user_workspace,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A detail: in order to set a method as private, we are using only one dash _prepare_sql. I don't know if we should use 2 or 1 but we should use always the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be using two underscores, using just one is a convention to treat that method as a private, but you are relying on the developer. Using two underscores actually mangle the method and hide it. https://dbader.org/blog/meaning-of-underscores-in-python

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg

This exception is raised when a problem is encountered while using enrichment functions.
"""
def __init__(self, message):
super(DiscoveryException, self).__init__(message)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line at end of file

Copy link
Contributor

@oleurud oleurud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the last changes adding geopandas, we need to talk about it, because right now it is not a CARTOframes dependency (I should be IMO). But most import, stop this one here having a feature to test and validate and start the improvements in a new branch

@oleurud oleurud merged commit 7d21fe8 into develop Sep 24, 2019
@alrocar
Copy link
Contributor

alrocar commented Sep 24, 2019

Thanks @oleurud

Yep, let's focus on small and frequent PRs tackling one thing at a time and tied to well defined issues. That way we can iterate faster.

@alejandrohall Great job BTW!! 🚀 💪

@Jesus89 Jesus89 deleted the feature/first_stage_enrichment_polygons branch September 26, 2019 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants