-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/first stage enrichment polygons #1016
Conversation
…st_stage_enrichment_polygons
agg_operators = {variable: agg_operators for variable in variables} | ||
|
||
elif agg_operators is None: | ||
variables_sql = variables + ['ST_Area(ST_Intersection(geo_table.geom, data_table.{data_geom_column})) / ST_area(data_table.{data_geom_column}) AS measures_proportion'.format(data_geom_column=data_geom_column)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line too long (217 > 120 characters)
grouper = 'group by data_table.{enrichment_id}'.format(enrichment_id=enrichment_id) | ||
|
||
if agg_operators: | ||
variables_sql = ['{operator}({variable} * (ST_Area(ST_Intersection(enrichment_geo_table.geom, data_table.{data_geom_column})) / ST_area(data_table.{data_geom_column}))) as {variable}'.format(variable=variable, data_geom_column=data_geom_column, operator=agg_operators[variable]) for variable in variables] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line too long (313 > 120 characters)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments.
- The one about
delete_table
is important when integrating with thetoken
API. - The second one is about refactoring, there's some functions duplicated that could be extracted to a common file and imported in both points and polygons enrichment. Also the
enrich
functions have the same structure and we could do some refactor work there.
This is a great first step towards our enrichment API, I propose to merge this PR as it is so we can unblock next steps.
Let's prioritize having something working and do a refactor effort, step by step on upcoming PRs.
|
||
return response | ||
|
||
def delete_table(self, tablename, project, dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not decided yet how to delete tables, but at the moment the role we'll use in BQ does not have delete permission.
That means when we integrate with the token
API, we might have to disable deletion of tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add TTL when uploading table and avoid deleting at the end of the process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No needed, we want to deal with this in a separate issue. Don't worry about that by now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things and a +1 to the refactor of the 2 enrichment methods (but no in this one)
{filters}; | ||
'''.format(enrichment_id=enrichment_id, variables=', '.join(variables), | ||
enrichment_table=enrichment_table, enrichment_geo_table=enrichment_geo_table, | ||
user_workspace=user_workspace.replace('-', '_'), working_project=working_project, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A detail: I think .replace('-', '_')
it is already done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
credentials = credentials or get_default_credentials() | ||
bq_client = bigquery_client.BigQueryClient(credentials) | ||
|
||
user_workspace = credentials.username.replace('-', '_') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment: getting the user dataset name, should it be a responsibility of credentials class? If yes, I will add it to https://github.com/CartoDB/data-observatory/issues/147 issue
A detail: should we call it dataset? avoiding using several names for the same thing
return data_copy | ||
|
||
|
||
def __prepare_sql(enrichment_id, variables, enrichment_table, enrichment_geo_table, user_workspace, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A detail: in order to set a method as private, we are using only one dash _prepare_sql
. I don't know if we should use 2 or 1 but we should use always the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be using two underscores, using just one is a convention to treat that method as a private, but you are relying on the developer. Using two underscores actually mangle the method and hide it. https://dbader.org/blog/meaning-of-underscores-in-python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cartoframes/exceptions.py
Outdated
This exception is raised when a problem is encountered while using enrichment functions. | ||
""" | ||
def __init__(self, message): | ||
super(DiscoveryException, self).__init__(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line at end of file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the last changes adding geopandas, we need to talk about it, because right now it is not a CARTOframes dependency (I should be IMO). But most import, stop this one here having a feature to test and validate and start the improvements in a new branch
Thanks @oleurud Yep, let's focus on small and frequent PRs tackling one thing at a time and tied to well defined issues. That way we can iterate faster. @alejandrohall Great job BTW!! 🚀 💪 |
Closes https://github.com/CartoDB/data-observatory/issues/174
Closes https://github.com/CartoDB/data-observatory/issues/142
Closes https://github.com/CartoDB/data-observatory/issues/143