Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/refactor enrichment functions and use token #1034

Merged
merged 35 commits into from
Oct 2, 2019

Conversation

alejandrohall
Copy link
Contributor

@alejandrohall alejandrohall commented Sep 27, 2019

Solves a part of https://github.com/CartoDB/data-observatory/issues/147

  • use the access token
  • refactoring enrichment functions
  • split the main task in several ones (auth things, upload dataframe, enrichment, final dataframe preparation) allowing testing each part of the process
  • remove all the stuff related with the temp table deletion
  • add tests

@simon-contreras-deel simon-contreras-deel changed the title Refactor/refactor enrichment functions Refactor/refactor enrichment functions and use token Sep 27, 2019
@simon-contreras-deel
Copy link
Contributor

Please @alejandrohall can you do a CR?

data_copy = data_copy.merge(data_geometry_id_enriched, on=_ENRICHMENT_ID, how='left')\
.drop(_ENRICHMENT_ID, axis=1)
data_enriched = enrich(_prepare_sql, data=data, variables=variables, agg_operators=agg_operators,
data_geom_column=data_geom_column, filters=filters, credentials=credentials)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid requesting this parameter data_geom_column='geom' using https://github.com/CartoDB/cartoframes/blob/develop/cartoframes/utils/columns.py#L155. We would need to extract that method first.

Probably it is not a priority

Copy link
Contributor Author

@alejandrohall alejandrohall Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about if the user has two columns with geometries? and what about if the column with geometries has a name not related to usual geometry names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for not supporting dataframes with more than one geometry column by design, otherwise we'll end up trying to do magic to support corner cases.

About geometry column name, it has to be abstracted somehow (the columns module could work) otherwise we'll end up with different logic in different places to do the same thing.

My two cents :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vote for not supporting dataframes with more than one geometry column by design

+1 We are doing that way in DB too.

we'll end up with different logic in different places to do the same thing

@alejandrohall and I have talked about other things here, mainly describing the initial source of pain is not having geoDataFrames, but this is a bigger task.

So, I agree with @alrocar, that is the point. We should solve it in a similar fashion in every place. If you want @alejandrohall, I can do it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buy maybe we can do it in a different PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey lets remove the geo column name provided by the user!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to an external issue #1046

except RefreshError:
self.client = self._init_client()
try:
return func(self, *args, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally what happened with retries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not happen 2 times :) or the error should be different

try:
return func(self, *args, **kwargs)
except RefreshError:
raise CartoException('Something went wrong accessing data. '
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add more information regarding the error, that is due to the token or smth because if the user doesn't have internet or whatever and get that error will contact with support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are catching only RefreshError, so everything is working on the user side. But for some reason, the user is not able to refresh the token, but we don't want to share big query information to the user. This is why I have set this basic message about it.

cartoframes/data/enrichment/enrichment_service.py Outdated Show resolved Hide resolved
if isinstance(data, Dataset):
data = data.dataframe

data_copy = data.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this copy? #1045

@simon-contreras-deel
Copy link
Contributor

Merging this one. We put in the "pending column" #1045, #1046 and more testing

@simon-contreras-deel simon-contreras-deel merged commit c5af553 into develop Oct 2, 2019
@simon-contreras-deel simon-contreras-deel deleted the refactor/refactor_enrichment_functions branch October 2, 2019 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants