Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/do discovery #949

Merged
merged 26 commits into from
Sep 11, 2019
Merged

Feature/do discovery #949

merged 26 commits into from
Sep 11, 2019

Conversation

alrocar
Copy link
Contributor

@alrocar alrocar commented Aug 30, 2019

Closes #960

cartoframes/data/catalog/dataset.py Outdated Show resolved Hide resolved
cartoframes/data/catalog/country.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_variable_repo.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_geography_repo.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_dataset_repo.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_country_repo.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_category_repo.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_category_repo.py Outdated Show resolved Hide resolved
test/data/catalog/test_do.py Outdated Show resolved Hide resolved
test/data/catalog/test_dataset.py Outdated Show resolved Hide resolved
test/data/catalog/test_country.py Outdated Show resolved Hide resolved
test/data/catalog/test_country.py Outdated Show resolved Hide resolved
test/data/catalog/test_country.py Outdated Show resolved Hide resolved
test/data/catalog/repository/test_category_repo.py Outdated Show resolved Hide resolved
do_metadata_key.py Outdated Show resolved Hide resolved
test/data/catalog/test_variable.py Outdated Show resolved Hide resolved
test/data/catalog/test_geography.py Outdated Show resolved Hide resolved
test/data/catalog/test_category.py Outdated Show resolved Hide resolved
@alrocar
Copy link
Contributor Author

alrocar commented Sep 9, 2019

I see tests are in red (also we already merged the namespaces refactor to the develop branch). Could we try to merge develop + make the tests pass before reviewing? 🙏 I think that would be easier.

Thanks!

@esloho
Copy link
Contributor

esloho commented Sep 10, 2019

Tests are green, weeeeee!!

try:
from unittest.mock import Mock, patch
except ImportError:
from mock import Mock, patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mock.Mock' imported but unused

test/data/observatory/test_geography.py Show resolved Hide resolved
try:
from unittest.mock import Mock, patch
except ImportError:
from mock import Mock, patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mock.Mock' imported but unused

try:
from unittest.mock import Mock, patch
except ImportError:
from mock import Mock, patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mock.Mock' imported but unused

try:
from unittest.mock import Mock, patch
except ImportError:
from mock import Mock, patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mock.Mock' imported but unused

try:
from unittest.mock import Mock, patch
except ImportError:
from mock import Mock, patch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'mock.Mock' imported but unused

import pandas as pd

from cartoframes.data.observatory.repository.category_repo import get_category_repo
from cartoframes.data.observatory.repository.dataset_repo import get_dataset_repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we want the API methods to be static we can't have the repositories instances in these classes, that's why I created those helper methods, but I'm not sure if I like it xD

Would be better to import the Repo instead and create a private instance for the module? Something like:

from cartoframes.data.catalog.repository.category_repo import CategoryRepository

 __category_repo = CategoryRepository()

Copy link
Contributor Author

@alrocar alrocar Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are going to use the Repository pattern to switch to a different backend API when needed, what of these options would be easier to replace when we have a different backend API?

  1. The wrapper functions
  2. Instantiate the repository class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the helper. Less code and less coupling.

what of these options would be easier to replace when we have a different backend API?

I think, we could change to a different backend just changing the RepoClient.

Copy link
Contributor Author

@alrocar alrocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upss turns out I'm the author of this pull request, so I can just comment.

👏 Great work so far!

I added a bunch of comments mostly of aesthetic stuff. I'd do this:

  • Let's try to change to relative paths to follow our library convention.
  • The rest of my comments, don't need to be tackled in this PR, otherwise it's going to be a never ending task.

I'd merge this one (after the relative paths thing being changed) so we can start integrating with the enrichment functions and create a new issue with some of my suggestions (the ones we decide to tackle).

cartoframes/data/observatory/catalog.py Outdated Show resolved Hide resolved
cartoframes/data/observatory/catalog.py Show resolved Hide resolved
import pandas as pd

from cartoframes.data.observatory.repository.category_repo import get_category_repo
from cartoframes.data.observatory.repository.dataset_repo import get_dataset_repo
Copy link
Contributor Author

@alrocar alrocar Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are going to use the Repository pattern to switch to a different backend API when needed, what of these options would be easier to replace when we have a different backend API?

  1. The wrapper functions
  2. Instantiate the repository class

def _to_categories(results):
from cartoframes.data.observatory.category import Categories

return Categories([CategoryRepository._to_category(result) for result in results])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my previous comment, I see we have two ways of creating a Category instance, am I right?

CategoryRepository._to_category(result) or using the constructor.

If that's true, I'd try to standardize that.

This comment applies to the rest of repositories.

@@ -0,0 +1,1171 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Github does not deal well with changes in notebook files, I think it's better to commit the notebooks without the outputs. So let's remove them, please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we added it here only for sharing and feedback gathering but I guess there's no need of that anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's useful. Let's remove them then before merging.

_GEOGRAPHY_FIELD_ID = 'id'


class Geography(pd.Series):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing in the data model, that Geography has a geom_coverage column of type GEOMETRY. How are we dealing with this type of column?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not, panda is dealing with it. We don't deal with columns anymore. Does it sound right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 It's not important right now, but let's take into account in the future, since if we want to do some spatial operation, we'd need to map Geography as a geodataframe instead of a dataframe.

Copy link
Contributor

@oleurud oleurud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A first comment about singleton

Copy link
Contributor

@oleurud oleurud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks cool :) I have added some comments and one more here: maybe we could add a parent class of the models and another one of the repositories, avoiding repeat some methods (JAVA vision :) ), but probably I would not do it now in this PR

import pandas as pd

from cartoframes.data.observatory.repository.category_repo import get_category_repo
from cartoframes.data.observatory.repository.dataset_repo import get_dataset_repo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the helper. Less code and less coupling.

what of these options would be easier to replace when we have a different backend API?

I think, we could change to a different backend just changing the RepoClient.

return Category({
'id': result['id'],
'name': result['name']
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping the classes, we could have a problem with previous versions in the future: if we release a version doing the mapping, and after that, we change a column from the database, the released version will fail.

As we want to have all the catalog public, we could put the responsibility of the API properties in the backend and making the CARTOframes client transparent


@staticmethod
def _to_country(result):
from cartoframes.data.observatory.country import Country
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer doing the import on the top of the file. Are you trying to avoid the circular pain?

Copy link
Contributor

@esloho esloho Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see my comment above. This is the only way I found to make it work without changing the API. I think it's a design flaw due to mixing model and behavior :(

class TestCategoryRepo(unittest.TestCase):

def setUp(self):
RepoClient.get_categories = Mock(return_value=[db_category1, db_category2])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is not a multiple of four

Copy link
Contributor Author

@alrocar alrocar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 LGTM!

I cannot approve the PR because I created it, but it's approved.

Let's clean the notebook output before merging, please.

@esloho esloho merged commit e3515b3 into develop Sep 11, 2019
@Jesus89 Jesus89 deleted the feature/do-discovery branch September 30, 2019 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discovery API to return DataFrames instances (navigation)
4 participants