Update flatten to accept DataFrame and GeoDataFrame #103

FelipeSBarros · 2024-02-22T21:07:39Z

@cuducos the purpose of this PR is to continue the development of the flatten function (#96), making it accept and work with DataFrame.

tests/test_flatten.py

crossfire/__init__.py


FelipeSBarros

@cuducos what do you think about this proposal?

crossfire/__init__.py


FelipeSBarros

Not sure if I should be committing this, but as you always say: it is better to discuss based on implemented code than on ideas (actually, you never said exactly that, but you always ask to see the code, so this is how I understand your point).

crossfire/__init__.py

cuducos

Have you tested without pandas installed?
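One common way to keep a module importable when pandas is missing is to guard the import and only check for DataFrame when the import succeeded. The sketch below is illustrative and is not the code in this PR; `HAS_PANDAS` and `is_dataframe` are hypothetical names.

```python
try:
    import pandas as pd  # optional dependency

    HAS_PANDAS = True
except ImportError:
    pd = None
    HAS_PANDAS = False


def is_dataframe(data):
    """Return True only when pandas is available and data is a DataFrame."""
    # GeoDataFrame subclasses DataFrame, so it passes the same check.
    return HAS_PANDAS and isinstance(data, pd.DataFrame)


# A list of dicts is never reported as a DataFrame, with or without pandas.
print(is_dataframe([{"id": 1}]))
```

With this shape, the list-of-dicts path never touches `pd`, so it keeps working in an environment without pandas.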

crossfire/__init__.py

Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>


FelipeSBarros

I know that this last commit doesn't solve the concerns you mentioned before about the early return, but I tried to improve it for the case where the data is a DataFrame...

crossfire/__init__.py

FelipeSBarros

OK, now with the is_empty function and some tests...
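For context, an emptiness check that works for both lists and DataFrames could look like the sketch below. It is an assumption about the intent, not the is_empty merged in this PR: it relies on pandas objects exposing an `empty` attribute and falls back to `len()` for everything else. `FakeFrame` is a made-up stand-in so the example runs without pandas.

```python
def is_empty(data):
    """Return True when data holds no rows or items."""
    empty = getattr(data, "empty", None)  # DataFrame and GeoDataFrame expose .empty
    if isinstance(empty, bool):
        return empty
    return len(data) == 0


class FakeFrame:
    """Hypothetical stand-in for a DataFrame, used only in this example."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def empty(self):
        return len(self.rows) == 0


print(is_empty([]), is_empty([{"id": 1}]), is_empty(FakeFrame([])))
```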

crossfire/__init__.py

tests/test_flatten.py

crossfire/__init__.py

tests/test_flatten.py


crossfire/__init__.py

Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>

crossfire/__init__.py

tests/test_flatten.py

cuducos · 2024-05-10T11:54:37Z

I agree with the last approach, but… why pop the key just to reinsert it next? Can't you avoid popping it instead?

FelipeSBarros · 2024-05-12T02:49:09Z

> I agree with the last approach, but… why pop the key just to reinsert it next? Can't you avoid popping it instead?

@cuducos, I am not sure I understood your last comment. Do you agree it is the worst scenario? I see the first proposal as the most interesting one: not "popping" anything, just flattening when there is something to be flattened.
This way the original key is always kept, and dicts with nested values get new keys holding the flattened nested values.

[UPDATE]
My last commits refactor the _flatten functions (list and pd) to avoid popping the nested keys/columns...
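To make the "no popping" behaviour concrete, here is a minimal sketch for a list of dicts. It is not the PR's _flatten_list: `flatten_keep_keys` and the sample records are hypothetical, and the `parent_child` key naming is inferred from the contextInfo_mainReason column that appears later in this thread.

```python
def flatten_keep_keys(records, nested_columns):
    """Add parent_child keys for nested dicts, keeping the original key."""
    for record in records:
        for column in nested_columns:
            value = record.get(column)
            if not isinstance(value, dict):
                continue  # missing key or None: nothing to flatten
            for child_key, child_value in value.items():
                record[f"{column}_{child_key}"] = child_value
    return records


data = [
    {"id": 1, "contextInfo": {"mainReason": "x"}},
    {"id": 2, "contextInfo": None},
]
flat = flatten_keep_keys(data, ("contextInfo",))
print(sorted(flat[0]))  # ['contextInfo', 'contextInfo_mainReason', 'id']
```

Note that this sketch mutates the records it receives, which matters for tests that share input data.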


crossfire/clients/occurrences.py


FelipeSBarros · 2024-05-15T03:21:39Z

@cuducos in my last commits I refactored _flatten_df so it behaves like _flatten_list when dealing with a DataFrame that has None values in the potential nested columns.
Also, before merging this PR, I decided to follow your suggestion: expose flatten as a feature, not as a function.

I added the flat evaluation to Occurrences' __call__ method and updated the tests.

I am missing an integration test to confirm that flatten() is called when I call Occurrences(..., flat=True), but I am not sure it is necessary.

I plan to expose the flatten function as a feature. I still have work to do.
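A test of that kind can be sketched with a spy from unittest.mock. Everything below is a stand-in so the example runs on its own: `FakeOccurrences`, the `flatten_func` argument and the toy `flatten` are hypothetical and do not mirror the real Occurrences client, which may be asynchronous.

```python
from unittest.mock import Mock


def flatten(records):
    """Stand-in for the real flatten; returns the records unchanged."""
    return records


class FakeOccurrences:
    """Hypothetical client that only models the flat switch."""

    def __init__(self, records, flat=False, flatten_func=flatten):
        self.records = records
        self.flat = flat
        self.flatten_func = flatten_func

    def __call__(self):
        if self.flat:
            return self.flatten_func(self.records)
        return self.records


def test_flat_true_calls_flatten():
    spy = Mock(wraps=flatten)  # records calls and still runs flatten
    FakeOccurrences([{"id": 1}], flat=True, flatten_func=spy)()
    spy.assert_called_once_with([{"id": 1}])


def test_flat_false_skips_flatten():
    spy = Mock(wraps=flatten)
    FakeOccurrences([{"id": 1}], flat=False, flatten_func=spy)()
    spy.assert_not_called()
```

In the real suite the same assertions would more likely go through unittest.mock.patch on the flatten name used by the client module, instead of passing the function in.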

FelipeSBarros · 2024-05-15T14:58:22Z

A few things I still have to do:

Confirm/fix whether the implementation of the flat parameter is fine;

While trying the implementation I realized that when requesting data in dict format, the flatten function is not being used:

from crossfire import occurrences

occs_dict = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
                id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
                initial_date='2018-04-01', flat=True, format='dict')
'contextInfo_mainReason' in occs_dict[0].keys()
#False

occs_df = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
                id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
                initial_date='2018-04-01', flat=True, format='df')
'contextInfo_mainReason' in occs_df.columns
#True

occs_gdf = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
                id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
                initial_date='2018-04-01', flat=True, format='geodf')
'contextInfo_mainReason' in occs_gdf.columns
#True

Figure out how to deal with nested columns in nested columns;

While trying the implementation I realized that contextInfo usually comes with other nested keys/columns as values:

occs_df = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
                id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
                initial_date='2018-04-01', flat=True, format='df')
'contextInfo_mainReason' in occs_df.columns

occs_df.iloc[0]['contextInfo_mainReason']
#{'id': 'baa3b299-67ad-41d2-aaf0-23ec8288cadb', 'name': 'Homicidio/Tentativa'}

How could we check whether a flattened key/column still holds a nested value, and flatten it as well?

Rewrite documentation;

cuducos · 2024-05-16T13:24:51Z

> How could we check whether a flattened key/column still holds a nested value, and flatten it as well?

I think the point is to check the type of value:

for key, value in occurrence.items():
    if isinstance(value, dict):
        flat(value)

Maybe for Pandas it would be a string formatted as JSON, so you can try to parse it as JSON and check if it is a dictionary:

from json import loads


def is_nested(text):
    try:
        data = loads(text)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError; non-string input raises TypeError
        return False
    return isinstance(data, dict)
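Putting the two checks above together, a recursive version could look like the sketch below. It is only an illustration: `as_dict` and `flatten_nested` are hypothetical names, and unlike the approach discussed in this PR it returns a new dict and does not keep the parent key.

```python
from json import loads


def as_dict(value):
    """Return value as a dict if it is one (or a JSON object string), else None."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        try:
            parsed = loads(value)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return None
        return parsed if isinstance(parsed, dict) else None
    return None


def flatten_nested(record, prefix=""):
    """Return a new flat dict; nested keys are joined with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        nested = as_dict(value)
        if not nested:  # not a dict, or an empty one: keep the value as is
            flat[name] = value
        else:
            flat.update(flatten_nested(nested, name))
    return flat


row = {
    "id": 1,
    "contextInfo": {"mainReason": {"id": "baa3", "name": "Homicidio/Tentativa"}},
}
print(flatten_nested(row))
```

The recursion stops as soon as a value is neither a dict nor a string that parses to a JSON object, so plain strings and numbers pass through untouched.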

FelipeSBarros · 2024-05-17T03:49:38Z

> > How could we check whether a flattened key/column still holds a nested value, and flatten it as well?
>
> I think the point is to check the type of value:
>
> for key, value in occurrence.items():
>     if isinstance(value, dict):
>         flat(value)
>
> Maybe for Pandas it would be a string formatted as JSON, so you can try to parse it as JSON and check if it is a dictionary:
>
> def is_nested(text):
>     try:
>         data = loads(text)
>     except ValueError:  # not sure this is the right exception, this is just an example
>         return False
>     return isinstance(data, dict)

Thanks for your advice. I hope I have implemented it the right way...

By the way, I am facing a new problem:
when I run only the test test_flatten_pd_with_nested_columns_with_nested_values, it passes:

poetry run pytest -k test_flatten_pd_with_nested_columns_with_nested_values
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/felipe/repos/crossfire
configfile: pyproject.toml
plugins: ruff-0.2.1, anyio-4.1.0, asyncio-0.21.1
asyncio: mode=strict
collected 104 items / 103 deselected / 1 selected                                                                                                                                        

tests/test_flatten.py .                                                                                                                                                            [100%]

But running all tests, it fails.

FAILED tests/test_flatten.py::test_flatten_pd_with_nested_columns_with_nested_values - AssertionError: DataFrame are different

I have no idea what might be causing the problem... do you have any clue?

cuducos · 2024-05-17T11:37:34Z

> when I run only the test test_flatten_pd_with_nested_columns_with_nested_values, it passes:
>
> [...] [100%]
>
> But running all tests, it fails.

Probably because when you run this you mutate DICT_DATA_WITH_NESTED_VALUES_IN_NESTED_COLUMNS — flatten does not return a new dictionary, it mutates the one passed as an argument. So the input is different on the second test when you run both tests. You can copy the dictionary in each test to prevent that.


FelipeSBarros · 2024-05-17T13:50:24Z

> > when I run only the test test_flatten_pd_with_nested_columns_with_nested_values, it passes:
> > [...] [100%]
> > But running all tests, it fails.
>
> Probably because when you run this you mutate DICT_DATA_WITH_NESTED_VALUES_IN_NESTED_COLUMNS: flatten does not return a new dictionary, it mutates the one passed as an argument. So the input is different on the second test when you run both tests. You can copy the dictionary in each test to prevent that.

Thanks, Cuducos! I had this in mind last night, but I was so tired that I didn't realize I should use a deepcopy to make sure all values are copied, not only the structure.
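The difference between the two kinds of copy can be seen in a small sketch. The data and `mutating_flatten` below are toy stand-ins, not the fixtures or the flatten from this PR.

```python
from copy import copy, deepcopy


def make_data():
    return [{"id": 1, "contextInfo": {"mainReason": {"name": "x"}}}]


def mutating_flatten(records):
    """Toy in-place flatten, used only to show the side effect."""
    for record in records:
        nested = record["contextInfo"]
        record["contextInfo_mainReason"] = nested["mainReason"]
    return records


shared = make_data()

mutating_flatten(deepcopy(shared))  # inner dicts are copied too
deep_copy_protects = shared == make_data()  # True

mutating_flatten(copy(shared))  # new list, but the same inner dicts
shallow_copy_protects = shared == make_data()  # False

print(deep_copy_protects, shallow_copy_protects)
```

A shallow copy only duplicates the outer list, so any test that mutates a record still changes the shared constant for the tests that run after it.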

Could you review the implementation of the flatten() function as an Occurrences parameter (point 1 here)? I am not confident about the way I implemented it... It also seems it is not being executed when requesting the Occurrences as dict. :/

cuducos

> it is not being executed when requesting the Occurrences as dict. :/

So, the test suite has a bug? It is green, meaning test_occurrences_as_list_dicts_with_flat_parameter is passing. How is it passing if flatten is not being called? Or does this test have a misleading name?

tests/test_flatten.py

FelipeSBarros · 2024-05-19T02:30:12Z

> So, the test suite has a bug? It is green, meaning test_occurrences_as_list_dicts_with_flat_parameter is passing. How is it passing if flatten is not being called? Or does this test have a misleading name?

Yes, it seems I am the problem... I have just confirmed that the mentioned test is well written and passes. I also did a manual test requesting data from the API using format='dict' and it came as expected...
Let's move on.


FelipeSBarros · 2024-05-19T02:59:02Z

@cuducos I have updated the README, adding flat as an Occurrences parameter. I also added a small section explaining its meaning, with an example of the data returned when using it. Let me know if you agree.

cuducos

Really great work : )

FelipeSBarros · 2024-05-21T19:08:25Z

> Really great work : )

I am glad to hear that! Thank you for your patience and teaching.

create test passing df to flatten ⚠️ not passing

87175d8

FelipeSBarros commented Feb 22, 2024


FelipeSBarros commented Feb 22, 2024


FelipeSBarros changed the title ~~Update flatten to accept DataFrame~~ Update flatten to accept DataFrame and GeoDataFrame Feb 23, 2024

update flatten function to work with DataFrame, test_flatten_pd passing

82848f7

FelipeSBarros commented Feb 23, 2024


cuducos reviewed Feb 24, 2024


update flatten funnction: created _mak_flatten function to DRY; manage df and dict using only isinstance;

6e5b24d

FelipeSBarros commented Feb 24, 2024


cuducos requested changes Feb 25, 2024


update flatten function: create flatten_df trying to use less data transformation

7e5d472

FelipeSBarros commented Feb 25, 2024


cuducos requested changes Feb 26, 2024


FelipeSBarros and others added 4 commits February 26, 2024 16:22

update flatten function: alter import pd.concat name

277347d

update flatten function: change list to tuple

be4fffb

Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>

update flatten_df function alter to _flatten_df

8704f70

update flatten_df add data input validation considering pandas installation

5257e79

FelipeSBarros commented Feb 26, 2024


update flatten: try to improve early return when data not DataFrame

1a1b730

FelipeSBarros commented Feb 26, 2024


create is_empty function; create tests

c6ba078

FelipeSBarros commented Feb 27, 2024


cuducos requested changes Feb 28, 2024


FelipeSBarros added 3 commits February 28, 2024 15:05

simplify is_empty logic; update tests: use unitest.mock.patch, assert _flatten_df ot called

d91fd32

update flatten function: add geodf instance validation for early retun, add test with geodf

0bba5e6

update test_flatten_gpd chagne codestyle

e9b1298

cuducos reviewed Feb 28, 2024


simplify is_empty even more

59b4ed4

Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>

cuducos requested changes Feb 28, 2024


FelipeSBarros added 5 commits May 11, 2024 23:51

refactor _flatten_list so it doesn't pop key from dict; refactor tests new expected results

fc9f2f3

refactor _flatten_df keepong column with nested values; update test w expected values

c222262

update README

2422263

Revert "refactor _flatten_df keepong column with nested values; update test w expected values" This reverts commit c222262.

bf9a3ee

update README

7525140

cuducos previously approved these changes May 13, 2024


FelipeSBarros added 3 commits May 13, 2024 21:42

alter _flatten_list using flat path

d0b1181

update _flatten_df so missing nested column dont raise error; update tests to match new expected values

7898f72

add flat parameter to Occurrences to call flatten function as functionality

a8f7485

FelipeSBarros added 2 commits May 16, 2024 23:03

update _flatten_list to flatten also dict values

e577f36

update _flatten_df to flatten also dict values from nested columns

4270969

update test_flatten [pd, list] with_nested_columns_with_nested_values making it pass

22ef83d

cuducos requested changes May 17, 2024


update README adding flat as Occurrences parameter and aclaration about its usage.

efd25bd

FelipeSBarros requested a review from cuducos May 19, 2024 02:59

cuducos approved these changes May 21, 2024


FelipeSBarros merged commit 89a346a into master May 21, 2024
24 checks passed

FelipeSBarros deleted the make_flatten_gdf branch May 21, 2024 19:06
