-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update flatten to accept DataFrame and GeoDataFrame #103
Conversation
…e df and dict using only isinstance;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuducos what do you think about this proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I should be commiting this, but as you always says: it is better to discuss considering code implemented then ideas (actualy you never said exactly that. but always ask to see the code... so, this is how I understand your claim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tested without pandas
installed?
Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this last commit doesn't solve the concerns your mentioned before, about the early return. But tried to work improve in caso the data
is DAtaFrame
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now with ìs_empty
function and some tests...
… _flatten_df ot called
…n, add test with geodf
Co-authored-by: Eduardo Cuducos <4732915+cuducos@users.noreply.github.com>
I agree with the last approach, but… why popping the key just to reinsert it next? Can’t you prevent popping it instead? |
@cuducos , I am not sure if I could understand your last comment. You agree it is the worst scenarious? I see the first proposal as the most insteresting: not "popping" anythink. just flattenning when there is something to be flattened. [UPDATE] |
…s new expected results
…e test w expected values" This reverts commit c222262.
…tests to match new expected values
@cuducos in my last commits I have refactored the I added the I am missing a integration test to confirm if when I call |
I plan to implement flatten function as functionlity. I still have work to do.
few things I still have to do:
Trying the implementation I realized that when requesting data as ´dict from crossfire import occurrences
occs_dict = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
initial_date='2018-04-01', flat=True, format='dict')
'contextInfo_mainReason' in occs_dict[0].keys()
#False
occs_df = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
initial_date='2018-04-01', flat=True, format='df')
'contextInfo_mainReason' in occs_df.columns
#True
occs_gdf = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
initial_date='2018-04-01', flat=True, format='geodf')
'contextInfo_mainReason' in occs_gdf.columns
#True
When trying the implementation I realized that occs_df = occurrences(id_state='813ca36b-91e3-4a18-b408-60b27a1942ef',
id_cities='5bd3bfe5-4989-4bc3-a646-fe77a876fce0',
initial_date='2018-04-01', flat=True, format='df')
'contextInfo_mainReason' in occs_df.columns
occs_df.iloc[0]['contextInfo_mainReason']
#{'id': 'baa3b299-67ad-41d2-aaf0-23ec8288cadb', 'name': 'Homicidio/Tentativa'} How could we confirm if flattened key/column come with nested value and flatten then also?
|
for key, value in occurrence.items():
if isinstance(value, dict):
flat(value) Maybe for Pandas it would be a string formatted as JSON, so you can try to parse it as JSON and check if it is a dictionary: def is_nested(text):
try:
data = loads(text)
except ValueError: # not sure this is the right exception, this is just an example
return False
return isinstance(data, dict) |
Thanks for your advice. I hope I have implemented the right way... By the way, I am facing a new problem: poetry run pytest -k test_flatten_pd_with_nested_columns_with_nested_values
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.2, pytest-7.4.3, pluggy-1.3.0
rootdir: /home/felipe/repos/crossfire
configfile: pyproject.toml
plugins: ruff-0.2.1, anyio-4.1.0, asyncio-0.21.1
asyncio: mode=strict
collected 104 items / 103 deselected / 1 selected
tests/test_flatten.py . [100%] But running all tests, it fails. FAILED tests/test_flatten.py::test_flatten_pd_with_nested_columns_with_nested_values - AssertionError: DataFrame are different
I have no idea of what might be causing the probem... do you have any clue? |
Probably because when you run this you mutate |
Thanks, Cuducos! I had this in mind last night. But I was so tired that I didn't realize that I should use a deepcopy to make sure all values has copied not ony the structure. Could you review the implementation of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not being executed when requesting the Occurrences as dict. :/
So, the test suite has a bug? It is green, meaning test_occurrences_as_list_dicts_with_flat_parameter
is passing — how is it passing if flatten
is not being called? Or does this test has a deceiving name?
Yes, it seems I am the problem... I have just confirmed that the mentioned test is well written and passes. I also did a manual test requesting data from the API using |
@cuducos I have updated the REAMDE adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great work : )
I am glad to hear it that! Thank you for your patience and teaching. |
@cuducos the purpose of this PR is to follow the development of
flatten
function (#96), making it accept and work withDataFrame
;