Indexing ObsCollection always returns "obs" column #210

Closed
martinvonk opened this issue Apr 12, 2024 · 7 comments · Fixed by #220

@martinvonk
Collaborator

martinvonk commented Apr 12, 2024

Not sure why this happens, but when I use loc on the ObsCollection, the "obs" column is always returned:

oc_sel = oc.loc[:, ["x", "y"]]
oc_sel.columns  # Index(['x', 'y', 'obs'], dtype='object')

The same happens with iloc:
oc.iloc[:, [0, 1]]

@martinvonk
Collaborator Author

This is the reason why #209 fails

@martinvonk
Collaborator Author

martinvonk commented Apr 12, 2024

This started happening after I updated pandas to v2.2.2:
Updating pandas (2.2.1 -> 2.2.2)

martinvonk added a commit that referenced this issue Apr 12, 2024
martinvonk self-assigned this Apr 12, 2024
@martinvonk
Collaborator Author

Not sure if we need to do something with this. #209 is already fixed.
Maybe we need to address this if the behavior persists in versions after v2.2.2.

martinvonk added the "bug" (Something isn't working) label Apr 12, 2024
@OnnoEbbens
Collaborator

It has become an issue in the bro_bronhouder notebook now as well. I think it has something to do with this error: pandas-dev/pandas#57032

@OnnoEbbens
Collaborator

Geopandas had the same problem: geopandas/geopandas#3060

@OnnoEbbens
Collaborator

For now I will pin the pandas version to 2.2.1 or lower because this also gives errors in nlmod.

OnnoEbbens added a commit that referenced this issue Apr 17, 2024
OnnoEbbens added a commit that referenced this issue Jun 25, 2024
OnnoEbbens linked a pull request Jun 25, 2024 that will close this issue
@OnnoEbbens
Collaborator

It seems that the change in pandas behavior that caused this issue can actually help to improve hydropandas.

Behavior pandas <=2.2.1:

oc.loc[:, ["x", "y"]]

returned an ObsCollection. Just like in geopandas:

gdf.loc[:, ['id']]

would return a GeoDataFrame.

Recent changes to pandas (2.2.2) changed this behavior so that:

oc.loc[:, ["x", "y"]]

returns a DataFrame. Because of this, the ObsCollection constructor would attach an 'obs' column to it. I was quite surprised that loc calls the constructor when it returns an object, but apparently it does.
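
A minimal sketch of that last point, using a hypothetical `TrackedFrame` subclass (not part of hydropandas) that logs each constructor call. It assumes only the documented `_constructor` hook for DataFrame subclasses. The kind of object pandas hands to the constructor seems to differ between pandas versions, so the sketch records it rather than assuming it.

```python
import pandas as pd


class TrackedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass that logs every constructor call."""

    # Type name of the `data` argument for each constructor call.
    calls = []

    def __init__(self, data=None, *args, **kwargs):
        TrackedFrame.calls.append(type(data).__name__)
        super().__init__(data, *args, **kwargs)

    @property
    def _constructor(self):
        # pandas uses this hook to build the result of most operations.
        return TrackedFrame


tf = TrackedFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})
TrackedFrame.calls.clear()  # forget the explicit constructor call above

sel = tf.loc[:, ["x", "y"]]

print(type(sel).__name__)
print(TrackedFrame.calls)  # entries depend on the installed pandas version
```

If the log is non-empty after the `loc` call, the subclass constructor was indeed invoked by the indexing operation itself.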

I added an extra check to the constructor to see whether it was called with a DataFrame and an obs_list/ObsClass. The 'obs' column is attached only if an obs_list or ObsClass is given.
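
Roughly, such a check could look like the sketch below. `SketchCollection` is a hypothetical stand-in, not the actual ObsCollection code from #220, and it only models the `obs_list` argument. The idea is that constructor calls made by pandas internals carry no `obs_list`, so they leave the columns untouched.

```python
import pandas as pd


class SketchCollection(pd.DataFrame):
    """Hypothetical stand-in for ObsCollection, not the hydropandas class."""

    def __init__(self, *args, obs_list=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Attach 'obs' only when observations were passed explicitly;
        # calls coming from pandas internals (e.g. via .loc) pass no obs_list.
        if obs_list is not None:
            self["obs"] = list(obs_list)

    @property
    def _constructor(self):
        return SketchCollection


oc = SketchCollection(
    {"x": [1.0, 2.0], "y": [3.0, 4.0]}, obs_list=["obs_a", "obs_b"]
)
sel = oc.loc[:, ["x", "y"]]

print(list(oc.columns))   # ['x', 'y', 'obs']
print(list(sel.columns))  # ['x', 'y']
```

Note that `sel` is still a `SketchCollection` here, just without the 'obs' column.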

We should improve hydropandas further by following the behavior of Geopandas: if you slice an ObsCollection so that the 'obs' column is dropped, a DataFrame should be returned. Right now an ObsCollection without the 'obs' column is returned, which is an invalid ObsCollection.
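
One possible way to get that fallback, sketched with hypothetical names (`FallbackCollection`, `_collection_or_frame`) and not taken from hydropandas or geopandas: let `_constructor` return a function that builds the subclass and downgrades the result to a plain DataFrame when the 'obs' column is missing. The sketch only exercises `loc`; other pandas code paths may need extra care.

```python
import pandas as pd


class FallbackCollection(pd.DataFrame):
    """Hypothetical stand-in for ObsCollection, not the hydropandas class."""

    @property
    def _constructor(self):
        # Return a function instead of the class, so the result type
        # can depend on the columns that survive the operation.
        return _collection_or_frame


def _collection_or_frame(*args, **kwargs):
    """Build a FallbackCollection, or a plain DataFrame if 'obs' is missing."""
    result = FallbackCollection(*args, **kwargs)
    if "obs" not in result.columns:
        result = pd.DataFrame(result)
    return result


oc = FallbackCollection(
    {"x": [1.0, 2.0], "y": [3.0, 4.0], "obs": ["obs_a", "obs_b"]}
)

print(type(oc.loc[:, ["x", "y"]]).__name__)    # DataFrame
print(type(oc.loc[:, ["x", "obs"]]).__name__)  # FallbackCollection
```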

We could even think about supporting an ObsSeries. At the moment, obtaining a single column from an ObsCollection always returns a pandas Series. For now I would rather not do this; I think it adds more headache than hooray.
