Support orient keyword in to_json #3187

paddymul opened this issue Feb 16, 2024 · 3 comments

Comments

@paddymul
Contributor

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of geopandas.

  • [ ] (optional) I have confirmed this bug exists on the main branch of geopandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Minimal reproduction: to_json() does not support the pandas `orient` keyword
from shapely.geometry import Point
import geopandas
d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
gdf.to_json(orient='table')

Problem description

I have a table library that supports pandas. I expected to be able to serialize a GeoDataFrame with the same serializer methods, but to_json doesn't support orient at all.

I understand adding extra arguments for geo specific features, but I would expect pandas default arguments to work.

Expected Output

Given that this pandas call

import pandas as pd
print(pd.DataFrame({'a': [{'foo':9, 'bar':10}, {'c':9}]}).to_json(orient='table', indent=2))

outputs as

{
  "schema":{
    "fields":[  { "name":"index",  "type":"integer"  },   {  "name":"a",    "type":"string"  } ],
    "primaryKey":[ "index"  ],
    "pandas_version":"1.4.0"
  },
  "data":[
    { "index":0,  "a":{ "foo":9, "bar":10  }  },
    { "index":1,  "a":{ "c":9  }
    }
  ]
}

I would expect the following GeoPandas code

d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
gdf.to_json(orient='table')

to output as

{
    "type": "FeatureCollection",
    "schema":{
        "fields":[  { "name":"id",       "type":"integer"  },
                    { "name":"col1",     "type":"string"  },
                    { "name":"geometry", "type": "FeatureCollection"}],
        "primaryKey":[ "id"  ],
        "pandas_version":"geo-1.4.0"
  },

    "data": [
        {"col1": "name1",
         "id": "0",
         "features": {"type": "Feature",
                      "geometry": {
                          "type": "Point",
                          "coordinates": [1.0, 2.0 ]}}},
        {"col1": "name2",
         "id": "1",
         "features": {"type": "Feature",
                      "geometry": {
                          "type": "Point",
                          "coordinates": [2.0, 1.0 ]}}}
    ]
}
 

Output of geopandas.show_versions()

SYSTEM INFO

python : 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
executable : /usr/bin/python3
machine : Linux-6.1.58+-x86_64-with-glibc2.35

GEOS, GDAL, PROJ INFO

GEOS : 3.11.2
GEOS lib : None
GDAL : 3.6.4
GDAL data dir: /usr/local/lib/python3.10/dist-packages/fiona/gdal_data
PROJ : 9.3.0
PROJ data dir: /usr/local/lib/python3.10/dist-packages/pyproj/proj_dir/share/proj

PYTHON DEPENDENCIES

geopandas : 0.13.2
numpy : 1.25.2
pandas : 1.5.3
pyproj : 3.6.1
shapely : 2.0.2
fiona : 1.9.5
geoalchemy2: None
geopy : 2.3.0
matplotlib : 3.7.1
mapclassify: None
pygeos : None
pyogrio : None
psycopg2 : 2.9.9 (dt dec pq3 ext lo64)
pyarrow : 10.0.1
rtree : None

@m-richards
Member

Thanks @paddymul for opening this issue. I wouldn't say this is a bug per se, but rather a mismatch of expectations. In the geospatial world, the standard JSON representation of features is GeoJSON, which specifies a particular layout that stores each feature's attributes separately from its geometry, alongside spatial metadata such as a CRS (coordinate reference system).

(For instance compare the output of gdf.to_json(indent=2) and pd.DataFrame(gdf.to_wkt()).to_json(indent=2) in your example).
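The comparison suggested above can be sketched roughly as follows. This is only an illustration built on the example GeoDataFrame from the issue; the variable names `geojson` and `plain` are made up here, and the printed structure may differ slightly between geopandas versions.

```python
import json

import geopandas
import pandas as pd
from shapely.geometry import Point

d = {"col1": ["name1", "name2"], "geometry": [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")

# GeoJSON: a single FeatureCollection holding one Feature per row,
# with attributes under "properties" and the shape under "geometry".
geojson = json.loads(gdf.to_json())
print(geojson["type"], sorted(geojson["features"][0]))

# Plain pandas JSON: geometry is first flattened to WKT strings,
# then serialized column by column (the pandas default orient).
plain = json.loads(pd.DataFrame(gdf.to_wkt()).to_json())
print(sorted(plain))
```

The first result is interoperable GeoJSON; the second is a pandas-specific layout in which the geometry is just a string column.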

In principle I think we would be open to supporting other orientations, so long as that does not lead to confusion about what is a generic, interoperable GeoJSON format and what is a pandas-specific format. It'd be good for other maintainers to weigh in, but my initial proposal would be to introduce orient="GeoJSON" as the default value and support the other values via pandas. Geometry would be exported as WKT, and we'd probably need to emit a warning about the loss of CRS metadata. A second stage could then be to agree on a way to embed CRSs, if there is an appetite for that.
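A minimal sketch of how that proposal might behave, written as a standalone function rather than a change to geopandas. The name `to_json_with_orient` and the warning text are hypothetical; nothing like this exists in geopandas at the time of this thread.

```python
import warnings

import geopandas
import pandas as pd
from shapely.geometry import Point


def to_json_with_orient(gdf, orient="GeoJSON", **kwargs):
    """Hypothetical helper illustrating the proposal; not a geopandas API."""
    if orient == "GeoJSON":
        # Default: the existing GeoJSON FeatureCollection output.
        return gdf.to_json(**kwargs)
    if gdf.crs is not None:
        # pandas-style JSON has no agreed place to store the CRS.
        warnings.warn(
            "CRS metadata is not preserved in pandas-style JSON output",
            UserWarning,
            stacklevel=2,
        )
    # Export geometry columns as WKT strings, then defer to pandas.
    return pd.DataFrame(gdf.to_wkt()).to_json(orient=orient, **kwargs)


d = {"col1": ["name1", "name2"], "geometry": [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
table_json = to_json_with_orient(gdf, orient="table")  # warns about the CRS
```

Keeping "GeoJSON" as the default leaves existing callers unaffected, while any pandas orient value is passed straight through to `DataFrame.to_json`.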

@m-richards changed the title from "BUG: GeoPandas to_json doesn't support orient like pandas does" to "Support orient keyword in to_json" on Feb 25, 2024
@paddymul
Contributor Author

paddymul commented Apr 3, 2024

FWIW I was able to get a regular pandas dataframe with the following code:

pd_df = pd.DataFrame(dict(zip(df.columns, df.to_numpy().T)))

Then I serialize via existing pandas methods.

I no longer need geopandas to change the behavior of to_json, but I figure this code could be useful for others who stumble across this problem.

@m-richards
Member

> FWIW I was able to get a regular pandas dataframe with the following code:
>
> pd_df = pd.DataFrame(dict(zip(df.columns, df.to_numpy().T)))

I haven't compared the performance, but I think a slightly clearer way to express this would be pd.DataFrame(gdf).astype({"geometry":"object"}) (or a generalisation of it if there are multiple geometry columns), but it is more verbose.
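One possible generalisation to several geometry columns is sketched below. The helper name `to_plain_dataframe` is made up for illustration, and the sketch assumes that geometry columns can be detected through `geopandas.array.GeometryDtype`.

```python
import geopandas
import pandas as pd
from geopandas.array import GeometryDtype
from shapely.geometry import Point


def to_plain_dataframe(gdf):
    """Hypothetical helper: GeoDataFrame -> plain pandas DataFrame."""
    # Find every column backed by the geometry extension dtype.
    geom_cols = [
        col for col, dtype in gdf.dtypes.items()
        if isinstance(dtype, GeometryDtype)
    ]
    # Cast those columns to object so pandas treats them as ordinary values.
    return pd.DataFrame(gdf).astype({col: "object" for col in geom_cols})


d = {"col1": ["name1", "name2"], "geometry": [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:3857")
gdf["other"] = gdf["geometry"]  # a second geometry column

plain = to_plain_dataframe(gdf)
print(plain.dtypes)
```

The object columns still hold shapely geometries, so pandas may need to be told how to render them when serializing, for example through the `default_handler` argument of `DataFrame.to_json`.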

2 participants