Integration with geopandas geometries #588

Open
mattijn opened this Issue Mar 16, 2018 · 16 comments

8 participants
mattijn (Contributor) commented Mar 16, 2018

geopandas is a package that reads many types of geometric datasets, including GeoJSON, TopoJSON and shapefiles (via fiona), and parses them into a pandas DataFrame (a GeoDataFrame) in which the geometry is stored as a separate column.

It would be great if Altair recognized this geometry type, so that making a map becomes easy.

Problems might arise with projections, as Vega doesn't support all EPSG projections. However, EPSG:4326 (WGS 84 longitude/latitude) is the coordinate system Vega's projections expect as input, so data in EPSG:4326 can be rendered with any of the supported projections, such as Mercator.
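
As a rough illustration of the structure involved: a GeoDataFrame is a table with a geometry column, and its GeoJSON form (what `GeoDataFrame.to_json()` produces, minus some details geopandas adds) is a FeatureCollection in which each row becomes a Feature and every non-geometry column is nested under `properties`. The sketch below does that mapping with plain dicts; `rows_to_feature_collection` is a hypothetical helper, not a geopandas function, and geometries are assumed to already be GeoJSON-style dicts.

```python
import json


def rows_to_feature_collection(rows, geometry_key="geometry"):
    """Build a GeoJSON FeatureCollection from a list of row dicts.

    Hypothetical helper: the geometry column becomes each Feature's
    "geometry" and every other column is nested under "properties".
    """
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k != geometry_key}
        features.append({
            "type": "Feature",
            "geometry": row[geometry_key],
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# One made-up row; GeoJSON coordinates are [longitude, latitude].
rows = [
    {"id": 1, "rate": 0.05,
     "geometry": {"type": "Point", "coordinates": [4.9, 52.4]}},
]
fc = rows_to_feature_collection(rows)
print(json.dumps(fc["features"][0]["properties"]))  # {"id": 1, "rate": 0.05}
```

This nesting under `properties` is why the Altair examples further down refer to fields as `properties.<column>`.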

jakevdp (Member) commented Mar 16, 2018

That's a really interesting idea... I've not done much work with geopandas, so I'm not certain what it would take.

kovasb commented Mar 18, 2018

Also interested in this.

bnaul commented Apr 3, 2018

Would this be as simple as wrapping GeoDataFrame.to_json() and passing the GeoJSON along to whatever handles it? I couldn't actually find an Altair example that plots features from GeoJSON; is it handled natively by Vega-Lite?

jakevdp (Member) commented Apr 3, 2018

I have no idea what it would entail... I'm not familiar enough with GeoJSON or GeoPandas, or what geo formats are supported in Vega-Lite.

chekos commented Apr 12, 2018

From Vega-Lite's website, it seems like Vega-Lite uses GeoJSON only (it can take a TopoJSON object but will convert it into GeoJSON).
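
To make "converts TopoJSON into GeoJSON" concrete: TopoJSON stores shared boundaries once, as arcs that are usually quantized and delta-encoded, and geometries refer to arcs by index, where a negative index `~i` means arc `i` reversed. Below is a minimal sketch of that decoding for a single Polygon, following the TopoJSON specification; the function names are illustrative, and it ignores MultiPolygons, points and other cases a real converter (such as topojson-client's `feature`) handles.

```python
def decode_arcs(topology):
    """Return every arc as a list of absolute [x, y] positions."""
    transform = topology.get("transform")
    decoded = []
    for arc in topology["arcs"]:
        if transform is None:
            # Unquantized topology: positions are already absolute.
            decoded.append([[p[0], p[1]] for p in arc])
            continue
        (sx, sy), (tx, ty) = transform["scale"], transform["translate"]
        x = y = 0
        points = []
        for position in arc:
            # Quantized arcs are delta-encoded: accumulate, then de-quantize.
            x += position[0]
            y += position[1]
            points.append([x * sx + tx, y * sy + ty])
        decoded.append(points)
    return decoded


def stitch_ring(arc_indexes, arcs):
    """Join arcs into one ring; a negative index ~i means arc i reversed."""
    ring = []
    for index in arc_indexes:
        points = arcs[~index][::-1] if index < 0 else arcs[index]
        # Consecutive arcs share an endpoint, so drop the repeated point.
        ring.extend(points[1:] if ring else points)
    return ring


def polygon_to_geojson(geometry, topology):
    """Convert one TopoJSON Polygon geometry into a GeoJSON Polygon."""
    arcs = decode_arcs(topology)
    rings = [stitch_ring(ring, arcs) for ring in geometry["arcs"]]
    return {"type": "Polygon", "coordinates": rings}


# A made-up square stored as one quantized, delta-encoded arc.
topology = {
    "type": "Topology",
    "transform": {"scale": [0.5, 0.5], "translate": [10, 20]},
    "arcs": [[[0, 0], [2, 0], [0, 2], [-2, 0], [0, -2]]],
    "objects": {"square": {"type": "Polygon", "arcs": [[0]]}},
}
print(polygon_to_geojson(topology["objects"]["square"], topology))
```

Sharing arcs between neighbouring shapes is what keeps files like us-10m.json small; the price is this extra decoding step before rendering.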

I think integration with GeoPandas would be a great addition to Altair. GeoPandas can read (through fiona) a lot of different geo files and write them as GeoJSON, which Altair can then use for the visualization. I think a simple wrapper that converts the geometry series in the GeoDataFrame to GeoJSON would work (e.g. gdf.to_file('example.geojson', driver='GeoJSON')).
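
The `to_file(..., driver='GeoJSON')` route writes a FeatureCollection to disk. A stdlib-only sketch of the same round trip, with a hand-written collection standing in for a GeoDataFrame and a hypothetical file name, shows that the top-level object is only a wrapper and that the per-row records sit in its `features` array, which is the part Vega-Lite ultimately needs.

```python
import json
import os
import tempfile

# A hand-written FeatureCollection standing in for a GeoDataFrame.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-73.99, 40.73]},
         "properties": {"name": "example"}},
    ],
}

# Write a .geojson file, roughly what gdf.to_file(path, driver='GeoJSON') does.
path = os.path.join(tempfile.mkdtemp(), "example.geojson")
with open(path, "w") as f:
    json.dump(feature_collection, f)

# Reading it back returns the wrapper object; the rows live in "features".
with open(path) as f:
    loaded = json.load(f)
records = loaded["features"]
print(len(records), records[0]["properties"]["name"])  # 1 example
```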

mattijn (Contributor) commented Apr 12, 2018

Got it working :)

import altair as alt
import geopandas as gpd
import pandas as pd
import json
%matplotlib inline
Load two datasets
counties = r'/Users/mattijnvanhoek/Desktop/us-10m.json'
unemp_data = r'/Users/mattijnvanhoek/Desktop/unemployment.tsv'
df = pd.read_csv(unemp_data, sep='\t')
gdf = gpd.read_file(counties, driver='TopoJSON')
# cast the ids to int so they match the id column of the unemployment table
gdf['id'] = gdf['id'].astype(int)
Apply an inner join on the GeoDataFrame and the DataFrame (gdf should be on the left side and df on the right, so that the joined result keeps its geometry and remains a GeoDataFrame)
gdf_merged = gdf.merge(df, left_on='id', right_on='id', how='inner')
Plot the GeoDataFrame using matplotlib and print its head(). The rate column has now been joined onto the GeoDataFrame.
gdf_merged.plot()

gdf_merged.head()
geometry id rate
0 () 22051 .065
1 (POLYGON ((-90.1077214366575 30.19168413151698... 22051 .065
2 (POLYGON ((-120.8536146368232 49.0001146177235... 53073 .078
3 POLYGON ((-106.1123837970986 48.99904031068445... 30105 .046
4 POLYGON ((-114.0698488011574 48.99904031068445... 30029 .088
Prepare GeoDataFrame for Altair
# serialize the GeoDataFrame to a GeoJSON string
json_gdf = gdf_merged.to_json()
# parse the string back into a dict (a GeoJSON FeatureCollection)
json_features = json.loads(json_gdf)
Make the Choropleth Map
# pass the `features` list of the FeatureCollection to `alt.Data`
data_geo = alt.Data(values=json_features['features'])

# plot the map; the DataFrame columns are nested within each feature's `properties`
alt.Chart(data_geo).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(
    projection={'type': 'albersUsa'},
    width=700,
    height=400
).encode(
    color='properties.rate:Q')
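
For reference, the chart above corresponds roughly to the Vega-Lite specification below, built as a plain dict so the structure is visible without Altair. It is an approximation: Altair's output also carries `$schema` and `config`, and the single feature is a made-up placeholder for `json_features['features']`. The small `get_field` helper (hypothetical, not an Altair function) mimics how a dotted field name such as `properties.rate` reaches into nested objects.

```python
import json

# Made-up placeholder standing in for json_features['features'].
features = [
    {"type": "Feature",
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-90, 30], [-89, 30], [-89, 31], [-90, 30]]]},
     "properties": {"id": 1, "rate": 0.065}},
]

spec = {
    "data": {"values": features},
    "mark": {"type": "geoshape", "fill": "lightgray", "stroke": "white"},
    "projection": {"type": "albersUsa"},
    "width": 700,
    "height": 400,
    "encoding": {"color": {"field": "properties.rate", "type": "quantitative"}},
}


def get_field(datum, field):
    """Resolve a dotted field name against nested dicts, as Vega does."""
    value = datum
    for part in field.split("."):
        value = value[part]
    return value


print(get_field(features[0], spec["encoding"]["color"]["field"]))  # 0.065
print(json.dumps(spec["mark"]))
```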

pagpires commented May 1, 2018

  1. Thanks a lot! The discussion here is super helpful, especially the use of alt.Data(). However, I cannot get what's shown here working. Can you show me what your us-10m.json looks like? (I notice you assign it directly to counties, so I guess the JSON file is different from the one in the example?)

  2. Here is a more-Altair, less-pandas way to make the same plot (basically, use alt.Data() to wrap your local file, and use transform_lookup rather than merge):

# download the files and read them into variables
import altair as alt
import pandas as pd
from vega_datasets import data

us_10m = data.us_10m()  # the TopoJSON file, loaded as a dict

# the unemployment file is tab-separated; reading it with sep='\t' gives
# separate 'id' and 'rate' columns, so no manual splitting is needed
unemp_data = pd.read_csv(data.unemployment.url, sep='\t')

# key: convert the data to an Altair-recognizable format by using alt.Data()
# note that we need to specify the format, here with alt.TopoDataFormat()
# `feature` (or `mesh`) selects which TopoJSON object to extract; `type` must be 'topojson'
counties = alt.Data(
    values=us_10m,
    format=alt.TopoDataFormat(feature='counties', type='topojson')
)

# plot by lookup
# same as the example at https://altair-viz.github.io/user_guide/transform.html#lookup-transform
alt.Chart(counties).mark_geoshape().encode(
    color='rate:Q'
).properties(
    projection={'type': 'albersUsa'},
    width=500, height=300
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(unemp_data, 'id', ['rate'])
)
  3. Personally, I think pd.merge is a better way than transform_lookup(), since pd.merge separates data processing from the visualization step...

  4. It would be great to have more examples showing how to plot local, non-pandas data (for now I can only think of geo-related data). Personally, I think Altair is really good for exploratory analysis, which will most likely involve private/local data...
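
One practical difference between the two joins may matter here: `pd.merge(..., how='inner')`, as used earlier in the thread, drops counties that have no unemployment row, whereas a Vega-Lite lookup keeps every primary record and fills unmatched fields with a default (null unless `default` is set). A pure-Python sketch of both behaviours, with made-up ids and rates and assuming unique keys on the lookup side:

```python
def inner_merge(left, right, key):
    """Like pd.merge(..., how='inner'): keep only left rows that find a match."""
    index = {row[key]: row for row in right}  # assumes unique keys on the right
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def lookup(primary, secondary, key, fields, default=None):
    """Like a Vega-Lite lookup transform: keep every primary row."""
    index = {row[key]: row for row in secondary}
    result = []
    for row in primary:
        match = index.get(row[key])
        extra = {f: (match[f] if match is not None else default) for f in fields}
        result.append({**row, **extra})
    return result


# Made-up ids and rates; county 9999 has no unemployment row.
counties = [{"id": 1001}, {"id": 1003}, {"id": 9999}]
rates = [{"id": 1001, "rate": 0.05}, {"id": 1003, "rate": 0.07}]

print(len(inner_merge(counties, rates, "id")))      # 2
print(lookup(counties, rates, "id", ["rate"])[-1])  # {'id': 9999, 'rate': None}
```

With the lookup, such counties are still drawn (with a null rate); with the inner merge they disappear from the map entirely.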

mattijn (Contributor) commented May 1, 2018

No, the file is the same, but I had downloaded it to disk before reading it. To parse the remote vega_datasets data directly into a DataFrame and a GeoDataFrame (not so straightforward for TopoJSON data), do as follows:

import altair as alt
import geopandas as gpd
import pandas as pd
import json
# extra
from vega_datasets import data
import requests
import fiona
%matplotlib inline
# load the tab separated unemployment file into a DataFrame
df = pd.read_csv(data.unemployment.url, sep='\t')
# parse the us_10m topojson file into memory
request = requests.get(data.us_10m.url)
visz = fiona.ogrext.buffer_to_virtual_file(bytes(request.content))

# read the features from a fiona collection into a GeoDataFrame
with fiona.Collection(visz, driver='TopoJSON') as f:
    gdf = gpd.GeoDataFrame.from_features(f, crs=f.crs)
# continue as above
gdf.id = gdf.id.astype(int)
gdf_merged = gdf.merge(df, left_on='id', right_on='id', how='inner')
gdf_merged.head()
geometry id rate
0 () 22051 0.065
1 (POLYGON ((-90.1077214366575 30.19168413151698... 22051 0.065
2 (POLYGON ((-120.8536146368232 49.0001146177235... 53073 0.078
3 POLYGON ((-106.1123837970986 48.99904031068445... 30105 0.046
4 POLYGON ((-114.0698488011574 48.99904031068445... 30029 0.088

Continue from the step "Prepare GeoDataFrame for Altair" in the previous comment.

jakevdp (Member) commented May 1, 2018

Would it make sense to build some of that data preparation into vega_datasets?

mattijn (Contributor) commented May 1, 2018

Yes, if the result is a GeoDataFrame it would be much cleaner (and people won't run away yet)

iliatimofeev (Contributor) commented May 3, 2018

I think it's a little bit simpler:

import altair as alt
import pandas as pd
import geopandas as gpd

alt.renderers.enable('notebook')


world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world = world[world.continent != 'Antarctica']  # do not display Antarctica

data = alt.InlineData(values=world.to_json(),  # GeoPandas to GeoJSON string
                      # the root object type is "FeatureCollection", but we need its features
                      format=alt.JsonDataFormat(property='features', type='json'))
alt.Chart(data).mark_geoshape(
).encode( 
    color='properties.pop_est:Q', # DataFrame fields are accessible through a "properties" object 
    tooltip='properties.name:N'
).properties( 
    projection={"type":'mercator'},
    width=500,
    height=300
)
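
The `property='features'` part of the format is what unwraps the FeatureCollection: per the Vega data-format documentation, `property` names the JSON property containing the desired data and may be a nested path such as `values.features`. A small sketch of that behaviour, with a hypothetical `extract_property` helper and a hand-written GeoJSON string in place of `world.to_json()`:

```python
import json


def extract_property(json_text, prop):
    """Parse JSON text and follow a (possibly dotted) path such as 'features'."""
    obj = json.loads(json_text)
    for part in prop.split("."):
        obj = obj[part]
    return obj


# Hand-written stand-in for world.to_json(); geometries are null for brevity.
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"name": "A", "pop_est": 10}},
        {"type": "Feature", "geometry": None,
         "properties": {"name": "B", "pop_est": 20}},
    ],
})
rows = extract_property(geojson_text, "features")
print([r["properties"]["name"] for r in rows])  # ['A', 'B']
```

Passing the string and letting the format do the unwrapping avoids the explicit `json.loads(...)['features']` step used in the earlier choropleth example.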

But it will crash if we add a Timestamp-type field to the DataFrame. To avoid the crash, the data could be sanitized first: alt.InlineData(values = alt.utils.core.sanitize_dataframe(world).to_json(),
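
The crash comes from JSON serialization: the standard `json` module cannot encode datetime objects, and `pandas.Timestamp` is a subclass of `datetime.datetime`. As an alternative to `sanitize_dataframe`, such values can be converted to ISO 8601 strings while serializing. Below is a stdlib sketch of that idea with a plain `datetime` standing in for a Timestamp; `iso_default` is a hypothetical helper, and it is shown with `json.dumps` directly rather than through geopandas.

```python
import json
from datetime import datetime


def iso_default(value):
    """Fallback for json.dumps: encode datetimes (pandas.Timestamp included) as ISO strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# datetime stands in for a pandas.Timestamp value in a DataFrame column.
props = {"name": "A", "updated": datetime(2018, 5, 3, 12, 0)}

try:
    json.dumps(props)
except TypeError as err:
    print("plain json.dumps fails:", err)

print(json.dumps(props, default=iso_default))
# {"name": "A", "updated": "2018-05-03T12:00:00"}
```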

In the general case, it would be great to support any object with a __geo_interface__, which is widely supported by Python GIS libraries. I suggest having a special class for this case, something like: data = alt.GeoData(world). I could make a PR if it is needed.
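
The `__geo_interface__` convention means an object exposes a GeoJSON-like dict through that attribute; GeoDataFrames and shapely geometries both implement it. Below is a minimal sketch of how a helper along the lines of the proposed `alt.GeoData` might consume it. `to_vega_values`, `FakeLayer` and `FakePoint` are hypothetical names, and only the FeatureCollection, Feature and bare-geometry cases are handled.

```python
def to_vega_values(obj):
    """Return a list of GeoJSON Features from any object with __geo_interface__."""
    geo = obj.__geo_interface__
    kind = geo.get("type")
    if kind == "FeatureCollection":
        return list(geo["features"])
    if kind == "Feature":
        return [geo]
    # A bare geometry (e.g. a shapely shape): wrap it in a Feature.
    return [{"type": "Feature", "geometry": geo, "properties": {}}]


class FakeLayer:
    """Hypothetical stand-in for a GeoDataFrame-like object."""

    @property
    def __geo_interface__(self):
        return {
            "type": "FeatureCollection",
            "features": [
                {"type": "Feature", "properties": {"name": "A"},
                 "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
            ],
        }


class FakePoint:
    """Hypothetical stand-in for a single geometry object."""

    __geo_interface__ = {"type": "Point", "coordinates": [1.0, 2.0]}


print(len(to_vega_values(FakeLayer())))                    # 1
print(to_vega_values(FakePoint())[0]["geometry"]["type"])  # Point
```

The returned list is the same shape as the `values` passed to `alt.Data` in the choropleth example, so any library implementing the protocol would work without a GeoPandas-specific code path.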

jakevdp (Member) commented May 4, 2018

> I suggest having a special class for this case, something like: data = alt.GeoData(world). I could make a PR if it is needed.

That would be great!

iliatimofeev added a commit to iliatimofeev/altair that referenced this issue May 5, 2018

iliatimofeev referenced a pull request that will close this issue May 6, 2018: Integration with geopandas #588 #818 (Open; 3 of 3 tasks complete)

iliatimofeev added a commit to iliatimofeev/altair that referenced this issue May 16, 2018

iliatimofeev added a commit to iliatimofeev/altair that referenced this issue Jun 9, 2018

iliatimofeev added a commit to iliatimofeev/altair that referenced this issue Jun 10, 2018

iliatimofeev added a commit to iliatimofeev/altair that referenced this issue Jul 26, 2018

iliatimofeev (Contributor) commented Aug 15, 2018

To avoid a hard dependency between Altair and GeoPandas, I have published gpdvega as a connector.
See the documentation for details: https://iliatimofeev.github.io/gpdvega/

JoeGermuska commented Sep 12, 2018

For what it's worth, I installed gpdvega and tried this with my own data, and found that it works well and as expected.

My taste would be for tighter integration rather than yet-another-library, but I recognize that I'm not trying to maintain a rapidly evolving library in a rapidly evolving ecosystem. If it won't be integrated, at least a reference in the altair docs would be great.

jakevdp (Member) commented Sep 12, 2018

I agree – sorry this has been so slow, but getting it more tightly integrated depends on a redesign of the data_transformer architecture that hasn't happened yet.

JoeGermuska commented Sep 12, 2018

Sure, I totally understand... just wanted to confirm that it's solid.
