fio command for calculating new properties #273

Closed
perrygeo opened this issue Sep 22, 2015 · 8 comments · Fixed by #348

Comments

@perrygeo
Collaborator

fio calc could take a stream of GeoJSON features on stdin and calculate additional properties.

For each feature, it could evaluate one or more expressions, place the derived variable(s) in the properties dict then print the feature to stdout.

So for this input:

{'type': 'Feature',
 'id': 1,
 'properties': {'a': 1, 'b': 2}}

You might have a CLI something like this

fio calc --new "c" "f['properties']['a'] + f['properties']['b'] * 2"

which would yield

{'type': 'Feature',
 'id': 1,
 'properties': {'a': 1, 'b': 2, 'c': 5}}

There are obviously some details to work out

  • what is in the namespace when the expression is evaluated
  • python expressions or some other DSL?
  • what's the best interface for args and options

If there's interest in the general idea, I can get started on a PR
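To make the idea concrete, here is a minimal sketch of the per-feature evaluation described above. It assumes Python expressions, a namespace that exposes only the feature as `f`, and line-delimited GeoJSON input; the names `calc` and `calc_stream` are hypothetical and are not part of fio.

```python
import json


def calc(feature, name, expression):
    """Evaluate `expression` with the feature bound to `f` and store the result."""
    # Assumption: only `f` is exposed. What else belongs in the namespace
    # is one of the open questions listed above.
    namespace = {"f": feature}
    # eval() runs arbitrary Python, so this is only safe for trusted input.
    feature["properties"][name] = eval(expression, {}, namespace)
    return feature


def calc_stream(lines, name, expression):
    """Apply calc() to an iterable of JSON lines, yielding JSON lines."""
    for line in lines:
        if line.strip():  # skip blank lines
            yield json.dumps(calc(json.loads(line), name, expression))


feature = {"type": "Feature", "id": 1, "properties": {"a": 1, "b": 2}}
result = calc(feature, "c", "f['properties']['a'] + f['properties']['b'] * 2")
print(result["properties"])  # {'a': 1, 'b': 2, 'c': 5}
```

A real command would pass `sys.stdin` to `calc_stream` and print each yielded line to stdout.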

@geowurster
Member

@perrygeo Yes! I have been thinking about this as well but have not had a reason to dig in. I have a fio-geoprocessing project that has been sitting around for a while that might interest you. The goal is to take advantage of click command chaining to string together a bunch of shapely and/or custom commands that operate on single features. The open PR covers some of the background, but the big question is whether or not we can make the current fio CLI work with click chaining and what complications that introduces for plugin commands. I haven't had much time to work on it, but this issue and #272 are both features I have thought about or prototyped.

This could look something like:

fio \
    cat infile.geojson \
    buffer --dist 100 \
    simplify --tolerance 10 \
    reproject --dst-crs EPSG:4326 \
    calc --new area "shape(f['geometry']).area" \
    load --driver GeoJSON out.geojson

Or maybe just reading from stdin and writing to stdout, with cat and load handling the streaming and dumping to disk:

fio cat infile.geojson | \
    buffer --dist 100 \
    simplify --tolerance 10 \
    reproject --dst-crs EPSG:4326 \
    calc --new area "shape(feature['geometry']).area" |\
        fio load --driver GeoJSON out.geojson

Or as its own subcommand with an initial command to open the file and a final command to save:

fio pipe \
    open infile.geojson \
    buffer --dist 100 \
    simplify --tolerance 10 \
    reproject --dst-crs EPSG:4326 \
    calc --new area "shape(feature['geometry']).area" \
    save --driver GeoJSON out.geojson

@perrygeo
Collaborator Author

@geowurster Thanks for pointing out fio-geoprocessing - that really opens up the doors for some awesome fio-based workflows. I'll check it out...

As for the click command chaining, it looks great, cleaner syntax for sure. And it avoids serializing-deserializing at each command, firing up a new interpreter, etc. A bit OT for this issue but where does #173 stand?

Command chaining vs. pipes aside, what do you think about the interface (--new propertyname expression) and the use of Python expressions in general? I noticed you included a call to shape in your example, so there is at least one function we should have in the namespace. Any others that might be useful?

@geowurster
Member

@perrygeo I only pointed out fio-geoprocessing because we both have had some of the same thoughts and wanted you to see how far I got in the context of this issue, #173, and #272.

#173 is held up by some design decisions that are dictated by whether the current CLI can be made chain-able or whether chaining will need to be relegated to a subcommand. Each has its pros and cons. I haven't had time to experiment with it but it miiiiiiight be possible to make the current CLI chain-able.

As far as the syntax and use of Python expressions go, see my comment in #272.

@perrygeo
Collaborator Author

Does @geowurster's pyin utility cover this use case? Yep, and more. Maybe a fio calc is superfluous?

The original example would become:

pyin \
  "json.loads(line)" \
  "line['properties']['c'] = line['properties']['a'] + line['properties']['b'] * 2"

Simple enough and more general-purpose. If we had dot notation in pyin, it starts looking even better, at the expense of being a little bit longer:

pyin \
  "json.loads(line)" \
  "from munch import munchify; munchify(line)" \
  "line.properties.c = line.properties.a + line.properties.b * 2"

So is there any value in a fio calc when other tools will work? Maybe it would just be a convenience to handle the json and munch stuff and expose the feature as f.
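As a rough illustration of the "handle the json stuff and expose the feature as f" convenience, the sketch below gets dot notation from the standard library alone. Note that it swaps in `types.SimpleNamespace` for munch, so it is not what pyin or munch do internally; `load_feature` and `dump_feature` are hypothetical helper names.

```python
import json
from types import SimpleNamespace


def load_feature(line):
    """Parse a JSON feature so nested objects support attribute access."""
    # object_hook is called for every JSON object, innermost first.
    return json.loads(line, object_hook=lambda d: SimpleNamespace(**d))


def dump_feature(feature):
    """Serialize back to JSON; vars() turns each namespace into a dict."""
    return json.dumps(feature, default=vars)


f = load_feature('{"type": "Feature", "id": 1, "properties": {"a": 1, "b": 2}}')
f.properties.c = f.properties.a + f.properties.b * 2
print(dump_feature(f))
# {"type": "Feature", "id": 1, "properties": {"a": 1, "b": 2, "c": 5}}
```

Unlike munch, the namespace objects do not also behave as dicts, so `f['properties']` would not work on them.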

@geowurster
Member

@perrygeo No need to import munch, "munch.munchify(json.loads(line))" works automagically™.

Unfortunately eval() can't handle assignment, so your last statement throws a syntax error, but exec() can. With a little regex to direct expressions to the right function, and some experimentation to see what exec() has access to, I can definitely bring it into pyin.
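The eval()/exec() difference can be seen in a short sketch. Instead of the regex mentioned above, this version tries to compile the source in "eval" mode first and falls back to "exec" on a SyntaxError; that is one possible routing approach, not necessarily what pyin does, and `run` is a hypothetical helper name.

```python
def run(source, scope):
    """Evaluate `source` as an expression if possible, else execute it as a statement."""
    try:
        code = compile(source, "<expression>", "eval")
    except SyntaxError:
        # Assignments and other statements only compile in "exec" mode.
        exec(compile(source, "<statement>", "exec"), {}, scope)
        return None
    return eval(code, {}, scope)


feature = {"properties": {"a": 1, "b": 2}}
statement = "f['properties']['c'] = f['properties']['a'] + f['properties']['b'] * 2"

try:
    eval(statement, {}, {"f": feature})
    eval_failed = False
except SyntaxError:
    eval_failed = True  # eval() rejects assignment statements

run(statement, {"f": feature})
print(eval_failed, feature["properties"])  # True {'a': 1, 'b': 2, 'c': 5}
```

Item assignment mutates the feature in place, so the result is visible to the caller even though `run` returns `None` for statements.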

Another option is to depend on pyin and use its pmap() generator for fio calc, which would let us control the variable name and feed it loaded dictionaries. Adding an optional base scope argument would be pretty easy.

@sgillies
Member

sgillies commented May 5, 2016

@geowurster @perrygeo is this one resolved or is it still active?

@perrygeo
Collaborator Author

perrygeo commented May 6, 2016

I'd love to see something like this, the GeoJSON CLI equivalent of an "attribute table calculator", which is core functionality in many GIS workflows. Just waiting for some consensus on if/how it should be implemented in fiona. Still an open conversation, let's not close just yet.

My take: adding this functionality would be simple and generically useful. I've developed lots of feature processing functions which do more specialized tasks (e.g. add a new id property with a UUID), but I realize they could all be solved with a general tool that evaluates Python expressions and stores those values as properties.
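For instance, the UUID case reduces to a single expression once a general evaluator exists. This is a sketch under the assumption that the `uuid` module is made available in the expression namespace; `add_property` is a hypothetical name, not an existing fiona function.

```python
import uuid


def add_property(feature, name, expression):
    """Store the value of a Python expression as a new feature property."""
    # Assumption: the feature is exposed as `f` and the uuid module as `uuid`.
    scope = {"f": feature, "uuid": uuid}
    # eval() runs arbitrary code; only use it with trusted expressions.
    feature["properties"][name] = eval(expression, {}, scope)
    return feature


feature = add_property({"type": "Feature", "properties": {}}, "id", "str(uuid.uuid4())")
print(feature["properties"]["id"])  # a random UUID string, different on every run
```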

Is that a good idea? Are there other general json processing tools that already do this? Or is this the realm of python code and doesn't really fit the command line?

@geowurster
Member

geowurster commented May 18, 2016

Agreed that a command line field calculator would be a great addition. Seems like eval() should be able to handle this as well, although I'm not sure it supports item assignment, so maybe exec()?

@perrygeo perrygeo mentioned this issue May 21, 2016
@sgillies sgillies added this to the 1.7 milestone May 25, 2016