-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fiona <-> OGR field mappings take 2 (refactor) #1366
Conversation
Resolves #1321
Instead we dispatch to cdef classes kept in several new mappings.
Instead we dispatch to cdef classes kept in several new mappings.
elif ftype.startswith('int'): | ||
width = int((ftype.split(':')[1:] or ['0'])[0]) | ||
if GDAL_VERSION_NUM >= 2000000 and (width == 0 or width >= 10): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We require GDAL 3.4+. I'm removing these checks as I encounter them.
@@ -24,9 +28,10 @@ def test_types(): | |||
assert prop_type("str") == str | |||
assert isinstance(0, prop_type("int")) | |||
assert isinstance(0.0, prop_type("float")) | |||
assert prop_type("date") == FionaDateType | |||
assert prop_type("date") == str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FionaDateType
no longer derives from str
.
types = set(FIELD_TYPES) | ||
types.remove(None) | ||
return list(sorted(types)) + [None] | ||
types = set(NAMED_FIELD_TYPES.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIELD_TYPES
remains, but internal usage now uses the more complete NAMED_FIELD_TYPES
to get, for example FionaBooleanType
by its name "bool"
.
|
||
cdef set(self, OGRFeatureH feature, int i, object value, object kwds): | ||
"""Set the value of a feature's field.""" | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The classes that derive from this one replace the ugly if/def statements in the feature builders.
@@ -225,7 +559,37 @@ cdef class FeatureBuilder: | |||
argument is not destroyed. | |||
""" | |||
|
|||
cdef build(self, void *feature, encoding='utf-8', bbox=False, driver=None, ignore_fields=None, ignore_geometry=False): | |||
cdef object driver | |||
cdef object property_getter_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some state to the class so that we can reuse field getter and setters.
props[key] = bool(OGR_F_GetFieldAsInteger64(feature, i)) | ||
else: | ||
if fieldkey in self.property_getter_cache: | ||
getter = self.property_getter_cache[fieldkey] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we try to reuse an existing field getter.
string_list_index += 1 | ||
else: | ||
props[key] = None | ||
props[key] = getter.get(feature, i, {"encoding": encoding}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All that old crap replaced by this 😄
value = json.dumps(value) | ||
|
||
setter.set(cogr_feature, i, value, {"encoding": encoding}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm immoderately pleased about this.
|
||
if cogr_layer == NULL: | ||
raise ValueError("Null layer") | ||
|
||
cdef OGRFeatureBuilder feat_builder = OGRFeatureBuilder(driver=collection.driver) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an optimization that I've overlooked for a while: reusing the feature builder.
To reproduce behavior of Fiona 1.9
(OFTBinary, OFSTNone, "bytearray"): BinaryField, | ||
(OFTBinary, OFSTNone, "memoryview"): BinaryField, | ||
(OFTStringList, OFSTNone, "list"): StringListField, | ||
(OFTString, OFSTJSON, "dict"): JSONField, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dispatching to a field setter based on field type and value type restores the implicit behavior in fiona version 1.9.x.
@mwtoews @snorfalorpagus @perrygeo I'm tagging you for review of this refactor PR. Meanwhile I'm going to work on benchmarking. |
Last call for review! It's a lot to digest, I know, and if you're primarily using fiona via geopandas, maybe not interesting. Still, I appreciate a skim if you can. |
Follow up on #1358.
The project's original mappings predated OGR's sub-types (https://gdal.org/development/rfc/rfc50_ogr_field_subtype.html) and those were kinda bolted on. In this version we embrace them. We map
(fieldtype, subtype)
tuples to Fiona field types.The extremely gross sections of
if/else
conditionals in the OGR and Fiona feature builders have been reduced to looking up a class in one of the maps and calling one of its methods. The increase in lines of code is largely explained by the new small classes and lack of normalization.Original tests pass, save one, which had the wrong expected value.
Benchmarking: this script runs in 0.49 seconds with Fiona 1.9.6 and 0.44 seconds with the code in this PR. Measured by cProfile. So, a ~10% speedup for a 288 record dataset with features that have 60+ fields.
This PR is about making the code more readable and easier to maintain. A small speedup is a nice bonus.
When reviewing, use the "split" view, it's much easier to read for these particular changes.