Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fiona <-> OGR field mappings take 2 (refactor) #1366

Merged
merged 14 commits into from
Apr 3, 2024
Merged

Fiona <-> OGR field mappings take 2 (refactor) #1366

merged 14 commits into from
Apr 3, 2024

Conversation

sgillies
Copy link
Member

@sgillies sgillies commented Mar 29, 2024

Follow up on #1358.

The project's original mappings predated OGR's sub-types (https://gdal.org/development/rfc/rfc50_ogr_field_subtype.html) and those were kinda bolted on. In this version we embrace them. We map (fieldtype, subtype) tuples to Fiona field types.

The extremely gross sections of if/else conditionals in the OGR and Fiona feature builders have been reduced to looking up a class in one of the maps and calling one of its methods. The increase in lines of code is largely explained by the new small classes and lack of normalization.

Original tests pass, save one, which had the wrong expected value.

Benchmarking: this script runs in 0.49 seconds with Fiona 1.9.6 and 0.44 seconds with the code in this PR. Measured by cProfile. So, a ~10% speedup for a 288 record dataset with features that have 60+ fields.

import fiona

with fiona.open(
    "ne_10m_admin_0_countries_lakes/ne_10m_admin_0_countries_lakes.shp"
) as src:
    with fiona.open("/tmp/test_perf.shp", "w", **src.profile) as dst:
        dst.writerecords(src)

This PR is about making the code more readable and easier to maintain. A small speedup is a nice bonus.

When reviewing, use the "split" view, it's much easier to read for these particular changes.

@sgillies sgillies added this to the 1.10 milestone Mar 29, 2024
@sgillies sgillies self-assigned this Mar 29, 2024
@sgillies sgillies changed the title Issue1321bis Fiona <-> OGR field mappings take 2 Mar 29, 2024
@sgillies sgillies marked this pull request as draft March 29, 2024 21:23
fiona/ogrext.pyx Outdated Show resolved Hide resolved
fiona/ogrext.pyx Outdated Show resolved Hide resolved
elif ftype.startswith('int'):
width = int((ftype.split(':')[1:] or ['0'])[0])
if GDAL_VERSION_NUM >= 2000000 and (width == 0 or width >= 10):
Copy link
Member Author

@sgillies sgillies Mar 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require GDAL 3.4+. I'm removing these checks as I encounter them.

tests/test_collection.py Outdated Show resolved Hide resolved
@@ -24,9 +28,10 @@ def test_types():
assert prop_type("str") == str
assert isinstance(0, prop_type("int"))
assert isinstance(0.0, prop_type("float"))
assert prop_type("date") == FionaDateType
assert prop_type("date") == str
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FionaDateType no longer derives from str.

types = set(FIELD_TYPES)
types.remove(None)
return list(sorted(types)) + [None]
types = set(NAMED_FIELD_TYPES.keys())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIELD_TYPES remains, but internal usage now uses the more complete NAMED_FIELD_TYPES to get, for example FionaBooleanType by its name "bool".


cdef set(self, OGRFeatureH feature, int i, object value, object kwds):
"""Set the value of a feature's field."""
raise NotImplementedError
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classes that derive from this one replace the ugly if/def statements in the feature builders.

@@ -225,7 +559,37 @@ cdef class FeatureBuilder:
argument is not destroyed.
"""

cdef build(self, void *feature, encoding='utf-8', bbox=False, driver=None, ignore_fields=None, ignore_geometry=False):
cdef object driver
cdef object property_getter_cache
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some state to the class so that we can reuse field getter and setters.

props[key] = bool(OGR_F_GetFieldAsInteger64(feature, i))
else:
if fieldkey in self.property_getter_cache:
getter = self.property_getter_cache[fieldkey]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we try to reuse an existing field getter.

string_list_index += 1
else:
props[key] = None
props[key] = getter.get(feature, i, {"encoding": encoding})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that old crap replaced by this 😄

value = json.dumps(value)

setter.set(cogr_feature, i, value, {"encoding": encoding})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm immoderately pleased about this.


if cogr_layer == NULL:
raise ValueError("Null layer")

cdef OGRFeatureBuilder feat_builder = OGRFeatureBuilder(driver=collection.driver)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an optimization that I've overlooked for a while: reusing the feature builder.

(OFTBinary, OFSTNone, "bytearray"): BinaryField,
(OFTBinary, OFSTNone, "memoryview"): BinaryField,
(OFTStringList, OFSTNone, "list"): StringListField,
(OFTString, OFSTJSON, "dict"): JSONField,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dispatching to a field setter based on field type and value type restores the implicit behavior in fiona version 1.9.x.

@sgillies sgillies changed the title Fiona <-> OGR field mappings take 2 Fiona <-> OGR field mappings take 2 (refactor) Apr 1, 2024
@sgillies sgillies marked this pull request as ready for review April 1, 2024 15:29
@sgillies
Copy link
Member Author

sgillies commented Apr 1, 2024

@mwtoews @snorfalorpagus @perrygeo I'm tagging you for review of this refactor PR. Meanwhile I'm going to work on benchmarking.

@sgillies
Copy link
Member Author

sgillies commented Apr 3, 2024

Last call for review! It's a lot to digest, I know, and if you're primarily using fiona via geopandas, maybe not interesting. Still, I appreciate a skim if you can.

@sgillies sgillies merged commit 6581da6 into main Apr 3, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant