Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove per-feature logging to improve performance issue#685 #718

Merged
merged 3 commits into from Feb 18, 2019

Conversation

snowman2
Copy link
Contributor

@snowman2 snowman2 commented Feb 3, 2019

I saw the issue #685, and so I took a stab at it based on the discussion to see if I could help move things forward.

I commented out the debug print statements instead of deleting them so that you don't have to re-create them when you are debugging (and an official decision on what to do with them has not been reached as far as I know). However, that presents the danger of being re-introduced by accident when committing changes after debugging.

Using this example:

from itertools import chain, repeat
import fiona
def test_to_file():
    with fiona.Env(),fiona.open("../tests/data/coutwildrnp.shp") as collection:
        features = chain.from_iterable(repeat(list(collection), 2000))
        with fiona.open("/tmp/out.gpkg", "w", schema=collection.schema, crs=collection.crs, driver="GPKG") as dst:
            dst.writerecords(features)

And timing it:

%%timeit
test_to_file()

Before:
1min 3s ± 2.32 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
After:
15.3 s ± 1.13 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

8 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling 1a80b73 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@coveralls
Copy link

coveralls commented Feb 3, 2019

Coverage Status

Coverage increased (+0.003%) to 82.629% when pulling ee57081 on snowman2:issue_685 into cd5da17 on Toblerity:master.

@snowman2
Copy link
Contributor Author

snowman2 commented Feb 3, 2019

Looks like the GDAL trunk is using PROJ.6 now: configure: error: PROJ 6 symbols not found

@sgillies sgillies self-assigned this Feb 8, 2019
@sgillies
Copy link
Member

sgillies commented Feb 8, 2019

@snowman2 wow, great! I think I'll be able to have some review time tomorrow.

@snowman2
Copy link
Contributor Author

snowman2 commented Feb 8, 2019

Another idea I had recently was to change:

for key, value in feature['properties'].items():

To instead iterate over:

collection.schema['properties']

And set the property to use:

set_field_null(cogr_feature, i)

If the property does not exist in feature['properties'].

This would eliminate the need for this check:

if set(record['properties'].keys()) != schema_props_keys:

But, I wanted to run the idea by you first.

@@ -283,7 +283,7 @@ cdef class OGRGeomBuilder:
coordinates = geometry.get('geometries')
return self._buildGeometryCollection(coordinates)
else:
raise ValueError("Unsupported geometry type %s" % typename)
raise UnsupportedGeometryTypeError("Unsupported geometry type %s" % typename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

log.warning("Failed to encode %s using %s codec", key, encoding)
key_bytes = ogr_key
# log.debug("Normalizing schema type for key %r in schema %r to %r", key, collection.schema['properties'], schema_type)
key_bytes = strencode(ogr_key, encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

fiona/ogrext.pyx Outdated Show resolved Hide resolved
Copy link
Member

@sgillies sgillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a suggestion inline

@sgillies
Copy link
Member

sgillies commented Feb 8, 2019

@snowman2 what would you think about holding off on the schema/property checks for now? I think it's separable from elimination of the debug statements, which is a big win by itself.

I'm inclined to remove the debug statements that you've commented. I put them in when building the original feature and I don't think they are necessarily needed anymore.

@snowman2
Copy link
Contributor Author

snowman2 commented Feb 9, 2019

I'm inclined to remove the debug statements that you've commented.

I removed those in a separate commit.

what would you think about holding off on the schema/property checks for now?

I checked the difference between checking the properties and the alternative method and I was surprised that it didn't make a significant difference. So, I agree that it is an unnecessary change. It makes me wonder if it would be a good idea to do a check and remove all unnecessary log.debug/log.info statements in rasterio & fiona for performance reasons?

@sgillies
Copy link
Member

@snorfalorpagus I tagged you for review of this, too.

@sgillies sgillies changed the title comment out logging and move validation to improve performance issue#685 remove per-feature logging to improve performance issue#685 Feb 15, 2019
@sgillies
Copy link
Member

I edited the PR title to reflect our decision to take a one thing at a time approach to optimization.

Copy link
Member

@snorfalorpagus snorfalorpagus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@sgillies sgillies merged commit eef8b9f into Toblerity:master Feb 18, 2019
@sgillies sgillies added this to the 1.8.5 milestone Feb 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants