Fixes for records() speedup #67

Closed · wants to merge 4 commits
Conversation

karimbahgat (Collaborator)

This fixes the issues and doctest failures introduced by the records() speedup in #62; see issue #66.

The problem was that I assumed all row values were unpacked with the correct format, but they are actually unpacked as strings and have to be manually parsed into their appropriate types. I also forgot to account for the DeleteFlag field when grouping the flat row list into rows.

To fix these issues, this PR moves the row value type parsing into a separate method and calls it where necessary.
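
To illustrate both problems, here is a minimal standalone sketch (the field tuples, sample data, and helper name are illustrative, not the PR's actual code): raw DBF values arrive as strings and must be converted per field type, and each stored row carries one extra deletion-flag value that must be skipped when grouping:

    def parse_value(field_type, raw):
        """Convert a raw DBF string value into a Python type (sketch)."""
        raw = raw.strip()
        if field_type in ("N", "F"):        # numeric/float fields
            if not raw:
                return None
            return float(raw) if "." in raw else int(raw)
        if field_type == "L":               # logical field
            return raw in ("T", "t", "Y", "y")
        return raw                          # character fields stay as strings

    # Simplified (name, type) descriptors; the flat list as read from the
    # file has one deletion-flag value in front of each row's field values.
    fields = [("NAME", "C"), ("POP", "N")]
    flat = [" ", "Springfield", " 30720", " ", "Shelbyville", " 12650"]

    row_width = len(fields) + 1             # +1 slot for the deletion flag
    rows = [flat[i + 1:i + row_width]       # skip the flag when grouping
            for i in range(0, len(flat), row_width)]
    records = [[parse_value(ftype, v) for (_, ftype), v in zip(fields, row)]
               for row in rows]
    print(records)  # [['Springfield', 30720], ['Shelbyville', 12650]]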

karimbahgat and others added 4 commits August 24, 2016 22:16
The previous records() speedup forgot that row values had to be parsed from
strings into their appropriate field types, so the record parsing was moved
into a separate method that is called where needed.

It also forgot that the delete flag counts as a field when grouping the flat
row list into rows; changed to the correct length.

All doctests pass again after these fixes.
@karimbahgat (Collaborator, Author)

Actually, it appears the 20x speedup came almost entirely from not having to parse the value types. After this fix, the batch records() method shows no noticeable speedup over the original, so I'm not sure the changes to the internal API are worth it.

Closing this. I suggest just changing records() back to its pre-speedup state.
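
A rough way to see why the gain disappears is a standalone micro-benchmark (not code from this PR; absolute times and the ratio will vary by machine) comparing leaving raw values as strings against converting each one:

    import timeit

    raw = [" 30720"] * 100000  # raw numeric strings, as read from a .dbf

    # Leaving values as strings (what the broken speedup effectively did).
    t_raw = timeit.timeit(lambda: [v for v in raw], number=10)

    # Parsing each value into its field type (what correctness requires).
    t_parsed = timeit.timeit(lambda: [int(v) for v in raw], number=10)

    print("raw: %.3fs  parsed: %.3fs  ratio: %.1fx"
          % (t_raw, t_parsed, t_parsed / t_raw))

The per-value conversion, not the batched file reading, is where the time goes, which matches the observation above.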

@karimbahgat (Collaborator, Author)

I.e., this should restore it to normal:

    def records(self):
        """Returns all records in a dbf file."""
        if not self.numRecords:
            self.__dbfHeader()
        records = []
        f = self.__getFileObj(self.dbf)
        f.seek(self.__dbfHeaderLength())
        for i in range(self.numRecords):
            r = self.__record()
            if r:
                records.append(r)
        return records
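
For context, this is the public call path the revert affects. Reader and records() are pyshp's existing API; the shapefile path here is only a placeholder:

    import shapefile

    sf = shapefile.Reader("shapefiles/blockgroups")  # placeholder path
    recs = sf.records()           # one list of parsed field values per record
    print(len(recs), recs[0])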

@micahcochran (Contributor)

@GeospatialPython
How do you want to handle this? Does someone need to put a PR together to restore the records() function?

@karimbahgat (Collaborator, Author)

I've put together PR #68, which reverts records() to the original; that should do it. All tests pass.
