Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make map conversions more efficient. #188

Closed
wants to merge 3 commits into from

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Apr 13, 2022

Changes

  • Make convert_mapping_values work more efficiently (for a dict of length 1000, runtime drops from 471us to 294us on my machine)

Notes

Since the numpy dtypes are C functions, pushing the loop into C (by using map) will be far more efficient.

There was a type mismatch between the type of mapping and the return of the function. Updated the type of mapping to be MutableMapping.

@hellkite500
Copy link
Member

@groutr just a quick glance through the test logs:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

mapping = true_positive     1
false_positive    2
false_negative    3
true_negative     4
dtype: object
converter = <class 'numpy.float64'>

    def convert_mapping_values(
        mapping: MutableMapping[str, npt.DTypeLike],
        converter: np.dtype = np.float64
        ) -> MutableMapping:
        """Convert mapping values to a consistent type. Primarily used to validate
        contingency tables.
    
        Parameters
        ----------
        mapping: dict-like, required
            Input mapping with string keys and values that can be coerced into a
            numpy data type.
        converter: numpy.dtype, optional, default numpy.float64
            Converter data type or function used to convert mapping values to a
            consistent type.
    
        Returns
        -------
        converted_mapping: dict-like, same type as mapping
            New mapping with converted values.
    
        """
        # Build object of same type of mapping
        d = type(mapping)()
>       d.update(zip(mapping.keys(), np.fromiter(mapping.values(), dtype=converter)))
E       TypeError: 'numpy.ndarray' object is not callable

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/hydrotools/metrics/metrics.py:262: TypeError

Seems like a similar failure causes a few of the tests to fail. Thoughts?

@jarq6c
Copy link
Collaborator

jarq6c commented Apr 13, 2022

The issue could be related to assumptions made in the original method. The original (slow) method was meant to work with both Python dict and pandas.Series. For dict values is a Callable. However, for pandas.Series values is a non-callable property.

An additional nuance is that pandas.Series.update takes an object that must be coercible into a pandas.Series. I'm unsure of what Series will do when feeding it zip directly like this.

@jarq6c
Copy link
Collaborator

jarq6c commented Apr 13, 2022

Actually come to think of it, it may be easier (and faster) to offload this task to pandas.

return pd.Series(mapping, dtype=converter)

@groutr
Copy link
Contributor Author

groutr commented Apr 13, 2022

Actually come to think of it, it may be easier (and faster) to offload this task to pandas.

return pd.Series(mapping, dtype=converter)

That was my initial thought, however it seemed like there was effort made to make this general to any mapping type. If a pandas series is assumed, this is going to be the best way.

I was adhering to the type information in the function signature. pd.Series is not a Mapping type (though it sort of mimics the python dictionary behavior)

isinstance(pd.Series(), Mapping)  # False

@jarq6c
Copy link
Collaborator

jarq6c commented Apr 14, 2022

Thus far, this method is only used internally to validate contingency tables. @groutr were you calling this method directly? If not, this might be a good argument for replacing this method with something more strict, but hopefully faster (that only returns pandas.Series).

@groutr
Copy link
Contributor Author

groutr commented Apr 18, 2022

@jarq6c I am not calling the method directly. Replacing it with something more strict seems like a good idea to me.

@jarq6c
Copy link
Collaborator

jarq6c commented May 24, 2022

Superseded by #191

@jarq6c jarq6c closed this May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants