Distance analyzer for detecting feature drift with PyDeequ #164

Description

@thuber

Is your feature request related to a problem? Please describe.
It looks like PyDeequ 1.1.1 does not expose the Deequ distance analyzer, so this type of analysis cannot be run via PyDeequ.

Describe the solution you'd like
I'd like to run the distance analysis on my data so that I can detect feature drift with the two methods currently available in Deequ (L-infinity and chi-squared). Ideally it would work the same way as the other analyzers.

Are there any plans to add this analyzer in the future?
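
To make the request concrete, here is a minimal pure-Python sketch of the kind of L-infinity check I have in mind for a categorical feature: the largest absolute difference in relative frequency across all categories. This is only an illustration of the idea, not Deequ's implementation; the function name `linf_categorical_distance` and the sample data are hypothetical.

```python
from collections import Counter


def linf_categorical_distance(baseline, current):
    """Largest absolute difference in relative frequency over all categories.

    Hypothetical helper for illustration only; not part of Deequ or PyDeequ.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    base_counts = Counter(baseline)
    curr_counts = Counter(current)
    n_base = len(baseline)
    n_curr = len(current)

    # A category missing from one sample counts as frequency 0 there
    # (Counter returns 0 for unseen keys).
    categories = set(base_counts) | set(curr_counts)
    return max(
        abs(base_counts[c] / n_base - curr_counts[c] / n_curr)
        for c in categories
    )


# Made-up example data: yesterday's vs. today's values of one feature.
baseline_sample = ["a", "a", "b", "b"]
current_sample = ["a", "b", "b", "b"]
drift = linf_categorical_distance(baseline_sample, current_sample)
```

A drift check would then compare `drift` against a threshold chosen for the feature, much like a constraint on any other metric.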

Describe alternatives you've considered
I've read the documentation, but couldn't find anything related to the distance analyzer.

As a hacky workaround, I experimented with invoking the numericalDistance() method directly through Py4J, but instantiating a Scala QuantileNonSample[Double] object from Python turned out not to be straightforward.
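
Another stopgap that avoids Py4J entirely is to compute an L-infinity distance for a numerical feature on the Python side, as the largest gap between the two empirical CDFs (the two-sample Kolmogorov-Smirnov statistic). The sketch below works on exact in-memory samples, so it is an assumption-laden stand-in rather than a reproduction of what numericalDistance() does with QuantileNonSample; the function name `linf_numerical_distance` is hypothetical, and it only makes sense for data small enough to collect to the driver.

```python
from bisect import bisect_right


def linf_numerical_distance(baseline, current):
    """Largest absolute gap between the empirical CDFs of two samples.

    Hypothetical helper for illustration only; computed on exact values,
    not on quantile sketches.
    """
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")

    base_sorted = sorted(baseline)
    curr_sorted = sorted(current)
    n_base = len(base_sorted)
    n_curr = len(curr_sorted)

    # The empirical CDFs are step functions that only change at observed
    # values, so it is enough to evaluate the gap at those points.
    points = set(base_sorted) | set(curr_sorted)
    return max(
        abs(
            bisect_right(base_sorted, x) / n_base
            - bisect_right(curr_sorted, x) / n_curr
        )
        for x in points
    )


# Made-up example data for one numerical feature.
distance = linf_numerical_distance([1, 2, 3, 4], [3, 4, 5, 6])
```

This gives a value between 0 (identical empirical distributions) and 1 (no overlap), but it does not scale the way a sketch-based analyzer inside Deequ would, which is why native support in PyDeequ would still be preferable.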
