Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graph Profiler: save() and load() Functionality #628

Merged
merged 6 commits into from
Sep 14, 2022

Conversation

micdavis
Copy link
Contributor

  • Added functionality for save() and load() for the GraphProfiler
  • Wrote a test to cover this new functionality

with open(filepath, "rb") as infile:
data = pickle.load(infile)

profiler = cls(None, options=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just any class? should there not be some logic around loading the correct class for the profiler?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls is the class it is loading within, this is same logic in BaseProfiler which will allow us to merge all 3 profilers in the long term.


import networkx as nx
import numpy as np
import pandas as pd
import pickle
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pickle is built-in so I think should be within the top imports

"_continuous_distribution": self._continuous_distribution,
"_categorical_distribution": self._categorical_distribution,
"metadata": self.metadata,
"__calculations": self.__calculations,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need __calculations bc we aren't saving options, is that the reason?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see options used anywhere...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, we don't

Comment on lines 152 to 153
save_profile.profile["continuous_distribution"]["weight"].pop("properties")
load_profile.profile["continuous_distribution"]["weight"].pop("properties")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also assert these props are the same.
e.g. above:

save_continuous_distribution_props = profile.profile["continuous_distribution"]["weight"].pop("properties")

# Then can  (unsure if it is a dict or list or what format.
self.assertEqual(save_continuous_distribution_props, load_continuous_distribution_props)

@taylorfturner taylorfturner added the High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable label Sep 14, 2022
@@ -2,17 +2,17 @@
[Great Expectations](https://greatexpectations.io/) is an open source tool built around the idea that should always know what to expect from your data. It helps data teams eliminate pipeline debt, through data testing, documentation, and profiling.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit formatting

@@ -183,4 +183,4 @@
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit formatting

@@ -291,4 +291,4 @@
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit formatting

@@ -300,4 +300,4 @@
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit formatting

@@ -300,4 +300,4 @@
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-commit formatting

@@ -23,7 +25,6 @@ def __init__(self, data, options=None):
Initialize Graph Profiler.

:param data: data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param data: data
:param data: data
:type name: String

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be: one of these two types:
data_readers.graph_data.GraphData or nx.Graph

taylorfturner
taylorfturner previously approved these changes Sep 14, 2022
@@ -23,7 +25,6 @@ def __init__(self, data, options=None):
Initialize Graph Profiler.

:param data: data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be: one of these two types:
data_readers.graph_data.GraphData or nx.Graph

@taylorfturner taylorfturner merged commit 9998919 into capitalone:main Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
High Priority Dramatic improvement, inaccurate calculation(s) or bug / feature making the library unusable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants