[UTILs] Adding top-level function for distributed merging of profiles #552
…m/taylorfturner/DataProfiler into feature/distributed_profile_merge
    self.reverse_label_mapping = self.data_labeler.reverse_label_mapping
    num_labels = self.data_labeler.model.num_labels

    # remove PAD from output (reserved zero index)
    if self.data_labeler.model.requires_zero_mapping:
        self.reverse_label_mapping.pop(0, None)
        num_labels -= 1

    self._possible_data_labels = list(self.reverse_label_mapping.values())
    self._possible_data_labels = [  # sort the data_labels based on index
        x
        for _, x in sorted(
            zip(self.reverse_label_mapping.keys(), self._possible_data_labels)
        )
    ]
    self.rank_distribution = dict([(key, 0) for key in self._possible_data_labels])
    self._sum_predictions = np.zeros(num_labels)
Moving these to `_update_predictions` so they are not created in `__init__`.
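A minimal sketch of the lazy-initialization idea described in the comment above: the label bookkeeping is built the first time predictions are folded in, not in the constructor. The class `LazyLabelColumn` and its constructor arguments are hypothetical stand-ins, not DataProfiler's actual column profile; only the `_update_predictions` name and the attribute names come from the diff.

```python
import numpy as np


class LazyLabelColumn:
    """Hypothetical stand-in for a data-labeler column profile.

    Derived label state is created on the first call to _update_predictions
    instead of in __init__, so a freshly constructed column stays lightweight.
    """

    def __init__(self, reverse_label_mapping, requires_zero_mapping=True):
        # Store only the raw inputs; nothing derived is built yet.
        self._raw_mapping = reverse_label_mapping
        self._requires_zero_mapping = requires_zero_mapping
        self._possible_data_labels = None
        self.rank_distribution = None
        self._sum_predictions = None

    def _update_predictions(self, predictions):
        # predictions: array of shape (n_samples, num_labels)
        if self._sum_predictions is None:
            # Copy so popping PAD does not mutate the caller's mapping.
            mapping = dict(self._raw_mapping)
            if self._requires_zero_mapping:
                mapping.pop(0, None)  # remove PAD (reserved zero index)
            # Sort the labels by their index.
            self._possible_data_labels = [mapping[k] for k in sorted(mapping)]
            self.rank_distribution = {k: 0 for k in self._possible_data_labels}
            self._sum_predictions = np.zeros(len(self._possible_data_labels))
        # Accumulate per-label prediction mass across batches.
        self._sum_predictions += np.sum(predictions, axis=0)
```

For example, constructing `LazyLabelColumn({0: "PAD", 1: "INTEGER", 2: "FLOAT"})` leaves `_sum_predictions` as `None` until the first batch of predictions arrives.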
@@ -865,7 +866,7 @@ def update_profile(self, data, sample_size=None, min_true_samples=None):
         if file_type is not None:
             self.file_type = file_type

-    def _remove_data_labelers(self):
+    def _remove_data_labelers(self, replacement_type=BaseDataLabeler()):
Moving the `BaseDataLabeler` instantiation into the function definition itself, as the default value of `replacement_type`.
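One Python detail worth keeping in mind with this change: a default argument expression is evaluated once, when the `def` statement runs, so `replacement_type=BaseDataLabeler()` builds a single instance at import time that every call omitting the argument will share. Whether that matters depends on whether the placeholder is ever mutated. The sketch below contrasts that behavior with the common `None`-sentinel idiom; `Placeholder` and both helper functions are hypothetical and only mimic the shape of `_remove_data_labelers`.

```python
class Placeholder:
    """Hypothetical stand-in for BaseDataLabeler()."""


def remove_labelers_shared(slots, replacement_type=Placeholder()):
    # The default Placeholder() is created once, when `def` executes,
    # so every call that omits replacement_type reuses that same object.
    for key in slots:
        slots[key] = replacement_type
    return slots


def remove_labelers_fresh(slots, replacement_type=None):
    # A None sentinel defers construction to call time: one new object per call.
    if replacement_type is None:
        replacement_type = Placeholder()
    for key in slots:
        slots[key] = replacement_type
    return slots
```

Calling `remove_labelers_shared` twice hands back the very same `Placeholder` object both times, while `remove_labelers_fresh` creates a new one on each call.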
@@ -21,6 +21,7 @@

 from .. import data_readers, dp_logging
 from ..labelers.data_labelers import DataLabeler
+from ..labelers.base_data_labeler import BaseDataLabeler
Adding this import so the code can set `replacement_type` for when `_restore_data_labelers` is called on a profile object.
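A rough sketch of the remove/restore round trip these two methods suggest: heavyweight labelers are swapped for a lightweight placeholder type, and later any slot still holding that placeholder type is filled with a real labeler again. Why the PR does this is an assumption here (plausibly to keep profiles small when they are passed around for merging). All class and method names below are hypothetical and do not reproduce DataProfiler's implementation.

```python
class HeavyLabeler:
    """Hypothetical stand-in for a loaded DataLabeler (model weights, etc.)."""

    def __init__(self):
        self.weights = list(range(1_000))


class PlaceholderLabeler:
    """Hypothetical stand-in for an empty BaseDataLabeler placeholder."""


class ToyLabeledProfile:
    def __init__(self, n_columns):
        # Each column holds a reference to a labeler.
        self.labelers = [HeavyLabeler() for _ in range(n_columns)]

    def remove_labelers(self, replacement_type=None):
        # Swap every labeler for one lightweight placeholder instance.
        if replacement_type is None:
            replacement_type = PlaceholderLabeler()
        self.labelers = [replacement_type for _ in self.labelers]

    def restore_labelers(self, labeler):
        # Put a real labeler back wherever a placeholder was left.
        self.labelers = [
            labeler if isinstance(current, PlaceholderLabeler) else current
            for current in self.labelers
        ]
```

Checking `isinstance(..., PlaceholderLabeler)` in the restore step is what makes the placeholder's type matter, which mirrors why the diff needs `BaseDataLabeler` imported in this module.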
Takes a `list` of profiles --> into a single `Profile` that is the merging of all the profiles, using the `StructuredProfiler` merge operation.
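DataProfiler profiles can be merged with `+` (e.g. `profile3 = profile1 + profile2`), so collapsing a list into one profile amounts to a left fold over that operator. A minimal sketch, assuming each worker has already profiled its own shard of the data: `ToyProfile` is a hypothetical stand-in with a toy `__add__`, and `merge_profiles` is an illustrative helper, not the function this PR adds.

```python
import operator
from functools import reduce


class ToyProfile:
    """Hypothetical stand-in for a StructuredProfiler that supports `+`."""

    def __init__(self, row_count, col_sums):
        self.row_count = row_count
        self.col_sums = list(col_sums)

    def __add__(self, other):
        # Merging two profiles combines their statistics into a new profile.
        return ToyProfile(
            self.row_count + other.row_count,
            [a + b for a, b in zip(self.col_sums, other.col_sums)],
        )


def merge_profiles(profiles):
    """Hypothetical helper: fold a list of profiles into a single profile."""
    if not profiles:
        raise ValueError("need at least one profile to merge")
    # reduce applies `+` left to right: ((p0 + p1) + p2) + ...
    return reduce(operator.add, profiles)
```

For example, merging `[ToyProfile(10, [1, 2]), ToyProfile(5, [3, 4]), ToyProfile(1, [0, 1])]` gives a profile with `row_count == 16` and `col_sums == [4, 7]`. A sequential fold is the simplest choice; since merging is associative for these statistics, the same list could also be reduced pairwise in parallel.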