
[UTILs] Adding top-level function for distributed merging of profiles #552

Merged: 31 commits merged Jul 22, 2022


@taylorfturner commented Jul 21, 2022

  • Adds functions in utils that merge a list of profile objects into a single Profile combining all of them
  • Updates the low-level StructuredProfiler merge code
  • Adds unit tests with a mocked data labeler to avoid expensive TensorFlow (TF) operations
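The PR text does not show the new utility's name or signature, so the following is only an illustration of the general idea. It assumes profile objects can be combined with +, and ToyProfile and merge_profiles_pairwise are hypothetical names. Merging in rounds of adjacent pairs (a tree reduction) keeps the merges within each round independent of one another, which is what makes the work straightforward to spread across workers.

```python
class ToyProfile:
    """Hypothetical stand-in for a profile object that supports merging via +."""

    def __init__(self, count=0, total=0.0):
        self.count = count
        self.total = total

    def __add__(self, other):
        # Merging combines the summary statistics of both profiles
        return ToyProfile(self.count + other.count, self.total + other.total)


def merge_profiles_pairwise(profiles):
    """Hypothetical helper: merge a list of profiles into one via a tree reduction.

    Merges within a round do not depend on each other, so a distributed
    version could hand them to separate workers; here they run sequentially.
    """
    if not profiles:
        raise ValueError("need at least one profile to merge")
    level = list(profiles)
    while len(level) > 1:
        merged = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            merged.append(level[-1])  # an unpaired profile carries over to the next round
        level = merged
    return level[0]


profiles = [ToyProfile(1, 2.0), ToyProfile(2, 3.0), ToyProfile(3, 5.0)]
merged = merge_profiles_pairwise(profiles)
print(merged.count, merged.total)  # 6 10.0
```

For an associative merge, functools.reduce(operator.add, profiles) produces the same result; the pairwise form only changes the order in which the merges happen.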

@taylorfturner added the Bug, Work In Progress, and High Priority labels Jul 21, 2022
@taylorfturner self-assigned this Jul 21, 2022
Comment on lines -49 to -65
self.reverse_label_mapping = self.data_labeler.reverse_label_mapping
num_labels = self.data_labeler.model.num_labels

# remove PAD from output (reserved zero index)
if self.data_labeler.model.requires_zero_mapping:
    self.reverse_label_mapping.pop(0, None)
    num_labels -= 1

self._possible_data_labels = list(self.reverse_label_mapping.values())
self._possible_data_labels = [  # sort the data_labels based on index
    x
    for _, x in sorted(
        zip(self.reverse_label_mapping.keys(), self._possible_data_labels)
    )
]
self.rank_distribution = dict([(key, 0) for key in self._possible_data_labels])
self._sum_predictions = np.zeros(num_labels)
Contributor Author


Moved these lines to _update_predictions so these attributes are no longer created in __init__.
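A minimal sketch of the pattern described here, deferring the label bookkeeping from __init__ to the first prediction update. LazyLabelStats is hypothetical, a plain dict stands in for data_labeler.reverse_label_mapping, and the number of labels is assumed to equal the size of that mapping after PAD is removed.

```python
class LazyLabelStats:
    """Hypothetical sketch: label state is built on the first update, not in __init__."""

    def __init__(self, reverse_label_mapping, requires_zero_mapping=True):
        # Stand-ins for data_labeler.reverse_label_mapping and
        # data_labeler.model.requires_zero_mapping
        self._label_source = dict(reverse_label_mapping)
        self._requires_zero_mapping = requires_zero_mapping
        self._possible_data_labels = None
        self.rank_distribution = None
        self._sum_predictions = None

    def _init_label_state(self):
        mapping = dict(self._label_source)
        if self._requires_zero_mapping:
            mapping.pop(0, None)  # remove PAD (reserved zero index)
        # Sort the labels by their index
        self._possible_data_labels = [label for _, label in sorted(mapping.items())]
        self.rank_distribution = {label: 0 for label in self._possible_data_labels}
        self._sum_predictions = [0.0] * len(self._possible_data_labels)

    def _update_predictions(self, predictions):
        # Build the label state lazily, only once predictions actually arrive
        if self._possible_data_labels is None:
            self._init_label_state()
        for row in predictions:
            for i, prob in enumerate(row):
                self._sum_predictions[i] += prob


stats = LazyLabelStats({0: "PAD", 1: "ADDRESS", 2: "BACKGROUND"})
print(stats._possible_data_labels)  # None: nothing built yet
stats._update_predictions([[0.25, 0.75], [0.5, 0.5]])
print(stats._possible_data_labels)  # ['ADDRESS', 'BACKGROUND']
```

In this sketch, constructing the object never touches the label mapping; only the first _update_predictions call does.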

@@ -865,7 +866,7 @@ def update_profile(self, data, sample_size=None, min_true_samples=None):
         if file_type is not None:
             self.file_type = file_type
 
-    def _remove_data_labelers(self):
+    def _remove_data_labelers(self, replacement_type=BaseDataLabeler()):
Contributor Author


Moved the BaseDataLabeler instantiation into the function definitions themselves (as the default value of replacement_type).
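One Python detail is relevant to this change: a default argument expression such as BaseDataLabeler() is evaluated once, when the def statement runs, so every call that omits replacement_type receives the same instance. The sketch below demonstrates this with a hypothetical PlaceholderLabeler that counts its instantiations; the None-default variant shows the common idiom for when a fresh object per call is wanted.

```python
class PlaceholderLabeler:
    """Hypothetical stand-in for BaseDataLabeler that counts its instantiations."""

    created = 0

    def __init__(self):
        PlaceholderLabeler.created += 1


def remove_labelers_shared(profiles, replacement_type=PlaceholderLabeler()):
    # The default instance was created once, when this def statement executed
    for profile in profiles:
        profile["data_labeler"] = replacement_type
    return profiles


def remove_labelers_fresh(profiles, replacement_type=None):
    # A None sentinel defers construction to call time: one new instance per call
    if replacement_type is None:
        replacement_type = PlaceholderLabeler()
    for profile in profiles:
        profile["data_labeler"] = replacement_type
    return profiles


first = remove_labelers_shared([{}])
second = remove_labelers_shared([{}])
print(first[0]["data_labeler"] is second[0]["data_labeler"])  # True
print(PlaceholderLabeler.created)  # 1
```

Sharing one instance is harmless when the placeholder carries no per-profile state; the sentinel form matters only if callers mutate the replacement object.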

@@ -21,6 +21,7 @@
 
 from .. import data_readers, dp_logging
 from ..labelers.data_labelers import DataLabeler
+from ..labelers.base_data_labeler import BaseDataLabeler
Contributor Author


Added this import so the code can set replacement_type when _restore_data_labelers is called on a profile object.
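The comment does not spell out the surrounding flow, so the following is only a sketch under an assumption: that removing data labelers lets a profile be serialized (for example, to hand it to another process during a distributed merge) without its expensive labeler, and that restoring puts the original labeler back afterwards. SketchProfile, HeavyLabeler, and LabelerPlaceholder are hypothetical, and the method bodies are illustrative rather than the library's implementation.

```python
import pickle


class HeavyLabeler:
    """Hypothetical expensive labeler (e.g. wrapping a model) that should not be pickled."""

    def __reduce__(self):
        raise TypeError("HeavyLabeler should not be pickled")


class LabelerPlaceholder:
    """Hypothetical lightweight object filling the role of replacement_type."""


class SketchProfile:
    def __init__(self):
        self.data_labeler = HeavyLabeler()

    def _remove_data_labelers(self, replacement_type=None):
        # Swap the heavy labeler for a placeholder and hand the original back
        removed = self.data_labeler
        if replacement_type is None:
            replacement_type = LabelerPlaceholder()
        self.data_labeler = replacement_type
        return removed

    def _restore_data_labelers(self, data_labeler):
        # Put the original labeler back once serialization is done
        self.data_labeler = data_labeler


profile = SketchProfile()
labeler = profile._remove_data_labelers()
payload = pickle.dumps(profile)  # succeeds: only the placeholder is pickled
profile._restore_data_labelers(labeler)

restored = pickle.loads(payload)
print(type(restored.data_labeler).__name__)  # LabelerPlaceholder
print(profile.data_labeler is labeler)  # True
```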

@taylorfturner removed the Work In Progress label Jul 22, 2022
@JGSweets enabled auto-merge (squash) July 22, 2022 16:48
@JGSweets merged commit e1bf2d2 into capitalone:main Jul 22, 2022
Labels: Bug, High Priority

3 participants