Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature delete mapping #458

Merged
merged 13 commits into from
Nov 4, 2019
Merged

Feature delete mapping #458

merged 13 commits into from
Nov 4, 2019

Conversation

gusmith
Copy link
Contributor

@gusmith gusmith commented Nov 1, 2019

The goal of this PR is to remove the mapping output type from the entity-service. The mapping output type is indeed redundant with the groups output type for 2 parties, however, the output representation is slightly different.

Few notes:

  • the documentation is horrible to update as there are a lot of things out of date.
  • the tutorial python notebooks are not manageable from here: we are mainly testing clkhash
  • most of the docs requires to have the entity-service already available (they point to testing.es.data61.xyz) which makes them impossible to keep up to date in the same PR as the one modifying the service

@gusmith gusmith requested a review from wilko77 November 1, 2019 04:09
def test_run_status_with_clks_2p(requests, mapping_project):
size = mapping_project['size']
ensure_run_progressing(requests, mapping_project, size)
def test_permutations_run_status_with_clks_2p(requests, permutations_project):
Copy link
Contributor Author

@gusmith gusmith Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not have understood the reasons of these tests: it was done only on mapping but not on permutations and similarity scores... so I removed the mapping but add the two others.

@@ -263,7 +263,7 @@ def wait_approx_run_time(size, assumed_rate=1_000_000):
"""
number_comparisons = sum(
size0 * size1 for size0, size1 in itertools.combinations(size, 2))
time.sleep(5)
time.sleep(1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relates to issue #457

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 1 sec? Given that there is another sleep right after it, we might just remove this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wasn't sure why it has been introduced, I reduced it to 1 second and opened an issue. I'll check what happens without this wait (f7601d1), if the tests fail, I'll revert it, otherwise keep it.

docs/future.rst Show resolved Hide resolved
Copy link
Collaborator

@wilko77 wilko77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, that must have been fun...

some minor comments inline.

@@ -113,8 +94,10 @@ def permute_mapping_data(project_id, run_id, len_filters1, len_filters2, parent_
random.shuffle(remaining_new_indexes)
log.info("Assigning random indexes for {} matched entities".format(number_in_common))

for mapping_number, a_index in enumerate(mapping):
b_index = mapping[a_index]
for mapping_number, group in enumerate(groups):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename mapping_number. Maybe group_index or group_number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely spotted. 👍

* `"permutations"` - Creates random permutations and a mask.
* `"similarity_scores"` - Outputs a list of similarity scores of `[indexA, indexB, score]`, where `score`
represents the likelihood that `indexA = indexB`.
* `"groups"` - Outputs a list of groups of records, where each group represents one entity.

Only `"groups"` supports multi-party linkage. `"mapping"`, `"permutations"`, and `"similarity_scores"` only support linkage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 43 and 71 mentioning four different output types. -> change to three.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -263,7 +263,7 @@ def wait_approx_run_time(size, assumed_rate=1_000_000):
"""
number_comparisons = sum(
size0 * size1 for size0, size1 in itertools.combinations(size, 2))
time.sleep(5)
time.sleep(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 1 sec? Given that there is another sleep right after it, we might just remove this one?

@gusmith gusmith merged commit 0974091 into develop Nov 4, 2019
@gusmith gusmith deleted the feature-delete-mapping branch November 4, 2019 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants