This repository has been archived by the owner on Apr 11, 2024. It is now read-only.

Commit

Change preprocessing of DeepImage
There are some duplicate vectors, so we now keep only one copy of
each in the new dataset.
Cecca committed May 26, 2023
1 parent 954c8d9 commit f064bdd
Showing 1 changed file (join-experiments/run.py) with 16 additions and 0 deletions.
@@ -1031,6 +1031,22 @@ def deep_image(out_fn):
fv = np.fromfile(filename, dtype=np.float32)
dim = fv.view(np.int32)[0]
fv = fv.reshape(-1, dim + 1)[:, 1:]

# Normalize
print("Normalizing")
fv = sklearn.preprocessing.normalize(fv, axis=1, norm='l2')
# Now remove duplicates: there are 4867 of them,
# which make the top-k global case trivial
#
# First, sort the rows lexicographically
print("Sorting rows")
perm = np.lexsort(fv.T[::-1])
fv = fv[perm]
# Then keep only the distinct rows
print("Removing duplicates")
fv = np.unique(fv, axis=0)

print("Writing output")
write_dense(out_fn, fv)
return out_fn
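A minimal self-contained sketch of the same preprocessing pipeline, not taken from the repository: the `preprocess` helper and the sample data below are hypothetical, NumPy's `linalg.norm` stands in for `sklearn.preprocessing.normalize`, and the binary layout is assumed to be the usual `.fvecs` one (each row stored as an int32 dimension followed by `dim` float32 components). Note that `np.unique(..., axis=0)` already sorts rows lexicographically, so it subsumes the explicit `lexsort` pass in the diff.

```python
import numpy as np

def preprocess(raw):
    # Hypothetical helper sketching the diff's pipeline on raw .fvecs bytes.
    # Reinterpret the buffer as float32; the first 4 bytes, viewed as int32,
    # give the per-row dimension (copy() because frombuffer is read-only).
    fv = np.frombuffer(raw, dtype=np.float32).copy()
    dim = fv.view(np.int32)[0]
    # Strip the leading dimension column from every row.
    fv = fv.reshape(-1, dim + 1)[:, 1:]
    # L2-normalize each row (what normalize(fv, axis=1, norm='l2') does).
    fv /= np.linalg.norm(fv, axis=1, keepdims=True)
    # Sort rows lexicographically and keep one copy of each distinct row.
    return np.unique(fv, axis=0)

# Synthetic data: [3, 4] and [6, 8] normalize to the same unit vector,
# so only one copy should survive deduplication.
rows = np.array([[3.0, 4.0], [6.0, 8.0], [1.0, 0.0]], dtype=np.float32)
dim = np.int32(2)
raw = b"".join(dim.tobytes() + row.tobytes() for row in rows)

vecs = preprocess(raw)
# vecs holds two unit-norm rows: [0.6, 0.8] and [1.0, 0.0].
```

Deduplicating *after* normalization matters here: vectors that are scalar multiples of each other collapse onto the same unit vector, and it is exactly those ties that would make the global top-k join trivial.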

