add similarity_search.py in machine_learning #3864

SteveKimSR · 2020-11-05T09:53:48Z

adding similarity_search algorithm in machine_learning

Describe your change:

Add an algorithm?
Fix a bug or typo in an existing algorithm?
Documentation change?

Checklist:

adding similarity_search algorithm in machine_learning

cclauss · 2020-11-05T14:21:16Z

Please format your code with psf/black as discussed in CONTRIBUTING.md.

mrmaxguns · 2020-11-05T14:43:19Z

isort (import sorting) failed. Make sure to install isort (pip install isort) and run it.
codespell failed. Change datas ==> data.

mrmaxguns · 2020-11-05T15:26:35Z

machine_learning/similarity_search.py

+    return None
+
+
+def similarity_search(dataset: np, value: np) -> list:


These type hints seem off. Do the arguments dataset and value really take the numpy module itself as their type? This should probably be changed to:

def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list: ...
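To illustrate the point, here is a minimal sketch (not part of the PR): np is the imported module object, while np.ndarray is the array class, so only the latter describes what the function receives. The name similarity_search_stub is hypothetical and only carries the corrected signature.

```python
import types

import numpy as np

# `np` is a module, so a hint of `np` would mean "expects the numpy module".
assert isinstance(np, types.ModuleType)

# Arrays are instances of np.ndarray, which is the type the hint should name.
sample = np.array([[0], [1], [2]])
assert isinstance(sample, np.ndarray)


def similarity_search_stub(dataset: np.ndarray, value: np.ndarray) -> list:
    """Hypothetical stub showing only the corrected signature."""
    return []
```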

isort, codespell changed. applied feedback(np -> np.ndarray)

add type hints to euclidean method

cclauss · 2020-11-06T07:22:33Z

machine_learning/similarity_search.py

+                "Wrong input data's shape... dataset : ",
+                dataset.shape[1],
+                ", value : ",
+                value.shape[1],


cclauss · 2020-11-06T07:23:51Z

machine_learning/similarity_search.py

+            "Input data have different datatype... dataset : ",
+            dataset.dtype,
+            ", value : ",
+            value.dtype,


- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation
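As a rough illustration of why the f-string change matters: when an exception is given several comma-separated arguments, they are stored as a tuple and the message prints as that tuple, whereas an f-string produces one readable sentence. The sketch below is an assumption-laden example, not the merged code; check_shapes is a hypothetical helper and the message text is adapted from the snippets quoted in this review.

```python
import numpy as np


def check_shapes(dataset: np.ndarray, value: np.ndarray) -> None:
    """Hypothetical helper: raise ValueError when the column counts differ."""
    if dataset.shape[1] != value.shape[1]:
        raise ValueError(
            "Wrong input data's shape... "
            f"dataset : {dataset.shape[1]}, value : {value.shape[1]}"
        )


# Comma-separated arguments are stored as a tuple, so str() shows the tuple.
tuple_style = ValueError("Wrong input data's shape... dataset : ", 3, ", value : ", 2)
print(tuple_style)

try:
    check_shapes(np.zeros((2, 3)), np.zeros((1, 2)))
except ValueError as error:
    f_string_message = str(error)
    print(f_string_message)
```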

cclauss

Use iteration of keys, values, and items more and use indexes less.

cclauss · 2020-11-11T07:32:23Z

machine_learning/similarity_search.py

+    dist = 0
+
+    try:
+        for index, v in enumerate(input_a):


We should be zip()ing these lists together.
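For reference, a minimal sketch of what pairing the two inputs with zip() could look like. This is an illustration under assumptions, not the code that was merged: the body of euclidean is not shown in this thread, so the formula below is the standard Euclidean distance (square root of the summed squared differences) rather than a quote from the PR.

```python
import math

import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Euclidean distance between two 1-D vectors, pairing elements with zip()."""
    # zip() yields (a, b) pairs, so no index bookkeeping is needed.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(input_a, input_b)))


distance = euclidean(np.array([0, 0]), np.array([3, 4]))
```

Note that zip() stops at the shorter input, so vectors of different lengths would be silently truncated; a shape check before calling the helper is still needed.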

cclauss · 2020-11-11T08:10:32Z

machine_learning/similarity_search.py

+        raise TypeError("Euclidean's input types are not right ...")
+
+
+def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:


Suggested change

-def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
+def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:

This is not a single value but an array of values.

cclauss · 2020-11-11T08:14:54Z

machine_learning/similarity_search.py

+import numpy as np
+
+
+def euclidean(input_a: np.ndarray, input_b: np.ndarray):


Suggested change

-def euclidean(input_a: np.ndarray, input_b: np.ndarray):
+def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:

cclauss · 2020-11-11T08:20:51Z

machine_learning/similarity_search.py

+    >>> a = np.array([[0], [1], [2]])
+    >>> b = np.array([[0]])
+    >>> similarity_search(a, b)


Suggested change

-    >>> a = np.array([[0], [1], [2]])
-    >>> b = np.array([[0]])
-    >>> similarity_search(a, b)
+    >>> dataset = np.array([[0], [1], [2]])
+    >>> value_array = np.array([[0]])
+    >>> similarity_search(dataset, value_array)

Repeat for the other examples below...

Please add tests that raise errors.

cclauss · 2020-11-11T08:26:43Z

machine_learning/similarity_search.py

+    for index, v in enumerate(value):
+        dist = euclidean(value[index], dataset[0])
+        vector = dataset[0].tolist()
+
+        for index2 in range(1, len(dataset)):
+            temp_dist = euclidean(value[index], dataset[index2])
+
+            if dist > temp_dist:
+                dist = temp_dist
+                vector = dataset[index2].tolist()


Suggested change

-    for index, v in enumerate(value):
-        dist = euclidean(value[index], dataset[0])
-        vector = dataset[0].tolist()
-
-        for index2 in range(1, len(dataset)):
-            temp_dist = euclidean(value[index], dataset[index2])
-
-            if dist > temp_dist:
-                dist = temp_dist
-                vector = dataset[index2].tolist()
+    for value in value_array.values():
+        dist = euclidean(value, dataset[0])
+        vector = dataset[0].tolist()
+
+        for dataset_value in dataset[1:].values():
+            temp_dist = euclidean(value, dataset_value)
+
+            if dist > temp_dist:
+                dist = temp_dist
+                vector = dataset_value.tolist()
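One caveat with the suggestion as written: NumPy arrays do not provide a .values() method (that is a dict and pandas idiom), so calling it on an np.ndarray would raise AttributeError. Iterating over a 2-D array directly yields its rows, which gives the same index-free loop. Below is a minimal sketch under that assumption. The [vector, distance] return format and the np.linalg.norm-based euclidean stand-in are assumptions for illustration, not taken from the diff shown in this thread.

```python
import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Stand-in distance helper (assumption): L2 norm of the difference."""
    return float(np.linalg.norm(input_a - input_b))


def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:
    """Return [nearest_vector, distance] for each row of value_array."""
    answer = []
    # Iterating over a 2-D array yields one row per step; no .values() needed.
    for value in value_array:
        dist = euclidean(value, dataset[0])
        vector = dataset[0].tolist()

        for dataset_value in dataset[1:]:
            temp_dist = euclidean(value, dataset_value)

            if dist > temp_dist:
                dist = temp_dist
                vector = dataset_value.tolist()

        answer.append([vector, dist])
    return answer


dataset = np.array([[0], [1], [2]])
value_array = np.array([[0]])
result = similarity_search(dataset, value_array)
```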

cclauss · 2020-11-13T06:22:54Z

Please add some tests that raise errors like https://github.com/TheAlgorithms/Python/blob/master/arithmetic_analysis/bisection.py does and then I think we are ready to merge this one.
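A doctest that expects an exception lists the traceback header, an elided stack, and the final exception line. A minimal sketch of that shape, assuming a hypothetical check_dimensions helper and a made-up message (neither is quoted from the PR):

```python
import doctest

import numpy as np


def check_dimensions(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """
    Hypothetical validation helper with a doctest that expects an exception.

    >>> check_dimensions(np.array([[0, 0], [1, 1]]), np.array([1]))
    Traceback (most recent call last):
        ...
    ValueError: Wrong input data's dimensions... dataset : 2, value_array : 1
    """
    if dataset.ndim != value_array.ndim:
        raise ValueError(
            "Wrong input data's dimensions... "
            f"dataset : {dataset.ndim}, value_array : {value_array.ndim}"
        )
```

Saved to a file, such a docstring can be checked with python -m doctest on that file; doctest ignores the stack portion and compares only the exception type and message.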

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

SteveKimSR · 2020-11-13T14:04:51Z

@cclauss When adding error examples, one of the examples couldn't pass flake8. Is there any way to avoid this?
(line 91, TypeError: Input data have different datatype... dataset : float32, value_array : int32)
Or should I change the error outputs?
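The follow-up commits in this thread are titled "# doctest: +NORMALIZE_WHITESPACE" and "placate flake8", which points at one way to handle this: with that doctest directive, runs of whitespace (including newlines) compare as equal, so a long expected exception message can be wrapped over two lines in the docstring. A minimal sketch under that reading; check_dtypes is a hypothetical helper and the exact code merged in the PR may differ.

```python
import doctest

import numpy as np


def check_dtypes(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """
    Hypothetical helper: raise TypeError when the dtypes differ.

    >>> dataset = np.array([[0, 1], [1, 2]], dtype=np.float32)
    >>> value_array = np.array([[1, 2]], dtype=np.int32)
    >>> check_dtypes(dataset, value_array)  # doctest: +NORMALIZE_WHITESPACE
    Traceback (most recent call last):
        ...
    TypeError: Input data have different datatype...
    dataset : float32, value_array : int32
    """
    if dataset.dtype != value_array.dtype:
        # The message is a single line; only the docstring wraps it.
        msg = (
            "Input data have different datatype... "
            f"dataset : {dataset.dtype}, value_array : {value_array.dtype}"
        )
        raise TypeError(msg)
```

The raised message itself stays on one line; only the expected output in the docstring is wrapped, which keeps every source line within the length limit.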

cclauss

Nice!!!

* add similarity_search.py in machine_learning
  adding similarity_search algorithm in machine_learning
* fix pre-commit test, apply feedback
  isort, codespell changed. applied feedback(np -> np.ndarray)
* apply feedback
  add type hints to euclidean method
* apply feedback
  - changed euclidean's type hints
  - changed few TypeError to ValueError
  - changed range(len()) to enumerate()
  - changed error's strings to f-string
  - implemented without type()
  - add euclidean's explanation
* apply feedback
  - deleted try/catch in euclidean
  - added error tests
  - name change(value -> value_array)
* # doctest: +NORMALIZE_WHITESPACE
* Update machine_learning/similarity_search.py
* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>

add similarity_search.py in machine_learning

40a6503

adding similarity_search algorithm in machine_learning

mrmaxguns suggested changes Nov 5, 2020

SteveKimSR added 2 commits November 6, 2020 11:05

fix pre-commit test, apply feedback

09caa3b

isort, codespell changed. applied feedback(np -> np.ndarray)

apply feedback

7ce2cce

add type hints to euclidean method

cclauss requested changes Nov 6, 2020

apply feedback

f38fb3e

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

SteveKimSR requested a review from cclauss November 11, 2020 06:57

cclauss requested changes Nov 11, 2020

apply feedback

ebfe05a

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

cclauss reviewed Nov 13, 2020

cclauss approved these changes Nov 13, 2020

cclauss added 3 commits November 13, 2020 15:17

# doctest: +NORMALIZE_WHITESPACE

2d2c6b8

Update machine_learning/similarity_search.py

9637de7

placate flake8

6f7c9ce

cclauss merged commit ae4d7d4 into TheAlgorithms:master Nov 13, 2020
