Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add similarity_search.py in machine_learning #3864

Merged
merged 8 commits into from
Nov 13, 2020

Conversation

SteveKimSR
Copy link
Contributor

@SteveKimSR SteveKimSR commented Nov 5, 2020

adding similarity_search algorithm in machine_learning

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms have a URL in its comments that points to Wikipedia or other similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

adding similarity_search algorithm in machine_learning
@cclauss
Copy link
Member

cclauss commented Nov 5, 2020

Please format your code with psf/black as discussed in CONTRIBUTING.md.

@mrmaxguns
Copy link
Member

  1. Isort (import sorting) failed. Make sure to install isort pip install isort and run it.
  2. Codespell failed. Change datas ==> data

machine_learning/similarity_search.py Outdated Show resolved Hide resolved
return None


def similarity_search(dataset: np, value: np) -> list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These type hints seem off. Do the arguments dataset and value require the numpy module type? This should probably changed to:

def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
    ...

isort, codespell changed.
applied feedback(np -> np.ndarray)
add type hints to euclidean method
machine_learning/similarity_search.py Outdated Show resolved Hide resolved
machine_learning/similarity_search.py Outdated Show resolved Hide resolved
machine_learning/similarity_search.py Outdated Show resolved Hide resolved
machine_learning/similarity_search.py Outdated Show resolved Hide resolved
Comment on lines 75 to 78
"Wrong input data's shape... dataset : ",
dataset.shape[1],
", value : ",
value.shape[1],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

machine_learning/similarity_search.py Outdated Show resolved Hide resolved
Comment on lines 86 to 89
"Input data have different datatype... dataset : ",
dataset.dtype,
", value : ",
value.dtype,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-string

machine_learning/similarity_search.py Outdated Show resolved Hide resolved
machine_learning/similarity_search.py Outdated Show resolved Hide resolved
- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation
Copy link
Member

@cclauss cclauss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use iteration of keys, values, and items more and use indexes less.

dist = 0

try:
for index, v in enumerate(input_a):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be zip() ing these lists together.

raise TypeError("Euclidean's input types are not right ...")


def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:

This is not a single value but an array of values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed!

import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def euclidean(input_a: np.ndarray, input_b: np.ndarray):
def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:

machine_learning/similarity_search.py Show resolved Hide resolved
Comment on lines 49 to 51
>>> a = np.array([[0], [1], [2]])
>>> b = np.array([[0]])
>>> similarity_search(a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> a = np.array([[0], [1], [2]])
>>> b = np.array([[0]])
>>> similarity_search(a, b)
>>> dataset = np.array([[0], [1], [2]])
>>> value_array = np.array([[0]])
>>> similarity_search(dataset, value_array)

Repeat for other these below...

Please add tests that raise errors.

Comment on lines 94 to 103
for index, v in enumerate(value):
dist = euclidean(value[index], dataset[0])
vector = dataset[0].tolist()

for index2 in range(1, len(dataset)):
temp_dist = euclidean(value[index], dataset[index2])

if dist > temp_dist:
dist = temp_dist
vector = dataset[index2].tolist()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for index, v in enumerate(value):
dist = euclidean(value[index], dataset[0])
vector = dataset[0].tolist()
for index2 in range(1, len(dataset)):
temp_dist = euclidean(value[index], dataset[index2])
if dist > temp_dist:
dist = temp_dist
vector = dataset[index2].tolist()
for value in value_array.values():
dist = euclidean(value, dataset[0])
vector = dataset[0].tolist()
for dataset_value in dataset[1:].values():
temp_dist = euclidean(value, dataset_value)
if dist > temp_dist:
dist = temp_dist
vector = dataset_value.tolist()

@cclauss
Copy link
Member

cclauss commented Nov 13, 2020

Please add some tests that raise errors like https://github.com/TheAlgorithms/Python/blob/master/arithmetic_analysis/bisection.py does and then I think we are ready to merge this one.

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)
@SteveKimSR
Copy link
Contributor Author

@cclauss When adding error examples, one of the examples couldn't pass flake8. Is there any ways to avoid this?
(line 91, TypeError: Input data have different datatype... dataset : float32, value_array : int32)
Or should i change error outputs??

Copy link
Member

@cclauss cclauss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!!

@cclauss cclauss merged commit ae4d7d4 into TheAlgorithms:master Nov 13, 2020
stokhos pushed a commit to stokhos/Python that referenced this pull request Jan 3, 2021
* add similarity_search.py in machine_learning
adding similarity_search algorithm in machine_learning

* fix pre-commit test, apply feedback

isort, codespell changed.
applied feedback(np -> np.ndarray)

* apply feedback

add type hints to euclidean method

* apply feedback

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

* apply feedback

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

* # doctest: +NORMALIZE_WHITESPACE

* Update machine_learning/similarity_search.py

* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>
peRFectBeliever pushed a commit to peRFectBeliever/Python that referenced this pull request Apr 1, 2021
* add similarity_search.py in machine_learning
adding similarity_search algorithm in machine_learning

* fix pre-commit test, apply feedback

isort, codespell changed.
applied feedback(np -> np.ndarray)

* apply feedback

add type hints to euclidean method

* apply feedback

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

* apply feedback

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

* # doctest: +NORMALIZE_WHITESPACE

* Update machine_learning/similarity_search.py

* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>
Panquesito7 pushed a commit to Panquesito7/Python that referenced this pull request May 13, 2021
* add similarity_search.py in machine_learning
adding similarity_search algorithm in machine_learning

* fix pre-commit test, apply feedback

isort, codespell changed.
applied feedback(np -> np.ndarray)

* apply feedback

add type hints to euclidean method

* apply feedback

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

* apply feedback

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

* # doctest: +NORMALIZE_WHITESPACE

* Update machine_learning/similarity_search.py

* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>
shermanhui pushed a commit to shermanhui/Python that referenced this pull request Oct 22, 2021
* add similarity_search.py in machine_learning
adding similarity_search algorithm in machine_learning

* fix pre-commit test, apply feedback

isort, codespell changed.
applied feedback(np -> np.ndarray)

* apply feedback

add type hints to euclidean method

* apply feedback

- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation

* apply feedback

- deleted try/catch in euclidean
- added error tests
- name change(value -> value_array)

* # doctest: +NORMALIZE_WHITESPACE

* Update machine_learning/similarity_search.py

* placate flake8

Co-authored-by: Christian Clauss <cclauss@me.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants