add similarity_search.py in machine_learning #3864
adding similarity_search algorithm in machine_learning
Please format your code with psf/black as discussed in CONTRIBUTING.md.
    return None


def similarity_search(dataset: np, value: np) -> list:
These type hints seem off. Do the arguments dataset and value require the numpy module type? This should probably be changed to:
def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
...
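To make the reviewer's point concrete, here is a small sketch of why `np` is the wrong annotation: it names the module, while `np.ndarray` names the array class. The helper `shape_of` is hypothetical and exists only to show the annotation in use.

```python
import types

import numpy as np

# `np` is the imported module object, so annotating with it says nothing
# useful about the argument.
print(isinstance(np, types.ModuleType))

# `np.ndarray` is the class that array arguments are instances of.
print(isinstance(np.array([[0], [1], [2]]), np.ndarray))


def shape_of(dataset: np.ndarray) -> tuple:
    """Hypothetical helper, not part of the PR."""
    return dataset.shape
```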
isort, codespell changed. applied feedback(np -> np.ndarray)
add type hints to euclidean method
        "Wrong input data's shape... dataset : ",
        dataset.shape[1],
        ", value : ",
        value.shape[1],
f-string
Done!
        "Input data have different datatype... dataset : ",
        dataset.dtype,
        ", value : ",
        value.dtype,
f-string
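A minimal sketch of what the f-string request could look like for the shape check. The message wording is taken from the diff above; the helper name `check_shapes` and the parameter name `value_array` are assumptions made for illustration, not the PR's actual code.

```python
import numpy as np


def check_shapes(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """Hypothetical validator: raise if the two inputs differ in column count."""
    if dataset.shape[1] != value_array.shape[1]:
        # One f-string replaces the comma-separated message pieces.
        msg = (
            "Wrong input data's shape... "
            f"dataset : {dataset.shape[1]}, value_array : {value_array.shape[1]}"
        )
        raise ValueError(msg)
```

Building the message as a single string also means the exception carries one argument instead of a tuple of fragments.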
- changed euclidean's type hints
- changed few TypeError to ValueError
- changed range(len()) to enumerate()
- changed error's strings to f-string
- implemented without type()
- add euclidean's explanation
Use iteration of keys, values, and items more and use indexes less.
    dist = 0

    try:
        for index, v in enumerate(input_a):
We should be zip()ing these lists together.
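One way the `zip()` suggestion could be applied is sketched below. It pairs the elements of both inputs directly instead of indexing into them. This is an illustration under the assumption that both inputs are one-dimensional and of equal length; it is not necessarily the version that was merged.

```python
import math

import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Euclidean distance between two equal-length vectors (sketch)."""
    # zip() yields matching elements pairwise, so no index variable is needed.
    return math.sqrt(sum(pow(a - b, 2) for a, b in zip(input_a, input_b)))
```

For example, `euclidean(np.array([0, 0]), np.array([3, 4]))` evaluates the familiar 3-4-5 triangle.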
        raise TypeError("Euclidean's input types are not right ...")


def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
- def similarity_search(dataset: np.ndarray, value: np.ndarray) -> list:
+ def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:
This is not a single value but an array of values.
Changed!
import numpy as np


def euclidean(input_a: np.ndarray, input_b: np.ndarray):
- def euclidean(input_a: np.ndarray, input_b: np.ndarray):
+ def euclidean(input_a: np.ndarray, input_b: np.ndarray) -> float:
    >>> a = np.array([[0], [1], [2]])
    >>> b = np.array([[0]])
    >>> similarity_search(a, b)
- >>> a = np.array([[0], [1], [2]])
- >>> b = np.array([[0]])
- >>> similarity_search(a, b)
+ >>> dataset = np.array([[0], [1], [2]])
+ >>> value_array = np.array([[0]])
+ >>> similarity_search(dataset, value_array)
Repeat for the others below...
Please add tests that raise errors.
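For readers unfamiliar with error-raising doctests, the usual pattern is a `Traceback` header, an indented `...` standing in for the stack, and the final exception line. The function `column_count` below is hypothetical and only demonstrates the pattern; the message wording is an assumption, not the PR's.

```python
import numpy as np


def column_count(dataset: np.ndarray) -> int:
    """
    Hypothetical function used only to demonstrate an error doctest.

    >>> column_count(np.array([[0, 1], [2, 3]]))
    2
    >>> column_count(np.array([0, 1]))
    Traceback (most recent call last):
        ...
    ValueError: Wrong input data's dimensions... dataset : 1
    """
    if dataset.ndim != 2:
        raise ValueError(f"Wrong input data's dimensions... dataset : {dataset.ndim}")
    return dataset.shape[1]
```

Saved to a file, the examples can be checked with `python -m doctest <file>`.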
    for index, v in enumerate(value):
        dist = euclidean(value[index], dataset[0])
        vector = dataset[0].tolist()

        for index2 in range(1, len(dataset)):
            temp_dist = euclidean(value[index], dataset[index2])

            if dist > temp_dist:
                dist = temp_dist
                vector = dataset[index2].tolist()
- for index, v in enumerate(value):
-     dist = euclidean(value[index], dataset[0])
-     vector = dataset[0].tolist()
-     for index2 in range(1, len(dataset)):
-         temp_dist = euclidean(value[index], dataset[index2])
-         if dist > temp_dist:
-             dist = temp_dist
-             vector = dataset[index2].tolist()
+ for value in value_array:
+     dist = euclidean(value, dataset[0])
+     vector = dataset[0].tolist()
+     for dataset_value in dataset[1:]:
+         temp_dist = euclidean(value, dataset_value)
+         if dist > temp_dist:
+             dist = temp_dist
+             vector = dataset_value.tolist()
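A self-contained sketch of the index-free loop, iterating over the rows of each array directly. Two things here are assumptions for illustration: the distance helper uses `np.linalg.norm` as a stand-in for the PR's `euclidean`, and the return shape (a list of `[nearest_vector, distance]` pairs) is a guess at a sensible result format.

```python
import numpy as np


def distance(input_a: np.ndarray, input_b: np.ndarray) -> float:
    """Stand-in for the PR's euclidean(), built on np.linalg.norm."""
    return float(np.linalg.norm(input_a - input_b))


def similarity_search(dataset: np.ndarray, value_array: np.ndarray) -> list:
    """For each row of value_array, find the closest row of dataset."""
    answer = []
    for value in value_array:
        # Start with the first dataset row as the best candidate so far.
        dist = distance(value, dataset[0])
        vector = dataset[0].tolist()
        for dataset_value in dataset[1:]:
            temp_dist = distance(value, dataset_value)
            if dist > temp_dist:
                dist = temp_dist
                vector = dataset_value.tolist()
        answer.append([vector, dist])
    return answer
```

With `dataset = np.array([[0], [1], [2]])` and `value_array = np.array([[0]])`, the nearest row is `[0]` at distance `0.0`.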
Please add some tests that raise errors like https://github.com/TheAlgorithms/Python/blob/master/arithmetic_analysis/bisection.py does and then I think we are ready to merge this one.
- deleted try/except in euclidean
- added error tests
- name change(value -> value_array)
@cclauss When adding error examples, one of the examples couldn't pass flake8. Is there any way to avoid this?
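The thread does not show which example tripped flake8. Assuming the problem was an expected exception line that exceeded the line-length limit, one possible workaround is the `# doctest: +NORMALIZE_WHITESPACE` directive, which lets the expected message wrap across lines. The validator `check_dtypes` below is hypothetical; only the error wording is borrowed from the diff.

```python
import doctest

import numpy as np


def check_dtypes(dataset: np.ndarray, value_array: np.ndarray) -> None:
    """
    Hypothetical validator used to show a wrapped expected message.

    >>> dataset = np.array([[1.0]], dtype=np.float32)
    >>> value_array = np.array([[1]], dtype=np.int32)
    >>> check_dtypes(dataset, value_array)  # doctest: +NORMALIZE_WHITESPACE
    Traceback (most recent call last):
        ...
    TypeError: Input data have different datatype...
    dataset : float32, value_array : int32
    """
    if dataset.dtype != value_array.dtype:
        msg = (
            "Input data have different datatype... "
            f"dataset : {dataset.dtype}, value_array : {value_array.dtype}"
        )
        raise TypeError(msg)


if __name__ == "__main__":
    doctest.testmod()
```

With the directive active, doctest treats any run of whitespace, including the line break in the expected message, as equivalent to a single space.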
Nice!!!
* add similarity_search.py in machine_learning
  adding similarity_search algorithm in machine_learning
* fix pre-commit test, apply feedback
  isort, codespell changed. applied feedback(np -> np.ndarray)
* apply feedback
  add type hints to euclidean method
* apply feedback
  - changed euclidean's type hints
  - changed few TypeError to ValueError
  - changed range(len()) to enumerate()
  - changed error's strings to f-string
  - implemented without type()
  - add euclidean's explanation
* apply feedback
  - deleted try/catch in euclidean
  - added error tests
  - name change(value -> value_array)
* # doctest: +NORMALIZE_WHITESPACE
* Update machine_learning/similarity_search.py
* placate flake8
Co-authored-by: Christian Clauss <cclauss@me.com>