-
Notifications
You must be signed in to change notification settings - Fork 151
Knn dev 1129 #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Knn dev 1129 #184
Conversation
|
Refer to this link for build results (access rights to CI server needed): |
orhankislal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit title and message should be descriptive of the changes, not just a JIRA ID. Please check the contribution guidelines for an example:
https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines
| " 'r' for regression OR 'c' for classification.". | ||
| format(operation)) | ||
| test_id, output_table, k, output_neighbors , **kwargs): | ||
| # if not operation or operation not in ['c', 'r']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove these commented lines from the final code.
| test_source, test_column_name, id_column_name, output_table, | ||
| operation, k): | ||
| def knn(schema_madlib, point_source, point_column_name,point_id, label_column_name, | ||
| test_source, test_column_name, test_id, output_table, k,output_neighbors): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after , is needed. k, output_neighbors
|
|
||
| drop table if exists madlib_knn_result_classification; | ||
| select knn('knn_train_data','data','id','label','knn_test_data','data','id','madlib_knn_result_classification',3,True); | ||
| select assert(array_agg(k_nearest_neighbours order by id)='{ {1,2,3},{5,4,3},{7,6,5},{4,5,3},{9,6,7},{6,7,8} }', 'Wrong output in classification with k=3') from madlib_knn_result_classification; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IC fails on greenplum 4.3 with the following error:
select assert(array_agg(k_nearest_neighbours order by id)='{ {1,2,3},{5,4,3},{7,6,5},{4,5,3},{9,6,7},{6,7,8} }', 'Wrong output in classification with k=3') from madlib_knn_result_classification;
psql:/tmp/madlib-logs-GREENPLUM_4_3ORCA/madlib.834EYG/knn/test/knn.sql_in.tmp:159: ERROR: could not find array type for data type integer[]
LINE 1: select assert(array_agg(k_nearest_neighbours order by id)='{...
…ed code in knn.py_in
|
Refer to this link for build results (access rights to CI server needed): |
|
Jenkins ok to test |
|
Refer to this link for build results (access rights to CI server needed): |
|
|
||
| drop table if exists madlib_knn_result_classification; | ||
| select knn('knn_train_data','data','id','label','knn_test_data','data','id','madlib_knn_result_classification',3,True); | ||
| select assert((k_nearest_neighbours )='{1,2,3}', 'Wrong output in classification with k=3') from madlib_knn_result_classification where id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion might fail on gpdb since the order of the nearest neighbors might change. I would suggest using a query like select unnest(k_nearest_neighbours) = ANY('{1,2,3}'::int[]) from madlib_knn_result_classification where id = 1;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orhankislal
This query is giving the following error:
select unnest(k_nearest_neighbours) = ANY('{1,2}'::int[]) from madlib_knn_result_reg;
ERROR: op ANY/ALL (array) does not support set arguments
So as a result, test case for KNN is failing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! You can try the following to make sure the array order is fixed: select array_agg(x) from (select unnest(k_nearest_neighbours) as x from madlib_knn_result_classification where id = 1 order by x asc) y;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is working. Modifying the assert to this:
select assert(array_agg(x)= '{1,2,3}','Wrong output in classification with k=3')
from (select unnest(k_nearest_neighbours) as x
from madlib_knn_result_classification where id = 1 order by x asc) y;
|
Refer to this link for build results (access rights to CI server needed): |
|
Works on GPDB. LGTM. |
JIRA: MADLIB-1129 Closes apache#184
KNN Changes for Jira: 1129