Skip to content

Conversation

@hpandeycodeit
Copy link
Member

KNN Changes for Jira: 1129

@asfgit
Copy link

asfgit commented Sep 8, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/227/

Copy link
Contributor

@orhankislal orhankislal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit title and message should be descriptive of the changes, not just a JIRA ID. Please check the contribution guidelines for an example:
https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines

" 'r' for regression OR 'c' for classification.".
format(operation))
test_id, output_table, k, output_neighbors , **kwargs):
# if not operation or operation not in ['c', 'r']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove these commented lines from the final code.

test_source, test_column_name, id_column_name, output_table,
operation, k):
def knn(schema_madlib, point_source, point_column_name,point_id, label_column_name,
test_source, test_column_name, test_id, output_table, k,output_neighbors):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after , is needed. k, output_neighbors


drop table if exists madlib_knn_result_classification;
select knn('knn_train_data','data','id','label','knn_test_data','data','id','madlib_knn_result_classification',3,True);
select assert(array_agg(k_nearest_neighbours order by id)='{ {1,2,3},{5,4,3},{7,6,5},{4,5,3},{9,6,7},{6,7,8} }', 'Wrong output in classification with k=3') from madlib_knn_result_classification;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC fails on greenplum 4.3 with the following error:

select assert(array_agg(k_nearest_neighbours order by id)='{ {1,2,3},{5,4,3},{7,6,5},{4,5,3},{9,6,7},{6,7,8} }', 'Wrong output in classification with k=3') from madlib_knn_result_classification;
psql:/tmp/madlib-logs-GREENPLUM_4_3ORCA/madlib.834EYG/knn/test/knn.sql_in.tmp:159: ERROR:  could not find array type for data type integer[]
LINE 1: select assert(array_agg(k_nearest_neighbours order by id)='{...

@asfgit
Copy link

asfgit commented Sep 15, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/240/

@edespino
Copy link
Contributor

Jenkins ok to test

@asfgit
Copy link

asfgit commented Sep 19, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/247/


drop table if exists madlib_knn_result_classification;
select knn('knn_train_data','data','id','label','knn_test_data','data','id','madlib_knn_result_classification',3,True);
select assert((k_nearest_neighbours )='{1,2,3}', 'Wrong output in classification with k=3') from madlib_knn_result_classification where id = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion might fail on gpdb since the order of the nearest neighbors might change. I would suggest using a query like select unnest(k_nearest_neighbours) = ANY('{1,2,3}'::int[]) from madlib_knn_result_classification where id = 1;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orhankislal
This query is giving the following error:

select unnest(k_nearest_neighbours) = ANY('{1,2}'::int[]) from madlib_knn_result_reg;
ERROR:  op ANY/ALL (array) does not support set arguments

So as a result, test case for KNN is failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You can try the following to make sure the array order is fixed: select array_agg(x) from (select unnest(k_nearest_neighbours) as x from madlib_knn_result_classification where id = 1 order by x asc) y;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is working. Modifying the assert to this:

select assert(array_agg(x)= '{1,2,3}','Wrong output in classification with k=3') 
from (select unnest(k_nearest_neighbours) as x
 from madlib_knn_result_classification where id = 1 order by x asc) y;

@asfgit
Copy link

asfgit commented Oct 12, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/madlib-pr-build/259/

@orhankislal
Copy link
Contributor

Works on GPDB. LGTM.

orhankislal added a commit to orhankislal/madlib that referenced this pull request Oct 12, 2017
@asfgit asfgit closed this in 0a7efca Oct 12, 2017
@hpandeycodeit hpandeycodeit deleted the knn_dev_1129 branch May 2, 2018 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants