
cs-project-ml

Several distance-based learning algorithms, including our study topic TransD.

Motivation

Pre-specified features often restrict the performance of learning algorithms. Distance-based features provide an alternative way to learn, especially in settings where a similarity relation is easier to obtain or analyze, such as computer vision, bioinformatics, and natural language processing.

Goal

  1. Transform the data into a “neat” distribution by pulling or pushing each pair of points.
  2. Use a simple distance-based algorithm to get the final prediction!

(Figure: data pulling)

TransD

Semi-supervised: trains on unlabeled data together with labeled data.

for each round (at most 20):
  determine labels for the unlabeled data
  for each pair of points:
    if the pair passes the conditions:
      adjust their distance
  if the data is neat enough:
    stop

conditions:

  • c_i and c_j are calculated by the Bayesian KNN (see below).
  • If a random number r >= ξ_ij, transform the pair to its new distance; otherwise keep it.

neat enough:

  • Consensus of the 1-NN and 1-MI algorithms.

adjust:

(Figure: adjust formula)
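
To make the loop above concrete, here is a minimal Python sketch. `estimate_labels`, `xi`, `adjust`, and `is_neat` are hypothetical stand-ins for the labeling step, the conditions, the adjustment formula, and the neatness test described in this section, not the repository's actual implementation.

```python
import numpy as np

MAX_ROUNDS = 20  # "maximum of 20 rounds"

def transd(dist, estimate_labels, xi, adjust, is_neat, rng=None):
    """Sketch of the TransD loop over a symmetric distance matrix `dist`.

    `estimate_labels` (e.g. the Bayesian KNN below), `xi`, `adjust`, and
    `is_neat` are placeholders for the pieces described in this section.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = dist.shape[0]
    for _ in range(MAX_ROUNDS):
        labels = estimate_labels(dist)           # label the unlabeled data
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() >= xi(i, j):     # "pass the conditions"
                    d = adjust(dist[i, j], labels[i] == labels[j])
                    dist[i, j] = dist[j, i] = d  # pull same class, push different
        if is_neat(dist):                        # consensus of 1-NN and 1-MI
            break
    return dist
```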

Bayesian KNN

We have K hypotheses: 1-NN, 2-NN, …, K-NN.

(Figure: Bayesian formula)
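
The exact combination rule lives in the figure above; as one plausible reading (an assumption, not necessarily the paper's formula), the sketch below averages the class distributions proposed by the 1-NN through K-NN hypotheses under a uniform prior.

```python
import numpy as np

def bayesian_knn(dist_to_labeled, labeled_y, n_classes, K=5):
    """Average the class votes of the 1-NN .. K-NN hypotheses."""
    order = np.argsort(dist_to_labeled)   # labeled points, nearest first
    posterior = np.zeros(n_classes)
    for k in range(1, K + 1):             # one hypothesis per value of k
        votes = np.bincount(labeled_y[order[:k]], minlength=n_classes)
        posterior += votes / k            # the k-NN class distribution
    return posterior / K                  # uniform weight over hypotheses

# Toy usage: three labeled neighbors at distances 0.4, 0.1, 0.9.
print(bayesian_knn(np.array([0.4, 0.1, 0.9]), np.array([1, 0, 1]), n_classes=2, K=3))
```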

Linear Transform Approximation

(Figure: Inverse)

  • Our model becomes a single linear transform matrix T!
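
Assuming training yields the original points X and their transformed positions X', a single matrix T can be fit by least squares via the pseudo-inverse, which is presumably what the inverse in the figure refers to; X, X_prime, and T here are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))           # original points (n x d), illustrative
X_prime = X @ rng.normal(size=(8, 8))   # stand-in for TransD's transformed points

# Least-squares fit of a single linear map: T = pinv(X) @ X_prime,
# so that X @ T approximates X_prime.
T = np.linalg.pinv(X) @ X_prime

# New points are then transformed with one matrix multiply at prediction time.
print(np.allclose(X @ T, X_prime))      # True for this synthetic example
```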

Improvements and Experiments

Other transformation approximations:

Use a feature-space extension method. Results for the quadratic transformation:

(Figures: Linear Result, Quadratic Result)

Some results are significantly better, some significantly worse. We can treat the choice of transformation as a learnable parameter and tune it for a specific dataset.
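
A quadratic transformation can be approximated the same way by first extending the feature space. The sketch below uses an explicit degree-2 feature map (squares and pairwise products), one common choice and an assumption about the extension used here, then fits the same linear T in the extended space.

```python
import numpy as np

def quadratic_features(X):
    """Extend each row with its squares and pairwise products (degree 2)."""
    n, d = X.shape
    cross = [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.hstack([X, np.column_stack(cross)])

X = np.random.default_rng(1).normal(size=(50, 4))
Phi = quadratic_features(X)             # 4 -> 4 + 10 = 14 dimensions
# Fit the linear transform in the extended space, as in the linear case:
# T = np.linalg.pinv(Phi) @ Phi_prime
print(Phi.shape)                        # (50, 14)
```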

Clustering Preprocessing:

  • Improving accuracy:
    Result: clustering preprocessing increases accuracy on some datasets, but the extra runtime isn't worth it.
  • Compressing data (see the sketch below):
    Result: the unlabeled data can be compressed to 1/5 or even 1/10 of its size with the same accuracy (no significant degradation), saving a lot of time when running TransD.
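
One way to realize the compression step (an assumption; the repository may cluster differently) is to replace the unlabeled points with k-means centroids before running TransD:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X_unlabeled = rng.normal(size=(1000, 8))    # illustrative unlabeled pool

# Compress to 1/10 of the original size: each centroid stands in for its
# cluster's points during TransD, cutting the number of pairs ~100x.
km = KMeans(n_clusters=len(X_unlabeled) // 10, n_init=10).fit(X_unlabeled)
X_compressed = km.cluster_centers_

print(X_unlabeled.shape, "->", X_compressed.shape)   # (1000, 8) -> (100, 8)
```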

Randomness Adjustment:

Randomly return a class according to its weight, with the randomness decreasing after every iteration.
Result: no significant difference; more experiments are needed!
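
A minimal sketch of what such an adjustment could look like (the decay schedule is an assumption): sample the class from the weights early on, and sharpen toward the arg max as iterations progress.

```python
import numpy as np

def sample_class(weights, iteration, rng=None):
    """Sample a class from `weights`, less randomly as `iteration` grows."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.asarray(weights, dtype=float)
    temperature = 1.0 / (1.0 + iteration)    # assumed decay schedule
    p = w ** (1.0 / temperature)             # sharpen the distribution
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Early iterations sample broadly; later ones approach the arg max.
print(sample_class([0.2, 0.5, 0.3], iteration=0))
print(sample_class([0.2, 0.5, 0.3], iteration=10))
```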

Further Issues

  1. We need more experiments on big data.
  2. Further improve time and space complexity.
  3. Implement the algorithm on CUDA (run on GPU).
  4. Other ways to compress data: fewer points but higher dimension?

Reference

Yuh-Jyh Hu, Min-Che Yu, Hsiang-An Wang, and Zih-Yun Ting, “A Similarity-Based Learning Algorithm Using Distance Transformation,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 6, June 2015.
