Pairwise-Sample-Similarity

Pairwise sample similarity (cosine) between records.

It is sometimes necessary to know how similar a record is to the other records in a database. This repository contains a program that computes the pairwise similarity between records.
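As a rough illustration of what pairwise cosine similarity means, here is a minimal sketch using only the Python standard library. It is not the code in main.py: the function names `cosine_similarity` and `pairwise_similarity` and the toy `records` are hypothetical, and returning 0.0 for an all-zero record is a convention chosen for this example. For a large dataset a vectorized library would likely be a better fit.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption for this sketch: an all-zero record is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(records):
    """Return a symmetric n x n matrix of cosine similarities."""
    n = len(records)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = cosine_similarity(records[i], records[j])
            sim[i][j] = s
            sim[j][i] = s  # cosine similarity is symmetric
    return sim


# Toy records (hypothetical, not taken from the dataset).
records = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
sim = pairwise_similarity(records)
```

Note that cosine similarity compares direction only: the first two toy records differ in magnitude but point the same way, so they score as fully similar, while the third is orthogonal to both.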

For example, when data arrives in the same database from different sources, we may want to automate the check of how similar the samples are. Highly similar records may be unwanted, and it may be necessary to delete duplicate (or near-duplicate) data.
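One possible way to flag near-duplicates is to compare every pair of records and keep the pairs whose cosine similarity reaches a threshold. The sketch below is an assumption about how that could look, not the repository's implementation: `near_duplicates`, the `threshold` value of 0.99, and the two small sources are all made up for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def near_duplicates(records, threshold=0.99):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        s = cosine(records[i], records[j])
        if s >= threshold:
            pairs.append((i, j, s))
    return pairs


# Hypothetical records arriving from two different sources.
source_a = [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0]]
source_b = [[1.0, 2.0, 3.1], [0.0, 5.0, 0.0]]

flagged = near_duplicates(source_a + source_b, threshold=0.99)
for i, j, s in flagged:
    print(f"records {i} and {j} look like near-duplicates (similarity {s:.4f})")
```

The threshold is a judgment call that depends on the data. Comparing all pairs grows quadratically with the number of records, so this approach is only practical as written for small samples.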

The dataset already consists of numerical values, which avoids the trouble of encoding it (for example, from text to numerical values). Handling text data is future work :)

TODO

Provide proper documentation
Dataset characteristics
Try different similarity measures
Work with text data and encode and then find the similarity

Dataset:

https://archive.ics.uci.edu/ml/datasets/covertype
