aplanas/cossim

cossim

PoC for text similarity using the cosine distance
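
For reference, the standard definition of cosine similarity between two document vectors a and b is given below. How cossim.py builds those vectors (raw term counts, TF-IDF, or something else) is not stated in this README, so the vector representation is left abstract here.

```latex
\operatorname{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}},
\qquad
\operatorname{dist}(a, b) = 1 - \operatorname{sim}(a, b)
```

With non-negative term weights the similarity lies between 0 and 1, and values close to 1 indicate near-identical documents.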

Setup

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Two sample files, sample_from_good.txt and sample_from_bad.txt, are modified copies of one document from each set (good and bad). The changes can be inspected with:

diff -u ./good/000021f5c87d56000fd13a651be10b19.txt sample_from_good.txt
diff -u ./bad/000007e9c8f71febdfb37b51b6bbed77.txt sample_from_bad.txt

Despite the changes, each modified sample is matched to its original document:

python3 cossim.py --good ./good --bad ./bad sample_from_bad.txt

Maximum similarity with good samples: {} 0.2326964857509056
Maximum similarity with bad samples: {} 0.9995798580507443
Similar document: ./bad/000007e9c8f71febdfb37b51b6bbed77.txt

python3 cossim.py --good ./good --bad ./bad sample_from_good.txt

Maximum similarity with good samples: {} 0.9960348410695655
Maximum similarity with bad samples: {} 0.3742297492165573
Similar document: ./good/000021f5c87d56000fd13a651be10b19.txt
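
The lookup shown above can be sketched in plain Python. This is a minimal illustration, not the code in cossim.py: it assumes bag-of-words count vectors and `*.txt` files, and the names `vectorize`, `cosine`, and `best_match` are hypothetical.

```python
import math
import re
from collections import Counter
from pathlib import Path


def vectorize(text):
    # Assumption: plain lowercase word counts; cossim.py may weight terms differently.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Dot product over the terms of a; Counter returns 0 for terms missing in b.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_text, directory):
    # Return (similarity, path) for the closest *.txt file in the directory,
    # or (0.0, None) when nothing shares a term with the query.
    query = vectorize(query_text)
    best_score, best_path = 0.0, None
    for path in sorted(Path(directory).glob("*.txt")):
        score = cosine(query, vectorize(path.read_text(errors="ignore")))
        if score > best_score:
            best_score, best_path = score, path
    return best_score, best_path
```

Calling `best_match` once with the good directory and once with the bad directory, then keeping the higher score, gives the same kind of report as the output above: one maximum similarity per set and the path of the closest document.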
