This assumes you have kandlcontentpipeline checked out in a folder next to this one.

You will need numpy, scipy and gensim installed, ideally in a virtualenv.
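A quick way to confirm the three packages are importable from the active environment is sketched below. The helper name `missing_packages` is made up for this example and is not part of the repository.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the README says are required.
REQUIRED = ["numpy", "scipy", "gensim"]
missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported missing, install it inside the virtualenv before running the scripts.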

In the examples below, 50 is the number of features, and 300 (or 500) is the index of the document being checked.

python loaddocs.py

python processdocs_lsi.py 50

python setupsim_lsi.py 50
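The setup steps pass the same feature count to both `processdocs_lsi.py` and `setupsim_lsi.py`. A small driver can keep that number in one place. This is only a sketch: `setup_commands` and `run_setup` are hypothetical helpers, and it assumes the scripts are run from the repository folder.

```python
import subprocess
import sys


def setup_commands(num_features):
    """Build the three setup commands in the order the README lists them."""
    n = str(num_features)
    return [
        [sys.executable, "loaddocs.py"],
        [sys.executable, "processdocs_lsi.py", n],
        [sys.executable, "setupsim_lsi.py", n],
    ]


def run_setup(num_features):
    """Run each setup step in turn, stopping at the first failure."""
    for cmd in setup_commands(num_features):
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling `run_setup(50)` from the repository folder would then be equivalent to the three commands above.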

python do_query_lsi.py 50 300

python do_query_lsi.py 50 500

Each query outputs the document being checked and the three documents most similar to it.
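The ranking step behind a query like this can be illustrated in plain Python. This is not the code in `do_query_lsi.py`: it assumes each document has already been reduced to a feature vector (for example, 50 LSI features), that similarity is cosine similarity (which gensim's similarity indexes use), and that the queried document is left out of its own results. The function names are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches(vectors, doc_index, k=3):
    """Return the k (index, score) pairs most similar to vectors[doc_index]."""
    query = vectors[doc_index]
    scores = [
        (i, cosine(query, v))
        for i, v in enumerate(vectors)
        if i != doc_index  # assumption: skip the document being checked
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

With real data, `vectors` would hold one feature vector per document, and `top_matches(vectors, 300)` would correspond to the `50 300` query above.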
