## Usage example

Here is an example of an LSA pipeline that:
1. Ingests a collection of texts
2. Builds the corresponding document-term matrix, applying stemming and removing stop words
3. Extracts 40 topics
4. Shows a table with the extracted topics
5. Shows a table with statistical thesaurus entries for selected words
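
To make step 2 concrete, here is a minimal sketch of how a document-term matrix can be built. It uses only the Python standard library and does not call the package. The toy texts, the tiny stop-word set, and the helper name `toy_document_term_matrix` are assumptions made for illustration. Stemming is left out for brevity, and reading `min_length` as "keep words with at least that many characters" is also an assumption.

```python
import re
from collections import Counter

# Hypothetical toy inputs, not the abstracts data used in this example
toy_docs = {
    "d1": "The notebook shows a neural network",
    "d2": "A talk on the neural function of the network",
    "d3": "Programming in the notebook",
}
toy_stop_words = {"the", "a", "on", "of", "in"}


def toy_document_term_matrix(docs, stop_words, min_length=3):
    """Return (doc_ids, terms, matrix); matrix[i][j] counts term j in document i."""
    counts = {}
    for doc_id, text in docs.items():
        # Lowercase, split into alphabetic tokens, drop stop words and short words
        tokens = re.findall(r"[a-z]+", text.lower())
        kept = [t for t in tokens if t not in stop_words and len(t) >= min_length]
        counts[doc_id] = Counter(kept)
    # The sorted vocabulary gives the matrix columns
    terms = sorted({t for c in counts.values() for t in c})
    doc_ids = list(docs)
    matrix = [[counts[d][t] for t in terms] for d in doc_ids]
    return doc_ids, terms, matrix


doc_ids, terms, matrix = toy_document_term_matrix(toy_docs, toy_stop_words)
print(terms)
for doc_id, row in zip(doc_ids, matrix):
    print(doc_id, row)
```

Each row corresponds to a document and each column to a term, which is the layout the later weighting and topic extraction steps operate on.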

In [2]:
import random
from LatentSemanticAnalyzer.LatentSemanticAnalyzer import *
from LatentSemanticAnalyzer.DataLoaders import *
import snowballstemmer

In [15]:
# Collection of texts
dfAbstracts = load_abstracts_data_frame()
docs = dict(zip(dfAbstracts.ID, dfAbstracts.Abstract))
len(docs)

578

In [4]:
# Stemmer object (to preprocess words in the pipeline below)
stemmerObj = snowballstemmer.stemmer("english")

In [5]:
# Words to show statistical thesaurus entries for
words = ["notebook", "computational", "function", "neural", "talk", "programming"]
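
A statistical thesaurus entry for a word can be thought of as the list of terms whose vectors lie closest to that word's vector. The sketch below illustrates the idea with cosine similarity in plain Python. The term vectors, the helper names `cosine_similarity` and `toy_thesaurus_entry`, and the choice to exclude the query term from its own entry are assumptions for illustration, not the package's implementation.

```python
import math

# Hypothetical term vectors (for example, columns of a weighted document-term matrix)
term_vectors = {
    "notebook": [1.0, 0.0, 1.0],
    "program": [0.9, 0.1, 0.8],
    "neural": [0.0, 1.0, 0.0],
    "network": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_thesaurus_entry(term, vectors, number_of_nearest_neighbors=2):
    """Return the terms most similar to `term`, most similar first."""
    scores = [
        (other, cosine_similarity(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:number_of_nearest_neighbors]]


print(toy_thesaurus_entry("notebook", term_vectors))
print(toy_thesaurus_entry("neural", term_vectors))
```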

In [6]:
# Seed the random number generator for reproducible results
random.seed(12)

In [13]:
# Keep only the documents whose text is a string (drops non-string entries such as missing values)
docs2 = {k: v for k, v in docs.items() if isinstance(v, str)}
len(docs2)

567
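
The pipeline in the next cell applies the term weight functions `"IDF"`, `"None"`, and `"Cosine"` to the document-term matrix. The following sketch shows what such a combination typically does on a tiny count matrix, in plain Python. The matrix values are made up, and the formula `log(n_docs / document_frequency)` is one common IDF variant assumed here; the package may use a different one.

```python
import math

# Hypothetical 3-document, 4-term count matrix (rows are documents)
counts = [
    [2, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 3],
]

n_docs = len(counts)
n_terms = len(counts[0])

# Global weight (assumed IDF variant): log(n_docs / number of documents containing the term)
doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
idf = [math.log(n_docs / df) for df in doc_freq]

# Local weight "None": the raw counts are used unchanged
weighted = [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


def unit_row(row):
    """Scale a row to unit Euclidean length; leave all-zero rows untouched."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else row


# Normalizer "Cosine": every document row ends up with length 1
normalized = [unit_row(row) for row in weighted]

for row in normalized:
    print([round(x, 3) for x in row])
```

Note that the second term occurs in every document, so its IDF weight is zero and it no longer contributes to any row.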

In [14]:
# LSA pipeline
lsaObj = (LatentSemanticAnalyzer()
          .make_document_term_matrix(docs=docs2,
                                     stop_words=True,
                                     stemming_rules=True,
                                     min_length=3)
          .apply_term_weight_functions(global_weight_func="IDF",
                                       local_weight_func="None",
                                       normalizer_func="Cosine")
          .extract_topics(number_of_topics=40,
                          min_number_of_documents_per_term=10,
                          method="NNMF")
          .echo_topics_interpretation(number_of_terms=12, wide_form=True)
          .echo_statistical_thesaurus(terms=stemmerObj.stemWords(words),
                                      wide_form=True,
                                      number_of_nearest_neighbors=12,
                                      method="cosine",
                                      echo_function=lambda x: print(x.to_string())))

                                           0           1            2          3          4            5         6           7          8            9           10          11
tpc.000.player-app-view                player         app         view     storag    account       websit       cdf        file      cloud     interact    document       learn
tpc.001.complex-look-plot             complex        look         plot     number       valu        quick  function        take     improv        exist       cover        talk
tpc.002.materi-assign-teacher          materi      assign      teacher     school    concept      univers      made     explain     physic     particip  understand     student
tpc.003.audio-signal-analysi            audio      signal      analysi    extract    classif      generat    effect       refer    process        updat    introduc        show
tpc.004.curv-algebra-plane               curv     algebra        plane    theorem    project    represent     defin     