### Interactive NLP

The `seaqube` package provides a simple toolkit for working with pre-trained NLP models or with self-trained models, for example from `gensim`.

_Whichever NLP model is used, as long as its training, saving and loading process is implemented in a class that inherits from `SeaQuBeWordEmbeddingsModel`, the seaqube toolkit can be used for interactive NLP._

In [None]:
from seaqube.nlp.types import SeaQuBeWordEmbeddingsModel

In [None]:
# Let's have a look at a context-based NLP model called Context2Vec
from seaqube.nlp.context2vec.context2vec import Context2Vec

In [None]:
# Import some seaqube tools:
from seaqube.nlp.tools import word_count_list
from seaqube.nlp.types import RawModelTinCan
from seaqube.nlp.seaqube_model import SeaQuBeNLPLoader, SeaQuBeCompressLoader
from seaqube.nlp.tools import tokenize_corpus

To use the seaqube word embedding evaluation, or simply to make NLP usage easier, it is necessary to wrap such a model in a `SeaQuBeWordEmbeddingsModel`, as shown in the following:

In [None]:
class SeaQuBeWordEmbeddingsModelC2V(SeaQuBeWordEmbeddingsModel):
    def __init__(self, c2v: Context2Vec):
        self.c2v = c2v

    def vocabs(self):
        return self.c2v.wv.vocabs

    @property
    def wv(self):
        return self.c2v.wv

    def word_vector(self, word):
        return self.c2v.wv[word]

    def matrix(self):
        return self.c2v.wv.matrix

We load a corpus, which will then be used for model training:

In [None]:
star_wars_cites = ["How you get so big eating food of this kind?", "'Spring the trap!'", "Same as always…", "You came in that thing? You’re braver than I thought!", "Who’s scruffy looking?", "Let the Wookiee win.", "The Emperor is not as forgiving as I am", "I don’t know where you get your delusions, laserbrain.", "Shutting up, sir,", "Boring conversation anyway…", ]
corpus = tokenize_corpus(star_wars_cites)

Train a Context2Vec instance:

In [None]:
c2v = Context2Vec(epoch=3)

In [None]:
c2v.train(corpus)

The trained Context2Vec model can be saved in full with:

In [None]:
c2v.save("starwars_c2v")

Now it is time to wrap the model in a format that seaqube understands:

In [None]:
seaC2V = SeaQuBeWordEmbeddingsModelC2V(c2v)

In [None]:
tin_can = RawModelTinCan(seaC2V, word_count_list(corpus))

In [None]:
SeaQuBeCompressLoader.save_model_compressed(tin_can, "c2v_small")

The next step transforms the NLP model, together with its extra information, into an NLP object that supports interactive usage:

In [None]:
nlp = SeaQuBeNLPLoader.load_model_from_tin_can(tin_can, "c2v")

This line transforms a document into a `SeaQuBeNLPDoc` object, which provides similarity and word embedding features:

In [None]:
nlp("This is a test")

Converting the document to a list gives its tokens, stored here as `doc`:

In [None]:
doc = list(nlp("This is a test"))
print(doc)
type(doc[0])

An embedding vector can be obtained for every token. Here it is shown for the first one only:

In [None]:
nlp("This is a test")[0].vector

When the vector is requested from the document itself, the token vectors are merged into a single vector, using either the mean or the SIF algorithm:

In [None]:
nlp("This is a test").vector

In [None]:
nlp("This is a test").sif_vector
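
The difference between the two merge strategies can be illustrated outside of seaqube. The following is a minimal sketch, not seaqube's implementation: it assumes that `vector` corresponds to a plain mean and that `sif_vector` follows the usual smooth-inverse-frequency idea, where each token vector is weighted by `a / (a + p(w))` and `p(w)` is the relative corpus frequency of the word. The function names, the toy counts, the random vectors and the value `a = 1e-3` are all hypothetical, and the common-component removal step of the full SIF method is left out.

```python
from collections import Counter

import numpy as np


def sif_weights(tokens, word_counts, a=1e-3):
    """Return one weight a / (a + p(w)) per token (hypothetical helper)."""
    total = sum(word_counts.values())
    return np.array([a / (a + word_counts.get(tok, 0) / total) for tok in tokens])


def mean_vector(vectors):
    """Plain average of all token vectors."""
    return np.mean(vectors, axis=0)


def sif_weighted_vector(tokens, vectors, word_counts, a=1e-3):
    """Frequency-weighted average; common-component removal is omitted."""
    weights = sif_weights(tokens, word_counts, a)
    return (weights[:, None] * vectors).mean(axis=0)


# Toy data: made-up corpus counts and random 4-dimensional "embeddings".
tokens = ["the", "emperor", "is", "a", "laserbrain"]
word_counts = Counter({"the": 50, "emperor": 2, "is": 40, "a": 45, "laserbrain": 1})
rng = np.random.default_rng(0)
vectors = rng.normal(size=(len(tokens), 4))

weights = sif_weights(tokens, word_counts)
mean_vec = mean_vector(vectors)
sif_vec = sif_weighted_vector(tokens, vectors, word_counts)
print(weights)
```

In this toy setup, frequent words such as "the" receive a much smaller weight than rare words such as "laserbrain", so the rare words dominate the merged vector.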

The similarity between words or documents can also be calculated. For documents, the `sif` method gives a better semantic result.

In [None]:
nlp("Is the Emperor a laserbrain?").similarity("Boring conversation anyway…")

In [None]:
nlp("Is the Emperor a laserbrain?").similarity("Boring conversation anyway…", vector="sif")

Similarity for words

In [None]:
word = nlp("Is the Emperor a laserbrain?")[2]

In [None]:
word

In [None]:
word.similarity("Wookiee")
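
This notebook does not state how the similarity score is computed. A common choice for embedding vectors is cosine similarity, and the sketch below shows that measure under this assumption. The `cosine_similarity` helper and the toy vectors are hypothetical and not part of seaqube.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either one is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


# Toy vectors: same direction, orthogonal, and opposite direction.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Under this measure, vectors that point in the same direction score 1, orthogonal vectors score 0, and the length of a vector has no influence on the result.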

Get the vocab of the model

In [None]:
nlp.vocab()