# Example usage

To use `py_skipgram_24` in a project:

### 1. Import the package: 
This imports the `SkipgramModel` and `MyPreprocessor` classes and the `create_input_pairs`, `get_vocab`, `train_model`, and `get_word_vectors` functions from the package.

In [1]:
from py_skipgram_24 import SkipgramModel, create_input_pairs, get_vocab, MyPreprocessor, train_model, get_word_vectors

### 2. Define the corpus: 
The corpus is the text data that the model will be trained on.\
The `MyPreprocessor` class preprocesses the corpus: it tokenizes the text into sentences and words, converts all words to lowercase, and removes stop words and punctuation.\
The `get_vocab` function then returns a list of the unique words in the preprocessed corpus.

In [3]:
# Defining the corpus
corpus = ["It was a great day. I loved the movie and spending time with you. I wish we had more time.", 
          "The sky is always blue underneath. Remember that."]
# Preprocessing the corpus
sentences = MyPreprocessor(corpus)
pp_corpus = list(sentences)
# Getting the vocabulary
vocab = get_vocab(pp_corpus)

print(vocab)

['wish', 'spending', 'underneath', 'loved', 'time', 'movie', 'sky', 'blue', 'day', 'great', 'remember', 'always']
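
To make the preprocessing steps concrete, here is a minimal standard-library sketch of the same idea: split into sentences, lowercase, drop punctuation and stop words, then collect the unique words. It is not the package's implementation. The names `preprocess`, `build_vocab`, and `STOP_WORDS` are hypothetical, and the stop-word list is a tiny hand-picked assumption.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch;
# the list used by MyPreprocessor is not shown in this document).
STOP_WORDS = {"it", "was", "a", "i", "the", "and", "with", "you",
              "we", "had", "more", "is", "that"}

def preprocess(doc):
    """Split a document into sentences, then into lowercase content words."""
    result = []
    for sentence in re.split(r"[.!?]+", doc):
        # Keep alphabetic tokens only, which also drops punctuation.
        words = re.findall(r"[a-z']+", sentence.lower())
        kept = [w for w in words if w not in STOP_WORDS]
        if kept:
            result.append(kept)
    return result

def build_vocab(sentences):
    """Collect unique words in first-seen order."""
    return list(dict.fromkeys(w for s in sentences for w in s))

toy = preprocess("It was a great day. I loved the movie.")
print(toy)               # [['great', 'day'], ['loved', 'movie']]
print(build_vocab(toy))  # ['great', 'day', 'loved', 'movie']
```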


### 3. Create a word-to-index mapping and input pairs: 
A dictionary is created to map each word in the vocabulary to a unique index.\
The `create_input_pairs` function is used to create pairs of context words and target words from the preprocessed corpus.

In [5]:
# Creating a dictionary to map words to indices
word2idx = {word: idx for idx, word in enumerate(vocab)}

# Creating input pairs for the Skipgram model
idx_pairs = create_input_pairs(pp_corpus, word2idx, context_size=2)

print(word2idx)

{'wish': 0, 'spending': 1, 'underneath': 2, 'loved': 3, 'time': 4, 'movie': 5, 'sky': 6, 'blue': 7, 'day': 8, 'great': 9, 'remember': 10, 'always': 11}
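
The contents of `idx_pairs` are not printed here, so the following sketch shows how skip-gram pairs are commonly built from a context window. It assumes each pair is `(center_index, context_index)` and that windows do not cross sentence boundaries. The function `make_pairs` and the toy data are hypothetical, and the ordering used by `create_input_pairs` may differ.

```python
def make_pairs(sentences, word2idx, context_size=2):
    """Pair every word with each neighbour at most `context_size` positions away."""
    pairs = []
    for sentence in sentences:
        ids = [word2idx[w] for w in sentence]
        for pos, center in enumerate(ids):
            lo = max(0, pos - context_size)
            hi = min(len(ids), pos + context_size + 1)
            for ctx_pos in range(lo, hi):
                if ctx_pos != pos:  # a word is not its own context
                    pairs.append((center, ids[ctx_pos]))
    return pairs

# Toy input (assumed for illustration only).
toy_sentences = [["great", "day"], ["loved", "movie", "spending", "time"]]
toy_words = ["great", "day", "loved", "movie", "spending", "time"]
toy_word2idx = {w: i for i, w in enumerate(toy_words)}

toy_pairs = make_pairs(toy_sentences, toy_word2idx, context_size=1)
print(toy_pairs)
# [(0, 1), (1, 0), (2, 3), (3, 2), (3, 4), (4, 3), (4, 5), (5, 4)]
```

A larger `context_size` produces more pairs per word, so each training epoch does more work.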


### 4. Initialize the Skipgram model: 
`SkipgramModel` is initialized with the size of the vocabulary and the desired embedding dimension (10 in this example).

In [6]:
# Initializing the Skipgram model
model = SkipgramModel(len(vocab), 10)
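
The internals of `SkipgramModel` are not shown in this document. As a rough illustration of what a skip-gram model with these two arguments typically holds, the sketch below assumes two embedding tables of shape `vocab_size x embedding_dim` and a full-softmax loss. All names (`init_matrix`, `softmax`, `pair_loss`, `center_emb`, `context_emb`) are hypothetical.

```python
import math
import random

def init_matrix(rows, cols, rng):
    """Small random initial values for an embedding table."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pair_loss(center_emb, context_emb, center, context):
    """Negative log-probability of `context` given `center` under a full softmax."""
    v = center_emb[center]
    scores = [sum(a * b for a, b in zip(v, u)) for u in context_emb]
    return -math.log(softmax(scores)[context])

rng = random.Random(0)
vocab_size, embedding_dim = 12, 10  # same sizes as in the cell above
center_emb = init_matrix(vocab_size, embedding_dim, rng)
context_emb = init_matrix(vocab_size, embedding_dim, rng)

loss = pair_loss(center_emb, context_emb, center=0, context=1)
print(loss)
```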

### 5. Train the model: 
The `train_model` function trains the model on the input pairs. The number of epochs and the learning rate are set with the `epochs` and `learning_rate` arguments.

In [7]:
# Training the model
train_model(model, idx_pairs, epochs=250, learning_rate=0.025)

Epoch: 1, Loss: 69.66356539726257
Epoch: 10, Loss: 23.22104699909687
Epoch: 20, Loss: 22.50970959942788
Epoch: 30, Loss: 22.341058851918206
Epoch: 40, Loss: 22.252611347241327
Epoch: 50, Loss: 22.191937721450813
Epoch: 60, Loss: 22.145032958535012
Epoch: 70, Loss: 22.10644987388514
Epoch: 80, Loss: 22.073530403198674
Epoch: 90, Loss: 22.04476288310252
Epoch: 100, Loss: 22.019195186090656
Epoch: 110, Loss: 21.99618454091251
Epoch: 120, Loss: 21.975261333442177
Epoch: 130, Loss: 21.95607686121366
Epoch: 140, Loss: 21.93836108618416
Epoch: 150, Loss: 21.92190594664862
Epoch: 160, Loss: 21.906548081438814
Epoch: 170, Loss: 21.892170914637973
Epoch: 180, Loss: 21.878671098504128
Epoch: 190, Loss: 21.865960126371647
Epoch: 200, Loss: 21.853941085770202
Epoch: 210, Loss: 21.842528625045816
Epoch: 220, Loss: 21.831639897398418
Epoch: 230, Loss: 21.821229962131838
Epoch: 240, Loss: 21.811249946022144
Epoch: 250, Loss: 21.801652225849466
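
How `train_model` computes and reports this loss is not shown here. The sketch below is one plausible reading, not the package's code: it assumes plain per-pair stochastic gradient descent on a full-softmax loss, with the loss summed over all pairs in each epoch. The names `train`, `W_in`, and `W_out`, and the toy pairs, are hypothetical.

```python
import math
import random

def train(pairs, vocab_size, dim, epochs, lr, seed=0):
    """Per-pair SGD on a full-softmax skip-gram loss; returns embeddings and epoch losses."""
    rng = random.Random(seed)
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            v = W_in[center]
            scores = [sum(a * b for a, b in zip(v, u)) for u in W_out]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total += -math.log(probs[context])
            # Gradient of the loss w.r.t. the scores: probs - one_hot(context).
            grad = probs[:]
            grad[context] -= 1.0
            # Gradient w.r.t. the center vector, computed before W_out changes.
            grad_v = [sum(grad[j] * W_out[j][k] for j in range(vocab_size))
                      for k in range(dim)]
            for j in range(vocab_size):
                W_out[j] = [u - lr * grad[j] * a for u, a in zip(W_out[j], v)]
            W_in[center] = [a - lr * g for a, g in zip(v, grad_v)]
        history.append(total)
    return W_in, history

toy_pairs = [(0, 1), (1, 0), (2, 3), (3, 2)]  # assumed toy data
W_in, history = train(toy_pairs, vocab_size=4, dim=3, epochs=200, lr=0.1)
print(history[0], history[-1])
```

In this toy setup every center word has exactly one context, so the loss can keep shrinking. In the corpus above most words have several different contexts, so a loss that flattens out well above zero, as in the log, is consistent with a model that has learned the context distribution rather than one that has stopped learning.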


### 6. Get the word vectors: 
After training, the `get_word_vectors` function returns the word vector for each word in the vocabulary as a dictionary that maps each word to its vector.
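
A common way to use such a word-to-vector dictionary is to compare words by cosine similarity. The sketch below is a minimal standard-library illustration. The helpers `cosine_similarity` and `most_similar` are hypothetical and not part of `py_skipgram_24`, and the two-dimensional toy vectors are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word, vectors, top_n=2):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scored = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]

# Made-up vectors for illustration only.
toy_vectors = {"movie": [1.0, 0.0], "film": [0.9, 0.1], "sky": [0.0, 1.0]}
print(most_similar("movie", toy_vectors, top_n=1))
```

With a corpus as small as the two documents used here, similarities between the trained vectors are unlikely to be meaningful; the same helpers would be applied to vectors trained on more text.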

In [8]:
# Getting the word vectors
word_vectors = get_word_vectors(model, word2idx)

# Printing the word vectors
print(word_vectors)


{'wish': array([-3.500595  ,  3.0575395 , -3.185072  ,  3.4259958 ,  2.9964807 ,
        0.44140282, -6.356248  ,  2.5054865 ,  2.3415475 ,  2.043522  ],
      dtype=float32), 'spending': array([-1.6623857 , -0.00498109,  2.0974936 ,  0.3883536 ,  2.2632246 ,
       -0.01828435, -0.2495814 ,  0.00351789, -0.5676195 ,  0.5703435 ],
      dtype=float32), 'underneath': array([-3.2485276e-03, -8.2185902e-02, -8.7680750e-02, -3.7572870e-01,
       -1.5392040e+00, -2.3681475e-01, -1.6000400e+00, -3.3550653e+00,
       -2.6649153e-01, -5.4437512e-01], dtype=float32), 'loved': array([ 0.99293745,  0.7222526 ,  0.38154927, -0.3577101 ,  2.0718076 ,
       -0.01006974,  2.1421793 , -1.7211496 , -0.81001824,  0.94534254],
      dtype=float32), 'time': array([-0.04788318,  1.2507159 , -1.5895323 ,  0.0210836 ,  0.02665376,
       -0.02811124,  1.8835703 , -0.31172168, -0.07244606,  0.65597695],
      dtype=float32), 'movie': array([ 1.7467133 , -0.01310594,  0.25065175,  0.9707049 ,  2.6826327 ,
 