# 1. Introduction
## 1.1 Definition
**Word embeddings** are *vector representations of words* that capture semantic information and the relationships between words. The idea is to map each word to a numerical vector so that words with similar meanings, or words used in similar contexts, end up with similar vectors. This lets machine learning models work with language numerically: words are placed in a high-dimensional space where the distance between two vectors reflects how similar the words are.
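
To make "distance indicates similarity" concrete, here is a minimal sketch that compares vectors with cosine similarity, one common similarity measure for embeddings. The three-dimensional vectors in `toy_vectors` are hand-made for illustration and are not learned embeddings; the names `cosine_similarity` and `toy_vectors` are assumptions of this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hand-made 3-dimensional vectors, for illustration only (not learned embeddings).
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

sim_king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_king_apple = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])

print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

With these toy values, "king" and "queen" score higher than "king" and "apple", which is the behaviour a trained embedding model is expected to show for related and unrelated words.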


## 1.2 Examples of word embedding techniques
1. **Word2Vec:** This model, developed by *Google*, learns word embeddings using neural networks. It has two main approaches: **CBOW (Continuous Bag of Words)** and **Skip-Gram.**
  * **CBOW** predicts a word based on its surrounding context.
  * **Skip-Gram** predicts the context based on a given word.
2. **GloVe (Global Vectors for Word Representation):** Developed by *Stanford*, GloVe captures global statistical information from a corpus by training on word co-occurrence probabilities.
3. **FastText:** An extension of Word2Vec developed by *Facebook*, **FastText** represents each word as a bag of character n-grams, which makes it effective for morphologically rich languages (languages in which a word can take many inflected forms) and for out-of-vocabulary words.
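
The difference between CBOW and Skip-Gram, and the character n-gram idea behind FastText, can be sketched in plain Python. This is a simplified illustration rather than how any of these libraries is implemented: real implementations add details such as window sampling, subsampling, and n-gram hashing. The function names `skipgram_pairs`, `cbow_examples`, and `char_ngrams` are hypothetical, as are the default values `window=2` and `n=3`.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip-Gram predicts each context word from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context_words, center) examples: CBOW predicts the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples

def char_ngrams(word, n=3):
    """Character n-grams of a word padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

tokens = ["i", "love", "natural", "language", "processing"]

print(skipgram_pairs(tokens, window=1)[:4])
print(cbow_examples(tokens, window=1)[:2])
print(char_ngrams("where"))
```

Because a word is described by its character n-grams, an unseen word still shares n-grams with known words, which is why this representation helps with out-of-vocabulary words.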

# 2. Import libraries
The cell below imports the `Word2Vec` class from `gensim` and the `word_tokenize` function from `nltk`.
* `gensim` is a Python library for training word embeddings and performing other **NLP** tasks.
* The `Word2Vec` class implements the Word2Vec model, which converts words into vector representations.
* `word_tokenize` is NLTK's tokenizer, which splits raw text into word tokens.

```python
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
```

# 3. Prepare the text dataset
* Define `sentences`, a list of lists.
* Each inner list is a sentence, split into individual words. These words will be used as **tokens** by the **Word2Vec** model.
* In **Word2Vec**, sentences are used to learn the relationships between words, with similar words or words in similar contexts receiving similar vector representations.


```python
# Sample sentences
sentences = [
    ['hello', 'world'],
    ['i', 'love', 'natural', 'language', 'processing'],
    ['hello', 'from', 'the', 'other', 'side']
]
```
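
The sample above is already tokenized by hand. As a minimal sketch of how the same list-of-lists structure could be produced from raw text, the example below lowercases each string and splits it on whitespace. The raw strings and the names `raw_sentences` and `tokenized` are assumptions made for illustration. NLTK's `word_tokenize` could be used in place of `str.split` for better handling of punctuation, but it typically requires downloading NLTK's tokenizer data first.

```python
# Hypothetical raw text, chosen to match the hand-tokenized sample sentences.
raw_sentences = [
    "Hello world",
    "I love natural language processing",
    "Hello from the other side",
]

# Lowercase and split on whitespace; a simple stand-in for a real tokenizer.
tokenized = [text.lower().split() for text in raw_sentences]

for tokens in tokenized:
    print(tokens)
```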

# 4. Initialize and train the Word2Vec model
#### Key parameters
- `sentences`: *the text corpus, an iterable of sentences, where each sentence is a list of words.*
- `vector_size`: *the dimensionality of the word vectors.*
- `window`: *the context window size, i.e. the maximum distance between the current word and a predicted word within a sentence.*
- `min_count`: *the minimum frequency; words that occur fewer times than this in the corpus are ignored.*
- `sg`: *the training algorithm; `sg=1` selects Skip-Gram and `sg=0` selects CBOW (Continuous Bag of Words).*
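
The effect of `min_count` is easy to underestimate on a small corpus. The sketch below mimics the frequency filter in plain Python on the three sample sentences; it only illustrates the idea and is not gensim's implementation. The helper `build_vocab` and the variable names are hypothetical.

```python
from collections import Counter

sentences = [
    ['hello', 'world'],
    ['i', 'love', 'natural', 'language', 'processing'],
    ['hello', 'from', 'the', 'other', 'side'],
]

def build_vocab(sentences, min_count=1):
    """Keep only the words that occur at least `min_count` times in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, count in counts.items() if count >= min_count}

vocab_all = build_vocab(sentences, min_count=1)
vocab_frequent = build_vocab(sentences, min_count=2)

print(len(vocab_all), "words kept with min_count=1")
print(sorted(vocab_frequent), "kept with min_count=2")
```

In this corpus only `hello` appears more than once, so any `min_count` above 1 would discard almost the entire vocabulary. For a toy dataset like this one, `min_count=1` is the sensible choice.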


