# What is Gensim?
Gensim is an open-source Python library for <b>unsupervised topic modeling</b> and <b>natural language processing</b>, using modern statistical machine learning.

# What is Word2Vec?
Word2Vec is a model that transforms words into vectors of numbers (word embeddings), capturing their semantic meaning. It trains a shallow neural network to learn relationships between words from the contexts in which they are used in a large corpus.

There are two main architectures:
1. <b>CBOW (Continuous Bag of Words)</b> – Predicts a word from its surrounding context words.
2. <b>Skip-Gram</b> – Predicts surrounding context words given a word.
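
To make the difference concrete, here is a minimal pure-Python sketch of the training examples each architecture would see for one sentence. The helper names *skipgram_pairs* and *cbow_pairs* are hypothetical and only show the input/output structure. Real implementations also do things this sketch leaves out, such as negative sampling.

```python
def skipgram_pairs(tokens, window):
    """Yield (center, context) pairs: the center word predicts each neighbor."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def cbow_pairs(tokens, window):
    """Yield (context_words, target) pairs: the neighbors jointly predict the center word."""
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        yield context, target


sentence = ['i', 'love', 'machine', 'learning']
skipgram = list(skipgram_pairs(sentence, window=1))
cbow = list(cbow_pairs(sentence, window=1))

print(skipgram[:3])  # [('i', 'love'), ('love', 'i'), ('love', 'machine')]
print(cbow[1])       # (['i', 'machine'], 'love')
```

Skip-Gram produces one example per (word, neighbor) pair, while CBOW produces one example per word position.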

## Why Gensim for Word2Vec?
* It's optimized and fast
* Easy to train and save/load models
* Preprocessing and tokenization support
* Scalable to large corpora

# How to Use Word2Vec with Gensim
1. Install Gensim
   * pip install gensim
  
2. Prepare the Data

   
   You need a list of tokenized sentences, that is, one list of string tokens per sentence:
   


In [None]:
from gensim.models import Word2Vec

sentences = [
    ['i', 'love', 'machine', 'learning'],
    ['deep', 'learning', 'is', 'fun'],
    ['i', 'love', 'natural', 'language', 'processing']
]

3. Train the Word2Vec Model

In [1]:
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)



## Parameters:

* vector_size: Dimensionality of the word vectors

* window: Maximum distance between the current and predicted word

* min_count: Ignores words with total frequency lower than this

* workers: Number of CPU threads

4. Save and Load the Model

In [None]:
# Save model
model.save("word2vec.model")

# Load model
model = Word2Vec.load("word2vec.model")


5. Use the Model

In [None]:
# Get Word Vector
vector = model.wv['machine']

# Similar Words
model.wv.most_similar('learning', topn=5)

# Word Similarity
model.wv.similarity('machine', 'learning')

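# Analogy (needs a model trained on a corpus that contains these words;
# with the toy sentences above this raises KeyError)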
model.wv.most_similar(positive=['king', 'woman'], negative=['man'])


# Tokenization Tip
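Preprocess your text before training. *simple_preprocess* lowercases the text, splits it into tokens, and drops tokens that are too short or too long (by default shorter than 2 or longer than 15 characters):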

In [None]:
from gensim.utils import simple_preprocess

text = "I love data science and machine learning!"
tokens = simple_preprocess(text)
# ['love', 'data', 'science', 'and', 'machine', 'learning']


## Visualizing Word Embeddings
Use *TSNE* from *sklearn.manifold* to reduce the word vectors to two dimensions, then plot them.