
***Word2Vec***: a shallow neural network model that maps each word to a dense numeric vector, so that words used in similar contexts end up close together in vector space.


In [None]:
from gensim.models import Word2Vec

In [None]:
#sample sentences
corpus = [
    "I like Deep Learning",
    "I like NLP",
    "Deep learning likes Data",
    "NLP likes Machine Learning"
]

#preprocessing: lowercase each sentence and split it on whitespace
space = [sentence.lower().split() for sentence in corpus]
space

[['i', 'like', 'deep', 'learning'],
 ['i', 'like', 'nlp'],
 ['deep', 'learning', 'likes', 'data'],
 ['nlp', 'likes', 'machine', 'learning']]

**CBOW**: “Given the words around it, guess the middle one.”
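
To make the CBOW idea concrete, here is a minimal sketch of how (context, target) training examples could be formed from one tokenized sentence with `window=2`. The helper name `cbow_pairs` is hypothetical and is not part of gensim; real implementations add details such as negative sampling that are left out here.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for one tokenized sentence.

    Illustrative helper only (not a gensim function).
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after
        pairs.append((left + right, target))
    return pairs


sentence = ["i", "like", "deep", "learning"]
for context, target in cbow_pairs(sentence):
    print(context, "->", target)
# ['like', 'deep'] -> i
# ['i', 'deep', 'learning'] -> like
# ['i', 'like', 'learning'] -> deep
# ['like', 'deep'] -> learning
```

Each line shows the surrounding words the model would see and the middle word it would be asked to guess.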

In [None]:
#cbow model (sg=0 means CBOW)
cbow_model = Word2Vec(
    sentences=space,
    vector_size=10,  #embedding dimension
    window=2,       #context window size
    min_count=1,    #ignore words that appear fewer times than this
    sg=0,
    epochs=100
)

**Skip-Gram**: Predict the surrounding words given the center word.
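
As a rough illustration of the Skip-Gram direction, the sketch below lists the (center, context) pairs one tokenized sentence would yield with `window=2`. The helper name `skipgram_pairs` is hypothetical and not a gensim function, and the sketch only shows the pairing, not the training itself.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center_word, context_word) pairs for one tokenized sentence.

    Illustrative helper only (not a gensim function).
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:                            # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


for center, context in skipgram_pairs(["i", "like", "nlp"]):
    print(center, "->", context)
# i -> like
# i -> nlp
# like -> i
# like -> nlp
# nlp -> i
# nlp -> like
```

Here every center word produces one training example per neighbor, which is the reverse of CBOW's many-context-words-to-one-target setup.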

In [None]:
#skipgram model (sg=1 means Skip-Gram)
sg_model = Word2Vec(
    sentences=space,
    vector_size=10, #embedding dimension
    window=2,       #context window size
    min_count=1,    #ignore words that appear fewer times than this
    sg=1,
    epochs=100
)

In [None]:
#get vector for a word
print("Vector for 'deep': \n", cbow_model.wv['deep'])

#find similar words
print("\nMost similar to 'deep':")
print(cbow_model.wv.most_similar('deep'))


Vector for 'deep': 
 [-0.07520349 -0.00928309  0.09546158 -0.07325868 -0.02334496 -0.01941634
  0.08086929 -0.05935739  0.00043506 -0.04756866]

Most similar to 'deep':
[('data', 0.2948375642299652), ('learning', 0.10603682696819305), ('i', 0.09316898137331009), ('nlp', -0.10483908653259277), ('machine', -0.11371947079896927), ('likes', -0.20980709791183472), ('like', -0.36572766304016113)]
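
The scores returned by `most_similar` are cosine similarities, which range from -1 (opposite direction) to 1 (same direction). The sketch below reproduces that ranking idea in plain Python. The 3-dimensional `toy_vectors` are made up for illustration and are not the trained embeddings, and `rank_by_similarity` is a hypothetical helper rather than gensim's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vectors, invented for this example only
toy_vectors = {
    "deep": [1.0, 0.0, 0.0],
    "data": [0.8, 0.6, 0.0],
    "nlp": [0.0, 1.0, 0.0],
    "like": [-1.0, 0.0, 0.0],
}


def rank_by_similarity(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_similarity("deep", toy_vectors)
print([word for word, _ in ranked])
# ['data', 'nlp', 'like']
```

On a corpus as small as the one used here, the scores are likely dominated by random initialization rather than meaning, so the ranking above should not be read as a real semantic result.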


Now save and load the model.

In [None]:
#save
cbow_model.save('cbow_model.model')
sg_model.save('sg_model.model')

#load
new_model = Word2Vec.load("cbow_model.model")
print("\nLoaded model vocab:", list(new_model.wv.index_to_key))


Loaded model vocab: ['learning', 'likes', 'nlp', 'deep', 'like', 'i', 'machine', 'data']


Conclusion:
1. Word2Vec converts words into numerical vectors that capture relationships between them.

2. CBOW predicts a target word from its context, while Skip-Gram predicts context words from a target.

3. Embeddings like these are a common building block of NLP models, giving them a numerical handle on semantics, similarity, and context.

In short: Word2Vec turns plain text into vectors that models can compare and compute with!