In [19]:
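import re

import nltk
from gensim.models import Word2Vec
from nltk.corpus import stopwords

# First run only: download the sentence tokenizer model and the stop word list
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
# nltk.download('punkt'); nltk.download('stopwords')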


In [5]:
paragraph = """Thank you very much. Good afternoon. I am
honored to be in the timeless city of Cairo, and to be hosted by two
remarkable institutions. For over a thousand years, Al-Azhar has stood
as a beacon of Islamic learning; and for over a century, Cairo University
has been a source of Egypt's advancement. And together, you
represent the harmony between tradition and progress. I'm grateful for
your hospitality, and the hospitality of the people of Egypt. And I'm also
proud to carry with me the goodwill of the American people, and a
greeting of peace from Muslim communities in my country: Assalaamu
alaykum. (Applause.)We meet at a time of great tension between the United States and
Muslims around the world -- tension rooted in historical forces that go
beyond any current policy debate. The relationship between Islam and
the West includes centuries of coexistence and cooperation, but also
conflict and religious wars. More recently, tension has been fed by
colonialism that denied rights and opportunities to many Muslims, and a
Cold War in which Muslim-majority countries were too often treated as
proxies without regard to their own aspirations. Moreover, the sweeping
change brought by modernity and globalization led many Muslims to
view the West as hostile to the traditions of Islam.Violent extremists have exploited these tensions in a small but potent
minority of Muslims. The attacks of September 11, 2001 and the
continued efforts of these extremists to engage in violence against
civilians has led some in my country to view Islam as inevitably hostile
not only to America and Western countries, but also to human rights.
All this has bred more fear and more mistrust.
So long as our relationship is defined by our differences, we will
empower those who sow hatred rather than peace, those who promote
conflict rather than the cooperation that can help all of our people
achieve justice and prosperity. And this cycle of suspicion and discord
must end.I've come here to Cairo to seek a new beginning between the United
States and Muslims around the world, one based on mutual interest and 
mutual respect, and one based upon the truth that America and Islam
are not exclusive and need not be in competition. Instead, they overlap,
and share common principles -- principles of justice and progress;
tolerance and the dignity of all human beings.I do so recognizing that change cannot happen overnight. I know
there's been a lot of publicity about this speech, but no single speech
can eradicate years of mistrust, nor can I answer in the time that I have
this afternoon all the complex questions that brought us to this point.
But I am convinced that in order to move forward, we must say openly
to each other the things we hold in our hearts and that too often are said
only behind closed doors. There must be a sustained effort to listen to
each other; to learn from each other; to respect one another; and to
seek common ground. As the Holy Koran tells us, "Be conscious of
God and speak always the truth." (Applause.) That is what I will try to
do today -- to speak the truth as best I can, humbled by the task before
us, and firm in my belief that the interests we share as human beings
are far more powerful than the forces that drive us apart.
"""

In [9]:
# Cleaning and preprocessing the data

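# Remove bracketed numeric reference markers such as [12]
text = re.sub(r'\[[0-9]*\]', ' ', paragraph)
# Lowercase so that capitalized and lowercase forms share one token
text = text.lower()
# Replace digits with spaces, then collapse every whitespace run into one space
text = re.sub(r'\d', ' ', text)
text = re.sub(r'\s+', ' ', text)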
text

'thank you very much. good afternoon. i am honored to be in the timeless city of cairo, and to be hosted by two remarkable institutions. for over a thousand years, al-azhar has stood as a beacon of islamic learning; and for over a century, cairo university has been a source of egypt\'s advancement. and together, you represent the harmony between tradition and progress. i\'m grateful for your hospitality, and the hospitality of the people of egypt. and i\'m also proud to carry with me the goodwill of the american people, and a greeting of peace from muslim communities in my country: assalaamu alaykum. (applause.)we meet at a time of great tension between the united states and muslims around the world -- tension rooted in historical forces that go beyond any current policy debate. the relationship between islam and the west includes centuries of coexistence and cooperation, but also conflict and religious wars. more recently, tension has been fed by colonialism that denied rights and opp
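
The cleaning steps above can be collected into one reusable function. The sketch below is only an illustration: the name `clean_text` and the sample string are made up for this example and are not part of the original notebook.

```python
import re

def clean_text(raw: str) -> str:
    """Apply the notebook's cleaning steps to an arbitrary string."""
    text = re.sub(r'\[[0-9]*\]', ' ', raw)  # drop bracketed markers such as [12]
    text = text.lower()                      # normalize case
    text = re.sub(r'\d', ' ', text)          # replace each digit with a space
    text = re.sub(r'\s+', ' ', text)         # collapse newlines and repeated spaces
    return text

# Hypothetical sample input, chosen to exercise every step
sample = "Sample Text [12]\nwith 3 digits.  And   extra spaces."
cleaned = clean_text(sample)
print(cleaned)  # sample text with digits. and extra spaces.
```

Note that sentence-ending periods are kept on purpose, because `nltk.sent_tokenize` relies on them in the next step.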

#### Split the paragraph into sentences, tokenize each sentence into words, and remove all stopwords

In [22]:
sentences = nltk.sent_tokenize(text)
sentences = [nltk.word_tokenize(sentence) for sentence in sentences]

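# Build the stop word set once; calling stopwords.words('english') for every token is slow
stop_words = set(stopwords.words('english'))

for i in range(len(sentences)):
    sentences[i] = [word for word in sentences[i] if word not in stop_words]
    print(sentences[i])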



['thank', 'much', '.']
['good', 'afternoon', '.']
['honored', 'timeless', 'city', 'cairo', ',', 'hosted', 'two', 'remarkable', 'institutions', '.']
['thousand', 'years', ',', 'al-azhar', 'stood', 'beacon', 'islamic', 'learning', ';', 'century', ',', 'cairo', 'university', 'source', 'egypt', "'s", 'advancement', '.']
['together', ',', 'represent', 'harmony', 'tradition', 'progress', '.']
["'m", 'grateful', 'hospitality', ',', 'hospitality', 'people', 'egypt', '.']
["'m", 'also', 'proud', 'carry', 'goodwill', 'american', 'people', ',', 'greeting', 'peace', 'muslim', 'communities', 'country', ':', 'assalaamu', 'alaykum', '.']
['(', 'applause', '.']
[')', 'meet', 'time', 'great', 'tension', 'united', 'states', 'muslims', 'around', 'world', '--', 'tension', 'rooted', 'historical', 'forces', 'go', 'beyond', 'current', 'policy', 'debate', '.']
['relationship', 'islam', 'west', 'includes', 'centuries', 'coexistence', 'cooperation', ',', 'also', 'conflict', 'religious', 'wars', '.']
['recently'
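
The output above still contains punctuation tokens such as `'.'`, `','` and `'--'`, which will end up in the Word2Vec vocabulary. One possible way to drop them is sketched below. This is an optional extra step, not something the notebook does; `filter_tokens` and the tiny `STOP_WORDS` set are hypothetical stand-ins so the example runs without NLTK data.

```python
# Hypothetical mini stop list for illustration only; the notebook uses
# NLTK's much longer stopwords.words('english') list.
STOP_WORDS = {"i", "am", "to", "be", "in", "the", "of", "and", "by", "you", "very"}

def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Keep tokens that are not stop words and contain at least one letter or digit."""
    return [
        tok for tok in tokens
        if tok not in stop_words and any(ch.isalnum() for ch in tok)
    ]

first = filter_tokens(['thank', 'you', 'very', 'much', '.'])
print(first)   # ['thank', 'much']

second = filter_tokens(['(', 'applause', '.', ')', 'al-azhar', "'s", '--'])
print(second)  # ['applause', 'al-azhar', "'s"]
```

Hyphenated words such as `'al-azhar'` survive this filter, and so do contraction fragments such as `"'s"` and `"'m"`, which would need a separate rule if they are unwanted.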

#### Train the Word2Vec model

In [20]:
model = Word2Vec(sentences, min_count = 1)
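
`min_count=1` tells Word2Vec to keep every token, even one that occurs only once. The gensim default is `min_count=5`, which would discard most of a corpus this small. The sketch below only illustrates the idea of a frequency threshold using the standard library; it is not gensim's implementation, and `build_vocab` and the toy sentences are made up for this example.

```python
from collections import Counter

def build_vocab(tokenized_sentences, min_count=1):
    """Count every token and keep those seen at least min_count times."""
    counts = Counter(word for sent in tokenized_sentences for word in sent)
    return {word: n for word, n in counts.items() if n >= min_count}

# Hypothetical toy corpus in the same list-of-token-lists shape as `sentences`
toy = [['cairo', 'university'], ['cairo', 'egypt'], ['people', 'egypt', 'cairo']]

vocab_all = build_vocab(toy, min_count=1)
print(vocab_all)       # {'cairo': 3, 'university': 1, 'egypt': 2, 'people': 1}

vocab_frequent = build_vocab(toy, min_count=2)
print(vocab_frequent)  # {'cairo': 3, 'egypt': 2}
```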

In [24]:
# model.wv.vocab exists in gensim < 4.0; gensim 4.x replaced it with model.wv.key_to_index
words = model.wv.vocab
words

{'thank': <gensim.models.keyedvectors.Vocab at 0x11e11765978>,
 'much': <gensim.models.keyedvectors.Vocab at 0x11e1176cf28>,
 '.': <gensim.models.keyedvectors.Vocab at 0x11e1cf6d080>,
 'good': <gensim.models.keyedvectors.Vocab at 0x11e1cf6de48>,
 'afternoon': <gensim.models.keyedvectors.Vocab at 0x11e1cf6df60>,
 'honored': <gensim.models.keyedvectors.Vocab at 0x11e1cf6ddd8>,
 'timeless': <gensim.models.keyedvectors.Vocab at 0x11e1cf6dda0>,
 'city': <gensim.models.keyedvectors.Vocab at 0x11e1cf61048>,
 'cairo': <gensim.models.keyedvectors.Vocab at 0x11e1cf610f0>,
 ',': <gensim.models.keyedvectors.Vocab at 0x11e1cf61278>,
 'hosted': <gensim.models.keyedvectors.Vocab at 0x11e1cf612b0>,
 'two': <gensim.models.keyedvectors.Vocab at 0x11e1cf612e8>,
 'remarkable': <gensim.models.keyedvectors.Vocab at 0x11e1cf61320>,
 'institutions': <gensim.models.keyedvectors.Vocab at 0x11e1cf61438>,
 'thousand': <gensim.models.keyedvectors.Vocab at 0x11e1cf614e0>,
 'years': <gensim.models.keyedvectors.Vocab

In [29]:
vector = model.wv['interest']
vector

array([ 0.00278227,  0.00227593,  0.00111605,  0.00319227, -0.00330121,
        0.00406355, -0.00219051,  0.00179917, -0.00034407,  0.00470973,
        0.00065543, -0.0033708 , -0.00031873,  0.00313567, -0.00355444,
        0.00492666, -0.00036784,  0.00097425,  0.00052797,  0.00317564,
       -0.00342624,  0.00023264, -0.00228235,  0.00221898,  0.00423632,
        0.00294869, -0.00244206,  0.0003388 ,  0.00452181, -0.00148123,
       -0.00287035,  0.00287213, -0.00456662,  0.00084645, -0.00167484,
        0.00424542, -0.00457176, -0.00433015,  0.00179486,  0.00347766,
       -0.00093244,  0.00100697,  0.00337595,  0.00342346,  0.0021079 ,
        0.00222245,  0.00170007, -0.00359065,  0.0024511 , -0.00227853,
       -0.00492469, -0.00446944,  0.002259  ,  0.0021892 , -0.00171673,
       -0.00154679,  0.00461425,  0.00021184,  0.00050761, -0.00491458,
        0.00435889,  0.00103254, -0.0010591 , -0.00200613,  0.00062177,
       -0.00234018,  0.003921  , -0.00114799,  0.00276635,  0.00

In [28]:
# Ten words whose vectors are closest to 'interest' by cosine similarity
similar = model.wv.most_similar('interest')
similar

[('denied', 0.3408757746219635),
 ('recently', 0.2368129938840866),
 ('principles', 0.2165326327085495),
 ('world', 0.21257524192333221),
 ('islamic', 0.21135717630386353),
 ('proud', 0.21067407727241516),
 ('cooperation', 0.20402562618255615),
 ('beyond', 0.20036834478378296),
 ('mutual', 0.19747115671634674),
 ('exploited', 0.19251707196235657)]
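
The scores returned by `most_similar` are cosine similarities between word vectors. The sketch below shows that computation on hand-made two-dimensional vectors; the helper names and the toy vectors are assumptions for illustration, and it is not how gensim implements the lookup internally.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, topn=3):
    """Rank every other word by cosine similarity to the query word."""
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 2-d vectors; real Word2Vec vectors here have many more dimensions
toy_vectors = {
    'interest': [1.0, 0.0],
    'mutual':   [1.0, 1.0],
    'respect':  [0.0, 1.0],
    'conflict': [-1.0, 0.0],
}

result = most_similar('interest', toy_vectors)
for word, score in result:
    print(word, round(score, 4))
```

In the notebook's output the top score is only about 0.34. With a corpus of a few dozen sentences the model has very little to learn from, so these rankings are probably closer to noise than to meaningful semantic neighbors.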