In [56]:
paragraph="""Probably the most effective way to achieve paragraph unity is to express the central idea of the paragraph in a 
topic sentence. Topic sentences are similar to mini thesis statements. Like a thesis statement, a topic sentence has a specific
main point. Whereas the thesis is the main point of the essay, the topic sentence is the main point of the paragraph. Like the 
thesis statement, a topic sentence has a unifying function. But a thesis statement or topic sentence alone doesn’t guarantee 
unity. An essay is unified if all the paragraphs relate to the thesis, whereas a paragraph is unified if all the sentences 
relate to the topic sentence. Note: Not all paragraphs need topic sentences. In particular, opening and closing paragraphs, 
which serve different functions from body paragraphs, generally don’t have topic sentences.
In academic writing, the topic sentence nearly always works best at the beginning of a paragraph so that the reader knows what 
to expect:
The embrace of Twitter by politicians and journalists has been one of its most notable features in recent years: for both groups
the use of Twitter is becoming close to a requirement. —Paul Bernal, “A Defence of Responsible Tweeting”
This topic sentence forecasts the central idea or main point of the paragraph: “politicians” and “journalists” rely on Twitter.
The rest of the paragraph will focus on these two Twitter-user groups, thereby fulfilling the promise made by the topic sentence.
By avoiding irrelevant information that does not relate to the topic sentence, you can compose a unified paragraph."""

In [57]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
wordnet=WordNetLemmatizer()

In [62]:
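# Replace every non-letter character with a space, then lowercase the text.
# Note: this also removes sentence punctuation and splits contractions ("doesn't" becomes "doesn" and "t").
text=re.sub('[^a-zA-Z]', ' ', paragraph)
text=text.lower()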
text

'probably the most effective way to achieve paragraph unity is to express the central idea of the paragraph in a  topic sentence  topic sentences are similar to mini thesis statements  like a thesis statement  a topic sentence has a specific main point  whereas the thesis is the main point of the essay  the topic sentence is the main point of the paragraph  like the  thesis statement  a topic sentence has a unifying function  but a thesis statement or topic sentence alone doesn t guarantee  unity  an essay is unified if all the paragraphs relate to the thesis  whereas a paragraph is unified if all the sentences  relate to the topic sentence  note  not all paragraphs need topic sentences  in particular  opening and closing paragraphs   which serve different functions from body paragraphs  generally don t have topic sentences  in academic writing  the topic sentence nearly always works best at the beginning of a paragraph so that the reader knows what  to expect  the embrace of twitter b
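
Because the cleaning step replaces periods with spaces, a sentence tokenizer applied afterwards has no boundaries left to find. One possible alternative, sketched below under stated assumptions, is to split into sentences first and clean each sentence separately. The `sample` string is a hypothetical short excerpt, and the regex split is only a crude stand-in for `nltk.sent_tokenize` so that the sketch runs without downloading NLTK data.

```python
import re

# Hypothetical short sample; the notebook's `paragraph` variable would be used instead.
sample = ("Topic sentences are similar to mini thesis statements. "
          "Like a thesis statement, a topic sentence has a specific main point. "
          "Not all paragraphs need topic sentences.")

# Split on sentence-ending punctuation first
# (a crude stand-in for nltk.sent_tokenize, assumed here for illustration).
raw_sentences = [s for s in re.split(r'(?<=[.!?])\s+', sample.strip()) if s]

# Then clean each sentence on its own: keep letters only, lowercase, split into words.
tokenized = [re.sub('[^a-zA-Z]', ' ', s).lower().split() for s in raw_sentences]

print(len(tokenized))   # number of sentences found
print(tokenized[0])     # tokens of the first sentence
```

With this ordering, each inner list holds the words of one sentence, which is the list-of-token-lists shape that Word2Vec expects as input.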

In [63]:
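# Convert the paragraph into sentences, then each sentence into words.
# Note: the cleaning step above replaced all punctuation with spaces, so sent_tokenize
# finds no sentence boundaries here and returns the whole text as a single sentence.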
sentences=nltk.sent_tokenize(text)
sentences=[nltk.word_tokenize(i) for i in sentences]
sentences

[['probably',
  'the',
  'most',
  'effective',
  'way',
  'to',
  'achieve',
  'paragraph',
  'unity',
  'is',
  'to',
  'express',
  'the',
  'central',
  'idea',
  'of',
  'the',
  'paragraph',
  'in',
  'a',
  'topic',
  'sentence',
  'topic',
  'sentences',
  'are',
  'similar',
  'to',
  'mini',
  'thesis',
  'statements',
  'like',
  'a',
  'thesis',
  'statement',
  'a',
  'topic',
  'sentence',
  'has',
  'a',
  'specific',
  'main',
  'point',
  'whereas',
  'the',
  'thesis',
  'is',
  'the',
  'main',
  'point',
  'of',
  'the',
  'essay',
  'the',
  'topic',
  'sentence',
  'is',
  'the',
  'main',
  'point',
  'of',
  'the',
  'paragraph',
  'like',
  'the',
  'thesis',
  'statement',
  'a',
  'topic',
  'sentence',
  'has',
  'a',
  'unifying',
  'function',
  'but',
  'a',
  'thesis',
  'statement',
  'or',
  'topic',
  'sentence',
  'alone',
  'doesn',
  't',
  'guarantee',
  'unity',
  'an',
  'essay',
  'is',
  'unified',
  'if',
  'all',
  'the',
  'paragraphs',
  '

In [64]:
# Remove the stopwords (build the stopword set once instead of once per word):
stop_words=set(stopwords.words('english'))
for i in range(0,len(sentences)):
    sentences[i]=[word for word in sentences[i] if word not in stop_words]
sentences

[['probably',
  'effective',
  'way',
  'achieve',
  'paragraph',
  'unity',
  'express',
  'central',
  'idea',
  'paragraph',
  'topic',
  'sentence',
  'topic',
  'sentences',
  'similar',
  'mini',
  'thesis',
  'statements',
  'like',
  'thesis',
  'statement',
  'topic',
  'sentence',
  'specific',
  'main',
  'point',
  'whereas',
  'thesis',
  'main',
  'point',
  'essay',
  'topic',
  'sentence',
  'main',
  'point',
  'paragraph',
  'like',
  'thesis',
  'statement',
  'topic',
  'sentence',
  'unifying',
  'function',
  'thesis',
  'statement',
  'topic',
  'sentence',
  'alone',
  'guarantee',
  'unity',
  'essay',
  'unified',
  'paragraphs',
  'relate',
  'thesis',
  'whereas',
  'paragraph',
  'unified',
  'sentences',
  'relate',
  'topic',
  'sentence',
  'note',
  'paragraphs',
  'need',
  'topic',
  'sentences',
  'particular',
  'opening',
  'closing',
  'paragraphs',
  'serve',
  'different',
  'functions',
  'body',
  'paragraphs',
  'generally',
  'topic',
  'sen

In [76]:
from gensim.models import Word2Vec

In [77]:
model = Word2Vec(sentences, min_count=1)  # min_count=1 keeps every word; words occurring fewer than min_count times are ignored
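
To make the effect of `min_count` concrete, here is a minimal sketch of frequency-based vocabulary filtering in plain Python. It is not gensim's implementation; `toy_sentences` and `filter_vocab` are hypothetical names introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus in the same list-of-token-lists shape as `sentences`.
toy_sentences = [
    ['topic', 'sentence', 'main', 'point'],
    ['thesis', 'statement', 'main', 'point'],
    ['topic', 'sentence', 'unity'],
]

def filter_vocab(token_lists, min_count):
    """Return the words whose total frequency is at least min_count."""
    counts = Counter(word for tokens in token_lists for word in tokens)
    return {word for word, n in counts.items() if n >= min_count}

vocab_all = filter_vocab(toy_sentences, min_count=1)       # every word survives
vocab_frequent = filter_vocab(toy_sentences, min_count=2)  # words seen once are dropped

print(sorted(vocab_all))
print(sorted(vocab_frequent))
```

On a corpus as small as a single paragraph, a higher threshold would discard most of the vocabulary, which is presumably why `min_count=1` is used here.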

In [80]:
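# Inspect the learned vocabulary.
# Note: `model.wv.vocab` is the gensim 3.x attribute; gensim 4.0+ replaced it with `model.wv.key_to_index`.
words=model.wv.vocab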
words

{'probably': <gensim.models.keyedvectors.Vocab at 0x272bf1d47c8>,
 'effective': <gensim.models.keyedvectors.Vocab at 0x272bf254bc8>,
 'way': <gensim.models.keyedvectors.Vocab at 0x272bf254288>,
 'achieve': <gensim.models.keyedvectors.Vocab at 0x272bf254908>,
 'paragraph': <gensim.models.keyedvectors.Vocab at 0x272bf254f08>,
 'unity': <gensim.models.keyedvectors.Vocab at 0x272bf254348>,
 'express': <gensim.models.keyedvectors.Vocab at 0x272bf254d08>,
 'central': <gensim.models.keyedvectors.Vocab at 0x272bf254c08>,
 'idea': <gensim.models.keyedvectors.Vocab at 0x272bf254d88>,
 'topic': <gensim.models.keyedvectors.Vocab at 0x272bf254988>,
 'sentence': <gensim.models.keyedvectors.Vocab at 0x272bf254ac8>,
 'sentences': <gensim.models.keyedvectors.Vocab at 0x272bf24c908>,
 'similar': <gensim.models.keyedvectors.Vocab at 0x272bf24ca08>,
 'mini': <gensim.models.keyedvectors.Vocab at 0x272bf24cb88>,
 'thesis': <gensim.models.keyedvectors.Vocab at 0x272bf24c608>,
 'statements': <gensim.models.ke

In [83]:
# Look up the vector learned for a particular word:
vector=model.wv['central']
vector

array([ 5.2671699e-04,  1.6785746e-04,  3.0443314e-03,  4.5509236e-03,
       -5.3872936e-04,  1.6720730e-03,  2.0828734e-04,  1.2814670e-03,
        4.3203286e-03,  1.4692486e-03, -2.5705684e-03, -3.8945696e-03,
       -2.0221067e-03,  1.6534737e-03,  4.5239431e-04, -4.7686958e-04,
       -4.2808377e-03, -3.6210059e-03, -1.2330529e-03,  1.9387755e-03,
       -3.0960115e-03,  7.6036371e-04, -4.8247357e-03,  1.9486472e-03,
       -3.1785364e-03, -4.9568230e-04, -1.1967772e-03, -2.0714903e-03,
       -1.9724548e-03, -9.9403132e-04,  2.8624453e-03, -1.1177268e-03,
       -1.7071258e-03, -1.6949495e-03,  3.2527836e-03,  2.1800648e-03,
       -3.6508376e-03,  4.9968353e-03, -4.2304057e-03, -3.4308855e-03,
       -2.0145492e-04,  2.9220851e-03, -2.3648178e-03,  4.5922664e-03,
        3.2257652e-03,  1.7874800e-03, -2.2921546e-03,  1.7844295e-03,
       -1.6672607e-03, -4.5956727e-03,  4.5379335e-03,  2.0191055e-03,
        3.0854498e-03,  2.2059630e-03,  1.8084837e-03,  3.3854358e-03,
      

In [87]:
# Find the words most similar to a given word (ranked by cosine similarity):
similar_word=model.wv.most_similar('central')
similar_word

[('forecasts', 0.27768009901046753),
 ('function', 0.24482116103172302),
 ('different', 0.2321092039346695),
 ('irrelevant', 0.18512892723083496),
 ('knows', 0.184354767203331),
 ('twitter', 0.1798315942287445),
 ('becoming', 0.16909268498420715),
 ('paul', 0.168805330991745),
 ('whereas', 0.16762787103652954),
 ('sentence', 0.15543414652347565)]
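
The scores returned by `most_similar` are cosine similarities between word vectors. The sketch below shows that computation in plain Python on hypothetical two-dimensional vectors (`a`, `b`, `c` are made up for illustration and are not taken from the model).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical 2-D vectors, chosen so the results are easy to check by hand.
a = [1.0, 0.0]
b = [1.0, 1.0]
c = [0.0, 2.0]

print(cosine_similarity(a, a))  # same direction
print(cosine_similarity(a, b))  # 45 degrees apart
print(cosine_similarity(a, c))  # orthogonal
```

A score near 1 means the vectors point in nearly the same direction, and a score near 0 means they are unrelated. The low scores above (all below 0.3) are likely a sign that one paragraph is far too little text for the model to learn meaningful relationships.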