In [2]:
import nltk
from gensim.models import Word2Vec
from nltk.corpus import stopwords
import re

In [38]:
paragraph ="""Paragraphs are the building blocks of papers. Many students define paragraphs in terms of length: a paragraph is a group of at least five sentences, a paragraph is half a page long, etc. In reality, though, the unity and coherence of ideas among sentences is what constitutes a paragraph. A paragraph is defined as “a group of sentences or a single sentence that forms a unit” (Lunsford and Connors 116). Length and appearance do not determine whether a section in a paper is a paragraph. For instance, in some styles of writing, particularly journalistic styles, a paragraph can be just one sentence long. Ultimately, a paragraph is a sentence or group of sentences that support one main idea. In this handout, we will refer to this as the “controlling idea,” because it controls what happens in the rest of the paragraph.Before you can begin to determine what the composition of a particular paragraph will be, you must first decide on an argument and a working thesis statement for your paper. What is the most important idea that you are trying to convey to your reader? The information in each paragraph must be related to that idea. In other words, your paragraphs should remind your reader that there is a recurrent relationship between your thesis and the information in each paragraph. A working thesis functions like a seed from which your paper, and your ideas, will grow. The whole process is an organic one—a natural progression from a seed to a full-blown paper where there are direct, familial relationships between all of the ideas in the paper.

The decision about what to put into your paragraphs begins with the germination of a seed of ideas; this “germination process” is better known as brainstorming. There are many techniques for brainstorming; whichever one you choose, this stage of paragraph development cannot be skipped. Building paragraphs can be like building a skyscraper: there must be a well-planned foundation that supports what you are building. Any cracks, inconsistencies, or other corruptions of the foundation can cause your whole paper to crumble.

So, let’s suppose that you have done some brainstorming to develop your thesis. What else should you keep in mind as you begin to create paragraphs? Every paragraph in a paper should be:

Unified: All of the sentences in a single paragraph should be related to a single controlling idea (often expressed in the topic sentence of the paragraph).
Clearly related to the thesis: The sentences should all refer to the central idea, or thesis, of the paper (Rosen and Behrens 119).
Coherent: The sentences should be arranged in a logical manner and should follow a definite plan for development (Rosen and Behrens 119).
Well-developed: Every idea discussed in the paragraph should be adequately explained and supported through evidence and details that work together to explain the paragraph’s controlling idea (Rosen and Behrens 119).
How do I organize a paragraph?
There are many different ways to organize a paragraph. The organization you choose will depend on the controlling idea of the paragraph. Below are a few possibilities for organization, with links to brief examples:

Narration: Tell a story. Go chronologically, from start to finish. (See an example.)
Description: Provide specific details about what something looks, smells, tastes, sounds, or feels like. Organize spatially, in order of appearance, or by topic. (See an example.)
Process: Explain how something works, step by step. Perhaps follow a sequence—first, second, third. (See an example.)
Classification: Separate into groups or explain the various parts of a topic. (See an example.)
Illustration: Give examples and explain how those examples prove your point. (See the detailed example in the next section of this handout.)
5-step process to paragraph development
Let’s walk through a 5-step process for building a paragraph. For each step there is an explanation and example. Our example paragraph will be about slave spirituals, the original songs that African Americans created during slavery. The model paragraph uses illustration (giving examples) to prove its point.

Step 1. Decide on a controlling idea and create a topic sentence
Paragraph development begins with the formulation of the controlling idea. This idea directs the paragraph’s development. Often, the controlling idea of a paragraph will appear in the form of a topic sentence. In some cases, you may need more than one sentence to express a paragraph’s controlling idea. Here is the controlling idea for our “model paragraph,” expressed in a topic sentence:

Model controlling idea and topic sentence — Slave spirituals often had hidden double meanings.
Step 2. Explain the controlling idea
Paragraph development continues with an expression of the rationale or the explanation that the writer gives for how the reader should interpret the information presented in the idea statement or topic sentence of the paragraph. The writer explains his/her thinking about the main topic, idea, or focus of the paragraph. Here’s the sentence that would follow the controlling idea about slave spirituals:

Model explanation — On one level, spirituals referenced heaven, Jesus, and the soul; but on another level, the songs spoke about slave resistance.
Step 3. Give an example (or multiple examples)
Paragraph development progresses with the expression of some type of support or evidence for the idea and the explanation that came before it. The example serves as a sign or representation of the relationship established in the idea and explanation portions of the paragraph. Here are two examples that we could use to illustrate the double meanings in slave spirituals:

Model example A — For example, according to Frederick Douglass, the song “O Canaan, Sweet Canaan” spoke of slaves’ longing for heaven, but it also expressed their desire to escape to the North. Careful listeners heard this second meaning in the following lyrics: “I don’t expect to stay / Much longer here. / Run to Jesus, shun the danger. / I don’t expect to stay.”
Model example B — Slaves even used songs like “Steal Away to Jesus (at midnight)” to announce to other slaves the time and place of secret, forbidden meetings.
Step 4. Explain the example(s)
The next movement in paragraph development is an explanation of each example and its relevance to the topic sentence and rationale that were stated at the beginning of the paragraph. This explanation shows readers why you chose to use this/or these particular examples as evidence to support the major claim, or focus, in your paragraph.

Continue the pattern of giving examples and explaining them until all points/examples that the writer deems necessary have been made and explained. NONE of your examples should be left unexplained. You might be able to explain the relationship between the example and the topic sentence in the same sentence which introduced the example. More often, however, you will need to explain that relationship in a separate sentence. Look at these explanations for the two examples in the slave spirituals paragraph:

Model explanation for example A — When slaves sang this song, they could have been speaking of their departure from this life and their arrival in heaven; however, they also could have been describing their plans to leave the South and run, not to Jesus, but to the North.
Model explanation for example B — [The relationship between example B and the main idea of the paragraph’s controlling idea is clear enough without adding another sentence to explain it."""

In [48]:
#processing data
text = re.sub(r'\[0-9]*\]',' ' ,paragraph)
text = re.sub(r'\s+',' ',text)
text = text.lower()
text = re.sub(r'\d',' ',text)
text = re.sub(r'\s+',' ',text)
stop_updated =  ["...","..","!!"]

In [49]:
#preparing data set
sentences = nltk.sent_tokenize(text)

In [50]:
sentences[:5]

['paragraphs are the building blocks of papers.',
 'many students define paragraphs in terms of length: a paragraph is a group of at least five sentences, a paragraph is half a page long, etc.',
 'in reality, though, the unity and coherence of ideas among sentences is what constitutes a paragraph.',
 'a paragraph is defined as “a group of sentences or a single sentence that forms a unit” (lunsford and connors ).',
 'length and appearance do not determine whether a section in a paper is a paragraph.']

In [42]:
sentences = [nltk.word_tokenize(sentence) for sentence in sentences]


In [43]:
for i in range(len(sentences)):
    sentences[i]= [word for word in sentences[i] if word not in (stopwords.words('english') and stop_updated)]


In [44]:
sentences

[['paragraphs', 'are', 'the', 'building', 'blocks', 'of', 'papers', '.'],
 ['many',
  'students',
  'define',
  'paragraphs',
  'in',
  'terms',
  'of',
  'length',
  ':',
  'a',
  'paragraph',
  'is',
  'a',
  'group',
  'of',
  'at',
  'least',
  'five',
  'sentences',
  ',',
  'a',
  'paragraph',
  'is',
  'half',
  'a',
  'page',
  'long',
  ',',
  'etc',
  '.'],
 ['in',
  'reality',
  ',',
  'though',
  ',',
  'the',
  'unity',
  'and',
  'coherence',
  'of',
  'ideas',
  'among',
  'sentences',
  'is',
  'what',
  'constitutes',
  'a',
  'paragraph',
  '.'],
 ['a',
  'paragraph',
  'is',
  'defined',
  'as',
  '“',
  'a',
  'group',
  'of',
  'sentences',
  'or',
  'a',
  'single',
  'sentence',
  'that',
  'forms',
  'a',
  'unit',
  '”',
  '(',
  'lunsford',
  'and',
  'connors',
  ')',
  '.'],
 ['length',
  'and',
  'appearance',
  'do',
  'not',
  'determine',
  'whether',
  'a',
  'section',
  'in',
  'a',
  'paper',
  'is',
  'a',
  'paragraph',
  '.'],
 ['for',
  'instance

In [29]:
#training the word2Vec model
model = Word2Vec(sentences, min_count=1)

In [30]:
model

<gensim.models.word2vec.Word2Vec at 0x2868f7db8d0>

In [33]:
words= model.wv.vocab
words

{'paragraphs': <gensim.models.keyedvectors.Vocab at 0x2868f7dbac8>,
 'building': <gensim.models.keyedvectors.Vocab at 0x2868f7dbcf8>,
 'blocks': <gensim.models.keyedvectors.Vocab at 0x2868f7dbdd8>,
 'papers': <gensim.models.keyedvectors.Vocab at 0x2868f7ddb70>,
 '.': <gensim.models.keyedvectors.Vocab at 0x2868f7ddd30>,
 'many': <gensim.models.keyedvectors.Vocab at 0x2868f7dd470>,
 'students': <gensim.models.keyedvectors.Vocab at 0x2868f7dd0f0>,
 'define': <gensim.models.keyedvectors.Vocab at 0x2868f7dd518>,
 'terms': <gensim.models.keyedvectors.Vocab at 0x2868f7dd550>,
 'length': <gensim.models.keyedvectors.Vocab at 0x2868f7dd6d8>,
 ':': <gensim.models.keyedvectors.Vocab at 0x2868f7dd048>,
 'paragraph': <gensim.models.keyedvectors.Vocab at 0x2868f7dd160>,
 'group': <gensim.models.keyedvectors.Vocab at 0x2868f7dddd8>,
 'least': <gensim.models.keyedvectors.Vocab at 0x2868f7dd128>,
 'five': <gensim.models.keyedvectors.Vocab at 0x2868f7ddf28>,
 'sentences': <gensim.models.keyedvectors.Voca

In [35]:
#Finding vector words
vector = model.wv['terms'] # var is a word present in test
vector

array([-2.4400328e-03, -3.3724878e-03,  4.1683754e-03,  3.4917365e-03,
        4.3024141e-03, -5.0188415e-03,  1.6430558e-03, -3.5033599e-03,
       -3.7150215e-03,  1.6620752e-03,  2.7807874e-03,  5.5625773e-04,
        2.9234453e-03, -4.8002582e-03, -4.6941865e-04, -1.7345458e-03,
       -1.4890955e-03,  4.8574473e-04,  7.8776310e-04, -2.8160887e-03,
        1.3495638e-03,  2.3896273e-03,  4.0822877e-03, -1.2477842e-03,
       -4.3352186e-03, -8.5723551e-04,  4.1467487e-03,  3.2000591e-03,
        3.6623909e-03,  4.1553001e-03,  1.4210083e-03,  2.5684754e-03,
        4.5450106e-03,  1.3746636e-03, -2.5926362e-04,  3.5603896e-03,
       -8.4793166e-04,  4.6397243e-03,  8.2968135e-04,  3.0183536e-03,
        4.8604459e-03, -2.0267412e-03, -2.7917866e-03,  1.3815126e-03,
       -4.1284803e-03, -2.0573139e-03, -1.4520315e-03,  4.6578594e-03,
        8.3328475e-04,  1.2808080e-03, -2.2522880e-04, -4.0610950e-03,
       -4.2614709e-03,  4.6424870e-03, -3.0882224e-03, -1.0466165e-03,
      

In [36]:
# just similar words
similar = model.wv.most_similar('controls')

In [37]:
similar

[('process', 0.3043335974216461),
 ('could', 0.3040611743927002),
 ('lunsford', 0.2996363639831543),
 ('styles', 0.28013649582862854),
 ('explains', 0.2683500647544861),
 ('need', 0.23094502091407776),
 ('forms', 0.21792668104171753),
 ('though', 0.21217919886112213),
 ('explanations', 0.20704711973667145),
 ('start', 0.20313595235347748)]