In [13]:
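import nltk
nltk.download(["punkt", "punkt_tab", "stopwords", "wordnet"], quiet=True)  # tokenizer, stopword and WordNet data used below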

In [25]:
information = """
Abraham Lincoln was an American statesman and lawyer who served as the 16th president of the United States from 1861 until his assassination in 1865. Lincoln led the nation through the American Civil War, the country's greatest moral, cultural, constitutional, and political crisis. He succeeded in preserving the Union, abolishing slavery, bolstering the federal government, and modernizing the U.S. economy.

Lincoln was born into poverty in a log cabin and was raised on the frontier primarily in Indiana. He was self-educated and became a lawyer, Whig Party leader, Illinois state legislator, and U.S. Congressman from Illinois. In 1849, he returned to his law practice but became vexed by the opening of additional lands to slavery as a result of the Kansas–Nebraska Act. He reentered politics in 1854, becoming a leader in the new Republican Party, and he reached a national audience in the 1858 debates against Stephen Douglas. Lincoln ran for President in 1860, sweeping the North in victory. Pro-slavery elements in the South equated his success with the North's rejection of their right to practice slavery, and southern states began seceding from the union. To secure its independence, the new Confederate States fired on Fort Sumter, a U.S. fort in the South, and Lincoln called up forces to suppress the rebellion and restore the Union.

As the leader of moderate Republicans, Lincoln had to navigate a contentious array of factions with friends and opponents on both sides. War Democrats rallied a large faction of former opponents into his moderate camp, but they were countered by Radical Republicans, who demanded harsh treatment of the Southern Confederates. Anti-war Democrats (called "Copperheads") despised him, and irreconcilable pro-Confederate elements plotted his assassination. Lincoln managed the factions by exploiting their mutual enmity, by carefully distributing political patronage, and by appealing to the U.S. people. His Gettysburg Address became a historic clarion call for nationalism, republicanism, equal rights, liberty, and democracy. Lincoln scrutinized the strategy and tactics in the war effort, including the selection of generals and the naval blockade of the South's trade. He suspended habeas corpus, and he averted British intervention by defusing the Trent Affair. He engineered the end to slavery with his Emancipation Proclamation and his order that the Army protect and recruit former slaves. He also encouraged border states to outlaw slavery, and promoted the Thirteenth Amendment to the United States Constitution, which outlawed slavery across the country.

Lincoln managed his own successful re-election campaign. He sought to heal the war-torn nation through reconciliation. On April 14, 1865, just days after the war's end at Appomattox, Lincoln was attending a play at Ford's Theatre with his wife Mary when he was assassinated by Confederate sympathizer John Wilkes Booth. Lincoln is remembered as the martyr hero of the United States and he is consistently ranked as one of the greatest presidents in American history.
"""

In [26]:
sentences = nltk.sent_tokenize(information)

In [27]:
words = nltk.word_tokenize(information)
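For contrast, here is a deliberately naive sentence splitter (a hypothetical helper, not how NLTK's Punkt tokenizer works) that breaks after every `.`, `!` or `?` followed by whitespace. Punkt instead relies on learned abbreviation and sentence-start statistics. In this notebook's output it keeps "U.S. economy" inside one sentence but still splits before "Congressman", which is why `'congressman illinois'` shows up as its own entry in `corpus` further down. A sketch, assuming plain ASCII punctuation:

```python
import re

def naive_sentence_split(text: str) -> list[str]:
    # Split wherever ., ! or ? is followed by whitespace; abbreviations are not handled
    return re.split(r"(?<=[.!?])\s+", text.strip())

print(naive_sentence_split("He modernized the U.S. economy. Then he left."))
# ['He modernized the U.S.', 'economy.', 'Then he left.']
```

The naive version wrongly ends a sentence at "U.S.", which is exactly the case Punkt's abbreviation handling is meant to catch.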

In [29]:
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords

In [30]:
stemmer = PorterStemmer()

In [20]:
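stop_words = set(stopwords.words('english'))  # build the stopword set once, not once per word
stemmed_sentences = []  # write to a new list so `sentences` stays raw for the later cells
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    # NLTK stopwords are lowercase, so compare case-insensitively
    stemmed_sentences.append(' '.join(stemmer.stem(word) for word in words if word.lower() not in stop_words))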
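Stemming and lemmatization differ mainly in how they reach a base form. The two toy functions below are hypothetical illustrations, not the Porter algorithm and not WordNet: `crude_stem` chops a fixed list of suffixes and can return non-words, while `lookup_lemma` maps words through a tiny, made-up dictionary and leaves unknown words alone.

```python
# Hypothetical suffix rules, tried in order (NOT the Porter rule set)
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word: str) -> str:
    """Rule-based stemming: strip the first matching suffix if a stem of 3+ letters remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary standing in for a real lexicon such as WordNet
LEMMAS = {"states": "state", "lands": "land", "debates": "debate"}

def lookup_lemma(word: str) -> str:
    """Dictionary-based lemmatization: return the listed base form, else the word unchanged."""
    return LEMMAS.get(word, word)

for w in ["preserving", "states", "lands"]:
    print(w, crude_stem(w), lookup_lemma(w))
# preserving preserv preserving
# states stat state
# lands land land
```

The rule-based approach covers any word but may produce stems like `preserv` or `stat`; the lookup approach yields real words but only for entries it knows.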

In [31]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [32]:
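stop_words = set(stopwords.words('english'))  # build the stopword set once, not once per word
lemmatized_sentences = []  # new list; `sentences` is left untouched
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    # NLTK stopwords are lowercase, so compare case-insensitively
    lemmatized_sentences.append(' '.join(lemmatizer.lemmatize(word) for word in words if word.lower() not in stop_words))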
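Since the stemming and lemmatization cells share the same loop, one option is a small helper that returns a new list instead of overwriting `sentences` and builds nothing per word. The name `preprocess` and its parameters are hypothetical; `str.split` and `str.lower` stand in for the NLTK tokenizer and the stemmer/lemmatizer so the sketch runs without downloaded data.

```python
from collections.abc import Callable, Iterable

def preprocess(sentences: Iterable[str],
               tokenize: Callable[[str], list[str]],
               transform: Callable[[str], str],
               stop_words: set[str]) -> list[str]:
    """Return new normalized sentences; the input is not modified."""
    result = []
    for sentence in sentences:
        # Drop stopwords case-insensitively, then apply the stem/lemma function
        kept = [transform(tok) for tok in tokenize(sentence) if tok.lower() not in stop_words]
        result.append(" ".join(kept))
    return result

raw = ["He led the Union", "The war ended"]
print(preprocess(raw, str.split, str.lower, {"he", "the"}))  # ['led union', 'war ended']
print(raw)  # ['He led the Union', 'The war ended'] (unchanged)
```

With NLTK set up as above, the corresponding call would be `preprocess(sentences, nltk.word_tokenize, lemmatizer.lemmatize, set(stopwords.words('english')))`, or `stemmer.stem` in place of `lemmatizer.lemmatize`.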

In [34]:
import re

In [37]:
stop_words = set(stopwords.words('english'))  # build once instead of on every word
corpus = []
for sentence in sentences:
    info = re.sub(r'[^a-zA-Z]', ' ', sentence)  # keep letters only; digits and punctuation become spaces
    info = info.lower().split()
    info = [lemmatizer.lemmatize(word) for word in info if word not in stop_words]
    corpus.append(' '.join(info))
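The regular expression replaces every non-letter with a space, which explains fragments such as `th` (from "16th") and `u` (from "U.S.") in the corpus below: "U.S." becomes the two tokens `u` and `s`, and only `s` is in NLTK's English stopword list. A minimal sketch of just the cleaning step, with a hypothetical three-word stopword set standing in for the NLTK list and lemmatization left out:

```python
import re

def clean(text: str, stop_words: set[str]) -> str:
    # Same steps as the cell above minus lemmatization:
    # non-letters -> spaces, lowercase, split on whitespace, drop stopwords
    tokens = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return " ".join(tok for tok in tokens if tok not in stop_words)

print(clean("the 16th U.S. president", {"the", "s", "of"}))  # th u president
```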

In [38]:
corpus

['abraham lincoln american statesman lawyer served th president united state assassination',
 'lincoln led nation american civil war country greatest moral cultural constitutional political crisis',
 'succeeded preserving union abolishing slavery bolstering federal government modernizing u economy',
 'lincoln born poverty log cabin raised frontier primarily indiana',
 'self educated became lawyer whig party leader illinois state legislator u',
 'congressman illinois',
 'returned law practice became vexed opening additional land slavery result kansa nebraska act',
 'reentered politics becoming leader new republican party reached national audience debate stephen douglas',
 'lincoln ran president sweeping north victory',
 'pro slavery element south equated success north rejection right practice slavery southern state began seceding union',
 'secure independence new confederate state fired fort sumter u fort south lincoln called force suppress rebellion restore union',
 'leader moderate re

In [40]:
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=500)

In [41]:
X = cv.fit_transform(corpus).toarray()
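`CountVectorizer` builds a vocabulary from the corpus and counts how often each term occurs in each document. The sketch below approximates its defaults in plain Python (lowercasing, the token pattern `(?u)\b\w\w+\b`, alphabetically sorted columns); it is not scikit-learn's implementation and omits the `max_features=500` cap, which would keep only the most frequent terms. One consequence of the default pattern: single-letter tokens such as the `u` left over from "U.S." never become columns.

```python
import re
from collections import Counter

# Approximates CountVectorizer's default tokenizer: runs of 2+ word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(doc: str) -> list[str]:
    return TOKEN_PATTERN.findall(doc.lower())

def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (sorted vocabulary, document-term count matrix)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            row[index[term]] = count
        rows.append(row)
    return vocab, rows

vocab, counts = bag_of_words(["lincoln led nation", "lincoln born poverty u"])
print(vocab)   # ['born', 'led', 'lincoln', 'nation', 'poverty']
print(counts)  # [[0, 1, 1, 1, 0], [1, 0, 1, 0, 1]]
```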

In [42]:
X

array([[0, 1, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 1, 1],
       [0, 0, 0, ..., 0, 0, 0]])

In [43]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfv = TfidfVectorizer()

In [44]:
X_tfv = tfv.fit_transform(corpus).toarray()
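`TfidfVectorizer` reweights those counts so that terms appearing in many sentences count for less. With its default settings (`smooth_idf=True`, `norm='l2'`, raw term counts), each weight is tf × idf with idf = ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df the number of documents containing the term, and each row is then scaled to unit length. A plain-Python sketch of that computation, assuming the same default token pattern as `CountVectorizer`:

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")  # assumed default token pattern

def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    tokenized = [TOKEN_PATTERN.findall(doc.lower()) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))  # L2 normalization of the row
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows

vocab, weights = tfidf(["cat dog", "cat"])
print(vocab)       # ['cat', 'dog']
print(weights[1])  # [1.0, 0.0]
```

In the toy example "cat" occurs in both documents and gets idf 1, while "dog" occurs in only one and gets ln(1.5) + 1 ≈ 1.405, so it carries more weight in the first row.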

In [45]:
X_tfv

array([[0.        , 0.35006365, 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.33114484, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.25009011,
        0.25009011],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])