<a href="https://colab.research.google.com/github/alijablack/data-science/blob/main/Wikipedia_NLP_Sentiment_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Natural Language Processing

## Problem Statement

Use natural language processing on Wikipedia articles to identify the overall sentiment of a page and the number of authors.

## Data Collection

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


In [None]:
from textblob import TextBlob
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer

In [None]:
people_path = '/content/drive/My Drive/Copy of people_db.csv'
people_df = pd.read_csv(people_path)

## Exploratory Data Analysis

## Part 1 of Project



This dataset from DBpedia includes over 42,000 entries, one per person, with a URI, a name, and an overview text.

In [None]:
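# Without parentheses, `info` shows the bound method's repr (which embeds the frame preview below);
# call people_df.info() instead for the column and dtype summary.
people_df.info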

<bound method DataFrame.info of                                                      URI  ...                                               text
0            <http://dbpedia.org/resource/Digby_Morrell>  ...  digby morrell born 10 october 1979 is a former...
1           <http://dbpedia.org/resource/Alfred_J._Lewy>  ...  alfred j lewy aka sandy lewy graduated from un...
2            <http://dbpedia.org/resource/Harpdog_Brown>  ...  harpdog brown is a singer and harmonica player...
3      <http://dbpedia.org/resource/Franz_Rottensteiner>  ...  franz rottensteiner born in waidmannsfeld lowe...
4                   <http://dbpedia.org/resource/G-Enka>  ...  henry krvits born 30 december 1974 in tallinn ...
...                                                  ...  ...                                                ...
42781   <http://dbpedia.org/resource/Motoaki_Takenouchi>  ...  motoaki takenouchi born july 8 1967 saitama pr...
42782  <http://dbpedia.org/resource/Alan_Judge_(footb...  ...  a

Explore the first 100 rows to choose a person for the analysis.

In [None]:
people_df.head(100).T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,...,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99
URI,<http://dbpedia.org/resource/Digby_Morrell>,<http://dbpedia.org/resource/Alfred_J._Lewy>,<http://dbpedia.org/resource/Harpdog_Brown>,<http://dbpedia.org/resource/Franz_Rottensteiner>,<http://dbpedia.org/resource/G-Enka>,<http://dbpedia.org/resource/Sam_Henderson>,<http://dbpedia.org/resource/Aaron_LaCrate>,<http://dbpedia.org/resource/Trevor_Ferguson>,<http://dbpedia.org/resource/Grant_Nelson>,<http://dbpedia.org/resource/Cathy_Caruth>,<http://dbpedia.org/resource/Sophie_Crumb>,<http://dbpedia.org/resource/Jenn_Ashworth>,<http://dbpedia.org/resource/Jonathan_Hoefler>,<http://dbpedia.org/resource/Anthony_Gueterboc...,<http://dbpedia.org/resource/David_Chernushenko>,<http://dbpedia.org/resource/Joerg_Steineck>,<http://dbpedia.org/resource/Andrew_Pinsent>,<http://dbpedia.org/resource/Paddy_Dunne_(Gael...,<http://dbpedia.org/resource/Alexandros_Mouzas>,<http://dbpedia.org/resource/John_Angus_Campbell>,<http://dbpedia.org/resource/Chris_Batstone>,<http://dbpedia.org/resource/Ceiron_Thomas>,<http://dbpedia.org/resource/Adel_Sellimi>,<http://dbpedia.org/resource/Faith_Soloway>,<http://dbpedia.org/resource/Tom_Jennings>,<http://dbpedia.org/resource/Vic_Stasiuk>,<http://dbpedia.org/resource/Anthony_Caruana>,<http://dbpedia.org/resource/Ian_Mitchell_(aut...,<http://dbpedia.org/resource/Leon_Hapgood>,<http://dbpedia.org/resource/Emily_Osment>,<http://dbpedia.org/resource/Dom_Flora>,<http://dbpedia.org/resource/Ted_Hill_(mathema...,<http://dbpedia.org/resource/Mindaugas_Murza>,<http://dbpedia.org/resource/Bob_Reece>,<http://dbpedia.org/resource/Douglas_Davies>,<http://dbpedia.org/resource/Freimut_B%C3%B6rn...,<http://dbpedia.org/resource/Th%C3%BCring_Br%C...,<http://dbpedia.org/resource/Aharon_Solomons>,<http://dbpedia.org/resource/Steven_Weil>,<http://dbpedia.org/resource/Gary_Emineth>,...,<http://dbpedia.org/resource/David_Vernon_Will...,<http://dbpedia.org/resource/Guillermo_Roux>,<http://dbpedia.org/resource/William_J._Ely>,<http://dbpedia.org/resource/Alan_Roper>,<http
://dbpedia.org/resource/Bob_Havens>,<http://dbpedia.org/resource/Susan_Christie>,<http://dbpedia.org/resource/Elizabeth_Gunn_(a...,<http://dbpedia.org/resource/Bo_Hampton>,<http://dbpedia.org/resource/Milva>,<http://dbpedia.org/resource/Will_Tiao>,<http://dbpedia.org/resource/Quintin_E._Primo_...,<http://dbpedia.org/resource/Geoffrey_Bayldon>,<http://dbpedia.org/resource/Ulf-Dietrich_Reips>,<http://dbpedia.org/resource/Marcel_J._Melan%C...,<http://dbpedia.org/resource/Mike_Powell_(jour...,<http://dbpedia.org/resource/Vladimir_Yurchenko>,<http://dbpedia.org/resource/Nobuo_Uematsu>,<http://dbpedia.org/resource/John_Warner_(writ...,<http://dbpedia.org/resource/Doug_McIntosh>,<http://dbpedia.org/resource/Avi_Muchnick>,<http://dbpedia.org/resource/Robin_MacPherson>,<http://dbpedia.org/resource/Judi_Silvano>,<http://dbpedia.org/resource/Evelin_Lindner>,<http://dbpedia.org/resource/Robert_S._Gold>,<http://dbpedia.org/resource/Agnes_Baltsa>,<http://dbpedia.org/resource/Zvonimir_Juri%C4%87>,<http://dbpedia.org/resource/John-Paul_Himka>,<http://dbpedia.org/resource/Elisha_Qimron>,<http://dbpedia.org/resource/Robert_Grant_Irving>,<http://dbpedia.org/resource/Sam_Leach_(artist)>,<http://dbpedia.org/resource/Patricia_Crowther...,<http://dbpedia.org/resource/Katja_Herbers>,<http://dbpedia.org/resource/Shaka_Hislop>,<http://dbpedia.org/resource/Martin_Iveson>,<http://dbpedia.org/resource/Eva_Habermann>,<http://dbpedia.org/resource/Steve_Castle>,<http://dbpedia.org/resource/Armen_Ra>,<http://dbpedia.org/resource/David_Shaughnessy>,<http://dbpedia.org/resource/John_Reynolds_(Ca...,<http://dbpedia.org/resource/Dean_Greig>
name,Digby Morrell,Alfred J. Lewy,Harpdog Brown,Franz Rottensteiner,G-Enka,Sam Henderson,Aaron LaCrate,Trevor Ferguson,Grant Nelson,Cathy Caruth,Sophie Crumb,Jenn Ashworth,Jonathan Hoefler,"Anthony Gueterbock, 18th Baron Berkeley",David Chernushenko,Joerg Steineck,Andrew Pinsent,Paddy Dunne (Gaelic footballer),Alexandros Mouzas,John Angus Campbell,Chris Batstone,Ceiron Thomas,Adel Sellimi,Faith Soloway,Tom Jennings,Vic Stasiuk,Anthony Caruana,Ian Mitchell (author),Leon Hapgood,Emily Osment,Dom Flora,Ted Hill (mathematician),Mindaugas Murza,Bob Reece,Douglas Davies,Freimut B%C3%B6rngen,Th%C3%BCring Br%C3%A4m,Aharon Solomons,Steven Weil,Gary Emineth,...,David Vernon Williams,Guillermo Roux,William J. Ely,Alan Roper,Bob Havens,Susan Christie,Elizabeth Gunn (author),Bo Hampton,Milva,Will Tiao,Quintin E. Primo III,Geoffrey Bayldon,Ulf-Dietrich Reips,Marcel J. Melan%C3%A7on,Mike Powell (journalist),Vladimir Yurchenko,Nobuo Uematsu,John Warner (writer),Doug McIntosh,Avi Muchnick,Robin MacPherson,Judi Silvano,Evelin Lindner,Robert S. Gold,Agnes Baltsa,Zvonimir Juri%C4%87,John-Paul Himka,Elisha Qimron,Robert Grant Irving,Sam Leach (artist),Patricia Crowther (caver),Katja Herbers,Shaka Hislop,Martin Iveson,Eva Habermann,Steve Castle,Armen Ra,David Shaughnessy,John Reynolds (Canadian politician),Dean Greig
text,digby morrell born 10 october 1979 is a former...,alfred j lewy aka sandy lewy graduated from un...,harpdog brown is a singer and harmonica player...,franz rottensteiner born in waidmannsfeld lowe...,henry krvits born 30 december 1974 in tallinn ...,sam henderson born october 18 1969 is an ameri...,aaron lacrate is an american music producer re...,trevor ferguson aka john farrow born 11 novemb...,grant nelson born 27 april 1971 in london also...,cathy caruth born 1955 is frank h t rhodes pro...,sophia violet sophie crumb born september 27 1...,jenn ashworth is an english writer she was bor...,jonathan hoefler born august 22 1970 is an ame...,anthony fitzhardinge gueterbock 18th baron ber...,david chernushenko born june 1963 in calgary a...,joerg steineck is a german filmmaker editor an...,fr andrew pinsent born 19 august 1966 is resea...,paddy dunne was a gaelic football player from ...,alexandros mouzas born 1962 is a greek compose...,john angus campbell born march 10 1942 in port...,chris batstone was the 20002002 lead singer of...,ceiron thomas born 23 october 1983 is a welsh ...,adel sellimi arabic was born on 16 november 19...,faith soloway born march 28 1964 is an america...,tom jennings born 1955 as thomas daniel jennin...,victor john stasiuk born may 23 1929 is a reti...,anthony caruana born 2 january 1968 is a melbo...,ian mitchell is a scottish author who grew up ...,leon duane hapgood born 7 august 1979is an eng...,emily jordan osment born march 10 1992 is an a...,dominick a dom flora born june 12 1935 is an a...,theodore preston hill born december 28 1943 is...,mindaugas gervaldas until 2012 year mindaugas ...,robert scott reece born january 5 1951 is an a...,douglas james davies born 1947 is professor in...,freimut brngen born october 17 1930 is a germa...,thring brm born 10 april 1944 is a swiss compo...,aharon solomons b september 27 1939 is a freed...,steven weil is an american rabbi who grew up o...,gary emineth born october 24 1958 is a 
politic...,...,professor david vernon williams is a professor...,guillermo roux born in 1929 is an argentine pa...,william jonas ely jr born december 29 1911 is ...,alan roper born may 1939 in tipton staffordshi...,bob havens born may 3 1930 is an american big ...,susan christie is an american singersongwriter...,elizabeth gunn is an american author of myster...,bo hampton born 1954 in united states is a not...,maria ilva biolcati commander omri italian pro...,will tiao is a taiwanese american actor and pr...,quintin e primo iii born march 14 1955 is the ...,geoffrey bayldon born 7 january 1924 in leeds ...,prof dr ulfdietrich reips is a full professor ...,marcel j mlanon is a philosopher and scientist...,mike powell is a british former newspaper and ...,uladzimir vasilyevich yurchanka belarusian rus...,nobuo uematsu uematsu nobuo born march 21 1959...,john warner born 1970 is an american writer an...,doug mcinstosh is a retired american basketbal...,avi muchnick born 1979 is an artist author pro...,robin macpherson born 1959 glasgow scotland is...,judi silvano born 1951 is a jazz singer and co...,evelin gerda lindner born may 13 1954 in hamel...,robert s gold is a researcher in the applicati...,agnes baltsa greek a m born 19 november 1944 i...,zvonimir juri born 4 june 1971 is a croatian f...,johnpaul himka born may 18 1949 in detroit mic...,elisha qimron is an academic in the study of a...,robert grant irving phd is an author and lectu...,sam leach born 1973 is an emerging australian ...,this article is about the cave explorer for th...,katja mira herbers dutch pronunciation ktja mi...,neil shaka hislop cm born 22 february 1969 is ...,martin iveson is a british composer known for ...,eva felicitas habermann born january 16 1976 i...,steve castle born 17 may 1966 in barking is a ...,armen ra is an american artist and performer o...,david james shaughnessy also spelled shaughnes...,john douglas reynolds pc born january 19 1942 ...,dean greig born 31 october 1968 is 
a former au...


Select a person, Armen Ra, from the list to use as the input for sentiment analysis. Output Armen Ra's overview from the database.

In [None]:
my_person = [people_df.iloc[96]['text']]
my_person

['armen ra is an american artist and performer of iranianarmenian descent born in tehran iran he was raised by his mother a concert pianist and his aunt an opera singer and ikebana master he taught himself to play theremin his music fuses armenian folk music with modern instrumentation along with melodic lounge standards and classical arias his concerts are known for their combination of both visual arts and his musicarmen ra has played at the united nations wiener konzerthaus mozartsaal vienna cbgbs knitting factory la mama etc joes pub boulder museum of modern art lincoln center the gershwin hotel bb king museum and dietch projects he has been featured on and appeared in cnn hbo mtv vh1 vogue the new york times the new york post the village voice rolling stone and glamourhe has performed and recorded with various bands and on many projects including a collaboration with british recording artist marc almond on the song my madness i from his 2010 release variet his debut solo cd plays 

### Data Processing

#### Vector Analysis

In [None]:
vect_people = CountVectorizer(stop_words='english')
word_weight = vect_people.fit_transform(people_df['text'])

In [None]:
word_weight

<42786x437190 sparse matrix of type '<class 'numpy.int64'>'
	with 5847547 stored elements in Compressed Sparse Row format>
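
As a rough illustration of what the count matrix holds, the sketch below builds word counts for two toy overviews in plain Python. The toy texts and the small stop list are made up for the example; `CountVectorizer` uses a much longer built-in English stop list and its own tokenizer, so this only approximates its behavior.

```python
from collections import Counter

# Toy overviews (hypothetical, not taken from people_df)
docs = [
    "she is a jazz singer and a jazz composer",
    "he is a folk singer",
]
# Tiny illustrative stop list; CountVectorizer(stop_words='english') uses a far larger one
stop_words = {"is", "a", "and", "she", "he"}

def count_words(text):
    """Count the non-stop-word tokens in one document."""
    return Counter(w for w in text.split() if w not in stop_words)

counts = [count_words(d) for d in docs]
# Sorted vocabulary: one matrix column per distinct word
vocab = sorted(set().union(*counts))
# Dense count matrix: one row per document
matrix = [[c[w] for w in vocab] for c in counts]

print(vocab)
print(matrix)
```

The real matrix has the same layout with 42,786 rows and 437,190 columns. It is stored in sparse form because most entries are zero.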

#### Nearest Neighbors

Fit the nearest neighbors model on the word-count matrix built from the people dataframe.

In [None]:
nn = NearestNeighbors(metric='euclidean')
nn.fit(word_weight)

NearestNeighbors(algorithm='auto', leaf_size=30, metric='euclidean',
                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                 radius=1.0)

In [None]:
ra_index = people_df[people_df['name'] == 'Armen Ra'].index[0]
ra_index

96

Use the nearest neighbor model to output people with overviews similar to Armen Ra's page.

In [None]:
distances, indices = nn.kneighbors(word_weight[ra_index], n_neighbors=11)

In [None]:
distances

array([[ 0.        , 17.23368794, 17.23368794, 17.29161647, 17.29161647,
        17.3781472 , 17.3781472 , 17.40689519, 17.40689519, 17.4642492 ,
        17.4642492 ]])
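
These values are Euclidean distances between rows of the count matrix. A minimal sketch of that computation on made-up count vectors (the names and numbers are hypothetical):

```python
import math

# Hypothetical count vectors over a shared four-word vocabulary
vectors = {
    "query": [1, 0, 2, 1],
    "person_a": [1, 0, 2, 0],
    "person_b": [0, 3, 0, 1],
}

def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Rank every row by its distance to the query, closest first
ranked = sorted(vectors, key=lambda name: euclidean(vectors["query"], vectors[name]))
distances = [euclidean(vectors["query"], vectors[name]) for name in ranked]

print(ranked)
print(distances)
```

The query is always its own nearest neighbor at distance 0, which matches the first entry in the array above. Euclidean distance on raw counts is also sensitive to document length, so longer overviews tend to sit farther from everything else.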

Show the row indices of the 10 most similar overviews. The first index, 96, is Armen Ra's own row.

In [None]:
indices

array([[   96, 15962, 32628, 16715, 33930, 12629, 20032,  4156, 25187,
        33651, 35976]])

Output Armen Ra's row followed by the people whose overviews are closest to his.

In [None]:
people_df.iloc[indices[0],:]

Unnamed: 0,URI,name,text
96,<http://dbpedia.org/resource/Armen_Ra>,Armen Ra,armen ra is an american artist and performer o...
15962,<http://dbpedia.org/resource/Jagori_Tanna>,Jagori Tanna,jagori tanna born andrew koshowski in hamilton...
32628,<http://dbpedia.org/resource/Vilayna_LaSalle>,Vilayna LaSalle,vilayna lasalle is an american model of an afr...
16715,<http://dbpedia.org/resource/Robin_Holcomb>,Robin Holcomb,robin lynn holcomb is an american singer songw...
33930,<http://dbpedia.org/resource/Karen_Maruyama>,Karen Maruyama,karen maruyama born may 29 1958 is an american...
12629,<http://dbpedia.org/resource/Rachel_Aggs>,Rachel Aggs,rachel aggs is a musician based in london prim...
20032,<http://dbpedia.org/resource/Roger_Kamien>,Roger Kamien,roger kamien born in 1934 is a retired profess...
4156,<http://dbpedia.org/resource/Patrick_Clifford>,Patrick Clifford,patrick clifford born in new york city 1966 is...
25187,<http://dbpedia.org/resource/Peter_Frankl>,Peter Frankl,this article is about the pianist for the math...
33651,<http://dbpedia.org/resource/Jonathan_M._Woodw...,Jonathan M. Woodward,jonathan mark woodward is an american actor he...


In [None]:
# Keep every column after URI (name and text); the first row is Armen Ra himself
top_ten = people_df.iloc[indices[0], 1:]

In [None]:
top_ten.head(11)

Unnamed: 0,name,text
96,Armen Ra,armen ra is an american artist and performer o...
15962,Jagori Tanna,jagori tanna born andrew koshowski in hamilton...
32628,Vilayna LaSalle,vilayna lasalle is an american model of an afr...
16715,Robin Holcomb,robin lynn holcomb is an american singer songw...
33930,Karen Maruyama,karen maruyama born may 29 1958 is an american...
12629,Rachel Aggs,rachel aggs is a musician based in london prim...
20032,Roger Kamien,roger kamien born in 1934 is a retired profess...
4156,Patrick Clifford,patrick clifford born in new york city 1966 is...
25187,Peter Frankl,this article is about the pianist for the math...
33651,Jonathan M. Woodward,jonathan mark woodward is an american actor he...


In [None]:
df2 = people_df[['text', 'name']]
# For each row, join the text and name columns into one comma-separated string
df3 = df2.apply(lambda x: ','.join(x.astype(str)), axis=1)
# Store the joined strings in a pandas dataframe
df_clean = pd.DataFrame({'clean': df3})
# Build the list-of-lists corpus for gensim: each row becomes [text, name],
# so every whole overview and every name is a single token
# (names that contain commas are split further)
sent = [row.split(',') for row in df_clean['clean']]
# Show the first two entries of the corpus
sent[:2]

[['digby morrell born 10 october 1979 is a former australian rules footballer who played with the kangaroos and carlton in the australian football league aflfrom western australia morrell played his early senior football for west perth his 44game senior career for the falcons spanned 19982000 and he was the clubs leading goalkicker in 2000 at the age of 21 morrell was recruited to the australian football league by the kangaroos football club with its third round selection in the 2001 afl rookie draft as a forward he twice kicked five goals during his time with the kangaroos the first was in a losing cause against sydney in 2002 and the other the following season in a drawn game against brisbaneafter the 2003 season morrell was traded along with david teague to the carlton football club in exchange for corey mckernan he played 32 games for the blues before being delisted at the end of 2005 he continued to play victorian football league vfl football with the northern bullants carltons vf

Another way to output the 10 people with overviews closest to Armen Ra's page.

In [None]:
import gensim 
from gensim.models import Word2Vec

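# Skip-gram model (sg=1) with 50-dimensional vectors; `size` is the gensim 3.x
# parameter name (renamed `vector_size` in gensim 4)
model = Word2Vec(sent, min_count=1, size=50, workers=3, window=3, sg=1)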

In [None]:
# Look up the vector through model.wv; indexing the model directly is deprecated
model.wv['Armen Ra']


array([ 0.00818728,  0.00035559, -0.00903071, -0.00838886,  0.0078609 ,
       -0.00474584, -0.002292  , -0.00637336,  0.00325204,  0.00941323,
       -0.00696781, -0.00610431,  0.00868803,  0.00275211, -0.00117894,
       -0.00155437,  0.00522238,  0.00125257,  0.00744486, -0.00989244,
       -0.0036832 ,  0.0002664 ,  0.0041829 , -0.00449497,  0.00126217,
        0.00454358,  0.0004319 , -0.00297771,  0.00377538, -0.00314259,
       -0.00875818,  0.00515982, -0.00477383,  0.00153132, -0.00621622,
       -0.00327826,  0.00778831, -0.00956785, -0.00707163, -0.00747374,
       -0.00903284, -0.00830177,  0.0034522 , -0.0070265 ,  0.0041228 ,
        0.00836104, -0.00890977,  0.00141965,  0.0008621 ,  0.00135281],
      dtype=float32)

In [None]:
# Ask for the 10 most similar tokens; the original slice 'Armen Ra'[:10] only sliced the string
model.wv.most_similar('Armen Ra', topn=10)


[('tamas wells tems tayms is an australian singersongwriter based in rangoon burma wells first came to attention in his home country in 2002 with airplay of a threetrack demo cigarettes a tie and a free magazine recorded with three friends they followed this up with an ep stitch in time the same year the band took off in 2004 when they were spotted by record producer tim whitten and invited to record their debut album a mark on the pane with popboomerang records beginning that year they performed five national toursin early 2006 wells relocated to rangoon burma to participate in a community health hivaids education project the bands second album a plea en vendredi appeared later that year in addition to the australian release by popboomerang agreements were entered into with inpartmaint and pocket records for the album to be released in japan and china respectively in august 2007 the band performed a sellout tour of four japanese cities wells third album two years in april was released

This method returns a different set of people than the nearest neighbors method. The nearest neighbors output appears more closely aligned with the substance of Armen Ra's overview, since it returns people in creative industries. The Word2Vec similarity method instead returns people whose overviews share a similar tone and format with Armen Ra's: brief, informational, and neutral.
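
One possible reason for the difference: in the corpus built above, each overview and each name is a single token, so the model sees very little word-level context. Word2Vec is usually trained on lists of word tokens. A minimal sketch of that preparation step, using made-up rows and a hypothetical `tokenize` helper:

```python
# Hypothetical rows standing in for the name and text columns of people_df
rows = [
    ("Jane Doe", "jane doe is a jazz singer and composer"),
    ("John Roe", "john roe is a folk singer"),
]

def tokenize(text):
    """Split an already lowercased, punctuation-free overview into word tokens."""
    return text.split()

# One token list per overview: the usual input shape for Word2Vec
sentences = [tokenize(text) for _, text in rows]

print(sentences[1])
```

With this input the model would learn vectors for words rather than for people, so comparing people would need an extra step, such as averaging the word vectors of each overview. That step is not part of the original notebook.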

#### Sentiment Analysis

Wrap Armen Ra's overview in the same list-of-lists format used above. It is converted to a string with `str()` in the next step.

In [None]:
df2 = pd.DataFrame(my_person)
# Join the columns of each row into one string (there is only one column here)
df3 = df2.apply(lambda x: ','.join(x.astype(str)), axis=1)
# Store the result in a pandas dataframe
df_clean = pd.DataFrame({'clean': df3})
# Wrap the overview in the same list-of-lists format used for the corpus above
sent1 = [row.split(',') for row in df_clean['clean']]
# Show the result (a single entry)
sent1[:2]

[['armen ra is an american artist and performer of iranianarmenian descent born in tehran iran he was raised by his mother a concert pianist and his aunt an opera singer and ikebana master he taught himself to play theremin his music fuses armenian folk music with modern instrumentation along with melodic lounge standards and classical arias his concerts are known for their combination of both visual arts and his musicarmen ra has played at the united nations wiener konzerthaus mozartsaal vienna cbgbs knitting factory la mama etc joes pub boulder museum of modern art lincoln center the gershwin hotel bb king museum and dietch projects he has been featured on and appeared in cnn hbo mtv vh1 vogue the new york times the new york post the village voice rolling stone and glamourhe has performed and recorded with various bands and on many projects including a collaboration with british recording artist marc almond on the song my madness i from his 2010 release variet his debut solo cd plays

Assign a part-of-speech tag to each word in the overview.

In [None]:
# The corpora were downloaded and TextBlob was imported at the top of the notebook.
# str(sent1) keeps the list brackets and quotes, which show up as tokens in the output.
wiki = TextBlob(str(sent1))
wiki.tags


[('[', 'JJ'),
 ('[', 'NNP'),
 ("'armen", 'POS'),
 ('ra', 'NN'),
 ('is', 'VBZ'),
 ('an', 'DT'),
 ('american', 'JJ'),
 ('artist', 'NN'),
 ('and', 'CC'),
 ('performer', 'NN'),
 ('of', 'IN'),
 ('iranianarmenian', 'JJ'),
 ('descent', 'NN'),
 ('born', 'VBN'),
 ('in', 'IN'),
 ('tehran', 'NN'),
 ('iran', 'NN'),
 ('he', 'PRP'),
 ('was', 'VBD'),
 ('raised', 'VBN'),
 ('by', 'IN'),
 ('his', 'PRP$'),
 ('mother', 'NN'),
 ('a', 'DT'),
 ('concert', 'NN'),
 ('pianist', 'NN'),
 ('and', 'CC'),
 ('his', 'PRP$'),
 ('aunt', 'NN'),
 ('an', 'DT'),
 ('opera', 'NN'),
 ('singer', 'NN'),
 ('and', 'CC'),
 ('ikebana', 'JJ'),
 ('master', 'NN'),
 ('he', 'PRP'),
 ('taught', 'VBD'),
 ('himself', 'PRP'),
 ('to', 'TO'),
 ('play', 'VB'),
 ('theremin', 'VB'),
 ('his', 'PRP$'),
 ('music', 'NN'),
 ('fuses', 'VBZ'),
 ('armenian', 'JJ'),
 ('folk', 'NN'),
 ('music', 'NN'),
 ('with', 'IN'),
 ('modern', 'JJ'),
 ('instrumentation', 'NN'),
 ('along', 'IN'),
 ('with', 'IN'),
 ('melodic', 'JJ'),
 ('lounge', 'NN'),
 ('standards', 'NNS

Identify the noun phrases in the overview.

In [None]:
wiki.noun_phrases

WordList(["[ [ 'armen ra", 'american artist', 'iranianarmenian descent', 'tehran iran', 'concert pianist', 'opera singer', 'ikebana master', 'music fuses armenian folk music', 'modern instrumentation', 'melodic lounge standards', 'classical arias', 'visual arts', 'musicarmen ra', 'nations wiener konzerthaus mozartsaal vienna cbgbs', 'mama etc joes pub boulder museum', 'modern art lincoln center', 'gershwin hotel bb king museum', 'dietch projects', 'cnn hbo mtv vh1 vogue', 'new york times', 'new york post', 'village voice', 'various bands', 'artist marc almond', 'madness i', 'release variet', 'debut solo cd plays', 'bowl fork records', 'classical armenian laments', 'folk songs', 'armens heritage', 'musical influence', 'cameo appearance', 'film party monsterhe', 'hollywood californiain october', 'guest judge', 'logo network show', 'sharon needles album pg13', 'voltaires album', '] ]'])
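
The leading `[ [ 'armen ra` and trailing `] ]` entries come from calling `str()` on a nested list, which adds brackets and quotes to the text. The sketch below reproduces the effect on a shortened stand-in for the overview and shows that using the inner string directly avoids it:

```python
# Shortened stand-in for the nested list built above
nested = [["armen ra is an american artist"]]

as_repr = str(nested)   # string form of the list, brackets and quotes included
as_text = nested[0][0]  # the overview string itself

print(as_repr)
print(as_text)
```

In the notebook, `TextBlob(my_person[0])` would be the equivalent call. The outputs shown here were produced from `str(sent1)` and therefore keep the bracket tokens.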

In [None]:
zen = TextBlob(str(sent1))

Identify the words in the overview.

In [None]:
zen.words

WordList(["'armen", 'ra', 'is', 'an', 'american', 'artist', 'and', 'performer', 'of', 'iranianarmenian', 'descent', 'born', 'in', 'tehran', 'iran', 'he', 'was', 'raised', 'by', 'his', 'mother', 'a', 'concert', 'pianist', 'and', 'his', 'aunt', 'an', 'opera', 'singer', 'and', 'ikebana', 'master', 'he', 'taught', 'himself', 'to', 'play', 'theremin', 'his', 'music', 'fuses', 'armenian', 'folk', 'music', 'with', 'modern', 'instrumentation', 'along', 'with', 'melodic', 'lounge', 'standards', 'and', 'classical', 'arias', 'his', 'concerts', 'are', 'known', 'for', 'their', 'combination', 'of', 'both', 'visual', 'arts', 'and', 'his', 'musicarmen', 'ra', 'has', 'played', 'at', 'the', 'united', 'nations', 'wiener', 'konzerthaus', 'mozartsaal', 'vienna', 'cbgbs', 'knitting', 'factory', 'la', 'mama', 'etc', 'joes', 'pub', 'boulder', 'museum', 'of', 'modern', 'art', 'lincoln', 'center', 'the', 'gershwin', 'hotel', 'bb', 'king', 'museum', 'and', 'dietch', 'projects', 'he', 'has', 'been', 'featured', '

Identify the sentences in the overview.

In [None]:
zen.sentences

[Sentence("[['armen ra is an american artist and performer of iranianarmenian descent born in tehran iran he was raised by his mother a concert pianist and his aunt an opera singer and ikebana master he taught himself to play theremin his music fuses armenian folk music with modern instrumentation along with melodic lounge standards and classical arias his concerts are known for their combination of both visual arts and his musicarmen ra has played at the united nations wiener konzerthaus mozartsaal vienna cbgbs knitting factory la mama etc joes pub boulder museum of modern art lincoln center the gershwin hotel bb king museum and dietch projects he has been featured on and appeared in cnn hbo mtv vh1 vogue the new york times the new york post the village voice rolling stone and glamourhe has performed and recorded with various bands and on many projects including a collaboration with british recording artist marc almond on the song my madness i from his 2010 release variet his debut so

In [None]:
sentence = TextBlob(str(sent1))

In [None]:
sentence.words

WordList(["'armen", 'ra', 'is', 'an', 'american', 'artist', 'and', 'performer', 'of', 'iranianarmenian', 'descent', 'born', 'in', 'tehran', 'iran', 'he', 'was', 'raised', 'by', 'his', 'mother', 'a', 'concert', 'pianist', 'and', 'his', 'aunt', 'an', 'opera', 'singer', 'and', 'ikebana', 'master', 'he', 'taught', 'himself', 'to', 'play', 'theremin', 'his', 'music', 'fuses', 'armenian', 'folk', 'music', 'with', 'modern', 'instrumentation', 'along', 'with', 'melodic', 'lounge', 'standards', 'and', 'classical', 'arias', 'his', 'concerts', 'are', 'known', 'for', 'their', 'combination', 'of', 'both', 'visual', 'arts', 'and', 'his', 'musicarmen', 'ra', 'has', 'played', 'at', 'the', 'united', 'nations', 'wiener', 'konzerthaus', 'mozartsaal', 'vienna', 'cbgbs', 'knitting', 'factory', 'la', 'mama', 'etc', 'joes', 'pub', 'boulder', 'museum', 'of', 'modern', 'art', 'lincoln', 'center', 'the', 'gershwin', 'hotel', 'bb', 'king', 'museum', 'and', 'dietch', 'projects', 'he', 'has', 'been', 'featured', '

In [None]:
sentence.words[-1].pluralize()

'batss'

In [None]:
sentence.words[-1].singularize()

'bat'

In [None]:
b = TextBlob(str(sentence))
print(b.correct())

[['armed ra is an american artist and performer of iranianarmenian descent born in than ran he was raised by his mother a concert pianist and his aunt an opera singer and ikebana master he taught himself to play therein his music fuses armenian folk music with modern instrumentation along with melody lounge standards and classical areas his concerts are known for their combination of both visual arts and his musicarmen ra has played at the united nations winner konzerthaus mozartsaal vienna clubs knitting factory la mamma etc does pub bolder museum of modern art lincoln center the gershwin hotel by king museum and ditch projects he has been features on and appeared in can ho mt the vogue the new york times the new york post the village voice rolling stone and glamour has performed and recorded with various bands and on many projects including a collaboration with british recording artist mary almond on the song my madness i from his 2010 release variety his debut solo d plays the there

Output the sentiment for Armen Ra's overview.

In [None]:
# Polarity ranges from -1 (negative) to 1 (positive)
for s in zen.sentences:
  print(s.sentiment.polarity)

0.09986631016042781
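
TextBlob's default analyzer scores polarity from a word lexicon, with values between -1 (negative) and 1 (positive), so a score of about 0.10 reads as close to neutral and slightly positive. The sketch below is a deliberately simplified illustration of lexicon averaging. The four-word lexicon and the `polarity` helper are invented for the example and are not TextBlob's actual data or algorithm, which also handles negation and intensifiers.

```python
# Invented mini-lexicon: word -> polarity in [-1, 1]
lexicon = {"modern": 0.2, "classical": 0.0, "known": 0.1, "various": 0.0}

def polarity(text):
    """Average the polarity of the words found in the lexicon (0.0 if none match)."""
    scores = [lexicon[w] for w in text.split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("his concerts are known for modern instrumentation"))
print(polarity("born in tehran iran"))
```

Because the overview text has no punctuation, `zen.sentences` appears to contain a single sentence, so the loop above prints one score for the whole text.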


## Part 2 of Project

### Data Collection

Install the `wikipedia` package, a Python wrapper for the Wikipedia API. Wikipedia is the main data source for this step, which accesses the full content of Armen Ra's page.

In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-cp36-none-any.whl size=11686 sha256=6d5f7fc6b4f0f0e4c31c37d8d160860598453ba077438b8e07d1e5f5e039efe8
  Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [None]:
import wikipedia

### Data Processing

Retrieve the full page for Armen Ra.

In [None]:
#search wikipedia for Armen Ra
print(wikipedia.search('Armen Ra'))

['Armen Ra', 'Ra (disambiguation)', 'Armen (name)', 'List of American musicians of Armenian descent', 'Party Monster (film)', 'Tammie Brown', 'Miss Fame', 'Blue Madonna', 'Armen Hambardzumyan', '2003 in film']


In [None]:
#output the summary for Armen Ra
print(wikipedia.summary("Armen Ra"))


Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.


In [None]:
#output the page for Armen Ra
print(wikipedia.page("Armen Ra"))

<WikipediaPage 'Armen Ra'>


In [None]:
#output the page content for Armen Ra
print(wikipedia.page('Armen Ra').content)

Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.


== Musical career ==


=== Career (2010-2013) ===
Ra began studying the theremin in 2001, debuting with the orchestral group Antony & the Johnsons in New York City.
Ra has played at the United Nations, Wiener Konzerthaus Mozartsaal Vienna, CBGBs, Knitting Factory, La MaMa E.T.C., Joe's Pub, Boulder Museum of Modern Art, Lincoln Center, The Gershwin Hotel, B.B. King Museum, and Dietch Projects.
He has performed and recorded with various bands and on many projects (including a collaboration with British recording artist Marc Almond 
(of Soft Cell), on the song "My Madness & I" from his 2010 release Varieté). His debut solo CD Plays the Theremin (released on Bowl & Fork Records in 2010) showcases many classical Armenian laments and folk songs. Ra performed on the Sharon Needles album PG-13 on band Ministry's cover track "Everyday Is Halloween".


=== Career (2014-present) ===
In

In [None]:
#output the url for Armen Ra's Wikipedia page
print(wikipedia.page('Armen Ra').url)

https://en.wikipedia.org/wiki/Armen_Ra


In [None]:
# pd.read_html returns a list of DataFrames, one per HTML table found on the page
ra_df = pd.read_html('https://en.wikipedia.org/wiki/Armen_Ra')

In [None]:
type(ra_df)

list

In [None]:
page = wikipedia.page('Armen Ra')

In [None]:
page.summary

'Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.'

In [None]:
page.content

'Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.\n\n\n== Musical career ==\n\n\n=== Career (2010-2013) ===\nRa began studying the theremin in 2001, debuting with the orchestral group Antony & the Johnsons in New York City.\nRa has played at the United Nations, Wiener Konzerthaus Mozartsaal Vienna, CBGBs, Knitting Factory, La MaMa E.T.C., Joe\'s Pub, Boulder Museum of Modern Art, Lincoln Center, The Gershwin Hotel, B.B. King Museum, and Dietch Projects.\nHe has performed and recorded with various bands and on many projects (including a collaboration with British recording artist Marc Almond \n(of Soft Cell), on the song "My Madness & I" from his 2010 release Varieté). His debut solo CD Plays the Theremin (released on Bowl & Fork Records in 2010) showcases many classical Armenian laments and folk songs. Ra performed on the Sharon Needles album PG-13 on band Ministry\'s cover track "Everyday Is Halloween".\n\n\n=== Career (2014

In [None]:
type(page.content)

str

In [None]:
wiki1 = TextBlob(page.content)
wiki1.tags

[('Armen', 'NNP'),
 ('Ra', 'NNP'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Persian-Armenian', 'JJ'),
 ('artist', 'NN'),
 ('self-taught', 'JJ'),
 ('thereminist', 'NN'),
 ('production', 'NN'),
 ('designer', 'NN'),
 ('director', 'NN'),
 ('and', 'CC'),
 ('performer', 'NN'),
 ('==', 'JJ'),
 ('Musical', 'NNP'),
 ('career', 'NN'),
 ('==', 'NNP'),
 ('===', 'NNP'),
 ('Career', 'NNP'),
 ('2010-2013', 'JJ'),
 ('===', 'FW'),
 ('Ra', 'NNP'),
 ('began', 'VBD'),
 ('studying', 'VBG'),
 ('the', 'DT'),
 ('theremin', 'NN'),
 ('in', 'IN'),
 ('2001', 'CD'),
 ('debuting', 'VBG'),
 ('with', 'IN'),
 ('the', 'DT'),
 ('orchestral', 'JJ'),
 ('group', 'NN'),
 ('Antony', 'NNP'),
 ('&', 'CC'),
 ('the', 'DT'),
 ('Johnsons', 'NNPS'),
 ('in', 'IN'),
 ('New', 'NNP'),
 ('York', 'NNP'),
 ('City', 'NNP'),
 ('Ra', 'NNP'),
 ('has', 'VBZ'),
 ('played', 'VBN'),
 ('at', 'IN'),
 ('the', 'DT'),
 ('United', 'NNP'),
 ('Nations', 'NNPS'),
 ('Wiener', 'NNP'),
 ('Konzerthaus', 'NNP'),
 ('Mozartsaal', 'NNP'),
 ('Vienna', 'NNP'),
 ('CBGBs', '

In [None]:
wiki1.noun_phrases

WordList(['armen ra', 'persian-armenian', 'self-taught thereminist', 'production designer', 'musical', 'career == ===', 'career', 'ra', 'orchestral group', 'antony', 'johnsons', 'york', 'ra', 'wiener konzerthaus mozartsaal vienna', 'cbgbs', 'knitting', 'mama e.t.c.', 'joe', 'pub', 'boulder', 'art', 'lincoln', 'gershwin', 'b.b', 'king museum', 'dietch projects', 'various bands', 'marc almond', 'soft cell', 'madness', 'varieté', 'debut solo cd', 'plays', 'theremin', 'fork records', 'classical armenian laments', 'folk songs', 'ra', 'sharon needles', 'pg-13', 'ministry', 'everyday', 'halloween', 'career', 'recent years', 'ra', 'armen', 'current', 'honeysuckle', 'multiple songs', 'armen', 'debut album', 'sharon needles', 'songs “', 'everyday', 'halloween', 'haunted', 'house ”', 'voltaire', "'s album", 'raised', 'bats', 'theremin classique', 'european arias', 'armen', 'dle yaman', 'michael schmidt', '’ s 3-d gown', 'selena gomez', '’ s', 'revival', 'track “', 'girls', 'gwen stefani', '’ s al
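The tags and noun phrases above include section-heading markers such as `==` and `===`, which are part of the page layout rather than the prose. A minimal sketch of one way to drop those heading lines before building a `TextBlob`, assuming headings always sit on their own line; the helper name `strip_section_headings` is hypothetical and not part of the original notebook:

```python
import re

# Matches a whole heading line such as "== Musical career ==" or
# "=== Career (2010-2013) ===" in the plain-text page content.
HEADING_LINE = re.compile(r"^[ \t]*=+[^=\n]+=+[ \t]*$", flags=re.MULTILINE)


def strip_section_headings(text):
    """Return text with heading lines removed and runs of blank lines collapsed."""
    without_headings = HEADING_LINE.sub("", text)
    return re.sub(r"\n{2,}", "\n", without_headings).strip()


sample = (
    "Armen Ra is a performer.\n\n\n"
    "== Musical career ==\n\n\n"
    "=== Career (2010-2013) ===\n"
    "Ra began studying the theremin in 2001."
)
cleaned = strip_section_headings(sample)
print(cleaned)
```

The cleaned string could then be passed to `TextBlob` in place of `page.content`. Whether that changes the sentiment scores for this page has not been checked here.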

##### Sentiment Analysis

Produce the sentiment for Armen Ra's page.

In [None]:
testimonial = TextBlob(page.content)

In [None]:
testimonial.sentiment

Sentiment(polarity=0.006088664421997764, subjectivity=0.30434904601571267)

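With a polarity of about 0.006 (on a scale from -1 to 1) and a subjectivity of about 0.30 (on a scale from 0 to 1), the sentiment analysis shows a primarily neutral and objective tone throughout the page.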
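To make the reading of the two scores explicit, here is a small sketch that maps a polarity and subjectivity pair to coarse labels. The cut-offs (a neutral band of ±0.1 and a subjectivity threshold of 0.5) are illustrative assumptions, not values defined by TextBlob or used elsewhere in this notebook, and `describe_sentiment` is a hypothetical helper:

```python
def describe_sentiment(polarity, subjectivity, neutral_band=0.1):
    """Map sentiment scores to coarse labels.

    polarity is expected in [-1, 1]; subjectivity in [0, 1].
    neutral_band and the 0.5 subjectivity cut-off are assumptions.
    """
    if polarity > neutral_band:
        tone = "positive"
    elif polarity < -neutral_band:
        tone = "negative"
    else:
        tone = "neutral"
    stance = "subjective" if subjectivity >= 0.5 else "objective"
    return tone, stance


# Scores reported above for Armen Ra's page.
print(describe_sentiment(0.006088664421997764, 0.30434904601571267))
# ('neutral', 'objective')
```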

In [None]:
zen = TextBlob(page.content)

Process Armen Ra's page into words and sentences to determine how the sentiment changes throughout the page.

In [None]:
zen.words

WordList(['Armen', 'Ra', 'is', 'a', 'Persian-Armenian', 'artist', 'self-taught', 'thereminist', 'production', 'designer', 'director', 'and', 'performer', 'Musical', 'career', 'Career', '2010-2013', 'Ra', 'began', 'studying', 'the', 'theremin', 'in', '2001', 'debuting', 'with', 'the', 'orchestral', 'group', 'Antony', 'the', 'Johnsons', 'in', 'New', 'York', 'City', 'Ra', 'has', 'played', 'at', 'the', 'United', 'Nations', 'Wiener', 'Konzerthaus', 'Mozartsaal', 'Vienna', 'CBGBs', 'Knitting', 'Factory', 'La', 'MaMa', 'E.T.C', 'Joe', "'s", 'Pub', 'Boulder', 'Museum', 'of', 'Modern', 'Art', 'Lincoln', 'Center', 'The', 'Gershwin', 'Hotel', 'B.B', 'King', 'Museum', 'and', 'Dietch', 'Projects', 'He', 'has', 'performed', 'and', 'recorded', 'with', 'various', 'bands', 'and', 'on', 'many', 'projects', 'including', 'a', 'collaboration', 'with', 'British', 'recording', 'artist', 'Marc', 'Almond', 'of', 'Soft', 'Cell', 'on', 'the', 'song', 'My', 'Madness', 'I', 'from', 'his', '2010', 'release', 'Varie

In [None]:
zen.sentences

[Sentence("Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer."),
 Sentence("== Musical career ==
 
 
 === Career (2010-2013) ===
 Ra began studying the theremin in 2001, debuting with the orchestral group Antony & the Johnsons in New York City."),
 Sentence("Ra has played at the United Nations, Wiener Konzerthaus Mozartsaal Vienna, CBGBs, Knitting Factory, La MaMa E.T.C., Joe's Pub, Boulder Museum of Modern Art, Lincoln Center, The Gershwin Hotel, B.B."),
 Sentence("King Museum, and Dietch Projects."),
 Sentence("He has performed and recorded with various bands and on many projects (including a collaboration with British recording artist Marc Almond 
 (of Soft Cell), on the song "My Madness & I" from his 2010 release Varieté)."),
 Sentence("His debut solo CD Plays the Theremin (released on Bowl & Fork Records in 2010) showcases many classical Armenian laments and folk songs."),
 Sentence("Ra performed on the Sharon Needles albu

Determine any changes in sentiment throughout the page.

In [None]:
# print the polarity score of each sentence, in page order
for sentence in zen.sentences:
  print(sentence.sentiment.polarity)

0.0
0.06818181818181818
0.05
0.0
0.15
0.25
-0.2
0.0
0.0
0.0
-0.2
0.0
0.0
0.0
-0.15000000000000002
0.08333333333333333
-0.125
0.0
0.0
-0.6999999999999998
0.0
0.0
0.0
0.0
0.012121212121212116


Based on shifts in the sentence-level polarity scores, an estimated 6 or 7 authors contributed to the Wikipedia article.
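The notebook does not state how the estimate was derived from the polarity list. One possible heuristic, sketched here purely as an assumption, is to count the places where polarity jumps by more than some threshold between consecutive sentences. The function name and the 0.1 threshold are hypothetical, and a shift in tone does not necessarily indicate a change of author:

```python
def count_polarity_shifts(polarities, threshold=0.1):
    """Count consecutive sentence pairs whose polarity differs by more than threshold."""
    return sum(
        1
        for previous, current in zip(polarities, polarities[1:])
        if abs(current - previous) > threshold
    )


# Toy polarity sequence: two jumps larger than 0.1 (0.05 -> 0.3 and 0.3 -> -0.2).
toy_polarities = [0.0, 0.05, 0.3, 0.3, -0.2]
print(count_polarity_shifts(toy_polarities))  # 2
```

The same function could be applied to `[s.sentiment.polarity for s in zen.sentences]`; the resulting count depends heavily on the chosen threshold.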

Output a summary of the Armen Ra page.

In [None]:
page.summary

'Armen Ra is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.'

In [None]:
sentence = TextBlob(page.content)
sentence.words

WordList(['Armen', 'Ra', 'is', 'a', 'Persian-Armenian', 'artist', 'self-taught', 'thereminist', 'production', 'designer', 'director', 'and', 'performer', 'Musical', 'career', 'Career', '2010-2013', 'Ra', 'began', 'studying', 'the', 'theremin', 'in', '2001', 'debuting', 'with', 'the', 'orchestral', 'group', 'Antony', 'the', 'Johnsons', 'in', 'New', 'York', 'City', 'Ra', 'has', 'played', 'at', 'the', 'United', 'Nations', 'Wiener', 'Konzerthaus', 'Mozartsaal', 'Vienna', 'CBGBs', 'Knitting', 'Factory', 'La', 'MaMa', 'E.T.C', 'Joe', "'s", 'Pub', 'Boulder', 'Museum', 'of', 'Modern', 'Art', 'Lincoln', 'Center', 'The', 'Gershwin', 'Hotel', 'B.B', 'King', 'Museum', 'and', 'Dietch', 'Projects', 'He', 'has', 'performed', 'and', 'recorded', 'with', 'various', 'bands', 'and', 'on', 'many', 'projects', 'including', 'a', 'collaboration', 'with', 'British', 'recording', 'artist', 'Marc', 'Almond', 'of', 'Soft', 'Cell', 'on', 'the', 'song', 'My', 'Madness', 'I', 'from', 'his', '2010', 'release', 'Varie

In [None]:
sentence.words[2].singularize()

'is'

In [None]:
sentence.words[2].pluralize()

'iss'

In [None]:
b = TextBlob(page.content)
print(b.correct())

Men A is a Persian-Armenian artist, self-taught thereminist, production designer, director, and performer.


== Musical career ==


=== Career (2010-2013) ===
A began studying the therein in 2001, refuting with the orchestra group Anthony & the Johnson in New Work City.
A has played at the United Nations, Dinner Konzerthaus Mozartsaal Vienna, CBGBs, Knitting Factory, A papa E.T.C., Toe's Sub, Shoulder Museum of Modern Art, Lincoln Enter, The Gershwin Hotel, B.B. King Museum, and Fetch Projects.
He has performed and recorded with various bands and on many projects (including a collaboration with British recording artist Arc Almond 
(of Soft Well), on the song "By Sadness & I" from his 2010 release Variety). His debut solo of Plays the Therein (released on Bowl & Work Records in 2010) showcases many classical Armenian laments and folk songs. A performed on the Charon Needles album of-13 on band Ministry's cover track "Everyday Is Halloween".


=== Career (2014-present) ===
In recent year

Consider algorithmic bias and errors in the natural language processing tools: the spelling correction rewrites Armen Ra's name as 'Men A' or 'A'.
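One way to limit this kind of damage is to exempt known names from correction. The sketch below assumes a per-word corrector is available as a plain function; `correct_preserving`, `toy_corrector`, and the toy replacement table are hypothetical stand-ins that only imitate the substitutions seen in the output above:

```python
import re


def correct_preserving(text, correct_word, protected):
    """Apply correct_word to each alphabetic token, leaving protected words untouched."""
    protected_lower = {word.lower() for word in protected}

    def fix(match):
        word = match.group(0)
        if word.lower() in protected_lower:
            return word
        return correct_word(word)

    return re.sub(r"[A-Za-z]+", fix, text)


# Stand-in corrector that mimics the substitutions observed above.
toy_table = {"Armen": "Men", "Ra": "A", "teh": "the"}


def toy_corrector(word):
    return toy_table.get(word, word)


text = "Armen Ra plays teh theremin"
print(correct_preserving(text, toy_corrector, protected={"Armen", "Ra"}))
# Armen Ra plays the theremin
```

With TextBlob, a function wrapping `Word(word).correct()` could presumably be passed as `correct_word`; that combination has not been run against this page.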

In [None]:
blob = TextBlob(page.content)
blob.ngrams(n=3)

[WordList(['Armen', 'Ra', 'is']),
 WordList(['Ra', 'is', 'a']),
 WordList(['is', 'a', 'Persian-Armenian']),
 WordList(['a', 'Persian-Armenian', 'artist']),
 WordList(['Persian-Armenian', 'artist', 'self-taught']),
 WordList(['artist', 'self-taught', 'thereminist']),
 WordList(['self-taught', 'thereminist', 'production']),
 WordList(['thereminist', 'production', 'designer']),
 WordList(['production', 'designer', 'director']),
 WordList(['designer', 'director', 'and']),
 WordList(['director', 'and', 'performer']),
 WordList(['and', 'performer', 'Musical']),
 WordList(['performer', 'Musical', 'career']),
 WordList(['Musical', 'career', 'Career']),
 WordList(['career', 'Career', '2010-2013']),
 WordList(['Career', '2010-2013', 'Ra']),
 WordList(['2010-2013', 'Ra', 'began']),
 WordList(['Ra', 'began', 'studying']),
 WordList(['began', 'studying', 'the']),
 WordList(['studying', 'the', 'theremin']),
 WordList(['the', 'theremin', 'in']),
 WordList(['theremin', 'in', '2001']),
 WordList(['in',

In [None]:
# The sentiment of Armen Ra's page has an informational, neutral tone
testimonial = TextBlob(page.content)
testimonial.sentiment

Sentiment(polarity=0.006088664421997764, subjectivity=0.30434904601571267)

### Communication of Results

Ultimately, the sentiment analysis for Armen Ra's page shows that the tone is primarily informational, objective, and neutral. Nearest Neighbors and Model Most Similar returned different sets of similar Wikipedia pages depending on which method was used: Nearest Neighbors surfaced individuals whose pages had similarly neutral tones, while Most Similar surfaced individuals in industries similar to Armen Ra's. The natural language processing tools at times introduced errors into Armen Ra's name and typos throughout the content. Further analysis could examine algorithmic bias in the natural language processing tools and explore alternative data analysis and visualization methods.

## Live Coding

In addition to presenting our slides to each other, at the end of the presentation each analyst will demonstrate their code using a famous person randomly selected from the database.

In [None]:
Roddy = people_df[people_df['name'].str.contains('Roddy Piper')]

In [None]:
Roddy

Unnamed: 0,URI,name,text
32819,<http://dbpedia.org/resource/Roddy_Piper>,Roddy Piper,roderick roddy george toombs born april 17 195...


In [None]:
wikipedia.search('Roddy Piper')



['Roddy Piper',
 'Hell Comes to Frogtown',
 'Ric Flair',
 'Hulk Hogan',
 'They Live',
 'Bret Hart',
 'WrestleMania 2',
 "Hulk Hogan's Rock 'n' Wrestling",
 'Pro Wrestlers vs Zombies',
 'Starrcade (1996)']

In [None]:
wikipedia.summary('Roddy Piper')


'Roderick George Toombs (April 17, 1954 – July 31, 2015) was a Canadian professional wrestler, amateur wrestler, amateur boxer, and actor, better known by his ring name "Rowdy" Roddy Piper.\nIn professional wrestling, Piper was best known to international audiences for his work with the World Wrestling Federation (WWF, now WWE) and World Championship Wrestling (WCW) between 1984 and 2000. Although he was Canadian, because of his Scottish heritage he was billed as coming from Glasgow and was known for his signature kilt and bagpipe entrance music. Piper earned the nicknames "Rowdy" and "Hot Rod" by displaying his trademark "Scottish" rage, spontaneity, and quick wit. According to The Daily Telegraph, he is "considered by many to be the greatest \'heel\' (or villain) wrestler ever".One of wrestling\'s most recognizable stars, Piper headlined multiple PPV events, including the WWF and WCW\'s respective premier annual events, WrestleMania and Starrcade. He accumulated 34 championships and 

In [None]:
wikipedia.page('Roddy Piper')


<WikipediaPage 'Roddy Piper'>

In [None]:
wikipedia.page('Roddy Piper').url

'https://en.wikipedia.org/wiki/Roddy_Piper'

In [None]:
famous_page = wikipedia.page('Roddy Piper')

In [None]:
famous_page.summary

'Roderick George Toombs (April 17, 1954 – July 31, 2015) was a Canadian professional wrestler, amateur wrestler, amateur boxer, and actor, better known by his ring name "Rowdy" Roddy Piper.\nIn professional wrestling, Piper was best known to international audiences for his work with the World Wrestling Federation (WWF, now WWE) and World Championship Wrestling (WCW) between 1984 and 2000. Although he was Canadian, because of his Scottish heritage he was billed as coming from Glasgow and was known for his signature kilt and bagpipe entrance music. Piper earned the nicknames "Rowdy" and "Hot Rod" by displaying his trademark "Scottish" rage, spontaneity, and quick wit. According to The Daily Telegraph, he is "considered by many to be the greatest \'heel\' (or villain) wrestler ever".One of wrestling\'s most recognizable stars, Piper headlined multiple PPV events, including the WWF and WCW\'s respective premier annual events, WrestleMania and Starrcade. He accumulated 34 championships and 

In [None]:
testimonial = TextBlob(famous_page.content)

In [None]:
testimonial.sentiment

Sentiment(polarity=0.08889616748991759, subjectivity=0.4185351839518508)

### Nearest Neighbors

In [None]:
people_df1 = [people_df.iloc[32819]['text']]
people_df1

['roderick roddy george toombs born april 17 1954 better known by his ring name rowdy roddy piper is a retired canadian professional wrestler film actor and podcast host per a wwe legends contract he makes occasional tv and promotional appearances for the company in professional wrestling he is best known for his work with the world wrestling federation wwf now wwe although he is canadian due to his scottish heritage he was billed as coming from glasgow in scotland and was known for his signature kilt and bagpipe entrance music he earned the nickname rowdy by displaying his trademark scottish rage spontaneity and quick wit despite being a crowd favorite for his rockstarlike persona he often played the villain he was also nicknamed hot rodpiper headlined several major payperview events he participated in the main events of wrestlemania i and wrestlemania x as a special guest referee in the latter never a world champion he nevertheless accumulated 34 championships in various promotions d

In [None]:
nn = NearestNeighbors(metric='euclidean')
nn.fit(word_weight)

NearestNeighbors(algorithm='auto', leaf_size=30, metric='euclidean',
                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                 radius=1.0)

In [None]:
roddy_index = people_df[people_df['name'] == 'Roddy Piper'].index[0]
roddy_index

32819

In [None]:
distances, indices = nn.kneighbors(word_weight[roddy_index], n_neighbors=11)

In [None]:
distances

array([[ 0.        , 15.32970972, 15.39480432, 15.39480432, 15.45962483,
        15.90597372, 16.09347694, 16.21727474, 16.21727474, 16.21727474,
        16.2788206 ]])

In [None]:
indices

array([[32819, 15845,  2037, 18432, 19757, 21038, 35633, 35976,  3689,
        28802, 32308]])
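The distances above are Euclidean distances between rows of `word_weight`, which is built earlier in the notebook. As a rough illustration of what that ranking does, the sketch below computes the same kind of distance by hand on raw word counts. Using raw counts is an assumption (the actual weighting in `word_weight` may differ), and the helper names and toy texts are hypothetical:

```python
from collections import Counter
from math import sqrt


def word_counts(text):
    """Bag-of-words counts for a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def euclidean_distance(counts_a, counts_b):
    """Euclidean distance between two count vectors over their joint vocabulary."""
    vocabulary = set(counts_a) | set(counts_b)
    return sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in vocabulary))


def nearest(query_index, texts, k):
    """Return the k closest (distance, index) pairs, including the query itself."""
    vectors = [word_counts(t) for t in texts]
    query = vectors[query_index]
    scored = sorted(
        (euclidean_distance(query, v), i) for i, v in enumerate(vectors)
    )
    return scored[:k]


toy_texts = [
    "professional wrestler and actor",
    "professional wrestler and trainer",
    "quebec actress author and theatre director",
]
neighbors = nearest(0, toy_texts, k=2)
print(neighbors)
```

As in the notebook output, the first neighbor is the query itself at distance 0. Euclidean distance on word-based vectors is sensitive to document length as well as vocabulary, which may be one reason some of the returned neighbors are not wrestlers.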

In [None]:
people_df.iloc[indices[0],:]

Unnamed: 0,URI,name,text
32819,<http://dbpedia.org/resource/Roddy_Piper>,Roddy Piper,roderick roddy george toombs born april 17 195...
15845,<http://dbpedia.org/resource/Lita_(wrestler)>,Lita (wrestler),amy christine dumas born april 14 1975 better ...
2037,<http://dbpedia.org/resource/William_Regal>,William Regal,darren kenneth matthews born 10 may 1968 is a ...
18432,<http://dbpedia.org/resource/Jake_Roberts>,Jake Roberts,aurelian smith jr born may 30 1955 is a semire...
19757,<http://dbpedia.org/resource/Sable_(wrestler)>,Sable (wrestler),rena marlette lesnar born rena greek born augu...
21038,<http://dbpedia.org/resource/Trish_Stratus>,Trish Stratus,patricia anne stratigias strtdis born december...
35633,<http://dbpedia.org/resource/Marie_Brassard>,Marie Brassard,marie brassard is a quebec actress author and ...
35976,<http://dbpedia.org/resource/Rocky_Dzidzornu>,Rocky Dzidzornu,rocky dzidzornu also known as rocky dijon is a...
3689,<http://dbpedia.org/resource/Ralph_McTell>,Ralph McTell,ralph mctell born ralph may 3 december 1944 is...
28802,<http://dbpedia.org/resource/The_Undertaker>,The Undertaker,mark william calaway born march 24 1965 better...


In [None]:
people_df.iloc[2037]['text']

'darren kenneth matthews born 10 may 1968 is a semiretired english professional wrestler trainer talent scout author and color commentator he is currently signed to wwe under the ring name william regal he is also known for his time in wcw under the ring name steven regal having started his career on a rare surviving carnival booth in britain he moved on to wrestle for nationallevel promotions on the british wrestling circuit and on british television he then progressed to touring around the world in countries such as germany and south africa before being called up to wcw in 1993in 2000 after leaving wcw matthews joined the world wrestling federation later world wrestling entertainment wwe where he became commissioner more recently he has been general manager of raw the 2008 king of the ring and the official match coordinator for nxt redemption in 2011 he is currently the general manager of nxt he has achieved considerable championship success in professional wrestling although he has 

In [None]:
people_df.iloc[18432]['text']

'aurelian smith jr born may 30 1955 is a semiretired american professional wrestler and the son of former wrestler aurelian grizzly smith he is best known by his ring name of jake the snake roberts and often brought snakes into the ring most famously a python named damienroberts is best known for his two stints in the world wrestling federationthe first between 1986 and 1992 and the second between 1996 and 1997 he wrestled in the national wrestling alliance in 1983 world championship wrestling in 1992 and the mexicobased asistencia asesora y administracin between 1993 and 1994 and again in 1997 he appeared in extreme championship wrestling during the summer of 1997 and made appearances for total nonstop action wrestling from 2006 through 2008throughout his career he was known for his intense and cerebral promos his dark charisma his extensive use of psychology in his matches and has been credited as inventor of the ddt maneuver which has been named the coolest maneuver of all time by t

In [None]:
people_df.iloc[21038]['text']

'patricia anne stratigias strtdis born december 18 1975 is a canadian retired professional wrestler former fitness model and fitness guru actress and television personality better known by her stage name and former ring name trish stratus she is best known for her tenure with wweafter beginning her career as a fitness model stratus began working for the world wrestling federation wwf which was later renamed world wrestling entertainment wwe early in her career she was involved in sexually themed storylines such as managing the team t a and an affair with vince mcmahon as stratus spent more time in the ring her perceived wrestling skills strengthened and her popularity increased because of this she was made a onetime wwe hardcore champion threetime wwe babe of the year and was proclaimed diva of the decade after nearly seven years in the business stratus retired from professional wrestling at wwe unforgiven on september 17 2006 after winning her seventh wwe womens championship the most 

In [None]:
people_df.iloc[35633]['text']

'marie brassard is a quebec actress author and theatre director living in montreal for many years her professional endeavors were closely linked with robert lepage under his direction she participated along with other artists in the creation of several plays and films among them the dragons trilogy polygraph the seven streams of the river ota and geometry of miraclesin 2001 she created her first solo production jimmy within the framework of the festival de thtre des amriques now festival transamriques in the meantime she has produced six other works the darkness peepshow the glass eye with louis negin the invisible me talking to myself in the future and trieste in which she has continued to experiment with technology and explore the many ways with which sound can be manipulated in theater by interlacing voices and music and traversing the planes of reality she leads audiences to a world where the boundaries between public and private dissolve and the relationship between human beings a