## How are people feeling about AI and its impact in the workplace?
### Analysing the cleaned text and performing sentiment analysis.
<hr>
After finding the relevant tokens for each website, I prepared the text for analysis with the nltk.sentiment.vader library.
One issue I ran into at this stage was getting string operations to work on a data frame column. I found that converting the column of tokens into a numpy array first let me apply str/list functions to each entry.

In [14]:
import requests
import pandas as pd
import nltk
nltk.download('stopwords')
nltk.download('vader_lexicon')
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from nltk.sentiment import SentimentIntensityAnalyzer
import string
import numpy as np
import re

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Diana\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Diana\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Load my previously built CSV
This CSV contains:
- The web_id: an index that numbers each website in the file
- Links: the links I previously scraped
- Tokens: the cleaned text I extracted from each link

In [15]:
dataFrame = pd.read_csv("Cleaned_data.csv")
#dataFrame.drop('Unnamed: 0', inplace=True, axis =1)
dataFrame.rename(columns={'Unnamed: 0': 'web_id'}, inplace = True)
dataFrame

Unnamed: 0,web_id,links,tokens
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'..."
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ..."
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit..."
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '..."
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '..."
...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ..."
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'..."
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped..."
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin..."


### Creating a function that prepares the text to be analysed
Because the 'tokens' column was read from a CSV, each entry is the string form of a list of tokens. I iterate over the entries and remove the commas and the quotation marks, so that each one becomes a long string that can be analysed.

In [16]:
# Convert the dataFrame column into a numpy array
tokenList = dataFrame['tokens'].to_numpy()
#print(tokenList)

def freedTokens(tokenList):
    # Each entry is the string form of one link's token list
    modifiedList = []

    for token in tokenList:
        #print(token)
        # Strip the commas and single quotes; the square brackets are kept
        remove = re.sub(r'[,\']', '', token)
        modifiedList.append(remove)
    return modifiedList

#print(freedTokens(tokenList))
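
Since each entry is the text form of a Python list, another option is to parse it back into a real list and join the tokens with spaces, which also drops the square brackets. The following is a minimal sketch, not the method used in this notebook; `tokens_to_text` is a hypothetical helper name, and it assumes every entry is a valid list literal.

```python
import ast

def tokens_to_text(entry):
    """Parse the string form of a token list and join the tokens with spaces."""
    tokens = ast.literal_eval(entry)  # "['a', 'b']" -> ['a', 'b']
    return ' '.join(tokens)

# Sample entry shaped like the values in the 'tokens' column
sample = "['pause', 'giant', 'ai', 'experiments']"
print(tokens_to_text(sample))
```

With the data frame above, this could be applied as `dataFrame['tokens'].apply(tokens_to_text)`.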

In [17]:
#dataFrame['stringToken'] = dataFrame['tokens'].apply(lambda tokenList: freedTokens(tokenList))
dataFrame['stringToken'] = freedTokens(tokenList)
dataFrame

Unnamed: 0,web_id,links,tokens,stringToken
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'...",[pause giant ai experiments open letter future...
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ...",[artificial future life institute skip content...
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit...",[ai principles future life institute skip cont...
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '...",[planning agi beyond closesearch submit skip m...
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '...",[faqs fli open letter calling pause giant ai e...
...,...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ...",[ai automation future work ten things solve te...
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'...",[advantages ai workplace advantages ai workpla...
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped...",[workplace impact artificial wikipedia jump co...
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin...",[hr def â\x80\x9csmartâ\x80\x9d using artifici...


### Sentiment analysis
After preparing the text, I now find the 'positive', 'negative', and 'neutral' scores for each link, which are the proportions of the text that VADER rates in each category.
At the end I find the 'compound' value, a single normalised score between -1 and 1 that summarises the whole text. The compound value helps me to understand the general sentiment present in the text:
- If the compound value is more than 0, the sentiment is positive
- If the compound value is less than 0, the sentiment is negative
- If the compound value is equal to 0, the sentiment is neutral
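
The rule above can be written as a small helper. This is a sketch: `label_compound` and its `threshold` parameter are hypothetical names, not part of NLTK. With the default threshold of 0 it follows the rule used here; the VADER authors suggest a threshold of around 0.05, so that scores very close to zero count as neutral.

```python
def label_compound(compound, threshold=0.0):
    """Map a VADER compound score (between -1 and 1) to a sentiment label."""
    if compound > threshold:
        return 'positive'
    if compound < -threshold:
        return 'negative'
    return 'neutral'

# Illustrative scores only; -0.42 is a made-up value
for score in (0.9980, -0.42, 0.0):
    print(score, label_compound(score))
```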

In [18]:
sentiment = SentimentIntensityAnalyzer()

dataFrame['positive_sentiment'] = dataFrame['stringToken'].apply(lambda x: sentiment.polarity_scores(''.join(x))['pos'])
dataFrame

Unnamed: 0,web_id,links,tokens,stringToken,positive_sentiment
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'...",[pause giant ai experiments open letter future...,0.180
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ...",[artificial future life institute skip content...,0.167
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit...",[ai principles future life institute skip cont...,0.224
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '...",[planning agi beyond closesearch submit skip m...,0.261
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '...",[faqs fli open letter calling pause giant ai e...,0.185
...,...,...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ...",[ai automation future work ten things solve te...,0.207
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'...",[advantages ai workplace advantages ai workpla...,0.218
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped...",[workplace impact artificial wikipedia jump co...,0.120
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin...",[hr def â\x80\x9csmartâ\x80\x9d using artifici...,0.149


In [19]:
dataFrame['negative_sentiment'] = dataFrame['stringToken'].apply(lambda x: sentiment.polarity_scores(''.join(x))['neg'])
dataFrame

Unnamed: 0,web_id,links,tokens,stringToken,positive_sentiment,negative_sentiment
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'...",[pause giant ai experiments open letter future...,0.180,0.096
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ...",[artificial future life institute skip content...,0.167,0.126
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit...",[ai principles future life institute skip cont...,0.224,0.078
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '...",[planning agi beyond closesearch submit skip m...,0.261,0.062
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '...",[faqs fli open letter calling pause giant ai e...,0.185,0.088
...,...,...,...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ...",[ai automation future work ten things solve te...,0.207,0.047
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'...",[advantages ai workplace advantages ai workpla...,0.218,0.047
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped...",[workplace impact artificial wikipedia jump co...,0.120,0.106
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin...",[hr def â\x80\x9csmartâ\x80\x9d using artifici...,0.149,0.048


In [20]:
dataFrame['neutral'] = dataFrame['stringToken'].apply(lambda x: sentiment.polarity_scores(''.join(x))['neu'])
dataFrame

Unnamed: 0,web_id,links,tokens,stringToken,positive_sentiment,negative_sentiment,neutral
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'...",[pause giant ai experiments open letter future...,0.180,0.096,0.724
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ...",[artificial future life institute skip content...,0.167,0.126,0.708
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit...",[ai principles future life institute skip cont...,0.224,0.078,0.699
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '...",[planning agi beyond closesearch submit skip m...,0.261,0.062,0.677
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '...",[faqs fli open letter calling pause giant ai e...,0.185,0.088,0.727
...,...,...,...,...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ...",[ai automation future work ten things solve te...,0.207,0.047,0.746
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'...",[advantages ai workplace advantages ai workpla...,0.218,0.047,0.735
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped...",[workplace impact artificial wikipedia jump co...,0.120,0.106,0.775
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin...",[hr def â\x80\x9csmartâ\x80\x9d using artifici...,0.149,0.048,0.803


In [21]:
dataFrame['overall_sentiment'] = dataFrame['stringToken'].apply(lambda x: sentiment.polarity_scores(''.join(x))['compound'])
dataFrame

Unnamed: 0,web_id,links,tokens,stringToken,positive_sentiment,negative_sentiment,neutral,overall_sentiment
0,0,https://futureoflife.org/open-letter/pause-gia...,"['pause', 'giant', 'ai', 'experiments', 'open'...",[pause giant ai experiments open letter future...,0.180,0.096,0.724,0.9980
1,1,https://futureoflife.org/cause-area/artificial...,"['artificial', 'future', 'life', 'institute', ...",[artificial future life institute skip content...,0.167,0.126,0.708,0.9897
2,2,https://futureoflife.org/open-letter/ai-princi...,"['ai', 'principles', 'future', 'life', 'instit...",[ai principles future life institute skip cont...,0.224,0.078,0.699,0.9992
3,3,https://openai.com/blog/planning-for-agi-and-b...,"['planning', 'agi', 'beyond', 'closesearch', '...",[planning agi beyond closesearch submit skip m...,0.261,0.062,0.677,0.9996
4,4,https://futureoflife.org/ai/faqs-about-flis-op...,"['faqs', 'fli', 'open', 'letter', 'calling', '...",[faqs fli open letter calling pause giant ai e...,0.185,0.088,0.727,0.9978
...,...,...,...,...,...,...,...,...
108,108,https://www.mckinsey.com/featured-insights/fut...,"['ai', 'automation', 'future', 'work', 'ten', ...",[ai automation future work ten things solve te...,0.207,0.047,0.746,0.9998
109,109,https://www.bbntimes.com/technology/advantages...,"['advantages', 'ai', 'workplace', 'advantages'...",[advantages ai workplace advantages ai workpla...,0.218,0.047,0.735,0.9995
110,110,https://en.wikipedia.org/wiki/Workplace_impact...,"['workplace', 'impact', 'artificial', 'wikiped...",[workplace impact artificial wikipedia jump co...,0.120,0.106,0.775,0.9905
111,111,https://www.akerman.com/en/perspectives/hr-def...,"['hr', 'def', 'â\x80\x9csmartâ\x80\x9d', 'usin...",[hr def â\x80\x9csmartâ\x80\x9d using artifici...,0.149,0.048,0.803,0.9978
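
The four cells above score every text once per column. A possible alternative is to score each text once and expand the resulting dictionary into columns. The sketch below assumes pandas is installed; `add_sentiment_columns` is a hypothetical helper, and `fake_scores` is a stand-in with made-up values so the example runs without the VADER lexicon. In this notebook, `sentiment.polarity_scores` would be passed instead.

```python
import pandas as pd

def add_sentiment_columns(frame, text_column, score_fn):
    """Score each text once and append one column per key returned by score_fn."""
    scores = pd.DataFrame(
        [score_fn(text) for text in frame[text_column]],
        index=frame.index,
    )
    return pd.concat([frame, scores], axis=1)

def fake_scores(text):
    # Stand-in for sentiment.polarity_scores; the values are made up.
    return {'neg': 0.1, 'neu': 0.7, 'pos': 0.2, 'compound': 0.5}

demo = pd.DataFrame({'stringToken': ['ai helps work', 'ai risk']})
print(add_sentiment_columns(demo, 'stringToken', fake_scores))
```

The new columns take the key names returned by the scoring function ('neg', 'neu', 'pos', 'compound'), so they would need renaming to match the column names used above.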


In [22]:
#dataFrame.to_csv('Sentiment_per_link_data.csv')

In [23]:
positive = len(dataFrame[dataFrame.overall_sentiment > 0])
negative = len(dataFrame[dataFrame.overall_sentiment < 0])
neutral = len(dataFrame[dataFrame.overall_sentiment == 0])
print(positive, negative, neutral)

107 5 1


### Plotting the overall sentiment for each link
Plotting the scores helps me not only to see the general sentiment of the different texts but also to identify possible problems with websites.
- I have found that very negative scores may come from websites that blocked the scraper, so the vocabulary of the returned text rates them as negative. This can be used to spot such a link and replace it with a more relevant resource.
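
One way to act on this is to list the links whose compound score falls below a cut-off so they can be checked by hand. This is a sketch only: `flag_low_scores` and the cut-off of -0.5 are assumptions for illustration, and the demo rows are made-up values rather than results from this analysis.

```python
import pandas as pd

def flag_low_scores(frame, cutoff=-0.5):
    """Return the rows whose overall_sentiment is below the cut-off, lowest first."""
    low = frame[frame['overall_sentiment'] < cutoff]
    return low.sort_values('overall_sentiment')

# Made-up rows that mimic the columns used in this notebook
demo = pd.DataFrame({
    'web_id': [0, 1, 2],
    'links': ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'],
    'overall_sentiment': [0.99, -0.91, -0.60],
})
print(flag_low_scores(demo)[['web_id', 'links', 'overall_sentiment']])
```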

In [24]:
import plotly.express as px
fig = px.scatter(dataFrame, x= 'web_id', y= 'overall_sentiment', color = 'overall_sentiment', color_continuous_scale=px.colors.sequential.Agsunset)
fig.show()

### Plotting the positive sentiment scores for each link

In [29]:
fig = px.scatter(dataFrame, x= 'web_id', y= 'positive_sentiment', color = 'positive_sentiment', color_continuous_scale=px.colors.sequential.Agsunset)
fig.show()

### Plotting the neutral sentiment scores for each link

In [28]:
fig = px.scatter(dataFrame, x= 'web_id', y= 'neutral', color = 'neutral', color_continuous_scale=px.colors.sequential.Agsunset)
fig.show()

### Finally I save my new data in a new CSV

In [25]:
#dataFrame.to_csv('Sentiment_per_link_data.csv')