# ADS 509 Sentiment Assignment

This notebook holds the Sentiment Assignment for Module 6 in ADS 509, Applied Text Mining. Work through this notebook, writing code and answering questions where required. 

In a previous assignment you put together Twitter data and lyrics data on two artists. In this assignment we apply sentiment analysis to those data sets. If, for some reason, you did not complete that previous assignment, data to use for this assignment can be found in the assignment materials section of Blackboard. 


## General Assignment Instructions

These instructions are included in every assignment, to remind you of the coding standards for the class. Feel free to delete this cell after reading it. 

One sign of mature code is conforming to a style guide. We recommend the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html). If you use a different style guide, please include a cell with a link. 

Your code should be relatively easy-to-read, sensibly commented, and clean. Writing code is a messy process, so please be sure to edit your final submission. Remove any cells that are not needed or parts of cells that contain unnecessary code. Remove inessential `import` statements and make sure that all such statements are moved into the designated cell. 

Make use of non-code cells for written commentary. These cells should be grammatical and clearly written. In some of these cells you will have questions to answer. The questions will be marked by a "Q:" and will have a corresponding "A:" spot for you. *Make sure to answer every question marked with a `Q:` for full credit.* 


In [39]:
import os
import regex as re   # stack overflow suggestion for REGEX error encountered
import emoji
import pandas as pd
import numpy as np
import random


from pprintpp import pprint
from collections import Counter, defaultdict
from string import punctuation

from nltk.corpus import stopwords

sw = stopwords.words("english")

In [2]:
# Add any additional import statements you need here

import textacy.preprocessing as tprep
from lexical_diversity import lex_div as ld
import spacy

from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_infix_regex, compile_suffix_regex 

nlp = spacy.load('en_core_web_sm')

import string
from pprintpp import pprint

import seaborn as sns
import plotly.express as px


In [3]:
# change `data_location` to the location of the folder on your machine.

data_location = "/Users/dunya/Desktop/mod6"

twitter_folder = f"{data_location}/twitter/"
lyrics_folder = f"{data_location}/lyrics"

artist_files = {'cher':'cher_followers_data.txt',
                'robyn':'robynkonichiwa_followers_data.txt'}

positive_words_file = "positive-words.txt"
negative_words_file = "negative-words.txt"
tidy_text_file = "tidytext_sentiments.txt"

## Data Input

Now read in each of the corpora. For the lyrics data, it may be convenient to store the entire contents of the file to make it easier to inspect the titles individually, as you'll do in the last part of the assignment. In the solution, I stored the lyrics data in a dictionary with two dimensions of keys: artist and song. The value was the file contents. A Pandas data frame would work equally well. 

For the Twitter data, we only need the description field for this assignment. Feel free all the descriptions read it into a data structure. In the solution, I stored the descriptions as a dictionary of lists, with the key being the artist. 




In [4]:
# Read in the lyrics data

os.listdir(lyrics_folder) 
  
# Helper to read text file

def read_text_file(file_path):
    with open(file_path, 'r') as f:
        return f.read()
   
# Function to iterate through all files

def load_songs(artist):
    path = f"{lyrics_folder}/{artist}/"                          # designate path
    os.chdir(path)                                               # change to path directory
    song_list = []
    for file in os.listdir():
        if file.endswith(".txt"):                                # check for text format
            songs_dict = {}
            file_path = f"{path}{file}"
            path_components = file_path.split('/')
            songs_dict["title"] = path_components[-1]            # get title from file path
            songs_dict["artist"] = path_components[-2]           # get artist from file path
            songs_dict["lyrics"] = read_text_file(file_path)     # extract lyrics
            song_list.append(songs_dict)
    return song_list

In [5]:
# Read in the lyrics data using load_songs function above

artist_names = ["cher", "robyn"]

read_cher = pd.DataFrame(load_songs("cher"))
read_robyn = pd.DataFrame(load_songs("robyn"))

# Convert list of dictionaries to pandas

lyrics_data = pd.concat([read_cher, read_robyn])

In [6]:
# Read in the twitter data

# Cher

twitter_data = pd.read_csv(twitter_folder + artist_files['cher'],
                           sep="\t",
                           quoting=3)

twitter_data['artist'] = "cher"

# Robyn

twitter_data_2 = pd.read_csv(twitter_folder + artist_files['robyn'],
                             sep="\t",
                             quoting=3)
twitter_data_2['artist'] = "robyn"

# Merge data

twitter_data = pd.concat([
    twitter_data,twitter_data_2])
    
del(twitter_data_2)

In [7]:
# Read in the positive and negative words and the
# tidytext sentiment. Store these so that the positive
# words are associated with a score of +1 and negative words
# are associated with a score of -1. You can use a dataframe or a 
# dictionary for this.

# Function to load + / - / tidy

def load_tidy():
    tidy_dict = {}
    path = f"{data_location}/{tidy_text_file}"     # designate tidy path
    
    with open(path, 'r') as f:
        # columns = ['word','sentiment', 'lexicon']
        tidy_df = pd.read_table(path, usecols = ['word','sentiment', 'lexicon']) 
        # TO_DICT
    return tidy_df

# Function to assign + / -

def assign_score(sentiment):
    if sentiment == "negative":
        return -1
    else:
        return 1


In [8]:
# Read in the tidy data using the load_tidy function

tidy_df = load_tidy()

# Apply the assign_score function

tidy_df["score"] = tidy_df["sentiment"].apply(assign_score)

display(tidy_df)

Unnamed: 0,word,sentiment,lexicon,score
0,abandon,negative,nrc,-1
1,abandoned,negative,nrc,-1
2,abandonment,negative,nrc,-1
3,abba,positive,nrc,1
4,abduction,negative,nrc,-1
...,...,...,...,...
15128,win,positive,loughran,1
15129,winner,positive,loughran,1
15130,winners,positive,loughran,1
15131,winning,positive,loughran,1


In [9]:
# Pipeline functions

# Some punctuation variations
punctuation = set(punctuation) # speeds up comparison
tw_punct = punctuation - {"#"}

# Two useful regex
whitespace_pattern = re.compile(r"\s+")
hashtag_pattern = re.compile(r"^#[0-9a-zA-Z]+")

def remove_stop(tokens) :
    # modify this function to remove stopwords
    stop_words = spacy.lang.en.STOP_WORDS                        # load stop words

    potential_stop_words = [ '', 'im', 'like',                   
                            'dont', 'got', 'cause',              # added extra stop words
                            'wanna', 'youre']
    
    for wrd in potential_stop_words:
        stop_words.add(wrd)

    removed = [w for w in tokens if not w in stop_words]         # remove stop words
    return(removed)

 
def remove_punctuation(text, punct_set=punctuation) : 
    return("".join([ch for ch in text if ch not in punct_set]))

def tokenize(text) : 
    """ Splitting on whitespace rather than the book's tokenize function. That 
        function will drop tokens like '#hashtag' or '2A', which we need for Twitter. """
    
    # modify this function to return tokens
    collapse_whitespace = re.compile(r'\s+')
    return([item.lower() for item in collapse_whitespace.split(text)])  # using Module 2
    
def prepare(text, pipeline) : 
    tokens = str(text)
    
    for transform in pipeline : 
        tokens = transform(tokens)
        
    return(tokens)


In [10]:
# Create pipline using functions above

my_pipeline = [str.lower, remove_punctuation, tokenize, remove_stop]

# Apply piplines functions to generate tokens from text

lyrics_data["tokens"] = lyrics_data["lyrics"].apply(prepare,pipeline = my_pipeline)
twitter_data["tokens"] = twitter_data["description"].apply(prepare,pipeline = my_pipeline)



## Sentiment Analysis on Songs

In this section, score the sentiment for all the songs for both artists in your data set. Score the sentiment by manually calculating the sentiment using the combined lexicons provided in this repository. 

After you have calculated these sentiments, answer the questions at the end of this section.


In [11]:
# Score sentiment for all the songs of both artists

# for artist -> song -> tokenize -> calc_sentiment

test_arr = ['come','stay','ill','send','away','false','pride','ill',
            'forsake','life','yes','ill','true','true','youll','come',
            'stay','lovers','past','ill','leave','theyll','mind','ill',
            'youll','feel','free','youll','come','stay','promise',
            'faithfully','ill','decide','leave','ill','try','need',
            'youll','come','stay','yes','ill','true','true','youll',
            'come','stay','live','life','known','know','think','hardly',
            'grown','oh','thank','god','finally','gonna','stay','gonna','stay']

# Function to calculate sentiment from token array

def calc_sentiment(token_arr):
    total_sent = 0
    for token in token_arr:
        df = tidy_df.loc[tidy_df['word'] == token]  # find word
        if len(df) == 0:
            total_sent += 0                     # if not found, add 0
        else:
            val = df['score'].tolist()          # if found, get value
            total_sent += val[0]                # add first value
    return total_sent


In [12]:
# Apply sentiment calculation to lyrics data

lyrics_data["song_sentiment"] = lyrics_data["tokens"].apply(calc_sentiment)

lyrics_data.head()

Unnamed: 0,title,artist,lyrics,tokens,song_sentiment
0,cher_comeandstaywithme.txt,cher,"""Come And Stay With Me""\n\n\n\nI'll send away ...","[come, stay, ill, send, away, false, pride, il...",-1
1,cher_pirate.txt,cher,"""Pirate""\n\n\n\nHe'll sail on with the summer ...","[pirate, hell, sail, summer, wind, blows, day,...",7
2,cher_stars.txt,cher,"""Stars""\n\n\n\nI was never one for saying what...","[stars, saying, feel, tonight, bringing, know,...",4
3,cher_thesedays.txt,cher,"""These Days""\n\n\n\nWell I've been out walking...","[days, ive, walking, talking, days, days, days...",-8
4,cher_lovesohigh.txt,cher,"""Love So High""\n\n\n\nEvery morning I would wa...","[love, high, morning, wake, id, tie, sun, cup,...",12


### Questions

Q: Overall, which artist has the higher average sentiment per song? 

In [13]:
# Create artist dfs

cher_lyrics = lyrics_data[lyrics_data["artist"] == "cher"]
robyn_lyrics = lyrics_data[lyrics_data["artist"] == "robyn"]

# Get sentiment averages 

print("Cher Average Sentiment: ", cher_lyrics["song_sentiment"].mean())
print("Robyn Average Sentiment: ", robyn_lyrics["song_sentiment"].mean())


Cher Average Sentiment:  4.113924050632911
Robyn Average Sentiment:  6.009615384615385


A: Based on the calculations above, Robyn has about 50% more sentiment than Cher on average.

---

Q: For your first artist, what are the three songs that have the highest and lowest sentiments? Print the lyrics of those songs to the screen. What do you think is driving the sentiment score? 

In [14]:
# Organize Cher df by sentiment score

lowest_sents = cher_lyrics.sort_values(by="song_sentiment", ascending=True)[0:3]
highest_sents = cher_lyrics.sort_values(by="song_sentiment", ascending=False)[0:3]

# Display lyrics

i = 0
print("|=========== Cher's Highest Sentiment Songs: ==========|\n")
display(highest_sents)
while i < 3:
    highest_lyrics = highest_sents["lyrics"].tolist()
    print(highest_lyrics[i][0:250], "\n")   # capped at 250 characters
    i += 1

i = 0
print("|=========== Cher's Lowest Sentiment Songs: ===========|\n")
display(lowest_sents)
while i < 3:
    lowest_lyrics = lowest_sents["lyrics"].tolist()
    print(lowest_lyrics[i][0:250], "\n")    # capped at 250 characters
    i += 1





Unnamed: 0,title,artist,lyrics,tokens,song_sentiment
234,cher_ifoundyoulove.txt,cher,"""I Found You Love""\n\n\n\nWell I was looking f...","[found, love, looking, new, love, different, k...",46
103,cher_perfection.txt,cher,"""Perfection""\n\n\n\nHush little Baby, gotta be...","[perfection, hush, little, baby, gotta, strong...",45
15,cher_mylove.txt,cher,"""My Love""\n\n\n\nWhen I go away\nI know my hea...","[love, away, know, heart, stay, love, understo...",44


"I Found You Love"



Well I was looking for a new love, a different kind of true love
Who's gonna treat me right, all day and night
Hey baby I've been looking too
And I have found there's
No other love from me but you
Well I was looking for a new lo 

"Perfection"



Hush little Baby, gotta be strong
'Cause in this world we are born to fight
Be the best, prove them wrong
A winner's work is never done, reach the top, number one

Oh, perfection
You drive me crazy with perfection
I've worn my pride a 

"My Love"



When I go away
I know my heart can stay with my love
It's understood
Everywhere with my love
My love does it good, whoa
My love, oh only my love
My love does it good

And when the cupboard's bare
I'll still find something there with my l 




Unnamed: 0,title,artist,lyrics,tokens,song_sentiment
41,cher_bangbang.txt,cher,"""Bang-Bang""\n\n\n\nBang bang you shot me down\...","[bangbang, bang, bang, shot, bang, bang, hit, ...",-71
195,cher_bangbangmybabyshotmedown.txt,cher,"""Bang Bang (My Baby Shot Me Down)""\n\n\n\nI wa...","[bang, bang, baby, shot, rode, horses, sticks,...",-35
158,cher_outrageous.txt,cher,"""Outrageous""\n\n\n\nOutrageous, outrageous\n(T...","[outrageous, outrageous, outrageous, outrageou...",-33


"Bang-Bang"



Bang bang you shot me down
Bang bang I hit the ground
Bang bang that awful sound
Bang bang my baby shot me down

I was five and you were six
We rode on horses made of sticks
I wore black you wore white
You would always win the fight

B 

"Bang Bang (My Baby Shot Me Down)"



I was five and he was six
We rode on horses made of sticks
He wore black and I wore white
He would always win the fight

Bang bang, he shot me down
Bang bang, I hit the ground
Bang bang, that awful sound
Bang ban 

"Outrageous"



Outrageous, outrageous
(They say) I'm outrageous
It's the rage

I'm gonna wear what I will and spend some
And I will be dress to kill don'tcha know
And when the lights come up
I'm ready I'm ready
To put on a show with class
And if I c 




A: Cher's highest sentiment songs are: (1) "I Found Love", (2) "Perfection", and (3) "My Love". For these three songs I believe the frequent use of positive words like `love`, `perfection`, and `best` are driving the sentiment up. Cher's lowest sentiment songs are: (1) "Bang Bang", (2) "Bang Bang (My Baby Shot Me Down)", and (3) "Outrageous". It seems the first two songs are different version of the same thing. The constant use of the word `bang` is definitely responsible for the low sentiment in these two. The first version has two times lower sentiment than the second, it seems thats because the chorus is used more frequently there.

---


Q: For your second artist, what are the three songs that have the highest and lowest sentiments? Print the lyrics of those songs to the screen. What do you think is driving the sentiment score? 

In [15]:
# Organize Robyn df by sentiment score

lowest_sents = robyn_lyrics.sort_values(by="song_sentiment", ascending=True)[0:3]
highest_sents = robyn_lyrics.sort_values(by="song_sentiment", ascending=False)[0:3]

# Display lyrics

i = 0
print("|=========== Robyn's Highest Sentiment Songs: ==========|\n")
display(highest_sents)
while i < 3:
    highest_lyrics = highest_sents["lyrics"].tolist()
    print(highest_lyrics[i][0:250], "\n")   # capped at 250 characters
    i += 1

i = 0
print("|=========== Robyn's Lowest Sentiment Songs: ===========|\n")
display(lowest_sents)
while i < 3:
    lowest_lyrics = lowest_sents["lyrics"].tolist()
    print(lowest_lyrics[i][0:250], "\n")    # capped at 250 characters
    i += 1




Unnamed: 0,title,artist,lyrics,tokens,song_sentiment
21,robyn_loveisfree.txt,robyn,"""Love Is Free""\n\n\n\nFree\nLove is free, baby...","[love, free, free, love, free, baby, free, lov...",121
50,robyn_wedancetothebeat114528.txt,robyn,"""We Dance To The Beat""\n\n\n\nWe dance to the ...","[dance, beat, dance, beat, dance, beat, dance,...",67
98,robyn_wedancetothebeat.txt,robyn,"""We Dance To The Beat""\n\n\n\nWe dance to the ...","[dance, beat, dance, beat, dance, beat, dance,...",67


"Love Is Free"



Free
Love is free, baby
Free
Love is free, baby
Boom boom boom boom boom chica boom
Let me give it to you, baby
Chica boom chica boom chica boom
Chica boom chica boom chica boom
Free
Love is free, baby
Free
Love is free, baby
Boom b 

"We Dance To The Beat"



We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to th 

"We Dance To The Beat"



We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to the beat
We dance to th 




Unnamed: 0,title,artist,lyrics,tokens,song_sentiment
53,robyn_dontfuckingtellmewhattodo.txt,robyn,"""Don't Fucking Tell Me What To Do""\n\n\n\nMy d...","[fucking, tell, drinking, killing, drinking, k...",-91
75,robyn_dontfuckingtellmewhattodo114520.txt,robyn,"""Don't Fucking Tell Me What To Do""\n\n\n\nMy d...","[fucking, tell, drinking, killing, drinking, k...",-91
16,robyn_criminalintent.txt,robyn,"""Criminal Intent""\n\n\n\nSomebody alert the au...","[criminal, intent, somebody, alert, authoritie...",-53


"Don't Fucking Tell Me What To Do"



My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My d 

"Don't Fucking Tell Me What To Do"



My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My drinking is killing me
My d 

"Criminal Intent"



Somebody alert the authorities, I got criminal intent
Conspiracy to engage in lewd and indecent acts and events
I'mma wind it, grind it, oh my, I'mma say it again
Somebody alert the authorities, she's got criminal intent

Somebod 




A: Both the highest and lowest sentiment songs for Robyn seem to contain duplicates or slightly different versions of the same song. Robyn's highest sentiment songs are: (1) "Love is Free", (2) "We Dance to the Beat 114528", and (3) "We Dance to the Beat". The word `love` is the theme of one song so that may be a reason for high sentiment. The word `dance` is used throughout the chorus of two songs which definitely increases sentiment. Robyn's lowest sentiment songs are: (1) "Dont F!@$ing Tell Me What To Do", (2) "Dont F!@$ing Tell Me What To Do 114520", and (3) "Outrageous". The swear words are definitely culprits in low sentiment. Aside from that, words like `killing` and `criminal` drive sentiment down.

---

Q: Plot the distributions of the sentiment scores for both artists. You can use `seaborn` to plot densities or plot histograms in matplotlib.

In [16]:
# Create sentiment plots

# Seaborn plot

color = sns.color_palette()

fig = px.histogram(cher_lyrics, x="song_sentiment",  width=800, height=400)
fig.update_traces(marker_color="pink",marker_line_color='purple',
                  marker_line_width=1.5)
fig.update_layout(title_text='Cher Song Sentiment Distributions')
fig.show()

fig2 = px.histogram(robyn_lyrics, x="song_sentiment",  width=800, height=400)
fig2.update_traces(marker_color="turquoise",marker_line_color='green',
                  marker_line_width=1.5)
fig2.update_layout(title_text='Robyn Song Sentiment Distributions')
fig2.show()

The songs tend to sway a little towards the positive side when it comes to sentiment, but generally stay neutral on average.

## Sentiment Analysis on Twitter Descriptions

In this section, define two sets of emojis you designate as positive and negative. Make sure to have at least 10 emojis per set. You can learn about the most popular emojis on Twitter at [the emojitracker](https://emojitracker.com/). 

Associate your positive emojis with a score of +1, negative with -1. Score the average sentiment of your two artists based on the Twitter descriptions of their followers. The average sentiment can just be the total score divided by number of followers. You do not need to calculate sentiment on non-emoji content for this section.

In [17]:
# Define positive emoji attributes

pos_words = [
    '💙', '🌞', '♥', '🌈', '❤️', 
    '💜',  '✨', '😊', '🙌', '🤗'
    ]
pos_sentiments = ['positive'] * 10
pos_lexicons = ['EMJ'] * 10
pos_scores = [1] * 10

positive_emojis = pd.DataFrame(
    {'word': pos_words,
     'sentiment': pos_sentiments,
     'lexicon': pos_lexicons,
     'score': pos_scores
    })

# Define negative emoji attributes

neg_words = [
    '🥺', '😞', '😠', '💢', '😤', 
    '😭',  '😢', '🖕', '😒', '🌧️'
    ]
neg_sentiments = ['negative'] * 10
neg_lexicons = ['EMJ'] * 10
neg_scores = [-1] * 10

negative_emojis = pd.DataFrame(
    {'word': neg_words,
     'sentiment': neg_sentiments,
     'lexicon': neg_lexicons,
     'score': neg_scores
    })

# Merge positive and negative df

emoji_df = pd.concat([positive_emojis, negative_emojis], axis=0)
emoji_df = emoji_df.reset_index()
display(emoji_df)


Unnamed: 0,index,word,sentiment,lexicon,score
0,0,💙,positive,EMJ,1
1,1,🌞,positive,EMJ,1
2,2,♥,positive,EMJ,1
3,3,🌈,positive,EMJ,1
4,4,❤️,positive,EMJ,1
5,5,💜,positive,EMJ,1
6,6,✨,positive,EMJ,1
7,7,😊,positive,EMJ,1
8,8,🙌,positive,EMJ,1
9,9,🤗,positive,EMJ,1


In [18]:
# Extend tidy_df to include emojis

print("Tidy DF Length Before Emojis: ", len(tidy_df))
tidy_df = pd.concat([tidy_df, emoji_df], axis=0)
print("Tidy DF Length After Emojis: ", len(tidy_df))

Tidy DF Length Before Emojis:  15133
Tidy DF Length After Emojis:  15153


In [27]:
# Sample 100,000 tweets

tweet_data = twitter_data.sample(100000)

# Calculate sentiment scores

tweet_data["song_sentiment"] = tweet_data["tokens"].apply(calc_sentiment)
tweet_data.head()


100000


Q: What is the average sentiment of your two artists? 

In [28]:
# Split tweets by artist

cher_tweets = tweet_data[tweet_data["artist"] == "cher"]
robyn_tweets = tweet_data[tweet_data["artist"] == "robyn"]

# Get sentiment averages 

print("Cher Average Sentiment: ", cher_tweets["song_sentiment"].mean())
print("Robyn Average Sentiment: ", robyn_tweets["song_sentiment"].mean())

Cher Average Sentiment:  0.36203513973056634
Robyn Average Sentiment:  0.32586039747939893


A: Based on the calculations displayed above, the average tweet sentiment for Cher is 0.36 and the average tweet sentiment for Robyn is 0.32. Tweets of Cher's followers tend to be slightly more positive.

---

Q: Which positive emoji is the most popular for each artist? Which negative emoji? 

In [57]:
# Function to check for emojis

def contains_emoji(s):
    s = str(s)
    emojis = [ch for ch in s if emoji.is_emoji(ch)]
    return(len(emojis) > 0)

# Function to get 25 most common emojis by artist

def emoji_counter(token_arr):
    word_counts = Counter(token_arr[0]).most_common()      # obtain word counts
    emojis = []
    for tup in word_counts:
        if emoji.is_emoji(tup[0]):      # use emoji library to check tuples
            emojis.append(tup)          # store if emoji
    return emojis[0:25]

# Check for present emojis in twitter sample

cher_tweets['has_emoji'] = cher_tweets["description"].apply(contains_emoji)
cher_tweets_ch = cher_tweets[cher_tweets['has_emoji']== True]
cher_tokens = cher_tweets["tokens"].tolist()
robyn_tweets['has_emoji'] = robyn_tweets["description"].apply(contains_emoji)
robyn_tweets_ch = robyn_tweets[robyn_tweets['has_emoji']== True]
robyn_tokens = robyn_tweets["tokens"].tolist()

# Print data rows with emoji

display(cher_tweets_ch["description"])
print("\n")
display(robyn_tweets_ch["description"])




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



908960     I don't consider myself a particularly ethical...
4088       F-Bomb Mom with a Beautiful Autistic Daughter ...
1348759                                      #Directioner_👑❤
2333966    Gender Fluid. 🔪Top Surgery 8/18/20🔪 Drag King-...
1023650    SupaFriend and SupaFan. “My flow is legendary ...
                                 ...                        
726469     Wide range of interests, shallow depth of know...
986286     20 살 (años) Venezuelan koren boy, the mixture ...
661885     I love Taylor Swift since I was a junior high....
3549413    I Am a Proud Disturbed, Yellowstone, Walking D...
1721229                                                    🌱
Name: description, Length: 8663, dtype: object





351542    Mufc for life, proud to be irish , #BigBangThe...
239817    🇵🇭👨🏻‍⚕️Medicine &👨🏻‍🔬Microbiomes. Perpetually ...
190515                     💖🍃Federica va, anche su Twitter.
30826     “I was gonna punch you but I’m holding wine.” ...
283578    Beautiful jewellery created & set just like th...
                                ...                        
43026     Music lover. Welsh 🏳️‍🌈 ❤️ 📻 Studio Buttons & ...
30466     enjoys long, romantic walks down Target and TJ...
186596    Hi guys, this is my another account, my first ...
299611    Hair & Makeup by Caitlyn Meyer ✨ Professional ...
33927                   Nolite Te Bastardes Carborundorum 🌹
Name: description, Length: 590, dtype: object

In [58]:
# Print Emoji counter

cher_em = emoji_counter(cher_tokens)
rob_em = emoji_counter(robyn_tokens)

print("Cher Emojis: \n")
pprint(cher_em)
print("Robyn Emojis: \n")
pprint(rob_em)

Cher Emojis: 

[]
Robyn Emojis: 

[]



A: Based on my selected list of positive and negative emojis, none were present in either the Cher or Robyn twitter dataframes from the 100,000 tweets sampled.