## Gillette Ad Notebook


In [None]:
#This is how you install python packages into Jupyter Notebook
#The package we will be using to interface with Twitter is Python-Twitter
!pip install python-twitter

In [1]:
pip


The following command must be run outside of the IPython shell:

    $ pip 

The Python package manager (pip) can only be used from outside of IPython.
Please reissue the `pip` command in a separate terminal or command prompt.

See the Python documentation for more informations on how to install packages:

    https://docs.python.org/3/installing/
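
The message above appears because a bare `pip` was typed into a cell. Inside a notebook, `!pip install ...` (as in the first cell) works, and recent IPython versions also offer the `%pip install ...` magic. A minimal sketch of a third option, which ties the install to the interpreter behind the running kernel; the helper names `pip_install_command` and `install` are made up for illustration:

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this notebook."""
    # sys.executable is the Python binary behind the current kernel, so the
    # package lands in the same environment the notebook imports from.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it; call install("python-twitter") to install.
print(pip_install_command("python-twitter"))
```

Building the command separately from running it makes it easy to check what will be executed before anything is installed.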


In [7]:
#Importing packages and libraries. These packages are already installed on my computer, so I do not have to install them
#like python-twitter in the first cell. You can do this for any package you have installed and want to use in your notebook

#It's good practice to keep all of your imports together in the first or second cell, so you can see at a glance what the notebook depends on

import twitter #<- from the python-twitter package. A Python wrapper around the Twitter API. 
#This library provides a pure Python interface for the Twitter API. It works with Python versions from 2.7+ and Python 3.
#Twitter exposes a web services API and this library is intended to make it even easier for Python programmers to use.

#Further documentation: **https://github.com/bear/python-twitter**

import re #<--from the regular expression *re* package. A regular expression is a special sequence of characters that 
#helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. 

#Further documentation: **https://docs.python.org/3/library/re.html**

import string #<- from the string module, which provides common string constants such as string.punctuation (used below)

#Further documentation: **https://docs.python.org/3/library/string.html** 

import nltk #<- from the NLTK package. NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules,
#data sets, and tutorials supporting research and development in Natural Language Processing

#Further documentation: http://www.nltk.org/ &&  https://www.nltk.org/book/ch01.html

#Tokenization is the act of breaking up a sequence of strings into pieces such as words, keywords, phrases, symbols 
#and other elements called tokens.

from nltk.tokenize import word_tokenize #tokenizes and breaks down tokens into words
from nltk.corpus import stopwords #lists of common stopwords that can be filtered out; imported but not used this time
from nltk.stem import WordNetLemmatizer #a lemma is the canonical form of a set of words: the form of a word that 
#appears at the beginning of a dictionary or glossary entry. For example run, ran, and running all share the
#lemma run
from nltk.tokenize import sent_tokenize #tokenizes and breaks down tokens into sentences *what we will be using*
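
To make the difference between sentence and word tokenization concrete, here is a rough sketch that uses only the `re` module. The two regular expressions are simplified stand-ins chosen for illustration, not the algorithms NLTK's `sent_tokenize` and `word_tokenize` actually use, and the sample text is made up:

```python
import re

sample = "we appreciate your support, tracy. thank you for joining us!"

# Rough stand-in for sent_tokenize: split after ., ! or ? followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", sample)

# Rough stand-in for word_tokenize: runs of word characters, or single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", sample)

print(sentences)
print(words)
```

Sentence tokens keep their punctuation attached, while word tokens split it off as separate items. That is why punctuation filters behave very differently on the two kinds of list.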

In [8]:
#When you use python-twitter or any other Twitter library, you must supply your Twitter API credentials to get access
#Everyone with a developer account on Twitter is given a set of 4 keys: consumer key & secret, and access token key & secret
#Plug in your own credentials below

#Find out your credentials: https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html

api = twitter.Api(consumer_key="   ", #plug in your keys
  consumer_secret="  ",
  access_token_key="  ",
  access_token_secret="   ")
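
If you plan to share the notebook, one option is to keep the keys out of the cell and read them from environment variables instead. The sketch below is one way to do that; the variable names in `ENV_NAMES` and the helper `load_twitter_credentials` are hypothetical, and only the four keyword names come from the `twitter.Api(...)` call above:

```python
import os

# Hypothetical environment variable names; pick whatever names you like.
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token_key": "TWITTER_ACCESS_TOKEN_KEY",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(env=None):
    """Return the four credentials as keyword arguments for twitter.Api."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for arg, name in ENV_NAMES.items()}


# Usage in the notebook (not run here): api = twitter.Api(**load_twitter_credentials())
```

Failing early with the list of missing names is easier to debug than an authentication error from the API later on.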

In [9]:
#Verifying credentials 
#As you can see in my profile snippet, my last status was on Mon August 13th sharing my article - yikes!
print(api.VerifyCredentials())

{"created_at": "Wed Jan 24 01:06:51 +0000 2018", "description": "Student. Writer & Musician. https://t.co/YMQYL4Aery", "favourites_count": 3, "friends_count": 12, "id": 955970115576717312, "id_str": "955970115576717312", "lang": "en", "name": "Chantel Diaz", "profile_background_color": "000000", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png", "profile_image_url": "http://pbs.twimg.com/profile_images/1020805483962118145/v-41dsY4_normal.jpg", "profile_image_url_https": "https://pbs.twimg.com/profile_images/1020805483962118145/v-41dsY4_normal.jpg", "profile_link_color": "F58EA8", "profile_sidebar_border_color": "000000", "profile_sidebar_fill_color": "000000", "profile_text_color": "000000", "screen_name": "chmd_", "status": {"created_at": "Mon Aug 13 15:06:54 +0000 2018", "favorite_count": 1, "id": 1029021483434160128, "id_str": "1029021483434160128", "lang": "en

In [30]:
#As always it's important to read the documentation of each programming language and each package you use

#In this case, python-twitter allows you to retrieve statuses by handle using GetUserTimeline function

#The result is assigned to "statuses", a list of up to 100 of Gillette's most recent tweets. 
#To print only the text of each tweet, you iterate through the list and pull the text attribute out of each element.
#That is what [s.text for s in statuses] does. This is called a list comprehension

#Read more about it here: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

#Let's see what the people at Gillette are saying first and compare it to the public opinion
statuses = api.GetUserTimeline(screen_name="Gillette", count=100)
print([s.text for s in statuses])

["@TheGreekSimba We're so happy to hear that, Crayton. We believe in the best in men and want to use our platform to… https://t.co/TXYmX6F0aM", '@docrn1 Thank you! We appreciate your support.', "@Htigy55t We're so glad you liked it, Aviva. Men everywhere are already working to re-write the rules on what it lo… https://t.co/QyHcEfoJ1s", "@JimmyMorabito We appreciate your kind words, Jimmy. We're happy to hear you liked our Pure Shave Gel, we'll contin… https://t.co/s1FRkd2o1b", '@herb_betts Thank you for the support!', '@RSullPhoto We appreciate the support! Fusion 5 makes for a great shave!', '@tracyre We appreciate your support, Tracy.  Thank you for joining us in this important conversation!', '@richiedelapenha We greatly appreciate your support and kind words, Richard. As a brand that encourages men to be t… https://t.co/gELf2fpefs', '@InsidetheNCAA Congrats @Shaquemgriffin! This award is well deserved, keep up the hard work and dedication.', "@taytayallday92 We truly appreciate the
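
For readers new to list comprehensions, the sketch below shows the same idea next to an explicit loop. `Status` here is a small stand-in defined only for this example (it models nothing but the `text` attribute), and the two sample tweets are made up:

```python
from collections import namedtuple

# Stand-in for the Status objects returned by GetUserTimeline; only .text is modelled.
Status = namedtuple("Status", ["text"])
statuses = [Status("Thank you for the support!"), Status("We appreciate your kind words.")]

# List comprehension, as in the cell above.
texts = [s.text for s in statuses]

# The same thing written as an explicit loop.
texts_loop = []
for s in statuses:
    texts_loop.append(s.text)

print(texts == texts_loop)
```

Both versions build the same list; the comprehension just does it in one line.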

In [11]:
#Seems like the people at Gillette are embracing their new campaign quite positively
#Before getting tweets, let's use an emoji strip function from Stack Overflow to see if we can take emojis out

#This is where understanding the re package and its syntax is important. 
#Emojis are Unicode characters, so you have to know the syntax to strip them

# http://stackoverflow.com/a/13752628/6762004
RE_EMOJI = re.compile('[\U00010000-\U0010ffff]', flags=re.UNICODE) #matches any code point above U+FFFF, where most emojis live

#Full emoji list: https://unicode.org/emoji/charts/full-emoji-list.html

def strip_emoji(text): #def defines a new function; you can name it anything, and text is the argument passed in
    return RE_EMOJI.sub(r'', text) #returns the text with every match replaced by an empty string

#Seems like it'll work pretty well

print(strip_emoji('🙄🤔'))
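
One caveat: the range above only covers code points beyond U+FFFF, so a few older emoji-like symbols, such as U+263A (smiling face) and U+2764 (heart), pass through untouched. The sketch below compares the original pattern with a wider one. The extra range U+2600 to U+27BF and the variation selector U+FE0F are an assumption chosen for illustration, not a complete emoji definition, and `RE_EMOJI_WIDE` is a made-up name:

```python
import re

# Pattern from the cell above: code points outside the Basic Multilingual Plane.
RE_EMOJI = re.compile("[\U00010000-\U0010ffff]", flags=re.UNICODE)

# Assumed extension: also cover U+2600-U+27BF (miscellaneous symbols and dingbats)
# and U+FE0F (the variation selector that often follows them).
RE_EMOJI_WIDE = re.compile("[\U00010000-\U0010ffff\u2600-\u27bf\ufe0f]", flags=re.UNICODE)

# U+1F644 is the eye-roll face, U+263A the smiling face, U+2764 the heart.
sample = "nice ad \U0001F644 \u263A\ufe0f \u2764"

print(repr(RE_EMOJI.sub("", sample)))
print(repr(RE_EMOJI_WIDE.sub("", sample)))
```

The wider pattern is a trade-off: it also removes non-emoji symbols in that block, such as check marks, so widen it only if that suits your data.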




In [13]:
#Here we are putting it ALL together

#Another function in python-twitter is GetSearch, which searches for hashtags or keywords of your choice. This is what we will
#be using for our purposes. Here we are searching for #gillette, asking for English-only ("en") tweets, and about 100 of them for now

search = api.GetSearch("#gillette", lang="en",count=100) #search is a list of tweets, similar to last time

#However, here we need a for loop over the list, so that every element goes through the tasks we don't want
#to do by hand :D, such as cleaning! Again, here is where NLTK + regular expressions + the string module come into play

#Read more about for loops: https://www.w3schools.com/python/python_for_loops.asp

for t in search: #t could be called anything, like x, since it just names the current element of search
    tweets = t.text.lower() #tweets is a string holding this tweet's text, lowercased with the str.lower() method
    tweets = re.sub(r"http\S+", "", tweets) #using a regular expression to strip out URLs
    tweets = tweets.replace("…","") #The replace() method returns a copy of the string where all occurrences 
    #of a substring are replaced with another substring. In this case the ellipsis character … with nothing
    tweets = strip_emoji(tweets) #calling the strip_emoji function on the tweet
    sentences = sent_tokenize(tweets.replace('\n',' ')) #replaces the line break '\n' with a space, then tokenizes the tweet into sentences
    clean_words = [word for word in sentences if word not in set(string.punctuation)] #another list comprehension: because sentences 
    #holds whole sentences, this only drops items that are a single punctuation character
    characters_to_remove = ["''",'``','...', "'rt",'"rt', ','] #another list with other tokens we want to remove, like quotes
    #and more. Use the other kind of quote around a string that itself contains a quote
    clean_words = [word for word in clean_words if word not in set(characters_to_remove)] #drops items that exactly match an entry;
    #more entries can be added to characters_to_remove
    characters_to_remove2 = [word for word in clean_words if '\\' in word] #collecting items that contain a backslash
    clean_words = [word for word in clean_words if word not in set(characters_to_remove2)] #and dropping them
    print(clean_words) #print to see the final cleaned list for this tweet

['we’re told that there is a crisis of masculinity.', 'from #metoo to #gillette - something isn’t working.', 'garrett j wh']
["@basedpoland @crusaderkeif a country run by women who don't approve of masculinity.", '#gillette #procterandgamble']
['nervous to apply for a job like "food champion" at border foods?', "apply even if you're not a 100% match.", 'you might b']
['rt @cbckidsnews: last month, #gillette released a controversial ad campaign calling on men to "be better" for the men of tomorrow.', 'but what']
["don't be shy.", 'score a job like "cross utilized agent - gcc" at skywest airlines by asking for referrals.', "it's a gre"]
['wonder what #gillette will do about their sponsorship of the #pats stadium?', '#thebestmencanbe #robertkraft #podcast']
["dear massachusetts officials: any plans for a parade to honor the brutalized women who cater to robert kraft's sexu"]
['interested in a job in #gillette, wy?', 'this could be a great fit.', 'click the link in our bio to apply: floorh
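
Since the same cleaning steps are repeated for every search in this notebook, they could be wrapped in one function. The sketch below is one possible shape, not the notebook's actual code: `clean_tweet` and `simple_sentences` are made-up names, the regex splitter stands in for `sent_tokenize` so the example runs without NLTK data, the backslash filter is left out for brevity, and a `strip()` is added before splitting:

```python
import re
import string

RE_URL = re.compile(r"http\S+")
RE_EMOJI = re.compile("[\U00010000-\U0010ffff]", flags=re.UNICODE)
PUNCTUATION = set(string.punctuation)


def simple_sentences(text):
    """Regex stand-in for nltk's sent_tokenize, so this sketch runs without NLTK data."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def clean_tweet(text, split_sentences=simple_sentences, extra_remove=("''", "``", "...")):
    """Lowercase a tweet, strip URLs, ellipses and emoji, and return its sentences."""
    text = text.lower()
    text = RE_URL.sub("", text)
    text = text.replace("\u2026", "")  # the single-character ellipsis
    text = RE_EMOJI.sub("", text)
    sentences = split_sentences(text.replace("\n", " ").strip())
    unwanted = PUNCTUATION | set(extra_remove)
    return [s for s in sentences if s not in unwanted]


print(clean_tweet("Thank you!\nWe appreciate it\u2026 https://t.co/abc \U0001F644"))
```

In the notebook you could pass `split_sentences=sent_tokenize` and call `clean_tweet(t.text)` inside each loop, so a fix to the cleaning logic only has to be made once.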

In [15]:
#Similar concept here:
search = api.GetSearch("gillette", lang='en', count=100)
for t in search:
    tweets = t.text.lower()
    tweets = re.sub(r"http\S+", "", tweets)
    tweets = tweets.replace("…","")
    tweets = strip_emoji(tweets)
    sentences = sent_tokenize(tweets.replace('\n',' '))
    clean_words = [word for word in sentences if word not in set(string.punctuation)]
    characters_to_remove = ["''",'``','...']
    clean_words = [word for word in clean_words if word not in set(characters_to_remove)]
    characters_to_remove2 = [word for word in clean_words if '\\' in word] #items that contain a backslash
    clean_words = [word for word in clean_words if word not in set(characters_to_remove2)]
    print(clean_words)

['when the krafts raised the gillette stadium sign two years ago to make room for a 5th super bowl banner, they had t']
['the  sheds his coat.', 'tb12 stops by @gillette headquarters in boston for a fresh, post-super bowl shave.']
["turns out almost everyone loved that 'controversial' gillette ad about toxic masculinity․"]
['@coachvass @suffragentleman @realamberheard @piersmorgan @gillette why is that?']
['don\'t show this "toxic masculinity" to the "gillette democratic girlies in white" club !!!', 'ahahahah !!!!!!']
['rt @emilyjashinsky: this message brought to you by gillette.']
['rt @gyakutennomeg: two little boys roughhousing?', 'labeled "toxic masculinity" by gillette.', 'little black girl punching a little white boy in']
['rt @emilyjashinsky: this message brought to you by gillette.']
['@gillette nothing says customer service like ignoring someone and then banishing them to on hold.']
['rt @mrewanmorrison: "metropolis" was a plan for a utopian future city (1894) by king camp gil

In [16]:
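#and here, since some people have misspelled Gillette as "gilette"
#(the standard search API returns at most 100 tweets per request, so count=200 still yields at most 100)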
search = api.GetSearch("#gilette", lang='en', count=200)
for t in search:
    tweets = t.text.lower()
    tweets = re.sub(r"http\S+", "", tweets)
    tweets = tweets.replace("…","")
    tweets = strip_emoji(tweets)
    sentences = sent_tokenize(tweets.replace('\n',' '))
    clean_words = [word for word in sentences if word not in set(string.punctuation)]
    characters_to_remove = ["''",'``','...']
    clean_words = [word for word in clean_words if word not in set(characters_to_remove)]
    characters_to_remove2 = [word for word in clean_words if '\\' in word] #items that contain a backslash
    clean_words = [word for word in clean_words if word not in set(characters_to_remove2)]
    print(clean_words)

['@titaniamcgrath where is a #gilette when you need one.']
['if  @gillette had said  this  ,  this way ... the backslash  were  no where  near the one']
['rt @chandana_hiran: @kapoors_s @karanjohar @katja_iversen @wef @sayftycom .', '@karanjohar i really appreciate that you addressed objectificati']
['rt @realcandaceo: i just hope #gilette makes an ad reminding women not to fake their sexual assaults.', 'really important we be reminded not']
['rt @realcandaceo: i just hope #gilette makes an ad reminding women not to fake their sexual assaults.', 'really important we be reminded not']
['@judeochretiens is this the best the women can get ?', '#gilette #boycottgilette #feminism']
['rt @realcandaceo: i just hope #gilette makes an ad reminding women not to fake their sexual assaults.', 'really important we be reminded not']
['rt @realcandaceo: i just hope #gilette makes an ad reminding women not to fake their sexual assaults.', 'really important we be reminded not']
['rt @realcandaceo: i ju

In [18]:
#another slogan attached to the campaign
search = api.GetSearch("thebestamancanget", lang='en', count=200)
for t in search:
    tweets = t.text.lower()
    tweets = re.sub(r"http\S+", "", tweets)
    tweets = tweets.replace("…","")
    tweets = strip_emoji(tweets)
    sentences = sent_tokenize(tweets.replace('\n',' '))
    clean_words = [word for word in sentences if word not in set(string.punctuation)]
    characters_to_remove = ["''",'``','...']
    clean_words = [word for word in clean_words if word not in set(characters_to_remove)]
    characters_to_remove2 = [word for word in clean_words if '\\' in word] #items that contain a backslash
    clean_words = [word for word in clean_words if word not in set(characters_to_remove2)]
    print(clean_words)

['my lad preparing for a smooth transition into adulthood.', '@gillette #1stshave  #thebestamancanget']
['rt @sneville15: hey boys club - it’s not ok to find humour in terms derogatory towards women, it’s revolting!', 'how about standing up for wha']
['hey boys club - it’s not ok to find humour in terms derogatory towards women, it’s revolting!', 'how about standing up']
['new video is up!!', 'was the gillette ad nasty or necessary?', 'why are opinions so divided?', "here's what i think!"]
['rt @redeemervip: this man should be ashamed #gillette #thebestamancanget #metalgearsolidvthephantompain #xboxshare']
["@cheese_kun89 woah bro, rape's not cool.", '#thebestamancanget']
['#tbt“the best a man can get” original gillette commercial (1989)  #atari #videogamecollection #classicgaming #nes']
["rt @b2bcopychat: time for q2: more and more brands are getting political and taking a stand - we all remember @gillette's #thebestamancange"]
["time for q2: more and more brands are getting political

In [None]:
# Our cleaning function isn't perfect because we did not tokenize
#the tweets into separate words, which would have made it easier to delete the "rt" markers and the @ mentions. Still, this is one approach,
#and it could be modified further, exported to CSV, and cleaned up further in Excel
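
As a possible next step, here is a minimal sketch of removing the leading "rt" marker and the @ mentions, then writing the result in CSV form. The helper name `drop_rt_and_mentions`, the column name `tweet`, and the file name `tweets.csv` mentioned in the comment are assumptions for illustration; the two sample rows are taken from the outputs above:

```python
import csv
import io
import re

RE_MENTION = re.compile(r"@\w+")
RE_RETWEET = re.compile(r"^rt\s+")


def drop_rt_and_mentions(text):
    """Remove a leading 'rt' marker and any @handles from a lowercased tweet."""
    text = RE_RETWEET.sub("", text)
    text = RE_MENTION.sub("", text)
    # Collapse leftover whitespace, then trim spaces and colons from both ends.
    return re.sub(r"\s+", " ", text).strip(" :")


rows = [
    "rt @emilyjashinsky: this message brought to you by gillette.",
    "@gillette nothing says customer service like ignoring someone",
]
cleaned = [drop_rt_and_mentions(r) for r in rows]

# Write to an in-memory buffer here; swap in open("tweets.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["tweet"])
writer.writerows([c] for c in cleaned)
print(buffer.getvalue())
```

Writing one tweet per row keeps the file easy to open in Excel for any remaining manual cleanup.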