# Social Media and Data Science - Part 5

### Goal: Use social media posts to explore the application of text and natural language processing to see what might be learned from online interactions.

Specifically, we will retrieve, annotate, process, and interpret Twitter data on health-related issues such as smoking.

--- 
References:
* [Mining Twitter Data with Python (Part 1: Collecting data)](https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/)
* The [Tweepy Python API for Twitter](http://www.tweepy.org/)

Required Software
* [Python 3](https://www.python.org)
* [NumPy](http://www.numpy.org) - for preparing data for plotting
* [Matplotlib](https://matplotlib.org) - plots and graphs
* [jsonpickle](https://jsonpickle.github.io) - for storing tweets
---

In [1]:
%matplotlib inline

import operator
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import jsonpickle
import json
import random
import tweepy
import spacy
import time
from datetime import datetime
from spacy.symbols import ORTH, LEMMA, POS

# 5.0 Introduction

This final part of our journey through social media data retrieval, annotation, natural language processing, and classification will challenge you to apply these techniques to a new problem. Specifically, you will create, annotate, and process a new data set.

# 5.0.1 Setup

As before, we start with the Tweets class and the configuration for our Twitter API connection.  We may not need this, but we'll load it in any case.

In [34]:
class Tweets:
    
    
    def __init__(self,term="",corpus_size=100):
        self.tweets={}
        if term !="":
            self.searchTwitter(term,corpus_size)
                
    def searchTwitter(self,term,corpus_size):
        searchTime=datetime.now()
        # keep searching until we have collected corpus_size distinct tweets
        while (self.countTweets() < corpus_size):
            new_tweets = api.search(term,lang="en",tweet_mode='extended',count=corpus_size)
            for nt_json in new_tweets:
                nt = nt_json._json
                if self.getTweet(nt['id_str']) is None and self.countTweets() < corpus_size:
                    self.addTweet(nt,searchTime,term)
            # pause between requests only if more tweets are still needed
            if self.countTweets() < corpus_size:
                time.sleep(120)
                
    def addTweet(self,tweet,searchTime,term="",count=0):
        id = tweet['id_str']
        if id not in self.tweets.keys():
            self.tweets[id]={}
            self.tweets[id]['tweet']=tweet
            self.tweets[id]['count']=0
            self.tweets[id]['searchTime']=searchTime
            self.tweets[id]['searchTerm']=term
        self.tweets[id]['count'] = self.tweets[id]['count'] +1
        
    def combineTweets(self,other):
        for otherid in other.getIds():
            tweet = other.getTweet(otherid)
            searchTerm = other.getSearchTerm(otherid)
            searchTime = other.getSearchTime(otherid)
            self.addTweet(tweet,searchTime,searchTerm)
        
    def getTweet(self,id):
        if id in self.tweets:
            return self.tweets[id]['tweet']
        else:
            return None
    
    def getTweetCount(self,id):
        return self.tweets[id]['count']
    
    def countTweets(self):
        return len(self.tweets)
    
    # return a sorted list of tuples of the form (id,count), with the occurrence counts sorted in decreasing order
    def mostFrequent(self):
        ps = []
        for t,entry in self.tweets.items():
            count = entry['count']
            ps.append((t,count))  
        ps.sort(key=lambda x: x[1],reverse=True)
        return ps
    
    # returns tweet IDs as a set
    def getIds(self):
        return set(self.tweets.keys())
    
    # save the tweets to a file
    def saveTweets(self,filename):
        json_data =jsonpickle.encode(self.tweets)
        with open(filename,'w') as f:
            json.dump(json_data,f)
    
    # read the tweets from a file 
    def readTweets(self,filename):
        with open(filename,'r') as f:
            json_data = json.load(f)
            incontents = jsonpickle.decode(json_data)   
            self.tweets=incontents
        
    def getSearchTerm(self,id):
        return self.tweets[id]['searchTerm']
    
    def getSearchTime(self,id):
        return self.tweets[id]['searchTime']
    
    def getText(self,id):
        tweet = self.getTweet(id)
        text=tweet['full_text']
        if 'retweeted_status'in tweet:
            original = tweet['retweeted_status']
            text=original['full_text']
        return text
                
    def addCode(self,id,code):
        tweet=self.getTweet(id)
        if 'codes' not in tweet:
            tweet['codes']=set()
        tweet['codes'].add(code)
        
   
    def addCodes(self,id,codes):
        for code in codes:
            self.addCode(id,code)
        
 
    def getCodes(self,id):
        tweet=self.getTweet(id)
        return tweet['codes']
    
    # NEW -ROUTINE TO GET PROFILE
    def getCodeProfile(self):
        summary={}
        for id in self.tweets.keys():
            tweet=self.getTweet(id)
            if 'codes' in tweet:
                for code in tweet['codes']:
                    if code not in summary:
                        summary[code] =0
                    summary[code]=summary[code]+1
        sortedsummary = sorted(summary.items(),key=operator.itemgetter(0),reverse=True)
        return sortedsummary

Supply your own Twitter API credentials in place of the placeholders below.

In [3]:
consumer_key='YOUR_CONSUMER_KEY'
consumer_secret='YOUR_CONSUMER_SECRET'
access_token='YOUR_ACCESS_TOKEN'
access_secret='YOUR_ACCESS_SECRET'
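One way to keep keys out of the notebook altogether is to read them from environment variables. The sketch below is only an illustration: the variable names in `CREDENTIAL_VARS` and the helper `load_credentials` are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}

def load_credentials(env=os.environ):
    """Return a dict of the four credentials, raising if any variable is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

In the notebook, `creds = load_credentials()` followed by `consumer_key = creds["consumer_key"]` (and likewise for the other three values) would provide the names that the `OAuthHandler` cell expects.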

In [4]:
from tweepy import OAuthHandler

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

We will also load some routines that we defined in [Part 3](SocialMedia - Part 3.ipynb):
    
1. Our routine for creating a customized NLP pipeline
2. Our routine for including tokens
3. The `filterTweetTokens` routine defined in an exercise (without named entities, which are easier to leave out for now).

In [5]:
def getTwitterNLP():
    nlp = spacy.load('en')
    
    for word in nlp.Defaults.stop_words:
        lex = nlp.vocab[word]
        lex.is_stop = True
    
    special_case = [{ORTH: u'e-cigarette', LEMMA: u'e-cigarette', POS: u'NOUN'}]
    nlp.tokenizer.add_special_case(u'e-cigarette', special_case)
    nlp.tokenizer.add_special_case(u'E-cigarette', special_case)
    vape_case = [{ORTH: u'vape',LEMMA:u'vape',POS: u'NOUN'}]
    
    vape_spellings =[u'vap',u'vape',u'vaping',u'vapor',u'Vap',u'Vape',u'Vapor',u'Vapour']
    for v in vape_spellings:
        nlp.tokenizer.add_special_case(v, vape_case)
    def hashtag_pipe(doc):
        merged_hashtag = True
        while merged_hashtag == True:
            merged_hashtag = False
            for token_index,token in enumerate(doc):
                if token.text == '#':
                    try:
                        nbor = token.nbor()
                        start_index = token.idx
                        end_index = start_index + len(token.nbor().text) + 1
                        if doc.merge(start_index, end_index) is not None:
                            merged_hashtag = True
                            break
                    except:
                        pass
        return doc
    nlp.add_pipe(hashtag_pipe,first=True)
    return nlp

def includeToken(tok):
    val =False
    if tok.is_stop == False:
        if tok.is_alpha == True: 
            if tok.text =='RT':
                val = False
            elif tok.pos_=='NOUN' or tok.pos_=='PROPN' or tok.pos_=='VERB':
                val = True
        elif tok.text[0]=='#' or tok.text[0]=='@':
            val = True
    if val== True:
        stripped =tok.lemma_.lower().strip()
        if len(stripped) ==0:
            val = False
        else:
            val = stripped
    return val

def filterTweetTokens(tokens):
    filtered=[]
    for t in tokens:
        inc = includeToken(t)
        if inc != False:
            filtered.append(inc)
    return filtered

Finally, we will include some additional modules from Scikit-Learn:

In [6]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.metrics import accuracy_score
import string
import re

Now we're ready for an exercise.

---
## EXERCISE 5.1: Annotating and classifying new data


Identifying the source of social media comments might be an important step in the process of interpreting a large corpus. Continuing with our example of smoking and vaping, it might be interesting to compare tweets from users (people who are talking about their own personal use) to tweets from those who might be either promoting vaping (manufacturers, sponsors, etc.) or warning about the dangers of vaping (physicians, researchers, public health agencies, etc.).

A team of researchers at RTI International tackled this problem in a 2017 paper, [Classification of Twitter Users Who Tweet About E-Cigarettes](http://publichealth.jmir.org/2017/3/e63/). Annice Kim and colleagues collected tweets and attributed them to individuals, enthusiasts, informed agencies (news media or health community), marketers, or spammers.

Your goal here is to collect a small data set and to attempt a smaller version of this challenge. Specifically, we will try to collect preliminary data for a classifier capable of identifying tweets from users of e-cigarettes vs. others. Using any of the code found in Parts 1-4, complete these steps:

1. Run some searches for terms like 'e-cig', 'e-cigarette', 'vape', and 'vaping'. Collect a corpus of 200-300 or more tweets. You might want to save each result set in its own file.

2. Combine these tweets into one large collection using the `Tweets` class listed above. Save the results in a file.

3. Annotate 50 of these tweets as either 'individual' or 'non-individual'. Be sure to include at least a few tweets from each of the original sets; one way to do this is to randomize the order of the tweets. Save the annotated results in a file. Look at the distribution of the two labels. Is it close to even? If not, annotate more.

4. Split your annotated tweets into train (80%) and test (20%) sets. Process the training data and build a model (based on a TF-IDF vectorizer and an SVM). Evaluate the model on the test set.

5. Apply your model to the remaining, unannotated tweets. What do the results look like?

6. Review the data to identify opportunities for improvement. How might you make these models better?
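
As a hint for the 80/20 split in step 4, here is a minimal sketch that uses only the standard library and keeps the label proportions roughly equal in both sets. The function name `stratified_split`, the seed, and the toy data are assumptions made for illustration. scikit-learn's `train_test_split` with its `stratify` argument is an alternative.

```python
import random

def stratified_split(labeled, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test lists, label by label."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in labeled:
        by_label.setdefault(label, []).append((text, label))
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # hold out at least one example of every label for testing
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Toy stand-in for annotated tweets: 30 of one label, 20 of the other
toy = [("tweet %d" % i, "INDIVIDUAL") for i in range(30)] + \
      [("ad %d" % i, "NON-INDIVIDUAL") for i in range(20)]
train, test = stratified_split(toy)
print(len(train), len(test))
```

With only about 50 annotated tweets the test set holds roughly 10 examples, so any accuracy figure computed from it should be read with caution.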



----
*ANSWER BELOW - CUT BELOW HERE*

### 1. Running some Searches

In [7]:
ecig = Tweets("vape",100)

In [8]:
ecig.saveTweets("vape1.json")

In [9]:
ecig2 = Tweets("ecig",100)

In [10]:
ecig.countTweets()

100

In [12]:
ecig2.saveTweets("ecig1.json")

In [13]:
ecig3 = Tweets("vaping",100)

In [15]:
ecig3.saveTweets("vaping1.json")

In [7]:
ecig4 = Tweets("vaping",100)

In [8]:
ecig4.saveTweets("vaping2.json")

In [9]:
ecig5 = Tweets("e-cigarette",100)

In [10]:
ecig5.saveTweets("ecig2.json")

In [11]:
vape2=Tweets("vaping",100)
vape2.saveTweets("vape2.json")

### 2. Combining the search results and saving

In [35]:
fullTweets = Tweets()
fullTweets.readTweets("ecig1.json")

In [36]:
ecig2 = Tweets()
ecig2.readTweets("ecig2.json")

In [37]:
fullTweets.combineTweets(ecig2)

In [38]:
fullTweets.countTweets()

200

In [39]:
vape1=Tweets()
vape1.readTweets("vape1.json")
fullTweets.combineTweets(vape1)

In [40]:
fullTweets.countTweets()

300

In [41]:
vape2=Tweets()
vape2.readTweets("vape2.json")
fullTweets.combineTweets(vape2)

In [42]:
vaping1=Tweets()
vaping1.readTweets("vaping1.json")
fullTweets.combineTweets(vaping1)
vaping2=Tweets()
vaping2.readTweets("vaping2.json")
fullTweets.combineTweets(vaping2)

In [43]:
fullTweets.countTweets()

591

In [44]:
fullTweets.saveTweets("part5.json")
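
Six files of 100 tweets each produced 591 tweets rather than 600, most likely because some tweet ids were returned by more than one search. `combineTweets` stores each id once and increments its count on every repeat. The sketch below imitates that behaviour with plain dictionaries; the function `combine` and the toy data are illustrative stand-ins, not the `Tweets` class itself.

```python
def combine(collections):
    """Merge dicts keyed by tweet id, counting how often each id was seen."""
    merged = {}
    for collection in collections:
        for tweet_id, tweet in collection.items():
            if tweet_id not in merged:
                merged[tweet_id] = {"tweet": tweet, "count": 0}
            merged[tweet_id]["count"] += 1
    return merged

# Two toy search results that share the tweet with id "2"
search_a = {"1": "first tweet", "2": "second tweet"}
search_b = {"2": "second tweet", "3": "third tweet"}
merged = combine([search_a, search_b])
print(len(merged))            # number of distinct ids
print(merged["2"]["count"])   # how many searches returned id "2"
```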

### 3. Annotating 50 tweets

#### randomly select...

In [50]:
ids=list(fullTweets.getIds())

In [51]:
len(ids)

591

In [52]:
import random
random.shuffle(ids)
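
The annotation cells that follow refer to tweets by their position in `ids`, so rerunning the shuffle would change which tweet `ids[0]` refers to. A reproducible ordering avoids that. The sketch below assumes an arbitrary seed of 42 and a hypothetical helper name, `reproducible_order`.

```python
import random

def reproducible_order(ids, seed=42):
    """Return the ids in a shuffled order that is the same on every run."""
    ordered = sorted(ids)                  # sets have no stable order, so sort first
    random.Random(seed).shuffle(ordered)   # private generator; global random state untouched
    return ordered

example_ids = {"1003", "1001", "1004", "1002"}
print(reproducible_order(example_ids) == reproducible_order(example_ids))
```

In the notebook this would be called as `ids = reproducible_order(fullTweets.getIds())`.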

In [57]:
id = ids[0]
fullTweets.getText(id)

'Pick up a spare SERIES-S17 900mAh battery for vaping on the go for as low as £9.99. https://t.co/cVUBURPHml'

In [58]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [59]:
id = ids[1]
fullTweets.getText(id)

"If you wanna vape or use your ecig or whatever that's totally fine with me, you do you. But don't blow that strawberry watermelon popcorn blueberry cheesecake cotton candy poptart flavored nasty shit on me is all I ask. \n\nThanks."

In [60]:
fullTweets.addCode(id,"INDIVIDUAL")

In [61]:
id = ids[2]
fullTweets.getText(id)

'UBLO HEMP CBD Vape Juice | E Liquid | eliquid 0% Nicotine 6 STRENGTH 6 Flavours Hemp Oil ON E BAY https://t.co/WXo8pMmey0 … … … … #vapecommunity #vapefamily #vapeshop #Vape #cloudchaser #vapelife #vapeporn #CBD #JBRT18VAPE #ublo #CBDlife #atsocialmedia #tweetmaster #hemp https://t.co/Rle6ntapu0'

In [62]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [63]:
id = ids[3]
fullTweets.getText(id)

'@Washington_vape THANK YOU!!! ❤\n@mymass_ @twik_star @RBlazick @boxerlad680 @Heavenlyink @Stephen_40s @vaping1967 @mattKirkham5 @Sonic_vaper1 @Dripping_Hippie @ScreamQueen131 @Heavencantwait @sarcasticvaper @LordCVapes @Vaping_Train\xa0 @Vixxen_85 @AnibalAsenjo @ZGyurko @VapingKaren ☇#ARIAS🤘🏾'

In [64]:
fullTweets.addCode(id,"INDIVIDUAL")

In [65]:
id = ids[4]
fullTweets.getText(id)

'virgo energy is not allowing a single thing to be out of place at ur job &amp; being the cleanest worker but having the messiest house &amp; interpersonal relationship skills, vaping/essential oils, dropping everything 4 spontaneous trips to new cities, only reading murder mystery novels'

In [66]:
fullTweets.addCode(id,"INDIVIDUAL")

In [67]:
id = ids[5]
fullTweets.getText(id)

'FDA requires additional e-cigarette makers to provide critical information so the agency can better examine youth use and product appeal, amid continued concerns around youth access to products https://t.co/bdSNWxQYly'

In [68]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [69]:
id = ids[6]
fullTweets.getText(id)

'eliquidfrance  -  #Repost patry_vapes2 • • •\n#eliquidfrance#ejuice #vapepics #vapeon #vapenation #vapelove #vapelife #vape #vapefam #vapecommunity #instavape #ecig #eliquid #vapepics… https://t.co/8Q3fIEzVhB'

In [70]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [71]:
id = ids[7]
fullTweets.getText(id)

'I ain’t think my vape could get a nigga high like this'

In [72]:
fullTweets.addCode(id,"INDIVIDUAL")

In [73]:
id = ids[8]
fullTweets.getText(id)

"The worst part of switching to Vaping is showing up to Bike Night smelling like Lots 'O Hugging Bear"

In [74]:
fullTweets.addCode(id,"INDIVIDUAL")

In [75]:
id = ids[9]
fullTweets.getText(id)

'@andgoseek Hope it was Fantastic Vape juice. I wouldn’t allow any inferior vaping liquid in my peepers. 💨👀 https://t.co/h9Y0E20WfU'

In [76]:
fullTweets.addCode(id,"INDIVIDUAL")

In [77]:
id = ids[10]
fullTweets.getText(id)

'I got new neos (for KENT glo e-cigarette)!\n#newglo #kentglo https://t.co/C5qMXQ3hdp'

In [78]:
fullTweets.addCode(id,"INDIVIDUAL")

In [79]:
id = ids[11]
fullTweets.getText(id)

'If anybody is into vaping check out @SimpleTDR second channel. It’s not really my thing but I still love to make sure we support everybody we possibly can even if it’s just sharing this so more eyes can see it!   #OwenAndLiamMovement https://t.co/BIGEgq23UU'

In [80]:
fullTweets.addCode(id,"INDIVIDUAL")

In [81]:
id = ids[12]
fullTweets.getText(id)

'VZone eMask Review – It’s the stained glass looking one… https://t.co/flkJxJfuKp #Vape #ecig'

In [82]:
fullTweets.addCode(id,"INDIVIDUAL")

In [83]:
id = ids[13]
fullTweets.getText(id)

"If you wanna vape or use your ecig or whatever that's totally fine with me, you do you. But don't blow that strawberry watermelon popcorn blueberry cheesecake cotton candy poptart flavored nasty shit on me is all I ask. \n\nThanks."

In [84]:
fullTweets.addCode(id,"INDIVIDUAL")

In [85]:
id = ids[14]
fullTweets.getText(id)

'just bought a pack of cigs, tryna quit vaping'

In [86]:
fullTweets.addCode(id,"INDIVIDUAL")

In [87]:
id = ids[15]
fullTweets.getText(id)

"If you wanna vape or use your ecig or whatever that's totally fine with me, you do you. But don't blow that strawberry watermelon popcorn blueberry cheesecake cotton candy poptart flavored nasty shit on me is all I ask. \n\nThanks."

In [88]:
fullTweets.addCode(id,"INDIVIDUAL")

In [89]:
id = ids[16]
fullTweets.getText(id)

'Flavor Spam | Fake Anti-#VAPING Comments Flood FDA | Published on - https://t.co/dtgkOeSzlP #ecig #ukvapers #vape #VapeUK #vaping https://t.co/wL8wYMoWxT'

In [90]:
fullTweets.addCode(id,"INDIVIDUAL")

In [91]:
id = ids[17]
fullTweets.getText(id)

"If you wanna vape or use your ecig or whatever that's totally fine with me, you do you. But don't blow that strawberry watermelon popcorn blueberry cheesecake cotton candy poptart flavored nasty shit on me is all I ask. \n\nThanks."

In [92]:
fullTweets.addCode(id,"INDIVIDUAL")

In [93]:
id = ids[18]
fullTweets.getText(id)

"Looking for the best all-in-one system? Here it is! Joyetech All in One AIO eGo Box Mod Innovative leakage resistance design and childproof structure. Hurry, they're going fast!  https://t.co/5TOBzFt8qy .@Cuecig #vape  #ejuice #ecig #vaping #vapelife #vapefam #vapefriends #vapeon https://t.co/3YLmAeHqYI"

In [94]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [95]:
id = ids[19]
fullTweets.getText(id)

"Missing the 'smoker's identity' is an important cause of #relapse after quitting. Quitting = losing familiar rituals, coping strategies for stress, social group etc\n#vaping provides a similar alternative identity to ease the change\n@caitlinnuea\nhttps://t.co/gU7hTccjIs"

In [96]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [97]:
id = ids[20]
fullTweets.getText(id)

"Toxic Levels Of Lead And Other Metals Found In E-Cigarette Vapor - \nDon't believe the propaganda hype!  https://t.co/xJpBZQHpV8 https://t.co/WtF1hPQ1Dr"

In [98]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [99]:
id = ids[21]
fullTweets.getText(id)

'Vaping Man Climbs Freeway Sign In L.A. Before Backflipping To The Ground, Shutting Down Highway During Rush Hour https://t.co/6uAh9dg8kw'

In [100]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [102]:
id = ids[22]
fullTweets.getText(id)

'E-cigarettes: How "safe" are they? #Vape #Tobacco #HeartDisease\n"e-cigarette vapors contain toxic substances, including the heavy metals lead, cadmium, and nickel." @DrMarthaGulati https://t.co/Z17K3XxDJG https://t.co/tyuBRAU2XB'

In [103]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [104]:
id = ids[23]
fullTweets.getText(id)

'Cults in MI\n-Suburban boys who say they’re from Detroit\n-Fishermen\n-Craft Beer Junkies \n-Yoopers\n-Middle Schoolers who vape\n-Dads with boats \n-Fans of UofM/MSU who didn’t attend either school\n-Red Wings fans \n-Anyone at Electric Forest/Faster Horses \n-Hockey Moms \n-Chicks'

In [105]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [106]:
id = ids[24]
fullTweets.getText(id)

'Cults at Rowan: \n-townies posted up outside 7/11\n-Greek life on the 4th floor of the lib \n-ppl who vape on the roof of robo parking garage \n-white guys that play bball everyday at the rec https://t.co/y3umAUNmNp'

In [107]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [108]:
id = ids[25]
fullTweets.getText(id)

'“…once e-cigarette use hits critical mass, the revolution will become unstoppable.”'

In [109]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [110]:
id = ids[26]
fullTweets.getText(id)

'@LarrySharpe @meetzman @VapinGreek @Vapingit @VapingmominRI @vaping4lifer @cigarbabe2 @Vaping_Train @IamBroony @castello2 @BrisyCoe @VapinSquirrel @cyncab @PositiveEnerG @mihotep @imaracingmom @skrymir42 @phil_w888 @Mykl0 @ojkershaw @RingersmaB @Jake2001 @ItsJohnConner @jsummers71 @SenRonJohnson @CynthiaNixon @HouseDemocrats @SenateDems Thanks for sharing... Very powerful messages in that discussion. Love the part about #innovation in particular. Problem is finding the right backing to bring the innovation to light or in my case to market.'

In [111]:
fullTweets.addCode(id,"INDIVIDUAL")

In [112]:
id = ids[27]
fullTweets.getText(id)

'Vision Spinner II | Kangertech GeniTank Pro Starter Kit - Step up the quality of your vape with our Vision II/ Kangertech Mini Genitank system.\xa0Lots of colors!  .@cuecig #vape #ejuice #vaping #vapelife #vapefam #vapeon https://t.co/2PXsczVaym https://t.co/emUx1UfowB'

In [113]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [114]:
id = ids[28]
fullTweets.getText(id)

"If you wanna vape or use your ecig or whatever that's totally fine with me, you do you. But don't blow that strawberry watermelon popcorn blueberry cheesecake cotton candy poptart flavored nasty shit on me is all I ask. \n\nThanks."

In [115]:
fullTweets.addCode(id,"INDIVIDUAL")

In [116]:
fullTweets.saveTweets("part5-annotated.json")

In [118]:
id = ids[29]
fullTweets.getText(id)

'Looking for a wonderful, small #ecig to carry with you? Eleaf Icare 2 AIO Starter Kit with 2ml tank rubber finish, a softer &amp; smoother feel in hand. 6 colors!  .@Cuecig #vape #ejuice #vaping #vapelife #vapefam #startvaping #vapor #vapenation #vapeon  https://t.co/jiIAsvnZ8m https://t.co/6TB5yJxGh3'

In [119]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [120]:
id = ids[30]
fullTweets.getText(id)

'Life, Liberty and the Pursuit of #Vaping  https://t.co/v7ut8kHauB https://t.co/eBBOkasbnv'

In [121]:
fullTweets.addCode(id,"INDIVIDUAL")

In [122]:
id = ids[31]
fullTweets.getText(id)

'@cloakzy @TTfue we get it u vape (ilu, good job guys)'

In [123]:
fullTweets.addCode(id,"INDIVIDUAL")

In [124]:
id = ids[32]
fullTweets.getText(id)

'I’ve been driving for almost a decade and I’ve had 32 cars and I’ve had ONE flat tire, ever because I ran over a random e-cigarette and it exploded but that doesn’t take away that @Mitchel_Lutton has had 3 flat tires this year alone. https://t.co/AP2F0hdvyL'

In [125]:
fullTweets.addCode(id,"INDIVIDUAL")

In [126]:
id = ids[33]
fullTweets.getText(id)

'A new report from The Smoking in Pregnancy Challenge Group calls for greater support for and better understanding of vaping\nhttps://t.co/kgDxVfzL7x @francinebates1 @LindaBauld @GillWaltonRCM #vaping #vapenews #pregnancy #health #quitsmoking #RCM https://t.co/I3kXzocYqA'

In [127]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [128]:
id = ids[34]
fullTweets.getText(id)

'Come in to work to open, managers all sitting around one of our tables drinking some wine... one of them starts vaping in the middle of the restaurant, and one is baked as fuck. It’s a good day'

In [129]:
fullTweets.addCode(id,"INDIVIDUAL")

In [130]:
id = ids[35]
fullTweets.getText(id)

'Dad of six nearly has his todger blasted off when his e-cigarette battery exploded in his pocket https://t.co/5mfG6cYy6E'

In [131]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [132]:
id = ids[36]
fullTweets.getText(id)

'Current #PROMO https://t.co/vgOhmNqVix | VAPEPENstore | #Delaware #Milford #Bear #Vaping #eJuice #VapePen #Business #Coinbase #Peercoin'

In [133]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [134]:
id = ids[37]
fullTweets.getText(id)

'virgo energy is not allowing a single thing to be out of place at ur job &amp; being the cleanest worker but having the messiest house &amp; interpersonal relationship skills, vaping/essential oils, dropping everything 4 spontaneous trips to new cities, only reading murder mystery novels'

In [135]:
fullTweets.addCode(id,"INDIVIDUAL")

In [136]:
id = ids[38]
fullTweets.getText(id)

'6:30 AM:\n✅ chocolate milk\n✅ vaping\n✅ blasting chief keef'

In [137]:
fullTweets.addCode(id,"INDIVIDUAL")

In [138]:
id = ids[39]
fullTweets.getText(id)

'virgo energy is not allowing a single thing to be out of place at ur job &amp; being the cleanest worker but having the messiest house &amp; interpersonal relationship skills, vaping/essential oils, dropping everything 4 spontaneous trips to new cities, only reading murder mystery novels'

In [139]:
fullTweets.addCode(id,"INDIVIDUAL")

In [140]:
ids[37]==ids[39]

False
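
The two ids differ even though the text is identical, most likely because `getText` returns the text of the original tweet for a retweet. Grouping ids by text makes such repeats visible, so that the same text can be annotated once and labelled consistently. The helper `group_ids_by_text` and the toy data below are illustrative assumptions.

```python
from collections import defaultdict

def group_ids_by_text(texts_by_id):
    """Map each distinct text to the sorted list of ids that carry it."""
    groups = defaultdict(list)
    for tweet_id, text in texts_by_id.items():
        groups[text.strip()].append(tweet_id)
    return {text: sorted(id_list) for text, id_list in groups.items()}

# Toy stand-in for {id: fullTweets.getText(id) for id in fullTweets.getIds()}
texts_by_id = {
    "11": "we get it u vape",
    "12": "just bought a pack of cigs",
    "13": "we get it u vape",
}
repeated = {text: id_list
            for text, id_list in group_ids_by_text(texts_by_id).items()
            if len(id_list) > 1}
print(repeated)
```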

In [141]:
id = ids[40]
fullTweets.getText(id)

"Missing the 'smoker's identity' is an important cause of #relapse after quitting. Quitting = losing familiar rituals, coping strategies for stress, social group etc\n#vaping provides a similar alternative identity to ease the change\n@caitlinnuea\nhttps://t.co/gU7hTccjIs"

In [142]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [143]:
id = ids[41]
fullTweets.getText(id)

'Time to do some mixing 👍👍💨💨🇮🇪🇮🇪\n#vape #vapeporn #vapingirland #vapegram #vapefam #vapecrew #vapejuice #ejuice #vapecomunity #vapepics #vapedaily #vapestagram #subohm #instvape #vapetricks… https://t.co/zHlcHu4nZU'

In [144]:
fullTweets.addCode(id,"INDIVIDUAL")

In [145]:
id = ids[42]
fullTweets.getText(id)

'vaping with cheddar ruffles in my mouth: the emma dexter story'

In [146]:
fullTweets.addCode(id,"INDIVIDUAL")

In [147]:
id = ids[43]
fullTweets.getText(id)

'Vaping Has Less Than 1% the Cancer Risk of Smoking\nBy Jim McDonald @whycherrywhy\nhttps://t.co/y36fuiQK8z'

In [148]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [149]:
id = ids[44]
fullTweets.getText(id)

'Personally I don’t trust Michael &amp; a lot of what he said. \nY’ALL WHEN THE INTERVIEW STARTED HE WAS LITERALLY VAPING &amp; THEN STARTS TALKING ABOUT HOW HE’S GONNA LOSE “HIS” HOUSE THAT ISN’T EVEN HIS uH-'

In [150]:
fullTweets.addCode(id,"INDIVIDUAL")

In [151]:
id = ids[45]
fullTweets.getText(id)

'@EazyBreezyWind @ArdentGG_ Can guarantee hes better at Vaping then CSGO. Lmao'

In [152]:
fullTweets.addCode(id,"INDIVIDUAL")

In [153]:
id = ids[46]
fullTweets.getText(id)

"'That's going to be an issue': Lawyers weigh in on apartment smoking and vaping ban | CBC News https://t.co/EQdaiN5irE"

In [154]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [155]:
id = ids[47]
fullTweets.getText(id)

'New arrival Smok Novo pod from The Green Vapor\n#smokingshop #smokeshop #ecigarette #ecig #ecigs https://t.co/KMqpOvb3ED'

In [156]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [157]:
id = ids[48]
fullTweets.getText(id)

'@theessexvaper @Washington_vape @Stephen_40s @che170 @boxerlad680 @JAMDOG_ @twik_star @bapestar001 @Vixxen_85 @Heavenlyink @Sonic_vaper1 @Kieronbeckett @d17j17h17 @Vaping_Train @helenking6 @big_dripper @fuzzieduck @m_jones27 @ScreamQueen131 @soozehugs @ianrgorringe @GreenVapinGiant Thanks buddy 👍'

In [158]:
fullTweets.addCode(id,"INDIVIDUAL")

In [160]:
id = ids[49]
fullTweets.getText(id)

'E-cigarette explodes in man’s pocket only ‘two inches away from his penis’ https://t.co/W39HNxZt5g'

In [161]:
fullTweets.addCode(id,"NON-INDIVIDUAL")

In [162]:
id = ids[50]
fullTweets.getText(id)

'Are my lungs sore due to germs or excessive vaping - a memoir'

In [163]:
fullTweets.addCode(id,"INDIVIDUAL")

In [164]:
fullTweets.saveTweets("part5-annotated.json")
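
Step 3 asks whether the two labels are close to evenly distributed. `fullTweets.getCodeProfile()` returns the raw counts; the sketch below shows one way to turn counts into proportions. The function `code_distribution` and the toy annotations are assumptions for illustration, not results from the corpus above.

```python
from collections import Counter

def code_distribution(codes_by_id):
    """Return {code: (count, share)} over all annotated tweets."""
    counts = Counter()
    for codes in codes_by_id.values():
        counts.update(codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: (n, n / total) for code, n in counts.items()}

# Toy annotations: each tweet id maps to its set of codes (possibly empty)
toy_codes = {
    "1": {"INDIVIDUAL"},
    "2": {"NON-INDIVIDUAL"},
    "3": {"INDIVIDUAL"},
    "4": set(),
}
for code, (n, share) in sorted(code_distribution(toy_codes).items()):
    print(code, n, round(share, 2))
```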