# GetOldTweets3

### Contents:
- [Import Packages](#Import-Packages)
- [Scrapping Tweets](#Scrapping-Tweets)
- [Running with additional incidents/ locations](#Running-with-additional-incidents/-locations)

For this project we decided to focus on floods within the United States. To obtain flood data we scraped Tweets using the GetOldTweets3 API. These APIs allowed us to scrape tweets with certain keywords pertaining to floods, from a certain location, within a certain timeframe.

# Import Packages

In [1]:
import GetOldTweets3 as got
import pandas as pd

from warnings import catch_warnings
from warnings import filterwarnings
import warnings

warnings.simplefilter(action="ignore")

# Scrapping Tweets

Function for scrapping each state:
Steps:
1. import tweets 
2. created nested list of tweets and tweet attributes (username, text, date, retweets, favorites, mentions, hashtags, geo) 
3. created dataframe of the list created in 2 
4. created a target column = 0 or 1 depending on when we pulled from 
5. created a user_split column that split usernames on '' 
6. created 'user_text' column, which is the the 'user_split' column + 'Text' column. 

In [57]:
def get_tweets(state, startdate, enddate, maxtweet):
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch("Flood")\
                                            .setSince(startdate)\
                                            .setUntil(enddate)\
                                            .setNear(state)\
                                            .setWithin("500mi")\
                                            .setMaxTweets(maxtweet)
    tweet = got.manager.TweetManager.getTweets(tweetCriteria)
    text_tweets = [[#tw.id,
                #tw.permalink,
                tw.username,
                #tw.to,
                tw.text,
                tw.date,
                tw.retweets,
                tw.favorites,
                tw.mentions,
                tw.hashtags,
                tw.geo] for tw in tweet]
    df_state= pd.DataFrame(text_tweets, columns = ['User', 'Text', 'Date', 'Favorites', 'Retweets', 'Mentions',
                                                    'Hashtags'  , 'Geolocation'])
    return df_state

In [58]:
df_1 = get_tweets('Wisconsin', "2019-07-18", "2019-07-20", 1000)

In [60]:
df_1.shape

(347, 8)

In [61]:
df_1.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation
0,Gabriettaa,Nice to have a positive story about big cats f...,2019-07-19 22:40:31+00:00,0,1,@BBCWorld @BigCatRescue,#Respect,
1,chaaasiti04,my dad just told me to flood my car so I can g...,2019-07-19 22:26:34+00:00,0,0,,,
2,JDenius,"Happy B-day Reggie, from the office!!!",2019-07-19 22:19:05+00:00,1,2,,,
3,RiskItAllRichy,Lol mine too. I’ve been avoiding floods for th...,2019-07-19 20:57:18+00:00,0,0,,,
4,CajunCy,Riding the bus along I29 to @RAGBRAI_IOWA and ...,2019-07-19 19:53:44+00:00,0,2,@RAGBRAI_IOWA,,


In [62]:
df_1['user_split'] = [df_1['User'][row].replace('_', ' ') for row in df_1.index]

In [63]:
df_1.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split
0,Gabriettaa,Nice to have a positive story about big cats f...,2019-07-19 22:40:31+00:00,0,1,@BBCWorld @BigCatRescue,#Respect,,Gabriettaa
1,chaaasiti04,my dad just told me to flood my car so I can g...,2019-07-19 22:26:34+00:00,0,0,,,,chaaasiti04
2,JDenius,"Happy B-day Reggie, from the office!!!",2019-07-19 22:19:05+00:00,1,2,,,,JDenius
3,RiskItAllRichy,Lol mine too. I’ve been avoiding floods for th...,2019-07-19 20:57:18+00:00,0,0,,,,RiskItAllRichy
4,CajunCy,Riding the bus along I29 to @RAGBRAI_IOWA and ...,2019-07-19 19:53:44+00:00,0,2,@RAGBRAI_IOWA,,,CajunCy


In [64]:
df_1['User'].value_counts().head(25)

iembot_phi         24
iembot_okx          8
iembot_arx          8
NWS_MountHolly      7
TBWSirSpitta        7
USGS_TexasFlood     5
iembot_rlx          5
iembot_hun          4
manyhotzbeats       3
kpax512             3
TotalTrafficNYC     3
iembot_jkl          3
udor_visualarts     2
LI_Weather516       2
Bill_Flood          2
TxMichaelDugan      2
runhildroeder       2
NWSMemphis          2
FredButt6           2
NWSCharlestonWV     2
CrystalDcreates     2
kristy527           2
spotternetwork      2
nwsjacksonky        2
NWSTucson           2
Name: User, dtype: int64

In [65]:
# 1 for it was a legitimate flood
df_1['target']=1

In [66]:
df_1.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,target
0,Gabriettaa,Nice to have a positive story about big cats f...,2019-07-19 22:40:31+00:00,0,1,@BBCWorld @BigCatRescue,#Respect,,Gabriettaa,1
1,chaaasiti04,my dad just told me to flood my car so I can g...,2019-07-19 22:26:34+00:00,0,0,,,,chaaasiti04,1
2,JDenius,"Happy B-day Reggie, from the office!!!",2019-07-19 22:19:05+00:00,1,2,,,,JDenius,1
3,RiskItAllRichy,Lol mine too. I’ve been avoiding floods for th...,2019-07-19 20:57:18+00:00,0,0,,,,RiskItAllRichy,1
4,CajunCy,Riding the bus along I29 to @RAGBRAI_IOWA and ...,2019-07-19 19:53:44+00:00,0,2,@RAGBRAI_IOWA,,,CajunCy,1


In [67]:
for i in df_1[df_1['User']== 'TBWSirSpitta'].index:
    df_1['target'][i]=0
df_1[df_1['User']=='TBWSirSpitta']

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,target
30,TBWSirSpitta,Don’t be that one mf that try to fuck up the v...,2019-07-19 03:16:55+00:00,2,2,,#MYWAY #LINKINBIO #TooBusyWinning #FLOOD #FOE ...,,TBWSirSpitta,0
33,TBWSirSpitta,Go Pre Order “MY WAY” Now #LINKINBIO @GTDigita...,2019-07-19 02:45:39+00:00,3,2,@GTDigitalDIST @Goldtoes415,#LINKINBIO #TooBusyWinning #MYWAY #FLOOD #FOE ...,,TBWSirSpitta,0
34,TBWSirSpitta,PRE ORDER #MYWAY The link is in my bio @applem...,2019-07-19 02:40:52+00:00,2,1,@AppleMusic,#MYWAY #FLOOD #TooBusyWinning #FLOOD #FOE,,TBWSirSpitta,0
171,TBWSirSpitta,"So they ask me, “Yung boa, what you gone do th...",2019-07-18 17:26:46+00:00,0,0,,#MYWAY #TooBusyWinning #FLOOD #FOE #SOLID #gtd...,,TBWSirSpitta,0
193,TBWSirSpitta,Ahhhh shit we done made it to the LAST DAY BEF...,2019-07-18 13:26:47+00:00,0,0,,#MYWAY #TooBusyWinning #FLOOD #FOE #SOLID,,TBWSirSpitta,0
203,TBWSirSpitta,Niggas couldn’t survive on this side TOMORROW ...,2019-07-18 12:35:00+00:00,0,0,@CalvoMayne,#MYWAY #TooBusyWinning #FLOOD #FOE #SOLID #gtd...,,TBWSirSpitta,0
294,TBWSirSpitta,Wanna feel like this guy? Listen to #MYWAY And...,2019-07-18 01:01:48+00:00,1,0,,#MYWAY #TooBusyWinning #FLOOD #FOE #SOLID #gtd...,,TBWSirSpitta,0


In [68]:
# looking at a false positive 
df_1['Text'][0]

"Nice to have a positive story about big cats for a change! #Respect India floods: Tired tiger takes nap in resident's bed via @BBCWorld https://www.bbc.com/news/world-asia-india-49041722 @BigCatRescue"

In [69]:
# changing the false positive to a target of one 
df_1['target'][0]=0

In [70]:
len(df_1['user_split'][0])

10

In [71]:
df_1[df_1['user_split'].str.find('iembot')>-1]

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,target
10,iembot_sgf,"At 6:15 PM CDT, 2 WNW Rolla [Phelps Co, MO] PU...",2019-07-19 13:53:26+00:00,1,0,,,,iembot sgf,1
14,iembot_arx,"At 3:00 AM CDT, Stockton [Winona Co, MN] EMERG...",2019-07-19 12:07:40+00:00,0,0,,,,iembot arx,1
15,iembot_arx,"At 7:05 AM CDT, 2 NE Lewiston [Winona Co, MN] ...",2019-07-19 12:07:39+00:00,0,0,,,,iembot arx,1
17,iembot_arx,"At 5:22 AM CDT, 1 W Coon Valley [Vernon Co, WI...",2019-07-19 10:28:15+00:00,0,0,,,,iembot arx,1
18,iembot_arx,"At 4:13 AM CDT, 1 N Viroqua [Vernon Co, WI] LA...",2019-07-19 09:16:25+00:00,0,0,,,,iembot arx,1
...,...,...,...,...,...,...,...,...,...,...
286,iembot_phi,"At 9:15 PM EDT, 1 E Burlington TWP [Burlington...",2019-07-18 01:29:20+00:00,0,0,,,,iembot phi,1
288,iembot_okx,"At 8:30 PM EDT, Long Island City [Queens Co, N...",2019-07-18 01:17:38+00:00,0,0,,,,iembot okx,1
337,iembot_phi,"At 8:06 PM EDT, 1 SSW Exeter TWP [Berks Co, PA...",2019-07-18 00:07:49+00:00,0,0,,,,iembot phi,1
344,iembot_meg,"At 7:45 AM CDT, 8 ENE Tupelo [Lee Co, MS] BROA...",2019-07-18 00:01:43+00:00,0,0,,,,iembot meg,1


In [72]:
df_1['user_text'] = df_1['user_split']+' ' + df_1['Text']
df_1.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,target,user_text
0,Gabriettaa,Nice to have a positive story about big cats f...,2019-07-19 22:40:31+00:00,0,1,@BBCWorld @BigCatRescue,#Respect,,Gabriettaa,0,Gabriettaa Nice to have a positive story about...
1,chaaasiti04,my dad just told me to flood my car so I can g...,2019-07-19 22:26:34+00:00,0,0,,,,chaaasiti04,1,chaaasiti04 my dad just told me to flood my ca...
2,JDenius,"Happy B-day Reggie, from the office!!!",2019-07-19 22:19:05+00:00,1,2,,,,JDenius,1,"JDenius Happy B-day Reggie, from the office!!!"
3,RiskItAllRichy,Lol mine too. I’ve been avoiding floods for th...,2019-07-19 20:57:18+00:00,0,0,,,,RiskItAllRichy,1,RiskItAllRichy Lol mine too. I’ve been avoidin...
4,CajunCy,Riding the bus along I29 to @RAGBRAI_IOWA and ...,2019-07-19 19:53:44+00:00,0,2,@RAGBRAI_IOWA,,,CajunCy,1,CajunCy Riding the bus along I29 to @RAGBRAI_I...


Using maxtweet = 350 to make closer to balanced classes.

In [80]:
df_0 = get_tweets('Wisconsin', "2019-12-16", "2019-12-25", 350)

In [81]:
df_0['user_split'] = [df_0['User'][row].replace('_', ' ') for row in df_0.index]


In [83]:
df_0['user_text'] = df_0['user_split']+' ' + df_0['Text']
df_0.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text
0,jimistlboy,It’s an enormous injustice and it hurts my Car...,2019-12-24 23:29:02+00:00,1,2,,,,jimistlboy,jimistlboy It’s an enormous injustice and it h...
1,EdwardWellence,Im sorry to say some people wont find happines...,2019-12-24 23:12:59+00:00,2,7,,,,EdwardWellence,EdwardWellence Im sorry to say some people won...
2,maya0818,"Floods, fog create Christmas Eve travel havoc",2019-12-24 23:07:44+00:00,0,0,,,,maya0818,"maya0818 Floods, fog create Christmas Eve trav..."
3,morganabigail,Drinking a Bar Baby Blonde by @GreatFloodBeer ...,2019-12-24 23:03:37+00:00,0,2,@GreatFloodBeer,,,morganabigail,morganabigail Drinking a Bar Baby Blonde by @G...
4,LitVolBangor,"Merry Jólabókaflóð, or “Yule Book Flood""! In I...",2019-12-24 22:46:56+00:00,6,12,,#delightful #ChristmasEve,,LitVolBangor,"LitVolBangor Merry Jólabókaflóð, or “Yule Book..."


In [84]:
df_0['target']=0

In [86]:
df_WI = pd.concat([df_1, df_0], ignore_index = True)
df_WI.head()

Unnamed: 0,Date,Favorites,Geolocation,Hashtags,Mentions,Retweets,Text,User,target,user_split,user_text
0,2019-07-19 22:40:31+00:00,0,,#Respect,@BBCWorld @BigCatRescue,1,Nice to have a positive story about big cats f...,Gabriettaa,0,Gabriettaa,Gabriettaa Nice to have a positive story about...
1,2019-07-19 22:26:34+00:00,0,,,,0,my dad just told me to flood my car so I can g...,chaaasiti04,1,chaaasiti04,chaaasiti04 my dad just told me to flood my ca...
2,2019-07-19 22:19:05+00:00,1,,,,2,"Happy B-day Reggie, from the office!!!",JDenius,1,JDenius,"JDenius Happy B-day Reggie, from the office!!!"
3,2019-07-19 20:57:18+00:00,0,,,,0,Lol mine too. I’ve been avoiding floods for th...,RiskItAllRichy,1,RiskItAllRichy,RiskItAllRichy Lol mine too. I’ve been avoidin...
4,2019-07-19 19:53:44+00:00,0,,,@RAGBRAI_IOWA,2,Riding the bus along I29 to @RAGBRAI_IOWA and ...,CajunCy,1,CajunCy,CajunCy Riding the bus along I29 to @RAGBRAI_I...


In [87]:
df_WI['location'] = 'WI'
df_WI.head()

Unnamed: 0,Date,Favorites,Geolocation,Hashtags,Mentions,Retweets,Text,User,target,user_split,user_text,location
0,2019-07-19 22:40:31+00:00,0,,#Respect,@BBCWorld @BigCatRescue,1,Nice to have a positive story about big cats f...,Gabriettaa,0,Gabriettaa,Gabriettaa Nice to have a positive story about...,WI
1,2019-07-19 22:26:34+00:00,0,,,,0,my dad just told me to flood my car so I can g...,chaaasiti04,1,chaaasiti04,chaaasiti04 my dad just told me to flood my ca...,WI
2,2019-07-19 22:19:05+00:00,1,,,,2,"Happy B-day Reggie, from the office!!!",JDenius,1,JDenius,"JDenius Happy B-day Reggie, from the office!!!",WI
3,2019-07-19 20:57:18+00:00,0,,,,0,Lol mine too. I’ve been avoiding floods for th...,RiskItAllRichy,1,RiskItAllRichy,RiskItAllRichy Lol mine too. I’ve been avoidin...,WI
4,2019-07-19 19:53:44+00:00,0,,,@RAGBRAI_IOWA,2,Riding the bus along I29 to @RAGBRAI_IOWA and ...,CajunCy,1,CajunCy,CajunCy Riding the bus along I29 to @RAGBRAI_I...,WI


In [24]:
#df_WI.to_csv('../datasets/df_WI.csv')

In [89]:
pd.read_csv('../datasets/df_WI.csv').head()

Unnamed: 0.1,Unnamed: 0,Date,Favorites,Geolocation,HashTags,Mentions,Retweets,Text,User,target,user_split,user_text,location
0,0,2019-07-19 22:40:31+00:00,0,,#Respect,@BBCWorld @BigCatRescue,1,Nice to have a positive story about big cats f...,Gabriettaa,0,Gabriettaa,Gabriettaa Nice to have a positive story about...,WI
1,1,2019-07-19 22:26:34+00:00,0,,,,0,my dad just told me to flood my car so I can g...,chaaasiti04,1,chaaasiti04,chaaasiti04 my dad just told me to flood my ca...,WI
2,2,2019-07-19 22:19:05+00:00,1,,,,2,"Happy B-day Reggie, from the office!!!",JDenius,1,JDenius,"JDenius Happy B-day Reggie, from the office!!!",WI
3,3,2019-07-19 22:17:13+00:00,0,,,,0,Something about how crime will magically drop ...,beganovic2021,1,beganovic2021,beganovic2021 Something about how crime will m...,WI
4,4,2019-07-19 20:57:18+00:00,0,,,,0,Lol mine too. I’ve been avoiding floods for th...,RiskItAllRichy,1,RiskItAllRichy,RiskItAllRichy Lol mine too. I’ve been avoidin...,WI


### Running with additional incidents/ locations

We then scrapped Texas and Iowa for dates where a flood did occur and dates where a flood did not occur. 

In [90]:
df_1_TX = get_tweets('Texas', "2016-05-22", "2016-06-24", 1000)

In [94]:
df_1_TX.shape

(1000, 8)

In [96]:
df_1_TX['user_split'] = [df_1_TX['User'][row].replace('_', ' ') for row in df_1_TX.index]
df_1_TX['user_text'] = df_1_TX['user_split']+' ' + df_1_TX['Text']
df_1_TX.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text
0,iembot_rlx,"At 7:51 PM, 2 NNE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:53:45+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:51 PM, 2 NNE Rush [Boyd Co, KY..."
1,iembot_rlx,"At 7:48 PM, 4 SSE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:50:46+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:48 PM, 4 SSE Rush [Boyd Co, KY..."
2,iembot_rlx,"At 7:44 PM, Rush [Boyd Co, KY] EMERGENCY MNGR ...",2016-06-23 23:48:01+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:44 PM, Rush [Boyd Co, KY] EMER..."
3,iembot_rlx,"At 6:03 PM, 3 SW Amma [Roane Co, WV] DEPT OF H...",2016-06-23 22:27:16+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 6:03 PM, 3 SW Amma [Roane Co, WV..."
4,iembot_rnk,"At 6:00 PM, Renick [Greenbrier Co, WV] EMERGEN...",2016-06-23 22:23:57+00:00,0,0,,#RNK,,iembot rnk,"iembot rnk At 6:00 PM, Renick [Greenbrier Co, ..."


In [97]:
df_1_TX['target']=1

In [98]:
df_0_TX = get_tweets('Texas', "2016-08-22", "2016-09-24", 1000)

In [99]:
df_0_TX.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation
0,GODFATHERofICE,And it begins/has begun #ruts #snow #frozenHan...,2016-09-23 23:21:49+00:00,0,0,,#ruts #snow #frozenHands #floods #doubleZams,
1,NotMadOnline247,"""Oh, Ophelia, you've been on my mind girl sinc...",2016-09-23 22:58:22+00:00,0,1,,,
2,tim_kite,Charity beer for flood relief! Come get some! ...,2016-09-23 22:50:47+00:00,0,0,@TheAbitaBeer,,
3,kerascreations,#cedarrapids #ellisharbor #cr2016flood #prepar...,2016-09-23 22:28:33+00:00,0,1,,#cedarrapids #ellisharbor #cr2016flood #prepar...,
4,adv_w_nancylynn,Come to the McCallie tailgate tonight and supp...,2016-09-23 21:39:13+00:00,0,1,,,


In [102]:
df_0_TX['user_split'] = [df_0_TX['User'][row].replace('_', ' ') for row in df_0_TX.index]
df_0_TX['user_text'] = df_0_TX['user_split']+' ' + df_0_TX['Text']
df_0_TX.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text,target
0,GODFATHERofICE,And it begins/has begun #ruts #snow #frozenHan...,2016-09-23 23:21:49+00:00,0,0,,#ruts #snow #frozenHands #floods #doubleZams,,GODFATHERofICE,GODFATHERofICE And it begins/has begun #ruts #...,0
1,NotMadOnline247,"""Oh, Ophelia, you've been on my mind girl sinc...",2016-09-23 22:58:22+00:00,0,1,,,,NotMadOnline247,"NotMadOnline247 ""Oh, Ophelia, you've been on m...",0
2,tim_kite,Charity beer for flood relief! Come get some! ...,2016-09-23 22:50:47+00:00,0,0,@TheAbitaBeer,,,tim kite,tim kite Charity beer for flood relief! Come g...,0
3,kerascreations,#cedarrapids #ellisharbor #cr2016flood #prepar...,2016-09-23 22:28:33+00:00,0,1,,#cedarrapids #ellisharbor #cr2016flood #prepar...,,kerascreations,kerascreations #cedarrapids #ellisharbor #cr20...,0
4,adv_w_nancylynn,Come to the McCallie tailgate tonight and supp...,2016-09-23 21:39:13+00:00,0,1,,,,adv w nancylynn,adv w nancylynn Come to the McCallie tailgate ...,0


In [101]:
df_0_TX['target']=0

In [103]:
df_TX = pd.concat([df_1_TX, df_0_TX], axis = 0)

In [104]:
df_TX['location'] = 'TX'
df_TX.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text,target,location
0,iembot_rlx,"At 7:51 PM, 2 NNE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:53:45+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:51 PM, 2 NNE Rush [Boyd Co, KY...",1,TX
1,iembot_rlx,"At 7:48 PM, 4 SSE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:50:46+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:48 PM, 4 SSE Rush [Boyd Co, KY...",1,TX
2,iembot_rlx,"At 7:44 PM, Rush [Boyd Co, KY] EMERGENCY MNGR ...",2016-06-23 23:48:01+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:44 PM, Rush [Boyd Co, KY] EMER...",1,TX
3,iembot_rlx,"At 6:03 PM, 3 SW Amma [Roane Co, WV] DEPT OF H...",2016-06-23 22:27:16+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 6:03 PM, 3 SW Amma [Roane Co, WV...",1,TX
4,iembot_rnk,"At 6:00 PM, Renick [Greenbrier Co, WV] EMERGEN...",2016-06-23 22:23:57+00:00,0,0,,#RNK,,iembot rnk,"iembot rnk At 6:00 PM, Renick [Greenbrier Co, ...",1,TX


In [39]:
#df_TX.to_csv('../datasets/df_TX.csv')

In [106]:
pd.read_csv('../datasets/df_TX.csv')

Unnamed: 0.1,Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,HashTags,Geolocation,user_split,user_text,target,location
0,0,LisaAbeyta,Flash flood warning until 7 PM #nmwx #nm @Albu...,2016-06-23 23:59:36+00:00,0,0,,#nmwx #nm,,LisaAbeyta,LisaAbeyta Flash flood warning until 7 PM #nmw...,1,TX
1,1,iembot_rlx,"At 7:51 PM, 2 NNE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:53:45+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:51 PM, 2 NNE Rush [Boyd Co, KY...",1,TX
2,2,iembot_rlx,"At 7:48 PM, 4 SSE Rush [Boyd Co, KY] EMERGENCY...",2016-06-23 23:50:46+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:48 PM, 4 SSE Rush [Boyd Co, KY...",1,TX
3,3,iembot_rlx,"At 7:44 PM, Rush [Boyd Co, KY] EMERGENCY MNGR ...",2016-06-23 23:48:01+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 7:44 PM, Rush [Boyd Co, KY] EMER...",1,TX
4,4,iembot_rlx,"At 6:03 PM, 3 SW Amma [Roane Co, WV] DEPT OF H...",2016-06-23 22:27:16+00:00,0,0,,#RLX,,iembot rlx,"iembot rlx At 6:03 PM, 3 SW Amma [Roane Co, WV...",1,TX
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,995,CharlotteCP,Flash flood watch for Charlotte and eastward u...,2016-09-02 09:56:07+00:00,0,0,@SarahFortnerWx,,,CharlotteCP,CharlotteCP Flash flood watch for Charlotte an...,0,TX
1996,996,NWSTallahassee,Move to higher ground! Flash Flood Warning in...,2016-09-02 05:19:09+00:00,56,18,,,,NWSTallahassee,NWSTallahassee Move to higher ground! Flash F...,0,TX
1997,997,BLifeAldine,Help arrives for Greenspoint flood victi http:...,2016-09-02 04:47:49+00:00,0,0,,,,BLifeAldine,BLifeAldine Help arrives for Greenspoint flood...,0,TX
1998,998,Futureteller,W my fellow Capital Correspondents raising $fo...,2016-09-02 02:49:25+00:00,0,3,,,,Futureteller,Futureteller W my fellow Capital Correspondent...,0,TX


y=1 data, Iowa

In [107]:
df_1_IA = get_tweets('Iowa', "2016-09-21", "2016-10-03", 1000)

In [108]:
df_1_IA['user_split'] = [df_1_IA['User'][row].replace('_', ' ') for row in df_1_IA.index]
df_1_IA['user_text'] = df_1_IA['user_split']+' ' + df_1_IA['Text']
df_1_IA.shape

(665, 10)

In [109]:
df_1_IA['target']=1
df_1_IA.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text,target
0,TooCute504,Took the time out to volunteer in BR today. Pe...,2016-10-02 23:06:20+00:00,0,0,,,,TooCute504,TooCute504 Took the time out to volunteer in B...,1
1,juliakate,The great Paducah Flood Wall Murals... this At...,2016-10-02 22:43:33+00:00,0,0,,,,juliakate,juliakate The great Paducah Flood Wall Murals....,1
2,Slapfish_2009,The Barman's Fund and USBG Tailgate Benefit fo...,2016-10-02 20:32:22+00:00,0,0,,#DrinkWithPurpose,,Slapfish 2009,Slapfish 2009 The Barman's Fund and USBG Tailg...,1
3,AhleahZoe1,Flood waters. @Downtown Burlington IA https://...,2016-10-02 18:01:34+00:00,0,0,,,,AhleahZoe1,AhleahZoe1 Flood waters. @Downtown Burlington ...,1
4,katyxbeth,Last night sure was rainy as seen right before...,2016-10-02 17:54:06+00:00,0,0,,,,katyxbeth,katyxbeth Last night sure was rainy as seen ri...,1


Doing a little cleaning here, reclassifying some targets.

In [110]:
df_1_IA['target'][[0,1,2,4,5,22,23, 24]]=0

y=0, Iowa

In [112]:
# y= 0, Iowa tweets
# change function to max tweet of 660 from 1000 to balance it out
df_0_IA = get_tweets('Iowa', "2017-09-01", "2017-10-21", 660)

In [113]:
df_0_IA['user_split'] = [df_0_IA['User'][row].replace('_', ' ') for row in df_0_IA.index]
df_0_IA['user_text'] = df_0_IA['user_split']+' ' + df_0_IA['Text']
df_0_IA.head()

Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,Hashtags,Geolocation,user_split,user_text
0,timinhollywood,Excuse me while I flood my timeline with Howar...,2017-10-20 19:11:14+00:00,0,0,,,,timinhollywood,timinhollywood Excuse me while I flood my time...
1,iembot_hgx,"At 6:00 AM, 3 W League City [Galveston Co, TX]...",2017-10-20 14:52:39+00:00,0,0,,#HGX,,iembot hgx,"iembot hgx At 6:00 AM, 3 W League City [Galves..."
2,NWSHouston,"Flash Flood Warning including League City TX, ...",2017-10-20 09:52:18+00:00,7,3,,,,NWSHouston,NWSHouston Flash Flood Warning including Leagu...
3,USGS_TexasFlood,"#USGS08030500 - Sabine Rv nr Ruliff, TX is abo...",2017-10-20 07:01:38+00:00,0,1,,#USGS08030500,,USGS TexasFlood,USGS TexasFlood #USGS08030500 - Sabine Rv nr R...
4,StreetDiamond,"It's @monroekush - Prepping for ""Operation Flo...",2017-10-20 02:46:57+00:00,11,0,@MonroeKush,#Loading #GetYourLIGHTERS,,StreetDiamond,StreetDiamond It's @monroekush - Prepping for ...


In [117]:
df_0_IA['target']=0

In [115]:
df_IA = pd.concat([df_1_IA, df_0_IA], axis = 0)

In [116]:
df_IA['location'] = 'IA'

In [54]:
# df_IA.to_csv('../datasets/df_IA.csv')

In [119]:
pd.read_csv('../datasets/df_IA.csv').head()

Unnamed: 0.1,Unnamed: 0,User,Text,Date,Favorites,Retweets,Mentions,HashTags,Geolocation,user_split,user_text,target,location
0,0,TooCute504,Took the time out to volunteer in BR today. Pe...,2016-10-02 23:06:20+00:00,0,0,,,,TooCute504,TooCute504 Took the time out to volunteer in B...,0,IA
1,1,juliakate,The great Paducah Flood Wall Murals... this At...,2016-10-02 22:43:33+00:00,0,0,,,,juliakate,juliakate The great Paducah Flood Wall Murals....,0,IA
2,2,Slapfish_2009,The Barman's Fund and USBG Tailgate Benefit fo...,2016-10-02 20:32:22+00:00,0,0,,#DrinkWithPurpose,,Slapfish 2009,Slapfish 2009 The Barman's Fund and USBG Tailg...,0,IA
3,3,AhleahZoe1,Flood waters. @Downtown Burlington IA https://...,2016-10-02 18:01:34+00:00,0,0,,,,AhleahZoe1,AhleahZoe1 Flood waters. @Downtown Burlington ...,1,IA
4,4,katyxbeth,Last night sure was rainy as seen right before...,2016-10-02 17:54:06+00:00,0,0,,,,katyxbeth,katyxbeth Last night sure was rainy as seen ri...,0,IA


We used the combined data from this model to scrape the tweets. 