# Hurricane Harvey tweets
1. [Emergency Help Requests](#help)
    - Only 85 tweets, but a close match for the task.
    - How do people actually phrase a request for help?
        - They give an address (OWL has GPS and a separate address field).
        - They add specifics such as knee-deep water or the number of kids and elderly people.
    - Clean up Twitter-specific quirks with tweet-preprocessor.
2. [Hydrate tweets](#hydrate)
    - About 7 million tweets, but they were collected by keyword, so the set contains many news-style tweets and is a weaker fit for the task.
    - Twitter's policy allows sharing lists of tweet IDs but not the tweet content, so the IDs need to be "hydrated" (the actual data pulled through the API).

In [1]:
# Douglas 
# 6/11/2019

In [2]:
import pandas as pd 
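# Show full tweet text in outputs; None replaces -1, which is deprecated since pandas 1.0
pd.set_option('display.max_colwidth', None)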

In [3]:
!ls data/

disaster-relief-dfe-854578.csv		  paired_medical_terms.pkl
disaster-relief-dfe-backup.csv		  paired_terms_disaster_relef.csv
Emergency_Help_Requests_from_Twitter.csv  socialmedia-disaster-tweets-DFE.csv
harvey_port_arthur_tweets.pkl		  twarc.log
HurricaneHarvey_ids.txt.gz


<a id='help'></a>
# 1. Emergency Help Requests

In [4]:
help_reqs = pd.read_csv('./data/Emergency_Help_Requests_from_Twitter.csv')
# Data source: http://localhost:8888/?token=bb82409f3904772fa6e5e907c013a527b8b62aa7c803b879

In [5]:
help_reqs.shape

(85, 20)

In [6]:
help_reqs.columns

Index(['X', 'Y', 'OBJECTID', 'displayName', 'username', 'latitude',
       'longitude', 'message', 'imageUrl', 'type', 'subType', 'dcId',
       'locationType', 'locationDetails', 'algorithmType', 'algorithmMatch',
       'date', 'postUrl', 'profileUrl', 'replyUrl'],
      dtype='object')

In [7]:
help_reqs['message']

0     @mastasj 2231 Fredrick st port Arthur to 776404097208322 Linda chambers                                                                     
1     RESCUE ALERT: Six people in Port Arthur need rescuing. They're at 5235 Ebony Lane with water knee deep into their home.                     
2     2320 COX ST PORT ARTHUR TX 77640 !!!!!!!!!!                                                                                                 
3     .@HarveyRescue https://t.co/HqJcQPkLJH                                                                                                      
4     @austin_milan y'all got a team out in PA? https://t.co/6zptz4mnKG                                                                           
5     Need a boat rescue at 2611 east 12th street, Port Arthur, 77640.  Donald Sennette (74 y/o old).… https://t.co/yUjopFdo6L                    
6     If it's not one thing . It's another! My great grandma , Ophelia Kinlaw is in need of rescue! 3401 10 st Port Ar

In [8]:
# 6/12 

In [9]:
# https://pypi.org/project/tweet-preprocessor/
import preprocessor as tweet_p

In [10]:
# Strip mentions, hashtags, URLs, smileys, emojis and reserved words (RT, FAV)
tweet_p.set_options(
    tweet_p.OPT.MENTION,
    tweet_p.OPT.HASHTAG,
    tweet_p.OPT.URL,
    tweet_p.OPT.SMILEY,
    tweet_p.OPT.EMOJI,
    tweet_p.OPT.RESERVED,
)

In [11]:
tweet_p.clean(help_reqs['message'][0])

'2231 Fredrick st port Arthur to 776404097208322 Linda chambers'
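To make the effect of the options set above more concrete, here is a minimal regex sketch that mimics three of them (URL, mention, hashtag). The patterns and the `rough_clean` name are illustrative assumptions, not tweet-preprocessor's own implementation, and smileys, emojis and reserved words are not handled.

```python
import re

# Rough stand-ins for the URL, MENTION and HASHTAG options.
# These patterns are assumptions for illustration, not the library's regexes.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def rough_clean(text):
    """Strip URLs, @mentions and #hashtags, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub("", text)
    return " ".join(text.split())


print(rough_clean("@austin_milan y'all got a team out in PA? https://t.co/6zptz4mnKG"))
```

On the two messages shown earlier that contain only mentions and URLs, this gives the same text as `tweet_p.clean`; it is only meant to show what kind of content gets removed.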

In [12]:
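# Note: removing ': ' outright glues the neighbouring words together
# ("ALERT: Six" -> "ALERTSix"); the wordninja step further down splits them again.
# regex=False makes the literal match explicit across pandas versions.
help_reqs['processed'] = help_reqs['message'].apply(tweet_p.clean).str.replace(': ', '', regex=False)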

In [13]:
pd.set_option('display.max_colwidth', None)

In [14]:
pd.concat([help_reqs['processed'].head(), help_reqs['processed'].tail()])

0     2231 Fredrick st port Arthur to 776404097208322 Linda chambers                                                             
1     RESCUE ALERTSix people in Port Arthur need rescuing. They're at 5235 Ebony Lane with water knee deep into their home.      
2     2320 COX ST PORT ARTHUR TX 77640 !!!!!!!!!!                                                                                
3     .                                                                                                                          
4     y'all got a team out in PA?                                                                                                
80    1119 11th street Port Arthur, 4 people, one of them is 83 year old grandmother, and 3 other adults. Thank you! Let me know!
81    Please 1535 15th st port Arthur TX                                                                                         
82    A lady is stuck in her attic with her 2 grandkids. 500 neches ave street Port Arthur

In [15]:
import wordninja

In [19]:
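# wordninja re-splits glued words but also drops punctuation and can
# over-split ordinals and names (e.g. '12th' -> '12 th', 'Sennette' -> 'S enne tte')
tokens = help_reqs['processed'].map(wordninja.split).map(' '.join)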

In [20]:
tokens

0     2231 Fredrick st port Arthur to 776404097208322 Linda chambers                                                               
1     RESCUE ALERT Six people in Port Arthur need rescuing They're at 5235 Ebony Lane with water knee deep into their home         
2     2320 COX ST PORT ARTHUR TX 77640                                                                                             
3                                                                                                                                  
4     y'all got a team out in PA                                                                                                   
5     Need a boat rescue at 2611 east 12 th street Port Arthur 77640 Donald S enne tte 74 y o old                                  
6     If it's not one thing It's another My great grandma Ophelia Kin law is in need of rescue 3401 10 st Port Art hu              
7     4005 4 th st call my daddy 4095191044 my number is 8322353561         
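The output above shows street ordinals being split ("12 th", "4 th"). One possible repair, sketched here as an assumption rather than something the notebook does, is to rejoin a number and a following suffix only when the suffix is the grammatically correct ordinal for that number. That check leaves "10 st" alone, where "st" means "street". The helper names are hypothetical.

```python
import re

SPLIT_ORDINAL_RE = re.compile(r"\b(\d+) (st|nd|rd|th)\b", re.IGNORECASE)


def _suffix_for(number):
    """Return the English ordinal suffix for a number given as a digit string."""
    n = int(number)
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def rejoin_ordinals(text):
    """Rejoin '12 th' -> '12th' when the suffix matches the number."""
    def fix(match):
        number, suffix = match.groups()
        if suffix.lower() == _suffix_for(number):
            return number + suffix
        return match.group(0)  # e.g. '10 st' stays as is

    return SPLIT_ORDINAL_RE.sub(fix, text)


print(rejoin_ordinals("Need a boat rescue at 2611 east 12 th street Port Arthur"))
```

A house number ending in 1 that is directly followed by "st" would still be joined by mistake, so this is a heuristic, not a full address parser.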

In [21]:
import spacy

In [22]:
nlp = spacy.load('en_core_web_md')

In [23]:
# tokens.to_pickle('harvey_port_arthur_tweets.pkl')

In [24]:
%time spacy_docs = pd.Series(nlp.pipe(tokens))

CPU times: user 632 ms, sys: 71.6 ms, total: 704 ms
Wall time: 576 ms


In [25]:
pd.concat([spacy_docs.head(), spacy_docs.tail()])

0     (2231, Fredrick, st, port, Arthur, to, 776404097208322, Linda, chambers)                                                                        
1     (RESCUE, ALERT, Six, people, in, Port, Arthur, need, rescuing, They, 're, at, 5235, Ebony, Lane, with, water, knee, deep, into, their, home)    
2     (2320, COX, ST, PORT, ARTHUR, TX, 77640)                                                                                                        
3     ()                                                                                                                                              
4     (y', all, got, a, team, out, in, PA)                                                                                                            
80    (1119, 11, th, street, Port, Arthur, 4, people, one, of, them, is, 83, year, old, grandmother, and, 3, other, adults, Thank, you, Let, me, know)
81    (Please, 1535, 15, th, st, port, Arthur, TX)                                            

In [26]:
samp = spacy_docs[61]
samp

2408 19 th St Orange TX 77630 Key box code is 192494 Y O LADY NEEDS RESCUE LAYING ON THE FLOOR IN WATER CANT GET UP

In [27]:
for token in samp:
    print(token.text, token.dep_, token.ent_type_)

2408 ROOT 
19 nummod DATE
th compound 
St compound GPE
Orange compound GPE
TX appos GPE
77630 nummod CARDINAL
Key amod 
box compound 
code nsubj 
is ROOT 
192494 nummod CARDINAL
Y compound 
O compound 
LADY compound 
NEEDS compound 
RESCUE attr 
LAYING acl 
ON prep 
THE det 
FLOOR pobj 
IN prep 
WATER compound 
CANT pobj 
GET ROOT 
UP prt 


In [28]:
from spacy import displacy

In [29]:
displacy.render(samp, page=False, minify=True, jupyter=True)

In [30]:
displacy.render(samp, style='ent', page=False, minify=True, jupyter=True)

In [31]:
samp.ents

(19, St Orange, TX, 77630, 192494)
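The entities above miss most of the structured details in the tweet (the house number is untagged and the street number is labelled as a date). As a complement to the statistical model, a few regular expressions can pull out ZIP code and phone number candidates. This is a minimal sketch under the assumption that ZIP codes appear as standalone five-digit tokens and phone numbers as standalone ten-digit tokens; the function name is hypothetical.

```python
import re

ZIP_RE = re.compile(r"\b\d{5}\b")     # five-digit ZIP code candidates
PHONE_RE = re.compile(r"\b\d{10}\b")  # ten-digit phone number candidates


def extract_contact_fields(text):
    """Return ZIP and phone candidates found in a cleaned tweet."""
    return {"zips": ZIP_RE.findall(text), "phones": PHONE_RE.findall(text)}


print(extract_contact_fields("2408 19 th St Orange TX 77630 Key box code is 192494"))
print(extract_contact_fields("4005 4 th st call my daddy 4095191044 my number is 8322353561"))
```

Fused digit runs, such as the 15-digit number in tweet 0, are not matched by either pattern, and a five-digit house number would be reported as a ZIP candidate.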

<a id='hydrate'></a>
# 2. Keyword-generated Hurricane Harvey Tweets
Keywords used to generate the dataset:
    #Harvey
    #Harvey2017
    #HarveyStorm
    #HoustonFlood
    #HoustonFlooding
    #HoustonFloods
    #HurricaneHarvey
    Gulf Coast
    Hurricane Harvey
    Twitter

In [32]:
harvey = pd.read_csv('./data/HurricaneHarvey_ids.txt.gz', header=None, names=['id'])['id'].tolist()
# https://digital.library.unt.edu/ark:/67531/metadc993940/

In [34]:
# 7 million tweets 
len(harvey)

7041866

In [35]:
from twarc import Twarc

In [36]:
import gzip

In [37]:
# %%time
# with gzip.open('./data/HurricaneHarvey_ids.txt.gz', 'rt') as f: #rt=read text
#     tweet_ids = [line[:-1] for line in f] # remove \n char
# pandas does it faster and easier
# https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files

In [38]:
# tweet_ids[:5]

In [39]:
# https://github.com/DocNow/twarc
# Create an instance; credentials are read from the .twarc file
t = Twarc()

In [40]:
# Hydrate only the first five IDs as a smoke test
tweets = list(t.hydrate(harvey[:5]))
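Collecting results in a list works for five IDs, but holding all 7 million hydrated tweets in memory is unlikely to be practical. A common alternative, sketched here as an assumption about how the full run could be done, is to stream each tweet to a JSON Lines file as it arrives. `write_jsonl` and the file name in the comment are hypothetical; the only thing assumed about `t.hydrate` is what the notebook already shows, namely that it yields one dict per tweet.

```python
import io
import json


def write_jsonl(tweets, handle):
    """Write each tweet dict as one JSON line; return the number written."""
    count = 0
    for tweet in tweets:
        handle.write(json.dumps(tweet) + "\n")
        count += 1
    return count


# Toy demonstration with an in-memory buffer and made-up records.
buffer = io.StringIO()
n_written = write_jsonl(iter([{"id": 1, "full_text": "a"}, {"id": 2, "full_text": "b"}]), buffer)
print(n_written)

# With the notebook's objects this could look like:
# with open("harvey_hydrated.jsonl", "w", encoding="utf-8") as f:  # hypothetical file name
#     write_jsonl(t.hydrate(harvey), f)
```

Because the function consumes any iterable lazily, only one tweet is held in memory at a time.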

In [41]:
tweets[0]

{'contributors': None,
 'coordinates': None,
 'created_at': 'Fri Aug 18 22:55:43 +0000 2017',
 'display_text_range': [0, 120],
 'entities': {'hashtags': [{'indices': [32, 39], 'text': 'Harvey'},
   {'indices': [64, 73], 'text': 'Coahuila'},
   {'indices': [74, 84], 'text': 'NuevoLeon'},
   {'indices': [85, 96], 'text': 'Tamaulipas'}],
  'media': [{'display_url': 'pic.twitter.com/DGW2NdvgpD',
    'expanded_url': 'https://twitter.com/mmsoriano/status/898639604240203776/photo/1',
    'id': 898639595323047936,
    'id_str': '898639595323047936',
    'indices': [97, 120],
    'media_url': 'http://pbs.twimg.com/media/DHib82RUwAAjE9i.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/DHib82RUwAAjE9i.jpg',
    'sizes': {'large': {'h': 736, 'resize': 'fit', 'w': 897},
     'medium': {'h': 736, 'resize': 'fit', 'w': 897},
     'small': {'h': 558, 'resize': 'fit', 'w': 680},
     'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
    'source_status_id': 898639604240203776,
    'source_statu

In [42]:
tweets[0]['full_text'] # extract text

'RT @mmsoriano: Aguas con el tal #Harvey, no lo pierdan de vista #Coahuila #NuevoLeon #Tamaulipas https://t.co/DGW2NdvgpD'
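The first hydrated tweet is a Spanish-language retweet, which illustrates why the keyword-generated set is a weaker fit for the help-request task. A simple first filter, sketched here as an assumption and not something the notebook does, keeps only tweets that are not retweets and are tagged as English. It relies on the `lang` and `retweeted_status` fields of Twitter's v1.1 tweet objects, neither of which is visible in the truncated output above; the records below are made-up toy data.

```python
def is_original_english(tweet):
    """True for tweets that are not retweets and are tagged as English."""
    is_retweet = "retweeted_status" in tweet
    return not is_retweet and tweet.get("lang") == "en"


# Toy records: the field values are assumptions for illustration only.
sample = [
    {"full_text": "RT @mmsoriano: Aguas con el tal #Harvey", "lang": "es", "retweeted_status": {}},
    {"full_text": "Need a boat rescue in Port Arthur", "lang": "en"},
    {"full_text": "RT @someone: Harvey update", "lang": "en", "retweeted_status": {}},
]

kept = [tw for tw in sample if is_original_english(tw)]
print([tw["full_text"] for tw in kept])
```

This does not separate news-style tweets from genuine help requests; it only removes two obvious sources of noise before any further classification.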