# Find example disaster tweets
In which we find tweets that mention a location and/or have a geolocation, to use in a grant proposal.

In [1]:
import gzip
import json

# Maria

## Before/during

In [69]:
from datetime import datetime
import pandas as pd
import os
import re
def convert_file_list_to_df(file_list):
    tweet_list = [[l.strip().split('\t') for l in gzip.open(f, 'r')] for f in file_list]
    tweet_list = [pd.DataFrame(t[1:], columns=t[0]) for t in tweet_list]
    tweet_df = pd.concat(tweet_list, axis=0)
    return tweet_df

start_date = 19
end_date = 23
month = 9
year = 2017
date_range = range(start_date, end_date+1)
data_dir = '../../data/mined_tweets/'
file_prefix = '#Maria,#HurricaneMaria,#PuertoRico'
before_files = [os.path.join(data_dir, '%s_%d-%02d-%d_%d-%02d-%d.gz'%(file_prefix, year, month, i, year, month, i+1)) for i in date_range]
# convert to dataframe
before_tweets = convert_file_list_to_df(before_files)
# clean text
quote_matcher = re.compile('^"|"$')
before_tweets.loc[:, "text"] = before_tweets.loc[:, "text"].apply(lambda x: quote_matcher.sub('', x))
# fix dates 
date_fmt = '%Y-%m-%d %H:%M'
before_tweets.loc[:, 'date'] = before_tweets.loc[:, 'date'].apply(lambda x: datetime.strptime(x, date_fmt))
before_tweets.sort_values('date', inplace=True, ascending=True)

In [70]:
print('\n'.join(before_tweets.loc[:, 'text'][:100]))

what's the effect called putting aside #Fujiwhara when #Maria consumes #Jose off shore next week? caliwhara or sandywhara or gobblewhara? √°
Hurricane #Maria is back up to a cat 5. Pray for these island nations as they take on a a second major hurricane.
#19Sep #HuracanMaria #Maria @ARB_SHN Informa que huracán Maria toma fuerza y nuevamente llega a categoría 5 durante su paso en el Mar Caribe pic.twitter.com/vUzXqzsxNy
#Maria Undergoing eye wall replacement cycle ... this means an intensification if it completes the cycle prior to #PuertoRico and #StCroix https:// twitter.com/cyclonebiskit/ status/910179815369699328 …
And #maria ? https:// twitter.com/ananavarro/sta tus/910185059566723072 …
#keeppraying for #HurricaneMaria https:// twitter.com/MichaelSkolnik /status/910105454457606144 …
#Maria , right?
WATCH LIVE: Hurricane #Maria cuts a devastating path through Virgin Islands en route to Puerto Rico. http:// on.nbcdfw.com/p2mX6Dd pic.twitter.com/nFJHBxcnsT
At this supply store in Denn

Let's restrict to the landfall day.

In [72]:
landfall_start = datetime.strptime('2017-09-19 00:00', date_fmt)
landfall_end = datetime.strptime('2017-09-20 00:00', date_fmt)
before_tweets_landfall = before_tweets[(before_tweets.loc[:, 'date'] >= landfall_start) &
                                       (before_tweets.loc[:, 'date'] <= landfall_end)]
print('%d landfall tweets'%(before_tweets_landfall.shape[0]))

12000 landfall tweets


In [73]:
print('\n'.join(before_tweets_landfall.loc[:, 'text'][8000:9000]))

#huracánmaria #hurricanemaria #huracan #hurricane #prayfortheanimals #prayforthecaribbean … https://www. instagram.com/p/BZPQzTfFp8G/
Hurrikan #Maria verwüstet die Karibikinsel Dominica. Puerto Rico bereitet sich auf das Schlimmste vor. http:// ebx.sh/2xviCjL
Pour info c est #maria et non irma!
https://www. nytimes.com/video/world/am ericas/100000005441310/maria-hits-dominica-and-guadeloupe.html … #HurricaneMaria http:// fb.me/1cdviKfPO
Latest satellite images show #HurricaneMaria beginning to impact Puerto Rico with high winds pic.twitter.com/FJgKYq6bDX
I hate hurricanes. Keeping you all in my heart and prayers. #PuertoRico #miisladelencanta #hurricanemaria #hurricaneirma #getreadytohelp
My thoughts are with everyone in #MexicoCity and the #Caribbean today. Stay strong. #NaturalDisaster #EarthquakeMexico #HurricaneMaria
These are extremely important and helpful maps of areas that are likely to be in most need in #PuertoRico @ricardorossello @LuisRiveraMarin https:// twitter.com/OrdenD

Location examples found:

- All hunkered down for #Maria . If I die, tell the world I love them all. Cuidanse mi gente boricua. #MariaPR #51stState pic.twitter.com/uSCD2\Ksp77
- #waiting for #hurricanemaria #sanjuan #puertorico #september2017 #condadolagoon #tropical … https://www.instagram.com/p/BZPMLNRhXxU/
- Wanted to tweet the pic of the in the metro station but 1 person died,2 are missing : I am taming my obnoxious side #Guadeloupe #maria
- Rain and wind are starting to pick up in STX.... please let the damage be minimal #HurricaneMaria
- Sky darkening.. Sucking up the moisture. Dry heat outside. #Maria #PR #hurricane pic.twitter.com/8TtJ6ThSf2

How about the day of impact?

In [74]:
impact_start = datetime.strptime('2017-09-20 00:00', date_fmt)
impact_end = datetime.strptime('2017-09-22 00:00', date_fmt)
impact_tweets = before_tweets[(before_tweets.loc[:, 'date'] >= impact_start) &
                              (before_tweets.loc[:, 'date'] <= impact_end)]
print('%d impact tweets'%(impact_tweets.shape[0]))

35980 impact tweets


In [76]:
print('\n'.join(impact_tweets.loc[:, 'text'][1000:2000]))

#Maria is expected to move off the N coast of PR, pass offshore the NE coast of Hispaniola, and move near the Turks… https:// twitter.com/i/web/status/9 10516383594811393 …
This is flooding in St Thomas from #HurricaneMaria https:// twitter.com/gdimeweather/s tatus/910506377688252416 …
Shared from #NOAANow Where do you go, #Maria and what will you do ? pic.twitter.com/hZy7wghbjd
#StCroix #HurricaneMaria https:// twitter.com/gdimeweather/s tatus/910510332635951104 …
Beyond Puerto Rico, #Maria may in fact take a path quite similar to what #Jose has done in recent days.
#MexicoCity #PuertoRico #Florida #Texas GOD IS IN CONTROL.
Juntos ##mexico #puertorico https://www. instagram.com/p/BZRNletj2reD 1r3FHUtWm4FoVBy3e_-yq9EX_E0/ …
Really pleased with the info the Riu Palace Bavaro has give on Maria. #bavaro #HurricaneMaria #riu #ThomsonHolidays
https:// youtu.be/HQYA9zN1KkM According to the o.p., this is from Vega Alta, Puerto Rico today. #vegaalta #HurricaneMaria #PuertoRico
#HurricaneIrma #

Location examples found:

- Washed out. Bridge Salto Arriba Utuado Puerto Rico Huracán #MARIA Videos on facebook at Meteorología Del Caribe
- John Randal McDonald home destroyed. #stx #stcroix #hurricanemaria pic.twitter.com/bac7XUm9ov
- Escenas d caos. #PuertoRico ante una catástrofe. Gobernador Rosselló–"Cuando pase esto, juntos nos vamos a levantar" https:// elpais.com/internacional/ 2017/09/20/america/1505915160_551927.html …
- Cat 4 Hurricane #Maria devastating Puerto Rico as its center approaches the northern coast. See you on #WJZ with the latest. #FirstWarningWX pic.twitter.com/6UefNzR0BW
- Center of #Maria currently in the middle of PR. Once it passes, continue to track to the NW north of Hispaniola & SE Bahamas. pic.twitter.com/o1Q405q4a0
- Does anyone know how Mayaguez is doing? #PuertoRico #HurricaneMaria
- Torre de las Cumbres, Cupey,agua llega hasta el piso 6 familias alojadas en el piso 8.el edificio se mueve #HurricaneMaria #sos @WapaRadio
- Huracán #María sale de #PuertoRico . Su ojo salió por la costa Norte entre los pueblos de Vega Baja y Arecibo pic.twitter.com/pgc5OmxSf4
- 80% of the homes in Juana Matos in San Juan are destroyed from flooding, roofs gone. #HurricaneMaria . https://twitter.com/LSanjurjo/status/910526101616807936 …
- Couldn’t have said it better. St. Croix, St. Thomas, and St. John is very much part of the United States. #hurricanemaria pic.twitter.com/GWHe9cOdVB
- Anyone with information on folks on Puerto Rico Ponce area? #PuertoRico
- More pictures of Condado I have found through friends on Facebook: Ashford Avenue and Calle Krug. #HurricaneMaria pic.twitter.com/I6VOC1HwSx
- Getting video from Country Club neighborhood in Carolina, #PuertoRico . Streets completely flooded, looks like a river #MariaPR
- Alguien tiene algun update del Condominio Atlantis en #SanJuan #PuertoRico #HuracanMaria
- Pictures out of Golden Rock St. Croix with the caption"beaten up but not beaten down." #VIstrong #stcroix #stx #hurricanemaria pic.twitter.com/h9lu3XKfHA
- UPDATE: water is thankfully receding from our driveway, still strong winds and rain here in San Juan #HurricaneMaria pic.twitter.com/YTty8INTzM
- La avenida Piñero es un río por el desborde del Río Piedras en #PuertoRico http:// uni.vi/Y5EN30fiL2E pic.twitter.com/5IM8g2rRyl


## After

In [8]:
# all files with prefix "crisis" were live-streamed
test_file = '../../data/mined_tweets/crisis_maria_tweets_Sep-29-17-00-00.gz'
test_tweets = [json.loads(l.strip().replace('\n', '')) for l in gzip.open(test_file, 'r')]
test_tweet_txt = [t.get('text') for t in test_tweets]
print('\n'.join(test_tweet_txt[:10]))

RT @nycgov: Puerto Rico, you are not alone. @FDNY locations accept specific donations for residents affected by #HurricaneMaria… 
RT @HillaryClinton: President Trump, Sec. Mattis, and DOD should send the Navy, including the USNS Comfort, to Puerto Rico now. These a… 
RT @DesiPerkins: Reminding everyone to keep #PuertoRico and #Mexico in your hearts and prayers and if you are able to donate to please do.…
@realDonaldTrump send help to #PuertoRico 🇵🇷they are #Americans too! https://t.co/PDe7WGXd2G
RT @Livewellxoxo34: It took 5 days for fema director@Brocklong @potus
To make statements about #PuertoRico #VirginIslands 
Thanks #MSM&amp;… 
RT @aishacs: This is unconscionable #PuertoRico https://t.co/ax7O9CdGNc
RT @fema: Emergency communications vehicles left for Puerto Rico today to support comms for search &amp; rescue, medical, &amp; other… 
RT @MarcAnthony: Mr. President shut the fuck up about NFL. Do something about our people in need in #PuertoRico. We are American citizens t…
RT @NiaA

In [9]:
print(len(test_tweets))

532471


Let's filter to remove RTs.

In [15]:
test_tweets_non_rt = filter(lambda x: not x.get('retweeted_status'), test_tweets)
test_tweets_non_rt_txt = map(lambda x: x.get('text'), test_tweets_non_rt)
print('%d original tweets'%(len(test_tweets_non_rt_txt)))
print('\n'.join(test_tweets_non_rt_txt[:10]))

104968 original tweets
@realDonaldTrump send help to #PuertoRico 🇵🇷they are #Americans too! https://t.co/PDe7WGXd2G
https://t.co/bxEwQS0D8v 75 MPH winds hitting North Carolina coast now. #Maria still a Cat. 1 - no direct USA hit.
"Which ocean are you talking about Donald there are 2 of them"-an actual tweet by and actual person #PuertoRico
Please help @joycegiraud raise money to help the victims of #HurricaneMaria! Donate below! #RHOBH 

https://t.co/eUaPGtfPFv
This says it all. 
#FEMA #DHS #PuertoRico #Maria
#TrumpsKatrina #Trump #DOD https://t.co/YOiVsYdvyn
So impressed with @LillyDiabetes efforts to help #PuertoRico and aid in recovery. Thank you!  https://t.co/4bNj36ssts
Thank you @united for stepping up and offering complimentary flights to fly ppl out of #PuertoRico. The #ethical &amp; humanitarian thing to do
#PuertoRico.  💕💕💕💕🆘🆘🆘🆘🆘🆘🆘🆘🆘🆘🆘 https://t.co/gwWm1MQE7e
Please JOIN US! #UnidosPorPuertoRico #Twitterstorm TONIGHT, Sept. 26 @ 8-9 PM ET! #CongressActNow #WeAreUSCitizens #Pu

In [18]:
print('\n'.join(test_tweets_non_rt_txt[300:1000]))

Tell Trump that PR not receiving adequate relief will cause a mass exodus of brown ppl into the mainland. He'll respond faster. #PuertoRico
@ThisWeekABC Everyday Americans have rallied to help #PuertoRico Shout out to #Chicago and #NYC, but mayor counters… https://t.co/kBczZuG3lr
Sometimes you really have to wonder... Is @realDonaldTrump just plain stupid? #PuertoRico #IsAnIsland #InTheMiddleOfAnOcean #True!
Wait a second... How did you even get to Puerto Rico...?  As Trump said, there's a big-ass OCEAN surrounding it!!!… https://t.co/LLavdYs2fx
🇵🇷With some friends in shelters I'm still glad that they're alive!! Care packages are put together and ready to be shipped :) #PuertoRico
Pray 🙏also for our fellow US citizens of the #USVirginIslands who are recovering from devastation of #hurricaneirma and #hurricanemaria.
Independence after Maria https://t.co/uQhPQPfN8Q
#PuertoRico needs targeted aid to rebuild. Urging Republican leaders to help us act today and rebuild Puerto Rico… https://t

TODO: What if we look at retweets? Some of those might have originally been sent from survivors.

# Harvey

In [81]:
# text only => FAIL
# harvey_rehydrated = '../../data/mined_tweets/HurricaneHarvey_ids_rehydrated_clean.txt.gz'
harvey_rehydrated_file = '../../data/mined_tweets/HurricaneHarvey_ids_rehydrated.json.gz'
relevant_fields = ['text', 'created_at']
harvey_tweets = []
for i, l in enumerate(gzip.open(harvey_rehydrated_file, 'r')):
    t = json.loads(l.strip())
    try:
        t_info = [t.get(f) for f in relevant_fields]
        harvey_tweets.append(t_info)
    except Exception, e:
        pass
    if(i % 100000 == 0):
        print('processed %d tweets'%(i))
harvey_tweets = pd.DataFrame(harvey_tweets, columns=relevant_fields)
# TOO BIG
# harvey_tweets = [json.loads(l.strip()) for l in gzip.open(harvey_rehydrated, 'r')]
print('got %d Harvey tweets'%(len(harvey_tweets)))

processed 0 tweets
processed 100000 tweets
processed 200000 tweets
processed 300000 tweets
processed 400000 tweets
processed 500000 tweets
processed 600000 tweets
processed 700000 tweets
processed 800000 tweets
processed 900000 tweets
processed 1000000 tweets
processed 1100000 tweets
processed 1200000 tweets
processed 1300000 tweets
processed 1400000 tweets
processed 1500000 tweets
processed 1600000 tweets
processed 1700000 tweets
processed 1800000 tweets
processed 1900000 tweets
processed 2000000 tweets
processed 2100000 tweets
processed 2200000 tweets
processed 2300000 tweets
processed 2400000 tweets
processed 2500000 tweets
processed 2600000 tweets
processed 2700000 tweets
processed 2800000 tweets
processed 2900000 tweets
processed 3000000 tweets
processed 3100000 tweets
processed 3200000 tweets
processed 3300000 tweets
processed 3400000 tweets
processed 3500000 tweets
processed 3600000 tweets
processed 3700000 tweets
processed 3800000 tweets
processed 3900000 tweets
processed 40000

In [83]:
# convert dates to datetime
date_fmt = '%a %b %d %H:%M:%S +0000 %Y'
harvey_tweets.loc[:, 'created_at'] = harvey_tweets.loc[:, 'created_at'].apply(lambda x: datetime.strptime(x, date_fmt))

Preprocess and round dates to nearest day so that we can get a sample of tweets per day.

In [87]:
harvey_tweets.loc[:, 'created_at_day'] = harvey_tweets.loc[:, 'created_at'].apply(lambda x: datetime(*x.timetuple()[:3]))

In [88]:
mini_date_fmt = '%Y-%m-%d'
harvey_landfall_start = datetime.strptime('2017-08-25', mini_date_fmt)
harvey_landfall_end = datetime.strptime('2017-09-03', mini_date_fmt)
harvey_tweets_landfall = harvey_tweets[(harvey_tweets.loc[:, 'created_at'] >= harvey_landfall_start) &
                                       (harvey_tweets.loc[:, 'created_at'] < harvey_landfall_end)]
print('got %d relevant Harvey tweets'%(harvey_tweets_landfall.shape[0]))

got 5643799 relevant Harvey tweets


In [91]:
sample_size = 200
for date, date_group in harvey_tweets_landfall.groupby('created_at_day'):
    print('tweet sample from %s'%(date))
    print('\n'.join(date_group.head(sample_size).loc[:, 'text'].apply(lambda x: x.replace('\n', ''))))

tweet sample from 2017-08-25 00:00:00
RT @chematierra: Foto de una de las bandas delanteras del #Huracán #Harvey en la parte sur de la Isla del PadreCréditos: Janelle AllenVía…
Top 5:1: #RuinAGoodTimeIn4Words2: #HurricaneHarvey3: #NoFlag +84: DeVante Parker +75: Jay Thomas -2
RT @tymetolove: When people would rather starve during a hurricane than buy chicken &amp; waffles flavored @LAYS #Harvey #HurricaneHarvey http…
RT @SophiaTesfaye: #HurricaneHarvey hasn't even made landfall yet and Trump is already screwing up https://t.co/Oqb3Y61BX2
Texas friends, please stay safe #Harvey
In Austin, TX. Waiting for Harvey to arrive. #NotTheRabbit #HurricaneHarvey
RT @NWSCharlestonSC: While watching #Harvey, remember there is a disturbance that could pass off the SE coast. For more, https://t.co/YYFLQ…
RT @ejimenez1287: #HurricaneHarvey This is my Home Town 😭 pray for us  #Harvey #Tropical #HurricaneHarvey #houwx #wfaaweather #txwx https:/…
RT @NOAA_HurrHunter: Timelapse of WP-3D Orion #NOAA42 flyi

A lot of RTs! Let's restrict to original content.

In [93]:
sample_size = 200
rt_matcher = re.compile('^RT.*')
for date, date_group in harvey_tweets_landfall.groupby('created_at_day'):
    date_group = date_group[date_group.loc[:, 'text'].apply(lambda x: not rt_matcher.match(x))]
    print('tweet sample from %s'%(date))
    print('\n'.join(date_group.head(sample_size).loc[:, 'text'].apply(lambda x: x.replace('\n', ''))))

tweet sample from 2017-08-25 00:00:00
Top 5:1: #RuinAGoodTimeIn4Words2: #HurricaneHarvey3: #NoFlag +84: DeVante Parker +75: Jay Thomas -2
Texas friends, please stay safe #Harvey
In Austin, TX. Waiting for Harvey to arrive. #NotTheRabbit #HurricaneHarvey
These are the same dumb asses who think Trump is being used by God.#harvey
World Trends: #وضعي_الحين_يحتاج | #RuinAGoodTimeIn4Words | #HurricaneHarvey| Free Unfollow Unfollowers | https://t.co/kVKbVOiMFC
Un generador portátil para cuando llegue el huracán #Harvey #Univision45 #HouNews https://t.co/OU4QsOcdPM
Hurricane #Harvey Advisory 18A: Harvey Moving Northwestward Toward the Texas Coast. https://t.co/VqHn0uj6EM
Updated rainfall...more rainfall is expected.  Devastating/Catastrophic flooding is expected.  Be Flood Aware!… https://t.co/NP80x9jtmy
Preparations for #Harvey underway..... False forecasts and irresponsible rumors on social media are interfering... https://t.co/UZa4PgtJqT
Be ready for Hurricane #Harvey. Don't be caught off g

Example tweets:

- RT @710KURV: Look closely to the east of the Valley, you can start to see #HurricaneHarvey feeder bands approaching. #RGV https://t.co/q2bS…
- RT @mikebettes: #Harvey getting angry in the Gulf tonight. Be safe everyone. Live broadcasts all night on @weatherchannel https://t.co/LEOt…
- RT @MarkBoone17: #HurricaneHarvey Nothing funny going on here. As a veteran of Katrina I can tell you there will be a "death toll" after. G…
- A&M’s main campus does not expect to close due to #HurricaneHarvey: https://t.co/N3J0LQUOXJ #fox44tx https://t.co/hhZGfekggz
- Prayers to all my friends and family in the Coastal Bend, including the @goicerays, be safe. #HurricaneHarvey
- Every fucking weather source is saying GTFOut of Galveston. And what do these stupid fucks do? Oh we're just gonna pray.#Harvey
- Apache helicopters in low formation over I-10 heading west.  The threat is sinking in.  #txwx #Harvey
- #HurricaneHarvey is is on its way to Rockport. Pray for my Tiki Mug collection.
- Calhoun County: SH 35 closed due to flood debris, https://t.co/gVDf6MjF38 for the latest #Harvey https://t.co/ne76H1xRj5
- .@NWSSanAntonio tells me Onion Creek is 10.5 ft and slowly rising, expected to crest at 21 feet tomorrow… https://t.co/RzI13cDoDV
- I'll be in Dickinson first thing with multiple boats. If you're stranded we are coming for you. Just hold on. #PrayersforTexas #HarveyStorm
- View from the Med Center. More storms rolling in. Coast Guard helicopter nearby performing rescues. #houston… https://t.co/I60xsehDvl
- Tree fell on house in my neighborhood in Woodland Trails West on Fairbanks and Battle Oak. #hurricaneharvy… https://t.co/YcDphnujqB
- Brays Bayou at the Lidstone bridge. To the right is OST, near Produce Row. #harvey2017 #houston https://t.co/Sy9Kon3XPe
- Desperate #Harvey victims turn to social media to get rescued https://t.co/4T9fou4wXr https://t.co/1oqqT5AER8
- #BREAKING: The I-10 westbound between Beaumont and Winnie is now closed due to flooding.#Harvey #TexasFloods
- Trini Mendenhall Center on 1414 Wirt Road in Spring a ranch is open as a shelter and has supplies. #Harvey #springbranch

TODO: Can we make it easier to find location tweets by pre-loading the lexicon of anchor strings from Wikipedia?

In [None]:
lexicon_file = '/hg190/corpora/crosswikis-data.tar.bz2/string_mention_counts.bz2'
wiki_lexicon = [l.strip().split('\t')[0] for l in gzip.open(lexicon_file)]
print('lexicon has %d entries'%(len(wiki_lexicon)))
print(','.join(wiki_lexicon[:10]))

## Irma

In [105]:
keyword = '#irma'
# irma_files = filter(lambda x: keyword in x and x.endswith('.gz'), [os.path.join(data_dir, f) for f in os.path.listdir(data_dir)])
irma_file = '../../data/mined_tweets/#Irma,#HurricaneIrma,#Harvey,#HurricaneHarvey_2017-08-17_2017-09-14.gz'
keyword_matcher = re.compile(keyword)
irma_tweets = []
for i, l in enumerate(gzip.open(irma_file, 'r')):
    l = l.strip()
    l_split = l.split('\t')
    irma_tweets.append(l_split)
irma_tweets = pd.DataFrame(irma_tweets[1:], columns=irma_tweets[0])
print('%d %s tweets'%(irma_tweets.shape[0], keyword))

14200 #irma tweets


Convert date to usable format first.

In [109]:
# convert date
date_fmt_mini = '%Y-%m-%d %H:%M'
irma_tweets = irma_tweets.assign(created_at=irma_tweets.loc[:, 'date'].apply(lambda x: datetime.strptime(x, date_fmt_mini)))
irma_tweets = irma_tweets.assign(created_at_day=irma_tweets.loc[:, 'created_at'].apply(lambda x: datetime(*x.timetuple()[:3])))
irma_tweets.sort_values('created_at_day', inplace=True)

In [110]:
for date, date_group in irma_tweets.groupby('created_at_day'):
    date_group = date_group[date_group.loc[:, 'text'].apply(lambda x: not rt_matcher.match(x))]
    print('tweet sample from %s'%(date))
    print('\n'.join(date_group.head(sample_size).loc[:, 'text'].apply(lambda x: x.replace('\n', ''))))

tweet sample from 2017-09-10 00:00:00
"This guys saving dogs!!! #hurricane #storm #miami #irma pic.twitter.com/FreNqNXgFG"
"This guys saving dogs!!! #hurricane #storm #miami #irma pic.twitter.com/FreNqNXgFG"
tweet sample from 2017-09-13 00:00:00
"#AfterIrma #IrmaRecovery #hurricaneirma #Irma #AfterIRMA #Hurricane #Jose #ThursdayThoughts ... http:// youtu.be/UkVN3XfxXEs?a"
"“Hearing these kids are going to survive is the best gift I ever could have got,” Coast Guard serviceman #Harvey http:// ow.ly/Lpyk50e02t9"
"RFA Mounts Bay’s ‘Herculean efforts’ to help British Virgin Islanders after Hurricane #Irma Read more: http:// ow.ly/VCc430f88yS pic.twitter.com/An0U7MWtIS"
"#Beyonce had her “faith in humanity” restored watching people come together to help #HurricaneHarvey aftermath. http://www. mytalk1071.com/beyonces-faith -humanity-restored-hurricane-relief-volunteers/ …"
"APTA Update on Hurricane #Irma @APTA_info http:// masstransitmag.com/12367562"
"Travel to the Florida Keys in this #360

This makes no sense. Why are there so few #Irma tweets?? It made landfall in Florida on 9/10.

Let's try a bigger dataset.

In [111]:
full_harvey_irma_file = '../../data/mined_tweets/#Irma,#Harvey,#HurricaneIrma,#HurricaneHarvey_combined_data.tsv'
harvey_irma_tweet_df = pd.read_csv(full_harvey_irma_file, sep='\t', index_col=0)
print(harvey_irma_tweet_df.head())

   index       username                                               text  \
0    1.0    westpalmbch  "NOW: I-95 in West Palm Beach. Hurricane warni...   
1    2.0  OfficialJoelF  "Meanwhile in Hialeah #Irma (Cred: @Armani_Bla...   
2    3.0   AfricanaCarr  "Yemaja and Osun, guide the people of #Cuba #F...   
3    4.0  WeatherNation  "NEW: This video was shot just minutes ago of ...   
4    5.0            ABC  "LATEST: Hurricane #Irma now 105 miles SE of K...   

   geo            id retweet_count  favorite_count place_name  \
0  NaN  9.064888e+17            42            54.0        NaN   
1  NaN  9.066287e+17          1662          1975.0        NaN   
2  NaN  9.066582e+17            12            34.0        NaN   
3  NaN  9.066637e+17           242           227.0        NaN   
4  NaN  9.066576e+17           368           341.0        NaN   

            created_at  
0  2017-09-09 08:05:00  
1  2017-09-09 17:21:00  
2  2017-09-09 19:18:00  
3  2017-09-09 19:40:00  
4  2017-09-09 1

  interactivity=interactivity, compiler=compiler, result=result)


In [119]:
keyword = 'irma'
keyword_matcher = re.compile(keyword)
has_keyword_tweets = harvey_irma_tweet_df.loc[:, 'text'].apply(lambda x: len(keyword_matcher.findall(str(x))) > 0)
full_irma_df = harvey_irma_tweet_df[has_keyword_tweets]
print('%d %s tweets in full dataframe'%(full_irma_df.shape[0], keyword))

34982 irma tweets in full dataframe


In [122]:
date_fmt_mini = '%Y-%m-%d %H:%M:%S'
full_irma_df = full_irma_df.assign(created_at=full_irma_df.loc[:, 'created_at'].apply(lambda x: datetime.strptime(x, date_fmt_mini)))
full_irma_df = full_irma_df.assign(created_at_day=full_irma_df.loc[:, 'created_at'].apply(lambda x: datetime(*x.timetuple()[:3])))
full_irma_df.sort_values('created_at_day', inplace=True)

In [124]:
sample_size = 500
for date, date_group in full_irma_df.groupby('created_at_day'):
    date_group = date_group[date_group.loc[:, 'text'].apply(lambda x: not rt_matcher.match(x))]
    print('tweet sample from %s'%(date))
    print('\n'.join(date_group.head(sample_size).loc[:, 'text'].apply(lambda x: x.replace('\n', ''))))

tweet sample from 2017-08-23 00:00:00

tweet sample from 2017-08-26 00:00:00

tweet sample from 2017-08-27 00:00:00

tweet sample from 2017-08-28 00:00:00

tweet sample from 2017-08-29 00:00:00
Cancillería confirma que una familia hondureña damnificada por huracán #Harvey https://t.co/CRQFZGw39M... https://t.co/t0FeqZlwKx
Chairman @USRepRodney statement on Hurricane #Harvey relief: https://t.co/5CNnvvBTIU
tweet sample from 2017-08-30 00:00:00
#Harvey2017 que pase el año que El presidente de EEUUAA no firmará el acuerdo de París tiene algo de irónico
AFPespanol: #ÚLTIMAHORA Confirman la muerte de seis miembros de una misma familia en Houston por #Harvey #AFP
tweet sample from 2017-08-31 00:00:00
News - #Irma intensifies into hurricane, track tropical activity - The Weather Network #tsirma  https://t.co/O8tZNmJWPg
Previsión de la evolución en #MSLP y #Viento del ciclón #Irma en #GFS para las próximas 100 horasSe confirma... https://t.co/25NdaPbRER
tweet sample from 2017-09-02 00:00:00
#I

Example tweets:

- Lost power down in Vero. God keep everyone riding out safe 🙏🏼 #Irma2017 #goawayirma #FoxNews #verobeachfl
- LIVE Webcam - Lauderdale by the Sea beach cam, Florida - #HurricaneIrma #hurricaneirma2017 https://t.co/0ekDmRY4xG
- #HurricaneIrma #Hurricane #Irma #NWSKeyWest #huricaneirma2017 #Florida #FloridaHurricane @MetroPulseUSA #KeyWest… https://t.co/NYbP88mV3u

Nothing doing!

What if we focus just on the day of landfall (9/10/17) and use data scraped from the archive?

In [146]:
from ast import literal_eval
irma_month = 'Sep'
irma_year = 17
irma_start_day = 10
irma_end_day = 13
irma_days = range(irma_start_day, irma_end_day)
irma_archive_file_base = '../../data/mined_tweets/archive_#Irma,#HurricaneIrma,#Harvey,#HurricaneHarvey_%s-%02d-%d.gz'
irma_archive_files = [irma_archive_file_base%(month, day, year) for day in irma_days]
# load data
irma_archive_tweets = []
for f in irma_archive_files:
    for l in gzip.open(irma_archive_file, 'r'):
        try:
    #         t = json.loads(l.strip())
            t = literal_eval(l.strip())
            t_info = [t.get(f) for f in relevant_fields]
            irma_archive_tweets.append(t_info)
        except Exception, e:
            pass
irma_archive_tweets = pd.DataFrame(irma_archive_tweets, columns=relevant_fields)
keyword = 'irma'
keyword_matcher = re.compile(keyword)
has_keyword = irma_archive_tweets.loc[:, 'text'].apply(lambda x: len(keyword_matcher.findall(x.encode('utf-8').lower())) > 0)
irma_archive_tweets = irma_archive_tweets[has_keyword]
print('%d archive tweets'%(irma_archive_tweets.shape[0]))
rt_matcher = re.compile('^RT.*')
irma_archive_tweets_original = irma_archive_tweets[irma_archive_tweets.loc[:, 'text'].apply(lambda x: not rt_matcher.match(x))]
print('%d original archive tweets'%(irma_archive_tweets_original.shape[0]))

35076 archive tweets
5517 original archive tweets


Get rounded date again and sort.

In [148]:
date_fmt = '%a %b %d %H:%M:%S +0000 %Y'
irma_archive_tweets_original = irma_archive_tweets_original.assign(created_at=irma_archive_tweets_original.loc[:, 'created_at'].apply(lambda x: datetime.strptime(x, date_fmt)))
irma_archive_tweets_original = irma_archive_tweets_original.assign(created_at_day=irma_archive_tweets_original.loc[:, 'created_at'].apply(lambda x: datetime(*x.timetuple()[:3])))
irma_archive_tweets_original.sort_values('created_at_day', inplace=True)

In [150]:
sample_size = 1000
for date, date_group in irma_archive_tweets_original.groupby('created_at_day'):
    print('tweet sample from %s'%(date))
    print('\n'.join(date_group.head(sample_size).loc[:, 'text']))

tweet sample from 2017-09-09 00:00:00
@flightradar24 exodus of light aircraft from Daytona Beach #IrmaHurricane2017 https://t.co/jnnMEEVdnj
¿Ya giras niña?, ya conoceras Miami el año que viene...
Portate bien ;)

#Irma.
#IrmaHurri
Check out the Hurricane Urma Melbourne Florida channel on Zello. #zello https://t.co/HXiwqefxtq
@Regrann from @kristaplin  -  A wave crashes 30 feet over the Bal Harbour Jetty #HurricaneIrma… https://t.co/h1buwfWeqi
#IrmaEnTN Seria bueno que el chiquitín diga q quiso decir  con "Cuba es un país difícil"
AT Hurricane #Irma Advisory: Update-2017-09-09
22:00 GMT, 110 kt/125 mph winds, 933 mb,  23.4N 80.7W, moving: WNW at 7 kt/8 mph #tropics
Even though the path has shifted, it doesn't mean going back is safe. #Miami #FLwx #hurricaneirma2017 #HurricaneIrma https://t.co/8SJrCNbrb5
#IrmaHurricane2017 this hurricane rains the earth within 6 months, causing earthquake demolitions in the aftershock impact
Watching news from here I feel so helpless. #Irma is getting re

Examples:

- Last updates show a westernmost path of #Irma. Bad news for SWFlorida &amp; #FloridaKeys. #HurricaineIrma will across o… https://t.co/lwOjepwBt9
- hollywood beach broadwalk .. suns coming up. @wsvn #HurricaneIrma https://t.co/eELl4evrJE
- Starting to get windy here in St Pete💨 #Irma
- #Irma Update: Naples, Sarasota and Tampa will see the brunt of #Irma. Hurricane-force winds can extend over 50 mile… https://t.co/FYJRoYGG5f
- Yes, they are #surfing at Chastain Beach, "the rocks," in Stuart on #HurricaneIrma blown waves. #TCWeather https://t.co/feiabAxsPy
- Stay strong Sanibel and Captiva. Al my thoughts and love are with you❤️ #Irma
- Publix on Cap Cir NE almost entirely out of bread. Luckily checkout lines aren't too long @abc27 #HurricaneIrma https://t.co/eFghR6FBEr
- I live in Parrish and this is the projected path of the eye, this should be fun #HurricaneIrma https://t.co/ERnn2U8gln
- In northern ATL because of #Irma ? FREE!! EOP is an awesome place! Eat at O4W Pizza 1st for a wonderful break from… https://t.co/iRi2CJWpTi
- COLLIER: FIRST BAPTIST IS NOW CLOSED. THERE ARE 425 SEATS LEFT AT GULF COAST HIGH SCHOOL. #HurricaneIrma
- For all my #em4hc colleagues curious about #Irma #hospital #evacuation. Also, my recent post on the topic… https://t.co/dK6V7qtc0c
- Just got word from my buddy in Lauderdale that his neighbor is saying #Irma will spin into gulf if not on top of #keywest in morning.

**Conclusion:**

We need to standardize the file formats because parsing through each one separately is a pain in the neck.

Also, in terms of ambiguity it appears Maria >>> Harvey > Irma.