Useful Pandas code snippets:
* https://gist.github.com/bsweger/e5817488d161f37dcbd2 
* https://jeffdelaney.me/blog/useful-snippets-in-pandas/

In [4]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style='whitegrid', color_codes=True) # A single call is enough; a second sns.set() overrides the first.
pd.options.display.max_columns = None # Display all columns in the dataframe (None means no limit).
pd.set_option('display.float_format', lambda x: '%.2f' % x) # Display floats with two decimal places instead of scientific notation.
pd.set_option('display.max_colwidth', None) # Display the full contents of each cell (None replaces the deprecated -1).
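
If these display settings should apply to a single cell rather than the whole session, pandas also provides `pd.option_context`. A minimal sketch, using a made-up two-row frame (the `demo` frame and its values are illustrative, not taken from the tweet data):

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
demo = pd.DataFrame({"user_followers_count": [62517720.0, 8577.8712]})

# Options set here are restored automatically when the block exits.
with pd.option_context("display.float_format", "{:.2f}".format,
                       "display.max_columns", None):
    formatted = repr(demo)

print(formatted)
```

Outside the `with` block the previous display options are back in effect, so other cells are not affected.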

### Load Dataframe from H5 File

In [2]:
with pd.HDFStore("tweets.h5", mode="r") as hdf: # Open read-only; the file is closed when the block exits.
    df = hdf["tweets"]
df

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
715194383742672897,2016-03-30 15:10:03,3.21,Mini Mantras,False,,,en,,https://www.twitter.com/ProWhites/status/71519...,DIVERSITY is never imposed on non-White areas....,Pro-Whites,ProWhites,Spreading the irrefutable truth about #WhiteGe...,False,1551,832,2013-01-15,White Country,Pacific Time (US & Canada),3056,490,"[whitegenocide, brexit]",2,[],0,[],0,[pic.twitter.com/pvS14tjlm1],1,[photo]
715194868612595714,2016-03-30 15:11:59,0.97,Twitter for Android,False,715185221294010371,3342929843,en,,https://www.twitter.com/WSBet23/status/7151948...,@StrongerIn @standardnews I am sure London has...,23.06.16 UKIndepDay,WSBet23,The establishment and other profiteers of EU w...,False,6,148,2015-04-11,,,212,910,[],0,"[StrongerIn, standardnews]",2,[],0,[],0,[]
715194899453362176,2016-03-30 15:12:06,0.38,Twitter for iPhone,False,715151830020399105,3342929843,en,,https://www.twitter.com/MockLabour/status/7151...,"@StrongerIn no thanks, I'm voting Leave becaus...",BHAFC,MockLabour,"Exposing Labour. the racism, misogyny, homopho...",False,1068,1038,2015-11-14,Fabulous Sussex,,18910,273,[brexit],1,[StrongerIn],1,[],0,[],0,[]
715195086158606336,2016-03-30 15:12:50,7.05,TweetDeck,False,,,en,,https://www.twitter.com/wdjstraw/status/715195...,".@vote_leave bosses support NHS privatisation,...",Will Straw,wdjstraw,Executive Director of Britain Stronger in Euro...,False,28635,1621,2009-03-14,United Kingdom,London,10634,2,"[nhs, strongerin]",2,[vote_leave],1,[],0,[pic.twitter.com/a4twQRDyOh],1,[photo]
715195159491817472,2016-03-30 15:13:08,7.25,Buffer,False,,,en,,https://www.twitter.com/jeremyghez/status/7151...,Is this because they think a Brexit is now lik...,jeremyghez,jeremyghez,Professor of Economics and International Affai...,False,325,888,2008-12-31,Paris,Athens,3072,131,"[bankofamerica, brexit]",2,[],0,[https://t.co/MLrgdRlIWL],1,[],0,[]
715195401549320192,2016-03-30 15:14:06,4.68,TweetDeck,False,,3342929843,en,,https://www.twitter.com/BurgisBullock/status/7...,@StrongerIn speaking at #EU #referendum debate...,Burgis Bullock,BurgisBullock,"Accountants with a personal, partner led servi...",False,423,81,2011-07-27,"Warwickshire, UK",London,3590,53,"[eu, referendum]",2,"[StrongerIn, leamingtonspa]",2,[https://t.co/aXDwMy3TPI],1,[],0,[]
715195410185330688,2016-03-30 15:14:08,2.39,Twitter Web Client,False,,,en,,https://www.twitter.com/TomCalver2/status/7151...,German satirical show ZDF Heute-Show pokes fun...,Tom Calver,TomCalver2,Turtle-neck activist & freelance investigative...,False,1362,1280,2013-11-07,"London, England",,733,198,[brexit],1,[],0,[https://t.co/p9jn8Qzyby],1,[],0,[]
715195487838715905,2016-03-30 15:14:26,4.61,TweetDeck,False,715193232460804096,4904820549,en,,https://www.twitter.com/Langworthy_47/status/7...,@Xians4EU And Turkey has almost as many placem...,Sam Wide §50,Langworthy_47,EU's subsidiarity is more honoured in the brea...,False,171,674,2011-08-21,British Isles and Axarquía,Madrid,5978,70,"[strongerin, brexit, euref]",3,[Xians4EU],1,[],0,[],0,[]
715195543404822530,2016-03-30 15:14:39,3.55,Twitter Web Client,False,,,en,,https://www.twitter.com/GovUCC/status/71519554...,Check out contributions by our own @MaryCMurph...,GovUCC,GovUCC,"News, events and updates from the Department o...",False,389,65,2012-09-11,UCC,Dublin,1671,7,[brexit],1,"[MaryCMurphy, IrishTimes]",2,[https://t.co/PhE0WG8akR],1,[],0,[]
715195614351507463,2016-03-30 15:14:56,2.14,Twitter Web Client,False,715192021766512643,715173859486466049,en,,https://www.twitter.com/foto2021/status/715195...,@Steel4GO @Grassroots_Out EU membership better...,foto2021,foto2021,We have the character of an island nation: ind...,False,139,226,2014-02-08,"Gloucestershire, England",Pacific Time (US & Canada),12812,3954,[],0,"[Steel4GO, Grassroots_Out]",2,[],0,[],0,[]


### Examine the Dataframe and Remove Faulty, Unrelated, and Spam Values

First five rows.

In [3]:
df.head()

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
715194383742672897,2016-03-30 15:10:03,3.21,Mini Mantras,False,,,en,,https://www.twitter.com/ProWhites/status/71519...,DIVERSITY is never imposed on non-White areas....,Pro-Whites,ProWhites,Spreading the irrefutable truth about #WhiteGe...,False,1551,832,2013-01-15,White Country,Pacific Time (US & Canada),3056,490,"[whitegenocide, brexit]",2,[],0,[],0,[pic.twitter.com/pvS14tjlm1],1,[photo]
715194868612595714,2016-03-30 15:11:59,0.97,Twitter for Android,False,7.151852212940104e+17,3342929843.0,en,,https://www.twitter.com/WSBet23/status/7151948...,@StrongerIn @standardnews I am sure London has...,23.06.16 UKIndepDay,WSBet23,The establishment and other profiteers of EU w...,False,6,148,2015-04-11,,,212,910,[],0,"[StrongerIn, standardnews]",2,[],0,[],0,[]
715194899453362176,2016-03-30 15:12:06,0.38,Twitter for iPhone,False,7.151518300203991e+17,3342929843.0,en,,https://www.twitter.com/MockLabour/status/7151...,"@StrongerIn no thanks, I'm voting Leave becaus...",BHAFC,MockLabour,"Exposing Labour. the racism, misogyny, homopho...",False,1068,1038,2015-11-14,Fabulous Sussex,,18910,273,[brexit],1,[StrongerIn],1,[],0,[],0,[]
715195086158606336,2016-03-30 15:12:50,7.05,TweetDeck,False,,,en,,https://www.twitter.com/wdjstraw/status/715195...,".@vote_leave bosses support NHS privatisation,...",Will Straw,wdjstraw,Executive Director of Britain Stronger in Euro...,False,28635,1621,2009-03-14,United Kingdom,London,10634,2,"[nhs, strongerin]",2,[vote_leave],1,[],0,[pic.twitter.com/a4twQRDyOh],1,[photo]
715195159491817472,2016-03-30 15:13:08,7.25,Buffer,False,,,en,,https://www.twitter.com/jeremyghez/status/7151...,Is this because they think a Brexit is now lik...,jeremyghez,jeremyghez,Professor of Economics and International Affai...,False,325,888,2008-12-31,Paris,Athens,3072,131,"[bankofamerica, brexit]",2,[],0,[https://t.co/MLrgdRlIWL],1,[],0,[]


Last five rows.

In [4]:
df.tail()

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
745829853870579715,2016-06-23 04:04:28,0.68,愛液で潮吹き！！,False,,,en,,https://www.twitter.com/mumanashahad/status/74...,☑MORE FREE 395 FOLLOWERS ▶https://t.co/Pd8F8TN...,Muman Ashahd (Koita),mumanashahad,I Proud To Be an Indian I Proud To Be an India...,False,98,527,2015-10-18,"Andheri West, Mumbai",,94,40,"[londonstorm, strongerin]",2,[],0,[https://t.co/Pd8F8TNfGe],1,[],0,[]
745829854994661376,2016-06-23 04:04:28,5.14,Twitter for iPad,False,7.457510784661299e+17,174646175.0,en,,https://www.twitter.com/ShurdyRover/status/745...,@COLRICHARDKEMP @Lou_i5e @nlygo @MelanieLatest...,The Goal Machine,ShurdyRover,Supporter of all things Glos & Hereford - Tige...,False,140,189,2011-05-06,Hereford,Hawaii,4789,112,[],0,"[COLRICHARDKEMP, Lou_i5e, nlygo, MelanieLatest...",5,[],0,[],0,[]
745829855737032704,2016-06-23 04:04:29,0.41,Arduino,False,,,en,,https://www.twitter.com/Tian_A1/status/7458298...,"#WallStreet dips, all eyes on #British #Refere...",Tian_A1 News,Tian_A1,| The best way to predict your future is to cr...,False,643,334,2016-01-23,"Arcachon, France",Paris,49492,70,"[wallstreet, british, referendum, reuters, fir...",7,[],0,[https://t.co/jCSJ4XO7TN],1,[],0,[]
745829877220319232,2016-06-23 04:04:34,3.15,Twitter Web Client,False,7.45751170304475e+17,1920286333.0,en,,https://www.twitter.com/RandolpheLibre/status/...,@JihadistJoe @weltec2 Vote #Leave and resist t...,Jane Smith,RandolpheLibre,,False,953,1891,2013-04-30,Dar al-Harb,Solomon Is.,6668,608,[leave],1,"[JihadistJoe, weltec2]",2,[],0,[],0,[]
745829881267822592,2016-06-23 04:04:35,0.03,Twitter Web Client,False,,,en,,https://www.twitter.com/Trumpindahouse/status/...,DONT SCREW UP! THIS IS THE MOST IMPORTANT VOTE...,#StrongDonald,Trumpindahouse,People who call Donald Trump weak cannot even ...,False,27,26,2016-06-12,"Tampa, FL",,76,6,"[brexit, voteleave]",2,[],0,[],0,[],0,[]


Random sample of rows.

In [5]:
df.sample(5)

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
745264718915440640,2016-06-21 14:38:49,3.72,Twitter Web Client,False,,,en,,https://www.twitter.com/FiveRights/status/7452...,"""Christianity will dominate world"" = hate spee...",Philip Schuyler,FiveRights,Libertarian for #Trump2016. #AmericaFirst #Br...,False,78361,69347,2012-10-01,,Eastern Time (US & Canada),48939,0,[leave],1,[],0,[],0,[pic.twitter.com/IviLd04DaN],1,[photo]
743904646234836992,2016-06-17 20:34:23,0.28,Twitter for Android,False,7.439011129992397e+17,2223592112.0,en,,https://www.twitter.com/niceguyinblue/status/7...,@UK__News #Brexit is a terrorist word like Is...,Uk bloke,niceguyinblue,Ex Royal Engineer Ran own business since IN ...,False,631,1888,2016-03-06,"Gainsborough, England",,3083,2356,[brexit],1,[UK__News],1,[],0,[],0,[]
732269526369230848,2016-05-16 18:00:34,3.1,TweetDeck,False,,,en,,https://www.twitter.com/brentgofftv/status/732...,In 1 hour. WorldNews from #Berlin. Join us her...,Brent Goff,brentgofftv,Chief News Anchor @DWNews TV http://muckrack.c...,True,2299,1102,2013-04-09,"Berlin, Germany",Bern,10337,1101,"[berlin, brexit, china, monday]",4,[dwnews],1,[https://t.co/5Xu4hPc9v6],1,[pic.twitter.com/8WXGGuVBA2],1,[photo]
743151201311723524,2016-06-15 18:40:28,1.28,Twitter Web Client,False,7.431505974241649e+17,3068507543.0,en,,https://www.twitter.com/WeAreThe59/status/7431...,besides support for Remain is NOT weakening in...,Jeremy Blackwell,WeAreThe59,Proud gay Socialist Scottish Patriot with a fu...,False,1275,1338,2015-03-03,Creag Longairt and Paris,London,26426,860,"[scotland, strongerin]",2,[],0,[],0,[],0,[]
744933011976785924,2016-06-20 16:40:44,0.0,Twitter for Android,False,7.449308279187866e+17,4216879030.0,en,,https://www.twitter.com/Mgt0wer/status/7449330...,"@GodfreyElfwick I stopped reading at ""bad peop...",James Carlyle Dejour,Mgt0wer,"PHD, Right-leaning, Facts, Logic, National Ide...",False,8,53,2016-06-19,,,122,207,[voteleave],1,[GodfreyElfwick],1,[],0,[],0,[]
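
Because `df.sample` draws different rows on every run, the table above changes whenever the cell is re-executed. A small sketch, on a hypothetical toy frame, of pinning the draw with the `random_state` parameter (the seed value 42 is arbitrary):

```python
import pandas as pd

# Hypothetical toy frame standing in for the tweet dataframe.
toy = pd.DataFrame({"hashtags_count": range(100)})

# The same seed yields the same rows on every call.
first = toy.sample(5, random_state=42)
second = toy.sample(5, random_state=42)

print(first.equals(second))
```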


Number of rows and columns, as a tuple.

In [6]:
df.shape

(2598609, 30)

Summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns.

In [7]:
df.describe()

Unnamed: 0,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count
count,2598609.0,2598609.0,2598609.0,2598609.0,2598609.0,2598609.0,2598609.0,2598609.0
mean,8577.87,1429.14,30140.7,4287.85,1.7,0.91,0.3,0.18
std,194856.22,5630.3,136400.23,13199.66,1.54,1.2,0.47,0.44
min,-1.0,-128.0,-1.0,-1.0,0.0,0.0,0.0,0.0
25%,113.0,161.0,1248.0,80.0,1.0,0.0,0.0,0.0
50%,456.0,516.0,5150.0,589.0,1.0,0.0,0.0,0.0
75%,1611.0,1425.0,18133.0,2949.0,2.0,1.0,1.0,0.0
max,62517720.0,1570110.0,6211105.0,1229284.0,21.0,12.0,5.0,5.0


Investigate tweets with negative user_followers_count, user_friends_count, user_statuses_count, or user_favourites_count values, saving each group in its own dataframe. According to a thread on the Twitter Developers Forum, negative counts appear when the data took too long to fetch (the page that thread references for further information no longer exists): https://twittercommunity.com/t/negative-followers-count/14604

In [8]:
negative_friends = df.loc[df['user_friends_count'] < 0]
print(len(negative_friends))
negative_friends # 28 observations have a negative user_friends_count.

28


Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
724872595129126912,2016-04-26 08:07:48,8.08,Twitter Web Client,False,,,en,,https://www.twitter.com/pablothehat/status/724...,This is the REAL THREAT TO THE NHS.. TTIP deal...,pablothehat,pablothehat,,False,-1,-1,2008-03-29,Bishops Castle,London,-1,-1,"[lbc, voteleave]",2,[],0,[https://t.co/tUX3mDjkx5],1,[],0,[]
727865576056557568,2016-05-04 14:20:51,7.29,Twitter for iPhone,False,,,en,,https://www.twitter.com/ChristianCooper/status...,A quick scroll through the #Brexit headlines/t...,Christian H. Cooper,ChristianCooper,"Trader, Fellow @TrumanProject & Term Member at...",False,3587,-85,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3126,1971,[brexit],1,[],0,[],0,[],0,[]
729593161941659648,2016-05-09 08:45:39,7.3,Twitter for iPad,False,,,en,,https://www.twitter.com/ChristianCooper/status...,Cameron's speech on #brexit is like Romey's sp...,Christian H. Cooper,ChristianCooper,"Trader, Fellow @TrumanProject & Term Member at...",False,3588,-84,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3037,2055,[brexit],1,[],0,[],0,[],0,[]
729819178849312768,2016-05-09 23:43:46,7.3,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,"Since 2009, there has been wage growth only f...",Christian H. Cooper,ChristianCooper,"Trader, Fellow @TrumanProject & Term Member at...",False,3590,-97,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3049,2058,"[presidenttrump, brexit]",2,[],0,[],0,[pic.twitter.com/CSI4erPEkD],1,[photo]
732922625870024705,2016-05-18 13:15:45,7.33,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,#Bremain pulls ahead sharply in new poll. Phon...,Christian H. Cooper,ChristianCooper,"Trader, Fellow @TrumanProject & Term Member at...",False,3626,-113,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3185,2116,[bremain],1,[],0,[https://t.co/ZbGkj7hmAO],1,[pic.twitter.com/GG2hVBeLkZ],1,[photo]
734391700357947392,2016-05-22 14:33:20,7.34,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,Exclusive footage of Boris using the force to ...,Christian H. Cooper,ChristianCooper,"Derivatives trader, fellow @TrumanProject & Te...",False,3626,-93,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3332,2285,[brexit],1,[],0,[https://t.co/1pqLSh7X0T],1,[],0,[]
734813481443065856,2016-05-23 18:29:20,7.34,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,The markets are consistently ignoring all poll...,Christian H. Cooper,ChristianCooper,"Derivatives trader, fellow @TrumanProject & Te...",False,3634,-93,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3362,2298,[brexit],1,[],0,[],0,[pic.twitter.com/XI5GYsCov7],1,[photo]
734863502460887044,2016-05-23 21:48:06,7.34,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,#Brexit Anyone? EU suspends visa-free travel t...,Christian H. Cooper,ChristianCooper,"Derivatives trader, fellow @TrumanProject & Te...",False,3635,-99,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3370,2298,[brexit],1,[],0,[https://t.co/LbacN3ZVnh],1,[pic.twitter.com/uqyUrvxAlq],1,[photo]
736628143692226561,2016-05-28 18:40:10,7.35,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,The EU is not immune to the world-wide shift r...,Christian H. Cooper,ChristianCooper,"Derivatives trader, fellow @TrumanProject & Te...",False,3641,-115,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3378,2331,[brexit],1,[],0,[],0,[pic.twitter.com/A8g1RB5Kvw],1,[photo]
737395816592080896,2016-05-30 21:30:37,7.36,Twitshot.com,False,,,en,,https://www.twitter.com/ChristianCooper/status...,Been saying this for months: #Brexit happens t...,Christian H. Cooper,ChristianCooper,"Derivatives trader, fellow @TrumanProject & Te...",False,3635,-93,2009-01-20,NYC and Tehran,Eastern Time (US & Canada),3408,2336,"[brexit, roughsummer]",2,[],0,[https://t.co/FUBnMZtLJW],1,[pic.twitter.com/w2h2cYJP72],1,[photo]


In [9]:
negative_followers = df.loc[df['user_followers_count'] < 0] 
negative_followers # 2 observations have a negative user_followers_count.

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
724872595129126912,2016-04-26 08:07:48,8.08,Twitter Web Client,False,,,en,,https://www.twitter.com/pablothehat/status/724...,This is the REAL THREAT TO THE NHS.. TTIP deal...,pablothehat,pablothehat,,False,-1,-1,2008-03-29,Bishops Castle,London,-1,-1,"[lbc, voteleave]",2,[],0,[https://t.co/tUX3mDjkx5],1,[],0,[]
738129289653190659,2016-06-01 22:05:11,7.1,Twitter for Android,False,,,en,,https://www.twitter.com/jedifoxy/status/738129...,Nailed it Mr Sachs. #newsnight. #identity #Nat...,Foxy Stoat,jedifoxy,Spunking reason into the face of calamity,False,-1,-1,2009-04-28,City 17,London,-1,-1,"[newsnight, identity, nationalism, culture, br...",5,[],0,[],0,[],0,[]


In [10]:
negative_statuses = df.loc[df['user_statuses_count'] < 0] 
negative_statuses # The same 2 observations have a negative user_statuses_count.

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
724872595129126912,2016-04-26 08:07:48,8.08,Twitter Web Client,False,,,en,,https://www.twitter.com/pablothehat/status/724...,This is the REAL THREAT TO THE NHS.. TTIP deal...,pablothehat,pablothehat,,False,-1,-1,2008-03-29,Bishops Castle,London,-1,-1,"[lbc, voteleave]",2,[],0,[https://t.co/tUX3mDjkx5],1,[],0,[]
738129289653190659,2016-06-01 22:05:11,7.1,Twitter for Android,False,,,en,,https://www.twitter.com/jedifoxy/status/738129...,Nailed it Mr Sachs. #newsnight. #identity #Nat...,Foxy Stoat,jedifoxy,Spunking reason into the face of calamity,False,-1,-1,2009-04-28,City 17,London,-1,-1,"[newsnight, identity, nationalism, culture, br...",5,[],0,[],0,[],0,[]


In [11]:
negative_favourites = df.loc[df['user_favourites_count'] < 0]
negative_favourites # 6 observations have a negative user_favourites_count.

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
724872595129126912,2016-04-26 08:07:48,8.08,Twitter Web Client,False,,,en,,https://www.twitter.com/pablothehat/status/724...,This is the REAL THREAT TO THE NHS.. TTIP deal...,pablothehat,pablothehat,,False,-1,-1,2008-03-29,Bishops Castle,London,-1,-1,"[lbc, voteleave]",2,[],0,[https://t.co/tUX3mDjkx5],1,[],0,[]
734707134441660416,2016-05-23 11:26:45,3.79,TweetDeck,False,,,en,,https://www.twitter.com/lewisworrow/status/734...,"The UK would lose 500,000 jobs after vote to l...",Lewis Worrow,lewisworrow,"CEO of @ChaissonCassius, Investor, Trustee, Co...",False,580,357,2012-08-09,"England, United Kingdom",London,2725,-1,[brexit],1,[ReutersBiz],1,[https://t.co/ELcmpNKxLv],1,[pic.twitter.com/sxg2Ijkq44],1,[photo]
738129289653190659,2016-06-01 22:05:11,7.1,Twitter for Android,False,,,en,,https://www.twitter.com/jedifoxy/status/738129...,Nailed it Mr Sachs. #newsnight. #identity #Nat...,Foxy Stoat,jedifoxy,Spunking reason into the face of calamity,False,-1,-1,2009-04-28,City 17,London,-1,-1,"[newsnight, identity, nationalism, culture, br...",5,[],0,[],0,[],0,[]
742329620758990848,2016-06-13 12:15:48,3.84,Twitter Web Client,False,,,en,,https://www.twitter.com/lewisworrow/status/742...,"When it comes to the #EUref, should the United...",Lewis Worrow,lewisworrow,"CEO of @ChaissonCassius, author of a Treatise ...",False,585,381,2012-08-09,"England, United Kingdom",London,2762,-1,[euref],1,[],0,[],0,[],0,[]
742329812216360961,2016-06-13 12:16:33,3.84,Twitter Web Client,False,,,en,,https://www.twitter.com/lewisworrow/status/742...,"When it comes to the #EUref, should the United...",Lewis Worrow,lewisworrow,"CEO of @ChaissonCassius, author of a Treatise ...",False,585,381,2012-08-09,"England, United Kingdom",London,2762,-1,[euref],1,[],0,[],0,[],0,[]
744880704400535553,2016-06-20 13:12:53,3.86,Twitter Web Client,False,,,en,,https://www.twitter.com/lewisworrow/status/744...,Probability of Remain vote in Britain's #EURef...,Lewis Worrow,lewisworrow,"CEO of @ChaissonCassius, author of a Treatise ...",False,587,400,2012-08-09,"England, United Kingdom",London,2778,-1,[euref],1,[ReutersBiz],1,[https://t.co/t0YYnNYV1O],1,[pic.twitter.com/uazP9qUikS],1,[photo]


In [12]:
df.loc[744880704400535553]

created_at                                             2016-06-20 13:12:53
account_age                                                           3.86
source                                                  Twitter Web Client
truncated                                                            False
in_reply_to_status_id                                                 None
in_reply_to_user_id                                                   None
lang                                                                    en
coordinates                                                           None
tweet_url                https://www.twitter.com/lewisworrow/status/744...
text                     Probability of Remain vote in Britain's #EURef...
user_name                                                     Lewis Worrow
user_screen_name                                               lewisworrow
user_description         CEO of @ChaissonCassius, author of a Treatise ...
user_verified            

Collect the IDs of all tweets with a negative user count (the indexes of the four dataframes of negative user counts) in a list, then count the unique IDs. The list has 38 entries but only 32 unique IDs: the 6 duplicates arise because two tweets are negative on all four counts and therefore appear in all four dataframes.

In [13]:
negative_user_counts = []

# Gather the tweet IDs (index values) from all four dataframes.
for negatives in (negative_friends, negative_followers, negative_statuses, negative_favourites):
    negative_user_counts.extend(negatives.index)

print(len(negative_user_counts)) # Total entries, including duplicates.
print(len(set(negative_user_counts))) # Unique tweet IDs.

38
32
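
The same set of IDs can also be obtained without the intermediate dataframes, by building one boolean mask over the four count columns. This is an alternative sketch on a made-up toy frame, not the approach used in this notebook; only the column names come from the data:

```python
import pandas as pd

count_cols = ["user_followers_count", "user_friends_count",
              "user_statuses_count", "user_favourites_count"]

# Hypothetical toy frame indexed by made-up tweet IDs.
toy = pd.DataFrame(
    {
        "user_followers_count": [10, -1, 3587, 580],
        "user_friends_count": [5, -1, -85, 357],
        "user_statuses_count": [100, -1, 3126, 2725],
        "user_favourites_count": [7, -1, 1971, 12],
    },
    index=[101, 102, 103, 104],
)

# True for every row with at least one negative count.
has_negative = (toy[count_cols] < 0).any(axis=1)

negative_ids = toy.index[has_negative]  # each ID appears once, so no set() is needed
cleaned = toy[~has_negative]

print(len(negative_ids), len(cleaned))
```

Because each row is tested once, a tweet that is negative on several counts is not counted more than once.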


Remove the 32 unique tweets with negative user counts from the dataframe.

In [14]:
df = df[~df.index.isin(set(negative_user_counts))]
df

Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
715194383742672897,2016-03-30 15:10:03,3.21,Mini Mantras,False,,,en,,https://www.twitter.com/ProWhites/status/71519...,DIVERSITY is never imposed on non-White areas....,Pro-Whites,ProWhites,Spreading the irrefutable truth about #WhiteGe...,False,1551,832,2013-01-15,White Country,Pacific Time (US & Canada),3056,490,"[whitegenocide, brexit]",2,[],0,[],0,[pic.twitter.com/pvS14tjlm1],1,[photo]
715194868612595714,2016-03-30 15:11:59,0.97,Twitter for Android,False,715185221294010371,3342929843,en,,https://www.twitter.com/WSBet23/status/7151948...,@StrongerIn @standardnews I am sure London has...,23.06.16 UKIndepDay,WSBet23,The establishment and other profiteers of EU w...,False,6,148,2015-04-11,,,212,910,[],0,"[StrongerIn, standardnews]",2,[],0,[],0,[]
715194899453362176,2016-03-30 15:12:06,0.38,Twitter for iPhone,False,715151830020399105,3342929843,en,,https://www.twitter.com/MockLabour/status/7151...,"@StrongerIn no thanks, I'm voting Leave becaus...",BHAFC,MockLabour,"Exposing Labour. the racism, misogyny, homopho...",False,1068,1038,2015-11-14,Fabulous Sussex,,18910,273,[brexit],1,[StrongerIn],1,[],0,[],0,[]
715195086158606336,2016-03-30 15:12:50,7.05,TweetDeck,False,,,en,,https://www.twitter.com/wdjstraw/status/715195...,".@vote_leave bosses support NHS privatisation,...",Will Straw,wdjstraw,Executive Director of Britain Stronger in Euro...,False,28635,1621,2009-03-14,United Kingdom,London,10634,2,"[nhs, strongerin]",2,[vote_leave],1,[],0,[pic.twitter.com/a4twQRDyOh],1,[photo]
715195159491817472,2016-03-30 15:13:08,7.25,Buffer,False,,,en,,https://www.twitter.com/jeremyghez/status/7151...,Is this because they think a Brexit is now lik...,jeremyghez,jeremyghez,Professor of Economics and International Affai...,False,325,888,2008-12-31,Paris,Athens,3072,131,"[bankofamerica, brexit]",2,[],0,[https://t.co/MLrgdRlIWL],1,[],0,[]
715195401549320192,2016-03-30 15:14:06,4.68,TweetDeck,False,,3342929843,en,,https://www.twitter.com/BurgisBullock/status/7...,@StrongerIn speaking at #EU #referendum debate...,Burgis Bullock,BurgisBullock,"Accountants with a personal, partner led servi...",False,423,81,2011-07-27,"Warwickshire, UK",London,3590,53,"[eu, referendum]",2,"[StrongerIn, leamingtonspa]",2,[https://t.co/aXDwMy3TPI],1,[],0,[]
715195410185330688,2016-03-30 15:14:08,2.39,Twitter Web Client,False,,,en,,https://www.twitter.com/TomCalver2/status/7151...,German satirical show ZDF Heute-Show pokes fun...,Tom Calver,TomCalver2,Turtle-neck activist & freelance investigative...,False,1362,1280,2013-11-07,"London, England",,733,198,[brexit],1,[],0,[https://t.co/p9jn8Qzyby],1,[],0,[]
715195487838715905,2016-03-30 15:14:26,4.61,TweetDeck,False,715193232460804096,4904820549,en,,https://www.twitter.com/Langworthy_47/status/7...,@Xians4EU And Turkey has almost as many placem...,Sam Wide §50,Langworthy_47,EU's subsidiarity is more honoured in the brea...,False,171,674,2011-08-21,British Isles and Axarquía,Madrid,5978,70,"[strongerin, brexit, euref]",3,[Xians4EU],1,[],0,[],0,[]
715195543404822530,2016-03-30 15:14:39,3.55,Twitter Web Client,False,,,en,,https://www.twitter.com/GovUCC/status/71519554...,Check out contributions by our own @MaryCMurph...,GovUCC,GovUCC,"News, events and updates from the Department o...",False,389,65,2012-09-11,UCC,Dublin,1671,7,[brexit],1,"[MaryCMurphy, IrishTimes]",2,[https://t.co/PhE0WG8akR],1,[],0,[]
715195614351507463,2016-03-30 15:14:56,2.14,Twitter Web Client,False,715192021766512643,715173859486466049,en,,https://www.twitter.com/foto2021/status/715195...,@Steel4GO @Grassroots_Out EU membership better...,foto2021,foto2021,We have the character of an island nation: ind...,False,139,226,2014-02-08,"Gloucestershire, England",Pacific Time (US & Canada),12812,3954,[],0,"[Steel4GO, Grassroots_Out]",2,[],0,[],0,[]


Confirm that the updated dataframe does not contain any tweets with negative user counts.

In [15]:
df.describe()

Unnamed: 0,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count
count,2598577.0,2598577.0,2598577.0,2598577.0,2598577.0,2598577.0,2598577.0,2598577.0
mean,8577.85,1429.16,30140.97,4287.87,1.7,0.91,0.3,0.18
std,194857.4,5630.33,136401.03,13199.74,1.54,1.2,0.47,0.44
min,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,113.0,161.0,1248.0,80.0,1.0,0.0,0.0,0.0
50%,456.0,516.0,5150.0,589.0,1.0,0.0,0.0,0.0
75%,1611.0,1425.0,18133.0,2949.0,2.0,1.0,1.0,0.0
max,62517720.0,1570110.0,6211105.0,1229284.0,21.0,12.0,5.0,5.0
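
Reading the `min` row by eye works here. The same check could also be made explicit, so that the notebook fails loudly if a negative value slips through. A sketch on a toy frame; the helper name `assert_no_negative_counts` is hypothetical:

```python
import pandas as pd

COUNT_COLS = ["user_followers_count", "user_friends_count",
              "user_statuses_count", "user_favourites_count"]


def assert_no_negative_counts(frame, cols=COUNT_COLS):
    """Raise ValueError if any of the given columns contains a negative value."""
    minima = frame[cols].min()
    bad = minima[minima < 0]
    if not bad.empty:
        raise ValueError(f"negative values remain in: {list(bad.index)}")


# Hypothetical toy frame with no negative counts.
toy_clean = pd.DataFrame({col: [0, 5, 12] for col in COUNT_COLS})
assert_no_negative_counts(toy_clean)  # passes silently
```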


Remove spam and unrelated tweets: the 137 that had been manually identified, as well as other tweets from unrelated users identified through this process.
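
This removal amounts to two filters: drop the tweets whose IDs were flagged manually, then drop the remaining tweets from accounts judged unrelated. A minimal sketch on toy data, assuming the tweet ID is the dataframe index and that the flagged sheet has a `tweet_id` column; the `unrelated_users` list is a hypothetical hand-picked list, since not every flagged account is necessarily unrelated as a whole:

```python
import pandas as pd

# Toy stand-in for the tweet dataframe, indexed by made-up tweet IDs.
tweets = pd.DataFrame(
    {
        "user_screen_name": ["TheEdgeKville", "wdjstraw", "TheEdgeKville", "GovUCC"],
        "text": ["song link", "on-topic", "another song link", "on-topic"],
    },
    index=[1, 2, 3, 4],
)

# Toy stand-in for the manually flagged sheet.
flagged = pd.DataFrame({"tweet_id": [1], "user_screen_name": ["TheEdgeKville"]})

# Hypothetical hand-picked list of accounts judged unrelated as a whole.
unrelated_users = ["TheEdgeKville"]

from_unrelated_user = tweets["user_screen_name"].isin(unrelated_users)
is_flagged = tweets.index.isin(flagged["tweet_id"])

# Keep only tweets that match neither filter.
kept = tweets[~(from_unrelated_user | is_flagged)]

print(kept.index.tolist())
```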

In [5]:
sample_tweets = os.path.expanduser('~/Dropbox/University of Oxford/Oxford Internet Institute/Dissertation/Data/4000_sample_v1.xlsx')
unrelated_df = pd.read_excel(sample_tweets, 'Unrelated or Spam')
unrelated_df

Unnamed: 0,tweet_number,tweet_id,date,time,created_at,tweet_url,text,notes,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_url,media_type_count,media_type,rt_count,like_count,rt_count_data,like_count_data
0,22,742145645209882624,2016-06-13,00:04:44,2016-06-13 00:04:44,https://www.twitter.com/RachidELAIDI/status/742145645209882624,The latest EUROPEAN POLITICS! https://t.co/UJ73ZiBtpY Thanks to @EURefNO @MariusHollenga @Dizzyeek #strongerin #eu,Tweet is a news digest (seems to be automatically generated).,ELAÏDI,RachidELAIDI,No comment,False,795,2072,"['strongerin', 'eu']",2,"['EURefNO', 'MariusHollenga', 'Dizzyeek']",3,['https://t.co/UJ73ZiBtpY'],1,[],0,[],0.00,0.00,0.00,0.00
1,27,742571324061650945,2016-06-14,04:16:14,2016-06-14 04:16:14,https://www.twitter.com/TheEdgeKville/status/742571324061650945,Crash Kings - Mountain Man https://t.co/6gIXnAueLv #TakeControl,Tweet is not relevant to the referendum (links to a song).,The Edge Kerrville,TheEdgeKville,Kerrville's Rock Alternative,False,4,1,['takecontrol'],1,[],0,['https://t.co/6gIXnAueLv'],1,[],0,[],0.00,0.00,0.00,0.00
2,47,745931256618160128,2016-06-23,10:47:24,2016-06-23 10:47:24,https://www.twitter.com/TheEdgeKville/status/745931256618160128,Pearl Jam - Jeremy https://t.co/6gIXnAueLv #TakeControl,Tweet is not relevant to the referendum (links to a song).,The Edge Kerrville,TheEdgeKville,Kerrville's Rock Alternative,False,7,1,['takecontrol'],1,[],0,['https://t.co/6gIXnAueLv'],1,[],0,[],0.00,0.00,0.00,0.00
3,60,735523834087018496,2016-05-25,17:32:02,2016-05-25 17:32:02,https://www.twitter.com/pablothehat/status/735523834087018496,John Pilger: Why Hillary Clinton Is More Dangerous Than Donald Trump https://t.co/oFAjUWmkp4 #lbc #brexitthemovie #voteleave #gogogo,Contains referendum-relevant hashtags but links to an article about US politics.,pablothehat,pablothehat,Biped Humanoid,False,1251,2074,"['lbc', 'brexitthemovie', 'voteleave', 'gogogo']",4,[],0,['https://t.co/oFAjUWmkp4'],1,[],0,[],0.00,0.00,0.00,0.00
4,82,737994672652484609,2016-06-01,13:10:15,2016-06-01 13:10:15,https://www.twitter.com/InforLatam/status/737994672652484609,"Moving to #Infor #Cloud saves Concordia Plan Services $400,000 in annual operating expenses. #ensw https://t.co/QeHFGChOzW",Does not relate to the referendum. Concerns a software company called #Infor.,InforLatinAmerica,InforLatam,Infor es un proveedor líder de software y servicios empresariales del mundo con más de 70.000 clientes http://www.facebook.com/inforlatinamerica,False,1584,2004,"['infor', 'cloud', 'ensw']",3,[],0,['https://t.co/QeHFGChOzW'],1,[],0,[],1.00,0.00,1.00,0.00
5,105,743179022318829568,2016-06-15,20:31:01,2016-06-15 20:31:01,https://www.twitter.com/IBMandInfor/status/743179022318829568,Seamless compatiibility with #Ibm and #Infor. One stop shopping. https://t.co/kZiG0Ir1uf #IBMi https://t.co/c2ziOMz1pB,Does not relate to the referendum. Concerns a software company called #Infor.,IBM Infor Alliance,IBMandInfor,Get the latest updates from the IBM and Infor Alliance. Tweets by Timothy Wilson and follow IBM Social Computing Guidelines.,False,6926,1052,"['ibm', 'infor', 'ibmi']",3,[],0,['https://t.co/kZiG0Ir1uf'],1,['pic.twitter.com/c2ziOMz1pB'],1,['photo'],0.00,1.00,0.00,0.00
6,125,741885525339086848,2016-06-12,06:51:07,2016-06-12 06:51:07,https://www.twitter.com/therealrafaqat/status/741885525339086848,Advice for #Muslim #British Youth: https://t.co/ChqX7P9m9k #IS #ISIL #Daesh #ISIS #UK #MI5 #MI6 #GHCQ #UKIP #EDL #Pegida #GB #BritainFirst,Tweet is not relevant to the referendum; links to video about how to practise Muslim religions in Britain.,Rafaqat Ali Khan,therealrafaqat,I'm pro Imam Mehdi Ra Gohar Shahi #GoharShahi help me Fight the cancer of hatred. (I'm Anti #ISIS #IS #ISIL #Daesh #IslamicState),False,6998,335,"['muslim', 'british', 'is', 'isil', 'daesh', 'isis', 'uk', 'mi5', 'mi6', 'ghcq', 'ukip', 'edl', 'pegida', 'gb', 'britainfirst']",15,[],0,['https://t.co/ChqX7P9m9k'],1,[],0,[],1.00,1.00,1.00,0.00
7,143,736573651080912896,2016-05-28,15:03:37,2016-05-28 15:03:37,https://www.twitter.com/ISIL_SATAN/status/736573651080912896,Advice for #Muslim #British Youth: https://t.co/cYgsGX6vFY #IS #ISIL #Daesh #ISIS #UK #MI5 #MI6 #GHCQ #UKIP #EDL #Pegida #GB #BritainFirst,Tweet is not relevant to the referendum; links to video about how to practise Muslim religions in Britain.,ISIL AGENTS OF DEVIL,ISIL_SATAN,#ISIL #ISIS #Daesh #IslamicState #IS are agents of the devil time the world woke up from its sleep before its too late.,False,2139,1822,"['muslim', 'british', 'is', 'isil', 'daesh', 'isis', 'uk', 'mi5', 'mi6', 'ghcq', 'ukip', 'edl', 'pegida', 'gb', 'britainfirst']",15,[],0,['https://t.co/cYgsGX6vFY'],1,[],0,[],0.00,0.00,0.00,0.00
8,147,733315861184974851,2016-05-19,15:18:20,2016-05-19 15:18:20,https://www.twitter.com/drmohram/status/733315861184974851,I'm so drained these days. Wish to have a webcam chat or meet? Find link in bio. #banen #business #voteleave #fcbbvb https://t.co/4KzUuLj0aU,Does not relate to the referendum. Includes a highly inappropriate photo.,An Lackey,drmohram,My videos and photos here http://goo.gl/KNCUyy?4IKM,False,23,53,"['banen', 'business', 'voteleave', 'fcbbvb']",4,[],0,[],0,['pic.twitter.com/4KzUuLj0aU'],1,['photo'],0.00,0.00,0.00,0.00
9,255,740242759886348288,2016-06-07,18:03:21,2016-06-07 18:03:21,https://www.twitter.com/TrafficDotMy/status/740242759886348288,RT: The latest The mikael mike Daily! https://t.co/WFRwB1Vyw0 Thanks to @am_Techaddict @ToyUtopia @AceOfHeartsDogs #remain,Does not seem to be related to the referendum; user is a RT bot.,TrafficDotMy,TrafficDotMy,RT bot. Will serve you the latest traffic news in every 30 minutes :),False,21547,2,['remain'],1,"['am_Techaddict', 'ToyUtopia', 'AceOfHeartsDogs']",3,['https://t.co/WFRwB1Vyw0'],1,[],0,[],0.00,0.00,0.00,0.00


In [11]:
unrelated_ids = unrelated_df['tweet_id'].tolist()
print(len(unrelated_ids))
df = df[~df.index.isin(unrelated_ids)]

137


Confirm that the manually labelled unrelated and spam tweets have been removed (note: seven of the spam tweets were already removed through the above procedures).

In [15]:
df.describe()

Unnamed: 0,days_before_ref,account_age,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count,rt_count,like_count
count,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,752366.0,280135.0
mean,17.6,3.88,4558.66,1526.34,40294.74,3845.54,2.55,0.3,0.35,0.25,4.16,8.95
std,17.11,2.6,74131.93,6353.19,179039.56,10271.62,1.72,0.67,0.49,0.51,47.68,86.34
min,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,1.28,108.0,157.0,1038.0,55.0,1.0,0.0,0.0,0.0,0.0,0.0
50%,13.0,4.23,466.0,528.0,4469.0,503.0,2.0,0.0,0.0,0.0,0.0,1.0
75%,26.0,6.13,1765.0,1585.0,16141.0,2822.0,3.0,0.0,1.0,0.0,1.0,5.0
max,69.0,9.95,22752666.0,1568791.0,6206247.0,390609.0,21.0,10.0,4.0,4.0,14258.0,18107.0
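
The row count from `describe()` only confirms the removal indirectly. A more explicit check could look like the sketch below, which runs on a small made-up dataframe; `df_demo` and `unrelated_ids_demo` are hypothetical stand-ins for `df` and `unrelated_ids`.

```python
import pandas as pd

# Toy stand-in for the tweet dataframe, indexed by tweet ID (values are made up).
df_demo = pd.DataFrame(
    {"user_screen_name": ["a", "b", "c", "d"]},
    index=pd.Index([101, 102, 103, 104], name="tweet_id"),
)
# 999 is absent from the index, mimicking an ID removed by an earlier step.
unrelated_ids_demo = [102, 104, 999]

before = len(df_demo)
# Labelled IDs that were already missing before this filter ran.
already_gone = [i for i in unrelated_ids_demo if i not in df_demo.index]

df_demo = df_demo[~df_demo.index.isin(unrelated_ids_demo)]

removed = before - len(df_demo)
# Number of labelled IDs still present after filtering; should be zero.
remaining = int(df_demo.index.isin(unrelated_ids_demo).sum())
print(removed, len(already_gone), remaining)
```

Here `removed` plus `len(already_gone)` should equal the number of labelled IDs, which is the same bookkeeping as the note about tweets removed by earlier steps.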


In [25]:
unrelated_users = ['AbsenceDirect', 'ADOTRADIO', 'AlphaPhi_MB', 'AzorTheFrog', 'BrainsFirst2', 'chocolateprofit', 
                   'CMEAA', 'comedy_awkward', 'decidesoftware', 'DMCSoftware', 'dmgmfmcm', 'DogThunder', 'DoolAbdullahi', 
                   'drmohram', 'Falloutshelter7', 'FoolsRecords', 'gokartpartysm', 'halalkitty1', 'Holeseat1', 'hopenothate', 
                   'IBMandInfor', 'ImperiumGG', 'infor_brasil', 'InforLatam', 'ISIL_SATAN', 'ISILEXPOSED', 'JaydaFransenBFF', 
                   'jen_mcmahon', 'kaetorres', 'mandybaggot', 'MaxBateman', 'newsreviews121', 'orang3belas', 'POYNTE', 
                   'QuantumAction13', 'RachidELAIDI', 'Reebok', 'RichardAndrews_', 'Saleem_Hafeezz', 'scifit_org', 
                   'ShafaqatAliK', 'shanaya183', 'ShawlandsPS', 'shawnfranssens', 'shellstinyworld', 'steve4good', 
                   'stuffingcock', 'Tahir_butt_LHR', 'TamanTarik', 'thecamdendaily', 'TheEdgeKville', 'TheMMAMasters',
                   'therealrafaqat', 'travis10brink', 'truckerbooman', 'weapon83', 'zadrojustin', 'Zebadee2']
print(len(unrelated_users))
unrelated_user_tweets = df[df.user_screen_name.isin(unrelated_users)]
unrelated_user_tweets.index

58


Int64Index([720874043721666561, 720885927539712000, 720928047130656768,
            720986283242295297, 720993055990550529, 721021159316320257,
            721032579244285953, 721045755809636352, 721084219783569408,
            721142527852691458,
            ...
            745743498855387136, 745745102493069312, 745746770219700225,
            745747432064098304, 745749465387143169, 745750945317916672,
            745751206916653057, 745751568046243840, 745752743214669824,
            741376064836997120],
           dtype='int64', length=7013)

In [28]:
df = df[~df.index.isin(unrelated_user_tweets.index)]
df.describe()

Unnamed: 0,days_before_ref,account_age,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count,rt_count,like_count
count,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,745353.0,279588.0
mean,17.64,3.91,4592.68,1536.69,40567.06,3878.38,2.56,0.3,0.35,0.25,4.2,8.97
std,17.14,2.59,74469.08,6380.75,179784.98,10313.14,1.72,0.67,0.49,0.51,47.9,86.42
min,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,1.34,111.0,161.0,1046.0,58.0,1.0,0.0,0.0,0.0,0.0,0.0
50%,13.0,4.26,475.0,536.0,4545.0,517.0,2.0,0.0,0.0,0.0,0.0,1.0
75%,27.0,6.15,1782.0,1597.0,16314.0,2864.0,3.0,0.0,1.0,0.0,1.0,5.0
max,69.0,9.95,22752666.0,1568791.0,6206247.0,390609.0,21.0,10.0,4.0,4.0,14258.0,18107.0


Remove all tweets from source Paper.li, as these are automated news digests. All manually identified spam/unrelated tweets that represent news digests are from this source.

In [30]:
news_digests = df[df.source == 'Paper.li']
news_digests.index

Int64Index([720968774497792000, 721330983711150080, 721693374533746688,
            722055748134563841, 722228104345821184, 722610115950604288,
            724752711095357440, 724924062359818241, 726349421932335104,
            727583760720535553,
            ...
            745740940363579392, 745742228056903680, 745742368402452480,
            745742987901149184, 745744666407092224, 745746693656829952,
            745750285641977856, 745752544790609921, 745752846742724608,
            736864275361333248],
           dtype='int64', length=3942)

In [32]:
df = df[~df.index.isin(news_digests.index)]
df.describe()

Unnamed: 0,days_before_ref,account_age,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count,rt_count,like_count
count,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,741411.0,278143.0
mean,17.68,3.9,4600.39,1535.25,40639.45,3890.77,2.56,0.29,0.35,0.25,4.22,9.01
std,17.17,2.59,74662.67,6392.37,180207.83,10332.97,1.72,0.66,0.48,0.51,48.03,86.64
min,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,1.33,110.0,160.0,1038.0,59.0,1.0,0.0,0.0,0.0,0.0,0.0
50%,13.0,4.25,471.0,533.0,4525.0,521.0,2.0,0.0,0.0,0.0,0.0,1.0
75%,27.0,6.13,1778.0,1593.0,16257.0,2882.0,3.0,0.0,1.0,0.0,1.0,5.0
max,69.0,9.95,22752666.0,1568791.0,6206247.0,390609.0,21.0,10.0,4.0,4.0,14258.0,18107.0
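
One way to see which posting clients dominate, and to filter one out, is sketched below on made-up data. The frame `df_demo` is hypothetical; only the `source` column name and the `'Paper.li'` value come from the notebook.

```python
import pandas as pd

# Made-up sample of the `source` column.
df_demo = pd.DataFrame(
    {
        "source": [
            "Twitter Web Client",
            "Paper.li",
            "TweetDeck",
            "Paper.li",
            "Twitter Web Client",
            "Twitter Web Client",
        ]
    }
)

# Tally tweets per posting client, most frequent first.
source_counts = df_demo["source"].value_counts()
print(source_counts)

# Boolean mask for the digest source, then keep everything else.
digest_mask = df_demo["source"] == "Paper.li"
df_clean = df_demo[~digest_mask]
```

Filtering on the column directly gives the same result as collecting the index first and then using `isin`, as long as the index labels are unique.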


Memory footprint and datatypes.

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2598577 entries, 715194383742672897 to 745829881267822592
Data columns (total 30 columns):
created_at               datetime64[ns]
account_age              object
source                   object
truncated                object
in_reply_to_status_id    object
in_reply_to_user_id      object
lang                     object
coordinates              object
tweet_url                object
text                     object
user_name                object
user_screen_name         object
user_description         object
user_verified            bool
user_followers_count     int64
user_friends_count       int64
user_created_at          object
user_location            object
user_time_zone           object
user_statuses_count      int64
user_favourites_count    int64
hashtags                 object
hashtags_count           int64
mentions                 object
mentions_count           int64
urls                     object
urls_count               in

### Modify/Create Temporal Variables and Binary Variables

Add one hour to created_at: the Twitter API reports times in UTC (+0000), while the UK was on British Summer Time (UTC+1) throughout the period covered here.

In [17]:
import datetime as dt
import calendar

df['created_at'] = df['created_at'] + dt.timedelta(hours=1)
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
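
This warning is most likely raised because `df` was produced by boolean filtering in earlier cells, so pandas cannot tell whether the assignment targets a view or a copy. A common remedy, assuming the filtered frame is meant to be independent of the original, is to call `.copy()` on the filtered result. The sketch below uses made-up data; `raw` and `filtered` are hypothetical names.

```python
import datetime as dt

import pandas as pd

# Made-up frame standing in for the loaded tweets.
raw = pd.DataFrame(
    {
        "created_at": pd.to_datetime(
            ["2016-03-30 15:10:03", "2016-04-15 04:02:43"]
        ),
        "source": ["Mini Mantras", "Paper.li"],
    }
)

# .copy() makes the filtered frame an independent object,
# so the column assignment below is unambiguous.
filtered = raw[raw["source"] != "Paper.li"].copy()
filtered["created_at"] = filtered["created_at"] + dt.timedelta(hours=1)
print(filtered)
```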


Unnamed: 0,created_at,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
715194383742672897,2016-03-30 16:10:03,3.21,Mini Mantras,False,,,en,,https://www.twitter.com/ProWhites/status/71519...,DIVERSITY is never imposed on non-White areas....,Pro-Whites,ProWhites,Spreading the irrefutable truth about #WhiteGe...,False,1551,832,2013-01-15,White Country,Pacific Time (US & Canada),3056,490,"[whitegenocide, brexit]",2,[],0,[],0,[pic.twitter.com/pvS14tjlm1],1,[photo]
715194868612595714,2016-03-30 16:11:59,0.97,Twitter for Android,False,715185221294010371,3342929843,en,,https://www.twitter.com/WSBet23/status/7151948...,@StrongerIn @standardnews I am sure London has...,23.06.16 UKIndepDay,WSBet23,The establishment and other profiteers of EU w...,False,6,148,2015-04-11,,,212,910,[],0,"[StrongerIn, standardnews]",2,[],0,[],0,[]
715194899453362176,2016-03-30 16:12:06,0.38,Twitter for iPhone,False,715151830020399105,3342929843,en,,https://www.twitter.com/MockLabour/status/7151...,"@StrongerIn no thanks, I'm voting Leave becaus...",BHAFC,MockLabour,"Exposing Labour. the racism, misogyny, homopho...",False,1068,1038,2015-11-14,Fabulous Sussex,,18910,273,[brexit],1,[StrongerIn],1,[],0,[],0,[]
715195086158606336,2016-03-30 16:12:50,7.05,TweetDeck,False,,,en,,https://www.twitter.com/wdjstraw/status/715195...,".@vote_leave bosses support NHS privatisation,...",Will Straw,wdjstraw,Executive Director of Britain Stronger in Euro...,False,28635,1621,2009-03-14,United Kingdom,London,10634,2,"[nhs, strongerin]",2,[vote_leave],1,[],0,[pic.twitter.com/a4twQRDyOh],1,[photo]
715195159491817472,2016-03-30 16:13:08,7.25,Buffer,False,,,en,,https://www.twitter.com/jeremyghez/status/7151...,Is this because they think a Brexit is now lik...,jeremyghez,jeremyghez,Professor of Economics and International Affai...,False,325,888,2008-12-31,Paris,Athens,3072,131,"[bankofamerica, brexit]",2,[],0,[https://t.co/MLrgdRlIWL],1,[],0,[]
715195401549320192,2016-03-30 16:14:06,4.68,TweetDeck,False,,3342929843,en,,https://www.twitter.com/BurgisBullock/status/7...,@StrongerIn speaking at #EU #referendum debate...,Burgis Bullock,BurgisBullock,"Accountants with a personal, partner led servi...",False,423,81,2011-07-27,"Warwickshire, UK",London,3590,53,"[eu, referendum]",2,"[StrongerIn, leamingtonspa]",2,[https://t.co/aXDwMy3TPI],1,[],0,[]
715195410185330688,2016-03-30 16:14:08,2.39,Twitter Web Client,False,,,en,,https://www.twitter.com/TomCalver2/status/7151...,German satirical show ZDF Heute-Show pokes fun...,Tom Calver,TomCalver2,Turtle-neck activist & freelance investigative...,False,1362,1280,2013-11-07,"London, England",,733,198,[brexit],1,[],0,[https://t.co/p9jn8Qzyby],1,[],0,[]
715195487838715905,2016-03-30 16:14:26,4.61,TweetDeck,False,715193232460804096,4904820549,en,,https://www.twitter.com/Langworthy_47/status/7...,@Xians4EU And Turkey has almost as many placem...,Sam Wide §50,Langworthy_47,EU's subsidiarity is more honoured in the brea...,False,171,674,2011-08-21,British Isles and Axarquía,Madrid,5978,70,"[strongerin, brexit, euref]",3,[Xians4EU],1,[],0,[],0,[]
715195543404822530,2016-03-30 16:14:39,3.55,Twitter Web Client,False,,,en,,https://www.twitter.com/GovUCC/status/71519554...,Check out contributions by our own @MaryCMurph...,GovUCC,GovUCC,"News, events and updates from the Department o...",False,389,65,2012-09-11,UCC,Dublin,1671,7,[brexit],1,"[MaryCMurphy, IrishTimes]",2,[https://t.co/PhE0WG8akR],1,[],0,[]
715195614351507463,2016-03-30 16:14:56,2.14,Twitter Web Client,False,715192021766512643,715173859486466049,en,,https://www.twitter.com/foto2021/status/715195...,@Steel4GO @Grassroots_Out EU membership better...,foto2021,foto2021,We have the character of an island nation: ind...,False,139,226,2014-02-08,"Gloucestershire, England",Pacific Time (US & Canada),12812,3954,[],0,"[Steel4GO, Grassroots_Out]",2,[],0,[],0,[]
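
A fixed one-hour shift is only correct while the UK is on summer time. As an alternative sketch (not what the notebook does), the timestamps can be localised to UTC and converted to the `Europe/London` zone, which handles the clock changes automatically. The sample timestamps are made up, except the first, which matches the first tweet shown above.

```python
import pandas as pd

# Naive timestamps as delivered by the API, understood to be UTC.
created_utc = pd.Series(
    pd.to_datetime(["2016-03-30 15:10:03", "2016-01-15 12:00:00"])
)

# Attach the UTC zone, then convert to UK local time.
local = created_utc.dt.tz_localize("UTC").dt.tz_convert("Europe/London")

# Drop the zone again to keep naive local wall-clock times.
local_naive = local.dt.tz_localize(None)
print(local_naive)
```

The March timestamp moves forward by one hour, while the January one is unchanged because the UK is on GMT in winter.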


Add date and time as separate columns, based upon created_at. Insert them as the second and third columns of the dataframe.

In [18]:
df.insert(1, 'date', df['created_at'].dt.date)
df.insert(2, 'time', df['created_at'].dt.time)
df

Unnamed: 0,created_at,date,time,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
715194383742672897,2016-03-30 16:10:03,2016-03-30,16:10:03,3.21,Mini Mantras,False,,,en,,https://www.twitter.com/ProWhites/status/71519...,DIVERSITY is never imposed on non-White areas....,Pro-Whites,ProWhites,Spreading the irrefutable truth about #WhiteGe...,False,1551,832,2013-01-15,White Country,Pacific Time (US & Canada),3056,490,"[whitegenocide, brexit]",2,[],0,[],0,[pic.twitter.com/pvS14tjlm1],1,[photo]
715194868612595714,2016-03-30 16:11:59,2016-03-30,16:11:59,0.97,Twitter for Android,False,715185221294010371,3342929843,en,,https://www.twitter.com/WSBet23/status/7151948...,@StrongerIn @standardnews I am sure London has...,23.06.16 UKIndepDay,WSBet23,The establishment and other profiteers of EU w...,False,6,148,2015-04-11,,,212,910,[],0,"[StrongerIn, standardnews]",2,[],0,[],0,[]
715194899453362176,2016-03-30 16:12:06,2016-03-30,16:12:06,0.38,Twitter for iPhone,False,715151830020399105,3342929843,en,,https://www.twitter.com/MockLabour/status/7151...,"@StrongerIn no thanks, I'm voting Leave becaus...",BHAFC,MockLabour,"Exposing Labour. the racism, misogyny, homopho...",False,1068,1038,2015-11-14,Fabulous Sussex,,18910,273,[brexit],1,[StrongerIn],1,[],0,[],0,[]
715195086158606336,2016-03-30 16:12:50,2016-03-30,16:12:50,7.05,TweetDeck,False,,,en,,https://www.twitter.com/wdjstraw/status/715195...,".@vote_leave bosses support NHS privatisation,...",Will Straw,wdjstraw,Executive Director of Britain Stronger in Euro...,False,28635,1621,2009-03-14,United Kingdom,London,10634,2,"[nhs, strongerin]",2,[vote_leave],1,[],0,[pic.twitter.com/a4twQRDyOh],1,[photo]
715195159491817472,2016-03-30 16:13:08,2016-03-30,16:13:08,7.25,Buffer,False,,,en,,https://www.twitter.com/jeremyghez/status/7151...,Is this because they think a Brexit is now lik...,jeremyghez,jeremyghez,Professor of Economics and International Affai...,False,325,888,2008-12-31,Paris,Athens,3072,131,"[bankofamerica, brexit]",2,[],0,[https://t.co/MLrgdRlIWL],1,[],0,[]
715195401549320192,2016-03-30 16:14:06,2016-03-30,16:14:06,4.68,TweetDeck,False,,3342929843,en,,https://www.twitter.com/BurgisBullock/status/7...,@StrongerIn speaking at #EU #referendum debate...,Burgis Bullock,BurgisBullock,"Accountants with a personal, partner led servi...",False,423,81,2011-07-27,"Warwickshire, UK",London,3590,53,"[eu, referendum]",2,"[StrongerIn, leamingtonspa]",2,[https://t.co/aXDwMy3TPI],1,[],0,[]
715195410185330688,2016-03-30 16:14:08,2016-03-30,16:14:08,2.39,Twitter Web Client,False,,,en,,https://www.twitter.com/TomCalver2/status/7151...,German satirical show ZDF Heute-Show pokes fun...,Tom Calver,TomCalver2,Turtle-neck activist & freelance investigative...,False,1362,1280,2013-11-07,"London, England",,733,198,[brexit],1,[],0,[https://t.co/p9jn8Qzyby],1,[],0,[]
715195487838715905,2016-03-30 16:14:26,2016-03-30,16:14:26,4.61,TweetDeck,False,715193232460804096,4904820549,en,,https://www.twitter.com/Langworthy_47/status/7...,@Xians4EU And Turkey has almost as many placem...,Sam Wide §50,Langworthy_47,EU's subsidiarity is more honoured in the brea...,False,171,674,2011-08-21,British Isles and Axarquía,Madrid,5978,70,"[strongerin, brexit, euref]",3,[Xians4EU],1,[],0,[],0,[]
715195543404822530,2016-03-30 16:14:39,2016-03-30,16:14:39,3.55,Twitter Web Client,False,,,en,,https://www.twitter.com/GovUCC/status/71519554...,Check out contributions by our own @MaryCMurph...,GovUCC,GovUCC,"News, events and updates from the Department o...",False,389,65,2012-09-11,UCC,Dublin,1671,7,[brexit],1,"[MaryCMurphy, IrishTimes]",2,[https://t.co/PhE0WG8akR],1,[],0,[]
715195614351507463,2016-03-30 16:14:56,2016-03-30,16:14:56,2.14,Twitter Web Client,False,715192021766512643,715173859486466049,en,,https://www.twitter.com/foto2021/status/715195...,@Steel4GO @Grassroots_Out EU membership better...,foto2021,foto2021,We have the character of an island nation: ind...,False,139,226,2014-02-08,"Gloucestershire, England",Pacific Time (US & Canada),12812,3954,[],0,"[Steel4GO, Grassroots_Out]",2,[],0,[],0,[]


Filter out tweets from 30 March and 23 June 2016, after converting created_at to a date string.

In [19]:
# df.date.value_counts() # Reveals that 1,076 tweets are from 30 March and 21,326 are from 23 June.

df = df[~df['created_at'].dt.strftime('%Y-%m-%d').isin(['2016-03-30', '2016-06-23'])]
df

Unnamed: 0,created_at,date,time,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,05:02:43,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,05:03:40,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,05:04:26,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,05:04:34,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,05:05:11,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,05:06:48,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,05:06:50,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,05:07:06,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,05:07:12,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,05:07:12,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,[pic.twitter.com/zgVlzxXdSa],1,[photo]
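
The per-day tally mentioned in the commented-out `value_counts()` line, followed by the date filter, can be sketched as follows. The timestamps are made up and `df_demo` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Made-up local timestamps spanning the two excluded days.
created = pd.to_datetime(
    [
        "2016-03-30 23:50:00",
        "2016-03-31 00:10:00",
        "2016-03-31 09:00:00",
        "2016-06-22 12:00:00",
        "2016-06-23 00:30:00",
    ]
)
df_demo = pd.DataFrame({"created_at": created})

# Number of tweets per calendar day, in chronological order.
per_day = df_demo["created_at"].dt.date.value_counts().sort_index()
print(per_day)

# Drop every tweet whose calendar day is in the excluded list.
excluded = ["2016-03-30", "2016-06-23"]
day_str = df_demo["created_at"].dt.strftime("%Y-%m-%d")
kept = df_demo[~day_str.isin(excluded)]
```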


Add day of week (text and numeric) as new columns. date.weekday() returns the day of the week as an integer, where Monday is 0 and Sunday is 6.

In [20]:
def day_numeric(timestamp):
    # Monday is 0 and Sunday is 6.
    return timestamp.weekday()

def day(timestamp):
    # Look up the day name for the weekday index.
    return calendar.day_name[timestamp.weekday()]

df.insert(2, 'day_numeric', df.created_at.apply(day_numeric))
df.insert(3, 'day', df.created_at.apply(day))
df

Unnamed: 0,created_at,date,day_numeric,day,time,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,[pic.twitter.com/zgVlzxXdSa],1,[photo]
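
On a frame of this size, row-wise `apply` can be slow. A vectorised alternative, sketched here on made-up timestamps and assuming a pandas version that provides `Series.dt.day_name()` (0.23 or later), gives the same two columns.

```python
import pandas as pd

# Made-up timestamps; the first matches a row shown above (a Friday).
created = pd.Series(
    pd.to_datetime(
        ["2016-04-15 05:02:43", "2016-04-17 10:00:00", "2016-04-18 23:59:59"]
    )
)

# Integer weekday, Monday is 0 and Sunday is 6.
day_numeric = created.dt.weekday
# Day name as text, computed for the whole column at once.
day_name = created.dt.day_name()
print(day_numeric.tolist(), day_name.tolist())
```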


Create an hour column by rounding created_at to the nearest hour, so a tweet posted at 05:45 is assigned hour 06. Then format it as a two-digit string.

In [21]:
df.insert(5, 'hour', df.created_at.dt.round('60min'))
df.hour = df.hour.dt.strftime('%H')
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value


Unnamed: 0,created_at,date,day_numeric,day,time,hour,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,[pic.twitter.com/zgVlzxXdSa],1,[photo]


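Create an hour_binned column that groups the time of day into four categories: small hours (01:00 to 05:59), morning (06:00 to 11:59), afternoon (12:00 to 17:59), and night (18:00 to 00:59).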

In [22]:
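import datetime as dt  # needed for dt.time; harmless if already imported earlier

def bin_hours(time):
    """Map a datetime.time to one of four parts of the day."""
    if dt.time(hour=1) <= time < dt.time(hour=6):
        return "Small hours"
    elif dt.time(hour=6) <= time < dt.time(hour=12):
        return "Morning"
    elif dt.time(hour=12) <= time < dt.time(hour=18):
        return "Afternoon"
    else:
        # Everything else: 18:00 through 00:59.
        return "Night"

# Insert the new column at position 6, directly after 'hour'.
df.insert(6, 'hour_binned', df['time'].apply(bin_hours))
df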

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,[pic.twitter.com/zgVlzxXdSa],1,[photo]
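
The bin boundaries can be checked in isolation. The sketch below is a minimal, self-contained variant that bins on the integer hour instead of comparing `datetime.time` objects; the name `bin_hour_of_day` and the sample times are illustrative assumptions, not part of the notebook. It should produce the same labels as `bin_hours`, and it makes one detail visible: times from 00:00 to 00:59 land in "Night", because "Small hours" only starts at 01:00.

```python
import datetime as dt

def bin_hour_of_day(time):
    """Hypothetical helper: same four bins as bin_hours, keyed on the hour."""
    hour = time.hour
    if 1 <= hour < 6:
        return "Small hours"
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    return "Night"  # 18:00 through 00:59

# Sample times chosen to sit on or next to each boundary.
samples = [
    dt.time(0, 30), dt.time(1, 0), dt.time(5, 59, 59), dt.time(6, 0),
    dt.time(12, 0), dt.time(17, 59), dt.time(18, 0), dt.time(23, 59),
]
labels = [bin_hour_of_day(t) for t in samples]

for t, label in zip(samples, labels):
    print(t, label)
```

If the first hour after midnight should count as "Small hours" instead, the first condition would need to start at hour 0; whether that is desirable depends on the analysis.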


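Create a days_before_ref column by subtracting each tweet's timestamp from midnight on 23 June 2016, the date of the referendum. One is added because the .days attribute of a timedelta counts only whole days and discards the remaining hours, so a tweet sent partway through a day would otherwise be counted one day short.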

In [23]:
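def days_before_referendum(date):
    # dt.datetime(2016, 6, 23) is midnight at the start of referendum day.
    # .days keeps only whole days, so add 1 to count the partial day.
    return (dt.datetime(2016, 6, 23) - date).days + 1

# Insert the new column at position 7, directly after 'hour_binned'.
df.insert(7, 'days_before_ref', df['created_at'].apply(days_before_referendum))
df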

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,[pic.twitter.com/zgVlzxXdSa],1,[photo]
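
To make the "+ 1" concrete, here is the arithmetic for the first row of the table above, written with the standard library only. The function names and the calendar-date alternative are illustrative assumptions, not code from the notebook. The sketch also shows one edge case worth knowing about: a tweet sent at exactly midnight has no partial day to discard, so the "+ 1" approach counts it one day higher than a plain calendar-date difference would.

```python
import datetime as dt

REFERENDUM = dt.datetime(2016, 6, 23)  # midnight at the start of referendum day

def days_before_timestamp(created_at):
    # Approach used in the notebook: whole days of the timedelta, plus one.
    return (REFERENDUM - created_at).days + 1

def days_before_calendar(created_at):
    # Hypothetical alternative: compare calendar dates and ignore the time.
    return (REFERENDUM.date() - created_at.date()).days

tweet = dt.datetime(2016, 4, 15, 5, 2, 43)  # first row of the table above
midnight = dt.datetime(2016, 4, 15, 0, 0)   # edge case: exactly midnight

gap = REFERENDUM - tweet
print(gap)                              # 68 days, 18:57:17
print(days_before_timestamp(tweet))     # 69
print(days_before_calendar(tweet))      # 69
print(days_before_timestamp(midnight))  # 70
print(days_before_calendar(midnight))   # 69
```

The two approaches agree for every timestamp except those falling exactly on midnight, which are likely rare in tweet data.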


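Add binary (0/1) versions of the URL count and media type count columns, indicating whether a tweet contains at least one URL or at least one media item.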

In [24]:
df.insert(34, 'urls_count_binary', np.where(df['urls_count'] > 0, 1, 0))
df

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,[photo]


In [25]:
df.insert(37, 'media_type_count_binary', np.where(df['media_type_count'] > 0, 1, 0))
df

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo]
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif]
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[]
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[]
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif]
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo]
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[]
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo]
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[]
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo]
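
The two cells above hard-code the insert positions (34 and 37), which would need updating if columns were added or removed upstream. A minimal sketch of an alternative, on a made-up toy frame: look up the position of the source column with `columns.get_loc` and insert the indicator right after it. The `toy` data is an assumption for illustration; `(series > 0).astype(int)` is used here in place of `np.where(series > 0, 1, 0)` and should give the same 0/1 values.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real tweet frame.
toy = pd.DataFrame({"urls_count": [1, 0, 3], "media_type_count": [0, 0, 2]})

for col in ["urls_count", "media_type_count"]:
    indicator = (toy[col] > 0).astype(int)  # 1 if the count is positive, else 0
    position = toy.columns.get_loc(col) + 1  # slot directly after the source column
    toy.insert(position, col + "_binary", indicator)

print(toy)
```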


### Extract Media Type into Three Columns (Photo, Video, GIF)

Double-check which unique media types occur in the data.

In [26]:
# Flatten the per-tweet lists of media types and collect the distinct values.
media_items = []

for media_types in df['media_type'].values:
    for media_item in media_types:
        media_items.append(media_item)

set(media_items)

{'animated_gif', 'photo', 'video'}
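
The same check can be written without an explicit loop. A minimal sketch on a toy frame (the data is made up for illustration), assuming pandas 0.25 or later, where `Series.explode` is available: `explode` turns each list element into its own row, and empty lists become NaN, which `dropna` removes.

```python
import pandas as pd

# Toy column of per-tweet media type lists, including an empty list.
toy = pd.DataFrame(
    {"media_type": [["photo"], [], ["animated_gif"], ["photo", "video"]]}
)

exploded = toy["media_type"].explode().dropna()
unique_types = set(exploded)
counts = exploded.value_counts()  # how often each media type occurs

print(unique_types)
print(counts)
```

The `value_counts` call is an optional extra that also shows how common each type is.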

In [27]:
df['photo'] = np.where(['photo' in i for i in df['media_type'].values], 1, 0)
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo],1
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif],0
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[],0
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[],0
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif],0
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo],1
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[],0
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo],1
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[],0
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo],1
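
The warning printed above typically appears when the frame being modified was itself produced by filtering or slicing another DataFrame. That earlier step is not shown in this section, so the cause is an assumption here. A minimal sketch of one common remedy, on made-up data: take an explicit `.copy()` right after filtering so the subset is an independent frame, then add the indicator columns. The filter on `lang` and the mapping `media_columns` are hypothetical; the loop is simply a compact way to build the photo, video, and gif columns in one pass.

```python
import pandas as pd

# Toy stand-in for the full tweet frame.
full = pd.DataFrame({
    "lang": ["en", "en", "fr", "en"],
    "media_type": [["photo"], [], ["video"], ["animated_gif", "photo"]],
})

# Hypothetical filtering step; .copy() makes the subset independent of `full`.
toy = full[full["lang"] == "en"].copy()

# New column name -> media type string to look for in each tweet's list.
media_columns = {"photo": "photo", "video": "video", "gif": "animated_gif"}

for column, media_type in media_columns.items():
    toy[column] = [int(media_type in types) for types in toy["media_type"]]

print(toy)
```

In the notebook itself the `.copy()` would belong at the point where `df` was first derived from the larger frame, not in the cells that add the columns.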


In [28]:
df['video'] = np.where(['video' in i for i in df['media_type'].values], 1, 0)
df


Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo],1,0
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif],0,0
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[],0,0
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[],0,0
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif],0,0
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo],1,0
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[],0,0
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo],1,0
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[],0,0
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo],1,0


In [29]:
df['gif'] = np.where(['animated_gif' in i for i in df['media_type'].values], 1, 0)
df

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,truncated,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video,gif
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,False,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo],1,0,0
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif],0,0,1
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,False,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[],0,0,0
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,False,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[],0,0,0
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,False,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif],0,0,1
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo],1,0,0
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,False,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[],0,0,0
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo],1,0,0
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,False,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[],0,0,0
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,False,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo],1,0,0
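
The `SettingWithCopyWarning` printed above usually means that `df` was itself created by slicing another dataframe earlier on. A minimal sketch of one common way to avoid it, assuming a small stand-in dataframe (`raw`, `subset`, and their values are hypothetical, not the notebook's data): take an explicit `.copy()` of the slice before adding columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tweets dataframe.
raw = pd.DataFrame({
    'lang': ['en', 'en', 'fr'],
    'media_type': [['photo'], ['animated_gif'], []],
})

# An explicit copy makes the subset independent of `raw`,
# so later column assignments do not trigger the warning.
subset = raw[raw['lang'] == 'en'].copy()

# Same logic as the cell above: 1 if the list contains 'animated_gif', else 0.
subset['gif'] = np.where(['animated_gif' in m for m in subset['media_type']], 1, 0)
print(subset)
```

Whether a copy is appropriate depends on how `df` was built; for a dataframe of this size, the extra memory may matter.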


### Inspect and Remove Uninteresting Columns

None of the tweets are truncated, so this column can be removed.

In [30]:
df.truncated.unique()

array([False], dtype=object)

In [31]:
del df['truncated']
df

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,in_reply_to_status_id,in_reply_to_user_id,lang,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video,gif
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,,,en,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo],1,0,0
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,,,en,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif],0,0,1
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,,,en,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[],0,0,0
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,707714714042822657,3241779670,en,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[],0,0,0
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,,,en,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif],0,0,1
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo],1,0,0
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,,,en,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[],0,0,0
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,,,en,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo],1,0,0
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,,,en,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[],0,0,0
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,,,en,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo],1,0,0


All of the tweets are in English, so the lang column can be removed.

In [32]:
df.lang.unique()

array(['en'], dtype=object)

In [33]:
del df['lang']
df

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,in_reply_to_status_id,in_reply_to_user_id,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video,gif
720824650373029889,2016-04-15 05:02:43,2016-04-15,4,Friday,05:02:43,05,Small hours,69,0.22,Twitter Web Client,,,,https://www.twitter.com/SJDelahunty72/status/7...,Referendum Party: Election Video https://t.co...,Steven J Delahunty,SJDelahunty72,@TrentUni Economics & Ex @RoyalAirForce (joine...,False,88,15,2016-01-23,"Nottingham, England",,9093,2312,"[bbcbreakfast, gmb, euref, bbcqt]",4,[YouTube],1,[https://t.co/0yCnmwO7w7],1,1,[pic.twitter.com/CDAM1jMzeZ],1,1,[photo],1,0,0
720824890891218946,2016-04-15 05:03:40,2016-04-15,4,Friday,05:03:40,05,Small hours,69,1.24,Twitter for Android,,,,https://www.twitter.com/BritinQaf/status/72082...,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4543,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/s4SOtpA8ya],1,1,[animated_gif],0,0,1
720825083023925248,2016-04-15 05:04:26,2016-04-15,4,Friday,05:04:26,05,Small hours,69,1.21,Twitter for iPhone,,,,https://www.twitter.com/gabididit/status/72082...,#DemDebate great and all but still waiting for...,Gabrielle Belli,gabididit,M.A. student researching and writing about acr...,False,61,179,2015-01-30,"New York, NY",Eastern Time (US & Canada),172,159,"[demdebate, drones, abortion, globalsouth, ref...",6,[],0,[],0,0,[],0,0,[],0,0,0
720825114116231168,2016-04-15 05:04:34,2016-04-15,4,Friday,05:04:34,05,Small hours,69,6.26,Twitter Web Client,707714714042822657,3241779670,,https://www.twitter.com/DrAlfOldman/status/720...,@BrexitWatch This is scary but not surprising....,Alf Oldman,DrAlfOldman,"Blogs about politics, people & travel. Expert ...",False,4493,136,2010-01-11,"Latchi, Cyprus",Athens,30298,319,[],0,[BrexitWatch],1,[],0,0,[],0,0,[],0,0,0
720825272447021057,2016-04-15 05:05:11,2016-04-15,4,Friday,05:05:11,05,Small hours,69,1.24,Twitter for Android,,,,https://www.twitter.com/BritinQaf/status/72082...,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤,BritinQaf,"Argentina fans who love Brian and Justin , thi...",False,1057,206,2015-01-19,,,4545,2580,"[britin, qaf]",2,[],0,[],0,0,[pic.twitter.com/8Tf8FDVUnh],1,1,[animated_gif],0,0,1
720825677277061120,2016-04-15 05:06:48,2016-04-15,4,Friday,05:06:48,05,Small hours,69,8.47,Tweet Jukebox,,,,https://www.twitter.com/DAILYSQUIB/status/7208...,#VoteLeave and Austerity Will End https://t.co...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3442,5,"[voteleave, eureferendum, euref, remain]",4,[],0,[https://t.co/OiGzHH50f6],1,1,[pic.twitter.com/xRPciOzi6Z],1,1,[photo],1,0,0
720825685481119744,2016-04-15 05:06:50,2016-04-15,4,Friday,05:06:50,05,Small hours,69,8.47,Tweet Jukebox,,,,https://www.twitter.com/DAILYSQUIB/status/7208...,#Brexit #Remain #eureferendum BREXIT: Volcanoe...,Daily Squib News,DAILYSQUIB,CAUTION! The Daily Squib can be hazardous to y...,False,104369,2821,2007-10-26,All major cities worldwide,London,3443,5,"[brexit, remain, eureferendum]",3,[],0,[https://t.co/0pSeyDUbNm],1,1,[],0,0,[],0,0,0
720825751310716930,2016-04-15 05:07:06,2016-04-15,4,Friday,05:07:06,05,Small hours,69,0.06,Tweet Jukebox,,,,https://www.twitter.com/RemainShame/status/720...,Do you want live in EU dictatorship? https://t...,BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,715,20,"[eureferendum, brexit, remain, strongerin, inc...",5,[],0,[https://t.co/A2E0AHAZpj],1,1,[pic.twitter.com/N4fL7ko0ye],1,1,[photo],1,0,0
720825778351378433,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.06,Tweet Jukebox,,,,https://www.twitter.com/RemainShame/status/720...,"All the great things are simple, and many can ...",BrexitNeverSurrender,RemainShame,The Spirit of a Great Man Lives On,False,74,60,2016-03-22,"Kent, England",,716,20,"[brexit, euref]",2,[],0,[],0,0,[],0,0,[],0,0,0
720825780058521600,2016-04-15 05:07:12,2016-04-15,4,Friday,05:07:12,05,Small hours,69,0.03,Tweet Jukebox,,,,https://www.twitter.com/EU_Failed/status/72082...,Selling Britain Off On the Cheap is Good Says ...,EUfailed,EU_Failed,,False,26,20,2016-04-02,,,78,1,"[eu, eureferendum, euref, brexit, voteleave]",5,[],0,[https://t.co/iXoWvXz3ok],1,1,[pic.twitter.com/zgVlzxXdSa],1,1,[photo],1,0,0
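
The two checks above inspect one column at a time. As a rough sketch, constant columns could also be found in a single pass; the dataframe `toy` and its values are hypothetical. Because columns such as `hashtags` hold lists (which are unhashable), the sketch compares string representations, which is an assumption that may not suit every column.

```python
import pandas as pd

# Hypothetical stand-in data; the list column mimics `hashtags`.
toy = pd.DataFrame({
    'truncated': [False, False, False],
    'lang': ['en', 'en', 'en'],
    'source': ['Twitter Web Client', 'Twitter for iPhone', 'Tweet Jukebox'],
    'hashtags': [['brexit'], [], ['euref']],
})

# Lists cannot be hashed, so count distinct string representations instead.
n_unique = toy.astype(str).nunique()
constant_cols = n_unique[n_unique == 1].index.tolist()
print(constant_cols)

# Drop every column that holds a single value.
toy = toy.drop(columns=constant_cols)
```

Note that `astype(str)` turns missing values into the string `'nan'`, so a column with one real value plus missing entries would not be flagged as constant.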


Only 6019 tweets contain coordinates (0.23% of entire corpus).

In [34]:
df[df.coordinates.notnull()]

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,in_reply_to_status_id,in_reply_to_user_id,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video,gif
720835033003466754,2016-04-15 05:43:58,2016-04-15,4,Friday,05:43:58,06,Small hours,69,2.84,Nottingham Trends,,,"{'type': 'Point', 'coordinates': [-1.1439, 52....",https://www.twitter.com/trendinaliaNQT/status/...,"On Thursday 14, #Brexit was Trending Topic in ...",Nottingham Trends,trendinaliaNQT,,False,203,2,2013-06-12,"Nottingham, United Kingdom",London,32505,0,"[brexit, trndnl]",2,[],0,[https://t.co/X4n3yXQSxU],1,1,[],0,0,[],0,0,0
720858313030656000,2016-04-15 07:16:29,2016-04-15,4,Friday,07:16:29,07,Morning,69,6.71,Instagram,,,"{'type': 'Point', 'coordinates': [73.71218929,...",https://www.twitter.com/OxIIID/status/72085831...,#Leave all #unnecessary and #go #further @ Man...,Жгучий,OxIIID,Red Hot Russian PepeRRRR!,False,517,1664,2009-07-30,"Russia, Perm",Ekaterinburg,15754,3090,"[leave, unnecessary, go, further]",4,[],0,[https://t.co/PMQXdDzeTM],1,1,[],0,0,[],0,0,0
720860289831317505,2016-04-15 07:24:20,2016-04-15,4,Friday,07:24:20,07,Morning,69,6.80,Path,,,"{'type': 'Point', 'coordinates': [106.82532, -...",https://www.twitter.com/LiescaLaurent/status/7...,Prep #triptoams #gogogo 🇨🇳🇨🇳🇫🇷🇫🇷🇫🇷🇳🇱🇳🇱🇳🇱 (at C...,Liesca Laurent,LiescaLaurent,Dream Big Dream!,False,399,280,2009-06-28,here,Pacific Time (US & Canada),31533,2,"[triptoams, gogogo]",2,[],0,[https://t.co/Ja8A34XQS5],1,1,[],0,0,[],0,0,0
720876879532548099,2016-04-15 08:30:15,2016-04-15,4,Friday,08:30:15,09,Morning,69,1.28,Client Twitter Web,,,"{'type': 'Point', 'coordinates': [2.2303171, 4...",https://www.twitter.com/JVincentimes/status/72...,"Still a while to go of course, but the IG's Br...",JV,JVincentimes,"Jean-Vincent, CEO. Content #Strategy and Conte...",False,867,2704,2015-01-04,"Meudon, Ile-de-France",Paris,2297,3234,[euref],1,[],0,[],0,0,[pic.twitter.com/d4tPhYaeiS],1,1,[photo],1,0,0
720923382674493440,2016-04-15 11:35:03,2016-04-15,4,Friday,11:35:03,12,Morning,69,3.17,Twitter for Android,,,"{'type': 'Point', 'coordinates': [-3.5309504, ...",https://www.twitter.com/DrPaulMiddleton/status...,"""Problematic if Scots voted more enthusiastica...",Paul Middleton,DrPaulMiddleton,Senior Lecturer in Biblical Studies at Chester...,False,1579,1964,2013-02-12,"Chester, England",,11201,502,[eureferendum],1,[],0,[],0,0,[],0,0,[],0,0,0
720925939656499200,2016-04-15 11:45:12,2016-04-15,4,Friday,11:45:12,12,Morning,69,0.76,Twitter for Windows Phone,,3362016513,"{'type': 'Point', 'coordinates': [-0.1406241, ...",https://www.twitter.com/Louisferns7/status/720...,@LeaveEUOfficial Austerity for Greeks no 🏠 for...,Louis Ferns,Louisferns7,Education is the most powerful weapon which yo...,False,14,157,2015-07-12,,,920,17,[],0,[LeaveEUOfficial],1,[],0,0,[pic.twitter.com/lsZav0ke3J],1,1,[photo],1,0,0
720940398588387328,2016-04-15 12:42:40,2016-04-15,4,Friday,12:42:40,13,Afternoon,69,2.88,Twitter for Android,,,"{'type': 'Point', 'coordinates': [-1.6479529, ...",https://www.twitter.com/DavidWoodhead26/status...,"#Brexit The plethora of downright lies,dodgy s...",David Peter Woodhead,DavidWoodhead26,"Scuba Diver, original tweet profile, Prostate-...",False,266,286,2013-05-31,Leeds,,8447,142,"[brexit, strongerin]",2,[],0,[],0,0,[],0,0,[],0,0,0
720941871808266243,2016-04-15 12:48:31,2016-04-15,4,Friday,12:48:31,13,Afternoon,69,3.17,Twitter for Android,,,"{'type': 'Point', 'coordinates': [-4.4157859, ...",https://www.twitter.com/DrPaulMiddleton/status...,"""Problematic if Scots vote more enthusiastical...",Paul Middleton,DrPaulMiddleton,Senior Lecturer in Biblical Studies at Chester...,False,1580,1964,2013-02-12,"Chester, England",,11203,502,[eureferendum],1,[],0,[],0,0,[],0,0,[],0,0,0
720944517831081984,2016-04-15 12:59:02,2016-04-15,4,Friday,12:59:02,13,Afternoon,69,4.89,Twitter for Android,720944220769554433,306432809,"{'type': 'Point', 'coordinates': [0.5445856, 5...",https://www.twitter.com/JoBeavis/status/720944...,@vickyford Looking forward to seeing you in Ch...,JOANNE BEAVIS,JoBeavis,Conservative Cllr Braintree District Council. ...,False,895,576,2011-05-27,"Gosfield, Essex.",,6539,677,"[essex, remain]",2,[vickyford],1,[],0,0,[],0,0,[],0,0,0
720946661246283777,2016-04-15 13:07:33,2016-04-15,4,Friday,13:07:33,13,Afternoon,69,2.88,Twitter for Android,,,"{'type': 'Point', 'coordinates': [-1.6479427, ...",https://www.twitter.com/DavidWoodhead26/status...,#Brexit What an inglorious trio Call me Dave.P...,David Peter Woodhead,DavidWoodhead26,"Scuba Diver, original tweet profile, Prostate-...",False,267,286,2013-05-31,Leeds,,8448,142,[brexit],1,[],0,[],0,0,[],0,0,[],0,0,0
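
The share quoted above follows from the row counts: 6019 / 2576175 ≈ 0.23%. A small sketch of how such a figure could be computed directly, using a hypothetical four-row dataframe (`toy` and the coordinate values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in: one of four rows carries a GeoJSON-style point.
toy = pd.DataFrame({
    'coordinates': [None, {'type': 'Point', 'coordinates': [0.0, 51.5]}, None, None],
})

has_coords = toy['coordinates'].notnull()
n_with_coords = int(has_coords.sum())   # number of rows with coordinates
share = has_coords.mean() * 100         # percentage of all rows
print('{} of {} rows ({:.2f}%) have coordinates'.format(n_with_coords, len(toy), share))
```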


### Modify Data Types

In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2576175 entries, 720824650373029889 to 745753228780896256
Data columns (total 40 columns):
created_at                 datetime64[ns]
date                       object
day_numeric                int64
day                        object
time                       object
hour                       object
hour_binned                object
days_before_ref            int64
account_age                object
source                     object
in_reply_to_status_id      object
in_reply_to_user_id        object
coordinates                object
tweet_url                  object
text                       object
user_name                  object
user_screen_name           object
user_description           object
user_verified              bool
user_followers_count       int64
user_friends_count         int64
user_created_at            object
user_location              object
user_time_zone             object
user_statuses_count        int64
user_fav

Change account_age to a numeric variable.

In [36]:
df.account_age = pd.to_numeric(df.account_age)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
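
`pd.to_numeric` raises an error by default if a value cannot be parsed. If the column might contain stray non-numeric strings, one option is `errors='coerce'`, sketched here on hypothetical values (the `'n/a'` entry is an assumption, not something observed in the tweets data):

```python
import pandas as pd

# Hypothetical values; 'n/a' stands in for an unparseable entry.
toy = pd.DataFrame({'account_age': ['3.21', '0.97', 'n/a']})

# errors='coerce' turns unparseable entries into NaN instead of raising.
toy['account_age'] = pd.to_numeric(toy['account_age'], errors='coerce')
print(toy['account_age'].dtype)
print(toy['account_age'].isnull().sum())
```

Checking the number of resulting NaN values afterwards shows how many entries were silently coerced.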


Change day_numeric and the newly created binary variables to the object dtype (otherwise they are treated as numeric).

In [48]:
df.day_numeric = df.day_numeric.astype('object')
df.urls_count_binary = df.urls_count_binary.astype('object')
df.media_type_count_binary = df.media_type_count_binary.astype('object')
df.photo = df.photo.astype('object')
df.video = df.video.astype('object')
df.gif = df.gif.astype('object')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[name] = value
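
The six assignments above could also be written as a single `astype` call with a column-to-dtype mapping. A minimal sketch on a hypothetical dataframe (`toy` and `flag_cols` are illustrative names):

```python
import pandas as pd

# Hypothetical stand-in with a few of the columns named above.
toy = pd.DataFrame({
    'day_numeric': [4, 4, 5],
    'photo': [1, 0, 0],
    'gif': [0, 1, 0],
    'hashtags_count': [2, 0, 1],
})

flag_cols = ['day_numeric', 'photo', 'gif']

# astype accepts a dict, so all conversions happen in one step;
# columns not listed keep their dtype.
toy = toy.astype({col: 'object' for col in flag_cols})
print(toy.dtypes)
```

Because this returns a new dataframe rather than assigning column by column, it may also sidestep the warning shown above, although that depends on how `df` was created.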


### Examine the Updated Dataframe

In [49]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2576175 entries, 720824650373029889 to 745753228780896256
Data columns (total 40 columns):
created_at                 datetime64[ns]
date                       object
day_numeric                object
day                        object
time                       object
hour                       object
hour_binned                object
days_before_ref            int64
account_age                float64
source                     object
in_reply_to_status_id      object
in_reply_to_user_id        object
coordinates                object
tweet_url                  object
text                       object
user_name                  object
user_screen_name           object
user_description           object
user_verified              bool
user_followers_count       int64
user_friends_count         int64
user_created_at            object
user_location              object
user_time_zone             object
user_statuses_count        int64
user_f

In [50]:
df.describe()

Unnamed: 0,days_before_ref,account_age,user_followers_count,user_friends_count,user_statuses_count,user_favourites_count,hashtags_count,mentions_count,urls_count,media_type_count
count,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0,2576175.0
mean,20.21,3.97,8577.35,1428.66,30185.08,4285.49,1.7,0.91,0.3,0.18
std,18.21,2.51,195168.89,5634.09,136703.75,13198.77,1.54,1.2,0.47,0.44
min,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,5.0,1.64,113.0,161.0,1249.0,80.0,1.0,0.0,0.0,0.0
50%,15.0,4.26,456.0,516.0,5147.0,588.0,1.0,0.0,0.0,0.0
75%,31.0,6.19,1612.0,1427.0,18127.5,2945.0,2.0,1.0,1.0,0.0
max,69.0,10.07,62517720.0,1570110.0,6210243.0,1229284.0,21.0,12.0,5.0,5.0


In [40]:
df.sample(10)

Unnamed: 0,created_at,date,day_numeric,day,time,hour,hour_binned,days_before_ref,account_age,source,in_reply_to_status_id,in_reply_to_user_id,coordinates,tweet_url,text,user_name,user_screen_name,user_description,user_verified,user_followers_count,user_friends_count,user_created_at,user_location,user_time_zone,user_statuses_count,user_favourites_count,hashtags,hashtags_count,mentions,mentions_count,urls,urls_count,urls_count_binary,media_urls,media_type_count,media_type_count_binary,media_type,photo,video,gif
745586892956176384,2016-06-22 12:59:02,2016-06-22,2,Wednesday,12:59:02,13,Afternoon,1,6.6,Twitter for iPhone,,,,https://www.twitter.com/OllieMcKendrick/status...,"oh good, another lengthy paragraph about #Brex...",Ollie MN,OllieMcKendrick,vaguely human/parasite type thing snapchat: om...,True,37344,1948,2009-11-16,"Edinburgh, Scotland",London,6444,7928,[brexit],1,[],0,[],0,0,[],0,0,[],0,0,0
731047942157705216,2016-05-13 10:06:26,2016-05-13,4,Friday,10:06:26,10,Morning,41,6.52,SkyNews Alerts - Breaking,,,,https://www.twitter.com/SkyNewsBreak/status/73...,International Monetary Fund says UK vote to le...,Sky News Newsdesk,SkyNewsBreak,"The latest breaking news, direct from the Sky ...",True,2107269,3,2009-11-04,"London, UK",London,34386,0,[euref],1,[],0,[],0,0,[],0,0,[],0,0,0
740447783841320960,2016-06-08 08:38:03,2016-06-08,2,Wednesday,08:38:03,9,Morning,15,0.99,Twitter Web Client,,,,https://www.twitter.com/Fooky_Slam/status/7404...,Islam seeping in through every crack. #Brexit ...,Fooky Slam,Fooky_Slam,--- A Stitch In Time; Saves Nine --- Gnostic a...,False,640,1166,2015-06-11,Eboracum,Pacific Time (US & Canada),1506,1911,[brexit],1,[],0,[https://t.co/6VrQJBjy0s],1,1,[],0,0,[],0,0,0
742455803169406976,2016-06-13 21:37:12,2016-06-13,0,Monday,21:37:12,22,Night,10,7.65,Twitter Web Client,,,,https://www.twitter.com/SusannaFlood/status/74...,"Come on UK, wake up, this isn't a game: #EUref...",SusannaFlood,SusannaFlood,Global media head at Amnesty International cur...,False,1837,704,2008-10-20,London,London,1004,68,"[eureferendum, strongerin]",2,[],0,[https://t.co/xxO9Hik850],1,1,[],0,0,[],0,0,0
741225223043883008,2016-06-10 12:07:19,2016-06-10,4,Friday,12:07:19,12,Afternoon,13,4.79,Twitter for iPad,7.411968276267662e+17,79173926.0,,https://www.twitter.com/ilsonsteve/status/7412...,@labourpress nothing to do with #brexit we alr...,stephen draper,ilsonsteve,"married with one daughter, follow dcfc and ilk...",False,147,284,2011-08-27,ilkeston,Hawaii,2255,474,[brexit],1,[labourpress],1,[],0,0,[],0,0,[],0,0,0
722468006891270144,2016-04-19 17:52:50,2016-04-19,1,Tuesday,17:52:50,18,Afternoon,65,2.03,Twitter for Android,7.224674298221117e+17,2436022038.0,,https://www.twitter.com/BIG_JFT/status/7224680...,@Nigel_Farage @LeaveEUOfficial @Grassroots_Out...,THE ANESTHETIST,BIG_JFT,Just living life to the MAX everyday. Never gi...,False,100,246,2014-04-09,,,2257,1746,[],0,"[Nigel_Farage, LeaveEUOfficial, Grassroots_Out]",3,[],0,0,[],0,0,[],0,0,0
744123599771021312,2016-06-18 12:04:25,2016-06-18,5,Saturday,12:04:25,12,Afternoon,5,2.17,Twitter Web Client,7.441211034639114e+17,1541810790.0,,https://www.twitter.com/BionicRaspberry/status...,@piecrust33 @AFrankWords metropolitan middle c...,BionicRaspberry,BionicRaspberry,"Author, Academic, Designer, Cultural Libertari...",False,266,538,2014-04-16,Jersey,London,2994,1128,[brexit],1,"[piecrust33, AFrankWords]",2,[],0,0,[],0,0,[],0,0,0
742038534513692672,2016-06-12 17:59:07,2016-06-12,6,Sunday,17:59:07,18,Afternoon,11,3.43,Twitter for Android,7.420381581084262e+17,731038483.0,,https://www.twitter.com/wride_nicholas/status/...,@MrRae1000 @Bakehouse2016 @fartelengelbert mor...,Nicholas wride,wride_nicholas,the EU is a 1950s solution to a 1930s problem....,False,191,549,2013-01-06,"Wandsworth, London",,7507,2061,[brexit],1,"[MrRae1000, Bakehouse2016, fartelengelbert]",3,[],0,0,[],0,0,[],0,0,0
740276170822815745,2016-06-07 21:16:07,2016-06-07,1,Tuesday,21:16:07,21,Night,16,7.37,Twitter for Android,,,,https://www.twitter.com/atticvs/status/7402761...,So that's it then...Cologne NEVER happened...i...,atticvs,atticvs,Fiercely English & proud..Paleoconservative......,False,2774,2797,2009-01-25,"Norwich, Norfolk.",,45158,101,"[itv, leaveeu]",2,[],0,[],0,0,[],0,0,[],0,0,0
735081350693326853,2016-05-24 13:13:45,2016-05-24,1,Tuesday,13:13:45,13,Afternoon,30,0.23,Twitter Web Client,,,,https://www.twitter.com/mrfishyfingers/status/...,#VoteLeave campaign is garbage. High profile f...,Simon Steele,mrfishyfingers,Playboy. Trillionaire philanthropist. Porn Sta...,False,243,589,2016-02-28,Not of this place.,,2945,1345,[voteleave],1,[],0,[],0,0,[],0,0,[],0,0,0


Look up information about a specific tweet.

In [41]:
tweet_id = 745587122506186752

print(df.loc[tweet_id].hour)
print(df.loc[tweet_id].hour_binned)
print(df.loc[tweet_id].day)
print(df.loc[tweet_id].days_before_ref)
print(df.loc[tweet_id].urls_count)
print(df.loc[tweet_id].urls_count_binary)
print(df.loc[tweet_id].media_type_count)
print(df.loc[tweet_id].media_type_count_binary)
df.loc[tweet_id].created_at

13
Afternoon
Wednesday
1
0
0
0
0


Timestamp('2016-06-22 12:59:57')
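
Each `df.loc[tweet_id]` call above repeats the index lookup. A sketch of fetching the row once and selecting several fields together, using a hypothetical two-row dataframe (the IDs `101` and `102` are made up):

```python
import pandas as pd

# Hypothetical stand-in indexed by made-up tweet IDs.
toy = pd.DataFrame(
    {'hour': ['13', '09'], 'day': ['Wednesday', 'Friday'], 'urls_count': [0, 1]},
    index=[101, 102],
)

fields = ['hour', 'day', 'urls_count']

# One lookup returns a Series holding just the requested fields.
row = toy.loc[101, fields]
print(row)
```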

Filter the dataframe by specific columns and display the first five rows.

In [42]:
text_username_df = df.filter(items=['text','user_name'])
#text_username_df = df[['text','user_name']] # Same result as above.
text_username_df.head()

Unnamed: 0,text,user_name
720824650373029889,Referendum Party: Election Video https://t.co...,Steven J Delahunty
720824890891218946,The best couple 4Ever 💟 💟 #Britin #QAF https:/...,Brian And Justin ❤
720825083023925248,#DemDebate great and all but still waiting for...,Gabrielle Belli
720825114116231168,@BrexitWatch This is scary but not surprising....,Alf Oldman
720825272447021057,AND WATCHING YOU WALK AWAY 💟 💟 💟 ❤ 💕 #britin #...,Brian And Justin ❤


Calculate frequency distributions for specific columns.

In [None]:
df['day'].value_counts(ascending=True, dropna=False) # Ascending order; include missing values.
df['day'].value_counts()
df['days_before_ref'].value_counts()
df['hour_binned'].value_counts()
df['hashtags_count'].value_counts()
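
In a notebook cell only the last expression is displayed, so printing each result (or running one call per cell) is needed to see all of them. The sketch below also shows `normalize=True`, which returns proportions instead of counts; `toy` is a hypothetical dataframe.

```python
import pandas as pd

# Hypothetical stand-in with one missing value.
toy = pd.DataFrame({'day': ['Friday', 'Friday', 'Monday', None]})

# dropna=False keeps the missing value as its own category.
counts = toy['day'].value_counts(dropna=False)

# normalize=True reports each category as a share of all rows.
shares = toy['day'].value_counts(normalize=True, dropna=False)

print(counts)
print(shares)
```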

Count how often a specific hashtag, mention, or media type occurs in each tweet (the values in these columns are lists). The result can be stored as a new column.

In [None]:
[i.count('brexit') for i in df['hashtags'].values]
[i.count('photo') for i in df['media_type'].values]

#df['brexit_count'] = [i.count('brexit') for i in df['hashtags'].values]
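
The list comprehensions above count one value per row. To count every hashtag across the whole corpus, the lists can be flattened and tallied; this is a sketch using the standard library's `Counter` on hypothetical data (`toy` and `tag_counts` are illustrative names):

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical stand-in for the `hashtags` column.
toy = pd.DataFrame({
    'hashtags': [['brexit', 'euref'], [], ['brexit'], ['remain', 'brexit']],
})

# Flatten the per-tweet lists into one stream and tally every tag.
tag_counts = Counter(chain.from_iterable(toy['hashtags']))
print(tag_counts.most_common(2))

# Per-row count of a single tag, stored as a new column.
toy['brexit_count'] = [tags.count('brexit') for tags in toy['hashtags']]
```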

### Save Results in Another H5 File

In [51]:
def save_h5(obj, filename):
    # The context manager closes the store even if the write fails.
    with pd.HDFStore(filename) as store:
        store["tweets"] = obj

save_h5(df, "tweets_processed.h5")

your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block4_values] [items->['date', 'day_numeric', 'day', 'time', 'hour', 'hour_binned', 'source', 'in_reply_to_status_id', 'in_reply_to_user_id', 'coordinates', 'tweet_url', 'text', 'user_name', 'user_screen_name', 'user_description', 'user_created_at', 'user_location', 'user_time_zone', 'hashtags', 'mentions', 'urls', 'urls_count_binary', 'media_urls', 'media_type_count_binary', 'media_type', 'photo', 'video', 'gif']]
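
The warning above lists the object-dtype columns that PyTables pickles. As a rough diagnostic sketch, `pandas.api.types.infer_dtype` can show what each object column actually contains, which may help decide which ones are worth converting before saving. The dataframe `toy` and its values are hypothetical.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical stand-in: three object columns and one plain integer column.
toy = pd.DataFrame({
    'text': pd.Series(['a', 'b'], dtype='object'),
    'day_numeric': pd.Series([4, 5], dtype='object'),
    'in_reply_to_user_id': pd.Series([3342929843, 'unknown'], dtype='object'),
    'followers': [10, 20],
})

# Only object-dtype columns are candidates for pickling.
object_cols = toy.select_dtypes(include='object').columns

# infer_dtype looks at the values rather than the declared dtype.
inferred = {col: infer_dtype(toy[col], skipna=True) for col in object_cols}
print(inferred)
```

Columns reported as `integer` could be stored with a numeric dtype, whereas those reported as mixed types would need cleaning first.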

