# METADATA
Metadata for this project may be downloaded in CSV format. There are two different downloads available; the first, “Download details of ad airings on TV,” provides details about airings of ads on TV, giving information about when and where they aired. The second, “Download list of unique ads archive,” provides information on every ad archived by the project, whether or not that ad has been captured as airing on television. (For example, the ad may appear on a social media channel, such as youtube or snapchat.)
 
Ad airings on TV metadata
wp_identifier = A unique numeric id for each ad identified, assigned by the Political TV Ad Archive project. Type: number.

network = TV channel on which the ad aired. Type: text.

location = Name of market area covered by broadcast.  Type: text.

program = Name of TV program in which ad aired. Type: text.

program type = “News” or “not news,” representing type of TV programming. Type: text.

start_time = Date/time ad aired, start. Note: these are  UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.  Type: date/time.

end_time = Date/time ad aired, end. Note: these are  UTC times, or “coordinated universal time.”  Converting to local times requires consulting local time zones with special attention to seasonal time changes. Type: date/time.

archive_id = A unique alphanumeric id for each ad identified, corresponding with id used on PoliticalAdArchive.org. To see ad on website, add prefix: “http://politicaladarchive.org/ad/” to archive_id and a forward slash at the end and paste resulting url into browser. For example,  polad_berniesanders_f0chv becomes http://politicaladarchive.org/ad/polad_berniesanders_f0chv/ Type: text.

embed_url = Url for embedding ad. For embed code, use this id within this sample embed code: <iframe src=”https://archive.org/embed/PolAd_MarcoRubio_0py1v” width=”640″ height=”480″ frameborder=”0″ webkitallowfullscreen=”true” mozallowfullscreen=”true” allowfullscreen></iframe>  Type: text.

sponsor  = Ad sponsor, as it appears in the ad. Type: text.

sponsor type = Candidate committee, Super PAC, 501(c), 527 etc., source: the Center for Responsive Politics.) Type: text.

sponsor_affiliation = If applicable, the candidate associated with a particular sponsor. For example, Conservative Solutions PAC is a super PAC associated with Marco Rubio. Source: the Center for Responsive Politics. Type: text.

sponsor_affiliation_type = If sponsor is associated with a particular candidate, whether it supports or opposes that candidate. For example, Conservative Solutions PAC is a super PAC that supports Marco Rubio. Source: the Center for Responsive Politics.  Type: text.

race = Pres, Senate, or House. The federal race the ad is targeted toward. For Senate and House, the state is also indicated, along with the district.  Source: the Center for Responsive Politics. Type=text.

cycle = Election cycle, i.e. 2016 = the 2015-2016 elections. Source: the Center for Responsive Politics. Type=text.
subject = Subjects covered in ad; subject index from PolitiFact, input by Internet Archive researchers. Type: text.
candidate = Candidate(s) named in ad; input by Internet Archive researchers. Type: text.

type = Campaign ad, issue ad, unknown, input by Internet Archive researchers. Most ads in this archive are “campaign ads”–ads that are targeted toward particular candidates. However, some ads are “issue ads,” ads that cover “a national legislative issue of public importance.” Federal Communications Commission (FCC) rules require that TV stations disclose ad buy contracts for both types of ads; therefore the Political TV Ad Archive includes such ads in this collection. Example: this ad on Puerto Rico debt. Type: text.

message = Pro, con, mixed; input by Internet Archive researchers. Pro = ad mentions one or more candidates in positive way, no negative message about any candidate (Important: this applies only to candidates running in current election and race). Example: this ad sponsored by Donald Trump’s candidate committee mentions only him and does so in a positive way. Con = ad mentions one or more candidates in negative way. Example: this ad sponsored by the Right to Rise super PAC, which supports Jeb Bush, mentions only Marco Rubio and in a negative fashion. It includes references to “liberal Democrats” but none are candidates in the 2016 presidential race. Mixed: Any ad that mentions more than one candidate in particular race, with significant positive content about one or more candidates and negative content about one or more candidates. Example: this ad, sponsored by the Right to Rise Super PAC, criticizes Rubio but praises Jeb Bush.  Type: text.

air_count = Total number of times this particular ad has aired, as captured by the Internet Archive in key primary states. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also, while this is a national total, it pertains only to the states the Internet Archive is tracking. A list of these states can be seen above. Type: number.

market_count = Total number of markets where this ad has aired, as captured by the Internet Archive. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also: this count refers only to markets tracked by Internet Archive. Type: number.

date_created = Date the Political TV Ad Archive counted first airing of this particular ad. Filter here to see what new ads the project has documented since viewing site previously. In other words, if  data are filtered for “February 10, 2016,” the viewer will see a list of ads first counted as airing on TV on that date; in this way, researchers can see which ads are new to the archive. Note: no dates exist before February 9, 2016, not because ads weren’t counted before then, but simply because that is the date this feature was added to the data. Type: date.
 
Unique ads archived metadata
In addition to detailing specific airings of ads on television, the Political TV Ad Archive also archives ads that are not showing up on television stations we are monitoring. There could be a number of reasons for this: for example, perhaps the ad is targeted for an online audience only; or perhaps the ad is airing in states or cities that the project is not monitoring. Another possibility is that the ad is airing on stations not captured by the project, such as local cable programs. Finally, perhaps an ad has not yet aired in key primary states that the project is tracking but will air in the future.
The metadata in this download are similar to those in the ad airings metadata download. However, there are a few additional elements:
reference_count = number of fact or source checks from our partner organizations for this particular ad. For example, the claim that Donald Trump once supported impeaching former President George W. Bush, contained in this ad sponsored by Our Principles PAC, a super PAC opposing Trump, was fact checked by PolitiFact, which rated it as “True.” The PolitiFact story is embedded on the Political TV Ad Archive page displaying the ad. Type: number.
date_ingested = date this ad was added to the Political TV Ad Archive. Type: text.
transcript = Where available, the transcript for the ad. Type: text.

In [28]:
import pandas as pd
import numpy as np
df = pd.read_csv('project_2.csv', parse_dates=['start_time','end_time','date_created'])
df


Unnamed: 0,wp_identifier,network,location,program,program_type,start_time,end_time,archive_id,embed_url,sponsor,...,sponsor_affiliation,sponsor_affiliation_type,race,cycle,subject,candidate,type,message,market_count,date_created
0,232,FBC,"San Francisco-Oakland-San Jose, CA",Making Money With Charles Payne,news,2015-12-19 23:29:08,2015-12-19 23:29:38,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
1,232,MSNBCW,"San Francisco-Oakland-San Jose, CA",MSNBC Live With Kate Snow,news,2015-12-14 20:09:20,2015-12-14 20:09:50,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
2,232,WLTX,"Columbia, SC",Hawaii Five-0,not news,2016-01-02 03:37:10,2016-01-02 03:37:40,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
3,232,FOXNEWSW,"San Francisco-Oakland-San Jose, CA",The Five,news,2016-01-06 22:43:32,2016-01-06 22:44:02,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
4,232,FOXNEWSW,"San Francisco-Oakland-San Jose, CA",Fox and Friends First,news,2016-01-04 10:39:25,2016-01-04 10:39:55,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
5,232,FOXNEWSW,"San Francisco-Oakland-San Jose, CA",Shepard Smith Reporting,news,2016-01-06 20:17:55,2016-01-06 20:18:25,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
6,232,KPTH,"Sioux City, Iowa",The OT,not news,2016-01-04 00:24:53,2016-01-04 00:25:23,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
7,232,WLTX,"Columbia, SC",News 19 7pm Year Ender,news,2016-01-02 00:23:01,2016-01-02 00:23:31,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
8,232,FOXNEWSW,"San Francisco-Oakland-San Jose, CA",The Man Who Killed Usama Bin Laden,not news,2016-01-02 03:16:42,2016-01-02 03:17:12,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57
9,232,FBC,"San Francisco-Oakland-San Jose, CA",Mornings With Maria Bartiromo,news,2016-01-05 13:27:30,2016-01-05 13:27:59,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,2016-07-06 00:41:57


In [None]:
df.head(20)

In [None]:
df.describe()
df.values
df.index

In [3]:
sponsortype = df['sponsor_type'].unique()
network = df['network'].unique()
location = df['location'].unique()
program = df['program'].unique
programtype = df['program_type'].unique()
sponsor = df['sponsor'].unique()
sponsoraff = df['sponsor_affiliation'].unique()
sponsorafftype = df['sponsor_affiliation_type'].unique()
race = df['race'].unique()
cycle = df['cycle'].unique()
subject = df['subject'].unique()
candidate = df['candidate'].unique()
message = df['message'].unique()
start_time = df['start_time'].unique()
end_time = df['end_time'].unique()
market_count = df['market_count'].unique()
date_created = df['date_created'].unique()
embed_url = df['embed_url'].unique()
ttype = df['type'].unique()
market_count = df['market_count'].unique()
print(sponsortype)
np.in1d(['Texas'], location)
#use to check if item in dataset


['Candidate Committee' 'Super PAC' 'Unknown' 'Non Profit (501c)' 'PAC'
 'Party Committee' 'Multiple Types']


array([False], dtype=bool)

In [4]:
dedupe = df.drop_duplicates()
dedupe.describe()
df.describe()
#no dupes

Unnamed: 0,wp_identifier,cycle,market_count
count,181682.0,181682.0,181682.0
mean,1353.397645,2016.0,7.474808
std,997.988354,0.0,6.271119
min,232.0,2016.0,1.0
25%,430.0,2016.0,3.0
50%,1033.0,2016.0,5.0
75%,1961.0,2016.0,11.0
max,4554.0,2016.0,23.0


In [5]:
#data_dict_pol_ad = pd.read_clipboard()
data_dict_pol_ad = pd.DataFrame(index = np.arange(22), columns = ['Column_Title', 'Definition', 'Type'])
data_dict_pol_ad['Column_Title']= ['Ad airings on TV metadata wp_identifier','network','location','program','program type','start_time','end_time', 'archive_id','embed_url','sponsor','sponsor type','sponsor_affiliation','sponsor_affiliation_type' ,'race','cycle','type','subject', 'candidate','message' ,'air_count','market_count', 'date_created']

In [6]:
data_dict_pol_ad['Type'] = ['number','text','text', 'text', 'text', 'date/time', 'date/time', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'NA', 'NA', 'text', 'number', 'number', 'date']

In [7]:
#use regex if too largeType: date/time. end_time = Type: date/time. archive_id = Type: text. embed_url = Type=text. cycle = Type=text. subject = Type: text. type =Type: text. message =Type: text. air_count = Type: number. market_count =Type: number. date_created =, Type: date."'
#data_dict_pol_ad['Type'] = ['number', 'network','text','location','text'. program =, Type: text. program type =, Type: text. start_time =Type: text. sponsor = Type: text. sponsor type =Type: text. sponsor_affiliation =Type: text. sponsor_affiliation_type =Type: text. race =Type: text. candidate = 
data_dict_pol_ad['Definition'] = ['A unique numeric id for each ad identified, assigned by the Political TV Ad Archive project.',
                                  'TV channel on which the ad aired.', 
                                  'Name of market area covered by broadcast.',
                                  'Name of TV program in which ad aired.',
                                  '“News” or “not news,” representing type of TV programming.', 
                                  'Date/time ad aired, start. Note: these are UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.',
                                  'Date/time ad aired, end. Note: these are UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.',
                                  'A unique alphanumeric id for each ad identified, corresponding with id used on PoliticalAdArchive.org. To see ad on website, add prefix: “http://politicaladarchive.org/ad/” to archive_id and a forward slash at the end and paste resulting url into browser. For example, polad_berniesanders_f0chv becomes "http://politicaladarchive.org/ad/polad_berniesanders_f0chv/" ',
                                  'Url for embedding ad. For embed code, use this id within this sample embed code:', 
                                  'Ad sponsor, as it appears in the ad.', 
                                  'Candidate committee, Super PAC, 501(c), 527 etc., source: the Center for Responsive Politics.)', 
                                  'If applicable, the candidate associated with a particular sponsor. For example, Conservative Solutions PAC is a super PAC associated with Marco Rubio. Source: the Center for Responsive Politics.',
                                  'If sponsor is associated with a particular candidate, whether it supports or opposes that candidate. For example, Conservative Solutions PAC is a super PAC that supports Marco Rubio. Source: the Center for Responsive Politics.', 
                                  'Pres, Senate, or House. The federal race the ad is targeted toward. For Senate and House, the state is also indicated, along with the district. Source: the Center for Responsive Politics.',
                                  'Election cycle, i.e. 2016 = the 2015-2016 elections. Source: the Center for Responsive Politics.Subjects covered in ad; subject index from PolitiFact, input by Internet Archive researchers. Candidate(s) named in ad; input by Internet Archive researchers.', 
                                  'Campaign ad, issue ad, unknown, input by Internet Archive researchers. Most ads in this archive are “campaign ads”–ads that are targeted toward particular candidates. However, some ads are “issue ads,” ads that cover “a national legislative issue of public importance.” Federal Communications Commission (FCC) rules require that TV stations disclose ad buy contracts for both types of ads; therefore the Political TV Ad Archive includes such ads in this collection. Example: this ad on Puerto Rico debt.', 
                                  'NA',
                                  'NA',
                                  'Pro, con, mixed; input by Internet Archive researchers. Pro = ad mentions one or more candidates in positive way, no negative message about any candidate (Important: this applies only to candidates running in current election and race). Example: this ad sponsored by Donald Trump’s candidate committee mentions only him and does so in a positive way. Con = ad mentions one or more candidates in negative way. Example: this ad sponsored by the Right to Rise super PAC, which supports Jeb Bush, mentions only Marco Rubio and in a negative fashion. It includes references to “liberal Democrats” but none are candidates in the 2016 presidential race. Mixed: Any ad that mentions more than one candidate in particular race, with significant positive content about one or more candidates and negative content about one or more candidates. Example: this ad, sponsored by the Right to Rise Super PAC, criticizes Rubio but praises Jeb Bush.', 
                                  'Total number of times this particular ad has aired, as captured by the Internet Archive in key primary states. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also, while this is a national total, it pertains only to the states the Internet Archive is tracking. A list of these states can be seen above.',
                                  'Total number of markets where this ad has aired, as captured by the Internet Archive. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also: this count refers only to markets tracked by Internet Archive.',
                                  'Date the Political TV Ad Archive counted first airing of this particular ad. Filter here to see what new ads the project has documented since viewing site previously. In other words, if data are filtered for “February 10, 2016,” the viewer will see a list of ads first counted as airing on TV on that date; in this way, researchers can see which ads are new to the archive. Note: no dates exist before February 9, 2016, not because ads weren’t counted before then, but simply because that is the date this feature was added to the data.']


In [8]:
data_dict_pol_ad['Unique_Values'] = ['ID', network, location, program, programtype, start_time, end_time, 'ID', embed_url, sponsor, sponsortype, sponsoraff, sponsorafftype, race, cycle, ttype, subject, candidate, message, '?', market_count, date_created]
#add additional column
#how to groupby and aggregate data properly 


In [None]:
data_dict_pol_ad

In [None]:
df.ix[4]

In [None]:
# review unknown count for message
#delete out all columns with senate and house data to create a dataset that falls within parameters del df[]
#create dictionary DataFrame(dict) for other data dictionary
#reindex with additional added columns reindex([list], fill_value = )
df[df['market_count']>10]
#add 5 column dataset by df.add(df2, fill = NaN)
#pandas is very intuitive and will merge similar to tableau

In [None]:
import pandas.io.data #to import images and videos of trump and hilary

In [None]:
#rollup by state, rollup by republican party pres candidate and demo party pres candidate
pctcmarket_count = df['market_count'].pct_change()
#import datetime and determine total number of days from start date and then find corr

In [None]:
pctcmarket_count.corr
df.corr

In [None]:
%matplotlib inline
df.plot()
df.fillna(0, inplace = True)
df.sortlevel(1)

In [None]:
import pandas as pd
df2 = pd.read_csv('county_facts.csv')

In [None]:
df2

In [None]:
df2.describe()

In [None]:
data_dict_county_df2 = pd.read_csv('county_facts_dictionary.csv')

In [None]:
data_dict_county_df2

In [None]:
#will need to aggregate these two datasets and see if need to aggregate primary results
df3 = pd.read_csv('primary_results.csv')
df3

Each row contains the votes and fraction of votes that a candidate received in a given county's primary. sample

state: state where the primary or caucus was held
state_abbreviation: two letter state abbreviation
county: county where the results come from
fips: FIPS county code
party: Democrat or Republican
candidate: name of the candidate
votes: number of votes the candidate received in the corresponding state and county (may be missing)
fraction_votes: fraction of votes the president received in the corresponding state, county, and primary

In [None]:
data_dict_primary_d3 = pd.read_csv('primary_election_datadict.csv')

In [None]:
data_dict_primary_d3

In [None]:
df

In [29]:
#remove non primary and non support ads
df=df[df.sponsor_affiliation_type =="supports"]
df=df[df.start_time<'2016-06-07']
df

Unnamed: 0,wp_identifier,network,location,program,program_type,start_time,end_time,archive_id,embed_url,sponsor,...,sponsor_affiliation,sponsor_affiliation_type,race,cycle,subject,candidate,type,message,market_count,date_created
2084,238,WMUR,"Boston, MA/Manchester, NH",Good Morning America,news,2015-12-23 12:48:24,2015-12-23 12:48:54,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2085,238,KPTH,"Sioux City, Iowa",Siouxland News at 9 on FOX 44,news,2015-12-17 03:16:08,2015-12-17 03:16:38,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2086,238,WHO,"Des Moines-Ames, Iowa",Channel 13 News at 10,news,2015-12-25 04:29:18,2015-12-25 04:29:48,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2087,238,KMEG,"Sioux City, Iowa",Wheel of Fortune,not news,2015-12-20 00:53:09,2015-12-20 00:53:39,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2088,238,KTIV,"Sioux City, Iowa",News 4 at Noon,news,2015-12-22 18:17:11,2015-12-22 18:17:41,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2089,238,WBZ,"Boston, MA/Manchester, NH",The Late Show With Stephen Colbert,news,2015-12-18 05:03:35,2015-12-18 05:04:05,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2090,238,KCRG,"Ceder Rapids-Waterloo-Iowa City-Dublin, Iowa",Jimmy Kimmel Live,news,2015-12-18 04:59:05,2015-12-18 04:59:35,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2091,238,WBZ,"Boston, MA/Manchester, NH",The Late Late Show With James Corden,not news,2015-12-16 06:16:27,2015-12-16 06:16:57,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2092,238,WMUR,"Boston, MA/Manchester, NH",Closeup,news,2015-12-20 15:23:11,2015-12-20 15:23:41,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56
2093,238,KCAU,"Sioux City, Iowa",ABC9 News Midday,news,2015-12-29 17:51:55,2015-12-29 17:52:25,PolAd_MarcoRubio_bpcj7,https://archive.org/embed/PolAd_MarcoRubio_bpcj7,Conservative Solutions PAC,...,Marco Rubio,supports,PRES,2016,Foreign Policy,Marco Rubio,campaign,pro,5,2016-07-05 23:41:56


In [30]:
#Split out state from city
def splitcomma(item):
    return item.split(", ")[1]

df['state']=df['location'].apply(splitcomma)
df['state'].unique()

array(['MA/Manchester', 'Iowa', 'CA', 'CO', 'FL', 'NV', 'SC',
       'SC/Asheville-Anderson', 'NY', 'PA', 'NC', 'Ohio', 'OH',
       'DC/Hagerstown', ' NC', 'VA'], dtype=object)

In [32]:
#Fix State Abbreviations Pt 1
def fixabbrev(item):
    if item=="Iowa":
        return "IA"
    elif item=="Ohio":
        return "OH"
    else:
        return item

df['state']=df['state'].apply(fixabbrev)
df['state'].unique()



array(['MA/Manchester', 'IA', 'CA', 'CO', 'FL', 'NV', 'SC',
       'SC/Asheville-Anderson', 'NY', 'PA', 'NC', 'OH', 'DC/Hagerstown',
       ' NC', 'VA'], dtype=object)

In [34]:
#Fix State Abbreviations Pt 2
import datetime

new_hampshire=datetime.date(2016,2,9)
south_carolina=datetime.date(2016,2,27)


df['state'][(df['state']=='MA/Manchester') & (df['start_time']<=new_hampshire)]='NH'
df['state'][(df['state']=='MA/Manchester') & (df['start_time']>new_hampshire)]='MA'

df['state'][(df['state']=='SC/Asheville-Anderson') & (df['start_time']<=south_carolina)]='SC'
df['state'][(df['state']=='SC/Asheville-Anderson') & (df['start_time']>south_carolina)]='NC'

df['state'][(df['state']=='DC/Hagerstown')]='VA'

df['state'].unique()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


array(['NH', 'IA', 'CA', 'CO', 'FL', 'NV', 'SC', 'NY', 'PA', 'MA', 'NC',
       'OH', 'VA', ' NC'], dtype=object)

2016-02-09
