# METADATA
Metadata for this project may be downloaded in CSV format. There are two different downloads available; the first, “Download details of ad airings on TV,” provides details about airings of ads on TV, giving information about when and where they aired. The second, “Download list of unique ads archive,” provides information on every ad archived by the project, whether or not that ad has been captured as airing on television. (For example, the ad may appear on a social media channel, such as youtube or snapchat.)
 
Ad airings on TV metadata
wp_identifier = A unique numeric id for each ad identified, assigned by the Political TV Ad Archive project. Type: number.

network = TV channel on which the ad aired. Type: text.

location = Name of market area covered by broadcast.  Type: text.

program = Name of TV program in which ad aired. Type: text.

program type = “News” or “not news,” representing type of TV programming. Type: text.

start_time = Date/time ad aired, start. Note: these are  UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.  Type: date/time.

end_time = Date/time ad aired, end. Note: these are  UTC times, or “coordinated universal time.”  Converting to local times requires consulting local time zones with special attention to seasonal time changes. Type: date/time.

archive_id = A unique alphanumeric id for each ad identified, corresponding with id used on PoliticalAdArchive.org. To see ad on website, add prefix: “http://politicaladarchive.org/ad/” to archive_id and a forward slash at the end and paste resulting url into browser. For example,  polad_berniesanders_f0chv becomes http://politicaladarchive.org/ad/polad_berniesanders_f0chv/ Type: text.

embed_url = Url for embedding ad. For embed code, use this id within this sample embed code: <iframe src=”https://archive.org/embed/PolAd_MarcoRubio_0py1v” width=”640″ height=”480″ frameborder=”0″ webkitallowfullscreen=”true” mozallowfullscreen=”true” allowfullscreen></iframe>  Type: text.

sponsor  = Ad sponsor, as it appears in the ad. Type: text.

sponsor type = Candidate committee, Super PAC, 501(c), 527 etc., source: the Center for Responsive Politics.) Type: text.

sponsor_affiliation = If applicable, the candidate associated with a particular sponsor. For example, Conservative Solutions PAC is a super PAC associated with Marco Rubio. Source: the Center for Responsive Politics. Type: text.

sponsor_affiliation_type = If sponsor is associated with a particular candidate, whether it supports or opposes that candidate. For example, Conservative Solutions PAC is a super PAC that supports Marco Rubio. Source: the Center for Responsive Politics.  Type: text.

race = Pres, Senate, or House. The federal race the ad is targeted toward. For Senate and House, the state is also indicated, along with the district.  Source: the Center for Responsive Politics. Type=text.

cycle = Election cycle, i.e. 2016 = the 2015-2016 elections. Source: the Center for Responsive Politics. Type=text.
subject = Subjects covered in ad; subject index from PolitiFact, input by Internet Archive researchers. Type: text.
candidate = Candidate(s) named in ad; input by Internet Archive researchers. Type: text.

type = Campaign ad, issue ad, unknown, input by Internet Archive researchers. Most ads in this archive are “campaign ads”–ads that are targeted toward particular candidates. However, some ads are “issue ads,” ads that cover “a national legislative issue of public importance.” Federal Communications Commission (FCC) rules require that TV stations disclose ad buy contracts for both types of ads; therefore the Political TV Ad Archive includes such ads in this collection. Example: this ad on Puerto Rico debt. Type: text.

message = Pro, con, mixed; input by Internet Archive researchers. Pro = ad mentions one or more candidates in positive way, no negative message about any candidate (Important: this applies only to candidates running in current election and race). Example: this ad sponsored by Donald Trump’s candidate committee mentions only him and does so in a positive way. Con = ad mentions one or more candidates in negative way. Example: this ad sponsored by the Right to Rise super PAC, which supports Jeb Bush, mentions only Marco Rubio and in a negative fashion. It includes references to “liberal Democrats” but none are candidates in the 2016 presidential race. Mixed: Any ad that mentions more than one candidate in particular race, with significant positive content about one or more candidates and negative content about one or more candidates. Example: this ad, sponsored by the Right to Rise Super PAC, criticizes Rubio but praises Jeb Bush.  Type: text.

air_count = Total number of times this particular ad has aired, as captured by the Internet Archive in key primary states. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also, while this is a national total, it pertains only to the states the Internet Archive is tracking. A list of these states can be seen above. Type: number.

market_count = Total number of markets where this ad has aired, as captured by the Internet Archive. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also: this count refers only to markets tracked by Internet Archive. Type: number.

date_created = Date the Political TV Ad Archive counted first airing of this particular ad. Filter here to see what new ads the project has documented since viewing site previously. In other words, if  data are filtered for “February 10, 2016,” the viewer will see a list of ads first counted as airing on TV on that date; in this way, researchers can see which ads are new to the archive. Note: no dates exist before February 9, 2016, not because ads weren’t counted before then, but simply because that is the date this feature was added to the data. Type: date.
 
Unique ads archived metadata
In addition to detailing specific airings of ads on television, the Political TV Ad Archive also archives ads that are not showing up on television stations we are monitoring. There could be a number of reasons for this: for example, perhaps the ad is targeted for an online audience only; or perhaps the ad is airing in states or cities that the project is not monitoring. Another possibility is that the ad is airing on stations not captured by the project, such as local cable programs. Finally, perhaps an ad has not yet aired in key primary states that the project is tracking but will air in the future.
The metadata in this download are similar to those in the ad airings metadata download. However, there are a few additional elements:
reference_count = number of fact or source checks from our partner organizations for this particular ad. For example, the claim that Donald Trump once supported impeaching former President George W. Bush, contained in this ad sponsored by Our Principles PAC, a super PAC opposing Trump, was fact checked by PolitiFact, which rated it as “True.” The PolitiFact story is embedded on the Political TV Ad Archive page displaying the ad. Type: number.
date_ingested = date this ad was added to the Political TV Ad Archive. Type: text.
transcript = Where available, the transcript for the ad. Type: text.

In [18]:
import pandas as pd
import numpy as np
df = pd.read_csv('project_2.csv')
df


Unnamed: 0,wp_identifier,network,location,program,program_type,start_time,end_time,archive_id,embed_url,sponsor,...,sponsor_affiliation,sponsor_affiliation_type,race,cycle,subject,candidate,type,message,market_count,date_created
0,232,FBC,"San Francisco-Oakland-San Jose, CA",Making Money With Charles Payne,news,2015-12-19 23:29:08 UTC,2015-12-19 23:29:38 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
1,232,MSNBCW,"San Francisco-Oakland-San Jose, CA",MSNBC Live With Kate Snow,news,2015-12-14 20:09:20 UTC,2015-12-14 20:09:50 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
2,232,WLTX,"Columbia, SC",Hawaii Five-0,not news,2016-01-02 03:37:10 UTC,2016-01-02 03:37:40 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
3,1056,WSVN,"Miami-Fort Lauderdale, FL",Today in Florida,not news,2016-03-07 10:47:33 UTC,2016-03-07 10:48:03 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
4,1056,WTVT,"Tampa-St. Petersburg, FL",Good Day Tampa Bay 7AM,news,2016-03-07 12:29:24 UTC,2016-03-07 12:29:54 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
5,1056,WOFL,"Orlando-Daytona Beach-Melbourne, FL",Good Day 7am,news,2016-03-07 12:59:43 UTC,2016-03-07 13:00:13 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
6,1056,WESH,"Orlando-Daytona Beach-Melbourne, FL",Access Hollywood,not news,2016-03-07 04:59:49 UTC,2016-03-07 05:00:19 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
7,1056,WJW,"Cleveland, Ohio",The Big Bang Theory,not news,2016-03-07 05:59:37 UTC,2016-03-07 06:00:07 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
8,1056,WNCN,"Raleigh-Durham-Fayetteville, NC",My Carolina Talk,not news,2016-03-07 14:11:53 UTC,2016-03-07 14:12:23 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
9,1056,WTSP,"Tampa-St. Petersburg, FL",CSI Cyber,not news,2016-03-07 03:52:44 UTC,2016-03-07 03:53:14 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41


In [33]:
df.head(20)

Unnamed: 0,wp_identifier,network,location,program,program_type,start_time,end_time,archive_id,embed_url,sponsor,...,sponsor_affiliation,sponsor_affiliation_type,race,cycle,subject,candidate,type,message,market_count,date_created
0,232,FBC,"San Francisco-Oakland-San Jose, CA",Making Money With Charles Payne,news,2015-12-19 23:29:08 UTC,2015-12-19 23:29:38 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
1,232,MSNBCW,"San Francisco-Oakland-San Jose, CA",MSNBC Live With Kate Snow,news,2015-12-14 20:09:20 UTC,2015-12-14 20:09:50 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
2,232,WLTX,"Columbia, SC",Hawaii Five-0,not news,2016-01-02 03:37:10 UTC,2016-01-02 03:37:40 UTC,PolAd_MarcoRubio_s8ty9,https://archive.org/embed/PolAd_MarcoRubio_s8ty9,Marco Rubio for President,...,,none,PRES,2016,"Economy, Foreign Policy, Religion",Marco Rubio,campaign,pro,7,7/6/16 0:41
3,1056,WSVN,"Miami-Fort Lauderdale, FL",Today in Florida,not news,2016-03-07 10:47:33 UTC,2016-03-07 10:48:03 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
4,1056,WTVT,"Tampa-St. Petersburg, FL",Good Day Tampa Bay 7AM,news,2016-03-07 12:29:24 UTC,2016-03-07 12:29:54 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
5,1056,WOFL,"Orlando-Daytona Beach-Melbourne, FL",Good Day 7am,news,2016-03-07 12:59:43 UTC,2016-03-07 13:00:13 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
6,1056,WESH,"Orlando-Daytona Beach-Melbourne, FL",Access Hollywood,not news,2016-03-07 04:59:49 UTC,2016-03-07 05:00:19 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
7,1056,WJW,"Cleveland, Ohio",The Big Bang Theory,not news,2016-03-07 05:59:37 UTC,2016-03-07 06:00:07 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
8,1056,WNCN,"Raleigh-Durham-Fayetteville, NC",My Carolina Talk,not news,2016-03-07 14:11:53 UTC,2016-03-07 14:12:23 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41
9,1056,WTSP,"Tampa-St. Petersburg, FL",CSI Cyber,not news,2016-03-07 03:52:44 UTC,2016-03-07 03:53:14 UTC,PolAd_SocialSecurity_ibhz5,https://archive.org/embed/PolAd_SocialSecurity...,AARP,...,,none,PRES,2016,Social Security,Marco Rubio,issue,unknown,16,7/5/16 19:41


In [29]:
df.describe()
df.values
df.index

RangeIndex(start=0, stop=87608, step=1)

In [80]:
sponsortype = df['sponsor_type'].unique()
network = df['network'].unique()
location = df['location'].unique()
program = df['program'].unique
programtype = df['program_type'].unique()
sponsor = df['sponsor'].unique()
sponsoraff = df['sponsor_affiliation'].unique()
sponsorafftype = df['sponsor_affiliation_type'].unique()
race = df['race'].unique()
cycle = df['cycle'].unique()
subject = df['subject'].unique()
candidate = df['candidate'].unique()
message = df['message'].unique()
start_time = df['start_time'].unique()
end_time = df['end_time'].unique()
market_count = df['market_count'].unique()
date_created = df['date_created'].unique()
embed_url = df['embed_url'].unique()
ttype = df['type'].unique()
market_count = df['market_count'].unique()
print(sponsortype)
np.in1d(['Texas'], location)
#use to check if item in dataset


KeyError: 'air_count'

In [71]:
dedupe = df.drop_duplicates()
dedupe.describe()
df.describe()
#no dupes

Unnamed: 0,wp_identifier,cycle,market_count
count,87608.0,87608.0,87608.0
mean,2167.274005,2016.0,5.788147
std,795.574741,0.0,5.749755
min,232.0,2016.0,1.0
25%,1544.0,2016.0,1.0
50%,1997.0,2016.0,3.0
75%,2773.0,2016.0,10.0
max,4554.0,2016.0,23.0


In [83]:
#data_dict_pol_ad = pd.read_clipboard()
data_dict_pol_ad = pd.DataFrame(index = np.arange(22), columns = ['Column_Title', 'Definition', 'Type'])
data_dict_pol_ad['Column_Title']= ['Ad airings on TV metadata wp_identifier','network','location','program','program type','start_time','end_time', 'archive_id','embed_url','sponsor','sponsor type','sponsor_affiliation','sponsor_affiliation_type' ,'race','cycle','type','subject', 'candidate','message' ,'air_count','market_count', 'date_created']

In [84]:
data_dict_pol_ad['Type'] = ['number','text','text', 'text', 'text', 'date/time', 'date/time', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'text', 'NA', 'NA', 'text', 'number', 'number', 'date']

In [85]:
#use regex if too largeType: date/time. end_time = Type: date/time. archive_id = Type: text. embed_url = Type=text. cycle = Type=text. subject = Type: text. type =Type: text. message =Type: text. air_count = Type: number. market_count =Type: number. date_created =, Type: date."'
#data_dict_pol_ad['Type'] = ['number', 'network','text','location','text'. program =, Type: text. program type =, Type: text. start_time =Type: text. sponsor = Type: text. sponsor type =Type: text. sponsor_affiliation =Type: text. sponsor_affiliation_type =Type: text. race =Type: text. candidate = 
data_dict_pol_ad['Definition'] = ['A unique numeric id for each ad identified, assigned by the Political TV Ad Archive project.',
                                  'TV channel on which the ad aired.', 
                                  'Name of market area covered by broadcast.',
                                  'Name of TV program in which ad aired.',
                                  '“News” or “not news,” representing type of TV programming.', 
                                  'Date/time ad aired, start. Note: these are UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.',
                                  'Date/time ad aired, end. Note: these are UTC times, or “coordinated universal time.” Converting to local times requires consulting local time zones with special attention to seasonal time changes.',
                                  'A unique alphanumeric id for each ad identified, corresponding with id used on PoliticalAdArchive.org. To see ad on website, add prefix: “http://politicaladarchive.org/ad/” to archive_id and a forward slash at the end and paste resulting url into browser. For example, polad_berniesanders_f0chv becomes "http://politicaladarchive.org/ad/polad_berniesanders_f0chv/" ',
                                  'Url for embedding ad. For embed code, use this id within this sample embed code:', 
                                  'Ad sponsor, as it appears in the ad.', 
                                  'Candidate committee, Super PAC, 501(c), 527 etc., source: the Center for Responsive Politics.)', 
                                  'If applicable, the candidate associated with a particular sponsor. For example, Conservative Solutions PAC is a super PAC associated with Marco Rubio. Source: the Center for Responsive Politics.',
                                  'If sponsor is associated with a particular candidate, whether it supports or opposes that candidate. For example, Conservative Solutions PAC is a super PAC that supports Marco Rubio. Source: the Center for Responsive Politics.', 
                                  'Pres, Senate, or House. The federal race the ad is targeted toward. For Senate and House, the state is also indicated, along with the district. Source: the Center for Responsive Politics.',
                                  'Election cycle, i.e. 2016 = the 2015-2016 elections. Source: the Center for Responsive Politics.Subjects covered in ad; subject index from PolitiFact, input by Internet Archive researchers. Candidate(s) named in ad; input by Internet Archive researchers.', 
                                  'Campaign ad, issue ad, unknown, input by Internet Archive researchers. Most ads in this archive are “campaign ads”–ads that are targeted toward particular candidates. However, some ads are “issue ads,” ads that cover “a national legislative issue of public importance.” Federal Communications Commission (FCC) rules require that TV stations disclose ad buy contracts for both types of ads; therefore the Political TV Ad Archive includes such ads in this collection. Example: this ad on Puerto Rico debt.', 
                                  'NA',
                                  'NA',
                                  'Pro, con, mixed; input by Internet Archive researchers. Pro = ad mentions one or more candidates in positive way, no negative message about any candidate (Important: this applies only to candidates running in current election and race). Example: this ad sponsored by Donald Trump’s candidate committee mentions only him and does so in a positive way. Con = ad mentions one or more candidates in negative way. Example: this ad sponsored by the Right to Rise super PAC, which supports Jeb Bush, mentions only Marco Rubio and in a negative fashion. It includes references to “liberal Democrats” but none are candidates in the 2016 presidential race. Mixed: Any ad that mentions more than one candidate in particular race, with significant positive content about one or more candidates and negative content about one or more candidates. Example: this ad, sponsored by the Right to Rise Super PAC, criticizes Rubio but praises Jeb Bush.', 
                                  'Total number of times this particular ad has aired, as captured by the Internet Archive in key primary states. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also, while this is a national total, it pertains only to the states the Internet Archive is tracking. A list of these states can be seen above.',
                                  'Total number of markets where this ad has aired, as captured by the Internet Archive. Important: we capture all airings of ad, not just paid airings; if clip is replayed as part of a TV news broadcast, that will be represented in the count. Also: this count refers only to markets tracked by Internet Archive.',
                                  'Date the Political TV Ad Archive counted first airing of this particular ad. Filter here to see what new ads the project has documented since viewing site previously. In other words, if data are filtered for “February 10, 2016,” the viewer will see a list of ads first counted as airing on TV on that date; in this way, researchers can see which ads are new to the archive. Note: no dates exist before February 9, 2016, not because ads weren’t counted before then, but simply because that is the date this feature was added to the data.']


In [86]:
data_dict_pol_ad['Unique_Values'] = ['ID', network, location, program, programtype, start_time, end_time, 'ID', embed_url, sponsor, sponsortype, sponsoraff, sponsorafftype, race, cycle, ttype, subject, candidate, message, '?', market_count, date_created]
#add additional column
#how to groupby and aggregate data properly 


In [87]:
data_dict_pol_ad

Unnamed: 0,Column_Title,Definition,Type,Unique_Values
0,Ad airings on TV metadata wp_identifier,"A unique numeric id for each ad identified, as...",number,ID
1,network,TV channel on which the ad aired.,text,"[FBC, MSNBCW, WLTX, WSVN, WTVT, WOFL, WESH, WJ..."
2,location,Name of market area covered by broadcast.,text,"[San Francisco-Oakland-San Jose, CA, Columbia,..."
3,program,Name of TV program in which ad aired.,text,<bound method IndexOpsMixin.unique of 0 ...
4,program type,"“News” or “not news,” representing type of TV ...",text,"[news, not news]"
5,start_time,"Date/time ad aired, start. Note: these are UTC...",date/time,"[2015-12-19 23:29:08 UTC, 2015-12-14 20:09:20 ..."
6,end_time,"Date/time ad aired, end. Note: these are UTC t...",date/time,"[2015-12-19 23:29:38 UTC, 2015-12-14 20:09:50 ..."
7,archive_id,A unique alphanumeric id for each ad identifie...,text,ID
8,embed_url,"Url for embedding ad. For embed code, use this...",text,[https://archive.org/embed/PolAd_MarcoRubio_s8...
9,sponsor,"Ad sponsor, as it appears in the ad.",text,"[Marco Rubio for President, AARP, Keep the Pro..."
