# DESMOG Global Warming Disinformation Database

### This notebook scrapes the DeSmog database for the profiles of climate change contrarians (individuals and organisations) involved in the global warming denial industry.

# 1: Individuals

In [1]:
import requests       # send requests to web server
from lxml import html # parse HTML
import json           # store data as json file
import re
# Change directory
import os
os.chdir('../../../Data/DeSmog')

In [2]:
url = 'https://www.desmogblog.com/global-warming-denier-database'
page = requests.get(url)
tree = html.fromstring(page.content)
tree.make_links_absolute(url) # rewrite relative links in the tree as absolute URLs

In [3]:
# Print the name and the URL of the first contrarian

print(tree.xpath('//*[@id="datatabs-1"]/h2[1]/a/text()')) 
print(tree.xpath('//*[@id="datatabs-1"]/h2[1]/a/@href')) 

['Tony Abbott']
['https://www.desmogblog.com/tony-abbott']


### Gather the data in tables

In [4]:
# Table of the names 
names = []
nr_of_pages = 480
for i in range(nr_of_pages):
    path = '//*[@id="datatabs-1"]/h2[{}]/a/text()'.format(i+1)
    names.append(tree.xpath(path))

print(len(names))
print(names[-1])

480
['Benjamin Zycher']


In [5]:
print(names[-40:-1])

[['Hendrik Tennekes'], ['Holger J. Thuss'], ['Rex Tillerson'], ['Utz Tillmann'], ['Richard Tol'], ['Donald Trump'], ['David Tuerck'], ['Fritz Vahrenholt'], ['Brian G. Valentine'], ['Jan Veizer'], ['Lord Nigel Vinson'], ['Arthur Viterito'], ['Greg E. Walcher'], ['James A. Wanliss'], ['Anthony Watts'], ['Gerd-Rainer Weber'], ['Werner Weber'], ['Edward Wegman'], ['William Wehrum'], ['Andrew Wheeler'], ['David Whitehouse'], ['Harry Wilkinson'], ['Dan H. Wilks'], ['Farris Wilks'], ['George Will'], ['Timothy Williams'], ['Sammy Wilson'], ['Wayne Winegarden'], ['Boris Winterhalter'], ['Bruno Wiskel'], ['David Wojick'], ['Joel Wood'], ['Todd Wynn'], ['Thomas Wysmuller'], ['Miklós Zágoni'], ['Karl Zeller'], ['Heather Zichal'], ['Antonino Zichichi'], ['Ryan Zinke']]


In [6]:
# Table of the URLs 
URLs = []
for i in range(nr_of_pages):
    path = '//*[@id="datatabs-1"]/h2[{}]/a/@href'.format(i+1)
    URLs.append(tree.xpath(path)[0])
    
print(len(URLs))
print(URLs[-1])

480
https://www.desmogblog.com/benjamin-zycher


In [7]:
# Combine name with URL

contrarians = []

for i in range(len(names)):
    contrarians.append({'name': names[i][0],
                     'url': URLs[i]})
print(contrarians[-1])

{'name': 'Benjamin Zycher', 'url': 'https://www.desmogblog.com/benjamin-zycher'}
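
The two loops above rely on a hard-coded count (`nr_of_pages = 480`), which silently goes stale when the database grows. A minimal sketch of an alternative is to select every link under the tab in one pass and build the name/URL pairs directly. To keep the example runnable without network access it uses the standard library's `xml.etree.ElementTree` on a small synthetic snippet as a stand-in for the `lxml` tree; the snippet, its relative `href` values and the variable names are assumptions for illustration only. With `lxml`, the equivalent selection would be `tree.xpath('//*[@id="datatabs-1"]/h2/a')`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

base_url = 'https://www.desmogblog.com/global-warming-denier-database'

# Synthetic, well-formed stand-in for the listing page (not the real markup)
snippet = """
<html><body><div id="datatabs-1">
  <h2><a href="/tony-abbott">Tony Abbott</a></h2>
  <h2><a href="/benjamin-zycher">Benjamin Zycher</a></h2>
</div></body></html>
""".strip()

root = ET.fromstring(snippet)

# Select every profile link under the tab at once, no fixed count needed
links = root.findall(".//*[@id='datatabs-1']/h2/a")

# Pair each name with its absolute URL in a single step
entries = [{'name': a.text.strip(), 'url': urljoin(base_url, a.get('href'))}
           for a in links]

print(len(entries))
print(entries[-1])
```

Because the list is built from whatever links are present, `len(entries)` reports the actual number of profiles instead of assuming it.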


### Extract the page content

In [8]:
page = requests.get(contrarians[0]['url'])
print("We extracted the following page:\n%s" % page.url)
print(page) # code 200 means all is good \o/ http://docs.python-requests.org/en/master/api/#requests.Response
tree = html.fromstring(page.content)

We extracted the following page:
https://www.desmogblog.com/tony-abbott
<Response [200]>


In [9]:
# Extract the text of the main content container (the element with id="content")
overview = '\n\n'.join([p.text_content().strip() for p in 
                       tree.xpath('//*[@id="content"]')]).strip()

print(overview[310:1000])

ackground

Tony Abbott was Prime Minister of Australia between 2013 and 2015, having led the centre-right Liberal Party in opposition since 2009. He was ousted as Liberal Party leader by Malcolm Turnbull in September 2015 following poor approval ratings. [1], [3]

Other political roles he has held include Minister for Women, Leader of the House and Minister for Employment and Workplace Relations. Abbott lost his parliamentary seat in the 2019 elections to an independent candidate who “campaigned strongly on a platform of climate change”, according to the Sydney Morning Herald. [1], [4]

Abbott has historically opposed measures to combat climate change, advocating for the (ultimatel


In [10]:
# Extract page content
for contrarian in contrarians:
    # Visit landing page
    page = requests.get(contrarian['url'])
    
    # Parse the HTML tree. We will ignore any unicode errors for the moment.
    tree = html.fromstring(page.content.decode('utf-8', errors='ignore'))
    
    # Extract the text of the main content container (id="content")
    content = '\n\n'.join([p.text_content().strip() for p in 
                           tree.xpath('//*[@id="content"]')]).strip()
    
    # Add text content to dictionary
    contrarian['content'] = content
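
The loop above sends several hundred requests back to back and stops at the first network error. A minimal sketch of a more defensive variant follows, assuming a session object with a `requests`-style `get(url, timeout=...)` method. The function name `fetch_pages`, the `html` key and the default delay are hypothetical choices, not part of the original notebook.

```python
import time

def fetch_pages(entries, session, delay=1.0, timeout=30):
    """Attach the raw HTML to each entry; store None when a request fails."""
    for entry in entries:
        try:
            response = session.get(entry['url'], timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            entry['html'] = response.text
        except Exception as error:  # deliberately broad for this sketch
            entry['html'] = None
            print('Could not fetch {}: {}'.format(entry['url'], error))
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return entries
```

In the notebook this could be called as `fetch_pages(contrarians, requests.Session())`; reusing one session also reuses the underlying connection across requests.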


In [11]:
(contrarians[200]['content'])

'Gertrud Höhler\n                                                      \n\n\n\n\n        \n    \n\tGertrud\xa0Höhler\n\n\tCredentials\n\n\t\n\t\t\xa0Dr. phil., University of Mannheim (1967). [1], [2]\n\n\n\tBackground\nGertrud Höhler is a German literary critic, journalist and corporate consultant as well as an outspoken Christian Democrat and past advisor to Chancellor Helmut Kohl. She is also on the boards of a number of major Swiss and German companies. She has also done work for radio, television, newspapers and magazines, and has received a number of awards. [3]\nShe studied literature and art history in Bonn, Berlin, Zurich and Mannheim from 1960 to 1966. She attained a Doctorate (Dr. phil.) at the University of Mannheim. From 1972 to 1993 she was a professor of German studies and General Literature at the University of Paderborn. She was later elected to the University Council of the University of Paderborn, but was forced to resign from this position after the allegation that s

In [12]:
# Parse relevant content 
regex = r'\b\n{2,}(.+?)\n\s+PRINT\n\s+SUBSCRIBE'

info  = re.search(regex, contrarians[200]['content'], re.S)[1]
print(info)

	Credentials

	
		 Dr. phil., University of Mannheim (1967). [1], [2]


	Background
Gertrud Höhler is a German literary critic, journalist and corporate consultant as well as an outspoken Christian Democrat and past advisor to Chancellor Helmut Kohl. She is also on the boards of a number of major Swiss and German companies. She has also done work for radio, television, newspapers and magazines, and has received a number of awards. [3]
She studied literature and art history in Bonn, Berlin, Zurich and Mannheim from 1960 to 1966. She attained a Doctorate (Dr. phil.) at the University of Mannheim. From 1972 to 1993 she was a professor of German studies and General Literature at the University of Paderborn. She was later elected to the University Council of the University of Paderborn, but was forced to resign from this position after the allegation that she had deliberately rented offices to a state representative of the NPD (National Democratic Party of Germany). [4], [5], [6], [7]
Höhle
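
To make the pattern easier to follow: `\b\n{2,}` anchors on the first run of two or more newlines that directly follows a word character (the end of the page title), the lazy group `(.+?)` captures everything after it, and `\n\s+PRINT\n\s+SUBSCRIBE` stops the capture at the footer links. The `re.S` flag lets `.` match newlines. The sketch below applies the same regex to a synthetic string (not real scraped data) and also shows that `re.search` returns `None` when the footer is missing.

```python
import re

regex = r'\b\n{2,}(.+?)\n\s+PRINT\n\s+SUBSCRIBE'

# Synthetic stand-in for the text of a profile page
sample = (
    'Jane Doe\n\n'
    '\tCredentials\n\n'
    '\tPhD, Example University. [1]\n\n'
    '\tBackground\n'
    'Jane Doe is a hypothetical person.\n'
    '    PRINT\n'
    '    SUBSCRIBE\n'
    'Footer text'
)

match = re.search(regex, sample, re.S)
info = match[1] if match else None  # group 1 holds the profile text
print(info)

# Without the PRINT/SUBSCRIBE footer there is nothing to anchor on
no_match = re.search(regex, 'A page without the footer links', re.S)
print(no_match)
```

Checking the match object explicitly, as in `match[1] if match else None`, is an alternative to catching the `TypeError` raised when `None` is indexed.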

In [13]:
for i, contrarian in enumerate(contrarians):
    try:
        info  = re.search(regex, contrarian['content'], re.S)[1]
        # Add text content to dictionary
        contrarian['info'] = info
    except TypeError:
        contrarian['info'] = None
        print('There was no relevant page content for {} (Index {}).'.format(contrarian['name'], i))

In [14]:
print('We extracted the names and profiles of {} climate change contrarians. One profile looks like this:'.format(len(contrarians)))
print('\n'+ contrarians[0]['name'])
print('\n'+ contrarians[0]['url'])
print('\n'+ contrarians[0]['info'])

We extracted the names and profiles of 480 climate change sceptics. One profile looks like this:

Tony Abbott

https://www.desmogblog.com/tony-abbott

Credentials


	Bachelor of Economics (BEc), University of Sydney. [1]
	Bachelor of Laws (LLB), University of Sydney. [1]
	Bachelor of Arts in Politics and Philosophy, Queen's College, University of Oxford. [2]


Background

Tony Abbott was Prime Minister of Australia between 2013 and 2015, having led the centre-right Liberal Party in opposition since 2009. He was ousted as Liberal Party leader by Malcolm Turnbull in September 2015 following poor approval ratings. [1], [3]

Other political roles he has held include Minister for Women, Leader of the House and Minister for Employment and Workplace Relations. Abbott lost his parliamentary seat in the 2019 elections to an independent candidate who “campaigned strongly on a platform of climate change”, according to the Sydney Morning Herald. [1], [4]

Abbott has historically opposed measures t

In [15]:
with open('contrarian_actors.json', 'w') as outfile:
    json.dump(contrarians, outfile)
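
Several names in the list contain non-ASCII characters (for example 'Miklós Zágoni'). By default `json.dump` escapes them as `\uXXXX` sequences, which round-trips correctly but is harder to read in the file. A small sketch of the difference, using `json.dumps` on a single record so it runs without touching the disk:

```python
import json

record = {'name': 'Miklós Zágoni'}

escaped = json.dumps(record)                       # default: ASCII-only output
readable = json.dumps(record, ensure_ascii=False)  # keeps the characters as-is

print(escaped)
print(readable)

# Both forms decode to the same data
print(json.loads(escaped) == json.loads(readable))
```

If `ensure_ascii=False` is used when writing a file, opening it with `encoding='utf-8'` keeps the result independent of the platform's default encoding.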

# 2: Organisations

In [16]:
url = 'https://www.desmogblog.com/global-warming-denier-database'
page = requests.get(url)
tree = html.fromstring(page.content)
tree.make_links_absolute(url) # rewrite relative links in the tree as absolute URLs

In [17]:
# Print the name and the URL of the first contrarian organisation

print(tree.xpath('//*[@id="datatabs-2"]/div/div[2]/div[1]/h2/span/a/text()'))
print(tree.xpath('//*[@id="datatabs-2"]/div/div[2]/div[1]/h2/span/a/@href')) 

['55 Tufton Street']
['https://www.desmogblog.com/55-tufton-street']


### Gather the data in tables

In [18]:
# Table of the names 
names = []
nr_of_pages = 234
for i in range(nr_of_pages):
    path = '//*[@id="datatabs-2"]/div/div[2]/div[{}]/h2/span/a/text()'.format(i+1)
    names.append(tree.xpath(path))

print(len(names))
print(names[-1])

234
['Your Energy America']


In [19]:
print(names[-40:-1])

[['Power the Future'], ['PragerU'], ['Principia Scientific International'], ['Property and Environment Research Center'], ['Reaching America'], ['Reason Foundation'], ['Renewable Energy Foundation'], ['Scaife Family Foundations'], ['Science and Environmental Policy Project'], ['Science and Public Policy Institute'], ['Seminar Network'], ['Stand Together'], ['State Policy Network'], ['Statistical Assessment Service'], ['Talent Market'], ["TaxPayers' Alliance"], ['Tech Central Station'], ['Texas Public Policy Foundation'], ['The Advancement of Sound Science Coalition'], ['The Daily Caller'], ['The Empowerment Alliance'], ['The Galileo Movement'], ['Third Energy'], ['Thomas Jefferson Institute for Public Policy'], ['Transportation Fairness Alliance'], ['Trees of Liberty'], ['TSAugust'], ['Turning Point USA'], ['U.S. Grains Council'], ['University of Buckingham'], ['US Chamber of Commerce'], ['VA-SEEE'], ['Virginia Institute for Public Policy'], ['Washington Coal Club'], ['Washington Legal

In [20]:
# Table of the URLs 
URLs = []
for i in range(nr_of_pages):
    path = '//*[@id="datatabs-2"]/div/div[2]/div[{}]/h2/span/a/@href'.format(i+1)
    URLs.append(tree.xpath(path)[0])
    
print(len(URLs))
print(URLs[-1])

234
https://www.desmogblog.com/your-energy-america


In [21]:
# Combine name with URL

contrarian_organisations = []

for i in range(len(names)):
    contrarian_organisations.append({'name': names[i][0],
                                  'url': URLs[i]})
    
print(contrarian_organisations[0])

{'name': '55 Tufton Street', 'url': 'https://www.desmogblog.com/55-tufton-street'}


### Extract the page content

In [22]:
page = requests.get(contrarian_organisations[0]['url'])
print("We extracted the following page:\n%s" % page.url)
print(page) # code 200 means all is good \o/ http://docs.python-requests.org/en/master/api/#requests.Response
tree = html.fromstring(page.content)

We extracted the following page:
https://www.desmogblog.com/55-tufton-street
<Response [200]>


In [23]:
# Extract the text of the main content container (the element with id="content")
overview = '\n\n'.join([p.text_content().strip() for p in 
                       tree.xpath('//*[@id="content"]')]).strip()

print(overview[310:1000])

ience denial group, the Global Warming Policy Foundation. Information on the building's residents can be found below. [1]

The building itself is owned by Richard Smith, a businessman who runs an aerospace company called HR Smith Group and a former trustee of the pro-Brexit Politics and Economics Research Trust founded by former Vote Leave and Taxpayers' Alliance CEO Matthew Elliott. It was purchased in 2009 by Specmat, one of Smith’s technology manufacturing companies. While he keeps a low profile, Smith is perhaps best known for flying former Prime Minister David Cameron to his home in Shobdon, Herefordshire, in 2007. Smith is associated with several of the organisations at 55 Tu


In [24]:
# Extract page content
for contrarian in contrarian_organisations:
    # Visit landing page
    page = requests.get(contrarian['url'])
    
    # Parse the HTML tree. We will ignore any unicode errors for the moment.
    tree = html.fromstring(page.content.decode('utf-8', errors='ignore'))
    
    # Extract the text of the main content container (id="content")
    content = '\n\n'.join([p.text_content().strip() for p in 
                           tree.xpath('//*[@id="content"]')]).strip()
    
    # Add text content to dictionary
    contrarian['content'] = content


In [31]:
contrarian_organisations[0]['content'][0:2000]

"55 Tufton Street\n                                                      \n\n\n\n\n        \n    55 Tufton\xa0Street\n\nBackground\n\nThe Westminster building located at 55 Tufton Street is home to a small but influential network of libertarian, pro-Brexit thinktanks and lobby groups,\xa0including the UK's principal climate science denial group, the Global Warming Policy Foundation. Information on the building's residents can be found below. [1]\n\nThe building itself is owned by Richard Smith, a businessman who runs an aerospace company called HR Smith Group and a former trustee of the pro-Brexit Politics and Economics Research Trust founded by former Vote Leave and Taxpayers' Alliance CEO Matthew Elliott. It was purchased in 2009 by Specmat, one of Smith’s technology manufacturing companies. While he keeps a low profile, Smith is perhaps best known for flying former Prime Minister David Cameron to his home in Shobdon, Herefordshire, in 2007. Smith is associated with several of the or

In [26]:
# Parse relevant content 
regex = r'\b\n{2,}(.+?)\n\s+PRINT\n\s+SUBSCRIBE'

info  = re.search(regex, contrarian_organisations[200]['content'], re.S)[1]
print(info)

Background

The Renewable Energy Foundation is a UK charity that aims to “promote sustainable development for the benefit of the public by means of energy conservation and the use of renewable energy”, according to its website. It calls for a “structured energy policy that is both ecologically sensitive and practical”. [1]

The group regularly publishes data on renewable energy generation in the UK and its reports have been covered in national newspapers, including The Times and The Telegraph. [2] 

Although the REF claims not to be opposed to the development of wind energy, it has faced accusations of being an anti-wind campaign group by green energy companies. [3]

The charity was founded in 2004 to fight against what it described as the “grotesque political push” for wind energy in the UK, with TV presenter Noel Edmonds originally serving as its chairman. Edmonds said at the time that he joined the group because of the threat of wind farm developments near his home in Devon. [4]

In

In [27]:
# Parse relevant content 

for i, contrarian in enumerate(contrarian_organisations):
    try:
        info  = re.search(regex, contrarian['content'], re.S)[1]
        # Add text content to dictionary
        contrarian['info'] = info
    except TypeError:
        contrarian['info'] = None
        print('There was no relevant page content for {} (Index {}).'.format(contrarian['name'], i))

In [28]:
print('We extracted the names and profiles of {} climate change contrarian organisations. One profile looks like this:'.format(len(contrarian_organisations)))
print('\n'+ contrarian_organisations[0]['name'])
print('\n'+ contrarian_organisations[0]['url'])
print('\n'+ contrarian_organisations[0]['info'])

We extracted the names and profiles of 234 climate change sceptical organisations. One profile looks like this:

55 Tufton Street

https://www.desmogblog.com/55-tufton-street

Background

The Westminster building located at 55 Tufton Street is home to a small but influential network of libertarian, pro-Brexit thinktanks and lobby groups, including the UK's principal climate science denial group, the Global Warming Policy Foundation. Information on the building's residents can be found below. [1]

The building itself is owned by Richard Smith, a businessman who runs an aerospace company called HR Smith Group and a former trustee of the pro-Brexit Politics and Economics Research Trust founded by former Vote Leave and Taxpayers' Alliance CEO Matthew Elliott. It was purchased in 2009 by Specmat, one of Smith’s technology manufacturing companies. While he keeps a low profile, Smith is perhaps best known for flying former Prime Minister David Cameron to his home in Shobdon, Herefordshire, in

In [29]:
with open('contrarian_organisations.json', 'w') as outfile:
    json.dump(contrarian_organisations, outfile)

# 3: Merge the datasets

In [32]:
# Read JSON formatted data
with open('contrarian_actors.json', 'r') as jfile:
    contrarians = json.load(jfile)

with open('contrarian_organisations.json', 'r') as jfile:
    contrarian_organisations = json.load(jfile)

In [33]:
for contrarian in contrarians:
    contrarian['type'] = 'actor'
    
for contrarian in contrarian_organisations:
    contrarian['type'] = 'organisation'

In [34]:
# Merge the data
contrarians.extend(contrarian_organisations)

In [41]:
# Remove content
for entry in contrarians:
    entry.pop('content', None)
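
The tagging, merging and clean-up steps above modify the loaded lists in place. As a minimal sketch, the same result can be produced by one function that builds new records and leaves its inputs untouched. The sample records and the name `merge_profiles` are hypothetical and only serve to illustrate the shape of the data.

```python
import json

# Hypothetical sample records with the same keys as the scraped data
actors = [{'name': 'Jane Doe', 'url': 'https://example.org/jane-doe',
           'content': 'raw page text', 'info': 'Background ...'}]
organisations = [{'name': 'Example Institute',
                  'url': 'https://example.org/example-institute',
                  'content': 'raw page text', 'info': None}]

def merge_profiles(actors, organisations):
    """Return one list with a 'type' tag per record and without 'content'."""
    merged = []
    for kind, entries in (('actor', actors), ('organisation', organisations)):
        for entry in entries:
            # Copy every field except the bulky raw page text
            record = {key: value for key, value in entry.items() if key != 'content'}
            record['type'] = kind
            merged.append(record)
    return merged

merged = merge_profiles(actors, organisations)

# The merged list survives a JSON round trip unchanged
restored = json.loads(json.dumps(merged))
print(restored == merged)
```

Because the inputs are not mutated, re-running the cell does not append the organisations a second time, which can happen with `contrarians.extend(...)` if that cell is executed twice.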

In [43]:
with open('desmog_contrarians.json', 'w') as outfile:
    json.dump(contrarians, outfile)