# Pulling Data from Glassdoor API
* * * * *
This file will setup and test the API GET request to pull company ratings from Glassdoor's API. It will then load all of the companies I am interested in collecting ratings on, run the request, and then save the ratings data to a new csv.

In [2]:
# Import required libraries
import requests
import urllib
import json
from __future__ import division
import math
import os
import re
from operator import itemgetter
from itertools import groupby

## Constructing API GET Request

Setting up the basic variables to call.

In [2]:
# set base url
base_url="http://api.glassdoor.com/api/api.htm?"

# set response format
response_format=".json"

# set search parameters
search_params = {"v":"1",
                "format":"json",
                "t.k": "Glassdoor Partner ID Goes Here",
                "t.p": "Glassdoor Partner Key Goes Here",
                "userip":"My IP Address",
                "useragent":"Chrome/53.0.2785.116",
                "action":"employers",
                 "q":"3M"}    

In [3]:
#testing the url works
r = requests.get(base_url+response_format, params=search_params, headers={"User-Agent":"Chrome/53.0.2785.116"})

print(r.url)

http://api.glassdoor.com/api/api.htm?.json&useragent=Chrome%2F53.0.2785.116&format=json&action=employers&v=1&q=3M&t.k=eoptK2zAcqi&userip=136.152.142.46&t.p=98187


In [4]:
response_text= r.text

In [5]:
# Convert JSON response to a dictionary
data=json.loads(response_text)
data.keys()

dict_keys(['status', 'jsessionid', 'success', 'response'])

In [6]:
#looking at what it calls
data['response']['employers'][0]

{'careerOpportunitiesRating': '3.4',
 'ceo': {'image': {'height': 200,
   'src': 'https://media.glassdoor.com/people/sqll/446/3m-inge-g-thulin.png',
   'width': 200},
  'name': 'Inge G. Thulin',
  'numberOfRatings': 482,
  'pctApprove': 86,
  'pctDisapprove': 14,
  'title': 'Chairman President & CEO'},
 'compensationAndBenefitsRating': '3.7',
 'cultureAndValuesRating': '3.7',
 'exactMatch': True,
 'featuredReview': {'attributionURL': 'http://www.glassdoor.com/Reviews/Employee-Review-3M-RVW12548978.htm',
  'cons': 'bad management, favoritism, bending of company policies,',
  'currentJob': True,
  'headline': 'operator',
  'id': 12548978,
  'jobTitle': 'Employee',
  'location': '',
  'overall': 4,
  'overallNumeric': 4,
  'pros': 'Good pay, overtime, benefits, good work environment',
  'reviewDateTime': '2016-11-03 13:19:49.283'},
 'id': 446,
 'industry': 'Chemical Manufacturing',
 'industryId': 200068,
 'industryName': 'Chemical Manufacturing',
 'isEEP': False,
 'name': '3M',
 'numberOf

## Loading dataset with companies to search

Loading previously downloaded dataset with all the companies that need to be searched in the Glassdoor ratings database.

In [3]:
print(os.path.abspath(os.curdir)) #check current wd

/Users/Rosie/Box Sync/Berkeley/Fall 2016/PolSci239T/PS239T-final-project/Code


In [4]:
import csv
companylist=[]
os.chdir("..") #move 1 wd up
with open('./Data/01_glassdoor_ceo_pay.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        companylist.append(row['company'])

In [6]:
#making sure this worked as expected and produced a list
type(companylist)

list

In [9]:
#Should list all the companies, and it does.
print(companylist)

['3M', 'Abbott Labs', 'AbbVie', 'Accenture', 'ACE Group', 'ADM', 'Adobe', 'ADP', 'ADT Security Services', 'Advance Auto Parts', 'AEP', 'AES Corporation', 'Aetna', 'Aflac', 'Agilent Technologies', 'AIG', 'Aimco Apartment Homes', 'Air Products', 'Airgas', 'Akamai', 'Alcoa', 'Allergan', 'Alliance Data', 'Allstate', 'Altera', 'Altria', 'Amazon.com', 'Ameren', 'American Airlines', 'American Express', 'American Tower', 'Ameriprise', 'AmerisourceBergen', 'AMETEK', 'Amgen', 'Anadarko Petroleum', 'Analog Devices', 'Anthem', 'Aon', 'Apache', 'Apple', 'Applied Materials', 'Assurant', 'AT&T', 'Autodesk', 'AutoNation', 'AutoZone', 'Avago Technologies', 'AvalonBay', 'Avery Dennison', 'Baker Hughes', 'Bank of America', 'Bard', 'Baxter', 'BB&T', 'BD', 'Bed Bath & Beyond', 'Best Buy', 'Biogen Idec', 'BlackRock', 'BNY Mellon', 'Boeing', 'BorgWarner', 'Boston Scientific', 'Bristol-Myers Squibb', 'Broadcom', 'Brown-Forman', 'C.H. Robinson Worldwide', 'CA Technologies', 'Cablevision Systems', 'Cameron', 'C

## Testing & accounting for errors in the GET request

After running the code a few times, I discovered there were some errors in how the company names were stored, which generated errors when running the API code (all should now be fixed). This was some code I created to test for errors and keep track of them while continuing to run through the for loop.

In [10]:
search_params = []
search_params = {"v":"1",
                "format":"json",
                "t.k": "eoptK2zAcqi",
                "t.p": "98187",
                "userip":"136.152.142.46",
                "useragent":"Chrome/53.0.2785.116",
                "action":"employers",
                 "q":"Accenture"}
r = requests.get(base_url+response_format, params=search_params, headers={"User-Agent":"Chrome/53.0.2785.116"})
response_text= r.text
data=json.loads(response_text)
try:
    data1 = data['response']['employers'][0]
    print(data1)
except IndexError:
    print("broken")

{'squareLogo': 'https://media.glassdoor.com/sqll/4138/accenture-squarelogo-1446040583582.png', 'isEEP': True, 'featuredReview': {'jobTitle': 'Infrastructure Manager', 'currentJob': True, 'overallNumeric': 5, 'cons': 'This is not an opportunity for those that do not want to work. At Accenture everyone has a load of work to do and often times under very tight customer time lines. It is a challenge that I enjoy but many might find it stressful.', 'pros': 'There are a lot of pros working for Accenutre. They have great career opportunities, a never ending supply of interesting work, competitive compensation, wonderful benefits, great people, wonderful training programs, a tremendous number of brilliant professionals in their fields ready to help, and great core values.', 'jobTitleFromDb': 'Infrastructure Manager', 'location': 'West Lafayette, IN', 'headline': 'Alot of Jobs One Home', 'reviewDateTime': '2015-09-21 06:13:54.03', 'id': 8020149, 'overall': 5, 'attributionURL': 'http://www.glass

## Put it all together

The loop below adds a new search parameter from the company list to be pulled from the API, makes note of errors, and stores the data in a new dataset.

In [11]:
import time
final_dic = []

for companyname in companylist:
    #Insert company name into search parameters
    search_params = []
    search_params = {"v":"1",
                "format":"json",
                "t.k": "eoptK2zAcqi",
                "t.p": "98187",
                "userip":"136.152.142.46",
                "useragent":"Chrome/53.0.2785.116",
                "action":"employers",
                 "q":str(companyname)}
    
    #Run through API
    r = requests.get(base_url+response_format, params=search_params, headers={"User-Agent":"Chrome/53.0.2785.116"})
    response_text= r.text
    data=json.loads(response_text)
    company = {}
    
    try:
        data1 = data['response']['employers'][0]

        #Collect relevant data from first company in the list
        company["id"] = data1["id"]
        company["company_ratings"] = data1["name"]
        company["industry"] = data1["industry"]
        company["numberOfRatings"] = data1["numberOfRatings"]
        company["ceo2016"] = data1["ceo"]["name"]
        company["ceoratings"] = data1["ceo"]["numberOfRatings"]
        company["ceopctApprove"] = data1["ceo"]["pctApprove"]
        company["overall"] = data1["overallRating"]
        company["careerops"] = data1["careerOpportunitiesRating"]
        company["comp"] = data1["compensationAndBenefitsRating"]
        company["culturevalues"] = data1["cultureAndValuesRating"]
        company["srleadership"] = data1["seniorLeadershipRating"]
        company["wlb"] = data1["workLifeBalanceRating"]
        company["rectofriend"] = data1["recommendToFriendRating"]
        company["error"] = "None"
        final_dic.append(company)
    
        #check its iterating correctly
        print(companyname + " = " + str(company["company_ratings"]) + " complete")
        
    except IndexError:
        company["company_ratings"] = str(companyname)
        company["error"] = "Index Error"
        final_dic.append(company)
        
        #show it's iterating errors
        print(companyname + " Index Error!!!!!")
        
    except KeyError:
        company["company_ratings"] = str(companyname)
        company["error"] = "Key Error"
        final_dic.append(company)
        
        #show it's iterating errors
        print(companyname + " Key Error!!!!!")
    
    time.sleep(1) #have it rest 1 second between searches

3M = 3M complete
Abbott Labs = Abbott Labs complete
AbbVie = AbbVie complete
Accenture = Accenture complete
ACE Group = ACE Group complete
ADM = ADM complete
Adobe = Adobe complete
ADP = ADP complete
ADT Security Services = ADT Security Services complete
Advance Auto Parts = Advance Auto Parts complete
AEP = AEP complete
AES Corporation = AES Corporation complete
Aetna = Aetna complete
Aflac = Aflac complete
Agilent Technologies = Agilent Technologies complete
AIG = AIG complete
Aimco Apartment Homes = Aimco Apartment Homes complete
Air Products = Air Products complete
Airgas = Airgas complete
Akamai = Akamai complete
Alcoa = Alcoa complete
Allergan = Allergan complete
Alliance Data = Alliance Data complete
Allstate = Allstate complete
Altera = Altera complete
Altria = Altria complete
Amazon.com = Amazon.com complete
Ameren = Ameren complete
American Airlines = American Airlines complete
American Express = American Express complete
American Tower = American Tower complete
Ameriprise = 

In [12]:
print(final_dic) #check that things look correct

[{'ceo2016': 'Inge G. Thulin', 'srleadership': '3.2', 'company': '3M', 'error': 'None', 'comp': '3.7', 'ceoratings': 482, 'numberOfRatings': 1357, 'culturevalues': '3.7', 'ceopctApprove': 86, 'id': 446, 'overall': '3.7', 'industry': 'Chemical Manufacturing', 'careerops': '3.4', 'rectofriend': 75, 'wlb': '3.5'}, {'ceo2016': 'Miles D. White', 'srleadership': '3.3', 'company': 'Abbott Labs', 'error': 'None', 'comp': '3.8', 'ceoratings': 526, 'numberOfRatings': 1019, 'culturevalues': '3.6', 'ceopctApprove': 80, 'id': 12, 'overall': '3.6', 'industry': 'Biotech & Pharmaceuticals', 'careerops': '3.3', 'rectofriend': 75, 'wlb': '3.5'}, {'ceo2016': 'Richard A. Gonzalez', 'srleadership': '3.2', 'company': 'AbbVie', 'error': 'None', 'comp': '3.9', 'ceoratings': 208, 'numberOfRatings': 557, 'culturevalues': '3.5', 'ceopctApprove': 85, 'id': 649837, 'overall': '3.6', 'industry': 'Biotech & Pharmaceuticals', 'careerops': '3.3', 'rectofriend': 71, 'wlb': '3.4'}, {'ceo2016': 'Pierre Nanterme', 'srlead

In [13]:
final_dic[0].keys()

dict_keys(['ceo2016', 'srleadership', 'company', 'error', 'comp', 'ceoratings', 'numberOfRatings', 'culturevalues', 'ceopctApprove', 'id', 'overall', 'industry', 'careerops', 'rectofriend', 'wlb'])

In [14]:
import csv
keys = final_dic[0].keys()

#save dataset to a csv for analysis
with open('./Data/02_glassdoor_ratings.csv', 'w') as output_file:
    dict_writer = csv.DictWriter(output_file, keys)
    dict_writer.writeheader()
    dict_writer.writerows(final_dic)