## Requirements
* [Chrome Driver](https://chromedriver.chromium.org/downloads)
* Selenium

### Considerations
* Consider using `webdriver-manager` to streamline installation of Chrome Driver

In [105]:
import requests
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

In [106]:
BASE_COMPANY_URL = "https://sloanreview.mit.edu/culture500/company/c"
COMPANY_VALUES = {
    "Agility": "agility",
    "Collaboration": "collaboration",
    "Customer": "customer",
    "Diversity": "inclusivity",
    "Execution": "execution",
    "Innovation": "innovation",
    "Integrity": "integrity",
    "Performance": "performance",
    "Respect": "respect",
}
PATH_TO_CHROME_DRIVER = "/Users/jamosa/bin/chromedriver"
CHROME_OPTIONS = Options()
# CHROME_OPTIONS.add_argument("--headless")

## Company URL
The URLs for all companies follow the same pattern: the base URL followed by a numeric index. Indices start at an offset of 100, so the first company, `3M`, is found at `https://sloanreview.mit.edu/culture500/company/c101`. Information for every company listed on the Culture 500 page can therefore be reached by replacing the last 3 characters of the URL with a value from 101 to 631, inclusive.

In [107]:
company_indices = range(101, 632)
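
As a minimal sketch of the URL scheme described above, the indices can be expanded into full company URLs. The name `company_urls` is illustrative and not part of the original notebook; the base URL is repeated so the snippet stands alone.

```python
BASE_COMPANY_URL = "https://sloanreview.mit.edu/culture500/company/c"

# 101 through 631 inclusive; range() excludes its stop value, hence 632
company_indices = range(101, 632)

# Hypothetical list (not in the original notebook): one URL per company index
company_urls = [f"{BASE_COMPANY_URL}{ndx}" for ndx in company_indices]

print(len(company_urls))  # 531
print(company_urls[0])    # https://sloanreview.mit.edu/culture500/company/c101
```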

## Singleton Inspection
Now we need to examine a single instance and see whether we can scrape information from the site. Two questions to answer:
* Can we successfully scrape information about a single instance?
* Do all instances follow the same structure?

In [117]:
chrome_driver = webdriver.Chrome(executable_path=PATH_TO_CHROME_DRIVER, options=CHROME_OPTIONS)
company_number = 101 # Substitute with iterable once implementation works
culture500_company = f"{BASE_COMPANY_URL}{company_number}"
cur_company_category_html = ""

# Start off with one company and one category
chrome_driver.get(culture500_company) # open webpage

# All buttons associated with company values have the same class name
# Grab all buttons so a click action can be simulated
cv_buttons = chrome_driver.find_elements_by_class_name("sc-bdVaJa.sc-EHOje.bubble-sidebar-button.sc-gqjmRU.eiaGtK")

# Start off by getting information about one value: `agility`
# Simulate the click action
time.sleep(1) # Hacky approach, but should wait for page to be fully loaded before clicking
cv_buttons[0].click()

# Grab the updated page's content
cur_company_cur_value_html = chrome_driver.page_source

# Initialize web scraper with the current company's HTML page
scraper = BeautifulSoup(cur_company_cur_value_html, "html.parser")

# Get the company's name (the scraper must exist before it can be queried)
company_name_div = scraper.find("div", {"class": "sc-bdVaJa sc-bwzfXH jruoDg"})
company_name = company_name_div.get_text()

# Grab information about the frequency score and sentiment score
# Both have the same class name across all company values
freq_sent_score = scraper.find("div", {"class": "sc-bdVaJa sc-htpNat hALkol"})
freq_sent_score = freq_sent_score.get_text()
freq_sent_score = freq_sent_score.replace("Frequency Score: ", "")
freq_sent_score = freq_sent_score.replace("Sentiment Score: ", " ")
frequency_score, sentiment_score = freq_sent_score.split(" ")

print(f"Company Name: {company_name}")
print(f"Agility: Frequency Score - {frequency_score} | Sentiment Score: {sentiment_score}")
chrome_driver.quit()

Company Name: 3M
Agility: Frequency Score - +2.0 | Sentiment Score: -1.6
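
The fixed `time.sleep(1)` in the cell above is flagged as hacky. One alternative is to poll until a condition holds, giving up after a timeout. The helper below is a minimal standard-library sketch of that idea; `wait_until` and its parameters are hypothetical names, not part of the notebook or of Selenium.

```python
import time

def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Return the truthy value so callers can use it directly
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a live driver, `condition` could be a lambda that looks up the value buttons and returns the (possibly empty) list, so the click only happens once the buttons exist. Selenium also ships `WebDriverWait` in `selenium.webdriver.support.ui`, which follows the same polling pattern.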


In [113]:
# res = scraper.find("div", {"class": "sc-bdVaJa sc-htpNat hALkol"})
# res = res.get_text()
# company_name_div.get_text()
cv_buttons[0].click()
freq_sent_score = scraper.find("div", {"class": "sc-bdVaJa sc-htpNat hALkol"})
freq_sent_score


Help on method click in module selenium.webdriver.remote.webelement:

click() method of selenium.webdriver.remote.webelement.WebElement instance
    Clicks the element.



In [97]:
res.replace("Frequency Score: ", "").replace("Sentiment Score: ", " ").split(" ")

['+2.0', '-1.6']
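
The chained `replace`/`split` calls work, but they depend on the exact spacing of the text. As a hedged alternative, a regular expression can extract both scores and convert them to floats. The input string below is an assumption about the div's text, inferred from the replace and split steps above; `parse_scores` and `SCORE_PATTERN` are hypothetical names.

```python
import re

# Matches a signed decimal after each label; whitespace between parts is optional
SCORE_PATTERN = re.compile(
    r"Frequency Score:\s*([+-]?\d+(?:\.\d+)?)\s*"
    r"Sentiment Score:\s*([+-]?\d+(?:\.\d+)?)"
)

def parse_scores(text):
    """Return (frequency_score, sentiment_score) as floats, or None if no match."""
    match = SCORE_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Assumed shape of the scraped text for 3M / Agility
print(parse_scores("Frequency Score: +2.0Sentiment Score: -1.6"))  # (2.0, -1.6)
```

Returning `None` on a mismatch makes it easier to spot companies whose pages do not follow the expected structure.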

In [120]:
chrome_driver.quit()

In [14]:
with open("tmp.html", "w") as file:
    file.write(cur_company_category_html)

In [18]:
company_3m = f"{BASE_COMPANY_URL}101"
with requests.Session() as session:
    # Use the session that was opened rather than the module-level requests.get
    response = session.get(company_3m)


In [19]:
print(response.content.decode("utf-8"))

<!doctype html>
  <html lang="en">
  <head>
    <meta charset="utf-8"/>
    <base href="/culture500/" />
    <link rel="shortcut icon" href="https://sloanreview.mit.edu/content/plugins/culture-500/favicon.png"/>
    <meta name="viewport" content="width=device-width,initial-scale=1,shrink-to-fit=no"/>
    <meta name="theme-color" content="#000000"/>
    <link rel="manifest" href="https://sloanreview.mit.edu/content/plugins/culture-500/manifest.json?20190722"/>
    <link href="https://fonts.googleapis.com/css?family=Roboto|Work+Sans:400,600" rel="stylesheet"/>
    <link href="https://sloanreview.mit.edu/content/plugins/culture-500/static/css/mit-c500-main.css?20190722" rel="stylesheet">
    <link rel="apple-touch-icon-precomposed" sizes="144x144" href="/static/apple-touch-icon-144x144-precomposed.png" />
    <link rel="apple-touch-icon-precomposed" sizes="114x114" href="/static/apple-touch-icon-114x114-precomposed.png" />
    <link rel="apple-touch-icon-precomposed" sizes="72x72" href="/

In [64]:
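# CHROME_APP is not defined in this notebook; use the driver path and options set above
chrome_driver = webdriver.Chrome(executable_path=PATH_TO_CHROME_DRIVER, options=CHROME_OPTIONS)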
chrome_driver.get(company_3m + "?cv=agility")
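
The cell above appends `?cv=agility` to the company URL. A small sketch of how such URLs could be built for any company and value is shown below, reusing the `COMPANY_VALUES` mapping from earlier. `company_value_url` is a hypothetical helper, and the notebook does not confirm that the page honors the `cv` parameter for every value.

```python
BASE_COMPANY_URL = "https://sloanreview.mit.edu/culture500/company/c"
COMPANY_VALUES = {
    "Agility": "agility",
    "Collaboration": "collaboration",
    "Customer": "customer",
    "Diversity": "inclusivity",
    "Execution": "execution",
    "Innovation": "innovation",
    "Integrity": "integrity",
    "Performance": "performance",
    "Respect": "respect",
}

def company_value_url(company_number, value_name):
    """Build the URL for one company and one cultural value (hypothetical helper)."""
    return f"{BASE_COMPANY_URL}{company_number}?cv={COMPANY_VALUES[value_name]}"

print(company_value_url(101, "Agility"))
```

Note that the display name `Diversity` maps to the slug `inclusivity`, so the lookup goes through the dictionary rather than lowercasing the name.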

In [65]:
soup = BeautifulSoup(chrome_driver.page_source, "html.parser")

In [66]:
soup.select("div.bubble-tooltip")

[]

In [71]:
company_values = chrome_driver.find_elements_by_class_name("sc-bdVaJa.sc-EHOje.bubble-sidebar-button.sc-gqjmRU.eiaGtK")
agility = company_values[0]

In [75]:
agility.click()

['__bool__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

In [77]:
agility.text

'Agility\nEmployees can respond quickly and effectively to changes in the marketplace and seize new opportunities.'

In [79]:
"Frequency Score" in chrome_driver.page_source

True

In [148]:
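def partition(value, num_slices):
    """Creates an approximately even partition of some whole number."""
    hop = value // num_slices
    # num_slices slices need num_slices + 1 boundaries: the starts plus the end point
    indices = [ndx * hop for ndx in range(num_slices)]
    # list.append returns None, so append separately instead of chaining it
    indices.append(value)

    return indices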

In [151]:
res = partition(531, 12)
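
The call above splits 531, the number of company indices, into 12 parts, presumably so the companies can be scraped in batches. As a sketch of that idea, the helper below splits the index range directly into contiguous chunks. It uses `divmod` rather than the notebook's hop-based approach, and `chunk_ranges` is a hypothetical name.

```python
def chunk_ranges(start, stop, num_chunks):
    """Split range(start, stop) into `num_chunks` contiguous ranges of near-equal size."""
    total = stop - start
    base, extra = divmod(total, num_chunks)
    ranges = []
    lower = start
    for ndx in range(num_chunks):
        # The first `extra` chunks take one additional item each
        size = base + (1 if ndx < extra else 0)
        ranges.append(range(lower, lower + size))
        lower += size
    return ranges

chunks = chunk_ranges(101, 632, 12)
print(len(chunks), chunks[0], chunks[-1])  # 12 range(101, 146) range(588, 632)
```

Chunk sizes differ by at most one, and together the chunks cover every index from 101 to 631 exactly once.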