# Tutorial

In [1]:
from webscraper import GlassdoorWebScraper

I wrote a few docstrings detailing:

GlassdoorWebScraper
    - URL for downloading chromedriver.
    - Glassdoor job page URL.
    - Design of GlassdoorWebScraper.
    - Table of contents of all functions in GlassdoorWebScraper.
    - All functions and their parameters with explanation.
    - Inspiration.
    - Method Resolution Order (MRO).
    - __init__
        - All attributes.

In [2]:
help(GlassdoorWebScraper)

Help on class GlassdoorWebScraper in module webscraper:

class GlassdoorWebScraper(elements.ConfigElements, elements.WebScrapingElements)
 |  GlassdoorWebScraper(keyword, PATH='C:\\Program Files (x86)\\chromedriver.exe')
 |  
 |  A GlassdoorWebScraper obj will be able to configure filters and webscrape.
 |  
 |  Ensure that your chromedriver corresponds correctly to your current
 |  Google chrome version here: 
 |  
 |  https://sites.google.com/a/chromium.org/chromedriver/downloads
 |  
 |  
 |  
 |  Here is the same URL with keyword="data scientist":
 |  
 |  https://www.glassdoor.com/Job/jobs.htm?sc.keyword="data scientist"
 |  &locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=all&
 |  fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=
 |  -1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=
 |  -1&employerSizes=0&applicationType=0&remoteWorkType=0
 |  
 |  
 |  
 |  Design:
 |      
 |      - locators.py and elements.py are split into 2 

## Functions

In [3]:
# Initialize a GlassdoorWebScraper obj.

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # The path to your chromedriver (raw string keeps the backslashes literal).
gd_scraper = GlassdoorWebScraper(keyword="data scientist", PATH=PATH)  # Replace "data scientist" with your keyword.
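
Because Windows paths contain backslashes, it can help to check the chromedriver path before constructing the scraper. The following is a minimal sketch using only the standard library; `chromedriver_path_ok` is a hypothetical helper and is not part of `webscraper`.

```python
from pathlib import Path


def chromedriver_path_ok(path: str) -> bool:
    """Return True if `path` points to an existing file (hypothetical helper)."""
    return Path(path).is_file()


# A raw string keeps every backslash literal, so "\P" and "\c" are not
# interpreted as escape sequences.
PATH = r"C:\Program Files (x86)\chromedriver.exe"

if not chromedriver_path_ok(PATH):
    print(f"chromedriver not found at {PATH!r}; check the PATH argument.")
```

If the check fails, fix the path before passing it to `GlassdoorWebScraper`.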

In [4]:
# Initializes gd_scraper.driver,
# sets an implicit wait of 5 seconds (configurable),
# maximizes the window,
# and loads gd_scraper.URL.

gd_scraper.get(implicitly_wait_time=5, set_implicitly_wait=True)

```python
# If you passed set_implicitly_wait=False,
# you can set the implicit wait yourself (only after calling get()):
gd_scraper.set_implicitly_wait(implicitly_wait_time=5)
```

## Attributes

In [6]:
gd_scraper.PATH  # The path to your chromedriver.

'C:\\Program Files (x86)\\chromedriver.exe'

In [9]:
gd_scraper.URL_part_1  # First part of the URL.

'https://www.glassdoor.com/Job/jobs.htm?sc.keyword='

In [7]:
gd_scraper.keyword  # Your keyword.

'data scientist'

In [10]:
gd_scraper.URL_part_2  # Second part of the URL.

'&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=                              all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=                              -1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=                              -1&employerSizes=0&applicationType=0&remoteWorkType=0'

In [11]:
gd_scraper.URL  # The full URL with keyword.

'https://www.glassdoor.com/Job/jobs.htm?sc.keyword=data scientist&locT=C&locId=1147401&locKeyword=San%20Francisco,%20CA&jobType=                              all&fromAge=-1&minSalary=0&includeNoSalaryJobs=true&radius=100&cityId=                              -1&minRating=0.0&industryId=-1&sgocId=-1&seniorityType=all&companyId=                              -1&employerSizes=0&applicationType=0&remoteWorkType=0'
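
The outputs above suggest that `URL` is `URL_part_1 + keyword + URL_part_2`, with the keyword inserted unencoded and with runs of spaces inside `URL_part_2`. As a rough sketch of how an equivalent, cleaner URL could be assembled with the standard library (`build_search_url` is a hypothetical helper, and `URL_PART_2` here is a shortened stand-in for the real attribute):

```python
from urllib.parse import quote

URL_PART_1 = "https://www.glassdoor.com/Job/jobs.htm?sc.keyword="
# Shortened stand-in for gd_scraper.URL_part_2, including the stray spaces.
URL_PART_2 = "&locT=C&locId=1147401&jobType=        all&fromAge=-1"


def build_search_url(keyword: str) -> str:
    """Percent-encode the keyword and drop stray whitespace (hypothetical helper)."""
    tail = "".join(URL_PART_2.split())  # remove every run of whitespace
    return URL_PART_1 + quote(keyword) + tail


print(build_search_url("data scientist"))
```

Browsers usually tolerate the unencoded form, so this is only relevant if you reuse the URL outside the scraper.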

In [12]:
gd_scraper.driver  # The driver; created when you call get() and re-created on every subsequent get() call.

<selenium.webdriver.chrome.webdriver.WebDriver (session="3ec82fe0b4778cd7a001a593c42ebd5e")>

In [13]:
gd_scraper.filters  # A dictionary of all possible filters and their filter options.
# Created after you call init_filters(_filter=input).

{'jobtypes': {'all_job_types': 4867,
  'full_time': 4665,
  'part_time': 97,
  'contract': 19,
  'internship': 41,
  'temporary': 11,
  'entry_level': 34},
 'postdates': {'posted_any_time': 10043,
  'last_day': 304,
  'last_3_days': 565,
  'last_week': 1620,
  'last_2_weeks': 2747,
  'last_month': 4807},
 'salaries': {},
 'radii': ['exact_location',
  '5_miles',
  '10_miles',
  '15_miles',
  '25_miles',
  '50_miles',
  '100_miles'],
 'cityids': {'all_cities': 3834,
  'fremont_ca': 129,
  'menlo_park_ca': 318,
  'mountain_view_ca': 237,
  'palo_alto_ca': 239,
  'redwood_city_ca': 262,
  'san_francisco_ca': 1404,
  'san_jose_ca': 321,
  'santa_clara_ca': 184,
  'south_san_francisco_ca': 518,
  'sunnyvale_ca': 222},
 'industries': {'all_industries': 3459,
  'biotech_&_pharmaceuticals': 1153,
  'marketing_&_advertising': 52,
  'consulting': 100,
  'recruiting_&_staffing': 178,
  'education_&_schools': 81,
  'banking_&_financial_services': 98,
  'government': 36,
  'health_care_&_hospitals'
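
Most entries in `gd_scraper.filters` map option names to integers, which appear to be listing counts, so ordinary dict operations can summarize them. A small sketch using the `jobtypes` values copied from the output above; `largest_option` is a hypothetical helper, and it only applies to dict-valued filters (not to list-valued ones such as `radii`).

```python
# Values copied from gd_scraper.filters["jobtypes"] in the output above.
jobtypes = {
    "all_job_types": 4867,
    "full_time": 4665,
    "part_time": 97,
    "contract": 19,
    "internship": 41,
    "temporary": 11,
    "entry_level": 34,
}


def largest_option(options: dict, skip_prefix: str = "all_") -> str:
    """Return the option with the highest value, ignoring the "all_..." aggregate."""
    candidates = {k: v for k, v in options.items() if not k.startswith(skip_prefix)}
    return max(candidates, key=candidates.get)


print(largest_option(jobtypes))  # full_time
```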

In [14]:
# You do not need this attribute directly, but
# its keys list all the stateful filters:
# gd_scraper.get_join_filters.keys().

gd_scraper.get_join_filters  # A dictionary of configurations/functions for all stateful filters.

{'jobtypes': {'get': <bound method ConfigElements.get_filters_jobtypes of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>,
  'join': <bound method ConfigElements.join_filters_jobtypes of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>,
  'is_salary': False,
  'is_more': False,
  'change': <bound method GlassdoorWebScraper.change_jobtype_to of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>},
 'postdates': {'get': <bound method ConfigElements.get_filters_postdates of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>,
  'join': <bound method ConfigElements.join_filters_postdates of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>,
  'is_salary': False,
  'is_more': False,
  'change': <bound method GlassdoorWebScraper.change_postdate_to of <webscraper.GlassdoorWebScraper object at 0x000001C5D233FC40>>},
 'salaries': {'get': <bound method ConfigElements.get_filters_minsalaries of <webscraper.GlassdoorWebScraper object 

## Functions Continued

```python
# This will update the keyword and URL simultaneously, 
# but it will only apply the next time you call get().
gd_scraper.update_keyword_and_URL("business analyst")
```

In [5]:
# Initialize all possible filters.
# Note: should be called right after get().
# Note: you can specify a _filter; if you do,
# make sure it is a stateful filter.

# For more info on stateful filters refer to:
# diagrams/filters.eddx or 
# diagrams/filters.png.
gd_scraper.init_filters(_filter=None)

Cannot initialize filter salaries.


+ **Note**: Always initialize a filter before changing it.

In [23]:
# See all the change functions in GlassdoorWebScraper in
# diagrams/structure.eddx or diagrams/structure.png.
# Note: there is one change function per key in
# gd_scraper.filters.keys().
# Note: the valid arguments for each change function are listed in the
# gd_scraper.filters dict.

# Here is an example:
print(gd_scraper.filters.keys(), end="\n\n")

print(gd_scraper.filters["jobtypes"], end="\n\n")

print(gd_scraper.filters["jobtypes"].keys())

gd_scraper.change_jobtype_to("full_time")

dict_keys(['jobtypes', 'postdates', 'salaries', 'radii', 'cityids', 'industries', 'job_functions', 'seniority_labels', 'companies', 'company_sizes', 'sortbys'])

{'all_job_types': 4867, 'full_time': 4665, 'part_time': 97, 'contract': 19, 'internship': 41, 'temporary': 11, 'entry_level': 34}

dict_keys(['all_job_types', 'full_time', 'part_time', 'contract', 'internship', 'temporary', 'entry_level'])
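
Since the valid arguments live in `gd_scraper.filters`, an option can be checked before it is passed to a change function. This is only a sketch: `check_option` is a hypothetical helper, and `sample_filters` is a trimmed copy of the structure printed above.

```python
def check_option(filters: dict, filter_type: str, option: str) -> None:
    """Raise ValueError if `option` is not listed under `filter_type` (hypothetical helper)."""
    if filter_type not in filters:
        raise ValueError(f"Unknown filter type: {filter_type!r}")
    if option not in filters[filter_type]:
        valid = ", ".join(filters[filter_type])
        raise ValueError(
            f"{option!r} is not a valid {filter_type} option; choose from: {valid}"
        )


# Trimmed copy of the gd_scraper.filters structure shown above.
sample_filters = {
    "jobtypes": {"all_job_types": 4867, "full_time": 4665, "part_time": 97},
    "radii": ["exact_location", "5_miles", "10_miles"],
}

check_option(sample_filters, "jobtypes", "full_time")  # passes silently
check_option(sample_filters, "radii", "5_miles")  # works for list-valued filters too
```

With a real scraper you would pass `gd_scraper.filters` instead of `sample_filters`, then call the matching change function.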


+ **Note**: If you change several filters in a row, you must re-initialize each filter right before changing it. The init_change_filter(filter_type) function handles this.

```python
# For example, if you want to change postdate after you changed
# jobtype, then do this:
print(gd_scraper.filters.keys())
gd_scraper.init_filters("postdates")
print(gd_scraper.filters["postdates"].keys())
gd_scraper.change_postdate_to("last_week")  # Any key from gd_scraper.filters["postdates"].
```

```python
# change_salary_to() takes 2 inputs: begin_salary and end_salary.
# begin_salary corresponds to "left_slider" and 
# end_salary corresponds to "right_slider".
print(gd_scraper.filters.keys())
gd_scraper.init_filters("salaries")
print(gd_scraper.filters["salaries"])
gd_scraper.change_salary_to(begin_salary, end_salary)
```

```python
# If the salary filter dropdown has a checkbox for
# including or excluding data with no salary info, then
# this function is applicable and it works the same way as 
# easy_apply_work_home() below.
gd_scraper.include_no_salary_data(include=True)  # Pass a bool: True to include, False to exclude.
```

```python
# If you want to reset the salary sliders, 
# you may call this function.
# is_both decides if both sliders will be reset (if this 
# parameter is True, then is_left's value doesn't matter).
# If is_both is False, then is_left decides whether you 
# reset the left or right slider.
gd_scraper.reset_salary_slider(is_both=True, is_left=True)
```
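
The interaction between `is_both` and `is_left` can be written out as a small pure function. This sketch only mirrors the rule described in the comments above; `sliders_to_reset` is a hypothetical name, and it does not show how the library itself implements the reset.

```python
def sliders_to_reset(is_both: bool = True, is_left: bool = True) -> list:
    """Return which sliders the described rule would reset (hypothetical helper)."""
    if is_both:
        # is_left is ignored when both sliders are reset.
        return ["left_slider", "right_slider"]
    return ["left_slider"] if is_left else ["right_slider"]


print(sliders_to_reset(is_both=True, is_left=False))   # ['left_slider', 'right_slider']
print(sliders_to_reset(is_both=False, is_left=True))   # ['left_slider']
print(sliders_to_reset(is_both=False, is_left=False))  # ['right_slider']
```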

In [26]:
# First stateless filter.
# This one configures the Easy Apply Only
# and the Work From Home Only.

# is_eao decides whether you edit the Easy Apply Only or
# the Work From Home Only.
# will_apply decides if you want it on or off.
# e.g. if it is on, and you call this function to set it on, 
# it will stay on.
gd_scraper.easy_apply_work_home(is_eao=True, will_apply=True)
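
The "it will stay on" behavior described above is an idempotent toggle: the control is clicked only when its current state differs from the requested one. A sketch of that idea with a stand-in object (`FakeToggle` and `apply_state` are hypothetical and are not Selenium or `webscraper` code):

```python
class FakeToggle:
    """Stand-in for an on/off page control; not a Selenium element."""

    def __init__(self, checked: bool = False):
        self.checked = checked
        self.clicks = 0

    def click(self) -> None:
        self.checked = not self.checked
        self.clicks += 1


def apply_state(toggle: FakeToggle, will_apply: bool) -> None:
    """Click only when the current state differs from the requested one."""
    if toggle.checked != will_apply:
        toggle.click()


toggle = FakeToggle(checked=True)
apply_state(toggle, will_apply=True)   # already on, so nothing happens
print(toggle.checked, toggle.clicks)   # True 0
apply_state(toggle, will_apply=False)  # state differs, so one click
print(toggle.checked, toggle.clicks)   # False 1
```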

In [27]:
# Second stateless filter.
gd_scraper.change_rating_to(3)  # Ratings range from 1 to 4.

In [28]:
gd_scraper.scrape_jobs(1)  # Specify the number of job listings you want to scrape.

Unnamed: 0,company,job title,headquarters,salary estimate,job type,size,founded,type,industry,sector,revenue,job description
0,Metromile\n4.2,Staff Data Scientist,"San Francisco, CA",$118K - $193K (Glassdoor est.),Job Type : Full-time,201 to 500 employees,2011,company - public,insurance carriers,insurance,unknown / non-applicable,About Us\n\nOn the off chance you've thought a...
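
Several scraped fields arrive as raw strings, for example the rating appended to the company name and the salary range in `salary estimate`. A sketch of post-processing those two fields with the standard library, assuming the string shapes shown in the row above; `parse_salary_estimate` and `split_company` are hypothetical helpers, not part of `webscraper`.

```python
import re


def parse_salary_estimate(text: str):
    """Return (low, high) in dollars from a string shaped like the one above, else None."""
    match = re.search(r"\$(\d+)K\s*-\s*\$(\d+)K", text)
    if match is None:
        return None
    low, high = (int(value) * 1000 for value in match.groups())
    return low, high


def split_company(text: str):
    """Split a 'name<newline>rating' string; the rating is None if absent."""
    name, _, rating = text.partition("\n")
    return name.strip(), (float(rating) if rating.strip() else None)


print(parse_salary_estimate("$118K - $193K (Glassdoor est.)"))  # (118000, 193000)
print(split_company("Metromile\n4.2"))  # ('Metromile', 4.2)
```

Other salary formats (hourly rates, missing estimates) would need extra handling; the sketch returns `None` when the pattern does not match.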


In [29]:
gd_scraper.clear_filters()  # Clears all filters.

In [None]:
# Closes the webpage;
# call this once you are finished with everything.
gd_scraper.close()
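
To make sure `close()` runs even if a step in between raises, the standard library's `contextlib.closing` can wrap any object that has a `close()` method. The sketch below uses a stand-in class so it runs without a browser; `StandInScraper` is hypothetical and only imitates the `close()` call.

```python
from contextlib import closing


class StandInScraper:
    """Minimal stand-in exposing close(); not the real GlassdoorWebScraper."""

    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True


scraper = StandInScraper()
with closing(scraper):
    pass  # get(), init_filters(), scrape_jobs(...) would go here

print(scraper.closed)  # True
```

With the real class, the same pattern would be `with closing(gd_scraper): ...`, assuming `close()` takes no arguments as in the cell above.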