Web Scraping for In-Demand Data Science Skills


I came across this nice blog post by Jesse Steinweg-Woods about how to scrape for key skills that employers are looking for in data scientists. Jesse showed the graphic below by Swami Chandrasekaran to demonstrate the point that it would take a lifetime (or more) to master all the tools required to qualify for every job listing.

data scientist roadmap

Rather than learn everything, we should focus on the tools most likely to end up on the "requirements" list of a job posting. We can go to a website like Indeed and collect the keywords commonly used in data science postings, then plot each keyword against its frequency, as a function of the city in which one would like to work.

Jesse developed some nice code in Python to:

  • Construct a URL to the search results for job postings matching a given city and state (or nationwide, if none are specified)
  • Extract and tally keywords from data science job postings listed in the search results
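The first of those steps can be sketched with the standard library alone; the `q`/`l` query parameters are an assumption about Indeed's search URL scheme, and the function name is mine, not Jesse's:

```python
from urllib.parse import urlencode

def build_search_url(query, city=None, state=None):
    """Build a search-results URL for a job query, optionally scoped to a city/state."""
    params = {"q": query}
    if city and state:
        params["l"] = f"{city}, {state}"  # omit for a nationwide search
    return "https://www.indeed.com/jobs?" + urlencode(params)

print(build_search_url("data scientist", "New York", "NY"))
# https://www.indeed.com/jobs?q=data+scientist&l=New+York%2C+NY
```

`urlencode` handles the escaping of spaces and commas, so the same helper works for any city/state pair.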

Below were the results for NYC back in 2015, when he wrote this code (credit to Jesse for the graphic).

data scientist top keywords nyc

I decided to apply a similar analysis to Monster, another job listings board. I was curious how similar the results would be to Indeed's. Of course, it has been four years since Jesse's original analysis, so there are more variables in play than the change of platform alone.


Installation

To install the requirements for the scraper, use

pip install -r requirements.txt

I recommend creating a virtual environment so that the proper versions of each package are installed without any conflicts.
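For example, using the standard-library `venv` module (the `.venv` directory name is just a convention):

```shell
python3 -m venv .venv               # create an isolated environment in .venv
. .venv/bin/activate                # activate it (on Windows: .venv\Scripts\activate)
pip install -r requirements.txt     # install the pinned dependencies into the venv
```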

Note: The MonsterTextParser class depends on nltk, the Natural Language Toolkit. After nltk is installed, run the following in a Python console to download the stopword corpus:

import nltk
nltk.download('stopwords')

Then everything should work.
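With the stopword corpus in place, keyword counting boils down to tokenizing a posting's text and tallying what survives the stopword filter. A minimal sketch, with a tiny hardcoded stopword set standing in for `nltk.corpus.stopwords.words('english')` (the function name is illustrative, not the repository's API):

```python
import re
from collections import Counter

# Tiny stand-in stopword set; the real parser would load
# nltk.corpus.stopwords.words("english") after the download above.
STOPWORDS = {"the", "a", "an", "and", "in", "of", "with", "for", "to", "is"}

def count_keywords(text):
    """Lowercase, tokenize, and tally the non-stopword terms in a job description."""
    tokens = re.findall(r"[a-z+#]+", text.lower())  # keeps terms like 'c++' and 'c#'
    return Counter(t for t in tokens if t not in STOPWORDS)

counts = count_keywords("Experience with Python and SQL required. Python preferred.")
print(counts.most_common(1))  # [('python', 2)]
```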

Also, keep the main module in the same directory as your scripts so you can import its classes.


Usage

Take a look at the Jupyter notebook MonsterScraping.ipynb, which walks through the logic and shows examples of how to use the classes to construct queries, store and load search results, and count keywords in the returned results. You can also look at the example scripts in the root directory.


Results

Below are the results I obtained from a search for "Data Scientist" jobs in "New York, NY". There were 179 listings in total.

top keywords for data scientist listings on Monster, NYC 2019

It seems that just like in 2015, Python, R, and SQL reign supreme, but Python has surpassed R in popularity.
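If each bar in the plot is read as the share of listings mentioning a keyword at least once, the computation behind it can be sketched as follows (the function name and data shape are hypothetical, not the repository's actual API):

```python
from collections import Counter

def keyword_frequencies(listings_keywords):
    """Percentage of listings mentioning each keyword at least once.

    `listings_keywords` holds one set of keywords per listing.
    """
    counts = Counter()
    for kws in listings_keywords:
        counts.update(set(kws))  # count each listing at most once per keyword
    total = len(listings_keywords)
    return {kw: 100.0 * n / total for kw, n in counts.most_common()}

freqs = keyword_frequencies([{"python", "sql"}, {"python", "r"}, {"r"}])
print(freqs)
```

Sorting by `most_common()` means the dict iterates from the most to the least frequent keyword, which is the order the bar plot uses.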

Directory Structure

Below is a breakdown of the files in this repository.

| |____data_sci_nyc_results.png         # bar plot of most popular Data Scientist jobs in NYC from Monster
| |____proj6_nb_22_1.png                # bar plot by Jesse showing most popular tools from scraping in 2015
| |____RoadToDataScientist1.png         # Graphic by Swami Chandrasekaran showing the tools needed to master data science
| |____wework_application.png           # Screenshot of a job posting by WeWork on Monster
| |____screenshot.png                   # Screenshot of a list of search results on Monster
| |____data_scientist_nyc_search.json   # Results from Monster searches for Data Scientist jobs in NYC
| |____wework_description.txt           # Job description scraped from a WeWork posting on Monster
|          # Loads data_scientist_nyc_search.json, counts keywords, and plots frequencies
|____requirements.txt                   # Required libraries for running the scripts in main directory
|                    # Tests to ensure the MonsterSearch class works
|          # Tests to ensure the MonsterListing and MonsterLocation classes work
|                  # Finds the keyword frequencies for a single data scientist job posting
|                          # this file
|                         # The main module we've created to organize searches and listings.
|                         # Contains a few constants we use, including list of data science keywords
|____wework_details.txt                 # Details about posting whose description is in wework_description.txt
|____MonsterScraping.ipynb              # Jupyter notebook explaining how I went about scraping


Web scraping with Beautiful Soup to find the data science skills that employers are looking for.





