# Job Sieve
---

Development documentation for scraping multiple job boards to produce useful statistics, a filtered list of high-value prospects, and correlative analytics.

---

## <a name="toc"></a> Table of Contents
1. [Process Job Board](#process_job_board)
   1. [Job Listings](#job_listings)
   2. [Job Posts](#job_posts)
2. [Analytics](#analytics)
   1. [Keyword Frequencies](#keyword_frequencies)
   2. [Resume Correlations](#resume_correlations)
   3. [Filtering Prospects](#filtering_prospects)



```python
# -------------------- LOAD DEPENDENCIES -------------------- #

# Environment hard reset
%reset -f

# Standard math and data libraries
import numpy as np
import pandas as pd

# Plotting libraries
import matplotlib.pyplot as plt
%matplotlib inline

# Libraries for scraping
import requests
import urllib.request
import time
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import lxml.html as lh
import ssl

# Date time for date operations
import datetime

# Levenshtein fuzzy comparisons
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# Import string cleaning functions
import re

# Flask support
from flask import request, jsonify

# Configure paths
from pathlib import Path
# data_path = Path('Datasets')
```




## <a name="process_job_board"></a> [Process Job Board](#toc)

Given a job board and the desired job characteristics, form those characteristics into a query, run it, and aggregate every job posting the query returns.

1. [Job Listings](#job_listings)
2. [Job Posts](#job_posts)



### <a name="job_listings"></a> [Job Listings](#toc)

When using an online job board, any query entered into the search bar returns several pages of job listings. Each listing on those pages is referred to as a card. A card is an HTML element that stores the details identifying the job to the user: the job title, a link to the job posting itself, the company, the location, and a brief summary of the job's key features. This program constructs the query for the user, executes it against the job board, and parses the resulting job cards into Python objects that later steps can build on.
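The card-parsing step can be sketched with the standard library's `html.parser`. The markup assumed below (a `div.card` holding `a.title`, `span.company`, and `span.location`) is hypothetical; a real job board uses its own class names, which change over time, so the selectors would need to match the live page. The notebook's `extract_cards` and `parse_cards` are not reproduced here, and `CardParser`/`parse_card_html` are illustrative names.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Turn hypothetical job-card markup into a list of dicts.

    Assumed markup (illustrative only):
    <div class="card"><a class="title" href="...">Title</a>
      <span class="company">...</span><span class="location">...</span></div>
    """

    FIELDS = ("title", "company", "location")

    def __init__(self):
        super().__init__()
        self.cards = []     # one dict per card, in page order
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "card" in classes:
            self.cards.append({})  # a new card begins
            return
        if not self.cards:
            return  # ignore anything before the first card
        for name in self.FIELDS:
            if name in classes:
                self._field = name
                if tag == "a" and attrs.get("href"):
                    self.cards[-1]["link"] = attrs["href"]
                break

    def handle_data(self, data):
        if self._field and data.strip():
            self.cards[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None  # field text ends at the next closing tag


def parse_card_html(html):
    """Parse a results page and return its cards as dicts."""
    parser = CardParser()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Each card comes back as a dict with `title`, `link`, `company`, and `location` keys when those elements are present, mirroring the shape of `parsed_cards` shown below.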

**Future development:** <br>
- Iterate through every page of job listings returned by the query and add their cards to the collection.
- The initial results give the most relevant job listings (typically ordered by descending relevance) as well as the number of pages the query returned. Using asynchronous techniques, the remaining pages can be requested simultaneously and processed in the order their responses arrive. This avoids waiting on the server for each page in turn, greatly reducing the overhead of calling each query; the program then processes each page's cards as they come back.
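The concurrent page fetching described in the second bullet can be sketched with a thread pool from `concurrent.futures`, swapped in for `asyncio` here because `requests` is a blocking library. `as_completed` yields each future as soon as it finishes, so pages are handled in arrival order rather than request order. The helper names `page_urls` and `fetch_all`, and the `start` offset used for pagination, are assumptions rather than part of the notebook.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlencode


def page_urls(base_url, num_pages, page_size=10):
    """Build one URL per results page.

    Assumption: base_url already has a query string and the board paginates
    with a "start" offset that advances by page_size results per page.
    """
    return [f"{base_url}&{urlencode({'start': i * page_size})}" for i in range(num_pages)]


def fetch_all(urls, fetch, max_workers=8):
    """Request every URL concurrently and yield (url, result) pairs in
    completion order, so each page can be parsed as soon as it arrives."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            yield futures[future], future.result()
```

In the notebook this might be driven as `for url, html in fetch_all(page_urls(base_query, n), lambda u: requests.get(u, timeout=10).text): ...`, assuming `base_query` is a URL string. Passing `fetch` in as a parameter keeps the pooling logic independent of the HTTP client, and a small `max_workers` keeps the request rate modest.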



```python
# -------------------- PARSE JOB CARDS -------------------- #

keywords = ["CCNA"]
location = "Albuquerque"

from custom.indeed.form_query import form_query
base_query = form_query(keywords, location)

# ---------- #

from custom.indeed.call_query import call_query
from custom.indeed.extract_cards import extract_cards
from custom.indeed.parse_cards import parse_cards

page = call_query(base_query)
cards = extract_cards(page)
parsed_cards = parse_cards(cards)

parsed_cards[1]
```

```
{'title': 'IT Support Associate II',
 'link': 'https://www.indeed.com/rc/clk?jk=e0c269f2fc3ca546&fccid=fe2d21eef233e94a&vjs=3',
 'company': 'Amazon.com Services LLC',
 'location': 'Albuquerque, NM',
 'summary': ['1+ year experience in maintaining laser printers.',
  '1+ year experience with Cisco/networking and either Linux and/or Microsoft.',
  'High School or equivalent diploma.']}
```

### <a name="job_posts"></a> [Job Posts](#toc)

With the cards processed, the next step is to iterate over them and parse each linked job post. This information is compiled for later analysis to determine whether a posting is worth the time it takes to formally apply. The step amounts to requesting each posting's link, extracting the job description and the link to apply, and appending both attributes to the card to create a complete record of each opportunity. Some postings let the user apply directly on the job board website; in that case, the application link is simply set to the posting link itself.
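A minimal sketch of completing one card from its posting HTML, including the fallback to the posting link when no external application link exists. It assumes the description sits in `<div id="jobDescriptionText">` and that an external application link is an `<a>` with the hypothetical class `apply-link`; both selectors, and the names `PostParser` and `complete_card`, are illustrative assumptions, not the notebook's `parse_job_postings`.

```python
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collect description text and an external apply link from a job post.

    Assumptions: the description lives in <div id="jobDescriptionText">, and the
    external application link is <a class="apply-link" href="..."> (hypothetical).
    """

    def __init__(self):
        super().__init__()
        self.description_parts = []
        self.apply_link = None
        self._depth = 0  # > 0 while inside the description div (counts nested divs)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            if tag == "div":
                self._depth += 1
        elif tag == "div" and attrs.get("id") == "jobDescriptionText":
            self._depth = 1
        if tag == "a" and "apply-link" in (attrs.get("class") or "").split():
            self.apply_link = attrs.get("href")

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.description_parts.append(data.strip())


def complete_card(card, post_html):
    """Return a copy of the card with 'description' and 'apply_link' added."""
    parser = PostParser()
    parser.feed(post_html)
    parser.close()
    return {
        **card,
        # Join text chunks with spaces so adjacent elements don't run together.
        "description": " ".join(parser.description_parts),
        # Apply-on-board postings have no external link: fall back to the posting.
        "apply_link": parser.apply_link or card["link"],
    }
```

Joining the text chunks with spaces keeps headings such as "Job details" and "Job Type" from running together in the stored description.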



```python
# -------------------- PARSE JOB POSTINGS -------------------- #

from custom.indeed.parse_job_postings import parse_job_postings
cards = parse_job_postings(parsed_cards)
cards[0]
```

```
{'title': 'Installation Technician - Level III',
 'link': 'https://www.indeed.com/pagead/clk?mo=r&ad=-6NYlbfkN0B12G8xpRoeRLyzSGrx5gUDYJ2cuiP1A6qmzAe_HI-Ae470JDKxFdWE9_acl1_WCddFHurk7CBA2nm82RcgMz-UHbilJ1mh1Z16KAxMdJSZtafJOP7keUhtDlwBrk9tC2cBch4fnmRM761JuLEXRfogA3xAtuPWvnWVzkIQmxNVZ9uQSa5iSz-G6XZvYRj2daVbT41HEiEoYpAcEZwT1Y2WyV_jZl1ZtIY_bJ3YuGoseL3Zt2Qz25xLLAjfCvZKKXvFyR_5RH-dRxeEXTXs-GpSRY1IiQV59nhFh_ZQzxxXLEGC0WLjqzuJSnNyzS7fJuSIaJ4uTH11HSXnSUAbxirK4EKPcQlY5EKHcwt-v5p-qKnAmsRMVuViwcrU8GCUJdPqxVkrZnRu7isrUnFb7Lng98hyQk9j7l92SNK1tLea9xpejcsMsKZYqLeY_69aviHuXqgHDKCqORT_pFsDiroa&p=0&fvj=1&vjs=3',
 'company': 'ZeroDay Technology Solutions',
 'location': 'Albuquerque, NM',
 'summary': ['Safety | Accountability | Customer Centric | Return on Investment | Integrity | Family | Innovative | Community Involvement | Empowerment.'],
 'description': "Job detailsJob TypeFull-timeNumber of hires for this role5Full Job DescriptionTo be a team member at ADB Companies you must support the company’s missi
```

## <a name="analytics"></a> [Analytics](#toc)

This program can aggregate large amounts of job posting data. Processing that data reveals patterns in aggregate employer demand, identifies the jobs that correlate most closely with your resume, and filters the most promising prospects so they can be returned to the user. This analytics section aims to do just that.

1. [Keyword Frequencies](#keyword_frequencies)
2. [Resume Correlations](#resume_correlations)
3. [Filtering Prospects](#filtering_prospects)
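As a rough sketch of the three analyses, assuming each card carries a `description` string like the output of `parse_job_postings` above: keyword frequency counts how many postings mention each keyword, resume correlation is scored here with plain Jaccard word overlap (swapped in for the fuzzywuzzy/Levenshtein matching the notebook imports, so the sketch runs on the standard library alone), and filtering keeps the top-scoring postings. All function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; '+' and '#' are kept so terms like c++ and c# survive."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def keyword_frequencies(cards, keywords):
    """Count how many postings mention each (single-word) keyword at least once."""
    counts = Counter()
    for card in cards:
        words = set(tokens(card.get("description", "")))
        counts.update(k for k in keywords if k.lower() in words)
    return counts


def resume_overlap(resume_text, description):
    """Jaccard similarity of the two word sets: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(tokens(resume_text)), set(tokens(description))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_prospects(cards, resume_text, n=5):
    """Return the n postings whose descriptions overlap most with the resume."""
    return sorted(
        cards,
        key=lambda c: resume_overlap(resume_text, c.get("description", "")),
        reverse=True,
    )[:n]
```

Jaccard overlap rewards shared vocabulary but ignores word order and near-misses; a fuzzy ratio from fuzzywuzzy would tolerate spelling variants at the cost of an extra dependency.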

