# Indeed Job Manager

- A text interface that helps users:
    1. Find desired jobs on Indeed
    2. Quickly apply to jobs of interest
    3. Keep track of their application history

### User options

- Users provide:
    - Indeed starting URLs: list
        - each starting URL is an Indeed search page and can contain filters such as
            - job title
            - full-time/part-time
            - experience level
            - ex: "https://www.indeed.com/jobs?q=data+engineer&jt=fulltime&explvl=entry_level"
    - Title keywords: list
        - the job title must contain at least one of these keywords
    - Must-have keywords: list of lists
        - for every group, the job description must contain at least one keyword from that group
    - Nice-to-have keywords: list
        - the database records every nice-to-have keyword found in the job description
    - Resume ID: string
        - keeps track of which resume version was sent to each company
    - Job tab amount: int
        - how many job postings to open in browser tabs and apply to at a time
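The options above map naturally onto a small typed container. The sketch below assumes a hypothetical `UserOptions` dataclass (the notebook itself keeps these values as module-level constants such as `TITLE_KEYWORDS` and `JOB_TAB_AMOUNT`) and Python 3.9 or newer. The checks in `__post_init__`, including requiring an `indeed.com` host, are illustrative assumptions rather than rules the project enforces.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class UserOptions:
    """Hypothetical container for the user options; not part of the project."""

    indeed_starting_urls: list[str]
    title_keywords: list[str]
    must_have_keyword_groups: list[list[str]]
    resume_id: str
    nice_to_have_keywords: list[str] = field(default_factory=list)
    job_tab_amount: int = 5

    def __post_init__(self) -> None:
        # Assumed validation rules: fail early on values the applier cannot use.
        if self.job_tab_amount < 1:
            raise ValueError("job_tab_amount must be at least 1")
        for url in self.indeed_starting_urls:
            host = urlparse(url).netloc
            if host != "indeed.com" and not host.endswith(".indeed.com"):
                raise ValueError(f"not an Indeed URL: {url}")
        if any(not group for group in self.must_have_keyword_groups):
            raise ValueError("each must-have keyword group needs at least one keyword")
```

For example, `UserOptions(INDEED_STARTING_URLS, TITLE_KEYWORDS, MUST_HAVE_KEYWORD_GROUPS, RESUME, NICE_TO_HAVE_KEYWORDS, JOB_TAB_AMOUNT)` would bundle the notebook's constants into one validated object.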

### Project components

##### Jobs ETL
- Scrapes Indeed using the starting URLs
- Filters jobs by title keywords and must-have keywords
- Labels jobs (e.g. with the nice-to-have keywords found) and stores them in a JSON database
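The filtering step itself lives in `process_jobs`, whose source is not shown. As a hedged illustration of the rules described under *User options*, the hypothetical `label_job` below rejects a job unless its title contains a title keyword and its description satisfies every must-have group, then records which nice-to-have keywords appear. Case-insensitive whole-word matching is an assumption; the real implementation may use plain substring matching. The returned keys mirror fields the notebook reads (`interested`, `applied_to`, `nice_keywords`).

```python
import re


def keyword_found(keyword, text):
    """Case-insensitive whole-word match (an assumption; process_jobs may differ)."""
    return re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE) is not None


def label_job(title, description, title_keywords, must_have_groups, nice_keywords):
    """Return a label dict for one scraped job, or None if it fails the filters."""
    # The title must contain at least one title keyword.
    if not any(keyword_found(k, title) for k in title_keywords):
        return None
    # Every must-have group needs at least one of its keywords in the description.
    for group in must_have_groups:
        if not any(keyword_found(k, description) for k in group):
            return None
    # Nice-to-have keywords only label the job; they never reject it.
    return {
        "title": title,
        "interested": True,
        "applied_to": False,
        "nice_keywords": [k for k in nice_keywords if keyword_found(k, description)],
    }
```

With whole-word matching, `'python'` does not match inside `"python3"`, which is why a group like `['python', 'python3']` lists both spellings.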

##### Semi-Automated Job Applier
- Opens the next N desired jobs (N = job tab amount) in browser tabs
- User tells the system which of those jobs were applied to
- Applied jobs are registered in the database; the rest of the batch is marked as false positives
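The applier pages through the ranked jobs `JOB_TAB_AMOUNT` at a time and then splits each batch according to the user's answer. The sketch below expresses those two steps with hypothetical helpers `batches` and `split_batch` (Python 3.8 or newer for the `:=` operator); the notebook does the same thing with index arithmetic in `menu()` and `commit_jobs()`.

```python
from itertools import islice


def batches(jobs, size):
    """Yield consecutive batches of at most `size` jobs; the last may be shorter."""
    it = iter(jobs)
    # islice returns an empty slice once the iterator is exhausted, ending the loop.
    while batch := list(islice(it, size)):
        yield batch


def split_batch(batch, selected):
    """Split a batch into jobs the user applied to and false positives.

    `selected` holds the 0-based positions the user committed, as in "commit 0 2".
    """
    applied = [job for i, job in enumerate(batch) if i in selected]
    false_positives = [job for i, job in enumerate(batch) if i not in selected]
    return applied, false_positives
```

With the 140 jobs from the sample session at the end of this notebook and a tab amount of 5, `batches` yields 28 batches.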

##### Job Tracker (Not Implemented Yet)
- Tracks where and how many jobs have been applied to
- Tracks the number of false positives (jobs that passed the filters but were not applied to)
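Although the tracker is not implemented, the database already stores much of what it would need: `register_job` sets `applied_to`, `applied_on` and `resume_sent`, and `register_not_interested` sets `false positive`. A minimal sketch of a summary over those fields, using a hypothetical `tracker_summary` function, could look like this. Counting *where* jobs were applied to would need a company field, which this document does not show, so it is left out.

```python
from collections import Counter


def tracker_summary(jobs_db):
    """Summarize application history from the job records (hypothetical helper)."""
    applied = [job for job in jobs_db if job.get("applied_to")]
    return {
        "applied": len(applied),
        # register_not_interested stores this flag under a key containing a space.
        "false_positives": sum(1 for job in jobs_db if job.get("false positive")),
        # Applications per resume version and per day, from register_job's fields.
        "by_resume": Counter(job.get("resume_sent") for job in applied),
        "per_day": Counter(job.get("applied_on") for job in applied),
    }
```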


```python
import datetime
import webbrowser

from IPython.display import clear_output

from job_applier.scraper import Scraper
from process_jobs import load_database_jobs, process_jobs, write_file


def menu():
    '''main menu: loops until the user types "quit"'''
    selection = input(MENU_PROMPT).strip()
    while selection != 'quit':
        if selection == 'scrape':
            scrape_new_jobs()
        elif selection == 'apply':
            jobs_db = load_database_jobs()
            new_jobs = get_new_jobs(jobs_db)
            print(f'there are currently {len(new_jobs)} jobs')
            for index in range(0, len(new_jobs), JOB_TAB_AMOUNT):
                current_jobs = get_current_jobs(new_jobs, index)
                open_job_urls(current_jobs)
                print_job_info(current_jobs)
                # re-prompt until the command is valid for this batch
                selection_2 = input(APPLY_PROMPT)
                while not is_valid_amount(selection_2, len(current_jobs)):
                    selection_2 = input(APPLY_PROMPT)
                command = selection_2.split()[0]
                if command == 'quit':
                    break
                if command == 'skip':
                    continue
                selected_index = parse_input(selection_2)
                commit_jobs(selected_index, current_jobs, jobs_db)
        selection = input(MENU_PROMPT).strip()

def scrape_new_jobs():
    '''web scrapes jobs, filters them by keywords, then stores them in the db'''
    print('scraping jobs... This may take some time...')
    scraper = Scraper()
    scraper.run_spiders(INDEED_STARTING_URLS)
    process_jobs(TITLE_KEYWORDS, MUST_HAVE_KEYWORD_GROUPS, NICE_TO_HAVE_KEYWORDS)
    # clear the progress message and any scraper output from the notebook cell
    clear_output(wait=False)

def get_new_jobs(jobs_db):
    '''gets jobs the user is interested in but has not applied to yet,
    sorted by most recently scraped, then by number of nice-to-have keywords'''
    filtered_jobs = [job for job in jobs_db if job['interested'] and not job['applied_to']]
    return sorted(
        filtered_jobs,
        key=lambda job: (job['scraped_on'], len(job['nice_keywords'])),
        reverse=True,
    )

def get_current_jobs(new_jobs, index):
    '''gets the next batch of at most JOB_TAB_AMOUNT jobs, starting at index'''
    # slicing past the end of a list is safe, so the last batch may simply be shorter
    return new_jobs[index:index + JOB_TAB_AMOUNT]

def open_job_urls(jobs):
    '''opens jobs in new browser tabs'''
    for job in jobs:
        url = job['url']
        print(url)
        webbrowser.open_new_tab(url)

def print_job_info(jobs):
    '''prints each job's batch index, title, and nice-to-have keywords'''
    for idx, job in enumerate(jobs):
        print(f'{idx}. {job["title"]}: {job["nice_keywords"]}')

def is_valid_amount(command, batch_size):
    '''checks that an apply-prompt command is "quit", "skip", "commit",
    or "commit" followed by job numbers shown in the current batch'''
    words = command.split()
    if not words:  # empty input
        return False
    if words[0] in ('quit', 'skip'):
        return len(words) == 1
    if words[0] != 'commit':
        return False
    # every number must be a valid 0-based index into the current batch
    return all(value.isdecimal() and int(value) < batch_size for value in words[1:])

def parse_input(command):
    '''returns the indices of the jobs the user applied to'''
    words = command.split()
    # a bare "commit" means every job in the batch was applied to
    if len(words) == 1:
        return list(range(JOB_TAB_AMOUNT))
    return [int(value) for value in words[1:]]

def commit_jobs(applied_to_list, applied_to_jobs, jobs_db):
    '''registers applied jobs, marks the rest of the batch as false positives, then saves the db'''
    for idx, job in enumerate(applied_to_jobs):
        if idx in applied_to_list:
            register_job(job, jobs_db)
        else:
            register_not_interested(job, jobs_db)
    database_jobs_path = 'database_jobs.json'
    write_file(database_jobs_path, jobs_db)

def register_job(current_job, jobs_db):
    '''records that the user successfully applied to the job'''
    jobs_id = current_job['info']
    for job in reversed(jobs_db):
        if jobs_id in job['info']:
            job['applied_to'] = True
            job['applied_on'] = str(datetime.date.today())
            job['resume_sent'] = RESUME

def register_not_interested(current_job, jobs_db):
    '''marks a job that passed the filters but was not applied to as a false positive'''
    jobs_id = current_job['info']
    for job in reversed(jobs_db):
        if jobs_id in job['info']:
            job['interested'] = False
            job['false positive'] = True
```

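`load_database_jobs` and `write_file` are imported from `process_jobs`, whose source is not included here. A minimal sketch of what a JSON-file database layer could look like follows; the names `load_jobs` and `write_jobs` are hypothetical, and writing to a temporary file before swapping it into place is a design choice, not necessarily what `process_jobs` does.

```python
import json
import os
import tempfile
from pathlib import Path


def load_jobs(path):
    """Return the list of job records, or [] if the database file does not exist yet."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def write_jobs(path, jobs):
    """Save job records so an interrupted write does not leave a half-written file."""
    path = Path(path)
    # Write to a temporary file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(jobs, f, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

`os.replace` overwrites the destination in a single step, so readers see either the old database or the new one.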

```python
# the job title must contain one of these keywords
TITLE_KEYWORDS = ['engineer', 'software-engineer', 'dataengineer', 'data-engineer', 'data']

# for every group, the job description must contain at least one of its keywords
MUST_HAVE_KEYWORD_GROUPS = [['python', 'python3']]

# every nice-to-have keyword found in the job description is recorded
NICE_TO_HAVE_KEYWORDS = ['pandas', 'webscraping', 'dash', 'scrapy', 'etl', 'pipeline']

# starting URLs from the Indeed site; they can include job title, job type, experience level, etc.
INDEED_STARTING_URLS = [
    "https://www.indeed.com/jobs?q=data+engineer&jt=fulltime&explvl=entry_level",
    "https://www.indeed.com/jobs?q=software+engineer&jt=fulltime&explvl=entry_level",
]

# number of jobs opened in browser tabs at a time
JOB_TAB_AMOUNT = 5

# version of the resume you are sending out
RESUME = 'V1.00'

MENU_PROMPT = 'Type "scrape" to scrape new jobs, "apply" to apply to new jobs, or "quit" to quit: '
APPLY_PROMPT = 'Type "commit" to commit all, "commit # #" to commit only those #, "skip", or "quit": '


if __name__ == '__main__':
    menu()
```

```text
Type "scrape" to scrape new jobs, "apply" to apply to new jobs, or "quit" to quit:  apply
there are currently 140 jobs
https://www.indeed.com/viewjob?jk=9b1526f9a1b09b9c&from=serp&vjs=3
https://www.indeed.com/viewjob?cmp=Pediatric-Associates-Florida&t=Data+Engineer&jk=1cf657303c920589&vjs=3
https://www.indeed.com/viewjob?jk=507637d46f933d47&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=e2284cae54cc43a6&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=78c9041ef6929817&from=serp&vjs=3
0. Data engineer: ['etl', 'pipeline']
1. Data Engineer (Business Analytics Department): ['etl', 'pipeline']
2. Data Engineer: ['pipeline']
3. Data Engineer: ['etl']
4. Junior DevOps Engineer: ['pipeline']
Type "commit" to commit all, "commit # #" to commit only those #, "skip", or "quit":  skip
https://www.indeed.com/viewjob?jk=9625cab3cbf005df&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=c42c47663d07e845&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=6853ce0179984c44&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=7179a7e983d02cf6&from=serp&vjs=3
https://www.indeed.com/viewjob?jk=695db9e2231f8c07&from=serp&vjs=3
0. Data Engineer II: ['etl']
1. PYTHON DATA ENGINEER: ['pipeline']
2. Data QA Engineer: ['etl']
3. Data Engineer: ['etl']
4. ML/AI Engineer: ['pandas']
Type "commit" to commit all, "commit # #" to commit only those #, "skip", or "quit":  quit
Type "scrape" to scrape new jobs, "apply" to apply to new jobs, or "quit" to quit:  quit
```