# MonsterScrap doc
These functions perform web scraping on the Monster platform to collect job details.
____
Requirements:

```python
from urllib.request import urlopen
from urllib.error import HTTPError  # HTTPError is defined in urllib.error
from bs4 import BeautifulSoup
```

The *scrapBody()* function downloads the HTML document at the given URL and returns its `<body>` element.

It keeps the download-and-parse step in one place, so the main function does not repeat it.

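```python
def scrapBody(url):
    """Download the page at url and return its parsed <body> element."""
    with urlopen(url) as response:  # the connection is closed when the block exits
        body = BeautifulSoup(response.read(), 'html.parser').body
    return body
```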

The **scrapMonsterID()** function collects the *Job ID* of every job that the platform lists for the given search terms.
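For illustration, the ID extraction can be reproduced with the standard library's `html.parser` instead of BeautifulSoup. This minimal sketch assumes the markup targeted by the code below, `<section class="card-content" data-jobid="...">`, which Monster may since have changed; `JobIdParser`, `extract_job_ids` and the sample HTML are hypothetical. Unlike `scrapMonsterID()`, it does not skip the first card.

```python
from html.parser import HTMLParser


class JobIdParser(HTMLParser):
    """Hypothetical helper: collects data-jobid from <section class="card-content"> tags."""

    def __init__(self):
        super().__init__()
        self.job_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        # class may hold several space-separated names, so test membership
        classes = (attributes.get("class") or "").split()
        if "card-content" in classes and attributes.get("data-jobid"):
            self.job_ids.append(attributes["data-jobid"])


def extract_job_ids(html):
    """Return the job IDs found in an HTML string, in document order."""
    parser = JobIdParser()
    parser.feed(html)
    parser.close()
    return parser.job_ids


sample = """
<section class="card-content"><h2>Header card, no ID</h2></section>
<section class="card-content" data-jobid="123">Job A</section>
<section class="card-content is-active" data-jobid="456">Job B</section>
"""
print(extract_job_ids(sample))  # ['123', '456']
```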

On Monster, requesting page 10 of a search returns the jobs of the first 10 result pages in a single list; beyond that, each page returns only its own jobs.
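As a sketch of that pagination scheme, the hypothetical helpers below build the request URLs with `urllib.parse.urlencode`, which also escapes spaces and accents in the search term. They assume the query parameters used in the original code (`q`, `stpage`, `page`); `last_page` exists only to bound the illustration, since the real end of the results is not known in advance.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.monster.fr/emploi/recherche/"


def search_url(query, page):
    """Build a search URL for one results page (hypothetical helper)."""
    # urlencode escapes the query: "data scientist" becomes "data+scientist"
    return BASE_URL + "?" + urlencode({"q": query, "stpage": 1, "page": page})


def page_urls(query, last_page):
    """Yield page=10 first (pages 1 to 10 in one list), then one URL per later page."""
    yield search_url(query, 10)
    for page in range(11, last_page + 1):
        yield search_url(query, page)


for url in page_urls("data scientist", 12):
    print(url)
```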

When the results run out, Monster responds in one of two ways:
- It displays the latest job postings, all categories combined, with a "Désolé..." ("Sorry...") message.
- Otherwise, it returns an HTTP 404 error.

Detecting either situation ends the *ID* scraping.
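The two stop conditions can be exercised without touching the network. The following minimal sketch assumes a hypothetical `fetch_page(page)` callable that returns a page's heading text and its job IDs, and raises `urllib.error.HTTPError` on an error response; the fake fetchers stand in for Monster. Unlike the original loop, which stops on any `HTTPError`, this version treats only a 404 as the end of the results and re-raises other errors.

```python
from urllib.error import HTTPError


def collect_until_end(fetch_page, first_page=11):
    """Request pages until a 404 or a "Désolé" page (hypothetical sketch).

    fetch_page(page) is an assumed callable returning (heading_text, job_ids)
    and raising urllib.error.HTTPError when the server answers with an error.
    """
    job_ids = []
    page = first_page
    while True:
        try:
            heading, ids = fetch_page(page)
        except HTTPError as err:
            if err.code == 404:  # no more pages
                break
            raise  # any other HTTP error is a real failure
        if "Désolé" in heading:  # fallback list of the latest jobs
            break
        job_ids.extend(ids)
        page += 1
    return job_ids


def fake_sorry(page):
    """Fake fetcher: pages 11-12 hold jobs, then the "Désolé" fallback appears."""
    if page <= 12:
        return "Offres d'emploi", [f"id{page}"]
    return "Désolé, aucune offre ne correspond", ["latest1", "latest2"]


def fake_404(page):
    """Fake fetcher: pages 11-13 hold jobs, then the server answers 404."""
    if page <= 13:
        return "Offres d'emploi", [f"id{page}"]
    raise HTTPError(f"https://example.invalid/?page={page}", 404, "Not Found", {}, None)


print(collect_until_end(fake_sorry))  # ['id11', 'id12']
print(collect_until_end(fake_404))    # ['id11', 'id12', 'id13']
```

Injecting the fetcher keeps the stop logic testable; in the notebook, `fetch_page` would wrap `scrapBody()` and the BeautifulSoup lookups.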

```python
from urllib.parse import quote_plus


def scrapMonsterID(searchList):
    """Return the unique Monster job IDs found for all the search terms in searchList."""
    jobIDs = set()  # IDs from every search; the set removes duplicates
    for search in searchList:
        query = quote_plus(search)  # escape spaces and accents in the search term
        # page=10 returns the jobs of result pages 1 to 10 in a single list
        url = "https://www.monster.fr/emploi/recherche/?q={}&stpage=1&page=10".format(query)
        body = scrapBody(url)

        # Each job card stores its ID in data-jobid; the first card is skipped
        sections = body.find_all('section', class_="card-content")
        sections = [section.attrs['data-jobid'] for section in sections[1:] if 'data-jobid' in section.attrs]

        count = int(body.find("h2", class_="figure").text.strip().split()[0][1:])  # Number of results

        if len(sections) < count:
            page = 11
            while True:
                url = "https://www.monster.fr/emploi/recherche/?q={}&stpage=1&page={}".format(query, page)
                try:
                    body = scrapBody(url)
                except HTTPError:  # end of results: error page such as 404
                    break
                heading = body.find("h1", class_="pivot")
                if heading is not None and "Désolé" in heading.text:  # end of results: fallback list
                    break
                sectionPlus = body.find_all('section', class_="card-content")
                sections.extend(section.attrs['data-jobid'] for section in sectionPlus[1:] if 'data-jobid' in section.attrs)
                page += 1

        jobIDs.update(sections)
    return list(jobIDs)
```