# Webscraping from *welcometothejungle.com*

### URLs to be scraped

In [33]:
urls = [
    "https://www.welcometothejungle.com/fr/companies/quantcube-technology/jobs/stagiaire-intern-advanced-macro-data-scientist_paris?q=f3cdd48ce0a998f317ecc1d16b8088d1&o=2266581" 
    "https://www.welcometothejungle.com/fr/companies/wivoo/jobs/consultant-senior-data-analyst_paris?q=f3cdd48ce0a998f317ecc1d16b8088d1&o=2069426",
    "https://www.welcometothejungle.com/fr/companies/groupe-micropole/jobs/stage-consultant-customer-data-management-f-h_levallois-perret_MICRO_RAz4krm?q=f3cdd48ce0a998f317ecc1d16b8088d1&o=1405897",
    "https://www.welcometothejungle.com/fr/companies/meilleurtaux-com/jobs/data-engineer-senior-h-f_paris?q=f3cdd48ce0a998f317ecc1d16b8088d1&o=2291701",
    "https://www.welcometothejungle.com/fr/companies/meilleurtaux-com/jobs/head-of-data-engineering-h-f_paris?q=f3cdd48ce0a998f317ecc1d16b8088d1&o=2291695",
    "https://www.welcometothejungle.com/fr/companies/meilleurtaux-com/jobs/product-manager-plateforme-data-h-f_paris_MEILL_DZqxyql?q=0852cdbe92c9248226e4a3fa7c95bc85&o=2289438",
    "https://www.welcometothejungle.com/fr/companies/quantcube-technology/jobs/intern-data-scientist-nlp_paris?q=0852cdbe92c9248226e4a3fa7c95bc85&o=1565637",
    "https://www.welcometothejungle.com/fr/companies/skiils/jobs/data-scientist-pricing-senior-h-f?q=bf40d746ee3afd6b88506af9e2585a6c&o=2283993",
    "https://www.welcometothejungle.com/fr/companies/carbo/jobs/carbon-data-scientist-stage_paris?q=bf40d746ee3afd6b88506af9e2585a6c&o=1936014",
    "https://www.welcometothejungle.com/fr/companies/datascientest/jobs/data-scientist_levallois-perret?q=bf40d746ee3afd6b88506af9e2585a6c&o=367193"
]

## Scraping using BeautifulSoup

In [34]:
import requests
from bs4 import BeautifulSoup

for url in urls:
    response = requests.get(url)
    data = response.text
    soup = BeautifulSoup(data, 'html.parser')
    title = soup.find('h2').get_text()
    company = soup.find('span', class_="wui-text").get_text()
    location = soup.find('span', class_='sc-1eoldvz-0 bZJPQK').get_text()
    profil_recherche_section = soup.findAll('div', class_='sc-18ygef-1 ezamTS')[1] 
    lists = profil_recherche_section.find_all('ul')
    competences = []
    for ul in lists:
        items = ul.find_all('li')
        for item in items:
            competences.append(item.get_text(strip=True))
    print(f"Job offer: {title}")
    print(f"Company: {company}")
    print(f"Location: {location}")
    print(f"Demanded skills:")
    for c in competences:
        print(f" - {c}")
    print("----------------------------------------------------------")

Job offer: Stagiaire / Intern Advanced Macro Data Scientist
Company: QuantCube Technology
Location: Paris
Demanded skills:
 - Advanced Programming skills in Python, including OOP
 - Mastery in Machine Learning models for classification and regression (Robust Linear Models, Clustering, Random Forest and Boosting Models, SVM …)
 - Mastery in Time Series models (like ARIMA, VAR, ARCH,…), knowledge on Bayesian Learning and State Space Modeling (DFM) appreciated
 - Strong interest for Economics and Finance
 - Fluent in English, Fluency in any other language would be strongly appreciated
 - Advanced level in Mathematics, Probability and Statistics and strong ability to learn new fields such as  Signal Extraction and Spectral Theory or Graph and Causality Theory
 - Compétences avancées en programmation en Python, y compris en OOP
 - Maîtrise des modèles d’apprentissage automatique pour la classification et la régression (modèles linéaires robustes, clustering, modèles Random Forest et Boostin

## Scraping using Selenium

In [35]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

driver_path = '/path/to/chromedriver'

for url in urls:
    driver = webdriver.Chrome()
    driver.get(url)

    title = driver.find_element(By.TAG_NAME, 'h2').text
    company = driver.find_element(By.CLASS_NAME, 'wui-text').text
    location = driver.find_element(By.CLASS_NAME, 'sc-1eoldvz-0.bZJPQK').text

    profil_recherche_section = driver.find_elements(By.CLASS_NAME, 'sc-18ygef-1.ezamTS')[1]
    lists = profil_recherche_section.find_elements(By.TAG_NAME, 'ul')
    competences = []
    for ul in lists:
        items = ul.find_elements(By.TAG_NAME, 'li')
        for item in items:
            if item.text:
                competences.append(item.text)

    print(f"Job offer: {title}")
    print(f"Company: {company}")
    print(f"Location: {location}")
    print("Demanded skills:")
    for c in competences:
        print(f" - {c}")
    print("----------------------------------------------------------")
    driver.quit()

Job offer: Stagiaire / Intern Advanced Macro Data Scientist
Company: QUANTCUBE TECHNOLOGY
Location: Paris
Demanded skills:
 - Advanced Programming skills in Python, including OOP 
 - Mastery in Machine Learning models for classification and regression (Robust Linear Models, Clustering, Random Forest and Boosting Models, SVM …) 
 - Mastery in Time Series models (like ARIMA, VAR, ARCH,…), knowledge on Bayesian Learning and State Space Modeling (DFM) appreciated 
 - Strong interest for Economics and Finance 
 - Fluent in English, Fluency in any other language would be strongly appreciated 
 - Advanced level in Mathematics, Probability and Statistics and strong ability to learn new fields such as  Signal Extraction and Spectral Theory or Graph and Causality Theory  
----------------------------------------------------------
Job offer: Stage Consultant Data Management (F/H)
Company: MICROPOLE
Location: Levallois-Perret
Demanded skills:
 - Bonne expression orale et qualité rédactionnelle,
 -