### Task 2

Scrape job listings from the website https://realpython.github.io/fake-jobs and store the data in an SQLite database.

1. **Scraping Requirements**:
   - Extract the following details for each job listing:
     - **Job Title**
     - **Company Name**
     - **Location**
     - **Job Description**
     - **Application Link**

2. **Data Storage**:
   - Store the scraped data in an SQLite database, in a table named `jobs`.

3. **Incremental Load**:
   - Ensure that your script performs **incremental loading**:
     - Scrape the webpage and add only **new job listings** to the database.
     - Avoid duplicating entries. Use `Job Title`, `Company Name`, and `Location` as unique identifiers for comparison.

4. **Update Tracking**:
   - Add functionality to detect if an existing job listing has been updated (e.g., description or application link changes) and update the database record accordingly.

5. **Filtering and Exporting**:
   - Allow filtering job listings by **location** or **company name**.
   - Write a function to export filtered results into a CSV file.
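
One possible shape for requirements 3 and 4 is sketched below. It is a minimal illustration, not the notebook's implementation: the helper name `upsert_jobs`, the simplified column names (`title`, `company`, `location`, `description`, `link`), and the row layout are assumptions made for the example. Each incoming row is looked up by its `(title, company, location)` key; a missing key is inserted, and a matching key is updated only if the description or link has changed.

```python
import sqlite3


def upsert_jobs(conn, rows):
    """Insert new listings and refresh changed ones (hypothetical helper).

    rows: iterable of (title, company, location, description, link) tuples.
    Returns a tuple (inserted, updated) with the number of rows touched.
    """
    # Assumed schema for this sketch; the UNIQUE constraint mirrors the
    # "unique identifiers" named in requirement 3.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               title TEXT, company TEXT, location TEXT,
               description TEXT, link TEXT,
               UNIQUE (title, company, location))"""
    )
    inserted = updated = 0
    for title, company, location, description, link in rows:
        key = (title, company, location)
        existing = conn.execute(
            "SELECT description, link FROM jobs "
            "WHERE title = ? AND company = ? AND location = ?",
            key,
        ).fetchone()
        if existing is None:
            # New listing: add it.
            conn.execute(
                "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
                key + (description, link),
            )
            inserted += 1
        elif tuple(existing) != (description, link):
            # Known listing whose details changed: update in place.
            conn.execute(
                "UPDATE jobs SET description = ?, link = ? "
                "WHERE title = ? AND company = ? AND location = ?",
                (description, link) + key,
            )
            updated += 1
    conn.commit()
    return inserted, updated
```

Running the helper twice with the same rows should leave the table unchanged on the second pass, which is the behaviour incremental loading asks for.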


In [2]:
import sqlite3
import requests
from bs4 import BeautifulSoup as Bs

In [44]:
def extract_data():
    """Scrape the fake-jobs page and return parallel lists of job fields."""
    url = 'https://realpython.github.io/fake-jobs'
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors
    # Name the parser explicitly so the result does not depend on installed extras
    soup = Bs(response.content, 'html.parser')

    job_titles = soup.find_all(class_="title is-5")
    job_titles_list = [job_title.get_text().strip() for job_title in job_titles]

    company_names = soup.find_all(class_="subtitle is-6 company")
    company_names_list = [company_name.get_text().strip() for company_name in company_names]

    locations = soup.find_all(class_="location")
    locations_list = [location.get_text().strip() for location in locations]

    # Skip links that point to realpython.com; the remaining links are kept
    # as application links
    links_list = []
    for link in soup.find_all("a"):
        link_to_add = link.get('href')
        if link_to_add != 'https://www.realpython.com':
            links_list.append(link_to_add)
    return {"jobs": job_titles_list, "companies": company_names_list, "locations": locations_list, "links": links_list}
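
The dictionary returned by `extract_data()` holds four parallel lists, which only line up if every selector matched exactly once per job card. As a hedged sketch, the hypothetical helper `to_rows` below checks that the lists have equal length and zips them into one tuple per job, a shape that is convenient for `executemany` or for row-by-row comparison against the database. It assumes the same dictionary keys that `extract_data()` uses.

```python
def to_rows(data):
    """Turn the dict of parallel lists into one tuple per job (hypothetical helper)."""
    columns = (data["jobs"], data["companies"], data["locations"], data["links"])
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        # A mismatch means at least one selector matched more or fewer
        # elements than the others, so the fields would be misaligned.
        raise ValueError(f"field lists differ in length: {sorted(lengths)}")
    return list(zip(*columns))
```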


In [47]:
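# Create the table once. "Job Description" is not stored yet because
# extract_data() does not return it.
# Note: the with-block commits the transaction but does not close the connection.
with sqlite3.connect('Vacancys.db') as vacancy_base:
    cursor = vacancy_base.cursor()
    query = 'CREATE TABLE IF NOT EXISTS Jobs ("Job Title" text, "Company Name" text, "Location" text, "Application Link" text)'
    cursor.execute(query)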
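
Requirement 5 (filtering and CSV export) could be approached along the following lines. This is a sketch under assumptions: the function names `filter_jobs` and `export_to_csv` are made up for illustration, the query targets the four columns of the `Jobs` table created above, and passing both filters at once is treated as a logical AND. Values are bound with `?` placeholders rather than formatted into the SQL string.

```python
import csv
import sqlite3

COLUMNS = ("Job Title", "Company Name", "Location", "Application Link")


def filter_jobs(conn, location=None, company=None):
    """Return rows from Jobs, optionally restricted by location and/or company."""
    query = (
        'SELECT "Job Title", "Company Name", "Location", "Application Link" '
        "FROM Jobs"
    )
    conditions, params = [], []
    if location is not None:
        conditions.append('"Location" = ?')
        params.append(location)
    if company is not None:
        conditions.append('"Company Name" = ?')
        params.append(company)
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return conn.execute(query, params).fetchall()


def export_to_csv(rows, path):
    """Write the filtered rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
```

A call such as `export_to_csv(filter_jobs(conn, company="Acme"), "acme_jobs.csv")` would then chain the two steps; the company name and file name here are placeholders.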