# Task 1 – Import required libraries
In this task you gather the libraries your application needs.
Here, we import the following: csv for writing data to a CSV file, datetime for
getting the current date, requests for sending HTTP requests to the website, BeautifulSoup for parsing
the HTML source code of the webpage, and time for introducing a delay in our program.
By convention, imports go at the top of the file, for example:
from datetime import datetime

In [31]:
import csv
from datetime import datetime
import requests
from bs4 import BeautifulSoup
import time
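
The later cells in this notebook never call datetime or time directly, so here is a minimal sketch of what they are typically used for in a scraper: stamping an output filename with today's date and pausing between requests. The helper names `dated_filename` and `polite_pause`, the naming scheme, and the one-second default are illustrative assumptions, not part of the original task.

```python
from datetime import datetime
import time


def dated_filename(prefix="jobs", when=None):
    """Build a filename such as 'jobs_2024-01-31.csv' (hypothetical naming scheme)."""
    when = when or datetime.now()
    return f"{prefix}_{when.strftime('%Y-%m-%d')}.csv"


def polite_pause(seconds=1.0):
    """Sleep between requests so the target site is not flooded (assumed delay)."""
    time.sleep(seconds)


print(dated_filename())  # filename stamped with today's date
polite_pause(0.1)        # short pause, e.g. between two page requests
```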

# Task 2 – Generating a URL with a function.
Now that you have the required imports, define a function that takes two parameters:
position and location. Wrapping URL construction in a function keeps it in one place: if the URL
format changes, you update that one function rather than changing code in multiple places.
The function uses these parameters to generate the URL of the webpage we want to scrape. It starts
from a template URL and replaces the placeholders with the actual values of position and location.
The URL may also include additional parameters, such as locT=C and locId=1139970, that specify
the location of the job posting. You can customize these parameters based on your needs.

In [32]:
def generate_url(position, location):

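    # Base URL template. This URL has no {} placeholders, so format() returns
    # it unchanged; position and location only take effect once it has some.
    base_url = "https://realpython.github.io/fake-jobs/"

    # Replace any placeholders in the base URL with the parameter values
    formatted_url = base_url.format(position, location)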

    # Add additional parameters if needed
    #formatted_url += "&locT=C&locId=1139970"

    return formatted_url
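
The URL in `generate_url` contains no placeholders, so the arguments do not change it. As a sketch of how a template with real query parameters might look, the example below builds a search URL with `urllib.parse.urlencode`. The host `https://example.com/jobs`, the function name `build_search_url`, and the parameter names `q` and `l` are hypothetical; substitute whatever the target site actually expects.

```python
from urllib.parse import urlencode


def build_search_url(position, location, base_url="https://example.com/jobs"):
    """Return base_url with position and location as an encoded query string."""
    # urlencode escapes spaces, commas, and other special characters
    query = urlencode({"q": position, "l": location})
    return f"{base_url}?{query}"


print(build_search_url("python developer", "Austin, TX"))
# → https://example.com/jobs?q=python+developer&l=Austin%2C+TX
```

Letting `urlencode` build the query string avoids hand-escaping values that contain spaces or punctuation.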


# Task 3 – Extract the Job Data from a single job posting card.
The next step is to define a function that takes a single job posting record as input and extracts the
relevant data from it. The job posting is a BeautifulSoup object. This function will be called from within
the main() function, which you will define in the next step.
To do this, we use the BeautifulSoup object to look up each field in the HTML of the individual job
posting, wrapping every lookup in a try/except block. This protects the program and gives the field a
known value in case some data is missing from the posting.
For example:

    try:
        job_title = atags[0].text.strip()
    except IndexError:
        job_title = ""

It is important to show that you use functions to break problems down into constituent parts; this makes
your program easier to test and maintain.

In [39]:
def extract_job_data(job_posting):
    try:
        job_title = job_posting.find("h2", class_="title is-5").text.strip()
    except AttributeError:
        job_title = ""

    try:
        company = job_posting.find("h3", class_="subtitle is-6 company").text.strip()
    except AttributeError:
        company = ""

    try:
        location = job_posting.find("p", class_="location").text.strip()
    except AttributeError:
        location = ""

    job_data = {
        "Job Title": job_title,
        "Company": company,
        "Location": location,
    }

    return job_data
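
`find` returns `None` when nothing matches, and reading `.text` on `None` is what raises the `AttributeError` caught above. The three near-identical try/except blocks could be folded into one helper; the sketch below shows the idea. `text_or_default` is a hypothetical name, and `SimpleNamespace` is only a stand-in for a BeautifulSoup tag so the snippet runs without fetching a page.

```python
from types import SimpleNamespace


def text_or_default(element, default=""):
    """Return element.text stripped, or default when the element is missing."""
    try:
        return element.text.strip()
    except AttributeError:
        # element is None (or has no .text), i.e. the tag was not found
        return default


found = SimpleNamespace(text="  Senior Python Developer \n")  # stand-in for a tag
missing = None  # what find() returns when nothing matches

print(repr(text_or_default(found)))    # → 'Senior Python Developer'
print(repr(text_or_default(missing)))  # → ''
```

Inside `extract_job_data`, a call might then look like `text_or_default(job_posting.find("p", class_="location"))`.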


# Task 4 – Define the main function.
Every program needs a starting point, and a Python application is no different. Define the main function
that takes two parameters: job position and location. This function performs the following steps:
1. Set the headers for the HTTP request. A website may block requests from bots, so it's a good
idea to set a user agent string.
2. Construct the URL for the job search based on the job position and location using the function
you created earlier.
3. Send an HTTP request to the URL and retrieve the HTML code of the search results page.
4. Parse the HTML code using BeautifulSoup and select the HTML elements that contain the job
postings (hint: use Beautiful Soup's find_all method).
5. For each posting, extract the job posting information using the helper function from Task 3 and
store it in a list.
6. Write the job posting information to a CSV file.
7. Print a success message.
Run the program by calling the main function with the required parameters. For example, in a Jupyter
Notebook: main('developer', 'texas').

In [45]:
def main(position, location):
    # Step 1: Set headers for HTTP request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
    }
    
    # Step 2: Construct URL
    url = generate_url(position, location)
    
    # Step 3: Send HTTP request and retrieve HTML
    # A timeout stops the call from hanging indefinitely if the server never responds
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        html_content = response.text
    else:
        print("Failed to retrieve HTML content. Status code:", response.status_code)
        return
    
    # Step 4: Parse HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    
    # Step 5: Extract job posting information
    job_postings = soup.find_all("div", class_="card-content")
    #print(job_postings)
    job_data_list = []
    for job_posting in job_postings:
        job_data = extract_job_data(job_posting)
        job_data_list.append(job_data)
    #print(job_data_list)
    
    
    # Step 6: Write job posting information to CSV file
    filename = "jobs.csv"
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        # fieldnames must match the keys returned by extract_job_data()
        writer = csv.DictWriter(csvfile, fieldnames=["Job Title", "Company", "Location"])
        
        writer.writeheader()
        for job_data in job_data_list:
            writer.writerow(job_data)
    
    
    # Step 7: Print success message
    print(f"Job postings have been successfully scraped and saved to '{filename}'.")


# Example usage:
main('developer', 'AA')


Job postings have been successfully scraped and saved to 'jobs.csv'.
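
A quick way to check the result is to read the file back with `csv.DictReader`. The sketch below writes two made-up rows to a temporary file using the same column names as above, then loads them again. The sample rows, the temporary path, and the `load_jobs` helper are illustrative assumptions, not real scraped data.

```python
import csv
import os
import tempfile

FIELDNAMES = ["Job Title", "Company", "Location"]


def load_jobs(path):
    """Read a CSV written by csv.DictWriter back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as csvfile:
        return list(csv.DictReader(csvfile))


# Made-up rows, for illustration only
sample_rows = [
    {"Job Title": "Developer", "Company": "Acme", "Location": "Austin, TX"},
    {"Job Title": "Analyst", "Company": "", "Location": "Remote"},
]

# Write the sample rows to a temporary file, mirroring step 6 of main()
path = os.path.join(tempfile.mkdtemp(), "jobs_sample.csv")
with open(path, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(sample_rows)

rows = load_jobs(path)
print(len(rows), "rows; first location:", rows[0]["Location"])
```

To inspect the real output, call `load_jobs("jobs.csv")` after running `main`.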
