Beautiful Soup is a Python library used for parsing HTML and XML documents. Here is the general syntax for using Beautiful Soup:

1. Importing the library:
```
from bs4 import BeautifulSoup
```
2. Creating a BeautifulSoup object:
```
soup = BeautifulSoup(html_string, 'html.parser')
```
* `html_string` is the string containing the HTML code.
* `'html.parser'` is the parser used to parse the HTML code; it ships with Python. You can also pass `'lxml'` or `'xml'`, both of which require the separate `lxml` package to be installed.

3. Finding elements:
```
soup.find('tag_name')  # finds the first occurrence of the tag
soup.find_all('tag_name')  # finds all occurrences of the tag
soup.find('tag_name', {'attribute_name': 'attribute_value'})  # finds the first occurrence of the tag with the specified attribute
soup.find_all('tag_name', {'attribute_name': 'attribute_value'})  # finds all occurrences of the tag with the specified attribute
```
* `tag_name` is the name of the HTML tag you want to find.
* `attribute_name` and `attribute_value` are the name and value of the attribute you want to filter by.
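As a rough illustration of the calls in step 3, the sketch below runs them against a small inline document. The HTML string and the names `first_div`, `all_divs`, `note`, and `missing` are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration
html_string = """
<div class="card-body"><h5>Python for beginners</h5><a href="/c/1">Start for 20$</a></div>
<div class="card-body"><h5>Python Web Development</h5><a href="/c/2">Start for 50$</a></div>
<p class="note">Prices are examples.</p>
"""

soup = BeautifulSoup(html_string, 'html.parser')

first_div = soup.find('div')                             # first <div> only
all_divs = soup.find_all('div', {'class': 'card-body'})  # filter with an attribute dict
note = soup.find('p', class_='note')                     # class_ is a shorthand for the class attribute
missing = soup.find('table')                             # no match: find() returns None

titles = [div.h5.text for div in all_divs]
print(titles)     # ['Python for beginners', 'Python Web Development']
print(note.text)  # Prices are examples.
print(missing)    # None
```

Because `find()` returns `None` when nothing matches, it is worth checking the result before accessing `.text` or an attribute on it.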

4. Navigating the tree:
```
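soup.tag_name.parent  # returns the element that contains the tag (the BeautifulSoup object itself has no parent)
soup.tag_name.children  # returns an iterator over the tag's direct children (tags and text nodes), not a list
soup.tag_name.next_sibling  # returns the next node at the same level; this may be a whitespace string rather than a tag
soup.tag_name.previous_sibling  # returns the previous node at the same level, with the same caveat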
```
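The navigation attributes from step 4 can be tried on a tiny list. This is a minimal sketch; the markup and variable names are assumptions chosen for the example. The second half shows why whitespace between tags matters for `next_sibling`.

```python
from bs4 import BeautifulSoup

# No whitespace between tags, so every sibling is a tag
soup = BeautifulSoup("<ul><li>one</li><li>two</li><li>three</li></ul>", 'html.parser')
second = soup.find_all('li')[1]

parent_name = second.parent.name                        # 'ul'
children = [child.text for child in soup.ul.children]   # .children is an iterator
next_text = second.next_sibling.text                    # 'three'
prev_text = second.previous_sibling.text                # 'one'
print(parent_name, children, next_text, prev_text)

# With newlines between tags, the direct sibling is a text node
spaced = BeautifulSoup("<ul>\n<li>one</li>\n<li>two</li>\n</ul>", 'html.parser')
first = spaced.find('li')
print(repr(first.next_sibling))             # '\n'
print(first.find_next_sibling('li').text)   # two
```

`find_next_sibling()` skips over text nodes, which usually makes it the safer choice on real pages.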
5. Reading and modifying the tree:
```
soup.tag_name.string  # returns the tag's text if it contains a single string, otherwise None
soup.tag_name.text  # returns all text inside the tag, including text from child elements
soup.tag_name.append(new_tag)  # adds new_tag as the last child of the tag
soup.tag_name.insert(0, new_tag)  # inserts new_tag as the first child of the tag
soup.tag_name.replace_with(new_tag)  # replaces the tag with new_tag
```
* `new_tag` is the tag you want to add or substitute in, for example one created with `soup.new_tag('p')`.
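A short sketch of the calls in step 5, assuming a one-paragraph document. The tags created here (`span`, `h1`, `em`) and the variable names are arbitrary choices for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hello</p></div>", 'html.parser')

# append: the new tag becomes the last child of <div>
new_tag = soup.new_tag('span')
new_tag.string = 'world'
soup.div.append(new_tag)

# insert(0, ...): the new tag becomes the first child of <div>
first_tag = soup.new_tag('h1')
first_tag.string = 'Title'
soup.div.insert(0, first_tag)

# replace_with: <p> is swapped out for <em>
replacement = soup.new_tag('em')
replacement.string = 'Hi'
soup.p.replace_with(replacement)

print(soup)             # <div><h1>Title</h1><em>Hi</em><span>world</span></div>
print(soup.div.string)  # None, because <div> now has several children
print(soup.div.text)    # TitleHiworld
```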

6. Extracting data:
```
soup.get_text()  # returns the text content of the entire document
soup.find('tag_name').get_text()  # returns the text content of the specified tag
soup.find('tag_name').attrs  # returns a dictionary of the tag's attributes
```
This covers the basic syntax and methods of Beautiful Soup. More information and examples are available in the official Beautiful Soup documentation.
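To make the extraction calls in step 6 concrete, here is a minimal sketch on a single made-up paragraph. It also shows that the `class` attribute comes back as a list, since a tag can carry several classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html_string = '<p id="intro" class="lead big">Hello <b>there</b></p>'
soup = BeautifulSoup(html_string, 'html.parser')

p = soup.find('p')
text = p.get_text()      # text of <p> including the nested <b>
attrs = p.attrs          # dictionary of attributes
intro_id = p['id']       # single attribute by key (raises KeyError if absent)
title = p.get('title')   # .get() returns None for a missing attribute

print(text)      # Hello there
print(attrs)     # {'id': 'intro', 'class': ['lead', 'big']}
print(intro_id)  # intro
print(title)     # None
```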

In [1]:
from bs4 import BeautifulSoup

with open('home.html', 'r') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'lxml')
course_cards = soup.find_all('div', class_='card-body')
for course in course_cards:
    # the last word of the link text holds the price; [:-1] drops its final character
    course_price = course.a.text.split()[-1]
    print(f'The course {course.h5.text} cost ${course_price[:-1]}')

The course Python for beginners cost $20
The course Python Web Development cost $50
The course Python Machine Learning cost $100


For practice, I will scrape data from a job listing website.
The extract will include:

- Company name
- Job title
- Posting date
- Skills

This is just a simple extraction process.
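Before hitting the live site, the four fields can be pulled from an offline snippet. The sketch below is only an approximation: `sample_html` is a hypothetical job card modeled on the class names used in the cells that follow, and the real page markup may differ. It reads each skill with `get_text()` instead of a regular expression, which also decodes entities such as `&amp;`.

```python
from bs4 import BeautifulSoup

# Hypothetical job card; the live page structure is an assumption
sample_html = """
<li class="clearfix job-bx wht-shd-bx">
  <h2><a href="https://example.com/job/1">Python Intern</a></h2>
  <h3 class="joblist-comp-name"> Example Corp </h3>
  <div class="more-skills-sections">
    <span>python</span>
    <span>R &amp; D</span>
  </div>
  <span class="sim-posted">Posted today</span>
</li>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
job_box = soup.find('li', class_='job-bx')  # matching one of the classes is enough

skills_container = job_box.find('div', class_='more-skills-sections')
skills = [span.get_text(strip=True) for span in skills_container.find_all('span')]

record = {
    'company_name': job_box.find('h3', class_='joblist-comp-name').get_text(strip=True),
    'job_title': job_box.find('a').get_text(strip=True),
    'posting_date': job_box.find('span', class_='sim-posted').get_text(strip=True),
    'skills': '|'.join(skills),
}
print(record)
```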

In [None]:
# print output

from bs4 import BeautifulSoup
import requests
import re
import time
import pandas as pd

response = requests.get(
    'https://www.timesjobs.com/candidate/job-search.html?searchType=Home_Search&from=submit&asKey=OFF&txtKeywords=&cboPresFuncArea=&cboWorkExp1=0&clusterName=CLUSTER_EXP'
    )

print('Input Job keyword')
job_keyword = input('>> ')
print(f'Searching for keyword: {job_keyword}')

# empty list to append output data
job_data = []

# extract the skills embedded in the html span tags and strip surrounding whitespace
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        skills_list = [
            # '&amp;' is how '&' appears in the serialized html; removing every 'amp' would also damage words such as 'example'
            skill.strip().replace(' / ', '/').replace('**   ', '').replace('  **', '').replace('&amp;', '&')
            for skill in skills if skill.strip()
        ]
        return skills_list
    return []


def find_jobs():
    if response.status_code == 200:
        html_text = response.text
        soup = BeautifulSoup(html_text, 'lxml')
        job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

        for job_box in job_boxes:
            job_title = job_box.find('a').text.strip()
            if job_keyword.casefold() in job_title.casefold():
                posting_date = job_box.find('span', class_='sim-posted').text.strip()
                company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
                more_details = job_box.a['href']
                
                # Locate the skills container and extract skills
                skills_container = job_box.find('div', class_='more-skills-sections')
                skills = extract_skills(skills_container)
                
                   
                print(f"Company: {company_name}")
                print(f"Job Title: {job_title}")
                print(f"Posting Date: {posting_date}")            
                print(f"Skills: {'|'.join(skills)}")
                print(f"More Details: {more_details}\n")
    else:
        print("Failed to retrieve the webpage.")

# Repeat the search at a fixed interval. `response` was fetched once above, so every pass re-parses the same page.
if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10
        print(f'Waiting {time_wait} seconds...')
        time.sleep(time_wait)

    

In [None]:
# export output

from bs4 import BeautifulSoup
import requests
import re
import time
import pandas as pd

response = requests.get(
    'https://www.timesjobs.com/candidate/job-search.html?searchType=Home_Search&from=submit&asKey=OFF&txtKeywords=&cboPresFuncArea=&cboWorkExp1=0&clusterName=CLUSTER_EXP'
    )

print('Input Job keyword')
job_keyword = input('>> ')
print(f'Searching for keyword: {job_keyword}')

# empty list to append output data
job_data = []

# extract the skills embedded in the html span tags and strip surrounding whitespace
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        skills_list = [
            # '&amp;' is how '&' appears in the serialized html; removing every 'amp' would also damage words such as 'example'
            skill.strip().replace(' / ', '/').replace('**   ', '').replace('  **', '').replace('&amp;', '&')
            for skill in skills if skill.strip()
        ]
        return skills_list
    return []


def find_jobs():
    if response.status_code == 200:
        html_text = response.text
        soup = BeautifulSoup(html_text, 'lxml')
        job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

        for job_box in job_boxes:
            job_title = job_box.find('a').text.strip()
            if job_keyword.casefold() in job_title.casefold():
                posting_date = job_box.find('span', class_='sim-posted').text.strip()
                company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
                more_details = job_box.a['href']
                
                # Locate the skills container and extract skills
                skills_container = job_box.find('div', class_='more-skills-sections')
                skills = extract_skills(skills_container)
                
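                # append one row per matching job; values are plain strings
                # (writing {company_name} would create a one-element set instead)
                job_data.append(
                    {
                        'company_name': company_name,
                        'job_title': job_title,
                        'posting_duration': posting_date,
                        'skills': '|'.join(skills),
                    }
                )
        # export as csv; job_data persists across repeats, so drop rows already collected
        df = pd.DataFrame(job_data).drop_duplicates()
        df.to_csv('job_data.csv', index=False)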
    else:
        print("Failed to retrieve the webpage.")

# Repeat the search at a fixed interval. `response` was fetched once above, so every pass re-parses the same page.
if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10
        print(f'Waiting {time_wait} seconds...')
        time.sleep(time_wait)

    

In [15]:
# exception handling

from bs4 import BeautifulSoup
import requests
import re
import time
import pandas as pd

while True:
    try:
        response = requests.get(
            'https://www.timesjobs.com/candidate/job-search.html?searchType=Home_Search&from=submit&asKey=OFF&txtKeywords=&cboPresFuncArea=&cboWorkExp1=0&clusterName=CLUSTER_EXP'
            )
        # raise HTTPError for bad response
        response.raise_for_status()
        break  # success: stop retrying

    # keep retrying until a connection is secured
    except requests.exceptions.ConnectionError:
        print("Connection error. Retrying in 10 seconds...")
        time.sleep(10)  # Wait before retrying
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e}. Retrying in 10 seconds...")
        time.sleep(10)
    except Exception as e:
        print(f"An error occurred: {e}")
        raise  # stop on unexpected errors; `response` would be undefined below

# initialize key word input
print('Input Job keyword')
job_keyword = input('>> ')
print(f'Searching for keyword: {job_keyword}')

# empty list to append output data
job_data = []

# extract the skills embedded in the html span tags and strip surrounding whitespace
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        skills_list = [
            # '&amp;' is how '&' appears in the serialized html; removing every 'amp' would also damage words such as 'example'
            skill.strip().replace(' / ', '/').replace('**   ', '').replace('  **', '').replace('&amp;', '&')
            for skill in skills if skill.strip()
        ]
        return skills_list
    return []


def find_jobs():
    if response.status_code == 200:
        html_text = response.text
        soup = BeautifulSoup(html_text, 'lxml')
        job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

        for job_box in job_boxes:
            job_title = job_box.find('a').text.strip()
            if job_keyword.casefold() in job_title.casefold():
                posting_date = job_box.find('span', class_='sim-posted').text.strip()
                company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
                more_details = job_box.a['href']
                
                # Locate the skills container and extract skills
                skills_container = job_box.find('div', class_='more-skills-sections')
                skills = extract_skills(skills_container)
                
                   
                print(f"Company: {company_name}")
                print(f"Job Title: {job_title}")
                print(f"Posting Date: {posting_date}")            
                print(f"Skills: {'|'.join(skills)}")
                print(f"More Details: {more_details}\n")
    else:
        print("Failed to retrieve the webpage.")

# Repeat the search at a fixed interval. `response` was fetched once above, so every pass re-parses the same page.
if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10
        print(f'Waiting {time_wait} seconds...')
        time.sleep(time_wait)

    

Input Job keyword
Searching for keyword: intern
Company: Sparks To Ideas
Job Title: B.tech Project Internship
Posting Date: Posted today
Skills: php|python|react js|Javascript|Node js|Mern Stack|Cyber Security|Machine Learning|Artificial Intelligence|c ++
More Details: https://www.timesjobs.com/job-detail/b-tech-project-internship-sparks-to-ideas-ahmedabad-anand-bhavnagar-gandhinagar-junagarh-0-to-1-yrs-jobid-1bTPV5HLcglzpSvf__PLUS__uAgZw==&source=srp

Company: Hucon Solutions India Pvt ltd
Job Title: job openings for international voice
Posting Date: Posted today
Skills: good communication skills
More Details: https://www.timesjobs.com/job-detail/job-openings-for-international-voice-top-mnc-s-hyderabad-secunderabad-0-to-3-yrs-jobid-KUdXwhtUerNzpSvf__PLUS__uAgZw==&source=srp

Company: Sparks To Ideas
Job Title: App Development Live Project  Internship
Posting Date: Posted today
Skills: Dart|Android APK|Fire base|APK entygration|App design|IOS APK|Live application
More Details: https://

KeyboardInterrupt: 

In [None]:
from bs4 import BeautifulSoup
import requests
import re
import time

# Prompt for job keyword input
print('Input Job keyword')
job_keyword = input('>> ')
print(f'Searching for keyword: {job_keyword}')

# Function to extract skills from HTML span tags
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        skills_list = [
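            # '&amp;' is how '&' appears in the serialized html; removing every 'amp' would also damage words such as 'example'
            skill.strip().replace(' / ', '/').replace('**   ', '').replace('  **', '').replace('&amp;', '&')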
            for skill in skills if skill.strip()
        ]
        return skills_list
    return []

# Main function to fetch and display job data
def find_jobs():
    url = 'https://www.timesjobs.com/candidate/job-search.html?searchType=Home_Search&from=submit&asKey=OFF&txtKeywords=&cboPresFuncArea=&cboWorkExp1=0&clusterName=CLUSTER_EXP'
    
    while True:
        try:
            response = requests.get(url)
            response.raise_for_status()  # Raise HTTPError for bad responses

            # Process HTML if connection is successful
            html_text = response.text
            soup = BeautifulSoup(html_text, 'lxml')
            job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

            for job_box in job_boxes:
                job_title = job_box.find('a').text.strip()
                if job_keyword.casefold() in job_title.casefold():
                    posting_date = job_box.find('span', class_='sim-posted').text.strip()
                    company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
                    more_details = job_box.a['href']
                    
                    # Locate the skills container and extract skills
                    skills_container = job_box.find('div', class_='more-skills-sections')
                    skills = extract_skills(skills_container)

                    print(f"Company: {company_name}")
                    print(f"Job Title: {job_title}")
                    print(f"Posting Date: {posting_date}")            
                    print(f"Skills: {'|'.join(skills)}")
                    print(f"More Details: {more_details}\n")
            
            break  # Exit the loop if no exceptions were raised

        except requests.exceptions.ConnectionError:
            print("Connection error. Retrying in 10 seconds...")
            time.sleep(10)  # Wait before retrying
        except requests.exceptions.HTTPError as e:
            print(f"HTTP error: {e}. Retrying in 10 seconds...")
            time.sleep(10)
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            break  # Exit on other unexpected errors

# Condition to repeat the job search at a time interval
if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10  # Wait interval in minutes
        print(f'Waiting {time_wait} minutes before next search...')
        time.sleep(time_wait * 60)


Input Job keyword
Searching for keyword: intern
Company: Sparks To Ideas
Job Title: B.tech Project Internship
Posting Date: Posted today
Skills: php|python|react js|Javascript|Node js|Mern Stack|Cyber Security|Machine Learning|Artificial Intelligence|c ++
More Details: https://www.timesjobs.com/job-detail/b-tech-project-internship-sparks-to-ideas-ahmedabad-anand-bhavnagar-gandhinagar-junagarh-0-to-1-yrs-jobid-1bTPV5HLcglzpSvf__PLUS__uAgZw==&source=srp

Company: Hucon Solutions India Pvt ltd
Job Title: job openings for international voice
Posting Date: Posted today
Skills: good communication skills
More Details: https://www.timesjobs.com/job-detail/job-openings-for-international-voice-top-mnc-s-hyderabad-secunderabad-0-to-3-yrs-jobid-KUdXwhtUerNzpSvf__PLUS__uAgZw==&source=srp

Company: Sparks To Ideas
Job Title: App Development Live Project  Internship
Posting Date: Posted today
Skills: Dart|Android APK|Fire base|APK entygration|App design|IOS APK|Live application
More Details: https://
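
The retry logic in the exception-handling cells can be exercised without network access. The sketch below is an assumption-laden stand-in, not the notebook's own code: `fetch_with_retry` and `flaky_fetch` are hypothetical helpers, and the built-in `ConnectionError` is used in place of `requests.exceptions.ConnectionError`, which is a different class and is what you would catch when calling `requests.get`. Unlike the open-ended `while True` loop, it gives up after a fixed number of attempts.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, wait_seconds=0.0):
    """Call fetch() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ConnectionError as error:
            print(f'Attempt {attempt} failed: {error}')
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(wait_seconds)


# Hypothetical fetcher that fails twice before succeeding
calls = {'count': 0}


def flaky_fetch():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('simulated network failure')
    return '<html>ok</html>'


page = fetch_with_retry(flaky_fetch)
print(page)  # <html>ok</html>
```

Returning from inside the `try` block is what ends the loop on success; an error in one attempt only triggers a wait and another pass.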