Beautiful Soup is a Python library used for parsing HTML and XML documents. Here is the general syntax for using Beautiful Soup:

1. Importing the library:
```
from bs4 import BeautifulSoup
```
2. Creating a BeautifulSoup object:
```
soup = BeautifulSoup(html_string, 'html.parser')
```
* `html_string` is the string containing the HTML code.
* `'html.parser'` is the parser used to parse the HTML code. It ships with Python. You can also use the `'lxml'` or `'xml'` parsers, both of which require the `lxml` package to be installed.
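
As a minimal sketch of steps 1 and 2 together, the snippet below builds a soup from a small HTML string. The markup and the variable `heading_text` are made up for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only for this illustration
html_string = "<html><body><h1>Hello</h1><p class='intro'>First paragraph</p></body></html>"

# 'html.parser' is built into Python, so nothing extra needs to be installed
soup = BeautifulSoup(html_string, "html.parser")

# Tags can be reached as attributes of the soup object
heading_text = soup.h1.text
print(heading_text)
```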

3. Finding elements:
```
soup.find('tag_name')  # finds the first occurrence of the tag
soup.find_all('tag_name')  # finds all occurrences of the tag
soup.find('tag_name', {'attribute_name': 'attribute_value'})  # finds the first occurrence of the tag with the specified attribute
soup.find_all('tag_name', {'attribute_name': 'attribute_value'})  # finds all occurrences of the tag with the specified attribute
```
* `tag_name` is the name of the HTML tag you want to find.
* `attribute_name` and `attribute_value` are the name and value of the attribute you want to filter by.
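
The four calls above can be tried on a small document. This is a sketch on invented markup (the `data-id` attribute and the class names are assumptions for the example). It also shows the `class_` keyword, which the cells later in this notebook use, and the fact that `find` returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical list markup for illustration
html_string = """
<ul>
  <li class="item" data-id="1">Apple</li>
  <li class="item featured" data-id="2">Banana</li>
  <li class="other" data-id="3">Cherry</li>
</ul>
"""
soup = BeautifulSoup(html_string, "html.parser")

first_item = soup.find("li")                   # first <li> only
all_items = soup.find_all("li")                # every <li>
by_attr = soup.find("li", {"data-id": "2"})    # filter by attribute value
by_class = soup.find_all("li", class_="item")  # matches any <li> whose classes include "item"
missing = soup.find("table")                   # no match, so this is None

print(first_item.text, len(all_items), by_attr.text)
print([li.text for li in by_class], missing)
```

Because `find` can return `None`, chaining `.text` directly onto it raises `AttributeError` when the element is absent, so a check is worth adding when scraping pages whose layout may vary.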

4. Navigating the tree:
```
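soup.tag_name.parent  # returns the parent element
soup.tag_name.children  # returns an iterator over the direct children (tags and text nodes), not a list
soup.tag_name.next_sibling  # returns the next sibling node, which may be a whitespace string
soup.tag_name.previous_sibling  # returns the previous sibling node, which may be a whitespace string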
```
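
A short sketch of the navigation attributes on invented markup follows. The second half illustrates one common surprise: when the source HTML contains line breaks between tags, `next_sibling` yields the whitespace string, and `find_next_sibling` is one way to skip to the next tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no whitespace between the tags
soup = BeautifulSoup("<div><p>one</p><p>two</p><p>three</p></div>", "html.parser")

middle = soup.find_all("p")[1]
parent_name = middle.parent.name                            # name of the enclosing tag
child_texts = [child.text for child in soup.div.children]   # .children is an iterator
next_text = middle.next_sibling.text
prev_text = middle.previous_sibling.text

# Same idea, but with newlines between the tags
spaced = BeautifulSoup("<div>\n<p>a</p>\n<p>b</p>\n</div>", "html.parser")
raw_sibling = spaced.p.next_sibling             # the "\n" text node, not a tag
tag_sibling = spaced.p.find_next_sibling("p")   # skips text nodes

print(parent_name, child_texts, prev_text, next_text)
print(repr(raw_sibling), tag_sibling.text)
```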
5. Reading and modifying the tree:
```
soup.tag_name.string  # returns the tag's text if it has exactly one child string, otherwise None
soup.tag_name.text  # returns all text inside the tag, including text from nested elements
soup.tag_name.append(new_tag)  # adds new_tag as the last child of the tag
soup.tag_name.insert(0, new_tag)  # inserts new_tag as the first child of the tag
soup.tag_name.replace_with(new_tag)  # replaces the tag itself with new_tag
```
* `new_tag` is the tag you want to add or substitute, typically created with `soup.new_tag('tag_name')`.
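
The sketch below applies `append`, `insert`, and `replace_with` in turn to a tiny invented document, so the effect of each call can be seen in the final markup. The tag names and text are arbitrary choices for the example.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>old</p></div>", "html.parser")

# append: new last child of <div>
new_tag = soup.new_tag("span")
new_tag.string = "appended"
soup.div.append(new_tag)

# insert at position 0: new first child of <div>
first_tag = soup.new_tag("h2")
first_tag.string = "title"
soup.div.insert(0, first_tag)

# replace_with: swap the existing <p> for a new one
replacement = soup.new_tag("p")
replacement.string = "new"
soup.p.replace_with(replacement)

result = str(soup)
div_string = soup.div.string   # None, because <div> now has several children
div_text = soup.div.text       # concatenated text of all descendants

print(result)
print(div_string, div_text)
```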

6. Extracting data:
```
soup.get_text()  # returns the text content of the entire document
soup.find('tag_name').get_text()  # returns the text content of the specified tag
soup.find('tag_name').attrs  # returns a dictionary of the tag's attributes
```
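
As an illustration of these extraction calls, here is a sketch on an invented link element. The `href`, `id`, and link text are assumptions, and the last line mirrors the price-splitting idea used in the first notebook cell below.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
html_string = '<div><a href="/courses/python" id="link1">Start for 20$</a><p>  Intro  </p></div>'
soup = BeautifulSoup(html_string, "html.parser")

link = soup.find("a")
attrs = link.attrs           # dictionary of all attributes on the tag
href = link["href"]          # raises KeyError if the attribute is missing
title = link.get("title")    # returns None if the attribute is missing

# separator and strip tidy up whitespace between the extracted strings
all_text = soup.get_text(separator=" ", strip=True)

# Last word of the link text, minus its final character
price = link.text.split()[-1][:-1]

print(attrs, href, title)
print(all_text, price)
```
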
These are the basic syntax and methods for using Beautiful Soup. You can find more information and examples in the official Beautiful Soup documentation.

In [1]:
from bs4 import BeautifulSoup

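with open('home.html', 'r') as html_file:
    content = html_file.read()

# Parse after the file has been read; the file handle is no longer needed
soup = BeautifulSoup(content, 'lxml')
course_cards = soup.find_all('div', class_='card-body')
for course in course_cards:
    # The last word of the link text holds the price; its final character is dropped when printing
    course_price = course.a.text.split()[-1]
    print(f'The course {course.h5.text} cost ${course_price[:-1]}')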

The course Python for beginners cost $20
The course Python Web Development cost $50
The course Python Machine Learning cost $100


For practice, I will scrape data from a job listing website.
The details to extract are:

- Company name
- Job title
- Posting date
- Skills

This is just a simple extraction process.
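
Before hitting the live site, the extraction logic can be tried offline. The sketch below is only an approximation: the class names are the ones used in the cells that follow, but the sample markup, the `example.com` link, and the exact nesting are assumptions and may not match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical job card; the real page structure may differ
sample_html = """
<ul>
  <li class="clearfix job-bx wht-shd-bx">
    <h2><a href="https://example.com/job/1">Data Engineer</a></h2>
    <h3 class="joblist-comp-name">Example Corp</h3>
    <div class="more-skills-sections"><span>python</span><span>sql</span></div>
    <span class="sim-posted">Posted 2 days ago</span>
  </li>
</ul>
"""
soup = BeautifulSoup(sample_html, "html.parser")

jobs = []
for box in soup.find_all("li", class_="clearfix job-bx wht-shd-bx"):
    skills_div = box.find("div", class_="more-skills-sections")
    # Guard against cards that have no skills block
    spans = skills_div.find_all("span") if skills_div else []
    jobs.append({
        "company_name": box.find("h3", class_="joblist-comp-name").get_text(strip=True),
        "job_title": box.find("a").get_text(strip=True),
        "posting_date": box.find("span", class_="sim-posted").get_text(strip=True),
        "skills": "|".join(s.get_text(strip=True) for s in spans),
    })

print(jobs)
```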

In [None]:
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup


# Function to extract skills from within the span tags using regex
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        return [skill.strip() for skill in skills if skill.strip()]
    return []


response = requests.get('https://www.timesjobs.com/candidate/job-search.html?from=submit&luceneResultSize=25&txtKeywords=data&postWeek=60&searchType=personalizedSearch&actualTxtKeywords=data&searchBy=0&rdoOperator=OR&pDate=I&sequence=10&startPage=1')

if response.status_code == 200:
    html_text = response.text
    soup = BeautifulSoup(html_text, 'lxml')
    job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

    job_data = []

    for job_box in job_boxes:
        company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
        job_title = job_box.find('a').text.strip()
        posting_date = job_box.find('span', class_='sim-posted').text.strip()

        # Locate the skills container and pass it to the function
        skills_container = job_box.find_all('div', class_='more-skills-sections')
        skills = extract_skills(skills_container)

        # Store plain strings; wrapping each value in {...} would create one-element sets
        job_data.append(
            {
                'company_name': company_name,
                'job_title': job_title,
                'posting_duration': posting_date,
                'skills': '|'.join(skills),
            }
        )

    # Build the DataFrame and write the CSV once, after all jobs are collected
    df = pd.DataFrame(job_data)
    df.to_csv('job_data.csv', index=False)
else:
    print("Failed to retrieve the webpage.")

In [None]:
import html
import re

import requests
from bs4 import BeautifulSoup


# Function to extract skills from within the span tags using regex
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        # Decode entities such as &amp; rather than deleting the substring 'amp',
        # which would also corrupt any skill name containing those letters
        return [
            html.unescape(skill).strip().replace(' / ', '/').replace('**   ', '').replace('  **', '')
            for skill in skills
            if skill.strip()
        ]
    return []


response = requests.get('https://www.timesjobs.com/candidate/job-search.html?from=submit&luceneResultSize=25&txtKeywords=data&postWeek=60&searchType=personalizedSearch&actualTxtKeywords=data&searchBy=0&rdoOperator=OR&pDate=I&sequence=1&startPage=1')

if response.status_code == 200:
    html_text = response.text
    soup = BeautifulSoup(html_text, 'lxml')
    job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

    for job_box in job_boxes:
        posting_date = job_box.find('span', class_='sim-posted').text.strip()
        company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
        job_title = job_box.find('a').text.strip()

        # Locate the skills container and pass it to the function
        skills_container = job_box.find_all('div', class_='more-skills-sections')
        skills = extract_skills(skills_container)

        print(f"Company: {company_name}")
        print(f"Job Title: {job_title}")
        print(f"Posting Date: {posting_date}")
        print(f"Skills: {'|'.join(skills)}\n")
else:
    print("Failed to retrieve the webpage.")

In [None]:
import re
import time

import requests
from bs4 import BeautifulSoup

URL = 'https://www.timesjobs.com/candidate/job-search.html?from=submit&luceneResultSize=25&txtKeywords=data&postWeek=60&searchType=personalizedSearch&actualTxtKeywords=data&searchBy=0&rdoOperator=OR&pDate=I&sequence=1&startPage=1'


# Function to extract skills from within the span tags using regex
def extract_skills(skills_container):
    if skills_container:
        skills = re.findall(r'<span[^>]*>(.*?)</span>', str(skills_container), re.DOTALL)
        # Note: replace('amp', '') turns '&amp;' into '&;' and also strips 'amp' inside words
        return [skill.strip().replace(' / ', '/').replace('**   ', '').replace('  **', '').replace('amp', '') for skill in skills if skill.strip()]
    return []


def find_jobs():
    # Fetch the page on every call so each polling cycle sees fresh listings
    response = requests.get(URL)
    if response.status_code == 200:
        html_text = response.text
        soup = BeautifulSoup(html_text, 'lxml')
        job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')

        for job_box in job_boxes:
            posting_date = job_box.find('span', class_='sim-posted').text.strip()
            company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
            job_title = job_box.find('a').text.strip()
            more_details = job_box.a['href']

            # Locate the skills container and pass it to the function
            skills_container = job_box.find_all('div', class_='more-skills-sections')
            skills = extract_skills(skills_container)

            print(f"Company: {company_name}")
            print(f"Job Title: {job_title}")
            print(f"Posting Date: {posting_date}")
            print(f"Skills: {'|'.join(skills)}")
            print(f"more_details: {more_details}\n")
    else:
        print("Failed to retrieve the webpage.")


if __name__ == '__main__':
    while True:
        find_jobs()
        time_wait = 10
        # The sleep below is in seconds, so multiply by 60 to wait in minutes
        print(f'waiting {time_wait} minutes...')
        time.sleep(time_wait * 60)

Company: IBM India Pvt Ltd
Job Title: Data Engineer: Data Integration
Posting Date: Posted 2 days ago
Skills: data integration expertise|etl / elt tools|sql  &;  big data|unix shell scripting|python programming|data warehousing|storage|database|data structures|software engineering|information management|machine learning|splunk
more_details: https://www.timesjobs.com/job-detail/data-engineer-data-integration-ibm-india-pvt-ltd-pune-5-to-7-yrs-jobid-FL6Ob2Ed2i5zpSvf__PLUS__uAgZw==&source=srp

Company: IBM India Pvt Ltd
Job Title: Data Engineer: Data Modeling
Posting Date: few days ago
Skills: data modeling techniques|azure data factory|data migration solutions|database schema design|data pipeline management|sql|postgresql|software engineering
more_details: https://www.timesjobs.com/job-detail/data-engineer-data-modeling-ibm-india-pvt-ltd-kolkata-5-to-8-yrs-jobid-sQsHHL9ImehzpSvf__PLUS__uAgZw==&source=srp

Company: IBM India Pvt Ltd
Job Title: Data Engineer: Data Modeling
Posting Date: few

In [33]:
import requests
from bs4 import BeautifulSoup

def extract_skills(skills_container):
    # Return an empty list when the job box has no skills container
    if skills_container is None:
        return []
    # Extract visible and hidden skill elements
    skill_elements = skills_container.find_all('span')
    # Clean and collect text of each skill
    skills = [skill.text.strip() for skill in skill_elements if skill.text.strip()]
    return skills

response = requests.get('https://www.timesjobs.com/candidate/job-search.html?from=submit&luceneResultSize=25&txtKeywords=data&postWeek=60&searchType=personalizedSearch&actualTxtKeywords=data&searchBy=0&rdoOperator=OR&pDate=I&sequence=10&startPage=1')

if response.status_code == 200:
    html_text = response.text
    soup = BeautifulSoup(html_text, 'lxml')
    # Each job card is an <li>; the skills block is looked up inside it further down
    job_boxes = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')
    
    for job_box in job_boxes:
        company_name = job_box.find('h3', class_='joblist-comp-name').text.strip()
        job_title = job_box.find('a').text.strip()
        posting_date = job_box.find('span', class_='sim-posted').text.strip()

        # Locate the skills container and pass it to the function
        skills_container = job_box.find('div', class_='more-skills-sections')
        skills = extract_skills(skills_container)

        print(f"Company: {company_name}")
        print(f"Job Title: {job_title}")
        print(f"Posting Date: {posting_date}")
        print(f"Skills: {', '.join(skills)}\n")
else:
    print("Failed to retrieve the webpage.")
