# Collecting Data via Web Scraping Using BeautifulSoup and Requests


In this notebook, we will:

- Extract job-related information from a website using web scraping  
- Save the collected data into a CSV file for future analysis


## Extract information from a web site

We will extract job data from the following website:


In [1]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"

The data we’ll scrape includes the programming language names and their average annual salaries.  
Before writing the scraping logic, it’s good practice to open the URL in a browser and inspect the page structure (for example, with the browser’s developer tools) to see where the required data is located.
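You can also inspect the structure from Python. The sketch below prints the first row of every `<table>` together with each cell's column index, which is how you would find out which positions hold the language name and the salary. The HTML string is a hypothetical stand-in: it only assumes the language sits in column 1 and the salary in column 3, as the extraction code later in this notebook does. Columns 0 and 2 are made-up placeholders, and `describe_tables` is a hypothetical helper name. With the real page you would pass the downloaded HTML instead of `SAMPLE_HTML`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample shaped like the target table: language in column 1,
# salary in column 3. Columns 0 and 2 are placeholders, not the real page.
SAMPLE_HTML = """
<table>
  <tr><td>No.</td><td>Language</td><td>Other</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td>Python</td><td>-</td><td>$114,383</td></tr>
</table>
"""


def describe_tables(html):
    """Return, for each <table>, a list of (index, text) pairs for its first row."""
    soup = BeautifulSoup(html, "html.parser")
    summary = []
    for table in soup.find_all("table"):
        first_row = table.find("tr")
        if first_row is None:
            summary.append([])  # table without any rows
            continue
        cells = first_row.find_all(["td", "th"])
        summary.append([(i, cell.get_text(strip=True)) for i, cell in enumerate(cells)])
    return summary


for n, header in enumerate(describe_tables(SAMPLE_HTML)):
    print(f"table {n}: {header}")
```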


We’ll start by importing the necessary Python libraries for web scraping and data handling.


In [5]:
# Import required libraries
from bs4 import BeautifulSoup
import pandas as pd
import requests

We’ll download the HTML content of the target webpage using the `requests` library to begin extracting the required data.


In [4]:
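# Get the HTML content from the webpage (time out instead of hanging, fail early on HTTP errors)
response = requests.get(url, timeout=30)
response.raise_for_status()
data = response.text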
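By default, `requests.get` waits indefinitely when no `timeout` is given and does not raise on 4xx/5xx status codes; `raise_for_status()` turns those into a `requests.HTTPError`. The sketch below wraps that pattern in a hypothetical `fetch_html` helper and exercises it against a tiny local server started only for the demonstration (the server, its paths, and the 10-second timeout are all assumptions, not part of the original lab), so you can see both a successful fetch and a 404.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class DemoHandler(BaseHTTPRequestHandler):
    """Local demo server: '/' returns a small HTML table, any other path returns 404."""

    def do_GET(self):
        if self.path == "/":
            body = b"<table><tr><td>Language</td></tr></table>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_html(url, timeout=10):
    """Download a page and return its text, raising requests.HTTPError on 4xx/5xx."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


# Port 0 lets the OS pick a free port; the thread is a daemon so it exits with Python.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

print(fetch_html(base + "/"))

missing_status = None
try:
    fetch_html(base + "/missing")
except requests.HTTPError as err:
    missing_status = err.response.status_code
print("HTTP error status:", missing_status)
```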

We’ll create a BeautifulSoup object to parse the HTML content and make it easier to navigate and extract the required data elements.


In [7]:
soup = BeautifulSoup(data, "html.parser")   # Parse the HTML content
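To see what the parsed tree offers, here is a small sketch on a hypothetical two-row table: `find` returns the first matching tag (or `None` if there is none), `find_all` returns a list-like `ResultSet` of every match, and `get_text()` returns the text inside a tag. The `getText()` method used in this notebook is an alias of `get_text()`.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table used only to illustrate navigation.
html = """
<table>
  <tr><td>Language</td><td>Average Annual Salary</td></tr>
  <tr><td>Go</td><td>$94,082</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")        # first <table> element, or None if absent
rows = table.find_all("tr")       # every <tr> inside that table
cells = rows[1].find_all("td")    # the <td> cells of the second row

print(len(rows))
print([cell.get_text() for cell in cells])
print(soup.find("ul"))            # the snippet has no <ul>, so this prints None
```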

We’ll scrape the `Language` and `Average Annual Salary` columns from the table in the HTML content.


In [10]:
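# Extract language name and average salary from the table
table = soup.find('table')
popular_languages = []

for row in table.find_all('tr'):
    cols = row.find_all('td')
    if len(cols) < 4:
        continue  # skip rows without enough <td> cells (e.g. a <th>-only header row)
    language = cols[1].getText()     # column 1: language name
    avg_salary = cols[3].getText()   # column 3: average annual salary
    # Remove the dollar sign and thousands separators so only digits remain
    popular_languages.append(language + ' , ' + avg_salary.replace('$', '').replace(',', ''))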

popular_languages

['Language , Average Annual Salary',
 'Python , 114383',
 'Java , 101013',
 'R , 92037',
 'Javascript , 110981',
 'Swift , 130801',
 'C++ , 113865',
 'C# , 88726',
 'PHP , 84727',
 'SQL , 84793',
 'Go , 94082']
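The loop above relies on hard-coded positions (`cols[1]`, `cols[3]`) and keeps the header row as an ordinary data entry. A minimal alternative sketch, assuming the first `<tr>` holds the column names (as the output above suggests), looks up the positions by header text and converts the salary to an integer, producing a two-column DataFrame directly. The function `extract_salaries` and the sample HTML are hypothetical; with the live page you would pass `data` instead.

```python
import pandas as pd
from bs4 import BeautifulSoup


def extract_salaries(html, name_col="Language", salary_col="Average Annual Salary"):
    """Return a DataFrame of language names and integer salaries.

    Assumes the first <tr> of the first <table> holds the column headers.
    """
    table = BeautifulSoup(html, "html.parser").find("table")
    rows = table.find_all("tr")
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["td", "th"])]
    name_idx = headers.index(name_col)      # raises ValueError if the header is missing
    salary_idx = headers.index(salary_col)

    records = []
    for row in rows[1:]:
        cols = row.find_all("td")
        if len(cols) <= max(name_idx, salary_idx):
            continue  # skip rows that are too short to contain both values
        salary_text = cols[salary_idx].get_text(strip=True)
        records.append({
            name_col: cols[name_idx].get_text(strip=True),
            salary_col: int(salary_text.replace("$", "").replace(",", "")),
        })
    return pd.DataFrame(records, columns=[name_col, salary_col])


# Hypothetical sample; columns 0 and 2 are placeholders.
SAMPLE_HTML = """
<table>
  <tr><td>No.</td><td>Language</td><td>Other</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td>Python</td><td>-</td><td>$114,383</td></tr>
  <tr><td>2</td><td>Java</td><td>-</td><td>$101,013</td></tr>
</table>
"""

print(extract_salaries(SAMPLE_HTML))
```

Looking columns up by name means the code keeps working if the page reorders its columns, and it fails loudly with a `ValueError` if a header is renamed, rather than silently reading the wrong cell.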

We’ll save the scraped data into a CSV file named `2.b-popular-languages(Collected from WebScraping).csv` for future use and analysis.


In [11]:
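# Split each "name , salary" entry into two fields; the first entry holds the column names
rows = [entry.split(' , ') for entry in popular_languages]
df = pd.DataFrame(rows[1:], columns=rows[0])
df.to_csv("2.b-popular-languages(Collected from WebScraping).csv", index=False)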
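A quick way to check that a CSV has the intended columns is to write it and read it back. This sketch uses an in-memory `io.StringIO` buffer in place of the real file and a hypothetical two-row DataFrame in the shape of the scraped data; to check the actual file, you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical two-row DataFrame in the shape of the scraped data.
df = pd.DataFrame(
    {"Language": ["Python", "Java"], "Average Annual Salary": [114383, 101013]}
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # header line, then one line per language
csv_text = buffer.getvalue()
print(csv_text)

# Read it back to confirm the two columns and their values survived the round trip.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```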

---