# **Hands-on Lab: Web Scraping**


Estimated time needed: **30 to 45** minutes


## Objectives


In this lab you will perform the following:


* Extract information from a given website.
* Write the scraped data to a CSV file.


## Extract information from the given website
You will extract the data from the website below:


In [1]:
#this url contains the data you need to scrape
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"

The data you need to scrape is the **name of the programming language** and the **average annual salary**. It is a good idea to open the URL in your web browser and study the contents of the web page before you start to scrape.


Import the required libraries


In [10]:
# requests downloads the page; BeautifulSoup parses the HTML
import requests
from bs4 import BeautifulSoup

Download the webpage at the url


In [11]:
# Download the webpage at the url
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the webpage content
    soup = BeautifulSoup(response.content, 'html.parser')

    # Find all table rows
    rows = soup.find_all('tr')

    # Initialize lists to store programming languages and average annual salaries
    languages = []
    salaries = []

    # Loop through each row and extract data
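    for row in rows[1:]:  # Skipping the header row
        cols = row.find_all('td')
        if len(cols) < 4:
            continue  # Skip rows that lack the expected columns
        language = cols[1].text.strip()  # Second column: language name
        salary = cols[3].text.strip()  # Fourth column: average annual salary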

        languages.append(language)
        salaries.append(salary)

    # Print the extracted data
    for language, salary in zip(languages, salaries):
        print(f"Programming Language: {language}, Average Annual Salary: {salary}")
else:
    print("Failed to retrieve the webpage")


Programming Language: Python, Average Annual Salary: $114,383
Programming Language: Java, Average Annual Salary: $101,013
Programming Language: R, Average Annual Salary: $92,037
Programming Language: Javascript, Average Annual Salary: $110,981
Programming Language: Swift, Average Annual Salary: $130,801
Programming Language: C++, Average Annual Salary: $113,865
Programming Language: C#, Average Annual Salary: $88,726
Programming Language: PHP, Average Annual Salary: $84,727
Programming Language: SQL, Average Annual Salary: $84,793
Programming Language: Go, Average Annual Salary: $94,082


Create a soup object


In [12]:
from bs4 import BeautifulSoup

# 'response' holds the page downloaded in the previous cell
soup = BeautifulSoup(response.content, 'html.parser')

# Now 'soup' is the BeautifulSoup object representing the webpage content

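Before picking column positions, it can help to check which header sits at which index. The sketch below does this on a small inline sample rather than the live page, so it runs offline. The markup, the names `sample_html`, `sample_soup` and `column_index`, and the header labels other than the language and salary columns are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page (assumed layout)
sample_html = """
<table>
  <tr><td>No.</td><td>Language</td><td>Created By</td><td>Average Annual Salary</td><td>Learning Difficulty</td></tr>
  <tr><td>1</td><td>Python</td><td>n/a</td><td>$114,383</td><td>n/a</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# The first row is treated as the header; accept either <td> or <th> cells
header_cells = sample_soup.find("tr").find_all(["td", "th"])

# Map each header label to its position in the row
column_index = {cell.get_text(strip=True): i for i, cell in enumerate(header_cells)}
print(column_index)
```

If the real page follows the same layout, the language name would be at index 1 and the salary at index 3, which matches the indices used in this lab.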
Scrape the language name and the average annual salary from each table row.


In [13]:
# Find all table rows
rows = soup.find_all('tr')

# Initialize lists to store programming languages and average annual salaries
languages = []
salaries = []

# Loop through each row and extract data
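for row in rows[1:]:  # Skipping the header row
    cols = row.find_all('td')
    if len(cols) < 4:
        continue  # Skip rows that lack the expected columns
    language = cols[1].text.strip()  # Second column: language name
    salary = cols[3].text.strip()  # Fourth column: average annual salary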

    languages.append(language)
    salaries.append(salary)

# Print the extracted data
for language, salary in zip(languages, salaries):
    print(f"Language: {language}, Average Annual Salary: {salary}")


Language: Python, Average Annual Salary: $114,383
Language: Java, Average Annual Salary: $101,013
Language: R, Average Annual Salary: $92,037
Language: Javascript, Average Annual Salary: $110,981
Language: Swift, Average Annual Salary: $130,801
Language: C++, Average Annual Salary: $113,865
Language: C#, Average Annual Salary: $88,726
Language: PHP, Average Annual Salary: $84,727
Language: SQL, Average Annual Salary: $84,793
Language: Go, Average Annual Salary: $94,082
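The salaries above are scraped as text such as `$114,383`. If you later want to sort or average them, one possible approach is to strip the currency symbol and thousands separator first. The helper name `parse_salary` is hypothetical and not part of the lab; the sample values are copied from the output above.

```python
def parse_salary(text):
    """Convert a string such as '$114,383' to the integer 114383."""
    return int(text.replace("$", "").replace(",", "").strip())


# A few values taken from the scraped output
sample_salaries = ["$114,383", "$101,013", "$92,037"]

numeric = [parse_salary(s) for s in sample_salaries]
print(numeric)
print(max(numeric))
```

This sketch assumes every value uses a dollar sign and comma separators; other formats would need extra handling.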


Save the scraped data into a file named *popular-languages.csv*.


In [15]:
import csv

# Open the CSV file in write mode
with open('popular-languages.csv', 'w', newline='') as csvfile:
    # Create a CSV writer object
    writer = csv.writer(csvfile)

    # Write the header row
    writer.writerow(['Language', 'Average Annual Salary'])

    # Write the data rows
    for language, salary in zip(languages, salaries):
        writer.writerow([language, salary])

print("Data saved successfully to popular-languages.csv")

Data saved successfully to popular-languages.csv
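To confirm that the file holds what you expect, you could read it back with `csv.reader`. The sketch below is self-contained: it writes two sample rows (taken from the output above) to a temporary directory, an assumption made so the example does not touch your real file, and then reads them back.

```python
import csv
import os
import tempfile

# Sample rows from the scraped output; the temporary path is for illustration only
sample_rows = [("Python", "$114,383"), ("Java", "$101,013")]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "popular-languages.csv")

    # Write the header and the sample rows
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Language", "Average Annual Salary"])
        writer.writerows(sample_rows)

    # Read everything back as a list of rows
    with open(path, newline="") as csvfile:
        read_back = list(csv.reader(csvfile))

print(read_back)
```

Because the salary strings contain a comma, `csv.writer` quotes them when writing, and `csv.reader` restores the original text when reading.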


In [17]:
# List the files in the current directory
import os
os.listdir('.')

['.config', 'popular-languages.csv', 'sample_data']