# **Hands-on Lab: Web Scraping**


Estimated time needed: **30 to 45** minutes


## Objectives


In this lab you will perform the following:


* Extract information from a given website.
* Write the scraped data to a CSV file.


## Extract information from the given web site
You will extract the data from the website below:


In [1]:
#this url contains the data you need to scrape
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"

The data you need to scrape is the **name of the programming language** and the **average annual salary**.<br> It is a good idea to open the URL in your web browser and study the contents of the web page before you start to scrape.


Import the required libraries


In [2]:
from bs4 import BeautifulSoup # this module helps in web scraping.
import requests  # this module helps us to download a webpage

Download the webpage at the url


In [3]:
#this url contains the data you need to scrape
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"

# get the contents of the webpage in text format and store in a variable called data
data  = requests.get(url).text 
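
The download step above assumes the request succeeds. As a minimal sketch of a more defensive variant, the helper below adds a timeout and stops on HTTP error codes. The function name `download_page` and the 10-second default are assumptions for illustration, not part of the lab.

```python
import requests


def download_page(url, timeout=10):
    """Return the body of the page at `url` as text, raising if the request fails."""
    # timeout (in seconds) keeps the call from hanging on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, `data = download_page(url)` could stand in for `requests.get(url).text`, and a failed download would raise an exception instead of passing an error page on to the parser.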

Create a soup object


In [4]:
soup = BeautifulSoup(data,"html.parser")  # create a soup object using the variable 'data'

In [5]:
# find an HTML table in the web page
table = soup.find('table') # in HTML, a table is represented by the tag <table>
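
To see what `find` and `find_all` return without touching the network, here is a small sketch that runs on a hand-written HTML snippet. The snippet is a hypothetical stand-in for the real page: only the positions of the language and salary columns mirror the lab, and the `-` cells are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page; the "-" cells are placeholders for columns the lab does not use
sample_html = """
<html><body>
<table>
  <tr><td>-</td><td>Language</td><td>-</td><td>Average Annual Salary</td></tr>
  <tr><td>-</td><td>Python</td><td>-</td><td>$114,383</td></tr>
</table>
</body></html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None when nothing matches
sample_table = sample_soup.find("table")
missing_table = BeautifulSoup("<p>no table here</p>", "html.parser").find("table")
print(missing_table is None)  # True

# find_all() returns a list of every matching tag inside the element
sample_rows = sample_table.find_all("tr")
second_row_cells = [td.get_text() for td in sample_rows[1].find_all("td")]
print(len(sample_rows))     # 2
print(second_row_cells[1])  # Python
```

Because `find` returns `None` when the tag is absent, checking the result before calling `find_all` on it avoids an `AttributeError` on pages that have no table.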

Scrape the `Language name` and `annual average salary`.


In [6]:
# print the language name and average annual salary from every table row
for row in table.find_all('tr'): # in HTML, a table row is represented by the tag <tr>
    # Get all columns in each row.
    cols = row.find_all('td') # in HTML, a table cell is represented by the tag <td>
    language = cols[1].getText() # store the value in column 2 as language
    avg_salary = cols[3].getText() # store the value in column 4 as avg_salary
    print("{}--->{}".format(language, avg_salary))

Language--->Average Annual Salary
Python--->$114,383
Java--->$101,013
R--->$92,037
Javascript--->$110,981
Swift--->$130,801
C++--->$113,865
C#--->$88,726
PHP--->$84,727
SQL--->$84,793
Go--->$94,082
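
The first line of the output is the table's own header row, and the salaries are strings such as `$114,383`. If you want to compare or sort them, one possible approach is sketched below. The helper name `parse_salary` is an assumption, and the sample rows are copied from the output above.

```python
def parse_salary(text):
    """Convert a salary string such as '$114,383' to the integer 114383."""
    return int(text.strip().replace("$", "").replace(",", ""))


# A few rows as printed above; the first tuple is the header row
scraped_rows = [
    ("Language", "Average Annual Salary"),
    ("Python", "$114,383"),
    ("Swift", "$130,801"),
]

# Split the header row off from the data rows
header, *records = scraped_rows

# Map each language to its salary as an integer
salaries = {language: parse_salary(salary) for language, salary in records}
highest_paid = max(salaries, key=salaries.get)

print(salaries["Python"])  # 114383
print(highest_paid)        # Swift
```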


Save the scraped data into a file named *popular-languages.csv*


In [None]:
import csv

# Create empty lists to store the results
languages_list = []
salaries_list = []

# Skip the first row: it holds the column titles, and the CSV header is written separately below
for row in table.find_all('tr')[1:]:
    cols = row.find_all('td')

    # Ensure the row has at least four columns before reading cols[3]
    if len(cols) > 3:
        language = cols[1].getText().strip()  # Extract language name
        avg_salary = cols[3].getText().strip()  # Extract salary value
        
        languages_list.append(language)
        salaries_list.append(avg_salary)

# Save the data to CSV
with open('popular-languages.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    
    # Write header row
    writer.writerow(['Language', 'Average Salary'])
    
    # Write each language and corresponding average salary
    for language, avg_salary in zip(languages_list, salaries_list):
        writer.writerow([language, avg_salary])

print("Data has been saved to 'popular-languages.csv'.")


Data has been saved to 'popular-languages.csv'.
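
To check that a file written this way reads back correctly, you can load it with `csv.DictReader`. The sketch below is self-contained: it writes two sample rows to a temporary directory rather than reading the lab's actual output file, and the variable names are assumptions.

```python
import csv
import os
import tempfile

# Two sample rows taken from the scraped output
sample_rows = [("Python", "$114,383"), ("Java", "$101,013")]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "popular-languages.csv")

    # Write the header and data rows the same way as the cell above
    with open(path, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Language", "Average Salary"])
        writer.writerows(sample_rows)

    # Read the file back; DictReader uses the first row as the dictionary keys
    with open(path, newline="", encoding="utf-8") as f:
        csv_records = list(csv.DictReader(f))

print(len(csv_records))                   # 2
print(csv_records[0]["Language"])         # Python
print(csv_records[0]["Average Salary"])   # $114,383
```

The salary values contain commas, so `csv.writer` wraps them in quotes in the file, and `csv.DictReader` removes the quotes again on reading. Splitting the lines manually on `,` would break these values apart.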
