# 1. Import Libraries

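Importing `BeautifulSoup` to parse HTML, `requests` to download the webpage, and `pandas` to save the final data. `html5lib` is the parser that `BeautifulSoup` uses in step 3. It is never called directly, but importing it makes the notebook fail immediately if the package is missing.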

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests
import html5lib

# 2. Download the Webpage

I'll specify the URL of the webpage to be scraped and use `requests.get()` to download its HTML content.

In [2]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"
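response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on HTTP errors such as 404 or 500
data = response.text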

# 3. Parse the HTML and Scrape Data

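I'll create a `BeautifulSoup` object to parse the HTML. Then, I'll find the first `<table>` element on the page and loop through each of its rows (`<tr>`), skipping the header row.

For each row, I'll extract the text from the 2nd column (Language) and the 4th column (Average Salary). These are indices 1 and 3 because Python lists are zero-based. I'll then strip the `$` sign and thousands separators from the salary and append the `[language, salary]` pair to a list.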

In [3]:
soup = BeautifulSoup(data, 'html5lib')
table = soup.find('table')
popular_languages = []

# Loop through each row in the table (skipping the header row)
for row in table.find_all('tr')[1:]:
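    cols = row.find_all('td')
    if len(cols) < 4:
        continue  # skip short or malformed rows instead of raising IndexError

    # get_text(strip=True) drops surrounding whitespace and newlines
    language = cols[1].get_text(strip=True)
    avg_salary = cols[3].get_text(strip=True)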

    # Clean the salary data (remove $ and ,)
    cleaned_salary = avg_salary.replace('$', '').replace(',', '')

    popular_languages.append([language, cleaned_salary])
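
Because the loop above depends on a live download, it can be handy to wrap the same extraction logic in a function and try it on a small inline table. This is a sketch, not the notebook's code: the function name `scrape_rows`, the made-up sample rows (`LangA`, `LangB`), and the use of BeautifulSoup's built-in `"html.parser"` instead of `html5lib` are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a made-up table whose 2nd column holds the language
# and whose 4th column holds the salary, matching the indices used above.
SAMPLE_HTML = """
<table>
  <tr><td>No</td><td>Language</td><td>Created By</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td>LangA</td><td>Someone</td><td>$100,000</td></tr>
  <tr><td>2</td><td>LangB</td><td>Someone else</td><td>$95,500</td></tr>
</table>
"""


def scrape_rows(html):
    """Return [language, salary] pairs from the first <table> in `html` (hypothetical helper)."""
    # Assumption: "html.parser" ships with Python, so html5lib is not required here
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cols = row.find_all("td")
        if len(cols) < 4:
            continue  # ignore rows without the expected columns
        language = cols[1].get_text(strip=True)
        salary = cols[3].get_text(strip=True).replace("$", "").replace(",", "")
        rows.append([language, salary])
    return rows


print(scrape_rows(SAMPLE_HTML))
# [['LangA', '100000'], ['LangB', '95500']]
```

On a well-formed table like this one, `html.parser` and `html5lib` should yield the same rows; `html5lib` is more forgiving of broken markup, which is a reasonable reason to keep it for the real page.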

# 4. Inspect the Scraped Data

Displaying the first 5 rows of the scraped data to verify it was collected correctly.

In [4]:
print(popular_languages[:5])

[['Python', '114383'], ['Java', '101013'], ['R', '92037'], ['Javascript', '110981'], ['Swift', '130801']]
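
The salaries are still strings at this point (`'114383'`, not `114383`). If any numeric analysis is planned, a quick sanity check and conversion could look like the sketch below. It uses only the five rows printed above; the variable names `rows`, `numeric`, and `highest` are illustrative assumptions, not part of the notebook.

```python
# The first five rows exactly as printed above
rows = [['Python', '114383'], ['Java', '101013'], ['R', '92037'],
        ['Javascript', '110981'], ['Swift', '130801']]

# After cleaning, every salary should consist only of digits
assert all(salary.isdigit() for _, salary in rows)

# Convert to integers so the values can be compared and aggregated
numeric = [(language, int(salary)) for language, salary in rows]

# Find the language with the highest average salary among these rows
highest = max(numeric, key=lambda pair: pair[1])
print(highest)  # ('Swift', 130801)
```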


# 5. Save Data to CSV

Finally, I'll convert the list into a pandas DataFrame and save it to a new CSV file.

In [5]:
# Define column names for the DataFrame
columns = ['Language', 'Average Annual Salary']

df = pd.DataFrame(popular_languages, columns=columns)

# Save to CSV
df.to_csv('2.b-popular-languages(Collected from WebScraping).csv', index=False)

print("File saved successfully.")
df.head()

File saved successfully.


|   | Language | Average Annual Salary |
|---|----------|-----------------------|
| 0 | Python | 114383 |
| 1 | Java | 101013 |
| 2 | R | 92037 |
| 3 | Javascript | 110981 |
| 4 | Swift | 130801 |
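
The `index=False` argument matters when the file is read back later. The sketch below writes a two-row DataFrame to an in-memory buffer (an assumption, to avoid touching the saved file) both with and without the index, then reloads each with `pd.read_csv`. Written with the index, the CSV gains an extra column that pandas names `Unnamed: 0`.

```python
import io

import pandas as pd

# Two rows taken from the scraped output above
df = pd.DataFrame([['Python', '114383'], ['Java', '101013']],
                  columns=['Language', 'Average Annual Salary'])

# Written WITH the index: the header starts with an empty field,
# so read_csv names that column "Unnamed: 0"
with_index = io.StringIO(df.to_csv())
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'Language', 'Average Annual Salary']

# Written with index=False (as in the cell above): the columns round-trip cleanly
without_index = io.StringIO(df.to_csv(index=False))
print(pd.read_csv(without_index).columns.tolist())
# ['Language', 'Average Annual Salary']
```

Note also that `read_csv` infers column types, so numeric-looking strings such as `'114383'` come back as integers when the CSV is reloaded.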
