# **Web Scraping**


## Objectives


In this project you will perform the following:


* Extract information from a given website.
* Write the scraped data to a CSV file.


## Extract information from the given website
You will extract the data from the website below:


In [1]:
# This url contains the data you need to scrape
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"

The data you need to scrape are the **name of the programming language** and the **average annual salary**. It is a good idea to open the URL in your web browser and study the contents of the page before you start scraping.


In [2]:
# Import the required libraries

from bs4 import BeautifulSoup  # parses the downloaded HTML for web scraping
import requests  # downloads the web page
import pandas as pd

In [3]:
# Download the webpage
data = requests.get(url).text
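
`requests.get(url).text` returns whatever the server sends back, including the body of an error page. A slightly more defensive download, sketched below, sets a timeout and calls `raise_for_status()` so that a failed request stops the notebook early. The function name `fetch_html` and the 10-second timeout are assumptions made for illustration; they are not part of the lab.

```python
import requests


def fetch_html(page_url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(page_url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

Under these assumptions, `data = fetch_html(url)` would take the place of the one-line download.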

In [4]:
# Create object - soup
soup = BeautifulSoup(data,"html.parser")

In [5]:
# Find the first HTML table in the web page
table = soup.find('table')

# Verify the data
print(table)

<table>
<tbody>
<tr>
<td>No.</td>
<td>Language</td>
<td>Created By</td>
<td>Average Annual Salary</td>
<td>Learning Difficulty</td>
</tr>
<tr>
<td>1</td>
<td>Python</td>
<td>Guido van Rossum</td>
<td>$114,383</td>
<td>Easy</td>
</tr>
<tr>
<td>2</td>
<td>Java</td>
<td>James Gosling</td>
<td>$101,013</td>
<td>Easy</td>
</tr>
<tr>
<td>3</td>
<td>R</td>
<td>Robert Gentleman, Ross Ihaka</td>
<td>$92,037</td>
<td>Hard</td>
</tr>
<tr>
<td>4</td>
<td>Javascript</td>
<td>Netscape</td>
<td>$110,981</td>
<td>Easy</td>
</tr>
<tr>
<td>5</td>
<td>Swift</td>
<td>Apple</td>
<td>$130,801</td>
<td>Easy</td>
</tr>
<tr>
<td>6</td>
<td>C++</td>
<td>Bjarne Stroustrup</td>
<td>$113,865</td>
<td>Hard</td>
</tr>
<tr>
<td>7</td>
<td>C#</td>
<td>Microsoft</td>
<td>$88,726</td>
<td>Hard</td>
</tr>
<tr>
<td>8</td>
<td>PHP</td>
<td>Rasmus Lerdorf</td>
<td>$84,727</td>
<td>Easy</td>
</tr>
<tr>
<td>9</td>
<td>SQL</td>
<td>Donald D. Chamberlin, Raymond F. Boyce.</td>
<td>$84,793</td>
<td>Easy</td>
</tr>
<tr>
<td>10</t

In [6]:
# Scrape the language name and the average annual salary.

for row in table.find_all('tr'):  # in HTML, a table row is represented by the <tr> tag
    # Get all cells (columns) in the row.
    cols = row.find_all('td')  # in HTML, a table cell is represented by the <td> tag
    Language = cols[1].getText()  # second column: language name
    AvgAS = cols[3].getText().replace(",", "").replace("$", "")  # fourth column: salary without "$" and ","
    print("{},{}".format(Language, AvgAS))

Language,Average Annual Salary
Python,114383
Java,101013
R,92037
Javascript,110981
Swift,130801
C++,113865
C#,88726
PHP,84727
SQL,84793
Go,94082
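
The cleaning step in the loop above (stripping `$` and `,` from the salary text) can be pulled into a small helper so that salaries become integers and the header row is kept apart from the data. The following is a minimal sketch, not part of the original lab: `parse_salary` and `rows_to_records` are hypothetical helper names, and the input is assumed to be cell texts already extracted with `getText()`, shown here as plain lists copied from the table above.

```python
def parse_salary(text):
    """Convert a salary string such as '$114,383' to the integer 114383."""
    return int(text.replace("$", "").replace(",", "").strip())


def rows_to_records(rows):
    """Turn rows of cell texts into (language, salary) tuples.

    The first row is assumed to be the header and is skipped.
    """
    records = []
    for cells in rows[1:]:
        language = cells[1].strip()      # second column: language name
        salary = parse_salary(cells[3])  # fourth column: average annual salary
        records.append((language, salary))
    return records


# Cell texts for the header and the first two data rows of the table above.
sample_rows = [
    ["No.", "Language", "Created By", "Average Annual Salary", "Learning Difficulty"],
    ["1", "Python", "Guido van Rossum", "$114,383", "Easy"],
    ["2", "Java", "James Gosling", "$101,013", "Easy"],
]

records = rows_to_records(sample_rows)
print(records)
```

With integer salaries, the records can be sorted or compared numerically, for example with `max(records, key=lambda r: r[1])`.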


In [7]:
# Save the scraped data (Language, Average Annual Salary) into a file named popular-languages.csv
with open("popular-languages.csv", "w") as file:  # the with block closes the file automatically
    for row in table.find_all('tr'):  # in HTML, a table row is represented by the <tr> tag
        # Get all cells (columns) in the row.
        cols = row.find_all('td')  # in HTML, a table cell is represented by the <td> tag
        Language = cols[1].getText()  # second column: language name
        AvgAS = cols[3].getText().replace(",", "").replace("$", "")  # fourth column: salary without "$" and ","
        file.write("{},{}\n".format(Language, AvgAS))  # no space after the comma, so values have no leading blank

print("CSV file saved successfully:", file.name)

CSV file saved successfully: popular-languages.csv
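
As an alternative to formatting each line by hand, Python's standard `csv` module can write the rows, and it quotes any field that itself contains a comma (as some `Created By` values in the table do). This is a sketch under stated assumptions rather than the lab's own solution: the `records` list is hard-coded from the output above, and the file name `popular-languages-sketch.csv` is hypothetical, placed in a temporary directory so that the lab's file is not overwritten.

```python
import csv
import tempfile
from pathlib import Path

# Hard-coded sample taken from the scraped output above (assumption for this sketch).
records = [
    ("Python", 114383),
    ("Java", 101013),
    ("R", 92037),
]

# Hypothetical output location; a temporary directory avoids overwriting the lab's file.
out_path = Path(tempfile.gettempdir()) / "popular-languages-sketch.csv"

# newline="" lets the csv module control line endings itself.
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Language", "Average Annual Salary"])  # header row
    writer.writerows(records)

# Read the file back to confirm what was written.
with open(out_path, newline="", encoding="utf-8") as f:
    rows_read = list(csv.reader(f))

print(rows_read)
```

Note that `csv.reader` returns every field as a string, so the salaries would need converting with `int()` before any numeric comparison.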


## Authors


Ramesh Sannareddy


### Other Contributors


Rav Ahuja


**Danny Tang (Code implemented by Danny Tang, based on provided instructions)**


Copyright © 2020 IBM Corporation. This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDA0321ENSkillsNetwork928-2022-01-01).
