## Beautiful Soup Tutorial: Install BeautifulSoup, Requests & LXML
https://python.gotrained.com/install-beautifulsoup/

To start the web scraping tutorials, first install three libraries: BeautifulSoup, Requests, and lxml. We will use pip. Note that sudo might be required on Linux or macOS.

* pip install beautifulsoup4
* pip install requests
* pip install lxml
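To confirm that all three packages are installed in the interpreter you are actually running, a small check like the one below can help. It is a sketch that uses only the standard library's importlib.metadata (Python 3.8+). The helper name installed_versions is our own and is not part of any of these libraries. The names to look up are the PyPI distribution names passed to pip (beautifulsoup4), not the import names (bs4).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each PyPI distribution name to its installed version, or None if missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as passed to pip, not the import names (bs4, requests, lxml).
    for name, ver in installed_versions(["beautifulsoup4", "requests", "lxml"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```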

In [1]:
from bs4 import BeautifulSoup
import requests
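The cells below pass 'lxml' as the parser. If lxml is not available, BeautifulSoup raises bs4.FeatureNotFound. A minimal sketch, assuming you are happy to fall back to Python's built-in 'html.parser' in that case; make_soup is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if it is installed, otherwise with the built-in html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser (here lxml) is not installed.
        return BeautifulSoup(markup, "html.parser")


demo = make_soup('<p><a href="about.html">About</a></p>')
print(demo.a.get("href"))  # about.html
```

Different parsers can build slightly different trees from malformed HTML, so results are not guaranteed to be identical between the two.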

## Extracting URLs

In [2]:
url = "http://www.htmlandcssbook.com/code-samples/chapter-04/example.html"
 
# Getting the webpage, creating a Response object.
response = requests.get(url)
 
# Extracting the source code of the page.
data = response.text
 
# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
 
# Extracting all the <a> tags into a list.
tags = soup.find_all('a')
 
# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

mailto:filmfolk@example.org
http://www.sundance.org
http://www.tropfest.com
http://sxsw.com
http://www.londonindependent.org
http://www.festival-cannes.com
http://www.sff.org.au
http://www.miff.com.au
http://www.nzff.co.nz
http://www.labiennale.org/en/cinema
http://www.bfi.org.uk/lff/
http://www.idfa.nl/industry.aspx
http://whistlerfilmfestival.com
about.html
#top
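This output mixes absolute URLs with a mailto: link, a relative path (about.html) and an in-page fragment (#top). To turn the hrefs into absolute web URLs, you can resolve each one against the page URL with urllib.parse.urljoin from the standard library. The sketch below is one possible policy, not something the tutorial prescribes: it skips missing hrefs and fragment-only links, keeps only http/https results, and drops repeats. absolute_links is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(hrefs, base_url):
    """Resolve hrefs against base_url, keeping unique http(s) links in first-seen order."""
    links = []
    for href in hrefs:
        if not href or href.startswith("#"):
            continue  # no href attribute, or a link within the same page
        url = urljoin(base_url, href)  # absolute URLs and other schemes pass through unchanged
        if urlparse(url).scheme in ("http", "https") and url not in links:
            links.append(url)
    return links


base = "http://www.htmlandcssbook.com/code-samples/chapter-04/example.html"
hrefs = ["mailto:filmfolk@example.org", "http://www.sundance.org", "about.html", "#top", None]
for link in absolute_links(hrefs, base):
    print(link)
# http://www.sundance.org
# http://www.htmlandcssbook.com/code-samples/chapter-04/about.html
```

With the tags list from In [2], the equivalent call would be absolute_links([tag.get('href') for tag in tags], url).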


In [3]:
# Applying the same steps to Sammy Sosa's player page on mlb.com.
url = "https://www.mlb.com/player/sammy-sosa-122544"
 
# Getting the webpage, creating a Response object.
response = requests.get(url)
 
# Extracting the source code of the page.
data = response.text
 
# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
 
# Extracting all the <a> tags into a list.
tags = soup.find_all('a')
 
# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))

/
/login?redirectUri=https://www.mlb.com/player/sammy-sosa-122544
None
https://www.mlb.com/scores
https://www.mlb.com/news
https://www.mlb.com/video
https://www.mlb.com/standings
http://mlb.mlb.com/stats/sortable.jsp
http://mlb.mlb.com/mlb/schedule/index.jsp
https://www.mlb.com/player-search?tcid=nav_mlb_players
https://www.mlb.com/live-stream-games/subscribe?&affiliateId=MEGAMENU
http://mlb.mlb.com/mlb/baseballtickets/
https://www.mlb.com/apps
https://www.mlbshop.com/?_s=bm-mlbcom-hp
https://auctions.mlb.com
https://www.mlb.com/fantasy
https://www.playball.org/
https://www.mlb.com/team
http://mlb.mlb.com/home
https://www.mlb.com/scores
https://www.mlb.com/news
https://www.mlb.com/probable-pitchers
http://m.mlb.com/prospects/2019
http://mlb.mlb.com/mlb/fantasy/injuries/
http://mlb.mlb.com/mlb/transactions/?tcid=mm_mlb_news
https://www.mlb.com/starting-lineups
http://m.mlb.com/hof
http://m.mlb.com/postseason/history/world-series
https://www.mlb.com/awards
/draft/2019
/all-star
https://w
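Two things stand out in this output: a None, printed for an <a> tag that has no href attribute, and URLs that appear more than once (https://www.mlb.com/scores and https://www.mlb.com/news, for example). Beautiful Soup skips tags without the attribute if you pass href=True to find_all, and dict.fromkeys removes repeats while keeping the original order. A minimal sketch on a small made-up HTML snippet (not the real mlb.com markup), using the built-in html.parser so it runs without lxml; unique_hrefs is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def unique_hrefs(soup):
    """Return each href once, in document order, skipping <a> tags without an href."""
    # href=True matches only tags that actually have an href attribute.
    hrefs = (tag["href"] for tag in soup.find_all("a", href=True))
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates stably.
    return list(dict.fromkeys(hrefs))


html = """
<nav>
  <a href="/">Home</a>
  <a>Menu</a>
  <a href="https://www.mlb.com/scores">Scores</a>
  <a href="https://www.mlb.com/news">News</a>
  <a href="https://www.mlb.com/scores">Scores again</a>
</nav>
"""
for href in unique_hrefs(BeautifulSoup(html, "html.parser")):
    print(href)
# /
# https://www.mlb.com/scores
# https://www.mlb.com/news
```

In the mlb.com cell, looping over unique_hrefs(soup) could replace the loop over tags.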

## Web Scraping Craigslist 

In [5]:
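# Find all <a> tags whose class name is 'result-title'.
# This cell assumes soup already holds a Craigslist search page (it is fetched in In [8]);
# at this point soup still holds the mlb.com page, which is why no output appears here.
titles = soup.find_all('a', {'class': 'result-title'})

for title in titles:
    print(title.text)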
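Passing {'class': 'result-title'} as the attrs argument is one way to match on a class. Beautiful Soup also accepts the class_ keyword (with a trailing underscore, because class is a reserved word in Python) and CSS selectors through select(). In all three cases a tag matches if result-title is one of its classes, so an element with several classes is still found. The snippet uses made-up HTML that only imitates a listing page.

```python
from bs4 import BeautifulSoup

# Made-up markup that only imitates a search-results page.
html = """
<ul>
  <li><a class="result-title" href="/job/1">Data Engineer</a></li>
  <li><a class="result-title highlight" href="/job/2">Junior Engineer</a></li>
  <li><a class="other" href="/about">About</a></li>
</ul>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# Three equivalent ways to select <a> tags that carry the result-title class.
by_attrs = demo_soup.find_all("a", {"class": "result-title"})
by_keyword = demo_soup.find_all("a", class_="result-title")
by_css = demo_soup.select("a.result-title")

assert by_attrs == by_keyword == by_css
print([a.get_text() for a in by_css])  # ['Data Engineer', 'Junior Engineer']
```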

In [7]:
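# Find all <span> tags whose class name is 'result-hood' (each listing's neighborhood).
# Like the cell above, this needs soup to hold a Craigslist search page.
addresses = soup.find_all('span', {'class': 'result-hood'})

for address in addresses:
    print(address.text)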
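Printing titles and neighborhoods from two separate find_all lists makes it easy to misalign them: if one listing has no result-hood span, every neighborhood after it shifts onto the wrong title. A safer pattern is to loop over each listing's container and look up both fields inside it. The sketch below assumes each listing sits in an <li class="result-row"> element; that class name and the markup are assumptions for illustration, so check the real page's HTML before relying on them. listings is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up markup; "result-row" is an assumed container class, not verified against Craigslist.
html = """
<ul>
  <li class="result-row"><a class="result-title">Data Engineer</a>
      <span class="result-hood"> Cambridge </span></li>
  <li class="result-row"><a class="result-title">Junior Engineer</a></li>
  <li class="result-row"><a class="result-title">Product Owner</a>
      <span class="result-hood"> Boston </span></li>
</ul>
"""


def listings(soup):
    """Yield (title, hood) pairs, one per result row; hood is None when a row has none."""
    for row in soup.find_all("li", class_="result-row"):
        title = row.find("a", class_="result-title")
        hood = row.find("span", class_="result-hood")
        yield (
            title.get_text(strip=True) if title else None,
            hood.get_text(strip=True) if hood else None,
        )


for title, hood in listings(BeautifulSoup(html, "html.parser")):
    print(title, "|", hood)
# Data Engineer | Cambridge
# Junior Engineer | None
# Product Owner | Boston
```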

In [8]:
from bs4 import BeautifulSoup
import requests
 
url = "https://boston.craigslist.org/search/sof"
 
# Getting the webpage, creating a Response object.
response = requests.get(url)
 
# Extracting the source code of the page.
data = response.text
 
# Passing the source code to Beautiful Soup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
 
# Extracting all the <a> tags whose class name is 'result-title' into a list.
titles = soup.find_all('a', {'class': 'result-title'})

# Extracting the text from the <a> tags, i.e. the job titles.
for title in titles:
    print(title.text)

Software Engineering Fellowship
Data Engineer
Sr. Database Admin. 10+ Month Contract (US Citizen or Greencard only)
Android Technical Lead
Autonomous Vehicle Operator | Toyota Research Institute
Facilities Assistant
Looking for Software Developers and Engineers
Senior Application Developer
Junior Engineer
Autonomous Vehicle Operator | Toyota Research Institute
Software Quality Assurance Engineer
Product Owner
Growth Stage Edtech Startup Hiring Full Stack Engineers
Autonomous Vehicle Operator | Toyota Research Institute
Software Engineering Fellowship
Implementer for Software Company
Python/Django Programmer
Facilities Assistant
Autonomous Vehicle Operator | Toyota Research Institute
HR Coordinator- People Strategies
Engineer, Process Validation
Autonomous Vehicle Operator | Toyota Research Institute
COBOL Instructors
Senior GIS Analyst
SWQA Engineer II
Software Engineering Fellowship
Autonomous Vehicle Operator | Toyota Research Institute
Growth Stage Edtech Startup Hiring Full Stack E
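Several titles repeat in this list; "Autonomous Vehicle Operator | Toyota Research Institute" alone appears six times in the visible output. The titles alone do not tell you whether these are separate postings or re-posts, but if you only want a summary, collections.Counter from the standard library counts them. The sketch works on a short sample copied from the output above; in the notebook you would pass [title.text for title in titles] instead.

```python
from collections import Counter

# A few titles copied from the Boston output above.
scraped = [
    "Software Engineering Fellowship",
    "Autonomous Vehicle Operator | Toyota Research Institute",
    "Facilities Assistant",
    "Autonomous Vehicle Operator | Toyota Research Institute",
    "Software Engineering Fellowship",
    "Autonomous Vehicle Operator | Toyota Research Institute",
    "Junior Engineer",
]

# Count each title; most_common() lists the most frequent first.
counts = Counter(scraped)
for title, n in counts.most_common():
    print(f"{n} x {title}")

# Unique titles in first-seen order, if you just want each one once.
unique_titles = list(dict.fromkeys(scraped))
```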

In [9]:
# Using the same code to scrape a different URL (the New York software/QA/DBA listings).
from bs4 import BeautifulSoup
import requests
 
url = "https://newyork.craigslist.org/d/software-qa-dba-etc/search/sof"
 
# Getting the webpage, creating a Response object.
response = requests.get(url)
 
# Extracting the source code of the page.
data = response.text
 
# Passing the source code to Beautiful Soup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
 
# Extracting all the <a> tags whose class name is 'result-title' into a list.
titles = soup.find_all('a', {'class': 'result-title'})

# Extracting the text from the <a> tags, i.e. the job titles.
for title in titles:
    print(title.text)

Web Developer
Web Developer
Sr Business Analyst w/Oracle EBS
Senior Software Engineer
Marketing Data Analyst | $22 - $25 per hour
Serverless Application Engineer
Product Management Fellowship
Software Engineering Fellowship
Senior Software Engineer
Project Manager - for Dynamic SaaS Company
Data Scientist with a rebellious spirit
Ex-bike messenger(s) wanted to work in a software tech environment!
Microsoft CRM Dynamics 365 Migration - full & part-time work from home
Senior Backend Engineer
Level 1 Operations Process Support (Connecticut)
Senior QA Analyst – Long-term contract
QA SDET Engineers  Needed - 3 Positions - All Levels
Developers who want to learn about Hedge Funds, Private Equity, VC
Data Analyst with  SQL   Financial
Growth Stage Edtech Startup Hiring Full Stack Engineers
Marketing Data Analyst | $22 - $25 per hour
Product Management Fellowship
Software Engineering Fellowship
Quality assurance tester (Entry Level)
Growing Company hiring Junior Devs and Senior Devs
Implemente
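Cells In [8] and In [9] differ only in the URL, so the steps can be wrapped in a function and reused for any search page. The sketch below adds two things that are our choice rather than part of the tutorial: a timeout on the request, and raise_for_status(), so a failed download raises requests.HTTPError instead of silently parsing an error page. The function names are hypothetical, and parsing is split out so it can be tried on an HTML string without a network connection.

```python
import requests
from bs4 import BeautifulSoup


def extract_titles(html, parser="html.parser"):
    """Return the text of every <a class="result-title"> in an HTML string."""
    soup = BeautifulSoup(html, parser)
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="result-title")]


def scrape_titles(url, timeout=10):
    """Download a search page and return its listing titles as a list."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    return extract_titles(response.text, parser="lxml")
```

For example, scrape_titles("https://newyork.craigslist.org/d/software-qa-dba-etc/search/sof") returns the listing titles as a list instead of printing them, so the results of several searches can be compared or combined.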