# Web Scraping with Beautiful Soup
[Learn Web Scraping with Beautiful Soup](https://www.codecademy.com/learn/learn-web-scraping)

# CONTENT
1. [Rules of Scraping](#Rules)
2. [Requests](#Requests)
3. [The Beautiful Soup Object](#BSObject)
4. [Object Types](#ObjectTypes)
5. [Navigating by Tags](#Tags)
6. [Website Structure](#Structure)
7. [Find All](#FindAll)
8. [Select for CSS Selectors](#CSSS)
9. [Reading Text](#Text)

<a name="Rules"></a>
## Rules of Scraping

1. Check the **legal use** of the site's data.
2. Do not spam the site with requests; a common guideline is at most 1 request per second.
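
The second rule can be sketched as a small helper that pauses between requests. This is only an illustration: `fetch_politely`, its `fetch` and `sleep` parameters, and the one-second default are names and values assumed here, not part of `requests` or Beautiful Soup.

```python
import time


def fetch_politely(urls, fetch, delay=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between calls.

    `fetch` is any callable that takes a URL (for example requests.get).
    `sleep` is injectable so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait before every request except the first
        results.append(fetch(url))
    return results
```

With `requests` imported, a call such as `fetch_politely(list_of_urls, requests.get)` would then download the pages roughly one per second.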

<a name="Requests"></a>
## Requests

Use the `requests` library to download a page's HTML.

In [2]:
import requests

webpage = requests.get("http://ufcstats.com/statistics/events/completed")
print(webpage)

<Response [200]>


In [3]:
# store the raw response body (bytes)
webpage_content = webpage.content
print(webpage_content)

b'<!DOCTYPE html>\n<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->\n<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->\n<!--[if IE 8]>         <html class="no-js ie8 lt-ie9"> <![endif]-->\n<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->\n<head>\n  <meta charset="utf-8">\n  <meta http-equiv="X-UA-Compatible" content="IE=edge">\n  <title>\n    Stats | UFC\n  </title>\n  <meta name="description" content="">\n  <meta name="viewport" content="">\n  <link rel="stylesheet" href="/blocks/main.css?ver=289825">\n  <script src="/js/vendor/modernizr-2.6.2.min.js"></script>\n  <script>\n    (function(i,s,o,g,r,a,m){i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n    })(window,document,\'script\',\'//www.google-analytics.com/analytics.js\',\'ga\');\n\n    ga(\'crea
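
The `b'...'` prefix in the output above means that `webpage.content` is a `bytes` object, not a string. A minimal sketch of the difference, using a short hand-written byte string in place of the real response and assuming UTF-8 (the encoding the page declares in its `<meta charset>` tag):

```python
# A stand-in for webpage.content: raw bytes as received from the server.
raw = b"<title>Stats | UFC</title>"

# Decoding turns the bytes into a regular Python string.
text = raw.decode("utf-8")

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print(text)
```

`requests` also exposes an already-decoded version as `webpage.text`, using the encoding it infers for the response. Beautiful Soup accepts either form.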

<a name="BSObject"></a>
## The Beautiful Soup Object

Beautiful Soup parses the downloaded HTML into an object we can search, so that we can pull out just the parts of the page we need.

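`soup = BeautifulSoup(html_content, "html.parser")`, where `html_content` is the HTML itself (a string, bytes, or an open file) and `"html.parser"` names the parser to use.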
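
The first argument to `BeautifulSoup` is the markup itself (a string, bytes, or an open file handle). A bare file path passed as a string would be parsed as literal text rather than opened. A minimal sketch on a small hand-written HTML string (the `html` variable and its contents are made up for illustration):

```python
from bs4 import BeautifulSoup

# A tiny hand-written document standing in for a downloaded page.
html = (
    "<html><head><title>Stats | UFC</title></head>"
    "<body><p>Hello</p></body></html>"
)

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # Stats | UFC
print(soup.p.get_text())  # Hello
```

To parse a saved file instead, open it first and pass the file handle, for example `BeautifulSoup(open_file, "html.parser")` inside a `with open(...)` block.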

In [6]:
from bs4 import BeautifulSoup

webpage = requests.get("http://ufcstats.com/statistics/events/completed")

# convert the HTML to a BeautifulSoup object, naming the parser explicitly
soup = BeautifulSoup(webpage.content, "html.parser")
# print the parsed document
print(soup)

<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]--><!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]--><!--[if IE 8]>         <html class="no-js ie8 lt-ie9"> <![endif]--><!--[if gt IE 8]><!--><html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8"/>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<title>
    Stats | UFC
  </title>
<meta content="" name="description"/>
<meta content="" name="viewport"/>
<link href="/blocks/main.css?ver=79675" rel="stylesheet"/>
<script src="/js/vendor/modernizr-2.6.2.min.js"></script>
<script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-2855164-1', 'auto');
    ga('send', 'pa

<a name="ObjectTypes"></a>
## Object Types

Beautiful Soup breaks the HTML page down into several types of objects.

**Tag** &rarr; HTML tag