### The components of a web page

When we visit a web page, our web browser makes a request to a web server. This request is called a `GET` request, since we're getting files from the server. The server then sends back files that tell our browser how to render the page for us. The files fall into a few main types:
* HTML: contains the main content of the page.
* CSS: adds styling to make the page look nicer.
* JS: JavaScript files add interactivity to web pages.
* Images: image formats such as JPG and PNG allow web pages to show pictures.

### HTML
#### 1. Introduction
HTML (HyperText Markup Language) is a markup language that tells a browser how to lay out content. HTML lets you do things similar to what you do in a word processor like Microsoft Word: make text bold, create paragraphs, and so on.

HTML consists of elements called tags. The most basic tag is the `<html>` tag. This tag tells the web browser that everything inside of it is HTML. Inside `html`, the `head` tag holds information about the page, the `body` tag holds the main content, and each `p` tag defines a paragraph.

```html
<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
        </p>
        <p>
            Here's a second paragraph of text!
        </p>
    </body>
</html>
```
Here's how this will look:
<kbd>
<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
        </p>
        <p>
            Here's a second paragraph of text!
        </p>
    </body>
</html>
</kbd>


Tags have commonly used names that depend on their position in relation to other tags:

* `child`: a child is a tag inside another tag. For example, the two `p` tags above are both children of the `body` tag.
* `parent`: a parent is the tag that another tag is inside. For example, the `html` tag is the parent of the `body` tag.
* `sibling`: a sibling is a tag that is nested inside the same parent as another tag. For example, `head` and `body` are siblings, since they're both inside `html`.
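
The relationships above can be checked in code. The following is a minimal sketch, not the approach used later in this document: it assumes the example page is well-formed enough to be read as XML, so it uses the standard library's `xml.etree.ElementTree` as a stand-in parser. The names `HTML_DOC` and `parent_of` are illustrative only.

```python
import xml.etree.ElementTree as ET

# The example page from above. Assumption: it is well-formed XML, which lets
# ElementTree parse it. Real-world HTML often is not well-formed.
HTML_DOC = """
<html>
    <head>
    </head>
    <body>
        <p>Here's a paragraph of text!</p>
        <p>Here's a second paragraph of text!</p>
    </body>
</html>
""".strip()

root = ET.fromstring(HTML_DOC)  # root is the <html> element

# ElementTree does not store parent links, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

body = root.find("body")
head = root.find("head")

children_of_body = [child.tag for child in body]
parent_of_body = parent_of[body].tag
siblings_of_head = [el.tag for el in parent_of[head] if el is not head]

print(children_of_body)  # ['p', 'p']
print(parent_of_body)    # html
print(siblings_of_head)  # ['body']
```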


#### 2. HTML hyperlinks
```html
<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
            <a href="https://www.dataquest.io">Learn Data Science Online</a>
        </p>
        <p>
            Here's a second paragraph of text!
            <a href="https://www.python.org">Python</a>
        </p>
    </body>
</html>
```
Output:
<kbd>
<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
            <a href="https://www.dataquest.io">Learn Data Science Online</a>
        </p>
        <p>
            Here's a second paragraph of text!
            <a href="https://www.python.org">Python</a>
        </p>
    </body>
</html>
</kbd>

In the example above, `a` tags are links: they tell the browser to render a link to another web page. The `href` attribute of the tag determines where the link goes.

Some other common tags:
* `div`: indicates a division, or area, of the page.
* `b`: bolds any text inside.
* `i`: italicizes any text inside.
* `table`: creates a table.
* `form`: creates an input form.
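
Because the `href` attribute holds the link target, a scraper can collect link destinations by reading it from each `a` tag. Here is a rough sketch using the standard library's `html.parser` module (not BeautifulSoup); the class name `LinkCollector` and the variable names are hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser

HTML_DOC = """
<html>
    <head>
    </head>
    <body>
        <p>
            Here's a paragraph of text!
            <a href="https://www.dataquest.io">Learn Data Science Online</a>
        </p>
        <p>
            Here's a second paragraph of text!
            <a href="https://www.python.org">Python</a>
        </p>
    </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []    # pieces of text seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(HTML_DOC)
links = collector.links

for href, text in links:
    print(f"{text} -> {href}")
```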

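#### 3. HTML classes and ids

The `class` and `id` attributes give HTML elements names, which makes them easier to style and to select when scraping. One element can have multiple classes, and a class can be shared between elements. Each element can only have one id, and an id can only be used once on a page. Classes and ids are optional, and not all elements will have them.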

```html
<html>
    <head>
    </head>
    <body>
        <p class="bold-paragraph">
            Here's a paragraph of text!
            <a href="https://www.dataquest.io" id="learn-link">Learn Data Science Online</a>
        </p>
        <p class="bold-paragraph extra-large">
            Here's a second paragraph of text!
            <a href="https://www.python.org" class="extra-large">Python</a>
        </p>
    </body>
</html>
```
Output:
<kbd>
<html>
    <head>
    </head>
    <body>
        <p class="bold-paragraph">
            Here's a paragraph of text!
            <a href="https://www.dataquest.io" id="learn-link">Learn Data Science Online</a>
        </p>
        <p class="bold-paragraph extra-large">
            Here's a second paragraph of text!
            <a href="https://www.python.org" class="extra-large">Python</a>
        </p>
    </body>
</html>
</kbd>
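
To make the class and id rules concrete, the sketch below reads the `class` and `id` attributes from the example page. It uses the standard library's `html.parser` rather than BeautifulSoup, and `ClassIdCollector` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

HTML_DOC = """
<html>
    <head>
    </head>
    <body>
        <p class="bold-paragraph">
            Here's a paragraph of text!
            <a href="https://www.dataquest.io" id="learn-link">Learn Data Science Online</a>
        </p>
        <p class="bold-paragraph extra-large">
            Here's a second paragraph of text!
            <a href="https://www.python.org" class="extra-large">Python</a>
        </p>
    </body>
</html>
"""


class ClassIdCollector(HTMLParser):
    """Records which tags use each class, and every id in document order."""

    def __init__(self):
        super().__init__()
        self.tags_by_class = defaultdict(list)  # class name -> tags using it
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute is a space-separated list of class names.
        for class_name in (attributes.get("class") or "").split():
            self.tags_by_class[class_name].append(tag)
        if attributes.get("id"):
            self.ids.append(attributes["id"])


collector = ClassIdCollector()
collector.feed(HTML_DOC)

tags_by_class = dict(collector.tags_by_class)
ids = collector.ids
duplicate_ids = {i for i in ids if ids.count(i) > 1}

print(tags_by_class)  # {'bold-paragraph': ['p', 'p'], 'extra-large': ['p', 'a']}
print(ids)            # ['learn-link']
print(duplicate_ids)  # set()
```

The second `p` tag carries two classes, `extra-large` is shared by a `p` and an `a` tag, and the only id appears once.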

### Requests
The first thing we'll need to do to scrape a web page is to download the page. We can download pages using the Python `requests` library. The `requests` library will make a `GET` request to a web server, which will download the HTML contents of a given web page for us.

In [4]:
import requests
page = requests.get('http://dataquestio.github.io/web-scraping-pages/simple.html')
# A status code starting with 2 generally indicates success,
# and a code starting with 4 or 5 indicates an error.
page

<Response [200]>
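
The first-digit rule for status codes can be written as a small helper. This is a sketch only: `describe_status` is a hypothetical function, not part of `requests`, and it runs without a network connection. With a real response, the numeric code is available as `page.status_code`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a rough category for an HTTP status code, based on its first digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories.get(code // 100, "unknown")


for code in (200, 404, 500):
    # HTTPStatus supplies the standard reason phrase for each code.
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```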

In [5]:
page.content  # raw bytes of the downloaded HTML

b'<!DOCTYPE html>\n<html>\n    <head>\n        <title>A simple example page</title>\n    </head>\n    <body>\n        <p>Here is some simple content for this page.</p>\n    </body>\n</html>'
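
The `b` prefix in the output shows that `page.content` is a `bytes` object rather than a string. The sketch below decodes a copy of those bytes by hand, assuming the page is UTF-8 encoded (the sample is plain ASCII, so the assumption is safe here). With a live response, `requests` also exposes the decoded string as `page.text`.

```python
# The bytes shown above, copied from the output of page.content.
raw = (
    b'<!DOCTYPE html>\n<html>\n    <head>\n'
    b'        <title>A simple example page</title>\n    </head>\n'
    b'    <body>\n        <p>Here is some simple content for this page.</p>\n'
    b'    </body>\n</html>'
)

# Assumption: the page is UTF-8 encoded.
text = raw.decode("utf-8")

print(type(raw).__name__, "->", type(text).__name__)  # bytes -> str
print(text)
```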

### BeautifulSoup
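BeautifulSoup is a Python library for parsing HTML. We can use it to parse the downloaded page and pull content out of it.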

In [6]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, 'html.parser')
soup

<!DOCTYPE html>

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<p>Here is some simple content for this page.</p>
</body>
</html>

In [7]:
# print the parsed HTML with one tag per line, indented by nesting depth
print(soup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <title>
   A simple example page
  </title>
 </head>
 <body>
  <p>
   Here is some simple content for this page.
  </p>
 </body>
</html>


In [8]:
# list all the elements at the top level of the page
list(soup.children)

['html', '\n', <html>
 <head>
 <title>A simple example page</title>
 </head>
 <body>
 <p>Here is some simple content for this page.</p>
 </body>
 </html>]