# Grabbing a Title

The first web scraping example is grabbing the title of a webpage. The title is enclosed in the HTML `title` tag:

        <head>
            <title>Title on Browser Tab</title>
        </head>
        <body>...

For this notebook, we will use www.example.com, a website made specifically to serve as an example domain. It can be used without prior coordination or asking for permission.

# Step 1: Grab the page

The first thing to do is grab the page source from the website. To do this, we will use the `requests` library.

Note that the request may need to be run more than once before it succeeds, and that it might be blocked by a firewall.

In [1]:
import requests

In [2]:
res = requests.get("http://www.example.com")
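Because the request may fail or need several attempts, a more defensive version of the call can help. The following is only a minimal sketch: the helper name `get_with_retries` and its parameters (`attempts`, `delay`, `timeout`, `get`) are hypothetical and not part of `requests`. It relies on `requests.get`, `Response.raise_for_status()` and `requests.RequestException`.

```python
import time

import requests


def get_with_retries(url, attempts=3, delay=1.0, timeout=10, get=requests.get):
    """Fetch `url`, retrying on network errors or HTTP error status codes.

    Hypothetical helper for illustration; `get` can be swapped for testing.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            res = get(url, timeout=timeout)
            res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
            return res
        except requests.RequestException as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next attempt
    raise last_error


# Possible usage (performs a real network call):
# res = get_with_retries("http://www.example.com")
```

Setting a `timeout` keeps the cell from hanging indefinitely if a firewall silently drops the connection.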

`res` is a `requests.models.Response` object that contains the server's response, including the page source.

In [3]:
type(res)

requests.models.Response

The `.text` attribute holds the page source as a string.

In [4]:
res.text

'<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset="utf-8" />\n    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <style type="text/css">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <

# Step 2: Analyzing the extracted page

After extracting the raw source from the page, we can start analyzing it. We could run our own custom script on the string in `res.text`, but it's much easier to use the `bs4` library and its many built-in tools and methods. Using BeautifulSoup we can create a *soup* object that contains all the *ingredients* of the webpage.

BeautifulSoup relies on a parser to read the HTML. Here it uses the `lxml` library in the background.
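To give a sense of what a "custom script" would involve, here is a rough sketch that pulls out the title using only the standard library's `html.parser` module. The class name `TitleParser` and the short `html` string (a stand-in for `res.text`) are assumptions made for illustration. This is not how BeautifulSoup works internally.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Keep only the text that appears inside the title element
        if self.in_title:
            self.title_parts.append(data)


# A small stand-in for res.text so the sketch runs without a network call
html = "<html><head><title>Example Domain</title></head><body><p>Hi</p></body></html>"

parser = TitleParser()
parser.feed(html)
print("".join(parser.title_parts))
```

Even for a single tag this takes a fair amount of bookkeeping, which is the reason for reaching for `bs4` instead.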

In [5]:
import bs4
import lxml

In [6]:
#Creating a soup object

soup = bs4.BeautifulSoup(res.text,"lxml")
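If `lxml` is not installed, BeautifulSoup raises `bs4.FeatureNotFound` when asked for the `"lxml"` parser. One possible workaround, sketched below, is to fall back to Python's built-in `"html.parser"`. The helper name `make_soup` and the sample `html` string are hypothetical.

```python
import bs4


def make_soup(html):
    """Parse with lxml when it is installed, otherwise with the built-in parser."""
    try:
        return bs4.BeautifulSoup(html, "lxml")
    except bs4.FeatureNotFound:
        return bs4.BeautifulSoup(html, "html.parser")


# A small stand-in for res.text
html = "<html><head><title>Example Domain</title></head><body></body></html>"
soup = make_soup(html)
print(soup.title)
```

The two parsers can differ in how they repair malformed HTML, so results on messy pages may not be identical.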

In [7]:
#Displaying the source code in a readable form

soup

<!DOCTYPE html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-type"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>
<body>
<div>
<h1>Example Domain</h1>
<p>This domain is for use in illustrative examples

# Step 3: Grabbing the Title

The last step is actually grabbing the title. We do this by using the `select()` method to grab the `title` tag.

In [8]:
# We pass "title" as the argument to the select() method

title_tag = soup.select("title")
title_tag

[<title>Example Domain</title>]

Notice what is returned here: it's actually a list containing all the title elements (along with their tags). You can use indexing or even looping to grab the elements from the list.

In [9]:
title_tag[0]

<title>Example Domain</title>
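Because `select()` always returns a list, looping works for any number of matches, while indexing with `[0]` fails on an empty result. The sketch below illustrates both cases on a small inline `html` string (a stand-in for the real page, parsed with the built-in `"html.parser"`). The `footer` selector is chosen only because it matches nothing.

```python
import bs4

html = "<html><head><title>Example Domain</title></head><body><h1>Heading</h1></body></html>"
soup = bs4.BeautifulSoup(html, "html.parser")

# Looping works whether the list has zero, one, or several matches
for tag in soup.select("title"):
    print(tag)

# Indexing into an empty list raises IndexError, so check the list first
missing = soup.select("footer")
first_footer = missing[0] if missing else None
print(first_footer)
```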

To get the actual text string, we can use the `getText()` method.

In [10]:
title_tag[0].getText()

'Example Domain'
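There are a few other ways to reach the same string, shown here as a sketch on an inline `html` stand-in. `get_text()` is the name used in the Beautiful Soup documentation, and `getText()` behaves the same way. `select_one()` returns the first match directly instead of a list, and `soup.title` is a shortcut for the first `title` tag.

```python
import bs4

html = "<html><head><title>Example Domain</title></head><body></body></html>"
soup = bs4.BeautifulSoup(html, "html.parser")

via_select = soup.select("title")[0].get_text()       # list, then index
via_select_one = soup.select_one("title").get_text()  # first match, no list
via_attribute = soup.title.string                     # shortcut for the first <title>

print(via_select, via_select_one, via_attribute, sep="\n")
```

Note that `select_one()` and `soup.title` return `None` when nothing matches, so calling a method on the result would fail on a page without a title.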

# BONUS: Select first Paragraph

As a step further, let's select and display the first paragraph. Instead of `"title"`, we pass `"p"` as the argument: `soup.select("p")`.

In [11]:
p = soup.select("p")

#We will use print so that the \n character is rendered as a line break
print(p[0].getText())

This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.
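The printed text still carries the line break and indentation from the page source. If a single clean line is preferred, one common approach is to split on whitespace and rejoin with single spaces, as in this sketch. The `html` string reproduces the paragraph above as a stand-in for the real page.

```python
import bs4

html = """<p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>"""
soup = bs4.BeautifulSoup(html, "html.parser")

raw = soup.select("p")[0].get_text()
clean = " ".join(raw.split())  # collapse newlines and repeated spaces
print(clean)
```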


If there are more paragraphs, we can select another one by simply changing the index number.

In [12]:
#Selecting the second paragraph

print(p[1].getText())

More information...
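When the number of paragraphs is not known in advance, looping with `enumerate()` avoids guessing at index numbers. The sketch below uses a simplified inline `html` string as a stand-in for the page body, so the paragraph contents are illustrative only.

```python
import bs4

html = """<div>
<h1>Example Domain</h1>
<p>First paragraph.</p>
<p><a href="#">More information...</a></p>
</div>"""
soup = bs4.BeautifulSoup(html, "html.parser")

paragraphs = soup.select("p")
print(len(paragraphs))  # how many paragraphs were found

# Print each paragraph together with the index that selects it
for index, paragraph in enumerate(paragraphs):
    print(index, paragraph.get_text())
```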
