In [None]:
# 10.2.1 Use HTML Elements

# Robin has installed all of her tools and tested Mongo to make sure it's ready for data.
# Before she can start pulling data off of the web, she needs to be able to identify where the data is stored within 
# the HTML code.

# Every webpage is built using HYPERTEXT MARKUP LANGUAGE, or HTML. 

# Some sites are more sophisticated than others, but they all have the same basic structure. 
# Each element of a page, such as a title or a paragraph, is wrapped in a TAG. 
# Each TAG is specific to the element it's holding, and there are many different types of TAGS.


In [None]:
# Think of a webpage as a window into the internet. 
# HTML is the glass, boards, and blinds on that window. 

# Just like there are many sizes and shapes to windows, each webpage has been customized to present users with a view 
# into a different topic. 

# Consider a weather report delivered through a weather site. Think of a news source or social media platform. 
# Each of these examples is built using custom HTML. 

# Our first step will be to explore that design so that we can write a script that knows what it's looking at when it 
# interacts with a webpage.



In [None]:
# Open VS Code and create a file named index.html. 
# This file can be saved to your desktop because it's just for practice.

# In this blank HTML file put an exclamation point on the first line and press Enter. 
# This should autofill the editor to contain everything we need for a basic HTML page.

In [None]:
# After executing the "!" on VS Code, the autofill should look as follows: 

# <!DOCTYPE html>
# <html lang="en">
# <head>
#  <meta charset="UTF-8">
#  <meta name="viewport" content="width=device-width, initial-scale=1.0">
#  <meta http-equiv="X-UA-Compatible" content="ie=edge">
#  <title>Document</title>
# </head>
# <body>
# </body>
# </html>

In [None]:
# In this code, most elements are wrapped in a pair of tags, such as <title> and </title>. 


# HTML tags always begin with a left angle bracket (<), followed by the name of the tag (in our case, "title"). 
# Once the name has been entered, the tag is then closed with a right angle bracket (>). 
# This is only the first half of the completed tag, or the opening tag. We'll also need to add the closing tag.

# A closing HTML tag is nearly identical to the opening tag. The only difference is a single character, the forward 
# slash placed just after the left angle bracket: </title>. 

# Now that there are both opening and closing HTML title tags, you can add the title of the document.

# For example, if you wanted your webpage title to be "Math Is Fun!" then the entire line of HTML code would look 
# like this: <title>Math Is Fun!</title>.
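
# To see the three pieces of an element (opening tag, text, closing tag) from Python's point of view, here is a 
# minimal sketch using the standard library's html.parser module. The TagLogger class and its events list are 
# names made up for this illustration; they are not part of the lesson's files.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each piece of an element in the order the parser sees it."""

    def __init__(self):
        super().__init__()
        self.events = []  # hypothetical helper list, filled as the parser runs

    def handle_starttag(self, tag, attrs):
        self.events.append(("opening tag", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

    def handle_endtag(self, tag):
        self.events.append(("closing tag", tag))


logger = TagLogger()
logger.feed("<title>Math Is Fun!</title>")
logger.close()

for kind, value in logger.events:
    print(f"{kind}: {value}")
```

# The parser reports the opening tag first, then the text it holds, then the closing tag, which is the same 
# order in which the line was typed.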



In [None]:
# HTML is a markup language used for creating webpages. A page is built by arranging specific tags in a nested 
# order, a bit like building blocks. 

# For example, if we wanted a header and a paragraph in the same section of a webpage, we would nest <h1 /> and <p /> 
# tags inside a <div /> tag, with the <div /> tag acting as a box to hold the other pieces.

# <div>
#    <h1>Hello, world!</h1>
#    <p>This is a great beginning.</p>
# </div>

# Most elements have opening and closing tags, which are identical except for the forward slash in the 
# closing tag. The closing tag marks the end of that HTML element. A few elements, such as <meta>, hold no 
# content and therefore have no closing tag.
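
# The "box" idea can be checked in Python. This is only a sketch: it uses xml.etree.ElementTree, which is an XML 
# parser, and it works here only because this small snippet happens to be well-formed. Real webpages often are not, 
# so treat this as an illustration of nesting rather than a scraping technique.

```python
import xml.etree.ElementTree as ET

snippet = """
<div>
    <h1>Hello, world!</h1>
    <p>This is a great beginning.</p>
</div>
""".strip()

# Assumption: the snippet is well-formed, so an XML parser can read it.
div = ET.fromstring(snippet)

print("container:", div.tag)
for child in div:
    # Each child is an element nested directly inside the <div>.
    print("  nested:", child.tag, "->", child.text)
```

# The <div> comes back as the container, and the <h1> and <p> elements come back as its children.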

In [None]:
# <!DOCTYPE html>
# <html lang="en">
# <head>
#  <meta charset="UTF-8">
#  <meta name="viewport" content="width=device-width, initial-scale=1.0">
#  <meta http-equiv="X-UA-Compatible" content="ie=edge">
#  <title>Document</title>
# </head>
# <body>
# </body>
# </html>

# These tags are what define each element of this webpage. 
# We can open this page right now, but it will be blank because we haven't added anything to it yet. 
# Let's take a closer look at how these different elements work together.

In [None]:
# 1. <!DOCTYPE html> is a declaration, not a tag. It tells web browsers in which HTML version the document is written.
# This should always be the first line in an HTML document.
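
# Python's built-in HTML parser also treats the doctype as a declaration rather than a tag. The sketch below 
# relies on HTMLParser.handle_decl from the standard library; the DeclarationFinder class name is made up for 
# this illustration.

```python
from html.parser import HTMLParser


class DeclarationFinder(HTMLParser):
    """Keep declarations and opening tags in separate lists."""

    def __init__(self):
        super().__init__()
        self.declarations = []
        self.tags = []

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>, not for ordinary tags.
        self.declarations.append(decl)

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


finder = DeclarationFinder()
finder.feed('<!DOCTYPE html>\n<html lang="en"></html>')
finder.close()

print("declarations:", finder.declarations)
print("tags:", finder.tags)
```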


In [None]:
# 2. <head> is the opening tag that serves as a container for the setup elements. 
# This is similar to Python, where imports go at the top of a script (or in the top cell of a Jupyter Notebook). 
# HTML imports (e.g., a stylesheet or a library) go inside the <head>.


In [None]:
# 3. <meta> is short for "metadata" and tells the web browser basic information, such as page width.
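
# The information inside a <meta> tag is stored as attributes (name="value" pairs). As a rough sketch, the 
# standard library's html.parser hands those pairs to handle_starttag, so they can be collected into 
# dictionaries. MetaCollector is a hypothetical helper name used only in this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag as a dictionary."""

    def __init__(self):
        super().__init__()
        self.meta_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # attrs arrives as a list of (name, value) tuples.
            self.meta_tags.append(dict(attrs))


head = """
<head>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width, initial-scale=1.0">
 <title>Document</title>
</head>
"""

collector = MetaCollector()
collector.feed(head)
collector.close()

for attributes in collector.meta_tags:
    print(attributes)
```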


In [None]:
# 4. <title> and </title> are the opening and closing tags that serve as a container for the page title displayed on 
# the tab at the top of your web browser. 

# In the example above, the title is "Document", which is the text that would appear on the browser tab.


In [None]:
# 5. </head> is the closing tag for the <head> tag, much like the end of a code block in Python.


In [None]:
# 6. <body> and </body> are opening and closing tags. They also serve as a container, but for data we can see 
# (navigation menus, lists, and paragraphs).


In [None]:
# 7. <html lang="en"> and </html> are opening and closing tags that serve as a container for all elements within an 
# HTML page.


In [None]:
# Nesting is when HTML elements are contained within other elements. 
# Picture a set of nesting dolls with each nested in proper order, by design, into the largest doll. 
# It is the same for HTML tags—they must be in the correct order to not break the design of the webpage.

# Keeping code clean and easy to read is an important part of being a developer. 
# How would you keep your HTML in good visual shape?

# Use indentation to keep the tags in order—this helps show how and where elements are nested.
# Coding guidelines for HTML suggest using indentation of two to four spaces for each nested element. 
# This helps keep our code clean and easy to read.
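
# Indentation can also be produced by a script to reveal how tags are nested. The sketch below prints an outline 
# of a page with two spaces per level. OutlineBuilder and VOID_ELEMENTS are names invented for this example, and 
# the set lists only a few of the elements that never take a closing tag.

```python
from html.parser import HTMLParser

# Assumption: a short, incomplete list of elements that have no closing tag.
VOID_ELEMENTS = {"meta", "link", "br", "img", "hr", "input"}


class OutlineBuilder(HTMLParser):
    """Build an indented outline of opening tags to show the nesting."""

    def __init__(self, indent=2):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append(" " * (self.indent * self.depth) + tag)
        if tag not in VOID_ELEMENTS:
            self.depth += 1  # whatever follows is nested inside this tag

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1  # step back out of the element


page = (
    '<html lang="en"><head><meta charset="UTF-8">'
    "<title>Document</title></head><body></body></html>"
)

builder = OutlineBuilder()
builder.feed(page)
builder.close()

print("\n".join(builder.lines))
```

# Even though the page was written on a single line, the outline shows head and body nested one level inside 
# html, with meta and title nested one level deeper inside head.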

In [None]:
# Let's review the new tags we put in our VS Code for "index.html":

# <h1 /> is a first-level header. The text in this tag will be displayed bigger and bolder than the rest of the page's
# text. There are many different headers available to use, from h1 to h6, with h1 producing the largest text.

# <p /> is a paragraph tag, currently holding lorem ipsum sentences 
# (lorem ipsum is dummy text used to stage websites). 
# More can be read about it on the Lorem Ipsum reference website.

# <ul /> is an unordered list.

# <li /> is a list item.


In [1]:
# What does it mean when the <li /> tags are inside the <ul /> tags?

# It means the tags are nested. The <ul /> tags are a container for the <li /> tags.

# Without this exact order, the list items would not appear correctly on the webpage.
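
# The container relationship can be sketched in Python as well. The example below keeps only the text of <li /> 
# tags that sit inside a <ul /> tag. The list contents and the ListItemCollector class are made up for this 
# illustration; they are not the contents of the practice index.html file.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of <li> elements that are nested inside a <ul>."""

    def __init__(self):
        super().__init__()
        self.inside_ul = False
        self.inside_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.inside_ul = True
        elif tag == "li" and self.inside_ul:
            self.inside_li = True
            self.items.append("")  # start a new, empty list item

    def handle_endtag(self, tag):
        if tag == "ul":
            self.inside_ul = False
        elif tag == "li":
            self.inside_li = False

    def handle_data(self, data):
        if self.inside_li:
            self.items[-1] += data


# Hypothetical sample markup: two nested items and one item outside any list.
sample = """
<ul>
  <li>Bullet point 1</li>
  <li>Bullet point 2</li>
</ul>
<li>Item outside any list</li>
"""

collector = ListItemCollector()
collector.feed(sample)
collector.close()

print(collector.items)
```

# Only the two items nested inside the <ul /> container are collected; the item outside it is skipped.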

In [None]:
# Understanding the basic layout and how nesting and containers work is an important part of successful web scraping.

# Before we can program our script to pull data from a page, we have to tell it where to look. 

# Basically, our script would say, "look in this <div /> tag, then look inside that for a <p /> tag."
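
# That instruction can be sketched with nothing but the standard library. The example below keeps a stack of the 
# tags that are currently open and saves the text of any <p /> tag that has a <div /> somewhere above it. 
# ParagraphInDivFinder is a hypothetical name, and the sketch assumes tidy markup in which every tag is closed 
# properly; dedicated scraping libraries handle messier pages.

```python
from html.parser import HTMLParser


class ParagraphInDivFinder(HTMLParser):
    """Find the text of <p> elements nested inside a <div>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of elements that are currently open
        self.paragraphs = []

    def _in_target(self):
        # True when the innermost open tag is <p> and a <div> contains it.
        return bool(self.open_tags) and self.open_tags[-1] == "p" and "div" in self.open_tags

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if self._in_target():
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        # Assumption: tags are closed in the reverse order they were opened.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if self._in_target():
            self.paragraphs[-1] += data


page = """
<p>Outside the container.</p>
<div>
   <h1>Hello, world!</h1>
   <p>This is a great beginning.</p>
</div>
"""

finder = ParagraphInDivFinder()
finder.feed(page)
finder.close()

print(finder.paragraphs)
```

# The first paragraph is ignored because it is not inside a <div />, while the nested paragraph is returned.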

