## You Don’t Always Need a Hammer

It can be tempting, when faced with a lot of tags, to dive right in and use multiline statements to try to extract your information. However, keep in mind that layering the techniques used in this section without consideration can lead to code
that is difficult to debug, fragile, or both.


For example, the following line:
`bs.find_all('table')[4].find_all('tr')[2].find('td').find_all('div')[1].find('a')` doesn't look so great, and the slightest change to the page layout could break it. So what are your options?

 
   * Look for a “Print This Page” link, or perhaps a mobile version of the site that has better-formatted HTML.
   * Look for the information hidden in a JavaScript file.
   * The information might be available in the URL of the page itself.


It’s important not to just start digging and write yourself into a hole that you might not be able to get out of. Take a deep breath and think of alternatives.

## find() and find_all() with BeautifulSoup

BeautifulSoup’s `find()` and `find_all()` are the two functions we'll use the most. The two functions are extremely similar, as evidenced by their definitions in the BeautifulSoup documentation:

`
find_all(tag, attributes, recursive, text, limit, keywords)
find(tag, attributes, recursive, text, keywords)
`

About 95% of the time, we will need only the first two arguments, **`tag`** and **`attributes`**. However, let’s take a look at all the arguments in greater detail.

   * **`tag`** : The tag argument takes a string name of a tag or a Python list of string tag names. For example, the following returns a list of all the header tags in a document:
   `bs.find_all(['h1','h2','h3','h4','h5','h6'])`
   
   
   * **`attributes`** : The attributes argument takes a Python dictionary of attributes and matches tags whose attributes have the given values. If an attribute is given several possible values, a tag matches when it has any one of them. For example, the following call would return both the green and red span tags in the HTML document:
   `bs.find_all('span', {'class':{'green', 'red'}})`
   
   
   * **`recursive`** : If recursive is set to True, find_all looks at all descendants of the tag it is called on (children, their children, and so on). If it is False, it looks only at that tag's direct children; called on the document itself, that means only the top-level tags. By default, find_all works recursively (recursive is set to True).
   
   
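   * **`text`** : The text argument is unusual in that it matches based on the text content of the tags rather than their properties. It looks for strings that match the given text exactly and returns those strings, not the tags around them. (Beautiful Soup 4.4.0 and later call this argument `string`; `text` is the older name.) For instance, to find every place where a tag's text is exactly “the prince”, we could use: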
   `bs.find_all(text='the prince')`
   
   
   * **`keywords`** : The keyword argument lets you select tags that have a particular attribute or set of attributes. It is technically redundant, because the same match can be written with the attributes dictionary. For instance, the following two lines are equivalent:
   `bs.find_all(id='text')`
   `bs.find_all('', {'id':'text'})`
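
The arguments described above can be tried out without touching the network. The following is a minimal sketch that parses a small HTML snippet made up for this illustration (its tags and text are assumptions, not the real page). It uses `string`, the newer name for `text`, and a plain list for the class values.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented only to exercise each argument.
html = """
<html><body>
<h1>Title</h1>
<div id="text">
  <h2>Subtitle</h2>
  <span class="green">Anna</span>
  <span class="red">Well, Prince</span>
  <span class="blue">narration</span>
  <p>the prince</p>
</div>
</body></html>
"""
bs = BeautifulSoup(html, 'html.parser')

# tag: a list of names matches any tag with one of those names.
headings = bs.find_all(['h1', 'h2'])

# attributes: several values for one attribute match any of them.
colored = bs.find_all('span', {'class': ['green', 'red']})

# recursive=False: only direct children of <body> are examined,
# so the spans nested inside the <div> are not found.
direct_spans = bs.body.find_all('span', recursive=False)

# string (called text in older versions): exact match on text content.
# The result holds the matching strings, not the enclosing tags.
exact = bs.find_all(string='the prince')

# limit: stop after the first match.
first_span = bs.find_all('span', limit=1)

# keywords: the same query written two ways.
by_keyword = bs.find_all(id='text')
by_dict = bs.find_all(attrs={'id': 'text'})

print([t.name for t in headings])
print([t.get_text() for t in colored])
print(direct_spans, exact, len(first_span), by_keyword == by_dict)
```

Note that `find(...)` behaves like `find_all(..., limit=1)` except that it returns the single match itself (or `None`) instead of a list.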


### Select tags based on their attribute values

Let’s create an example web scraper that scrapes the page located at http://www.pythonscraping.com/pages/warandpeace.html. On this page, the lines spoken by characters in the story are in red, whereas the
names of characters are in green.

In [2]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Grab the entire page and create a BeautifulSoup object
html = urlopen('http://www.pythonscraping.com/pages/warandpeace.html')
bs = BeautifulSoup(html.read(), 'html.parser')

We use BeautifulSoup's **`find_all`** function in the following form: `bs.find_all(tagName, tagAttributes)`

In [3]:
# use find_all to extract a Python list of the span tags whose class is 'red'
tags = bs.find_all('span', {'class':'red'})

for tag in tags[:5]:
    print(tag.get_text(),'\n')

Well, Prince, so Genoa and Lucca are now just family estates of the
Buonapartes. But I warn you, if you don't tell me that this means war,
if you still try to defend the infamies and horrors perpetrated by
that Antichrist- I really believe he is Antichrist- I will have
nothing more to do with you and you are no longer my friend, no longer
my 'faithful slave,' as you call yourself! But how do you do? I see
I have frightened you- sit down and tell me all the news. 

If you have nothing better to do, Count [or Prince], and if the
prospect of spending an evening with a poor invalid is not too
terrible, I shall be very charmed to see you tonight between 7 and 10-
Annette Scherer. 

Heavens! what a virulent attack! 

First of all, dear friend, tell me how you are. Set your friend's
mind at rest, 

Can one be well while suffering morally? Can one be calm in times
like these if one has any feeling? 
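
The same approach works for the character names, which the page marks in green. Below is a small offline sketch; the HTML is a hypothetical stand-in that only imitates the page's red/green span layout, and the names in it are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: red spans hold speech,
# green spans hold character names.
html = """
<div id="text">
<span class="red">Heavens! what a virulent attack!</span>
<span class="green">Prince Vasili</span>
<span class="red">Can one be well while suffering morally?</span>
<span class="green">Anna Pavlovna Scherer</span>
<span class="green">Prince Vasili</span>
</div>
"""
bs = BeautifulSoup(html, 'html.parser')

# get_text() strips the markup and keeps only the text inside each tag.
names = [tag.get_text() for tag in bs.find_all('span', {'class': 'green'})]

# A name can appear many times, so collapse duplicates.
unique_names = sorted(set(names))
print(unique_names)
```

Calling `get_text()` only at the last step, when printing or storing the result, keeps the tag objects available for further navigation until then.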



## Navigating Trees
   * We use tree navigation to find a tag based on its location in a document. We can navigate up, across, and diagonally through HTML trees. In the BeautifulSoup library, there is a distinction between children and descendants: children are always exactly one tag below a parent, whereas descendants can be at any level in the tree below a parent. All children are descendants, but not all descendants are children.
   
   
   * BeautifulSoup functions always deal with the descendants of the current tag selected. 
     * `bs.body.h1` selects the first h1 tag that is a descendant of the body tag. It will not find tags located outside the body.
     * `bs.div.find_all('img')` will find the first div tag in the document, and then retrieve a list of all img tags that are descendants of that div tag.
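
The two expressions in the list above can be checked on a small document. This is a sketch on a hypothetical snippet (the `id` values and image names are invented for the example).

```python
from bs4 import BeautifulSoup

# Hypothetical document with two divs; only the second contains an h1.
html = """
<html>
<head><title>Demo</title></head>
<body>
<div id="first"><p><img src="a.jpg"></p><img src="b.jpg"></div>
<div id="second"><h1>Inside second div</h1><img src="c.jpg"></div>
</body>
</html>
"""
bs = BeautifulSoup(html, 'html.parser')

# bs.body.h1 finds the first h1 anywhere below body, however deeply nested.
heading = bs.body.h1

# bs.div is the FIRST div only, so c.jpg in the second div is not returned,
# while a.jpg (nested inside a <p>) is, because it is still a descendant.
images = bs.div.find_all('img')

print(heading.get_text())
print([img['src'] for img in images])
```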
     
     
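The **`next_sibling`** attribute returns the node that immediately follows the specified tag at the same level of the tree. That node can be a whitespace string rather than a tag. (`nextSibling` is the older Beautiful Soup 3 spelling of the same attribute.)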

In [4]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html, 'html.parser')

### Dealing with children and other descendants

The attribute **`children`** returns an iterator over the direct children of the specified tag. The attribute **`descendants`** returns a generator over all of its descendants, at any depth.

In [25]:
# list_iterator of all the children of the 'table' tag.
children = bs.find('table',{'id':'giftList'}).children

# generator of all the descendants of the 'table' tag.
descendants = bs.find('table',{'id':'giftList'}).descendants

#for c in children: print(c)
#for d in descendants: print(d)
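
To make the difference concrete, here is a minimal sketch on a tiny table. The table is hypothetical and written without whitespace between tags so that the counts are easy to follow.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical two-row table; no whitespace between tags on purpose.
html = ('<table id="giftList">'
        '<tr><th>Item</th></tr>'
        '<tr><td>Gift one</td></tr>'
        '</table>')
bs = BeautifulSoup(html, 'html.parser')
table = bs.find('table', {'id': 'giftList'})

# children: only the nodes directly below the table (the two rows).
children = list(table.children)

# descendants: every node below the table, including cells and text strings.
descendants = list(table.descendants)

# Text strings are nodes too; keep only the tags to compare tag names.
child_tags = [c.name for c in children if isinstance(c, Tag)]
descendant_tags = [d.name for d in descendants if isinstance(d, Tag)]

print(len(children), len(descendants))
print(child_tags, descendant_tags)
```

Both attributes are lazy, which is why the sketch wraps them in `list()` before counting.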

### Dealing with siblings

The code below gets all rows of products from the product table, except for the first title row. Anytime you get siblings of an object, the object itself is not included, because objects cannot be siblings with themselves. Note that `next_siblings` returns only the siblings that come after the object, and that whitespace strings between tags count as siblings too.

In [29]:
# generator of all the siblings that follow the first 'tr' tag inside the 'table' tag.
siblings = bs.find('table',{'id':'giftList'}).tr.next_siblings
#for s in siblings: print(s)
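
Because the strings between tags are siblings as well, it often helps to filter them out. The sketch below uses a hypothetical table (item names and the second price are invented) and shows two ways to keep only the rows.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical table with one title row and two product rows.
html = """<table id="giftList">
<tr><th>Item</th><th>Cost</th></tr>
<tr><td>Gift one</td><td>$15.00</td></tr>
<tr><td>Gift two</td><td>$10.00</td></tr>
</table>"""
bs = BeautifulSoup(html, 'html.parser')

title_row = bs.find('table', {'id': 'giftList'}).tr

# The very next node is the newline between the rows, not the next <tr>.
print(repr(title_row.next_sibling))

# Option 1: keep only the Tag objects among the following siblings.
rows = [s for s in title_row.next_siblings if isinstance(s, Tag)]

# Option 2: find_next_siblings('tr') returns only matching tags.
same_rows = title_row.find_next_siblings('tr')

cells = [[td.get_text() for td in row.find_all('td')] for row in rows]
print(cells)
```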

### Dealing with parents

You can find yourself in odd situations that require BeautifulSoup’s parent-finding attributes, `.parent` and `.parents`. For example:

In [30]:
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Inspect the HTML content to understand how the "parent" attribute works.
html = urlopen('http://www.pythonscraping.com/pages/page3.html')
bs = BeautifulSoup(html, 'html.parser')

print(bs.find('img', {'src':'../img/gifts/img1.jpg'}).parent.previous_sibling.get_text())


$15.00
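
The chained expression above is easier to follow when taken one step at a time. The sketch below repeats it on a hypothetical single-row table that assumes the price cell sits directly before the image cell, as the output above suggests; the item name is invented.

```python
from bs4 import BeautifulSoup

# Hypothetical row: name cell, price cell, then the cell holding the image.
html = ('<table id="giftList"><tr>'
        '<td>Gift one</td>'
        '<td>$15.00</td>'
        '<td><img src="../img/gifts/img1.jpg"></td>'
        '</tr></table>')
bs = BeautifulSoup(html, 'html.parser')

# 1. Select the image by its src attribute.
img = bs.find('img', {'src': '../img/gifts/img1.jpg'})

# 2. Go up one level: the <td> that contains the image.
image_cell = img.parent

# 3. Go sideways: the <td> just before it, which holds the price.
price_cell = image_cell.previous_sibling

# 4. Read the text of that cell.
price = price_cell.get_text()
print(price)

# .parents walks all the way up the tree, one ancestor at a time.
ancestor_names = [p.name for p in img.parents]
print(ancestor_names[:3])
```

If the real markup had whitespace between the cells, `previous_sibling` would return that string instead, and `find_previous_sibling('td')` would be the safer call.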



## Regular Expressions
