## Homework 3 Part Two: Scraping
This homework asks you to scrape from three different sources: The Guardian, Supreme Court Decisions and a more complicated version of Shakespeare (bonus). 

Again, please follow the instructions and do the best you can. Look at the tutorial for examples, my answers for Homework 3 Part One, the Beautiful Soup documentation, and any other Python resource (such as Stack Overflow). As you get further into this assignment, much of the trick will be using loops properly and appending information to lists. A careful way to use Beautiful Soup is to first use find() to get the first instance of something and search within it, and then use find_all() to get a list of results that you loop through and search within.
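Here is a minimal sketch of that find()-then-find_all() pattern. The HTML string and the `menu` class name are made up for illustration; they are not from any of the pages in this homework.

```python
from bs4 import BeautifulSoup

# Toy HTML (an assumption for illustration, not one of the homework pages).
html = """
<div class="menu"><ul><li>tea</li><li>coffee</li></ul></div>
<div class="menu"><ul><li>bread</li></ul></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: find() returns only the first match, so you can explore one example.
first_menu = soup.find(class_="menu")
first_items = [li.string for li in first_menu.find_all("li")]

# Step 2: find_all() returns a list of matches that you loop through,
# searching within each one and appending what you find to a list.
all_items = []
for menu in soup.find_all(class_="menu"):
    for li in menu.find_all("li"):
        all_items.append(li.string)

print(first_items)
print(all_items)
```

Getting one example right with find() before looping usually makes the loop much easier to debug.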

In [4]:
import requests
from bs4 import BeautifulSoup

## Supreme Court Decisions 2020 
Now it's time to scrape from reality. The Supreme Court posts its decisions in a format that is not immediately data friendly. They have a simple HTML table with some information about the decision, including a link to a PDF that contains the written decision. We won't mess with those PDFs this week, but we do want to transform their tables into something useful to us. 

We will be scraping this page: 
https://www.supremecourt.gov/opinions/slipopinion/20

*Note:* While you won't see all of the tables for all the months when you go to the page, they are all there in the HTML that you will download and in the HTML source you view (which is the same thing). Definitely do a view source, and study the structure of the HTML tables before you start coding.

You eventually want to end up with a list of lists (rows and then columns) for every decision from the 2020 term. Follow the process, and see how far you get.


Write your lines that use requests to get the page, and a second variable that passes the raw HTML into Beautiful Soup for parsing. Include a third line that prints the HTML in the prettify() way.

In [3]:
raw_html = requests.get('https://www.supremecourt.gov/opinions/slipopinion/20').content
soup_doc = BeautifulSoup(raw_html, "html.parser")
print(soup_doc.prettify())

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
 <head id="ctl00_ctl00_Head1">
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="txt/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <script src="/js/jquery-3.1.0.min.js" type="text/javascript">
  </script>
  <script src="/js/bootstrap.js" type="text/javascript">
  </script>
  <link href="/css/font-awesome.min.css" rel="stylesheet" type="text/css"/>
  <link href="/css/bootstrap.min.css" rel="Stylesheet" type="text/css"/>
  <link href="/css/bootstrap-theme.min.css" rel="Stylesheet" type="text/css"/>
  <link href="/styles/newBootStrap2.css" rel="stylesheet" type="text/css"/>
  <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media queries -->
  <!--[if lt IE 9]>
          <script src="/js/html5shiv.js"></script>
          <script src="/js/respond.min.js"></script>
        <![endif]-->
  <!--[if lt IE 8]>
       

Isolate the HTML row with the first row of information for the case Alabama Assn. of Realtors v. Department of Health and Human Servs. (As of 11/10/21 that is the most recent case. These things can update, though!)

In [4]:
soup_doc.find(class_="table table-bordered").find_all('tr')[1]

<tr>
<td style="text-align: center;">68</td>
<td style="text-align: center;">8/26/21</td>
<td style="text-align: center; white-space: nowrap;">21A23</td>
<td><a href="/opinions/20pdf/21a23_ap6c.pdf" target="_blank" title="The District Court’s judgment—which vacated as unlawful the Centers for Disease Control and Prevention’s imposition of a nationwide moratorium on evictions of any tenants who live in a county that is experiencing substantial or high levels of COVID–19 transmission and who make certain declarations of financial need, 86 Fed. Reg. 43244—is enforceable and the stay of that judgment is vacated.">Alabama Assn. of Realtors v. Department of Health and Human Servs.</a></td>
<td style="text-align: center;"> </td>
<td style="text-align: center;">PC</td>
<td style="text-align: center;">594/2</td>
</tr>

Print out each cell of information from that first row. Your output should look like this:


```
68
8/26/21
21A23
Alabama Assn. of Realtors v. Department of Health and Human Servs.
 
PC
594/2
```

In [5]:
case1 = soup_doc.find(class_="table table-bordered").find_all('tr')[1]
cell_case1 = case1.find_all('td')
for cell in cell_case1:
    print(cell.string)

68
8/26/21
21A23
Alabama Assn. of Realtors v. Department of Health and Human Servs.
 
PC
594/2


But wait, there is more information hidden inside the tags! Really important information. Find it and print it out like this (still just for this first row):
```
/opinions/20pdf/21a23_ap6c.pdf 
 The District Court’s judgment—which vacated as unlawful the Centers for Disease Control and Prevention’s imposition of a nationwide moratorium on evictions of any tenants who live in a county that is experiencing substantial or high levels of COVID–19 transmission and who make certain declarations of financial need, 86 Fed. Reg. 43244—is enforceable and the stay of that judgment is vacated.
 ```

In [6]:
print(case1.find('a')['href'])
print(case1.find('a')['title'])

/opinions/20pdf/21a23_ap6c.pdf
The District Court’s judgment—which vacated as unlawful the Centers for Disease Control and Prevention’s imposition of a nationwide moratorium on evictions of any tenants who live in a county that is experiencing substantial or high levels of COVID–19 transmission and who make certain declarations of financial need, 86 Fed. Reg. 43244—is enforceable and the stay of that judgment is vacated.
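As a side note, here is a small sketch of the two ways to read an attribute from a tag. The row HTML below is invented for illustration. Square brackets raise a `KeyError` when the attribute is missing, while `.get()` returns `None` instead, which can be safer inside a loop.

```python
from bs4 import BeautifulSoup

# Invented one-cell row, shaped like the rows on the decisions page.
row_html = (
    '<tr><td><a href="/opinions/example.pdf" title="Example holding.">'
    "Example v. Example</a></td></tr>"
)
row = BeautifulSoup(row_html, "html.parser")
link = row.find("a")

href = link["href"]          # raises KeyError if the attribute is absent
title = link.get("title")    # returns None if the attribute is absent
missing = link.get("rel")    # this link has no rel attribute, so this is None
all_attrs = link.attrs       # every attribute on the tag, as a dictionary

print(href)
print(title)
print(missing)
print(all_attrs)
```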


Okay, time to make this useful. Take the information you printed in the last two cells, and combine it all into a list. Output the list; it should look like this:
```
['68',
 '8/26/21',
 '21A23',
 'Alabama Assn. of Realtors v. Department of Health and Human Servs.',
 '\xa0',
 'PC',
 '594/2',
 '/opinions/20pdf/21a23_ap6c.pdf',
 'The District Court’s judgment—which vacated as unlawful the Centers for Disease Control and Prevention’s imposition of a nationwide moratorium on evictions of any tenants who live in a county that is experiencing substantial or high levels of COVID–19 transmission and who make certain declarations of financial need, 86 Fed. Reg. 43244—is enforceable and the stay of that judgment is vacated.']
 ```
 

In [7]:
first_case = []
for cell in cell_case1:
    first_case.append(cell.string)
first_case.append(case1.a['href'])
first_case.append(case1.a['title'])
first_case

['68',
 '8/26/21',
 '21A23',
 'Alabama Assn. of Realtors v. Department of Health and Human Servs.',
 '\xa0',
 'PC',
 '594/2',
 '/opinions/20pdf/21a23_ap6c.pdf',
 'The District Court’s judgment—which vacated as unlawful the Centers for Disease Control and Prevention’s imposition of a nationwide moratorium on evictions of any tenants who live in a county that is experiencing substantial or high levels of COVID–19 transmission and who make certain declarations of financial need, 86 Fed. Reg. 43244—is enforceable and the stay of that judgment is vacated.']
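The `'\xa0'` in that list is a non-breaking space, which is what an otherwise empty cell contains here. If you later want cleaner values, one possible approach is a small helper like the one below. The name `clean_cell` is hypothetical, and the assignment's expected output keeps the raw values, so treat this as an optional extra rather than part of the answer.

```python
def clean_cell(value):
    """Return a stripped string, or '' when the cell was empty or None."""
    if value is None:
        return ""
    # str.strip() removes Unicode whitespace, including the non-breaking space.
    return value.strip()


raw_row = ["68", "8/26/21", "21A23", "\xa0", None, "PC"]
cleaned_row = [clean_cell(value) for value in raw_row]
print(cleaned_row)
```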

Now, run the exact same code, but for the first row in the third table, June 2021. The output should look like this:
```
['64',
 '6/29/21',
 '20-440',
 'Minerva Surgical, Inc. v. Hologic, Inc.',
 '\xa0',
 'EK',
 '594/2',
 '/opinions/20pdf/20-440_9ol1.pdf',
 'The well-grounded patent law doctrine of assignor estoppel applies only when the assignor’s claim of invalidity contradicts explicit or implicit representations the assignor made in assigning the patent.']
```


In [44]:
third_table = soup_doc.find_all(class_="table table-bordered")[2]
case2 = third_table.find_all('tr')[1]
cell_case2 = case2.find_all('td')
second_case = []
for cell in cell_case2:
    second_case.append(cell.string)
second_case.append(case2.a['href'])
second_case.append(case2.a['title'])
second_case

['64',
 '6/29/21',
 '20-440',
 'Minerva Surgical, Inc. v. Hologic, Inc.',
 '\xa0',
 'EK',
 '594/2',
 '/opinions/20pdf/20-440_9ol1.pdf',
 'The well-grounded patent law doctrine of assignor estoppel applies only when the assignor’s claim of invalidity contradicts explicit or implicit representations the assignor made in assigning the patent.']

Great! Now you want to go through all of the rows in that third table, June (but not the header), and get a list of lists with the information for every case in that table.

Note that the code here should be similar to the code above, but you will need to loop through all of the rows in June and collect the info for each row in a new list, which is then appended to a larger list each time the loop finishes a row (before looping back to the next row).

Your output should look like this:

```
[['64',
  '6/29/21',
  '20-440',
  'Minerva Surgical, Inc. v. Hologic, Inc.',
  '\xa0',
  'EK',
  '594/2',
  '/opinions/20pdf/20-440_9ol1.pdf',
  'The well-grounded patent law doctrine of assignor estoppel applies only when the assignor’s claim of invalidity contradicts explicit or implicit representations the assignor made in assigning the patent.'],
 ['63',
  '6/29/21',
  '19-897',
  'Johnson v. Guzman Chavez',
  '\xa0',
  'A',
  '594/2',
  '/opinions/20pdf/19-897_c07d.pdf',
  'The detention of an alien ordered removed from the United States who reenters without authorization is governed by 8 U. S. C. §1231.'],
 ['62',
  '6/29/21',
  '19-1039',
  'PennEast Pipeline Co. v. New Jersey',
  '\xa0',
  'R',
  '594/2',
  '/opinions/20pdf/19-1039_8n5a.pdf',
  'A certificate of public convenience and necessity issued by the Federal Energy Regulatory Commission pursuant to §717f(h) of the Natural Gas Act authorizes a private company to condemn all necessary rights-of-way, whether owned by private parties or States.'],
 ['61',
  '6/28/21',
  '20-1212',
  'Pakdel v. City and County of San Francisco',
  '\xa0',
  'PC',
  '594/2',
  '/opinions/20pdf/20-1212_3204.pdf',
  'Administrative exhaustion of state remedies is not a prerequisite for a 42 U. S. C. §1983 takings claim when the government has reached a conclusive position; the Ninth Circuit’s decision in this case directly contravenes Knick v. Township of Scott, 588 U. S. ___. '],
 ['60',
  '6/28/21',
  '20-391',
  'Lombardo v. St. Louis',
  '\xa0',
  'PC',
  '594/2',
  '/opinions/20pdf/20-391_2c83.pdf',
  'Because it is unclear in this excessive force case whether the Eighth Circuit incorrectly thought the use of a prone restraint is per se constitutional so long as an individual appears to resist officers’ efforts to subdue him, the Eighth Circuit’s judgment is vacated, and the case is remanded to give the court the opportunity in the first instance to employ the careful, context-specific analysis required by this Court’s excessive force precedent.'],
 ['59',
  '6/25/21',
  '20-297',
  'TransUnion LLC v. Ramirez',
  '\xa0',
  'BK',
  '594/2',
  '/opinions/20pdf/20-297_4g25.pdf',
  'Only a plaintiff concretely harmed by a defendant’s violation of the Fair Credit Reporting Act has Article III standing to seek damages against that private defendant in federal court.'],
 ['58',
  '6/25/21',
  '20-472',
  'HollyFrontier Cheyenne Refining, LLC v. Renewable Fuels Assn.',
  '\xa0',
  'NG',
  '594/2',
  '/opinions/20pdf/20-472_0pm1.pdf',
  'Under the renewable fuel program’s fuel blending requirements for domestic refineries, a small refinery that previously received a hardship exemption may obtain an “extension” under 42 U. S. C. §7545(o)(9)(B)(i) even if the refinery did not seek a hardship exemption every year after initially doing so.'],
 ['57',
  '6/25/21',
  '20-543',
  'Yellen v. Confederated Tribes of Chehalis Reservation',
  '7/28/21',
  'SS',
  '594/2',
  '/opinions/20pdf/20-543_new_8n59.pdf',
  'Alaska Native Corporations are “Indian tribe[s]” under the Indian Self-Determination and Education Assistance Act and thus eligible for funding available to “Tribal governments” under Title V of the Coronavirus Aid, Relief, and Economic Security Act.'],
 ['56',
  '6/23/21',
  '20-18',
  'Lange v. California',
  '6/24/21',
  'EK',
  '594/1',
  '/opinions/20pdf/20-18_new_6k47.pdf',
  'Under the Fourth Amendment, pursuit of a fleeing misdemeanor suspect does not always or categorically qualify as an exigent circumstance justifying a warrantless entry into a home.'],
 ['55',
  '6/23/21',
  '19-422',
  'Collins v. Yellen',
  '7/28/21',
  'A',
  '594/1',
  '/opinions/20pdf/19-422_new_c0n2.pdf',
  'Because the Federal Housing Finance Agency (FHFA) did not exceed its authority under the Housing and Economic Recovery Act of 2008 as a conservator of Fannie Mae and Freddie Mac, the anti-injunction provisions of the Recovery Act bar the statutory claim brought by shareholders of those entities; the Recovery Act’s structure, which restricts the President’s power to remove the FHFA Director, violates the separation of powers.'],
 ['54',
  '6/23/21',
  '20-255',
  'Mahanoy Area School Dist. v. B. L.',
  '\xa0',
  'B',
  '594/1',
  '/opinions/20pdf/20-255_g3bi.pdf',
  'The school district’s decision to suspend student B. L. from the cheerleading team for posting to social media (outside of school hours and away from the school’s campus) vulgar language and gestures critical of the school violates the First Amendment.'],
 ['53',
  '6/23/21',
  '20-107',
  'Cedar Point Nursery v. Hassid',
  '\xa0',
  'R',
  '594/1',
  '/opinions/20pdf/20-107_ihdj.pdf',
  'A California regulation granting labor organizations a “right to take access” to an agricultural employer’s property to solicit support for unionization constitutes a per se physical taking.'],
 ['52',
  '6/21/21',
  '20-222',
  'Goldman Sachs Group, Inc. v. Arkansas Teacher Retirement System',
  '\xa0',
  'AB',
  '594/1',
  '/opinions/20pdf/20-222_2c83.pdf',
  'The generic nature of a misrepresentation in connection with the sale of securities often is important evidence of price impact that courts should consider at class certification; defendants bear the burden of persuasion to prove a lack of price impact by a preponderance of the evidence at class certification.'],
 ['51',
  '6/21/21',
  '20-512',
  'National Collegiate Athletic Assn. v. Alston',
  '6/23/21',
  'NG',
  '594/1',
  '/opinions/20pdf/20-512_new_7mi8.pdf',
  'The district court’s injunction pertaining to certain NCAA rules limiting the education-related benefits schools may make available to student-athletes is consistent with established antitrust principles.'],
 ['50',
  '6/21/21',
  '19-1434',
  'United States v. Arthrex, Inc.',
  '7/08/21',
  'R',
  '594/1',
  '/opinions/20pdf/19-1434_new_j4el.pdf',
  'The unreviewable authority wielded by Administrative Patent Judges during inter partes review is incompatible with their appointment by the Secretary of Commerce to an inferior office; the judgment of the Federal Circuit is vacated, and the case is remanded.'],
 ['49',
  '6/17/21',
  '19-840',
  'California v. Texas',
  '7/08/21',
  'B',
  '593/2',
  '/opinions/20pdf/19-840_new_5hdk.pdf',
  'Plaintiffs lack standing to challenge the Patient Protection and Affordable Care Act’s minimum essential coverage provision.'],
 ['48',
  '6/17/21',
  '19-416',
  'Nestlé USA, Inc. v. Doe',
  '\xa0',
  'T',
  '593/2',
  '/opinions/20pdf/19-416_i4dj.pdf',
  'To plead facts sufficient to support a domestic application of the Alien Tort Statute, 28 U. S. C. §1350, plaintiffs must allege more domestic conduct than general corporate activity; the Ninth Circuit’s contrary holding is reversed, and the case is remanded.'],
 ['47',
  '6/17/21',
  '19-123',
  'Fulton v. Philadelphia',
  None,
  'R',
  '593/2',
  '/opinions/20pdf/19-123_new_9olb.pdf',
  'Philadelphia’s refusal to contract with Catholic Social Services for the provision of foster care services unless CSS agrees to certify same-sex couples as foster parents violates the Free Exercise Clause of the First Amendment.'],
 ['46',
  '6/14/21',
  '19-8709',
  'Greer v. United States',
  '\xa0',
  'BK',
  '593/2',
  '/opinions/20pdf/19-8709_n7io.pdf',
  'In felon-in-possession cases under 18 U. S. C. §922(g)(1), an error under Rehaif v. United States, 588 U. S. ___, is not a basis for plain-error relief unless the defendant first makes a sufficient argument or representation on appeal that he would have presented evidence at trial that he did not in fact know he was a felon.'],
 ['45',
  '6/14/21',
  '20-5904',
  'Terry v. United States',
  '\xa0',
  'T',
  '593/2',
  '/opinions/20pdf/20-5904_i4dk.pdf',
  'A sentence reduction under the First Step Act is available only if an offender’s prior conviction of a crack cocaine offense triggered a mandatory minimum sentence.'],
 ['44',
  '6/10/21',
  '19-5410',
  'Borden v. United States',
  '\xa0',
  'EK',
  '593/2',
  '/opinions/20pdf/19-5410_8nj9.pdf',
  'The decision of the Sixth Circuit—holding that an offense with a mental state of recklessness may qualify as a “violent felony” under the Armed Career Criminal Act’s elements clause, 18 U. S. C. §924(e)(2)(B)(i)—is reversed, and the case is remanded.'],
 ['43',
  '6/07/21',
  '20-315',
  'Sanchez v. Mayorkas',
  '\xa0',
  'EK',
  '593/2',
  '/opinions/20pdf/20-315_q713.pdf',
  'An individual who entered the United States unlawfully is not eligible to become a lawful permanent resident under 8 U.S.C. §1255 even if the United States has granted the individual temporary protected status.'],
 ['42',
  '6/03/21',
  '19-783',
  'Van Buren v. United States',
  '\xa0',
  'AB',
  '593/2',
  '/opinions/20pdf/19-783_k53l.pdf',
  'An individual “exceeds authorized access” under the Computer Fraud and Abuse Act of 1986, 18 U. S. C. §1030(a)(2), when he accesses a computer with authorization but then obtains information located in particular areas of the computer—such as files, folders, or databases—that are off-limits to him.'],
 ['41',
  '6/01/21',
  '19-1155',
  'Garland v. Ming Dai',
  '6/01/21',
  'NG',
  '593/2',
  '/opinions/20pdf/19-1155_new_197d.pdf',
  'The Ninth Circuit’s rule in immigration disputes—that in the absence of an explicit adverse credibility determination by an immigration judge or the Board of Immigration Appeals, a reviewing court must treat a petitioning noncitizen’s testimony as credible and true—cannot be reconciled with the terms of the Immigration and Nationality Act.'],
 ['40',
  '6/01/21',
  '19-1414',
  'United States v. Cooley',
  '\xa0',
  'B',
  '593/2',
  '/opinions/20pdf/19-1414_8m58.pdf',
  'A tribal police officer has authority to detain temporarily and to search a non-Indian traveling on a public right-of-way running through a reservation for potential violations of state or federal law.']]
  
```

In [None]:
all_rows = third_table.find_all('tr')
ac_third = []
for row in all_rows[1:]:
    one_row = []
    cells = row.find_all('td')
    for cell in cells:
        one_row.append(cell.string)
    one_row.append(row.find('a')['href'])
    one_row.append(row.find('a')['title'])
    ac_third.append(one_row)
ac_third
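You may have noticed a `None` in one of the rows of the expected output. One common reason `.string` returns `None` is that the tag does not contain exactly one child; I have not checked whether that is the cause for that particular row. The sketch below, on invented HTML, compares `.string` with `.get_text()`, which always returns a string.

```python
from bs4 import BeautifulSoup

# Invented cells: plain text, two child tags, and completely empty.
html = (
    "<table><tr>"
    "<td>7/08/21</td>"
    "<td><b>one</b><i>two</i></td>"
    "<td></td>"
    "</tr></table>"
)
cells = BeautifulSoup(html, "html.parser").find_all("td")

strings = [cell.string for cell in cells]    # None unless exactly one child
texts = [cell.get_text() for cell in cells]  # all nested text joined together

print(strings)  # ['7/08/21', None, None]
print(texts)    # ['7/08/21', 'onetwo', '']
```

If you need every cell to be a string, `.get_text()` is one option; the expected output above was produced with `.string`.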

Finally, go through EVERY table, and get out every row (no headers), so that you have the info for all of the 2020 decisions, from 68 down to 1, in a highly useful list-within-list format.

In [None]:
tables = soup_doc.find_all(class_="table table-bordered")
all_table = []  # one flat list holding a row-list for every decision
for table in tables:
    all_rows = table.find_all('tr')
    for row in all_rows[1:]:  # skip the header row of each table
        one_row = []
        cells = row.find_all('td')
        for cell in cells:
            one_row.append(cell.string)
        one_row.append(row.find('a')['href'])
        one_row.append(row.find('a')['title'])
        # append the row itself so the result stays a list of lists
        all_table.append(one_row)
all_table
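The links you collected are relative paths. If you ever want full URLs for the PDFs, one way is `urljoin` from the standard library. This is a sketch on one sample row copied from the output above, with the long holding text shortened to a placeholder; the name `sample_rows` is just for illustration.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.supremecourt.gov/opinions/slipopinion/20"

# One row in the same shape as the lists built above (holding text shortened).
sample_rows = [
    ["68", "8/26/21", "21A23",
     "Alabama Assn. of Realtors v. Department of Health and Human Servs.",
     "\xa0", "PC", "594/2",
     "/opinions/20pdf/21a23_ap6c.pdf",
     "(holding text)"],
]

full_urls = []
for row in sample_rows:
    relative_link = row[-2]  # the href is the second-to-last item in each row
    full_urls.append(urljoin(PAGE_URL, relative_link))

print(full_urls[0])
```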

## The Guardian: Best Non-Fiction Books of All Time 
I do not endorse this list. However, there are some interesting things within. You will notice that the Internet is filled with rankings, a form that is as readily consumable as it is programmable, because code understands ranking pretty well.

For this task you want to extract different elements of this list separately. You will start by extracting the information for the first entry on the list. You want to get three elements separately: RankNumber, Title_Author_Year, Blurb. And these need to be placed into a Python list with three elements. Once you've accomplished that, you want to loop through all of the entries in that list, 1-100, and make individual Python lists with those same three elements, and then put those lists into a Python list as you go...

Step one, go to this page and take a look at what you're contending with:

https://www.theguardian.com/books/2017/dec/31/the-100-best-nonfiction-books-of-all-time-the-full-list

Note: some of the tags you will see in Chrome in the "Inspect" area, and even in "View Source", are not the same as the HTML that is downloaded by requests and parsed by Beautiful Soup.


Step 1: In the next two cells, use requests to download the HTML, and use Beautiful Soup to parse it. Then print the prettify() version of that downloaded HTML, and copy and paste that output into an HTML editor to look at the tags in there. The overall structure will be the same as what you see in Chrome, but some of the "class=" names will be different.

In [None]:
raw_html2 = requests.get("https://www.theguardian.com/books/2017/dec/31/the-100-best-nonfiction-books-of-all-time-the-full-list").content
soup_doc2 = BeautifulSoup(raw_html2, "html.parser")
print(soup_doc2.prettify())

Step 2: Find the HTML that contains the first entry on the list. Your output should look like this:

`<p class="dcr-eu20cu"><strong>1. <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)</a> </strong><br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.</p>`

In [7]:
# The class attributes for the div are set dynamically by JavaScript,
# so they are not always consistent;
# they may not even be consistent from download to download.
first_book = soup_doc2.find_all(class_="article-body-commercial-selector article-body-viewer-selector dcr-ucgxn1")[0].p
first_book

<p class="dcr-eu20cu"><strong>1. <a data-link-name="in body link" href="https://www.theguardian.com/books/2016/feb/01/100-best-nonfiction-books-of-all-time-the-sixth-extinction-elizabeth-kolbert">The Sixth Extinction by Elizabeth Kolbert (2014)</a> </strong><br/> An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.</p>
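Because those class names can change, another possible approach is to search by an attribute that looks less volatile. In the entry above, each link carries `data-link-name="in body link"`. Whether that attribute really is more stable than the class names on the live page is an assumption on my part, and the HTML below is invented to mimic the entry's shape.

```python
from bs4 import BeautifulSoup

# Invented HTML shaped like the entry above, with made-up class names.
html = (
    '<div class="dcr-abc123">'
    '<p class="dcr-xyz789">An intro paragraph with no entry.</p>'
    '<p class="dcr-xyz789"><strong>1. '
    '<a data-link-name="in body link" href="https://example.com/one">'
    "First Title by Some Author (2001)</a> </strong><br/> A short blurb.</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")

# Match on the data attribute instead of the generated class names.
links = soup.find_all("a", attrs={"data-link-name": "in body link"})

# Climb from each link back up to the paragraph that holds the whole entry.
entries = [link.find_parent("p") for link in links]

print(len(entries))
print(entries[0].a.string)
```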

Step 3: Extract the number from that entry. 

Result: 

`'1. '`


In [80]:
first_book.strong.next 

'1. '

Step 4: extract the title_author_year. 

Result:
`'The Sixth Extinction by Elizabeth Kolbert (2014)'`
        

In [81]:
first_book.strong.find('a').string

'The Sixth Extinction by Elizabeth Kolbert (2014)'

Step 5: Extract the blurb.

Result:

`' An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.'`



In [10]:
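# Two equivalent ways to reach the blurb for this entry;
# the notebook displays only the value of the last expression.
first_book.br.next
first_book.contents[-1]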

' An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.'

Step 6: Take those three elements you extracted, and put them into a Python list. If you had success in the three steps above, you don't need to use Beautiful Soup to do this; you just need to take those individual elements that you extracted and place them inside a list.

Result: 

```
['1. ',
 'The Sixth Extinction by Elizabeth Kolbert (2014)',
 ' An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.']
```

In [84]:
first_book_info = []
first_book_info.append(first_book.strong.next)
first_book_info.append(first_book.strong.find('a').string)
first_book_info.append(first_book.br.next)
first_book_info

['1. ',
 'The Sixth Extinction by Elizabeth Kolbert (2014)',
 ' An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.']
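This goes beyond what the assignment asks, but if you later want the title, author and year as separate values, a regular expression is one option. The sketch below assumes entries follow the "Title by Author (Year)" shape; the pattern and the function name `split_entry` are my own illustration. Entries that do not fit, such as the King James Bible entry, return `None`, which is another place where an **is not None** check earns its keep.

```python
import re

# Assumed shape: "<title> by <author> (<4-digit year>)".
ENTRY_PATTERN = re.compile(r"^(?P<title>.+) by (?P<author>.+) \((?P<year>\d{4})\)$")


def split_entry(text):
    """Return (title, author, year), or None if the text does not fit."""
    match = ENTRY_PATTERN.match(text)
    if match is None:
        return None
    return match.group("title"), match.group("author"), int(match.group("year"))


fits = split_entry("The Sixth Extinction by Elizabeth Kolbert (2014)")
does_not_fit = split_entry("King James Bible: The Authorised Version (1611)")
print(fits)
print(does_not_fit)
```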

Step 7: This is more a leap than a step! Take all of the methods you used to isolate that one entry, and apply them to every entry, so that you are creating the same list you see above, over and over for each entry, and placing each of those into a master list-of-lists.

Hints:

1) You will need to use some version of find_all() to get all of the entries.

2) You will need a loop to iterate through all of those entries.

3) For each entry you will need to extract each of the three elements (number, title_author_year, description) in the exact same way you did with the first entry.

4) It may help to use print() inside your loop to make sure you're getting everything out correctly.

5) Once you're sure you're getting everything out correctly, you will need to make a Python list that will capture each list that is being built within the loop.

6) It may be helpful to include an **is not None** if statement in this loop, as there are some variations and even mistakes in the HTML.

Your desired output is in the cell below.



In [None]:
all_books = soup_doc2.find(class_="article-body-commercial-selector article-body-viewer-selector dcr-ucgxn1").find_all('p')
all_books_info =  []
for book in all_books:
    book_info = []
    if book.strong is not None:
        book_info.append(book.strong.next)
        book_info.append(book.strong.find('a').string)
        if book.br is not None:
            book_info.append(book.br.next)
        else:
            book_info.append(book.strong.next_sibling)
    all_books_info.append(book_info)
all_books_info_clear = [x for x in all_books_info if x != []]
all_books_info_clear

Final Result:

```
[['1. ',
  'The Sixth Extinction by Elizabeth Kolbert (2014)',
  ' An engrossing account of the looming catastrophe caused by ecology’s “neighbours from hell” – mankind.'],
 ['2. ',
  'The Year of Magical Thinking by Joan Didion (2005)',
  'This steely and devastating examination of the author’s grief following the sudden death of her husband changed the nature of writing about bereavement. '],
 ['3. ',
  'No Logo by Naomi Klein (1999)',
  ' Naomi Klein’s timely anti-branding bible combined a fresh approach to corporate hegemony with potent reportage from the dark side of capitalism. '],
 ['4. ',
  'Birthday Letters by Ted Hughes (1998)',
  ' These passionate, audacious poems addressed to Hughes’s late wife, Sylvia Plath, contribute to the couple’s mythology and are a landmark in English poetry. '],
 ['5. ',
  'Dreams from My Father by Barack Obama (1995)',
  ' This remarkably candid memoir revealed not only a literary talent, but a force that would change the face of US politics for ever. '],
 ['6. ',
  'A Brief History of Time by Stephen Hawking (1988)',
  ' The theoretical physicist’s mega-selling account of the origins of the universe is a masterpiece of scientific inquiry that has influenced the minds of a generation. '],
```

and so on until:

```
 ['98. ',
  'The Anatomy of Melancholy by Robert Burton (1621)',
  'Burton’s garrulous, repetitive masterpiece is a compendious study of melancholia, a sublime literary doorstop that explores humanity in all its aspects.'],
 ['99. ',
  'The History of the World by Walter Raleigh (1614)',
  'Raleigh’s most important prose work, close to 1m words in total, used ancient history as a sly commentary on present-day issues.'],
 ['100. ',
  'King James Bible: The Authorised Version (1611)',
  'It is impossible to imagine the English-speaking world celebrated in this series without the King James Bible, which is as universal and influential as Shakespeare.']]
```
 
I didn't want to print all 100 elements of this list here. But notice that this list begins with `[[` and ends with `]]`. That is because this is a list of lists. Note the commas between entries, `],[`: they mean that each book's list is one element in the list-of-lists.
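To make the list-of-lists idea concrete, here is a short sketch of how you reach into one. The ranks and titles are copied from the result above; the blurbs are shortened to placeholders, and the name `books` is just for illustration.

```python
# Two entries in the same shape as the final result (blurbs shortened).
books = [
    ["1. ", "The Sixth Extinction by Elizabeth Kolbert (2014)", "(blurb one)"],
    ["2. ", "The Year of Magical Thinking by Joan Didion (2005)", "(blurb two)"],
]

number_of_books = len(books)    # how many inner lists there are
second_book = books[1]          # the whole inner list for entry 2
second_title = books[1][1]      # outer index picks the book, inner picks the field

# Each inner list has three items, so a loop can unpack them by name.
titles = []
for rank, title_author_year, blurb in books:
    titles.append(title_author_year)

print(number_of_books)
print(second_title)
print(titles)
```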

## Real Shakespeare: Extra Credit
The Folger Shakespeare Library has HTML versions of their Shakespeare publicly available, but in terrible HTML format. If you want to challenge yourself, try pulling out the first 100 lines of Twelfth Night, available here:

http://floatingmedia.com/columbia/FolgerShakes/TN.html

The final output should resemble what you see below. Each of these lines contains three elements:

1) a code for act.scene.line, along with whether it is a stage direction
2) the speaker or the last person who spoke prior to the stage direction
3) a line or stage direction.
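The second element is the tricky one: you have to remember who spoke last as you move through the page. Here is a sketch of that bookkeeping on made-up data. It assumes you have already pulled the page apart into (kind, text) pairs, which is the hard part and depends on the real HTML; the `events` list and its labels are hypothetical.

```python
# Hypothetical, already-extracted pieces of the play, in page order.
events = [
    ("stage", "Enter Orsino."),
    ("speaker", "ORSINO"),
    ("line", "If music be the food of love, play on."),
    ("stage", "Enter Valentine."),
    ("speaker", "VALENTINE"),
    ("line", "So please my lord, I might not be admitted,"),
]

current_speaker = "NOSPEAKER"  # used until the first speaker appears
rows = []
for kind, text in events:
    if kind == "speaker":
        # Remember the new speaker, but do not output a row for the name itself.
        current_speaker = text
        continue
    # Lines and stage directions both get tagged with the last speaker seen.
    rows.append([kind, current_speaker, text])

for row in rows:
    print("\t".join(row))
```

The key idea is that `current_speaker` lives outside the loop, so its value carries over from one item to the next.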

`
line-SD 1.1.0	NOSPEAKER	Enter Orsino, Duke of Illyria, Curio, and other Lords,
line-SD 1.1.0	NOSPEAKER	with
line-SD 1.1.0	NOSPEAKER	 Musicians playing.
line-1.1.1	ORSINO	If music be the food of love, play on.
line-1.1.2	ORSINO	Give me excess of it, that, surfeiting,
line-1.1.3	ORSINO	The appetite may sicken and so die.
line-1.1.4	ORSINO	That strain again! It had a dying fall.
line-1.1.5	ORSINO	O, it came o’er my ear like the sweet sound
line-1.1.6	ORSINO	That breathes upon a bank of violets,
line-1.1.7	ORSINO	Stealing and giving odor. Enough; no more.
line-1.1.8	ORSINO	’Tis not so sweet now as it was before.
line-1.1.9	ORSINO	O spirit of love, how quick and fresh art thou,
line-1.1.10	ORSINO	That, notwithstanding thy capacity
line-1.1.11	ORSINO	Receiveth as the sea, naught enters there,
line-1.1.12	ORSINO	Of what validity and pitch soe’er,
line-1.1.13	ORSINO	But falls into abatement and low price
line-1.1.14	ORSINO	Even in a minute. So full of shapes is fancy
line-1.1.15	ORSINO	That it alone is high fantastical.
line-1.1.16	CURIO	Will you go hunt, my lord?
line-1.1.17	ORSINO	What, Curio?
line-1.1.18	CURIO	The hart.
line-1.1.19	ORSINO	Why, so I do, the noblest that I have.
line-1.1.20	ORSINO	O, when mine eyes did see Olivia first,
line-1.1.21	ORSINO	Methought she purged the air of pestilence.
line-1.1.22	ORSINO	That instant was I turned into a hart,
line-1.1.23	ORSINO	And my desires, like fell and cruel hounds,
line-1.1.24	ORSINO	E’er since pursue me.
line-SD 1.1.24.1	ORSINO	Enter Valentine.
line-1.1.25	ORSINO	How now, what news from her?
line-1.1.26	VALENTINE	So please my lord, I might not be admitted,
line-1.1.27	VALENTINE	But from her handmaid do return this answer:
line-1.1.28	VALENTINE	The element itself, till seven years’ heat,
line-1.1.29	VALENTINE	Shall not behold her face at ample view,
line-1.1.30	VALENTINE	But like a cloistress she will veilèd walk,
line-1.1.31	VALENTINE	And water once a day her chamber round
line-1.1.32	VALENTINE	With eye-offending brine—all this to season
line-1.1.33	VALENTINE	A brother’s dead love, which she would keep fresh
line-1.1.34	VALENTINE	And lasting in her sad remembrance.
line-1.1.35	ORSINO	O, she that hath a heart of that fine frame
line-1.1.36	ORSINO	To pay this debt of love but to a brother,
line-1.1.37	ORSINO	How will she love when the rich golden shaft
line-1.1.38	ORSINO	Hath killed the flock of all affections else
line-1.1.39	ORSINO	That live in her; when liver, brain, and heart,
line-1.1.40	ORSINO	These sovereign thrones, are all supplied, and filled
line-1.1.41	ORSINO	Her sweet perfections with one self king!
line-1.1.42	ORSINO	Away before me to sweet beds of flowers!
line-1.1.43	ORSINO	Love thoughts lie rich when canopied with bowers.
line-SD 1.1.43.1	ORSINO	They exit.
line-SD 1.2.0	ORSINO	Enter Viola, a Captain, and Sailors.
line-1.2.1	VIOLA	What country, friends, is this?
line-1.2.2	CAPTAIN	This is Illyria, lady.
line-1.2.3	VIOLA	And what should I do in Illyria?
line-1.2.4	VIOLA	My brother he is in Elysium.
line-1.2.5	VIOLA	Perchance he is not drowned.—What think you,
line-1.2.6	VIOLA	sailors?
line-1.2.7	CAPTAIN	It is perchance that you yourself were saved.
line-1.2.8	VIOLA	O, my poor brother! And so perchance may he be.
line-1.2.9	CAPTAIN	True, madam. And to comfort you with chance,
line-1.2.10	CAPTAIN	Assure yourself, after our ship did split,
line-1.2.11	CAPTAIN	When you and those poor number saved with you
line-1.2.12	CAPTAIN	Hung on our driving boat, I saw your brother,
line-1.2.13	CAPTAIN	Most provident in peril, bind himself
line-1.2.14	CAPTAIN	(Courage and hope both teaching him the practice)
line-1.2.15	CAPTAIN	To a strong mast that lived upon the sea,
line-1.2.16	CAPTAIN	Where, like Arion
line-1.2.16	CAPTAIN	 on the dolphin’s back,
line-1.2.17	CAPTAIN	I saw him hold acquaintance with the waves
line-1.2.18	CAPTAIN	So long as I could see.
line-SD 1.2.19	VIOLA	, giving
line-SD 1.2.19	VIOLA	 him money
line-1.2.19	VIOLA	For saying so, there’s gold.
line-1.2.20	VIOLA	Mine own escape unfoldeth to my hope,
line-1.2.21	VIOLA	Whereto thy speech serves for authority,
line-1.2.22	VIOLA	The like of him. Know’st thou this country?
line-1.2.23	CAPTAIN	Ay, madam, well, for I was bred and born
line-1.2.24	CAPTAIN	Not three hours’ travel from this very place.
line-1.2.25	VIOLA	Who governs here?
line-1.2.26	CAPTAIN	A noble duke, in nature as in name.
line-1.2.27	VIOLA	What is his name?
line-1.2.28	CAPTAIN	Orsino.
line-1.2.29	VIOLA	Orsino. I have heard my father name him.
line-1.2.30	VIOLA	He was a bachelor then.
line-1.2.31	CAPTAIN	And so is now, or was so very late;
line-1.2.32	CAPTAIN	For but a month ago I went from hence,
line-1.2.33	CAPTAIN	And then ’twas fresh in murmur (as, you know,
line-1.2.34	CAPTAIN	What great ones do the less will prattle of)
line-1.2.35	CAPTAIN	That he did seek the love of fair Olivia.
line-1.2.36	VIOLA	What’s she?
line-1.2.37	CAPTAIN	A virtuous maid, the daughter of a count
line-1.2.38	CAPTAIN	That died some twelvemonth since, then leaving her
line-1.2.39	CAPTAIN	In the protection of his son, her brother,
line-1.2.40	CAPTAIN	Who shortly also died, for whose dear love,
line-1.2.41	CAPTAIN	They say, she hath abjured the sight
line-1.2.42	CAPTAIN	And company of men.
line-1.2.43	VIOLA	O, that I served that lady,
line-1.2.44	VIOLA	And might not be delivered to the world
line-1.2.45	VIOLA	Till I had made mine own occasion mellow,
line-1.2.46	VIOLA	What my estate is.
line-1.2.47	CAPTAIN	That were hard to compass
line-1.2.48	CAPTAIN	Because she will admit no kind of suit,
`

Request and parse the HTML, and give it a shot!

In [None]:
# Download the Folger text of Twelfth Night and hand the raw HTML to Beautiful Soup.
raw_html3 = requests.get("http://floatingmedia.com/columbia/FolgerShakes/TN.html").content
soup_doc3 = BeautifulSoup(raw_html3, "html.parser")
print(soup_doc3.prettify())

In [156]:
# music = soup_doc3.find_all('span', attrs={'id':True, 'title':True})[18]
# music.string
# music.find_previous_siblings(class_='speaker')[0].string

'Will you go hunt, my lord?'
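
The exploration above pairs a line with its speaker by looking backwards through the line's siblings. Here is a minimal sketch of that idea on a tiny inline document. The markup in `sample_html` is an assumption made for illustration (a `speaker` span followed by sibling line spans) and the ids are only examples; the real page may nest things differently, so check it with view source first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the play's structure:
# a speaker span followed by sibling spans that hold the spoken lines.
sample_html = """
<div>
  <span class="speaker">ORSINO</span>
  <span id="line-1.1.1" title="1.1.1">If music be the food of love, play on.</span>
  <span id="line-1.1.2" title="1.1.2">Give me excess of it, that, surfeiting,</span>
  <span class="speaker">CURIO</span>
  <span id="line-1.1.16" title="1.1.16">Will you go hunt, my lord?</span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
# Only the line spans carry both an id and a title, so the speaker spans are skipped.
sample_lines = sample_soup.find_all("span", attrs={"id": True, "title": True})

pairs = []
for span in sample_lines:
    # find_previous_sibling() walks backwards, so the first match is the nearest speaker.
    nearest = span.find_previous_sibling(class_="speaker")
    pairs.append((str(nearest.string), str(span.string)))

for name, text in pairs:
    print(name, "|", text)
```

Because the search runs backwards from each line, a speaker tag only needs to appear once, ahead of a run of lines, for every line in that run to pick it up.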

In [239]:
scene_lines = []
# Every line of the play is a <span> that carries both an id and a title attribute.
all_lines = soup_doc3.find_all('span', attrs={'id': True, 'title': True})

# The first three spans are the opening stage direction, before anyone speaks.
for line in all_lines[0:3]:
    scene_lines.append([line['id'], 'NONSPEAKER', line.text.replace('\xa0', ' ')])

for line in all_lines[3:]:
    # The nearest preceding sibling with class "speaker" names who is talking.
    speaker = line.find_previous_sibling(class_='speaker')
    if speaker is not None:
        name = speaker.string
    else:
        # No speaker tag among the siblings: carry over the previous row's speaker.
        # scene_lines[-1] avoids all_lines.index(line), which is slow and returns
        # the first span that compares equal rather than this exact one.
        name = scene_lines[-1][1]
    scene_lines.append([line['id'], name, line.text.replace('\xa0', ' ')])
scene_lines[:100]

[['line-SD 1.1.0',
  'NONSPEAKER',
  'Enter Orsino, Duke of Illyria, Curio, and other Lords,'],
 ['line-SD 1.1.0', 'NONSPEAKER', 'with'],
 ['line-SD 1.1.0', 'NONSPEAKER', ' Musicians playing.'],
 ['line-1.1.1', 'ORSINO', 'If music be the food of love, play on.'],
 ['line-1.1.2', 'ORSINO', 'Give me excess of it, that, surfeiting,'],
 ['line-1.1.3', 'ORSINO', 'The appetite may sicken and so die.'],
 ['line-1.1.4', 'ORSINO', 'That strain again! It had a dying fall.'],
 ['line-1.1.5', 'ORSINO', 'O, it came o’er my ear like the sweet sound'],
 ['line-1.1.6', 'ORSINO', 'That breathes upon a bank of violets,'],
 ['line-1.1.7', 'ORSINO', 'Stealing and giving odor. Enough; no more.'],
 ['line-1.1.8', 'ORSINO', '’Tis not so sweet now as it was before.'],
 ['line-1.1.9', 'ORSINO', 'O spirit of love, how quick and fresh art thou,'],
 ['line-1.1.10', 'ORSINO', 'That, notwithstanding thy capacity'],
 ['line-1.1.11', 'ORSINO', 'Receiveth as the sea, naught enters there,'],
 ['line-1.1.12', 'ORSINO', 

In [233]:
# for line in all_lines[3:28]:
#     speaker = line.find_previous_siblings(class_='speaker')
#     if len(speaker) == 0:
#         speaker = all_lines[all_lines.index(line)-1][1]
#         print(all_lines.index(line)-1)

26
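
The target output shown earlier has one row per line, with the id, speaker and text separated by tabs. As a rough sketch, a list of lists in the `scene_lines` shape could be turned into that form as follows. `sample_rows` and `rows_to_tsv` are hypothetical names used only for this illustration, and the two rows are copied from the expected output.

```python
# Hypothetical rows in the same [id, speaker, text] shape as scene_lines.
sample_rows = [
    ['line-1.2.1', 'VIOLA', 'What country, friends, is this?'],
    ['line-1.2.2', 'CAPTAIN', 'This is Illyria, lady.'],
]


def rows_to_tsv(rows):
    """Join each row's columns with tabs and the rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)


tsv_text = rows_to_tsv(sample_rows)
print(tsv_text)
```

This assumes every column is already a plain string with no tabs or newlines inside it; the `csv` module with `delimiter='\t'` would be a safer choice if that might not hold.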
