### Part I (18 points): Working with HTML, XML, and JSON

Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author.

  - For each book, include the title, authors, and two or three other attributes that you find interesting.
  - Take the information that you’ve selected about these three books, and separately create three files that store the books’ information in HTML (using an HTML table), XML, and JSON formats (e.g. “books.html”, “books.xml”, and “books.json”). To help you better understand the different file structures, I’d prefer that you create each of these files “by hand” unless you’re already very comfortable with the file formats.
  - Write Python code, using your packages of choice, to load the information from each of the three sources into separate pandas data frames.
  - Are the three data frames identical?
  - Your deliverable is the three source files and the Python code. Post the three source files to GitHub and package your Python code within a Jupyter notebook (along with your code for Part II below) and post it to GitHub as well.

In [22]:
import pandas as pd
import json

In [25]:
# Read JSON File from Github and convert to DataFrame
url = r'https://raw.githubusercontent.com/AVIMARCUS6/DAV-5400/master/Fall/Datasets/books2.json'
df = pd.read_json(url)

In [26]:
df

| | Title | Author | ISBN | Amazon Price |
|---|---|---|---|---|
| 0 | The Kimball Group Reader | Ralph Kimball, Margy Ross | 978-1119216315 | $50.00 |
| 1 | Project Management The Managerial Process | Clifford F. Gray, Erik W. Larson, Gautam V. Desai | 978-9339212032 | $28.03 |
| 2 | Project Management for Non-Project Managers | Jack Ferraro | 978-0814417362 | $26.09 |
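
The layout of books2.json is not shown in this notebook. As a minimal sketch, assuming a "records" layout (a JSON array with one object per book), the keys of each object map onto the column names displayed above. The inline string below is a hypothetical stand-in for the file, with two of the three books.

```python
import json

# Hypothetical "records" layout; the real books2.json may be organized
# differently (this structure is an assumption for illustration).
books_json = """
[
  {"Title": "The Kimball Group Reader",
   "Author": "Ralph Kimball, Margy Ross",
   "ISBN": "978-1119216315",
   "Amazon Price": "$50.00"},
  {"Title": "Project Management for Non-Project Managers",
   "Author": "Jack Ferraro",
   "ISBN": "978-0814417362",
   "Amazon Price": "$26.09"}
]
"""

records = json.loads(books_json)    # list of dicts, one per book
columns = list(records[0].keys())   # the keys become the column names

print(columns)
print(len(records))
```

A list of dicts in this shape can be passed straight to `pd.DataFrame(records)`, and `pd.read_json` accepts the same layout from a file or URL.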


***

### XML:

In [86]:
from bs4 import BeautifulSoup
import urllib.request  # urlopen lives in the urllib.request submodule

url = r'https://raw.githubusercontent.com/AVIMARCUS6/DAV-5400/master/Fall/Datasets/books.xml'
response = urllib.request.urlopen(url)
data = response.read()      # a `bytes` object
text = data.decode('utf-8') # a `str`;
soup = BeautifulSoup(text,'xml')

#regex = "\>(.*?)\<"
books = soup.find_all('Books')

In [87]:
book_list = []
for book in books:
    book_list.append(book.get_text())

In [88]:
book_list

['\nThe Kimball Group Reader\n\nRalph Kimball\nMargy Ross\n\n978-1119216315\nWiley\n$50.00\n',
 '\nProject Management The Managerial Process\n\nClifford F. Gray\nErik W. Larson\nGautam V. Desai\n\n978-9339212032\nMgh\n$28.03\n',
 '\nProject Management for Non-Project Managers\nJack Ferraro\n978-0814417362\nAMACOM\n$26.09\n']

In [95]:
df = pd.DataFrame({'Title': book_list})

In [100]:
df['Title'].str.split(pat='\n', expand=True)

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | The Kimball Group Reader | | Ralph Kimball | Margy Ross | | 978-1119216315 | Wiley | $50.00 | | |
| 1 | | Project Management The Managerial Process | | Clifford F. Gray | Erik W. Larson | Gautam V. Desai | | 978-9339212032 | Mgh | $28.03 | |
| 2 | | Project Management for Non-Project Managers | Jack Ferraro | 978-0814417362 | AMACOM | $26.09 | | | | | |
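
Splitting the flattened text on newlines leaves the fields misaligned, because the books have different numbers of authors. One possible alternative is to read each field by tag name. The sketch below uses the standard library's `xml.etree.ElementTree` on a hypothetical inline document: only the `Books` element name appears in the notebook, so every child tag (`Title`, `Authors`, `Author`, `ISBN`, `Publisher`, `Price`) and the `Library` root are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout: only <Books> is known from the notebook; the
# root and child tag names here are assumptions for illustration.
xml_text = """
<Library>
  <Books>
    <Title>The Kimball Group Reader</Title>
    <Authors>
      <Author>Ralph Kimball</Author>
      <Author>Margy Ross</Author>
    </Authors>
    <ISBN>978-1119216315</ISBN>
    <Publisher>Wiley</Publisher>
    <Price>$50.00</Price>
  </Books>
  <Books>
    <Title>Project Management for Non-Project Managers</Title>
    <Author>Jack Ferraro</Author>
    <ISBN>978-0814417362</ISBN>
    <Publisher>AMACOM</Publisher>
    <Price>$26.09</Price>
  </Books>
</Library>
"""

root = ET.fromstring(xml_text.strip())
rows = []
for book in root.iter("Books"):
    rows.append({
        "Title": book.findtext("Title"),
        # iter() finds <Author> at any depth, so a book with or without
        # an <Authors> wrapper is handled the same way
        "Authors": ", ".join(a.text.strip() for a in book.iter("Author")),
        "ISBN": book.findtext("ISBN"),
        "Publisher": book.findtext("Publisher"),
        "Amazon Price": book.findtext("Price"),
    })

print(rows[0]["Authors"])
```

Passing `rows` to `pd.DataFrame(rows)` would give one labeled column per field, with all authors of a book joined into a single cell.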


***

### HTML:

In [44]:
# Read HTML File from Github
tables = pd.read_html('https://raw.githubusercontent.com/AVIMARCUS6/DAV-5400/master/Fall/Datasets/Books.html')

In [44]:
# The first item in list is the dataframe
books = tables[0]

In [45]:
books

| | Title | Authors | ISBN | Publisher | Amazon Price |
|---|---|---|---|---|---|
| 0 | The Kimball Group Reader | Ralph Kimball, Margy Ross | 978-1119216315 | Wiley | $50.00 |
| 1 | Project Management The Managerial Process | Clifford F. Gray, Erik W. Larson, Gautam V. Desa | 978-9339212032 | Mgh | $28.03 |
| 2 | Project Management for Non-Project Managers | Jack Ferraro | 978-0814417362 | AMACOM | $26.09 |
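
The assignment asks whether the three data frames are identical. Judging only by the outputs displayed in this notebook, they are not: the JSON frame has an `Author` column and no publisher, the HTML frame has `Authors` and `Publisher`, and the XML frame has eleven unlabeled columns. A small sketch of that comparison, using the column labels copied from the outputs (the helper `column_diff` is a hypothetical name):

```python
# Column labels as displayed in the three outputs of Part I.
json_cols = ["Title", "Author", "ISBN", "Amazon Price"]
html_cols = ["Title", "Authors", "ISBN", "Publisher", "Amazon Price"]
xml_cols = list(range(11))  # unlabeled columns 0..10 produced by str.split


def column_diff(left, right):
    """Return (only_in_left, only_in_right), preserving column order."""
    only_left = [c for c in left if c not in right]
    only_right = [c for c in right if c not in left]
    return only_left, only_right


only_json, only_html = column_diff(json_cols, html_cols)
print(only_json)   # ['Author']
print(only_html)   # ['Authors', 'Publisher']

identical = json_cols == html_cols == xml_cols
print(identical)   # False
```

With the frames themselves, `DataFrame.equals` performs the full comparison of labels and values. Note that the notebook reuses the name `df` for both the JSON and the XML frame, so each frame would need its own variable before they can be compared.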


***

### Part II (12 points): Working with Web APIs

The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com

  - You’ll need to start by signing up for an API key.
  - Your task is to then choose one of the New York Times APIs and construct an interface in Python to read JSON data accessible via the API and transform that data into a pandas data frame that is suitable for use in data analysis work.

In [101]:
import requests
import pandas as pd

In [102]:
apiKey = 'xp6gR9lEpsg4eIYMkGMvzN7DYraPre76'
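
A key that is hard-coded in a notebook is published along with the notebook when it is posted to GitHub. One possible alternative, sketched here, reads the key from an environment variable and builds the query string with `urllib.parse.urlencode`. The variable name `NYT_API_KEY` and the placeholder value are assumptions, not part of the original notebook.

```python
import os
from urllib.parse import urlencode

# NYT_API_KEY is a hypothetical environment variable name; the fallback
# string is a placeholder, not a working key.
api_key = os.environ.get("NYT_API_KEY", "YOUR-API-KEY")

base_url = "https://api.nytimes.com/svc/books/v3/lists.json"
params = {"list": "hardcover-fiction", "api-key": api_key}

# urlencode escapes the values and joins them as key=value pairs with '&'
url = base_url + "?" + urlencode(params)
```

The resulting `url` has the same shape as the one concatenated by hand in the next cell and can be passed to `requests.get` in the same way.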

In [103]:
url = 'https://api.nytimes.com/svc/books/v3/lists.json?list=hardcover-fiction&api-key=' + apiKey
# Check the HTTP status of the response (200 = OK, 401 = unauthorized, 429 = too many requests)
resp = requests.get(url)
resp

<Response [200]>

In [104]:
# convert the 'resp' object's JSON content into native Python objects (here, a dict)
data = resp.json()
type(data)

dict

In [105]:
# number of top-level keys in the dict
len(data)

5

In [108]:
# inspect the content of the dict
data

{'status': 'OK',
 'copyright': 'Copyright (c) 2019 The New York Times Company.  All Rights Reserved.',
 'num_results': 15,
 'last_modified': '2019-11-27T23:38:01-05:00',
 'results': [{'list_name': 'Hardcover Fiction',
   'display_name': 'Hardcover Fiction',
   'bestsellers_date': '2019-11-23',
   'published_date': '2019-12-08',
   'rank': 1,
   'rank_last_week': 0,
   'weeks_on_list': 1,
   'asterisk': 0,
   'dagger': 0,
   'amazon_product_url': 'https://www.amazon.com/Minute-Midnight-Atlee-Pine-Thriller/dp/1538761602?tag=NYTBS-20',
   'isbns': [{'isbn10': '1538761602', 'isbn13': '9781538761601'},
    {'isbn10': '1538734036', 'isbn13': '9781538734032'}],
   'book_details': [{'title': 'A MINUTE TO MIDNIGHT',
     'description': 'When Atlee Pine returns to her hometown to investigate her sister’s kidnapping from 30 years ago, she winds up tracking a potential serial killer.',
     'contributor': 'by David Baldacci',
     'author': 'David Baldacci',
     'contributor_note': '',
     'pric

In [112]:
# the data type of the first item in the 'results' list
type(data['results'][0])

dict

In [115]:
# what's in the first item of the 'results' list?
data['results'][0]

{'list_name': 'Hardcover Fiction',
 'display_name': 'Hardcover Fiction',
 'bestsellers_date': '2019-11-23',
 'published_date': '2019-12-08',
 'rank': 1,
 'rank_last_week': 0,
 'weeks_on_list': 1,
 'asterisk': 0,
 'dagger': 0,
 'amazon_product_url': 'https://www.amazon.com/Minute-Midnight-Atlee-Pine-Thriller/dp/1538761602?tag=NYTBS-20',
 'isbns': [{'isbn10': '1538761602', 'isbn13': '9781538761601'},
  {'isbn10': '1538734036', 'isbn13': '9781538734032'}],
 'book_details': [{'title': 'A MINUTE TO MIDNIGHT',
   'description': 'When Atlee Pine returns to her hometown to investigate her sister’s kidnapping from 30 years ago, she winds up tracking a potential serial killer.',
   'contributor': 'by David Baldacci',
   'author': 'David Baldacci',
   'contributor_note': '',
   'price': 0,
   'age_group': '',
   'publisher': 'Grand Central',
   'primary_isbn13': '9781538761601',
   'primary_isbn10': '1538761602'}],
 'reviews': [{'book_review_link': '',
   'first_chapter_link': '',
   'sunday_revi

In [114]:
# Convert extracted JSON data into a data frame
books = pd.DataFrame(data['results'])
books

| | list_name | display_name | bestsellers_date | published_date | rank | rank_last_week | weeks_on_list | asterisk | dagger | amazon_product_url | isbns | book_details | reviews |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 1 | 0 | 1 | 0 | 0 | https://www.amazon.com/Minute-Midnight-Atlee-P... | [{'isbn10': '1538761602', 'isbn13': '978153876... | [{'title': 'A MINUTE TO MIDNIGHT', 'descriptio... | [{'book_review_link': '', 'first_chapter_link'... |
| 1 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 2 | 4 | 64 | 0 | 0 | https://www.amazon.com/Where-Crawdads-Sing-Del... | [{'isbn10': '0735219095', 'isbn13': '978073521... | [{'title': 'WHERE THE CRAWDADS SING', 'descrip... | [{'book_review_link': '', 'first_chapter_link'... |
| 2 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 3 | 2 | 6 | 0 | 0 | https://www.amazon.com/Guardians-Novel-John-Gr... | [{'isbn10': '0385544189', 'isbn13': '978038554... | [{'title': 'THE GUARDIANS', 'description': 'Cu... | [{'book_review_link': '', 'first_chapter_link'... |
| 3 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 4 | 1 | 2 | 0 | 0 | https://www.amazon.com/Twisted-Twenty-Six-Step... | [{'isbn10': '0399180192', 'isbn13': '978039918... | [{'title': 'TWISTED TWENTY-SIX', 'description'... | [{'book_review_link': '', 'first_chapter_link'... |
| 4 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 5 | 3 | 4 | 0 | 0 | https://www.amazon.com/Blue-Moon-Jack-Reacher-... | [{'isbn10': '0399593543', 'isbn13': '978039959... | [{'title': 'BLUE MOON', 'description': 'Jack R... | [{'book_review_link': '', 'first_chapter_link'... |
| 5 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 6 | 0 | 1 | 0 | 0 | https://www.amazon.com/Clancy-Code-Honor-Jack-... | [{'isbn10': '0525541721', 'isbn13': '978052554... | [{'title': 'TOM CLANCY: CODE OF HONOR', 'descr... | [{'book_review_link': '', 'first_chapter_link'... |
| 6 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 7 | 6 | 9 | 0 | 0 | https://www.amazon.com/Dutch-House-Novel-Ann-P... | [{'isbn10': '0062963678', 'isbn13': '978006296... | [{'title': 'THE DUTCH HOUSE', 'description': '... | [{'book_review_link': 'https://www.nytimes.com... |
| 7 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 8 | 5 | 5 | 0 | 0 | https://www.amazon.com/Night-Renée-Ballard-Har... | [{'isbn10': '0316485616', 'isbn13': '978031648... | [{'title': 'THE NIGHT FIRE', 'description': 'H... | [{'book_review_link': '', 'first_chapter_link'... |
| 8 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 9 | 8 | 11 | 0 | 0 | https://www.amazon.com/Institute-Novel-Stephen... | [{'isbn10': '1982110562', 'isbn13': '978198211... | [{'title': 'THE INSTITUTE', 'description': 'Ch... | [{'book_review_link': 'https://www.nytimes.com... |
| 9 | Hardcover Fiction | Hardcover Fiction | 2019-11-23 | 2019-12-08 | 10 | 7 | 6 | 0 | 0 | https://www.amazon.com/Olive-Again-Novel-Eliza... | [{'isbn10': '0812996542', 'isbn13': '978081299... | [{'title': 'OLIVE, AGAIN', 'description': 'In ... | [{'book_review_link': 'https://www.nytimes.com... |
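
In this frame the `isbns`, `book_details`, and `reviews` columns still hold lists of dicts, which makes them awkward to filter or aggregate. A minimal sketch of one way to flatten each result before building the frame is shown below. It uses a trimmed copy of the first record from the output above. The helper `flatten_result` and the column name `isbn13_all` are hypothetical, and taking only the first `book_details` entry is an assumption based on the single-entry lists shown.

```python
# One trimmed record, copied from the data['results'][0] output above.
sample_results = [{
    "rank": 1,
    "rank_last_week": 0,
    "weeks_on_list": 1,
    "isbns": [{"isbn10": "1538761602", "isbn13": "9781538761601"},
              {"isbn10": "1538734036", "isbn13": "9781538734032"}],
    "book_details": [{"title": "A MINUTE TO MIDNIGHT",
                      "author": "David Baldacci",
                      "publisher": "Grand Central",
                      "primary_isbn13": "9781538761601"}],
}]


def flatten_result(result):
    """Merge a result's scalar fields with its first book_details entry."""
    # keep only scalar values; nested lists and dicts are handled below
    flat = {k: v for k, v in result.items() if not isinstance(v, (list, dict))}
    details = result.get("book_details") or [{}]
    flat.update(details[0])
    # join every ISBN-13 into one string (hypothetical column name)
    flat["isbn13_all"] = ", ".join(i["isbn13"] for i in result.get("isbns", []))
    return flat


flat_rows = [flatten_result(r) for r in sample_results]
print(flat_rows[0]["title"], flat_rows[0]["rank"])
```

Calling `pd.DataFrame(flat_rows)` on the full `data['results']` list would then give one scalar column per field. pandas also provides `pd.json_normalize`, whose `record_path` and `meta` parameters cover a similar case.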
