# EXTRACT

### STEP 1: Extract data from Kaggle's file 'nyt2.json'

Our first challenge came from the data file 'nyt2.json', which we downloaded from Kaggle. We expected a conventional JSON file (a list of dictionaries or a dictionary of dictionaries), but the file is instead line-delimited, with one JSON object per line. We wrote the code below to convert it into a standard JSON file, which we named 'output.json'.
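
For illustration, the same conversion can also be sketched with the standard library alone, without going through pandas. This is a minimal sketch, not the code this notebook runs: the function name `jsonl_to_json` and its two path arguments are hypothetical.

```python
import json

def jsonl_to_json(src_path, dst_path):
    """Read one JSON object per line and write them all out as a single JSON array."""
    records = []
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst)
    return len(records)  # number of records written
```

Under these assumptions, a call such as `jsonl_to_json('Resources/nyt2.json', 'output.json')` would produce a file that `json.load` can read as a list of dictionaries.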

In [1]:
# Import dependencies needed

import json
import pandas as pd
from pprint import pprint

In [2]:
# Load 'nyt2.json' file into dataframe:

raw_nyt = pd.read_json('Resources/nyt2.json', lines=True, orient='columns')
raw_nyt.head()

Unnamed: 0,_id,amazon_product_url,author,bestsellers_date,description,price,published_date,publisher,rank,rank_last_week,title,weeks_on_list
0,{'$oid': '5b4aa4ead3089013507db18b'},http://www.amazon.com/Odd-Hours-Dean-Koontz/dp...,Dean R Koontz,{'$date': {'$numberLong': '1211587200000'}},"Odd Thomas, who can communicate with the dead,...",{'$numberInt': '27'},{'$date': {'$numberLong': '1212883200000'}},Bantam,{'$numberInt': '1'},{'$numberInt': '0'},ODD HOURS,{'$numberInt': '1'}
1,{'$oid': '5b4aa4ead3089013507db18c'},http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,{'$date': {'$numberLong': '1211587200000'}},Aliens have taken control of the minds and bod...,{'$numberDouble': '25.99'},{'$date': {'$numberLong': '1212883200000'}},"Little, Brown",{'$numberInt': '2'},{'$numberInt': '1'},THE HOST,{'$numberInt': '3'}
2,{'$oid': '5b4aa4ead3089013507db18d'},http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,{'$date': {'$numberLong': '1211587200000'}},A woman's happy marriage is shaken when she en...,{'$numberDouble': '24.95'},{'$date': {'$numberLong': '1212883200000'}},St. Martin's,{'$numberInt': '3'},{'$numberInt': '2'},LOVE THE ONE YOU'RE WITH,{'$numberInt': '2'}
3,{'$oid': '5b4aa4ead3089013507db18e'},http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,{'$date': {'$numberLong': '1211587200000'}},A Massachusetts state investigator and his tea...,{'$numberDouble': '22.95'},{'$date': {'$numberLong': '1212883200000'}},Putnam,{'$numberInt': '4'},{'$numberInt': '0'},THE FRONT,{'$numberInt': '1'}
4,{'$oid': '5b4aa4ead3089013507db18f'},http://www.amazon.com/Snuff-Chuck-Palahniuk/dp...,Chuck Palahniuk,{'$date': {'$numberLong': '1211587200000'}},An aging porn queens aims to cap her career by...,{'$numberDouble': '24.95'},{'$date': {'$numberLong': '1212883200000'}},Doubleday,{'$numberInt': '5'},{'$numberInt': '0'},SNUFF,{'$numberInt': '1'}


In [3]:
# Save DataFrame 'raw_nyt' into a json file ('output.json') and load it as 'data' we can now work with:

raw_nyt.to_json(path_or_buf='output.json', orient = "records")
with open('output.json') as file:
    data = json.load(file)
#pprint(data)

In [4]:
# Set up lists to hold response info:

nyt_ids = []
urls = []
authors = []
bestsellers_dates = []
descriptions = []
prices = []
published_dates = []
publishers = []
ranks = []
ranks_last_week = []
titles = []
weeks_on_lists = []

# Populate the lists:

for item in data:
    nyt_ids.append(item['_id']['$oid'])
    urls.append(item['amazon_product_url'])
    authors.append(item['author'])
    bestsellers_dates.append(item['bestsellers_date']['$date']['$numberLong'])
    descriptions.append(item['description'])
    published_dates.append(item['published_date']['$date']['$numberLong'])
    publishers.append(item['publisher'])
    ranks.append(item['rank']['$numberInt'])
    ranks_last_week.append(item['rank_last_week']['$numberInt'])
    titles.append(item['title'])
    weeks_on_lists.append(item['weeks_on_list']['$numberInt'])
    # The price is stored under either '$numberInt' or '$numberDouble',
    # so we check which key is present before extracting the price string:
    price_key, = item['price'].keys()
    if price_key in ('$numberInt', '$numberDouble'):
        prices.append(item['price'][price_key])
    else:
        prices.append(None)  # keep 'prices' the same length as the other lists
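
The nested keys seen above (`$oid`, `$numberInt`, `$numberDouble`, `$date`, `$numberLong`) are MongoDB Extended JSON wrappers. As a hedged alternative to handling each field by hand, a small recursive helper could unwrap them all in one pass. The helper name `unwrap` is hypothetical, and unlike the cell above it converts numbers to `int`/`float` instead of keeping them as strings.

```python
def unwrap(value):
    """Collapse MongoDB Extended JSON wrappers into plain Python values."""
    if isinstance(value, dict) and len(value) == 1:
        (key, inner), = value.items()
        if key == "$oid":
            return inner                      # keep the id as a string
        if key in ("$numberInt", "$numberLong"):
            return int(inner)
        if key == "$numberDouble":
            return float(inner)
        if key == "$date":
            return unwrap(inner)              # here: milliseconds since the Unix epoch
    if isinstance(value, dict):
        # Ordinary dictionary: unwrap each of its values
        return {k: unwrap(v) for k, v in value.items()}
    return value
```

With this sketch, `unwrap(item)` would return a flat dictionary for each record in `data`, so the twelve lists would not need to be filled one key at a time.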

In [5]:
# Create a DataFrame from the lists

bestsellers_dict = {
    "nyt_id": nyt_ids,
    "title": titles,
    "author": authors,
    "url": urls,
    "publisher": publishers,
    "description": descriptions,
    "list_price": prices,
    "published_date": published_dates,
    "bestseller_date": bestsellers_dates,
    "rank": ranks,
    "rank_last_week": ranks_last_week,
    "weeks_on_list": weeks_on_lists
}
bestsellers_data = pd.DataFrame(bestsellers_dict)
bestsellers_data.head()

Unnamed: 0,nyt_id,title,author,url,publisher,description,list_price,published_date,bestseller_date,rank,rank_last_week,weeks_on_list
0,5b4aa4ead3089013507db18b,ODD HOURS,Dean R Koontz,http://www.amazon.com/Odd-Hours-Dean-Koontz/dp...,Bantam,"Odd Thomas, who can communicate with the dead,...",27.0,1212883200000,1211587200000,1,0,1
1,5b4aa4ead3089013507db18c,THE HOST,Stephenie Meyer,http://www.amazon.com/The-Host-Novel-Stephenie...,"Little, Brown",Aliens have taken control of the minds and bod...,25.99,1212883200000,1211587200000,2,1,3
2,5b4aa4ead3089013507db18d,LOVE THE ONE YOU'RE WITH,Emily Giffin,http://www.amazon.com/Love-Youre-With-Emily-Gi...,St. Martin's,A woman's happy marriage is shaken when she en...,24.95,1212883200000,1211587200000,3,2,2
3,5b4aa4ead3089013507db18e,THE FRONT,Patricia Cornwell,http://www.amazon.com/The-Front-Garano-Patrici...,Putnam,A Massachusetts state investigator and his tea...,22.95,1212883200000,1211587200000,4,0,1
4,5b4aa4ead3089013507db18f,SNUFF,Chuck Palahniuk,http://www.amazon.com/Snuff-Chuck-Palahniuk/dp...,Doubleday,An aging porn queens aims to cap her career by...,24.95,1212883200000,1211587200000,5,0,1


In [6]:
bestsellers_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10195 entries, 0 to 10194
Data columns (total 12 columns):
nyt_id             10195 non-null object
title              10195 non-null object
author             10195 non-null object
url                10195 non-null object
publisher          10195 non-null object
description        10195 non-null object
list_price         10195 non-null object
published_date     10195 non-null object
bestseller_date    10195 non-null object
rank               10195 non-null object
rank_last_week     10195 non-null object
weeks_on_list      10195 non-null object
dtypes: object(12)
memory usage: 955.9+ KB


### STEP 2: Scrape info from amazon.com

In this step, we build the list of unique Amazon URLs from our bestsellers_data DataFrame and visit each URL to scrape the Amazon price, the number of customer reviews, and the average star rating (out of 5).
We encountered our second challenge here: we had to iterate through our list of Amazon URLs several times, repeatedly requesting the info we were seeking. In the end, we were able to scrape info for 1211 of the 2329 URLs identified in our Kaggle dataset.
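
The idea of making several passes over the URLs that have not yet returned data can be sketched as follows. This is an illustration under stated assumptions, not the scraping code used for this project: `scrape_in_passes` is a hypothetical name, and `fetch` stands for any function that takes a URL and returns the scraped info, or `None` (or raises) when the request fails.

```python
import time

def scrape_in_passes(urls, fetch, max_passes=3, pause=0.0):
    """Call fetch(url) for each URL, retrying the failures on later passes."""
    results = {}
    pending = list(urls)
    for _ in range(max_passes):
        still_failing = []
        for url in pending:
            try:
                info = fetch(url)
            except Exception:
                info = None                 # treat any error as a failed attempt
            if info is None:
                still_failing.append(url)
            else:
                results[url] = info
            time.sleep(pause)               # optional delay between requests
        pending = still_failing
        if not pending:
            break
    return results, pending                 # scraped info, and URLs that never succeeded
```

In practice `fetch` would wrap `requests.get(url, headers=headers)` and the BeautifulSoup parsing, and `pending` would hold the URLs for which no info could be scraped.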

In [7]:
# Import dependencies needed

from bs4 import BeautifulSoup
import requests
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}


In [8]:
# Get list of unique amazon urls

amazon_urls = list(dict.fromkeys(urls))
len(amazon_urls)

2329

# TRANSFORM

### STEP 1: Transform data from 'bestsellers_data' DataFrame

The original Kaggle dataset contains over 2300 unique Amazon URLs (2329), yet we could not scrape all of the info sought on Amazon for each of them. [Explain why we think we could not scrape it all]. Each team member scraped on her end using requests, but iterating in different ways (see jupyter notebook X and Y). We then merged our scraped data together and ended up with a DataFrame with a number of NaN values in certain fields.
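
As a rough illustration of how two partial scrapes can be combined, the sketch below left-merges two small tables onto the list of URLs and fills gaps in one price column from the other. The toy data, the `combined_price` column name, and the choice to prefer the first scrape are all assumptions made for this example.

```python
import pandas as pd

books = pd.DataFrame({"url": ["u1", "u2", "u3", "u4"]})
scrape_a = pd.DataFrame({"url": ["u1", "u2"], "price": [9.99, 10.99]})
scrape_b = pd.DataFrame({"url": ["u1", "u3"], "amazon_price": [8.87, 5.98]})

# Left merges keep every book, leaving NaN where a scrape has no row for that URL
merged = (
    books.merge(scrape_a, on="url", how="left")
         .merge(scrape_b, on="url", how="left")
)

# Prefer the first scrape and fall back to the second where it is missing
merged["combined_price"] = merged["price"].fillna(merged["amazon_price"])

# Number of books for which neither scrape returned a price
missing = int(merged["combined_price"].isna().sum())
```

Here `u4` has no price in either table, so it remains NaN after the merge and is counted in `missing`.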

All data extracted from the JSON file and read into our bestsellers_data DataFrame has the type 'string'. We now need to convert some of it to the correct data types. Specifically:
- 'list_price' should have the float type
- 'published_date' and 'bestseller_date' should be formatted as dates
- 'rank', 'rank_last_week', and 'weeks_on_list' should have the integer type
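
The three conversions listed above can be sketched on a two-row sample taken from the first rows of bestsellers_data. This is a minimal sketch that assumes the date strings are Unix timestamps in milliseconds, as the '$numberLong' values suggest. The name `sample` is hypothetical.

```python
import pandas as pd

sample = pd.DataFrame({
    "list_price": ["27", "25.99"],
    "published_date": ["1212883200000", "1212883200000"],
    "bestseller_date": ["1211587200000", "1211587200000"],
    "rank": ["1", "2"],
    "rank_last_week": ["0", "1"],
    "weeks_on_list": ["1", "3"],
})

# Prices: string -> float
sample["list_price"] = sample["list_price"].astype(float)

# Dates: string -> integer milliseconds -> datetime
for col in ("published_date", "bestseller_date"):
    sample[col] = pd.to_datetime(sample[col].astype("int64"), unit="ms")

# Ranks and week counts: string -> integer
for col in ("rank", "rank_last_week", "weeks_on_list"):
    sample[col] = sample[col].astype(int)
```

Casting the timestamps to integers before calling `pd.to_datetime` avoids relying on how pandas interprets numeric strings when `unit` is given.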

In [9]:
# Import dependencies needed

import numpy as np
from datetime import datetime

In [10]:
# Load our files of scraped Amazon data ('join_data.csv' and 'more_scraped_data.csv') into DataFrames

Amazon_data1 = pd.read_csv("Output/join_data.csv")
Amazon_data2 = pd.read_csv("Output/more_scraped_data.csv")
print(Amazon_data1.count())
print(Amazon_data2.count())
print(bestsellers_data.count())

Unnamed: 0    1211
url           1211
reviews       1211
rating        1211
price         1211
dtype: int64
Unnamed: 0      2046
url             2046
nb_reviews      1641
nb_stars        1615
amazon_price    1237
img_url         1307
dtype: int64
nyt_id             10195
title              10195
author             10195
url                10195
publisher          10195
description        10195
list_price         10195
published_date     10195
bestseller_date    10195
rank               10195
rank_last_week     10195
weeks_on_list      10195
dtype: int64


In [11]:
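# Group bestsellers_data by url

grouped = bestsellers_data.groupby('url')
# Pass the column names as a list (indexing a groupby with a bare tuple is not supported in recent pandas versions)
bestsellers_grouped = grouped[['url', 'title', 'author', 'publisher', 'description', 'list_price', 'weeks_on_list']].max()

# Calculate additional columns
# Note: every column is still a string at this point, so min()/max() compare text, not numbers
# (for example '9' > '16'); cast 'rank' and 'weeks_on_list' to int first for numeric ordering.

bestsellers_grouped['first_date_listed'] = grouped['published_date'].min()
bestsellers_grouped['last_date_listed'] = grouped['published_date'].max()
bestsellers_grouped['worst_rank'] = grouped['rank'].max()
bestsellers_grouped['best_rank'] = grouped['rank'].min()
bestsellers_grouped['times_listed'] = grouped.size()
#bestsellers_grouped["better_price"] = bestsellers_df.apply(lambda row : 'Amazon' if row['NYT_Price'] > row['Amazon_Price'] else 'NYT', axis=1)
# Keep 'url' only as a column so that the later merge on 'url' is unambiguous
bestsellers_grouped = bestsellers_grouped.reset_index(drop=True)
bestsellers_grouped.count()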

url                  2329
title                2329
author               2329
publisher            2329
description          2329
list_price           2329
weeks_on_list        2329
first_date_listed    2329
last_date_listed     2329
worst_rank           2329
best_rank            2329
times_listed         2329
dtype: int64
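
Because the rank values are still strings when they are aggregated, `min()` and `max()` compare them as text. The short example below, using illustrative rank values, shows how this differs from a numeric comparison.

```python
ranks = ["2", "9", "16", "18"]  # illustrative rank values, stored as strings

# String comparison goes character by character, so "16" < "2" < "9"
text_best, text_worst = min(ranks), max(ranks)

# Casting to int first gives the intended numeric ordering
numeric = [int(r) for r in ranks]
best_rank, worst_rank = min(numeric), max(numeric)

print(text_best, text_worst)   # 16 9
print(best_rank, worst_rank)   # 2 18
```

Converting 'rank' to integers before grouping would therefore give a best rank of 2 and a worst rank of 18 for these values.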

In [12]:
# Merge dataframes

df = pd.merge(bestsellers_grouped, Amazon_data1, on='url',how='left')
df = pd.merge(df, Amazon_data2, on='url',how='left')
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2329 entries, 0 to 2328
Data columns (total 21 columns):
url                  2329 non-null object
title                2329 non-null object
author               2329 non-null object
publisher            2329 non-null object
description          2329 non-null object
list_price           2329 non-null object
weeks_on_list        2329 non-null object
first_date_listed    2329 non-null object
last_date_listed     2329 non-null object
worst_rank           2329 non-null object
best_rank            2329 non-null object
times_listed         2329 non-null int64
Unnamed: 0_x         1211 non-null float64
reviews              1211 non-null object
rating               1211 non-null object
price                1211 non-null object
Unnamed: 0_y         2046 non-null float64
nb_reviews           1641 non-null float64
nb_stars             1615 non-null float64
amazon_price         1237 non-null object
img_url              1307 non-null object
dtypes: 

Defaulting to column, but this will raise an ambiguity error in a future version
  exec(code_obj, self.user_global_ns, self.user_ns)


In [13]:
# Drop columns 'Unnamed: 0'

df.drop('Unnamed: 0_x', axis=1, inplace=True)
df.drop('Unnamed: 0_y', axis=1, inplace=True)

In [14]:
# Convert all price columns to float:

df['list_price'] = df['list_price'].astype(float)
# Raw strings keep the backslash in the pattern from being read as a string escape
df['price'] = df['price'].replace(r'[\$,]', '', regex=True).astype(float)
df['amazon_price'] = df['amazon_price'].replace(r'[\$,]', '', regex=True).astype(float)

In [15]:
# Convert all date columns from unix time stamp to date format:

df['first_date_listed'] = df['first_date_listed'].apply(lambda x : datetime.utcfromtimestamp(int(x[:10])).strftime('%Y-%m-%d'))
df['last_date_listed'] = df['last_date_listed'].apply(lambda x : datetime.utcfromtimestamp(int(x[:10])).strftime('%Y-%m-%d'))


In [16]:
# Convert 'weeks_on_list' to integer (the 'reviews' and 'rating' conversions are commented out in the next cell):

df['weeks_on_list'] = df['weeks_on_list'].astype(int)

In [17]:
#df['reviews'] = df['reviews'].str.replace(',', '')
#df['reviews'] = df['reviews'].apply(lambda x : int(x.split()[0]))
#df['rating'] = df['rating'].apply(lambda x : x.split()[0])
df.head()

Unnamed: 0,url,title,author,publisher,description,list_price,weeks_on_list,first_date_listed,last_date_listed,worst_rank,best_rank,times_listed,reviews,rating,price,nb_reviews,nb_stars,amazon_price,img_url
0,http://www.amazon.com/10th-Anniversary-Womens-...,10TH ANNIVERSARY,James Patterson and Maxine Paetro,"Little, Brown",Detective Lindsay Boxer’s long-awaited wedding...,27.99,8,2011-05-22,2011-07-24,9,16,10,"1,365 customer reviews",4.6 out of 5 stars,9.99,1365.0,4.6,8.87,"{""https://images-na.ssl-images-amazon.com/imag..."
1,http://www.amazon.com/11-22-63-A-Novel/dp/1451...,11/22/63,Stephen King,Scribner,An English teacher travels back to 1958 by way...,35.0,9,2011-11-27,2012-05-13,6,1,23,"26,427 customer reviews",4.5 out of 5 stars,10.99,26432.0,,,
2,http://www.amazon.com/11th-Hour-Womens-Murder-...,11TH HOUR,James Patterson and Maxine Paetro,"Little, Brown","When a millionaire is gunned down, Detective L...",27.99,8,2012-05-27,2012-07-29,6,1,10,"1,858 customer reviews",4.6 out of 5 stars,9.99,,4.6,10.0,"{""https://images-na.ssl-images-amazon.com/imag..."
3,http://www.amazon.com/1225-Christmas-Tree-Lane...,1225 CHRISTMAS TREE LANE,Debbie Macomber,Mira,Puppies that need good homes and an ex-husband...,16.95,4,2011-10-16,2011-11-27,8,10,7,,,,741.0,,5.98,"{""https://images-na.ssl-images-amazon.com/imag..."
4,http://www.amazon.com/12th-Never-Womens-Murder...,12TH OF NEVER,James Patterson and Maxine Paetro,"Little, Brown","One week after the birth of her baby, Detectiv...",0.0,6,2013-05-19,2013-06-30,8,1,7,"3,789 customer reviews",4.5 out of 5 stars,9.99,3790.0,4.5,12.0,"{""https://images-na.ssl-images-amazon.com/imag..."


In [66]:
# Rename some columns

df = df.rename(columns={"list_price":"NYT_Price", "price":"Amazon_Price", 'url':'Amazon_url', 'reviews': 'nb_Amazon_reviews', 'rating': 'star_rating'})


In [105]:
# Group data by bestseller
# Note: this cell expects df to contain the 'published_date' and 'rank' columns

grouped = df.groupby('Amazon_url')
# Pass the column names as a list (indexing a groupby with a bare tuple is not supported in recent pandas versions)
bestsellers_df = grouped[['Amazon_url', 'title', 'author', 'publisher', 'description', 'star_rating', 'nb_Amazon_reviews', 'weeks_on_list', 'NYT_Price', 'Amazon_Price']].max()

# Calculate additional columns

bestsellers_df['first_date_listed'] = grouped['published_date'].min()
bestsellers_df['last_date_listed'] = grouped['published_date'].max()
bestsellers_df['worst_rank'] = grouped['rank'].max()
bestsellers_df['best_rank'] = grouped['rank'].min()
bestsellers_df['times_listed'] = grouped.size()
# 'Amazon' when the NYT list price is higher than the Amazon price, otherwise 'NYT'
bestsellers_df["better_price"] = bestsellers_df.apply(lambda row : 'Amazon' if row['NYT_Price'] > row['Amazon_Price'] else 'NYT', axis=1)
bestsellers_df.head()

Unnamed: 0_level_0,Amazon_url,title,author,publisher,description,star_rating,nb_Amazon_reviews,weeks_on_list,NYT_Price,Amazon_Price,first_date_listed,last_date_listed,worst_rank,best_rank,times_listed,better_price
Amazon_url,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
http://www.amazon.com/10th-Anniversary-Womens-Murder-Club/dp/1455511463?tag=NYTBS-20,http://www.amazon.com/10th-Anniversary-Womens-...,10TH ANNIVERSARY,James Patterson and Maxine Paetro,"Little, Brown",Detective Lindsay Boxer’s long-awaited wedding...,4.6,1365,8,27.99,9.99,2011-05-22,2011-07-24,18,2,10,Amazon
http://www.amazon.com/11-22-63-A-Novel/dp/1451627297?tag=NYTBS-20,http://www.amazon.com/11-22-63-A-Novel/dp/1451...,11/22/63,Stephen King,Scribner,An English teacher travels back to 1958 by way...,4.5,26427,21,35.0,10.99,2011-11-27,2012-05-13,20,1,23,Amazon
http://www.amazon.com/11th-Hour-Womens-Murder-Club/dp/0446571830?tag=NYTBS-20,http://www.amazon.com/11th-Hour-Womens-Murder-...,11TH HOUR,James Patterson and Maxine Paetro,"Little, Brown","When a millionaire is gunned down, Detective L...",4.6,1858,8,27.99,9.99,2012-05-27,2012-07-29,20,1,10,Amazon
http://www.amazon.com/12th-Never-Womens-Murder-Club/dp/1455515795?tag=NYTBS-20,http://www.amazon.com/12th-Never-Womens-Murder...,12TH OF NEVER,James Patterson and Maxine Paetro,"Little, Brown","One week after the birth of her baby, Detectiv...",4.5,3789,6,0.0,9.99,2013-05-19,2013-06-30,19,1,7,NYT
http://www.amazon.com/14th-Deadly-Womens-Murder-Club/dp/031640702X?tag=NYTBS-20,http://www.amazon.com/14th-Deadly-Womens-Murde...,14TH DEADLY SIN,James Patterson and Maxine Paetro,"Little, Brown",Detective Lindsay Boxer and her friends must r...,4.3,4140,7,0.0,9.99,2015-05-24,2015-07-12,20,1,8,NYT


# LOAD into MongoDB

### Why MongoDB?

As noted in the TRANSFORM section, we could not scrape all of the info sought on Amazon for each of the unique URLs in the Kaggle dataset, so the DataFrame of merged scraped data has NaN values in certain fields. Since we wanted to keep all info for future app development, and one field is a long string (the book description), we thought MongoDB would offer the best flexibility to store our data.
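
Documents in the same MongoDB collection do not need to share the same fields, so one way to use that flexibility is to leave out the fields that could not be scraped instead of storing NaN. The helper below is a hedged sketch of that idea. `drop_missing` is a hypothetical name and is not part of the loading code in this notebook.

```python
import math

def drop_missing(record):
    """Return a copy of record without None values or non-finite floats."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, float) and not math.isfinite(value):
            continue  # skips NaN as well as +/-inf
        cleaned[key] = value
    return cleaned
```

Assuming the DataFrame has been turned into a list of dictionaries with `to_dict(orient='records')`, each dictionary could be passed through `drop_missing` before calling `insert_many`, so that every stored document only carries the fields that have real values.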

In [21]:
# Import dependencies needed
import pymongo

In [22]:
# Initialize PyMongo to work with MongoDB

conn = 'mongodb://localhost:27017'
client = pymongo.MongoClient(conn)

# Check if database exists already and drop it if so

dblist = client.list_database_names()
if "nyt_bestsellers" in dblist:
    client.drop_database("nyt_bestsellers")

In [23]:
# Define database and collection

db = client.nyt_bestsellers
collection = db.bestsellers

In [24]:
bestsellers_df = pd.read_csv("Output/output_table_final_version.csv")
bestsellers_df.head()

Unnamed: 0.1,Unnamed: 0,nyt_id,title,Amazon_url,author,publisher,description,published_date,bestseller_date,rank,rank_last_week,weeks_on_list,NYT_Price,Amazon_Price,Amazon_reviews,Amazon_rating,Amazon_Discount%
0,0,5b4aa4ead3089013507db18b,ODD HOURS,http://www.amazon.com/Odd-Hours-Dean-Koontz/dp...,Dean R Koontz,Bantam,"Odd Thomas, who can communicate with the dead,...",2008-06-08,2008-05-24,1,0,1,27.0,7.99,920.0,4.4,29.59
1,1,5b4aa4ead3089013507db18c,THE HOST,http://www.amazon.com/The-Host-Novel-Stephenie...,Stephenie Meyer,"Little, Brown",Aliens have taken control of the minds and bod...,2008-06-08,2008-05-24,2,1,3,25.99,7.99,6109.0,4.5,30.74
2,2,5b4aa4ead3089013507db18d,LOVE THE ONE YOU'RE WITH,http://www.amazon.com/Love-Youre-With-Emily-Gi...,Emily Giffin,St. Martin's,A woman's happy marriage is shaken when she en...,2008-06-08,2008-05-24,3,2,2,24.95,8.99,702.0,4.0,36.03
3,3,5b4aa4ead3089013507db18e,THE FRONT,http://www.amazon.com/The-Front-Garano-Patrici...,Patricia Cornwell,Putnam,A Massachusetts state investigator and his tea...,2008-06-08,2008-05-24,4,0,1,22.95,7.99,323.0,3.0,34.81
4,4,5b4aa4ead3089013507db18f,SNUFF,http://www.amazon.com/Snuff-Chuck-Palahniuk/dp...,Chuck Palahniuk,Doubleday,An aging porn queens aims to cap her career by...,2008-06-08,2008-05-24,5,0,1,24.95,9.99,237.0,3.5,40.04


In [25]:
# Create a list of dictionaries that hold the mongoDB documents to be inserted

post = bestsellers_df.to_dict(orient='records')
print(f"Posting {len(post)} documents into collection of bestsellers inside nyt_bestsellers mongo database...")

# Insert the list of documents into the database

collection.insert_many(post)

Posting 2194 documents into collection of bestsellers inside nyt_bestsellers mongo database...


<pymongo.results.InsertManyResult at 0x111725388>

In [26]:
# Verify results

results = collection.find()
for result in results:
    print(result)

{'_id': ObjectId('5c808d57bd7b6a18fc59d51d'), 'Unnamed: 0': 0, 'nyt_id': '5b4aa4ead3089013507db18b', 'title': 'ODD HOURS', 'Amazon_url': 'http://www.amazon.com/Odd-Hours-Dean-Koontz/dp/0553807056?tag=NYTBS-20', 'author': 'Dean R Koontz', 'publisher': 'Bantam', 'description': 'Odd Thomas, who can communicate with the dead, confronts evil forces in a California coastal town.', 'published_date': '2008-06-08', 'bestseller_date': '2008-05-24', 'rank': 1, 'rank_last_week': 0, 'weeks_on_list': 1, 'NYT_Price': 27.0, 'Amazon_Price': 7.99, 'Amazon_reviews': 920.0, 'Amazon_rating': 4.4, 'Amazon_Discount%': 29.59}
{'_id': ObjectId('5c808d57bd7b6a18fc59d51e'), 'Unnamed: 0': 1, 'nyt_id': '5b4aa4ead3089013507db18c', 'title': 'THE HOST', 'Amazon_url': 'http://www.amazon.com/The-Host-Novel-Stephenie-Meyer/dp/0316218502?tag=NYTBS-20', 'author': 'Stephenie Meyer', 'publisher': 'Little, Brown', 'description': 'Aliens have taken control of the minds and bodies of most humans, but one woman won’t surrender.

{'_id': ObjectId('5c808d57bd7b6a18fc59dc35'), 'Unnamed: 0': 1816, 'nyt_id': '5b4aa4ead3089013507dd248', 'title': 'BEFORE THE FALL', 'Amazon_url': 'http://www.amazon.com/Before-Fall-Noah-Hawley-ebook/dp/B0151YQUTE?tag=NYTBS-20', 'author': 'Noah Hawley', 'publisher': 'Grand Central', 'description': 'After a private jet crashes, a firestorm of media madness ensues.', 'published_date': '2016-06-19', 'bestseller_date': '2016-06-04', 'rank': 2, 'rank_last_week': 0, 'weeks_on_list': 1, 'NYT_Price': 0.0, 'Amazon_Price': 9.99, 'Amazon_reviews': 4075.0, 'Amazon_rating': 4.0, 'Amazon_Discount%': inf}
{'_id': ObjectId('5c808d57bd7b6a18fc59dc36'), 'Unnamed: 0': 1817, 'nyt_id': '5b4aa4ead3089013507dd249', 'title': 'ALL SUMMER LONG', 'Amazon_url': 'http://www.amazon.com/Summer-Long-Dorothea-Benton-Frank-ebook/dp/B015MOCRMC?tag=NYTBS-20', 'author': 'Dorothea Benton Frank', 'publisher': 'Morrow/HarperCollins', 'description': 'A successful interior decorator balks at retiring with her husband to South C

### STEP 2: Render data using Flask

In this step we build an app to render our data.

In [None]:
# Import and set-up Flask

from flask import Flask, jsonify
app = Flask(__name__)

# Define Flask routes

@app.route("/")
def welcome():
    return (
        "Welcome to Best of the NYT BestSellers!<br/>"
        "Available Routes:<br/>"
        "/api/v1.0/justice-league"
    )


if __name__ == "__main__":
    app.run(debug=True)