<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module6_Part1_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 6 - Part 1 Python: Application Programming Interface (API) - Yelp

## What is an API

An <strong>Application Programming Interface (API)</strong> is a common method for obtaining and sharing data across applications. 


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1k7Y6z2RSr_91vhSduvYYqPB0h85AEooX"></p>

Source:  https://en.wikipedia.org/wiki/API

An API makes it easy for a user (the client) to communicate with a database or server.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=106-gH7Vj2VFPmFYRZeNYgKPQKZgl3cPD"></p>

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Example 6.1.P
For this notebook, we will use the Yelp API to obtain data regarding the *Best Restaurants* in Winona, MN. 
 
The following search criteria will be used:

*   Location: Winona, MN
*   Search Term: Best Restaurants
*   Price: $, 1 dollar sign implies cheapest 
*   Sort by: Highest Rated

Source:  https://www.yelp.com/search?find_desc=Best%20Restaurants&find_loc=Winona%2C%20MN%2055987&attrs=RestaurantsPriceRange2.1&sortby=rating 

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Search via Yelp

Consider the following search done by Yelp.  The specifications of this search include: Search Term = "Best Restaurants", Location = "Winona, MN", 1 Filter applied for "$" (i.e. cheapest), and Sort outcomes by "Highest Rated".


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1gEpb125_jZfFg3hMvf3BWE2iut4tkpYd" width='100%' height='100%'></p>

The outcomes returned by Yelp are provided here.  Notice that the outcomes are *not* necessarily sorted by Highest Rating.  Bonnie Rae's Cafe is actually the highest rated in this list of three, but is not listed on top.  I would imagine that Yelp takes the number of reviews into consideration when determining "best".  For example, a restaurant with an overall rating of 5 based on two reviews should probably be ranked lower than a restaurant with an overall rating of 4.5 based on several hundred reviews.
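To make the intuition above concrete, here is a minimal sketch of one common way to combine a rating with its review count. It is *not* Yelp's actual ranking method, which is not documented in this notebook. The function name `damped_rating` and the values `prior_mean=4.0` and `prior_weight=10` are assumptions chosen only for illustration.

```python
def damped_rating(rating, review_count, prior_mean=4.0, prior_weight=10):
    """Pull a rating toward prior_mean; fewer reviews means a stronger pull."""
    return (prior_weight * prior_mean + review_count * rating) / (prior_weight + review_count)

few_reviews = damped_rating(5.0, 2)      # perfect score, but only 2 reviews
many_reviews = damped_rating(4.5, 300)   # slightly lower score, 300 reviews

print(round(few_reviews, 2), round(many_reviews, 2))
```

Under these assumed settings, the restaurant with 300 reviews ends up with the higher adjusted score even though its raw rating is lower.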

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1k8j_BT3VBuY4tuFNQZDY6wAZIm7kPvSB" width='50%' height='50%'></p>

<strong>Goal</strong>:  To obtain a list that is truly sorted by Rating.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1viEjHStuLOVfnlptrUe9ZdRleMM0w7D7" width="50%" height="50%"></p>

## Setting up an Account at Yelp

Most often, the first step in using an API is to create an account with the organization that owns the API.  For this notebook we will be using the Yelp API; thus, a developer account will need to be created at Yelp. 

Yelp Developer Site: https://www.yelp.com/developers

The next step is to fill out the required form for your new "app".  In this class, we will not be creating an actual app, but this form is required in order to obtain a:

*    <strong>Client ID</strong>: Unique identifier for yourself 
*    <strong> API Key</strong>: Unique identifier for your app

These two identifiers are somewhat common when working with APIs.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=14_kwYRq9yTLwtog0Rd8Gv06M_GaaxDWA" width='50%' height='50%'></p>

## Setting up Python

First, import the pandas and numpy packages for working with data in Python.

In [None]:
import pandas as pd
import numpy as np

The <strong>YelpAPI</strong> package in Python needs your credentials from Yelp.  Obtain the Client ID and API Key from the Yelp Developer site and paste these strings here; only the API Key is passed to YelpAPI() later in this notebook.

In [None]:
#Setting up API Connection Information
client_id = ""
api_key = ""
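Pasting a key directly into a notebook exposes it to anyone the notebook is shared with. As a hedged alternative, the key can be read from an environment variable. The helper `load_api_key()` and the variable name `YELP_API_KEY` are hypothetical names used for this sketch, not something Yelp or the YelpAPI package requires.

```python
import os

def load_api_key(env_var="YELP_API_KEY"):
    """Return the API key stored in the given environment variable, or '' if unset."""
    return os.environ.get(env_var, "")

api_key = load_api_key()
if not api_key:
    print("No API key found; set YELP_API_KEY before calling the API.")
```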

Next, download the YelpAPI package for Python. This will make connecting to the Yelp Fusion API *much* easier.

Source:  https://github.com/gfairchild/yelpapi

In [None]:
pip install yelpapi



Next, load the YelpAPI package into this Colab session.

In [None]:
from yelpapi import YelpAPI

Next, the <strong>YelpAPI()</strong> function establishes a connection with the Yelp Fusion API.  Additional details regarding this function can be found on the GitHub site referenced above. 

In [None]:
yelp_api = YelpAPI(api_key)

## Making an API Call

A call to the API requires specification of a set of parameters.  Some of these parameters are required, e.g. location, and others are optional, e.g. price.  The list of possible parameters can be found on Yelp's documentation page.

Source: https://www.yelp.com/developers/documentation/v3/business_search



<p align='center'><img src="https://drive.google.com/uc?export=view&id=1vJZV638yTbpJvLZOwF7imatucWLGNvmr" width='50%' height='50%'></p>

The following specifications will be used for our first API call.


*   <strong>Term</strong>: Best Restaurants
*   <strong>Location</strong>: Winona, MN
*   <strong>Search Limit</strong>: Use 50 -- the maximum possible


<strong>Comment</strong>:  If more than 50 outcomes are desired, the <strong>offset</strong> parameter can be used to obtain the *next* set of 50.



In [None]:
#Making a call to the Yelp Fusion API
term = 'Best Restaurants'
location = 'Winona, MN'
search_limit = 50  # 50 is the maximum for a single api call


response = yelp_api.search_query(term = term,
                                 location = location,
                                 limit = search_limit)
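For readers curious about what a wrapper package does behind the scenes, the sketch below builds (but does not send) a comparable HTTP request using only the standard library. It assumes the Yelp Fusion business search endpoint is `https://api.yelp.com/v3/businesses/search` and that the API Key is sent as a Bearer token; `build_search_request()` is a hypothetical helper, not part of the YelpAPI package.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(api_key, term, location, limit=50, offset=0):
    """Build a GET request for a business search without sending it."""
    params = urlencode({"term": term, "location": location,
                        "limit": limit, "offset": offset})
    # Assumed endpoint for the Yelp Fusion business search
    url = "https://api.yelp.com/v3/businesses/search?" + params
    # Assumed authentication scheme: API Key passed as a Bearer token
    return Request(url, headers={"Authorization": "Bearer " + api_key})

req = build_search_request("YOUR_API_KEY", "Best Restaurants", "Winona, MN")
print(req.full_url)
```

The search parameters simply become part of the URL, which is why spaces and commas in the term and location are encoded.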

## The API Outcomes

Often, the <strong>JSON</strong> data format is used by APIs. A JSON data structure is much more flexible than a dataframe, e.g. JSON allows for *nested* data structures.  Most often a dataframe requires data to be in a tabular format with clearly defined rows and columns.  A quick review of the information returned by the YelpAPI suggests that a dataframe is not the best way to store such information. 





<p align='center'><img src="https://drive.google.com/uc?export=view&id=1XR5qYZQnb4pC4tyFm8qI3uDD2mTfNc8y" width='75%' height='75%'></p>

The YelpAPI has *automatically* converted the JSON data structure into a Python dictionary, a commonly used data structure within Python that allows for more structure (such as nesting) than a dataframe.

In [None]:
type(response)

dict
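The JSON-to-dictionary conversion can be illustrated with Python's built-in json module. The JSON text below is a small hand-made example loosely based on the fields in this notebook, not an actual API response.

```python
import json

raw_json = """
{"businesses": [
    {"name": "Culver's",
     "rating": 3.5,
     "location": {"city": "Winona", "state": "MN"}}
]}
"""

parsed = json.loads(raw_json)         # JSON text -> Python dictionary
first = parsed["businesses"][0]       # a list of dictionaries sits inside the dictionary

print(type(parsed).__name__)
print(first["location"]["city"])      # nested values are reached one key at a time
```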

Taking a look at the contents of the response dictionary.

In [None]:
response

{'businesses': [{'alias': 'culvers-winona',
   'categories': [{'alias': 'hotdogs', 'title': 'Fast Food'},
    {'alias': 'icecream', 'title': 'Ice Cream & Frozen Yogurt'},
    {'alias': 'burgers', 'title': 'Burgers'}],
   'coordinates': {'latitude': 44.04708839393416,
    'longitude': -91.67642196126742},
   'display_phone': '(507) 457-9030',
   'distance': 1439.097853692298,
   'id': 'rhd0RNi4hAWs8v7lMoUi1g',
   'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/bcGL1oPG-fkCF5QI4znL-A/o.jpg',
   'is_closed': False,
   'location': {'address1': '1441 Service Dr',
    'address2': '',
    'address3': '',
    'city': 'Winona',
    'country': 'US',
    'display_address': ['1441 Service Dr', 'Winona, MN 55987'],
    'state': 'MN',
    'zip_code': '55987'},
   'name': "Culver's",
   'phone': '+15074579030',
   'price': '$',
   'rating': 3.5,
   'review_count': 12,
   'transactions': [],
   'url': 'https://www.yelp.com/biz/culvers-winona?adjust_creative=Lb5N_d2GlF2uqp7cMBNlBg&utm_campaign=ye



**Problem**:  The data pulled from the Yelp API appears to be incomplete.  In particular, one of the top rated restaurants, i.e. Beno's Deli, is missing from this first set of 50 restaurants retrieved from the Yelp Fusion API.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1lwvoZDdYpuYYxWIyWgH_DXnOIKYba3TQ" width='50%' height='50%'></p>

## Using Offset parameter 

The **offset** parameter for the Yelp Fusion API will be used to obtain more restaurants beyond the search limit.  Currently, the maximum number of restaurants that can be pulled from a single API call is 50. The offset argument in **yelp_api.search_query()** tells the Yelp Fusion API how many results to skip, so that a second (or third) set of restaurants can be retrieved.  



<p align='center'><img src="https://drive.google.com/uc?export=view&id=10MNgr3aOrrjNuqd_X8QyWMV72k6n3YvY" width='50%' height='50%'></p>

Consider the following code that retrieves 3 sets of 50 restaurants from the Yelp Fusion API.

In [None]:
term = 'Best Restaurants'
location = 'Winona, MN'
search_limit = 50  # 50 is the maximum for a single api call

#Calling Yelp API for Set #1
response_set1 = yelp_api.search_query(term = term,
                                 location = location,
                                 limit = search_limit,
                                 offset=0)

#Calling Yelp API for Set #2, Offset=50
response_set2 = yelp_api.search_query(term = term,
                                 location = location,
                                 limit = search_limit,
                                 offset=50)

#Calling Yelp API for Set #3, Offset=100
response_set3 = yelp_api.search_query(term = term,
                                 location = location,
                                 limit = search_limit,
                                 offset=100)
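The three calls above differ only in their offset, so the same idea can be written as a loop. The sketch below is one possible way to do this; `fetch_all_businesses()` is a hypothetical helper, and `fake_search()` is a stand-in that mimics the shape of the response so the example runs without an API key.

```python
def fetch_all_businesses(search_fn, term, location, page_size=50, max_results=150):
    """Call search_fn repeatedly, increasing offset, and collect all businesses."""
    businesses = []
    for offset in range(0, max_results, page_size):
        response = search_fn(term=term, location=location,
                             limit=page_size, offset=offset)
        page = response.get("businesses", [])
        businesses.extend(page)
        if len(page) < page_size:   # a short page means there is nothing left to fetch
            break
    return businesses

# Stand-in for the real API: pretends that 120 businesses match the search
def fake_search(term, location, limit, offset):
    data = [{"id": i} for i in range(120)]
    return {"businesses": data[offset:offset + limit]}

all_biz = fetch_all_businesses(fake_search, "Best Restaurants", "Winona, MN")
print(len(all_biz))
```

With a live connection, `yelp_api.search_query` could be passed in place of `fake_search`.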


## Putting data into a single data.frame

The next data processing step will be to put the outcomes from each set into a single data.frame.  A custom function called convert_dict_to_df() will be used for this task.

<table width='100%'><tr><td bgcolor='orange' align='center'><font size="+2">Python Function</font></td></tr></table>

In [None]:
def convert_dict_to_df(mydictionary):
  '''  Purpose: This function is used to convert a dictionary returned from the Yelp Fusion API into a data.frame 

       Args:
         mydictionary: a dictionary that is returned by the yelp_api.search_query() function
                 
       Returns: the converted data.frame, with one row per business 
  '''
  
  # DataFrame.append() was removed in pandas 2.0, so the data.frame is built
  # directly from the list of business dictionaries rather than row by row
  businesses = mydictionary.get('businesses', [])
  local_df = pd.DataFrame(businesses)

  return local_df

<table width='100%'><tr><td bgcolor='orange' align='center'><font size="+2">&nbsp;</font></td></tr></table>

Next, use the custom convert_dict_to_df() function to convert each dictionary to a data.frame.  

In [None]:
#Using the convert_dict_to_df() function for each set
WinonaYelp_Set1 = convert_dict_to_df(response_set1)
WinonaYelp_Set2 = convert_dict_to_df(response_set2)
WinonaYelp_Set3 = convert_dict_to_df(response_set3)

Checking the data type and size of the data.frame returned by the convert_dict_to_df() function.

In [None]:
print(type(WinonaYelp_Set1))
print(WinonaYelp_Set1.shape)

<class 'pandas.core.frame.DataFrame'>
(50, 16)


Next, use the pd.concat() function to concatenate / append the three data.frames.

In [None]:
WinonaYelp = pd.concat([WinonaYelp_Set1, WinonaYelp_Set2, WinonaYelp_Set3])

Making sure the three data.frames were correctly appended. 

In [None]:
WinonaYelp.shape

(150, 16)

Using the head() method to look at the first three records in the WinonaYelp data.frame.

In [None]:
WinonaYelp.head(3)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,G2ptcgOx5e9usmxW38OoOw,hillside-fish-house-marshland-2,Hillside Fish House,https://s3-media3.fl.yelpcdn.com/bphoto/IBZ-vV...,False,https://www.yelp.com/biz/hillside-fish-house-m...,39,"[{'alias': 'wine_bars', 'title': 'Wine Bars'},...",4.0,"{'latitude': 44.07125, 'longitude': -91.55721}",[],$$,"{'address1': 'W124 State Rd 35 54', 'address2'...",16086876141,(608) 687-6141,8487.76768
1,b7AuH8sAf0IDs6ZR8sPAtg,the-boat-house-winona,The Boat House,https://s3-media2.fl.yelpcdn.com/bphoto/sx-wDQ...,False,https://www.yelp.com/biz/the-boat-house-winona...,92,"[{'alias': 'newamerican', 'title': 'American (...",3.5,"{'latitude': 44.0550145, 'longitude': -91.6381...",[],$$,"{'address1': '2 Johnson St', 'address2': '', '...",15074746550,(507) 474-6550,1786.461266
2,w_G_5c6IxJ98-3WT01XLnw,signatures-restaurant-winona-3,Signatures Restaurant,https://s3-media2.fl.yelpcdn.com/bphoto/ryaRwx...,False,https://www.yelp.com/biz/signatures-restaurant...,36,"[{'alias': 'newamerican', 'title': 'American (...",3.5,"{'latitude': 44.007709875787, 'longitude': -91...",[],$$$,"{'address1': '22852 County Rd 17', 'address2':...",15074543767,(507) 454-3767,6653.677548


<strong>Note</strong>:  Not all of the contents will be converted into separate fields. For example, the contents of the categories, coordinates, and location fields have not yet been exploded into separate columns.
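As an illustration of what exploding a nested field can look like, the sketch below pulls readable values out of the categories and coordinates fields for a single business. The helper names `category_titles()` and `lat_lon()` are hypothetical, and the sample record is copied from the Culver's entry shown earlier.

```python
def category_titles(categories):
    """Turn a list of {'alias': ..., 'title': ...} dictionaries into one string."""
    return ", ".join(cat["title"] for cat in categories)

def lat_lon(coordinates):
    """Return (latitude, longitude) from a coordinates dictionary."""
    return (coordinates["latitude"], coordinates["longitude"])

business = {
    "name": "Culver's",
    "categories": [{"alias": "hotdogs", "title": "Fast Food"},
                   {"alias": "icecream", "title": "Ice Cream & Frozen Yogurt"},
                   {"alias": "burgers", "title": "Burgers"}],
    "coordinates": {"latitude": 44.04708839393416,
                    "longitude": -91.67642196126742},
}

print(category_titles(business["categories"]))
print(lat_lon(business["coordinates"]))
```

On the full data.frame, the same helper could be applied to a whole column, e.g. `WinonaYelp['categories'].apply(category_titles)`.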

<table width='100%' ><tr><td bgcolor='orange' align='center'><font size="+2">Aside: Google Sheet Style Output in Colab</font></td></tr></table>

The following extension can be used to preview data via a Google Sheet structure inside of Colab.

In [None]:
%load_ext google.colab.data_table

The following command can be used to unload this extension from this session of Colab.

In [None]:
%unload_ext google.colab.data_table

<table width='100%' ><tr><td bgcolor='orange'><font size="+2">&nbsp;</font></td></tr></table>

Looking at *all* rows in a window that acts more like a Google Sheet that includes Filter options, etc.

In [None]:
WinonaYelp

## Using dfply package for cleaning the data.frame

In [None]:
pip install dfply

Collecting dfply
  Downloading dfply-0.3.3-py3-none-any.whl (612 kB)

In [None]:
from dfply import *

The following actions will take place on this dataframe.


*   filter: filter on price == $, i.e. only keep cheapest restaurants 
*   arrange: sort the list by rating, and also by the number of reviews
*   filter: only keep restaurants that have 10 or more reviews
*   select: select a subset of fields



In [None]:
WinonaList = (
                 WinonaYelp
                 >> filter_by(X.price == '$')
                 >> arrange(desc(X.rating), desc(X.review_count))
                 >> filter_by(X.review_count >= 10)
                 >> select(X.name, X.image_url, X.review_count, X.rating, X.location)
             )

#Checking the dataframe
WinonaList

Unnamed: 0,name,image_url,review_count,rating,location
30,NorthEnd Pub & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/x5jzKR...,14,5.0,"{'address1': '214 N Main St', 'address2': '', ..."
31,The Root Note,https://s3-media2.fl.yelpcdn.com/bphoto/ahQdtU...,82,4.5,"{'address1': '115 4th St S', 'address2': '', '..."
47,Pickerman's Soup & Sandwiches,https://s3-media4.fl.yelpcdn.com/bphoto/lfQw31...,64,4.5,"{'address1': '327 Jay St', 'address2': '', 'ad..."
36,Taste of Thai,https://s3-media4.fl.yelpcdn.com/bphoto/IpG7kh...,42,4.5,"{'address1': '205 S Holmen Dr', 'address2': 'S..."
27,River Rats Bar and Grill,https://s3-media1.fl.yelpcdn.com/bphoto/FfTc_M...,35,4.5,"{'address1': '1311 La Crescent Pl', 'address2'..."
31,Kaddy's Kafe,https://s3-media3.fl.yelpcdn.com/bphoto/cJXl6r...,28,4.5,"{'address1': '236 E Main St', 'address2': '', ..."
19,Garden of Eatin',https://s3-media2.fl.yelpcdn.com/bphoto/rW68Cp...,24,4.5,"{'address1': '19847 E Gale Ave', 'address2': '..."
15,Beno's Deli,https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC...,22,4.5,"{'address1': '78 E 4th St', 'address2': '', 'a..."
16,Brewskie's Bar & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/llnczw...,21,4.5,"{'address1': '110 E Main St', 'address2': '', ..."
12,Barista's Coffee House,https://s3-media2.fl.yelpcdn.com/bphoto/P-MzJ7...,18,4.5,"{'address1': '110 N Grant St', 'address2': Non..."
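In case dfply is not available in your environment, the same four steps can be sketched with plain pandas methods. The small `sample` dataframe below is made-up data used only so the example runs on its own; the step order mirrors the pipeline above.

```python
import pandas as pd

# Made-up rows standing in for the Yelp data
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "price": ["$", "$$", "$", "$"],
    "rating": [4.5, 5.0, 4.5, 5.0],
    "review_count": [22, 40, 82, 3],
})

cheap_sorted = (
    sample[sample["price"] == "$"]                                       # filter on price
    .sort_values(["rating", "review_count"], ascending=[False, False])   # arrange
    .query("review_count >= 10")                                         # filter on reviews
    .loc[:, ["name", "review_count", "rating"]]                          # select fields
)

print(cheap_sorted)
```

Restaurant D has the highest rating but only 3 reviews, so it is dropped; C is listed before A because the two tie on rating and C has more reviews.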


The existing dataframe needs to be modified to include only locations in <strong>Winona, MN</strong>.  Furthermore, the display_address field will be used to create a connection with the Google Map API.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=1Tk9TYbPYU7bIUcLUiX0LdrrW-X2U16OG" width='75%' height='75%'></p>

The following json_normalize() function is used to separate the location field into separate columns.

In [None]:
WinonaList_NormalizeLocation = pd.json_normalize(WinonaList['location'])

<strong>Note</strong>:  The following snippet of code accomplishes the same thing -- pd.DataFrame(WinonaList['location'].tolist())

In [None]:
WinonaList_NormalizeLocation

Unnamed: 0,address1,address2,address3,city,zip_code,country,state,display_address
0,214 N Main St,,,Cochrane,54622,US,WI,"[214 N Main St, Cochrane, WI 54622]"
1,115 4th St S,,,La Crosse,54601,US,WI,"[115 4th St S, La Crosse, WI 54601]"
2,327 Jay St,,,La Crosse,54601,US,WI,"[327 Jay St, La Crosse, WI 54601]"
3,205 S Holmen Dr,Ste 106,,Holmen,54636,US,WI,"[205 S Holmen Dr, Ste 106, Holmen, WI 54636]"
4,1311 La Crescent Pl,,,French Island,54603,US,WI,"[1311 La Crescent Pl, French Island, WI 54603]"
5,236 E Main St,,,La Crescent,55947,US,MN,"[236 E Main St, La Crescent, MN 55947]"
6,19847 E Gale Ave,,,Galesville,54630,US,WI,"[19847 E Gale Ave, Galesville, WI 54630]"
7,78 E 4th St,,,Winona,55987,US,MN,"[78 E 4th St, Winona, MN 55987]"
8,110 E Main St,,,Utica,55979,US,MN,"[110 E Main St, Utica, MN 55979]"
9,110 N Grant St,,,Houston,55943,US,MN,"[110 N Grant St, Houston, MN 55943]"


Next, the contents of the WinonaList_NormalizeLocation dataframe will need to be joined to WinonaList dataframe from above (that contains all the other columns).

The inherent <strong>index</strong> will be used as the *key* for this simple join.  The index for the WinonaList_NormalizeLocation dataframe goes from 0 to 17.  The index in the WinonaList dataframe will be reset so that it also goes from 0 to 17; its current index is left over from before the filters were applied above.

In [None]:
WinonaList = WinonaList.reset_index(drop=True)

Check to make sure reindexing worked for the WinonaList dataframe.

In [None]:
WinonaList

Unnamed: 0,name,image_url,review_count,rating,location
0,NorthEnd Pub & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/x5jzKR...,14,5.0,"{'address1': '214 N Main St', 'address2': '', ..."
1,The Root Note,https://s3-media2.fl.yelpcdn.com/bphoto/ahQdtU...,82,4.5,"{'address1': '115 4th St S', 'address2': '', '..."
2,Pickerman's Soup & Sandwiches,https://s3-media4.fl.yelpcdn.com/bphoto/lfQw31...,64,4.5,"{'address1': '327 Jay St', 'address2': '', 'ad..."
3,Taste of Thai,https://s3-media4.fl.yelpcdn.com/bphoto/IpG7kh...,42,4.5,"{'address1': '205 S Holmen Dr', 'address2': 'S..."
4,River Rats Bar and Grill,https://s3-media1.fl.yelpcdn.com/bphoto/FfTc_M...,35,4.5,"{'address1': '1311 La Crescent Pl', 'address2'..."
5,Kaddy's Kafe,https://s3-media3.fl.yelpcdn.com/bphoto/cJXl6r...,28,4.5,"{'address1': '236 E Main St', 'address2': '', ..."
6,Garden of Eatin',https://s3-media2.fl.yelpcdn.com/bphoto/rW68Cp...,24,4.5,"{'address1': '19847 E Gale Ave', 'address2': '..."
7,Beno's Deli,https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC...,22,4.5,"{'address1': '78 E 4th St', 'address2': '', 'a..."
8,Brewskie's Bar & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/llnczw...,21,4.5,"{'address1': '110 E Main St', 'address2': '', ..."
9,Barista's Coffee House,https://s3-media2.fl.yelpcdn.com/bphoto/P-MzJ7...,18,4.5,"{'address1': '110 N Grant St', 'address2': Non..."


Complete the simple join.

In [None]:
WinonaList_WithLocation = WinonaList.join(WinonaList_NormalizeLocation)
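The reason reset_index() was needed before this join can be seen in a tiny example. The two dataframes below are made-up stand-ins: `left` keeps an old index (as WinonaList did after filtering), while `right` has a fresh index starting at 0.

```python
import pandas as pd

left = pd.DataFrame({"name": ["X", "Y"]}, index=[30, 47])    # leftover index from filtering
right = pd.DataFrame({"city": ["Cochrane", "La Crosse"]})    # fresh index 0, 1

# join() matches rows by index, so nothing lines up here
misaligned = left.join(right)

# After resetting the index, row 0 matches row 0, row 1 matches row 1
aligned = left.reset_index(drop=True).join(right)

print(misaligned)
print(aligned)
```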

Taking a look at the dataframe after the JOIN.

In [None]:
WinonaList_WithLocation

Unnamed: 0,name,image_url,review_count,rating,location,address1,address2,address3,city,zip_code,country,state,display_address
0,NorthEnd Pub & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/x5jzKR...,14,5.0,"{'address1': '214 N Main St', 'address2': '', ...",214 N Main St,,,Cochrane,54622,US,WI,"[214 N Main St, Cochrane, WI 54622]"
1,The Root Note,https://s3-media2.fl.yelpcdn.com/bphoto/ahQdtU...,82,4.5,"{'address1': '115 4th St S', 'address2': '', '...",115 4th St S,,,La Crosse,54601,US,WI,"[115 4th St S, La Crosse, WI 54601]"
2,Pickerman's Soup & Sandwiches,https://s3-media4.fl.yelpcdn.com/bphoto/lfQw31...,64,4.5,"{'address1': '327 Jay St', 'address2': '', 'ad...",327 Jay St,,,La Crosse,54601,US,WI,"[327 Jay St, La Crosse, WI 54601]"
3,Taste of Thai,https://s3-media4.fl.yelpcdn.com/bphoto/IpG7kh...,42,4.5,"{'address1': '205 S Holmen Dr', 'address2': 'S...",205 S Holmen Dr,Ste 106,,Holmen,54636,US,WI,"[205 S Holmen Dr, Ste 106, Holmen, WI 54636]"
4,River Rats Bar and Grill,https://s3-media1.fl.yelpcdn.com/bphoto/FfTc_M...,35,4.5,"{'address1': '1311 La Crescent Pl', 'address2'...",1311 La Crescent Pl,,,French Island,54603,US,WI,"[1311 La Crescent Pl, French Island, WI 54603]"
5,Kaddy's Kafe,https://s3-media3.fl.yelpcdn.com/bphoto/cJXl6r...,28,4.5,"{'address1': '236 E Main St', 'address2': '', ...",236 E Main St,,,La Crescent,55947,US,MN,"[236 E Main St, La Crescent, MN 55947]"
6,Garden of Eatin',https://s3-media2.fl.yelpcdn.com/bphoto/rW68Cp...,24,4.5,"{'address1': '19847 E Gale Ave', 'address2': '...",19847 E Gale Ave,,,Galesville,54630,US,WI,"[19847 E Gale Ave, Galesville, WI 54630]"
7,Beno's Deli,https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC...,22,4.5,"{'address1': '78 E 4th St', 'address2': '', 'a...",78 E 4th St,,,Winona,55987,US,MN,"[78 E 4th St, Winona, MN 55987]"
8,Brewskie's Bar & Grill,https://s3-media4.fl.yelpcdn.com/bphoto/llnczw...,21,4.5,"{'address1': '110 E Main St', 'address2': '', ...",110 E Main St,,,Utica,55979,US,MN,"[110 E Main St, Utica, MN 55979]"
9,Barista's Coffee House,https://s3-media2.fl.yelpcdn.com/bphoto/P-MzJ7...,18,4.5,"{'address1': '110 N Grant St', 'address2': Non...",110 N Grant St,,,Houston,55943,US,MN,"[110 N Grant St, Houston, MN 55943]"


Next, <strong>keep</strong> only restaurant locations in Winona.

In [None]:
WinonaList_WithLocation_OnlyWinona = (
                          WinonaList_WithLocation
                          >> filter_by(X.city == "Winona")
                          >> select(X.name, X.image_url, X.review_count, X.rating, X.display_address)
                       )

WinonaList_WithLocation_OnlyWinona = WinonaList_WithLocation_OnlyWinona.reset_index(drop=True)
WinonaList_WithLocation_OnlyWinona

Unnamed: 0,name,image_url,review_count,rating,display_address
0,Beno's Deli,https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC...,22,4.5,"[78 E 4th St, Winona, MN 55987]"
1,The Acoustic Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/bcDdRg...,77,4.0,"[77 Lafayette, Winona, MN 55987]"
2,Lakeview Drive Inn,https://s3-media3.fl.yelpcdn.com/bphoto/uDUhkd...,45,4.0,"[610 E Sarnia St, Winona, MN 55987]"
3,Rocco's Pub & Pizza,https://s3-media1.fl.yelpcdn.com/bphoto/pjaUxD...,12,4.0,"[5242 W 6th St, Winona, MN 55987]"
4,Bub's Brewing,https://s3-media1.fl.yelpcdn.com/bphoto/znNYZv...,66,3.5,"[65 E 4th St, Winona, MN 55987]"
5,Winona Sandwich Company,https://s3-media4.fl.yelpcdn.com/bphoto/ntWfgR...,39,3.5,"[619 Huff St, Winona, MN 55987]"
6,Great Hunan Restaurant,https://s3-media3.fl.yelpcdn.com/bphoto/q-CzK3...,28,3.5,"[111 W 3rd St, Winona, MN 55987]"
7,Culver's,https://s3-media2.fl.yelpcdn.com/bphoto/bcGL1o...,12,3.5,"[1441 Service Dr, Winona, MN 55987]"
8,Winona Family Restaurant,https://s3-media4.fl.yelpcdn.com/bphoto/lxYd9Z...,32,3.0,"[1611 Service Dr, Winona, MN 55987]"
9,Wellington's Pub & Grill,https://s3-media3.fl.yelpcdn.com/bphoto/_Ep3L8...,20,3.0,"[1429 W Service Dr, Winona, MN 55987]"


Recall, the goal is to create an HTML file that contains a properly sorted list of restaurants in Winona.  A few modifications are needed in order to create an HTML file.  For example, an *image tag* and *link tag* will be created for the **image_url** field and the **display_address** field that will be used to connect to the Google Maps API.

*  Image tag:  \<img src=https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg width="100px" height="100px"\>

*  Link tag: \<a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a\>


A new field, named **rank**, will be used to rank the restaurants from 1 on up.

In [None]:
WinonaList_WithLocation_OnlyWinona_PicMap = (
                          WinonaList_WithLocation_OnlyWinona
                          >> mutate(pic = 
                                          '<img src="' 
                                           + X.image_url 
                                           + '" width="100px" height="100px">'
                                    )
                          >> mutate(mapurl =
                                         '<a href="https://www.google.com/maps/place/'
                                          + X.display_address.str.join(sep=' ').str.replace(' ','+')
                                          + '" target="_blank">Map</a>'
                                    )
                          >> select(X.name, X.review_count, X.rating, X.pic, X.mapurl)

                       )
#Create a vector from 1 to n -- used for ranks
WinonaList_WithLocation_OnlyWinona_PicMap['rank'] = 1 + np.arange(len(WinonaList_WithLocation_OnlyWinona_PicMap))

#Move the selected column to left-most column
WinonaList_WithLocation_OnlyWinona_PicMap.insert(0,'rank',WinonaList_WithLocation_OnlyWinona_PicMap.pop('rank'))

print(WinonaList_WithLocation_OnlyWinona_PicMap.to_string(index=False))


 rank                     name review_count  rating                                                                                                           pic                                                                                                    mapurl
    1              Beno's Deli           22     4.5 <img src="https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg" width="100px" height="100px">          <a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a>
    2        The Acoustic Cafe           77     4.0 <img src="https://s3-media2.fl.yelpcdn.com/bphoto/bcDdRgDmRBrXjWyb3yK3eQ/o.jpg" width="100px" height="100px">         <a href="https://www.google.com/maps/place/77+Lafayette+Winona,+MN+55987" target="_blank">Map</a>
    3       Lakeview Drive Inn           45     4.0 <img src="https://s3-media3.fl.yelpcdn.com/bphoto/uDUhkdZ74iu_Vr-pnCZe6Q/o.jpg" width="100px" height="100px">      <a href="https://www.google.c
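Replacing spaces with + works for the addresses in this list, but other characters (for example & or #) would also need encoding in a URL. The sketch below shows one possible alternative using the standard library's quote_plus(); `map_link()` is a hypothetical helper, and the resulting link is assumed, not verified here, to behave the same as the original in Google Maps.

```python
from urllib.parse import quote_plus

def map_link(display_address):
    """Build a Map link tag from a display_address list such as ['78 E 4th St', 'Winona, MN 55987']."""
    query = quote_plus(" ".join(display_address))   # spaces become +, commas become %2C
    return ('<a href="https://www.google.com/maps/place/'
            + query
            + '" target="_blank">Map</a>')

print(map_link(["78 E 4th St", "Winona, MN 55987"]))
```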

## Creating the HTML File

To start, this dataframe will be converted into an HTML table using the to_html() method.

In [None]:
WinonaList_WithLocation_OnlyWinona_PicMap.to_html(index=False)

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th>rank</th>\n      <th>name</th>\n      <th>review_count</th>\n      <th>rating</th>\n      <th>pic</th>\n      <th>mapurl</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>1</td>\n      <td>Beno\'s Deli</td>\n      <td>22</td>\n      <td>4.5</td>\n      <td>&lt;img src="https://s3-media1.fl.yelpcdn.com/bphoto/snAzXC5xLgybMzV8NxGqyw/o.jpg" width="100px" height="100px"&gt;</td>\n      <td>&lt;a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank"&gt;Map&lt;/a&gt;</td>\n    </tr>\n    <tr>\n      <td>2</td>\n      <td>The Acoustic Cafe</td>\n      <td>77</td>\n      <td>4.0</td>\n      <td>&lt;img src="https://s3-media2.fl.yelpcdn.com/bphoto/bcDdRgDmRBrXjWyb3yK3eQ/o.jpg" width="100px" height="100px"&gt;</td>\n      <td>&lt;a href="https://www.google.com/maps/place/77+Lafayette+Winona,+MN+55987" target="_blank"&gt;Map&lt;/a&gt;</td>\n    </tr>\n    <tr>

Instead of printing the converted HTML table to the screen, let's write its contents to an HTML file.  The name of the HTML file is **WinonaYelpList.html**.  This file is being written to the sample_data/ directory.

In [None]:
# The with-block closes the file automatically, even if the write fails
with open("sample_data/WinonaYelpList.html", "w") as text_file:
    text_file.write(WinonaList_WithLocation_OnlyWinona_PicMap.to_html(index=False))

The data.frame to html table conversion done above created the contents of the html file; however, other tags are needed for the HTML file.  For example, the \<html\> & \</html\> and \<body\> & \</body\> tags are a required part of an html file.

<p align='center'><img src="https://drive.google.com/uc?export=view&id=12RH4q90vusTh8Oaf_jWe70fBulOgVN9E" width='50%' height='50%'></p>

Adding <strong>\<html\></strong> to the beginning of the file.

In [None]:
!sed -i '1i \<html>\' /content/sample_data/WinonaYelpList.html

Next, adding <strong>\<body\></strong> on the 2nd line.

In [None]:
!sed -i '2i \<body>\' /content/sample_data/WinonaYelpList.html

Next, the \</body\> and \</html\> tags need to be put at the end of the HTML file.

In [None]:
!sed -i '$ a </body>' /content/sample_data/WinonaYelpList.html
!sed -i '$ a </html>' /content/sample_data/WinonaYelpList.html
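The sed commands above depend on a Linux shell such as the one in Colab. As a hedged alternative, the same wrapping can be done in Python before the file is written; `wrap_html()` is a hypothetical helper and the table string is a placeholder.

```python
def wrap_html(table_html):
    """Surround an HTML table with the opening and closing html/body tags."""
    return "<html>\n<body>\n" + table_html + "\n</body>\n</html>\n"

page = wrap_html("<table><tr><td>example</td></tr></table>")
print(page)

# To save the result, the string could be written out in one step, e.g.:
# with open("sample_data/WinonaYelpList.html", "w") as f:
#     f.write(page)
```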

The HTML conversion done by Pandas does have some minor issues that will need to be cleaned up.


*   The < symbol was written as \&lt; 
*   The > symbol was written as \&gt;



Fixing the beginning of each image and link tag.

In [None]:
!sed -i 's/&lt;/</g' /content/sample_data/WinonaYelpList.html

Similarly, fix the end tags...

In [None]:
!sed -i 's/&gt;/>/g' /content/sample_data/WinonaYelpList.html
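The &lt; and &gt; codes appear because to_html() escapes special characters by default. A possible alternative to fixing the file afterwards is the escape=False argument of to_html(), sketched below on a one-row made-up dataframe. Keep in mind that with escaping turned off, any text in the data is treated as HTML.

```python
import pandas as pd

demo = pd.DataFrame({
    "name": ["Beno's Deli"],
    "mapurl": ['<a href="https://www.google.com/maps/place/78+E+4th+St+Winona,+MN+55987" target="_blank">Map</a>'],
})

escaped = demo.to_html(index=False)                  # default: < and > are escaped
unescaped = demo.to_html(index=False, escape=False)  # tags are written as-is

print("&lt;a href" in escaped)
print("<a href" in unescaped)
```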

Next, center the contents of the header row using the text-align argument.

In [None]:
!sed -i 's/text-align: right;/text-align: center;/' /content/sample_data/WinonaYelpList.html

Making the table headers slightly larger and removing the pic and mapurl names altogether.

In [None]:
!sed -i 's/rank/<font size="+1"\>Rank<\/font>/' /content/sample_data/WinonaYelpList.html
!sed -i 's/name/<font size="+1"\>Restaurant<br>Name<\/font>/' /content/sample_data/WinonaYelpList.html
!sed -i 's/review_count/<font size="+1"\># of <br>Reviews<\/font>/' /content/sample_data/WinonaYelpList.html
!sed -i 's/rating/<font size="+1"\>Rating<\/font>/' /content/sample_data/WinonaYelpList.html
!sed -i 's/pic/<font size="+1"\>\&nbsp;<\/font>/' /content/sample_data/WinonaYelpList.html
!sed -i 's/mapurl/<font size="+1"\>\&nbsp;<\/font>/' /content/sample_data/WinonaYelpList.html
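Each sed command above is applied to every line of the file, so a short pattern such as pic or name could also match text inside a data row (for instance, inside an image URL). One way to sidestep this, sketched below on made-up data, is to rename the columns in pandas before calling to_html(). The new header labels are illustrative choices.

```python
import pandas as pd

demo = pd.DataFrame({"rank": [1], "name": ["Beno's Deli"],
                     "review_count": [22], "rating": [4.5]})

pretty = demo.rename(columns={
    "rank": "Rank",
    "name": "Restaurant Name",
    "review_count": "# of Reviews",
    "rating": "Rating",
})

html = pretty.to_html(index=False)
print(html)
```

Only the header cells change; the values in each row are left untouched.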

Next, specify that the contents of the table be centered...

In [None]:
!sed -i 's/<tbody>/<tbody style="text-align: center;">/' /content/sample_data/WinonaYelpList.html

Taking a look at a final version of this file...

In [None]:
!cat /content/sample_data/WinonaYelpList.html

## The Final Product

The following is a picture of the final HTML file.  This has the restaurants truly sorted by Rating.  Filters were applied in getting this table; thus, not all locations in Winona are included.


<p align='center'><img src="https://drive.google.com/uc?export=view&id=1uc5709XLHynutOCqFW1r80JwbmYDMVZd" width='50%' height='50%'></p>



---



---
End of Document
