## Uber Exercise

Approach used in this code:
- Match each restaurant in the Uber list against the UK Food Rating list by restaurant name and post code.
- String matching on the address field is unreliable and yields many false negatives (restaurants that should match but are flagged as not matching).
- This is because some addresses use abbreviations, such as rd for Road or ln for Lane.

- Instead, the post code of each restaurant in the Uber list is compared with the post code of restaurants in the UK Food Rating list.
- Post code matching might yield more false positives but significantly fewer false negatives. A false positive can occur when two different restaurants share the same name and the same post code.
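
As a rough illustration of the trade-off above, the sketch below compares two spellings of one address and then compares post codes instead. The addresses, the post code, and the helper `same_postcode` are made up for this example, and the case and whitespace normalisation is an added assumption (the notebook itself compares with a plain `==`).

```python
# Hypothetical data: the same restaurant, written two ways.
uber_address = "12 Example Rd, London, N1 9AB"
fsa_address = "12 Example Road, London"
fsa_postcode = "N1 9AB"

# Exact string comparison fails because of the "Rd" / "Road" abbreviation.
addresses_equal = uber_address == fsa_address


def same_postcode(a, b):
    """Compare two post codes, ignoring case and whitespace (an assumption)."""
    def normalise(s):
        return "".join(s.split()).upper()
    return normalise(a) == normalise(b)


postcodes_equal = same_postcode("n1 9ab", fsa_postcode)
print(addresses_equal, postcodes_equal)
```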


In [None]:
import requests
import csv
import json
import re
from IPython.display import HTML, display

## Extract data from uber csv file

-  Store each row as a dictionary keyed by field name (e.g. "name", "address")
-  Extract the post code from the address with a regular expression
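
The extraction step can be tried in isolation on a few strings. This is a minimal sketch, not the notebook's exact pattern: the regular expression is a simplified one that only accepts a full post code (outward plus inward part), and the helper `extract_postcode` and the sample addresses are hypothetical.

```python
import re

# Simplified UK post code pattern (an assumption): 1-2 letters, a digit,
# an optional letter or digit, an optional space, then a digit and 2 letters.
POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]?\s?[0-9][A-Z]{2}\b")


def extract_postcode(address):
    """Return the first post code found in the address, or None."""
    match = POSTCODE_RE.search(address)
    return match.group(0) if match else None


# Made-up sample addresses.
for address in ["12 Example Rd, London, N1 9AB",
                "Unit 3, SW1A 1AA, London",
                "No post code here"]:
    print(address, "->", extract_postcode(address))
```

Returning `None` when nothing matches makes a missing post code explicit, so it can never be confused with the value from another row.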

In [None]:
# Outward code (e.g. "SW1A"), optionally followed by the inward code (e.g. "1AA")
pattern = r'[A-Z]{1,2}[0-9][A-Z0-9]?(\s[0-9][A-Z]{2})?'
file_path = 'ubereats.csv'
uber_restaurant_list = []

with open(file_path, newline='') as file:
    for row in csv.DictReader(file):
        uber_rest = dict(row)
        match = re.search(pattern, uber_rest['address'])
        # Store None when no post code is found, so that the value
        # from a previous row is never reused by mistake
        uber_rest["postal_code"] = match.group(0) if match else None
        uber_restaurant_list.append(uber_rest)


## Extract authority list from Food Rating API

-  Get the codes of all local authorities whose region is London
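
The filtering step can be checked offline against a small payload. The sketch below assumes the response has the nested shape that the notebook's code reads (`ArrayOfWebLocalAuthorityAPI` then `WebLocalAuthorityAPI`); the ID codes are invented for illustration, and `authority_ids_for_region` is a hypothetical helper.

```python
# Made-up payload shaped like the JSON the notebook reads; the ID codes are invented.
sample_payload = {
    "ArrayOfWebLocalAuthorityAPI": {
        "WebLocalAuthorityAPI": [
            {"Name": "Camden", "RegionName": "London", "LocalAuthorityIdCode": "001"},
            {"Name": "Leeds", "RegionName": "Yorkshire and Humberside", "LocalAuthorityIdCode": "002"},
            {"Name": "Hackney", "RegionName": "London", "LocalAuthorityIdCode": "003"},
        ]
    }
}


def authority_ids_for_region(payload, region):
    """Collect the ID codes of all authorities in the given region."""
    authorities = payload["ArrayOfWebLocalAuthorityAPI"]["WebLocalAuthorityAPI"]
    return [a["LocalAuthorityIdCode"] for a in authorities if a["RegionName"] == region]


print(authority_ids_for_region(sample_payload, "London"))
```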

In [None]:
BASE_PATH = "https://ratings.food.gov.uk/"

response = requests.get(BASE_PATH + "authorities/en-GB/json")
authority_list = (json.loads(response.text))
london_authority_id = []

for elem in authority_list['ArrayOfWebLocalAuthorityAPI']['WebLocalAuthorityAPI']:
    if elem['RegionName'] == "London":
        london_authority_id.append(elem['LocalAuthorityIdCode'])

## Extract restaurant within London from Food Rating API

- Get all establishments whose business type is one of the following (the full list is at https://ratings.food.gov.uk/business-types/xml):
- 1 = Restaurant/Cafe/Canteen
- 7843 = Pub/bar/nightclub
- 7844 = Takeaway/sandwich shop
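
One request is made per combination of local authority and business type. The sketch below only builds the URLs, without calling the API. The path layout is copied from the notebook's request; reading the trailing `1` and `5000` as page number and page size is an assumption, and `build_search_url` and the authority IDs are hypothetical.

```python
from itertools import product

BASE_PATH = "https://ratings.food.gov.uk/"


def build_search_url(business_type, authority_id, page=1, page_size=5000):
    """Build the enhanced-search URL for one business type and one authority.

    The meaning of `page` and `page_size` is assumed, not confirmed.
    """
    return (f"{BASE_PATH}enhanced-search/en-GB/^/^/ALPHA/"
            f"{business_type}/{authority_id}/{page}/{page_size}/json")


business_types = ["1", "7843", "7844"]
authority_ids = ["501", "502"]  # made-up IDs for illustration

# One URL per (authority, business type) pair.
urls = [build_search_url(bt, aid) for aid, bt in product(authority_ids, business_types)]
for url in urls:
    print(url)
```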

In [None]:
business_type_list = ['1','7843','7844']
restaurant_london_list =[]

def get_restaurant_in_region(authority_code_list, business_type_list):
    restaurant_in_region = []
    for authority_id in (authority_code_list):
        for business_type in (business_type_list):
            api_resp = requests.get(BASE_PATH+ "enhanced-search/en-GB/^/^/ALPHA/"+business_type+"/"+authority_id+"/1/5000/json")
            json_resp = (json.loads(api_resp.text))
            restaurant_in_region.extend(json_resp['FHRSEstablishment']['EstablishmentCollection']['EstablishmentDetail'])
    return restaurant_in_region

restaurant_london_list = get_restaurant_in_region (london_authority_id,business_type_list)

## Get all the restaurants that are not present on Uber
- Check whether the name of a restaurant in the Uber list is contained in the name of a restaurant from the Food Rating API, or vice versa.
- A simple substring check using the "in" operator is performed.
- If the names match, check whether the two restaurants have the same post code using the "==" operator.

Finally, exclude all restaurants that fulfil both conditions and keep only the restaurants that are not yet on the Uber list.
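
The two conditions can be sketched on a few made-up records. The restaurant names, the post codes, and the helper `is_on_uber` are hypothetical; the field names follow the ones used in this notebook.

```python
def is_on_uber(food_rest, uber_list):
    """True if any Uber entry matches on name (substring) and post code."""
    for uber_rest in uber_list:
        name_match = (food_rest["BusinessName"] in uber_rest["name"]
                      or uber_rest["name"] in food_rest["BusinessName"])
        if name_match and food_rest["PostCode"] == uber_rest["postal_code"]:
            return True
    return False


# Made-up sample data.
uber_sample = [{"name": "Example Pizza - Camden", "postal_code": "NW1 0AA"}]
food_sample = [
    {"BusinessName": "Example Pizza", "PostCode": "NW1 0AA"},  # name and post code match
    {"BusinessName": "Example Pizza", "PostCode": "E1 0AA"},   # same name, other post code
    {"BusinessName": "Sample Cafe", "PostCode": "NW1 0AA"},    # other name, same post code
]

# Keep only the establishments that are not yet on Uber.
not_on_uber = [r for r in food_sample if not is_on_uber(r, uber_sample)]
print([(r["BusinessName"], r["PostCode"]) for r in not_on_uber])
```

Each Food Rating establishment is tested once against the whole Uber list, so it appears in the result at most once.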

In [None]:
final_list = []

# Decide once per Food Rating establishment whether it is already on Uber,
# so that each establishment is added to final_list at most once
for food_rest in restaurant_london_list:
    on_uber = False
    for uber_rest in uber_restaurant_list:
        name_match = (food_rest['BusinessName'] in uber_rest['name']
                      or uber_rest['name'] in food_rest['BusinessName'])
        if name_match and food_rest['PostCode'] == uber_rest['postal_code']:
            on_uber = True
            break
    if not on_uber:
        final_list.append(food_rest)

## Print all the new restaurants with format Name, Address, Postcode

HTML Format
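
Business names can contain characters such as `&` or `<`, which would corrupt hand-built markup. As a minimal sketch, the table could be built with the standard library's `html.escape` applied to every cell; `rows_to_html_table` and the sample rows are hypothetical.

```python
import html


def rows_to_html_table(rows):
    """Render a list of rows as an HTML table, escaping every cell."""
    html_rows = []
    for row in rows:
        cells = "".join("<td>{}</td>".format(html.escape(str(cell))) for cell in row)
        html_rows.append("<tr>{}</tr>".format(cells))
    return "<table>{}</table>".format("".join(html_rows))


# Made-up rows for illustration.
sample_rows = [["Name", "Post Code"], ["Fish & Chips", "N1 9AB"]]
table_markup = rows_to_html_table(sample_rows)
print(table_markup)
```

In the notebook, the resulting string could be passed to `display(HTML(...))` as before.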

In [None]:
data = [['Name', 'Address', 'Postal Code']]

for rest in final_list:
    # Join the non-empty address lines into one comma-separated string
    address_details = [rest['AddressLine1'], rest['AddressLine2'],
                       rest['AddressLine3'], rest['AddressLine4']]
    row = [rest['BusinessName'],
           ','.join(filter(None, address_details)),
           rest['PostCode']]
    data.append(row)

display(HTML(
   '<table><tr>{}</tr></table>'.format(
       '</tr><tr>'.join(
           '<td>{}</td>'.format('</td><td>'.join(str(_) for _ in row)) for row in data)
       )
))