<div class="alert alert-block alert-success">
    
    
### <center> Yelp Business Reviews for</center>
### <center> Chicago Downtown Area in </center>
### <center> EdgeDB </center>


**Author: Atef Bader, PhD** 
<br>
**Created on: 12/10/2022** </br>
**Last Edit: 6/14/2024**
<br>
<br>


**Revised by Tom Miller, 2/19/2023**    
<br>
<br>


**Revised by Edward Arroyo, 5/18/2024** 
    
<br>
<br>
    
    
</div>



<div class="alert alert-block alert-warning">

    
  

## Assignment Objectives:

- Use the YelpAPI Package to collect Yelp data (Businesses and Reviews) for Chicago downtown area
- Use a Google Maps API tool to obtain addresses and community area names from latitude and longitude coordinates (reverse geocoding)
- Use EdgeDB to create the Graph-Relational Data Model for Yelp data (Businesses and Reviews); this involves using the EdgeDB Schema Definition Language
- Use EdgeDB with EdgeQL queries
</br>
</br>

</div>

<hr style="border:5px solid orange"> </hr>


<div class="alert alert-block alert-danger">
    
## Deliverables:

You are required to submit **three** files with the naming convention <font color = 'red'> <b>LastName_Assignment_8</b> </font>:  

1. **IPYNB Script**: This is your original notebook file ( <font color = 'red'> <b> LastName_Assignment_8.ipynb</b> </font>).  

2. **HTML Document**: This file must include all the source code you have written along with its output. Follow these steps to generate the HTML file: 

   - After completing your work in the Jupyter Notebook, go to the menu bar at the top. 

   - Click on File, then hover over Download as. 

   - Select HTML (.html) 

   - Save the file with the appropriate naming convention ( <font color = 'red'> <b>LastName_Assignment_8.html</b> </font>). 
   - **Ensure all code and outputs are properly displayed in the HTML document.**

3. **MP4 Video Recording:** A live demo recording lasting between 5 and 10 minutes.

**Note**: You are required to provide your code and its output immediately following each requirement for this assignment.
    
</div>

<div class="alert alert-block alert-danger">

### General Assignment Instructions:

- Set up your working directory on your personal computer.
- Download the assignment notebook: `Chicago-Yelp-Reviews.ipynb` to your working directory.

- Install EdgeDB on your personal development computer by following the instructions at: https://www.edgedb.com/docs/intro/quickstart.
- Create a Google Developer Account and obtain your `geocoder.ApiKey` for Google Maps.
- Navigate to your working directory from a console/terminal window.
- Execute EdgeDB commands from the console/terminal window to set up your EdgeDB database. Refer to the "EdgeDB Development & Build Instructions" section below for detailed steps.
- Run the provided Python script on your personal development computer to populate the EdgeDB database and perform analyses.

<br>

</div>

<hr style="border:5px solid orange"> </hr>


<div class="alert alert-block alert-danger">

### EdgeDB Development & Build Instructions: 
- From the assignment directory (containing `Chicago-Yelp-Reviews.ipynb`), execute the following commands to set up the EdgeDB database project and the database schema for the assignment:
    - To see where the EdgeDB system files are located, use:
        - `edgedb info`
    - Ensure that the following files are in your working directory:
        - `dbschema/default.esdl`
        - `edgedb.toml`
    - To initialize the database project, use:
        - `edgedb project init`
    - To verify that the project has been defined in the active working directory, use:
        - `edgedb project info`
    - To prepare for populating the database and running queries against it, use:
        - `edgedb migration create`
        - `edgedb migrate`
    - Note that there will now be a `dbschema/migrations` subdirectory containing a query file with the `.edgeql` extension, for example:
        - `00001.edgeql`
    - To check information about the database structure you have defined, use:
        - `edgedb list types`
    - To explore the unpopulated database structure with a graphical user interface (GUI) (and later to explore the populated EdgeDB database), use:
        - `edgedb ui`
    - If you need help with the EdgeDB Command Line Interface (CLI), use:
        - `edgedb help`

### Configure Instance Authentication via EdgeDB UI:
- After exploring the EdgeDB UI, open the REPL within the UI to execute configuration commands:
    - In the EdgeDB UI, find and open the "REPL" or "Console" to access the command interface.
    - Paste and execute the following EdgeQL command to configure instance authentication for passwordless access:
        ```edgeql
        configure instance insert Auth {
            # Human-oriented comment helps figuring out
            # what authentication methods have been setup
            # and makes it easier to identify them.
            comment := 'passwordless access',
            priority := 1,
            method := (insert Trust),
        };
        ```
    - This command sets up an authentication method allowing passwordless access, useful for development environments or specific secure contexts. Ensure this configuration aligns with your security and operational policies.
    

<br>

</div>

<hr style="border:5px solid orange"> </hr>



<div class="alert alert-info">


    

## Online Educational Resources and References

The following is a list of online educational resources and references for EdgeDB, graph databases, and the Yelp graph data model.
Some of the references are discussed in the context of the **Neo4j** graph database; however, please note that this assignment requires the use of **EdgeDB**.

- Graph Data Modeling Tips & Tricks
 - https://www.youtube.com/watch?v=eAbPgyouAE4
 
 
- Predicting Influence and Communities Using Graph Algorithms
 - https://www.youtube.com/watch?v=MTnozZ5Cy0E
 
        
- Exploring Yelp with Graph Algorithms (Neo4j Online Meetup #45)
 - https://www.youtube.com/watch?v=7f2Tdn94JhY&t=1105s
 
 
- Yelp Data Analysis [Part 1]
 - https://www.youtube.com/watch?v=3KX41CaJVpY


- Graph Algorithms: Practical Examples in Apache Spark & Neo4j
 - https://neo4j.com/press-releases/new-oreilly-graph-algorithms-book/
![image.png](attachment:image.png)


 
</br>

</div>


<hr style="border:5px solid orange"> </hr>


<div class="alert alert-info">



# EdgeDB

- Download __[EdgeDB](https://www.edgedb.com/install)__  to your laptop
- Getting Started with  __[EdgeDB](https://www.edgedb.com/docs/intro/index)__ 
- Quick Start with __[Quickstart](https://www.edgedb.com/docs/intro/quickstart#ref-quickstart)__ 
- Data modeling in EdgeDB __[Data Model](https://www.edgedb.com/showcase/data-modeling)__ 
- EdgeDB’s Schema Definition Language __[ESDL](https://www.edgedb.com/docs/datamodel/index)__ 


## The three major platforms are supported:
1. Windows
2. MacOS
3. Linux



## EdgeDB instance


- A quickstart tutorial will walk you through the entire process of creating a simple EdgeDB-powered application  __[Quick Start Tutorial](https://www.edgedb.com/docs/intro/quickstart)__ 

- Make sure to initialize your EdgeDB project by running the following command from your project directory in the terminal/command prompt:
    - `edgedb project init`

    

## EdgeDB Migration


- After you update `default.esdl` in your project's **dbschema** directory, execute the following commands from the terminal/command prompt, in the order listed:
    - `edgedb migration create`
    - `edgedb migrate`
    

## EdgeDB Instance Destroy


- Once you are done coding/testing, you can list the EdgeDB instance names and destroy them using the following commands:
    - `edgedb instance list`
    - `edgedb instance destroy -I "your_instance_name" --force`


## EdgeDB UI

- You can use the **EdgeDB UI**, the admin dashboard baked into every EdgeDB instance, when you are experimenting with your queries and requirements

- Type the following command from a terminal window to start the EdgeDB UI in a browser:
    - `edgedb ui`



## EdgeQL

- https://www.edgedb.com/docs/edgeql/index

</br>


</br>


</div>




<hr style="border:5px solid orange"> </hr>



<div class="alert alert-info">
    
    
## EdgeDB Python Driver (ensure that this has been installed in the Anaconda/Conda virtual environment that is currently active)

-    Install the package (https://www.edgedb.com/docs/clients/python/installation#edgedb-python-installation )
 - pip install edgedb
 

-    Examples:
    - Basic Usage :   ( https://www.edgedb.com/docs/clients/python/usage)
    - AsyncIO API: (https://www.edgedb.com/docs/clients/python/api/asyncio_client#edgedb-python-asyncio-api-reference)
    - Blocking API: (https://www.edgedb.com/docs/clients/python/api/blocking_client#edgedb-python-blocking-api-reference)

 
 
 </div>
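
A minimal connectivity check with the blocking client might look like the sketch below. It assumes the driver is installed and that the notebook runs from a directory where `edgedb project init` has already linked an instance. `check_connection` and `main` are illustrative names, not part of the driver.

```python
def check_connection(client):
    """Return True when the instance answers a trivial EdgeQL query."""
    # query_single returns exactly one result; `select 1` yields the integer 1.
    return client.query_single("select 1") == 1


def main():
    # Imported here so check_connection can be tried without the driver installed.
    import edgedb

    # With no arguments, create_client() resolves the instance linked to the
    # current project directory (assumes `edgedb project init` was run there).
    client = edgedb.create_client()
    try:
        print("connected:", check_connection(client))
    finally:
        client.close()
```

Calling `main()` from a notebook cell should report a successful connection once the instance is running.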

<hr style="border:5px solid orange"> </hr>

 
 

<div class="alert alert-info">

    

# Yelp

-  To get an idea about the Yelp Fusion API and Yelp's GraphQL API, see the __[FAQ](https://www.yelp.com/developers/faq)__  

- The Yelp Fusion API lets you get data about different **Businesses** and their **Reviews**: __[Get Started](https://www.yelp.com/developers/documentation/v3/get_started)__ 

- The graph data model is discussed in Chapter 7 of the book __[Graph Algorithms: Practical Examples in Apache Spark & Neo4j](https://neo4j.com/books/free-book-graph-algorithms-practical-examples-in-apache-spark-and-neo4j/)__.
    
- However, in this assignment we want to use **Community Areas** to track businesses and their reviews



</div>


![image-3.png](attachment:image-3.png)

<hr style="border:5px solid orange"> </hr>



<div class="alert alert-info">
    

### Yelp Datasets (Businesses and Reviews) are documented in detail at  the following URL:

- https://www.yelp.com/dataset/documentation/main

Please note that the dataset we use in this notebook was collected using the **Yelp Fusion API** rather than taken from Yelp's published datasets.

</div>

<hr style="border:5px solid orange"> </hr>


<div class="alert alert-info">
    

### Our Chicago Yelp Reviews Datasets (Businesses and Reviews) are composed of two JSON files:

- **chicago_yelp_reviews.json**: the businesses reviewed on Yelp for the Chicago Downtown Area
- **chicago_business_reviews.json**: up to 3 review excerpts collected from Yelp for every reviewed business in the Chicago Downtown Area

<br><br>
    
**Note:** the business identifier is the common field between the two datasets: `id` in the businesses file corresponds to `business_id` in the reviews file.
    
</div>

<hr style="border:5px solid orange"> </hr>



<div class="alert alert-info">
    
### Here is the official URL for Yelp Fusion API docs:

- https://docs.developer.yelp.com/docs/fusion-intro
- https://docs.developer.yelp.com/reference/v3_business_search
- https://docs.developer.yelp.com/reference/v3_business_reviews

</div>

<hr style="border:5px solid orange"> </hr>




<div class="alert alert-info">
  
    
### Use Yelp Fusion API to search for:
- Businesses:
    - This endpoint returns up to 1000 businesses with some basic information based on the provided search criteria. 
        - https://docs.developer.yelp.com/reference/v3_business_search


- Reviews:
    - This endpoint returns up to three review **excerpts** for a given business 
        - https://www.yelp.com/developers/documentation/v3/business_reviews 
 
 
- Location:
    - Please note the following statement, **"Businesses returned in the response may not be strictly within the specified location"**, highlighted in the Yelp API business_search documentation for **location**

    <br>
    <br>
    
![image.png](attachment:image.png)
    
<br>

</div>
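
Because the search endpoint caps the total number of businesses it returns, collecting them means paging through the results. The sketch below only computes the paging windows. The cap of 1000 comes from the text above, the page size of 50 is an assumption to check against the endpoint documentation, and `page_windows` is a hypothetical helper.

```python
def page_windows(max_results=1000, page_size=50):
    """Yield (offset, limit) pairs that together cover max_results items."""
    for offset in range(0, max_results, page_size):
        # Shorten the last window so offset + limit never exceeds the cap.
        yield offset, min(page_size, max_results - offset)


windows = list(page_windows())
print(len(windows), windows[0], windows[-1])
```

Each pair could then be passed as the `offset` and `limit` parameters of a business search request.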


<hr style="border:5px solid orange"> </hr>



<div class="alert alert-info">
    
# Yelp Reviews for Chicago Businesses 

We are interested in collecting data about Chicago businesses and their reviews from Yelp.

Currently, the **business_reviews** endpoint returns up to 3 review excerpts for every business, but the **business_search** endpoint gives the average star rating and the review count for every business.



# Why use Yelp data?

Imagine you are visiting Chicago for the first time and would like to get an idea of the popular restaurants that serve Chicago-style food:
- Chicago Style Hotdogs
- Chicago Style Pizza

    
</div>

<hr style="border:5px solid orange"> </hr>


   <div class="alert alert-block alert-warning">
    

# Nature of Collected Data and Dirty Data

When you construct your data model in EdgeDB and use the EdgeDB UI to explore the populated data, please note the following:
- Chain restaurants like McDonald's might appear multiple times as a business; however, the street address and latitude/longitude pairs MUST be different
- If you get TWO or MORE businesses at the SAME address, it is because one business is still open at that address, while the other businesses are currently closed and used to occupy the same address
- Not all latitude/longitude pairs are mapped correctly to their respective community areas, because the Google Geocoder reverse lookup sometimes cannot find the correct community area and defaults the community area name to "Chicago"

    
</div>
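
One way to surface the shared-address cases described above is to group business records by street address. This is a rough sketch, not the notebook's own method: it assumes records shaped like Yelp business documents (with `name`, `is_closed`, and `location.address1`), the sample records are made up, and `businesses_sharing_address` is a hypothetical helper.

```python
from collections import defaultdict


def businesses_sharing_address(businesses):
    """Map each normalized street address to the (name, is_closed) pairs listed there."""
    by_address = defaultdict(list)
    for business in businesses:
        address = (business.get("location") or {}).get("address1")
        if address:  # skip records with an empty or missing street address
            key = address.strip().lower()
            by_address[key].append((business["name"], business["is_closed"]))
    # Keep only addresses that more than one business record points to.
    return {addr: entries for addr, entries in by_address.items() if len(entries) > 1}


# Made-up records shaped like Yelp business documents.
sample = [
    {"name": "Cafe A", "is_closed": False, "location": {"address1": "1 N Example St"}},
    {"name": "Old Diner B", "is_closed": True, "location": {"address1": "1 N Example St "}},
    {"name": "Bakery C", "is_closed": False, "location": {"address1": ""}},
]
print(businesses_sharing_address(sample))
```

Inspecting the `is_closed` flags in each group is then a quick way to see which business at a shared address is still open.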

<hr style="border:5px solid orange"> </hr>


<div class="alert alert-info">
    
### Here are the URLs for Yelp Fusion API and YelpAPI package:


- Get started with the **Yelp Fusion API**
    - https://docs.developer.yelp.com/docs/fusion-intro 
 
 
- Extensive list of search **parameters** using Endpoint /businesses/search
    - https://www.yelp.com/developers/documentation/v3/business_search
 

- Example how to use **Yelp Fusion API**  Endpoint 
    - https://github.com/Yelp/yelp-fusion/blob/master/fusion/python/sample.py
 
 
- Example how to use **YelpAPI Package**  Endpoint 
    - https://github.com/lanl/yelpapi/
    - https://www.yelp.com/developers/documentation/v3/business_search
 
 
 
 
</br>

    
</div>

<hr style="border:5px solid orange"> </hr>

</br> 



<div class="alert alert-info">
    
# Chicago Community Areas

We are interested in the community areas of downtown Chicago:

- Chicago Loop
- West Town
- Near North Side Chicago
- Near South Side Chicago
- Near West Side Chicago


You can see the official list for the Boundaries of different Community Areas at the following URL:

- https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6

</div>
    
<hr style="border:5px solid orange"> </hr>



![image.png](attachment:image.png)


<div class="alert alert-block alert-warning">
    
#### Google Developer Account

- You need a Google Developer account and a Geocoder API key in order to do a reverse lookup on latitude/longitude to cross-reference Address/Zip-Code/Community Area
- Create a Google Developer Account and get your geocoder.ApiKey from the following URL:
https://developers.google.com/maps/documentation/geocoding/get-api-key
- Add your geocoder.ApiKey = "ADD-YOUR-API-KEY-HERE" 
- Use the latitude and longitude in geocoder.GeocodingReverse to find the zip-code 
- Documentation about the Geocoder is here: (https://geocoder.readthedocs.io/providers/Google.html#geocoding)


    
</br>


</div>


<hr style="border:5px solid orange"> </hr>


In [14]:
from pprint import pprint
import datetime
import edgedb
import time
import warnings
warnings.filterwarnings('ignore')

### Business Categories
Yelp recognizes certain business categories and indexes reviews accordingly.

Visit the following URL to see the list:
- https://docs.developer.yelp.com/docs/resources-categories
- https://docs.developer.yelp.com/reference/v3_all_categories 

For **Chicago Businesses**, we will collect businesses in the following categories
- Restaurants
- Entertainment
- Nightlife


In [15]:
import pandas as pd
import json

df__business_reviews = pd.DataFrame()

list__business_reviews_documents = []

## Chicago Downtown Neighborhoods and Zip-Codes


- **The zip codes for the Chicago downtown area and its neighborhoods are shown below**
- **Visit the following website for the complete list of Chicago zip codes by neighborhood**

 (https://www.seechicagorealestate.com/chicago-zip-codes-by-neighborhood.php)



![image.png](attachment:image.png)

<hr style="border:5px solid orange"> </hr>

<BR>
<BR>
        
- **The following URL has the complete  list of the 77 Community Areas for the City of Chicago**

 ( https://en.wikipedia.org/wiki/Community_areas_in_Chicago )
 
<BR>
<BR>
            
 
![image.png](attachment:image.png)
    
 

<hr style="border:5px solid orange"> </hr>




<hr style="border:30px solid brown "> </hr>

# Dataset #1 to Build the Database

<hr style="border:2px solid brown "> </hr>

### Read the provided chicago_yelp_reviews.json dataset and load into a dataframe object
### The dataset has the businesses reviewed on Yelp for  the Chicago Downtown area

<hr style="border:2px solid brown "> </hr>


In [16]:
# Path to the JSON Lines file (one JSON document per line)
file_path = 'chicago_yelp_reviews.json'

# Parse each line as JSON and collect the documents
with open(file_path, 'r') as file:
    for line in file:
        list__business_reviews_documents.append(json.loads(line))

# Convert the list of dictionaries to a DataFrame
df = pd.DataFrame(list__business_reviews_documents)

# Flatten the nested 'source' dictionaries into dotted columns (e.g. coordinates.latitude)
source_df = pd.json_normalize(df['source'])

# Replace the raw 'id' and 'source' columns with the flattened business fields
df_businesses = df.drop(columns=['id', 'source']).join(source_df)


In [17]:
# Display the business info DataFrame
df_businesses.tail()

| | id | alias | name | image_url | is_closed | url | review_count | categories | rating | transactions | ... | coordinates.latitude | coordinates.longitude | location.address1 | location.address2 | location.address3 | location.city | location.zip_code | location.country | location.state | location.display_address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | mhfLifyOVfK8VbUwMNH7TQ | ketel-one-club-chicago | Ketel One Club | https://s3-media4.fl.yelpcdn.com/bphoto/9Jx3NA... | False | https://www.yelp.com/biz/ketel-one-club-chicag... | 16 | [{'alias': 'tradamerican', 'title': 'American'}] | 3.0 | [] | ... | 41.881195 | -87.674409 | 1901 W Madison St | | The United Center | Chicago | 60612 | US | IL | [1901 W Madison St, The United Center, Chicago... |
| 1997 | qZsUJ3WRVbSkM-lWO5LVoA | shehed-bakery-chicago-2 | Shehed Bakery | https://s3-media1.fl.yelpcdn.com/bphoto/inY_YC... | False | https://www.yelp.com/biz/shehed-bakery-chicago... | 1 | [{'alias': 'bakeries', 'title': 'Bakeries'}, {... | 4.0 | [] | ... | 41.883983 | -87.705639 | 135 N Kedzie Ave | | | Chicago | 60612 | US | IL | [135 N Kedzie Ave, Chicago, IL 60612] |
| 1998 | j5wLsTnvuD1voV9W0Dl32Q | sbarro-chicago-15 | Sbarro | https://s3-media2.fl.yelpcdn.com/bphoto/6KPDZG... | False | https://www.yelp.com/biz/sbarro-chicago-15?adj... | 1 | [{'alias': 'pizza', 'title': 'Pizza'}] | 4.0 | [delivery] | ... | 41.871316 | -87.669862 | 1717 W Polk St | | | Chicago | 60612 | US | IL | [1717 W Polk St, Chicago, IL 60612] |
| 1999 | I37aZuiMpvQvQDJFkzZxnQ | tempesta-chicago-2 | Tempesta | https://s3-media4.fl.yelpcdn.com/bphoto/dXaEHU... | False | https://www.yelp.com/biz/tempesta-chicago-2?ad... | 1 | [{'alias': 'italian', 'title': 'Italian'}] | 4.0 | [delivery, pickup] | ... | 41.89117 | -87.661817 | 433 W Van Buren St | | | Chicago | 60607 | US | IL | [433 W Van Buren St, Chicago, IL 60607] |
| 2000 | wZrd1D_bIHpe1wM-JvbznQ | hawkeyes-bar-and-grill-chicago | Hawkeye's Bar & Grill | https://s3-media3.fl.yelpcdn.com/bphoto/Ygph77... | False | https://www.yelp.com/biz/hawkeyes-bar-and-gril... | 368 | [{'alias': 'sportsbars', 'title': 'Sports Bars... | 3.0 | [delivery, pickup] | ... | 41.869484 | -87.663926 | 1458 W Taylor St | | | Chicago | 60607 | US | IL | [1458 W Taylor St, Chicago, IL 60607] |


In [18]:
df_businesses.describe()

| | review_count | rating | distance | coordinates.latitude | coordinates.longitude |
|---|---|---|---|---|---|
| count | 2001.0 | 2001.0 | 2001.0 | 2001.0 | 2001.0 |
| mean | 215.496252 | 3.927886 | 1203.364428 | 41.880323 | -87.640107 |
| std | 529.505377 | 0.696399 | 797.907298 | 0.020229 | 0.016972 |
| min | 1.0 | 1.0 | 19.911342 | 41.830179 | -87.706481 |
| 25% | 12.0 | 3.5 | 612.860415 | 41.86908 | -87.64883 |
| 50% | 56.0 | 4.0 | 1014.525707 | 41.8811 | -87.63441 |
| 75% | 217.0 | 4.4 | 1761.269317 | 41.88654 | -87.628345 |
| max | 10096.0 | 5.0 | 5453.081964 | 41.92733 | -87.606944 |


## Chicago-Style Food

Chicago is renowned for its **Chicago-style deep-dish pizza**, and you can learn more about the origins of this term by reading the Chicago Tribune article available at [this link](https://www.chicagotribune.com/news/ct-xpm-2009-02-18-0902180055-story.html). However, Chicago's culinary fame doesn't stop there; the city is also known for its Chicago-style hot dogs, which you can read about on [Wikipedia](https://en.wikipedia.org/wiki/Chicago-style_hot_dog).

Visitors to Chicago often seek to experience these two iconic dishes that bear the city's culinary signature:
- Pizza
- Hot Dogs

For illustration purposes, let's consider the following two famous restaurants in downtown Chicago:
- Portillo's - Known for Chicago-Style Hot Dogs
- Giordano's - Renowned for Chicago-Style Pizza

**Note 1:**
Some business names may appear multiple times, which could lead to the mistaken belief that the dataset contains redundant entries. However, this is not the case, as a restaurant chain might use the same business name for different locations, a detail that is captured under the alias field.

Here is an example for Potbelly Sandwich Shop:

- 'alias': 'potbelly-sandwich-shop-chicago-25'
- 'name': 'Potbelly Sandwich Shop'

- 'alias': 'potbelly-sandwich-shop-chicago-10'
- 'name': 'Potbelly Sandwich Shop'
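
The point of Note 1 can be checked with a small count: names may repeat, while aliases should not. The sketch below uses only the two Potbelly records quoted above; the variable names are illustrative.

```python
from collections import Counter

# The two example records from Note 1.
records = [
    {"alias": "potbelly-sandwich-shop-chicago-25", "name": "Potbelly Sandwich Shop"},
    {"alias": "potbelly-sandwich-shop-chicago-10", "name": "Potbelly Sandwich Shop"},
]

name_counts = Counter(record["name"] for record in records)
alias_counts = Counter(record["alias"] for record in records)

# Names shared by several locations, and aliases that occur more than once.
repeated_names = [name for name, count in name_counts.items() if count > 1]
repeated_aliases = [alias for alias, count in alias_counts.items() if count > 1]

print(repeated_names)    # the chain name appears twice
print(repeated_aliases)  # empty: each location has its own alias
```
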

**Note 2:**
Yelp reviews may refer to business names with or without an **s** at the end. For example, you might see reviews mentioning the business names either as "Portillo's" or "Portillo" and "Giordano's" or "Giordano".
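
When searching review text for a business, the spelling variation in Note 2 can be absorbed by a tolerant pattern. The following is a sketch under assumptions: `name_pattern` is a hypothetical helper, and the review excerpts are made up for illustration.

```python
import re


def name_pattern(name):
    """Compile a case-insensitive pattern for a name with or without a trailing s or 's."""
    # Strip an existing ending so "Portillo's" and "Portillo" give the same stem.
    stem = re.sub(r"['’]?s$", "", name.strip(), flags=re.IGNORECASE)
    # Allow an optional 's or s after the stem, bounded so longer words do not match.
    return re.compile(r"\b" + re.escape(stem) + r"(?:['’]?s)?\b", re.IGNORECASE)


pattern = name_pattern("Portillo's")
excerpts = [
    "Portillo's never disappoints.",
    "Stopped by Portillo for a hot dog.",
    "We went somewhere else instead.",
]
print([bool(pattern.search(text)) for text in excerpts])
```

Names that naturally end in "s" (for example Starbucks) would also match without the final letter, so the helper suits possessive-style names best.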

In [19]:
df_businesses.name.value_counts()

name
Starbucks                42
Dunkin'                  27
Subway                   17
7-Eleven                 15
Jimmy John's             14
                         ..
Momentum Coffee           1
Veggie House              1
Daifuku Ramen             1
3 Little Pigs Chi         1
Hawkeye's Bar & Grill     1
Name: count, Length: 1729, dtype: int64

In [20]:
print(len(list__business_reviews_documents))
pprint(list__business_reviews_documents)

2001
[{'id': 'lxE3OEbLWnGP1pUrNzlgEw',
  'source': {'alias': 'maggie-roeder-cakes-chicago',
             'categories': [{'alias': 'desserts', 'title': 'Desserts'}],
             'coordinates': {'latitude': 41.88626, 'longitude': -87.62231},
             'display_phone': '',
             'distance': 324.6541824855661,
             'id': 'lxE3OEbLWnGP1pUrNzlgEw',
             'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/kCFZvtk9jrqmj6DYqAcKaA/o.jpg',
             'is_closed': False,
             'location': {'address1': '',
                          'address2': '',
                          'address3': '',
                          'city': 'Chicago',
                          'country': 'US',
                          'display_address': ['Chicago, IL 60601'],
                          'state': 'IL',
                          'zip_code': '60601'},
             'name': 'Maggie Roeder Cakes',
             'phone': '',
             'price': '$$',
             'rating': 5.0,
        

                                              'Chicago, IL 60606'],
                          'state': 'IL',
                          'zip_code': '60606'},
             'name': 'Bombay Eats',
             'phone': '+13127399727',
             'price': '$',
             'rating': 3.9,
             'review_count': 473,
             'transactions': ['delivery', 'pickup'],
             'url': 'https://www.yelp.com/biz/bombay-eats-chicago-18?adjust_creative=j_3xxHM_fKdWcYKUO806vA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=j_3xxHM_fKdWcYKUO806vA'}},
 {'id': '8oOZU9ERRyiiz3AP78d1Gw',
  'source': {'alias': 'roanoke-restaurant-chicago-3',
             'categories': [{'alias': 'cocktailbars', 'title': 'Cocktail Bars'},
                            {'alias': 'newamerican', 'title': 'New American'},
                            {'alias': 'breakfast_brunch',
                             'title': 'Breakfast & Brunch'}],
             'coordinates': {'latitude': 41.8818, 'lon

                          'address3': None,
                          'city': 'Chicago',
                          'country': 'US',
                          'display_address': ['2119 S Halsted St',
                                              'Chicago, IL 60608'],
                          'state': 'IL',
                          'zip_code': '60608'},
             'name': 'Pleasant House Pub',
             'phone': '+17735237437',
             'price': '$$',
             'rating': 4.4,
             'review_count': 942,
             'transactions': ['delivery', 'pickup'],
             'url': 'https://www.yelp.com/biz/pleasant-house-pub-chicago-3?adjust_creative=j_3xxHM_fKdWcYKUO806vA&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=j_3xxHM_fKdWcYKUO806vA'}},
 {'id': 'incGwtfg6rwT00w9WAqE0A',
  'source': {'alias': 'apolonia-chicago',
             'categories': [{'alias': 'modern_european',
                             'title': 'Modern European'},
                  

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
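
The rate-limit message above appears because the cell prints all 2001 documents at once. A lighter way to inspect the data is to render only the first few documents and cap the nesting depth. The sketch below assumes a list of nested dictionaries like the one loaded earlier; `preview` and the abbreviated sample documents are illustrative.

```python
from pprint import pformat


def preview(documents, n=2, depth=2, max_chars=2000):
    """Return a short rendering of the first n documents, truncated to max_chars."""
    # depth limits how many nesting levels are printed; deeper levels show as {...}.
    text = pformat(documents[:n], depth=depth)
    return text if len(text) <= max_chars else text[:max_chars] + "\n... (truncated)"


# Abbreviated stand-ins for the business documents.
docs = [
    {"id": "lxE3OEbLWnGP1pUrNzlgEw",
     "source": {"name": "Maggie Roeder Cakes", "location": {"zip_code": "60601"}}},
    {"id": "8oOZU9ERRyiiz3AP78d1Gw",
     "source": {"alias": "roanoke-restaurant-chicago-3"}},
]
print(preview(docs, n=1, depth=3))
```

In the notebook, the same helper could be applied to `list__business_reviews_documents` in place of the full `pprint` call.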





<hr style="border:30px solid brown "> </hr>
<hr style="border:2px solid brown "> </hr>

# Dataset #2 to Build the Database

<hr style="border:2px solid brown "> </hr>


### Read the provided chicago_business_reviews.json dataset and load into a dataframe object
### The dataset has up to 3 review excerpts collected from Yelp for every business reviewed in the Chicago Downtown area


<hr style="border:2px solid brown "> </hr>


In [21]:
# Path to the business reviews JSON file
business_reviews_json_file_path = 'chicago_business_reviews.json'

# Read the business reviews from JSON file into a DataFrame
df__business_reviews = pd.read_json(business_reviews_json_file_path , orient='records', lines=True)


In [22]:
df__business_reviews.tail()

| | id | url | text | rating | time_created | user | business_id | business_name |
|---|---|---|---|---|---|---|---|---|
| 5666 | ZgxSQmsTNfupRNzVsODVDg | https://www.yelp.com/biz/ketel-one-club-chicag... | Great service!!!!!! Amazingly kind staff, gre... | 5 | 2023-04-08 06:41:33 | {'id': '4fqk_SfYUfs8jghF8Qn29g', 'profile_url'... | mhfLifyOVfK8VbUwMNH7TQ | Ketel One Club |
| 5667 | 1hSNJYRqPpBpuXwjGOJQaA | https://www.yelp.com/biz/ketel-one-club-chicag... | Sat there for 30 minutes couldn't even get a w... | 1 | 2023-02-02 20:01:48 | {'id': 'y-Gxjq9v0fh81yWweEYkIA', 'profile_url'... | mhfLifyOVfK8VbUwMNH7TQ | Ketel One Club |
| 5668 | UvQxUOCYAW3WCx64yHh8qw | https://www.yelp.com/biz/shehed-bakery-chicago... | I stumbled upon this company at an artisan mar... | 4 | 2023-12-03 19:41:12 | {'id': 's-yGhMIJTcW39FTTzc2_Eg', 'profile_url'... | qZsUJ3WRVbSkM-lWO5LVoA | Shehed Bakery |
| 5669 | oNW8CABLe8jcNLaWOk-0QQ | https://www.yelp.com/biz/sbarro-chicago-15?adj... | Quick Tip! Some sbarros sell breakfast. Typi... | 4 | 2023-06-26 06:19:18 | {'id': 'nJ6c6Tcg7e5vpOdQa8mdqQ', 'profile_url'... | j5wLsTnvuD1voV9W0Dl32Q | Sbarro |
| 5670 | lpLyqXwKXCaaR6R8OVcVcw | https://www.yelp.com/biz/tempesta-chicago-2?ad... | Visited Tempesta today for a quick lunch takeo... | 4 | 2022-06-21 10:10:46 | {'id': '488KrW58ryVduy0Me8DRxg', 'profile_url'... | I37aZuiMpvQvQDJFkzZxnQ | Tempesta |
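
The `business_id` column is what ties each review back to a business record (whose identifier is stored as `id`). As a rough sketch of that link, the code below groups reviews by business using plain dictionaries. The sample rows are abbreviated from the outputs shown in this notebook, and both helper names are hypothetical.

```python
from collections import defaultdict


def reviews_by_business(reviews):
    """Group review records by their business_id."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[review["business_id"]].append(review)
    return dict(grouped)


def attach_reviews(businesses, reviews):
    """Pair each business id with its name and its list of reviews (possibly empty)."""
    grouped = reviews_by_business(reviews)
    return {
        b["id"]: {"name": b["name"], "reviews": grouped.get(b["id"], [])}
        for b in businesses
    }


# Abbreviated rows from the two datasets.
businesses = [
    {"id": "mhfLifyOVfK8VbUwMNH7TQ", "name": "Ketel One Club"},
    {"id": "qZsUJ3WRVbSkM-lWO5LVoA", "name": "Shehed Bakery"},
]
reviews = [
    {"id": "ZgxSQmsTNfupRNzVsODVDg", "rating": 5, "business_id": "mhfLifyOVfK8VbUwMNH7TQ"},
    {"id": "1hSNJYRqPpBpuXwjGOJQaA", "rating": 1, "business_id": "mhfLifyOVfK8VbUwMNH7TQ"},
    {"id": "UvQxUOCYAW3WCx64yHh8qw", "rating": 4, "business_id": "qZsUJ3WRVbSkM-lWO5LVoA"},
]

linked = attach_reviews(businesses, reviews)
for business_id, entry in linked.items():
    print(entry["name"], [r["rating"] for r in entry["reviews"]])
```

The same pairing is what the database links between business and review objects need to express.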



<div class="alert alert-block alert-warning">

#### Geocoder API Key -

- To obtain the complete address, including the community area name, you need to use the Geocoder API for reverse lookup based on latitude and longitude. This means using latitude and longitude to cross-reference the Address, Zip Code, and Community Area.
- A Google Developer account is required to use the Geocoder API key for reverse lookup on latitude and longitude.
- Create a Google Developer Account and obtain your geocoder.ApiKey from the following URL: https://developers.google.com/maps/documentation/geocoding/get-api-key
- Add your geocoder.ApiKey = "ADD-YOUR-API-KEY-HERE".
- Use the latitude and longitude in `geocoder.GeocodingReverse` to find the zip code.
- Documentation about the Geocoder can be found here: https://geocoder.readthedocs.io/providers/Google.html#geocoding

#### Important Warning:
- Please be aware that while Google offers a free tier for its Geocoder API, charges may apply if you exceed the free usage limit. Google requires credit card information even for the free trial. Ensure you monitor your usage to avoid unexpected charges.

<br>

</div>

<hr style="border:5px solid orange"> </hr>


## Installing the Geocoder Package

The cell below installs the `geocoder` package from inside the Jupyter Notebook. It uses the `%pip` magic rather than `!pip`, because `%pip` installs into the Python environment of the running kernel, whereas `!pip` runs whichever `pip` is first on the shell's path.



In [23]:
%pip install geocoder




<div class="alert alert-block alert-danger">
    
# IMPORTANT Note

### Community Area Name vs. Neighborhood Name: 
- The City of Chicago acknowledges the presence of neighborhood names but does NOT use them for official business
- The City of Chicago uses Community Area names and numbers for official business
- The Google Geocoder uses the `neighborhood` type to represent the Community Area name

<br>
    
See below an example
    
</div>


![image.png](attachment:image.png)


<hr style="border:5px solid orange"> </hr>

<div class="alert alert-block alert-warning">
    
#### Use the EdgeDB Graphical User Interface (GUI) to Trace Your Progress -

Having initialized an EdgeDB instance and project, we should be able to trace our progress in populating that database. A good way to do this is to open the EdgeDB GUI and refer to it as we do our work with this Python script, gathering restaurant review data from Yelp and geocoding street addresses with the help of Google. 
    
From the working project directory enter this command:
    - edgedb ui

Before we populate the database, we will see that there are 10 object types but a count of zero objects. The schema has been defined, but there are no data in the database.
    
</br>


</div>


<hr style="border:5px solid orange"> </hr>

In [24]:
# Sample code snippet showing how to do a Geocoder reverse lookup

from pprint import pprint

import geocoder
import requests

geocoder.ApiKey = "YOUR-GEOCODE-API-KEY-HERE"

# Sanity-test your API key for Giordano's Pizza on Rush Street in downtown;
# it should show you the Chicago Loop neighborhood, something like this:
# " .... {'long_name': 'Chicago Loop', 'short_name': 'Chicago Loop', 'types': ['neighborhood', 'political']}, .... "

latitude = 41.8840343
longitude = -87.628081

# Let requests build and URL-encode the query string
url = 'https://maps.googleapis.com/maps/api/geocode/json'
params = {'latlng': f'{latitude},{longitude}', 'key': geocoder.ApiKey}
response = requests.get(url, params=params, timeout=10).json()

print(response)
for item in response['results']:
    pprint(item)
    

{'plus_code': {'compound_code': 'V9MC+JQ7 Chicago, IL, USA', 'global_code': '86HJV9MC+JQ7'}, 'results': [{'address_components': [{'long_name': 'Washington', 'short_name': 'Washington', 'types': ['establishment', 'point_of_interest', 'subway_station', 'transit_station']}, {'long_name': 'Chicago Loop', 'short_name': 'Chicago Loop', 'types': ['neighborhood', 'political']}, {'long_name': 'Chicago', 'short_name': 'Chicago', 'types': ['locality', 'political']}, {'long_name': 'Cook County', 'short_name': 'Cook County', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Illinois', 'short_name': 'IL', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'United States', 'short_name': 'US', 'types': ['country', 'political']}, {'long_name': '60602', 'short_name': '60602', 'types': ['postal_code']}], 'entrances': [], 'formatted_address': 'Washington, Chicago, IL 60602, USA', 'geometry': {'location': {'lat': 41.88372690000001, 'lng': -87.6292991}, 'location_type
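
The response printed above nests the neighborhood name inside `address_components`. As a minimal sketch, the lookup can be wrapped in a small function. The name `find_neighborhood` and the trimmed `sample_response` are illustrative assumptions, not part of the assignment code, and unlike the database-building loop later in this notebook, this version scans every result rather than only the first one.

```python
def find_neighborhood(geocode_response):
    """Return the long_name of the first 'neighborhood' component, or None."""
    for result in geocode_response.get("results", []):
        for component in result.get("address_components", []):
            if "neighborhood" in component.get("types", []):
                return component["long_name"]
    return None


# Trimmed copy of the response printed above (only the fields used here).
sample_response = {
    "results": [
        {
            "address_components": [
                {"long_name": "Washington", "short_name": "Washington",
                 "types": ["establishment", "point_of_interest",
                           "subway_station", "transit_station"]},
                {"long_name": "Chicago Loop", "short_name": "Chicago Loop",
                 "types": ["neighborhood", "political"]},
                {"long_name": "Chicago", "short_name": "Chicago",
                 "types": ["locality", "political"]},
            ]
        }
    ]
}

print(find_neighborhood(sample_response))  # Chicago Loop
```

Returning `None` when no neighborhood is present lets the caller choose its own fallback value.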

### Use EdgeDB to create the object graph relational database for Yelp Reviews for Chicago Businesses  

You need to store the **Businesses** and their **Reviews** according to the data model listed above. We do not have access to a Yelp endpoint for **users**, but we do have Yelp endpoints for **Businesses** and their **Reviews**.

Here is the link for the API documentation:
- https://docs.developer.yelp.com/reference/v3_business_search
    
    

<br><br><br>

<hr style="border:30px solid coral "> </hr>
<hr style="border:2px solid coral "> </hr>


# Requirements Specification:

<hr style="border:2px solid coral "> </hr>


In [25]:
import edgedb

<div class="alert alert-block alert-danger">
    
### Requirement 1:
- Create a Graph-Relational Data Model for Yelp Businesses/Reviews using **EdgeDB**.
- Insert the YELP REVIEW objects into the EdgeDB database.

Consider the following code snippet for your schema:

```edgedb
module default {
    type Country {
        property name -> str;
        multi link has_states := .<in_country[is State];
        constraint exclusive on (.name);
    }

    type State {
        property name -> str;
        link in_country -> Country;
        multi link has_cities := .<in_state[is City];
        constraint exclusive on ((.name, .in_country));
    }

    type City {
        property name -> str;
        link in_state -> State;
        multi link has_communityAreas := .<in_city[is CommunityArea];
        multi link has_zipCodes := .<in_city[is ZipCode];
        constraint exclusive on ((.name, .in_state));
    }

    type CommunityArea {
        property name -> str;
        link in_city -> City;
        link has_zipcodes := .<in_communityArea[is ZipCode];
        multi link has_businesses := .<in_communityArea[is Business];
        constraint exclusive on ((.name, .in_city));
    }

    type ZipCode {
        property digits -> str;
        link in_city -> City;
        multi link in_communityArea -> CommunityArea;
        multi link has_streetAddresses := .<in_zipcode[is StreetAddress];
        constraint exclusive on (.digits);
    }

    type StreetAddress {
        property address1 -> str;
        property address2 -> str;
        property address3 -> str;
        link in_zipcode -> ZipCode;
        property coordinates -> tuple<latitude: float32, longitude: float32>;
        link has_business := .<has_address[is Business];
        constraint exclusive on (.coordinates);
    }

    type Business {
        property ID -> str;
        property name -> str;
        property alias -> str;
        property rating -> float32;
        property review_count -> int32;
        link in_communityArea -> CommunityArea;
        property coordinates -> tuple<latitude: float32, longitude: float32>;
        link has_address -> StreetAddress;
        multi link in_category -> Category;
        multi link has_reviews := .<reviews_business[is Review];
        constraint exclusive on (.ID);
    }

    type Category {
        property name -> str;
        link has_business := .<in_category[is Business];
        constraint exclusive on (.name);
    }

    type Review {
        property review_id -> str;
        property text -> str;
        property rating -> float32;
        property time_created -> str;
        link reviews_business -> Business;
        link written_by -> User;
    }

    type User {
        property user_id -> str;
        multi link reviews := .<written_by[is Review];
        constraint exclusive on (.user_id);
    }
}
                                              
```
         
</div>                                              
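
The computed backlinks in the schema above (for example `has_businesses` on `CommunityArea`) can be used in queries like ordinary links. The following is a minimal sketch, not a solution to any requirement below. The names `BUSINESSES_PER_AREA`, `businesses_per_area`, and `top_n` are hypothetical, and `client` is assumed to be an object with the EdgeDB Python client's `query(query, **kwargs)` method, such as the one returned by `edgedb.create_client()`.

```python
# Hypothetical query: count the businesses reachable through the
# has_businesses backlink and list the busiest community areas first.
BUSINESSES_PER_AREA = """
    SELECT CommunityArea {
        name,
        business_count := count(.has_businesses)
    }
    ORDER BY .business_count DESC
    LIMIT <int64>$top_n
"""


def businesses_per_area(client, top_n=5):
    """Run the query above; `client` is assumed to be an EdgeDB client."""
    return client.query(BUSINESSES_PER_AREA, top_n=top_n)
```

Each returned object should expose the shape fields as attributes (`.name` and `.business_count`), and the counts can be compared with what the EdgeDB UI shows.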
                                             



<br><br><br>

<hr style="border:30px solid brown "> </hr>
<hr style="border:2px solid brown "> </hr>


# Build the Database

<hr style="border:2px solid brown "> </hr>



- The following code will insert the objects into the database.
<br>
- It takes roughly **20 minutes** to insert a total of **13691** objects into the database.

<hr style="border:5px solid brown"> </hr>


In [26]:
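import time  # time.sleep() is used below to pause between batches of requests

client = edgedb.create_client()

# Consider the following code snippet

# for review in final__list_business_reviews_documents:
# INSERT BusinessReviewsObject ...
# Note: each `review` item in the loop below is one business document;
# its reviews are looked up in df__business_reviews by business ID.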
count = 0
IDs = []
chicago_community_areas = ["Rogers Park", "West Ridge", "Uptown", "Lincoln Square", "North Center", "Lake View", "Lincoln Park", "Chicago Loop", "Near North Side", "Edison Park", "Norwood Park", "Jefferson Park", "Forest Glen", "North Park", "Albany Park", "Portage Park", "Irving Park", "Dunning", "Montclare", "Belmont Cragin", "Hermosa", "Avondale", "Logan Square", "Humboldt Park", "West Town", "Austin", "West Garfield Park", "East Garfield Park", "Near West Side", "North Lawndale", "South Lawndale", "Lower West Side", "The Loop", "Loop", "Near South Side", "Armour Square", "Douglas", "Oakland", "Fuller Park", "Grand Boulevard", "Kenwood", "Washington Park", "Hyde Park", "Woodlawn", "South Shore", "Chatham", "Avalon Park", "South Chicago", "Burnside", "Calumet Heights", "Roseland", "Pullman", "South Deering", "East Side", "West Pullman", "Riverdale", "Hegewisch", "Garfield Ridge", "Archer Heights", "Brighton Park", "McKinley Park", "Bridgeport", "New City", "West Elsdon", "Gage Park", "Clearing", "West Lawn", "Chicago Lawn", "West Englewood", "Englewood", "Greater Grand Crossing", "Ashburn", "Auburn Gresham", "Beverly", "Washington Heights", "Mount Greenwood", "Morgan Park", "O'Hare", "Edgewater"]

# Process at most the first 2000 business documents
for review in list__business_reviews_documents[:2000]:
    if review['id'] in IDs:
        continue
    IDs.append(review['id'])
    count += 1
    if count % 49 == 0:
        time.sleep(1)
        
    community_area = 'Chicago'
    categories = []
    if review['source']['coordinates']['latitude'] is not None and review['source']['coordinates']['longitude'] is not None:
        coordinates = (review['source']['coordinates']['latitude'], review['source']['coordinates']['longitude'])
    else:
        continue
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng='+str(review['source']['coordinates']['latitude'])+','+str(review['source']['coordinates']['longitude'])+'&key='+geocoder.ApiKey
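    # TLS certificate verification stays at its default (enabled); verify=False would send the API key over an unverified connection
    response = requests.get(url).json()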

    if response['results']:
        if response['results'][0] is not None:
            for item in response['results'][0]['address_components']:
                if 'neighborhood' in item['types'] and item['long_name'] in chicago_community_areas:
                    community_area = item['long_name']
                   
    
        for category in review['source']['categories']:
            if category['title'] not in categories:
                categories.append(category['title'])
            client.query("""
                INSERT Category {
                    name := <str>$name
                } unless conflict on .name
            """, name=category['title'])

        client.query("""
            INSERT Country {
                name := <str>$name
            } unless conflict on .name
        """, name=review['source']['location']['country'])

        client.query("""
            INSERT State {
                name := <str>$name,
                in_country := (
                    select Country
                    filter
                        .name = <str>$country_name
                    limit 1
                )
            } unless conflict on (.name, .in_country)
        """, name=review['source']['location']['state'], country_name=review['source']['location']['country'])

        client.query("""
            INSERT City {
                name := <str>$name,
                in_state := (
                    select State
                    filter
                        .name = <str>$state_name
                    limit 1
                )
            } unless conflict on (.name, .in_state)
        """, name=review['source']['location']['city'], state_name=review['source']['location']['state'])

        client.query("""
            INSERT CommunityArea {
                name := <str>$name,
                in_city := (
                    select City
                    filter
                        .name = <str>$city_name
                    limit 1
                )
            } unless conflict on (.name, .in_city)
        """, name=community_area, city_name=review['source']['location']['city'])

        client.query("""
            INSERT ZipCode {
                digits := <str>$digits,
                in_city := (
                    select City
                    filter
                        .name = <str>$city_name
                    limit 1
                ),
                in_communityArea := (
                    select CommunityArea
                    filter
                        .name = <str>$community_area_name
                    limit 1
                )
            } unless conflict on .digits else (
                update ZipCode
                  set {
                    in_communityArea += (
                        select CommunityArea
                        filter
                            .name = <str>$community_area_name
                        limit 1
                    )
                  }
                )
        """, digits=review['source']['location']['zip_code'], city_name=review['source']['location']['city'], community_area_name=community_area)

        client.query("""
            INSERT StreetAddress {
                address1 := <str>$address1,
                address2 := <str>$address2,
                address3 := <str>$address3,
                coordinates := """ + str(coordinates) + """,
                in_zipcode := (
                    select ZipCode
                    filter
                        .digits = <str>$zip_code
                    limit 1
                )
            } unless conflict on .coordinates
        """, address1=review['source']['location']['address1'] if review['source']['location']['address1'] else '', address2=review['source']['location']['address2'] if review['source']['location']['address2'] else '', address3=review['source']['location']['address3'] if review['source']['location']['address3'] else '', zip_code=review['source']['location']['zip_code'])
            
        client.query("""
            INSERT Business {
                ID := <str>$id,
                name := <str>$name,
                alias := <str>$alias,
                rating := <float32>$rating,
                review_count := <int32>$review_count,
                coordinates := """ + str(coordinates) + """,
                in_communityArea := (
                    select CommunityArea
                    filter
                        .name = <str>$community_area_name
                    limit 1
                ),
                has_address := (
                    select StreetAddress
                    filter
                        .coordinates.latitude = <float32>$latitude and .coordinates.longitude = <float32>$longitude 
                    limit 1
                ),
                in_category := (
                    select Category
                    filter
                        .name in array_unpack(<array<str>>$category_array)
                )
            } unless conflict on .ID
        """, id=review['id'], name=review['source']['name'], alias=review['source']['alias'], rating=review['source']['rating'], review_count=review['source']['review_count'], community_area_name=community_area, latitude=coordinates[0], longitude=coordinates[1], category_array=categories)
    
    business_reviews = df__business_reviews[df__business_reviews['business_id'] == review['id']]

    for _, business_review in business_reviews.iterrows(): 
        client.query("""
            INSERT User {
               user_id := <str>$user_id
            } unless conflict on .user_id
        """, user_id=business_review['user']['id'])
            
        client.query("""
                    INSERT Review {
                        review_id := <str>$review_id,
                        text := <str>$review_text,
                        rating := <float32>$rating,
                        time_created :=<str>$review_time_created,
                        reviews_business := (
                            SELECT Business
                            FILTER .ID = <str>$business_id
                            LIMIT 1
                        ),
                     written_by := (
                                SELECT User
                                FILTER .user_id = <str>$user_id
                                LIMIT 1
                            )
                    }
                """, review_id=business_review['id'], business_id=review['id'], review_text=business_review['text'], review_time_created=business_review['time_created'], rating=business_review['rating'], user_id=business_review['user']['id'])
    

client.close()

<hr style="border:5px solid brown"> </hr>

### Note:

- If the cell above finishes without errors, the objects have been inserted successfully.
<br>
- Verify objects inserted into the database using the EdgeDB UI.

<hr style="border:5px solid brown"> </hr>


<div class="alert alert-block alert-danger">

Now is a good time to check the EdgeDB GUI (EdgeDB UI). You should notice that the object count is no longer zero.

For the remaining assignment requirements, we will use the Python EdgeDB client to execute EdgeQL commands. Afterward, we can verify our work by comparing the results with those from the EdgeDB UI.

</div>


<div class="alert alert-block alert-danger">
    
    

### Requirement 2: 
- Write and execute EdgeQL/Python code to retrieve all businesses with 5 stars from the Yelp Graph-Relational Data Model you created in **Requirement 1**  
    
- Compute the time it takes to execute this requirement with the Python EdgeQL client    

<br>
    
</div>

In [27]:
# Write your code here

client = edgedb.create_client()

# Consider the following code snippet

# Start the timer
start_time = time.time()

# The query returns Business objects (not reviews) whose rating equals the $rating parameter
businesses_with_rating_5 = client.query('SELECT Business {name} FILTER .rating = <float32>$rating', rating=5.0)

# Stop the timer
end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time

# Print the time needed to execute the requirement
print(f'\n\n\t\t Total Execution Time:  {elapsed_time:.2f} seconds \n\n')

for business in businesses_with_rating_5:
    print(business.name)

client.close()



		 Total Execution Time:  0.06 seconds 


Maggie Roeder Cakes
Rhoyal Decadence
Miro Statue
TNS Pop Shop
Mindworks
Yelp Elite Battle Of The Band T's @ RAISED Urban Rooftop Bar
State Street, That Great Street
Auntie Anne's
Picasso Statue
7-Eleven
Marshall Field and Company Building
Brews & Bites
Ron Christian
Biker Dude
Cascade Park
Sneakerhead University
Farmer's Fridge
The Good Eating Company
Fashion Focus
Occupy Chicago
Field Museum Dig Site
Louise Nevelson's Dawn Shadows
Project Windows Design Contest
Washington-Morris-Salomon Monument
Marla's Mandel Bread
Chicago Temple Building
Flash UYE: Chicago Cultural Center
TourBound Golf Academy
TJ's Heavenly Sweet
Sweetling Cupcakes
Clark Johnson Orchestra
Wolverine Trading Kitchen
Porter's Steakhouse
The Flight of Daedalus and Icarus By Roger Brown 1991
The Member Bar at Terzo Piano
Wells Street Bridge
Chick-fil-A
Bagels Time
Salads UP
Marshall Suloway Bridge
Cupcakes & Cocktails
River North Association
Verve Coffee Roasters
Wings of Mexi
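
Every remaining requirement asks for an execution time, so the start/stop pattern used above can be factored into a helper. This is a minimal sketch: the name `timed` is hypothetical, and it uses `time.perf_counter()` instead of `time.time()` because that clock is intended for measuring short durations.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in call so the sketch runs without a database connection.
result, seconds = timed(sorted, [3, 1, 2])
print(result, f"{seconds:.6f} seconds")
```

With an open client, the same helper could wrap a query call, for example `timed(client.query, 'SELECT Business {name} FILTER .rating = <float32>$rating', rating=5.0)`.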


<div class="alert alert-block alert-danger">
    
    

### Requirement 3: 
- Write and execute EdgeQL/Python code to retrieve all businesses that offer **Pizza** in Zip Code 60601 from the Yelp Graph-Relational Data Model you created in **Requirement 1**  
    
- Compute the time it takes to execute this requirement with the Python EdgeQL client      

    
 **Note:** If you don't get "Pizza" results for Zip Code 60601, try any of the following zip codes (60602, 60603, 60604, 60605, 60606)
<br>
    
</div>

In [28]:
# Write your code here



<div class="alert alert-block alert-danger">

### Requirement 4: 
- Write and execute EdgeQL/Python code to retrieve the top 5 businesses with the highest number of reviews that offer **Hot Dogs** in Zip Code 60601, using the Yelp Graph-Relational Data Model you created in **Requirement 1**.

- Measure the execution time of this requirement using the Python EdgeQL client.      
<br>

**Note:** If you don't get "Hot Dogs" results for Zip Code 60601, try any of the following zip codes (60602, 60603, 60604, 60605, 60606)

    
</div>


In [29]:
# Write your code here



<div class="alert alert-block alert-danger">

### Requirement 5: 
- Write and execute EdgeQL/Python code to display 3 review excerpts of the top 3 businesses with the highest number of reviews offering **Italian** food in Zip Code 60601, utilizing the Yelp Graph-Relational Data Model you established in **Requirement 1**.
    - First, get the business IDs of the top 3 businesses with the highest number of reviews offering **Italian** food in Zip Code 60601 from your database
    - Then get 3 review excerpts for each business.

- Measure the execution time for this requirement using the Python EdgeQL client.      
<br>

</div>
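
The requirement above does not define what counts as a review excerpt. As a minimal sketch, assuming an excerpt is simply the review text shortened to a fixed width, the standard-library `textwrap.shorten` can do the trimming. The helper name `excerpt` and the default width of 120 characters are illustrative assumptions.

```python
import textwrap


def excerpt(text, width=120):
    """Collapse whitespace and cut `text` at a word boundary to fit `width`."""
    return textwrap.shorten(text, width=width, placeholder="...")


print(excerpt("Great food and fast service."))
print(excerpt("The deep dish pizza was excellent and the staff were friendly", width=30))
```

Text that already fits within the width is returned unchanged apart from collapsed whitespace.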



In [30]:
# Write your code here



<div class="alert alert-block alert-danger">

### Requirement 6: 
- Write and execute EdgeQL/Python code to retrieve the top 5 businesses with the highest number of reviews that offer **Hot Dogs** in the **Lincoln Park** Community Area, using the Yelp Graph-Relational Data Model you created in **Requirement 1**.


- Measure the execution time for this requirement using the Python EdgeQL client.  
<br>

</div>


In [31]:
# Write your code here


<div class="alert alert-block alert-danger">

### Requirement 7: 
- Write and execute EdgeQL/Python code to display 3 reviews of the top 3 businesses with the highest number of reviews that offer **Greek** food in the Community Area **Chicago Loop**, based on the Yelp Graph-Relational Data Model you developed in **Requirement 1**.
    - First, get the business IDs of the top 3 businesses with the highest number of reviews that offer **Greek** food in the **Chicago Loop** Community Area from your database
    - Then get 3 review excerpts for each business.


- Measure the execution time for this requirement using the Python EdgeQL client.  
<br>

</div>


In [32]:
# Write your code here



            