

For this practice, we want to store Yelp shops and reviews in MongoDB and then run some queries against them.
Before starting this notebook, create a yelp database on MongoDB, then add the reviews and shops collections to it. The notes on Setting up MongoDB show how you can do this.

**Submission Instructions**

* Make a copy and replace the blank with your name
* Complete the notebook and run all the cells
* Download the .ipynb file
* Submit on Gradescope

In [1]:
!pip install yelpapi
!pip install "pymongo[srv]"

Collecting yelpapi
  Downloading yelpapi-2.5.1-py3-none-any.whl.metadata (1.3 kB)
Downloading yelpapi-2.5.1-py3-none-any.whl (7.4 kB)
Installing collected packages: yelpapi
Successfully installed yelpapi-2.5.1
Collecting pymongo[srv]
  Downloading pymongo-4.15.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo[srv])
  Downloading dnspython-2.8.0-py3-none-any.whl.metadata (5.7 kB)
Downloading dnspython-2.8.0-py3-none-any.whl (331 kB)
Downloading pymongo-4.15.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.7 MB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.8.0 pymo

In [2]:
# Libraries
from yelpapi import YelpAPI
import pandas as pd
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi


In [7]:

yelp_api = YelpAPI('')

In [4]:
# Connect to MongoDB

# Replace the placeholder with your own connection string
uri = "<your connection string>"
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)


Pinged your deployment. You successfully connected to MongoDB!
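
Since the notebook is downloaded and submitted, it can help to keep the connection string out of the cell itself. The following is a minimal sketch of one way to do that, assuming the string is stored in an environment variable; the variable name `MONGODB_URI` and the helper `get_mongo_uri` are illustrative and not part of the original notebook.

```python
import os


def get_mongo_uri(env_var: str = "MONGODB_URI") -> str:
    """Return the connection string stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    uri = os.environ.get(env_var, "")
    if not uri:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return uri


# Possible use in the connection cell:
# client = MongoClient(get_mongo_uri(), server_api=ServerApi('1'))
```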


In [5]:
# Make sure the shops and reviews collections have been created!

db = client['yelp']
shops_collection = db['shops']
reviews_collection = db['reviews']

Let's search for taco shops, this time in Illinois, and store the result in taco_shops.


In [8]:
taco_shops = yelp_api.search_query(term='taco', location='IL')

Let's first check if we got the shops correctly:

In [9]:
taco_shops['businesses'][0]

{'id': 'CADdsveMbJY-x_PFrsCkQw',
 'alias': 'el-tragon-taqueria-chicago-5',
 'name': 'El Tragon Taqueria',
 'image_url': 'https://s3-media0.fl.yelpcdn.com/bphoto/EANG1s0BojybA2qEx0w7Iw/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/el-tragon-taqueria-chicago-5?adjust_creative=xa8hs_B5jAwIuyuEtG_r0g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=xa8hs_B5jAwIuyuEtG_r0g',
 'review_count': 160,
 'categories': [{'alias': 'tacos', 'title': 'Tacos'}],
 'rating': 4.7,
 'coordinates': {'latitude': 41.904700405779366,
  'longitude': -87.64861798544506},
 'transactions': ['pickup', 'delivery'],
 'location': {'address1': '1234 N Halsted St',
  'address2': 'Unit C',
  'address3': '',
  'city': 'Chicago',
  'zip_code': '60642',
  'country': 'US',
  'state': 'IL',
  'display_address': ['1234 N Halsted St', 'Unit C', 'Chicago, IL 60642']},
 'phone': '+13123741458',
 'display_phone': '(312) 374-1458',
 'distance': 2412.6555644337254,
 'business_hours': [{'open': [{

Now we insert all of the documents into the shops collection:

`insert_many` is similar to `insert_one`, but it accepts a list of documents to insert.

In [10]:
result = shops_collection.insert_many(taco_shops['businesses'])
result

InsertManyResult([ObjectId('690e46bbf32eb86c842b88e6'), ObjectId('690e46bbf32eb86c842b88e7'), ObjectId('690e46bbf32eb86c842b88e8'), ObjectId('690e46bbf32eb86c842b88e9'), ObjectId('690e46bbf32eb86c842b88ea'), ObjectId('690e46bbf32eb86c842b88eb'), ObjectId('690e46bbf32eb86c842b88ec'), ObjectId('690e46bbf32eb86c842b88ed'), ObjectId('690e46bbf32eb86c842b88ee'), ObjectId('690e46bbf32eb86c842b88ef'), ObjectId('690e46bbf32eb86c842b88f0'), ObjectId('690e46bbf32eb86c842b88f1'), ObjectId('690e46bbf32eb86c842b88f2'), ObjectId('690e46bbf32eb86c842b88f3'), ObjectId('690e46bbf32eb86c842b88f4'), ObjectId('690e46bbf32eb86c842b88f5'), ObjectId('690e46bbf32eb86c842b88f6'), ObjectId('690e46bbf32eb86c842b88f7'), ObjectId('690e46bbf32eb86c842b88f8'), ObjectId('690e46bbf32eb86c842b88f9')], acknowledged=True)
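
Running the search and insert cells a second time would store the same businesses again, because every inserted document receives a fresh `_id`. The following is a minimal sketch of a guard against that, assuming the Yelp `id` field identifies a shop; the helper name `insert_new_shops` is hypothetical, and it relies only on the `distinct` and `insert_many` methods of a pymongo collection.

```python
def insert_new_shops(collection, businesses):
    """Insert only the businesses whose Yelp id is not stored yet.

    Returns the number of documents that were inserted.
    """
    # Yelp ids that are already present in the collection
    existing_ids = set(collection.distinct("id"))
    new_docs = [b for b in businesses if b["id"] not in existing_ids]
    if new_docs:  # insert_many rejects an empty list
        collection.insert_many(new_docs)
    return len(new_docs)


# Possible use in place of the plain insert_many call:
# insert_new_shops(shops_collection, taco_shops['businesses'])
```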

Q1 [3 points]: Using find, list the id (Yelp id) and name of all shops in the collection that are in Chicago. You need to use location.city.

In [12]:
chicago_shops = shops_collection.find(
    {"location.city": "Chicago"},
    {"id": 1, "name": 1}
)

for shop in chicago_shops:
    print(shop)


{'_id': ObjectId('690e46bbf32eb86c842b88e6'), 'id': 'CADdsveMbJY-x_PFrsCkQw', 'name': 'El Tragon Taqueria'}
{'_id': ObjectId('690e46bbf32eb86c842b88e7'), 'id': 'HiCXpGgxqjHPIlLyoZ06yw', 'name': 'Carniceria Maribel'}
{'_id': ObjectId('690e46bbf32eb86c842b88e8'), 'id': 'Gr42JZ9GsKaESW-864rwFQ', 'name': 'Tepalcates'}
{'_id': ObjectId('690e46bbf32eb86c842b88e9'), 'id': 'VNTJYUiOj6hD2Uac_prtPQ', 'name': 'Taqueria Chingon'}
{'_id': ObjectId('690e46bbf32eb86c842b88ea'), 'id': '0Zu5S3zJvLC1Ve7kGAttDQ', 'name': 'Bárbaro Taquería'}
{'_id': ObjectId('690e46bbf32eb86c842b88eb'), 'id': '-HqkUZg3tu-ZAV1RhdzXHQ', 'name': 'Tacotlan'}
{'_id': ObjectId('690e46bbf32eb86c842b88ec'), 'id': 'oxJFCIVL9oQcmSIVSOBYew', 'name': 'Taco El Jalisciense'}
{'_id': ObjectId('690e46bbf32eb86c842b88ed'), 'id': 'Yzs5dkjw3hEkM00RgBwNag', 'name': 'Taco Loco Of Pilsen'}
{'_id': ObjectId('690e46bbf32eb86c842b88ee'), 'id': 'BGTumAF6UxyE-TrEB11WVg', 'name': 'Antique Taco '}
{'_id': ObjectId('690e46bbf32eb86c842b88ef'), 'id': '

Note that in the documents above, 'id' is the Yelp id of the shop and '_id' is the MongoDB id of the document.

In [13]:
# Check one review for one shop to see the structure:
reviews = yelp_api.reviews_query(id='htKi45T3U08n8FoHgTmLwg')
print(reviews['reviews'][0])

{'id': 'fEyRvZmfKOlhx5GAdNoSJw', 'url': 'https://www.yelp.com/biz/onions-and-cilantro-chicago?adjust_creative=xa8hs_B5jAwIuyuEtG_r0g&hrid=fEyRvZmfKOlhx5GAdNoSJw&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_reviews&utm_source=xa8hs_B5jAwIuyuEtG_r0g', 'text': "Our family's favorite Mexican food!  We order carry-out chicken and steak burritos from Onions and Cilantro a couple times a month.  Very clean and friendly...", 'rating': 5, 'time_created': '2025-10-30 15:40:17', 'user': {'id': 'bTd-1CglL6Ze4hrMvoNmJA', 'profile_url': 'https://www.yelp.com/user_details?userid=bTd-1CglL6Ze4hrMvoNmJA', 'image_url': 'https://s3-media0.fl.yelpcdn.com/photo/LBYEiYW1COl2P9Zu71_XNg/o.jpg', 'name': 'Laura T.'}}


Now, for each document in the shops collection, we use the Yelp API to retrieve its reviews and insert them into the reviews collection in MongoDB. The reviews do not include the shop id or the shop name, so we add both to each review.

In [14]:
inserted_review = 0
# The empty filter {} matches every shop in the collection, not only those in Chicago
all_shops = shops_collection.find({}, {"id": 1, "name": 1})
for shop in all_shops:
    shop_id = shop['id']
    shop_name = shop['name']
    reviews = yelp_api.reviews_query(id=shop_id)
    for review in reviews['reviews']:
        # Tag each review with the shop it belongs to
        review['shop_id'] = shop_id
        review['shop_name'] = shop_name
        reviews_collection.insert_one(review)
        inserted_review += 1

print("Inserted ", inserted_review, " reviews into the reviews collection")

Inserted  140  reviews into the reviews collection
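
The tagging step in the loop above can also be written as a small function that returns new dictionaries instead of modifying the API response in place. This is only a sketch; the name `tag_reviews` is hypothetical.

```python
def tag_reviews(reviews, shop_id, shop_name):
    """Return copies of the reviews with shop_id and shop_name added."""
    return [
        {**review, "shop_id": shop_id, "shop_name": shop_name}
        for review in reviews
    ]


# Possible use inside the loop, one insert per shop instead of one per review:
# tagged = tag_reviews(reviews['reviews'], shop_id, shop_name)
# if tagged:  # insert_many rejects an empty list
#     reviews_collection.insert_many(tagged)
```

With this variant, a shop with no reviews is skipped by the `if tagged` check rather than sent to `insert_many`.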


Q2 [4 points]: Find all the reviews with a rating greater than or equal to 4.5. Only keep the name of the shop, text, and rating (hint: use projection).

In [15]:
results = reviews_collection.find(
    {"rating": {"$gte": 4.5}},
    {"shop_name": 1, "text": 1, "rating": 1, "_id": 0}
)

for r in results:
    print(r)


{'text': 'Food: Amazing!! Very authentic and has a lot of flavor. My family got the chicken quesadillas and steak burrito. They were both delicious.\n\nService:...', 'rating': 5, 'shop_name': 'El Tragon Taqueria'}
{'text': 'For the best tacos in Chicago - look no further! \nWe were blown away by the food at El Tragón. The quesabirria and carne asada were outstanding, and the...', 'rating': 5, 'shop_name': 'El Tragon Taqueria'}
{'text': 'Best tacos in Chicago, no competition! Truly a hidden gem and a place I take a lot of pride bringing friends and family to, all who absolutely love it!...', 'rating': 5, 'shop_name': 'El Tragon Taqueria'}
{'text': 'The services was so quick! I ordered the rice beans with three tacos. I chose asada, pollo and al pastor. My favorite was the pollo taco it was flavorful....', 'rating': 5, 'shop_name': 'El Tragon Taqueria'}
{'text': 'Staff so nice and welcoming!! Food was delicious, loved the asada burrito and all of the salsas were tasty. Clean restaurant a

Q3 [4 points]: Using the `time_created` field in the reviews, find all reviews that have been entered in 2024 (after '2024-01-01'). Only keep the shop_name, text, rating, and time_created.

In [16]:
results = reviews_collection.find(
    {"time_created": {"$gte": "2024-01-01"}},
    {"shop_name": 1, "text": 1, "rating": 1, "time_created": 1, "_id": 0}
)

for r in results:
    print(r)

{'text': 'Food: Amazing!! Very authentic and has a lot of flavor. My family got the chicken quesadillas and steak burrito. They were both delicious.\n\nService:...', 'rating': 5, 'time_created': '2025-07-07 20:31:42', 'shop_name': 'El Tragon Taqueria'}
{'text': 'For the best tacos in Chicago - look no further! \nWe were blown away by the food at El Tragón. The quesabirria and carne asada were outstanding, and the...', 'rating': 5, 'time_created': '2025-06-03 21:34:53', 'shop_name': 'El Tragon Taqueria'}
{'text': 'Best tacos in Chicago, no competition! Truly a hidden gem and a place I take a lot of pride bringing friends and family to, all who absolutely love it!...', 'rating': 5, 'time_created': '2025-06-03 05:43:10', 'shop_name': 'El Tragon Taqueria'}
{'text': 'The services was so quick! I ordered the rice beans with three tacos. I chose asada, pollo and al pastor. My favorite was the pollo taco it was flavorful....', 'rating': 5, 'time_created': '2025-10-10 13:26:18', 'shop_name': 'E
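
The query above compares strings, not dates. It works because `time_created` is stored as text in 'YYYY-MM-DD HH:MM:SS' form, which sorts alphabetically in the same order as chronologically. A lower bound alone also matches later years, as the 2025 reviews in the output show. The following sketch, with a hypothetical helper `created_in_year_filter`, adds an upper bound for the case where only a single calendar year is wanted.

```python
def created_in_year_filter(year: int) -> dict:
    """Build a filter matching time_created strings that fall inside one year."""
    start = f"{year}-01-01"
    end = f"{year + 1}-01-01"
    return {"time_created": {"$gte": start, "$lt": end}}


# Plain string comparison behaves like a date comparison for this format
print("2025-07-07 20:31:42" >= "2024-01-01")  # True
print("2023-12-31 23:59:59" >= "2024-01-01")  # False

# Possible use:
# reviews_collection.find(created_in_year_filter(2024), {"_id": 0})
```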

Ok, now let's find the average rating for each shop

In [17]:
# To keep the shop name, we can use "$first", which keeps the first value of the requested field among the documents in each group.
# Since we group on shop_id, every document in a group has the same shop_name, so $first returns the correct shop name.

pipeline = [
    {
        "$group": {
            "_id": "$shop_id",
            "averageRating": { "$avg": "$rating" },
            "shop_name": { "$first": "$shop_name" }
        }
    },
    {
        "$project": {
            "shop_id": "$_id",
            "shop_name": 1,
            "averageRating": 1
        }
    }

]

result = reviews_collection.aggregate(pipeline)

# Print results
for doc in result:
    print(doc)

{'_id': 'BGTumAF6UxyE-TrEB11WVg', 'averageRating': 4.142857142857143, 'shop_name': 'Antique Taco ', 'shop_id': 'BGTumAF6UxyE-TrEB11WVg'}
{'_id': 'Tkcjl_dPhBqx_8b59WByAA', 'averageRating': 5.0, 'shop_name': 'Tacos & Burrito Express 3', 'shop_id': 'Tkcjl_dPhBqx_8b59WByAA'}
{'_id': 'VNTJYUiOj6hD2Uac_prtPQ', 'averageRating': 4.571428571428571, 'shop_name': 'Taqueria Chingon', 'shop_id': 'VNTJYUiOj6hD2Uac_prtPQ'}
{'_id': 'cO0DycIvBZ2Gwk2j2DJAOA', 'averageRating': 4.285714285714286, 'shop_name': 'Don Pez Tacos Cantina', 'shop_id': 'cO0DycIvBZ2Gwk2j2DJAOA'}
{'_id': '-HqkUZg3tu-ZAV1RhdzXHQ', 'averageRating': 4.0, 'shop_name': 'Tacotlan', 'shop_id': '-HqkUZg3tu-ZAV1RhdzXHQ'}
{'_id': 'HiCXpGgxqjHPIlLyoZ06yw', 'averageRating': 4.428571428571429, 'shop_name': 'Carniceria Maribel', 'shop_id': 'HiCXpGgxqjHPIlLyoZ06yw'}
{'_id': 'RIEypHKiSCZB5mJJJQNMcQ', 'averageRating': 5.0, 'shop_name': 'Birrieria Zaragoza', 'shop_id': 'RIEypHKiSCZB5mJJJQNMcQ'}
{'_id': 'Gr42JZ9GsKaESW-864rwFQ', 'averageRating': 4.85
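
To see what the `$group` and `$project` stages compute, here is a rough plain-Python equivalent that runs on a list of review dictionaries. It is an illustration only, not how MongoDB evaluates the pipeline, and the function name `average_rating_by_shop` is hypothetical.

```python
from collections import defaultdict


def average_rating_by_shop(reviews):
    """Mimic the $group / $project stages on a list of review dicts."""
    ratings = defaultdict(list)  # shop_id -> all ratings seen for that shop
    names = {}                   # shop_id -> first shop_name seen, like $first
    for review in reviews:
        shop_id = review["shop_id"]
        ratings[shop_id].append(review["rating"])
        names.setdefault(shop_id, review["shop_name"])
    return [
        {
            "_id": shop_id,
            "averageRating": sum(values) / len(values),  # like $avg
            "shop_name": names[shop_id],
            "shop_id": shop_id,
        }
        for shop_id, values in ratings.items()
    ]
```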

Q4 [4 points]: Find the average rating of each shop using only the reviews that have been entered in 2024, and return the top 10 shops with the highest average rating (all with one pipeline).

In [18]:
# Hint: you have to add a couple of stages to the previous pipeline

pipeline = [
    {
        "$match": {
            "time_created": {"$gte": "2024-01-01"}
        }
    },
    {
        "$group": {
            "_id": "$shop_id",
            "averageRating": { "$avg": "$rating" },
            "shop_name": { "$first": "$shop_name" }
        }
    },
    {
        "$project": {
            "shop_id": "$_id",
            "shop_name": 1,
            "averageRating": 1
        }
    },

]

result = reviews_collection.aggregate(pipeline)

# Print results
for doc in result:
    print(doc)


{'_id': 'cO0DycIvBZ2Gwk2j2DJAOA', 'averageRating': 4.285714285714286, 'shop_name': 'Don Pez Tacos Cantina', 'shop_id': 'cO0DycIvBZ2Gwk2j2DJAOA'}
{'_id': 'RI_TzjULvOknWecEq8cBQA', 'averageRating': 4.571428571428571, 'shop_name': 'Taquizas Valdez', 'shop_id': 'RI_TzjULvOknWecEq8cBQA'}
{'_id': 'BGTumAF6UxyE-TrEB11WVg', 'averageRating': 4.142857142857143, 'shop_name': 'Antique Taco ', 'shop_id': 'BGTumAF6UxyE-TrEB11WVg'}
{'_id': 'mzdaPepjOxUCqA7AwOr7kQ', 'averageRating': 4.714285714285714, 'shop_name': 'Taqueria Mazamitla', 'shop_id': 'mzdaPepjOxUCqA7AwOr7kQ'}
{'_id': 'CADdsveMbJY-x_PFrsCkQw', 'averageRating': 4.857142857142857, 'shop_name': 'El Tragon Taqueria', 'shop_id': 'CADdsveMbJY-x_PFrsCkQw'}
{'_id': 'oxJFCIVL9oQcmSIVSOBYew', 'averageRating': 5.0, 'shop_name': 'Taco El Jalisciense', 'shop_id': 'oxJFCIVL9oQcmSIVSOBYew'}
{'_id': 'L12zP2yCgoKm_NLwW_HIDw', 'averageRating': 3.4285714285714284, 'shop_name': 'Quesabirria Jalisco', 'shop_id': 'L12zP2yCgoKm_NLwW_HIDw'}
{'_id': '0Zu5S3zJvLC1V
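
The pipeline above filters and averages, but it does not rank the shops, so the printed results come back in no particular order. The following is a minimal sketch of the ranking Q4 asks for, adding a `$sort` stage (descending on the average) and a `$limit` stage. The helper name `top_shops_pipeline` and the tie-breaking sort on `_id` are assumptions added for illustration.

```python
def top_shops_pipeline(since="2024-01-01", limit=10):
    """Build a pipeline returning the shops with the highest average rating."""
    return [
        # Keep only reviews created on or after the given date string
        {"$match": {"time_created": {"$gte": since}}},
        # Average the ratings per shop and keep the shop name
        {"$group": {
            "_id": "$shop_id",
            "averageRating": {"$avg": "$rating"},
            "shop_name": {"$first": "$shop_name"},
        }},
        # Highest average first; _id breaks ties so the order is repeatable
        {"$sort": {"averageRating": -1, "_id": 1}},
        # Keep only the top shops
        {"$limit": limit},
        {"$project": {"shop_id": "$_id", "shop_name": 1, "averageRating": 1}},
    ]


# Possible use:
# for doc in reviews_collection.aggregate(top_shops_pipeline()):
#     print(doc)
```

Placing `$sort` and `$limit` after `$group` matters here: the ranking has to be applied to the per-shop averages, not to the individual reviews.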

In [None]:
client.close()