# Zurich Restaurant Insights: <br> Analyzing and Visualizing Culinary Diversity

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Summary" data-toc-modified-id="Summary-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Summary</a></span></li><li><span><a href="#Requirements-&amp;-Configuration" data-toc-modified-id="Requirements-&amp;-Configuration-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Requirements &amp; Configuration</a></span></li><li><span><a href="#ELT-Process" data-toc-modified-id="ELT-Process-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>ELT Process</a></span><ul class="toc-item"><li><span><a href="#DB-Setup" data-toc-modified-id="DB-Setup-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>DB Setup</a></span></li><li><span><a href="#Extract" data-toc-modified-id="Extract-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Extract</a></span></li><li><span><a href="#Load" data-toc-modified-id="Load-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Load</a></span></li><li><span><a href="#Transform" data-toc-modified-id="Transform-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Transform</a></span></li><li><span><a href="#Datastructure" data-toc-modified-id="Datastructure-3.5"><span class="toc-item-num">3.5&nbsp;&nbsp;</span>Datastructure</a></span></li></ul></li><li><span><a href="#Data-analysis" data-toc-modified-id="Data-analysis-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Data analysis</a></span><ul class="toc-item"><li><span><a href="#Type-of-restaurants" data-toc-modified-id="Type-of-restaurants-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Type of restaurants</a></span></li><li><span><a href="#Geospatial-Analysis" data-toc-modified-id="Geospatial-Analysis-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Geospatial Analysis</a></span></li><li><span><a href="#Top-10-postal-codes" data-toc-modified-id="Top-10-postal-codes-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Top 10 postal codes</a></span></li><li><span><a href="#Vegan-restaurants" 
data-toc-modified-id="Vegan-restaurants-4.4"><span class="toc-item-num">4.4&nbsp;&nbsp;</span>Vegan restaurants</a></span></li><li><span><a href="#Open-days" data-toc-modified-id="Open-days-4.5"><span class="toc-item-num">4.5&nbsp;&nbsp;</span>Open days</a></span></li></ul></li><li><span><a href="#Conclusions" data-toc-modified-id="Conclusions-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Conclusions</a></span></li><li><span><a href="#Learnings" data-toc-modified-id="Learnings-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Learnings</a></span></li></ul></div>

## Summary

Zürich, Switzerland's largest city, is a renowned attraction for tourists and residents alike. It offers a blend of natural beauty, cultural vibrancy, and a diverse culinary scene. Throughout the year, a multitude of individuals visit Zürich, either for leisure or exploration. Regardless of their origin, there's a common quest: finding the perfect place to dine, whether dining in or taking away.

This study conducts an in-depth examination of a dataset encompassing 515 restaurants within the city of Zürich. The dataset was sourced from Zürich's Open Data platform, available at: https://data.stadt-zuerich.ch/dataset/zt_gastronomie 
This dataset includes various fields such as address, category, copyright holder, description, geo-coordinates, and more. 

The workflow begins with extracting the dataset via the API and storing it in a MongoDB database. A series of data transformations follows: handling missing information, removing fields, creating new attributes, and renaming fields. Once the data is prepared, an in-depth analysis uncovers insights about restaurant types, cuisine diversity, and spatial distribution.


##  Requirements & Configuration

In [3]:
! pip3 list | findstr "pymongo dnspython pandas"

dnspython                         2.4.2
pandas                            2.1.0
pymongo                           4.5.0


In [4]:
import pymongo
from pprint import pprint
import pandas as pd
import requests
import json
import time
import string
from bs4 import BeautifulSoup
import re

In [5]:
# pandas configuration
pd.set_option('display.precision', 2)
pd.set_option('display.max_rows', 30)
pd.set_option('display.max_colwidth', 50)

In [6]:
# API and Database details
API_URL = "https://www.zuerich.com/en/api/v2/data?id=101"
CNX_STR = "localhost:27017"
DB_NAME = "stadtzuerich"
COLL_NAME = "restaurants"


## ELT Process

<img src="dataFlow_OV.png" style="height:400px;">

The diagram shows the ELT (Extract, Load, Transform) process. Data is extracted from the Zürich Open Data platform and stored in its raw form in MongoDB. The transformations are then performed with pymongo aggregation pipelines and written back to MongoDB. The data analysis runs on the transformed dataset.

### DB Setup

In [7]:
# connection to MongoDB
client = pymongo.MongoClient(CNX_STR)
db = client[DB_NAME]
restaurants = db[COLL_NAME]

In [8]:
# Remove all existing documents
restaurants.drop()
restaurants.count_documents({})

0


###  Extract

In [14]:
# set the headers for the GET request
headers = {"Content-Type": "application/json"}

# make the GET request and download all restaurants
r = requests.get(API_URL, headers=headers)
# stop early if the API answers with an HTTP error status
r.raise_for_status()
data = r.json()

# inspect first restaurant
restaurant = data[0]

# remove fields with None values (in place; `restaurant` is `data[0]`, so the list entry changes too)
def remove_none_values(restaurant):
    if isinstance(restaurant, dict):
        for key, value in list(restaurant.items()):
            if value is None:
                del restaurant[key]
            else:
                remove_none_values(value)
    elif isinstance(restaurant, list):
        for item in restaurant:
            remove_none_values(item)

remove_none_values(restaurant)
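
The recursive clean-up above can be checked on a small hand-made document. The sketch below re-declares the function so that it runs on its own; the `sample` dictionary is invented for illustration and is not taken from the dataset.

```python
def remove_none_values(node):
    """Delete dictionary keys whose value is None, recursing into dicts and lists (in place)."""
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if value is None:
                del node[key]
            else:
                remove_none_values(value)
    elif isinstance(node, list):
        for item in node:
            remove_none_values(item)


# hypothetical sample document, not from the dataset
sample = {
    "name": {"de": "Beispiel", "en": None},
    "photo": None,
    "tags": [{"label": "a", "note": None}, None],
}
remove_none_values(sample)
print(sample)
```

Only dictionary keys are deleted, so a `None` that sits directly inside a list (the last element of `tags` here) stays where it is.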

In [15]:
# create a new dictionary with only 'en' values from nested dictionaries
en_restaurant = {}

def get_en(restaurant):
    for key, value in restaurant.items():
        if isinstance(value, dict):
            if 'en' in value:
                en_restaurant[key] = {'en': value['en']}
            else:
                en_restaurant[key] = value 
        else:
            en_restaurant[key] = value
            
get_en(restaurant)

pprint(en_restaurant)


{'@context': 'https://schema.org/',
 '@type': 'LocalBusiness',
 'address': {'addressCountry': 'CH',
             'addressLocality': 'Zürich',
             'email': 'vegan@stiftung-enzian.ch',
             'postalCode': '8050',
             'streetAddress': 'Binzmühlestrasse 41',
             'telephone': '+41 43 333 55 45',
             'url': 'https://stiftung-enzian.ch/baeckerei/'},
 'category': {'Cakes': {'swissId': ''},
              'Coffee': {'swissId': ''},
              'Coffee Houses & Tea Rooms': {'swissId': ''},
              'Confectionery': {'swissId': ''},
              'Gastronomy': {'swissId': ''},
              'Restaurants': {'swissId': ''},
              'Tea': {'swissId': ''}},
 'copyrightHolder': {'en': 'Zurich Tourism www.zuerich.com'},
 'dateModified': '2023-10-23T09:04',
 'description': {'en': '<p>Vegan food enthusiasts will find the Enzian bakery '
                       'and caf&eacute; directly next to Oerlikon '
                       'Station.<p>Whether a c


In [16]:
# extract the values of a single restaurant, keeping only the English text
def extract_values(restaurant):
    values = {
        'name': restaurant['name']['en'],
        'disambiguatingDescription': restaurant['disambiguatingDescription']['en'],
        'description': restaurant['description']['en'],
        'titleTeaser': restaurant['titleTeaser']['en'],
        'textTeaser': restaurant['textTeaser']['en'],
        'detailedInformation': restaurant['detailedInformation']['en'],
        'dateModified': restaurant['dateModified'],
        'openingHours': restaurant['openingHours'],
        'address': restaurant['address'],
        'geoCoordinates': restaurant['geoCoordinates']       
    }
            
    return values
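
`extract_values` indexes every field directly, so a record that lacks a field or its `'en'` translation would raise a `KeyError`. A more tolerant variant might look like the sketch below. The names `en_or_none`, `extract_values_safe`, `TRANSLATED_FIELDS` and `PLAIN_FIELDS` are hypothetical, and the `partial` record is invented for illustration.

```python
# fields that hold one text per language, and fields that are copied as-is
TRANSLATED_FIELDS = ["name", "disambiguatingDescription", "description",
                     "titleTeaser", "textTeaser", "detailedInformation"]
PLAIN_FIELDS = ["dateModified", "openingHours", "address", "geoCoordinates"]


def en_or_none(doc, field):
    """Return doc[field]['en'], or None when the field or the translation is missing."""
    value = doc.get(field)
    return value.get("en") if isinstance(value, dict) else None


def extract_values_safe(restaurant):
    """Same output keys as extract_values, but missing data becomes None."""
    values = {field: en_or_none(restaurant, field) for field in TRANSLATED_FIELDS}
    values.update({field: restaurant.get(field) for field in PLAIN_FIELDS})
    return values


# hypothetical, incomplete record
partial = {"name": {"en": "Example Café", "de": "Beispiel Café"},
           "dateModified": "2023-10-23T09:04"}
result = extract_values_safe(partial)
print(result["name"], result["description"])
```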

In [17]:
# test extraction for one restaurant
doc = extract_values(restaurant)
pprint(doc)

{'address': {'addressCountry': 'CH',
             'addressLocality': 'Zürich',
             'email': 'vegan@stiftung-enzian.ch',
             'postalCode': '8050',
             'streetAddress': 'Binzmühlestrasse 41',
             'telephone': '+41 43 333 55 45',
             'url': 'https://stiftung-enzian.ch/baeckerei/'},
 'dateModified': '2023-10-23T09:04',
 'description': '<p>Vegan food enthusiasts will find the Enzian bakery and '
                'caf&eacute; directly next to Oerlikon Station.<p>Whether a '
                'coffee and croissant, a sandwich, a menu of the day, or a '
                'dessert: the bakery offers a huge selection of vegan treats. '
                'Customers can then decide for themselves if their purchases '
                'are to take out or for leisurely consumption in the adjoining '
                'caf&eacute;.</p><p>All products are made daily in the '
                'in-house bakery and confectioner&rsquo;s, and the lunch menus '
            


### Load

In [18]:
# insert the list of restaurants (= documents) into the MongoDB collection "restaurants"
restaurants.insert_many(data)

<pymongo.results.InsertManyResult at 0x2d7e35ef460>

In [19]:
# count number of documents inserted
restaurants.count_documents({})

515

In [21]:
# get 5 restaurants from MongoDB and display them as a DataFrame
r = restaurants.aggregate([
      {"$limit": 5},
])

pd.DataFrame(r)

Unnamed: 0,_id,@context,@type,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,...,opens,openingHours,specialOpeningHoursSpecification,address,geoCoordinates,place,@customType,tomasBookingId,zurichCardDescription,openingHoursSpecification
0,654314b656b2f9360c0b7949,https://schema.org/,LocalBusiness,1011403,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Frische, saisonal und vegane Gerichte ...",{'de': '<p>Direkt am Bahnhof Oerlikon gelegen ...,...,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...",{},"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}",[],,,,
1,654314b656b2f9360c0b794a,https://schema.org/,Restaurant,1011361,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Hier tauchen die Gäste in eine kulinar...,{'de': '<p>Nur 6 Minuten vom Hauptbahnhof entf...,...,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}",[],,,,
2,654314b656b2f9360c0b794b,https://schema.org/,Restaurant,1011353,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Bodega Española', 'en': 'Bodega Españo...",{'de': 'Kult in Zürich: Die älteste spanische ...,{'de': '<p>In der Bodega Espa&ntilde;ola erleb...,...,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...",{'de': '<p>Die Tapas-Bar ist von Montag bis So...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}",[],,,,
3,654314b656b2f9360c0b794c,https://schema.org/,Restaurant,1011351,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Stripped bedeutet; frei von jeglichen ...,"{'de': '<p>&nbsp; <p>Ein Ort, an dem sich die ...",...,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]",{'de': '<p>Stripped Pizza an der Seefeldstrass...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}",[Indoors],,,,
4,654314b656b2f9360c0b794d,https://schema.org/,BarOrPub,1011318,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Irische Gastfreundschaft, gemütliche A...",{'de': '<p>Mitten im Gesch&auml;ftsviertel von...,...,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}",[],,,,


### Transform

In [489]:
# check document structure
r = restaurants.aggregate([
    {"$project": {"_id": 0}},
    {"$limit": 5},
])
pd.DataFrame(r)

Unnamed: 0,@context,@type,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,titleTeaser,...,opens,openingHours,specialOpeningHoursSpecification,address,geoCoordinates,place,@customType,tomasBookingId,zurichCardDescription,openingHoursSpecification
0,https://schema.org/,LocalBusiness,1011403,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Frische, saisonal und vegane Gerichte ...",{'de': '<p>Direkt am Bahnhof Oerlikon gelegen ...,"{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...",...,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...",{},"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}",[],,,,
1,https://schema.org/,Restaurant,1011361,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Hier tauchen die Gäste in eine kulinar...,{'de': '<p>Nur 6 Minuten vom Hauptbahnhof entf...,"{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",...,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}",[],,,,
2,https://schema.org/,Restaurant,1011353,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Bodega Española', 'en': 'Bodega Españo...",{'de': 'Kult in Zürich: Die älteste spanische ...,{'de': '<p>In der Bodega Espa&ntilde;ola erleb...,"{'de': 'Bodega Española', 'en': 'Bodega Españo...",...,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...",{'de': '<p>Die Tapas-Bar ist von Montag bis So...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}",[],,,,
3,https://schema.org/,Restaurant,1011351,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Stripped bedeutet; frei von jeglichen ...,"{'de': '<p>&nbsp; <p>Ein Ort, an dem sich die ...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",...,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]",{'de': '<p>Stripped Pizza an der Seefeldstrass...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}",[Indoors],,,,
4,https://schema.org/,BarOrPub,1011318,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Irische Gastfreundschaft, gemütliche A...",{'de': '<p>Mitten im Gesch&auml;ftsviertel von...,"{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...",...,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}",[],,,,


In [490]:
# count the null values in each field
r = restaurants.aggregate([{"$project": { "_id": 0 }}])

df = pd.DataFrame(r)

null_counts = df.isna().sum()
print(null_counts)

@context                              0
@type                                 0
identifier                            0
copyrightHolder                       0
license                               0
category                              0
name                                  0
disambiguatingDescription             0
description                           0
titleTeaser                           0
textTeaser                            0
detailedInformation                   0
zurichCard                            0
osm_id                                0
image                                 0
price                                 0
photo                                97
dateModified                          0
opens                                 5
openingHours                         33
specialOpeningHoursSpecification      0
address                               0
geoCoordinates                        0
place                                 0
@customType                         515


In [491]:
# count the empty lists and empty dicts in each field
empty_array_counts = df.map(lambda x: isinstance(x, (list, dict)) and len(x) == 0).sum()
empty_array_counts

@context                              0
@type                                 0
identifier                            0
copyrightHolder                       0
license                               0
category                              0
name                                  0
disambiguatingDescription             0
description                           0
titleTeaser                           0
textTeaser                            0
detailedInformation                   0
zurichCard                            0
osm_id                                0
image                                 0
price                                 1
photo                                10
dateModified                          0
opens                                27
openingHours                          0
specialOpeningHoursSpecification      1
address                               0
geoCoordinates                        0
place                               444
@customType                           0


In [492]:
# Counting empty strings
empty_string_counts = (df == "").sum()
empty_string_counts

@context                             0
@type                                0
identifier                           0
copyrightHolder                      0
license                              0
category                             0
name                                 0
disambiguatingDescription            0
description                          0
titleTeaser                          0
textTeaser                           0
detailedInformation                  0
zurichCard                           0
osm_id                              83
image                                0
price                                0
photo                                0
dateModified                         0
opens                                0
openingHours                         0
specialOpeningHoursSpecification     0
address                              0
geoCoordinates                       0
place                                0
@customType                          0
tomasBookingId           
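
The three checks above (nulls, empty containers, empty strings) could also be combined into one pass over the raw documents without pandas. The sketch below is one possible way to do that. `emptiness_report` is a hypothetical helper, the sample documents are invented, and a field that is absent from a document is counted as null, which is an assumption meant to mirror how the DataFrame shows missing fields.

```python
from collections import Counter


def emptiness_report(docs):
    """Count null, empty-container and empty-string values per field."""
    docs = list(docs)
    # collect every field name that occurs in any document, keeping first-seen order
    fields = dict.fromkeys(key for doc in docs for key in doc)
    report = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            value = doc.get(field)  # a missing field is treated like None
            if value is None:
                report[field]["null"] += 1
            elif isinstance(value, (list, dict)) and len(value) == 0:
                report[field]["empty_container"] += 1
            elif value == "":
                report[field]["empty_string"] += 1
    return report


# hypothetical sample documents, not from the dataset
sample_docs = [
    {"osm_id": "", "place": [], "photo": None},
    {"osm_id": "123", "place": ["Indoors"], "photo": None, "price": {}},
]
for field, counts in emptiness_report(sample_docs).items():
    print(field, dict(counts))
```

The same function could be fed directly from a cursor, for example `emptiness_report(restaurants.find({}))`, assuming the collection fits in memory.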

In [493]:
# drop fields based on the null values and empty arrays
# (a single $unset stage accepts a list of field names)
pipeline = [
    {"$unset": [
        "@context",
        "@customType",
        "tomasBookingId",
        "zurichCardDescription",
        "openingHoursSpecification",
        "place",
        "osm_id",
        "photo",
        "image",
    ]},
    {"$out": "restaurants"},
]
r = restaurants.aggregate(pipeline)

In [494]:
# check document structure
r = restaurants.aggregate([
    {"$project": {"_id": 0}},
    {"$limit": 5},
])
pd.DataFrame(r)

Unnamed: 0,@type,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,titleTeaser,textTeaser,detailedInformation,zurichCard,price,dateModified,opens,openingHours,specialOpeningHoursSpecification,address,geoCoordinates
0,LocalBusiness,1011403,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Frische, saisonal und vegane Gerichte ...",{'de': '<p>Direkt am Bahnhof Oerlikon gelegen ...,"{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Vegane Sandwiches, Torten und wechseln...","{'de': ['Direkt beim Bahnhof Oerlikon', 'Vegan...",False,{},2023-10-23T09:04,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...",{},"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}"
1,Restaurant,1011361,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Hier tauchen die Gäste in eine kulinar...,{'de': '<p>Nur 6 Minuten vom Hauptbahnhof entf...,"{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Mediterrane Speisen in einem elegante ...,"{'de': ['Mediterrane Speisen, Fokus Griechenla...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-09-28T10:52,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}"
2,Restaurant,1011353,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Bodega Española', 'en': 'Bodega Españo...",{'de': 'Kult in Zürich: Die älteste spanische ...,{'de': '<p>In der Bodega Espa&ntilde;ola erleb...,"{'de': 'Bodega Española', 'en': 'Bodega Españo...","{'de': 'Spanische Tapas, Weine und traditionel...","{'de': ['Traditionslokal ', 'Tapas', 'Sherry- ...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-08-25T08:52,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...",{'de': '<p>Die Tapas-Bar ist von Montag bis So...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}"
3,Restaurant,1011351,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Stripped bedeutet; frei von jeglichen ...,"{'de': '<p>&nbsp; <p>Ein Ort, an dem sich die ...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Pizzen ohne Zusatzstoffe und mit einer...,"{'de': ['Vier Sorten Pizzateig', 'Take Away mö...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-08-28T12:03,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]",{'de': '<p>Stripped Pizza an der Seefeldstrass...,"{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}"
4,BarOrPub,1011318,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Irische Gastfreundschaft, gemütliche A...",{'de': '<p>Mitten im Gesch&auml;ftsviertel von...,"{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Fish’n’Chips, Craft Beer und Sportüber...","{'de': ['Irisches Pub', 'Traditionelle Einrich...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-09-05T10:47,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'de': None, 'en': None, 'fr': None, 'it': None}","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}"


In [495]:
# Create an aggregation pipeline that projects the fields under their new names
pipeline = [
    {
        "$project": {
            "specialOpeningHours": "$specialOpeningHoursSpecification",
            "type": "$@type"
        }
    }
]

r = restaurants.aggregate(pipeline)

# update the collection with the renamed fields
for doc in r:
    restaurants.update_one({"_id": doc["_id"]}, {"$set": doc})

# drop the old fields
pipeline2 = [
    {"$unset": "@type"},
    {"$unset": "specialOpeningHoursSpecification"},
    {"$out": "restaurants"},
]
r = restaurants.aggregate(pipeline2)
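
The rename above takes three steps (project, update each document, unset). MongoDB also provides a `$rename` update operator, which might achieve the same result in a single `update_many` call. The sketch below is an alternative under that assumption, not the method used in this notebook. `rename_fields` and `RENAMES` are hypothetical names, and the call has not been run against this collection.

```python
def rename_fields(collection, mapping):
    """Rename top-level fields in every document with MongoDB's $rename operator.

    `collection` is expected to be a pymongo collection;
    `mapping` maps each old field name to its new name.
    """
    return collection.update_many({}, {"$rename": mapping})


# old name -> new name, matching the two renames performed in the cell above
RENAMES = {
    "specialOpeningHoursSpecification": "specialOpeningHours",
    "@type": "type",
}

# usage against the notebook's collection (requires a running MongoDB):
# result = rename_fields(restaurants, RENAMES)
# print(result.modified_count)
```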


In [496]:
# strip HTML tags (e.g. <p>) from every language version of the description field
for doc in restaurants.find({}):
    if "description" in doc:
        for key, value in doc["description"].items():
            soup = BeautifulSoup(value, "html.parser")
            doc["description"][key] = re.sub(r"<.*?>", "", soup.get_text())
        # write the document back once, after all languages have been cleaned
        restaurants.replace_one({"_id": doc["_id"]}, doc)

In [497]:
# strip HTML tags from the nested description and specialOpeningHours fields
# define fields to check
fields_to_clean = ["description", "specialOpeningHours"]

for doc in restaurants.find({}):
    for field in fields_to_clean:
        if field in doc and doc[field] is not None:
            for key, value in doc[field].items():
                if value is not None:
                    soup = BeautifulSoup(value, "html.parser")
                    doc[field][key] = re.sub(r"<.*?>", "", soup.get_text())
    # update the document in the database
    restaurants.replace_one({"_id": doc["_id"]}, doc)
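
To see what the cleaning step does to a single string without a database, the sketch below strips tags with the standard library's `html.parser` instead of BeautifulSoup. This is a swapped-in technique chosen so that the example has no third-party dependency. `strip_html` is a hypothetical helper, and the sample string is a shortened, made-up sentence modelled on the descriptions in the dataset.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &eacute; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text with tags removed and entities decoded; None stays None."""
    if text is None:
        return None
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


sample = "<p>Vegan food at the caf&eacute; next to Oerlikon Station.</p>"
print(strip_html(sample))
```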

In [498]:
# check document structure
r = restaurants.aggregate([
    {"$project": {"_id": 0}},
    {"$limit": 5},
])
pd.DataFrame(r)

Unnamed: 0,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,titleTeaser,textTeaser,detailedInformation,zurichCard,price,dateModified,opens,openingHours,address,geoCoordinates,specialOpeningHours,type
0,1011403,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Frische, saisonal und vegane Gerichte ...",{'de': 'Direkt am Bahnhof Oerlikon gelegen fin...,"{'de': 'Enzian – Vegane Bäckerei', 'en': 'Enzi...","{'de': 'Vegane Sandwiches, Torten und wechseln...","{'de': ['Direkt beim Bahnhof Oerlikon', 'Vegan...",False,{},2023-10-23T09:04,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}",{},LocalBusiness
1,1011361,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Hier tauchen die Gäste in eine kulinar...,{'de': 'Nur 6 Minuten vom Hauptbahnhof entfern...,"{'de': 'ASTRA Kitchen & Bar', 'en': 'ASTRA Kit...",{'de': 'Mediterrane Speisen in einem elegante ...,"{'de': ['Mediterrane Speisen, Fokus Griechenla...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-09-28T10:52,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}","{'de': None, 'en': None, 'fr': None, 'it': None}",Restaurant
2,1011353,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Bodega Española', 'en': 'Bodega Españo...",{'de': 'Kult in Zürich: Die älteste spanische ...,{'de': 'In der Bodega Española erleben Besuche...,"{'de': 'Bodega Española', 'en': 'Bodega Españo...","{'de': 'Spanische Tapas, Weine und traditionel...","{'de': ['Traditionslokal ', 'Tapas', 'Sherry- ...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-08-25T08:52,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}",{'de': 'Die Tapas-Bar ist von Montag bis Sonnt...,Restaurant
3,1011351,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Stripped bedeutet; frei von jeglichen ...,"{'de': ' Ein Ort, an dem sich die Mission der...","{'de': 'Stripped Pizza', 'en': 'Stripped Pizza...",{'de': 'Pizzen ohne Zusatzstoffe und mit einer...,"{'de': ['Vier Sorten Pizzateig', 'Take Away mö...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-08-28T12:03,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}",{'de': 'Stripped Pizza an der Seefeldstrasse 8...,Restaurant
4,1011318,"{'de': 'Zürich Tourismus www.zuerich.com', 'en...",BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...","{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Irische Gastfreundschaft, gemütliche A...",{'de': 'Mitten im Geschäftsviertel von Zürich ...,"{'de': 'Paddy Reilly's Pub', 'en': 'Paddy Reil...","{'de': 'Fish’n’Chips, Craft Beer und Sportüber...","{'de': ['Irisches Pub', 'Traditionelle Einrich...",False,"{'de': None, 'en': None, 'fr': None, 'it': None}",2023-09-05T10:47,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}","{'de': None, 'en': None, 'fr': None, 'it': None}",BarOrPub


In [499]:
# filter and keep only the "en" values
pipeline = [
    {
        "$project": {
            "copyrightHolder": "$copyrightHolder.en",
            "name": "$name.en",
            "disambiguatingDescription": "$disambiguatingDescription.en",
            "description": "$description.en",
            "titleTeaser": "$titleTeaser.en",
            "textTeaser": "$textTeaser.en",
            "detailedInformation": "$detailedInformation.en",
            "price": "$price.en",
            "specialOpeningHours": "$specialOpeningHours.en",
        }
    }
]

r = restaurants.aggregate(pipeline)

# update the collection with the new fields
for doc in r:
    restaurants.update_one({"_id": doc["_id"]}, {"$set": doc})
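
The effect of the projection above can be sketched on a plain dictionary. In this sketch a field is only replaced when the language key is actually present, which appears to match the behaviour visible in the output that follows: an empty `{}` (such as `price` in the first row) stays unchanged, while `{'en': None}` becomes `None`. `keep_language` is a hypothetical helper and the sample document is invented.

```python
def keep_language(doc, fields, lang="en"):
    """Replace each multilingual field by its value for one language."""
    result = dict(doc)  # shallow copy, the input document is not modified
    for field in fields:
        value = result.get(field)
        # leave the field alone when it is not a dict or has no entry for `lang`
        if isinstance(value, dict) and lang in value:
            result[field] = value[lang]
    return result


# hypothetical sample document, not from the dataset
sample = {
    "name": {"de": "Beispiel", "en": "Example"},
    "price": {},
    "specialOpeningHours": {"de": None, "en": None},
    "license": "BY-SA",
}
flat = keep_language(sample, ["name", "price", "specialOpeningHours"])
print(flat)
```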



In [500]:
# check document structure
r = restaurants.aggregate([
    {"$project": {"_id": 0}},
    {"$limit": 5},
])
pd.DataFrame(r)

Unnamed: 0,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,titleTeaser,textTeaser,detailedInformation,zurichCard,price,dateModified,opens,openingHours,address,geoCoordinates,specialOpeningHours,type
0,1011403,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Enzian – Vegan Bakery,"Fresh, seasonal, and vegan dishes delight visi...",Vegan food enthusiasts will find the Enzian ba...,Enzian – Vegan Bakery,"Vegan sandwiches, cakes, and lunch menus are a...","[Directly next to Oerlikon Station, Vegan, reg...",False,{},2023-10-23T09:04,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}",{},LocalBusiness
1,1011361,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",ASTRA Kitchen & Bar,Here guests can immerse themselves in culinary...,Located just 6 minutes from Zurich Main Statio...,ASTRA Kitchen & Bar,Mediterranean food amidst an elegant interior ...,"[Mediterranean dishes, focus on Greece, Lunch ...",False,,2023-09-28T10:52,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}",,Restaurant
2,1011353,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Bodega Española,Cult in Zurich: the oldest Spanish wine store ...,"At the Bodega Española, visitors can experienc...",Bodega Española,"Spanish tapas, wines, and traditional dishes a...","[Traditional restaurant , Tapas, Rare sherries...",False,,2023-08-25T08:52,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}",The tapas bar is open from Monday to Sunday co...,Restaurant
3,1011351,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Stripped Pizza,Stripped means free of any additives – so thes...,A place where the mission of offering healthy ...,Stripped Pizza,Assemble your own pizza without additives and ...,"[Four types of pizza dough, Take-out also poss...",False,,2023-08-28T12:03,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}",Stripped Pizza at Seefeldstrasse 88 is also op...,Restaurant
4,1011318,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Paddy Reilly's Pub,"Irish hospitality, a cozy atmosphere, and excl...",Located in the heart of Zurich’s business dist...,Paddy Reilly's Pub,"Fish & chips, craft beer, and sports broadcast...","[Irish pub, Traditional interior, Shows live s...",False,,2023-09-05T10:47,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}",,BarOrPub


In [501]:
# Data quality checks

#check for missing values in each field
r = restaurants.aggregate([{"$project": { "_id": 0 }}])

df = pd.DataFrame(r)

null_counts = df.isna().sum()
print(null_counts)


identifier                     0
copyrightHolder                0
license                        0
category                       0
name                           0
disambiguatingDescription      0
description                    0
titleTeaser                    0
textTeaser                     1
detailedInformation            0
zurichCard                     0
price                        514
dateModified                   0
opens                          5
openingHours                  33
address                        0
geoCoordinates                 0
specialOpeningHours          422
type                           0
dtype: int64
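
The counts above come from `DataFrame.isna()`, which only flags `None`/`NaN`. Empty containers such as `{}` or `[]` (both visible in the `price` and `opens` columns of the preview) are not counted as missing. The following is a minimal sketch of a stricter check, not part of the original notebook; the helper names `is_missing` and `count_missing` and the sample records are hypothetical.

```python
def is_missing(value):
    """Treat None, NaN and empty strings/lists/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    if isinstance(value, (str, list, dict, tuple, set)) and len(value) == 0:
        return True
    return False


def count_missing(records):
    """Count missing values per field across a list of documents (dicts)."""
    fields = set()
    for record in records:
        fields.update(record)
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            # a field that is absent from a document also counts as missing
            if is_missing(record.get(field)):
                counts[field] += 1
    return counts


# hypothetical sample documents, for illustration only
sample = [
    {"name": "A", "price": {}, "opens": []},
    {"name": "B", "price": None, "opens": ["Monday"]},
    {"name": "C", "opens": ["Friday"]},
]
print(count_missing(sample))
```

Applied to the real collection, such a check could be fed with `list(restaurants.find({}, {"_id": 0}))`, assuming the `restaurants` collection object used in the notebook.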


In [502]:
# 514 of the 515 price values are missing. This was not visible in the first check at the
# beginning of the transformation, because at that time the field held nested values that
# were empty and the check did not look inside nested values.
# specialOpeningHours is missing in 422 documents.

# remove both fields
pipeline = [
    {"$unset": "price"},
    {"$unset": "specialOpeningHours"},
    {"$out": "restaurants"},
]
r = restaurants.aggregate(pipeline)

In [503]:
# check whether any of the following fields hold a string instead of the expected embedded document
fields_to_check = ["address", "geoCoordinates"]

for field in fields_to_check:
    # count the documents in which the field is stored as a string
    count = restaurants.count_documents({field: {"$type": "string"}})
    print(field, count)
        


address 0
geoCoordinates 0


In [504]:
# verify if there are any duplicates - the field identifier should be unique
pipeline = [
    {
        "$group": {
            "_id": f"${unique_field}", "count": {"$sum": 1}
        }
    },
    {
        "$match": {
            "count": {"$gt": 1}
        }
    }
]
duplicate = restaurants.aggregate(pipeline)

count = 0

for doc in duplicate:
    count += 1

print(count)

0


In [505]:
# check whether zurichCard holds any value other than False.
# If there are none, the field can be removed as well.
zurichCard_true = restaurants.count_documents({"zurichCard": {"$ne": False}})

zurichCard_true

14

In [506]:
# check document structure
r = restaurants.aggregate([
    {"$project": {"_id": 0}},
    {"$limit": 5},
])
pd.DataFrame(r)

Unnamed: 0,identifier,copyrightHolder,license,category,name,disambiguatingDescription,description,titleTeaser,textTeaser,detailedInformation,zurichCard,dateModified,opens,openingHours,address,geoCoordinates,type
0,1011403,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Enzian – Vegan Bakery,"Fresh, seasonal, and vegan dishes delight visi...",Vegan food enthusiasts will find the Enzian ba...,Enzian – Vegan Bakery,"Vegan sandwiches, cakes, and lunch menus are a...","[Directly next to Oerlikon Station, Vegan, reg...",False,2023-10-23T09:04,[],"[Mo,Tu,We,Th,Fr 07:00:00-18:00:00, Sa 07:00:00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.413735, 'longitude': 8.545272}",LocalBusiness
1,1011361,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",ASTRA Kitchen & Bar,Here guests can immerse themselves in culinary...,Located just 6 minutes from Zurich Main Statio...,ASTRA Kitchen & Bar,Mediterranean food amidst an elegant interior ...,"[Mediterranean dishes, focus on Greece, Lunch ...",False,2023-09-28T10:52,[],"[Tu,We,Th 11:30:00-00:00:00, Fr 11:30:00-01:45...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.374393, 'longitude': 8.535549}",Restaurant
2,1011353,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Bodega Española,Cult in Zurich: the oldest Spanish wine store ...,"At the Bodega Española, visitors can experienc...",Bodega Española,"Spanish tapas, wines, and traditional dishes a...","[Traditional restaurant , Tapas, Rare sherries...",False,2023-08-25T08:52,[],"[Su,Tu,We,Th,Fr,Sa 12:00:00-14:00:00, Su,Tu,We...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371221, 'longitude': 8.544161}",Restaurant
3,1011351,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Stripped Pizza,Stripped means free of any additives – so thes...,A place where the mission of offering healthy ...,Stripped Pizza,Assemble your own pizza without additives and ...,"[Four types of pizza dough, Take-out also poss...",False,2023-08-28T12:03,"[Monday, Tuesday, Wednesday, Thursday, Friday,...","[Su,Mo,Tu,We,Th,Fr,Sa 11:30:00-22:00:00]","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.371762, 'longitude': 8.534769}",Restaurant
4,1011318,Zurich Tourism www.zuerich.com,BY-SA,"{'Gastronomy': {'swissId': ''}, 'Restaurants':...",Paddy Reilly's Pub,"Irish hospitality, a cozy atmosphere, and excl...",Located in the heart of Zurich’s business dist...,Paddy Reilly's Pub,"Fish & chips, craft beer, and sports broadcast...","[Irish pub, Traditional interior, Shows live s...",False,2023-09-05T10:47,[],"[Su 13:00:00-12:00:00, Mo,Tu,We,Th 11:30:00-00...","{'addressCountry': 'CH', 'addressLocality': 'Z...","{'latitude': 47.372152, 'longitude': 8.534629}",BarOrPub


<div style="height:450px;display:inline-block;width:100%;background:#fafafa;"></div>

### Datastructure

<img src="dataStructure_OV.png" style="height:400px;">

The UML class notation presented here outlines the data structure after the transformation. This structure is designed to capture comprehensive information about restaurants in Zurich, including their attributes, relationships, and key data points.

**Restaurant Class** represents an individual restaurant. It includes attributes such as the unique identifier, licensing details, and name, along with several arrays. It has composition relationships with the "Address" and "Coordinates" classes.

**Address Class** contains address details, such as country, locality, postal code, street address, contact information (telephone, email), and website URL.

**Coordinates Class** captures the geographical coordinates (latitude and longitude) of a restaurant's location. It is linked to the "Restaurant" class to provide geospatial information.
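
As an illustration of the structure described above, the three classes could be sketched as Python dataclasses. This is only a sketch under assumptions: the field names `addressCountry`, `addressLocality`, `postalCode`, `streetAddress`, `latitude` and `longitude` appear in the notebook output, while the key names `telephone`, `email` and `url`, the helper `restaurant_from_doc` and the sample document are hypothetical. Only a subset of the restaurant attributes is modeled.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Coordinates:
    latitude: float
    longitude: float


@dataclass
class Address:
    addressCountry: Optional[str] = None
    addressLocality: Optional[str] = None
    postalCode: Optional[str] = None
    streetAddress: Optional[str] = None
    telephone: Optional[str] = None  # key name assumed
    email: Optional[str] = None      # key name assumed
    url: Optional[str] = None        # key name assumed


@dataclass
class Restaurant:
    identifier: str
    name: str
    type: str
    address: Address
    geoCoordinates: Coordinates
    opens: list = field(default_factory=list)
    openingHours: list = field(default_factory=list)


def _known(cls, data):
    """Keep only the keys that the dataclass actually declares."""
    names = {f.name for f in fields(cls)}
    return {key: value for key, value in data.items() if key in names}


def restaurant_from_doc(doc):
    """Build a Restaurant from a MongoDB-style document (a plain dict)."""
    address = _known(Address, doc["address"])
    if address.get("postalCode") is not None:
        address["postalCode"] = str(address["postalCode"])
    return Restaurant(
        identifier=str(doc["identifier"]),
        name=doc["name"],
        type=doc["type"],
        address=Address(**address),
        geoCoordinates=Coordinates(**_known(Coordinates, doc["geoCoordinates"])),
        opens=list(doc.get("opens") or []),
        openingHours=list(doc.get("openingHours") or []),
    )


# hypothetical document, for illustration only
sample_doc = {
    "identifier": "0000000",
    "name": "Example Bistro",
    "type": "Restaurant",
    "address": {"addressCountry": "CH", "postalCode": 8001, "streetAddress": "Beispielstrasse 1"},
    "geoCoordinates": {"latitude": 47.37, "longitude": 8.54},
    "opens": ["Monday", "Tuesday"],
}
restaurant = restaurant_from_doc(sample_doc)
print(restaurant.name, restaurant.address.postalCode)
```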

<div style="height:150px;display:inline-block;width:100%;background:#fafafa;"></div>

## Data analysis

### Type of restaurants

In the dataset, there are a total of nine distinct restaurant types, each assigned to different establishments. It's valuable to gain insights into the distribution of these restaurant types and understand how many restaurants belong to each category.

In [507]:
# count how many restaurants there are of each type
r = restaurants.aggregate([
    {"$project": {"type": 1}},
    {"$group": {"_id": "$type", "count": {"$sum": 1}}},
    {"$project": {"_id": 0, "name": "$_id", "count":1}},
    {"$sort": {"count": -1}},
 ])

pd.DataFrame(r)

| | count | name |
|---|---|---|
| 0 | 440 | Restaurant |
| 1 | 41 | CafeOrCoffeeShop |
| 2 | 18 | BarOrPub |
| 3 | 11 | Winery |
| 4 | 1 | MusicVenue |
| 5 | 1 | LocalBusiness |
| 6 | 1 | NightClub |
| 7 | 1 | Store |
| 8 | 1 | FastFoodRestaurant |
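
To put these counts into proportion, the share of each type can be derived directly from the numbers above. The following is a small sketch that hard-codes the counts from the output; the variable names are illustrative and not part of the original notebook.

```python
# counts copied from the aggregation result above
type_counts = {
    "Restaurant": 440,
    "CafeOrCoffeeShop": 41,
    "BarOrPub": 18,
    "Winery": 11,
    "MusicVenue": 1,
    "LocalBusiness": 1,
    "NightClub": 1,
    "Store": 1,
    "FastFoodRestaurant": 1,
}

total = sum(type_counts.values())

# percentage share of each type, rounded to one decimal place
shares = {name: round(100 * count / total, 1) for name, count in type_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The type `Restaurant` alone accounts for roughly 85% of the 515 documents, so the remaining eight types together make up only about 15%.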


<div style="height:250px;display:inline-block;width:100%;background:#fafafa;"></div>

### Geospatial Analysis

The aim of this geospatial analysis was to identify the restaurants located within 1 km of Zürich HB. This is particularly useful for anyone looking for dining options in the middle of the city. In addition, the 10 restaurants closest to Zürich HB were identified based on their geographical coordinates.

In [508]:
# create a new field called location that stores longitude and latitude as a flat
# [longitude, latitude] array for the geospatial analysis
pipeline = [
    {
        "$project": {
            "location": ["$geoCoordinates.longitude","$geoCoordinates.latitude"],
        }
    }
]

r = restaurants.aggregate(pipeline)

# update the collection with the new field
for doc in r:
    restaurants.update_one({"_id": doc["_id"]}, {"$set": doc})

In [509]:
# geospatial analysis
restaurants.create_index([("location", "2dsphere")])

# look for restaurants within 1000 meters (1 km) of Zurich HB
r = restaurants.find(
    {
        "location":
        {
            "$near":
            {
                # GeoJSON point in [longitude, latitude] order
                "$geometry": {"type": "Point", "coordinates": [ 8.540294, 47.378309 ] },
                "$minDistance": 0, 
                "$maxDistance": 1000, #1000 meter
            }
        }
    }
)
count = 0
restaurant_type_counts = {}

# count the restaurants matching the filter above and how many there are of each type
for doc in r:
    count += 1
    restaurant_type = doc.get("type")  
    if restaurant_type:
        restaurant_type_counts[restaurant_type] = restaurant_type_counts.get(restaurant_type, 0) + 1
print("There are around", count, "restaurants around Zurich HB.")
print("\n")

# print the count of each restaurant type
for restaurant_type, count in restaurant_type_counts.items():
    print(f"{restaurant_type}: {count}")


There are around 164 restaurants around Zurich HB.


CafeOrCoffeeShop: 13
Restaurant: 140
BarOrPub: 10
NightClub: 1


In [510]:
# calculate the distance between Zurich HB and the 10 nearest restaurants
pipeline = ( [
    {
        "$geoNear": {
            "near": {
                # GeoJSON point in [longitude, latitude] order
                "type": "Point",
                "coordinates": [ 8.540294, 47.378309 ]
            },
            "spherical": True,
            "distanceField": "calcDistance"
        }
    },
    {"$sort": {"calcDistance": 1}},
    {"$limit": 10},
])
r = restaurants.aggregate(pipeline)

# loop through the cursor to access the results

count = 1

for doc in r:
    print("-------------------------------")
    print(count)
    print("-------------------------------")
    print("Restaurant Name:", doc["name"])
    # with a GeoJSON point, $geoNear reports the distance in meters
    print("Distance: " + str(doc["calcDistance"]) + " m")
    print("Address: " + doc["address"]["streetAddress"] + ", " + str(doc["address"]["postalCode"]))
    count += 1


-------------------------------
1
-------------------------------
Restaurant Name: Bakery Bakery
Distance: 28.743295017047018 m
Address: Museumstrasse 1, 8001
-------------------------------
2
-------------------------------
Restaurant Name: Brasserie Federal
Distance: 44.34378526590442 m
Address: Bahnhofplatz 15, 8001
-------------------------------
3
-------------------------------
Restaurant Name: Restaurant Spitz
Distance: 95.94802547697877 m
Address: Museumsstrasse 2, 8001
-------------------------------
4
-------------------------------
Restaurant Name: Rice Up!
Distance: 101.77203761025024 m
Address: Halle Löwenstrasse, 8001
-------------------------------
5
-------------------------------
Restaurant Name: Café Gourmet – Hotel Schweizerhof
Distance: 164.40889058956975 m
Address: Bahnhofplatz 7, 8001
-------------------------------
6
-------------------------------
Restaurant Name: La Soupière – Hotel Schweizerhof
Distance: 164.40889058956975 m
Address: Bahnhofplatz 7, 8001
-----
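
When the query point is given as GeoJSON, `$geoNear` reports distances in meters and already returns the documents ordered from nearest to farthest. If kilometers are preferred, the stage's `distanceMultiplier` option can scale the values. The following sketch only builds the pipeline as plain Python data; the helper name `build_nearest_pipeline` and its parameters are hypothetical, and `calcDistance` is the field name used in the notebook.

```python
def build_nearest_pipeline(lon, lat, limit=10, in_km=False):
    """Build a $geoNear pipeline for the `limit` documents nearest to (lon, lat)."""
    geo_near = {
        # GeoJSON point: coordinates are given as [longitude, latitude]
        "near": {"type": "Point", "coordinates": [lon, lat]},
        "distanceField": "calcDistance",
        "spherical": True,
    }
    if in_km:
        # meters -> kilometers
        geo_near["distanceMultiplier"] = 0.001
    # $geoNear output is already sorted by distance, so only a $limit is needed
    return [{"$geoNear": geo_near}, {"$limit": limit}]


pipeline_km = build_nearest_pipeline(8.540294, 47.378309, limit=10, in_km=True)
print(pipeline_km)
```

The resulting list could then be passed to `restaurants.aggregate(...)`, assuming the `2dsphere` index on `location` created earlier is the only geospatial index on the collection.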

<div style="height:350px;display:inline-block;width:100%;background:#fafafa;"></div>

### Top 10 postal codes

After focusing on restaurants around Zürich HB, it is interesting to look at other areas. The goal of this part was to pinpoint the postal codes with the highest concentration of restaurants, offering insight into where dining options are most abundant in the city.

In [511]:
# find the top 10 postal codes with the most restaurants
r = restaurants.aggregate([
    {"$project": {"address.postalCode": 1}},
    {"$group": {"_id": "$address.postalCode", "count": {"$sum": 1}}},
    {"$project": {"_id": 0, "name": "$_id", "count":1}},
    {"$sort": {"count": -1}},
    {"$limit": 10},
 ])

pd.DataFrame(r)

| | count | name |
|---|---|---|
| 0 | 160 | 8001 |
| 1 | 46 | 8005 |
| 2 | 39 | 8004 |
| 3 | 29 | 8008 |
| 4 | 20 | 8003 |
| 5 | 17 | 8002 |
| 6 | 16 | 8640 |
| 7 | 11 | 8050 |
| 8 | 11 | 8006 |
| 9 | 10 | 8032 |


<div style="height:250px;display:inline-block;width:100%;background:#fafafa;"></div>

### Vegan restaurants

Veganism is a widespread topic and lifestyle nowadays. Therefore, it would be interesting to determine how many restaurants contain the word "vegan" in their description, indicating that they may offer vegan options or cater to a vegan audience.

In [512]:
# find out how many restaurants have the word vegan in the description field
pipeline = [
    {
        "$match": {
            "description": {
                "$regex": "vegan",
                "$options": "i"  #case-insensitive search
            }
        }
    },
    {
        "$group": {
            "_id": None,
            "count": {"$sum": 1}
        }
    },
    {
        "$project": {
            "_id": 0,
            "count": 1
        }
    }
]

result = restaurants.aggregate(pipeline)

for doc in result:
    print("No. of restaurants mentioning 'vegan' in the description field:", doc["count"])


No. of restaurants mentioning 'vegan' in the description field: 30
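
The query above searches only the `description` field, so a restaurant that mentions vegan options elsewhere (for example in `textTeaser` or the `detailedInformation` list) is not counted. The following is a sketch of how the same case-insensitive substring match could be widened to several fields in plain Python; the helper `mentions_vegan` and the sample documents are hypothetical and not taken from the dataset.

```python
import re

# same case-insensitive substring match as the "$regex" / "$options": "i" query
VEGAN_PATTERN = re.compile("vegan", re.IGNORECASE)


def mentions_vegan(doc, fields=("description",)):
    """Return True if any of the given fields contains the word 'vegan'."""
    for field in fields:
        value = doc.get(field)
        if isinstance(value, list):
            # list fields such as detailedInformation are joined into one string
            value = " ".join(str(item) for item in value)
        if isinstance(value, str) and VEGAN_PATTERN.search(value):
            return True
    return False


# hypothetical documents, for illustration only
docs = [
    {"name": "A", "description": "Vegan bowls and smoothies.", "detailedInformation": ["Vegan"]},
    {"name": "B", "description": "Classic Swiss dishes.", "detailedInformation": ["Fondue", "vegan options"]},
    {"name": "C", "description": "Steakhouse.", "detailedInformation": ["Grill"]},
]

narrow = sum(mentions_vegan(doc) for doc in docs)
wide = sum(mentions_vegan(doc, ("description", "textTeaser", "detailedInformation")) for doc in docs)
print("description only:", narrow)
print("description, textTeaser, detailedInformation:", wide)
```

In this sample the wider search finds one additional restaurant, which illustrates why a count based on `description` alone is best read as a lower bound.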


In [513]:
# show 10 of those restaurants that have the word vegan in their description

pipeline = [
    {
        "$match": {
            "description": {
                "$regex": "vegan",
                "$options": "i"  # Case-insensitive search
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "description": 1,
            "address": 1
        }
    },
    {
        "$limit": 10
    }
]

result = restaurants.aggregate(pipeline)
count = 1

for doc in result:
    print(count)
    print("---------------------------------")
    print("Restaurant Name:", doc["name"])
    print("Address:", doc["address"]["streetAddress"] + ", " + str(doc["address"]["postalCode"]))
    count += 1

1
---------------------------------
Restaurant Name: Enzian – Vegan Bakery
Address: Binzmühlestrasse 41, 8050
2
---------------------------------
Restaurant Name: John Baker Bahnhofstrasse
Address: Bahnhofstrasse 9, 8001
3
---------------------------------
Restaurant Name: Bill’s Burger
Address: Hardturmstrasse 161, 8005
4
---------------------------------
Restaurant Name: Elmira Restaurant
Address: Limmatstrasse 254, 8005
5
---------------------------------
Restaurant Name: Veganitas
Address: Brauerstrasse 30, 8004
6
---------------------------------
Restaurant Name: LOI Bistro
Address: Limmatstrasse 268-270, 8005
7
---------------------------------
Restaurant Name: Now Restaurant
Address: Rolandstrasse 9, 8004
8
---------------------------------
Restaurant Name: Restaurant Chimy’s
Address: Neugasse 76, 8005
9
---------------------------------
Restaurant Name: Bakery Bakery
Address: Museumstrasse 1, 8001
10
---------------------------------
Restaurant Name: Fischerstube Zürihorn
Addre

<div style="height:600px;display:inline-block;width:100%;background:#fafafa;"></div>

### Open days

It's also intriguing to explore which days of the week most restaurants are open and, conversely, on which days individuals might encounter several closed establishments. This insight can be quite valuable for planning dining experiences accordingly.

In [516]:
# filter how many restaurants are open each day.
pipeline = [
    {
        "$match": {
            "opens": {"$exists": True, "$ne": []}
        }
    },
    {
        "$unwind": "$opens"
    },
    {
        "$group": {
            "_id": "$opens",
            "restaurantCount": {"$sum": 1}
        }
    },
    {
        "$sort": {"restaurantCount": -1}
    }
]

r = restaurants.aggregate(pipeline)
for doc in r:
    print(f"{doc['_id']}: {doc['restaurantCount']} restaurants")


Friday: 479 restaurants
Thursday: 475 restaurants
Wednesday: 470 restaurants
Saturday: 457 restaurants
Tuesday: 443 restaurants
Monday: 375 restaurants
Sunday: 319 restaurants
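
The differences between the days follow directly from these counts. The short sketch below hard-codes the numbers from the output above and computes each day's gap to the busiest day; the variable names are illustrative.

```python
# counts copied from the aggregation result above
open_counts = {
    "Friday": 479,
    "Thursday": 475,
    "Wednesday": 470,
    "Saturday": 457,
    "Tuesday": 443,
    "Monday": 375,
    "Sunday": 319,
}

# day on which the most restaurants are open
busiest = max(open_counts, key=open_counts.get)

# how many fewer restaurants are open on each day compared with the busiest day
gaps = {day: open_counts[busiest] - count for day, count in open_counts.items()}

for day, gap in gaps.items():
    print(f"{day}: {gap} fewer than {busiest}")
```

Note that the counts only cover documents with a non-empty `opens` array, since the pipeline filters out the rest.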


<div style="height:200px;display:inline-block;width:100%;background:#fafafa;"></div>

## Conclusions

With 515 restaurants listed across the city of Zürich, finding a place to dine in or grab a takeaway is hardly ever a concern. Most of these restaurants are clustered in the city's central area, particularly within postal code 8001, which could be attributed to Zurich's status as a frequently visited and bustling hub. Pricing would be an intriguing indicator for further analysis in this context, but the price field in this dataset was almost entirely empty and had to be removed. While 8001 leads with 160 restaurants, neighboring 8005 holds second place with only 46. The stark difference between these areas raises questions about the completeness of the dataset. The same applies to postal code 8050 (Oerlikon), where only 11 restaurants appear in this dataset, which leads to further questions about its accuracy and coverage.

For vegans seeking dining options, the dataset suggests a limited choice: only 30 restaurants mention "vegan" in their description. However, restaurants that do not mention veganism in their descriptions may still offer vegan items on their menus, a nuance the current dataset cannot capture without additional information.

From Tuesday to Saturday, locating an open restaurant is usually hassle-free, as the majority of them maintain regular business hours. Sundays see the fewest open establishments (319), followed by Mondays (375). The gap between Fridays (479) and Sundays amounts to 160 restaurants, and the gap between Fridays and Mondays to around 100. Regardless of the day, Zurich still offers an abundance of dining options.

In conclusion, Zurich's diverse restaurant landscape caters to a range of tastes and preferences, offering both locals and visitors a rich culinary experience within this vibrant city.

<div style="height:400px;display:inline-block;width:100%;background:#fafafa;"></div>

## Learnings

I thoroughly enjoyed the experience of working on this project. While I had previously encountered MongoDB during my bachelor's thesis, I hadn't had the opportunity to delve into it further. One of the most time-consuming and frustrating aspects of this module was the installation process. However, once everything was successfully installed, it functioned seamlessly without any complications.

In the initial stages of the project, I selected a different dataset from https://www.bfs.admin.ch/asset/de/je-d-19.03.01.02.03.02.02a. This dataset presented a unique challenge because it was not typical JSON but JSON-stat, which proved to be more complex to process and work with. Although it initially appeared to be a nested dataset, I soon realized it was not. After almost two weeks of intensive work, I ultimately decided to postpone this dataset for a future project when I would not be constrained by tight timelines.

The quest for a suitable and engaging dataset was also a challenge. After rigorous searching and evaluation, I settled on the final dataset, which, fortunately, facilitated a smoother workflow. 

My experience with MongoDB was a valuable part of this project. It reinforced my understanding of its capabilities and the significance of efficient installation processes. Furthermore, it allowed me to explore the possibilities of data handling, database management, and data analysis. Working with MongoDB and its querying capabilities provided me with hands-on experience in managing and analyzing large datasets efficiently. 

In addition, this project enhanced my knowledge of geospatial analysis, where I learned to use MongoDB's geospatial features to extract valuable information from location-based data. I also explored the application of aggregate queries for gaining deeper insights into the dataset, such as identifying the top restaurant types and the days with the most open restaurants.

I also had my fair share of struggles and challenges throughout this project. There were moments when I encountered coding issues that proved to be more complex than anticipated, despite having a clear understanding of the desired outcome. In these instances, I turned to the lecture materials for guidance and best practices. When a problem still seemed too complex, I found ChatGPT to be a valuable resource that provided helpful solutions and insights. These struggles, while sometimes frustrating, ultimately contributed to my learning and problem-solving skills.

Overall, this project has been a rewarding learning experience, offering insights into problem-solving, dataset selection, and tool proficiency.
