# Eat Safe, Love

## Part 1: Database and Jupyter Notebook Set Up

Import the data provided in the `establishments.json` file from your Terminal. Name the database `uk_food` and the collection `establishments`.

Within this markdown cell, copy the line of text you used to import the data from your Terminal. This way, future analysts will be able to repeat your process.

Import the dataset with `mongoimport --type json -d uk_food -c establishments --drop --jsonArray establishments.json`

In [1]:
#mongoimport --type json -d uk_food -c establishments --drop --jsonArray establishments.json
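The import command above can also be assembled from Python, which keeps the database and collection names in one place. This is only a sketch: the helper name `build_mongoimport_command` is hypothetical, and actually running the command assumes `mongoimport` is on the PATH and a MongoDB server is listening locally.

```python
import shlex


def build_mongoimport_command(path="establishments.json",
                              database="uk_food",
                              collection="establishments"):
    """Return the mongoimport invocation as an argument list (hypothetical helper)."""
    return [
        "mongoimport",
        "--type", "json",
        "-d", database,
        "-c", collection,
        "--drop",        # drop an existing collection first so re-runs start clean
        "--jsonArray",   # treat the file as one JSON array of documents
        path,
    ]


command = build_mongoimport_command()

# Print the command exactly as it would be typed in the Terminal
print(shlex.join(command))

# To execute it from the notebook instead (assumes mongoimport is installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the file path ever contains spaces.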




In [2]:
# Import dependencies
from pprint import pprint

import pymongo
from bson import ObjectId
from pymongo import UpdateMany

In [3]:
# Create an instance of MongoClient
client = pymongo.MongoClient(port=27017)

In [4]:
# confirm that our new database was created
client.list_database_names()

['admin', 'config', 'local', 'met', 'uk_food', 'your_database_name']

In [5]:
# assign the uk_food database to a variable name
db = client['uk_food']

In [6]:
# review the collections in our new database
collections = db.list_collection_names()

# Print the list of collections
print(collections)

['establishments']


In [7]:
# review a document in the establishments collection
establishments = db['establishments']
first_document = establishments.find_one()

# Display the document using pprint
pprint(first_document)

{'AddressLine1': 'The Bay',
 'AddressLine2': 'St Margarets Bay',
 'AddressLine3': 'Kent',
 'AddressLine4': '',
 'BusinessName': 'The Coastguard Inn',
 'BusinessType': 'Pub/bar/nightclub',
 'BusinessTypeID': 7843,
 'ChangesByServerID': 0,
 'Distance': 4587.347174863443,
 'FHRSID': 1034540,
 'LocalAuthorityBusinessID': 'PI/000078691',
 'LocalAuthorityCode': '182',
 'LocalAuthorityEmailAddress': 'publicprotection@dover.gov.uk',
 'LocalAuthorityName': 'Dover',
 'LocalAuthorityWebSite': 'http://www.dover.gov.uk/',
 'NewRatingPending': False,
 'Phone': '',
 'PostCode': 'CT15 6DY',
 'RatingDate': '2022-08-17T00:00:00',
 'RatingKey': 'fhrs_5_en-gb',
 'RatingValue': '5',
 'RightToReply': '',
 'SchemeType': 'FHRS',
 '_id': ObjectId('65254c07890531766ececa0b'),
 'geocode': {'latitude': '51.152225', 'longitude': '1.387974'},
 'links': [{'href': 'https://api.ratings.food.gov.uk/establishments/1034540',
            'rel': 'self'}],
 'meta': {'dataSource': None,
          'extractDate': '0001-01-01T0

In [8]:
# assign the collection to a variable
establishments = db['establishments']

## Part 2: Update the Database

1. An exciting new halal restaurant just opened in Greenwich, but hasn't been rated yet. The magazine has asked you to include it in your analysis. Add the following restaurant "Penang Flavours" to the database.

In [9]:
# Define the query to find the "Penang Flavours" restaurant
query = {"BusinessName": "Penang Flavours"}

# Find the restaurant data
restaurant_data = db.establishments.find_one(query)

# Print the restaurant data
print(restaurant_data)

None


In [10]:
# Define the query to find the "Penang Flavours" restaurant
query = {"BusinessName": "Penang Flavours"}

# Delete the restaurant
result = db.establishments.delete_one(query)

# Check if the deletion was successful
if result.deleted_count == 1:
    print("Successfully deleted the 'Penang Flavours' restaurant.")
else:
    print("Restaurant not found or deletion failed.")

Restaurant not found or deletion failed.


In [11]:
# Create a dictionary for the new restaurant data
new_restaurant_data = {
    "BusinessName":"Penang Flavours",
    "BusinessType":"Restaurant/Cafe/Canteen",
    "BusinessTypeID":"",
    "AddressLine1":"Penang Flavours",
    "AddressLine2":"146A Plumstead Rd",
    "AddressLine3":"London",
    "AddressLine4":"",
    "PostCode":"SE18 7DY",
    "Phone":"",
    "LocalAuthorityCode":"511",
    "LocalAuthorityName":"Greenwich",
    "LocalAuthorityWebSite":"http://www.royalgreenwich.gov.uk",
    "LocalAuthorityEmailAddress":"health@royalgreenwich.gov.uk",
    "scores":{
        "Hygiene":"",
        "Structural":"",
        "ConfidenceInManagement":""
    },
    "SchemeType":"FHRS",
    "geocode":{
        "longitude":"0.08384000",
        "latitude":"51.49014200"
    },
    "RightToReply":"",
    "Distance":4623.9723280747176,
    "NewRatingPending":True
}



In [12]:
# Insert the new restaurant and keep the _id that MongoDB generates for it
inserted_id = establishments.insert_one(new_restaurant_data).inserted_id

# Print the inserted document's _id
print(f"The new restaurant's document ID: {inserted_id}")

The new restaurant's document ID: 65254c61dad06392229c93a2


In [13]:
# Define the query to find the "Penang Flavours" restaurant
query = {"BusinessName": "Penang Flavours"}

# Find the restaurant data
restaurant_data = db.establishments.find_one(query)

# Print the restaurant data
print(restaurant_data)

{'_id': ObjectId('65254c61dad06392229c93a2'), 'BusinessName': 'Penang Flavours', 'BusinessType': 'Restaurant/Cafe/Canteen', 'BusinessTypeID': '', 'AddressLine1': 'Penang Flavours', 'AddressLine2': '146A Plumstead Rd', 'AddressLine3': 'London', 'AddressLine4': '', 'PostCode': 'SE18 7DY', 'Phone': '', 'LocalAuthorityCode': '511', 'LocalAuthorityName': 'Greenwich', 'LocalAuthorityWebSite': 'http://www.royalgreenwich.gov.uk', 'LocalAuthorityEmailAddress': 'health@royalgreenwich.gov.uk', 'scores': {'Hygiene': '', 'Structural': '', 'ConfidenceInManagement': ''}, 'SchemeType': 'FHRS', 'geocode': {'longitude': '0.08384000', 'latitude': '51.49014200'}, 'RightToReply': '', 'Distance': 4623.972328074718, 'NewRatingPending': True}
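The find, delete, and insert cells above make the insert safe to re-run. One possible alternative is a single upsert that only writes the document when no match exists. The sketch below builds the two arguments for such a call; the helper name `build_upsert` and the shortened `sample` document are assumptions for illustration, not part of the original notebook.

```python
def build_upsert(document, key="BusinessName"):
    """Split a document into (filter, update) arguments for an upsert (hypothetical helper)."""
    # Match on the key field only
    filter_doc = {key: document[key]}
    # Every other field is written only if the upsert ends up inserting a new document
    payload = {field: value for field, value in document.items() if field != key}
    return filter_doc, {"$setOnInsert": payload}


# Shortened stand-in for new_restaurant_data
sample = {
    "BusinessName": "Penang Flavours",
    "LocalAuthorityName": "Greenwich",
    "NewRatingPending": True,
}

filter_doc, update_doc = build_upsert(sample)
print(filter_doc)
print(update_doc)

# With a live collection this would be applied as:
# establishments.update_one(filter_doc, update_doc, upsert=True)
```

Because `$setOnInsert` is ignored when a matching document already exists, running the cell twice would not create a duplicate.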


2. Find the BusinessTypeID for "Restaurant/Cafe/Canteen" and return only the `BusinessTypeID` and `BusinessType` fields.

In [14]:
# Find the BusinessTypeID for "Restaurant/Cafe/Canteen" and return only the BusinessTypeID and BusinessType fields
query = {
    "BusinessType": "Restaurant/Cafe/Canteen"}
projection = {
    "_id": 0,  # Exclude the _id field from the result
    "BusinessTypeID": 1,
    "BusinessType": 1
}

result = establishments.find_one(query, projection)

# Print the result
print(result)

{'BusinessType': 'Restaurant/Cafe/Canteen', 'BusinessTypeID': 1}


3. Update the new restaurant with the `BusinessTypeID` you found.

In [15]:
# Update the new restaurant with the correct BusinessTypeID
# Filter on the business name rather than a hard-coded _id, which changes on every insert
query = {"BusinessName": "Penang Flavours"}
updated_data = {"$set": {"BusinessTypeID": 1}}
update_result = establishments.update_one(query, updated_data)
if update_result.modified_count > 0:
    print("Restaurant updated successfully.")
else:
    print("No restaurant was updated.")


Restaurant updated successfully.
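Re-running the update cell prints "No restaurant was updated." even though the restaurant exists, because `modified_count` is 0 when the stored value already equals the new one. PyMongo's `UpdateResult` also exposes `matched_count`, which separates the two cases. The function below is a small sketch of that distinction; its name and messages are hypothetical.

```python
def describe_update(matched_count, modified_count):
    """Turn the two UpdateResult counters into a readable message (hypothetical helper)."""
    if matched_count == 0:
        return "No document matched the filter."
    if modified_count == 0:
        return "A document matched, but it already held the requested values."
    return f"Updated {modified_count} of {matched_count} matched document(s)."


# First run, repeated run, and a filter that matches nothing
print(describe_update(1, 1))
print(describe_update(1, 0))
print(describe_update(0, 0))

# With a live result object:
# print(describe_update(update_result.matched_count, update_result.modified_count))
```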


4. The magazine is not interested in any establishments in Dover, so check how many documents contain the Dover Local Authority. Then, remove any establishments within the Dover Local Authority from the database, and check the number of documents to ensure they were deleted.

In [16]:
# Find all distinct LocalAuthorityName values
distinct_local_authorities = establishments.distinct("LocalAuthorityName")

# Print all the unique LocalAuthorityName values
for authority in distinct_local_authorities:
    print(authority)

Aberdeenshire
Arun
Ashford
Babergh
Barking and Dagenham
Basildon
Bexley
Braintree
Brentwood
Bromley
Broxbourne
Canterbury City
Castle Point
Chelmsford
City of London Corporation
Colchester
Dartford
Dorset
Dover
East Hertfordshire
East Renfrewshire
East Suffolk
Eastbourne
Epping Forest
Folkestone and Hythe
Gravesham
Greenwich
Hackney
Harlow
Hastings
Havering
Ipswich
Kensington and Chelsea
Knowsley
Lambeth
Lewes
Lewisham
Maidstone
Maldon
Medway
Mid Sussex
Newham
North Hertfordshire
North Norfolk
Orkney Islands
Pendle
Reading
Redbridge
Rochford
Rother
Rushmoor
Salford
Sevenoaks
Slough
South Cambridgeshire
Southend-On-Sea
Spelthorne
Stratford-on-Avon
Sunderland
Swale
Tandridge
Tendring
Thanet
Thurrock
Tonbridge and Malling
Tower Hamlets
Tunbridge Wells
Uttlesford
Waltham Forest
Wealden
West Suffolk
York


In [17]:
# Find how many documents have LocalAuthorityName as "Dover"
query = {"LocalAuthorityName": "Dover"}
# Count the number of documents matching the query
dover_document_count = establishments.count_documents(query)

# Print the count
print(f"Number of documents with LocalAuthorityName as 'Dover': {dover_document_count}")

Number of documents with LocalAuthorityName as 'Dover': 994
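The distinct list and the single Dover count could also come from one aggregation that counts documents per local authority. This is a sketch under the assumption that every document carries a `LocalAuthorityName` field; the helper name `authority_count_pipeline` is hypothetical.

```python
def authority_count_pipeline(limit=None):
    """Build an aggregation pipeline that counts documents per local authority."""
    pipeline = [
        # One output row per distinct LocalAuthorityName, with the number of documents
        {"$group": {"_id": "$LocalAuthorityName", "count": {"$sum": 1}}},
        # Largest authorities first; ties are broken alphabetically
        {"$sort": {"count": -1, "_id": 1}},
    ]
    if limit is not None:
        pipeline.append({"$limit": limit})
    return pipeline


for stage in authority_count_pipeline(limit=5):
    print(stage)

# With a live collection:
# for row in establishments.aggregate(authority_count_pipeline()):
#     print(row["_id"], row["count"])
```

Running it before and after the deletion gives a per-authority view of what changed, not just the Dover total.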


In [18]:
# Delete all documents where LocalAuthorityName is "Dover"
query = {"LocalAuthorityName": "Dover"}
# Delete all documents matching the query
delete_result = establishments.delete_many(query)

# Print the number of documents deleted
print(f"Number of documents deleted: {delete_result.deleted_count}")


Number of documents deleted: 994


In [19]:
# Check if any remaining documents include Dover
query = {"LocalAuthorityName": "Dover"}

remaining_count = establishments.count_documents(query)

if remaining_count == 0:
    print("No remaining documents include 'Dover' in LocalAuthorityName.")
else:
    print(f"There are {remaining_count} remaining documents with 'Dover' in LocalAuthorityName.")

No remaining documents include 'Dover' in LocalAuthorityName.


In [20]:
# Check that other documents remain with 'find_one'

remaining_document = establishments.find_one()

# Check if a document was found
if remaining_document:
    # Print the remaining document
    print(remaining_document)
else:
    print("No remaining documents found.")

{'_id': ObjectId('65254c07890531766ececcee'), 'FHRSID': 1043695, 'ChangesByServerID': 0, 'LocalAuthorityBusinessID': 'PI/000073616', 'BusinessName': 'The Pavilion', 'BusinessType': 'Restaurant/Cafe/Canteen', 'BusinessTypeID': 1, 'AddressLine1': 'East Cliff Pavilion', 'AddressLine2': 'Wear Bay Road', 'AddressLine3': 'Folkestone', 'AddressLine4': 'Kent', 'PostCode': 'CT19 6BL', 'Phone': '', 'RatingValue': '5', 'RatingKey': 'fhrs_5_en-gb', 'RatingDate': '2018-04-04T00:00:00', 'LocalAuthorityCode': '188', 'LocalAuthorityName': 'Folkestone and Hythe', 'LocalAuthorityWebSite': 'http://www.folkestone-hythe.gov.uk', 'LocalAuthorityEmailAddress': 'foodteam@folkestone-hythe.gov.uk', 'scores': {'Hygiene': 5, 'Structural': 5, 'ConfidenceInManagement': 5}, 'SchemeType': 'FHRS', 'geocode': {'longitude': '1.195625', 'latitude': '51.083812'}, 'RightToReply': '', 'Distance': 4591.765489457773, 'NewRatingPending': False, 'meta': {'dataSource': None, 'extractDate': '0001-01-01T00:00:00', 'itemCount': 0, 

5. Some of the number values are stored as strings, when they should be stored as numbers.

Use `update_many` to convert `latitude` and `longitude` to decimal numbers.

In [21]:
# Define an update pipeline to convert latitude and longitude to decimal numbers
# $toDouble parses numeric strings itself, so no server-side JavaScript ($function) is needed
update_operation = [
    {
        "$set": {
            "geocode.latitude": {"$toDouble": "$geocode.latitude"},
            "geocode.longitude": {"$toDouble": "$geocode.longitude"}
        }
    }
]

# Use update_many to apply the update operation to all documents in the collection
update_result = establishments.update_many({}, update_operation)

# Print the number of documents updated
print(f"Number of documents updated: {update_result.modified_count}")



Number of documents updated: 38786
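A plain `$toDouble` raises an error when it meets a string it cannot parse. If the data might contain such values, `$convert` with `onError` and `onNull` is one way to make the conversion tolerant. The sketch below only builds the `$set` stage; the helper name `to_double_stage` is hypothetical, and choosing null as the fallback is an assumption rather than something the assignment specifies.

```python
def to_double_stage(field_paths):
    """Build a $set stage that converts each dotted field path to a double."""
    return {
        "$set": {
            path: {
                "$convert": {
                    "input": f"${path}",
                    "to": "double",
                    "onError": None,  # unparseable values become null instead of aborting
                    "onNull": None,   # null or missing inputs yield null
                }
            }
            for path in field_paths
        }
    }


stage = to_double_stage(["geocode.latitude", "geocode.longitude"])
print(stage)

# With a live collection:
# establishments.update_many({}, [stage])
```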


In [22]:
# Define the query to find documents where "scores.Hygiene" is not of numeric type
query = {
    "scores.Hygiene": {
        "$not": {
            "$type": "number"
        }
    }
}

# Define the update operation to set "scores.Hygiene" to 0 for matching documents
update_operation = {
    "$set": {
        "scores.Hygiene": 0
    }
}

# Use update_many to apply the update operation to all documents matching the query
update_result = establishments.update_many(query, update_operation)

# Print the number of documents updated
print(f"Number of documents updated: {update_result.modified_count}")

Number of documents updated: 4377
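Overwriting non-numeric hygiene scores with 0 makes them indistinguishable from establishments that genuinely scored 0, so it may be worth inspecting which BSON types are present before running the update. The pipeline below is a sketch of such a check using the `$type` aggregation operator; the helper name `type_histogram_pipeline` is hypothetical.

```python
def type_histogram_pipeline(field_path):
    """Build a pipeline that counts documents by the BSON type of one field."""
    return [
        # $type yields names such as "int", "double", "string", "null" or "missing"
        {"$group": {"_id": {"$type": f"${field_path}"}, "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


for stage in type_histogram_pipeline("scores.Hygiene"):
    print(stage)

# With a live collection:
# for row in establishments.aggregate(type_histogram_pipeline("scores.Hygiene")):
#     print(row["_id"], row["count"])
```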


In [23]:
# Define a filter to exclude documents with non-numeric "RatingValue"
filter_query = {
    "RatingValue": {"$regex": r"^\d+$"}  # Match only numeric values
}

# Define an update operation to convert "RatingValue" to integer numbers
update_operation = {
    "$set": {
        "RatingValue": {
            "$toInt": "$RatingValue"
        }
    }
}

# Use bulk_write to update the documents
update_requests = [UpdateMany(filter_query, [update_operation])]

# Execute the bulk write operation
bulk_result = establishments.bulk_write(update_requests)

# Print the number of documents updated
print(f"Number of documents updated: {bulk_result.modified_count}")

Number of documents updated: 34694


In [25]:
# Set non-numeric RatingValue entries (status strings) to None, stored as null
non_ratings = ["AwaitingInspection", "Awaiting Inspection", "AwaitingPublication", "Pass", "Exempt"]
establishments.update_many({"RatingValue": {"$in": non_ratings}}, {'$set': {"RatingValue": None}})


<pymongo.results.UpdateResult at 0x138f0ddd6c0>
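The two RatingValue steps above (digit-only strings to integers, listed status strings to null) can be mirrored in plain Python, which is handy for checking the intended behaviour on a few values without touching the database. This is a sketch; `normalize_rating` is a hypothetical helper, and values outside both rules are deliberately left unchanged, as in the cells above.

```python
import re

# Status strings that the notebook replaces with null
NON_RATINGS = {"AwaitingInspection", "Awaiting Inspection",
               "AwaitingPublication", "Pass", "Exempt"}

# Same intent as the ^\d+$ filter, restricted to ASCII digits
DIGITS_ONLY = re.compile(r"[0-9]+")


def normalize_rating(value):
    """Return the value RatingValue should hold after both update steps."""
    if isinstance(value, str):
        if DIGITS_ONLY.fullmatch(value):
            return int(value)
        if value in NON_RATINGS:
            return None
    # Already-converted numbers, None, and unlisted strings pass through untouched
    return value


for raw in ["5", "0", "Pass", "Exempt", 3, None]:
    print(repr(raw), "->", repr(normalize_rating(raw)))
```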

In [26]:
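# Change the data type from String to Integer for RatingValue
# filter_query (defined above) matches only digit-only strings, which the earlier
# bulk_write already converted, so this cell is expected to modify 0 documents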

update_operation = [
    {
        "$set": {
            "RatingValue": {
                "$toInt": "$RatingValue"
            }
        }
    }
]

# Use update_many to apply the update pipeline to every document matching filter_query
update_result = establishments.update_many(filter_query, update_operation)

# Print the number of documents updated
print(f"Number of documents updated: {update_result.modified_count}")


Number of documents updated: 0


In [27]:
# Check the data types of the geocode and RatingValue fields
# Note: geocode is a sub-document, so its type is dict whatever its coordinates hold

updated_document = establishments.find_one()

# Extract and print the geocode and RatingValue fields
geocode = updated_document.get("geocode")
rating_value = updated_document.get("RatingValue")

# Check the data types of the fields
geocode_type = type(geocode)
rating_value_type = type(rating_value)

print(f"Geocode field data type: {geocode_type}")
print(f"RatingValue field data type: {rating_value_type}")


Geocode field data type: <class 'dict'>
RatingValue field data type: <class 'int'>
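The check above reports the type of the `geocode` sub-document, which is a `dict` regardless of what its coordinates hold. To confirm the conversion, the nested values themselves need inspecting. The function below is a sketch of that check; `numeric_field_report` is a hypothetical helper, and `sample` is a hand-written stand-in for a converted document rather than real query output.

```python
def numeric_field_report(document):
    """Report, per field, whether the stored value is a number."""
    geocode = document.get("geocode") or {}
    values = {
        "geocode.latitude": geocode.get("latitude"),
        "geocode.longitude": geocode.get("longitude"),
        "RatingValue": document.get("RatingValue"),
    }
    # bool is a subclass of int in Python, so it is excluded explicitly
    return {
        name: isinstance(value, (int, float)) and not isinstance(value, bool)
        for name, value in values.items()
    }


# Hand-written stand-in for a document after conversion
sample = {"geocode": {"latitude": 51.083812, "longitude": 1.195625}, "RatingValue": 5}
print(numeric_field_report(sample))

# With a live collection:
# print(numeric_field_report(establishments.find_one()))
```

A document whose RatingValue was set to null reports `False` for that field, which is expected for establishments without a numeric rating.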
