# Eat Safe, Love

## Notebook Set Up

In [14]:
# Import dependencies
from pymongo import MongoClient
from pprint import pprint
import pandas as pd

In [15]:
# Create an instance of MongoClient
mongo = MongoClient(port=27017)

In [16]:
# assign the uk_food database to a variable name
db = mongo['uk_food']

In [17]:
# review the collections in our database
collection_names = db.list_collection_names()
collection_names

['establishments']

In [18]:
# assign the collection to a variable
establishments = db['establishments']

## Part 3: Exploratory Analysis
Unless otherwise stated, for each question: 
* Use `count_documents` to display the number of documents contained in the result.
* Display the first document in the results using `pprint`.
* Convert the result to a Pandas DataFrame, print the number of rows in the DataFrame, and display the first 10 rows.
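
The first two reporting steps above recur for every question, so they can be wrapped in a small helper. The sketch below is illustrative only: `summarize_query` is a hypothetical name that does not appear in the notebook, and it assumes `collection` exposes PyMongo's `count_documents` and `find_one` methods.

```python
from pprint import pprint


def summarize_query(collection, query):
    """Print the match count and the first matching document.

    `collection` is assumed to behave like a PyMongo Collection.
    """
    # count_documents applies the filter and returns an integer
    count = collection.count_documents(query)
    print(f"Number of matching documents: {count}")

    # find_one returns the first match, or None if nothing matches
    first = collection.find_one(query)
    pprint(first)
    return count, first
```

A call such as `summarize_query(establishments, {'scores.Hygiene': 20})` would cover the count and first-document steps; the DataFrame step would still follow separately with `pd.DataFrame(list(establishments.find(query)))`.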

Some notes to be aware of while you are exploring the dataset:
* `RatingValue` refers to the overall rating decided by the Food Authority and ranges from 1 to 5. The higher the value, the better the rating.
* This field also includes non-numeric values such as 'Pass', where 'Pass' means that the establishment passed its inspection but was not given a numeric rating. Non-numeric values are coerced to nulls during the database setup, before ratings are converted to integers.
* The scores for Hygiene, Structural, and ConfidenceInManagement work in reverse: the higher the value, the worse the establishment is in these areas.

*Eat Safe, Love* has specific questions they want you to answer, which will help them find the locations they wish to visit and avoid. Use the following questions to explore the database and find the answers, so you can provide them to the magazine editors:

- Which establishments have a hygiene score equal to 20?

- Which establishments in London have a RatingValue greater than or equal to 4?

- What are the top 5 establishments with a RatingValue of 5, sorted by lowest hygiene score, nearest to the new restaurant added, "Penang Flavours"?

- How many establishments in each Local Authority area have a hygiene score of 0? Sort the results from highest to lowest, and print out the top ten local authority areas.
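
The note about non-numeric `RatingValue` entries can be illustrated with a plain-Python sketch of the coercion rule. This is not the database setup code itself, which is not shown in this section; `coerce_rating` is a hypothetical helper, and the sample values are made up for illustration.

```python
def coerce_rating(value):
    """Return the rating as an int, or None for non-numeric values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # 'Pass' and similar labels, as well as missing values, become None
        return None


# Example values in the style described above (illustrative, not real rows)
raw_ratings = ["5", "Pass", "3", None]
clean_ratings = [coerce_rating(r) for r in raw_ratings]
print(clean_ratings)  # [5, None, 3, None]
```

With ratings stored as integers or nulls, numeric comparisons such as `{'$gte': 4}` only match establishments that actually received a number rating.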

### 1. Which establishments have a hygiene score equal to 20?

In [19]:
# Show an example document within the establishments collection.
establishments.find_one()

{'_id': ObjectId('6491fee37e5bbe02062e5a07'),
 'FHRSID': 1043695,
 'ChangesByServerID': 0,
 'LocalAuthorityBusinessID': 'PI/000073616',
 'BusinessName': 'The Pavilion',
 'BusinessType': 'Restaurant/Cafe/Canteen',
 'BusinessTypeID': 1,
 'AddressLine1': 'East Cliff Pavilion',
 'AddressLine2': 'Wear Bay Road',
 'AddressLine3': 'Folkestone',
 'AddressLine4': 'Kent',
 'PostCode': 'CT19 6BL',
 'Phone': '',
 'RatingValue': 5,
 'RatingKey': 'fhrs_5_en-gb',
 'RatingDate': '2018-04-04T00:00:00',
 'LocalAuthorityCode': '188',
 'LocalAuthorityName': 'Folkestone and Hythe',
 'LocalAuthorityWebSite': 'http://www.folkestone-hythe.gov.uk',
 'LocalAuthorityEmailAddress': 'foodteam@folkestone-hythe.gov.uk',
 'scores': {'Hygiene': 5, 'Structural': 5, 'ConfidenceInManagement': 5},
 'SchemeType': 'FHRS',
 'geocode': {'longitude': 1.195625, 'latitude': 51.083812},
 'RightToReply': '',
 'Distance': 4591.765489457773,
 'NewRatingPending': False,
 'meta': {'dataSource': None,
  'extractDate': '0001-01-01T00:00

In [20]:
# Find the establishments with a hygiene score of 20
establishments_hygiene_20 = establishments.find({'scores.Hygiene': 20})

# Convert cursor object to a list
list_documents = list(establishments_hygiene_20)

# Convert the list to a pandas DataFrame
df_hygiene_20 = pd.DataFrame(list_documents)

# Display the number of rows in the DataFrame
row_count = df_hygiene_20.shape[0]
print(f"Number of establishments with hygiene score = 20: {row_count}")

# Use .head() to display the first few rows of the DataFrame
df_hygiene_20.head()

Number of establishments with hygiene score = 20: 1066


Unnamed: 0,_id,FHRSID,ChangesByServerID,LocalAuthorityBusinessID,BusinessName,BusinessType,BusinessTypeID,AddressLine1,AddressLine2,AddressLine3,...,LocalAuthorityWebSite,LocalAuthorityEmailAddress,scores,SchemeType,geocode,RightToReply,Distance,NewRatingPending,meta,links
0,6491fee37e5bbe02062e7539,110681,0,4029,The Chase Rest Home,Caring Premises,5,5-6 Southfields Road,Eastbourne,East Sussex,...,http://www.eastbourne.gov.uk/foodratings,Customerfirst@eastbourne.gov.uk,"{'Hygiene': 20, 'Structural': 20, 'ConfidenceI...",FHRS,"{'longitude': 0.27694, 'latitude': 50.769705}",,4613.888288,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'https://api.ratings...."
1,6491fee37e5bbe02062e78ba,612039,0,1970/FOOD,Brenalwood,Caring Premises,5,Hall Lane,Walton-on-the-Naze,Essex,...,http://www.tendringdc.gov.uk/,fhsadmin@tendringdc.gov.uk,"{'Hygiene': 20, 'Structural': 15, 'ConfidenceI...",FHRS,"{'longitude': 1.278721, 'latitude': 51.857536}",,4617.965824,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'https://api.ratings...."
2,6491fee37e5bbe02062e7bc4,730933,0,1698/FOOD,Melrose Hotel,Hotel/bed & breakfast/guest house,7842,53 Marine Parade East,Clacton On Sea,Essex,...,http://www.tendringdc.gov.uk/,fhsadmin@tendringdc.gov.uk,"{'Hygiene': 20, 'Structural': 20, 'ConfidenceI...",FHRS,"{'longitude': 1.15927, 'latitude': 51.789429}",,4619.656144,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'https://api.ratings...."
3,6491fee37e5bbe02062e7db2,172735,0,PI/000023858,Seaford Pizza,Takeaway/sandwich shop,7844,4 High Street,Seaford,East Sussex,...,http://www.lewes-eastbourne.gov.uk/,ehealth.ldc@lewes-eastbourne.gov.uk,"{'Hygiene': 20, 'Structural': 10, 'ConfidenceI...",FHRS,"{'longitude': 0.10202, 'latitude': 50.770885}",,4620.421725,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'https://api.ratings...."
4,6491fee37e5bbe02062e7dc1,172953,0,PI/000024532,Golden Palace,Restaurant/Cafe/Canteen,1,5 South Street,Seaford,East Sussex,...,http://www.lewes-eastbourne.gov.uk/,ehealth.ldc@lewes-eastbourne.gov.uk,"{'Hygiene': 20, 'Structural': 10, 'ConfidenceI...",FHRS,"{'longitude': 0.101446, 'latitude': 50.770724}",,4620.437179,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'https://api.ratings...."
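
The filter `{'scores.Hygiene': 20}` relies on dot notation to reach a field inside the embedded `scores` document. As a rough plain-Python analogy (a simplification that ignores how MongoDB also traverses arrays), the lookup behaves like the hypothetical `get_nested` helper below.

```python
def get_nested(document, dotted_path):
    """Follow a dotted path such as 'scores.Hygiene' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # the path does not exist in this document
        current = current[key]
    return current


# Trimmed copy of the example document shown earlier in this section
sample = {
    "BusinessName": "The Pavilion",
    "scores": {"Hygiene": 5, "Structural": 5, "ConfidenceInManagement": 5},
}

print(get_nested(sample, "scores.Hygiene"))        # 5
print(get_nested(sample, "scores.Hygiene") == 20)  # False, so the query skips it
```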


### 2. Which establishments in London have a `RatingValue` greater than or equal to 4?

In [21]:
# Find the establishments whose Local Authority name contains 'London' and whose RatingValue is greater than or equal to 4.
# Define the query
query = {
    'LocalAuthorityName': {'$regex':'London'},
    'RatingValue': {'$gte': 4}
}

# Query the documents
establishments_london = db.establishments.find(query)

# Convert cursor object to a list
list_establishments_london = list(establishments_london)

# Convert the list to a pandas DataFrame
df_establishments_london = pd.DataFrame(list_establishments_london)

# Display the number of rows in the DataFrame
row_count = df_establishments_london.shape[0]
print(f"The number of establishments with RatingValue >= 4: {row_count}")

df_establishments_london["BusinessName"].head()



The number of establishments with RatingValue >= 4: 858


0                               Charlie's
1                 Mv City Cruises Erasmus
2               Benfleet Motor Yacht Club
3    Coombs Catering t/a The Lock and Key
4                Tilbury Seafarers Centre
Name: BusinessName, dtype: object
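
An unanchored `$regex` matches its pattern anywhere in the field and is case-sensitive unless options say otherwise. The sketch below mimics that behaviour with Python's `re` module; `matches_london` is a hypothetical helper, and the names passed to it are chosen only to show the matching rule.

```python
import re

pattern = re.compile("London")  # same pattern string as in the query


def matches_london(name):
    """True if 'London' appears anywhere in the name (case-sensitive)."""
    return pattern.search(name) is not None


# Names chosen to show the matching behaviour
print(matches_london("City of London Corporation"))  # True
print(matches_london("Greenwich"))                   # False
print(matches_london("london"))                      # False
```

One consequence worth keeping in mind: an authority whose name does not contain the literal text "London" is not matched by this filter, even if the area is geographically part of London.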

### 3. What are the top 5 establishments with a `RatingValue` of 5, sorted by lowest hygiene score, nearest to the new restaurant added, "Penang Flavours"?

In [22]:
# Search within 0.01 degree on either side of the latitude and longitude.
degree_search = 0.01

# Define the query
query = {
    'BusinessName': "Penang Flavours"
}

# Search for the document
restaurant = db.establishments.find_one(query)

# Check that the restaurant exists; stop here if it does not,
# because the geocode lookup below would otherwise fail on None
if restaurant is None:
    raise LookupError("The 'Penang Flavours' restaurant does not exist in the collection.")
print("The 'Penang Flavours' restaurant exists in the collection.")

# Read the latitude and longitude from the restaurant document
latitude = restaurant['geocode']['latitude']
longitude = restaurant['geocode']['longitude']


# Rating value must equal 5
# Define Query
query = {
    'geocode.latitude': {'$gte': latitude - degree_search, '$lte': latitude + degree_search},
    'geocode.longitude': {'$gte': longitude - degree_search, '$lte': longitude + degree_search},
    'RatingValue': 5
}

# Specify the sort order to sort the establishments by the hygiene score in ascending order.
sort = [('scores.Hygiene', 1)]

The 'Penang Flavours' restaurant exists in the collection.
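
The search area in the cell above is a square box of `degree_search` on each side of the restaurant's coordinates; 0.01 degrees of latitude is roughly 1.1 km north to south. A minimal sketch of the same filter as a reusable function follows. `bounding_box_query` is a hypothetical name, and the coordinates in the example are illustrative rather than the restaurant's actual geocode.

```python
def bounding_box_query(latitude, longitude, degree_search, rating_value):
    """Build a filter for documents inside a square box around a point."""
    return {
        "geocode.latitude": {
            "$gte": latitude - degree_search,
            "$lte": latitude + degree_search,
        },
        "geocode.longitude": {
            "$gte": longitude - degree_search,
            "$lte": longitude + degree_search,
        },
        "RatingValue": rating_value,
    }


# Illustrative coordinates (not the restaurant's actual geocode)
box_query = bounding_box_query(51.5, 0.25, 0.25, 5)
print(box_query["geocode.latitude"])   # {'$gte': 51.25, '$lte': 51.75}
print(box_query["geocode.longitude"])  # {'$gte': 0.0, '$lte': 0.5}
```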


In [23]:
# Execute the query and retrieve the top 5 establishments using the limit method
closest_establishments = list(db.establishments.find(query).sort(sort).limit(5))

# Print the BusinessName of the closest establishments
business_names = [establishment['BusinessName'] for establishment in closest_establishments]
print("BusinessNames of the closest establishments:")
print(business_names)

# Convert the query result to a Pandas DataFrame
df_ratingvalue_5 = pd.DataFrame(closest_establishments)

# Display establishments with RatingValue = 5
df_ratingvalue_5.head()

BusinessNames of the closest establishments:
['Howe and Co Fish and Chips - Van 17', 'Atlantic Fish Bar', 'Plumstead Manor Nursery', 'Iceland', 'Volunteer']


Unnamed: 0,_id,FHRSID,ChangesByServerID,LocalAuthorityBusinessID,BusinessName,BusinessType,BusinessTypeID,AddressLine1,AddressLine2,AddressLine3,...,LocalAuthorityWebSite,LocalAuthorityEmailAddress,scores,SchemeType,geocode,RightToReply,Distance,NewRatingPending,meta,links
0,6491fee37e5bbe02062ecd7b,1380578,0,14425,Howe and Co Fish and Chips - Van 17,Mobile caterer,7846,Restaurant And Premises 107A Plumstead High St...,,Plumstead,...,http://www.royalgreenwich.gov.uk,health@royalgreenwich.gov.uk,"{'Hygiene': 0, 'Structural': 0, 'ConfidenceInM...",FHRS,"{'longitude': 0.0925370007753372, 'latitude': ...",,4646.955931,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'http://api.ratings.f..."
1,6491fee37e5bbe02062ecdb3,694478,0,PI/000086506,Atlantic Fish Bar,Takeaway/sandwich shop,7844,35 Lakedale Road,,Plumstead,...,http://www.royalgreenwich.gov.uk,health@royalgreenwich.gov.uk,"{'Hygiene': 0, 'Structural': 0, 'ConfidenceInM...",FHRS,"{'longitude': 0.0912164, 'latitude': 51.4867296}",,4646.974612,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'http://api.ratings.f..."
2,6491fee37e5bbe02062ecdb0,695241,0,PI/000179088,Plumstead Manor Nursery,Caring Premises,5,Plumstead Manor School Old Mill Road,,Plumstead,...,http://www.royalgreenwich.gov.uk,health@royalgreenwich.gov.uk,"{'Hygiene': 0, 'Structural': 0, 'ConfidenceInM...",FHRS,"{'longitude': 0.0859939977526665, 'latitude': ...",,4646.97401,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'http://api.ratings.f..."
3,6491fee37e5bbe02062ecd6b,695223,0,PI/000178842,Iceland,Retailers - supermarkets/hypermarkets,7840,144 - 146 Plumstead High Street,,Plumstead,...,http://www.royalgreenwich.gov.uk,health@royalgreenwich.gov.uk,"{'Hygiene': 0, 'Structural': 5, 'ConfidenceInM...",FHRS,"{'longitude': 0.0924199968576431, 'latitude': ...",,4646.946071,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'http://api.ratings.f..."
4,6491fee37e5bbe02062ecd98,694609,0,PI/000116619,Volunteer,Pub/bar/nightclub,7843,130 - 132 Plumstead High Street,,Plumstead,...,http://www.royalgreenwich.gov.uk,health@royalgreenwich.gov.uk,"{'Hygiene': 0, 'Structural': 0, 'ConfidenceInM...",FHRS,"{'longitude': 0.09208, 'latitude': 51.4873437}",,4646.965635,False,"{'dataSource': None, 'extractDate': '0001-01-0...","[{'rel': 'self', 'href': 'http://api.ratings.f..."
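
The query above restricts results to a box around the restaurant and sorts by hygiene score only, so establishments with the same score (all five shown have a Hygiene score of 0) are not ordered by distance. If a true nearest-first ordering were wanted, one option is to rank the returned documents by great-circle distance. The sketch below is one possible approach, not the notebook's method; `haversine_km` and `sort_by_distance` are hypothetical helpers and the Earth radius is an approximation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def sort_by_distance(documents, latitude, longitude):
    """Return documents ordered from nearest to farthest from the point."""
    return sorted(
        documents,
        key=lambda doc: haversine_km(
            latitude, longitude,
            doc["geocode"]["latitude"], doc["geocode"]["longitude"],
        ),
    )
```

For example, `sort_by_distance(closest_establishments, latitude, longitude)` would reorder the five results by distance from "Penang Flavours" while leaving the documents themselves unchanged.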


### 4. How many establishments in each Local Authority area have a hygiene score of 0?

In [24]:
# Get the distinct LocalAuthorityName values
distinct_local_authorities = db.establishments.distinct("LocalAuthorityName")

# Get the count of distinct LocalAuthorityName values
count = len(distinct_local_authorities)

# Print the number of distinct LocalAuthorityName values
print(f"There are {count} Local Authority areas:")

# Create a pipeline:
# 1. Matches establishments with a hygiene score of 0
# 2. Groups the matches by Local Authority
# 3. Sorts the groups by count, from highest to lowest
count_field = "Number of establishments with hygiene score of 0"
pipeline = [
    {"$match": {"scores.Hygiene": 0}},
    {"$group": {"_id": "$LocalAuthorityName", count_field: {"$sum": 1}}},
    # The sort key must match the field name produced by $group
    {"$sort": {count_field: -1}}
]

# Execute the pipeline and store the results in a variable
hygiene_score_0 = list(db.establishments.aggregate(pipeline))

# Sum the per-area counts to get the total number of establishments
total_count = 0
for doc in hygiene_score_0:
    total_count += doc[count_field]

# Print the total number of establishments
print("Total number of establishments with hygiene score 0:", total_count)

# Print the first 10 results
print("\nTop 10 Local Authority areas:")
count = 0
for doc in hygiene_score_0:
    print(doc)
    count += 1
    if count >= 10:
        break

There are 71 Local Authority areas:
Total number of establishments with hygiene score 0: 437502

Top 10 Local Authority areas:
{'_id': 'Thanet', 'Number of establishments with hygiene score of 0': 29380}
{'_id': 'Greenwich', 'Number of establishments with hygiene score of 0': 22932}
{'_id': 'Maidstone', 'Number of establishments with hygiene score of 0': 18538}
{'_id': 'Newham', 'Number of establishments with hygiene score of 0': 18486}
{'_id': 'Swale', 'Number of establishments with hygiene score of 0': 17836}
{'_id': 'Chelmsford', 'Number of establishments with hygiene score of 0': 17680}
{'_id': 'Medway', 'Number of establishments with hygiene score of 0': 17472}
{'_id': 'Bexley', 'Number of establishments with hygiene score of 0': 15782}
{'_id': 'Southend-On-Sea', 'Number of establishments with hygiene score of 0': 15236}
{'_id': 'Tendring', 'Number of establishments with hygiene score of 0': 14092}
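
The three pipeline stages (match, group, sort) can be mimicked in plain Python to make the logic easier to follow. The sketch below is only an analogy run on made-up documents; `count_zero_hygiene_by_area` is a hypothetical helper. Note that `$sort` can only order by a field that the `$group` stage actually emits.

```python
from collections import Counter


def count_zero_hygiene_by_area(documents):
    """Mimic $match -> $group -> $sort for a list of dict documents."""
    counts = Counter(
        doc["LocalAuthorityName"]                     # $group key
        for doc in documents
        if doc.get("scores", {}).get("Hygiene") == 0  # $match condition
    )
    # most_common() orders by count, highest first (like $sort with -1)
    return counts.most_common()


# Made-up documents for illustration only
docs = [
    {"LocalAuthorityName": "Area A", "scores": {"Hygiene": 0}},
    {"LocalAuthorityName": "Area B", "scores": {"Hygiene": 0}},
    {"LocalAuthorityName": "Area A", "scores": {"Hygiene": 0}},
    {"LocalAuthorityName": "Area A", "scores": {"Hygiene": 5}},
]
print(count_zero_hygiene_by_area(docs))  # [('Area A', 2), ('Area B', 1)]
```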


In [25]:
# Convert the result to a Pandas DataFrame
# Define the pipeline stages
pipeline = [
    {"$match": {"scores.Hygiene": 0}},
    {"$group": {"_id": "$LocalAuthorityName", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}}
]

# Execute the pipeline
result = db.establishments.aggregate(pipeline)

# Convert the result to a Pandas DataFrame
df_hygiene_0 = pd.DataFrame(result)

# Display the number of rows in the DataFrame
print("Number of rows in the DataFrame:", len(df_hygiene_0))

# Display the first 10 rows of the DataFrame
print("First 10 rows of the DataFrame:")
df_hygiene_0.head(10)


Number of rows in the DataFrame: 55
First 10 rows of the DataFrame:


| | _id | count |
|---|---|---|
| 0 | Thanet | 29380 |
| 1 | Greenwich | 22932 |
| 2 | Maidstone | 18538 |
| 3 | Newham | 18486 |
| 4 | Swale | 17836 |
| 5 | Chelmsford | 17680 |
| 6 | Medway | 17472 |
| 7 | Bexley | 15782 |
| 8 | Southend-On-Sea | 15236 |
| 9 | Tendring | 14092 |
