# Coursework 1

A Jupyter notebook has a series of cells, which are split into two different types.  There is a Markdown cell (like this one) which allows text to be input in either Markdown or HTML format.  The other type of cell is a code cell, which allows you to run the code written inside it.

When you run a code cell, you will not necessarily see any output, but any output it does produce appears below the cell.  To check whether it has completed running, look at the `In [ ]` next to the cell.  Whilst the cell is processing, the brackets contain an asterisk (\* symbol), and when it has finished the asterisk is replaced by a number that increases with each cell you run.

Run the code in the following cell by pressing `Ctrl` + `Enter`.  If you want to move the focus onto the next cell after the code has finished running, use `Shift` + `Enter` instead.

In [4]:
import json
import requests
from pymongo import MongoClient
from datetime import datetime
print("Success!")



#Base url for following questions
base_api_url = "http://api.ratings.food.gov.uk/"

Success!


# Question 1

Using the API (v2) at http://api.ratings.food.gov.uk, perform the following tasks with the [Python Requests](http://docs.python-requests.org/en/master/) library.

## Question 1(a)

Write a function `get_local_authorities()` which gets a list of all the local authorities.  It takes one parameter, `data_format`, which accepts a string.
- If the parameter is `XML` it should return the data in XML format; if it is `JSON`, it should return the data in JSON format
- If the parameter is not one of those two strings the function should raise a `ValueError` with an appropriate error message
- You should return the response object produced by `requests`


In [39]:
def get_local_authorities(data_format):
    authorities_url = "Authorities/basic"

    # Header values must be strings; newer versions of requests reject integers
    if data_format == "XML":
        headers = {'x-api-version': '2', 'accept': 'application/xml'}
    elif data_format == "JSON":
        headers = {'x-api-version': '2', 'accept': 'application/json'}
    else:
        raise ValueError("The argument must be one of these options: ['XML', 'JSON']")
    # Return the response object itself, as the question asks
    response = requests.get(base_api_url + authorities_url, headers=headers)
    return response

#get_local_authorities("JSON").json()
#get_local_authorities("XML")
#get_local_authorities("yolo")
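
The format check in `get_local_authorities` can be tested without calling the API if the header construction is separated from the request.  The following is a minimal sketch of that idea; the helper `build_headers` and the table `ACCEPT_TYPES` are hypothetical names introduced here for illustration and are not part of the coursework.

```python
# Hypothetical lookup table: format name -> value of the Accept header.
ACCEPT_TYPES = {"XML": "application/xml", "JSON": "application/json"}


def build_headers(data_format):
    """Return the request headers for data_format, or raise ValueError."""
    # The isinstance check also rejects unhashable arguments such as lists.
    if not isinstance(data_format, str) or data_format not in ACCEPT_TYPES:
        raise ValueError(
            "data_format must be one of %s, got %r"
            % (sorted(ACCEPT_TYPES), data_format)
        )
    # Header values are kept as strings ("2" rather than 2).
    return {"x-api-version": "2", "accept": ACCEPT_TYPES[data_format]}


print(build_headers("JSON"))
```

The resulting dictionary could then be passed to `requests.get(..., headers=...)` exactly as in the cell above.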

## Question 1(b)

Write a function `get_establishment_ids()` which accepts parameters `page_number` and `page_size` and returns a list of integers of the FHRSID of each establishment on the page.

- This function should gracefully handle any records which do not have an FHRSID attribute by putting `None` in the list instead of the FHRSID.
- If the `page_number` or `page_size` parameters are not integers the function should raise a `ValueError`
- If there are no more records left to collect, the function should return `None`


In [9]:
def get_establishment_ids(page_number, page_size):
    # Validate the arguments before building the URL or calling the API
    if not (isinstance(page_number, int) and isinstance(page_size, int)):
        raise ValueError("Both arguments must be integers")

    headers = {'x-api-version': '2', 'accept': 'application/json'}
    establishment_url = "Establishments/basic/" + str(page_number) + "/" + str(page_size)
    response = requests.get(base_api_url + establishment_url, headers=headers)

    # Records without an FHRSID are represented by None
    list_establishments = [record.get('FHRSID') for record in response.json()["establishments"]]

    # An empty page means there are no more records to collect
    if len(list_establishments) == 0:
        return None
    return list_establishments


#get_establishment_ids(100,100)
#get_establishment_ids("a",100)
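
The two rules in this question (use `None` for a missing FHRSID, return `None` for an empty page) can be checked offline on a plain dictionary.  The sketch below assumes the response has the same `establishments` key used in the cell above; the helper names `extract_fhrsids` and `is_plain_int` and the sample payload are made up for illustration.

```python
def extract_fhrsids(payload):
    """Return the FHRSID of each record, with None where it is missing."""
    records = payload.get("establishments", [])
    if not records:
        return None  # nothing left to collect on this page
    return [record.get("FHRSID") for record in records]


def is_plain_int(value):
    """True for ints but not for bools (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


# Made-up payload, for illustration only.
sample = {"establishments": [{"FHRSID": 101}, {"BusinessName": "No id"}]}
print(extract_fhrsids(sample))  # prints [101, None]
```

Whether `True` and `False` should count as integers is a design choice the question does not settle; `is_plain_int` shows one way to exclude them.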

## Question 1(c)

Write a function `get_establishments`, which accepts the parameter `establishment_ids`, which is a list of the establishment IDs

- The function should iterate through the list of IDs, and download the detailed information for that ID from the API
- It should not assume that the caller will provide correct IDs.  If an ID does not exist, the function should not add it to the JSON object
- Use the provided stub function `insert_data(js)` to represent the insertion of data into a database.  The `js` parameter should be a JSON object, or a list of JSON objects.  This function should only be called once within the `get_establishments` function.
- The `insert_data` function should not be called if the JSON object is empty
- A `requests.exceptions.HTTPError` should be thrown for a 4XX or 5XX status code.

In [77]:
def insert_data(js):
    pass

def get_establishments(establishment_ids):
    headers = {'x-api-version': '2', 'accept': 'application/json'}
    url_details = "Establishments/"
    list_of_details = []
    for establishment_id in establishment_ids:
        response = requests.get(base_api_url + url_details + str(establishment_id), headers=headers)
        status = response.status_code
        # Any 4XX or 5XX status is reported as an HTTPError
        if 400 <= status < 600:
            raise requests.exceptions.HTTPError("The request for ID " + str(establishment_id) + " failed with status " + str(status))
        list_of_details.append(response.json())
    # insert_data is called once, and only if there is something to insert
    if list_of_details:
        insert_data(list_of_details)

#get_establishments([3,4,1])
#get_establishments([3,4,1,2])
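
The requirements above can be read in more than one way.  One possible reading, sketched below, skips IDs that the API reports as not found (status 404) and raises for every other 4XX or 5XX status.  That split is an assumption, not something the question states.  To keep the sketch runnable offline, the API call and `insert_data` are replaced by the hypothetical callables `fetch` and `insert`, and a `RuntimeError` stands in for `requests.exceptions.HTTPError`.

```python
def collect_establishments(establishment_ids, fetch, insert):
    """Gather details for the IDs that exist and insert them in one call.

    fetch(establishment_id) returns (status_code, body) and insert(records)
    stores the records; both are stand-ins for the real API and database.
    """
    details = []
    for establishment_id in establishment_ids:
        status, body = fetch(establishment_id)
        if status == 404:
            continue  # unknown ID: leave it out of the result
        if 400 <= status < 600:
            raise RuntimeError(
                "request for %s failed with status %d" % (establishment_id, status)
            )
        details.append(body)
    if details:
        insert(details)  # called at most once, never with an empty list
    return details


# Made-up data standing in for the API.
fake_api = {3: {"FHRSID": 3}, 4: {"FHRSID": 4}}


def fake_fetch(establishment_id):
    if establishment_id in fake_api:
        return 200, fake_api[establishment_id]
    return 404, None


stored = []
result = collect_establishments([3, 999, 4], fake_fetch, stored.append)
```

Passing the fetch and insert steps in as arguments makes the single-call rule easy to verify: after the run, `stored` holds exactly one list.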

# Question 2

Suppose you have completed collecting the data and are storing it in a MongoDB database.  This question will require you to query that data.  The database is called `health_data`, and contains collections for each local authority (e.g., `db.southampton`, `db.swansea`, `db.westminster`), as well as one for the whole of the UK (`db.uk`).  You can see all the collections by running `db.collection_names()`.  

Note that you will need to be on the ECS network to complete this question.

## Question 2(a)
Using the `MongoClient` class in `PyMongo`, create a database object `db` with the following information.
- Server: svm-hf1g10-comp6235-temp.ecs.soton.ac.uk
- Port: 27017
- User: COMP6235
- Password: wkbbsdh8oDY2
- Database: health_data

In [56]:
"""
In this cell, the variable db should be defined, as a PyMongo database object connected to health_data.
"""

client = MongoClient("mongodb://COMP6235:wkbbsdh8oDY2@svm-hf1g10-comp6235-temp.ecs.soton.ac.uk:27017/health_data")
db = client.health_data

#db.collection_names()
#for i in db.slough.find():
#    print(i)
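
The connection string above works because the user name and password contain no reserved characters.  When they might, each part should be percent-escaped before it is placed in the URI.  The sketch below shows this with the standard library; the helper `build_mongo_uri`, the host and the credentials are placeholders invented for illustration.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, database):
    """Assemble a mongodb:// URI, escaping the user name and password."""
    return "mongodb://%s:%s@%s:%d/%s" % (
        quote_plus(user),
        quote_plus(password),
        host,
        port,
        database,
    )


# Placeholder values; "@" and "/" in the password must be escaped.
uri = build_mongo_uri("example_user", "p@ss/word", "db.example.org", 27017, "health_data")
print(uri)  # prints mongodb://example_user:p%40ss%2Fword@db.example.org:27017/health_data
```

The resulting string can be passed to `MongoClient` in the same way as the literal URI in the cell above.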
    

## Question 2(b)

Write a function `get_count`, which takes a PyMongo collection object as a parameter and returns the number of businesses in the collection.

In [29]:
def get_count(collection):
    """
    Return an integer which gives the number of businesses in the given collection
    """
    # count_documents({}) replaces Collection.count(), which was removed in PyMongo 4
    return collection.count_documents({})

#get_count(db.southampton)

## Question 2(c)

Write a function `get_rating_value_percentage` which returns the percentage of businesses which were awarded an overall `RatingValue` of 5.  The function should accept a parameter `collection` of type `Collection`, and return the percentage for that collection.

In [30]:
def get_rating_value_percentage(collection):
    """
    Return a float between 0 and 1 giving the proportion of businesses with a RatingValue of 5
    """
    total = collection.count_documents({})
    if total == 0:
        # Avoid dividing by zero when the collection is empty
        print("This collection is empty!")
        return 0
    return collection.count_documents({'RatingValue': 5}) / total
    
#get_rating_value_percentage(db.uk)
#get_rating_value_percentage(db.southampton)
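
The question asks for a percentage, while the docstring in the cell above promises a value between 0 and 1.  The arithmetic for both conventions, including the empty-collection case, can be sketched on plain counts; the helper `rating_share` is a hypothetical name used only for this illustration.

```python
def rating_share(matching, total, as_percentage=False):
    """Share of `matching` in `total`, as a fraction or as a percentage."""
    if total == 0:
        return 0.0  # an empty collection has no businesses to rate
    share = matching / total
    return share * 100 if as_percentage else share


print(rating_share(1, 4))        # prints 0.25
print(rating_share(1, 4, True))  # prints 25.0
```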

## Question 2(d)

What were the earliest and latest dates on which an inspection was carried out? Write a function which returns a dictionary in the form `{'earliest_date': 'YYYY-MM-DD', 'latest_date': 'YYYY-MM-DD'}`.

In [31]:
from datetime import datetime
def get_earliest_and_latest_dates(collection):
    earliest_date = None
    latest_date = None
    for business in collection.find():
        rating_date = business.get("RatingDate")
        if rating_date is None:
            continue  # skip businesses that have no inspection date
        if earliest_date is None or rating_date < earliest_date:
            earliest_date = rating_date
        if latest_date is None or rating_date > latest_date:
            latest_date = rating_date
    if earliest_date is None:
        return {'earliest_date': None, 'latest_date': None}
    # strftime zero-pads the month and day, giving the YYYY-MM-DD form
    return {'earliest_date': earliest_date.strftime('%Y-%m-%d'),
            'latest_date': latest_date.strftime('%Y-%m-%d')}

#get_earliest_and_latest_dates(db.southampton)
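
The `YYYY-MM-DD` form requires zero-padded months and days, which `strftime('%Y-%m-%d')` provides and plain `str(date.month)` does not.  The sketch below shows the date handling on an ordinary list, assuming the dates are `datetime` objects and that missing dates appear as `None`; `date_range` and the sample values are invented for illustration.

```python
from datetime import datetime


def date_range(rating_dates):
    """Return the earliest and latest dates as YYYY-MM-DD strings."""
    valid = [d for d in rating_dates if d is not None]
    if not valid:
        return {"earliest_date": None, "latest_date": None}
    return {
        "earliest_date": min(valid).strftime("%Y-%m-%d"),
        "latest_date": max(valid).strftime("%Y-%m-%d"),
    }


# Made-up dates, one of them missing.
sample_dates = [datetime(2016, 3, 5), None, datetime(2012, 11, 20)]
print(date_range(sample_dates))
# prints {'earliest_date': '2012-11-20', 'latest_date': '2016-03-05'}
```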

## Question 2(e)

Write a function `get_nearest_establishment_by_gps()` which returns the nearest eating establishment to the given GPS co-ordinates.  It should have two parameters:
- `collection` - A PyMongo collection object
- `gps_dict` - A dict in the format `{'lat': lat_value, 'lng': lng_value}`

The `Geocode` field has a 2dsphere index which you will need for this answer.

In [33]:
def get_nearest_establishment_by_gps(collection, gps_dict):
    # GeoJSON points list the longitude first, then the latitude
    return collection.find_one({
        'Geocode': {
            '$near': {
                '$geometry': {
                    'type': "Point",
                    'coordinates': [gps_dict['lng'], gps_dict['lat']]
                }
            }
        }
    })

#gps = {'lat': 50, 'lng': -1}
#get_nearest_establishment_by_gps(db.uk,gps)
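
A common mistake with `$near` queries is to pass the coordinates as latitude then longitude, whereas GeoJSON expects longitude first.  The sketch below builds the filter document separately so that the ordering and the value ranges can be checked without a database; `build_near_query` is a hypothetical helper and the range check is an addition that the question does not ask for.

```python
def build_near_query(gps_dict, field="Geocode"):
    """Build a $near filter for a field that has a 2dsphere index."""
    lat = float(gps_dict["lat"])
    lng = float(gps_dict["lng"])
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("lat must be in [-90, 90] and lng in [-180, 180]")
    # GeoJSON order is [longitude, latitude].
    point = {"type": "Point", "coordinates": [lng, lat]}
    return {field: {"$near": {"$geometry": point}}}


query = build_near_query({"lat": 50, "lng": -1})
print(query["Geocode"]["$near"]["$geometry"]["coordinates"])  # prints [-1.0, 50.0]
```

The returned dictionary could be passed directly to `collection.find_one(...)`, as in the cell above.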