## U37870238
### Sri Venkata Likhitha Duggi

In this notebook, I will:
1) Load the synthesized data (JSON) into a MongoDB collection

2) Demonstrate an aggregation query on the data

3) Save the query results to a BSON or JSON file.

# Import packages

In [1]:
import pymongo  # PyMongo is the Python driver for MongoDB
import credentials  # local credentials.py module that holds the username and password
import json
import bson.json_util as bju

In [2]:
connection_string = f"mongodb+srv://{credentials.username}:{credentials.password}@cluster0.efqvsfo.mongodb.net/?retryWrites=true&w=majority"
client = pymongo.MongoClient(connection_string)

# Create the database and collection in MongoDB

In [3]:
assignment_db = client['assignment']  # use the "assignment" database; MongoDB creates it on the first write if it does not exist
assignment_collection = assignment_db['assignment_data']  # collection that will hold the food documents

# Insert data from the synthesized JSON file.

The statement below deletes any pre-existing documents in the collection, so re-running the notebook does not insert the same data twice.

In [4]:
assignment_collection.delete_many({})  

<pymongo.results.DeleteResult at 0x216c5d2e040>

### Data Source and Data Structure

The data used in this script is loaded from a JSON file named "Likhitha_dataset.json." 

The data in "Likhitha_dataset.json" is a collection of JSON documents, where each document represents information about a food item. Each document has the following key attributes:

- "food_name": The name of the food item.
- "category": The cuisine to which the food item belongs (e.g., "Mexican").
- "calories": The number of calories in the food item.
- "price_usd": The price of the food item in US dollars.

A few food_name values appear more than once in the data, with different nutritional content and price.

After loading the JSON data from the file, the script inserts the data into a MongoDB collection named "assignment_data" within the "assignment" database.
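Before inserting, it can help to confirm that every document has the four attributes described above. The following is a minimal sketch, not part of the original notebook: the helper name `find_invalid_documents`, the assumption that "calories" and "price_usd" are numeric, and the sample documents are all hypothetical.

```python
# Expected shape of one document (the numeric types are an assumption)
REQUIRED_FIELDS = {
    "food_name": str,
    "category": str,
    "calories": (int, float),
    "price_usd": (int, float),
}


def find_invalid_documents(documents):
    """Return (index, reason) pairs for documents that do not match the expected shape."""
    problems = []
    for index, doc in enumerate(documents):
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in doc:
                problems.append((index, f"missing field '{field}'"))
            elif not isinstance(doc[field], expected_type):
                actual = type(doc[field]).__name__
                problems.append((index, f"field '{field}' has unexpected type {actual}"))
    return problems


# Hypothetical sample data, not taken from Likhitha_dataset.json
sample_documents = [
    {"food_name": "Taco", "category": "Mexican", "calories": 250, "price_usd": 3.5},
    {"food_name": "Burrito", "category": "Mexican", "calories": "450"},
]

problems = find_invalid_documents(sample_documents)
for index, reason in problems:
    print(f"document {index}: {reason}")
```

In the notebook, the same check could be run on the loaded list before it is passed to insert_many, so that malformed documents are caught before they reach the collection.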


In [5]:
# Open and load the JSON file
with open("C:/Users/KUSHAL/BIG DATA/MongoDB/Likhitha_dataset.json", 'r') as file:
    json_data = json.load(file)

# Insert the JSON data into the collection
assignment_collection.insert_many(json_data)

print("Data from JSON file inserted into the MongoDB collection.")


Data from JSON file inserted into the MongoDB collection.


# Aggregation Query

The objective of this query is to aggregate the data stored in the MongoDB collection. Specifically, for food items that belong to the "Mexican" category, it computes the total calories and the average price in US dollars for each distinct food name.

Changing the category value gives the same statistics for the other cuisines in the data.
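One way to make the category easy to swap is to build the pipeline from a small helper. This is a sketch under assumptions: the function name `build_category_pipeline` is hypothetical, and "Italian" is only an illustrative category that may not exist in the dataset.

```python
def build_category_pipeline(category):
    """Build a match-then-group pipeline for a single cuisine."""
    return [
        # Keep only the documents of the requested cuisine
        {"$match": {"category": category}},
        # One output document per food name, with its calorie total and mean price
        {
            "$group": {
                "_id": "$food_name",
                "total_calories": {"$sum": "$calories"},
                "average_price_usd": {"$avg": "$price_usd"},
            }
        },
    ]


# "Italian" is a hypothetical category used only for illustration
italian_pipeline = build_category_pipeline("Italian")
print(italian_pipeline[0])
```

The returned list can be passed to the collection's aggregate method in the same way as a hand-written pipeline.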

In [6]:
pipeline = [
    {
        "$match": {"category": "Mexican"}
    },
    { 
        "$group": {
            "_id": "$food_name",  
            "total_calories": {"$sum": "$calories"},
            "average_price_usd": {"$avg": "$price_usd"}
        }
    }
]

mexican_aggregation = assignment_collection.aggregate(pipeline, allowDiskUse=True)

The pipeline consists of the following stages:

`$match`: This initial stage filters the documents based on the "category" field, selecting only those documents where the category is "Mexican."

`$group`: In this stage, the script groups the selected documents by the "food_name" field. Within each group, it calculates two aggregate values:

- "total_calories": The total sum of calories for all food items in the group.
- "average_price_usd": The average price in US dollars for all food items in the group.

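The result of this aggregation pipeline is one document per distinct food name in the "Mexican" category, holding that food's total calories and average price in USD. PyMongo returns these documents through a cursor rather than as a list, which is why displaying the variable shows a CommandCursor object instead of the documents themselves.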
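To make the grouping concrete, the same computation can be sketched in plain Python on a handful of documents. This is an illustration only: the function `aggregate_by_food_name` and the sample data are hypothetical, and the sketch assumes every document has numeric "calories" and "price_usd" fields. MongoDB does not guarantee the order of grouped output, so the order here should not be relied on either.

```python
from collections import defaultdict


def aggregate_by_food_name(documents, category):
    """Plain-Python equivalent of the match-then-group pipeline."""
    groups = defaultdict(lambda: {"calories": 0, "prices": []})
    for doc in documents:
        # Filter step: skip documents from other cuisines
        if doc["category"] != category:
            continue
        # Grouping step: accumulate values per food name
        group = groups[doc["food_name"]]
        group["calories"] += doc["calories"]
        group["prices"].append(doc["price_usd"])
    return [
        {
            "_id": name,
            "total_calories": group["calories"],
            "average_price_usd": sum(group["prices"]) / len(group["prices"]),
        }
        for name, group in groups.items()
    ]


# Hypothetical sample data, not taken from Likhitha_dataset.json
sample = [
    {"food_name": "Taco", "category": "Mexican", "calories": 200, "price_usd": 3.0},
    {"food_name": "Taco", "category": "Mexican", "calories": 300, "price_usd": 5.0},
    {"food_name": "Burrito", "category": "Mexican", "calories": 500, "price_usd": 8.0},
    {"food_name": "Pizza", "category": "Italian", "calories": 700, "price_usd": 10.0},
]

results = aggregate_by_food_name(sample, "Mexican")
for row in results:
    print(row)
```

With this sample, the two "Taco" documents collapse into a single result whose calories are summed and whose price is averaged, while the "Pizza" document is filtered out.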

In [7]:
mexican_aggregation

<pymongo.command_cursor.CommandCursor at 0x216c4191400>

# Saving aggregation results in a JSON file

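The following cell writes the aggregation result to a new JSON file named "result_mexican.json" for reporting. Because a cursor can be iterated only once, the documents are materialized with list() at the moment they are written.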

In [8]:
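# Materialize the cursor and write the documents as JSON; the with block closes the file even if an error occurs
with open("result_mexican.json", "w") as result_file:
    result_file.write(bju.dumps(list(mexican_aggregation), indent=2))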
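The saved file can later be read back for reporting. The sketch below is an assumption-laden illustration: it writes hypothetical results to a temporary directory instead of touching the real "result_mexican.json", and it uses the standard json module, which is sufficient only when the results contain plain strings and numbers. If the file held BSON-specific types such as ObjectId values, bson.json_util.loads would be the matching reader.

```python
import json
import os
import tempfile

# Hypothetical results standing in for the contents of result_mexican.json
hypothetical_results = [
    {"_id": "Taco", "total_calories": 500, "average_price_usd": 4.0},
    {"_id": "Burrito", "total_calories": 650, "average_price_usd": 8.0},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "result_mexican.json")

    # Write the results in the same indented layout used above
    with open(path, "w") as result_file:
        json.dump(hypothetical_results, result_file, indent=2)

    # Read the file back into a list of dictionaries
    with open(path) as result_file:
        loaded = json.load(result_file)

# Example report: the food name with the highest calorie total
highest_calories = max(loaded, key=lambda doc: doc["total_calories"])
print(highest_calories["_id"], highest_calories["total_calories"])
```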


## Summary

In summary, this notebook loads food-related data from the JSON file, inserts it into a MongoDB collection, and then runs a MongoDB aggregation that computes the total calories and average price for each Mexican food item. The result of this aggregation is stored in a separate JSON file for future reference or reporting purposes.