# Assignment: MongoDB Python Programming
### Description: Using a Python notebook, do the following:
#### 1. Load the data into a MongoDB collection
#### 2. Demonstrate an aggregation query on the data
#### 3. Save the query results to a file in JSON or BSON format.

<p style="color:green;">This dataset contains Airbnb listings in New York City. The objective is to analyze aspects of these listings, such as pricing, room types, and host information, across the city's neighborhood groups (boroughs), in order to uncover insights about Airbnb accommodations there.</p>


### Print current working directory

In [112]:
import os
print("Current Working Directory:", os.getcwd())

Current Working Directory: c:\Users\sudes\OneDrive\Desktop\BigData\bd-f23\W05


### Structure of the dataset

<p style="color:green;">Each record includes the host's ID and name, the neighborhood group (borough) where the listing is located, the room type (e.g., Private room or Entire home/apt), the listing's price, the minimum number of nights required per booking, the number of reviews received, and the listing's availability over the year (out of 365 days). The dataset contains ten records, each representing a distinct Airbnb listing.</p>
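To make the structure above concrete, here is a minimal sketch of a per-record check that could be run before inserting anything. Only `neighbourhood_group` and `price` are confirmed by the aggregation pipeline used later in this notebook; the other field names (`host_id`, `host_name`, `room_type`, `minimum_nights`, `number_of_reviews`, `availability_365`), their types, and the helper `schema_problems` itself are assumptions for illustration.

```python
# Assumed field names and types: only "neighbourhood_group" and "price" are
# confirmed by the notebook's pipeline; the rest are guesses from the description.
EXPECTED_FIELDS = {
    "host_id": int,
    "host_name": str,
    "neighbourhood_group": str,
    "room_type": str,
    "price": (int, float),
    "minimum_nights": int,
    "number_of_reviews": int,
    "availability_365": int,
}


def schema_problems(record):
    """Return a list of human-readable problems found in one listing record (hypothetical helper)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record or record[field] is None:
            problems.append(f"missing {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        elif isinstance(record[field], bool) or not isinstance(record[field], expected_type):
            problems.append(f"{field} has type {type(record[field]).__name__}")
    # availability is described as "out of 365 days", so it should lie in 0..365
    availability = record.get("availability_365")
    if isinstance(availability, int) and not 0 <= availability <= 365:
        problems.append("availability_365 outside 0..365")
    return problems


example = {  # made-up values, for illustration only
    "host_id": 1, "host_name": "Example Host", "neighbourhood_group": "Queens",
    "room_type": "Private room", "price": 100, "minimum_nights": 2,
    "number_of_reviews": 5, "availability_365": 400,
}
print(schema_problems(example))  # ['availability_365 outside 0..365']
```

Running the check over `rating_data` (for example, `[schema_problems(r) for r in rating_data]`) would flag records that could later show up as `None` groups or averages in the aggregation.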

In [113]:
import pymongo
import credentials

connection_string = f"mongodb+srv://{credentials.username}:{credentials.password}@cluster0.lfmmfz4.mongodb.net/?retryWrites=true&w=majority"


client = pymongo.MongoClient(connection_string)
ratings_db = client['hotel_ratings']  # database named 'hotel_ratings'; MongoDB creates it on the first write
ratings_collection = ratings_db['rating']  # collection named 'rating' in that database, also created on the first insert
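The connection string above interpolates the username and password directly. PyMongo's documentation asks for credentials to be percent-escaped (for example with `urllib.parse.quote_plus`) when they contain characters such as `@`, `:` or `/`, otherwise the URI is parsed incorrectly. A minimal sketch, reusing the cluster host from the cell above; the helper name `build_atlas_uri` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, host="cluster0.lfmmfz4.mongodb.net"):
    """Build a mongodb+srv URI with the credentials percent-escaped (hypothetical helper)."""
    user = quote_plus(username)  # quote_plus escapes '@', ':' and '/' by default
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{host}/?retryWrites=true&w=majority"


# Made-up credentials, chosen to contain characters that must be escaped
print(build_atlas_uri("analyst", "p@ss:word/1"))
# mongodb+srv://analyst:p%40ss%3Aword%2F1@cluster0.lfmmfz4.mongodb.net/?retryWrites=true&w=majority
```

In the notebook this would be used as `pymongo.MongoClient(build_atlas_uri(credentials.username, credentials.password))`.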


### Now let us load the data from the JSON file.

In [114]:
import json 

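# The file must hold a JSON array of listing objects, since insert_many expects a list of documents
with open('rating_data.json', 'r', encoding='utf-8') as fin:
    rating_data = json.load(fin)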
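`insert_many` expects a non-empty sequence of documents, so it helps to check the shape of the loaded JSON before inserting. A minimal sketch under that assumption; the helper `load_listings` is hypothetical, and wrapping a single top-level object into a one-element list is a design choice, not something the original data is known to need.

```python
import json


def load_listings(path):
    """Load a JSON file and check that it holds a non-empty list of objects (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as fin:
        data = json.load(fin)
    if isinstance(data, dict):
        # A single top-level object is wrapped so insert_many still receives a list
        data = [data]
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of listing objects")
    if not all(isinstance(item, dict) for item in data):
        raise ValueError("every array element must be a JSON object")
    return data
```

Usage in the notebook would be `rating_data = load_listings('rating_data.json')`, which fails early with a clear message instead of an error from `insert_many`.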

### Insert the data into the collection

In [115]:
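# insert_many adds an '_id' field to each dict in rating_data;
# re-running this cell inserts a second copy of every listing
result = ratings_collection.insert_many(rating_data)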
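Because `insert_many` appends on every run, re-executing the insertion cell duplicates listings and skews the averages computed later. One way to keep the load idempotent is to empty the collection first. This is a sketch, assuming it is acceptable to delete every document in the collection; the helper `reload_collection` is hypothetical and only relies on the `delete_many` and `insert_many` methods of a PyMongo `Collection`.

```python
def reload_collection(collection, records):
    """Replace the contents of `collection` with `records`; return the number inserted.

    Hypothetical helper: `collection` is expected to behave like a pymongo
    Collection. WARNING: delete_many({}) removes every document in it.
    """
    collection.delete_many({})  # empty the collection so re-runs don't duplicate data
    if not records:
        return 0  # insert_many rejects an empty list
    # Insert shallow copies so the '_id' keys added by insert_many
    # do not leak into the caller's dicts
    result = collection.insert_many([dict(r) for r in records])
    return len(result.inserted_ids)
```

In the notebook this would replace the insertion line with `reload_collection(ratings_collection, rating_data)`.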

### Now let us derive some insights using an aggregation pipeline.

In [116]:
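pipeline = [
    # Stage 1: group listings by neighbourhood group and compute the mean price per group
    {
        "$group": {
            "_id": "$neighbourhood_group",
            "average_price": {"$avg": "$price"}
        }
    },
    # Stage 2: order the groups from highest to lowest average price
    {
        "$sort": {
            "average_price": -1
        }
    }
]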
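To see what the two stages compute, here is a pure-Python sketch of the same `$group`/`$avg`/`$sort` logic on a list of dicts. It assumes MongoDB's documented behavior: documents missing the group field fall into an `_id` of `null` (`None`), `$avg` ignores non-numeric values and yields `null` when a group has none, and `null` sorts below numbers, so it ends up last in a descending sort. The helper `group_avg_price` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_avg_price(listings):
    """Mimic the $group/$avg/$sort pipeline on a list of dicts (hypothetical sketch)."""
    prices = defaultdict(list)
    for doc in listings:
        key = doc.get("neighbourhood_group")  # missing field -> None, like _id: null
        bucket = prices[key]  # touching the key creates the group even with no numeric price
        price = doc.get("price")
        # $avg skips non-numeric values; bool is excluded because it subclasses int
        if isinstance(price, (int, float)) and not isinstance(price, bool):
            bucket.append(price)
    groups = [
        {"_id": key, "average_price": sum(vals) / len(vals) if vals else None}
        for key, vals in prices.items()
    ]
    # $sort -1: highest average first; None (null) sorts lowest, so it goes last
    groups.sort(key=lambda g: (g["average_price"] is not None, g["average_price"] or 0),
                reverse=True)
    return groups


sample = [  # made-up values, for illustration only
    {"neighbourhood_group": "Queens", "price": 150},
    {"neighbourhood_group": "Queens", "price": 100},
    {"neighbourhood_group": "Brooklyn", "price": 90},
    {"price": 80},
    {"neighbourhood_group": "Bronx", "price": "n/a"},
]
print(group_avg_price(sample))
# [{'_id': 'Queens', 'average_price': 125.0}, {'_id': 'Brooklyn', 'average_price': 90.0},
#  {'_id': None, 'average_price': 80.0}, {'_id': 'Bronx', 'average_price': None}]
```

The last two rows illustrate why the results are filtered for `None` values before printing.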

### Execute the aggregation query.

In [117]:
results = list(ratings_collection.aggregate(pipeline))
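The results are filtered for `None` values in Python afterwards. An alternative is to push that filtering into the pipeline with a `$match` stage, so the server never returns incomplete groups. This is a sketch of such a variant, not the notebook's original query: `{"$ne": None}` excludes documents where `neighbourhood_group` is missing or null, `{"$type": "number"}` keeps only numeric prices, and the optional `$project` stage uses `$round`, which assumes MongoDB 4.2 or newer. The name `filtered_pipeline` is hypothetical.

```python
# Hypothetical variant of the pipeline that filters incomplete documents server-side
filtered_pipeline = [
    # Keep only documents that have a neighbourhood group and a numeric price
    {"$match": {
        "neighbourhood_group": {"$ne": None},
        "price": {"$type": "number"},
    }},
    # Same grouping as the original pipeline
    {"$group": {"_id": "$neighbourhood_group", "average_price": {"$avg": "$price"}}},
    # Round to 2 decimal places on the server ($round requires MongoDB 4.2+)
    {"$project": {"average_price": {"$round": ["$average_price", 2]}}},
    # Highest average price first
    {"$sort": {"average_price": -1}},
]
```

It would be executed the same way: `results = list(ratings_collection.aggregate(filtered_pipeline))`.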

### Print and save the results

In [119]:
# Skip groups whose key or average is None: a None _id comes from documents without a
# neighbourhood_group, and a None average from groups that have no numeric price
filtered_results = [doc for doc in results if doc['_id'] is not None and doc['average_price'] is not None]

for doc in filtered_results:
    print(f"Neighborhood: {doc['_id']}, Average Price: {doc['average_price']:.2f}")

# Save the filtered results to a JSON file
with open("avg_price.json", "w", encoding="utf-8") as outfile:
    json.dump(filtered_results, outfile, indent=4)

Neighborhood: Queens, Average Price: 142.92
Neighborhood: Manhattan, Average Price: 124.80
Neighborhood: Brooklyn, Average Price: 111.00
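Here every `_id` is a plain string, so `json.dump` works directly. If the saved documents ever contain BSON-specific values such as `ObjectId` or `datetime`, `json.dump` raises a `TypeError`. A minimal sketch that falls back to `str()` for such values and reads the file back as a sanity check; the helper `save_results_json` is hypothetical. PyMongo's `bson.json_util` module is an alternative that preserves type information as Extended JSON.

```python
import json


def save_results_json(docs, path):
    """Write aggregation results to a JSON file and return them as read back (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as outfile:
        # default=str converts values json cannot encode (e.g. ObjectId, datetime) to strings
        json.dump(docs, outfile, indent=4, default=str)
    with open(path, "r", encoding="utf-8") as infile:
        return json.load(infile)
```

In the notebook this would be called as `save_results_json(filtered_results, "avg_price.json")`.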
