# MongoDB notebook
Shahzeb Imtiaz
### Dataset
World Nuclear Power Reactors (Updated 7 months ago)
https://www.kaggle.com/datasets/tariqbashir/world-nuclear-power-reactors
### Collaborators
Tariq Mahmood (Owner)

### Introduction
This dataset provides comprehensive statistics on the world's nuclear power generation resources, with detailed information about the operable nuclear power stations worldwide. It includes each reactor's location, country, name, type, commissioning date, status, and electricity production per unit.

In [None]:
#autoreload
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Load the dataset

In [None]:
!pip install pandas pymongo dnspython
# Print this machine's public IP address (e.g. to add it to the Atlas network access list)
!curl ifconfig.me

In [None]:
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
import pandas as pd

# Initiate the database connection
username = "<username>"
password = "<password>"
cluster_url = "<clusterurl>"
uri = f"mongodb+srv://{username}:{password}@{cluster_url}/?retryWrites=true&w=majority"
# If the line above causes issues on macOS, uncomment this one and fill in the values instead:
# uri = "mongodb+srv://<username here>:<password here>@<cluster url here>/?retryWrites=true&w=majority"

# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))

# Send a ping to confirm a successful connection before loading any data
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

# Create or select a database and collection
db = client["nuclear_power_reactors"]
collection = db["reactors"]

# Load the dataset into a pandas DataFrame
data = pd.read_csv("../data/World_Nuclear_Power_Reactors 2.csv", encoding="latin1")

# Convert the DataFrame to a list of dictionaries (one per row)
data_dict = data.to_dict(orient="records")

# Insert data into the MongoDB collection (re-running this cell inserts duplicate documents)
collection.insert_many(data_dict)

# Confirm insertion
print(collection.count_documents({}))
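If the username or password contains characters such as `@`, `:` or `/`, the f-string URI above will not parse correctly. The PyMongo documentation recommends percent-escaping credentials with `urllib.parse.quote_plus`. A minimal sketch, where `build_atlas_uri` is a hypothetical helper and the credentials and cluster host are made-up placeholders:

```python
from urllib.parse import quote_plus


def build_atlas_uri(username: str, password: str, cluster_url: str) -> str:
    """Build a mongodb+srv URI with percent-escaped credentials (hypothetical helper)."""
    user = quote_plus(username)  # quote_plus escapes '@', ':', '/' and similar characters
    pwd = quote_plus(password)
    return f"mongodb+srv://{user}:{pwd}@{cluster_url}/?retryWrites=true&w=majority"


# Placeholder credentials with reserved characters in the password
uri = build_atlas_uri("analyst", "p@ss:w/rd", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can be passed to `MongoClient(uri, server_api=ServerApi('1'))` exactly as in the cell above.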

List of databases:

In [3]:
database_links = client.list_database_names()
print(database_links)

['employee_db', 'nuclear_power_reactors', 'sample_airbnb', 'sample_analytics', 'sample_flix', 'sample_geospatial', 'sample_guides', 'sample_mflix', 'sample_restaurants', 'sample_supplies', 'sample_training', 'sample_weatherdata', 'test', 'test-database', 'admin', 'local']


Collections of nuclear_power_reactors:

In [4]:
current_db = "nuclear_power_reactors"
db = client[current_db]

collection_links = db.list_collection_names()
print(collection_links)

['reactors']


### Questions and answers

### 1. Easy: Retrieve Documents

Retrieve one document from the reactors collection where the Country is Japan.

Print the result.
Store the _id of the reactor into a variable.

In [5]:
result = collection.find_one({"Country": "Japan"})

print(result)

# find_one returns None when no document matches
if result is not None:
    reactor_id = result["_id"]
    print("Reactor ID:", reactor_id)

{'_id': ObjectId('66114fd996a3a5e5fdc360fb'), 'Location': nan, 'Reactor Name': 'Genkai 3', 'Model': 'M (4-loop)', 'Reactor Type': 'PWR', 'Net Capacity (MWe)': '1127', 'Construction Start': nan, 'First Grid Connection': '1993-06', 'Status': 'Working', 'Country': 'Japan', 'Unnamed: 9': nan}
Reactor ID: 66114fd996a3a5e5fdc360fb
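The document above shows that empty CSV cells were stored as float `nan` (for example `Location`). MongoDB treats NaN as a number, so a query such as `{"Location": None}` will not match those fields. A minimal sketch, assuming the records come from `data.to_dict(orient="records")` as in the loading cell, that replaces NaN with `None` so the fields are stored as `null`; `replace_nan` is a hypothetical helper and the second sample row is made up:

```python
import math
from typing import Any


def replace_nan(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the records with float NaN values replaced by None."""
    cleaned = []
    for record in records:
        cleaned.append({
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in record.items()
        })
    return cleaned


# Rows shaped like the Q1 output (the second row is hypothetical)
rows = [
    {"Reactor Name": "Genkai 3", "Location": math.nan, "Net Capacity (MWe)": "1127"},
    {"Reactor Name": "Example 1", "Location": "Somewhere", "Net Capacity (MWe)": math.nan},
]
clean_rows = replace_nan(rows)
print(clean_rows)
```

Passing `replace_nan(data_dict)` to `insert_many` instead of `data_dict` would store `null` for the empty cells; the original list is left unchanged.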


### 2. Easy: Count Documents

Count the total number of reactors where the Status is 'Working'.

In [6]:
working_count = collection.count_documents({"Status": "Working"})

print("Total number of Reactors where the Status is 'Working':", working_count)

Total number of Reactors where the Status is 'Working': 313


### 3. Easy: Find Unique Values

Retrieve all unique values of Reactor Type.

Print the unique values and their counts.

In [7]:
unique_reactor_types = collection.aggregate([
    {"$group": {"_id": "$Reactor Type", "count": {"$sum": 1}}}
])

print("Unique values of Reactor Type and their counts:")
for reactor_type in unique_reactor_types:
    print(f"{reactor_type['_id']}: {reactor_type['count']}")

Unique values of Reactor Type and their counts:
LWGR: 4
nan: 16
SPIC: 1
560: 1
FBR: 4
1720: 2
PWR: 384
CNNC & Huaneng: 1
HTGR: 3
950: 4
720: 2
CNNC: 9
Huaneng: 1
32: 2
SFR: 1
820: 1
SGR: 1
1417: 1
PWRx4: 2
OCR: 1
HWGCR: 1
BWR: 89
CGN: 9
1057: 1
GCR: 10
CGN & SPI: 3
CNNC & Huadian: 3
PHWR: 48
CGN & Datang: 2
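A `$group` stage with `{"$sum": 1}` counts documents per distinct value, which is what `collections.Counter` does in plain Python. If only the values are needed, `collection.distinct("Reactor Type")` returns them without counts. The output also contains values such as `560`, `CNNC` and `CGN & Datang` that look like capacities or operator names rather than reactor types, which suggests some CSV rows may be shifted by a column. A small offline sketch, assuming the values are stored as strings and using hypothetical sample rows and a hypothetical `looks_numeric` check, that reproduces the count and flags numeric-looking values:

```python
from collections import Counter


def count_by(records, field):
    """Count records per value of `field`, like {"$group": {"_id": "$<field>", "count": {"$sum": 1}}}."""
    return Counter(record.get(field) for record in records)


def looks_numeric(value) -> bool:
    """Hypothetical check for string values that are plain numbers rather than type codes."""
    return isinstance(value, str) and value.strip().isdigit()


sample = [
    {"Reactor Type": "PWR"},
    {"Reactor Type": "PWR"},
    {"Reactor Type": "BWR"},
    {"Reactor Type": "560"},
]
counts = count_by(sample, "Reactor Type")
suspicious = sorted(value for value in counts if looks_numeric(value))
print(counts.most_common())
print("Suspicious values:", suspicious)
```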


### 4. Easy: Simple Deletion

Delete all reactors from the reactors collection where the Status is 'Proposed'.

In [11]:
result = collection.delete_many({ "Status": "Proposed" })

deleted_count = result.deleted_count

print("Number of documents deleted:", deleted_count)


Number of documents deleted: 51


### 5. Moderate: Conditional Find

Retrieve all reactors from the reactors collection that are located in Canada and have a Reactor Type of PHWR.

Print the count of the documents.

In [53]:
query = {"Country": "Canada", "Reactor Type": "PHWR"}

# Count the documents matching the query
count = collection.count_documents(query)

# Print the count
print("Count of reactors in Canada with Reactor Type of PHWR:", count)

Count of reactors in Canada with Reactor Type of PHWR: 19


### 6. Moderate: Conditional Update

Update the status of the reactor whose First Grid Connection is 2025, changing its status to 'In the future'.

Count the updated documents.

In [54]:
query = {
    "First Grid Connection": "2025"
}

update_operation = {
    "$set": {
        "Status": "In the future"
    }
}

# update_one modifies at most one matching document; update_many would update every match
result = collection.update_one(query, update_operation)
updated_count = result.modified_count

print(f"Updated {updated_count} documents.")

Updated 1 documents.
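The exact match on `"2025"` only finds values stored as the bare year, while the Q1 output shows dates stored as `YYYY-MM` strings such as `'1993-06'`. A sketch, assuming grid connection values are strings that begin with the year, that uses an anchored `$regex` filter together with `update_many` so every matching reactor is updated. The server call is left commented out, and the offline check uses Python's `re` to show which sample values the same pattern would match:

```python
import re

# Anchored prefix: matches "2025" and "2025-MM", but not "1925-01"
YEAR_PATTERN = "^2025"

query = {"First Grid Connection": {"$regex": YEAR_PATTERN}}
update_operation = {"$set": {"Status": "In the future"}}

# With a live connection this would update every matching reactor:
# result = collection.update_many(query, update_operation)
# print(f"Updated {result.modified_count} documents.")

# Offline check of which stored values the pattern matches
samples = ["2025", "2025-03", "1993-06", "1925-01"]
matched = [value for value in samples if re.match(YEAR_PATTERN, value)]
print(matched)
```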


### 7. Moderate: Date Range Query

Retrieve all reactors from the reactors collection that started operation (First Grid Connection) between January 1, 1970, and December 31, 1979.

Print the count of the documents.

In [55]:
from datetime import datetime

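start_date = datetime(1970, 1, 1)
end_date = datetime(1979, 12, 31)
# First Grid Connection is stored as a "YYYY-MM" string (e.g. '1993-06'), so compare
# against "YYYY-MM" bounds: "1970-01" sorts before "1970-01-01" and would be missed
query = {
    "First Grid Connection": {
        "$gte": start_date.strftime("%Y-%m"),
        "$lte": end_date.strftime("%Y-%m")
    }
}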

# Count documents matching the date range query
count = collection.count_documents(query)

# Print the count
print("Count of reactors started operation between January 1, 1970, and December 31, 1979:", count)


Count of reactors started operation between January 1, 1970, and December 31, 1979: 32
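Because `First Grid Connection` holds strings, `$gte` and `$lte` compare it lexicographically (without a collation, MongoDB compares strings byte by byte). A string that is a prefix of a longer one sorts first, so a full-date bound such as `"1970-01-01"` would exclude `"1970-01"`. A sketch, assuming every value is a `YYYY-MM` string, that uses a half-open year range instead. `decade_filter` is a hypothetical helper, and the list comprehension imitates the server-side comparison on made-up sample values:

```python
def decade_filter(field: str, start_year: int, end_year: int) -> dict:
    """Half-open string range [start_year, end_year + 1) for 'YYYY-MM' values."""
    return {field: {"$gte": f"{start_year:04d}", "$lt": f"{end_year + 1:04d}"}}


query = decade_filter("First Grid Connection", 1970, 1979)
print(query)

# Lexicographic comparison, as applied to strings without a collation
print("1970-01" >= "1970-01-01")  # False: the shorter prefix sorts first

bounds = query["First Grid Connection"]
samples = ["1969-12", "1970-01", "1979-12", "1980-01"]
in_range = [v for v in samples if bounds["$gte"] <= v < bounds["$lt"]]
print(in_range)
```

The filter can be passed to `collection.count_documents(query)` unchanged. The half-open upper bound `"1980"` avoids having to know the last month's exact format.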


### 8. Moderate: Grouping and Counting

Group all reactors by Country and count the number of reactors in each country.

Print the result.

In [56]:
pipeline = [
    {"$group": {"_id": "$Country", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}}
]

result = collection.aggregate(pipeline)

# Print the result
print("Number of reactors by country:")
for doc in result:
    print(f"{doc['_id']}: {doc['count']}")

Number of reactors by country:
USA: 145
Russia: 90
France: 56
Germany: 36
South Korea: 34
Japan: 33
India: 30
China: 29
Ukraine: 21
UK : 20
Canada: 19
Spain: 7
Turkey: 7
Sweden: 6
Pakistan: 6
Czech republic: 6
Balgaria: 5
Finland: 5
Baljium: 5
Salovakia: 5
Egypt: 4
Switzerland: 4
Italy: 4
Argentina: 3
Poland: 3
Brazil: 3
Jordan: 2
Bangladesh: 2
Kazakhstan: 2
Hungary: 2
Maxico: 2
Romania: 2
Lithuania: 2
Balarus: 2
nan: 1
Netherland: 1
Armenia: 1
Iran: 1
UAE: 1
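Several group keys in this output reflect formatting or spelling problems in the source data. `UK ` has a trailing space, and names such as `Balgaria`, `Baljium` and `Maxico` appear to be misspellings, so the same country could end up split across groups. A sketch of one way to normalize the keys, assuming `Country` is a string. `$trim` (MongoDB 4.0+) only accepts strings, so a `$match` on `$type` drops the NaN row first. The `COUNTRY_FIXES` mapping and `normalize_country` helper are hypothetical and are only applied offline here:

```python
import math
from collections import Counter

# Group on a trimmed Country value; $match keeps only string values because $trim rejects others
pipeline = [
    {"$match": {"Country": {"$type": "string"}}},
    {"$group": {"_id": {"$trim": {"input": "$Country"}}, "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
# With a live connection: for doc in collection.aggregate(pipeline): print(doc)

# Hypothetical spelling fixes for names that appear misspelled in the output
COUNTRY_FIXES = {
    "Balgaria": "Bulgaria",
    "Baljium": "Belgium",
    "Salovakia": "Slovakia",
    "Maxico": "Mexico",
    "Balarus": "Belarus",
    "Netherland": "Netherlands",
}


def normalize_country(value):
    """Trim whitespace and apply spelling fixes; non-strings (e.g. NaN) become None."""
    if not isinstance(value, str):
        return None
    name = value.strip()
    return COUNTRY_FIXES.get(name, name)


sample = ["UK ", "UK", "Maxico", math.nan]
counts = Counter(normalize_country(value) for value in sample)
print(counts)
```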


### 9. Difficult: Aggregation Pipeline

Calculate the average Net Capacity (MWe) for each Reactor Type in the reactors collection for Pakistan, and sort the result in descending order.

Print the result.

In [49]:
pipeline = [
    {"$match": {"Country": "Pakistan", "Net Capacity (MWe)": {"$exists": True}}},
    {"$group": {"_id": "$Reactor Type", "average_capacity": {"$avg": {"$toDouble": "$Net Capacity (MWe)"}}}},
    {"$sort": {"average_capacity": -1}}
]

result = collection.aggregate(pipeline)

for doc in result:
    print(doc)


{'_id': 'PWR', 'average_capacity': 543.6666666666666}
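`$toDouble` raises an error and aborts the whole aggregation if any `Net Capacity (MWe)` string cannot be parsed as a number. `$convert` with `onError` and `onNull` (MongoDB 4.0+) maps such values, and missing fields, to `null`, and `$avg` skips nulls. A sketch of that variant, assuming capacities are stored as strings, plus a plain-Python average that applies the same rule offline. The sample values are hypothetical. Note that a stored NaN double still converts to NaN on the server:

```python
import math

pipeline = [
    {"$match": {"Country": "Pakistan"}},
    {"$group": {
        "_id": "$Reactor Type",
        # Unparseable or missing capacities become null, which $avg ignores
        "average_capacity": {"$avg": {"$convert": {
            "input": "$Net Capacity (MWe)", "to": "double",
            "onError": None, "onNull": None,
        }}},
    }},
    {"$sort": {"average_capacity": -1}},
]


def average_capacity(values):
    """Average the values that parse as numbers (skipping NaN); return None if none do."""
    numbers = []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        if not math.isnan(number):
            numbers.append(number)
    return sum(numbers) / len(numbers) if numbers else None


print(average_capacity(["300", "600", "n/a", None]))
```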


### 10. Challenging: Conditional Aggregation
Calculate the percentage of reactors in each country that are currently under construction.

Use the $match stage to filter reactors based on their status.


Group reactors by country and count the number of reactors under construction and total reactors for each country.


Calculate the percentage of reactors under construction for each country.

In [60]:
pipeline = [
    {"$match": {"Status": "Under Construction"}},
    {"$group": {
        "_id": "$Country",
        "under_construction_count": {"$sum": 1},
        "total_count": {"$sum": 1}
    }},
    {"$project": {
        "_id": 1,
        "under_construction_percentage": {"$multiply": [{"$divide": ["$under_construction_count", "$total_count"]}, 100]}
    }}
]

result = collection.aggregate(pipeline)

for doc in result:
    print("Country:", doc["_id"])
    print("Percentage of reactors under construction:", doc["under_construction_percentage"])


Country: Brazil
Percentage of reactors under construction: 100.0
Country: Iran
Percentage of reactors under construction: 100.0
Country: Turkey
Percentage of reactors under construction: 100.0
Country: Bangladesh
Percentage of reactors under construction: 100.0
Country: Egypt
Percentage of reactors under construction: 100.0
Country: India
Percentage of reactors under construction: 100.0
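Every country above reports 100% because `$match` runs before `$group`. Each group therefore contains only reactors that are already under construction, so `total_count` always equals `under_construction_count`. One alternative, sketched here rather than taken from the original notebook, groups over all reactors, counts the under-construction ones with `$cond`, and then uses `$match` after grouping to keep only countries that have any. The plain-Python function reproduces the same arithmetic on hypothetical rows (`Country A`, `Country B` and `Country C` are placeholders):

```python
from collections import defaultdict

pipeline = [
    {"$group": {
        "_id": "$Country",
        # 1 for each reactor under construction, 0 otherwise
        "under_construction_count": {
            "$sum": {"$cond": [{"$eq": ["$Status", "Under Construction"]}, 1, 0]}
        },
        "total_count": {"$sum": 1},
    }},
    # Keep only countries with at least one reactor under construction
    {"$match": {"under_construction_count": {"$gt": 0}}},
    {"$project": {
        "_id": 1,
        "under_construction_percentage": {"$multiply": [
            {"$divide": ["$under_construction_count", "$total_count"]}, 100
        ]},
    }},
    {"$sort": {"under_construction_percentage": -1}},
]


def under_construction_percentages(records):
    """Offline equivalent of the pipeline: percentage per country, only where > 0."""
    totals = defaultdict(int)
    building = defaultdict(int)
    for record in records:
        country = record.get("Country")
        totals[country] += 1
        if record.get("Status") == "Under Construction":
            building[country] += 1
    return {c: building[c] / totals[c] * 100 for c in totals if building[c] > 0}


sample = [
    {"Country": "Country A", "Status": "Working"},
    {"Country": "Country A", "Status": "Working"},
    {"Country": "Country A", "Status": "Working"},
    {"Country": "Country A", "Status": "Under Construction"},
    {"Country": "Country B", "Status": "Under Construction"},
    {"Country": "Country C", "Status": "Working"},
]
print(under_construction_percentages(sample))
```

With a live connection, `collection.aggregate(pipeline)` can replace the pipeline in the cell above without changing the printing loop.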
