# MongoDB and NASA API Integration - Data Analysis Course

**In this exercise** you will work with **MongoDB Atlas** and the **NASA Asteroids - NeoWs API** to retrieve, manipulate, and analyze asteroid data. The goal is to practice connecting to a MongoDB Atlas cluster, using PyMongo to interact with the database, and retrieving asteroid data from the NASA API. You will also manipulate data within MongoDB and perform simple analysis tasks such as filtering, updating, and visualizing data.

## Task 1: Connecting to MongoDB Atlas

1. **Create a MongoDB Atlas Account**:  
   - Go to [MongoDB Atlas](https://www.mongodb.com/cloud/atlas) and create an account.
   - Set up a cluster and note your connection URI.

2. **Connect to MongoDB Atlas Using Python**:  
   - Follow the instructions on [MongoDB Atlas documentation](https://www.mongodb.com/docs/atlas/connect/) to connect to your cluster from a Python environment using `PyMongo`.
   - Grant network access to your IP address under the **Network Access** tab in MongoDB Atlas.

3. **Test Your MongoDB Connection**:  
   - Verify your connection by checking the available databases and collections.  
   - Insert test data to verify your connection.  
   - Reference: [MongoDB CRUD Operations](https://www.mongodb.com/docs/manual/crud/)

## Task 2: Get Asteroid Data from NASA API

1. **Get a NASA API Key**:  
   - Visit [NASA API](https://api.nasa.gov/) to generate an API key.

2. **Fetch Asteroid Data from the NASA API**:  
   - Use the [Asteroids - NeoWs API Documentation](https://api.nasa.gov/) to fetch data on asteroids for specific dates.

## Task 3: Database Manipulation with PyMongo

1. **Find the Largest Asteroid by Diameter**:  
   - Query the database to find the asteroid with the largest diameter.

2. **Find Asteroids with a Close Approach Distance Less Than 1 Million KM**:  
   - Use the database to find asteroids that passed within 1 million kilometers of Earth.  

3. **Find Asteroids with a Specific Velocity Range**:  
   - Retrieve asteroids whose velocity is between 10 and 30 km/s and print their details.

4. **Update Asteroid Data to Mark Hazardous Asteroids**:  
   - Update asteroids with a diameter greater than 0.1 km to be marked as hazardous.

5. **Aggregate Count of Asteroids by Hazardous Status**:  
   - Use the aggregation framework to group asteroids by their hazardous status (`is_hazardous`) and count how many asteroids are hazardous and non-hazardous.

6. **Delete All Asteroids Smaller Than 0.05 km**:  
   - Remove every asteroid with a diameter below 0.05 km from the collection.


## Task 4: Data Visualization

1. **Plot the Velocities of Hazardous vs. Non-Hazardous Asteroids**:  
   - Use `matplotlib` to visualize the velocities of hazardous and non-hazardous asteroids in a single histogram plot.  

**Submission Instructions**   
Your solution should be uploaded to Git, using the same GitHub project as before.

**Useful links**  

https://www.mongodb.com/atlas   
https://www.w3schools.com/python/python_mongodb_getstarted.asp    
https://api.nasa.gov/

In [None]:
%%capture
!pip install pymongo

Get the IP address of the virtual machine so that you can grant it access in MongoDB Atlas (under Network Access).

In [None]:
!curl ipecho.net/plain

35.230.169.251

In [None]:
mongo_uri = "mongodb+srv://evelynschuller6:datanalysis24@cluster0.wzjehli.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"
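
A connection string written directly into the notebook exposes the username and password to anyone who can read the file. The sketch below shows one possible alternative, assuming the credentials live in environment variables. The variable names `MONGO_USER`, `MONGO_PASSWORD`, and `MONGO_HOST`, the placeholder defaults, and the helper `build_mongo_uri` are hypothetical and not part of the exercise.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host):
    """Build an Atlas-style SRV connection string with percent-escaped credentials."""
    # quote_plus escapes characters such as '@' or ':' that would otherwise break the URI
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}"
        "/?retryWrites=true&w=majority"
    )


# Hypothetical variable names; the placeholder defaults only keep the sketch runnable
example_uri = build_mongo_uri(
    os.environ.get("MONGO_USER", "user"),
    os.environ.get("MONGO_PASSWORD", "p@ss:word"),
    os.environ.get("MONGO_HOST", "cluster0.example.mongodb.net"),
)
print(example_uri)
```

The resulting string can be passed to `MongoClient` in the same way as `mongo_uri`.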

In [None]:
import pymongo
from pymongo import MongoClient

Creating the connection

In [None]:
client = MongoClient(mongo_uri)
db = client["test_database"]
collection = db["test_collection"]

Inserting some data. The database and the collection are created when the first document is added.

In [None]:
collection.insert_one({"name": "Test", "value": 42})

for doc in collection.find():
    print(doc)

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}


Check your current databases

In [None]:
print(client.list_database_names())

['sample_mflix', 'test_database', 'admin', 'local']


Check current collections

In [None]:
print(db.list_collection_names())

['test_collection']


Insert multiple documents into your collection

In [None]:
mylist = [
  { "name": "Amy", "address": "Apple st 652"},
  { "name": "Hannah", "address": "Mountain 21"},
  { "name": "Michael", "address": "Valley 345"},
  { "name": "Sandy", "address": "Ocean blvd 2"},
  { "name": "Betty", "address": "Green Grass 1"},
  { "name": "Richard", "address": "Sky st 331"},
  { "name": "Susan", "address": "One way 98"},
  { "name": "Vicky", "address": "Yellow Garden 2"},
  { "name": "Ben", "address": "Park Lane 38"},
  { "name": "William", "address": "Central st 954"},
  { "name": "Chuck", "address": "Main Road 989"},
  { "name": "Viola", "address": "Sideway 1633"}
]
x = collection.insert_many(mylist)

print(x.inserted_ids)

[ObjectId('67ee880b6a31ae3dcbe76096'), ObjectId('67ee880b6a31ae3dcbe76097'), ObjectId('67ee880b6a31ae3dcbe76098'), ObjectId('67ee880b6a31ae3dcbe76099'), ObjectId('67ee880b6a31ae3dcbe7609a'), ObjectId('67ee880b6a31ae3dcbe7609b'), ObjectId('67ee880b6a31ae3dcbe7609c'), ObjectId('67ee880b6a31ae3dcbe7609d'), ObjectId('67ee880b6a31ae3dcbe7609e'), ObjectId('67ee880b6a31ae3dcbe7609f'), ObjectId('67ee880b6a31ae3dcbe760a0'), ObjectId('67ee880b6a31ae3dcbe760a1')]


Getting a single document

In [None]:
collection.find_one()

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}

Getting multiple documents

In [None]:
for x in collection.find():
  print(x)

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}
{'_id': ObjectId('67ee880b6a31ae3dcbe76096'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76097'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76098'), 'name': 'Michael', 'address': 'Valley 345'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609a'), 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609b'), 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609c'), 'name': 'Susan', 'address': 'One way 98'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609d'), 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609e'), 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609f'), 'name': 'William', 'address': 'Central st 954'}
{'_id': ObjectId('67ee8

Getting certain attributes of the data

In [None]:
for x in collection.find({}, {"name": 1, "address": 1}):
  print(x)

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76096'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76097'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76098'), 'name': 'Michael', 'address': 'Valley 345'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609a'), 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609b'), 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609c'), 'name': 'Susan', 'address': 'One way 98'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609d'), 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609e'), 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609f'), 'name': 'William', 'address': 'Central st 954'}
{'_id': ObjectId('67ee880b6a31ae3dcb

Getting certain data, using filtering

In [None]:
myquery = { "address": "Park Lane 38" }
mydoc = collection.find(myquery)

for x in mydoc:
  print(x)

{'_id': ObjectId('67ee880b6a31ae3dcbe7609e'), 'name': 'Ben', 'address': 'Park Lane 38'}


Filtering with regular expressions

In [None]:
myquery = { "name": { "$regex": "^S" } }
mydoc = collection.find(myquery)

for x in mydoc:
  print(x)

{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609c'), 'name': 'Susan', 'address': 'One way 98'}


Sorting the output

In [None]:
mydoc = collection.find().sort("name")
for x in mydoc:
  print(x)

{'_id': ObjectId('67ee880b6a31ae3dcbe76096'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609e'), 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609a'), 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': ObjectId('67ee880b6a31ae3dcbe760a0'), 'name': 'Chuck', 'address': 'Main Road 989'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76097'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76098'), 'name': 'Michael', 'address': 'Valley 345'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609b'), 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609c'), 'name': 'Susan', 'address': 'One way 98'}
{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609d'), 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': ObjectId('67ee880b

Update

In [None]:
myquery = { "address": "Valley 345" }
newvalues = { "$set": { "address": "Canyon 123" } }

collection.update_one(myquery, newvalues)
for x in collection.find():
  print(x)

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}
{'_id': ObjectId('67ee880b6a31ae3dcbe76096'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76097'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76098'), 'name': 'Michael', 'address': 'Canyon 123'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609a'), 'name': 'Betty', 'address': 'Green Grass 1'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609b'), 'name': 'Richard', 'address': 'Sky st 331'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609c'), 'name': 'Susan', 'address': 'One way 98'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609d'), 'name': 'Vicky', 'address': 'Yellow Garden 2'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609e'), 'name': 'Ben', 'address': 'Park Lane 38'}
{'_id': ObjectId('67ee880b6a31ae3dcbe7609f'), 'name': 'William', 'address': 'Central st 954'}
{'_id': ObjectId('67ee8

Getting a limited number of results

In [None]:
myresult = collection.find().limit(5)

for x in myresult:
  print(x)

{'_id': ObjectId('67ee87d56a31ae3dcbe76095'), 'name': 'Test', 'value': 42}
{'_id': ObjectId('67ee880b6a31ae3dcbe76096'), 'name': 'Amy', 'address': 'Apple st 652'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76097'), 'name': 'Hannah', 'address': 'Mountain 21'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76098'), 'name': 'Michael', 'address': 'Canyon 123'}
{'_id': ObjectId('67ee880b6a31ae3dcbe76099'), 'name': 'Sandy', 'address': 'Ocean blvd 2'}


Delete one document

In [None]:
myquery = { "address": "Mountain 21" }

collection.delete_one(myquery)

DeleteResult({'n': 1, 'electionId': ObjectId('7fffffff0000000000000345'), 'opTime': {'ts': Timestamp(1743685776, 26), 't': 837}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1743685776, 26), 'signature': {'hash': b'\xe1\xb8_\x16;\x98\x12\x17\x85\xe9b\xc4}\xefX~\x81\xf2<\xc0', 'keyId': 7450200818835259411}}, 'operationTime': Timestamp(1743685776, 26)}, acknowledged=True)

Delete multiple documents

In [None]:
myquery = { "address": {"$regex": "^S"} }

collection.delete_many(myquery)

DeleteResult({'n': 2, 'electionId': ObjectId('7fffffff0000000000000345'), 'opTime': {'ts': Timestamp(1743685783, 25), 't': 837}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1743685783, 25), 'signature': {'hash': b'2\x15l\xaft<\\\x95 ,\xe2b\xb7\xf6p\xe9\x7f^k?', 'keyId': 7450200818835259411}}, 'operationTime': Timestamp(1743685783, 25)}, acknowledged=True)

Delete all

In [None]:
x = collection.delete_many({})
for x in collection.find():
  print(x)

Dropping the collection

In [None]:
collection.drop()

In [None]:
print(db.list_collection_names())

[]


Using the NASA API

In [None]:
import requests
from urllib.request import urlretrieve
from pprint import PrettyPrinter
pp = PrettyPrinter()

In [None]:
nasa_api_key = "lQ9WYs1Eoe1oLCCVxhyoQHl75feD127FQj6Ehnbn"

In [None]:
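def fetchAsteroidNeowsFeed(nasa_api_key, start_date="2020-01-22", end_date="2020-01-23"):
  """Fetch the NeoWs feed for close approaches between two dates (YYYY-MM-DD)."""
  URL_NeoFeed = "https://api.nasa.gov/neo/rest/v1/feed"
  params = {
      'api_key': nasa_api_key,
      'start_date': start_date,
      'end_date': end_date
  }
  response = requests.get(URL_NeoFeed, params=params, timeout=30)
  # Fail loudly on HTTP errors (for example an invalid key) instead of parsing an error body
  response.raise_for_status()
  return response.json()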
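
The feed endpoint only accepts a short date range per request (7 days at the time of writing; treat this limit as an assumption and check the API documentation). To cover a longer period, the range can be split into consecutive windows and the function above called once per window. The helper `date_windows` and its `max_span_days` parameter are hypothetical names used for illustration.

```python
from datetime import date, timedelta


def date_windows(start, end, max_span_days=7):
    """Split [start, end] into consecutive windows spanning at most max_span_days."""
    windows = []
    current = start
    while current <= end:
        # Each window ends max_span_days after its start, or at the overall end date
        window_end = min(current + timedelta(days=max_span_days), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        # The next window starts the day after this one ends, so no date is fetched twice
        current = window_end + timedelta(days=1)
    return windows


for window_start, window_end in date_windows(date(2020, 1, 1), date(2020, 1, 20)):
    print(window_start, window_end)
```

Each `(start, end)` pair can then be passed as `start_date` and `end_date` to `fetchAsteroidNeowsFeed`.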

In [None]:
neo_resp = fetchAsteroidNeowsFeed(nasa_api_key)

In [None]:
pp.pprint(neo_resp)

{'element_count': 41,
 'links': {'next': 'http://api.nasa.gov/neo/rest/v1/feed?start_date=2020-01-23&end_date=2020-01-24&detailed=false&api_key=lQ9WYs1Eoe1oLCCVxhyoQHl75feD127FQj6Ehnbn',
           'previous': 'http://api.nasa.gov/neo/rest/v1/feed?start_date=2020-01-21&end_date=2020-01-22&detailed=false&api_key=lQ9WYs1Eoe1oLCCVxhyoQHl75feD127FQj6Ehnbn',
           'self': 'http://api.nasa.gov/neo/rest/v1/feed?start_date=2020-01-22&end_date=2020-01-23&detailed=false&api_key=lQ9WYs1Eoe1oLCCVxhyoQHl75feD127FQj6Ehnbn'},
 'near_earth_objects': {'2020-01-22': [{'absolute_magnitude_h': 18.64,
                                        'close_approach_data': [{'close_approach_date': '2020-01-22',
                                                                 'close_approach_date_full': '2020-Jan-22 '
                                                                                             '20:52',
                                                                 'epoch_date_close_approach': 1
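
As the response above shows, the asteroids are nested by date under `near_earth_objects`, and the API returns numeric values such as the miss distance as strings. The sketch below flattens such a response into a plain list of documents and adds numeric copies of two values so they can be compared as numbers later. The field names `feed_date`, `miss_distance_km`, and `velocity_km_s`, the helper names, and the sample values are made up for illustration.

```python
def _to_float(value):
    """Convert a string such as '265498.95' to float; return None if that is not possible."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flatten_feed(feed):
    """Turn the date-keyed feed response into a flat list of asteroid documents."""
    docs = []
    for feed_date, day_asteroids in feed.get("near_earth_objects", {}).items():
        for asteroid in day_asteroids:
            doc = dict(asteroid)  # shallow copy so the original response stays untouched
            doc["feed_date"] = feed_date
            # Use the first close-approach entry; fall back to an empty dict if there is none
            first = (asteroid.get("close_approach_data") or [{}])[0]
            doc["miss_distance_km"] = _to_float(
                first.get("miss_distance", {}).get("kilometers"))
            doc["velocity_km_s"] = _to_float(
                first.get("relative_velocity", {}).get("kilometers_per_second"))
            docs.append(doc)
    return docs


# Made-up sample in the same shape as the real response
sample_feed = {
    "element_count": 2,
    "near_earth_objects": {
        "2020-01-22": [{
            "id": "1", "name": "(sample A)",
            "close_approach_data": [{
                "miss_distance": {"kilometers": "265498.95"},
                "relative_velocity": {"kilometers_per_second": "13.67"},
            }],
        }],
        "2020-01-23": [{"id": "2", "name": "(sample B)", "close_approach_data": []}],
    },
}
for doc in flatten_feed(sample_feed):
    print(doc["name"], doc["feed_date"], doc["miss_distance_km"], doc["velocity_km_s"])
```

With numeric fields stored this way, a range filter such as `{"miss_distance_km": {"$lt": 1000000}}` could run inside MongoDB instead of in a Python loop.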

Put your code down below

In [None]:
from pymongo import MongoClient

# Connect to MongoDB, reusing the connection URI defined at the top of the notebook
client = MongoClient(mongo_uri)
db = client["nasa_asteroids"]
asteroids = db["asteroids"]

# Loop through the "near_earth_objects" and insert into the collection
for date, asteroids_data in neo_resp["near_earth_objects"].items():
    for asteroid in asteroids_data:
        asteroids.insert_one(asteroid)

# Task 1: Find the Largest Asteroid by Diameter
largest_asteroid = asteroids.find_one(sort=[("estimated_diameter.meters.estimated_diameter_max", -1)])
print("Largest Asteroid by Diameter:", largest_asteroid)

Largest Asteroid by Diameter: {'_id': ObjectId('67ee8f206a31ae3dcbe760ac'), 'links': {'self': 'http://api.nasa.gov/neo/rest/v1/neo/2203015?api_key=lQ9WYs1Eoe1oLCCVxhyoQHl75feD127FQj6Ehnbn'}, 'id': '2203015', 'neo_reference_id': '2203015', 'name': '203015 (1999 YF3)', 'nasa_jpl_url': 'https://ssd.jpl.nasa.gov/tools/sbdb_lookup.html#/?sstr=2203015', 'absolute_magnitude_h': 18.64, 'estimated_diameter': {'kilometers': {'estimated_diameter_min': 0.4972273129, 'estimated_diameter_max': 1.1118340719}, 'meters': {'estimated_diameter_min': 497.2273129092, 'estimated_diameter_max': 1111.8340719346}, 'miles': {'estimated_diameter_min': 0.3089626326, 'estimated_diameter_max': 0.6908614491}, 'feet': {'estimated_diameter_min': 1631.3232572851, 'estimated_diameter_max': 3647.7496965659}}, 'is_potentially_hazardous_asteroid': False, 'close_approach_data': [{'close_approach_date': '2020-01-22', 'close_approach_date_full': '2020-Jan-22 20:52', 'epoch_date_close_approach': 1579726320000, 'relative_veloci
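
The insert loop in the cell above adds every asteroid as a new document, so loading a freshly fetched copy of the same feed a second time would store each asteroid twice. One way to make the load repeatable is to upsert on NASA's own `id` field with `replace_one(..., upsert=True)`. The sketch below assumes that `id` uniquely identifies an asteroid; `upsert_asteroids` and `FakeCollection` are hypothetical names, and `FakeCollection` is only an in-memory stand-in so the example runs without a database.

```python
def upsert_asteroids(collection, docs):
    """Insert each asteroid, or replace the stored copy that has the same NASA id."""
    processed = 0
    for doc in docs:
        # Drop any _id added by an earlier insert so the replacement cannot conflict with it
        replacement = {key: value for key, value in doc.items() if key != "_id"}
        collection.replace_one({"id": doc["id"]}, replacement, upsert=True)
        processed += 1
    return processed


class FakeCollection:
    """Minimal in-memory stand-in that mimics replace_one with upsert for this demo."""

    def __init__(self):
        self.docs = {}

    def replace_one(self, filter, replacement, upsert=False):
        if upsert or filter["id"] in self.docs:
            self.docs[filter["id"]] = replacement


fake = FakeCollection()
sample_docs = [{"id": "1", "name": "(sample A)"}, {"id": "2", "name": "(sample B)"}]
upsert_asteroids(fake, sample_docs)
upsert_asteroids(fake, sample_docs)  # second run replaces instead of duplicating
print(len(fake.docs))
```

With a real collection the call would be `upsert_asteroids(asteroids, docs)`, where `docs` is the list of asteroid dictionaries taken from the feed.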

In [None]:
# Task 2: Find Asteroids with Close Approach Distance < 1 Million KM
asteroids_with_close_approach = asteroids.find({})

# Loop through each asteroid and check the miss distance
for asteroid in asteroids_with_close_approach:
    try:
        # Extract the miss distance in kilometers as a float
        miss_distance_km = float(asteroid["close_approach_data"][0]["miss_distance"]["kilometers"])

        # Check if the distance is less than 1 million kilometers
        if miss_distance_km < 1000000:
            print("Asteroid Name:", asteroid["name"])
            print("Close Approach Date:", asteroid["close_approach_data"][0]["close_approach_date"])
            print("Miss Distance (kilometers):", miss_distance_km)
            print("-----------------------------")
    except (KeyError, IndexError, ValueError) as e:
        # Handle any missing or incorrect data
        continue

Asteroid Name: (2020 BB5)
Close Approach Date: 2020-01-22
Miss Distance (kilometers): 265498.954420583
-----------------------------
Asteroid Name: (2020 BB1)
Close Approach Date: 2020-01-23
Miss Distance (kilometers): 985846.088715167
-----------------------------
Asteroid Name: (2020 BY11)
Close Approach Date: 2020-01-23
Miss Distance (kilometers): 774528.057017361
-----------------------------
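
The loop above fetches every document and filters in Python because the API stores the miss distance as a string. The same filter can be pushed to the server with an aggregation pipeline that converts the string with `$toDouble` before matching. This is a sketch that assumes a MongoDB version supporting `$toDouble` (4.0 or later); the function name `close_approach_pipeline` and the output field `miss_km` are hypothetical.

```python
def close_approach_pipeline(max_km=1_000_000):
    """Build a pipeline returning approaches closer than max_km kilometers."""
    return [
        # One output document per close-approach entry
        {"$unwind": "$close_approach_data"},
        # Convert the string distance to a number so it can be compared
        {"$addFields": {
            "miss_km": {"$toDouble": "$close_approach_data.miss_distance.kilometers"}
        }},
        {"$match": {"miss_km": {"$lt": max_km}}},
        # Keep only the fields printed in the loop above
        {"$project": {
            "_id": 0,
            "name": 1,
            "miss_km": 1,
            "close_approach_date": "$close_approach_data.close_approach_date",
        }},
    ]


pipeline = close_approach_pipeline()
print(pipeline)
```

Running it against the collection would look like `for doc in asteroids.aggregate(close_approach_pipeline()): print(doc)`.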


In [None]:
# Task 3: Find Asteroids with a Specific Velocity Range (10 - 30 km/s)
asteroids_with_velocity_range = asteroids.find({})

# Loop through each asteroid and check the velocity
for asteroid in asteroids_with_velocity_range:
    try:
        # Extract the velocity in kilometers per second as a float
        velocity_km_per_s = float(asteroid["close_approach_data"][0]["relative_velocity"]["kilometers_per_second"])

        # Check if the velocity is between 10 km/s and 30 km/s
        if 10 <= velocity_km_per_s <= 30:
            print("Asteroid Name:", asteroid["name"])
            print("Close Approach Date:", asteroid["close_approach_data"][0]["close_approach_date"])
            print("Velocity (km/s):", velocity_km_per_s)
            print("-----------------------------")
    except (KeyError, IndexError, ValueError) as e:
        # Handle any missing or incorrect data
        continue

Asteroid Name: 203015 (1999 YF3)
Close Approach Date: 2020-01-22
Velocity (km/s): 13.6704714683
-----------------------------
Asteroid Name: (2007 AM)
Close Approach Date: 2020-01-22
Velocity (km/s): 29.1775994175
-----------------------------
Asteroid Name: (2015 MZ53)
Close Approach Date: 2020-01-22
Velocity (km/s): 16.0116618339
-----------------------------
Asteroid Name: (2020 BP8)
Close Approach Date: 2020-01-22
Velocity (km/s): 16.4924433067
-----------------------------
Asteroid Name: (2020 BP9)
Close Approach Date: 2020-01-22
Velocity (km/s): 15.9132921123
-----------------------------
Asteroid Name: (2020 BM13)
Close Approach Date: 2020-01-22
Velocity (km/s): 10.9924666083
-----------------------------
Asteroid Name: (2020 BX14)
Close Approach Date: 2020-01-22
Velocity (km/s): 21.1958363895
-----------------------------
Asteroid Name: (2020 BH15)
Close Approach Date: 2020-01-22
Velocity (km/s): 10.1120795949
-----------------------------
Asteroid Name: (2020 NG)
Close Approac
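
The velocity check above can also be expressed as a `find` filter, using `$expr` to convert the stored string to a number on the server. The sketch below mirrors the loop by looking only at the first close-approach entry. It assumes a MongoDB version with `$expr` and `$toDouble` (4.0 or later), and the function name `velocity_filter` is hypothetical.

```python
def velocity_filter(low=10, high=30):
    """Build a find() filter for asteroids whose first approach velocity is in [low, high] km/s."""
    # The field path yields one velocity string per close-approach entry; take the first
    first_velocity = {
        "$toDouble": {
            "$arrayElemAt": [
                "$close_approach_data.relative_velocity.kilometers_per_second",
                0,
            ]
        }
    }
    return {
        "$expr": {
            "$and": [
                {"$gte": [first_velocity, low]},
                {"$lte": [first_velocity, high]},
            ]
        }
    }


print(velocity_filter())
```

It would be used as `asteroids.find(velocity_filter(10, 30))`, which returns only the matching documents instead of the whole collection.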

In [None]:
# Task 4: Update Asteroid Data to Mark Hazardous Asteroids (diameter > 0.1 km)
asteroids_to_update = asteroids.find({})

for asteroid in asteroids_to_update:
    try:
        # Extract the maximum diameter in kilometers
        diameter_max = asteroid["estimated_diameter"]["kilometers"]["estimated_diameter_max"]

        # Check if the diameter is greater than 0.1 km
        if diameter_max > 0.1:
            # Update the 'is_potentially_hazardous_asteroid' field to True
            asteroids.update_one(
                {"_id": asteroid["_id"]},
                {"$set": {"is_potentially_hazardous_asteroid": True}}
            )
            print(f"Asteroid {asteroid['name']} marked as hazardous.")
        else:
            print(f"Asteroid {asteroid['name']} is not hazardous.")

    except (KeyError, IndexError, ValueError) as e:
        # Handle any missing or malformed data
        continue

Asteroid 203015 (1999 YF3) marked as hazardous.
Asteroid (2006 YU1) marked as hazardous.
Asteroid (2007 AM) marked as hazardous.
Asteroid (2015 MZ53) is not hazardous.
Asteroid (2015 XP169) is not hazardous.
Asteroid (2020 AY2) marked as hazardous.
Asteroid (2020 BU) is not hazardous.
Asteroid (2020 BB5) is not hazardous.
Asteroid (2020 BF6) is not hazardous.
Asteroid (2020 BP8) is not hazardous.
Asteroid (2020 BV8) is not hazardous.
Asteroid (2020 BP9) is not hazardous.
Asteroid (2020 BJ13) is not hazardous.
Asteroid (2020 BM13) is not hazardous.
Asteroid (2020 BX14) marked as hazardous.
Asteroid (2020 BH15) is not hazardous.
Asteroid (2020 CD) is not hazardous.
Asteroid (2020 NG) marked as hazardous.
Asteroid (2020 TK2) is not hazardous.
Asteroid (2020 UK6) is not hazardous.
Asteroid (2021 CE1) is not hazardous.
Asteroid (2023 GR1) is not hazardous.
Asteroid (2023 OX4) is not hazardous.
Asteroid (2015 JN1) marked as hazardous.
Asteroid (2015 SH) is not hazardous.
Asteroid (2018 BM5) 
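
Because the diameter is stored as a number, the update above does not need a Python loop: a single `update_many` call with a range filter can mark all matching asteroids. The sketch below writes to a separate `is_hazardous` field (the name used in the task description) so that NASA's original `is_potentially_hazardous_asteroid` value is preserved; that choice and the function name `hazard_update` are assumptions, not part of the solution above.

```python
def hazard_update(threshold_km=0.1):
    """Return the (filter, update) pair that marks large asteroids as hazardous."""
    size_filter = {
        "estimated_diameter.kilometers.estimated_diameter_max": {"$gt": threshold_km}
    }
    update = {"$set": {"is_hazardous": True}}
    return size_filter, update


size_filter, update = hazard_update()
print(size_filter)
print(update)
```

Applied to the collection, this would be `result = asteroids.update_many(size_filter, update)`, with `result.modified_count` reporting how many documents changed. Documents that do not match keep no `is_hazardous` field, so an aggregation grouped on `$is_hazardous` would report them under `None`.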

In [None]:
# Task 5: Aggregate Count of Asteroids by Hazardous Status
pipeline = [
    {
        "$group": {
            "_id": "$is_potentially_hazardous_asteroid",  # Group by hazardous status (True/False)
            "count": {"$sum": 1}  # Count the number of occurrences for each group
        }
    },
    {
        "$sort": {"_id": 1}  # Sort by hazardous status (False first, then True)
    }
]

# Perform the aggregation query
hazardous_count = asteroids.aggregate(pipeline)

# Print the results
for result in hazardous_count:
    status = "Hazardous" if result["_id"] else "Not Hazardous"
    print(f"{status}: {result['count']} asteroids")

Not Hazardous: 31 asteroids
Hazardous: 10 asteroids


In [None]:
# Task 6: Delete All Asteroids Smaller Than 0.05 km (50 meters)
result = asteroids.delete_many({
    "estimated_diameter.kilometers.estimated_diameter_max": {"$lt": 0.05}
})

# Print the number of deleted documents
print(f"Deleted {result.deleted_count} asteroids smaller than 0.05 km")

Deleted 23 asteroids smaller than 0.05 km
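
The data visualization task is not covered by the cells above. A minimal sketch of one way to approach it follows: the velocities are split by hazardous status in plain Python and then drawn as two series in a single `matplotlib` histogram. The function names, the bin count, and the sample documents are hypothetical; the field paths follow the documents shown earlier.

```python
def split_velocities(docs):
    """Return (hazardous, non_hazardous) lists of first-approach velocities in km/s."""
    hazardous, non_hazardous = [], []
    for doc in docs:
        try:
            velocity = float(
                doc["close_approach_data"][0]["relative_velocity"]["kilometers_per_second"])
        except (KeyError, IndexError, TypeError, ValueError):
            continue  # skip documents without a usable velocity
        if doc.get("is_potentially_hazardous_asteroid"):
            hazardous.append(velocity)
        else:
            non_hazardous.append(velocity)
    return hazardous, non_hazardous


def plot_velocity_histogram(hazardous, non_hazardous, bins=15):
    """Draw both groups in one histogram and return the figure."""
    import matplotlib.pyplot as plt  # imported here so split_velocities works without matplotlib

    fig, ax = plt.subplots()
    ax.hist([hazardous, non_hazardous], bins=bins, label=["Hazardous", "Non-hazardous"])
    ax.set_xlabel("Relative velocity (km/s)")
    ax.set_ylabel("Number of asteroids")
    ax.set_title("Velocities of hazardous vs. non-hazardous asteroids")
    ax.legend()
    return fig


# Made-up sample documents in the same shape as the stored asteroids
sample_docs = [
    {"is_potentially_hazardous_asteroid": True,
     "close_approach_data": [{"relative_velocity": {"kilometers_per_second": "13.5"}}]},
    {"is_potentially_hazardous_asteroid": False,
     "close_approach_data": [{"relative_velocity": {"kilometers_per_second": "7.25"}}]},
    {"is_potentially_hazardous_asteroid": False, "close_approach_data": []},
]
print(split_velocities(sample_docs))
```

In the notebook, the input would be `split_velocities(asteroids.find({}))`, and the two lists would then be passed to `plot_velocity_histogram`.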
