# Homework 2: Exploring Solar System Bodies (Pandas introduction)
Welcome to Homework 2!

In this assignment, we will analyze data about celestial bodies in the solar system using Python, NumPy, and Pandas. The goals of this assignment are to:

 - Open a simple dataset formatted as JSON using pandas.
 - Apply simple statistical analysis to real-world data.
 - Refine Python programming skills through hands-on practice.
 - Ensure you can run Python and Python notebook environments (e.g., Jupyter Notebook, JupyterLab, Colab, VSCode) and troubleshoot any setup issues.

A key part of this homework is verifying that you can successfully run Python notebooks. If you encounter any difficulties, seek help from the instructor or AIs. You can also use Slack to ask questions or share insights. If you see a classmate struggling, helping them out supports a collaborative learning environment (and may earn extra engagement points 😀).

In [1]:
# If you are running this notebook on your local machine,
# make sure all the dependencies are installed.
# The install command below may also be needed in online
# environments such as Google Colab. Comment it out if the
# packages are already installed.

!pip install numpy pandas

# Also copy the data file to the same directory as this notebook
# and update the paths accordingly.



### Instructions

1. Follow the instructions on how to set up your Python and Jupyter (or VSCode) environment and how to clone or download our repository. The instructions can be found in the class notes.
2. Ensure that you have Python, Jupyter Notebook, and the necessary libraries installed (`NumPy` and `Pandas`).
3. Load the dataset `Datasets/sol_data.json` into a Pandas DataFrame.
4. Answer the questions below by writing Python code.
5. No plots or visualizations are required—your insights should come from code-based analysis and outputs.

### Dataset Overview
The dataset contains information about celestial objects, including:
- **isPlanet**: Indicates whether the object is a planet (`True` or `False`).
- **isDwarfPlanet**: Indicates whether the object is a dwarf planet (`True` or `False`).
- **orbit_type**: Classifies the object as "Primary" (planets) or "Secondary" (moons).
- Physical and orbital properties, such as **mass**, **density**, **meanRadius**, **gravity**, **sideralOrbit**, and more.


### Submission Guidelines

- Submit your completed notebook as an HTML export or a PDF file.

To export to HTML, if you are on Jupyter, select `File` > `Export Notebook As` > `HTML`.

If you are on VSCode, you can use the `Jupyter: Export to HTML` command:
 - Open the command palette (Ctrl+Shift+P or Cmd+Shift+P on Mac).
 - Search for `Jupyter: Export to HTML`.
 - Save the HTML file to your computer and submit it via Canvas.

---

> **Hint:** If you are learning pandas, check out our tutorials or the official documentation:
> - [Pandas Getting started](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)
> - [Pandas DataFrame API Documentation](https://pandas.pydata.org/docs/reference/frame.html)
> - [Our lecture on Pandas](https://filipinascimento.github.io/usable_ai/panda_basics)
> 
> 
> **Using Generative AI Responsibly**
>
> You're welcome to use Generative AI to assist your learning, but focus on understanding the concepts rather than just solving the assignment. For example:
>
> - Instead of asking: `What's the code to count moons orbiting each planet?`
> - Try asking: `How can I use Pandas to group and count values? Can you provide examples? Can you explain the steps?`
>
> This way, you will learn how the solution works while building your skills. Remember to give context to the generative AI, so it can better assist you. Talk to the instructor and AIs if you have any questions or need insights.

In [2]:
import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_json('/Users/rad/Desktop/Useable Ai/Assignments/HW2 Week 3/sol_data.json')
# This absolute path is specific to one machine. Replace it with the location of
# your copy of the file, for example a path relative to the notebook file such as
# '../../Datasets/sol_data.json'. Double-check that the path is correct on your system.
data.head()

Unnamed: 0,eName,isPlanet,isDwarfPlanet,semimajorAxis,perihelion,aphelion,eccentricity,inclination,density,gravity,...,orbits,bondAlbido,geomAlbido,RV_abs,p_transit,transit_visibility,transit_depth,massj,semimajorAxis_AU,grav_int
0,Moon,False,False,384400,363300,405500,0.0549,5.145,3.344,1.62,...,Earth,,,,1.811589,326.086108,2.2e-09,3.9e-05,0.00257,6.606324e+25
1,Phobos,False,False,9376,9234,9518,0.0151,1.075,1.9,0.0057,...,Mars,,,,74.272078,13368.973976,2.2e-09,0.0,6.3e-05,1.601437e+22
2,Deimos,False,False,23458,23456,23471,0.0002,1.075,1.75,0.003,...,Mars,,,,29.686035,5343.486231,2.2e-09,0.0,0.000157,5.792534e+20
3,Io,False,False,421800,0,0,0.004,0.036,3.53,1.79,...,Jupiter,,,,1.6552,297.93606,6.8425e-06,4.7e-05,0.00282,6.666188e+25
4,Europa,False,False,671100,0,0,0.009,0.466,3.01,1.31,...,Jupiter,,,,1.039939,187.188949,5.024e-06,2.5e-05,0.004486,1.415488e+25
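
An absolute path like the one above only works on the machine it was written on. The following is a minimal sketch of loading the file from a relative location and failing early with a readable message when the file is missing. The helper name `load_bodies` is hypothetical, and the `Datasets/sol_data.json` location is taken from the instructions and may need adjusting for your setup.

```python
from pathlib import Path

import pandas as pd


def load_bodies(path):
    """Read a JSON dataset into a DataFrame, failing early with a clear message."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; current working directory is {Path.cwd()}"
        )
    return pd.read_json(path)


# Assumed location, taken from the instructions; adjust it to your setup.
dataset_path = Path("Datasets") / "sol_data.json"
if dataset_path.exists():
    bodies = load_bodies(dataset_path)
    print(bodies.shape)
else:
    print(f"Dataset not found at {dataset_path.resolve()}")
```

Printing the resolved path when the file is missing makes it easier to see which directory the notebook is actually running from.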


### 1. General Information

- How many objects are in the dataset?
- How many are planets? How many are moons?


In [3]:
# Total number of objects
# Fill in code to calculate total number of objects
total_objects = data.shape[0]
print(f"Total number of objects: {total_objects}")

Total number of objects: 265


In [4]:
# Number of planets
num_planets = data[data['isPlanet'] == True].shape[0]
print(f"Number of planets in the dataset: {num_planets}")

Number of planets in the dataset: 8


In [8]:
# Number of moons
# Fill in code to calculate number of moons
num_moons = data[data['orbit_type'] == 'Secondary'].shape[0]
print(f"Number of moons in the dataset: {num_moons}")

Number of moons in the dataset: 205


> **Hint**: By moon we mean a natural satellite of a planet or another object in the solar system. Take a look at the columns and see if you can identify the criteria for classifying an object as a moon. Ask the instructor or AIs for help if needed. 
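
One way to find such a criterion is to inspect how the candidate columns are distributed and how they relate to each other. The sketch below uses a small hypothetical frame that only mimics the column names of the dataset; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the real values may differ
toy = pd.DataFrame({
    "eName": ["Earth", "Moon", "Mars", "Phobos", "Pluto"],
    "isPlanet": [True, False, True, False, False],
    "orbit_type": ["Primary", "Secondary", "Primary", "Secondary", "Primary"],
    "orbits": [None, "Earth", None, "Mars", None],
})

# How many rows fall into each category of a candidate column?
print(toy["orbit_type"].value_counts())

# Cross-tabulate two candidate columns to see how they relate
type_table = pd.crosstab(toy["isPlanet"], toy["orbit_type"])
print(type_table)

# Objects that orbit another body have a non-missing "orbits" entry
has_host = toy["orbits"].notna()
print(toy.loc[has_host, "eName"].tolist())
```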

### 2. Planets

- What is the mean density of all planets?
- Which planet has the highest surface gravity, and what is its gravity value?
- List all planets in descending order of their mass.


In [9]:
planets_df = data[data['isPlanet'] == True]

In [11]:
# Mean density of all planets
planets = data[(data['isPlanet'] == True)]

mean_density = planets['density'].mean()

print(f"Mean Density of all planets: {mean_density:.2f} g/cm³")

Mean Density of all planets: 3.13 g/cm³


In [12]:
# Planet with the highest surface gravity
max_gravity_planet = planets.loc[planets['gravity'].idxmax()]

print(f"Planet with the highest gravity: {max_gravity_planet['eName']} ({max_gravity_planet['gravity']} m/s²)")

Planet with the highest gravity: Jupiter (24.79 m/s²)


In [13]:
# Planets by descending mass
sorted_planets = planets.sort_values(by='mass_kg', ascending=False)

print("Planets sorted by descending mass:\n")
for _, row in sorted_planets.iterrows():
    print(f"{row['eName']}: {row['mass_kg']:.2e} kg")

Planets sorted by descending mass:

Jupiter: 1.90e+27 kg
Saturn: 5.68e+26 kg
Neptune: 1.02e+26 kg
Uranus: 8.68e+25 kg
Earth: 5.97e+24 kg
Venus: 4.87e+24 kg
Mars: 6.42e+23 kg
Mercury: 3.30e+23 kg


### 3. Moons (Satellites)
- How many moons orbit each planet? Present this as a table or dictionary.
- What is the average radius (meanRadius) of all moons?
- Compare the average surface gravity of moons to that of planets.


In [14]:
# Number of moons orbiting each planet
moons = data[data['orbit_type'] == 'Secondary']

moons_per_planet = moons['orbits'].value_counts()

for planet, count in moons_per_planet.items():
    moon_word = "moon" if count == 1 else "moons"
    print(f"{planet}: {count} {moon_word}")

Jupiter: 79 moons
Saturn: 65 moons
Uranus: 27 moons
Neptune: 14 moons
Pluto: 5 moons
Mars: 2 moons
136108 Haumea: 2 moons
87 Sylvia: 2 moons
216 Kleopatra: 2 moons
Earth: 1 moon
136199 Eris: 1 moon
45 Eugenia: 1 moon
90482 Orcus: 1 moon
243 Ida: 1 moon
50000 Quaoar: 1 moon
136472 Makemake: 1 moon


In [15]:
# Average radius of all moons
moons = data[data['orbit_type'] == 'Secondary']
average_moon_radius = moons['meanRadius'].mean()
print(f"Average radius of all moons: {average_moon_radius:.2f} km")

Average radius of all moons: 120.96 km


In [16]:
# Compare average surface gravity of moons vs. planets

moons = data[data['orbit_type'] == 'Secondary']
avg_moon_gravity = moons['gravity'].mean()

# Compute average gravity of planets
planets = data[data['isPlanet'] == True]
avg_planet_gravity = planets['gravity'].mean()

print(f"Average surface gravity of moons: {avg_moon_gravity:.2f} m/s²")
print(f"Average surface gravity of planets: {avg_planet_gravity:.2f} m/s²")

# Compare which is higher
if avg_moon_gravity > avg_planet_gravity:
    print("Moons have a higher average surface gravity than planets.")
else:
    print("Planets have a higher average surface gravity than moons.")

Average surface gravity of moons: 0.04 m/s²
Average surface gravity of planets: 10.17 m/s²
Planets have a higher average surface gravity than moons.
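
The same comparison can also be sketched in a single pass by labeling each row and grouping on the label. The frame below is hypothetical and only mirrors the column names used above; the labels `planet`, `moon`, and `other` are names chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only
toy = pd.DataFrame({
    "eName": ["Earth", "Mars", "Moon", "Phobos", "Ceres"],
    "isPlanet": [True, True, False, False, False],
    "orbit_type": ["Primary", "Primary", "Secondary", "Secondary", "Primary"],
    "gravity": [9.8, 3.71, 1.62, 0.0057, 0.28],
})

# Label each row as planet, moon, or other, then average gravity per label
kind = np.select(
    [toy["isPlanet"], toy["orbit_type"] == "Secondary"],
    ["planet", "moon"],
    default="other",
)
mean_gravity = toy.groupby(kind)["gravity"].mean()

print(mean_gravity)
print(f"planet/moon ratio: {mean_gravity['planet'] / mean_gravity['moon']:.1f}")
```

Grouping avoids filtering the frame once per category and makes it visible when some objects belong to neither group.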


### 4. Orbital Properties

- Which object has the highest orbital eccentricity, and what is its value?
- Calculate the average semi-major axis (semimajorAxis) for planets and compare it to that of moons.
- Identify the moon with the shortest orbital period (sideralOrbit) and the planet it orbits.


In [17]:
# Highest orbital eccentricity
eccentric_object = data.loc[data["eccentricity"].idxmax()]

object_name = eccentric_object['eName']
eccentricity_value = eccentric_object['eccentricity']
print(f"Object with the highest orbital eccentricity is {object_name} with an orbital eccentricity of {eccentricity_value}")

Object with the highest orbital eccentricity is Nereid with an orbital eccentricity of 0.7512000000000001


In [18]:
# Average semi-major axis of planets vs. moons
moons = data[data['orbit_type'] == 'Secondary']
avg_moon_semi_major_axis = moons['semimajorAxis'].mean()


planets = data[data['isPlanet'] == True]
avg_planet_semi_major_axis = planets['semimajorAxis'].mean()

print(f"Average semi-major axis of moons: {avg_moon_semi_major_axis:.2f} km")
print(f"Average semi-major axis of planets: {avg_planet_semi_major_axis:.2f} km")

if avg_moon_semi_major_axis > avg_planet_semi_major_axis:
    print("Moons have a higher average semi-major axis than planets.")
else:
    print("Planets have a higher average semi-major axis than moons.")

Average semi-major axis of moons: 12257587.94 km
Average semi-major axis of planets: 1264715207.25 km
Planets have a higher average semi-major axis than moons.


In [19]:
# Moon with the shortest orbital period
shortest_orbit_moon_idx = data[data['orbit_type'] == 'Secondary']["sideralOrbit"].idxmin()
shortest_orbit_moon = data.loc[shortest_orbit_moon_idx]

planet_name = shortest_orbit_moon['orbits']
moon_name = shortest_orbit_moon['eName']

print(f"The moon with the shortest orbital period is {moon_name} and the planet it orbits is {planet_name}")

The moon with the shortest orbital period is Ferdinand and the planet it orbits is Uranus
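
This result depends on how the dataset encodes retrograde orbits. If they are stored as negative `sideralOrbit` values (an assumption worth checking against the data), `idxmin()` returns the most negative period rather than the shortest one. A minimal sketch of the difference, on hypothetical values:

```python
import pandas as pd

# Hypothetical periods in days; the negative value stands for a retrograde orbit
toy_moons = pd.DataFrame({
    "eName": ["A", "B", "C"],
    "orbits": ["X", "X", "Y"],
    "sideralOrbit": [0.32, -2800.0, 27.3],
})

# Plain idxmin picks the most negative value
naive = toy_moons.loc[toy_moons["sideralOrbit"].idxmin(), "eName"]

# Comparing magnitudes picks the shortest period regardless of direction
shortest = toy_moons.loc[toy_moons["sideralOrbit"].abs().idxmin()]

print(naive)
print(shortest["eName"], shortest["orbits"])
```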


### 5. Discovery Dates

- How many objects have recorded discovery dates?
- Which is the oldest discovered moon (except ours) for which we have recorded discovery dates, and when was it discovered?

> Look at the format of dates in the dataset. You will find NA values for objects without recorded discovery dates. Also, some dates are just a year, while others are more precise. Complete dates are formatted as `DD/MM/YYYY` (e.g., `12/04/1997`), while years are formatted as `YYYY` (e.g., `1997`). Finally, some dates may have `??` in place of the day or month, which should be cleaned up, for instance by converting `??/??/1997` to `01/01/1997` or `??/04/1997` to `01/04/1997`.

> **Hint**: With the default nanosecond resolution, pandas timestamps cannot represent dates before 1677-09-21, so `pd.to_datetime()` may fail on older discovery dates. I recommend creating a function to clean the dates and running it with `.apply()`. For example, first skip NA values, then convert the valid complete dates, padding year-only entries to a full date (such as January 1st). Alternatively, you can use `pd.Period`.

In [20]:
# Example of how to parse and clean the strings for the assignment
def preprocess_dates(date_string):
    # convert to YYYY-MM-DD
    if pd.isna(date_string):
        return pd.NA
    
    # replace ?? by 01
    date_string = date_string.replace('??', '01')

    # add 01/01 if only year is provided
    if len(date_string) == 4:
        date_string = '01/01/' + date_string
    
    # transform to YYYY-MM-DD
    date_splitted = date_string.split('/')

    # but only if the string has 3 parts (day, month, year)
    if len(date_splitted) == 3:
        day = date_splitted[0]
        month = date_splitted[1]
        year = date_splitted[2]
        return f"{year}-{month}-{day}"
        # or using pandas Period (pd.Period)
        # return pd.Period(year=int(year), month=int(month), day=int(day), freq="D")
    else:
        return pd.NA

data['parsedDiscoveryDate'] = data['discoveryDate'].apply(preprocess_dates)
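
The helper above returns `YYYY-MM-DD` strings. As a sketch of the `pd.Period` alternative mentioned in the hint, the strings can be converted to daily periods, which can represent dates earlier than the nanosecond timestamp range. The function name `to_period` and the sample dates are hypothetical.

```python
import pandas as pd


def to_period(iso_date):
    """Convert a 'YYYY-MM-DD' string to a daily pd.Period; pass missing values through."""
    if pd.isna(iso_date):
        return pd.NaT
    year, month, day = (int(part) for part in iso_date.split("-"))
    return pd.Period(year=year, month=month, day=day, freq="D")


# Hypothetical cleaned dates, including one older than the nanosecond Timestamp range
cleaned = pd.Series(["1997-04-12", "1610-01-07", pd.NA, "1789-08-28"])
periods = cleaned.apply(to_period)

# Periods compare chronologically, so min() gives the earliest date
earliest = periods.dropna().min()
print(earliest)
```

Periods compare by date rather than by text, so the minimum does not depend on how the strings happen to be formatted.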

In [21]:
# Objects with discovery dates
num_discovered_objects = data['parsedDiscoveryDate'].notna().sum()
num_discovered_objects

256

In [22]:
# Oldest discovered moon (excluding Earth's Moon)
# Compare the parsed YYYY-MM-DD strings, which sort chronologically;
# the raw DD/MM/YYYY strings would sort by day first.
moons_with_dates = data[
    (data['orbit_type'] == 'Secondary')
    & (data['eName'] != 'Moon')
    & data['parsedDiscoveryDate'].notna()
]

oldest_moon = moons_with_dates.sort_values('parsedDiscoveryDate').iloc[0]

moon_name = oldest_moon['eName']
discovery_date = oldest_moon['discoveryDate']

print(f"The oldest discovered moon is {moon_name}, discovered on {discovery_date}.")


### 6. Advanced Analysis

- Calculate the average density of moons that orbit planets with a mass greater than Earth's mass (`5.97e24 kg`).
- Group all objects by their `orbit_type` and compute the average orbital eccentricity for each group.
- Identify the top 3 moons with the highest escape velocity (escape).


In [23]:
# Average density of moons orbiting planets with mass > Earth
earth_mass = data[(data['isPlanet'] == True) & (data['eName'] == 'Earth')]['mass_kg'].values[0]

massive_planets = data[(data['isPlanet'] == True) & (data['mass_kg'] > earth_mass)]

massive_planet_names = massive_planets['eName'].unique()

moons_orbiting_massive_planets = data[(data['orbit_type'] == 'Secondary') & (data['orbits'].isin(massive_planet_names))]

average_moon_density = moons_orbiting_massive_planets['density'].mean()

print(f"Average density of moons orbiting planets more massive than Earth: {average_moon_density:.2f} g/cm³")

Average density of moons orbiting planets more massive than Earth: 1.06 g/cm³


In [24]:
# Average orbital eccentricity by orbit_type
avg_eccentricity_by_type = data.groupby('orbit_type')['eccentricity'].mean()

print("Average Orbital Eccentricity by Orbit Type:\n")
for orbit_type, eccentricity in avg_eccentricity_by_type.items():
    print(f"  {orbit_type}: {eccentricity:.6f}")


Average Orbital Eccentricity by Orbit Type:

  Primary: 0.026622
  Secondary: 0.182512


In [25]:
# Top 3 moons with highest escape velocity
moons = data[data['orbit_type'] == 'Secondary']

top_moons_escape_velocity = moons.nlargest(3, 'escape')[['eName', 'escape', 'orbits']]

print("Top 3 moons with the highest escape velocity:\n")
for index, row in top_moons_escape_velocity.iterrows():
    print(f"{row['eName']} (orbits {row['orbits']}): {row['escape']:.2f} m/s")

Top 3 moons with the highest escape velocity:

Moon (orbits Earth): 2380.00 m/s
Phobos (orbits Mars): 11.39 m/s
Deimos (orbits Mars): 5.56 m/s
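
A ranking like this depends on how complete the `escape` column is. The sketch below shows one way to count missing or zero entries before ranking; the values are hypothetical, and treating `0` as "no measurement" is an assumption that should be checked against the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical values; NaN and 0 stand in for missing measurements
toy_moons = pd.DataFrame({
    "eName": ["A", "B", "C", "D"],
    "escape": [2380.0, np.nan, 0.0, 11.4],
})

n_missing = toy_moons["escape"].isna().sum()
n_zero = (toy_moons["escape"] == 0).sum()
print(f"missing: {n_missing}, zero: {n_zero}, total: {len(toy_moons)}")

# Rank only the rows that carry a usable value
usable = toy_moons[toy_moons["escape"] > 0]
top_usable = usable.nlargest(3, "escape")[["eName", "escape"]]
print(top_usable)
```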


### 7. Extra questions

1. How many moons have a mass less than 10% of Earth's moon? What percentage of all moons does this represent?
2. Calculate the ratio of moons to planets in the dataset. Which planet has the highest number of moons relative to its mass?
3. Group moons by their host planet and calculate the average density for each group. Which planet hosts moons with the highest average density?

In [57]:
# Moons with a mass less than 10% of Earth's Moon, and their share of all moons
earth_moon_mass = data[(data['orbit_type'] == 'Secondary') & (data['orbits'] == 'Earth')]['mass_kg'].values[0]
mass_threshold = 0.1 * earth_moon_mass

lighter_moons = data[(data['orbit_type'] == 'Secondary') & (data['mass_kg'] < mass_threshold)]

num_lighter_moons = lighter_moons.shape[0]

total_moons = data[data['orbit_type'] == 'Secondary'].shape[0]

percentage_lighter_moons = (num_lighter_moons / total_moons) * 100 if total_moons > 0 else 0

print(f"Number of moons with a mass less than 10% of Earth's Moon: {num_lighter_moons}")
print(f"Percentage of such moons: {percentage_lighter_moons:.2f}%")


In [26]:
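# Ratio of moons to planets and planet with highest moon-to-mass ratio
planets = data[data['isPlanet'] == True]
moons = data[data['orbit_type'] == 'Secondary']
num_planets = planets.shape[0]
num_moons = moons.shape[0]
print(f"Ratio of moons to planets: {num_moons / num_planets:.2f}")

# Moons per kilogram of planet mass; planets without moons count as 0
planet_mass = planets.set_index('eName')['mass_kg']
moon_counts = moons['orbits'].value_counts().reindex(planet_mass.index, fill_value=0)
moons_per_kg = moon_counts / planet_mass
print(f"Planet with the most moons relative to its mass: {moons_per_kg.idxmax()}")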

In [27]:
# Average density of moons per planet
all_planets = data[data['isPlanet'] == True]['eName'].unique()

moons_filtered = moons[moons['orbits'].isin(all_planets)]

average_moon_density_by_planet = moons_filtered.groupby('orbits')['density'].mean()

print("Average Density of Moons per Planet:\n")
for planet in all_planets:
    if planet in average_moon_density_by_planet:
        print(f"{planet}: {average_moon_density_by_planet[planet]:.2f} g/cm³")
    else:
        print(f"{planet}: No moons")


Average Density of Moons per Planet:

Uranus: 1.09 g/cm³
Neptune: 1.07 g/cm³
Jupiter: 1.11 g/cm³
Mars: 1.83 g/cm³
Mercury: No moons
Saturn: 0.99 g/cm³
Earth: 3.34 g/cm³
Venus: No moons
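
To name the host with the highest average directly instead of reading it off the printed list, `idxmax()` can be applied to the grouped means. A minimal sketch on hypothetical data, where the host names `P`, `Q`, and `R` are placeholders:

```python
import pandas as pd

# Hypothetical moons and densities (g/cm³), for illustration only
toy_moons = pd.DataFrame({
    "eName": ["m1", "m2", "m3", "m4"],
    "orbits": ["P", "P", "Q", "R"],
    "density": [1.0, 2.0, 3.5, None],
})

# Mean density per host; hosts whose moons all lack a density come out as NaN
mean_density = toy_moons.groupby("orbits")["density"].mean()

# idxmax skips NaN and returns the index label (the host name) of the largest mean
densest_host = mean_density.idxmax()
print(f"{densest_host}: {mean_density[densest_host]:.2f} g/cm³")
```

A host with a single moon gets that moon's density as its average, so a high value from a group of one is not a typical density for a larger family of moons.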
