In [1]:
import pandas as pd 

data_path = "https://raw.githubusercontent.com/drguthals/learnwithdrg/main/OverTheMoon/sample-return/data/rocksamples.csv"
rock_samples = pd.read_csv(data_path)

In [2]:
rock_samples.head()

      ID   Mission    Type   Subtype  Weight (g)  Pristine (%)
0  10001  Apollo11    Soil  Unsieved       125.8         88.36
1  10002  Apollo11    Soil  Unsieved      5629.0         93.73
2  10003  Apollo11  Basalt  Ilmenite       213.0         65.56
3  10004  Apollo11    Core  Unsieved        44.8         71.76
4  10005  Apollo11    Core  Unsieved        53.4         40.31


In [3]:
rock_samples.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2229 entries, 0 to 2228
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ID            2229 non-null   int64  
 1   Mission       2229 non-null   object 
 2   Type          2229 non-null   object 
 3   Subtype       2226 non-null   object 
 4   Weight (g)    2229 non-null   float64
 5   Pristine (%)  2229 non-null   float64
dtypes: float64(2), int64(1), object(3)
memory usage: 104.6+ KB


From this output, we can see that 2,229 samples were collected from the Apollo missions, and looking at a sample of the data, we can see that each row contains:

    ID - The unique ID used to keep track of the sample at NASA.
    Mission - The mission responsible for retrieving the sample.
    Type - The type of sample (type of rock or other classification).
    Subtype - A more specific type classification.
    Weight (g) - The original weight of the sample, in grams.
    Pristine (%) - The percentage of the sample that remains (some sample is used up during research).
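
The info() output also shows that Subtype has 2,226 non-null values out of 2,229 rows, so three samples have no subtype recorded. As a minimal sketch of how such gaps can be counted and inspected, the snippet below uses a tiny hand-made frame; its IDs and its missing value are invented for illustration and are not taken from the NASA data.

```python
import pandas as pd

# Hypothetical miniature of rock_samples; the IDs and the missing
# subtype are made up for illustration only.
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "Mission": ["Apollo11", "Apollo11", "Apollo11"],
    "Type": ["Soil", "Soil", "Basalt"],
    "Subtype": ["Unsieved", None, "Ilmenite"],
})

# Count missing values in every column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only the rows where Subtype is missing.
missing_subtype_rows = sample[sample["Subtype"].isna()]
print(missing_subtype_rows)
```

Running the same two expressions on rock_samples would show which three rows lack a subtype before deciding whether that matters for the analysis.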

Part 2. Data Cleansing

It's important to establish what information you want to use to make a recommendation to the astronauts. In this case, the most publicly available information that could affect how much sample can be returned is the type of spacecraft used.

Some considerations:

    The weight of the modules (parts of the rocket) that will carry the samples back to Earth.
    The total amount of weight that the rocket can lift above the atmosphere.

Rocket weight is often measured in kilograms, not grams. We should then manipulate the original data by converting the sample weights into kilograms for easier data analysis later. Let's modify the values in the Weight (g) column to be the same value multiplied by 0.001. Then rename the column to be more accurate by changing it to Weight (kg).


In [4]:
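# Convert grams to kilograms with a vectorized multiplication.
rock_samples['Weight (g)'] = rock_samples['Weight (g)'] * 0.001
# Rename the column so that its label matches the new unit.
rock_samples.rename(columns={'Weight (g)':'Weight (kg)'}, inplace=True)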
rock_samples.head()

      ID   Mission    Type   Subtype  Weight (kg)  Pristine (%)
0  10001  Apollo11    Soil  Unsieved       0.1258         88.36
1  10002  Apollo11    Soil  Unsieved        5.629         93.73
2  10003  Apollo11  Basalt  Ilmenite        0.213         65.56
3  10004  Apollo11    Core  Unsieved       0.0448         71.76
4  10005  Apollo11    Core  Unsieved       0.0534         40.31


Pandas, the Python library we are using to do our data analysis, has a structure called a DataFrame that is really effective for representing 2D data (think of a kind of table, like in an Excel worksheet).

Let's create a new DataFrame called missions that will be a summary of data for each of the six Apollo missions that brought samples back. Create a column in this DataFrame called Mission that has one row for each mission.

In [5]:
missions = pd.DataFrame()
missions['Mission'] = rock_samples['Mission'].unique()
missions.head()

    Mission
0  Apollo11
1  Apollo12
2  Apollo14
3  Apollo15
4  Apollo16


In [6]:
missions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Mission  6 non-null      object
dtypes: object(1)
memory usage: 176.0+ bytes


In [7]:
# Sum all the values in the Weight (kg) column for each unique value in the Mission column.
sample_total_weight = rock_samples.groupby('Mission')['Weight (kg)'].sum()

# Merge the missions DataFrame with the sample_total_weight Series, using Mission as the key.
# For each value in the Mission column of missions, find the matching index label in sample_total_weight
# and add that total to the row as a new column.
missions = pd.merge(missions, sample_total_weight, on='Mission')

# Rename the column
missions.rename(columns={'Weight (kg)':'Sample weight (kg)'}, inplace=True)

# Print out the entire missions DataFrame
missions

    Mission  Sample weight (kg)
0  Apollo11            21.55424
1  Apollo12            34.34238
2  Apollo14            41.83363
3  Apollo15             75.3991
4  Apollo16            92.46262
5  Apollo17           109.44402


In [8]:
sample_total_weight

Mission
Apollo11     21.55424
Apollo12     34.34238
Apollo14     41.83363
Apollo15     75.39910
Apollo16     92.46262
Apollo17    109.44402
Name: Weight (kg), dtype: float64
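
The merge in cell 7 pairs each Mission value in the DataFrame with the matching index label of the grouped Series. As a rough sketch of that lookup, the snippet below does the same thing with Series.map and then compares the result with the merge. The small rock_samples frame and its weights are invented for illustration.

```python
import pandas as pd

# Hypothetical weights in kg, not the real sample data.
rock_samples = pd.DataFrame({
    "Mission": ["Apollo11", "Apollo11", "Apollo12", "Apollo12", "Apollo14"],
    "Weight (kg)": [0.5, 1.5, 2.0, 0.25, 3.0],
})

missions = pd.DataFrame({"Mission": rock_samples["Mission"].unique()})

# One total per mission; the result is a Series indexed by Mission.
sample_total_weight = rock_samples.groupby("Mission")["Weight (kg)"].sum()

# Look up each mission's total by its index label.
missions["Sample weight (kg)"] = missions["Mission"].map(sample_total_weight)

# The same result through the merge used in the notebook.
merged = pd.merge(missions[["Mission"]], sample_total_weight, on="Mission")
merged = merged.rename(columns={"Weight (kg)": "Sample weight (kg)"})

print(missions)
```

Both approaches rely on the Series keeping the mission names as its index, which is why the name printed above the values in cell 8 is Mission.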

In [9]:
missions['Weight diff'] = missions['Sample weight (kg)'].diff()
missions

    Mission  Sample weight (kg)  Weight diff
0  Apollo11            21.55424          NaN
1  Apollo12            34.34238     12.78814
2  Apollo14            41.83363      7.49125
3  Apollo15             75.3991     33.56547
4  Apollo16            92.46262     17.06352
5  Apollo17           109.44402      16.9814


In [10]:
# Fix first row's NaN
missions['Weight diff'] = missions['Weight diff'].fillna(value=0)
missions

    Mission  Sample weight (kg)  Weight diff
0  Apollo11            21.55424          0.0
1  Apollo12            34.34238     12.78814
2  Apollo14            41.83363      7.49125
3  Apollo15             75.3991     33.56547
4  Apollo16            92.46262     17.06352
5  Apollo17           109.44402      16.9814
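
To make the diff() step concrete, here is a small sketch that reproduces the Weight diff column from the per-mission totals shown in the tables above. It writes the same subtraction out with shift() so that the source of the first-row NaN is visible. The variable names are illustrative.

```python
import pandas as pd

# Sample weights (kg) per mission, copied from the missions table.
weights = pd.Series(
    [21.55424, 34.34238, 41.83363, 75.3991, 92.46262, 109.44402],
    index=["Apollo11", "Apollo12", "Apollo14", "Apollo15", "Apollo16", "Apollo17"],
)

# diff() subtracts the previous row from each row. The first row has
# no predecessor, so its result is NaN.
by_diff = weights.diff()

# The same calculation spelled out: each value minus the one before it.
by_shift = weights - weights.shift(1)

# Replace the leading NaN with 0, as the notebook does.
weight_diff = by_diff.fillna(0)
print(weight_diff)
```

Filling the NaN with 0 treats Apollo 11 as the baseline, so every later value reads as the change from the previous mission.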


Part 3. Add rocket weight data to the mission analysis

For lunar landings, there are two critical modules:

    Command module: The module that astronauts live in. When two astronauts are down on the surface of the Moon, the third astronaut stays in the command module. This module is returned to Earth.
    Lunar module: The module that detaches from the command module after it has reached orbit around the Moon. This module lands on the surface of the Moon and can carry two astronauts. When the lunar module returns from the surface to the command module, it leaves part of the base (the landing gear) on the surface of the Moon.

By using the NASA Space Science Data Coordinated Archive, we gathered information about each module used in each mission. Let's create six new columns, three for the lunar modules and three for the command modules:

    Module name
    Module mass
    Module mass diff

Fill in any NaN values with 0:


In [17]:
missions['Lunar module (LM)'] = ['Eagle (LM-5)', 'Intrepid (LM-6)', 'Antares (LM-8)', 'Falcon (LM-10)', 'Orion (LM-11)', 'Challenger (LM-12)']
missions['LM mass (kg)'] = [15103, 15235, 15264, 16430, 16445, 16456]
missions['LM mass diff'] = missions['LM mass (kg)'].diff()
missions['LM mass diff'] = missions['LM mass diff'].fillna(value=0)

missions['Command module (CM)'] = ['Columbia (CSM-107)', 'Yankee Clipper (CM-108)', 'Kitty Hawk (CM-110)', 'Endeavor (CM-112)', 'Casper (CM-113)', 'America (CM-114)']
missions['CM mass (kg)'] = [5560, 5609, 5758, 5875, 5840, 5960]
missions['CM mass diff'] = missions['CM mass (kg)'].diff()
missions['CM mass diff'] = missions['CM mass diff'].fillna(value=0)

missions


    Mission  Sample weight (kg)  Weight diff   Lunar module (LM)  LM mass (kg)  LM mass diff      Command module (CM)  CM mass (kg)  CM mass diff
0  Apollo11            21.55424          0.0        Eagle (LM-5)         15103           0.0       Columbia (CSM-107)          5560           0.0
1  Apollo12            34.34238     12.78814     Intrepid (LM-6)         15235         132.0  Yankee Clipper (CM-108)          5609          49.0
2  Apollo14            41.83363      7.49125      Antares (LM-8)         15264          29.0      Kitty Hawk (CM-110)          5758         149.0
3  Apollo15             75.3991     33.56547      Falcon (LM-10)         16430        1166.0        Endeavor (CM-112)          5875         117.0
4  Apollo16            92.46262     17.06352       Orion (LM-11)         16445          15.0          Casper (CM-113)          5840         -35.0
5  Apollo17           109.44402      16.9814  Challenger (LM-12)         16456          11.0         America (CM-114)          5960         120.0
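
The name, mass, and mass diff columns follow the same pattern for both module types, so the pattern could be wrapped in a small function. The sketch below is one possible way to do that. The helper add_module_columns is hypothetical and not part of the original notebook, and it is shown on the first three missions only.

```python
import pandas as pd


def add_module_columns(df, label, abbrev, names, masses):
    """Add name, mass, and mass-diff columns for one module type.

    Hypothetical helper, written only to illustrate the repeated pattern.
    """
    df[f"{label} ({abbrev})"] = names
    df[f"{abbrev} mass (kg)"] = masses
    # Change from the previous mission; the first mission has no
    # predecessor, so its NaN is replaced with 0.
    df[f"{abbrev} mass diff"] = df[f"{abbrev} mass (kg)"].diff().fillna(0)
    return df


missions = pd.DataFrame({"Mission": ["Apollo11", "Apollo12", "Apollo14"]})
add_module_columns(
    missions,
    "Lunar module",
    "LM",
    ["Eagle (LM-5)", "Intrepid (LM-6)", "Antares (LM-8)"],
    [15103, 15235, 15264],
)
print(missions)
```

Calling the function a second time with "Command module" and "CM" would add the other three columns in the same way.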


In [18]:
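# Combined mass of the lunar module and command module for each mission.
missions['Total weight (kg)'] = missions['LM mass (kg)'] + missions['CM mass (kg)']
# Mission-to-mission change in that combined mass.
missions['Total weight diff'] = missions['LM mass diff'] + missions['CM mass diff']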
missions

    Mission  Sample weight (kg)  Weight diff   Lunar module (LM)  LM mass (kg)  LM mass diff      Command module (CM)  CM mass (kg)  CM mass diff  Total weight (kg)  Total weight diff
0  Apollo11            21.55424          0.0        Eagle (LM-5)         15103           0.0       Columbia (CSM-107)          5560           0.0              20663                0.0
1  Apollo12            34.34238     12.78814     Intrepid (LM-6)         15235         132.0  Yankee Clipper (CM-108)          5609          49.0              20844              181.0
2  Apollo14            41.83363      7.49125      Antares (LM-8)         15264          29.0      Kitty Hawk (CM-110)          5758         149.0              21022              178.0
3  Apollo15             75.3991     33.56547      Falcon (LM-10)         16430        1166.0        Endeavor (CM-112)          5875         117.0              22305             1283.0
4  Apollo16            92.46262     17.06352       Orion (LM-11)         16445          15.0          Casper (CM-113)          5840         -35.0              22285              -20.0
5  Apollo17           109.44402      16.9814  Challenger (LM-12)         16456          11.0         America (CM-114)          5960         120.0              22416              131.0
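
Total weight diff is built by adding the two per-module diff columns. As a quick sanity check, the sketch below confirms that this gives the same numbers as taking diff() of the Total weight (kg) column directly. It uses the module masses listed above, and the variable names are illustrative.

```python
import pandas as pd

# Module masses (kg) copied from the missions table.
missions = pd.DataFrame({
    "LM mass (kg)": [15103, 15235, 15264, 16430, 16445, 16456],
    "CM mass (kg)": [5560, 5609, 5758, 5875, 5840, 5960],
})
missions["Total weight (kg)"] = missions["LM mass (kg)"] + missions["CM mass (kg)"]

# Sum of the per-module differences, as in the notebook.
sum_of_diffs = (
    missions["LM mass (kg)"].diff().fillna(0)
    + missions["CM mass (kg)"].diff().fillna(0)
)

# Difference of the combined totals, computed directly.
diff_of_totals = missions["Total weight (kg)"].diff().fillna(0)

print(list(sum_of_diffs) == list(diff_of_totals))
```

The two agree because subtraction distributes over the sum, so either form can be used to build the column.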
