# Project: Over the Moon

by DARJYO, 10 Sep 2022



This is the project notebook for Microsoft's Explore space with Python.


Though NASA's Astromaterials Acquisition and Curation Office (https://curator.jsc.nasa.gov/) has a list of all extraterrestrial samples from the Moon and elsewhere, this Project focuses on those collected from the Moon. The six Apollo missions that landed humans on the Moon, together with the three lunar missions that landed robotic probes on the Moon, successfully returned 383 kilograms (over 800 pounds) of rocks, soil, and core samples. All of these samples have been categorized and photographed and are available to view in the NASA Lunar Sample and Photo Catalog (https://curator.jsc.nasa.gov/lunar/samplecatalog/index.cfm).


This Project begins to explore how you can use data and a bit of Python coding to come up with an understanding of what we have, and make recommendations for what the next pair of astronauts should look for when they land on the Moon as part of the Artemis program in 2024.

This Project works through the following steps:
   - Gather information about samples brought back from the Moon via the Apollo missions.
   - Acquire data about the types of spacecraft and rockets used for the Apollo and upcoming Artemis missions.
   - Compile DataFrames, or tables, of that data, which tell stories and provide insights.
   - Create a prediction of how much sample weight could be returned from the Artemis missions.
   - Make a recommendation for the amount and types of rocks the Artemis astronauts should focus their efforts on, based on the rocks that are currently being used for research here on Earth.





# The Apollo program

The Apollo program focused on using the Saturn V rocket to send humans into space and onto the Moon. The Saturn V (https://www.nasa.gov/centers/johnson/rocketpark/saturn_v.html) rocket that was used in the Apollo program is known as a three-stage rocket. This means that the rocket has three parts, each of which burns at a different time to achieve a different goal.

The first stage is the main thrust portion that gets the rocket to about 68 kilometers into the sky, and then it falls away back to Earth, making the rocket significantly lighter. The second stage starts burning its engines until the rocket nearly reaches Earth's orbit and likewise falls back down to Earth. The final stage gets the spacecraft into Earth's orbit and thrusts it toward the Moon.

An important detail of the Apollo program and the Saturn V is that lunar landings rely on two important modules:

- Command module: The module that astronauts live in. When two astronauts are down on the surface of the Moon, the third astronaut stays in the command module. This module is returned to Earth.
- Lunar module: The module that detaches from the command module after it has reached orbit around the Moon. This module lands on the surface of the Moon and can carry two astronauts. When the lunar module returns from the surface to the command module, it leaves part of the base (the landing gear) on the surface of the Moon.


The modules are critical parts of the ship because they are designed precisely to ensure that the astronauts can enter the Moon's orbit, orbit the Moon, land on the Moon, launch from the Moon, and return safely to Earth. The amount of space and weight on each of these modules is precise to ensure the safety and success of the mission. It is fair to conclude that the specifications around these modules affect the amount of mineral samples that can be collected, because the samples have to be carried on each of these modules before returning to Earth.

# Getting the data

The data explored during this Project is a file full of all the samples collected from the six Apollo missions that landed on the Moon. Right-click on the following URL and choose 'Open Link in New Window' (or similar, depending on your browser):

https://curator.jsc.nasa.gov/lunar/samplecatalog/index.cfm

You can copy the data directly from the browser into a text editor like Notepad or TextEdit to build a single file with as many samples as you wish, or all of them.

Now load the CSV file into a DataFrame, making sure that any extra spaces are skipped:

In [1]:
import pandas as pd

In [2]:
rock_samples = pd.read_csv('data/rocksamples.csv', skipinitialspace=True)  # skip spaces that follow each comma

In [3]:
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (g),Pristine (%)
0,10001,Apollo11,Soil,Unsieved,125.8,88.36
1,10002,Apollo11,Soil,Unsieved,5629.0,93.73
2,10003,Apollo11,Basalt,Ilmenite,213.0,65.56
3,10004,Apollo11,Core,Unsieved,44.8,71.76
4,10005,Apollo11,Core,Unsieved,53.4,40.31
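The load step above aims to skip extra spaces. In pandas that behavior comes from the `skipinitialspace` parameter of `read_csv`. Here is a minimal sketch of its effect, assuming a hypothetical three-row excerpt with stray spaces after the commas (the real file may not contain any):

```python
import io
import pandas as pd

# Hypothetical excerpt of the CSV with a stray space after every comma.
raw_csv = """ID, Mission, Type, Subtype, Weight (g), Pristine (%)
10001, Apollo11, Soil, Unsieved, 125.8, 88.36
10002, Apollo11, Soil, Unsieved, 5629.0, 93.73
10003, Apollo11, Basalt, Ilmenite, 213.0, 65.56
"""

# skipinitialspace=True drops whitespace that follows each delimiter,
# so headers and string values come back without a leading space.
demo_samples = pd.read_csv(io.StringIO(raw_csv), skipinitialspace=True)

print(demo_samples.columns.tolist())
print(demo_samples['Mission'].unique())
```

Without the parameter, the column would be named `' Mission'` (with a leading space), and later lookups such as `rock_samples['Mission']` would raise a `KeyError`.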


In [4]:
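# Note: without parentheses, this displays the bound method, whose repr shows the whole DataFrame.
# Call rock_samples.info() to get the per-column summary instead.
rock_samples.info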

<bound method DataFrame.info of          ID   Mission     Type   Subtype  Weight (g)  Pristine (%)
0     10001  Apollo11     Soil  Unsieved      125.80         88.36
1     10002  Apollo11     Soil  Unsieved     5629.00         93.73
2     10003  Apollo11   Basalt  Ilmenite      213.00         65.56
3     10004  Apollo11     Core  Unsieved       44.80         71.76
4     10005  Apollo11     Core  Unsieved       53.40         40.31
...     ...       ...      ...       ...         ...           ...
2224  79528  Apollo17  Breccia  Regolith        2.38        100.00
2225  79529  Apollo17  Breccia  Regolith        1.84        100.00
2226  79535  Apollo17  Breccia  Regolith        1.69        100.00
2227  79536  Apollo17  Breccia  Regolith        1.66        100.00
2228  79537  Apollo17  Breccia  Regolith        1.05        100.00

[2229 rows x 6 columns]>

In [5]:
rock_samples.dtypes

ID                int64
Mission          object
Type             object
Subtype          object
Weight (g)      float64
Pristine (%)    float64
dtype: object

Next, convert the values in the Weight (g) column to kilograms by multiplying them by 0.001, then rename the column to Weight (kg) so that the name stays accurate.

In [6]:
# Convert grams to kilograms with a vectorized multiply, then rename the column to match
rock_samples['Weight (g)'] = rock_samples['Weight (g)'] * 0.001
rock_samples.rename(columns={'Weight (g)':'Weight (kg)'}, inplace=True)
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%)
0,10001,Apollo11,Soil,Unsieved,0.1258,88.36
1,10002,Apollo11,Soil,Unsieved,5.629,93.73
2,10003,Apollo11,Basalt,Ilmenite,0.213,65.56
3,10004,Apollo11,Core,Unsieved,0.0448,71.76
4,10005,Apollo11,Core,Unsieved,0.0534,40.31


# Creating a new DataFrame

The rock_samples DataFrame has a row for every sample that was collected but, as mentioned earlier, we want to understand the rock samples in total as they relate to the specific rockets that brought them back.

Create a new DataFrame called missions that will hold a summary of the data for each of the six Apollo missions that brought samples back. Start with a column called Mission that has one row for each mission.

In [7]:
missions = pd.DataFrame()
missions['Mission'] = rock_samples['Mission'].unique()
missions.head()

Unnamed: 0,Mission
0,Apollo11
1,Apollo12
2,Apollo14
3,Apollo15
4,Apollo16


In [8]:
missions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Mission  6 non-null      object
dtypes: object(1)
memory usage: 176.0+ bytes


# Sum total sample weight by mission

Add a new column to the missions DataFrame to represent the sum of all samples collected on that mission.

First, group the rows of rock_samples by the unique values in the Mission column and select the Weight (kg) column.

Then sum the Weight (kg) values for each mission, and merge the missions DataFrame with the resulting sample_total_weight series, using Mission as the key to merge on.

In [9]:
sample_total_weight = rock_samples.groupby('Mission')['Weight (kg)'].sum()
missions = pd.merge(missions, sample_total_weight, on='Mission')
missions.rename(columns={'Weight (kg)':'Sample weight (kg)'}, inplace=True)
missions

Unnamed: 0,Mission,Sample weight (kg)
0,Apollo11,21.55424
1,Apollo12,34.34238
2,Apollo14,41.83363
3,Apollo15,75.3991
4,Apollo16,92.46262
5,Apollo17,109.44402
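The group, sum, and merge steps can be tried in isolation. The sketch below uses a hypothetical five-row table (the weights are made up for illustration) and calls `reset_index()` so that the merge key is an ordinary column:

```python
import pandas as pd

# Hypothetical mini version of rock_samples (weights already in kg).
demo_samples = pd.DataFrame({
    'Mission': ['Apollo11', 'Apollo11', 'Apollo12', 'Apollo12', 'Apollo12'],
    'Weight (kg)': [0.5, 1.5, 2.0, 0.25, 0.75],
})

# One row per mission, in order of first appearance.
demo_missions = pd.DataFrame({'Mission': demo_samples['Mission'].unique()})

# Sum per mission; reset_index() turns the Mission index back into a column.
totals = demo_samples.groupby('Mission')['Weight (kg)'].sum().reset_index()
totals = totals.rename(columns={'Weight (kg)': 'Sample weight (kg)'})

# A left merge keeps every mission row even if it had no samples.
demo_missions = demo_missions.merge(totals, on='Mission', how='left')
print(demo_missions)
```

The choice of `how='left'` is a design preference in this sketch: a mission with no matching samples would show up as NaN instead of silently disappearing.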


# Get the difference in weights across missions
I am not a rocket expert, so it's important to look at many different cross sections of the data that are available. In this case, we can see that the total weight of the samples increased with each mission, but it's hard to immediately see by how much. So, we add one more column to the missions DataFrame that holds the difference between the current row and the row preceding it:

In [10]:
missions['Weight diff'] = missions['Sample weight (kg)'].diff()
missions

Unnamed: 0,Mission,Sample weight (kg),Weight diff
0,Apollo11,21.55424,
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814


Notice that in the first row, for Apollo11, the value in the Weight diff column is empty (NaN). Because Apollo11 was the first mission, there is no preceding row to subtract from, so diff() has nothing to compare against. We can fill this NaN value with 0:

In [11]:
missions['Weight diff'] = missions['Weight diff'].fillna(value=0)
missions

Unnamed: 0,Mission,Sample weight (kg),Weight diff
0,Apollo11,21.55424,0.0
1,Apollo12,34.34238,12.78814
2,Apollo14,41.83363,7.49125
3,Apollo15,75.3991,33.56547
4,Apollo16,92.46262,17.06352
5,Apollo17,109.44402,16.9814
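As a quick check on what `diff()` and `fillna()` do, here is a small sketch that uses the six mission totals from the table above as a plain Series:

```python
import pandas as pd

# Mission totals copied from the table above (kg).
weights = pd.Series(
    [21.55424, 34.34238, 41.83363, 75.3991, 92.46262, 109.44402],
    index=['Apollo11', 'Apollo12', 'Apollo14', 'Apollo15', 'Apollo16', 'Apollo17'],
)

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so it comes back as NaN.
diffs = weights.diff()
first_is_nan = pd.isna(diffs.iloc[0])

# Replace that NaN with 0, as in the notebook.
filled = diffs.fillna(0)

# The differences telescope: their sum equals last total minus first total.
growth = weights.iloc[-1] - weights.iloc[0]
print(filled)
print(filled.sum(), growth)
```

The telescoping sum is a handy sanity check: if the summed differences do not match the overall growth from Apollo11 to Apollo17, a row is probably missing or out of order.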


# Adding command and lunar module data
Using the NASA Space Science Data Coordinated Archive (https://nssdc.gsfc.nasa.gov/nmc/SpacecraftQuery.jsp), I gathered information about each module used in each mission. We add six new columns to the missions table, three for the lunar modules and three for the command modules:

- Module name
- Module mass
- Module mass diff

As before, fill in any NaN values produced by diff() with 0:

In [12]:
missions['Lunar module (LM)'] = ['Eagle (LM-5)', 'Intrepid (LM-6)', 'Antares (LM-8)', 'Falcon (LM-10)', 'Orion (LM-11)', 'Challenger (LM-12)']
missions['LM mass (kg)'] = [15103, 15235, 15264, 16430, 16445, 16456]
missions['LM mass diff'] = missions['LM mass (kg)'].diff()
missions['LM mass diff'] = missions['LM mass diff'].fillna(value=0)

missions['Command module (CM)'] = ['Columbia (CSM-107)', 'Yankee Clipper (CM-108)', 'Kitty Hawk (CM-110)', 'Endeavor (CM-112)', 'Casper (CM-113)', 'America (CM-114)']
missions['CM mass (kg)'] = [5557, 5609, 5758, 5875, 5840, 5960]
missions['CM mass diff'] = missions['CM mass (kg)'].diff()
missions['CM mass diff'] = missions['CM mass diff'].fillna(value=0)

missions

Unnamed: 0,Mission,Sample weight (kg),Weight diff,Lunar module (LM),LM mass (kg),LM mass diff,Command module (CM),CM mass (kg),CM mass diff
0,Apollo11,21.55424,0.0,Eagle (LM-5),15103,0.0,Columbia (CSM-107),5557,0.0
1,Apollo12,34.34238,12.78814,Intrepid (LM-6),15235,132.0,Yankee Clipper (CM-108),5609,52.0
2,Apollo14,41.83363,7.49125,Antares (LM-8),15264,29.0,Kitty Hawk (CM-110),5758,149.0
3,Apollo15,75.3991,33.56547,Falcon (LM-10),16430,1166.0,Endeavor (CM-112),5875,117.0
4,Apollo16,92.46262,17.06352,Orion (LM-11),16445,15.0,Casper (CM-113),5840,-35.0
5,Apollo17,109.44402,16.9814,Challenger (LM-12),16456,11.0,America (CM-114),5960,120.0


Next, add the total weight and the total difference for each mission across both the lunar and command modules:

In [13]:
missions['Total weight (kg)'] = missions['LM mass (kg)'] + missions['CM mass (kg)']
missions['Total weight diff'] = missions['LM mass diff'] + missions['CM mass diff']
missions

Unnamed: 0,Mission,Sample weight (kg),Weight diff,Lunar module (LM),LM mass (kg),LM mass diff,Command module (CM),CM mass (kg),CM mass diff,Total weight (kg),Total weight diff
0,Apollo11,21.55424,0.0,Eagle (LM-5),15103,0.0,Columbia (CSM-107),5557,0.0,20660,0.0
1,Apollo12,34.34238,12.78814,Intrepid (LM-6),15235,132.0,Yankee Clipper (CM-108),5609,52.0,20844,184.0
2,Apollo14,41.83363,7.49125,Antares (LM-8),15264,29.0,Kitty Hawk (CM-110),5758,149.0,21022,178.0
3,Apollo15,75.3991,33.56547,Falcon (LM-10),16430,1166.0,Endeavor (CM-112),5875,117.0,22305,1283.0
4,Apollo16,92.46262,17.06352,Orion (LM-11),16445,15.0,Casper (CM-113),5840,-35.0,22285,-20.0
5,Apollo17,109.44402,16.9814,Challenger (LM-12),16456,11.0,America (CM-114),5960,120.0,22416,131.0


We now have a picture of each of the six Apollo missions that landed on the Moon. This picture contains information about the samples that each mission collected and the weights of each lunar and command module.

# Compare the data
The interesting thing about predicting how much sample each Artemis mission can bring back is that we don't yet know the full specs of the spacecraft that the Artemis program plans on using. Using some information from the NASA Factsheet on the Space Launch System (SLS) and Orion Modules (https://www.nasa.gov/sites/default/files/atoms/files/0080_sls_fact_sheet_sept2020_09082020_final_0.pdf), we have data on weights and payloads.

Read about payload here: https://en.wikipedia.org/wiki/Payload

We know that the Saturn V payload was 48,500 kg, and the weights of the modules varied from mission to mission. So, to determine the ratios that will allow us to make predictions about the Artemis missions, we can use:

- Saturn V payload
- Mission sample weight
- Mission module weight

In [14]:
# Sample-to-weight ratio
saturnVPayload = 48500
missions['Crewed area : Payload'] = missions['Total weight (kg)'] / saturnVPayload
missions['Sample : Crewed area'] = missions['Sample weight (kg)'] / missions['Total weight (kg)']
missions['Sample : Payload'] = missions['Sample weight (kg)'] / saturnVPayload
missions

Unnamed: 0,Mission,Sample weight (kg),Weight diff,Lunar module (LM),LM mass (kg),LM mass diff,Command module (CM),CM mass (kg),CM mass diff,Total weight (kg),Total weight diff,Crewed area : Payload,Sample : Crewed area,Sample : Payload
0,Apollo11,21.55424,0.0,Eagle (LM-5),15103,0.0,Columbia (CSM-107),5557,0.0,20660,0.0,0.425979,0.001043,0.000444
1,Apollo12,34.34238,12.78814,Intrepid (LM-6),15235,132.0,Yankee Clipper (CM-108),5609,52.0,20844,184.0,0.429773,0.001648,0.000708
2,Apollo14,41.83363,7.49125,Antares (LM-8),15264,29.0,Kitty Hawk (CM-110),5758,149.0,21022,178.0,0.433443,0.00199,0.000863
3,Apollo15,75.3991,33.56547,Falcon (LM-10),16430,1166.0,Endeavor (CM-112),5875,117.0,22305,1283.0,0.459897,0.00338,0.001555
4,Apollo16,92.46262,17.06352,Orion (LM-11),16445,15.0,Casper (CM-113),5840,-35.0,22285,-20.0,0.459485,0.004149,0.001906
5,Apollo17,109.44402,16.9814,Challenger (LM-12),16456,11.0,America (CM-114),5960,120.0,22416,131.0,0.462186,0.004882,0.002257
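To make the three ratios concrete, the sketch below redoes the arithmetic for the Apollo11 row by hand, using only numbers already shown in the tables above. The variable names are illustrative and do not appear in the notebook:

```python
saturn_v_payload = 48500            # kg, as stated above
apollo11_modules = 15103 + 5557     # LM mass + CM mass, in kg
apollo11_samples = 21.55424         # kg of samples returned

# Share of the payload taken up by the crewed modules.
crewed_to_payload = apollo11_modules / saturn_v_payload

# Sample weight relative to the crewed modules and to the payload.
sample_to_crewed = apollo11_samples / apollo11_modules
sample_to_payload = apollo11_samples / saturn_v_payload

print(round(crewed_to_payload, 6))
print(round(sample_to_crewed, 6))
print(round(sample_to_payload, 6))
```

Note that the third ratio is the product of the first two, so the three columns are not independent pieces of evidence.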


# Save the ratios
Use the mean() function to take the average of each of those ratios across all six missions.

In [15]:
crewedArea_payload_ratio = missions['Crewed area : Payload'].mean()
sample_crewedArea_ratio = missions['Sample : Crewed area'].mean()
sample_payload_ratio = missions['Sample : Payload'].mean()
print(crewedArea_payload_ratio)
print(sample_crewedArea_ratio)
print(sample_payload_ratio)

0.445127147766323
0.0028487896378978235
0.0012887834707903782


# Predict Artemis sample capacity

We use the NASA Factsheet on the Space Launch System (SLS) and Orion Modules (https://www.nasa.gov/sites/default/files/atoms/files/0080_sls_fact_sheet_sept2020_09082020_final_0.pdf) to gather estimated data on the rockets and modules that will be used in the Artemis program.

As a reminder, the Artemis program (https://www.nasa.gov/specials/artemis/) is NASA's second set of missions to land humans on the surface of the Moon. The program will launch in 2024 and will send not only the next pair of humans, but also the first woman to set foot on the Moon. The preparation for this mission is even bigger than focusing on a Moon landing. It will also provide space for a commercial payload on the ship and is the first step along the Moon to Mars program (https://www.nasa.gov/topics/moon-to-mars/). So, while the Artemis missions will likely bring home additional samples, there are other goals that might affect the amount of capacity that's required to do so.



# An Artemis mission DataFrame
We don't have all the details about the Artemis missions, but we do currently know that three iterations of the rocket will be cycled through for each mission. Each rocket will have one version meant to sustain a crew and one meant only for cargo. For the purposes of this Project, we focus only on the three rockets meant to house crew, to be more aligned with the Apollo missions. We also know that the payload of the Space Launch System (SLS) is expected to grow with each iteration, but that Orion (the command and lunar modules combined) currently has a single estimated weight.

Again, we will call the command and lunar modules the crewed area, and we can create a DataFrame with the information we have about the three crewed missions:

In [16]:
artemis_crewedArea = 26520
artemis_mission = pd.DataFrame({'Mission':['artemis1','artemis1b','artemis2'],
                                 'Total weight (kg)':[artemis_crewedArea,artemis_crewedArea,artemis_crewedArea],
                                 'Payload (kg)':[26988, 37965, 42955]})
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg)
0,artemis1,26520,26988
1,artemis1b,26520,37965
2,artemis2,26520,42955


Now estimate the weight of samples based on the ratios we determined from the Apollo missions:

In [17]:
artemis_mission['Sample weight from total (kg)'] = artemis_mission['Total weight (kg)'] * sample_crewedArea_ratio
artemis_mission['Sample weight from payload (kg)'] = artemis_mission['Payload (kg)'] * sample_payload_ratio
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg),Sample weight from total (kg),Sample weight from payload (kg)
0,artemis1,26520,26988,75.549901,34.781688
1,artemis1b,26520,37965,75.549901,48.928664
2,artemis2,26520,42955,75.549901,55.359694


The average of the two predictions:

In [18]:
artemis_mission['Estimated sample weight (kg)'] = (artemis_mission['Sample weight from payload (kg)'] + artemis_mission['Sample weight from total (kg)'])/2
artemis_mission

Unnamed: 0,Mission,Total weight (kg),Payload (kg),Sample weight from total (kg),Sample weight from payload (kg),Estimated sample weight (kg)
0,artemis1,26520,26988,75.549901,34.781688,55.165795
1,artemis1b,26520,37965,75.549901,48.928664,62.239283
2,artemis2,26520,42955,75.549901,55.359694,65.454798


We can see now that the three Artemis missions can likely return about 55.17 kg, 62.24 kg, and 65.45 kg, respectively.
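The two estimates and their average can also be written as one small function. This is only a sketch: the function name and the constant names are hypothetical, and the ratio values are the Apollo averages printed earlier.

```python
# Apollo averages printed in the "Save the ratios" step.
SAMPLE_CREWED_AREA_RATIO = 0.0028487896378978235
SAMPLE_PAYLOAD_RATIO = 0.0012887834707903782


def estimate_sample_weight(crewed_area_kg, payload_kg):
    """Average the crewed-area-based and payload-based sample estimates (kg)."""
    from_total = crewed_area_kg * SAMPLE_CREWED_AREA_RATIO
    from_payload = payload_kg * SAMPLE_PAYLOAD_RATIO
    return (from_total + from_payload) / 2


# Crewed area weight and payloads from the Artemis table above.
artemis_payloads = {'artemis1': 26988, 'artemis1b': 37965, 'artemis2': 42955}
estimates = {
    name: estimate_sample_weight(26520, payload)
    for name, payload in artemis_payloads.items()
}

for name, kg in estimates.items():
    print(name, round(kg, 2))
```

Because the crewed area weight is the same for all three missions, only the payload-based half of the estimate changes from one mission to the next.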

# Prioritizing Moon rock sample gathering based on data

Determining which types of samples to collect from the Moon requires expertise, but we can make some assumptions as a way to learn how to clean and manipulate data.

First, determine how much remains of each sample that was returned from the Apollo missions, given the amount that was originally collected and the percentage of pristine sample remaining.

- Multiply the Pristine (%) column by 0.01, because it is stored as a percentage (0 to 100) and not as a fraction.

In [19]:
rock_samples['Remaining (kg)'] = rock_samples['Weight (kg)'] * (rock_samples['Pristine (%)'] * .01)
rock_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%),Remaining (kg)
0,10001,Apollo11,Soil,Unsieved,0.1258,88.36,0.111157
1,10002,Apollo11,Soil,Unsieved,5.629,93.73,5.276062
2,10003,Apollo11,Basalt,Ilmenite,0.213,65.56,0.139643
3,10004,Apollo11,Core,Unsieved,0.0448,71.76,0.032148
4,10005,Apollo11,Core,Unsieved,0.0534,40.31,0.021526


In [20]:
rock_samples.describe()

Unnamed: 0,ID,Weight (kg),Pristine (%),Remaining (kg)
count,2229.0,2229.0,2229.0,2229.0
mean,52058.432032,0.168253,84.512764,0.138103
std,26207.651471,0.637286,22.057299,0.525954
min,10001.0,0.0,0.0,0.0
25%,15437.0,0.003,80.01,0.002432
50%,65527.0,0.0102,92.3,0.00853
75%,72142.0,0.09349,98.14,0.07824
max,79537.0,11.729,180.0,11.169527
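One value in the summary above deserves a second look: the maximum of Pristine (%) is 180.0, which cannot be a real percentage remaining. The notebook does not act on this, but a range check is a cheap way to find such rows. Here is a minimal sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical rows; only the 180.0 mirrors the maximum shown above.
demo = pd.DataFrame({
    'ID': [1, 2, 3],
    'Pristine (%)': [88.36, 180.0, 0.0],
})

# between() is inclusive on both ends by default, so 0 and 100 pass.
in_range = demo['Pristine (%)'].between(0, 100)
out_of_range = demo[~in_range]
print(out_of_range)
```

Whether to drop, cap, or investigate such rows is a judgment call that depends on how the source catalog defines the column.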


On average, each sample weighs about 0.17 kg and has about 84% of the original amount remaining. We can use this knowledge to extract only the samples that are likely running low: samples of roughly average weight or more (at least 0.16 kg) that have 50% or less remaining, which means that they have been used a lot by researchers.

In [21]:
low_samples = rock_samples.loc[(rock_samples['Weight (kg)'] >= .16) & (rock_samples['Pristine (%)'] <= 50)]
low_samples.head()

Unnamed: 0,ID,Mission,Type,Subtype,Weight (kg),Pristine (%),Remaining (kg)
11,10017,Apollo11,Basalt,Ilmenite,0.973,43.71,0.425298
14,10020,Apollo11,Basalt,Ilmenite,0.425,27.88,0.11849
15,10021,Apollo11,Breccia,Regolith,0.25,30.21,0.075525
29,10045,Apollo11,Basalt,Olivine,0.185,12.13,0.022441
37,10057,Apollo11,Basalt,Ilmenite,0.919,35.15,0.323028
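The filter combines two boolean conditions with `&`. The sketch below applies the same thresholds to six rows copied from the tables above, with each condition named separately for readability:

```python
import pandas as pd

# Six rows copied from the output tables above.
demo = pd.DataFrame({
    'ID': [10001, 10002, 10003, 10017, 10020, 10045],
    'Weight (kg)': [0.1258, 5.629, 0.213, 0.973, 0.425, 0.185],
    'Pristine (%)': [88.36, 93.73, 65.56, 43.71, 27.88, 12.13],
})

# Amount left = original weight times the fraction still pristine.
demo['Remaining (kg)'] = demo['Weight (kg)'] * demo['Pristine (%)'] / 100

# Name each condition, then combine them element-wise.
is_heavy = demo['Weight (kg)'] >= 0.16
is_depleted = demo['Pristine (%)'] <= 50
demo_low = demo[is_heavy & is_depleted]
print(demo_low)
```

Sample 10001 is excluded because it is too light, and 10002 and 10003 are excluded because more than half of each remains.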


In [22]:
low_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27 entries, 11 to 2183
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              27 non-null     int64  
 1   Mission         27 non-null     object 
 2   Type            27 non-null     object 
 3   Subtype         27 non-null     object 
 4   Weight (kg)     27 non-null     float64
 5   Pristine (%)    27 non-null     float64
 6   Remaining (kg)  27 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 1.7+ KB


Twenty-seven samples seems like a small number to base a recommendation on. We can probably find some other samples that are needed for more research here on Earth. To discover them, use the unique() function to see how many unique types we have across the low_samples and rock_samples DataFrames.

In [23]:
low_samples.Type.unique()

array(['Basalt', 'Breccia', 'Soil', 'Core'], dtype=object)

In [24]:
rock_samples.Type.unique()

array(['Soil', 'Basalt', 'Core', 'Breccia', 'Special', 'Crustal'],
      dtype=object)

Although six unique types were collected across all samples, the samples that are running low are from only four unique types. But this doesn't tell us everything about the samples we might want to focus on. For example, in our low_samples DataFrame, how many of each type are considered low?

In [25]:
low_samples.groupby('Type')['Weight (kg)'].count()

Type
Basalt     14
Breccia     8
Core        1
Soil        4
Name: Weight (kg), dtype: int64
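The same per-type count can be obtained with `value_counts()`, which also sorts the result from most to least frequent. Here is a small sketch on a hypothetical Type column:

```python
import pandas as pd

# Hypothetical Type column for a handful of low samples.
demo_low = pd.DataFrame(
    {'Type': ['Basalt', 'Basalt', 'Breccia', 'Basalt', 'Soil', 'Breccia']}
)

# value_counts() counts each distinct value and sorts in descending order.
counts = demo_low['Type'].value_counts()
print(counts)
```

Sorting by frequency puts the most depleted rock types at the top, which is the ordering the recommendation cares about.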

Notice that there are more Basalt and Breccia type rocks with low samples than there are Core and Soil samples. Additionally, because the likelihood is high that every mission has some Core and Soil collection requirements, we can focus on the Basalt and Breccia rock types for the samples that we need to have collected:

In [26]:
needed_samples = low_samples[low_samples['Type'].isin(['Basalt', 'Breccia'])]
needed_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 22 entries, 11 to 2183
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              22 non-null     int64  
 1   Mission         22 non-null     object 
 2   Type            22 non-null     object 
 3   Subtype         22 non-null     object 
 4   Weight (kg)     22 non-null     float64
 5   Pristine (%)    22 non-null     float64
 6   Remaining (kg)  22 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 1.4+ KB


But are Basalt and Breccia the only two types of rocks we want to look for?

# Developing a recommendation of Moon rock samples to be collected
Let's take a step back and see how the number of samples compares to the amount of sample. To do this, we compare the total weight from the needed_samples DataFrame to that of the rock_samples DataFrame. That is, we'll compare the samples we've identified as running low to all the samples collected on Apollo missions.

In [27]:
needed_samples.groupby('Type')['Weight (kg)'].sum()

Type
Basalt     17.4234
Breccia    10.1185
Name: Weight (kg), dtype: float64

In [28]:
rock_samples.groupby('Type')['Weight (kg)'].sum()

Type
Basalt      93.14077
Breccia    168.88075
Core        19.93587
Crustal      4.74469
Soil        87.58981
Special      0.74410
Name: Weight (kg), dtype: float64

One bit of information really stands out: we've never had a lot of Crustal rocks in the first place.

We can add Crustal rocks to the set of needed samples:

In [29]:
# DataFrame.append was removed in pandas 2.0; pd.concat produces the same combined rows
needed_samples = pd.concat([needed_samples, rock_samples.loc[rock_samples['Type'] == 'Crustal']])
needed_samples.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 68 entries, 11 to 2189
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   ID              68 non-null     int64  
 1   Mission         68 non-null     object 
 2   Type            68 non-null     object 
 3   Subtype         68 non-null     object 
 4   Weight (kg)     68 non-null     float64
 5   Pristine (%)    68 non-null     float64
 6   Remaining (kg)  68 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 4.2+ KB



# Summary of needed samples
The final step is to consolidate everything we know into one table that can be shared with the astronauts. First, we need a row for each type of rock that we have already identified as one we want more samples of:

In [30]:
needed_samples_overview = pd.DataFrame()
needed_samples_overview['Type'] = needed_samples.Type.unique()
needed_samples_overview

Unnamed: 0,Type
0,Basalt
1,Breccia
2,Crustal


In [31]:
needed_sample_weights = needed_samples.groupby('Type')['Weight (kg)'].sum().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, needed_sample_weights, on='Type')
needed_samples_overview.rename(columns={'Weight (kg)':'Total weight (kg)'}, inplace=True)
needed_samples_overview

Unnamed: 0,Type,Total weight (kg)
0,Basalt,17.4234
1,Breccia,10.1185
2,Crustal,4.74469


When astronauts are up on the Moon, one way they can identify rocks is by their size. If we can give them an estimated size of each type of rock, that might make their collection process easier.

In [32]:
needed_sample_ave_weights = needed_samples.groupby('Type')['Weight (kg)'].mean().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, needed_sample_ave_weights, on='Type')
needed_samples_overview.rename(columns={'Weight (kg)':'Average weight (kg)'}, inplace=True)
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg)
0,Basalt,17.4234,1.244529
1,Breccia,10.1185,1.264813
2,Crustal,4.74469,0.103145


Crustals are small! They're probably a lot harder to spot, so no wonder we don't have a lot of them.

We probably want to give the astronauts some indication of how many of each type we want them to collect. So, for the three types we're looking for, we should grab the total number of samples we have of each type and work out what share each type makes up of those three types combined.

In [33]:
total_rock_count = rock_samples.groupby('Type')['ID'].count().reset_index()
needed_samples_overview = pd.merge(needed_samples_overview, total_rock_count, on='Type')
needed_samples_overview.rename(columns={'ID':'Number of samples'}, inplace=True)
total_rocks = needed_samples_overview['Number of samples'].sum()
needed_samples_overview['Percentage of rocks'] = needed_samples_overview['Number of samples'] / total_rocks
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg),Number of samples,Percentage of rocks
0,Basalt,17.4234,1.244529,351,0.25885
1,Breccia,10.1185,1.264813,959,0.707227
2,Crustal,4.74469,0.103145,46,0.033923
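It helps to see exactly what the Percentage of rocks column is a percentage of. The sketch below recomputes it from the three counts in the table above. The denominator is the 1,356 samples of the three needed types, not all 2,229 samples in the catalog:

```python
import pandas as pd

# Sample counts per needed type, copied from the table above.
counts = pd.Series({'Basalt': 351, 'Breccia': 959, 'Crustal': 46})

# Denominator: only the three needed types.
total_needed = counts.sum()

# Each type's share of that total; the shares add up to 1.
shares = counts / total_needed
print(total_needed)
print(shares.round(6))
```

Since the shares add up to 1, multiplying them by a weight budget later splits that whole budget among the three types.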


And finally, to tie it all back into a recommendation to the Artemis program, we can determine the average of the sample weights we estimated earlier for the three Artemis missions.

In [34]:
artemis_ave_weight = artemis_mission['Estimated sample weight (kg)'].mean()
artemis_ave_weight

60.95329172619983

We can use this number to determine how many of each rock we want the astronauts to aim to collect:

In [35]:
needed_samples_overview['Weight to collect'] = needed_samples_overview['Percentage of rocks'] * artemis_ave_weight
needed_samples_overview['Rocks to collect'] = needed_samples_overview['Weight to collect'] / needed_samples_overview['Average weight (kg)']
needed_samples_overview

Unnamed: 0,Type,Total weight (kg),Average weight (kg),Number of samples,Percentage of rocks,Weight to collect,Rocks to collect
0,Basalt,17.4234,1.244529,351,0.25885,15.777733,12.677678
1,Breccia,10.1185,1.264813,959,0.707227,43.107822,34.082381
2,Crustal,4.74469,0.103145,46,0.033923,2.067737,20.046811


So, we might tell the Artemis astronauts to please try to collect about 13 Basalt rocks, 34 Breccia rocks, and 20 Crustal rocks. Whew!
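The fractional counts in the final table have to become whole rocks before they are useful as instructions. The sketch below reruns the last two calculations from the values in the tables above and rounds to the nearest whole rock. Rounding to nearest is an assumption here; rounding down would be the cautious choice if the weight budget is a hard limit.

```python
import pandas as pd

artemis_ave_weight = 60.95329172619983   # kg, from the cell above

# Values copied from the summary table above.
overview = pd.DataFrame({
    'Type': ['Basalt', 'Breccia', 'Crustal'],
    'Average weight (kg)': [1.244529, 1.264813, 0.103145],
    'Percentage of rocks': [0.25885, 0.707227, 0.033923],
}).set_index('Type')

# Split the weight budget by share, then convert weight into a rock count.
weight_to_collect = overview['Percentage of rocks'] * artemis_ave_weight
rocks_to_collect = weight_to_collect / overview['Average weight (kg)']

# Round to the nearest whole rock (an assumption, see the lead-in).
whole_rocks = rocks_to_collect.round().astype(int)
print(whole_rocks)
```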