In [1]:
import os
os.chdir("../")

# Feasibility Analysis

This notebook addresses feasibility questions that arise in asteroid mining.

In [2]:
import pandas as pd
import numpy as np

# Plotting
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objs as go

# Statistics
from scipy import stats
from scikit_posthocs import posthoc_scheffe

In [3]:
df = pd.read_csv("data/Asteroid_Cleaned.csv", index_col=0)
df.shape

(1340599, 28)

## Asteroid Size Consideration

The size of an asteroid is a major contributing factor in its inherent value. More materials can be mined from larger asteroids.

### On average, which NEO type has the largest asteroids?

When starting out, mining will be limited to asteroids that are close to Earth. Among the four types of near-Earth asteroids, which one is best for starting out?

In [4]:
data = df.loc[df.neo == 1, ["diameter", "neo_type"]].copy(deep=True)
data.shape

(33950, 2)

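First, I'll remove outliers within each NEO type using Tukey's fences: any diameter more than 1.5 × IQR below the first quartile or above the third quartile is dropped.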

In [5]:
def calculate_diameter_fences(d):
    """Return Tukey's fences (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a series of diameters."""
    q1, q3 = np.quantile(d, [0.25, 0.75])
    span = 1.5 * (q3 - q1)
    return (q1 - span, q3 + span)


# Fences are computed separately for each NEO type.
for neo_type in data.neo_type.unique():
    mask = data.neo_type == neo_type
    diameters = data.loc[mask, "diameter"]
    lower_fence, upper_fence = calculate_diameter_fences(diameters)

    # Boolean Series: True where a diameter lies outside the fences.
    outliers = (diameters < lower_fence) | (diameters > upper_fence)
    print(f'NEO type "{neo_type}" had {outliers.sum()} outliers')

    data.drop(index=outliers[outliers].index, inplace=True)

data.shape

NEO type "Apollo" had 259 outliers
NEO type "Atira" had 127 outliers
NEO type "Aten" had 4 outliers
NEO type "Amor" had 15 outliers


(33545, 2)
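To make the fence rule concrete, here is a minimal stdlib-only sketch on a made-up list of eight diameters (illustrative values, not from the dataset). `statistics.quantiles(..., method="inclusive")` interpolates linearly between order statistics, which matches NumPy's default `np.quantile`, so the fences agree with what `calculate_diameter_fences` would return for the same input. The helper name `tukey_fences` is hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" uses linear interpolation between order statistics,
    # the same convention as NumPy's default np.quantile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    span = k * (q3 - q1)
    return q1 - span, q3 + span


# Hypothetical diameters (km); 100 is an obvious outlier.
diameters = [1, 2, 3, 4, 5, 6, 7, 100]
lower, upper = tukey_fences(diameters)  # Q1 = 2.75, Q3 = 6.25, IQR = 3.5
outliers = [d for d in diameters if d < lower or d > upper]

print(lower, upper)  # → -2.5 11.5
print(outliers)      # → [100]
```

Because the fences depend only on the quartiles, a single extreme value like 100 cannot drag them outward, which is why this rule is preferred over a mean ± k·SD cutoff for skewed size data.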

Now let's look at the range of diameter values for each type.

In [6]:
fig = px.box(data, y="diameter", color="neo_type")
fig.update_layout(
    height=600,
    width=800,
    title_x=0.5,
    title_text="Box Plot<br><sup>Distribution of diameters for NEO asteroid types</sup>",
    legend=dict(orientation="h", yanchor="top", xanchor="center", y=-0.1, x=0.5),
    yaxis_title="Diameter (km)",
)
fig.show()

Now let's see what the histogram for each looks like.

In [7]:
def histogram_trace(fig, row, col, name):
    """Add a histogram of diameters for one NEO type to a subplot cell of `fig`."""
    diameters = data[data.neo_type == name].diameter

    fig.add_trace(
        go.Histogram(name=name, x=diameters, texttemplate="%{y}"),  # label bars with counts
        row=row,
        col=col,
    )
    fig.update_xaxes(title_text=name, row=row, col=col)


fig = make_subplots(rows=2, cols=2)
histogram_trace(fig, 1, 1, "Apollo")
histogram_trace(fig, 1, 2, "Atira")
histogram_trace(fig, 2, 1, "Aten")
histogram_trace(fig, 2, 2, "Amor")
fig.update_layout(
    height=600,
    width=800,
    title_x=0.5,
    title_text="Histogram<br><sup>Distribution of diameters for NEO asteroid types</sup>",
    legend=dict(orientation="h", yanchor="top", xanchor="center", y=-0.1, x=0.5),
)
fig.show()

All four distributions look roughly normal, and their variances look similar as well. Let's run a one-way ANOVA to test whether their mean diameters differ.

**Hypothesis**

* $\mathbf{H_0}:$ All groups have the same mean diameter.

* $\mathbf{H_1}:$ At least one group has a different mean diameter.
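One-way ANOVA compares the spread between group means with the spread within groups, $F = \dfrac{SS_B/(k-1)}{SS_W/(N-k)}$, where $k$ is the number of groups and $N$ the total number of observations. A large $F$ means the group means differ more than within-group noise would explain, which is evidence against $H_0$. The sketch below computes this statistic by hand on three small made-up groups; it should agree with the statistic returned by `stats.f_oneway` on the same input, but the helper name `one_way_f` and the data are illustrative only.

```python
from statistics import fmean


def one_way_f(*groups):
    """Return the one-way ANOVA F statistic and its (df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]

    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Three small hypothetical groups of diameters.
f_stat, dfs = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(round(f_stat, 6), dfs)  # → 13.0 (2, 6)
```

Here $SS_B = 26$ and $SS_W = 6$, so $F = (26/2)/(6/6) = 13$; the p-value is then the upper-tail probability of an $F(2, 6)$ distribution at 13.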

**Confidence Level**

Because there is a lot of data (33,545 asteroids after outlier removal), I can afford a very low significance level, and thus a high confidence level. I'll perform the hypothesis test at the 99.99% confidence level, i.e. $\alpha = 0.0001$.
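The significance level is simply $\alpha = 1 - 0.9999 = 0.0001$. One small caveat when a cell computes it as `1 - conf_level`: 0.9999 has no exact binary representation, so the floating-point subtraction yields a value a hair below 0.0001. This only matters if a p-value lands within about $10^{-17}$ of 0.0001, but writing α directly (or rounding) avoids the surprise. A minimal sketch:

```python
import math

conf_level = 0.9999

# 0.9999 is stored as the nearest double, so 1 - 0.9999 is not exactly
# the double nearest to 0.0001.
alpha_subtracted = 1 - conf_level
alpha_direct = 0.0001

print(alpha_subtracted == alpha_direct)              # → False
print(math.isclose(alpha_subtracted, alpha_direct))  # → True

# Rounding to a fixed number of decimals removes the representation error here.
alpha = round(1 - conf_level, 10)
print(alpha == alpha_direct)                          # → True
```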

In [8]:
def decision(alpha, p_value):
    if p_value <= alpha:
        print("Reject null hypothesis")
    else:
        print("Fail to reject null hypothesis")

In [9]:
conf_level = 0.9999
alpha = 1 - conf_level

_, p_value = stats.f_oneway(
    *[data[data.neo_type == neo_type].diameter for neo_type in data.neo_type.unique()]
)

decision(alpha, p_value)

Reject null hypothesis


So the group means are not all equal: at least one NEO type has a different mean diameter, i.e. a different typical size. To find out which pairs differ, I'll run a post hoc test using **Scheffé's** method.
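For reference, the textbook Scheffé comparison of groups $i$ and $j$ reuses the pooled within-group mean square $MS_W = SS_W/(N-k)$ from the ANOVA: $F_{ij} = \dfrac{(\bar{x}_i - \bar{x}_j)^2}{(k-1)\,MS_W\,(1/n_i + 1/n_j)}$, compared against an $F(k-1,\,N-k)$ distribution. The factor $k-1$ is what lets the method control the error rate across all possible contrasts at once, which also makes it conservative. The stdlib sketch below computes these statistics on made-up groups; `scheffe_pairwise` is a hypothetical helper that illustrates the method, not necessarily the exact internals of `posthoc_scheffe`.

```python
from itertools import combinations
from statistics import fmean


def scheffe_pairwise(groups):
    """Return {(a, b): F} Scheffé statistics for every pair of named groups.

    Each F is compared against an F(k - 1, N - k) distribution.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {name: fmean(v) for name, v in groups.items()}

    # Pooled within-group mean square (the ANOVA denominator).
    ss_within = sum((x - means[name]) ** 2 for name, v in groups.items() for x in v)
    ms_within = ss_within / (n_total - k)

    stats_by_pair = {}
    for a, b in combinations(groups, 2):
        diff = means[a] - means[b]
        scale = (k - 1) * ms_within * (1 / len(groups[a]) + 1 / len(groups[b]))
        stats_by_pair[(a, b)] = diff**2 / scale
    return stats_by_pair


# Hypothetical groups; the pooled MS_W is 1.0 here.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}
for pair, f in scheffe_pairwise(groups).items():
    print(pair, round(f, 4))
# → ('A', 'B') 0.75
# → ('A', 'C') 12.0
# → ('B', 'C') 6.75
```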

In [10]:
p_value = posthoc_scheffe(data, val_col="diameter", group_col="neo_type", sort=True)
p_value

| | Amor | Apollo | Aten | Atira |
|---|---|---|---|---|
| **Amor** | 1.0 | 0.9878796 | 0.021365 | 0.000106971 |
| **Apollo** | 0.98788 | 1.0 | 6.6e-05 | 1.134332e-59 |
| **Aten** | 0.021365 | 6.56797e-05 | 1.0 | 0.9930387 |
| **Atira** | 0.000107 | 1.134332e-59 | 0.993039 | 1.0 |


In [11]:
fig = px.imshow(
    (p_value <= alpha).astype(int),  # 1 = significant difference, 0 = not
    range_color=[0, 1],
    color_continuous_scale=[(0, "#444444"), (1, "#00FF00")],
)
fig.update_layout(
    height=500,
    width=500,
    title_x=0.5,
    title_text="Heatmap<br><sup>Statistically Significant Difference</sup>",
)
fig.show()
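The heatmap can also be read off programmatically. This sketch hard-codes the upper triangle of the Scheffé p-value table above and lists every pair at or below α = 0.0001; in the notebook itself, the same filter could be applied to the `p_value` DataFrame directly.

```python
# Upper triangle of the Scheffé p-value table above (values copied from the output).
scheffe_p = {
    ("Amor", "Apollo"): 0.9878796,
    ("Amor", "Aten"): 0.021365,
    ("Amor", "Atira"): 0.000106971,
    ("Apollo", "Aten"): 6.56797e-05,
    ("Apollo", "Atira"): 1.134332e-59,
    ("Aten", "Atira"): 0.9930387,
}

alpha = 0.0001

# Keep only the pairs whose difference is significant at the chosen level.
significant = [pair for pair, p in scheffe_p.items() if p <= alpha]
print(significant)  # → [('Apollo', 'Aten'), ('Apollo', 'Atira')]
```

Amor vs Atira (p ≈ 0.000107) narrowly misses the cutoff, so only the two Apollo pairs are significant.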

From the Scheffé p-values, **Apollo** has a significantly different mean diameter from **Aten** and **Atira**; no other pair differs at α = 0.0001. I'll use one-sided t-tests to check whether **Apollo** asteroids are smaller on average than each of the other three types.

**Hypothesis Test**

* $\mathbf{H_0}: \mu_{Apollo} \ge \mu_{other}$

* $\mathbf{H_1}: \mu_{Apollo} < \mu_{other}$

**Confidence Level**

As before, I'll perform the hypothesis tests at the 99.99% confidence level ($\alpha = 0.0001$).
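Because the cells below pass `equal_var=False`, `ttest_ind` runs Welch's t-test, which does not assume the two groups share a variance. A minimal stdlib sketch of the statistic and its Welch-Satterthwaite degrees of freedom, on made-up data (the helper name `welch_t` is hypothetical):

```python
import math
from statistics import fmean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)

    t = (fmean(a) - fmean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(a) - 1) + se2_b**2 / (len(b) - 1)
    )
    return t, df


t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t, 4), round(df, 4))  # → -1.7321 4.4118
```

With `alternative="less"`, the p-value is the lower-tail probability $P(T \le t)$ under a t distribution with `df` degrees of freedom, so only a strongly negative $t$ (Apollo clearly smaller) leads to rejecting $H_0$.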

In [12]:
p_value = stats.ttest_ind(
    a=data[data.neo_type == "Apollo"].diameter,
    b=data[data.neo_type == "Aten"].diameter,
    alternative="less",  # H1: mean Apollo diameter < mean Aten diameter
    equal_var=False,  # Welch's t-test: no equal-variance assumption
).pvalue

decision(alpha, p_value)

Fail to reject null hypothesis


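Failing to reject $H_0$ means the data give no evidence that **Apollo** asteroids are smaller on average than **Aten** asteroids. On its own this does not prove that **Apollo** asteroids are larger, but combined with the significant Scheffé difference between the two types it suggests that **Apollo** has the higher mean diameter; a test with `alternative="greater"` would check this directly.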
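Failing to reject a one-sided null is not the same as establishing the opposite direction. For a t-test the two one-sided p-values sum to 1, and when the statistic is near zero both are large, so neither direction is supported. This sketch uses the standard normal distribution as a large-sample stand-in for the t distribution (an approximation that improves as the degrees of freedom grow, not what `ttest_ind` computes exactly); the statistic 0.5 is hypothetical.

```python
from statistics import NormalDist

alpha = 0.0001
t_stat = 0.5  # hypothetical t statistic for (Apollo - other group)

# Large-sample approximation: treat the t statistic as standard normal.
p_less = NormalDist().cdf(t_stat)          # H1: mu_Apollo < mu_other
p_greater = 1 - NormalDist().cdf(t_stat)   # H1: mu_Apollo > mu_other

print(round(p_less, 4), round(p_greater, 4))  # → 0.6915 0.3085
print(p_less <= alpha, p_greater <= alpha)    # → False False
```

Neither one-sided test rejects here, so a "fail to reject" on `alternative="less"` leaves the question of whether Apollo is larger open until it is tested in that direction.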

In [13]:
p_value = stats.ttest_ind(
    a=data[data.neo_type == "Apollo"].diameter,
    b=data[data.neo_type == "Atira"].diameter,
    alternative="less",  # H1: mean Apollo diameter < mean Atira diameter
    equal_var=False,  # Welch's t-test: no equal-variance assumption
).pvalue

decision(alpha, p_value)

Fail to reject null hypothesis


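Again, there is no evidence that **Apollo** asteroids are smaller on average than **Atira** asteroids. Combined with the significant Scheffé difference between these two types, this suggests that **Apollo** has the higher mean diameter.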

In [14]:
p_value = stats.ttest_ind(
    a=data[data.neo_type == "Apollo"].diameter,
    b=data[data.neo_type == "Amor"].diameter,
    alternative="less",  # H1: mean Apollo diameter < mean Amor diameter
    equal_var=False,  # Welch's t-test: no equal-variance assumption
).pvalue

decision(alpha, p_value)

Fail to reject null hypothesis


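There is also no evidence that **Apollo** asteroids are smaller on average than **Amor** asteroids. Here, however, the Scheffé test found no significant difference (p ≈ 0.99), so the data do not distinguish the mean diameters of these two types.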

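From these tests, there is no evidence that **Apollo** asteroids are smaller on average than any other NEO type: they appear larger than **Aten** and **Atira** asteroids, and their mean diameter is statistically indistinguishable from that of **Amor** asteroids. So it makes sense to start asteroid mining with **Apollo** asteroids.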