# A Statistical Analysis of NASA's Meteorite Data 

## Overview:

This notebook analyzes the meteorite and population datasets (stored in ./CSV_MASTERS) using t-tests and proportion tests. We use these statistical tests to ask and answer questions about meteorite masses, strike dispersion, and observations by people in various time periods and countries.

***
## Data:

#### [NASA Meteorite Dataset](https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh)

##### Supplemental data about countries and landmass pulled from:

* #### http://worldpopulationreview.com/countries/countries-by-density/
* #### https://photius.com/rankings/world2050_rank.html 

***
## Content:

### [Mass & Classification](http://localhost:8888/notebooks/TESTING.ipynb#T-test-comparsion-of-masses-of-the-meteor-classifications)
### [Dispersion Over Landmass](#)
### [Proportion of Observed Falling Meteorites to Meteorites Found on the Ground](#)
***

## Imports

In [1]:
import pandas as pd
import statsmodels.api as sm

from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import ttest_ind

import warnings
warnings.filterwarnings('ignore')

from test_functions import *


In [2]:
final_df = pd.read_csv("CSV_MASTERS/Final.csv")
population_df = pd.read_csv("CSV_MASTERS/population.csv")

***
# Mass & Classification

## Introduction:

Meteorites are broadly categorized into four classes: **_Chondrite, Achondrite, Iron, and Stony-Iron_**. The chart below shows how we mapped the various subclasses onto these four categories.  

![image.png](attachment:image.png)


Using this information, we tested the following hypotheses.

### Question 1: Do the different classifications have different distributions of mass (in grams)?

* #### Ho: The distributions of masses are no different between classifications

* #### Ha: The distributions of masses are different between classifications

### Methodology:

We begin by subsetting our dataset into the four main classifications - _Chondrite, Achondrite, Iron, and Stony-Iron_.  

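From there, we run a two-sample t-test on the masses for every pair of classifications, which gives six comparisons in total. Because a t-test compares sample means, a significant result indicates that the two classifications differ in mean mass. We display the results in a table indicating at which alpha levels (10%, 5%, and 1%) we can claim statistical significance.  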

In [3]:
test_classification_masses(population_df)

| | sample_1 | sample_2 | t_stat | p_value | alpha 10% | alpha 5% | alpha 1% |
|---|---|---|---|---|---|---|---|
| 0 | Chondrite | Achondrite | -1.509633 | 0.131428 | False | False | False |
| 1 | Chondrite | Iron | -4.238587 | 2.5e-05 | True | True | True |
| 2 | Chondrite | Stony-Iron | -3.092844 | 0.002233 | True | True | True |
| 3 | Achondrite | Iron | -4.187946 | 3.1e-05 | True | True | True |
| 4 | Achondrite | Stony-Iron | -2.85104 | 0.004747 | True | True | True |
| 5 | Iron | Stony-Iron | 3.448491 | 0.000586 | True | True | True |
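
The implementation of `test_classification_masses` lives in `test_functions` and is not shown in this notebook. As a rough illustration only, the pairwise procedure described in the methodology could be sketched as follows. The helper name `pairwise_ttests`, the column names `classification` and `mass_g`, and the randomly generated toy data are all assumptions made for this example, so its numbers will not match the table above.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def pairwise_ttests(df, group_col, value_col, alphas=(0.10, 0.05, 0.01)):
    """Run a two-sample t-test on value_col for every pair of groups.

    Hypothetical helper for illustration; not the notebook's own function.
    """
    # One sample of values per group, in order of first appearance.
    samples = {
        name: grp[value_col].dropna()
        for name, grp in df.groupby(group_col, sort=False)
    }
    rows = []
    for a, b in combinations(samples, 2):
        # ttest_ind defaults to Student's t-test (equal_var=True);
        # pass equal_var=False for Welch's t-test instead.
        t_stat, p_value = ttest_ind(samples[a], samples[b])
        row = {"sample_1": a, "sample_2": b, "t_stat": t_stat, "p_value": p_value}
        for alpha in alphas:
            # Flag whether the result is significant at this alpha level.
            row[f"alpha {alpha:.0%}"] = bool(p_value < alpha)
        rows.append(row)
    return pd.DataFrame(rows)


# Toy data (assumed, randomly generated): 200 masses per classification.
rng = np.random.default_rng(0)
classes = ["Chondrite", "Achondrite", "Iron", "Stony-Iron"]
toy_df = pd.DataFrame({
    "classification": np.repeat(classes, 200),
    "mass_g": np.concatenate(
        [rng.lognormal(mean=m, sigma=1.0, size=200) for m in (4.0, 4.1, 6.0, 5.0)]
    ),
})

results = pairwise_ttests(toy_df, "classification", "mass_g")
print(results)
```

Four classifications yield six pairs, so the result has one row per comparison, mirroring the layout of the table above. Whether the notebook's own function uses the equal-variance or Welch form of the test is not stated here, so the default shown is only one possible choice.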
