# A Statistical Analysis of NASA's Meteorite Data 

## Overview:

This notebook contains our analysis by way of T-tests and proportion tests on the meteorite and popualation datasets (stored in ./CSV_MASTERS). Contained within, we ask and answer questions using statistical tests about meteorite masses, strike dispursion, and observations by people in various time periods and countries.

***
## Data:

#### [NASA Meteorite Dataset](https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh)

##### Supplemental data about countries and landmass pulled from:

* #### http://worldpopulationreview.com/countries/countries-by-density/
* #### https://photius.com/rankings/world2050_rank.html 

***
## Content:

### [Packages & Imports](#Imports)

### [Mass & Classification](#Mass-&-Classification)

### [Found Vs Sighted Falling](#FOUND-VS-SIGHTED-FALLING)

### [Dispersion Over landmass](#DISTRIBUTION-OVER-LANDMASS)

### [The Impact of Population on Reported Meteors](#THE-IMPACT-OF-POPULATION-ON-REPORTED-METEOR-STRIKES-FOR-A-COUNTRY)
***


## Imports

In [2]:
import pandas as pd
import statsmodels.api as sm

from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import ttest_ind

import warnings
warnings.filterwarnings('ignore')

from test_functions import *


In [3]:
final_df =  pd.read_csv("CSV_MASTERS/Final.csv")
population_df = pd.read_csv("CSV_MASTERS/population_by_years.csv")
earth_strikes = pd.read_csv("CSV_MASTERS/GeoEarth_with_num_strikes.csv")

***
# Mass & Classification

Meteorites are broadly categorized as being of class **_Chondrite, Achondrite, Iron, or Stony-Iron_**. The chart below shows how we took the various subclasses and put them into one of these four categories.  

![image.png](attachment:image.png)


Using this information, we tested the following hypotheses.

### Question 1: Do the different classifications have different distributions of mass (in grams)?

* #### Ho: The distributions of masses are no different between classifications

* #### Ha: The distributions of masses are different between classifications

We begin by subsetting our dataset into the four main classifications - _Chondrite, Achondrite, Iron, and Stony-Iron_.  

From there, we run a t-test on the samples of masses of each classification between each pair of classifications. We display the data in a chart indicating at which alpha levels we can claim statistical significance.  

For the alpha columns in the resulting chart (and all the charts shown in this notebook), the boolean value indicates if we were able to reject the null hypothesis at that alpha level. 

In [4]:
test_classification_masses(population_df) # found in test_function.py

Unnamed: 0,sample_1,sample_2,t_stat,p_value,alpha 10%,alpha 5%,alpha 1%
0,Chondrite,Achondrite,-1.348895,0.178059,False,False,False
1,Chondrite,Iron,-4.214083,2.8e-05,True,True,True
2,Chondrite,Stony-Iron,-3.018075,0.002948,True,True,True
3,Achondrite,Iron,-4.112824,4.3e-05,True,True,True
4,Achondrite,Stony-Iron,-2.588523,0.010396,True,True,False
5,Iron,Stony-Iron,3.299822,0.001004,True,True,True


### Analysis: 

As can be seen in this chart, we were able to regect the null hypothesis for most of the classification comparisons that their masses were the same. Of note, we were not able to prove that there was a statistical different between the Chondrite and Achondrite classifications when it comes to mass.  

From our research, this conclusion makes sense. Chondrite and Achondrite meteors are both stony type classes. All other comparisons compare some stony type to some iron type. Iron is a dense material, and we would expect the masses to be different for stony to and iron type meteors. 

***
## FOUND VS SIGHTED FALLING

The primary dataset included a column called _Fall_, which included two values: fell and found. These indicated  if a meteorite was seen falling and later found on the ground, or if it was just found with no record of it falling.  

Given this distinction, we wanted to know if the mass of a meteorite would possibly make it more visible when falling. 

### Question: Are hevier meteorites more likely to be seen falling?


* #### Ho: There is not difference in mean mass for falling and found meteorites

* #### Ha: There is a difference in the mean mass for falling and found meteorites

First, we test this visually. As can be seen in the image below, we compare the distribution os masses for each classification of meteorite acording to if it was seen falling or just found.  


![image.png](attachment:image.png)

The distributions of found to fell are clearly not the same, but we are unsure of how different the found to fell masses are. Therefore, we tested the results in a proportion test. The results are in the chart below. 

In [5]:
test_found_vs_fall_masses(final_df)

Unnamed: 0,Sample 1,Sample 2,t-stat,p-value,alpha 10 %,aplha 5 %,alpha 1 %
0,Found,Fell,-1.363541,0.172967,False,False,False


### Analysis:
Here, we fail to find any significant difference between the masses of found to fell meteorites. Therefore, we fail to reject the null hypothesis that the masses for found vs fell meteorites is different. 

***
## Meteor Strikes over years

Our dataset contained several years worth of data going back as early as the 800s (historical records). Using the data about found vs fell meteorites, we wanted to test whether there is a higher proportion of meteorite sightings in any particular time period. 

### Question: Are falling meteor sightings more widely reported in any specific time period?

* #### Ho: The proportion of falling meteorite sightings is the same in our compared time periods

* #### Ha: There proportion of meteorite sightings is different in our compared time periods

Again, we decided to take a visual analysis first. The chart below shows the number of meteorites reported in general (found or fell) over the years 1970-2013. We observed that, prior to 1070, there were very few meteors reported in a given year. Then, in the 70s, we noticed this significant uptick. 

![image.png](attachment:image.png)

To test our hypotheses more accurately, we performed a proportion test on the proportion of fell to found meteorites for the years 1940-1969 and 1970-2000. The results are in the table below.

In [7]:
test_fall_sightings_dates(final_df, start_date=1940, split_date=1970, end_date=2000)

Unnamed: 0,Date Range 1,Date Range 2,z-stat,p-value
0,1940 - 1969,1970 - 2000,50.519683,0.0


### Analysis:

As can be seen, the propotion of meteorites reported as falling prior to 1970 is (VERY) significantly different. We have an extreme z statistic of more than 50, and our p value is essentially 0, meaning that the proportions in these two time periods is starkly different. 

***
# DISTRIBUTION OVER LANDMASS

In the primary dataset, we were given the latitude and longitude coordinates for wach meteor landing. We matched all falit coordinates to their respective countries and geographical regions.  

From this new data that we pulled in, performed the following analysis.

### Question: Are meteors equaly distributed across the global landmass?

* #### Ho: There is no area where meteores are more heavily distributed 

* #### Ha: There are some areas where meteors are more heavily distributed on the planet.

We began again with a visual analysis to see if our test would have any merit.  

The graph below shows the distribution of meteor reports over the longitude. If we imagine a map of the world behind this graphic, we can easily see that the distribution of meteors over the landmasses aproximates a uniform distribution. 

![image.png](attachment:image.png)
_See VISUALS.ipynb for the code that created this graph_

We performed a number of test to confirm this suspicion.  

We starte broadly, comparing each of the four quadrants of the globe to each other. 

In [6]:
test_quadrants(final_df)

Unnamed: 0,quadrant_1,quadrant_2,t_stat,p_value,alpha 10%,alpha 5%,alpha 1%
0,NE,NW,-2.541272,0.011108,True,True,False
1,NE,SW,-0.919997,0.357687,False,False,False
2,NE,SE,1.629069,0.103324,False,False,False
3,NW,SW,1.378149,0.168233,False,False,False
4,NW,SE,2.870973,0.00413,True,True,True
5,SW,SE,1.300079,0.193734,False,False,False


Mostly, we were not able to reject the null hypothesis. however, for comparisons between NE and NW, and NE and SE, this test would allow us to reject the null hypothesis. But, inconsidering the proportion of land to water in each of these comparisons, it is understandable by we might see a greater concentration of meteorites sightings in one quadrant and not another.  

To account for this difference in landmass, we broke down our comparisons further, and tested specific geographical regions against each other. Those results are below. 

In [7]:
test_regions(earth_strikes)

Unnamed: 0,Region 1,Region 2,t-stat,p-value,alpha 10%,alpha 5%,alpha 1%
0,Latin America & Caribbean,North America,-0.886507,0.534286,False,False,False
1,Latin America & Caribbean,Europe & Central Asia,1.238742,0.234426,False,False,False
2,Latin America & Caribbean,Middle East & North Africa,-0.960486,0.347235,False,False,False
3,Latin America & Caribbean,South Asia,1.056172,0.306036,False,False,False
4,Latin America & Caribbean,East Asia & Pacific,0.791141,0.438777,False,False,False
5,Latin America & Caribbean,Sub-Saharan Africa,1.220304,0.240944,False,False,False
8,North America,Europe & Central Asia,1.055162,0.482911,False,False,False
9,North America,Middle East & North Africa,0.594272,0.648478,False,False,False
10,North America,South Asia,1.034743,0.488864,False,False,False
11,North America,East Asia & Pacific,1.002038,0.498793,False,False,False


### Analysis:

Since this series of tests accounts for the landmass, we can see that we were not able to reject the null hypothesis across the board. 

Therefore, our conclusion is that meteors are distributed uniformaly across the global landmass. 

***
## THE IMPACT OF POPULATION ON REPORTED METEOR STRIKES FOR A COUNTRY

Finally, we brought in population data for each country in our dataset, and tested if a country's population has an impact on the number of reported meteors in that country.  

For this test, we subset the years from 1990 - 2010, and used population data from the year 2000. 

### Question: Does a country's popultion impact how many meteors are reported in that country?

* #### Ho: Countries with different populations will report the same number of meteors in a given time period

* #### Ha: Countries with different populations will report different numbers of meteors in a given time period

The visual below shows that, between the years of 1990 - 2010, countries with fewer than 20 million people reported less than half the meteors than those countries with greater than 20 million.  

![image.png](attachment:image.png)
_See TESTING.ipynb for the code that created this graph_

We tested this with a standard t test to see if the average number of meteors reported in countries with fewer than 20 million people was the same as the average number reported by those with more than 20 million people. The results of that test are below.

In [11]:
test_population_impact(population_df, pop_split=20_000)

Unnamed: 0,Population 1,Population 2,t-stat,p-value
0,< 20000000,> 20000000,-0.969922,0.338249


### Analysis:

Although the visualization above appears to tell one story, our test revealed that we cannot say with any certainty that the average number of meteors reported are different for the two population sizes (p value of .34). When looking more deeply into the data (see the chart below), we can see that most countries, no matter their population, report very few meteors. It is only a few countries that report many, which skewed our original findings.  

Therefore, we cannot reject our null hypothesis.

![image.png](attachment:image.png)
_See TESTING.ipynb for the code that created this graph_