# Meteorite Landing

### Project **Objective**
#### *Understanding meteorite falls means looking at where and how often rocks from space land on Earth. This helps us learn about our planet's history and about potential dangers from space. Scientists study these falls to find out more about these events and what they can tell us about our world.*



### **Stakeholders**
#### ***General Public**: Exploring meteorite falls is valuable for everyone. It helps us learn about Earth's history and how our planet works. Knowing more about these events can also keep us safe by understanding any potential risks. Plus, it's just fascinating to discover more about the wonders of space! So, whether you're a student, a scientist, or just curious about the world around you, studying meteorite falls is an exciting adventure for everyone.*



### **Specific Objective**
#### *To create a data visualization showing the geographic locations of meteorite falls. This visualization will let people explore where meteorites have landed across the globe.*

### **Data Source**
#### *This project takes the NASA Meteorite Landings dataset as its starting point. Because that dataset may be outdated, a new dataset will be curated from the Meteoritical Bulletin Database, which provides up-to-date records of meteorite landings and keeps the data used in the visualization accurate and relevant.*


## **Preparing Data**

In [1]:
import pandas as pd  # Importing pandas for data manipulation
import matplotlib.pyplot as plt  # Importing matplotlib for plotting
import seaborn as sns  # Importing seaborn for enhanced data visualization

In [2]:
# Importing the CSV file named "Meteorite_Landing.csv" using pandas
meteorite_data = pd.read_csv("Meteorite_Landing.csv")

In [3]:
# Displaying the first few rows of the meteorite data to inspect its structure and content
meteorite_data.head()

Unnamed: 0,name,id,nametype,recclass,mass (g),fall,year,reclat,reclong,GeoLocation
0,Aachen,1,Valid,L5,21.0,Fell,1880.0,50.775,6.08333,"(50.775, 6.08333)"
1,Aarhus,2,Valid,H6,720.0,Fell,1951.0,56.18333,10.23333,"(56.18333, 10.23333)"
2,Abee,6,Valid,EH4,107000.0,Fell,1952.0,54.21667,-113.0,"(54.21667, -113.0)"
3,Acapulco,10,Valid,Acapulcoite,1914.0,Fell,1976.0,16.88333,-99.9,"(16.88333, -99.9)"
4,Achiras,370,Valid,L6,780.0,Fell,1902.0,-33.16667,-64.95,"(-33.16667, -64.95)"


In [4]:
# Displaying the data types of each column in the meteorite dataset
meteorite_data.dtypes

name            object
id               int64
nametype        object
recclass        object
mass (g)       float64
fall            object
year           float64
reclat         float64
reclong        float64
GeoLocation     object
dtype: object

In [5]:
# Displaying the shape of the meteorite dataset, which shows the number of rows and columns
meteorite_data.shape

(45716, 10)

## **Cleaning Data**

In [6]:

df = meteorite_data

# Dropping columns that are deemed unnecessary for the analysis
# The columns "nametype" and "recclass" are not relevant for the visualization of meteorite landings over time and location.
# The column "mass (g)" might not be necessary if the focus is on frequency rather than individual masses of meteorites.
# The column "fall" indicates whether the meteorite was found or fell, which might not be directly relevant to the visualization objective.
# The column "GeoLocation" could be redundant if latitude and longitude coordinates are already present in separate columns.
col_to_drop = ["nametype", "recclass", "mass (g)", "fall", "GeoLocation"]

# Dropping the specified columns from the dataset
df = df.drop(columns=col_to_drop)

In [7]:
# Removing any duplicate rows from the dataset
df = df.drop_duplicates()

# Displaying the new shape of the dataset after removing duplicates
df.shape


(45716, 5)

In [8]:
# Displaying the number of unique values in each column of the dataset
df.nunique()

name       45716
id         45716
year         265
reclat     12738
reclong    14640
dtype: int64

In [9]:
# Calculating the number of null values in each column of the dataset
df_null_values = df.isnull().sum()

# Displaying the count of null values in each column
print(df_null_values)


name          0
id            0
year        291
reclat     7315
reclong    7315
dtype: int64


In [10]:
# Given the nature of the data, using data imputation models for coordinates and years might affect data accuracy.
# Therefore, it's better to remove rows with missing values.

# Dropping rows with missing values from the DataFrame
df = df.dropna()

# Displaying the new shape of the DataFrame after removing rows with missing values
df.shape

(38223, 5)
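
Dropping every row with any missing value is safe here because only five columns remain. If more columns were kept, the check could be limited to the columns the map actually needs. The following is a minimal sketch on made-up rows; the `toy` frame and its values are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one complete row and two incomplete rows
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "year": [1880.0, np.nan, 1952.0],
        "reclat": [50.775, 56.18333, np.nan],
        "reclong": [6.08333, 10.23333, np.nan],
    }
)

# Count missing values per column before dropping anything
print(toy.isnull().sum())

# Drop only the rows that lack a year or either coordinate
cleaned = toy.dropna(subset=["year", "reclat", "reclong"])
print(cleaned.shape)  # (1, 4)
```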

In [11]:
# Removing rows with a zero coordinate, as it's unlikely for meteorites to have landed at "Null Island" (0, 0).
# Note: the condition below keeps only rows where both "reclat" and "reclong" are non-zero,
# so rows where just one of the two coordinates equals 0 are removed as well.
df = df[(df["reclat"] != 0) & (df["reclong"] != 0)]

# Displaying the new shape of the DataFrame after removing the specified rows
df.shape


(31813, 5)
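
The condition `(reclat != 0) & (reclong != 0)` is stricter than removing only the exact (0, 0) pairs. The sketch below compares the two filters on made-up rows; the `toy` frame and the variable names are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the meteorite dataset)
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "reclat": [0.0, 0.0, 12.5, -71.5],
        "reclong": [0.0, 35.0, 0.0, 35.7],
    }
)

# Filter used in the notebook: keeps rows where BOTH coordinates are non-zero,
# so any row with at least one zero coordinate is removed
either_zero_removed = toy[(toy["reclat"] != 0) & (toy["reclong"] != 0)]

# Alternative: remove only the rows located exactly at (0, 0)
at_null_island = (toy["reclat"] == 0) & (toy["reclong"] == 0)
only_null_island_removed = toy[~at_null_island]

print(either_zero_removed["name"].tolist())       # ['D']
print(only_null_island_removed["name"].tolist())  # ['B', 'C', 'D']
```

Which variant fits depends on whether a single zero coordinate is treated as a placeholder or as a real position on the equator or prime meridian.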

In [12]:
# Displaying the number of unique values in each column of the cleaned DataFrame
df.nunique()

name       31813
id         31813
year         264
reclat     12629
reclong    14522
dtype: int64

In [13]:
# Displaying the data types of each column in the cleaned DataFrame
df.dtypes

name        object
id           int64
year       float64
reclat     float64
reclong    float64
dtype: object

In [14]:
# Converting the "year" column to integer type
df["year"] = df["year"].astype(int)


In [15]:
# Displaying descriptive statistics for the numerical columns in the DataFrame
df.describe()

Unnamed: 0,id,year,reclat,reclong
count,31813.0,31813.0,31813.0,31813.0
mean,20773.422972,1986.898061,-47.320727,73.28191
std,14983.047605,28.179797,46.921846,83.406886
min,1.0,860.0,-87.36667,-165.43333
25%,9202.0,1983.0,-79.68333,26.0
50%,18516.0,1991.0,-72.0,57.04363
75%,27323.0,2000.0,18.45333,159.39972
max,57455.0,2013.0,81.16667,354.47333


In [16]:
# Identifying outliers in the "reclat" and "reclong" columns.
# A row is an outlier if "reclat" is less than -90 or greater than 90,
# or if "reclong" is less than -180 or greater than 180, i.e. outside the valid range of latitude and longitude on Earth.
outliers_df = df[(df["reclat"] < -90) | (df["reclat"] > 90) | (df["reclong"] < -180) | (df["reclong"] > 180)]

# Displaying the DataFrame containing outliers
print(outliers_df)


                   name     id  year   reclat    reclong
22946  Meridiani Planum  32789  2005 -1.94617  354.47333
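
The same range check can be written with `Series.between`, which may read more clearly than four chained comparisons. This is a minimal sketch on made-up rows, not the notebook's own method; the `toy` frame is an assumption, and its last row reuses the out-of-range coordinates printed above.

```python
import pandas as pd

# Toy rows; the last one reuses the out-of-range longitude reported above
toy = pd.DataFrame(
    {
        "name": ["in_range", "bad_lat", "bad_long"],
        "reclat": [50.775, 95.0, -1.94617],
        "reclong": [6.08333, 10.0, 354.47333],
    }
)

# between() is inclusive on both ends by default
valid_lat = toy["reclat"].between(-90, 90)
valid_long = toy["reclong"].between(-180, 180)

# A row is an outlier if either coordinate falls outside its valid range
outliers = toy[~(valid_lat & valid_long)]
print(outliers["name"].tolist())  # ['bad_lat', 'bad_long']
```

Unlike the comparison chain, `between` returns `False` for missing values, so rows with NaN coordinates would be flagged as well. That makes no difference here because missing values were dropped earlier.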


In [17]:
# Removing the Meridiani Planum row: this meteorite was found on Mars, so it does not belong in a dataset of landings on Earth.
df = df.drop(index=22946)

In [20]:
df = df.sort_values("year", ascending=True)

In [26]:
df.keys()

Index(['name', 'id', 'year', 'reclat', 'reclong'], dtype='object')

### Data Transform

In [19]:
# Export the cleaned data to CSV for use in Tableau
df.to_csv("Meteorite_Cleaned.csv", index=False)

In [24]:
df.sample(5)

Unnamed: 0,name,id,year,reclat,reclong
33224,Queen Alexandra Range 94420,20053,1994,-84.0,168.0
32074,Queen Alexandra Range 02100,18913,2002,-84.0,168.0
15824,Grove Mountains 054658,50655,2006,-72.99889,75.1872
31385,Pecora Escarpment 02013,18195,2002,-85.63333,-68.7
15919,Grove Mountains 99013,11410,2000,-73.08333,75.2
