## Unveiling the Stars: An Exploratory Study on NASA Astronauts

We are tasked with exploring the dataset about the  demographics, careers, and accomplishments of NASA astronauts.

Using the latest tools and techniques in data analysis, you dissect the dataset, analysing astronauts' backgrounds, experiences, and missions. We will uncover patterns and trends that reveal the diverse tapestry ofabout NASA's astronaut corps, from their educational journeys to their military service and their remarkable achievements in space.


## Module 1
### Task 1: Exploring NASA's Data Universe.
Our analysis of NASA's dataset is a mission to unveil profound insights within the realm of space exploration. Beyond mere data analysis, it's a journey to harness the knowledge hidden in the stars. Through this exploration, we aim to uncover patterns that will guide future missions, enhancing NASA's cosmic endeavors. Every data point is a piece of the cosmic puzzle, fueling innovation and inspiring generations to reach for the stars. In these numbers and statistics, we find the roadmap to the next frontier of human exploration.

In [4]:
#--- Import Pandas ---
import pandas as pd
# remove any future warning 
import warnings
warnings.filterwarnings("ignore")
#--- Read in dataset ----
df = pd.read_csv("nasa.csv")

#--- Inspect data ---
df.head()

Unnamed: 0,Name,Year,GroupNum,Status,Birth_Date,Birth_Place,Gender,Alma_Mater,Undergraduate_Major,Graduate_Major,Military_Rank,Military_Branch,Space_Flights,Space_Flight_hr,Space_Walks,Space_Walks_hr,Missions,Death_Date,Death_Mission
0,Alan B. Shepard Jr.,1959,1,Deceased,18-11-1923,"East Derry, NH",Male,US Naval Academy,Naval Sciences,Naval Science,Rear Admiral,US Navy (Retired),2,216,2,9.0,"Mercury 3, Apollo 14",21-07-1998,Natural causes
1,Alan G. Poindexter,1998,17,Deceased,05-11-1961,"Pasadena, CA",Male,Georgia Institute of Technology; US Naval Post...,Aerospace Engineering,Aeronautical Engineering,Captain,US Navy,2,669,0,0.0,"STS-122 (Atlantis), STS-131 (Discovery)",01-07-2012,Personal watercraft accident
2,Alan L. Bean,1963,3,Deceased,15-03-1932,"Wheeler, TX",Male,University of Texas,Aeronautical Engineering,Aeronautical Engineering,Captain,US Navy (Retired),2,1671,3,10.0,"Apollo 12, Skylab 3",26-05-2018,Natural causes
3,Albert Sacco Jr.,1963,3,Retired,03-05-1949,"Boston, MA",Male,Northeastern University; MIT,Chemical Engineering,Chemical Engineering,Captain,US Navy (Retired),1,381,0,0.0,STS-73 (Columbia),,
4,Alfred M. Worden,1966,5,Retired,07-02-1932,"Jackson, MI",Male,US Military Academy; University of Michigan,Military Science,Aeronautical & Astronautical Engineering,Colonel,US Air Force (Retired),1,295,1,0.5,Apollo 15,,


### Task 2: Exploring Data Completeness.

In this step our task is to count all null values to know the extent of our data completeness

In [5]:
null_values = df.isnull().sum()

#--- Inspect data ---
print(df.shape)
null_values

(357, 19)


Name                     0
Year                     0
GroupNum                 0
Status                   0
Birth_Date               0
Birth_Place              0
Gender                   0
Alma_Mater               1
Undergraduate_Major      0
Graduate_Major           0
Military_Rank            0
Military_Branch          0
Space_Flights            0
Space_Flight_hr          0
Space_Walks              0
Space_Walks_hr           0
Missions                 0
Death_Date             303
Death_Mission          309
dtype: int64

### Task 3: Data Refinement for NASA Astronaut Data.
In our ongoing journey through the NASA astronaut dataset, we'll begin our data cleaning process by removing missing values in the Alma_Mater column. In addition, we filter rows where Death_Mission is null and 'Death_Date' not null inorder to idenitfy astronaunts that are deceased but death date are not mentioned.Converting the Birth_Date and the Death_date to datetime inorder to make our analysis later easier.

In [6]:
# dropping null values in alma_mater
df["Alma_Mater"].dropna(inplace=True)

# filtering values where death_mission is null and death_date is not null
incorrect_df = df[(df["Death_Mission"].isnull()) & (df["Death_Date"].notna()) ].index
df.drop(incorrect_df,inplace=True)

# converting birth date and death date to datetime
dates_to_change = ["Birth_Date","Death_Date"]
df[dates_to_change] = df[dates_to_change].apply(pd.to_datetime)
#--- Inspect data ---
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 351 entries, 0 to 356
Data columns (total 19 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Name                 351 non-null    object        
 1   Year                 351 non-null    int64         
 2   GroupNum             351 non-null    int64         
 3   Status               351 non-null    object        
 4   Birth_Date           351 non-null    datetime64[ns]
 5   Birth_Place          351 non-null    object        
 6   Gender               351 non-null    object        
 7   Alma_Mater           350 non-null    object        
 8   Undergraduate_Major  351 non-null    object        
 9   Graduate_Major       351 non-null    object        
 10  Military_Rank        351 non-null    object        
 11  Military_Branch      351 non-null    object        
 12  Space_Flights        351 non-null    int64         
 13  Space_Flight_hr      351 non-null    int

### Task 4: Preparing Data for SQL Analysis.
In the midst of our data journey through the NASA astronaut dataset, we've arrived at a point where our focus shifts to data export and preparation for a new phase in your analysis.

The destination is  a CSV file named 'astronauts.csv' that will serve as the foundation for your SQL exploration. This export step ensures that the data we've curated and cleaned is ready to be loaded into a relational database, where we can gain deeper insights.

In [7]:
#export the cleaned data
df.to_csv("cleaned_nasa.csv",index=False)


### Task 5: Data Download, Import, and Database Connection.

In [8]:
# -- Load the sql extention ----
%load_ext sql

# --- Load your mysql db using credentials from the "DB" area ---
%sql mysql+pymysql://root:password@localhost/nasa_data

The sql extension is already loaded. To reload it, use:
  %reload_ext sql
