# The Basics

In this notebook, we'll cover how to: 
1. Import your data into a dataframe
2. Preform any basic transformations including

    -Pivoting, Melting,Grouping and Sorting


In [3]:
import pandas as pd
import numpy as np
import os

In [4]:
filepath = os.path.join("..","Space Data")
[f for f in os.listdir(filepath)]

['exoplanets.csv',
 'SolarSystemAndEarthquakes.csv',
 'Fireball Reports.csv',
 'UFO_Sightings_Global.csv',
 'astronauts.csv',
 'Meteorite_Landings.csv']

In [5]:
astro = pd.read_csv(os.path.join(filepath,"astronauts.csv"))

### 1. Getting to know your data

In [6]:
#.head(n), and .tail(n) show you the top n rows (default is 5) in your dataset
astro.head()

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Rank,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,5/17/1967,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",,
1,Loren W. Acton,,,Retired,3/7/1936,"Lewiston, MT",Male,Montana State University; University of Colorado,Engineering Physics,Solar Physics,,,1,190,0,0.0,STS 51-F (Challenger),,
2,James C. Adamson,1984.0,10.0,Retired,3/3/1946,"Warsaw, NY",Male,US Military Academy; Princeton University,Engineering,Aerospace Engineering,Colonel,US Army (Retired),2,334,0,0.0,"STS-28 (Columbia), STS-43 (Atlantis)",,
3,Thomas D. Akers,1987.0,12.0,Retired,5/20/1951,"St. Louis, MO",Male,University of Missouri-Rolla,Applied Mathematics,Applied Mathematics,Colonel,US Air Force (Retired),4,814,4,29.0,"STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...",,
4,Buzz Aldrin,1963.0,3.0,Retired,1/20/1930,"Montclair, NJ",Male,US Military Academy; MIT,Mechanical Engineering,Astronautics,Colonel,US Air Force (Retired),2,289,2,8.0,"Gemini 12, Apollo 11",,


In [7]:
#.shape returns (#rows,#cols)
astro.shape

(357, 19)

In [8]:
#Get data types for your columns
astro.dtypes

Name                    object
Year                   float64
Group                  float64
Status                  object
Birth Date              object
Birth Place             object
Gender                  object
Alma Mater              object
Undergraduate Major     object
Graduate Major          object
Military Rank           object
Military Branch         object
Space Flights            int64
Space Flight (hr)        int64
Space Walks              int64
Space Walks (hr)       float64
Missions                object
Death Date              object
Death Mission           object
dtype: object

In [9]:
#Returns metadata for numerical fields
astro.describe()

Unnamed: 0,Year,Group,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr)
count,330.0,330.0,357.0,357.0,357.0,357.0
mean,1985.106061,11.409091,2.364146,1249.266106,1.246499,7.707283
std,13.216147,5.149962,1.4287,1896.759857,2.056989,13.367973
min,1959.0,1.0,0.0,0.0,0.0,0.0
25%,1978.0,8.0,1.0,289.0,0.0,0.0
50%,1987.0,12.0,2.0,590.0,0.0,0.0
75%,1996.0,16.0,3.0,1045.0,2.0,12.0
max,2009.0,20.0,7.0,12818.0,10.0,67.0


In [12]:
#Shows you the number of nulls per column
astro.isnull().sum()

Name                     0
Year                    27
Group                   27
Status                   0
Birth Date               0
Birth Place              0
Gender                   0
Alma Mater               1
Undergraduate Major     22
Graduate Major          59
Military Rank          150
Military Branch        146
Space Flights            0
Space Flight (hr)        0
Space Walks              0
Space Walks (hr)         0
Missions                23
Death Date             305
Death Mission          341
dtype: int64

### 2.Transforming you data (Cleaning)
Cleaning involves getting your data into a format you can use later. This includes: 

- Getting data types correct

- Getting data into a "tidy" format

In [55]:
#1. Make date fields (Year, Birth Date, and Death Date) into dates

In [56]:
astro['Year']=pd.to_datetime(astro['Year']).dt.year

In [57]:
astro['Birth Date']=pd.to_datetime(astro['Birth Date']).dt.date

In [58]:
astro['Death Date']= pd.to_datetime(astro['Death Date']).dt.date

In [59]:
#Sometimes the Military Rank as (Retired). Let's separate that out in case we just want to compare branches
astro['Retired_Military']=np.where(astro['Military Branch'].str.contains("Retired"),1,0)
astro['Military Branch']= [x.split("(")[0].strip() for x in astro['Military Branch'].astype(str)]

### 3. Feature Engineering (adding columns)
Feature engineering just means adding columns you think may be relavent or interesting. 

In [60]:
#Let's count the number of missions
astro['Missions']=astro['Missions'].astype(str)
astro['MissionCount']= [len(x.split(",")) for x in astro['Missions']]

In [61]:
astro.tail()


Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,...,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission,Retired_Military,MissionCount
352,David A. Wolf,1970.0,13.0,Retired,1956-08-23,"Indianapolis, IN",Male,Purdue University; Indiana University,Electrical Engineering,Medicine,...,,3,4044,7,41.0,STS-58 (Columbia). STS-86/89 (Atlantis/Endeavo...,NaT,,1,3
353,Neil W. Woodward III,1970.0,17.0,Retired,1962-07-26,"Chicago, IL",Male,MIT; University of Texas-Austin; George Washin...,Physics,Physics; Business Management,...,US Navy,0,0,0,0.0,,NaT,,0,1
354,Alfred M. Worden,1970.0,5.0,Retired,1932-02-07,"Jackson, MI",Male,US Military Academy; University of Michigan,Military Science,Aeronautical & Astronautical Engineering,...,US Air Force,1,295,1,0.5,Apollo 15,NaT,,1,1
355,John W. Young,1970.0,2.0,Retired,1930-09-24,"San Francisco, CA",Male,Georgia Institute of Technology,Aeronautical Engineering,,...,US Navy,6,835,3,20.0,"Gemini 3, Gemini 10, Apollo 10, Apollo 16, STS...",NaT,,1,6
356,George D. Zamka,1970.0,17.0,Retired,1962-06-29,"Jersey City, NJ",Male,US Naval Academy; Florida Institute of Technology,Mathematics,Engineering Management,...,US Marine Corps,2,692,0,0.0,"STS-120 (Discovery), STS-130 (Endeavor)",NaT,,1,2


## CHALLENGE: Can you create a column that is the number of universities attended per astronaut?

In [62]:
#Type code here...

### 4. Transforming our data

Now we can start re-arranging our data to get some insights!

In [63]:
#Total Space Flights by Gender
astro.groupby('Gender')['Space Flights'].sum()

Gender
Female    120
Male      724
Name: Space Flights, dtype: int64

In [64]:
#Per-person seems more equal
astro.groupby('Gender')['Space Flights'].mean()

Gender
Female    2.400000
Male      2.358306
Name: Space Flights, dtype: float64

## CHALLENGE: Can you find the most common Undergraduate Major for Astronauts? Does that differ by Gender?

In [65]:
#Type code here...

In [66]:
#Pivot tables: 
astro.pivot_table(index = ['Gender','Military Branch'],values = ['Space Flight (hr)'],aggfunc = [np.mean,len])

Unnamed: 0_level_0,Unnamed: 1_level_0,mean,len
Unnamed: 0_level_1,Unnamed: 1_level_1,Space Flight (hr),Space Flight (hr)
Gender,Military Branch,Unnamed: 2_level_2,Unnamed: 3_level_2
Female,US Air Force,2238.2,5
Female,US Army,999.0,1
Female,US Naval Reserves,711.0,1
Female,US Navy,1794.5,6
Female,,1491.810811,37
Male,US Air Force,963.519481,77
Male,US Air Force Reserves,1140.2,5
Male,US Army,2758.8125,16
Male,US Coast Guard,2411.5,2
Male,US Marine Corps,651.35,20


In [71]:
#Sorting:
astro[['Name','MissionCount']].sort_values(by = 'MissionCount',ascending = False).head(10)

Unnamed: 0,Name,MissionCount
279,Jerry L. Ross,7
65,Franklin R. Chang-Diaz,7
111,C. Michael Foale,6
355,John W. Young,6
339,James D. Wetherbee,6
237,Story Musgrave,6
41,Curtis L. Brown Jr.,6
144,James D. Halsell Jr.,5
27,John E. Blaha,5
315,Norman E. Thagard,5


## CHALLENGE: What's the most popular first name for astronauts?

In [72]:
#Type code here...

## CHALLENGE: Which university has hosted the most astronauts?

In [75]:
#Type code here...