# **Process Documentation**

### **Overview**

- The goal is to create a smaller archive from the CORGIS Dataset Project
- This archive will contain this information - "Name", "Gender", "Birth Year", "Nationality", "Mission Count", "Mission Name", "Mission Role", "Mission Year", and "Mission Duration"
- Python and Pandas will be used to compile this archive
- You do not need experience in Python or Pandas

### **Procedure**

1. Getting Started
2. Clean and Filter Data
3. Export Data

### **Getting Started**

1. Download Files
  - Create a folder in Google Drive where all the files will be stored
  > Mine is named - "Astronauts Data Archive"
  - Download this file and move it to the folder
  > Mine is named - "processdocumentation.ipynb"
  - Donload the data through this link - [Data](https://drive.google.com/file/d/1ilk_9CeUsVTrIkWlVzfBGGZTG44wjS6i/view?usp=share_link) - and move it to the folder
  > Mine is named - "astronauts.csv"

2. Import Pandas
  - Pandas is a module from Python that will read and filter data
  - Import Pandas from Python through this code
  > Import Pandas as `pd` to make it easy to call functions


In [1]:
import pandas as pd

3. Read Data
  - Read the data as a Comma Seperated Values file using Pandas
  > Use the function - `read_csv()` - from Pandas to read the path of the file
  - Save the data as a Data Frame in Pandas
  > Mine is saved as - "data"

In [2]:
data = pd.read_csv("/content/drive/MyDrive/Classes/ENGL 105/Unit 3/Astronaut Mission Data/raw_data.csv")

### **Clean and Filter Data**

1. Select Columns of Interest
  - Save - "Name", "Gender", "Birth Year", "Nationality", "Mission Count", "Mission Role", "Mission Year", "Mission Name", and "Mission Duration" - as a list
  > Mine is saved as - "selected_columns"

In [3]:
selected_columns = ["Profile.Name", "Profile.Gender", "Profile.Birth Year", "Profile.Nationality", "Profile.Lifetime Statistics.Mission count", "Mission.Role", "Mission.Year", "Mission.Name", "Mission.Durations.Mission duration"]

2. Filter the data for these columns
  - Save the columns for the data - "Name", "Gender", "Birth Year", "Nationality", "Mission Count", "Mission Role", "Mission Year", "Mission Name", and "Mission Duration" - as a Data Frame in Pandas
  > Mine is saved as - "filtered_data"

In [4]:
filtered_data = data[selected_columns]

3. Change Names and Order
  - Change the name of each column to this - "Name", "Gender", "Birth Year", "Nationality", "Mission Count", "Mission Role", "Mission Year", "Mission Name", and "Mission Duration"
  > Use the function - `rename()` - from Pandas to change the names
  - Change the order of the column to this - "Name", "Gender", "Birth Year", "Nationality", "Mission Count", "Mission Name", "Mission Role", "Mission Year", and "Mission Duration"
  > Change the order inside of the function - `rename()` - to do this
  - Save the data as a Data Frame in Pandas
  > Mine is saved as - "cleaned_data"

In [5]:
cleaned_data = filtered_data.rename(columns={"Profile.Name": "Name", "Profile.Gender": "Gender", "Profile.Birth Year": "Birth Year", "Profile.Nationality": "Nationality", "Profile.Lifetime Statistics.Mission count": "Mission Count", "Mission.Name": "Mission Name", "Mission.Role": "Mission Role", "Mission.Year": "Mission Year", "Mission.Durations.Mission duration": "Mission Duration"})

### **Export Data**

1. Save the data as a Data Frame in Pandas
> Use the function - `to_csv()` - from Pandas to save the data as a Comma Seperated Values file

In [6]:
new_data = cleaned_data.to_csv('final_data.csv', index=False)

2. View some of the data
> Use the function - `read_csv()` - from Pandas to read the path of the file and the function - `head()` - from Pandas to view some of the data


In [None]:
new_data = pd.read_csv("/content/final_data.csv")
new_data.head()

Unnamed: 0,Name,Gender,Birth Year,Nationality,Mission Count,Mission Role,Mission Year,Mission Name,Mission Duration
0,"Gagarin, Yuri",male,1934,U.S.S.R/Russia,1,pilot,1961,Vostok 1,1.77
1,"Titov, Gherman",male,1935,U.S.S.R/Russia,1,pilot,1961,Vostok 2,25.0
2,"Glenn, John H., Jr.",male,1921,U.S.,2,pilot,1962,MA-6,5.0
3,"Glenn, John H., Jr.",male,1921,U.S.,2,PSP,1998,STS-95,213.0
4,"Carpenter, M. Scott",male,1925,U.S.,1,Pilot,1962,Mercury-Atlas 7,5.0
