<a href="https://colab.research.google.com/github/dnevius1/Aerospace-Engineering/blob/master/Lesson_2_Pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lesson 2: Pandas Library

## Pandas Explained 🐼
Pandas is a Python library used for data manipulation and analysis. It provides data structures and functions that simplify working with structured data, such as tables or spreadsheets.

In [None]:
import pandas as pd

## Pandas Series 📊
Series is a one-dimensional labeled array capable of holding any data type (e.g., integers, floats, strings, etc.). It's similar to a column in a spreadsheet or a single list in Python.

In [None]:
import numpy as np
data = np.array([10, 20, 30, 40, 50])
series = pd.Series(data)
print("\nSeries from NumPy array:")
print(series)


Series from NumPy array:
0    10
1    20
2    30
3    40
4    50
dtype: int64


You can access pertinent attributes and methods as well for series:

In [None]:
# Attributes
print("Index:", series.index)
print("Values:", series.values)

# Methods
print("Sum of values:", series.sum())
print("Maximum value:", series.max())
print("Minimum value:", series.min())

Index: RangeIndex(start=0, stop=5, step=1)
Values: [10 20 30 40 50]
Sum of values: 150
Maximum value: 50
Minimum value: 10


## Pandas Dataframes 🗄️
A Pandas DataFrame is a two-dimensional labeled data structure with rows and columns, similar to a spreadsheet.

In [None]:
# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Emma', 'Peter'],
        'Age': [30, 25, 35],
        'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print("DataFrame created from dictionary:")
print(df)

DataFrame created from dictionary:
    Name  Age      City
0   John   30  New York
1   Emma   25    London
2  Peter   35     Paris


You can access rows, columns, or individual elements using indexing, slicing, or label-based methods.

In [None]:
# Accessing columns
print("\nAccessing columns:")
print(df['Name'])  # Accessing a single column
print(df[['Name', 'Age']])  # Accessing multiple columns

# Accessing rows
print("\nAccessing rows:")
print(df.loc[1])  # Accessing a single row by label
print(df.iloc[0])  # Accessing a single row by index

# Accessing elements
print("\nAccessing elements:")
print(df.at[0, 'Age'])  # Accessing a single element by label
print(df.iat[1, 2])  # Accessing a single element by index


Accessing columns:
0     John
1     Emma
2    Peter
Name: Name, dtype: object
    Name  Age
0   John   30
1   Emma   25
2  Peter   35

Accessing rows:
Name      Emma
Age         25
City    London
Name: 1, dtype: object
Name        John
Age           30
City    New York
Name: 0, dtype: object

Accessing elements:
30
London


You can perform various manipulations on DataFrames, such as adding or removing columns, filtering rows, and applying functions.

In [None]:
# Adding a new column
df['Gender'] = ['Male', 'Female', 'Male']
print("\nDataFrame after adding a new column:")
print(df)

# Filtering rows based on a condition
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame:")
print(filtered_df)

# Applying a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)
print("\nDataFrame after applying function to 'Age' column:")
print(df)


DataFrame after adding a new column:
    Name  Age      City  Gender
0   John   30  New York    Male
1   Emma   25    London  Female
2  Peter   35     Paris    Male

Filtered DataFrame:
    Name  Age      City Gender
0   John   30  New York   Male
2  Peter   35     Paris   Male

DataFrame after applying function to 'Age' column:
    Name  Age      City  Gender
0   John   31  New York    Male
1   Emma   26    London  Female
2  Peter   36     Paris    Male


### Exercise 1: Planetary Analysis with Pandas 🌏
You are part of a team working on analyzing planetary Earth data. Your task is to process and analyze the data using Pandas to extract insights.

#### Objectives:
- Create a Pandas Series representing distances of planets from the Sun (in million kilometers)
- Create a Pandas DataFrame representing characteristics of moons of the outer planets
- Analyze the data to find key information about planetary distances and moon characteristics

In [None]:
planet_distances = {
    'Mercury': 57.9,
    'Venus': 108.2,
    'Earth': 149.6,
    'Mars': 227.9,
    'Jupiter': 778.6,
    'Saturn': 1433.5,
    'Uranus': 2872.5,
    'Neptune': 4495.1,
    'Pluto': 5906.4
}

moon_data = {
    'Planet': ['Jupiter', 'Jupiter', 'Saturn', 'Saturn', 'Uranus', 'Neptune'],
    'Moon': ['Io', 'Ganymede', 'Titan', 'Rhea', 'Titania', 'Triton'],
    'Diameter (km)': [3642, 5262, 5150, 1528, 1578, 2707],
    'Orbital Period (days)': [1.77, 7.15, 15.95, 4.52, 8.71, 5.88]
}

In [None]:
import pandas as pd

planet_distances = pd.Series(planet_distances)
moon_characteristics = pd.DataFrame(moon_data)
print(moon_characteristics)

    Planet      Moon  Diameter (km)  Orbital Period (days)
0  Jupiter        Io           3642                   1.77
1  Jupiter  Ganymede           5262                   7.15
2   Saturn     Titan           5150                  15.95
3   Saturn      Rhea           1528                   4.52
4   Uranus   Titania           1578                   8.71
5  Neptune    Triton           2707                   5.88


In [None]:
print("Average distance of planets from the sun: ")
print(planet_distances.mean())

print("\nNumber of moons for each outer planet: ")
outer_planets = ['Jupiter', 'Saturn', 'Uranus', 'Neptune']
for planet in outer_planets:
  num_moons = moon_characteristics[moon_characteristics['Planet'] == planet].shape[0]
  print(f"{planet}: {num_moons} ")

print("\n")

for planet in outer_planets:
  largest_moon = moon_characteristics[moon_characteristics['Planet'] == planet].sort_values(by='Diameter (km)', ascending=False).iloc[0]
  print(f"{planet}:{largest_moon['Moon']} ({largest_moon['Diameter (km)']} km)")


Average distance of planets from the sun: 
1781.0777777777778

Number of moons for each outer planet: 
Jupiter: 2 
Saturn: 2 
Uranus: 1 
Neptune: 1 


Jupiter:Ganymede (5262 km)
Saturn:Titan (5150 km)
Uranus:Titania (1578 km)
Neptune:Triton (2707 km)


## Working with Real Life Data ☄️
Pandas allows us to work with a wide range of file formats from external sources. The code below allows us to import data via CSV and display information about it.

The below downloads a CSV file of the most recent SpaceX Missions using a URL:

In [1]:
space_x_missions_csv = "https://raw.githubusercontent.com/BriantOliveira/SpaceX-Dataset/master/dataset/SpaceX-Missions.csv"
df = pd.read_csv(space_x_missions_csv)

NameError: name 'pd' is not defined

You can gauge more information about a particular dataset using the following methods:

In [None]:
# Display the first few rows of the DataFrame
print(df.head())

# Check basic information about the DataFrame
print(df.info())

# Summary statistics of numerical columns
print(df.describe())

  Flight Number    Launch Date Launch Time       Launch Site Vehicle Type  \
0          F1-1  24 March 2006       22:30  Marshall Islands     Falcon 1   
1          F1-2  21 March 2007       01:10  Marshall Islands     Falcon 1   
2          F1-3  3 August 2008       03:34  Marshall Islands     Falcon 1   
3          F1-3  3 August 2008       03:34  Marshall Islands     Falcon 1   
4          F1-3  3 August 2008       03:34  Marshall Islands     Falcon 1   

         Payload Name             Payload Type  Payload Mass (kg)  \
0         FalconSAT-2       Research Satellite               19.5   
1             DemoSat                      NaN                NaN   
2         Trailblazer  Communication Satellite                NaN   
3  PRESat, NanoSail-D      Research Satellites                8.0   
4           Explorers            Human Remains                NaN   

  Payload Orbit Customer Name Customer Type Customer Country Mission Outcome  \
0           NaN         DARPA    Governmen

## Core Methods with Pandas ⚙️

### Mathematical & Statistical Methods
You can apply basic statistical and mathematical  functions such as `mean`, `median`, `sum`, `std` (standard deviation), `var` (variance), and more!

In [None]:
mean_payload_mass = df['Payload Mass (kg)'].mean()
print(mean_payload_mass)

median_payload_mass = df['Payload Mass (kg)'].median()
print(median_payload_mass)

sum_payload_mass = df['Payload Mass (kg)'].sum()
print(sum_payload_mass)

std_payload_mass = df['Payload Mass (kg)'].std()
print(std_payload_mass)

var_payload_mass = df['Payload Mass (kg)'].var()
print(var_payload_mass)

2739.7727272727275
2490.0
90412.5
2131.502972856349
4543304.923295454


### Explorative Methods
Explorative methods in Pandas allows you to learn more about the dataset at hand. Some of these methods include `describe` and `unique`.

In [None]:
# Print all unique launch sites in the dataset
pd.unique(df["Launch Site"])

# Describe the statistical values of all the numerical columns within the dataset
df.describe(include=[np.number])

Unnamed: 0,Payload Mass (kg)
count,33.0
mean,2739.772727
std,2131.502973
min,8.0
25%,570.0
50%,2490.0
75%,4159.0
max,9600.0


### Data Selection and Filtering
Pandas offers a versatile tool kit for faceting your dataset.

In [None]:
# Select specific columns
selected_columns = df[['Payload Name', 'Payload Orbit']]
print(selected_columns)

# Filter rows based on condition (if payload mass exceeds 3000)
filtered_payloads = df[df['Payload Mass (kg)'] > 3000]
print(filtered_payloads)

                            Payload Name                 Payload Orbit
0                            FalconSAT-2                           NaN
1                                DemoSat                           NaN
2                            Trailblazer                           NaN
3                     PRESat, NanoSail-D                           NaN
4                              Explorers                           NaN
5                       RatSat (DemoSat)               Low Earth Orbit
6                               RazakSAT               Low Earth Orbit
7   Dragon Spacecraft Qualification Unit               Low Earth Orbit
8                 SpaceX CRS (Dragon C1)               Low Earth Orbit
9                SpaceX CRS (Dragon C2+)               Low Earth Orbit
10                          SpaceX CRS-1               Low Earth Orbit
11                           Orbcomm-OG2               Low Earth Orbit
12                          SpaceX CRS-2               Low Earth Orbit
13    

### Grouping Data
Another cool feature in Panadas is `groupby` which allows you to categorize data and perform analysis on it.

In [None]:
# Grouping payload by launch site
launch_groups = df.groupby("Launch Site")

# Obtaining the mean value for each group
launch_groups['Payload Mass (kg)'].mean()

Launch Site
Cape Canaveral AFS LC-40       3075.880
Kennedy Space Center LC-39A    2490.000
Marshall Islands                 93.125
Vandenberg AFB SLC-4E          3551.000
Name: Payload Mass (kg), dtype: float64

### Exercise 2: Deeper Dive into SpaceX Launch Dataset 🚀
Let's conduct more research analysis on the SpaceX launch dataset!

#### Objectives:
- Reinitialize the dataframe and name it `launches_dataset`
- View the first values of the dataset using the `head()` method
- Print out the "Customer Country" column
- Print out all the unique customer countries
- Make a variable named `launches_by_country` and make it grouped by the "Customer Country" column
- Print out all the launches with a payload mass ("Payload Mass (kg)") of less than 4000
- Print the median payload mass for `United States`

In [6]:
import pandas as pd

space_x_missions_csv = "https://raw.githubusercontent.com/BriantOliveira/SpaceX-Dataset/master/dataset/SpaceX-Missions.csv"
df = pd.read_csv(space_x_missions_csv)

print(df["Customer Country"])

print(pd.unique(df["Customer Country"]))

launches_by_country = df.groupby("Customer Country")

print(df[df["Payload Mass (kg)"]< 4000])

print(df["Payload Mass (kg)"].mean())

0       United States
1       United States
2       United States
3       United States
4       United States
5                 NaN
6            Malaysia
7                 NaN
8       United States
9       United States
10      United States
11      United States
12      United States
13             Canada
14         Luxembourg
15           Thailand
16      United States
17      United States
18              China
19              China
20      United States
21      United States
22      United States
23            Bermuda
24    France (Mexico)
25      United States
26       Turkmenistan
27      United States
28      United States
29      United States
30         Luxembourg
31      United States
32              Japan
33           Thailand
34            Bermuda
35    France (Mexico)
36      United States
37              Japan
38             Israel
39      United States
40      United States
Name: Customer Country, dtype: object
['United States' nan 'Malaysia' 'Canada' 'Luxembourg' 'Thail

Good job! You made it to the end of this lesson!
Next, we will be covering a visualization and plotting library called Matplotlib.