# List of Active United States Military Aircraft

This project visualizes the **active** military aircraft used by the United States military. Aircraft that are no longer in service are not included.

### Contents:
- Business Request & User Stories
- Data Cleansing & Transformation (Python)
- Data Model
- Active US Military Aircraft Dashboard

## Business Request & User Stories 

The business request for this data analyst project was an executive report on military aircraft, aimed at enthusiasts. Based on the request from the business (aka me), the following user stories were defined to guide delivery and ensure that the acceptance criteria were met throughout the project.


| # | As a (role) | I want (request / demand) | So that I (user value) | Acceptance Criteria |
| :- | :- | :- | :- | :- |
| 1 | Military dude | A dashboard overview of active US military aircraft | Can better follow which aircraft are in use | A Power BI dashboard that updates its data on refresh |
| 2 | Veteran | A detailed overview of aircraft per branch | Can follow which aircraft are in use by each branch | A Power BI dashboard that lets me filter data by branch |
| 3 | Sales Representative | A detailed overview of aircraft per manufacturer | Can follow up on which manufacturers and aircraft are most in use | A Power BI dashboard that lets me filter data by manufacturer |
| 4 | Time Traveler | A dashboard overview of aircraft introduction dates | Can follow aircraft in service by the date they were introduced | A Power BI dashboard with graphs that can be filtered by date, in ascending order |

## Data Cleansing & Transformation (Python)

To build the data model needed for the analysis and to fulfill the business needs defined in the user stories, we use pandas, requests, and BeautifulSoup to scrape a Wikipedia page.

Source: https://en.wikipedia.org/wiki/List_of_active_United_States_military_aircraft

In [1]:
import pandas as pd
import requests
import numpy as np
import re
from bs4 import BeautifulSoup

In [2]:
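from io import StringIO

page = requests.get('https://en.wikipedia.org/wiki/List_of_active_United_States_military_aircraft').text
soup = BeautifulSoup(page, 'html.parser')
table = soup.find_all('table', class_="wikitable sortable")

# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string to read_html
tables = pd.read_html(StringIO(str(table)))
# Stack every parsed table into a single dataframe
df = pd.concat(tables)
print(df.shape)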

(162, 14)
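
Concatenating the tables this way drops the information about which table each row came from, while the user stories ask for a per-branch filter. The sketch below shows one way to keep that information with the `keys` argument of `pd.concat`. The two small frames and the branch labels are stand-ins chosen for illustration; the real mapping from table position to branch is an assumption that has to be checked against the page.

```python
import pandas as pd

# Two tiny stand-in tables; in the notebook these would be the frames
# returned by pd.read_html before they are concatenated.
air_force = pd.DataFrame({
    "Aircraft": ["A-10C Thunderbolt II", "B-2A Spirit"],
    "Manufacturer": ["Fairchild Republic", "Northrop Grumman"],
})
navy = pd.DataFrame({
    "Aircraft": ["MH-60 Seahawk"],
    "Manufacturer": ["Sikorsky"],
})

# Hypothetical labels: the table-to-branch order is an assumption.
branches = ["Air Force", "Navy"]

tagged = (
    pd.concat([air_force, navy], keys=branches, names=["branch", "row"])
    .reset_index(level="branch")   # move the key into a regular column
    .reset_index(drop=True)        # replace the per-table index with 0..n-1
)
print(tagged)
```

With a `branch` column in place, the per-branch filter from user story 2 becomes a plain slicer in the dashboard.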


### Reading data in pandas

In [3]:
# Read headers
df.columns

Index(['Aircraft', 'Manufacturer', 'Origin', 'Propulsion', 'Role', 'Control',
       'Introduced', 'In service[1][2][3]', 'Total', 'Notes', 'Type',
       'In service[15]', 'In service[22]', 'In service[23][24]'],
      dtype='object')

In [4]:
# !pip install -U klib

import klib

# Clean columns using klib library
df = klib.clean_column_names(df) # cleans and standardizes column names
df = klib.convert_datatypes(df) # converts existing to more efficient dtypes

df.columns

Index(['aircraft', 'manufacturer', 'origin', 'propulsion', 'role', 'control',
       'introduced', 'in_service[1][2][3]', 'total', 'notes', 'type',
       'in_service[15]', 'in_service[22]', 'in_service[23][24]'],
      dtype='object')
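
The cleaned names still carry Wikipedia footnote markers such as `[15]`, which is why four separate `in_service[...]` columns survive the concatenation. As a possible alternative (not what this notebook does), the markers could be stripped from each table's headers before concatenating, so the columns line up on their own. `strip_footnotes` is a hypothetical helper and the two frames are small stand-ins.

```python
import re
import pandas as pd

def strip_footnotes(name: str) -> str:
    """Remove bracketed numeric markers such as [15] from a header."""
    return re.sub(r"\[\d+\]", "", name).strip()

headers = ["Aircraft", "In service[1][2][3]", "In service[15]", "In service[23][24]"]
cleaned = [strip_footnotes(h) for h in headers]
print(cleaned)

# Stand-in tables with differently annotated "In service" headers.
a = pd.DataFrame({"Aircraft": ["B-2A Spirit"], "In service[1][2][3]": ["19"]})
b = pd.DataFrame({"Aircraft": ["MH-60 Seahawk"], "In service[15]": ["508"]})

# Rename per table first, then concatenate: the columns now align.
frames = [frame.rename(columns=strip_footnotes) for frame in (a, b)]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```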

### Make changes to the data

In [5]:
# Assign values from "type" column if "aircraft" is blank
df.aircraft = df.aircraft.replace('', pd.NA).fillna(df.type)
df.tail()

Unnamed: 0,aircraft,manufacturer,origin,propulsion,role,control,introduced,in_service[1][2][3],total,notes,type,in_service[15],in_service[22],in_service[23][24]
25,MH-60 Seahawk,Sikorsky,USA,Helicopter,Anti-submarine warfare helicopterMulti-mission,Manned,1984,,,,MH-60 Seahawk,508,,
26,TH-57B/C Sea Ranger,Bell,USA,Helicopter,Trainer,Manned,1984,,,To be replaced by the AgustaWestland TH-57A Th...,TH-57B/C Sea Ranger,114,,
27,ScanEagle,Boeing,USA,Propeller,,Unmanned,2005,,,,ScanEagle,,,
28,RQ-21A Blackjack,Boeing Insitu,USA,Propeller,,Unmanned,2014,,,,RQ-21A Blackjack,,,
29,MQ-8B Fire Scout,Northrop Grumman,USA,Helicopter,Patrol,Unmanned,2009,,,Helicopter. 96 planned.[33],MQ-8B Fire Scout,27[27],,


In [6]:
# Review dtypes before working with re
df.dtypes

aircraft                 string
manufacturer             string
origin                   string
propulsion             category
role                     string
control                category
introduced               object
in_service[1][2][3]      string
total                    object
notes                    string
type                     string
in_service[15]           string
in_service[22]          Float32
in_service[23][24]       string
dtype: object

In [7]:
# 'in_service[22]' was inferred as Float32 while the other in_service columns are strings, so convert it to string before combining them
df['in_service[22]'] = df['in_service[22]'].astype('string')

df.dtypes

aircraft                 string
manufacturer             string
origin                   string
propulsion             category
role                     string
control                category
introduced               object
in_service[1][2][3]      string
total                    object
notes                    string
type                     string
in_service[15]           string
in_service[22]           string
in_service[23][24]       string
dtype: object

In [None]:
# Replace all NaN values with blank
df = df.replace(np.nan, '', regex=True)

In [9]:
# Combine in_service[x] columns into one
cols = ['in_service[1][2][3]', 'in_service[15]', 'in_service[22]', 'in_service[23][24]']
df['in_service'] = df[cols].apply(lambda row: ''.join(row.values.astype(str)), axis=1)

# Drop columns
df = df.drop(columns=cols)
df.drop('type', axis=1, inplace=True) # This was merged with aircraft column

df.head()

Unnamed: 0,aircraft,manufacturer,origin,propulsion,role,control,introduced,total,notes,in_service
0,A-10C Thunderbolt II,Fairchild Republic,USA,Jet,CAS / Attack,Manned,1977,,,281[4]
1,AC-130J Ghostrider,Lockheed,USA,Propeller,CAS / Attack,Manned,2017,,Replacement for the AC-130U.,6
2,AC-130W Stinger II,Lockheed,USA,Propeller,CAS / Attack,Manned,1966,,Currently being replaced by the AC-130J.,20
3,B-1B Lancer,Rockwell International,USA,Jet,Bomber,Manned,1986,,Employs variable-sweep wing design. To be repl...,45[5]
4,B-2A Spirit,Northrop Grumman,USA,Jet,Bomber,Manned,1997,,Stealth capable aircraft. To be replaced by th...,19
...,...,...,...,...,...,...,...,...,...,...
25,MH-60 Seahawk,Sikorsky,USA,Helicopter,Anti-submarine warfare helicopterMulti-mission,Manned,1984,,,508
26,TH-57B/C Sea Ranger,Bell,USA,Helicopter,Trainer,Manned,1984,,To be replaced by the AgustaWestland TH-57A Th...,114
27,ScanEagle,Boeing,USA,Propeller,,Unmanned,2005,,,
28,RQ-21A Blackjack,Boeing Insitu,USA,Propeller,,Unmanned,2014,,,
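
The combined `in_service` column still contains citation markers (for example `281[4]`), so it cannot be summed or charted as a number yet. A minimal sketch of one way to clean it with `re` follows; `parse_count` is a hypothetical helper, and the sample values are taken from the outputs above plus one blank entry.

```python
import re
import pandas as pd

def parse_count(text):
    """Return the leading count as an int, or None when no count is present."""
    cleaned = re.sub(r"\[\d+\]", "", str(text))   # drop markers like [4]
    cleaned = cleaned.replace(",", "").strip()    # drop thousands separators, if any
    return int(cleaned) if cleaned.isdigit() else None

raw = ["281[4]", "6", "45[5]", "", "27[27]"]

# Int64 is pandas' nullable integer dtype, so blanks become <NA>
# instead of forcing the whole column to float.
counts = pd.Series([parse_count(value) for value in raw], dtype="Int64")
print(counts)
```

Applied to the dataframe, this would look like `df['in_service'] = pd.Series([parse_count(v) for v in df['in_service']], index=df.index, dtype='Int64')`, assuming every non-blank value is a count optionally followed by markers.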


### Filtering data

In [None]:
# Example: Find Aircraft made by Lockheed
lockheed = df.loc[df['manufacturer'].str.contains("Lockheed")]
lockheed
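
The filter above works here because missing values were already replaced with blanks. If the column can still hold missing values, or the casing of names varies, a slightly more defensive variant may help. The sketch below uses a small stand-in frame; the last row is hypothetical and exists only to show the missing-value case.

```python
import pandas as pd

df = pd.DataFrame({
    "aircraft": ["A-10C Thunderbolt II", "AC-130J Ghostrider",
                 "AC-130W Stinger II", "Hypothetical aircraft"],
    "manufacturer": ["Fairchild Republic", "Lockheed", "Lockheed", None],
})

# case=False ignores casing; na=False treats missing manufacturers as non-matches
mask = df["manufacturer"].str.contains("lockheed", case=False, na=False)
lockheed = df.loc[mask]
print(lockheed)
```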

### Sorting data by Date

In [None]:
# df.sort_values('introduced', ascending=True, inplace=True)
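
Because `introduced` has dtype `object`, sorting it directly can fail or give an unexpected order when the column mixes strings, numbers, and blanks. One option, sketched below on a stand-in frame, is to derive a numeric year column first. The `introduced_year` name and the last row are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "aircraft": ["AC-130J Ghostrider", "AC-130W Stinger II",
                 "B-2A Spirit", "Hypothetical aircraft"],
    "introduced": ["2017", 1966, "1997", ""],   # mixed types, as in an object column
})

# errors="coerce" turns anything that is not a number into NaN
df["introduced_year"] = pd.to_numeric(df["introduced"], errors="coerce")

# Oldest first; rows without a usable year go to the end
ordered = df.sort_values("introduced_year", ascending=True, na_position="last")
print(ordered)
```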

### Write the dataframe to Parquet format

After preprocessing the data in Python, we can save the dataframe in a format that can be loaded into a visualization tool (e.g., Microsoft Power BI).

- Q. Why Parquet and not CSV/Excel?
- A. Parquet stores column dtypes with the data; CSV and Excel do not
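
To make the dtype point concrete, the sketch below round-trips a small stand-in frame through CSV in memory and compares dtypes before and after. It only demonstrates the CSV side, since that needs nothing beyond pandas; a Parquet round trip with `to_parquet` / `read_parquet` (which needs `pyarrow` or `fastparquet` installed) would be expected to bring the `category` column back as a category.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "aircraft": pd.array(["B-2A Spirit", "MH-60 Seahawk"], dtype="string"),
    "propulsion": pd.Categorical(["Jet", "Helicopter"]),
})

# Write to an in-memory CSV and read it straight back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(df.dtypes)        # propulsion is a category here
print(restored.dtypes)  # after CSV, propulsion is no longer a category
```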



In [None]:
# df.to_parquet('aircraft.parquet', engine='fastparquet') # https://fastparquet.readthedocs.io/en/latest/
