# **Data Preprocessing**

## Objectives

* Build a small labelled aircraft dataset, explore and clean it, split it into stratified train and test sets, and standardise the numeric features for modelling.

## Inputs

* The Python packages listed in `requirements.txt`. The aircraft data is defined inline in the notebook, so no external data file is needed.

## Outputs

* `aircraft_data_scaled.pkl` with the scaled train and test features and their targets, plus CSV files saved under `outputs/`.

## Additional Comments


* In case you have any additional comments that don't fit in the previous bullets, please state them here.


---

# Install python packages in the notebooks

<span style="color:red;">IMPORTANT!!! If your GitHub/Gitpod workspace has a different name, replace "manned-unmanned-airplane-classifer" in the path below with that name.</span>

In [1]:
%pip install -r /workspace/manned-unmanned-airplane-classifer/requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


---

# Change working directory

* We assume the notebooks are stored in a subfolder, so when you run a notebook in the editor you need to change the working directory.

We need to change the working directory from its current folder to its parent folder.
* We access the current directory with `os.getcwd()`.

In [2]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/manned-unmanned-airplane-classifer/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* `os.path.dirname()` gets the parent directory.
* `os.chdir()` sets the new current directory.

In [3]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [4]:
current_dir = os.getcwd()
current_dir

'/workspace/manned-unmanned-airplane-classifer'
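Running the `os.chdir(os.path.dirname(current_dir))` cell a second time moves up one more level, out of the project folder. A minimal guard, assuming the notebooks live in a folder named `jupyter_notebooks` (as the output above shows), changes directory only when the current folder has that name; `move_to_project_root` is a hypothetical helper, not part of the template.

```python
import os


def move_to_project_root(notebook_dir_name: str = "jupyter_notebooks") -> str:
    """Move to the parent folder only if we are currently inside the notebooks folder.

    Calling this repeatedly is safe: once the working directory is no longer
    the notebooks folder, the function leaves it unchanged.
    """
    current = os.getcwd()
    if os.path.basename(current) == notebook_dir_name:
        os.chdir(os.path.dirname(current))
    return os.getcwd()
```

Calling `move_to_project_root()` in place of the unconditional `os.chdir` makes the cell safe to re-run.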

---

## Table of Contents

- [Biplanes vs Monoplanes from 1930s to early 1940s](#biplanes-vs-monoplanes-from-1930s-to-early-1940s)
- [Section 2](#section-2)
- [Save files to workspace](#save-files-to-workspace)


---

# Biplanes vs Monoplanes from 1930s to early 1940s

In [None]:
# 01_data_preprocessing.ipynb

# --- Step 1: Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# --- Step 2: Build the Dataset (20 aircraft defined inline)
data = {
    'Name': [
        'Bristol F.2', 'Sopwith Camel', 'SPAD S.XIII', 'Curtiss JN-4', 'Albatros D.V',
        'Nieuport 17', 'Fokker Dr.I', 'Airco DH.2', 'Rumpler C.I', 'Morane-Saulnier L',
        'P-51 Mustang', 'F-86 Sabre', 'MiG-15', 'Spitfire Mk IX', 'Messerschmitt Bf 109',
        'Hawker Hurricane', 'Yak-3', 'F4U Corsair', 'P-38 Lightning', 'Focke-Wulf Fw 190'
    ],
    'Manufacturer': [
        'Bristol', 'Sopwith', 'SPAD', 'Curtiss', 'Albatros',
        'Nieuport', 'Fokker', 'Airco', 'Rumpler', 'Morane-Saulnier',
        'North American', 'North American', 'Mikoyan-Gurevich', 'Supermarine', 'Messerschmitt',
        'Hawker', 'Yakovlev', 'Vought', 'Lockheed', 'Focke-Wulf'
    ],
    'FirstFlight': [
        1916, 1917, 1917, 1915, 1917,
        1916, 1917, 1915, 1915, 1915,
        1940, 1947, 1947, 1942, 1937,
        1935, 1944, 1940, 1939, 1939
    ],
    'Wingspan': [
        9.8, 10.0, 9.7, 11.4, 9.4, 7.9, 8.9, 9.5, 13.9, 9.8,
        11.3, 9.0, 12.2, 9.8, 10.6, 11.0, 10.7, 11.0, 10.5, 10.0
    ],
    'MaxSpeed': [
        414, 430, 441, 295, 195, 183, 175, 370, 222, 210,
        470, 525, 547, 470, 500, 460, 390, 510, 520, 592
    ],
    'Weight': [
        2300, 2200, 2300, 2800, 1700, 670, 830, 1650, 3500, 1220,
        2200, 2000, 2950, 2400, 2450, 2350, 2000, 2600, 2800, 2700
    ],
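    # Target: 0 for the first ten aircraft (first flights 1915-1917), 1 for the last ten (1935-1947),
    # so FirstFlight on its own already separates the two classes in this sample
    'Label': [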
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    ],
    'Notes': [
        '', 'Used in WW1', None, 'Trainer', 'Scout',
        '!', '  ', 'Experimental', 'Recon', ' ',
        'WW2 fighter', None, 'Korean War', 'iconic', 'fighter',
        None, 'lightweight', '', 'Long range', None
    ]
}
df = pd.DataFrame(data)

# --- Step 3: Explore Data
print(df.head())
df.info()  # info() prints its summary itself and returns None, so it is not wrapped in print()
print(df.describe())

# --- Step 4: Clean Data
# Drop the free-text Notes column: its values are inconsistent ('', whitespace, '!', None) and it is not used as a feature
df.drop(columns=['Notes'], inplace=True)

# Optional: Encode Manufacturer or keep for reference only
# df = pd.get_dummies(df, columns=['Manufacturer'], drop_first=True)

# --- Step 5: Feature Matrix and Target Vector
X = df[['Wingspan', 'MaxSpeed', 'Weight', 'FirstFlight']]
y = df['Label']

# --- Step 6: Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# --- Step 7: Standardize Features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

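# --- Step 8: Save for the next notebook
# Writes the tuple (X_train_scaled, X_test_scaled, y_train, y_test) to the current working directory
import joblib
joblib.dump((X_train_scaled, X_test_scaled, y_train, y_test), 'aircraft_data_scaled.pkl')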
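With 20 rows and `test_size=0.2`, `train_test_split` holds out 4 rows, and `stratify=y` keeps the 10/10 class balance, so each class should contribute 2 test rows. The sketch below checks this on placeholder features with the same shape as `X`; the feature values are dummy numbers, not the aircraft measurements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder features with the same shape as X (20 rows x 4 columns); dummy values only.
X_demo = pd.DataFrame(
    np.arange(80).reshape(20, 4),
    columns=["Wingspan", "MaxSpeed", "Weight", "FirstFlight"],
)
# Same class layout as the notebook's Label column: ten 0s followed by ten 1s.
y_demo = pd.Series([0] * 10 + [1] * 10, name="Label")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many rows of each class landed in each split.
train_counts = y_tr.value_counts().sort_index()
test_counts = y_te.value_counts().sort_index()
print("train:", train_counts.tolist())
print("test:", test_counts.tolist())
```

Running `y_train.value_counts()` and `y_test.value_counts()` in the notebook gives the same check on the real split.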

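The scaler is fitted on the training rows only and then reused on the test rows, so no test-set statistics leak into preprocessing. `StandardScaler` computes z = (x - mean_train) / std_train per column, using the population standard deviation (ddof=0). This sketch, on small made-up matrices, reproduces the transform by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up train/test matrices with two features, purely for illustration.
X_train_demo = np.array([[9.0, 400.0], [10.0, 450.0], [11.0, 500.0], [12.0, 550.0]])
X_test_demo = np.array([[9.5, 420.0], [13.0, 600.0]])

scaler_demo = StandardScaler()
X_train_z = scaler_demo.fit_transform(X_train_demo)  # learns mean_ and scale_ from training rows only
X_test_z = scaler_demo.transform(X_test_demo)        # reuses the training statistics

# Same transformation by hand: z = (x - train_mean) / train_std, with ddof=0.
train_mean = X_train_demo.mean(axis=0)
train_std = X_train_demo.std(axis=0)
manual_test_z = (X_test_demo - train_mean) / train_std

print(np.allclose(X_test_z, manual_test_z))  # prints True
```

Because the test rows use the training mean and standard deviation, `X_test_scaled` will generally not have a column mean of exactly 0; only the training columns are centred exactly.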

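The pickle written above contains the scaled arrays but not the fitted `scaler`, so a later notebook could not scale new aircraft with the same training statistics. One option, sketched here with illustrative values and a hypothetical file name `aircraft_scaler.pkl`, is to save the scaler with `joblib` as well.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training data; in the notebook this would be X_train.
X_train_demo = np.array([[9.8, 414.0], [11.3, 470.0], [9.0, 525.0]])
scaler_demo = StandardScaler().fit(X_train_demo)

# A temporary folder keeps the sketch self-contained; the file name is hypothetical.
out_dir = tempfile.mkdtemp()
scaler_path = os.path.join(out_dir, "aircraft_scaler.pkl")
joblib.dump(scaler_demo, scaler_path)

# A later notebook can reload the fitted scaler and apply the training statistics to new rows.
loaded_scaler = joblib.load(scaler_path)
new_rows = np.array([[10.0, 500.0]])
print(loaded_scaler.transform(new_rows))
```

In the notebook itself this would amount to dumping `scaler` next to `aircraft_data_scaled.pkl`.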
---

# Section 2


Section 2 content

---

# Save files to workspace

We will generate the following files:
* Train set
* Test set
* Data cleaning and Feature Engineering pipeline
* Modeling pipeline
* etc.
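The list names both a train set and a test set, while the cells below only write `X_train.csv` and `y_train.csv`. A minimal sketch of a helper that writes all four splits into one folder, creating it if needed; `save_splits` is a hypothetical name, not part of pandas or the template.

```python
from pathlib import Path

import pandas as pd


def save_splits(X_train, X_test, y_train, y_test, base_dir):
    """Write the four split objects as CSV files into base_dir and return the file names."""
    out = Path(base_dir)
    out.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    X_train.to_csv(out / "X_train.csv", index=False)
    X_test.to_csv(out / "X_test.csv", index=False)
    y_train.to_csv(out / "y_train.csv", index=False)
    y_test.to_csv(out / "y_test.csv", index=False)
    return sorted(p.name for p in out.glob("*.csv"))
```

For example, `save_splits(X_train, X_test, y_train, y_test, file_path)` once `file_path` has been defined.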

In [None]:
topic = 'topic'  # e.g. 'datasets'
notebook = 'notebook'  # e.g. 'collections'
version = 'v1'
file_path = f'outputs/{topic}/{notebook}/{version}'

# exist_ok=True means no error is raised if the folder already exists
os.makedirs(name=file_path, exist_ok=True)

In [None]:
import os

# Create the outputs/datasets/collection folder (no error if it already exists)
os.makedirs(name='outputs/datasets/collection', exist_ok=True)

# Save the cleaned aircraft DataFrame (Notes column already dropped)
df.to_csv("outputs/datasets/collection/aircraft_data.csv", index=False)

## Train Set

Note that ...

In [None]:
print(X_train.shape)
X_train.to_csv(f"{file_path}/X_train.csv", index=False)
X_train.head()  # last expression in the cell, so the notebook displays it

In [None]:
y_train

In [None]:
y_train.to_csv(f"{file_path}/y_train.csv", index=False)
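Because `y_train` is written with `index=False`, the row labels produced by the split are dropped, and a later notebook has to rely on the row order matching `X_train.csv` (which is also saved without its index). This sketch, using a small stand-in Series, shows how the file reads back as a Series rather than a one-column DataFrame.

```python
import os
import tempfile

import pandas as pd

# Stand-in for y_train: a named Series with a shuffled index, as after train_test_split.
y_demo = pd.Series([1, 0, 1, 0], index=[13, 2, 17, 5], name="Label")

path = os.path.join(tempfile.mkdtemp(), "y_train.csv")
y_demo.to_csv(path, index=False)  # writes a "Label" header line and no index column

# read_csv returns a one-column DataFrame; squeeze("columns") turns it back into a Series.
y_loaded = pd.read_csv(path).squeeze("columns")
print(type(y_loaded).__name__, y_loaded.name, y_loaded.tolist())  # Series Label [1, 0, 1, 0]
```

The reloaded Series keeps its values and name but gets a fresh 0-based index.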