<a target="_blank" href="https://colab.research.google.com/github/ZHAW-ZAV/TSO-FS25-students/blob/main/07_unsupervised_ml/LSZH_concepts.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Zurich airport has three main operating concepts:

**North Approach Concept**
- Approaches from north on runway 14 or 16 
- Departures heading west on runway 28 or heading south on runway 16
- Departures partially heading east on runway 10

General application:
- Monday to Friday from 07:00 - 21:00
- Saturday to Sunday from 09:00 - 20:00 and on public holidays in Baden-Württemberg

**East Approach Concept**
- Approaches from east on runway 28
- Departures heading north on runway 32 or 34

General application:
- Monday to Friday from 21:00 - 23:30
- Saturday to Sunday from 20:00 - 23:30 and on public holidays in Baden-Württemberg

**South Approach Concept**
- Approaches from south on runway 34
- Departures heading north on runway 32 or 34
- Departures partially heading west on runway 28

General application:
- Monday to Friday from 06:00 - 07:00
- Saturday to Sunday from 06:00 - 09:00 and on public holidays in Baden-Württemberg


The goal is to identify the operated concepts from ADS-B data by unsupervised clustering.
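
To compare clusters with the published schedule later on, the time windows listed above can be encoded as a small lookup. This is only a sketch: the function name `nominal_concept` is hypothetical, timestamps are assumed to be in local time, and public holidays in Baden-Württemberg are ignored.

```python
from datetime import time

import pandas as pd


def nominal_concept(ts):
    """Return the concept nominally in use at `ts` (hypothetical helper).

    Assumptions: `ts` is local time; public holidays are NOT handled.
    """
    ts = pd.Timestamp(ts)
    weekend = ts.dayofweek >= 5  # Saturday = 5, Sunday = 6
    t = ts.time()
    # Window boundaries taken from the "General application" lists
    south_end = time(9, 0) if weekend else time(7, 0)
    north_end = time(20, 0) if weekend else time(21, 0)
    if time(6, 0) <= t < south_end:
        return "south"
    if south_end <= t < north_end:
        return "north"
    if north_end <= t < time(23, 30):
        return "east"
    return None  # outside the listed windows


print(nominal_concept("2025-03-03 08:00"))  # a Monday morning
print(nominal_concept("2025-03-08 08:00"))  # a Saturday morning
```

The actual operation often deviates from this schedule (for example because of wind or fog), which is exactly what the clustering is meant to reveal.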

### Import of required libraries


In [25]:
import pandas as pd
import plotly.express as px
from sklearn.decomposition import PCA

### Import datasets

In [None]:
lnd_df_url = 'https://raw.githubusercontent.com/ZHAW-ZAV/TSO-FS25-students/refs/heads/main/07_unsupervised_ml/lszh_lnd_df.csv'
lnd_df = pd.read_csv(lnd_df_url)
lnd_df['time'] = pd.to_datetime(lnd_df['time'])
lnd_df

In [None]:
toff_df_url = 'https://raw.githubusercontent.com/ZHAW-ZAV/TSO-FS25-students/refs/heads/main/07_unsupervised_ml/lszh_toff_df.csv'
toff_df = pd.read_csv(toff_df_url)
toff_df['time'] = pd.to_datetime(toff_df['time'])
toff_df

---


### Preparation of the dataset

We want to create a dataset that for each hour contains the number of takeoffs and landings per runway.
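
A minimal sketch of this aggregation on a hand-made toy table (the four rows below are invented for illustration and are not taken from the dataset): `pd.Grouper` bins the timestamps by hour, `size()` counts rows per (hour, runway) pair, and `unstack` turns the runways into columns.

```python
import pandas as pd

# Toy movements: three takeoffs in the 08:00 hour, one in the 09:00 hour
toy = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2025-03-03 08:05",
                "2025-03-03 08:40",
                "2025-03-03 08:55",
                "2025-03-03 09:10",
            ]
        ),
        "rwy": [28, 28, 16, 28],
    }
)

# Count movements per (hour, runway), then pivot runways into columns
counts = (
    toy.groupby([pd.Grouper(key="time", freq="h"), "rwy"])
    .size()
    .unstack(fill_value=0)  # hours without traffic on a runway get 0
    .add_prefix("toff_")    # 16 -> "toff_16", 28 -> "toff_28"
)
print(counts)
```

`add_prefix` is used here as a compact alternative to an explicit `rename` mapping; both give the same column names.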


Count the takeoffs per hour and runway and build an overview dataframe


In [None]:
grouped = toff_df.groupby([pd.Grouper(key="time", freq="h"), "rwy"]).size()
grouped

In [None]:
# Pivot to get one column per runway
toff_df = grouped.unstack(fill_value=0)
# Rename columns
toff_df = toff_df.rename(
    columns={
        14: "toff_14",
        32: "toff_32",
        16: "toff_16",
        34: "toff_34",
        10: "toff_10",
        28: "toff_28",
    }
)
toff_df

In [None]:
# same for landings
grouped = lnd_df.groupby([pd.Grouper(key="time", freq="h"), "rwy"]).size()
lnd_df = grouped.unstack(fill_value=0)
lnd_df = lnd_df.rename(
    columns={
        14: "lnd_14",
        32: "lnd_32",
        16: "lnd_16",
        34: "lnd_34",
        10: "lnd_10",
        28: "lnd_28",
    }
)
lnd_df


In [None]:
# let's merge the two dataframes
df = pd.concat([toff_df, lnd_df], axis=1)
# Let's filter a bit by keeping only rows (hours) with at least 5 takeoffs and at least 5 landings
df['total_takeoffs'] = df.filter(like='toff').sum(axis=1)
df['total_landings'] = df.filter(like='lnd').sum(axis=1)
df = df.query('total_takeoffs >= 5 and total_landings >= 5').dropna()

df['main_lnd_rwy'] = df.filter(like='lnd').idxmax(axis=1)
df['main_toff_rwy'] = df.filter(like='toff').idxmax(axis=1)

df



In [None]:
# instead of standardizing, we compute the share (a fraction between 0 and 1) of takeoffs and landings per runway and add it as new columns
df['pct_toff_14'] = df['toff_14'] / df['total_takeoffs']
df['pct_toff_16'] = df['toff_16'] / df['total_takeoffs']
df['pct_toff_28'] = df['toff_28'] / df['total_takeoffs']
df['pct_toff_32'] = df['toff_32'] / df['total_takeoffs']
df['pct_toff_34'] = df['toff_34'] / df['total_takeoffs']
df['pct_lnd_14'] = df['lnd_14'] / df['total_landings']
df['pct_lnd_16'] = df['lnd_16'] / df['total_landings']
df['pct_lnd_28'] = df['lnd_28'] / df['total_landings']
df['pct_lnd_32'] = df['lnd_32'] / df['total_landings']
df['pct_lnd_34'] = df['lnd_34'] / df['total_landings']

df
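
The per-runway shares can also be computed in one step by dividing each row by its row sum. The sketch below uses an invented two-row table, not the real data, to show that the resulting shares add up to 1 per hour.

```python
import pandas as pd

# Toy landing counts for two hours (values invented for illustration)
toy = pd.DataFrame({"lnd_14": [8, 0], "lnd_28": [2, 5], "lnd_34": [0, 15]})

# Divide every row by its total; axis=0 aligns the totals with the row index
shares = toy.div(toy.sum(axis=1), axis=0).add_prefix("pct_")
print(shares)
print(shares.sum(axis=1))  # each row sums to 1
```

Working with shares instead of raw counts makes a busy hour and a quiet hour comparable as long as the runway usage pattern is the same.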

In [None]:
## PCA for visualization
# we will use a PCA on the pct columns
df_pca = df.copy().filter(like='pct')
pca = PCA(n_components=2)
df_pca = pd.DataFrame(pca.fit_transform(df_pca), columns=['PC1', 'PC2'])
# .show() is needed here: a notebook only renders the last expression of a cell
px.scatter(df_pca, x='PC1', y='PC2').show()

# in 3D
df_pca = df.copy().filter(like='pct')
pca = PCA(n_components=3)
df_pca = pd.DataFrame(pca.fit_transform(df_pca), columns=['PC1', 'PC2', 'PC3'])
px.scatter_3d(df_pca, x='PC1', y='PC2', z='PC3')
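
Before reading too much into the 2D or 3D scatter, it can help to check how much of the variance the kept components capture. A minimal sketch on synthetic share-like data (random rows that sum to 1, generated here purely for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the pct columns: 200 rows of 5 shares summing to 1
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=200)

pca = PCA(n_components=3).fit(X)
explained = pca.explained_variance_ratio_  # variance share per component
cumulative = explained.cumsum()            # variance share kept so far

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {e:.2%} (cumulative {c:.2%})")
```

In the notebook, the same attribute is available on the fitted `pca` object from the cell above. A low cumulative value would mean that points which look close in the plot may still be far apart in the original feature space.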


**Task**: Play with the parameters of the DBSCAN algorithm (`eps` and `min_samples`) to find the best clustering and identify one anomaly (e.g. lots of landings on runway 16)

In [None]:
# apply DBSCAN
from sklearn.cluster import DBSCAN
# replace each X with your own value (eps: float > 0, min_samples: positive int)
dbscan = DBSCAN(eps=X, min_samples=X)
cluster_labels = dbscan.fit_predict(df.filter(like='pct')).astype(str)
df_pca['cluster'] = cluster_labels
# px.scatter(df_pca, x='PC1', y='PC2', color='cluster')
px.scatter_3d(df_pca, x='PC1', y='PC2', z='PC3', color='cluster')
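
To see how DBSCAN flags anomalies, here is a minimal sketch on synthetic 2D data: two tight blobs plus one isolated point. The values `eps=0.2` and `min_samples=5` are chosen for this toy data only and are not a suggested answer to the task.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic data: two dense groups and a single far-away point
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.02, size=(50, 2))
blob_b = rng.normal(loc=[1.0, 1.0], scale=0.02, size=(50, 2))
outlier = np.array([[0.5, 3.0]])
X = np.vstack([blob_a, blob_b, outlier])

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

n_clusters = len(set(labels) - {-1})      # -1 is the noise label, not a cluster
noise_idx = np.flatnonzero(labels == -1)  # positions of points flagged as noise
print(n_clusters, noise_idx)
```

Points that do not belong to any dense region receive the label `-1`. In the notebook these show up as cluster `'-1'` because the labels are cast to strings, so that is where to look for anomalous hours.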



In [None]:
df['cluster'] = df_pca['cluster'].values

px.scatter(
    df,
    y=['main_lnd_rwy', 'main_toff_rwy'],
    color='cluster',
    hover_data=df.filter(like='pct').columns.tolist() + ['total_landings', 'total_takeoffs'],
)
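
Besides the scatter plot, a contingency table of cluster label against dominant runway gives a compact view of what each cluster represents. A sketch with invented labels (the five rows below are not real results):

```python
import pandas as pd

# Invented example: cluster labels and dominant landing runway per hour
toy = pd.DataFrame(
    {
        "cluster": ["0", "0", "1", "1", "-1"],
        "main_lnd_rwy": ["lnd_14", "lnd_14", "lnd_28", "lnd_34", "lnd_16"],
    }
)

# Rows: clusters, columns: runways, cells: number of hours
table = pd.crosstab(toy["cluster"], toy["main_lnd_rwy"])
print(table)
```

Applied to the notebook's `df`, a cluster whose hours are concentrated on a single landing runway would be a candidate for one of the three operating concepts.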



### K-means


**Task**: Play with the number of clusters and visualize the results in 2D or 3D with plotly


In [None]:
# k-means; n_clusters is set to 6 here, try other values
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=6, random_state=42)
kmeans.fit(df.filter(like='pct'))
df_pca['cluster'] = kmeans.labels_.astype(str)
px.scatter_3d(df_pca, x='PC1', y='PC2', z='PC3', color='cluster')
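
One common heuristic for comparing different numbers of clusters is the silhouette score (higher is better). The sketch below applies it to synthetic data with three well-separated groups; it is one possible selection criterion, not the method prescribed by this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three compact groups around known centers
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = np.vstack([rng.normal(c, 0.05, size=(60, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On the airport data, `X` would be `df.filter(like='pct')`. The score is only a guide: a value of k that separates operationally meaningful runway configurations may be preferable even if its silhouette score is slightly lower.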


In [None]:
df['cluster'] = df_pca['cluster'].values.astype(str)

px.scatter(
    df,
    y=['main_lnd_rwy', 'main_toff_rwy'],
    color='cluster',
    hover_data=df.filter(like='pct').columns.tolist() + ['total_landings', 'total_takeoffs'],
)
