# Using Deep Learning to Classify Airliner Flight Profiles for Post-Flight Analysis

> Capstone Project for Master of Science in Data Science (University of Wisconsin)
- toc: true
- branch: master
- badges: false
- comments: true
- hide: true
- search_exclude: true

This project suggests an automatic way for post-flight analysts to undertake classification of flight profiles into useful versus non-useful classes. Instead of using the traditional algorithms for time-series classification, this work makes use of a relatively new approach: Before classifying, first transform a time-series into an image. This allows for the application of a well-developed set of algorithms from the area of computer vision. In this project, we perform a comparison of a number of these transformation techniques in terms of their associated image classification performance. We apply each transformation technique to the time-series dataset in turn, train a Convolutional Neural Network to do classification, and record the performance. Then we select the most performant transformation technique (a simple line plot that got a 100% F1-score) and use it in the rest of the analysis pipeline.

The pipeline consists of three models. The first model classifies flight profiles into developed (useful) and non-developed (non-useful) profiles. The second model performs multi-label classification on the developed profiles from the first model. The labels reflect whether a profile has canonical climb/cruise/descent segments. The last model classifies flight profiles with canonical cruise segments into classes that have extended cruises (useful) and shorter cruises (non-useful).

Next, we prepare a significant unlabeled test dataset, consisting of data points that have never been seen by any of the models. We construct an end-to-end analytic inference process to simulate a production system, apply it to the test dataset, and obtain impressive results. Finally, we make recommendations to post-flight and other interested analysts.

*Keywords*: Deep learning, Time series, Image Classification, CNN, RNN, Flight path

# 1 BACKGROUND

One of the important operational characteristics of a commercial airliner is its flight path. This term needs qualification though. The *lateral* flight path is the track projected onto a flat earth from above. To visualize the lateral flight path one can plot longitude versus latitude as flight time progresses. The *vertical* flight path, on the other hand, is the altitude profile (viewed from the side). This may be visualized by plotting altitude versus flight time. This project will focus on the vertical flight path. When we use the term flight path or flight profile in the rest of the document, we will always refer to the *vertical* flight path.

During normal operation, an airliner has a predictable flight path. After *takeoff*, it climbs as quickly as possible (during the phase known as *climb*) until it reaches a point called the *top of climb*. The pilot then levels off and usually maintains this altitude for most of the flight (straight-and-level flight). This phase of the flight is known as *cruise*. When nearing its destination, a point is reached that is known as *top of descent*. At this point the pilot enters the *descent* phase. Finally, the flight ends during the *landing* of the aircraft.

Post-flight data analysts are often interested in separating useful from non-useful (or less useful) flight paths prior to a specific analysis. In this document, and its associated analysis code, useful profiles will be labeled as *typical* (abbreviated as “typ”) while less useful, non-useful, or anomalous flight paths (for a specific analysis) will be labeled as *non-typical* (abbreviated as “non”). A *typical* flight profile, in the context of this document, has a relatively extended cruise section without changes in altitude (see Figure 1.1). This characteristic will make it useful for certain types of analyses, for example, to estimate hard-to-measure variables like drag and exact angle-of-attack, as well as estimations of the positions of vertical flight control surfaces (even though they are measured). Flight paths could also be considered *non-typical* due to insignificant (i.e. too short) cruise segments and missing data (see Figure 1.2). Note that our definition of usefulness is by no means universal in post-flight analysis.

![Figure 1.1 Typical and Non-typical vertical flight path examples (the non-typical path on the right has steps during cruise)](../images/fig1-1.png "Figure 1.1 Typical and Non-typical vertical flight path examples (the non-typical path on the right has steps during cruise)")

A large airline can operate thousands of flights a day and it is not feasible for the analyst to do this separation/classification in a manual way. What comes to mind next is to construct an algorithm to take on the task. However, it is not straightforward to come up with a traditional algorithm that would discriminate between typical and non-typical flight paths. A promising approach, of course, is to use supervised machine learning and show the model a large enough number of training examples.

![Figure 1.2 Non-typical vertical flight path examples due to insignificant cruise section (left) and missing data (right)](../images/fig1-2.png "Figure 1.2 Non-typical vertical flight path examples due to insignificant cruise section (left) and missing data (right)")

The *predictor points* for this problem are not structured vectors as is common in the case of structured data analysis. Here we have to use a time-series or sequence of scalar-valued predictor points and have the model learn the associated *target point* which is a single categorical “scalar” in each case. The values of the target points will be either *typical* (“typ”) or *non-typical* (“non”). We therefore have a classification problem: Given a flight path (as a scalar-valued time-series), the classifier should choose between typical and non-typical (scalar-valued and categorical).

In the deep learning subfield, it is common to use a *Recurrent Neural Network* (RNN) for this kind of problem. See, for example, Hüsken and Stagge (2003), and also Sun, Di, and Fang (2019). However, the training of an RNN can be challenging due to high demands on computing resources including processing power, processing time, and memory. There is also the vanishing/exploding gradients problem, which has been addressed in various ways but often still lurks in the background.

Consider how easy it is for the human visual system to handle this problem, and in a fraction of a second. In fact, this is exactly how analysts often do their classifications manually. This suggests that we might benefit from tapping into the biological mechanisms for analyzing visual data (i.e. images). Recently, some researchers started adopting this insight. See, for example, Wang and Oates (2015a). The essence of this line of thought is the following: Instead of analyzing a sequence of 1-D or scalar-valued *temporal* data points, we transform them into a single 2-D or matrix-valued *spatial* data point. The spatial data point is simply an image which means the time-series signal has been transformed into an image (see Figure 1.3 for an example of a transformation technique). This allows for the application of a large body of relatively well-developed computer vision techniques to the above-stated problem. Most of these techniques center around the *Convolutional Neural Network* (CNN). In summary, the *time-series classification* problem has been converted to an *image classification* problem.

![Figure 1.3 Example of Gramian Angular Summation Field (GASF) transformation of a time-series](../images/fig1-3.png "Figure 1.3 Example of Gramian Angular Summation Field (GASF) transformation of a time-series")
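
To make the idea of turning a 1-D signal into a 2-D image concrete, the following is a minimal NumPy sketch of the GASF transformation, following the usual definition from the literature: rescale the series to [-1, 1], map each value to an angle with arccos, and take the cosine of every pairwise angle sum. The function name `gasf` and the altitude values are hypothetical and are not taken from the project notebooks.

```python
import numpy as np


def gasf(series):
    """Gramian Angular Summation Field of a 1-D series (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    x = 2.0 * (x - lo) / (hi - lo) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against round-off outside the interval
    phi = np.arccos(x)  # polar-coordinate angle of each sample
    # Entry (i, j) is cos(phi_i + phi_j); the result is a symmetric n x n image.
    return np.cos(phi[:, None] + phi[None, :])


# Hypothetical altitude samples (feet): climb, cruise, descent.
altitude = np.array([0, 10000, 30000, 35000, 35000, 35000, 20000, 5000, 0])
image = gasf(altitude)
```

A series of length n yields an n x n matrix, which can then be saved or rendered as a single-channel image.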

This comparative case study will not attempt to compare the difference between using RNNs and CNNs to solve the flight path classification problem. Instead, it will compare the impact of a number of transformation techniques (to transform the time-series into an image) on the image classification performance. After application of each transformation technique to the training dataset of flight path time-series, a CNN will be trained which will serve as a classifier. The performance of the classifiers will be compared. *Transfer learning* will be used to speed up the training of the CNNs.

## 1.1 VALUE PROPOSITION

This project seeks to provide value in a number of ways:
* Demonstrates how flight profile time-series can be turned into images for more effective classification
* Identifies the best transformation technique for flight path time-series
* Reduces the need for (or does away with) hand classification of flight profiles. This will save significant amounts of time for post-flight analysts.
* Provides an analytic process that can be adopted as a tool by analysts. They can then implement the analytical process in their own preferred technology environment.
* Demonstrates how transfer learning greatly reduces the time needed to train a CNN for the classification of profiles. This should encourage analysts who might still be skeptical about the use of deep learning for everyday tasks, and save them even more time.
* Demonstrates how post-flight analysis can be undertaken by ordinary analysts. This data is usually considered sensitive by airlines and is not published. A publicly available de-identified source of flight data is used and the project demonstrates how this provides a valuable opportunity for analysts.
* Encourages data scientists to undertake post-flight analyses. This is especially needed in the area of airline safety. In addition, and when allowed by airline policies and pilot unions, post-flight analysis can be a valuable tool in the performance evaluation of pilots. This can have a positive impact on the profitability of an airline.
* Satisfies my personal interest in the analysis of flight data as well as the application of cutting edge analysis techniques in the form of deep learning.

## 1.2 OBJECTIVES

To set up an analytics pipeline for analysts, the foremost objective is to find the best transformation technique to convert flight path time-series into images. The remaining objectives provide the components for the construction of the pipeline.

### 1.2.1 Transformation Techniques

We will perform a comparison of a number of transformation techniques in terms of their associated image classification performance. To do this we will apply each transformation technique to the cleaned time-series dataset in turn, train a CNN to do classification (using supervised learning), and record the performance. Then we will select the most performant transformation technique and use this technique in the rest of the analysis pipeline. The following transformation techniques will be considered:
* Altitude line plots transformed into an image
* Altitude area plots transformed into an image
* Gramian Angular Summation Field (GASF)
* Gramian Angular Difference Field (GADF)
* Markov Transition Field (MTF)
* Recurrence Plot (RP)
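
As a rough illustration of two of the listed techniques, the sketch below computes a thresholded Recurrence Plot and a Markov Transition Field with plain NumPy, using their common textbook definitions. The function names, the threshold `eps`, the number of quantile bins, and the sample series are all assumptions made for this example; the project itself may rely on different settings or on a library implementation.

```python
import numpy as np


def recurrence_plot(series, eps):
    """Binary recurrence plot: 1 where two samples are within eps of each other."""
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])  # pairwise distances
    return (dist <= eps).astype(np.uint8)


def markov_transition_field(series, n_bins=4):
    """Markov Transition Field based on quantile bins (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    # Interior quantile edges split the value range into n_bins bins.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    q = np.searchsorted(edges, x, side="right")  # bin index of each sample
    # First-order transition counts between the bins of consecutive samples.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Entry (i, j) is the transition probability from the bin of x_i to the bin of x_j.
    return W[q[:, None], q[None, :]]


# Hypothetical altitude samples in thousands of feet.
altitude_kft = [0, 5, 15, 30, 35, 35, 35, 35, 20, 8, 0]
rp = recurrence_plot(altitude_kft, eps=1.0)
mtf = markov_transition_field(altitude_kft, n_bins=4)
```

Both functions return an n x n matrix for a series of length n, so the resulting images can be fed to the same CNN as the other transformations.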

### 1.2.2 Developed/Non-developed Model

The first model in the analytics pipeline will classify flight profiles into developed (useful) and non-developed (non-useful) profiles. We will also consider the use of *anomaly detection* by means of an *auto-encoder* (instead of a classification algorithm) due to a significant class imbalance.

### 1.2.3 Canonical Segments Model

The next model in the pipeline will perform multi-label classification of the developed profiles. The labels used here will reflect whether a profile has *canonical* climb, cruise, and descent segments. In this context, canonical means relatively smooth.
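
In a multi-label setting each profile can carry any subset of the three labels, so the target is a multi-hot vector rather than a single class. The sketch below shows one common way to encode and decode such targets, assuming an independent sigmoid per label and a 0.5 threshold. The label names, helper functions, and threshold are illustrative assumptions, not the project's actual code.

```python
import numpy as np

# Hypothetical label order used for the multi-hot target vector.
LABELS = ("climb", "cruise", "descent")


def encode(tags):
    """Turn a set of label names into a multi-hot target vector."""
    return np.array([1.0 if name in tags else 0.0 for name in LABELS])


def decode(logits, threshold=0.5):
    """Apply a per-label sigmoid and keep every label at or above the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return {name for name, p in zip(LABELS, probs) if p >= threshold}


# A profile with canonical climb and descent but a non-canonical cruise.
target = encode({"climb", "descent"})
# Made-up raw model outputs (logits) for one profile.
predicted = decode([2.0, -3.0, 0.5])
```

Unlike a softmax over mutually exclusive classes, the per-label sigmoid allows zero, one, or several labels to be active for the same profile.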

### 1.2.4 Extended/Short Cruises Model

The final model in the pipeline will classify flight profiles with canonical *cruise* segments (regardless of the properties of climb or descent segments) into profiles that have extended cruises (useful) and shorter cruises (non-useful).

### 1.2.5 End-to-end Inference

The final objective will be to prepare a significant *test* dataset, consisting of data points that have never been seen by any of the models. We will construct an end-to-end inference process to simulate a production system and apply it to the test dataset. Then we will make recommendations to post-flight analysts.
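
The way the three models chain together at inference time can be sketched as a simple cascade. In the example below the three classifiers are passed in as plain callables; the stand-in lambdas and the dictionary-based "profile" are placeholders invented for illustration, whereas in the real pipeline each callable would wrap a trained CNN operating on an image.

```python
def run_pipeline(profile, is_developed, canonical_segments, has_extended_cruise):
    """Cascade of the three models; each later stage only sees what passed before."""
    # Stage 1: developed versus non-developed profiles.
    if not is_developed(profile):
        return "non-developed"
    # Stage 2: multi-label model; only a canonical cruise segment is required.
    if "cruise" not in canonical_segments(profile):
        return "no canonical cruise"
    # Stage 3: extended versus short cruise.
    return "extended cruise" if has_extended_cruise(profile) else "short cruise"


# Hypothetical stand-in classifiers that read pre-computed fields from a dict.
stage1 = lambda p: p["developed"]
stage2 = lambda p: p["segments"]
stage3 = lambda p: p["cruise_minutes"] >= 60  # the 60-minute cut-off is made up

example = {"developed": True, "segments": {"climb", "cruise"}, "cruise_minutes": 95}
result = run_pipeline(example, stage1, stage2, stage3)
```

Passing the models in as arguments keeps the cascade logic independent of the deep learning framework used to train them.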

# 2 DATA SOURCE

At any moment, there is an average of about 10,000 airplanes in the sky carrying more than a million passengers. Hundreds of variables are usually monitored during a flight which often has a duration of a number of hours. Many of these variables are sampled at a rate of once per second or more frequently. A huge volume of data is generated during a typical flight. This suggests that the analysis of flight data should be of some importance. Moreover, it seems reasonable that flight data should be easily accessible. This is not always the case, however.

Flight data directly reveals how an airline operates its core business and how efficiently pilots perform their duties. This data is considered sensitive. Some of the collected flight data, however, is so basic that it is, in fact, publicly available. Examples are datapoints that contain altitude, latitude, longitude, and heading. This information is considered to be public in the interest of the safe operation of all aircraft.

## 2.1 SOURCES OF FLIGHT DATA

The gradual adoption of *Automatic Dependent Surveillance – Broadcast* (ADS–B) by airlines is leading to the wide availability of flight data in the public domain. Wikipedia gives a good overview of this technology ("Automatic dependent surveillance – broadcast," n.d.). The ADS-B technology allows an aircraft to use satellite navigation to find its position. It then broadcasts this information periodically which enables ground stations to track it. This method is used as a replacement for secondary surveillance radar (SSR) and does not depend on an interrogation signal from the ground. The data that is broadcast can also update the situational awareness of other aircraft in the area.

### 2.1.1 Publicly available flight data

The increasing use of ADS-B has led to many flight tracking sites that publish basic flight data for consumption by the public. See "This Is How Flight Tracking Sites Work" (Rabinowitz, 2017). This data is relatively superficial and usually consists of a dozen or so measured quantities. Some of the more prominent players are:
* ADS-B Exchange at https://www.adsbexchange.com/  
* OPENSKY at https://opensky-network.org/ 
* FlightAware at https://flightaware.com/ 
* ADSBHub at http://www.adsbhub.org/ 
* planefinder at https://planefinder.net/ 
* Aireon at https://aireon.com/ 
* flightradar24 at https://www.flightradar24.com/
* RadarBox at https://www.radarbox24.com/ 

Even though these sources provide the altitude data needed for the analyses described in this document, we chose to not use any of them. Follow-up analyses often require more in-depth flight data that is not provided by any of the ADS-B sources. We also want to provide an example of how to use a substantial flight data source consisting of in-depth data.

### 2.1.2 In-depth flight data

Detailed, in-depth flight data is generally unavailable to the public. There are a few sources that make de-identified data available, but their usefulness varies. A few sources are:
* DASHlink at https://c3.nasa.gov/dashlink/ 
* IATA at https://www.iata.org/services/statistics/gadm/Pages/fdx.aspx 
* Data.gov at https://www.data.gov/ with a search term of “ads-b”

### 2.1.3 Selected data source

We selected the DASHlink source. The data is accessible from https://c3.nasa.gov/dashlink/projects/85/.  After clicking on “35 Datasets,” we used the data for “Tail 687.” This is a large amount of data (2,395.4 MB in zipped format) from which we sub-selected the first three datasets: Tail_687_1.zip, Tail_687_2.zip, and Tail_687_3.zip. The data can be downloaded from [Sample Flight Data](https://c3.nasa.gov/dashlink/resources/664/). There are 186 measured quantities (features) in the data.

# 3 DATA PREPARATION

The preparation of data involves conversion, cleaning, resampling, and the transformation of time-series data to images.

## 3.1 EXPLORATION OF DATA STRUCTURE

The structure of the raw data files is somewhat complicated. It is in MATLAB format and different variables were sampled at different sample rates. For familiarization, a thorough exploration of the structure of the raw data is provided in the notebook:

[10_mat2csv.ipynb](https://nbviewer.jupyter.org/github/kobus78/dashlink/blob/master/10_mat2csv.ipynb).

The actual preparation of the data was divided into two procedures. The first takes care of general conversion and cleaning tasks. The second undertakes the transformation of the time-series in each file to its associated spatial signal or image. Each procedure occurs in its own notebook.

## 3.2 CLEANING PREPARATION PROCEDURE

The source data is in MATLAB format (.mat) after downloading and unzipping. The raw data acquired for this project were as follows:
* For training, including validation (data will be labeled)
    * Tail_687_1.zip (651 flights)
    * Tail_687_2.zip (602 flights)
* For testing, i.e. simulation of production (data will not be labeled)
    * Tail_687_3.zip (582 flights)

After downloading and unzipping, the files in each folder were converted separately (from .mat to .csv) by means of the notebook:

[10_mat2csv-2.ipynb](https://nbviewer.jupyter.org/github/kobus78/dashlink/blob/master/10_mat2csv-2.ipynb)

The output .csv files were eventually moved to either the Train (1,253 files) or Test (582 files) folders.

The cleaning preparation procedure can be summarized as follows:
* Conversion:
    * Using the scipy.io Python package, the data is converted from MATLAB format to a simple .csv format.
* Make a dataframe for each sample rate:
    * All datapoints for a specific sample rate are collected in a dataframe. The rates available are referred to as 0.25, 1, 2, 4, 8, and 16.
    * All dataframes are combined into a single dataframe.
* Remove invalid time values:
    * Files with invalid values for year, month, day, hour, minute, or second are removed in this step.
* Output a csv file from the dataframe
* Build date-time index
    * Being a time-series, it is important to index the data in the form of a date-time index. This is done by reading the exported file back into a dataframe.
* Down-sample to 1 minute rate:
    * The data in each file’s dataframe is down-sampled from each variable’s specific sample rate to a 1 minute rate. This reduces the volume of data and provides a sample rate better suited to the purposes of this study.
    * Another csv file is exported using the same name but having a “-1min” appended to the name.
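
The date-time indexing and down-sampling steps above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the column names (`year` through `second`, `ALT`), the helper name `downsample_to_minute`, and the use of the mean as the aggregation are choices made for this example, and the notebook may do these steps differently.

```python
import numpy as np
import pandas as pd


def downsample_to_minute(df):
    """Build a date-time index from the time columns and resample to 1 minute."""
    parts = df[["year", "month", "day", "hour", "minute", "second"]]
    # Invalid combinations (for example month 13) become NaT instead of raising.
    stamps = pd.to_datetime(parts, errors="coerce")
    if stamps.isna().any():
        return None  # the caller would skip a file with invalid time values
    indexed = df.drop(columns=list(parts.columns)).set_index(pd.DatetimeIndex(stamps))
    # Assumption: each 1-minute bucket is summarized by its mean.
    return indexed.resample("1min").mean()


# Hypothetical flight fragment: three minutes of data sampled once per second.
seconds = np.arange(180)
raw = pd.DataFrame({
    "year": 2001, "month": 1, "day": 1, "hour": 10,
    "minute": seconds // 60, "second": seconds % 60,
    "ALT": seconds * 10.0,
})
minute_df = downsample_to_minute(raw)
```

Here 180 one-second rows collapse into three 1-minute rows, which could then be written out with `to_csv` under the “-1min” file name.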

## 3.3 TRANSFORMATION PREPARATION PROCEDURE

The transformation procedure occurs in the notebook:

[10_csv2png-3.ipynb](https://nbviewer.jupyter.org/github/kobus78/dashlink/blob/master/10_csv2png-3.ipynb)

The transformation preparation procedure can be summarized as follows:
* Read the csv data into a dataframe
* Select the transformation technique
    * Done by uncommenting the appropriate section of code. Please see section 4.1.3 for a description of each transformation technique.
* Plot the time-series signal
    * To convert the time-series signal to a spatial signal it is plotted as a graphic.
    * The graphic is stripped of all annotations, e.g. the frame, tick marks, tick labels, axis labels, and heading.
* Save the image, using the same name but having an extension of .png.
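
For the simple line-plot technique, the plot, strip, and save steps above might look like the following Matplotlib sketch. The function name, the 224-pixel image size, the line styling, and the output file name are assumptions for illustration only and are not taken from the notebook.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_line_image(values, path, size_px=224, dpi=100):
    """Plot a series as a bare line and save it as a square PNG."""
    fig = plt.figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # let the axes fill the whole canvas
    ax.plot(values, color="black", linewidth=1)
    ax.set_axis_off()  # removes the frame, tick marks, tick labels, and axis labels
    fig.savefig(path, dpi=dpi)
    plt.close(fig)  # free the figure when looping over many files


# Hypothetical 1-minute altitude samples (feet).
altitude = np.array([0, 8000, 20000, 33000, 35000, 35000, 35000, 24000, 9000, 0])
out_path = os.path.join(tempfile.mkdtemp(), "flight_0001.png")
save_line_image(altitude, out_path)
```

Closing each figure after saving matters in practice, because the procedure is repeated for every flight file.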

# 4 MODELING

In this section, we will look at the important concept of time-series classification and how it relates to two of the most important deep learning architectures: Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Then we will discuss the classification models for our pipeline in detail, making use of image-transformed flight profile time-series and the CNN architecture.

## 4.1 TIME-SERIES CLASSIFICATION

Time-series classification (TSC) is an area of active research. Some consider it one of the most challenging problems in data mining (Esling & Agon, 2012). This opinion is supported by Yang and Wu (2006). A large number of techniques have been invented. Many of these approaches are covered by Bagnall, Lines, Bostrom, Large, and Keogh (2017). The most promising approach, they point out, is known as COTE (Collective Of Transformation-based Ensembles) as described by Bagnall, Lines, Hills, and Bostrom (2016). HIVE-COTE is an extension of COTE (Lines, Taylor, & Bagnall, 2016). See also Lines, Taylor, and Bagnall (2018). The extension is in the form of a Hierarchical Vote system. This is considered the state-of-the-art currently. To use HIVE-COTE a large number of classifiers (37) need to be trained. The decisions made by them are not easy to interpret and classification time is excessive.

Given the impressive successes of deep learning in many disciplines lately, the use of it has started to make inroads into the area of time-series classification (Wang, Yan, & Oates, 2017). In their recent paper, “Deep Learning for Time Series Classification: A Review,” Fawaz, Forestier, Weber, Idoumghar, and Muller (2019) point out that they achieved results that are not significantly different from results obtained from HIVE-COTE by making use of deep learning and a residual network. They also provide a handy taxonomy (p. 11) for the use of deep learning algorithms to classify time-series (somewhat abbreviated here):
* Deep Learning for Time Series Classification
    * Generative Models
        * Auto Encoders
            * RNNs
        * Echo State Networks (simplified RNNs)
    * Discriminative Models
        * Feature Engineering
            * Image Transformation
            * Domain Specific
        * End-to-End
            * Multi-Layer Perceptrons (aka fully-connected or FC networks)
            * CNNs
            * Hybrid

The main division is between generative and discriminative models. Generative models generally include an unsupervised training step before the learner fits its classifier. A discriminative model, on the other hand, directly fits the mapping from the raw input of a time-series to the probability distribution over the classification classes.

The literature informally agrees that discriminative models are more accurate than generative models. In this report, we will focus on the Image Transformation leaf of this tree, which falls under discriminative models.

There are significant advantages in the use of deep learning to classify time-series. One specific advantage is the ability to detect time invariant characteristics. This is similar to how spatially invariant filters detect patterns in images.

### 4.1.1 Recurrent Neural Networks (RNNs)

In a fairly old paper, Hüsken and Stagge (2003) promote the use of RNNs for time-series classification. Recurrent layers are described by the equations:

$$ \Large
\begin{align}
\mathbf{a}^{<t>} &= g(\mathbf{W}_{aa} \mathbf{a}^{<t-1>} + \mathbf{W}_{ax} \mathbf{x}^{<t>} + \mathbf{b}_a) \\
\hat{\mathbf{y}}^{<t>}  &= g(\mathbf{W}_{ya}\mathbf{a}^{<t>} + \mathbf{b}_y)
\end{align}
$$

The parameters or weights that undergo training are captured in a number of *filters* or *kernels*. The *feedback* filter is $\mathbf{W}_{aa}$, the *input* filter $\mathbf{W}_{ax}$, and the *output* filter $\mathbf{W}_{ya}$. The *signal* is the data that are used as examples during training. The symbols $\mathbf{x}^{<t>}$ and $\mathbf{\hat{y}}^{<t>}$ represent the input and output signals respectively. The hidden state, or internal signal, is given by $\mathbf{a}^{<t>}$. The filters are matrices while the signals are vector-valued. There is often a single layer in an RNN. Note, however, that this architecture is recursive. This means that each time-step could be considered a separate layer in time.
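
The recurrence described by these equations can be written as a short NumPy loop. In the sketch below the activation $g$ is assumed to be tanh for the hidden state and a sigmoid for the output, and the initial hidden state is assumed to be zero; the equations leave these choices open. The function name, dimensions, and random weights are hypothetical.

```python
import numpy as np


def rnn_forward(x_seq, W_aa, W_ax, W_ya, b_a, b_y):
    """Unroll a single recurrent layer over a sequence of input vectors."""
    a = np.zeros(W_aa.shape[0])  # assumed initial hidden state a^<0> = 0
    outputs = []
    for x_t in x_seq:
        # a^<t> = g(W_aa a^<t-1> + W_ax x^<t> + b_a), with g assumed to be tanh
        a = np.tanh(W_aa @ a + W_ax @ x_t + b_a)
        # y_hat^<t> = g(W_ya a^<t> + b_y), with g assumed to be a sigmoid
        outputs.append(1.0 / (1.0 + np.exp(-(W_ya @ a + b_y))))
    return np.array(outputs), a


# Hypothetical sizes: scalar input (altitude), 4 hidden units, 1 output, 6 time-steps.
rng = np.random.default_rng(0)
n_x, n_a, n_y, T = 1, 4, 1, 6
x_seq = rng.normal(size=(T, n_x))
y_hat, a_last = rnn_forward(
    x_seq,
    W_aa=rng.normal(size=(n_a, n_a)),
    W_ax=rng.normal(size=(n_a, n_x)),
    W_ya=rng.normal(size=(n_y, n_a)),
    b_a=np.zeros(n_a),
    b_y=np.zeros(n_y),
)
```

The same three weight matrices are reused at every time-step, which is what makes each time-step behave like a separate layer in time. For sequence classification, typically only the output at the final time-step would be used.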
