# Exercise 3.4 - Relative positive acceleration

## Introduction 
In this exercise we are going to investigate the acceleration in terms of the relative positive acceleration (RPA). The RPA is a measure for the frequency and intensity of positive accelerations. This exercise consists of two parts. In the first part, we are going to calculate the RPA for one example day of our vehicle. The second part focuses on the analysis of the RPA per day. 

## Preparation
First of all we need to import all necessary packages and modules. In this case we need:
* pandas (pandas dataframes)
* numpy (numpy arrays as well as various mathematical methods)
* matplotlib.pyplot (plotting)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Further, we want to set the font size of all plots globally. 

In [None]:
# set font size of all plots globally
plt.rcParams.update({'font.size': 16})

## Data import - Part 1
Here the necessary data for the first part of this exercise is imported and saved as a pandas dataframe using the pandas read_pickle function. 

In [None]:
data_df = pd.read_pickle('data/e34_part1_data_df.pkl')

## Available data - Part 1
The data that is available for the following exercises is a pandas dataframe called data_df with the following columns:
* time: timestamp as datetime64
* distance: distance in m to the next time step
* speed: speed values in m/s
* time_diff: difference in s between two subsequent timestamps
* acceleration: acceleration in m/s^2
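
The definition of time_diff can be illustrated with a small sketch. It assumes that time_diff is the forward difference, i.e., the duration from each timestamp to the next one; how the column was actually derived for the exercise data is not stated here. The dataframe toy_df and its timestamps are hypothetical.

```python
import pandas as pd

# toy timestamps (hypothetical values, not taken from the exercise data)
toy_df = pd.DataFrame({
    'time': pd.to_datetime([
        '2020-01-01 08:00:00',
        '2020-01-01 08:00:01',
        '2020-01-01 08:00:03',
    ])
})

# duration in seconds from each timestamp to the next one
toy_df['time_diff'] = (toy_df['time'].shift(-1) - toy_df['time']).dt.total_seconds()

print(toy_df)
```

With a forward difference, the last row has no following timestamp, so its time_diff is NaN.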

In [None]:
display(data_df)

## Exercise 3.4.1 Calculate relative positive acceleration

### Task
The RPA is a measure for the frequency and intensity of positive accelerations. It is calculated according to:
$$RPA = \frac{1}{d_{tot}} \sum_{t=0}^{t_{final}} v \cdot a^{+} \cdot \delta t$$

where $d_{tot}$ is the total daily distance, $a^{+}$ indicates a positive acceleration, $v$ is the speed and $\delta t$ is the duration until the next timestamp, i.e., time_diff.
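
To make the formula concrete, here is a small worked example on four toy samples. All values are hypothetical and chosen for easy mental arithmetic; the variable names (toy_speed and so on) are illustrative and not part of the exercise.

```python
import numpy as np

# toy samples (hypothetical values, not taken from the exercise data)
toy_speed = np.array([2.0, 4.0, 3.0, 4.0])          # m/s
toy_acceleration = np.array([2.0, 0.0, 1.0, -1.0])  # m/s^2
toy_time_diff = np.array([1.0, 1.0, 1.0, 1.0])      # s
toy_distance = np.array([3.0, 4.0, 3.5, 3.5])       # m

# only samples with positive acceleration contribute to the sum
positive = toy_acceleration > 0

# sum of v * a+ * dt over those samples: 2*2*1 + 3*1*1 = 7
weighted_sum = np.sum(
    toy_speed[positive] * toy_acceleration[positive] * toy_time_diff[positive]
)

# divide by the total distance: 7 / 14 = 0.5
toy_rpa = weighted_sum / toy_distance.sum()
print('toy_rpa =', toy_rpa)
```

Samples with zero or negative acceleration drop out of the sum, but their distance still counts towards $d_{tot}$.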

* Implement a function that calculates the relative positive acceleration for one day of our vehicle according to the formula above.
* Test your function using the given data.

##### Signature of the function
rpa = calculate_relative_positive_acceleration(acceleration, speed, time_diff, distance) 

* INPUT: acceleration, speed, time_diff, distance as numpy arrays
* OUTPUT: rpa as scalar value 

###### Reminder
-

##### Hint
-

##### Solution
rpa = 0.34727876393372664

### Your code here:

In [None]:
#<<solution>>
def calculate_relative_positive_acceleration(acceleration, speed, time_diff, distance):

    # all inputs must have the same length (a chained != would only compare neighbouring pairs)
    if not (len(distance) == len(speed) == len(acceleration) == len(time_diff)):
        return np.nan

    else:

        # initialize rpa
        rpa_sum = 0

        # calculate total distance
        total_distance = distance.sum()

        # loop over all samples except the last one
        for i in range(0, len(acceleration) - 1):

            if acceleration[i] > 0:

                rpa_sum += (time_diff[i] * speed[i] * acceleration[i])

        # scale to total distance
        rpa = rpa_sum / total_distance

        return rpa
    
    
# extract values from data_df and convert to numpy array
acceleration = data_df['acceleration'].values
speed = data_df['speed'].values
time_diff = data_df['time_diff'].values
distance = data_df['distance'].values

# calculate relative positive acceleration
rpa = calculate_relative_positive_acceleration(acceleration, speed, time_diff, distance)

print('rpa =', rpa)
#<</solution>>

## Data import - Part 2
Here the necessary data for the second part of this exercise is imported and saved as a pandas dataframe using the pandas read_pickle function. 

In [None]:
data_df = pd.read_pickle('data/e34_part2_data_df.pkl')

## Available data - Part 2
The data that is available for the following exercises is a pandas dataframe called data_df with the following columns:
* day: day as datetime64
* rpa: rpa in m/s^2

In [None]:
display(data_df)

## Exercise 3.4.2 Plot rpa using different box plot settings

### Task
In this task we are going to visualize the rpa per day using a box plot. 
1. Create a plot as depicted below. It consists of three subplots. The first one is a basic plot, i.e., no parameters are passed to the box plot function. In the second one, outliers are hidden. The last box plot uses whiskers that span the full data range, so no value is drawn as an outlier. 

##### Signature of the script
The signature of the script defines the interfaces (INPUT, OUTPUT) of the current task within this notebook. It is up to you how you get from INPUT to OUTPUT.
* INPUT: Pandas dataframe data_df
* OUTPUT: Plot as depicted below

###### Reminder
* The goal of a box plot is to compare graphically various data sets within one diagram.

##### Hint
* Use the official matplotlib documentation for further information (https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html)

##### Solution
![title](data/img/solution_e432.png)

### Your code here:

In [None]:
#<<solution>>
rpa = data_df['rpa']

fig, ax = plt.subplots(1, 3, sharey=True, figsize=(15, 10))

ax[0].set(title='Basic plot')
ax[0].boxplot(rpa)

ax[1].set(title='Hide outliers')
ax[1].boxplot(rpa, showfliers=False)

ax[2].set(title='Include outliers')
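# whis=(0, 100) extends the whiskers to the minimum and maximum of the data
# (the string 'range' is no longer accepted by recent matplotlib versions)
ax[2].boxplot(rpa, whis=(0, 100))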

y_min = np.min(rpa) - 0.1
y_max = np.max(rpa) + 0.1

for axs in ax.flat:
    axs.set(ylabel='Relative positive acceleration in m/s^2')
    axs.set_ylim([y_min, y_max])
    axs.grid(axis='y')

    # hide y labels and tick labels on all but the left-most plot
    axs.label_outer()

plt.show()
#<</solution>>

## Exercise 3.4.3 Calculate box plot characteristics

### Task
Now we are going to calculate all box plot characteristics manually. We will use them in later tasks.
1. Calculate all statistical measures that are included in a boxplot manually.

##### Signature of the script
* INPUT: Pandas dataframe data_df
* OUTPUT: All statistical measures that are usually included in a box plot.

###### Reminder
Statistical measures that are usually included:<br/> 
* Median, maximum and minimum
* Quartiles (Q1 and Q3) and interquartile range (IQR = Q3 – Q1)
* Whiskers (indicating variability outside Q1 and Q3):
    * Lower whisker: Q1 – 1.5 × IQR
    * Upper whisker: Q3 + 1.5 × IQR
* Outliers

##### Hint
The whiskers that are depicted in a box plot represent real data values. The upper whisker extends to the last value less than (Q3 + whis * IQR). Similarly, the lower whisker extends to the first value greater than (Q1 - whis * IQR). 
See https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.boxplot.html
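
The difference between the theoretical whisker limit and the whisker that is actually drawn can be seen on a small toy data set. The following is a sketch with hypothetical values; it cross-checks the manual result against matplotlib.cbook.boxplot_stats, which computes the statistics behind a box plot.

```python
import numpy as np
from matplotlib import cbook

# toy data with one obvious outlier (hypothetical values)
toy_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

toy_q1, toy_q3 = np.quantile(toy_values, [0.25, 0.75])  # 2.25 and 4.75
toy_iqr = toy_q3 - toy_q1                               # 2.5
upper_limit = toy_q3 + 1.5 * toy_iqr                    # 8.5, not a data value

# the drawn whisker ends at the largest data value within the limit
upper_whisker_toy = toy_values[toy_values <= upper_limit].max()

# cross-check against matplotlib's own box plot statistics
stats = cbook.boxplot_stats(toy_values, whis=1.5)[0]
print('theoretical limit:', upper_limit)
print('manual whisker:', upper_whisker_toy)
print('matplotlib whisker:', stats['whishi'])
print('matplotlib outliers:', stats['fliers'])
```

The theoretical limit lies between two data values, so the whisker stops at the last value below it and the remaining value is reported as an outlier.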

##### Solution
minimum = 0.00088893<br/>
maximum = 0.609692<br/>
median = 0.318714<br/>
lower_quartile = 0.267899<br/>
upper_quartile = 0.377053<br/>
lower_whisker = 0.111165<br/>
upper_whisker = 0.539693<br/>

### Your code here:

In [None]:
#<<solution>>
rpa = data_df['rpa']

minimum = rpa.min()
maximum = rpa.max()

median = np.median(rpa)

lower_quartile = np.quantile(rpa, 0.25)
upper_quartile = np.quantile(rpa, 0.75)

whis = 1.5
iqr = upper_quartile - lower_quartile
lower_whisker_theoretical = lower_quartile - (whis * iqr)
upper_whisker_theoretical = upper_quartile + (whis * iqr)

lower_whisker = rpa[rpa >= lower_whisker_theoretical].min()
upper_whisker = rpa[rpa <= upper_whisker_theoretical].max()

print('minimum =', minimum) 
print('maximum =', maximum)
print('median =', median)
print('lower_quartile =', lower_quartile)
print('upper_quartile =', upper_quartile)
print('lower_whisker =', lower_whisker)
print('upper_whisker =', upper_whisker)
#<</solution>>

## Exercise 3.4.4 Compare box plot and histogram

### Task
Finally, we are going to visualize all box plot characteristics inside a histogram and compare the resulting plot with the box plot. 
1. Create a plot as depicted below. Set the number of bins of the histogram to one fifth of the number of rpa values (rounded down). 

##### Signature of the script
* INPUT: Pandas dataframe data_df
* OUTPUT: Plot as depicted below

###### Reminder
-

##### Hint
* By passing gridspec_kw={'hspace': 0} to plt.subplots you can remove the vertical space between two stacked subplots.
* In order to get the same plot styling as depicted below, pass the following parameters to the histogram: alpha=0.2, edgecolor='blue'
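
The effect of the hspace setting can be checked without plotting any data. The sketch below creates two empty figures, one with default spacing and one with hspace set to 0, and measures the gap between the upper and the lower subplot. The figure and variable names are illustrative only.

```python
import matplotlib.pyplot as plt

# two stacked subplots with default vertical spacing
fig_default, ax_default = plt.subplots(2, sharex=True)

# same layout without vertical space between the subplots
fig_tight, ax_tight = plt.subplots(2, sharex=True, gridspec_kw={'hspace': 0})

# gap between the bottom edge of the upper axes and the top edge of the
# lower axes, expressed as a fraction of the figure height
gap_default = ax_default[0].get_position().y0 - ax_default[1].get_position().y1
gap_tight = ax_tight[0].get_position().y0 - ax_tight[1].get_position().y1

print('gap with default spacing:', gap_default)
print('gap with hspace=0:', gap_tight)
```

Combined with sharex=True, the two subplots then share one x-axis and can be compared directly along it.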

##### Solution
![title](data/img/solution_e433.png)

### Your code here:

In [None]:
#<<solution>>
rpa = data_df['rpa']

fig, ax = plt.subplots(2, sharex=True, figsize=(15, 10), gridspec_kw={'hspace': 0})

fig.suptitle('Boxplot and histogram')

# Boxplot
ax[0].boxplot(rpa, vert=False, whis=1.5)

# Histogram
num_bins = int(len(rpa)/5)
n, bins, patches = ax[1].hist(rpa, bins=num_bins, alpha=0.2, edgecolor='blue')

# set axis limits
x_range = rpa.max() - rpa.min()
x_min = rpa.min() - 0.05*x_range
x_max = rpa.max() + 0.05*x_range

y_range = n.max() - n.min()
y1_min = 0
y1_max = n.max() + 0.05*y_range

ax[1].vlines(minimum, ymin=y1_min, ymax=y1_max, colors='black')
ax[1].vlines(maximum, ymin=y1_min, ymax=y1_max, colors='black')
ax[1].vlines(median, ymin=y1_min, ymax=y1_max, colors='orange')
ax[1].vlines(lower_whisker, ymin=y1_min, ymax=y1_max, colors='black')
ax[1].vlines(upper_whisker, ymin=y1_min, ymax=y1_max, colors='black')
ax[1].vlines(lower_quartile, ymin=y1_min, ymax=y1_max, colors='black')
ax[1].vlines(upper_quartile, ymin=y1_min, ymax=y1_max, colors='black')


ax[1].set_ylim([y1_min, y1_max])
# hist returns absolute counts per bin here, so label the axis accordingly
ax[1].set(ylabel='Absolute frequency')

for axs in ax.flat:
    axs.set(xlabel='Relative positive acceleration in m/s^2')
    axs.set_xlim([x_min, x_max])
    axs.grid()
    axs.label_outer()

plt.show()
#<</solution>>
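
By default, hist returns absolute counts per bin. If relative frequencies are wanted on the y-axis instead, one possible approach is to weight every value with 1/N. The sketch below demonstrates this with np.histogram on hypothetical toy values; the weights parameter is also accepted by the hist function of matplotlib.

```python
import numpy as np

# toy values (hypothetical, not the exercise data)
toy_values = np.array([0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.4, 0.4, 0.5, 0.9])

# absolute counts per bin
counts, bin_edges = np.histogram(toy_values, bins=4)

# relative frequencies: every value contributes 1/N to its bin
weights = np.full(len(toy_values), 1 / len(toy_values))
rel_freq, _ = np.histogram(toy_values, bins=4, weights=weights)

print('counts:', counts)
print('relative frequencies:', rel_freq)
```

The relative frequencies sum to 1, whereas the counts sum to the number of values.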