# Assignment 4 - Digital signal processing and analysis

Course "Data processing and Visualization", IE500417, NTNU.

https://www.ntnu.edu/studies/courses/IE500417

**Note (as usual): plagiarism is strictly forbidden! You should never copy any source code from other students. If you use any code written by others (except the standard libraries: NumPy, SciPy, Pandas, etc.), provide a reference.**

**If the teachers see that your work is mostly copy+paste from online code snippets, the grade can be reduced.**

**If a case of plagiarism is detected, it will be reported to the administration.**

## Task description

In this assignment, you will practice digital signal processing in a rather basic form (no advanced DSP methods). You will work with two signals
simultaneously. As sometimes happens, the two signals are not synchronized: they are sampled at
different time moments and with different sampling rates. You will have to resample and synchronize
them so that both signals have the same sample timestamps. You will do some analysis of the signals
and visualize them using line charts.

## Submission details (as usual)

The assignment must be handed in on Blackboard. The following must be handed in:
1. Report in PDF or HTML format describing the results of this assignment. Preferably, it is generated from the Jupyter notebook you used (Hint: In Jupyter: File > Download as > HTML). Alternatively (if you use plain Python or other tools), prepare a readable report that contains figures and source code snippets necessary to understand your work.
2. Source code that you used to generate the results. This could be the Jupyter notebook file, Python source files, Matlab files, etc.

Deadlines and grading information are on Blackboard.

## Part 1: Understanding the signals (25%)

**Step 1.1: Load the two signals from CSV files: s1.csv and s2.csv.**

In [1]:
# Your code here

**Step 1.2: Do a quick analysis of what data you got: column names of each signal and number of rows.**

In [2]:
# Your code here

**Step 1.3: One of the signals is sampled at a uniform rate, the other is not. Find out which signal is nonuniformly sampled and store it in variable `signal_x`. Store the uniformly sampled signal in variable `signal_u`.**

Note: "find out" here means **"write code that finds out"**. If you manually assign the `signal_u` and `signal_x` variables, you won't get points for this step. The reason: manual assignment is not flexible. If the dataset changes, the rest of your notebook calculations will suddenly be wrong. Flexible code that finds the right signals would work even if we swapped the s1.csv and s2.csv files.
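
One way to test for uniform sampling is to look at the differences between consecutive timestamps: for a uniformly sampled signal they are all (nearly) equal. The sketch below uses synthetic data and assumes the timestamps are numeric seconds in a column named `time`; the column name, the helper `is_uniformly_sampled` and the tolerance are illustrative assumptions, not part of the assignment data.

```python
import numpy as np
import pandas as pd

def is_uniformly_sampled(timestamps, rel_tol=1e-6):
    """Return True if consecutive timestamps are (almost) equally spaced."""
    steps = np.diff(np.asarray(timestamps, dtype=float))
    return bool(np.allclose(steps, steps[0], rtol=rel_tol, atol=0.0))

# Synthetic stand-ins for the two CSV files (hypothetical column name "time").
uniform = pd.DataFrame({"time": np.arange(100) * 0.01})
rng = np.random.default_rng(0)
irregular = pd.DataFrame({"time": np.sort(rng.uniform(0.0, 1.0, 50))})

candidates = [irregular, uniform]  # the order does not matter
demo_u = next(s for s in candidates if is_uniformly_sampled(s["time"]))
demo_x = next(s for s in candidates if not is_uniformly_sampled(s["time"]))
```

Because the selection is driven by the data rather than by file names, swapping the two inputs gives the same result.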

In [3]:
# Your code here
signal_x = ...
signal_u = ...

**Step 1.4. Plot the two signals in a line chart:**

* Both lines in a single chart
* Add a legend with label for each signal
* Signal U should be a green dashed line with line width = 2
* Signal X should be a blue solid line with line width = 1
* Chart should have a title, font size = 20
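
The styling requirements above map onto Matplotlib keyword arguments roughly as follows. This is a minimal sketch on synthetic sine and cosine data, assuming Matplotlib is the plotting library; the title text and axis label are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data standing in for the two signals.
t = np.linspace(0.0, 2.0, 200)

fig, ax = plt.subplots(figsize=(10, 4))
# Green dashed line, width 2.
ax.plot(t, np.sin(2 * np.pi * t), color="green", linestyle="--",
        linewidth=2, label="Signal U")
# Blue solid line, width 1.
ax.plot(t, np.cos(2 * np.pi * t), color="blue", linestyle="-",
        linewidth=1, label="Signal X")
ax.set_title("Signals U and X", fontsize=20)  # placeholder title
ax.set_xlabel("Time (s)")
ax.legend()
```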

A reference showing approximately how it could look:

<img src="./docs/ref1.png">

In [4]:
# Your code here

**Step 1.5: Find out the sampling frequency of Signal U, save it in variable `f_u`.**

In [5]:
# Your code here
f_u = ...

**Step 1.6: Find the highest frequency present in Signal U. Save it in variable `b_u`, in Hz.**

Hint: use Fourier transform, and find max frequency having a significant amplitude. There may be many frequencies with tiny amplitudes. Use some threshold to filter these out.
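
As an illustration of the hint, the sketch below computes the amplitude spectrum of a synthetic signal with NumPy's real FFT and keeps only the frequencies whose amplitude is at least 10% of the largest peak. The helper name `highest_significant_frequency` and the 10% threshold are assumptions made for this example; a suitable threshold for the real data has to be chosen by inspecting its spectrum.

```python
import numpy as np

def highest_significant_frequency(values, sampling_rate, rel_threshold=0.1):
    """Highest frequency (Hz) whose amplitude is >= rel_threshold * peak amplitude."""
    values = np.asarray(values, dtype=float)
    # Subtract the mean so the DC component does not dominate the spectrum.
    amplitudes = np.abs(np.fft.rfft(values - values.mean()))
    freqs = np.fft.rfftfreq(len(values), d=1.0 / sampling_rate)
    significant = freqs[amplitudes >= rel_threshold * amplitudes.max()]
    return float(significant.max())

# Synthetic test signal: 3 Hz and 12 Hz components, 200 samples at 100 Hz.
fs_demo = 100.0
t = np.arange(200) / fs_demo
demo = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
demo_bandwidth = highest_significant_frequency(demo, fs_demo)
```

For this synthetic signal both components fall exactly on FFT bins, so the result is the 12 Hz component. Real signals usually show spectral leakage, which is why a threshold is needed.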

In [6]:
# Your code here
b_u = ...

**Step 1.7: Find out the minimum frequency at which Signal U should have been sampled to still contain all the information in the signal. Save it in variable `fs_u`.**

Hint: Nyquist-Shannon theorem

In [7]:
# Your code here
fs_u = ...

**Step 1.8: Calculate what percentage of storage space is wasted by storing too many samples for Signal U. I.e., if we resampled Signal U at a sampling rate of `fs_u`, how many samples would we store, and how does that compare to the number of samples in the CSV file?**

**If it is 0, why?**

P.S. Don't worry about Signal X – the sampling system for it was designed by careless engineers who did not know about Nyquist-Shannon's sampling theorem. Therefore, the sampling of Signal X is not proper. But we work with what we have.
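
The arithmetic behind Steps 1.7 and 1.8 can be sketched with made-up numbers. The Nyquist-Shannon theorem says that a signal whose highest frequency is B can be reconstructed when it is sampled faster than 2B, so 2B is used here as the minimum rate. The function names and the example values (20 Hz bandwidth, 200 Hz actual rate) are hypothetical and not taken from the assignment data.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sampling rate (Hz) for a signal band-limited to max_frequency_hz."""
    return 2.0 * max_frequency_hz

def wasted_percent(actual_rate_hz, required_rate_hz):
    """Percentage of stored samples beyond what required_rate_hz would need."""
    if required_rate_hz >= actual_rate_hz:
        return 0.0
    return 100.0 * (1.0 - required_rate_hz / actual_rate_hz)

# Hypothetical numbers: highest frequency 20 Hz, signal stored at 200 Hz.
demo_required = nyquist_rate(20.0)
demo_wasted = wasted_percent(200.0, demo_required)
```

For a fixed recording duration the number of samples is proportional to the sampling rate, so the ratio of rates equals the ratio of sample counts.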

In [8]:
# Your code here

## Part 2: Synchronizing the signals – resampling (25%)

Note: whenever you modify a signal, it is suggested to store the modified signal in another variable. Keep the original one intact. You may later want to compare the two.

**Step 2.1: Decimate (down-sample) the Signal U to 10Hz, store it in variable `su_resampled`:**
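
Decimation by an integer factor can be done with `scipy.signal.decimate`, which applies an anti-aliasing low-pass filter before keeping every q-th sample. The sketch below assumes an original rate of 100 Hz and uses a synthetic 1 Hz tone; the actual rate of Signal U has to come from your own analysis, and other approaches (for example pandas resampling) are equally valid.

```python
import numpy as np
from scipy import signal

fs_in = 100.0    # assumed original sampling rate in Hz (hypothetical)
fs_out = 10.0    # target sampling rate in Hz
q = int(round(fs_in / fs_out))  # integer decimation factor

t_in = np.arange(1000) / fs_in          # 10 seconds of timestamps
x_in = np.sin(2 * np.pi * 1.0 * t_in)   # 1 Hz test tone

# decimate() low-pass filters first (anti-aliasing), then keeps every q-th sample.
x_out = signal.decimate(x_in, q, zero_phase=True)
t_out = t_in[::q]                        # timestamps of the kept samples
```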

In [9]:
# Your code here
su_resampled = ...

**Step 2.2: Explain why the resampled signal does not contain the same information as the original signal (i.e., what information is lost?)**

**--- YOUR ANSWER HERE ---**

From now on, in all the places where you need to do something with Signal U, **use the resampled version: `su_resampled`.**

**Step 2.3: Synchronize Signal X with Signal U, store it in variable `sx_resampled`. I.e., if `su_resampled` is sampled at time moments t0, t1, …, tN, then resample Signal X at the same time moments: t0, t1, …, tN. This may involve several steps, 
depending on what functions/libraries you use.**

Hint: see the `07-2-Resampling.ipynb` example notebook on Blackboard.

Hint: your resulting `sx_resampled` should be 10Hz signal, not 100Hz.
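
One possible approach, not necessarily the one used in the course notebook, is linear interpolation of the irregular samples onto the target timestamps with `numpy.interp`. The sketch uses a handful of made-up samples lying on a straight line, so the interpolated values are easy to verify; all names and numbers are illustrative.

```python
import numpy as np

# Irregular sample times (seconds) and values of a hypothetical signal.
t_irregular = np.array([0.00, 0.07, 0.19, 0.33, 0.41, 0.50])
v_irregular = 2.0 * t_irregular + 1.0   # a straight line, easy to check

# Target timestamps: a 10 Hz grid, as if taken from the other signal.
t_target = np.arange(6) / 10.0           # 0.0, 0.1, ..., 0.5

# Linear interpolation; t_irregular must be in increasing order.
v_on_grid = np.interp(t_target, t_irregular, v_irregular)
```

Note that plain interpolation does no anti-alias filtering, so whether it is adequate depends on the frequency content of the signal.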

In [10]:
# Your code here
sx_resampled = ...

**Step 2.4: Check if the two signals really are synchronized – compare the timestamps, these should be
equal.**

In [11]:
# Your code here

**Step 2.5: Take both signals and insert them into a single DataFrame object (name it `composed_data`) which has:**
* Timestamps as the index column
* Two columns named `signalX` and `signalU` containing the corresponding values (`sx_resampled` and `su_resampled`).
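
The structure described above can be sketched with two short made-up Series that share the same index. The variable names and values are illustrative; the index name `time` is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical resampled signals sharing the same timestamps.
timestamps = pd.Index(np.arange(5) / 10.0, name="time")
demo_x = pd.Series([1.0, 1.5, 1.2, 0.9, 1.1], index=timestamps)
demo_u = pd.Series([10.0, 11.0, 12.0, 11.5, 10.5], index=timestamps)

# Check that both signals are synchronized before combining them.
assert demo_x.index.equals(demo_u.index)

# Columns are aligned on the shared index.
demo_composed = pd.DataFrame({"signalX": demo_x, "signalU": demo_u})
```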

In [12]:
# Your code here
composed_data = ...

## Part 3: Find extreme values (20%)

In this part you will find extreme values in the signals. Typically, these could mean outliers, sampling errors or extreme modes of operation in the system (such as overheating of a motor).

**Step 3.1: Find Signal U values (`su_resampled`) above 170.0. Store those in variable `extreme_u_vals`.**

In [13]:
# Your code here
extreme_u_vals = ...

**Step 3.2: Find Signal X values (`sx_resampled`) outside the range mean ± 2 * StdDev. Store those in variable `ex_static_x_vals`.**

In [14]:
# Your code here
ex_static_x_vals = ...

**Step 3.3: Find Signal X values (`sx_resampled`) outside an adaptive range: 3-second moving average ± 2 * StdDev. Both the average and StdDev are calculated in a 3-second rolling window. Store the values in variable `ex_dynamic_x_vals`.**
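
An adaptive range of this kind can be sketched with pandas rolling windows. The example assumes a 10 Hz signal, so a 3-second window is 30 samples, and uses synthetic noise with one injected spike. A time-based window on a datetime index would be an alternative; the data and variable names here are illustrative.

```python
import numpy as np
import pandas as pd

fs_demo = 10               # samples per second (assumed 10 Hz signal)
window = 3 * fs_demo       # 3-second window expressed in samples

rng = np.random.default_rng(42)
values = pd.Series(rng.normal(0.0, 1.0, 200))
values.iloc[120] = 15.0    # inject one obvious spike

rolling = values.rolling(window)
rolling_mean = rolling.mean()
rolling_std = rolling.std()

lower = rolling_mean - 2 * rolling_std
upper = rolling_mean + 2 * rolling_std

# Comparisons against NaN (the first window-1 samples) evaluate to False,
# so those samples are never flagged.
demo_extremes = values[(values < lower) | (values > upper)]
```

The window here is trailing: each sample is compared with statistics of the 30 samples ending at that sample, including itself.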

In [15]:
# Your code here
ex_dynamic_x_vals = ...

## Part 4: Extra challenges (30%)

**Step 4.1: Plot a line chart with values, rolling mean, *normal boundaries* (mean ± 2 * StdDev) and the extreme values `ex_dynamic_x_vals` that you calculated in Part 3.** Example:

<img src="files/ref3.png">

In [16]:
# Your code here

**Step 4.2: Find segments in Signal X (`sx_resampled`) where the 10-second moving average value is increasing for a continuous period of at least two seconds.**
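
Finding continuously increasing stretches is a run-length problem: mark each sample that is higher than the previous one, group consecutive marked samples into runs, and keep the runs that are long enough. The sketch below shows this on a short made-up series. The helper `increasing_segments` and its `min_len` parameter (measured in samples) are hypothetical; for the assignment, the input would be the moving average, and the minimum length would follow from the sampling rate.

```python
import pandas as pd

def increasing_segments(series, min_len):
    """Return (start, end) index labels of runs with >= min_len consecutive rises."""
    rising = series.diff() > 0
    # A new run id starts every time `rising` flips between True and False.
    run_id = rising.astype(int).diff().ne(0).cumsum()
    segments = []
    for _, run in series[rising].groupby(run_id[rising]):
        if len(run) >= min_len:
            segments.append((run.index[0], run.index[-1]))
    return segments

demo = pd.Series([5, 4, 5, 6, 7, 8, 7, 6, 7, 8, 7], dtype=float)
demo_segments = increasing_segments(demo, min_len=3)
```

In this example only the run covering index labels 2 to 5 qualifies; the later rise at labels 8 to 9 is too short.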

In [17]:
# Your code here

**Step 4.3: Plot a line chart and mark these regions (from previous step) in a different color.**

For example: show the normal line in blue color and the continuously increasing moving average segments in green. Example:

<img src="files/ref2.png">

In [18]:
# Your code here

## Reflection 

Please reflect on the following questions:
1. How did the assignment go? Was it easy or hard?
2. How many hours did you spend on it?
3. What was the most time-consuming part?
4. If you need to do similar things later in your professional life, how can you improve? How can you do it more efficiently?
5. Was there something you expected to learn that this exercise did not include?
6. Was there something that does not make sense?

**--- YOUR ANSWERS HERE ---**