**Qingbo Liu**

Spring 2020

CS 251: Data Analysis and Visualization

Project 2: Matrix Transformations

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import data
import transformation

plt.style.use(['seaborn-colorblind', 'seaborn-darkgrid'])
plt.rcParams.update({'font.size': 20})

np.set_printoptions(suppress=True, precision=5)

# Automatically reload external modules
%load_ext autoreload
%autoreload 2

# Project 2: Matrix Transformations

The goal of this project is to give you practice using matrix multiplication to efficiently transform data (translation, scaling, and rotation). To that end, you'll develop the Transformation class as a child class of Analysis, from Project 1. 

We expect you to create the transformation matrices and apply them to the data using matrix multiplication yourself: **you may not call high-level functions to do all the work for you**. Functions similar to the following are fine to use:

- creating an identity matrix with `np.eye()`
- creating matrices of zeros or ones with `np.zeros()` or `np.ones()`
- concatenating matrices with `np.hstack()` or `np.vstack()`
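
As a minimal sketch of how these building blocks fit together, the snippet below appends a homogeneous coordinate to a small toy matrix and applies a hand-built translation by matrix multiplication. The toy values and the variable names (`A`, `Ah`, `T`) are assumptions for illustration only and are not part of the project code.

```python
import numpy as np

# Toy data: 3 samples, 2 features (made-up values).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Append a column of ones: the homogeneous coordinate.
Ah = np.hstack([A, np.ones((A.shape[0], 1))])

# Start from the identity and fill in the last column to get a translation.
T = np.eye(3)
T[:2, 2] = [10.0, -1.0]

# The data matrix goes on the right-hand side, so samples must be columns:
# transpose going in, transpose back coming out.
shifted = (T @ Ah.T).T
print(shifted)
```

The last column of `shifted` stays at 1, which is what lets a single matrix product express a translation.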

Here is a **suggested order of implementation** for completing the Transformation class's methods in transformation.py:
1. `__init__()`: The constructor.
1. `get_data_homogeneous()`: Adds a column of normal homogeneous coordinates to the data matrix.
2. `project()`: Projects the M-dimensional Data object in self.data onto a subset of its axes.
3. Construct homogeneous transformation matrices (in any order):
> * `translation_matrix()`: Constructs an M+1-by-M+1 translation matrix for shifting the M-dimensional Data object in self.data
> * `scale_matrix()`: Constructs an M+1-by-M+1 scale matrix for resizing the M-dimensional Data object in self.data
> * `rotation_matrix_3d()`: Constructs a 4x4 rotation matrix for rotating the 3-dimensional Data object in self.data
4. Apply transformation matrices to the Data object in self.data (with homogeneous coordinates):
> * `translate()`: Uses a translation matrix to transform self.data
> * `scale()`: Uses a scale matrix to transform self.data
> * `rotate_3d()`: Uses a 3D rotation matrix to transform self.data (which must, in this case, contain exactly 3 features, plus the normal homogeneous coordinate)
> * `transform()`: Uses a homogeneous transformation matrix (passed as a parameter) to transform self.data.
5. Normalization:
> * `normalize_together()`: Uses homogeneous transformation matrices to normalize all the features of self.data together, using the global min and max.
> * `normalize_separately()`: Uses homogeneous transformation matrices to normalize each feature separately, using its own local min and max.
6. Visualization:
> * `scatter_color()`: Similar to Analysis.scatter(), but using a third feature to control the color of the plotted data points.
> * `heatmap()`: This function is provided for you, already completed. Take a look to see what it's doing.

Use this notebook to demo your completed Transformation class (transformation.py).

## Task 0) Preprocess Iris data

- Copy over `data.py`, `analysis.py`, and `iris.csv` from Project 1.
- In whatever way you wish, replace the `species` strings with ints, i.e. setosa -> 0, versicolor -> 1, virginica -> 2. *Remember to change the type to numeric!*
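
One possible way to do the string-to-int replacement is a plain dictionary lookup. This is only a sketch: the `species` list below is a hypothetical stand-in for the column as you might read it from `iris.csv`, and how the result gets written back depends on your own Project 1 code.

```python
import numpy as np

# Hypothetical species column (stand-in for values read from iris.csv).
species = ['setosa', 'versicolor', 'virginica', 'setosa']

# Explicit mapping taken from the task description.
species_to_int = {'setosa': 0, 'versicolor': 1, 'virginica': 2}

# Look each label up and store the codes as a numeric array.
codes = np.array([species_to_int[s] for s in species], dtype=float)
print(codes)
```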

## Task 1) Implement transformation matrices

Implement the following methods in `transformation.py`, running the following test code to guide you as you work. 
- Constructor
- `project(headers)`: "project" the data on the list of data variables specified by `headers`, i.e. select a subset of the variables from the original dataset.
- `get_data_homogeneous`: Helper method to get a version of the projected data array with an added homogeneous coordinate.
- `translation_matrix(headers, magnitudes)`: Make an M-dimensional homogeneous translation matrix.
- `scale_matrix(headers, magnitudes)`: Make an M-dimensional homogeneous scaling matrix.
- `rotation_matrix_3d(header, degrees)`: Make a 3-D homogeneous rotation matrix for rotating the projected data about the ONE axis/variable `header`.
- `transform(C)`: Transforms the PROJECTED dataset by applying the homogeneous transformation matrix `C`.

### Test (i): Translation

- Write a test that does the following. Compare your result with the expected output below.

* Create Data and Transformation objects for the Iris dataset. 
* Project the Transformation object's data onto the first 3 axes (`sepal_length`, `sepal_width`, and `petal_length`).
* Create a translation matrix that would shift the projected data by -0.5 along `sepal_length` and +1.5 along `petal_length`, then print the translation matrix. 

In [2]:
iris_data_fp = 'data/iris.csv'
iris_data = data.Data(iris_data_fp)
iris_transformation = transformation.Transformation(iris_data)

iris_transformation.project(['sepal_length', 'sepal_width', 'petal_length'])
t_m = iris_transformation.translation_matrix(['sepal_length', 'petal_length'], [-0.5, 1.5])

print(t_m)

[[ 1.   0.   0.  -0.5]
 [ 0.   1.   0.   0. ]
 [ 0.   0.   1.   1.5]
 [ 0.   0.   0.   1. ]]


**Your output should look like:**

    Translation matrix:
    [[ 1.   0.   0.  -0.5]
     [ 0.   1.   0.   0. ]
     [ 0.   0.   1.   1.5]
     [ 0.   0.   0.   1. ]]
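
The layout of this matrix can be reproduced with a short standalone function. This is a sketch under stated assumptions: `make_translation` and its `all_headers` argument are hypothetical names, not the signature of `translation_matrix()` in `transformation.py`.

```python
import numpy as np

def make_translation(all_headers, headers, magnitudes):
    """Homogeneous translation matrix over the projected variables.

    all_headers: every projected variable, in column order.
    headers:     the subset of variables to shift.
    magnitudes:  one shift amount per entry of headers.
    """
    T = np.eye(len(all_headers) + 1)
    for h, m in zip(headers, magnitudes):
        # Shifts live in the last column, in the row of the shifted variable.
        T[all_headers.index(h), -1] = m
    return T

projected = ['sepal_length', 'sepal_width', 'petal_length']
T = make_translation(projected, ['sepal_length', 'petal_length'], [-0.5, 1.5])
print(T)
```

Variables that are not listed keep a 0 in the last column, so they are left unshifted.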

### Test (ii): Scaling

* Create a scaling matrix that would scale the projected data by 2 along `sepal_width` and 1/3 along `petal_length`, then print the scaling matrix.

In [3]:
s_m = iris_transformation.scale_matrix(['sepal_width', 'petal_length'], [2, 1/3])

print(s_m)

[[1.      0.      0.      0.     ]
 [0.      2.      0.      0.     ]
 [0.      0.      0.33333 0.     ]
 [0.      0.      0.      1.     ]]


**Your output should look like:**

    Scale matrix:
    [[1.      0.      0.      0.     ]
     [0.      2.      0.      0.     ]
     [0.      0.      0.33333 0.     ]
     [0.      0.      0.      1.     ]]
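
To see what this matrix does to a single sample, the sketch below builds it by hand and multiplies it with one homogeneous point. The point `[5.1, 3.5, 1.4]` is the first Iris sample; the variable names are assumptions for illustration.

```python
import numpy as np

# Scale factors sit on the diagonal; the final 1 keeps the homogeneous
# coordinate unchanged.
S = np.eye(4)
S[1, 1] = 2        # sepal_width
S[2, 2] = 1 / 3    # petal_length

# First Iris sample with a homogeneous coordinate appended.
sample = np.array([5.1, 3.5, 1.4, 1.0])

scaled = S @ sample
print(scaled)
```

Only the second and third entries change, because the other diagonal entries are 1.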

### Test (iii): Rotation

* Create a rotation matrix that would rotate the Transformation object's projected data by 45 degrees about `petal_length`, and print the rotation matrix.

In [4]:
r_m = iris_transformation.rotation_matrix_3d('petal_length', 45)

print(r_m)

[[ 0.70711 -0.70711  0.       0.     ]
 [ 0.70711  0.70711  0.       0.     ]
 [ 0.       0.       1.       0.     ]
 [ 0.       0.       0.       1.     ]]


**Your output should look like:**

    Rotation matrix:
    [[ 0.70711 -0.70711  0.       0.     ]
     [ 0.70711  0.70711  0.       0.     ]
     [ 0.       0.       1.       0.     ]
     [ 0.       0.       0.       1.     ]]
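
The matrix above is a rotation about the third axis. A minimal sketch of how such a matrix can be built, and of one quick sanity check (a rotation matrix is orthogonal), is given here; `rotation_about_z` is a hypothetical helper name, not part of the provided class.

```python
import numpy as np

def rotation_about_z(degrees):
    """4x4 homogeneous rotation about the third axis."""
    rad = np.deg2rad(degrees)
    c, s = np.cos(rad), np.sin(rad)
    R = np.eye(4)
    # Only the first two coordinates mix; the third axis stays fixed.
    R[0:2, 0:2] = [[c, -s],
                   [s,  c]]
    return R

R = rotation_about_z(45)
print(R)

# Sanity check: R times its transpose should be the identity.
print(np.allclose(R @ R.T, np.eye(4)))
```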

### Test (iv): Apply the compound rotation-translation-scaling transformation to the projected data

- Create a compound transformation matrix in the cell below that applies the above rotation, translation, and scaling (in that order). Remember the data matrix will ultimately go on the right-hand side.
- Use the `transform` method to apply it to the projected data. Print the 1st 5 samples.

In [19]:
# Write your compound RTS transformation test here
compound_m = s_m @ t_m @ r_m
iris_data_transformed = iris_transformation.transform(compound_m)

print(compound_m)
print(f'\n{iris_data_transformed[0:5, :]}')

[[ 0.70711 -0.70711  0.      -0.5    ]
 [ 1.41421  1.41421  0.       0.     ]
 [ 0.       0.       0.33333  0.5    ]
 [ 0.       0.       0.       1.     ]]

[[ 0.63137 12.16224  0.96667  1.     ]
 [ 0.8435  11.17229  0.96667  1.     ]
 [ 0.56066 11.17229  0.93333  1.     ]
 [ 0.56066 10.88944  1.       1.     ]
 [ 0.48995 12.16224  0.96667  1.     ]]


    Compound transformation matrix:
    [[ 0.70711 -0.70711  0.      -0.5    ]
     [ 1.41421  1.41421  0.       0.     ]
     [ 0.       0.       0.33333  0.5    ]
     [ 0.       0.       0.       1.     ]]
     
    Transformed data:
    [[ 0.63137 12.16224  0.96667  1.     ]
     [ 0.8435  11.17229  0.96667  1.     ]
     [ 0.56066 11.17229  0.93333  1.     ]
     [ 0.56066 10.88944  1.       1.     ]
     [ 0.48995 12.16224  0.96667  1.     ]]
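
The order of the factors matters because the matrix closest to the data is applied first. The sketch below rebuilds the three matrices by hand (the names `R`, `T`, `S`, `C` are illustrative assumptions) and checks that the compound matrix gives the same result as applying rotation, translation, and scaling one step at a time.

```python
import numpy as np

c = s = np.sqrt(0.5)  # cos(45 deg) == sin(45 deg)
R = np.array([[c, -s, 0, 0],
              [s,  c, 0, 0],
              [0,  0, 1, 0],
              [0,  0, 0, 1]])

T = np.eye(4)
T[0, 3] = -0.5
T[2, 3] = 1.5

S = np.diag([1.0, 2.0, 1 / 3, 1.0])

# Rightmost factor acts first: rotate, then translate, then scale.
C = S @ T @ R

# First Iris sample in homogeneous coordinates.
sample = np.array([5.1, 3.5, 1.4, 1.0])
step_by_step = S @ (T @ (R @ sample))

print(C @ sample)
print(np.allclose(C @ sample, step_by_step))

# Reversing the order generally gives a different matrix.
print(np.allclose(C, R @ T @ S))
```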

## Task 2) Transformation detective

The objective of this task is to determine the set of matrix transformations to apply to the specified data and what variables to project onto in order to reproduce the below plots.

### 2a) Hello, Iris

- Create Data and Transformation objects for the Iris dataset.
- Project the dataset onto all the headers.
- Use the Transformation object to generate a pair plot of the entire Iris dataset. (*Your results should look just like the example, below.*)

In [6]:
# Write your 2a code here

#### Your results should look like the image below.
![sanity_1a.png](attachment:sanity_1a.png)

**Question 1:** How many dimensions (features) does the Iris dataset contain?

**Answer 1:** *Enter your answer here*

### 2b) Solve transformation mystery 1

- Make a Transformation object.
- Determine the set of variables to project onto to recreate the image below.
- Create a pair plot identical to the one below based on the projected data.

In [7]:
# Write your 2b code here

#### Your results should look like the image below.
![sanity_1b.png](attachment:sanity_1b.png)

**Question 2:** How could you tell what type(s) of transformation to perform in order to recreate this figure?

**Answer 2:** *Your answer here*

### 2c) Implement methods that apply a single transformation

Although you already have a method implemented to apply a compound transformation, it can be convenient to have dedicated methods to apply a single transformation to projected data (without having to pass around matrices). Implement the following methods for this purpose:
- `translate`: Translates the variables `headers` in the projected dataset by the corresponding amounts specified in `magnitudes`.
- `scale`: Scales the variables `headers` in the projected dataset by the corresponding amounts specified in `magnitudes`.
- `rotate_3d`: Rotates the projected data about the variable `header` by the angle `degrees` (in degrees).

#### Test `translate`

- Make a Transformation object with the Iris data
- Project onto the first 3 variables (`sepal_length`, `sepal_width`, `petal_length`).
- Translate x, y, and z by +1 unit each.
- Print out the 1st 5 samples of the result.

In [8]:
# Write your translate test code here

**Your translate output should look like:**

    [[6.1 4.5 2.4]
     [5.9 4.  2.4]
     [5.7 4.2 2.3]
     [5.6 4.1 2.5]
     [6.  4.6 2.4]]

#### Test `scale`

- Make a Transformation object with the Iris data
- Project onto the first 3 variables (`sepal_length`, `sepal_width`, `petal_length`).
- Scale x, y, and z to 50% each.
- Print out the 1st 5 samples of the result.

In [9]:
# Write your scale test code here

**Your scale output should look like:**

    [[2.55 1.75 0.7 ]
     [2.45 1.5  0.7 ]
     [2.35 1.6  0.65]
     [2.3  1.55 0.75]
     [2.5  1.8  0.7 ]]

#### Test `rotate_3d`

- Make a Transformation object with the Iris data
- Project onto the first 3 variables (`sepal_length`, `sepal_width`, `petal_length`).
- Rotate about `sepal_length` 10 deg.
- Print out the 1st 5 samples of the result.

In [10]:
# Write your rotate 3d test code here

**Your rotation output should look like:**

    [[5.1     3.20372 1.9865 ]
     [4.9     2.71132 1.89968]
     [4.7     2.92564 1.83592]
     [4.6     2.79243 2.01552]
     [5.      3.3022  2.00386]]
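
The first row of this output can be checked by hand with a rotation about the first axis. This is a sketch only: `rotation_about_x` is a hypothetical helper, and `[5.1, 3.5, 1.4]` is the first Iris sample.

```python
import numpy as np

def rotation_about_x(degrees):
    """4x4 homogeneous rotation about the first axis."""
    rad = np.deg2rad(degrees)
    c, s = np.cos(rad), np.sin(rad)
    R = np.eye(4)
    # The first coordinate is untouched; the second and third mix.
    R[1:3, 1:3] = [[c, -s],
                   [s,  c]]
    return R

first_sample = np.array([5.1, 3.5, 1.4, 1.0])
rotated = rotation_about_x(10) @ first_sample
print(rotated[:3])
```

The `sepal_length` value is unchanged because it lies along the axis of rotation.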

### 2d) Solve transformation mystery 2

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto, then create and apply a transformation matrix (or matrices) to recreate the image below.
- Create a scatter plot identical to the one below based on the projected/transformed data.

**NOTE:** Remember that `Transformation` inherits from `Analysis` so you have access to all those methods.

In [11]:
# Write your 2d code here

#### Your results should look like the image below.
![sanity_1c.png](attachment:sanity_1c.png)

**Question 3:** How could you tell what type(s) of transformation to perform in order to recreate this figure?

**Answer 3:** *Your answer here*

### 2e) Solve transformation mystery 3

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto, then create and apply a transformation matrix (or matrices) to recreate the image below.
- Create a scatter plot identical to the one below based on the projected/transformed data. 

In [12]:
# Write your 2e code here

#### Your results should look like the plot below.
![sanity_1d.png](attachment:sanity_1d.png)

**Question 4:** How could you tell what type(s) of transformation to perform in order to recreate this figure?

**Answer 4:** *Your answer here*

### 2f) Solve transformation mystery 4

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto, then create and apply a transformation matrix (or matrices) to recreate the image below.
- Create a pair plot identical to the one below based on the projected/transformed data. 

In [13]:
# Write your 2f code here

#### Your results should look like the image below.
![sanity_1e.png](attachment:sanity_1e.png)

**Question 5:** How could you tell what type(s) of transformation to perform in order to recreate this figure?

**Answer 5:** *Your answer here*

## Task 3) Normalization

In this task, you will take advantage of your data transformation pipeline to normalize data in two ways:
1. All the variables together (entire matrix).
2. All the variables separately/independently.

Implement the following methods to perform each of these operations:
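- `normalize_together`: Normalizes all the variables of the projected data together, using the global min and max.
- `normalize_separately`: Normalizes each variable of the projected data separately, using its own min and max.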
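
One way to think about min/max normalization in this pipeline is as a shift by the minimum followed by a division by the range. The sketch below expresses that with homogeneous matrices on toy data; `minmax_matrix` and the toy values are assumptions for illustration, not the required implementation.

```python
import numpy as np

def minmax_matrix(mins, maxs):
    """Homogeneous matrix mapping each feature's [min, max] onto [0, 1]."""
    mins = np.asarray(mins, dtype=float)
    maxs = np.asarray(maxs, dtype=float)
    m = len(mins)
    idx = np.arange(m)

    T = np.eye(m + 1)
    T[:m, -1] = -mins                   # shift so the minimum lands on 0

    S = np.eye(m + 1)
    S[idx, idx] = 1.0 / (maxs - mins)   # divide by the range

    return S @ T                        # translate first, then scale

# Toy data: 3 samples, 2 features (made-up values).
A = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [4.0, 20.0]])
Ah = np.hstack([A, np.ones((A.shape[0], 1))])

# Together: one global min and max shared by every feature.
C_together = minmax_matrix([A.min()] * 2, [A.max()] * 2)
together = (C_together @ Ah.T).T[:, :-1]

# Separately: each feature uses its own min and max.
C_separate = minmax_matrix(A.min(axis=0), A.max(axis=0))
separately = (C_separate @ Ah.T).T[:, :-1]

print(together)
print(separately)
```

With the global version only the overall smallest and largest values hit 0 and 1; with the per-feature version every column spans the full range.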

### 3a) Normalize together

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto to recreate the image below.
- Use a Transformation object to normalize each feature of the Iris dataset __together__.
- Create a pair plot identical to the one below based on the projected/transformed data. 

In [14]:
# Write your 3a code here

#### Your results should look like the image below.
![sanity_2a.png](attachment:sanity_2a.png)

### 3b) Normalize separately

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto to recreate the image below.
- Use a Transformation object to normalize each feature of the Iris dataset __separately__.
- Create a pair plot identical to the one below based on the projected/transformed data. 

In [15]:
# Write your 3b code here

Your results should look like the image below.
![sanity_2b.png](attachment:sanity_2b.png)

**Question 6:** What type(s) of transformation does normalization require?

**Answer 6:** *Your answer here*

### 3c) Analysis challenge

- Make a Transformation object with the Iris data.
- Determine the set of variables to project onto to recreate the image below.
- Determine the set of transformations to apply to them.
- Create a pair plot identical to the one below based on the projected/transformed data. 

In [16]:
# Write your 3c code here

Your results should look like the image below.
![sanity_2c.png](attachment:sanity_2c.png)

## Task 4) Visualizing multi-dimensional data (>3D)

The Iris dataset has too many dimensions to visualize in 2D space with a standard scatterplot! Let's see what we can do about that.

### 4a) Color scales

In this subtask, you will use color to visualize a third dimension of the Iris dataset. Your color scale should be colorblind friendly.

- Implement the `scatter_color()` method in your `Transformation` class that uses color to represent a third axis on a 2D scatterplot.
    - **Section B (Linear Algebra):** Use a ColorBrewer color palette to implement the color scale (e.g. from the `palettable` library).
- Use your `scatter_color()` method to recreate the images below.
    * One with headers [`sepal_length`, `petal_length`, `sepal_width`].
    * Another with headers [`sepal_length`, `petal_length`, `species`].

**Reminder:** Re-project your data onto the appropriate variables before plotting.
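
As a rough sketch of the plotting idea, Matplotlib's `scatter` accepts a `c` array and a `cmap`, and a colorbar can label the third variable. The random data and axis labels are placeholders, and the built-in `viridis` colormap is swapped in here as an assumption; it is not a ColorBrewer palette, so Section B would substitute one.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data standing in for three projected variables.
rng = np.random.default_rng(0)
x, y, z = rng.random(50), rng.random(50), rng.random(50)

fig, ax = plt.subplots()

# c maps each point's z value to a color drawn from cmap.
points = ax.scatter(x, y, c=z, cmap='viridis', edgecolors='k')

# The colorbar acts as the legend for the third variable.
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('z')

ax.set_xlabel('x')
ax.set_ylabel('y')
```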

In [17]:
# Write your 4a code here

#### Your results should look like the following
![sanity_3a1.png](attachment:sanity_3a1.png)
![sanity_3a2.png](attachment:sanity_3a2.png)

**Question 7:** In a scatterplot, is color a more useful representation of __continuous__ features (like sepal width) or __discrete__ features (like species)? Why do you think that is?

**Answer 7:** *Your answer here*

### 4b) Heatmap

Use the `heatmap()` method (written for you) to recreate the image below.

In [18]:
# Write your 4b code here

#### Your results should look like the image below.
![image.png](attachment:image.png)

**Question 8:** Does color help you see any patterns in this heatmap that were difficult to see in the scatterplots?

**Question 9:** Are there any characteristics of iris.csv that support the readability of this heatmap? Explain your answer.

**Answer 8:** *Your answer here*

**Answer 9:** *Your answer here*

## Extensions

To receive credit for any extension, you must:
- Not modify / prevent any code from the core project from working (e.g. make a copy before changing). In other words, **the notebook test code should still work!**
- **You must describe what you did and what you found in detail**. This includes a summary of parameter values used in your simulations.
- Include (*labeled!*) plots and/or numbers to present your results.
- Write up your extensions below or in a separate notebook.

**Rule of thumb: one deep, thorough extension is worth more than several quick, shallow extensions!**

**Reminder:** Give credit to all sources, including anyone that you consulted.

### 1. Explore additional visualizations

- Implement a scatter plot version that uses the marker size aesthetic to visualize another dimension of data (up to 4D).
- Implement a scatter plot version that uses both color and marker size aesthetics (up to 5D).

### 2. Perform different matrix transformations on data

- Normalize by Z-score rather than min/max.
- "Whiten" a dataset.
- Implement normalize together and separately using numpy vectorization/broadcasting. Compare the efficiency of the two approaches by timing both implementations.
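
For reference, broadcasting lets NumPy subtract a per-column statistic from every row without any explicit matrices. The sketch below shows z-score and per-feature min/max normalization on toy data; the values and names are assumptions, and `std` here is NumPy's default population standard deviation (`ddof=0`).

```python
import numpy as np

# Toy data: 3 samples, 2 features (made-up values).
A = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [4.0, 20.0]])

# Z-score: the (2,) mean and std arrays broadcast across the 3 rows.
zscored = (A - A.mean(axis=0)) / A.std(axis=0)

# Per-feature min/max normalization with the same broadcasting pattern.
mins = A.min(axis=0)
minmaxed = (A - mins) / (A.max(axis=0) - mins)

print(zscored)
print(minmaxed)
```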

### 3. Implement and use 2D rotation
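
A possible starting point, assuming the same homogeneous-coordinate convention as the 3D case: a 2D rotation becomes a 3x3 matrix. `rotation_2d` is a hypothetical helper name.

```python
import numpy as np

def rotation_2d(degrees):
    """3x3 homogeneous matrix for a counterclockwise 2D rotation."""
    rad = np.deg2rad(degrees)
    c, s = np.cos(rad), np.sin(rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# The point (1, 0) in homogeneous coordinates.
p = np.array([1.0, 0.0, 1.0])

# Rotating by 90 degrees should move it onto the y axis.
rotated = rotation_2d(90) @ p
print(np.round(rotated, 5))
```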

### 4. Apply matrix transformations and visualization to a dataset of your choice