Project Title: Analysis for The Sure Tomorrow Insurance Company 

# Introduction

The Sure Tomorrow insurance company wants to solve several tasks with the help of Machine Learning and you are asked to evaluate that possibility.

- Task 1: Find customers who are similar to a given customer. This will help the company's agents with marketing.
- Task 2: Predict whether a new customer is likely to receive an insurance benefit. Can a prediction model do better than a dummy model?
- Task 3: Predict the number of insurance benefits a new customer is likely to receive using a linear regression model.
- Task 4: Protect clients' personal data without breaking the model from the previous task. It's necessary to develop a data transformation algorithm that would make it hard to recover personal information if the data fell into the wrong hands. This is called data masking, or data obfuscation. But the data should be protected in such a way that the quality of machine learning models doesn't suffer. You don't need to pick the best model, just prove that the algorithm works correctly.

# Data Preprocessing & Exploration

## Initialization

In [None]:
import numpy as np
import pandas as pd
import math

import seaborn as sns

import sklearn.linear_model
import sklearn.metrics
import sklearn.neighbors
import sklearn.preprocessing

from sklearn.model_selection import train_test_split

from IPython.display import display

from sklearn.neighbors import NearestNeighbors 
from sklearn.preprocessing import MaxAbsScaler 

## Load Data

Load data and conduct a basic check that it's free from obvious issues.

In [None]:
df = pd.read_csv('/datasets/insurance_us.csv')

We rename the columns so that their names follow a consistent code style.

In [None]:
df = df.rename(columns={'Gender': 'gender', 'Age': 'age', 'Salary': 'income', 'Family members': 'family_members', 'Insurance benefits': 'insurance_benefits'})

In [None]:
df.sample(10)

In [None]:
df.info()

In [None]:
# we may want to fix the age type (from float to int) though this is not critical

# write your conversion here if you choose:
df['age'] = df['age'].astype(int)

In [None]:
# check to see that the conversion was successful
df.info()

In [None]:
# now have a look at the data's descriptive statistics. 
# Does everything look okay?
df.describe()

In [None]:
df.duplicated().sum()

In [None]:
df[df.duplicated()].head(10)

In [None]:
df.isnull().sum()

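Description: There are no missing values. Some rows are flagged as duplicates, but it is difficult to tell whether they are true duplicates, because different customers can plausibly share the same gender, age, income, and family size. Therefore, the data is left in its present condition.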

## EDA

Let's quickly check whether there are certain groups of customers by looking at the pair plot.

In [None]:
g = sns.pairplot(df, kind='hist')
g.fig.set_size_inches(12, 12)

Ok, it is a bit difficult to spot obvious groups (clusters) as it is difficult to combine several variables simultaneously (to analyze multivariate distributions). That's where LA and ML can be quite handy.

# Task 1. Similar Customers

In the language of ML, it is necessary to develop a procedure that returns k nearest neighbors (objects) for a given object based on the distance between the objects.

You may want to review the following lessons (chapter -> lesson)
- Distance Between Vectors -> Euclidean Distance
- Distance Between Vectors -> Manhattan Distance

To solve the task, we can try different distance metrics.

Write a function that returns k nearest neighbors for an $n^{th}$ object based on a specified distance metric. The number of received insurance benefits should not be taken into account for this task. 

You can use a ready implementation of the kNN algorithm from scikit-learn (check [the link](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html#sklearn.neighbors.NearestNeighbors)) or use your own.

Test it for four combinations of two cases
- Scaling
  - the data is not scaled
  - the data is scaled with the [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html) scaler
- Distance Metrics
  - Euclidean
  - Manhattan

Answer these questions:
- Does the data being not scaled affect the kNN algorithm? If so, how does that appear?
- How similar are the results using the Manhattan distance metric (regardless of the scaling)?

In [None]:
feature_names = ['gender', 'age', 'income', 'family_members']

In [None]:
def get_knn(df, n, k, metric):
    
    """
    Returns k nearest neighbors

    :param df: pandas DataFrame used to find similar objects within
    :param n: object no for which the nearest neighbours are looked for
    :param k: the number of the nearest neighbours to return
    :param metric: name of distance metric
    """
    nbrs = NearestNeighbors(n_neighbors=k, metric=metric) 

    nbrs.fit(df[feature_names]) 
    
    nbrs_distances, nbrs_indices = nbrs.kneighbors([df.iloc[n][feature_names]], k, return_distance=True)
    
    df_res = pd.concat([
        df.iloc[nbrs_indices[0]], 
        pd.DataFrame(nbrs_distances.T, index=nbrs_indices[0], columns=['distance'])
        ], axis=1)
    
    return df_res

Scaling the data.

In [None]:
feature_names = ['gender', 'age', 'income', 'family_members']

transformer_mas = sklearn.preprocessing.MaxAbsScaler().fit(df[feature_names].to_numpy())

df_scaled = df.copy()
df_scaled.loc[:, feature_names] = transformer_mas.transform(df[feature_names].to_numpy())

In [None]:
df_scaled.sample(5)

In [None]:
#Unscaled Euclidean distance 
nbrs_unscaled_euclidean = get_knn(df, n=0, k=5, metric='euclidean')
print("Unscaled data with Euclidean distance:")
print(nbrs_unscaled_euclidean)

In [None]:
#Unscaled Manhattan distance 
nbrs_unscaled_manhattan = get_knn(df, n=0, k=5, metric='manhattan')
print("Unscaled data with Manhattan distance:")
print(nbrs_unscaled_manhattan)

In [None]:
#Scaled Euclidean distance 
nbrs_scaled_euclidean = get_knn(df_scaled, n=0, k=5, metric='euclidean')
print("\nScaled data with Euclidean distance:")
print(nbrs_scaled_euclidean)

In [None]:
#Scaling Manhattan distance 
nbrs_scaled_manhattan = get_knn(df_scaled, n=0, k=5, metric='manhattan')
print("\nScaled data with Manhattan distance:")
print(nbrs_scaled_manhattan)

Now, let's get similar records for a given one for every combination

Answers to the questions:

**Does the data being not scaled affect the kNN algorithm? If so, how does that appear?** 

Yes, leaving the data unscaled affects the kNN distances. Features with larger numerical ranges (here, most likely income) dominate the distance calculation, so the nearest neighbors are selected almost entirely by those features. Scaling keeps all features in comparable ranges, so each of them contributes to the distance.
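
To make the effect of scale concrete, here is a minimal sketch with three made-up customers (the values are illustrative and not taken from the dataset). Without scaling, the nearest neighbor of customer 0 is decided by income alone; after `MaxAbsScaler`, age matters as well.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical customers: [gender, age, income, family_members]
toy = np.array([
    [1, 25, 50000.0, 1],   # query customer
    [1, 60, 50100.0, 1],   # very different age, almost the same income
    [1, 26, 52000.0, 1],   # almost the same age, somewhat different income
])

def nearest_other(data):
    """Return the index of the row closest to row 0, excluding row 0 itself."""
    nbrs = NearestNeighbors(n_neighbors=2, metric='euclidean').fit(data)
    _, idx = nbrs.kneighbors(data[:1])
    return int(idx[0][1])

unscaled_neighbor = nearest_other(toy)
scaled_neighbor = nearest_other(MaxAbsScaler().fit_transform(toy))

print('Nearest to customer 0, unscaled:', unscaled_neighbor)
print('Nearest to customer 0, scaled:  ', scaled_neighbor)
```

On the unscaled rows the income gap of 2000 outweighs the 35-year age gap, so the 60-year-old is picked; after scaling the 26-year-old is closer.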

**How similar are the results using the Manhattan distance metric (regardless of the scaling)?** 

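For the same scaling choice, the results with the Manhattan metric are broadly consistent with the Euclidean ones. Manhattan distance sums absolute differences instead of squared ones, so a single large difference has less influence than it does under the Euclidean metric. It does not weight the features equally by itself, though: on unscaled data the feature with the largest range still dominates the distance.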

# Task 2. Is Customer Likely to Receive Insurance Benefit?

In terms of machine learning we can look at this like a binary classification task.

With `insurance_benefits` being more than zero as the target, evaluate whether the kNN classification approach can do better than a dummy model.

Instructions:
- Build a kNN-based classifier and measure its quality with the F1 metric for k=1..10 for both the original data and the scaled one. It would be interesting to see how k may influence the evaluation metric, and whether scaling the data makes any difference. You can use a ready implementation of the kNN classification algorithm from scikit-learn (check [the link](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)) or use your own.
- Build the dummy model which is just random for this case. It should return "1" with some probability. Let's test the model with four probability values: 0, the probability of paying any insurance benefit, 0.5, 1.

The probability of paying any insurance benefit can be defined as

$$
P\{\text{insurance benefit received}\}=\frac{\text{number of clients who received any insurance benefit}}{\text{total number of clients}}.
$$

Split the whole data in the 70:30 proportion for the training/testing parts.
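
As a rough reference point for the dummy models, the F1 score of a model that always predicts 1 follows directly from the class share $p$ defined above: precision equals $p$ and recall equals 1, so $F_1 = 2p / (1 + p)$. The sketch below checks this on synthetic labels; the share of 0.1 is an arbitrary assumption, not the dataset's value.

```python
import numpy as np
from sklearn.metrics import f1_score

# Synthetic binary target with an assumed positive share of 10%
y_true = np.array([1] * 100 + [0] * 900)
p = y_true.mean()

# A dummy model that always answers "1"
y_pred_all_ones = np.ones_like(y_true)

f1_all_ones = f1_score(y_true, y_pred_all_ones)
f1_formula = 2 * p / (1 + p)

print(f'F1 (always 1): {f1_all_ones:.4f}, formula 2p/(1+p): {f1_formula:.4f}')
```

Any kNN classifier worth keeping should clear this kind of baseline by a visible margin.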

In [None]:
# calculate the target

df['insurance_benefits_received'] = (df['insurance_benefits']>0)

In [None]:
# check for the class imbalance with value_counts()
df['insurance_benefits_received'].value_counts()


In [None]:
def eval_classifier(y_true, y_pred):
    
    f1_score = sklearn.metrics.f1_score(y_true, y_pred)
    print(f'F1: {f1_score:.2f}')
    
# if you have an issue with the following line, restart the kernel and run the notebook again
    cm = sklearn.metrics.confusion_matrix(y_true, y_pred, normalize='all')
    print('Confusion Matrix')
    print(cm)

In [None]:
# generating output of a random model

def rnd_model_predict(P, size, seed=42):

    rng = np.random.default_rng(seed=seed)
    return rng.binomial(n=1, p=P, size=size)

In [None]:
for P in [0, df['insurance_benefits_received'].sum() / len(df), 0.5, 1]:

    print(f'The probability: {P:.2f}')
    y_pred_rnd = rnd_model_predict(P, size=len(df))
        
    eval_classifier(df['insurance_benefits_received'], y_pred_rnd)
    
    print()

In [None]:
from sklearn.model_selection import train_test_split 

#split data into training and validation sets 
drop_columns = ['insurance_benefits_received','insurance_benefits']
X = df.drop(drop_columns, axis=1) 
y = df['insurance_benefits_received'] 

X_train, X_val, y_train, y_val = train_test_split(X,y, test_size=0.3, random_state=42)

#print shapes of training and validation sets to verify 
print("Training set:") 
print(X_train.shape, y_train.shape)
print("Validation set:")
print(X_val.shape, y_val.shape) 

In [None]:
#Initialize MaxAbsScaler and fit transform on training data 

scaler = MaxAbsScaler()
X_train_scaled = scaler.fit_transform(X_train) 
X_val_scaled = scaler.transform(X_val) 

In [None]:
from sklearn.neighbors import KNeighborsClassifier 

#Build kNN classifier 

def evaluate_knn(X_train, y_train, X_val, y_val): 
    for k in range(1,11): 
        knn = KNeighborsClassifier(n_neighbors=k) 
        knn.fit(X_train, y_train) 
        y_pred = knn.predict(X_val) 
        print(f' k={k}') 
        eval_classifier(y_val, y_pred) 
        print() 

#Examine original data 
print("Original data:") 
evaluate_knn(X_train, y_train, X_val, y_val) 

print()

#Examine scaled data 
print("Scaled data:") 
evaluate_knn(X_train_scaled, y_train, X_val_scaled, y_val) 

# Task 3. Regression (with Linear Regression)

With `insurance_benefits` as the target, evaluate what RMSE would be for a Linear Regression model.

Build your own implementation of LR. For that, recall how the linear regression task's solution is formulated in terms of LA. Check RMSE for both the original data and the scaled one. Can you see any difference in RMSE between these two cases?

Let's denote
- $X$ — feature matrix, each row is a case, each column is a feature, the first column consists of unities
- $y$ — target (a vector)
- $\hat{y}$ — estimated target (a vector)
- $w$ — weight vector

The task of linear regression in the language of matrices can be formulated as

$$
y = Xw
$$

The training objective then is to find such $w$ that it would minimize the L2-distance (MSE) between $Xw$ and $y$:

$$
\min_w d_2(Xw, y) \quad \text{or} \quad \min_w \text{MSE}(Xw, y)
$$

It appears that there is an analytical solution for the above:

$$
w = (X^T X)^{-1} X^T y
$$

The formula above can be used to find the weights $w$ and the latter can be used to calculate predicted values

$$
\hat{y} = X_{val}w
$$

Split the whole data in the 70:30 proportion for the training/validation parts. Use the RMSE metric for the model evaluation.
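
A minimal sketch of the closed-form solution on synthetic data, assuming made-up weights and noise-free targets so that the result is easy to check. It uses `np.linalg.solve` rather than an explicit inverse, which is a common and numerically safer way to evaluate the same formula, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features with a leading column of ones for the intercept
X_raw = rng.normal(size=(50, 3))
X_demo = np.hstack([np.ones((50, 1)), X_raw])

# Made-up "true" weights; the target is generated without noise
w_true = np.array([0.5, 2.0, -1.0, 3.0])
y_demo = X_demo @ w_true

# Normal equation: solve (X^T X) w = X^T y for w
w_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# Reference solution from NumPy's least-squares routine
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(w_normal)
```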

In [None]:
class MyLinearRegression:
    
    def __init__(self):
        
        self.weights = None
    
    def fit(self, X, y):
        
        # adding the unities
        X2 = np.append(np.ones([len(X), 1]), X, axis=1)
        self.weights = np.linalg.inv(X2.T @ X2) @ X2.T @ y

    def predict(self, X):
        
        # adding the unities
        X2 = np.append(np.ones([len(X),1]),X, axis=1)
        y_pred = X2 @ self.weights
        
        return y_pred

In [None]:
def eval_regressor(y_true, y_pred):
    
    rmse = math.sqrt(sklearn.metrics.mean_squared_error(y_true, y_pred))
    print(f'RMSE: {rmse:.2f}')
    
    # R2 is reported as-is: taking its square root distorts the value and fails when R2 is negative
    r2_score = sklearn.metrics.r2_score(y_true, y_pred)
    print(f'R2: {r2_score:.2f}')    

In [None]:
X = df[['age', 'gender', 'income', 'family_members']].to_numpy()
y = df['insurance_benefits'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12345)

lr = MyLinearRegression()

lr.fit(X_train, y_train)
print(lr.weights)

y_test_pred = lr.predict(X_test)
eval_regressor(y_test, y_test_pred)

In [None]:
#scaled data 
X = df_scaled[['age','gender','income','family_members']].to_numpy()
y = df_scaled['insurance_benefits'].to_numpy() 

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=12345)

lr=MyLinearRegression() 

lr.fit(X_train, y_train) 
print(f"Scaled Data:")
print(f"Weights:")
print(lr.weights)

print() 

print(f"Scaled Data:") 

y_test_pred = lr.predict(X_test) 

eval_regressor(y_test, y_test_pred) 



Description: 
The weights differ between the original and scaled data because scaling changes the magnitude of each feature, and the regression coefficients adjust to compensate. The RMSE did not change, indicating that the model performs equally well on scaled and unscaled data. The R2 value did not change either, so the proportion of variance explained is the same regardless of scaling.
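
One way to see why the RMSE cannot change, sketched under the assumption that no feature column is entirely zero: `MaxAbsScaler` divides each feature column by a positive constant, which is the same as multiplying the features by a diagonal matrix $D$. With the column of unities kept in front of $X$, the scaled matrix $X_s$ and its weights $w_s$ from the normal equation satisfy

$$
X_s = X S, \quad S = \begin{pmatrix} 1 & 0 \\ 0 & D \end{pmatrix}, \qquad w_s = (S^T X^T X S)^{-1} S^T X^T y = S^{-1} w, \qquad X_s w_s = X S S^{-1} w = X w
$$

so only the weights are rescaled, while the predicted values (and therefore RMSE and R2) stay the same.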

# Task 4. Obfuscating Data

It is best to obfuscate data by multiplying the numerical features (remember, they can be seen as the matrix $X$) by an invertible matrix $P$. 

$$
X' = X \times P
$$

Try to do that and check how the features' values look after the transformation. By the way, the invertible property is important here, so make sure that $P$ is indeed invertible.

You may want to review the 'Matrices and Matrix Operations -> Matrix Multiplication' lesson to recall the rule of matrix multiplication and its implementation with NumPy.

In [None]:
personal_info_column_list = ['gender', 'age', 'income', 'family_members']
df_pn = df[personal_info_column_list]

In [None]:
X = df_pn.to_numpy()

Generating a random matrix $P$.

In [None]:
rng = np.random.default_rng(seed=42)
P = rng.random(size=(X.shape[1], X.shape[1]))

Checking that the matrix $P$ is invertible

In [None]:
while np.linalg.det(P) == 0:
    P = rng.random(size=(X.shape[1],X.shape[1])) 
X_transformed = X @ P 

df_transformed = pd.DataFrame(X_transformed, columns=personal_info_column_list) 

print("Original Data:") 
print(df_pn.head()) 
print("Transformed Data:") 
print(df_transformed.head()) 
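
Comparing a floating-point determinant with exactly zero almost never triggers, so a nearly singular matrix would pass the check above. A hedged alternative is to look at the condition number; the helper name and the threshold of `1e8` below are arbitrary assumptions for illustration.

```python
import numpy as np

def is_well_conditioned(matrix, max_cond=1e8):
    """Return True if the matrix is square and its condition number is below max_cond."""
    if matrix.shape[0] != matrix.shape[1]:
        return False
    return bool(np.linalg.cond(matrix) < max_cond)

rng_demo = np.random.default_rng(seed=42)
P_demo = rng_demo.random(size=(4, 4))

# A singular matrix for contrast: the second row is twice the first
singular = np.array([[1.0, 2.0], [2.0, 4.0]])

print('Random 4x4 matrix usable:', is_well_conditioned(P_demo))
print('Singular matrix usable:  ', is_well_conditioned(singular))
```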

Can you guess the customers' ages or income after the transformation?

Not by inspection. Each transformed column is a weighted mix of all four original features, so the customers' ages or incomes can no longer be read off without knowing $P$.

Can you recover the original data from $X'$ if you know $P$? Try to check that with calculations by moving $P$ from the right side of the formula above to the left one. The rules of matrix multiplication are really helpful here.

In [None]:
P_inv = np.linalg.inv(P) 
X_recovered = X_transformed @ P_inv 

df_transformed = pd.DataFrame(X_transformed, columns=personal_info_column_list)
df_recovered =  pd.DataFrame(X_recovered,columns=personal_info_column_list)

Print all three cases for a few customers
- The original data
- The transformed one
- The reversed (recovered) one

In [None]:
print("Initial data:")
print(df_pn.head()) 

You can probably see that some values are not exactly the same as they are in the original data. What might be the reason for that?

In [None]:
print("Transformed data:")
print(df_transformed.head()) 

In [None]:
print("Recovered data:")
print(df_recovered.head()) 

In [None]:
#Showing initial data and recovered data are close
print("Are the initial data and the recovered data close?") 
print(np.allclose(X,X_recovered))
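
The small differences come from floating-point rounding: $P^{-1}$ is computed numerically and every matrix product accumulates rounding error, so $X P P^{-1}$ matches $X$ only up to a small tolerance. A minimal sketch on made-up rows (the values are hypothetical):

```python
import numpy as np

rng_fp = np.random.default_rng(seed=42)

# Hypothetical rows: [gender, age, income, family_members]
X_small = np.array([
    [1, 41, 49600.0, 1],
    [0, 46, 38000.0, 1],
    [0, 29, 21000.0, 0],
])

P_fp = rng_fp.random(size=(4, 4))
X_back = (X_small @ P_fp) @ np.linalg.inv(P_fp)

# Largest absolute round-trip error: tiny compared with the data, but usually not exactly 0
max_abs_error = np.abs(X_back - X_small).max()
print('Max absolute error:', max_abs_error)
print('Close within tolerance:', np.allclose(X_small, X_back))

# Rounding removes the noise for integer-valued columns such as these
X_rounded = np.round(X_back)
```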

## Proof That Data Obfuscation Can Work with LR

The regression task has been solved with linear regression in this project. Your next task is to prove _analytically_ that the given obfuscation method won't affect linear regression in terms of predicted values i.e. their values will remain the same. Can you believe that? Well, you don't have to, you should prove it!

So, the data is obfuscated and there is $X \times P$ instead of just $X$ now. Consequently, there are other weights $w_P$ as
$$
w = (X^T X)^{-1} X^T y \quad \Rightarrow \quad w_P = [(XP)^T XP]^{-1} (XP)^T y
$$

How would $w$ and $w_P$ be linked if you simplify the formula for $w_P$ above? 

What would be predicted values with $w_P$? 

What does that mean for the quality of linear regression if you measure it with RMSE?

Check Appendix B Properties of Matrices in the end of the notebook. There are useful formulas in there!

No code is necessary in this section, only analytical explanation!

**Answer**


How would $w$ and $w_P$ be linked if you simplify the formula for $w_P$?

The weights for the obfuscated data are:

$$
w_P = [(XP)^T XP]^{-1} (XP)^T y
$$

Applying $(AB)^T = B^T A^T$:

$$
w_P = [(P^T X^T)(XP)]^{-1} (P^T X^T) y
$$

Regrouping with the associative property:

$$
w_P = [P^T (X^T X) P]^{-1} P^T X^T y
$$

The inverse of a product of invertible square matrices reverses the order of the factors, $(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$. Here $P^T$, $X^T X$ and $P$ are all square, and $X^T X$ is assumed invertible (the formula for $w$ already requires it), so

$$
[P^T (X^T X) P]^{-1} = P^{-1} (X^T X)^{-1} (P^T)^{-1}
$$

Substituting back:

$$
w_P = P^{-1} (X^T X)^{-1} (P^T)^{-1} P^T X^T y
$$

Since $(P^T)^{-1} P^T = I$:

$$
w_P = P^{-1} (X^T X)^{-1} X^T y
$$

Because $w = (X^T X)^{-1} X^T y$, this gives

$$
w_P = P^{-1} w
$$

In other words, the weights for the obfuscated data ($w_P$) are the original weights ($w$) multiplied on the left by the inverse of the obfuscation matrix ($P^{-1}$).


What would be predicted values with $w_P$?

The original predicted values:

$$
\hat{y} = Xw
$$

The predicted values with obfuscated data:

$$
\hat{y}' = (XP) w_P
$$

Substituting $w_P = P^{-1} w$:

$$
\hat{y}' = (XP)(P^{-1} w) = X (P P^{-1}) w = X I w = Xw = \hat{y}
$$

This shows that the predicted values $\hat{y}'$ obtained from the obfuscated data $X'$ with the weights $w_P$ are the same as the predicted values $\hat{y}$ obtained from the original data $X$ with the weights $w$.

What does that mean for the quality of linear regression if you measure it with RMSE?

RMSE is the same in both cases, since the predicted values are the same.


**Analytical proof**

Prove that $\hat{y} = Xw$ and $\hat{y}' = X'w_P$ give the same predicted values, $\hat{y}' = \hat{y}$:

A. Linear regression weight calculation:

For the original data:

$$
w = (X^T X)^{-1} X^T y
$$

For the obfuscated data:

$$
w_P = [(X')^T X']^{-1} (X')^T y
$$

Substituting $X' = XP$:

$$
w_P = [(XP)^T (XP)]^{-1} (XP)^T y
$$

B. Simplify the expression for $w_P$:

$$
w_P = [P^T X^T X P]^{-1} P^T X^T y
$$

Using the inverse of a product, $(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$, with $A = P^T$, $B = X^T X$ and $C = P$ (all square and assumed invertible):

$$
[P^T (X^T X) P]^{-1} = P^{-1} (X^T X)^{-1} (P^T)^{-1}
$$

Then substitute back:

$$
w_P = P^{-1} (X^T X)^{-1} (P^T)^{-1} P^T X^T y
$$

C. Further simplify:

Using the identity $(P^T)^{-1} P^T = I$:

$$
w_P = P^{-1} (X^T X)^{-1} X^T y
$$

Since $w = (X^T X)^{-1} X^T y$:

$$
w_P = P^{-1} w
$$

D. Predicted values:

The original predicted values are $\hat{y} = Xw$, and the predicted values for the obfuscated data are $\hat{y}' = (XP) w_P$.

Substituting $w_P = P^{-1} w$ and using the associative property of matrix multiplication:

$$
\hat{y}' = (XP)(P^{-1} w) = X (P P^{-1}) w
$$

Since $P P^{-1} = I$:

$$
\hat{y}' = X I w = Xw = \hat{y}
$$

The predicted values are the same regardless of whether the obfuscated data or the original data is used. Therefore, the RMSE is not affected.
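
The identity $w_P = P^{-1} w$ can also be sanity-checked numerically. The sketch below uses synthetic data and, to match the derivation exactly, fits the weights without a separate intercept column; all names are illustrative.

```python
import numpy as np

rng_proof = np.random.default_rng(seed=0)

# Synthetic feature matrix and target (illustrative values only)
X_syn = rng_proof.normal(size=(100, 4))
y_syn = rng_proof.normal(size=100)

# Random obfuscation matrix; such a matrix is almost surely invertible
P_syn = rng_proof.random(size=(4, 4))

def normal_equation(X, y):
    """Least-squares weights w = (X^T X)^{-1} X^T y, computed via a linear solve."""
    return np.linalg.solve(X.T @ X, X.T @ y)

w_orig = normal_equation(X_syn, y_syn)
w_obf = normal_equation(X_syn @ P_syn, y_syn)

# w_P should equal P^{-1} w, and the predictions should coincide
weights_match = np.allclose(w_obf, np.linalg.solve(P_syn, w_orig))
preds_match = np.allclose(X_syn @ w_orig, (X_syn @ P_syn) @ w_obf)
print(weights_match, preds_match)
```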



## Test Linear Regression With Data Obfuscation

Now, let's prove Linear Regression can work computationally with the chosen obfuscation transformation.

Build a procedure or a class that runs Linear Regression optionally with the obfuscation. You can use either a ready implementation of Linear Regression from scikit-learn or your own.

Run Linear Regression for the original data and the obfuscated one, compare the predicted values and the RMSE, $R^2$ metric values. Is there any difference?

**Procedure**

- Create a square matrix $P$ of random numbers.
- Check that it is invertible. If not, repeat the first point until we get an invertible matrix.
- Use $XP$ as the new feature matrix

In [None]:
def generate_invertible_matrix(size, seed=42):
    rng = np.random.default_rng(seed) 

    while True:
        P = rng.random(size=(size,size)) 
        if np.linalg.det(P) !=0:
            return P 

In [None]:
#Show if invertible 
P = generate_invertible_matrix(size=4, seed=42) 
print("Random Invertible Matrix p:\n",P) 

In [None]:
import matplotlib.pyplot as plt 

# Distribution of elements in P 
plt.hist(P.ravel(), bins=30, edgecolor='k') 
plt.title('Distribution of Elements in P') 
plt.xlabel('Value') 
plt.ylabel('Frequency') 
plt.show() 

In [None]:
#LR on initial data 
X = df[['age','gender','income','family_members']].to_numpy()
y = df ['insurance_benefits'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=12345)
lr =MyLinearRegression() 
lr.fit(X_train, y_train) 
print("Initial data:") 
y_test_pred = lr.predict(X_test) 

eval_regressor(y_test,y_test_pred) 


In [None]:
#Obfuscate data 
personal_info_column_list = ['age','gender','income','family_members']
df_pn = df[personal_info_column_list]
X= df_pn.to_numpy() 
X_transformed = X @ P 
df_transformed = pd.DataFrame(X_transformed, columns = personal_info_column_list) 


In [None]:
# LR on Obfuscate data 
X_obfuscated = df_transformed[['age','gender','income','family_members']].to_numpy()
y_obfuscated = df['insurance_benefits'].to_numpy()

X_train_obfuscated, X_test_obfuscated, y_train_obfuscated, y_test_obfuscated = train_test_split(X_obfuscated, y_obfuscated, test_size=0.3, random_state=12345)
lr =MyLinearRegression() 
lr.fit(X_train_obfuscated, y_train_obfuscated)  
print("Obfuscated data:") 
y_test_pred = lr.predict(X_test_obfuscated) 

eval_regressor(y_test_obfuscated, y_test_pred)


The obfuscation process does not impact the performance of the Linear Regression model: the RMSE and $R^2$ values for the initial and the obfuscated data are the same. 
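
The cells above compare only the metrics. A minimal sketch of a single procedure that runs linear regression with optional obfuscation and compares the predictions directly could look like this; it uses scikit-learn's `LinearRegression` on synthetic data, and the helper name `run_lr` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def run_lr(X, y, P=None, random_state=12345):
    """Fit LinearRegression on X (or on X @ P if P is given); return predictions, RMSE, R2."""
    features = X if P is None else X @ P
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, y, test_size=0.3, random_state=random_state
    )
    model = LinearRegression().fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    rmse = float(np.sqrt(mean_squared_error(y_te, y_pred)))
    return y_pred, rmse, r2_score(y_te, y_pred)

# Synthetic stand-in for the insurance features and target
rng_lr = np.random.default_rng(seed=1)
X_fake = rng_lr.normal(size=(200, 4))
y_fake = X_fake @ np.array([1.0, -2.0, 0.5, 3.0]) + rng_lr.normal(scale=0.1, size=200)
P_fake = rng_lr.random(size=(4, 4))

pred_plain, rmse_plain, r2_plain = run_lr(X_fake, y_fake)
pred_obf, rmse_obf, r2_obf = run_lr(X_fake, y_fake, P=P_fake)

print('Predictions equal up to rounding:', np.allclose(pred_plain, pred_obf))
print(f'RMSE: {rmse_plain:.4f} vs {rmse_obf:.4f}')
```

Keeping the same `random_state` for both runs matters: it guarantees that the same rows end up in the test set, so the predictions can be compared element by element.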

# Conclusions

The performance of linear regression under obfuscation was examined. When the feature matrix is multiplied by a random invertible matrix, the quality of the predictions is not reduced. The predicted values from the linear regression model for the initial and the obfuscated data were very similar, so the obfuscation does not interfere with the predictive abilities of the model. The performance metrics did not change either: RMSE and $R^2$ were the same before and after obfuscation. In addition, since the matrix is invertible, the initial data can be recovered by whoever holds the matrix, if need be. The transformation adds a security measure without compromising the quality of the model. These results indicate that this data protection technique can be applied without hindering the linear regression model. 







# Appendices 

## Appendix A: Writing Formulas in Jupyter Notebooks

You can write formulas in your Jupyter Notebook in a markup language provided by a high-quality publishing system called $\LaTeX$ (pronounced "Lah-tech"), and they will look like formulas in textbooks.

To put a formula in a text, put the dollar sign (\\$) before and after the formula's text e.g. $\frac{1}{2} \times \frac{3}{2} = \frac{3}{4}$ or $y = x^2, x \ge 1$.

If a formula should be in its own paragraph, put the double dollar sign (\\$\\$) before and after the formula text e.g.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
$$

The markup language of [LaTeX](https://en.wikipedia.org/wiki/LaTeX) is very popular among people who use formulas in their articles, books and texts. It can be complex but its basics are easy. Check this two page [cheatsheet](http://tug.ctan.org/info/undergradmath/undergradmath.pdf) for learning how to compose the most common formulas.

## Appendix B: Properties of Matrices

Matrices have many properties in Linear Algebra. A few of them are listed here which can help with the analytical proof in this project.

<table>
<tr>
<td>Distributivity</td><td>$A(B+C)=AB+AC$</td>
</tr>
<tr>
<td>Non-commutativity</td><td>$AB \neq BA$</td>
</tr>
<tr>
<td>Associative property of multiplication</td><td>$(AB)C = A(BC)$</td>
</tr>
<tr>
<td>Multiplicative identity property</td><td>$IA = AI = A$</td>
</tr>
<tr>
<td>Inverse property</td><td>$A^{-1}A = AA^{-1} = I$</td>
</tr>
<tr>
<td>Inverse of a product</td><td>$(AB)^{-1} = B^{-1}A^{-1}$</td>
</tr>
<tr>
<td>Transpose of a product</td><td>$(AB)^T = B^TA^T$</td>
</tr>
</table>