## Unsupervised Machine Learning
- Unsupervised learning subsumes all kinds of machine learning where there is no
known output, no teacher to instruct the learning algorithm. In unsupervised learning, 
the learning algorithm is just shown the input data, and asked to extract knowledge from this data.


### Types of Unsupervised Learning
- Dimensionality Reduction 
- Clustering

#### Dimensionality Reduction
- This involves creating a new, lower-dimensional representation of high-dimensional data that summarizes the essential characteristics of the original data. This can be done by selecting a subset of the original features or, as in PCA, by constructing a small number of new features from them. The most common dimensionality
reduction technique is **Principal Component Analysis (PCA)**.

### Principal Component Analysis (PCA)
- Principal component analysis (PCA) is a method that rotates the dataset in a way
such that the rotated features are statistically uncorrelated. This rotation is often followed 
by selecting only a subset of the new features, according to how important they
are for explaining the data.



- The algorithm proceeds by first finding the direction of maximum variance, the first principal component (“Component 1”). This is the direction in the data that contains most of the information, or
in other words, the direction along which the features are most correlated with each
other.
Then, the algorithm finds the direction that contains the most information while
being orthogonal (at a right angle) to the first direction.



- The second plot shows the same data, but now rotated so that the first principal component aligns with the x axis, and the second principal component aligns with the y
axis. Before the rotation, the mean was subtracted from the data, so that the transformed 
data is centered around zero. In the rotated representation found by PCA, the
two axes are uncorrelated, meaning that the correlation matrix of the data in this representation is zero except for the diagonal.
We can use PCA for dimensionality reduction by retaining only some of the principal
components. 
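The steps described above (center, find the directions of maximum variance, rotate, keep a subset) can be sketched with plain NumPy. This is a minimal illustration on synthetic two-feature data, not the notebook's dataset; the variable names are hypothetical, and it uses the singular value decomposition as one common way to find the principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with two strongly correlated features (illustrative only)
x = rng.normal(size=500)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=500)])

# 1. Center the data by subtracting the mean of each feature
centered = data - data.mean(axis=0)

# 2. SVD of the centered data: the rows of vt are the principal directions,
#    ordered by decreasing variance, and they are orthogonal to each other
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# 3. Rotate the data onto the principal directions
rotated = centered @ vt.T

# In the rotated representation the features are uncorrelated:
# the off-diagonal entries of the covariance matrix are numerically zero
cov_rotated = np.cov(rotated, rowvar=False)
print(np.round(cov_rotated, 6))

# 4. Dimensionality reduction: keep only the first principal component
reduced = rotated[:, :1]
print(reduced.shape)
```

The diagonal of `cov_rotated` holds the variance along each component, so the first entry is the largest.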

## Implementing PCA in sklearn

In [7]:
import pandas as pd 
import numpy as np 
from sklearn.preprocessing import  StandardScaler 
from sklearn.model_selection import train_test_split 


In [8]:
# Before applying PCA, we must scale the data so that each feature has unit variance

# load the data

df = pd.read_csv("mhs.csv")
df.head()

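| | Age | SystolicBP | DiastolicBP | BS | BodyTemp | HeartRate | RiskLevel |
|---|---|---|---|---|---|---|---|
| 0 | 25 | 130 | 80 | 15.0 | 98.0 | 86 | high risk |
| 1 | 35 | 140 | 90 | 13.0 | 98.0 | 70 | high risk |
| 2 | 29 | 90 | 70 | 8.0 | 100.0 | 80 | high risk |
| 3 | 30 | 140 | 85 | 7.0 | 98.0 | 70 | high risk |
| 4 | 35 | 120 | 60 | 6.1 | 98.0 | 76 | low risk |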


### Explore the data

In [9]:
df.shape

(1014, 7)

In [10]:
df.isnull().sum()

Age            0
SystolicBP     0
DiastolicBP    0
BS             0
BodyTemp       0
HeartRate      0
RiskLevel      0
dtype: int64

In [11]:
df.describe()

| | Age | SystolicBP | DiastolicBP | BS | BodyTemp | HeartRate |
|---|---|---|---|---|---|---|
| count | 1014.0 | 1014.0 | 1014.0 | 1014.0 | 1014.0 | 1014.0 |
| mean | 29.871795 | 113.198225 | 76.460552 | 8.725986 | 98.665089 | 74.301775 |
| std | 13.474386 | 18.403913 | 13.885796 | 3.293532 | 1.371384 | 8.088702 |
| min | 10.0 | 70.0 | 49.0 | 6.0 | 98.0 | 7.0 |
| 25% | 19.0 | 100.0 | 65.0 | 6.9 | 98.0 | 70.0 |
| 50% | 26.0 | 120.0 | 80.0 | 7.5 | 98.0 | 76.0 |
| 75% | 39.0 | 120.0 | 90.0 | 8.0 | 98.0 | 80.0 |
| max | 70.0 | 160.0 | 100.0 | 19.0 | 103.0 | 90.0 |


In [12]:
df.corr(numeric_only=True)  # RiskLevel is text, so restrict the correlation to the numeric columns

| | Age | SystolicBP | DiastolicBP | BS | BodyTemp | HeartRate |
|---|---|---|---|---|---|---|
| Age | 1.0 | 0.416045 | 0.398026 | 0.473284 | -0.255323 | 0.079798 |
| SystolicBP | 0.416045 | 1.0 | 0.787006 | 0.425172 | -0.286616 | -0.023108 |
| DiastolicBP | 0.398026 | 0.787006 | 1.0 | 0.423824 | -0.257538 | -0.046151 |
| BS | 0.473284 | 0.425172 | 0.423824 | 1.0 | -0.103493 | 0.142867 |
| BodyTemp | -0.255323 | -0.286616 | -0.257538 | -0.103493 | 1.0 | 0.098771 |
| HeartRate | 0.079798 | -0.023108 | -0.046151 | 0.142867 | 0.098771 | 1.0 |


In [13]:
# select features and Target
X = df.drop("SystolicBP", axis = 1)
y = df["SystolicBP"]

In [14]:
# Perform one-hot encoding because one feature (RiskLevel) is categorical
X = pd.get_dummies(X, drop_first=True)
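To see what the encoding step does, here is a small sketch on a hypothetical three-row frame (not the real data). With `drop_first=True`, pandas drops the first category in sorted order, so a column with three categories becomes two indicator columns.

```python
import pandas as pd

# Hypothetical toy frame with the same kind of categorical column as RiskLevel
toy = pd.DataFrame({
    "Age": [25, 35, 29],
    "RiskLevel": ["high risk", "low risk", "mid risk"],
})

# One indicator column per category, minus the first one ("high risk")
encoded = pd.get_dummies(toy, drop_first=True)
print(list(encoded.columns))
```

A row where both indicator columns are zero (or False) therefore belongs to the dropped category, "high risk".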

In [15]:
# Split data 
x_train, x_test, y_train, y_test = train_test_split (X, y, random_state = 0, test_size = 0.2 )

In [16]:
x_train.shape

(811, 7)

### Standardize the data

In [17]:
# scale the data
scaler = StandardScaler()
x_train_sc = scaler.fit_transform(x_train)
x_test_sc = scaler.transform(x_test)
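The effect of the scaler can be checked directly. The sketch below uses synthetic data with two features on different scales (the means and spreads are made up for illustration) and confirms that the scaled training data has zero mean and unit variance per feature, while the test data is transformed with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features on very different scales
train = np.column_stack([rng.normal(30, 13, 200), rng.normal(113, 18, 200)])
test = np.column_stack([rng.normal(30, 13, 50), rng.normal(113, 18, 50)])

scaler = StandardScaler()
train_sc = scaler.fit_transform(train)  # learn mean and std from the training data only
test_sc = scaler.transform(test)        # reuse the training mean and std

print(train_sc.mean(axis=0).round(6))  # approximately 0 for every feature
print(train_sc.std(axis=0).round(6))   # approximately 1 for every feature
```

The scaled test data will generally not have exactly zero mean and unit variance, because it is scaled with the training statistics. That is intended: the test set must not influence the preprocessing.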

### Implementing PCA on the scaled data

In [18]:
from sklearn.decomposition import PCA


pca = PCA(n_components=2, random_state=0)  # keep the first two principal components; if n_components is omitted, all components are kept and the data is only rotated, not reduced

# transform the data
x_train_pca = pca.fit_transform(x_train_sc)
x_test_pca = pca.transform(x_test_sc)


# check the shape of both the original data and the transformed one

print(f"Original: {x_train_sc.shape}")
print(f"PCA: {x_train_pca.shape}") 

Original: (811, 7)
PCA: (811, 2)


In [19]:
x_train_sc

array([[-0.52394719,  0.24829747, -0.51976254, ..., -0.5322086 ,
        -0.83041154, -0.68498387],
       [-0.52394719,  0.61151761, -0.55351395, ..., -1.01977841,
        -0.83041154,  1.4598884 ],
       [-0.96649658, -0.11492267,  1.01132388, ...,  0.19914612,
        -0.83041154, -0.68498387],
       ...,
       [-0.45018896,  0.24829747, -0.36941538, ..., -1.01977841,
         1.20422218, -0.68498387],
       [ 0.36115159, -0.47814281, -0.36941538, ..., -1.01977841,
         1.20422218, -0.68498387],
       [ 0.13987689,  1.70117803, -0.55351395, ...,  0.44293102,
        -0.83041154, -0.68498387]])

In [20]:
x_train_pca  # the scaled data projected onto the two principal components

array([[-0.08292973, -0.17846865],
       [-0.06243439,  1.03879285],
       [-0.01710349,  0.87986315],
       ...,
       [-0.69808459, -1.54611042],
       [-0.64858213, -1.54211686],
       [ 1.09263981, -0.266307  ]])

As we can see, the 7 features have been compressed to 2. The two new columns are not two of the original features: each one is a linear combination of all 7 scaled features, chosen so that the first captures as much of the variance as possible and the second captures as much as possible of what remains.

The question is: where are the principal components?

In [21]:
pca.components_  # the principal components themselves: one row per component, one column (weight) per scaled feature; x_train_sc is projected onto these directions to obtain x_train_pca

array([[ 0.51547591,  0.50757655,  0.55209063, -0.21219252,  0.11749082,
        -0.33366976, -0.04100378],
       [-0.11319525, -0.131922  ,  0.00481857,  0.38351501,  0.18369978,
        -0.62441247,  0.63169605]])
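The link between `components_` and the transformed data can be verified numerically. The sketch below uses a synthetic matrix as a stand-in for `x_train_sc` and shows that projecting the centered data onto the rows of `components_` reproduces the output of `fit_transform`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for x_train_sc: 100 samples, 5 correlated features
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

pca = PCA(n_components=2, random_state=0)
X_pca = pca.fit_transform(X)

# components_ has one row per component and one column per input feature
print(pca.components_.shape)

# Projecting the centered data onto the components reproduces the transform
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_pca))

# The components are orthonormal: unit length and at right angles to each other
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(2)))
```

Each entry of a component row is the weight of one input feature in that component, so a large absolute weight means the feature contributes strongly to it.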

- An advantage of PCA is that it summarizes high-dimensional data in fewer dimensions, which reduces computational cost, and the explained variance tells us how much information the summary retains.

- Imagine running computations on about 30 features, which is costly. If the explained variance shows that 10 components retain about 70% of the variance of the initial 30 features, so that the main information is conserved, we can train the model on those 10 components in less computation time.

#### Computing correlation between principal components and original data  

In [22]:
# Correlation between each original feature and each principal component.
# Correlation is unaffected by standardization, so x_train and x_train_sc give the same values.
# Note that StandardScaler gives each feature zero mean and unit variance; it does not map values to the range 0 to 1.
pd.DataFrame(
    data=[[np.corrcoef(x_train[c], x_train_pca[:, n])[1, 0]
           for n in range(pca.n_components_)] for c in x_train],
    index=x_train.columns,
    columns=["First_comp", "Second_comp"],
).T

| | Age | DiastolicBP | BS | BodyTemp | HeartRate | RiskLevel_low risk | RiskLevel_mid risk |
|---|---|---|---|---|---|---|---|
| First_comp | 0.751544 | 0.740027 | 0.804926 | -0.309368 | 0.171297 | -0.486477 | -0.059782 |
| Second_comp | -0.145723 | -0.169831 | 0.006203 | 0.49372 | 0.236487 | -0.803841 | 0.813217 |


Deduction

The DataFrame shows the correlation between each original feature, x_train[c], and each column of x_train_pca, the values transformed by PCA. The first component comes first because it captures the largest share of the variance in the data; the second component captures as much as possible of the variance that the first leaves unexplained.

Each original feature (x_train or, equivalently, x_train_sc) therefore has a correlation value with both the first and the second principal component, which indicates how strongly that feature contributes to the component.
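The nested list comprehension used for this table can also be written without explicit loops. The following is one possible sketch on synthetic data; the frame `features` and its column names are hypothetical stand-ins for `x_train`. It computes a single correlation matrix over the stacked feature and component columns and slices out the feature-versus-component block.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for x_train: 200 samples, 4 correlated features
features = pd.DataFrame(
    rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4)),
    columns=["f1", "f2", "f3", "f4"],
)

# Scale, then project onto two principal components
scores = PCA(n_components=2, random_state=0).fit_transform(
    StandardScaler().fit_transform(features)
)

# One correlation matrix over [features | components], columns as variables
n_features = features.shape[1]
stacked = np.column_stack([features.to_numpy(dtype=float), scores])
corr = np.corrcoef(stacked, rowvar=False)

# Rows: components, columns: original features
feature_pc_corr = pd.DataFrame(
    corr[n_features:, :n_features],
    index=["First_comp", "Second_comp"],
    columns=features.columns,
)
print(feature_pc_corr.round(3))
```

The result has the same layout as the table above: one row per component and one column per original feature.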


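IMPORTANCE OF THIS ANALYSIS

- It offers an alternative to the correlation approach of df.corr, where features are selected based on their correlation level; the same idea is applied here, but in a different way.


Based on the values shown above, Age, DiastolicBP and BS are strongly and positively correlated with the first component (about 0.74 to 0.80), while BodyTemp and RiskLevel_low risk are negatively correlated with it. The second component is dominated by the two RiskLevel columns (about -0.80 and 0.81). A positive correlation means the feature increases along with the component, and a negative correlation means it decreases. Note that these are correlations with the components, not with SystolicBP, which is our target: PCA never sees the target, so this table alone cannot show how a feature affects systolic pressure or any health risk. For that, the df.corr output above is the relevant one: it shows DiastolicBP (0.79), BS (0.43) and Age (0.42) as the features most correlated with SystolicBP.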

## Hint


- Get the scaled training values (x_train_sc).

- Compress the data with PCA into 2 components. The result, x_train_pca, is a NumPy array with one column per component, just as the scaled data is a NumPy array.

- Then get the principal components that produced x_train_pca from pca.components_.

- Recall that the usual correlation matrix, df.corr, can be used as a feature importance approach; the principal components offer a complementary view.

With df.corr, feature importance is judged from each feature's correlation with the target, and features that are highly correlated with each other can be dropped. As an alternative, take x_train_pca, the two columns summarized by the principal components, and correlate them with the individual features in the original data. The resulting correlation values give further insight into the importance of each feature, which can guide the selection of features for modelling.

ANSWER

The PCA correlations between features and components reinforce the correlation analysis done earlier with df.corr; the two can be used together for feature selection or elimination when building a model.

### Explained Variance Ratio

This refers to the percentage of the variation in the data explained by each principal component.

In [23]:
pca.explained_variance_ratio_

array([0.30366419, 0.23675491])

In [24]:
pca.explained_variance_ratio_.sum()

0.5404190928349023

This means that 2 components together explain about 54% of the variance (information) in the original data.

Taken separately, the first component explains about 30% and the second about 24%.

### Since two components explain only 54%, let's check how much variance other values of n_components explain

In [25]:
exp_var = []
for i in range(1, 5):  # there are 7 features, so at most 7 components; try 1 to 4 components here
    pca = PCA(n_components=i, random_state=0)
    pca.fit(x_train_sc)  # fitting is enough here; the transformed data is not needed
    exp = pca.explained_variance_ratio_.sum()
    exp_var.append(exp)

In [26]:
exp_var  # 1 component explains 30.4% of the variance; 2 explain 54.0%; 3 explain 69.7%; 4 explain 81.7%

[0.3036641851494061,
 0.5404190928349023,
 0.6966619311641322,
 0.8169585751478899]
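The same numbers can be obtained from a single fit, because the explained variance ratio of the first k components does not depend on how many components are kept. The sketch below uses synthetic data as a stand-in for `x_train_sc`; the 80% threshold is an arbitrary choice for illustration. It also shows scikit-learn's option of passing a fraction as `n_components` (with `svd_solver="full"`) so that PCA picks the number of components itself.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for x_train_sc: 300 samples, 7 standardized features
X = rng.normal(size=(300, 7)) @ rng.normal(size=(7, 7))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Fit once with all components and accumulate the explained variance ratios
pca_full = PCA(random_state=0).fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative.round(3))

# Smallest number of components whose cumulative ratio exceeds the threshold
threshold = 0.80  # arbitrary illustrative choice
n_needed = int(np.searchsorted(cumulative, threshold, side="right")) + 1
print(n_needed)

# Let PCA choose the number of components for the same threshold
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(X)
print(pca_auto.n_components_)
```

The i-th entry of `cumulative` corresponds to the value the loop above computes for `n_components = i + 1`.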

In [27]:
# Having found how many components summarize the data well, refit PCA with that number to produce
# x_train_pca and x_test_pca, which are then passed into the model; for example, if 5 components are enough, set n_components=5

# pca = PCA(n_components=2, random_state=0)  # replace 2 with the number of components chosen above

# transform the data
# x_train_pca = pca.fit_transform(x_train_sc)
# x_test_pca = pca.transform(x_test_sc)
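One way to carry the chosen number of components into a model is to chain scaling, PCA and the estimator in a scikit-learn pipeline. The sketch below is only an illustration under stated assumptions: the data is synthetic, 5 components is the example value mentioned above, and `LinearRegression` is an assumed model choice (the notebook does not say which model is used; a regressor is picked here because the target, SystolicBP, is numeric).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 7 features and a numeric target
X = rng.normal(size=(400, 7))
y = 3.0 * X[:, 0] + X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling and PCA are fitted on the training data only, then reused for the test data
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=0),  # example value, not a recommendation
    LinearRegression(),                   # assumed model, for illustration
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(predictions.shape)
print(model.score(X_test, y_test))  # R^2 on the held-out data
```

Using a pipeline keeps the fit-on-train, transform-on-test discipline of the earlier cells without having to manage the intermediate arrays by hand.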
