## Apply PCA and Clustering to Wholesale Customer Data

In this homework, we'll examine the [**_Wholesale Customers Dataset_**](https://archive.ics.uci.edu/ml/datasets/Wholesale+customers) from the UCI Machine Learning Repository. The dataset contains purchase records for clients of a wholesale distributor. It gives each client's total annual spending across the categories in the data dictionary below:

**Category** | **Description**
:-----:|:-----:
CHANNEL| 1 = Hotel/Restaurant/Cafe, 2 = Retailer (Nominal)
REGION| Geographic region of Portugal for each order (Nominal)
FRESH| Annual spending (m.u.) on fresh products (Continuous)
MILK| Annual spending (m.u.) on milk products (Continuous)
GROCERY| Annual spending (m.u.) on grocery products (Continuous)
FROZEN| Annual spending (m.u.) on frozen products (Continuous)
DETERGENTS\_PAPER| Annual spending (m.u.) on detergents and paper products (Continuous)
DELICATESSEN| Annual spending (m.u.) on delicatessen products (Continuous)

**_TASK:_** Read in `wholesale_customers_data.csv` from the `datasets` folder and store it in a dataframe. Store the `Channel` column in a separate variable, and then drop the `Channel` and `Region` columns from the dataframe. Scale the data and use PCA to engineer new features (principal components). Print out the explained variance for each principal component.

### Imports

In [16]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

## 1. Reading in Data

In [17]:
df = pd.read_csv('../Notebooks/Datasets/wholesale_customers_data.csv')
df.head()

Unnamed: 0,Channel,Region,Fresh,Milk,Grocery,Frozen,Detergents_Paper,Delicassen
0,2,3,12669,9656,7561,214,2674,1338
1,2,3,7057,9810,9568,1762,3293,1776
2,2,3,6353,8808,7684,2405,3516,7844
3,1,3,13265,1196,4221,6404,507,1788
4,2,3,22615,5410,7198,3915,1777,5185


## 2. Store Channel in a Separate Variable

In [18]:
target = df['Channel']

## 3. Dropping Columns

In [19]:
df_removed = df.drop(columns=['Channel', 'Region'])

## 4. Scaling the Data

In [20]:
# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df_removed)
X_scaled.T

array([[ 0.05293319, -0.39130197, -0.44702926, ...,  0.20032554,
        -0.13538389, -0.72930698],
       [ 0.52356777,  0.54445767,  0.40853771, ...,  1.31467078,
        -0.51753572, -0.5559243 ],
       [-0.04111489,  0.17031835, -0.0281571 , ...,  2.34838631,
        -0.60251388, -0.57322717],
       [-0.58936716, -0.27013618, -0.13753572, ..., -0.54337975,
        -0.41944059, -0.62009417],
       [-0.04356873,  0.08640684,  0.13323164, ...,  2.51121768,
        -0.56977032, -0.50488752],
       [-0.06633906,  0.08915105,  2.24329255, ...,  0.12145607,
         0.21304614, -0.52286938]])

## 5. Performing PCA

In [24]:
# Principal Component Analysis by way of scikit-learn
pca = PCA(n_components=2)
X_r = pca.fit_transform(X_scaled)
# Explained Variance for Each Component - how much information did we preserve?
print(pca.explained_variance_)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.cumsum())

[2.65099857 1.70646229]
[0.44082893 0.283764  ]
[0.44082893 0.72459292]
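
The cumulative ratios printed above show the first two components retaining about 72% of the variance. One way to decide whether two components are enough is to fit PCA with every component kept and find the smallest count that reaches a chosen cumulative threshold. The sketch below is illustrative only: `X_demo` is synthetic stand-in data rather than the wholesale dataset, and the 90% `threshold` is an arbitrary assumption, not a value from the assignment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the six spending columns (NOT the real data).
rng = np.random.default_rng(0)
X_demo = rng.lognormal(mean=8.0, sigma=1.0, size=(200, 6))

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# With no n_components argument, PCA keeps every component,
# so the explained-variance ratios sum to 1.
pca_full = PCA().fit(X_demo_scaled)
cumulative = pca_full.explained_variance_ratio_.cumsum()

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.90  # hypothetical cut-off chosen for illustration
n_keep = int(np.searchsorted(cumulative, threshold) + 1)

print(cumulative)
print(n_keep)
```

Running the same steps on `X_scaled` from the notebook would show how many components the real data needs for a given threshold.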


## K-Means, but Without All the Supervision
 

**_Challenge:_** Use K-Means clustering on the `wholesale_customers` dataset, and then again on a version of this dataset transformed by PCA.  

1. Read in the data from the `wholesale_customers_data.csv` file contained within the datasets folder.  

2. Store the `Channel` column in a separate variable, and then drop the `Region` and `Channel` columns from the dataframe.  `Channel` will act as our labels to tell us what class of customer each datapoint actually is, in case we want to check the accuracy of our clustering.  

3.  Scale the data, fit a k-means object to it, and then visualize the data and the clustering.  

4.  Use PCA to transform the data, and then use k-means clustering on it to see if our results are any better.  
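
Steps 3 and 4 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: `X_demo` and `channel_demo` are synthetic stand-ins for the dataframe and the `Channel` column, `n_clusters=2` is chosen because `Channel` has two values, and `n_init` and `random_state` are arbitrary settings for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the wholesale dataset): two customer groups,
# six spending-like columns, with group 2 spending more in two of them.
rng = np.random.default_rng(0)
n_rows = 200
channel_demo = rng.integers(1, 3, size=n_rows)  # values 1 or 2
X_demo = rng.lognormal(mean=8.0, sigma=0.5, size=(n_rows, 6))
X_demo[channel_demo == 2, 1:3] *= 4.0

# Step 3: scale, then cluster the scaled features directly.
X_demo_scaled = StandardScaler().fit_transform(X_demo)
km_scaled = KMeans(n_clusters=2, n_init=10, random_state=0)
labels_scaled = km_scaled.fit_predict(X_demo_scaled)

# Step 4: project onto two principal components, then cluster again.
X_demo_pca = PCA(n_components=2).fit_transform(X_demo_scaled)
km_pca = KMeans(n_clusters=2, n_init=10, random_state=0)
labels_pca = km_pca.fit_predict(X_demo_pca)

# Cluster sizes for each approach.
print(np.bincount(labels_scaled))
print(np.bincount(labels_pca))
```

For the visualization, one option is a matplotlib scatter plot of the two PCA columns colored by cluster label. Because the PCA output is two-dimensional, it plots directly, whereas the six scaled features would need a pair of columns to be picked first.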

**_Challenge:_** Use the confusion matrix function to create a confusion matrix and see how accurate our clustering algorithms were. Which did better: the scaled data or the data transformed by PCA?
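
One wrinkle when scoring clusters against `Channel`: K-Means numbers its clusters arbitrarily (0 and 1 here), while `Channel` uses the codes 1 and 2, so the cluster ids have to be mapped onto channel codes before a confusion matrix is meaningful. The sketch below tries both possible mappings and keeps the more accurate one. The helper `align_and_score` is a hypothetical name introduced for illustration, and the small arrays are made-up labels, not results from the dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def align_and_score(true_channel, cluster_labels):
    """Map cluster ids {0, 1} onto channel codes {1, 2} in whichever of the
    two possible ways gives higher accuracy; return (accuracy, matrix)."""
    true_channel = np.asarray(true_channel)
    cluster_labels = np.asarray(cluster_labels)
    best = None
    for mapping in ({0: 1, 1: 2}, {0: 2, 1: 1}):
        predicted = np.array([mapping[c] for c in cluster_labels])
        accuracy = float((predicted == true_channel).mean())
        if best is None or accuracy > best[0]:
            # Rows are true channels 1 and 2; columns are predicted channels 1 and 2.
            matrix = confusion_matrix(true_channel, predicted, labels=[1, 2])
            best = (accuracy, matrix)
    return best


# Made-up labels for illustration only.
true_demo = np.array([1, 1, 1, 2, 2, 2, 2, 1])
clusters_demo = np.array([1, 1, 0, 0, 0, 0, 1, 1])

accuracy, matrix = align_and_score(true_demo, clusters_demo)
print(accuracy)  # 0.75
print(matrix)    # [[3 1]
                 #  [1 3]]
```

In the notebook, the same helper could be called once with `target` and the labels from K-Means fitted on the scaled data (for example its `labels_` attribute), and once with the labels from K-Means fitted on the PCA output, so the two accuracies can be compared directly.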

