<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

<br><h2>Script 01 | Fundamentals of Computer Vision</h2>
<br>
Written by Chase Kusterer<br>
<a href="https://github.com/chase-kusterer">GitHub</a> | <a href="https://www.linkedin.com/in/kusterer/">LinkedIn</a>
<br><br><br>

<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />

<h3>Part I: Preparation</h3><br>
Run the following code to import necessary packages, load data, and set display options. 

In [None]:
########################################
# importing packages
########################################
import numpy             as np  # mathematical essentials
import pandas            as pd  # data science essentials
import matplotlib.pyplot as plt # fundamental data visualization
import seaborn           as sns # enhanced visualization
import sys                      # system-specific parameters and functions


# new libraries
from sklearn.decomposition import PCA            # pca
from sklearn.datasets      import load_digits    # digits dataset


########################################
# loading data and setting display options
########################################
# loading data
digits = load_digits()


# setting print options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 100)
np.set_printoptions(threshold=sys.maxsize)


########################################
# checking the type of the dataset
########################################
type(digits)

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>User-Defined Functions</strong><br>
Run the following code to load the user-defined functions used throughout this Notebook.

In [None]:
########################################
# pca_plotter
########################################

# pca_plotter
def pca_plotter(bunch, colors = None):
    """
    Plots the first two principal components of a Bunch object's data,
    drawing each observation as its target label.

    PARAMETERS
    ----------
    bunch        : Bunch object to be used in PCA
    colors       : color coding for target labels, default None
                   (when None, every label is drawn in black)
    """

    # INSTANTIATING a PCA object
    pca = PCA(n_components = 2,
              random_state = 802)


    # FITTING and TRANSFORMING the data
    dataset_pca = pca.fit_transform(bunch.data)

    
    # setting figure options
    plt.figure(figsize=(10, 10))
    plt.xlim(dataset_pca[:, 0].min(), dataset_pca[:, 0].max())
    plt.ylim(dataset_pca[:, 1].min(), dataset_pca[:, 1].max())


    # data visualization
    for i in range(len(bunch.data)):

        # falling back to black when no color coding is provided
        label_color = 'black' if colors is None else colors[bunch.target[i]]

        plt.text(dataset_pca[i, 0],
                 dataset_pca[i, 1],
                 str(bunch.target[i]),
                 color = label_color,
                 fontdict={'weight': 'bold', 'size': 9})

    plt.xlabel("First principal component")
    plt.ylabel("Second principal component")
    plt.show()

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

Unlike the previous datasets analyzed in this course, the digits data is a <a href="https://pypi.org/project/bunch/">Bunch object</a> (scikit-learn ships its own version, <strong>sklearn.utils.Bunch</strong>). A Bunch is very similar to a dictionary. In fact, it is a subclass of a dictionary whose keys can also be accessed as attributes, and its key-value layout maps naturally onto formats such as JSON (JavaScript Object Notation) and YAML, making it an incredibly useful structure. <strong>It is very likely that you will encounter this structure as a business analyst</strong>. It is commonly used for data that is collected digitally.<br><br>
To get started, let's take a look at the keys of this object. Keys can be thought of much like table names in a relational database (each key uniquely identifies one block of information). Unlike relational databases, Bunch objects generally do not define relationships between these blocks.

In [None]:
digits.keys()
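
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
The dictionary behavior described above can be checked directly. The following is a minimal sketch, assuming scikit-learn's Bunch implementation; the variable names are only for illustration.

```python
from sklearn.datasets import load_digits

# loading data
digits = load_digits()


# a Bunch is a subclass of the built-in dictionary
print(isinstance(digits, dict))


# the same array can be reached by key, with .get(), or as an attribute
data_by_key       = digits['data']
data_by_get       = digits.get('data')
data_by_attribute = digits.data


# checking that all three names point to the same object
print(data_by_key is data_by_get is data_by_attribute)
```

The rest of this Notebook uses the .get( ) style, but all three forms are interchangeable for keys that exist.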

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
As with anything new in data science, it is important to <strong>read the documentation</strong>. For the digits dataset, this information can be found under the key <strong>'DESCR'</strong>. Note that this is different from accessing a help( ) file, which provides documentation on Python objects as opposed to data.<br><br><br>
<strong>Run the following code to access the description of the digits dataset.</strong><br>

In [None]:
print(digits.get('DESCR'))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<strong>Run the following code to access information stored under the 'data' key.</strong>

In [None]:
# printing data
print(digits.get('data'))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>Run the following code to access the information stored under the 'target' key.</strong>

In [None]:
# printing target
print(digits.get('target'))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>Run the following code to access the information stored under the 'target_names' key.</strong><br>


In [None]:
# printing target_names
print(digits.get('target_names'))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>Run the following code to access the information stored under the 'images' key.</strong><br>

In [None]:
# printing images
print(digits.get('images'))

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<h3>Part II: Understanding Image Data</h3><br>
Note that each of the original hand-written digits was written on white paper with black ink, creating <a href="https://whatis.techtarget.com/definition/grayscale">grayscale images</a>. As can be observed from above, each image is stored as a matrix of numbers, and each number represents the amount of shading in a given cell. A value of 0 implies that a cell was completely white and a value of 16 implies that a cell was completely black.<br><br>
Notice also that since each image is its own matrix, we have three dimensions of indexing, which can be interpreted as follows:<br><br>

~~~
[IMAGE NUMBER, ROWS, COLUMNS]
~~~

<br>
<strong>Run the following code to output the shape of 'images'.</strong>

In [None]:
digits.get('images').shape
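
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
It may help to see how 'images' relates to 'data'. The sketch below checks that each row of 'data' is one 8x8 image flattened into 64 pixel values; the name <strong>flattened</strong> is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits

# loading data
digits = load_digits()


# comparing shapes
print(digits.images.shape) # (image number, rows, columns)
print(digits.data.shape)   # (image number, rows * columns)


# flattening each 8x8 image into a single row of 64 values
flattened = digits.images.reshape(len(digits.images), -1)


# checking whether 'data' matches the flattened version of 'images'
print(np.array_equal(flattened, digits.data))
```

This is why PCA later in this script is run on 'data' rather than on 'images': scikit-learn expects one row per observation.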

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>Run the following code to store the first image as an object and view its numerical matrix.</strong>

In [None]:
digit_images = digits.get('images')
first_digit = digit_images[0, :, :]

print(first_digit)
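
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
The three-dimensional indexing described above can be explored a little further. This is a small sketch; the slices chosen (and names such as <strong>top_row</strong>) are arbitrary examples rather than part of the original analysis.

```python
from sklearn.datasets import load_digits

# loading the images array
digit_images = load_digits().images


# [IMAGE NUMBER, ROWS, COLUMNS]
first_digit = digit_images[0]       # shorthand for digit_images[0, :, :]
top_row     = digit_images[0, 0, :] # first row of the first image
one_pixel   = digit_images[0, 3, 4] # a single cell (row 3, column 4)
first_three = digit_images[:3]      # first three images, all rows and columns


# checking the shape of each slice
print(first_digit.shape)
print(top_row.shape)
print(first_three.shape)


# checking the range of shading values across every image
print(digit_images.min(), digit_images.max())
```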

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<br>
As with many aspects of data analysis, an image is often easier to interpret when it is visualized than when it is read as a matrix of numbers.
<br><br>
<strong>Run the following code to generate a heatmap for the first digit in the dataset.</strong>

In [None]:
# setting display size
fig, ax = plt.subplots(figsize = (10,10))


# creating a heatmap
sns.heatmap(data = first_digit,
            cmap = 'inferno')


# displaying the plot
plt.show()

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
<strong>Run the following code to view images for each digit.</strong>

In [None]:
# printing digit images
fig, ax = plt.subplots(2, 5,
                       figsize=(12, 5),
                       subplot_kw={'xticks':(),
                                   'yticks': ()})


# plotting the first ten images (one per subplot)
for axes, img in zip(ax.ravel(), digits.images):
    axes.imshow(img)


# displaying the plot
plt.show()

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h3>Part III: Principal Component Analysis (PCA)</h3><br>
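Note that features should be on a common scale before running PCA, since PCA is driven by variance. No additional scaling is needed for the digits data because every pixel is already measured on the same 0 to 16 scale. Other things to keep in mind: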

* There is no need for train/test split because unsupervised algorithms do not require a response variable.
* The process for most unsupervised algorithms is: (1) instantiate, (2) fit, (3) transform.
* There is no score step because there is no response variable.


In [None]:
# INSTANTIATING a PCA model
pca = PCA(n_components = None, # we will discuss this detail later
          random_state = 702 )


# FITTING and TRANSFORMING the digits data
digits_pca = pca.fit_transform(digits.data)
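
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
The instantiate, fit, and transform steps can also be run one at a time. The following sketch uses the same settings as the cell above and compares the result with the fit_transform( ) shortcut; the object names <strong>pca_steps</strong> and <strong>digits_steps</strong> are hypothetical.

```python
import numpy as np
from sklearn.datasets      import load_digits
from sklearn.decomposition import PCA

# loading data
digits = load_digits()


# (1) INSTANTIATING a PCA object
pca_steps = PCA(n_components = None,
                random_state = 702)


# (2) FITTING (learning the principal components from the data)
pca_steps.fit(digits.data)


# (3) TRANSFORMING (projecting the data onto those components)
digits_steps = pca_steps.transform(digits.data)


# comparing with the one-line shortcut
digits_shortcut = PCA(n_components = None,
                      random_state = 702).fit_transform(digits.data)

print(digits_steps.shape)
print(np.allclose(digits_steps, digits_shortcut, atol = 1e-6))
```

Keeping the steps separate is mainly useful when the fitted object needs to transform additional data later.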

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
It is important to understand how much variance has been accounted for by each principal component. We can observe this by analyzing the <strong>explained_variance_ratio_</strong> attribute.
<br><br>
<strong>Run the following code and analyze the explained variance ratio.</strong>

In [None]:
# benchmark: share of variance per component if all 64 explained an equal amount
1/64

In [None]:
# extracting explained variance ratio per principal component
# (enumerate keeps track of the principal component number, starting at 1)
for p_component, variance in enumerate(pca.explained_variance_ratio_, start = 1):
    
    # printing each principal component's explained variance ratio
    print(f"PC {p_component}: {variance.round(decimals = 2)}")
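
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
One way to analyze the explained variance ratio is to compare it against the 1/64 benchmark and to track its running total. The sketch below does both; the 90% threshold is an arbitrary assumption chosen for illustration, not a rule.

```python
import numpy as np
from sklearn.datasets      import load_digits
from sklearn.decomposition import PCA

# loading data and fitting PCA with all components
digits = load_digits()
pca    = PCA(n_components = None,
             random_state = 702).fit(digits.data)


# benchmark: equal share of variance per component
equal_share = 1 / digits.data.shape[1]


# counting components that explain more than an equal share
n_above = int((pca.explained_variance_ratio_ > equal_share).sum())


# running total of explained variance
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)


# smallest number of components that reaches the threshold (assumed: 90%)
threshold = 0.90
n_needed  = int(np.argmax(cumulative_variance >= threshold)) + 1


print(f"Components above the 1/64 benchmark: {n_above}")
print(f"Components needed for {threshold:.0%} of variance: {n_needed}")
```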

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

The following is a visualization of the first two principal components. Our goal is to interpret what each principal component is measuring.

In [None]:
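# defining one color per digit (assumed palette; colors_lst is not defined above)
colors_lst = [plt.cm.tab10(i) for i in range(10)]

# calling pca_plotter
pca_plotter(bunch  = digits,
            colors = colors_lst)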
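
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
To help interpret what each principal component is measuring, one option is to look at its pixel weights. The sketch below reshapes the first two components back into 8x8 grids, assuming the same settings used inside pca_plotter; the object names are hypothetical.

```python
import numpy as np
from sklearn.datasets      import load_digits
from sklearn.decomposition import PCA

# loading data and fitting a two-component PCA
digits = load_digits()
pca_2d = PCA(n_components = 2,
             random_state = 802).fit(digits.data)


# each component stores one weight per pixel (64 in total),
# so it can be reshaped into the same 8x8 layout as an image
first_component  = pca_2d.components_[0].reshape(8, 8)
second_component = pca_2d.components_[1].reshape(8, 8)


# large positive or negative weights mark the pixels that drive each component
print(first_component.round(decimals = 2))
print(second_component.round(decimals = 2))
```

These grids can be passed to sns.heatmap( ) in the same way as first_digit earlier in this script.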

<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

~~~
  _____   ___    ___  ____  ____    ____                                
 / ___/  /  _]  /  _]|    ||    \  /    |                               
(   \_  /  [_  /  [_  |  | |  _  ||   __|                               
 \__  ||    _]|    _] |  | |  |  ||  |  |                               
 /  \ ||   [_ |   [_  |  | |  |  ||  |_ |                               
 \    ||     ||     | |  | |  |  ||     |                               
  \___||_____||_____||____||__|__||___,_|                               
                                                                        
 ____   ____  ______  ______    ___  ____   ____   _____                
|    \ /    ||      ||      |  /  _]|    \ |    \ / ___/                
|  o  )  o  ||      ||      | /  [_ |  D  )|  _  (   \_                 
|   _/|     ||_|  |_||_|  |_||    _]|    / |  |  |\__  |                
|  |  |  _  |  |  |    |  |  |   [_ |    \ |  |  |/  \ |                
|  |  |  |  |  |  |    |  |  |     ||  .  \|  |  |\    |                
|__|  |__|__|  |__|    |__|  |_____||__|\_||__|__| \___|                
                                                                        
 ____  ____                                                             
|    ||    \                                                            
 |  | |  _  |                                                           
 |  | |  |  |                                                           
 |  | |  |  |                                                           
 |  | |  |  |                                                           
|____||__|__|                                                           
                                                                        
    __   ___   ___ ___  ____  _        ___  __ __                       
   /  ] /   \ |   |   ||    \| |      /  _]|  |  |                      
  /  / |     || _   _ ||  o  ) |     /  [_ |  |  |                      
 /  /  |  O  ||  \_/  ||   _/| |___ |    _]|_   _|                      
/   \_ |     ||   |   ||  |  |     ||   [_ |     |                      
\     ||     ||   |   ||  |  |     ||     ||  |  |                      
 \____| \___/ |___|___||__|  |_____||_____||__|__|                      
                                                                        
  _____ ____  ______  __ __   ____  ______  ____  ___   ____   _____ __ 
 / ___/|    ||      ||  |  | /    ||      ||    |/   \ |    \ / ___/|  |
(   \_  |  | |      ||  |  ||  o  ||      | |  ||     ||  _  (   \_ |  |
 \__  | |  | |_|  |_||  |  ||     ||_|  |_| |  ||  O  ||  |  |\__  ||__|
 /  \ | |  |   |  |  |  :  ||  _  |  |  |   |  ||     ||  |  |/  \ | __ 
 \    | |  |   |  |  |     ||  |  |  |  |   |  ||     ||  |  |\    ||  |
  \___||____|  |__|   \__,_||__|__|  |__|  |____|\___/ |__|__| \___||__|
                                                                        
~~~


<br>
<hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<br>