# Feature Vectors
- A feature vector is an abstraction of an image: at the most basic level, it is simply a list of numbers used to represent the image.

#### Steps:
- The first step in building any image search engine is to define your image descriptor.
- Once we have defined our image descriptor, we can apply it to an image.
- The output of the image descriptor is our feature vector.
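
The three steps above can be sketched as a function that takes an image and returns a list of numbers. This is only an illustration: the function name `describe` and the tiny synthetic image are hypothetical, and the per-channel mean is just one possible choice of descriptor.

```python
import numpy as np

def describe(image):
    """Hypothetical descriptor: per-channel mean of an H x W x C image."""
    # Average over the two spatial axes, leaving one value per channel.
    return image.mean(axis=(0, 1))

# A synthetic 2 x 2 image with 3 channels stands in for a real photo.
image = np.array([[[0, 10, 20], [0, 10, 20]],
                  [[0, 30, 40], [0, 30, 40]]], dtype=np.uint8)

# Applying the descriptor to the image yields the feature vector.
feature_vector = describe(image)
print(feature_vector)
```

Swapping in a different body for `describe` changes the descriptor, while the overall flow (define, apply, collect the output) stays the same.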

In [3]:
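import cv2

# Load the image from disk. cv2.imread returns None if the file cannot be read.
image = cv2.imread("/Users/apple/Downloads/charizard.png")
# The shape is (height, width, channels).
image.shape
# Source: http://www.pyimagesearch.com/2014/03/03/charizard-explains-describe-quantify-image-using-feature-vectors/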

(198, 254, 3)

# Raw Pixel Feature Vectors
The most basic color feature vector you can use is the raw pixel intensities of the image.


- What image descriptor am I using? A raw pixel descriptor.
- What is the expected output of my descriptor? A list of numbers corresponding to the raw RGB pixel intensities of my image.

In [7]:
raw = image.flatten()
raw.shape

(150876,)

In [8]:
raw

array([255, 255, 255, ..., 255, 255, 255], dtype=uint8)

In [11]:
# Our flattened array has a shape of (150876,) because there are 198 x 254 = 50,292 pixels in the image,
# with 3 values per pixel: 50,292 x 3 = 150,876.
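
The arithmetic above can be checked without the original file. The sketch below uses a random synthetic array with the same shape as the image (an assumption made purely for illustration) and also shows where a given pixel value ends up in the flattened vector, relying on NumPy's default row-major order.

```python
import numpy as np

# Synthetic stand-in with the same shape as the image loaded above.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(198, 254, 3), dtype=np.uint8)

height, width, channels = image.shape
raw = image.flatten()

# One value per channel per pixel.
print(height * width, raw.shape)

# With row-major flattening, the value at (y, x, c) lands at this index.
y, x, c = 10, 20, 2
index = (y * width + x) * channels + c
print(raw[index] == image[y, x, c])
```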

# Color Mean
- A simple method to quantify the color of an image is to compute the mean of each color channel.


- What image descriptor am I using? A color mean descriptor.
- What is the expected output of my image descriptor? The mean value of each channel of the image.

In [12]:
means = cv2.mean(image)
means

(181.12238527002307, 199.18315040165433, 206.514296508391, 0.0)

In [14]:
# Using the cv2.mean function.

# cv2.mean returns a tuple of four values. The first three are our color features:
# the mean of the blue channel, then the green channel, then the red channel.
# The fourth value is 0.0 because this image has no fourth channel, so we can ignore it.
# Remember, OpenCV stores RGB images as NumPy arrays, but in reverse (BGR) order,
# hence the blue value comes first, then the green, and finally the red.
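
To make the channel ordering concrete, here is a NumPy-only sketch on a synthetic image (not the Charizard image). It assumes the array is laid out in OpenCV's BGR order and computes the per-channel means directly rather than through `cv2.mean`.

```python
import numpy as np

# Synthetic 4 x 4 image in BGR order: pure blue everywhere.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255  # channel 0 is blue in a BGR layout

# Per-channel mean: treat the image as a list of pixels and average them.
bgr_means = image.reshape(-1, 3).mean(axis=0)

# Reverse the order if another tool expects RGB instead.
rgb_means = bgr_means[::-1]
print(bgr_means, rgb_means)
```

The blue mean appears first in `bgr_means` and last in `rgb_means`, which is the ordering difference described above.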

In [15]:
means = means[:3]
means

(181.12238527002307, 199.18315040165433, 206.514296508391)

# Color Mean and Standard Deviation
- Compute both the mean and the standard deviation of each channel.

- What image descriptor am I using? A color mean and standard deviation descriptor.
- What is the expected output of my image descriptor? The mean and standard deviation of each channel of the image.

In [16]:
(means, stds) = cv2.meanStdDev(image)
means, stds

(array([[ 181.12238527],
        [ 199.1831504 ],
        [ 206.51429651]]), array([[ 80.67819854],
        [ 65.41130384],
        [ 77.77899992]]))

In [19]:
# To grab both the mean and standard deviation of each channel, we use cv2.meanStdDev.
# It returns a tuple of two arrays: one for the means and one for the standard deviations.
# Together, these numbers serve as our color features.
# Next, we combine the means and standard deviations into a single color feature vector:

In [22]:
import numpy as np
stats = np.concatenate([means, stds]).flatten()
stats

array([ 181.12238527,  199.1831504 ,  206.51429651,   80.67819854,
         65.41130384,   77.77899992])

In [23]:
# Our feature vector, stats, now has six entries rather than three.
# We are representing the mean of each channel as well as the standard deviation of each channel in the image.
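
The same six-entry vector can be sketched with NumPy alone. The random image below is a synthetic stand-in, and the sketch assumes the population standard deviation (`ddof=0`, NumPy's default) is the one wanted, which is what the OpenCV documentation describes for `cv2.meanStdDev`.

```python
import numpy as np

# Synthetic 8 x 8 BGR image standing in for a real photo.
rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

# View the image as a list of pixels, one row per pixel.
pixels = image.reshape(-1, 3).astype(np.float64)

means = pixels.mean(axis=0)
stds = pixels.std(axis=0)  # population standard deviation (ddof=0)

# Three means followed by three standard deviations.
stats = np.concatenate([means, stds])
print(stats.shape)
```

Because `means` and `stds` are already 1D here, no extra `flatten()` call is needed after concatenating.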

# 3D Color Histogram

- What image descriptor am I using? A 3D color histogram.
- What is the expected output of my image descriptor? A list of numbers used to characterize the color distribution of the image.

In [24]:
hist = cv2.calcHist([image], [0, 1, 2], None, [8, 8, 8], [0, 256, 0, 256, 0, 256])

In [25]:
hist.shape

(8, 8, 8)

In [28]:
# How can we use this as a feature vector if it is multi-dimensional?
# Answer: we flatten it into a 1D array of 8 x 8 x 8 = 512 values.

In [29]:
hist = hist.flatten()
hist.shape

(512,)
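
Flattening does not lose information: each of the 8 x 8 x 8 bins maps to exactly one position in the 512-entry vector. The sketch below uses a hand-filled array in place of a real histogram (the counts are made up for illustration) to show that mapping under NumPy's default row-major order.

```python
import numpy as np

bins = 8
# Stand-in for a histogram: all zeros except two hand-picked bins.
hist = np.zeros((bins, bins, bins), dtype=np.float32)
hist[1, 2, 3] = 2
hist[7, 7, 7] = 5

flat = hist.flatten()

# Bin (b, g, r) lands at index b * 64 + g * 8 + r.
index = int(np.ravel_multi_index((1, 2, 3), hist.shape))
print(index, flat[index], flat[-1])
```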

In [30]:
# By defining our image descriptor as a 3D color histogram we can extract a list of numbers (i.e. our feature vector)
# to represent the distribution of colors in the image.
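
As a rough cross-check of what the histogram counts, here is a NumPy-only sketch that bins a small synthetic image into 8 bins per channel. It uses `np.histogramdd` as a stand-in for `cv2.calcHist`, which is an assumption of this example and not what the notebook itself calls; the image and its colors are invented for illustration.

```python
import numpy as np

# Synthetic 4 x 4 BGR image: top half one color, bottom half white.
image = np.full((4, 4, 3), 255, dtype=np.uint8)
image[:2] = (0, 100, 200)

# One row per pixel; cast to float so the bin edges are compared safely.
pixels = image.reshape(-1, 3).astype(np.float64)

# 8 bins per channel over [0, 256), so each bin spans 32 intensity values.
hist, _ = np.histogramdd(pixels, bins=(8, 8, 8), range=((0, 256),) * 3)

feature_vector = hist.flatten()
print(hist.shape, feature_vector.shape)
```

Each pixel falls into the bin given by its channel values divided by 32, so the 8 colored pixels are counted in bin (0, 3, 6) and the 8 white pixels in bin (7, 7, 7).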