In this assignment, you will use a pre-trained convnet to produce features for a classifier that can detect a single object type. This notebook has some code to help you get started. 

In [87]:
import pandas as pd
import os
from os import listdir
from os.path import isfile, join
import os.path as osp
from tqdm.notebook import tqdm

In [88]:
from sklearn.model_selection import train_test_split

### Gather positive examples

Pick a word. For example, "red" or "santa" or "horse". 

Now you will need to find "positive" image examples of that word. For example, if you chose "red" as your word, you will need to find images of red things. You are free to use Google Image search or something similar. File types shouldn't matter, but try to stick with .png and .jpg files.

You'll need at least 100 positive example images. Put them in the folder called `pos`. 

### Gather negative examples

Now you need to think about negative examples; i.e., things that are *not* examples of your word. You can either just find random images, or look for specific negative examples. For example, if you chose the word "red" then it might work best if you find negative examples that are other colors, especially colors close to red. 

You'll need at least 200 negative example images. Put them in the folder called `neg`. 
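Before extracting features, it can help to confirm that both folders hold enough images. The following is a minimal sketch, not part of the assignment code: the helper names `count_images` and `check_dataset` are hypothetical, and it only checks file extensions, not whether each file actually opens as an image.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def count_images(folder):
    """Count regular files in `folder` whose extension looks like an image."""
    if not os.path.isdir(folder):
        return 0
    return sum(
        1
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.isfile(os.path.join(folder, name))
    )

def check_dataset(pos_dir="pos", neg_dir="neg", min_pos=100, min_neg=200):
    """Return the image counts and whether each folder meets its minimum."""
    n_pos, n_neg = count_images(pos_dir), count_images(neg_dir)
    return {
        "pos": n_pos,
        "neg": n_neg,
        "pos_ok": n_pos >= min_pos,
        "neg_ok": n_neg >= min_neg,
    }

print(check_dataset())
```

Hidden files such as `.DS_Store` are ignored because they do not end in an image extension.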

## 1.) Run the following cell

* This imports needed Keras libraries
* Then, it gets the trained VGG19 imagenet model
* Then, it prints out the names of all the layers in that model

In [89]:
import numpy as np
from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Model

base_model = VGG19(weights='imagenet', include_top=True)
xs, ys = 224, 224

for layer in base_model.layers:
    print(layer.name)

input_4
block1_conv1
block1_conv2
block1_pool
block2_conv1
block2_conv2
block2_pool
block3_conv1
block3_conv2
block3_conv3
block3_conv4
block3_pool
block4_conv1
block4_conv2
block4_conv3
block4_conv4
block4_pool
block5_conv1
block5_conv2
block5_conv3
block5_conv4
block5_pool
flatten
fc1
fc2
predictions


In [77]:
base_model

<keras.engine.functional.Functional at 0x7fcf22ea1430>

### 2.) Determine your output layer

- Try `predictions` first.
- Note the layers printed above; you can use any of those layers.
- Pay attention to the output shape of each layer! For example, `predictions` is a vector of size 1000.
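To make the point about output shapes concrete, the sketch below lists the output shapes of a few VGG19 layers for a 224x224 RGB input and computes how many features each yields once flattened. The shapes are hard-coded here (they are what Keras reports for this architecture) so the snippet runs without loading the model; the dictionary and function names are illustrative only.

```python
import math

# Output shapes (without the batch dimension) of some VGG19 layers
# for a 224x224 RGB input.
VGG19_OUTPUT_SHAPES = {
    "block3_pool": (28, 28, 256),
    "block4_pool": (14, 14, 512),
    "block5_pool": (7, 7, 512),
    "flatten": (25088,),
    "fc1": (4096,),
    "fc2": (4096,),
    "predictions": (1000,),
}

def flattened_size(layer_name):
    """Number of features per image after flattening the layer's output."""
    return math.prod(VGG19_OUTPUT_SHAPES[layer_name])

for name in VGG19_OUTPUT_SHAPES:
    print(f"{name:12s} -> {flattened_size(name):>7d} features per image")
```

Earlier layers give far more features per image than there are training examples, which is worth keeping in mind when choosing a classifier.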

In [90]:
layer = 'predictions'

model = Model(inputs=base_model.input, outputs=base_model.get_layer(layer).output)

### Run the following cell

- These functions are to help you perform transfer learning

In [91]:
def get_image(img_path, xs, ys):
    """Load an image, resize it to (xs, ys), and add a batch dimension."""
    x = image.load_img(img_path, target_size=(xs, ys))
    x = image.img_to_array(x)
    x = np.expand_dims(x, axis=0)  # shape: (1, xs, ys, 3)
    return x

def get_img_features(model, img):
    """Apply VGG19 preprocessing and return the model's output for one image batch."""
    img = preprocess_input(img)
    yhat = model.predict(img)
    return yhat

def get_image_features(word):
    """Return an array with one flattened feature vector per file in folder `word`."""
    files = [f for f in listdir(word) if isfile(join(word, f))]  # skip subdirectories
    image_vectors = []
    for f in tqdm(files):
        img = get_image(osp.join(word, f), xs, ys)
        x_feats = get_img_features(model, img).flatten()  # features for this image
        image_vectors.append(x_feats)
    return np.array(image_vectors)

## 3.) Evaluate a classifier for your `word`

* Using the positive and negative outputs from `model`, train a classifier. It can be any scikit-learn classifier (a linear one is fine), but I would recommend the Keras Dense network we built for the previous assignment.
* Split your data into train and test sets. I would recommend using half of the data for training and half for testing; you may opt to download more positive and negative examples.
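The steps above can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the required solution: the random arrays stand in for real feature vectors (100 positives, 200 negatives, 1000 dimensions), the `_demo` names are hypothetical, and `stratify` is an optional choice that keeps the class ratio equal in both halves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real feature matrices (assumption).
pos_feats = rng.normal(loc=0.5, scale=1.0, size=(100, 1000))
neg_feats = rng.normal(loc=0.0, scale=1.0, size=(200, 1000))

# Stack features and build labels: 1 = positive, 0 = negative.
X_demo = np.concatenate([pos_feats, neg_feats])
y_demo = np.concatenate([np.ones(len(pos_feats)), np.zeros(len(neg_feats))])

# Half for training, half for testing; stratify keeps the 1:2 class ratio.
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.5, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(Xtr, ytr)
print("train size:", Xtr.shape, "test size:", Xte.shape)
print("test accuracy:", clf.score(Xte, yte))
```

With real features, `pos_feats` and `neg_feats` would be replaced by the arrays returned from `get_image_features`.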

In [8]:
pos_images = get_image_features('pos') # get positive image vectors




  0%|          | 0/100 [00:00<?, ?it/s]

In [92]:
neg_images = get_image_features('neg')



  0%|          | 0/198 [00:00<?, ?it/s]

### Prepare the data. Split to train/test sets

In [93]:
import sklearn 

In [94]:
X = np.concatenate((pos_images, neg_images))

y_pos = np.ones(pos_images.shape[0])
y_neg = np.zeros(neg_images.shape[0])
y = np.concatenate((y_pos, y_neg))

data = pd.DataFrame(X)
data['y'] = y

data.head()


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,991,992,993,994,995,996,997,998,999,y
0,3.507573e-07,2.879765e-06,1.580464e-07,2.802052e-07,2.93575e-07,4.441483e-05,2.204816e-07,2.279076e-05,3.83225e-05,6.307426e-07,...,5.019204e-05,0.0002662022,0.0005218061,0.001818533,5.209767e-05,0.000341,0.0002576908,0.002303,1.040072e-05,1.0
1,1.175289e-07,4.842361e-05,2.520233e-06,3.377019e-07,2.134688e-06,1.307097e-06,4.497871e-07,5.252026e-06,4.435612e-06,2.67982e-08,...,9.435138e-05,1.044118e-05,2.925503e-05,1.64106e-05,8.903983e-07,0.001081,0.0001784793,0.000147,1.295121e-06,1.0
2,5.57303e-09,3.964879e-07,1.310729e-09,2.594938e-10,5.760505e-10,8.320185e-09,4.93467e-10,7.858988e-10,3.724695e-09,8.338656e-10,...,1.55275e-08,1.016937e-06,2.049984e-09,1.034703e-08,9.547201e-09,2e-06,1.804308e-07,1e-06,4.665858e-08,1.0
3,6.226661e-09,2.628523e-07,6.35529e-08,1.300558e-07,7.130999e-08,3.917478e-07,2.130303e-08,1.038874e-08,1.41161e-08,5.910153e-10,...,1.463722e-07,2.973103e-07,1.120569e-08,2.879135e-08,2.30813e-08,6e-06,2.330296e-06,4e-06,7.411234e-08,1.0
4,1.758125e-07,2.110604e-05,2.189291e-07,1.920241e-08,3.774171e-08,9.923679e-07,8.7336e-08,2.401701e-06,3.10762e-06,1.342046e-07,...,5.257853e-06,0.001477315,7.170216e-06,1.740215e-05,1.772866e-05,6e-05,0.005179733,8.4e-05,6.189363e-07,1.0


### Splitting

In [95]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

In [96]:
X_train.shape, X_test.shape

((149, 1000), (149, 1000))

### Define model, train

In [97]:
from sklearn.linear_model import LogisticRegression

In [98]:
classifier = LogisticRegression(random_state=0, C=1, max_iter=1000, verbose=1)
classifier.fit(X_train, y_train)


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s finished


LogisticRegression(C=1, max_iter=1000, random_state=0, verbose=1)

### Evaluate

In [99]:
# Evaluate using the logistic regression classifier
predictions = classifier.predict(X_test)
accuracy = np.mean(y_test == predictions) * 100.0  # percentage of correct predictions
print(f"Accuracy = {accuracy:.3f}")

Accuracy = 85.235
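Accuracy alone can mislead when the classes are imbalanced. With roughly 100 positives and 198 negatives, a classifier that always answers "negative" would already score about 66% on the full set (the exact figure depends on how the split falls). The sketch below uses small made-up label arrays, not the notebook's results, to show how a majority-class baseline, precision, recall, and a confusion matrix could be computed with scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions, for illustration only.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 0, 0])

# Accuracy of always predicting the majority class (the baseline to beat).
majority_baseline = max(np.mean(y_true == 0), np.mean(y_true == 1))

print("majority baseline:", majority_baseline)
print("accuracy:", np.mean(y_true == y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
# Rows are the true class, columns the predicted class.
print(confusion_matrix(y_true, y_pred))
```

In the notebook, `y_true` and `y_pred` would be `y_test` and `predictions`.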




### 4.) Try CLIP

* Repeat steps 2 and 3 above, only this time using the [CLIP](https://github.com/openai/CLIP) model.
  
  To prepare an image for CLIP, use the following example: `image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)`. Passing the result to `model.encode_image` gives the image features.

  (See also the last code section of the README in the CLIP GitHub repo, which trains a classifier on CLIP features.)
  
  
* (Answer in a markdown cell): Which model+layer works the best for this data? Why do you think that is?
* What makes for good positive examples? What makes for good negative examples? Why does the choice of negative examples matter?
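One way to organize the CLIP version of the pipeline is to separate "encode one image" from "assemble the dataset". The sketch below shows only the assembly step. The `encode` argument is a hypothetical callable standing in for a CLIP wrapper, `fake_encode` is a stand-in so the snippet runs without CLIP installed, and L2-normalizing the features is an optional design choice rather than something the assignment requires.

```python
import numpy as np

def l2_normalize(features):
    """Scale each row to unit length (guarding against all-zero rows)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)

def build_dataset(pos_paths, neg_paths, encode):
    """Encode every image path and return (X, y) with 1 = positive, 0 = negative.

    `encode` is a hypothetical callable mapping one path to a 1-D feature vector.
    """
    paths = list(pos_paths) + list(neg_paths)
    X = np.stack([np.asarray(encode(p), dtype=np.float32) for p in paths])
    y = np.concatenate([np.ones(len(pos_paths)), np.zeros(len(neg_paths))])
    return l2_normalize(X), y

def fake_encode(path):
    """Stand-in encoder: returns a random 512-dim vector (ViT-B/32 size)."""
    rng = np.random.default_rng(len(path))
    return rng.normal(size=512)

X_clip, y_clip = build_dataset(["pos/a.jpg", "pos/bb.jpg"], ["neg/c.jpg"], fake_encode)
print(X_clip.shape, y_clip)
```

With CLIP loaded, `encode` could open the image, apply `preprocess` as in the example above, call `model.encode_image` inside `torch.no_grad()`, and convert the result to a NumPy vector. The resulting `X_clip` and `y_clip` can then go through the same train/test split and classifier as the VGG19 features.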

In [25]:
import os
import clip
import torch
from PIL import Image
import transformers 
from transformers import CLIPProcessor, CLIPModel
import numpy as np
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader


In [100]:
clip.available_models()

['RN50',
 'RN101',
 'RN50x4',
 'RN50x16',
 'RN50x64',
 'ViT-B/32',
 'ViT-B/16',
 'ViT-L/14']

In [158]:
# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"


In [101]:
import os
import clip
import torch

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load('ViT-B/32', device)

In [105]:
pwd

'/Users/maishamaliha/Documents/NLP/A6'

In [124]:
direc = os.path.join(os.getcwd(), 'sample')  # the 'sample' folder next to this notebook

In [129]:
import glob
import PIL

In [128]:
images = glob.glob(direc+'/*/*')

In [134]:
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open(images[0])).unsqueeze(0).to(device)
text = clip.tokenize(["pizza", "not pizza"]).to(device)


with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  


Label probs: [[0.3859106 0.6140894]]
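The label probabilities printed above come from a softmax over scaled cosine similarities between the image embedding and each text prompt embedding. The NumPy sketch below reproduces that computation on toy vectors. The function name, the 3-D vectors, and the scale of 100 are assumptions for illustration (CLIP learns its scale during training), so the numbers will not match the output above.

```python
import numpy as np

def zero_shot_probs(image_feat, text_feats, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = logit_scale * text_feats @ image_feat
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Toy 3-D vectors, purely illustrative (real ViT-B/32 embeddings have 512 dimensions).
image_vec = np.array([1.0, 0.2, 0.0])
prompt_vecs = np.array([
    [0.9, 0.1, 0.0],  # stands in for the "pizza" prompt
    [0.0, 1.0, 0.0],  # stands in for the "not pizza" prompt
])
probs_demo = zero_shot_probs(image_vec, prompt_vecs)
print(probs_demo)
```

Because the scale is large, small differences in cosine similarity turn into large differences in probability.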


In [130]:
images[0]

'/Users/maishamaliha/Documents/NLP/A6/sample/neg copy/photo-1565138146061-e29b079736c0.jpg'