# Lesson 15: Image Features and Image Search

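- [x] Load the CLIP model and extract image features
- [x] Write a camera program that makes it easy to add photos to the database
- [x] Use the features to search by image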

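In this lesson we will try to extract features from images.

Earlier in the course we analyzed specific kinds of data, such as the survival data of Titanic passengers,

or the hand-gesture classification example in the previous lesson's homework.

Those features are relatively easy to understand.

In this lesson we will work with a different kind of feature:

features produced by a deep learning model that has been trained on a large dataset, an approach that became practical with the rise of deep learning.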

In [3]:
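import os
# Optional: route model downloads through a local HTTP proxy.
# Change the address to match your own proxy, or delete these two lines if you do not use one.
os.environ['HTTP_PROXY'] = 'http://localhost:8234'
os.environ['HTTPS_PROXY'] = 'http://localhost:8234'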

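If you are teaching a class, we recommend that the teacher copy the model files to the students' computers.

If you are studying at home, simply set cache_dir = None,

and the transformers library will download the model automatically.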

In [6]:
from src.CLIPExtractor import CLIPExtractor
# Folder containing the copied model files; set cache_dir = None to download automatically
cache_dir = "D:\\aistudio\\LubaoGithub\\models"
clip_extractor = CLIPExtractor(model_name = "openai/clip-vit-large-patch14", cache_dir = cache_dir)

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


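As its name suggests, openai/clip-vit-large-patch14 is a fairly large CLIP model.

The full download is about 1.5 GB. If you find this model too large, you can switch to

openai/clip-vit-base-patch32

Note that CLIP's image embedding for the base model has 512 dimensions rather than 768, so expect shorter feature vectors than the ones shown below.

In addition, to make downloading easier for users in mainland China, the CLIPExtractor code downloads the model from the mirror at https://hf-api.gitee.com.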

Next, let's use the extractor's extract_image_from_file method to extract the features of one image.

In [9]:
img_path = "images/car.jpg"

feature = clip_extractor.extract_image_from_file(img_path)

print(feature.shape)
print(feature[:5])

(768,)
[ 0.42208827 -0.33759007 -0.3351909  -0.6240947   0.44211075]


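As you can see, CLIPExtractor outputs 768 floating-point numbers.

These numbers are the image's "features".

Of course, these features are hard to describe in natural language.

To measure how similar two images are, we compute the cosine similarity between their features. Let's ask ChatGPT to compute the similarity for every pair of images.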
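Here is a quick illustration of cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖). The measure ignores vector length and compares only direction, so the result always lies between -1 and 1. The toy 3-dimensional vectors below are made up for illustration. Real CLIP features have 768 dimensions but behave the same way.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    # Dot product divided by the product of the two vector lengths
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

a = np.array([1.0, 2.0, 3.0])
b = 2 * a                         # same direction, twice as long
c = np.array([-1.0, -2.0, -3.0])  # opposite direction
d = np.array([3.0, 0.0, -1.0])    # orthogonal to a: 1*3 + 2*0 + 3*(-1) = 0

print(cosine_similarity(a, b))  # close to 1.0
print(cosine_similarity(a, c))  # close to -1.0
print(cosine_similarity(a, d))  # 0.0
```

Because length is ignored, a brighter or larger version of the same content can still score close to 1, as long as the feature vector points in the same direction.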

---

```python
img_path = "images/car.jpg"

feature = clip_extractor.extract_image_from_file(img_path)

print(feature.shape)
print(feature[:5])
```

This code runs successfully, and the output is

```
(768,)
[ 0.42208827 -0.33759007 -0.3351909  -0.6240947   0.44211075]
```

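The images folder contains car.jpg, car2.jpg, 狗.jpg (dog), 狐狸.jpg (fox), 老虎.jpg (tiger), and 骆驼.jpg (camel).

Based on the code above, write a Python program that extracts the features of each image and computes the similarity between every pair of images,

and then visualize the similarities as a table in the notebook.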

---

In [10]:
import numpy as np
import pandas as pd

# Feature extraction function: uses the clip_extractor created above
def extract_image_feature(image_path):
    return clip_extractor.extract_image_from_file(image_path)

# List of image paths
image_paths = ["images/car.jpg", "images/car2.jpg", "images/狗.jpg", "images/狐狸.jpg", "images/老虎.jpg", "images/骆驼.jpg"]

# Extract features for each image
features = {image_path: extract_image_feature(image_path) for image_path in image_paths}

# Function to compute cosine similarity
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# Calculate pairwise cosine similarities
similarity_matrix = pd.DataFrame(index=image_paths, columns=image_paths, dtype=float)

for img1 in image_paths:
    for img2 in image_paths:
        similarity_matrix.loc[img1, img2] = cosine_similarity(features[img1], features[img2])

# Display the similarity matrix
print(similarity_matrix)



                 images/car.jpg  images/car2.jpg  images/狗.jpg  images/狐狸.jpg  \
images/car.jpg         1.000000         0.651916      0.527754       0.369409   
images/car2.jpg        0.651916         1.000000      0.384417       0.252949   
images/狗.jpg           0.527754         0.384417      1.000000       0.471594   
images/狐狸.jpg          0.369409         0.252949      0.471594       1.000000   
images/老虎.jpg          0.618884         0.430250      0.717786       0.538361   
images/骆驼.jpg          0.570757         0.425792      0.624439       0.499466   

                 images/老虎.jpg  images/骆驼.jpg  
images/car.jpg        0.618884       0.570757  
images/car2.jpg       0.430250       0.425792  
images/狗.jpg          0.717786       0.624439  
images/狐狸.jpg         0.538361       0.499466  
images/老虎.jpg         1.000000       0.728275  
images/骆驼.jpg         0.728275       1.000000  


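For car.jpg, the most similar image is car2.jpg, with a similarity of 0.65.

The animal photos are also fairly similar to one another (dog and tiger, for example, reach 0.72). With a second photo of a dog, though, the two dog photos would very likely score even higher.

So we can use the cosine similarity between features to build the simplest kind of search by image.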
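The simplest search by image ranks every stored feature by its cosine similarity to the query. The sketch below assumes the database features are stacked into a NumPy matrix with one row per image. After the rows and the query are L2-normalized, a single matrix-vector product gives all the cosine similarities at once. The names `top_k_search`, `feature_matrix`, and `names` are hypothetical, and the random vectors only stand in for real CLIP features.

```python
import numpy as np

def top_k_search(query, feature_matrix, names, k=3):
    # L2-normalize rows and query so that a dot product equals cosine similarity
    db = feature_matrix / np.linalg.norm(feature_matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q                    # one cosine similarity per stored image
    order = np.argsort(-scores)[:k]    # indices of the k highest scores
    return [(names[i], float(scores[i])) for i in order]

rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(5, 768))   # 5 fake "images", 768-dim like the CLIP output above
names = [f"img_{i}.jpg" for i in range(5)]
query = feature_matrix[2] + 0.1 * rng.normal(size=768)  # a slightly noisy copy of img_2

for name, score in top_k_search(query, feature_matrix, names):
    print(name, round(score, 3))
```

For a few dozen photos, a plain Python loop over a dict works just as well. The matrix version mainly pays off as the number of stored photos grows.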

Before building that search, though, let's first write a "camera" program.

---

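I want to write a Python camera program:

- Press f to mirror the displayed image horizontally

- Press space to save the current image to the record_images folder, using a timestamp as the file name

- Press q to quit the program

If the record_images folder does not exist when the program starts, create it.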

In [13]:
import cv2
import os
import time

# Create the record_images folder if it does not exist
os.makedirs('record_images', exist_ok=True)

# Open the default camera
cap = cv2.VideoCapture(0)

# Mirror flag
mirror = False

while True:
    # Read a frame from the camera
    ret, frame = cap.read()

    if not ret:
        print("Could not read from the camera")
        break

    # If mirror is True, flip the frame horizontally
    if mirror:
        frame = cv2.flip(frame, 1)

    # Show the frame
    cv2.imshow('Camera', frame)

    # Check for key presses
    key = cv2.waitKey(1) & 0xFF
    if key == ord('f'):
        # Toggle mirroring
        mirror = not mirror
    elif key == ord(' '):
        # Save the current frame, named by its Unix timestamp in seconds
        # (two presses within the same second overwrite each other)
        timestamp = int(time.time())
        filename = f'record_images/{timestamp}.jpg'
        cv2.imwrite(filename, frame)
        print(f'Image saved: {filename}')
    elif key == ord('q'):
        # Quit the program
        break

# Release the camera
cap.release()
# Close all OpenCV windows
cv2.destroyAllWindows()


Image saved: record_images/1724742614.jpg
Image saved: record_images/1724742632.jpg
Image saved: record_images/1724742637.jpg
Image saved: record_images/1724742649.jpg
Image saved: record_images/1724742660.jpg
Image saved: record_images/1724742672.jpg
Image saved: record_images/1724742681.jpg


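We have now stored some photos in record_images.

Next, we want to use clip_extractor to extract the features of these photos and, ideally, store them,

so that the next time the program runs, we can skip the extraction step.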
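There is one caveat with storing features on disk. Features from different models are not comparable, and their lengths differ: CLIP image embeddings have 768 numbers for openai/clip-vit-large-patch14 and 512 for openai/clip-vit-base-patch32. A cache built with one model should therefore not be reused with another. Below is a minimal sketch of one way to guard against this. It assumes a hypothetical file layout that stores the model name next to the features. This is not the format the ImageDatabase class below uses, and `load_cache` and `save_cache` are illustrative names.

```python
import os
import pickle
import tempfile

import numpy as np

def load_cache(path, model_name):
    # Return the cached {filename: feature} dict, or {} if the file is missing
    # or was built with a different model
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        data = pickle.load(f)
    if data.get("model_name") != model_name:
        return {}
    return data["features"]

def save_cache(path, model_name, features):
    # Store the model name together with the features
    with open(path, "wb") as f:
        pickle.dump({"model_name": model_name, "features": features}, f)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "feature.pkl")
    save_cache(path, "openai/clip-vit-large-patch14", {"a.jpg": np.zeros(768)})
    same = load_cache(path, "openai/clip-vit-large-patch14")
    other = load_cache(path, "openai/clip-vit-base-patch32")
    print(len(same), len(other))  # 1 0
```

An empty dict simply makes the caller re-extract every image, which is the safe fallback after a model change.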

---

```python
from src.CLIPExtractor import CLIPExtractor

clip_extractor = CLIPExtractor()

img_path = "images/car.jpg"

feature = clip_extractor.extract_image_from_file(img_path)

print(feature.shape)
print(feature[:5])
```

This code extracts the image features successfully, with the following output

```
(768,)
[ 0.42208827 -0.33759007 -0.3351909  -0.6240947   0.44211075]
```

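I want to build a class ImageDatabase that extracts the features of all jpg images in the record_images folder. The class has the following members:

__init__( folder_name = "record_images" ) performs initialization.

The class first tries to read the feature.pkl file in folder_name to get the mapping from already-processed files to their features.

It then scans all jpg images in the folder_name folder and extracts features for any image that has not been processed yet.

Finally, it saves all files and features back to feature.pkl.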

In [4]:
import os
import pickle
from src.CLIPExtractor import CLIPExtractor

class ImageDatabase:
    def __init__(self, folder_name="record_images"):
        self.folder_name = folder_name
        self.feature_dict = self._load_features()

        # If you have a cache_dir, specify it here; otherwise simply use self.clip_extractor = CLIPExtractor().
        cache_dir = "D:\\aistudio\\LubaoGithub\\models"
        self.clip_extractor = CLIPExtractor(model_name = "openai/clip-vit-large-patch14", cache_dir = cache_dir)
        self._update_features()

    def _load_features(self):
        # Try to load the previously saved {filename: feature} dict
        feature_file = os.path.join(self.folder_name, "feature.pkl")
        if os.path.exists(feature_file):
            with open(feature_file, 'rb') as f:
                return pickle.load(f)
        else:
            return {}

    def _update_features(self):
        # Go through all jpg images in the folder
        for filename in os.listdir(self.folder_name):
            if filename.endswith(".jpg"):
                img_path = os.path.join(self.folder_name, filename)
                # Extract features only for images that have not been processed yet
                if filename not in self.feature_dict:
                    feature = self.clip_extractor.extract_image_from_file(img_path)
                    self.feature_dict[filename] = feature
                    print("Extracted and saved features for:", filename)

        # Save the updated feature dict
        self._save_features()

    def _save_features(self):
        # Write the feature dict to feature.pkl
        feature_file = os.path.join(self.folder_name, "feature.pkl")
        with open(feature_file, 'wb') as f:
            pickle.dump(self.feature_dict, f)

# Usage example
database = ImageDatabase()
print("Features updated and saved.")


`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


Extracted and saved features for: 1724742614.jpg
Extracted and saved features for: 1724742632.jpg
Extracted and saved features for: 1724742637.jpg
Extracted and saved features for: 1724742649.jpg
Extracted and saved features for: 1724742660.jpg
Extracted and saved features for: 1724742672.jpg
Extracted and saved features for: 1724742681.jpg
Features updated and saved.


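Now, for a new photo such as images/search_example.jpg, we want to extract its features,

compare them one by one with the features in ImageDatabase, find the most similar image, and display it.

First, let's use the earlier code to extract the features of images/search_example.jpg.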

In [6]:
img_path = "images/search_example.jpg"

# database has already created a CLIPExtractor, so reuse database.clip_extractor here to save memory
clip_extractor = database.clip_extractor

feature = clip_extractor.extract_image_from_file(img_path)

print(feature.shape)
print(feature[:5])

(768,)
[ 0.7393415   0.52909625 -0.34023103 -0.08355319  0.38873   ]


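Next, let's write a prompt that adds a search_with_feature( feature ) method to ImageDatabase.

The method returns the image closest to the input feature, along with the corresponding similarity.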

---


{Paste the ImageDatabase code, or continue the previous conversation}

```python
# Function to compute cosine similarity
def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
```

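This function computes the similarity between two features.

I want to add a member function search_with_feature( feature ) to ImageDatabase.

Given the query feature, it should return the file name of the image closest to the input feature, along with the corresponding similarity.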

In [1]:
import numpy as np
import os
import pickle

# CLIPExtractor is imported from src; cosine_similarity is defined right below
from src.CLIPExtractor import CLIPExtractor

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

class ImageDatabase:
    def __init__(self, folder_name="record_images"):
        self.folder_name = folder_name
        self.feature_dict = self._load_features()
        # If you have a cache_dir, specify it here; otherwise simply use self.clip_extractor = CLIPExtractor().
        cache_dir = "D:\\aistudio\\LubaoGithub\\models"
        self.clip_extractor = CLIPExtractor(model_name = "openai/clip-vit-large-patch14", cache_dir = cache_dir)
        self._update_features()

    def _load_features(self):
        feature_file = os.path.join(self.folder_name, "feature.pkl")
        if os.path.exists(feature_file):
            with open(feature_file, 'rb') as f:
                return pickle.load(f)
        else:
            return {}

    def _update_features(self):
        for filename in os.listdir(self.folder_name):
            if filename.endswith(".jpg"):
                img_path = os.path.join(self.folder_name, filename)
                if filename not in self.feature_dict:
                    feature = self.clip_extractor.extract_image_from_file(img_path)
                    self.feature_dict[filename] = feature
        self._save_features()

    def _save_features(self):
        feature_file = os.path.join(self.folder_name, "feature.pkl")
        with open(feature_file, 'wb') as f:
            pickle.dump(self.feature_dict, f)

    def search_with_feature(self, query_feature):
        max_similarity = -1
        most_similar_image = None

        for filename, feature in self.feature_dict.items():
            similarity = cosine_similarity(query_feature, feature)
            if similarity > max_similarity:
                max_similarity = similarity
                most_similar_image = filename

        return most_similar_image, max_similarity

Next, let's write the test code.

In [5]:
database = ImageDatabase()

img_path = "images/search_example.jpg"

# database has already created a CLIPExtractor, so reuse database.clip_extractor here to save memory
clip_extractor = database.clip_extractor

query_feature = clip_extractor.extract_image_from_file(img_path)

most_similar_image, max_similarity = database.search_with_feature(query_feature)

print(most_similar_image)
print(max_similarity)

`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


1724742672.jpg
1.0000001


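As you can see, the program works. It returns 1724742672.jpg with a similarity of about 1.0, which means the two features are almost identical.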
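The value 1.0000001 printed above is slightly larger than 1, even though cosine similarity cannot exceed 1 mathematically. The difference comes from floating-point rounding; the printed form suggests the features are float32. If you later display the value or compare it against a threshold, you may want to clip it into [-1, 1]. The random vector below is only a stand-in for a real feature.

```python
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

rng = np.random.default_rng(42)
feature = rng.normal(size=768).astype(np.float32)  # stand-in for a float32 CLIP feature

raw = cosine_similarity(feature, feature)   # mathematically 1, may differ in the last digits
clipped = float(np.clip(raw, -1.0, 1.0))    # always within [-1, 1]
print(raw, clipped)
```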

Of course, printing a file name is rather abstract, so next we need a real-time program to visualize the search.

```python
database = ImageDatabase()

img_path = "images/search_example.jpg"

# database has already created a CLIPExtractor, so reuse database.clip_extractor here to save memory
clip_extractor = database.clip_extractor

query_feature = clip_extractor.extract_image_from_file(img_path)

most_similar_image, max_similarity = database.search_with_feature(query_feature)

print(most_similar_image)
print(max_similarity)
```

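This code runs successfully.

clip_extractor also has a method extract_image( frame ) that extracts features directly from a frame.

I want to write a camera program:

At startup, create an all-black image and place it to the right of the raw camera image.

The camera then keeps reading frames and extracting their features. For each frame, the program compares the features with those in the database to find the most similar image, and displays that image to the right of the raw camera image.

The program should let me press f to flip the camera image, press space to record an image, and press q to quit.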

In [6]:
import os
import cv2
import numpy as np

# from image_database import ImageDatabase  # alternative if the ImageDatabase class was saved in image_database.py
from src.ImageDatabase import ImageDatabase

# Initialize the database
database = ImageDatabase()

# Reuse the CLIPExtractor already created by the database
clip_extractor = database.clip_extractor

# Initialize the camera
cap = cv2.VideoCapture(0)

# Check whether the camera opened successfully
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# Mirror flag, toggled with the f key
mirror = False

while True:
    # Read a camera frame
    ret, frame = cap.read()
    if not ret:
        print("Error: Can't receive frame (stream end?). Exiting ...")
        break

    # Flip the frame horizontally if mirroring is on
    if mirror:
        frame = cv2.flip(frame, 1)

    h, w = frame.shape[:2]
    # All-black placeholder with the same size as the frame (hconcat needs matching heights and types)
    right_image = np.zeros_like(frame)

    # Extract the features of the camera frame (this is the slow step)
    frame_feature = clip_extractor.extract_image(frame)

    # Search the database for the most similar image
    most_similar_image, max_similarity = database.search_with_feature(frame_feature)

    # Load the most similar image and resize it to the frame size
    if most_similar_image:
        similar_image = cv2.imread(os.path.join(database.folder_name, most_similar_image))
        if similar_image is not None:
            right_image = cv2.resize(similar_image, (w, h))

    # Show the camera frame and the most similar image side by side
    combined_image = cv2.hconcat([frame, right_image])
    cv2.imshow('Camera + Most Similar Image', combined_image)

    # Handle key presses
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('f'):
        # Toggle mirroring (takes effect from the next frame)
        mirror = not mirror
    elif key == ord(' '):
        # Save the current frame and reuse the feature already extracted from it
        filename = f"recorded_{int(cv2.getTickCount())}.jpg"
        cv2.imwrite(os.path.join(database.folder_name, filename), frame)
        database.feature_dict[filename] = frame_feature
        database._save_features()

# Release the camera and close all windows
cap.release()
cv2.destroyAllWindows()


`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


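As you can see, the program runs,

but it is very laggy, because the statement frame_feature = clip_extractor.extract_image(frame) takes a long time.

At this point we can ask ChatGPT to turn the program into a multithreaded one.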
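The idea behind this request is a "latest frame" handoff between two threads. The display loop never waits for the model. It hands over a frame only when the worker is idle, and it always uses the most recent finished result. Below is a minimal, camera-free sketch of that pattern. `slow_extract` is a hypothetical stand-in for clip_extractor.extract_image that just sleeps, and the integers stand in for frames.

```python
import threading
import time

def slow_extract(frame):
    # Hypothetical stand-in for the expensive CLIP call
    time.sleep(0.05)
    return frame * 10

pending = None      # frame waiting for the worker (None means the worker is idle)
latest = None       # most recent finished result
extracted = 0       # how many frames the worker actually processed
lock = threading.Lock()
stop = threading.Event()

def worker():
    global pending, latest, extracted
    while not stop.is_set():
        with lock:
            job = pending
        if job is None:
            time.sleep(0.001)   # nothing to do; avoid busy-waiting
            continue
        result = slow_extract(job)  # slow call runs without holding the lock
        with lock:
            latest = result
            pending = None
            extracted += 1

t = threading.Thread(target=worker, daemon=True)
t.start()

shown = 0
for frame in range(200):    # the "display loop": about 2 ms per frame, never blocked by the model
    with lock:
        if pending is None:
            pending = frame
        current = latest    # would be used to search and draw the right-hand image
    shown += 1
    time.sleep(0.002)

stop.set()
t.join()
print(shown, extracted, latest)
```

The lock is held only while swapping references, never during the slow call. As a result, the display loop shows every frame while the worker processes only a handful of them.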

---

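{Paste the previous program, or continue in the same conversation}

This program runs correctly, but the statement frame_feature = clip_extractor.extract_image(frame) takes a long time.

I want to modify the program so that the left side of the window always shows the frame smoothly.

Meanwhile, another thread should handle extraction. If no frame is currently being extracted, the current frame is sent for extraction. When extraction finishes, the extracted feature is stored in last_feature and the pending frame is reset to None.

Then, for the most recently extracted feature last_feature, search for the closest photo and show it on the right.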



In [8]:
import os
import threading
import time

import cv2
import numpy as np
from src.ImageDatabase import ImageDatabase

# Initialize the database
database = ImageDatabase()

# Reuse the CLIPExtractor already created by the database
clip_extractor = database.clip_extractor

# Initialize the camera
cap = cv2.VideoCapture(0)

# Check if the camera opened correctly
if not cap.isOpened():
    print("Error: Could not open video.")
    exit()

# State shared between the display loop and the extraction thread
frame_to_extract = None   # frame waiting to be processed (None means the worker is idle)
last_feature = None       # feature of the most recently processed frame
lock = threading.Lock()
stop_event = threading.Event()

def feature_extraction_thread():
    global frame_to_extract, last_feature
    while not stop_event.is_set():
        with lock:
            frame = frame_to_extract
        if frame is None:
            # Nothing to do yet; sleep briefly instead of busy-waiting
            time.sleep(0.01)
            continue

        # Slow step: runs here so the display loop stays smooth
        feature = clip_extractor.extract_image(frame)

        # Publish the feature and mark the worker as idle again
        with lock:
            last_feature = feature
            frame_to_extract = None

# Start the feature extraction thread
worker = threading.Thread(target=feature_extraction_thread, daemon=True)
worker.start()

# Mirror flag, toggled with the f key
mirror = False

while True:
    # Read camera frame
    ret, frame = cap.read()
    if not ret:
        print("Error: Can't receive frame (stream end?). Exiting ...")
        break

    if mirror:
        frame = cv2.flip(frame, 1)

    h, w = frame.shape[:2]
    # All-black placeholder with the same size as the frame
    right_image = np.zeros_like(frame)

    # Hand the frame to the worker if it is idle, and read the latest feature
    with lock:
        if frame_to_extract is None:
            frame_to_extract = frame.copy()
        current_feature = last_feature

    # Search for the most similar image using the last extracted feature
    if current_feature is not None:
        most_similar_image, max_similarity = database.search_with_feature(current_feature)
        if most_similar_image:
            similar_image = cv2.imread(os.path.join(database.folder_name, most_similar_image))
            if similar_image is not None:
                right_image = cv2.resize(similar_image, (w, h))

    # Display the frame and the most similar image side by side
    combined_image = cv2.hconcat([frame, right_image])
    cv2.imshow('Camera + Most Similar Image', combined_image)

    # Handle key events
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break
    elif key == ord('f'):
        # Toggle mirroring (takes effect from the next frame)
        mirror = not mirror
    elif key == ord(' '):
        # Save the current frame to the database; extracting here briefly pauses the display
        filename = f"recorded_{int(cv2.getTickCount())}.jpg"
        cv2.imwrite(os.path.join(database.folder_name, filename), frame)
        database.feature_dict[filename] = clip_extractor.extract_image(frame)
        database._save_features()

# Stop the worker thread, release the camera and close all windows
stop_event.set()
worker.join(timeout=5)
cap.release()
cv2.destroyAllWindows()


`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.


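The example works for now, but it is still fairly difficult.

For the rest of this lesson we will skip classification and just finish image search; classification will be covered in the next lesson.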