# Image Similarity Recommendation System
## Part One: Extracting and Storing Image Features
<p>This first part of the project takes images from a training set and, using a pretrained neural network, extracts features from each image and stores them in a DataFrame for later use in the search.
<p>The base was built from 44,441 images taken from the <a href="https://www.kaggle.com/datasets/bhaskar2443053/fashion-small">Fashion Small dataset on Kaggle</a>.
<p>Features were extracted with the VGG16 network, pretrained with ImageNet weights and used without its classification layers.
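
Without its classification layers, VGG16 maps a 224×224 RGB input to a 7×7×512 feature map, so some reduction is needed to end up with 512 values per image. This notebook does not show how `utils` performs that step; a common choice is global pooling, which keeps one number per channel. The sketch below illustrates the idea in pure Python on a toy feature map, assuming max pooling. The function name `global_max_pool` is hypothetical.

```python
def global_max_pool(feature_map):
    """Collapse a height x width x channels nested list to one value per channel."""
    channels = len(feature_map[0][0])
    return [
        max(cell[c] for row in feature_map for cell in row)
        for c in range(channels)
    ]

# Toy 2 x 2 x 3 feature map; a real VGG16 output would be 7 x 7 x 512.
toy_map = [
    [[0.0, 1.5, 0.2], [0.3, 0.0, 0.0]],
    [[0.9, 0.4, 0.0], [0.1, 2.0, 0.7]],
]
pooled = global_max_pool(toy_map)
print(pooled)  # one value per channel
```

In Keras, the same reduction is available by passing `pooling='max'` (or `'avg'`) together with `include_top=False` when building `VGG16`; whether `utils` does so is an assumption.
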
<p>As a result, this part of the project produces a CSV file with 513 columns and one row per image: one column holds the image file name and the other 512 hold the features.
<p>Since the goal is to create the CSV used when querying images, this notebook only needs to be run when the image base changes.
<p>This part was split off because it is a slow process that does not need to run often.
<p>Image queries are handled by the notebook image_query.ipynb, which assumes the CSV file has already been created by this notebook.
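
To make the purpose of the CSV concrete, here is a minimal sketch of how a query could rank stored images against a query vector. It assumes cosine similarity as the metric, which this notebook does not specify. The helpers `cosine_similarity` and `most_similar` and the toy 3-value vectors are illustrative only; real vectors would have 512 values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, features, top_k=3):
    """Return the top_k (file_name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in features.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy feature store: file name -> feature vector.
features = {
    "10000.jpg": [1.0, 0.0, 0.0],
    "10001.jpg": [0.9, 0.1, 0.0],
    "10002.jpg": [0.0, 1.0, 0.0],
}
ranking = most_similar([1.0, 0.0, 0.0], features, top_k=2)
print([name for name, _ in ranking])
```
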

### Step 1: Importing libraries

In [1]:
import os # for handling file paths
import pandas as pd # for handling tabular data (DataFrames)
import tqdm # optional; shows a progress bar during feature extraction
import utils # contains the model and the feature extraction function

img_folder = utils.img_folder # folder holding the images of the base

print('Imports are all set')

Imports are all set


### Step 2: Extracting features from the image base

In [None]:

features_matrix = {} # features are stored in a dictionary keyed by file name
print('Building the dataset with the image features.')
print('This can take quite a while.')
print('I recommend you go live your life, smell the flowers, read a book or something.')
files = sorted(os.listdir(img_folder))[:5] # lists the image files in a stable order (drop the [:5] in production to cover every image)
with tqdm.tqdm(total=len(files)) as progress: # creates a progress bar (optional, but useful)
    for img_path in files: # for each image in the base:
        features = utils.img_features_extract(os.path.join(img_folder, img_path), verbose=0) # extracts the image's features
        features_matrix[img_path] = features # stores the feature vector in the dictionary
        progress.update() # updates the progress bar

Building the dataset with the image features.
This can take quite a while.
I recommend you go live your life, smell the flowers, read a book or something.


  0%|          | 0/5 [00:00<?, ?it/s]

100%|██████████| 5/5 [00:18<00:00,  3.78s/it]
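
`os.listdir` returns every entry in the folder, so stray files or subfolders would also be passed to the extractor. The following is a minimal sketch of a stricter listing, assuming images can be identified by their extension. The helper `list_image_files` and the extension set are hypothetical and not part of `utils`.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: adjust to the dataset

def list_image_files(folder):
    """Return the sorted names of regular files in `folder` with an image extension."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(full_path) and extension in IMAGE_EXTENSIONS:
            names.append(name)
    return sorted(names)  # sorted so repeated runs process files in the same order
```

With such a helper, the listing step could become `files = list_image_files(img_folder)`.
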


### Step 3: Exporting the features to an external file

In [None]:
df = pd.DataFrame.from_dict(features_matrix, orient='index').reset_index() # converts the feature dictionary into a DataFrame
df.rename(columns={'index': 'file_name'}, inplace=True) # renames the column holding the image file name
df.to_csv("image_features.csv", index=False) # saves the DataFrame to a file
print('The dataset was built and saved to the file "image_features.csv".')
print('A small piece of the dataset is shown below.')
df.head() # displays the first rows of the DataFrame

The dataset was built and saved to the file "image_features.csv".
A small piece of the dataset is shown below.


Unnamed: 0,file_name,0,1,2,3,4,5,6,7,8,...,502,503,504,505,506,507,508,509,510,511
0,10000.jpg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,10001.jpg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,10002.jpg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,10003.jpg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,10004.jpg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
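
Because the query notebook depends on this file, it can help to read it back and confirm the layout before relying on it. The following is a minimal sketch using only the standard library, assuming the header saved in this step (`file_name` followed by the feature columns). `load_features` is a hypothetical helper, not the loader used in image_query.ipynb.

```python
import csv

def load_features(csv_path, expected_features=512):
    """Read the feature CSV into a dict mapping file name -> list of floats."""
    features = {}
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # One name column plus the feature columns, e.g. 1 + 512 = 513.
        if header[0] != "file_name" or len(header) != expected_features + 1:
            raise ValueError(f"unexpected header with {len(header)} columns")
        for row in reader:
            features[row[0]] = [float(value) for value in row[1:]]
    return features
```

Calling `load_features("image_features.csv")` would then raise a `ValueError` if the file does not have the expected 513 columns.
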
