# Run clip-retrieval back with fondant-ai/datacomp-small-clip index

### Create virtual environment

In [1]:
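!python3 -m venv .env
# Note: each `!` command runs in its own subshell, so this activation does not carry over to later cells.
!source .env/bin/activate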

## Download index and metadata

### Install requirements

In [2]:
!pip install "dask[dataframe]" huggingface_hub

Defaulting to user installation because normal site-packages is not writeable


### Create the index folder

In [3]:
!mkdir -p datacomp_small

### Download the index

In [4]:
!wget -O datacomp_small/image.index "https://huggingface.co/datasets/fondant-ai/datacomp-small-clip/resolve/main/faiss?download=true" -q --show-progress
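
The same download can be sketched in plain Python, which makes the cell safe to re-run without fetching the index again. This is a minimal sketch, not part of the original notebook: `build_resolve_url` and `download_index` are hypothetical helper names, and the sketch assumes the `resolve` URL also works without the `?download=true` query string used in the `wget` call.

```python
import urllib.request
from pathlib import Path

HF_BASE = "https://huggingface.co"


def build_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the Hugging Face 'resolve' URL for a file in a dataset repo."""
    return f"{HF_BASE}/datasets/{repo_id}/resolve/{revision}/{filename}"


def download_index(dest: str = "datacomp_small/image.index") -> Path:
    """Download the Faiss index unless a non-empty copy already exists (hypothetical helper)."""
    target = Path(dest)
    if target.exists() and target.stat().st_size > 0:
        return target  # skip the download on re-runs
    target.parent.mkdir(parents=True, exist_ok=True)
    url = build_resolve_url("fondant-ai/datacomp-small-clip", "faiss")
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated image.index behind.
    tmp = target.with_suffix(target.suffix + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(target)
    return target


# download_index()  # uncomment to fetch the index (performs a network request)
```

Downloading to a `.part` file and renaming it afterwards means that a non-empty `image.index` can be treated as complete on the next run.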



### Download the metadata

In [5]:
import dask.dataframe as dd
from dask.diagnostics import ProgressBar

ddf = dd.read_parquet("hf://datasets/fondant-ai/datacomp-small-clip/id_mapping")
ddf = ddf.rename(columns={"image_path": "url"})
ddf = ddf.repartition(npartitions=1)

with ProgressBar():
    ddf.to_parquet("datacomp_small/metadata")


'(ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: fc4d9ed6-3e35-4d93-8486-7da2b9f2320d)')' thrown while requesting GET https://huggingface.co/datasets/fondant-ai/datacomp-small-clip/resolve/main/id_mapping/part-00000002-d50665c4-da02-11ee-9c19-42010a0a0a09.parquet
Retrying in 1s [Retry 1/5].


[########################################] | 100% Completed | 9.71 s
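
At this point the folder should contain `datacomp_small/image.index` next to a `datacomp_small/metadata` directory of Parquet files. Since the metadata download above hit a connection reset and retried, a quick sanity check before starting the backend may be useful. The following is a minimal sketch using only the standard library; `missing_index_parts` is a hypothetical helper, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def missing_index_parts(folder: str = "datacomp_small") -> list[str]:
    """Return a list of problems found in the index folder (empty if none)."""
    root = Path(folder)
    problems = []

    # The Faiss index downloaded earlier.
    index_file = root / "image.index"
    if not index_file.is_file() or index_file.stat().st_size == 0:
        problems.append("image.index is missing or empty")

    # The metadata written by ddf.to_parquet("datacomp_small/metadata").
    metadata_dir = root / "metadata"
    if not metadata_dir.is_dir() or not any(metadata_dir.glob("*.parquet")):
        problems.append("metadata/ contains no .parquet files")

    return problems


print(missing_index_parts() or "index folder looks complete")
```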


## Run clip-retrieval backend

### Install requirements

In [6]:
!pip install clip-retrieval

Defaulting to user installation because normal site-packages is not writeable


In [7]:
%%writefile indices.json
{
    "fondant_datacomp_small": {
        "indice_folder": "datacomp_small",
        "columns_to_return": ["url"],
        "clip_model": "open_clip:ViT-B-32/laion2b_s34b_b79k",
        "enable_mclip_option": false,
        "provide_aesthetic_embeddings": false
    }
}

Overwriting indices.json
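
As an alternative to `%%writefile`, the same configuration could be generated from Python, which keeps the model name in one place. This is a sketch under that assumption: `build_indices_config` and `CLIP_MODEL` are illustrative names, and the keys and values are copied from the JSON above.

```python
import json

# Same model string as in the indices.json cell above.
CLIP_MODEL = "open_clip:ViT-B-32/laion2b_s34b_b79k"


def build_indices_config(name: str, folder: str, clip_model: str = CLIP_MODEL) -> dict:
    """Build the indices.json content for a single index (hypothetical helper)."""
    return {
        name: {
            "indice_folder": folder,
            "columns_to_return": ["url"],
            "clip_model": clip_model,
            "enable_mclip_option": False,
            "provide_aesthetic_embeddings": False,
        }
    }


config = build_indices_config("fondant_datacomp_small", "datacomp_small")
print(json.dumps(config, indent=4))

# To write the file instead of printing it:
# with open("indices.json", "w") as f:
#     json.dump(config, f, indent=4)
```

`json.dumps` serializes Python's `False` as JSON `false`, so the output matches the hand-written file.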


In [8]:
!clip-retrieval back --port 1234 --indices-paths indices.json --clip_model open_clip:ViT-B-32/laion2b_s34b_b79k

starting boot of clip back
warming up with batch size 1 on cuda
done warming up in 5.622560501098633s
indices loaded
 * Serving Flask app 'clip_retrieval.clip_back'
 * Debug mode: off
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:1234
 * Running on http://192.168.1.181:1234
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:44:14] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:45:03] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:48:55] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:49:25] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:50:15] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:50:29] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:50:46] "POST /knn-service HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [29/Mar/2024 11:51:17] "POST