No embeddings found in folder #198

Closed
loretoparisi opened this issue Mar 20, 2024 · 2 comments

loretoparisi commented Mar 20, 2024

I'm using autofaiss==2.15.8. I get the error No embeddings found in folder when calling build_index:

from autofaiss import build_index

# embeddings is a numpy array here, not a path to a folder
max_index_memory_usage = '4G'
build_index(embeddings=embeddings,
            index_path='%s/knn.index' % path_to_store_index,
            index_infos_path='%s/knn_info.json' % path_to_store_index,
            max_index_memory_usage=max_index_memory_usage)

Here I pass the embeddings parameter, which according to the docs may be either a path or the embeddings object itself (a numpy array). In the code above I'm using the latter, which, according to the source of build_index here,

if isinstance(embeddings, np.ndarray):
    tmp_dir_embeddings = tempfile.TemporaryDirectory()  # pylint: disable=consider-using-with
    np.save(os.path.join(tmp_dir_embeddings.name, "emb.npy"), embeddings)
    embeddings_path = tmp_dir_embeddings.name
else:
    embeddings_path = embeddings  # type: ignore

should create a file inside tmp_dir_embeddings; that directory is afterwards the one reported in the error:

No embeddings found in folder /tmp/tmpnt82l_ms

Here I assume that tmp_dir_embeddings is the directory named in the error, /tmp/tmpnt82l_ms, and that the embeddings object is indeed an np.ndarray.
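To check that assumption outside of autofaiss, the ndarray branch quoted above can be mimicked in isolation. This is only a minimal sketch: save_like_build_index and describe_npy_folder are hypothetical helper names, and the random 100 x 16 float32 array is a stand-in for the real embeddings.

```python
import os
import tempfile

import numpy as np


def save_like_build_index(embeddings):
    """Mimic the ndarray branch quoted above: write emb.npy into a fresh temp dir."""
    tmp_dir = tempfile.TemporaryDirectory()
    np.save(os.path.join(tmp_dir.name, "emb.npy"), embeddings)
    return tmp_dir  # the caller must keep this reference, or the folder may be deleted


def describe_npy_folder(folder):
    """Return (filename, shape, dtype) for every .npy file directly in folder."""
    report = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".npy"):
            arr = np.load(os.path.join(folder, name))
            report.append((name, arr.shape, str(arr.dtype)))
    return report


# Stand-in data (an assumption): 100 vectors of dimension 16.
embeddings = np.random.rand(100, 16).astype("float32")

tmp_dir = save_like_build_index(embeddings)
report = describe_npy_folder(tmp_dir.name)
print(report)
tmp_dir.cleanup()
```

If the report lists emb.npy with the expected shape, the file is written correctly and the problem is more likely in how the folder is read afterwards.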

[UPDATE]

Digging into the source code of the dependent libraries, I found that the error is raised by the embedding-reader module, here in the NumpyReader class:

class NumpyReader:
    """Numpy reader class, implements init to read the files headers and call to procuce embeddings batches"""

    def __init__(self, embeddings_folder):
        self.embeddings_folder = embeddings_folder
        self.fs, embeddings_file_paths = get_file_list(embeddings_folder, "npy")

        headers = get_numpy_headers(embeddings_file_paths, self.fs)
        self.headers = pd.DataFrame(
            headers,
            columns=["filename", "count", "count_before", "dimension", "dtype", "header_offset", "byte_per_item"],
        )

        self.count = self.headers["count"].sum()
        if self.count == 0:
            raise ValueError(f"No embeddings found in folder {embeddings_folder}") # <--- this error
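The check above fails when the summed row count over all .npy headers is zero, which happens both when no file is listed and when every listed file reports zero rows. The following is a rough local-filesystem stand-in for that count, not the embedding-reader implementation: npy_header_shape and count_embeddings are hypothetical helpers, and only numpy's documented numpy.lib.format functions are used to read the headers.

```python
import os
import tempfile

import numpy as np
from numpy.lib import format as npy_format


def npy_header_shape(path):
    """Return the array shape stored in a .npy header without loading the data."""
    with open(path, "rb") as f:
        major, _minor = npy_format.read_magic(f)
        if major == 1:
            shape, _fortran_order, _dtype = npy_format.read_array_header_1_0(f)
        elif major == 2:
            shape, _fortran_order, _dtype = npy_format.read_array_header_2_0(f)
        else:
            raise ValueError(f"unsupported .npy format version {major} in {path}")
    return shape


def count_embeddings(folder):
    """Sum the first dimension of every .npy file found directly in folder."""
    total = 0
    for name in sorted(os.listdir(folder)):
        if name.endswith(".npy"):
            shape = npy_header_shape(os.path.join(folder, name))
            total += shape[0] if shape else 0
    return total


with tempfile.TemporaryDirectory() as filled, tempfile.TemporaryDirectory() as empty:
    np.save(os.path.join(filled, "a.npy"), np.zeros((5, 4), dtype="float32"))
    np.save(os.path.join(filled, "b.npy"), np.zeros((0, 4), dtype="float32"))
    filled_count = count_embeddings(filled)  # rows from a.npy plus rows from b.npy
    empty_count = count_embeddings(empty)    # folder without any .npy file

print(filled_count, empty_count)
```

Running count_embeddings on the folder named in the error message (while it still exists) would show whether the file is missing or merely counted as empty.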

loretoparisi commented Mar 22, 2024

Fixed by downgrading embedding_reader to a version that still supports Python 3.7.x; see rom1504/embedding-reader#44
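Since the fix depended on matching the embedding_reader release to the Python version, it can help to print both before debugging further. A small sketch, assuming the distributions are installed under the names autofaiss and embedding-reader; installed_version is a hypothetical helper, and the pkg_resources fallback is there only because importlib.metadata is not available on Python 3.7.

```python
import sys


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.7 (shipped with setuptools)

        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("python", sys.version.split()[0])
for dist in ("autofaiss", "embedding-reader"):
    print(dist, installed_version(dist) or "not installed")
```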
