feat: Integration of VLM embedding model #446

Merged 57 commits on Jun 5, 2024.

Commits (57):
d300035
add e5 embedding
Wendong-Fan Nov 23, 2023
524bfd4
fix typo in toml file
Wendong-Fan Nov 25, 2023
0f13021
allow user to switch embeeding model from SentenceTransformer
Wendong-Fan Nov 30, 2023
9ddc871
Move the import to __init__
Wendong-Fan Nov 30, 2023
e9c3135
polish docstring
Wendong-Fan Nov 30, 2023
aeae92d
remove # type: ignore
Wendong-Fan Nov 30, 2023
b431e67
change embed_list return type and polish docstring
Wendong-Fan Nov 30, 2023
884f190
use Union[List[List[float]], ndarray] instead of List[List[float]] | …
Wendong-Fan Nov 30, 2023
9cce263
change return of embed_list from ndarray to list
Wendong-Fan Dec 3, 2023
4c7b67c
change name from SentenceTransformerEmbedding into SentenceTransforme…
Wendong-Fan Dec 3, 2023
939808e
update poetry
lightaime Dec 3, 2023
e8ce692
update poetry
lightaime Dec 3, 2023
1bf7320
update poetry
lightaime Dec 3, 2023
653b381
update poetry
lightaime Dec 3, 2023
93e795e
remove ndarry and union in embedding base file
Wendong-Fan Dec 8, 2023
a50b478
Merge branch 'master' into feature/open_source_embedding_model
Wendong-Fan Dec 8, 2023
4d5ba2d
sentence-transformer
FUYICC Jan 30, 2024
692a670
integration of clip embedding and update of license
FUYICC Feb 1, 2024
20654fd
Limit embed_list input type
FUYICC Feb 3, 2024
9e0de62
revert changes of sentence embedding
FUYICC Feb 5, 2024
b3ea26c
poetry change of pillow
FUYICC Feb 5, 2024
f1adf18
change of docstring of functions
FUYICC Feb 5, 2024
8ca7195
change of get_output_dim function
FUYICC Feb 24, 2024
c0f2b85
fix of bugs of embedding dim
FUYICC Feb 24, 2024
955bf11
allow the clip embedding accept both texts and images
FUYICC Feb 25, 2024
afb46bf
fix the bug for pytest
FUYICC Feb 28, 2024
9f98e8a
fix the bug for poetry.lock
FUYICC Feb 29, 2024
79e6d8d
refactor: refactor CLIPEmbedding class to improve readability and doc…
Appointat Mar 8, 2024
2e16ed6
chore: remove empty line in pyproject.toml
Appointat Mar 8, 2024
d5e10fb
chore: add specific test cases for image and text embeddings
Appointat Mar 8, 2024
f41f3c2
fix: fix error handling in CLIPEmbedding class
Appointat Mar 8, 2024
ddf78af
typo: fix default value capitalization in CLIPEmbedding class
Appointat Mar 8, 2024
8fe17cb
Use generics to support the type system
FUYICC Mar 11, 2024
1afd27b
store dimension into a variable
FUYICC Mar 11, 2024
e8d073d
Update update_license.py for windows compatibility
FUYICC Mar 12, 2024
f0a1573
Change to general visual language model class and use lazy initializa…
FUYICC Apr 9, 2024
0fc220d
Merge branch 'master' into CLIP_model
FUYICC Apr 12, 2024
1fa0c0f
test for inconsistancy of inputs with different types
FUYICC Apr 12, 2024
71d48a2
update of poetry
FUYICC Apr 12, 2024
4de4fad
usage of **kwargs
FUYICC Apr 12, 2024
1517d52
debug for pytest
FUYICC May 2, 2024
2105510
Merge branch 'master' into CLIP_model
FUYICC May 3, 2024
ed54edf
poetry dependency
FUYICC May 3, 2024
a667614
ruff
FUYICC May 3, 2024
8aab43d
poetry
FUYICC May 3, 2024
8c1f086
return list of float
FUYICC May 5, 2024
b8bd94e
change of tests
FUYICC May 5, 2024
de718ce
Update camel/embeddings/vlm_embedding.py
FUYICC May 27, 2024
6ebf5cd
Update camel/embeddings/vlm_embedding.py
FUYICC May 27, 2024
c969597
Update camel/embeddings/vlm_embedding.py
FUYICC May 27, 2024
6b2c48e
Update camel/embeddings/vlm_embedding.py
FUYICC May 27, 2024
b0cadb0
Update camel/embeddings/vlm_embedding.py
FUYICC May 27, 2024
e2c7824
one method for **kwargs
FUYICC May 27, 2024
1c23c64
split of kwargs
FUYICC Jun 2, 2024
487dfca
add pillow into tool.poetry.extras
FUYICC Jun 2, 2024
908bb91
Merge branch 'master' into CLIP_model
FUYICC Jun 2, 2024
6b5db36
poetry lock
FUYICC Jun 2, 2024
2 changes: 2 additions & 0 deletions camel/embeddings/__init__.py
@@ -14,9 +14,11 @@
from .base import BaseEmbedding
from .openai_embedding import OpenAIEmbedding
from .sentence_transformers_embeddings import SentenceTransformerEncoder
from .vlm_embedding import VisionLanguageEmbedding

__all__ = [
    "BaseEmbedding",
    "OpenAIEmbedding",
    "SentenceTransformerEncoder",
    "VisionLanguageEmbedding",
]
105 changes: 105 additions & 0 deletions camel/embeddings/vlm_embedding.py
@@ -0,0 +1,105 @@
# =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
# Licensed under the Apache License, Version 2.0 (the “License”);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an “AS IS” BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
from typing import Any, List, Optional, Union

from PIL import Image

from camel.embeddings import BaseEmbedding


class VisionLanguageEmbedding(BaseEmbedding[Union[str, Image.Image]]):
    r"""Provides image and text embedding functionalities using a
    multimodal model.

    Args:
        model_name (str, optional): The model to be used for generating
            embeddings. (default: :obj:`openai/clip-vit-base-patch32`)

    Raises:
        RuntimeError: If an unsupported model type is specified.
    """

    def __init__(
        self, model_name: str = "openai/clip-vit-base-patch32"
    ) -> None:
        r"""Initializes the :obj:`VisionLanguageEmbedding` class with a
        specified model.

        Args:
            model_name (str, optional): The version name of the model to use.
                (default: :obj:`openai/clip-vit-base-patch32`)
        """
        from transformers import AutoModel, AutoProcessor

        self.model = AutoModel.from_pretrained(model_name)
        self.processor = AutoProcessor.from_pretrained(model_name)
        self.dim: Optional[int] = None
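Importing `transformers` inside `__init__` (rather than at module top level) keeps it an optional dependency: `camel.embeddings` stays importable without it, and the `ImportError` surfaces only when the encoder is actually constructed. A minimal sketch of that pattern, with `json` standing in for the heavy dependency (names here are illustrative, not from the PR):

```python
class LazyDepEncoder:
    def __init__(self) -> None:
        # Import inside __init__ so the module defining this class can be
        # imported even when the optional dependency is missing; the
        # ImportError is raised only on first construction.
        try:
            import json as heavy_dep  # stand-in for `transformers` in this sketch
        except ImportError as e:
            raise ImportError(
                "Install the optional dependency to use this encoder."
            ) from e
        self.dep = heavy_dep
```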

    def embed_list(
        self,
        objs: List[Union[Image.Image, str]],
        **kwargs: Any,
    ) -> List[List[float]]:
        r"""Generates embeddings for the given images or texts.

        Args:
            objs (List[Image.Image|str]): The list of images or texts for
                which to generate the embeddings.
            **kwargs (Any): Extra kwargs passed to the embedding API.

        Returns:
            List[List[float]]: A list of generated embeddings, each
                represented as a list of floating-point numbers.
        """
        if not objs:
            raise ValueError("Input objs list is empty.")
        result_list = []
        for obj in objs:
            if isinstance(obj, Image.Image):
                # Named image_inputs to avoid shadowing the built-in input().
                image_inputs = self.processor(
                    images=obj, return_tensors="pt", padding=True, **kwargs
                )
                image_feature = (
                    self.model.get_image_features(**image_inputs, **kwargs)
                    .squeeze(dim=0)
                    .tolist()
                )
                result_list.append(image_feature)
            elif isinstance(obj, str):
                text_inputs = self.processor(
                    text=obj, return_tensors="pt", padding=True, **kwargs
                )
                text_feature = (
                    self.model.get_text_features(**text_inputs, **kwargs)
                    .squeeze(dim=0)
                    .tolist()
                )
                result_list.append(text_feature)
            else:
                raise ValueError("Input type is neither image nor text.")
        self.dim = len(result_list[0])
        return result_list

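The per-item dispatch in `embed_list` (image to `get_image_features`, string to `get_text_features`, anything else to `ValueError`) can be sketched without the model itself. The dummy 4-dimensional features below are placeholders, not CLIP output, and `FakeImage` is a hypothetical stand-in for `PIL.Image.Image`:

```python
from typing import List, Union


class FakeImage:
    """Stand-in for PIL.Image.Image in this sketch."""


def embed_list_sketch(objs: List[Union[FakeImage, str]]) -> List[List[float]]:
    # Mirrors the PR's control flow: validate, dispatch on each element's
    # type, and reject anything that is neither an image nor a string.
    if not objs:
        raise ValueError("Input objs list is empty.")
    result: List[List[float]] = []
    for obj in objs:
        if isinstance(obj, FakeImage):
            result.append([0.0, 0.0, 0.0, 1.0])  # dummy "image feature"
        elif isinstance(obj, str):
            result.append([float(len(obj)), 0.0, 0.0, 0.0])  # dummy "text feature"
        else:
            raise ValueError("Input type is neither image nor text.")
    return result
```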
    def get_output_dim(self) -> int:
        r"""Returns the output dimension of the embeddings.

        Returns:
            int: The dimensionality of the embedding for the current model.
        """
        if self.dim is None:
            text = 'dimension'
            inputs = self.processor(text=[text], return_tensors="pt")
            self.dim = self.model.get_text_features(**inputs).shape[1]
        return self.dim
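`get_output_dim` probes the model at most once (by embedding a dummy string) and caches the result in `self.dim`. The caching shape can be shown with a fixed stand-in value instead of a real forward pass; `DimCache` and `_probe_model` are illustrative names, and 512 is only an example value (the projection width of CLIP ViT-B/32):

```python
from typing import Optional


class DimCache:
    def __init__(self) -> None:
        self.dim: Optional[int] = None
        self.probes = 0  # counts how often the "model" is actually queried

    def _probe_model(self) -> int:
        # Stand-in for running a dummy text through the model.
        self.probes += 1
        return 512

    def get_output_dim(self) -> int:
        # Lazy: probe on first call only, then serve the cached value.
        if self.dim is None:
            self.dim = self._probe_model()
        return self.dim
```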
6 changes: 4 additions & 2 deletions licenses/update_license.py
@@ -39,10 +39,12 @@ def update_license_in_file(
    start_line_start_with: str,
    end_line_start_with: str,
) -> bool:
-    with open(file_path, 'r') as f:
+    with open(
+        file_path, 'r', encoding='utf-8'
+    ) as f:  # for windows compatibility
        content = f.read()

-    with open(license_template_path, 'r') as f:
+    with open(license_template_path, 'r', encoding='utf-8') as f:
        new_license = f.read().strip()

    maybe_existing_licenses = re.findall(
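Passing `encoding='utf-8'` matters because `open()` without it falls back to the platform's locale encoding (often cp1252 on Windows), which can mis-decode or fail on files containing non-ASCII characters such as the curly quotes in the license header. A self-contained check of the behavior the fix relies on:

```python
import os
import tempfile

text = 'Licensed under the Apache License (the \u201cLicense\u201d)'

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # Writing and reading with an explicit utf-8 encoding round-trips
    # exactly, regardless of the platform's default locale encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        assert f.read() == text

    # Decoding the same bytes as cp1252 either raises (some byte values
    # are undefined in cp1252) or silently produces mojibake.
    with open(path, 'rb') as f:
        raw = f.read()
    try:
        assert raw.decode('cp1252') != text
    except UnicodeDecodeError:
        pass
finally:
    os.remove(path)
```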
2 changes: 1 addition & 1 deletion poetry.lock


2 changes: 1 addition & 1 deletion pyproject.toml
@@ -59,7 +59,7 @@ pyowm = { version = "^3.3.0", optional = true }
googlemaps = { version = "^4.10.0", optional = true }
requests_oauthlib = { version = "^1.3.1", optional = true }
unstructured = { extras = ["all-docs"], version = "^0.10.30", optional = true }

pillow = { version = "^10.2.0", optional = true }
# encoders
sentence-transformers = { version = "^2.2.2", optional = true }

78 changes: 78 additions & 0 deletions test/embeddings/test_vlm_embeddings.py
@@ -0,0 +1,78 @@
# =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
# Licensed under the Apache License, Version 2.0 (the “License”);
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an “AS IS” BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =========== Copyright 2023 @ CAMEL-AI.org. All Rights Reserved. ===========
import pytest
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

from camel.embeddings import VisionLanguageEmbedding


@pytest.fixture
def VLM_instance() -> VisionLanguageEmbedding:
    return VisionLanguageEmbedding()


def test_CLIPEmbedding_initialization(VLM_instance):
    assert VLM_instance is not None
    assert isinstance(VLM_instance.model, CLIPModel)
    assert isinstance(VLM_instance.processor, CLIPProcessor)


def test_image_embed_list_with_valid_input(VLM_instance):
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    test_images = [image, image]
    embeddings = VLM_instance.embed_list(test_images)
    assert isinstance(embeddings, list)
    assert len(embeddings) == 2
    for e in embeddings:
        assert len(e) == VLM_instance.get_output_dim()


def test_image_embed_list_with_empty_input(VLM_instance):
    with pytest.raises(ValueError):
        VLM_instance.embed_list([])


def test_text_embed_list_with_valid_input(VLM_instance):
    test_texts = ['Hello world', 'Testing sentence embeddings']
    embeddings = VLM_instance.embed_list(test_texts)
    assert isinstance(embeddings, list)
    assert len(embeddings) == 2
    for e in embeddings:
        assert len(e) == VLM_instance.get_output_dim()


def test_text_embed_list_with_empty_input(VLM_instance):
    with pytest.raises(ValueError):
        VLM_instance.embed_list([])


def test_mixed_embed_list_with_valid_input(VLM_instance):
    test_list = ['Hello world', 'Testing sentence embeddings']
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    test_list.append(image)
    embeddings = VLM_instance.embed_list(test_list)
    assert isinstance(embeddings, list)
    assert len(embeddings) == 3
    for e in embeddings:
        assert len(e) == VLM_instance.get_output_dim()


def test_get_output_dim(VLM_instance):
    output_dim = VLM_instance.get_output_dim()
    assert isinstance(output_dim, int)
    assert output_dim > 0