Multimodal deep learning example throws error: Can't get attribute 'DocVec[TextDoc]' on <module 'docarray.array.any_array' #1614
Hey @Robbie-Palmer, we are going to look into it ASAP.
Does it work if you add this before the classes are defined? `DocVec[TextDoc]` If so, I will change the documentation and mention that it should not be needed once #1330 is done.
It's running now that I've added these four:

```python
DocVec[Tokens]
DocVec[TextDoc]
DocVec[ImageDoc]
DocVec[PairTextImage]
```

So the script now looks like this:

```python
from typing import Optional

from docarray import BaseDoc, DocVec
from docarray.documents import TextDoc as BaseText
from docarray.typing import TorchTensor, ImageUrl

DEVICE = "cuda:0"


class Tokens(BaseDoc):
    input_ids: TorchTensor[48]
    attention_mask: TorchTensor


class TextDoc(BaseText):
    tokens: Optional[Tokens]


class ImageDoc(BaseDoc):
    url: Optional[ImageUrl]
    tensor: Optional[TorchTensor]
    embedding: Optional[TorchTensor]


class PairTextImage(BaseDoc):
    text: TextDoc
    image: ImageDoc


# Create the parametrized DocVec classes up front (workaround for the unpickling error)
DocVec[Tokens]
DocVec[TextDoc]
DocVec[ImageDoc]
DocVec[PairTextImage]

import itertools
from pathlib import Path

import pandas as pd
import torch
import torchvision
from docarray import DocList
from docarray.data import MultiModalDataset
from docarray.typing import TorchTensor
from torch import nn
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, DistilBertModel
...
```
The documentation contains a "How-to" guide on training a multimodal CLIP-esque model.
It has a number of small bugs, such as incorrectly named classes or parameters. These are easy to work around, and fixing them results in a script roughly like this:
However, when loading the data, PyTorch throws an error that seems to originate deeper inside DocArray.
This is a similar error to #1480, but it can't be worked around by initializing `DocVec[PairTextImage]`, because the issue occurs within a PyTorch process that is trying to unpickle from a queue. This may or may not be handled by #1330 once it is resolved.
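The following is a minimal pure-Python sketch of the mechanism behind this error. It assumes, as the message suggests, that pickle stores a parametrized class such as `DocVec[TextDoc]` only by reference. In other words, pickle records it as an attribute name on `docarray.array.any_array`, and that attribute must already exist in whichever process does the unpickling.

`Vec`, `Item` and the `fake_any_array` module are hypothetical stand-ins, not DocArray's actual implementation. The sketch shows two things:

- Unpickling fails in a process where the parametrization was never evaluated.
- Unpickling succeeds once `Vec[Item]` has been evaluated there, which is what the module-level `DocVec[...]` lines do.

```python
import pickle
import sys
import types

# Hypothetical stand-in for docarray.array.any_array: a module namespace
# in which dynamically created, parametrized classes get registered.
registry = types.ModuleType("fake_any_array")
sys.modules["fake_any_array"] = registry


class Vec:
    """Toy generic container (hypothetical): Vec[Item] returns a subclass named 'Vec[Item]'."""

    def __class_getitem__(cls, item):
        name = f"{cls.__name__}[{item.__name__}]"
        cached = getattr(registry, name, None)
        if cached is not None:
            return cached
        sub = type(name, (cls,), {
            "__module__": registry.__name__,
            "__qualname__": name,
            "item_type": item,
        })
        # pickle refers to classes by (module, qualname), so the new class
        # must be reachable as an attribute of its module.
        setattr(registry, name, sub)
        return sub


class Item:
    pass


# Pickling works in the process that created Vec[Item].
payload = pickle.dumps(Vec[Item]())

# Simulate a fresh process that never evaluated Vec[Item].
delattr(registry, "Vec[Item]")
try:
    pickle.loads(payload)
    unpickle_failed = False
except (AttributeError, pickle.UnpicklingError) as exc:
    unpickle_failed = True
    print("unpickling failed:", exc)

# Workaround from the thread: evaluate the parametrization up front,
# which registers the class again before unpickling.
Vec[Item]
restored = pickle.loads(payload)
print(type(restored).__name__)
```

In the DataLoader case, the batch is presumably built in a worker process and unpickled in the process reading from the queue. That reading process therefore has to evaluate the parametrizations itself, which is why placing them at module level helps.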