Training sample from custom dataset #6

axel588 · 2023-02-15T23:18:02Z

I couldn't manage to train on a custom dataset; many parts of the sample training code call an external dataset.
Would it be possible to have sample training code for custom datasets? Using utils_load_dataset didn't work for the training case.
From what I've understood, the embedding is a CLIP-encoded list of strings using their tokenizer, but much of this is hard to set up.
The idea would be to have a simple training sample that can be trained on a custom dataset; it's something truly missing in many repositories.

Thanks for the work you've done!

apapiu · 2023-02-16T00:52:40Z

Hi - what dataset are you trying to use? For the text embedding you can use the get_text_encodings function, and the images can just be resized to the appropriate size and saved as a NumPy file.
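
A minimal sketch of the image half of that suggestion, assuming Pillow and NumPy are installed. The function name `images_to_npy`, the PNG-only glob, and the default 256x256 target size are illustrative assumptions, not values taken from the repository. It only resizes and saves raw pixels; producing latent embeddings would additionally need the project's encoder, which this thread does not describe.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def images_to_npy(image_dir, out_path, size=(256, 256)):
    """Resize every PNG in image_dir and save them as one uint8 array.

    `size` is (width, height), following Pillow's convention.
    The name and defaults are hypothetical, chosen for illustration.
    """
    paths = sorted(Path(image_dir).glob("*.png"))
    if not paths:
        raise ValueError(f"no PNG files found in {image_dir}")

    arrays = []
    for path in paths:
        with Image.open(path) as img:
            # Force 3 channels so every array has the same shape.
            resized = img.convert("RGB").resize(size)
            arrays.append(np.asarray(resized, dtype=np.uint8))

    # Resulting shape: (num_images, height, width, 3)
    stacked = np.stack(arrays)
    np.save(out_path, stacked)
    return stacked
```

The saved file can be read back with `np.load(out_path)`. Sorting the paths keeps the image order deterministic, which matters if text encodings are stored in a separate file and matched by index.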

axel588 · 2023-02-18T19:22:47Z

Thanks @apapiu for your answer.
The main issue I have is with 16_16_latent_embeddings.npy: I have no idea how to reproduce this kind of file, and I'm not sure how to transform images into a 'latent embedding'. I have a folder of images.png and images.txt ..., and I don't know how to convert this to a latent embedding. My attempt so far was to create a dataset that returns a NumPy array of the image and calls get_text_encodings for the text in getitem, without success.
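
For a folder like the one described, one possible first step is to pair each image with its caption before any encoding happens. The sketch below assumes that every `name.png` has a caption file `name.txt` with the same stem; that layout and the helper name `pair_images_with_captions` are assumptions for illustration, not something the repository provides.

```python
from pathlib import Path


def pair_images_with_captions(data_dir):
    """Return (image_path, caption) pairs for each PNG with a same-named .txt.

    Hypothetical helper: images without a caption file are skipped.
    """
    pairs = []
    for image_path in sorted(Path(data_dir).glob("*.png")):
        caption_path = image_path.with_suffix(".txt")
        if not caption_path.exists():
            continue
        caption = caption_path.read_text(encoding="utf-8").strip()
        pairs.append((image_path, caption))
    return pairs
```

The list of captions could then be handed to the `get_text_encodings` function mentioned earlier in the thread; its exact signature is not shown here, so it is worth checking in the repository. Building the pairs once, outside the dataset's item access, also keeps the image and text arrays in the same order.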
