Hello, if we want to train on our own custom dataset, how should the text paired with each image be preprocessed?