Right now, in our classification pipeline, the train and test data are split from the uploaded dataset, converted into an OpenAI-compatible format, and then uploaded to OpenAI for further use. While this works, it limits the pipeline because we don’t retain the actual training and testing datasets in any proper storage system other than openai.