- The T5 model suffers from far more repetition than the DistilGPT model
- Even after training the models for a reasonable number of epochs, they still tend to introduce characters that are not present in the input
- Although PEFT methods speed up fine-tuning by ~15%, they also affect the model's performance on the downstream task
- Stable Diffusion and several other text-conditioned image synthesis models are incapable of performing scene transitions
!pip install transformers
!pip install openai
!pip install scikit-learn
Install PyTorch by following the instructions at https://pytorch.org/get-started/locally/
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install booknlp
- Story Generation
- CMU Movie Summary: http://www.cs.cmu.edu/~ark/personas/
- CMU Books Summary: https://www.cs.cmu.edu/~dbamman/booksummaries.html
- Python Files:
- BookNLP.ipynb
- Extraction of the characters, their interactions, and the sentiment for each pair of characters (see the BookNLP sketch below)
- ChatGPT_API.ipynb
- Uses ChatGPT to generate a 2-3 line plot from each summary
- Data_Exploration.ipynb
- Unzipping and Loading the dataset
- Preprocessing into a proper DataFrame
- Extraction of features like Genre, Title, etc.
- Data_Merging.ipynb
- Merging of the Processed sub-parts of the dataset
- Dataset_Preparation_Story_Gen.ipynb
- Dropping of non-useful features
- Concatenating Books and Movie Summary datasets
- Processing dataset for conditional text generation
- Diffusion.ipynb
- Image Generation using two techniques:
- Text-to-image generation from the first sentence, followed by text-conditioned image-to-image generation for each later sentence (see the diffusion sketch below)
- Independent text-to-image generation for each sentence (no knowledge of the story generated so far)
- Story_Generation_DistilGPT.ipynb
- Dataset Processing (Tokenization and Data Split) for DistilGPT model
- Training the model on the processed dataset
- Evaluating the model on perplexity and BLEU score (see the evaluation sketch below)
- Plotting the Loss Curve
- Story_Generation_T5.ipynb
- Dataset Processing (Tokenization and Data Split) for T5 model
- Training the model on the processed dataset using PEFT techniques such as LoRA and Adapters (see the LoRA sketch below)
- Creating a custom training loop utilizing a loss given by ChatGPT
- Evaluating the model on perplexity and BLEU score
- Plotting the Loss Curve
- Docs
- Story Generation - Contains several papers researched for the task of Story Generation
- Visual Conversion - Research papers for Image Synthesis
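The character extraction in BookNLP.ipynb roughly follows the pattern below. This is a minimal sketch: the file paths and book ID are placeholders, and the exact output columns used may differ between BookNLP versions.

```python
from booknlp.booknlp import BookNLP
import pandas as pd

# Run the full BookNLP pipeline (entities, quotes, coreference, events) on one summary.
model_params = {"pipeline": "entity,quote,supersense,event,coref", "model": "big"}
booknlp = BookNLP("en", model_params)

input_file = "summaries/book_0001.txt"   # placeholder path
output_dir = "booknlp_output/book_0001"  # placeholder path
book_id = "book_0001"
booknlp.process(input_file, output_dir, book_id)

# BookNLP writes tab-separated files; the .entities file lists every detected
# entity mention with its coreference cluster and category (PER, LOC, ...).
entities = pd.read_csv(f"{output_dir}/{book_id}.entities", sep="\t")
characters = entities[entities["cat"] == "PER"]
print(characters["text"].value_counts().head())
```

How character pairs and their sentiments are derived from these mentions is specific to the notebook and not reproduced here.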
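The two image-generation strategies in Diffusion.ipynb can be sketched with Hugging Face diffusers (not in the install list above). The checkpoint, example sentences, and strength value are assumptions, and argument names can differ slightly between diffusers versions.

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
txt2img = StableDiffusionPipeline.from_pretrained(model_id).to(device)
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id).to(device)

sentences = [
    "A young knight leaves his village at dawn.",  # placeholder story sentences
    "He crosses a stormy mountain pass.",
    "He reaches a ruined castle by nightfall.",
]

# Technique 1: text-to-image for the first sentence, then text-conditioned
# image-to-image for each later sentence, so every frame carries over context.
frames = [txt2img(sentences[0]).images[0]]
for sentence in sentences[1:]:
    frames.append(img2img(prompt=sentence, image=frames[-1], strength=0.6).images[0])

# Technique 2: independent text-to-image per sentence
# (no knowledge of the previously generated frames).
independent_frames = [txt2img(s).images[0] for s in sentences]
```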
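The perplexity and BLEU evaluation used for both models can be sketched as follows; the DistilGPT-2 checkpoint, the toy reference/candidate strings, and the use of NLTK for BLEU are assumptions rather than the notebooks' exact setup.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
model.eval()

# Perplexity: exponentiate the average cross-entropy loss of the model on the text.
text = "A young knight leaves his village at dawn."  # placeholder sample
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
perplexity = math.exp(loss.item())

# BLEU: n-gram overlap between a generated story and the reference text.
reference = "The knight departs from his home village early in the morning.".split()
candidate = "A young knight leaves his village at dawn.".split()
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)
print(f"perplexity={perplexity:.2f}, BLEU={bleu:.3f}")
```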
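The LoRA setup referenced in Story_Generation_T5.ipynb roughly corresponds to the peft-library sketch below (peft is not in the install list above). The base checkpoint, rank, and target modules are assumptions, not the notebook's exact hyperparameters.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "t5-small"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Wrap the frozen T5 backbone with low-rank adapters on the attention
# query/value projections; only the adapter weights are updated during training.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q", "v"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the small fraction of trainable weights
```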
- Download the datasets from the links provided and all the Python files from GitHub
- To extract the datasets into a proper DataFrame, run Data_Exploration.ipynb
- Run BookNLP.ipynb on both datasets for the extraction of several features such as Characters, Inter-Character relations, etc.
- Execute ChatGPT_API.ipynb to generate plots for the summaries - run the requests in batches, since the OpenAI server rate-limits requests and will otherwise return errors (see the API sketch after these steps)
- Once you have obtained plots for all the summaries, run Data_Merging.ipynb to combine all the batches
- Execute Dataset_Preparation_Story_Gen.ipynb to extract Genre, Title, etc. from the processed dataset; this yields the Plot-Summary dataset
- To train the T5 and DistilGPT models on this dataset, run Story_Generation_T5.ipynb and Story_Generation_DistilGPT.ipynb respectively
- Now you can test both fine-tuned models on the task of Story Generation
- Finally run Diffusion.ipynb for converting the generated story into a visual representation
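A minimal sketch of the batched plot-generation call used in ChatGPT_API.ipynb, assuming the pre-1.0 openai client (openai.ChatCompletion) and gpt-3.5-turbo; the prompt wording, back-off schedule, and retry count are assumptions, not the notebook's exact settings.

```python
import time
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def plot_from_summary(summary: str, retries: int = 3) -> str:
    """Ask ChatGPT for a 2-3 line plot, backing off and retrying on rate-limit errors."""
    prompt = f"Condense the following summary into a 2-3 line plot:\n\n{summary}"
    for attempt in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return response["choices"][0]["message"]["content"].strip()
        except openai.error.RateLimitError:
            time.sleep(10 * 2 ** attempt)  # exponential back-off before retrying
    raise RuntimeError("Rate limit persisted after retries")

# Process the summaries in small batches and save each batch to disk, so a
# failed request does not lose earlier results:
# plots = [plot_from_summary(s) for s in batch_of_summaries]
```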
- Story Generation
- Create a custom sentiment analyzer
- Plot generation provided the following components: Characters, Genre, Title, and Inter-Character Relations
- Dataset expansion for better training
- Apply the approach to long-form story generation
- Train models on variations of the dataset, e.g., only Plots and Summaries (excluding Title, Characters, etc.)
- Integrate more PEFT methodologies and compare their effects on performance
- Text-to-Image
- Do a literature survey on the current image synthesis technologies 🟡
- Propose an architecture/methodology capable of scene transitions conditioned on text