Xingang1990/me396p-course_project

ME396P Course Project

This repository contains the code and files for the ME396P course project.

Setup

Training/finetuning GPT2

  1. Training data preparation: All data files are in the folder training_data_preparation. We manually collected the transcripts of Python lectures from YouTube as TXT files, using the links listed below. Some transcripts use periods to separate sentences (period.txt), while others do not (no-period.txt). For transcripts with periods, we treat each sentence (segmented by period) as one data point. For transcripts without periods, we concatenate n lines of text (n=5 in our project) into one data point; note that a full sentence may be split across data points this way. We provide a Python program that parses the two TXT files into two CSV files (data.csv and data2.csv). The two CSV files are then combined into trainingdata.csv for finetuning the GPT2 model.
  2. We finetuned the GPT2 model on Kaggle; here is the link for the code. You can edit the code as you like and interact with it as you would with a Jupyter notebook. Note: Remember to download the finetuned models (e.g., gpt2_medium_pythonlecturer_4.pt) to your local computer; otherwise they will be erased after a period of inactivity.
  3. The finetuned GPT2 model is then used as the AI backend for the chatbot. We provide a finetuned GPT2 model here. Download it and put it in the folder trained_models inside the folder chatbot.
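
The two segmentation rules described in step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the parsing program shipped in training_data_preparation: the function names split_by_period, group_lines, and to_csv, and the single "text" CSV column, are hypothetical and chosen only for this example.

```python
import csv
import io


def split_by_period(text):
    """Treat each period-terminated sentence as one data point."""
    # Collapse line breaks so sentences spanning several lines are rejoined.
    flat = " ".join(text.split())
    # Naive split on "."; decimals such as "3.10" would also be split.
    return [s.strip() + "." for s in flat.split(".") if s.strip()]


def group_lines(text, n=5):
    """Concatenate every n non-empty lines into one data point."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return [" ".join(lines[i:i + n]) for i in range(0, len(lines), n)]


def to_csv(rows):
    """Serialize data points as a one-column CSV string (assumed header: text)."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["text"])
    writer.writerows([row] for row in rows)
    return buf.getvalue()
```

With n=5, a transcript of seven lines yields two data points (five lines, then the remaining two), which is why a sentence can end up divided between neighboring data points.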

AI chatbot

  1. Conda environment (example): conda create -n chatbot python=3.10.6
  2. Install requirements:
    • pytorch tested with 1.13.0 (GPU not required)
    • transformers tested with 4.23.1
    • SpeechRecognition tested with 3.8.1
    • gTTS tested with 2.2.4
    • PyAudio tested with 0.2.12
    • playsound tested with 1.2.2
  3. Start the AI chatbot: cd to the chatbot folder and run python chatbot_gpt2.py. On first use, it downloads the original GPT2 and loads its neural network architecture, which might take a few minutes; the download is not needed on later runs. Note: The AI chatbot accepts either typed text or voice as input. Choose one by commenting out one of these two lines of code: user_input = ai.take_user_input() or user_input = ai.speech_to_text().
  4. Enjoy your conversation with the AI chatbot about Python by entering questions such as "How to debug in python?" and "What are the common built-in data types in python?"
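
Before starting the chatbot, it can help to compare the installed package versions against the tested versions listed in step 2. The sketch below is not part of the repository; the helper name check_requirements is hypothetical, and it assumes the PyPI distribution names shown in the dictionary (in particular, that pytorch is installed as "torch").

```python
from importlib import metadata

# Versions reported as tested in the setup steps above.
# Keys are assumed PyPI distribution names ("torch" for pytorch).
TESTED = {
    "torch": "1.13.0",
    "transformers": "4.23.1",
    "SpeechRecognition": "3.8.1",
    "gTTS": "2.2.4",
    "PyAudio": "0.2.12",
    "playsound": "1.2.2",
}


def check_requirements(tested=TESTED):
    """Return {name: (tested_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


if __name__ == "__main__":
    for name, (wanted, installed) in check_requirements().items():
        status = "missing" if installed is None else installed
        print(f"{name}: tested {wanted}, installed {status}")
```

The helper only reports versions and does not enforce an exact match, since an installed build string may carry a suffix (for example a CUDA tag on torch) while still working.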

Resources for the project

AI chatbot

GPT2 model finetuning

GPT2 and Transformers detailed explanation

YouTube videos used to obtain the transcripts of Python lectures

Slides

Project

Lightning talk
