GitHub - adkwn1/question-answer-app: Question and Answer web applicaiton using fine-tuned and pre-trained T5 models. Application runs on Streamlit.

Question-Answer Streamlit App Using T5 Models

Author: Andrew Kwon

Introduction

This project was created as part of TripleTen's externship program in collaboration with DataSpeak. Project is still a work in progress, but currently functional. The application showcases three concepts for a question-answer model using pre-trained and fine-tuned T5 models. The dataset used for context and model training was dervied from approximately 100,000 Python questions from Stack Overflow. The tasks are preformed as follows:

Model 1: Retrieval Augmented Generation (RAG) Question-Answering

Takes a user query and retrieves K-similar semantic matches
The retrieved responses are then passed to:
1. A pre-trained T5 model (t5-base) to summarize the retrieved responses based on K-similar returns.
2. A fine-tuned T5 conditional generation model (available on HuggingFace: 'c-kilo-1/t5-sm-py-stackoverflow') to generate an answer.

Model 2: Text-Generation

Takes a user query and provides an answer generated using a fine-tuned T5 conditional generation model ('c-kilo-1/t5-flan-py-stackoverflow').

The task for all three cases are designed to provide closed-book answers in order to provide appropriate responses to domain specific questions. For proof of concept purposes, the domain in this case are questions related to Python based on community provided topics and responses on Stack Overflow. As such, validity of answers may vary.

Dataset

Added a small sample of the final dataset for demonstrative purposes (15,000 entries, ~20 MB).

The main dataset can be downloaded from Kaggle (https://www.kaggle.com/datasets/stackoverflow/pythonquestions). The provided notebook (nb_parser.ipynb) was used to clean and merge the dataset, and export the final .csv file(s) for usage.

Usage - Local Browser

To run the project in a local web browser, clone the repository ensuring the files are in the correct directory structure:

app_demo.py (top level)
.streamlit/config.toml
datasets/final_sample.csv (created via notebook from original dataset)
- Can use 'datasets/final_sample_small.csv' for limited demo

The config.toml specifies the server address and port number to run the web app locally via streamlit. In command prompt or terminal, navigate to the cloned directory and run the following command:

streamlit run app_demo.py

Open a web browser and navigate to http://localhost:10000 to interact with the project application.

Requirements

pandas
numpy
pickle
streamlit
transformers
sentence_transformers

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
Notebooks		Notebooks
.gitignore		.gitignore
README.md		README.md
app_demo.py		app_demo.py
final_sample_small.csv		final_sample_small.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Question-Answer Streamlit App Using T5 Models

Introduction

Dataset

Usage - Local Browser

Requirements

Screenshots

About

Releases

Packages

Languages

adkwn1/question-answer-app

Folders and files

Latest commit

History

Repository files navigation

Question-Answer Streamlit App Using T5 Models

Introduction

Dataset

Usage - Local Browser

Requirements

Screenshots

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages