Transcript Formatter

Overview

This project is intended to processes long unformatted text files (such as a raw audio transcripts) converting them into formatted content with complete sentences.

Key Features

Converts raw unformatted transcripts into readable, well-structured text.
Handles text of any length using chunked processing.
Produces output with punctuation of sentences
Un-formatted fragments remain unchanged.

Installation & Setup

Prerequisites

Python 3.8+
The text-generation-webui sever and a loaded model, currently developing with mythomax gguf variants.

Server Setup

Start the server (provides the loaded model):
```
python server.py
```
The OpenAI-compatible API will be available at:
```
http://0.0.0.0:5000
```
Use the webui to load a model.

Running the Formatter

Run the formatter:

python process.py
* config.py defines file paths

How It Works

The system:

Processes text in chunks that fit within the LLM's context window
Applies proper formatting to each chunk
Combines chunks results correctly
Preserves the original content's meaning and intent

Output Format:

Proper sentence structure with correct punctuation

Technical Approach:

Uses chunked processing to handle unlimited length input
Maintains overlap between chunks for seamless transitions

Status:

Preliminary results shows that a LoRA will need to be trained.
Proper chunking and combining of chunks is essential and is the current priority.
Proper chunking facilitates the production of training data.

Name		Name	Last commit message	Last commit date
Latest commit History 227 Commits
.vscode		.vscode
books		books
charts		charts
chunk_document		chunk_document
docs		docs
files		files
training/datasets		training/datasets
.gitignore		.gitignore
README.md		README.md
build_venv.py		build_venv.py
config.py		config.py
deformat.py		deformat.py
llm_integration.py		llm_integration.py
logger.py		logger.py
model_loader.py		model_loader.py
process.py		process.py
report.py		report.py
report.txt		report.txt
requirements.txt		requirements.txt
startlora.py		startlora.py
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Transcript Formatter

Overview

Key Features

Installation & Setup

Prerequisites

Server Setup

Running the Formatter

How It Works

The system:

Output Format:

Technical Approach:

Status:

About

Uh oh!

Releases

Packages

Languages

KeithHayes/Process-Transcript

Folders and files

Latest commit

History

Repository files navigation

Transcript Formatter

Overview

Key Features

Installation & Setup

Prerequisites

Server Setup

Running the Formatter

How It Works

The system:

Output Format:

Technical Approach:

Status:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages