GPT Summarizer

A Python script that uses OpenAI's GPT-3.5-turbo model to process and summarize long-form text files.

Features

  • Paraphrases input text into bullet point statements
  • Sorts notes by specified topics or auto-generates topics
  • Summarizes text files into key takeaways and action items
  • Replaces specified terms using a custom jargon.txt file; useful for correcting recurring transcription errors
  • Supports text files, including VTT (Web Video Text Tracks) files
  • Strips blank lines, extra whitespace, VTT tags, and timestamps from the input before sending it to OpenAI to reduce token costs
  • Supports long-form text, such as meeting transcripts; automatically splits the input into sections so that each batch sent to OpenAI stays within the model's token limit (see the sketch below)
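
The batching feature depends on counting tokens before each API call. Below is a minimal sketch of how such splitting can be done with the tiktoken library; the function name, the line-by-line strategy, and the 3,000-token budget are illustrative assumptions, not the script's exact implementation.

import tiktoken

def split_into_batches(text, max_tokens=3000):
    # Split text into chunks whose estimated token counts stay under max_tokens.
    # Hypothetical helper for illustration; the script's actual logic may differ.
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    batches, current, current_tokens = [], [], 0
    for line in text.splitlines():
        line_tokens = len(enc.encode(line)) + 1  # rough allowance for the joining newline
        if current and current_tokens + line_tokens > max_tokens:
            batches.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        batches.append("\n".join(current))
    return batches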

Requirements

  • Python 3.6 or higher
  • openai library
  • tiktoken library

Installation

  1. Clone this repository (or download and extract the ZIP file).
$ git clone https://github.com/blakewenzel/gpt-summarizer.git
  2. Install the required libraries.
$ pip install -r requirements.txt
  3. Set up your OpenAI API credentials as environment variables on your operating system. You will need OPENAI_ORG_ID and OPENAI_API_KEY.
$ export OPENAI_ORG_ID=YOUR_ORG_ID_HERE
$ export OPENAI_API_KEY=YOUR_API_KEY_HERE
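
Once these variables are set, the credentials can be read at runtime. A minimal sketch, assuming the pre-1.0 openai Python library, which exposes module-level organization and api_key attributes:

import os
import openai

# Read credentials from the environment; raises KeyError if either variable is missing.
openai.organization = os.environ["OPENAI_ORG_ID"]
openai.api_key = os.environ["OPENAI_API_KEY"]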

Usage

$ python summarize.py input_file

input_file is the path to the text file you want to summarize.

Unless the output file option is used, the results are written to a text file in the same directory as the input file, with the same name appended with _output and a .txt extension.
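
For reference, the default output name can be derived from the input path. A minimal sketch, assuming the input file's extension is replaced rather than kept (e.g. notes.vtt would become notes_output.txt; the script's exact naming may differ):

from pathlib import Path

def default_output_path(input_file):
    # Illustrative assumption: drop the original extension, append _output.txt.
    path = Path(input_file)
    return path.with_name(path.stem + "_output.txt")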

Options

$ python summarize.py [-h] [-o [OUTPUT_FILE]] [-j [JARGON_FILE]] [-t [TOPICS]] [-s] input_file
  -h, --help            show this help message and exit
  -o [OUTPUT_FILE], --output_file [OUTPUT_FILE]
                        The output file where the results will be saved. If omitted, the output file will be named the same as the input file, but appended with '_output' and always have a '.txt' extension.
  -j [JARGON_FILE], --jargon_file [JARGON_FILE]
                        Replace jargon terms before processing text. Will check for jargon.txt in the current working directory unless another file location is specified.
  -t [TOPICS], --topics [TOPICS]
                        Sort notes by topic. Provide a comma-separated list of topics or use 'auto' to automatically generate topics. Default is 'prompt' which will ask for the list at runtime.
  -s, --summary         Generate a summary of the notes.
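
The option block above matches a standard argparse command-line parser. A minimal sketch of how such a parser could be defined; the const and default values for -j and -t are assumptions based on the help text, not taken from the script:

import argparse

parser = argparse.ArgumentParser(description="Summarize long-form text with GPT-3.5-turbo.")
parser.add_argument("input_file", help="Path to the text file to summarize.")
parser.add_argument("-o", "--output_file", nargs="?", help="Output file for the results.")
parser.add_argument("-j", "--jargon_file", nargs="?", const="jargon.txt",
                    help="Replace jargon terms before processing.")
parser.add_argument("-t", "--topics", nargs="?", const="auto", default="prompt",
                    help="Comma-separated topics, 'auto', or 'prompt'.")
parser.add_argument("-s", "--summary", action="store_true",
                    help="Generate a summary of the notes.")
args = parser.parse_args()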

Jargon Replacement

To have jargon terms replaced, create a file called jargon.txt in the directory where the script is run and use the -j option. Each line should contain two strings separated by a comma: the first is the jargon term to replace, and the second is the replacement.

Example jargon.txt:

AI,Artificial Intelligence
ML,Machine Learning
Sean,Shawn
Sarah,Sara
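
A minimal sketch of how such replacements can be applied; the function name is illustrative, not taken from the script:

def replace_jargon(text, jargon_path="jargon.txt"):
    # Replace each term from the jargon file with its listed replacement.
    with open(jargon_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or "," not in line:
                continue  # skip blank or malformed lines
            term, replacement = line.split(",", 1)
            text = text.replace(term, replacement)
    return text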
