Streamlit Chunker

This project is a Streamlit application that allows users to upload PDF files, process them into text chunks, and perform similarity searches on the text. Users can adjust chunking parameters and view the most similar chunks retrieved.

Features

Upload multiple PDF files
Extract text from PDFs
Chunk text with adjustable parameters
Search for similar text chunks
Display top similar chunks with similarity scores
User-configurable settings via sidebar

Requirements

Python 3.7+
Streamlit
langchain_community
langchain_openai
PyPDF2

Installation

Clone the repository:

```
git clone https://github.com/hamadandrabi/streamlit-chunker.git
cd streamlit-chunker
```

Create a virtual environment and activate it:

```
python -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Run the Streamlit app:
```
streamlit run main.py
```
Open a web browser and navigate to http://localhost:8501.
In the sidebar, paste your OpenAI API key.
Adjust the chunking parameters (chunk size and overlap) as needed.
Upload your PDF files and click "Process".
Enter a search query to find similar text chunks.
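
As a rough illustration of the search step, the sketch below ranks chunks against a query by cosine similarity and keeps the top k. The app itself relies on OpenAI embeddings and a LangChain vector store. The `embed` function here is a hypothetical word-count stand-in so the example runs without an API key, and `top_k_chunks` is an illustrative name, not a function from this project.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query: str, chunks: List[str], k: int) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first, with scores."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "the cat sat on the mat",
    "stock prices fell sharply",
    "a cat and a dog",
]
for score, chunk in top_k_chunks("cat mat", chunks, k=2):
    print(f"{score:.3f}  {chunk}")
```

With real embeddings the scores reflect meaning rather than shared words, but the ranking and top-k selection work the same way.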

Configuration

API Key: Enter your OpenAI API key in the sidebar.
Chunk Size: Use the slider to select the size of each text chunk.
Chunk Overlap: Use the slider to select the overlap between chunks.
Number of Chunks: Specify how many top similar chunks to display.
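
To see how Chunk Size and Chunk Overlap interact, here is a minimal character-based sketch. It is an assumption made for illustration only: the app may delegate splitting to a LangChain text splitter, whose behavior (for example, splitting on separators) can differ. `chunk_text` is a hypothetical helper, not part of this project.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and less than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap repeats more context across neighboring chunks, at the cost of producing more chunks to embed and search.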

Running on Ports 80 or 443

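Ports 80 and 443 are privileged on most systems, so running the Streamlit app on them usually requires administrative privileges. The commands below use port 80; substitute 443 as needed.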

On Windows

Open Command Prompt as an administrator.
Run the Streamlit app with the desired port:
```
streamlit run main.py --server.port 80
```

On Linux/macOS

Open a terminal.

Run the Streamlit app with sudo:

```
sudo streamlit run main.py --server.port 80
```
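
Before launching, it can help to check whether the current user is able to bind to the port at all. The following is a small sketch using only the Python standard library. `can_bind` is a hypothetical helper, and a `False` result does not distinguish between missing privileges and a port that is already in use.

```python
import socket


def can_bind(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if this process can currently bind a TCP socket to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers PermissionError (privileged port) and "address already in use".
            return False
    return True


if __name__ == "__main__":
    for port in (80, 443, 8501):
        status = "can bind" if can_bind(port) else "cannot bind (privileges or port in use)"
        print(f"port {port}: {status}")
```

If the check fails for port 80 or 443 but succeeds for 8501, elevated privileges are the likely cause.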

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Streamlit for the awesome web app framework
LangChain for the vector store and embedding support
PyPDF2 for PDF text extraction

Link to the App: https://chunker.streamlit.app/
