Data Ingestion and Parsing Project

A Python project for ingesting data and parsing documents with LangChain, Sentence Transformers, and vector stores (ChromaDB, FAISS).

Features

Data ingestion and processing
Document parsing capabilities
Integration with LangChain and vector databases
Support for various document formats (PDF, DOCX)
ChromaDB and FAISS for vector storage and retrieval
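
Before parsed documents reach a vector store, ingestion pipelines of this kind usually split the text into overlapping chunks. The following is a minimal standard-library sketch of that step, not the project's actual code: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical, and in practice a LangChain text splitter would likely fill this role.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.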

Dependencies

This project uses the following key libraries:

LangChain: Framework for building LLM and retrieval pipelines
ChromaDB & FAISS: Vector store (ChromaDB) and similarity-search library (FAISS) for semantic search
Sentence Transformers: For text embeddings
Document Processing: Support for PDF (PyPDF2) and DOCX files
pandas: Data manipulation and analysis
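
To show how the embedding and vector-search pieces listed above fit together, here is a toy sketch that uses only the standard library. It is an illustration under loud assumptions: `embed` is a hypothetical word-count stand-in for a Sentence Transformers model, and the brute-force `search` stands in for a ChromaDB or FAISS lookup. None of these names come from the project.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query (brute force)."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = ["parse pdf files", "store vectors in faiss", "pandas data frames"]
    for score, doc in search("pdf parse", corpus, k=1):
        print(f"{score:.3f}  {doc}")
```

A real embedding model captures meaning rather than exact word overlap, and a vector index avoids scanning every document, but the flow of embed, score, and rank is the same.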

Setup

This project uses uv for dependency management.

Install uv (if not already installed):

curl -LsSf https://astral.sh/uv/install.sh | sh

Create and activate a virtual environment:

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
uv sync
```

Usage

Run the main application:

python main.py

Project Structure

main.py - Main application entry point
0-DataIngestParsing/ - Data ingestion and parsing notebooks
- 1-dataingestion.ipynb - Data ingestion workflows
- 3-dataparsingdoc.ipynb - Document parsing examples
requirements.txt - Legacy requirements file
pyproject.toml - Modern Python project configuration

Development

For Jupyter notebook development, ipykernel is included in the dependencies.

License

[Add your license here]
