DeveloperAlex/2025-10-python-ai-learnings

Data Ingestion and Parsing Project

A Python project for ingesting data and parsing documents, built on LangChain, Sentence Transformers, and the ChromaDB and FAISS vector stores.

Features

  • Data ingestion and processing
  • Document parsing capabilities
  • Integration with LangChain and vector databases
  • Support for various document formats (PDF, DOCX)
  • ChromaDB and FAISS for vector storage and retrieval
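As a rough illustration of handling several document formats, the sketch below groups input files by suffix before any parsing happens. The names `SUPPORTED_SUFFIXES`, `detect_format`, and `group_by_format` are hypothetical and are not taken from this repository; only the PDF and DOCX formats come from the list above.

```python
from pathlib import Path

# Assumed mapping for illustration; the README only names PDF and DOCX.
SUPPORTED_SUFFIXES = {".pdf": "pdf", ".docx": "docx"}


def detect_format(path):
    """Return 'pdf' or 'docx' based on the file suffix, or None if unsupported."""
    return SUPPORTED_SUFFIXES.get(Path(path).suffix.lower())


def group_by_format(paths):
    """Group supported files by detected format, skipping everything else."""
    groups = {}
    for p in paths:
        fmt = detect_format(p)
        if fmt is not None:
            groups.setdefault(fmt, []).append(str(p))
    return groups


grouped = group_by_format(["a.pdf", "b.docx", "c.txt", "d.pdf"])
```

Each group could then be handed to a format-specific parser, for example PyPDF2 for the PDF files.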

Dependencies

This project uses the following key libraries:

  • LangChain: Framework for composing LLM application pipelines
  • ChromaDB & FAISS: Vector storage and semantic search (ChromaDB is a vector database; FAISS is a similarity-search library)
  • Sentence Transformers: For text embeddings
  • Document Processing: Support for PDF (PyPDF2) and DOCX files
  • pandas: Data manipulation and analysis
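To make "semantic search" concrete, here is a minimal sketch of the nearest-neighbour lookup that a vector store performs, written with the standard library only. It is a stand-in for the idea, not the ChromaDB or FAISS API, and the three-dimensional vectors in `toy_index` are made-up values rather than real Sentence Transformers embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones would come from an embedding model.
toy_index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

A dedicated vector store does the same ranking over many more vectors, typically with indexing structures that avoid scanning every entry.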

Setup

This project uses uv for dependency management.

  1. Install uv (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh

  2. Create and activate virtual environment:

    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate

  3. Install dependencies:

    uv sync
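After the steps above, a quick sanity check can confirm that the key libraries resolve inside the virtual environment. The sketch below is an assumption about how one might do this, not part of the project: `missing_packages` is a hypothetical helper, and the import names in `KEY_PACKAGES` are the usual ones for the libraries listed under Dependencies (FAISS, for instance, is imported as `faiss`).

```python
import importlib.util

# Assumed top-level import names for the libraries listed under Dependencies.
KEY_PACKAGES = [
    "langchain",
    "chromadb",
    "faiss",
    "sentence_transformers",
    "PyPDF2",
    "pandas",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(KEY_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All key packages are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.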

Usage

Run the main application:

    python main.py

Project Structure

  • main.py - Main application entry point
  • 0-DataIngestParsing/ - Data ingestion and parsing notebooks
    • 1-dataingestion.ipynb - Data ingestion workflows
    • 3-dataparsingdoc.ipynb - Document parsing examples
  • requirements.txt - Legacy requirements file
  • pyproject.toml - Modern Python project configuration

Development

For Jupyter notebook development, ipykernel is included in the dependencies.

License

[Add your license here]
