# Packaging Guide for D2D Project

This document provides a detailed guide on the structure of the `setup.py` file for the "d2d" package, how to generate a `.whl` file, and how to install the package in a clean environment.

## Understanding the `setup.py` Structure

The `setup.py` file is the configuration script for packaging a Python project using `setuptools`. It defines metadata, dependencies, and package structure for the "d2d" (Dialogue2Data) project, which transforms interview transcripts into structured data.

### Key Components of `setup.py`

- **Imports**:
  - `from setuptools import setup, find_packages`: Imports the `setup` function to configure the package and `find_packages` to automatically discover all packages in the specified directory.
  - The built-in `open` function, which needs no import, reads the `README.md` file for the long description.

- **Reading the README**:
  ```python
  with open("README.md", "r", encoding="utf-8") as fh:
      long_description = fh.read()
  ```
  - Opens the `README.md` file in read mode with UTF-8 encoding to include its content as the package's long description.

- **Setup Function**:
  The `setup()` function from `setuptools` defines the package metadata and configuration:
  - `name="d2d"`: The package name, used for distribution and installation (e.g., via `pip install d2d` once the package is published to PyPI).
  - `version="0.2.8"`: The current version of the package, following semantic versioning (major.minor.patch).
  - `author="Sienko Ikhabi, Dominic Lam, Yun Zhou, Wangkai Zhu"`: Lists the authors of the project.
  - `author_email="your.email@example.com"`: A contact email for the maintainers (should be updated to a valid email).
  - `description="Dialogue2Data: Transform interview transcripts into structured data"`: A short summary of the package's purpose.
  - `long_description=long_description`: Uses the content of `README.md` for a detailed description.
  - `long_description_content_type="text/markdown"`: Specifies the format of the long description as Markdown.
  - `url="https://github.com/avalanche-strategy/D2D"`: The URL to the project’s repository (update to the actual repo URL).
  - `package_dir={"": "src"}`: Maps the root of the package namespace to the `src` folder, so top-level packages such as `d2d` are looked up under `src/` rather than in the project root.
  - `packages=find_packages(where="src")`: Automatically finds all packages (e.g., subdirectories with `__init__.py`) in the `src` directory.
  - `include_package_data=True`: Includes non-Python files (e.g., data files) that live inside a package directory and are matched by a rule in `MANIFEST.in`.
  - `install_requires=[...]`: Lists dependencies required for the package to function:
    - `torch>=1.10.0`: For tensor operations with sentence transformers.
    - `sentence-transformers>=2.2.0`: For embedding models.
    - `litellm>=1.0.0`: For LLM interactions.
    - `python-dotenv>=0.21.0`: For environment variable management.
    - `pandas>=1.5.0`: For data manipulation.
    - `numpy>=1.23.0`: For numerical operations.
    - `tqdm>=4.65.0`: For progress bars.
    - `openai>=1.0.0`: For OpenAI API client.
    - `ragas>=0.1.0`: For evaluation (e.g., `run_ragas_evaluation`).
    - `rapidfuzz>=3.9.0`: For fuzzy string matching.
  - `classifiers=[...]`: Metadata tags for PyPI:
    - Indicates compatibility with Python 3.
    - Specifies the license as Apache Software License.
    - Notes the package is OS-independent.
  - `python_requires=">=3.12"`: Restricts installation to Python 3.12 or higher; `pip` refuses to install the package on older interpreters.

### Directory Structure Assumption
The `setup.py` assumes the following structure:
```
project_root/
├── src/
│   └── d2d/
│       ├── __init__.py
│       └── (other modules, e.g., processor.py)
├── README.md
├── setup.py
└── (optional) MANIFEST.in
```
- The `src` directory contains the package code.
- `README.md` provides documentation.
- `MANIFEST.in` (if present) lists additional files to include (e.g., data files).
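
To make the layout assumption concrete, the sketch below mimics the rule described for `find_packages(where="src")`: a directory counts as a package only if it and all of its parent directories under `src/` contain an `__init__.py`. It is a simplified stand-in written with the standard library, not the setuptools implementation, and `discover_packages` is a hypothetical helper name.

```python
from pathlib import Path


def discover_packages(src_dir):
    """Return dotted names of package directories found under src_dir.

    Simplified illustration of the discovery rule; setuptools'
    find_packages() has more options (include/exclude patterns).
    """
    root = Path(src_dir)
    found = []
    for init_file in root.rglob("__init__.py"):
        package_dir = init_file.parent
        if package_dir == root:
            continue  # the src folder itself is not a package
        parts = package_dir.relative_to(root).parts
        # Every ancestor below src/ must also be a package.
        chain_ok = all(
            (root.joinpath(*parts[: depth + 1]) / "__init__.py").is_file()
            for depth in range(len(parts))
        )
        if chain_ok:
            found.append(".".join(parts))
    return sorted(found)
```

Calling `discover_packages("src")` from the project root should list `d2d` plus any subpackages. If it returns an empty list, the most likely cause is a missing `src/d2d/__init__.py`.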

## Generating a `.whl` File

A `.whl` (Wheel) file is a built distribution format for Python packages. It installs faster than a source distribution because no build step runs at install time. Follow these steps to generate it:

### Prerequisites
- Ensure Python 3.12 or higher is installed.
- Install the required build tools:
  ```bash
  pip install setuptools wheel
  ```

### Steps to Generate `.whl`
1. **Navigate to Project Directory**:
   - Open a terminal and change to the directory containing `setup.py`:
     ```bash
     cd /path/to/project_root
     ```
2. **Run the Build Command**:
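   - Invoking `setup.py` directly still works but is deprecated by setuptools; `python -m build` (from the `build` package) is the currently recommended equivalent. Execute the following command to create both a source distribution (`.tar.gz`) and a wheel distribution (`.whl`):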
     ```bash
     python setup.py sdist bdist_wheel
     ```
3. **Output**:
   - The command generates files in the `dist/` directory, e.g.:
     - `d2d-0.2.8.tar.gz`: Source distribution.
     - `d2d-0.2.8-py3-none-any.whl`: Wheel file. The `py3-none-any` suffix holds three tags: `py3` (any Python 3 interpreter), `none` (no ABI requirement), and `any` (platform-independent).
4. **Verify**:
   - Check the `dist/` directory for the generated files:
     ```bash
     ls dist/
     ```
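
The wheel filename follows a fixed pattern of dash-separated fields: distribution name, version, an optional build tag, then the Python, ABI, and platform tags. The sketch below splits a filename into those fields so the tags can be checked programmatically. `parse_wheel_filename` is an illustrative helper, not part of `pip` or `setuptools`.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between version and Python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("d2d-0.2.8-py3-none-any.whl")
print(info["name"], info["version"], info["python"], info["abi"], info["platform"])
```

A pure-Python package like this one should show `none` and `any` for the ABI and platform fields. Anything else suggests the build picked up compiled extensions.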

### Notes
- Ensure all dependencies listed in `install_requires` are correct and available on PyPI.
- If additional files (e.g., data, configs) need to be included, create a `MANIFEST.in` file (e.g., `include src/d2d/*.txt` for text files).
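
One way to confirm that extra files actually made it into the built wheel is to list the archive's contents. A wheel is an ordinary ZIP archive, so the standard `zipfile` module can read it. The sketch below is a minimal check; `wheel_members` is a hypothetical helper name, and the `.txt` suffix simply mirrors the `MANIFEST.in` example above.

```python
import zipfile


def wheel_members(wheel_path, suffix=""):
    """Return the sorted member names of a wheel, optionally filtered by suffix."""
    with zipfile.ZipFile(wheel_path) as archive:
        return sorted(
            name for name in archive.namelist() if name.endswith(suffix)
        )
```

For example, `wheel_members("dist/d2d-0.2.8-py3-none-any.whl", ".txt")` should return the text files shipped in the wheel. An empty list means the files were not picked up by the build.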

## Installing the Package

To test or use the package, install it in a clean environment to avoid dependency conflicts. We’ll use `conda` to create a virtual environment and `pip` to install the wheel file.

### Steps to Install

1. **Create a Clean Environment**:
   - Use `conda` to create a new environment with Python 3.12:
     ```bash
     conda create -n d2d-test python=3.12
     ```
   - This creates an isolated environment named `d2d-test` with Python 3.12.

2. **Activate the Environment**:
   - Activate the environment to work within it:
     ```bash
     conda activate d2d-test
     ```
   - Your terminal prompt should now show `(d2d-test)`.

3. **Install the Package**:
   - Run the command from the project root so that the relative path `dist/d2d-0.2.8-py3-none-any.whl` resolves. If you copied the `.whl` file elsewhere, pass that path to `pip` instead.
   - Install the wheel file using `pip`:
     ```bash
     pip install dist/d2d-0.2.8-py3-none-any.whl
     ```
   - This installs the `d2d` package together with the dependencies declared in `install_requires`, which `pip` downloads from PyPI.

4. **Verify Installation**:
   - Test the package by importing a key component, such as `D2DProcessor`:
     ```python
     from d2d import D2DProcessor
     ```
   - If no errors occur, the package is installed correctly.
   - You can now use `D2DProcessor` or other modules for transforming interview transcripts into structured data.
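
Beyond the import check, the installed version of `d2d` and of its dependencies can be queried with the standard `importlib.metadata` module. The sketch below reports `None` for anything that is not installed in the active environment. The helper names are illustrative, and the list of distributions is a subset of `install_requires` chosen for brevity.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_report(distributions):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


# Run inside the activated d2d-test environment.
for name, version in version_report(["d2d", "torch", "pandas", "rapidfuzz"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

In a correctly set-up environment, `d2d` should report the version from `setup.py` (`0.2.8` in this guide), and each dependency should report a version at or above its declared minimum.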

### Notes
- Ensure the wheel file name matches the version (e.g., `d2d-0.2.8-py3-none-any.whl`).
- If issues arise, verify that Python 3.12 is active (`python --version`) and that dependencies are compatible.
- To distribute the package, upload the `.whl` and `.tar.gz` files to PyPI using a tool like `twine` (requires additional setup).

## Conclusion
This guide covers the structure of the `setup.py` file for the "d2d" package, how to generate a `.whl` file using `python setup.py sdist bdist_wheel`, and how to install it in a clean `conda` environment. Follow these steps to package, distribute, and use the Dialogue2Data tool effectively.