Contributing

Contributing Guide

Welcome to the Whales Identification project! This guide will help you get started with contributing.

Getting Started

Prerequisites

Python 3.11.6
Node.js ≥20.19
Docker & Docker Compose
Git
Poetry
Text editor (VS Code, PyCharm, etc.)

Initial Setup

1. Fork and Clone

# Fork on GitHub first, then clone
git clone https://github.com/YOUR_USERNAME/whales-identification.git
cd whales-identification

# Add upstream remote
git remote add upstream https://github.com/0x0000dead/whales-identification.git

2. Backend Setup

cd whales_be_service

# Install Poetry (if not installed)
pip install poetry

# Install dependencies
poetry install

# Install pre-commit hooks
poetry run pre-commit install

# Verify installation
poetry run pytest --version

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Verify
npm run dev

4. Download Models

# From project root
pip install huggingface_hub==0.20.3
./scripts/download_models.sh

5. Verify Setup

# Run tests
cd whales_be_service
poetry run pytest

# Start services
cd ..
docker compose up --build

Development Workflow

Git Workflow

We use a feature-branch workflow with the following conventions:

Branch Naming

<type>/<short-description>

Types:
- feature/  - New features
- fix/      - Bug fixes
- docs/     - Documentation changes
- refactor/ - Code refactoring
- test/     - Test additions/fixes
- chore/    - Maintenance tasks

Examples:
- feature/add-orca-detection
- fix/login-bug
- docs/update-api-reference
- refactor/optimize-inference
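
The naming convention above can be checked mechanically, for example in a local hook. The following is a minimal sketch and not part of the repository: the name `is_valid_branch_name` is hypothetical, and the rule that descriptions are lowercase words joined by hyphens is an assumption inferred from the examples.

```python
import re

# Branch types listed in this guide.
BRANCH_TYPES = ("feature", "fix", "docs", "refactor", "test", "chore")

# Assumption: the description consists of lowercase alphanumeric words
# joined by single hyphens, as in the examples; the guide does not state it.
_BRANCH_RE = re.compile(
    r"(?:" + "|".join(BRANCH_TYPES) + r")/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` has the form <type>/<short-description>."""
    return _BRANCH_RE.fullmatch(name) is not None
```

With this sketch, `is_valid_branch_name("fix/login-bug")` is true, while a name with an unlisted type such as `hotfix/urgent` is rejected.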

Commit Messages

Follow Conventional Commits:

<type>(<scope>): <subject>

<body>

<footer>

Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Code style (formatting, no logic change)
- refactor: Code refactoring
- test: Tests
- chore: Maintenance

Examples:
feat(api): add batch prediction endpoint

Implement ZIP upload and batch processing for
multiple whale images. Supports up to 100 images
per request.

Closes #42

---

fix(inference): resolve memory leak in model loading

Model was not being properly released after inference,
causing memory usage to grow over time. Now properly
using context managers.

Fixes #67

---

docs(wiki): add API reference documentation

Complete documentation for all API endpoints with
curl examples and response schemas.
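
The header format can also be validated with a short script. This is a sketch under stated assumptions rather than a project tool: `parse_commit_header` is a hypothetical name, and the scope is treated as optional because other examples in this guide (such as `fix: address review feedback`) omit it.

```python
import re
from typing import Dict, Optional

# Commit types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>(<scope>): <subject>, with the scope treated as optional (assumption).
_HEADER_RE = re.compile(
    r"(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r": (?P<subject>\S.*)"
)


def parse_commit_header(header: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a commit header into type, scope, and subject, or return None."""
    match = _HEADER_RE.fullmatch(header)
    return match.groupdict() if match else None
```

Only the first line of the message is checked; the body and footer are free-form.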

Daily Workflow

# 1. Start your day - sync with upstream
git checkout main
git pull upstream main

# 2. Create feature branch
git checkout -b feature/my-new-feature

# 3. Make changes and commit frequently
git add .
git commit -m "feat(scope): description"

# 4. Keep branch updated
git fetch upstream
git rebase upstream/main

# 5. Push to your fork
#    (if the branch was pushed before the rebase, add --force-with-lease)
git push origin feature/my-new-feature

# 6. Open Pull Request on GitHub

# 7. After PR is merged, cleanup
git checkout main
git pull upstream main
git branch -d feature/my-new-feature

Code Style

Python

PEP 8 with Black Formatting

# Good - follows black (line length 88)
def predict_whale_species(
    image_path: str,
    model: CetaceanIdentificationModel,
    config: dict,
) -> Detection:
    """
    Predict whale species from image.

    Args:
        image_path: Path to whale image
        model: Trained model instance
        config: Configuration dictionary

    Returns:
        Detection object with prediction results
    """
    image = load_image(image_path)
    tensor = preprocess(image)

    with torch.no_grad():
        embeddings = model(tensor)
        prediction = model.classify(embeddings)

    return Detection(**prediction)


# Bad - inconsistent formatting
def predict_whale_species(image_path,model,config):
    image=load_image(image_path);tensor=preprocess(image)
    with torch.no_grad():embeddings=model(tensor);prediction=model.classify(embeddings)
    return Detection(**prediction)

Type Hints

# Good - explicit types
from typing import List

import numpy as np

def process_batch(
    images: List[np.ndarray],
    batch_size: int = 32,
) -> List[Detection]:
    results: List[Detection] = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results


# Bad - no type hints
def process_batch(images, batch_size=32):
    results = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results
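
Both versions call a `create_batches` helper that this guide does not define. The following is a minimal, fully typed sketch of what such a helper might look like, assuming it simply slices a list into consecutive chunks; the real helper in the repository, if there is one, may differ.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def create_batches(items: List[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]
```

The `TypeVar` keeps the element type intact, so a type checker can infer that batching a `List[np.ndarray]` yields `List[np.ndarray]` batches.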

Docstrings

# Good - Google style docstrings
from typing import Dict

import torch

def calculate_metrics(
    predictions: torch.Tensor,
    targets: torch.Tensor
) -> Dict[str, float]:
    """
    Calculate evaluation metrics.

    Args:
        predictions: Model predictions (N, C)
        targets: Ground truth labels (N,)

    Returns:
        Dictionary with metrics:
            - precision: Precision@1
            - recall: Recall
            - f1: F1-score

    Raises:
        ValueError: If predictions and targets have different lengths

    Examples:
        >>> predictions = torch.randn(10, 5)
        >>> targets = torch.randint(0, 5, (10,))
        >>> metrics = calculate_metrics(predictions, targets)
        >>> sorted(metrics)
        ['f1', 'precision', 'recall']
    """
    if len(predictions) != len(targets):
        raise ValueError("Predictions and targets must have same length")

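    precision = compute_precision(predictions, targets)
    recall = compute_recall(predictions, targets)
    # Guard against division by zero when precision and recall are both 0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0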

    return {"precision": precision, "recall": recall, "f1": f1}

TypeScript/React

ESLint + Prettier

// Good - proper formatting and types
interface WhaleDetection {
  imageInd: string;
  classAnimal: string;
  idAnimal: string;
  probability: number;
  bbox: [number, number, number, number];
  mask: string | null;
}

const ResultDisplay: React.FC<{ detection: WhaleDetection }> = ({
  detection,
}) => {
  const confidencePercent = (detection.probability * 100).toFixed(1);

  return (
    <div className="result-container">
      <h2>{detection.idAnimal}</h2>
      <p>Confidence: {confidencePercent}%</p>
      {detection.mask && (
        <img
          src={`data:image/png;base64,${detection.mask}`}
          alt="Whale mask"
        />
      )}
    </div>
  );
};

// Bad - no types, inconsistent formatting
const ResultDisplay = ({ detection }) => {
  const confidencePercent = detection.probability * 100;
  return (
    <div>
      <h2>{detection.idAnimal}</h2>
      <p>Confidence: {confidencePercent}%</p>
      {detection.mask && (
        <img src={`data:image/png;base64,${detection.mask}`} />
      )}
    </div>
  );
};

Pre-commit Hooks

Installation

cd whales_be_service
poetry run pre-commit install

Hooks Configuration

We use 20 pre-commit hooks:

Category	Hooks	Auto-fix
Formatting	black, isort, prettier	✅ Yes
Linting	flake8	❌ No
Type Checking	mypy	❌ No
Security	bandit	❌ No
Jupyter	nbstripout, nbqa-*	✅ Partial
Basic	trailing-whitespace, end-of-file-fixer, etc.	✅ Most

See PRE_COMMIT_GUIDE.md for full documentation.

Running Manually

# Run on staged files (automatic on commit)
poetry run pre-commit run

# Run on all files
poetry run pre-commit run --all-files

# Run specific hook
poetry run pre-commit run black --all-files

# Skip hooks (NOT RECOMMENDED)
git commit --no-verify -m "Emergency fix"

Common Hook Failures

Black formatting

# Failure: Code not formatted
# Fix: Run black
poetry run black .
git add .
git commit -m "style: format code with black"

Flake8 linting

# Failure: F401: Module imported but unused
# Fix: Remove unused import
-import pandas as pd  # Not used
+# pandas not needed

# Failure: E501: Line too long (>88 characters)
# Fix: Break line
-very_long_string = "This is an extremely long string that exceeds the maximum line length"
+very_long_string = (
+    "This is an extremely long string "
+    "that exceeds the maximum line length"
+)

Mypy type checking

# Failure: Missing type hints
# Fix: Add type hints
-def process(data):
+def process(data: List[str]) -> Dict[str, int]:
    return {"count": len(data)}
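
The `+` line in the mypy fix uses `List` and `Dict`, which must be imported from `typing` (on Python 3.9+ the built-in `list` and `dict` generics work as well). A complete version of the fixed function, as a small sketch:

```python
from typing import Dict, List


def process(data: List[str]) -> Dict[str, int]:
    """Return the number of items in `data` under the key "count"."""
    return {"count": len(data)}
```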

Testing

Running Tests

# All tests
poetry run pytest

# With coverage
poetry run pytest --cov=src --cov-report=term

# Fast tests only (skip slow integration tests)
poetry run pytest -m "not slow"

# Specific test file
poetry run pytest tests/api/test_post_endpoints.py -v

Writing Tests

# tests/unit/test_new_feature.py
# Template: new_function, expected_output, load_full_model, and test_data are placeholders

import pytest
from whales_be_service.new_feature import new_function

def test_new_function_success():
    """Test new_function with valid input"""
    result = new_function(input_data="test")
    assert result == expected_output

def test_new_function_invalid_input():
    """Test new_function handles invalid input"""
    with pytest.raises(ValueError):
        new_function(input_data=None)

@pytest.mark.slow
def test_new_function_integration():
    """Test new_function with real model (slow)"""
    model = load_full_model()
    result = new_function(model=model, data=test_data)
    assert result is not None
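
The `-m "not slow"` selection relies on the `slow` marker used above. This guide does not say where that marker is declared; if it is not registered anywhere, pytest warns about an unknown marker. One way to register it is a `conftest.py` hook. The following is a sketch, and both the file location and the description text are assumptions.

```python
# conftest.py (hypothetical location: the root of the tests directory)


def pytest_configure(config):
    """Register the custom `slow` marker used by slow integration tests."""
    config.addinivalue_line(
        "markers", "slow: marks tests as slow (deselect with -m 'not slow')"
    )
```

pytest calls `pytest_configure` once at startup, so the marker is known before test collection begins.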

See Testing Guide for comprehensive testing documentation.

Pull Request Process

Before Opening PR

Checklist:

Code follows style guide (black, flake8, mypy pass)
All tests pass (poetry run pytest)
New tests added for new features
Documentation updated (docstrings, wiki)
Pre-commit hooks pass
Branch is up-to-date with main
Commit messages follow conventions
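
The automated parts of this checklist can be run in one go. The following is a minimal sketch, not a script shipped with the project: `CHECKS` and `run_checks` are hypothetical names, the commands are the ones shown earlier in this guide, and the `runner` parameter exists only so the function can be exercised without invoking Poetry.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands taken from this guide; run them from whales_be_service/.
CHECKS: List[List[str]] = [
    ["poetry", "run", "pre-commit", "run", "--all-files"],
    ["poetry", "run", "pytest"],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> List[Sequence[str]]:
    """Run each command in order and return those that exited non-zero."""
    return [cmd for cmd in checks if runner(cmd) != 0]
```

Calling `run_checks()` from `whales_be_service/` returns an empty list when every check passes; any returned command is one to fix before opening the PR.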

Opening PR

Push to your fork:
```
git push origin feature/my-feature
```
Open PR on GitHub:
- Navigate to https://github.com/0x0000dead/whales-identification
- Click "Pull requests" → "New pull request"
- Select your branch

Fill in the PR template:

## Description

Brief description of changes

## Type of Change

- [ ] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation update

## Testing

- [ ] Unit tests added
- [ ] Integration tests added
- [ ] Manual testing completed

## Checklist

- [ ] Code follows style guide
- [ ] Tests pass
- [ ] Documentation updated

## Related Issues

Closes #42

PR Review Process

Automated checks run:
- Linting (black, flake8, isort, mypy)
- Security (bandit, safety)
- Tests (pytest with coverage)
- Docker build

Code review:
- At least 1 approval required
- Reviewers check:
  - Code quality
  - Test coverage
  - Documentation
  - Security

Address feedback:

# Make changes
git add .
git commit -m "fix: address review feedback"
git push origin feature/my-feature

Merge:
- After approval, maintainer merges PR
- Delete feature branch

Code Review Guidelines

As a Reviewer

What to check:

✅ Code correctness and logic
✅ Test coverage (>80% for new code)
✅ Documentation and comments
✅ Security issues (SQL injection, XSS, etc.)
✅ Performance implications
✅ Consistency with existing code

How to provide feedback:

# Good - constructive, specific

Consider using a list comprehension here for better readability:

```python
# Instead of
results = []
for item in items:
    results.append(process(item))

# Use
results = [process(item) for item in items]
```

# Bad - vague, unhelpful

This code is bad.

As an Author

Responding to feedback:

# Good - acknowledge, explain, implement

Thanks for the suggestion! You're right that a list comprehension
is cleaner here. I've updated the code in commit abc123.

# Bad - defensive

My code is fine. This is just your opinion.

Common Tasks

Adding a New API Endpoint

Define endpoint in routers.py:

@router.post("/new-endpoint", response_model=NewResponse)
async def new_endpoint(request: NewRequest):
    # Implementation
    return NewResponse(...)

Add Pydantic models:

# response_models.py
class NewRequest(BaseModel):
    field: str

class NewResponse(BaseModel):
    result: str

Write tests:

# tests/api/test_new_endpoint.py
def test_new_endpoint_success(client):
    response = client.post("/new-endpoint", json={"field": "value"})
    assert response.status_code == 200

Update documentation:
- API Reference wiki page
- OpenAPI schema (auto-generated by FastAPI)

Adding a New Model

Create model class:

# whales_identify/models/new_model.py
class NewModel(nn.Module):
    def __init__(self, ...):
        ...

Add training script:

# whales_identify/train_new_model.py
def train_new_model():
    ...

Create model card:
- Add to Model-Cards wiki page
- Include metrics, intended use, limitations

Add integration:

# whales_be_service/whale_infer.py
if model_type == "new_model":
    self.model = NewModel.load(model_path)

Updating Dependencies

# Backend
cd whales_be_service
poetry add package_name
poetry lock
git add pyproject.toml poetry.lock

# Frontend
cd frontend
npm install package_name
git add package.json package-lock.json

# Commit
git commit -m "chore: add package_name dependency"

Community

Communication Channels

GitHub Issues: Bug reports, feature requests
GitHub Discussions: Questions, ideas

Getting Help

Check documentation first:
- Wiki pages
- README
- Code comments

Search existing issues:
- Someone may have had the same problem

Open a new issue:
- Provide context
- Include error messages
- Share a minimal reproducible example

License

By contributing, you agree that your contributions will be licensed under the same licenses as the project:

Code: MIT License
Models: CC-BY-NC-4.0
Data: CC-BY-NC-4.0

See LICENSE, LICENSE_MODELS.md, and LICENSE_DATA.md.

Thank You!

Thank you for contributing to Whales Identification! Your contributions help protect marine mammals. 🐋

Related Pages:

Installation - Setup development environment
Testing - Testing guidelines
Architecture - System design
API Reference - API documentation
