Skip to content

Contributing

github-actions[bot] edited this page Jun 11, 2026 · 2 revisions

Contributing Guide

Welcome to the Whales Identification project! This guide will help you get started with contributing.


Table of Contents


Getting Started

Prerequisites

  • Python 3.11.6
  • Node.js ≥20.19
  • Docker & Docker Compose
  • Git
  • Poetry
  • Text editor (VS Code, PyCharm, etc.)

Initial Setup

1. Fork and Clone

# Fork on GitHub first, then clone
git clone https://github.com/YOUR_USERNAME/whales-identification.git
cd whales-identification

# Add upstream remote
git remote add upstream https://github.com/0x0000dead/whales-identification.git

2. Backend Setup

cd whales_be_service

# Install Poetry (if not installed)
pip install poetry

# Install dependencies
poetry install

# Install pre-commit hooks
poetry run pre-commit install

# Verify installation
poetry run pytest --version

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Verify
npm run dev

4. Download Models

# From project root
pip install huggingface_hub==0.20.3
./scripts/download_models.sh

5. Verify Setup

# Run tests
cd whales_be_service
poetry run pytest

# Start services
cd ..
docker compose up --build

Development Workflow

Git Workflow

We use Feature Branch Workflow with the following conventions:

Branch Naming

<type>/<short-description>

Types:
- feature/  - New features
- fix/      - Bug fixes
- docs/     - Documentation changes
- refactor/ - Code refactoring
- test/     - Test additions/fixes
- chore/    - Maintenance tasks

Examples:
- feature/add-orca-detection
- fix/login-bug
- docs/update-api-reference
- refactor/optimize-inference

Commit Messages

Follow Conventional Commits:

<type>(<scope>): <subject>

<body>

<footer>

Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Code style (formatting, no logic change)
- refactor: Code refactoring
- test: Tests
- chore: Maintenance

Examples:
feat(api): add batch prediction endpoint

Implement ZIP upload and batch processing for
multiple whale images. Supports up to 100 images
per request.

Closes #42

---

fix(inference): resolve memory leak in model loading

Model was not being properly released after inference,
causing memory usage to grow over time. Now properly
using context managers.

Fixes #67

---

docs(wiki): add API reference documentation

Complete documentation for all API endpoints with
curl examples and response schemas.

Daily Workflow

# 1. Start your day - sync with upstream
git checkout main
git pull upstream main

# 2. Create feature branch
git checkout -b feature/my-new-feature

# 3. Make changes and commit frequently
git add .
git commit -m "feat(scope): description"

# 4. Keep branch updated
git fetch upstream
git rebase upstream/main

# 5. Push to your fork
git push origin feature/my-new-feature

# 6. Open Pull Request on GitHub

# 7. After PR is merged, cleanup
git checkout main
git pull upstream main
git branch -d feature/my-new-feature

Code Style

Python

PEP 8 with Black Formatting

# Good - follows black (line length 88)
def predict_whale_species(
    image_path: str,
    model: CetaceanIdentificationModel,
    config: dict,
) -> Detection:
    """
    Predict whale species from image.

    Args:
        image_path: Path to whale image
        model: Trained model instance
        config: Configuration dictionary

    Returns:
        Detection object with prediction results
    """
    image = load_image(image_path)
    tensor = preprocess(image)

    with torch.no_grad():
        embeddings = model(tensor)
        prediction = model.classify(embeddings)

    return Detection(**prediction)


# Bad - inconsistent formatting
def predict_whale_species(image_path,model,config):
    image=load_image(image_path);tensor=preprocess(image)
    with torch.no_grad():embeddings=model(tensor);prediction=model.classify(embeddings)
    return Detection(**prediction)

Type Hints

# Good - explicit types
from typing import Optional, List, Dict

def process_batch(
    images: List[np.ndarray],
    batch_size: int = 32
) -> List[Detection]:
    results: List[Detection] = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results


# Bad - no type hints
def process_batch(images, batch_size=32):
    results = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results

Docstrings

# Good - Google style docstrings
def calculate_metrics(
    predictions: torch.Tensor,
    targets: torch.Tensor
) -> Dict[str, float]:
    """
    Calculate evaluation metrics.

    Args:
        predictions: Model predictions (N, C)
        targets: Ground truth labels (N,)

    Returns:
        Dictionary with metrics:
            - precision: Precision@1
            - recall: Recall
            - f1: F1-score

    Raises:
        ValueError: If predictions and targets have different lengths

    Examples:
        >>> predictions = torch.randn(10, 5)
        >>> targets = torch.randint(0, 5, (10,))
        >>> metrics = calculate_metrics(predictions, targets)
        >>> print(metrics['precision'])
        0.85
    """
    if len(predictions) != len(targets):
        raise ValueError("Predictions and targets must have same length")

    precision = compute_precision(predictions, targets)
    recall = compute_recall(predictions, targets)
    f1 = 2 * (precision * recall) / (precision + recall)

    return {"precision": precision, "recall": recall, "f1": f1}

TypeScript/React

ESLint + Prettier

// Good - proper formatting and types
interface WhaleDetection {
  imageInd: string;
  classAnimal: string;
  idAnimal: string;
  probability: number;
  bbox: [number, number, number, number];
  mask: string | null;
}

const ResultDisplay: React.FC<{ detection: WhaleDetection }> = ({
  detection,
}) => {
  const confidencePercent = (detection.probability * 100).toFixed(1);

  return (
    <div className="result-container">
      <h2>{detection.idAnimal}</h2>
      <p>Confidence: {confidencePercent}%</p>
      {detection.mask && (
        <img
          src={`data:image/png;base64,${detection.mask}`}
          alt="Whale mask"
        />
      )}
    </div>
  );
};

// Bad - no types, inconsistent formatting
const ResultDisplay = ({ detection }) => {
  const confidencePercent = detection.probability * 100;
  return (
    <div>
      <h2>{detection.idAnimal}</h2>
      <p>Confidence: {confidencePercent}%</p>
      {detection.mask && (
        <img src={`data:image/png;base64,${detection.mask}`} />
      )}
    </div>
  );
};

Pre-commit Hooks

Installation

cd whales_be_service
poetry run pre-commit install

Hooks Configuration

We use 20 pre-commit hooks:

Category Hooks Auto-fix
Formatting black, isort, prettier ✅ Yes
Linting flake8 ❌ No
Type Checking mypy ❌ No
Security bandit ❌ No
Jupyter nbstripout, nbqa-* ✅ Partial
Basic trailing-whitespace, end-of-file-fixer, etc. ✅ Most

See PRE_COMMIT_GUIDE.md for full documentation.

Running Manually

# Run on staged files (automatic on commit)
poetry run pre-commit run

# Run on all files
poetry run pre-commit run --all-files

# Run specific hook
poetry run pre-commit run black --all-files

# Skip hooks (NOT RECOMMENDED)
git commit --no-verify -m "Emergency fix"

Common Hook Failures

Black formatting

# Failure: Code not formatted
# Fix: Run black
poetry run black .
git add .
git commit -m "style: format code with black"

Flake8 linting

# Failure: F401: Module imported but unused
# Fix: Remove unused import
-import pandas as pd  # Not used
+# pandas not needed

# Failure: E501: Line too long (>88 characters)
# Fix: Break line
-very_long_string = "This is an extremely long string that exceeds the maximum line length"
+very_long_string = (
+    "This is an extremely long string "
+    "that exceeds the maximum line length"
+)

Mypy type checking

# Failure: Missing type hints
# Fix: Add type hints
-def process(data):
+def process(data: List[str]) -> Dict[str, int]:
    return {"count": len(data)}

Testing

Running Tests

# All tests
poetry run pytest

# With coverage
poetry run pytest --cov=src --cov-report=term

# Fast tests only (skip slow integration tests)
poetry run pytest -m "not slow"

# Specific test file
poetry run pytest tests/api/test_post_endpoints.py -v

Writing Tests

# tests/unit/test_new_feature.py

import pytest
from whales_be_service.new_feature import new_function

def test_new_function_success():
    """Test new_function with valid input"""
    result = new_function(input_data="test")
    assert result == expected_output

def test_new_function_invalid_input():
    """Test new_function handles invalid input"""
    with pytest.raises(ValueError):
        new_function(input_data=None)

@pytest.mark.slow
def test_new_function_integration():
    """Test new_function with real model (slow)"""
    model = load_full_model()
    result = new_function(model=model, data=test_data)
    assert result is not None

See Testing Guide for comprehensive testing documentation.


Pull Request Process

Before Opening PR

Checklist:

  • Code follows style guide (black, flake8, mypy pass)
  • All tests pass (poetry run pytest)
  • New tests added for new features
  • Documentation updated (docstrings, wiki)
  • Pre-commit hooks pass
  • Branch is up-to-date with main
  • Commit messages follow conventions

Opening PR

  1. Push to your fork:

    git push origin feature/my-feature
  2. Open PR on GitHub:

  3. Fill PR template:

    ## Description
    
    Brief description of changes
    
    ## Type of Change
    
    - [ ] Bug fix
    - [ ] New feature
    - [ ] Breaking change
    - [ ] Documentation update
    
    ## Testing
    
    - [ ] Unit tests added
    - [ ] Integration tests added
    - [ ] Manual testing completed
    
    ## Checklist
    
    - [ ] Code follows style guide
    - [ ] Tests pass
    - [ ] Documentation updated
    
    ## Related Issues
    
    Closes #42

PR Review Process

  1. Automated checks run:

    • Linting (black, flake8, isort, mypy)
    • Security (bandit, safety)
    • Tests (pytest with coverage)
    • Docker build
  2. Code review:

    • At least 1 approval required
    • Reviewers check:
      • Code quality
      • Test coverage
      • Documentation
      • Security
  3. Address feedback:

    # Make changes
    git add .
    git commit -m "fix: address review feedback"
    git push origin feature/my-feature
  4. Merge:

    • After approval, maintainer merges PR
    • Delete feature branch

Code Review Guidelines

As a Reviewer

What to check:

  • ✅ Code correctness and logic
  • ✅ Test coverage (>80% for new code)
  • ✅ Documentation and comments
  • ✅ Security issues (SQL injection, XSS, etc.)
  • ✅ Performance implications
  • ✅ Consistency with existing code

How to provide feedback:

# Good - constructive, specific

Consider using a list comprehension here for better readability:
``python

# Instead of

results = []
for item in items:
results.append(process(item))

# Use

results = [process(item) for item in items]
``

# Bad - vague, unhelpful

This code is bad.

As an Author

Responding to feedback:

# Good - acknowledge, explain, implement

Thanks for the suggestion! You're right that a list comprehension
is cleaner here. I've updated the code in commit abc123.

# Bad - defensive

My code is fine. This is just your opinion.

Common Tasks

Adding a New API Endpoint

  1. Define endpoint in routers.py:
@router.post("/new-endpoint", response_model=NewResponse)
async def new_endpoint(request: NewRequest):
    # Implementation
    return NewResponse(...)
  1. Add Pydantic models:
# response_models.py
class NewRequest(BaseModel):
    field: str

class NewResponse(BaseModel):
    result: str
  1. Write tests:
# tests/api/test_new_endpoint.py
def test_new_endpoint_success(client):
    response = client.post("/new-endpoint", json={"field": "value"})
    assert response.status_code == 200
  1. Update documentation:
    • API Reference wiki page
    • OpenAPI schema (auto-generated by FastAPI)

Adding a New Model

  1. Create model class:
# whales_identify/models/new_model.py
class NewModel(nn.Module):
    def __init__(self, ...):
        ...
  1. Add training script:
# whales_identify/train_new_model.py
def train_new_model():
    ...
  1. Create model card:

    • Add to Model-Cards wiki page
    • Include metrics, intended use, limitations
  2. Add integration:

# whales_be_service/whale_infer.py
if model_type == "new_model":
    self.model = NewModel.load(model_path)

Updating Dependencies

# Backend
cd whales_be_service
poetry add package_name
poetry lock
git add pyproject.toml poetry.lock

# Frontend
cd frontend
npm install package_name
git add package.json package-lock.json

# Commit
git commit -m "chore: add package_name dependency"

Community

Communication Channels

  • GitHub Issues: Bug reports, feature requests
  • GitHub Discussions: Questions, ideas

Getting Help

  1. Check documentation first:

    • Wiki pages
    • README
    • Code comments
  2. Search existing issues:

    • Someone may have had the same problem
  3. Open a new issue:

    • Provide context
    • Include error messages
    • Share minimal reproducible example

License

By contributing, you agree that your contributions will be licensed under the same license as the project:

  • Code: MIT License
  • Models: CC-BY-NC-4.0
  • Data: CC-BY-NC-4.0

See LICENSE, LICENSE_MODELS.md, and LICENSE_DATA.md.


Thank You!

Thank you for contributing to Whales Identification! Your contributions help protect marine mammals. 🐋


Related Pages:

Clone this wiki locally