# Contributing
Welcome to the Whales Identification project! This guide will help you get started with contributing.
- Getting Started
- Development Workflow
- Code Style
- Pre-commit Hooks
- Testing
- Pull Request Process
- Code Review Guidelines
- Common Tasks
## Getting Started

Prerequisites:

- Python 3.11.6
- Node.js ≥20.19
- Docker & Docker Compose
- Git
- Poetry
- Text editor (VS Code, PyCharm, etc.)
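If you want a quick sanity check of your toolchain, the sketch below compares what is on your `PATH` against the list above. It is a hypothetical helper, not a script shipped in the repository. It assumes the tools are invoked as `node`, `docker`, `git` and `poetry`. It checks only the Python minor version (3.11), not the exact 3.11.6 patch release, and it does not verify the Docker Compose plugin.

```python
import re
import shutil
import subprocess
import sys
from typing import List, Tuple

MIN_NODE = (20, 19)       # "Node.js ≥20.19"
PYTHON_MINOR = (3, 11)    # the list pins 3.11.6; this sketch only checks 3.11.x
TOOLS = ("node", "docker", "git", "poetry")


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract a version tuple from output such as 'v20.19.0'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def node_ok(version_text: str) -> bool:
    """True if `node --version` output is at least MIN_NODE."""
    try:
        return parse_version(version_text)[: len(MIN_NODE)] >= MIN_NODE
    except ValueError:
        return False


def check_prerequisites() -> List[str]:
    """Return human-readable problems (an empty list if everything looks fine)."""
    problems = []
    if sys.version_info[:2] != PYTHON_MINOR:
        problems.append(f"Python {sys.version.split()[0]} found, 3.11.x expected")
    for tool in TOOLS:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("node"):
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=False
        )
        if not node_ok(result.stdout):
            problems.append(f"Node.js {result.stdout.strip()} is older than 20.19")
    return problems


if __name__ == "__main__":
    for line in check_prerequisites() or ["All prerequisites found."]:
        print(line)
```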
Fork the repository on GitHub, then clone your fork and add the upstream remote:

```bash
# Fork on GitHub first, then clone
git clone https://github.com/YOUR_USERNAME/whales-identification.git
cd whales-identification

# Add upstream remote
git remote add upstream https://github.com/0x0000dead/whales-identification.git
```

Set up the backend:

```bash
cd whales_be_service

# Install Poetry (if not installed)
pip install poetry

# Install dependencies
poetry install

# Install pre-commit hooks
poetry run pre-commit install

# Verify installation
poetry run pytest --version
```

Set up the frontend (from the project root):

```bash
cd frontend

# Install dependencies
npm install

# Verify
npm run dev
```

Download the models:

```bash
# From project root
pip install huggingface_hub==0.20.3
./scripts/download_models.sh
```

Verify the setup:

```bash
# Run tests
cd whales_be_service
poetry run pytest

# Start services
cd ..
docker compose up --build
```

## Development Workflow

We use the Feature Branch Workflow. Branch names follow this convention:
```text
<type>/<short-description>
```
Types:
- feature/ - New features
- fix/ - Bug fixes
- docs/ - Documentation changes
- refactor/ - Code refactoring
- test/ - Test additions/fixes
- chore/ - Maintenance tasks
Examples:
- feature/add-orca-detection
- fix/login-bug
- docs/update-api-reference
- refactor/optimize-inference
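A branch name can be checked against this convention with a regular expression. The helper below is a hypothetical sketch; there is no such script in the repository. It assumes that the description is lowercase letters and digits joined by single hyphens, a rule inferred from the examples above. You could call it from a local Git hook if you want early feedback.

```python
import re

# Branch types from the list above.
BRANCH_TYPES = ("feature", "fix", "docs", "refactor", "test", "chore")

# <type>/<short-description>, where the description is assumed to be
# lowercase words (letters/digits) joined by single hyphens.
BRANCH_RE = re.compile(
    rf"(?:{'|'.join(BRANCH_TYPES)})/[a-z0-9]+(?:-[a-z0-9]+)*"
)


def is_valid_branch(name: str) -> bool:
    """Return True if `name` follows the <type>/<short-description> convention."""
    return BRANCH_RE.fullmatch(name) is not None
```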
Commit messages follow Conventional Commits:

```text
<type>(<scope>): <subject>

<body>

<footer>
```
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Code style (formatting, no logic change)
- refactor: Code refactoring
- test: Tests
- chore: Maintenance
Examples:

```text
feat(api): add batch prediction endpoint

Implement ZIP upload and batch processing for
multiple whale images. Supports up to 100 images
per request.

Closes #42
```

```text
fix(inference): resolve memory leak in model loading

Model was not being properly released after inference,
causing memory usage to grow over time. Now properly
using context managers.

Fixes #67
```

```text
docs(wiki): add API reference documentation

Complete documentation for all API endpoints with
curl examples and response schemas.
```
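The header line can be expressed as a regular expression. This is a minimal sketch of a `commit-msg` check, and it is not one of the project's configured hooks. It assumes that only the seven types listed above are allowed and that the scope is optional, as it is in the Conventional Commits spec. It validates the first line only. Git passes a `commit-msg` hook the path of the file holding the proposed message, which is why `check_message_file` takes a path.

```python
import re
from typing import Dict, Optional

# Types from the commit type list above; anything else is rejected.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# <type>(<scope>): <subject>, where the "(<scope>)" part is optional.
HEADER_RE = re.compile(
    rf"(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"(?:\((?P<scope>[\w-]+)\))?"
    r": (?P<subject>\S.*)"
)


def parse_header(message: str) -> Optional[Dict[str, Optional[str]]]:
    """Parse the first line of a commit message; return None if it is malformed."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_RE.fullmatch(header)
    return match.groupdict() if match else None


def check_message_file(path: str) -> bool:
    """Validate the message file that Git hands to a commit-msg hook."""
    with open(path, encoding="utf-8") as fh:
        return parse_header(fh.read()) is not None
```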
Typical day-to-day workflow:

```bash
# 1. Start your day - sync with upstream
git checkout main
git pull upstream main

# 2. Create feature branch
git checkout -b feature/my-new-feature

# 3. Make changes and commit frequently
git add .
git commit -m "feat(scope): description"

# 4. Keep branch updated
git fetch upstream
git rebase upstream/main

# 5. Push to your fork
# (after rebasing a branch you already pushed, use --force-with-lease)
git push origin feature/my-new-feature

# 6. Open Pull Request on GitHub

# 7. After PR is merged, cleanup
git checkout main
git pull upstream main
git branch -d feature/my-new-feature
```

## Code Style

Python formatting follows black (line length 88):

```python
import torch

# CetaceanIdentificationModel, Detection, load_image and preprocess are
# project-level names (not shown here).


# Good - follows black (line length 88)
def predict_whale_species(
    image_path: str,
    model: CetaceanIdentificationModel,
    config: dict,
) -> Detection:
    """
    Predict whale species from image.

    Args:
        image_path: Path to whale image
        model: Trained model instance
        config: Configuration dictionary

    Returns:
        Detection object with prediction results
    """
    image = load_image(image_path)
    tensor = preprocess(image)
    with torch.no_grad():
        embeddings = model(tensor)
        prediction = model.classify(embeddings)
    return Detection(**prediction)


# Bad - inconsistent formatting
def predict_whale_species(image_path,model,config):
    image=load_image(image_path);tensor=preprocess(image)
    with torch.no_grad():embeddings=model(tensor);prediction=model.classify(embeddings)
    return Detection(**prediction)
```

Type hints should be explicit:

```python
from typing import List

import numpy as np


# Good - explicit types
def process_batch(
    images: List[np.ndarray],
    batch_size: int = 32,
) -> List[Detection]:
    results: List[Detection] = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results


# Bad - no type hints
def process_batch(images, batch_size=32):
    results = []
    for batch in create_batches(images, batch_size):
        predictions = model.predict(batch)
        results.extend(predictions)
    return results
```

Docstrings use Google style:

```python
from typing import Dict

import torch


# Good - Google style docstrings
def calculate_metrics(
    predictions: torch.Tensor,
    targets: torch.Tensor,
) -> Dict[str, float]:
    """
    Calculate evaluation metrics.

    Args:
        predictions: Model predictions (N, C)
        targets: Ground truth labels (N,)

    Returns:
        Dictionary with metrics:
            - precision: Precision@1
            - recall: Recall
            - f1: F1-score

    Raises:
        ValueError: If predictions and targets have different lengths

    Examples:
        >>> predictions = torch.randn(10, 5)
        >>> targets = torch.randint(0, 5, (10,))
        >>> metrics = calculate_metrics(predictions, targets)
        >>> sorted(metrics)
        ['f1', 'precision', 'recall']
    """
    if len(predictions) != len(targets):
        raise ValueError("Predictions and targets must have same length")
    precision = compute_precision(predictions, targets)
    recall = compute_recall(predictions, targets)
    # Guard against division by zero when both metrics are 0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

TypeScript / React:

```tsx
import React from "react";

// Good - proper formatting and types
interface WhaleDetection {
  imageInd: string;
  classAnimal: string;
  idAnimal: string;
  probability: number;
  bbox: [number, number, number, number];
  mask: string | null;
}

const ResultDisplay: React.FC<{ detection: WhaleDetection }> = ({
  detection,
}) => {
  const confidencePercent = (detection.probability * 100).toFixed(1);

  return (
    <div className="result-container">
      <h2>{detection.idAnimal}</h2>
      <p>Confidence: {confidencePercent}%</p>
      {detection.mask && (
        <img
          src={`data:image/png;base64,${detection.mask}`}
          alt="Whale mask"
        />
      )}
    </div>
  );
};

// Bad - no types, inconsistent formatting
const ResultDisplay = ({ detection }) => {
const confidencePercent = detection.probability * 100;
return (
<div>
<h2>{detection.idAnimal}</h2>
<p>Confidence: {confidencePercent}%</p>
{detection.mask && (
<img src={`data:image/png;base64,${detection.mask}`} />
)}
</div>
);
};
```

## Pre-commit Hooks

Install the hooks (already done if you followed Getting Started):

```bash
cd whales_be_service
poetry run pre-commit install
```

We use 20 pre-commit hooks:
| Category | Hooks | Auto-fix |
|---|---|---|
| Formatting | black, isort, prettier | ✅ Yes |
| Linting | flake8 | ❌ No |
| Type Checking | mypy | ❌ No |
| Security | bandit | ❌ No |
| Jupyter | nbstripout, nbqa-* | ✅ Partial |
| Basic | trailing-whitespace, end-of-file-fixer, etc. | ✅ Most |
See PRE_COMMIT_GUIDE.md for full documentation.
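The "Auto-fix" column means the hook rewrites files in place. pre-commit then reports the hook as failed because files were modified, so you can review and re-stage the result. The sketch below imitates that behaviour for trailing whitespace. It is a simplified illustration, not the code of the actual `trailing-whitespace` hook from the pre-commit-hooks project.

```python
from typing import Iterable


def _strip_line(line: str) -> str:
    """Remove trailing spaces/tabs while keeping the original line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return body.rstrip(" \t") + ending


def fix_trailing_whitespace(paths: Iterable[str]) -> int:
    """Rewrite files in place; return 1 if any file changed, else 0."""
    changed = False
    for path in paths:
        # newline="" disables newline translation so \r\n endings survive
        with open(path, encoding="utf-8", newline="") as fh:
            original = fh.read()
        fixed = "".join(
            _strip_line(line) for line in original.splitlines(keepends=True)
        )
        if fixed != original:
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(fixed)
            print(f"Fixing {path}")
            changed = True
    # A non-zero result is what makes the commit stop after an auto-fix
    return 1 if changed else 0
```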
Running hooks manually:

```bash
# Run on staged files (automatic on commit)
poetry run pre-commit run

# Run on all files
poetry run pre-commit run --all-files

# Run specific hook
poetry run pre-commit run black --all-files

# Skip hooks (NOT RECOMMENDED)
git commit --no-verify -m "Emergency fix"
```

Common failures and how to fix them:

black: code not formatted.

```bash
# Fix: run black, then re-stage and commit
poetry run black .
git add .
git commit -m "style: format code with black"
```

flake8 `F401`: module imported but unused.

```diff
# Fix: remove the unused import
-import pandas as pd  # Not used
```

flake8 `E501`: line too long (>88 characters).

```diff
# Fix: break the line
-very_long_string = "This is an extremely long string that exceeds the maximum line length"
+very_long_string = (
+    "This is an extremely long string "
+    "that exceeds the maximum line length"
+)
```

mypy: missing type hints.

```diff
# Fix: add type hints (and the typing imports they need)
+from typing import Dict, List
+
-def process(data):
+def process(data: List[str]) -> Dict[str, int]:
     return {"count": len(data)}
```

## Testing

Running tests:

```bash
# All tests
poetry run pytest

# With coverage
poetry run pytest --cov=src --cov-report=term

# Fast tests only (skip slow integration tests)
poetry run pytest -m "not slow"

# Specific test file
poetry run pytest tests/api/test_post_endpoints.py -v
```

Writing tests:

```python
# tests/unit/test_new_feature.py
import pytest

from whales_be_service.new_feature import new_function

# expected_output, test_data and load_full_model are placeholders for your
# own expected values, fixtures and helpers.


def test_new_function_success():
    """Test new_function with valid input"""
    result = new_function(input_data="test")
    assert result == expected_output


def test_new_function_invalid_input():
    """Test new_function handles invalid input"""
    with pytest.raises(ValueError):
        new_function(input_data=None)


@pytest.mark.slow
def test_new_function_integration():
    """Test new_function with real model (slow)"""
    model = load_full_model()
    result = new_function(model=model, data=test_data)
    assert result is not None
```

See the Testing Guide for comprehensive testing documentation.
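The `-m "not slow"` filter relies on the `slow` marker being registered. Otherwise pytest emits `PytestUnknownMarkWarning`, or an error under `--strict-markers`. If the project does not already declare the marker in its pytest configuration, a `conftest.py` hook like the following minimal sketch registers it. The file location (for example `whales_be_service/tests/conftest.py`) is an assumption.

```python
# conftest.py (hypothetical location: whales_be_service/tests/conftest.py)


def pytest_configure(config):
    """Register the custom `slow` marker used by @pytest.mark.slow."""
    config.addinivalue_line(
        "markers", "slow: marks tests as slow (deselect with '-m \"not slow\"')"
    )
```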
## Pull Request Process

Checklist before opening a PR:
- Code follows style guide (black, flake8, mypy pass)
- All tests pass (`poetry run pytest`)
- New tests added for new features
- Documentation updated (docstrings, wiki)
- Pre-commit hooks pass
- Branch is up-to-date with main
- Commit messages follow conventions
1. Push to your fork:

   ```bash
   git push origin feature/my-feature
   ```

2. Open a PR on GitHub:
   - Navigate to https://github.com/0x0000dead/whales-identification
   - Click "Pull requests" → "New pull request"
   - Select your branch

3. Fill in the PR template:

   ```markdown
   ## Description
   Brief description of changes

   ## Type of Change
   - [ ] Bug fix
   - [ ] New feature
   - [ ] Breaking change
   - [ ] Documentation update

   ## Testing
   - [ ] Unit tests added
   - [ ] Integration tests added
   - [ ] Manual testing completed

   ## Checklist
   - [ ] Code follows style guide
   - [ ] Tests pass
   - [ ] Documentation updated

   ## Related Issues
   Closes #42
   ```

4. Automated checks run:
   - Linting (black, flake8, isort, mypy)
   - Security (bandit, safety)
   - Tests (pytest with coverage)
   - Docker build

5. Code review:
   - At least 1 approval required
   - Reviewers check:
     - Code quality
     - Test coverage
     - Documentation
     - Security

6. Address feedback:

   ```bash
   # Make changes, then
   git add .
   git commit -m "fix: address review feedback"
   git push origin feature/my-feature
   ```

7. Merge:
   - After approval, a maintainer merges the PR
   - Delete the feature branch
## Code Review Guidelines

What to check:
- ✅ Code correctness and logic
- ✅ Test coverage (>80% for new code)
- ✅ Documentation and comments
- ✅ Security issues (SQL injection, XSS, etc.)
- ✅ Performance implications
- ✅ Consistency with existing code
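To check the ">80% for new code" guideline locally, you can approximate it at file level from a coverage.py JSON report, for example one written by `coverage json` after a coverage run. The helper below is a hypothetical sketch. It sums `covered_lines` and `num_statements` from each changed file's `summary` entry, and the names you pass must match the keys coverage writes under `files`. True line-level "new code" coverage needs a diff-aware tool such as diff-cover.

```python
import json
from typing import Dict, Iterable


def changed_files_coverage(report: Dict, changed: Iterable[str]) -> float:
    """Statement coverage (%) across `changed` files in a coverage.py JSON report.

    Files missing from the report are skipped, so make sure your coverage
    source settings include them (otherwise untested files are not counted).
    """
    covered = total = 0
    for name in changed:
        summary = report.get("files", {}).get(name, {}).get("summary")
        if summary is None:
            continue
        covered += summary["covered_lines"]
        total += summary["num_statements"]
    # No measurable statements means there is nothing to fail on.
    return 100.0 * covered / total if total else 100.0


def meets_threshold(
    report_path: str, changed: Iterable[str], threshold: float = 80.0
) -> bool:
    """True if coverage of the changed files is above `threshold` percent."""
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    return changed_files_coverage(report, changed) > threshold
```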
How to provide feedback:

Good: constructive and specific.

> Consider using a list comprehension here for better readability:
>
> ```python
> # Instead of
> results = []
> for item in items:
>     results.append(process(item))
>
> # Use
> results = [process(item) for item in items]
> ```

Bad: vague and unhelpful.

> This code is bad.

Responding to feedback:

Good: acknowledge, explain, implement.

> Thanks for the suggestion! You're right that a list comprehension
> is cleaner here. I've updated the code in commit abc123.

Bad: defensive.

> My code is fine. This is just your opinion.

## Common Tasks

Adding a new API endpoint:

1. Define the endpoint in `routers.py`:

   ```python
   @router.post("/new-endpoint", response_model=NewResponse)
   async def new_endpoint(request: NewRequest) -> NewResponse:
       # Implementation
       return NewResponse(...)
   ```

2. Add Pydantic models:

   ```python
   # response_models.py
   from pydantic import BaseModel


   class NewRequest(BaseModel):
       field: str


   class NewResponse(BaseModel):
       result: str
   ```

3. Write tests:

   ```python
   # tests/api/test_new_endpoint.py
   def test_new_endpoint_success(client):
       response = client.post("/new-endpoint", json={"field": "value"})
       assert response.status_code == 200
   ```

4. Update documentation:
   - API Reference wiki page
   - OpenAPI schema (auto-generated by FastAPI)

Adding a new model:

1. Create the model class:

   ```python
   # whales_identify/models/new_model.py
   import torch.nn as nn


   class NewModel(nn.Module):
       def __init__(self, ...):
           ...
   ```

2. Add a training script:

   ```python
   # whales_identify/train_new_model.py
   def train_new_model():
       ...
   ```

3. Create a model card:
   - Add it to the Model-Cards wiki page
   - Include metrics, intended use, limitations

4. Add the integration:

   ```python
   # whales_be_service/whale_infer.py
   if model_type == "new_model":
       self.model = NewModel.load(model_path)
   ```

Adding a dependency:

```bash
# Backend
cd whales_be_service
poetry add package_name
poetry lock
git add pyproject.toml poetry.lock

# Frontend (from the project root)
cd frontend
npm install package_name
git add package.json package-lock.json

# Commit
git commit -m "chore: add package_name dependency"
```

## Getting Help

- GitHub Issues: Bug reports, feature requests
- GitHub Discussions: Questions, ideas
If you get stuck:

1. Check the documentation first:
   - Wiki pages
   - README
   - Code comments
2. Search existing issues, since someone may have had the same problem.
3. Open a new issue:
   - Provide context
   - Include error messages
   - Share a minimal reproducible example
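Alongside error messages and a reproducible example, it helps to paste your environment details into the issue. The following small sketch uses only the standard library. The default package list is an assumption, so adjust it to whatever is relevant to your problem. Run `print(environment_report())` and copy the output into the issue.

```python
import platform
import sys
from importlib import metadata
from typing import Iterable


def environment_report(packages: Iterable[str] = ("torch", "fastapi", "numpy")) -> str:
    """Collect Python, OS and package versions for a bug report."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"OS: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)
```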
## License

By contributing, you agree that your contributions will be licensed under the same licenses as the project:
- Code: MIT License
- Models: CC-BY-NC-4.0
- Data: CC-BY-NC-4.0
See LICENSE, LICENSE_MODELS.md, and LICENSE_DATA.md.
Thank you for contributing to Whales Identification! Your contributions help protect marine mammals. 🐋
Related Pages:
- Installation - Setup development environment
- Testing - Testing guidelines
- Architecture - System design
- API Reference - API documentation