# Chapter 10: Collaboration (Part 1)
## Items 82-88: PyPI, Virtual Environments, Documentation, and Module Organization

---

This notebook covers essential best practices for Python collaboration, including package management, environment isolation, documentation standards, and API design.

## Item 82: Know Where to Find Community-Built Modules

### The Python Package Index (PyPI)

PyPI (https://pypi.org) is the central repository for open-source Python modules maintained by the community. It contains packages for virtually every domain.

### Installing Packages with pip

The `pip` command-line tool is used to install packages from PyPI:

In [None]:
# Install a package (run in terminal, not notebook)
# python3 -m pip install pytz

# Show installed package information
# python3 -m pip show pytz
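
An installed package's version can also be checked from Python code rather than the terminal. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8+). The helper name `installed_version` and the bogus package name are made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# 'pytz' matches the pip example above; the second name is
# deliberately bogus to show the not-installed case.
for name in ('pytz', 'no-such-package-for-this-demo'):
    print(f"{name}: {installed_version(name)}")
```

A `None` result for `pytz` only means it is not installed in the interpreter running the cell.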

### Package Licensing

Most PyPI packages have free or open-source licenses. Key points:

- Check the license before using in production
- Most popular packages allow commercial use
- When in doubt, consult with legal counsel
- Visit https://opensource.org for license details

---

## Item 83: Use Virtual Environments for Isolated and Reproducible Dependencies

### The Dependency Problem

Installing packages globally can cause **dependency hell**:

- Different projects may require different versions of the same package
- Transitive dependencies can conflict
- System-wide installations affect all Python programs

### Transitive Dependency Example

```
Sphinx depends on: Jinja2 (2.10 installed)
Flask depends on:  Jinja2 (2.10 installed)
```

**Current state**: Both work fine

```
# Six months later...
Jinja2 releases version 3.0 with breaking changes

You upgrade: pip install --upgrade Jinja2
Result: Sphinx may break while Flask keeps working
```

A Python installation can only have **one version** of each package installed globally at a time!

### Virtual Environments with venv

The `venv` module (part of the standard library since Python 3.3, with `pip` bundled into new environments since Python 3.4) solves this by creating isolated Python environments.

#### Creating a Virtual Environment

```bash
# Create virtual environment
python3 -m venv myproject

# Activate it (Linux/macOS)
source myproject/bin/activate

# Activate it (Windows Command Prompt)
myproject\Scripts\activate.bat

# Activate it (Windows PowerShell)
myproject\Scripts\Activate.ps1
```

#### Virtual Environment Structure

```
myproject/
├── bin/          # Scripts and Python executable (Scripts\ on Windows)
├── include/      # C headers
├── lib/          # Installed packages
└── pyvenv.cfg    # Configuration
```
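
The same layout can be inspected from Python. This is a minimal sketch, assuming the standard-library `venv.create` function is acceptable in place of the command line. It builds a throwaway environment in a temporary directory and lists what was created. Passing `with_pip=False` is a choice made here to keep the demo fast and offline.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / 'myproject'
    # with_pip=False keeps the demo fast and offline; a real
    # project environment would normally want pip installed.
    venv.create(env_dir, with_pip=False)
    created = sorted(p.name for p in env_dir.iterdir())
    has_config = (env_dir / 'pyvenv.cfg').is_file()

print(created)
print(f"pyvenv.cfg present: {has_config}")
```

The exact directory names vary by platform, which is why the sketch prints them instead of assuming them.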

### Working with Virtual Environments

```bash
# After activation, python points to venv's Python
(myproject)$ which python3
/tmp/myproject/bin/python3

# Install packages in venv only
(myproject)$ python3 -m pip install pytz

# Deactivate when done
(myproject)$ deactivate
$
```

### Reproducing Environments

#### Exporting Dependencies

```bash
# Save all installed packages
(myproject)$ python3 -m pip freeze > requirements.txt

# View requirements
(myproject)$ cat requirements.txt
certifi==2019.3.9
chardet==3.0.4
idna==2.8
numpy==1.16.2
pytz==2018.9
requests==2.21.0
urllib3==1.24.1
```

#### Recreating an Environment

```bash
# Create new venv
$ python3 -m venv otherproject
$ cd otherproject
$ source bin/activate

# Install all dependencies from requirements.txt
(otherproject)$ python3 -m pip install -r /tmp/myproject/requirements.txt

# Verify installation
(otherproject)$ python3 -m pip list
```
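
The pinned `name==version` lines that `pip freeze` writes are easy to inspect programmatically. The sketch below is a deliberately small parser (the function name `parse_pinned_requirements` is hypothetical). It only understands exact pins and ignores everything else that a requirements file may legally contain, such as options, URLs, and environment markers.

```python
def parse_pinned_requirements(text):
    """Map package names to versions for simple 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split('#', 1)[0].strip()  # Drop comments
        if not line or '==' not in line:
            continue  # Skip blanks and anything not pinned exactly
        name, version = line.split('==', 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Sample taken from the requirements.txt listing above
SAMPLE = """\
certifi==2019.3.9
numpy==1.16.2
pytz==2018.9
"""

pins = parse_pinned_requirements(SAMPLE)
print(pins['pytz'])  # 2018.9
```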

### Virtual Environment Best Practices

**Moving Virtual Environments**
- Don't move venv directories (paths are hardcoded)
- Instead: Use `pip freeze`, create new venv, reinstall from `requirements.txt`

**Version Control**
- Commit `requirements.txt` to version control
- Don't commit the venv directory itself
- Note: Python version is NOT included in requirements.txt

**Team Collaboration**
- Update requirements.txt when adding/removing packages
- Keep dependencies in sync across team members
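
Because `requirements.txt` does not record the Python version, one option is to check it at program startup. The following is a sketch only. The minimum version `(3, 8)` and the helper names are assumptions made for illustration, not part of any standard.

```python
import sys

MINIMUM_PYTHON = (3, 8)  # Hypothetical project requirement


def python_is_supported(current=None, minimum=MINIMUM_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(minimum)


def require_python(minimum=MINIMUM_PYTHON):
    """Fail fast with a clear message on an unsupported interpreter."""
    if not python_is_supported(minimum=minimum):
        required = '.'.join(str(part) for part in minimum)
        raise RuntimeError(f"Python {required} or newer is required")


require_python()
print(f"Running on Python {sys.version_info.major}.{sys.version_info.minor}")
```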

In [None]:
# Example: Checking if running in virtual environment
import sys

def is_venv():
    """Check if running in a virtual environment."""
    # sys.real_prefix is set by older versions of virtualenv;
    # venv makes sys.prefix differ from sys.base_prefix.
    return (hasattr(sys, 'real_prefix') or
            sys.base_prefix != sys.prefix)

print(f"Running in virtual environment: {is_venv()}")
print(f"Python executable: {sys.executable}")
print(f"Python prefix: {sys.prefix}")

---

## Item 84: Write Docstrings for Every Function, Class, and Module

### Why Documentation Matters

Python's dynamic nature makes documentation crucial:
- Documentation is accessible at runtime via `__doc__`
- Interactive development tools (IPython, help()) rely on it
- Community expectation: good code is well-documented

### Basic Docstring Example

In [None]:
def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]

# Docstring is accessible programmatically
print(f"Docstring: {palindrome.__doc__}")

# Test the function
assert palindrome('tacocat')
assert not palindrome('banana')
print("✓ Tests passed")

In [None]:
# Use help() to see formatted documentation
help(palindrome)

### Module Docstrings

Every module should have a top-level docstring (first statement in file):

```python
#!/usr/bin/env python3
# words.py
"""Library for finding linguistic patterns in words.

Testing how words relate to each other can be tricky sometimes!
This module provides easy ways to determine when words you've
found have special properties.

Available functions:
- palindrome: Determine if a word is a palindrome.
- check_anagram: Determine if two words are anagrams.
...
"""
```
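
To see why the docstring has to be the first statement, the sketch below parses source text with the standard-library `ast` module and extracts the docstrings. The source string is a shortened stand-in for `words.py`, not the real file.

```python
import ast

SOURCE = '''#!/usr/bin/env python3
"""Library for finding linguistic patterns in words."""

def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]
'''

tree = ast.parse(SOURCE)
module_doc = ast.get_docstring(tree)
function_docs = {
    node.name: ast.get_docstring(node)
    for node in tree.body
    if isinstance(node, ast.FunctionDef)
}
print(module_doc)
print(function_docs)

# Comments may come first, but once another statement precedes
# the string it is no longer treated as a docstring.
NOT_FIRST = 'import sys\n"""This string is not a docstring."""\n'
print(ast.get_docstring(ast.parse(NOT_FIRST)))  # None
```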

### Class Docstrings

In [None]:
class Player:
    """Represents a player of the game.
    
    Subclasses may override the 'tick' method to provide
    custom animations for the player's movement depending
    on their power level, etc.
    
    Public attributes:
    - power: Unused power-ups (float between 0 and 1).
    - coins: Coins found during the level (integer).
    """
    
    def __init__(self, power=0.0, coins=0):
        self.power = power
        self.coins = coins
    
    def tick(self, dt):
        """Update player state for one time step.
        
        Args:
            dt: Time delta in seconds.
        """
        pass

# View the class documentation
help(Player)

### Function Docstrings

In [None]:
def find_anagrams(word, dictionary):
    """Find all anagrams for a word.
    
    This function checks every entry in 'dictionary', so its
    running time grows with the size of that collection.
    
    Args:
        word: String of the target word.
        dictionary: collections.abc.Iterable with all
            strings that are known to be actual words.
    
    Returns:
        List of anagrams that were found, excluding 'word'
        itself. Empty if none were found.
    """
    # Words are anagrams when their sorted letters are equal
    sorted_word = ''.join(sorted(word))
    anagrams = []
    
    for candidate in dictionary:
        if ''.join(sorted(candidate)) == sorted_word and candidate != word:
            anagrams.append(candidate)
    
    return anagrams

# Test it
words = ['listen', 'silent', 'enlist', 'hello', 'world']
result = find_anagrams('listen', words)
print(f"Anagrams of 'listen': {result}")

### Special Docstring Cases

**Functions with No Arguments**
- Single-sentence description is sufficient

**Functions Returning None**
- Omit mention of return value (don't say "returns None")

**Functions with Variable Arguments**
- Document `*args` and `**kwargs` purpose

**Functions with Default Values**
- Mention the defaults in docstring

**Generator Functions**
- Describe what the generator yields

**Coroutines**
- Explain when execution will stop
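
As an illustration of two of these cases together, here is a small generator with a default argument. The function `countdown` is a made-up example, and the `Yields:` section follows the same layout as the `Args:` and `Returns:` sections used in this notebook.

```python
def countdown(start, step=1):
    """Count down from a starting number toward zero.

    Args:
        start: Integer to begin counting from.
        step: Amount to subtract each time. Defaults to 1.

    Yields:
        Each integer in the countdown, from 'start' down to
        the smallest value that is still greater than zero.
    """
    current = start
    while current > 0:
        yield current
        current -= step


print(list(countdown(3)))          # [3, 2, 1]
print(list(countdown(5, step=2)))  # [5, 3, 1]
```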

### Type Annotations and Docstrings

In [None]:
from typing import Iterable, List

def find_anagrams_typed(word: str,
                        dictionary: Iterable[str]) -> List[str]:
    """Find all anagrams for a word.
    
    This function checks every entry in 'dictionary', so its
    running time grows with the size of that collection.
    
    Args:
        word: Target word.
        dictionary: All known actual words.
    
    Returns:
        Anagrams that were found.
    """
    sorted_word = ''.join(sorted(word))
    return [candidate for candidate in dictionary
            if ''.join(sorted(candidate)) == sorted_word
            and candidate != word]

# Note: With type annotations, the docstring can be more concise
# because the type information is not repeated in the Args section

### Documentation Tools

**Built-in pydoc**
```bash
# Start local documentation server
python3 -m pydoc -p 1234
# Server ready at http://localhost:1234/
```
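
The text that `pydoc` generates is also available programmatically. A minimal sketch, assuming plain text output is wanted (for example, to write documentation to a file):

```python
import pydoc


def palindrome(word):
    """Return True if the given word is a palindrome."""
    return word == word[::-1]


# renderer=pydoc.plaintext avoids the bold/overstrike characters
# that the default text renderer adds for terminal display.
text = pydoc.render_doc(palindrome, renderer=pydoc.plaintext)
print(text)
```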

**Third-Party Tools**
- **Sphinx**: Most popular documentation generator (https://www.sphinx-doc.org)
- **Read the Docs**: Free hosting for open source projects (https://readthedocs.org)
- **MkDocs**: Alternative documentation tool with Markdown support

---

## Item 85: Use Packages to Organize Modules and Provide Stable APIs

### Package Basics

Packages organize related modules into hierarchical structures.

### Creating a Simple Package

```
Project Structure:
main.py
mypackage/
├── __init__.py
├── models.py
└── utils.py
```

```python
# main.py
from mypackage import utils

# Or import specific items
from mypackage.utils import log_base2_bucket
```

### Namespaces: Avoiding Name Conflicts

In [None]:
# Example: Two modules with same function name

# ❌ WRONG - Second import overwrites first
# from analysis.utils import inspect
# from frontend.utils import inspect  # Overwrites!

# ✓ SOLUTION 1: Use 'as' to rename
# from analysis.utils import inspect as analysis_inspect
# from frontend.utils import inspect as frontend_inspect

# ✓ SOLUTION 2: Use full module names
# import analysis.utils
# import frontend.utils
# 
# analysis.utils.inspect(value)
# frontend.utils.inspect(value)

print("See code comments for namespace conflict solutions")

### Stable APIs with __all__

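The `__all__` attribute controls what gets exported when using `from module import *`. Only the listed names are exported; if `__all__` is absent, every name without a leading underscore is exported instead: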

In [None]:
# Simulating a module's __all__ definition

# models.py
__all__ = ['Projectile']  # Only export Projectile

class Projectile:
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity

class _InternalHelper:  # Not exported (not listed in __all__)
    pass

# When someone does: from models import *
# They only get: Projectile
# They don't get: _InternalHelper

print(f"Public API: {__all__}")

### Package __init__.py Pattern

```python
# mypackage/__init__.py
__all__ = []

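# Import from submodules. Importing from .models also binds the
# submodule to the name 'models' in this package's namespace.
from .models import *
__all__ += models.__all__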

from .utils import *
__all__ += utils.__all__
```

This allows users to import directly from the package:

```python
# Instead of:
from mypackage.models import Projectile
from mypackage.utils import simulate_collision

# Users can do:
from mypackage import Projectile, simulate_collision
```
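
The `__init__.py` pattern can be tried end to end without creating files by hand. The sketch below writes a throwaway package into a temporary directory and imports it. The package name `demo_pkg_all` and the `Hidden` class are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Write a tiny package that re-exports its submodules' APIs."""
    pkg = Path(root) / 'demo_pkg_all'
    pkg.mkdir()
    (pkg / 'models.py').write_text(
        "__all__ = ['Projectile']\n"
        "class Projectile:\n"
        "    pass\n"
        "class Hidden:\n"  # Public-looking name, but not in __all__
        "    pass\n"
    )
    (pkg / 'utils.py').write_text(
        "__all__ = ['simulate_collision']\n"
        "def simulate_collision(a, b):\n"
        "    return (a, b)\n"
    )
    (pkg / '__init__.py').write_text(
        "__all__ = []\n"
        "from .models import *\n"
        "__all__ += models.__all__\n"
        "from .utils import *\n"
        "__all__ += utils.__all__\n"
    )
    return pkg.name


with tempfile.TemporaryDirectory() as root:
    name = build_demo_package(root)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module(name)
    finally:
        sys.path.remove(root)

exported = sorted(demo.__all__)
has_hidden = hasattr(demo, 'Hidden')
print(exported)    # ['Projectile', 'simulate_collision']
print(has_hidden)  # False
```

`Hidden` is still reachable as `demo.models.Hidden`; `__all__` only shapes what the wildcard import and the package's top level expose.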

### Beware of import *

**Problems with Wildcard Imports:**
- Hides the source of names (unclear where things come from)
- Can cause name conflicts and overwrite existing names
- Makes code harder to understand for new readers

**Recommendation:**
- Use explicit imports: `from x import y`
- Reserve `import *` for interactive sessions only
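
A well-known case of the overwrite problem is `from os import *`, which replaces the built-in `open` with `os.open`. The sketch runs the wildcard import inside a scratch namespace with `exec` so that the notebook's own names stay untouched.

```python
import os

namespace = {}
exec("from os import *", namespace)

# The wildcard import silently replaced the built-in open()
# with os.open(), which has a different signature.
print(namespace['open'] is os.open)  # True
print(namespace['open'] is open)     # False
```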

In [None]:
# Example: Explicit imports are clearer

# ✓ GOOD - Clear what's being used
from collections import defaultdict, Counter
from typing import List, Dict

# ❌ AVOID - Unclear what came from where
# from collections import *
# from typing import *

print("Explicit imports recommended over wildcard imports")

---

## Item 86: Consider Module-Scoped Code to Configure Deployment Environments

### The Deployment Environment Challenge

Programs need to run in different environments:
- **Production**: Real database, authentication, full infrastructure
- **Development**: Local machine, mock services, test data
- **Testing**: Isolated, deterministic, fast

### Solution: Module-Scoped Configuration

In [None]:
# Example: Environment-specific configuration

# Simulating __main__ module with TESTING flag
class FakeMain:
    TESTING = True  # Set based on environment

__main__ = FakeMain()

# db_connection.py (simulated)
class TestingDatabase:
    def query(self, sql):
        return ["mock_data"]

class RealDatabase:
    def query(self, sql):
        return ["real_data"]

# Module-scope conditional
if __main__.TESTING:
    Database = TestingDatabase
else:
    Database = RealDatabase

# Usage is the same regardless of environment
db = Database()
result = db.query("SELECT * FROM users")
print(f"Database type: {type(db).__name__}")
print(f"Query result: {result}")

### Platform-Specific Configuration

In [None]:
import sys

# Example: Platform-specific database selection
class Win32Database:
    platform = "Windows"

class PosixDatabase:
    platform = "POSIX (Linux/macOS)"

# Module-scope platform check
if sys.platform.startswith('win32'):
    Database = Win32Database
else:
    Database = PosixDatabase

db = Database()
print(f"Platform: {sys.platform}")
print(f"Selected database: {db.platform}")

### Environment Variable Configuration

In [None]:
import os

# Example: Using environment variables
class ProductionConfig:
    DEBUG = False
    DATABASE_URL = "postgresql://prod-server/db"

class DevelopmentConfig:
    DEBUG = True
    DATABASE_URL = "sqlite:///local.db"

# Check environment variable
environment = os.environ.get('APP_ENV', 'development')

if environment == 'production':
    Config = ProductionConfig
else:
    Config = DevelopmentConfig

print(f"Environment: {environment}")
print(f"Debug mode: {Config.DEBUG}")
print(f"Database: {Config.DATABASE_URL}")

### Best Practices for Deployment Configuration

**Simple Projects:**
- Use module-scope conditionals
- Check `__main__.TESTING` or similar flags

**Complex Projects:**
- Use dedicated configuration files (JSON, YAML, INI)
- Leverage `configparser` built-in module
- Keep configuration separate from code

**Environment Variables:**
- Good for sensitive data (passwords, API keys)
- Platform-agnostic
- Easy to change without code modification
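
For the configuration-file approach, a minimal sketch with the built-in `configparser` module might look like the following. The INI text, section names, and the `load_settings` helper are assumptions for illustration. A real project would typically read the file from disk with `parser.read(path)`.

```python
import configparser

# Hypothetical settings, mirroring the config classes above
SAMPLE_INI = """
[development]
debug = true
database_url = sqlite:///local.db

[production]
debug = false
database_url = postgresql://prod-server/db
"""


def load_settings(ini_text, environment):
    """Return the settings for one environment as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser[environment]  # KeyError if the section is missing
    return {
        'debug': section.getboolean('debug'),
        'database_url': section['database_url'],
    }


settings = load_settings(SAMPLE_INI, 'production')
print(settings)
```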

---

## Item 87: Define a Root Exception to Insulate Callers from APIs

### The Problem with Built-in Exceptions

Using built-in exceptions (like `ValueError`) makes it hard for API consumers to distinguish between different error sources.

In [None]:
# ❌ AVOID - Using built-in exceptions directly
def determine_weight_bad(volume, density):
    if density <= 0:
        raise ValueError('Density must be positive')
    return volume * density

# Problem: ValueError could come from anywhere
try:
    result = determine_weight_bad(1, -1)
except ValueError as e:
    print(f"Error (could be from any code): {e}")

### Solution: Root Exception Hierarchy

In [None]:
# ✓ BETTER - Custom exception hierarchy

class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class InvalidDensityError(Error):
    """There was a problem with a provided density value."""

class InvalidVolumeError(Error):
    """There was a problem with the provided volume value."""

def determine_weight(volume, density):
    if density <= 0:
        raise InvalidDensityError('Density must be positive')
    if volume <= 0:
        raise InvalidVolumeError('Volume must be positive')
    return volume * density

# Now API consumers can catch all module errors
try:
    weight = determine_weight(1, -1)
except Error as e:  # Catches all errors from this module
    print(f"API Error: {type(e).__name__}: {e}")

### Three-Layer Exception Handling

In [None]:
import logging

# Layer 1: Handle specific expected errors
# Layer 2: Catch general API errors (missed specific cases)
# Layer 3: Catch implementation bugs

try:
    weight = determine_weight(-1, 1)
    
except InvalidDensityError:
    # Layer 1: Handle specific case
    weight = 0
    print("Handled density error specifically")
    
except Error:
    # Layer 2: Catch other API errors (helps find missing handlers)
    logging.exception('Bug in the calling code')
    weight = 0
    
except Exception:
    # Layer 3: Catch implementation bugs in the API itself
    logging.exception('Bug in the API code!')
    raise  # Re-raise to caller

print(f"Final weight: {weight}")

### Future-Proofing with Exception Hierarchies

In [None]:
# Example: Adding more specific exceptions later

class NegativeDensityError(InvalidDensityError):
    """A provided density value was negative."""

def determine_weight_v2(volume, density):
    if density < 0:
        raise NegativeDensityError('Density cannot be negative')
    if density == 0:
        raise InvalidDensityError('Density cannot be zero')
    if volume <= 0:
        raise InvalidVolumeError('Volume must be positive')
    return volume * density

# Old code still works (catches InvalidDensityError)
try:
    weight = determine_weight_v2(1, -1)
except InvalidDensityError as e:
    print(f"Caught by parent class: {type(e).__name__}")

# New code can be more specific
try:
    weight = determine_weight_v2(1, -1)
except NegativeDensityError:
    print("Handling negative density specifically")
except InvalidDensityError:
    print("Handling other density errors")

### Broad Exception Categories

In [None]:
# Example: Organizing exceptions by functionality

class Error(Exception):
    """Base-class for all exceptions raised by this module."""

class WeightError(Error):
    """Base-class for weight calculation errors."""

class VolumeError(Error):
    """Base-class for volume calculation errors."""

class DensityError(Error):
    """Base-class for density calculation errors."""

# Specific exceptions inherit from categories
class InvalidDensityError(DensityError):
    pass

class NegativeDensityError(InvalidDensityError):
    pass

# Callers can catch at any level
try:
    pass  # Some operation
except DensityError:
    print("Any density-related error")
except Error:
    print("Any module error")

### Benefits of Root Exceptions

**1. Insulation**
- API consumers can catch all deliberate exceptions with one except block
- Callers can stop the API's deliberate exceptions from propagating up and crashing their program

**2. Bug Detection**
- Helps identify missing exception handlers in calling code
- Distinguishes between API bugs and usage bugs

**3. Future-Proofing**
- Add specific exceptions without breaking existing code
- Maintain backward compatibility through inheritance

---

## Item 88: Know How to Break Circular Dependencies

### The Circular Dependency Problem

Circular dependencies occur when modules import each other, creating an import cycle.

### Example: Circular Dependency

```python
# dialog.py
import app  # Imports app module

class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

# Module-scope code that runs on import
save_dialog = Dialog(app.prefs.get('save_dir'))

def show():
    # Show the dialog
    pass
```

```python
# app.py
import dialog  # Imports dialog module

class Prefs:
    def get(self, key):
        return '/default/path'

prefs = Prefs()
```

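**Problem:** Importing `app` starts running `app.py`, which immediately imports `dialog`. `dialog.py` then imports `app`, which is already registered in `sys.modules` but has not finished executing, so `prefs` does not exist yet. The module-scope call `Dialog(app.prefs.get('save_dir'))` therefore fails with `AttributeError`.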
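
The failure can be reproduced safely by writing the two modules into a temporary directory and importing them. This is a sketch for experimentation. The module names `cycle_app` and `cycle_dialog` are hypothetical stand-ins for `app` and `dialog`, chosen to avoid clashing with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

APP_SOURCE = '''\
import cycle_dialog

class Prefs:
    def get(self, key):
        return '/default/path'

prefs = Prefs()
'''

DIALOG_SOURCE = '''\
import cycle_app

class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

# Runs while cycle_app is still only partially initialized
save_dialog = Dialog(cycle_app.prefs.get('save_dir'))
'''

error = None
with tempfile.TemporaryDirectory() as root:
    Path(root, 'cycle_app.py').write_text(APP_SOURCE)
    Path(root, 'cycle_dialog.py').write_text(DIALOG_SOURCE)
    sys.path.insert(0, root)
    try:
        importlib.import_module('cycle_app')
    except AttributeError as exc:
        error = exc
    finally:
        sys.path.remove(root)

print(f"{type(error).__name__}: {error}")
```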

### Solution 1: Reordering Imports

In [None]:
# Example: Break the cycle by reordering the imports in app.py

# ❌ BEFORE - dialog is imported before prefs exists
# import dialog
#
# class Prefs: ...
# prefs = Prefs()

# ✓ AFTER - define prefs first, then import dialog
# class Prefs:
#     def get(self, key):
#         return '/default/path'
#
# prefs = Prefs()
#
# import dialog  # Moved to the bottom of app.py

# Trade-off: this works, but it goes against the PEP 8 advice to
# put imports at the top of the file, and it breaks easily when
# the module is reorganized.

print("See code comments for the reordered app.py")

### Solution 2: Dynamic Imports

```python
# dialog.py
class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

# Don't create instance at module level
save_dialog = None

def show():
    # Import and initialize when needed
    import app
    global save_dialog
    if save_dialog is None:
        save_dialog = Dialog(app.prefs.get('save_dir'))
    # Show the dialog
```
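
To check that deferring the import really breaks the cycle, the two modules can again be written to a temporary directory and imported. As before, this is only a sketch. `lazy_app` and `lazy_dialog` are hypothetical names, and `show()` returns the save directory here purely so the result can be printed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

APP_SOURCE = '''\
import lazy_dialog

class Prefs:
    def get(self, key):
        return '/default/path'

prefs = Prefs()
'''

DIALOG_SOURCE = '''\
class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

save_dialog = None

def show():
    import lazy_app  # Deferred until the first call
    global save_dialog
    if save_dialog is None:
        save_dialog = Dialog(lazy_app.prefs.get('save_dir'))
    return save_dialog.save_dir
'''

with tempfile.TemporaryDirectory() as root:
    Path(root, 'lazy_app.py').write_text(APP_SOURCE)
    Path(root, 'lazy_dialog.py').write_text(DIALOG_SOURCE)
    sys.path.insert(0, root)
    try:
        app = importlib.import_module('lazy_app')
        # By now lazy_app has finished executing, so prefs exists
        save_dir = app.lazy_dialog.show()
    finally:
        sys.path.remove(root)

print(save_dir)  # /default/path
```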

### Solution 3: Dependency Injection

In [None]:
# Example: Pass dependencies as parameters

class Dialog:
    def __init__(self, save_dir):
        self.save_dir = save_dir

class Prefs:
    def get(self, key):
        return '/default/path'

# Instead of Dialog accessing app.prefs directly,
# pass it as an argument
def create_dialog(prefs):
    save_dir = prefs.get('save_dir')
    return Dialog(save_dir)

# Usage
prefs = Prefs()
dialog = create_dialog(prefs)
print(f"Dialog save directory: {dialog.save_dir}")

### Solution 4: Refactoring to Common Module

```
Before (circular):
app.py ←→ dialog.py

After (no cycle):
app.py → common.py ← dialog.py
```

Extract shared dependencies into a separate module that both can import.

### Circular Dependency Best Practices

**Prevention:**
- Design clear module hierarchies upfront
- Keep modules focused and single-purpose
- Avoid module-scope code that depends on other modules

**Detection:**
- Watch for `ImportError` or `AttributeError` on import
- Use import analysis tools (e.g., `pydeps`)
- Draw module dependency graphs

**Resolution:**
- Move imports to function scope (lazy loading)
- Use dependency injection
- Refactor shared code to a common module
- Consider if the circular dependency indicates a design flaw

---

## Summary: Chapter 10 (Items 82-88)

### Key Takeaways

**Package Management (Item 82)**
- PyPI provides hundreds of thousands of community packages
- Use `pip` to install packages
- Check licenses before production use

**Virtual Environments (Item 83)**
- Isolate project dependencies with `venv`
- Avoid dependency hell
- Use `requirements.txt` for reproducibility

**Documentation (Item 84)**
- Write docstrings for every module, class, and function
- Follow PEP 257 conventions
- Avoid redundancy with type annotations

**Package Organization (Item 85)**
- Use packages to organize modules
- Control public APIs with `__all__`
- Avoid wildcard imports in production code

**Deployment Configuration (Item 86)**
- Use module-scope code for environment-specific behavior
- Check platform, environment variables, or flags
- Keep complex config in separate files

**Exception Hierarchies (Item 87)**
- Define root exceptions for your modules
- Insulate callers from implementation details
- Enable future-proof exception handling

**Circular Dependencies (Item 88)**
- Prevent cycles with good design
- Break cycles with dynamic imports or dependency injection
- Refactor shared code to common modules

---

## Practice Exercises

Test your understanding with these exercises:

### Exercise 1: Virtual Environment Workflow

Create a simple workflow demonstration:

In [None]:
# Exercise: Inspect the current Python environment
import sys

def check_environment():
    """Display current Python environment information."""
    print("Python Environment Information:")
    print(f"Python version: {sys.version}")
    print(f"Executable: {sys.executable}")
    print(f"Prefix: {sys.prefix}")
    
    is_venv = (hasattr(sys, 'real_prefix') or 
               (hasattr(sys, 'base_prefix') and sys.base_prefix != sys.prefix))
    print(f"In virtual environment: {is_venv}")

check_environment()

# TODO: Try running this in both a virtual environment and outside one

### Exercise 2: Write Complete Docstrings

In [None]:
# Exercise: Add proper docstrings to this code

def calculate_average(numbers, exclude_outliers=False):
    # TODO: Add comprehensive docstring
    if not numbers:
        return 0
    
    if exclude_outliers and len(numbers) > 2:
        numbers = sorted(numbers)[1:-1]
    
    return sum(numbers) / len(numbers)

# Test it
print(calculate_average([1, 2, 3, 4, 5]))
print(calculate_average([1, 2, 3, 4, 100], exclude_outliers=True))

# TODO: Check the docstring with help(calculate_average)

### Exercise 3: Exception Hierarchy Design

In [None]:
# Exercise: Design an exception hierarchy for a file processing module

# TODO: Create a root exception class
# TODO: Create specific exceptions for:
#   - File not found
#   - File too large
#   - Invalid file format
#   - Corrupted file data

# TODO: Implement a function that uses these exceptions
# TODO: Write comprehensive exception handling code

# Example structure:
class FileProcessingError(Exception):
    """Base exception for file processing operations."""
    pass

# Add more exception classes here...

print("TODO: Complete the exception hierarchy")

### Exercise 4: Configuration Management

In [None]:
# Exercise: Implement environment-based configuration

import os

class Config:
    """Base configuration."""
    DEBUG = False
    TESTING = False
    DATABASE_URL = "sqlite:///default.db"

class DevelopmentConfig(Config):
    # TODO: Set development-specific settings
    pass

class ProductionConfig(Config):
    # TODO: Set production-specific settings
    pass

class TestingConfig(Config):
    # TODO: Set testing-specific settings
    pass

# TODO: Implement logic to select config based on environment
def get_config():
    env = os.environ.get('APP_ENV', 'development')
    # Return appropriate config
    return Config

config = get_config()
print(f"Debug mode: {config.DEBUG}")

---

## Additional Resources

**Official Documentation:**
- Python Packaging User Guide: https://packaging.python.org
- PEP 257 (Docstring Conventions): https://peps.python.org/pep-0257/
- Virtual Environments: https://docs.python.org/3/library/venv.html

**Tools:**
- PyPI: https://pypi.org
- Sphinx Documentation: https://www.sphinx-doc.org
- Read the Docs: https://readthedocs.org
- IPython: https://ipython.org

**Further Reading:**
- "The Hitchhiker's Guide to Python" - Packaging section
- "Python Testing with pytest" - Virtual environments chapter
- "Fluent Python" - Module and package organization