# 🚀 Jupyter Lab Data Science Project Setup

Welcome to your professional Jupyter Lab data science environment using Docker Compose!

This notebook demonstrates setting up and using our containerized data science environment with the `jupyter/datascience-notebook` Docker image and Docker Compose v2.39.1.

## 📋 Table of Contents
1. [Project Structure Setup](#project-structure)
2. [Docker Compose Configuration](#docker-compose-config)
3. [Environment Variables Configuration](#env-config)
4. [Volume Mounts and Networking](#volumes-networking)
5. [Build and Launch Environment](#build-launch)
6. [Verify Installation and Access](#verify-installation)

---

## 1. Project Structure Setup {#project-structure}

Our project follows a professional data science directory structure that promotes organization and reproducibility:

```
JupyterLab/
├── docker-compose.yml          # Main Docker Compose configuration
├── .env.example               # Environment variables template
├── .env                      # Your environment variables (create from .env.example)
├── .gitignore               # Git ignore rules
├── README.md                # Project documentation
├── requirements.txt         # Additional Python packages
├── notebooks/               # Jupyter notebooks (organized by purpose)
│   ├── exploratory/        # Data exploration notebooks
│   ├── analysis/           # Analysis notebooks  
│   ├── modeling/           # Machine learning models
│   └── reports/            # Report notebooks
├── data/                   # Dataset storage
│   ├── raw/               # Raw, unprocessed data
│   ├── processed/         # Cleaned and processed data
│   └── external/          # External datasets
├── scripts/               # Python modules and utilities
│   ├── __init__.py       # Make it a Python package
│   ├── utils.py          # Utility functions
│   ├── data_processing.py # Data processing functions
│   └── visualization.py  # Visualization helpers
├── outputs/               # Generated outputs
│   ├── figures/          # Charts and plots
│   ├── models/           # Trained models
│   └── reports/          # Generated reports
└── database/             # Database initialization scripts
    └── init/             # PostgreSQL init scripts
```

### Key Benefits:
- **Separation of concerns**: Clear distinction between raw data, processed data, code, and outputs
- **Reproducibility**: Consistent structure makes projects easier to understand and reproduce
- **Collaboration**: Team members can quickly navigate and contribute to the project
- **Version control**: Organized structure works well with Git workflows
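
The layout above can be created in one step. The following is a minimal sketch, assuming the directory names from the tree; the helper name `create_project_skeleton` is hypothetical and not part of this project. It creates directories only (plus `scripts/__init__.py`), not files such as `README.md` or `docker-compose.yml`.

```python
from pathlib import Path

# Directory names copied from the tree above.
PROJECT_DIRS = [
    "notebooks/exploratory",
    "notebooks/analysis",
    "notebooks/modeling",
    "notebooks/reports",
    "data/raw",
    "data/processed",
    "data/external",
    "scripts",
    "outputs/figures",
    "outputs/models",
    "outputs/reports",
    "database/init",
]


def create_project_skeleton(root):
    """Create the project directories under `root` and return their paths."""
    root = Path(root)
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    # scripts/ is imported as a package, so it needs an __init__.py
    (root / "scripts" / "__init__.py").touch(exist_ok=True)
    return created
```

Calling `create_project_skeleton("JupyterLab")` from the parent directory would produce the tree shown above; because of `exist_ok=True`, running it against an existing project leaves current contents untouched.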

## 2. Docker Compose Configuration {#docker-compose-config}

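Our `docker-compose.yml` file is written for **Docker Compose v2.39.1** and uses the `jupyter/datascience-notebook:latest` image. Note that 2.39.1 is the version of the Compose CLI, not a file-format version; the file itself follows the Compose Specification.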

In [None]:
# Let's examine our Docker Compose configuration
with open('../docker-compose.yml', 'r') as f:
    content = f.read()

print("=== Docker Compose Configuration ===")
# Show at most the first 1500 characters
print((content[:1500] + "...") if len(content) > 1500 else content)

### Key Docker Compose Features:

- **jupyter/datascience-notebook:latest**: Pre-installed with pandas, numpy, matplotlib, seaborn, scikit-learn, and more
- **Volume Mounts**: Persistent storage for notebooks, data, scripts, and outputs
- **Optional Services**: PostgreSQL database and Redis cache (use profiles to enable)
- **Custom Network**: Isolated network for service communication
- **Environment Variables**: Flexible configuration through `.env` file

## 3. Environment Variables Configuration {#env-config}

Environment variables keep settings such as the Jupyter port and access token out of `docker-compose.yml`, so each developer can change them without editing shared files. Let's examine the `.env.example` template:

In [None]:
# Examine environment variables template
with open('../.env.example', 'r') as f:
    env_content = f.read()

print("=== Environment Variables Template ===")
print(env_content)

print("\n=== Setup Instructions ===")
print("1. Copy .env.example to .env: cp .env.example .env")
print("2. Edit .env file with your preferred settings")
print("3. Never commit .env file to version control (it's in .gitignore)")
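
To inspect the settings programmatically rather than just printing the file, a small parser is enough. This is a minimal sketch, assuming the template contains only simple `KEY=VALUE` lines; it does not handle inline comments, `export` prefixes, or variable interpolation, and it is not how Docker Compose itself reads the file. `JUPYTER_PORT` appears in this notebook's troubleshooting tips; `JUPYTER_TOKEN` is a hypothetical variable name used for illustration.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical template contents, for illustration only
example = """
# Jupyter settings
JUPYTER_PORT=8888
JUPYTER_TOKEN="datascience-token"
"""
settings = parse_env_text(example)
print(settings)
# → {'JUPYTER_PORT': '8888', 'JUPYTER_TOKEN': 'datascience-token'}
```

To use it on the real template, pass the `env_content` string read in the cell above to `parse_env_text`.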

## 4. Volume Mounts and Networking {#volumes-networking}

Our Docker Compose setup bind-mounts four host directories into the container and attaches the services to a custom bridge network:

In [None]:
import os

print("=== Volume Mounts Configuration ===")
print("Host Directory -> Container Directory")
print("./notebooks -> /home/jovyan/work/notebooks")
print("./data -> /home/jovyan/work/data") 
print("./scripts -> /home/jovyan/work/scripts")
print("./outputs -> /home/jovyan/work/outputs")

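print("\n=== Current Directory Structure ===")
project_root = os.path.abspath('..')
for root, dirs, files in os.walk(project_root):
    # Skip hidden directories such as .git to keep the listing readable
    dirs[:] = sorted(d for d in dirs if not d.startswith('.'))
    # Depth relative to the project root: 0 for the root, 1 for its children, ...
    rel_path = os.path.relpath(root, project_root)
    level = 0 if rel_path == '.' else rel_path.count(os.sep) + 1
    indent = ' ' * 2 * level
    print(f'{indent}{os.path.basename(root)}/')
    subindent = ' ' * 2 * (level + 1)
    for file in files[:3]:  # Show first 3 files only
        print(f'{subindent}{file}')
    if len(files) > 3:
        print(f'{subindent}... and {len(files) - 3} more files')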

print("\n=== Network Configuration ===")
print("- Custom bridge network: jupyter-network")
print("- Isolated container communication")
print("- Host access via port 8888")
print("- Optional database services on custom network")
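
Because of these bind mounts, the same file has one path on the host and another inside the container. The sketch below translates a project-relative host path into its container path, using the mount table printed above. The helper name `to_container_path` and the file `sales.csv` are hypothetical.

```python
from pathlib import PurePosixPath

# Mount table as printed in the cell above
MOUNTS = {
    "notebooks": "/home/jovyan/work/notebooks",
    "data": "/home/jovyan/work/data",
    "scripts": "/home/jovyan/work/scripts",
    "outputs": "/home/jovyan/work/outputs",
}


def to_container_path(host_relative_path):
    """Translate a project-relative host path to its path inside the container."""
    # PurePosixPath drops a leading "./", so "./data/x" and "data/x" are equivalent
    parts = PurePosixPath(host_relative_path).parts
    if not parts or parts[0] not in MOUNTS:
        raise ValueError(f"{host_relative_path!r} is not under a mounted directory")
    return str(PurePosixPath(MOUNTS[parts[0]], *parts[1:]))


print(to_container_path("./data/raw/sales.csv"))
# → /home/jovyan/work/data/raw/sales.csv
```

Paths outside the four mounted directories (for example `README.md`) raise `ValueError`, which reflects the fact that such files are not visible inside the container under this mount table.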

## 5. Build and Launch Environment {#build-launch}

Here are the essential Docker Compose commands for managing your environment. Run them from the project root on the host, not from inside the container:

In [None]:
print("=== Essential Docker Compose Commands ===")
print()

commands = {
    "Start Jupyter Lab only": "docker compose up -d",
    "Start with PostgreSQL": "docker compose --profile database up -d", 
    "Start with all services": "docker compose --profile database --profile cache up -d",
    "View logs": "docker compose logs jupyter",
    "Stop services": "docker compose down",
    "Restart Jupyter": "docker compose restart jupyter",
    "Access container shell": "docker compose exec jupyter bash",
    "Install packages": "docker compose exec jupyter pip install package-name",
    "Update services": "docker compose pull && docker compose up -d"
}

for description, command in commands.items():
    print(f"📌 {description}:")
    print(f"   {command}")
    print()

print("=== Troubleshooting Tips ===")
print("• Port conflicts: Change JUPYTER_PORT in .env file")
print("• Permission issues: Run 'docker compose exec --user root jupyter chown -R jovyan:users /home/jovyan/work'")
print("• Package issues: Use conda for complex packages: 'docker compose exec jupyter conda install package-name'")
print("• Reset environment: 'docker compose down && docker compose up -d'")
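
The start-up commands above differ only in which `--profile` flags they pass. As a sketch, the argument list can be built from a list of profile names; the helper name `compose_up_command` is hypothetical, while the profile names `database` and `cache` are the ones used in the commands above.

```python
import shlex


def compose_up_command(profiles=()):
    """Build the `docker compose ... up -d` argument list for the given profiles."""
    command = ["docker", "compose"]
    for profile in profiles:
        command += ["--profile", profile]
    command += ["up", "-d"]
    return command


print(shlex.join(compose_up_command()))
# → docker compose up -d
print(shlex.join(compose_up_command(["database", "cache"])))
# → docker compose --profile database --profile cache up -d
```

On the host, the returned list could be passed to `subprocess.run(..., check=True)`. Returning a list rather than a single string avoids shell quoting issues.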

## 6. Verify Installation and Access {#verify-installation}

Let's verify that our Jupyter environment is working correctly by testing key components:

In [None]:
# 1. Test System Information
import sys
import platform
print("=== System Information ===")
print(f"Python Version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"Architecture: {platform.architecture()}")
print()

In [None]:
# 2. Test Essential Data Science Libraries
print("=== Testing Essential Libraries ===")

# Test imports
try:
    import pandas as pd
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    import seaborn as sns
    import plotly
    import plotly.express as px
    import sklearn
    print("✅ All essential libraries imported successfully!")
    print(f"   • pandas: {pd.__version__}")
    print(f"   • numpy: {np.__version__}")
    print(f"   • matplotlib: {matplotlib.__version__}")
    print(f"   • seaborn: {sns.__version__}")
    print(f"   • scikit-learn: {sklearn.__version__}")
    print(f"   • plotly: {plotly.__version__}")
except ImportError as e:
    print(f"❌ Import error: {e}")

print()
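
The cell above stops at the first failed import, so a single missing package hides the status of the rest. As an alternative sketch, `importlib.metadata` from the standard library can report the installed version of each distribution without importing it. The helper name `installed_versions` is hypothetical; note that the distribution name for `sklearn` is `scikit-learn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in distributions:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


packages = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "scikit-learn"]
for name, found in installed_versions(packages).items():
    status = found if found is not None else "not installed"
    print(f"{name}: {status}")
```

This only checks that a package is installed; it does not prove the package imports cleanly, so it complements the import test rather than replacing it.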

In [None]:
# 3. Test Custom Utilities
print("=== Testing Custom Utilities ===")
try:
    # Add the project directory (the parent of scripts/) to the Python path
    # so that `scripts` can be imported as a package
    import sys
    sys.path.append('/home/jovyan/work')
    
    from scripts.utils import get_project_root, setup_logging
    from scripts.data_processing import clean_column_names
    from scripts.visualization import setup_matplotlib_style
    
    print("✅ Custom utilities imported successfully!")
    print(f"   • Project root: {get_project_root()}")
    print("   • Data processing utilities: Available")
    print("   • Visualization utilities: Available")
    
except ImportError as e:
    print(f"❌ Custom utilities error: {e}")
    print("   This is normal if running outside the Docker container")

print()
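
An import such as `from scripts.utils import ...` resolves only when the directory that contains `scripts/` is on `sys.path`. Instead of hard-coding the container path, that directory can be located by walking up from the current working directory. This is a minimal sketch; the helper name `find_project_root` is hypothetical and is not the project's own `get_project_root`.

```python
import sys
from pathlib import Path


def find_project_root(start=None, package="scripts"):
    """Return the nearest ancestor of `start` that contains `package/__init__.py`."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / package / "__init__.py").is_file():
            return candidate
    return None  # not inside a project tree


root = find_project_root()
if root is not None and str(root) not in sys.path:
    sys.path.insert(0, str(root))  # makes `import scripts` resolvable
print(f"Project root: {root}")
```

The same cell then works both inside the container (where the root would be `/home/jovyan/work`) and on the host, as long as the notebook runs from somewhere below the project directory.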

In [None]:
# 4. Create and Test Sample Data
print("=== Creating Sample Dataset ===")

# Create sample data
sample_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'city': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'],
    'salary': [70000, 80000, 90000, 75000, 85000],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales', 'Marketing']
}

df = pd.DataFrame(sample_data)
print("✅ Sample DataFrame created:")
print(df)
print(f"\nDataFrame shape: {df.shape}")
print(f"Data types:\n{df.dtypes}")

print()

In [None]:
# 5. Test Matplotlib Visualization
print("=== Testing Matplotlib Visualization ===")

# Create one figure with two side-by-side axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Bar plot
ax1.bar(df['name'], df['age'])
ax1.set_title('Age by Person')
ax1.set_xlabel('Name')
ax1.set_ylabel('Age')
ax1.tick_params(axis='x', rotation=45)

# Scatter plot
departments = df['department'].unique()
colors = plt.cm.Set3(np.linspace(0, 1, len(departments)))
for i, dept in enumerate(departments):
    dept_data = df[df['department'] == dept]
    ax2.scatter(dept_data['age'], dept_data['salary'], 
               label=dept, color=colors[i], s=100, alpha=0.7)

ax2.set_title('Salary vs Age by Department')
ax2.set_xlabel('Age')
ax2.set_ylabel('Salary')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ Matplotlib visualization test completed!")

In [None]:
# 6. Test Interactive Plotly Visualization
print("=== Testing Plotly Interactive Visualization ===")

# Create interactive scatter plot
fig = px.scatter(df, x='age', y='salary', color='department', size='age',
                 hover_data=['name'], title='Interactive Salary vs Age by Department')

fig.update_layout(
    width=800,
    height=500,
    showlegend=True,
    template="plotly_white"
)

fig.show()

print("✅ Plotly interactive visualization test completed!")
print("\n=== Next Steps ===")
print("🎉 Your Jupyter Lab environment is ready!")
print("• Access Jupyter Lab at: http://localhost:8888")
print("• Default token: datascience-token (change in .env)")
print("• Explore the notebooks/ directory for more examples")
print("• Check out the scripts/ directory for utility functions")
print("• Use the data/ directory for your datasets")
print("• Save outputs to the outputs/ directory")
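
To check from the host that something is listening on the Jupyter port, a plain TCP connection attempt is sufficient. This is a minimal sketch, assuming the default port 8888 mentioned above; the helper name `port_is_open` is hypothetical. A successful connection shows only that the port is open, not that the listener is Jupyter Lab.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


print("Port 8888 reachable:", port_is_open())
```

If `JUPYTER_PORT` was changed in `.env`, pass that value as `port`.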