# Jupyter Notebooks and Git Version Control

## Learning Objectives
- Master Jupyter notebook features and shortcuts
- Understand Git version control basics
- Learn best practices for managing ML projects
- Practice collaborative development workflows

## Part 1: Jupyter Notebook Mastery

### 1.1 Jupyter Basics

In [None]:
# This is a code cell
# Press Shift+Enter to run it and move to the next cell
# Press Ctrl+Enter to run it and stay in the same cell

print("Welcome to Jupyter!")
print("Current Python version:")
import sys
print(sys.version)

### 1.2 Cell Types

Jupyter has two main cell types:
- **Code cells**: Execute Python code
- **Markdown cells**: Documentation with formatting

Press `Esc` to enter command mode, then:
- `M` to convert to Markdown
- `Y` to convert to Code
- `A` to insert cell above
- `B` to insert cell below
- `DD` to delete cell

### 1.3 Markdown Formatting

# Header 1
## Header 2
### Header 3

**Bold text** and *italic text*

Lists:
- Item 1
- Item 2
  - Subitem 2.1
  - Subitem 2.2

Numbered lists:
1. First
2. Second
3. Third

Code formatting: `inline code` or

```python
def hello():
    print("Hello, World!")
```

Mathematical equations: $y = mx + b$ or

$$E = mc^2$$

[Links](https://jupyter.org) and images: `![alt text](path/to/image.png)`

Tables:

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Data 1   | Data 2   | Data 3   |
| Data 4   | Data 5   | Data 6   |

### 1.4 Magic Commands

Jupyter provides special commands prefixed with `%` (line magic) or `%%` (cell magic).

In [None]:
# Time execution
%time sum(range(1000000))

In [None]:
# Time multiple runs for better accuracy
%timeit sum(range(1000))
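
Outside a notebook, the standard-library `timeit` module offers similar measurements. The sketch below is a rough stand-in for `%timeit`, not its implementation; the loop count (1,000) and repeat count (5) are arbitrary values chosen for illustration, whereas `%timeit` picks them automatically.

```python
import timeit

# Run the statement 1,000 times per trial, for 5 trials.
# Both numbers are illustrative assumptions, not %timeit defaults.
trials = timeit.repeat("sum(range(1000))", number=1000, repeat=5)

# Report the fastest trial, converted to microseconds per single call.
best_per_call_us = min(trials) / 1000 * 1e6
print(f"best of 5: {best_per_call_us:.2f} us per loop")
```

Taking the minimum of several trials reduces the influence of background activity on the machine, which is one reason repeated timing is preferred over a single `%time` run.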

In [None]:
# List all variables
a = 10
b = "hello"
c = [1, 2, 3]

%whos

In [None]:
%%writefile example_script.py
# Written by the %%writefile cell magic, which must be the first line of the cell
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("Student"))

In [None]:
# Run external Python script
%run example_script.py
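
For comparison, the two magics above can be approximated in plain Python. This is a minimal sketch using only the standard library, not how IPython implements them; the temporary directory is an assumption made so the example leaves no files behind.

```python
import runpy
import tempfile
from pathlib import Path

source = '''def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("Student"))
'''

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "example_script.py"
    script_path.write_text(source)  # roughly what %%writefile does

    # run_name="__main__" makes the script's main guard fire, as %run does.
    script_globals = runpy.run_path(str(script_path), run_name="__main__")

# runpy returns the script's names in a dict; %run instead copies them
# into the notebook's interactive namespace.
print(script_globals["greet"]("Jupyter"))
```

After `%run`, functions such as `greet` are available directly in later cells, which is convenient for loading helper code into a notebook session.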

In [None]:
# Display matplotlib plots inline
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2*np.pi, 100)
plt.plot(x, np.sin(x))
plt.title("Sine Wave")
plt.show()

### 1.5 Shell Commands

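You can run shell commands directly in Jupyter by prefixing them with `!`. Each `!` line runs in its own subshell, so a `cd` on one line does not carry over to the next.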

In [None]:
# List files in current directory
!ls -la

In [None]:
# Check Python packages
!pip list | grep numpy

In [None]:
# Current working directory
!pwd
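
When a shell command's output is needed as data rather than just displayed, the standard-library `subprocess` module is a portable alternative that also works outside notebooks. The sketch below launches the current Python interpreter as the child process purely for illustration; any command list could take its place.

```python
import subprocess
import sys

# Launch the current Python interpreter as a child process. Any command
# list works here, for example ["git", "--version"] if git is on the PATH.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

print(result.stdout.strip())
print("exit code:", result.returncode)
```

Passing the command as a list avoids shell quoting problems, and `check=True` turns a failed command into an exception instead of a silently ignored exit code.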

### 1.6 Interactive Widgets

Jupyter supports interactive widgets for dynamic visualizations.

In [None]:
# Install ipywidgets if not already installed
# !pip install ipywidgets

from ipywidgets import interact
import matplotlib.pyplot as plt
import numpy as np

def plot_sine(frequency=1.0, amplitude=1.0, phase=0.0):
    x = np.linspace(0, 2*np.pi, 1000)
    y = amplitude * np.sin(frequency * x + phase)
    
    plt.figure(figsize=(10, 4))
    plt.plot(x, y)
    plt.title(f"y = {amplitude:.1f} * sin({frequency:.1f}x + {phase:.1f})")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.grid(True, alpha=0.3)
    plt.ylim(-3, 3)
    plt.show()

# Create interactive plot
interact(plot_sine, 
         frequency=(0.5, 3.0, 0.1),
         amplitude=(0.5, 2.0, 0.1),
         phase=(0.0, 2*np.pi, 0.1));

### 1.7 Useful Keyboard Shortcuts

#### Command Mode (press Esc)
- `Enter`: Enter edit mode
- `A`: Insert cell above
- `B`: Insert cell below
- `M`: Change to Markdown
- `Y`: Change to Code
- `DD`: Delete cell
- `Z`: Undo cell deletion
- `Shift+J/K`: Select multiple cells
- `Shift+M`: Merge selected cells

#### Edit Mode (press Enter)
- `Shift+Enter`: Run cell, move to next
- `Ctrl+Enter`: Run cell, stay in cell
- `Alt+Enter`: Run cell, insert new below
- `Tab`: Code completion
- `Shift+Tab`: Show documentation
- `Ctrl+]`: Indent
- `Ctrl+[`: Dedent
- `Ctrl+A`: Select all
- `Ctrl+Z`: Undo

### Exercise 1: Jupyter Practice

Complete the following tasks:

In [None]:
# Task 1: Use tab completion to explore numpy functions
# Type: np. and press Tab
import numpy as np
# np.<press Tab here>

In [None]:
# Task 2: Get help on a function using ?
# Uncomment and run:
# np.random.randn?

In [None]:
# Task 3: Time the difference between list and numpy operations
import numpy as np

# Create data
list_data = list(range(1000000))
array_data = np.arange(1000000)

# Time list operation
print("List operation:")
%time list_sum = sum([x**2 for x in list_data])

# Time numpy operation
print("\nNumpy operation:")
%time array_sum = np.sum(array_data**2)

print("\nNumPy is usually faster here; compare the two wall times above.")

## Part 2: Git Version Control

### 2.1 Git Basics

Git is a distributed version control system essential for:
- Tracking changes
- Collaboration
- Experimentation
- Backup

### 2.2 Git Configuration

In [None]:
# Check if Git is installed
!git --version

In [None]:
# Configure Git (replace with your information)
# !git config --global user.name "Your Name"
# !git config --global user.email "your.email@example.com"

# Check configuration
!git config --list | grep user

### 2.3 Basic Git Workflow

Let's create a sample project to practice Git commands.

In [None]:
# Create a sample project directory
!mkdir -p ml_project
# -b main (Git 2.28+) names the initial branch, so `git checkout main` works later
!cd ml_project && git init -b main
print("Git repository initialized!")

In [None]:
%%writefile ml_project/model.py
"""Simple ML model for demonstration"""

import numpy as np

class SimpleModel:
    def __init__(self):
        self.weights = None
    
    def fit(self, X, y):
        """Fit the model to data"""
        # Simple linear regression via the normal equation; solve() is
        # more numerically stable than forming the explicit inverse
        X_with_bias = np.c_[np.ones(X.shape[0]), X]
        self.weights = np.linalg.solve(X_with_bias.T @ X_with_bias, X_with_bias.T @ y)
        return self
    
    def predict(self, X):
        """Make predictions"""
        X_with_bias = np.c_[np.ones(X.shape[0]), X]
        return X_with_bias @ self.weights
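
For reference, the closed form that `fit` computes can be written out explicitly. The notation here is an assumption introduced for illustration: $X_b$ stands for `X_with_bias` (the feature matrix with a leading column of ones) and $\hat{w}$ for `self.weights`.

$$\hat{w} = (X_b^\top X_b)^{-1} X_b^\top y$$

Predictions then follow as $\hat{y} = X_b \hat{w}$, which is what `predict` returns. The expression is only defined when the columns of $X_b$ are linearly independent, so that $X_b^\top X_b$ is invertible.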

In [None]:
%%writefile ml_project/README.md
# ML Project

A simple machine learning project for demonstration.

## Features
- Simple linear regression model
- Data preprocessing utilities
- Visualization tools

## Installation
```bash
pip install numpy matplotlib scikit-learn
```

## Usage
```python
from model import SimpleModel

model = SimpleModel()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

In [None]:
%%writefile ml_project/.gitignore
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/

# Jupyter
.ipynb_checkpoints/

# Data
data/
*.csv
*.json
*.h5

# Models
models/
*.pkl
*.joblib
*.pt

# IDE
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db
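
To build intuition for which paths these patterns catch, the sketch below approximates the matching with the standard-library `fnmatch` module. It is a deliberate simplification and not Git's algorithm: negation (`!`), anchored patterns, and `**` are not handled, and the helper name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few patterns taken from the .gitignore above.
PATTERNS = ["__pycache__/", "*.py[cod]", ".ipynb_checkpoints/", "data/", "*.csv", "*.pkl"]

def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate check: does any pattern match this repo-relative path?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignored if any parent directory matches.
            if any(fnmatchcase(part, pattern.rstrip("/")) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File pattern without a slash: compare against the file name only.
            return True
    return False

for candidate in [
    "model.py",
    "model.pyc",
    "data/raw/train.csv",
    "notebooks/eda.ipynb",
    "src/__pycache__/utils.cpython-311.pyc",
]:
    print(f"{candidate:45} ignored={is_ignored(candidate)}")
```

For authoritative answers in a real repository, `git check-ignore -v <path>` reports which pattern, if any, excludes a given path.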

### 2.4 Git Commands in Action

In [None]:
# Check status
!cd ml_project && git status

In [None]:
# Add files to staging area
!cd ml_project && git add .
!cd ml_project && git status

In [None]:
# Commit changes
!cd ml_project && git commit -m "Initial commit: Add model, README, and .gitignore"

In [None]:
# View commit history
!cd ml_project && git log --oneline

In [None]:
%%writefile ml_project/utils.py
# Add a new utilities module (%%writefile must be the first line of the cell)
"""Utility functions for data processing"""

import numpy as np
from sklearn.preprocessing import StandardScaler

def preprocess_data(X, y=None):
    """Preprocess input data"""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    
    if y is not None:
        return X_scaled, y
    return X_scaled

def split_data(X, y, test_size=0.2, random_state=42):
    """Split data into train and test sets"""
    np.random.seed(random_state)
    n_samples = len(X)
    n_test = int(n_samples * test_size)
    
    indices = np.random.permutation(n_samples)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

In [None]:
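# Check what changed
# utils.py is new and untracked, so it appears in `git status`;
# `git diff` only covers tracked files and prints nothing here
!cd ml_project && git status
!cd ml_project && git diff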

In [None]:
# Stage and commit new changes
!cd ml_project && git add utils.py
!cd ml_project && git commit -m "Add utility functions for data processing"

### 2.5 Branching and Merging

In [None]:
# Create and switch to new branch
!cd ml_project && git checkout -b feature/add-visualization

In [None]:
%%writefile ml_project/visualize.py
"""Visualization utilities for ML projects"""

import matplotlib.pyplot as plt
import numpy as np

def plot_predictions(y_true, y_pred, title="Predictions vs Actual"):
    """Plot predictions against actual values"""
    plt.figure(figsize=(10, 6))
    
    plt.subplot(1, 2, 1)
    plt.scatter(y_true, y_pred, alpha=0.5)
    plt.plot([y_true.min(), y_true.max()], 
             [y_true.min(), y_true.max()], 
             'r--', lw=2)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.title(title)
    
    plt.subplot(1, 2, 2)
    residuals = y_true - y_pred
    plt.hist(residuals, bins=30, edgecolor='black')
    plt.xlabel('Residuals')
    plt.ylabel('Frequency')
    plt.title('Residual Distribution')
    
    plt.tight_layout()
    plt.show()

def plot_learning_curve(train_scores, val_scores):
    """Plot learning curves"""
    plt.figure(figsize=(10, 6))
    plt.plot(train_scores, label='Training Score')
    plt.plot(val_scores, label='Validation Score')
    plt.xlabel('Epoch')
    plt.ylabel('Score')
    plt.title('Learning Curves')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

In [None]:
# Commit changes on feature branch
!cd ml_project && git add visualize.py
!cd ml_project && git commit -m "Add visualization utilities"

In [None]:
# Switch back to main branch
!cd ml_project && git checkout main

# Merge feature branch
!cd ml_project && git merge feature/add-visualization

In [None]:
# View branch history
!cd ml_project && git log --oneline --graph --all
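
The merge above is a fast-forward: `main` had no new commits of its own, so Git only needs to move the `main` pointer. The toy model below illustrates the idea with plain dictionaries. It is a conceptual sketch, not Git's storage format; the commit ids `c1` to `c3` are made up, and real commits are identified by hashes and may have several parents.

```python
# Toy model: each commit records its parent; a branch is just a name
# pointing at one commit.
commits = {"c1": None, "c2": "c1"}  # commit id -> parent id
branches = {"main": "c2"}           # branch name -> commit id

def ancestors(commit_id):
    """Return the commit and all commits reachable through parent links."""
    seen = []
    while commit_id is not None:
        seen.append(commit_id)
        commit_id = commits[commit_id]
    return seen

# `git checkout -b feature/add-visualization`: new pointer, same commit.
branches["feature/add-visualization"] = branches["main"]

# `git commit` on the feature branch: add a commit and advance that pointer.
commits["c3"] = branches["feature/add-visualization"]
branches["feature/add-visualization"] = "c3"

# `git merge feature/add-visualization` from main: main's tip is an ancestor
# of the feature tip, so the merge is a fast-forward (pointer move only).
can_fast_forward = branches["main"] in ancestors(branches["feature/add-visualization"])
if can_fast_forward:
    branches["main"] = branches["feature/add-visualization"]

print(can_fast_forward, branches)
```

If `main` had gained its own commit in the meantime, its tip would no longer be an ancestor of the feature tip, and Git would create a merge commit with two parents instead.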

### 2.6 Common Git Commands Reference

```bash
# Repository Setup
git init                    # Initialize new repository
git clone <url>            # Clone remote repository

# Basic Workflow
git status                 # Check status
git add <file>            # Stage file
git add .                 # Stage all files
git commit -m "message"   # Commit changes
git push                  # Push to remote
git pull                  # Pull from remote

# Branching
git branch                # List branches
git branch <name>         # Create branch
git checkout <name>       # Switch branch
git checkout -b <name>    # Create and switch
git merge <branch>        # Merge branch
git branch -d <name>      # Delete branch

# History
git log                   # View history
git log --oneline        # Compact view
git diff                  # Show changes
git diff <file>          # Show file changes

# Undoing Changes
git restore <file>        # Discard unstaged changes to a file
git reset HEAD~1          # Undo last commit, keep its changes in the working tree
git revert <commit>       # Add a new commit that reverses <commit>

# Remote Repositories
git remote add origin <url>  # Add remote
git remote -v                # List remotes
git push -u origin main      # Push and set upstream
git fetch                    # Fetch updates
```

### 2.7 Best Practices for ML Projects

#### Directory Structure
```
ml_project/
├── data/
│   ├── raw/
│   ├── processed/
│   └── external/
├── notebooks/
│   ├── exploration/
│   └── experiments/
├── src/
│   ├── models/
│   ├── features/
│   └── utils/
├── tests/
├── models/
│   └── saved_models/
├── reports/
│   └── figures/
├── requirements.txt
├── setup.py
├── README.md
└── .gitignore
```
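
One way to create this layout is a short script. The sketch below is a minimal example using `pathlib`; the `scaffold` helper is hypothetical, and it runs in a temporary directory here so nothing is left on disk.

```python
import tempfile
from pathlib import Path

# Directories from the layout above, relative to the project root.
DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks/exploration", "notebooks/experiments",
    "src/models", "src/features", "src/utils",
    "tests", "models/saved_models", "reports/figures",
]
FILES = ["requirements.txt", "setup.py", "README.md", ".gitignore"]

def scaffold(root: Path) -> None:
    """Create the directory tree and empty placeholder files under root."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)

# Demonstrate in a temporary directory; pass a real path to keep the result.
with tempfile.TemporaryDirectory() as tmp_dir:
    project_root = Path(tmp_dir) / "ml_project"
    scaffold(project_root)
    created = sorted(p.relative_to(project_root).as_posix() for p in project_root.rglob("*"))
    print(len(created), "paths created")
```

Git tracks files rather than directories, so the empty folders will not show up in a commit until they contain at least one file.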

#### Version Control Tips
1. **Don't commit large files**: Use .gitignore for data, models
2. **Use meaningful commit messages**: Describe what and why
3. **Commit often**: Small, logical changes
4. **Branch for features**: Keep main branch stable
5. **Document everything**: README, docstrings, comments

### Exercise 2: Git Practice

Complete the following Git workflow:

In [None]:
# Task 1: Create a new branch for a feature
# !cd ml_project && git checkout -b feature/your-feature-name

In [None]:
%%writefile ml_project/config.py
# Task 2: Create a new file (%%writefile must be the first line of the cell)
"""Configuration settings for ML project"""

# Model parameters
LEARNING_RATE = 0.01
BATCH_SIZE = 32
EPOCHS = 100

# Data settings
TEST_SIZE = 0.2
RANDOM_STATE = 42

# Paths
DATA_PATH = "data/"
MODEL_PATH = "models/"

In [None]:
# Task 3: Stage, commit, and merge
# !cd ml_project && git add config.py
# !cd ml_project && git commit -m "Add configuration file"
# !cd ml_project && git checkout main
# !cd ml_project && git merge feature/your-feature-name

## Part 3: Jupyter + Git Integration

### 3.1 Challenges with Notebooks in Git

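Jupyter notebooks are stored as JSON, which makes them awkward to track in Git:
- Execution counts change every time a cell is re-run
- Cell outputs are saved in the file and change frequently
- Notebook and cell metadata change without any edit to the code
- Images in outputs are embedded as base64 text, which bloats diffs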

### 3.2 Solutions

1. **Clear outputs before committing**
2. **Use nbstripout** to automatically clean notebooks
3. **Use ReviewNB** for better diffs
4. **Convert to Python scripts** for version control
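
The first solution can be sketched with the standard library alone, since a notebook is a JSON document. The example below is an illustration of the idea and not how nbstripout is implemented; the tiny notebook dict is hand-written sample data, and `strip_outputs` is a hypothetical helper.

```python
import copy
import json

# A tiny hand-written notebook in the nbformat 4 JSON layout (sample data).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "source": ["print('hi')"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
    ],
}

def strip_outputs(nb: dict) -> dict:
    """Return a copy of nb with code-cell outputs and execution counts cleared."""
    cleaned = copy.deepcopy(nb)
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

cleaned = strip_outputs(notebook)
print(json.dumps(cleaned["cells"][1], indent=1))
```

With outputs and counters cleared, two versions of a notebook differ only where the source of a cell actually changed, which keeps diffs readable.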

In [None]:
# Install nbstripout
!pip install nbstripout

In [None]:
# Configure nbstripout for repository
!cd ml_project && nbstripout --install

In [None]:
# Convert notebook to Python script
!jupyter nbconvert --to script "02_Jupyter_Git_Basics.ipynb"

### 3.3 JupyterLab Extensions

Useful extensions for ML development:
- **jupyterlab-git**: Git integration
- **nbdime**: Notebook diffing and merging
- **variable-inspector**: Variable browser
- **jupyterlab-toc**: Table of contents (built into JupyterLab 3.0 and later)

Install with:
```bash
pip install jupyterlab-git nbdime
# Only needed on JupyterLab 2.x and earlier; 3.x and later use prebuilt extensions
jupyter lab build
```

## Summary

### Jupyter Mastery
- Use keyboard shortcuts for efficiency
- Leverage magic commands
- Document with Markdown
- Use widgets for interactive visualizations

### Git Essentials
- Initialize and configure repositories
- Stage, commit, push changes
- Branch for features
- Merge feature branches into main
- Use .gitignore appropriately

### Best Practices
- Clear notebook outputs before committing
- Use meaningful commit messages
- Organize projects consistently
- Document everything
- Version control early and often

### Next Steps
1. Set up your own ML project repository
2. Practice the Git workflow
3. Explore Jupyter extensions
4. Collaborate on a shared repository