# Jupyter Notebook
## What are Jupyter Notebooks?
Jupyter Notebooks are interactive documents that combine:
- **Code cells**: Executable code (Python in this chapter)
- **Markdown cells**: Rich text documentation
- **Outputs**: Results of code execution, displayed directly below the code cell that produced them

They are ideal for:
- Data exploration and analysis
- Teaching and learning
- Research reproducibility
- Creating reports with live code

## Jupyter Notebook Structure

### Cell Types

#### 1. Code Cells

Code cells contain executable Python code. When you run a cell, its output appears directly below it.

In [None]:
# Code cell - executes Python
import numpy as np
import matplotlib.pyplot as plt

# Create data
x = np.linspace(0, 10, 100)
y = np.sin(x)

print(f"Created {len(x)} data points")

#### 2. Markdown Cells

Markdown cells contain formatted text, equations, and documentation.

In [None]:
# This is a code cell, shown for contrast: it is executed rather than rendered as text
message = "This is from a code cell"
print(message)

---

## Jupyter Features

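### Keyboard Shortcuts

The run shortcuts work in both modes. The single-key shortcuts (`D, D`, `A`, `B`, `M`, `Y`) work only in command mode, so press `Esc` first to leave the cell editor.
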
| Shortcut | Action |
|----------|--------|
| `Shift + Enter` | Run cell and move to next |
| `Ctrl + Enter` | Run cell and stay |
| `Alt + Enter` | Run cell and insert new cell below |
| `D, D` (press twice) | Delete cell |
| `A` | Insert cell above |
| `B` | Insert cell below |
| `M` | Change cell to Markdown |
| `Y` | Change cell to Code |

### Magic Commands

Jupyter has special commands that start with `%` or `%%`:

In [None]:
# Line magic - applies to a single line
%timeit np.random.rand(1000)

In [None]:
%%time
# Cell magic - applies to the entire cell and must be on the cell's first line
data = np.random.rand(10000)
result = np.mean(data)

Common magic commands:
- `%timeit`: Time a statement, averaged over many repeated runs
- `%%time`: Time a single run of the entire cell
- `%pwd`: Print working directory
- `%ls`: List files
- `%load`: Load code from a file into the current cell
- `%matplotlib inline`: Display plots inline (default)
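
Magic commands only work inside IPython or Jupyter. For timing in a plain Python script, the standard-library `timeit` module offers a comparable measurement. The following is a minimal sketch; the workload `mean_of_squares` and the run counts are illustrative assumptions, not part of this chapter's examples.

```python
import timeit


def mean_of_squares(n):
    """Hypothetical workload: mean of the squares 0^2 .. (n-1)^2."""
    return sum(i * i for i in range(n)) / n


# Each trial calls the function 100 times; 5 trials are recorded.
# Taking the fastest trial reduces the influence of background activity.
trials = timeit.repeat(lambda: mean_of_squares(1000), number=100, repeat=5)
best_per_call = min(trials) / 100

print(f"Best of 5 trials: {best_per_call * 1e6:.1f} microseconds per call")
```

The printed time depends on the machine, so no expected value is shown here.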

---

## Debugging in Jupyter

### Print Debugging

Use print statements to trace execution flow:

In [None]:
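def calculate_energy(atomic_positions):
    # NOTE: despite its name, this demo only returns each atom's distance
    # from the origin; the prints trace the intermediate values.
    print(f"Input shape: {atomic_positions.shape}")
    print(f"First position: {atomic_positions[0]}")

    # Norm of each row = distance of that atom from the origin
    distances = np.linalg.norm(atomic_positions, axis=1)
    print("Distance statistics:")
    print(f"  Mean: {np.mean(distances):.3f}")
    print(f"  Std: {np.std(distances):.3f}")

    return distances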

# Test function
positions = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]])
calculate_energy(positions)

### Checking Variable State

Use the `%who` and `%whos` magics to list the variables currently defined in the kernel:

In [None]:
# List all variables
%who

# List all variables with details
%whos

# Display value with type
value = 42
print(f"Value: {value}, Type: {type(value).__name__}")
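
`%who` and `%whos` exist only inside IPython. A rough plain-Python approximation is sketched below; the helper name `summarize_namespace` and its filtering rules are assumptions made for illustration, and its output format does not match `%whos` exactly.

```python
import types


def summarize_namespace(namespace):
    """Return (name, type name, short repr) rows for plain variables.

    Names starting with an underscore, modules, and callables are skipped,
    so the result focuses on data rather than imports and functions.
    """
    rows = []
    for name, value in sorted(namespace.items()):
        if name.startswith("_"):
            continue
        if isinstance(value, types.ModuleType) or callable(value):
            continue
        rows.append((name, type(value).__name__, repr(value)[:40]))
    return rows


# Small explicit namespace for demonstration
example = {"value": 42, "label": "test", "_private": 0, "helper": len}
for name, type_name, preview in summarize_namespace(example):
    print(f"{name:<10} {type_name:<8} {preview}")
```

In a notebook, calling `summarize_namespace(globals())` inspects the live session; IPython's own bookkeeping names such as `In` and `Out` will appear in the result as well.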

---

## Best Practices

### 1. Cell Organization

**DO:** Organize code logically across cells

In [None]:
# Cell 1: Imports
import numpy as np
import matplotlib.pyplot as plt

# Cell 2: Setup and constants
A = 5.43  # Lattice constant (Å)
T = 298   # Temperature (K)

# Cell 3: Data generation
positions = np.array([[0, 0, 0], [0.25, 0.25, 0.25]])

# Cell 4: Analysis
print(f"Number of atoms: {len(positions)}")
print(f"Lattice constant: {A} Å")
print(f"Temperature: {T} K")

**DON'T:** Put everything in one cell

- Hard to read and debug
- Can't run sections independently
- Poor organization

### 2. Clear State Before Starting

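Restart the kernel before making major changes to an analysis, so that stale variables from earlier runs cannot affect the results:

In [None]:
# Clear all variables without a confirmation prompt (careful!)
# %reset -f  # Uncomment to use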

# Or restart kernel from menu:
# Menu: Kernel -> Restart & Run All

### 3. Use Descriptive Variable Names

**GOOD:** Descriptive names

In [None]:
# GOOD - descriptive names
bulk_modulus_si = 185  # GPa
energy_per_atom = -4.63  # eV
lattice_constant_fcc = 3.61  # Å

print(f"Bulk modulus: {bulk_modulus_si} GPa")
print(f"Energy per atom: {energy_per_atom} eV")

**BAD:** Generic names

In [None]:
# BAD - generic names
x = 185
y = -4.63
z = 3.61

### 4. Document with Markdown

Use Markdown cells to explain your analysis. For example, a cell with this source introduces a calculation and records its method:

    ## Example: Silicon Bulk Modulus Analysis

    In this section, we calculate the bulk modulus of silicon.

    ### Method
    - DFT calculation with PBE functional
    - Convergence criterion: $10^{-6}$ eV
    - k-point mesh: 8x8x8

### 5. Save Outputs for Reports

Create publication-ready plots:

In [None]:
# Create publication-ready plot
plt.figure(figsize=(8, 5), dpi=300)

composition = [0, 0.25, 0.5, 0.75, 1.0]
energy = [-4.0, -4.2, -4.3, -4.2, -4.0]

plt.plot(composition, energy, 'o-', linewidth=2, markersize=8)
plt.xlabel('Composition (atomic fraction)')  # values run from 0 to 1, not percent
plt.ylabel('Energy (eV/atom)')
plt.title('Formation Energy of Alloy')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('formation_energy.png', dpi=300, bbox_inches='tight')
plt.show()

---

## Integration with Git

### Before Committing

1. **Restart and run all** to verify that the notebook runs from top to bottom
2. **Clear all outputs** (optional but recommended) so that diffs show code changes rather than regenerated output
3. **Check for large files** before committing
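
Clearing outputs can also be scripted. A notebook file in the current format (nbformat 4) is JSON with a top-level `cells` list, and each code cell stores its results under `outputs` and `execution_count`. The sketch below resets both fields using only the standard library; the helper names `clear_outputs` and `clear_outputs_in_file` are hypothetical. The command `jupyter nbconvert --clear-output --inplace notebook.ipynb` is a ready-made alternative.

```python
import json
from pathlib import Path


def clear_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed."""
    cleaned = json.loads(json.dumps(notebook))  # simple deep copy via JSON
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def clear_outputs_in_file(path):
    """Rewrite the notebook at `path` in place, without outputs."""
    path = Path(path)
    notebook = json.loads(path.read_text(encoding="utf-8"))
    text = json.dumps(clear_outputs(notebook), indent=1, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
```

Markdown cells and all metadata pass through unchanged, so the notebook still opens normally afterwards.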

### .gitignore for Jupyter Projects

Create a `.gitignore` file in your project root:
```
# Jupyter Notebook
.ipynb_checkpoints/
*.ipynb_checkpoints

# Python
__pycache__/
*.py[cod]
*$py.class

# Virtual environment
.venv/
venv/
```

### Version Control Strategy

- Keep notebooks small and focused
- Use Git LFS for large data files
- Commit after major changes
- Write meaningful commit messages
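
To decide which files belong in Git LFS, it helps to list the large ones before committing. The sketch below walks a project directory with `pathlib`; the helper name `find_large_files` and the 10 MB default threshold are arbitrary assumptions, so pick a limit that suits your repository.

```python
from pathlib import Path


def find_large_files(root, limit_mb=10.0):
    """Yield (path, size in MB) for files under `root` larger than `limit_mb`."""
    root = Path(root)
    limit_bytes = limit_mb * 1024 * 1024
    for path in root.rglob("*"):
        # Skip Git's internal object store and anything that is not a file
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size > limit_bytes:
            yield path, size / (1024 * 1024)


if __name__ == "__main__":
    for path, size_mb in find_large_files(".", limit_mb=10.0):
        print(f"{size_mb:8.1f} MB  {path}")
```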

---

## Markdown for Documentation

### Basic Syntax

- `# Heading 1`
- `## Heading 2`
- `### Heading 3`
- `**Bold**` and `*italic*`
- Unordered lists with `-`
- Ordered lists with `1.`
- `[Link](url)`
- `` `Inline code` ``
- Code blocks with triple backticks

In [None]:
# Markdown syntax is not rendered in a code cell; this line is only a Python comment
print("Run this cell to test Jupyter setup")

### Math in Markdown

Inline math: `$E = mc^2$`

Display math: `$$F = ma$$`

Matrix notation:
$$
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix}
$$

---

## Performance Tips

### 1. Use Vectorized Operations

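NumPy's vectorized operations run in compiled code, so they are usually much faster than an equivalent Python loop over the same data:

In [None]:
import time

# BAD: Using a Python loop
start = time.perf_counter()  # perf_counter has finer resolution than time.time
result_loop = []
for i in range(10000):
    result_loop.append(i**2)
time_loop = time.perf_counter() - start

# GOOD: NumPy vectorized
start = time.perf_counter()
result_vectorized = np.arange(10000) ** 2
time_vectorized = time.perf_counter() - start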

print(f"Loop time: {time_loop:.4f} seconds")
print(f"Vectorized time: {time_vectorized:.4f} seconds")
print(f"Speedup: {time_loop/time_vectorized:.1f}x")

### 2. Cache Expensive Computations

Store the results of expensive calculations so that re-running a cell does not recompute them:

In [None]:
# Check whether the result already exists in the kernel's namespace
if 'energies' not in locals():
    print("Computing energies...")
    energies = np.random.randn(1000)
else:
    print("Using cached energies")

print(f"Number of energies: {len(energies)}")

---

## Common Pitfalls

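1. **Cell execution order matters**: Cells run in the order you execute them, not the order they appear. Always use "Restart & Run All" before sharing
2. **Global state pollution**: Variables defined in earlier cells persist and may interfere with later ones
3. **Hidden state**: Variables from deleted or edited cells stay in memory until the kernel restarts. Restart the kernel when the analysis changes significantly
4. **Large outputs**: Clear cell outputs before committing to Git
5. **Forgotten imports**: Put all imports at the top of the notebook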
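
Out-of-order execution (pitfall 1) leaves a trace in the saved file, because each code cell records its `execution_count`. After a clean "Restart & Run All", the counts read 1, 2, 3, ... from top to bottom. The sketch below checks for that pattern; the helper name `executed_in_order` is hypothetical, and the rule of ignoring empty code cells is an assumption.

```python
import json


def executed_in_order(notebook):
    """Return True if non-empty code cells were run top to bottom as 1, 2, 3, ..."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a single string or a list of lines
        source = "".join(cell.get("source", ""))
        if source.strip():
            counts.append(cell.get("execution_count"))
    return counts == list(range(1, len(counts) + 1))


# Minimal in-memory example: the second code cell was run before the first
example = {
    "cells": [
        {"cell_type": "code", "source": ["a = 1"], "execution_count": 2},
        {"cell_type": "markdown", "source": ["Notes"]},
        {"cell_type": "code", "source": ["b = a + 1"], "execution_count": 1},
    ]
}
print(executed_in_order(example))  # prints: False
```

For a saved notebook, load the file with `json.load` and pass the resulting dict to the function.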

---

## Summary

By the end of this chapter, you should be able to:
- Create and organize Jupyter notebooks effectively
- Use Markdown for clear documentation
- Debug code using print statements and magic commands
- Follow best practices for notebook organization
- Integrate notebooks with Git workflow
- Avoid common pitfalls in Jupyter development