# Amorsize: Quick Start Notebook

This notebook provides a simple template for using **Amorsize** to optimize your parallel processing.

## What is Amorsize?

Amorsize automatically analyzes your function and data to determine:
- **Should you use parallel processing?** (or stick with serial execution)
- **How many workers** (`n_jobs`) to use
- **What chunk size** to use for optimal performance

It helps you avoid "negative scaling", where the overhead of parallelization makes your code slower than plain serial execution.

## Installation

```bash
pip install -e /path/to/amorsize
```

Or, if the package is available from your package index:
```bash
pip install amorsize
```

---
## Step 1: Import Amorsize

Just one import is all you need!

In [None]:
from amorsize import optimize

---
## Step 2: Define Your Function

**👉 PLACE YOUR FUNCTION HERE**

Replace the example function below with your own function. Your function should:
- Accept a single argument (one data item)
- Return a result
- Be defined at module level (not inside another function)

In [None]:
# ============================================================================
# 👉 PLACE YOUR FUNCTION HERE
# ============================================================================

def my_function(data_item):
    """
    Replace this with your own function.
    
    This example calculates a simple mathematical result.
    Your function can do anything: data processing, API calls,
    computations, transformations, etc.
    """
    # Example: Some computation
    result = 0
    for i in range(1000):
        result += data_item ** 2 + i
    return result

# ============================================================================

---
## Step 3: Prepare Your Dataset

**👉 PLACE YOUR DATASET HERE**

Replace the example data below with your own dataset. Your data can be:
- A list
- A range
- A generator
- Any iterable

In [None]:
# ============================================================================
# 👉 PLACE YOUR DATASET HERE
# ============================================================================

# Example: A list of numbers
my_data = list(range(1000))

# Other examples:
# my_data = [1, 2, 3, 4, 5, ...]
# my_data = pd.read_csv('file.csv')['column'].tolist()  # requires: import pandas as pd
# my_data = range(10000)
# my_data = (x for x in range(1000))  # generator

# ============================================================================

print(f"Dataset size: {len(my_data) if hasattr(my_data, '__len__') else 'Unknown (generator)'}")
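
A generator can only be iterated once, so any items pulled off for inspection are gone unless they are put back. Whether Amorsize handles this internally is not shown in this notebook. The sketch below is a generic standard-library pattern, and `peek_items` is a hypothetical helper name, not part of Amorsize.

```python
from itertools import chain, islice


def peek_items(iterable, n):
    """Return (first_n_items, iterator_over_all_items).

    Hypothetical helper: not part of Amorsize.
    """
    iterator = iter(iterable)
    head = list(islice(iterator, n))  # consumes up to n items
    # Chain the consumed items back in front of the remaining ones
    return head, chain(head, iterator)


gen = (x * x for x in range(6))
sample, restored = peek_items(gen, 3)
print(sample)          # [0, 1, 4]
print(list(restored))  # [0, 1, 4, 9, 16, 25]
```

If you inspect a generator yourself before processing it, continue with the restored iterator rather than the original generator.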

---
## Step 4: Run Amorsize Analysis

This is where the magic happens! Amorsize will:
1. Test your function on a small sample
2. Measure execution time and memory usage
3. Calculate optimal parallelization parameters
4. Tell you whether to use parallel or serial execution
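
To give a feel for steps 1 and 2, here is a minimal sketch of timing a function on a small sample. It is an illustration only: `time_sample` is a hypothetical helper, it ignores memory measurement, and Amorsize's own measurement may work differently.

```python
import time
from itertools import islice


def time_sample(func, data, sample_size=5):
    """Average wall-clock seconds per call over the first few items.

    Hypothetical helper, not Amorsize's implementation.
    """
    sample = list(islice(iter(data), sample_size))
    if not sample:
        raise ValueError("data is empty")
    start = time.perf_counter()
    for item in sample:
        func(item)
    elapsed = time.perf_counter() - start
    return elapsed / len(sample)


def square_sum(n):
    return sum(i * i for i in range(n))


avg = time_sample(square_sum, [10_000] * 20)
print(f"Average time per item: {avg * 1000:.3f} ms")
```

The average time per item is the key input for everything that follows: it determines whether the work is large enough to outweigh the cost of starting worker processes.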

In [None]:
# Analyze and get recommendations
result = optimize(my_function, my_data, verbose=True)

print("\n" + "="*70)
print("AMORSIZE RECOMMENDATIONS")
print("="*70)
print(result)

---
## Step 5: View Detailed Results

Access the optimization results:

In [None]:
print(f"Recommended workers (n_jobs): {result.n_jobs}")
print(f"Recommended chunksize: {result.chunksize}")
print(f"Estimated speedup: {result.estimated_speedup:.2f}x")
print(f"\nReason: {result.reason}")

if result.warnings:
    print("\nWarnings:")
    for warning in result.warnings:
        print(f"  - {warning}")

---
## Step 6: Apply the Recommendations

Now use the recommendations with Python's multiprocessing!

In [None]:
from multiprocessing import Pool
import time

# Apply Amorsize recommendations
if result.n_jobs > 1:
    print(f"Using PARALLEL execution with {result.n_jobs} workers...")

    # perf_counter is a monotonic clock intended for measuring elapsed time
    start_time = time.perf_counter()
    with Pool(processes=result.n_jobs) as pool:
        results = pool.map(my_function, my_data, chunksize=result.chunksize)
    elapsed_time = time.perf_counter() - start_time

    print(f"✓ Completed in {elapsed_time:.2f} seconds")
    print(f"  Processed {len(results)} items")
    print(f"  Average time per item: {elapsed_time/len(results)*1000:.2f}ms")

else:
    print("Using SERIAL execution (parallel not beneficial)...")

    start_time = time.perf_counter()
    results = [my_function(item) for item in my_data]
    elapsed_time = time.perf_counter() - start_time

    print(f"✓ Completed in {elapsed_time:.2f} seconds")
    print(f"  Processed {len(results)} items")
    print(f"  Average time per item: {elapsed_time/len(results)*1000:.2f}ms")

---
## 📊 Quick Summary

View a summary of your results:

In [None]:
print("\n" + "="*70)
print("EXECUTION SUMMARY")
print("="*70)
print(f"Function: {my_function.__name__}")
print(f"Dataset size: {len(results)} items")
print(f"Execution mode: {'PARALLEL' if result.n_jobs > 1 else 'SERIAL'}")
if result.n_jobs > 1:
    print(f"Workers used: {result.n_jobs}")
    print(f"Chunksize: {result.chunksize}")
print(f"Total time: {elapsed_time:.2f} seconds")
print(f"Estimated speedup: {result.estimated_speedup:.2f}x")
print("="*70)

---
---

# 💡 Example Use Cases

Below are some example scenarios where Amorsize is helpful:

## Example 1: Image Processing

In [None]:
# Example function for image processing
def process_image(image_path):
    """
    Simulate image processing.
    In real use, you might resize, apply filters, etc.
    """
    # Simulate expensive operation
    import time
    time.sleep(0.01)  # Simulate processing time
    return f"processed_{image_path}"

# Sample data
image_paths = [f"image_{i}.jpg" for i in range(100)]

# Optimize
result = optimize(process_image, image_paths)
print(f"Recommendation: {result.n_jobs} workers, chunksize={result.chunksize}")

## Example 2: Data Transformation

In [None]:
# Example function for data transformation
def transform_data(record):
    """
    Transform a data record.
    """
    # Simulate complex transformation
    result = {
        'id': record['id'],
        'value': record['value'] ** 2,
        'category': 'processed'
    }
    return result

# Sample data
records = [{'id': i, 'value': i * 10} for i in range(1000)]

# Optimize
result = optimize(transform_data, records)
print(f"Recommendation: {result.n_jobs} workers, chunksize={result.chunksize}")

## Example 3: Mathematical Computation

In [None]:
# Example function for mathematical computation
def compute_statistics(numbers):
    """
    Compute statistics on a list of numbers.
    """
    import math
    mean = sum(numbers) / len(numbers)
    variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)
    std_dev = math.sqrt(variance)
    return {'mean': mean, 'std': std_dev}

# Sample data - list of lists
datasets = [[i + j for j in range(100)] for i in range(500)]

# Optimize
result = optimize(compute_statistics, datasets)
print(f"Recommendation: {result.n_jobs} workers, chunksize={result.chunksize}")

---
---

# 📚 Advanced Options

Amorsize provides additional options for fine-tuning:

## Custom Sample Size

By default, Amorsize tests your function on 5 items. You can change this:

In [None]:
# Test with 10 items instead of 5
sample_size = 10
result = optimize(my_function, my_data, sample_size=sample_size)
print(f"Tested with {sample_size} sample items")

## Custom Chunk Duration Target

By default, Amorsize targets 0.2 seconds per chunk. You can adjust this:

In [None]:
# Target 0.5 seconds per chunk
result = optimize(my_function, my_data, target_chunk_duration=0.5)
print(f"Chunksize optimized for 0.5s per chunk: {result.chunksize}")
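
To see how a per-chunk duration target can translate into a chunk size, consider the simplified relationship below: divide the target duration by the average time per item. This is an assumption about the general idea, not Amorsize's actual formula, and `chunksize_for_target` is a hypothetical helper.

```python
import math


def chunksize_for_target(avg_item_seconds, target_chunk_duration=0.2):
    """Number of items whose combined runtime roughly fills the target duration.

    Hypothetical helper, not Amorsize's implementation.
    """
    if avg_item_seconds <= 0:
        raise ValueError("avg_item_seconds must be positive")
    # Never return less than one item per chunk
    return max(1, math.ceil(target_chunk_duration / avg_item_seconds))


# Fast items are grouped into large chunks
print(chunksize_for_target(0.001))
# A longer target duration gives even larger chunks
print(chunksize_for_target(0.001, target_chunk_duration=0.5))
# Items slower than the target still get a chunk size of 1
print(chunksize_for_target(1.0))
```

Larger chunks mean fewer round trips between the parent process and the workers, at the cost of coarser load balancing.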

## Verbose Output

Get detailed information about the optimization process:

In [None]:
# Enable verbose output
result = optimize(my_function, my_data, verbose=True)

---
---

# ⚠️ Important Notes

## Function Requirements

Your function must be:
1. **Picklable** - Can be serialized by Python's pickle module
2. **Module-level** - Defined at the top level, not inside another function
3. **Single-argument** - Takes one data item at a time
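
The first two requirements are related: `pickle` serializes a function by reference to its importable name, so lambdas and nested functions usually fail. The sketch below checks this with the standard library only. `is_picklable` is a hypothetical helper, not part of Amorsize.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local_function():
    def local_function(x):  # defined inside another function
        return x + 1
    return local_function


print(is_picklable(math.sqrt))              # importable function: True
print(is_picklable(lambda x: x + 1))        # lambda: False
print(is_picklable(make_local_function()))  # nested function: False
```

Passing this check is necessary but may not be sufficient. On platforms where `multiprocessing` starts workers with the `spawn` method, a function defined in a notebook cell can pickle successfully and still fail to load in the worker, because the worker cannot import it. Moving the function into a `.py` module and importing it is a common workaround.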

## When Amorsize Recommends Serial Execution

Amorsize will recommend serial execution (`n_jobs=1`) when:
- Your function is very fast (< 1ms per item)
- Your dataset is very small
- The overhead of parallelization outweighs the benefits
- Your function cannot be pickled

This is a **good thing**! It prevents your code from running slower with parallelization.
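
The toy cost model below illustrates why serial execution can win. It is not Amorsize's model: `estimate_speedup` is a hypothetical function, and the overhead constants are made-up values chosen only to show the effect.

```python
def estimate_speedup(n_items, item_seconds, n_workers,
                     startup_seconds_per_worker=0.05,
                     ipc_seconds_per_item=0.0001):
    """Toy estimate of parallel speedup; all overhead constants are assumptions."""
    serial = n_items * item_seconds
    parallel = (
        startup_seconds_per_worker * n_workers  # cost of starting workers
        + n_items * ipc_seconds_per_item        # cost of sending items and results
        + serial / n_workers                    # the work itself, divided up
    )
    return serial / parallel


# Fast function, small dataset: overhead dominates, so the speedup is below 1
print(f"{estimate_speedup(100, 0.0001, 4):.2f}x")
# Slow function, larger dataset: the work dominates, so parallelism pays off
print(f"{estimate_speedup(10_000, 0.01, 4):.2f}x")
```

A speedup below 1.0 is exactly the "negative scaling" case: the fixed costs of starting workers and moving data exceed the time saved by dividing the work.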

## Performance Tips

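1. **Make functions expensive enough**: If each call finishes in well under a millisecond, starting workers and transferring data can cost more than the work itself
2. **Use appropriate data sizes**: Very small datasets (< 100 items) rarely benefit from parallelization
3. **Avoid global state**: Each worker process has its own memory space, so changes a worker makes to global variables are not visible to the parent process or to other workers
4. **Profile first**: Run Amorsize on representative data before committing to parallelization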

---

# 🎉 That's It!

You now know how to use Amorsize to optimize your parallel processing.

**Remember the simple workflow:**
1. Define your function
2. Prepare your data
3. Run `optimize(function, data)`
4. Apply the recommendations with `multiprocessing.Pool`

Happy parallel processing! 🚀