
Python Azure Functions CPU-bound Demo

Demonstrates how to:

  1. Use cProfile to identify a CPU-bound bottleneck in naive recursive Fibonacci computations.
  2. Accelerate independent heavy computations with ProcessPoolExecutor, which distributes the work across separate processes and so bypasses the GIL.

The sample intentionally keeps code simple and self-contained while illustrating a diagnostic-to-mitigation workflow:

Profile the serial baseline -> discover hot function (fib) -> mitigate with parallel process execution -> measure speedup.
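
For reference, the naive recursive implementation at the heart of the sample looks roughly like this (a sketch; the repository's exact code may differ):

def fib(n: int) -> int:
    # Intentionally naive: exponential-time recursion, so each call is clearly CPU-bound.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)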


Why This Matters

Python's Global Interpreter Lock (GIL) prevents true CPU-bound parallelism with threads. When you have multiple independent heavy computations, distributing them across separate processes allows concurrent execution on multiple cores.

cProfile tells us where time is spent; ProcessPoolExecutor changes how we schedule work to reduce total wall-clock time.


Endpoints

| Route | Purpose |
| --- | --- |
| /api/serial_profile_trigger | Serial execution of several Fibonacci tasks under cProfile; returns top cumulative-time stats. |
| /api/serial | Serial execution only (no profiling); baseline duration. |
| /api/parallel | Parallel execution via ProcessPoolExecutor; measures duration including process spawn overhead. |
| /api/compare | Runs both serial and parallel paths in one call; returns the speedup ratio. |

Default Fibonacci inputs: [34, 35, 36, 35] (tuned so each call is noticeably expensive but still finishes quickly on a modern machine). Override them with the nums query parameter, e.g. ?nums=33,34,35.
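
Inside a handler, parsing that parameter might look like this (a sketch; parse_nums and DEFAULT_NUMS are illustrative names, not necessarily the sample's):

import azure.functions as func

DEFAULT_NUMS = [34, 35, 36, 35]

def parse_nums(req: func.HttpRequest) -> list[int]:
    # Fall back to the defaults when no ?nums=... query parameter is supplied.
    raw = req.params.get("nums")
    if not raw:
        return DEFAULT_NUMS
    return [int(x) for x in raw.split(",")]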


How cProfile Is Used

serial_profile_trigger wraps the entire serial loop in a cProfile.Profile() context:

profiler = cProfile.Profile()
profiler.enable()
# serial work
profiler.disable()

We then sort by cumulative time to surface the deepest hotspots:

pstats.Stats(profiler).sort_stats("cumtime").print_stats(15)

You should observe the majority of cumulative time in fib due to its naive recursion.
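
Put together, the endpoint body is roughly the following (a sketch using a hypothetical run_serial_with_profile helper; the sample returns the captured lines in its JSON response):

import cProfile
import io
import pstats

def run_serial_with_profile(nums: list[int]):
    profiler = cProfile.Profile()
    profiler.enable()
    results = [{"n": n, "fib": fib(n)} for n in nums]  # the serial work
    profiler.disable()
    # Print the report into a string buffer rather than stdout so it can go into a JSON response.
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumtime").print_stats(15)
    return results, buf.getvalue().splitlines()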

When Not to Use cProfile

  • Ultra-short (sub-millisecond) functions: results dominated by profiler overhead.
  • Highly IO-bound code: use asyncio or specialized tracing tools.

How ProcessPoolExecutor Helps

After profiling reveals repeated independent Fibonacci calls, we parallelize them:

from concurrent.futures import ProcessPoolExecutor

# fib must be a module-level function so worker processes can import it.
with ProcessPoolExecutor() as executor:
    values = list(executor.map(fib, nums))

Each process runs its own Python interpreter; the GIL is not shared, so true parallel CPU execution occurs. Good when:

  • Tasks are CPU-bound and independent.
  • Result serialization cost (pickling) is small compared to compute time.

Not ideal when:

  • Tasks are extremely fast (overhead outweighs gain).
  • Shared mutable state required (design for message passing instead).
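
The /api/compare path boils down to timing both variants over the same inputs (a sketch; compare_runs is an illustrative name):

import time
from concurrent.futures import ProcessPoolExecutor

def compare_runs(nums: list[int]) -> dict:
    start = time.perf_counter()
    serial = [fib(n) for n in nums]
    serial_sec = time.perf_counter() - start

    start = time.perf_counter()
    # Includes process spawn overhead, which is why a warm-up run gives steadier numbers.
    with ProcessPoolExecutor() as executor:
        parallel = list(executor.map(fib, nums))
    parallel_sec = time.perf_counter() - start

    assert serial == parallel  # same answers either way
    return {
        "serial_sec": round(serial_sec, 2),
        "parallel_sec": round(parallel_sec, 2),
        "speedup": round(serial_sec / parallel_sec, 2),
    }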

Running the Sample Locally

Prerequisites:

  • Python (recommended 3.11 or 3.10)
  • Azure Functions Core Tools v4

Create local.settings.json:

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "UseDevelopmentStorage=true"
  }
}

Install dependencies:

pip install -r requirements.txt

Start the Functions host:

func start

Invoke endpoints (PowerShell examples):

# Serial profile (profiling + results)
Invoke-WebRequest http://localhost:7071/api/serial_profile_trigger | Select-Object -ExpandProperty Content

# Serial only
Invoke-WebRequest http://localhost:7071/api/serial | Select-Object -ExpandProperty Content

# Parallel only
Invoke-WebRequest http://localhost:7071/api/parallel | Select-Object -ExpandProperty Content

# Compare serial vs parallel (includes speedup)
Invoke-WebRequest http://localhost:7071/api/compare | Select-Object -ExpandProperty Content

With curl:

curl http://localhost:7071/api/serial_profile_trigger
curl http://localhost:7071/api/compare

Expected Output (Profiling + Results)

{
  "mode": "serial_profile",
  "nums": [
    34,
    35,
    36,
    35
  ],
  "results": [
    {
      "n": 34,
      "fib": 5702887
    },
    {
      "n": 35,
      "fib": 9227465
    },
    {
      "n": 36,
      "fib": 14930352
    },
    {
      "n": 35,
      "fib": 9227465
    }
  ],
  "started_utc": "2025-10-29T00:14:52.060666Z",
  "ended_utc": "2025-10-29T00:15:24.062489Z",
  "duration_seconds": 32.0018,
  "profiling_top": [
    "         126494861 function calls (2126 primitive calls) in 32.002 seconds",
    "",
    "   Ordered by: cumulative time",
    "   List reduced from 46 to 15 due to restriction <15>",
    "",
    "   ncalls  tottime  percall  cumtime  percall filename:lineno(function)",
    "      256    0.002    0.000    0.013    0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:327(wait)",
    "      255    0.001    0.000    0.001    0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_channel.py:954(_response_ready)",
    "      256    0.001    0.000    0.001    0.000 {method '_acquire_restore' of '_thread.RLock' objects}",
    "      256    0.000    0.000    0.000    0.000 {method '_release_save' of '_thread.RLock' objects}",
    "      256    0.000    0.000    0.000    0.000 {method 'remove' of 'collections.deque' objects}",
    "      257    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}",
    "        1    0.000    0.000    0.000    0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_common.py:121(wait)",
    "    256/2    0.001    0.000    0.000    0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_common.py:111(_wait_once)",
    "      258    0.000    0.000    0.000    0.000 {method '_is_owned' of '_thread.RLock' objects}",
    "      257    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}",
    "        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}",
    "        1    0.000    0.000    0.000    0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_channel.py:238(handle_event)",
    "        1    0.000    0.000    0.000    0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\asyncio\\futures.py:406(wrap_future)",
    "        2    0.000    0.000    0.000    0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:428(notify_all)",
    "        2    0.000    0.000    0.000    0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:398(notify)",
    "",
    ""
  ]
}

Expected Output (Compare Example)

{
  "input_nums": [34, 35, 36, 35],
  "serial": {"duration_sec": 5.42, "results": [{"n":34,"fib":5702887}, ...]},
  "parallel": {"duration_sec": 2.11, "results": [{"n":34,"fib":5702887}, ...]},
  "speedup": 2.57,
  "note": "First parallel call may include process spawn overhead."
}

(Times will vary by CPU and current system load.)


Tuning & Experimentation

| Adjustment | Effect |
| --- | --- |
| Increase numbers (e.g. 37, 38) | More CPU time; clearer parallel benefit, but risk of longer cold starts/timeouts. |
| Fewer numbers | Less aggregate work; parallel overhead may dominate. |
| Reorder numbers | No practical effect (all tasks are independent). |
| Memoize fib | Eliminates cost after the first call; reduces the demo's usefulness. |

Warm-Up Tip

First parallel call includes process creation. Run /api/parallel once before benchmarking for a steadier comparison.


Limits & Production Considerations

  • Recursive Fibonacci is pedagogical only. Replace it with a computation you actually need, or optimize the algorithm (iterative, memoization, fast doubling).
  • Process pools are created per request here for clarity. In production, consider a long-lived, reusable process pool to amortize spawn cost (see the sketch after this list).
  • Avoid very large inputs that may exceed Function time limits or memory constraints.
  • Profiling overhead is non-trivial; enable the profiling endpoints only in lower environments or gated scenarios.
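
A minimal sketch of such a lazily created, module-level pool (get_pool is a hypothetical helper, not the sample's code):

import atexit
from concurrent.futures import ProcessPoolExecutor

_pool: ProcessPoolExecutor | None = None  # created on first use, reused across requests

def get_pool() -> ProcessPoolExecutor:
    global _pool
    if _pool is None:
        _pool = ProcessPoolExecutor()
        # Shut the pool down cleanly when the worker process exits.
        atexit.register(_pool.shutdown)
    return _pool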

Extension Ideas

  1. Add an endpoint using functools.lru_cache to show algorithmic speedup vs parallelism (see the sketch after this list).
  2. Add simple tracing of per-task durations.
  3. Implement a reusable, module-level process pool with lazy initialization.
  4. Provide an endpoint to dump raw profiling stats as downloadable text.
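
For idea 1, a memoized variant might look like this (a sketch; fib_cached is an illustrative name):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    # Same recursion, but each n is computed at most once, turning exponential time into linear.
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)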

Troubleshooting

| Symptom | Possible Cause | Fix |
| --- | --- | --- |
| No logs appear | Host log level filters out INFO | Add a logging section to host.json (see the snippet below), or log at WARNING temporarily. |
| Parallel slower than serial | Too few tasks / small n | Increase the n values moderately (34–36). |
| Timeout errors | Inputs too large | Reduce the largest n or split the calls. |
| Import error for azure.functions | Missing package | pip install azure-functions |
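
A minimal logging section for host.json might look like this (a sketch; tune the level to your needs):

{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "default": "Information"
    }
  }
}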

Closing Notes

This sample focuses on clarity over micro-optimizations. It illustrates the journey from measurement (profiling) to improvement (parallelization) for CPU-bound Python workloads in Azure Functions.

Enjoy experimenting and adapting to your own workload!
