Demonstrates how to:
- Use `cProfile` to identify a CPU-bound bottleneck in naive recursive Fibonacci computations.
- Accelerate independent heavy computations with `ProcessPoolExecutor`, leveraging multiple processes (bypassing the GIL) to achieve a speedup.
The sample intentionally keeps the code simple and self-contained while illustrating a diagnostic-to-mitigation workflow:
Profile the serial baseline -> discover the hot function (`fib`) -> mitigate with parallel process execution -> measure the speedup.
Python's Global Interpreter Lock (GIL) prevents true CPU-bound parallelism with threads. When you have multiple independent heavy computations, distributing them across separate processes allows concurrent execution on multiple cores.
cProfile tells us where time is spent; ProcessPoolExecutor changes how we schedule work to reduce total wall-clock time.
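The hot function at the center of this workflow is a naive recursive Fibonacci. A minimal sketch of its shape (the sample's actual implementation may differ in details):

```python
def fib(n: int) -> int:
    # Intentionally naive: exponential time, so each call is CPU-heavy and easy to spot in a profile.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```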
| Route | Purpose |
|---|---|
| `/api/serial_profile_trigger` | Serial execution of several Fibonacci tasks under cProfile; returns top cumulative-time stats. |
| `/api/serial` | Serial execution only (no profiling). Baseline duration. |
| `/api/parallel` | Parallel execution via ProcessPoolExecutor. Measures duration including process spawn overhead. |
| `/api/compare` | Runs both serial and parallel paths in one call; returns the speedup ratio. |
Default Fibonacci inputs: `[34, 35, 36, 35]` (tuned so each call is noticeably expensive but still finishes quickly on a modern machine). Override with the query parameter `?nums=33,34,35`, etc.
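A minimal sketch of how the `nums` override could be parsed inside an HTTP trigger (the helper name and fallback behavior here are illustrative, not necessarily the sample's exact code):

```python
import azure.functions as func

DEFAULT_NUMS = [34, 35, 36, 35]

def parse_nums(req: func.HttpRequest) -> list[int]:
    """Parse the optional ?nums=33,34,35 query parameter, falling back to the defaults."""
    raw = req.params.get("nums")  # query parameters are exposed via HttpRequest.params
    if not raw:
        return DEFAULT_NUMS
    try:
        return [int(part) for part in raw.split(",") if part.strip()]
    except ValueError:
        return DEFAULT_NUMS
```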
`serial_profile_trigger` wraps the entire serial loop in a `cProfile.Profile()` context:

```python
profiler = cProfile.Profile()
profiler.enable()
# serial work
profiler.disable()
```

We then sort by cumulative time to surface the deepest hotspots:

```python
pstats.Stats(profiler).sort_stats("cumtime").print_stats(15)
```

You should observe the majority of cumulative time in `fib` due to its naive recursion.
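To return the stats in the HTTP response (the `profiling_top` field shown later), the report can be redirected to an in-memory stream rather than stdout. A self-contained sketch, assuming that approach:

```python
import cProfile
import io
import pstats

def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

nums = [34, 35, 36, 35]

profiler = cProfile.Profile()
profiler.enable()
results = [fib(n) for n in nums]  # the serial work being profiled
profiler.disable()

# Redirect the pstats report into a string (instead of stdout) so it can be embedded in a JSON response.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumtime").print_stats(15)
profiling_top = buffer.getvalue().splitlines()
```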
cProfile is less helpful for:
- Ultra-short (sub-millisecond) functions: results are dominated by profiler overhead.
- Highly IO-bound code: use `asyncio` or specialized tracing tools instead.
After profiling reveals repeated independent Fibonacci calls, we parallelize them (a minimal timing sketch follows the lists below):

```python
with ProcessPoolExecutor() as executor:
    values = list(executor.map(fib, nums))
```

Each process runs its own Python interpreter; the GIL is not shared, so true parallel CPU execution occurs. Good when:
- Tasks are CPU-bound and independent.
- Result serialization cost (pickling) is small compared to compute time.
Not ideal when:
- Tasks are extremely fast (overhead outweighs gain).
- Shared mutable state required (design for message passing instead).
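Putting the two paths together, a minimal, self-contained sketch of a serial-vs-parallel comparison (simplified relative to the actual `/api/compare` endpoint; `fib` is repeated so the snippet runs standalone):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def compare(nums: list[int]) -> dict:
    # Serial baseline: each fib call runs one after another on a single core.
    start = time.perf_counter()
    serial_results = [fib(n) for n in nums]
    serial_sec = time.perf_counter() - start

    # Parallel: each fib call runs in its own process, so the GIL is not a constraint.
    start = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        parallel_results = list(executor.map(fib, nums))
    parallel_sec = time.perf_counter() - start

    return {
        "serial_sec": round(serial_sec, 2),
        "parallel_sec": round(parallel_sec, 2),
        "speedup": round(serial_sec / parallel_sec, 2),
    }

if __name__ == "__main__":  # guard is required because ProcessPoolExecutor spawns new processes
    print(compare([34, 35, 36, 35]))
```

Run it as a script; the `__main__` guard matters on platforms (Windows, macOS) that spawn rather than fork worker processes.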
Prerequisites:
- Python (3.10 or 3.11 recommended)
- Azure Functions Core Tools v4

Create `local.settings.json`:

```json
{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsStorage": "UseDevelopmentStorage=true"
  }
}
```
Install dependencies:

```bash
pip install -r requirements.txt
```

Start the Functions host:

```bash
func start
```

Invoke endpoints (PowerShell examples):
```powershell
# Serial profile (profiling + results)
Invoke-WebRequest http://localhost:7071/api/serial_profile_trigger | Select-Object -ExpandProperty Content

# Serial only
Invoke-WebRequest http://localhost:7071/api/serial | Select-Object -ExpandProperty Content

# Parallel only
Invoke-WebRequest http://localhost:7071/api/parallel | Select-Object -ExpandProperty Content

# Compare serial vs parallel (includes speedup)
Invoke-WebRequest http://localhost:7071/api/compare | Select-Object -ExpandProperty Content
```

With curl:
```bash
curl http://localhost:7071/api/serial_profile_trigger
curl http://localhost:7071/api/compare
```

Example response from `/api/serial_profile_trigger`:

```json
{
"mode": "serial_profile",
"nums": [
34,
35,
36,
35
],
"results": [
{
"n": 34,
"fib": 5702887
},
{
"n": 35,
"fib": 9227465
},
{
"n": 36,
"fib": 14930352
},
{
"n": 35,
"fib": 9227465
}
],
"started_utc": "2025-10-29T00:14:52.060666Z",
"ended_utc": "2025-10-29T00:15:24.062489Z",
"duration_seconds": 32.0018,
"profiling_top": [
" 126494861 function calls (2126 primitive calls) in 32.002 seconds",
"",
" Ordered by: cumulative time",
" List reduced from 46 to 15 due to restriction <15>",
"",
" ncalls tottime percall cumtime percall filename:lineno(function)",
" 256 0.002 0.000 0.013 0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:327(wait)",
" 255 0.001 0.000 0.001 0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_channel.py:954(_response_ready)",
" 256 0.001 0.000 0.001 0.000 {method '_acquire_restore' of '_thread.RLock' objects}",
" 256 0.000 0.000 0.000 0.000 {method '_release_save' of '_thread.RLock' objects}",
" 256 0.000 0.000 0.000 0.000 {method 'remove' of 'collections.deque' objects}",
" 257 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}",
" 1 0.000 0.000 0.000 0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_common.py:121(wait)",
" 256/2 0.001 0.000 0.000 0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_common.py:111(_wait_once)",
" 258 0.000 0.000 0.000 0.000 {method '_is_owned' of '_thread.RLock' objects}",
" 257 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects}",
" 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}",
" 1 0.000 0.000 0.000 0.000 C:\\Program Files\\Microsoft\\Azure Functions Core Tools\\workers\\python\\3.13\\WINDOWS\\X64\\grpc\\_channel.py:238(handle_event)",
" 1 0.000 0.000 0.000 0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\asyncio\\futures.py:406(wrap_future)",
" 2 0.000 0.000 0.000 0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:428(notify_all)",
" 2 0.000 0.000 0.000 0.000 C:\\Users\\tsushi\\AppData\\Local\\Programs\\Python\\Python313\\Lib\\threading.py:398(notify)",
"",
""
]
}
```

Example response from `/api/compare`:

```json
{
"input_nums": [34, 35, 36, 35],
"serial": {"duration_sec": 5.42, "results": [{"n":34,"fib":5702887}, ...]},
"parallel": {"duration_sec": 2.11, "results": [{"n":34,"fib":5702887}, ...]},
"speedup": 2.57,
"note": "First parallel call may include process spawn overhead."
}
```

(Times will vary by CPU and current system load.)
| Adjustment | Effect |
|---|---|
| Increase numbers (e.g. 37,38) | More CPU time; clearer parallel benefit but risk longer cold start/timeouts. |
| Fewer numbers | Less aggregate work; parallel overhead may dominate. |
| Reorder numbers | No practical effect (all independent). |
| Memoize fib | Eliminates cost after the first call; reduces the usefulness of the demo (see the memoization sketch below). |
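For reference, a hedged sketch of the memoization variant mentioned in the table and in the future improvements below (not part of the current sample):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    """Memoized Fibonacci: roughly linear work once the cache warms, versus exponential for the naive version."""
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
```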
The first parallel call includes process creation overhead. Run `/api/parallel` once before benchmarking for a steadier comparison.
- Recursive Fibonacci is pedagogical only. Replace it with a computation you actually need, or optimize it (iterative, memoization, fast doubling).
- Process pools are created per request here for clarity. In production, consider a reusable process pool (long-lived) to amortize spawn cost.
- Avoid very large inputs that may exceed Function time limits or memory constraints.
- Profiling overhead is non-trivial; use profiling endpoints only in lower environments or gated scenarios.
- Add an endpoint using `functools.lru_cache` to show algorithmic speedup vs parallelism.
- Add simple tracing of per-task durations.
- Implement a reusable, module-level process pool with lazy initialization (a sketch follows this list).
- Provide an endpoint to dump raw profiling stats as downloadable text.
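A hedged sketch of what a lazily initialized, module-level pool could look like (names are illustrative and not part of the current sample):

```python
from concurrent.futures import ProcessPoolExecutor
from threading import Lock

_pool: ProcessPoolExecutor | None = None
_pool_lock = Lock()

def get_pool() -> ProcessPoolExecutor:
    """Create the process pool on first use and reuse it across invocations to amortize spawn cost."""
    global _pool
    if _pool is None:
        with _pool_lock:  # guard against concurrent first calls on the same worker
            if _pool is None:
                _pool = ProcessPoolExecutor()
    return _pool
```

Reusing the pool amortizes process spawn cost across requests, though each Functions worker process still maintains its own pool.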
| Symptom | Possible Cause | Fix |
|---|---|---|
| No logs appear | Host log level filters INFO | Add a logging section to host.json or raise log calls to WARNING temporarily. |
| Parallel slower than serial | Too few tasks / small n | Increase n values moderately (34–36). |
| Timeout errors | Inputs too large | Reduce the largest n or split calls. |
| Import error for azure.functions | Missing package | `pip install azure-functions` |
This sample focuses on clarity over micro-optimizations. It illustrates the journey from measurement (profiling) to improvement (parallelization) for CPU-bound Python workloads in Azure Functions.
Enjoy experimenting and adapting it to your own workload!