# π Estimation with Monte Carlo methods
We demonstrate how to run Monte Carlo simulations with Lithops over IBM Cloud Functions. This notebook contains an example of estimating the number π with a Monte Carlo method. The goal of this notebook is to demonstrate how IBM Cloud Functions can benefit Monte Carlo simulations, not to explain in detail how this is done with Lithops.<br>
A Monte Carlo algorithm randomly places points in a square and uses the fraction of those points that fall inside the circle to estimate the value of π: the fraction approaches π/4, the ratio of the two areas.
![pi](https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif)
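
The idea can be tried locally before involving any cloud backend. The minimal sketch below assumes points drawn uniformly from the unit square and counts those that fall inside the quarter circle of radius 1; since the area ratio is π/4, multiplying the fraction by 4 gives the estimate. The function name `estimate_pi` and the fixed seed are illustrative choices, not part of the notebook.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi from num_points uniform samples in the unit square."""
    rng = random.Random(seed)  # private generator so runs are reproducible
    in_circle = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 <= 1
        if x * x + y * y <= 1.0:
            in_circle += 1
    # Quarter-circle area / square area = pi / 4
    return 4.0 * in_circle / num_points


if __name__ == "__main__":
    print(estimate_pi(200_000, seed=0))
```
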
Requirements to run this notebook:

* AWS Cloud or GCP account. 
* You will need to have at least one existing object storage bucket. 

# Step 1 - Install dependencies
Install Lithops if it is missing and import the modules used in this notebook.

In [30]:
from time import time
from random import random
import logging
import sys

try:
    import lithops
except ImportError:
    %pip install -r requirements.txt
    import lithops

# you can modify logging level if needed
#logging.basicConfig(level=logging.INFO)

# Step 2 - Write Python code that implements Monte Carlo simulation 
Below is an example of Python code that demonstrates a Monte Carlo model for estimating π.

'EstimatePI' is a Python class that we use to represent a single PI estimation. You may configure the following parameters:

MAP_INSTANCES - number of cloud function invocations. Set to 1,000 in the code below<br>
randomize_per_map - number of random points generated in a single invocation. Default is 10,000,000

Our code contains two major Python methods:

def randomize_points(self, data) - generates random points and returns the fraction of points
    that fall inside the circle<br>
def process_in_circle_points(self, results) - summarizes the results of all randomize_points
  executions (aka "reduce" in map-reduce paradigm)

In [31]:
MAP_INSTANCES = 1000


class EstimatePI:
    randomize_per_map = 10000000

    def __init__(self):
        self.total_randomize_points = MAP_INSTANCES * self.randomize_per_map

    def __str__(self):
        return "Total Randomize Points: {:,}".format(self.randomize_per_map * MAP_INSTANCES)

    @staticmethod
    def predicate():
        x = random()
        y = random()
        return (x ** 2) + (y ** 2) <= 1

    def randomize_points(self, data):
        in_circle = 0
        for _ in range(self.randomize_per_map):
            in_circle += self.predicate()
        return float(in_circle / self.randomize_per_map)

    def process_in_circle_points(self, results):
        in_circle_percent = 0
        for map_result in results:
            in_circle_percent += map_result
        estimate_PI = float(4 * (in_circle_percent / MAP_INSTANCES))
        return estimate_PI
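
The reduce step above averages the per-invocation fractions and multiplies the mean by 4. As a quick local check of that logic, the sketch below runs a scaled-down version of the same map and reduce steps in plain Python instead of Lithops. The names `map_fraction` and `reduce_fractions`, the small sample sizes, and the per-map seeds are assumptions made for illustration.

```python
import random


def map_fraction(seed, points_per_map=20_000):
    """Map step: fraction of random points that land inside the circle."""
    rng = random.Random(seed)
    hits = sum(
        rng.random() ** 2 + rng.random() ** 2 <= 1
        for _ in range(points_per_map)
    )
    return hits / points_per_map


def reduce_fractions(fractions):
    """Reduce step: average the fractions and scale by 4 to estimate pi."""
    return 4 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Each map gets its own seed so no two maps repeat the same samples
    fractions = [map_fraction(seed) for seed in range(10)]
    print(reduce_fractions(fractions))
```

Dividing by `len(fractions)` rather than by a global constant keeps the reduce step correct if the number of maps changes.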

# Step 3 - Configure access to your Cloud Storage and Cloud Functions

Configure access details for AWS or another cloud provider. `storage_bucket` should point to a pre-existing bucket, which Lithops uses to store intermediate results. All results will be stored in the folder `lithops.jobs`.

e.g. for GCP your `.lithops_config` should be similar to: 
    
    lithops:
        storage: gcp_storage
        backend: gcp_functions
        bucket: lithops-pipelines
    
    gcp:
        credentials_path : <PATH_TO_JSON_KEYS>
        region : <GCP_REGION>
    
    gcp_functions:
        region : <GCP_REGION>
    
    gcp_storage:
        region: <GCP_REGION>
        storage_bucket: <GCP_STORAGE_BUCKET>

For AWS your `.lithops_config` should be similar to: 
    
    lithops:
        storage: aws_s3
        backend: aws_lambda
    
    aws:
        access_key_id : <AWS_ACCESS_KEY_ID>
        secret_access_key : <AWS_SECRET_ACCESS_KEY> 
        
    aws_s3:
        storage_bucket: <S3_BUCKET>
        region_name : <REGION>
    
    aws_lambda:
        execution_role: <AWS_ROLE_ARN>
        region_name: <REGION>
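
The same settings can also be assembled in Python. The sketch below builds a dictionary that mirrors the AWS example above from environment variables and reports which values are still missing; the environment variable names and both helper functions are hypothetical. Lithops' `FunctionExecutor` is understood to accept such a dictionary through its `config` argument, but check the Lithops configuration documentation for your version before relying on it.

```python
import os


def build_aws_config(env=None):
    """Build a Lithops-style config dict mirroring the AWS YAML example."""
    env = os.environ if env is None else env
    return {
        "lithops": {"storage": "aws_s3", "backend": "aws_lambda"},
        "aws": {
            "access_key_id": env.get("AWS_ACCESS_KEY_ID"),
            "secret_access_key": env.get("AWS_SECRET_ACCESS_KEY"),
        },
        "aws_s3": {
            "storage_bucket": env.get("LITHOPS_S3_BUCKET"),  # hypothetical variable name
            "region_name": env.get("AWS_REGION"),
        },
        "aws_lambda": {
            "execution_role": env.get("LITHOPS_ROLE_ARN"),  # hypothetical variable name
            "region_name": env.get("AWS_REGION"),
        },
    }


def missing_values(config):
    """Return 'section.key' names whose value is unset or empty."""
    return [
        f"{section}.{key}"
        for section, values in config.items()
        for key, value in values.items()
        if not value
    ]


if __name__ == "__main__":
    config = build_aws_config()
    print("Missing settings:", missing_values(config))
    # Assumed usage: lithops.FunctionExecutor(config=config, runtime_memory=2048)
```
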

# Step 4 - Execute simulation with Lithops over IBM Cloud Functions 

In [32]:
iterdata = [0] * MAP_INSTANCES  # function + iterable --> one invocation per element (length = number of elements)
est_pi = EstimatePI()

start_time = time()
print("Monte Carlo simulation for estimating PI spawing over {} Cloud Function invocations".format(MAP_INSTANCES))
# obtain lithops executor
pw = lithops.FunctionExecutor(runtime_memory=2048)

# execute the code
# spawn_reducer: percentage of map activations that must finish before the reducer is launched
pw.map_reduce(est_pi.randomize_points, iterdata, est_pi.process_in_circle_points, spawn_reducer=100)
# get results
result = pw.get_result()
end_time = time()
print(str(est_pi))
print("Estimation of Pi: ", result)
print("\nCompleted in: " + str(end_time - start_time) + " seconds")
# By default the reducer is launched once 20% of the maps have finished, which overlaps input and output;
# change spawn_reducer to measure its effect.


2025-03-28 10:31:28,539 [INFO] config.py:139 -- Lithops v3.6.1.dev0 - Python3.12
2025-03-28 10:31:28,539 [INFO] localhost.py:39 -- Localhost storage client created
2025-03-28 10:31:28,539 [INFO] localhost.py:78 -- Localhost compute v2 client created
2025-03-28 10:31:28,549 [INFO] invokers.py:119 -- ExecutorID e85945-3 | JobID M000 - Selected Runtime: python - 2048MB
2025-03-28 10:31:28,625 [INFO] invokers.py:186 -- ExecutorID e85945-3 | JobID M000 - Starting function invocation: randomize_points() - Total: 1000 activations


Monte Carlo simulation for estimating PI spawing over 1000 Cloud Function invocations


2025-03-28 10:31:29,348 [INFO] invokers.py:225 -- ExecutorID e85945-3 | JobID M000 - View execution logs at /tmp/lithops-bigrobbin/logs/e85945-3-M000.log
2025-03-28 10:31:29,361 [INFO] wait.py:101 -- ExecutorID e85945-3 - Waiting for 1000 function activations to complete


    0%|          | 0/1000  

2025-03-28 10:33:54,058 [INFO] executors.py:618 -- ExecutorID e85945-3 - Cleaning temporary data
2025-03-28 10:33:54,061 [INFO] invokers.py:119 -- ExecutorID e85945-3 | JobID R000 - Selected Runtime: python - 2048MB


Exception: ExecutorID e85945-3 | JobID R000 - Total data exceeded maximum size of 4.0MiB

# Output of another run (reducer spawned after 20% of the map activations)
2025-03-27 20:49:35,134 [INFO] config.py:139 -- Lithops v3.6.1.dev0 - Python3.12
2025-03-27 20:49:35,134 [INFO] localhost.py:39 -- Localhost storage client created
2025-03-27 20:49:35,134 [INFO] localhost.py:78 -- Localhost compute v2 client created
2025-03-27 20:49:35,153 [INFO] invokers.py:119 -- ExecutorID 5b7526-4 | JobID M000 - Selected Runtime: python - 2048MB
2025-03-27 20:49:35,235 [INFO] invokers.py:186 -- ExecutorID 5b7526-4 | JobID M000 - Starting function invocation: randomize_points() - Total: 1000 activations
Monte Carlo simulation for estimating PI spawing over 1000 Cloud Function invocations
2025-03-27 20:49:35,966 [INFO] invokers.py:225 -- ExecutorID 5b7526-4 | JobID M000 - View execution logs at /tmp/lithops-bigrobbin/logs/5b7526-4-M000.log
2025-03-27 20:49:35,977 [INFO] wait.py:105 -- ExecutorID 5b7526-4 - Waiting for 20% of 1000 function activations to complete
    0%|          | 0/200

2025-03-27 20:50:05,731 [INFO] invokers.py:119 -- ExecutorID 5b7526-4 | JobID R000 - Selected Runtime: python - 2048MB
2025-03-27 20:50:05,770 [INFO] invokers.py:186 -- ExecutorID 5b7526-4 | JobID R000 - Starting function invocation: process_in_circle_points() - Total: 1 activations
2025-03-27 20:50:05,771 [INFO] invokers.py:225 -- ExecutorID 5b7526-4 | JobID R000 - View execution logs at /tmp/lithops-bigrobbin/logs/5b7526-4-R000.log
2025-03-27 20:50:05,772 [INFO] executors.py:494 -- ExecutorID 5b7526-4 - Getting results from 1 function activations
2025-03-27 20:50:05,779 [INFO] wait.py:101 -- ExecutorID 5b7526-4 - Waiting for 801 function activations to complete

    0%|          | 0/801

Total Randomize Points: 10,000,000,000
Estimation of Pi:  3.1422579999999742

Completed in: 143.96522045135498 seconds
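
To judge an estimate like the one above, it helps to compare its error against the statistical error expected for the number of points used. The sketch below assumes independent, uniformly distributed points, in which case each point is a Bernoulli trial with success probability p = π/4 and the standard error of the estimate is 4·sqrt(p(1−p)/N). The function names are illustrative.

```python
import math


def expected_std_error(total_points):
    """Standard error of the pi estimate for independent uniform points."""
    p = math.pi / 4  # probability that a point lands inside the circle
    return 4 * math.sqrt(p * (1 - p) / total_points)


def absolute_error(estimate):
    """Absolute difference between an estimate and math.pi."""
    return abs(estimate - math.pi)


if __name__ == "__main__":
    total_points = 1000 * 10_000_000  # MAP_INSTANCES * randomize_per_map
    print("Expected standard error:", expected_std_error(total_points))
    print("Observed absolute error:", absolute_error(3.1422579999999742))
```

An observed error much larger than the expected standard error can indicate that the samples were not fully independent, for example if several workers started from the same random state, so it is worth checking how each worker seeds its generator.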

In [None]:
# Price in AWS
import numpy as np

stats = [f.stats for f in pw.futures]
stats2 = [f.stats for f in pw.futures]  # separate copy kept for inspection in later cells

mean_exec_time = np.mean([stat['worker_func_exec_time'] for stat in stats])

# Debug: Print the structure of worker_func_perf_energy to understand what's in it
if stats and 'worker_func_perf_energy' in stats[0]:
    print(f"Structure of worker_func_perf_energy: {stats[0]['worker_func_perf_energy']}")

# Handle worker_func_perf_energy as a dictionary
# If it's a dictionary, we need to extract a specific value or calculate an aggregate
try:
    # Option 1: If there's a specific key we want to extract from each dictionary
    # For example, if each dictionary has a 'total' or 'value' key
    if stats and 'worker_func_perf_energy' in stats[0] and isinstance(stats[0]['worker_func_perf_energy'], dict):
        # Try to find a numeric key in the dictionary
        sample_dict = stats[0]['worker_func_perf_energy']
        numeric_keys = [k for k, v in sample_dict.items() if isinstance(v, (int, float))]
        
        if numeric_keys:
            # Use the first numeric key found
            key_to_use = numeric_keys[0]
            worker_func_perf_energy = np.mean([stat['worker_func_perf_energy'][key_to_use] 
                                              for stat in stats 
                                              if 'worker_func_perf_energy' in stat 
                                              and key_to_use in stat['worker_func_perf_energy']])
            print(f"Using key '{key_to_use}' from worker_func_perf_energy dictionary")
        else:
            # If no numeric keys found, skip this calculation
            worker_func_perf_energy = "N/A (No numeric values found in dictionary)"
    else:
        # If it's not a dictionary or doesn't exist, skip this calculation
        worker_func_perf_energy = "N/A"
        
except Exception as e:
    print(f"Error processing worker_func_perf_energy: {e}")
    worker_func_perf_energy = "N/A (Error)"

print(f"Perf Energy: {worker_func_perf_energy}")
# print(f"Global Perf Energy: {pw.stats.get('worker_func_perf_energy', 'N/A')}")

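gbxms_price = 0.0000000167  # USD per GB of memory per millisecond
sum_total_time = sum([stat['worker_exec_time'] for stat in stats]) * 1000  # total execution time in ms
# Price per GB-ms * total time in ms * runtime memory in GB.
# NOTE: 0.256 corresponds to a 256 MB runtime, while the executor above was created with
# runtime_memory=2048; adjust this factor to match the memory actually configured.
price = gbxms_price * sum_total_time * 0.256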

print(f'Experiment total price is {round(price, 5)} USD')

Structure of worker_func_perf_energy: {'pkg': 706.6, 'cores': 682.73, 'total': 1389.33}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 645.9909634551495
Experiment total price is 0.00441 USD


for MAP_INSTANCES = 50 
Structure of worker_func_perf_energy: {'pkg': 606.06, 'cores': 579.64, 'total': 1185.6999999999998}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 494.1625490196078
Experiment total price is 0.00062 USD

for MAP_INSTANCES = 100
Structure of worker_func_perf_energy: {'pkg': 574.76, 'cores': 557.8, 'total': 1132.56}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 611.191188118812
Experiment total price is 0.00131 USD


for MAP_INSTANCES = 200
Structure of worker_func_perf_energy: {'pkg': 630.99, 'cores': 604.2, 'total': 1235.19}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 562.0333333333333
Experiment total price is 0.00759 USD


for MAP_INSTANCES = 300
Structure of worker_func_perf_energy: {'pkg': 630.99, 'cores': 604.2, 'total': 1235.19}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 562.0333333333333
Experiment total price is 0.00759 USD

for MAP_INSTANCES = 500
Structure of worker_func_perf_energy: {'pkg': 630.99, 'cores': 604.2, 'total': 1235.19}
Using key 'pkg' from worker_func_perf_energy dictionary
Perf Energy: 562.0333333333333
Experiment total price is 0.00759 USD

for MAP_INSTANCES = 5000
Exception: ExecutorID 5b7526-1 | JobID R000 - Total data exceeded maximum size of 4.0MiB
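
The price computed above follows the formula price = (price per GB-ms) × (total execution time in ms) × (runtime memory in GB). A small helper makes the dependence on the memory size explicit. The sketch below reuses the per-GB-millisecond rate from the notebook, while the function name, its parameters, and the sample execution times are assumptions for illustration. Actual AWS pricing varies by region and architecture and may include per-request charges that are not modeled here.

```python
def estimate_price_usd(exec_times_seconds, memory_mb, price_per_gb_ms=0.0000000167):
    """Estimate the compute cost of a set of function activations.

    exec_times_seconds: per-activation execution times in seconds
    memory_mb: memory configured for the runtime, in MB
    price_per_gb_ms: price of one GB of memory for one millisecond (USD)
    """
    total_ms = sum(exec_times_seconds) * 1000
    # Decimal convention, as in the notebook (256 MB -> 0.256 GB)
    memory_gb = memory_mb / 1000
    return price_per_gb_ms * total_ms * memory_gb


if __name__ == "__main__":
    times = [4.0] * 1000  # hypothetical: 1000 activations of 4 seconds each
    print(round(estimate_price_usd(times, memory_mb=256), 5))
    print(round(estimate_price_usd(times, memory_mb=2048), 5))
```
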

In [None]:
# Check for problems when there are multiple splits: inspect the second activation
if len(stats) > 1 and 'worker_func_perf_energy' in stats[1]:
    print(f"Structure of worker_func_perf_energy: {stats[1]['worker_func_perf_energy']}")


print(stats)
# Handle worker_func_perf_energy as a dictionary
# If it's a dictionary, we need to extract a specific value or calculate an aggregate
try:
    # Option 1: If there's a specific key we want to extract from each dictionary
    # For example, if each dictionary has a 'total' or 'value' key
    if stats and isinstance(stats[0].get('worker_func_perf_energy'), dict):
        # Try to find a numeric key in the dictionary
        sample_dict = stats[0]['worker_func_perf_energy']
        numeric_keys = [k for k, v in sample_dict.items() if isinstance(v, (int, float))]
        
        if numeric_keys:
            # Use the first numeric key found
            key_to_use = numeric_keys[0]
            worker_func_perf_energy = np.sum([stat['worker_func_perf_energy'][key_to_use] 
                                              for stat in stats 
                                              if 'worker_func_perf_energy' in stat 
                                              and key_to_use in stat['worker_func_perf_energy']])
            print(f"Using key '{key_to_use}' from worker_func_perf_energy dictionary")
        else:
            # If no numeric keys found, skip this calculation
            worker_func_perf_energy = "N/A (No numeric values found in dictionary)"
    else:
        # If it's not a dictionary or doesn't exist, skip this calculation
        worker_func_perf_energy = "N/A"
        
except Exception as e:
    print(f"Error processing worker_func_perf_energy: {e}")
    worker_func_perf_energy = "N/A (Error)"

print(f"Perf Energy: {worker_func_perf_energy}")
# print(f"Global Perf Energy: {pw.stats.get('worker_func_perf_energy', 'N/A')}")

Structure of worker_func_perf_energy: {'pkg': 627.71, 'cores': 608.57, 'total': 1236.2800000000002}
[{'host_job_create_tstamp': 1743152368.642506, 'host_job_serialize_time': 0.021225, 'func_data_size_bytes': 7200, 'func_module_size_bytes': 6374, 'host_func_upload_time': 0.000107, 'host_data_upload_time': 2.5e-05, 'host_job_created_time': 0.021755, 'host_status_done_tstamp': 1743152372.4829373, 'host_status_query_count': 1, 'worker_start_tstamp': 1743152369.2235286, 'host_submit_tstamp': 1743152368.664715, 'worker_cold_start': True, 'worker_func_cpu_usage': [98.0, 99.7, 99.3, 100.0, 99.0, 100.0, 100.0, 99.7, 99.0, 99.3, 100.0, 98.7, 99.7, 100.0, 100.0, 99.0, 99.7, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], 'worker_func_cpu_system_time': 932.14, 'worker_func_cpu_user_time': 4630.63, 'worker_func_avg_cpu_usage': 99.68214285714285, 'worker_func_energy_consumption': 461591.1211785714, 'worker_func_sent_net_io': 525320, 'worker_func_recv_net_io': 749688, 'w
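
Because `worker_func_perf_energy` is a nested dictionary, choosing a key explicitly is less fragile than taking the first numeric key found. The sketch below aggregates one chosen key across a list of stats dictionaries in plain Python. The function name is hypothetical, and the sample entries reuse the two structures printed in the outputs above only as stand-ins for real `f.stats` entries; the third entry is made up to show how missing data is skipped.

```python
def energy_summary(stats, key="pkg", field="worker_func_perf_energy"):
    """Return (total, mean) of one energy key across activation stats.

    Entries that lack the field, or whose field is not a dict holding
    a numeric value under `key`, are skipped.
    """
    values = []
    for stat in stats:
        energy = stat.get(field)
        if isinstance(energy, dict) and isinstance(energy.get(key), (int, float)):
            values.append(energy[key])
    if not values:
        return None, None
    total = sum(values)
    return total, total / len(values)


if __name__ == "__main__":
    sample_stats = [
        {"worker_func_perf_energy": {"pkg": 706.6, "cores": 682.73, "total": 1389.33}},
        {"worker_func_perf_energy": {"pkg": 627.71, "cores": 608.57, "total": 1236.2800000000002}},
        {"worker_exec_time": 3.2},  # hypothetical entry without energy data; ignored
    ]
    print(energy_summary(sample_stats, key="pkg"))
```
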

In [None]:

for stat_test in stats2:
    print(stat_test)
    print("\n")


print(len(stats2))
print(len(pw.futures))


{'host_job_create_tstamp': 1743152368.642506, 'host_job_serialize_time': 0.021225, 'func_data_size_bytes': 7200, 'func_module_size_bytes': 6374, 'host_func_upload_time': 0.000107, 'host_data_upload_time': 2.5e-05, 'host_job_created_time': 0.021755, 'host_status_done_tstamp': 1743152372.4829373, 'host_status_query_count': 1, 'worker_start_tstamp': 1743152369.2235286, 'host_submit_tstamp': 1743152368.664715, 'worker_cold_start': True, 'worker_func_cpu_usage': [98.0, 99.7, 99.3, 100.0, 99.0, 100.0, 100.0, 99.7, 99.0, 99.3, 100.0, 98.7, 99.7, 100.0, 100.0, 99.0, 99.7, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0], 'worker_func_cpu_system_time': 932.14, 'worker_func_cpu_user_time': 4630.63, 'worker_func_avg_cpu_usage': 99.68214285714285, 'worker_func_energy_consumption': 461591.1211785714, 'worker_func_sent_net_io': 525320, 'worker_func_recv_net_io': 749688, 'worker_func_rss': 50065408, 'worker_func_vms': 1065660416, 'worker_func_uss': 31371264, 'worker_func_e

In [None]:
import pandas as pd
import numpy as np


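# Flatten nested dicts such as worker_func_perf_energy into columns like worker_func_perf_energy_pkg
df = pd.json_normalize(stats2, sep='_')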

# Display the table
print(df)

# Optional: If you want to see more details
print("\nDetailed PKG Energy Information:")
print(f"Total PKG Energy: {df['worker_func_perf_energy_pkg'].sum()}")
print(f"Average PKG Energy: {df['worker_func_perf_energy_pkg'].mean()}")

     host_job_create_tstamp  host_job_serialize_time  func_data_size_bytes  \
0              1.743152e+09                 0.021225                  7200   
1              1.743152e+09                 0.021225                  7200   
2              1.743152e+09                 0.021225                  7200   
3              1.743152e+09                 0.021225                  7200   
4              1.743152e+09                 0.021225                  7200   
..                      ...                      ...                   ...   
296            1.743152e+09                 0.021225                  7200   
297            1.743152e+09                 0.021225                  7200   
298            1.743152e+09                 0.021225                  7200   
299            1.743152e+09                 0.021225                  7200   
300            1.743152e+09                 0.003791                602610   

     func_module_size_bytes  host_func_upload_time  host_data_u

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

 
# Flatten nested dicts so that worker_func_perf_energy_pkg becomes a column
df = pd.json_normalize(stats2, sep='_')
# The stats carry no 'Worker_ID' field, so use the row index as the worker ID
worker_ids = df.index

# Create the line graph
plt.figure(figsize=(10, 6))
plt.plot(worker_ids, df['worker_func_perf_energy_pkg'], marker='o', linestyle='-', linewidth=2, markersize=10)
plt.title('PKG Energy by Worker', fontsize=16)
plt.xlabel('Worker ID', fontsize=12)
plt.ylabel('PKG Energy Value', fontsize=12)
plt.grid(True, linestyle='--', alpha=0.7)

# Add value labels
for x, y in zip(worker_ids, df['worker_func_perf_energy_pkg']):
    plt.text(x, y, f'{y}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()