# Things to consider when developing an app

This is an experiment to benchmark three algorithms.

The problem is about graphs. The goal is to select edges so that every node is connected to at least two edges. Every edge has a cost, and the total cost of the selected edges should be minimized.
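
As a minimal sketch of the constraint described above, the check below counts how many selected edges touch each node. The function name `satisfies_min_degree` and its `min_degree` parameter are hypothetical names introduced here for illustration; they are not part of the notebook.

```python
from collections import Counter


def satisfies_min_degree(nodes, edges, min_degree=2):
    """Return True if every node touches at least min_degree of the given edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Counter returns 0 for nodes that no selected edge touches
    return all(degree[node] >= min_degree for node in nodes)


nodes = [0, 1, 2, 3]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every node has two edges
path = [(0, 1), (1, 2), (2, 3)]           # nodes 0 and 3 have only one edge

print(satisfies_min_degree(nodes, cycle))
print(satisfies_min_degree(nodes, path))
```

The same function should work with `graph.nodes` and a list of edge tuples taken from a `networkx` graph, since it only iterates over both.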

## What did I need for this app?

### 1. Benchmark system

Params needed:

- **Algorithms**: list of algorithms to test
- **Input sizes**: list of node counts **n** to test
- **Repetitions**: number of runs per algorithm and input size
- **Output**: benchmark results
  - **cost** = total cost of the selected edges
  - **time** = time taken to run the algorithm
  - **space** = memory used while running the algorithm
 
Outcomes needed:
- Cost
  - function to calculate the value: `evaluate_solution`
- Time Complexity
  - Package: `timeit`
- Space Complexity
  - Package: `tracemalloc` or `memory_profiler`


### 2. Algorithms

**Preconditions**

Prepare a function to generate a graph.
- Params:
  - **n**: number of nodes
  - **min_weight**: minimum weight of the edges
  - **max_weight**: maximum weight of the edges

With the `networkx` library, we can generate fully connected graphs and then apply the algorithms to them.

**Implement the following algorithms**

- Brute force
- Heuristic based on Kruskal's algorithm
- Metaheuristic based on simulated annealing

**How to manage a graph to do algorithm operations?**

Use the `networkx` library for this. It provides both the data structure to manage the graph and the algorithms to work with it.

Moreover, this library has a **minimum spanning tree algorithm**, which is useful for the heuristic algorithm and the simulated annealing algorithm. By default, it uses Kruskal's algorithm to find the minimum spanning tree.
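
A minimal sketch of that call is shown below. It assumes a flat `cost` edge attribute, which is a simplification made here for illustration: `minimum_spanning_tree` reads the weight from a top-level edge attribute, so costs stored inside a nested dictionary would first need to be copied to such an attribute.

```python
import random

import networkx as nx

# Small complete graph with a flat "cost" attribute (illustrative layout)
rng = random.Random(0)
graph = nx.complete_graph(5)
for u, v in graph.edges:
    graph[u][v]["cost"] = rng.randint(1, 20)

# Kruskal is the default algorithm; it is named explicitly here for clarity
mst = nx.minimum_spanning_tree(graph, weight="cost", algorithm="kruskal")
mst_cost = sum(data["cost"] for _, _, data in mst.edges(data=True))

print(sorted(mst.edges), mst_cost)
```

Note that a spanning tree always has leaf nodes with a single edge, so on its own it does not meet the requirement of at least two edges per node; it can only serve as a starting point that the heuristic then extends.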


### 3. Results Analysis

We need to compare the results of the algorithms.

We can use plots, dictionaries, or DataFrames to show the results.

Libraries:
- `matplotlib`
- `pandas`
- `numpy`
- `seaborn`

The results to compare for each input size (number of nodes) are:
- Solution cost 
- Solution reliability???
- Time Complexity
- Space Complexity

Because each algorithm runs with 3 different input sizes and each of those runs is repeated, we can calculate the following for each result:
- Average
- Min
- Max
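
A minimal sketch of that aggregation using plain dictionaries is shown below. The helper name `summarize` is hypothetical, and the timing values are placeholders that only illustrate the shape of the data; they are not measurements.

```python
from statistics import mean


def summarize(values):
    """Return the average, minimum and maximum of a list of measurements."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


# Placeholder values keyed by input size (not real benchmark results)
runs_by_size = {
    20: [0.10, 0.12, 0.11],
    40: [0.40, 0.46, 0.43],
}

summary = {size: summarize(values) for size, values in runs_by_size.items()}
for size, stats in summary.items():
    print(size, stats)
```

The same helper can be applied to the time, memory and cost lists of each algorithm.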




# Prototyping


In [None]:
# libraries used
import networkx as nx
import numpy as np
import timeit
import tracemalloc
import matplotlib.pyplot as plt
# scipy is needed by networkx

## networkx example

In [None]:

def generate_random_graph(num_nodes, num_edges):
    return nx.gnm_random_graph(num_nodes, num_edges)

# Example usage:
test_graph = generate_random_graph(4, 2)

# print the graph
print(test_graph.edges())


## Random graph generation with networkx

In [None]:

# Specific problem graph generator
def generate_random_graph(num_nodes: int, min_cost: int, max_cost: int):
    """
    Generate an undirected, fully connected graph where each edge
    carries a random cost and a random reliability value.
    Parameters:
    ----------
    num_nodes: int
        Number of nodes in the graph
    min_cost: int
        Minimum value of the cost of the edges (inclusive)
    max_cost: int
        Maximum value of the cost of the edges (inclusive)
    """
    # number of edges in a fully connected graph
    num_edges = num_nodes * (num_nodes - 1) // 2
    # with the maximum number of edges, gnm_random_graph returns a complete graph
    graph = nx.gnm_random_graph(num_nodes, num_edges)
    for u, v in graph.edges:
        # cost and reliability of each edge connection between nodes
        graph[u][v]['values'] = {
            # randint excludes the upper bound, so add 1 to include max_cost
            "cost": np.random.randint(min_cost, max_cost + 1),
            "reliability": np.random.uniform(0, 1),
        }
    return graph

In [None]:
# testing the function
base_graph = generate_random_graph(num_nodes=5, min_cost=1, max_cost=20)

# print the nodes, then each edge with its attributes
print("Graph Nodes:")
print(base_graph.nodes)


print("\nGraph Edges with Attributes:")
for u, v, attr in base_graph.edges(data=True):
    print(f"({u}, {v}) -> cost: {attr['values']['cost']}, reliability: {attr['values']['reliability']:.2f}")

In [None]:
def evaluate_solution(graph, edges):
    # dummy solution with the cost of all edges
    return sum(graph[u][v]['values']['cost'] for u, v in edges)

## Benchmark system

### Sample size calculation
To find a suitable sample size for each algorithm at a given input size, we need to calculate how many repetitions give a good sample.

#### Option size 1 - 90% confidence and 10% error
With 90% confidence and 10% error, we can calculate the sample size with the following formula:
- z = 1.645: z-score for 90% confidence
- e = 0.1  : error
- p = 0.5  : population proportion
- n        : sample size

n = (z^2 * p * (1-p)) / e^2

n ≈ 67.65

#### Option size 2 - 80% confidence and 20% error
With 80% confidence and 20% error, we can calculate the sample size with the following formula:
- z = 1.282: z-score for 80% confidence
- e = 0.2  : error
- p = 0.5  : population proportion
- n        : sample size

n = (z^2 * p * (1-p)) / e^2

n ≈ 10.27
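
The two options above can be checked with a short calculation. The function name `sample_size` is a hypothetical helper introduced here; rounding up with `math.ceil` is an assumption about how the fractional result would be turned into a whole number of repetitions.

```python
import math


def sample_size(z, error, p=0.5):
    """Sample size n = (z^2 * p * (1 - p)) / e^2."""
    return (z ** 2) * p * (1 - p) / error ** 2


options = [
    ("90% confidence, 10% error", 1.645, 0.1),
    ("80% confidence, 20% error", 1.282, 0.2),
]

for label, z, error in options:
    n = sample_size(z, error)
    print(f"{label}: n = {n:.2f}, rounded up to {math.ceil(n)}")
```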


In [None]:
# Dummy algorithms

import time

def brute_force(graph):
    # Dummy implementation
    # sleep between 2 and 4 seconds
    time.sleep(np.random.uniform(2, 4))
    return list(graph.edges)

def kruskal_heuristic(graph):
    # Dummy implementation
    # sleep between 1 and 2 seconds
    time.sleep(np.random.uniform(1, 2))
    return list(graph.edges)

def simulated_annealing(graph: nx.Graph, max_iter=100, minimal_temp=20.1, cooling_rate=0.99):
    """
    Simulated Annealing algorithm to solve the Redundant Connection Problem.
    If max_iter is reached, the current solution is returned.
    Parameters
    ----------
    graph : nx.Graph
        Input graph
    max_iter : int
        Maximum number of iterations
    minimal_temp : float
        Minimal temperature
    cooling_rate : float
        Rate at which the temperature is decreased
    """
    # Dummy implementation
    # sleep between 1 and 2 seconds
    time.sleep(np.random.uniform(1, 2))
    return list(graph.edges)
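
The `max_iter`, `minimal_temp` and `cooling_rate` parameters suggest a geometric cooling schedule. The sketch below shows one common way these pieces fit together, assuming the Metropolis acceptance rule and a multiplicative cooling step. The names `accept_move`, `temperature_schedule` and `initial_temp` are hypothetical and do not come from the notebook, and this is not the notebook's actual implementation.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def temperature_schedule(initial_temp, minimal_temp=20.1, cooling_rate=0.99, max_iter=100):
    """Yield temperatures until minimal_temp or max_iter is reached."""
    temperature = initial_temp
    for _ in range(max_iter):
        if temperature < minimal_temp:
            break
        yield temperature
        temperature *= cooling_rate  # geometric cooling (an assumption)


# initial_temp=25.0 is an arbitrary value chosen for the example
temps = list(temperature_schedule(initial_temp=25.0))
print(len(temps), round(temps[-1], 2))
```

In a full implementation, each yielded temperature would drive one iteration: propose a neighbouring edge selection, compute its cost difference, and keep it if `accept_move` returns `True`.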

In [None]:
# Benchmark wrapper function
def benchmark_algorithm(algorithm, graph: nx.Graph, number_of_runs=67):
    """
    Compute the benchmark results for a given algorithm on a given graph,
    repeated number_of_runs times.
    Parameters
    ----------
    algorithm : function
        The algorithm to benchmark
    graph : nx.Graph
        The graph to run the algorithm on
    number_of_runs : int
        Number of runs to perform for each benchmark
        This is used to compute an average of time and memory usage.
    Returns
    -------
    tuple of lists
        Time (s), peak memory (MB) and utility of each run
    """
    time_results = []
    memory_results = []
    utility_results = []

    for _ in range(number_of_runs):
        # Track time using timeit
        timer = timeit.Timer(lambda: algorithm(graph))
        exec_time = timer.timeit(number=1)

        # Track memory using tracemalloc in a separate run,
        # because tracing slows the algorithm down
        tracemalloc.start()
        result = algorithm(graph)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        # Evaluate the utility (e.g., total cost, reliability)
        ## evaluate_solution is a dummy function, to be replaced by the real one
        utility = evaluate_solution(graph, result)

        # Collect results
        time_results.append(exec_time)
        memory_results.append(peak / 10**6)  # Convert bytes to MB
        utility_results.append(utility)

    return time_results, memory_results, utility_results

In [None]:
# Main function to run the benchmark
def run_benchmarks():

    algorithms = {
        "Brute Force": brute_force,
        "Kruskal Heuristic": kruskal_heuristic,
        "Simulated Annealing": simulated_annealing,
    }

    # Input control variables
    number_of_nodes = [20, 40, 60]
    min_cost = 1
    max_cost = 20

    # Results of every input size, so they can be analysed later
    all_results = {}

    # For the same input size, we use the same graph to test all the algorithms
    for size in number_of_nodes:
        graph = generate_random_graph(size, min_cost, max_cost)

        results = {}

        # Print the size tested
        print(f"Testing with {size} nodes")

        for name, algorithm in algorithms.items():
            time_results, memory_results, utility_results = benchmark_algorithm(
                    algorithm, graph, number_of_runs=5)

            results[name] = {
                "time": time_results,
                "memory": memory_results,
                "utility": utility_results,
            }

            # Print results
            print(f"Results for {name}:")
            print("Time (s):", time_results)
            print(f"\tMean (s): {np.mean(time_results)}. Min (s): {np.min(time_results)}. Max (s): {np.max(time_results)}")
            print("Memory (MB):", memory_results)
            print("Utility (Total Cost):", utility_results)
            print()

        all_results[size] = results

    return all_results




In [None]:

# Run the benchmark
benchmark_results = run_benchmarks()