### Optimization

### Strategy #1: Random Search (a first, very bad idea)

```python
import numpy as np

# Define the search space
search_space = {
    'learning_rate': np.linspace(0.01, 0.1, 10),
    'batch_size': [16, 32, 64, 128],
    'num_hidden_units': [32, 64, 128, 256],
    'dropout_rate': np.linspace(0.0, 0.5, 6)
}

num_iterations = 100

def evaluate_model(params):
    # Placeholder: in practice, train and validate a model with `params` and return its score
    score = np.random.random()  # Random score for demonstration purposes
    return score

# Perform random search
best_params = None
best_score = float('-inf')  # Negative infinity, so any real score counts as an improvement

for _ in range(num_iterations):
    # Sample random hyperparameters
    params = {param: np.random.choice(values) for param, values in search_space.items()}

    # Evaluate the performance using the current hyperparameters
    score = evaluate_model(params)

    # Update the best hyperparameters if the score is improved
    if score > best_score:
        best_params = params
        best_score = score

# Print the best hyperparameters and the corresponding score
print("Best hyperparameters:", best_params)
print("Best score:", best_score)
```

```
Best hyperparameters: {'learning_rate': 0.1, 'batch_size': 32, 'num_hidden_units': 256, 'dropout_rate': 0.5}
Best score: 0.978147424396756
```
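To put a number on why blind sampling can be wasteful here: the grid defined by `search_space` has 10 × 4 × 4 × 6 = 960 combinations. Assuming, hypothetically, that exactly one combination is the best and that each iteration draws uniformly with replacement, 100 draws find it with probability 1 − (1 − 1/960)^100 ≈ 0.099. The short calculation below checks this; `candidates_per_param` is a name introduced only for this sketch.

```python
from math import prod

# Number of candidate values per hyperparameter in `search_space` above
candidates_per_param = {'learning_rate': 10, 'batch_size': 4,
                        'num_hidden_units': 4, 'dropout_rate': 6}
num_configs = prod(candidates_per_param.values())  # 10 * 4 * 4 * 6
num_iterations = 100

# Chance that 100 independent uniform draws hit one specific configuration at least once
p_hit = 1 - (1 - 1 / num_configs) ** num_iterations
print(num_configs, round(p_hit, 3))  # → 960 0.099
```

In other words, under this assumption roughly nine runs out of ten would never evaluate the best configuration at all.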


### Strategy #2: Random Local Search

```python
import random

# Random local search:
# 1. Initialize the best solution and its value.
# 2. Generate a random neighbor of the current best solution.
# 3. Update the best solution if the neighbor is better.

def objective_function(x):
    return -(x**2) + 4

def random_neighbor(solution, search_range):
    # Perturb the solution by a uniform random step in [-search_range, search_range]
    neighbor = solution + random.uniform(-search_range, search_range)
    return neighbor

def random_local_search(search_range, max_iterations):
    # Step 1: start from a random point in [-search_range, search_range]
    best_solution = random.uniform(-search_range, search_range)
    best_value = objective_function(best_solution)

    for _ in range(max_iterations):
        # Step 2: sample a neighbor of the current best solution
        neighbor = random_neighbor(best_solution, search_range)
        neighbor_value = objective_function(neighbor)

        # Step 3: keep the neighbor only if it improves the objective
        if neighbor_value > best_value:
            best_solution = neighbor
            best_value = neighbor_value

    return best_solution, best_value

# Set the search range and maximum number of iterations
search_range = 10
max_iterations = 100

# Run the random local search algorithm
best_solution, best_value = random_local_search(search_range, max_iterations)

# Print the result
print("Best Solution:", best_solution)
print("Best Value:", best_value)
```

```
Best Solution: -0.10330336119456618
Best Value: 3.989328415565905
```
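For contrast with Strategy #1, here is a sketch of pure random search on the same toy objective and the same budget of 100 evaluations: every candidate is drawn independently from [-search_range, search_range] instead of as a neighbor of the current best. The function `pure_random_search` and its `seed` parameter are hypothetical names introduced for this comparison; the seed only makes runs reproducible.

```python
import random

def objective_function(x):
    # Same toy objective as above: maximum value 4 at x = 0
    return -(x**2) + 4

def pure_random_search(search_range, max_iterations, seed=0):
    rng = random.Random(seed)  # local RNG so results are reproducible
    best_solution, best_value = None, float('-inf')
    for _ in range(max_iterations):
        # Independent draw: ignores where the current best solution is
        candidate = rng.uniform(-search_range, search_range)
        value = objective_function(candidate)
        if value > best_value:
            best_solution, best_value = candidate, value
    return best_solution, best_value

print(pure_random_search(10, 100))
```

Both strategies keep the best point seen so far; the difference is only whether new candidates are sampled around that point or anywhere in the range.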


## First Order Optimization

## Second Order Optimization
Second-order optimization methods, such as Newton's method and its variants like the Gauss-Newton method and the Levenberg-Marquardt algorithm, use second-order derivatives of the loss function with respect to the model parameters (the Hessian, or for Gauss-Newton and Levenberg-Marquardt an approximation of it built from first derivatives) to update the parameters. This gives them information about the curvature of the loss function that first-order methods, which only use the gradient, do not have.

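Newton's method requires inverting the Hessian matrix (or, equivalently, solving a linear system with it). For a model with n parameters the Hessian has n² entries and a direct solve costs O(n³) operations, which is computationally expensive and often infeasible for large-scale problems.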
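A minimal NumPy sketch of Newton's method, θ_new = θ_old − H^(-1) ∇L(θ_old), on a hypothetical two-parameter convex loss L(θ) = exp(θ0) − θ0 + (θ0 − θ1)², whose minimum is at θ = (0, 0). Rather than forming H^(-1) explicitly, it solves the linear system H · step = ∇L with `np.linalg.solve`, which is the usual way to apply an inverse. The functions `loss`, `gradient`, and `hessian` are made up for illustration.

```python
import numpy as np

def loss(theta):
    # Hypothetical toy loss: L(θ) = exp(θ0) - θ0 + (θ0 - θ1)^2, minimized at θ = (0, 0)
    return np.exp(theta[0]) - theta[0] + (theta[0] - theta[1]) ** 2

def gradient(theta):
    d = theta[0] - theta[1]
    return np.array([np.exp(theta[0]) - 1 + 2 * d, -2 * d])

def hessian(theta):
    # Analytic second derivatives of the toy loss (positive definite everywhere)
    return np.array([[np.exp(theta[0]) + 2, -2.0],
                     [-2.0, 2.0]])

def newton(theta, max_iter=20, tol=1e-10):
    theta = np.asarray(theta, dtype=float)
    for _ in range(max_iter):
        g = gradient(theta)
        if np.linalg.norm(g) < tol:
            break
        # Solve H * step = g instead of computing H^(-1) explicitly
        step = np.linalg.solve(hessian(theta), g)
        theta = theta - step
    return theta

theta_star = newton([1.0, -1.0])
print(theta_star, loss(theta_star))
```

Near the minimum the error roughly squares at every iteration (quadratic convergence), so only a handful of steps are needed here; the price is building and solving with the full Hessian at each step.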

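1. Newton's method (Newton-Raphson method), where `H` is the Hessian of `L` at `θ_old`:
`θ_new = θ_old - H^(-1) * ∇L(θ_old)`
2. Gauss-Newton method, for a least-squares loss `L(θ) = ½‖r(θ)‖²` with residual vector `r` and its Jacobian `J` (it replaces `H` by `J^T J` and `∇L` by `J^T r`):
`θ_new = θ_old - (J^T J)^(-1) * J^T r(θ_old)`
3. Levenberg-Marquardt algorithm, which adds a damping term `λ ≥ 0` to Gauss-Newton:
`θ_new = θ_old - (J^T J + λI)^(-1) * J^T r(θ_old)`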
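The Gauss-Newton and Levenberg-Marquardt updates can be illustrated on a small curve-fitting problem. The sketch below fits a hypothetical model y = a · exp(b · x) to noise-free synthetic data generated with a = 2, b = 0.5, minimizing L(θ) = ½‖r(θ)‖², where r is the residual vector and J its Jacobian. Each iteration solves (JᵀJ + λI) · step = −Jᵀr; λ is divided by 10 after a step that lowers the loss and multiplied by 10 after a rejected one, a common heuristic rather than the only choice. With λ = 0 the step is exactly the Gauss-Newton step. This uses the plain λI damping; Marquardt's variant scales the damping by diag(JᵀJ) instead. For real problems, SciPy's `scipy.optimize.least_squares(..., method='lm')` wraps a production implementation.

```python
import numpy as np

def residuals(theta, x, y):
    # Hypothetical model y ≈ a * exp(b * x); residual r_i = model_i - y_i
    a, b = theta
    return a * np.exp(b * x) - y

def jacobian(theta, x):
    # Columns: dr/da and dr/db
    a, b = theta
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])

def levenberg_marquardt(theta, x, y, lam=1e-3, max_iter=200, tol=1e-12):
    theta = np.asarray(theta, dtype=float)
    r = residuals(theta, x, y)
    cost = 0.5 * r @ r
    for _ in range(max_iter):
        J = jacobian(theta, x)
        # Damped normal equations: (J^T J + lam * I) step = -J^T r
        step = np.linalg.solve(J.T @ J + lam * np.eye(theta.size), -J.T @ r)
        candidate = theta + step
        r_new = residuals(candidate, x, y)
        cost_new = 0.5 * r_new @ r_new
        if cost_new < cost:
            # Accept the step and reduce damping (closer to Gauss-Newton)
            theta, r, cost = candidate, r_new, cost_new
            lam /= 10
        else:
            # Reject the step and increase damping (shorter, gradient-descent-like steps)
            lam *= 10
        if np.linalg.norm(step) < tol:
            break
    return theta

# Synthetic, noise-free data generated with a = 2, b = 0.5
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(0.5 * x)

theta_hat = levenberg_marquardt([1.0, 0.1], x, y)
print(theta_hat)
```

The damping is what makes the method robust far from the solution: when a full Gauss-Newton step overshoots, a larger λ shrinks the step toward a small step along the negative gradient.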


**In practice,** it is currently not common to see L-BFGS or similar second-order methods applied to large-scale Deep Learning and Convolutional Neural Networks. Instead, SGD variants based on (Nesterov’s) momentum are more standard because they are simpler and scale more easily.
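The momentum-based SGD variants mentioned above can be sketched in a few lines. This is a minimal NumPy version of Nesterov momentum in one common formulation, v ← μv − η∇L(θ + μv), θ ← θ + v, where the gradient is evaluated at the look-ahead point θ + μv. To keep it deterministic it uses the exact gradient of a hypothetical toy quadratic L(θ) = ½ θᵀAθ instead of a minibatch gradient, so strictly it is gradient descent with Nesterov momentum. PyTorch exposes a reformulated but closely related version via `torch.optim.SGD(params, lr=..., momentum=0.9, nesterov=True)`.

```python
import numpy as np

# Hypothetical toy quadratic L(θ) = 0.5 * θᵀ A θ, minimized at θ = 0
A = np.diag([1.0, 10.0])

def grad(theta):
    return A @ theta

def nesterov_momentum(theta, lr=0.05, mu=0.9, num_steps=300):
    theta = np.asarray(theta, dtype=float)
    v = np.zeros_like(theta)  # velocity
    for _ in range(num_steps):
        # Gradient at the look-ahead point θ + μv, then velocity and parameter update
        v = mu * v - lr * grad(theta + mu * v)
        theta = theta + v
    return theta

print(nesterov_momentum([5.0, -3.0]))
```

Each step costs one gradient evaluation and O(n) extra memory for the velocity, which is why this kind of update scales to models where forming or inverting a Hessian is out of the question.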