# Randomized optimization

Plagiarism note: I took part of this course in 2020, so some of the analysis and text is repeated from that attempt.

mlrose procedure:

1. Define a fitness function. This is the function we want to maximize or minimize; it is used to evaluate the fitness of a state vector.
2. Define an optimization problem object.
3. Select and run a randomized optimization algorithm.
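The three steps above can be sketched without mlrose at all. The block below is a minimal, standard-library-only illustration, not the mlrose API: the bit-counting fitness function `one_max`, the `problem` dictionary, and the `random_hill_climb` helper are all hypothetical names chosen for this example.

```python
import random

# Step 1: fitness function (hypothetical example: count the ones in a bit string).
def one_max(state):
    return sum(state)

# Step 2: a minimal "problem" description: state length plus the fitness to maximize.
# (Assumption: a plain dict stands in for an optimization problem object.)
problem = {"length": 20, "fitness": one_max}

# Step 3: randomized hill climbing with a cap on consecutive failed attempts.
def random_hill_climb(problem, max_attempts=10, max_iters=1000, seed=28):
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(problem["length"])]
    best = problem["fitness"](state)
    attempts = 0
    iters = 0
    while attempts < max_attempts and iters < max_iters:
        iters += 1
        # Propose a neighbor by flipping one randomly chosen bit.
        i = rng.randrange(problem["length"])
        neighbor = state[:]
        neighbor[i] = 1 - neighbor[i]
        score = problem["fitness"](neighbor)
        if score > best:
            # Accept the improvement and reset the failed-attempt counter.
            state, best = neighbor, score
            attempts = 0
        else:
            attempts += 1
    return state, best

best_state, best_fitness = random_hill_climb(problem)
print(best_state, best_fitness)
```

In this sketch, `max_attempts` and `max_iters` play the same roles as the parameters of the same names varied later in this notebook: the first bounds consecutive non-improving proposals, the second bounds total iterations.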

mlrose fitness functions: https://mlrose.readthedocs.io/en/stable/source/fitness.html

## Load libraries

In [1]:
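#mlrose imports six from sklearn.externals, which newer scikit-learn releases removed,
#so alias the standalone six package before importing mlrose
import six
import sys
sys.modules['sklearn.externals.six'] = six
import mlrose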
import numpy as np
import pandas as pd
import time
from sklearn.preprocessing import normalize
from sklearn.metrics import accuracy_score

## Set directories

In [2]:
directory_hw1 = "/Users/mikepecorino/Documents/machine_learning/HW1/"
directory_hw2 = "/Users/mikepecorino/Documents/machine_learning/HW2/"

## Load inputs

In [3]:
data_all = pd.read_csv(directory_hw1 + "sensor_all.csv")

## Neural Network

### Define features and response variable

### Features list

In [4]:
features = data_all.columns[~data_all.columns.isin(["subject", "activity_raw", "activity", "tag", "fold", "response_prop"])]

### Features data

In [5]:
data_all_features = data_all[features]
train_features = data_all[features][data_all["tag"] == "train"]
valid_features = data_all[features][data_all["tag"] == "valid"]
test_features = data_all[features][data_all["tag"] == "test"]
data_cv_features = data_all[features][data_all["tag"].isin(["train", "valid"])]

### Create adjusted response variable

### Response variable

In [54]:
response = "activity_sitting"
data_all[response] = np.where(data_all["activity"] == 0, 1, 0)

### Response data

In [55]:
data_all_response = data_all[response]
train_response = data_all[response][data_all["tag"] == "train"]
valid_response = data_all[response][data_all["tag"] == "valid"]
test_response = data_all[response][data_all["tag"] == "test"]
data_cv_response = data_all[response][data_all["tag"].isin(["train", "valid"])]

### Normalize data

In [8]:
data_cv_features_normalized = normalize(data_cv_features)
test_features_normalized = normalize(test_features)
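With its default arguments, scikit-learn's `normalize` rescales each row (sample) to unit L2 norm; it does not standardize columns. The following standard-library sketch reproduces that computation on toy data so the effect is easy to see. The helper name `l2_normalize_rows` is hypothetical and is not part of scikit-learn.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # Guard against division by zero for rows that are entirely zero.
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out

normalized = l2_normalize_rows([[3.0, 4.0], [0.0, 0.0]])
print(normalized)
```

Because the scaling uses only the values within each row, normalizing the cross-validation and test sets separately, as done above, does not pass any information between them.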

In [18]:
#Inputs for the Neural Network
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
algos = ["random_hill_climb", "simulated_annealing", "genetic_alg", "gradient_descent"]
random_state = 28
pop_size = 200
mutation_prob = 0.1
#Simulated Annealing: decay schedule for temperature
schedule = mlrose.ExpDecay(init_temp = 100,
                           exp_const = .05,
                           min_temp = 1)

#Initialize an empty data frame for recording results
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
mlrose_nn = pd.DataFrame(columns = ["algorithm",
                                    "random_restart",
                                    "max_attempt",
                                    "max_iter",
                                    "time",
                                    "function_evaluations",
                                    "train_score",
                                    "test_score"])

#Loop
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
#Start an iteration counter
iter = 1

#For each combination of algorithm, max attempt, max iter, and random restarts
for algo in algos:
    for max_attempt in [10]:
        for max_iter in [1, 10, 20, 30, 40, 50]:
            for random_restart in [10]:
                
                #Print message
                print("Working on iter:", iter,
                      "Algorithm:", algo,
                      "Random restart:", random_restart,
                      "Max attempt:", max_attempt,
                      "Max iter:", max_iter)
                
                #Start the timer
                start = time.time()
        
                #Create the model object
                nn_model = mlrose.NeuralNetwork(hidden_nodes = [1500],
                                                activation = "relu",
                                                algorithm = algo,
                                                max_iters = max_iter,
                                                bias = True,
                                                is_classifier = True,
                                                learning_rate = 0.0001,
                                                early_stopping = True,
                                                clip_max = 2,
                                                max_attempts = max_attempt,
                                                random_state = random_state,
                                                pop_size = pop_size,
                                                mutation_prob = mutation_prob,
                                                schedule = schedule,
                                                restarts = random_restart,
                                                curve = True)
            
                
                #Fit the model
                nn_model.fit(data_cv_features_normalized, data_cv_response)
                
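                #Approximate the number of function evaluations as the iteration at which
                #the fitness curve first peaks (a proxy, not a true evaluation count)
                function_evaluations = np.argmax(nn_model.fitness_curve) + 1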
                
                #End the timer
                end = time.time()
                
                #Get the total model fitting time
                fit_time = end - start
                
                #Score the model on train and test data
                train_pred = nn_model.predict(data_cv_features_normalized)
                train_score = accuracy_score(data_cv_response, train_pred)
                test_pred = nn_model.predict(test_features_normalized)
                test_score = accuracy_score(test_response, test_pred)
                
                #Add to results list (DataFrame.append was removed in pandas 2.0, so use pd.concat)
                mlrose_nn = pd.concat([mlrose_nn,
                                       pd.DataFrame([{"algorithm": algo,
                                                      "random_restart": random_restart,
                                                      "max_attempt": max_attempt,
                                                      "max_iter": max_iter,
                                                      "time": fit_time,
                                                      "function_evaluations": function_evaluations,
                                                      "train_score": train_score,
                                                      "test_score": test_score}])],
                                      ignore_index = True)
                
                #Increment the iteration counter
                iter = iter + 1
                print("Done in time:", fit_time, "with test score:", test_score)
                print("\n")

#Done
print("Done")

#Output
mlrose_nn.to_csv(directory_hw2 + "sensor_randomized_opt_neural_net.csv", index = False)
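The `ExpDecay` schedule configured in the cell above follows the exponential form documented by mlrose, T(t) = max(T0 * exp(-r * t), T_min). The sketch below re-implements that formula with the standard library to show how the temperature behaves for the values used here (`init_temp = 100`, `exp_const = .05`, `min_temp = 1`). The function `exp_decay_temperature` is a hypothetical stand-in, not the mlrose class itself.

```python
import math

def exp_decay_temperature(t, init_temp=100.0, exp_const=0.05, min_temp=1.0):
    """Exponentially decayed temperature at iteration t, floored at min_temp."""
    temp = init_temp * math.exp(-exp_const * t)
    return max(temp, min_temp)

# Temperatures at the max_iter values swept in the loop above.
for t in [1, 10, 20, 30, 40, 50]:
    print(t, round(exp_decay_temperature(t), 3))

# Iteration at which the unfloored curve would reach min_temp.
t_floor = math.log(100.0 / 1.0) / 0.05
print("floor reached near iteration", round(t_floor, 1))
```

Under this formula the floor is reached only after roughly 92 iterations, so for every `max_iter` value in this sweep (at most 50) the temperature stays well above `min_temp`.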

Working on iter: 1 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 1
Done in time: 7.819718837738037 with test score: 0.6596538853070919


Working on iter: 2 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 10
Done in time: 42.991997957229614 with test score: 0.6596538853070919


Working on iter: 3 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 20
Done in time: 82.41077494621277 with test score: 0.6596538853070919


Working on iter: 4 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 30
Done in time: 122.83553218841553 with test score: 0.6596538853070919


Working on iter: 5 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 40
Done in time: 161.51280689239502 with test score: 0.6596538853070919


Working on iter: 6 Algorithm: random_hill_climb Random restart: 10 Max attempt: 10 Max iter: 50
Done in time: 1182.4522652626038 with test score: 0.6596538853070919


W