# Run Experiments

Let $A \in \mathbb{R}^{m \times n}$, $H \in \mathbb{R}^{n \times m}$.

Consider the following properties:

$$
AHA = A \tag{P1}
$$

$$
(AH)^T = AH \tag{P3}
$$

$$
A^TAH = A^T \tag{PLS}
$$
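
As a quick sanity check on these properties, the sketch below verifies numerically that the Moore-Penrose pseudoinverse returned by NumPy's `np.linalg.pinv` satisfies P1, P3 and PLS on a small rank-deficient matrix. The matrix sizes, the seed and the `rcond` cutoff are illustrative choices, not values used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Illustrative 6 x 4 matrix of rank at most 3 (product of 6 x 3 and 3 x 4 factors)
A = rng.random((6, 3)) @ rng.random((3, 4))

# 4 x 6 Moore-Penrose pseudoinverse; rcond discards numerically zero singular values
H = np.linalg.pinv(A, rcond=1e-10)

# Frobenius-norm residuals of the three properties; each should be close to zero
res_p1 = np.linalg.norm(A @ H @ A - A, "fro")
res_p3 = np.linalg.norm((A @ H).T - A @ H, "fro")
res_pls = np.linalg.norm(A.T @ A @ H - A.T, "fro")

print(res_p1, res_p3, res_pls)
```

Since the pseudoinverse satisfies all three properties, it is a feasible point for any problem constrained by them, which makes it a convenient baseline when checking a solver's output.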

In this experiment we solve the problems:

$$
(P_{PLS}^{1}) = \min\{||H||_1 : \text{PLS}\}
$$

$$
(P_{1, 3}^{1}) = \min\{||H||_1 : \text{P1}, \text{P3}\}
$$

Here $A$ is a random $m \times n$ matrix of rank $r$, with $m = 50, 60, \ldots, 100$, $n = \lfloor 0.5m \rfloor$, and $r = \lfloor 0.25m \rfloor$. To build it, we used the function *random.rand* from the Python library NumPy to generate a random $m \times n$ matrix $A'$ with entries drawn from the uniform distribution on $[0, 1)$. We then computed the singular value decomposition $A' = U\Sigma' V^T$, formed $\Sigma$ from $\Sigma'$ by keeping only the $r$ largest singular values and setting the rest to zero, and set $A = U\Sigma V^T$. Whenever $A'$ had rank less than $r$, we generated a new $A'$ until its rank was at least $r$.
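
The construction above can be sketched as follows. This is a minimal illustration and not the actual `generate_random_rank_r_matrix` from `utility`, whose implementation is not shown here; the name `random_rank_r_matrix_sketch` and its signature are hypothetical.

```python
import numpy as np


def random_rank_r_matrix_sketch(m, n, r):
    """Hypothetical helper: random m x n matrix truncated to rank r via the SVD."""
    # Draws A' with entries uniform on [0, 1) until its rank is at least r
    while True:
        A_prime = np.random.rand(m, n)
        if np.linalg.matrix_rank(A_prime) >= r:
            break

    # Thin SVD: A' = U @ diag(s) @ Vt, with s sorted in decreasing order
    U, s, Vt = np.linalg.svd(A_prime, full_matrices=False)

    # Keeps only the r largest singular values and zeroes the rest
    s[r:] = 0.0

    # Rebuilds A = U @ diag(s) @ Vt (U * s scales the columns of U)
    return (U * s) @ Vt


m = 50
A = random_rank_r_matrix_sketch(m, n=m // 2, r=m // 4)
print(A.shape, np.linalg.matrix_rank(A))
```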

We collected the following measures for each problem:

- $||H||_1$
- $||H||_0$
- $r(H)$ (the rank of $H$)
- $||AHA - A||_F$
- $||HAH - H||_F$
- $||(AH)^T - AH||_F$
- $||A^TAH - A^T||_F$
- Computational_Time(s)
- Memory_Used(MiB)
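
A minimal sketch of how the norm, rank and residual measures could be computed with NumPy. It assumes $||H||_1$ is the entrywise 1-norm (sum of absolute values) and that $||H||_0$ counts the entries whose magnitude exceeds a small tolerance. The function name `measures_sketch` and the tolerance `zero_tol` are hypothetical, and the project's own `calculate_problem_results` may differ.

```python
import numpy as np


def measures_sketch(A, H, zero_tol=1e-5):
    """Hypothetical helper: quality measures for a candidate generalized inverse H."""
    return {
        # Assumption: entrywise 1-norm, i.e. the sum of absolute values
        "||H||_1": float(np.abs(H).sum()),
        # Assumption: entries with magnitude above zero_tol count as nonzero
        "||H||_0": int(np.count_nonzero(np.abs(H) > zero_tol)),
        "r(H)": int(np.linalg.matrix_rank(H)),
        "||AHA - A||_F": float(np.linalg.norm(A @ H @ A - A, "fro")),
        "||HAH - H||_F": float(np.linalg.norm(H @ A @ H - H, "fro")),
        "||(AH)^T - AH||_F": float(np.linalg.norm((A @ H).T - A @ H, "fro")),
        "||A^TAH - A^T||_F": float(np.linalg.norm(A.T @ A @ H - A.T, "fro")),
    }


# Small example where H is the pseudoinverse of A, so every residual is zero
A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
H = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
results = measures_sketch(A, H)
for name, value in results.items():
    print(f"{name}: {value}")
```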

## Imports

In [1]:
# Disables generation of pycache file
import sys
sys.dont_write_bytecode = True

# Imports libraries
import time
import tracemalloc
import os
import numpy as np
import pandas as pd

# Imports project functions (solvers, viability checks and helpers)
from solvers import *
from utility import *

## Running Experiments

In [2]:
result_columns_basenames = [
    "||H||_1", "||H||_0", "r(H)", "||AHA - A||_F", "||HAH - H||_F",
    "||(AH)^T - AH||_F", "||A^TAH - AT||_F", "Computational_Time(s)", "Memory_Used(MiB)",
]

problems = ["1_norm_PLS", "1_norm_P1_P3"]

# Prefixes every measure with the problem name, e.g. "1_norm_PLS_||H||_1"
result_column_names = [
    f"{problem}_{basename}"
    for problem in problems
    for basename in result_columns_basenames
]

column_names = ["m", "n", "r"] + result_column_names

m_values = [10*i for i in range(5, 11)]

In [3]:
solvers = {
    "1_norm_PLS": problem_1_norm_PLS_solver,
    "1_norm_P1_P3": problem_1_norm_P1_P3_solver
}

is_viable_checks = {
    "1_norm_PLS": problem_1_norm_PLS_viable_solution,
    "1_norm_P1_P3": problem_1_norm_P1_P3_viable_solution
}

In [4]:
try:
    df = pd.read_csv("./results/results_2.csv")
except FileNotFoundError:
    # Creates the results directory if it does not exist yet
    os.makedirs("./results", exist_ok=True)

    # Creates an empty dataframe with the specified column names
    df = pd.DataFrame(columns=column_names)

    # Saves dataframe as a csv file
    df.to_csv('./results/results_2.csv', index=False)

In [5]:
df = pd.read_csv("./results/results_2.csv")

In [6]:
for m in m_values:
    A = generate_random_rank_r_matrix(m=m)
    n = int(0.5 * m)
    r = int(np.floor(0.25 * m))
    instance_results = {
        "m": m,
        "n": n,
        "r": r
    }

    for problem in problems:
        solver = solvers[problem]
        is_viable_check = is_viable_checks[problem]
        # Starts tracing memory usage (has no effect if tracing is already active)
        tracemalloc.start()
        # Resets the peak so that it reflects only this solver call
        tracemalloc.reset_peak()
        start_time = time.perf_counter()
        H_star = solver(A=A)
        end_time = time.perf_counter()
        # Gets the current and peak traced memory since the reset (in bytes)
        current, peak = tracemalloc.get_traced_memory()
        if (not is_viable_check(A=A, H=H_star, m=m, n=n)):
            print(f"m: {m}")
            print(f"problem: {problem} did not find a viable solution")
            sys.exit()
        problem_results = calculate_problem_results(A=A, H=H_star, problem=problem)
        for key, value in problem_results.items():
            instance_results[key] = value
        instance_results[f"{problem}_Computational_Time(s)"] = end_time - start_time
        instance_results[f"{problem}_Memory_Used(MiB)"] = peak / (1024 ** 2)

    df.loc[len(df)] = instance_results
    df.to_csv('./results/results_2.csv', index=False)
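
The timing and memory bookkeeping in the loop above can be isolated in a small helper. The sketch below is one way to do it with the standard `time` and `tracemalloc` modules. `measure_call` is a hypothetical name, and `time.perf_counter` is used because it is a monotonic clock intended for measuring intervals. Note that `tracemalloc.reset_peak` requires Python 3.9 or later.

```python
import time
import tracemalloc


def measure_call(func, *args, **kwargs):
    """Hypothetical helper: returns (result, elapsed seconds, peak memory in MiB)."""
    tracemalloc.start()
    # Resets the peak so it reflects only the allocations made during this call
    tracemalloc.reset_peak()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Stops tracing even if func raises
        tracemalloc.stop()
    return result, elapsed, peak / (1024 ** 2)


# Usage example: allocating a list of one million elements
result, seconds, peak_mib = measure_call(lambda size: [0] * size, 1_000_000)
print(f"elapsed: {seconds:.6f} s, peak: {peak_mib:.2f} MiB")
```

Keep in mind that `tracemalloc` only sees allocations made through Python's memory allocators and adds overhead of its own, so the measured time includes the cost of tracing.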

In [7]:
# Creates a temporary csv file whose column names omit the problem prefixes
with open("./results/results_2.csv", "r") as file:
    data = file.readlines()

for i in range(len(data)):
    for problem in problems:
        data[i] = data[i].replace(f"{problem}_", "")

with open("./results/results_2_temp.csv", "w") as file2:
    file2.writelines(data)

import csv
from openpyxl import Workbook

csv_file_path = './results/results_2_temp.csv'
excel_file_path = './results/results_2.xlsx'

# Creates a new Excel sheet
workbook = Workbook()
sheet = workbook.active

# Reads the csv file and writes its rows to the Excel sheet
with open(csv_file_path, mode='r', encoding='utf-8', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        sheet.append(row)

# Saves the Excel file
workbook.save(excel_file_path)

# Deletes temp csv file
# Verifies if file exists before deleting
if os.path.exists(csv_file_path):
    os.remove(csv_file_path)