## <font size = 6> Introduction
<font size = 5> This notebook implements and evaluates the **Frank-Wolfe (FW) algorithm** as a heuristic for solving the **multi-objective assortment optimization problem** (specifically, mean-variance trade-off) under the Multinomial Logit (MNL) model.

<font size = 5> The primary objective is to test the performance of this iterative heuristic against provably optimal solutions. The optimal solutions are assumed to have been pre-calculated (likely using the algorithm from the "ICG Algorithm.ipynb" file) and are loaded from external Excel files.

<font size = 5> The main contents are as follows:
1. <font size = 5> **Define Key Functions**: In this section we implement the functions to calculate necessary MNL model metrics and the core logic of a single Frank-Wolfe iteration.
2. <font size = 5> **Read Experimental Data**: We then read problem instance settings (revenues, attraction weights) and their corresponding known optimal solutions from a series of Excel files.
3. <font size = 5> **Run FW Algorithm**: For each problem instance and for various risk-aversion coefficients (lambda), we run the iterative FW algorithm until it converges.
4. <font size = 5> **Evaluation**: Finally, we compare the solutions found by the FW algorithm against the known optimal solutions to measure its correctness and performance. The results are then saved to new Excel files, one per problem size.
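
<font size = 5> For reference, the quantities computed by the functions in this notebook can be written as follows. The symbols $r_i$, $v_i$, $v_0$ and $\tilde S$ are notation assumed here; they correspond to the arrays `R` and `V`, the scalar `V0`, and the current assortment passed to `FW` in the code. The no-purchase option is assumed to yield zero revenue.

$$
P_i(S) = \frac{v_i}{v_0 + \sum_{j \in S} v_j}, \qquad
\mathrm{Rev}(S) = \sum_{i \in S} r_i P_i(S), \qquad
\mathrm{Var}(S) = \sum_{i \in S} r_i^2 P_i(S) - \mathrm{Rev}(S)^2
$$

$$
\mathrm{Obj}_\lambda(S) = \mathrm{Rev}(S) - \lambda\,\mathrm{Var}(S)
= \sum_{i \in S} \left(r_i - \lambda r_i^2\right) P_i(S) + \lambda\,\mathrm{Rev}(S)^2
$$

<font size = 5> Replacing the quadratic term $\mathrm{Rev}(S)^2$ by its first-order approximation around $\tilde S$, namely $2\,\mathrm{Rev}(\tilde S)\,\mathrm{Rev}(S) - \mathrm{Rev}(\tilde S)^2$, leaves a standard MNL revenue problem with pseudo-revenues

$$
\kappa_i(\tilde S) = r_i - \lambda r_i^2 + 2 \lambda\, r_i\, \mathrm{Rev}(\tilde S),
$$

<font size = 5> which is the expression returned by `Kappa`.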


Libraries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import time
from scipy.spatial import ConvexHull, convex_hull_plot_2d
import itertools
import pandas as pd
import openpyxl

In [None]:
# Functions
# The number of products n
# Assortment S, an array of product indices
# Attractiveness vector V, an array of attractiveness values for each product
# Revenue R, an array of revenues for each product
# V0, a constant representing the attractiveness of the outside option
# Lambda_Coeff (float): The risk aversion coefficient.
def Powerset(iterable):
    """
    Generates the powerset of an iterable, excluding the empty set.
    E.g., powerset([1,2,3]) --> (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)
    """ 
    s = list(iterable)
    return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(1,len(s)+1))

def Prob(S,V):
    # Calculates the choice probability for each product in assortment S.
    Temp_Prob = S
    return V.take(Temp_Prob)/(V0+(V.take(Temp_Prob)).sum())

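def Rev(S,R,V):
    # Calculates the expected revenue for assortment S.
    # V0 (attractiveness of the outside option) is read from the global scope.
    Temp_Assort = S
    # Test the type first: comparing a NumPy array with a string via == may be
    # evaluated elementwise in recent NumPy versions, which cannot be used in an `if`.
    if isinstance(Temp_Assort, str) and Temp_Assort == 'Empty set':
        return 0
    else:
        return (R.take(Temp_Assort)*(V.take(Temp_Assort)/(V0+(V.take(Temp_Assort)).sum()))).sum()

def Var(S,R,V):
    # Calculates the revenue variance for assortment S: E[revenue^2] - (E[revenue])^2.
    Temp_Var = S
    if isinstance(Temp_Var, str) and Temp_Var == 'Empty set':
        return 0
    else:
        return (R.take(Temp_Var)**2*(V.take(Temp_Var)/(V0+(V.take(Temp_Var)).sum()))).sum() - ((R.take(Temp_Var)*(V.take(Temp_Var)/(V0+(V.take(Temp_Var)).sum()))).sum())**2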

def Obj(S,R,V,Lambda_coeff):
    # Calculates the mean-variance objective function value for given assortment and mean-variance coefficient.
    return Rev(S,R,V) - Lambda_coeff*Var(S,R,V)

def Kappa(S,R,V,Lambda_Coeff):
    # Calculates the 'kappa' values, which are the pseudo-revenues used in the
    # linearization step of the Frank-Wolfe algorithm. This is derived from the
    # gradient of the objective function at the current solution S.
    Temp_Rev = Rev(S,R,V)
    return 2*Lambda_Coeff*R*Temp_Rev + R - Lambda_Coeff * R**2

def ArgMax(Cand_Assort_Set,R,V):        
    # Given a list of candidate assortments, this function returns the one with the
    # highest expected revenue under the MNL model, using the revenues passed in R.
    # When R holds the pseudo-revenues (kappa), this solves the linearized subproblem
    # of the FW algorithm.
    Temp_Rev = []
    for Cand_Assort in Cand_Assort_Set:
        Temp_Rev.append(Rev(Cand_Assort,R,V))
    S_Optimal = Cand_Assort_Set[np.argmax(Temp_Rev)]
    return S_Optimal

def FW(n,S,R,V,Lambda_Coeff):
    # Performs a single iteration of the Frank-Wolfe (FW) algorithm.

    # Linearize the objective function by calculating pseudo-revenues (kappa).
    Temp_Kappa = Kappa(S,R,V,Lambda_Coeff)

    # Solve the linearized subproblem. This is a standard MNL problem with kappa as revenues.
    # The optimal solution is known to be one of the revenue-ordered assortments.
    Sorted_Products = np.array(Temp_Kappa).argsort()[::-1]           # Sort products by decreasing kappa.
    Cand_Assort_Set = [Sorted_Products[:i+1] for i in range(n)]      # Generate candidate assortments.
    Cand_Assort_Set.append('Empty set')

    # Find the best assortment among the candidates using the pseudo-revenues.
    Solution = ArgMax(Cand_Assort_Set,Temp_Kappa,V)
    return Solution
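
<font size = 5> As a sanity check of the logic above, the following is a minimal, self-contained sketch of the same iteration on a toy instance, compared against brute-force enumeration. The helper names (`mnl_revenue`, `fw_step`, `fw_fixed_point`, `brute_force`), the explicit `v0` argument, and the toy data are assumptions made for illustration; they are not part of the notebook's own functions, and the empty assortment is represented by an empty tuple rather than the string `'Empty set'`.

```python
import itertools
import numpy as np

def mnl_revenue(S, r, v, v0):
    """Expected revenue of assortment S (iterable of indices) under MNL."""
    idx = list(S)
    if not idx:
        return 0.0
    p = v[idx] / (v0 + v[idx].sum())
    return float((r[idx] * p).sum())

def mnl_variance(S, r, v, v0):
    """Revenue variance of assortment S: E[rev^2] - (E[rev])^2."""
    idx = list(S)
    if not idx:
        return 0.0
    p = v[idx] / (v0 + v[idx].sum())
    return float((r[idx] ** 2 * p).sum() - (r[idx] * p).sum() ** 2)

def objective(S, r, v, v0, lam):
    """Mean-variance objective Rev(S) - lam * Var(S)."""
    return mnl_revenue(S, r, v, v0) - lam * mnl_variance(S, r, v, v0)

def fw_step(S, r, v, v0, lam):
    """One linearized step: best kappa-ordered assortment, or the empty set."""
    kappa = r + 2 * lam * mnl_revenue(S, r, v, v0) * r - lam * r ** 2
    order = np.argsort(kappa)[::-1]  # products by decreasing kappa
    candidates = [tuple(sorted(order[:k + 1].tolist())) for k in range(len(r))]
    candidates.append(())            # empty assortment as the last candidate
    return max(candidates, key=lambda C: mnl_revenue(C, kappa, v, v0))

def fw_fixed_point(S0, r, v, v0, lam, max_iter=1000):
    """Iterate fw_step until an assortment repeats (fixed point or cycle)."""
    visited = {frozenset(S0)}
    S = tuple(S0)
    for _ in range(max_iter):
        S = fw_step(S, r, v, v0, lam)
        if frozenset(S) in visited:
            break
        visited.add(frozenset(S))
    return S

def brute_force(r, v, v0, lam):
    """Exact optimum by enumerating every subset (small n only)."""
    n = len(r)
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return max(subsets, key=lambda S: objective(S, r, v, v0, lam))

# Toy data (illustrative values only).
r = np.array([10.0, 8.0, 6.0, 4.0])
v = np.array([1.0, 1.5, 2.0, 2.5])
v0, lam = 5.0, 0.05

S_fw = fw_fixed_point((0,), r, v, v0, lam)
S_opt = brute_force(r, v, v0, lam)
print("FW:", S_fw, objective(S_fw, r, v, v0, lam))
print("Optimal:", S_opt, objective(S_opt, r, v, v0, lam))
```

<font size = 5> Because FW is a heuristic here, its objective value can only be at most the brute-force value. With `lam = 0` the pseudo-revenues equal the true revenues, so the step reduces to the classic revenue-ordered MNL solution.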

## <font size = 6> Experiment Setup
<font size = 5> Define the problem sizes (number of products, list `N`) to be tested.

<font size = 5> Set the initial assortment for the FW algorithm to start from (WLOG, we start with the assortment that only offers the first product).

In [None]:
N = [10,20,50,100]
S_initial = np.array([0])

In [None]:
# --- Outer loop: Iterate through different problem sizes (n) ---
for n in N:
    
    # Define paths for the output file and the input files containing parameters and optimal results.
    Write_path = str(n) + '_FW Results.xlsx'
    Parameter_path = str(n)+'_Parameters.xlsx'
    Result_path = str(n)+'_Test Results_fixed.xlsx'

    # Setup Pandas ExcelWriter to save the results.
    Results_writer = pd.ExcelWriter(Write_path, engine='openpyxl')

    # Open the Excel file
    xls_para = pd.ExcelFile(Parameter_path)
    xls_result = pd.ExcelFile(Result_path)

    # Get the names of all worksheets, where each sheet represents a different parameter configuration.
    sheet_names = xls_para.sheet_names

    # Iterate through each parameter configuration (worksheet)
    for sheet_name in sheet_names:
        
        # Each worksheet has a unique parameter configuration encoded in its name.
        # Phi is the last entry of that configuration and is used below to compute V0.
        Phi = eval(sheet_name)[-1]

        # Iterate through each problem instance within the worksheet
        # Each worksheet contains multiple independent instances.
        for instance in range(10): # Process 10 instances per sheet.

            # Read revenue/weight data from the specified worksheet
            df_para = pd.read_excel(Parameter_path, sheet_name=sheet_name, usecols=[3*instance, 3*instance+1])
            # Read the pre-calculated optimal solutions (efficient frontier and assortments) for the current instance.
            df_result = pd.read_excel(Result_path, sheet_name=sheet_name, usecols=[4*instance+5, 4*instance+6])

            # Clean the imported data by dropping rows with NaN values.
            df_para = df_para.dropna(axis=0,how='any')
            df_result = df_result.dropna(axis=0,how='any')

            #Get the revenue, weight, efficient frontier and optimal assortments (including empty set)
            R = df_para.iloc[:,0].values               # Revenue
            V = df_para.iloc[:,1].values               # Attraction Weight

            EF = df_result.iloc[:,1].values            #Efficient frontier (lambda breakpoints)
            
            # Append a large value to the EF to represent the interval to +infinity.
            EF = np.append(EF,2*abs(EF[-1]))                

            # Calculate V0 based on the total attraction V and the parameter Phi.
            V0 = V.sum() * Phi/(1-Phi)

            #Optimal assortments, first get non-empty ones by converting str to list, then append the str empty set.
            Temp_Optimal_Assortment = df_result.iloc[:,0].values[:-1]

            Optimal_Assortments = []
            for s in Temp_Optimal_Assortment:
                Optimal_Assortment = eval(s)
                Optimal_Assortments.append(Optimal_Assortment)
            Optimal_Assortments.append('Empty set')
            
            #Get the list of lambda (midpoints of every lambda interval in EF) we are going to visit
            Lambda_Coeff_list = (EF[:-1] + EF[1:])/2

            # Lists to store the results for the current instance.
            Optimal_Assortment_FW = []              # Assortments returned by FW for the current instance, one per lambda
            Elapsed_Time_FW = []

            # FW Algorithm Execution
            # For each fixed lambda in the list, we run the FW algorithm from the initial assortment S_initial (first product only).
            for Lambda_Coeff in Lambda_Coeff_list:

                #Pin the start time
                start_time = time.time()

                Assort_Visited = [set(S_initial)]                           # Assortments visited so far, kept to detect fixed points and cycles.
                Solution = FW(n,S_initial,R,V,Lambda_Coeff)                 # The optimal assortment given the current tilde S
                Set_Solution = set(Solution)                                # Check if the solution is already visited
                
                while (Set_Solution not in Assort_Visited):                 # If not visited, add the solution to visited assortment, iterate the alg
                    Assort_Visited.append(Set_Solution)
                    Solution = FW(n,Solution,R,V,Lambda_Coeff)
                    Set_Solution = set(Solution)

                    #If the algorithm runs for more than 2mins, break the loop
                    current_time = time.time()
                    elapsed_time = current_time - start_time
                    if elapsed_time > 120:
                        break

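                # Record the final solution and the elapsed time for this lambda.
                # The time is measured here so that it is defined even when the loop body never ran.
                elapsed_time = time.time() - start_time
                Optimal_Assortment_FW.append(Solution)
                Elapsed_Time_FW.append(elapsed_time)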

            # Fraction of lambda values for which the FW solution equals the known optimal assortment.
            Correctness_FW = []
            for i in range(len(Optimal_Assortment_FW)):
                set1 = set(Optimal_Assortment_FW[i])
                set2 = set(Optimal_Assortments[i])
                if set1 == set2:
                    Correctness_FW.append(1)
                else:
                    Correctness_FW.append(0)
            Correctness_count = sum(Correctness_FW)
            Rate_FW = Correctness_count/ len(Correctness_FW)

            # The performance compared with the optimal solution. This measures the quality of the heuristic.
            Performance_Rate = []
            for i in range(len(Lambda_Coeff_list)):
                Obj_Opt = Obj(Optimal_Assortments[i],R,V,Lambda_Coeff_list[i])
                Obj_FW = Obj(Optimal_Assortment_FW[i],R,V,Lambda_Coeff_list[i])
                if Obj_Opt == Obj_FW:
                    Performance_Rate.append(1)
                else:
                    if Obj_Opt == 0:
                        Performance_Rate.append('Optimal is empty set')
                    else:
                        Performance_Rate.append(Obj_FW/Obj_Opt)

            # Save Results to Excel. Write the DataFrame to the appropriate location in the output Excel file.
            Result_FW = pd.concat([pd.DataFrame({'Time':Elapsed_Time_FW}), pd.DataFrame({'Optimal_Assortment': Optimal_Assortment_FW}), pd.DataFrame({'Correctness': Correctness_FW}), pd.DataFrame({'Correctness Rate':[Rate_FW]}),pd.DataFrame({'Performance':Performance_Rate})], axis = 1)
            Result_FW.to_excel(Results_writer, sheet_name=sheet_name, startcol= 6*instance, index = False)
    # Save and close the Excel file after processing all sheets.
    # close() writes the workbook to disk; the separate save() method was removed in pandas 2.0.
    Results_writer.close()
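
<font size = 5> The loop above parses sheet names and stored assortments with `eval`, which executes arbitrary code found in the Excel files. If those strings are plain Python literals, `ast.literal_eval` from the standard library is a safer drop-in. The sketch below assumes hypothetical formats for the sheet name and the assortment cell; the actual files may differ.

```python
import ast

def parse_literal(text):
    """Parse a Python literal (tuple, list, number) without executing code."""
    return ast.literal_eval(text)

# Hypothetical examples of the stored strings; the real sheet names and
# assortment cells may be formatted differently.
sheet_name = "(0.5, 0.25)"
phi = parse_literal(sheet_name)[-1]       # last entry plays the role of Phi
assortment = parse_literal("[0, 2, 5]")   # product indices as a list

# literal_eval rejects anything that is not a plain literal.
try:
    parse_literal("__import__('os').getcwd()")
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(phi, assortment, rejected)
```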