# Project Description

## **Overview**
Your task is to create a program that plays a repeated 2-person Stackelberg pricing game as the leader, under conditions of imperfect information.



## **Specifications**

**Followers’ Specifications**

*   There are three different followers, named MK1, MK2, and MK3.
*   The payoff function and strategy spaces of these followers are unknown. However, historical data from 100 days of previous games between the leader and each follower is available and provided.

**Leader’s Specifications**

Assume that the leader’s strategy (i.e., price) is $u_L$ and the follower’s strategy (i.e., price) is $u_F$. The specifics regarding the leader’s strategy and profits are as follows:
*   **Unit cost**: $c_L=£1.00$
*   **Strategy space**:
      1.   When playing with MK1 and MK2, the leader’s strategy space is $U_L = [1.00, +∞)$
      2.   When playing with MK3, the leader’s strategy space is $U_L = [1.00, 2.00]$     
*   **Demand Model** (i.e., price-sales relationship): $S_L(u_L, u_F)=2 - u_L + 0.3u_F$
*   **Daily Profit**: $(u_L - c_L)S_L(u_L, u_F)$
*   **Objective**: The leader’s objective is to maximise the accumulated profit over the next 30 days.
*   **Assumption**: For simplicity, it is assumed that the leader’s unit cost, strategy space, and the demand model remain unchanged throughout the entire period of 130 days.
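
As a quick check of the formulas above, the demand model and daily profit can be written as two small functions. This is a minimal sketch: the names `leader_demand` and `leader_profit` are illustrative and are not part of the game platform.

```python
C_L = 1.00  # leader's unit cost in pounds, as given in the specification


def leader_demand(u_L: float, u_F: float) -> float:
    """Leader's daily sales S_L(u_L, u_F) = 2 - u_L + 0.3 * u_F."""
    return 2 - u_L + 0.3 * u_F


def leader_profit(u_L: float, u_F: float) -> float:
    """Leader's daily profit (u_L - c_L) * S_L(u_L, u_F)."""
    return (u_L - C_L) * leader_demand(u_L, u_F)


# Example: the leader prices at 1.50 and the follower responds with 1.80.
example_profit = leader_profit(1.50, 1.80)
print(round(example_profit, 4))
```

Pricing at the unit cost always gives zero profit, whatever the follower does, so any sensible leader price lies strictly above $c_L$.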

## **Game Rules and Playing Scenarios**

**Game Rules**

On day $t$, the leader announces the price, $u_L (t)$, first. After learning the leader’s price, the follower will choose the responding price, $u_F (t)$. This process occurs over a period of $t=1,2,…,130$, where $t=1,2,…,100$ corresponds to the days for which historical data is available, and $t=101,102,…,130$ are the days during which the game will be played repeatedly.
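
The order of moves on each played day can be sketched as a simple loop. Everything here is an assumption made for illustration: `demo_follower` is an invented reaction function (the real followers are unknown), and `play_game` is a hypothetical helper, not the platform's API.

```python
def demo_follower(u_L: float, day: int) -> float:
    """Invented follower used only for this illustration (slowly drifting over time)."""
    return 1.0 + 0.5 * u_L + 0.001 * (day - 100)


def play_game(choose_price, follower, first_day=101, last_day=130):
    """Play days first_day..last_day and return (accumulated profit, history)."""
    total = 0.0
    history = []  # (day, u_L, u_F) for the days played so far
    for day in range(first_day, last_day + 1):
        u_L = choose_price(day, history)  # the leader announces its price first
        u_F = follower(u_L, day)          # the follower responds after seeing u_L
        total += (u_L - 1.0) * (2 - u_L + 0.3 * u_F)  # daily profit with c_L = 1
        history.append((day, u_L, u_F))
    return total, history


# A leader that ignores the history and always charges 1.80.
total_profit, history = play_game(lambda day, hist: 1.8, demo_follower)
print(len(history), round(total_profit, 3))
```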


**Game Playing Scenarios**

In this setup, your program, acting as the leader, will engage in three separate games against three different competitors, identified as followers MK1, MK2, and MK3.

For each repeated game, the leader selects a learning method to learn the follower’s reaction function from the provided historical data, which spans $t=1,2,…,100$. During the next 30 days ($t=101,102,…,130$), the leader announces a price, $u_L (t)$, to the game platform each day. After the leader's price is announced, the follower responds with a price, $u_F (t)$. Upon receiving the follower's price, $u_F (t)$, from the game platform, the leader may use this new information to update the learned reaction function and decide the next day's price. In other words, the leader (i.e., your program) should be able to take the follower’s price for the previous day from the game platform, update its model, and then decide and send its price for the next day to the game platform.
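
To make the link between a learned reaction function and the leader's price concrete, suppose (purely as an assumption for illustration) that the learned reaction function is linear, $u_F = a + b\,u_L$. Substituting into the daily profit gives $(u_L - 1)(2 + 0.3a - (1 - 0.3b)u_L)$, which is concave when $0.3b < 1$ and has its stationary point at $u_L^* = (3 + 0.3a - 0.3b) / (2(1 - 0.3b))$. The helper name `best_price_linear` below is hypothetical.

```python
import math


def best_price_linear(a: float, b: float, lower: float = 1.0, upper: float = math.inf) -> float:
    """Profit-maximising leader price if the follower plays u_F = a + b * u_L (assumed form)."""
    k = 1 - 0.3 * b  # coefficient of -u_L in the leader's demand after substitution
    if k <= 0:
        # The profit is not concave, so the closed form below does not apply.
        raise ValueError("closed form requires 0.3 * b < 1")
    u_star = (3 + 0.3 * a - 0.3 * b) / (2 * k)
    # For a concave quadratic, the constrained optimum is the clipped stationary point.
    return min(max(u_star, lower), upper)


print(best_price_linear(1.0, 0.5))             # unbounded strategy space (MK1/MK2 setting)
print(best_price_linear(3.0, 0.5, upper=2.0))  # bounded strategy space (MK3 setting)
```

The real followers need not be linear, so this is best read as a baseline to compare richer models against.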

It should be emphasised that the follower’s strategy and payoff function are subject to a changing, time-varying environment. That is, the parameters in the follower’s payoff function may not be the same every day.
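
One common way to cope with drifting parameters is to down-weight old observations when fitting the reaction function. The sketch below fits a straight line with exponentially decaying weights. The linear form, the `forgetting` factor, and the synthetic data (an intercept that jumps on day 51) are all assumptions chosen for illustration.

```python
def weighted_linear_fit(days, u_L, u_F, forgetting=0.95):
    """Fit u_F ~ a + b * u_L, weighting day d by forgetting ** (latest_day - d)."""
    latest = max(days)
    w = [forgetting ** (latest - d) for d in days]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, u_L)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, u_F)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, u_L))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, u_L, u_F))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic history: the follower's intercept moves from 1.0 to 1.5 on day 51.
days = list(range(1, 101))
u_L = [1.5 + 0.1 * (d % 5) for d in days]
u_F = [(1.0 if d <= 50 else 1.5) + 0.5 * x for d, x in zip(days, u_L)]

a_all, b_all = weighted_linear_fit(days, u_L, u_F, forgetting=1.0)        # equal weights
a_recent, b_recent = weighted_linear_fit(days, u_L, u_F, forgetting=0.8)  # recent days dominate
print(round(a_all, 3), round(a_recent, 3))
```

With equal weights the fitted intercept averages the two regimes, while the recency-weighted fit tracks the current one. The price of stronger forgetting is that fewer observations effectively inform the fit.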



## **Implementation Options**
1.	**Single Leader Approach**: You may choose to design and implement a single leader algorithm that competes against all three followers, adapting its strategy based on the opponent being faced.
2.	**Multiple Leaders Approach**: Alternatively, you could design and implement three distinct leaders, each tailored to play against one of the followers. This approach allows for specialised strategies to be developed for each follower type.

Whichever method you choose, it's crucial to document in the [Your Leader](#your-leader) section of this notebook which leader is designed to play against each follower. This clarity will help in understanding the design choices and strategies implemented in your program.


## **Project Timeline and Support**

**Project Timeline**

*	**Week 7 (Starting 10 March)**: Start to work on your project and project planning.
*	**Week 11 (28 April – 2 May)**: Group presentations telling us what you have done.
*	**Deadline (2 May at 6:00pm)**: Deadline to submit code and supporting documents.


**Support**

*	**Weekly Support Sessions**: Project support sessions are scheduled every Wednesday from 11:00 am to 12:00 noon between Weeks 7 and 10. These sessions will be held at <u>Booth Street East -TH A (G.10)</u>. You are encouraged to attend if you have questions, need clarification, or seek guidance on any aspect of the project. Attendance is optional and based on your needs for support.
*	**Email**: In addition to the weekly sessions, you are welcome to ask questions or seek support via email at any time. If you encounter any issues with installing or running the game platform, or if you have any other project-related inquiries, do not hesitate to contact the teaching assistants by email for support.


## **Assessment Criteria and Submission**

**Assessment Criteria**

The project is a significant component of your final grade, accounting for 50% of the total mark, with the remaining 50% coming from the final examination. The project mark is distributed as follows:
1.	**Content of the Approach (50% of this project mark)**:
  *	Presentation (40%): Your grade in this category is based on the presentation you give in Week 11. This includes the originality of your idea, how well it integrates knowledge from the course, and any relevant external literature or research you have incorporated.
  *	Written Materials (10%): This is based on the documentation you submit to your group journal on Blackboard. It evaluates the depth and clarity of your written explanation of the approach, including the rationale behind your strategies and methods.
2.	**Performance of the Approach (50% of this project mark)**:
  *	This evaluates the effectiveness of your submitted program by looking at how well it performs in the game platform, that is, the accumulated profit generated by your submitted codes.
3.	**Group Mark Distribution**:
  *	The overall group mark will be allocated to individual group members based on self-assessment and the demonstration of knowledge during the presentation.


**Presentation**

During the presentation week of Semester 2, every group is scheduled to present their project to the lecturers. It is important for each group member to actively participate in speaking and answering questions. This participation allows us to evaluate each member's contributions to the project. The assessment criteria focus solely on the content of your presentation, including the activities your group undertook and the lessons learned through the project process. Presentation skills are not part of the evaluation, so the emphasis should not be on creating a polished presentation but on effectively communicating your project's substance and outcomes.


**Performance of the Approach – Assessment Details**

*	**Step 1**. We will run your submitted code against MK1, MK2, and MK3 and record the accumulated profit for days $101 – 130$. These profits will be used as the performance of your submitted code against MK1, MK2, and MK3.
*	**Step 2**. We have also designed three further followers, MK4, MK5, and MK6, which are created by slightly changing the parameters of MK1, MK2, and MK3 respectively. That is, MK1 and MK4 are very similar, MK2 and MK5 are very similar, and MK3 and MK6 are very similar. No data about MK$i$ ($i=4, 5, 6$) is provided; they are invisible followers to you. However, the setting of each MK$i$ ($i=4,5,6$) is the same as for the provided followers, with 100 days of historical data, and we will run your code to get the accumulated profit for days $101 – 130$, which will be used as the performance of your submitted code against MK4, MK5, and MK6.
*	**Step 3**. Based on the accumulated profits for MK1 and MK4, compared with the best possible performances for MK1 and MK4, your code will get a percentage mark. Similarly, you will get a percentage mark from MK2 and MK5, and another from MK3 and MK6. Finally, your performance mark will be the average of these three percentage marks.

**Remark**. The creation of MK4, MK5, and MK6 is a new assessment component this year, so we are happy to hear any feedback and to make changes if necessary. The motivation and justification are as follows:
*	We ran the code submitted by last year's project groups against MK4, MK5, and MK6 and noticed various issues: some submissions hard-coded parameters rather than learning them, some produced invalid prices (for example, outside the price bounds), and some had runtime errors.
*	In real applications such as petrol station pricing, it is impossible to run many experiments and tests for each station, but we need a piece of software that is usable across a large number of petrol stations. MK4, MK5, and MK6 were created to simulate and test this need from a real-application point of view.
*	Last year, many groups produced very similar performance because of the limited test cases. MK4, MK5, and MK6 provide more test cases and help distinguish the performance of different approaches.
*	As MK4, MK5, and MK6 are invisible to you, this should not add to your workload for the project.
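
Since out-of-bounds prices and runtime errors are among the issues listed above, one defensive option is to pass every price through a small guard before returning it. This is a sketch under stated assumptions: `sanitise_price` and its `fallback` value of 1.5 are hypothetical choices, not something the platform requires.

```python
import math


def sanitise_price(price, lower=1.0, upper=math.inf, fallback=1.5):
    """Return a finite float inside [lower, upper], using `fallback` for unusable inputs."""
    try:
        value = float(price)
    except (TypeError, ValueError):
        value = fallback  # not a number at all (e.g. None or a string)
    if not math.isfinite(value):
        value = fallback  # NaN or +/- infinity from a failed optimisation
    return min(max(value, lower), upper)


print(sanitise_price(0.5))               # below the unit-cost bound
print(sanitise_price(2.7, upper=2.0))    # above the MK3 upper bound
print(sanitise_price(float("nan")))      # unusable value
```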


**Submission**

All submissions will be marked using the free version of Google Colaboratory.
* Make sure your implementation works within this environment.
* Avoid using any external dependencies that might not be available.

Please download this notebook for submission by clicking “File” in the top left corner, then selecting “Download” followed by “Download .ipynb”. If your notebook requires any external files to run, ensure that you submit them along with your notebook. Submission will be done through Blackboard, and further details will be provided later.


## **Game Platform**

The project will be run on Google Colaboratory, which is a hosted Jupyter Notebook service. This is done to avoid compatibility issues across different operating systems and to provide a collaborative environment where group members can work effectively using Colab’s sharing functionality. Follow the steps below to set up and use Google Colaboratory for your project.


**Using Google Colaboratory for the Project**
* Running Code
  * Click on a cell and press Shift + Enter, or
  * Click the Play button on the left side of the cell
* Adding Cell
  * Hover over an existing cell and click “+ Code” or “+ Text” where needed
* Stopping a Running Cell
  * From the top menu, click “Runtime” → “Interrupt execution”.
* Restarting Runtime
  * Click “Runtime” → “Restart session”.
  * This will clear all variables.
* Disconnecting and Deleting the Runtime
  * Click “Runtime” → “Disconnect and delete session”.
  * This will reset the Colaboratory environment completely.
*	Sharing with Group Members
  *	You can share your notebook with your group members by clicking “Share” in the top-right corner.
  *	Be careful when saving: if multiple people edit the notebook at the same time, you may overwrite each other’s work.
*	For More Functionalities, please visit [this link](https://colab.research.google.com).

**Understanding the Notebook Structure**

The notebook is structured into several sections. You can click the first icon on the top left to open the “Table of contents” and navigate through different sections.
1.	Project Description: This section provides an overview of the project.
2.	Install and Import: This section installs and imports the necessary packages and unzips comp34612.zip.
3.	Base Leader and Example Leader: This section contains the base leader class and an example leader implementation.
4.	Your Leader: In this section,
	*	First write your group number as an integer.
	*	Then implement your leader(s) based on the project requirements.
	*	Clearly document:
      *	Which implementation approach you choose (single or multiple).
      *	Which leader is designed to play against each follower.
5.	Simulation: This section provides an interface to simulate the pricing game between your selected leader and selected follower.


# Install and Import

In [4]:
!pip install xlsxwriter



In [5]:
import zipfile
import os
import random
import gc
from IPython.display import Javascript
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [6]:
extract_path = "."
zip_filename = "comp34612.zip"

os.makedirs(extract_path, exist_ok=True)

# Changed from the provided default, which did not extract the files where the imports below expect them.
with zipfile.ZipFile(zip_filename, "r") as zip_ref:
    # Extract each member, then move it up one level to drop the top-level folder
    for member in zip_ref.namelist():
        member_path = member.split("/", 1)[-1]
        if member_path:  # skip the entry for the top-level folder itself
            zip_ref.extract(member, extract_path)
            try:
                os.rename(os.path.join(extract_path, member), os.path.join(extract_path, member_path))
            except OSError:
                # The target already exists (e.g. the cell was re-run); keep the existing copy
                pass


In [7]:
from engine import Engine
from gui import GUI

Please import the necessary packages here or use `!pip install *package_name*` if they are not already installed.

In [8]:
import numpy as np
import sympy as sp
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from itertools import product


# Base Leader and Example Leader

## Base Leader

This is the base class for all leader subclasses.

In [9]:
class Leader:
    _subclass_registry = {}

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine

    @classmethod
    def cleanup_old_subclasses(cls):
        """
        A function to remove old subclasses before defining new ones.
        """
        existing_subclasses = list(cls.__subclasses__())

        for subclass in existing_subclasses:
            subclass_name = subclass.__name__
            if subclass_name in cls._subclass_registry:
                del cls._subclass_registry[subclass_name]
                del subclass
        gc.collect()

    @classmethod
    def update_subclass_registry(cls):
        """
        A function to update registry after cleaning up old subclasses.
        """
        cls.cleanup_old_subclasses()
        cls._subclass_registry = {subclass.__name__: subclass for subclass in cls.__subclasses__()}

    def new_price(self, date):
        """
        A function for setting the new price of each day.
        :param date: date of the day to be updated
        :return: (float) price for the day
        """
        pass

    def get_price_from_date(self, date):
        """
        A function for getting the price set on a date.
        :param date: (int) date to get the price from
        :return: a tuple (leader_price, follower_price)
        """
        return self.engine.exposed_get_price(date)


    def start_simulation(self):
        """
        A function that runs at the beginning of the simulation.
        """
        pass

    def end_simulation(self):
        """
        A function that runs at the end of the simulation.
        """
        pass
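
The interface can be illustrated with a toy subclass. To keep the sketch runnable on its own, it uses two stand-ins that are assumptions, not platform code: `StubEngine` mimics only the `exposed_get_price` call, and `BaseLeaderSketch` is a reduced copy of the base class. In the notebook you would subclass `Leader` directly.

```python
class StubEngine:
    """Hypothetical stand-in for the platform engine, for illustration only."""

    def __init__(self, prices):
        self._prices = prices  # {date: (leader_price, follower_price)}

    def exposed_get_price(self, date):
        return self._prices[date]


class BaseLeaderSketch:
    """Reduced copy of the base-class interface so this sketch runs standalone."""

    def __init__(self, name, engine):
        self.name = name
        self.engine = engine

    def get_price_from_date(self, date):
        return self.engine.exposed_get_price(date)


class UndercutYesterdayLeader(BaseLeaderSketch):
    """Toy policy: price 5% below the follower's price from the previous day."""

    def new_price(self, date):
        _, follower_price = self.get_price_from_date(date - 1)
        return max(1.0, 0.95 * follower_price)  # never price below the unit cost


demo_leader = UndercutYesterdayLeader("demo", StubEngine({100: (1.7, 1.9)}))
demo_price = demo_leader.new_price(101)
print(round(demo_price, 3))
```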

<a name="your-leader"></a>
# Your Leader

Please write your group number below as an int.

In [10]:
group_num = 1
assert isinstance(group_num, int), f"Expected an integer for group_num, but got {type(group_num).__name__}"

Please implement your leaders below. Please clearly document:
*   Which implementation approach you choose (single leader approach or multiple leaders approach)
*   State which leader(s) is designed to play against each follower.

We chose to implement a multiple leader approach, although there are minimal differences between the three leader agents that we produced.
* `Leader_Mk_1_4` is designed to play against Followers 1 and 4.
* `Leader_Mk_2_5` is designed to play against Followers 2 and 5.
* `Leader_Mk_3_6` is designed to play against Followers 3 and 6.

# Leader for Followers 1 & 4

In [11]:
class Leader_Mk_1_4(Leader):
    def __init__(self, name, engine):
        super().__init__(name, engine)
        self.c_L = 1 # given
        self.random_state = 42
        self.lower_bound = 1
        self.upper_bound = 1.9 # placeholder; recomputed from the data in optimise_follower_model

    def demand_L(self, u_L, u_F):
        return 2 - u_L + (0.3 * u_F)

    def profit_L(self, u_L, u_F):
        return (u_L - self.c_L) * self.demand_L(u_L, u_F)

    def get_historical_data(self, current_date):
        historical_data = {}
        i = 0
        for date in range(1, current_date):
            leader_price, follower_price = self.get_price_from_date(date)
            historical_data[i] = (date, leader_price, follower_price)
            i += 1
        historical_data_df = pd.DataFrame.from_dict(historical_data, orient='index', columns=['t', 'u_L', 'u_F'])

        return historical_data_df

    def optimise_hyperparams(self, max_degree=5, num_splits=5):
        degrees = np.arange(2, max_degree+1, 1)
        alpha_vals = np.logspace(-4, 0, 10)
        param_grid = list(product(degrees, alpha_vals))

        best_mse = float('inf')

        kf = KFold(n_splits=num_splits, shuffle=True, random_state = self.random_state)
        X = self.historical_data[['u_L', 't']].values
        y = self.historical_data['u_F'].values

        for degree, alpha in param_grid:
          mse_scores = []
          poly = PolynomialFeatures(degree)
          X_poly = poly.fit_transform(X)

          t_min = self.historical_data['t'].min()
          time_diffs = self.historical_data['t'] - t_min
          sample_weights = np.exp(alpha * time_diffs)

          for train_index, test_index in kf.split(self.historical_data):
            X_train, X_val = X_poly[train_index], X_poly[test_index]
            y_train, y_val = y[train_index], y[test_index]

            weights = sample_weights[train_index]

            model = LinearRegression()
            model.fit(X_train, y_train, sample_weight=weights)
            y_pred = model.predict(X_val)
            mse = np.mean((y_val - y_pred)**2)
            mse_scores.append(mse)

          avg_mse = np.mean(mse_scores)
          if avg_mse < best_mse:
            best_mse = avg_mse
            best_degree = degree
            best_alpha = alpha

        return best_alpha, best_degree

    def train_follower_model(self, degree, alpha):
      X = self.historical_data[['u_L', 't']].values
      y = self.historical_data['u_F'].values

      poly = PolynomialFeatures(degree)
      X_poly = poly.fit_transform(X)

      t_min = self.historical_data['t'].min()
      time_diffs = self.historical_data['t'] - t_min
      sample_weights = np.exp(alpha * time_diffs)

      self.follower_model_poly = poly
      self.follower_model = LinearRegression()
      self.follower_model.fit(X_poly, y, sample_weight=sample_weights)

    def optimise_follower_model(self, current_date):
      self.historical_data = self.get_historical_data(current_date=current_date)
      self.upper_bound = np.mean(self.historical_data['u_L']) + 1.5 * np.std(self.historical_data['u_L'])

      self.alpha, self.degree = self.optimise_hyperparams(max_degree=5, num_splits=5)

      self.train_follower_model(degree=self.degree, alpha=self.alpha)

    def new_price(self, date: int):
        """
        We predict the follower's price from the leader's price and the date.
        We then differentiate the leader's profit, with the learned reaction function substituted in, to maximise it.

        The process for choosing the leader's price is:
          1. Differentiate the leader's profit with respect to u_L
          2. Find any stationary points within the pricing bounds - [1.00, +inf) for Follower Mk1, capped at a data-driven upper bound
          3. Find the profit at each stationary point, at the boundaries, and at the historical mean price
          4. Choose the point with the highest leader profit.

        Args:
            date (int): The current date.

        Returns:
            float: The chosen leader's price.
        """
        self.optimise_follower_model(current_date=date)

        u_L, t = sp.symbols("u_L t")

        # Construct follower reaction function u_F = f(u_L, t)
        feature_names = self.follower_model_poly.get_feature_names_out(['u_L', 't'])
        follower_expr = sum(
            coef * sp.sympify(term.replace("^", "**").replace(" ", "*"))
            for coef, term in zip(self.follower_model.coef_, feature_names)
        ) + self.follower_model.intercept_

        # Plug into the profit function
        profit_expr = (u_L - self.c_L) * (2 - u_L + 0.3 * follower_expr)

        # Derivative of profit
        profit_derivative = sp.diff(profit_expr, u_L)

        # Stationary points of the profit for today's date (t is substituted before solving)
        extrema = sp.solve(profit_derivative.subs(t, date), u_L)

        # Always consider the bounds and the historical mean price
        candidates = [
            self.lower_bound,
            np.mean(self.historical_data['u_L']),
            self.upper_bound
        ]

        for ext in extrema:
          ext_value = ext.subs(sp.Symbol("t"), date)
          if ext_value.is_real and self.lower_bound <= ext_value <= self.upper_bound:
            candidates.append(float(ext_value))

        profits = {}
        for u_L in candidates:
          u_F = self.follower_model.predict(self.follower_model_poly.transform([[u_L, date]]))[0]

          profits[u_L] = self.profit_L(u_L, u_F)

        best_u_L = max(profits, key=profits.get)

        return best_u_L
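
One way to sanity-check the price chosen by a method like `new_price` is to compare it with a brute-force grid search over the same interval. The sketch below is a cross-check only, not part of the class above. `predict_u_F` is a hypothetical callable standing in for the fitted follower model, and the linear follower in the example is invented.

```python
def grid_search_price(predict_u_F, lower, upper, steps=2001):
    """Return (price, profit) maximising the leader's profit on an evenly spaced grid."""
    best_price, best_profit = lower, float("-inf")
    for i in range(steps):
        u_L = lower + (upper - lower) * i / (steps - 1)
        profit = (u_L - 1.0) * (2 - u_L + 0.3 * predict_u_F(u_L))  # c_L = 1
        if profit > best_profit:
            best_price, best_profit = u_L, profit
    return best_price, best_profit


# Invented follower u_F = 1 + 0.5 * u_L, searched over [1, 3] in steps of 0.001.
grid_price, grid_profit = grid_search_price(lambda u: 1.0 + 0.5 * u, 1.0, 3.0)
print(round(grid_price, 3), round(grid_profit, 4))
```

If the symbolic route and the grid disagree by more than the grid spacing, that usually points to a root being filtered out or to the model extrapolating badly near a bound.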

# Leader for Followers 2 & 5

In [12]:
class Leader_Mk_2_5(Leader):
    def __init__(self, name, engine):
        super().__init__(name, engine)
        self.c_L = 1 # given
        self.random_state = 42
        self.lower_bound = 1
        self.upper_bound = 1000 # placeholder; recomputed from the data in optimise_follower_model

    def demand_L(self, u_L, u_F):
        return 2 - u_L + (0.3 * u_F)

    def profit_L(self, u_L, u_F):
        return (u_L - self.c_L) * self.demand_L(u_L, u_F)

    def get_historical_data(self, current_date):
        historical_data = {}
        i = 0
        for date in range(1, current_date):
            leader_price, follower_price = self.get_price_from_date(date)
            historical_data[i] = (date, leader_price, follower_price)
            i += 1
        historical_data_df = pd.DataFrame.from_dict(historical_data, orient='index', columns=['t', 'u_L', 'u_F'])

        return historical_data_df

    def optimise_alpha(self, max_degree=5, num_splits=5):
      """
      Finds the best alpha (growth rate of the recency weights) via cross-validation.

      Args:
          max_degree (int): Degree of the polynomial features used during the search.
          num_splits (int): Number of splits for K-Fold cross-validation.

      Returns:
          float: Best alpha value based on lowest cross-validated MSE.
      """
      alpha_values = np.logspace(-4, 0, 10)  # 0.0001 to 1

      X = self.historical_data[['u_L', 't']].values
      y = self.historical_data['u_F'].values
      t_min = self.historical_data['t'].min()

      best_alpha = None
      best_mse = float('inf')

      for alpha in alpha_values:
          kf = KFold(n_splits=num_splits, shuffle=True, random_state = self.random_state)
          poly = PolynomialFeatures(max_degree)
          X_poly = poly.fit_transform(X)

          time_diff = self.historical_data['t'] - t_min
          sample_weights = np.exp(alpha * time_diff)

          mse_scores = []

          for train_index, test_index in kf.split(X_poly):
              X_train, X_val = X_poly[train_index], X_poly[test_index]
              y_train, y_val = y[train_index], y[test_index]
              weights_train = sample_weights.iloc[train_index]

              model = LinearRegression()
              model.fit(X_train, y_train, sample_weight=weights_train)
              y_pred = model.predict(X_val)
              mse = np.mean((y_val - y_pred) ** 2)
              mse_scores.append(mse)

          avg_mse = np.mean(mse_scores)

          if avg_mse < best_mse:
              best_mse = avg_mse
              best_alpha = alpha

      return best_alpha

    def optimise_hyperparams(self, max_degree=5, num_splits=5):
        degrees = np.arange(2, max_degree+1, 1)
        alpha_vals = np.logspace(-4, 0, 10)
        param_grid = list(product(degrees, alpha_vals))

        best_mse = float('inf')

        kf = KFold(n_splits=num_splits, shuffle=True, random_state = self.random_state)
        X = self.historical_data[['u_L', 't']].values
        y = self.historical_data['u_F'].values

        for degree, alpha in param_grid:
          mse_scores = []
          poly = PolynomialFeatures(degree)
          X_poly = poly.fit_transform(X)

          t_min = self.historical_data['t'].min()
          time_diffs = self.historical_data['t'] - t_min
          sample_weights = np.exp(alpha * time_diffs)

          for train_index, test_index in kf.split(self.historical_data):
            X_train, X_val = X_poly[train_index], X_poly[test_index]
            y_train, y_val = y[train_index], y[test_index]

            weights = sample_weights[train_index]

            model = LinearRegression()
            model.fit(X_train, y_train, sample_weight=weights)
            y_pred = model.predict(X_val)
            mse = np.mean((y_val - y_pred)**2)
            mse_scores.append(mse)

          avg_mse = np.mean(mse_scores)
          if avg_mse < best_mse:
            best_mse = avg_mse
            best_degree = degree
            best_alpha = alpha

        return best_alpha, best_degree

    def train_follower_model(self, degree, alpha):
      X = self.historical_data[['u_L', 't']].values
      y = self.historical_data['u_F'].values

      poly = PolynomialFeatures(degree)
      X_poly = poly.fit_transform(X)

      t_min = self.historical_data['t'].min()
      time_diffs = self.historical_data['t'] - t_min
      sample_weights = np.exp(alpha * time_diffs)

      self.follower_model_poly = poly
      self.follower_model = LinearRegression()
      self.follower_model.fit(X_poly, y, sample_weight=sample_weights)

    def optimise_follower_model(self, current_date):
      self.historical_data = self.get_historical_data(current_date=current_date)
      self.upper_bound = np.mean(self.historical_data['u_L']) + 2 * np.std(self.historical_data['u_L'])

      self.alpha, self.degree = self.optimise_hyperparams(max_degree=5, num_splits=5)

      self.train_follower_model(degree=self.degree, alpha=self.alpha)

    def new_price(self, date: int):
        """
        We predict the follower's price from the leader's price and the date.
        We then differentiate the leader's profit, with the learned reaction function substituted in, to maximise it.

        The process for choosing the leader's price is:
          1. Differentiate the leader's profit with respect to u_L
          2. Find any stationary points within the pricing bounds - [1.00, +inf) for Follower Mk2, capped at a data-driven upper bound
          3. Find the profit at each stationary point, at the boundaries, and at the historical mean price
          4. Choose the point with the highest leader profit.

        Args:
            date (int): The current date.

        Returns:
            float: The chosen leader's price.
        """
        self.optimise_follower_model(current_date=date)

        u_L, t = sp.symbols("u_L t")

        # Construct follower reaction function u_F = f(u_L, t)
        feature_names = self.follower_model_poly.get_feature_names_out(['u_L', 't'])
        follower_expr = sum(
            coef * sp.sympify(term.replace("^", "**").replace(" ", "*"))
            for coef, term in zip(self.follower_model.coef_, feature_names)
        ) + self.follower_model.intercept_

        # Plug into the profit function
        profit_expr = (u_L - self.c_L) * (2 - u_L + 0.3 * follower_expr)

        # Derivative of profit
        profit_derivative = sp.diff(profit_expr, u_L)

        # Stationary points of the profit for today's date (t is substituted before solving)
        extrema = sp.solve(profit_derivative.subs(t, date), u_L)

        # Always consider the bounds and the historical mean price
        candidates = [
            self.lower_bound,
            np.mean(self.historical_data['u_L']),
            self.upper_bound
        ]

        for ext in extrema:
          ext_value = ext.subs(sp.Symbol("t"), date)
          if ext_value.is_real and self.lower_bound <= ext_value <= self.upper_bound:
            candidates.append(float(ext_value))

        profits = {}
        for u_L in candidates:
          u_F = self.follower_model.predict(self.follower_model_poly.transform([[u_L, date]]))[0]

          profits[u_L] = self.profit_L(u_L, u_F)

        best_u_L = max(profits, key=profits.get)

        return best_u_L
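
The leader classes above weight day `t` by `exp(alpha * (t - t_min))`, with `alpha` searched over `np.logspace(-4, 0, 10)`. A rough way to see what a given `alpha` implies is Kish's effective sample size, $(\sum w)^2 / \sum w^2$. The sketch below assumes 100 consecutive days of data, and the helper name `effective_sample_size` is illustrative.

```python
import math


def effective_sample_size(alpha: float, n_days: int = 100) -> float:
    """Kish effective sample size for weights exp(alpha * k), k = 0..n_days-1."""
    weights = [math.exp(alpha * k) for k in range(n_days)]
    return sum(weights) ** 2 / sum(w * w for w in weights)


for alpha in (1e-4, 1e-2, 1e-1, 1.0):
    print(f"alpha={alpha:g}: effective sample size ~ {effective_sample_size(alpha):.1f}")
```

At the top of the grid (`alpha = 1`) only about two observations effectively inform the fit, whereas a degree-5 polynomial in two variables has 21 terms. Large `alpha` combined with a high degree is therefore likely to be poorly determined.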

# Leader for Followers 3 & 6

In [13]:
class Leader_Mk_3_6(Leader):
    def __init__(self, name, engine):
        super().__init__(name, engine)

        self.follower_name = "Follower_Mk1" # Change this to the name of the sheet containing the follower's historical data

        self.random_state = 42
        self.c_L = 1 # given
        self.lower_bound = 1
        self.upper_bound = 2 # given: the strategy space against MK3 is [1.00, 2.00]

    def demand_L(self, u_L, u_F):
        return 2 - u_L + (0.3 * u_F)

    def profit_L(self, u_L, u_F):
        return (u_L - self.c_L) * self.demand_L(u_L, u_F)

    def get_historical_data(self, current_date):
        historical_data = {}
        i = 0
        for date in range(1, current_date):
            leader_price, follower_price = self.get_price_from_date(date)
            historical_data[i] = (date, leader_price, follower_price)
            i += 1
        historical_data_df = pd.DataFrame.from_dict(historical_data, orient='index', columns=['t', 'u_L', 'u_F'])
        return historical_data_df

    def optimise_alpha(self, max_degree=5, num_splits=5):
      """
      Finds the best alpha (growth rate of the recency weights) via cross-validation.

      Args:
          max_degree (int): Degree of the polynomial features used during the search.
          num_splits (int): Number of splits for K-Fold cross-validation.

      Returns:
          float: Best alpha value based on lowest cross-validated MSE.
      """
      alpha_values = np.logspace(-4, 0, 10)  # 0.0001 to 1

      X = self.historical_data[['u_L', 't']].values
      y = self.historical_data['u_F'].values
      t_min = self.historical_data['t'].min()

      best_alpha = None
      best_mse = float('inf')

      for alpha in alpha_values:
          kf = KFold(n_splits=num_splits, shuffle=True, random_state = self.random_state)
          poly = PolynomialFeatures(max_degree)
          X_poly = poly.fit_transform(X)

          time_diff = self.historical_data['t'] - t_min
          sample_weights = np.exp(alpha * time_diff)

          mse_scores = []

          for train_index, test_index in kf.split(X_poly):
              X_train, X_val = X_poly[train_index], X_poly[test_index]
              y_train, y_val = y[train_index], y[test_index]
              weights_train = sample_weights.iloc[train_index]

              model = LinearRegression()
              model.fit(X_train, y_train, sample_weight=weights_train)
              y_pred = model.predict(X_val)
              mse = np.mean((y_val - y_pred) ** 2)
              mse_scores.append(mse)

          avg_mse = np.mean(mse_scores)

          if avg_mse < best_mse:
              best_mse = avg_mse
              best_alpha = alpha

      return best_alpha

    def get_best_degree(self, max_degree=5, num_splits=5, alpha=1):
        degrees = range(2, max_degree+1)
        follower_models = {}
        best_mse = float('inf')

        kf = KFold(n_splits=num_splits, shuffle=True, random_state = self.random_state)
        X = self.historical_data[['u_L', 't']].values
        y = self.historical_data['u_F'].values

        for degree in degrees:
          mse_scores = []
          poly = PolynomialFeatures(degree)
          X_poly = poly.fit_transform(X)

          t_min = self.historical_data['t'].min()
          time_diffs = self.historical_data['t'] - t_min
          sample_weights = np.exp(alpha * time_diffs)

          for train_index, test_index in kf.split(self.historical_data):
            X_train, X_val = X_poly[train_index], X_poly[test_index]
            y_train, y_val = y[train_index], y[test_index]

            # Positional indexing: train_index holds row positions, not index labels
            weights = sample_weights.iloc[train_index]

            model = LinearRegression()
            model.fit(X_train, y_train, sample_weight=weights)
            y_pred = model.predict(X_val)
            mse = np.mean((y_val - y_pred)**2)
            mse_scores.append(mse)

          avg_mse = np.mean(mse_scores)
          if avg_mse < best_mse:
            best_mse = avg_mse
            best_degree = degree

        return best_degree

    def train_follower_model(self, degree, alpha):
      X = self.historical_data[['u_L', 't']].values
      y = self.historical_data['u_F'].values

      poly = PolynomialFeatures(degree)
      X_poly = poly.fit_transform(X)

      t_min = self.historical_data['t'].min()
      time_diffs = self.historical_data['t'] - t_min
      sample_weights = np.exp(alpha * time_diffs)

      self.follower_model_poly = poly
      self.follower_model = LinearRegression()
      self.follower_model.fit(X_poly, y, sample_weight=sample_weights)

    def optimise_follower_model(self, current_date):
      self.historical_data = self.get_historical_data(current_date=current_date)
      self.alpha = self.optimise_alpha()
      self.degree = self.get_best_degree(max_degree=5, alpha=self.alpha)
      self.train_follower_model(degree=self.degree, alpha=self.alpha)

    def new_price(self, date: int):
        """
        Predict the follower's price from the leader's price and the date, then
        choose the leader's price that maximises the leader's predicted profit.

        The process for choosing the leader's price is:
          1. Substitute the learned reaction function into the leader's profit and differentiate the profit with respect to u_L
          2. Find the stationary points within the pricing bounds, e.g. [1.00, +inf) for Follower MK2
          3. Evaluate the profit at each stationary point, at the boundaries, and at the historical mean leader price
          4. Choose the point with the highest leader profit.

        Args:
            date (int): The current date.

        Returns:
            float: The chosen leader's price.
        """
        self.optimise_follower_model(current_date=date)

        u_L, t = sp.symbols("u_L t")

        # Construct follower reaction function u_F = f(u_L, t)
        feature_names = self.follower_model_poly.get_feature_names_out(['u_L', 't'])
        follower_expr = sum(
            coef * sp.sympify(term.replace("^", "**").replace(" ", "*"))
            for coef, term in zip(self.follower_model.coef_, feature_names)
        ) + self.follower_model.intercept_

        # Plug into the profit function
        profit_expr = (u_L - self.c_L) * (2 - u_L + 0.3 * follower_expr)

        # Derivative of profit
        profit_derivative = sp.diff(profit_expr, u_L)

        # Substitute the date first so the solver only has to handle u_L
        extrema = sp.solve(profit_derivative.subs(t, date), u_L)

        # Always evaluate the bounds and the historical mean price as fallbacks
        candidates = [
            self.lower_bound,
            np.mean(self.historical_data['u_L']),
            self.upper_bound
        ]

        # Keep only real stationary points that lie inside the strategy space
        for ext in extrema:
          if ext.is_real and self.lower_bound <= ext <= self.upper_bound:
            candidates.append(float(ext))

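        # Evaluate the predicted leader profit at every candidate price
        profits = {}
        for candidate in candidates:
          # The feature map is already fitted, so transform() is enough here
          features = self.follower_model_poly.transform([[candidate, date]])
          u_F = self.follower_model.predict(features)[0]

          profits[candidate] = self.profit_L(candidate, u_F)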

        best_u_L = max(profits, key=profits.get)

        return best_u_L
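
As a sanity check on the procedure in `new_price`, the special case of a linear reaction function can be solved by hand. The sketch below is not part of the leader classes above: it assumes a hypothetical reaction $u_F = a + b\,u_L$ (the coefficients `a` and `b` and the helper names are illustrative only), substitutes it into the leader's profit $(u_L - c_L)(2 - u_L + 0.3u_F)$, and sets the derivative to zero, which gives $u_L^* = \frac{2 + 0.3a + c_L(1 - 0.3b)}{2(1 - 0.3b)}$ whenever $1 - 0.3b > 0$.

```python
import math

C_L = 1.0  # leader's unit cost, from the project description


def leader_profit(u_l: float, u_f: float) -> float:
    """Daily leader profit (u_L - c_L) * (2 - u_L + 0.3 * u_F)."""
    return (u_l - C_L) * (2 - u_l + 0.3 * u_f)


def best_price_linear(a: float, b: float, lower: float = 1.0,
                      upper: float = math.inf) -> float:
    """Best leader price against an ASSUMED linear reaction u_F = a + b * u_L."""
    slope = 1 - 0.3 * b  # profit is concave in u_L only when this is positive
    if slope <= 0:
        raise ValueError("profit is not concave in u_L; no interior maximum")
    # Stationary point of (u_L - c_L) * (2 - u_L + 0.3 * (a + b * u_L))
    stationary = (2 + 0.3 * a + C_L * slope) / (2 * slope)
    # Profit is concave, so clipping the stationary point to the bounds is optimal
    return min(max(stationary, lower), upper)
```

With the assumed coefficients `a = 1.0` and `b = 0.5`, the unconstrained optimum is $3.15 / 1.7 \approx 1.853$. With `a = 2.0` and the MK3 upper bound of 2.00, the stationary point lies above the bound, so the price is clipped to 2.00. A fitted polynomial model should give a similar answer whenever the learned reaction is close to linear.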

# Methods to use for each Follower
`MK1`: `Leader_Mk_1_4`

`MK2`: `Leader_Mk_2_5`

`MK3`: `Leader_Mk_3_6`

`MK4`: `Leader_Mk_1_4`

`MK5`: `Leader_Mk_2_5`

`MK6`: `Leader_Mk_3_6`


# Simulation

The GUI is shown below. Select a leader and a follower from the dropdown menus, then click “Connect.” Once the status updates to “Connected to *your_selected_leader* and *your_selected_follower*”, click “Start Simulation” to begin. To save the generated data, click “Export Data”; the dataset will be saved in the “run” folder.

In [14]:
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 5000})'''))
engine = Engine()
app = GUI(engine, Leader, group_num)
