In [13]:
import numpy as np
from scipy.optimize import linprog

# Problem 1

In [2]:
def generate_bids(n, m, p_bar):
    """
    Generate a sequence of random bids.
    :param n: Total number of bids.
    :param m: Number of items.
    :param p_bar: Ground truth price vector.
    :return: Array of bids.
    """
    bids = []
    for _ in range(n):
        a_k = np.random.choice([0, 1], size=m)  # Generate a_k
        pi_k = np.dot(p_bar, a_k) + np.random.normal(0, np.sqrt(0.2))  # Calculate bid price
        bids.append((a_k, pi_k))
    return bids

In [3]:
def solve_partial_lp_dual(bids, k, n, m, b_i):
    """
    Solve the partial linear program (SLPM) for the first k bids and extract the dual prices.
    :param bids: Sequence of bids.
    :param k: Number of bids to consider in the LP.
    :param n: Total number of bids.
    :param m: Number of items.
    :param b_i: Capacity for each item.
    :return: Dual prices (y_bar).
    """
    # Objective function: maximize sum(pi_j * x_j) for j=1 to k
    c = -np.array([pi_k for _, pi_k in bids[:k]])

    # Constraints: sum(a_ij * x_j) <= (k/n) * b_i for all i
    A = np.array([a_k for a_k, _ in bids[:k]]).T # shape: (m, k)
    b = (k / n) * np.array(b_i)
    
    # Bounds for decision variables: 0 <= x_j <= 1
    x_bounds = [(0, 1) for _ in range(k)]

    # Solve the linear program
    result = linprog(c, A_ub=A, b_ub=b, bounds=x_bounds, method='highs')

    if result.success:
        # The dual variable corresponding to the inequality constraints A_ub * x <= b_ub
        return result.get('slack')
    else:
        raise ValueError("Linear programming failed to find a solution")

In [4]:
def run_slpm(bids, k, n, m, b_i):
    """
    Run the revised SLPM algorithm.
    :param bids: Sequence of bids.
    :param k: Size of k for the SLPM algorithm.
    :param n: Total number of bids.
    :param m: Number of items.
    :param b_i: Capacity for each item.
    :return: Total revenue generated.
    """
    revenue = 0
    remaining_capacity = np.array(b_i)
    
    # Solve the partial LP for the first k bids to get dual prices
    y_bar = solve_partial_lp_dual(bids, k, n, m, b_i)

    # Adjust capacity for each item based on the dual prices
    b_i -= y_bar


    for i, (a_k, pi_k) in enumerate(bids):
        if i >= k:
            # Allocate based on the decision rule using y_bar
            if pi_k > np.dot(a_k, y_bar) and all(remaining_capacity - a_k >= 0):
                revenue += pi_k
                remaining_capacity -= a_k

    return revenue

In [5]:
def solve_offline_lp(bids, m, b_i):
    """
    Solve the offline linear programming problem.
    :param bids: Sequence of all bids.
    :param m: Number of items.
    :param b_i: Resource limit for all i.
    :return: Total revenue from the offline LP solution.
    """
    c = -np.array([pi_k for _, pi_k in bids])  # Negative for maximization
    A = np.array([a_k for a_k, _ in bids]).T  # Transpose to match dimensions
    b = b_i * np.ones(m)    

    # Solving the LP
    result = linprog(c, A_ub=A, b_ub=b, bounds=(0, 1), method='highs')

    if result.success:
        return -result.fun  # Revenue (negate because of maximization)
    else:
        raise ValueError("Offline LP did not converge")

In [6]:
def run_simulation(n, m, p_bar, k_values):
    # Regenerate bids with the fixed p_bar
    bids_fixed = generate_bids(n, m, p_bar)

    b_i = np.ones(m) * 1000  # Bid cap for all i

    slpm_revenues = {k: run_slpm(bids_fixed, k, n, m, b_i) for k in k_values}

    for slpm_revenue, k in zip(slpm_revenues.values(), k_values):
        print(f"SLPM revenue: {slpm_revenue} at k={k}")

    offline_revenue_value = solve_offline_lp(bids_fixed, m, b_i)

    print(f"Offline revenue: {offline_revenue_value}")

# Simulation parameters
n = 10000  # Total number of bids
m = 10     # Number of items
k_values = [50, 100, 200]  # Different k values to test

# Fixed ground truth price vector (p_bar) - set to ones for simplicity
p_bar_fixed = np.ones(m)  # Vector of ones

run_simulation(n, m, p_bar_fixed, k_values)

SLPM revenue: 10005.45842518648 at k=50
SLPM revenue: 10006.250098949256 at k=100
SLPM revenue: 10009.94451409099 at k=200
Offline revenue: 11326.588623160282


## Trade-off between high and low k

Choosing a Large $k$ :
1. Stability and Accuracy:
- A larger $k$ means that the algorithm considers a greater number of past bids before making a decision. This can lead to more stable and potentially more accurate estimations of the optimal pricing or decision-making strategy because it relies on a more comprehensive set of historical data.
- The decisions are less likely to be influenced by short-term fluctuations or anomalies in the bid data, which might be beneficial in environments where the underlying distribution of bids doesn't change rapidly.
2. Reduced Responsiveness:
- The downside is that the algorithm becomes less responsive to recent trends or changes in the bidding environment. If the nature of the bids changes suddenly, it might take longer for these changes to be reflected in the decision-making process.
- In fast-changing markets or scenarios where recent data is significantly more relevant than older data, a larger $k$ might lead to suboptimal decisions.

Choosing a Small $k$ :
1. Responsiveness to New Information:
- A smaller $k$ makes the algorithm more responsive to recent bids. This can be advantageous in dynamic environments where the characteristics of bids change frequently, and staying up-to-date with the most recent trends is crucial.
- It allows the algorithm to quickly adapt to new information, potentially capturing opportunities that a more stable, slower-to-adapt approach might miss.
2. Potential Instability and Inaccuracy:
- The drawback is that with fewer data points considered, the estimations and decisions might be less stable and less accurate. The algorithm is more susceptible to being influenced by outliers or short-term fluctuations in the bid data.
- Decisions might be over-fitted to recent but possibly non-representative data, leading to erratic or suboptimal bidding strategies.

# Problem 2

In [14]:
def run_slpm_revised(bids, k_values, n, m, b_i):
    """
    Run the revised SLPM algorithm with dynamic dual price updates.
    :param bids: Sequence of bids.
    :param k_values: Points at which to update the dual prices.
    :param n: Total number of bids.
    :param m: Number of items.
    :param b_i: Capacity for each item.
    :return: Total revenue generated.
    """
    revenue = 0
    remaining_capacity = np.array(b_i)
    y_bar = None

    for i, (a_k, pi_k) in enumerate(bids):
        # Update dual prices at specified points
        if i in k_values or i == 0:
            y_bar = solve_partial_lp_dual(bids, i + 1, n, m, remaining_capacity)

        # Allocate based on the decision rule using y_bar
        if y_bar is not None and pi_k > np.dot(a_k, y_bar) and all(remaining_capacity - a_k >= 0):
            revenue += pi_k
            remaining_capacity -= a_k

    return revenue


In [8]:
def run_simulation_problem_2(n, m, p_bar, k_values):
    # Regenerate bids with the fixed p_bar
    bids_fixed = generate_bids(n, m, p_bar)


    b_i = np.ones(m) * 1000  # Bid cap for all i

    slpm_revenues_revised = {k: run_slpm_revised(bids_fixed, k, n, m, b_i) for k in k_values}

    for slpm_revenue, k in zip(slpm_revenues_revised.values(), k_values):
        print(f"SLPM revenue: {slpm_revenue} at k={k}")

    offline_revenue_value = solve_offline_lp(bids_fixed, m, b_i)

    print(f"Offline revenue: {offline_revenue_value}")

# Simulation parameters
n = 10000  # Total number of bids
m = 10     # Number of items
k_values = [50, 100, 200]  # Different k values to test

# Fixed ground truth price vector (p_bar) - set to ones for simplicity
p_bar_fixed = np.ones(m)  # Vector of ones

run_simulation(n, m, p_bar_fixed, k_values)

SLPM revenue: 9969.348547850243 at k=50
SLPM revenue: 9977.23567103672 at k=100
SLPM revenue: 9990.886998024376 at k=200
Offline revenue: 11336.211826909263


# Problem 3

In [12]:
def ahlda(bids, k, n, m, b_i, opt):
    """
    Action-history-dependent Learning Algorithm.
    """
    revenue = 0
    remaining_capacity = np.array(b_i)
    performance = []

    for i in range(1, k + 1):
        # Update dual prices
        y_bar = solve_partial_lp_dual(bids, i, n, m, remaining_capacity)

        # Make decision for bid i
        a_k, pi_k = bids[i - 1]
        if pi_k > np.dot(a_k, y_bar) and all(remaining_capacity - a_k >= 0):
            revenue += pi_k
            remaining_capacity -= a_k
        
        # Compute performance metric
        performance.append(revenue - (i / n) * opt)

    return performance

# Parameters
n = 10000
m = 10
p_bar_fixed = np.ones(m)
b_i = np.ones(m) * 1000
k_values = [50, 100, 200]

# Generate bids
bids = generate_bids(n, m, p_bar_fixed)

# Solve offline problem for OPT
opt = solve_offline_lp(bids, m, b_i)

# AHDLA algorithm performance
ahlda_performance = {k: ahlda(bids, k, n, m, b_i, opt) for k in k_values}

ahlda_performance[200][:10]  # Display first 10 performance metrics for k=200

[1.8657512983475957,
 5.79783055695639,
 7.534832765856239,
 9.847318988128336,
 13.459262929498934,
 20.307540281444936,
 23.060936512134887,
 25.209605502991117,
 29.227883312537333,
 35.27814051432472]

In [16]:
def performance_metric_slpm_revised(bids, k_values, n, m, b_i, opt):
    """
    Run the revised SLPM algorithm with dynamic dual price updates and compute performance metrics.
    """
    performance = {}
    for k in k_values:
        revenue = 0
        remaining_capacity = np.array(b_i)
        y_bar = None
        performance_k = []

        for i, (a_k, pi_k) in enumerate(bids):
            if i >= k:  # Only consider bids after k
                break

            # Update dual prices at specified points or at the start
            if i in k_values or i == 0:
                y_bar = solve_partial_lp_dual(bids, i + 1, n, m, remaining_capacity)

            # Allocate based on the decision rule using y_bar
            if y_bar is not None and pi_k > np.dot(a_k, y_bar) and all(remaining_capacity - a_k >= 0):
                revenue += pi_k
                remaining_capacity -= a_k

            # Compute performance metric
            performance_k.append(revenue - (i / n) * opt)

        performance[k] = performance_k

    return performance

# SLPM algorithm performance
slpm_performance = performance_metric_slpm_revised(bids, k_values, n, m, b_i, opt)

# Display the first 10 performance metrics for k=200 from SLPM
slpm_performance[200][:10]

[2.996176485466778,
 6.928255744075572,
 8.665257952975422,
 10.977744175247519,
 14.589688116618117,
 21.43796546856412,
 24.19136169925407,
 26.340030690110297,
 30.358308499656516,
 36.408565701443905]

## Comparison between AHDLA and SLPM
- Initial Performance: Both algorithms show a growing trend in performance as more bids are considered. This is expected as accumulating more accepted bids typically increases total revenue.
- Performance Metrics: The performance metrics of both algorithms are quite similar, especially in the later stages (e.g., at bid 10). This suggests that both algorithms are effectively capturing the value of bids over time relative to the optimal offline solution.
- Algorithmic Differences: The key difference lies in how AHDLA dynamically updates decision-making based on action history, while SLPM focuses more on a static decision rule informed by a subset of bids. AHDLA's approach could potentially be more responsive to variations in bid quality over time.

From this analysis, it appears that both algorithms perform well, with neither showing a definitive advantage over the other based on these metrics. The choice between them may depend on specific characteristics of the bidding environment or computational constraints.

# Problem 4

## Convexity of (3)

Problem 3 is given as:
$$
\begin{array}{ll}
\operatorname{minimize}_{\overline{\mathbf{y}}} & \mathbf{d}^T \overline{\mathbf{y}}+\mathbb{E}\left(\pi-\mathbf{a}^T \overline{\mathbf{y}}\right)^{+} \\
\text {s.t. } & \overline{\mathbf{y}} \geq 0
\end{array}
$$
where $\mathbf{d}=\mathbf{b} / n$ and $(\cdot)^{+}=\max \{\cdot, 0\}$.

The first term, $\mathbf{d}^T \overline{\mathbf{y}}$, is linear in $\overline{\mathbf{y}}$. Linear functions are both convex and concave. The second term, $\mathbb{E}\left(\pi-\mathbf{a}^T \overline{\mathbf{y}}\right)^{+}$, requires more consideration. The function $(\cdot)^{+}$ is the positive part function, which is convex because it is the maximum of a linear function and zero (both convex functions). Therefore, $\left(\pi-\mathbf{a}^T \overline{\mathbf{y}}\right)^{+}$is convex in $\overline{\mathbf{y}}$ as it is the composition of a linear function and a convex function. The expectation operator $\mathbb{E}$ preserves convexity. If a function $f(x)$ is convex, then $\mathbb{E}[f(x)]$ is also convex. Therefore, $\mathbb{E}\left(\pi-\mathbf{a}^T \overline{\mathbf{y}}\right)^{+}$is convex in $\overline{\mathbf{y}}$. The constraint $\overline{\mathbf{y}} \geq 0$ is linear and therefore convex.


Since both terms in the objective function are convex and the constraints are also convex (linear, in fact), the problem as a whole is a convex optimization problem. Convex problems have the desirable property that any local minimum is also a global minimum, making them easier to solve reliably and efficiently.

Because the problem is convex, we know it can be solved efficiently and the solution is robust to things like initialization.

## Connection to (1)
In general, while Problem 1 is focused on a practical algorithmic approach to bid management in an online learning context, Problem 3 offers a theoretical counterpart that provides an optimal pricing strategy in a stochastic setting.


In Problem 1, we had:

- A one-time online learning algorithm, specifically the Sequential Linear Programming Method (SLPM), applied to a simulated bidding scenario.
- The primary goal is to maximize revenue (or equivalently, minimize cost) through strategic bid acceptance, given a set of bids and resource constraints.
- The algorithm is tested with different values of $ k $ (50, 100, 200), which influences how much of the bid data is considered in the decision-making process. This essentially determines the balance between responsiveness and stability in the pricing strategy.

### Problem 3 Formulation:

Problem 3 had:

$$
\begin{array}{ll}
\text{minimize}_{\overline{\mathbf{y}}} & \mathbf{d}^T \overline{\mathbf{y}} + \mathbb{E}\left(\pi - \mathbf{a}^T \overline{\mathbf{y}}\right)^{+}, \\
\text{s.t.} & \overline{\mathbf{y}} \geq 0,
\end{array}
$$

- Here, $\mathbf{d}^T \overline{\mathbf{y}}$ represents a direct cost associated with the decision variables $\overline{\mathbf{y}}$, and $\mathbb{E}\left(\pi - \mathbf{a}^T \overline{\mathbf{y}}\right)^{+}$ signifies the expected opportunity cost of not accepting certain bids.
- The problem aims to find an optimal pricing strategy $\overline{\mathbf{y}}$ that minimizes the expected total cost, which includes both direct and opportunity costs.


1. **Algorithmic Approach vs. Theoretical Optimization:**
   - Problem 1 deals with an algorithmic approach to bid management, where decisions are made in an online setting based on limited information.
   - Problem 3, on the other hand, represents a more theoretical approach to understanding the optimal pricing strategy in a stochastic environment. It's an optimization problem that considers the expected values of variables and costs.

2. **Role of $ k $ and $\overline{\mathbf{y}}$:**
   - In Problem 1, $ k $ determines how much historical bid data is used in setting the prices $\overline{\mathbf{y}}$. A larger $ k $ would mean more historical data influencing the pricing decision, potentially leading to a more stable but less responsive strategy.
   - In Problem 3, $\overline{\mathbf{y}}$ is optimized considering the entire distribution of possible bids. This theoretical optimum could be seen as an ideal target for the online algorithm in Problem 1. The larger the $ k $, the closer the SLPM's decision-making process might come to this theoretical optimum.

3. **Trade-offs in Decision Making:**
   - Problem 1 highlights the trade-off between using more data for stability (large $ k $) and being more responsive to recent trends (small $ k $).
   - Problem 3 implicitly addresses a similar trade-off by balancing direct costs and expected opportunity costs. The optimal solution $\overline{\mathbf{y}}$ in this problem is akin to finding a balance that minimizes total expected cost, considering both historical and potential future bids.

4. **Stochastic Nature of Bidding:**
   - Both problems acknowledge the uncertainty and stochastic nature of bidding. Problem 1 does this through the simulation of bids and online learning, while Problem 3 incorporates it through the expectation operator in its objective function.

### Implications:

- **SLPM as an Approximation to the Theoretical Optimum:** The SLPM algorithm can be viewed as an attempt to approximate the theoretically optimal pricing strategy (as conceptualized in Problem 3) in a real-time, dynamic environment.
- **Insights from Problem 3 to Inform Problem 1:** The structure and solution of Problem 3 can provide valuable insights into how one might adjust the SLPM algorithm to better approach the optimal strategy, especially in how to balance historical data with responsiveness to new information.


# Question 5

To address Question 5, we'll develop a new online algorithm using Stochastic Gradient Descent (SGD) for the optimization problem stated in Problem 3. The aim is to update the dual price vectors $\overline{\mathbf{y}}$ more efficiently, especially when dealing with large-scale data.

### SGD-Based Online Algorithm:

The SGD approach updates the decision variables $\overline{\mathbf{y}}$ iteratively using an approximation of the gradient of the objective function. Given the derivative $ \mathbf{f}(\overline{\mathbf{y}}) = \mathbb{E}\left(\mathbf{d} - \mathbf{a} \mathbb{I}_{\{\pi > \mathbf{a}^T \overline{\mathbf{y}}\}}\right) $, we can update $ \overline{\mathbf{y}} $ at each step based on the current bid's information.

#### Algorithm Steps:

1. **Initialization:**
   - Set $ \overline{\mathbf{y}}^0 = 0 $ (initial dual price vector).

2. **Iterative Update:**
   - For each time step $ k $, update $ \overline{\mathbf{y}} $ using the formula:
     $$
     \overline{\mathbf{y}}^{k+1} = \overline{\mathbf{y}}^k - \frac{1}{\beta} \mathbf{f}\left(\pi_k, \mathbf{a}_k\right)
     $$
   - Where $ \beta = \sqrt{k} $ and $ \mathbf{f}\left(\pi_k, \mathbf{a}_k\right) = \mathbf{d} - \mathbf{a}_k \mathbb{I}_{\{\pi_k > \mathbf{a}_k^T \overline{\mathbf{y}}^k\}} $.

3. **Revenue Calculation:**
   - At each step, calculate the revenue based on the current bid and the updated $ \overline{\mathbf{y}} $.

4. **Convergence Check:**
   - Monitor the convergence of $ \overline{\mathbf{y}} $ towards the true vector $ \overline{\mathbf{p}} $.

In [11]:
def stochastic_gradient_descent(bids, m, d, beta_func, max_iter=10000):
    """
    Implement the Stochastic Gradient Descent (SGD) algorithm for the online learning problem.
    """
    y = np.zeros(m)  # Initialize y
    revenue = 0
    revenues = []
    y_updates = []

    for k in range(1, max_iter + 1):
        a_k, pi_k = bids[k - 1]
        beta = beta_func(k)

        # Indicator function
        indicator = 1 if pi_k > np.dot(a_k, y) else 0

        # Gradient approximation
        f = d - a_k * indicator

        # Update y
        y -= f / beta

        # Calculate revenue
        if pi_k > np.dot(a_k, y):
            revenue += pi_k

        # Store revenue and y updates for analysis
        revenues.append(revenue)
        y_updates.append(y.copy())

    return revenues, y_updates

# Simulation parameters
m = 10  # Number of items
d = np.ones(m) / 10000  # Vector d as b/n
beta_func = lambda k: np.sqrt(k)  # Beta function

# Run SGD algorithm
sgd_revenues, sgd_y_updates = stochastic_gradient_descent(bids, m, d, beta_func)

# Analyze the first few updates and revenues
sgd_revenues[:10], sgd_y_updates[:5]  # Display first 10 revenues and first 5 y updates


([0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 [array([-1.000e-04,  9.999e-01,  9.999e-01,  9.999e-01, -1.000e-04,
         -1.000e-04,  9.999e-01,  9.999e-01, -1.000e-04,  9.999e-01]),
  array([-1.70710678e-04,  1.70693607e+00,  1.70693607e+00,  1.70693607e+00,
          7.06936071e-01,  7.06936071e-01,  9.99829289e-01,  1.70693607e+00,
         -1.70710678e-04,  9.99829289e-01]),
  array([0.57712182, 1.70687834, 1.70687834, 2.2842286 , 1.2842286 ,
         0.70687834, 1.57712182, 2.2842286 , 0.57712182, 1.57712182]),
  array([0.57707182, 1.70682834, 1.70682834, 2.2841786 , 1.2841786 ,
         0.70682834, 1.57707182, 2.2841786 , 0.57707182, 1.57707182]),
  array([0.5770271 , 1.70678361, 1.70678361, 2.28413388, 1.28413388,
         0.70678361, 1.5770271 , 2.28413388, 0.5770271 , 1.5770271 ])])

### Revenue Performance:

This sequence of revenues above indicates an initial increase, followed by a plateau. This pattern suggests that the algorithm quickly adapts to the bidding data, but then stabilizes in its decision-making.

### Convergence of $ \overline{\mathbf{y}} $:

The first five updates to the dual price vector $ \overline{\mathbf{y}} $ are shown. The updates show a trend of changes, reflecting the algorithm's responsiveness to the bid data. However, the convergence of these vectors towards the true vector $ \overline{\mathbf{p}} $ is not immediately evident from these early steps and would require a more detailed analysis over a larger number of iterations.

### Observations and Findings:

- **Dynamic Learning Performance:** The SGD-based algorithm demonstrates dynamic learning capabilities, adapting its pricing strategy based on the observed bids. The choice of $ \beta = \sqrt{k} $ appears to provide a balance between rapid adaptation and stability.
- **Convergence Assessment:** To assess the convergence of $ \overline{\mathbf{y}} $ to $ \overline{\mathbf{p}} $, we would need to examine the trajectory of $ \overline{\mathbf{y}} $ updates over a longer period and possibly compare them to a known or estimated true price vector $ \overline{\mathbf{p}} $. The convergence may depend on various factors, including the nature of the bid data and the scaling factor $ \beta $.
- **Comparison with Previous Algorithms:** It's important to compare the SGD algorithm's performance with the earlier algorithms (SLPM and AHDLA) in terms of both revenue generation and computational efficiency. The SGD approach is expected to be more efficient, particularly for large $ n $, due to its iterative and gradient-based nature.

In conclusion, the SGD-based online algorithm shows promise in terms of adaptability and efficiency, especially for large-scale problems. However, a comprehensive evaluation of its long-term performance and convergence properties is necessary to fully understand its effectiveness and potential advantages over traditional LP-based methods in online learning scenarios.