# Imports

In [1]:
import torch
import cgd_utils

# Cournot Simulation

## Linear Price and Cost Functions

**TLDR: linear price function, identical linear cost function, pairwise CGD converges to Nash Equilibrium**

Our profit for each player $i$ is defined as the following:
\begin{gather}
\Pi_i = P\left(\sum_j{q_j}\right) \cdot q_i -C_i(q_i) \\
P(q) = 100 - q \\
C_i(q_i) = 10 \cdot q_i
\end{gather}

To solve for the Nash equilibrium, we take the derivative of each player's profit with respect to its own quantity and set it to zero:
\begin{gather}
\frac{\partial\Pi_i}{\partial q_i} = \frac{\partial P\left(\sum_j{q_j}\right)}{\partial q_i} \cdot q_i + P\left(\sum_j{q_j}\right) - \frac{\partial C_i (q_i)}{\partial q_i} = 0
\end{gather}

For the example below, this becomes the following:
\begin{gather}
-1 \cdot q_i + \left(100 - \sum_j {q_j}\right) - 10 = 0
\end{gather}

With the three symmetric players used below ($q_i = q$ for all $i$), this becomes $90 - 4q = 0$, so $q_i = \frac{45}{2} = 22.5$ (which is what our algorithm converges to).
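
As a quick arithmetic check, the sketch below computes the symmetric equilibrium in closed form for any number of firms $n$, using the standard linear-Cournot result $q^* = (a - c)/(n + 1)$ with $a = 100$ and $c = 10$ as above. The function name `linear_cournot_equilibrium` is our own illustration and is not part of `cgd_utils`.

```python
# Symmetric Cournot equilibrium with linear inverse demand P(Q) = a - Q and
# linear cost C(q) = c * q. The first-order condition a - Q - q_i - c = 0
# with q_i = q for all n players gives q* = (a - c) / (n + 1).

def linear_cournot_equilibrium(n, a=100.0, c=10.0):
    """Return (quantity, price, profit) per firm at the symmetric Nash equilibrium."""
    q = (a - c) / (n + 1)
    price = a - n * q
    profit = price * q - c * q
    return q, price, profit

q, price, profit = linear_cournot_equilibrium(n=3)
print(q, price, profit)  # 22.5 32.5 506.25
```

The profit of 506.25 per firm matches the magnitude of the negated payoffs the notebook prints for this game.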

In [11]:
def player_payoffs(quantity_tensor,
                   market_demand=lambda q: 100 - q,
                   cost=lambda q: q * 10):
    # Inverse demand at total quantity, clamped so the price is never negative.
    price = torch.clamp(market_demand(torch.sum(quantity_tensor)), min=0.)

    payoffs = []
    for quantity in quantity_tensor:
        # Negative profit, since CGD minimizes player objectives.
        payoffs.append(-(quantity * price - cost(quantity)))

    return torch.stack(payoffs)

num_iterations = 100

# Define individual sellers' quantities (the leaf tensors CGD differentiates against)
p1 = torch.tensor([50.], requires_grad=True)
p2 = torch.tensor([0.], requires_grad=True)
p3 = torch.tensor([40.], requires_grad=True)
params = [p1, p2, p3]

learning_rates = [0.1, 0.1, 0.1]

for i in range(num_iterations):
    # Re-stack each iteration so the payoffs are built from the current p1, p2, p3.
    payoffs = player_payoffs(torch.stack(params))
    updates, _ = cgd_utils.metamatrix_conjugate_gradient(
        payoffs, params, lr_list=learning_rates)

    # Apply the CGD step to the leaf tensors in place, outside the autograd graph.
    with torch.no_grad():
        for param, update in zip(params, updates):
            param.add_(update)

players = torch.stack(params)
print(players)
print(payoffs)


tensor([[22.5002],
        [22.4998],
        [22.5001]], grad_fn=<StackBackward>)
tensor([[-506.2539],
        [-506.2442],
        [-506.2520]], grad_fn=<StackBackward>)
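
`cgd_utils.metamatrix_conjugate_gradient` is not shown in this notebook, so as a rough illustration of what a single CGD step does, here is a minimal two-player version for scalar quantities, written from the update rule in Schäfer and Anandkumar's "Competitive Gradient Descent" (2019). It assumes two firms (not three) and hand-coded derivatives of the negated linear-Cournot profits, and it is not the `cgd_utils` implementation. With two firms, the same first-order condition gives $q_i = 90/3 = 30$.

```python
# Minimal two-player CGD sketch for the linear Cournot game with scalar quantities
# (an illustration only, not the cgd_utils implementation).
# Player i minimizes f_i(q1, q2) = -(q_i * (100 - q1 - q2) - 10 * q_i).

def grads(q1, q2):
    """Own-quantity gradients of each player's negated profit."""
    g1 = 2 * q1 + q2 - 90  # d f_1 / d q1
    g2 = q1 + 2 * q2 - 90  # d f_2 / d q2
    return g1, g2

# Mixed second derivatives d^2 f_1 / dq1 dq2 and d^2 f_2 / dq2 dq1 (constant here).
D12_F1 = 1.0
D21_F2 = 1.0

def cgd_step(q1, q2, eta):
    """One simultaneous CGD update (Schaefer & Anandkumar, 2019) for scalar players."""
    g1, g2 = grads(q1, q2)
    denom = 1.0 - eta ** 2 * D12_F1 * D21_F2
    dq1 = -eta * (g1 - eta * D12_F1 * g2) / denom
    dq2 = -eta * (g2 - eta * D21_F2 * g1) / denom
    return q1 + dq1, q2 + dq2

q1, q2 = 50.0, 0.0
for _ in range(200):
    q1, q2 = cgd_step(q1, q2, eta=0.1)

print(q1, q2)  # both approach 90 / 3 = 30
```

The mixed second-derivative terms are what distinguish CGD from simultaneous gradient descent: setting `D12_F1` and `D21_F2` to zero reduces `cgd_step` to plain simultaneous gradient steps.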


## Quadratic Price Function

**TLDR: quadratic price function, identical linear cost function, pairwise CGD converges to Nash Equilibrium (with learning rate tuning)**

Our profit for each player $i$ is defined as the following:
\begin{gather}
\Pi_i = P\left(\sum_j{q_j^2}\right) \cdot q_i -C_i(q_i) \\
P(s) = 100 - s \\
C_i(q_i) = 10 \cdot q_i
\end{gather}

To solve for the Nash equilibrium, we take the derivative of each player's profit with respect to its own quantity and set it to zero:
\begin{gather}
\frac{\partial\Pi_i}{\partial q_i} = \frac{\partial P\left(\sum_j{q_j^2}\right)}{\partial q_i} \cdot q_i + P\left(\sum_j{q_j^2}\right) - \frac{\partial C_i (q_i)}{\partial q_i} = 0
\end{gather}

For the example below, this becomes the following:
\begin{gather}
-2 \cdot q_i^2 + \left(100 - \sum_j {q_j^2}\right) - 10 = 0
\end{gather}

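With the three symmetric players used below, this becomes $90 - 5q^2 = 0$. The first-order conditions have multiple solutions (they only pin down $q_i^2 = 18$), but the only one with all non-negative quantities is $q_i = \sqrt{18} \approx 4.24$ (which is what our algorithm converges to).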
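
A short numerical check of this solution, also confirming each player's second-order condition $\partial^2 \Pi_i / \partial q_i^2 = -6 q_i < 0$ (derived by hand from the first-order condition above), so the positive root is a profit maximum rather than a minimum. The variable names are our own.

```python
import math

# Quadratic-price game: P = 100 - sum_j q_j^2, C(q) = 10 q, n symmetric players.
# Own-quantity FOC: 90 - sum_j q_j^2 - 2 q_i^2 = 0; with q_i = q: 90 - (n + 2) q^2 = 0.
n = 3
q_star = math.sqrt(90 / (n + 2))  # sqrt(18) for n = 3

# Second-order condition: d^2 Pi_i / dq_i^2 = -2 q_i - 4 q_i = -6 q_i,
# negative for q_i > 0 (a maximum) and positive for the root -sqrt(18).
soc = -6 * q_star

# Per-firm profit at the equilibrium (the notebook prints its negation).
profit = q_star * (100 - n * q_star ** 2) - 10 * q_star

print(round(q_star, 4), soc < 0, round(profit, 3))
```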

A few things to note: here, pairwise CGD's convergence appears to depend much more on the learning rate (it diverges for larger learning rates), which arguably undercuts some of the core purpose of CGD. However, the divergent runs move outside our game's constraints (i.e. into negative quantities), so adding constraints (as in competitive mirror descent, CMD) might fix this problem.
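
For a rough sense of why the step size matters more in this game, the sketch below computes the largest locally stable learning rate of *plain simultaneous gradient descent* (not CGD) at the symmetric equilibrium of both games, using Jacobians we derived by hand from the first-order conditions. This is only a heuristic: CGD's own stability region differs, and the helper `max_stable_lr` is hypothetical.

```python
import math

# Heuristic only: linear stability of plain simultaneous gradient descent (not CGD)
# at the symmetric equilibrium, with each player minimizing -Pi_i.
# For an n x n Jacobian J = a * 11^T + b * I, the eigenvalues are n*a + b and b,
# and the step q <- q - lr * g is locally stable iff lr < 2 / lambda_max.

def max_stable_lr(a, b, n):
    """Largest stable step size for J = a * 11^T + b * I (assumes positive eigenvalues)."""
    lam_max = max(n * a + b, b)
    return 2.0 / lam_max

n = 3
# Linear price (first game): g_i = sum_j q_j + q_i - 90, so J = 11^T + I.
lr_linear = max_stable_lr(a=1.0, b=1.0, n=n)

# Quadratic price (this game): g_i = sum_j q_j^2 + 2 q_i^2 - 90,
# so J = 2q * 11^T + 4q * I at q_i = q = sqrt(90 / (n + 2)).
q = math.sqrt(90 / (n + 2))
lr_quadratic = max_stable_lr(a=2 * q, b=4 * q, n=n)

print(lr_linear, round(lr_quadratic, 4))
```

The largest Jacobian eigenvalue grows from 4 to about 42, so the plain-gradient step-size limit shrinks roughly tenfold. That is consistent with this game needing a learning rate of 0.01 instead of 0.1, though it says nothing definitive about CGD itself.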

In [33]:
def player_payoffs2(quantity_tensor,
                    market_demand=lambda s: 100 - s,
                    cost=lambda q: q * 10):
    # Price depends on the sum of *squared* quantities: P = 100 - sum_j q_j^2,
    # clamped so it is never negative.
    price = torch.clamp(
        market_demand(torch.sum(torch.pow(quantity_tensor, 2))), min=0.)

    payoffs = []
    for quantity in quantity_tensor:
        # Negative profit, since CGD minimizes player objectives.
        payoffs.append(-(quantity * price - cost(quantity)))

    return torch.stack(payoffs)

num_iterations = 100

# Define individual sellers' quantities (the leaf tensors CGD differentiates against)
p1 = torch.tensor([0.], requires_grad=True)
p2 = torch.tensor([7.], requires_grad=True)
p3 = torch.tensor([7.], requires_grad=True)
params = [p1, p2, p3]

learning_rates = [0.01, 0.01, 0.01]

for i in range(num_iterations):
    # Re-stack each iteration so the payoffs are built from the current p1, p2, p3.
    payoffs = player_payoffs2(torch.stack(params))
    updates, _ = cgd_utils.metamatrix_conjugate_gradient(
        payoffs, params, lr_list=learning_rates)

    # Apply the CGD step to the leaf tensors in place, outside the autograd graph.
    with torch.no_grad():
        for param, update in zip(params, updates):
            param.add_(update)

players = torch.stack(params)
print(players)
print(payoffs)


tensor([[4.2426],
        [4.2426],
        [4.2426]], grad_fn=<StackBackward>)
tensor([[-152.7349],
        [-152.7351],
        [-152.7351]], grad_fn=<StackBackward>)


## Non-linear Cost Function

**TLDR: linear price function, identical non-linear (concave) cost function simulating economies of scale, pairwise CGD converges to Nash Equilibrium**

Our profit for each player $i$ is defined as the following:
\begin{gather}
\Pi_i = P\left(\sum_j{q_j}\right) \cdot q_i -C_i(q_i) \\
P(q) = 100 - q \\
C_i(q_i) = 10 q_i \cdot \frac{10}{q_i + 10} = \frac{100\, q_i}{q_i + 10}
\end{gather}

To solve for the Nash equilibrium, we take the derivative of each player's profit with respect to its own quantity and set it to zero:
\begin{gather}
\frac{\partial\Pi_i}{\partial q_i} = \frac{\partial P\left(\sum_j{q_j}\right)}{\partial q_i} \cdot q_i + P\left(\sum_j{q_j}\right) - \frac{\partial C_i (q_i)}{\partial q_i} = 0
\end{gather}

For the example below, this becomes the following:
\begin{gather}
-1 \cdot q_i + \left(100 - \sum_j {q_j}\right) - \frac{1000}{(q_i+10)^2} = 0
\end{gather}

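Specializing to the two players simulated below and imposing symmetry ($q_1 = q_2 = q$), this reduces to $100 - 3q = \frac{1000}{(q+10)^2}$. This equation has multiple real roots, but the only one with a non-negative quantity is $q_i \approx 33.15$ (which is what our algorithm converges to).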
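
This equation is a cubic with no tidy closed form, so a simple way to check the value is to bisect the symmetric first-order condition on $[0, 100/3]$, where it changes sign. The sketch also checks the second-order condition $\partial^2 \Pi_i / \partial q_i^2 = -2 + \frac{2000}{(q_i + 10)^3} < 0$, derived by hand from the cost function above. The helper names `h` and `bisect_root` are our own.

```python
# Symmetric equilibrium of the two-player game with C(q) = 100 q / (q + 10).
# FOC with q_1 = q_2 = q:  h(q) = 100 - 3q - 1000 / (q + 10)^2 = 0.

def h(q):
    return 100 - 3 * q - 1000 / (q + 10) ** 2

def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return 0.5 * (lo + hi)

# h(0) = 90 > 0 and h(100/3) < 0, so the non-negative root lies in between.
q_star = bisect_root(h, 0.0, 100 / 3)

# Second-order condition: must be negative for a profit maximum.
soc = -2 + 2000 / (q_star + 10) ** 3

# Per-firm profit at the equilibrium (the notebook prints its negation).
profit = q_star * (100 - 2 * q_star) - 100 * q_star / (q_star + 10)

print(round(q_star, 4), soc < 0, round(profit, 2))
```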

In [42]:
def player_payoffs3(quantity_tensor,
                    market_demand=lambda q: 100 - q,
                    cost=lambda q: 100 * q / (q + 10)):
    # Linear inverse demand at total quantity, clamped so the price is never negative.
    price = torch.clamp(market_demand(torch.sum(quantity_tensor)), min=0.)

    payoffs = []
    for quantity in quantity_tensor:
        # Negative profit, since CGD minimizes player objectives.
        # cost is the concave total cost C(q) = 100 q / (q + 10) (economies of scale).
        payoffs.append(-(quantity * price - cost(quantity)))

    return torch.stack(payoffs)

num_iterations = 100

# Define individual sellers' quantities (only two sellers in this run)
p1 = torch.tensor([0.], requires_grad=True)
p2 = torch.tensor([5.], requires_grad=True)
# p3 = torch.tensor([10.], requires_grad=True)
params = [p1, p2]

learning_rates = [0.1, 0.1]

for i in range(num_iterations):
    # Re-stack each iteration so the payoffs are built from the current p1, p2.
    payoffs = player_payoffs3(torch.stack(params))
    updates, _ = cgd_utils.metamatrix_conjugate_gradient(
        payoffs, params, lr_list=learning_rates)

    # Apply the CGD step to the leaf tensors in place, outside the autograd graph.
    with torch.no_grad():
        for param, update in zip(params, updates):
            param.add_(update)

players = torch.stack(params)
print(players)
print(payoffs)


tensor([[33.1543],
        [33.1544]], grad_fn=<StackBackward>)
tensor([[-1040.1848],
        [-1040.1873]], grad_fn=<StackBackward>)
