# LUX AI S3

**Introduction**  
Mars has been terraformed with help from over 600 space organizations: growing lichen fields and a new atmosphere now support colonies there. Telescopes on Mars have spotted mysterious ancient structures floating beyond our solar system, possibly left by an older species, and ships are now exploring these relics to uncover their secrets. The Lux AI Challenge is a competition in which you design bots that gather resources, optimize strategies, and outsmart opponents in a 1v1 game. Check the GitHub repository for code and join the Discord to connect with other competitors.

Here’s more detail on **Units & Actions** and **Winning** in the Lux AI Season 3 Challenge, explained simply:

### Units & Actions
In the game, your units are ships that you control on a 24x24 map. Each team can field up to a fixed maximum number of ships (`max_units` in the environment config). Your ships start in one corner of the map, depending on your team, and they are your tools to explore, gather resources, and fight the opponent.
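A minimal sketch of the movement rule described in the list below, assuming `(x, y)` tile coordinates where "up" decreases `y`, and the same direction codes that `_get_valid_actions` uses later in this notebook. `next_position`, `can_afford_move`, and the tiny test map are hypothetical illustrations, not the game engine; what happens to energy on a blocked move is deliberately left out.

```python
# Direction codes, matching _get_valid_actions in this notebook:
# 0 = stay (center), 1 = up, 2 = right, 3 = down, 4 = left.
DIRECTIONS = {0: (0, 0), 1: (0, -1), 2: (1, 0), 3: (0, 1), 4: (-1, 0)}
ASTEROID = 2  # tile_type value the notebook treats as impassable


def next_position(pos, action, tile_type):
    """Tile a ship ends on after a move action.

    tile_type is indexed as tile_type[x][y]. A move that would leave the
    map or land on an asteroid leaves the ship where it is. Energy is not
    modelled here.
    """
    width, height = len(tile_type), len(tile_type[0])
    dx, dy = DIRECTIONS[action]
    x, y = pos[0] + dx, pos[1] + dy
    if not (0 <= x < width and 0 <= y < height):
        return pos  # off the map: stay put
    if tile_type[x][y] == ASTEROID:
        return pos  # blocked by an asteroid: stay put
    return (x, y)


def can_afford_move(energy, action, unit_move_cost):
    """Staying still is free; every other direction costs unit_move_cost."""
    return action == 0 or energy >= unit_move_cost


# Usage on an empty 24x24 map with one asteroid at x=1, y=0.
grid = [[0] * 24 for _ in range(24)]
grid[1][0] = ASTEROID
print(next_position((0, 0), 2, grid))  # (0, 0): asteroid blocks the move right
print(next_position((0, 0), 3, grid))  # (0, 1)
print(next_position((0, 0), 1, grid))  # (0, 0): moving up would leave the map
```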

- **Energy**: Every ship starts with 100 energy and can hold up to 400. Energy works like fuel: it powers everything your ship does. You gain energy from energy nodes on the map (these emit energy fields), and you lose it by taking actions, sitting on nebula tiles (which sap energy), or being attacked. If a ship’s energy hits 0, it’s removed and might respawn later.

- **Movement**: Ships can move one tile at a time in five directions: up, down, left, right, or stay still (center). Moving anywhere except staying still costs energy (called `unit_move_cost`, a random value between 1 and 5). You can’t move onto asteroids (they block you) or off the map (you just stay put and lose the energy). Friendly ships can stack on the same tile, which can be smart for teamwork but risky if the enemy attacks.

- **Sap Actions**: This is your attack move. A ship can target a tile within a range (called `unit_sap_range`, random between 3 and 8) and sap energy from enemy ships there. It costs energy to use (called `unit_sap_cost`, random between 30 and 50). The targeted tile’s enemy ships lose that same amount of energy, and nearby enemy ships (on the 8 surrounding tiles) lose less (the cost times a drop-off factor, like 0.25, 0.5, or 1). If you miss, you waste energy, but hitting a stack of enemy ships can wipe them out fast.

- **Collisions & Energy Void**: If ships from both teams end a turn on the same tile, the team with the most total energy on that tile wins and the losing ships are removed. If it’s a tie, all ships on that tile are removed. Ships also project an “energy void” field that saps energy from enemy ships on the four adjacent tiles (up, right, down, left). Its strength depends on the ship’s energy and a random factor (0.0625 to 0.375). Stacking your ships on one tile reduces this damage, because it is split among them.

- **Vision**: Your ships determine what you see. Each has a sensor range (random, 2 to 4 tiles) that shows tiles around it. The farther a tile is, the weaker the vision, and nebula tiles can block it more (vision reduction of 0 to 3). If multiple ships’ vision overlaps, it gets stronger, helping you see through fog of war.
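The sap and collision rules above can be sketched in plain Python. Two details are assumptions here, not confirmed by the text: a sap is encoded as `[5, dx, dy]` relative to the firing ship, and the sap range is a square (`max(|dx|, |dy|) <= unit_sap_range`). `sap_action`, `apply_sap`, and `resolve_collision` are hypothetical helper names, and the energy void field is not modelled.

```python
from collections import defaultdict


def sap_action(unit_pos, target_pos, sap_range):
    """Encode a sap as [5, dx, dy], or return None if the target is out of range.

    Assumption: the range is a square, i.e. max(|dx|, |dy|) <= sap_range.
    """
    dx = target_pos[0] - unit_pos[0]
    dy = target_pos[1] - unit_pos[1]
    if max(abs(dx), abs(dy)) > sap_range:
        return None
    return [5, dx, dy]


def apply_sap(enemies, target, sap_cost, dropoff):
    """Return enemy energies after one sap.

    enemies maps unit_id -> (x, y, energy). Ships on the target tile lose
    sap_cost; ships on the 8 surrounding tiles lose sap_cost * dropoff.
    """
    result = {}
    for uid, (x, y, energy) in enemies.items():
        dist = max(abs(x - target[0]), abs(y - target[1]))
        if dist == 0:
            energy -= sap_cost
        elif dist == 1:
            energy -= sap_cost * dropoff
        result[uid] = energy
    return result


def resolve_collision(ships_on_tile):
    """Resolve one tile given a list of (team, energy) entries.

    The team with the larger total energy keeps its ships; on a tie every
    ship on the tile is removed. Returns the surviving entries.
    """
    totals = defaultdict(float)
    for team, energy in ships_on_tile:
        totals[team] += energy
    if len(totals) < 2:
        return list(ships_on_tile)  # only one team here: nothing to resolve
    best = max(totals.values())
    winners = [team for team, total in totals.items() if total == best]
    if len(winners) > 1:
        return []  # tie: everyone on the tile is destroyed
    return [(team, e) for team, e in ships_on_tile if team == winners[0]]


print(sap_action((3, 3), (6, 1), sap_range=3))  # [5, 3, -2]
enemies = {0: (6, 1, 100), 1: (7, 1, 100), 2: (9, 9, 100)}
print(apply_sap(enemies, (6, 1), sap_cost=40, dropoff=0.5))  # {0: 60, 1: 80.0, 2: 100}
print(resolve_collision([(0, 50), (0, 30), (1, 70)]))  # [(0, 50), (0, 30)]
```

In the collision example team 0 has 80 total energy against 70, so both of its ships survive even though neither one alone outweighs the enemy ship.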

### Winning
The game is a best-of-5 match series, and each match lasts 100 time steps. Your goal is to beat the other team in more matches than they beat you.

- **Match Win**: At the end of a match (after 100 steps), the team with the most relic points wins. Relic points come from relic nodes—special spots on the map where your ships earn points by sitting on hidden “point tiles” near them. These tiles are secret, so you have to guess and test to find them. Only one point per tile counts, even if you stack ships there.

- **Tiebreakers**: If both teams have the same relic points, the winner is the team with more total unit energy across all their ships. If that’s tied too, the game picks a winner randomly.

- **Game Win**: Out of the 5 matches, the team that wins the most is the overall winner. Since maps and random settings (like energy costs or sap range) stay the same across all 5 matches, you can learn the map and your opponent’s moves early on, then adjust to win the later matches.

- **Turn Order**: Each step follows this order: move ships, do sap actions, resolve collisions and void fields, update energy (from map or nebulae), spawn new ships, check vision, move map objects (like asteroids), and count points. This happens 100 times per match, and what you do affects the next step.
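The scoring and tiebreak rules above can be sketched in plain Python. This is an illustration under assumptions: teams are indexed 0 and 1, `point_tiles` is a set of point-tile coordinates you already know (in the real game you have to discover them), and `step_points`, `match_winner`, and `game_winner` are hypothetical names.

```python
import random


def step_points(ship_positions, point_tiles):
    """Relic points a team earns in one step: each occupied point tile
    counts once, no matter how many ships are stacked on it."""
    return len(set(ship_positions) & set(point_tiles))


def match_winner(points, energy, rng=random):
    """Pick the match winner (team 0 or 1).

    points / energy map team -> relic points / total unit energy.
    Relic points decide first, total energy breaks ties, then a coin flip.
    """
    if points[0] != points[1]:
        return 0 if points[0] > points[1] else 1
    if energy[0] != energy[1]:
        return 0 if energy[0] > energy[1] else 1
    return rng.choice([0, 1])


def game_winner(match_winners):
    """Best of 5: the team with more match wins takes the game.
    Assumes an odd number of matches, so there is never a tie."""
    wins_0 = match_winners.count(0)
    return 0 if wins_0 > len(match_winners) - wins_0 else 1


# Two ships stacked on (4, 4) still score a single point there.
ships = [(4, 4), (4, 4), (5, 4), (9, 9)]
print(step_points(ships, {(4, 4), (5, 4), (6, 4)}))     # 2
print(match_winner({0: 12, 1: 12}, {0: 300, 1: 250}))  # 0 (energy tiebreak)
print(game_winner([0, 1, 1, 0, 1]))                    # 1 (3 wins to 2)
```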

In short, your ships move, attack, and gather points while managing energy. To win, focus on finding relic points, outlasting your opponent’s energy, and adapting over the 5 matches!


For more details, see https://www.kaggle.com/competitions/lux-ai-season-3/overview

In [None]:
!unzip lux-ai-season-3.zip

Archive:  lux-ai-season-3.zip
  inflating: README.md               
  inflating: agent.py                
  inflating: lux/__init__.py         
  inflating: lux/kit.py              
  inflating: lux/utils.py            
  inflating: main.py                 


In [None]:
!python --version
!pip install --upgrade luxai-s3
!mkdir agent && cp -r ../input/lux-ai-season-3/* agent/

Python 3.11.11
Collecting luxai-s3
  Downloading luxai_s3-0.2.1-py3-none-any.whl.metadata (253 bytes)
Collecting gymnax==0.0.8 (from luxai-s3)
  Downloading gymnax-0.0.8-py3-none-any.whl.metadata (19 kB)
Collecting tyro (from luxai-s3)
  Downloading tyro-0.9.17-py3-none-any.whl.metadata (9.5 kB)
Collecting gym>=0.26 (from gymnax==0.0.8->luxai-s3)
  Downloading gym-0.26.2.tar.gz (721 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 721.7/721.7 kB 10.0 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting shtab>=1.5.6 (from tyro->luxai-s3)
  Downloading shtab-1.7.1-py3-none-any.whl.metadata (7.3 kB)
Downloading luxai_s3-0.2.1-py3-none-any.whl (35 kB)
Downloading gymnax-0.0.8-py3-none-any.whl (96 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.3/96.3 kB 9.7 MB/s eta 

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import random
from collections import deque
import os
from luxai_s3.wrappers import LuxAIS3GymEnv
from tqdm import tqdm

class DQN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(DQN, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size)
        )

    def forward(self, x):
        return self.fc(x)

class ReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

class Agent:
    def __init__(self, player: str, env_cfg, training=False):
        self.player = player
        self.team_id = 0 if player == "player_0" else 1
        self.env_cfg = env_cfg
        self.training = training

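        # 2 (pos) + 2 (closest relic) + 1 (energy) + 1 (step) + 10 (5 friendly xy) + 10 (5 enemy xy) + 1 (on-relic flag)
        self.state_size = 27
        self.action_size = 6  # 0 stay, 1 up, 2 right, 3 down, 4 left, 5 sap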
        self.hidden_size = 256
        self.batch_size = 128
        self.gamma = 0.99
        self.epsilon = 1.0
        self.epsilon_min = 0.05
        self.epsilon_decay = 0.995
        self.learning_rate = 0.0005
        self.update_target_every = 1000

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.policy_net = DQN(self.state_size, self.hidden_size, self.action_size).to(self.device)
        self.target_net = DQN(self.state_size, self.hidden_size, self.action_size).to(self.device)
        self.target_net.load_state_dict(self.policy_net.state_dict())
        self.optimizer = optim.Adam(self.policy_net.parameters(), lr=self.learning_rate)
        self.memory = ReplayBuffer(10000)
        self.step_counter = 0

        self.visit_counts = np.zeros((env_cfg["map_width"], env_cfg["map_height"]))

        if not training:
            self.load_model()
            self.epsilon = 0.0

    def _state_representation(self, unit_id, obs, step):
        unit_pos = obs["units"]["position"][self.team_id][unit_id]
        unit_energy = obs["units"]["energy"][self.team_id][unit_id]

        relic_nodes = np.array(obs["relic_nodes"])
        relic_mask = np.array(obs["relic_nodes_mask"])
        if not relic_mask.any():
            closest_relic = np.array([-1, -1])
        else:
            visible_relics = relic_nodes[relic_mask]
            distances = np.linalg.norm(visible_relics - unit_pos, axis=1)
            closest_relic = visible_relics[np.argmin(distances)]

        friendly_positions = obs["units"]["position"][self.team_id]
        friendly_mask = obs["units_mask"][self.team_id]
        friendly_pos_list = [pos for i, pos in enumerate(friendly_positions) if friendly_mask[i] and i != unit_id]
        friendly_pos_list = friendly_pos_list[:5] + [np.array([-1, -1])] * (5 - len(friendly_pos_list))

        opp_team_id = 1 - self.team_id
        enemy_positions = obs["units"]["position"][opp_team_id]
        enemy_mask = obs["units_mask"][opp_team_id]
        enemy_pos_list = [pos for i, pos in enumerate(enemy_positions) if enemy_mask[i]]
        enemy_pos_list = enemy_pos_list[:5] + [np.array([-1, -1])] * (5 - len(enemy_pos_list))

        friendly_flat = np.concatenate(friendly_pos_list)
        enemy_flat = np.concatenate(enemy_pos_list)

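        # 1 if the unit stands on a visible relic node. The point tiles around relic
        # nodes are hidden, so this flag is only a proxy for being on a point tile.
        on_point_tile = int(any(np.array_equal(unit_pos, rn) for rn in relic_nodes[relic_mask]))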

        state = np.concatenate([
            unit_pos, closest_relic, [unit_energy], [step / 505.0],
            friendly_flat, enemy_flat, [on_point_tile]
        ])
        return torch.FloatTensor(state).to(self.device)

    def _get_valid_actions(self, unit_pos, unit_energy, obs):
        valid_mask = [True] * 6
        directions = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]
        tile_type = obs["map_features"]["tile_type"]
        map_width, map_height = tile_type.shape

        for i in range(1, 5):
            next_pos = [unit_pos[0] + directions[i][0], unit_pos[1] + directions[i][1]]
            if not (0 <= next_pos[0] < map_width and 0 <= next_pos[1] < map_height):
                valid_mask[i] = False
            elif tile_type[next_pos[0], next_pos[1]] == 2:
                valid_mask[i] = False
            elif unit_energy < self.env_cfg["unit_move_cost"]:
                valid_mask[i] = False

        if unit_energy < self.env_cfg["unit_sap_cost"] * 2:
            valid_mask[5] = False

        return valid_mask

    def act(self, step: int, obs, remainingOverageTime=60):
        unit_mask = np.array(obs["units_mask"][self.team_id])
        available_unit_ids = np.where(unit_mask)[0]
        actions = np.zeros((self.env_cfg["max_units"], 3), dtype=int)

        for unit_id in available_unit_ids:
            state = self._state_representation(unit_id, obs, step)
            unit_pos = obs["units"]["position"][self.team_id][unit_id]
            valid_mask = self._get_valid_actions(unit_pos, obs["units"]["energy"][self.team_id][unit_id], obs)

            with torch.no_grad():
                q_values = self.policy_net(state).cpu().numpy()

            if state[-1].item() == 1:
                q_values[0] += 10.0

            # Mask out invalid actions so they can never be chosen.
            q_values_valid = q_values.copy()
            q_values_valid[~np.array(valid_mask)] = -float('inf')

            if self.training and random.random() < self.epsilon:
                # Explore: pick uniformly among the valid actions (0 = stay is always valid).
                action = random.choice([a for a, ok in enumerate(valid_mask) if ok])
            else:
                # Exploit: take the highest-valued valid action.
                action = int(np.argmax(q_values_valid))

            if action == 5:
                opp_team_id = 1 - self.team_id
                opp_positions = np.array(obs["units"]["position"][opp_team_id])
                opp_energies = np.array(obs["units"]["energy"][opp_team_id])
                opp_mask = np.array(obs["units_mask"][opp_team_id])
                sap_range = self.env_cfg["unit_sap_range"]
                # Keep visible enemies whose tile lies inside the (square) sap range.
                valid_targets = [
                    (pos, energy)
                    for i, (pos, energy) in enumerate(zip(opp_positions, opp_energies))
                    if opp_mask[i] and pos[0] != -1
                    and max(abs(pos[0] - unit_pos[0]), abs(pos[1] - unit_pos[1])) <= sap_range
                ]
                if valid_targets:
                    relic_nodes = np.array(obs["relic_nodes"])
                    relic_mask = np.array(obs["relic_nodes_mask"])
                    point_tiles = [tuple(rn) for rn in relic_nodes[relic_mask]]
                    # Prefer high-energy enemies, with a bonus for those sitting on relic nodes.
                    scores = [energy + 10 if tuple(pos) in point_tiles else energy for pos, energy in valid_targets]
                    target_pos = valid_targets[np.argmax(scores)][0]
                    # The sap target is given as an offset (dx, dy) from the sapping unit.
                    actions[unit_id] = [5, target_pos[0] - unit_pos[0], target_pos[1] - unit_pos[1]]
                else:
                    actions[unit_id] = [0, 0, 0]
            else:
                actions[unit_id] = [action, 0, 0]

            if self.training:
                pos = unit_pos.astype(int)
                self.visit_counts[pos[0], pos[1]] += 1

        if self.training:
            self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)
        return actions

    def learn(self):
        if len(self.memory) < self.batch_size:
            return None

        batch = self.memory.sample(self.batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)

        rewards = [float(r) if r is not None else 0.0 for r in rewards]
        dones = [float(d) if d is not None else 0.0 for d in dones]  # Ensure dones are scalars

        states = torch.stack(states).to(self.device)
        actions = torch.LongTensor(actions).unsqueeze(1).to(self.device)
        rewards = torch.FloatTensor(rewards).unsqueeze(1).to(self.device)
        next_states = torch.stack(next_states).to(self.device)
        dones = torch.FloatTensor(dones).unsqueeze(1).to(self.device)

        for i, state in enumerate(states):
            pos = state[:2].cpu().numpy().astype(int)
            exploration_bonus = 0.01 / (1 + self.visit_counts[pos[0], pos[1]])
            rewards[i] += exploration_bonus
            if state[-1].item() == 1:
                rewards[i] += 1.0

        q_values = self.policy_net(states)
        q_values_selected = q_values.gather(1, actions)
        next_q_values = self.target_net(next_states).detach().max(1)[0].unsqueeze(1)
        targets = rewards + (1 - dones) * self.gamma * next_q_values

        loss = nn.MSELoss()(q_values_selected, targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        self.step_counter += 1
        if self.step_counter % self.update_target_every == 0:
            self.target_net.load_state_dict(self.policy_net.state_dict())

        return loss.item()

    def save_model(self):
        torch.save(self.policy_net.state_dict(), f'dqn_model_{self.player}.pth')

    def load_model(self):
        model_file_name = f"dqn_model_{self.player}.pth"
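        # __file__ is undefined when this class runs inside a notebook; fall back to the working directory there.
        base_dir = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
        model_path = os.path.join(base_dir, model_file_name)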
        if os.path.exists(model_path):
            self.policy_net.load_state_dict(torch.load(model_path, map_location=torch.device('cpu'), weights_only=True))
        else:
            print(f"No trained model found for {self.player} at {model_path}")


def evaluate_agents(agent_cls, seed=42, training=True, games_to_play=10):
    env = LuxAIS3GymEnv(numpy_output=True)
    obs, info = env.reset(seed=seed)
    env_cfg = info["params"]

    player_0 = agent_cls("player_0", env_cfg, training=training)
    player_1 = agent_cls("player_1", env_cfg, training=training)

    rewards_0, rewards_1 = [], []
    losses_0, losses_1 = [], []

    for _ in tqdm(range(games_to_play), desc="Playing games", unit="game"):
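        obs, info = env.reset(seed=random.randint(0, 1000))  # Random seed for variety
        # A new seed re-randomizes the game parameters (move cost, sap cost and range, ...),
        # so hand the fresh config to both agents.
        env_cfg = info["params"]
        player_0.env_cfg = env_cfg
        player_1.env_cfg = env_cfg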
        game_done = False
        step = 0
        total_reward_0 = 0
        total_reward_1 = 0

        while not game_done:
            actions = {}
            states = {}
            next_states = {}
            for agent in [player_0, player_1]:
                actions[agent.player] = agent.act(step, obs[agent.player])
                if training:
                    # Use a list sized to max_units, fill only active units
                    states[agent.player] = [None] * env_cfg["max_units"]
                    for unit_id in range(env_cfg["max_units"]):
                        if obs[agent.player]["units_mask"][agent.team_id][unit_id]:
                            states[agent.player][unit_id] = agent._state_representation(unit_id, obs[agent.player], step)

            next_obs, rewards, terminated, truncated, info = env.step(actions)
            dones = {k: terminated[k] or truncated[k] for k in terminated}

            total_reward_0 += rewards["player_0"]
            total_reward_1 += rewards["player_1"]

            if training:
                for agent in [player_0, player_1]:
                    # Same for next_states
                    next_states[agent.player] = [None] * env_cfg["max_units"]
                    for unit_id in range(env_cfg["max_units"]):
                        if next_obs[agent.player]["units_mask"][agent.team_id][unit_id]:
                            next_states[agent.player][unit_id] = agent._state_representation(unit_id, next_obs[agent.player], step + 1)

                    # Push experiences using correct indices
                    for unit_id in range(env_cfg["max_units"]):
                        if obs[agent.player]["units_mask"][agent.team_id][unit_id]:
                            action = actions[agent.player][unit_id][0]
                            state = states[agent.player][unit_id]  # Now matches unit_id
                            next_state = next_states[agent.player][unit_id] if next_states[agent.player][unit_id] is not None else state
                            reward = float(rewards[agent.player])
                            done = float(dones[agent.player])
                            agent.memory.push(state, action, reward, next_state, done)

                    loss = agent.learn()
                    if loss is not None:
                        if agent.player == "player_0":
                            losses_0.append(loss)
                        else:
                            losses_1.append(loss)

            obs = next_obs
            step += 1
            if any(dones.values()):
                game_done = True

        rewards_0.append(total_reward_0)
        rewards_1.append(total_reward_1)
        if training:
            player_0.save_model()
            player_1.save_model()
            if (_ + 1) % 10 == 0:
                print(f"Games {_+1}: Avg Reward P0={sum(rewards_0[-10:])/10}, P1={sum(rewards_1[-10:])/10}")
            if (_ + 1) % 50 == 0:
                print(f"Saved models at game {_+1}")

    env.close()
    print(f"Total rewards: player_0={sum(rewards_0)}, player_1={sum(rewards_1)}")
    return losses_0, losses_1, rewards_0, rewards_1
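`Agent.act` multiplies `epsilon` by `epsilon_decay` once per call, and `evaluate_agents` calls each agent's `act` once per environment step. A quick back-of-the-envelope check of how fast exploration shuts off, assuming the defaults in `Agent.__init__` and roughly 505 steps per game (the normalizer used in `_state_representation`); `calls_until_floor` and `STEPS_PER_GAME` are hypothetical names for this sketch.

```python
import math


def calls_until_floor(epsilon=1.0, epsilon_min=0.05, decay=0.995):
    """Replay Agent.act's update, epsilon = max(epsilon_min, epsilon * decay),
    and count the calls needed to reach epsilon_min."""
    calls = 0
    while epsilon > epsilon_min:
        epsilon = max(epsilon_min, epsilon * decay)
        calls += 1
    return calls


STEPS_PER_GAME = 505  # assumption: matches the step / 505.0 normalizer

calls = calls_until_floor()
closed_form = math.ceil(math.log(0.05) / math.log(0.995))
print(calls, closed_form)                # 598 598
print(round(calls / STEPS_PER_GAME, 2))  # 1.18 games of exploration
print(round(1000 / STEPS_PER_GAME, 2))   # 1.98 games between target syncs
```

So epsilon hits its 0.05 floor after about 598 steps, a little over one game, and the rest of a 10-game run is played almost greedily. With one `learn()` call per step, the target network (synced every 1000 calls) is refreshed only about once every two games.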



In [None]:
# Initialize environment
env = LuxAIS3GymEnv(numpy_output=True)
obs, info = env.reset()

# Initialize agents
player_0 = Agent("player_0", info["params"], training=True)
player_1 = Agent("player_1", info["params"], training=True)



In [None]:
evaluate_agents(Agent)


Playing games: 100%|██████████| 10/10 [03:34<00:00, 21.47s/game]

Games 10: Avg Reward P0=518.0, P1=497.0
Total rewards: player_0=5180, player_1=4970





([1.3026412725448608,
  2.3662638664245605,
  0.912497878074646,
  0.07512978464365005,
  0.8902903199195862,
  0.8967297077178955,
  0.2942718267440796,
  0.052261751145124435,
  0.3113905191421509,
  0.6534585356712341,
  0.4437175691127777,
  0.21112975478172302,
  0.12200560420751572,
  0.24660775065422058,
  0.3519248068332672,
  0.2992708683013916,
  0.18716144561767578,
  0.15001091361045837,
  0.15402710437774658,
  0.30753394961357117,
  0.31805652379989624,
  0.17335361242294312,
  0.08322121948003769,
  0.18759922683238983,
  0.2553573548793793,
  0.20860961079597473,
  0.08629877865314484,
  0.12725375592708588,
  0.099032923579216,
  0.16866126656532288,
  0.09042458236217499,
  0.03542415052652359,
  0.04523830488324165,
  0.05536286532878876,
  0.0588531494140625,
  0.0754711925983429,
  0.029826374724507332,
  0.05692661553621292,
  0.04990717023611069,
  0.034285880625247955,
  0.03987926244735718,
  0.0629575252532959,
  0.08871515095233917,
  0.028817415237426758,
  

In [None]:
train(player_1, player_0, num_games=10, save_interval=10)

Training started...
Game 1/10 - Rewards: player_0=306.00, player_1=709.00
Game 2/10 - Rewards: player_0=103.00, player_1=912.00
Game 3/10 - Rewards: player_0=304.00, player_1=711.00
Game 4/10 - Rewards: player_0=507.00, player_1=508.00
Game 5/10 - Rewards: player_0=507.00, player_1=508.00
Game 6/10 - Rewards: player_0=305.00, player_1=710.00
Game 7/10 - Rewards: player_0=102.00, player_1=913.00
Game 8/10 - Rewards: player_0=407.00, player_1=608.00
Game 9/10 - Rewards: player_0=1.00, player_1=1014.00
Game 10/10 - Rewards: player_0=305.00, player_1=710.00
After 10 games - Avg Rewards: player_0=284.70, player_1=730.30
Models saved after 10 games.
Training finished.


In [None]:
# Train the agents
train(player_0, player_1, num_games=10, save_interval=10)

Training started...
Game 1/10 - Rewards: player_0=103.00, player_1=912.00
Game 2/10 - Rewards: player_0=507.00, player_1=508.00
Game 3/10 - Rewards: player_0=0.00, player_1=1015.00
Game 4/10 - Rewards: player_0=405.00, player_1=610.00
Game 5/10 - Rewards: player_0=0.00, player_1=1015.00
Game 6/10 - Rewards: player_0=0.00, player_1=1015.00
Game 7/10 - Rewards: player_0=0.00, player_1=1015.00
Game 8/10 - Rewards: player_0=203.00, player_1=812.00
Game 9/10 - Rewards: player_0=1.00, player_1=1014.00
Game 10/10 - Rewards: player_0=1.00, player_1=1014.00
After 10 games - Avg Rewards: player_0=122.00, player_1=893.00
Models saved after 10 games.
Training finished.


In [None]:
train(player_1, player_0, num_games=10, save_interval=10)

In [None]:

# Evaluate the trained agents
evaluate(player_0, player_1, num_games=10)

In [None]:
!luxai-s3 main.py main.py --output=replay.html

Time Elapsed:  36.35243535041809
Rewards:  {'player_0': array(2, dtype=int32), 'player_1': array(3, dtype=int32)}


In [None]:
!tar -czvf submission.tar.gz agent.py dqn_model_player_0.pth dqn_model_player_1.pth main.py lux

In [None]:
!tar -czvf LAGx40-2flip-2-9.tar.gz agent.py dqn_model_player_0.pth main.py dqn_model_player_1.pth lux

In [None]:
# Next goals: enhance the reward system,
# and print the match scores during training and evaluation