# AI Model Development for Racing Game Automation
## Overview
This report provides a comprehensive overview of the development and implementation of an AI model that automates gameplay in Trackmania. We chose Trackmania because it is free on Steam and light enough on system resources to run on our laptops as well as our desktops. The report walks through how we approached both the OpenCV and the deep learning portions of the project.

## Team Members
- Emre Ekici
- Salih Ekici
- Kyano Trevisan


## Encountered Problems and Solutions

During the development of the scripts, particularly those involving OpenCV for game steering, we encountered a few challenges. Here's an overview of these issues and the solutions we implemented:

### 1. Difficulty with Timing in OpenCV Script
- **Problem**: One of the main difficulties was related to the timing of key inputs. Since we were using a keyboard for inputs instead of a controller, we couldn't input driving angles directly. This limitation made it challenging to precisely control the steering in the game.
- **Solution**: We adjusted the script to better handle the timing of key presses. This involved fine-tuning the duration for which each key was held down and the intervals between key presses. By experimenting with these timings, we were able to achieve more nuanced control, even with the binary nature of keyboard inputs.
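A minimal sketch of this idea, assuming a linear mapping that we did not necessarily use: the steering key is held longer as the computed steering angle grows, which approximates an analog steering input with a digital key. The function names `steering_key` and `hold_duration`, the dead zone, the saturation point of 20, and the 0.02 to 0.15 second range are all hypothetical placeholders to tune by experiment. Positive angles mean "steer left", matching the convention in `open_cv.py`.

```python
# Hypothetical helpers: map a steering angle to a key and a hold time.
# The thresholds and durations below are placeholder assumptions, not tuned values.

DEAD_ZONE = 2.0   # |angle| at or below this: no steering key
MAX_ANGLE = 20.0  # |angle| at or above this: hold the key for MAX_HOLD
MIN_HOLD = 0.02   # seconds, shortest steering tap
MAX_HOLD = 0.15   # seconds, longest steering hold


def steering_key(steering_angle: float):
    """Return 'left', 'right', or None (positive angle = steer left)."""
    if steering_angle > DEAD_ZONE:
        return 'left'
    if steering_angle < -DEAD_ZONE:
        return 'right'
    return None


def hold_duration(steering_angle: float) -> float:
    """Scale the hold time linearly between MIN_HOLD and MAX_HOLD."""
    magnitude = abs(steering_angle)
    if magnitude <= DEAD_ZONE:
        return 0.0
    fraction = min((magnitude - DEAD_ZONE) / (MAX_ANGLE - DEAD_ZONE), 1.0)
    return MIN_HOLD + fraction * (MAX_HOLD - MIN_HOLD)


# Usage with pydirectinput (not executed here):
#   key = steering_key(angle)
#   if key:
#       pdi.keyDown(key); time.sleep(hold_duration(angle)); pdi.keyUp(key)
```

A proportional hold time replaces a fixed two-tier scheme (short tap versus a 0.1 s hold) with a continuous one; whether it actually drives better is something to verify in game.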

### 2. Issue with pyautogui in TrackMania
- **Problem**: Initially, we tried using `pyautogui` for simulating keyboard inputs. However, we found that `pyautogui` was not effective in the context of the game TrackMania.
- **Solution**: After some research and testing, we switched to `pydirectinput`, a library written specifically for sending input to games. According to its documentation, `pyautogui` sends virtual-key codes, which DirectX games may not register, whereas `pydirectinput` sends DirectInput scan codes through the Windows `SendInput()` function. With `pydirectinput`, TrackMania registered our simulated key presses reliably and consistently, which solved the issue.

### General Tips for Similar Projects
- **Experiment with Timing**: Timing of key presses is crucial, especially when simulating analog input (like steering angles) with digital means (like keyboard presses). It often requires trial and error to get it right.
- **Choose the Right Library for Input Simulation**: Not all libraries work the same in every context, especially in gaming environments. If one library doesn’t work (like `pyautogui` in our case), try others (like `pydirectinput`).
- **Document Adjustments**: Keep track of the adjustments you make, especially when fine-tuning timings or switching libraries. This documentation can be invaluable for troubleshooting and future development.

## Detailed Script Descriptions
Each script in our project serves a specific purpose in the AI model's development:

1. **open_cv.py**: This script uses OpenCV to process screen captures from the game. It detects the lane lines, calculates a steering angle from their position relative to the car, and sends the corresponding key presses to keep the car on the track.

2. **gather_play_data.py**: This script collects gameplay data. While the car is driven manually, it captures the screen and logs the corresponding control actions. This data forms the basis for training the AI model.

3. **create_model.py**: This script is responsible for constructing and training the deep learning model. It includes steps for loading data, preprocessing, setting up the neural network architecture, and training the model with data augmentation techniques.

4. **use_model.py**: The final script uses the trained model to automate gameplay. It makes real-time decisions for controlling the car in the game, effectively replacing manual input with AI-driven responses.

<hr>

## OpenCV

Script: `open_cv.py`

#### Automated Game Steering with OpenCV and PyDirectInput

Here, we explore an automated steering system for a game using Python libraries like OpenCV, PyDirectInput, and MSS. The script captures a portion of the game screen, processes the image to detect lines (which represent the road in the game), and then calculates the steering angle to control the game character or vehicle.


```python
# Import necessary libraries
import cv2
import numpy as np
import pydirectinput as pdi
import mss
import time
```

#### Function: open_cv
The `open_cv` function takes a screenshot of the specified game area, processes it to detect lines, and calculates the steering angle based on these lines. It uses computer vision techniques to understand the game's environment and make decisions accordingly.


```python
# Capture the game area, detect the road's edge lines, and send the matching key input
def open_cv(game_area):
    # Initialize the value used to decide which input to send
    steering_angle = 0

    with mss.mss() as sct:
        screencap = sct.grab(game_area)

    # Convert the capture (BGRA pixels from mss) to a NumPy array and extract edges
    screencap_np = np.array(screencap)
    screencap_gray = cv2.cvtColor(screencap_np, cv2.COLOR_BGRA2GRAY)
    screencap_blur = cv2.GaussianBlur(screencap_gray, (5, 5), 0)
    screencap_canny = cv2.Canny(screencap_blur, threshold1=100, threshold2=300)

    # Use the probabilistic Hough transform to detect line segments in the edge image
    lines = cv2.HoughLinesP(screencap_canny, 1, np.pi / 180, threshold=100, minLineLength=150, maxLineGap=15)
    line_image = np.zeros_like(screencap_canny)

    # Initialize lists to store the x-coordinates of the left and right lines
    left_line_x = []
    right_line_x = []

    # Check for detected lines
    if lines is not None:
        # Iterate through every line segment and calculate its slope
        for line in lines:
            for x1, y1, x2, y2 in line:
                slope = (y2 - y1) / (x2 - x1) if (x2 - x1) != 0 else float('inf')
                # Image y grows downward, so a negative slope is a left-hand line
                if slope < 0:
                    left_line_x.extend([x1, x2])
                # A positive (or vertical) slope is a right-hand line
                else:
                    right_line_x.extend([x1, x2])
                # Draw the segment on a debug image
                cv2.line(line_image, (int(x1), int(y1)), (int(x2), int(y2)), 255, 5)

    # Average the left and right x-coordinates; fall back to the frame edges if a side is missing
    left_x_avg = np.mean(left_line_x) if left_line_x else 0
    right_x_avg = np.mean(right_line_x) if right_line_x else line_image.shape[1]

    # The midpoint between the two lines is the estimated road center
    mid_x = (left_x_avg + right_x_avg) / 2

    # Assume the car sits at the horizontal center of the captured area
    car_pos = line_image.shape[1] / 2

    # Deviation of the road center from the car, in pixels
    deviation = car_pos - mid_x

    # Steering angle: the deviation as a percentage of the frame width
    # (positive = road center is to the left, negative = to the right)
    steering_angle = deviation / line_image.shape[1] * 100

    # Choose the input from the steering angle:
    # small deviations get a short tap, large ones hold the keys for 0.1 s
    if 2 < steering_angle < 7:
        pdi.keyDown('left')
        pdi.keyDown('w')
        pdi.keyUp('left')
        pdi.keyUp('w')
    elif -7 < steering_angle < -2:
        pdi.keyDown('right')
        pdi.keyDown('w')
        pdi.keyUp('right')
        pdi.keyUp('w')
    elif -2 <= steering_angle <= 2:
        pdi.keyDown('w')
        time.sleep(0.1)
        pdi.keyUp('w')
    elif steering_angle < -7:
        pdi.keyDown('w')
        pdi.keyDown('right')
        time.sleep(0.1)
        pdi.keyUp('w')
        pdi.keyUp('right')
    elif steering_angle >= 7:
        pdi.keyDown('w')
        pdi.keyDown('left')
        time.sleep(0.1)
        pdi.keyUp('w')
        pdi.keyUp('left')
    else:
        # Fallback (only reached at exactly -7): accelerate straight
        pdi.keyDown('w')
        time.sleep(0.1)
        pdi.keyUp('w')

    return steering_angle
```


#### Inside the open_cv function

1. **Initialization**: The function begins by initializing the steering angle and capturing the game screen using the MSS library.
2. **Image Processing**: The captured screen is converted to grayscale and blurred, and edges are detected with OpenCV's Canny detector.
3. **Line Detection**: It then uses the `HoughLinesP` method to detect line segments in the edge image.
4. **Line Analysis**: Each segment is assigned to the left or right edge of the road by the sign of its slope (negative for left, positive for right, since image y-coordinates grow downward).
5. **Steering Angle Calculation**: The x-coordinates of each side are averaged, the midpoint between the two averages is taken as the road center, and its offset from the middle of the frame is expressed as a percentage of the frame width.
6. **Control Inputs**: Depending on the steering angle, it simulates keyboard inputs using the PyDirectInput library to control the game.
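Steps 4 and 5 can be isolated into a pure function that is easy to test without the game running. The sketch below (the name `steering_from_segments` is hypothetical) takes segments as `(x1, y1, x2, y2)` tuples, which is the format `cv2.HoughLinesP` returns once its `(N, 1, 4)` output is reshaped with `lines.reshape(-1, 4)`, and reproduces the script's slope test and percentage-of-width steering value.

```python
import numpy as np


def steering_from_segments(segments, frame_width):
    """Hypothetical helper: steering angle (% of frame width) from line segments.

    segments: iterable of (x1, y1, x2, y2) in image coordinates (y grows downward).
    Positive result = road center is left of the car, negative = right.
    """
    left_x, right_x = [], []
    for x1, y1, x2, y2 in segments:
        dx = x2 - x1
        slope = (y2 - y1) / dx if dx != 0 else float('inf')
        if slope < 0:
            left_x.extend([x1, x2])    # left-hand edge of the road
        else:
            right_x.extend([x1, x2])   # right-hand (or vertical) edge

    # Fall back to the frame edges when one side has no segments, as the script does
    left_avg = np.mean(left_x) if left_x else 0.0
    right_avg = np.mean(right_x) if right_x else float(frame_width)

    mid_x = (left_avg + right_avg) / 2
    deviation = frame_width / 2 - mid_x
    return deviation / frame_width * 100
```

For example, with a 1000-pixel-wide frame, a left edge averaging x = 100 and a right edge averaging x = 700 put the road center at 400, so the function returns 10.0 and the script would steer left.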


```python
if __name__ == "__main__":
    # The game is placed in the top left corner of the screen
    game_area = {"left": 0, "top": 270, "width": 970, "height": 200}
    # Time to switch over to the Trackmania window
    time.sleep(2)
    while True:
        time.sleep(.1)
        open_cv(game_area)
```

<hr>

## Data Collection

Script: `gather_play_data.py`

#### Game Screen Capture and Key Press Logging

This notebook outlines a Python script that captures a specified area of the game screen and logs key presses. The script uses libraries like OpenCV, MSS, and Keyboard. It's designed to save screenshots of the game and log the corresponding key presses in a CSV file for further analysis or machine learning purposes.


```python
# Import necessary libraries
import cv2
import numpy as np
import mss
import time
import keyboard
import os
import csv
```

#### Function: capture_and_save_screen
This function captures the specified area of the screen and saves the screenshot to a file. It uses MSS for screen capturing and OpenCV for saving the image.


```python
def capture_and_save_screen(game_area, file_path):
    with mss.mss() as sct:
        # Capture the screen at the specified game area
        screencap = sct.grab(game_area)
        screencap = np.array(screencap)

        # Save the captured image
        cv2.imwrite(file_path, screencap)

    return screencap, file_path
```

#### Function: log_key_press
The `log_key_press` function logs the key pressed along with the filename of the screenshot. It writes this data to a CSV file, facilitating the pairing of screen states with user inputs.


```python
def log_key_press(key, file_path, log_file):
    with open(log_file, 'a', newline='') as file:
        writer = csv.writer(file)
        writer.writerow([file_path, key])
```

#### Main Script
The main script sets up the game area for screen capture, creates the screenshot directory and the CSV log (with a header row) if they don't exist, and opens an OpenCV window that displays each capture. It then loops continuously: every frame is saved as a screenshot, but a row is written to the log only when W, A, or D is held. Pressing `q` while the preview window has focus stops the loop.


```python
if __name__ == "__main__":
    game_area = {"left": 0, "top": 270, "width": 970, "height": 200}
    img_directory = "screenshots"
    log_file = "key_log.csv"

    # Create the screenshot directory if it doesn't exist
    if not os.path.exists(img_directory):
        os.makedirs(img_directory)

    # Initialize csv file for logging
    if not os.path.exists(log_file):
        with open(log_file, 'w', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(["image", "key"])

    # Time to switch over to the Trackmania window
    time.sleep(5)
    # Continue numbering after existing screenshots so a new session
    # doesn't overwrite images already referenced in the log
    img_count = len(os.listdir(img_directory))

    while True:
        img_count += 1
        file_path = os.path.join(img_directory, f"image_{img_count}.png")

        # Capture the screen and save it
        screen_capture, saved_file_path = capture_and_save_screen(game_area, file_path)

        # Display the captured screen
        cv2.imshow("Game Screen", screen_capture)

        # Check the driving keys (W, A, D) and log them; D wins over A, and A over W
        key_pressed = None
        if keyboard.is_pressed('d'):
            key_pressed = 'up&right'
        elif keyboard.is_pressed('a'):
            key_pressed = 'up&left'
        elif keyboard.is_pressed('w'):
            key_pressed = 'up'

        if key_pressed:
            print(key_pressed)
            log_key_press(key_pressed, saved_file_path, log_file)

        # Break the loop if 'q' is pressed while the preview window has focus
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the OpenCV window
    cv2.destroyAllWindows()
```
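The label precedence in the loop (D over A over W) can be factored into a pure function so it can be tested without a keyboard hook. The sketch below is a hypothetical refactor (`label_from_keys` is not part of the script); in the script, the set of held keys would come from `keyboard.is_pressed`.

```python
def label_from_keys(pressed):
    """Hypothetical helper: map the set of held keys to the label written to key_log.csv.

    Mirrors the if/elif order in gather_play_data.py: 'd' wins over 'a',
    and 'a' wins over 'w'. Returns None when none of them is held.
    """
    if 'd' in pressed:
        return 'up&right'
    if 'a' in pressed:
        return 'up&left'
    if 'w' in pressed:
        return 'up'
    return None


# In the capture loop (requires the `keyboard` package, not executed here):
#   pressed = {k for k in ('w', 'a', 'd') if keyboard.is_pressed(k)}
#   key_pressed = label_from_keys(pressed)
```

Writing it this way also makes visible that pressing D alone is logged as `up&right`, so the labels assume the throttle is held whenever the driver steers.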

<hr>

## Model Creation and Training

Script: `create_model.py`

#### Image Classification with TensorFlow and Keras

This notebook demonstrates the process of building an image classification model using TensorFlow and Keras. It involves loading image paths and labels from a CSV file, preprocessing the images, and training a Convolutional Neural Network (CNN) for classification. The dataset is divided into training and validation sets, and the model is trained to classify images based on key presses logged during a gaming session.


```python
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
import pandas as pd
```
#### Load and Process Dataset

The dataset consists of image paths and their corresponding key press labels. The dataset is balanced by sampling an equal number of images for each key press and then split into training and validation sets.


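```python
log_file = "key_log.csv"
df = pd.read_csv(log_file)

# Count the logged frames per key label
group_sizes = df.groupby('key').size()
print(group_sizes)

# Balance the classes: keep as many frames per label as the rarest label has
sampled_df = df.groupby('key').sample(n=group_sizes.min(), random_state=42).reset_index(drop=True)

# Hold out 30% of the balanced data for validation
train_df, val_df = train_test_split(sampled_df, test_size=0.3, random_state=42)
```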
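To check what the balancing step does before running it on real logs, the same downsampling can be tried on a small synthetic DataFrame. This sketch uses pandas' `DataFrameGroupBy.sample` (available since pandas 1.1); the helper name `balance_by_label` and the demo data are made up for illustration.

```python
import pandas as pd


def balance_by_label(df, label_col='key', seed=42):
    """Hypothetical helper: downsample every label to the size of the rarest one."""
    n = df[label_col].value_counts().min()
    return df.groupby(label_col).sample(n=n, random_state=seed).reset_index(drop=True)


# Made-up log: 6 'up' frames, 3 'up&left', 1 'up&right'
demo = pd.DataFrame({
    'image': [f'image_{i}.png' for i in range(10)],
    'key': ['up'] * 6 + ['up&left'] * 3 + ['up&right'] * 1,
})

balanced = balance_by_label(demo)
print(balanced['key'].value_counts())
```

Downsampling discards frames from the common classes (here, most of the `up` frames), so the balanced set contains only the number of classes times the size of the rarest class.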

#### Image Data Preprocessing

ImageDataGenerator is used for data augmentation and preprocessing. The images are rescaled, and various transformations are applied to the training data. Validation data is only rescaled and not augmented.


```python
# Image dimensions
input_shape = (224, 224, 3)

# Training generator: rescale pixels to [0, 1] and apply augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=0,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=False,  # a flip would turn left turns into right turns without changing the label
    fill_mode='nearest'
)

# Validation generator: rescale only, no augmentation
val_datagen = ImageDataGenerator(rescale=1./255)

# The data is already split by train_test_split, so no validation_split/subset is used here
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    x_col='image',
    y_col='key',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)

validation_generator = val_datagen.flow_from_dataframe(
    val_df,
    x_col='image',
    y_col='key',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse',
    shuffle=False
)

# Print label tags for indices
print("Label Tags for Indices in Training Generator:")
print(train_generator.class_indices)
```

#### Convolutional Neural Network (CNN) Model

A CNN model is defined using TensorFlow Keras. The model consists of convolutional layers, max pooling layers, and dense layers. It is compiled and trained on the dataset, and the accuracy and loss are monitored on both the training and validation sets.


```python
# Define the CNN model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_generator, epochs=20, validation_data=validation_generator)
```
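As a sanity check on the architecture, the layer output sizes and parameter counts can be worked out by hand. The sketch below assumes Keras defaults for the layers in the cell above ('valid' padding and stride 1 for `Conv2D`, stride equal to the pool size for `MaxPooling2D`); `model.summary()` should report the same totals.

```python
def conv_out(size, kernel=3):
    """Output width/height of a Conv2D with 'valid' padding and stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of MaxPooling2D with stride equal to the pool size."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per unit."""
    return n_in * n_out + n_out


size = 224
params = {}

params['conv2d_1'] = conv_params(3, 32)
size = pool_out(conv_out(size))      # 224 -> 222 -> 111
params['conv2d_2'] = conv_params(32, 64)
size = pool_out(conv_out(size))      # 111 -> 109 -> 54
params['conv2d_3'] = conv_params(64, 64)
size = conv_out(size)                # 54 -> 52

flattened = size * size * 64         # 52 * 52 * 64 = 173,056 features
params['dense_1'] = dense_params(flattened, 32)
params['dense_2'] = dense_params(32, 3)

total = sum(params.values())
for name, count in params.items():
    print(f"{name:10s}{count:>12,}")
print(f"{'total':10s}{total:>12,}")  # 5,594,243
```

Almost 99% of the roughly 5.6 million parameters sit in the first `Dense` layer, because `Flatten` turns the final 52 x 52 x 64 feature map into 173,056 inputs.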

#### Save the Trained Model

After training, the model is saved to a file for future use, such as in applications or further analysis.


```python
model.save('modelmain.h5')
```

<hr>

## Model Usage in Game Automation

Script: `use_model.py`

#### Real-time Game Control using TensorFlow Model

This notebook demonstrates the application of a pre-trained TensorFlow model for real-time game control. The model predicts the necessary key presses based on the game screen captured in real-time. The script uses MSS for screen capturing, TensorFlow for loading and making predictions with the model, and PyDirectInput for simulating key presses.


```python
import tensorflow as tf
import numpy as np
import mss
import pydirectinput as pdi
import time
```

#### Load the Pre-trained Model

The pre-trained TensorFlow model is loaded. This model has been trained to classify screen captures into different categories, each corresponding to a specific key press or combination of key presses.


```python
model = tf.keras.models.load_model('modelmain.h5')
```
#### Define Game Area and Class Labels

The area of the game screen to be captured is defined. Additionally, a dictionary mapping class indices to their respective labels (key presses) is set up. This mapping will be used to interpret the model's predictions.

```python
game_area = {"left": 0, "top": 270, "width": 970, "height": 200}
class_labels = {0: 'up', 1: 'up&left', 2: 'up&right'}  # Map class indices to labels
```
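The hard-coded mapping works because Keras builds `class_indices` from the sorted label strings, and `'up' < 'up&left' < 'up&right'` in Python string order. If labels are ever added or renamed, the mapping can silently drift. One option, sketched below under the assumption that a small JSON file is saved next to the model (the file name and all helper names are hypothetical), is to persist `train_generator.class_indices` at training time and invert it at inference time.

```python
import json


def invert_class_indices(class_indices):
    """Turn Keras-style {label: index} into {index: label} for decoding predictions."""
    return {index: label for label, index in class_indices.items()}


def save_class_indices(class_indices, path):
    """Hypothetical helper: write the {label: index} mapping to JSON after training."""
    with open(path, 'w') as f:
        json.dump(class_indices, f)


def load_class_labels(path):
    """Hypothetical helper: read the JSON mapping back as {index: label}."""
    with open(path) as f:
        return invert_class_indices(json.load(f))


# create_model.py (hypothetical addition):
#   save_class_indices(train_generator.class_indices, 'class_indices.json')
# use_model.py (hypothetical addition):
#   class_labels = load_class_labels('class_indices.json')
```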

#### Function: read_screen

The `read_screen` function captures a specified area of the game screen, preprocesses the image to match the input requirements of the model, and returns the processed image.


```python
def read_screen():
    with mss.mss() as sct:
        # Capture the screen at the specified game area (mss returns BGRA pixels)
        screencap = sct.grab(game_area)
        screencap_np = np.array(screencap)

    # Discard the alpha channel and reorder BGR -> RGB to match the training images
    screencap_rgb = screencap_np[:, :, [2, 1, 0]]

    # Resize to the model's input size (224x224)
    screencap_resized = tf.image.resize(screencap_rgb, (224, 224))

    # Scale pixels to [0, 1], as the training generator did with rescale=1./255
    screencap_scaled = screencap_resized / 255.0

    # Add a batch dimension to the input data
    return np.expand_dims(screencap_scaled, axis=0)
```
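The preprocessing has to match what the model saw during training: `flow_from_dataframe` loads the saved PNGs as RGB and the training generator rescales them by 1/255, while `mss` returns BGRA pixels. The NumPy-only sketch below (`preprocess_frame` is a hypothetical name) makes those steps explicit so they can be unit-tested without TensorFlow or a screen. Its nearest-neighbour resize only approximates the resampling Keras applies when loading images, so it is a check on channel order and scaling rather than a pixel-exact replacement.

```python
import numpy as np


def preprocess_frame(bgra, size=224):
    """Hypothetical helper: mss-style BGRA frame (H, W, 4) -> (1, size, size, 3) batch."""
    # Drop alpha and reorder channels B, G, R -> R, G, B
    rgb = bgra[:, :, [2, 1, 0]]

    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    height, width = rgb.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    resized = rgb[rows][:, cols]

    # Scale to [0, 1] like ImageDataGenerator(rescale=1./255), then add a batch axis
    scaled = resized.astype(np.float32) / 255.0
    return scaled[np.newaxis, ...]
```

A quick check is to feed it a solid red BGRA frame (255 in channel index 2) and confirm that only the first output channel is 1.0.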

#### Main Script for Game Control

In the main script, the screen is continuously captured and fed into the model to predict the necessary key presses. Based on the model's predictions, corresponding key presses are simulated in real-time to control the game.


```python
if __name__ == "__main__":
    while True:
        # verbose=0 stops Keras from printing a progress bar for every frame
        prediction_probabilities = model.predict(read_screen(), verbose=0)
        predicted_class_index = int(np.argmax(prediction_probabilities))
        predicted_class_label = class_labels[predicted_class_index]

        # Labels such as 'up&left' name keys that are held down together
        keys = predicted_class_label.split('&')
        for key in keys:
            pdi.keyDown(key)
        time.sleep(0.1)
        for key in keys:
            pdi.keyUp(key)

        print("Predicted Label:", predicted_class_label)
```
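The decision step of the loop, from the model's output probabilities to the keys to press, can be written as a small pure function and tested on made-up probability vectors. `keys_for_prediction` is a hypothetical helper; it assumes the same index-to-label mapping as `class_labels` above and that `'&'` separates keys that are pressed together.

```python
import numpy as np

CLASS_LABELS = {0: 'up', 1: 'up&left', 2: 'up&right'}


def keys_for_prediction(probabilities, class_labels=CLASS_LABELS):
    """Hypothetical helper: return the most likely label and the keys it names."""
    index = int(np.argmax(probabilities))  # works for shape (1, 3) or (3,)
    label = class_labels[index]
    return label, label.split('&')


# In the control loop (pydirectinput, not executed here):
#   label, keys = keys_for_prediction(model.predict(read_screen(), verbose=0))
#   for k in keys: pdi.keyDown(k)
#   time.sleep(0.1)
#   for k in keys: pdi.keyUp(k)
```

Note that `'up'`, `'left'`, and `'right'` are arrow-key names in `pydirectinput`, while the training data was recorded with W, A, and D, so this relies on the game accepting the arrow keys for the same actions.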

## Conclusion

This notebook encapsulates a journey through automating game control using a combination of technologies like OpenCV, TensorFlow, and PyDirectInput. We navigated various challenges, from image processing and machine learning model training to real-time game interaction and key press simulation.

### Key Takeaways
1. **Interdisciplinary Integration**: This project illustrates the power of integrating different fields (computer vision, machine learning, and game automation) and shows how they can come together to create an innovative solution.
2. **Problem-Solving and Adaptability**: We encountered and overcame several challenges. Adjusting the timing of simulated key inputs and switching to a more suitable library for game control are testaments to the importance of problem-solving and adaptability in software development.
3. **Practical Applications of AI**: The project offers a glimpse into the practical applications of AI and machine learning. It demonstrates how AI can interact with and control other software systems (like video games) in real-time.

### Final Thoughts
This project was not just about automating a game; we learned what can be achieved when we bridge the gap between AI and real-world applications. It is a stepping stone towards more complex and nuanced AI interactions in various domains. As we continue to push the boundaries of what's possible, the lessons from projects like this will be invaluable.