SQLite Database Initialization
This section initializes the SQLite database by creating necessary tables such as Items, Auctions, and ActionEvents. It ensures the existence of the required tables and sets up the database schema.

File and Configuration Setup
It also collects the JSON snapshot files under sample/ and orders them by the timestamp encoded in each filename, so that later cells can process them chronologically.

In [None]:
import os
import sys
import sqlite3
import json
import pandas as pd
import mysql.connector
import pickle

from datetime import datetime
from tqdm import tqdm
from pathlib import Path
from sklearn.metrics import mean_squared_error

# Notebooks do not define __file__, so resolve the project root as the
# parent of the current working directory.
wd = Path.cwd().parent.resolve()
sys.path.append(str(wd))

from data.transformers import add_features, transform_data

In [None]:
db_path = 'auction.db'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE IF NOT EXISTS Items (
        item_id INT PRIMARY KEY,
        item_name TEXT,
        quality TEXT,
        item_level INT,
        required_level INT,
        item_class TEXT,
        item_subclass TEXT,
        purchase_price_gold INT,
        purchase_price_silver INT,
        sell_price_gold INT,
        sell_price_silver INT,
        max_count INT,
        is_stackable INT
    )
''')

cursor.execute('''
    CREATE TABLE IF NOT EXISTS Auctions (
        auction_id INT PRIMARY KEY,
        bid INT,
        buyout INT,
        quantity INT,
        time_left TEXT,
        item_id INT
    )
''')

cursor.execute('''
    CREATE TABLE IF NOT EXISTS ActionEvents (
        auction_id INT,
        record DATETIME,
        PRIMARY KEY (auction_id, record),
        FOREIGN KEY (auction_id) REFERENCES Auctions(auction_id)
    )
''')

conn.commit()
conn.close()
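
The schema above declares a foreign key from ActionEvents to Auctions, but SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection. The following is a minimal sketch, using an in-memory database and a trimmed-down Auctions table (both assumptions made for illustration), of how the created tables can be listed and how enforcement behaves once enabled.

```python
import sqlite3

# In-memory database so the sketch does not touch auction.db.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default

# Trimmed-down Auctions table (illustrative, not the full schema).
conn.execute("CREATE TABLE IF NOT EXISTS Auctions (auction_id INT PRIMARY KEY, item_id INT)")
conn.execute("""
    CREATE TABLE IF NOT EXISTS ActionEvents (
        auction_id INT,
        record DATETIME,
        PRIMARY KEY (auction_id, record),
        FOREIGN KEY (auction_id) REFERENCES Auctions(auction_id)
    )
""")

# List the tables that now exist.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)

# With enforcement on, an event for an unknown auction is rejected.
try:
    conn.execute("INSERT INTO ActionEvents VALUES (1, '2024-01-09 00:00:00')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
conn.close()
```

Turning enforcement on in the notebook itself would reject any ActionEvents row whose auction is missing from Auctions, so it is a design choice rather than a drop-in change.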

In [None]:
file_info = {}
data_dir = 'sample/'

for root, dirs, files in os.walk(data_dir):
    for filename in tqdm(files):
        filepath = os.path.join(root, filename)
        date = datetime.strptime(filename.split('.')[0], '%Y%m%dT%H')

        file_info[filepath] = date

file_info = {k: v for k, v in sorted(file_info.items(), key=lambda item: item[1])}
filenames = list(file_info.keys())
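
The ordering step can also be written as a single sort keyed on the parsed timestamp. This is a small sketch with hypothetical file names that follow the %Y%m%dT%H pattern used above; snapshot_time is a helper introduced here for illustration.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical snapshot names following the %Y%m%dT%H pattern.
paths = [
    "sample/20240109T01.json",
    "sample/20240108T23.json",
    "sample/20240109T00.json",
]

def snapshot_time(path):
    # Path.stem drops the directory and the ".json" extension.
    return datetime.strptime(Path(path).stem, "%Y%m%dT%H")

# Sort chronologically by the timestamp encoded in each name.
ordered = sorted(paths, key=snapshot_time)
print(ordered)
```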

Auction Data Storage
This part processes the JSON files containing auction data. It iterates through the files in chronological order, extracts the relevant fields, and inserts them into the Auctions and ActionEvents tables of the SQLite database.

Snapshot Timestamps
Auctions rows are written only from the first snapshot, with duplicate auction ids skipped. Every snapshot adds one ActionEvents row per auction, stamped with the hour parsed from the filename.

In [None]:
db_path = 'auction.db'
data_dir = 'sample/'
db = sqlite3.connect(db_path)
cursor = db.cursor()

for i, filepath in enumerate(tqdm(filenames)):
    try:
        # Use a context manager so the file handle is always closed
        with open(filepath, "r") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError) as e:
        print(f"Error reading file {filepath}: {e}")
        continue

    # Snapshot hour parsed from the filename (YYYYMMDDTHH.json)
    filename = os.path.basename(filepath)
    auction_record = datetime.strptime(filename[:-5], "%Y%m%dT%H")

    # Auctions rows are loaded from the first snapshot only
    if i == 0:
        auction_ids = set()  # set gives O(1) duplicate checks
        auctions_data = []

        for auction in data["auctions"]:
            if auction["id"] not in auction_ids:
                auctions_data.append((auction["id"], auction["bid"], auction["buyout"], auction["quantity"], auction["time_left"], auction["item"]["id"]))
                auction_ids.add(auction["id"])

        try:
            cursor.executemany("""
                INSERT INTO Auctions (auction_id, bid, buyout, quantity, time_left, item_id)
                VALUES (?, ?, ?, ?, ?, ?)
            """, auctions_data)
            db.commit()
        except sqlite3.Error as err:
            db.rollback()
            print(f"Error inserting auction data for file {filepath} in Auctions: {err}")

    action_events_data = []
    for auction in data["auctions"]:
        action_events_data.append((auction["id"], auction_record.strftime('%Y-%m-%d %H:%M:%S')))
            
    try:
        cursor.executemany("""
            INSERT OR REPLACE INTO ActionEvents (auction_id, record)
            VALUES (?, ?)
        """, action_events_data)
        db.commit()
    except sqlite3.Error as err:
        db.rollback()
        print(f"Error inserting auction events for file {filepath} in ActionEvents: {err}")

cursor.close()
db.close()
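
As an alternative to tracking seen ids in Python, SQLite's conflict clauses can handle repeats directly. The sketch below is not the notebook's method: it assumes two hypothetical hourly snapshots and simplified two-column tables, and uses INSERT OR IGNORE so that every snapshot can contribute new auctions while already known ones are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Auctions (auction_id INT PRIMARY KEY, buyout INT)")
conn.execute("""
    CREATE TABLE ActionEvents (
        auction_id INT,
        record DATETIME,
        PRIMARY KEY (auction_id, record)
    )
""")

# Two hypothetical hourly snapshots; auction 1 appears in both.
snapshots = {
    "2024-01-09 00:00:00": [{"id": 1, "buyout": 70000}],
    "2024-01-09 01:00:00": [{"id": 1, "buyout": 70000},
                            {"id": 2, "buyout": 5000000}],
}

for record, auctions in snapshots.items():
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO Auctions VALUES (?, ?)",
            [(a["id"], a["buyout"]) for a in auctions])
        conn.executemany(
            "INSERT OR REPLACE INTO ActionEvents VALUES (?, ?)",
            [(a["id"], record) for a in auctions])

# One event per snapshot in which the auction was seen.
hours = dict(conn.execute(
    "SELECT auction_id, COUNT(*) FROM ActionEvents GROUP BY auction_id"))
auction_count = conn.execute("SELECT COUNT(*) FROM Auctions").fetchone()[0]
print(hours, auction_count)
conn.close()
```

The per-auction event count is the same quantity a later query can report as the number of hours an auction stayed listed.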

MySQL Items Data Retrieval
In this part, the script reads the MySQL connection settings from a JSON file (../data/config.json), establishes a connection, and fetches every row of the Items table.

SQLite Database Update
The fetched rows are then written to the Items table of the SQLite database with INSERT OR REPLACE, so rows with an existing item_id are overwritten rather than duplicated.

In [None]:
with open('../data/config.json') as f:
    config = json.load(f)

db_path = 'auction.db'
query = "SELECT * FROM Items"

def import_items():
    try:
        mysql_db = mysql.connector.connect(**config['database'])
    except mysql.connector.Error as err:
        print(err)
        return

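    cursor = mysql_db.cursor()

    cursor.execute(query)
    items = cursor.fetchall()

    # Close the cursor before the connection that owns it
    cursor.close()
    mysql_db.close()

    try:
        db = sqlite3.connect(db_path)
        print("Connected to SQLite")
    except sqlite3.Error as err:
        # Report the failure instead of returning silently
        print(err)
        return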

    cursor = db.cursor()
    cursor.executemany("""
        INSERT OR REPLACE INTO Items (item_id, item_name, quality, item_level, required_level, item_class, item_subclass, purchase_price_gold, purchase_price_silver, sell_price_gold, sell_price_silver, max_count, is_stackable)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    """, items)

    db.commit()

    cursor.close()
    db.close()

    print("Inserted items into SQLite: " + str(len(items)))

import_items()
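
One consequence of INSERT OR REPLACE is worth spelling out: rows whose item_id matches are overwritten, but rows that no longer exist upstream stay in the table. A minimal sketch of that behavior, assuming a three-column Items table and hypothetical rows standing in for the MySQL result set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (item_id INT PRIMARY KEY, item_name TEXT, item_level INT)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(8838, "Sungrass (old name)", 46), (99999, "Removed upstream", 1)])

# Hypothetical rows standing in for the MySQL result set.
fresh = [(8838, "Sungrass", 46), (13463, "Dreamfoil", 54)]

with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT OR REPLACE INTO Items VALUES (?, ?, ?)", fresh)

# Item 8838 is overwritten, 13463 is added, 99999 is left in place.
names = dict(conn.execute("SELECT item_id, item_name FROM Items"))
print(names)
conn.close()
```

If stale rows must disappear as well, running DELETE FROM Items in the same transaction before the insert would give a full refresh instead.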

This cell connects to the SQLite database and runs a query that joins the Auctions, ActionEvents, and Items tables, aggregating one row per auction. The rows are stored in the variable results and then loaded into the DataFrame df.

In [44]:
db_path = 'auction.db'
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

query = """
    SELECT
        a.auction_id,
        a.bid / 10000.0 AS bid_in_gold,
        a.buyout / 10000.0 AS buyout_in_gold,
        (a.buyout / 10000.0) / a.quantity AS unit_price,
        a.quantity,
        a.time_left,
        a.item_id,
        i.item_name,
        i.quality,
        i.item_class,
        i.item_subclass,
        i.is_stackable,
        i.purchase_price_gold,
        i.required_level,
        i.item_level,
        i.sell_price_gold,
        MIN(ae.record) AS first_appearance_timestamp,
        strftime('%Y', MIN(ae.record)) AS first_appearance_year,
        strftime('%m', MIN(ae.record)) AS first_appearance_month,
        strftime('%d', MIN(ae.record)) AS first_appearance_day,
        strftime('%H', MIN(ae.record)) AS first_appearance_hour,
        COUNT(*) AS hours_on_sale
    FROM Auctions a
    JOIN ActionEvents ae ON a.auction_id = ae.auction_id
    JOIN Items i ON i.item_id = a.item_id
    WHERE a.time_left <> 'SHORT'
    GROUP BY a.auction_id
"""

cursor.execute(query)
results = cursor.fetchall()

conn.close()

df = pd.DataFrame(results, columns=[i[0] for i in cursor.description])
df.head(10)

Unnamed: 0,auction_id,bid_in_gold,buyout_in_gold,unit_price,quantity,time_left,item_id,item_name,quality,item_class,...,purchase_price_gold,required_level,item_level,sell_price_gold,first_appearance_timestamp,first_appearance_year,first_appearance_month,first_appearance_day,first_appearance_hour,hours_on_sale
0,969962251,6,7,1,10,MEDIUM,13463,Dreamfoil,Common,Trade Goods,...,0,0,54,0,2024-01-09 00:00:00,2024,1,9,0,2
1,969962371,500,500,25,20,MEDIUM,8838,Sungrass,Common,Trade Goods,...,0,0,46,0,2024-01-09 00:00:00,2024,1,9,0,2
2,969962375,500,500,25,20,MEDIUM,8838,Sungrass,Common,Trade Goods,...,0,0,46,0,2024-01-09 00:00:00,2024,1,9,0,2
3,969962377,500,500,25,20,MEDIUM,8838,Sungrass,Common,Trade Goods,...,0,0,46,0,2024-01-09 00:00:00,2024,1,9,0,2
4,969962379,500,500,25,20,MEDIUM,8838,Sungrass,Common,Trade Goods,...,0,0,46,0,2024-01-09 00:00:00,2024,1,9,0,2
5,969962381,500,500,25,20,MEDIUM,8838,Sungrass,Common,Trade Goods,...,0,0,46,0,2024-01-09 00:00:00,2024,1,9,0,2
6,969962382,500,500,25,20,MEDIUM,14047,Runecloth,Common,Trade Goods,...,0,0,50,0,2024-01-09 00:00:00,2024,1,9,0,2
7,969962384,500,500,25,20,MEDIUM,43010,Worm Meat,Common,Trade Goods,...,0,0,70,0,2024-01-09 00:00:00,2024,1,9,0,2
8,969962387,500,500,25,20,MEDIUM,4306,Silk Cloth,Common,Trade Goods,...,0,0,30,0,2024-01-09 00:00:00,2024,1,9,0,2
9,969962389,500,500,25,20,MEDIUM,43012,Rhino Meat,Common,Trade Goods,...,0,0,70,0,2024-01-09 00:00:00,2024,1,9,0,2
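
The price columns in the query come from simple arithmetic. The sketch below mirrors (a.buyout / 10000.0) / a.quantity in Python, assuming the raw bid and buyout values are stored in copper at 10,000 copper per gold, which the divisor in the query suggests but the notebook does not state. The function name and the sample listing are hypothetical.

```python
COPPER_PER_GOLD = 10000.0  # assumption inferred from the divisor in the query

def unit_price_in_gold(buyout_copper, quantity):
    """Buyout per item in gold, mirroring (a.buyout / 10000.0) / a.quantity."""
    return (buyout_copper / COPPER_PER_GOLD) / quantity

# Hypothetical listing: 20 items with a buyout of 5,000,000 copper (500 gold).
price = unit_price_in_gold(5_000_000, 20)
print(price)  # 25.0
```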


This cell enriches the DataFrame with derived features using add_features, then splits it into a feature matrix X and a target y using transform_data. Both functions are imported from data.transformers.

In [45]:
df = add_features(df)
X, y = transform_data(df)

df.head(5)

Unnamed: 0,auction_id,bid_in_gold,buyout_in_gold,unit_price,quantity,time_left,item_id,item_name,quality,item_class,...,std_competitor_price,competitor_count,lowest_competitor_price,top_competitor_price,relative_price_difference,relative_avg_price_difference,relative_buyout_difference,relative_bid_difference,relative_price_to_lowest_competitor,relative_price_to_top_competitor
0,969962251,6,7,1,10,12,13463,Dreamfoil,Common,Trade Goods,...,1,171,1,1,0,0,0,0,0,0
1,969962371,500,500,25,20,12,8838,Sungrass,Common,Trade Goods,...,707,162,2,3,7,0,11,11,9,6
2,969962375,500,500,25,20,12,8838,Sungrass,Common,Trade Goods,...,707,162,2,3,7,0,11,11,9,6
3,969962377,500,500,25,20,12,8838,Sungrass,Common,Trade Goods,...,707,162,2,3,7,0,11,11,9,6
4,969962379,500,500,25,20,12,8838,Sungrass,Common,Trade Goods,...,707,162,2,3,7,0,11,11,9,6


The next cells load a trained model from models/tree_model.pkl, predict on X, and display the predictions alongside the listing details. The RMSE of the predictions against y is then computed and printed.

In [51]:
pd.options.display.float_format = '{:.0f}'.format

with open('models/tree_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

predictions = model.predict(X)
df['prediction'] = predictions
df[['item_name', 'item_class', 'unit_price', 'buyout_in_gold', 'hours_on_sale', 'prediction']].head(20)

Unnamed: 0,item_name,item_class,unit_price,buyout_in_gold,hours_on_sale,prediction
0,Dreamfoil,Trade Goods,1,7,2,45
1,Sungrass,Trade Goods,25,500,2,11
2,Sungrass,Trade Goods,25,500,2,11
3,Sungrass,Trade Goods,25,500,2,11
4,Sungrass,Trade Goods,25,500,2,11
5,Sungrass,Trade Goods,25,500,2,11
6,Runecloth,Trade Goods,25,500,2,8
7,Worm Meat,Trade Goods,25,500,2,50
8,Silk Cloth,Trade Goods,25,500,2,50
9,Rhino Meat,Trade Goods,25,500,2,50


In [52]:
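# Take the square root explicitly: newer scikit-learn releases
# no longer accept the squared=False argument.
rmse = mean_squared_error(y, predictions) ** 0.5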

print("RMSE:", rmse)

RMSE: 17.881771869448155
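
For reference, RMSE is the square root of the mean squared difference between targets and predictions. A small standard-library sketch of the calculation, with hypothetical values that are not taken from the notebook:

```python
import math

def rmse(y_true, y_pred):
    # Mean of the squared errors, then the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical targets and predictions: errors are 3, 0, 0 and 4.
example = rmse([2, 2, 2, 2], [5, 2, 2, 6])
print(example)  # 2.5
```

Because the errors are squared before averaging, a few large misses weigh on the score far more than many small ones.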


