<img src="./figs/IOAI-Logo.png" alt="IOAI Logo" width="200" height="auto">

[IOAI 2025 (Beijing, China), Individual Contest](https://ioai-official.org/china-2025)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IOAI-official/IOAI-2025/blob/main/Individual-Contest/Antique/Antique.ipynb)

# Antique Painting Authentication

## 1. Problem Description

You have studied Artificial Intelligence for quite some time. An old friend of your father's, a famous archaeologist and art critic, has heard about this and asked for your help. You need to design an algorithm that classifies antique paintings as either authentic or replicas.

Because professional authentication is expensive, the research team has obtained authenticity labels for only a small portion of the paintings. For the majority of samples, the authenticity remains unknown. It is known that the paintings' digital features exhibit strong structural patterns. You are tasked with leveraging all available samples, including those with unknown labels, to train a model that classifies the authenticity of antique paintings.

## 2. Dataset

The dataset consists of a training set, a validation set, and a test set, each containing 500 independent samples.

1. **Training Set (`training_set.csv`)**:

   - The first five columns represent the digital features of each antique painting.
   - The sixth column contains the label: 1 for authentic, -1 for replica, and 0 for unknown.

   The training set is used for training your models and can be accessed and downloaded directly during the competition.

2. **Validation Set (`validation_set.csv`)**: 
   - This file has the same format as the training set but does not contain the label column.

   The validation set is used to calculate the Leaderboard A score and is not directly accessible during the competition.

3. **Test Set (`test_set.csv`)**: 
   - This file has the same format as the training set but does not contain the label column.

   The test set is used to calculate the Leaderboard B score and is not directly accessible during the competition.
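
The label encoding described above (1, -1, and 0 for unknown) suggests separating labeled from unlabeled rows before any modeling. The following is a minimal sketch on a hypothetical toy frame; the column names `f1` to `f5` and `label` are assumptions for illustration, since the actual header of `training_set.csv` is not specified here.

```python
import pandas as pd

# Hypothetical toy frame standing in for training_set.csv:
# five feature columns followed by one label column.
toy = pd.DataFrame(
    {
        "f1": [0.1, 0.4, 0.9, 0.3],
        "f2": [1.0, 0.8, 0.2, 0.5],
        "f3": [0.0, 0.1, 0.7, 0.6],
        "f4": [0.5, 0.5, 0.1, 0.9],
        "f5": [0.3, 0.2, 0.8, 0.4],
        "label": [1, 0, -1, 0],  # 1 authentic, -1 replica, 0 unknown
    }
)

# Select by position, as the task describes columns by order, not by name.
features = toy.iloc[:, :5].to_numpy()
labels = toy.iloc[:, 5].to_numpy()

# Rows with a non-zero label carry ground truth; the rest are unlabeled.
labeled_mask = labels != 0
X_labeled, y_labeled = features[labeled_mask], labels[labeled_mask]
X_unlabeled = features[~labeled_mask]

print(f"labeled: {len(X_labeled)}, unlabeled: {len(X_unlabeled)}")
```

Keeping the unlabeled rows in their own array makes it easy to pass them to a semi-supervised method later instead of discarding them.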

## 3. Task

Your task is to train an appropriate model capable of predicting the authenticity of paintings in the validation and test sets, despite the large number of unlabeled samples.
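
One possible way to use unlabeled samples is graph-based label spreading. The sketch below is not the committee's method; it runs scikit-learn's `LabelSpreading` on synthetic two-cluster data that is an assumption made purely for illustration. Note that scikit-learn marks unlabeled samples with -1, which collides with this task's replica label, so the labels are remapped before fitting and mapped back afterwards. The `gamma` and `alpha` values are arbitrary choices for the toy data.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in data (assumption): two well-separated clusters in 5 dimensions.
rng = np.random.default_rng(0)
n = 100
X = np.vstack([
    rng.normal(loc=2.0, scale=0.5, size=(n, 5)),   # "authentic" cluster
    rng.normal(loc=-2.0, scale=0.5, size=(n, 5)),  # "replica" cluster
])
y_true = np.concatenate([np.ones(n, dtype=int), -np.ones(n, dtype=int)])

# Task encoding: 1 authentic, -1 replica, 0 unknown. Keep only 5 labels per class.
y_task = np.zeros(2 * n, dtype=int)
known = np.concatenate([np.arange(5), n + np.arange(5)])
y_task[known] = y_true[known]

# scikit-learn encoding: -1 means unlabeled, so map replica to class 0.
y_sklearn = np.where(y_task == 0, -1, np.where(y_task == 1, 1, 0))

model = LabelSpreading(kernel="rbf", gamma=0.5, alpha=0.2)
model.fit(X, y_sklearn)

# Map predictions back to the task encoding (-1 or 1).
pred_sklearn = model.predict(X)
pred_task = np.where(pred_sklearn == 1, 1, -1)
accuracy = float((pred_task == y_true).mean())
print(f"accuracy on the synthetic data: {accuracy:.2f}")
```

Whether this approach suits the real features depends on their structure; kernel choice and hyperparameters would need to be checked against the labeled portion of the training set.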

## 4. Submission

Contestants need to submit a notebook file named `submission.ipynb`. The file should output a zip file named `submission.zip`, which should contain the following two files:

1. `submissionA.csv`: Contains the model's predicted label results on the validation set, with each line being a -1 or 1 and no header.
2. `submissionB.csv`: Contains the model's predicted label results on the test set, with each line being a -1 or 1 and no header.

The testing machine will read `submission.zip` and calculate the scores. The submission files must strictly follow the above format and naming; otherwise, the system will not be able to read them correctly. 
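
Because the format is strict, it may help to check the archive locally before submitting. The following is a minimal sketch of such a check; `check_submission` is a hypothetical helper, not part of the contest tooling, and the real testing machine may apply different or additional rules.

```python
import os
import tempfile
import zipfile

EXPECTED_FILES = {"submissionA.csv", "submissionB.csv"}
ALLOWED_LABELS = {"-1", "1"}


def check_submission(zip_path, expected_rows=None):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        for missing in sorted(EXPECTED_FILES - names):
            problems.append(f"missing file: {missing}")
        for name in sorted(EXPECTED_FILES & names):
            lines = zf.read(name).decode("utf-8").splitlines()
            # Every line must be exactly -1 or 1 (a header line would fail here).
            bad = [ln for ln in lines if ln.strip() not in ALLOWED_LABELS]
            if bad:
                problems.append(f"{name}: {len(bad)} line(s) are not -1 or 1")
            if expected_rows is not None and len(lines) != expected_rows:
                problems.append(
                    f"{name}: expected {expected_rows} rows, found {len(lines)}"
                )
    return problems


# Demonstration on tiny archives written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good_zip = os.path.join(tmp, "submission.zip")
    with zipfile.ZipFile(good_zip, "w") as zf:
        zf.writestr("submissionA.csv", "1\n-1\n1\n")
        zf.writestr("submissionB.csv", "-1\n-1\n1\n")
    good_report = check_submission(good_zip, expected_rows=3)

    bad_zip = os.path.join(tmp, "bad.zip")
    with zipfile.ZipFile(bad_zip, "w") as zf:
        zf.writestr("submissionA.csv", "label\n1\n0\n")  # header and a 0 label
    bad_report = check_submission(bad_zip, expected_rows=3)

print(good_report)
print(bad_report)
```

For the real files, `expected_rows=500` would match the stated size of the validation and test sets.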

Details about the submission procedure are provided in the baseline notebook. Contestants are encouraged to refer to it for guidance.

## 5. Score

The evaluation metric will be **classification accuracy**, defined as the proportion of correctly predicted samples over the total number of evaluated samples.
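
The definition above can be illustrated with a short sketch. The `accuracy` function and the example labels are made up for illustration and are not taken from the official scoring code.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))


# Made-up labels: three of the five predictions match.
example_true = [1, -1, -1, 1, 1]
example_pred = [1, -1, 1, 1, -1]
example_score = accuracy(example_true, example_pred)
print(example_score)
```

The same function can be applied to a held-out part of the labeled training samples to estimate performance locally, since the validation and test labels are not available.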

## 6. Baseline and Training Set

- Below you can find the baseline solution.
- The dataset is in the `training_set` folder.
- The highest score achieved by the Scientific Committee for this task is 0.98 on Leaderboard B; this score is used for score unification.
- The baseline score achieved by the Scientific Committee for this task is 0.46 on Leaderboard B; this score is also used for score unification.

### Train Your Model

In [None]:
import pandas as pd
import numpy as np
import os
from sklearn.svm import SVC

TRAIN_PATH = "./training_set/"  # Location of the training set
# The training set is deployed automatically on the testing machine.
# Your notebook can access TRAIN_PATH even if you do not mount it along with the notebook.

# DATA_PATH is a secret environment variable that points to the location of the
# validation set and test set on the testing machine.
# You cannot access this location locally.
if os.environ.get('DATA_PATH'):  
    DATA_PATH = os.environ.get("DATA_PATH") + "/" 
else:
    DATA_PATH = ""  # Fallback for local testing
train = pd.read_csv(TRAIN_PATH + "training_set.csv")

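# The first five columns are features; the sixth is the label (1, -1, or 0 for unknown).
X = np.array(train.iloc[:,:5])
y = np.array(train.iloc[:,5])

# Baseline only: assign a random label (-1 or 1) to every unknown sample.
np.random.seed(42)
y[y == 0] = np.random.choice([-1, 1], size=(y == 0).sum())

# Fit an RBF-kernel SVM on all samples, including the randomly labeled ones.
svm_binary_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_binary_model.fit(X, y)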

### Make Predictions on the Validation and Test Set

In [None]:
testA = np.array(pd.read_csv(DATA_PATH + "validation_set.csv"))
testB = np.array(pd.read_csv(DATA_PATH + "test_set.csv"))

predA = svm_binary_model.predict(testA)
predB = svm_binary_model.predict(testB)

### Generate `submission.zip` for Submission

In [None]:
import zipfile
import os

submissionA = pd.DataFrame(predA)
submissionA.to_csv("./submissionA.csv", index=False, header=False)

submissionB = pd.DataFrame(predB)
submissionB.to_csv("./submissionB.csv", index=False, header=False)

files_to_zip = ['./submissionA.csv', './submissionB.csv']
zip_filename = 'submission.zip'

with zipfile.ZipFile(zip_filename, 'w') as zipf:
    for file in files_to_zip:
        zipf.write(file, os.path.basename(file))

print(f'{zip_filename} was created successfully!')