# Clevr VQA

This notebook provides a step-by-step explanation of the Clevr Visual Question Answering (VQA) system implemented in `main.py`. The system uses the DomiKnowS framework to perform logic-guided VQA tasks.

## Overview

The Clevr VQA system is designed to answer questions about images by combining:
1. Visual processing (using ResNet)
2. Logical reasoning (using DomiKnowS inference program)

The system can be run in two modes:
- Regular mode: Uses a pre-trained ResNet model for feature extraction
- Dummy mode: Uses a lightweight configuration for testing purposes

Here, we use the dummy mode for demonstration. Refer to `readme.md` to use the regular mode.
Let's break down the code and understand how it works.


## 1. Imports and Setup

First, we import the necessary libraries and extend the Python path to include the required directories. DomiKnowS reports training progress through Python's `logging` module, so we set the log level to `INFO`.


In [1]:
import sys
sys.path.append('../../../')
sys.path.append('../../')
sys.path.append('../')
sys.path.append('./')
from domiknows.sensor.pytorch import EdgeSensor, ModuleLearner
from domiknows.sensor.pytorch.sensors import ReaderSensor, FunctionalSensor, FunctionalReaderSensor
from domiknows.program.lossprogram import InferenceProgram
from domiknows.program.model.pytorch import SolverModel
from preprocess import preprocess_dataset, preprocess_folders_and_files
from graph import create_graph
from pathlib import Path
from modules import ResNetPatcher, DummyLinearLearner
import argparse, torch, logging

logging.basicConfig(level=logging.INFO)


Log file for dataNode is in: D:\PycharmProjects\DomiKnowS_interface\test_regr\Clever\logs\datanode.log


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Darius\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\Darius\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


## 2. Argument Parsing

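Next, we set the arguments that control the behavior of the program. In `main.py` these are command-line arguments; in this notebook we assign the same values directly through a `SimpleNamespace`.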


In [2]:
from types import SimpleNamespace

args = SimpleNamespace(
    train_size=10,
    test_size=10,
    epochs=4,
    lr=1e-6,
    batch_size=1,
    eval_only=False,
    dummy=True,
    tnorm="G",
)

### Explanation of Arguments

- `--train-size`: Number of training examples to use (default: use all available)
- `--test-size`: Number of test examples to use (default: use all available)
- `--epochs`: Number of training epochs (default: 4)
- `--lr` or `--learning-rate`: Learning rate for optimization (default: 1e-6)
- `--batch-size`: Mini-batch size for training (default: 1)
- `--eval-only`: Flag to skip training and only evaluate a pre-trained model
- `--dummy`: Flag to use a lightweight configuration for testing
- `--tnorm`: T-norm to use in the InferenceProgram (choices: "G", "P", "L", default: "G")
  - T-norms are fuzzy logic operators used for combining logical expressions
  - "G" = Gödel t-norm, "P" = Product t-norm, "L" = Łukasiewicz t-norm
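
The flags listed above could be declared with `argparse` roughly as follows. This is a minimal sketch reconstructed from the descriptions in this section, not a copy of `main.py`; the function name `build_parser`, the variable `cli_args`, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; the real parser in main.py may differ in details."""
    parser = argparse.ArgumentParser(description="Logic-guided VQA on Clevr (sketch)")
    # None is used here to mean "use all available examples" (an assumption)
    parser.add_argument("--train-size", type=int, default=None, help="number of training examples")
    parser.add_argument("--test-size", type=int, default=None, help="number of test examples")
    parser.add_argument("--epochs", type=int, default=4, help="number of training epochs")
    parser.add_argument("--lr", "--learning-rate", dest="lr", type=float, default=1e-6)
    parser.add_argument("--batch-size", type=int, default=1)
    parser.add_argument("--eval-only", action="store_true", help="skip training")
    parser.add_argument("--dummy", action="store_true", help="lightweight configuration")
    parser.add_argument("--tnorm", choices=["G", "P", "L"], default="G")
    return parser


# Parse an explicit list instead of sys.argv so the sketch also runs inside a notebook
cli_args = build_parser().parse_args(["--dummy", "--train-size", "10", "--test-size", "10"])
print(cli_args)
```

Passing an explicit argument list to `parse_args` avoids picking up the notebook kernel's own command-line arguments, which is one reason the notebook uses a `SimpleNamespace` instead.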


## 3. Data Preprocessing

Now we prepare the dataset and create the necessary directories.


In [3]:
CACHE_DIR = preprocess_folders_and_files(args.dummy)
NUM_INSTANCES = 10
device = "cpu"
dataset = preprocess_dataset(args, NUM_INSTANCES, CACHE_DIR)

re-loaded dataset_cache\instance_0.pkl
re-loaded dataset_cache\instance_1.pkl
re-loaded dataset_cache\instance_2.pkl
re-loaded dataset_cache\instance_3.pkl
re-loaded dataset_cache\instance_4.pkl
re-loaded dataset_cache\instance_5.pkl
re-loaded dataset_cache\instance_6.pkl
re-loaded dataset_cache\instance_7.pkl
re-loaded dataset_cache\instance_8.pkl
re-loaded dataset_cache\instance_9.pkl
Dataset length: 10
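
The `re-loaded ...instance_N.pkl` lines suggest that each preprocessed instance is cached on disk and reused on later runs. The internals of `preprocess_dataset` are not shown in this notebook, so the following is only a sketch of that general pattern, assuming one pickle file per instance; `load_or_build` and the `build` callback are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_dir: Path, index: int, build):
    """Return (instance, cache_hit). Hypothetical helper, not part of the project."""
    path = cache_dir / f"instance_{index}.pkl"
    if path.exists():
        # Cache hit: reuse the instance written by an earlier run
        with path.open("rb") as f:
            return pickle.load(f), True
    # Cache miss: build the instance and store it for next time
    instance = build(index)
    with path.open("wb") as f:
        pickle.dump(instance, f)
    return instance, False


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    make = lambda i: {"image_index": i}  # stand-in for the real preprocessing
    first = [load_or_build(cache_dir, i, make) for i in range(3)]
    second = [load_or_build(cache_dir, i, make) for i in range(3)]
    print([hit for _, hit in first], [hit for _, hit in second])
```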


## 4. Graph Creation and Logic Preparation

Next, we create the knowledge graph and prepare the logical executions for each instance in the dataset.


In [4]:
questions_executions, graph, image, object, image_object_contains, attribute_names_dict, graph_text = create_graph(dataset, return_graph_text=True)

print(f"Created graph based on the object properties in the dataset: \n{graph_text}\n\n")


Created graph based on the object properties in the dataset: 
from domiknows.graph import Graph, Concept
from domiknows.graph.logicalConstrain import ifL, andL, existsL

with Graph('image_graph') as graph:

	image = Concept(name='image')
	obj = Concept(name='obj')
	image_object_contains, = image.contains(obj)

	is_gray = obj(name='is_gray')
	is_red = obj(name='is_red')
	is_blue = obj(name='is_blue')
	is_green = obj(name='is_green')
	is_brown = obj(name='is_brown')
	is_purple = obj(name='is_purple')
	is_cyan = obj(name='is_cyan')
	is_yellow = obj(name='is_yellow')

	is_rubber = obj(name='is_rubber')
	is_metal = obj(name='is_metal')

	is_cube = obj(name='is_cube')
	is_sphere = obj(name='is_sphere')
	is_cylinder = obj(name='is_cylinder')

	is_small = obj(name='is_small')
	is_large = obj(name='is_large')






The root concept is an image, which contains multiple objects. Each object has properties such as color, material, shape, and size. Based on this graph, we can create an executable logical program for each question. For example, the logical executions for the first two questions are shown below:

In [5]:
for i in range(2):
    print("Question: ",dataset[i]["question_raw"])
    print("Logical Expression: ",questions_executions[i])
    print("Answer: ",dataset[i]["answer"])
    print("----------------------------------------")

Question:  Are any tiny purple rubber balls visible?
Logical Expression:  existsL(
			andL(
				is_small("prop0"),
				andL(
					is_purple("prop1"),
					andL(
						is_rubber("prop2"),
						is_sphere("prop3")
					)
				)
			)
		)
	
Answer:  False
----------------------------------------
Question:  Is there a big matte ball?
Logical Expression:  existsL(
			andL(
				is_large("prop0"),
				andL(
					is_rubber("prop1"),
					is_sphere("prop2")
				)
			)
		)
	
Answer:  True
----------------------------------------
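
To read these expressions, it can help to look at their crisp (Boolean) meaning: `existsL(andL(...))` holds when at least one object in the image satisfies every listed property. The sketch below illustrates that reading in plain Python. The scene contents and the helper `exists_all` are made up for illustration; in the actual program the property values come from learners and DomiKnowS evaluates the constraint.

```python
# Hypothetical scene: each object is the set of properties that hold for it
scene = [
    {"is_large", "is_rubber", "is_sphere", "is_red"},
    {"is_small", "is_metal", "is_cube", "is_purple"},
]


def exists_all(objects, required):
    """Crisp reading of existsL(andL(p0, andL(p1, ...))): some object has every required property."""
    return any(required <= properties for properties in objects)


# "Are any tiny purple rubber balls visible?"
q1 = exists_all(scene, {"is_small", "is_purple", "is_rubber", "is_sphere"})
# "Is there a big matte ball?"
q2 = exists_all(scene, {"is_large", "is_rubber", "is_sphere"})
print(q1, q2)  # False True
```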


Next, we store each question's logical expression and its answer label in the dataset.

In [6]:
for i in range(len(dataset)):
    dataset[i]["logic_str"] = questions_executions[i]
    dataset[i]["logic_label"] = torch.LongTensor([bool(dataset[i]['answer'])]).to(device)
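
The `bool(...)` conversion above works as intended when `answer` is already a Python `bool`. If a dataset stored answers as strings instead, `bool("False")` would evaluate to `True`, because any non-empty string is truthy. The helper below is a defensive sketch for that case; `answer_to_label` is a hypothetical name, and whether it is needed depends on how `preprocess_dataset` stores the answers.

```python
def answer_to_label(answer) -> int:
    """Map a yes/no answer to 0 or 1, accepting bools or common string spellings."""
    if isinstance(answer, bool):
        return int(answer)
    text = str(answer).strip().lower()
    if text in {"true", "yes", "1"}:
        return 1
    if text in {"false", "no", "0"}:
        return 0
    raise ValueError(f"Unrecognized answer: {answer!r}")


print(bool("False"), answer_to_label("False"), answer_to_label(True))  # True 0 1
```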

## 5. Sensor Setup

Now we set up the sensors that process images and objects from the dataset. Sensors in DomiKnowS attach to the concepts defined in the graph and read their properties from the dataset. They also define the learnable modules.


In [7]:
# These sensors read the PIL image and its id from the dataset
image["pil_image"] = FunctionalReaderSensor(keyword="pil_image", forward=lambda data: [data])
image["image_id"] = FunctionalReaderSensor(keyword='image_index', forward=lambda data: [data])

# These sensors read the bounding boxes and the properties of the objects from the dataset

object["bounding_boxes"] = ReaderSensor(keyword="objects_raw")
object["properties"] = ReaderSensor(keyword="all_objects")

# The edge sensor links each object to its image through the `contains` relation
object[image_object_contains] = EdgeSensor(object["bounding_boxes"], image["pil_image"], relation=image_object_contains, forward=lambda b, _: torch.ones(len(b)).unsqueeze(-1))

# For each property of an object we need to define a learner
for attr_name, attr_variable in attribute_names_dict.items():
    object[attr_variable] = DummyLinearLearner(image_object_contains, "properties", current_attribute=attr_name)


1. Image sensors:
   - `image["pil_image"]`: A sensor that reads the PIL image from the dataset
   - `image["image_id"]`: A sensor that reads the image index from the dataset

2. Object sensors:
   - `object["bounding_boxes"]`: A sensor that reads the bounding boxes of objects from the dataset
   - `object["properties"]`: A sensor that reads the properties of objects from the dataset


3. Edge sensor:
   - `object[image_object_contains]`: A sensor for the `contains` relationship between an image and its objects

4. Attribute sensors: For each attribute (color, material, shape, size), a `DummyLinearLearner` is attached to the object concept.

These sensors are responsible for processing the input data and extracting features that will be used by the inference program.


## 6. Model Compilation and Program Creation

Now we compile the logic and create the inference program.


In [8]:
dataset = graph.compile_logic(dataset, logic_keyword='logic_str', logic_label_keyword='logic_label')
program = InferenceProgram(graph, SolverModel, poi=[image, object, *attribute_names_dict.values(), graph.constraint], device=device, tnorm=args.tnorm)


1. `graph.compile_logic(dataset, logic_keyword='logic_str', logic_label_keyword='logic_label')`:
   - Compiles the logical expressions in the dataset
   - Uses the "logic_str" field for the logical expressions and "logic_label" field for the labels
   - Returns the dataset with compiled logic

2. `InferenceProgram(graph, SolverModel, poi=[image, object, *attribute_names_dict.values(), graph.constraint], device=device, tnorm=args.tnorm)`:
   - Creates an inference program with the knowledge graph
   - Specifies the points of interest (poi) in the graph: image, object, attributes, and constraints
   - Sets the device and t-norm for the program

This step prepares the inference program that will be used for training and evaluation.
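
The `tnorm` argument selects how logical connectives are relaxed to continuous truth values in [0, 1]. The sketch below uses the standard textbook definitions of the Gödel, Product, and Łukasiewicz t-norms (for conjunction) and their dual t-conorms (used here for existence). It does not show how DomiKnowS turns these into a loss internally; the helper names and the probabilities are assumptions for illustration.

```python
from functools import reduce

# Conjunction (t-norm) for each choice of --tnorm
T_NORMS = {
    "G": lambda a, b: min(a, b),              # Gödel
    "P": lambda a, b: a * b,                  # Product
    "L": lambda a, b: max(0.0, a + b - 1.0),  # Łukasiewicz
}
# Dual disjunction (t-conorm), used to relax "there exists"
T_CONORMS = {
    "G": lambda a, b: max(a, b),
    "P": lambda a, b: a + b - a * b,
    "L": lambda a, b: min(1.0, a + b),
}


def fuzzy_and(values, tnorm="G"):
    return reduce(T_NORMS[tnorm], values)


def fuzzy_exists(values, tnorm="G"):
    return reduce(T_CONORMS[tnorm], values)


# Hypothetical probabilities of (is_large, is_rubber, is_sphere) for two objects
objects = [[1.0, 0.75, 0.5], [0.5, 0.5, 0.5]]
for name in "GPL":
    per_object = [fuzzy_and(probs, name) for probs in objects]
    print(name, per_object, fuzzy_exists(per_object, name))
```

With these inputs the three t-norms give noticeably different truth values for the same expression, so the choice of t-norm can change the training signal.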


## 7. Training/Evaluation Loop

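Finally, we train the model and evaluate its performance. Training optimizes the model parameters to minimize a loss based on the logical executions, while evaluation measures how often the model answers the questions correctly. Note that in this dummy demonstration the same 10-instance dataset is passed to both `program.train` and `program.evaluate_condition`, so the reported accuracy is not a held-out test score.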

The training is performed using:
- Adam optimizer
- Learning rate of 1e-6
- Batch size of 1
- Training on CPU device
- `c_warmup_iters` set to 0 (immediate execution learning)

The preprocessed dataset includes the logical executions required for training. Since these executions are the only training signal used here, we start learning from them immediately by setting `c_warmup_iters=0`.


In [9]:
program.train(dataset, Optim=torch.optim.Adam, train_epoch_num=args.epochs, lr=args.lr, c_warmup_iters=0, batch_size=args.batch_size, device=device)

acc = program.evaluate_condition(dataset, device=device)
print("Accuracy on Test: {:.2f}".format(acc * 100))

INFO:domiknows.program.program:Epoch: 1
INFO:domiknows.program.program:Training:
Epoch 1 Training:   0%|          | 0/10 [00:00<?, ?it/s]

Log file for ilpOntSolver is in: D:\PycharmProjects\DomiKnowS_interface\test_regr\Clever\logs\ilpOntSolver.log
Log file for ilpOntSolverTime is in: D:\PycharmProjects\DomiKnowS_interface\test_regr\Clever\logs\ilpOntSolver.log


INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=0) = 0.023
Epoch 1 Training:  10%|█         | 1/10 [00:00<00:05,  1.59it/s]INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=1) = 0.445
Epoch 1 Training:  20%|██        | 2/10 [00:00<00:02,  2.82it/s]INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=2) = 0.078
Epoch 1 Training:  30%|███       | 3/10 [00:00<00:01,  3.52it/s]INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=3) = 0.068
Epoch 1 Training:  40%|████      | 4/10 [00:01<00:01,  3.99it/s]INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=4) = 0.176
Epoch 1 Training:  50%|█████     | 5/10 [00:01<00:01,  4.27it/s]INFO:domiknows.program.program:closs is not zero
INFO:domiknows.program.program:loss (i=5) = 0.200
Epoch 1 Training:  60%|██████    | 6/10 [00:01<00:00,  4.29it/s]INFO:domiknows.program

Accuracy on Test: 100.00
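
For reference, an accuracy figure like the one above can be read as the fraction of questions whose predicted truth value matches the label. The following sketch shows that computation under the assumption that each question yields a score in [0, 1] that is thresholded at 0.5. The internals of `evaluate_condition` are not shown in this notebook, and the scores and labels below are made up.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of questions whose thresholded score agrees with the 0/1 label."""
    correct = sum((score >= threshold) == bool(label) for score, label in zip(scores, labels))
    return correct / len(labels)


scores = [0.1, 0.9, 0.7, 0.2]  # hypothetical truth scores for four questions
labels = [0, 1, 1, 0]          # hypothetical ground-truth answers
print("Accuracy: {:.2f}".format(accuracy(scores, labels) * 100))  # Accuracy: 100.00
```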
