# Text2Bool

It would be useful for us (from a compliance verification point-of-view) if we could have a text2bool tool that could take in a sentence from the AI act and give out a boolean expression in multiple boolean variables that's output would be a boolean value representing whether the user is compliant with that particular regulation.

We found that simple relations (e.g. edges in a knowledge graph) struggle to encode some of the more complicated, conditional requirements seen in the AI Act. 
Boolean expressions could be used to encode the entire rulesego

e.g. A simplified ruleset could be

`compliant = (used_large_compute && is_open_source) || (using_facial_recognition && (for_research_purposes  && is_open_source) || (used_by_governments && assists_in_criminal_investigations))`

This reflects the fact that certain techniques are only allowed in certain circumstances (e.g. facial recognition may be allowed only if it's open source and for research purposes or used by the government for crime investigation).

## Generating rulesets

So far, we pass in legal texts sentence-by-sentence and try to extract relationships from them, e.g. "AI systems cannot use facial recognition". If we picked up on a relationship like that then we would use that in the knowledge graph, which would be applied to all AI systems. This is obviously a naive approach as there are certain situations where this doesn't apply.

We now have access to metadata about each sentence, such as the article number and point/subpoint it came from. This is useful because if we know that all of the rules in a certain article only apply in a certain situation, then we can append that to the boolean ruleset. E.g. if we read "Models under 1 billion parameters are not subject to the regulations set out in Article X", then we can find that section of the ruleset and append `&& has_over_1B_params` to that section of the ruleset. 

We can probably implement this as a custom Python class that has some functionality built on top of a standard boolean variable

In [10]:
class ExtendedBool:
    """
    Replaced by BooleanExpression class ;(
    RIP ExtendedBool 2024-2024
    """
    def __init__(self, value, case_or_situation=None, article_num=0):
        self.value = value
        self.case_or_situation = case_or_situation
        self.article_num = article_num

    # TODO: Overload the logical opeerators ("and", "or", etc) instead of these functions

    def _and(self, other):
        return ExtendedBool(self.value and other.value)

    def _or(self, other):
        return ExtendedBool(self.value or other.value)

    def __repr__(self):
        return f"{self.value}"

    
bool1 = ExtendedBool(True, 24)
bool2 = ExtendedBool(False, 12)
bool3 = bool1._and(bool2)
bool4 = bool1._or(bool2)
print(bool3, bool4)


False True


In [15]:
class Ruleset:
    def __init__(self, ruleset):
        self.ruleset = ruleset
    
    # This might not work because it could evaluate the ruleset to true or false before, 
    #   so I need to think of a way to represent a boolean expression rather than just a boolean variable
    def modify(self, article_num, case_or_situation, point_num=None, subpoint_num=None):
        # TODO: Add functionality for points and subpoints
        if not point_num and not subpoint_num:
            self.ruleset[article_num] = self.ruleset[article_num] and case_or_situation

In [16]:
# User model data, pretend this was extracted from the model documentation
is_over_1b_params = False
is_open_source = True
for_research_purposes = True

ruleset_dict = {
    "Article 1":
    {
        "Point 1": is_open_source or for_research_purposes
    }
}

ruleset = Ruleset(ruleset_dict)
ruleset.modify("Article 1", ExtendedBool(True, is_over_1b_params))

In [17]:
# TODO: Finish the following example code to "automatically" change the ruleset given a sentence.

def find_which_rules_apply_to_which_articles(ai_act_sentence, ruleset):
    # Pretend there's some actual code here... Maybe we can get an LLM to do this on the fly
    if sentence == "Article 1 does not apply to models under 1b params":
        ruleset.modify("Article 1", ExtendedBool(True, is_over_1b_params))
    elif sentence == "Article 2 does not apply to open_source_models":
        ruleset.modify("Article 2", ExtendedBool(True, is_open_source))


sentence = "Article 1 does not apply to models under 1b params"
find_which_rules_apply_to_which_articles(sentence, ruleset)


# How to represent Boolean Expressions
We can have an extended Boolean Expression class that contains one or two boolean operands and a boolean operator. We can construct Boolean Expressions from these and evaluate them by recursively evaluating the sub-boolean expressions.


In [65]:
class BooleanExpression:
    """
    Represents a Boolean expression.

    first: ExtendedBool
    second (optional): ExtendedBool
    operator (optional): String
    """
    def __init__(self, first=None, operator=None, second=None, value=None):
        self.operator = operator
        self.first = first
        self.second = second
        self.evaluate(value)


    def evaluate(self, value=None):
        if (not self.first) or (not self.second):
            self.value = value

        # Debugging
        print("Got Here, {}")

        if self.operator == "AND" and self.first.value and self.second.value:
            self.value = True
        if self.operator == "OR" and (self.first.value or self.second.value):
            self.value = True
        if self.operator == "NOT":
            self.value = not self.first.value

    def __and__(self, other):
        return BooleanExpression(first=self, second=other, operator="AND", value=self.evaluate())

    def __or__(self, other):
        return BooleanExpression(first=self, second=other, operator="OR")

    def __invert__(self):
        return BooleanExpression(first=self, operator="NOT")

    def __bool__(self):
        return bool(self.evaluate())
    
    def __repr__(self):
        return f"ExtendedBool({self.first=}, {self.second=}, {self.value=}, {self.operator=})"

In [16]:
# Pseudocode for how I want the class to work

def string_to_bool(str):
    # TODO: Make this more robust
    return str == "y"


def getBoolExpr(name, query):
    bool_value = string_to_bool(input(query))
    return BooleanExpression(name, bool_value)


user_params = {}

# Query the users to get information about their situation
user_params["is_open_source"] = getBoolExpr("is_open_source", "Is your code open-source (y/n): ")
user_params["over_1b_params"] = getBoolExpr("over_1b_params", "Does your AI system have >1b params (y/n): ")
# ... TODO

In [19]:
# Boolean expression representing the requirement (is_open_source or (not over_1b_params))
not_over_1b_params = BooleanExpression(operator="NOT", first=user_params["over_1b_params"])
is_open_source_or_not_over_1b_params = BooleanExpression(first = user_params["is_open_source"],
                                                         operator = "OR",
                                                         second = not_over_1b_params)

# TODO: Write code to replace that with something much simpler, the equivalent should be:
# requirement = is_open_source or not over_1b_params
# Hopefully python operator overloading allows this and retains the correct operator precedence

requirement = is_open_source_or_not_over_1b_params

In [46]:
name_is_dylan = BooleanExpression(value=True)
studies_at_trinity = BooleanExpression(value=True)
likes_karl_marx = BooleanExpression(value=False)

not_name_is_dylan = ~name_is_dylan

dylan_and_studies_at_trinity = name_is_dylan & studies_at_trinity
description_of_me = name_is_dylan & studies_at_trinity & ~likes_karl_marx

TypeError: __bool__ should return bool, returned NoneType

In [68]:
a = BooleanExpression(value=True)
b = BooleanExpression(value=True)
c = a & b
print(f"{a=}")
print(f"{b=}")


Got Here, {self}
Got Here, {self}
Got Here, {self}
Got Here, {self}
Got Here, {self}
a=ExtendedBool(self.first=None, self.second=None, self.value=None, self.operator=None)
b=ExtendedBool(self.first=None, self.second=None, self.value=True, self.operator=None)


In [40]:
dylan_and_studies_at_trinity

ExtendedBool(self.first=ExtendedBool(self.first=None, self.second=None, self.value=True, self.operator=None), self.second=ExtendedBool(self.first=None, self.second=None, self.value=True, self.operator=None), self.value=None, self.operator='AND')

In [None]:
# Sample sentece (not actually from the AI Act)
sentence = "If the AI system is open source or it is not over 1 billion parameters in size, then the provider of the AI system does not have to undergo extra safety testing"




# Alternative Approaches

There are multiple ways we can approach this:

## Labelling each edge
While going through the act, we generate a number of edges in the knowledge graph. These represent the "requirements" set out in the act. Within each point, there could also be more information relating to the exact situation that the requirement must be met. These situations can be represented by a boolean expression. 

So each time we go over a sentence, we get:
    1. A requirement / relationship in the knowledge graph
    2. A boolean expression representing the situation where that requirement has to be satisfied.

A sample pseudocode would be:

```python
for sentence in ai_act:
    relationship, situation = find_relationship(sentence)  # Query an LLM to do relation extraction and boolean situation extraction
```

The 'find_relationship()` above would query a language model (probably with few-shot examples) and parse the output and return a triplet representing the relationship as well as custom BooleanExpression class representing the situation.
