# [DK914] Context of difference

The aim of this project is to propose a formal definition of the contextual difference relation between two URIs.
A context of difference can be a subgraph of the instance descriptions of the URIs. The project then develops a tool that extracts these subgraphs for each pair of URIs.


## Authors : Yohan Chalier, François Amat, Subhy Albakour, Luka Jakovljevic

## Contact : {firstname}.{lastname}@telecom-paristech.fr

# This project is split into four parts:
1. The first part is parsing, which consists of loading the OWL ontology from the IIMB_LARGE files through the Python `owlready2` API.

2. The second part is the core of our algorithm: the computation of the common properties and of the differences between two individuals.

3. The third part is the optimisation of our algorithm through multithreading.

4. The final part is testing. 

# 1. Parsing

The algorithm described here is in the file parser.py under the class `Ontology`.

```python
    def __init__(self, subfolder_id="000", folder="IIMB_LARGE",
                 filename="onto.owl"):
        """ Load an ontology from a given folder.

        By extracting the default folder IIMB_LARGE, subfolders are of the
        form 000, 001, 002, ..., 080. Each contains a file onto.owl with an
        ontology in XML format.

        This method initializes an Ontology object by loading the content of a
        given folder.

        """

        self.iri = os.path.join("file://", sys.path[0],
                                folder, subfolder_id, filename)
        print("Loading ontology at", self.iri)
        owl.Ontology.__init__(self, owl.World(), base_iri=self.iri+"#")
        self.load()

    def select(self, query):
        """ Selects one item based on its node name.

        If no element is found, returns None.

        """

        search = self.search(iri="*"+query)
        if len(search) > 0:
            return search[0]
        return None
```

We define the class `Ontology(owl.Ontology)`, which extends the `Ontology` class of the Python library `owlready2`. To adapt this class to our project, we customize the `__init__` and `select` methods.

The `__init__` method takes as arguments the subfolder identifier, the folder name, and the filename of the ontology.
The `select` method only needs the name of the element to look up; it returns the first matching element, or `None` if there is none.
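The suffix matching behind `select` (the `"*" + query` pattern) can be illustrated without loading an ontology. The sketch below is a pure-Python stand-in that does not call `owlready2`; the IRIs and the list-based lookup are assumptions made for illustration only.

```python
from fnmatch import fnmatchcase


def select(iris, query):
    """Return the first IRI matching "*" + query, or None if nothing matches.

    Stand-in for Ontology.select: `iris` is a plain list of strings
    (an assumption), not an owlready2 ontology.
    """
    matches = [iri for iri in iris if fnmatchcase(iri, "*" + query)]
    if len(matches) > 0:
        return matches[0]
    return None


# Hypothetical IRIs, for illustration only
iris = [
    "http://example.org/onto.owl#item1",
    "http://example.org/onto.owl#item2",
]
first = select(iris, "item2")
missing = select(iris, "item9")
```

Here `first` is the second IRI of the list and `missing` is `None`, which mirrors the two return paths of the method.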

# 2. Get the differences

The algorithm described here is in the file context.py under the name `difference`

``` python
def difference(idv_a, idv_b, depth=1, keep_values=False, verbose=False):
    """ Computes the difference subgraph of two entities.

    Returns a JSON-like dictionary. Keys are properties, and values are either
    nested elements with other entities, or a pair of "real" (string) values
    that differed from the two elements.

    The `depth` parameter controls the recursion depth of the search.

    The parameters `idv_a` and `idv_b` are objects representing individuals from
    the two ontologies considered.

    """

    if verbose:
        print("Entering difference for", idv_a,
              "and", idv_b, "with depth", depth)

    graph = {}

    # prevents infinite recursion
    if depth >= 0:

        # only consider common properties
        for ppt_a, ppt_b in common_properties(idv_a, idv_b):

            if verbose:
                print("Property:", ppt_a)

            # each property may have several values
            # TODO: maybe product comparison is not the best
            for value_a in ppt_a[idv_a]:
                for value_b in ppt_b[idv_b]:

                    if owl.ObjectProperty in ppt_a.is_a:
                        # `value_a` and `value_b` are instances of classes from
                        # the ontology, hence we go deeper in the graph
                        # propagate the options to the recursive call
                        sub_graph = difference(value_a, value_b, depth - 1,
                                               keep_values, verbose)
                        if len(sub_graph) > 0:
                            graph[str(ppt_a)] = sub_graph

                    elif owl.DataProperty in ppt_a.is_a:
                        # `value_a` and `value_b` are simple strings, so we just
                        # match them
                        if value_a != value_b:
                            if keep_values:
                                graph[str(ppt_a)] = {"a": value_a, "b": value_b}
                            else:
                                graph[str(ppt_a)] = {}

    if verbose:
        print("Exiting difference.")

    return graph
```

The input of the `difference` algorithm is a pair of individuals (`idv_a`, `idv_b`).
We want to build the graph of their differences.

First, we take all the properties they have in common. For instance, if `idv_a` has an attribute 'createdAt'
and `idv_b` also has an attribute 'createdAt', then 'createdAt' is a common property.
```python 
for ppt_a, ppt_b in common_properties(idv_a, idv_b):
```
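The body of `common_properties` is not shown in this README. As a minimal sketch of the idea, assuming each individual is modelled as a plain dictionary mapping property names to lists of values (an assumption, not the `owlready2` representation), it could look like this:

```python
def common_properties(idv_a, idv_b):
    """Yield a (property_a, property_b) pair for each shared property name.

    Hypothetical stand-in: individuals are plain dicts here, so both
    members of the pair are the same property name.
    """
    for name in idv_a:
        if name in idv_b:
            yield name, name


# Only 'createdAt' comes from the text above; the other names are made up
idv_a = {"createdAt": ["1999"], "name": ["Alice"]}
idv_b = {"createdAt": ["2001"], "directedBy": ["Bob"]}
shared = list(common_properties(idv_a, idv_b))
```

Only `createdAt` appears in both dictionaries, so it is the only pair produced.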

Next, we take all the values of each common property. In most cases there is a single value, but to stay general we treat them as a list of values and compare every pair.


```python 
            for value_a in ppt_a[idv_a]:
                for value_b in ppt_b[idv_b]:                
```


Then, we test whether the property values are nodes. If they are (the property is an object property), we call `difference` recursively on the two values with a reduced depth.
Otherwise the values are literals such as names (we have designed our ontology class for that). We compare them, and if the two strings are not exactly equal we record the difference in the graph.

Finally, we return the graph.
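To make the shape of the returned graph concrete, here is a simplified sketch of the same recursion on plain dictionaries. It assumes an individual is a dict mapping property names to lists of values, where a nested dict stands for another individual and a string stands for a literal. The property names and values are invented for the example, and this is not the `owlready2`-based implementation from `context.py`.

```python
def difference(idv_a, idv_b, depth=1, keep_values=False):
    """Simplified difference on plain dicts (illustrative stand-in)."""
    graph = {}
    if depth < 0:
        # recursion budget exhausted
        return graph
    # only consider properties present in both individuals
    for ppt in idv_a.keys() & idv_b.keys():
        for value_a in idv_a[ppt]:
            for value_b in idv_b[ppt]:
                if isinstance(value_a, dict) and isinstance(value_b, dict):
                    # nested individuals: go one level deeper
                    sub_graph = difference(value_a, value_b,
                                           depth - 1, keep_values)
                    if sub_graph:
                        graph[ppt] = sub_graph
                elif value_a != value_b:
                    # literal values that differ
                    if keep_values:
                        graph[ppt] = {"a": value_a, "b": value_b}
                    else:
                        graph[ppt] = {}
    return graph


# Hypothetical data, for illustration only
film_a = {"name": ["Alien"], "createdAt": ["1979"],
          "directed_by": [{"name": ["Ridley Scott"]}]}
film_b = {"name": ["Alien"], "createdAt": ["1980"],
          "directed_by": [{"name": ["R. Scott"]}]}

deep = difference(film_a, film_b, depth=1, keep_values=True)
shallow = difference(film_a, film_b, depth=0)
```

With `depth=1`, `deep` contains the differing `createdAt` values and a nested entry for `directed_by`. With `depth=0`, the nested individuals are no longer explored, so `shallow` only reports `createdAt`.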


# 3. Optimisation

The algorithm described here is in the file process.py under the classes `worker` and `process`.
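The `worker` and `process` classes are not reproduced in this README, so the following is only a generic sketch of how independent pair comparisons can be distributed over threads with the standard library. The `compare` function and the data are hypothetical placeholders for the per-pair work.

```python
from concurrent.futures import ThreadPoolExecutor


def compare(pair):
    """Hypothetical per-pair task: names of shared keys whose values differ."""
    a, b = pair
    return {key for key in a.keys() & b.keys() if a[key] != b[key]}


def compare_all(pairs, max_workers=4):
    """Run `compare` on every pair using a pool of threads.

    `pool.map` returns results in the same order as `pairs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compare, pairs))


# Made-up pairs of individuals, modelled as plain dicts
pairs = [
    ({"x": 1, "y": 2}, {"x": 1, "y": 3}),
    ({"x": 0}, {"x": 0}),
]
results = compare_all(pairs)
```

Each pair is processed independently, which is what makes this kind of workload easy to split across workers.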

# 4. Testing