# Geological Interpreter Development

This is a notebook for testing and developing some of the basic code in this package.

## Testing ontology manipulation

The knowledge manipulated in this package is formalised in an ontology,<br>
which is stored in a *.owl* file.

It is named **MOGI**, for **M**inimal **O**ntology for **G**eological **I**nterpretation.

To manipulate this ontology, we use the package **owlready2**, available from here: https://owlready2.readthedocs.io

In [None]:
import owlready2 as owl

In [None]:
owl.onto_path.append("../ontologies/")
mogi = owl.get_ontology("mogi.owl").load()
mogi

The ontology provides access to its components, e.g.:
* classes
* properties
* individuals
* rules

In [None]:
print(list(mogi.classes()))
print(list(mogi.properties()))
print(list(mogi.individuals()))
print(list(mogi.rules()))

More specific elements can be searched through simple queries:

In [None]:
mogi.search(iri = "*Surface")

In [None]:
context = mogi.Geologic_Context('Data_properties')

In [None]:
context.get_properties()

In [None]:
context.INDIRECT_get_properties()

### Reasoner

Ontologies are even more powerful thanks to their ability to use reasoning for inferring types, properties, and relationships that were not explicitly stated.
This is useful for obtaining results implied by the information already stated.

This is achieved by running a *reasoner* on the ontology as follows.

In [None]:
owl.sync_reasoner(infer_property_values=True)
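
To give an intuition of what "inferring types that were not explicitly stated" means, the toy sketch below propagates class membership along a subclass hierarchy in plain Python. It is only an illustration under stated assumptions: it does not use owlready2 or its reasoner, and the class names (`Fault`, `Horizon`, `Surface`, `Geological_Object`) and the individual `F1` are hypothetical, not taken from MOGI.

```python
# Toy illustration only: a real OWL reasoner handles far more than subclass closure.
# Hypothetical hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "Fault": "Surface",
    "Horizon": "Surface",
    "Surface": "Geological_Object",
}

# Explicitly stated facts: individual -> asserted class.
ASSERTED_TYPES = {"F1": "Fault"}


def infer_types(individual):
    """Return the asserted class of `individual` followed by all its ancestors."""
    types = []
    current = ASSERTED_TYPES.get(individual)
    while current is not None:
        types.append(current)
        current = SUBCLASS_OF.get(current)
    return types


# F1 was only stated to be a Fault; the other two types are implied.
print(infer_types("F1"))
```

Only `F1 is a Fault` was stated, yet `Surface` and `Geological_Object` follow from the hierarchy. A reasoner automates this kind of deduction, and many others, on the whole ontology.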

## Geological Knowledge Manager

**GeologicalKnowledgeManager** may know different instances of **GeologicalKnowledgeFramework**,<br>
for example to differentiate scenarios or to allow customisation of knowledge and its formalisation.

**GeologicalKnowledgeFramework** provides access to concept definitions for providing knowledge.

In [None]:
import os

class GeologicalKnowledgeManager(object):
    """GeologicalKnowledgeManager is managing one or several GeologicalKnowledgeFramework.
    
    The GeologicalKnowledgeManager is typically a singleton, so there is always one and only one instance of it.
    
    The GeologicalKnowledgeManager may know different instances of GeologicalKnowledgeFramework,
    for example to allow different interpretation scenarios or for allowing user-specific customisation
    of knowledge and its formalisation.
    
    GeologicalKnowledgeFramework are typically ontologies and extensions defined in this package or elsewhere.
    """
    
    def __new__(cls):
        """Method to access (and create if needed) the only allowed instance of this class.
        
        Returns:
        - an instance of GeologicalKnowledgeManager"""
        if not hasattr(cls, 'instance'):
            cls.instance = super(GeologicalKnowledgeManager, cls).__new__(cls)
            cls.initialised= False
            print("DEBUG::creates new manager")
        return cls.instance
        
    def __init__(self, default= "mogi", default_source_directory= "../ontologies/", default_source_file= "mogi.owl", default_ontology_backend= "owlready2"):
        """Initializes the GeologicalKnowledgeManager with some default values from configuration.
        
        Parameters:
        - default: specifies the name of the default knowledge framework
        - default_source_directory: specifies the default folder containing the knowledge framework definitions
        - default_source_file: file contained in the source_directory defining the knowledge framework (e.g., .owl file)
        - default_ontology_backend: specifies the default ontology backend to be used
        """
        print("DEBUG::__init__")
        if not self.initialised:
            self._initialise(default= default, default_source_directory= default_source_directory, default_source_file= default_source_file, default_ontology_backend= default_ontology_backend)
            
    def _initialise(self, default, default_source_directory, default_source_file, default_ontology_backend):
        """Initializes the GeologicalKnowledgeManager with some default values from configuration.
        
        Parameters:
        - default: specifies the name of the default knowledge framework
        - default_source_directory: specifies the default folder containing the knowledge framework definitions
        - default_source_file: file contained in the source_directory defining the knowledge framework (e.g., .owl file)
        - default_ontology_backend: specifies the default ontology backend to be used
        """
        print("DEBUG::initialize manager")
        self.default= default
        self.default_source_directory= default_source_directory
        self.default_source_file= default_source_file
        self.default_ontology_backend= default_ontology_backend
        
        self.knowledge_framework_dict = {}
        
        self.initialised= True
        
    def reset(self, default= "mogi", default_source_directory= "../ontologies/", default_source_file= "mogi.owl", default_ontology_backend= "owlready2"):
        """Reinitializes the GeologicalKnowledgeManager with some default values from configuration.
        
        Parameters:
        - default: specifies the name of the default knowledge framework
        - default_source_directory: specifies the default folder containing the knowledge framework definitions
        - default_source_file: file contained in the source_directory defining the knowledge framework (e.g., .owl file)
        - default_ontology_backend: specifies the default ontology backend to be used
        """
        print("DEBUG::reset manager")
        self._initialise(default= default, default_source_directory= default_source_directory, default_source_file= default_source_file, default_ontology_backend= default_ontology_backend)
             
    def load_knowledge_framework(self, name=None, source= None, source_directory= None, backend= None):
        """Gets and initialises the ontology from the specified source.
        
        Parameters:
        - name: the name to be given to the knowledge framework. If None (default) the file name will be used.
        - source: filename to the ontology source. If None(default) the default ontology is used.
        - source_directory: where the system should look for ontology definition files. If None, the `GeologicalKnowledgeFramework` will decide.
        - backend: the ontology backend to be used. If None, the `GeologicalKnowledgeFramework` will decide."""
        source = source if source is not None else self.default_source_file
        name = name if name is not None else os.path.basename(source).split(os.path.extsep)[0]
        self.knowledge_framework_dict[name] = GeologicalKnowledgeFramework(name= name, source= source, source_directory= source_directory, backend= backend)
    
    def get_knowledge_framework(self,name= "default"):
        """Accessor to knowledge frameworks."""
        name = self.default if name == "default" else name
        assert len(self.knowledge_framework_dict) > 0, "No ontology has been loaded yet. Please use GeologicalKnowledgeManager().load_knowledge_framework() first"
        assert name in self.knowledge_framework_dict.keys(), "The specified ontology hasn't been loaded: "+name+\
            "\navailable ontology names are: "+"\n".join(self.knowledge_framework_dict.keys())
        return self.knowledge_framework_dict[name]
    
class GeologicalKnowledgeFramework(object):
    """A GeologicalKnowledgeFramework holds the definition of concepts and relationships describing knowledge.
    
    This is typically an overlay around a formal ontology definition, which also brings additional capabilities,
    such as algorithms and factories to achieve specific tasks and create objects."""
    
    def __init__(self, name, source, source_directory= None, backend= None):
        """Initialise a KnowledgeFramework from a given ontology file (source).
        
        Parameters:
        - name: should be the name under which this KnowledgeFramework is known in the manager
        - source: the source file for the ontology definition
        - source_directory: the directory where the source files for the ontology definition are looked for.
        If None (default) the default path provided by the `KnowledgeManager` is used.
        - backend: the ontology backend to be used for this knowledge framework.
        If None (default) the default ontology backend provided by the `KnowledgeManager` is used."""
        self.name= name
        print(source)
        self.__source_directory= None
        self.init_source_directory(source_directory)
        self.initialise_ontology_backend(backend)
        print(source)
        self.load_ontology(source)
    
    def init_source_directory(self, source_directory):
        """Initialises the folder where source files are searched.
        
        Parameters:
        - source_directory: if None, the previous value is used if it wasn't None, else the `GeologicalKnowledgeManager` default is used."""
        if source_directory is not None:
            self.__source_directory= source_directory
        elif self.__source_directory is None:
            self.__source_directory= GeologicalKnowledgeManager().default_source_directory
    
    def initialise_ontology_backend(self, backend_name:str= None):
        """Initializes the ontology package used as a backend to access ontologies.
        
        This will:
        - try to import the backend as onto
        - set the default path for ontologies"""
                
        self.__ontology_backend = None
        backend_name= GeologicalKnowledgeManager().default_ontology_backend if backend_name is None else backend_name
        if backend_name == "owlready2":
            try:
                import owlready2 as owl2 
                self.__ontology_backend = owl2
                if self.__source_directory not in self.__ontology_backend.onto_path:
                    self.__ontology_backend.onto_path.append(self.__source_directory)
            except ImportError:
                raise ImportError("You are trying to use Owlready2 as a backend for ontology management, but it doesn't appear to be installed. "\
                "This is either because Owlready2 is given as default option or because you asked for it. "\
                "Please install the Owlready2 package from https://owlready2.readthedocs.io "\
                "or give another backend through GeologicalKnowledgeFramework.initialise_ontology_backend()")
                
            # also test if java is correctly installed & accessible, as it is used by owlready2 for reasoning
            # note: os.system does not raise on failure, it returns a non-zero exit status instead
            if os.system("java -version") != 0:
                raise ImportError("Java doesn't appear to be installed properly as the command `java -version` returned an error. "\
                    "This error occurred while loading owlready2 package as an ontology backend, because java is used for the reasoning engine.")
        else:
            raise Exception("The specified backend for ontology is not supported: {}".format(backend_name))
          
        
    def load_ontology(self, source):
        """Loads the ontology specified by source.
        
        Parameters:
        - source: the source file for the ontology definition.
        It is looked for in the source directory set by `init_source_directory`."""
        self.__source= source
        print(source)
        try:
            self.__onto = self.__ontology_backend.get_ontology(self.__source).load()
        except Exception as err:
            # chain the original error so that its traceback is not lost
            raise Exception("Unexpected exception received while loading ontology:\n - source: {}\n - onto_path: {}".format(self.__source, self.__ontology_backend.onto_path)) from err
        
    def __call__(self):
        return self.__onto
        
    def get_ontology_backend(self):
        """Gets the ontology backend"""
        assert self.__ontology_backend is not None, "Trying to access the ontology backend without initialising it."
        return self.__ontology_backend
    
    def search(self, name= None, type= None, qualities= None, prepend_star=True) -> list:
        """Search function to interface the search capabilities of the internal ontology
        
        Parameters:
        - name: the name of the search object (you can use * to replace any set of characters and ? to replace any single character)
        Note: if `prepend_star` is True, a * is always prepended to allow the search to work despite the internal prefix names
        - type: the type of the searched observations as defined by the internal ontology
        - qualities: qualities to filter the observations.
        If `qualities` is a :
         * `str`: a single quality will be searched for with any value ("*"),
         * `list`: a list of qualities will be searched for with any values ("*")
         * `dict`: a list of qualities defined by the keys and with the associated values will be searched for"""
        if name is None:
            name = "*"
        elif prepend_star:
            name = "*"+name
        
        if isinstance(qualities,list):
            kargs = {quality_i: "*" for quality_i in qualities} 
        elif isinstance(qualities,str):
            kargs = {qualities: "*"}
        elif isinstance(qualities,dict):
            kargs = dict(qualities) # copy, so that the caller's dict is not modified below
        else:
            assert qualities is None, "qualities should be given as either None, a str, a list, or a dict"
            kargs = {}
        if type is not None: kargs["type"] = type
        return self.__onto.search(iri= name, **kargs)
    
    def sync_reasoner(self, **kargs):
        """Synchronise the reasoner.
        
        Parameters:
        - **kargs:
        |-infer_property_values"""
        self.__ontology_backend.sync_reasoner(**kargs)
    

In our approach, geological datasets will be progressively interpreted in terms of structural objects,<br>
based on a formal definition of concepts owned by a **GeologicalKnowledgeManager**.


In [None]:
GeologicalKnowledgeManager()

In [None]:
GeologicalKnowledgeManager()

In [None]:
GeologicalKnowledgeManager().reset()

In [None]:
GeologicalKnowledgeManager()
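
The repeated calls above all return the same object, and only the first one initialises it. The mechanism can be isolated in a minimal sketch, assuming a hypothetical class `SingletonSketch` that stands in for the manager and keeps only a `default` attribute.

```python
class SingletonSketch:
    """Minimal singleton: one shared instance, initialised only once."""

    def __new__(cls, *args, **kwargs):
        # Create the unique instance on the first call, then always return it.
        if not hasattr(cls, "instance"):
            cls.instance = super().__new__(cls)
            cls.instance.initialised = False
        return cls.instance

    def __init__(self, default="mogi"):
        # __init__ runs on every call, so guard against re-initialisation.
        if not self.initialised:
            self.default = default
            self.initialised = True


first = SingletonSketch(default="mogi")
second = SingletonSketch(default="other")  # argument ignored: already initialised

print(first is second)  # True
print(second.default)   # mogi
```

Because Python calls `__init__` after every `__new__`, the `initialised` flag is what prevents later calls from overwriting the stored defaults. This is why an explicit `reset()` is needed to change them.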

In [None]:
GeologicalKnowledgeManager().load_knowledge_framework()
GeologicalKnowledgeManager().get_knowledge_framework()

In [None]:
GeologicalKnowledgeFramework("mogi","mogi.owl")

In [None]:
GeologicalKnowledgeManager().knowledge_framework_dict

In [None]:
mogi = GeologicalKnowledgeManager().get_knowledge_framework()
mogi.name
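
The framework is found under the name `mogi` because, when no name is given, `load_knowledge_framework` derives it from the source file name. The sketch below reproduces that single expression in a hypothetical helper `default_framework_name`; the example paths are illustrative only.

```python
import os


def default_framework_name(source):
    """Derive a framework name from a source file name (text before the first dot)."""
    return os.path.basename(source).split(os.path.extsep)[0]


print(default_framework_name("mogi.owl"))
print(default_framework_name("../ontologies/mogi.owl"))
# Everything after the first dot is dropped, not only the last extension.
print(default_framework_name("mogi.v2.owl"))
```

All three calls give `mogi`, so two files that differ only after the first dot would collide under the same default name. Passing `name` explicitly avoids this.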

In [None]:
list(mogi().classes())

In [None]:
mogi().search(type= mogi().Ponctual_Observation)

In [None]:
mogi().D1.dip

In [None]:
mogi().search(type= mogi().Ponctual_Observation, dip= "*")

In [None]:
[obs_i.dip[0] for obs_i in mogi().search(type= mogi().Ponctual_Observation, dip= "*")]

In [None]:
[obs_i.dip[0] for obs_i in mogi().search(type= mogi().Ponctual_Observation, dip= 45)]

In [None]:
{obs_i.name: obs_i.dip[0] for obs_i in mogi().search(type= mogi().Ponctual_Observation, dip= "*")}

In [None]:
mogi.search(type= mogi().Ponctual_Observation)

In [None]:
mogi.search(type= mogi().Ponctual_Observation, qualities= "dip")

In [None]:
mogi.search(type= mogi().Ponctual_Observation, qualities= ["dip","occurrence"])

In [None]:
mogi.search(type= mogi().Ponctual_Observation, qualities= {"dip":45})

In [None]:
mogi.search(qualities= {"dip":45})
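
The three forms accepted by `qualities` (a `str`, a `list`, or a `dict`) are all turned into keyword filters before being passed to the ontology search. A standalone sketch of that conversion, using a hypothetical helper `qualities_to_filters` that is not part of the package, could look like this:

```python
def qualities_to_filters(qualities=None):
    """Turn a str, list, dict, or None into a dict of keyword filters."""
    if qualities is None:
        return {}
    if isinstance(qualities, str):
        # a single quality, with any value
        return {qualities: "*"}
    if isinstance(qualities, list):
        # several qualities, each with any value
        return {name: "*" for name in qualities}
    if isinstance(qualities, dict):
        # explicit values; copy so the caller's dict is left untouched
        return dict(qualities)
    raise TypeError("qualities should be None, a str, a list, or a dict")


print(qualities_to_filters("dip"))
print(qualities_to_filters(["dip", "occurrence"]))
print(qualities_to_filters({"dip": 45}))
```

The `"*"` value means "any value", so the first two forms only test that the quality is present, while the `dict` form also constrains its value.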

In [None]:
mogi.search()

In [None]:
mogi.search(name="*D1?")

In [None]:
mogi.search(name="D1")
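
The effect of `prepend_star` on the `name` pattern can be illustrated without an ontology. The sketch below uses the standard `fnmatch` module as a stand-in for the search, which is an assumption: owlready2 applies its own matching, and the IRIs listed here are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical IRIs; the real ones depend on the loaded ontology.
IRIS = [
    "http://example.org/mogi.owl#D1",
    "http://example.org/mogi.owl#D10",
    "http://example.org/mogi.owl#D2",
]


def match(pattern, prepend_star=True):
    """Return the IRIs matching `pattern`, optionally prefixed with '*'."""
    if prepend_star:
        pattern = "*" + pattern
    return [iri for iri in IRIS if fnmatchcase(iri, pattern)]


print(match("D1"))                      # only the IRI ending in "D1"
print(match("D1?"))                     # "D1" followed by exactly one character
print(match("D1", prepend_star=False))  # nothing: the IRI prefix is not covered
```

Since entities are matched on their full IRI, a bare name such as `D1` matches nothing unless a leading `*` absorbs the prefix, which is what `prepend_star` does by default.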

### Object implementation

We distinguish two kinds of operations here:
* representation
* visualisation

A representation is a formal description of how something appears in a given representation space, but it doesn't have to be visualised.<br>
A visualisation takes care of the rendering of a representation with a given support (image, screen).

Representation should also be made a bit more abstract.<br>
1. There is a variety of objects that can be rendered in a representation space (typically, different kinds of dataset components)
2. Several kinds of representation spaces could be envisioned (e.g., spatial 1D, 2D, 3D, or temporal, or just an abstract text)

In [None]:
import numpy as np

class RepresentationSpace(object):
    """A general framework for representing geological objects"""
    
class TemporalRepresentationSpace(RepresentationSpace):
    """A `RepresentationSpace` representing temporal aspects of represented objects."""
    
class PhysicalRepresentationSpace(RepresentationSpace):
    """A type of `RepresentationSpace` representing physical aspects of the represented objects."""
    
    __default_coordinate_labels = ["X","Y","Z"]
    
    def __init__(self, dimension: int=None, coordinate_label: str|list= None, dataset= None, **kargs):
        """Initialisation of the representation space.
        
        Parameters:
        - dimension (int): specify the number of dimensions of the representation space, typically 1D, 2D, or 3D (i.e., 1, 2, or 3),
        NB: larger dimension spaces are not supported. At least either the `dimension` parameter or `coordinate_label` parameter should be given.
        - coordinate_label(str|list(str)): gives the label(s) of the coordinates. If given, the number of dimensions is deduced from the size of the list
        and `dimension` is ignored, otherwise, the labels are taken from the `__default_coordinate_labels` based on the number of `dimension`s. 
        At least either the `dimension` parameter or `coordinate_label` parameter should be given.
        - dataset: a Dataset object containing the data to be attached to this representation space.
        Note that the RepresentationSpace can be created first and then updated automatically when creating the dataset attached to this space.
        - **kargs:
            - use_extension: if True, uses the extension of the dataset, else keeps the current ones
            - padding: if use_extension is True, the given padding will be used to keep a space around the dataset
        """
        assert not (coordinate_label is None and dimension is None), "At least one of the parameters should be specified"
        if coordinate_label is None:
            assert isinstance(dimension, int),"dimension parameter must be an integer"
            assert dimension in [1,2,3], "The specified number of dimensions ({:d}) is not supported, should be 1, 2 or 3.".format(dimension)
            self.dimension= dimension
            self.coordinate_labels= PhysicalRepresentationSpace.__default_coordinate_labels[:self.dimension]
        elif isinstance(coordinate_label,str):
            self.dimension= 1
            self.coordinate_labels=  [coordinate_label]
        elif isinstance(coordinate_label, list):
            self.dimension= len(coordinate_label)
            self.coordinate_labels= coordinate_label
        else:
            raise TypeError("Unsupported initialisation of representation space: dimension({}) and coordinate_label ({}).\n At least one of the parameters should be specified.".format(dimension, coordinate_label))
    
        self.__default_padding= 0.05
        self.__datasets= []
        self.set_extension()
        self.attach_dataset(dataset,**kargs)
        
    def attach_dataset(self, dataset, use_extension= True, padding= None, **kargs):
        """Attach a dataset to the representation space
        
        Parameters:
        - dataset: a Dataset object to be attached
        - use_extension: if True, uses the extension of the dataset, else keeps the current ones
        - padding: if use_extension is True, the given padding will be used to keep a space around the dataset"""
        if dataset is None: return
        if dataset in self.__datasets: return
        self.__datasets.append(dataset)
        
        if use_extension:
            self.set_extension_from_data(padding= padding)
        
        dataset.register_representation_space(self)
        
    def set_extension(self, extension:list= None):
        """Setter for the extension (min,max) of the representation space
        
        Parameters:
        - extension: a list containing a pair of min and max value for each dimension of the space.
        If None, the default will be set, i.e., [[0,1]] * dimension"""
        if extension is None: 
            self.extension = [[0,1]]*self.dimension
            return
        extension= np.array(extension)
        assert extension.shape[0] == self.dimension, "The specified extension ({}) do not match the space dimensions ({})".format(extension, self.dimension)
        assert extension.shape[1] == 2, "The specified extension should provide both lower and upper bounds for each dimension, given: {}".format(extension)
        self.extension= extension
        
    def set_extension_from_data(self, padding= None):
        """Sets the extension of the space from the attached dataset
        
        If no dataset is attached yet, then the default extension is used instead (min:0,max:1).
        
        Parameters:
        - padding: a space that is left around the dataset, either a value compatible with the coordinates, or a list of values of same dimensions.
        If None, by default the padding is 5% of the dataset range.
        """
        non_empty_dataset = [data_i for data_i in self.__datasets if data_i.extension is not None]
        if len(non_empty_dataset) == 0:
            self.set_extension()
            return
        
        if padding is None:
            padding= self.__default_padding
        else:
            try: # check if padding has a dimension
                len(padding)
            # if not, then use it as a scaling 
            except TypeError: # just checking it is a number
                assert isinstance(padding, (int, float)), "padding should be given as a number (int or float), here: {}".format(type(padding))
                # keep the padding as is in this case
            else: # else check its dimensions are ok
                assert len(padding) == self.dimension, "the dimensions of the specified padding (len({}) -> {}) should match the space dimension ({})".format(padding, len(padding), self.dimension)
                padding= np.array(padding)
                
        # work on a copy, so that the extension of the first dataset is not modified
        extension = np.array(non_empty_dataset[0].extension, dtype= float)
        for data_i in non_empty_dataset[1:]:
            data_extension = np.asarray(data_i.extension)
            for dim_i in range(self.dimension):
                extension[dim_i,0] = min(extension[dim_i,0], data_extension[dim_i,0])
                extension[dim_i,1] = max(extension[dim_i,1], data_extension[dim_i,1])
        # :todo: use projected coordinates instead of source coordinates, might fail if 3D data projected on a map
                
        # in any cases, except when padding and data is None
        self.center= np.mean(extension, axis= 1)
        diff= extension.T - self.center
        extension= self.center + (1+2*padding)*diff
        self.set_extension(extension.T)
    
    def __str__(self):
        """Description of the physical space parameters and data
        """
        desc= ["Representation space of type: {:s}".format(type(self).__name__)]
        desc+= ["- Number of dimension(s): {}".format(self.dimension)]
        desc+= ["- Coordinate label(s): {}".format(self.coordinate_labels)]
        desc+= ["- Space extension:"]
        for dim_i, lim_i in zip(self.coordinate_labels,self.extension):
            desc+= [" - Coord {}: {}".format(dim_i, lim_i)]  
        return "\n".join(desc)

In [None]:
space= PhysicalRepresentationSpace(2)

space.set_extension_from_data(padding= None)

print(space)


In [None]:
PhysicalRepresentationSpace("depth")

In [None]:
PhysicalRepresentationSpace()

In [None]:
space= PhysicalRepresentationSpace(coordinate_label=["X","Y"])
print(space)

In [None]:
space= PhysicalRepresentationSpace(coordinate_label=["X","Z"])
print(space)
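
The padding rule used by `set_extension_from_data` scales the bounds around their centre by `1 + 2 * padding`, which adds `padding` times the full range on each side. A plain-Python sketch of that arithmetic follows, with a hypothetical helper `pad_extension` and made-up bounds.

```python
def pad_extension(extension, padding=0.05):
    """Enlarge each (min, max) pair by `padding` times its range on both sides."""
    padded = []
    for lower, upper in extension:
        center = (lower + upper) / 2
        half_range = (upper - lower) / 2
        # scaling the half range by (1 + 2 * padding) adds padding * range per side
        scaled = (1 + 2 * padding) * half_range
        padded.append([center - scaled, center + scaled])
    return padded


# A range of 10 gains about 0.5 on each side, a range of 100 gains about 5.
print(pad_extension([[0, 10], [100, 200]]))
```

With the default 5% padding, the bounds become approximately `[-0.5, 10.5]` and `[95, 205]`, up to floating-point rounding.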

## Dataset

Todo:
* replace storage of data from data_map to ontology directly
* function to create and add data to ontology
* read table -> data in ontology
* visualisation of given data types

In [None]:
import logging

class GeologicalDataset(object):
    """A GeologicalDataset gathers information about geological data to be interpreted.
    
    This class is a hybrid ontology & Python class. It provides Pythonic algorithms and a high-level interface,
    while the data is actually stored in an ontology.
    """
    
    def __init__(self, physical_space= None, time_space= None, representation_spaces= None, knowledge_framework= None):
        """Initialises a `GeologicalDataset`
        
        Parameters:
        - physical_space: a `PhysicalRepresentationSpace`,  which defines the spatial coordinates of this dataset
        - time_space: a `TemporalRepresentationSpace`,  which defines the time coordinates of this dataset
        - representation_spaces: list of `RepresentationSpace`s to which the dataset must be attached
        Note: datasets can be created without representation space and attached later on by using `RepresentationSpace.attach_dataset`
        or `GeologicalDataset.register_representation_space`.
        Alternatively, a single `PhysicalRepresentationSpace` and/or `TemporalRepresentationSpace` can be given here if `physical_space` and `time_space` are None.
        - knowledge_framework: the `GeologicalKnowledgeFramework` to be used for storing the data.
        If None, the default will be taken from the `GeologicalKnowledgeManager`.
        
        Internals: this method initialises several internal attributes:
        - extension: represents the extension of the dataset in the attached representation space
        (i.e., the default one if this dataset is represented in several representation spaces)
        - representation_spaces: the dataset can be attached to and represented into several representation spaces,
        `physical_space` and `time_space` are included into this list.
        """
        
        self.knowledge_framework= knowledge_framework if knowledge_framework is not None else GeologicalKnowledgeManager().get_knowledge_framework()
        
        self.extension= None
        """This attribute stores extension of the dataset in its default representation space"""
        
        # setup representation spaces
        self.representation_spaces= set()
        self.setup_representation_space(physical_space, time_space, representation_spaces)
        
        # register the dataset in the listed representation spaces
        for space_i in self.representation_spaces:
            space_i.attach_dataset(self)

    def __setup_space(self, space, space_type):
        """Check if space of given type is in list or parameter and return the appropriate value.
        
        Take the given physical/time space, or if None use the first one in the list, and if none just leave None.
        Adds the space to the `self.representation_spaces` set."""
        if space is not None: 
            self.representation_spaces.add(space)
            return space
        if len(self.representation_spaces) == 0: return None
        space_list= [space_i for space_i in self.representation_spaces if isinstance(space_i, space_type)]
        return space_list[0] if len(space_list) > 0 else None
        
    def setup_representation_space(self, physical_space= None, time_space= None, representation_spaces= None):
        """Setup the representation space list and default
        
        Parameters:
        - physical_space: a `PhysicalRepresentationSpace`,  which defines the spatial coordinates of this dataset
        - time_space: a `TemporalRepresentationSpace`,  which defines the time coordinates of this dataset
        - representation_spaces: list of `RepresentationSpace`s to which the dataset must be attached
        Note: datasets can be created without representation space and attached later on by using `RepresentationSpace.attach_dataset`
        or `GeologicalDataset.register_representation_space`.
        Alternatively, a single `PhysicalRepresentationSpace` and/or `TemporalRepresentationSpace` can be given here if `physical_space` and `time_space` are None.
        """
        if representation_spaces is not None: self.representation_spaces = self.representation_spaces.union(representation_spaces)
        self.physical_space = self.__setup_space(physical_space, PhysicalRepresentationSpace)
        self.time_space = self.__setup_space(time_space, TemporalRepresentationSpace)
                
    def register_representation_space(self, space):
        """Register the given representation space as a space of representation of this dataset"""
        if space is None: return 
        self.representation_spaces.add(space)
        if (self.physical_space is None) and isinstance(space, PhysicalRepresentationSpace):
            self.physical_space = space
        if (self.time_space is None) and isinstance(space, TemporalRepresentationSpace):
            self.time_space = space
        space.attach_dataset(self)
        
    def get_observations(self, observation_type= None, qualities= None, name= None):
        """Accessor to the observations stored in the internal ontology
        
        Parameters:
        - observation_type: the type of the searched observations as defined by the internal ontology
        - qualities: qualities to filter the observations (cf. `GeologicalKnowledgeFramework.search`)
        - name: name of the searched observation (cf. `GeologicalKnowledgeFramework.search`)"""
        observation_type = observation_type if observation_type is not None else self.knowledge_framework().Ponctual_Observation
        return self.knowledge_framework.search(type= observation_type, qualities= qualities, name= name)
    
    def get_occurrence_observations(self):
        """helper method to access occurrence data, i.e., those having an occurrence quality
        
        :todo: for now the occurrence quality doesn't exist so all the observations are occurrence by default"""
        return self.get_observations() # qualities= "occurrence")
    
    def get_orientation_observations(self):
        return self.get_observations(qualities= "dip")
    
    def remove_observation(self, observations):
        """Removes the given observations stored in this dataset and internal ontology
        
        Parameters:
        - observations: an iterable containing objects of the internal ontology.
        Note that you can use the `search` method to get such a list"""
        for data_i in observations:
            self.knowledge_framework.get_ontology_backend().destroy_entity(data_i)
            
    def remove_observation_by_name(self, name:str):
        """Removes the given observations stored in this dataset and internal ontology"""
        self.remove_observation(self.get_observations(name= name))
        
    def remove_all_observations(self):
        """Removes all the observations stored in this dataset and internal ontology"""
        self.remove_observation(self.get_observations())
        
    def add_observation(self, name: str, **kargs):
        """creates a new observation and adds it to the internal ontology
        
        Parameters:
        - name: the name of the observation (similar to an observation id)
        - **kargs:
          |- any argument whose name corresponds to the `physical_space` coordinate labels or other properties.
          |   Note: all the coordinates must be specified"""
        self.knowledge_framework().Ponctual_Observation(name= name, **{key:[val] for key, val in kargs.items()})
        
    def add_occurrence_observation(self, name: str, observed_object:str, occurrence= True, **kargs):
        """creates a new occurrence observation and adds it to the internal ontology
        
        Parameters:
        - name: the name of the observation (similar to an observation id)
        - observed_object: the name of the observed object
        - occurrence: True (default) if the object was observed here, False if it was observed that it is not there.
        Note that this is different from not having observed that it is here, in which case there should not be an observation.
        - **kargs:
          |- any argument whose name corresponds to the `physical_space` coordinate labels or other properties.
          |   Note: all the coordinates must be specified"""
        if observed_object is not None: kargs["geology"]= observed_object
        if occurrence is not None: kargs["occurrence"]= occurrence
        self.add_observation(name,**kargs)
    
    def add_orientation_observation(self, name: str, observed_object:str, dip, dip_dir, occurrence= True, **kargs):
        """creates a new orientation observation and adds it to the internal ontology
        
        Parameters:
        - name: the name of the observation (similar to an observation id)
        - observed_object: the name of the observed object
        - dip: the value of the measured dip (in degrees, 0-90)
        - dip_dir: the value of the dip direction (in degrees, 0-360, from North towards the East)
        - occurrence: True (default) if the object was observed here.
        This is the default behaviour because if the measurement was made here, we assume that the object actually existed
        so this is in itself a proof of occurrence. However, one might want to record the orientation without specifically attaching any
        observation of occurrence, in which case None should be given for occurrence and the quality won't be set.
        False would not make much sense as it would imply that the orientation was measured but we observed that the object wasn't there.
        - **kargs:
          |- any argument whose name corresponds to the `physical_space` coordinate labels or other properties.
          |   Note: all the coordinates must be specified"""
        if occurrence is not None: kargs["occurrence"]= occurrence
        if occurrence == False: logging.warning("occurrence parameter was set to False while adding an orientation observation. "
            "This is weird because it would imply the measure was taken but the rock couldn't be observed. "
            "Did you intend to avoid recording the occurrence? In that case, prefer None instead of False.")
        
        if observed_object is not None: kargs["geology"]= observed_object
        if (dip is not None) and (dip_dir is not None):
            kargs["dip"]= dip
            kargs["dip_dir"]= dip_dir
        self.add_observation(name,**kargs)
        
    def __len__(self):
        return len(self.get_observations())
         
    def __str__(self):
        """Description of the dataset
        """
        desc= ["A dataset of type: {:s}".format(type(self).__name__)]
        n = len(self)
        if n == 0:
            desc+= ["The dataset is empty"]
        else:
            desc+= ["Size of dataset: {:d}".format(n)]
            desc+= ["Extension: "+str(self.extension)]
            desc+= ["- types of data:"]
            desc+= [" | occurrence:\t{} entries".format(len(self.get_occurrence_observations()))]
            desc+= [" | orientation:\t{} entries".format(len(self.get_orientation_observations()))]  
        return "\n".join(desc) 
    
def load_dataset_from_csv(source:str, dataset:GeologicalDataset = None, **kargs) ->GeologicalDataset:
    """Loads a dataset from a csv file
    
    Parameters:
    - source(str): the source file from which the data should be loaded
    - dataset: the `GeologicalDataset` in which the data will be loaded. If None, the dataset will be created.
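    - **kargs: keyword arguments forwarded to `pandas.read_csv` and to `load_dataset_from_dataframe`;
      each function only receives the keywords present in its signature.
    
    Return:
    - the `GeologicalDataset` with the newly loaded data (a new `GeologicalDataset` is created if needed)."""
    try:
        dataframe = pd.read_csv(source, **filter_kargs(pd.read_csv,**kargs))
    except Exception as e:
        # add_note (Python 3.11+) takes a single string; re-raise so that the failure is not silently ignored
        e.add_note("This error occurred while loading a dataset from: {}".format(source))
        e.add_note("Additional arguments were given: " + ", ".join("{}:{}".format(key,val) for key, val in kargs.items()))
        raise
        
    if len(dataframe.columns) < 3: logging.warning("There are fewer than 3 columns in the loaded dataset.\nCheck the output and consider changing the separator with sep keyword.")
    # forward the remaining keywords (e.g., labels, index, dtypes) to the dataframe loader
    return load_dataset_from_dataframe(dataframe, dataset, **filter_kargs(load_dataset_from_dataframe,**kargs))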

def load_dataset_from_dataframe(dataframe, dataset:GeologicalDataset = None, labels= None, index= "Id", dtypes= None):
    """Loads a dataset from a `pandas.DataFrame`
    
    Parameters:
    - dataframe(`pandas.DataFrame`): the source dataframe from which the data should be loaded
    - dataset: the `GeologicalDataset` in which the data will be loaded. If None, the dataset will be created.
    - labels: a dict to relabel the dataframe columns prior to loading in the dataset.
    This is useful for example when the coordinates in the source aren't labelled the same as in the internal ontology.
    The format is {"old_label":"new_label", ...}.
    - index: the label of the column (in original DataFrame, i.e., before renaming), which is to be used as index
    - dtypes: a dict containing a mapping between column name and type
    ...
    
    Return:
    - the `GeologicalDataset` with the newly loaded data (a new `GeologicalDataset` is created if needed)."""
    if dataset is None:
        dataset = GeologicalDataset()

    dtypes= dtypes if dtypes is not None else {'x':float, 'y':float, 'z':float, 'dip_dir':float, 'dip':float, 'geology':str, 'observed_object':str, "occurrence":bool}
    if labels is not None:
        dataframe = dataframe.rename(columns= labels)
        index= labels[index] if index in labels else index
        dtypes= {labels[key] if key in labels else key: value for key, value in dtypes.items()}
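    dataframe = dataframe.set_index(index)
    # cast only the columns that are actually present in the dataframe
    dataframe = dataframe.astype({key: value for key, value in dtypes.items() if key in dataframe.columns})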
        
    for name_i, values_i in dataframe.iterrows():
        dataset.add_observation(name_i, **values_i)
    
    return dataset
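
The relabelling step in `load_dataset_from_dataframe` has to keep three things in sync: the column names, the index label, and the keys of the dtype mapping. The snippet below is a minimal sketch of that bookkeeping on plain Python containers, so it runs without pandas. The helper name `relabel` is hypothetical, and dropping dtype entries for absent columns is an assumption made here, not something stated in the function above.

```python
def relabel(columns, index, dtypes, labels):
    """Apply the same old->new label mapping to columns, index label and dtype keys."""
    new_columns = [labels.get(col, col) for col in columns]
    new_index = labels.get(index, index)
    new_dtypes = {labels.get(key, key): typ for key, typ in dtypes.items()}
    # assumption: only keep the dtypes of columns that exist after renaming
    new_dtypes = {key: typ for key, typ in new_dtypes.items() if key in new_columns}
    return new_columns, new_index, new_dtypes

columns, index, dtypes = relabel(
    columns=["ID", "X", "Y", "Z", "geology"],
    index="ID",
    dtypes={"x": float, "y": float, "z": float, "dip": float, "geology": str},
    labels={"ID": "Id", "X": "x", "Y": "y", "Z": "z"},
)
print(columns)         # ['Id', 'x', 'y', 'z', 'geology']
print(index)           # Id
print(sorted(dtypes))  # ['geology', 'x', 'y', 'z']
```

Labels missing from the mapping are left unchanged, which is why `geology` survives the renaming as-is.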

In [None]:
import inspect

def filter_kargs(target_function,**kargs):
    """Helper function to filter keyword arguments and only pass the needed ones in a function signature"""
    sig = inspect.signature(target_function)
    # check if there is a **kargs in the signature of the function, if yes it is ok as it will take care of the passed extra kargs
    if not any(p.kind == p.VAR_KEYWORD for p in sig.parameters.values()):
        extra_args = kargs.keys() - sig.parameters.keys()
        for args in extra_args:
            del kargs[args]
    return kargs

In [None]:
sign = inspect.signature(pd.read_csv)
sign.parameters.keys()

In [None]:
sign = inspect.signature(load_dataset_from_csv)
[p.kind for p in sign.parameters.values()]
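
The behaviour of `filter_kargs` can be checked on toy functions rather than on `pd.read_csv`. The snippet below repeats the helper so that it runs on its own; the two target functions `strict` and `permissive` are hypothetical and exist only for this illustration.

```python
import inspect

def filter_kargs(target_function, **kargs):
    """Helper function to filter keyword arguments and only pass the needed ones in a function signature"""
    sig = inspect.signature(target_function)
    # a **kargs parameter in the target accepts everything, so nothing needs to be removed
    if not any(p.kind == p.VAR_KEYWORD for p in sig.parameters.values()):
        extra_args = kargs.keys() - sig.parameters.keys()
        for arg in extra_args:
            del kargs[arg]
    return kargs

# hypothetical targets, used only for this illustration
def strict(sep=",", header=0):
    return sep, header

def permissive(sep=",", **extra):
    return sep, extra

mixed = {"sep": ";", "labels": {"ID": "Id"}}
print(filter_kargs(strict, **mixed))      # {'sep': ';'}
print(filter_kargs(permissive, **mixed))  # {'sep': ';', 'labels': {'ID': 'Id'}}
print(mixed)                              # {'sep': ';', 'labels': {'ID': 'Id'}}
```

Because the keywords are unpacked into a fresh dictionary at call time, the caller's dictionary is left untouched and can be filtered again for another function.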

### Manual entries

In [None]:
dataset= GeologicalDataset()
print(dataset)

In [None]:
dataset.get_observations()

In [None]:
dataset.get_observations(name="D8")

In [None]:
dataset.remove_observation_by_name("D8")
dataset.get_observations()

In [None]:
dataset.remove_all_observations()
print(dataset)

In [None]:
dataset.add_occurrence_observation(name="DD", observed_object= "Keuper", occurrence= True, x= 1, z= 2, y= 0)
print(dataset)

dataset.get_observations()

In [None]:
dataset.add_orientation_observation(name="DO", observed_object= "Trias", dip= 30, dip_dir= 270, x= 1, z= 2, y= 0)
print(dataset)

dataset.get_observations()

In [None]:
di = mogi().DD
for prop in di.get_properties():
    for value in prop[di]:
        print(prop,":", value)

In [None]:
di = mogi().DO
for prop in di.get_properties():
    for value in prop[di]:
        print(prop,":", value)

In [None]:
list(di.get_properties())

### From file

In [None]:
dataset = load_dataset_from_csv("../inputs/data_for_paper.csv", labels={"ID":"Id", "X":"x","Y":"y","Z":"z"})

In [None]:
kargs = {"labels":{"ID":"Id", "X":"x","Y":"y","Z":"z"}}
filter_kargs(pd.read_csv,**kargs)

In [None]:
print(dataset)

In [None]:
# inspect the raw table directly: load_dataset_from_csv returns a GeologicalDataset, not a DataFrame
dataframe = pd.read_csv("../inputs/data_for_paper.csv", sep=";")

In [None]:
len(dataframe.columns)

In [None]:
dataframe.head()

## Creating a dataset

The data themselves are described within the ontology, here through the *Data* class.<br>
Adding a new data point therefore means creating a new *Data* individual (i.e., an instance in the ontology).

In [None]:
import numpy as np
import pandas as pd

In [None]:
data_head = np.array(['name', 'x', 'y', 'z', 'dip_dir', 'dip', 'observed_object','occurrence'])
data_array = np.array([['D1', 15, 20, 35, 270, 45, 'Trias_Base',True],
                       ['D2', 30, 25, 50, 270, 45, 'Trias_Base',True],
                       ['D3', 60, 30, 40, 90, 45, 'Trias_Base',True],
                       ['D4', 75, 15, 25, 90, 45, 'Trias_Base',True],
                       ['D5', 110, 20, 40, 270, 63, 'Trias_Base',True],
                       ['D6', 120, 20, 60, 270, 64, 'Trias_Base',True],
                       ['D7', 155, 20, 60, 89, 39, 'Trias_Base',True],
                       ['D8', 190, 20, 30, 91, 40, 'Trias_Base',True],
                       ['D11', 25, 22, 45, np.nan, np.nan, np.nan,True],
                       ['D22', 50, 22, 50, np.nan, np.nan, np.nan,True],
                       ['D44', 100, 30, 20, np.nan, np.nan, np.nan,True],
                       ['D77', 168, 30, 47, np.nan, np.nan, np.nan,True]],
                      dtype=object)  # keep numbers, booleans and NaN as such instead of coercing every cell to a string
data_test = pd.DataFrame(data = data_array, columns = data_head)
data_test = data_test.astype({'name':str, 'x':float, 'y':float, 'z':float, 'dip_dir':float, 'dip':float, 'observed_object':str, 'occurrence':bool})
data_test.set_index("name", inplace = True)
data_test

In [None]:
data_test.info()

In [None]:
dataset.remove_all_observations()

In [None]:
# setting the dataset in the ontology by creating individuals
for name_i, values_i in data_test.iterrows():
    mogi().Ponctual_Observation(name_i, **{key:[val] for key, val in values_i.items()})
dataset.get_observations()

In [None]:
list(mogi().D1.get_properties())

In [None]:
# for loading dataset from the ontology
output_frame = pd.DataFrame(columns=["name","x","y","z","dip_dir","dip",'geology'])
output_frame.set_index("name",inplace=True)
for di in mogi.search(type = mogi().Ponctual_Observation):
    for prop in di.get_properties():
        for value in prop[di]:
            output_frame.loc[di.name,prop.name] = value
output_frame = output_frame.astype({'x':float, 'y':float, 'z':float, 'dip_dir':float, 'dip':float, 'geology':str})
output_frame.head()

In [None]:
dataset.info()

## Data visualisation

### Testing Data visualisation

In [None]:
import matplotlib.pyplot as plt

In [None]:
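def draw_line(center, dip, dir, length= 1, ax= None, color = "black", **kargs):
    """Draws a segment of the given length, centred on `center` and plunging at `dip` degrees towards `dir` ("left" or "right").
    
    Returns the half-segment vector (from the centre to the end point)."""
    ax_plt = plt if ax is None else ax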

    center = np.array(center)
    dip_rad = np.deg2rad(dip)
    vec_x =  np.cos(dip_rad)
    if dir == "left": vec_x *= -1
    vec_z = -np.sin(dip_rad)
    vect = 0.5 * length * np.array([vec_x,vec_z])
    start = center - vect
    end = center + vect
    ax_plt.plot([start[0],end[0]],[start[1],end[1]], color = color, **kargs)
    
    return vect
    
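def draw_dip_symbol(center, dip, dir, length= 1, polarity= None, ax= None, color = "black", polarity_ratio= 0.4, **kargs):
    """Draws a dip segment (see `draw_line`) and, if `polarity` ("up" or "down") is given, a perpendicular arrow indicating the polarity."""
    ax_plt = plt if ax is None else ax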
    
    vect = draw_line(center= center, dip= dip, dir= dir, length= length, ax= ax_plt, color = color, **kargs)
    
    if polarity is not None:
        vect_pol = polarity_ratio * np.array([-vect[1],vect[0]])
        if (dir == "left" and polarity == "up") or (dir == "right" and polarity == "down") : vect_pol *= -1
        ax_plt.arrow(*center,*vect_pol, width=length/100, color = color, **kargs)
        

In [None]:
draw_line([0,0],30, "left")
draw_dip_symbol([0,1],60, "right", polarity= "up", color= "red" )
plt.gca().set_aspect("equal")
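
The geometry behind `draw_line` can also be checked numerically, without plotting. The sketch below recomputes the two end points of a dip segment with the same convention (x horizontal, z upward, the segment plunging towards the given side). The function name `dip_segment` is hypothetical and it only relies on the standard `math` module.

```python
import math

def dip_segment(center, dip, direction, length=1.0):
    """Return the (start, end) points of a dip segment centred on `center`.

    `dip` is in degrees and `direction` is "left" or "right";
    the end point lies below the centre, on the side given by `direction`.
    """
    dip_rad = math.radians(dip)
    vec_x = math.cos(dip_rad)
    if direction == "left":
        vec_x = -vec_x
    vec_z = -math.sin(dip_rad)
    half = (0.5 * length * vec_x, 0.5 * length * vec_z)
    start = (center[0] - half[0], center[1] - half[1])
    end = (center[0] + half[0], center[1] + half[1])
    return start, end

start, end = dip_segment(center=(0.0, 0.0), dip=30, direction="right", length=2.0)
print(start, end)
```

For a 30° dip towards the right and a length of 2, the end point is approximately (0.87, -0.5): half a unit below the centre and to its right, as expected for a layer dipping to the right.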

In [None]:
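def draw_dataset( dataset, ax= None, **kargs):
    """Draws a dip symbol in the (x, z) section for each entry of a `pandas.DataFrame` that has an orientation."""
    ax_plt = plt if ax is None else ax
    
    for data_i in dataset.itertuples():
        # NaN never compares equal to anything, so `!= np.nan` is always True: use pd.notna instead
        if pd.notna(data_i.dip) and pd.notna(data_i.dip_dir):
            dir = "right" if data_i.dip_dir < 180 else "left"
            draw_dip_symbol( center= [data_i.x,data_i.z], dip= data_i.dip, dir= dir, ax= ax_plt, **kargs)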
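
A note on missing orientations: they are stored as NaN, and NaN never compares equal to anything, itself included. An inequality test against `np.nan` is therefore always true and cannot be used to skip entries without a dip. The sketch below illustrates this with the standard `math` module (`np.nan` is the same floating-point value); the helper `has_orientation` is hypothetical.

```python
import math

dip = math.nan  # np.nan is this same float value

print(dip != math.nan)  # True
print(dip == dip)       # False
print(math.isnan(dip))  # True

def has_orientation(dip, dip_dir):
    """True only when both angles are actual numbers (neither is NaN)."""
    return not (math.isnan(dip) or math.isnan(dip_dir))

print(has_orientation(45.0, 270.0))      # True
print(has_orientation(math.nan, 270.0))  # False
```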

In [None]:
draw_dataset(dataset, length=10, polarity="up")
plt.gca().set_aspect("equal")

In [None]:
next(dataset.itertuples()).dip

In [None]:
# setting the dataset in the ontology by creating individuals
for name_i, values_i in dataset.iterrows():
    mogi.Ponctual_Observation(name_i, **{key:[val] for key, val in values_i.items()})
mogi.search(type = mogi.Ponctual_Observation)

In [None]:
# for loading dataset from the ontology
dataset = pd.DataFrame(columns=["name","x","y","z","dip_dir","dip",'geology'])
dataset.set_index("name",inplace=True)
for di in mogi.search(type = mogi.Ponctual_Observation):
    for prop in di.get_properties():
        for value in prop[di]:
            dataset.loc[di.name,prop.name] = value
dataset = dataset.astype({'x':float, 'y':float, 'z':float, 'dip_dir':float, 'dip':float, 'geology':str})
dataset.head()

## Interpretation Workflow

The interpretation process itself is run in a **GeologicalInterpretationProcess** and follows a very simple and generic algorithm.<br>
This algorithm implements a Deming wheel (Plan-Do-Check-Act), a process of continual improvement:
1. Plan:
    1. Select a situation
    2. Select an action
2. Do: Implement the action (e.g., CreateInterpretationElement)
    1. List features
    2. Identify possible explanations
    3. Rank/choose explanations
    4. Instantiate individuals
    5. Infer and set parameters
3. Check: Evaluate consistency
    1. Evaluate internal consistency
    2. Evaluate relational likelihood
    3. Evaluate feature explanation
4. Act: Generate anomalies and report
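
The four steps above can be sketched as a single function that runs one turn of the wheel. This is only an illustration of the control flow: `CycleReport`, `run_cycle` and the callables passed to it are hypothetical names, and the numbers are made up for the example. It is not the implementation of **GeologicalInterpretationProcess**.

```python
from dataclasses import dataclass, field

@dataclass
class CycleReport:
    """Outcome of one Plan-Do-Check-Act iteration (hypothetical structure)."""
    situation: str
    action: str
    anomalies: list = field(default_factory=list)

def run_cycle(situations, choose_action, do, check):
    """Run one Plan-Do-Check-Act pass on the first pending situation."""
    # Plan: select a situation, then an action suited to it
    situation = situations[0]
    action = choose_action(situation)
    # Do: implement the action and collect what it produced
    result = do(situation, action)
    # Check: evaluate consistency; each failed check yields an anomaly message
    anomalies = check(result)
    # Act: report the anomalies so that the next cycle can address them
    return CycleReport(situation, action, anomalies)

# toy callables and made-up numbers, for illustration only
report = run_cycle(
    situations=["unexplained dip measurements"],
    choose_action=lambda situation: "CreateInterpretationElement",
    do=lambda situation, action: {"explained": 6, "total": 8},
    check=lambda result: (
        [] if result["explained"] == result["total"]
        else ["{} features left unexplained".format(result["total"] - result["explained"])]
    ),
)
print(report.action)     # CreateInterpretationElement
print(report.anomalies)  # ['2 features left unexplained']
```

Passing the steps as callables keeps the loop generic: the same skeleton applies whatever the selected situation and action are.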

In [None]:
class GeologicalInterpretationProcess(object):
    """GeologicalInterpretationProcess implements the core process of a geological interpretation.
    
    It connects all the required elements and resulting artefacts relative to a given interpretation sequence:
     - a GeologicalKnowledgeFramework"""
     
    def __init__(self, dataset: GeologicalDataset, knowledge_framework= None):
         """Creates a GeologicalInterpretationProcess
         
         ---------------------------
         Parameters:
         - dataset (GeologicalDataset): a dataset to be explained by this interpretor
         - knowledge_framework: a GeologicalKnowledgeFramework that defines the concepts used for this interpretation.
            If None is given, the default knowledge framework is used (`GeologicalKnowledgeManager().get_knowledge_framework()`)
         """
         self.knowledge_framework= GeologicalKnowledgeManager().get_knowledge_framework() if knowledge_framework is None else knowledge_framework
    