# Objective

_todo: introduction to objective_

To accomplish this objective, Python is connected to R. Python is the language used for the code developed in this paper, mainly because it yields code that is easier to understand and implement.

The library that computes the results, however, is ECoL, which is implemented in R. The two languages therefore have to be connected, which is done with libraries available for both Python and R.

To do so, a server process is started in RStudio using `Rserve`. Python then connects to it as a client through the `pyRserve` library.


In [1]:
def safe_connect(self, operation):
    # Evaluate an R expression; return None instead of raising on failure.
    connection = self.__connection

    try:
        metric = connection.r(operation)
    except Exception:
        metric = None
        print('Could not retrieve {}!'.format(operation))

    return metric

This connection implements the aforementioned link: the metrics of a dataset are calculated in R and exported back to the Python environment, where they are easier to manipulate.
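The fallback behaviour of `safe_connect` can be tried without a running R session. The sketch below is only an illustration: `FakeConnection` and `safe_eval` are hypothetical names, not part of `pyRserve` or of the class shown above, and the fake object imitates nothing more than the `connection.r(expression)` call that `safe_connect` relies on.

```python
class FakeConnection:
    """Hypothetical stand-in for a pyRserve connection (demo assumption)."""

    def __init__(self, known_results):
        self._known_results = known_results

    def r(self, expression):
        # Mimic R raising an error for an expression it cannot evaluate.
        if expression not in self._known_results:
            raise RuntimeError('R could not evaluate: ' + expression)
        return self._known_results[expression]


def safe_eval(connection, operation):
    """Return the result of `operation`, or None if the evaluation fails."""
    try:
        return connection.r(operation)
    except Exception:
        print('Could not retrieve {}!'.format(operation))
        return None


# Placeholder result for illustration only (not real ECoL output).
fake = FakeConnection({'balance(df_X, df_y)': [1.0, 0.0]})

ok_result = safe_eval(fake, 'balance(df_X, df_y)')
failed_result = safe_eval(fake, 'smoothness(df_X, df_y)')
print(ok_result, failed_result)
```

A failed evaluation therefore shows up as `None` in the collected metrics instead of interrupting the remaining calls.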

The metrics obtained are not altered at any point in the process. They are stored in a dictionary so that each value can be associated with its metric domain. Each key maps to a value that holds both the metrics and an associated message, which is used later when displaying the values:

```python
metrics.update (
    {'<metric_name>': [message, metric_value]}
)
```
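To illustrate why the message is stored next to the value, the following sketch builds two entries by hand and prints them. The numbers are made-up placeholders rather than ECoL output, and `format_metrics` is a hypothetical helper, not the printing routine of this project.

```python
def format_metrics(metrics):
    """Build one printable line per metric domain from [message, value] pairs."""
    lines = []
    for message, value in metrics.values():
        lines.append('{}{}'.format(message, value))
    return lines


# Placeholder values for illustration only (not real ECoL results).
metrics = {}
metrics.update({'balance': ['# Balance (C1, C2):\t', [0.5, 0.25]]})
metrics.update({'smoothness': ['# Smoothness (S1, S2, S3, S4):\t', None]})

for line in format_metrics(metrics):
    print(line)
```

Keeping the message with the value means the display code does not need a separate table of labels for each domain.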

The code representing this process is:


In [2]:
def get_metrics (self, X=None, Y=None):
    if X is None and Y is None :
        if self.__metrics is not None :
            return self.__metrics
        else :
            # No stored metrics and no data to compute them from.
            error_message = (
                'No metrics so far! Try giving the dataset vector and '
                'target vector as parameters.'
            )
            raise Exception(error_message)
    else :
        # Stores connection to R's RPC.
        connect = self.__connection

        # Sends the input matrix and the output vector to R.
        connect.r.X = X
        connect.r.y = Y
        
        # Converts the data to data frames and loads the ECoL library in R.
        connect.r('df_X <- as.data.frame(X)')
        connect.r('df_y <- as.data.frame(y)')
        connect.r('library("ECoL")')

        # Metrics: a dictionary gives fast access to its contents by name.
        metrics = {}

        # Balance
        balance = self.safe_connect('balance(df_X, df_y)') 
        message = '# Balance (C1, C2):\t'
        balance_dic_entry = { 'balance' : [message, balance] }
        
        metrics.update (balance_dic_entry)

        # Correlation
        correlation = self.safe_connect('correlation(df_X, df_y)')
        message = '# Correlation (C1, C2, C3, C4):\t'
        correlation_dic_entry = { 'correlation' : [message, correlation] } 

        metrics.update (correlation_dic_entry)

        # Dimensionality
        dimensionality = self.safe_connect('dimensionality(df_X, df_y)')
        message = '# Dimensionality (T2, T3, T4):'
        dimensionality_dic_entry = { 'dimensionality' : [message, dimensionality] }

        metrics.update (dimensionality_dic_entry)

        # Linearity
        linearity = self.safe_connect('linearity(df_X, df_y)')
        message = '# Linearity (L1, L2, L3):\t'
        linearity_dic_entry = { 'linearity' : [message, linearity] }

        metrics.update (linearity_dic_entry)

        # Neighborhood
        neighborhood = self.safe_connect('neighborhood(df_X, df_y)')
        message = '# Neighborhood (N1, N2, N3, N4, T1, LSC):\t'
        neighborhood_dic_entry = { 'neighborhood' : [message, neighborhood] }

        metrics.update (neighborhood_dic_entry)

        # Network
        network = self.safe_connect('network(df_X, df_y)')
        message = '# Network (Density, ClsCoef, Hubs):\t'
        network_dic_entry = { 'network' : [message, network] }

        metrics.update (network_dic_entry)

        # Overlap
        overlap = self.safe_connect('overlapping(df_X, df_y)')
        message = '# Overlap (F1, F1v, F2, F3, F4):\t'
        overlap_dic_entry = { 'overlap' : [message, overlap] }

        metrics.update (overlap_dic_entry)

        # Smoothness
        smoothness = self.safe_connect('smoothness(df_X, df_y)')
        message = '# Smoothness (S1, S2, S3, S4):\t'
        smoothness_dic_entry = { 'smoothness' : [message, smoothness] }

        metrics.update (smoothness_dic_entry)
        
        self.__metrics = metrics

        return metrics
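
Since every metric domain follows the same three steps (evaluate the R call, build the message, store the pair), the body of `get_metrics` could also be driven by a table. The following is a sketch of that alternative, not the code used here: `METRIC_CALLS`, `collect_metrics` and the `evaluate` parameter are hypothetical names, and `evaluate` stands in for `self.safe_connect` so that the example runs without R.

```python
# (dictionary key, R expression, message) for each ECoL metric domain,
# copied from the get_metrics method.
METRIC_CALLS = [
    ('balance', 'balance(df_X, df_y)', '# Balance (C1, C2):\t'),
    ('correlation', 'correlation(df_X, df_y)',
     '# Correlation (C1, C2, C3, C4):\t'),
    ('dimensionality', 'dimensionality(df_X, df_y)',
     '# Dimensionality (T2, T3, T4):'),
    ('linearity', 'linearity(df_X, df_y)', '# Linearity (L1, L2, L3):\t'),
    ('neighborhood', 'neighborhood(df_X, df_y)',
     '# Neighborhood (N1, N2, N3, N4, T1, LSC):\t'),
    ('network', 'network(df_X, df_y)',
     '# Network (Density, ClsCoef, Hubs):\t'),
    ('overlap', 'overlapping(df_X, df_y)',
     '# Overlap (F1, F1v, F2, F3, F4):\t'),
    ('smoothness', 'smoothness(df_X, df_y)',
     '# Smoothness (S1, S2, S3, S4):\t'),
]


def collect_metrics(evaluate):
    """Evaluate every R call and store [message, result] under its key."""
    metrics = {}
    for key, r_call, message in METRIC_CALLS:
        metrics[key] = [message, evaluate(r_call)]
    return metrics


# Stand-in evaluator: echoes the expression instead of calling R.
demo = collect_metrics(lambda r_call: 'result of ' + r_call)
print(demo['overlap'])
```

With this layout, adding or removing a metric domain only changes the table, not the loop.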

To test the implementation, a dataset is needed on which to try the connection. The `iris` dataset was chosen for this purpose.

The data is loaded and formatted so that it can be passed to R without raising an exception.

In [4]:
import csv

import numpy

# Global variables
DATASET_PATH = '../dataset/iris.csv'
DATASET = []

# Load the dataset rows into a list
with open (DATASET_PATH, 'r') as csv_file:
    # Skip header
    next (csv_file)
    # Iterator object to read the CSV
    csv_reader = csv.reader (
        csv_file, 
        delimiter=',', 
        quoting=csv.QUOTE_ALL
    )
    # Create array from CSV
    for row in csv_reader :
        DATASET.append (row)

def parse_dataset () : 
    ## Data
    # Input: every column but the last, converted to floats.
    # (numpy.float is no longer available in recent NumPy releases.)
    X = numpy.array (DATASET)
    X = X[:, 0 : -1].astype (float)
    # Target
    Y = numpy.array( [
        row[-1] for row in DATASET
    ] )

    return X, Y
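
The loading step can be tried on a few rows without the file on disk. The sketch below assumes a CSV layout of one header line, four numeric columns and a final class-name column; the header names, the three sample rows and the helper `split_features_and_target` are illustrative assumptions, not taken from `iris.csv` itself.

```python
import csv
import io

import numpy

# Assumed layout: header, four numeric features, class name in the last column.
SAMPLE_CSV = (
    'sepal_length,sepal_width,petal_length,petal_width,species\n'
    '5.1,3.5,1.4,0.2,setosa\n'
    '7.0,3.2,4.7,1.4,versicolor\n'
    '6.3,3.3,6.0,2.5,virginica\n'
)


def split_features_and_target(rows):
    """Split parsed CSV rows into a float feature matrix and a label vector."""
    table = numpy.array(rows)
    X = table[:, :-1].astype(float)  # every column but the last
    Y = table[:, -1]                 # class names, kept as strings
    return X, Y


csv_file = io.StringIO(SAMPLE_CSV)
next(csv_file)  # skip the header line
rows = list(csv.reader(csv_file, delimiter=','))

X, Y = split_features_and_target(rows)
print(X.shape, Y.tolist())
```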

Now the dataset can freely be used and passed to R.

In [None]:
if __name__ == '__main__':
    #Data
    inputs, target = parse_dataset ()

    # R does not take string values. So each class is translated into a 
    # numerical value.
    for row in range (len(target)) :
        if target[row] == 'setosa' :
            target[row] = 1
        elif target[row] == 'versicolor' :
            target[row] = 2
        elif target[row] == 'virginica' :
            target[row] = 3
        else :
            target[row] = 0
    
    # Connect to R
    connector = r_connect()
    # Compute and print metrics for dataset
    connector.get_print_metrics(inputs, target)
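
One detail of the label translation is worth checking: NumPy keeps the dtype of an existing array on item assignment, so writing `1` into an array of strings stores the string `'1'`. The sketch below shows this and one way to obtain integer codes instead. `CLASS_CODES` and `encode_labels` are hypothetical names introduced for the illustration, and which of the two representations the R side expects is not stated here, so this is not meant as a drop-in replacement.

```python
import numpy

# Same class-to-code mapping as in the cell above.
CLASS_CODES = {'setosa': 1, 'versicolor': 2, 'virginica': 3}


def encode_labels(labels, unknown_code=0):
    """Return an integer array with one code per class name."""
    return numpy.array(
        [CLASS_CODES.get(str(label), unknown_code) for label in labels],
        dtype=int,
    )


target = numpy.array(['setosa', 'virginica', 'other'])

# In-place assignment keeps the string dtype of the array.
in_place = target.copy()
in_place[0] = 1

# Building a new array yields genuinely numeric codes.
encoded = encode_labels(target)
print(in_place.dtype.kind, encoded.tolist())
```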

The returned results should look like this:
```text
=== Printing metrics ===

# Balance (C1, C2):      <TaggedList(C1=0.9999999999999998, C2=0.0)>
# Correlation (C1, C2, C3, C4):  None
# Dimensionality (T2, T3, T4): [0.02666667 0.01333333 0.5       ]
# Linearity (L1, L2, L3):        <TaggedList(L1=TaggedArray([0.00433569, 0.00750964], key=['mean', 'sd']), L2=TaggedArray([0.01333333, 0.02309401], key=['mean', 'sd']), L3=TaggedArray([0., 0.], key=['mean', 'sd']))>
# Neighborhood (N1, N2, N3, N4, T1, LSC):        <TaggedList(N1=0.10666666666666667, N2=TaggedArray([0.19739445, 0.14762821], key=['mean', 'sd']), N3=TaggedArray([0.06      , 0.23828244], key=['mean', 'sd']), N4=TaggedArray([0.01333333, 0.11508192], key=['mean', 'sd']), T1=TaggedArray([0.05555556, 0.09094996], key=['mean', 'sd']), LSC=0.8164)>
# Network (Density, ClsCoef, Hubs):      <TaggedList(Density=0.8340044742729307, ClsCoef=0.2652736191628974, Hubs=TaggedArray([0.83805083, 0.27533194], key=['mean', 'sd']))>
# Overlap (F1, F1v, F2, F3, F4):         <TaggedList(F1=TaggedArray([0.27981465, 0.26490069], key=['mean', 'sd']), F1v=TaggedArray([0.02677319, 0.03379179], key=['mean', 'sd']), F2=TaggedArray([0.00638177, 0.01105354], key=['mean', 'sd']), F3=TaggedArray([0.12333333, 0.2136196 ], key=['mean', 'sd']), F4=TaggedArray([0.04333333, 0.07505553], key=['mean', 'sd']))>
# Smoothness (S1, S2, S3, S4):   None
```