# Data Processing Assignment

## This notebook contains the documentation and solution to the large data processing class.

## Data Set Description : The dataset was created for the comparison and evaluation of hybrid indoor positioning methods. The dataset presented contains data from W-LAN and Bluetooth interfaces, and Magnetometer.

The measurements were recorded with a android application running on an android phone. 
The x, y, z coordinates were mapped onto the building by placing notes on the wall.
During a measurement, the room name, and coordinates were input by the measuring user.
The application sent a message to a server-side application which stored them in a SQL database.


### The dataset is made up of the following features:
* First 10 features are dense:
 - Measurement ID
 - Timestamp
 - Measurement Coordinates (input by hand, can have errors)
 - Symbolic Position (UUID+ Name of the room)
 
 
* Rest of the features Highly Sparse (11-65) are sparse. At every measurement the following values are recorded:
 - Features from 11 to 42 represent the buildings' WiFi Access Point Received Signal Strenght (RSSI). 
      - At every measurement, every reachable WiFI RSSI is measured.
      - If an acces point is unreachable, it is left as null.
 - Selected bluetooth devices are also recorded from feature 42-65. 
     - The recording strategy is the same as the WiFi RSSI.
    

The dataset can be inspected and downloaded from the following repository:
https://archive.ics.uci.edu/ml/datasets/Miskolc+IIS+Hybrid+IPS


## Used Models, Frameworks:


* Numpy for array data structure, and  miscellaneous mathematical functions.
<img src="https://numpy.org/images/logos/numpy.svg" width="100" height="100" /> 

* Pandas for the DataFrame and Series data structure, and statistical functions.
<img src="https://numfocus.org/wp-content/uploads/2016/07/pandas-logo-300.png" width="100" height="100" /> 

* Scikit-Learn for easy-to-use machine learning models and functions.
<img src="https://upload.wikimedia.org/wikipedia/commons/0/05/Scikit_learn_logo_small.svg" width="100" height="100" /> 
* DEAP for implementing a genetic algorithm model. 
<img src="https://deap.readthedocs.io/en/master/_images/deap_long.png" width="100" height="100" /> 
* MatPlotLib for plotting.
<img src="https://matplotlib.org/_static/logo2_compressed.svg" width="100" height="100" /> 


## Problem Description:

* There are no readily available Indoor Positioning Systems.
* Many approaches, one of the most widely used method uses WiFi RSSI Fingerprinting.
* Fingerprinting creates a radio signal map of the building.
  - Data mining techniques can be used to create a data set. 
  - Data set can be used for machine learning and data analysis.
* Try to learn inferences between 3d positions and WiFi signal strength with a neural network model.
  - Create Neural Network model using Hyperparameter training.
* Implement a neural network inversion method in order to predict positions based on measured signal strenghts.
    - Positions, and RSSI values are not bijective, therefore there is no clear inverse of the original function.
    - Intersect the inverted position surfaces to increase accuracy

## Solution Description :


* <a href='#section1'> Step One : Imports, and utility functions </a>
* <a href='#section2'> Step Two :  Dataset separation, scaling and simple filtering </a>
* <a href='#section3'>Step Three: Hyperparameter Training and Artificial Neural Network Model Building.</a>
* <a href='#section4'> Step Four: Genetic Algorithm.  </a>



<a id='section1'>   </a>
# Step One : Imports, and utility functions

The following block imports all the necessary libraries for the task.
the following lines increase the amound of displayed rows and columns in the columns.

    pd.set_option('display.max_rows', 500)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 1000)


In [1]:
import pandas as pd
import pickle
import sklearn as sk
import numpy as np
import math
import matplotlib
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import preprocessing
import random
from deap import base
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [2]:
dataset = pd.read_csv("dataset.csv", sep=";")
dataset.head()

Unnamed: 0,measId,measTimestamp,Position X,Position Y,Position Z,zoneId,Zonename,meas X,meas Y,meas Z,gpsLatitude,gpsLongitude,gpsAltitude,N,109.0,AIT-L15,aut-sams-1,bolyai_E4_floor3,Bosch_Telemetry,dd,doa2,doa200,doa203,doa207,doa208,doa6,EET_3,FRM,GEIAKFSZ,IITAP1,IITAP1-GUEST,IITAP2,IITAP2-GUEST,IITAP3,IITAP3-GUEST,info,info2,KEMA10,kemA4,KRZ,library114,TP-LINK_B2765A,UPC Wi-Free,UPC8902044,wireless,00:16:53:4C:B1:F9,00:16:53:4C:B2:02,00:16:53:4C:B4:EB,00:16:53:4C:E9:1D,00:16:53:4C:F2:6A,00:16:53:4C:F5:2D,00:16:53:4C:F9:A4,00:16:53:4C:FA:60,00:16:53:4C:FA:67,48:5A:B6:54:35:DC,6B:C2:26:12:62:60,DANI 6B:C2:26:12:62:60,DM06082 48:5A:B6:54:35:DC,EV3 00:16:53:4C:E9:1D,EV3 00:16:53:4C:F2:6A,EV3 00:16:53:4C:FA:60,EV3 00:16:53:4C:FA:67,EV3BD 00:16:53:4C:F5:2D,IZE 00:16:53:4C:B1:F9,JOE 00:16:53:4C:F9:A4,MEGAROBOT 00:16:53:4C:B2:02,MrEv3 00:16:53:4C:B4:EB
0,04550b4e-b5fc-4665-a043-18e82e94399a,2016-02-27 16:40:22.0,8.0,8.0,4.4,07a25de0-a013-486d-9463-404a348e05ee,1st Floor East Corridor,0.11991,-0.93265,0.049958,,,,,-70.0,,-80.0,,,-83.0,,,-82.0,-74.0,-53.0,-83.0,,,-63.0,-55.0,-57.0,-78.0,-79.0,-74.0,-73.0,,,,,-65.0,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0
1,93645336-eb2d-4983-950f-cb0e4191b986,2016-02-27 16:43:13.0,12.0,9.0,4.4,07a25de0-a013-486d-9463-404a348e05ee,1st Floor East Corridor,1.275436,-1.024375,0.058756,,,,,-69.0,,-82.0,,,-75.0,,,,-82.0,-68.0,,,,-65.0,-75.0,-81.0,-79.0,-80.0,-73.0,-73.0,,,,,-72.0,,-85.0,,,,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,7f210fdb-7018-454c-978b-9a595ee39130,2016-02-27 16:50:48.0,23.0,8.0,4.4,07a25de0-a013-486d-9463-404a348e05ee,1st Floor East Corridor,-0.506722,-0.96453,0.055499,,,,,-65.0,,-79.0,,,-81.0,,-85.0,,,-75.0,,,,-79.0,-74.0,-82.0,-82.0,-81.0,-86.0,-86.0,,,,,-85.0,-86.0,-72.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0
3,d9b0c804-5886-4b69-a65a-0ea9a423f5df,2016-02-27 16:53:22.0,26.0,9.0,4.4,07a25de0-a013-486d-9463-404a348e05ee,1st Floor East Corridor,0.557698,-0.950144,0.12751,,,,,-75.0,,-66.0,,,-83.0,-78.0,-75.0,,,-75.0,,,,-84.0,,,-66.0,-64.0,-85.0,-85.0,,,,,-89.0,,-85.0,,,,1.0,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,d1bdb07f-23c9-4d73-a2c5-e24642778075,2016-02-27 16:51:31.0,24.0,8.0,4.4,07a25de0-a013-486d-9463-404a348e05ee,1st Floor East Corridor,-0.462596,-0.717116,0.083141,,,,,-76.0,,-83.0,,,-86.0,,-82.0,,,-82.0,,,,,-84.0,-81.0,-80.0,-82.0,-86.0,-87.0,,,,,-81.0,,-89.0,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0


## The following methods are used to create synthetic features. 

### The added features are the spherical coordinates of x, y and z, the products of the x and y coordinates as well as the products of x, y and z.

### These features are added in hopes of better conveying the real relations between coordinates and measurements in the data set.

In [3]:
def _calculate_spherical_coordinates(dataset) : 
    r=dataset["Position X"]**2+dataset["Position Y"]**2+dataset["Position Z"]**2
    r=np.sqrt(r)
    tetha=dataset["Position Y"]/r
    tetha=np.arccos(tetha)
    phi=dataset["Position Y"]/dataset["Position X"]
    phi=np.tanh(phi)
    return (r,tetha,phi)

def create_synthetic_features(dataset):
    x_y=dataset["Position X"]*dataset["Position Y"]
    x_y_z=dataset["Position X"]*dataset["Position Y"]*dataset["Position Z"]
    (r,tetha,phi)=_calculate_spherical_coordinates(dataset)
    synthetic= pd.DataFrame()
    synthetic["x_y"]=x_y
    synthetic["x_y_z"]=x_y_z
    synthetic["r"]=r
    synthetic["tetha"]=tetha
    synthetic["phi"]=phi
    return(synthetic)

In [4]:
def get_AP_dataframe(selected_features, AP_name):
    AP_df=selected_features.iloc[:,0:8]
    AP_df[AP_name]=selected_features[AP_name]
    AP_df= AP_df[pd.notnull(AP_df[AP_name])]
    return AP_df
    
def get_AP_scaler(AP_df):
    scaler=preprocessing.StandardScaler()
    scaler.fit(AP_df)
    return scaler

## The following methods are used to create the dataset from the original data. 

### The coordinates, and the synthetic features are added along with the WiFi RSSi measurements.

In [5]:
selected_features= dataset.iloc[:,14:45]
selected_features.insert(0,'pos_x', dataset["Position X"])
selected_features.insert(1,'pos_y', dataset["Position Y"])
selected_features.insert(2,'pos_z', dataset["Position Z"])
selected_features[selected_features.pos_z != 0]
synthetic_features=create_synthetic_features(dataset)
selected_features.insert(3, "x_y", synthetic_features["x_y"])
selected_features.insert(4, "x_y_z", synthetic_features["x_y_z"])
selected_features.insert(5, "r", synthetic_features["r"])
selected_features.insert(6, "tetha", synthetic_features["tetha"])
selected_features.insert(7, "phi", synthetic_features["phi"])

<a id='section2'>   </a>
# Step Two: Dataset separation, scaling and simple filtering

### After the DataFrame has been composed from the selected features, the problem of sparse data has to be solved.
 * Feed Forward Artificial Neural Networks does not handle sparse data well
      - The data has to be converted into dense a form.
      - NaN-s cannot be replaced with arbitrary values, as these would alter the outcome of the training and predictions.
 * Solution: 
      - Create dense DataFrames for each Access Point.
      - df_list contains dataframes of coordinates which are the training data.
      - target_list contains dataframes with associated target RSSI values.
      - all dataframes are scaled individually.

In [7]:
df_list=list()
scaler_list=list()
target_list=list()
df_list_unscaled=list()
i=0
for index, item in enumerate(selected_features.columns):
    #Crude but works
    if index>7:
        df_list.append(get_AP_dataframe(selected_features, AP_name=item))
        df_list_unscaled.append(get_AP_dataframe(selected_features, AP_name=item))
        scaler_list.append(get_AP_scaler(df_list[i]))
        df_list[i][:]=scaler_list[i].transform(df_list[i][:])
        target_list.append(df_list[i].pop(df_list[i].columns[-1]))
        i=i+1
with open('before.txt', 'w') as f:
    for dataframe in df_list:
       print(dataframe.describe(), file=f)

### After the DataFrame lists have been created, WiFi Access Points which have less than hundred entries in the original data set are deleted.

In [8]:
for _ in range(0, 2):
    for index,dataframe in enumerate(df_list):
        if(dataframe.size < 100):
            print(index)
            del df_list[index]
            del target_list[index]
            del scaler_list[index]
            del df_list_unscaled[index]
with open('after.txt', 'w') as f:
    for dataframe in df_list:
        print(dataframe.describe(), file=f)

3
11
21
25
21
24


<a id='section3'>   </a>
# Step Three: Hyperparameter Training and Artificial Neural Network Model Building.

### The created DataFrames can be used to create simple Feed Forward Networks in order to train the inferences between positions and WiFi RSSI signals.

# Takes a long time to compute!

In [None]:
testDataFrame= df_list[0].copy();
target=testDataFrame.pop('109.0')
x_train, x_test, y_train, y_test= train_test_split(testDataFrame, target)

parameter_space = {
    'hidden_layer_sizes': [(50, 100, 200, 100, 50), (200, 400, 800, 400, 200), (1000, 2000, 1000)],
    'activation': ['logistic', 'tanh', 'relu'],
    'solver': ['lbfgs','sgd', 'adam'],
    'alpha': [0.03, 0.01, 0.003, 0.001, 0.0001],
    'learning_rate_init': [0.03, 0.01, 0.003, 0.001, 0.0001],
}

clf= GridSearchCV(MLPRegressor(max_iter=1000, verbose=True, learning_rate="adaptive"), parameter_space, n_jobs=-1, cv=3, verbose=True)
print("Starting GridSearch Fit")
clf.fit(x_train, y_train)


The hyperparameter training has been run with numerous parameters (not just the ones seen on the code block above.).

Performance statistics have been collected in order to analyze the results.
The resulting data can be seen in the gridstatistics.ipnb file.

Decision trees, and apriori algorithm were used in order to determine important parameters, and parameter pairs.

## A list of Feed Forward models are trained with individual data sets.
### Hyperparameter testing was done only on the first dataset. 

TODO :Models with high testing loss would have to be retrained with different parameters.

In [None]:
def create_ANN_list():
    ANN_List=[]
    for (testDataFrame, target) in zip(df_list,target_list):
        x_train, x_test, y_train, y_test= train_test_split(testDataFrame, target)
        model= MLPRegressor(activation='relu', alpha=0.001, batch_size='auto', beta_1=0.9,
               beta_2=0.999, early_stopping=False, epsilon=1e-08,
               hidden_layer_sizes=(200, 300, 400, 300, 200), learning_rate='adaptive',
               learning_rate_init=0.0001, max_iter=5000, momentum=0.5,
               n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
               random_state=None, shuffle=True, solver='adam', tol=0.0001,
               validation_fraction=0.1, verbose=False, warm_start=False)
        model.fit(x_train, y_train)
        ANN_List.append(model)
        print(model.score(x_test, y_test))
    return ANN_List

In [None]:
model_list=create_ANN_list()
with open("model_list.txt", "wb") as fp:
    pickle.dump(model_list, fp)

<a id='section4'>   </a>
# Step Four: Genetic Algorithm.

## The following block contains a class for neural network inversion with a genetic algorithm. 
### The genetic algorithm structure was constructed using the DEAP framework.

* The creator function creates individuals based on the generate_individual() function.

* The evaluation function calculates the error between individuals in the generation and the target values.

* generate_valid_pop() evaluates individuals, and creates a new generation.

In [None]:
class GA_Inverter():
    
    def __init__(self, index, toolbox, ind_size, pop_size, elite_count):
        self.creator=creator
        self.index=index
        self.toolbox=toolbox
        self.IND_SIZE=ind_size
        self.POP_SIZE=pop_size
        self.elite_count=elite_count
        
    def creator_function(self):
        return self.creator.Individual(self.generate_individual())


    def generate_individual(self):
        x= random.randint(math.floor(df_list_unscaled[self.index].min()[0]),math.floor(df_list_unscaled[self.index].max()[0]))
        y = random.randint(math.floor(df_list_unscaled[self.index].min()[1]),math.floor(df_list_unscaled[self.index].max()[1]))
        z = random.randint(math.floor(df_list_unscaled[self.index].min()[2]),math.floor(df_list_unscaled[self.index].max()[2]))
        x_y=x*y
        x_y_z=x*y*z
        r=x**2+y**2+z**2
        r=np.sqrt(r)
        if r is not 0:
            tetha=y/r
        else:
             tetha=0
        tetha=np.arccos(tetha)
        if(x is not 0):
            phi=y/x
        else:
            phi=0
        phi=np.tanh(phi)
        return scaler_list[self.index].transform([[x, y, z, x_y, x_y_z, r, tetha, phi,0]]).tolist()[0][:-1]


    def evaluate(self,individual, regressor, y_pred):
        d=(((regressor.predict(np.asarray(individual).reshape(1, -1))-y_pred)**2).sum(),)
        return d

    def initialize_invertion_functions(self):
        self.creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
        self.creator.create("Individual", list, fitness=creator.FitnessMin)
        #toolbox.register("attr_float", random.random())
        self.toolbox.register("population", tools.initRepeat, list,  self.creator_function, n=self.POP_SIZE )
        self.pop = self.toolbox.population()
        self.toolbox.register("mate", tools.cxTwoPoint)
        self.toolbox.register("mutate", tools.mutShuffleIndexes)
        self.toolbox.register("selectWorst", tools.selWorst)
        self.toolbox.register("selectBest", tools.selBest)
        self.toolbox.register("evaluate", self.evaluate)


        
    def generate_valid_pop(self, index, y_predict, model, scaler, CXPB, MUTPB, NGEN, DESIRED_OUTPUT, OUTPUT_TOLERANCE, ELIT_CNT = 10):
    
        fitnesses=list()
        # evaluation
        for individual in self.pop:
            #TODO USE SCALER LIST [INDEX], IT DOESNT WORK FOR SOME REASON
            temp=self.toolbox.evaluate(individual, model, scaler.transform([[0,0,0,0,0,0,0,0,DESIRED_OUTPUT]] )[0][8] )
            fitnesses.append(temp)
        for ind, fit in zip(self.pop, fitnesses):
              ind.fitness.values = fit

        for g in range(NGEN):

            elits = self.toolbox.selectBest(self.pop,k= ELIT_CNT)
            elits = list(map(self.toolbox.clone, elits))
            offsprings = self.toolbox.selectWorst(self.pop,k= self.POP_SIZE - ELIT_CNT)

            offsprings = list(map(self.toolbox.clone, offsprings))

            sumFitness = 0
            for ind in self.pop:
                sumFitness = sumFitness+ ind.fitness.values[0]

            for offspring in offsprings:
                    parents = tools.selRoulette(self.pop,2)
                    parents = list(map(self.toolbox.clone, parents))
                    offspring = tools.cxTwoPoint(parents[0],parents[1])[0]
                    del offspring.fitness.values

                # Evaluate the individuals with an invalid fitness
            invalid_ind = [ind for ind in offsprings if not ind.fitness.valid]
            fitnesses=list()
            for index, individual in enumerate(invalid_ind):
                fitnesses.append( self.toolbox.evaluate(individual, model, y_predict[index]))
            for ind, fit in zip(invalid_ind, fitnesses):
                  ind.fitness.values = fit

            for i in range(ELIT_CNT):
                self.pop[i] = elits[i]
            for i in range(self.POP_SIZE - ELIT_CNT):
                self.pop[i+ELIT_CNT] = offsprings[i]

            for index, individual in enumerate(self.pop):
                temp=self.toolbox.evaluate(individual, model, y_predict[index])
            fitnesses.append(temp)
            for ind, fit in zip(self.pop, fitnesses):
                  ind.fitness.values = fit

        return [ind for ind in self.pop if ind.fitness.values[0] <2]




### The following function returns dictionaries of AP names and RSSI values, based on measurement time. 

Every dictionary corresponds to a measurement time, denoted by the index in the original data set.

In [None]:
def get_inputs_by_time(selected_features, df_list_unscaled)
    list_of_inputs=[]
        for index in selected_features.index:
            inputs_list_by_time={}
            for df in df_list_unscaled:
                df_mod=pd.DataFrame(df.iloc[:, -1])
                for i in range(df_mod.count()[0]):
                    if df_mod.index[i] == index:
                        inputs_list_by_time.update({df_mod.columns[0]:df_mod.iloc[0,0]})
            list_of_inputs.append(inputs_list_by_time)
    return list_of_inputs

## Function to invert the trained neural network


In [None]:

def invert_all(df_list, target_list, model_list, scaler_list,CXPB, MUTPB, NGEN, DESIRED_OUTPUT, OUTPUT_TOLERANCE):
        inverted_list=[]
        for index,(testDataFrame, target) in enumerate(zip(df_list,target_list)):
            print(index)
            x_train, x_test, y_train, y_test= train_test_split(testDataFrame, target)
            inverter=GA_Inverter(0, base.Toolbox(), x_test.iloc[0].size, len(x_test.index), 10)
            inverter.initialize_invertion_functions()
            y_pred=model_list[index].predict(x_test)
            valid_pop=inverter.generate_valid_pop(index, y_pred, model_list[index], scaler_list[index], CXPB, MUTPB, NGEN, DESIRED_OUTPUT, OUTPUT_TOLERANCE)
            dataset_inverted=df_list[index].copy();
            dataset_original=df_list_unscaled[index].copy().values.tolist();
            dataset_original_df=df_list_unscaled[index].copy()
            dataset_inverted.drop(dataset_inverted.index, inplace=True)
            for ind, row in enumerate(valid_pop):
                dataset_inverted.loc[ind] = valid_pop[ind]
            dataset_inverted['target']= pd.Series(target_list[index])
            dataset_inverted=scaler_list[index].inverse_transform(dataset_inverted)
            inverted_list.append(dataset_inverted)
        return inverted_list

## Function to predict coordinates based on the inverted values.

The prediction takes a weighted sum of the predicted positions of every dictionary in the list to predict a position.
Every invertion coordinate prediction creates a 3D manifold surface of possible positions in the 3D space. 
The intersection of these predicted coordinates give the most accurate predicted coordinates.


  $\bigcap \mathbb{I} (\pi, \epsilon)$, where  $\mathbb{I}$  is the invertion function  $ \mathbb{I}: R^n \rightarrow \mathbb{N}^3$, and $\pi$ is the WiFi RSSI Signal Strenght.
  $\mathbb{I}$

There are other methods which could be used to calculate the intersection of these coordinates, but currently only this method is implemented.

In [None]:
def predict_coordinates(inverted_positions):
    gen_x_coord=[]
    gen_y_coord=[]
    for  values in  inverted_positions.values():
        for g_val in values:
            gen_x_coord.append(g_val[0])
            gen_y_coord.append(g_val[1])
    gen_x_coord=pd.Series(gen_x_coord)
    gen_y_coord=pd.Series(gen_y_coord)
    return(np.average(gen_x_coord[gen_x_coord < np.max(selected_features["pos_x"])]),np.average(gen_y_coord[gen_y_coord < np.max(selected_features["pos_y"])]))

## Error calculation function

In [None]:
def calculate_error(predicted_cooridnates, actual_coordinates):
    return np.array(((predicted_cooridnates - actual_coordinates) ** 2)).mean(axis=-1)

## Driver code

In [None]:
list_of_inputs=get_inputs_by_time(selected_features, df_list_unscaled)

In [None]:
CXPB, MUTPB, NGEN = 0.5, 0.1, 1000
DESIRED_OUTPUT= -80
OUTPUT_TOLERANCE= 2
output_list=get_output_list(list_of_inputs,CXPB, MUTPB, NGEN, OUTPUT_TOLERANCE)

In [None]:
error_list=[]
for invereted_positions in inverted_positions_list:
    predicted_cooridnates=np.array(predict_coordinates(inverted_positions))
    calculate_error(predicted_cooridnates, actual_coordinates)
    error_list.append(calculate_error(predicted_cooridnates, actual_coordinates))

## Plotting Function for the visualization of the invertion function results.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

def plot_inverted(dataset_unscaled, dataset_inverted):
    dataset_original=dataset_unscaled.copy().values.tolist();
    dataset_original_df=dataset_unscaled.copy()
    fig, (ax1, ax2, ax3) =plt.subplots(ncols=3, nrows=1, figsize= ( 20, 6))
    x_number_list_o = [values[0] for values in dataset_original ]
    # y axis value list.
    y_number_list_o = [values[1] for values in dataset_original ]
        # Draw point based on above x, y axis values.
    ax1.scatter(x_number_list_o, y_number_list_o)
    ax1.set_xlim([0-5, dataset["Position X"].max()+5])
    ax1.set_ylim([0-5, dataset["Position Y"].max()+5])
    # Set chart title.
    ax1.title.set_text("Original coordinates of the dataset") 
    # Set x, y label text.
    ax1.set_xlabel("X")
    ax1.set_ylabel("Y")
    x_number_list = [values[0] for values in dataset_original if values[8] > DESIRED_OUTPUT - OUTPUT_TOLERANCE and values[8] < DESIRED_OUTPUT +OUTPUT_TOLERANCE]
    # y axis value list.
    y_number_list = [values[1] for values in dataset_original if values[8] > DESIRED_OUTPUT - OUTPUT_TOLERANCE and values[8] < DESIRED_OUTPUT +OUTPUT_TOLERANCE]
        # Draw point based on above x, y axis values.
    ax2.scatter(x_number_list, y_number_list)
    ax2.set_xlim([0-5, dataset["Position X"].max()+5])
    ax2.set_ylim([0-5, dataset["Position Y"].max()+5])
    # Set chart title.
    ax2.title.set_text("Original coordinates reduced by currently detected WiFi RSSI") 
    # Set x, y label text.
    ax2.set_xlabel("X")
    ax2.set_ylabel("Y")
    x_number_list = [values[0] for values in dataset_inverted ]
    # y axis value list.
    y_number_list = [values[1] for values in dataset_inverted ]
    ax3.scatter(x_number_list, y_number_list, color="r" )
    ax3.set_xlim([0-5, dataset["Position X"].max()+5])
    ax3.set_ylim([0-5, dataset["Position Y"].max()+5])
    # Set chart title.
    ax3.title.set_text("Inverted coordinates by genetic algorithm")
    # Set x, y label text.
    ax3.set_xlabel("X")
    ax3.set_ylabel("Y")
    plt.savefig('coordinatesinverted.pdf')
    plt.show()

In [17]:
from IPython.display import IFrame    
display(IFrame("1cinvert1.pdf", width=900, height=650))

In [12]:
display(IFrame("1cinvert2.pdf", width=900, height=650))

ImportError: MagickWand shared library not found.
You probably had not installed ImageMagick library.
Try to install:
  http://docs.wand-py.org/en/latest/guide/install.html#install-imagemagick-on-windows

In [9]:
display(IFrame("1cinvert3.pdf", width=900, height=650))

In [None]:
display(IFrame("2cinvert1.pdf", width=900, height=650))

In [None]:
display(IFrame("2cinvert2.pdf", width=900, height=650))

In [None]:
display(IFrame("2cinvert3.pdf", width=900, height=650))