# Dataset Visualization

This notebook shows some of the most commonly used mechanisms for data visualization.

## Dataset

### Description
NSL-KDD is a data set proposed to solve some of the inherent problems of the KDD'99 data set. Although this new version of the KDD data set still suffers from some of the problems discussed by McHugh and may not be a perfect representative of existing real networks, we believe it can still be applied as an effective benchmark data set to help researchers compare different intrusion detection methods, given the lack of public data sets for network-based IDSs. Furthermore, the number of records in the NSL-KDD train and test sets is reasonable. This makes it affordable to run experiments on the complete set without having to randomly select a small portion; consequently, the evaluation results of different research works will be consistent and comparable.

## Data files
* <span style="color:green">**KDDTrain+.ARFF**: The full NSL-KDD train set with binary labels in ARFF format</span>
* <span style="color:green">**KDDTrain+.TXT**: The full NSL-KDD train set including attack-type labels and difficulty level in CSV format</span>
* KDDTrain+_20Percent.ARFF: A 20% subset of the KDDTrain+.arff file
* KDDTrain+_20Percent.TXT: A 20% subset of the KDDTrain+.txt file
* KDDTest+.ARFF: The full NSL-KDD test set with binary labels in ARFF format
* KDDTest+.TXT: The full NSL-KDD test set including attack-type labels and difficulty level in CSV format
* KDDTest-21.ARFF: A subset of the KDDTest+.arff file which does not include records with difficulty level of 21 out of 21
* KDDTest-21.TXT: A subset of the KDDTest+.txt file which does not include records with difficulty level of 21 out of 21
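The KDDTest-21 entries describe a filter over KDDTest+: every record whose difficulty level is 21 out of 21 is dropped. A minimal sketch of that filter on the CSV (.TXT) version, assuming the difficulty level is the last field of each record (the .TXT files are described as including it); `drop_max_difficulty` is a hypothetical helper, not part of any NSL-KDD tooling, and this only illustrates the description rather than how the official subset was built.

```python
import csv

def drop_max_difficulty(lines, max_level=21):
    """Keep only the records whose difficulty level (assumed last field) != max_level."""
    kept = []
    for record in csv.reader(lines):
        if record and int(record[-1]) != max_level:  # skip blank lines and level-21 records
            kept.append(record)
    return kept

# Usage:
# with open("datasets/datasets/NSL-KDD/KDDTest+.txt", encoding="utf-8") as f:
#     harder_records = drop_max_difficulty(f)
```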

```python
# Read the dataset with built-in Python functions
with open("datasets/datasets/NSL-KDD/KDDTrain+.txt") as train_set:
    lines = train_set.readlines()  # list of strings, one per record
```
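Each line read this way is a comma-separated record. A minimal sketch that parses such lines with the standard `csv` module and counts the records per attack type, assuming (as the KDDTrain+.TXT description says) that each record ends with the attack-type label followed by the difficulty level; `count_labels` is a hypothetical name.

```python
import csv
from collections import Counter

def count_labels(lines):
    """Count records per attack-type label.

    Assumes the label is the second-to-last field and the
    difficulty level the last one.
    """
    counts = Counter()
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        counts[record[-2]] += 1
    return counts

# Usage:
# with open("datasets/datasets/NSL-KDD/KDDTrain+.txt", encoding="utf-8") as f:
#     print(count_labels(f.readlines()).most_common(5))
```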
    


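```python
import pandas as pd

# KDDTrain+.txt has no header row: by default pandas takes the first record as
# column names (see the output header below); pass header=None to keep it as data.
df = pd.read_csv("datasets/datasets/NSL-KDD/KDDTrain+.txt")
df
```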

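|  | 0 | tcp | ftp_data | SF | 491 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | ... | 0.17 | 0.03 | 0.17.1 | 0.00.6 | 0.00.7 | 0.00.8 | 0.05 | 0.00.9 | normal | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | udp | other | SF | 146 | 0 | 0 | 0 | 0 | 0 | ... | 0.00 | 0.60 | 0.88 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | normal | 15 |
| 1 | 0 | tcp | private | S0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.10 | 0.05 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | neptune | 19 |
| 2 | 0 | tcp | http | SF | 232 | 8153 | 0 | 0 | 0 | 0 | ... | 1.00 | 0.00 | 0.03 | 0.04 | 0.03 | 0.01 | 0.00 | 0.01 | normal | 21 |
| 3 | 0 | tcp | http | SF | 199 | 420 | 0 | 0 | 0 | 0 | ... | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | normal | 21 |
| 4 | 0 | tcp | private | REJ | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.07 | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | neptune | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 125967 | 0 | tcp | private | S0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.10 | 0.06 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | neptune | 20 |
| 125968 | 8 | udp | private | SF | 105 | 145 | 0 | 0 | 0 | 0 | ... | 0.96 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | normal | 21 |
| 125969 | 0 | tcp | smtp | SF | 2231 | 384 | 0 | 0 | 0 | 0 | ... | 0.12 | 0.06 | 0.00 | 0.00 | 0.72 | 0.00 | 0.01 | 0.00 | normal | 18 |
| 125970 | 0 | tcp | klogin | S0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0.03 | 0.05 | 0.00 | 0.00 | 1.00 | 1.00 | 0.00 | 0.00 | neptune | 20 |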
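Because KDDTrain+.txt has no header row, `pd.read_csv` with its default `header="infer"` turns the first record into column names; that is where labels such as `0`, `tcp`, `ftp_data`, `491` come from, with pandas appending `.1`, `.2`, ... to duplicated names. A minimal sketch of a read that keeps every record as data by passing `header=None` and explicit `names`. The 41 feature names are the usual KDD Cup '99 feature names and `label`/`difficulty` are our own choice for the last two columns; treat the list as an assumption and check it against the `@attribute` lines of KDDTrain+.arff. `load_nsl_kdd` is a hypothetical helper.

```python
import pandas as pd

# Assumed column names: the 41 KDD Cup '99 features, then the attack-type
# label and the difficulty level (these two names are chosen for illustration).
FEATURES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
    "dst_host_srv_count", "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
COLUMNS = FEATURES + ["label", "difficulty"]

def load_nsl_kdd(path_or_buffer):
    """Read an NSL-KDD .txt file, which has no header row."""
    return pd.read_csv(path_or_buffer, header=None, names=COLUMNS)

# Usage:
# df = load_nsl_kdd("datasets/datasets/NSL-KDD/KDDTrain+.txt")
# df["label"].value_counts()
```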


```python
# List the files in the dataset directory
import os
os.listdir("datasets/datasets/NSL-KDD/")
```

```
['KDDTest-21.txt',
 'index.html',
 'KDDTest-21.arff',
 'KDDTrain+.arff~',
 'KDDTest1.jpg',
 'KDDTrain+.arff',
 'KDDTest+.arff',
 'KDDTest+.txt',
 'KDDTrain+_20Percent.arff',
 'KDDTrain1.jpg',
 'KDDTrain+_20Percent.txt',
 'KDDTrain+.txt']
```
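The directory also holds an editor backup (`KDDTrain+.arff~`), an `index.html` page and two images, and `os.listdir` returns entries in arbitrary order. A minimal sketch using `pathlib` that keeps only the `.txt` and `.arff` data files, sorted by name; `data_files` is a hypothetical helper and the suffix list is an assumption based on the file list above.

```python
from pathlib import Path

def data_files(directory, suffixes=(".txt", ".arff")):
    """Return the sorted names of files in `directory` with one of `suffixes`.

    'KDDTrain+.arff~', 'index.html' and the .jpg images are excluded because
    their suffixes ('.arff~', '.html', '.jpg') are not in the list.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )

# Usage:
# data_files("datasets/datasets/NSL-KDD/")
```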

An **ARFF** (Attribute-Relation File Format) file is an **ASCII** text file that describes a list of instances sharing a set of attributes. **ARFF** files were developed by the Machine Learning Project at the Department of Computer Science of the University of Waikato for use with the Weka machine learning software. More information: https://www.cs.waikato.ac.nz/ml/weka/arff.html
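An ARFF file starts with a header (a `@relation` line and one `@attribute` line per column, either numeric or a nominal set of values in braces) followed by a `@data` section with one comma-separated instance per line; lines starting with `%` are comments. A minimal hand-written parser, only as a sketch of that layout: it assumes single-token attribute names (optionally quoted) and does not handle quoted values containing commas, sparse data or dates, which a library such as liac-arff does. The `SAMPLE` text is invented for illustration and is not taken from NSL-KDD.

```python
def parse_arff(text):
    """Parse a simple dense ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append(line.split(","))  # one instance per line
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1].strip("'\"")
        elif keyword == "@attribute":
            _, name, type_ = line.split(None, 2)  # name, then the rest is the type
            attributes.append((name.strip("'\""), type_))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

SAMPLE = """% toy example, not taken from NSL-KDD
@relation 'toy'
@attribute 'duration' real
@attribute 'protocol_type' {'tcp','udp','icmp'}
@attribute 'class' {'normal','anomaly'}
@data
0,tcp,normal
2,udp,anomaly
"""

relation, attributes, rows = parse_arff(SAMPLE)
```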

```python
# Install a new package into the current Jupyter notebook kernel to parse ARFF files
import sys

!{sys.executable} -m pip install liac-arff
```

```
Collecting liac-arff
  Downloading liac-arff-2.5.0.tar.gz (13 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: liac-arff
  Building wheel for liac-arff (setup.py) ... done
  Created wheel for liac-arff: filename=liac_arff-2.5.0-py3-none-any.whl size=11717 sha256=82249d2a9b5342581b52985ef83a2af072f0249ab6e2706937612c715ac9e384
  Stored in directory: /home/pako0311/.cache/pip/wheels/08/82/8b/5c514221984e88c059b94e36a71d4722e590acaae04deab22e
Successfully built liac-arff
Installing collected packages: liac-arff
Successfully installed liac-arff-2.5.0
```


```python
# Read the dataset stored in .ARFF format (the liac-arff package is imported as `arff`)
import arff

with open("datasets/datasets/NSL-KDD/KDDTrain+.arff", "r") as train_set:
    dataset = arff.load(train_set)  # a dict, not a DataFrame
dataset.keys()
```

```
dict_keys(['description', 'relation', 'attributes', 'data'])
```
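The object returned by `arff.load` is a plain dict rather than a DataFrame. A minimal sketch that turns it into a pandas DataFrame, assuming the dense layout liac-arff uses by default: `"attributes"` is a list of `(name, type)` pairs and `"data"` a list of rows in the same column order. `arff_to_dataframe` is a hypothetical helper.

```python
import pandas as pd

def arff_to_dataframe(dataset):
    """Build a DataFrame from the dict returned by arff.load().

    Assumes dataset["attributes"] is a list of (name, type) pairs and
    dataset["data"] a list of rows (dense format).
    """
    columns = [name for name, _ in dataset["attributes"]]
    return pd.DataFrame(dataset["data"], columns=columns)

# Usage:
# import arff
# with open("datasets/datasets/NSL-KDD/KDDTrain+.arff") as f:
#     train_df = arff_to_dataframe(arff.load(f))
# train_df.head()
```

Unlike the CSV version, the column names here come from the file's own `@attribute` declarations, so no header workaround is needed.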