# Práctica de Sistemas de Bases de Datos

El fichero necesario "deforestationSubSahara.csv" y éste notebook de jupyter "Practica.ipynb" deben encontrarse en el mismo directorio. Para que funcione correctamente se debe abrir un terminal en ese directorio o navegar hasta él desde un terminal y lanzar el comando /home/cloudera/anaconda2/bin/jupyter notebook.

## Ejercicio 1

#### El primer ejercicio consiste en crear una tabla interna o externa para almacenar los datos. Se debe justificar la elección.

Se ha elegido crear una interna, ya que pese a no aprovechar las ventajas que ofrece una tabla externa (no se elimina al subir los datos, se puede integrar mejor en el entorno de aplicaciones de Hive, se pueden cambiar dinámicamente los datos...), se garantiza la integridad de los datos y no se persigue más que un ejemplo sencillo para ésta práctica. 

Primero crearemos una carpeta donde meter los archivos que se generen

In [1]:
! pwd

/home/cloudera/Desktop/Practica


In [2]:
! ls -l

total 556
-rwxrwx--- 1 cloudera cloudera 547153 Jul  2 01:19 deforestationSubSahara.csv
-rwxrwx--- 1 cloudera cloudera  17331 Oct 29 23:47 Practica.ipynb


In [3]:
! mkdir -p Ficheros

Ahora, crearemos la carpeta para el usuario local en HDFS. Luego crearemos un fichero de texto con las órdenes de creación de una nueva base de datos. Para ello utilizaremos la directiva %%writefile nombreFichero

In [4]:
! hadoop fs -mkdir -p hadoopFicheros

In [5]:
! hadoop fs -ls

Found 1 items
drwxr-xr-x   - cloudera cloudera          0 2019-10-29 23:49 hadoopFicheros


In [6]:
! mkdir -p Ficheros/DB

In [7]:
%%writefile Ficheros/DB/deforestacionDB.hql
create database if not exists deforestacion
Comment 'Base de datos de deforestación'
Location '/user/$(whoami)/hadoopFicheros/deforestacion'
With dbproperties ('Creada por' = 'User', 'Creada el' = '29-Oct-2019');

Writing Ficheros/DB/deforestacionDB.hql


Para ejecutar el código del fichero de arriba, utilizamos la siguiente orden, prestando atención al símbolo ! del principio.

In [8]:
! beeline -u "jdbc:hive2://localhost:10000/default" -f Ficheros/DB/deforestacionDB.hql

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
ciondbc:hive2://localhost:10000/default> create database if not exists deforesta 
' . . . . . . . . . . . . . . . . . . .> Comment 'Base de datos de deforestación 
s/deforestacion'. . . . . . . . . . . .> Location '/user/$(whoami)/hadoopFichero 
', 'Creada el' = '29-Oc . . . . . . . .> With dbproperties ('Creada por' = 'User t-2019');
INFO  : Compiling command(queryId=hive_20191029234949_76713971-2ebc-4779-ab6f-bfdb2abfd8ab): create database if not exists deforestacion
Comment 'Base de datos de deforestación'
Location '/user/$(whoami)/hadoopFicheros/deforestacion'
With dbproperties ('Creada por' = 'User', 'Creada el' = '29-Oct-2019')
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling c

Tras crear la base de datos, ahora crearemos una tabla en ella. 

La separación de los campos es ",", y existen algunos campos compuestos, como "Ongwam Blk I, II, and III", motivo por el que se utilizan las líneas referentes a Serde. Ésta solución se ha consultado en el manual de Hive.


In [9]:
%%writefile Ficheros/DB/deforestacionDB.hql
use deforestacion;

CREATE TABLE IF NOT EXISTS deforestacion
(
WorldBankRegion string, 
    country string, 
    iso3 string, 
    wdpaId string, 
    parkName string, 
    year string, 
    outsideDeforestation string, 
    insideDeforestation string
)
COMMENT 'Tabla de deforestación'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES(
    "separatorChar" = ",",
    "quoteChar"     = "\""
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES('skip.header.line.count'='1');

Overwriting Ficheros/DB/deforestacionDB.hql


Ahora ejecutamos el fichero anterior con la orden de abajo

In [10]:
! beeline -u "jdbc:hive2://localhost:10000/default" -f Ficheros/DB/deforestacionDB.hql

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/default> use deforestacion;
INFO  : Compiling command(queryId=hive_20191029235050_2a3f6e43-dc6a-4d2d-9e9d-80305a33fb3f): use deforestacion
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235050_2a3f6e43-dc6a-4d2d-9e9d-80305a33fb3f); Time taken: 0.1 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235050_2a3f6e43-dc6a-4d2d-9e9d-80305a33fb3f): use deforestacion
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191029235050_2a3f6e43-dc6a-4d2d-9e9d-80305a33fb3f); Time taken: 0.009 seconds
INFO  : OK
No ro

Vamos a comprobar que efectivamente se ha creado esta tabla. Podemos ejecutar órdenes directamente sobre Hive (sin necesidad de escribirlos en un archivo) con el parámetro -e como se indica abajo.

In [11]:
! beeline -u "jdbc:hive2://localhost:10000/default" -e "use deforestacion; show tables;"

scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235050_313a5d0b-e5aa-4827-b966-84675a4e8ac6): use deforestacion
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235050_313a5d0b-e5aa-4827-b966-84675a4e8ac6); Time taken: 0.092 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235050_313a5d0b-e5aa-4827-b966-84675a4e8ac6): use deforestacion
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191029235050_313a5d0b-e5aa-4827-b966-84675a4e8ac6); Time taken: 0.011 seconds
INFO  : OK
No rows affected (0.2 seconds)
INFO  : Compiling command(queryI

## Ejercicio 2

#### El segundo ejercicio  consiste en cargar los datos da la base de datos de deforestación en la tabla creada en el ejercicio anterior.

Crearemos unas carpetas en HDFS y cargaremos el archivo de deforestacion en dicha carpeta.

In [12]:
! hadoop fs -mkdir -p hadoopFicheros/datospractica
! hadoop fs -put deforestationSubSahara.csv hadoopFicheros/datospractica

In [13]:
! hadoop fs -ls hadoopFicheros/datospractica

Found 1 items
-rw-r--r--   1 cloudera cloudera     547153 2019-10-29 23:50 hadoopFicheros/datospractica/deforestationSubSahara.csv


Ahora cargaremos los datos que ya estaban en HDFS en la tabla de deforestación.

In [14]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e \
"LOAD DATA INPATH '/user/$(whoami)/hadoopFicheros/datospractica/deforestationSubSahara.csv' INTO TABLE deforestacion;"

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235050_cba9f1bc-d763-4087-a527-9e799bfa01a1): LOAD DATA INPATH '/user/cloudera/hadoopFicheros/datospractica/deforestationSubSahara.csv' INTO TABLE deforestacion
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235050_cba9f1bc-d763-4087-a527-9e799bfa01a1); Time taken: 0.137 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235050_cba9f1bc-d763-4087-a527-9e799bfa01a1): LOAD DATA INPATH '/user/cloudera/hadoopFicheros/datospractica/deforestationSubSahara.csv' INTO TABLE deforestacion
INFO  : Starting task [Stage-0:MOVE] in serial mode
IN

Comprobamos que los datos se han introducido correctamente ejecutando una orden select.

In [15]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e "select * from deforestacion limit 10;"

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235151_e9c06a5a-4930-4bc1-be37-bc286c3ec933): select * from deforestacion limit 10
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:deforestacion.worldbankregion, type:string, comment:null), FieldSchema(name:deforestacion.country, type:string, comment:null), FieldSchema(name:deforestacion.iso3, type:string, comment:null), FieldSchema(name:deforestacion.wdpaid, type:string, comment:null), FieldSchema(name:deforestacion.parkname, type:string, comment:null), FieldSchema(name:deforestacion.year, type:string, comment:null), FieldSchema(name:deforestacion.outsidedeforestation, type:string, comment:null), FieldSchema(name:deforestacion.insidedeforestation, type:string, c

## Ejercicio 3

#### El tercer ejercicio consiste en crear una vista con los campos iso3, wdpa-id, nombre del parque, porcentaje de deforestación interna, y porcentaje de deforestación externa.

Para crear la vista será suficiente con la siguiente instrucción:

In [16]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e \
"create view vista as select  iso3, wdpaid, parkname, outsidedeforestation, insidedeforestation from deforestacion;"

scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235151_bc59a3f5-9fde-4b38-893b-fdb778b732af): create view vista as select  iso3, wdpaid, parkname, outsidedeforestation, insidedeforestation from deforestacion
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:iso3, type:string, comment:null), FieldSchema(name:wdpaid, type:string, comment:null), FieldSchema(name:parkname, type:string, comment:null), FieldSchema(name:outsidedeforestation, type:string, comment:null), FieldSchema(name:insidedeforestation, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235151_bc59a3f5-9fde-4b38-893b-fdb778b732af); Time taken: 0.126 seconds
INFO  : Concurrency mode is disabled, not

## Ejercicio 4

#### El cuarto ejercicio consiste en realizar 3 consultas a la base de datos.

##### Apartado 1

Consulta para obtener el porcentaje de deforestación interno del parque Rusizi en el año 2011:

In [17]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e \
"select insidedeforestation from deforestacion where year='2011' and parkname='Rusizi';"

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235151_e3e8fff0-98bf-4bbe-87c9-555f8a35867e): select insidedeforestation from deforestacion where year='2011' and parkname='Rusizi'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:insidedeforestation, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235151_e3e8fff0-98bf-4bbe-87c9-555f8a35867e); Time taken: 0.155 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235151_e3e8fff0-98bf-4bbe-87c9-555f8a35867e): select insidedeforestation from deforestacion where year='2011' and parkname='Rusizi'
INFO  : Completed executing command(queryId=hive_2

##### Apartado 2

Consulta para obtener los 10 parques con mayor porcentaje medio de deforestación interno (se realiza sobre la vista creada en el ejercicio 3):

In [18]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e \
"select parkname, avg(insidedeforestation) as media from vista group by parkname order by media desc limit 10;"

scan complete in 4ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235252_650cd279-18b6-4b66-ac66-19dac1e09b80): select parkname, avg(insidedeforestation) as media from vista group by parkname order by media desc limit 10
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:parkname, type:string, comment:null), FieldSchema(name:media, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235252_650cd279-18b6-4b66-ac66-19dac1e09b80); Time taken: 0.212 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235252_650cd279-18b6-4b66-ac66-19dac1e09b80): select parkname, avg(insidedeforestation) as media from vista group by p

##### Apartado 3

Consulta para obtener los 10 países con mayor porcentaje medio de deforestación en el año 2012:

In [19]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e \
"select country, sum(insidedeforestation) as total from deforestacion where year='2012' group by country order by total desc limit 10;"

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235454_abeb8ea2-26ff-43be-b86c-6261072ae151): select country, sum(insidedeforestation) as total from deforestacion where year='2012' group by country order by total desc limit 10
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:country, type:string, comment:null), FieldSchema(name:total, type:double, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235454_abeb8ea2-26ff-43be-b86c-6261072ae151); Time taken: 0.291 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235454_abeb8ea2-26ff-43be-b86c-6261072ae151): select country, sum(insidedeforestation) as total

## Ejercicio 5

#### El quinto ejercicio consiste en desarrollar un código MapReduce en Python que implemente la consulta del apartado 1 del ejercicio 4.

Se crean las carpetas necesarias y el mapper, ya que en éste caso no es necesario un reducer

In [20]:
! mkdir -p Ficheros/MAPPER

In [21]:
! ls -l Ficheros

total 8
drwxrwxr-x 2 cloudera cloudera 4096 Oct 29 23:49 DB
drwxrwxr-x 2 cloudera cloudera 4096 Oct 29 23:55 MAPPER


El código mapper.py es el siguiente. En él no se han considerado los campos compuestos, ya que se conoce de antemano la respuesta al haber efectuado la misma consulta en el apartado 1 del ejercicio 4 y haber comprobado que funciona correctamente, ya que el campo solucitado no es compuesto:

In [22]:
%%writefile Ficheros/MAPPER/mapper.py
#!/usr/bin/python
 
import sys

# Recorremos la entrada 
for line in sys.stdin:
    
    # Separamos por comas
    data = line.strip().split(",")

    # Filtramos las lineas por numero de columnas
    if len(data) == 8:
        
        # Obtenemos los datos
        WorldBankRegion, country, iso3, wdpaId, parkName, year, outsideDeforestation, insideDeforestation = data
    
    # Filtramos con las condiciones del apartado 1 del ejercicio 4
    if year=="2011" and parkName=="Rusizi":

        # Las imprimimos
        print "{0}".format(outsideDeforestation)


Writing Ficheros/MAPPER/mapper.py


In [23]:
! ls -l Ficheros/MAPPER

total 4
-rw-rw-r-- 1 cloudera cloudera 562 Oct 29 23:55 mapper.py


Comprobaremos que el código map funciona correctamente con el código de abajo, que simula localmente la ejecución MapReduce.

In [24]:
! cat deforestationSubSahara.csv | python Ficheros/MAPPER/mapper.py > Ficheros/MAPPER/salidamapper

In [25]:
! ls -l Ficheros/MAPPER

total 8
-rw-rw-r-- 1 cloudera cloudera 562 Oct 29 23:55 mapper.py
-rw-rw-r-- 1 cloudera cloudera   6 Oct 29 23:55 salidamapper


In [26]:
! cat Ficheros/MAPPER/salidamapper

29.36


Ahora probaremos el código en Hadoop. Para ello lo primero tendremos que cargar el fichero de texto en HDFS, y tras eso, podremos ejecutar nuestro código MapReduce en Hadoop. Al hacer esto, estaremos utilizando toda la potencia de nuestro cluster.

In [27]:
! hadoop fs -mkdir -p hadoopFicheros/tmp

In [28]:
! hadoop fs -put deforestationSubSahara.csv hadoopFicheros/tmp

In [29]:
! hadoop fs -ls hadoopFicheros/tmp

Found 1 items
-rw-r--r--   1 cloudera cloudera     547153 2019-10-29 23:56 hadoopFicheros/tmp/deforestationSubSahara.csv


In [30]:
! hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -file Ficheros/MAPPER/mapper.py \
-mapper Ficheros/MAPPER/mapper.py -input hadoopFicheros/tmp/deforestationSubSahara.csv \
-output hadoopFicheros/tmp/salida-mapper

19/10/29 23:56:17 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [Ficheros/MAPPER/mapper.py] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.12.0.jar] /tmp/streamjob2583890420813175416.jar tmpDir=null
19/10/29 23:56:21 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/10/29 23:56:22 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/10/29 23:56:23 INFO mapred.FileInputFormat: Total input paths to process : 1
19/10/29 23:56:23 INFO mapreduce.JobSubmitter: number of splits:2
19/10/29 23:56:24 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572393120588_0032
19/10/29 23:56:24 INFO impl.YarnClientImpl: Submitted application application_1572393120588_0032
19/10/29 23:56:24 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1572393120588_0032/
19/10/29 23:56:24 INFO mapreduce.Job: Running job: job_1572393120588_0032
19/10/29

In [31]:
! hadoop fs -ls hadoopFicheros/tmp/salida-mapper/*

-rw-r--r--   1 cloudera cloudera          0 2019-10-29 23:57 hadoopFicheros/tmp/salida-mapper/_SUCCESS
-rw-r--r--   1 cloudera cloudera          7 2019-10-29 23:57 hadoopFicheros/tmp/salida-mapper/part-00000


In [32]:
! hadoop fs -cat hadoopFicheros/tmp/salida-mapper/part-00000

29.36	


## Limpieza

#### Tabla

In [33]:
! beeline -u "jdbc:hive2://localhost:10000/deforestacion" -e "drop database deforestacion cascade;"

scan complete in 2ms
Connecting to jdbc:hive2://localhost:10000/deforestacion
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235757_aa7b860e-7253-4e98-80b3-3cd96754a6d5): drop database deforestacion cascade
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235757_aa7b860e-7253-4e98-80b3-3cd96754a6d5); Time taken: 0.238 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235757_aa7b860e-7253-4e98-80b3-3cd96754a6d5): drop database deforestacion cascade
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191029235757_aa7b860e-7253-4e98-80b3-3cd96754a6d5); Time taken: 0.31 seconds
INFO  : OK
No rows affected (0.66

In [34]:
! beeline -u "jdbc:hive2://localhost:10000/default" -e "show tables;"

scan complete in 3ms
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.1.0-cdh5.12.0)
Driver: Hive JDBC (version 1.1.0-cdh5.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO  : Compiling command(queryId=hive_20191029235858_f55a045d-b077-4a6d-b71c-c75051c343b3): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20191029235858_f55a045d-b077-4a6d-b71c-c75051c343b3); Time taken: 0.097 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20191029235858_f55a045d-b077-4a6d-b71c-c75051c343b3): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20191029235858_f55a045d-b077-4a6d-b71c-c75051c343b3); Time taken: 0.031 seconds
INFO  : OK
+----------

#### Hadoop fs

In [35]:
! hadoop fs -rm -r hadoopFicheros

Deleted hadoopFicheros


In [36]:
! hadoop fs -ls

Tras la limpieza, todavía quedará el directorio /user/$(whoami) que crea Hive al insertar la base de datos.

#### Ficheros locales

In [37]:
! rm -r Ficheros

In [38]:
! ls -l

total 588
-rwxrwx--- 1 cloudera cloudera 547153 Jul  2 01:19 deforestationSubSahara.csv
-rwxrwx--- 1 cloudera cloudera  52849 Oct 29 23:56 Practica.ipynb
