
## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html), the Databricks File System, lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.
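A DBFS location is usually written in one of two ways: Spark readers take a `dbfs:/...` URI (or a bare `/...` path, which resolves against DBFS), while local file APIs, on clusters where the `/dbfs` mount is available, see the same file under `/dbfs/...`. The helper below is a minimal sketch of that mapping. `dbfs_path_forms` and the file name `example.json` are hypothetical names used only for illustration.

```python
def dbfs_path_forms(path: str) -> dict:
    """Return the Spark-style URI and the local-file-API path for a DBFS location.

    Accepts '/FileStore/x', 'dbfs:/FileStore/x', or '/dbfs/FileStore/x'.
    """
    if path.startswith("dbfs:"):
        relative = path[len("dbfs:"):]
    elif path.startswith("/dbfs/"):
        relative = path[len("/dbfs"):]
    else:
        relative = path
    # Normalize to exactly one leading slash.
    relative = "/" + relative.lstrip("/")
    return {"spark": "dbfs:" + relative, "local": "/dbfs" + relative}


# Hypothetical file name, used only for illustration.
forms = dbfs_path_forms("/FileStore/tables/example.json")
print(forms["spark"])  # dbfs:/FileStore/tables/example.json
print(forms["local"])  # /dbfs/FileStore/tables/example.json
```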

This notebook is written in **Python**, so the default cell type is Python. You can switch languages in an individual cell with the `%LANGUAGE` magic command (for example, `%sql`). Python, Scala, SQL, and R are all supported.

In [None]:
from pyspark.sql import SparkSession

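# Create a Spark session. In a Databricks notebook `spark` already exists,
# so getOrCreate() simply returns that session.
spark = SparkSession.builder.appName("LecturaJSON").getOrCreate()

# Path of the JSON file in DBFS (Databricks File System)
ruta_dbfs_json = "/FileStore/tables/889_zubimendi.json"

# Read the JSON into a DataFrame. The multiline option lets a single JSON
# document span several lines of the file.
df = spark.read.option("multiline", "true").json(ruta_dbfs_json)

# Display the DataFrame
df.show()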

+-----------------+---+--------------------+----------------+-----------+----------------+---------+--------------------+------------+------+--------------+----------+---------+--------------------+----------+
|    averagePoints| id|              images|lastSeasonPoints|marketValue|            name| nickname|         playerStats|playerStatus|points|      position|positionId|     slug|                team|weekPoints|
+-----------------+---+--------------------+----------------+-----------+----------------+---------+--------------------+------------+------+--------------+----------+---------+--------------------+----------+
|6.176470588235294|889|{{https://assets-...|             183|   46061036|Martín Zubimendi|Zubimendi|[{false, {[1, 0],...|          ok|   105|Centrocampista|         3|zubimendi|{https://assets-f...|         6|
+-----------------+---+--------------------+----------------+-----------+----------------+---------+--------------------+------------+------+--------------+----------+---------+--------------------+----------+
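The read above sets the `multiline` option because Spark's JSON reader by default expects JSON Lines input, where every line is a complete JSON object. A pretty-printed document that spans several lines cannot be parsed one line at a time. The sketch below illustrates that difference with Python's standard `json` module only. It does not use Spark, and the sample records are hypothetical rather than taken from the uploaded file.

```python
import json

# Hypothetical sample records, not taken from the uploaded file.
pretty_printed = json.dumps({"id": "1", "name": "example"}, indent=2)
json_lines = "\n".join(json.dumps(r) for r in [{"id": "1"}, {"id": "2"}])


def parse_line_by_line(text):
    """Parse each non-empty line as its own JSON document (JSON Lines style)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# JSON Lines input parses cleanly one line at a time.
print(parse_line_by_line(json_lines))

# A pretty-printed document does not: its first line is just "{".
try:
    parse_line_by_line(pretty_printed)
    line_mode_ok = True
except json.JSONDecodeError:
    line_mode_ok = False
print(line_mode_ok)

# Parsing the whole text as one document works, which is the behavior
# that multiline mode asks for.
print(json.loads(pretty_printed))
```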

In [None]:

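# Read the JSON into a DataFrame (same call as in the previous cell)
df = spark.read.option("multiline", "true").json(ruta_dbfs_json)

# Build basic summary statistics (the description of the DataFrame).
# describe() returns a new DataFrame, so call .show() on it to print the values.
df.describe()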

Out[6]: DataFrame[summary: string, averagePoints: string, id: string, lastSeasonPoints: string, marketValue: string, name: string, nickname: string, playerStatus: string, points: string, position: string, positionId: string, slug: string, weekPoints: string]
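As the `Out[6]` line shows, `describe()` returns another DataFrame rather than printing anything, and the nested columns (`images`, `playerStats`, `team`) are left out of it. To see the statistics themselves, the result has to be displayed. A minimal sketch follows, assuming `df` is the Spark DataFrame loaded in this notebook. `print_summary` is a hypothetical helper name, not part of PySpark.

```python
def print_summary(df, *columns):
    """Compute and print summary statistics for the given columns.

    `df` is assumed to be a Spark DataFrame. With no column names,
    describe() covers every numeric and string column.
    """
    summary = df.describe(*columns)  # DataFrame with count, mean, stddev, min, max
    summary.show()                   # print the summary table
    return summary
```

In a notebook cell this could be called as `print_summary(df, "averagePoints", "points", "weekPoints")` to restrict the summary to the numeric scoring columns.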

In [None]:
# Print the DataFrame schema
df.printSchema()

root
 |-- averagePoints: double (nullable = true)
 |-- id: string (nullable = true)
 |-- images: struct (nullable = true)
 |    |-- beat: struct (nullable = true)
 |    |    |-- 1024x1024: string (nullable = true)
 |    |    |-- 128x128: string (nullable = true)
 |    |    |-- 2048x2048: string (nullable = true)
 |    |    |-- 256x256: string (nullable = true)
 |    |    |-- 512x512: string (nullable = true)
 |    |    |-- 64x64: string (nullable = true)
 |    |-- big: struct (nullable = true)
 |    |    |-- 1024x1113: string (nullable = true)
 |    |    |-- 128x139: string (nullable = true)
 |    |    |-- 2048x2225: string (nullable = true)
 |    |    |-- 256x278: string (nullable = true)
 |    |    |-- 512x556: string (nullable = true)
 |    |    |-- 64x70: string (nullable = true)
 |    |-- transparent: struct (nullable = true)
 |    |    |-- 1024x1024: string (nullable = true)
 |    |    |-- 128x128: string (nullable = true)
 |    |    |-- 2048x2048: string (nullable = true)
 |    