
## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) (Databricks File System) lets you store data for querying inside Databricks. This notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. However, you can switch a cell to another language with the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported.

In [0]:
# File location and type
file_location = "/FileStore/tables/cancer_data.csv"
file_type = "csv"

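# CSV options
# Both options are off here, so every column is read as a string and the
# file's header row comes through as the first data row (columns are named
# _c0, _c1, ...). Set both to "true" to use the header as column names and
# infer numeric types.
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.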
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

_c0,_c1,_c2,_c3,_c4,_c5
mean_radius,mean_texture,mean_perimeter,mean_area,mean_smoothness,diagnosis
17.99,10.38,122.8,1001.0,0.1184,0
20.57,17.77,132.9,1326.0,0.08474,0
19.69,21.25,130.0,1203.0,0.1096,0
11.42,20.38,77.58,386.1,0.1425,0
20.29,14.34,135.1,1297.0,0.1003,0
12.45,15.7,82.57,477.1,0.1278,0
18.25,19.98,119.6,1040.0,0.09463,0
13.71,20.83,90.2,577.9,0.1189,0
13.0,21.82,87.5,519.8,0.1273,0
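
The `_c0` through `_c5` column names, and the `mean_radius, ...` labels showing up as the first data row, both follow from `header` being `"false"`. As a rough illustration outside Spark, the sketch below applies Python's standard `csv` module to the first three lines of the output above. The names `sample`, `rows_no_header`, and `rows_with_header` are made up for this example, and this is not a description of how Spark parses the file internally.

```python
import csv
import io

# First three lines of the file, copied from the output above.
sample = (
    "mean_radius,mean_texture,mean_perimeter,mean_area,mean_smoothness,diagnosis\n"
    "17.99,10.38,122.8,1001.0,0.1184,0\n"
    "20.57,17.77,132.9,1326.0,0.08474,0\n"
)

# Header treated as data: every line is a row, and columns have no real names.
rows_no_header = list(csv.reader(io.StringIO(sample)))

# Header treated as column names: the first line labels the remaining rows.
rows_with_header = list(csv.DictReader(io.StringIO(sample)))

print(len(rows_no_header))                 # 3 rows; the first one holds the labels
print(rows_no_header[0][0])                # mean_radius
print(len(rows_with_header))               # 2 rows
print(rows_with_header[0]["mean_radius"])  # 17.99 (still a string, no type inference)
```

In either mode the values stay strings, which mirrors what happens in the notebook when `inferSchema` is `"false"`.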


In [0]:
# Create a temporary view so the DataFrame can be queried with SQL

temp_table_name = "cancer_data_csv"

df.createOrReplaceTempView(temp_table_name)
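
Once the view is registered, it can be queried with SQL, either from a `%sql` cell or through `spark.sql`. The following is a minimal sketch, assuming the `spark` session that Databricks notebooks provide. `diagnosis_counts_sql` and `count_by_diagnosis` are hypothetical helper names, and the `WHERE` clause is there only because the header row was loaded as data in this notebook.

```python
def diagnosis_counts_sql(view_name):
    """Build a query that counts rows per diagnosis value in the given view."""
    # _c5 is the diagnosis column; the header row was loaded as data,
    # so the literal string 'diagnosis' is filtered out.
    return (
        "SELECT _c5 AS diagnosis, COUNT(*) AS n "
        f"FROM {view_name} "
        "WHERE _c5 <> 'diagnosis' "
        "GROUP BY _c5 "
        "ORDER BY n DESC"
    )


def count_by_diagnosis(spark, view_name="cancer_data_csv"):
    """Run the query through spark.sql and return the resulting DataFrame."""
    return spark.sql(diagnosis_counts_sql(view_name))


# In the notebook (assumes `spark` and the temp view already exist):
# display(count_by_diagnosis(spark))
```

Keeping the query text in its own function makes it easy to inspect or reuse in a `%sql` cell. Note that a temporary view is scoped to the current Spark session.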

In [0]:
# Count rows per distinct value of the first column (_c0, the mean_radius values)
grouped = df.groupBy("_c0")
result = grouped.count()
result.show()

+-----+-----+
|  _c0|count|
+-----+-----+
|17.42|    1|
|13.87|    2|
|20.64|    1|
|8.618|    1|
|6.981|    1|
|15.49|    1|
|12.85|    1|
|9.668|    1|
|8.734|    1|
|10.97|    1|
|9.683|    1|
|28.11|    1|
| 12.8|    1|
| 14.2|    1|
| 16.6|    1|
|11.42|    1|
|8.888|    1|
|11.62|    1|
|9.029|    1|
|13.73|    1|
+-----+-----+
only showing top 20 rows

