
## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html), the Databricks File System, lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. You can switch to another language in an individual cell with the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported.

In [0]:
# File location and type
file_location = "/FileStore/tables/Iris-1.csv"
file_type = "csv"


In [0]:
# Read the CSV, treating the first row as a header and letting Spark infer column types
df = spark.read.csv(file_location, header=True, inferSchema=True)
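
The `file_type` variable defined earlier is not used by the `spark.read.csv` call. A minimal sketch of how it could be used through the generic reader API is shown here. The helper name `read_table` and its keyword arguments are hypothetical and not part of the original notebook; it assumes `spark` is the active `SparkSession` that Databricks provides.

```python
def read_table(spark, file_location, file_type, header=True, infer_schema=True):
    """Load a file into a DataFrame through the generic DataFrameReader API.

    `read_table` is a hypothetical helper; `spark` is assumed to be an
    existing SparkSession.
    """
    return (
        spark.read.format(file_type)  # e.g. "csv"
        .option("header", str(header).lower())  # first row holds column names
        .option("inferSchema", str(infer_schema).lower())  # infer column types
        .load(file_location)
    )
```

In the notebook this would be called as `df = read_table(spark, file_location, file_type)`, which should give the same DataFrame as the `spark.read.csv` call.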

In [0]:
df.printSchema()

root
 |-- Id: integer (nullable = true)
 |-- SepalLengthCm: double (nullable = true)
 |-- SepalWidthCm: double (nullable = true)
 |-- PetalLengthCm: double (nullable = true)
 |-- PetalWidthCm: double (nullable = true)
 |-- Species: string (nullable = true)



In [0]:
from pyspark.ml.feature import VectorAssembler
# Combine the four measurement columns into a single feature-vector column
vassem = VectorAssembler(inputCols=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'], outputCol='indp_feats')

In [0]:
output = vassem.transform(df)

In [0]:
from pyspark.ml.feature import StringIndexer
# Encode the Species strings as numeric label indices (0.0, 1.0, 2.0)
indexer = StringIndexer(inputCol='Species', outputCol='species_lbl')
output_fit = indexer.fit(output).transform(output)
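
To make the meaning of `species_lbl` concrete, here is a plain-Python sketch of how `StringIndexer` assigns indices under its default ordering, which as far as I know is descending label frequency with ties broken alphabetically in Spark 3.x. The function name `frequency_desc_index` is hypothetical, and the species names and counts are assumed from the standard Iris dataset because the notebook does not print them.

```python
from collections import Counter


def frequency_desc_index(labels):
    """Map each distinct label to a float index: most frequent first, ties alphabetical."""
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


# Assumed contents of the Species column: 50 rows per class (not shown in the notebook).
species = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
mapping = frequency_desc_index(species)
print(mapping)
```

Under these assumptions all three classes tie on frequency, so the indices follow alphabetical order. That agrees with the predictions table in this notebook, where rows with small petal measurements carry the label `0.0`.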

In [0]:
fin_data = output_fit.select('indp_feats','species_lbl')

In [0]:
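from pyspark.ml.regression import LinearRegression
# Random 80/20 split; no seed is set, so the rows in each split vary between runs
train, test = fin_data.randomSplit([0.8, 0.2])
lr = LinearRegression(featuresCol='indp_feats', labelCol='species_lbl')
# fit() returns a LinearRegressionModel; from here on `lr` is the fitted model
lr = lr.fit(train)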

In [0]:
lr.coefficients

DenseVector([-0.0969, -0.044, 0.2368, 0.5783])

In [0]:
lr.intercept

0.12780733034206518
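
The coefficients and intercept define the model's prediction as a weighted sum of the four features plus the intercept. A small sketch that recomputes one prediction by hand, using the rounded coefficients printed in this notebook and the first feature row from its predictions table; the function name `predict` is only illustrative.

```python
# Rounded coefficients and intercept as printed by the notebook.
coefficients = [-0.0969, -0.044, 0.2368, 0.5783]
intercept = 0.12780733034206518


def predict(features):
    """Linear model: intercept plus the dot product of coefficients and features."""
    return intercept + sum(w * x for w, x in zip(coefficients, features))


# Sepal length, sepal width, petal length, petal width (first row of the predictions table).
row = [4.6, 3.4, 1.4, 0.3]
prediction = predict(row)
print(round(prediction, 4))  # about 0.0375
```

The result differs from the value Spark reports for that row only in the later decimal places, because the printed coefficients are rounded to four digits.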

In [0]:
# evaluate() returns a summary object holding the predictions and metrics such as r2
pred = lr.evaluate(test)

In [0]:
pred.predictions.show()

+-----------------+-----------+--------------------+
|       indp_feats|species_lbl|          prediction|
+-----------------+-----------+--------------------+
|[4.6,3.4,1.4,0.3]|        0.0| 0.03752773143345772|
|[4.7,3.2,1.6,0.2]|        0.0|0.026171919445397304|
|[4.8,3.1,1.6,0.2]|        0.0|0.020883159117499148|
|[4.9,2.4,3.3,1.0]|        1.0|  0.9071419352038221|
|[5.0,2.3,3.3,1.0]|        1.0|  0.9018531748759241|
|[5.0,3.4,1.5,0.2]|        0.0|-0.03537243080245886|
|[5.1,3.7,1.5,0.4]|        0.0| 0.05739026489656751|
|[5.2,3.5,1.5,0.2]|        0.0|-0.05914925932417056|
|[5.4,3.7,1.5,0.2]|        0.0|-0.08732585713452098|
|[5.4,3.9,1.7,0.4]|        0.0| 0.06688358311161902|
|[5.5,2.3,4.0,1.3]|        1.0|   1.192640886312243|
|[5.7,2.5,5.0,2.0]|        2.0|  1.8060333828467594|
|[5.7,2.8,4.5,1.3]|        1.0|  1.2696610947408236|
|[5.8,2.6,4.0,1.2]|        1.0|  1.0925507230059783|
|[6.1,2.8,4.7,1.2]|        1.0|  1.2204401553258764|
|[6.1,2.9,4.7,1.4]|        1.0|  1.33169091921

In [0]:
pred.r2

0.9180512624946584
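
For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below applies that formula to the fifteen complete rows shown in the predictions table. It is only an illustration of the formula: those rows are a subset of the test set, so the result is not expected to match the `pred.r2` value. The function name `r_squared` is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Labels and predictions copied from the complete rows of the table in this notebook.
labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 2, 1, 1, 1]
predictions = [
    0.03752773143345772, 0.026171919445397304, 0.020883159117499148,
    0.9071419352038221, 0.9018531748759241, -0.03537243080245886,
    0.05739026489656751, -0.05914925932417056, -0.08732585713452098,
    0.06688358311161902, 1.192640886312243, 1.8060333828467594,
    1.2696610947408236, 1.0925507230059783, 1.2204401553258764,
]
r2_subset = r_squared(labels, predictions)
print(r2_subset)
```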

In [0]:
lr.save()  # raises TypeError: save() requires a destination path

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <command-616109454760452>, line 1
----> 1 lr.save()

TypeError: MLWritable.save() missing 1 required positional argument: 'path'