## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) is the Databricks File System, which lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. You can switch to another language in a cell with the `%LANGUAGE` syntax; Python, Scala, SQL, and R are all supported.

In [2]:
# File location and type
file_location = "/FileStore/tables/Street.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
# Street.csv has a header row: the pandas cells below select columns by name
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

In [3]:
# Create a view or table

temp_table_name = "Street_csv"

df.createOrReplaceTempView(temp_table_name)

In [4]:
import pandas as pd

In [5]:
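# Read the same file with pandas through the local /dbfs mount.
# Note: this rebinds `df`, so the Spark DataFrame above is no longer reachable by that name.
df = pd.read_csv("/dbfs/FileStore/tables/Street.csv")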

In [6]:
print(df)

In [7]:
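# Work on an explicit copy of the first 1000 rows so later assignments
# modify df1 itself and do not trigger SettingWithCopyWarning
df1 = df.iloc[:1000,:].copy()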
df1.head()

In [8]:
# Relabel every agent other than 'WEBCONSUMER' as 'Person'.
# astype(str) mirrors the original str() cast, so missing values also become 'Person'.
df1.loc[df1['Agent'].astype(str) != 'WEBCONSUMER', 'Agent'] = 'Person'

In [9]:
df2 = df1[['Agent','Street Ligh Complaint From','Call Status']]
df2 = df2.dropna()
df1.head()
#df2 = df2[df2['Street Ligh Complaint From'] != 'PUBLIC']

In [10]:
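# Assign the result back: an in-place replace on a selected column is chained
# assignment and may leave df2 unchanged in recent pandas versions
df2['Agent'] = df2['Agent'].replace(['WEBCONSUMER','Person'],[1,2])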

In [11]:
df2['Street Ligh Complaint From'] = df2['Street Ligh Complaint From'].replace(['RESIDENT','NIGHT PATROLLING','PUBLIC'],[1,2,3])

In [12]:
df2['Call Status'] = df2['Call Status'].replace(['OPEN','ESCALATED','CLOSED','RECTIFIED'],[1,2,3,4])

In [13]:
df2.head()
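
The three `replace` calls above encode each text category as an integer. A minimal sketch of the same idea using an explicit dictionary and `Series.map` follows. The toy rows and the `encode` helper are hypothetical; only the category-to-code pairs come from the cells above. One property worth knowing: `map` turns any value missing from the dictionary into `NaN`, so unexpected categories become visible instead of passing through as text.

```python
import pandas as pd

# Category-to-code pairs taken from the replace() calls above.
AGENT_CODES = {"WEBCONSUMER": 1, "Person": 2}
STATUS_CODES = {"OPEN": 1, "ESCALATED": 2, "CLOSED": 3, "RECTIFIED": 4}

def encode(series, codes):
    """Hypothetical helper: map text categories to integer codes.

    Values absent from `codes` become NaN rather than staying as text.
    """
    return series.map(codes)

# Toy data for illustration only; these are not rows from Street.csv.
toy = pd.DataFrame({
    "Agent": ["WEBCONSUMER", "Person", "Person"],
    "Call Status": ["OPEN", "RECTIFIED", "UNKNOWN"],
})

toy["Agent"] = encode(toy["Agent"], AGENT_CODES)
toy["Call Status"] = encode(toy["Call Status"], STATUS_CODES)

# Rows whose status had no code can now be found with isna().
unmapped = toy[toy["Call Status"].isna()]
print(toy)
print(len(unmapped), "row(s) with an unmapped Call Status")
```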

In [15]:
#df2.head()
# Print every status code other than 1, 2, or 3.
# Code 4 (RECTIFIED) is a valid code but is printed too, because it is not excluded here.
for status in df2['Call Status']:
  if status not in (1, 2, 3):
    print(status)

In [16]:
trained = df2[['Agent','Street Ligh Complaint From','Call Status']]
trained = trained.dropna()
print(trained)

In [17]:
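# Use a 1-D integer array for the target; a single-column 2-D array makes
# scikit-learn emit a DataConversionWarning, and object-typed labels can be rejected
label = trained['Call Status'].astype(int).values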
trained = trained[['Agent','Street Ligh Complaint From']].values
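
scikit-learn classifiers expect the target as a one-dimensional array, whereas selecting a column with double brackets and taking `.values` produces a two-dimensional array with a single column. A small NumPy sketch of the difference, and of flattening with `ravel()`, using toy values only:

```python
import numpy as np

# Toy label column shaped like DataFrame[['col']].values: n rows, one column.
column_labels = np.array([[1], [3], [4], [3]])

# ravel() flattens it to the 1-D shape that classifiers expect for targets.
flat_labels = column_labels.ravel()

print(column_labels.shape)  # (4, 1)
print(flat_labels.shape)    # (4,)
```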

In [18]:
from sklearn.model_selection import train_test_split

In [19]:
train_feat,test_feat,train_lb,test_lb = train_test_split(trained,label)
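
`train_test_split` shuffles the rows randomly, so each run of the cell above yields a different split and therefore a different accuracy. Below is a sketch of a reproducible variant, assuming a fixed seed and a 25% test share are acceptable. Both values are illustrative choices rather than settings from this notebook, and synthetic arrays stand in for `trained` and `label`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the feature matrix and the label vector.
rng = np.random.default_rng(0)
features = rng.integers(1, 4, size=(40, 2))   # two encoded columns
labels = np.array([1, 2, 3, 4] * 10)          # four balanced classes

# random_state fixes the shuffle; stratify keeps the class proportions
# similar in both parts. test_size=0.25 matches scikit-learn's default share.
split = train_test_split(
    features, labels, test_size=0.25, random_state=42, stratify=labels
)
train_feat, test_feat, train_lb, test_lb = split

print(train_feat.shape, test_feat.shape)
```

Running the split again with the same `random_state` returns the same rows, which makes accuracy figures comparable between runs.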

In [20]:
from sklearn.svm import SVC

In [21]:
clf = SVC()
#train_lb = train_lb.astype('int')

In [22]:
train = clf.fit(train_feat,train_lb)

In [23]:
pred = train.predict(test_feat)

In [24]:
print(pred)

In [25]:
print(test_lb)

In [26]:
from sklearn.metrics import accuracy_score

In [27]:
acc = accuracy_score(test_lb,pred)

In [28]:
acc * 100
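
Accuracy is the share of test rows whose prediction equals the true label, which is what `accuracy_score` computes here. As a sanity check, it helps to compare that number with the accuracy of always predicting the most common class. A plain-Python sketch of both follows, with made-up label lists; the `accuracy` and `majority_baseline` names are hypothetical.

```python
from collections import Counter

def accuracy(true_labels, predicted):
    """Fraction of positions where the prediction matches the truth."""
    matches = sum(t == p for t, p in zip(true_labels, predicted))
    return matches / len(true_labels)

def majority_baseline(true_labels):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(true_labels).most_common(1)[0][1]
    return most_common_count / len(true_labels)

# Made-up labels for illustration; these are not results from this notebook.
true_labels = [3, 3, 3, 1, 2, 3, 4, 3]
predicted = [3, 3, 3, 3, 3, 3, 3, 3]

print(accuracy(true_labels, predicted) * 100)   # 62.5
print(majority_baseline(true_labels) * 100)     # 62.5
```

When the two numbers are equal, as in this toy case, the classifier has learned nothing beyond the class frequencies.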