## Creating a Prediction Model with Virtuoso and TensorFlow

This example builds a TensorFlow prediction model from data sourced directly from Virtuoso. It uses PyODBC to run SQL or SPASQL queries that retrieve the data fed to the model.

The [dataset](https://www.kaggle.com/mathchi/diabetes-data-set) contains records for female patients of Pima Indian heritage who are at least 21 years old.

Columns:

* Pregnancies: Number of times pregnant
* Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test
* BloodPressure: Diastolic blood pressure (mm Hg)
* SkinThickness: Triceps skin fold thickness (mm)
* Insulin: 2-Hour serum insulin (mu U/ml)
* BMI: Body mass index (weight in kg/(height in m)^2)
* DiabetesPedigreeFunction: Diabetes pedigree function
* Age: Age (years)
* Outcome: Class variable (0 or 1), where 1 indicates a positive test for diabetes

### Import Required Libraries

In [54]:
import pandas as pd
import pyodbc
import tensorflow as tf
import datetime
import numpy as np

SHUFFLE_BUFFER = 500
BATCH_SIZE = 2
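
With `BATCH_SIZE = 2`, each epoch consists of many small gradient updates. As a rough illustration of the arithmetic, assuming the 767 feature rows reported by the tensor shape later in this notebook, the number of batches per epoch can be estimated as follows. The helper name `steps_per_epoch` is hypothetical and not part of TensorFlow.

```python
import math


def steps_per_epoch(num_rows: int, batch_size: int) -> int:
    """Batches needed to cover every row once (the last batch may be smaller)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_rows / batch_size)


# 767 is taken from the tensor shape printed later in this notebook.
print(steps_per_epoch(767, 2))  # 384
```

A larger batch size would reduce the number of updates per epoch proportionally, at the cost of fewer weight updates for the same number of epochs.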

### Set Up the Connection and Create DataFrames

In [55]:
cnxn = pyodbc.connect("DSN=Local Virtuoso;UID=dba;pwd=dba")

q = 'SELECT * FROM "tensorflow"."diabetes".data'

d = pd.read_sql_query(q,cnxn)

d.head()


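| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |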
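
Hard-coding `UID=dba;pwd=dba` is convenient for a local demo. For anything shared, one option is to assemble the ODBC connection string from environment variables. The sketch below is only an illustration under stated assumptions: the variable names `VIRTUOSO_DSN`, `VIRTUOSO_UID`, and `VIRTUOSO_PWD` and the helper `build_connection_string` are hypothetical, and the defaults simply mirror the values used in the cell above.

```python
import os


def build_connection_string(env=None) -> str:
    """Assemble an ODBC connection string from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    parts = {
        "DSN": env.get("VIRTUOSO_DSN", "Local Virtuoso"),
        "UID": env.get("VIRTUOSO_UID", "dba"),
        "PWD": env.get("VIRTUOSO_PWD", "dba"),
    }
    # Values containing ';' would need ODBC brace quoting, which this sketch skips.
    return ";".join(f"{key}={value}" for key, value in parts.items())


conn_str = build_connection_string()
# conn_str can then be passed to pyodbc.connect(conn_str).
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.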


In [56]:
# d is already a DataFrame; copy it so that pop() below leaves the raw query result intact
df = d.copy()

outcome = df.pop("Outcome")

df.head()



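| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |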


In [57]:
feat_names = df.columns.tolist()
feats = df[feat_names]


### Convert to Tensor and continue in TensorFlow

In [58]:
print(tf.convert_to_tensor(feats))

normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(feats)
normalizer(feats.iloc[:3])

tf.Tensor(
[[  6.    148.     72.    ...  33.6     0.627  50.   ]
 [  1.     85.     66.    ...  26.6     0.351  31.   ]
 [  8.    183.     64.    ...  23.3     0.672  32.   ]
 ...
 [  2.    122.     70.    ...  36.8     0.34   27.   ]
 [  5.    121.     72.    ...  26.2     0.245  30.   ]
 [  1.    126.     60.    ...  30.1     0.349  47.   ]], shape=(767, 8), dtype=float64)


<tf.Tensor: shape=(3, 8), dtype=float32, numpy=
array([[ 0.6387271 ,  0.84705454,  0.14960383,  0.90778947, -0.6935593 ,
         0.20362139,  0.4676379 ,  1.4246367 ],
       [-0.8458293 , -1.1243613 , -0.16038112,  0.5315605 , -0.6935593 ,
        -0.6842581 , -0.36549374, -0.1917781 ],
       [ 1.2325497 ,  1.9422855 , -0.26370946, -1.2868794 , -0.6935593 ,
        -1.1028302 ,  0.6034746 , -0.10670363]], dtype=float32)>
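
The `Normalization` layer standardizes each feature column: `adapt` records the column's mean and variance, and calling the layer subtracts the mean and divides by the standard deviation. The plain-Python sketch below reproduces that arithmetic on a tiny made-up table. The three rows are illustrative values, not query results, and the sketch assumes the layer uses the population variance. The helper name `standardize_columns` is hypothetical.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Shift each column to mean 0 and scale it to unit population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column would have std 0; this sketch does not guard against that.
    stds = [pstdev(col) for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Illustrative values only: two columns on very different scales.
toy_rows = [[1.0, 100.0], [2.0, 150.0], [3.0, 200.0]]
scaled = standardize_columns(toy_rows)
for row in scaled:
    print([round(v, 4) for v in row])
```

After scaling, both toy columns hold the same values even though their raw ranges differ by two orders of magnitude. That is the point of normalizing features such as `Glucose` and `DiabetesPedigreeFunction` before training.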

### Create Model

In [131]:
def get_basic_model():
  model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(50, activation='relu'),
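    # Outcome has only two classes (0 and 1), so two output units would suffice;
    # the ten units here leave eight classes that never appear in the labels.
    tf.keras.layers.Dense(10, activation='softmax')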
  ])

  model.compile(optimizer='adam',
                loss = 'sparse_categorical_crossentropy',
                metrics=['accuracy'])
  return model

model = get_basic_model()

model.fit(feats, outcome, epochs=50, batch_size=BATCH_SIZE)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x161f66d90>
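
The final layer applies softmax, which turns the raw outputs into probabilities that sum to 1. The `sparse_categorical_crossentropy` loss then scores those probabilities against the integer label as the negative log of the probability assigned to the true class. Below is a minimal plain-Python sketch of both formulas, using made-up logits for a two-class case. The function names are hypothetical, and this is not how Keras implements them internally.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probs):
    """Negative log-probability of the true class, given an integer label."""
    return -math.log(probs[label])


probs = softmax([0.5, 2.0])  # illustrative logits, not model output
loss_if_true_is_1 = sparse_categorical_crossentropy(1, probs)
loss_if_true_is_0 = sparse_categorical_crossentropy(0, probs)
print(probs, loss_if_true_is_1, loss_if_true_is_0)
```

The loss is small when the model puts most of its probability on the true class and grows quickly when it does not.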

In [132]:
!mkdir -p saved_model
model.save('saved_model/my_model')

INFO:tensorflow:Assets written to: saved_model/my_model/assets


## Test Predictions

In [137]:
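# Each row follows the feature order: Pregnancies, Glucose, BloodPressure,
# SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age
sample = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50],
    [6, 148, 72, 35, 0, 33.6, 0.627, 60],
    [1, 85, 66, 29, 0, 26.6, 0.351, 55],
    [1, 185, 66, 29, 0, 26.6, 0.422, 55],
]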
prediction = model.predict(sample)
print(prediction)

classes = np.argmax(prediction, axis = 1)
print(classes)

[[1.10631852e-04 9.99889374e-01 7.53423296e-11 6.22901383e-11
  1.18493035e-10 8.15207485e-11 8.30141997e-11 6.90038998e-11
  7.55178559e-11 1.15634294e-10]
 [3.66192951e-04 9.99633789e-01 1.29888617e-11 7.91407651e-12
  1.96529078e-11 1.83023527e-11 1.33574072e-11 1.07776791e-11
  1.47946117e-11 2.84010333e-11]
 [9.52655494e-01 4.73444536e-02 8.03585948e-11 2.52314176e-11
  5.23639916e-11 1.22750241e-10 4.88582542e-11 1.01292906e-11
  5.94956237e-11 8.83622620e-11]
 [3.40772808e-01 6.59226656e-01 2.92345490e-08 3.36105330e-08
  6.69027287e-08 7.89286290e-08 5.25233723e-08 3.98892190e-08
  6.42667501e-08 9.31192119e-08]]
[1 1 0 1]
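
`np.argmax(prediction, axis=1)` picks, for each row, the index of the largest probability, and that index is the predicted class. The sketch below redoes this by hand on the first two probabilities of each printed row. The remaining eight values in every row are below 1e-6 and are omitted here for brevity. The helper name `argmax` is hypothetical.

```python
# First two probabilities of each row, copied from the output above.
prediction_head = [
    [1.10631852e-04, 9.99889374e-01],
    [3.66192951e-04, 9.99633789e-01],
    [9.52655494e-01, 4.73444536e-02],
    [3.40772808e-01, 6.59226656e-01],
]


def argmax(values):
    """Index of the largest value (the first index wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


classes = [argmax(row) for row in prediction_head]
print(classes)  # [1, 1, 0, 1]
```

The class index alone hides how confident the model was: the first two samples are assigned class 1 with probability above 0.999, while the fourth is assigned class 1 with only about 0.66.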


## Extra: Prediction via SPARQL query

### Get SPARQL Query Results using SPARQL-within-SQL (SPASQL)

In [138]:
q3 = """SPARQL
PREFIX : <#>
SELECT *
FROM <urn:diabetes:data:test>
WHERE
{
   ?id :pregnancies ?pregnancies;
       :glucose ?glucose;
       :bloodPressure ?bp;
       :skinThickness ?st;
       :insulin ?insulin;
       :bmi ?bmi;
       :diabetesPedigreeFunction ?dbf;
       :age ?age
}"""

cursor = cnxn.cursor()

cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')

cursor.execute(q3)

# Fetch the first result row
row = cursor.fetchone()

print(row)

# Column 0 is ?id; columns 1-8 hold the eight feature values, returned as strings
results = [[float(row[i]) for i in range(1, 9)]]

('#1', '0', '85', '75', '35', '0', '30.39999961853027', '0', '35')
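
The cell above relies on the column positions that `SELECT *` happens to return. A slightly more defensive sketch maps values by variable name instead. With pyodbc, the names could be read from `cursor.description`, whose entries start with the column name as in the Python DB-API. The helper `row_to_features` and the hard-coded `columns` list are hypothetical, and the column order shown is an assumption inferred from the query, not captured output.

```python
# Order in which the model expects its eight inputs (SPARQL variable names).
FEATURE_ORDER = ["pregnancies", "glucose", "bp", "st", "insulin", "bmi", "dbf", "age"]


def row_to_features(row, columns, feature_order=FEATURE_ORDER):
    """Pick feature values by column name and convert them from strings to floats."""
    by_name = dict(zip(columns, row))
    return [float(by_name[name]) for name in feature_order]


# Assumed column names; in practice: [d[0] for d in cursor.description]
columns = ["id", "pregnancies", "glucose", "bp", "st", "insulin", "bmi", "dbf", "age"]
# Row copied from the output above.
row = ('#1', '0', '85', '75', '35', '0', '30.39999961853027', '0', '35')

features = [row_to_features(row, columns)]
print(features)
```

If the server ever returned the variables in a different order, the name lookup would still hand the model its inputs in the order it was trained on.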


### Get Prediction

In [98]:
prediction2 = model.predict(results)
classes2 = np.argmax(prediction2, axis = 1)
print(classes2)

    

[0]
