# Oracle Machine Learning for Python - Datastore and Script Repository
Oracle Machine Learning for Python (OML4Py), a component of the Oracle Advanced Analytics option to Oracle Database Enterprise Edition, makes the open source Python scripting language and environment ready for the enterprise and big data. Designed for problems involving both large and small volumes of data, Oracle Machine Learning for Python integrates Python with Oracle Database, allowing users to execute Python commands and scripts for statistical, machine learning, and graphical analyses on database tables and views using Python syntax. Many familiar Python functions are overloaded so that their invocations are translated into SQL for in-database execution. OML4Py also provides new automated machine learning capabilities.
![title](img/OML4P_icon.jpg)
In this notebook, we highlight the datastore and script repository features of OML4Py. 

With a datastore, you can store Python objects for use in subsequent Python sessions. You can also make those objects available to other users or programs by granting the read privilege, and withdraw that access by revoking it.

Python objects, including OML4Py proxy objects, exist for the duration of the current Python session unless you explicitly save them. You can save one or more Python objects, including oml proxy objects, to a named datastore and then load those objects in a later Python session, including when using embedded Python execution. Datastores exist in the user’s Oracle Database schema. A datastore, and the objects it contains, persist in the database until explicitly deleted.

Using a datastore, you can:

* Save OML4Py and other Python objects in one Python session and load them in another Python session
* Pass non-scalar arguments to Python functions for use in embedded Python execution, both from the Python API and, more importantly, from the SQL API
* List available datastores and explore the contents of a datastore

With the script repository, users can:

* Create and store user-defined Python functions as scripts in Oracle Database
* Grant or revoke the read privilege to a script
* List available scripts
* Load a script function into the Python environment
* Drop a script from the script repository

OML4Py has both a Python and SQL interface for creating and managing scripts in the script repository. You can make scripts either private or global. A private script is available only to the owner. A global script is available to any user. For private scripts, the owner of the script may grant the read privilege to other users or revoke that privilege.

# Connect to Oracle Database
To use OML4Py, first import the package ***oml***. OML4Py supports a variety of connection specification options, including Oracle Wallet. Once you are connected to an Oracle Database instance that has OML4Py installed, invoking ***oml.isconnected*** returns True.

In [1]:
import oml
oml.connect(user="pyquser",password="Welcome1#Welcome1#",dsn='(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=130.61.241.158)(PORT=1521))(CONNECT_DATA=(service_name=pdb1.sub12041412510.bdcevcn.oraclevcn.com)))')
oml.isconnected()

True
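
Hard-coding a password in a notebook is easy to leak. The following is a minimal sketch of one alternative, assuming the credentials are supplied through environment variables. The helper `connect_args_from_env` and the variable names `OML_USER`, `OML_PASSWORD`, and `OML_DSN` are hypothetical and not part of OML4Py; the helper only builds the keyword arguments already used in the `oml.connect` call above.

```python
import os


def connect_args_from_env(env=None):
    """Build keyword arguments for oml.connect from environment variables.

    The variable names OML_USER, OML_PASSWORD and OML_DSN are hypothetical;
    choose whatever names suit your deployment.
    """
    env = os.environ if env is None else env
    required = ("OML_USER", "OML_PASSWORD", "OML_DSN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "user": env["OML_USER"],
        "password": env["OML_PASSWORD"],
        "dsn": env["OML_DSN"],
    }


# Usage in a session where OML4Py is installed (not executed here):
#   import oml
#   oml.connect(**connect_args_from_env())
#   oml.isconnected()
```

Passing a plain dict as `env` makes the helper easy to check without touching the real environment.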

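# Create Pandas DataFrame objects and load them into Oracle Database
We load three data sets, combining target and predictors into a single DataFrame for each. The iris data is stored as the database table IRIS using ***create***, while the diabetes and Boston data are loaded into temporary tables using ***push***. We then display the columns of each resulting proxy object. These will be used in exploring _datastore_ functionality.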

In [2]:
from sklearn import datasets
from sklearn import linear_model
import pandas as pd

# Load three data sets and create oml.DataFrame objects for them.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['SEPAL_LENGTH','SEPAL_WIDTH',
                                       'PETAL_LENGTH','PETAL_WIDTH'])
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor',
                 2:'virginica'}[x], iris.target)), columns = ['Species'])
try:
    oml.drop(table='IRIS')
except Exception:
    # Ignore the error raised when the IRIS table does not exist yet.
    pass
IRIS = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
iris = pd.concat([x, y], axis=1)
print(IRIS.columns)

['SEPAL_LENGTH', 'SEPAL_WIDTH', 'PETAL_LENGTH', 'PETAL_WIDTH', 'Species']


In [3]:
diabetes = datasets.load_diabetes()
x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.DataFrame(diabetes.target, columns=['disease_progression'])

DIABETES_TMP = oml.push(pd.concat([x, y], axis=1))
print(DIABETES_TMP.columns)

['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'disease_progression']


In [4]:
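# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn release.
boston = datasets.load_boston()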
x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
y = pd.DataFrame(boston.target, columns = ['Value'])

BOSTON_TMP = oml.push(pd.concat([x, y], axis=1))
print(BOSTON_TMP.columns)

['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'Value']


Save the actual _iris_ data and the temporary BOSTON proxy object to a datastore named "ds_pydata", overwriting if the named datastore already exists. Note that you can store actual data objects in a datastore, but large data objects should remain as database tables for performance and scalability. 

Because the BOSTON_TMP proxy object is stored in a datastore, its temporary table is not deleted when the session terminates.

In [5]:
oml.ds.save(objs={'iris':iris, 'oml_boston':BOSTON_TMP},
            name="ds_pydata", description = "python datasets", overwrite=True)

Add a third object, the temporary DIABETES proxy object, to that same datastore. 

In [None]:
oml.ds.save(objs={'oml_diabetes':DIABETES_TMP},
                  name="ds_pydata", append=True)
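
The two cells above follow a pattern: the first save uses `overwrite=True` to start from a clean datastore, and later saves use `append=True` to add to it. A minimal sketch of that pattern as a reusable helper follows. The name `save_in_batches` is hypothetical, and `ds` is assumed to be the `oml.ds` module, passed in as an argument so the helper can be exercised without a database connection.

```python
def save_in_batches(ds, name, batches):
    """Save several dicts of objects into one datastore.

    `ds` is expected to be the `oml.ds` module. `batches` is a sequence of
    dicts mapping object names to objects, as in the `objs` argument above.
    """
    batches = list(batches)
    if not batches:
        raise ValueError("nothing to save")
    # The first batch replaces any existing datastore with this name.
    ds.save(objs=batches[0], name=name, overwrite=True)
    # Each later batch is added to the same datastore.
    for objs in batches[1:]:
        ds.save(objs=objs, name=name, append=True)
    return len(batches)


# Usage in a connected session (not executed here):
#   save_in_batches(oml.ds, "ds_pydata",
#                   [{'iris': iris, 'oml_boston': BOSTON_TMP},
#                    {'oml_diabetes': DIABETES_TMP}])
```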

Save the _iris_ DataFrame to a new datastore, and then list the datastores. For each datastore, the listing shows its name, the number of objects it contains, its size in bytes, when it was created or updated, and any description provided by the user. Our two datastores _ds_iris_data_ and _ds_pydata_ are present, with the latter containing three objects.

In [None]:
oml.ds.save(objs={'iris':iris},
            name="ds_iris_data", description = "iris dataset", overwrite=True)

oml.ds.dir()

To illustrate storing other types of objects in datastores, we'll create regression models using sklearn and OML4Py.

In [None]:
regr1 = linear_model.LinearRegression()
regr1.fit(boston.data, boston.target)

regr2 = oml.glm("regression")
X = BOSTON_TMP.drop('Value')
y = BOSTON_TMP['Value']
regr2 = regr2.fit(X, y)

Save the models to a datastore, specifying ***grantable=True*** so that the read privilege on the datastore can be granted to other users. Then grant the read privilege on the datastore to all users by specifying ***user=None***. Finally, list the datastores on which the read privilege has been granted.

In [None]:
oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
            name="ds_pymodel", grantable=True, overwrite=True)

oml.ds.dir()

In [None]:
oml.grant(name="ds_pymodel", typ="datastore", user=None)

oml.ds.dir(dstype="grant")

Load all Python objects from a datastore to the global workspace and sort the returned object names. Notice that each object has the name that was given as its dictionary key when it was saved.

In [None]:
sorted(oml.ds.load(name="ds_pydata"))

Load the named Python object, regr2, from the datastore to the global workspace.

In [None]:
oml.ds.load(name="ds_pymodel", objs=["regr2"])

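Load the named Python object, regr1, from the datastore without adding it to the global workspace. With ***to_globals=False***, the loaded object is returned in a dict keyed by object name.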

In [None]:
oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)
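
When `to_globals=False` is used, the caller works with the return value instead of a new global variable. The sketch below assumes that return value is a dict mapping object names to objects. The helper name `load_one` is hypothetical, and `ds` is assumed to be the `oml.ds` module, passed in so the helper can be checked without a database connection.

```python
def load_one(ds, datastore, obj_name):
    """Return a single object from a datastore without touching globals.

    Assumes ds.load(..., to_globals=False) returns a dict keyed by
    object name.
    """
    loaded = ds.load(name=datastore, objs=[obj_name], to_globals=False)
    return loaded[obj_name]


# Usage in a connected session (not executed here):
#   regr1_local = load_one(oml.ds, "ds_pymodel", "regr1")
#   regr1_local.coef_
```

Keeping loaded objects out of the global workspace avoids silently overwriting a variable that already has the same name.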

Show all saved datastores.

In [None]:
oml.ds.dir(dstype="all")[['owner', 'datastore_name', 'object_count']]

We can then show the datastores on which the read privilege has been granted to other users.

In [None]:
oml.ds.dir(dstype="grant")

Or, show datastores to which this user has been granted the read privilege (there currently are none).

In [None]:
oml.ds.dir(dstype="granted")

Or, show datastores whose names match a pattern.

In [None]:
oml.ds.dir(name='pydata', regex_match=True)[['datastore_name', 'object_count']]
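
With `regex_match=True`, the pattern `'pydata'` selects `ds_pydata` even though it is not the full name, which suggests the pattern may match anywhere in the datastore name. The following local sketch illustrates that behavior with Python's `re.search`. It is only an analogy: the matching itself happens in the database, whose regular expression dialect may differ from Python's, and the helper `matching_names` is hypothetical.

```python
import re


def matching_names(names, pattern):
    """Return the names that contain a match for `pattern` anywhere."""
    return [name for name in names if re.search(pattern, name)]


# Datastore names used in this notebook.
names = ["ds_iris_data", "ds_pydata", "ds_pymodel"]

print(matching_names(names, "pydata"))    # ['ds_pydata']
print(matching_names(names, "_py"))       # ['ds_pydata', 'ds_pymodel']
print(matching_names(names, "_pymodel"))  # ['ds_pymodel']
```

Because a short pattern can match more names than intended, it is worth listing the matches with `oml.ds.dir` before passing the same pattern to `oml.ds.delete`.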

Let's describe the contents of the ds_pydata datastore. Notice that all three objects are listed: the pandas DataFrame _iris_ and the two proxy objects, oml_boston and oml_diabetes.

In [None]:
oml.ds.describe(name='ds_pydata')

Revoke the read privilege from every user, and again show datastores to which read privilege has been granted. The result is empty.

In [None]:
oml.revoke(name="ds_pymodel", typ="datastore", user=None)

oml.ds.dir(dstype="grant")

Grant the read privilege to the user PYQUSER2.

In [None]:
oml.grant(name="ds_pymodel", typ="datastore", user="PYQUSER2")

oml.ds.dir(dstype="grant")

Delete some objects from the datastore, then delete the datastore itself, and then delete all datastores whose names match a pattern. Finally, show the existing datastores again.

In [None]:
oml.ds.delete(name="ds_pydata", objs=["iris", "oml_boston"])

oml.ds.delete(name="ds_pydata")

oml.ds.delete(name="_pymodel", regex_match=True)

oml.ds.dir()

# Python Script Repository

To illustrate using the Python Script Repository, we first define a function ***build_lm1*** that will fit a regression model. With this function, we create a script named "MyLM_function". 

In [None]:
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['SEPAL_LENGTH','SEPAL_WIDTH',
                                       'PETAL_LENGTH','PETAL_WIDTH'])
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor',
                          2:'virginica'}[x], iris.target)), 
                 columns = ['Species'])
IRIS2 = oml.push(pd.concat([x, y], axis=1))

In [None]:
def build_lm1(dat):
    from sklearn import linear_model
    regr = linear_model.LinearRegression()
    import pandas as pd
    dat = pd.get_dummies(dat, drop_first=True)
    X = dat[["SEPAL_WIDTH", "PETAL_LENGTH", "PETAL_WIDTH", 
             "Species_versicolor", "Species_virginica"]]
    y = dat[["SEPAL_LENGTH"]]
    regr.fit(X, y)
    return regr

oml.script.create("MyLM_function", func=build_lm1, overwrite=True)

List the scripts available only to the current user.

In [None]:
oml.script.dir(sctype='user')

Grant the read privilege on the MyLM_function script to the user PYQUSER2.

In [None]:
oml.grant(name="MyLM_function", typ="pyqscript", user="PYQUSER2")
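
Creating a private script and granting read access to several users can be combined into one step. The following is a minimal sketch that uses only the `oml.script.create` and `oml.grant` calls shown in this notebook. The helper name `create_and_share` is hypothetical, and the `oml` package is passed in as `oml_module` so the helper can be checked without a database connection.

```python
def create_and_share(oml_module, name, func, users):
    """Store `func` as a private script and grant read access to `users`.

    `oml_module` is expected to be the imported `oml` package.
    """
    # Create (or replace) the private script.
    oml_module.script.create(name, func=func, overwrite=True)
    # Grant the read privilege on the script to each user in turn.
    for user in users:
        oml_module.grant(name=name, typ="pyqscript", user=user)
    return list(users)


# Usage in a connected session (not executed here):
#   create_and_share(oml, "MyLM_function", build_lm1, ["PYQUSER2"])
#   oml.script.dir(sctype="grant")
```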

List the scripts to which read privilege has been granted.

In [None]:
oml.script.dir(sctype="grant")

Revoke the read privilege on the MyLM_function script from the user PYQUSER2.

In [None]:
oml.revoke(name="MyLM_function", typ="pyqscript", user="PYQUSER2")

We'll use embedded Python execution to invoke this function. We pass the IRIS2 proxy object created above to ***table_apply***, along with the script name and the input type, so that the function receives the data as a pandas DataFrame. We'll view the result and then pull the coefficients.

In [None]:
res = oml.table_apply(IRIS2, func="MyLM_function", 
                      oml_input_type="pandas.DataFrame")
res

In [None]:
res.pull().coef_

Let's define and save another function ***build_lm2***, but this time as global. We'll then invoke that function to build another model. 

In [None]:
def build_lm2(dat):
    from sklearn import linear_model
    regr = linear_model.LinearRegression()
    X = dat[["PETAL_WIDTH"]]
    y = dat[["PETAL_LENGTH"]]
    regr.fit(X, y)
    return regr

oml.script.create("MyGlobalLM_function", func=build_lm2, is_global=True, overwrite=True)

In [None]:
res = oml.table_apply(IRIS, func="MyGlobalLM_function", 
                      oml_input_type="pandas.DataFrame")
res

List the scripts in the script repository available to the current user only.

In [None]:
oml.script.dir()

List all of the scripts available to the current user.

In [None]:
oml.script.dir(sctype='all')

List the scripts available to all users.

In [None]:
oml.script.dir(sctype='global')

Load the MyLM_function and MyGlobalLM_function scripts as functions in the local Python session. For MYLM, build the model on the IRIS data set, pulled into local memory, and print the coefficients. For GlobalMYLM, build and display the model.

In [None]:
MYLM = oml.script.load(name="MyLM_function")
GlobalMYLM = oml.script.load(name="MyGlobalLM_function")

print("Coefficients: ", MYLM(IRIS.pull()).coef_)
print("Model: ", GlobalMYLM(IRIS.pull()))

List the available scripts.

In [None]:
oml.script.dir(sctype="all")

Drop the private script.

Drop the global script.

List the available scripts again.

In [None]:
oml.script.drop("MyLM_function")
oml.script.drop("MyGlobalLM_function", is_global=True)
oml.script.dir(sctype="all")

<img src="img/Oracle-sm.jpg">