<font color=gray>ADS Sample Notebook.

Copyright (c) 2021 Oracle, Inc. All rights reserved. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Getting Started with Oracle Cloud Infrastructure Data Science</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Service Overview

Welcome to Oracle Cloud Infrastructure Data Science Service!

Oracle Cloud Infrastructure Data Science Service is a fully managed platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure.

The Data Science Service:

* Provides data scientists with a collaborative, project-driven workspace.
* Enables self-service access to infrastructure for data science workloads.
* Includes Python-centric tools, libraries, and packages developed by the open-source community and the [Oracle Accelerated Data Science Library](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html), which supports the end-to-end lifecycle of predictive models:
    * Data acquisition, profiling, preparation, and visualization.
    * Feature engineering.
    * Model training.
    * Model evaluation, explanation, and interpretation.
    * Model storage through the [Model Catalog](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/manage-models.htm).
    * Model deployment.
* Integrates with the rest of the Oracle Cloud Infrastructure stack, including [Oracle Functions](https://docs.cloud.oracle.com/en-us/iaas/Content/Functions/Concepts/functionsoverview.htm), [Data Flow](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow.htm), [Autonomous Data Warehouse](https://docs.cloud.oracle.com/en-us/iaas/Content/Database/Concepts/adboverview.htm), [Streaming](https://docs.cloud.oracle.com/en-us/iaas/Content/Streaming/Concepts/streamingoverview.htm), [Vault](https://docs.cloud.oracle.com/en-us/iaas/Content/KeyManagement/Concepts/keyoverview.htm), [Logging](https://docs.cloud.oracle.com/en-us/iaas/Content/Logging/Concepts/loggingoverview.htm#loggingoverview), and [Object Storage](https://docs.cloud.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm).
* Helps data scientists concentrate on methodology and domain expertise to deliver more models to production.

For more details, check out the [Data Science service guide](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm).

---

## Overview

This TensorFlow for CPUs [conda environment](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments) is an ecosystem to create state-of-the-art machine learning models. You can use TensorFlow to train and deploy deep neural networks for image recognition, natural language processing, recurrent neural networks, and other machine learning applications. Use the `ads-lite` library to speed up your data science workflow.

The notebook examples emphasize the use of [Oracle Accelerated Data Science SDK (ADS)](https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html) for a variety of use cases.

---

**Important:**

Placeholder text for required values is surrounded by angle brackets, which must be removed when you add the indicated content. For example, when you add a database name, `database_name = "<database_name>"` becomes `database_name = "production"`.
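
As a minimal sketch of the rule above, the following check fails fast when a value still holds placeholder text. The helper name `require_filled` and the regular expression are hypothetical and are not part of ADS or the OCI SDK.

```python
import re

# Matches a value that is still an unreplaced placeholder, e.g. "<database_name>".
PLACEHOLDER_PATTERN = re.compile(r"^<[^<>]+>$")


def require_filled(name: str, value: str) -> str:
    """Return value unchanged, or raise if it is still a "<...>" placeholder."""
    if PLACEHOLDER_PATTERN.match(value.strip()):
        raise ValueError(f"{name} still contains placeholder text: {value!r}")
    return value


# Passes because the placeholder was replaced with real content.
database_name = require_filled("database_name", "production")
```

Calling `require_filled("database_name", "<database_name>")` raises a `ValueError`, which surfaces a forgotten placeholder before it reaches a service call.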

---

## Prerequisites:
- Experience with a specific topic: Novice
- Professional experience: None

---

## Objectives:
- <a href='#authentication'>Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session</a>
    - <a href='#resource_principals'>Authentication with Resource Principals</a>
        - <a href='#resource_principals_ads'>Resource Principals Authentication using the ADS SDK</a>
        - <a href='#resource_principals_oci'>Resource Principals Authentication using the OCI SDK</a>
        - <a href='#resource_principals_cli'>Resource Principals Authentication using the OCI CLI</a> 
    - <a href='#api_keys'>Authentication with API Keys</a>
- <a href='#conda'>Conda</a>
    - <a href='#conda_overview'>Overview</a>
    - <a href='#conda_libraries'>Principal Conda Libraries</a>
    - <a href='#conda_configuration'>Configuration</a>
- <a href='#ref'>References</a> 

---

In [None]:
import logging
import warnings

from ads import set_auth
from ads import set_documentation_mode
from oci.auth.signers import get_resource_principals_signer
from oci.data_science import DataScienceClient
from os import popen

set_documentation_mode(False)
warnings.filterwarnings('ignore')
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

<a id='authentication'></a>
# Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session

When working within a notebook session, the `datascience` user is used. This user does not have an Oracle Cloud Infrastructure Identity and Access Management (IAM) identity, so it has no access to the Oracle Cloud Infrastructure API. To access Oracle Cloud Infrastructure resources, including Data Science projects, models and any other Oracle Cloud Infrastructure service resources from the notebook environment, you must configure either resource principals or API keys. For most applications, the resource principal is the recommended approach.
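
The choice between the two mechanisms can be sketched as a small helper. This is an illustration under stated assumptions, not the service's own logic: it assumes that the environment variable `OCI_RESOURCE_PRINCIPAL_VERSION` is set wherever a resource principal is available, and that `"api_key"` is the name ADS uses for API key authentication. The function name `choose_auth_mode` is hypothetical.

```python
import os


def choose_auth_mode(env=None) -> str:
    """Pick an authentication mode name based on the environment.

    Assumption: OCI_RESOURCE_PRINCIPAL_VERSION is present only when a
    resource principal is available to the current process.
    """
    env = os.environ if env is None else env
    if env.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    # Fall back to API keys, which need a local OCI configuration file.
    return "api_key"


print(choose_auth_mode())
```

The returned string could then be passed to ADS, for example `set_auth(auth=choose_auth_mode())`. Detecting the variable does not prove that policies grant access; it only indicates which signer can be constructed.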

<a id='resource_principals'></a>
## Authentication with Resource Principals

Oracle Cloud Infrastructure Data Science enables easy and secure authentication using the notebook session's resource principal to access other Oracle Cloud Infrastructure resources, including Data Science projects and models. Follow the steps below to utilize your notebook session's resource principal.

In advance, a tenancy administrator must write policies that grant the resource principal permission to access other Oracle Cloud Infrastructure resources. See [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/configure-tenancy.htm) for more details.

There are two methods to configure the notebook to use resource principals: the `ads` library or the `oci` library. While both libraries provide the required authentication, the `ads` library has been specifically designed for easy operation within a Data Science notebook session.

If you do not wish to take on these library dependencies, it is also possible to use the `oci` command on the command line.

For more details on using resource principals in the Data Science service, see the [ADS Configuration Guide](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#) and [Authenticating to the Oracle Cloud Infrastructure APIs from a Notebook Session](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb).

<a id='resource_principals_ads'></a>
### Resource Principals Authentication using the ADS SDK

The `set_auth()` method sets the authentication mechanism for ADS. ADS uses the `oci` SDK to access resources like the model catalog or Object Storage.

Within a notebook session, configure ADS to use a resource principal by running the following in a notebook cell:

In [None]:
set_auth(auth='resource_principal') 

<a id='resource_principals_oci'></a>
### Resource Principals Authentication using the OCI SDK

Within your notebook session, the `oci` library can use the resource principal. The following cell demonstrates how to make a basic connection using the default settings.

In [None]:
resource_principal = get_resource_principals_signer() 
dsc = DataScienceClient(config={}, signer=resource_principal)
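
Once a client such as `dsc` exists, a quick way to confirm that the signer and policies work is to fetch a project. The sketch below assumes the client exposes `get_project(project_id)` and that the returned model has `id`, `display_name`, and `lifecycle_state` attributes; the helper name `summarize_project` is hypothetical.

```python
def summarize_project(client, project_id: str) -> dict:
    """Fetch one project and return a few of its fields as a plain dict."""
    project = client.get_project(project_id).data
    return {
        "id": project.id,
        "display_name": project.display_name,
        "lifecycle_state": project.lifecycle_state,
    }


# Usage inside a notebook session, assuming PROJECT_OCID is set there:
# import os
# print(summarize_project(dsc, os.environ["PROJECT_OCID"]))
```

Taking the client as a parameter keeps the helper independent of how authentication was configured, so the same function works with a resource principal signer or an API key configuration.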

<a id='resource_principals_cli'></a>
### Resource Principals Authentication using the OCI CLI

Within a notebook session, the Oracle Cloud Infrastructure CLI can authenticate with the resource principal when you pass the `--auth=resource_principal` flag. For example:

In [None]:
cmd = "oci data-science project get --project-id=$PROJECT_OCID --auth=resource_principal 2>&1"
print(popen(cmd).read())

If the resource principal is correctly configured, a message similar to the following will be printed.

```
{
  "data": {
    "compartment-id": "ocid1.compartment.oc1..aaaaaaaafl3avkal72rrwuy4m5rumpwh7r4axejjwq5hvwjy4h4uoyi7kzyq",
    "created-by": "ocid1.user.oc1..aaaaaaaabfrlcbiyvjmjvgh3ns6trdyoewxytqywwta3yqmy3ah3fa3uw76q",
    "defined-tags": {},
    "description": "my favorite demo project\n",
    "display-name": "jr-demo-project",
    "freeform-tags": {},
    "id": "ocid1.datascienceproject.oc1.iad.aaaaaaaappvg4tp5kmbkurcyeghxaqmaknw3s5yh2oxcvfrvjeaadinsng6q",
    "lifecycle-state": "ACTIVE",
    "time-created": "2019-11-14T22:29:06.870000+00:00"
  },
  "etag": "b4d66fb733748f3454206d5de6b9acb3634edc804b2ad1997bd69dc676035a89"
}
```
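
Because the command above merges standard error into standard output, the printed text is not always JSON. The following sketch shows one way to run the same command with `subprocess` and parse the result; the helper names `parse_cli_json` and `get_project_via_cli` are hypothetical, and the error handling is a design choice rather than part of the CLI.

```python
import json
import subprocess


def parse_cli_json(output: str) -> dict:
    """Parse OCI CLI JSON output and return its "data" object.

    Raises ValueError when the text is not JSON, for example when the
    CLI printed an error message instead of a response.
    """
    try:
        payload = json.loads(output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"CLI did not return JSON: {output.strip()[:200]}") from exc
    return payload["data"]


def get_project_via_cli(project_id: str) -> dict:
    """Run the project lookup, keeping stderr separate from stdout."""
    result = subprocess.run(
        ["oci", "data-science", "project", "get",
         "--project-id", project_id, "--auth", "resource_principal"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return parse_cli_json(result.stdout)
```

With the sample response above, `parse_cli_json(...)["display-name"]` yields `jr-demo-project`.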

<a id='api_keys'></a>
## Authentication with API Keys

If resource principals are not explicitly used, API keys are used by default. To set up API keys for your use case, see the example notebook `api_keys.ipynb`.
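
Before relying on API keys, it can help to confirm that a configuration file is in place. The sketch below assumes the conventional OCI SDK layout: an INI-style file at `~/.oci/config` whose profiles contain `user`, `fingerprint`, `key_file`, `tenancy`, and `region`. The helper name `missing_config_keys` is hypothetical, and the check does not validate the key itself.

```python
import configparser
from pathlib import Path

# Keys assumed to be required in each profile of an API key config file.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_config_keys(path: str = "~/.oci/config", profile: str = "DEFAULT") -> list:
    """Return the required keys that are absent from the given profile.

    An empty list means the profile looks complete. Every key is reported
    as missing when the file or the profile does not exist.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(Path(path).expanduser())  # a missing file is silently skipped
    if profile != "DEFAULT" and not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if key not in section]


print(missing_config_keys())
```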


<a id='conda'></a>
# TensorFlow for CPUs Conda Environment

<a id='conda_overview'></a>
## Overview

This TensorFlow for CPUs [conda environment](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments) is an ecosystem to create state-of-the-art machine learning models. Use TensorFlow to train and deploy deep neural networks for image recognition, natural language processing, recurrent neural networks, and other machine learning applications. This environment also includes the Oracle Accelerated Data Science (ADS) library. The purpose of this conda environment is to offer a baseline of libraries that are used in many TensorFlow machine learning projects.

You can access notebook examples for this conda environment in JupyterLab from the **Launcher** tab by clicking **Notebook Examples**. From here you can select one of the notebook examples available for the conda environments installed in your notebook session. First pick the conda environment of your choice in the first dropdown and then pick an associated notebook example in the second dropdown.

For a description of each notebook example, see [Overview of the Notebook Examples](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#overview_of_the_notebook_examples).

<a id='conda_libraries'></a>
## Principal Conda Libraries

1. `ads-lite`
2. `category-encoders`
3. `pandas`
4. `scikit-learn`
5. `tensorflow`
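
To see which of these libraries are present in the active environment, a short check with the standard library's `importlib.metadata` can be used. This is a sketch that assumes the names in the list match the installed distribution names, which may not hold for every package (ADS in particular may be published under a different distribution name). The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Names taken from the list above; assumed to match the distribution names.
PRINCIPAL_LIBRARIES = ("ads-lite", "category-encoders", "pandas", "scikit-learn", "tensorflow")


def installed_versions(names=PRINCIPAL_LIBRARIES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```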

<a id='conda_configuration'></a>
## Configuration

No additional configuration is needed to use this conda environment.

<a id='ref'></a>
# References

* [Understanding and Using Conda Environments](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)
* [ADS Configuration Guide](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#)
* [Authenticating to the Oracle Cloud Infrastructure APIs from a Notebook Session](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb)
* [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/configure-tenancy.htm)
* [Data Science service guide](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
* [Our Data Science & AI Blog](https://blogs.oracle.com/datascience/)