In [None]:
# Upgrade Oracle ADS to pick up the latest preview version to maintain compatibility with Oracle Cloud Infrastructure.

!odsc conda install --uri https://objectstorage.us-ashburn-1.oraclecloud.com/p/qnzzHQPGQYghdyH206yDk25MZH1FaMGdNNhKUl74BhRsW4muvFyGViKIqpxgnxI3/n/ociodscdev/b/ads_conda_pack_builds/o/PySpark_3/teamcity_20230512_084146_38972446/f227145b7ee5fc1c73a69ebaa671b81e/PySpark_3.2_and_Feature_Store.tar.gz

Oracle Data Science service sample notebook.

Copyright (c) 2022 Oracle, Inc. All rights reserved. Licensed under the [Universal Permissive License v 1.0](https://oss.oracle.com/licenses/upl).

***

# <font color="red">Feature store quickstart</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color="teal">Oracle Cloud Infrastructure Data Science Service.</font></p>

---
# Overview:
---
Managing many datasets, data sources, and transformations for machine learning is complex and costly. Poorly cleaned data, data issues, bugs in transformations, data drift, and training-serving skew all lead to increased model development time and worse model performance. A feature store is well positioned to solve many of these problems because it provides a centralised way to transform and access data at both training and serving time, and helps define a standardised pipeline for ingesting and querying data.

## Contents:

- <a href="#concepts">1. Introduction</a>
- <a href='#pre-requisites'>2. Pre-requisites</a>
    - <a href='#policies'>2.1 Policies</a>
    - <a href='#prerequisites_authentication'>2.2 Authentication</a>
    - <a href='#prerequisites_variables'>2.3 Variables</a>
- <a href='#featurestore_overview'>3. Feature store quickstart using APIs</a>
    - <a href='#create_featurestore'>3.1. Create feature store</a>
    - <a href='#create_entity'>3.2. Create business entity in feature store</a>
    - <a href='#create_featuregroup'>3.3. Create feature group and upload data to feature group</a>
    - <a href='#query_featuregroup'>3.4. Query feature group</a>
    - <a href='#create_dataset'>3.5. Create dataset from multiple or one feature group</a>
    - <a href='#query_dataset'>3.6 Query dataset</a>
- <a href='#featurestore_yaml'>4. Feature store quickstart using YAML</a>
- <a href='#ref'>5. References</a>

---

**Important:**

Placeholder text for required values is surrounded by angle brackets, which must be removed when you add the indicated content. For example, when adding a database name, `database_name = "<database_name>"` would become `database_name = "production"`.
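
A forgotten placeholder is easy to miss before running the later cells. The following is a minimal sketch of a check for it, not part of ADS; the helper name `find_placeholders` and the sample settings are hypothetical.

```python
import re

# Matches an unreplaced placeholder such as "<database_name>".
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_]*>")


def find_placeholders(settings):
    """Return the names of settings whose value still contains a placeholder."""
    return [
        name
        for name, value in settings.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    ]


# Hypothetical settings: one value was filled in, one was not.
settings = {"database_name": "<database_name>", "region": "us-ashburn-1"}
print(find_placeholders(settings))
```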

---

Datasets are provided as a convenience. Datasets are considered third-party content and are not considered materials under your agreement with Oracle.

This notebook uses the Citi Bike dataset under the [`Citi Bike`](https://ride.citibikenyc.com/data-sharing-policy) dataset license.

---

<a id="concepts"></a>
# 1. Introduction

Oracle feature store is a stack-based solution that is deployed in the customer enclave using OCI Resource Manager. Customers can stand up the service with infrastructure in their own tenancy. The service consists of APIs that are deployed in the customer tenancy using Resource Manager.

The following are some key terms that will help you understand OCI Data Science Feature Store:


* **Feature Vector**: A set of feature values for any one primary or identifier key. For example, all or a subset of the features of customer ID ‘2536’ can be called one feature vector.

* **Feature**: A feature is an individual measurable property or characteristic of a phenomenon being observed.

* **Entity**: An entity is a group of semantically related features. The first step a consumer of features would typically take when accessing the feature store service is to list the entities and the features associated with each entity. Another way to look at it is that an entity is an object or concept that is described by its features. Examples of entities could be customer, product, transaction, review, image, document, etc.

* **Feature Group**: A feature group in a feature store is a collection of related features that are often used together in ML models. It serves as an organizational unit within the feature store for users to manage, version, and share features across different ML projects. By organizing features into groups, data scientists and ML engineers can efficiently discover, reuse, and collaborate on features, reducing redundant work and ensuring consistency in feature engineering.

* **Feature Group Job**: A feature group job is the execution instance of a feature group. Each feature group job includes validation results and statistics results.

* **Dataset**: A dataset is a collection of features that are used together to either train a model or perform model inference.

* **Dataset Job**: Dataset job is the execution instance of a dataset. Each dataset job will include validation results and statistics results.
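
To make the terms above concrete, here is a small plain-Python sketch of how a feature, a feature group, and a feature vector relate. It does not use the feature store API; the `Feature` class, the customer features, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One measurable property, described by a name and a type."""
    name: str
    feature_type: str


# A hypothetical feature group for a "customer" entity.
customer_features = [Feature("age", "INTEGER"), Feature("lifetime_spend", "FLOAT")]

# Feature values keyed by the primary key (customer ID).
rows = {
    "2536": {"age": 41, "lifetime_spend": 1830.5},
    "9001": {"age": 29, "lifetime_spend": 240.0},
}


def feature_vector(primary_key, names=None):
    """Return all features, or a named subset, for a single primary key."""
    row = rows[primary_key]
    selected = names if names is not None else [f.name for f in customer_features]
    return {name: row[name] for name in selected}


print(feature_vector("2536"))           # every feature of customer 2536
print(feature_vector("2536", ["age"]))  # a subset is also a feature vector
```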

<a id='pre-requisites'></a>
# 2. Pre-requisites

Notebook Sessions are accessible through the following conda environment: 

* **PySpark 3.2 and Feature store 1.0 (fs_pyspark32_p38_cpu_v1)**

You can customize `fs_pyspark32_p38_cpu_v1`, publish it, and use it as a runtime environment for a Notebook session cluster. 

<a id='setup_spark-defaults'></a>
### `spark-defaults.conf`

The `spark-defaults.conf` file is used to define the properties that are used by Spark. A templated version is installed when you install a Data Science conda environment that supports PySpark. However, you must update the template so that the Data Catalog metastore can be accessed. You can do this manually. However, the `odsc data-catalog config` command line tool is ideal for setting up the file because it gathers information about your environment and uses that to build the file.

The `odsc data-catalog config` command line tool needs the `--metastore` option to define the Data Catalog metastore OCID. No other command line option is needed because settings have default values, or they take values from your notebook session environment. The following are common parameters that you may need to override.

The `--authentication` option sets the authentication mode. It supports resource principal and API keys. The preferred method for authentication is resource principal, which is set with `--authentication resource_principal`. If you want to use API keys, then use the `--authentication api_key` option. If the `--authentication` option isn't specified, API keys are used. When API keys are used, information from the OCI configuration file is used to create the `spark-defaults.conf` file.

Object Storage and Data Catalog are regional services. By default, the region is set to the region your notebook session is running in. This information is taken from the environment variable, `NB_REGION`. Use the `--region` option to override this behavior.

The default location of the `spark-defaults.conf` file is `/home/datascience/spark_conf_dir` as defined in the `SPARK_CONF_DIR` environment variable. Use the `--output` option to define the directory in which to write the file.

You need to determine what settings are appropriate for your configuration. However, the following works for most configurations and is run in a terminal window.

```bash
odsc data-catalog config --authentication resource_principal --metastore <metastore_id>
```
For more assistance, use the following command in a terminal window:

```bash
odsc data-catalog config --help
```
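
After running the command, you may want to confirm where the file was written. The following sketch resolves the expected location from `SPARK_CONF_DIR`, falling back to the default directory named above. The helper name `spark_defaults_path` is hypothetical, and the sketch only checks that the file exists, not its contents.

```python
import os
from pathlib import Path

# Default directory stated in the text above.
DEFAULT_CONF_DIR = "/home/datascience/spark_conf_dir"


def spark_defaults_path(environ=os.environ):
    """Return the expected path of spark-defaults.conf for this environment."""
    conf_dir = environ.get("SPARK_CONF_DIR", DEFAULT_CONF_DIR)
    return Path(conf_dir) / "spark-defaults.conf"


path = spark_defaults_path()
print(f"{path} exists: {path.exists()}")
```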

<a id='setup_session'></a>
### Session Setup

The notebook makes connections to the Data Catalog metastore and Object Storage. In the next cell, specify the bucket URI to act as the data warehouse. Use the `warehouse_uri` variable with the `oci://<bucket_name>@<namespace_name>/<key>` format. Update the variable `metastore_id` with the OCID of the Data Catalog metastore.
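
As an illustration of the URI format described above, the following sketch assembles a `warehouse_uri` from its parts. The helper `build_warehouse_uri` and the bucket, namespace, and key values are hypothetical; substitute the values for your tenancy.

```python
def build_warehouse_uri(bucket_name, namespace_name, key=""):
    """Build an Object Storage URI of the form oci://<bucket>@<namespace>/<key>."""
    if not bucket_name or not namespace_name:
        raise ValueError("bucket_name and namespace_name are both required")
    # Strip a leading slash so the key is not separated by a double slash.
    return f"oci://{bucket_name}@{namespace_name}/{key.lstrip('/')}"


# Hypothetical values for illustration only.
warehouse_uri = build_warehouse_uri("feature-store-bucket", "mytenancy", "warehouse")
print(warehouse_uri)
```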

<a id='policies'></a>
### 2.1. Policies
This section covers the creation of dynamic groups and policies needed to use the service.

* [About Data Science Policies](https://docs.oracle.com/iaas/data-science/using/policies.htm)
* [Data Catalog Metastore Required Policies](https://docs.oracle.com/en-us/iaas/data-catalog/using/metastore.htm)

<a id="prerequisites_authentication"></a>
### 2.2. Authentication
The [Oracle Accelerated Data Science SDK (ADS)](https://docs.oracle.com/iaas/tools/ads-sdk/latest/index.html) controls the authentication mechanism with the notebook Spark cluster.<br> 
To set up authentication, use ```ads.set_auth("resource_principal")``` or ```ads.set_auth("api_key")```. 

In [None]:
import ads
ads.set_auth(auth="api_key", client_kwargs={"service_endpoint": "http://localhost:21000/20230101"})
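
If the same notebook is run both inside a notebook session and on a local machine, the authentication mode can be chosen at run time. The sketch below is one way to do that. It assumes that a resource principal environment exposes the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable; verify this for your setup. The helper `choose_auth_mode` is hypothetical and not part of ADS.

```python
import os


def choose_auth_mode(environ=os.environ):
    """Pick "resource_principal" when available, otherwise fall back to "api_key"."""
    # Assumption: resource principal environments set this variable.
    if environ.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    return "api_key"


auth_mode = choose_auth_mode()
print(auth_mode)
# In the notebook, pass the result on: ads.set_auth(auth=auth_mode)
```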

<a id="prerequisites_variables"></a>
### 2.3. Variables
To run this notebook, you must provide some information about your tenancy configuration. To create and run a feature store, you must specify a `<compartment_id>` and a `<metastore_id>`. The [Data Catalog Hive Metastore](https://docs.oracle.com/en-us/iaas/data-catalog/using/metastore.htm) provides schema definitions for objects in structured and unstructured data assets. The metastore is the central metadata repository for tables backed by files on Object Storage, and the metastore ID is tied to the feature store construct of the feature store service.

In [None]:
import os

compartment_id = "ocid1.tenancy.oc1..aaaaaaaa462hfhplpx652b32ix62xrdijppq2c7okwcqjlgrbknhgtj2kofa"
metastore_id = "ocid1.datacatalogmetastore.oc1.iad.amaaaaaabiudgxyap7tizm4gscwz7amu7dixz7ml3mtesqzzwwg3urvvdgua"
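
Pasting the wrong identifier into one of these variables is a common mistake. The following sketch does a light structural check based on the general OCID layout, `ocid1.<resource type>.<realm>.[region].<unique id>`. It does not confirm that the resource exists. The helper `ocid_resource_type` and the shortened example OCIDs are hypothetical.

```python
def ocid_resource_type(ocid):
    """Return the resource type segment of an OCID, or raise ValueError."""
    parts = ocid.split(".")
    # Expect at least: ocid1, resource type, realm, region (may be empty), unique id.
    if len(parts) < 5 or parts[0] != "ocid1" or not parts[1] or not parts[-1]:
        raise ValueError(f"not a well-formed OCID: {ocid!r}")
    return parts[1]


# Shortened, made-up OCIDs for illustration only.
print(ocid_resource_type("ocid1.tenancy.oc1..exampleuniqueid"))
print(ocid_resource_type("ocid1.datacatalogmetastore.oc1.iad.exampleuniqueid"))
```

A check such as `ocid_resource_type(metastore_id) == "datacatalogmetastore"` would catch a compartment OCID pasted into `metastore_id` by mistake.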

<a id="featurestore_overview"></a>
# 3. Feature store quickstart using APIs
By default, the **PySpark 3.2, Feature store and Data Flow** conda environment includes pre-installed [great-expectations](https://legacy.docs.greatexpectations.io/en/latest/reference/core_concepts/validation.html) and [deequ](https://github.com/awslabs/deequ) libraries. In the ADS feature store module, you can use either the Python programmatic interface or the YAML interface to define feature store entities. The following section describes how to create feature store entities using the programmatic interface.

In [None]:
import pandas as pd 
from ads.feature_store.feature_store import FeatureStore
from ads.feature_store.dataset import Dataset
from ads.feature_store.feature_group import FeatureGroup
from ads.feature_store.feature_store_registrar import FeatureStoreRegistrar
from ads.feature_store.common.enums import ExpectationType

<a id="create_featurestore"></a>
### 3.1 Create feature store
A feature store is a top-level construct that provides logical segregation of resources.

In [5]:
feature_store_resource = (
    FeatureStore()
    .with_description("Data consisting of bike riders data")
    .with_compartment_id(compartment_id)
    .with_display_name("Bike rides")
    .with_offline_config(metastore_id=metastore_id)
)

In [6]:
feature_store = feature_store_resource.create()

<a id="create_entity"></a>
### 3.2 Create entity
An entity is a group of semantically related features. The first step a consumer of features would typically take when accessing the feature store service is to list the entities and the features associated with each entity. Another way to look at it is that an entity is an object or concept that is described by its features. Examples of entities could be customer, product, transaction, review, image, document, etc.

In [7]:
entity = feature_store.create_entity(
    display_name="Bike rides",
    description="description for bike riders"
)

<a id="create_featuregroup"></a>
### 3.3 Create feature group
A feature group is the code that contains instructions on the ingestion of raw data and computation of the feature values. This notebook uses the Citi Bike dataset under the [`Citi Bike`](https://ride.citibikenyc.com/data-sharing-policy) dataset license.

In [8]:
bike_df = pd.read_csv("~/Downloads/201901-citibike-tripdata.csv")

In [9]:
bike_df = bike_df.drop(['start station name', 'end station name'], axis=1).head(100)
bike_df.columns = bike_df.columns.str.replace(' ', '')

In [10]:
bike_df.head()

Unnamed: 0,tripduration,starttime,stoptime,startstationid,startstationlatitude,startstationlongitude,endstationid,endstationlatitude,endstationlongitude,bikeid,usertype,birthyear,gender
0,320,2019-01-01 00:01:47.4010,2019-01-01 00:07:07.5810,3160.0,40.778968,-73.973747,3283.0,40.788221,-73.970416,15839,Subscriber,1971,1
1,316,2019-01-01 00:04:43.7360,2019-01-01 00:10:00.6080,519.0,40.751873,-73.977706,518.0,40.747804,-73.973442,32723,Subscriber,1964,1
2,591,2019-01-01 00:06:03.9970,2019-01-01 00:15:55.4380,3171.0,40.785247,-73.976673,3154.0,40.773142,-73.958562,27451,Subscriber,1987,1
3,2719,2019-01-01 00:07:03.5450,2019-01-01 00:52:22.6500,504.0,40.732219,-73.981656,3709.0,40.738046,-73.99643,21579,Subscriber,1990,1
4,303,2019-01-01 00:07:35.9450,2019-01-01 00:12:39.5020,229.0,40.727434,-73.99379,503.0,40.738274,-73.98752,35379,Subscriber,1979,1


In [11]:
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

expectation_suite = ExpectationSuite(expectation_suite_name="feature_definition")
expectation_suite.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_not_be_null",
        kwargs={"column": "stoptime"}
    )
)

{"expectation_type": "expect_column_values_to_not_be_null", "meta": {}, "kwargs": {"column": "stoptime"}}
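
To show what this expectation checks, here is a plain-Python sketch of a not-null rule. It is not the Great Expectations implementation; the function `expect_values_not_null` is hypothetical, and the result keys only mirror the `element_count`, `unexpected_count`, and `unexpected_percent` fields that appear in the validation output later in this notebook.

```python
import math


def expect_values_not_null(rows, column):
    """Count missing values in `column` and report success when there are none."""
    values = [row.get(column) for row in rows]
    # Treat both None and float NaN as missing.
    unexpected = [
        v for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    ]
    element_count = len(values)
    unexpected_count = len(unexpected)
    unexpected_percent = (
        100.0 * unexpected_count / element_count if element_count else 0.0
    )
    return {
        "success": unexpected_count == 0,
        "result": {
            "element_count": element_count,
            "unexpected_count": unexpected_count,
            "unexpected_percent": unexpected_percent,
        },
    }


# Two sample rows; the second is missing its stop time.
sample_rows = [{"stoptime": "2019-01-01 00:07:07.5810"}, {"stoptime": None}]
print(expect_values_not_null(sample_rows, "stoptime"))
```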

In [12]:
feature_group_bike = (
    FeatureGroup()
    .with_feature_store_id(feature_store.id)
    .with_primary_keys(["bikeid"])
    .with_name("bike_feature_group")
    .with_entity_id(entity.id)
    .with_compartment_id(compartment_id)
    .with_schema_details_from_dataframe(bike_df)
    .with_expectation_suite(expectation_suite, ExpectationType.LENIENT)
)

In [13]:
feature_group_bike.create()

kind: FeatureGroup
spec:
  compartmentId: ocid1.tenancy.oc1..aaaaaaaa462hfhplpx652b32ix62xrdijppq2c7okwcqjlgrbknhgtj2kofa
  entityId: 1C29D0DF65E456211B7351D85F271E03
  expectationDetails:
    createRuleDetails:
    - arguments:
        column: stoptime
      levelType: ERROR
      name: Rule-0
      ruleType: expect_column_values_to_not_be_null
    expectationType: LENIENT
    name: feature_definition
    validationEngineType: GREAT_EXPECTATIONS
  featureStoreId: AB5F8E0C4BD86255C3828039D8C51853
  id: 60E6662F04168EEFE781D7ACE576F339
  inputFeatureDetails:
  - featureType: INTEGER
    name: tripduration
    orderNumber: 1
  - featureType: STRING
    name: starttime
    orderNumber: 2
  - featureType: STRING
    name: stoptime
    orderNumber: 3
  - featureType: FLOAT
    name: startstationid
    orderNumber: 4
  - featureType: FLOAT
    name: startstationlatitude
    orderNumber: 5
  - featureType: FLOAT
    name: startstationlongitude
    orderNumber: 6
  - featureType: FLOAT
    nam

In [14]:
os.environ["DEVELOPER_MODE"] = "True"

In [15]:
feature_group_bike.materialise(bike_df)

:: loading settings :: url = jar:file:/Users/kshitizlohia/IdeaProjects/oracle/feature-store/advanced-ds/venv/lib/python3.10/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /Users/kshitizlohia/.ivy2/cache
The jars for the packages stored in: /Users/kshitizlohia/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e96bd2ce-ad22-46d2-bd46-aa51029113aa;1.0
	confs: [default]
	found io.delta#delta-core_2.12;2.3.0 in central
	found io.delta#delta-storage;2.3.0 in central
	found org.antlr#antlr4-runtime;4.8 in local-m2-cache
:: resolution report :: resolve 137ms :: artifacts dl 25ms
	:: modules in use:
	io.delta#delta-core_2.12;2.3.0 from central in [default]
	io.delta#delta-storage;2.3.0 from central in [default]
	org.antlr#antlr4-runtime;4.8 from local-m2-cache in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      defa

23/05/16 18:29:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


INFO:great_expectations.validator.validator:	1 expectation(s) included in expectation_suite.

23/05/16 18:30:05 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers
23/05/16 18:30:05 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 84.47% for 8 writers
23/05/16 18:30:07 WARN MemoryManager: Total allocation exceeds 95.00% (906,992,014 bytes) of heap memory
Scaling row group sizes to 96.54% for 7 writers

23/05/16 18:30:11 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.

23/05/16 18:30:15 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `1c29d0df65e456211b7351d85f271e03`.`bike_feature_group` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

In [16]:
feature_group_bike.get_statistics().to_pandas()

Unnamed: 0,endstationlongitude,tripduration,bikeid,startstationlongitude,endstationid,usertype,starttime,startstationid,endstationlatitude,startstationlatitude,birthyear,stoptime,gender
completeness,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
approximateNumDistinctValues,83,92,94,83,93,2,104,85,89,86,36,101,3
dataType,Fractional,Integral,Integral,Fractional,Fractional,String,String,Fractional,Fractional,Fractional,Integral,String,Integral
sum,-7398.150004,76840.0,2914421.0,-7398.157728,155797.0,,,186276.0,4074.01599,4074.092498,198127.0,,118.0
min,-74.016584,97.0,14656.0,-74.012723,127.0,,,79.0,40.668603,40.668127,1949.0,,0.0
max,-73.941995,3494.0,35789.0,-73.942237,3709.0,,,3675.0,40.810792,40.804213,1999.0,,2.0
mean,-73.9815,768.4,29144.21,-73.981577,1557.97,,,1862.76,40.74016,40.740925,1981.27,,1.18
stddev,0.018151,686.187846,6319.234326,0.017465,1428.093551,,,1438.05532,0.031828,0.03259,11.713117,,0.497594
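
The rows of the statistics table can be read more easily with a small worked example. The sketch below computes similar per-column metrics in plain Python for the first five `tripduration` values shown by `bike_df.head()`. It is an illustration, not the feature store's statistics engine: the function `column_statistics` is hypothetical, the distinct count here is exact whereas the table reports an approximation (for example, 104 for `starttime` across 100 rows), and the use of the population standard deviation is an assumption.

```python
import statistics


def column_statistics(values):
    """Compute simple per-column metrics; None marks a missing value."""
    present = [v for v in values if v is not None]
    stats = {
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),  # exact, unlike the approximate count above
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        stats.update(
            sum=sum(present),
            min=min(present),
            max=max(present),
            mean=statistics.fmean(present),
            stddev=statistics.pstdev(present),  # assumption: population stddev
        )
    return stats


# First five tripduration values from bike_df.head().
tripduration = [320, 316, 591, 2719, 303]
print(column_statistics(tripduration))
```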


In [26]:
feature_group_bike.get_validation_output_df().T

Unnamed: 0,0
success,True
results,"[{'expectation_config': {'expectation_type': 'expect_column_values_to_not_be_null', 'meta': {}, 'kwargs': {'column': 'stoptime', 'batch_id': 'feca776acdd0aa61ae53da7b674430a1'}}, 'exception_info': {'raised_exception': False, 'exception_traceback': None, 'exception_message': None}, 'result': {'element_count': 100, 'unexpected_count': 0, 'unexpected_percent': 0.0, 'partial_unexpected_list': []}, 'success': True, 'meta': {}}]"
statistics.evaluated_expectations,1
statistics.successful_expectations,1
statistics.unsuccessful_expectations,0
statistics.success_percent,100.0
meta.great_expectations_version,0.16.10
meta.expectation_suite_name,bike_feature_group
meta.run_id.run_time,2023-05-16T18:29:58.670292+05:30
meta.run_id.run_name,


<a id="query_featuregroup"></a>
### 3.4 Query feature group
Feature store provides a DataFrame API to ingest data into the feature store. You can also retrieve feature data in a DataFrame, which can either be used directly to train models or materialised to files for later use in training.

In [17]:
query = feature_group_bike.select() 
query.show()

+------------+--------------------+--------------------+--------------+--------------------+---------------------+------------+------------------+-------------------+------+----------+---------+------+
|tripduration|           starttime|            stoptime|startstationid|startstationlatitude|startstationlongitude|endstationid|endstationlatitude|endstationlongitude|bikeid|  usertype|birthyear|gender|
+------------+--------------------+--------------------+--------------+--------------------+---------------------+------------+------------------+-------------------+------+----------+---------+------+
|         976|2019-01-01 00:15:...|2019-01-01 00:31:...|        3452.0|   40.71915571696044|   -73.94885390996933|       251.0|       40.72317958|       -73.99480012| 35685|Subscriber|     1994|     1|
|          97|2019-01-01 00:15:...|2019-01-01 00:17:...|        3430.0|   40.71907891179564|   -73.94223690032959|      3095.0|       40.71929301|       -73.94500379| 34307|Subscriber|     198

<a id="create_dataset"></a>
### 3.5 Create dataset
A dataset is a collection of feature snapshots that are joined together to either train a model or perform model inference.

In [18]:
query.to_string()

'SELECT fg_0.tripduration tripduration, fg_0.starttime starttime, fg_0.stoptime stoptime, fg_0.startstationid startstationid, fg_0.startstationlatitude startstationlatitude, fg_0.startstationlongitude startstationlongitude, fg_0.endstationid endstationid, fg_0.endstationlatitude endstationlatitude, fg_0.endstationlongitude endstationlongitude, fg_0.bikeid bikeid, fg_0.usertype usertype, fg_0.birthyear birthyear, fg_0.gender gender FROM `1C29D0DF65E456211B7351D85F271E03`.bike_feature_group fg_0'
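
The generated SQL follows a regular pattern: every column is projected as `<alias>.<column> <column>`, and the table is qualified with a database name that matches the `entityId` shown when the feature group was created. The sketch below reproduces that pattern for a subset of columns. The helper `build_select` is hypothetical and is not how ADS builds the query internally.

```python
def build_select(database, table, columns, alias="fg_0"):
    """Build a SELECT statement in the same shape as query.to_string()."""
    projection = ", ".join(f"{alias}.{column} {column}" for column in columns)
    return f"SELECT {projection} FROM `{database}`.{table} {alias}"


sql = build_select(
    "1C29D0DF65E456211B7351D85F271E03",  # database name taken from the output above
    "bike_feature_group",
    ["tripduration", "bikeid"],
)
print(sql)
```

A hand-written string of this shape could, in principle, be passed to `with_query` to build a dataset from a subset of features, although this notebook only demonstrates passing `query.to_string()`.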

In [19]:
dataset_resource = (
    Dataset()
    .with_description("Dataset consisting of a subset of features in feature group: bike riders")
    .with_compartment_id(compartment_id)
    .with_name("bike_riders_dataset")
    .with_entity_id(entity.id)
    .with_feature_store_id(feature_store.id)
    .with_query(query.to_string())
)

In [20]:
dataset = dataset_resource.create()

In [21]:
dataset.materialise()

23/05/16 18:31:37 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `1c29d0df65e456211b7351d85f271e03`.`bike_riders_dataset` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

In [2]:
dataset.get_statistics().to_pandas()

<a id="featurestore_yaml"></a>
# 4. Feature store quickstart using YAML
In the ADS feature store module, you can use either the Python programmatic interface or YAML to define feature store entities. The following section describes how to create feature store entities using YAML as the interface.

In [23]:
feature_store_yaml = """
apiVersion: v1
kind: featureStore
spec:
  displayName: Bike feature store
  compartmentId: "ocid1.tenancy.oc1..aaaaaaaa462hfhplpx652b32ix62xrdijppq2c7okwcqjlgrbknhgtj2kofa"
  offlineConfig:
    metastoreId: "ocid1.datacatalogmetastore.oc1.iad.amaaaaaabiudgxyap7tizm4gscwz7amu7dixz7ml3mtesqzzwwg3urvvdgua"

  entity: &bike_entity
    - kind: entity
      spec:
        name: Bike rides

  featureGroup:
    - kind: featureGroup
      spec:
        entity: *bike_entity
        name: bike_feature_group
        primaryKeys:
          - bikeid
        inputFeatureDetails:
          - name: "bikeid"
            featureType: "INTEGER"
            orderNumber: 1
            cast: "STRING"
          - name: "endstationlongitude"
            featureType: "FLOAT"
            orderNumber: 2
            cast: "STRING"
          - name: "tripduration"
            featureType: "INTEGER"
            orderNumber: 3
            cast: "STRING"

  dataset:
    - kind: dataset
      spec:
        name: bike_dataset
        entity: *bike_entity
        description: "Dataset for bike"
        query: 'SELECT bike.bikeid, bike.endstationlongitude FROM bike_feature_group bike'
"""

In [24]:
registrar = FeatureStoreRegistrar.from_yaml(yaml_string=feature_store_yaml)
registrar.create()

Successfully created 1 entities, 0 transformations, 1 feature groups and 1 datasets


(kind: featurestore
 spec:
   compartmentId: ocid1.tenancy.oc1..aaaaaaaa462hfhplpx652b32ix62xrdijppq2c7okwcqjlgrbknhgtj2kofa
   dataset:
   - kind: dataset
     spec:
       description: Dataset for bike
       entity: &id001
       - kind: entity
         spec:
           name: Bike rides
       name: bike_dataset
       query: SELECT bike.bikeid, bike.endstationlongitude FROM bike_feature_group
         bike
   displayName: Bike feature store
   entity: *id001
   featureGroup:
   - kind: featureGroup
     spec:
       entity: *id001
       inputFeatureDetails:
       - cast: STRING
         featureType: INTEGER
         name: bikeid
         orderNumber: 1
       - cast: STRING
         featureType: FLOAT
         name: endstationlongitude
         orderNumber: 2
       - cast: STRING
         featureType: INTEGER
         name: tripduration
         orderNumber: 3
       name: bike_feature_group
       primaryKeys:
       - bikeid
   id: A66AAEF30860DEDFC0635EF806CCBD9E
   offlineCo
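
Before registering a YAML definition, it can help to sanity-check the feature group spec. The sketch below works on the spec as a plain dictionary (as it would look after parsing the YAML) and checks two things that hold in the definition above: every primary key is a declared feature, and `orderNumber` values run from 1 upward without gaps. Whether the service itself enforces these rules is an assumption, and the function `check_feature_group_spec` is hypothetical.

```python
def check_feature_group_spec(spec):
    """Return a list of human-readable problems found in a feature group spec."""
    problems = []
    features = spec.get("inputFeatureDetails", [])
    names = [feature["name"] for feature in features]

    # Every primary key should refer to a declared feature.
    for key in spec.get("primaryKeys", []):
        if key not in names:
            problems.append(f"primary key {key!r} is not a declared feature")

    # Order numbers are expected to be 1, 2, 3, ... in declaration order.
    orders = [feature["orderNumber"] for feature in features]
    if orders != list(range(1, len(orders) + 1)):
        problems.append(f"orderNumber values are not sequential: {orders}")

    return problems


# The feature group spec from the YAML above, written as a dictionary.
bike_spec = {
    "name": "bike_feature_group",
    "primaryKeys": ["bikeid"],
    "inputFeatureDetails": [
        {"name": "bikeid", "featureType": "INTEGER", "orderNumber": 1},
        {"name": "endstationlongitude", "featureType": "FLOAT", "orderNumber": 2},
        {"name": "tripduration", "featureType": "INTEGER", "orderNumber": 3},
    ],
}
print(check_feature_group_spec(bike_spec))
```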

<a id='ref'></a>
# 5. References

- [ADS Library Documentation](https://accelerated-data-science.readthedocs.io/en/latest/index.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)