### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# A timeout keeps the cell from hanging when there is no route to the internet.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>
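<details>
<summary><font size="2">Reading Environment Variables Defensively</font></summary>

The direct `os.environ[...]` lookups above raise `KeyError` when a variable is not set, for example when the notebook runs outside a Data Science session. The following is a minimal sketch of a more tolerant lookup. `NOTEBOOK_ENV_VARS`, `read_notebook_env`, and the `"<not set>"` placeholder are hypothetical names introduced for illustration; they are not part of ADS or the notebook service.

```python
import os

# Variable names copied from the tip above.
NOTEBOOK_ENV_VARS = [
    "NB_SESSION_COMPARTMENT_OCID",
    "PROJECT_OCID",
    "USER_OCID",
    "TENANCY_OCID",
    "NB_REGION",
]


def read_notebook_env(environ=None, missing="<not set>"):
    """Return the session variables as a dict, using a placeholder for unset ones."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, missing) for name in NOTEBOOK_ENV_VARS}


# Print every variable, including the ones that are not set.
for name, value in read_notebook_env().items():
    print(f"{name}={value}")
```
</details>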

In [2]:
import ads
ads.set_auth("resource_principal")

In [3]:
%load_ext dataflow.magics

In [3]:
import json
command = {
    "compartmentId": "ocid1.compartment.oc1..aaaaaaaaiqj3trilyjqom6ozr5xb4fc32bqxmogjnwpomdxn7qbv6xtrnyza",
    "displayName": "Data Flow Session with Pools",
    "sparkVersion": "3.2.1",
    "driverShape": "VM.Standard2.1",
    "executorShape": "VM.Standard2.1",
    "numExecutors": 1,
    "poolId": "ocid1.dataflowpool.oc1.iad.anuwcljtnif7xwiag5ycfkh3a5gihaburwq2py42gqc2vbpiepapjccfrfaa",
    "type": "SESSION",
    "logsBucketUri": "oci://log-bucket@bigdatadatasciencelarge/"
}
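# Serialize the dict to JSON and wrap it in single quotes before passing it to the magic via $command.
command = "'" + json.dumps(command) + "'"
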
%create_session -l python -c $command

Setting up the Cluster..



Cluster is ready..
Starting Spark application..


Session ID,Kind,State,Current session
ocid1.dataflowapplication.oc1.iad.anuwcljsnif7xwiagaorfrbk232jipugvg43rc4nsz7fcvzzp3ivpwdjp42a,pyspark,IN_PROGRESS,Dataflow Run



SparkSession available as 'spark'.
SparkContext available as 'sc'.
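
The session configuration above hardcodes the compartment OCID. As a sketch, the same argument could be assembled from the `NB_SESSION_COMPARTMENT_OCID` variable that the notebook session exposes, assuming the Data Flow session should run in the notebook's own compartment. `build_session_command` is a hypothetical helper, not part of `dataflow.magics`; pool and log-bucket settings are left to the caller as overrides.

```python
import json
import os


def build_session_command(overrides=None, environ=None):
    """Build the single-quoted JSON string used as the -c argument (hypothetical helper)."""
    env = os.environ if environ is None else environ
    command = {
        # Assumption: reuse the notebook session's compartment for the Data Flow session.
        "compartmentId": env.get("NB_SESSION_COMPARTMENT_OCID", ""),
        "displayName": "Data Flow Session with Pools",
        "sparkVersion": "3.2.1",
        "driverShape": "VM.Standard2.1",
        "executorShape": "VM.Standard2.1",
        "numExecutors": 1,
        "type": "SESSION",
    }
    # Caller-supplied keys (for example poolId or logsBucketUri) win over the defaults.
    command.update(overrides or {})
    # Same quoting as the original cell: JSON wrapped in single quotes.
    return "'" + json.dumps(command) + "'"


command = build_session_command({"numExecutors": 2})
print(command)
```

The resulting string can be passed exactly as before with `%create_session -l python -c $command`.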


In [5]:
%%spark
# No header option is passed, so the file's header row is loaded as a data row.
read_df = spark.read.csv("oci://test-data-bucket@bigdatadatasciencelarge/addresses.csv")
read_df.show()
read_df.write.csv("oci://test-data-bucket@bigdatadatasciencelarge/addresses4.out.csv")


+-----------+--------------+
|        _c0|           _c1|
+-----------+--------------+
|       Name|       Address|
| Jeff Lewis|789 Avenue USA|
|Frank Roger| 012 Court USA|
+-----------+--------------+

In [None]:
%stop_session