# Test cases requiring or benefiting from the context of a notebook

If the notebook runs successfully from start to finish, the test is successful!

TODO(all): Add additional tests and/or tests with particular assertions, as we encounter Python package version incompatibilities not currently detected by these tests.

In general, only add test cases here that require the context of a notebook. This is because this notebook, as currently written, will abort at the **first** failure. Compare this to a proper test suite where all cases are run, giving much more information about the full extent of any problems encountered.
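A minimal sketch of what such a suite does differently: every check runs, and failures are collected rather than raised. `run_all` and the `check_*` functions are hypothetical names, for illustration only.

```python
from typing import Callable, Dict, List


def run_all(checks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every check and return a description of each one that raised."""
    failures = []
    for name, check in checks.items():
        try:
            check()
        except Exception as exc:  # record the failure and keep going
            failures.append(f"{name}: {exc!r}")
    return failures


# Hypothetical checks, standing in for the notebook's test cells
def check_ok() -> None:
    assert 1 + 1 == 2


def check_broken() -> None:
    assert "pendulum" in [], "pendulum not installed"


def check_also_broken() -> None:
    raise ImportError("No module named 'thefuzz'")


if __name__ == "__main__":
    for failure in run_all({"ok": check_ok, "broken": check_broken, "also_broken": check_also_broken}):
        print(failure)
```

Both failing checks are reported, whereas a notebook would stop at `check_broken`.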

# Package versions

In [1]:
!pip3 freeze

appnope==0.1.3
asttokens==2.2.1
backcall==0.2.0
comm==0.1.3
debugpy==1.6.7
decorator==5.1.1
executing==1.2.0
ipykernel==6.23.3
ipython==8.14.0
jedi==0.18.2
jupyter_client==8.3.0
jupyter_core==5.3.1
matplotlib-inline==0.1.6
nest-asyncio==1.5.6
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.8.0
prompt-toolkit==3.0.38
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
Pygments==2.15.1
python-dateutil==2.8.2
pyzmq==25.1.0
six==1.16.0
stack-data==0.6.2
tornado==6.3.2
traitlets==5.9.0
wcwidth==0.2.6


# Test cases requiring the context of a notebook 

## Test package installations

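NOTE: installing packages via `%pip` installs them into the environment of the running kernel, so newly installed packages can be imported without a kernel restart. (Upgrading a package that has already been imported still requires a restart.)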
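The `%pip` magic runs pip with the kernel's own interpreter (`sys.executable -m pip`), whereas `!pip3` runs whichever `pip3` is first on the shell's `PATH`, which is not guaranteed to belong to the same environment. A minimal sketch of an in-kernel alternative to `!pip3 show`, using the standard library's `importlib.metadata`; the helper names `is_installed` and `pip_command` are hypothetical.

```python
import sys
from importlib import metadata
from typing import List


def is_installed(distribution: str) -> bool:
    """Return True if `distribution` is installed in this kernel's environment."""
    try:
        metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return False
    return True


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation bound to the running interpreter, as %pip does."""
    return [sys.executable, "-m", "pip", *args]


print(is_installed("pendulum"))
print(pip_command("install", "pendulum==2.1.2"))
```

Passing the list from `pip_command` to `subprocess.run(..., check=True)` installs into the same environment the kernel imports from.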

In [2]:
import sys

In [None]:
sys.path

In [3]:
!env | grep PIP

### Install a package we do not anticipate already being installed on the base image

In [4]:
output = !pip3 show pendulum
print(output)  # Should show not yet installed.



In [5]:
assert(0 == output.count('Name: pendulum'))

In [6]:
%pip install pendulum==2.1.2

Collecting pendulum==2.1.2
  Downloading pendulum-2.1.2.tar.gz (81 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.2/81.2 kB 876.6 kB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pytzdata>=2020.1
  Downloading pytzdata-2020.1-py2.py3-none-any.whl (489 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 490.0/490.0 kB 2.5 MB/s eta 0:00:00
Building wheels for collected packages: pendulum
  Building wheel for pendulum (pyproject.toml) ... done
  Created wheel for pendulum: filename=pendulum-2.1.2-cp310-cp310-macosx_13_0_arm64.whl size=127246 sha256=5e3f3ca2a718ddf8294125cc1aebab5f8a4ec4868a7f12672ee8bd6f14a8ced7
  Stored in directory: /Users/mcnatt/Library/Caches/pip/wheels/64/1e/bd/79a9fc49d45de83b4f5461dd3

In [7]:
output = !pip3 show pendulum
print(output)  # Should show that it is now installed!

['Name: pendulum', 'Version: 2.1.2', 'Summary: Python datetimes made easy', 'Home-page: https://pendulum.eustace.io', 'Author: Sébastien Eustace', 'Author-email: sebastien@eustace.io', 'License: MIT', 'Location: /Users/mcnatt/.pyenv/versions/3.10.9/lib/python3.10/site-packages', 'Requires: python-dateutil, pytzdata', 'Required-by: ']


In [8]:
assert(1 == output.count('Name: pendulum'))

### Install a package **from source** that we do not anticipate already being installed on the base image

In [None]:
# Before the from-source install (the modern equivalent of `python setup.py install`), confirm thefuzz is absent
output = !pip3 show thefuzz
print(output)  # Should show not yet installed.

In [None]:
assert(0 == output.count('Name: thefuzz'))

In [None]:
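# --no-binary makes pip build thefuzz from its source distribution instead of using a wheel
%pip install --no-binary thefuzz thefuzz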

In [None]:
output = !pip3 show thefuzz
print(output)  # Should show that it is now installed!

In [None]:
assert(1 == output.count('Name: thefuzz'))

## Test ipython widgets

In [None]:
import ipywidgets as widgets

widgets.IntSlider()

## Test Python packages that come with the base Google image

In [None]:
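import markdown
markdown.markdown('*Markdown* renders')  # confirm the markdown package works

import readline
readline.parse_and_bind('tab: complete')

# Test scipy
import matplotlib.pyplot as plt
try:
    from scipy.datasets import face  # SciPy >= 1.10 (downloads data via pooch)
except ImportError:
    from scipy.misc import face  # older SciPy; scipy.misc.face was removed in 1.12

plt.imshow(face())
plt.show()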

## Test BigQuery magic

* As of release [google-cloud-bigquery 1.26.0 (2020-07-20)](https://github.com/googleapis/python-bigquery/blob/master/CHANGELOG.md#1260-2020-07-20) the BigQuery Python client uses the BigQuery Storage client by default.
* This currently causes the following error on Terra Cloud Runtimes: `the user does not have 'bigquery.readsessions.create' permission for '<Terra billing project id>'`.
* To work around this, we do two things:
  1. Remove the `google-cloud-bigquery-storage` dependency from the `terra-jupyter-python` image.
  1. Use the `--use_rest_api` flag with `%%bigquery`.
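Outside of the magic, the same workaround can be expressed with the BigQuery Python client by asking `to_dataframe` not to create a BigQuery Storage client. This is a minimal sketch, assuming a `google-cloud-bigquery` release whose `QueryJob.to_dataframe` accepts `create_bqstorage_client`, plus pandas and default application credentials; `fetch_country_codes` is a hypothetical helper name.

```python
QUERY = """
SELECT country_name, alpha_2_code
FROM `bigquery-public-data.utility_us.country_code_iso`
WHERE alpha_2_code LIKE 'A%'
LIMIT 5
"""


def fetch_country_codes(project=None):
    """Run QUERY and download results over the REST API only (hypothetical helper)."""
    # Imported lazily so the cell can be defined even where the client is absent.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    # create_bqstorage_client=False keeps the download on the REST API, so the
    # bigquery.readsessions.create permission is not needed.
    return client.query(QUERY).to_dataframe(create_bqstorage_client=False)
```

Calling `fetch_country_codes()` should return a pandas DataFrame with at most five rows.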

In [None]:
%load_ext google.cloud.bigquery

In [None]:
%%bigquery --use_rest_api

SELECT country_name, alpha_2_code
FROM `bigquery-public-data.utility_us.country_code_iso`
WHERE alpha_2_code LIKE 'A%'
LIMIT 5

## Test pandas profiling

In [None]:
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=['a', 'b', 'c', 'd', 'e']
)

profile = ProfileReport(df, title='Pandas Profiling Report')
profile

# Test cases benefiting from the context of a notebook

Strictly speaking, these could be moved into the Python test cases, if desired.

## Test matplotlib

In [None]:
from __future__ import print_function, division
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
x = np.random.randn(10000)  # example data, random normal distribution
num_bins = 50
n, bins, patches = plt.hist(x, num_bins, facecolor="green", alpha=0.5)
plt.xlabel(r"Description of $x$ coordinate (units)")
plt.ylabel(r"Description of $y$ coordinate (units)")
plt.title(r"Histogram title here (remove for papers)")
plt.show();

## Test plotnine

In [None]:
from plotnine import ggplot, geom_point, aes, stat_smooth, facet_wrap
from plotnine.data import mtcars

(ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
 + geom_point()
 + stat_smooth(method='lm')
 + facet_wrap('~gear'))

## Test ggplot

In [None]:
from ggplot import *
ggplot

## Test source control tool availability

In [None]:
%%bash

which git
which ssh-agent
which ssh-add

## Test gcloud tools

In [None]:
%%bash

gcloud version 

In [None]:
%%bash

gcloud auth activate-service-account --key-file "$GOOGLE_APPLICATION_CREDENTIALS"

In [None]:
%%bash

gsutil ls gs://gcp-public-data--gnomad

In [None]:
%%bash

bq --project_id bigquery-public-data ls gnomAD

## Test Google Libraries

In [None]:
from google.cloud import datastore
datastore_client = datastore.Client()

In [None]:
from google.api_core import operations_v1

In [None]:
from google.cloud import storage

In [None]:
%%bash

# Test downloading a composite object; gsutil uses the Python crcmod module to verify its CRC32C checksum
gsutil cp gs://terra-docker-image-documentation/test-composite.cram . 

In [None]:
from google.cloud import bigquery

## Test TensorFlow
### See https://www.tensorflow.org/tutorials/quickstart/beginner

The oneAPI Deep Neural Network Library (oneDNN) optimizations are available in the official x86-64 TensorFlow builds starting with v2.5. Users can enable these CPU optimizations by setting the environment variable `TF_ENABLE_ONEDNN_OPTS=1`; starting with TensorFlow 2.9 they are enabled by default on Linux x86 builds.

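We enable the oneDNN verbose log via the `DNNL_VERBOSE` environment variable to confirm that oneDNN optimizations are in use, and set `CUDA_VISIBLE_DEVICES` to `-1` so the workload runs on the CPU. These variables should be set before TensorFlow is imported, since they are read when TensorFlow and CUDA initialize.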
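If cells are run out of order, TensorFlow may already be imported by the time these variables are set. A minimal sketch of a guard that warns in that case; `set_env_before_import` is a hypothetical helper, and the warning assumes a kernel restart is the reliable way to re-read the variables.

```python
import os
import sys
import warnings


def set_env_before_import(module: str, **variables: str) -> None:
    """Set environment variables, warning if `module` has already been imported."""
    if module in sys.modules:
        warnings.warn(
            f"{module} is already imported; restart the kernel for these "
            "environment variables to take effect reliably."
        )
    os.environ.update(variables)


set_env_before_import(
    "tensorflow",
    TF_ENABLE_ONEDNN_OPTS="1",
    DNNL_VERBOSE="1",
    CUDA_VISIBLE_DEVICES="-1",
)
```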

In [None]:
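import os
# Set these before TensorFlow is imported
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'  # enable oneDNN CPU optimizations
os.environ['DNNL_VERBOSE'] = '1'           # log each oneDNN primitive to stdout
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide GPUs so the workload runs on CPU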

Set up TensorFlow

In [None]:
import tensorflow as tf
import keras

print("TensorFlow version:", tf.__version__)
print("Keras version:", keras.__version__)
print("TensorFlow executing_eagerly:", tf.executing_eagerly())

Load a dataset

In [None]:
# Load a dataset
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

Build a machine learning model

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

In [None]:
predictions = model(x_train[:1]).numpy()
predictions

Define a loss function for training 

In [None]:
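# Convert the logits to probabilities (for inspection only)
print(tf.nn.softmax(predictions).numpy())
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# The untrained model is close to random, so this should be near -ln(1/10) ≈ 2.3
print(loss_fn(y_train[:1], predictions).numpy())
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])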
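With `from_logits=True`, `SparseCategoricalCrossentropy` applies the softmax itself, so the per-example loss is the negative log of the softmax probability of the true class, $-\log\left(e^{z_y} / \sum_j e^{z_j}\right)$. A pure-Python sketch of that formula (with the log-sum-exp trick for numerical stability; `sparse_ce_from_logits` is a hypothetical name) shows why an untrained model's loss should be close to $\ln 10 \approx 2.3$: near-uniform logits give each of the 10 classes a probability of roughly 1/10.

```python
import math
from typing import Sequence


def sparse_ce_from_logits(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for one example, given raw logits and an integer class label."""
    # log-sum-exp with the max subtracted, so large logits do not overflow exp()
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax(z)[label] == log_sum_exp - z[label]
    return log_sum_exp - logits[label]


uniform = [0.0] * 10
print(sparse_ce_from_logits(uniform, label=3))  # equals ln(10)
```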

Train and evaluate your model

In [None]:
model.fit(x_train, y_train, epochs=5)

In [None]:
model.evaluate(x_test,  y_test, verbose=2)

In [None]:
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

probability_model(x_test[:5])

### Validate usage of oneDNN optimization 
> Start Jupyter with the command below so that standard output and standard error are redirected to `/tmp/stdout.txt` and `/tmp/stderr.txt`; oneDNN writes its verbose log to standard output.
```
jupyter notebook --ip=0.0.0.0 > /tmp/stdout.txt 2> /tmp/stderr.txt
```
First, check whether oneDNN verbose (`dnnl_verbose`) log lines were written while the TensorFlow test in the previous section ran.

```
!cat /tmp/stdout.txt | grep dnnl
```

Second, analyze which oneDNN primitives the workload used with the `profile_utils.py` script from the oneAPI samples repository.

```
!wget https://raw.githubusercontent.com/oneapi-src/oneAPI-samples/master/Libraries/oneDNN/tutorials/profiling/profile_utils.py
```

```
import warnings
warnings.filterwarnings('ignore')
```

Finally, the output should show that the `inner_product` oneDNN primitive was used for the workload.

```
%run profile_utils.py /tmp/stdout.txt
```
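If `profile_utils.py` is unavailable, a rough count of primitive kinds can be pulled straight from the log. This sketch assumes the comma-separated verbose layout in which execution lines look like `dnnl_verbose,exec,cpu,inner_product,...` (marker, event, engine, primitive kind, and so on); newer oneDNN releases may use an `onednn_verbose` prefix and a different field layout, in which case the parser needs adjusting. `count_primitives` is a hypothetical helper, and the sample lines are synthetic, not real log output.

```python
from collections import Counter
from typing import Iterable


def count_primitives(lines: Iterable[str]) -> Counter:
    """Count executed oneDNN primitive kinds in verbose-log lines (assumed layout)."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(",")
        # Assumed layout: dnnl_verbose,exec,<engine>,<primitive kind>,...
        if len(fields) > 3 and fields[0] == "dnnl_verbose" and fields[1] == "exec":
            counts[fields[3]] += 1
    return counts


# In the notebook, pass the redirected log instead, e.g.:
# with open("/tmp/stdout.txt") as log:
#     print(count_primitives(log).most_common())

# Synthetic lines in the assumed format, for illustration only
sample = [
    "dnnl_verbose,info,synthetic header line",
    "dnnl_verbose,exec,cpu,inner_product,impl,forward_training,...,shape_a,0.05",
    "dnnl_verbose,exec,cpu,inner_product,impl,forward_training,...,shape_b,0.01",
    "dnnl_verbose,exec,cpu,reorder,impl,undef,...,shape_c,0.002",
]
print(count_primitives(sample).most_common())
```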

### Validate Intel® Extension for Scikit-Learn Optimization

Let's test that [Intel® Extension for Scikit-Learn](https://www.intel.com/content/www/us/en/developer/articles/guide/intel-extension-for-scikit-learn-getting-started.html) is installed properly by running the following cell. If the patch is applied, a message should print saying that Intel® Extension for Scikit-Learn has been enabled.

In [None]:
from sklearnex import patch_sklearn, unpatch_sklearn
patch_sklearn()
from sklearn import datasets, svm, metrics, preprocessing
from sklearn.model_selection import train_test_split
# Should print a message that the extension is enabled

Now run some regular scikit-learn code with the optimizations enabled to confirm that everything works with Intel® Extension for Scikit-Learn.

In [None]:
digits = datasets.load_digits()
X,Y = digits.data, digits.target

# Split dataset into 80% train images and 20% test images
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)
# Normalize each feature by its maximum absolute value, using scale factors
# learned from the training split only so both splits share the same scale
scaler = preprocessing.MaxAbsScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
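Max-abs scaling divides each feature by the largest absolute value that feature takes. The scale factors should come from the training split alone and then be reused for the test split, so both splits are on the same scale and no test statistics leak into preprocessing; this is what scikit-learn's `preprocessing.MaxAbsScaler` does when fit on the training data. A pure-Python sketch of the idea, with hypothetical helper names:

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fit_max_abs(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-column maximum absolute value; all-zero columns get a scale of 1."""
    scales = [max(abs(v) for v in col) for col in zip(*rows)]
    return [s if s != 0 else 1.0 for s in scales]


def transform(rows: Sequence[Sequence[float]], scales: Sequence[float]) -> Matrix:
    """Divide each value by its column's training-set scale."""
    return [[v / s for v, s in zip(row, scales)] for row in rows]


train = [[0.0, 4.0, -8.0], [0.0, 2.0, 16.0]]
test = [[0.0, 8.0, 4.0]]
scales = fit_max_abs(train)        # [1.0, 4.0, 16.0]
print(transform(train, scales))    # [[0.0, 1.0, -0.5], [0.0, 0.5, 1.0]]
print(transform(test, scales))     # [[0.0, 2.0, 0.25]]
```

Test values may fall outside [-1, 1], which is expected when the test split contains larger values than the training split.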

In [None]:
# Create a classifier: a support vector classifier
model = svm.SVC(gamma=0.001, C=100)
# Learn the digits on the train subset
model.fit(X_train, Y_train)
# Now predicting the digit for test images using the trained model

In [None]:
Y_pred = model.predict(X_test)

result = model.score(X_test, Y_test)

print(f"Model accuracy on test data: {result}")

Then turn off the Intel® Extension for Scikit-Learn optimizations with `unpatch_sklearn()`.

In [None]:
unpatch_sklearn() # then unpatch optimizations

### Validate Intel XGBoost Optimizations

Starting with version 0.81 of [XGBoost](https://xgboost.readthedocs.io/en/latest/index.html), Intel has been upstreaming training optimizations directly into the project to improve performance on Intel® CPUs.

Starting with version 1.3, Intel has also been upstreaming inference optimizations.

As a result, this well-known machine-learning package for gradient-boosted decision trees includes drop-in acceleration for Intel® architectures that speeds up model training and inference.

We will use the following cell to validate the XGBoost version to determine if these optimizations are enabled.

In [None]:
import xgboost as xgb

# Compare (major, minor) as a tuple so that, e.g., 2.0 counts as newer than 1.3
major_version, minor_version = (int(part) for part in xgb.__version__.split(".")[:2])
version = (major_version, minor_version)

print("XGBoost version installed: ", xgb.__version__)
if version >= (0, 81):
    print("Intel optimizations for XGBoost training enabled in hist method!")
    if version >= (1, 3):
        print("Intel optimizations for XGBoost inference enabled!")
else:
    print("Intel XGBoost optimizations are disabled! Please install or update XGBoost version to 0.81+ to enable Intel's optimizations that have been upstreamed to the main XGBoost project.")