# Lab: Modeling with Dask and Ray

To keep things simple while still giving you a chance to try something hands-on, we'll look at two exercises:

* Linear modeling with Dask on a different dataset
* A Ray RLlib example using a more powerful algorithm (PPO) than the one we used earlier

## Dask and Powerplant Output

We'll use the Combined Cycle Power Plant Data Set from the UC Irvine ML repository (https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant).

This dataset consists of about 10,000 records of measurements from a combined cycle power plant. Each record has the following fields:

* Temperature (AT), in the range 1.81-37.11 °C
* Ambient Pressure (AP), in the range 992.89-1033.30 millibar
* Relative Humidity (RH), in the range 25.56%-100.16%
* Exhaust Vacuum (V), in the range 25.36-81.56 cm Hg
* Net hourly electrical energy output (PE), in the range 420.26-495.76 MW

We want to model the power output as a function of the other parameters.
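
Concretely, a linear model predicts PE as an intercept plus a weighted sum of the four input fields. The sketch below shows only that prediction step in plain Python. The function name `predict_pe`, the coefficients, and the sample record are hypothetical placeholders chosen for easy arithmetic, not fitted values or rows from the dataset.

```python
# Input fields, named as in the list above.
FEATURES = ("AT", "AP", "RH", "V")


def predict_pe(row, coefficients, intercept):
    """Return intercept + sum(coefficient * value) over the input fields."""
    return intercept + sum(coefficients[name] * row[name] for name in FEATURES)


# Hypothetical coefficients and record, for illustration only.
example_coefficients = {"AT": -2.0, "AP": 0.0, "RH": -0.125, "V": -0.25}
example_row = {"AT": 20.0, "AP": 1010.0, "RH": 72.0, "V": 50.0}

prediction = predict_pe(example_row, example_coefficients, intercept=500.0)
print(prediction)
```

Fitting the model means choosing the intercept and coefficients that make predictions like this one as close as possible to the recorded PE values.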

In [None]:
import dask.dataframe as ddf

df = ddf.read_csv('data/powerplant.csv', sample=False)
df

In [None]:
df.head()

In [None]:
y = df.PE
X = df.drop(columns=['PE'])

X

In [None]:
X = X.to_dask_array(lengths=True)
y = y.to_dask_array(lengths=True)

X

In [None]:
from dask_ml.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

X_train
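
With `test_size=0.1`, roughly 10% of the records are held out for evaluation and the rest are used for fitting. The following is a minimal single-machine sketch of that idea in plain Python. The helper `simple_train_test_split` is hypothetical and is not how `dask_ml` implements the split on distributed arrays.

```python
import random


def simple_train_test_split(rows, test_size=0.1, seed=0):
    """Shuffle the rows, then hold out a test_size fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# 100 stand-in record ids instead of real data.
train_rows, test_rows = simple_train_test_split(range(100), test_size=0.1)
print(len(train_rows), len(test_rows))
```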

In [None]:
from dask_ml.linear_model import LinearRegression

lr = LinearRegression(solver='lbfgs', max_iter=10)
lr_model = lr.fit(X_train, y_train)

In [None]:
y_predicted = lr_model.predict(X_test)

In [None]:
from dask_ml.metrics import mean_squared_error
from math import sqrt

sqrt(mean_squared_error(y_test, y_predicted))
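
The value computed above is the root mean squared error (RMSE), which is expressed in the same units as PE (MW), so it can be read against the 420.26-495.76 MW range of the target. As a sketch of what the metric does, here is the same calculation in plain Python on made-up numbers; the helper `rmse` and the toy values are illustrative assumptions.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the mean of the squared differences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values in MW, for illustration only.
toy_actual = [450.0, 460.0, 470.0]
toy_predicted = [453.0, 456.0, 470.0]

toy_rmse = rmse(toy_actual, toy_predicted)
print(toy_rmse)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.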

## Ray RLlib and PPO

PPO, or Proximal Policy Optimization, is a more powerful (and more complicated) algorithm than the DQN we've looked at.

But thanks to Ray's implementations, you can swap it in easily.

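Note that we import `ppo` from `ray.rllib.agents`. (This is the module layout of older Ray releases; in Ray 2.x the algorithms moved to `ray.rllib.algorithms`, so the code here assumes a Ray 1.x installation.)

By replacing "DQN" with "PPO" you can often get better results quickly.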
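
Part of what makes PPO more complicated is its clipped surrogate objective, which limits how far a single update can move the policy. The sketch below evaluates that objective for one sample in plain Python. It is a simplified illustration, not RLlib's implementation, and the function name and the `clip_eps` value of 0.2 are assumptions made for the example.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample.

    ratio is new_policy_prob / old_policy_prob for the action taken;
    advantage estimates how much better that action was than average.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Ratio inside the clip range: the objective is simply ratio * advantage.
inside = clipped_surrogate(ratio=1.1, advantage=2.0)
# Ratio above the range with a positive advantage: the gain is capped.
capped = clipped_surrogate(ratio=1.5, advantage=2.0)
# Ratio above the range with a negative advantage: the full penalty is kept.
penalized = clipped_surrogate(ratio=1.5, advantage=-2.0)

print(inside, capped, penalized)
```

Taking the minimum of the clipped and unclipped terms means the policy gains nothing from pushing the ratio outside the clip range, but is still penalized in full when a large change makes things worse.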

>
> Interested in PPO details? Check out this writeup: https://jonathan-hui.medium.com/rl-proximal-policy-optimization-ppo-explained-77f014ec3f12
>

In [None]:
import ray
import ray.rllib.agents.ppo as ppo

ray.shutdown()
ray.init()

In [None]:
# Specifies the OpenAI Gym environment for CartPole, V1.
SELECT_ENV = "CartPole-v1"

# Number of training iterations.
N_ITER = 10

# default configuration.
config = ppo.DEFAULT_CONFIG.copy()

# Suppress too many messages.
config["log_level"] = "WARN"

# Use > 1 for more CPU cores, e.g., over a cluster.
config['num_workers'] = 2

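# Network architecture: two fully connected hidden layers of 40 and 20 units.
config['model']['fcnet_hiddens'] = [40, 20]

# Don't pin a CPU core to each worker (allows more workers).
config['num_cpus_per_worker'] = 0

# Directory where training checkpoints are saved.
checkpoint_dir = 'checkpoints'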

In [None]:
trainer = ppo.PPOTrainer(config, SELECT_ENV)

In [None]:
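# Columns: iteration, then min, mean, and max episode reward.
fmt = '{:3d},{:8.4f},{:8.4f},{:8.4f}'
last_checkpoint = ''
for n in range(N_ITER):
    result = trainer.train()
    # Use descriptive names so the built-in min() and max() are not shadowed.
    reward_min = result['episode_reward_min']
    reward_mean = result['episode_reward_mean']
    reward_max = result['episode_reward_max']
    # Save a checkpoint after every iteration and remember the latest path.
    last_checkpoint = trainer.save(checkpoint_dir)
    print(fmt.format(n, reward_min, reward_mean, reward_max))
print(f'last checkpoint file: {last_checkpoint}')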

In [None]:
ray.shutdown()