# Find Good Hyperparameters For A News Recommendation System With Tune

The goal of this example is to train a very simple news recommendation system. We will:
- Prepare the training data in parallel with Ray,
- Train a simple model that classifies article titles as "popular" or "less popular" using scikit-learn, and
- Find good hyperparameter settings for the model with Tune, Ray's parallel hyperparameter optimization library.

### Downloading And Preparing The Training Data

The data consists of Hacker News submissions. Each record includes the submission's title and its score, which roughly corresponds to the number of upvotes. The records are split across four JSON batch files, named `ls-1.json` through `ls-4.json`.
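
As a minimal sketch of the labeling step, the snippet below turns one batch into `(title, label)` pairs. The record layout (a JSON list of objects with `title` and `score` keys), the function name `label_batch`, and the threshold of 10 are assumptions made for illustration. The actual files and the script submitted in this example may differ.

```python
import json

# Hypothetical cutoff: submissions scoring above this count as "popular".
POPULAR_SCORE_THRESHOLD = 10


def label_batch(raw_json, threshold=POPULAR_SCORE_THRESHOLD):
    """Parse one batch and return (title, label) pairs.

    Assumes the batch is a JSON list of objects with "title" and "score"
    keys; records missing either field are skipped.
    """
    pairs = []
    for record in json.loads(raw_json):
        title = record.get("title")
        score = record.get("score")
        if title is None or score is None:
            continue
        label = "popular" if score > threshold else "less popular"
        pairs.append((title, label))
    return pairs


# Made-up sample records, not taken from the real data set.
sample = json.dumps([
    {"title": "Show HN: A tiny Ray example", "score": 42},
    {"title": "Ask HN: Favorite editor?", "score": 3},
    {"title": "Untitled draft"},  # no score, so it is skipped
])
print(label_batch(sample))
```

Because each batch is processed independently, a function like this could be decorated with `@ray.remote` and called once per file, so that the four batches are parsed in parallel.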

In [14]:
# Environment variables
%env TUNE_DISABLE_STRICT_METRIC_CHECKING=1
%env RAY_AIR_REENABLE_DEPRECATED_SYNC_TO_HEAD_NODE=1

In [15]:
# Imports
import ray
from ray.job_submission import JobSubmissionClient
import time

In [16]:
# Ray cluster information for connection
ray_head_ip = "kuberay-head-svc.kuberay.svc.cluster.local"
ray_head_port = 8265
ray_address = f"http://{ray_head_ip}:{ray_head_port}"
client = JobSubmissionClient(ray_address)

In [18]:
# Submit Ray job using JobSubmissionClient
job_id = client.submit_job(
    entrypoint="python ray-hyperparameter-example.py",
    runtime_env={
        "working_dir": "./",
    },
    entrypoint_num_cpus=3
)

print(f"Ray job submitted with job_id: {job_id}")

# Wait for Ray to finish the job, then print its logs
while True:
    status = client.get_job_status(job_id)
    if status in [ray.job_submission.JobStatus.RUNNING, ray.job_submission.JobStatus.PENDING]:
        time.sleep(5)
    else:
        break
print(client.get_job_logs(job_id)) 

2024-03-13 19:44:08,978	INFO dashboard_sdk.py:385 -- Package gcs://_ray_pkg_2f8485ce4dfaa259.zip already exists, skipping upload.


Ray job submitted with job_id: raysubmit_mfVsF384YBiJb5W8
Took 0.009297609329223633 seconds to parse the hackernews submissions
Accuracy on the training set is 1.0
Accuracy on the test set is 1.0
2024-03-13 12:44:12,582	INFO worker.py:1405 -- Using address kuberay-head-svc.kuberay.svc.cluster.local:6379 set in the environment variable RAY_ADDRESS
2024-03-13 12:44:12,582	INFO worker.py:1540 -- Connecting to existing Ray cluster at address: kuberay-head-svc.kuberay.svc.cluster.local:6379...
2024-03-13 12:44:12,702	INFO worker.py:1715 -- Connected to Ray cluster. View the dashboard at http://10.224.172.120:8265
2024-03-13 12:44:13,462	INFO tune.py:592 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
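
The polling loop above waits indefinitely if the job never leaves the `PENDING` or `RUNNING` state. A minimal sketch of a bounded variant follows. `wait_until`, its parameters, and the fake status sequence are hypothetical names used only for illustration, and the helper is written against plain callables so that it does not depend on any particular Ray API.

```python
import time


def wait_until(get_status, is_done, poll_interval=5.0, timeout=600.0):
    """Poll get_status() until is_done(status) is true or the timeout expires.

    Returns the last status seen; raises TimeoutError if the deadline passes
    before is_done reports completion.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if is_done(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a real job: reports PENDING, then RUNNING, then SUCCEEDED.
fake_statuses = iter(["PENDING", "RUNNING", "SUCCEEDED"])
final = wait_until(
    get_status=lambda: next(fake_statuses),
    is_done=lambda s: s not in ("PENDING", "RUNNING"),
    poll_interval=0.01,
    timeout=5.0,
)
print(final)
```

With the client from this notebook, `get_status` could be `lambda: client.get_job_status(job_id)`, and `is_done` could reuse the same `PENDING`/`RUNNING` check as the loop above.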

In [12]:
# Disconnect from the Ray cluster
ray.shutdown()