Create example of using irace purely from Python with rpy2 #32

Closed
4 tasks done
MLopez-Ibanez opened this issue Jul 11, 2022 · 10 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@MLopez-Ibanez
Owner

MLopez-Ibanez commented Jul 11, 2022

  • import irace in Python
  • define target-runner as a Python function
  • load parameters and configuration as files or Python objects
  • run!

This idea is being developed at: https://github.com/auto-optimization/iracepy (help is welcome)

@MLopez-Ibanez
Owner Author

Suggested by Nicolas Potvin

@DE0CH
Copy link
Contributor

DE0CH commented Jul 12, 2022

I wonder if there is a better way than using an R-to-Python bridge. Perhaps a custom Python package that communicates with irace through pipes or sockets and wraps them with Python methods and objects would work better.

In my experience, launching irace is not too hard because it just involves calling irace with subprocess; the cumbersome part is actually writing the parameters, because file names can clash and disk resources can leak if files are not cleaned up.
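A minimal sketch of the file-handling side, assuming irace is launched through its command-line tool: each run writes its inputs into a private temporary directory that is removed afterwards, so concurrent runs cannot clash on file names and nothing is left on disk. The flag names `--scenario` and `--parameter-file` are assumptions here (check `irace --help` for your version), and `run_irace_cli` is a hypothetical helper whose `runner` argument exists only so it can be exercised without irace installed.

```python
import subprocess
import tempfile
from pathlib import Path


def run_irace_cli(parameters_table, scenario_text, runner=subprocess.run):
    """Run the irace command-line tool on inputs written to a private
    temporary directory that is deleted afterwards.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without irace installed.
    """
    # Each call gets its own directory, so concurrent runs cannot clash on
    # file names; the context manager removes it even if the runner raises.
    with tempfile.TemporaryDirectory(prefix="irace-") as tmp:
        tmp_dir = Path(tmp)
        param_file = tmp_dir / "parameters.txt"
        scenario_file = tmp_dir / "scenario.txt"
        param_file.write_text(parameters_table)
        scenario_file.write_text(scenario_text)
        # Assumed irace CLI flags; check `irace --help` for your version.
        cmd = ["irace", "--scenario", str(scenario_file),
               "--parameter-file", str(param_file)]
        return runner(cmd, cwd=tmp_dir, capture_output=True, text=True)
```

Because `cwd` points at the temporary directory, anything irace writes relative to its working directory is deleted as well; a real wrapper would copy out whatever it wants to keep before the `with` block ends.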

Implementing the bridge would benefit greatly from irace communicating with the outside world purely through command-line arguments, stdin and stdout, without using any files or spawning any processes by itself. I think this is useful for other use cases as well, so I've created a separate issue #34 to track it.
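To make the stdin/stdout idea from #34 concrete, here is a sketch of the Python side of such a protocol. The wire format is entirely hypothetical (irace offers no such mode in this thread): requests arrive as one JSON object per line, a `{"type": "done"}` line ends the session, and each reply is one JSON line carrying the cost. The function name and the field names `id`, `configuration`, `instance` and `seed` are illustrative assumptions.

```python
import json
import sys


def serve_target_runner(target_runner, in_stream=sys.stdin, out_stream=sys.stdout):
    """Answer evaluation requests in a hypothetical line-based JSON protocol.

    Each request line looks like
    {"id": 1, "configuration": {...}, "instance": ..., "seed": ...};
    a line {"type": "done"} ends the session. Each reply is one JSON line
    {"id": ..., "cost": ...}.
    """
    for line in in_stream:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        request = json.loads(line)
        if request.get("type") == "done":
            break
        cost = target_runner(request["configuration"],
                             request["instance"], request["seed"])
        out_stream.write(json.dumps({"id": request["id"], "cost": cost}) + "\n")
        out_stream.flush()  # the other side blocks until it sees this line
```

If the wrapper started irace with `subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)`, the call would be `serve_target_runner(fn, proc.stdout, proc.stdin)`, with no files involved.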

What do you think?

@MLopez-Ibanez
Owner Author

> I wonder if there is a better way than using an R-to-Python bridge. Perhaps a custom Python package that communicates with irace through pipes or sockets and wraps them with Python methods and objects would work better.

Isn't this what rpy2 does? Yes, it would be great if somebody created a custom Python package that completely wrapped irace in an easy-to-use Python interface. I can create the initial example, but any help in converting that into an actual Python package would be welcome!

@DE0CH
Contributor

DE0CH commented Jul 13, 2022

> Isn't this what rpy2 does?

I am not very familiar with rpy2, but based on some preliminary reading, it seems that you can only use rpy2 to call R functions from Python, not the other way around. Being able to call a Python function from irace is needed if the user wants to pass a Python function to irace as the target runner.

> Yes, it would be great if somebody created a custom Python package that completely wrapped irace in an easy-to-use Python interface. I can create the initial example, but any help in converting that into an actual Python package would be welcome!

I can look into that. It shouldn't be too difficult; we just need some code to wrap the objects. However, implementing things like the conditions in `parameters.txt`, which currently use R syntax, might take a bit more work. An initial example would be really helpful!
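One way a Python package could let users describe parameters without writing `parameters.txt` by hand is sketched below. It assumes a hypothetical `Param` dataclass rendered to irace's one-line-per-parameter syntax (`name "switch" type (domain) | condition`). The condition is still passed through as an R expression string, which is exactly the part expected above to need more work.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Param:
    """Hypothetical Python-side description of one irace parameter."""
    name: str
    kind: str          # irace type: "c", "i", "r" or "o", optionally with ",log"
    domain: Sequence   # (lower, upper) for i/r, allowed values for c/o
    switch: str = ""   # command-line switch passed to the target runner
    condition: Optional[str] = None  # still an R expression, e.g. 'algo == "sa"'


def to_irace_table(params):
    """Render Param objects as lines of an irace parameters file."""
    lines = []
    for p in params:
        if p.kind.split(",")[0] in ("c", "o"):
            values = ", ".join(f'"{v}"' for v in p.domain)  # quote categories
        else:
            values = ", ".join(str(v) for v in p.domain)
        line = f'{p.name} "{p.switch}" {p.kind} ({values})'
        if p.condition:
            line += f" | {p.condition}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

The resulting string could then be handed to irace's `readParameters(text = ...)`, so no file ever needs to be written.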

@MLopez-Ibanez
Owner Author

> > Isn't this what rpy2 does?
>
> I am not very familiar with rpy2, but based on some preliminary reading, it seems that you can only use rpy2 to call R functions from Python, not the other way around. Being able to call a Python function from irace is needed if the user wants to pass a Python function to irace as the target runner.

I created a small example to show you how it would work. Of course, in the actual Python package, an object-oriented Python interface would hide all these rpy2 details from the user:

```python
import numpy as np
from rpy2.robjects.packages import importr
from rpy2.robjects import r as R
from rpy2.robjects import numpy2ri
numpy2ri.activate()
from rpy2.interactive import process_revents
process_revents.start()
import rpy2.rinterface as ri
from rpy2.robjects.vectors import ListVector

# rternalize wraps a Python function so that R code (here, irace) can call it.
@ri.rternalize
def target_runner(experiment, scenario):
    print(scenario)
    print(experiment)
    # The target runner returns an R list with a `cost` element.
    return ListVector(dict(cost=0.0))

instances = np.arange(10)
irace = importr("irace")
scenario = irace.defaultScenario(ListVector(dict(
    targetRunner=target_runner,
    instances = instances,
    maxExperiments = 500,
    logFile = "")))

# Parameter definitions in irace's parameters-file syntax.
parameters_table = '''
tmax "" i,log (1, 5000)
temp "" r (0, 100)
'''
parameters = irace.readParameters(text = parameters_table)
irace.irace(scenario, parameters)
```

I need more time to answer your other comments.
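The `parameters_table` string in the example above uses irace's one-line-per-parameter format. If a Python wrapper wanted to validate such a table before handing it to `readParameters`, a small regex-based parser is enough for simple cases. This is a sketch only, with a hypothetical `parse_parameters_table` helper: it assumes domains without nested parentheses, treats every `#` as the start of a comment, and does not handle commas inside quoted categorical values or any irace syntax beyond what appears in this thread.

```python
import re

# name, quoted switch, type (optionally ",log"), parenthesised domain,
# and an optional "| condition" at the end of the line
_PARAM_LINE = re.compile(
    r'^(?P<name>\w+)\s+"(?P<switch>[^"]*)"\s+'
    r'(?P<type>[cior](?:,log)?)\s+\((?P<domain>[^)]*)\)'
    r'\s*(?:\|\s*(?P<condition>.+?))?\s*$'
)


def parse_parameters_table(text):
    """Parse simple irace parameter lines into dictionaries."""
    params = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        m = _PARAM_LINE.match(line)
        if m is None:
            raise ValueError(f"cannot parse parameter line: {raw!r}")
        params.append({
            "name": m["name"],
            "switch": m["switch"],
            "type": m["type"],
            "domain": [v.strip().strip('"') for v in m["domain"].split(",")],
            "condition": m["condition"],
        })
    return params
```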

@DE0CH
Contributor

DE0CH commented Jul 15, 2022

Thanks so much for this. I will look at it in more detail and hopefully write a Python interface. I think this would be a much better way of bringing irace to Python than rewriting the entire thing in Python.

Sure, please take your time with the other comments.

@MLopez-Ibanez
Owner Author

MLopez-Ibanez commented Jul 15, 2022

A complete example of tuning SciPy's `dual_annealing`. Of course, the idea is to hide all the rpy2 details behind the interface of a package so that the user is not even aware that they are using rpy2.

```python
import numpy as np
import pandas as pd
from collections import OrderedDict

from rpy2.robjects.packages import importr
from rpy2.robjects import r as R
from rpy2.robjects import numpy2ri
numpy2ri.activate()
# from rpy2.interactive import process_revents
# process_revents.start()
import rpy2.rinterface as ri
from rpy2.robjects.vectors import DataFrame, BoolVector, FloatVector, IntVector, StrVector, ListVector, IntArray, Matrix, ListSexpVector,FloatSexpVector,IntSexpVector,StrSexpVector,BoolSexpVector
from rpy2.robjects.functions import SignatureTranslatedFunction

from scipy.optimize import dual_annealing

# This would be written by the user.
def target_runner(experiment, scenario):
    DIM = 10
    func = lambda x: np.sum(x*x - DIM*np.cos(2*np.pi*x)) + DIM * np.size(x)
    lw = [-5.12] * DIM
    up = [5.12] * DIM
    ret = dual_annealing(func, bounds=list(zip(lw, up)), seed=experiment['seed'], maxfun = 1e4,
                         **experiment['configuration'])
    return dict(cost=ret.fun)

parameters_table = '''
initial_temp "" r,log (0.009, 5e4)
restart_temp_ratio "" r (0,1)
visit "" r (1e-5, 1)
accept "" r (-1e3, -5)
'''

## The rest will be inside the irace python package.

# TODO: How to make this faster?
def r_to_python(data):
    """
    step through an R object recursively and convert the types to python types as appropriate.
    Leaves will be converted to e.g. numpy arrays or lists as appropriate and the whole tree to a dictionary.
    """
    r_dict_types = [DataFrame, ListVector, ListSexpVector]
    r_array_types = [BoolVector, FloatVector, IntVector, Matrix, IntArray, FloatSexpVector,IntSexpVector,BoolSexpVector]
    r_list_types = [StrVector,StrSexpVector]
    if type(data) in r_dict_types:
        return OrderedDict(zip(data.names, [r_to_python(elt) for elt in data]))
    elif type(data) in r_list_types:
        if hasattr(data, "__len__") and len(data) == 1:
            return r_to_python(data[0])
        return [r_to_python(elt) for elt in data]
    elif type(data) in r_array_types:
        if hasattr(data, "__len__") and len(data) == 1:
            return data[0]
        return np.array(data)
    elif isinstance(data, SignatureTranslatedFunction) or isinstance(data, ri.SexpClosure):
        return data  # TODO: get the actual Python function
    elif data == ri.NULL:
        return None
    elif hasattr(data, "rclass"):  # An unsupported r class
        raise KeyError(f'Could not proceed, type {type(data)} is not defined! To add support for this type,'
                       ' just add it to the imports and to the appropriate type list above')
    else:
        return data  # We reached the end of recursion


def make_target_runner(py_target_runner):
    @ri.rternalize
    def tmp_r_target_runner(experiment, scenario):
        py_experiment = r_to_python(experiment)
        py_scenario = r_to_python(scenario)
        ret = py_target_runner(py_experiment, py_scenario)
        return ListVector(ret)
    return tmp_r_target_runner

# IMPORTANT: We need to save this in a variable or it will be garbage collected
# by Python and crash later.
r_target_runner = make_target_runner(target_runner)

instances = np.arange(100)

irace = importr("irace")
scenario = ListVector(dict(
    targetRunner=r_target_runner,
    instances = instances,
    maxExperiments = 500,
    debugLevel = 3,
    logFile = ""))


parameters = irace.readParameters(text = parameters_table)
irace.irace(scenario, parameters)
```
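`r_to_python` unboxes length-1 vectors into scalars and turns data frames and named lists into dictionaries, which is why the target runner can use `experiment['seed']` directly and unpack `experiment['configuration']` as keyword arguments. The rule can be illustrated without rpy2 by using plain Python lists as stand-ins for R vectors and dicts for named lists. The `unbox` helper below is hypothetical and simplified: where the real function returns NumPy arrays for longer numeric vectors, it keeps plain lists.

```python
from collections import OrderedDict


def unbox(value):
    """Mimic r_to_python's rules on plain-Python stand-ins: lists play the
    role of R vectors and dicts the role of named lists/data frames."""
    if isinstance(value, dict):
        # Named lists and data frames become (ordered) dictionaries.
        return OrderedDict((key, unbox(item)) for key, item in value.items())
    if isinstance(value, list):
        items = [unbox(item) for item in value]
        # A length-1 vector is returned as the bare element.
        return items[0] if len(items) == 1 else items
    return value
```

The trade-off is that a vector that merely happens to have one element also becomes a scalar, so code receiving it cannot always tell a scalar from a one-element vector.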

@MLopez-Ibanez
Owner Author

An object-oriented interface like the one below would hide all rpy2 details from the user:

```python
## TODO: This will go inside an irace python package
import numpy as np
from collections import OrderedDict
from rpy2.robjects.packages import importr
from rpy2.robjects import r as R
from rpy2.robjects import numpy2ri
numpy2ri.activate()
# from rpy2.interactive import process_revents
# process_revents.start()
import rpy2.rinterface as ri
from rpy2.robjects.vectors import DataFrame, BoolVector, FloatVector, IntVector, StrVector, ListVector, IntArray, Matrix, ListSexpVector,FloatSexpVector,IntSexpVector,StrSexpVector,BoolSexpVector
from rpy2.robjects.functions import SignatureTranslatedFunction

# TODO: How to make this faster?
def r_to_python(data):
    """
    step through an R object recursively and convert the types to python types as appropriate.
    Leaves will be converted to e.g. numpy arrays or lists as appropriate and the whole tree to a dictionary.
    """
    r_dict_types = [DataFrame, ListVector, ListSexpVector]
    r_array_types = [BoolVector, FloatVector, IntVector, Matrix, IntArray, FloatSexpVector,IntSexpVector,BoolSexpVector]
    r_list_types = [StrVector,StrSexpVector]
    if type(data) in r_dict_types:
        return OrderedDict(zip(data.names, [r_to_python(elt) for elt in data]))
    elif type(data) in r_list_types:
        if hasattr(data, "__len__") and len(data) == 1:
            return r_to_python(data[0])
        return [r_to_python(elt) for elt in data]
    elif type(data) in r_array_types:
        if hasattr(data, "__len__") and len(data) == 1:
            return data[0]
        return np.array(data)
    elif isinstance(data, SignatureTranslatedFunction) or isinstance(data, ri.SexpClosure):
        return data  # TODO: get the actual Python function
    elif data == ri.NULL:
        return None
    elif hasattr(data, "rclass"):  # An unsupported r class
        raise KeyError(f'Could not proceed, type {type(data)} is not defined! To add support for this type,'
                       ' just add it to the imports and to the appropriate type list above')
    else:
        return data  # We reached the end of recursion

def make_target_runner(py_target_runner):
    @ri.rternalize
    def tmp_r_target_runner(experiment, scenario):
        py_experiment = r_to_python(experiment)
        py_scenario = r_to_python(scenario)
        ret = py_target_runner(py_experiment, py_scenario)
        # TODO: return also error codes and call for debugging.
        return ListVector(ret)
    return tmp_r_target_runner

class Irace:
    irace = importr("irace")

    def __init__(self, scenario, parameters_table, target_runner):
        self.parameters = self.irace.readParameters(text = parameters_table)
        # IMPORTANT: We need to save this in a variable or it will be garbage collected
        # by Python and crash later.
        self.r_target_runner = make_target_runner(target_runner)
        self.scenario = scenario

    def run(self):
        self.scenario['targetRunner'] = self.r_target_runner
        return self.irace.irace(ListVector(self.scenario), self.parameters)


## END of package content
import numpy as np
from scipy.optimize import dual_annealing

# This would be written by the user.
def target_runner(experiment, scenario):
    DIM = 10
    func = lambda x: np.sum(x*x - DIM*np.cos(2*np.pi*x)) + DIM * np.size(x)
    lw = [-5.12] * DIM
    up = [5.12] * DIM
    ret = dual_annealing(func, bounds=list(zip(lw, up)), seed=experiment['seed'], maxfun = 1e4,
                         **experiment['configuration'])
    return dict(cost=ret.fun)

parameters_table = '''
initial_temp "" r,log (0.009, 5e4)
restart_temp_ratio "" r (0,1)
visit "" r (1e-5, 1)
accept "" r (-1e3, -5)
'''

instances = np.arange(100)

scenario = dict(
    instances = instances,
    maxExperiments = 500,
    debugLevel = 3,
    logFile = "")

irace = Irace(scenario, parameters_table, target_runner)
irace.run()
```
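The comment in `Irace.__init__` says the rternalized target runner must be kept in a variable or Python will garbage-collect it. The general lifetime rule can be shown in plain Python, using `weakref` as a stand-in for a reference that Python does not count. This illustrates the idea only; it is not how rpy2 tracks its objects, and `Callback`, `Owner` and the two helper functions are hypothetical.

```python
import weakref


class Callback:
    """Stand-in for the wrapper object returned by ri.rternalize."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *args):
        return self.fn(*args)


class Owner:
    """Stand-in for the Irace class: it keeps a strong reference on self."""

    def __init__(self, fn):
        self.callback = Callback(fn)


def handle_without_owner():
    # Only a weak reference survives, so the Callback can be reclaimed at once.
    return weakref.ref(Callback(lambda x: 2 * x))


def handle_with_owner():
    # The Owner instance keeps the Callback alive for as long as it exists.
    owner = Owner(lambda x: 2 * x)
    return owner, weakref.ref(owner.callback)
```

Storing the result of `make_target_runner(...)` in `self.r_target_runner` plays the role of `Owner.callback` here: the wrapper lives exactly as long as the `Irace` object.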

@MLopez-Ibanez added the help wanted and good first issue labels on Jul 15, 2022
@MLopez-Ibanez
Owner Author

Development of this idea should move to https://github.com/auto-optimization/iracepy

@MLopez-Ibanez pinned this issue on Jul 15, 2022
@MLopez-Ibanez
Owner Author

Everything requested here is implemented in https://github.com/auto-optimization/iracepy or is tracked in other issues.
