Automatically exported from code.google.com/p/gpt-rewards
Distribution of RTDP-BEL for reward-based POMDPs, based on the
transformation to SSP described in:

  Solving POMDPs: RTDP-Bel vs. Point-based Algorithms.
  Blai Bonet and Hector Geffner.
  Proc. of 21st Int. Joint Conf. on Artificial Intelligence (IJCAI).
  Pasadena, California. 2009. AAAI Press. Pages 1641-1646.
  Available at http://ldc.usb.ve/~bonet/reports/IJCAI09-rtdpbel.pdf

RTDP-BEL is implemented inside an extract of the GPT software. GPT is a
shell-based system for solving POMDPs. Here, we only provide an example
session on its usage. For more information, contact Blai Bonet
<email@example.com>.

REQUIREMENTS
------------

You will need a proper GNU readline library. The one that comes with
Mac OS X is not suitable. On my Mac, I use "brew install readline" to
obtain a good library.

INSTALLATION
------------

Unpack and run 'make all'. By default, gpt is installed in the bin
directory. (In fact, bin/gpt is a symbolic link to src/rtdp-bel/gpt.)

EXAMPLE
-------

Suppose that you are at the top-level directory and the problem
RockSample_7_8.POMDP is in the same directory.

  $ bin/gpt
  Welcome to GPT, Version 2.00-cassandra.
  gpt> set pims [8,2000,500]
  gpt> solve RockSample_7_8.POMDP -
  [some output]
  gpt> quit

The set command is used to set the value of different parameters. Among
the most important are 'qlevels', which specifies the discretization
levels used for quantization (a higher qlevels means a finer
discretization, default 15), and 'cutoff', which specifies the maximum
length of a trial (default 250).

The pims parameter tells gpt how it must solve the POMDP and evaluate
the calculated policy. In the example, pims [8,2000,500] tells gpt to
perform 8 x 2000 = 16000 trials of RTDP-BEL and, after every 2000
trials, to evaluate the current policy with 500 trials. Such a
specification permits the generation of a quality profile that shows
how the policy improves as more RTDP-BEL trials are performed.
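
The arithmetic behind the pims setting can be sketched in a few lines of Python. This is only an illustration and not part of GPT: the function name pims_schedule is hypothetical, and reading the triple as [number of rounds, RTDP-BEL trials per round, evaluation trials per round] is an assumption inferred from the example above, where [8,2000,500] gives 16000 trials in total.

```python
def pims_schedule(pims):
    """Expand a pims triple into (trials_so_far, evaluation_trials) checkpoints.

    Assumed reading of the triple (inferred from the README example):
    [rounds, RTDP-BEL trials per round, evaluation trials per round].
    """
    rounds, trials_per_round, eval_trials = pims
    return [((i + 1) * trials_per_round, eval_trials) for i in range(rounds)]


# The setting used in the example session: set pims [8,2000,500]
schedule = pims_schedule([8, 2000, 500])
for trained, evaluated in schedule:
    print(f"after {trained:5d} RTDP-BEL trials: evaluate with {evaluated} trials")
```

Each checkpoint corresponds to one point of the quality profile: the policy obtained after that many RTDP-BEL trials, scored over the given number of evaluation trials.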
The solve command receives two parameters: the path to a POMDP file (in
Cassandra's format) and the name of a file to which the output is
redirected. If the latter is '-', the output is written to the standard
output.

For any questions or comments, please email me
<firstname.lastname@example.org>.

Blai Bonet