-
Notifications
You must be signed in to change notification settings - Fork 2
/
concl.tex
13 lines (12 loc) · 2.86 KB
/
concl.tex
1
2
3
4
5
6
7
8
9
10
11
12
%!TEX root = autocontgrlp.tex
\section{Conclusion}
In this paper, we introduced and analyzed the linearly relaxed approximate linear program (LRALP) whose constraints were obtained as positive linear combination of the original constraints of the ALP.
The main novel contribution is a theoretical result which gives a geometrically interpretable bound on the performance loss due to relaxing the constraint sets. Possibilities for future work include extending the results to other forms of approximate linear programming in MDPs (e.g., \citep{SALP}), exploring the idea of approximating dual variables and
designing algorithms that use the newly derived results to actively compute what constraints to select. \todoc{Other large scale LP}
%By providing performance analysis for the GRLP, this paper justified linear function approximation of the dual variables of the ALP.
\begin{comment}
The approximate linear program (ALP) is a widely employed method to handle MDPs with large number of states. However, an important shortcoming of the ALP is that it has large number of constraints, which is tackled in practice by sampling a tractable number of constraints from the ALP to formulate and solve a reduced linear program (RLP). The RLP has been found to work well empirically in various domains ranging from queues to Tetris games, and performance guarantees are for a specific RLP formulated under idealized assumptions. Alternatively, function approximation in the dual variables of the ALP has been another approach that has been employed in literature \cite{ALP-Bor,dolgov} to address the issue of large number of constraints. However there was no prior theoretical characterization for the function approximation of the dual variables.\par
In this paper, we introduced a novel framework based on the generalized reduced linear program formulation. The constraints of the GRLP were obtained as positive linear combinations of the original ALP. We provided an error bound that relates the optimal value function to the solution of the GRLP. Our error bound contained two terms, one inherent to the ALP formulation and the other due to constraint reduction. We also made qualitative and quantitative observations about the nature of the error term that arose due to constraint reduction.
% Our analysis also revealed the fact that it is not always necessary to sample according to the stationary distribution of the optimal policy and, in fact, potentially several different constraint sampling/approximation strategies might work.
In particular, we also theoretically justified linear function approximation of the constraints. We also discussed the results and provided a numerical example in the domain of controlled queues. To conclude, we observe that by providing a novel theoretical framework to study constraint approximation, this paper provides important results that add to the theory of ALP.
\end{comment}