### Christopher Miller 
### EECS495: Optimization Techniques for Machine Learning and Deep Learning
### Final Project
#### Video: https://youtu.be/wucY6iTZm2w 

#### Course Impact on Research

Presently, I’m trying to develop a method for autonomously switching into or out of different levels of autonomy based on changes in the human’s control signals (e.g. joystick controls), measures of safety (e.g. distance to nearby objects), or the difference between the human-generated and autonomy-generated control signals (e.g. dot product, Fréchet distance, etc.). A level of autonomy defines the extent to which a robotic agent will assist or intervene in a human-robot team. At a lower level, a robot may only help its human partner avoid self-harm, such as by braking near a cliff’s ledge. At higher levels of autonomy, a robotic partner may provide advanced assistance ranging from obstacle avoidance to high-level command arbitration (i.e. autonomously traveling to set waypoints). It is not yet known which individual measure, or which combination of the aforementioned measures, best indicates the need for level shifting. Nor is it known how best to interpret these measures. Signal analysis is further complicated by the lack of invariants: the measures’ statistics are unique to each person, each level of autonomy, and each measure. Luckily, the measures can be easily normalized; that is, $m(t) \in [0,1]$.
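
To illustrate how two of the disagreement measures named above could be mapped into $[0,1]$, here is a minimal sketch assuming 2-D joystick-style commands and sampled trajectories. The cosine of the angle between the human and autonomy commands is rescaled so that 0 means agreement and 1 means opposite directions, and the discrete Fréchet distance between two trajectories is computed with the standard dynamic-programming recurrence and then clipped by an assumed maximum. The function names and the normalization constant are hypothetical.

```python
import math

def cosine_disagreement(u_h, u_a, eps=1e-9):
    """Map the angle between human command u_h and autonomy command u_a to [0, 1].

    0 means the commands point the same way, 0.5 orthogonal, 1 opposite.
    """
    dot = sum(h * a for h, a in zip(u_h, u_a))
    norm = math.hypot(*u_h) * math.hypot(*u_a)
    if norm < eps:
        # A zero command has no direction; treating it as agreement is an assumption.
        return 0.0
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against round-off
    return (1.0 - cos) / 2.0

def discrete_frechet(p, q):
    """Discrete Fréchet distance between two sampled trajectories p and q."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance of p[:i+1], q[:j+1]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]

def normalize(value, max_value):
    """Scale a non-negative measure by an assumed maximum and clip to [0, 1]."""
    return min(max(value / max_value, 0.0), 1.0)
```

For example, `normalize(discrete_frechet(human_path, autonomy_path), max_value)` yields a value in $[0,1]$, where `max_value` would have to be chosen per task (it is not specified in the text).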

My first attempt at level switching was inspired by standard normalization (z-scoring). The hypothesis is that all of the measures share a common trait: when a person is successfully controlling their intelligent agent (i.e. there is no need to shift up or down), the measures derived from their control signals should be consistently Gaussian. In other words, it should be possible to find the mean and standard deviation of a person’s behavior for a given measure and level of autonomy. Given this information, detecting a trend becomes straightforward. We first center each input by subtracting the mode- and level-specific mean $\mu$, then divide by the mode- and level-specific standard deviation $\sigma$, which gives the number of standard deviations the point lies from the mean: $z(t) = (m(t) - \mu)/\sigma$. If the majority of data points in a first-in, first-out (FIFO) buffer exceed $N\sigma$ (i.e. $|z| > N$), a level shift occurs.
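
A minimal sketch of this standardization-based switch follows, assuming that $\mu$ and $\sigma$ were estimated beforehand from data in which no shift was needed. Each new sample is converted to a z-score and a flag $|z| > N$ is pushed into a fixed-length FIFO buffer; once the buffer is full, a shift is signaled when a strict majority of flags are set. The class name, buffer length, and default $N$ are illustrative assumptions, and the sketch only detects that a shift is warranted, not whether to shift up or down.

```python
from collections import deque

class ZScoreLevelSwitch:
    """Flags a level shift when most recent samples lie more than N sigma from the mean.

    mu and sigma are the person-, measure-, and level-specific statistics,
    assumed to have been estimated offline from 'no-shift' data.
    """

    def __init__(self, mu, sigma, n_sigma=2.0, buffer_len=20):
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.mu = mu
        self.sigma = sigma
        self.n_sigma = n_sigma
        # A deque with maxlen drops the oldest entry automatically (FIFO).
        self.buffer = deque(maxlen=buffer_len)

    def update(self, m):
        """Add a normalized measure sample m(t); return True if a level shift should occur."""
        z = (m - self.mu) / self.sigma
        self.buffer.append(abs(z) > self.n_sigma)
        # Only decide once the buffer is full, so a few early samples cannot trigger a shift.
        if len(self.buffer) < self.buffer.maxlen:
            return False
        return sum(self.buffer) > len(self.buffer) / 2
```

Requiring a full buffer and a strict majority trades reaction time for robustness to isolated outliers; a shorter buffer reacts faster but shifts more often on noise.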

Although this was a minor takeaway overall, its implications were immediate: it led to a soon-to-be-executed research study that will help this graduate student keep his advisor happy.

#### Literature Review

People who are affected by changing health or environmental factors are not consistently reliable when operating complex machines. While existing research seeks to measure a human’s trust in their robotic partner$^{1-3}$ or to discover novel means of human-robot control sharing$^4$, little work quantifies the extent to which a robot should act on its human’s commands. Without a measure of robotic trust, the robot cannot temper its actions to prevent undesired or dangerous outcomes caused by a lack of human cogency or ability. In the human-robot team, the human and their intelligent agent share control of the robotic platform(s); in many situations, the ultimate goal of the shared control paradigm is not to automate away the need for the human, but rather to maximize the human’s control authority while also maximizing performance and safety.$^{2,3}$ For example, in assistive and rehabilitative robotics, it has been shown that users desire as much control authority as possible while receiving assistance only when necessary.$^2$ To maximize user control authority, the trustworthiness of the human’s signals must be quantified.

First, this literature review presents the methods currently used to define robotic trust (namely, those flavored with optimization methods). Second, it presents methods from optimization and machine learning that may define future measures and applications of robotic trust. Note that this literature review does not cover methods of estimating a human’s trust in their robotic partner. Further, parts of this review grew into a discussion of potential applications of past work.

Robotic trust is a largely unexplored topic within human-robot control sharing. According to [5], robotic trust is a measure of human cogency: the more reliable a human’s control signal, the higher the measure of robotic trust. In [6], robotic trust is measured as the difference between a user’s reference control trajectories and optimal control solutions. That paper solves two optimization problems: directly, it finds the optimal control solution, and indirectly, it minimizes a “danger” cost function. In other words, the paper seeks to maximize user safety. Others have used model-based nonlinear control to minimize a controller’s error function using robot-specific trust models (i.e. models of the robot’s performance while under operator control).$^7$ In [7], trust was not used to make decisions about the quality of the human’s inputs, but rather to provide feedback to the human that keeps the controller’s cost function within acceptable bounds. This work was later extended from optimizing the control of one robot to optimizing the control of many robots given a single control signal for the swarm.$^8$ Unfortunately, few other papers explicitly describe robotic trust as a measure of the human’s reliability; those that do are incremental improvements by the authors of [5-8] on their own work.
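
To make the idea of trust as deviation from an optimal solution concrete, here is a hypothetical sketch; it is not the formulation used in [6] or [7], whose details are not reproduced here. The per-step error between the user’s command and an optimal controller’s command is mapped to an instantaneous score in $(0,1]$ with a Gaussian kernel, and trust is an exponentially smoothed running average of that score. The kernel width `scale`, the smoothing factor `alpha`, and the initial trust value are assumptions.

```python
import math

def instantaneous_trust(u_user, u_opt, scale=1.0):
    """Score in (0, 1]: 1 when the user's command equals the optimal command."""
    err_sq = sum((a - b) ** 2 for a, b in zip(u_user, u_opt))
    return math.exp(-err_sq / (2.0 * scale ** 2))

def smoothed_trust(user_cmds, opt_cmds, scale=1.0, alpha=0.2, initial=1.0):
    """Exponentially smoothed trust over a sequence of paired (user, optimal) commands."""
    trust = initial
    history = []
    for u_user, u_opt in zip(user_cmds, opt_cmds):
        score = instantaneous_trust(u_user, u_opt, scale)
        trust = (1.0 - alpha) * trust + alpha * score  # recent steps weigh more
        history.append(trust)
    return history
```

The smoothing keeps a single bad command from collapsing trust, while a sustained deviation drives it down geometrically.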

Given [5-8], it seems fair to conclude that robotic trust is presently rooted deeply in optimal and model-based control theory. Other definitions of robotic trust may measure the human’s ability to understand their robot’s physical limitations (e.g. control stability or safety policy violations)$^{9-11}$ or use task-agnostic performance measures$^{12}$, such as mean completion time, to characterize a human’s reliability. Another robot-agnostic measure of performance could be built from Learning from Demonstration (LfD) techniques.

In LfD, an “expert” teacher defines a robot’s control policy by demonstrating a task (e.g. painting the hull of a ship).$^{13}$ In [14], LfD is used to provide users with force feedback that assists them in completing a task. Instead of providing force feedback, and given a large enough set of expert demonstrations, divergence from the expert policy could be used as a measure of robotic trust. This assumes that the confound of human condition (e.g. deteriorating health) is already built into the LfD model.
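
A hypothetical sketch of divergence from an expert policy as a trust signal: the expert demonstrations are treated as a set of (state, action) pairs, the expert’s action at the user’s current state is approximated by averaging the actions of the k nearest demonstrated states, and the distance between the user’s action and that estimate is mapped to $(0,1]$. The k-nearest-neighbor estimate is a simple nonparametric stand-in for a learned LfD policy, not the method of [13] or [14]; `k` and `scale` are assumptions.

```python
import math

def knn_expert_action(state, demos, k=3):
    """Average the actions of the k demonstrated states closest to `state`.

    demos: list of (state, action) tuples, each a tuple of floats.
    """
    nearest = sorted(demos, key=lambda sa: math.dist(sa[0], state))[:k]
    dim = len(nearest[0][1])
    return tuple(sum(a[i] for _, a in nearest) / len(nearest) for i in range(dim))

def lfd_trust(state, user_action, demos, k=3, scale=1.0):
    """Map divergence from the estimated expert action to a trust value in (0, 1]."""
    expert = knn_expert_action(state, demos, k)
    divergence = math.dist(user_action, expert)
    return math.exp(-divergence / scale)
```

As the paragraph notes, this only measures trust meaningfully if the demonstrations already reflect the user’s condition; otherwise low trust may simply reflect a different but acceptable way of doing the task.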

Many possible definitions of robotic trust may be proposed from machine learning and optimization methods, and many will be investigated in the coming months. It must also be noted that an effective measure of trust may be a combination of these measures. A combination may aggressively select the best-performing trust metric, temporally average trust metrics, or weight trust metrics in a blending scheme whose weights are learned to maximize overall trust for an individual or task. We know of no literature that informs how to combine measures of robotic trust, so this remains an area of open research.
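
The three combination schemes can be sketched as follows, assuming each trust metric already produces a value in $[0,1]$. Aggressive selection keeps the metric with the highest performance score, where the scores are assumed to come from elsewhere (e.g. offline validation); temporal averaging uses a sliding window; and blending uses a weighted mean whose weights are simply given here. Learning those weights for an individual or task is exactly the open question raised above, so it is not attempted.

```python
from collections import deque

def select_best(metrics, performance):
    """Aggressive selection: keep only the metric with the highest performance score.

    `performance` is assumed to be computed elsewhere (e.g. offline validation).
    """
    best = max(range(len(metrics)), key=lambda i: performance[i])
    return metrics[best]

class TemporalAverage:
    """Sliding-window average of the cross-metric mean trust."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def update(self, metrics):
        self.history.append(sum(metrics) / len(metrics))
        return sum(self.history) / len(self.history)

def weighted_blend(metrics, weights):
    """Weighted mean of trust metrics; weights here are fixed, not learned."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * m for w, m in zip(weights, metrics)) / total
```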

Measurements of robotic trust are useless without proper applications. Initial methods of realizing robotic trust should build upon methods classically used in shared control; these serve as a basis for methods derived from machine learning and optimization. In linear autonomy blending,$^{15}$ trust may be used to directly allocate the autonomy blending parameter combining human and robot control. In discrete autonomy allocation, the system measures the performance of the human partner through pertinent measures in real-time, and can dynamically adjust robotic trust to decide when to shift between discrete autonomy levels$^4$ (e.g. obstacle avoidance, full autonomy, etc.). However, these methods are rather limited; they tinker with a single variable as opposed to optimizing over an entire model of the person. 
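
A minimal sketch of both classical applications, assuming trust $\alpha \in [0,1]$ and 2-D velocity commands. Linear blending computes $u = \alpha u_h + (1-\alpha) u_r$, so higher trust gives the human more authority; discrete allocation maps trust to one of a few autonomy levels through thresholds. The level names echo the examples earlier in this report, but the threshold values are hypothetical.

```python
def linear_blend(u_human, u_robot, trust):
    """u = trust * u_human + (1 - trust) * u_robot, with trust clipped to [0, 1]."""
    a = min(max(trust, 0.0), 1.0)
    return tuple(a * h + (1.0 - a) * r for h, r in zip(u_human, u_robot))

# Hypothetical autonomy levels, ordered from least to most robot assistance,
# each paired with the minimum trust required to remain at that level.
LEVELS = [
    (0.8, "safety braking only"),
    (0.5, "obstacle avoidance"),
    (0.0, "full autonomy"),
]

def allocate_level(trust):
    """Return the least-assistive level whose trust threshold is met."""
    for threshold, name in LEVELS:
        if trust >= threshold:
            return name
    return LEVELS[-1][1]  # trust below every threshold: most assistance
```

Both functions act on a single scalar, which illustrates the limitation noted above: neither uses any richer model of the person.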
 
Inspired by LfD and reinforcement learning methods, probabilistic blending uses trust to modulate control authority between the human and robotic partners; here, trust is modeled by a probability distribution that is continuously updated to incorporate the most recent information$^{16}$ (e.g. the robot state, environment state, and/or task information). However, more possible applications of robotic trust can be derived from optimization techniques than from learning methods. 
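
As a simple stand-in for a continuously updated trust distribution (this is not the deep reinforcement learning approach of [16]), one could maintain a Beta distribution over the probability that the human’s next command is acceptable. Each command judged acceptable or unacceptable, by some external check that is assumed here, increments one of two pseudo-counts; the posterior mean can serve as a blending weight and the variance as the robot’s uncertainty. The uniform prior and the optional forgetting factor, which discounts old evidence so that recent information dominates, are assumptions.

```python
class BetaTrust:
    """Beta(a, b) belief over the probability that a human command is acceptable."""

    def __init__(self, a=1.0, b=1.0, forgetting=1.0):
        self.a, self.b = a, b
        # forgetting < 1 shrinks past counts before each update, favoring recent behavior.
        self.forgetting = forgetting

    def update(self, acceptable):
        self.a *= self.forgetting
        self.b *= self.forgetting
        if acceptable:
            self.a += 1.0
        else:
            self.b += 1.0

    def mean(self):
        return self.a / (self.a + self.b)

    def variance(self):
        s = self.a + self.b
        return self.a * self.b / (s * s * (s + 1.0))
```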

In [17], the optimal level of assistance is derived from the human themselves: a person’s requests for more or less assistance are used to define a unique cost function. Here, trust measures could be weighted more heavily than control inputs, and the optimized cost function could be defined from a record of past requests for assistance. Similarly, in [18], control inputs are passed or blocked based on the principle of Maxwell’s Demon. If the input signal, when compared to a controller’s signal, violates an inner-product or angular condition (e.g. a small inner product or a large angle), the signal is blocked. In [18], blocking is done via a form of haptic feedback so as to guide the user without removing complete control authority. This guidance is designed to help users maintain acceptable error values (optimized over their error function). Here, trust could define the acceptable thresholds and the magnitude of feedback provided to users.
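
The pass/block rule can be sketched as a simple gate, with the caveat that [18] realizes blocking through haptic feedback, whereas this sketch simply zeroes a blocked command. Following the suggestion that trust could set the thresholds, the maximum admissible angle between the human and controller commands is scaled linearly with trust (a hypothetical schedule), and commands whose inner product with the controller’s command falls below `min_dot` are always blocked. All parameter names and defaults are assumptions.

```python
import math

def demon_gate(u_human, u_ctrl, trust, min_dot=0.0,
               min_angle_deg=20.0, max_angle_deg=80.0):
    """Pass u_human unchanged if it agrees with the controller's command u_ctrl,
    otherwise block it by returning a zero command.

    The admissible angle grows linearly with trust in [0, 1] (hypothetical schedule).
    """
    dot = sum(h * c for h, c in zip(u_human, u_ctrl))
    norm = math.hypot(*u_human) * math.hypot(*u_ctrl)
    if norm == 0.0:
        return tuple(u_human)  # no direction to compare; pass through (assumption)
    t = min(max(trust, 0.0), 1.0)
    allowed = math.radians(min_angle_deg + t * (max_angle_deg - min_angle_deg))
    angle = math.acos(max(-1.0, min(1.0, dot / norm)))
    if dot < min_dot or angle > allowed:
        return tuple(0.0 for _ in u_human)
    return tuple(u_human)
```

With high trust, a 45° disagreement passes; with low trust, the same command is blocked.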

While we seek task- and robot-agnostic measures of trust, the applications of trust may be task-dependent. Recent literature indicates the need to break some tasks into subtask primitives.$^{3,19-20}$ Intent inference models$^{21}$ can be used to predict the human’s task, break it into subtasks, and dynamically shift between appropriate measures of robotic trust. This research aim also seeks to explore and define both task-agnostic and task-dependent applications of trust. By breaking large problems into smaller ones, we can optimize trust for the human’s capabilities and maximize the human’s control of their robotic system.
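
A simplified, hypothetical sketch of how intent inference could drive the choice of trust measure: a posterior over candidate goals is updated after each joystick command using a likelihood that favors commands pointing toward a goal, and the most probable goal selects which trust measure to monitor. This direction-based likelihood is a common simplification and is not the hindsight-optimization method of [21]. The rationality parameter `beta`, the goal names and positions, and the goal-to-measure table are illustrative assumptions.

```python
import math

def update_goal_posterior(posterior, position, command, goals, beta=3.0):
    """One Bayesian update: p(g | u) is proportional to p(g) * exp(beta * cos(angle(u, goal - position)))."""
    cmd_norm = math.hypot(*command)
    if cmd_norm == 0.0:
        return dict(posterior)  # an idle command carries no directional evidence
    weights = {}
    for name, goal in goals.items():
        to_goal = (goal[0] - position[0], goal[1] - position[1])
        d = math.hypot(*to_goal)
        cos = 0.0 if d == 0.0 else (command[0] * to_goal[0] + command[1] * to_goal[1]) / (cmd_norm * d)
        weights[name] = posterior[name] * math.exp(beta * cos)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical mapping from the inferred subtask (goal) to the trust measure to monitor.
MEASURE_FOR_GOAL = {"doorway": "distance_to_obstacles", "table": "cosine_disagreement"}

def choose_measure(posterior):
    """Pick the trust measure associated with the most probable goal."""
    most_likely = max(posterior, key=posterior.get)
    return MEASURE_FOR_GOAL[most_likely]
```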

Finally, an interesting discovery from this literature search: it can be extremely difficult to optimize cost functions for systems where the human is in the loop. It has been shown that a person’s internal cost function (or state, if using MDPs, depending on how one models a person) may optimize over a different set of parameters, ones that seem nonsensical to robot designers.$^{17}$ For example, a person may have difficulty using a spoon as the result of a stroke. Yet even though every computational method indicates that they need assistance, they may want complete independence while using a spoon. This fact must be considered and further studied when developing robotic trust and assistive robots in general.

#### Works Cited
1.	M. Young, et al., An Analysis of Degraded Communication Channels in Human-Robot Teaming and Implications for Dynamic Autonomy Allocation. In: Field and Service Robotics. Springer Proceedings in Advanced Robotics, vol 5. Springer. 2018.
2.	O. Horn, "Smart wheelchairs: Past and current trends," 2012 1st International Conference on Systems and Computer Science (ICSCS), Lille, 2012, pp. 1-6.
3.	S. Musić, et al., Control sharing in human-robot team interaction, Annual Reviews in Control, Volume 44, 2017, Pages 342-354, ISSN 1367-5788.
4.	M. Chiou, et al., "Experimental analysis of a variable autonomy framework for controlling a remotely operating mobile robot," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016.
5.	B. D. Argall, et al. Computable trust in human instruction. In Artificial Intelligence for Human-Robot Interaction - Papers from the AAAI Fall Symposium, Technical Report. 2014.
6.	A. Broad, et al., "Trust Adaptation Leads to Lower Control Effort in Shared Control of Crane Automation," in IEEE Robotics and Automation Letters, vol. 2, no.1, pp. 239-246, Jan. 2017.
7.	H. Saeidi et al., "Trust-based mixed-initiative teleoperation of mobile robots," 2016 American Control Conference (ACC), 2016.
8.	H. Saeidi et al., A Trust-Based Mixed-Initiative Teleoperation Scheme for the Shared Control of Mobile Robotic Systems. 2016. (doi: 10.13140/RG.2.2.10840.90888.)
9.	G. F. Franklin, et al., Feedback control of dynamic systems, 7th ed. Boston: Pearson, 2015.
10.	X. Xu, et al., Robustness of Control Barrier Functions for Safety Critical Control, IFAC-PapersOnLine, Volume 48, Issue 27, 2015, Pages 54-61, ISSN 2405-8963.
11.	A. Erdogan, et al., The effect of robotic wheelchair control paradigm and interface on user performance, effort and preference…, in Robotics and Autonomous Systems, 2017.
12.	A. Steinfeld, et al. Common metrics for human-robot interaction. In Proceedings of the 1st ACM SIGCHI/SIGART conference on Human-robot interaction (HRI '06). 2006. 
13.	B. D. Argall, et al., A survey of robot learning from demonstration, in Robotics and Autonomous Systems, Volume 57, Issue 5, 2009, Pages 469-483.
14.	X. Yang, et al., "A framework for efficient teleoperation via online adaptation," in IEEE International Conference on Robotics and Automation (ICRA), 2017.
15.	A. Erdogan, et al., The effect of robotic wheelchair control paradigm and interface on user performance, effort and preference…, in Robotics and Autonomous Systems, 2017.
16.	S. Reddy, et al. Shared Autonomy via Deep Reinforcement Learning. In Proceedings of Robotics: Science and Systems (RSS ’18), 2018.
17.	D. Gopinath, et al., "Human-in-the-Loop Optimization of Shared Autonomy in Assistive Robotics," in IEEE Robotics and Automation Letters, vol. 2, no. 1, pp. 247-254, Jan. 2017.
18.	K. Fitzsimons, et al., "Optimal human-in-the-loop interfaces based on Maxwell's Demon," 2016 American Control Conference (ACC), Boston, MA, 2016, pp. 4397-4402.
19.	X. Yang, et al., "A framework for efficient teleoperation via online adaptation," in IEEE International Conference on Robotics and Automation (ICRA), 2017.
20.	M. Young, C. Miller, et al., “Implications of Task Features in Robotic Manipulation for Dynamic Autonomy Allocation,” in IEEE International Conference on Robotics and Automation (ICRA), 2017. (In Review).
21.	S. Javdani, et al. Shared autonomy via hindsight optimization for teleoperation and teaming. In The International Journal of Robotics Research. 2018. 
