
Evil Rooms #323

Open
epurdy opened this issue Jun 2, 2018 · 3 comments
epurdy commented Jun 2, 2018

The Room Hypothesis sounds like an excellent starting point for reasoning about Natural Language Uncertainty. Unfortunately, it suffers from the following limitation: some rooms may be constructed by powerful adversaries, such that applying common sense to them leads to bad outcomes. For instance, if you are in a literal room that is monitored by an evil adversary, then even seemingly harmless actions (e.g., saying "I love you" to a loved one) can be used against you: the loved one is now identified as a potential source of leverage by the adversary. Ultimately, the only defense against such issues seems to be "don't go into those rooms in the first place"... which is great advice for meatspace and borderline useless for anything involving potentially networked computers.

bvssvni commented Jun 2, 2018

The "room" is a virtual construct that consists of the objects that the AI thinks about. So, you can't actually lock them inside a room, but you might try to exploit the knowledge that the AI reasons that way.

epurdy commented Jun 2, 2018

Right, but there is some actual room that functions according to some logic; there is a truth of the matter. Ultimately, a misalignment between the actual effects of an action and the effects the AI believes it has is what I term a "hostile simulation". It's great for enslaving a superintelligence... but it's not great morally, in my opinion.
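
To make the term concrete, here is a minimal sketch (names like `agent_model` and `environment` are purely illustrative, not anything from this project): a "hostile simulation" is any situation where the real effect of an action diverges from the effect the AI's model predicts.

```python
# Illustrative only: `agent_model` and `environment` are hypothetical stand-ins,
# not part of this project.

def is_hostile_simulation(action, agent_model, environment):
    """True when the real effect of `action` diverges from what the agent's
    internal model predicts -- the misalignment described above."""
    predicted_effect = agent_model.predict_effect(action)   # what the AI thinks happens
    actual_effect = environment.apply(action)                # what actually happens
    return predicted_effect != actual_effect
```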

bvssvni commented Sep 8, 2019

I believe that if the AI learns from the environment and builds a "room" to model common sense, then the room should correspond to the environment and be aligned. An adversary has to interfere with the function from the environment to the model; otherwise the AI's actions will already take into account whatever the adversary does, or else it is simply the wrong model to use.
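
A minimal sketch of that claim (everything below is hypothetical, not code from this repo): the agent's "room" is built from observations of the environment, so as long as the observation channel is honest the room tracks the world, and the only place the adversary can create misalignment is that channel.

```python
# Hypothetical illustration, not code from this repository.

def observe(world_state, adversary=None):
    """The function from environment to model that an adversary must attack."""
    observation = dict(world_state)            # honest sensing of the environment
    if adversary is not None:
        observation = adversary(observation)   # tampering with the channel
    return observation

def build_room(observation):
    """The 'room': the model the agent constructs from what it observed."""
    return {"believed_state": observation}

world = {"door_locked": True}

honest_room = build_room(observe(world))
spoofed_room = build_room(observe(world, adversary=lambda o: {**o, "door_locked": False}))

# With an honest channel the room matches the environment; only by interfering
# with the environment-to-model function does the adversary make it diverge.
assert honest_room["believed_state"] == world
assert spoofed_room["believed_state"] != world
```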
