About Refuting the Orthogonality Thesis
by Sven Nilsen, 2017
A few people who are skeptical toward The Orthogonality Thesis are ridiculed by a few AI enthusiasts, because the skeptics are seen as lacking the education in logic and mathematics needed to understand the topic. It can be difficult to construct arguments when your opponent has a better background in logic and philosophy. The median AI enthusiast is probably better at math than the median person on earth, but likely not better than e.g. the average statistics enthusiast. AI enthusiasts often miss the details, and sometimes communicate their own black/white understanding instead of opening up for philosophical discussion.
For example, this video does make a good argument for The Orthogonality Thesis, but it also takes an over-simplified view of what it means to be "stupid" and "intelligent". Watching the video, I felt that skeptics were viewed as "just wrong" instead of being invited to open discussion.
Yet, as with other cool ideas Nick Bostrom proposes, such as the Simulation Argument, the weakness often lies in the assumptions of the proof. I believe Nick Bostrom is aware of this himself, because I have been emailing him about another problem where a similar situation occurred.
The basic problem is that terms like "stupid" and "intelligent" are very broad, and the theoretical treatment of them in a specialized sub-field of science tends to focus on 1-2 dimensional aspects to avoid cognitive overload.
My reaction would not be that strong had it not happened before. It even happens around theories of the same philosopher! Somehow Nick Bostrom has a talent for promoting ideas that might seem wrong on the surface to some people, but which point to something more profound underneath. He has done some stand-ups as a comedian, so it would not surprise me that some deliberate provocation is baked into his theories.
This post is about trying to fix some of these misunderstood black/white arguments, and to tell a story about how difficult it can be to create open discussion when ideas are popularized and divide people into those labeled "smart" and "stupid".
Similar Obvious-Sounding Problems Have Been Refuted Before
I emailed Nick Bostrom about a missing 4th option in the Simulation Argument: it overlooks extreme observer selection effects due to eternal cosmic inflation. He agreed that this could be the case if the Boltzmann brain problem could be overcome.
Link to my paper Inflating the Simulation Argument.
Here is Nick's response:
Hi Sven, Wouldn’t that model of inflation together with your way of applying anthropics imply that we should expect to find ourselves on a much younger planet? or even with even greater likelihood that we should be Boltzmann brains assembled shortly after the start of the sub-universe in which we exist? ~~~~~ Professor Nick Bostrom Director, Future of Humanity Institute Director, Strategic AI Research Center University of Oxford
I did not expect any response, because Nick Bostrom is superfamous, and he even helped me further along the way, until he suddenly dropped me down a rabbit hole! :)
From my perspective, Nick's view seems to be that in an argument about reality (which takes a large number of bits to describe and therefore has a complex proof in logic), an assumption should only be taken into account if you can assign it at least a reasonable threshold of confidence. I agree with that position.
It turned out that about the same time, physicists were starting to estimate the expected age of the planet. Some argue that you would be more likely to be born on a planet orbiting a red dwarf, since these stars are smaller and burn out much more slowly, lasting for trillions of years.
I have some difficulty believing that a planet can sustain life for a trillion years, because chemical processes that keep things alive tend to transform matter into lower energy states. As the planet receives new energy from the star, some of the low energy states can be reset back into high energy states, but not all of them. Therefore, I believe that it is highly unlikely to find yourself on a planet that has existed for a trillion years or longer. Unless you exist in a highly technological civilization, of course.
However, compared to a number like 500 billion years, the age of the earth, about 4.5 billion years, seems quite small. In other words, there is room for interpreting the evidence as the earth being relatively young.
The earth is far from as young as you would naively expect from eternal inflation, so if this argument is correct we would expect to find something else out there explaining the difference. One of my favourite ideas is that quantum mechanics can be an emergent theory from an extreme observer effect. Basically, the universe could have some very lousy laws of physics that only seem to work because space is growing exponentially faster than any other known physical process. From within, the universe looks normal, because in order to observe it you first need to exist, and in order to exist you need to find yourself in a place that supports your existence for some time.
Nick Bostrom does take his ideas seriously, and has spent a significant amount of effort deriving more precise theories of observer selection effects, from which other thinkers benefit strongly. I believe he is aware of the delicate assumptions of his arguments, something that is a sign of good philosophy. He is very good at getting the logic right, such that the assumptions are the only weak spot, as it should be with logical arguments:
- When something seems obviously wrong and you can not find the weakness immediately, it requires investigation of assumptions because it might turn out that what you find will change our worldview
- Whole classes of mathematics and algorithms were discovered because people found a mistake in a small assumption, for example the discovery of non-Euclidean geometry and General Relativity
The whole idea that you can easily put people into "smart" and "stupid" categories because they defend position A or B, is ridiculous. Perhaps people are smart or stupid, but it does not make their arguments more or less stupid.
Extreme observer selection effects due to cosmic eternal inflation is such a profound idea, that it could turn our worldview upside down. We do not even know yet what it means if it is true.
One thing I have observed is that there is little to no popular debate about eternal inflation of this kind. This means that humanity might miss out on one of the greatest ideas of this century!
Discussing Intelligence Leads to Problems Because It is Associated With Being "Smart"
Everybody knows that if you can defend a definition of intelligence that "coincidentally" happens to share characteristics with your own thinking, then you hit the jackpot in social points.
The field of AI research is still in its early stages, compared to e.g. number theory, which has been studied for thousands of years.
Imagine that somebody defined numbers as "natural numbers" and claimed that no other definition was equally valid. Yes, there have been people like that. Would people discover complex numbers if they only could work with natural numbers? Probably not.
However, there is some truth to the idea that only natural numbers exist. Computers use bits to store information, which are equivalent to natural numbers, so one can think of everything being based on natural numbers.
Except, when you start using quantum computers, you might want to use complex numbers for efficiency.
The same thing might happen in the future to the field of AI research. People come up with new ideas, and by having an open discussion around them, new people come up with new ideas building on top of existing ideas, and so on.
Traditional Rationality is to Intelligence as Natural Numbers are to Number Theory
There are two kinds of rationality that are widely accepted in the field of AI:
- Instrumental rationality: How to achieve goals
- Epistemic rationality: How to correct beliefs
These two definitions of rationality are based on our understanding of mathematical functions. For example, a recent breakthrough in normative rationality, which deals with knowledge of expected utility from decision theory, claims that agents that behave as if their decision is a function outperform other agents in some controversial dilemmas. What is it called? Functional Decision Theory of course! (very innovative name :P)
However, our understanding of mathematical functions is also a field that changes. Just a few years ago, Homotopy Type Theory created wide cross-discipline interest in the practical applications of dependently typed systems and the use of computer assisted theorem proving and its interpretations.
The failure to look beyond these 2 categories of rationality is because our mathematical toolbox so far is not very good at reasoning about higher order functions.
A higher order function is a function that takes a function as an argument or returns a function. Among programmers using languages like Haskell or Idris, function currying is a familiar example.
The problem with higher order functions is that there is a fine line in type systems where checking becomes undecidable. Precisely at the same line, consistency of theorem proving breaks down. As a result, some valid programs can not be expressed in languages using static types, but they can be simulated or programmed in dynamically typed languages.
The most powerful type theory commonly used in mainstream programming today is dependent types, but new research is underway. Developing mathematics that goes beyond what we have today requires going beyond dependently typed systems.
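To make the notion concrete, here is a minimal sketch of currying in Python. The names `add` and `twice` are hypothetical and chosen only for illustration; Haskell and Idris curry functions by default, while Python needs the nesting written out.

```python
from typing import Callable

def add(x: int) -> Callable[[int], int]:
    """Curried addition: takes x and returns a function that waits for y."""
    def add_x(y: int) -> int:
        return x + y
    return add_x

def twice(f: Callable[[int], int]) -> Callable[[int], int]:
    """Takes a function and returns a new function that applies it two times."""
    return lambda value: f(f(value))

add_three = add(3)           # a function, not a number
print(add_three(4))          # 3 + 4
print(twice(add_three)(4))   # 3 + (3 + 4)
```

Both `add` and `twice` are higher order: the first returns a function, the second takes one and returns another.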
When we have tools for more powerful reasoning, it might open up for new ideas about rationality. My own favourite idea is something I have nicknamed "zen rationality": The idea that you can reason about goals as higher order functions, where the "true" utility function is learned along the way by interacting with the environment.
It works by picking a higher order function that fits the general shape of the goal landscape, and making sure it has nice mathematical properties so that learning goals does not lead to instability.
This is just a mathematical perspective on what is currently practiced in AI research: People are developing tools for giving feedback to deep learning that "guides" it toward the goal.
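As a toy illustration only, the idea of a goal as a higher order function could be sketched like this. Everything here is an assumption made for the example: the quadratic shape, the names `make_utility` and `learn_peak`, and the feedback signal are hypothetical, and this is not a description of any existing system.

```python
from typing import Callable

Utility = Callable[[float], float]

def make_utility(peak: float) -> Utility:
    """Higher order function: maps a learned parameter to a utility function.

    The concave quadratic is an assumed "general shape" of the goal landscape.
    """
    return lambda outcome: -(outcome - peak) ** 2

def learn_peak(feedback: Callable[[float], float], peak: float = 0.0,
               rate: float = 0.5, steps: int = 50) -> float:
    """Adjust the peak using feedback from the environment.

    With the linear feedback used below, each step multiplies the
    remaining error by (1 - rate), so a rate between 0 and 1 is stable.
    """
    for _ in range(steps):
        peak += rate * feedback(peak)
    return peak

# Hypothetical environment: it signals how far the current guess is
# from a hidden preference that the agent is never told directly.
HIDDEN_PREFERENCE = 3.0

def env_feedback(guess: float) -> float:
    return HIDDEN_PREFERENCE - guess

learned_peak = learn_peak(env_feedback)
utility = make_utility(learned_peak)
print(learned_peak, utility(3.0), utility(0.0))
```

The "true" utility function is never written down by hand. Only its family (`make_utility`) is chosen up front, and the member of that family is found through interaction.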
Higher Order Reasoning and The Orthogonality Thesis
The Orthogonality Thesis states that motivations and instrumental effectiveness are orthogonal concepts, which means that you can have:
1) A useless goal and an inefficient algorithm
2) A useless goal and an efficient algorithm
3) A useful goal and an inefficient algorithm
4) A useful goal and an efficient algorithm
When people are skeptical toward this thesis, the "smart" crowd of AI enthusiasts interprets the criticism as e.g. a view of 1) and 4) being more plausible than 2) and 3).
What can possibly be wrong with this picture? There are only roughly 4 categories and there are probably examples for any of them!
Of course, when assuming The Orthogonality Thesis, any claim that e.g. 4) is more plausible is wrong, because it violates The Orthogonality Thesis! This is called "circular reasoning".
The reason to be skeptical is not because of the consistency of the picture, but because the picture might be wrong or highly inaccurate. There could be a wrong assumption baked into the picture.
This is where you need to have deep mathematical intuition to appreciate the subtlety of this thesis.
For a logical argument of this kind, you always make two major assumptions:
- Infinite computing capacity
- Unclosed groups of algorithms by inference
A closed group is when generators, the way to move from one state to another, form an "island" in the space of all possible states.
For example, the paperclip maximizer could learn to speak English, but only to increase the number of paperclips in existence.
Yet, in practice, depending on how the paperclip maximizer is programmed, it might never reach a state where it starts to infer English, because it was designed using an architecture where reasoning about English and self improvement is hard to do.
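The "island" picture can be sketched as a reachability search. This is a toy model under stated assumptions: states are the integers 0 to 11, generators are additions modulo 12, and the names `reachable`, `step4` and `step3` are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Set

def reachable(start: int, generators: Iterable[Callable[[int], int]]) -> Set[int]:
    """Return every state that can be reached from start using the generators."""
    generators = list(generators)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for move in generators:
            nxt = move(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def step4(state: int) -> int:
    return (state + 4) % 12

def step3(state: int) -> int:
    return (state + 3) % 12

# With one generator the system is stuck on an island of 3 states.
island = reachable(0, [step4])
# A second generator (a different architecture) opens up all 12 states.
everything = reachable(0, [step4, step3])
print(sorted(island), len(everything))
```

If "reasoning about English" corresponds to a state outside the island, no amount of running the first system will get there, regardless of how efficiently it moves inside the island.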
The paperclip maximizer that can learn anything and solve any problem only exists within the platonic world of logic, where things like undecidability are often overlooked in popular arguments. Still, it was from the field of logic that the idea of undecidability first appeared, so there is no excuse for not using it to encourage open discussion.
Complex goals, such as the ones we attribute to human values, require some kind of architectural insight to be programmed into computers. We can not just copy human values by typing in if-rules, because people have already tried that and it takes forever.
This means that defining useful goals, such as those we want the AI to have, might require higher order reasoning about goals in the first place to become practical. In such cases, the mathematical obstacles to achieving efficient algorithms could require some sort of interaction and goal learning, resulting in more stable systems.
As a result, it could be that some points along the instrumental/motivational axes are much more plausible when constructing a functional AI than others. For example, it could be extremely unlikely that a paperclip maximizer would destroy the earth, or it could also be extremely likely!
Imagine that creating a paperclip maximizer is very easy, and creating one with goals that humans find useful is nearly impossible. Perhaps we are just living in an illusion about our goals, so we are not really maximizing them like we think we do, not because we are not behaving rationally, but because the kind of rationality we have is used to trick ourselves into thinking we have certain goals?
Either case would be strong evidence against The Orthogonality Thesis. Instead of thinking of 4 squares where you can name a few examples, there could be 10^10^200 in one square and only 10^10^30 in another square. Mathematically, this would imply that the two axes are not orthogonal.
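One possible way to make "not orthogonal" precise is statistical independence of the two axes. That reading is an assumption of this sketch, as are the function name `is_independent` and the small stand-in counts (the numbers in the text are far too large to store).

```python
from typing import Sequence

def is_independent(table: Sequence[Sequence[int]]) -> bool:
    """Check whether a 2x2 table of counts factors into its marginals.

    For a 2x2 table [[a, b], [c, d]], the joint distribution equals the
    product of its marginals exactly when a * d == b * c.
    """
    (a, b), (c, d) = table
    return a * d == b * c

# Rows: useless / useful goal. Columns: inefficient / efficient algorithm.
even_squares = [[5, 5],
                [5, 5]]
crowded_square = [[1000, 1],
                  [1, 1]]

print(is_independent(even_squares))    # the axes carry no information about each other
print(is_independent(crowded_square))  # knowing one axis changes the odds on the other
```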
I believe the overall problem is that we can not sample this landscape enough to make a confident claim in either direction. We do not have time machines or a way to transport ourselves to parallel timelines, so that we could conduct experiments using different configurations. There is at most one earth observed from any given location in space-time.
However, I think that higher order reasoning could be a sort of mathematics that you can not easily arrive at from an algorithm that performs well in simple environments. This, and the unknown aspects of bounded computing power, such as efficient sets, make it plausible that some goals can not be achieved even in principle, even if they are simple.
I suspect that the goals you can have do not vary from simple to complex, but along another metric that better measures the number of steps needed to achieve the goal. Kind of like the Kolmogorov complexity of the solution, or something like that.
It also depends on how you divide up the landscape. If you draw a square and divide it into smaller squares, it might look as if the smaller squares are equally big, but this is not true if the big square is a non-linear map of some surface, so you get varied density in the smaller squares.
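A small numerical sketch of the varied density: spread points evenly over a unit square, push them through a non-linear map, and count how many land in each of four equally sized squares. The map (x, y) -> (x^2, y^2) and the name `cell_counts` are assumptions picked only for illustration.

```python
from collections import Counter

def cell_counts(n: int = 100, cells: int = 2) -> Counter:
    """Count mapped points per grid cell.

    An even n-by-n lattice of points on the unit square is mapped through
    (u, v) -> (u*u, v*v), then binned into a cells-by-cells grid of equal squares.
    """
    counts: Counter = Counter()
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) / n, (j + 0.5) / n   # evenly spread source points
            x, y = u * u, v * v                   # the non-linear map
            counts[(int(x * cells), int(y * cells))] += 1
    return counts

counts = cell_counts()
for cell in sorted(counts):
    print(cell, counts[cell])
```

The four squares have the same area on the page, yet the one nearest the origin holds several times as many points as the one farthest from it.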
The Orthogonality Thesis seems to be obviously correct for some people because it is a simple idea. Our minds tend to pay more attention to simple ideas, and they are easier to spread around.
In practice I believe that this idea is way over-simplified, and a more accurate answer would emerge from deeper understanding of higher order reasoning. So, the thesis might need corrections that look like a noise landscape with high peaks and lows, or it might end up completely useless in the end, or it might be extremely useful.
I think The Orthogonality Thesis points out some very important ideas, but I do not think it is a good tool for reasoning about expected architectures for superintelligent AI. At this point I believe keeping an open discussion is important.