## Theory of Mind for Robotics - A Discussion of SOTA, Neural Bases, and Challenges

Here I am, procrastinating on writing my thesis because I am fascinated by this new topic, Theory of Mind (ToM). Thankfully, I was introduced to it by someone who understands it from an NLP and cognitive standpoint as well as from a robotics standpoint: Prof. Yonatan Bisk! I'll note down some points from my conversation with him, read the papers he suggested, and try to think about why this is hard and what we can do to solve it.

My question to him:
"Allow me to be annoying for a minute here and reopen the Theory of Mind can of worms once again! I should probably do the readings first; but while I get started on those I do want to ask again about why you said that ToM remains a highly challenging topic - even annoying, because everything is reduced to a low-D problem and solved. You said solving ToM would essentially mean 'solving complete AI'. Maybe I missed the point in the class, but what is hard about solving the 'recursive state update' problem that we discussed as being functionally similar to Theory of Mind?"

His answer:  
"If the full form of theory of mind is an accurate model of a latent representation that you cannot ever observe, then you have to infer the model from the behaviors and contexts.
So you need to be able to run a simulation of what you would do in this situation to lead you to say/do the same things, but you don't know the whole context so you might also need to hallucinate stories that would likely give the context necessary to explain the behavior but you would need to be able to do that well because you'll probably never get supervision for that process and if you get it wrong then you will ascribe the wrong intention to people which typically means the social dynamic goes south -- or as you pointed out in ASD this can lead to frustration and depression because you can't predict why people are doing what they are doing.
But to run a simulation like this means you already need to have your own model of the world (physical and social) that you use for yourself and maybe are reusing to simulate others.
So we could simplify all of this and say it's just a knowledge problem -- the base form -- "A knows X, B saw Y, ..." but the social world you are actually trying to model requires you understand how to build a model of the world -- affordances, actions, etc, that you can run simulations, can evaluate possible pasts and futures, can align those to your observations, can update your understanding for that person and perhaps all people, etc.. 
If I could do that, what couldn't I do?"
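
A minimal sketch of the "simulate what the other person would do, then infer the hidden state" idea from the answer above. Everything here is a toy assumption made for illustration: a 1-D corridor, two hypothetical goals (`coffee_machine`, `printer`), and a noisily rational agent controlled by a made-up `rationality` parameter. It covers only the easy part (a tiny, fully known hypothesis space), not the hard part of hallucinating missing context.

```python
import math

# Hypothetical toy world: an agent walks along a corridor of cells 0..6.
# The goal is the latent state we never observe directly.
GOALS = {"coffee_machine": 0, "printer": 6}


def action_likelihood(position, action, goal_cell, rationality=2.0):
    """P(action | position, goal) for a noisily rational simulated agent.

    Actions are -1 (step left) or +1 (step right). Steps that end closer
    to the goal get exponentially more weight.
    """
    def utility(a):
        return -abs((position + a) - goal_cell)

    weights = {a: math.exp(rationality * utility(a)) for a in (-1, 1)}
    return weights[action] / sum(weights.values())


def infer_goal(start, actions, prior=None):
    """Return P(goal | actions) by simulating the agent under each goal."""
    posterior = dict(prior) if prior else {g: 1.0 / len(GOALS) for g in GOALS}
    position = start
    for action in actions:
        # Bayes update: reweight each goal by how well it explains the step.
        for goal, cell in GOALS.items():
            posterior[goal] *= action_likelihood(position, action, cell)
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
        position += action
    return posterior


posterior = infer_goal(start=3, actions=[1, 1])
print(posterior)
```

With these toy numbers, two steps to the right from the middle of the corridor push almost all of the posterior mass onto `printer`. The catch is that the set of candidate goals and the agent model were handed to us; in real social settings both have to be invented on the fly.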

My theory: reading can play a huge role in developing a semblance of ToM in kids with ASD (or in people, for that matter). I hold this as a strong opinion because I read a lot of fiction and other things as a kid, and felt like I had a fairly well-articulated world model. That said, I was socially awkward for a long time, which likely points to the need for some sort of environmental grounding: "okay, I can say X in context Y but not in context Z", or "if A says G when we are in a group with B, C and D, it means this; here is what the reactions of B, C and D to what A said reveal to me about A's standing in the group and about B, C and D's leanings". Knowing to look for those cues was in some sense inspired by reading books and seeing characters do the same. But there was surely a 'grounding' phase in which I went through a lot of 'pain' from grounding failures, and only once I had learned the dynamic could I be a part of the group. The reason I bring this up:
- I think there can be supervision for world-model building in a robot: just use all the good-quality fiction that the human race has produced as training data. However, that raises the question of which model is used to learn the patterns in that training data. The human brain is capable of 'true reasoning', and we already established in class that GPT cannot do that: its failure cases, few as they are, show that it is not doing true reasoning. If that is an inherent flaw, will some new, magic neuro-inspired model be able to fix it?

### GPT on Prof. Bisk's answer

Breaking Down Prof. Bisk's Answer:

1. Understanding Theory of Mind (ToM):
   - Latent Representations: At its core, ToM involves understanding that others have internal mental states (beliefs, desires, intentions) that are not directly observable.
   - Inference from Behavior and Context: Since we cannot access these internal states directly, we must infer them by observing behaviors and understanding the context in which they occur.
2. Simulating Others' Perspectives:
   - Running Simulations: To predict or explain someone's behavior, we simulate what we might do in their situation to produce similar actions or words.
   - Incomplete Information: We often lack complete context about others' situations, so we need to hypothesize or "hallucinate" plausible stories that could explain their behavior.
   - Accuracy is Critical: If our inferred models are incorrect, we may misattribute intentions, leading to misunderstandings or social friction.
   - Connection to ASD: Individuals with Autism Spectrum Disorder may struggle with ToM, leading to difficulties in predicting behaviors and social interactions, which can cause frustration and emotional distress.
3. Requirements for Effective Simulation:
   - Personal Model of the World: To simulate others, we need a robust understanding of how the world works, including physical laws and social norms.
   - Reusing Our Models: We often use our own mental frameworks and experiences as a basis for simulating others' perspectives.
4. Simplifying vs. Complexity:
   - Knowledge Problem Simplification: One might reduce ToM to tracking who knows what (e.g., "A knows X," "B saw Y"), but this oversimplifies the complexity of social cognition.
   - Complex Social Modeling: Accurately modeling the social world involves understanding:
     - Affordances: What actions are possible in a given context.
     - Actions and Consequences: How actions lead to outcomes.
     - Simulations of Pasts and Futures: Imagining different scenarios to explain behaviors.
     - Alignment with Observations: Ensuring our simulations match what we observe.
     - Updating Understanding: Refining our models based on new information about individuals and groups.
5. The Ultimate Challenge in AI:
   - General Intelligence: If an AI could perform all these tasks effectively, it would essentially possess a level of understanding and adaptability akin to human intelligence.
   - Open-Ended Abilities: Mastering ToM in AI implies solving numerous complex problems, leaving little that the AI couldn't do in terms of social cognition and interaction.
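
The "knowledge problem" simplification in point 4 ("A knows X, B saw Y") is small enough to write down directly. The sketch below assumes a classic Sally-Anne style false-belief setup; the `BeliefTracker` class and its method names are hypothetical, made up for illustration. It shows why this base form is considered the easy case: once someone tells you who witnessed what, the bookkeeping is trivial.

```python
from dataclasses import dataclass, field


@dataclass
class BeliefTracker:
    """Tracks the true world state and what each agent last witnessed."""
    world: dict = field(default_factory=dict)    # object -> true location
    beliefs: dict = field(default_factory=dict)  # agent -> {object: location}

    def move(self, obj, location, witnesses):
        """Move obj; only the agents in `witnesses` update their beliefs."""
        self.world[obj] = location
        for agent in witnesses:
            self.beliefs.setdefault(agent, {})[obj] = location

    def where_will_look(self, agent, obj):
        """An agent searches where it believes the object is, not where it is."""
        return self.beliefs.get(agent, {}).get(obj)


tracker = BeliefTracker()
tracker.move("marble", "basket", witnesses=["Sally", "Anne"])
tracker.move("marble", "box", witnesses=["Anne"])  # Sally has left the room

print(tracker.where_will_look("Sally", "marble"))  # basket (a false belief)
print(tracker.where_will_look("Anne", "marble"))   # box
print(tracker.world["marble"])                     # box
```

Everything difficult is hidden in the `witnesses` argument: a robot would have to work out for itself who perceived which event, and what they concluded from it.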


### PseudoCode and Thoughts

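```python
from itertools import product

# Every function or class marked "placeholder" is a toy stand-in so that the
# sketch runs end to end. The real versions are the hard, unsolved part.


def load_physical_laws():
    # Placeholder: a real system would need an intuitive physics model.
    return {"objects_persist": True}


def load_social_norms():
    # Placeholder: a real system would need learned social conventions.
    return {"greet_when_greeted": True}


class KnowledgeBase:
    """Placeholder context store that also proposes candidate mental states."""

    def __init__(self):
        self.facts = {}

    def update(self, observations):
        self.facts.update(observations)

    def possible_beliefs(self):
        return self.facts.get("possible_beliefs", [])

    def possible_desires(self):
        return self.facts.get("possible_desires", [])

    def possible_intentions(self):
        return self.facts.get("possible_intentions", [])


class WorldModel:
    def __init__(self):
        self.physical_laws = load_physical_laws()
        self.social_norms = load_social_norms()
        self.knowledge_base = KnowledgeBase()

    def update(self, observations):
        # Update the world model based on new observations
        self.knowledge_base.update(observations)


class AgentModel:
    def __init__(self, world_model):
        self.world_model = world_model
        self.beliefs = None
        self.desires = None
        self.intentions = None

    def infer_mental_states(self, observed_behavior, context):
        # Infer the agent's mental states from behavior and context
        possible_states = generate_possible_states(context)
        best_match = evaluate_states(possible_states, observed_behavior)
        if best_match is not None:  # keep the old estimate if there are no hypotheses
            self.beliefs, self.desires, self.intentions = best_match

    def simulate_behavior(self):
        # Simulate behavior based on inferred mental states
        return run_simulation(self.beliefs, self.desires, self.intentions)


def generate_possible_states(context):
    # Generate hypotheses: every (belief, desire, intention) combination.
    # The number of hypotheses is the product of the three candidate counts.
    return list(product(context.possible_beliefs(),
                        context.possible_desires(),
                        context.possible_intentions()))


def evaluate_states(possible_states, observed_behavior):
    # Return the hypothesis whose simulated behavior best explains the
    # observed behavior, or None if there are no hypotheses at all.
    return max(
        possible_states,
        key=lambda state: compare_behavior(run_simulation(*state), observed_behavior),
        default=None,
    )


def complex_reasoning_engine(beliefs, desires, intentions):
    # Placeholder: pretend the agent simply acts out its intention.
    # Beliefs and desires are ignored, so this toy cannot tell them apart.
    return intentions


def similarity_metric(simulated_behavior, observed_behavior):
    # Placeholder: an exact match scores 1.0, anything else 0.0.
    return 1.0 if simulated_behavior == observed_behavior else 0.0


def run_simulation(beliefs, desires, intentions):
    # Simulate behavior based on mental states.
    # A real version would involve complex reasoning and prediction.
    return complex_reasoning_engine(beliefs, desires, intentions)


def compare_behavior(simulated_behavior, observed_behavior):
    # Compare simulated behavior with observed behavior to score the hypothesis
    return similarity_metric(simulated_behavior, observed_behavior)


def get_new_observations():
    # Placeholder sensor stream: one scripted observation, then it ends.
    yield {
        "context": {
            "possible_beliefs": ["door_is_open", "door_is_locked"],
            "possible_desires": ["leave_room", "stay"],
            "possible_intentions": ["walk_to_door", "sit_down"],
        },
        "behavior": "walk_to_door",
    }


def take_action_based_on_prediction(predicted_behavior):
    # Placeholder: a robot would plan its own response here.
    print(f"Predicted behavior: {predicted_behavior}")


def main():
    world_model = WorldModel()
    agent_model = AgentModel(world_model)

    for observations in get_new_observations():
        world_model.update(observations["context"])

        observed_behavior = observations["behavior"]
        context = world_model.knowledge_base

        agent_model.infer_mental_states(observed_behavior, context)
        predicted_behavior = agent_model.simulate_behavior()

        # Use predicted behavior for further reasoning or interaction
        take_action_based_on_prediction(predicted_behavior)


if __name__ == "__main__":
    main()
```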
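
One thought on `evaluate_states` above: it commits to a single best hypothesis, so a wrong guess about someone's intention is held with full confidence. A possible alternative, sketched below under my own assumptions, is to keep a distribution over hypotheses instead. The function name `posterior_over_states`, the `temperature` parameter, and the example scores are all hypothetical; the softmax is just one common way to turn scores into probabilities.

```python
import math


def posterior_over_states(scores, temperature=1.0):
    """Turn hypothesis scores into a normalized distribution (softmax)."""
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {h: math.exp((s - top) / temperature) for h, s in scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Hypothetical similarity scores for three explanations of one behavior.
scores = {"wants_coffee": 2.0, "avoiding_someone": 1.8, "lost": -1.0}
beliefs = posterior_over_states(scores)
best = max(beliefs, key=beliefs.get)
print(best, beliefs)
```

Here the top two explanations end up with similar probability, which is a signal to gather more evidence before acting rather than ascribing the wrong intention.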