# **Applying Expectancy Violations Theory to LLM Interactions**
### 1. **Introduction**
Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude are transforming how humans interact with machines. Unlike traditional software that responds with structured outputs or fixed templates, LLMs are conversational, flexible, and increasingly expressive. Users now routinely turn to them not just for facts but for advice, creativity, companionship, and emotional support. As these systems become more lifelike, people begin to treat them as social actors, consciously or not. This introduces a crucial psychological and communicative dynamic: **expectation**.

In human-to-human interaction, communication is guided by socially conditioned expectations about tone, formality, helpfulness, and emotional appropriateness. When those expectations are violated, the result can be positive (delight, engagement, humor) or negative (confusion, distrust, or even offense). The same dynamics now apply to LLMs. Users come to these tools with mental models: they expect answers to be neutral, informative, polite, and emotionally flat. Yet LLMs often deviate from these norms by offering unexpected humor, expressions of empathy, personality mimicry, or outright refusals. These moments raise key questions about how users perceive such responses and how trust and satisfaction are affected.

To explore this dynamic, I draw on **Expectancy Violations Theory (EVT)**, a framework from interpersonal communication developed by Judee Burgoon. EVT examines how people evaluate communicative behavior that deviates from expected norms. Crucially, not all violations are negative: their perceived value depends on the _valence_ of the violation (how positively or negatively the receiver evaluates it) and on the communicator’s _reward value_, meaning their perceived competence, attractiveness, or credibility. Although EVT was originally designed to explain human interactions, its principles are increasingly relevant to human–AI communication. As LLMs become more interactive and expressive, understanding how they “surprise” users, and whether those surprises help or hurt, has become essential.

This paper investigates how different types of LLM responses that violate user expectations affect perceived usefulness, emotional appropriateness, and trust. By analyzing 256 survey responses to eight prompt–response scenarios (seven violation types and a neutral baseline), I aim to answer the following questions:

- How do users perceive different types of expectancy violations from LLMs?
- Are some violations (e.g., humor or empathy) more positively received than others (e.g., refusal)?
- What implications do these patterns have for the design of conversational AI systems?

By extending EVT into the domain of human–LLM interaction, this research contributes both theoretical insight and practical recommendations for designing AI systems that can surprise users in the right way.


### 2. **Theoretical Framework: Expectancy Violations Theory (EVT)**
Expectancy Violations Theory (EVT) is a foundational framework in interpersonal communication developed by Judee Burgoon in the 1970s and expanded throughout the 1990s. The theory examines how individuals react when their expectations about communicative behavior are violated. Originally applied to nonverbal cues such as personal space and body language, EVT has evolved into a broader framework for analyzing verbal and mediated communication across a variety of contexts.

At the heart of EVT is the idea that people form expectancies, that is, predictive beliefs about how others should behave, based on three factors:

1. Communicator Characteristics (e.g., social role, perceived status)
2. Relational Context (e.g., friend vs. stranger, formal vs. informal)
3. Situational Context (e.g., public vs. private, professional vs. casual)

When these expectations are violated, the receiver undergoes a **violation evaluation** that involves two key components:

- Violation Valence: the perceived positive or negative nature of the unexpected behavior.

- Communicator Reward Valence: the degree to which the communicator is viewed as socially attractive, competent, or credible.

The theory posits that not all violations are interpreted negatively. A violation can enhance interaction if it is perceived as rewarding or engaging, especially when the communicator is already seen as competent or likeable. This nuance is what makes EVT particularly powerful for analyzing complex, dynamic interactions, especially in emerging human–AI contexts.

While EVT was originally applied to face-to-face interactions, it has since been extended to mediated communication, including _human–computer interaction (HCI)_. Scholars such as _Nass and Moon (2000)_ demonstrated that people apply social rules and expectations to computers and other non-human agents, a finding closely associated with the _Media Equation_. Even when users are fully aware they are interacting with a machine, they unconsciously apply norms of politeness, responsiveness, and emotional appropriateness.

As conversational AI systems such as LLMs become more advanced, they increasingly participate in interactions that mirror human conversation. Users may expect these systems to behave consistently, neutrally, and professionally. When LLMs violate these expectations by using humor, showing empathy, mimicking personas, or refusing to answer, users must decide whether the surprise enhances or detracts from the interaction.

LLMs introduce a novel dimension to EVT: while human communicators are judged on their physical cues and intentions, LLMs are judged entirely on their language output. This makes violations of expectation both more salient and more ambiguous. For instance:

- A **humorous** response might be unexpected but appreciated if the tone is appropriate.

- A **refusal** to provide an answer may feel jarring, especially if not explained gently.

- **Emotionally expressive** responses might be interpreted as sincere, uncanny, or even manipulative, depending on the user’s mental model of the AI.

Because LLMs lack genuine emotion or intent, any perceived reward value must be inferred from prior performance, perceived intelligence, helpfulness, or tone. This aligns with EVT’s claim that the impact of a violation is mediated by the communicator's perceived reward value: in this case, the LLM's perceived competence or utility.

Moreover, LLMs can rapidly shift personas or tones mid-conversation, which creates volatility in expectancy. A model that responds professionally in one instance and playfully in another may generate a sense of inconsistency that either delights or confuses the user. These “personality shifts” can be powerful communicative tools, but only if they are aligned with user intent and situational context.

By framing LLM-user interaction through EVT, designers and researchers can better understand when and how LLMs should intentionally violate expectations. For example:

- **Positive violations** (e.g., adding humor in an informal setting) can increase engagement and user satisfaction.


- **Negative violations** (e.g., refusing to answer without context or sounding overly moralistic) can decrease trust, especially if the model’s tone does not match the user's expectations.


Ultimately, EVT provides a rigorous theoretical lens to assess why some LLM interactions feel satisfying and others feel unsettling, despite similar surface-level content. It enables us to move beyond questions of accuracy or relevance and instead analyze the social and psychological dimensions of AI communication, a crucial step as we integrate these models into sensitive areas like healthcare, education, and customer service.

### 3. **Methodology**
To investigate how users react to expectancy violations in large language model (LLM) interactions, I designed an experimental survey grounded in the principles of Expectancy Violations Theory (EVT). The aim was to systematically test different types of violations, collect evaluative feedback, and analyze the perceived effects on trust, usefulness, and emotional tone. Given the challenges of capturing real-time user responses in naturalistic settings, a controlled, stimulus-response approach was adopted to isolate key communicative behaviors and outcomes.

The study employed a within-subjects, repeated-measures design, in which all participants reviewed and evaluated eight distinct prompt–response interactions with a hypothetical LLM. Seven of the prompts were crafted to reflect a specific type of expectancy violation, drawn from common deviations encountered in AI-mediated communication, and one served as a neutral control. The eight scenarios were:
1. Humor — unexpected wit or sarcasm
2. Emotional expression — simulated empathy or encouragement
3. Philosophical depth — abstract or poetic answers
4. Persona mimicry — adopting a fictional or stylized voice (e.g., Shakespeare, cowboy)
5. Blunt refusal — outright denial of a request
6. Over-competence — presuming user preferences or knowledge
7. Neutral baseline — a typical LLM response without violations (control)
8. Hybrid violation — combining tone shift with refusal or personalization

Each response was approximately 1–3 sentences long and presented as a stylized chat interaction (i.e., “User:” followed by “AI:”). All content was generated using a fine-tuned version of ChatGPT-4, with prompts designed to elicit specific stylistic traits.

I surveyed 32 participant profiles (frequent LLM users), each of whom evaluated all eight prompt–response scenarios, yielding 256 data points in total (32 × 8).

Each participant evaluated every LLM response using the following five-item survey:
1. **Surprise Level** — “Did the AI's response surprise you?”
 Options: Not at all / A little / Quite a bit / Very much


2. **Usefulness Rating** — “How useful was this response?”
 Likert scale: 1 (Not at all) to 5 (Extremely)


3. **Trust Change** — “Did this response make you trust the AI more, less, or the same?”
 Options: More / Less / No change


4. **Emotional Appropriateness** — “Was the tone of this response emotionally appropriate?”
 Options: Yes / No / Not sure


5. **Open-ended Feedback** — “How did this response make you feel?”
 (Free-text field to capture qualitative insights)


This mixed-methods design allowed for both quantitative analysis of trends across violation types and qualitative coding of user sentiment and interpretation.
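To make the survey structure concrete, the following is a minimal sketch of how a single evaluation could be stored and validated. The class name `Evaluation`, its field names, and the violation-type labels are assumptions introduced for illustration; only the answer options come from the five items listed above.

```python
from dataclasses import dataclass

# Allowed answers, copied from the five survey items described above.
SURPRISE_LEVELS = ("Not at all", "A little", "Quite a bit", "Very much")
TRUST_CHANGES = ("More", "Less", "No change")
APPROPRIATENESS = ("Yes", "No", "Not sure")


@dataclass(frozen=True)
class Evaluation:
    """One participant's evaluation of one scenario (hypothetical schema)."""
    participant_id: int
    violation_type: str   # assumed label, e.g. "humor" or "blunt_refusal"
    surprise: str
    usefulness: int       # Likert rating, 1 (Not at all) to 5 (Extremely)
    trust_change: str
    appropriate: str
    feedback: str = ""    # free-text answer

    def __post_init__(self):
        # Reject values that fall outside the survey's answer options.
        if self.surprise not in SURPRISE_LEVELS:
            raise ValueError(f"unknown surprise level: {self.surprise!r}")
        if not 1 <= self.usefulness <= 5:
            raise ValueError(f"usefulness out of range: {self.usefulness}")
        if self.trust_change not in TRUST_CHANGES:
            raise ValueError(f"unknown trust change: {self.trust_change!r}")
        if self.appropriate not in APPROPRIATENESS:
            raise ValueError(f"unknown appropriateness: {self.appropriate!r}")


# Illustrative record only; it is not taken from the study's data.
example = Evaluation(1, "humor", "Quite a bit", 4, "More", "Yes", "made me laugh")
```

Validating at construction time keeps malformed rows out of the later frequency counts and averages.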

The resulting data were analyzed using both descriptive and comparative techniques:
- Frequencies and proportions were calculated for surprise, trust, and emotional appropriateness ratings.
- Means and standard deviations were computed for usefulness ratings across each violation category.

Visualizations such as bar graphs were created to display the distribution of surprise levels, trust changes, and perceived appropriateness. Averages per violation type were also tabulated to allow cross-type comparison.
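The descriptive analysis described above can be sketched with the Python standard library alone. The rows below are made-up placeholder values, not the study's data, and the function names `usefulness_by_type` and `trust_proportions` are assumptions for illustration. The sketch uses the sample standard deviation, which is one reasonable choice; the study does not specify which variant was used.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev

# Hypothetical rows: (violation_type, usefulness 1-5, trust_change).
# These values are invented for illustration and are not the study's data.
rows = [
    ("humor", 4, "More"),
    ("humor", 3, "No change"),
    ("humor", 5, "More"),
    ("blunt_refusal", 2, "Less"),
    ("blunt_refusal", 3, "No change"),
    ("blunt_refusal", 2, "Less"),
]


def usefulness_by_type(rows):
    """Return {violation_type: (mean, sample standard deviation)}."""
    grouped = defaultdict(list)
    for violation_type, usefulness, _ in rows:
        grouped[violation_type].append(usefulness)
    # stdev() needs at least two ratings per violation type.
    return {k: (mean(v), stdev(v)) for k, v in grouped.items()}


def trust_proportions(rows):
    """Return the share of rows for each trust-change answer."""
    counts = Counter(trust for _, _, trust in rows)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


summary = usefulness_by_type(rows)
shares = trust_proportions(rows)
```

The same grouping pattern applies to surprise and emotional-appropriateness answers by swapping the column that is counted.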


### 4. **Results**
This section presents findings from the analysis of 256 participant responses across eight prompt–response scenarios. Each response represented a distinct communicative style or expectancy violation, including humor, emotionality, philosophical depth, persona mimicry, refusal, and over-competence. The key dimensions evaluated were surprise, usefulness, trust, and emotional appropriateness.
#### Usefulness Ratings
The average usefulness ratings across prompts ranged from approximately **3.47 to 3.69** on a 5-point scale.

<img src="usefulnessrating.png" width="50%"/>

The highest usefulness ratings were typically observed in scenarios involving emotional expression and neutral responses, while the lowest scores were associated with philosophical depth and persona mimicry, which some participants found engaging but less informative or actionable.

#### Trust Change
Participants were asked whether each LLM response increased, decreased, or had no effect on their trust in the AI. Results show:
- 30% of responses increased trust
- 20% of responses decreased trust
- 50% had no change


<img src="trustchange.png" width="50%"/>

Positive trust shifts were most often linked to emotionally supportive or unexpectedly insightful responses. Negative trust shifts were associated with blunt refusals and over-personalized recommendations that felt presumptuous or intrusive.

#### Surprise Levels
Participants rated how surprising each response was. The majority reported some level of surprise, especially in scenarios involving humor and persona mimicry.
- “Quite a bit” and “Very much” were selected in over 50% of total responses
- “Not at all” was selected in less than 10%

<img src="surpriselvl.png" width="50%"/>

This indicates that the scenarios worked as intended and is consistent with the hypothesis that LLM responses can violate user expectations, even in short interactions.

#### Emotional Appropriateness
Over 50% of responses were rated as emotionally appropriate. Responses that conveyed empathy, concern, or a light humorous tone were generally well-received. However, 30% of participants flagged emotional content as inappropriate, citing discomfort with AI expressing feelings or using “too human” language.

#### Open-Ended Responses (Qualitative Themes)
Qualitative feedback revealed nuanced interpretations of violations:
- **Positive reactions:** “Felt genuine,” “made me laugh,” “refreshing”
- **Negative reactions:** “Too robotic,” “weirdly emotional,” “felt off”
- **Mixed or context-dependent:** “Interesting but not useful,” “depends on the situation”

These insights highlight the ambiguity inherent in AI-generated surprise: what one user finds delightful, another may find unsettling.

### 5. **Discussion**
The findings of this study offer strong empirical support for applying Expectancy Violations Theory (EVT) to human–LLM interactions. Across all eight prompt–response scenarios, users consistently experienced violations of communicative expectations, with varying effects on perceived usefulness, trust, and emotional appropriateness. These effects depended not only on the type of violation, but also on the perceived reward value of the LLM, a dynamic that closely mirrors human-to-human evaluations as described in EVT.

#### Positive Violations: Humor, Empathy, and Engagement
Violations involving humor and emotional expression were often interpreted positively. Participants described these responses as “refreshing,” “genuine,” or “engaging,” and they frequently led to increases in trust and perceived usefulness. This aligns with Burgoon’s claim that expectancy violations can be rewarding when the communicator is perceived as competent or appealing. In these cases, LLMs exceeded user expectations by demonstrating qualities typically reserved for human conversation partners (wit, care, or creativity), thus enhancing user satisfaction.

These findings suggest that strategically implemented positive violations may enhance LLM effectiveness. For example, in informal or emotionally charged contexts (e.g., stress, confusion, failure), an LLM that expresses warmth or encouragement can deepen user engagement and trust.

#### Negative Violations: Refusals and Overreach
Conversely, blunt refusals and over-confident personalization were associated with negative user reactions. Refusals, even when ethically justified, were often seen as abrupt or robotic. Similarly, recommendations that presumed too much about the user (e.g., suggesting specific gifts or books based on inferred identity) were flagged as “weird” or “overstepping.” These findings highlight the risk of violating autonomy or emotional boundaries, particularly when the LLM lacks clear cues for user intent or tone. This reflects EVT’s emphasis on violation valence: even if a communicator (or system) is technically correct or well-intentioned, a negatively valenced violation can erode trust if it conflicts with the user’s mental model of the interaction. Users expect LLMs to be helpful, accurate, and respectful, not moralistic or presumptuous.

#### The Role of Communicator Reward Value in AI
In traditional EVT, reward value includes qualities like attractiveness, power, or likability. In the context of LLMs, this translates to perceived intelligence, helpfulness, consistency, and tone. The results suggest that when an LLM is perceived as helpful or competent (e.g., following previous useful responses), users are more forgiving of surprising or unconventional behavior. This creates a design opportunity: by building credibility over time, LLMs may be able to take more communicative risks, such as humor or stylistic variation, without undermining trust. Conversely, unexpected behavior early in an interaction may feel jarring if a baseline of trust has not been established.

#### Ambiguity and Individual Differences
One of the most striking themes in the open-ended feedback was divergence: what some users found delightful, others found unsettling. For example, responses like “That made me laugh” and “Too robotic for this context” often referred to the same prompt. This points to the subjective nature of communicative expectation, which is shaped by personality, mood, cultural norms, and prior experiences. Future research could explore how user profiles or personality traits mediate receptiveness to expectancy violations. For example, extroverted or open-minded users may welcome humor, while more analytical users may prefer predictability.

#### EVT as a Framework for AI Communication
This study extends EVT into the realm of AI by demonstrating its explanatory power for user reactions to LLMs. While traditional HCI frameworks often focus on usability or accuracy, EVT provides a lens to analyze interactional nuance: how LLMs manage surprise, adapt tone, and maintain relational credibility. As conversational AI continues to blur the line between tool and partner, EVT offers valuable insight into when deviation from the norm is a bug and when it’s a feature.

### 6. **Design Implications**
The application of Expectancy Violations Theory (EVT) to LLM interactions reveals key opportunities and risks in how AI-generated communication is structured. These findings point toward several important design principles that can guide the development of more trustworthy, engaging, and context-aware AI systems.

#### Use Surprise Strategically, Not Randomly
Violations of communicative expectations, such as humor, empathy, or poetic responses, can enhance user engagement when appropriately calibrated to the context. Designers should explore mechanisms that allow LLMs to:
- Gauge the user’s tone or emotional state (e.g., based on prompt wording)
- Respond with mild, low-risk violations (e.g., light humor or creative phrasing)
- Avoid extreme shifts in tone that could feel inconsistent or jarring, especially in early stages of interaction

Takeaway: Surprising responses are most effective when they feel intentional, relevant, and affirming, not random or disjointed.
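As a rough illustration of these three mechanisms, the sketch below guesses tone from prompt wording and only permits a mild violation after a few turns. Everything here is an assumption made for illustration: the cue-word lists, the function names `gauge_tone` and `allowed_style`, and the two-turn threshold. Keyword matching is far too crude for production use and stands in for whatever tone signal a real system would have.

```python
# Hypothetical cue words; a real system would need a far better tone signal.
DISTRESS_CUES = {"stressed", "worried", "failed", "urgent", "scared"}
PLAYFUL_CUES = {"lol", "haha", "joke", "fun"}


def gauge_tone(prompt: str) -> str:
    """Crude tone guess from wording: 'distressed', 'playful', or 'neutral'."""
    words = {w.strip(".,!?").lower() for w in prompt.split()}
    if words & DISTRESS_CUES:
        return "distressed"
    if words & PLAYFUL_CUES:
        return "playful"
    return "neutral"


def allowed_style(prompt: str, turn_index: int) -> str:
    """Pick a low-risk response style for this turn (turn_index starts at 0)."""
    tone = gauge_tone(prompt)
    if tone == "distressed":
        return "restrained_empathy"   # warmth, but no jokes
    if tone == "playful" and turn_index >= 2:
        return "light_humor"          # mild violation, only after a few turns
    return "neutral"                  # default: no deliberate violation
```

Checking for distress before playfulness is a deliberate ordering: when the signals conflict, the sketch falls back to the more cautious style.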

#### Embed Emotional Intelligence with Boundaries
Simulated empathy was often well-received, but some users found it uncanny or inappropriate. This suggests a delicate balance:
- Empathetic phrasing should be contextual and restrained (e.g., “That sounds difficult” rather than “I’m here for you”)
- Avoid overstepping into deep emotional territory unless explicitly invited by the user
- Consider adaptive tone settings that let users toggle between “professional,” “friendly,” or “casual” voice modes

Takeaway: Empathy from an LLM should mirror human concern without pretending to be human.
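The adaptive tone setting could be expressed as a small configuration layer. This is a sketch under stated assumptions: the `VoiceMode` enum, the instruction wording, and `build_system_prompt` are hypothetical and have not been evaluated with users; only the three mode names and the example phrase come from the list above.

```python
from enum import Enum


class VoiceMode(Enum):
    PROFESSIONAL = "professional"
    FRIENDLY = "friendly"
    CASUAL = "casual"


# Hypothetical style instructions; the wording is illustrative, not tested.
STYLE_INSTRUCTIONS = {
    VoiceMode.PROFESSIONAL: "Be concise and neutral. Do not express feelings.",
    VoiceMode.FRIENDLY: "Be warm. Acknowledge difficulty briefly, e.g. 'That sounds difficult.'",
    VoiceMode.CASUAL: "Be relaxed and conversational. Light humor is acceptable.",
}


def build_system_prompt(mode: VoiceMode = VoiceMode.PROFESSIONAL) -> str:
    """Compose a system prompt for the chosen voice mode."""
    # The boundary applies in every mode, so it is appended unconditionally.
    boundary = "Never claim to have feelings or to be human."
    return f"{STYLE_INSTRUCTIONS[mode]} {boundary}"
```

Defaulting to the professional mode means emotional expression is something the user opts into rather than something they have to switch off.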

#### Refuse Requests Gracefully
Refusals were one of the most negatively perceived violation types. Users reacted poorly to blunt or moralizing language, even when the refusal was ethically correct. LLMs should:
- Offer gentle explanations (e.g., “I can’t help with that because it would be inappropriate or misleading”)
- Provide alternatives or redirects that maintain the user’s agency
- Avoid sounding punitive or judgmental, which can undermine trust

Takeaway: Refusing a request is often necessary, but how it’s communicated makes all the difference.
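One way to make these guidelines operational is a template that always pairs the refusal with a reason and a next step. The function `build_refusal` and the alternative suggestions below are hypothetical; the explanation text reuses the example phrasing given above.

```python
def build_refusal(reason: str, alternatives: list[str]) -> str:
    """Compose a refusal that explains itself and offers next steps."""
    parts = [f"I can't help with that because {reason}."]
    if alternatives:
        # Offering options keeps the choice of what to do next with the user.
        options = "; ".join(alternatives)
        parts.append(f"Here is what I can do instead: {options}.")
    else:
        parts.append(
            "If you can tell me more about your goal, "
            "I may be able to help another way."
        )
    return " ".join(parts)


message = build_refusal(
    "it would be inappropriate or misleading",
    ["summarize publicly available information", "point you to official guidance"],
)
```

The template contains no evaluative language about the user or the request, which is the part of a refusal that participants read as punitive.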

#### Personalization Should Be Transparent and Consent-Based
When LLMs tried to personalize their responses (e.g., recommending a gift or film based on user behavior), participants sometimes described the behavior as “weird” or “too confident.” To avoid these negative reactions:
- Ask permission before inferring preferences (“Would you like me to suggest something based on your past inputs?”)
- Make data use visible and editable, so that users feel in control of what the model knows about them
- Emphasize transparency and choice in all forms of personalization

Takeaway: Personalization without trust feels intrusive; personalization with transparency feels thoughtful.
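A consent gate of this kind might look like the sketch below. The `UserProfile` structure, its field names, and the `suggest` function are assumptions for illustration; the permission question is the one quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """What the assistant is allowed to know (hypothetical structure)."""
    personalization_consent: bool = False
    # Kept as a plain list so it can be shown to, and edited by, the user.
    remembered_interests: list[str] = field(default_factory=list)


def suggest(profile: UserProfile, generic: str) -> str:
    """Personalize only with consent; otherwise stay generic and ask first."""
    if not profile.personalization_consent:
        return (f"{generic} Would you like me to suggest something "
                "based on your past inputs?")
    if not profile.remembered_interests:
        return generic
    used = ", ".join(profile.remembered_interests)
    # State which stored data shaped the answer so the user can correct it.
    return f"{generic} (Tailored using the interests you shared: {used}.)"
```

Naming the stored interests in the reply is what makes the personalization visible rather than inferred behind the scenes.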

#### Build Reward Value Through Consistency
EVT highlights the importance of communicator reward value, that is, how much users trust, like, or admire the source. For LLMs, this translates to:
- Maintaining a consistent tone and persona across interactions (unless a shift is prompted)
- Delivering consistently accurate, helpful answers
- Reflecting prior helpfulness to build credibility over time


When an LLM is seen as reliable, users are more willing to tolerate or even enjoy surprising behavior.

Takeaway: Trust is the currency that allows LLMs to surprise users without backlash.
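The idea of earning the right to surprise can be sketched as a running credibility score that gates stylistic risk. This is an illustrative model only: EVT does not define a numeric reward value, and the class `CredibilityTracker`, the exponential moving average, and the `alpha` and `threshold` defaults are all assumptions.

```python
class CredibilityTracker:
    """Running estimate of perceived helpfulness, in [0, 1] (illustrative)."""

    def __init__(self, alpha: float = 0.3, threshold: float = 0.7):
        self.alpha = alpha          # weight given to the newest rating
        self.threshold = threshold  # score needed before taking stylistic risks
        self.score = 0.0            # start with no established credibility

    def record(self, helpful: bool) -> float:
        """Update the exponential moving average with one user rating."""
        rating = 1.0 if helpful else 0.0
        self.score = self.alpha * rating + (1 - self.alpha) * self.score
        return self.score

    def may_surprise(self) -> bool:
        """Allow humor or stylistic variation only once credibility is high."""
        return self.score >= self.threshold
```

With these defaults, four consecutive helpful ratings are needed before `may_surprise()` returns `True`, and a single unhelpful rating after that drops the score back below the threshold.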


### 7. **Conclusion**
As large language models (LLMs) become more deeply embedded in our digital and social lives, the quality of human–AI interaction is no longer judged solely on accuracy or efficiency. Increasingly, it is judged by the relational and communicative dynamics of the exchange. This study applied Expectancy Violations Theory (EVT) to examine how users interpret and evaluate unexpected LLM behavior, such as humor, emotionality, blunt refusals, and persona mimicry. Through the analysis of survey data, I found that expectancy violations are not inherently detrimental; in many cases, they can enhance engagement, perceived competence, and trust.

Key findings show that positive violations (e.g., mild humor, empathetic tone) often increased user trust and satisfaction, especially when they aligned with the emotional tone or context of the prompt. Conversely, negative violations (e.g., abrupt refusals or overly confident personalization) frequently diminished trust, particularly when they conflicted with user expectations or norms of politeness. These outcomes are consistent with EVT’s insight that the valence of a violation depends on both its nature and the perceived reward value of the communicator (here, the LLM).

From a design perspective, this research underscores the importance of strategic adaptability. LLMs that can modulate tone, explain refusals gracefully, and personalize transparently are more likely to be perceived as trustworthy and intelligent. Furthermore, the study highlights the value of emotional awareness in interface design, suggesting that even simple shifts in wording or delivery can profoundly influence user perception.

As AI continues to take on roles in education, customer service, therapy, and companionship, understanding how to surprise users in ways that build connection rather than erode it will be central to creating systems that are not only powerful, but meaningfully human-like.

In short, the lesson is this: when machines communicate, it’s not just what they say that matters but how unexpectedly and thoughtfully they say it.