# Comprehensive Taxonomy of Linguistic Representations Used to Describe a Word

---

## 1. Phonological Level (Sound Representation)

Describes how a word is realized in speech.

- **Phonetic representation** (surface, IPA):  
  - Example: `[kæt]` for “cat”.
- **Phonemic representation** (abstract contrastive units):  
  - Example: `/k/`, `/æ/`, `/t/`.
- **Prosodic representation**: rhythm, stress, intonation, pitch contours.
- **Phonotactic representation**: constraints on permissible sound sequences in a language (e.g. /ŋ/, the final sound of “sing”, never begins an English word).

---

## 2. Morphological Level (Word Structure)

Describes the internal form and formation of words.

- **Morphemic representation**: minimal meaning units.  
  - Example: `un-`, `break`, `-able`.
- **Inflectional representation**: grammatical variants of the same lexeme.  
  - Example: `run → runs → ran → running`.
- **Derivational representation**: word-formation processes.  
  - Example: `happy → happiness`, `teach → teacher`.
- **Lexemic representation**: the abstract base entry gathering all inflections.  
  - Example: `RUN = {run, runs, ran, running}`.
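
The lexeme idea can be sketched as a small lookup table. This is a minimal illustration, not a real lemmatizer: the `LEXEMES` table and the `lemmatize` helper are hypothetical names, and the `TEACH` entry is added only to give the table a second row.

```python
# Toy lexicon: each lexeme (written in capitals) gathers its inflected forms.
LEXEMES = {
    "RUN": {"run", "runs", "ran", "running"},
    "TEACH": {"teach", "teaches", "taught", "teaching"},  # illustrative extra entry
}

# Invert the mapping so that any surface form points back to its lexeme.
FORM_TO_LEXEME = {
    form: lexeme for lexeme, forms in LEXEMES.items() for form in forms
}


def lemmatize(word):
    """Return the lexeme for a surface form, or None if the toy lexicon lacks it."""
    return FORM_TO_LEXEME.get(word.lower())
```

Here `lemmatize("ran")` and `lemmatize("running")` both resolve to `"RUN"`, which is the sense in which a lexeme abstracts over its inflections.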

---

## 3. Syntactic Level (Sentence Structure)

Describes how the word behaves grammatically.

- **Syntactic category (POS)**: noun, verb, adjective, adverb, preposition, determiner, etc.
- **Syntactic representation**: phrase-structure trees, dependency trees.
  - Example: `VP → V NP`, or a dependency: `eat → object → apple`.
- **Argument structure / subcategorization frames**:  
  - Example: `give` takes a subject, a direct object, and a recipient realized as an indirect object or a PP (`NP + NP + PP`).
- **Grammatical features**: number, person, gender, tense, aspect, mood, case, definiteness.
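
A dependency representation like `eat → object → apple` can be stored as head–relation–dependent triples. The sketch below assumes a hand-written parse of “The boy ate the apple”; the `Dependency` type, the relation labels, and `dependents_of` are hypothetical names chosen for illustration, not the output format of any particular parser.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    head: str
    relation: str
    dependent: str


# Hand-written parse of "The boy ate the apple" (not produced by a parser).
PARSE = [
    Dependency("ate", "subject", "boy"),
    Dependency("ate", "object", "apple"),
    Dependency("boy", "determiner", "The"),
    Dependency("apple", "determiner", "the"),
]


def dependents_of(parse, head, relation=None):
    """List the dependents of `head`, optionally restricted to one relation."""
    return [
        dep.dependent
        for dep in parse
        if dep.head == head and (relation is None or dep.relation == relation)
    ]
```

Asking for `dependents_of(PARSE, "ate", "object")` recovers the object of the verb without any reference to word order, which is what distinguishes this view from a flat token sequence.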

---

## 4. Semantic Level (Meaning Representation)

Describes lexical meaning, relations, and formal interpretation.

- **Lexical semantics**: sense relations  
  - Synonymy (`big` ~ `large`), antonymy (`hot` vs `cold`), hypernymy (`animal` > `dog`), meronymy (`wheel` of `car`).
- **Semantic features**: componential meaning  
  - Example: `[+HUMAN]`, `[+ANIMATE]`, `[+COUNT]`.
- **Compositional semantics**: how meanings combine in phrases/sentences (principle of compositionality).
- **Predicate–argument structure** / logical form:  
  - Example: `eat(x, y)`, `give(x, y, z)`.
- **Formal semantic representation**:  
  - First-order logic, lambda calculus, DRT-style semantic forms.
- **Distributional semantics**: vector-space representations, word embeddings.
- **Frame semantics**: word meaning within an event/frame structure (Fillmore).  
  - Example: `BUY` frame → roles: buyer, seller, goods, price.
- **Prototype / conceptual semantics**: graded category membership; central vs peripheral members.
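
Sense relations such as hypernymy lend themselves to a simple graph encoding. The following is a toy sketch: the `HYPERNYM_OF` table is hand-written, and the intermediate node `mammal` is an assumption added to show a chain longer than one step.

```python
# Each word points to its immediate hypernym (the more general term).
# "mammal" is an illustrative intermediate level, not taken from the list above.
HYPERNYM_OF = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Return every ancestor of `word`, nearest first."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_a(word, category):
    """True if `category` is a direct or indirect hypernym of `word`."""
    return category in hypernym_chain(word)
```

The relation is directional: `is_a("dog", "animal")` holds, while `is_a("animal", "dog")` does not.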

---

## 5. Pragmatic Level (Contextual Representation)

Describes how a word (or utterance) conveys intended meaning in context.

- **Speech act representation**: assertion, question, request, promise.
- **Deixis and reference**: interpretation relative to speaker, time, and place (`I`, `here`, `now`, `that`).
- **Implicature** (Gricean): what is suggested but not said explicitly.  
  - Example: “It’s cold in here” → request to close the window.
- **Presupposition**: background assumptions the utterance takes for granted.  
  - Example: “John stopped smoking” presupposes John used to smoke.
- **Information structure**: topic–comment, focus, contrastive focus.

---

## 6. Cognitive and Conceptual Level (Mental Representation)

Describes how word meaning is organized in the mind.

- **Conceptual representation**: mental concept linked to the word form.
- **Image schemas**: recurrent embodied patterns (container, source–path–goal, up–down).
- **Semantic network representation**: nodes and links among concepts (WordNet-like structures).
- **Frame-based knowledge representation**: slot–filler structures for events/situations.  
  - Example: `BUY` frame: `{buyer, seller, goods, price, time, place}`.

---

## 7. Computational and Vector-Based Representations

Describes algorithmic, learnable encodings used in NLP.

- **Bag-of-words / TF-IDF**: frequency-based, orderless vectors.
- **Static word embeddings**: dense, low-dimensional vectors (Word2Vec, GloVe, fastText).
- **Contextual embeddings**: token-level, context-dependent vectors (ELMo, BERT, GPT, RoBERTa).
- **Graph-based / knowledge-based representations**: ConceptNet, WordNet, AMR.
- **Dependency-based embeddings**: vectors learned from a word's syntactic (dependency) contexts rather than its linear neighbors, capturing function as well as content.
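
To make the bag-of-words / TF-IDF entry concrete, here is a minimal sketch of one common weighting variant: term frequency normalized by document length, multiplied by `log(N / df)`. The function name `tf_idf` and the toy documents in the usage note are assumptions; libraries typically apply smoothed variants of this formula.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With the three toy documents `["the", "cat", "sat"]`, `["the", "dog", "sat"]`, and `["the", "cat", "ran"]`, the word “the” occurs in every document and receives weight 0, while “dog” (one document) outweighs “sat” (two documents). Word order plays no role, which is the sense of “orderless” above.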

---

## 8. Discourse and Textual Level

Describes word behavior across multiple sentences or turns.

- **Discourse Representation Theory (DRT)**: models discourse referents, conditions, accessibility.
- **Coreference representation**: linking NPs/pronouns to antecedents.  
  - Example: “John came. He sat.” → `he = John`.
- **Rhetorical / discourse relations** (RST): contrast, cause, elaboration, background, explanation.
- **Information state / context update**: how each utterance updates the common ground.

---

## Summary Table

| **Level**        | **Representation Type**                 | **Example / Focus**                                  |
|------------------|-----------------------------------------|------------------------------------------------------|
| Phonological     | Phonetic / Phonemic / Prosodic          | `/kæt/`                                              |
| Morphological    | Morphemic / Inflectional / Derivational | `un- + break + -able`                                |
| Syntactic        | Phrase structure / Dependency           | `NP → Det + N`, dependency: `eat → obj → apple`      |
| Semantic         | Predicate logic / Vector / Frame        | `eat(x, y)`, Word2Vec("eat"), frame: COMMERCIAL_EVENT |
| Pragmatic        | Speech act / Implicature / Deixis       | “Can you pass the salt?” → request                   |
| Cognitive        | Conceptual / Frame / Network            | `BUY` frame: `{buyer, seller, goods, price}`         |
| Computational    | Embeddings / Graphs / TF-IDF            | BERT contextual vector                               |
| Discourse        | DRT / Coreference / Rhetorical          | “John came. He sat.” → `he = John`                   |

---

### Note on Integration

A single lexical item (a “word”) in a full-fledged NLP or linguistic model is therefore not just a string; it can be mapped **simultaneously** to:

1. a **phonological form** (for speech),
2. a **morphological analysis** (for grammar),
3. a **syntactic role** (for sentence integration),
4. a **semantic representation** (for meaning),
5. a **pragmatic interpretation** (for use in context),
6. a **conceptual node** (for cognition),
7. a **computational vector** (for machine learning), and
8. a position in a **discourse model** (for cross-sentence coherence).

This layered view is what makes modern lexical/semantic modeling compatible with both **linguistic theory** and **data-driven NLP**.
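
One way to picture the layered view is a record with one slot per level. The schema below is a hypothetical sketch (the `LexicalEntry` class and its field names are assumptions); it covers only the levels that can be stored per word, since pragmatic and discourse interpretations depend on the utterance rather than the lexicon.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One word with a slot per representational level (hypothetical schema)."""
    form: str                # orthographic string
    phonemic: str            # phonological form
    morphemes: list          # morphological analysis
    pos: str                 # syntactic category
    semantic_features: set   # componential semantics
    embedding: list = field(default_factory=list)  # would be filled by a trained model


cat = LexicalEntry(
    form="cat",
    phonemic="/kæt/",
    morphemes=["cat"],
    pos="noun",
    semantic_features={"+ANIMATE", "+COUNT"},
)
```

The `embedding` slot is left empty on purpose: its values would come from a trained model, not from hand annotation.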


## Core Linguistic Representations Crucial for LLMs

LLMs do not draw on all of these representations equally. The ones they do use are not hand-built; they emerge implicitly from data-driven training and fall into five key levels.

### 1. Lexical & Morphological Representation
What it is:  
Understanding and generating correct word forms, subwords, and morphemes.  

In LLMs:  
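Implemented through subword tokenization methods and tools such as  
- Byte-Pair Encoding (BPE)  
- WordPiece  
- SentencePiece (a toolkit that implements BPE and unigram language-model tokenization)  

Tokens are subword units chosen by frequency statistics, so they sometimes coincide with morphemes (`un`, `break`, `able`) but are not guaranteed to.  
Models implicitly learn morphological regularities (e.g., plural forms, tense patterns) on top of these units.  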

Why it matters:  
Subword tokenization lets LLMs represent rare words, creative word formations, and text in many languages with few or no out-of-vocabulary tokens, which is essential for open-vocabulary generation.
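
The core of BPE training is a loop that repeatedly merges the most frequent adjacent symbol pair. The sketch below is a toy version under simplifying assumptions: it starts from characters, takes a hypothetical `word_counts` dictionary as its corpus, and omits details that production tokenizers add (end-of-word markers, byte-level fallback, pre-tokenization rules).

```python
from collections import Counter


def most_frequent_pair(corpus):
    """corpus maps a tuple of symbols to its frequency; return the top adjacent pair."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(corpus, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Return the learned merge list and the corpus segmented with those merges."""
    corpus = {tuple(word): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(corpus)
        if pair is None:  # every word is already a single symbol
            break
        corpus = merge_pair(corpus, pair)
        merges.append(pair)
    return merges, corpus
```

Running `learn_bpe({"low": 5, "lower": 2, "lowest": 3}, 2)` fuses the shared stem into a single `low` symbol after two merges, leaving `e`, `r`, `s`, `t` as separate symbols. The stem emerges because it is frequent, not because the algorithm knows any morphology.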

---

### 2. Syntactic Representation
What it is:  
Capturing grammatical structure, word order, and hierarchical dependencies.  

In LLMs:  
Emergent in attention patterns and transformer layers. Probing studies suggest a rough, overlapping progression rather than sharp boundaries:  
- Early layers tend to capture POS and local dependencies.  
- Middle layers tend to encode syntactic trees and long-range relations.  
- Late layers shift toward semantic coherence.  
There are no explicit grammar rules, but syntax-like regularities emerge.  

Empirical findings:  
- Some attention heads align closely with particular dependency relations.  
- Probing studies show that constituency and dependency structure can be predicted from internal vectors with high accuracy.  

Why it matters:  
Syntax gives structure for compositional meaning, sentence coherence, and grammatical fluency.

---

### 3. Semantic Representation
What it is:  
Capturing meaning, relations, and world knowledge.  

In LLMs:  
Encoded in dense contextual embeddings (hidden states).  
Supports polysemy disambiguation: the same word receives a different vector, and so a different reading, in each context.  
Distributed across billions of parameters as statistical associations rather than stored as an explicit graph of meanings.  

Analogous to:  
- Distributional semantics (co-occurrence meaning)  
- Frame semantics (event knowledge)  
- Predicate–argument structures (latent in embeddings)  

Why it matters:  
Semantic representation underlies most of what LLMs are valued for: inference, analogy, and abstraction.

---

### 4. Pragmatic & Discourse Representation
What it is:  
Understanding intent, context, reference, and coherence across sentences.  

In LLMs:  
Modeled through long-context windows and autoregressive training.  

Emergent phenomena include:  
- Coreference resolution (who/what “it” refers to)  
- Implicit reasoning about speaker intent  
- Maintaining topic continuity  

Although imperfect, LLMs reproduce much of the behavior that Discourse Representation Theory (DRT) models, without explicit symbolic structures such as discourse referents.  

Why it matters:  
This is how LLMs maintain conversation flow, tone, and logical narrative consistency.

---

### 5. World Knowledge Representation
What it is:  
Integration of semantic meaning with factual knowledge and commonsense.  

In LLMs:  
Stored implicitly in the model's parameters rather than in an explicit database.  
Encodes relations similar to those in semantic networks (WordNet, ConceptNet), but at much larger scale.  
Enhanced by instruction tuning and reinforcement learning from human feedback (RLHF), linking linguistic form to human intent.  

Why it matters:  
World knowledge enables reasoning, answering factual questions, and contextual understanding — bridging semantics and pragmatics.

---

### Summary: Linguistic Levels vs. LLM Encoding

| Linguistic Level | LLM Relevance | Mechanism in Model | Example Behavior |
|------------------|----------------|--------------------|------------------|
| Phonological | Minimal | Tokenization abstracts text away from sound | Not used directly |
| Morphological | High | Subword tokenization | Relates “running” and “runner” to “run” |
| Syntactic | High | Attention patterns, layer hierarchy | Subject–verb agreement |
| Semantic | Very High | Contextual embeddings | Word meaning in context |
| Pragmatic / Discourse | High | Long-context modeling | Maintains conversation coherence |
| World Knowledge | Very High | Parameterized memory | Answers factual questions |
| Cognitive / Conceptual | Emergent | Vector clusters loosely analogous to mental concepts | Analogy, metaphor |
| Formal / Logical Semantics | Partial | Learned approximations | Reasoning, entailment |
| Phonetic / Prosodic | Not used | Text-only input | No sound representation |

---

### In Short

LLMs simulate human linguistic competence mainly through five emergent representational layers:

Lexico-Morphological → Syntactic → Semantic → Pragmatic/Discourse → World Knowledge  

These layers correspond to the human linguistic hierarchy,  
but they emerge statistically, not symbolically.


## First: Which level deals with the meaning of a word on its own?

The level concerned with the individual word is **Lexical Semantics**.  
It deals with what the word means in isolation, without considering the context.

For example, the Arabic word “ʿAyn” (عين) can mean:

- The organ of sight  
- A water spring  
- A spy  
- Pure gold (in Classical Arabic)

All these are independent meanings within the lexicon — studied by **Lexical Semantics**.

At this level, the key question is:  
“What does the word mean by itself in the dictionary?”

---

## Second: What gives the word its meaning within context?

The aspect responsible for this is **Compositional (Contextual) Semantics**, sometimes supported by **Pragmatics**.

This level explains the meaning within a sentence —  
that is, how a word’s meaning changes depending on surrounding words.

Examples:

- “The ʿAyn is beautiful” → the organ of sight  
- “The ʿAyn burst forth” → a water source  
- “I saw an ʿAyn from the intelligence service” → a spy  

Here, **syntactic** and **semantic** context together determine meaning:

- Syntax defines grammatical relations between words.  
- Semantics interprets the meaning that results from those relations.
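
Context-driven sense selection can be approximated by counting overlaps between the surrounding words and cue words for each sense, in the spirit of simplified Lesk-style disambiguation. The sketch below is illustrative only: the `SENSE_CUES` table is hand-written for this example and `disambiguate` is a hypothetical helper, not a real lexicon or WSD system.

```python
# Hand-picked cue words for three senses of ʿAyn (illustrative, not a real lexicon).
SENSE_CUES = {
    "eye": {"beautiful", "see", "blink", "face"},
    "spring": {"water", "burst", "flow", "drink"},
    "spy": {"intelligence", "agent", "secret", "enemy"},
}


def disambiguate(context_tokens):
    """Pick the sense whose cue words overlap most with the context, else None."""
    context = {token.lower() for token in context_tokens}
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For instance, `disambiguate("the ayn burst with water".split())` selects `"spring"`, while a bare `["ayn"]` with no context returns `None`: without surrounding words, only the lexical level is available and the ambiguity remains.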

---

## Third: Where does Syntax fit in?

Syntax does not deal with meaning itself;  
it focuses on the arrangement of words and their grammatical relations (subject, object, adjective, etc.).

Example:

- “The boy ate the apple”  
- “Ate the boy the apple”  
- “The apple ate the boy”

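The first two sentences differ only in word order: the second mirrors the verb-first order that Arabic allows, and in both the boy is the eater. The third swaps subject and object, so its **syntactic structure** assigns the roles differently and the meaning changes.  
However, syntax alone does not know that apples cannot eat boys; that is the domain of **Semantics**.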
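
The division of labor can be sketched as a two-step check: syntax supplies who fills which role, and a semantic layer tests those fillers against the verb's selectional restrictions. Everything in this sketch is a hand-written assumption for illustration, including the `FEATURES` and `SELECTIONAL_RESTRICTIONS` tables and the `is_plausible` helper.

```python
# Semantic features of a few nouns (hand-written for illustration).
FEATURES = {
    "boy": {"+ANIMATE", "+HUMAN"},
    "apple": {"+EDIBLE"},
}

# What each verb requires of the nouns filling its grammatical roles.
SELECTIONAL_RESTRICTIONS = {
    "ate": {"subject": "+ANIMATE", "object": "+EDIBLE"},
}


def is_plausible(verb, roles):
    """roles maps a grammatical role to the noun that fills it."""
    restrictions = SELECTIONAL_RESTRICTIONS.get(verb, {})
    return all(
        feature in FEATURES.get(roles.get(role, ""), set())
        for role, feature in restrictions.items()
    )
```

Both `{"subject": "boy", "object": "apple"}` and `{"subject": "apple", "object": "boy"}` are well-formed role assignments as far as syntax is concerned, but only the first passes `is_plausible("ate", ...)`.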

---

## Summary

| **Level** | **Concern** | **Example / Role** |
|------------|--------------|--------------------|
| **Lexical Semantics** | The meaning of the word itself | “ʿAyn = eye / spring / spy” |
| **Syntactic** | The position of the word and its relation to others in the sentence | “The eye saw the boy” vs “The boy saw the eye” |
| **Compositional Semantics** | How meanings interact within the sentence | “The boy ate the apple” → verb + subject + object meaning |
| **Pragmatics** | The intended meaning in real-world context or situation | “Can you close the door?” = a request, not a question |
| **Discourse Semantics** | Meaning extending across multiple sentences | “I saw a man. He was tall.” → “He” refers to the man |

---

## Simple Summary

- If you ask, “What does the word itself mean?” → **Lexical Semantics**  
- If you ask, “How does its meaning change inside the sentence?” → **Compositional Semantics**  
- If you ask, “What does the speaker actually mean?” → **Pragmatics**  
- If you ask, “How are sentences connected within a paragraph?” → **Discourse Semantics**
