# Chronological Development of Word Embeddings

---

## 1️⃣ Linguistic & Theoretical Origins (1950s–1970s)

- **1953 – Luhn, H. P.**  
  *A New Method of Recording and Searching Information.*  
  → Laid early groundwork for computational text search.

- **1957 – Firth, J. R.**  
  *A Synopsis of Linguistic Theory 1930–1955.*  
  → Popularized the **distributional hypothesis** (articulated earlier by Harris, 1954):  
  > “You shall know a word by the company it keeps.”

- **1957 – Osgood, C. E. et al.**  
  *The Measurement of Meaning.*  
  → Proposed **semantic differential scales**, precursors to semantic spaces.

- **1962 – Salton, G.**  
  *Some Experiments in the Generation of Word and Document Associations.*  
  → Developed the **Vector Space Model (VSM)** for information retrieval.

- **1975 – Salton, Wong & Yang.**  
  *A Vector Space Model for Automatic Indexing.*  
  → Represented documents and queries as term vectors in one space, so that similarity can be measured geometrically.
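
The vector space idea in the entries above can be illustrated with a minimal sketch. It assumes raw term counts as weights and whitespace tokenization, which are simplifications chosen here and not the weighting scheme of the original papers; the documents and the helper names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse term-frequency vector (term -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents, invented for illustration only.
docs = {
    "d1": "information retrieval with term vectors",
    "d2": "term vectors for document retrieval",
    "d3": "semantic differential rating scales",
}

query = term_vector("document retrieval")
ranking = sorted(
    docs,
    key=lambda d: cosine_similarity(query, term_vector(docs[d])),
    reverse=True,
)
print(ranking)  # ['d2', 'd1', 'd3']
```

Documents sharing more terms with the query point in a closer direction and rank higher; a document with no shared terms scores zero.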

---

## 2️⃣ Statistical Semantics & Latent Representation (1980s–2000s)

- **Late 1980s – Deerwester et al.**  
  Developed **Latent Semantic Analysis (LSA)** with **SVD** for reduced-dimensional semantic spaces.

- **2000 – Kanerva, Kristoferson & Holst.**  
  *Random Indexing of Text Samples for Latent Semantic Analysis.*  
  → Scalable alternative to SVD.

- **2001 – Karlgren & Sahlgren.**  
  *From Words to Understanding.*  
  → Refined distributional semantics.

- **2005 – Sahlgren.**  
  *An Introduction to Random Indexing.*  
  → Formalized the **Random Indexing** technique.

- **2008 – Sahlgren et al.**  
  *Permutations as a Means to Encode Order in Word Space.*  
  → Introduced order sensitivity into semantic spaces.
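
As a rough illustration of Random Indexing, the sketch below gives every word a sparse random index vector and builds each word's context vector by summing the index vectors of its neighbours. The dimensionality, number of nonzero entries, window size, and function names are assumptions chosen for readability, not values taken from the cited papers.

```python
import random


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzeros)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def random_indexing(sentences, dim=64, nonzeros=4, window=2, seed=0):
    """Accumulate, for each word, the index vectors of its neighbours."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Assign an index vector the first time a word is seen.
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = make_index_vector(dim, nonzeros, rng)
                context_vectors[tok] = [0] * dim
        # Add the neighbours' index vectors to each word's context vector.
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    neighbour = index_vectors[tokens[j]]
                    context_vectors[tok] = [
                        a + b for a, b in zip(context_vectors[tok], neighbour)
                    ]
    return index_vectors, context_vectors


# Hypothetical toy corpus, invented for illustration only.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
index_vectors, context_vectors = random_indexing(corpus)
print(len(context_vectors["cat"]))  # 64
```

The vector length stays at `dim` however large the vocabulary grows, and no co-occurrence matrix is built or decomposed, which is what makes the approach an incremental alternative to SVD.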

---

## 3️⃣ Neural Revolution: Distributed Representations (2000–2006)

- **2000–2003 – Bengio et al.**  
  *A Neural Probabilistic Language Model.*  
  → Among the first **neural network–based language models** to learn **dense word vectors**.

- **2002 – Vinokourov et al.**  
  *Cross-Language Correlation Analysis.*  
  → Introduced **bilingual embeddings**.

- **2004 – Lavelli et al.**  
  *Distributional Term Representations.*  
  → Experimental comparison of distributional term representations.

- **2005 – Morin & Bengio.**  
  *Hierarchical Probabilistic Neural Network Language Model.*  
  → Introduced **hierarchical softmax** for efficient large-vocabulary learning.

- **2006 – Bengio et al.**  
  Consolidated **distributed representation** as a core NLP foundation.
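
To make the hierarchical softmax idea concrete, here is a small sketch in which a word's probability is the product of binary left/right decisions along its path in a tree, so scoring one word costs on the order of log V steps instead of V. The balanced four-word tree, the node vectors, and the hidden vector are all hypothetical values chosen for illustration; the tree construction used by Morin & Bengio is not reproduced here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical balanced tree over a 4-word vocabulary.
# Each word is reached by a path of (internal_node_id, go_left) decisions.
PATHS = {
    "cat": [(0, True), (1, True)],
    "dog": [(0, True), (1, False)],
    "car": [(0, False), (2, True)],
    "bus": [(0, False), (2, False)],
}

# Hypothetical parameters: one vector per internal node.
NODE_VECTORS = {
    0: [0.5, -0.2],
    1: [-0.3, 0.8],
    2: [0.1, 0.4],
}


def word_probability(word, hidden):
    """P(word | hidden) as a product of binary decisions along the tree path."""
    prob = 1.0
    for node_id, go_left in PATHS[word]:
        p_left = sigmoid(dot(NODE_VECTORS[node_id], hidden))
        prob *= p_left if go_left else (1.0 - p_left)
    return prob


hidden = [1.0, 2.0]  # hypothetical context representation
probs = {w: word_probability(w, hidden) for w in PATHS}
print(sum(probs.values()))  # sums to 1 up to floating-point error
```

Because each internal node splits its probability mass between two children, the leaf probabilities form a valid distribution without ever normalizing over the whole vocabulary.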

---

## 4️⃣ Deep & Scalable Neural Embeddings (2008–2014)

- **2008 – Collobert & Weston.**  
  *A Unified Architecture for Natural Language Processing.*  
  → Showed that **deep multitask learning** produces reusable embeddings.

- **2009 – Mnih & Hinton.**  
  *A Scalable Hierarchical Distributed Language Model.*

- **2010 – Reisinger & Mooney.**  
  *Multi-Prototype Vector-Space Models of Word Meaning.*  
  → Introduced **multi-sense embeddings**.

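- **2012 – Huang, Socher, Manning & Ng.**  
  *Improving Word Representations via Global Context and Multiple Word Prototypes.*  
  → Combined **global context** with **multiple word prototypes**.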

- **2013 – Mikolov et al.**  
  *Distributed Representations of Words and Phrases and Their Compositionality.*  
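  → Extended **Word2Vec** (CBOW & Skip-gram, introduced earlier that year in *Efficient Estimation of Word Representations in Vector Space*) with **negative sampling** and phrase vectors, making training practical on very large corpora.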

- **2013 – Lebret & Collobert.**  
  *Word Embeddings through Hellinger PCA.*

- **2014 – Levy & Goldberg.**  
  *Neural Word Embedding as Implicit Matrix Factorization.*  
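  → Showed that Skip-gram with negative sampling implicitly factorizes a shifted **PMI** word-context matrix, linking Word2Vec to **matrix factorization** theory.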
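
The matrix that Levy & Goldberg connect to Skip-gram with negative sampling has entries PMI(word, context) − log k, where k is the number of negative samples. The sketch below computes those entries from a list of (word, context) pairs. The pairs and the function name `shifted_pmi` are hypothetical, and only observed pairs are stored, since PMI is undefined (negative infinity) for pairs that never co-occur.

```python
import math
from collections import Counter


def shifted_pmi(pairs, k=5):
    """Return {(word, context): PMI(word, context) - log(k)} for observed pairs."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    context_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    matrix = {}
    for (w, c), n_wc in pair_counts.items():
        # PMI = log( P(w, c) / (P(w) * P(c)) ), estimated from counts.
        pmi = math.log(n_wc * total / (word_counts[w] * context_counts[c]))
        matrix[(w, c)] = pmi - math.log(k)
    return matrix


# Hypothetical (word, context) pairs, invented for illustration only.
pairs = [
    ("cat", "purrs"),
    ("cat", "purrs"),
    ("cat", "sleeps"),
    ("dog", "barks"),
    ("dog", "sleeps"),
]

m = shifted_pmi(pairs, k=1)  # with k=1 the shift is zero, giving plain PMI
print(round(m[("cat", "purrs")], 3))  # 0.511
```

Under this view, the dot product of a trained word vector and context vector approximates the corresponding matrix entry, so larger k shifts every target value downward by the same constant.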

---

## 5️⃣ Multi-Sense & Contextual Representation (2014–2019)

- **2014 – Neelakantan et al.**  
  *Efficient Non-Parametric Estimation of Multiple Embeddings per Word.*

- **2015 – Li & Jurafsky.**  
  *Do Multi-Sense Embeddings Improve Natural Language Understanding?*  
  → Empirical evaluation of multi-sense embeddings on downstream tasks.

- **2015 – Asgari & Mofrad.**  
  *Continuous Distributed Representation of Biological Sequences.*  
  → **BioVec / ProtVec / GeneVec**.

- **2015 – Kiros et al.**  
  *Skip-Thought Vectors.*  
  → Sentence-level embeddings.

- **2016 – Bolukbasi et al.**  
  *Man is to Computer Programmer as Woman is to Homemaker?*  
  → Uncovered **gender bias** in embeddings.

- **2018 – Akbik et al.**  
  *Contextual String Embeddings for Sequence Labeling (Flair).*  

- **2018 – Camacho-Collados & Pilehvar.**  
  *From Word to Sense Embeddings: A Survey.*

- **2018 – Ruas et al.**  
  *Multi-Sense Embeddings via Word Sense Disambiguation.*

- **2019 – Devlin et al.**  
  *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.*  
  → Popularized deep **contextual embeddings**: a word's vector **depends on its surrounding context** rather than being fixed per word type.

- **2019 – Reimers & Gurevych.**  
  *Sentence-BERT.*  
  → Siamese/triplet network adaptation of BERT for producing sentence embeddings.

- **2019 – Reif et al.**  
  *Visualizing and Measuring the Geometry of BERT.*

---

## 6️⃣ Specialized & Applied Embeddings (2015–2021)

- **2015 – Ghassemi et al.**  
  *Clinical Sentiment Vector Representations.*  
  → Domain adaptation to medicine.

- **2021 – Rabii & Cook.**  
  *Word Embeddings of Gameplay Data.*  
  → Applied embeddings to **game dynamics**.

- **2021 – Lucy & Bamman.**  
  *Characterizing English Variation Across Social Media with BERT.*  
  → Sociolinguistic contextualization.

---

## 7️⃣ Ethical, Fairness & Bias Mitigation (2016–2022)

- **2016 – Bolukbasi et al.**  
  Revealed embedding bias — foundation for **fairness-aware NLP**.

- **2017 – Zhao et al.**  
  *Reducing Gender Bias Amplification.*  

- **2018 – Zhao et al.**  
  *Learning Gender-Neutral Word Embeddings.*

- **2020 – Dieng, Ruiz & Blei.**  
  *Topic Modeling in Embedding Spaces.*

- **2022 – Petreski & Hashim.**  
  *Word Embeddings Are Biased. But Whose Bias Are They Reflecting?*  

---

## 8️⃣ Software Ecosystem

- **Word2Vec** – Google (Mikolov et al., 2013)  
- **GloVe** – Stanford (Pennington et al., 2014)  
- **fastText** – Facebook AI Research  
- **ELMo** – AllenNLP (Peters et al., 2018)  
- **Flair** – Akbik et al. (2018)  
- **BERT / Sentence-BERT** – Google & UKP Lab (2019)  
- Frameworks: **Gensim**, **Indra**, **Deeplearning4j**

Visualization methods: **PCA**, **t-SNE**, **Embedding Projector (2016)**.

---

## Summary Timeline

| Period | Core Idea | Representative Works |
|--------|------------|----------------------|
| 1950s–1970s | Distributional hypothesis, VSM | Firth (1957), Salton (1975) |
| 1980s–2000s | Latent semantics | Deerwester (1988), Sahlgren (2005) |
| 2000–2006 | Neural probabilistic LM | Bengio (2003), Morin & Bengio (2005) |
| 2008–2014 | Deep scalable embeddings | Collobert (2008), Mikolov (2013) |
| 2014–2019 | Contextual & multi-sense | Devlin (2019), Reimers (2019) |
| 2015–2021 | Applied/domain embeddings | Asgari (2015), Lucy (2021) |
| 2016–2022 | Ethical & fairness research | Bolukbasi (2016), Petreski (2022) |

---

