# Early Models Related to Markov/Neural Foundations (1900–1990)

## 1. Ising Model (1925)
- **Domain**: Statistical physics  
- **Description**: Models binary spins (+1 / –1) with local interactions.  
- **Relevance**: Inspired later energy-based models in AI (Hopfield networks, Boltzmann Machines).  
- **Connection to Markov**: The Ising model is a special case of a Markov Random Field: each spin's conditional distribution depends only on its neighbors.  
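
The local-interaction idea can be sketched in a few lines. This is a minimal illustration, assuming a 1D ring of spins with a single coupling constant; the names `ising_energy` and `gibbs_sweep` and the parameters `J` and `T` are chosen here for illustration and do not come from the source.

```python
import math
import random

def ising_energy(spins, J=1.0):
    """Energy of a 1D Ising ring: E = -J * sum_i s_i * s_{i+1} (periodic boundary).

    The ring geometry and single coupling J are illustrative assumptions.
    """
    n = len(spins)
    return -J * sum(spins[i] * spins[(i + 1) % n] for i in range(n))

def gibbs_sweep(spins, J=1.0, T=1.0, rng=random):
    """Resample each spin once from its conditional given its two neighbours."""
    n = len(spins)
    for i in range(n):
        # Only the two neighbours enter the conditional: the Markov property.
        field = J * (spins[i - 1] + spins[(i + 1) % n])
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field / T))
        spins[i] = 1 if rng.random() < p_up else -1
    return spins

aligned = [1, 1, 1, 1]
alternating = [1, -1, 1, -1]
print(ising_energy(aligned), ising_energy(alternating))
```

Aligned spins give the lowest energy and alternating spins the highest, which is the ferromagnetic behaviour the model was built to describe.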

---

## 2. McCulloch–Pitts Neuron (1943)
- **Domain**: Neuroscience-inspired computing  
- **Description**: Simplified model of a biological neuron, using binary threshold logic.  
- **Relevance**: First formal model of neural networks.  
- **Connection**: Like a Markov chain, it operates over discrete states in discrete time, although its updates are deterministic rather than probabilistic.  
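
Binary threshold logic is easy to show directly. The sketch below assumes a weighted-sum form with a negative weight standing in for an inhibitory input, which is a common simplification; the helper names `mp_neuron`, `AND`, `OR`, and `NOT` are illustrative.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Logic gates built from single threshold units (weights and thresholds chosen by hand).
def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)

def NOT(a):
    # A negative weight plays the role of an inhibitory input (a simplifying assumption).
    return mp_neuron([a], [-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
```

Nothing is learned here: the weights are fixed by the designer, which is the main difference from the trainable models that follow.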

---

## 3. Hebbian Learning Rule (1949)
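- **Concept**: Often summarized as “Cells that fire together, wire together” (a later paraphrase, not Hebb's own wording).  
- **Description**: Strengthens connections between co-activated neurons.  
- **Relevance**: Early rule for updating weights in networks.  
- **Connection**: Provided a local learning mechanism: each update depends only on the two units a connection joins. The rule itself is deterministic, so the parallel with Markov processes lies in locality, not randomness.  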
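
A minimal sketch of the rule, assuming the simplest product form (weight change proportional to pre-synaptic times post-synaptic activity). The name `hebbian_update` and the learning rate `lr` are illustrative.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights after one Hebbian step: w[i][j] += lr * pre[i] * post[j].

    weights[i][j] connects pre-synaptic unit i to post-synaptic unit j.
    The plain product form is an assumption; Hebb's book gives no formula.
    """
    return [[w + lr * x * y for w, y in zip(row, post)]
            for row, x in zip(weights, pre)]

weights = [[0.0, 0.0], [0.0, 0.0]]
# Only the first pre-synaptic unit is active, so only its outgoing weights grow.
print(hebbian_update(weights, pre=[1, 0], post=[1, 1], lr=0.5))
```

Because the plain rule only ever increases weights between co-active units, practical variants add some form of normalization or decay.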

---

## 4. Perceptron (Rosenblatt, 1958)
- **Description**: Linear classifier using weighted sums of inputs + threshold.  
- **Relevance**: First trainable machine-learning model inspired by biology.  
- **Limitations**: A single-layer perceptron cannot represent XOR, which is not linearly separable (shown by Minsky & Papert, 1969).  
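- **Connection**: Maps the current input directly to an output with no memory of earlier inputs; the resemblance to a transition in a Markov model is loose, since the mapping is deterministic.  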
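
The learning rule and the XOR limitation can both be seen in a short sketch. It assumes the standard error-correction update with 0/1 outputs; the names `predict`, `train_perceptron`, `AND_DATA`, and `XOR_DATA` are illustrative.

```python
def predict(w, b, x):
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge the weights by lr * error * input after each sample."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)  # -1, 0, or +1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND_DATA), ("XOR", XOR_DATA)):
    w, b = train_perceptron(data)
    correct = sum(predict(w, b, x) == t for x, t in data)
    print(name, f"{correct}/4 correct")
```

AND is linearly separable, so training settles on a separating line. No single line separates the XOR cases, so at least one of them is always misclassified however long training runs.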

---

## 5. Adaline & Delta Rule (Widrow & Hoff, 1960)
- **Description**: Linear unit trained with gradient descent (LMS rule).  
- **Relevance**: Introduced optimization for weight updates.  
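- **Connection**: Replaced the perceptron's thresholded error with a continuous one. The LMS rule updates weights one sample at a time, a form of stochastic approximation; the link to Markov models is otherwise loose.  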
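
A sketch of the LMS update, assuming a single linear unit and squared error. The names `lms_step` and `train_adaline` and the toy data (points on the line y = 2x + 1) are made up for illustration.

```python
def lms_step(w, b, x, target, lr=0.05):
    """One Widrow-Hoff update: a gradient step on 0.5 * (target - y)**2."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output, no threshold
    error = target - y
    w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w, b + lr * error

def train_adaline(data, epochs=500, lr=0.05):
    """Cycle through the samples, applying one LMS step per sample."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, target in data:
            w, b = lms_step(w, b, x, target, lr)
    return w, b

# Hypothetical noise-free data on the line y = 2x + 1.
data = [((x,), 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
w, b = train_adaline(data)
print(round(w[0], 3), round(b, 3))
```

The only change from the perceptron rule is that the error is measured before thresholding, which makes the loss differentiable and the update a true gradient step.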

---

## 6. Hopfield Network (1982)
- **Description**: Recurrent neural network with symmetric weights, converging to stable attractor states (energy minimization).  
- **Relevance**: Linked neural computation to statistical physics.  
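- **Connection**: Its energy function has the same form as the Ising model's, with learned symmetric weights in place of fixed couplings, which also ties it to Markov Random Fields.  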
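
Attractor recall can be sketched as follows, assuming ±1 units, Hebbian outer-product storage, and asynchronous updates. The names `train_hopfield`, `energy`, and `recall` and the six-unit pattern are illustrative.

```python
def train_hopfield(patterns):
    """Hebbian outer-product storage; symmetric weights with a zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / n
    return W

def energy(W, s):
    """E = -0.5 * sum_ij W[i][j] * s[i] * s[j], the same form as the Ising energy."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(W, state, sweeps=5):
    """Asynchronous updates: each unit takes the sign of its local field."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1  # ties broken toward +1 (a convention)
    return s

pattern = [1, -1, 1, -1, 1, -1]
noisy = [1, 1, 1, -1, 1, -1]  # second unit flipped
W = train_hopfield([pattern])
restored = recall(W, noisy)
print(restored == pattern, energy(W, noisy), energy(W, restored))
```

With symmetric weights and one-unit-at-a-time updates, the energy never increases, so the state settles into a stored pattern or another local minimum.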

---

## 7. Boltzmann Machine (Ackley, Hinton & Sejnowski, 1985)
- **Description**: Stochastic recurrent network; hidden and visible nodes with probabilistic activations.  
- **Relevance**: One of the first neural generative models with hidden units.  
- **Connection**: Samples its states with Gibbs sampling, a Markov Chain Monte Carlo (MCMC) method, which makes the link between Markov processes and neural networks explicit.  
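
The sampling step can be sketched as below. It assumes 0/1 units, symmetric weights, and a temperature `T`, and it covers sampling only, not the learning algorithm. The names `sigmoid` and `gibbs_step` are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_step(state, W, b, T=1.0, rng=random):
    """Resample every unit once from
    p(s_i = 1 | rest) = sigmoid((b[i] + sum_j W[i][j] * s_j) / T).
    """
    n = len(state)
    for i in range(n):
        net = b[i] + sum(W[i][j] * state[j] for j in range(n) if j != i)
        state[i] = 1 if rng.random() < sigmoid(net / T) else 0
    return state

# Hypothetical 3-unit network: units 0 and 1 excite each other, unit 2 is independent.
W = [[0.0, 2.0, 0.0],
     [2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
b = [-1.0, -1.0, 0.0]

random.seed(0)
state = [0, 0, 0]
for _ in range(5):
    state = gibbs_step(state, W, b)
    print(state)
```

Each new state depends only on the current one, so the sequence of states is a Markov chain whose stationary distribution is the Boltzmann distribution over network energies.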

---

## 8. Backpropagation (Rumelhart, Hinton, Williams, 1986)
- **Description**: Algorithm for training multi-layer networks via gradient descent, with gradients computed by the chain rule.  
- **Relevance**: Opened the path for deep learning.  
- **Connection**: Generalized learning beyond local updates such as Hebbian rules. Unlike Boltzmann machine learning, it is deterministic: gradients are computed exactly, with no sampling.  
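
The chain-rule computation can be written out by hand for a tiny network. This sketch assumes one hidden layer of tanh units, a linear output, and squared error; the names `forward`, `loss`, and `backprop` and the parameter layout are illustrative.

```python
import math

def forward(params, x):
    """Hidden tanh layer followed by a linear output unit."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = sum(W2[j] * h[j] for j in range(len(h))) + b2
    return h, y

def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2

def backprop(params, x, t):
    """Gradients of the loss for every parameter, applying the chain rule layer by layer."""
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    dy = y - t                                    # dL/dy
    dW2 = [dy * hj for hj in h]                   # output weights
    db2 = dy
    # Pass the error back through the output weights and the tanh derivative.
    dz = [dy * W2[j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    dW1 = [[dz[j] * xi for xi in x] for j in range(len(h))]
    db1 = dz
    return dW1, db1, dW2, db2

# Hypothetical parameters and a single training example.
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1], [0.5, -0.6], 0.2)
dW1, db1, dW2, db2 = backprop(params, x=[1.0, 2.0], t=1.0)
print(dW1, dW2)
```

A gradient-descent step would subtract a small multiple of each gradient from the matching parameter. A common sanity check is to compare these gradients with finite-difference estimates of `loss`.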

---

## 9. Time-Dependent Models (1986–1990)
- **Jordan Networks (1986)** and **Elman Networks (1990)**  
- **Description**: Introduced recurrent feedback to model sequences.  
- **Relevance**: Early recurrent architectures that led to later sequence models (LSTMs, GRUs); Transformers later replaced recurrence with attention.  
- **Connection**: Like Hidden Markov Models, these modeled sequential/temporal dependencies.  
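
Recurrent feedback can be sketched with an Elman-style step, in which the previous hidden state is fed back alongside the current input. This covers the forward pass only, with hand-picked weights; the names `elman_step` and `run_sequence` and the weight matrices are illustrative.

```python
import math

def elman_step(x, h_prev, W_xh, W_hh, b):
    """h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b), written with plain lists."""
    n = len(h_prev)
    return [math.tanh(sum(W_xh[j][i] * x[i] for i in range(len(x)))
                      + sum(W_hh[j][k] * h_prev[k] for k in range(n))
                      + b[j])
            for j in range(n)]

def run_sequence(xs, W_xh, W_hh, b):
    """Feed a sequence through the network, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x in xs:
        h = elman_step(x, h, W_xh, W_hh, b)
    return h

# Hypothetical weights: one input, two hidden (context) units.
W_xh = [[1.0], [0.5]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

# Same final input, different history, different hidden state.
print(run_sequence([[1.0], [0.0]], W_xh, W_hh, b))
print(run_sequence([[0.0], [0.0]], W_xh, W_hh, b))
```

As in a Hidden Markov Model, the hidden state summarizes the past. Here it is a deterministic continuous vector, where an HMM keeps a probability distribution over discrete states.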

---

## Summary
Between 1900 and 1990, the “Markov-like” predecessors in neural and deep learning were:

- **Ising Model (1925)** → Probabilistic physics model, inspired energy-based networks.  
- **McCulloch–Pitts Neuron (1943)** → First abstract neuron.  
- **Hebbian Learning (1949)** → First biologically motivated weight update.  
- **Perceptron (1958)** → First practical neural network.  
- **Adaline (1960)** → Gradient-based training.  
- **Hopfield Networks (1982)** → Energy minimization, attractor dynamics.  
- **Boltzmann Machines (1985)** → Stochastic generative model with MCMC.  
- **Backpropagation (1986)** → Training multi-layer nets.  
- **Jordan/Elman Networks (1986–1990)** → Recurrent sequence modeling.  


# Foundational Papers of Neural Network & Sequence Models (1906–2017)

## Ising Model (1925)
- Paper: *Beitrag zur Theorie des Ferromagnetismus*  
- Author: Ernst Ising  
- Venue: Zeitschrift für Physik, 1925  

## Markov Model (1906)
- Paper: *Extension of the Law of Large Numbers to Dependent Quantities*  
- Author: Andrey Markov  
- Venue: Bulletin of the Physico-Mathematical Society at Kazan University, 1906  

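## Hidden Markov Model (1966–1970)
- Papers: *Statistical Inference for Probabilistic Functions of Finite State Markov Chains* (1966); *A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains* (1970)  
- Authors: Leonard E. Baum and Ted Petrie (1966); Baum, Petrie, Soules, and Weiss (1970)  
- Venue: Annals of Mathematical Statistics, 1966 and 1970  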

## McCulloch–Pitts Neuron (1943)
- Paper: *A Logical Calculus of the Ideas Immanent in Nervous Activity*  
- Authors: Warren McCulloch, Walter Pitts  
- Venue: Bulletin of Mathematical Biophysics, 1943  

## Hebbian Learning Rule (1949)
- Book: *The Organization of Behavior: A Neuropsychological Theory*  
- Author: Donald O. Hebb  
- Publisher: Wiley, 1949  

## Perceptron (1958)
- Paper: *The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain*  
- Author: Frank Rosenblatt  
- Venue: Psychological Review, 1958  

## Adaline & Delta Rule (1960)
- Paper: *Adaptive Switching Circuits*  
- Authors: Bernard Widrow, Marcian Hoff  
- Venue: IRE WESCON Convention Record, 1960  

## Hopfield Network (1982)
- Paper: *Neural networks and physical systems with emergent collective computational abilities*  
- Author: John J. Hopfield  
- Venue: Proceedings of the National Academy of Sciences (PNAS), 1982  

## Boltzmann Machine (1985)
- Paper: *A Learning Algorithm for Boltzmann Machines*  
- Authors: David H. Ackley, Geoffrey E. Hinton, Terrence J. Sejnowski  
- Venue: Cognitive Science, 1985  

## Backpropagation (1986)
- Paper: *Learning representations by back-propagating errors*  
- Authors: David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams  
- Venue: Nature, 1986  

## Jordan Network (1986)
- Paper: *Attractor Dynamics and Parallelism in a Connectionist Sequential Machine*  
- Author: Michael I. Jordan  
- Venue: Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986  

## Elman Network (1990)
- Paper: *Finding Structure in Time*  
- Author: Jeffrey L. Elman  
- Venue: Cognitive Science, 1990  

## Vanilla RNN (1990s Formalization)
- Paper: *Learning long-term dependencies with gradient descent is difficult*  
- Authors: Yoshua Bengio, Patrice Simard, Paolo Frasconi  
- Venue: IEEE Transactions on Neural Networks, 1994  

Jordan and Elman proposed the first practical RNNs; this later paper by Bengio, Simard, and Frasconi analyzed why gradient descent struggles to learn long-term dependencies, the **vanishing gradient problem**.  

## LSTM (1997)
- Paper: *Long Short-Term Memory*  
- Authors: Sepp Hochreiter, Jürgen Schmidhuber  
- Venue: Neural Computation, 1997  

## GRU (2014)
- Paper: *Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation*  
- Authors: Kyunghyun Cho, Bart van Merriënboer, Çağlar Gülçehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio  
- Venue: EMNLP, 2014  

## Transformer (2017)
- Paper: *Attention Is All You Need*  
- Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin  
- Venue: NeurIPS, 2017  


# Foundational Papers in Neural & Markov Models (1900–1990)

| Year        | Author(s)                         | Title                                                                 | Venue / Publisher |
|-------------|-----------------------------------|----------------------------------------------------------------------|-------------------|
| 1906 / 1913 | Andrey Markov                     | *Extension of the Law of Large Numbers to Dependent Quantities* / *An Example of Statistical Investigation of Eugene Onegin* | Kazan University Physico-Mathematical Society Bulletin (1906) / Imperial Academy of Sciences, St. Petersburg (1913) |
| 1925        | Ernst Ising                       | *Beitrag zur Theorie des Ferromagnetismus*                           | Zeitschrift für Physik |
| 1943        | Warren McCulloch & Walter Pitts   | *A Logical Calculus of the Ideas Immanent in Nervous Activity*        | Bulletin of Mathematical Biophysics |
| 1949        | Donald O. Hebb                    | *The Organization of Behavior: A Neuropsychological Theory*           | Wiley (Book) |
| 1958        | Frank Rosenblatt                  | *The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain* | Psychological Review |
| 1960        | Bernard Widrow & Marcian Hoff     | *Adaptive Switching Circuits (ADALINE & Delta Rule)*                 | IRE WESCON Convention Record |
| 1966        | Leonard Baum & Ted Petrie         | *Statistical Inference for Probabilistic Functions of Finite State Markov Chains* | Annals of Mathematical Statistics |
| 1970        | Baum, Petrie, Soules & Weiss      | *A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains (Baum–Welch)* | Annals of Mathematical Statistics |
| 1982        | John J. Hopfield                  | *Neural Networks and Physical Systems with Emergent Collective Computational Abilities* | PNAS |
| 1985        | Ackley, Hinton & Sejnowski        | *A Learning Algorithm for Boltzmann Machines*                        | Cognitive Science |
| 1986        | Rumelhart, Hinton & Williams      | *Learning Representations by Back-Propagating Errors*                 | Nature |
| 1986        | Michael I. Jordan                 | *Serial Order: A Parallel Distributed Processing Approach*            | UCSD Cognitive Science Report 8604 |
| 1986        | Paul Smolensky                     | *Information Processing in Dynamical Systems: Foundations of Harmony Theory (Harmonium)* | MIT Press (*Parallel Distributed Processing*) |
| 1990        | Jeffrey L. Elman                  | *Finding Structure in Time*                                          | Cognitive Science |

---

This table groups the **landmark works** listed above:  
- **Statistical Physics/Markov Foundations:** Ising (1925), Markov (1906/1913).  
- **Neural Foundations:** McCulloch–Pitts (1943), Hebb (1949).  
- **Early Neural Models:** Perceptron (1958), ADALINE (1960).  
- **Markov Models:** Baum & Petrie (1966), Baum–Welch (1970).  
- **Energy-Based Models:** Hopfield (1982), Boltzmann Machines (1985), Smolensky’s Harmonium (1986).  
- **Learning Algorithms:** Backpropagation (1986).  
- **Recurrent Networks:** Jordan Net (1986), Elman Net (1990).  

This corpus shows the **evolution from statistical physics and Markov models to early neural and recurrent networks**, setting the stage for LSTMs (1997) and modern deep learning.
