Merge pull request #325 from alexhernandezgarcia/ahg/readme

[README] Suggestion of changes to the section Main Components of the GFlowNet Library
alexhernandezgarcia · Jun 18, 2024 · 45dfd2e · 45dfd2e
2 parents 3700b2e + 4d43b18
commit 45dfd2e
Showing 1 changed file with 10 additions and 10 deletions.
diff --git a/README.md b/README.md
@@ -20,29 +20,29 @@ We could define a reward function $R(x)$ as the number of cells occupied by piec
 
 ## Main Components of the GFlowNet Library
 
-The GFlowNet library comprises four core components, each playing a crucial role in the network's operation. Understanding these components is essential for effectively using and extending the library for your tasks. These components are the Environment, Proxy, Policies (Forward and Backward), and the GFlowNet Agent.
+The GFlowNet library comprises four core components: environment, proxy, policy models (forward and backward), and GFlowNet agent.
 
 ### Environment
 
-The Environment is the main and most important component of the GFlowNet Library. To illustrate this, consider a simple environment currently implemented in the library: the Scrabble environment. 
+The environment defines the state space $\mathcal{S}$ and action space $\mathbb{A}$ of a particular problem, for example the Tetris task. To illustrate the environment, let's consider an even simpler environment currently implemented in the library: the [Scrabble](https://en.wikipedia.org/wiki/Scrabble) environment, inspired by the popular board game.
 
-The Scrabble environment simulates a simple letter arrangement game where sequences are constructed by adding one letter at a time, up to a maximum sequence length (in our case 7). Each environment has State Representations and Actions. For instance, in the Scrabble enviroment, Each `State` is a list of indices corresponding to letters. These indices start from 1 and are padded with index 0 to denote unused slots up to the maximum length. For example, if our sequence length is 7, and our constructed word is `Alex`, it would be represented as `[1, 11, 4, 23, 0, 0, 0]`. The library includes helper functions that automatically format and convert states to and from a human-readable format. 
+The Scrabble environment simulates a simple letter arrangement game where words are constructed by adding one letter at a time, up to a maximum sequence length (typically 7). The action space is therefore the set of all English letters plus a special end-of-sequence (EOS) action, and the state space is the set of all possible words with up to 7 letters. We can represent each `state` as a list of indices corresponding to the letters, padded with zeroes up to the maximum length. For example, the state for the word "CAT" would be represented as `[3, 1, 20, 0, 0, 0, 0]`. Actions in the Scrabble environment are single-element tuples containing the index of the letter to add, and the EOS action is `(-1,)`.
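The state and action encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `word_to_state`, `state_to_word` and `step` are hypothetical and are not the library's API, and letters are assumed to be indexed alphabetically (A=1, ..., Z=26), which is consistent with the "CAT" example.

```python
import string

MAX_LENGTH = 7  # maximum word length, as in the example above
PAD = 0         # padding index for unused slots
EOS = (-1,)     # end-of-sequence action


def word_to_state(word, max_length=MAX_LENGTH):
    """Encode a word as letter indices (A=1, ..., Z=26), zero-padded."""
    indices = [string.ascii_uppercase.index(c) + 1 for c in word.upper()]
    if len(indices) > max_length:
        raise ValueError(f"word is longer than {max_length} letters")
    return indices + [PAD] * (max_length - len(indices))


def state_to_word(state):
    """Decode a zero-padded list of letter indices back into a word."""
    return "".join(string.ascii_uppercase[i - 1] for i in state if i != PAD)


def step(state, action):
    """Apply an action (hypothetical helper): returns (new_state, done)."""
    if action == EOS:
        return list(state), True
    if PAD not in state:
        raise ValueError("the word is already at maximum length")
    new_state = list(state)
    new_state[new_state.index(PAD)] = action[0]  # fill the first free slot
    return new_state, False


state = word_to_state("CA")
state, done = step(state, (20,))  # add the letter "T"
print(state, state_to_word(state), done)
```

Here a trajectory is just a sequence of `step` calls that ends when the EOS action is taken.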
 
-``Actions`` in the Scrabble environment are single-element tuples containing the index of the letter to be added to the sequence. For instance, the end of the sequence (EOS) action is denoted by (-1,). The tuple format allows us to represent more than single action, because certain enviroments could have multiple actions. 
-
-In the library, we make it easy adding new enviroments for your own task. In the documentation, we show how to do this seamlessly. You can also watch a live coding tutorial on how to add your custom enviroment [here](https://www.youtube.com/watch?v=tMVJnzFqa6w&t=5h22m35s)
+Using the gflownet library for a new task will typically require implementing your own environment. The library is designed to make such extensions as easy as possible, and the documentation shows how to do it step by step. You can also watch [this live-coding tutorial](https://www.youtube.com/watch?v=tMVJnzFqa6w&t=5h22m35s) on how to code the Scrabble environment.
 
 ### Proxy
 
-The Proxy plays a crucial role in computing rewards for the actions taken within an environment. In other words, In the context of GFlowNets, the proxy can be thought of as a transformation function `R(x) = g(e(x))`, where `e(x)` represents an encoding or transformation or computes the score of the generated output `x`, and `g` translates this into a reward (i.e. `R(x)`). For example, if the word `Alex` is sampled in our Scrabble environment and is valid in our vocabulary, it might receive a score of 39. If `g` is the identity function, then our reward would directly be equal to the proxy score (i.e. `e(x)`). While in many environments the proxy functions is a simple scorer, in more complex settings (like molecule generation where it could be an energy function), we consistently refer to it as the Proxy in the GFlowNet library.
+We use the term "[proxy](gflownet/proxy/base.py)" to refer to the function or model that provides the rewards for the states of an environment. In the context of GFlowNets, the proxy can be thought of as a function $E(x)$ from which the reward is derived: $R(x) = g(E(x))$, where $g$ is a function that transforms the proxy values into non-negative rewards such that "the higher the reward the better". For example, we can implement a proxy that simulates the score of a word in the Scrabble game. That is, the [`ScrabbleScorer`](gflownet/proxy/scrabble.py) proxy computes the sum of the scores of the letters of a word. For the word "CAT" that is $E(x) = 3 + 1 + 1 = 5$. While in many environments the proxy function is a simple scorer, more complex settings like molecule or [crystal generation](gflownet/proxy/crystals/dave.py) may use proxies that represent the energy or a property predicted by a pre-trained machine learning model.
+
+Adapting the gflownet library for a new task will also likely require implementing your own proxy, which is usually fairly simple, as illustrated in the documentation.
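As a rough sketch of the relation $R(x) = g(E(x))$, the snippet below scores a word by summing per-letter values and then applies $g$. It assumes the standard English Scrabble tile values and an identity $g$ by default; the names `LETTER_SCORES`, `proxy` and `reward` are hypothetical, and the actual `ScrabbleScorer` in the library may be implemented differently.

```python
# Standard English Scrabble tile values (an assumption for this sketch).
LETTER_SCORES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def proxy(word):
    """E(x): sum of the scores of the letters of a word."""
    return sum(LETTER_SCORES[c] for c in word.upper())


def reward(word, g=lambda e: e):
    """R(x) = g(E(x)); g is the identity unless another function is given."""
    return g(proxy(word))


print(proxy("CAT"))   # 3 + 1 + 1 = 5
print(reward("CAT"))  # 5 with the identity g
```

Passing a different `g`, for instance `lambda e: e ** 2`, changes how sharply the reward favours high-scoring words without touching the proxy itself.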
 
-### Policies (Forward and Backward)
+### Policy models
 
-The policies are neural networks that model the probability distributions of possible actions given a current state. They are key to deciding the next state given previous state in the network's exploration of the environment. Both forward and backward policies receive the current state as input and output a flow distribution over possible actions. We use the term "flow" here, because the idea of GFlowNet is to flow a sequence of intermediate steps before generating the final object `x` (e.g. to generate `x` we might take the steps `s_1 -> s_2 -> s_3 -> ... -> x`). Particularly, the forward policy determines the next state, while the backward policy determines the previous state (i.e. helps retrace steps to a previous state).
+The policy models are neural networks that model the forward and backward transitions between states, $F_{F_{\theta}}(s_t \rightarrow s_{t+1})$ (forward) and $F_{B_{\theta}}(s_{t+1} \rightarrow s_t)$ (backward). These models take a state as input and output a distribution over the actions in the action space. For continuous environments, the outputs are the parameters of a probability distribution to sample continuous-valued actions. For many tasks, simple multi-layer perceptrons with a few layers do the job, but technically any architecture could be used as a policy model.
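For a discrete environment such as Scrabble, turning the policy output into an action can be sketched as follows. The logits here are placeholder values standing in for the output of a neural network, and `softmax` and `sample_action` are hypothetical helpers written for this illustration, not functions of the library.

```python
import math
import random


def softmax(logits):
    """Convert unnormalised scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, actions, rng=random):
    """Sample one action according to the softmax of the logits."""
    probs = softmax(logits)
    return rng.choices(actions, weights=probs, k=1)[0]


# Action space of the Scrabble example: 26 letters plus EOS.
actions = [(i,) for i in range(1, 27)] + [(-1,)]
# Placeholder logits; in practice they would come from the forward policy.
logits = [0.0] * len(actions)
print(sample_action(logits, actions, rng=random.Random(0)))
```

With all logits equal, every action is equally likely; a trained policy would concentrate probability on actions that lead to high-reward words.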
 
 ### GFlowNet Agent
 
-The GFlowNet Agent is the central component that ties all others together. It orchestrates the interaction between the environment, policies, and proxy to conduct training and generation tasks. The agent manages the training setup, action sampling, trajectory generation, and metrics logging. Some of the features and functionalities of the agent are initializing and configuring the environment and proxy to ensure they are ready for training and evaluation. The agent also manages both forward and backward policies to determine the next actions based on the current state. The agent can utilize the various types of loss functions implemented in the library, such as flow matching, trajectory balance, and detailed balance to optimize model's performance during training. 
+The GFlowNet Agent is the central component that ties all others together. It orchestrates the interaction between the environment, policies, and proxy, as well as other auxiliary components such as the Evaluator and the Logger. The GFlowNet agent can construct training batches by sampling trajectories, optimise the policy models via gradient descent, compute evaluation metrics, log data to [Weights & Biases](https://wandb.ai/), etc. The agent can be configured to optimise any of the following loss functions implemented in the library: flow matching (FM), trajectory balance (TB), detailed balance (DB), and forward-looking (FL).
 
 #### Exploring the Scrabble Environment