# Getting started: `gfn` from scratch

The goal of this notebook is two-fold:

* to build your intuition for why and how to use GFlowNets
* to teach you the basic structures and components of `gfn` to train your own GFlowNets

This is neither a theoretical course on GFlowNets (see resources at the end) nor a guide on how to invent your own `gfn` components (we have [other tutorials for that](link)).

If you're already familiar with GFlowNets, you can skip to part 3, where we look at the code.

Let's get started!

## 1. GFlowNets in a few words

GFlowNets are a family of methods for building *generative models*. In other words, we want to be able to obtain / create / sample new objects from a given distribution.

The key idea behind GFlowNets is that we build these objects *sequentially*, and we train neural networks (you'll soon understand where they are needed) such that, once training is done, the probability of obtaining a sample is *proportional* to its "quality".

This *quality* of a sample must be measurable by what we will call a *reward function*. If we're creating Lego spaceships block by block, then we need to be able to assess how "good" a resulting spaceship is. Let's say that we can, *i.e.* that we have a function that takes a Lego construction and returns an unnormalized score for it.
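To make this concrete, here is a minimal sketch of what such a reward function could look like. The encoding is a made-up assumption for illustration only: a construction is a tuple of block names, and the hypothetical `spaceship_reward` simply scores a spaceship higher the more `"wing"` blocks it has. It is not part of `gfn`.

```python
def spaceship_reward(construction: tuple[str, ...]) -> float:
    """Hypothetical reward: an unnormalized, non-negative score for a Lego construction.

    Assumption for illustration only: a construction is a tuple of block names,
    and more "wing" blocks make a better spaceship.
    """
    n_wings = sum(1 for block in construction if block == "wing")
    # Adding 1 keeps the reward strictly positive, even for an empty construction.
    return 1.0 + n_wings


print(spaceship_reward(()))                        # 1.0
print(spaceship_reward(("hull", "wing", "wing")))  # 3.0
```

Note that the score is unnormalized: nothing forces the rewards of all possible spaceships to sum to 1.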

Now, here is what the GFlowNet procedure does:
* Start from an initial state (the empty construction)
* Give that state to a neural network that outputs a probability distribution over the potential next blocks to add to the construction (and their locations)
  * There's also a special action the network can choose that means "Stop, I'm done"
* Sample an action from that distribution, apply it to get a new state, and feed that state back to the network
* When the stop action is sampled, we give the finished construction to the reward function and obtain a score, which tells the neural network how good a job it did
* Then we update the neural network and start over from the initial state
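The generation loop above can be sketched in plain Python. This is not how `gfn` implements it: the neural network is replaced by a hypothetical `toy_policy` that returns fixed probabilities, the block types and the `MAX_BLOCKS` size limit are made up, block locations are ignored for simplicity, and the training step ("update the neural network") is left out entirely.

```python
import random

BLOCKS = ["hull", "wing", "engine"]  # hypothetical block types
STOP = "stop"                        # the special "I'm done" action
MAX_BLOCKS = 4                       # hypothetical size limit so construction always ends


def spaceship_reward(construction):
    # Toy reward (hypothetical): more wings is better, always > 0.
    return 1.0 + construction.count("wing")


def toy_policy(state):
    """Stand-in for the neural network: a distribution over the next actions."""
    if len(state) >= MAX_BLOCKS:
        return {STOP: 1.0}  # the construction is full: stopping is the only option
    probs = {block: 0.25 for block in BLOCKS}
    probs[STOP] = 0.25
    return probs


def generate(rng):
    state = ()  # initial state: the empty construction
    while True:
        probs = toy_policy(state)
        actions = list(probs)
        action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
        if action == STOP:
            break
        state = state + (action,)  # add the chosen block
    # The finished construction is scored by the reward function.
    return state, spaceship_reward(state)


rng = random.Random(0)
for _ in range(3):
    print(generate(rng))
```

In a real GFlowNet, the reward obtained at the end of each run would feed into a loss used to update the network's parameters.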

Forget about losses etc. for now. The GFlowNet jargon sounds a lot like reinforcement learning (RL), and to some extent, it *is*. The key difference, though, is that a GFlowNet is not trained to *maximize* its return (the discounted sum of rewards); it is trained to sample objects *proportionally to* their reward.

What this means is that a Lego spaceship with reward `r` is twice as likely to be generated by the GFlowNet as a spaceship with reward `r / 2`! This is worth repeating because it is one of the **most important** features of GFlowNets: they sample proportionally to the reward, and that is what ensures *diversity*.
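In formulas: a perfectly trained GFlowNet samples a finished object $x$ with probability $P(x) = R(x) / Z$, where $Z = \sum_{x'} R(x')$ is the total reward over all possible objects. Since $Z$ cancels in ratios, $P(x_1) / P(x_2) = R(x_1) / R(x_2)$, which is exactly the "twice as likely" claim. A quick check with three made-up spaceships, using exact fractions:

```python
from fractions import Fraction

# Hypothetical rewards for three finished spaceships (r = 4, r / 2 = 2, and 1).
rewards = {"spaceship_a": Fraction(4), "spaceship_b": Fraction(2), "spaceship_c": Fraction(1)}

Z = sum(rewards.values())  # normalizing constant: the total reward
target = {name: r / Z for name, r in rewards.items()}  # P(x) = R(x) / Z: 4/7, 2/7, 1/7

print(target["spaceship_a"] / target["spaceship_b"])  # 2
```

In realistic problems $Z$ is a sum over a huge number of objects and cannot be computed directly, which is one reason we need a learned sampler in the first place.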

In other words, if the reward is multimodal, a converged GFlowNet is guaranteed to sample *all modes*, each in proportion to its reward. And while GFlowNets are not the only method with that capability, what sets them apart is that they have neural networks at their core to explore the space they sample from "intelligently" (in a data-driven way that generalizes across samples and trajectories).
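The difference between maximizing and sampling proportionally is easiest to see on a toy, made-up reward over ten objects with two peaks. No GFlowNet is trained here: we sample directly from the target distribution $R(x) / Z$ with `random.choices` (the distribution a converged GFlowNet would approximate) and compare it with a pure maximizer that always returns the single best object.

```python
import random
from collections import Counter

# Hypothetical reward over objects 0..9 with two modes: object 2 (reward 5) and object 7 (reward 4).
rewards = [0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1, 4.0, 0.1, 0.1]
objects = list(range(len(rewards)))

# A pure maximizer only ever returns the best object: mode 7 is never produced.
best = max(objects, key=lambda x: rewards[x])

# Sampling proportionally to the reward visits both modes.
rng = random.Random(0)
samples = rng.choices(objects, weights=rewards, k=20_000)
counts = Counter(samples)

print("maximizer:", best)
print("samples at mode 2:", counts[2], "at mode 7:", counts[7])
print("frequency ratio:", counts[2] / counts[7], "vs reward ratio:", rewards[2] / rewards[7])
```

The empirical frequency ratio between the two modes should land close to the reward ratio of 1.25, while the maximizer never reveals that a second, almost-as-good mode exists.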



## 2. Key concepts

As explained above, the **reward function** takes in a sample and outputs its quality. The reward is expected to be non-negative, and the higher the better.

A **state** is a representation of the object we seek to construct. It does not have to be a complete object: there is the *initial* state we start from, *intermediate* states (partial constructions, which could have a value), and *terminal* states, where we declare: "this is an actual, finished sample from the distribution".

The possible steps that take us from one state to the next while constructing an object are called **actions**.

A sequence of states and actions leading from the initial state to a terminal state is called a **trajectory**.
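One possible way to write states, actions and trajectories down in plain Python, reusing the toy Lego setting. Everything here is a hypothetical illustration, not the `gfn` API: a `State` is a tuple of blocks plus a `terminal` flag, an action is either a block name or `"stop"`, and a trajectory is the list of visited states together with the actions taken between them.

```python
from dataclasses import dataclass

BLOCKS = ("hull", "wing", "engine")  # hypothetical block types
STOP = "stop"


@dataclass(frozen=True)
class State:
    blocks: tuple = ()      # the partial (or finished) construction
    terminal: bool = False  # True once the "stop" action has been taken


INITIAL_STATE = State()  # the empty construction


def available_actions(state):
    """Actions that can be taken from a state (none from a terminal state)."""
    if state.terminal:
        return []
    return list(BLOCKS) + [STOP]


def step(state, action):
    """Apply an action to a non-terminal state and return the next state."""
    if action not in available_actions(state):
        raise ValueError(f"action {action!r} is not allowed in {state}")
    if action == STOP:
        return State(state.blocks, terminal=True)
    return State(state.blocks + (action,))


# A trajectory: the states visited from the initial state to a terminal one,
# with the actions taken in between.
actions = ["hull", "wing", "wing", STOP]
states = [INITIAL_STATE]
for a in actions:
    states.append(step(states[-1], a))

for s, a in zip(states, actions):
    print(s, "--", a, "->")
print(states[-1])
```

Keeping an explicit `terminal` flag distinguishes "a spaceship with three blocks that is still being built" from "a finished spaceship with three blocks", even though both contain the same blocks.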

The probability distribution over the actions that can be taken at a given state is called the **forward policy**.
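A common way to parametrize a forward policy is a neural network that outputs one score (logit) per action, turned into probabilities with a softmax, where actions that are not available in the current state are masked out so they get probability zero. Below is a minimal sketch under that assumption, with hard-coded, made-up logits standing in for a network's output; the `forward_policy` function and the action names are hypothetical.

```python
import math

ACTIONS = ["hull", "wing", "engine", "stop"]  # hypothetical action space


def forward_policy(logits, mask):
    """Turn per-action scores into a distribution over the allowed actions.

    logits: one float per action (what a neural network would output)
    mask:   one bool per action, True if the action is allowed in the current state
    """
    allowed = [logit for logit, ok in zip(logits, mask) if ok]
    m = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(logit - m) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return {a: e / total for a, e in zip(ACTIONS, exps)}


logits = [0.5, 2.0, -1.0, 0.0]  # made-up network output
# Every action allowed: "wing" gets the highest probability.
print(forward_policy(logits, [True, True, True, True]))
# Suppose the construction is full: only "stop" is allowed, so it gets probability 1.
print(forward_policy(logits, [False, False, False, True]))
```

Masking before the softmax (rather than after) guarantees that the remaining probabilities still sum to 1.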