
Pong-Policy-gradients

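Policy gradients is a reinforcement learning technique that trains a model of the agent's policy (which action to take in each state) directly from the agent's own experience, making actions that led to reward more likely.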
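For reference, the textbook policy gradient (REINFORCE) estimator is shown below. Here $G_t$ is the return that follows step $t$ and $e_a$ is the one-hot vector for action $a$. The second line specializes it to a linear softmax policy with a row-vector state $s$ and weight matrix $W$, which is the form the functions in "How it works" compute step by step. This is the standard formulation; the exact reward weighting used in this project may differ.

```latex
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right]

% Linear softmax policy: \pi(\cdot \mid s) = \mathrm{softmax}(s W)
\nabla_W \log \pi(a \mid s) = s^\top \left( e_a - \pi(\cdot \mid s) \right)
```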

This project is still under development and is highly experimental.

[GIF: how it works]

TO DO

  • Generalization of the concepts
  • Cleaner API
  • Refactor

WHY (MOTIVATION)

This project was made by Hector Pulido for his YouTube channel
https://www.youtube.com/c/HectorAndresPulidoPalmar
and his Twitch channel.

How it works

This is the policy function. It works like the agent's brain: it combines the state of the world with a set of weights (z = state * w) and then converts that environment space into a probability distribution over actions using a softmax.

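```csharp
// Softmax policy: maps the state (row vector) to action probabilities
// (row vector) through the weight matrix w.
Matrix Policy(Matrix state)
{
    var z = state * w;                        // linear scores (logits)
    var exp = Matrix.Exp(z);                  // element-wise exponential
    exp = exp / Matrix.Sumatory(exp)[0,0];    // normalize so the probabilities sum to 1
    return exp;
}
```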
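The same policy can be sketched without the project's `Matrix` class. The sketch below is a minimal illustration under stated assumptions: it uses plain `double[]`/`double[,]` arrays, and the names `PolicySketch`, `Policy` and `Sample` are hypothetical. `Sample` assumes the action index is drawn at random from the probabilities, which is the usual choice in policy gradients; the README does not show how the action is picked.

```csharp
using System;

// Hypothetical, dependency-free sketch of the linear softmax policy.
// state has length nState; w is nState x nActions.
public static class PolicySketch
{
    // Returns p[a] = exp(z[a]) / sum_b exp(z[b]) with z = state * w.
    public static double[] Policy(double[] state, double[,] w)
    {
        int nState = w.GetLength(0), nActions = w.GetLength(1);
        if (state.Length != nState)
            throw new ArgumentException("state length must match the rows of w");

        // z = state * w (row vector times matrix)
        var z = new double[nActions];
        for (int a = 0; a < nActions; a++)
            for (int s = 0; s < nState; s++)
                z[a] += state[s] * w[s, a];

        // Subtract the largest score before exponentiating to avoid overflow.
        double max = double.NegativeInfinity;
        foreach (var v in z) max = Math.Max(max, v);

        var p = new double[nActions];
        double sum = 0.0;
        for (int a = 0; a < nActions; a++)
        {
            p[a] = Math.Exp(z[a] - max);
            sum += p[a];
        }
        for (int a = 0; a < nActions; a++) p[a] /= sum;
        return p;
    }

    // ASSUMPTION: the action is sampled from p (not taken as the argmax).
    public static int Sample(double[] p, Random rng)
    {
        double u = rng.NextDouble(), cumulative = 0.0;
        for (int a = 0; a < p.Length; a++)
        {
            cumulative += p[a];
            if (u < cumulative) return a;
        }
        return p.Length - 1; // guard against rounding error in the cumulative sum
    }
}
```

Subtracting the largest score is a standard guard against overflow in `Math.Exp`. It does not change the probabilities, because the common factor cancels in the normalization.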

The algorithm learns as follows.

This is the gradient function of the policy (the Jacobian of the softmax, diag(p) - p * p^T, where p is the policy output). It tells us in which direction the policy must move to improve performance.

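```csharp
// Jacobian of the softmax: diag(p) - p * p^T, where p is the policy output.
Matrix GradientSoftmax(Matrix softmax)
{
    Matrix shape = softmax.T;                     // p as a column vector
    Matrix diagoFlat = Matrix.DiagFlat(shape);    // diag(p)
    diagoFlat = diagoFlat - shape * shape.T;      // subtract the outer product p * p^T
    return diagoFlat;
}
```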
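Written element by element, entry (i, j) of this matrix is p_i * (delta_ij - p_j). The sketch below computes it with plain arrays; `SoftmaxJacobianSketch` is a hypothetical name, not part of the project.

```csharp
// Hypothetical sketch of GradientSoftmax using plain arrays:
// J[i, j] = d p_i / d z_j = p_i * (delta_ij - p_j), i.e. diag(p) - p p^T.
public static class SoftmaxJacobianSketch
{
    public static double[,] Jacobian(double[] p)
    {
        int n = p.Length;
        var jac = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                jac[i, j] = (i == j ? p[i] : 0.0) - p[i] * p[j];
        return jac;
    }
}
```

The matrix is symmetric, so it does not matter whether rows or columns are read as the "output" index. Each row sums to zero because the probabilities always sum to 1.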

That function is used like this:

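```csharp
// action: the policy output p for the current state
// index:  the action that was taken
// envi:   the current state (row vector)
var dsoftmax = GradientSoftmax(action);
dsoftmax = dsoftmax.Slice(index,0,index+1,dsoftmax.X);  // keep row `index` of the Jacobian
dsoftmax = dsoftmax / action[0, index];                  // d log p[index] / dz = row / p[index]
var grad = envi.T * dsoftmax;                            // chain rule through z = state * w
// grad has the same shape as w; it is collected in gradHistory for Train()
```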
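Dividing row `index` of diag(p) - p * p^T by p[index] simplifies to onehot(index) - p, so `grad` is the outer product of the state with that vector: the gradient of log p[index] with respect to `w`. The hypothetical sketch below (plain arrays, invented names) computes it both ways to show they agree.

```csharp
// Hypothetical sketch (plain arrays instead of the project's Matrix class).
public static class LogPolicyGradientSketch
{
    // README route: row `index` of diag(p) - p p^T, divided by p[index],
    // then multiplied on the left by the state as a column vector.
    public static double[,] ViaJacobian(double[] state, double[] p, int index)
    {
        var row = new double[p.Length];
        for (int j = 0; j < p.Length; j++)
        {
            double jac = (index == j ? p[index] : 0.0) - p[index] * p[j];
            row[j] = jac / p[index];
        }
        return Outer(state, row);
    }

    // Simplified route: the division cancels, leaving onehot(index) - p.
    public static double[,] Direct(double[] state, double[] p, int index)
    {
        var row = new double[p.Length];
        for (int j = 0; j < p.Length; j++)
            row[j] = (index == j ? 1.0 : 0.0) - p[j];
        return Outer(state, row);
    }

    // state^T * row: an nState x nActions matrix, the same shape as w.
    private static double[,] Outer(double[] col, double[] row)
    {
        var m = new double[col.Length, row.Length];
        for (int i = 0; i < col.Length; i++)
            for (int j = 0; j < row.Length; j++)
                m[i, j] = col[i] * row[j];
        return m;
    }
}
```

The simplified form avoids dividing by `p[index]`, which can be very small for unlikely actions.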

Then the weights are updated using the reward:

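```csharp
// Applies every stored gradient to w, scaled by the learning rate and a
// reward-dependent factor, then clears the history.
void Train(double reward)
{
    for(int i = 0 ; i < gradHistory.Count; i++)
    {
        // decayRatio (defined elsewhere in the project) weights step i by the reward
        var update = gradHistory[i] * (learningRate *
            decayRatio(i, reward));

        w += update;                              // gradient ascent on log-probability
    }
    gradHistory = new List<Matrix>();             // start a fresh history
}
```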
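The README does not show `decayRatio`. The sketch below ASSUMES it is reward * gamma^(n - 1 - i), so the most recent step (for example, the one just before a point is scored) receives the full reward and earlier steps receive less credit. `TrainerSketch`, `Record`, `Gamma` and this decay rule are all hypothetical; the arrays stand in for the project's `Matrix` class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of Train() with plain arrays and an ASSUMED decay rule.
public class TrainerSketch
{
    public double[,] W;
    public double LearningRate = 0.01;
    public double Gamma = 0.99; // assumed discount factor
    private readonly List<double[,]> gradHistory = new List<double[,]>();

    public TrainerSketch(int nState, int nActions) => W = new double[nState, nActions];

    // Stores one per-step gradient (same shape as W) until a reward arrives.
    public void Record(double[,] grad) => gradHistory.Add(grad);

    // ASSUMPTION: older steps are discounted geometrically.
    private double DecayRatio(int i, double reward) =>
        reward * Math.Pow(Gamma, gradHistory.Count - 1 - i);

    // Gradient ascent: W += learningRate * decay * grad for every stored step.
    public void Train(double reward)
    {
        for (int i = 0; i < gradHistory.Count; i++)
        {
            double scale = LearningRate * DecayRatio(i, reward);
            var g = gradHistory[i];
            for (int r = 0; r < W.GetLength(0); r++)
                for (int c = 0; c < W.GetLength(1); c++)
                    W[r, c] += scale * g[r, c];
        }
        gradHistory.Clear(); // start a fresh history
    }
}
```

A positive reward pushes `W` toward the recorded actions and a negative reward pushes it away; the discount is one common way to spread a single delayed reward over the steps that preceded it.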

SIMILAR WORKS

IMITATION LEARNING IN UNITY

An open source project that implements neural networks and backpropagation in C# and trains them via stochastic gradient descent, learning from human behaviour.
https://github.com/HectorPulido/Imitation-learning-in-unity

Evolutionary Neural Networks on Unity For bots

A Unity asset that trains neural networks with a genetic algorithm to make a bot that can play a game or simply interact with its environment.
https://github.com/HectorPulido/Evolutionary-Neural-Networks-on-unity-for-bots

More Genetic algorithms on Unity

Three genetic algorithms built with Unity: the first is a simple algorithm that looks for the minimum of a function, the second solves the Travelling Salesman Problem, and the third is an automaton.
https://github.com/HectorPulido/Three-Genetics-Algorithm-Using-Unity

Vectorized Multilayer Neural Network from scratch

A simple multilayer perceptron built with Simple Linear Algebra for C#. It is a neural network based on this algorithm, but generalized. It can compute logic gates such as XOR, XNOR, AND and OR via stochastic gradient descent and backpropagation with a sigmoid activation function, and it can be applied to more complex problems.
https://github.com/HectorPulido/Vectorized-multilayer-neural-network

Where can I learn more?

LICENCE

This project contains a copy of:

Everything else is MIT licensed.