
Initial implementation of autograd #30

Merged: 20 commits merged into arrayfire:master on Jul 6, 2017

Conversation

@pavanky (Member) commented Jul 2, 2017

What is done so far:

  • A proof-of-concept implementation of automatic differentiation using autograd::Variable and autograd::backward.
  • This currently implements only a few basic operations (+ and * for now).
  • Ability to perform first-order derivatives. Higher-order derivatives will come later.

Variable

  • A Variable can be constructed in two ways:
    • From an af::array supplied by the user.
    • By an operator returning a Variable. The operator constructs the Variable from a set of input Variables, the output array, and a grad function.
  • When var.backward(grad_var) is invoked, the DAG (stored as a vector) is built starting from the current variable, and gradients are propagated down the graph to every Variable it contains, using the grad function registered at each variable.
  • Calculating gradients for a variable (and its subgraph) can be disabled by invoking var.setCalcGrad(false). See the sketch after this list.
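
For illustration, a minimal usage sketch of the Variable API described above. Only calls mentioned in this PR are used; the include paths are assumptions, since the header layout is not shown here.

    #include <arrayfire.h>           // af::array, af::randu, af::constant
    // #include <af/autograd.h>      // assumed include path for autograd::Variable

    void variable_sketch()
    {
        using af::autograd::Variable;

        // 1. Constructed directly from user-supplied af::arrays.
        auto a = Variable(af::randu(3), true);
        auto b = Variable(af::randu(3), true);

        // 2. Constructed by an operator: the inputs, the output array, and a
        //    grad function are bundled into the returned Variable
        //    (see the example operator+ below).
        auto c = a + b;

        // Disable gradient calculation for b and its subgraph.
        b.setCalcGrad(false);

        // Seed the backward pass with a gradient of ones and propagate it
        // through the graph rooted at c.
        c.backward(Variable(af::constant(1.0, 3), false));
    }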

Functions

  • Each function takes Variables as parameters and returns a Variable.
  • Each function performs its operation immediately (eagerly) on the data.
  • Each function returns a Variable constructed from the following arguments:
    • af::array: the result computed above
    • std::vector<Variable>: the inputs to the function
    • BackwardFunction_t: the backward-pass callable, usually implemented as a lambda

Example function:

    Variable operator +(const Variable &lhs, const Variable &rhs)
    {
        // Forward pass: compute the result eagerly.
        auto result = lhs.getData() + rhs.getData();
        // Backward pass: d(lhs + rhs)/d(lhs) = d(lhs + rhs)/d(rhs) = 1, so the
        // incoming gradient is passed through unchanged to both inputs.
        auto backward = [](std::vector<Variable> inputs, Variable grad_output) {
            inputs[0].addGrad(grad_output);
            inputs[1].addGrad(grad_output);
        };
        // Bundle the result, the inputs, and the backward function into the
        // returned Variable.
        return Variable(result, {lhs, rhs}, backward);
    }

Example:

A simple example showing how this can be used currently:

void test()
{
    using af::autograd::Variable;

    // Two independent inputs that require gradients.
    auto x = Variable(af::randu(5), true);
    af_print(x.array());
    auto y = Variable(af::randu(5), true);
    af_print(y.array());

    // z = x^2 + x*y + y^2, built from the autograd operators.
    auto z = x * x + x * y + y * y;

    // Seed gradient of ones, then propagate backwards through the graph.
    auto dz = Variable(af::constant(1.0, 5), false);
    z.backward(dz);

    // Analytically, dz/dx = 2x + y and dz/dy = 2y + x, so both prints below
    // should show zero arrays.
    auto dx = x.grad();
    auto dy = y.grad();
    af_print(dx.array() - 2 * x.array() - y.array());
    af_print(dy.array() - 2 * y.array() - x.array());
}

TODO: for this PR

  • Add all math operations: +, -, *, /, sin, cos, exp, tanh
  • Add array operations: tile, sum, transpose
  • Add operations required for Dense layers: matmul
  • Reimplement existing layers using autograd
  • Option to enable or disable building subgraphs
  • Option to enable or disable retaining graphs for gradients
  • Make sure perceptron example is working.
  • Add train and evaluation mode for modules

@pavanky (Member, Author) commented Jul 2, 2017

@botev @jramapuram @itsnarsi This has been a long time coming, but I'd appreciate it if you had any feedback as well.

@pavanky (Member, Author) commented Jul 2, 2017

CC @arrayfire/core-devel

@pavanky (Member, Author) commented Jul 2, 2017

@Reithan too

@jramapuram (Member) commented:

Awesome work @pavanky. Will take a look in more detail when I get to a terminal. Quick question: can you take second derivatives with your implementation?

@pavanky (Member, Author) commented Jul 2, 2017

@jramapuram Not yet, I wanted to get the first order working first :)

@pavanky (Member, Author) commented Jul 2, 2017

@jramapuram went ahead and changed the gradients to be Variables too. This should make it easy to perform higher order derivatives.
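
As a purely hypothetical illustration of why this helps (higher-order support is explicitly not part of this PR): once gradients are themselves Variables carrying their own grad functions, a second derivative could in principle be obtained by running another backward pass over a first-order gradient.

    // Hypothetical sketch only; this PR does not yet implement higher-order derivatives.
    using af::autograd::Variable;

    auto x = Variable(af::randu(5), true);
    auto z = x * x;                                     // dz/dx = 2x, d2z/dx2 = 2
    z.backward(Variable(af::constant(1.0, 5), false));
    auto dx = x.grad();                                 // now a Variable, not a raw af::array
    // dx.backward(...);                                // would yield d2z/dx2, once the
                                                        // gradient graph can be retained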

@itsnarsi commented Jul 3, 2017

@pavanky just tested it on my laptop and it looks pretty neat. Unlike Python, I did not see any initial delay; I guess that is because there is no JIT involved. When will this be merged into this repo?

@pavanky (Member, Author) commented Jul 3, 2017

@itsnarsi This is still very nascent. I want to incorporate some of the stuff mentioned here to make it more efficient:
http://pytorch.org/docs/master/notes/autograd.html#excluding-subgraphs

@FloopCZ left a comment:

Hmm, nice job!


using namespace af;
using namespace afml;
using namespace afml::nn;
using namespace af;
@FloopCZ:

Duplicated line

@pavanky (Member, Author):

Do you have a tool for detecting this or a really good eye :D

@FloopCZ:

A tool would be great. Unfortunately, I'm just an irritating nitpicker. 😇

{
    if (m_grads.size() == 1) return;
    Variable grad = m_grads[0];
    for (int i = 1; i < (int)m_grads.size(); i++) {
@FloopCZ:

I would prefer an unsigned index variable to avoid clang's -Wconversion signedness warnings when indexing into a std::vector.
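
For illustration, a minimal sketch of the suggested change; the loop body is assumed, since the quoted snippet cuts off after the for line.

    // Use an unsigned index so no signed/unsigned conversion is needed
    // when indexing into the std::vector.
    if (m_grads.size() == 1) return;
    Variable grad = m_grads[0];
    for (std::size_t i = 1; i < m_grads.size(); ++i) {
        grad = grad + m_grads[i];   // assumed accumulation of the remaining gradients
    }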

@pavanky (Member, Author):

Will do, thanks.

@pavanky (Member, Author) commented Jul 5, 2017

Decreased the scope of the PR to get a minimum viable version going. The additional functions and operators can be added once this PR is merged.

- autograd::Variable::Shared now a thin layer without methods
- Variable::BackwardFunc_t renamed to Variable::GradFunc_t
- Variable::getData renamed to Variable::array
- Variable::getGrad renamed to Variable::grad
- Variable::backward renamed to Variable::calcGradInputs
@pavanky (Member, Author) commented Jul 5, 2017

@jramapuram I think enabling support for higher order derivatives by default will increase memory usage. I am going to add a flag to enable it during the backward pass; by default only the values will be stored.
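
As a hypothetical illustration of that flag (the public signature is not settled in this PR; a retain_grad_graph boolean does appear later in the diff with a default of false):

    using af::autograd::Variable;

    auto x  = Variable(af::randu(5), true);
    auto z  = x * x + x;                              // any small graph
    auto dz = Variable(af::constant(1.0, 5), false);

    z.backward(dz);                                   // default: store gradient values only
    // z.backward(dz, /*retain_grad_graph=*/true);    // hypothetical opt-in flag that keeps
                                                      // the graph alive for higher-order use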

@umar456 (Member) left a comment:

Minor preliminary comments. Everything looks great. We can refactor it later as long as we have a clean user-facing API.


find_package(ArrayFire REQUIRED)

add_library(afml SHARED "")
@umar456 (Member):
If you don't add SHARED, you can control the type of library you build with the BUILD_SHARED_LIBS variable.

Variable operator +(const Variable &lhs, const Variable &rhs)
{
    auto result = lhs.array() + rhs.array();
    auto grad_func = [](std::vector<Variable> &inputs, const Variable &grad_output) {
@umar456 (Member):

Don't we usually have outputs then inputs?

It looks like you know the number of inputs for each function. I would use something like std::array<Variable, N> for that.

@pavanky (Member, Author):

Both of these are inputs. grad_output is an input coming from a different place.

@pavanky (Member, Author):

And using std::array is not an option. All functions need to share the same signature so they can be stored as GradFunc_t inside Variable.
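
For context, a minimal sketch of the constraint being described; the GradFunc_t definition here is inferred from the lambda signature in the quoted diff, not copied from it.

    #include <functional>
    #include <vector>

    // Variable is the autograd::Variable type from this PR.
    // Inferred alias; the actual definition inside Variable may differ.
    using GradFunc_t = std::function<void(std::vector<Variable> &inputs,
                                          const Variable &grad_output)>;

    // Every operator, whatever its arity (unary sin, binary +, ...), produces
    // a callable with this one signature, so a single GradFunc_t member can
    // store any of them. A std::array<Variable, N> parameter would bake N into
    // the type and break that uniformity.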

- Implemented baseclass nn::Module
- Added basic modules: nn::Linear, nn::Sigmoid, nn::Tanh
- Added container modules: nn::Container, nn::Sequential
- Deleted unnecessary examples, cleaned up perceptron.cpp
@umar456 (Member) left a comment:

A couple of minor issues. This is looking great!


// Update parameters
// TODO: Should use optimizer
for (auto param : perceptron.parameters()) {
@umar456 (Member):

auto& ?

@@ -0,0 +1,88 @@
/*******************************************************
* Copyright (c) 2015, ArrayFire
@umar456 (Member):

2017

GradFunc_t m_grad_func;
};

public:
@umar456 (Member):

Needs to be aligned with other access qualifiers.

@@ -0,0 +1,61 @@
/*******************************************************
* Copyright (c) 2015, ArrayFire
@umar456 (Member):

2017


void Module::eval()
{
    for (auto parameter : m_parameters) {
@umar456 (Member):

auto&?

private:
    void evalGrad(bool retain_grad_graph = false);

    std::vector<Variable> getInputs() const;
@umar456 (Member):

Does this need to return by value?

@umar456 merged commit 8129b47 into arrayfire:master on Jul 6, 2017.
@pavanky deleted the autograd branch on July 6, 2017 at 15:57.
@pavanky changed the title from "[WIP] Initial attempt at autograd" to "Initial implementation of autograd" on Jul 10, 2017.
@pavanky mentioned this pull request on Jul 10, 2017.
@pavanky modified the milestone: 0.1 on Jul 11, 2017.