
Naming, Laziness, and Ordering #214

Closed
ChrisRackauckas opened this issue Dec 22, 2019 · 9 comments

@ChrisRackauckas
Member

There are three issues that are hitting the same fundamental problem. Here are a few questions to get started:

  1. When doing ODE order lowering, what should we name the outputs? The user has to know and understand the semantics in order to know how to set the inputs to the correct values, so this is not an internal detail.
  2. When doing component-based systems, how do we name things? How does the user know how to set the parameters of component 1 vs component 2?
  3. When building a simple ODE system with automated parameter and variable capturing, how can the user sanely set the parameters?
  4. When doing index reduction, same thing.
  5. Etc., etc.: this comes up with almost every transform.

These are all manifestations of the same underlying issue: when we generate code,

  1. We need some compiler-y way to make sure we don't clobber existing names.
  2. It needs to be user-facing, because the user must understand the resulting semantics to know how to set values.

How can we solve it? @HarrisonGrodin and I talked through this in a few different ways. One option is to give up trying to come up with sane names and keep things lazy for as long as possible. D(x) makes sense, so try keeping it. Names would then only be required at the very end, when we flatten the system to compile it, at which point hopefully the user doesn't have to interact with them anymore.

However, the issue that came up with that is that the user still has to understand the f that you built, because they need to give u0 and p. Do we then move u0 and p into the ODESystem and directly generate an ODEProblem? What about tspan? This is the line of thought I was going down before, and it doesn't seem very generalizable: we'd end up special-casing a lot of things specifically for DESystem types, and there wouldn't be much reusability for other domains. Also, it just feels wrong.

But a more recent idea I had is to build "map vector" functions for the user. For example, if we know internally the order of the variables we are using, then:

map_vector_state(odesys,[x=>1.5,y=>2.0,D(x)=>5.0])

can be used to spit out a u0 vector that matches the system we are generating. Parameters can be handled similarly, and this idea then applies even to NonlinearSystem, optimization, and beyond. It allows handling the lazy stuff in a nice way as well. It would also be a good place to throw errors asking for values. For example, if daesys comes from an index reduction, the user might not know up front which values they need to define, and the context of the system itself will be necessary to give the user a clue. From this we can say Error: D(y) required for the system, which is easily interpretable by the user.

This still needs more thought, but it could work. In order to do such matching, though, we need to enforce a normal form, which is not something we handle well right now.
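To make the idea concrete, here is a rough Python sketch of what such a map-vector function could do (the names and signature are hypothetical, not MTK's actual API): the system knows its internal variable ordering, the user supplies values by name, and missing values produce an interpretable error.

```python
def map_vector_state(ordered_vars, assignments):
    """Build a state vector matching the system's internal variable order."""
    lookup = dict(assignments)
    missing = [v for v in ordered_vars if v not in lookup]
    if missing:
        raise ValueError(", ".join(missing) + " required for the system")
    return [lookup[v] for v in ordered_vars]

# Suppose order lowering produced the internal ordering [x, y, D(x)]:
u0 = map_vector_state(["x", "y", "D(x)"],
                      [("x", 1.5), ("y", 2.0), ("D(x)", 5.0)])
print(u0)  # [1.5, 2.0, 5.0]

# A missing value is reported in terms the user can interpret:
try:
    map_vector_state(["x", "D(y)"], [("x", 1.0)])
except ValueError as e:
    print(e)  # D(y) required for the system
```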

@ranjanan @YingboMa @shashi

@YingboMa
Member

But recently an idea that I had was just about building "map vector" functions for the user. For example, if we know internally the order of the variables we are using, then:

map_vector_state(odesys,[x=>1.5,y=>2.0,D(x)=>5.0])

I like this idea. Each transformation pass should also return the names of the parameters that the function is expecting.

In the book Structure and Interpretation of Classical Mechanics, there is no such thing as differentiating an expression; it only deals with functions. I think we should do that, too. For scalar differentiation, it works like the following:

1 ]=> (D sin)
#| a-euclidean-derivative |#

1 ]=> ((D sin) 0)
#| 1 |#

1 ]=> ((D sin) 'x)
#|
(cos x)
|#


1 ]=> ((* 5 (D sin)) 'x)
#|
(* 5 (cos x))
|#

1 ]=> ((* 5 (D sin)) 0.1)
#| 4.975020826390129 |#

The D operator returns a functor that has arithmetic defined on it. This enables us to do cool things like taking Taylor series by exponentiating the operator:

1 ]=> (series:print (((exp D) cos) 'h) 8)
(cos h)
(* -1 (sin h))
(* -1/2 (cos h))
(* 1/6 (sin h))
(* 1/24 (cos h))
(* -1/120 (sin h))
(* -1/720 (cos h))
(* 1/5040 (sin h))
#| ... |#
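A rough Python analogue of the behavior above (not MTK code, and numerical rather than symbolic): D maps a function to a function, and arithmetic is defined on the function wrapper itself, so 5 * D(sin) is again a function.

```python
import math

class Func:
    """Wrap a callable so arithmetic (here just *) is defined on functions."""
    def __init__(self, f):
        self.f = f
    def __call__(self, x):
        return self.f(x)
    def __mul__(self, other):
        # Multiplying by a constant treats it as a constant function.
        g = other if callable(other) else (lambda _x: other)
        return Func(lambda x: self(x) * g(x))
    __rmul__ = __mul__

def D(f, h=1e-6):
    """Central-difference derivative operator: function -> function."""
    return Func(lambda x: (f(x + h) - f(x - h)) / (2 * h))

five_d_sin = 5 * D(Func(math.sin))   # again a function
print(five_d_sin(0.1))               # ≈ 5*cos(0.1) ≈ 4.97502
```

scmutils does this symbolically and on plain Scheme functions; the sketch only captures the "functions all the way down" shape of the API.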

Gradient works like this:

1 ]=> (define g (literal-function 'g
    (-> (UP Real Real) Real))) ; g: R^2 -> R
#| g |#

1 ]=> ((D g) (up 'x 'y))
#|
(down (((partial 0) g) (up x y)) (((partial 1) g) (up x y)))
|#

1 ]=> (define (f x y) (* (expt x 3)
                   (expt y 5)))
#| f |#

1 ]=> ((D f) 'x 'y)
#|
(down (* 3 (expt x 2) (expt y 5)) (* 5 (expt x 3) (expt y 4)))
|#

1 ]=> (define (k x) (* (expt (ref x 0) 3)
                 (expt (ref x 1) 5)))
#| k |#

1 ]=> ((D k) (up 'x 'y))
#|
(down (* 3 (expt x 2) (expt y 5)) (* 5 (expt x 3) (expt y 4)))
|#

This approach generalizes well to high dimensional objects, too.

@HarrisonGrodin
Contributor

Ultimately, it seems this boils down to implementing and working with alpha equivalence.

If we're creating new names (e.g. name * "_" * "t"^n), we have to be very careful that names don't overlap, which seems like a losing game. The likely better option is to let names be completely arbitrary (i.e. let gensym deal with this for us) and figure out what to guarantee the user about the result. We could either (a) return a mapping (array, Dict, etc.) from Tuple{Variable,UInt} to Variable or (b) have a better canonical form for "derivative of x with respect to t" which anyone can produce. The latter seems more sustainable, so I quite like the idea of D(x) being a value.


Regarding @YingboMa's point, I agree - differentiation is really just a higher-order function diff : (R -> R) -> (R -> R). Taking variable binding sites seriously is probably long overdue.

For example, consider the expression sinⁿ(x), meaning sin(sin(...(sin(x))...)). What does it mean to differentiate with respect to x? (For that matter, what does it mean to differentiate with respect to n?) It's probably more reasonable to treat this as a function, say, f : (x,n) ↦ sinⁿ(x). Then, diff(x ↦ f(x,2)) makes actual sense! Food for thought...
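Option (a) above can be sketched in a few lines of Python (illustrative only; in Julia, Base.gensym would play the fresh-name role): lowering generates arbitrary collision-free names and returns the (variable, derivative order) → name mapping to the user instead of guessing readable names like x_tt.

```python
import itertools

_counter = itertools.count()

def gensym(base):
    """Make a fresh, collision-free name from a base name."""
    return f"{base}#{next(_counter)}"

def lower_order(var, max_order):
    """Return a mapping (variable, derivative order) -> fresh variable name."""
    return {(var, n): gensym(var) for n in range(max_order + 1)}

mapping = lower_order("x", 2)
print(mapping)  # {('x', 0): 'x#0', ('x', 1): 'x#1', ('x', 2): 'x#2'}
```

The user never has to predict the generated names; they only consult the returned mapping.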

@shashi
Collaborator

shashi commented Dec 23, 2019

I was going to open an issue and write something similar to what Yingbo wrote! I was working on the D operator with ForwardDiff + ModelingToolkit ;)

Gradient works like this:
This approach generalizes well to high dimensional objects, too.

Another example: the Jacobian.

1 ]=> (define (foo x)
           (up (sin (ref x 0)) (cos (ref x 1)) (tan (ref x 2))))
1 ]=> ((D foo) (up 1 2 3))
#|
(down (up .5403023058681398 0 0) (up 0 -.9092974268256817 0) (up 0 0 1.0203195169424268))
|#

1 ]=> ((D foo) (up 'x 'y 'z))
#|
(down (up (cos x) 0 0) (up 0 (* -1 (sin y)) 0) (up 0 0 (/ 1 (expt (cos z) 2))))
|#

and (compose D D) is Hessian:

1 ]=> (define (bar x)
          (dot-product (foo x) (up 1 1 1)))

1 ]=> ((D (D bar)) (up 'x 'y 'z))
#|
(down (down (* -1 (sin x)) 0 0) (down 0 (* -1 (cos y)) 0) (down 0 0 (/ (* 2 (sin z)) (expt (cos z) 3))))
|#
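The 1-D version of "compose D with itself" can be sketched in plain Python with finite differences (not scmutils, just the same shape): applying a derivative operator twice yields the second derivative, the scalar analogue of the Hessian.

```python
import math

def D(f, h=1e-4):
    """Central-difference derivative operator: function -> function."""
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

d2_sin = D(D(math.sin))   # composing the operator with itself
print(d2_sin(0.5))        # ≈ -sin(0.5) ≈ -0.4794
```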

Another very neat convention the book uses is having t as the first element of any state; it's not treated as special like in the DiffEq ecosystem. Unrelated, but I love that.

Ultimately, it seems this boils down to implementing and working with alpha equivalence.

Our programming language has it, so we should try to use it without re-implementing if possible. As a side effect, users will already know it.

For example, if we know internally the order of the variables we are using, then:
map_vector_state(odesys,[x=>1.5,y=>2.0,D(x)=>5.0])

Can you say a bit about why the user wouldn't know the order of the variables?

D(x) makes sense

From what I can tell, D(x) is just a name, right? Like xdot?

@shashi
Collaborator

shashi commented Dec 23, 2019

One more thing: ((partial i) g) (you can see it in one of Yingbo's examples), read in Julia as partial(i)(g), is an optimized version of x->D(g)(x)[i]. Super convenient to not have to remember names.
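A minimal Python sketch of that shape (names hypothetical; scmutils does this symbolically, here it's a finite difference): partial(i) returns an operator that takes a function of a coordinate tuple and returns the i-th gradient component directly, so the user never names the variable being differentiated.

```python
def partial(i, h=1e-6):
    """partial(i)(g): the i-th gradient component of g, as a function."""
    def op(g):
        def dg(x):
            xp, xm = list(x), list(x)
            xp[i] += h
            xm[i] -= h
            return (g(xp) - g(xm)) / (2 * h)
        return dg
    return op

g = lambda v: v[0] ** 3 * v[1] ** 5   # g(x, y) = x^3 * y^5
print(partial(0)(g)([2.0, 1.0]))      # ≈ ∂g/∂x = 3*x^2*y^5 = 12 at (2, 1)
```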

@YingboMa
Member

Another very neat convention that the book uses is having t as the first element of any state. It's not treated as special like in DiffEq ecosystem, unrelated, but I love that.

In fact, when plotting a solution object, time is treated as the 0-th state in DiffEqBase.jl. 😸 E.g.

plot(sol, vars=(0, 1))

plots t vs u[1]. We might want to do that in MTK, too?

@ChrisRackauckas
Member Author

That's not entirely correct though, since with MTK there's not just t, but also potentially x, y, z for example in PDEs.

@shashi
Collaborator

shashi commented Dec 23, 2019

The D operator returns a functor that has arithmetics defined on it.

Actually, it just returns a plain Scheme function! scmutils defines arithmetic directly on functions. It works quite well.

@MasonProtter
Contributor

If we're worried about overlap of variable names, how about returning some sort of locally scoped object like a module or closure where the bindings can be safely defined without worrying about variable name clashes?

@ChrisRackauckas
Member Author

Finished in #265
