Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why "Only lower-case latin letters allowed in names, not ..." #28

Closed
boeddeker opened this issue Feb 14, 2019 · 12 comments
Closed

Why "Only lower-case latin letters allowed in names, not ..." #28

boeddeker opened this issue Feb 14, 2019 · 12 comments
Milestone

Comments

@boeddeker
Copy link
Contributor

Is there a reason that einops does not support upper latin letters?
I would like to use upper and lower letters.

@boeddeker boeddeker changed the title Why "Only lower-case latin letters allowed in names, not '{}'" Why "Only lower-case latin letters allowed in names, not ..." Feb 14, 2019
@jotaf98
Copy link

jotaf98 commented Feb 14, 2019

I also wonder about this. How about other characters, e.g. underscores?

One type of name that would be useful to me would be numbers. This is useful e.g. for singleton dimensions. Their values could be set automatically, e.g. rearrange(x, 'a 1 -> a') is much more intuitive than rearrange(x, 'a dummy -> a', dummy=1).

They would still have to be unique (there's no way to disambiguate two singleton dimensions this way) but this is understandable; it would still be useful in many situations regardless.

@arogozhnikov
Copy link
Owner

arogozhnikov commented Feb 15, 2019

Thanks for good questions, guys.

Is there a reason that einops does not support upper latin letters?

Yup. Einops enforces better code style (reliable, maintainable and readable code - that what it was created for), this is primarily to prevent 't for token, T for time' and things alike. Be verbose, you'll say thanks to yourself!

That said, this decision is open for reconsidering in future releases if there are important use-cases that current API doesn't meet.

I also wonder about this. How about other characters, e.g. underscores?

Underscores reserved in advance if some additional flags will be required. That way those will be disambiguated with passed dimensions. Other characters can't be passed with kwargs, AFAIK.

rearrange(x, 'a 1 -> a')

Current way is: rearrange(x, 'a () -> a'). Uses existing infra and does not introduce special meaning for 1. Because one would definitely also try rearrange(x, 'a 2 -> a').

@boeddeker
Copy link
Contributor Author

Yup. Einops enforces better code style (reliable, maintainable and readable code - that what it was created for), this is primarily to prevent 't for token, T for time' and things alike. Be verbose, you'll say thanks to yourself!

That said, this decision is open for reconsidering in future releases if there are important use-cases that current API doesn't meet.
I would say it does not enforce better code style, becasue lower case letters aren't better than upper case letters. e.g. rearrange(x, 'a b -> b a') vs rearrange(x, 'A B -> B A'). Furthermore the upper case letters are more common.

  • When you have squared dimensions I like to use lower and upper case letters to indicate that they have the same size.
  • An upper case letter would allow to use even more verbose names. e.g. rearrange(x, 'timeDimension -> timeDimension) (Bad example)

Current way is: rearrange(x, 'a () -> a'). Uses existing infra and does not introduce special meaning for 1. Because one would definitely also try rearrange(x, 'a 2 -> a').
I like rearrange(x, 'a 1 -> a') more than rearrange(x, 'a () -> a'), because it is easier to read. Nevertheless rearrange(x, 'a 2 -> a') can be confusing.

@jotaf98
Copy link

jotaf98 commented Feb 15, 2019

Thanks for the reply. I don't feel too strongly either way about enforcing style; it just seemed like a simple thing to add.

As for numbers, rearrange(x, 'a 2 -> a') should always throw an error (number of elements on both sides will be different. However, this makes sense: rearrange(x, 'a 2 -> 2 a'). My suggestion (in hindsight it should've been a different issue) was to treat numbers just like named dimensions, with no syntactic difference whatsoever, and with an already known size (in this case 2).

@arogozhnikov
Copy link
Owner

My suggestion (in hindsight it should've been a different issue) was to treat numbers just like named dimensions, with no syntactic difference whatsoever, and with an already known size (in this case 2).

Unfortunately, this produces even more questions down the road:

rearrange(x, 'a 2 -> 2 a')
rearrange(x, 'a 2 2 -> 2 a 2') # which one is which?
rearrange(x, '2 4 -> 4 2') # hey, I thought it is reshape, not transpose!

Axis 1 is the only that does not cause any doubt, but in this case it is an exception in notion (that can't be extended to other numbers).

  • When you have squared dimensions I like to use lower and upper case letters to indicate that they have the same size.

Thanks for coming up with examples. Wouldn't time1 time2 -> time2 time1 work for this case?

  • An upper case letter would allow to use even more verbose names. e.g. rearrange(x, 'timeDimension -> timeDimension) (Bad example)

I agree it is not good example :) So far I didn't meet need myself to write long axis names, but camelCased identifiers are easier to read, I agree with that.

@boeddeker
Copy link
Contributor Author

Thanks for coming up with examples. Wouldn't time1 time2 -> time2 time1 work for this case?

Sure, but my "glue" code writes many rearranges and I have common letters for my dimensions.
There I prefare to use single letters and in the moment I use for example 'D T F -> F () T D'.lower() (I like upper case letters more than lower case).

But it is still confusing that rearrange(x, 'a () -> a') is valid and rearrange(x, 'A () -> A') not.

@adynathos
Copy link

adynathos commented Aug 28, 2019

I respectfully disagree.
Lowercase letters are not "better style" than uppercase letters. If anything, uppercase letters are more readable.

If this library is used to implement a formula, it would make sense to follow the symbols and case from the formula.
Thus unicode symbols like greek letters should also work.

Thanks for the trick with .lower(), it will do for now.

@shoyer
Copy link

shoyer commented Aug 20, 2020

+1 for relaxing this. I think a better rule would be supporting any valid Python identifier as a name:
https://docs.python.org/3/reference/lexical_analysis.html#identifiers

Python 3 names are a bit more complex with unicode support, but I would at least support any valid Python 2 name, i.e.,

the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

@arogozhnikov
Copy link
Owner

There is enough consensus to relax requirement. Thanks everyone for input.

From now on

  • axis name can be any python identifier (camelCase, under_scored, and e.g. greek letters allowed as well)
    • please use this freedom wisely
  • exception: _leading and trailing_ underscores are still booked for extensions
  • axes that are python reserved keywords trigger warning but not an error (if, for, while, class, etc.)
  • when there is need to reserve specific names, einops will go thru deprecation warning in one version, then will raise an error

Code in master is already updated to allow new friendly axes.
Last point is in action as well - name axis is reserved for compatibility with planned operations.

@shoyer
Copy link

shoyer commented Aug 23, 2020

Nice! Just to clarify: Is _ a valid axis name, or is that reserved?

@arogozhnikov
Copy link
Owner

Nice catch @shoyer

  • '_' is reserved and disallowed in rearrange/reduce/repeat
  • '_' is allowed in parse_shape and has its previous meaning (skip this axis)

@arogozhnikov
Copy link
Owner

This feature became part of einops 0.3 release.
Announcement at reddit/ML

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants