Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make _timelex a parser class attribute #729

Closed
wants to merge 1 commit into from

Conversation

jbrockmendel
Copy link
Contributor

Makes _timelex (well, _timelex_cls) a parser class attribute to facilitate overriding in subclasses. _timelex.split constitutes a fairly large chunk of parser runtime and can benefit from e.g. cythonizing.

Zero logic is affected.

@@ -571,6 +571,10 @@ def resolve_ymd(self, yearfirst, dayfirst):


class parser(object):
# timelex_cls is a class attribute so it can be easily overriden in
# subclasses
_timelex_cls = _timelex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both _timelex_cls and _timelex are private attributes. I think this is the right thing to do in the end, but in the short term the only effect this will have is encouraging people to make use of the private interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on what you're suggesting? Are you saying _timelex_cls should be public?

I think it makes sense to keep private attributes that end-users don't need to worry about (and/or shouldn't mess with). If someone is sub-classing, it's less obvious how meaningful the public/private distinction is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying this change is not necessary because we don't need the _timelex_cls attribute internally (since we have exactly one lexer) and no one who is not dateutil should be writing code that relies on the private interface (whcih are assumed to be implementation details).

Whether or not someone is subclassing does not really change the meaning of public vs. private. From the perspective of someone subclassing your class, you should assume that only the public parts of the interface exist (other than trying to avoid namespace collisions). Everything else is an implementation detail.

For example, we may write a drop-in replacement for the parser with a compiled backend, in which case likely we would not write Python versions of the private interface, because that would just slow things down and we've already advertised that no one should be relying on the details of the implementation by marking those things private. Any code relying on the existence of that private interface would break when it encounters the compiled version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble squaring what you're saying here with the opinions expressed in #123 and #139. Has something changed your mind?

Let's avoid delving into the semantics of "public" vs "private" vs "implementation detail".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to eventually make the lexer public, but it is not currently public.

Adding a private attribute to parser that we, the only consumers of the private interface, do not need, does not help anything. When timelex or equivalent is made public, depending on the details we decide for the API, we may add a public attribute like lexer_class, but this private attribute would have to be replaced when that happens anyway, so it doens't help anything in the long or short run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this private attribute would have to be replaced when that happens anyway,

We can worry about that when we get there. It's not as if this is on the immediate horizon.

so it doens't help anything in the long or short run.

de facto pandas uses _timelex, and would benefit from being able to drop in a simplified+cythonized version of it.

Plus making it available to early adapters will get some feedback and make design decisions for an eventual public version better-informed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

de facto pandas uses _timelex, and would benefit from being able to drop in a simplified+cythonized version of it.

I have said many times that pandas should not be using _timelex. I will likely not continue to go through deprecation cycles for very explicitly private code in the future as I have last time this came up. Enabling people to do stuff like this is exactly why this kind of thing has caused problems in the past.

Plus making it available to early adapters will get some feedback and make design decisions for an eventual public version better-informed.

At this point there's nothing to give feedback on.

@pganssle
Copy link
Member

Since the aim of this PR is to enable the use of the private interface, which is something we should be actively discouraging, I'm closing it.

@pganssle pganssle closed this May 27, 2018
@jbrockmendel
Copy link
Contributor Author

Please re-open and give this a day to think about. We're on the same team here.

@pganssle
Copy link
Member

I'm happy to re-open it if there's some actual value, but basically your position seems to be that we should do this so that people can use the private interface. I've been trying to actively discourage the use of the private interface, so it doesn't seem like there's anything to think about.

@jbrockmendel
Copy link
Contributor Author

but basically your position seems to be that we should do this so that people can use the private interface

I observe the following:
a) _timelex has significant room for efficiency improvements (improvements for which have been rejected in the past based on... _timelex-subclassing concerns),
b) you're on record in multiple places as saying _timelex should be subclassable
c) something resembling this PR is necessary if that is to be put into practice,
d) I don't find the public/private distinction very meaningful/well-defined/interesting

Based on this, my contention is that either:

a) You (and by extension we) no longer intend _timelex (or a publicly-named version thereof, whatever) to be subclassable, in which case
i) #123 should be closed,
ii) #139 should be reverted,
iii) we should discuss other ways that _timelex can be simplified, or
b) this PR is necessary

@pganssle
Copy link
Member

The fact that you find the public/private distinction meaningful and/or well-defined seems to be the main point of contention, and it's a big one. To be absolutely clear, my position on public vs. private in all of my projects is:

  1. If something starts with _ (and it is not a dunder method), it is private.
  2. If something is not in __all__, it is almost certainly private.
  3. If something is private, it is to be considered an implementation detail, meaning that it can change behavior, move, or stop existing at any point, even in a patch release, with no deprecation period. Other implementations (e.g. a theoretical compiled dateutil) are not guaranteed to have the same private interface. In other words, it does not form part of the contract of the software I am providing.

Because of Hyrum's Law, I prefer to enforce this contract whenever possible, which means doing things like defining gettz in a closure and later deleting the private function used to create the function. Adding something to the private interface whose sole purpose is to allow downstream users to more conveniently interact with the private interface flies directly in the face of these goals. As it is, even if it had a more legitimate purpose I would be worried that it would be encouraging people to do this.

a) You (and by extension we) no longer intend _timelex (or a publicly-named version thereof, whatever) to be subclassable, in which case

I don't know what gives you this idea. I am reluctant to make ill-informed changes to the public interface and I have said many times that the parser is a particularly difficult thing to work with/on and I am not currently equipped with a deep enough theoretical understanding of the problem to feel confident in recognizing a good final API. Until that changes, I do not want to do things like make _timelex public because I don't even know if something like _timelex will continue to exist.

b) this PR is necessary

This specific PR would never be necessary, because all it does is modify the private interface in a way that the only consumer of the private interface (dateutil) has no use for.

When _timelex is made public, some similar mechanism for specifying the parser's lexer will be necessary, but it will be a change to the public interface of the parser.

@jbrockmendel
Copy link
Contributor Author

Is there a meaningful difference in any of this logic between _result and _timelex?

I don't know what gives you this idea.

Right, that was why this was listed as one of two possibilities. It sounds like the possibility I had missed is:
c) You don't want to allow incremental changes to the parser until after [...]. This effectively locks out any kind of teamwork. This is your prerogative, but again I'd ask you to reconsider.

@pganssle
Copy link
Member

pganssle commented May 28, 2018

Is there a meaningful difference in any of this logic between _result and _timelex?

I don't think so, but _result has always been defined on the parser (so no point changing it at this point) and it's a container class with no logic in it. You'll notice that I could futher discourage use of _timelex by hiding it in a closure or defining it in the parser's __init__ or something, but I'm happy to leave things as they are until this sorts out.

c) You don't want to allow incremental changes to the parser until after [...]. This effectively locks out any kind of teamwork. This is your prerogative, but again I'd ask you to reconsider.

This is somewhat unfair in a few ways. For one thing, this is not an incremental change in the sense that it is not a change that moves us towards a situation we want to be in. As I mentioned earlier, this specific PR, which only makes modifications to the private interface, is not on the road map. Something like this will happen as part of the "making _timelex public", but it won't be this specifically, so this PR would just get undone. What is the point of saying, "We made a change and you aren't supposed to use it." It serves no purpose.

Additionally, I am very open to teamwork on this, but at the end of the day it's a tough problem to be solved in an important part of dateutil and the standards are high. You have made many useful contributions to the parser, even within the bounds of not messing with the public API. Figuring out where the parser is going is one of my highest priorities for dateutil. so hopefully the public API freeze won't last much longer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants