Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional API for DbProviderFactories in .NET Core #22229

Closed
divega opened this issue Jun 10, 2017 · 72 comments
Closed

Additional API for DbProviderFactories in .NET Core #22229

divega opened this issue Jun 10, 2017 · 72 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Data.SqlClient
Milestone

Comments

@divega
Copy link
Contributor

divega commented Jun 10, 2017

Latest Proposal

Copied from latest API approval: https://github.com/dotnet/corefx/issues/20903#issuecomment-342605350

public static class DbProviderFactories
{
    // exiting members
    
    public static DbProviderFactory GetFactory(string providerInvariantName); 
    public static DbProviderFactory GetFactory(DataRow providerRow); 
    public static DbProviderFactory GetFactory(DbConnection connection); 
    public static DataTable GetFactoryClasses(); 

    // new members

    /// <summary>
    /// Registers a provider factory using the assembly qualified name of the factory and an 
    /// invariant name
    /// </summary>
    public static void RegisterFactory(string providerInvariantName, string factoryTypeAssemblyQualifiedName);

    /// <summary>
    /// Registers a provider factory using the provider factory type and an invariant name
    /// </summary>
    public static void RegisterFactory(string providerInvariantName, Type factoryType);

    /// <summary>
    /// Extension method to register a provider factory using the provider factory instance and 
    /// an invariant name
    /// </summary>
    public static void RegisterFactory(string providerInvariantName, DbProviderFactory factory);

    /// <summary>
    /// Returns the provider factory instance if one is registered for the given invariant name
    /// </summary>
    public static bool TryGetFactory(string providerInvariantName, out DbProviderFactory factory);

    /// <summary>
    /// Removes the provider factory registration for the given invariant name  
    /// </summary>
    public static bool UnregisterFactory(string providerInvariantName);

    /// <summary>
    /// Returns the invariant names for all the factories registered
    /// </summary>
    public static IEnumerable<string> GetProviderInvariantNames();
}

Original proposal

Issue #15732 is about porting the existing surface of DbProviderFactories into .NET Core. This new issue is specifically about new API that needs to be added in .NET Core that does not (yet) exist in .NET Framework. DbProviderFactories on .NET Framework can only be initialized from .config files, and in order to make the API usable without .config files the new API is needed.

The proposal by @FransBouma can be found in dotnet/standard#356 (comment) and is repeated below:

I've refactored the code a bit, the new API now looks like:

public static void ConfigureFactory(Type providerFactoryClass);
public static void ConfigureFactory(Type providerFactoryClass, string providerInvariantName);
public static void ConfigureFactory(Type providerFactoryClass, string providerInvariantName, string name, string description);
public static void ConfigureFactory(DbConnection connection);
public static void ConfigureFactory(DbConnection connection, string providerInvariantName);
public static void ConfigureFactory(DbConnection connection, string providerInvariantName, string name, string description);

Two new overloads are added. They'll use the fallback code for the providerInvariantName, as it is also present in netfx' auto-init code: it will use the namespace of the type. I've added this to avoid people making a typo in the name as for most factories I know (I don't really know of a dbproviderfactory where this isn't the case) the invariant name is equal to the namespace.

Method usage / purpose

public static void ConfigureFactory(Type providerFactoryClass)

Description: This method will register the factory instance contained in the specified type, under invariant name equal to the namespace of the specified type. It will leave name and description empty.
Purpose: This method is meant to be the easiest way to register a factory. Most (if not all) ADO.NET providers use as invariant name the namespace of the factory type.

public static void ConfigureFactory(Type providerFactoryClass, string providerInvariantName)

Description: This method will register the factory instance contained in the specified type, under invariant name specified in providerInvariantName. It will leave name and description empty.
Purpose:: This method can be used to register a factory for the ADO.NET providers which don't use the namespace as the invariant name, and it can also be used to register a different factory type under a well-known invariant name, e.g. in the case of a wrapping factory for ADO.NET profiling.

public static void ConfigureFactory(Type providerFactoryClass, string providerInvariantName, string name, string description)

Description: This method will register the factory instance contained in the specified type, under invariant name specified in providerInvariantName and will fill in the name and description values for the factory.
Purpose:: This method is equal to ConfigureFactory(type, string) and can be used to fill in the additional two columns in the factory table if a user requires that.

public static void ConfigureFactory(DbConnection connection)

Description: This method will register the factory instance obtained from the specified connection, under invariant name equal to the namespace of the factory instance's type. It will leave name and description empty.
Purpose: This method is meant to be the easiest way to register a factory if the user doesn't know the factory type but does know the connection type. As DbProviderFactory registration was mainly hidden for most users it can very well be they're not familiar with the factory types, so this method and its overloads make it easier for them to register a factory. Most (if not all) ADO.NET providers use as invariant name the namespace of the factory type.

public static void ConfigureFactory(DbConnection connection, string providerInvariantName)

Description: This method will register the factory instance obtained from the specified connection, under invariant name specified. It will leave name and description empty.
Purpose:: This method can be used to register a factory for the ADO.NET providers which don't use the namespace as the invariant name, and it can also be used to register a different factory type under a well-known invariant name, e.g. in the case of a wrapping factory for ADO.NET profiling.

public static void ConfigureFactory(DbConnection connection, string providerInvariantName, string name, string description)

Description: This method will register the factory instance obtained from the specified connection, under invariant name specified in providerInvariantName and will fill in the name and description values for the factory.
Purpose:: This method is equal to ConfigureFactory(DbConnection, string) and can be used to fill in the additional two columns in the factory table if a user requires that.

@FransBouma
Copy link
Contributor

My advice would be: keep the overloads and not just 1 method as having just 1 method will be more cumbersome to use. In general having overloads gives a bit of a maintenance burden, however the class they're part of is pretty much complete in functionality, as in: I don't think this class will ever get more functionality so the overloads won't have a necessity to change over time with new features being added to this class. In that light, the overloads have as much maintenance overhead as a single method, so I think having them is not a big burden but has a positive impact in usability of the class :)

@karelz
Copy link
Member

karelz commented Jun 11, 2017

@divega @saurabh500 @corivera can you please review it on Monday from API experts point of view (make a decision on 1 method vs. more), then add a note and mark it for API review (on Tue morning) by relabeling to api-ready-for-review? Thanks!

@divega
Copy link
Contributor Author

divega commented Jun 11, 2017

@karelz I have setup something for this week.

@divega
Copy link
Contributor Author

divega commented Jun 21, 2017

We discussed this with @saurabh500 @corivera @ajcvickers and the @dotnet/fxdc people and came up with the API we want:

        /// <summary>
        /// Programmatically registers the configuration of a <see cref="DbProviderFactory" />. 
        /// </summary>
        /// <param name="invariantName"> A unique name that can be used to refer to the data 
        /// provider. Required to not be null nor whitespace.
        /// </param>
        /// <param name="assemblyQualifiedName"> Assembly qualified name of the 
        /// <see cref="DbProviderFactory" /> class, which contains enough information to 
        /// instantiate the object. Required to not be null nor whitespace. 
        /// </param>
        /// <param name="name"> Human readable name for the data provider. Can be null in which 
        /// case it is ignored. 
        /// </param>
        /// <param name="description"> Human readable description of the data provider. Can be null 
        /// in which case it is ignored. 
        /// </param>
        /// <remarks>
        /// If a registration exists for the given <paramref name="invariantName" /> the call to 
        /// this method will override the configured values.
        /// </remarks>
        public static void RegisterFactory(
            string invariantName,
            string assemblyQualifiedName,
            string name,
            string description)

Rationale

  • This is a relatively low-level API which we only expect to be called one or twice in any application that requires it. When we looked at all the sugar overloads in the original proposal we felt that they extend the surface and the add decision points without providing enough value. E.g. all of them can be replaced with a call to the method above and at most a couple of additional method calls or property access.

  • We preferred the version of the method that takes a string for the type name vs. an actual type because this allows for the application code that calls into the API to be decoupled from the provider at compile time. If the type is available, it is enough to access the AssemblyQualifiedName property to obtain the right string to pass to this API. Incidentally this aligns with the actual implementation, which will only store the AssemblyQualifiedName.

  • The invariantName argument moves to the first position because this is the unique identifier of each configuration record, e.g. it would be the key if this was a dictionary.

  • There was feedback from @davkean that the distinction between typeName and assemblyQualifiedName is important because typeName would imply that a simple type name would be sufficient when it actually is not.

  • In the end result, the parameter names align with the existing representation of provider factory configuration in the schema of the DataTable returned by GetFactoryClasses().

  • For the naming, we discussed RegisterFactory vs. ConfigureFactory. In the end, RegisteryFactory won because it reflects the way we describe this API in plain language.

@ajcvickers, @saurabh500, @corivera, @terrajobst, @davkean, @FXDC please let me know if you see any error of misrepresentation from what we discussed or if any important details are underspecified.

BTW, there is one tweak I took the liberty of making today: I shortened providerInvariantName to invariantName. I think provider is implied by the name of the class and we are not using provider or factory or providerFactory as a prefix for any of the other parameters although a prefix would apply to all of them. Incidentally this makes the parameter names 100% exactly the same as the column names of the DataTable returned by GetFactoryClasses() as documented. I am ok with reverting this small tweak if people feel strongly that it should be providerInvariantName.

cc @FransBouma

@FransBouma
Copy link
Contributor

FransBouma commented Jun 22, 2017

I'm disappointed in this API. The main reason is that as a person who directly has to explain developers of .NET applications how to use this API, I have to explain more than I should have: it's completely non-intuitive, and as there are no overloads proposed, they always have to call this method. Although you qualify it as a 'low level API', the end developer still has to call it. What I was after was a way to let them make as less mistakes as possible, with an API that's as easy to use as possible: no mistakes.

Now, there's an API which actually has just 1 overload which only accepts a string for a typename. I understand why you'd want a string variant, but it's the only variant, so everyone has to call this method (and they have to call it always), so mistakes are easily made and only visible at runtime. People already make mistakes typing the provider invariant name, do you really think they make no mistake in typing a type name, oh sorry, assembly qualified name?

I find this API so bad actually, that I refuse to implement it: every ORM will likely implement some extension methods which make this single method completely obsolete anyway.

@FransBouma
Copy link
Contributor

To explain things per point:

This is a relatively low-level API which we only expect to be called one or twice in any application that requires it. When we looked at all the sugar overloads in the original proposal we felt that they extend the surface and the add decision points without providing enough value. E.g. all of them can be replaced with a call to the method above and at most a couple of additional method calls or property access.

yes, but that's always the case. The BCL has a tremendous amount of ctors and methods which only call into other methods and are there just to make things easier. We could all get rid of these. But they're there for a reason: they make life easier for the situations when you don't have to pass additional arguments. This is the case here too. I specified with every method I proposed, why it was there. This is completely ignored, which baffles me. I'm in the ORM business now for 14 years, I work with users of ORMs every day and have done so for the past 14-15 years. I know what mistakes they make. It's not as if I don't know what I'm talking about when I say: this api has to be as simple as possible; mistakes shouldn't be possible here.

But you came up with the variant which exactly allows that. The problem is that you don't have to explain to my users how to use this api, but I have to. ADO.NET provider writers have to. It was already a complex affair to explain to people how to register a factory in the web.config file when an ADO.NET provider didn't register itself in machine.config, which was the case with the PostgreSql and Firebird ADO.NET providers for some period of time. I had to specify the XML so they could copy/paste it, and even then they made mistakes because they got the version wrong in the strong name in the type specification.

That's what I mean with make it as easy as possible. You now propose an API which basically requires a set of extension methods to be easy to use so most people will use it without a problem: users of this method aren't ORM / ADO.NET specialists, they write websites, they create services, and using an ORM/ADO.NET api is a burden and out of their comfort zone. Proposing an API which makes it hard to use it correctly won't help at all.

We preferred the version of the method that takes a string for the type name vs. an actual type because this allows for the application code that calls into the API to be decoupled from the provider at compile time. If the type is available, it is enough to access the AssemblyQualifiedName property to obtain the right string to pass to this API. Incidentally this aligns with the actual implementation, which will only store the AssemblyQualifiedName.

Yes, it's 'enough' to use the AssemblyQualifiedName, but it's not intuitive. Why not allow an overload which simply accepts a type? What's so utterly bad about that? I already described I don't expect any code changes in the implementation of this API going forward, so there's little to no chance things need refactoring soon, if at all.

The invariantName argument moves to the first position because this is the unique identifier of each configuration record, e.g. it would be the key if this was a dictionary.

The first two are important, but the 2nd argument (assemblyQualifiedName) is actually the only value which is required. You can infer invariantName, like the netfx auto-init code uses too (and which my implementation uses as well).

About the 3rd and 4th argument:

Can be null in which case it is ignored.

I don't know but... passing null values for strings... Come on, why on earth would this be a properly designed API where it accepts nulls for strings, while the default value (so it's ignored) is "", not null.

Sorry, but in the current state I won't associate my name to it, so I won't port my implementation in the PR dotnet/corefx#19885 to this API; I'll live on forever as the person who would have implemented it into the BCL and netstandard and I refuse to do that.

// cc @karelz

@FransBouma
Copy link
Contributor

One small thing that bugs me as well is the following: why has the debate about this API been held in the secret and not right here in this issue? Isn't it better for everyone involved to have transparency and openness so we all know what's going on? Now no-one knows who said what for what reasons, only that 'this is the API' is the result, presented as a Law sent down from the Ivory Tower where the design committee is located.

@jnm2
Copy link
Contributor

jnm2 commented Jun 22, 2017

@FransBouma

why has the debate about this API been held in the secret and not right here in this issue?

Because they meet in person. Can you imagine trying to live-post here while talking?

Isn't it better for everyone involved to have transparency and openness so we all know what's going on?

That is exactly what is happening. They did not say this is final, only that this is the API they like and why, and they even specifically cc'd you for your review of their opinion.

Now no-one knows who said what for what reasons

The reasons are all listed in Rationale in https://github.com/dotnet/corefx/issues/20903#issuecomment-310225542. That's plenty to go on, as you showed in your response.

@dustinmoris
Copy link

dustinmoris commented Jun 22, 2017

I don't want to throw oil into a fire, but since you heavily tweeted about this I came here and I don't fully follow your anger.

Now no-one knows who said what for what reasons,...

I don't agree with this. As an outsider I can fully follow the argumentation that was provided alongside the proposed API. Whether you agree with those arguments is up for discussion, but "no one knows what" is certainly not true.

And what exactly do you mean by "why has the debate about this API been held in the secret" ?

That is just non-sense. Did you document all your customer conversations from the last 14 years somewhere int he public so we all can follow your reasons for your API decisions? If not, why do you keep those as a secret?

OSS doesn't mean that people need to keep protocol of everything they discuss in person. What is relevant is that reasons are provided for any decision and that has been done here, what else did you expect? You can still further discuss those reasons here...

@jnm2
Copy link
Contributor

jnm2 commented Jun 22, 2017

Also I'm really confused at the tone. It stands out of place as a curiosity in contrast to everything else going on here.

If history is a guide, antagonism isn't going to accomplish what you want.

@FransBouma
Copy link
Contributor

FransBouma commented Jun 22, 2017

@jnm2 @dustinmoris

This issue is 1.5 years old, if you don't know. A lot has been said about this issue in that period of time. What surprises me is that they didn't debate this particular api once during that 1.5 years, nor during the month or so since I picked it up. Perhaps it's a perception wrt what to expect regarding transparency, but I really don't see the added value of not debating things here in the open if everything else is open, especially since this issue is already so old and a lot already has been said.

Also I'm really confused at the tone. It stands out of place as a curiosity in contrast to everything else going on here.

I've already spend a lot of time on this and that's now wasted. I really don't see how I should be cheerful about that. The thing is that what's in the start post is what I proposed and what's already implemented, and I did all that with the best of my intentions and knowledge of the field in question. I now have no idea why any of that is wrong or irrelevant. You apparently do, that's great but I really don't. I simply like to know why exactly even a simple overload is already too much work or an overload which accepts a type.

Mind you, the purpose of each method I proposed is derived directly from my own experience with ORM users and their benefits sought.

(edit) It's not that I want my API to be accepted and nothing else, on the contrary. What I like to see is a reasonable API based on what I brought forward as knowledge regarding this particular API. That methods I proposed need to change their name, or that there needs to be an overload which accepts a string as a typename, fine by me, really. But now it's that I spent time on this API which was apparently for naught: Microsoft can't possibly think I'd simply say "sure, nice API, let's port my code over and toss all the results of my own thought process and the debates that have already been held overboard!".

@jnm2
Copy link
Contributor

jnm2 commented Jun 22, 2017

@FransBouma

I really don't see the added value of not debating things here in the open if everything else is open

Like I said, what's not open? The debate is ongoing in this thread.

I've already spend a lot of time on this and that's now wasted. I really don't see how I should be cheerful about that.

Because it's effective. You want to catch flies, use honey not vinegar. I don't believe for a moment that you truly want to waste time and be ineffective on this.

I now have no idea why any of that is wrong or irrelevant. You apparently do, that's great but I really don't.

No, I'm with you on that. I like your proposal better. But the tone is turning me away. Fyi.

@FransBouma
Copy link
Contributor

FransBouma commented Jun 22, 2017

@jnm2

Like I said, what's not open? The debate is ongoing in this thread.

No, it's really not. The API that's acceptable is what they proposed. If they had done the debate here in the open we could have participated in that debate. Now we can only bicker about the result and hope they will change their mind, but we can't be effective on that as we don't know why exactly they chose this over the others.

I don't believe for a moment that you truly want to waste time and be ineffective on this.

I don't follow. Everything is already implemented and tested and works. This new API requires a new implementation, new tests and a truckload of documentation to tell everyone how to use the API as there are no convenient overloads.

That there has to be a debate about whether there should be overloads at all is odd: like they didn't understand that overloads might be handy here.

No, I'm with you on that. I like your proposal better. But the tone is turning me away. Fyi.

Well, to give you an idea: I opened the related issue #45711.5 years ago. A tremendous amount of time was wasted in the issue, due to circumstances within MS. As no-one did anything I picked it up but it was a rough ride: tooling issues which were hard to solve and not a lot of documentation regarding tools to use made this a frustrating effort. I spent nearly a full workweek on this, and I'm sorry to say that as I am not really a slow programmer, but it now looks like it. A full workweek which led (due to release mess it was pretty much chaos in this issue and it led to the API review being proposed after the code) to a PR which isn't mergeable and which has an API which is a 180 from what's proposed as the API to implement. So this leaves me with a tough decision: I either implement this API which MS finds acceptable (and I will hate myself for that forever) which will also take time and effort, and I already spent too much on this, or I don't implement the API and the time I spent is wasted as well.

So yeah, that frustrates me a bit. This whole experience was dreadful for me and the outcome has been in line with that. If my tone is then not that friendly, I apologize for that, but it is what it is.

@jnm2
Copy link
Contributor

jnm2 commented Jun 22, 2017

No, it's really not. The API that's acceptable is what they proposed. If they had done the debate here in the open we could have participated in that debate. Now we can only bicker about the result and hope they will change their mind, but we can't be effective on that as we don't know why exactly they chose this over the others.

Let's just give the team a chance to respond to your feedback.

@terrajobst
Copy link
Member

terrajobst commented Jun 22, 2017

@divega

When we looked at all the sugar overloads in the original proposal we felt that they extend the surface and the add decision points without providing enough value.

We generally do not consider overloads as adding to the concept count. That's largely because the concepts are expressed by names and the overloads are just different mechanics to do the same thing. Logically, overloads are often considered being chained, i.e. they provide conveniences and shortcuts.

Personally, I don't like having a single method which supports a combination of things, based on which arguments are null or defaulted; rather, it's more usable to have overloads that do that for you. In many cases, I can just see from the way the overloads are structured, that certain values can be omitted or that certain arguments form logical groups and are mutually exclusive. That's super useful.

We preferred the version of the method that takes a string for the type name vs. an actual type because this allows for the application code that calls into the API to be decoupled from the provider at compile time.

I totally see the reason why you'd want to support a string-based registration. However, I don't think it's a good idea to omit the type-based one as this makes calling the API harder for the 80% case. Many people don't directly know how to get an assembly qualified name; typeof(Foo) is much more straightforward. Also, it seems to me that not having a compile-time dependency isn't an issue for most apps. Thus, I suggest adding that one back.

@FransBouma

I find this API so bad actually, that I refuse to implement it: every ORM will likely implement some extension methods which make this single method completely obsolete anyway.

I find that argument not very convincing. Granted, I agree with your sentiment that this API is more cumbersome to use than we probably want but (1) I think we can fix that and (2) if can't or won't I'd solve this problem by delegation, not by igoring the API altogether.

What I would do is follow what ASP.NET Core has done with their IAppBuilder system: middle-ware provides one-line registration methods and they in turn call the low-level registration APIs that take lambdas and whatnot.

In other words, I'd provide a method that I'd instruct customers to call from their Main, like so:

ImmosFantasticOrm.Register();

Now, I'm not saying that's what you should do; but if I had to solve usability issues, I'd rather do this than not using a provided extension point many things plug into.

@FransBouma
Copy link
Contributor

FransBouma commented Jun 22, 2017

@terrajobst

I find that argument not very convincing. Granted, I agree with your sentiment that this API is more cumbersome to use than we probably want but (1) I think we can fix that and (2) if can't or won't I'd solve this problem by delegation, not by igoring the API altogether.

It's actually rather simple :) Let's say ORM L relies on DbProviderFactory instances internally (most do, except EF and NH, the latter uses reflection). For a user to use L, she needs to register the factory for the DB she wants to use. This is overhead which for some users already might be confusing: shouldn't the ORM take care of that? (it should, but it can't as it doesn't know the ADO.NET provider, but that's a complex story for the user as well) They need to register factories to make L work at all. In no way I want to imply users of ORMs aren't able to understand this, it's simply not in their field of expertise, so that they need to register a factory is new info for a lot of them.

If anything goes wrong with that registration, L won't work, will throw errors or worse: it might not throw errors but .NET might throw an error that some factory isn't known... So for the developers of L (and likewise tools) it's key to make sure the user of L doesn't get these errors. For netfx we can rely on installers for ado.net providers in most cases or e.g. that SqlClient is already registered. For .net core this has to be done by the user of L. So for me, to avoid any issues for my users, I'd provide methods which makes it much harder to fail at registering a factory than with the api proposed. And as my fellow ORM devs are much smarter than me, I am pretty sure they will too ;)

What I would do is follow what ASP.NET Core has done with their IAppBuilder system: middle-ware provides one-line registration methods and they in turn call the low-level registration APIs that take lambdas and whatnot.

That's an idea, though we can't as one crucial element is missing: the type of the ado.net provider's factory. So we can't simply let the user call a method without arguments, sadly: which ADO.NET provider dll is to be used when that method is registered?

So alternatively, with the idea you brought forward, you could perhaps make ADO.NET provider writers call the api instead. So instead do:

System.Data.SqlClient.SqlClientFactory.Register();

Writing the code for this issue I pondered a bit about what could have been done differently (as the API in netfx how it is setup isn't stellar) and one thing I realized was that this whole registration business should be with the ADO.NET provider itself.

(with one caveat, it should be possible to override a type for a given invariantName, so you can wrap a factory with a profiler or logger factory)

ps: This kind of debate is what I meant with: keep it in the open. Thanks @terrajobst :)

@divega
Copy link
Contributor Author

divega commented Jun 23, 2017

@FransBouma @terrajobst trying to catch up with the long thread. I am not going to try to answer on every point, but hopefully I can address some important ones.

From @terrajobst

We generally do not consider overloads as adding to the concept count...

I agree the effects of multiple overloads is different from multiple methods. However I don't agree that having multiple overloads doesn't add any complexity. Don't call it concept count, but it adds to the decision points. I agree with adding options that prove useful.

From @terrajobst

What I would do is follow what ASP.NET Core has done...

I agree. Our intention was actually to get the basic building block right, then iteratively look at how to improve the usability for the actual 80% scenario that emerges. Perhaps the problems is that this iterative improvements process doesn't work well with the nature of BCL and its schedule, and we should instead have been more proactive about having a more complete solution from the start.

From @FransBouma

This is completely ignored, which baffles me. I'm in the ORM business now for 14 years...

Have you heard about YAGNI? 😄 Try not to take things personally, but I would challenge any framework developer, even myself, who proposes to add every possible overload of an API, no matter how much experience they have, to build the API from the ground up and argue for real scenarios. I suspect we don't agree on what shapes of this API would actually be useful in real scenarios, but that would be fine, as long as we can rely on actual consumers of the API to provide feedback.

From @FransBouma

What I was after was a way to let them make as less mistakes as possible,

I can completely agree with that. In fact you have convinced me that with just the one basic method isn't enough to achieve that.

BTW, all this chat about extension methods is making me think that having something like this would be nice:

SqlClientFactory.Instance.RegisterFactory(...);

This would look like one of @FransBouma's previous proposals in the PR with generics, but with a this argument. I can see that being what covers the 80% scenario. I think I would be ok with having this defined for DbConnection as well.

I am not sure about this but since DbProviderFactories is a static class that already hosts methods in this area, I think it could itself be the sponsor class for the extension methods.

Thoughts?

Also, @stephentoub has brought up the fact that maintaining an internal ConcurrentDictionary<string, DbProviderFactory> could make GetFactory() much more efficient than it is today. We don't know what the actual impact would be, but we think the implementation would be simple, and a version of RegisterFactory that takes an instance could take full advantage of that.

@terrajobst
Copy link
Member

terrajobst commented Jun 23, 2017

@divega

I agree the effects of multiple overloads is different from multiple methods. However I don't agree that having multiple overloads doesn't add any complexity. Don't call it concept count, but it adds to the decision points.

I do agree with the fact that overloads increase complexity. But in many cases the complexity is a function of usage patterns. For instance, if all I need is this:

DbProviderFactories.RegisterFactory(typeof(ImmosFancyDbProvider));

But by making me write this:

DbProviderFactories.RegisterFactory(
    nameof(ImmosFancyDbProvider),
    typeof(ImmosFancyDbProvider).AssemblyQualifiedName,
    nameof(ImmosFancyDbProvider),
    "Immo's Fancy Provider"
);

You also increase complexity by what is known as cognitive dissonance, i.e. what I have to write in code doesn't match with the operation I have in my head. I'd argue that the complexity in choosing the overloads is the same set of choices as deciding what values to put in my code, however, IntelliSense helps me by showing possible patterns and thus provides some guidance. Personally, I think the net complexity is slightly lower with the overloads than without.

Now, I do buy the argument that maybe this isn't the API where usability is important, i.e. that it is a low-level building block and that real usability improvements are elsewhere, e.g. in providing an extension method. In that case, I think it's fair to say "look, I'd rather invest the cost of adding APIs there than by making the low-level stuff more usable".

I am not sure about this but since DbProviderFactories is a static class that already hosts methods in this area, I think it could itself be the sponsor class for the extension methods.

I'd be fine with using DbProviderFactories for hosting the extension methods. We have types like Enumerable which host both static as well as extension methos and I don't see any issues with that. Just keep in mind that adding extension methods to core types comes with a tax. For instance, I suggest not to add them on DbConnection but rather on DbProviderFactory.

@divega
Copy link
Contributor Author

divega commented Jun 23, 2017

@terrajobst I don't disagree with any of these things. I value finding the sweet spot between a spare and a intuitive API. I believe what we need to explore more is what usage scenarios are common and important.

Just keep in mind that adding extension methods to core types comes with a tax.

Can you please elaborate a bit more on the tax for future reference?

@terrajobst
Copy link
Member

terrajobst commented Jun 24, 2017

Can you please elaborate a bit more on the tax for future reference?

Create a console app. Make sure to import System.Linq. Declare a local variable of type string. Invoke IntelliSense. Drown in extension methods 😄

In other words, the tax is that extension methods will often show up, even if not always desired. Sometimes a static method taking an instance is better suited precisely because it is less in your face.

@FransBouma
Copy link
Contributor

FransBouma commented Jun 24, 2017

@divega

Have you heard about YAGNI? 😄 Try not to take things personally, but I would challenge any framework developer, even myself, who proposes to add every possible overload of an API, no matter how much experience they have, to build the API from the ground up and argue for real scenarios. I suspect we don't agree on what shapes of this API would actually be useful in real scenarios, but that would be fine, as long as we can rely on actual consumers of the API to provide feedback.

😄 I always run a script to get every possible argument mutation ;). But seriously, not every possible overload is required of course, but there are multiple scenarios here, 2 in particular: one where only invariantname / type is specified and one where everything is specified (your version). The latter likely will never be used or very rarely. The former will the one used by most, if not all people, if just 2 methods were provided. As you can further slim this down as the invariantName can be inferred (as it's the namespace, see the internal logic of NetFX's DbProviderFactories), a convenience variant of the first method is desired.

(I had to chuckle about the close-to-bikeshedding regarding whether method overloads were useful btw. Didn't a former colleague of yours write a book about this? 😉 )

I can completely agree with that. In fact you have convinced me that with just the method isn't enough to achieve that.

Good :)

(Extension methods ... ) I am not sure about this but since DbProviderFactories is a static class that already hosts methods in this area, I think it could itself be the sponsor class for the extension methods. Thoughts?

When I ported the code over I realized that the registration / key/value stuff isn't well designed in netfx wrt DbProviderFactory instances: the key, the invariant name, isn't provided by the type it represents, but in practice, the only source which/who can define the invariant name is the developer who creates the ADO.NET provider, so the invariant name should be produced by something in the ado.net provider. In Netfx they can get away with it as the types are registered in either app/web.config or in machine.config, and the latter is the general way to do this (using an installer, likely provided by the ADO.NET provider developer) but in CoreFX this isn't the case.

In CoreFX the ADO.NET provider is referenced through NuGet in a lot of cases anyway (as there's no Gac), so the types are known at the app level. In cases where this isn't, a string based approach could be used. DbProviderFactories are not for the code at app level but for the code in the ORM used, to avoid coupling there, hence it's still useful to have this machinery.

So I realized the registration of the factory in DbProviderFactories should be logic inside the ADO.NET provider. The only problem is: how to enforce that. In .NET land we have companies like Sybase and Oracle who are Feet Dragging World Champions and are not likely to add something that 'should be there but isn't enforced through e.g. an abstract method which has to be implemented'. It's now by convention that there's a static 'Instance' method on the DbProviderFactory type, but it's not enforced, only by the DbProviderFactories internal implementation. Adding similar machinery (so based on 'good faith') for registration is probably going to be problematic as the feet draggers highly likely overlook this.

It would be nice, absolutely, but unless there's an enforcement implemented, it's fragile.

Also, @stephentoub has brought up the fact that maintaining an internal ConcurrentDictionary<string, DbProviderFactory> could make GetFactory() much more efficient than it is today. We don't know what the actual impact would be, but we think the implementation would be simple, and a version of RegisterFactory that takes an instance could take full advantage of that.

I first had a design around ConcurrentDictionary, as indeed that's all it takes. I decided against it after I realized the following.

  • To comply with the NetFX api, there still has to be code present to build the DataTable with the values. This would make the code of the entire class more complex than keeping the registered factories in a DataTable instance (as both the concurrent dictionary management code has to be there and all the datatable code to build it)
  • In the Netfx datatable there are more columns than just 2, and to store all that info in a concurrentdictionary, a class to keep the data per entry has to be created, which adds to the complexity (and you thought adding an overload already adds complexity hehe ;))
  • For performance this is not needed: In Netfx's api (and thus also here), you get the same factory back every time you call the GetFactory method with the same invariant name. In general you therefore call the method only once per application session, and cache the DbProviderFactory instance from then on. To my knowledge every ORM using DbProviderFactories does this. This makes optimizations in this area not needed. I do realize there are people who are in the camp of 'if something is returned by a method or property and it's cacheable, the callee should cache it, not the caller', and in the general sense they do have a point, however as this is an API with a long period of usage already behind it, the behavior how to use it won't change much (as the people using it will be ORM devs).
  • (Minor point, but still) On Netfx, most ADO.NET / ORM profilers, including mine, use a hack to wrap the existing DbProviderFactory instances registered with a wrapper factory so they can wrap every object created with a wrapper too to do tracking/profiling. To make porting these interceptor libraries to .net core easier, if they can rely on the same hack that would be great. But as I said, it's minor, they can also refactor their code a bit and call the registration method again to overwrite the factory in place. The other points are mostly the reasons why I kept the datatable oriented logic.

The API (and code) I came up with was the result of a lot of thinking about use cases and compromises that have to be taken wrt Netfx compatibility and also what is to be expected what 3rd party ADO.NET provider developers will do: it's not the API one would come up with if everything is started over, but I think it's the API that works best, knowing usage patterns of the netfx' variant of DbProviderFactories and what people do wrong there.

@divega
Copy link
Contributor Author

divega commented Jun 24, 2017

@terrajobst Ah, I thought you could be talking about something more subtle. In this case I think an extension method on DbConnection would be acceptable as long as we buy the argument that there is an important scenario in which with a connection instance at hand I want to make sure that its provider factory is registered. I personally don't buy that (that is why those overloads that take DbConnection were not included in the proposal), but I am open to discuss it since @FransBouma seems to believe it is important enough.

@FransBouma
Copy link
Contributor

@divega The reason I want a DbConnection oriented registration variant is that the type is likely known by the user. I mean: they likely know the existence of 'SqlConnection', but I doubt many know the existence of SqlClientFactory and have to look it up. (Additionally, there's also a GetFactory(DbConnection) variant in the Netfx API of DbProviderFactories)

@karelz
Copy link
Member

karelz commented Jul 17, 2017

@divega any update? It would be nice to push the API forward. Or do you want to wait for @saurabh500 to get back from his leave?

@divega
Copy link
Contributor Author

divega commented Jul 17, 2017

@karelz this is still sitting on my plate, I have just been busy with other things. Planning to come back to it next week.

@karelz
Copy link
Member

karelz commented Aug 6, 2017

@divega any progress?
It's been another 3 weeks ;) ... I know summer time is hard due to vacations, but this issue drags for quite some time :( ... it would be really great if we can set some realistic expectations (even if it means long wait time).

@karelz
Copy link
Member

karelz commented Aug 14, 2017

@divega ping?

@divega
Copy link
Contributor Author

divega commented Aug 14, 2017

@karelz, @FransBouma, and everyone else interested: I am sorry it is taking me much longer than I had hoped to come back to this issue. And I am sorry it is blocked on me. 2.0 has kept me quite busy until the last minute, and today I am starting a family vacation. I will be back before the end of the month, and then I will make this one of my top priorities.

Thanks @karelz for your pacient insistence 😄

@divega
Copy link
Contributor Author

divega commented Sep 5, 2017

I have been discussing (initially only with @ajcvickers) the details of a new proposal to capture what we have learned so far, as well as the specific concerns we had with some of the API that have been previously proposed. I found that for all the time we all spent on this issue, we are still far from reaching consensus, and that there are still many lingering concerns that need to be addressed.

So I just wanted to update the thread with information on where we are. Consider this an early draft, and please respond with feedback. I hope this can help us converge into something that can be soon added for 2.1.

Scenarios

We are currently contemplating two main scenarios:

  1. Provider registration: An application developer adds calls on application startup code that programmatically register a standard ADO.NET provider factory with the purpose of enabling a library (e.g. an O/RM) which depends on being able to resolve the provider factory based on a well-known provider invariant name (commonly contained in configuration, alongside the connection string).

  2. Provider replacement: An application developer adds calls on application startup code that programmatically replace an ADO.NET provider already registered under a given provider invariant name, with a provider that augments its capabilities, e.g. by implementing interception. This has historically been a common practice on .NET Framework based data access profilers, for lack of a general means to obtain rich diagnostics information about database requests being processed by ADO.NET providers.

Call to action: Help us improve these descriptions and identify additional important scenarios that could be missing in this list.

Scope

This proposal centers on adding new members to the static System.Data.Common.DbProviderFactories class which don't currently exist in .NET Framework, but that are needed in .NET Core because we cannot rely on ConfigurationManager backend and DbProviderFactories is a read-only API in .NET Framework.

These APIs additions should be adequate to be ported back to .NET Framework in the future, at which point they can also be incorporated into a future version of .NET Standard.

Existing API

These are the current members on DbProviderFactories. Achieving good synergy with the existing API is a goal, although the existing API contains several quirks that we don't necessarily want to embrace, like the usage of DataTable and DataRow instead of a simpler object model.

(I have added some information on API usage along the signatures, which I obtained from https://apisof.net)

public static DbProviderFactory GetFactory(string providerInvariantName) {...} // api port 1 %, nuget.org 1.3 %

public static DbProviderFactory GetFactory(DataRow providerRow) {...} // api port 0.3 %, nuget.org 0.3 %

public static DbProviderFactory GetFactory(DbConnection connection) {...} // api port 0.2 %, nuget.org 0.2 %

public static DataTable GetFactoryClasses() {...} // api port 0.5 %, nuget.org 0.4 %

Concerns with previous proposals

Usability of overload that that takes provider factory assembly-qualified type name

We acknowledge feedback that although this overload can be the fundamental building block of the API, it can be more difficult to use and error prone that other alternatives.

Name and description arguments

We have not reached consensus on whether there is any real value in being able to associate a name and description with a provider factory registration. That is something that the ConfigurationManager based mechanism used in .NET allows (or requires in the case of name), but that we don't believe it is that common to consume. In fact, the only way to obtain the name and the description for a provider registration is to look into the DataTable returned by GetFactoryClasses().

For the purpose of this draft, we are going to leave this as an independent decision point, i.e. we won't include signatures that include these two arguments in the description of the methods below, but if we agree there is enough value, we can add appropriate overloads.

Instance vs. type based registration

Provider factories have been part of ADO.NET since the times of .NET Framework 2.0. There has since then been a prescribed pattern to make them effectively singletons: a private parameterless constructor and a public static read-only field called Instance. However this prescribed pattern is not enforced by the type system. Any API that takes an instance of a DbProviderFactory as an argument and uses it only to register the type name is taking a bet on the fact that this contract is being respected by the provider, and that two instances will always be one and the same. We don't know 100% if this is the case in all providers, in particular for wrapping providers used for replacement. We believe it would be slightly better if DbProviderFactories was able to return the same instance tht was registered. Another reason to do this is the potential performance benefits (@stephentoub's intial investigation on this showed that maintaing a concurrent dictionary cache could make the implementation of GetFactory(invariantName) up to ~70 times faster, and this is the most commonly used API on this class).

Connection instance vs. factory as the starting point

The original proposal includes method overloads that register a factory based on a connection instance. We have discussed this and re-read all the feedback in this thread about why this could be important. It seems that the main point is that many customers that may need to register providers (for either scenario 1 or 2) will not be familiar with the factory types but could be familiar with the connection types. This seems to be a legitimate concern.

However it is a bit strange when you think about how the user would need to create a connection instance before they can register the factory that the application later is supposed to get the connection from. In the case of scenario 2 (provider replacement), the type of the connection instance used initially will not even match the one that would be obtained later.

We are still wondering if it is reasonable for consumers of this API to be informed about provider factory types at the same time that they are informed about the need to call these APIs. For the purpose of this draft, we won't include these overloads, but we would like to understand this a bit more before making a call.

Most of the concerns about this were raised by @ajcvickers, so I will let him expand more on these if necessary.

Inferring the invariant name from the type's namespace

The original proposal includes registration overloads that assume the invariant name to be used is the namespace of the provider's factory class. This seems to in fact match the actual default provider invariant names chosen by provider writers, with the notorious exception of the SQL Server CE providers, which append version numbers to their invariant names.

We haven't been able to reach consensus on whether these should be included or whether they could become a pit of failure.

One of the possible issues is that the overload would register the wrong invariant name 100% of the time for the provider replacement scenario.

Another possible issue is that this convention is not enforced by any means and can be incorrect for existing providers or future providers.

For the purpose of this drafts, we are not including overloads that infer the invariant name, but we can add them if we reach consensus that they are more value than the possible pitfalls.

Again, most of the concerns about this were raised by @ajcvickers, so I will let him expand more on these if necessary.

High confidence additions

Here a list of APIs we are convinced will enable the scenarios described above and that we believe present a very low risk of becoming pits of failure.

Register extension method overload that takes the factory instance

public static void Register(this DbProviderFactory factory, string invariantName) 

Example usage:

// as extension method
SqlClientFactory.Instance.Register("System.Data.SqlClient");
// as static method
DbProviderFactories.Register(SqlClientFactory.Instance, "System.Data.SqlClient");

Pros:

  • Compared to the overload that takes a provider factory assembly-qualified type name, this overload is an extension method and can be used on the provider factory instance without having to copy a long string

Cons:

  • It can be error prone of the invariant name is typed by hand.

Neutral (i.e. can be seen as both pros and cons):

  • It requires the user to be aware of provider factories
  • It requires the user to be aware of invariant names

Register overload that takes assembly-qualified name

public static void Register(string factoryAssemblyQualifiedName, string invariantName)

Example usage:

DbProviderFactories.Register(
    @"System.Data.SqlClient.SqlClientFactory, 
    System.Data, 
    Version=2.0.0.0, 
    Culture=neutral, 
    PublicKeyToken=b77a5c561934e089",
    "System.Data.SqlClient");

Pros:

  • Allows a registering multiple provider without loading any assemblies until the providers are actually used.

Cons:

  • It can be error prone if the assembly qualified name or the invariant name are typed by hand.

Optional additions

These are APIs that could be added and for which we don't see high risks, but for which the value added isn't clear.

Register overload that takes a factory type

public static void Register(Type factoryType, string invariantName) 

Example usage:

DbProviderFactories.Register(typeof(SqlClientFactory), "System.Data.SqlClient");

Cons:

  • Given the prescribed pattern for provider factories to have an Instance field to access the singleton, this API seems redundant and less compelling than the one that takes an instance, because it cannot be used as an extension method.

TryGetFactory method that doesn't throw if the provider is not registered

We believe this would a nice addition, so that higher level libraries can easily probe if a provider is registered without relying on DataTable or handling exceptions:

public static bool TryGetFactory<TFactory>(string invariantName, out TFactory factory) where TFactory : DbProviderFactory

Example usage:

// with var out
if (DbProviderFactories.TryGetFactory<SqlClientFactory>("System.Data.SqlClient", var out factory))
{
    return factory.CreateConnection();
}
else
{
    return new SqlConnection();
}
//

Un-registration methods

These are mostly for completeness. They don't belong in any fully articulated scenario we have identified but given that an application will be able to programmatically register provider factories it seems a bit strange that they cannot be unregistered.

public static bool Unregister(string invariantName) {...}

public static bool Unregister(this DbProviderFactory factory) {...}

public static bool Unregister(Type factoryType) {...}

public static void Clear() {...}

@karelz
Copy link
Member

karelz commented Nov 7, 2017

Great! Reassigned to you @FransBouma. If you don't have time, or change your mind, just let us know ;).

@FransBouma based on your reply I assume you are ok with the final shape of the API and you don't have any must-have feedback we should discuss prior to implementation. Is that correct assumption?

@karelz
Copy link
Member

karelz commented Nov 8, 2017

FYI: The API review discussion was recorded - see https://youtu.be/HnKmLJcEM74?t=4047 (48 min duration)

@FransBouma
Copy link
Contributor

@karelz

based on your reply I assume you are ok with the final shape of the API and you don't have any must-have feedback we should discuss prior to implementation. Is that correct assumption?

correct :)

Question: I have a PR with the original code / API I wrote for this, that's flagged as not merge etc. Is it better to commit to that PR or delete that PR and create a new one?

@FransBouma
Copy link
Contributor

FransBouma commented Nov 8, 2017

@terrajobst @divega
Refactoring the code now, this method:

    /// <summary>
    /// Returns the invariant names for all the factories registered
    /// </summary>
    public static IEnumerable<string> GetProviderInvariantNames();

the problem is, the data is stored in a shared resource, so locking (readwritelockslim) is used for its access. This gives a problem with this method, as the enumerator traversing the stream would require a full lock (it can't deal with a write mid-enumeration). Internally projecting the names to a list and returning that is also a bit lame, as it's actually 1) running into the same problem and 2) it mitigates the whole purpose of returning IEnumerable.

I'll revert the internal implementation to ConcurrentDictionary and create a DataTable construction function for GetFactories() (which currently creates a copy, like in NetFX), as ConcurrentDictionary offers enumeration with writes happening.

We do lose some aspects which might not have been considered because of this though:

  1. Calling GetFactories() twice might give potentially different order in the rows. Mitigating factor: it's not a given in the API spec, so relying on that is not warranted, but it is an aspect of the NetFX api, so the APIs will differ in behavior.
  2. Code which currently, through reflection, changes the factory types in the datatable to wrap the factories (Glimpse, Hibernating Rhino's profiler and my own do this, on NetFX). MItigating factors: this is an implementation detail and not part of the API, porting profilers to CoreFX does require rewriting code anyway, so this is then a requirement as well.

Keeping both internally is really not great.

So... what to do?
TIA

(edit) If I cache the created datatable which is copied for GetFactories(), and re-create it when a new factory is registered, point 1 is solved. Point 2 is of little relevance, so I think we're all good.

@karelz
Copy link
Member

karelz commented Nov 8, 2017

Question: ... Is it better to commit to that PR or delete that PR and create a new one?

@FransBouma it's up to you what you think is easier for you and PR feedback. New PR might start fresh with new comments and new rebase, so it may be easier overall. But if current PR is very close to final code, I can imagine it will be easier for you to reuse it ...

@FransBouma
Copy link
Contributor

FransBouma commented Nov 8, 2017

@karelz I think it's better to use a new PR regardless as the history of the old PR isn't really relevant for this new code. I'll refer to it in the new PR. I lean towards going for the ConcurrentDictionary approach (see previous message) so the previous PR's code isn't close to the final code.

I'll likely make a mistake with rebasing / history squashing as I haven't done that before (once I think) as I mainly use subversion, but we'll address that when the PR is posted ;)

edit: old PR closed, will refer to it in new PR. I expect all code to pass the tests later this week.

@divega
Copy link
Contributor Author

divega commented Nov 8, 2017

@FransBouma I also think it is ok to switch the implementation to use a ConcurrentDictionary<string,TValue>, with TValue possibly being a struct that can hold both a reference to the actual factory instance (if has already been resolved) and the metadata necessary to get it (otherwise). Just a few thoughts about this:

  • We intentionally left the human-readable name and description fields out of the design for the new APIs because they didn't seem relevant anymore.
  • Although I agree maintaining reflection-level compatibility and the result ordering are non-goals, I think we need to keep in mind bringing these new API to .NET Framework (so that hopefully profilers will be able to use this public API across all modern versions of .NET at some point in the future).
  • In that sense, while we only touched on GetProviderInvariantNames briefly during the API review meeting, I left with the impression that for a DataTable-based implementation, you would need to lock and copy to a list, then either return AsReadonly() or defer this work to GetEnumerator(). I am not convinced this defeats the purpose of having the API return an IEnumerable<T>. At least that seems a simpler surface than other alternatives. Is there anything else you think the API should return?

cc @terrajobst @ajcvickers in case they have other insights on this.

@FransBouma
Copy link
Contributor

FransBouma commented Nov 9, 2017

@divega In the current implementation I don't fill the columns name and description indeed, the returned datatable from GetFactories returns the columns for api compatibility but I fill them with string.Empty. If people need these values it's always possible to add overloads in the future, so not a big deal indeed.

Regarding the overwriting of the internal datatable... I've always found that a dirty hack to be honest :), but it was the only way to intercept the factories. Now that this API allows to register a different factory under an invariant name which is already used with a registered factory, it's easier to wrap the factories now: first call GetProviderInvariantNames, loop over that and for each name obtain the factory and wrap it with your wrapping factory and re-register it under the invariant name (so replacing the real one).

The profiler using the hack could stop working if the inner workings of NetFX's implementation change from datatable to concurrentdictionary of course, but as it's a hack relying on an internal implementation which is nowhere documented, it's not weird that it perhaps breaks at some point. As someone who potentially is affected by such breakage, I don't mind this breaking change.

That said, the current code I have now is still datatable based, it depends on the GetProviderInvariantNames implementation whether a DataTable based approach is feasible or not: I have to upscale the lock to a write lock for the method, while using the ConcurrentDictionary I can simply enumerate that and yield the key.

It's not code that'll appear on any hot path however, most of this code is called once and at startup, and I doubt multiple threads will be either registering and reading info about factories at the same time so if some thread is blocked for a short moment I don't think it's that big a deal to upscale the lock to a Write lock, do the DataTable enumeration and copy the names into a structure and return that in some form, but just to be sure I'd like your final opinion what to do :)

In that sense, while we only touched on GetProviderInvariantNames briefly during the API review meeting, I left with the impression that for a DataTable-based implementation, you would need to lock and copy to a list, then either return AsReadonly() or defer this work to GetEnumerator(). I am not convinced this defeats the purpose of having the API return an IEnumerable<T>. At least that seems a simpler surface than other alternatives. Is there anything else you think the API should return?

That sounds like a reasonable implementation, the AsReadOnly() call. Personally what I always find about IEnumerable is that if it's an enumerator over a set anyway, it's a bit of a waste: e.g. if you want to do a Count, you have to enumerate it first and as it's under the hood an IList anyway why not return an IList<string> or ICollection<string>, so it's more clear what the caller gets.

However, if the implementation is meant to be potentially streaming the names, IEnumerable<T> is the only way of course. In this context that's a bit cumbersome to do as it potentially will block readers longer than necessary due to the lock needed (a lock at write level, so everything is blocked), which can only be lifted after the enumeration is complete which can potentially take a long time as it depends on what the caller is doing.

From the API description it's unclear whether GetProviderInvariantNames is a streaming method or a method which returns a projected set, hence my question :)

PS: I don't want to re-do the API review all over again, I just ran into this when implementing it and thought I should mention it :)

(edited: clarity)

@divega
Copy link
Contributor Author

divega commented Nov 9, 2017

@FransBouma ok, let me summarize what I believe you are saying, just to confirm:

  1. For .NET Core, the implementation will use ConcurrentDictionary for storage 👍
  2. For .NET Framework, a future implementation of GetProviderInvariantName is still feasible: although it may have to introduce additional synchronization around accessing the DataTable to be safe, it does not seem to be a big concern due to the usage of the API
  3. You have a slight preference to return an actual collection type, e.g. ICollection<string> from GetProviderInvariantName, but you don't want to reset the API review 😸

I don't have a strong opinion on (3). I would like @terrajobst to weigh in for that. I know for your example of Count, the corresponding LINQ operator optimizes if the runtime type can be converted to ICollection<T> or ICollection.

@FransBouma
Copy link
Contributor

@terrajobst @divega
There's another subtle issue with the API, which only occurs in the CoreFX variant because the CoreFX API allows registering another factory with an already known invariant name. I ran into it when I was refactoring the code to ConcurrentDictionary<T, U>.

The method GetFactory(DataRow) is problematic. Consider the scenario:

  • you register a factory FReal under the invariant name N.
  • you call GetFactoryClasses(). This gives you a DataTable with the type of FReal defined with N in a DataRow R. You store this datatable somewhere (I know, why would you do this, but people do idiotic things! :) )
  • you elsewhere register another factory under N, FProfiler, using RegisterFactory.
  • you grab the DataRow of N from your stored datatable and pass it to GetFactory(DataRow)

What factory should be returned? FReal or FProfiler? On NetFX it will return FReal, as that's the type in the DataRow, and the code assumes that it can't differ from the internal store as there's no way to change the internal store (unless you use reflection). However on CoreFX we do have re-registration of an invariant name, so on CoreFX, it's not clear what to do: return the factory type stored in the row, or return the type currently registered under the invariant name in the DataRow, ignoring the type in the DataRow?

If I look at the documentation of the current NetFX API (GetFactory(DataRow)), it says:

The providerRow parameter corresponds to the DataRow of a table returned by GetFactoryClasses.

Thing is: if I call GetFactoryClasses() right before GetFactory(DataRow) I get a datatable with FProfiler as registered factory type.

I know, nitpicking perhaps, but better safe than sorry. :)

FransBouma referenced this issue in FransBouma/corefx Nov 10, 2017
@FransBouma
Copy link
Contributor

FransBouma commented Nov 10, 2017

@divega @terrajobst @karelz
Ok, I have a temp implementation done, with tests which all pass:
https://github.com/FransBouma/corefx/tree/DbConnection_ProviderFactory_19826

If the replies to my (small) questions above are in line with what I have implemented, I'll post a PR from the code, otherwise will change the code to make it match what's decided and then push the PR (after squashing some commits I recon).

I saw in this list of changes it does add a test for SqlClient.SqlConnection.ProviderFactory property (done back in May) as it's required for this work to function (GetFactory(DbConnection))

(edit) to elaborate what's implemented wrt the small points above:

  • Internally it uses ConcurrentDictionary, which stores per key the DbProviderFactory instance. (no other info is needed)
  • It builds the DataTable which is returned by GetFactoryClasses, every time it's called. This was much simpler than introducing locks to invalidate a cached datatable, and it otherwise would create a copy anyway so allocations would be there, regardless.
  • GetFactory(DataRow) returns the factory instance related to the type in the DataRow, this is in line with the NetFX code, but as described above in my previous post, could lead to a weird situation.
  • GetProviderInvariantNames simply does a ToList() on the Keys property of the concurrentdictionary. This is thread safe as ConcurrentDictionary copies the Keys set before returning it.

@divega
Copy link
Contributor Author

divega commented Nov 15, 2017

@FransBouma, trying to answer your questions:

On NetFX it will return FReal, as that's the type in the DataRow, and the code assumes that it can't differ from the internal store as there's no way to change the internal store ...

That is interesting indeed. It sounds like the current implementation on .NET Framework relies on the fact that the configuration cannot change at runtime to implement some shortcuts. At first glance, here is how I think about it: When we add the RegisterFactory methods to .NET Framework implementation, the assumption that the configuration cannot change will simply become incorrect, so any shortcuts based on it will have to be removed. Adding a test that breaks unless that is the case seems to be the best way to go about it for now. Sounds good?

Internally it uses ConcurrentDictionary, which stores per key the DbProviderFactory instance. (no other info is needed)

When we discussed the overload that takes the assembly-qualified name in API review, we said the only reason to have it in the API was that it allows delaying the actual loading of the assemblies and instantiation of the providers, which could be useful if the application configures many providers, but ends up using only a few of them (which was the common case with .NET Framework).

@FransBouma
Copy link
Contributor

FransBouma commented Nov 15, 2017

@divega

That is interesting indeed. It sounds like the current implementation on .NET Framework relies on the fact that the configuration cannot change at runtime to implement some shortcuts. At first lance, here is how I think about it: When we add the RegisterFactory methods to .NET Framework implementation, the assumption that the configuration cannot change will simply become incorrect, so any shortcuts based on it will have to be removed. Adding a test that breaks unless that is the case seems to be the best way to go about it for now. Sounds good?

So in short:

  • the GetFactory(datarow) has to ignore the type name in the row and simply use the invariant name in the row to lookup the real factory (it currently uses the typename in the row)
  • a test has to check whether the type of the returned factory is the same as the one in the row passed to the GetFactory method. If they differ the test has to fail.

? Sounds good.

When we discussed the overload that takes the assembly-qualified name in API review, we said the only reason to have it in the API was that it allows delaying the actual loading of the assemblies and instantiation of the providers, which could be useful if the application configures many providers, but ends up using only a few of them (which was the common case with .NET Framework).

I completely overlooked that part. Hmm, then there's another problem tho: how to check whether the specified typename is valid? I can't simply accept is as valid without loading the type, but loading the type does bring in the assembly into the appdomain I think, mitigating the purpose of the method...

Storing the type name as string as-is will cause a problem when the factory is obtained later on and the type doesn't resolve to a factory at all (e.g. a typo was made, wrong type was specified), which is odd as the registration was successful; other RegisterFactory methods fail when the wrong info is specified (e.g. the wrong Type is specified). That's inconsistent (and unexpected, as the error will pop up with GetFactory, handling code for a failure that a type is registered wrongly won't be at that spot, but will be at the spot where the RegisterFactory method is called)

(edit) It's also not enough to reflect the type, a valid factory type is a type which implements DbProviderFactory, has a static Instance property and that property returns an instance of the type. All those checks are done with the other RegisterFactory methods, but the string based variant then has to postpone all those checks to a later point, which makes this implementation rather odd.

I get the theoretical use case of the method. I think however it's a bit of a theoretical exercise rather than a practical situation anyone will run into, as, when it's the case they run into memory problems due to the registration of too many provider factories, they can also do the filtering of what to use themselves. Running into memory problems however due to too many assemblies being registered (I don't know what other problems there might be that this is needed) when you're talking to databases is... curious, to say the least ;)

@divega
Copy link
Contributor Author

divega commented Nov 15, 2017

That's inconsistent (and unexpected, as the error will pop up with GetFactory, handling code for a failure that a type is registered wrongly won't be at that spot...

What happens today in .NET Framework when you call GetFactory() if the config files contain an invalid assembly-qualified type name or one that cannot be resolved or loaded? I am not sure what validation happens at registration time and what exceptions occur in GetFactory().

I get the theoretical use case of the method...

Certainly this overload and the one that takes the type are IMO much less important than the one that takes the instance. We could re-discuss (without resetting the whole thing 😄). The argument was always non-functional: avoid the waste of cycles and memory in that scenario.

the GetFactory(datarow) has to ignore the type name in the row and simply use the invariant name in the row to lookup the real factory (it currently uses the typename in the row)

Thanks! Put this way it is actually clearer to me that ignoring either the type on the row or the type currently registered could be equally unexpected. This overload is messed up. Perhaps throw if the types diverge? Or keep the current behavior but deprecate the overload (but only if we can deprecate it on .NET Framework too)? cc @terrajobst

@FransBouma
Copy link
Contributor

What happens today in .NET Framework when you call GetFactory() if the config files contain an invalid assembly-qualified type name or one that cannot be resolved or loaded? I am not sure what validation happens at registration time and what exceptions occur in GetFactory().

It actually happens twice. First the validation occurs when the config is parsed and the datatable is created internally, see:
https://github.com/Microsoft/referencesource/blob/master/System.Data/System/Data/Common/DbProviderFactories.cs#L117
called from: https://github.com/Microsoft/referencesource/blob/master/System.Data/System/Data/Common/DbProviderFactories.cs#L237
So if an invalid type is specified it will cause an exception to occur there. It is triggered by a GetFactory() call however, as there's no static ctor. So the first time a GetFactory() method is called, the table is created and the exception will occur.
It's tested again when the row is obtained from the internal datatable and passed to the GetFactory(Datarow) (there it is).

That said, the RegisterFactory(invariant name, typename) overload has as comment:

    /// <summary>
    /// Registers a provider factory using the assembly qualified name of the factory and an 
    /// invariant name
    /// </summary>
    public static void RegisterFactory(string providerInvariantName, string factoryTypeAssemblyQualifiedName);

which tells me it will register the factory using the specified name. It doesn't register the type name, it registers the provider factory, and to do that, it uses the name specified (like it would also use the type specified in the other overload). At least that's how I interpret that documentation/spec. Looking at the method, I'd expect it to throw an exception if I call it with "Foo" as typename specified, and not succeed (as tests are happening later on in the case where the type name is stored and not examined, to prevent cycles/memory (but how much is really saved here?)).

Put this way it is actually clearer to me that ignoring either the type on the row or the type currently registered could be equally unexpected. This overload is messed up. Perhaps throw if the types diverge? Or keep the current behavior but deprecate the overload (but only if we can deprecate it on .NET Framework too)?

Heh 😄 yeah I scratched my head a few times too why that method was there. It's the central method for all other GetFactory methods to obtain the real factory (so it needs to instantiate the type specified in the row it receives, see above), but apparently the initial developer overlooked the case where you can pass in a datarow which contains a completely different type that's not registered at all. No idea if there are design docs for this particular method tho.

How I see it, I think it shouldn't be a public facing method, or at least test whether the type in the row is in the internal table as well, if the containing table isn't the same as the internal table, but I have no idea if there is code actively using this (I guess there is, there always is ;)).

So perhaps a solution here is to document it properly what will happen with this method: either it will return the instance of the type specified in the row (the netfx implementation) or it will return the instance registered with the invariant name in the row (which for netfx currently likely is the same), whatever the decision is, so the behavior isn't totally unexpected: the documentation states what will be done. The current documentation of this method isn't helpful in this (https://msdn.microsoft.com/en-us/library/s90hx22e(v=vs.110).aspx).
On corefx, we could throw an exception if types differ between the type specified in the row and the type registered in the dbproviderfactories internal datastructure for the invariant name in the datarow.

Such a little API and so much complexity... :)

@divega
Copy link
Contributor Author

divega commented Nov 15, 2017

It actually happens twice. First the validation occurs when the config is parsed and the datatable is created internally...

That code you pointed to special cases the providers defined by .NET Framework itself (e.g. in System.Data.dll, assembly that you would have been already loaded) and in particular, it instantiates Microsoft's Oracle provider.

Unless I am missing something, I believe if you call GetFactory for provider A, and provider B is registered using its assembly qualified name, provider B will not be loaded and instantiated, unless it is Microsoft.Data.OracleClient.

FWIW, there was once a perf bug reported internally about the fact that the Oracle provider is loaded into memory unnecessarily here, but IIRC, the behavior had to be preserved for backwards compatibility.

it doesn't register the type name, it registers the provider factory, and to do that, it uses the name specified (like it would also use the type specified in the other overload). At least that's how I interpret that documentation/spec.

Yeah, I happen to be the person that wrote that comment 😄, and I am sure that is not what I meant. From my point of view "registering a provider" is establishing a mapping between the invariant name and the provider, and establishing that mapping does not imply that the actual type has to be loaded or instantiated, only identified in the mapping. Any feedback on how to convey that better without making it overly complicated?

On corefx, we could throw an exception if types differ between the type specified in the row and the type registered in the dbproviderfactories internal datastructure for the invariant name in the datarow.

Yes, that is what I thought, but I am still on the fence. We need to be ok with porting that behavior to .NET Framework as well. Perhaps for this API it is ok to just keep returning whatever the dataRow says, as long as the behavior is noted in documentation as you suggested.

@FransBouma
Copy link
Contributor

FransBouma commented Nov 16, 2017

@divega

Unless I am missing something, I believe if you call GetFactory for provider A, and provider B is registered using its assembly qualified name, provider B will not be loaded and instantiated, unless it is Microsoft.Data.OracleClient.

You're not missing something, you're right. It registers type names and defers the checks to the GetFactory() method.

Yeah, I happen to be the person that wrote that comment 😄, and I am sure that is not what I meant. From my point of view "registering a provider" is establishing a mapping between the invariant name and the provider, and establishing that mapping does not imply that the actual type has to be loaded or instantiated, only identified in the mapping. Any feedback on how to convey that better without making it overly complicated?

Ok :) I have struggled with coming up with a use case for the method, but last night I thought of one and it's close to home as well: an ORM can pre-register all supported ADO.NET providers with DbProviderFactories using this method, so the user doesn't have to :) (they then have to make sure they dll(s) they want to use are in the fusion probing path and everything works out fine).

So the RegisterFactory(string, string) overload defers registering the actual type till it's requested the first time. Now that I have found a legitimate use case it's more clear to me when the method will be used and what the expectations are. IMHO they're not the same as for the other RegisterFactory() methods. So, to me, there are 3 options:

  1. Document the RegisterFactory(string, string) method with a Remarks that it won't perform actual checks on the specified type name till the type is used through a GetFactory call. I think this essential, the current comment is too vague (sorry) for communicating this.
  2. Rename this specific overload so that it communicates the fact it defers registration of the factory type/instance to a later point, so it differs from the other overloads.
  3. Leave things as is with respect to the name/comments/docs of the method, and let the user figure out the hard way that things aren't checked when the method is called. (this is a tongue in cheek reference to the current documentation of DbProviderFactories' methods which is very sloppy)

There's no question anymore whether the string variant needs deferred registration, I'll add that to the code, the API spec how it is now is IMHO however vague in its communication to the user, which I think needs more clarity.

// cc @terrajobst for an opinion on this?

(edit). Calling RegisterFactory(string, string) after a RegisterFactory(string, type) has been performed will replace the already registered factory instance again with a deferred one. This is obvious perhaps, but just in case it's not ;)

(edit). TryGetFactory(name), should that return false in case of a deferred registration which causes an error? Or should it throw the exception (e.g. type isn't found, type is badly formed etc.), as the registration is there, but it's not correct ?

@FransBouma
Copy link
Contributor

@divega
In any case, I've updated the implementation: https://github.com/FransBouma/corefx/blob/DbConnection_ProviderFactory_19826/src/System.Data.Common/src/System/Data/Common/DbProviderFactories.cs
added tests for the deferred behavior, but NOT for the earlier discussed weird GetFactory(datarow) behavior, as @terrajobst e.a. have to decide what to do.

Current implementation defers the registration of RegisterFactory(string, string) till GetFactory() is called (any overload). Will also throw when TryGetFactory() is called with an invariant name which matches a registered, but deferred, factory, and the assemblyqualifiedname is wrong/lot loadable. This is debatable tho, see previous post, and easily corrected.

RegisterFactory(string, string) has a comment regarding the deferred registration but not an xml comment, so it's likely going to be missed by documentation writers if they don't peek into the code.

I think I've covered all the bases for a PR now in the code (IMHO it's 100% complete), but the points still open needs addressing before PR submit.

@divega
Copy link
Contributor Author

divega commented Nov 16, 2017

@FransBouma glad to hear you arrived to that scenario for the assembly-qualified name overload. I think it is basically the same thing we had in mind, but thinking of O/RMs doing this is a very concrete example in which it would make sense.

Feel free to propose improvements to the XML doc comments of the methods. The comments were not really the focus of the API discussion and should not be considered part of the spec.

Calling RegisterFactory(string, string) after a RegisterFactory(string, type) ...

Last-one has to wins always, otherwise I can't think of a way to explain the API.

I think at this point it would be good to just finalize any small details on comments on the PR.

Thanks a lot for your patience!

@FransBouma
Copy link
Contributor

PR is up :) dotnet/corefx#25410. I've tried to word the decisions made here as accurate as possible, please correct me if I made a mistake there. I also tried to rebase stuff but as I'm a git newbie I didn't get very far so I hope the PR is alright as it is. All tests pass. Almost there!

saurabh500 referenced this issue in dotnet/corefx Dec 5, 2017
* Implements #19826: DbConnection doesn't offer an internal property ProviderFactory

* Added test for #19826. Due to project misconfiguration it fails locally.

* Removed writeline

* Added comment to illustrate the purpose of the property which currently is dead code.

* Implementation of DbProviderFactories, ported from NetFx and adjusted to CoreFx

See #4571, #19825 and #19826

* Update for project files to get a netcoreapp test set

* Changes to project system to get the stuff compiled. Failed.

* Various changes:

- Updated config/csproj files for netcoreapp for tests.
- Properly moved DbProviderFactories file to right location
- Added DbProviderFactories type to ref project, split across 2 files
(one for netcoreapp specific methods)
- Changed SetFactory to ConfigureFactory
- Changed generic type argument to normal method parameter
- Removed default parameter values and added overloads

* Added tests for DbProviderFactories.

* Better subclass testing added and more tests added

* Moved all hard-coded strings to SR/Strings.resx. Added test for factory retrieval using a connection

* Removal of now redundant comment. See: #19885 (review)

* Comment correction for bad English

* Refactored API a bit, see: dotnet/standard#356 (comment)

* Added tests for reworked API. Added tests for wrong type passing

* Reverting change in sln file: See #19885 (review)

* Almost done implementation using DataTable internal storage. Refactoring now to concurrentdictionary

* Final work for #20903

* Corrections of the implementation->

- registrations of assembly type name are now deferred
- storage is now a struct, as the typename has to be registrated as well.
- corrected a wrong nameof() call.
- added tests for the change behavior in RegisterFactory(string, string)

* Final implementation

* Corrections made to code by request of @saurabh500

* Github for windows didn't commit this file in last commit... :/

* Again correcting component elements. These are automatically inserted so if they're back again, I can't remove them

* ... annnnd another component element. Last one I hope

* @divega requested changes

- Corrected sln file netcoreapp->netstandard
- Corrected wording in exception messages.
- Component elements are back, I'll leave them now, they get back regardless.
- Renamed column name constants to have the 'ColumnName' suffix for more clarity
- Made Instance constant be initialized with nameof(Instance)
- Added using for System.Reflection so less verbose code

* More @divega requested changes

- factored out a clone into its own method (GetProviderTypeFromTypeName)
- Reverted 'nameof(Instance)' to string, and renamed field
- Removed redundant null/length check in ctor of ProviderRegistration ctor

* Last nit removal for @divega
@saurabh500
Copy link
Contributor

Closing the issue with PR dotnet/corefx#25410
Thanks @FransBouma @divega @terrajobst @karelz

@divega
Copy link
Contributor Author

divega commented Dec 18, 2017

Adding netfx-port-consider label for the new APIs.

Note for implementer: For the .NET Core implementation of the new APIs we decided to ignore aspects of a provider registration like the name and the description. In .NET Framework the new API will probably don't have these either, but the underlying implementation will have to set them to something (null?) in the DataTable used for storage. All this API is static, so it needs to be thread-safe. The .NET Core implementation simply uses a ConcurrentDictionary, but for the DataTable-based implementation on .NET Framework you will probably need locks on access.

mairaw referenced this issue in dotnet/dotnet-api-docs Oct 25, 2019
* Add missing documentation for DbProviderFactories APIs

Introduced in https://github.com/dotnet/corefx/issues/20903

* minor fixes
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Data.SqlClient
Projects
None yet
Development

No branches or pull requests

9 participants