Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Work - Stabilization #203

Merged
merged 2,223 commits into from
Apr 25, 2017
Merged

API Work - Stabilization #203

merged 2,223 commits into from
Apr 25, 2017

Conversation

NightOwl888
Copy link
Contributor

@NightOwl888 NightOwl888 commented Jan 25, 2017

This is a branch of #191 and supersedes that pull request. To build this branch, see the instructions on #191.

The plan is to work on the API here to keep from releasing several successive releases with breaking API changes. Once the API is relatively stable, we can merge to master and create a release that can be consumed without having to worry about so many breaking API changes between releases.

DO NOT MERGE this pull request until the API is stable and failing tests are addressed.

API Polishing

This fixes several API issues that were making Lucene.Net deviate from looking like a normal .NET application - mainly by using properties and methods the way that are specified in the MSDN guidelines and following .NET naming conventions for properties, methods, fields, parameters, and types. Also several types are being renamed and/or moved to address naming collisions, incorrect porting corrected, bugs fixed, and other issues such as CLS compliance are being addressed.

These specific changes are being made that make Lucene.Net differ from Lucene because they are framework conventions that are different between Java and .NET:

  1. Interfaces start with "I"
  2. Size() changed to Count or Length property
  3. Comparator changed to Comparer on all classes, interfaces , properties, methods, documentation and comments
  4. Type names in methods and properties being .NETified. For example instead of .SetLongValue(long), we will have .SetInt64Value(). NOTE: This isn't done yet. I could use some advice on if or how this one should be done, since WriteVInt64() admittedly looks odd. The class and interface names are being changed to match. In certain special cases, type names are being changed to differentiate "Single" (the datatype) from "Singular" (meaning one) because there are types in the same context that use both, and it would be confusing.

Status

This branch is now more stable than master.

The Test Framework is now randomizing Codec, InfoStream, Time Zone, and Culture (the same as Lucene). This has revealed many new issues, most of which have now been addressed. There are now no tests failing constantly, and a few that fail randomly.

Key:

  1. Type and Member Accessibility
  2. Properties vs Methods
  3. Properties vs Fields
  4. Method, Property, Field, and Parameter Naming Conventions
  5. Class and Interface Naming Conventions
  6. Eliminate Nullable Enums, if Possible (see this discussion)
  7. Parameter and Return Types (IDictionary and IList rather than Dictionary and List)
  8. Change Property Setters with no Getter (per MSDN, this is not a good practice) - either add a getter or change to a setter method
  9. CLS Compliant
  10. Compiler Warnings Remaining (corresponding test project included)
  11. TODOs Remaining (search for LUCENENET TODO|LUCENE TO-DO using the regular expression option to find them)

Project 1 2 3 4 5 6 7 8 9 10 11
Core X X X X X X 14 117
Analysis.Common X X X X X X X 0 33
Analysis.Stempel X X X X X X X X X 0 2
Classification X X X X X X X X X 0 0
Codecs X X X X X X X 0 18
Expressions X X X X X X X X X 0 0
Facet X X X X X X X 0 1
Grouping X X X X X X X X X 0 0
Highlighter X X X X X X X X 0 6
Join X X X X X X X X X 0 0
Memory X X X X X X X X X 0 0
Misc X X X X X X X X 4 4
Queries X X X X X X X X 0 15
QueryParser X X X X X X X X 0 21
Sandbox X X X X X X X X X 0 0
Spatial X X X X X X X X 0 3
Suggest X X X X X X X X 0 4
Test Framework N/A N/A N/A N/A N/A N/A N/A N/A N/A 4 5

Tests

Do note that currently this branch has more failing tests than master. In fact, the test suite is currently crashing for several tests, making the whole thing bomb before it can finish. Right now, the priority is finishing the API changes before addressing these problems.

However, if you would like to help out by debugging the failing tests, please fork this branch and submit the pull request back here. Due to the fast changing state of the API, please fix one bug at a time and submit the pull request to this branch (not master) ASAP.

Documentation Comments

Also up-for-grabs is to complete the documentation comments in the following sections:

  1. Lucene.Net.Core (project)
    1. Codecs (namespace)
    2. Index (namespace) (Except for types starting with letter A-D, LiveIndexWriterConfig, IndexWriter, IndexWriterConfig, IndexReader, and TieredMergePolicy)
    3. Search (namespace)
    4. Support (namespace)
    5. Util (namespace) (Except for Util.Fst)
  2. Lucene.Net.Codecs (project)

While it is assumed that the documentation comments for the other projects are finished, they could probably all use a review.

The automated Java converter brought much of the comment structure over, however some comments have been left behind in Lucene, and they need to be formatted and corrected to match MSDN's guidelines. Some common issues that need to be addressed:

  1. Change seealso to see (seealso are for the links that go at the bottom of the documentation page, not the direct links within the content).
  2. {@code paramName} or <code>paramName</code> need to be changed to . Note that not all of the "code" are parameter names, and the normal case need to be in tags for single line and` tags for multiline code examples. Do note the code also needs to be converted to match our API.
  3. {@link TypeName} need to be changed to the appropriate <see cref="TypeName"/> or anchor tag (if external).
  4. Many times in the documentation, the types are referred to (in Pascal case) with no link. In these cases a link to the type should be added (change TypeName to <see cref="TypeName"/>).

Be sure to check the comments against Lucene 4.8.0 to ensure they are correct and complete!

Pull requests should be directed to this branch (not master).

@conniey
Copy link
Contributor

conniey commented Feb 14, 2017

@NightOwl888 I apologise for the delayed response to your NUnit tests. I've been working on another project. Is there anything I can help with?

My current tasks for Lucene.NET that I would want to merge into this PR are:

Icu-dotnet

  • Bring BreakIterator changes into local icu-dotnet .NET Core branch.
  • Use icu-dotnet's new way of loading NativeLibrary dependencies
    • There have been a lot of churn in the icu-dotnet repository since I created my issue about migrating to .NET Core. They changed the way that they ingested Native library dependencies and split it into a different repository that I have to merge to and take into account.
  • Fix assembly loading changes in .NET Core
    • Recently, icu-dotnet introduced a new way of loading assemblies that uses native Windows/Linux API calls rather than using a [DllImport] from C#. Since we implemented a new way to load native libraries in .NET Core (via /runtimes/{RID}/ rather than adding to the Environment.Path), it breaks loading in .NET Core. I need to research how to get the native library runtime paths in .NET Core. Then figure out how to load it first when they are using the Windows/Linux APIs
  • Modify BreakIteratorRules.java to consumable rules for icu-dotnet

@NightOwl888
Copy link
Contributor Author

NightOwl888 commented Feb 27, 2017

@conniey

I also apologize for the delayed response.

Is there anything I can help with?

Yes, there is. I have been working on cleaning the API up and making it more .NET-like.

CLS Compliance

Part of this process is making the whole project CLS compliant, so it can be consumed by VB and other .NET languages.

  1. Currently, there are only 8 remaining CLS-related warnings that need to be addressed. But before we fix them, it would be nice if we could ensure that the fixes don't cause any negative effects - in other words, we need as many tests fixed as possible (namely in Lucene.Net.Core). If you could help with the debugging efforts, that would be great.
  2. In addition, there was a bit of a snag with icu-dotnet because it is not marked CLS Compliant. Therefore, by default none of its types are CLS Compliant and cannot be exposed publicly in Lucene.Net (including Enum types). I would appreciate it if you mark the assembly with the CLSCompliant attribute.

I reverted the ThaiAnalyzer back to its previous state and adapted the ICUBreakIterator to work with it - seems to work pretty well, for now (and it is now CLS Compliant). Let me know if/when you have a rule-based break iterator so I can ensure compatibility.

Collation

Also, I ended up marking the Collation-related types CLSCompliant(false). I am now pretty much convinced that there is no way we can get them working with icu-dotnet, based on this documentation. It also seems to indicate that the collation functionality in the Analysis.ICU package is better (and it is based on ICU 4J, so it should work). Anyway, my thought is to add FEATURE_COLLATION as an option and leave it out of the compile, since it is pretty much useless unless we port over a large part of the JDK to make it work. So, ignore those failing tests for the time being.

Test Framework

I also reverted the changes to OLD_FORMAT_IMPERSONATION_IS_ACTIVE and made it static again. There really doesn't seem to be a reasonable way to ensure this flag is read from everywhere if it is non-static. This fixed more than a dozen failing tests. I am working on finishing up the Codec functionality now - specifically the randomizing of codecs and the IgnoreCodecs functionality.

But, I could use some help. The test framework has been a major source of test failures. Many parts of it were not translated correctly or are still unfinished. IMO, the test framework should be reviewed line-by-line to ensure that the equivalent functionality is implemented.

Documentation Comments

Also, as I mentioned above, some of the documentation comments need to be finished. Mostly, they are malformed so they don't show up in Visual Studio, but there is also some documentation that is either incorrect or has not been brought over from Lucene.

…ause they have been replaced by the Collections.GetHashCode() and Collections.Equals() functionality
…to be more like the original JDK version, including the ability to build the hash code based on the values of the nested value if it is a collection
… more like the original JDK version, including the ability to test equality based on the values of the nested value if it is a collection
…a reference type, we use Collections.Equals() and Collections.GetHashCode() so its values are compared if it happens to be a collection"

This reverts commit d48493d.
… and GetHashCode() methods (that in Java were inherited from AbstractSet<T>)
…ails randomly and when it does it causes an index out of range exception when building the message, so the message has been commented for the time being
…otFoundException to the random errors that are thrown
…st projects and fixed all exception classes to actually be serializable
…ene.Net.Search.TestSort.config (old .NET remoting configurations)
…et and added Icu4c.Win.Full.Lib dependency for .NET 4.5.1"
…ourceLoader: Changed test to write the embedded resource to a known location rather than being dependent on an external file that may/may not be in the file system.
@davhdavh
Copy link

When do you expect the merge to happen? And will it have x64 support?

@NightOwl888
Copy link
Contributor Author

NightOwl888 commented Apr 24, 2017

@davhdavh

The merge will be happening soon. @synhershko, do you need to review this before merging, or can the review be done after? I am ready to merge when you are.

As for x64 support, I believe we do have it now that we have switched from ICU4NET to icu-dotnet, but @conniey might be able to answer that for sure. Also, the ICU-specific functionality has been moved to its own package so the majority of users don't need to deal with that dependency anymore.

@davhdavh
Copy link

Excellent. I notice that the .xml files are still not in the nuget packages, so no api docs?

@NightOwl888
Copy link
Contributor Author

Nope, there is still a lot of broken XML in the documentation comments (see above) that are preventing the process from being complete. It would take a single person the better part of a week to correct this - it would sure be nice if others participated. But now that you mention it, I could probably turn the option to produce the XML on for all but the 2 incomplete packages at least.

BTW - I am legally obligated to tell you (thanks to Apache) that those packages are not for production release.

@NightOwl888
Copy link
Contributor Author

@davhdavh - I have now added the XML documentation to the packages.

@davhdavh
Copy link

A small bug in the port in Term.ToString(BytesRef termText). The call to termText.Utf8ToString(); will never throw, it should use the system utf8encoder instead: return new System.Text.UTF8Encoding(false, true).GetString(termText.Bytes, termText.Offset, termText.Length);. but IHMO, it is still fundamentally broken, ran into this on an int that happens to have a valid utf8 encoding when serialized to bytes.

@NightOwl888
Copy link
Contributor Author

@davhdavh - thanks for the report. I was wondering about that piece of code, but since none of the tests were complaining I left it alone. Thanks for complaining in their place. This is just another place where someone deviated from the original code that happened to be the wrong choice (as usually is the case).

@asfgit asfgit merged commit 1495bff into master Apr 25, 2017
@davhdavh
Copy link

Thanks for all the quick fixes :) I found two more issues. 1. The assemblies are not strong-named in either debug or release build. 2. The assembly version is 4.0.0.0. I believe the right thing to do would be to set [assembly: AssemblyVersion("4.8.*")] and [assembly: AssemblyInformationalVersion("4.8-apiwork-ci")]

@davhdavh
Copy link

After trying to get strong names on, I can say that the entire build system is broken on latest version of dotnet sdk. 1. Restore needs a specific sln, since there are two, ie: & dotnet.exe restore $base_directory\Lucene.Net.sln 2. build command no longer accepts a project.json, but instead uses the csproj files. That can be fixed by finding the csproj files before build-assemblies call:

		pushd $base_directory
		$projects = Get-ChildItem -Path "*.csproj" -Recurse
		popd

		Build-Assemblies $projects

and yet with those fixes, I still cant get it working.

@NightOwl888
Copy link
Contributor Author

@davhdavh

  1. The assemblies are not strong-named in either debug or release build.

Per Itamar (the project manager), Lucene.Net will not be strong-named going forward. I don't agree with all of his points, but for now I am just going to see how many people complain. I did strong-name the spatial4n dependency just in case we need to do it. He might be right that it is time to ditch it as some other open-source projects have. Personally, I don't need it - do you? If so, I suggest you complain loudly about it on the dev mailing list and open an issue on JIRA about it so we can see how many votes it gets.

  1. The assembly version is 4.0.0.0. I believe the right thing to do would be to set [assembly: AssemblyVersion("4.8.*")] and [assembly: AssemblyInformationalVersion("4.8-apiwork-ci")]

In case we do strong-name because of popular demand, setting the assembly version to 4.0.0.0 is the right way to go. This is what the MVC team did - all versions of MVC 4 were 4.0.0.0 (until a they found a security vulnerability so severe that they had to force everyone to upgrade to it, then it incremented to 4.0.0.1). If you read the SemVer document, the behavior of strong-naming acts exactly like changing the major version, since whenever it is changed it breaks binary compatibility. Therefore, it should never change unless the major version is changed.

After trying to get strong names on, I can say that the entire build system is broken on latest version of dotnet sdk. 1. Restore needs a specific sln, since there are two, ie: & dotnet.exe restore $base_directory\Lucene.Net.sln 2. build command no longer accepts a project.json, but instead uses the csproj files.

If you look at the global.json file, the build requires the 1.0.0-preview2-1-003177 SDK, which is the current one that supports project.json (but now that you mention it, the README needs to state the prerequisites). The global.json file ensures the right SDK is used even if a newer one is installed.

I made an attempt to unify to one solution and upgrade to .csproj, but ran into many issues:

  1. The NUnit test adapter doesn't yet support .csproj on .NET Core, so no debugging in VS2017 on .NET Core unless you fire off the tests manually or with NUnitLite.
  2. Some of the tests would not complete after the switch.
  3. With .csproj, versioning is still broken with respect to what I mentioned above about the assembly version (which I plan to file an issue about) - when you specify an AssemblyVersion, it always uses the same version for the AssemblyFileVersion, meaning they cannot differ. Also, it is broken in respect to using a non SemVer scheme (which I don't see an alternative for a port, since we will almost certainly have multiple production releases that correspond to Lucene 4.8.0). You cannot pass a version number to the dotnet pack command, only a version suffix (which always assumes there will be a - before it and always assumes there will be a version prefix). In addition, when generating the NuGet packages for a pre-release, it doesn't update the project dependency version numbers to pre-release (which gives you compile warnings).

These issues can probably be overcome and I much prefer the new format, but I didn't want to delay the release any longer. I think it would be best to wait until Microsoft announces they are feature-complete and NUnit Test Adapter has .NET Core support rather than trading one broken build system for another.

I wouldn't object if you want to contribute a build option to turn on strong-naming (provided it doesn't break the build in TeamCity). But keep in mind, many of the assemblies are using InternalsVisibleTo so we will need a conditional compilation symbol (FEATURE_STRONGNAME) to toggle them between the standard and strong-named form. The strong-name option would not necessarily need to extend to the Lucene.Net.sln solution or its projects, since they are not used for the build.

@hubot hubot deleted the api-work branch April 28, 2017 15:56
@hubot hubot restored the api-work branch April 28, 2017 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants