Proposal: Additional Metadata Attributes #12174

@Redth Redth commented Oct 19, 2022

Allow the inclusion of additional metadata properties in package authoring and allow them to be used in search queries.

The motivation for this is to create associations of NuGet packages which bind and/or redistribute platform native libraries on other platforms and package management systems.

@Redth Redth requested a review from a team as a code owner October 19, 2022 19:04
@erdembayar
Contributor

@joelverhagen @JonDouglas
FYI, this is related to nuget.org too.


## Rationale and alternatives

While there are no known alternatives, we have previously considered embedding custom files in the package containing this metadata. This would be of some benefit, but ultimately supporting search queries is necessary for achieving the full benefit of the proposal for the scenarios described.
Member

Would it be possible to create well-known tags for this purpose? We've had extension points in the past based on tags. Here's a prototype I tried on our DEV environment:
https://dev.nugettest.org/packages?q=Tags%3A%22attr_fruit%3Alemon%22

Miraculously, our quotes actually work properly here 😂.

Prior art is "AzureSiteExtension" used for finding Azure site extension packages, before the package type filtering was enabled:
https://www.nuget.org/packages?q=Tags%3A%22AzureSiteExtension%22

Author

Well-known tags could potentially work... I suppose there's not really any more enforced convention with arbitrary metadata key/value pairs... Though it doesn't seem like there is currently a way to search with multiple tag combinations using "AND" (eg: Tags:"attr_fruit:lemon" Tags:"ArtifactId:NONE" returns results matching just one of the tags, not only results matching both).

Member

Yeah, the AND/OR combination on NuGet.org search is ... not great :)

For non-field-scoped terms `foo bar` it performs an AND. For multiple field-scoped terms `tags:foo tags:bar` it performs an OR. For a mixture, at least one of the field-scoped terms per field name must exist in the doc.
https://www.nuget.org/packages?q=owner%3Amicrosoft+owner%3Ajver+tags%3Aentity+tags%3Afoo+design
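
The matching rules described above can be sketched as a toy model. This is only an illustration of the stated behavior, not NuGet.org's actual search implementation; the document shape and the `parse_query` and `matches` names are assumptions made up for this example.

```python
from collections import defaultdict


def parse_query(q):
    """Split a query into free terms and field-scoped terms (field:value)."""
    free, scoped = [], defaultdict(list)
    for token in q.split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            scoped[field.lower()].append(value.lower())
        else:
            free.append(token.lower())
    return free, scoped


def matches(doc, q):
    """doc maps a field name to a set of lowercase values.

    The hypothetical 'text' field holds the words that free terms match against.
    """
    free, scoped = parse_query(q)
    # Non-field-scoped terms: every term must be present (AND).
    if not all(term in doc.get("text", set()) for term in free):
        return False
    # Field-scoped terms: per field name, at least one value must match (OR).
    return all(
        any(value in doc.get(field, set()) for value in values)
        for field, values in scoped.items()
    )
```

With `doc = {"text": {"design"}, "owner": {"microsoft"}, "tags": {"entity"}}`, the query from the link above matches even though `owner:jver` and `tags:foo` do not, which is exactly why two attribute tags cannot currently be combined with "AND".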

I think a reasonable step here could be to change the interaction of field-scoped terms to "AND" to unblock this scenario. I think it would be a net win for general usage of field-scoped terms anyway since it would align with the non-field scoped term behavior.

The history here is that we have invested heavily in relevance on non-field scoped queries since they are the 99% case. We have not done the same investment for field-scoped queries or other advanced syntax like + (not supported), - (not supported), " (acts weird).

Member

How valuable/important would the combination of attributes/tags be?

How many scenarios would a single attribute/tag solve?


Inside the .nuspec file's `<package>` and then `<metadata>` elements, create a new `<attributes>` element which can contain zero or more `<attribute key="[string]" value="[string]" />` elements. Attribute keys must be unique: there cannot be more than one attribute with the same key.

In the NuGet search query (`q`) parameter, allow attributes to be specified as a query filter just like `owner`. That is, for example: `q=attr_[keyValue]:[attribute_value]` where the `attr_` prefix denotes matching a particular attribute key by its `[attribute_value]`. The search should look for exact, case-insensitive matches.
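
A minimal sketch of the two pieces described above, assuming the proposed `<attribute key="..." value="..." />` shape and the `attr_` query prefix. Neither exists in NuGet today; the `read_attributes` and `attribute_query` helpers and the `Example.Binding` package id are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}attribute'."""
    return tag.rsplit("}", 1)[-1]


def read_attributes(nuspec_xml):
    """Collect proposed <attribute key value> pairs, rejecting duplicate keys."""
    attributes = {}
    # For brevity this scans every element instead of walking
    # package/metadata/attributes explicitly.
    for element in ET.fromstring(nuspec_xml).iter():
        if local_name(element.tag) != "attribute":
            continue
        key, value = element.get("key"), element.get("value")
        if key is None or value is None:
            raise ValueError("attribute needs both key and value")
        if key in attributes:
            raise ValueError(f"duplicate attribute key: {key}")
        attributes[key] = value
    return attributes


def attribute_query(key, value):
    """Build the proposed q=attr_[key]:[value] search parameter (URL-encoded)."""
    return urlencode({"q": f"attr_{key}:{value}"})


NUSPEC = """<package>
  <metadata>
    <id>Example.Binding</id>
    <attributes>
      <attribute key="GroupId" value="com.company.product" />
      <attribute key="ArtifactId" value="ProductSdk" />
    </attributes>
  </metadata>
</package>"""
```

Here `read_attributes(NUSPEC)` yields the two key/value pairs, and `attribute_query("GroupId", "com.company.product")` produces the encoded `q` parameter. Whether key uniqueness should be case-sensitive is not stated in the proposal; this sketch compares keys exactly.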
Member

Stuff inside the q (search text) parameter is not spec'd at the protocol level. So different package source implementations will interpret this differently. AND/OR logic, ranking, quote behavior, supported field-scoped terms, etc. These are all package source specific. In general the q property is for search relevance and less about strict package filtering. It's certainly a grey area since it is unspec'd but I think a safer approach is to introduce a new query parameter for these attribute filters.


These attribute key/value pairs would be searchable within the NuGet search service, via the query property, similar to how search by `owner` or `packageid` is available currently.

### Technical explanation
Member

If someone wants to filter by multiple key-value pairs, is that possible? AND or OR behavior?

Author

Yeah I hadn't realized, there does not seem to be a way to "AND" query terms... This would definitely be helpful, though I guess narrowing search results to a potentially reasonable number (ie: GroupId, or having a concatenated MavenId field) and inspecting the details of the results might be reasonable.

Copy link


nuget.org search is less of a concern (for me).

https://www.nuget.org/packages?q=artifact

A huge help would be metadata plus an API for maintaining 450+ artifacts.


Being able to cross-link native dependency identities against existing nuget package references would help in creating experiences that automatically resolve and link in the correct set of build time dependencies across native and nuget assets.

Example: Maintaining a [list of popular known packages that map to maven artifacts](https://github.com/Redth/Xamarin.Binding.Helpers/blob/main/Xamarin.Binding.Helpers/NuGetResolvers/KnownMavenNugetResolver.cs#L12-L99) is not a scalable solution.
Member

It is possible for NuGet.org or a community member to build their own index based on NuGet.org packages using the V3 catalog. There is a guide on using the V3 catalog here: https://learn.microsoft.com/en-us/nuget/guides/api/query-for-all-published-packages

So the "productionized" version of this map would be to write a catalog reader that looks at each published package, checks whether it contains a cgmanifest.json, and then adds it to an index. Surface the index on an independent web service. This allows custom projects/views of NuGet.org without the need to block on official service or client support.
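
A rough sketch of the cursor-based catalog walk from the linked guide. The field names (`Catalog/3.0.0`, `commitTimeStamp`, `nuget:PackageDetails`, `nuget:id`, `nuget:version`) follow the documented V3 catalog resource; the function names are made up here, timestamps are assumed to be UTC with a trailing `Z`, and the step that downloads each package to look for a cgmanifest.json is left out.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

SERVICE_INDEX = "https://api.nuget.org/v3/index.json"


def parse_ts(value):
    """Parse a catalog timestamp such as 2015-02-01T06:22:45.8488496Z."""
    head, _, fraction = value.rstrip("Z").partition(".")
    micros = int((fraction + "000000")[:6])  # keep at most microsecond precision
    parsed = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def fetch_json(url):
    """Download and decode one JSON document (network access; not called below)."""
    with urlopen(url) as response:
        return json.load(response)


def catalog_index_url(service_index):
    """Find the catalog resource in an already-fetched service index."""
    for resource in service_index["resources"]:
        if resource["@type"] == "Catalog/3.0.0":
            return resource["@id"]
    raise LookupError("service index has no Catalog/3.0.0 resource")


def new_pages(catalog_index, cursor):
    """Catalog pages committed after the cursor, oldest first."""
    pages = [p for p in catalog_index["items"] if parse_ts(p["commitTimeStamp"]) > cursor]
    return sorted(pages, key=lambda p: parse_ts(p["commitTimeStamp"]))


def new_package_details(page, cursor):
    """(id, version, leaf URL) for packages published or edited after the cursor."""
    return [
        (leaf["nuget:id"], leaf["nuget:version"], leaf["@id"])
        for leaf in page["items"]
        if leaf["@type"] == "nuget:PackageDetails"
        and parse_ts(leaf["commitTimeStamp"]) > cursor
    ]
```

A reader would call `fetch_json(SERVICE_INDEX)`, follow `catalog_index_url(...)`, fetch each page returned by `new_pages(...)`, and feed the tuples from `new_package_details(...)` into whatever index it maintains, persisting the newest commit timestamp as the next cursor.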

Author

Oh interesting, I didn't realize this was available... We don't currently publish the manifest in the packages (I don't think anyway), but looking at this sort of approach might mean we could create our own conventions - the problem is that unless they are 'officially' supported, conventions can be hard to gain traction with.

Identity can consist of multiple attributes, for example a Maven package has:

- Group Id (eg: `com.company.product`)
- Artifact Id (eg: `ProductSdk`)
Member

Just wondering, is this spec suggesting the same model as Maven? I'm wondering if extensions or next steps for the Maven model ever came up. For example: what about values with non-string types allowing range queries ("os_version:10" and querying "os_version>=10") or allowing multiple attributes with the same key but different values.

Author

I'm not sure I fully understand the question here, could you elaborate? It might be nice to support additional types and query operators if that's the basic question? I guess if something is being considered, may as well consider all possible useful cases.

Member

Apologies for the confusion.

From your spec, it sounds like Maven already has a feature like what you're suggesting. Given that, can we learn from any growing pains their ecosystem had? For example, did they originally ship with the string KVP model with unique keys, then run into problems that necessitated a richer data model?

Said another way, if we can learn from another ecosystem's implementation in this area, we can maybe skip some intermediate steps or painful migrations. Or we can know that what we're proposing here is actually enough.

I'm not clued into the Maven ecosystem so I can't provide that perspective.

As a side note, if there are "prior art" design spec/docs about this feature in other ecosystems that would be cool to link in here.

Author

Oh sorry, I'm not really that familiar with how they built up their model.

Having the ability to query on the version as a version number would be useful in an "AND" query scenario, I had just considered the MVP implementation of this not needing to do that since the result set matching only the GroupId and ArtifactId would presumably be small enough to iterate over versions of the results (also if it wasn't clear, different NuGet package versions would potentially have different attr_MavenVersion:1.2.3 values). The one potential gotcha here is that Maven's versioning rules might be a bit different than NuGet (though I think semver is adopted there too).

One other consideration though is that the Maven version might be used to assert whether it satisfies a Maven version range - again, similar to NuGet's version range semantics, but not necessarily identical in rules/implementation. For the binding helpers project/experiment I linked in the proposal, this is part of the process, so in this case the matching GroupId/ArtifactId results would still need to be iterated over, asserting each version's Maven range compatibility. That's a long way of saying that there may be too many operators to consider for querying by version for a simple >= to be worth the effort. This is just one example though, and maybe that would be valuable for other scenarios.

Copy link


Sorry for coming late to the show.

I maintain:

And when I get some air I work on "bindings improvements" which should improve productivity and more.

So, I will express my opinion only for Android (.NET for Android - formerly Xamarin.Android), though
IMO this should be extended to .NET for iOS and maybe other platforms.

"Bindings improvements" include

I am already using some of the utilities in our repos for

Until recently, we added the fully qualified Maven metadata for the artifact in 2 forms:

  • artifact=androidx.compose.material:material-ripple
  • artifact_versioned=androidx.compose.material:material-ripple:1.0.5

to these NuGet fields:

  • Description
  • Summary (sometimes)
  • Tags

Visible here:

https://api.nuget.org/v3/registration5-gz-semver2/xamarin.androidx.compose.material.ripple/index.json

This worked well, and I am able to use the server-side NuGet protocol (HttpClient + JSON/XML parsing) to make maintenance of both repos more productive.

In recent updates, the .NET for Android team decided to keep this information only in the Tags node.
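
For illustration, a small sketch of how a tool might recover Maven coordinates from tags in the two forms shown above. It assumes the tags have already been read from the registration JSON as a list of strings; the `maven_coordinates` helper and its result keys are hypothetical.

```python
def maven_coordinates(tags):
    """Extract Maven coordinates from 'artifact=' / 'artifact_versioned=' tags."""
    result = {}
    for tag in tags:
        name, sep, value = tag.partition("=")
        if not sep:
            continue  # an ordinary tag without key=value shape
        parts = value.split(":")
        if name == "artifact" and len(parts) == 2:
            result["group_id"], result["artifact_id"] = parts
        elif name == "artifact_versioned" and len(parts) == 3:
            result["group_id"], result["artifact_id"], result["version"] = parts
    return result


tags = [
    "androidx",
    "artifact=androidx.compose.material:material-ripple",
    "artifact_versioned=androidx.compose.material:material-ripple:1.0.5",
]
coordinates = maven_coordinates(tags)
```

Relying on a naming convention inside free-form tags is what dedicated attributes would replace: the parsing above silently ignores anything that does not follow the convention.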

  • identifying binaries used (either distributed or downloaded by the package during the build)

  • dependency identification

    • type
      • maven
      • native
    • identity
    • version

With this, there would be a 1:1 mapping from a (versioned) NuGet package to a (versioned) Maven/native package.
This would help maintainers with:

  1. keeping track of published (bound Maven or native) libraries,

    Getting the data for the latest NuGet package and mapping it to the fully qualified, versioned Maven ID
    would make it easier to discover what needs to be updated.

  2. updates and

    see 1.

  3. troubleshooting

    Primarily checking dependency graphs

    • for duplicate transitive dependencies (possibly with different versions)
  4. security checks (component governance)

  5. curation (curated package publishing)

    With the bar for bindings lowered via "bindings improvements", a flood of bindings
    packages is to be expected.
    The NuGet publishing process could add a step to verify whether a given Maven/native artifact is already
    published in some other NuGet package.
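
As a sketch of the curation check in item 5, assuming each package's Maven coordinate is already known (for example from attributes or tags). The `duplicate_artifacts` helper and the `Contoso.*` package ids are made up for illustration.

```python
from collections import defaultdict


def duplicate_artifacts(packages):
    """Map each Maven coordinate to the NuGet ids that bind it, if more than one.

    packages is an iterable of (nuget_id, maven_coordinate) pairs.
    """
    owners = defaultdict(set)
    for nuget_id, coordinate in packages:
        owners[coordinate].add(nuget_id)
    return {c: sorted(ids) for c, ids in owners.items() if len(ids) > 1}


published = [
    ("Contoso.Bindings.Activity", "androidx.activity:activity"),
    ("Contoso.Bindings.Ripple", "androidx.compose.material:material-ripple"),
    ("Fabrikam.Activity", "androidx.activity:activity"),
]
duplicates = duplicate_artifacts(published)
```

In this example only `androidx.activity:activity` is reported, because two different packages bind it.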

  • Maven

    • project

      <ItemGroup>
          <PackageAttribute Include="maven.GroupId" Value="androidx.activity" />
          <PackageAttribute Include="maven.ArtifactId" Value="activity" />
          <PackageAttribute Include="maven.VersionId" Value="1.6.0" />
      </ItemGroup>

      NOTE: this could be derived from current (and future) .NET for Android (Xamarin.Android)
      BuildActions for binding artifacts (Embedd)

    • nuspec

      <!--
          ... snip
      -->
      <package>
          <metadata>
              <attributes>
                  <attribute key="maven.GroupId">androidx.activity</attribute>
                  <attribute key="maven.ArtifactId">activity</attribute>
                  <attribute key="maven.Version">1.6.0</attribute>
              </attributes>
          </metadata>
      <!--
          ... snip
      -->
      </package>
  • Native

    • project (packaging)

      <ItemGroup>
          <AndroidNativeLibrary Include="path/to/libfoo.so">
              <Abi>armeabi</Abi>
          </AndroidNativeLibrary>
      </ItemGroup>
    • nuspec

      <package>
          <metadata>
              <attributes>
                  <attribute key="native.LibraryName">libfoo</attribute>
                  <attribute key="native.Version">1.6.0</attribute>
              </attributes>
          </metadata>
      </package>


The NuGet search API already allows the specification of various package [metadata fields to search by in the query parameter](https://learn.microsoft.com/en-us/nuget/consume-packages/finding-and-choosing-packages#search-syntax). This proposal is simply an extension of that existing query syntax to include additional, potentially arbitrary attributes both in the .nuspec format as well as the search query.

### Functional explanation
Member

Do we have any considerations beyond discoverability?
Maybe something specific to managing some of these related packages within your project, or is that not a big concern?

Copy link


> Do we have any considerations beyond discoverability?

A 1:1 mapping of native (Maven or native lib) to NuGet would make

  • security checks easier
  • curation (optional) easier

> Maybe something specific to managing some of these related packages within your project, or is that not a big concern?

We do that, but a formal, standardized, central method would help both us and (IMO) the NuGet team.

Copy link


Now, thinking a bit deeper, 1:1 is an oversimplification. In a cross-platform scenario there will be an artifact per platform (Android, iOS) and sometimes multiple artifacts per platform.

ghost commented Nov 24, 2022

This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.

@ghost ghost closed this Dec 9, 2022
@JonDouglas JonDouglas reopened this Jan 24, 2023
@ghost ghost removed the Status:No recent activity No recent activity. label Jan 24, 2023
@ghost ghost added the Status:No recent activity No recent activity. label Feb 23, 2023
ghost commented Feb 23, 2023

This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment, unless it has a "Status:Do not auto close" label. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.

@nkolev92 nkolev92 added Status:Do not auto close Do not auto close for PRs needs long review process and removed Status:No recent activity No recent activity. labels Feb 23, 2023
@ghost ghost added the Status:No recent activity No recent activity. label Mar 26, 2023
ghost commented Mar 26, 2023

This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment, unless it has a "Status:Do not auto close" label. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.

@donnie-msft
Contributor

Hi, we have removed our "proposed" folder, so please move this proposal to the "accepted" folder.
See the update here, and let me know if you have questions/concerns: https://github.com/NuGet/Home/blob/b18b5cc1507df04ea9785f8ba613b1ceb2ad93ea/meta/README.md#what-happens-to-a-proposal
Thanks!

@ghost ghost removed the Status:No recent activity No recent activity. label Nov 21, 2023
@ghost ghost added the Status:No recent activity No recent activity. label Dec 21, 2023
ghost commented Dec 21, 2023

This PR has been automatically marked as stale because it has no activity for 30 days. It will be closed if no further activity occurs within another 15 days of this comment, unless it has a "Status:Do not auto close" label. If it is closed, you may reopen it anytime when you're ready again, as long as you don't delete the branch.
