Reduce the time taken to only detect a license #1277

Open
alestiago opened this issue Nov 3, 2023 · 14 comments
Labels
type-enhancement A request for a change that isn't a bug

Comments

@alestiago

alestiago commented Nov 3, 2023

Description

As a developer, I would like to use Pana's license detector algorithm to detect licenses in text in an efficient and performant manner.

One can detect a license by using the PackageAnalyzer:

import 'dart:io';
import 'package:pana/pana.dart';

void main() async {
  final analyzer = await PackageAnalyzer.create();
  final packageDirectory = Directory.current;

  final timer = Stopwatch()..start();
  final report = await analyzer.inspectDir(packageDirectory.path);
  print('Elapsed time: ${timer.elapsed}');

  if (report.licenses != null) {
    for (final license in report.licenses!) {
      print(license.spdxIdentifier);
    }
  }
}

Detecting the license for a single package (using inspectDir) took 18 seconds on average on an M1 MacBook Pro, which is considerable when only the license is needed. As a developer, I would like an API that detects licenses from a license file using Pana's license detector algorithm in less time.

Proposals

  • Create a new published package license_detector that is then consumed by Pana.
  • Expose an API that only detects licenses in license files, without performing additional checks (if one already exists that I'm not aware of, let me know!)

Additional context

@alestiago alestiago reopened this Nov 3, 2023
@alestiago alestiago changed the title from "Expose the license detector as part of the public API" to "Improve license detector speed" Nov 3, 2023
@alestiago alestiago changed the title from "Improve license detector speed" to "Reduce the time taken to only detect a license" Nov 3, 2023
@sigurdm
Contributor

sigurdm commented Nov 3, 2023

I don't think we have the bandwidth to provide a generic license detection interface.

The license detection is not exposed from pana, but you could theoretically just import it from 'lib/src/license_detection/license_detector.dart' anyway. That is of course at your own risk, and we don't promise to keep the interface in later versions.

Depending on your use-case it would probably be better to call out to something like https://github.com/src-d/go-license-detector

@sigurdm
Contributor

sigurdm commented Nov 3, 2023

FWIW I believe GitHub uses https://github.com/licensee/licensee

@alestiago
Author

alestiago commented Nov 7, 2023

> I don't think we have the bandwidth to provide a generic license detection interface.

That's understandable. Regardless, thank you for the work you're doing!

Personally, I would like to have the license detection aligned with pub.dev's reported license since my use-case is highly targeted to Flutter and Dart packages. Out of curiosity, do you know why PANA uses its own license detector instead of the ones out there (like the few you mentioned)?

@isoos
Collaborator

isoos commented Nov 7, 2023

> Out of curiosity, do you know why PANA uses its own license detector instead of the ones out there (like the few you mentioned)?

pana is being used in various environments (local devs, CIs, pub.dev's workers) and we wanted to keep the external dependencies (e.g. calling out to external processes) to a minimum.

@isoos
Collaborator

isoos commented Nov 7, 2023

> Personally, I would like to have the license detection aligned with pub.dev's reported license since my use-case is highly targeted to Flutter and Dart packages.

I'm curious: what's the use case? Can we support it in a different way?

@sigurdm
Contributor

sigurdm commented Nov 7, 2023

Yeah - I don't really remember. @jonasfj might be able to give more context.

I think we considered using the GitHub one, but really didn't want to have Ruby set up in our deployment. Now with sandboxed workers for the analysis that might not be as bad...

@sigurdm
Contributor

sigurdm commented Nov 9, 2023

One argument in favour of having our own is that it makes it a lot easier to run on all platforms and deployments.

@sigurdm
Contributor

sigurdm commented Nov 9, 2023

@alestiago are you happy with importing from lib/src? Can we close this issue?

@sigurdm sigurdm added the "needs-info" label (Additional information needed from the issue author) Nov 9, 2023
@alestiago
Author

@sigurdm I didn't intend to do so, but I will consider it. Please be mindful of the changes over there (if possible 😉).

I'm still waiting on @jonasfj to provide some more context on "why PANA uses its own license detector instead of the ones out there".


> I'm curious: what's the use case? Can we support it in a different way?

The use case is a CLI command that scans the dependencies of a Dart or Flutter project and compares them against an allow or block list. Hence, my original plan was to keep the license detection aligned with Pana's.
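
To make the allow/block comparison concrete, here is a minimal sketch in plain Dart. It assumes license detection has already produced one SPDX identifier (or nothing) per dependency. `LicensePolicy`, `Verdict`, and `checkDependencies` are hypothetical names used only for illustration and are not part of pana.

```dart
/// Outcome of checking one dependency's license against the policy.
enum Verdict { allowed, blocked, needsReview }

/// Hypothetical policy holding allowed and blocked SPDX identifiers.
class LicensePolicy {
  const LicensePolicy({required this.allowed, required this.blocked});

  final Set<String> allowed;
  final Set<String> blocked;

  /// A block-list hit wins over an allow-list hit. Anything unlisted,
  /// including an undetected license (null), needs a manual review.
  Verdict check(String? spdxIdentifier) {
    if (spdxIdentifier == null) return Verdict.needsReview;
    if (blocked.contains(spdxIdentifier)) return Verdict.blocked;
    if (allowed.contains(spdxIdentifier)) return Verdict.allowed;
    return Verdict.needsReview;
  }
}

/// Maps each dependency name to a verdict. [detected] maps a package name
/// to the SPDX identifier found for it, or null when nothing was detected.
Map<String, Verdict> checkDependencies(
  Map<String, String?> detected,
  LicensePolicy policy,
) {
  return {
    for (final entry in detected.entries) entry.key: policy.check(entry.value),
  };
}

void main() {
  const policy = LicensePolicy(
    allowed: {'MIT', 'BSD-3-Clause', 'Apache-2.0'},
    blocked: {'GPL-3.0-only'},
  );

  // Made-up input; a real tool would fill this from the detector's output.
  final verdicts = checkDependencies({
    'package_a': 'MIT',
    'package_b': 'GPL-3.0-only',
    'package_c': null,
  }, policy);

  verdicts.forEach((name, verdict) => print('$name: ${verdict.name}'));
}
```

Treating "not on either list" as a review case rather than a pass or fail keeps the tool from making a call it has no information for.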

@isoos
Collaborator

isoos commented Nov 9, 2023

> The use case is a CLI command that scans the dependencies of a Dart or Flutter project and compares them against an allow or block list. Hence, my original plan was to keep the license detection aligned with Pana's.

I think we have talked about implementing a similar feature in pana. As pana already deals with the dependencies and licenses, it would make sense to expose this in the report (whether all the dependencies are under the same license or not) or in the result JSON (listing all the dependency licenses).

Would you be interested in contributing this to pana? (Otherwise the feature is not the highest priority right now, and won't be worked on in the next few months).

@alestiago
Author

> Would you be interested in contributing this to pana? (Otherwise the feature is not the highest priority right now, and won't be worked on in the next few months).

I could be interested. Is there a design specification you made when talking about the "similar feature"? Is there another channel of communication we can move to (like Discord) to discuss and plan what this feature should look like?

@isoos
Collaborator

isoos commented Nov 13, 2023

@alestiago: there was no formal specification. The basic idea was that we need to list the licenses of all the dependent packages. If everything is under the same license (or known to be compatible), we can give a green icon; if there is a known incompatibility, we could give a red icon; and for everything in between, a yellow warning sign. This could be part of the report too, and may even be part of the score.

Note: we have considered this a bit of a gray area where we don't want to pretend that (a) we understand all of the license text, esp. if it is not a 100% match (which is most of the time), and (b) we can form an official opinion about license compatibility. As a start, we can expose the list of licenses and indicate if it needs a manual review.
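
As a rough illustration of the green/yellow/red idea, the following Dart sketch (Dart 3 or later, because it uses records) reduces a list of dependency licenses to one signal. `Signal`, `summarize`, and the compatibility tables are hypothetical; the table entries in `main` are placeholders and say nothing about actual license compatibility.

```dart
/// Overall signal for a package's dependency licenses.
enum Signal { green, yellow, red }

/// Reduces the dependency licenses to one signal.
///
/// Each pair is (package license, dependency license). Red: at least one
/// pair is in [knownIncompatible]. Green: every dependency has the same
/// license as the package or is in [knownCompatible]. Yellow: everything
/// else, including dependencies whose license is unknown (null).
Signal summarize(
  String packageLicense,
  Iterable<String?> dependencyLicenses, {
  Set<(String, String)> knownCompatible = const {},
  Set<(String, String)> knownIncompatible = const {},
}) {
  var needsReview = false;
  for (final dependencyLicense in dependencyLicenses) {
    if (dependencyLicense == null) {
      needsReview = true;
      continue;
    }
    final pair = (packageLicense, dependencyLicense);
    if (knownIncompatible.contains(pair)) return Signal.red;
    if (dependencyLicense != packageLicense &&
        !knownCompatible.contains(pair)) {
      needsReview = true;
    }
  }
  return needsReview ? Signal.yellow : Signal.green;
}

void main() {
  // Placeholder tables for illustration only.
  const compatible = {('License-A', 'License-B')};
  const incompatible = {('License-A', 'License-C')};

  final signal = summarize(
    'License-A',
    ['License-A', 'License-B', null],
    knownCompatible: compatible,
    knownIncompatible: incompatible,
  );
  print(signal.name);
}
```

Defaulting to yellow whenever a pair is not in either table matches the caution above: the tool only reports what it has been told explicitly and flags the rest for manual review.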

@alestiago
Author

@isoos thank you for the details, they sound great to me! I will try making a separate issue, between this and the following week, with a formalized proposal based on what you've shared here.

@sigurdm sigurdm added the "type-enhancement" label (A request for a change that isn't a bug) and removed the "needs-info" label (Additional information needed from the issue author) Nov 23, 2023
@alestiago
Author

alestiago commented Dec 14, 2023

I've not forgotten about writing the proposal; I just haven't prioritised this work so far. If I'm blocking your progress with the proposal, feel free to take it over.

Regarding the original issue, and following up on this comment: "I don't think we have the bandwidth to provide a generic license detection interface". If a community member abstracted the "license detection" out of package:pana into package:pana_license_detector, would you be willing to accept the contribution into this repository?
