Unused may report stale results #671

Closed
dominikh opened this issue Jan 16, 2020 · 3 comments

@dominikh
Owner

Our analysis runner doesn't load packages from source if it doesn't have to, e.g. because they haven't changed. In those cases, it uses export data and cached problems/facts. However, this breaks unused in whole-program mode, because changing one package may affect the list of unused identifiers in another package.

  1. have a package A define an exported identifier, have package B use the identifier
  2. run staticcheck -unused.whole-program on A and B
  3. modify B to remove the use of A's identifier
  4. rerun staticcheck -unused.whole-program on A and B
  5. observe that we do not flag the identifier in A.

A similar problem arises when we cache A's identifier as unused and then start using it in another package: we will not recheck package A.

Unfortunately, this problem even exists when not running in whole-program mode, because of test variants.

  1. define an unexported identifier in foo.go
  2. run staticcheck
  3. add a foo_test.go and use the previously defined identifier
  4. rerun staticcheck
  5. observe that we still flag the identifier as unused

Both of these problems are caused by the fact that we cache the found problems per package. It seems that in whole-program mode, we must re-analyse all reverse dependencies. In normal mode, it should be enough to re-analyse a package when its tests change.

@dominikh dominikh added the bug label Jan 16, 2020
@dominikh
Owner Author

dominikh commented Jan 23, 2020

For analyses that aren't unused, we traverse the dependency graph in a DFS, recording and using facts. If package B depends on package A, then we have the graph B -> A, and A is processed before B. When a dependency's code changes, all dependents have to be analyzed again, because facts go stale.

unused could be expressed as the same problem but in the reverse direction. Since we're interested in dependents using identifiers, we have the dependent graph A -> B, where we process B before A, and where B contributes "identifier is being used" information towards A. When a dependent's code changes, all dependencies have to be analyzed again. For unused's non-global mode, we could optimize this to only consider test dependents, because exported identifiers are always used, and only tests can contribute to the use of unexported identifiers.
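Below is a minimal sketch of the reverse pass. It assumes a hypothetical summary for each package (its objects and what each one references) and a precomputed order with dependencies first, such as the output of a post-order DFS. The example shows why dependents must go first: A.F is only known to be used once B has been processed, and only then can X.G be marked. Names and data layout are illustrative.

```go
package main

import "fmt"

// pkgInfo is a hypothetical per-package summary. refs has one key per
// package-level object declared in the package (written "pkg.Name"),
// mapping to the objects, local or in other packages, that its
// declaration references. roots are objects used unconditionally,
// such as main.main.
type pkgInfo struct {
	roots []string
	refs  map[string][]string
}

// resolveUsed walks packages dependents-first, i.e. order reversed,
// where order lists dependencies before dependents. When a package is
// reached, every package that could reference its objects has already
// contributed its uses, so the package can settle which of its own
// objects are used and pass uses on to its dependencies.
func resolveUsed(order []string, pkgs map[string]pkgInfo) map[string]bool {
	used := map[string]bool{}
	for i := len(order) - 1; i >= 0; i-- {
		p := pkgs[order[i]]
		for _, r := range p.roots {
			used[r] = true
		}
		// Seed the worklist with local objects that roots or dependents use.
		var work []string
		for obj := range p.refs {
			if used[obj] {
				work = append(work, obj)
			}
		}
		done := map[string]bool{}
		for len(work) > 0 {
			obj := work[len(work)-1]
			work = work[:len(work)-1]
			if done[obj] {
				continue
			}
			done[obj] = true
			for _, target := range p.refs[obj] {
				used[target] = true // contributes towards target's package
				if _, local := p.refs[target]; local {
					work = append(work, target)
				}
			}
		}
	}
	return used
}

func main() {
	pkgs := map[string]pkgInfo{
		"B": {roots: []string{"B.main"}, refs: map[string][]string{"B.main": {"A.F"}}},
		"A": {refs: map[string][]string{"A.F": {"X.G"}, "A.H": nil}},
		"X": {refs: map[string][]string{"X.G": nil, "X.K": nil}},
	}
	used := resolveUsed([]string{"X", "A", "B"}, pkgs)
	fmt.Println(used["X.G"], used["A.H"], used["X.K"]) // true false false
}
```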

One problem with the reverse order, of course, is that we either have to keep all packages in memory until we've processed them in both directions, or we consider the two runners entirely separate and load packages twice, once per direction. This is less of a problem for the non-global mode, because tests and the package under test get loaded together, anyway.

Unless I'm missing something obvious, this should allow us to fit neatly into the go/analysis framework, with the only oddity being that we need a second runner operating in reverse order. Other than that, this should be a normal, modular analysis.

@dominikh
Owner Author

One nuisance is that we cannot express "package A uses object B.O" as an object fact. Object facts need to concern objects from the package under analysis, and the same object fact may not exist twice – that is, two packages can't both export an object fact saying "B.O is used".

However, we should be able to use package facts instead, i.e. a single fact on package A listing all the objects that it uses. We'd just have to take care of serializing object identifiers ourselves instead of relying on the analysis runner.
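The following sketches what such a package fact could look like. It assumes a hypothetical `objectID` made of an import path plus a name, and the type names are illustrative. encoding/gob stands in for whatever serialization the runner actually uses. The only go/analysis detail relied on is that fact types are recognised by an `AFact()` method.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// objectID names a package-level object by import path and name, so it
// can be serialized without the runner's help. Methods and fields would
// need a longer path (receiver type plus member name); omitted here.
type objectID struct {
	PkgPath string
	Name    string
}

// usedObjectsFact is a hypothetical package fact: a single fact on the
// analysed package listing every object, in any package, that it uses.
type usedObjectsFact struct {
	Used []objectID
}

// AFact is the marker method that go/analysis uses to recognise fact
// types; this sketch defines it without importing that package.
func (*usedObjectsFact) AFact() {}

// encode serializes the fact with encoding/gob.
func (f *usedObjectsFact) encode() ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(f); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeUsedObjectsFact reverses encode.
func decodeUsedObjectsFact(data []byte) (*usedObjectsFact, error) {
	f := new(usedObjectsFact)
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(f); err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	// Package A's fact: it uses B.O.
	fact := &usedObjectsFact{Used: []objectID{{PkgPath: "example.com/b", Name: "O"}}}
	data, err := fact.encode()
	if err != nil {
		panic(err)
	}
	back, err := decodeUsedObjectsFact(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(back.Used) // [{example.com/b O}]
}
```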

@dominikh
Owner Author

First actual problem: in an analysis, the only way to obtain a package's dependencies is via types.Package, which means that we have no way of passing in a package's dependents. But we need access to our dependents in order to import their package facts.

@dominikh dominikh added this to the v2020.2 milestone Feb 6, 2020
@dominikh dominikh added the started Issues we've started working on label Apr 29, 2020
dominikh added a commit that referenced this issue May 4, 2020
This commit completely replaces the analysis runner of Staticcheck. It
fixes several performance shortcomings, as well as subtle bugs in U1000.

To explain the behaviors of the old and new runners, assume that we're
processing a package graph that looks like this:

	  A
	 ↙ ↘
	B   C
	↓
	⋮
	↓
	X

Package A is the package we wish to check. Packages B and C are direct
dependencies of A, and X is an indirect dependency of B, with
potentially many packages between B and X.

In the old runner, we would process the graph in a single DFS pass. We
would start processing A, see that it needed B and C, start loading B
and C, and so forth. This approach would unnecessarily increase memory
usage. Package C would be held in memory, ready to be used by A, while
the long chain from X to B was being processed. Furthermore, A may not
need most of C's data in the first place, if A was already fully
cached. Processing the graph top to bottom is also harder to
parallelize efficiently.

The new runner, in contrast, first materializes the graph (the
planning phase) and then executes it from the bottom up (the execution
phase). Whenever a leaf node finishes execution, its data is cached on
disk and then unloaded from memory. The only data kept in memory is
the package's hash, so that its dependents can compute their own
hashes.
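Hash chaining can be sketched as follows. This assumes a simplified key made only of a digest of the package's own inputs plus its dependencies' hashes. A real key would likely also cover things such as the Go version, build flags and analyzer configuration. `packageHash` is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// packageHash derives a cache key from a package's own inputs and the
// hashes of its direct dependencies. Each dependency hash already
// covers that dependency's own dependencies, so a change anywhere below
// a package changes its key, yet only one hash per finished package has
// to stay in memory.
func packageHash(ownInputs []byte, depHashes []string) string {
	deps := append([]string(nil), depHashes...)
	sort.Strings(deps) // the key should not depend on import order

	h := sha256.New()
	own := sha256.Sum256(ownInputs) // fixed length, so fields cannot run together
	h.Write(own[:])
	for _, d := range deps {
		h.Write([]byte(d)) // each entry is a fixed-length hex digest
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	x := packageHash([]byte("package x"), nil)
	b := packageHash([]byte("package b"), []string{x})
	xEdited := packageHash([]byte("package x // edited"), nil)
	bAfter := packageHash([]byte("package b"), []string{xEdited})
	fmt.Println(b != bAfter) // true: editing X invalidates B's key
}
```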

Next, all dependents that are ready to run (i.e. whose dependencies
have all been processed) are executed. If a dependent needs
information from its dependencies, it loads it from disk again. This
approach drastically reduces peak memory usage, at the cost of a
slight increase in CPU usage because of repeated loading of data.
However, knowing the full graph allows for more efficient
parallelization, offsetting the increased CPU cost. It also favours
the common case, where most packages have up-to-date cached data.

Changes to unused

The 'unused' check (U1000 and U1001) has always been the odd one out.
It is the only check that propagates information backwards in the
import graph – that is, the sum of importers determines which objects
in a package are considered used. Due to tests and test variants, this
applies even when not operating in whole-program mode.

The way we implemented this was not only expensive – whole-program
mode in particular needed to retain type information for all packages
– it was also subtly wrong. Because we cached all diagnostics of a
package, we cached stale 'unused' diagnostics when an importer
changed.

As part of writing the new analysis runner, we make several changes to
'unused' that make sure it behaves well and doesn't negate the
performance improvements of the new runner.

The most obvious change is the removal of whole-program mode. The
combination of correct caching and efficient cache usage means that we
no longer have access to the information required to compute a
whole-program solution. It never worked quite right anyway: it was
unaware of reflection and had to grossly over-estimate the set of
used methods due to interfaces.

The normal mode of 'unused' now considers all exported package-level
identifiers as used, even if they are declared within tests or package
main. Treating exported functions in package main as unused has been
wrong ever since the addition of the 'plugin' build mode. Doing so in
tests may have been mostly correct (ignoring reflection), but
continuing to do so would complicate the implementation for little
gain.

In the new implementation, the per-package information that is cached
for U1000 consists of two lists: the list of used objects and the list
of unused objects. At the end of analysis, the lists of all packages
get merged: if any package uses an object, it is considered used.
Otherwise, if any package lists it as unused, it is considered
unused.

This list-based approach is only correct if the usedness of an
exported object in one package doesn't depend on another package.

Consider the following package layout:

	foo.go:
	package pkg

	func unexported() {}

	export_test.go:
	package pkg

	func Exported() { unexported() }

	external_test.go:
	package pkg_test

	import "pkg"

	var _ = pkg.Exported

This layout has three packages: pkg, pkg [test] and pkg_test. Under
unused's old logic, pkg_test would be responsible for marking pkg
[test]'s Exported as used. This would transitively mark 'unexported'
as used, too. However, with our list-based approach, we would get the
following lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used:
  unused: unexported, Exported

pkg_test:
  used: Exported
  unused:

Merging these lists, we would never know that 'unexported' was used.
Instead of using these lists, we would need to cache and resolve full
graphs.

This problem does not exist for unexported objects. If a package is
able to use an unexported object, that object must be declared in the
same package, which means the package can resolve its own use graph
internally before generating the lists.

For completeness, these are the correct lists:

pkg:
  used:
  unused: unexported

pkg [test]:
  used: Exported, unexported
  unused:

pkg_test:
  used: Exported
  unused:

(The inclusion of Exported in pkg_test is superfluous and may be
optimized away at some point.)

Closes gh-233
Closes gh-284
Closes gh-476
Closes gh-538
Closes gh-576
Closes gh-671
Closes gh-690
Closes gh-691
dominikh added a commit that referenced this issue May 7, 2020
dominikh added a commit that referenced this issue May 7, 2020
As part of porting unused's tests, we discovered a flaky false
negative, caused by an incorrect implementation of our version of
types.Identical. We were still using types.Identical under the hood,
which wouldn't correctly account for nested types. This has been fixed.

dominikh added a commit that referenced this issue May 8, 2020
More changes to unused

Several planned improvements to 'unused' also made it easier to
integrate with the new runner, which is why these changes are part of
this commit.

TODO

Closes gh-233
Closes gh-284
Closes gh-476
Closes gh-538
Closes gh-576
Closes gh-671
Closes gh-675
Closes gh-690
Closes gh-691
dominikh added a commit that referenced this issue May 8, 2020
@dominikh dominikh removed the started Issues we've started working on label May 10, 2020