Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nix: Add a tool for checking Haskell imports and exports #1768

Merged
merged 19 commits into from
Apr 6, 2021

Conversation

monacoremo
Copy link
Member

This uses the actual GHC parser, so it should be fairly robust. Works on both our Haskell modules and the ones generated with -ddump-minmal-imports.

To use:

nix-build -A haskellimports
./result src/PostgREST/App.hs

Example output so far:

source,moduleName,impQualified,impAs,impHiding,symbolName
src/PostgREST/App.hs,Control.Monad.Except,false,,false,liftEither
src/PostgREST/App.hs,Data.Either.Combinators,false,,false,mapLeft
src/PostgREST/App.hs,Data.IORef,false,,false,IORef
src/PostgREST/App.hs,Data.IORef,false,,false,readIORef
src/PostgREST/App.hs,Data.List,false,,false,union
src/PostgREST/App.hs,Data.Time.Clock,false,,false,UTCTime
src/PostgREST/App.hs,Data.ByteString.Char8,true,BS8,,
src/PostgREST/App.hs,Data.ByteString.Lazy,true,LBS,,
src/PostgREST/App.hs,Data.Set,true,Set,,
src/PostgREST/App.hs,Hasql.DynamicStatements.Snippet,true,SQL,,
src/PostgREST/App.hs,Hasql.Pool,true,SQL,,
src/PostgREST/App.hs,Hasql.Transaction,true,SQL,,
src/PostgREST/App.hs,Hasql.Transaction.Sessions,true,SQL,,
src/PostgREST/App.hs,Network.HTTP.Types.Header,true,HTTP,,
src/PostgREST/App.hs,Network.HTTP.Types.Status,true,HTTP,,
src/PostgREST/App.hs,Network.HTTP.Types.URI,true,HTTP,,
src/PostgREST/App.hs,Network.Wai,true,Wai,,
src/PostgREST/App.hs,PostgREST.ApiRequest,true,ApiRequest,,
src/PostgREST/App.hs,PostgREST.Auth,true,Auth,,
src/PostgREST/App.hs,PostgREST.DbRequestBuilder,true,ReqBuilder,,
src/PostgREST/App.hs,PostgREST.DbStructure,true,DbStructure,,
src/PostgREST/App.hs,PostgREST.Error,true,Error,,
src/PostgREST/App.hs,PostgREST.Middleware,true,Middleware,,
src/PostgREST/App.hs,PostgREST.OpenAPI,true,OpenAPI,,
src/PostgREST/App.hs,PostgREST.QueryBuilder,true,QueryBuilder,,
src/PostgREST/App.hs,PostgREST.RangeQuery,true,RangeQuery,,
src/PostgREST/App.hs,PostgREST.Statements,true,Statements,,
src/PostgREST/App.hs,PostgREST.ApiRequest,false,,false,Action
src/PostgREST/App.hs,PostgREST.ApiRequest,false,,false,ApiRequest
src/PostgREST/App.hs,PostgREST.ApiRequest,false,,false,InvokeMethod
src/PostgREST/App.hs,PostgREST.ApiRequest,false,,false,Target
src/PostgREST/App.hs,PostgREST.Config,false,,false,AppConfig
src/PostgREST/App.hs,PostgREST.Error,false,,false,Error
src/PostgREST/App.hs,PostgREST.Types,false,,,
src/PostgREST/App.hs,Protolude,false,,true,Handler
src/PostgREST/App.hs,Protolude,false,,true,toS
src/PostgREST/App.hs,Protolude.Conv,false,,false,toS

Example usage in nix/devtools.nix, WIP.

@wolfgangwalther : This might be useful for many automated checks and analysis on our imports. Any comments or ideas?

@wolfgangwalther
Copy link
Member

That's really cool!

Some ideas we can use this for:

  • Automated check that no modules have different aliases between files (consistent aliases across the projects would increase readability)
  • Automated check for certain rules regarding unqualified imports, aliases etc. - would have to make up those rules first, though.
  • Manually use with -ddump-minmal-imports to find a set of commonly used imports across different files, that we could pack into our own prelude/protolude.
  • Manually use with -ddump-minmal-imports to find functions in the PostgREST namespace, that are only imported from 1 file. Could be an indication that those functions could better be moved to the importing file.
  • Maybe we can develop some kind of simple metric to "count" the imports within our namespace - to easily check the "complexity" of imports between different files. The goal would be to reduce this complexity of course, so that our dependency graphs are straight-forward to read, too.

WDYT?

@monacoremo
Copy link
Member Author

Those are great use cases! Thinking a bit about the best way to implement them, it will probably be easier to express them directly in Haskell instead of fiddling with the dump using nix/bash/python etc. We could even build a small tool around this with optparse-applicative:

hsimps dump --src ./src --imports /tmp/imports -s main -i imports2 Mod.hs Mod2.hs # dump imports as CSV (or Json with --json?)

hsimps graph-symbols … # print a dot file showing which symbols are imported from which modules

hsimps graph-modules … # print a dot file showing module dependencies (would replace the tooling I added earlier)

hsimps check-aliases … # check that aliases are consistent

hsimps check-wildcards … # check that imports are either qualified or explicit

…

Where

  • --src/-s includes a source directory
  • --imports/-i includes an imports directory created with -ddump-minimal-imports
  • Individual files (zero or more) can also be included, module name is parsed from file if available

This would help getting the module names right based on the filenames (as the module declaration is not included in the ddump, for example)

The nix/haskellimports/Main.hs would grow a bit, but it should be low maintenance and simplify everything else a lot. Does that make sense?

@wolfgangwalther
Copy link
Member

Those are great use cases! Thinking a bit about the best way to implement them, it will probably be easier to express them directly in Haskell instead of fiddling with the dump using nix/bash/python etc. We could even build a small tool around this with optparse-applicative:
[...]
The nix/haskellimports/Main.hs would grow a bit, but it should be low maintenance and simplify everything else a lot. Does that make sense?

Yes. Sounds better than dumping, reloading the file in python etc.

hsimps dump --src ./src --imports /tmp/imports -s main -i imports2 Mod.hs Mod2.hs # dump imports as CSV (or Json with --json?)

Personally, I like CSV more here. I can load this directly into Calc/Excel and play around.

--imports/-i includes an imports directory created with -ddump-minimal-imports

Not sure if that's possible with GHC as a lib - but can we remove the extra step of doing a dump to a tmp directory? Maybe just have a flag --use-minimal-imports or something like that to make the script operate on the output of that. If the flag is not given it operates on the "raw" imports in the source files.

@monacoremo
Copy link
Member Author

Still work in progress, but the basic idea works:

[nix-shell]$ hsie check-aliases --src src
The following imports have inconsistent aliases:

Module 'Data.ByteString' has the aliases:
  'B' in file:
    src/PostgREST/Config.hs
  'BS' in file:
    src/PostgREST/ApiRequest.hs
    src/PostgREST/Types.hs

Module 'Data.ByteString.Char8' has the aliases:
  'BS' in files:
    src/PostgREST/Config.hs
    src/PostgREST/Middleware.hs
    src/PostgREST/Private/QueryFragment.hs
    src/PostgREST/QueryBuilder.hs
    src/PostgREST/RangeQuery.hs
    src/PostgREST/Statements.hs
  'BS8' in file:
    src/PostgREST/App.hs

Module 'Data.ByteString.Lazy' has the aliases:
  'BL' in files:
    src/PostgREST/ApiRequest.hs
    src/PostgREST/Private/QueryFragment.hs
    src/PostgREST/Types.hs
  'LBS' in file:
    src/PostgREST/App.hs
    src/PostgREST/OpenAPI.hs

Module 'Data.HashMap.Strict' has the aliases:
  'HM' in file:
    src/PostgREST/Private/QueryFragment.hs
  'HashMap' in file:
    src/PostgREST/OpenAPI.hs
  'M' in files:
    src/PostgREST/ApiRequest.hs
    src/PostgREST/Auth.hs
    src/PostgREST/DbRequestBuilder.hs
    src/PostgREST/DbStructure.hs
    src/PostgREST/Middleware.hs
    src/PostgREST/Parsers.hs
    src/PostgREST/Types.hs

Module 'Data.Set' has the aliases:
  'S' in files:
    src/PostgREST/ApiRequest.hs
    src/PostgREST/DbRequestBuilder.hs
    src/PostgREST/DbStructure.hs
    src/PostgREST/Parsers.hs
    src/PostgREST/QueryBuilder.hs
    src/PostgREST/Types.hs
  'Set' in file:
    src/PostgREST/App.hs

Module 'Hasql.DynamicStatements.Snippet' has the aliases:
  'H' in files:
    src/PostgREST/Private/Common.hs
    src/PostgREST/Private/QueryFragment.hs
    src/PostgREST/QueryBuilder.hs
    src/PostgREST/Statements.hs
  'SQL' in file:
    src/PostgREST/App.hs

Module 'Hasql.Pool' has the aliases:
  'P' in file:
    src/PostgREST/Error.hs
  'SQL' in file:
    src/PostgREST/App.hs

Module 'Hasql.Transaction' has the aliases:
  'H' in file:
    src/PostgREST/Middleware.hs
  'HT' in file:
    src/PostgREST/DbStructure.hs
  'SQL' in file:
    src/PostgREST/App.hs

Module 'Network.HTTP.Types.Status' has the aliases:
  'HT' in file:
    src/PostgREST/Error.hs
  'HTTP' in file:
    src/PostgREST/App.hs

@monacoremo
Copy link
Member Author

Current outputs from hsie dump:
https://gist.github.com/monacoremo/a0b8277f9baa45238ca36610d924ecaf

@monacoremo
Copy link
Member Author

Personally, I like CSV more here. I can load this directly into Calc/Excel and play around.

We get both! The --json flag is optional, CSV is the default

Not sure if that's possible with GHC as a lib - but can we remove the extra step of doing a dump to a tmp directory? Maybe just have a flag --use-minimal-imports or something like that to make the script operate on the output of that. If the flag is not given it operates on the "raw" imports in the source files.

That's extremely tricky, as we would need to run the whole compiler machinery, compile all dependencies etc. I'll add a bash wrapper script that provides the --ddump-minimal-imports as an input

@wolfgangwalther
Copy link
Member

Just skimmed through the source code: It doesn't look like the exit code is any different for check-aliases, when there are any errors. Can we get a non-zero exit code here to possibly break tests?

@monacoremo
Copy link
Member Author

Just skimmed through the source code: It doesn't look like the exit code is any different for check-aliases, when there are any errors. Can we get a non-zero exit code here to possibly break tests?

Yes, great idea! Added.

Also added check-wildcards:

[nix-shell]$ hsie check-wildcards --ok Protolude --src src
Modules in the following files were imported as wildcards:

In src/PostgREST/ApiRequest.hs:
  Data.Ranged.Boundaries
  PostgREST.Types

In src/PostgREST/App.hs:
  PostgREST.Types

In src/PostgREST/Config.hs:
  Control.Applicative
  Data.Monoid
  Options.Applicative

In src/PostgREST/DbRequestBuilder.hs:
  Control.Applicative
  Data.Tree
  PostgREST.Parsers
  PostgREST.Types

In src/PostgREST/DbStructure.hs:
  PostgREST.Private.Common
  PostgREST.Types

In src/PostgREST/Error.hs:
  Network.HTTP.Types.Header
  PostgREST.Types

In src/PostgREST/Middleware.hs:
  Network.Wai
  Network.Wai.Middleware.RequestLogger
  PostgREST.Private.Common

In src/PostgREST/OpenAPI.hs:
  Control.Lens
  Data.Swagger

In src/PostgREST/Parsers.hs:
  Data.Tree
  PostgREST.Types
  Text.Parsec.Error
  Text.ParserCombinators.Parsec

In src/PostgREST/Private/Common.hs:
  Data.Maybe

In src/PostgREST/Private/QueryFragment.hs:
  Data.Maybe
  PostgREST.Private.Common
  PostgREST.Types

In src/PostgREST/QueryBuilder.hs:
  Data.Maybe
  PostgREST.Private.Common
  PostgREST.Private.QueryFragment
  PostgREST.Types

In src/PostgREST/RangeQuery.hs:
  Control.Applicative
  Data.Ranged.Boundaries
  Data.Ranged.Ranges
  Network.HTTP.Types.Header
  Network.HTTP.Types.Status

In src/PostgREST/Statements.hs:
  Data.Aeson
  Data.Maybe
  Network.HTTP.Types.Status
  PostgREST.Error
  PostgREST.Private.Common
  PostgREST.Private.QueryFragment
  PostgREST.Types

In src/PostgREST/Types.hs:
  Data.Tree

Nearly done with this :-)

@monacoremo monacoremo changed the title [WIP] Nix: Add a tool to list symbols imported from Haskell modules as csv Nix: Add a tool for checking Haskell imports and exports Apr 3, 2021
@monacoremo monacoremo marked this pull request as ready for review April 3, 2021 13:05
@monacoremo
Copy link
Member Author

Left for later PRs:

  • CI integration, will need to fix actual issues flagged by check-aliases and check-wildcards. For now, you can run hsie check-aliases -s src -s main in nix-shell
  • Analyzing exports (e.g., wildcard exports, unused exported symbols)

Copy link
Member

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have the chance, yet, to actually run the tools, but will do so soon.

Doing the review, I will admit that I didn't go through the meat of the hsie functions in detail. The code looks very organized and well readable, though.

I thought briefly about possibly renaming the dump command, because I was initially a bit confused when reading the part about dump-minimal-imports. But dump is probably the best way to describe it...

nix/hsie/README.md Outdated Show resolved Hide resolved
nix/hsie/README.md Outdated Show resolved Hide resolved
nix/devtools.nix Outdated Show resolved Hide resolved
nix/devtools.nix Outdated Show resolved Hide resolved
nix/hsie/Main.hs Outdated Show resolved Hide resolved
nix/hsie/Main.hs Outdated Show resolved Hide resolved
nix/hsie/Main.hs Outdated Show resolved Hide resolved
monacoremo and others added 5 commits April 3, 2021 17:24
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
@monacoremo
Copy link
Member Author

Many thanks for your review @wolfgangwalther !

I thought briefly about possibly renaming the dump command, because I was initially a bit confused when reading the part about dump-minimal-imports. But dump is probably the best way to describe it...

Maybe dump-imports? Could make sense once we also analyze exports.

@wolfgangwalther
Copy link
Member

Maybe dump-imports? Could make sense once we also analyze exports.

I don't think I would have been less confused. But if we plan to have dump-exports, then that might make sense. But now dump-imports --imports xxx really is confusing :).

While trying the tools (see below), I thought about this a bit more and I think I now know where my confusion is coming from. When we look at everything that is done with --ddump-minimal-imports we are actually looking at usage data. So we have 3 parts to look at:

  • (real) import of symbols
  • export of symbols
  • usage of symbols

This will also be important later on, when you want to analyze exports - because you actually don't care about imports but usage of those.

I don't think it ever makes sense to dump imports (--src) and usage (--import) at the same time. I suggest the following interface, that should be a lot less confusing for me:

hsie dump-imports main src
hsie dump-exports main src
hsie dump-usage usage # assume --ddump-minimal-imports is saved in "usage"

Passing the folders to work on as arguments also avoids the ambiguity between --src and --imports. Basically abstract the whole --ddump-minimal-imports away as much as possible, except for mentioning it in the readme as the input to dump-usage.

This takes away the possibility of running the check- or graph- against the usage data. But I really don't think this is useful anyway - running those against the imports made does give us a good picture of how our code-base is set up.


I just tried the tools and noted the following:

  • Running postgrest-hsie-minimal-import dump, I get output like this:
    impSourceFile,impSourceType,impFromModule,impModule,impQualified,impAs,impHiding,impSymbol
    /run/user/1000/tmp.TmIMph9zFs/UnixSocket.imports,imports,UnixSocket,Network.Socket,not qualified,,not hiding,Family
    ....
    
    The first column is not really useful, I think - especially with the temporary path. Since impFromModule is taken from the filename, I don't think we lose anything by removing impSourceFile from the output. At least the first thing I do, when loading the CSV is removing the column :).
  • If possible, another column to tell whether the symbol is internal (aka within our code-base) or external, would be helpful. I can filter by PostgREST. prefix, but have to manually include UnixSocket all the time. This would also allow to add this as a cli flag.
  • Assuming the interface is changed as suggested above postgrest-hsie-minimal-import could be renamed postgrest-hsie-dump-usage and include the dump-usage subcommand (+ folder to work on) already.
  • The impHiding column naming is a bit off, when it includes "wildcard". Maybe something like this:
    data ImportType
      = Wildcard
      | Explicit
      | Hiding

@monacoremo
Copy link
Member Author

This takes away the possibility of running the check- or graph- against the usage data. But I really don't think this is useful anyway - running those against the imports made does give us a good picture of how our code-base is set up.

It should be useful e.g. when we analyze exports - we need the full internal imports from ddump-minimal-imports, and the exports from the actual source code. To graph the full symbol imports, we also need ddump-minimal-imports.

But agree that having -i and -s separated does not add value at this point, so I removed it and commands like hsie dump-imports src main work (as do hsie src dump-imports main for that matter based on the optparse-applicative magic..).

The first column is not really useful, I think - especially with the temporary path. Since impFromModule is taken from the filename, I don't think we lose anything by removing impSourceFile from the output. At least the first thing I do, when loading the CSV is removing the column :).

We need it to display the file that issues are found in, e.g. with check-aliases. Hiding them would add a bit of complexity which I'd rather avoid. I moved the fields to the end of the record, where they are less obtrusive hopefully!

If possible, another column to tell whether the symbol is internal (aka within our code-base) or external, would be helpful. I can filter by PostgREST. prefix, but have to manually include UnixSocket all the time. This would also allow to add this as a cli flag.

Great idea, and needed anyway e.g. for the graphing commands. Added.

Assuming the interface is changed as suggested above postgrest-hsie-minimal-import could be renamed postgrest-hsie-dump-usage and include the dump-usage subcommand (+ folder to work on) already.

dump-usage would be redundant, as dump-imports works as-is. Suggest that we keep the minimal-imports wording from GHC, as it's for specialized use and we avoid introducing new terminology. postgrest-hsie-minimal-import works with any command as is.

The impHiding column naming is a bit off, when it includes "wildcard". Maybe something like this:

Yes, much better! That's one up from what our GHC friends came up with (ideclHiding, which was super confusing for me :-) )

Copy link
Member

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried check-wildcards and it gives me an error:

$ hsie check-wildcards --ok Protolude src main
option --ok: cannot parse value `Protolude'

Usage: hsie check-wildcards [-o|--ok OKMODULE]
  Check that no unwanted modules are imported as wildcards

Just noting for now:
There is also a reverse approach for check-aliases. Some aliases are used for different modules in different files. Example:

file import alias
PostgREST.Config Data.Configurator C
Main Hasql.Connection C

This is probably not something that we can auto-detect easily, because there could be lots of false-positives, too. Think the different Hasql packages - could all be imported with SQL..

nix/hsie/Main.hs Outdated Show resolved Hide resolved
nix/hsie/Main.hs Outdated Show resolved Hide resolved
shell.nix Show resolved Hide resolved
monacoremo and others added 3 commits April 5, 2021 22:59
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
Co-authored-by: Wolfgang Walther <wolfgangwalther@users.noreply.github.com>
@monacoremo
Copy link
Member Author

Just tried check-wildcards and it gives me an error:

Good catch - not sure when that broke. Fixed!

We're reaching a point where we need a test suite, might be worth to spin this off at some point.

@monacoremo monacoremo merged commit 65e7f9e into PostgREST:main Apr 6, 2021
@monacoremo monacoremo deleted the importscsv branch April 6, 2021 09:46
@monacoremo
Copy link
Member Author

Many thanks for your review @wolfgangwalther !

monacoremo added a commit to monacoremo/postgrest that referenced this pull request Jul 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants