Verilog source files are misidentified as Coq sources #520

Open
ravenexp opened this issue May 10, 2020 · 5 comments

Comments

@ravenexp

languages.json lists *.vg as a Verilog source file extension, while everyone has been using *.v for Verilog sources for decades.
I have never seen a *.vg Verilog source file in my life.

@NickHackman
Contributor

The issue with just changing the Verilog extension to *.v is that it then conflicts with Coq, which also uses *.v.

Scc handles this conflict by more intelligently guessing the filetype by looking for keywords in the first 20,000 lines of code. This is something that could be implemented in Tokei.

Downsides

  • Implementing this in Tokei would require a lot more information and therefore a more bloated languages.json.
  • Guessing the filetype could reduce performance.
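
To make the keyword idea concrete, here is a minimal sketch of that kind of disambiguation for the two `.v` languages. It is not scc's or Tokei's code: the `Guess` enum, the keyword lists, and the `guess_v_language` function are all hypothetical names chosen for illustration, and a real implementation would need far more careful keyword sets.

```rust
/// Hypothetical tags for the two languages competing for the `.v` extension.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Guess {
    Verilog,
    Coq,
    Unknown,
}

// Illustrative keyword lists (assumptions, not taken from scc or Tokei).
const VERILOG_KEYWORDS: &[&str] = &["module", "endmodule", "wire", "reg", "assign", "always"];
const COQ_KEYWORDS: &[&str] = &["Theorem", "Lemma", "Proof", "Qed", "Definition", "Inductive"];

/// Count whole-word keyword hits in the first `max_lines` lines and pick the
/// language with more hits. Ties (including zero hits) yield `Unknown`.
pub fn guess_v_language(source: &str, max_lines: usize) -> Guess {
    let mut verilog = 0usize;
    let mut coq = 0usize;
    for line in source.lines().take(max_lines) {
        // Split on anything that cannot be part of an identifier.
        for word in line.split(|c: char| !(c.is_alphanumeric() || c == '_')) {
            if VERILOG_KEYWORDS.contains(&word) {
                verilog += 1;
            } else if COQ_KEYWORDS.contains(&word) {
                coq += 1;
            }
        }
    }
    match verilog.cmp(&coq) {
        std::cmp::Ordering::Greater => Guess::Verilog,
        std::cmp::Ordering::Less => Guess::Coq,
        std::cmp::Ordering::Equal => Guess::Unknown,
    }
}
```

Calling `guess_v_language(contents, 20_000)` on a file's contents returns the guess. Both downsides listed above are visible even in this toy version: the keyword tables are extra data that has to live somewhere, and every ambiguous file has to be scanned before it can be classified.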

If @XAMPPRocky decides this is worth doing then I'll be happy to implement a similar solution to Scc 😄

@XAMPPRocky
Owner

@NickHackman Thank you for your interest. At this point I don't want to add heuristics that are based on the source code, both because of the downsides you mentioned and because I don't think the added complexity would buy much. If you're interested in a solution, I had a design to resolve this that allows users to override the extensions as part of .tokeirc. I didn't release it because the toml library had a limitation where it wouldn't parse the languages map into HashMap<LanguageType, LanguageConfig>, but that might not be the case anymore.

columns = 80
treat_doc_strings_as_comments = true

[languages.Verilog]
extensions = ["v"]
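
As a rough sketch of how such an override could be applied once the config has been parsed, the function below merges built-in defaults with user overrides, with overrides winning. It uses plain string names rather than Tokei's `LanguageType` and `LanguageConfig`, skips TOML parsing entirely, and `build_extension_map` is a hypothetical helper, not part of Tokei.

```rust
use std::collections::HashMap;

/// Build an extension -> language-name table from built-in defaults, then let
/// user overrides (as they might be read from a config file) take precedence.
/// Each entry is `(language name, extensions)`.
pub fn build_extension_map<'a>(
    defaults: &[(&'a str, &[&'a str])],
    overrides: &[(&'a str, &[&'a str])],
) -> HashMap<&'a str, &'a str> {
    let mut map = HashMap::new();
    // Overrides are chained after the defaults, so their inserts replace any
    // default mapping for the same extension.
    for (language, extensions) in defaults.iter().chain(overrides.iter()) {
        for ext in extensions.iter() {
            map.insert(*ext, *language);
        }
    }
    map
}
```

With defaults of Coq for `v` and Verilog for `vg`, and the override from the config above, `v` resolves to Verilog while `vg` keeps its default. The mapping stays one language per extension, which is why this design cannot distinguish two `.v` languages within the same configuration scope.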

@NickHackman
Contributor

@XAMPPRocky sadly that doesn't work for directories that contain both Verilog and Coq files. I have no idea who has that sort of file structure or really what a Coq file is, but still.

In the future it would be nice if Tokei gained some of the features that scc has over it.

@XAMPPRocky
Owner

@NickHackman It's true that it doesn't cover that case, but I would consider that quite pathological. A project that uses the same file extension for two different languages in the same source directory is not something I've ever seen, and I would need some pretty heavy convincing that it would actually be useful to someone. This concern could also be partially, if not fully, addressed by allowing .tokeirc files to work recursively, but that's also a lot of work.
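
For illustration only, one way recursive configs could be resolved is to give each file the config from its closest ancestor directory. The sketch below assumes the set of directories containing a config has already been collected; `nearest_config_dir` is a hypothetical helper, not an existing Tokei function.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Return the closest ancestor directory of `file` that has its own config,
/// given the set of directories known to contain one.
pub fn nearest_config_dir(file: &Path, dirs_with_config: &HashSet<PathBuf>) -> Option<PathBuf> {
    file.ancestors()
        .skip(1) // the first ancestor is the file path itself
        .find(|dir| dirs_with_config.contains(*dir))
        .map(Path::to_path_buf)
}
```

Under this scheme a project could put a Verilog override in its hardware subdirectory and leave the Coq default elsewhere, though two languages sharing `.v` inside a single directory would still be ambiguous.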

@lf-

lf- commented Jan 7, 2021

Also watch out for introducing the bug that linguist hit with this particular language pair: github-linguist/linguist#5041

Verilog has synthesis attributes in (* ... *), which will get misidentified as Coq comments. It looks like the current comment detector used by tokei may hit this.
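
A small sketch of why the two are hard to tell apart: a naive check that treats `(* ... *)` as a block comment gives the same answer for a Verilog attribute and a Coq comment. The function name is hypothetical and this is not Tokei's actual comment detector.

```rust
/// Returns true if a naive scanner that treats `(* ... *)` as a block comment
/// would classify this whole line as a comment.
pub fn looks_like_coq_comment_line(line: &str) -> bool {
    let trimmed = line.trim();
    // Require at least "(**)" so that the opener and closer do not overlap.
    trimmed.len() >= 4 && trimmed.starts_with("(*") && trimmed.ends_with("*)")
}
```

Both `(* keep = "true" *)` (a Verilog attribute) and `(* trivial case *)` (a Coq comment) return `true` here, so a counter built on this rule would report Verilog attribute lines as comments rather than code.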

4 participants