Question: Compare files ignoring whitespace and comments #499
Comments
Yup, this is definitely something you can use the AST for. You could even do this pretty easily with the TypeScript compiler API. Basically, get all the nodes with no children (leaf nodes) in both source files and then compare their text. You don't have to worry about comments because they aren't stored in the AST. For example:

import * as ts from "typescript";

const sourceFile1 = ts.createSourceFile("file.ts", "/* testing */ const t: string;", ts.ScriptTarget.Latest);
const sourceFile2 = ts.createSourceFile("file2.ts", " const t : string ; // test", ts.ScriptTarget.Latest);

console.log(areSame(sourceFile1, sourceFile2)); // true

function areSame(sourceFile1: ts.SourceFile, sourceFile2: ts.SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText(sourceFile1) !== leaf2.value.getText(sourceFile2))
            return false;
    }

    function* getLeafNodes(sourceFile: ts.SourceFile) {
        yield* searchNode(sourceFile);

        function* searchNode(node: ts.Node) {
            const children = node.getChildren(sourceFile);
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}

It would be a similar idea if using this library:

import { Project, SourceFile, Node } from "ts-simple-ast";

const project = new Project();
const sourceFile1 = project.createSourceFile("file.ts", "/* testing */ const t: string;");
const sourceFile2 = project.createSourceFile("file2.ts", " const t : string ; // test");

console.log(areSame(sourceFile1, sourceFile2)); // true

function areSame(sourceFile1: SourceFile, sourceFile2: SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText() !== leaf2.value.getText())
            return false;
    }

    function* getLeafNodes(sourceFile: SourceFile) {
        yield* searchNode(sourceFile);

        function* searchNode(node: Node) {
            const children = node.getChildren();
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}
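The comparison loop is identical in both variants above, so it could be pulled out into a small generic helper. The following is only a minimal sketch and is not part of either library: `sequencesEqual` and `words` are hypothetical names, and the sketch assumes strict equality (`===`) is the right notion of sameness for whatever the two sequences yield (for example, leaf-node text).

```typescript
// Hypothetical helper (not part of TypeScript or ts-simple-ast): walks two
// iterables in lockstep and stops at the first difference.
function sequencesEqual<T>(first: Iterable<T>, second: Iterable<T>): boolean {
    const iterator1 = first[Symbol.iterator]();
    const iterator2 = second[Symbol.iterator]();

    while (true) {
        const result1 = iterator1.next();
        const result2 = iterator2.next();

        // Both exhausted at the same step: every item matched.
        if (result1.done && result2.done)
            return true;
        // Only one exhausted: the sequences have different lengths.
        if (result1.done || result2.done)
            return false;
        if (result1.value !== result2.value)
            return false;
    }
}

// Stand-in for a generator that yields the text of each leaf node.
// It simply splits on whitespace so the example runs without the compiler.
function* words(text: string): Generator<string, void, undefined> {
    for (const word of text.split(/\s+/).filter(w => w.length > 0))
        yield word;
}

console.log(sequencesEqual(words("const t : string ;"), words("  const t :\tstring ;"))); // true
console.log(sequencesEqual(words("const t : string ;"), words("const u : string ;")));    // false
```

Because the helper pulls one item at a time from each side, it stops as soon as a mismatch is found instead of collecting every item first.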
Thanks a lot, I'll try that later today.
@no-more By the way, after reading #502 I realized that this is possible using just the scanner in the compiler API. It's the fastest solution, as you wouldn't have to parse the entire file to figure out whether the two are the same. For example:

import * as ts from "typescript";

console.log(areSame(
    { fileName: "./test.ts", text: "/* testing */ const t : \tstring;" },
    { fileName: "./test2.ts", text: "const t: string; // testing" }
));

interface FileInfo {
    fileName: string;
    text: string;
}

function areSame(fileInfo1: FileInfo, fileInfo2: FileInfo) {
    const tokens1 = getTokens(fileInfo1);
    const tokens2 = getTokens(fileInfo2);

    while (true) {
        const token1 = tokens1.next();
        const token2 = tokens2.next();

        if (token1.done && token2.done)
            return true;
        if (token1.done || token2.done)
            return false;
        if (token1.value !== token2.value)
            return false;
    }
}

function* getTokens(fileInfo: FileInfo) {
    // The second argument (skipTrivia) makes the scanner skip whitespace and comments.
    const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
    scanner.setText(fileInfo.text);
    scanner.setOnError(message => console.error(message));
    scanner.setLanguageVariant(getLanguageVariantFromFileName(fileInfo.fileName));

    while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken)
        yield scanner.getTokenText();
}

function getLanguageVariantFromFileName(fileName: string) {
    const lowerCaseFileName = fileName.toLowerCase();
    const isJsxOrTsxFile = lowerCaseFileName.endsWith(".tsx") || lowerCaseFileName.endsWith(".jsx");
    return isJsxOrTsxFile ? ts.LanguageVariant.JSX : ts.LanguageVariant.Standard;
}
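To tie this back to the original use case (skipping the save when only whitespace or comments changed), a comparison function like `areSame` above could be wired into the write step roughly as follows. This is a sketch under assumptions: `writeIfChanged`, `TextComparer`, and `sameIgnoringWhitespace` are hypothetical names, and the whitespace-stripping comparer is only a crude stand-in so the example runs without the compiler. In practice you would pass a token-based comparer instead.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Returns true when the two texts should be treated as the same code.
type TextComparer = (oldText: string, newText: string) => boolean;

// Hypothetical helper: writes newText only when the comparer reports a
// difference from what is already on disk. Returns true if the file was written.
function writeIfChanged(filePath: string, newText: string, isSame: TextComparer): boolean {
    if (fs.existsSync(filePath)) {
        const oldText = fs.readFileSync(filePath, "utf8");
        if (isSame(oldText, newText))
            return false;
    }
    fs.writeFileSync(filePath, newText, "utf8");
    return true;
}

// Stand-in comparer (an assumption for this sketch): drops ALL whitespace,
// which is cruder than comparing tokens and does not ignore comments.
const sameIgnoringWhitespace: TextComparer = (oldText, newText) =>
    oldText.replace(/\s+/g, "") === newText.replace(/\s+/g, "");

// Usage against a throwaway temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "compare-demo-"));
const demoFile = path.join(demoDir, "file.ts");

console.log(writeIfChanged(demoFile, "const t: string;", sameIgnoringWhitespace));    // true (file did not exist yet)
console.log(writeIfChanged(demoFile, "const t :  string ;", sameIgnoringWhitespace)); // false (whitespace only)
console.log(writeIfChanged(demoFile, "const u: string;", sameIgnoringWhitespace));    // true (code changed)
```

Passing the comparer in as a parameter keeps the file handling separate from the decision about what counts as "the same", so the leaf-node or scanner approach can be swapped in without touching the write logic.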
Sorry for the late response.
I have not yet tested the code above for my needs.
I doubt it. Perhaps it would be appropriate to add this into ts-morph as a method. Maybe
It's up to you. But I still have not tested it.
As a note to future readers of this issue: the above snippets will fail on newer TypeScript versions with an error caused by a breaking change in how TypeScript types generators. I believe the breaking change was introduced in TypeScript >= 4.2. This can be addressed with the following type annotation (thanks to @jeremyFMP for helping me with this):

function areSame(sourceFile1: SourceFile, sourceFile2: SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText() !== leaf2.value.getText())
            return false;
    }

    function* getLeafNodes(sourceFile: SourceFile) {
        yield* searchNode(sourceFile);

        /* -> */ function* searchNode(node: Node): Generator<Node, void, Node> {
            const children = node.getChildren();
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}

My colleague @guidoschmidt was able to get it running using an Iterator annotation when targeting esnext instead:

function* getLeafNodes(sourceFile: ts.SourceFile): Iterator<ts.Node> {
    yield* searchNode(sourceFile);

    function* searchNode(node: ts.Node) {
        const children = node.getChildren(sourceFile);
        if (children.length === 0) yield node;
        else {
            for (const child of children) yield* searchNode(child);
        }
    }
}
Lastly, and most obviously, the name of the project has since changed to ts-morph, which should be reflected in the import paths:

// Node.js
import { Project, SourceFile, Node } from "ts-morph";
// Deno
import { Project, SourceFile, Node } from "https://deno.land/x/ts_morph/mod.ts";
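The annotation fix can also be seen in isolation on a small tree type that has nothing to do with the compiler. This is an illustrative sketch only: `TreeNode`, `leaves`, and `demoTree` are hypothetical names, and the third type argument is written as `undefined` here (nothing is passed to `next()`), which is a choice made for this sketch rather than the annotation used in the snippet above.

```typescript
// Hypothetical stand-in for an AST node: some text plus child nodes.
interface TreeNode {
    text: string;
    children: TreeNode[];
}

// A recursive generator with an explicit return type, so the compiler does
// not need to infer the type of a function that refers to itself.
function* leaves(node: TreeNode): Generator<TreeNode, void, undefined> {
    if (node.children.length === 0)
        yield node;
    else {
        for (const child of node.children)
            yield* leaves(child);
    }
}

const demoTree: TreeNode = {
    text: "root",
    children: [
        { text: "const", children: [] },
        {
            text: "declaration",
            children: [
                { text: "t", children: [] },
                { text: "string", children: [] },
            ],
        },
    ],
};

// Collect the text of every leaf in document order.
console.log(Array.from(leaves(demoTree), leaf => leaf.text)); // [ 'const', 't', 'string' ]
```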
Hello,
this might be quite weird, but I'm looking for a way to compare two TypeScript files while ignoring comments and indentation.
I need to do this before saving a file: if no code changes are detected, I don't want the file to be saved (in some cases).
I was wondering if there is any way to achieve this with the AST, or if I should code something myself.
Thanks.