Question: Compare files ignoring whitespace and comments #499

Closed
no-more opened this issue Nov 19, 2018 · 8 comments

@no-more

no-more commented Nov 19, 2018

Hello,

This might be a weird question, but I'm looking for a way to compare two TypeScript files while ignoring comments and indentation.
I need to do this before saving a file: if no code changes are detected, I don't want the file to be saved (in some cases).
I was wondering whether there is a way to achieve this with the AST, or whether I should code something myself.

Thanks.

@dsherret
Owner

dsherret commented Nov 19, 2018

Yup, this is definitely something you can use the AST for. You could even do this pretty easily with just the TypeScript compiler API.

Basically, get all the nodes with no children (leaf nodes) in both source files and then compare their texts. You don't have to worry about comments because they aren't stored in the AST.

For example:

import * as ts from "typescript";

const sourceFile1 = ts.createSourceFile("file.ts", "/* testing */ const t: string;", ts.ScriptTarget.Latest);
const sourceFile2 = ts.createSourceFile("file2.ts", "  const t  : string ; // test", ts.ScriptTarget.Latest);

console.log(areSame(sourceFile1, sourceFile2)); // true

function areSame(sourceFile1: ts.SourceFile, sourceFile2: ts.SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText(sourceFile1) !== leaf2.value.getText(sourceFile2))
            return false;
    }

    function* getLeafNodes(sourceFile: ts.SourceFile) {
        yield* searchNode(sourceFile);

        function* searchNode(node: ts.Node) {
            const children = node.getChildren(sourceFile);
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}

It would be a similar idea if using this library:

import { Project, SourceFile, Node } from "ts-simple-ast";

const project = new Project();

const sourceFile1 = project.createSourceFile("file.ts", "/* testing */ const t: string;");
const sourceFile2 = project.createSourceFile("file2.ts", "  const t  : string ;  // test");

console.log(areSame(sourceFile1, sourceFile2)); // true

function areSame(sourceFile1: SourceFile, sourceFile2: SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText() !== leaf2.value.getText())
            return false;
    }

    function* getLeafNodes(sourceFile: SourceFile) {
        yield* searchNode(sourceFile);

        function* searchNode(node: Node) {
            const children = node.getChildren();
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}

@no-more
Author

no-more commented Nov 20, 2018

Thanks a lot, I'll try that later today.

@dsherret
Owner

dsherret commented Nov 26, 2018

@no-more by the way, after reading #502 I realized that this is possible using just the scanner in the compiler API. It's the fastest solution, as you wouldn't have to parse the entire file to figure out whether the two files are the same.

For example:

import * as ts from "typescript";

console.log(areSame(
    { fileName: "./test.ts", text: "/* testing */ const t  :  \tstring;" },
    { fileName: "./test2.ts", text: "const t: string; // testing" }
));

interface FileInfo {
    fileName: string;
    text: string;
}

function areSame(fileInfo1: FileInfo, fileInfo2: FileInfo) {
    const tokens1 = getTokens(fileInfo1);
    const tokens2 = getTokens(fileInfo2);

    while (true) {
        const token1 = tokens1.next();
        const token2 = tokens2.next();

        if (token1.done && token2.done)
            return true;
        if (token1.done || token2.done)
            return false;
        if (token1.value !== token2.value)
            return false;
    }
}

function* getTokens(fileInfo: FileInfo) {
    const scanner = ts.createScanner(ts.ScriptTarget.Latest, true);
    scanner.setText(fileInfo.text);
    scanner.setOnError(message => console.error(message));
    scanner.setLanguageVariant(getLanguageVariantFromFileName(fileInfo.fileName));

    while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken)
        yield scanner.getTokenText();
}

function getLanguageVariantFromFileName(fileName: string) {
    const lowerCaseFileName = fileName.toLowerCase();
    const isJsxOrTsxFile = lowerCaseFileName.endsWith(".tsx") || lowerCaseFileName.endsWith(".jsx");
    return isJsxOrTsxFile ? ts.LanguageVariant.JSX : ts.LanguageVariant.Standard;
}
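
For the original use case (skip saving when only comments or whitespace changed), the scanner comparison could be wrapped in a small save guard. This is only a sketch: `tokenTexts`, `hasSameTokens` and `saveIfCodeChanged` are hypothetical names that do not come from the thread or from any library, and it assumes non-JSX files so the language variant stays at `Standard`.

```typescript
import * as fs from "fs";
import * as ts from "typescript";

// Yields the text of every token; skipTrivia = true drops whitespace and comments.
function* tokenTexts(text: string): Generator<string, void, undefined> {
    const scanner = ts.createScanner(
        ts.ScriptTarget.Latest, /* skipTrivia */ true, ts.LanguageVariant.Standard, text);
    while (scanner.scan() !== ts.SyntaxKind.EndOfFileToken)
        yield scanner.getTokenText();
}

// Hypothetical helper: true when both texts produce the same token sequence.
function hasSameTokens(oldText: string, newText: string): boolean {
    const oldTokens = tokenTexts(oldText);
    const newTokens = tokenTexts(newText);
    while (true) {
        const a = oldTokens.next();
        const b = newTokens.next();
        if (a.done || b.done)
            return Boolean(a.done && b.done); // same only if both ended together
        if (a.value !== b.value)
            return false;
    }
}

// Hypothetical helper: writes newText only when the code itself changed.
// Returns true if the file was written.
function saveIfCodeChanged(filePath: string, newText: string): boolean {
    if (fs.existsSync(filePath) && hasSameTokens(fs.readFileSync(filePath, "utf8"), newText))
        return false;
    fs.writeFileSync(filePath, newText);
    return true;
}
```

Calling `saveIfCodeChanged("./test.ts", newText)` would then leave the file on disk untouched when the only differences are comments or spacing.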

@dsherret dsherret changed the title Weird question: compare files Question: Compare files ignoring whitespace and comments Nov 27, 2018
@no-more
Author

no-more commented Nov 28, 2018

Sorry for the late response.
Your solution is working great! Thanks a lot!

@no-more no-more closed this as completed Nov 28, 2018
@unlight

unlight commented Aug 11, 2020

I have not yet tested the code above for my needs.
But is this published anywhere as an npm module?

@dsherret
Owner

I doubt it. Perhaps it would be appropriate to add this into ts-morph as a method. Maybe Node#areTokensEqual(node)?

@unlight

unlight commented Aug 12, 2020

It's up to you. But I still have not tested it.
My method should ignore comments, spaces, trailing commas, type of quotes, other stylistic stuff.
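
A rough sketch of how the scanner approach might be extended to ignore quote style and trailing commas as well. None of this is from ts-morph or from the snippets above: `scanTokens`, `dropTrailingCommas` and `areStylisticallySame` are hypothetical helpers. The sketch assumes non-JSX input, does not special-case array holes such as `[1,,]`, and relies on the bare scanner, which does not rescan regular expressions or template literals the way the parser would.

```typescript
import * as ts from "typescript";

interface Token {
    kind: ts.SyntaxKind;
    text: string;
}

// Collects all tokens, skipping whitespace and comments (skipTrivia = true).
function scanTokens(text: string): Token[] {
    const scanner = ts.createScanner(
        ts.ScriptTarget.Latest, /* skipTrivia */ true, ts.LanguageVariant.Standard, text);
    const tokens: Token[] = [];
    let kind = scanner.scan();
    while (kind !== ts.SyntaxKind.EndOfFileToken) {
        // For string literals, getTokenValue() is the unquoted, unescaped content,
        // so 'a' and "a" end up with the same text.
        const isString = kind === ts.SyntaxKind.StringLiteral;
        tokens.push({ kind, text: isString ? scanner.getTokenValue() : scanner.getTokenText() });
        kind = scanner.scan();
    }
    return tokens;
}

const closingTokens = new Set<ts.SyntaxKind>([
    ts.SyntaxKind.CloseBraceToken,
    ts.SyntaxKind.CloseBracketToken,
    ts.SyntaxKind.CloseParenToken,
]);

// Removes every comma that is immediately followed by }, ] or ).
function dropTrailingCommas(tokens: Token[]): Token[] {
    return tokens.filter((token, i) =>
        !(token.kind === ts.SyntaxKind.CommaToken
            && i + 1 < tokens.length
            && closingTokens.has(tokens[i + 1].kind)));
}

function areStylisticallySame(text1: string, text2: string): boolean {
    const tokens1 = dropTrailingCommas(scanTokens(text1));
    const tokens2 = dropTrailingCommas(scanTokens(text2));
    return tokens1.length === tokens2.length
        && tokens1.every((t, i) => t.kind === tokens2[i].kind && t.text === tokens2[i].text);
}
```

Comparing the token kind alongside the text keeps a string literal `'x'` from being treated as equal to an identifier `x`.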

@dsherret dsherret reopened this Apr 3, 2021
@dsherret dsherret closed this as completed Apr 3, 2021
Repository owner locked and limited conversation to collaborators Apr 3, 2021
Repository owner unlocked this conversation Apr 3, 2021
@nicmr

nicmr commented Feb 11, 2022

As a note to future readers of this issue: the above snippets will fail on newer TypeScript versions with the following error: TS7023 [ERROR]: 'searchNode' implicitly has return type 'any' because it does not have a return type annotation and is referenced directly or indirectly in one of its return expressions.

This is caused by a breaking change in how TypeScript types generators. I believe the breaking change was introduced in TypeScript 4.2 or later.

This can be addressed with the following type annotation (thanks to @jeremyFMP for helping me with this):

function areSame(sourceFile1: SourceFile, sourceFile2: SourceFile) {
    const leafNodes1 = getLeafNodes(sourceFile1);
    const leafNodes2 = getLeafNodes(sourceFile2);

    while (true) {
        const leaf1 = leafNodes1.next();
        const leaf2 = leafNodes2.next();

        if (leaf1.done && leaf2.done)
            return true;
        if (leaf1.done || leaf2.done)
            return false;
        if (leaf1.value.getText() !== leaf2.value.getText())
            return false;
    }

    function* getLeafNodes(sourceFile: SourceFile){
        yield* searchNode(sourceFile);

/* -> */function* searchNode(node: Node): Generator<Node, void, Node> {
            const children = node.getChildren();
            if (children.length === 0)
                yield node;
            else {
                for (const child of children)
                    yield* searchNode(child);
            }
        }
    }
}
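
To look at the annotation in isolation, here is a minimal sketch with no dependency on ts-morph or the compiler API. The `Tree` type and the `leaves` and `sameLeaves` functions are made up for illustration. The point is only that a generator which delegates to itself with `yield*` gets an explicit `Generator<...>` return type, so the compiler has nothing circular to infer.

```typescript
// Hypothetical stand-in for an AST node.
interface Tree {
    text: string;
    children: Tree[];
}

// The explicit return type breaks the circular inference caused by the
// recursive `yield* leaves(child)` call.
function* leaves(node: Tree): Generator<Tree, void, undefined> {
    if (node.children.length === 0)
        yield node;
    else {
        for (const child of node.children)
            yield* leaves(child);
    }
}

// Walks both leaf sequences in lockstep and compares their texts.
function sameLeaves(tree1: Tree, tree2: Tree): boolean {
    const leaves1 = leaves(tree1);
    const leaves2 = leaves(tree2);
    while (true) {
        const leaf1 = leaves1.next();
        const leaf2 = leaves2.next();
        if (leaf1.done || leaf2.done)
            return Boolean(leaf1.done && leaf2.done);
        if (leaf1.value.text !== leaf2.value.text)
            return false;
    }
}
```

This assumes a compile target that supports generators natively (ES2015 or later).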

My colleague @guidoschmidt was able to get it running using an Iterator annotation when targeting esnext instead.

function* getLeafNodes(sourceFile: ts.SourceFile): Iterator<ts.Node> {
    yield* searchNode(sourceFile);

    function* searchNode(node: ts.Node) {
        const children = node.getChildren(sourceFile);
        if (children.length === 0)
            yield node;
        else {
            for (const child of children)
                yield* searchNode(child);
        }
    }
}

Lastly and most obviously, the name of the project has since changed to ts-morph, which should be reflected in the import paths:

// Node.js
import { Project, SourceFile, Node } from "ts-morph";
// Deno
import { Project, SourceFile, Node } from "https://deno.land/x/ts_morph/mod.ts";
