Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Multibyte characters break positions #25

Closed
EgorBu opened this issue Aug 8, 2018 · 10 comments · Fixed by #51
Closed

Multibyte characters break positions #25

EgorBu opened this issue Aug 8, 2018 · 10 comments · Fixed by #51
Assignees

Comments

@EgorBu
Copy link

EgorBu commented Aug 8, 2018

Hi,
I used vis tool for position and found that positions are broken at this file.

/* @noflow */

import type { NodePath } from "@babel/traverse";
import wrapFunction from "@babel/helper-wrap-function";
import annotateAsPure from "@babel/helper-annotate-as-pure";
import * as t from "@babel/types";

const awaitVisitor = {
  Function(path) {
    path.skip();
  },

  AwaitExpression(path, { wrapAwait }) {
    const argument = path.get("argument");

    if (path.parentPath.isYieldExpression()) {
      path.replaceWith(argument.node);
      return;
    }

    path.replaceWith(
      t.yieldExpression(
        wrapAwait
          ? t.callExpression(t.cloneNode(wrapAwait), [argument.node])
          : argument.node,
      ),
    );
  },
};

export default function(
  path: NodePath,
  helpers: { wrapAsync: Object, wrapAwait: Object },
) {
  path.traverse(awaitVisitor, {
    wrapAwait: helpers.wrapAwait,
  });

  const isIIFE = checkIsIIFE(path);

  path.node.async = false;
  path.node.generator = true;

  wrapFunction(path, t.cloneNode(helpers.wrapAsync));

  const isProperty =
    path.isObjectMethod() ||
    path.isClassMethod() ||
    path.parentPath.isObjectProperty() ||
    path.parentPath.isClassProperty();

  if (!isProperty && !isIIFE && path.isExpression()) {
    annotateAsPure(path);
  }

  function checkIsIIFE(path: NodePath) {
    if (path.parentPath.isCallExpression({ callee: path.node })) {
      return true;
    }

    // try to catch calls to Function#bind, as emitted by arrowFunctionToExpression in spec mode
    // this may also catch .bind(this) written by users, but does it matter? 🤔
    const { parentPath } = path;
    if (
      parentPath.isMemberExpression() &&
      t.isIdentifier(parentPath.node.property, { name: "bind" })
    ) {
      const { parentPath: bindCall } = parentPath;

      // (function () { ... }).bind(this)()

      return (
        // first, check if the .bind is actually being called
        bindCall.isCallExpression() &&
        // and whether its sole argument is 'this'
        bindCall.node.arguments.length === 1 &&
        t.isThisExpression(bindCall.node.arguments[0]) &&
        // and whether the result of the .bind(this) is being called
        bindCall.parentPath.isCallExpression({ callee: bindCall.node })
      );
    }

    return false;
  }
}

and it's visualization

_____________

import type { ________ } from _________________;
import ____________ from _____________________________;
import ______________ from ________________________________;
import * as _ from ______________;

const ____________ = {
  ________(____) {
    ____.____();
  },

  _______________(____, { _________ }) {
    const ________ = ____.___(__________);

    if (____.__________._________________()) {
      ____.___________(________.____);
      return;
    }

    ____.___________(
      _._______________(
        _________
          ? _.______________(_._________(_________), _______________)
          : ________.____,
      ),
    );
  },
};

export default function(
  ______________,
  _________________________________________________,
) {
  ____.________(____________, {
    _________: _______._________,
  });

  const ______ = ___________(____);

  ____.____._____ = _____;
  ____.____._________ = ____;

  ____________(____, _._________(_______._________));

  const __________ =
    ____.______________() ||
    ____._____________() ||
    ____.__________.________________() ||
    ____.__________._______________();

  if (!__________ && !______ && ____.____________()) {
    ______________(____);
  }

  function ___________(______________) {
    if (____.__________.________________({ ______: ____.____ })) {
      return ____;
    }

    ____________________________________________________________________________________________
    ___________________________________________________________________________    const { p__________} = p____
    if (
      p__________i__________________) &&
      t_i____________p__________n____p________ { n____ "______})
    ) {
      const { p__________ b________} = p__________

      /_____________________________________
      return (
        /_____________________________________________________        b________i________________) &&
        /__________________________________________        b________n____a_________l______=== 1_&&
        t_i________________b________n____a_________0_) &&
        /____________________________________________________________        b________p__________i________________{ c______ b________n____})
      );
    }

    return f_____
  }
}

we can see that offset is broken by 1 symbol after this comment

    // try to catch calls to Function#bind, as emitted by arrowFunctionToExpression in spec mode
    // this may also catch .bind(this) written by users, but does it matter? 🤔

could it be because of special character 🤔?

@EgorBu
Copy link
Author

EgorBu commented Aug 9, 2018

One more file: probably because of 💾

@dennwc dennwc added the bug label Nov 7, 2018
@dennwc dennwc changed the title [bug] positions are broken Multibyte characters break positions Nov 7, 2018
@dennwc
Copy link
Member

dennwc commented Jan 9, 2019

I cannot reproduce this issue.

According to the comment on the Offset field:

Offset is ... an absolute byte offset.

@EgorBu Maybe you are using the offset as a UTF8 character offset instead of a byte offset?

@dennwc dennwc added the question label Jan 9, 2019
@dennwc dennwc self-assigned this Jan 9, 2019
@dennwc
Copy link
Member

dennwc commented Jan 9, 2019

OK, nevermind, I reproduced an issue. The native driver returns JS offsets in UTF8 runes, not bytes.

@zurk
Copy link

zurk commented Jan 10, 2019

@dennwc do you plan to fix it not only in the lastest v2 driver but for the latest v1 too?
yeah, we still use old drivers, sorry for that, but as soon as we have enough time we plan to switch to v2.

@dennwc
Copy link
Member

dennwc commented Jan 10, 2019

I don't think we can backport it down to v1, but all new drivers should be able to run in v1 compatibility mode.

@zurk
Copy link

zurk commented Jan 11, 2019

oh, I did not know it is possible. Sorry for offtopic, but how can I enable it? From my experiments, I remember that pure driver replacement gives different UAST structure and it breaks ML team code.

@zurk
Copy link

zurk commented Jan 23, 2019

@dennwc what about v1 compatibility mode and my last question?

@dennwc
Copy link
Member

dennwc commented Jan 23, 2019

@zurk Yeah, sorry, missed the notification.

I checked the code, v1 compatibility is still enabled in the new bblfshd releases. So you can use new driver to get the good old Node structure.

But, since we fixed lots of things in new drivers, you might see some slight changes in the AST. Those are mostly bug fixes, but I don't know if they are severe enough to break assumptions in your pipeline.

I would say you should give it a try and feel free to ping me if you find any differences that break the pipeline. Some might be fixable upstream.

@zurk
Copy link

zurk commented Jan 23, 2019

ok, thank you, @dennwc! I will check it.

@creachadair
Copy link
Contributor

See also bblfsh/documentation#228, which was motivated by issues like this. I think we will need to make a clearer contract about what constitutes "acceptable" changes to the UAST structure as a result of bug fixes and natural evolution

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants