Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes and improvements for emit/__emit #211

Merged
merged 43 commits into from
Dec 30, 2017

Conversation

Daniel-Cortez
Copy link
Contributor

@Daniel-Cortez Daniel-Cortez commented Nov 18, 2017

  • Changed the syntax of the `emit`/`__emit` operator to more expression/statement-like.
    Now multiple instructions can be put within a single `emit` by being enclosed in parentheses.
    Example:
    emit(const.pri 0, add.c 1, add.c 2);
    printf("%d, %d", emit const.pri 0, emit(const.pri 0, add.c 1, add.c 2));
    emit const.pri 0, printf(""), emit(const.pri 0, add.c 1, add.c 2);

    Also single-line emit statements require a trailing semicolon.

    emit const.pri 0 // error 001: expected token: ";", but found "emit"
    emit const.pri 0; // ok
  • Added an ability to use expressions for the arguments of type "numeric value".
    Example:
    const SOME_CONST = 500;
    emit const.pri (SOME_CONST * 3 + 1);
    emit const.alt ((SOME_CONST / 2 - 1) % 2);
    

    Note that the expression should be enclosed in parentheses in order to be properly recognized by the compiler.

    Also it's possible to use the "-" sign before single named constants.

    const SOME_CONST = 1;
    emit const.pri -SOME_CONST; // PRI = -1
    
    static some_array[] = { 0, 1, 2, 3, 4, 5 };
    static some_var;
    new const some_var_offset = emit { const.pri some_var }, emit { add.c -some_array };
  • Made error messages for argument type mismatch more informative.
    Example:
    main()
    {
        new x;
        emit load.pri x // 'x' is a local variable located in the stack/heap section
                        // while 'load.pri' reads from the data section.
    }

    Output before this PR:

    error 017: undefined symbol "x"
    

    After:

    error 001: expected token: "-data offset-", but found "-local variable-"
    
  • Added argument type check for opcodes that take local variable offsets.
    Example:
    SomeFunction()
    {
        static x;
        new y;
    
        emit load.s.pri x; // error 001: expected token: "-local variable-", but found "-data offset-"
    
        emit load.s.pri y; // ok
    
        // The use of named constants is allowed as well
        const FUNC_ARGS_SIZE_OFFSET = 8;
        new const num_args = emit { load.s.pri FUNC_ARGS_SIZE_OFFSET } / (cellbits / charbits);
    }
  • Now when macro-optimisations are enabled (the '-O2' key) 'emit sysreq.c ' turns into the following sequence:
    const.pri <ntvid>
    sysreq.pri
    

    This is because in order to call native functions the compiler uses either 'sysreq.n' or 'sysreq.c', not both, so having both 'sysreq.c' and 'sysreq.n' instructions in one script can be considered invalid.

    For the same reason emit sysreq.n <ntvid> <argsize> is turned into a sequence of non-macro instructions when macro-optimisations aren't enabled, so now sysreq.n can be safely used in SA-MP gamemodes as a macro for native function calls.

    Additionally all macro-opcodes can be automatically replaced by non-macro ones.
    Example:

    static const fmt_str[] = "%s\n";
    static const str[] = "Hello world";
    
    emit push2.c str fmt_str;
    emit sysreq.n printf (2 * cellbits / charbits);

    When compiled with no macro-optimisations the code above turns into

    push.c 10 ; "Hello world"
    push.c 0  ; "%s\n"
    push.c 8
    sysreq.c printf
    stack 12
    

    This should allow to write more elegant and portable code with the emit/__emit operator.

  • Added proper argument checking for opcodes 'lctrl', 'sctrl', 'lodb', 'strb', 'sysreq.c' and 'sysreq.n'.
    // Only IDs 0-7 are valid: 0=COD, 1=DAT, 2=HEA, 3=STP,
    // 4=STK, 5=FRM, 6=CIP, 7=JIT status (1=enabled, 0=disabled/not present).
    emit lctrl -1; // error 050: invalid range
    emit lctrl 8; // error 050: invalid range
    
    emit lctrl 2; // ok
    
    emit sysreq.c 0; // error 001: expected token: "-native function-", but found "-integer value-"
    emit sysreq.c printf; // ok
    
    const regSTK = 4;
    emit lctrl regSTK; // ok
    
    // Registers 0, 1, 3 and 7 are read-only.
    emit sctrl 3; // error 050: invalid range
    emit sctrl 4; // ok

    "Invalid range" is probably not the best description for this kind of error, but I'm not sure if adding a new error type to sc5.c (093?) is a good idea, so I'm leaving it as is for now.

  • Changed the way opcode 'casetbl' is handled.
    The number of further case table entries and the 'default' label are now passed to 'casetbl', not the leading 'case' entry.
    Before this PR:
        emit
        {
            load.s.pri x
            switch lbl_switch_casetbl
        lbl_switch_casetbl:
            casetbl
            case 3 lbl_switch_default
            case 0 lbl_switch_case0
            case 1 lbl_switch_case1
            case 2 lbl_switch_case2
        }

    After:

        emit
        {
            load.s.pri x
            switch lbl_switch_casetbl
        lbl_switch_casetbl:
            casetbl 3 lbl_switch_default
            case 0 lbl_switch_case0
            case 1 lbl_switch_case1
            case 2 lbl_switch_case2
        }
  • Fixed multiple occurences of NULL pointer dereference.
    Example of code that could have made the compiler crash (before this PR):
    main()
    {
        new y;
        emit load.pri y;
    }
  • Now the code generated by `emit`/`__emit` within expressions is not being changed by the bytecode optimizer.
  • Fixed a bug whence the sign bit of negative floating-point numbers wasn't set correctly on 64-bit cells.
    (Also fixed the same bug in '#emit'.)
  • Other minor fixes and overall code cleanup.

Daniel-Cortez and others added 30 commits November 11, 2017 20:50
The instruction name is copied in lower case anyway (see emit_parse_line()).
Conflicts:
	source/compiler/sc1.c
This changes the way 'switch', 'casetbl' and 'case' are handled; now 'casetbl' can't be used directly and the arguments for the first case table entry (the number of entries and 'default' label) are specified to 'switch'.
Example:
    emit
    {
        load.s.pri my_var
        switch 3 lbl_deafult
        case 0 lbl_0
        case 2 lbl_2
        case 4 lbl_4
    }
It was copied from sc2.c with a check for a trailing '}' added and was too specific to be reused anywhere else.
Example:
    main()
    {
        new x;
        emit load.pri x
    }
Before this commit:
    error 017: undefined symbol "x"
After:
    error 001: expected token: "-data offset-", but found "-local variable-"
…ram_num()

Now it should be possible to do this:
    const SOME_VAL = 1;
    emit const.pri -SOME_VAL

or this:
    static some_array[] = { 0, 1, 2, 3, 4, 5 };
    static some_var;
    emit { const.pri some_var };
    new some_var_offset = emit { add.c -some_array };
Now the function takes the array itself before the array size, as the argument handlers in sc1.c do.
@Y-Less
Copy link
Member

Y-Less commented Nov 29, 2017

In that case, why not just branch based on the number of values, since CASETBL has a count already - >30, binary search; < 30, linear.

@Daniel-Cortez
Copy link
Contributor Author

It's not that simple. The results of the test above may differ on machines other than my PC, especially on those with different architecture.
I just ran the same test on my tablet (ARMv8, 1.1GHz, 1GB RAM), in there binary search starts to work faster than linear search at 10 case table entries.

@Daniel-Cortez
Copy link
Contributor Author

Anyway, I changed the syntax of emit/__emit in order for it to look more natural.
Here are some examples:

// No need to use curly brackets with 'emit' within expressions anymore
new x = emit const.pri 1;
new y = (emit stack 0, emit move.pri);

// Now ';' can be optionally put at the end
// of the line for single-line syntax
emit const.pri 1;
emit zero.pri // Omitting ';' is still valid

// For multiline syntax putting ';' after the closing brace
// is no longer valid since it goes against all other syntax
emit{
	nop
}; // error 036: empty statement

@Southclaws
Copy link
Collaborator

Omitting ';' is still valid

I think this should respect the -;+ behaviour. i.e.: when -;+ is set, ; is not optional. It seems like this is moving towards using statement/expression syntax more than preprocessor directives so it would only make sense to conform it to the same rules as other statement/expression grammar.

@Daniel-Cortez
Copy link
Contributor Author

I think this should respect the -;+ behaviour. i.e.: when -;+ is set, ; is not optional.

Don't want to sound rude or anything, but how would anyone benefit from this?
I made the semicolon optional so the end user could choose if they want emit/__emit syntax to be close to either real statements/expressions or the #emit directive.

@Daniel-Cortez
Copy link
Contributor Author

Made the compiler issue an error if a label within the emit/__emit block is redefined.
Example:

emit
{
my_label:
    jump my_label
my_label:
}

Before: <no errors/warnings>
After: error 021: symbol already defined: "my_label"

Also fixed a similar bug with regular labels, but made it as a separate PR since it's not related to emit/__emit.

@Y-Less
Copy link
Member

Y-Less commented Dec 2, 2017

Don't want to sound rude or anything, but how would anyone benefit from this?
I made the semicolon optional so the end user could choose if they want emit/__emit syntax to be close to either real statements/expressions or the #emit directive.

Surely that's why the flag exists in the first place. The semi-colon is already optional, and you can already omit it if you choose. Or, you can choose to make it required with a flag. In that case, the user has made an explicit choice, and ignoring it for the sake of more "choice" is very counter-intuative.

@Daniel-Cortez
Copy link
Contributor Author

Surely that's why the flag exists in the first place. The semi-colon is already optional, and you can already omit it if you choose. Or, you can choose to make it required with a flag. In that case, the user has made an explicit choice, and ignoring it for the sake of more "choice" is very counter-intuative.

This logic could work in theory, but in reality the only people I've seen not using -;+ did so just because they used some other code editor than pawno and didn't know about this flag, not because they didn't want to use semicolons.
There still might be people that could want to use emit/__emit with the same syntax as the #emit directive, without a mandatory semicolon at the end of line, so I'm not going to risk changing that. At least, not after a poll that could represent the preferences among the community, that is.

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Dec 3, 2017

Another commit, now it should be possible to use constant expressions as opcode arguments.
Examples:

emit const.pri ((MAX_PLAYERS - 4 * 3) + 1);

new x = random(1);
emit
{
    load.s.pri x
    add.c (MAX_PLAYERS / 2) // MAX_PLAYERS / 2 + x
}

Note that the expressions must be enclosed in parentheses.

Expressions can be used with const.pri/alt, add.c, jrel, push.c and other opcodes that take arguments of type "-numeric value-".
I'll probably make a table of opcode argument types for inclusion into the wiki later.

@Y-Less
Copy link
Member

Y-Less commented Dec 4, 2017

At least, not after a poll that could represent the preferences among the community, that is.

What poll?

And this again comes back to the same issue I mentioned before about making this a full expression instead of unique extra syntax. Using braces and not semi-colons within expressions is not done anywhere else, although I like that it now supports the normal way of chaining expressions with new y = (emit stack 0, emit move.pri);. #if and if don't have the same syntax, because one is a pre-processor directive and the other is a statement. Or a better example is #assert and assert.

#assert MAX_PLAYERS > 10
assert(GetPlayerPoolSize() > 10);

Are these now possible:

printf("%d", (emit LCTRL 3, emit HEAP -4, emit STOR.I, emit PUSH.alt)), emit HEAP 4;
printf("%d", emit { LCTRL 3; HEAP -4; STOR.I; PUSH.alt }), emit HEAP 4;

? I don't mind the brace syntax in that context quite so much, though if we were really following the standard syntax and precedence rules these could both be valid:

printf("%d", (emit LCTRL 3, emit HEAP -4, emit STOR.I, emit PUSH.alt)), emit HEAP 4;
printf("%d", emit (LCTRL 3, HEAP -4, STOR.I, PUSH.alt)), emit HEAP 4;

Again, no reason for new subtly different syntax when the existing syntax works. This would see emit as more like char - a keyword attached to expressions rather than a full statement. Still slightly unique in that it is prefix, and also a statement as well. So it has its own unique properties, but IMHO should be kept as close to the rules of the language as possible.

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Dec 4, 2017

Are these now possible:

printf("%d", (emit LCTRL 3, emit HEAP -4, emit STOR.I, emit PUSH.alt)), emit HEAP 4;
printf("%d", emit { LCTRL 3; HEAP -4; STOR.I; PUSH.alt }), emit HEAP 4;

?

Not exactly, you can't chain emit HEAP 4 at the end, it's only possible in expressions. I'd like to implement it myself, but right now I'm not sure how to do that.
Also you can't use multiline syntax in expressions like in your example, but I'll probably try to implement something like that as well.

@Daniel-Cortez
Copy link
Contributor Author

Fixed a bug that caused bad code generation when using emit/__emit within expressions.

const MAX_PLAYERS = 1000;
new x = 1 + emit const.pri (MAX_PLAYERS / 2) + 4; // x = MAX_PLAYERS / 2 + 5;

This code caused the compiler to erase all code in the staging buffer that was before the emit expression (in this case it's stack allocation for variable x). Now this is fixed.

I also fixed the instructions generated by emit/__emit within expressions being altered by the optimizer.
Example:

new x = (emit const.pri 0, emit add.c 1, emit add.c 1);

Code generated before the fix:

stack fffffffc
zero.pri
add.c 2
stor.s.pri fffffffc

After:

stack fffffffc
const.pri 0
add.c 1
add.c 1
stor.s.pri fffffffc

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Dec 6, 2017

Since a compiled script that uses both sysreq.c and sysreq.n instructions might be considered invalid by advanced abstract machines (depending on the bytecode optimisation level, the compiler either uses sysreq.c or sysreq.n, not both), I implemented compatibility fallbacks for these opcodes.

Now when sysreq.n is used within emit/__emit and macro optimisations are disabled (-O0, -O1 or -d3), the compiler would automatically turn it into a chain of non-macro instructions.
Example:

emit sysreq.n printf 8;

With -O2 the output is just as it should be

sysreq.n printf 8

but with -O0, -O1 or -d3 it will result into

push.c 8
sysreq.c printf
stack 12

The similar rule applies to sysreq.c, but in order to convert it into sysreq.n the compiler needs to know the size of function arguments, which is passed in a separate opcode (push.c) and there's no easy way to get it, so I made the compiler turn sysreq.c <ntvid> into a chain of const.pri <ntvid> \ sysreq.pri (sysreq.pri can be safely used with sysreq.n).

These changes won't really affect SA-MP users since macro opcodes won't work on the SA-MP server, but

  • I'm making this set of changes/improvements for both this project and my own fork of Pawn 3.2, and having any differences between the two of them would require extra work of conflict resolving on my side, which may complicate making further changes for this PR.
  • IMHO, it wouldn't really hurt to have this feature here. Besides, a single sysreq.n <ntvid> <argsize> might be more convenient to use than a chain of push.c <argsize> \ sysreq.c <ntvid> \ stack <argsize + sizeof(cell)>.

@Daniel-Cortez
Copy link
Contributor Author

Apparently the 'anti-optimisation' trick for instructions within emit/__emit expressions messed up function calls, so I had to rework it again. Now bytecode optimisation is disabled for the whole expression that contains a emit subexpression.

I also made some improvements to the syntax, now when emit is used in expressions, multiple instructions can be put within a single emit by being enclosed in parentheses.
Here are some examples:

emit(const.pri 0, add.c 1, add.c 2);
printf("%d, %d", emit const.pri 0, emit(const.pri 0, add.c 1, add.c 2));
emit const.pri 0, printf(""), emit(const.pri 0, add.c 1, add.c 2);

A trailing ; is no longer optional for single-line emit statements.

emit const.pri 0 // error 001: expected token: ";", but found "emit"
emit const.pri 0; // ok

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Dec 17, 2017

Updated the 1'st comment with all of the latest changes.

I suppose this PR should be considered ready to be merged, since there are no any further suggestions (of course, the final decision is yours, @Zeex ).

@Zeex Zeex merged commit 064d352 into pawn-lang:master Dec 30, 2017
@Daniel-Cortez Daniel-Cortez deleted the emit_op_fixes branch January 6, 2018 10:52
@Daniel-Cortez Daniel-Cortez mentioned this pull request Nov 8, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants