Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/char type #957

Merged
merged 56 commits into from
May 26, 2021
Merged

Feature/char type #957

merged 56 commits into from
May 26, 2021

Conversation

gluax
Copy link
Contributor

@gluax gluax commented May 13, 2021

Implements the basic char type in Leo.
Closes #939.
Closes #940.
Closes #942.
Closes #946.
Closes #950.

What it does is allow the base type into language.
Adds Parser Tests for the char type.
Adds Compiler Tests for the char type. Should I merge in the compiler tests branch first and have that branch dep on this one? Then we can merge this branch to master after that one is in, or we can merge this one into that branch.
The ASG passes both the character and the field version to the compiler. This is so the character can be printed, and the compiler uses the field portion for everything else.

The follow is allowed in Leo and those are also allowed in input files.

let a = 'a';
let b = '\t';
let c: char = '\u{2764}';
let d = '\u{00306E}';
let e = '❤️';
let f = 'の';
let g = '\x2A';
let h = '\x09';

Output files lose the single quotes on the char; should I force it to put them there? Or should I have them write the field to the output instead? Example of current behavior:

r: char = a;

Note: Operators are not yet implemented.

@gluax gluax added feature A new feature. Implementation labels May 13, 2021
@gluax gluax added this to the Leo Developer Preview III milestone May 13, 2021
@gluax gluax self-assigned this May 13, 2021
@codecov
Copy link

codecov bot commented May 13, 2021

Codecov Report

Merging #957 (bdcd945) into master (7aa3fa3) will increase coverage by 0.38%.
The diff coverage is 78.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #957      +/-   ##
==========================================
+ Coverage   74.97%   75.35%   +0.38%     
==========================================
  Files         429      435       +6     
  Lines       17170    17565     +395     
==========================================
+ Hits        12873    13236     +363     
- Misses       4297     4329      +32     
Impacted Files Coverage Δ
asg/src/error/mod.rs 54.19% <ø> (ø)
ast/src/errors/reducer.rs 5.55% <0.00%> (-1.59%) ⬇️
compiler/src/errors/expression.rs 2.22% <0.00%> (-0.06%) ⬇️
compiler/src/errors/function.rs 23.43% <0.00%> (-0.76%) ⬇️
compiler/src/errors/value/char.rs 0.00% <0.00%> (ø)
compiler/src/value/mod.rs 100.00% <ø> (ø)
input/src/errors/parser.rs 26.53% <0.00%> (-3.71%) ⬇️
input/src/expressions/expression.rs 6.06% <0.00%> (-0.61%) ⬇️
input/src/values/value.rs 14.28% <0.00%> (-1.72%) ⬇️
parser/src/tokenizer/mod.rs 97.50% <ø> (+12.31%) ⬆️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7aa3fa3...bdcd945. Read the comment docs.

Copy link
Collaborator

@acoglio acoglio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for putting single quotes (e.g. 'a') in output files, for greater clarity and syntactic uniformity.

Regarding support for

let g = '\xH2A';
let h = '\xH9';
let i = '\xO011';
let j = '\xO172';

actually the RFC (https://github.com/AleoHQ/leo/blob/master/docs/rfc/001-initial-strings.md#characters), and the ABNF PR (#954), only allow

let a = '\x0a';
let b = '\x7E';
let c = '\x44';
...

that is \x followed by an octal digit followed by a hex digit. The rationale was to follow Rust (https://doc.rust-lang.org/stable/reference/tokens.html#ascii-escapes) and to have something simple. So we need to change either the implementation, or the RFC and ABNF PR to make things consistent one way or the other.

@gluax
Copy link
Contributor Author

gluax commented May 14, 2021

Ah, I totally misread how that worked, @acoglio. Whoops. Either way, I don't mind changing the code to match the RFC to match rust. We could also change the RFC, but I'm more in favor of matching rust's behavior.

@acoglio
Copy link
Collaborator

acoglio commented May 14, 2021

I'm also in favor of matching Rust's behavior -- thanks.

@gluax
Copy link
Contributor Author

gluax commented May 14, 2021

Output changes now write chars as 'a'.
Hex variables have been changed.

Using #920 for tests. Though current tests that should pass currently fail till snarkVM has support for field equality.
Need to add more purposely failed test first, wanna see what I'm missing from codecov first.

damirka and others added 2 commits May 18, 2021 18:56
- currently uses back quotes "`" for strings, change later
- ast -> asg unimplemented, strings need to be processed on
canonicalization stage
@gluax
Copy link
Contributor Author

gluax commented May 18, 2021

I started on #946. This is possible bc even though field eq ops are not implemented yet, the function body still lives. The char type calls these existing templates on the field type. Thus when those are implanted the char ones will just succeed.

@gluax gluax requested review from acoglio and collinc97 May 24, 2021 18:26
Copy link
Collaborator

@acoglio acoglio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Copy link
Contributor

@damirka damirka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall good!

One note. Should we have consistency in naming of Tokens? We have CharLit and AddressLit tokens but string is StringLiteral. I can submit a quick patch.

UPD: done.

@damirka
Copy link
Contributor

damirka commented May 25, 2021

Found two issues:

  1. We can define multiple symbols within one char, it's going to take only first one. Should we keep it or error out?
let b: char = 'abcdefg';
let c: char = '\u{bbbbb}\u{aaaa}'; // works only with unicode escapes
let d: char = '😭😂😘';
  1. For some reason strings with emojis fail on getting subtendril:
let s = "😒";

// thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ValidationFailed',
// 5: leo_parser::tokenizer::lexer::<impl leo_parser::tokenizer::token::Token>::eat
//             at ./parser/src/tokenizer/lexer.rs:227:29

@damirka damirka self-requested a review May 25, 2021 13:29
@gluax
Copy link
Contributor Author

gluax commented May 25, 2021

@damirka #1 shouldn't be allowed but should be an easy fix; I definitely just messed that up when I changed the char parsing style.

#2 I believe it is probably an issue since a Unicode symbol is many u8 bytes, and since it's not an escape, I probably attempted to send over one byte over at a time. At least that's my guess.

I'll look into these as well and see what we can get fixed.

@gluax
Copy link
Contributor Author

gluax commented May 25, 2021

#1 has been resolved.
#2 was resolved by @damirka and me through some pair programming! Though it only supports up to 4 bytes. Meaning emojis similar to🤷🏿‍♀️ will not work. This is because the first byte of a Unicode specifies the length but in this case, this has more than 4 bytes. Not sure how to resolve that. Rust also fails to print this as one character and instead prints three. This is what Leo does as well now.

Copy link
Contributor

@damirka damirka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested multiple scenarios, and it looks like we got it working.

ast/src/expression/value.rs Outdated Show resolved Hide resolved
ast/src/input/input_value.rs Show resolved Hide resolved
ast/src/statements/console/formatted_string.rs Outdated Show resolved Hide resolved
parser/src/tokenizer/token.rs Outdated Show resolved Hide resolved
@gluax gluax requested a review from Protryon May 25, 2021 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new feature.
Projects
None yet
5 participants