Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests #14

Closed
wants to merge 39 commits into from
Closed

Tests #14

wants to merge 39 commits into from

Conversation

kirawi
Copy link
Contributor

@kirawi kirawi commented Jul 19, 2021

cc @dfrg

I'm having some trouble with a few tests, so could you possibly take a look at what I need to do? I'll also take a look myself tomorrow. The results should be in the CI. I would guess around 200-300 of the fails are because of this:
[uni1A45|uni1A32@592,0|uni1A5B.ratha_nohost@1523,0|uni1A69@1523,0]
versus
[uni1A45|uni1A321A5B@592,0|uni1A69@1184,-734]
but I'm not sure how to do this.

(The code is very terrible right now, it'll be much better by merging time).

Generator:

```rs use std::{ fmt::{self, Display}, fs::File, io::{BufWriter, Write}, mem, path::PathBuf, };

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ParseError;

impl fmt::Display for ParseError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "missing closing quote")
}
}

impl std::error::Error for ParseError {}

enum State {
/// Within a delimiter.
Delimiter,
/// After backslash, but before starting word.
Backslash,
/// Within an unquoted word.
Unquoted,
/// After backslash in an unquoted word.
UnquotedBackslash,
/// Within a single quoted word.
SingleQuoted,
/// Within a double quoted word.
DoubleQuoted,
/// After backslash inside a double quoted word.
DoubleQuotedBackslash,
/// Inside a comment.
Comment,
InPath,
}

pub fn split(s: &str) -> Result<Vec, ParseError> {
use State::*;

let mut words = Vec::new();
let mut word = String::new();
let mut chars = s.chars();
let mut state = Delimiter;

loop {
    let c = chars.next();
    state = match state {
        Delimiter => match c {
            None => break,
            Some('\'') => SingleQuoted,
            Some(c @ '[') => {
                word.push(c);
                SingleQuoted
            }
            Some('\"') => DoubleQuoted,
            Some('\\') => Backslash,
            Some('\t') | Some(' ') | Some('\n') | Some(':') | Some(';') => Delimiter,
            Some('#') => Comment,
            Some(c) => {
                word.push(c);
                Unquoted
            }
        },
        Backslash => match c {
            None => {
                word.push('\\');
                words.push(mem::replace(&mut word, String::new()));
                break;
            }
            Some('\n') => Delimiter,
            Some(c) => {
                word.push(c);
                Unquoted
            }
        },
        Unquoted => match c {
            None => {
                words.push(mem::replace(&mut word, String::new()));
                break;
            }
            Some('\'') => SingleQuoted,
            Some(c @ '[') => {
                word.push(c);
                SingleQuoted
            }
            Some('\"') => DoubleQuoted,
            Some('\\') => UnquotedBackslash,
            Some('\t') | Some(' ') | Some('\n') | Some(':') | Some(';') => {
                words.push(mem::replace(&mut word, String::new()));
                Delimiter
            }
            Some(c @ '/') => {
                word.push(c);
                InPath
            }
            Some(c) => {
                word.push(c);
                Unquoted
            }
        },
        UnquotedBackslash => match c {
            None => {
                word.push('\\');
                words.push(mem::replace(&mut word, String::new()));
                break;
            }
            Some('\n') => Unquoted,
            Some(c) => {
                word.push(c);
                Unquoted
            }
        },
        SingleQuoted => match c {
            None => return Err(ParseError),
            Some(c @ ']') => {
                word.push(c);
                Unquoted
            }
            Some('\'') => Unquoted,
            Some(c) => {
                word.push(c);
                SingleQuoted
            }
        },
        DoubleQuoted => match c {
            None => return Err(ParseError),
            Some('\"') => Unquoted,
            Some('\\') => DoubleQuotedBackslash,
            Some(c) => {
                word.push(c);
                DoubleQuoted
            }
        },
        DoubleQuotedBackslash => match c {
            None => return Err(ParseError),
            Some('\n') => DoubleQuoted,
            Some(c @ '$') | Some(c @ '`') | Some(c @ '"') | Some(c @ '\\') => {
                word.push(c);
                DoubleQuoted
            }
            Some(c) => {
                word.push('\\');
                word.push(c);
                DoubleQuoted
            }
        },
        Comment => match c {
            None => break,
            Some('\n') => Delimiter,
            Some(_) => Comment,
        },
        InPath => match c {
            None => break,
            Some(';') => {
                words.push(mem::replace(&mut word, String::new()));
                Delimiter
            }
            Some(c) => {
                word.push(c);
                InPath
            }
        },
    }
}

Ok(words)

}

#[derive(Default, Debug)]
struct TestCase {
name: String,
path: PathBuf,
font_size: usize,
features: Vec<(String, u16)>,
variations: Vec<(String, f32)>,
input: Vec,
output: Vec,
}

impl Display for TestCase {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
if self.output.contains(&"*".to_string()) {
write!(
f,
"create_test!({}, {:?}, {}, &{:?}, &{:?}, &{:?});",
self.name,
self.path.to_str().unwrap(),
self.font_size,
self.features,
self.variations,
self.input,
)
} else {
write!(
f,
"create_test!({}, {:?}, {}, &{:?}, &{:?}, &{:?}, &{:?});",
self.name,
self.path.to_str().unwrap(),
self.font_size,
self.features,
self.variations,
self.input,
self.output,
)
}
}
}

impl TestCase {
pub fn parse(arguments: Vec) -> Self {
let mut case = Self::default();
case.font_size = 75;

    for arg in arguments.iter() {
        if arg.starts_with("../") {
            // Path
            case.path = arg.into();
        } else if arg.starts_with("--") {
            // Features/Variations
            let arg = arg.strip_prefix("--").unwrap();
            if arg.contains("=") {
                // Variation
                let split: Vec<_> = arg.split("=").collect();
                let variation: (String, f32) = (
                    split.first().unwrap().to_string(),
                    split.last().unwrap().parse().unwrap_or(0.0),
                );
                case.variations.push(variation);
            } else {
                // Feature
                case.features.push((arg.to_string(), 1));
            }
        } else if arg.starts_with("[") || arg == "*" {
            // Output
            let output = arg.trim_matches(|chr| chr == '[' || chr == ']');
            for chr in output.split('|') {
                // if chr.contains('=') {
                    // Failed attempt at parsing runs/positions.
                    // let mut split = chr.split('=');
                    // let unicode = split.next().unwrap().to_string();
                    // let mut split_for_offsets = split.next().unwrap().split('+');
                    // let x = split_for_offsets.next().unwrap().parse().expect(chr);
                    // let y = split_for_offsets.next().unwrap().parse().unwrap();
                    // case.output.push((unicode, x, y));
                // } else {
                    case.output.push(chr.to_string());
                // }
            };
        }
    }

    let input = arguments
        .get(
            arguments
                .len()
                .checked_sub(2)
                .unwrap_or_else(|| panic!("{:#?}", arguments)),
        )
        .unwrap();
    for chr in input.split(',') {
        let chr = chr.trim_start_matches("U+");
        let parsed_chr: char =
            char::from_u32(u32::from_str_radix(chr, 16).expect(chr)).expect(chr);
        case.input.push(parsed_chr);
    }

    case
}

}

fn main() -> std::io::Result<()> {
let input = PathBuf::from("data");
let output = PathBuf::from("output");
let rendering_tests = input.join("text-rendering-tests");
let aot_tests = input.join("aots");
let inhouse_tests = input.join("in-house");
std::fs::create_dir_all(output.clone())?;

if inhouse_tests.exists() {
    let directory = std::fs::read_dir(inhouse_tests.join("tests"))?
        .flatten()
        .filter(|file| file.file_type().unwrap().is_file());
    let mut file = BufWriter::new(File::create(output.join("text-rendering-tests.txt"))?);

    // Iterate over each file in the directory and generate tests.
    for test in directory {
        let content = std::fs::read_to_string(test.path())?;
        for line in content.lines() {
            // file.write(format!("{:#?}\n", split(line).unwrap()).as_bytes())?;
            let val = split(line).unwrap();
            if val.is_empty() {
                continue;
            }
            let formatted_string = format!("{}\n", TestCase::parse(val));
            file.write(formatted_string.as_bytes())?;
        }
    }
} else {
    panic!()
}

Ok(())

}

</p>
</details>

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

I thought I had kept the code consistently formatted, but that appears to have not been the case. I'm going to go ahead and push a mass formatting commit to remove all the noise from the PR.

It seems that a lot of the failures may be due to missing script information. I'll address that in a review in a moment.

@dfrg dfrg marked this pull request as ready for review July 19, 2021 04:55
@dfrg dfrg marked this pull request as draft July 19, 2021 04:55
tests/shaping_impl.rs Outdated Show resolved Hide resolved
@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

Yep, I've applied those. The remaining fails appear to be legitimate bugs, some bug on my part with runs, another bug on my part with opening the font, and the aforementioned issue where it doesn't seem to "merge" certain glyphs (referenced up above). I'm not sure if the last problem is just because I'm missing something in the test harness implementation.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

Some of these are tests of ancient unused cmap formats that I'm unlikely to support. There also appear to be tests that should probably pass but are missing glyph names for some reason. A couple are showing a y-offset that doesn't match.

@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

I (painfully) manually went through the failing tests, and there appear to be only 134 legitimate ones. Some may still be errors on my part, such as off by one errors (possibly rounding), but I included them just in case.

Notably, gpos, hvar, shknda, shlana (though Harfbuzz themselves has all that disabled because of a problem they had as well), and morx.

@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

Do you want to disable shlana as well? I'm not sure what it is, but 125/209 fail.

@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

With the tests I've disabled, the ratio is now 361:148. 108 are the errors I noted as being legitimate. Pretty good! By the way, I included glyph name issues as legitimate errors as well, so the actual number of "legitimate" errors may be significantly lower, possibly ~30 less.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

If HarfBuzz isn't testing against it, then probably best to remove it for now.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

This looks great so far! Thanks so much for taking the time.

@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

Of note:

#[test]
fn morx_24_0() {
    assert_eq!(
        shape("fonts/TestMORXTwentyfour.ttf", 0, &[], "ABCDE"),
        "*"
    )
}

seems to cause an endless loop. I've disabled it for now.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

So the gvar tests are odd. Looks like it's failing to load the fonts outright, so it might be using the old Apple style true header word or something. I'm not sure what's going on in those tests, but if they're expecting to extract mark offsets from contour points, then those will definitely be ignored as I don't have any intention of loading/scaling/hinting outlines during layout.

The hvar tests are good from what I can see. The interpolation is likely to be slightly different depending on implementation of the item variation store code (floating point vs fixed point), so off-by-one errors are to be expected.

The morx tests that panic with OOB in the glyph buffer are very troubling, so I'll likely prioritize those.

tests/shaping_impl.rs Outdated Show resolved Hide resolved
tests/shaping_impl.rs Outdated Show resolved Hide resolved
@kirawi
Copy link
Contributor Author

kirawi commented Jul 19, 2021

So I think for the rounding errors, what I think we could do is use a tuple struct over the vec used in shape(), and then implement the compare trait and allow it to be equal within <1 of error. We might want to identify what is actually rounding error or a logic bug first before that though.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

Agreed with rounding errors. I'd rather leave them in for the moment to make sure they're not errors in the lookups.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

So for RTL runs, it looks like it has the proper glyph order but incorrect offsets. This is going to be tricky because HarfBuzz simply handles this differently. It does a full logical reverse of the glyph buffer and provides appropriate offsets (so marks precede bases). The swash shaper maintains logical order and provides per-cluster mark offsets with the intention that clusters will be reversed but not glyphs. There is a mapping between the two but I can't think of a trivial change to the shape function right now.

@dfrg
Copy link
Owner

dfrg commented Jul 19, 2021

Some fixes pushed to master should correct the failed font loading in the gvar tests. I believe adding .insert_dotted_circles(true) while building the shaper should also fix most of the Balinese tests. The RTL tests will require more work and may need some changes to the shaper. I'm going to have to give some more thought to this.

dfrg and others added 27 commits September 6, 2021 21:43
Co-authored-by: Chad Brokaw <cbrokaw@gmail.com>
Co-authored-by: Chad Brokaw <cbrokaw@gmail.com>
* support up to 256 features per stage
* only apply nested contextual lookup to specified index
* Return success from matching contextual subtables regardless of nested lookup results
* Push potentially intermediate ligatures back onto stack in ligature subtable
* Handle END_OF_TEXT in insertion subtable
* Correct advance count in insertion subtable
@kirawi
Copy link
Contributor Author

kirawi commented Sep 7, 2021

... I'm going to create a new PR, I have no idea what happened.

@kirawi kirawi closed this Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants