A MrzParseException is thrown when the date fields are not parseable #15
Comments
Well, I don't think this issue is within the scope of this project. The root cause is a faulty upstream OCR step, e.g. the digit "8" is often recognized as "B". I recommend the following procedure, which works in my environment: |
The example above ("BB0915") is a particular case where the caller can indeed implement some rules as you explained (8 <=> B, ...) and call the library again with a "better" MRZ. But there will still be cases where you can't fix the MRZ completely. In those cases you won't get any information at all, because the library has stopped everything and thrown an Exception without any result. You could just return null objects instead. Looking at the code, this already seems to happen sometimes (partial results instead of an Exception), so it could be extended to all fields and all causes of parsing errors. Unless you want the library to be a parser of valid MRZs only. Alex. |
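A minimal sketch of the caller-side rule mentioned above, run before handing the MRZ to the library. The class name `MrzOcrCorrector` and its methods are hypothetical and not part of mrz-java. Only the B -> 8 rule comes from this thread; the O -> 0 and I -> 1 rules are assumptions added for illustration. The offsets assume a 44-character TD3 (passport) second line.

```java
import java.util.Map;

// Hypothetical helper, not part of mrz-java: repairs common OCR confusions
// in the numeric date fields of a TD3 (passport) second MRZ line.
final class MrzOcrCorrector {
    // B -> 8 is the rule from this thread; O -> 0 and I -> 1 are assumed extras.
    private static final Map<Character, Character> LETTER_TO_DIGIT =
            Map.of('B', '8', 'O', '0', 'I', '1');

    // Replaces every known look-alike letter with its digit; other characters are kept.
    static String fixDigits(String field) {
        StringBuilder sb = new StringBuilder(field.length());
        for (char c : field.toCharArray()) {
            char mapped = LETTER_TO_DIGIT.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    // TD3 line 2 (0-based): date of birth + check digit at 13..19,
    // sex at 20, date of expiry + check digit at 21..27.
    static String fixTd3DateFields(String line2) {
        if (line2 == null || line2.length() != 44) {
            return line2; // not a TD3 second line, leave untouched
        }
        return line2.substring(0, 13)
                + fixDigits(line2.substring(13, 20))
                + line2.substring(20, 21)
                + fixDigits(line2.substring(21, 28))
                + line2.substring(28);
    }
}
```

With these rules the "BB0915" field from this issue becomes "880915", while the "20HH16" expiry field is left unchanged because no rule covers "H". That is the kind of MRZ the caller cannot fully repair.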
Indeed, an "Exception without any result" does not help uncover the problematic part of the MRZ line. Meanwhile, since we are talking about error handling, I am getting closer to planning that "corrector" class. |
I will make a pull request that leaves the property null if date parsing fails. We could also set the day, month or year property of the date to -1 when parsing fails specifically on that element, and still try to parse the other elements of the date. That way the MrzModel could be returned with a partially parsed date. |
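The partial-date idea could look roughly like the sketch below. `LenientMrzDate` is a hypothetical stand-in, not the library's actual `MrzDate`. It assumes a six-character "yymmdd" field, stores -1 for any element that is non-numeric or out of range, and keeps the raw text so the caller can still inspect it.

```java
// Hypothetical stand-in for a leniently parsed MRZ date; not mrz-java's MrzDate.
final class LenientMrzDate {
    final int year;   // 0..99, or -1 if unparseable
    final int month;  // 1..12, or -1 if unparseable
    final int day;    // 1..31, or -1 if unparseable
    final String raw; // original field text, kept for diagnostics

    private LenientMrzDate(int year, int month, int day, String raw) {
        this.year = year;
        this.month = month;
        this.day = day;
        this.raw = raw;
    }

    // Parses each two-character element independently instead of throwing.
    static LenientMrzDate parse(String raw) {
        if (raw == null || raw.length() != 6) {
            return new LenientMrzDate(-1, -1, -1, raw);
        }
        return new LenientMrzDate(
                part(raw, 0, 0, 99),
                part(raw, 2, 1, 12),
                part(raw, 4, 1, 31),
                raw);
    }

    // Returns the two-digit value at 'start', or -1 if it is not numeric or out of [min, max].
    private static int part(String raw, int start, int min, int max) {
        char a = raw.charAt(start);
        char b = raw.charAt(start + 1);
        if (a < '0' || a > '9' || b < '0' || b > '9') {
            return -1;
        }
        int value = (a - '0') * 10 + (b - '0');
        return (value >= min && value <= max) ? value : -1;
    }

    // True only when all three elements were parsed.
    boolean isFullyParsed() {
        return year >= 0 && month >= 0 && day >= 0;
    }
}
```

For the two fields in this issue, `parse("BB0915")` would lose only the year and `parse("20HH16")` only the month. The day-of-month check is deliberately loose (1 to 31) and does not validate against the month.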
Good workaround. With this, we can get as much data as possible. May I also ask you to take care of the validity flags, that is, to set all applicable ones (even the overall one) to false? That tells users of the code that something is wrong with the MRZ lines. |
What do you mean by "all the applicable validity flags"? If the date is not parseable because of some OCR failure, the check digit calculation will fail anyway, no? |
Exactly, so the code should ensure that in MrzRecord the relevant "valid*" booleans are set to false.
On Sat, 16 Sep 2017 at 19:38, P-A Gonnord wrote:
What do you mean by "all the applicable validity flags"? If the date is not parseable because of some OCR failure, the check digit calculation will fail anyway, no?
|
Hi, would it be possible to distinguish between the check digit verification result and whether the fields can actually be parsed? The check digit can be correct while the date is still unparseable (I have some examples of this where the MRZ comes from fraudulent passports). The names of the 4 booleans in the MrzRecord class are too ambiguous in my opinion. They should be named like "validDateOfBirthCheckDigit" instead of "validDateOfBirth". The MrzDate could also carry a boolean alongside the year, month and day to indicate whether the string was actually valid. Best regards, |
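To illustrate why the two notions are independent, here is a small sketch that reports them separately. `MrzFieldCheck` and its methods are hypothetical names, not mrz-java API. The check digit uses the standard ICAO Doc 9303 scheme (repeating weights 7, 3, 1; digits count as their value, A to Z as 10 to 35, and "<" as 0). The check digit 5 used for "651502" in the usage note is computed here for illustration and does not come from a real document.

```java
// Hypothetical helper, not part of mrz-java: keeps "check digit is correct"
// separate from "field content is a parseable date".
final class MrzFieldCheck {
    private static final int[] WEIGHTS = {7, 3, 1};

    // ICAO 9303 check digit: weighted sum of character values, modulo 10.
    static int computeCheckDigit(String field) {
        int sum = 0;
        for (int i = 0; i < field.length(); i++) {
            sum += charValue(field.charAt(i)) * WEIGHTS[i % 3];
        }
        return sum % 10;
    }

    private static int charValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
        if (c == '<') return 0;
        throw new IllegalArgumentException("Invalid MRZ character: " + c);
    }

    // True when the printed check digit matches the computed one.
    static boolean hasValidCheckDigit(String field, char printedDigit) {
        return printedDigit >= '0' && printedDigit <= '9'
                && computeCheckDigit(field) == printedDigit - '0';
    }

    // True when the field is six digits with month 1..12 and day 1..31.
    static boolean isParseableDate(String field) {
        if (field == null || field.length() != 6) return false;
        for (int i = 0; i < 6; i++) {
            char c = field.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        int month = Integer.parseInt(field.substring(2, 4));
        int day = Integer.parseInt(field.substring(4, 6));
        return month >= 1 && month <= 12 && day >= 1 && day <= 31;
    }
}
```

With this split, `hasValidCheckDigit("651502", '5')` is true while `isParseableDate("651502")` is false, which is the "right check digit, unparseable date" situation described above. Conversely, for the OCR-damaged field "BB0915" with printed digit 7, both checks fail.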
@Alex-D14 |
@ZsBT Right now it just formats the date properties as an MRZ date ("yymmdd"). But if the original date was not parseable, for example "651502", MrzDate.toMrz() gives the result "65-102", which doesn't really make sense. |
The purpose of the booleans validDateOfBirth and expiry is to quickly indicate that something is wrong with that field. Further inspection of the MrzDate object (isValidDate, and maybe a new boolean validCheckdigit?) could show what the exact issue is. |
When parsing the following MRZ, mrz-java throws an MrzParseException and stops the parsing:
"P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<<"
"9250764733GBRBB09157F2007162<<<<<<<<<<<<<<08" unparseable date of birth
"P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<<"
"9250764733GBR8809117F20HH162<<<<<<<<<<<<<<08" unparseable date of expiry
I suggest that the library continue parsing, keep track of the raw text that was unparseable, and return an MrzModel to the caller.