Full implementation of standard signature syntax in container signatures #305
- currently splits subsequences as it goes - not sure if this is the right approach. The parser should probably just represent the expression passed in as a single parse tree. The compiler should worry about how to split this up.
the results of compilation. These either return immutable objects, or make defensive copies.
* There are four PRONOM signatures which currently cannot be parsed, as they use unofficial syntax. Waiting for TNA to respond on what they want to officially support in the PRONOM language.
* This is used to test that the PRONOM parser is capable of reading known good and real-world signatures.
* Some of these are failing at present as they aren't yet supported by the parser:
  - string literals in ranges ['a'-'z']
  - arbitrary sets of bytes [2227]
  - the & bitwise operator [&01]
  - ranges use both hyphens and colons to separate them in different signatures.
* also some code cleanup.
…sigs. * Haven't validated that it is correct yet, only that it can process all of them without error.
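For illustration, two of the forms mentioned in the commit notes above (quoted character bounds such as ['a'-'z'], and ranges separated by either a hyphen or a colon) could be accepted by a small helper like the one below. This is only a minimal sketch, not the parser in this PR: `ByteRangeSketch`, `parse` and `boundValue` are hypothetical names, and normalising the two bounds so the lower one comes first is an assumption made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DROID or byteseek: parses a single range
// such as [01:0F], [01-0F] or ['a'-'z'] into its two inclusive byte bounds.
final class ByteRangeSketch {

    // One bound is either two hex digits or a single-quoted character.
    private static final String BOUND = "([0-9A-Fa-f]{2}|'[^']')";

    // Accept both separators seen in real signatures: hyphen and colon.
    private static final Pattern RANGE =
            Pattern.compile("\\[" + BOUND + "[-:]" + BOUND + "\\]");

    private ByteRangeSketch() { }

    /** Returns {low, high} as unsigned byte values (0 to 255). */
    static int[] parse(String expression) {
        Matcher m = RANGE.matcher(expression.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a byte range: " + expression);
        }
        int first = boundValue(m.group(1));
        int second = boundValue(m.group(2));
        // Assumption for this sketch: normalise so callers always get low <= high.
        return new int[] {Math.min(first, second), Math.max(first, second)};
    }

    // Converts one bound (hex pair or quoted character) to its byte value.
    private static int boundValue(String bound) {
        if (bound.charAt(0) == '\'') {
            char c = bound.charAt(1);
            if (c > 0xFF) {
                throw new IllegalArgumentException("Not a single byte: " + bound);
            }
            return c;
        }
        return Integer.parseInt(bound, 16);
    }
}
```

With this sketch, `ByteRangeSketch.parse("[01:0F]")` and `ByteRangeSketch.parse("[01-0F]")` give the same bounds, while an expression such as `[2227]` (a set of bytes rather than a range) is rejected.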
@nishihatapalmer here is the demo for rebasing. My simple rebase command worked right away without any conflict. I think it's better for maintenance because when we merge back onto master, we don't have crazy branches all over the place. You can also merge with rebase on GitHub and have a clean history without even needing a merge commit.
If you want, I'll let you do it and push to upstream to practice :). You will have to force-push it to override the history.
I'll start fixing the checkstyle errors for Travis and add documentation, to better understand your work.
* Use a strategy pattern to encapsulate different strategies on what kinds of elements can appear in anchor sequences.
* PRONOM can only support bytes in anchors.
* DROID can support anything in anchors, but it can lead to performance problems if sets or bitmasks are too big, so we limit the size.
* If we can't find an anchoring sequence for DROID given the size limits, we remove all restrictions on size and look again.

The advantage of this approach is that we get strict compliance for PRONOM, but have a two-fold strategy for DROID: one which will probably give better performance than the PRONOM strategy, and a fallback position for signatures for which no anchoring sequence could be found (there are three signatures which fall into this category at present: the [&01] signatures with no other bytes in them).
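The strategy idea described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: the names `AnchorStrategy`, `AnchorStrategies` and `AnchorFinder` are hypothetical, each sequence element is reduced to the number of distinct byte values it can match, and "best anchor" is assumed to mean the longest run of allowed elements.

```java
import java.util.List;

// Hypothetical sketch of the anchor-strategy idea; names and the size limit
// are illustrative assumptions, not DROID's actual classes or values.
interface AnchorStrategy {
    // matchCount = number of distinct byte values the element can match
    // (1 for a plain byte, more for sets, ranges and bitmasks).
    boolean allowedInAnchor(int matchCount);
}

final class AnchorStrategies {
    // PRONOM: only plain bytes may appear in an anchor.
    static final AnchorStrategy PRONOM = matchCount -> matchCount == 1;

    // Fallback: no restriction at all.
    static final AnchorStrategy ALLOW_ALL = matchCount -> true;

    // DROID: anything is allowed, as long as it does not match too many bytes.
    static AnchorStrategy droid(int maxMatchCount) {
        return matchCount -> matchCount <= maxMatchCount;
    }

    private AnchorStrategies() { }
}

final class AnchorFinder {
    private AnchorFinder() { }

    /** Returns {start, length} of the longest run of allowed elements, or null if none. */
    static int[] longestAnchor(List<Integer> matchCounts, AnchorStrategy strategy) {
        int bestStart = -1;
        int bestLength = 0;
        int runStart = 0;
        int runLength = 0;
        for (int i = 0; i < matchCounts.size(); i++) {
            if (strategy.allowedInAnchor(matchCounts.get(i))) {
                if (runLength == 0) {
                    runStart = i;
                }
                runLength++;
                if (runLength > bestLength) {
                    bestStart = runStart;
                    bestLength = runLength;
                }
            } else {
                runLength = 0;
            }
        }
        return bestLength == 0 ? null : new int[] {bestStart, bestLength};
    }

    /** Tries the size-limited DROID strategy first, then retries with no limits. */
    static int[] findDroidAnchor(List<Integer> matchCounts, int maxMatchCount) {
        int[] anchor = longestAnchor(matchCounts, AnchorStrategies.droid(maxMatchCount));
        return anchor != null ? anchor : longestAnchor(matchCounts, AnchorStrategies.ALLOW_ALL);
    }
}
```

In this sketch a signature consisting only of a [&01] bitmask (which matches 128 byte values) yields no anchor under the PRONOM strategy or a size-limited DROID strategy, and is only anchored once the fallback removes the limit.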
This doesn't touch on the differences between binary and container syntax. DROID itself doesn't care, but PRONOM won't understand container syntax. We should probably link to a description of the syntax somewhere, and mention that you can use all of the syntax in either binary or container signatures if you like (but PRONOM won't be able to parse them for binary signatures if you want to submit them to TNA). Of course, nothing stops TNA using sigtool to rewrite a container signature's syntax in binary-compatible format.
Aaaaallright sorry @nishihatapalmer for the confusion, some things are indeed a bit clearer, and I better understood your previous comment. Thanks for taking the time to explain!
I mentioned in README.md "PRONOM Syntax also provides details on the regular expression syntax supported by DROID.", which I am going to improve with a relative link on GitHub to the .md file. Isn't that sufficient?
What about adding the following paragraph after what you offered to put in: "The full syntax can be used in either binary or container signatures. However, if you use the new syntax and want to submit those signatures to TNA for inclusion in the PRONOM registry, you will need to compile them for PRONOM using sigtool."
I'm going to be pedantic here, but it's kind of important. Only binary signatures which are to be submitted to TNA need to be in binary format. Container signatures submitted to TNA can - and should - use the full syntax, since PRONOM doesn't compile those, and it allows things which binary signatures don't support.
And if you don't intend to submit local binary signatures to TNA, you can use the full container syntax in the Sequence attribute of a binary signature ByteSequence.
@nishihatapalmer could you please refine the README? I won't be as clear as you can be, I'd rather let you do it if you have some time, and that will save us from going back and forth here.
Sure, but it may be a few days before I have time.
* explanation of what kinds of signature exist.
* explanation of sigtool capabilities.
* explanation of simpler XML format.
OK, I've posted some updates to the README.md file in my droidsyntaxparser
branch.
I'm tempted to simplify it a bit - remove the "Types" and "Syntax" sections
(or considerably shorten them, or move them into another markdown file).
The real new capabilities are "sigtool" and "Simpler signature XML".
I kind of feel that putting in a quick education on what kinds of
signatures exist in the middle of a high level overview of what's in the
DROID repo is the wrong place for this - but telling users about sigtool
and simpler signature XML is OK, as they're big and important changes.
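Thanks for those changes! What about:
- renaming PRONOM Syntax to Signature Syntax.
- moving the types and syntax sections into that signature syntax readme.
- adding a paragraph below "Since version 6.5, DROID adds some new capabilities to support developing and testing signatures.":
  "Signature Syntax provides details on the types of signatures and regular expression syntax supported by DROID."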
Yup, those sound perfect.
As long as we retain a link to the signature syntax MD from the readme
I'm not sure the current link is actually working come to think of it...
fixed https://github.com/digital-preservation/droid/blob/droidsyntaxparser/README.md
…section in README to signature syntax doc
done!
Looks great
By the way, why does the standard non jre version of DROID have a Unix
suffix on the filename? It should be completely platform independent.
One more comment, the signature syntax file now talks about the sigtool
"below". It's not below now as that's still in the original readme. Maybe
change to a link back to the readme or the using sigtool documentation?
Hmm, it used to, because we had a Windows version with a JRE and a Unix version without one. Maybe I missed something: where did you see that?
Sorry about that, fixed.
@nishihatapalmer
solves #237