Html Sgml confusion #27

Closed
ekremucar opened this issue Mar 23, 2015 · 4 comments

Comments

@ekremucar

I tried to match an HTML file, but its MIME type was detected as SGML.
Both start with 'doctype', but the HTML file continues with 'html'.
Maybe the matchers need to be ordered?

@aurelien-baudet

Same issue for me. Is there a way to make it work anyway?

@arimus
Owner

arimus commented May 20, 2015

Currently, the following matchers already exist with a higher precedence than the sgml matcher:

<match>
    <mimetype>text/html</mimetype>
    <extension>html</extension>
    <description>HTML document text</description>
    <test offset="0" type="string" comparator="=">&lt;!DOCTYPE HTML</test>
</match>
<match>
    <mimetype>text/html</mimetype>
    <extension>html</extension>
    <description>HTML document text</description>
    <test offset="0" type="string" comparator="=">&lt;!doctype html</test>
</match>

Does your document have something other than exactly the following at position 0 in the file? Note that the default matchers are exact matches and don't ignore whitespace, etc.

@aurelien-baudet

I found why the detection is not working. The file case is important and the file starts with:

@arimus
Owner

arimus commented May 20, 2015

You can, just not with the string matcher; you'll need to use the regex matcher type. See magic.xml for a couple of examples. Sorry for the bad paste above. There are existing matchers for this that are already regex; they just aren't using the /i flag.

<match>
    <mimetype>text/html</mimetype>
    <extension>html</extension>
    <description>HTML Document</description>
    <test offset="0" type="regex" comparator="=">/^\s*&lt;!DOCTYPE HTML PUBLIC/</test>
</match>
<match>
    <mimetype>text/html</mimetype>
    <extension>html</extension>
    <description>HTML Document</description>
    <test offset="0" type="regex" comparator="=">/^\s*&lt;html&gt;/</test>
</match>
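
As a rough sketch of what a case-insensitive variant might look like, the first matcher above could be given the /i flag. This assumes the regex test type honors a trailing /i as the comment above implies; that has not been verified against the parser. It also assumes the entry sits ahead of the sgml matcher in magic.xml so that it takes precedence.

```xml
<!-- Sketch only: assumes the regex test type accepts a trailing /i flag. -->
<match>
    <mimetype>text/html</mimetype>
    <extension>html</extension>
    <description>HTML Document</description>
    <!-- Intended to match DOCTYPE HTML, doctype html, DocType Html, etc.,
         with optional leading whitespace. -->
    <test offset="0" type="regex" comparator="=">/^\s*&lt;!DOCTYPE HTML/i</test>
</match>
```

Dropping the trailing PUBLIC from the pattern is a deliberate choice in this sketch, so that short doctypes such as the HTML5 one would also be covered.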
