Cleaner script damages EPUB #19

Open
zwettemaan opened this issue May 22, 2019 · 9 comments

Comments

@zwettemaan
Collaborator

@LauraB7 Please provide me with the EPUB that got mangled by the Cleaner script, so I can investigate...

@LauraB7

LauraB7 commented May 23, 2019

Sure thing, @zwettemaan. This zip file has the raw EPUB, and the post-Cleaner EPUB.
Archive.zip

@zwettemaan
Collaborator Author

Kewl, thanks!

@zwettemaan
Collaborator Author

Hi @LauraB7, what am I looking at? When I compare the two files they seem to be identical, as they should be: if there is nothing wrong with the headers, the Cleaner will leave the file alone.

Can you elaborate on what exactly was wrong after you ran the Cleaner script?

@LauraB7

LauraB7 commented May 23, 2019

@zwettemaan: I messed that attachment up. My work laptop doesn't have a tonne of memory so I had already trashed the affected EPUB. I will try to recreate the problem tomorrow, if you still need it.

@zwettemaan
Collaborator Author

zwettemaan commented May 23, 2019

Yes, please. Any sign of odd behavior needs to be investigated. Very often, things work fine on my examples and my workstation, but that means nothing: there are a lot of factors I cannot control that might make the scripts misbehave. By running it in all kinds of different environments we can try and make it all more robust.

When working with @flittle8 we've experienced firsthand how seemingly innocuous things, like a slightly older Mac OS X version or a slightly older Sigil version, can throw major spanners in the works.

Hence, yes, please: try to re-create it.

@LauraB7

LauraB7 commented May 23, 2019

Will do. I will post it Friday morning.

@zwettemaan
Collaborator Author

Ha. Using some of Farrah's files, I found some issues that might have been what you saw. Try the latest version, as the issue might be fixed:

https://github.com/BCLibCoop/nnels-a11y-publishing/tree/master/ReleaseVersions

@LauraB7

LauraB7 commented May 24, 2019

So far as I can tell, @zwettemaan, the Cleaner script still does something to the declaration. I am attaching here the pre- and post-Cleaner EPUBs for you to have a look at.

Archive.zip

@zwettemaan
Collaborator Author

Hi @LauraB7, I think that's 'as designed', which is not the same as 'sensible' :-( It means I thought it might be a good idea, but I just made that up.

Cleaner adds or resets the headers to a standard header, which comes from a replacement instruction in the GREP config:

https://github.com/BCLibCoop/nnels-a11y-publishing/blob/master/DropScripts/Cleaner/Cleaner.config.txt

If you don't want the enforced HTML header, you could change to the following config (untested) instead:

{
	"replacements": [
		{
			// Strip old headers
			"from": "~(((\\s*<![^>]*>)|(\\s*<\\?[^>]*\\?>))+\\s*)~si",
			"to": ""
		},
		{
			// Add new headers
			"from": "~\\s*([\\s\\S]*\\S)\\s*~si",
			"to": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n$1"
		}
	]
}

Essentially, the Cleaner script and the MakeBreaksConform script are exactly the same script, just with a different config: each one runs a sequence of find-replace operations.

By adding or removing search-and-replace patterns we can make them do more or less...

We could have a whole bunch of these scripts, targeting different issues...
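
To make the "sequence of find-replace operations" idea concrete, here is a minimal Python sketch. It is not the actual Cleaner code, whose implementation is not shown in this thread. It assumes that each "from" value is a delimited regex with trailing flags (the `~...~si` form) and that `$1` in "to" refers to the first capture group. The names `CONFIG`, `compile_delimited` and `apply_replacements` are hypothetical, and the config is written as a Python literal, so one level of JSON escaping is gone.

```python
import re

# Hypothetical stand-in for the config above, as a Python literal.
CONFIG = {
    "replacements": [
        {   # Strip old headers
            "from": r"~(((\s*<![^>]*>)|(\s*<\?[^>]*\?>))+\s*)~si",
            "to": "",
        },
        {   # Add new headers
            "from": r"~\s*([\s\S]*\S)\s*~si",
            "to": '<?xml version="1.0" encoding="utf-8"?>\n'
                  '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
                  '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n$1',
        },
    ]
}

# Assumed mapping from the trailing flag letters to Python regex flags.
FLAG_MAP = {"s": re.DOTALL, "i": re.IGNORECASE, "m": re.MULTILINE}


def compile_delimited(pattern):
    """Turn '~body~flags' into a compiled Python regex."""
    delimiter = pattern[0]
    body, sep, flag_chars = pattern[1:].rpartition(delimiter)
    if not sep:
        raise ValueError("missing closing delimiter in %r" % pattern)
    flags = 0
    for ch in flag_chars:
        flags |= FLAG_MAP[ch]
    return re.compile(body, flags)


def apply_replacements(text, config):
    """Run every replacement step over the text, in config order."""
    for step in config["replacements"]:
        regex = compile_delimited(step["from"])
        template = step["to"]

        def expand(match, template=template):
            # Substitute $1, $2, ... with the matching capture groups.
            return re.sub(r"\$(\d+)",
                          lambda g: match.group(int(g.group(1))) or "",
                          template)

        text = regex.sub(expand, text)
    return text


if __name__ == "__main__":
    page = '<?xml version="1.0"?>\n<!DOCTYPE html>\n<html><body/></html>\n'
    print(apply_replacements(page, CONFIG))
```

Under these assumptions, the first step removes any existing XML declaration and DOCTYPE, and the second step wraps what is left in the standard header, so running the two steps again on the result leaves it unchanged. Making the script do more or less then comes down to adding or removing entries in the "replacements" list.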
