Adding preserve_empty_lines option #103
This is going to need some more work. I guess all of my testing was with njobs=1. With parallel jobs, it seems the flattener doesn't preserve the empty lines (at least with espeak). I'll look into fixing it.
It turns out this was an existing bug in the library. With njobs>1 and an empty input (or one that gets preprocessed to empty), … Even if you decide not to merge this new feature, it might still be a good idea to add this fix (or an equivalent).
…pty_lines_v2 # Conflicts: # phonemizer/phonemize.py
Codecov Report
@@            Coverage Diff            @@
##            master     #103    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           23       23
  Lines         1139     1152    +13
=========================================
+ Hits          1139     1152    +13
Alright, this is almost ready for merge. Could you try to add a test or two to make codecov happy again, and then we're go :) PS: don't forget to pull, I've merged master into your branch to fix some conflicts.
Okay, great! I'll have a look at this soon.
These extra tests should hit those missed lines. [Sorry, I don't have access to the festival backend on my local computer at the moment, but I can't imagine any reason why they wouldn't pass.]
Alright, we're go! I'll merge that and (if I have the right permissions) upload it to PyPI tomorrow! Great work, thanks for that @jncasey !
BTW: you might want to close the issues related to this PR.
Adding the feature I requested in #95.
Not stripping out the empty lines was causing problems with the festival backend and preserve_punctuation, so I took the approach of stripping out the empty lines pre-phonemization and then reinserting them afterward.
I want to flag that I changed the conversion of the input text to a list from a generator to a list comprehension here. I needed to run through the list a second time to preserve the empty lines, and I thought this made the code easier to read than making another generator. I'm assuming that this won't make a big difference to performance given how I think phonemizer is generally used.
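For anyone skimming, the strip-then-reinsert approach described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the code in this PR: the helper names `strip_empty_lines` and `reinsert_empty_lines` are hypothetical, and `fake_phonemize` is a stand-in for a real backend call.

```python
def strip_empty_lines(lines):
    """Split the input into non-empty lines and the indices of empty ones.

    Hypothetical helper: a line counts as empty if it is blank after
    stripping whitespace.
    """
    # Materialize the input so it can be traversed twice (a generator
    # would be exhausted after the first pass).
    lines = [line for line in lines]
    empty_indices = [i for i, line in enumerate(lines) if not line.strip()]
    kept = [line for line in lines if line.strip()]
    return kept, empty_indices


def reinsert_empty_lines(phonemized, empty_indices):
    """Put empty strings back at their original positions.

    Indices must be in ascending order, so that every earlier position
    is already filled when a later one is inserted.
    """
    result = list(phonemized)
    for i in empty_indices:
        result.insert(i, '')
    return result


def fake_phonemize(lines):
    # Stand-in for a real backend (assumption): one output per input line.
    return [line.upper() for line in lines]


text = ['hello', '', 'world', '   ', '']
kept, empty_indices = strip_empty_lines(text)
output = reinsert_empty_lines(fake_phonemize(kept), empty_indices)
```

The backend only ever sees non-empty lines, which sidesteps the festival and preserve_punctuation problems, and the output ends up with the same number of lines as the input.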