Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding preserve_empty_lines option #103

Merged
merged 6 commits into from Mar 28, 2022

Conversation

jncasey
Copy link
Collaborator

@jncasey jncasey commented Dec 22, 2021

Adding the feature I requested in #95.

Not stripping out the empty lines was causing problems with the festival backend and preserve_punctuation, so I took the approach of stripping out the empty lines pre-phonemization and then reinserting them afterward.

I want to flag that I changed the conversion of the input text to a list from a generator to list comprehension here since I needed to run through the list a second time to preserve the empty lines, and I thought this made the code easier to read than making another generator. I'm assuming that this won't make a big difference on performance given how I think phonemizer is generally used.

@jncasey
Copy link
Collaborator Author

jncasey commented Dec 29, 2021

This is going to need some more work.

I guess all of my testing was with njobs=1. With parallel jobs, it seems the flattener doesn't the empty lines (at least with espeak). I'll look into fixing it.

@jncasey
Copy link
Collaborator Author

jncasey commented Dec 29, 2021

It turns out this was an existing bug in the library. With njobs>1 and an empty input (or one that gets preprocessed to empty), _flatten in the espeak backend would throw an IndexError.

Even if you decide not to merge this new feature, it might still be a good idea to add this fix (or equivalent)

…pty_lines_v2

# Conflicts:
#	phonemizer/phonemize.py
@codecov
Copy link

codecov bot commented Mar 28, 2022

Codecov Report

Merging #103 (1447a85) into master (8cbdb5a) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #103   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           23        23           
  Lines         1139      1152   +13     
=========================================
+ Hits          1139      1152   +13     
Impacted Files Coverage Δ
phonemizer/__init__.py 100.00% <100.00%> (ø)
phonemizer/backend/espeak/espeak.py 100.00% <100.00%> (ø)
phonemizer/main.py 100.00% <100.00%> (ø)
phonemizer/phonemize.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8cbdb5a...1447a85. Read the comment docs.

@hadware
Copy link
Collaborator

hadware commented Mar 28, 2022

Alright, this is almost ready for merge. Could you try and add a test or two in order to make codecov happy again, and then we're go :)

PS: don't forget to pull, i've merged the master onto your branch to fix some conflicts.

@jncasey
Copy link
Collaborator Author

jncasey commented Mar 28, 2022

Okay, great! I'll have a look at this soon.

@jncasey
Copy link
Collaborator Author

jncasey commented Mar 28, 2022

These extra tests should hit those missed lines.

[Sorry, I don't have access to the festival backend on my local computer at the moment, but I can't imagine any reason why they wouldn't pass.]

@hadware
Copy link
Collaborator

hadware commented Mar 28, 2022

Alright, we're go!

I'll merge that and (if I have the right permissions) upload it to pipy tomorrow! Great work, thanks for that @jncasey !

@hadware hadware merged commit 9572e63 into bootphon:master Mar 28, 2022
@hadware
Copy link
Collaborator

hadware commented Mar 28, 2022

BTW: you might want to close the issues related to this PR.

@jncasey jncasey deleted the preserve_empty_lines_v2 branch March 28, 2022 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants