Difficulty in Processing "Verified Commentary" #12

Closed
eoppe1022 opened this issue May 10, 2018 · 3 comments

Comments

@eoppe1022
Contributor

Check out the output from this:

https://genius.com/Fleet-foxes-third-of-may-odaigahara-lyrics

# A tibble: 55 x 3
   track_title               lyric                                         line
   <chr>                     <chr>                                        <int>
 1 Third of May / Ōdaigahara Light ended the night, but the song remained     1
 2 Third of May / Ōdaigahara "And I was "                                     2
 3 Third of May / Ōdaigahara hiding by the stair                              3
 4 Third of May / Ōdaigahara half here                                        4
 5 Third of May / Ōdaigahara Half there                                       5
 6 Third of May / Ōdaigahara ", past the "                                    6
 7 Third of May / Ōdaigahara lashing rain                                     7
 8 Third of May / Ōdaigahara "And as the "                                    8
 9 Third of May / Ōdaigahara " would "                                        9
10 Third of May / Ōdaigahara petal white                                     10
# ... with 45 more rows

I'm not sure if this is consistent, but I thought I'd bring it up.

@JosiahParry
Owner

Shoot. This problem is caused by the annotation. The annotations are in a <p> tag, not a <span> tag, which makes genius_url() think each annotated phrase is a new line.

On line 8 of the tibble, sky[e] is omitted because of the bracket. I currently have the function written so that lines with brackets are omitted, because those tend to be things like guest vocals or instrumentation.
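
For illustration, here is a rough sketch of that kind of filter. It is not the package's actual code: the sample lines are made up (the third is my guess at the un-split lyric) and the regex is an assumption about how "lines with brackets" might be detected.

```r
# Hypothetical sample lines, not real output from genius_url().
lines <- c(
  "[Verse 1]",
  "Light ended the night, but the song remained",
  "And as the sky[e] would petal white"
)

# Flag every line that contains a "[...]" pair anywhere in it.
has_brackets <- grepl("\\[.*?\\]", lines, perl = TRUE)

# Dropping flagged lines removes the section header,
# but it also removes the lyric that merely contains "sky[e]".
kept <- lines[!has_brackets]
print(kept)
```

A filter like this can't tell a section tag from a lyric that happens to contain brackets, which matches the missing "sky[e]" piece in the tibble.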

@eoppe1022
Contributor Author

eoppe1022 commented May 10, 2018

I'm gonna do some investigating, but I'm guessing that if a set of brackets is on its own line, it should always be omitted.

There's probably some regex for this.
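
Something along these lines might work as a starting point. It's only a sketch: the sample lines are hypothetical, and the pattern assumes "on its own line" means the whole line is a single bracketed tag, optionally padded with whitespace.

```r
# Hypothetical sample lines for illustration only.
lines <- c(
  "[Chorus]",
  "  [Verse 2: Guest]  ",
  "And as the sky[e] would petal white",
  "Half there"
)

# ^\s*\[ ... \]\s*$ matches only when the entire line is one bracketed tag.
standalone <- grepl("^\\s*\\[[^\\[\\]]*\\]\\s*$", lines, perl = TRUE)

# Lines with inline brackets (like "sky[e]") are kept untouched.
kept <- lines[!standalone]
print(kept)
```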
Also, this is less robust, but you could do something like:

Omit and skip the line if the text between [] is longer than 3 characters or includes ?, !, or other punctuation;
else delete the brackets and their contents but maintain the line structure.
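
In R, that heuristic could be sketched roughly as below. `clean_bracketed()` is a hypothetical helper name, not something in the package, and the sample input is made up. It only addresses bracket handling, not the line splitting caused by the annotation tags.

```r
# Hypothetical helper: applies the "> 3 characters or punctuation" rule per line.
clean_bracketed <- function(lines) {
  pattern <- "\\[[^\\[\\]]*\\]"
  out <- character(0)
  for (line in lines) {
    # All "[...]" groups found in this line (character(0) if there are none).
    groups <- regmatches(line, gregexpr(pattern, line, perl = TRUE))[[1]]
    # Strip the surrounding brackets to get the inner text of each group.
    inner <- gsub("^\\[|\\]$", "", groups)
    # Skip the whole line if any group is long or contains punctuation.
    if (any(nchar(inner) > 3 | grepl("[[:punct:]]", inner))) next
    # Otherwise delete the brackets and their contents, keeping the line.
    out <- c(out, gsub(pattern, "", line, perl = TRUE))
  }
  out
}

# Made-up input for illustration.
cleaned <- clean_bracketed(c(
  "[Verse 1: Robin Pecknold]",
  "And as the sky[e] would petal white",
  "Half there",
  "[?]"
))
print(cleaned)
```

The 3-character cutoff and the punctuation check are taken straight from the pseudocode above; both are guesses and would need checking against real Genius pages.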

EDIT: I realize this probably wouldn't solve anything. I'm gonna think hard about this though

@JosiahParry
Owner

Fixed in PR #24.
