Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script for automating process for converting files and running generateSoundMetadata.rb #27615

Merged
merged 1 commit into from Apr 1, 2019

Conversation

kimj42
Copy link
Contributor

@kimj42 kimj42 commented Mar 20, 2019

  • What changed:
    A new script: autorun_generateSoundMetadata.rb has been added.

  • Where it was added:

    • Location: code-dot-org/tools/scripts/autorun_generateSoundMetadata.rb.
  • Which parts of the site does this change affect?

    • This script is part of the process of updating the sound library manifest which affects the sound library dialog for App lab, Sprite Lab, and Game Lab. Once the script is run, the developer that uses that script will now have mp3 files and corresponding JSON on their local machine to upload to S3.
    • The script is currently set up for the developer to enter the full path to the folder AND the csv file so that multiple sound files in that folder can be converted/used by generateSoundMetadata.rb. I uploaded the mp3 and JSON manually to S3 by going to AWS and clicking upload.
    • Steps:
      • Download one folder from: such as Achievements
      • Unzip Achievements folder in your ~/Downloads
      • Run ./autorun_generateSoundMetadata.rb ~/Downloads/Achievements ./sound_metadata.csv
      • Script will create a folder called "mp3_files" which will hold all the mp3 files and corresponding JSONs
      • Go to cdo-sound-library in S3 -> click appropriate category folder (ex: category_achievements) -> upload all files under mp3_files
  • Why are we making this change?

    • The first PR in order to update the sound library manifest included the a script similar to this as well as the taglib-ruby gem but the drone test continued to fail. Brad mentioned that it is worth removing the gem and the script to pass the test and to only include the updated manifest. Thus, it was removed and the sound library was updated with all of the new changes applied to it.
    • However, for the next time that we update the sound library manifest, which isn't often, it will be helpful to have this script to have an automated process for updating the sound library. Thus one script that did the converting and the other script that autoran the generateSoundMetadata has been combined to this one script.
    • A test was written after the sound library was pushed with some corrupted filenames but the test logic is best placed in the script. It is because that allows the developer that is updating the manifest to catch corrupted filenames earlier than later after they push their PR and drone is testing whether or not there is a corrupted one which is time consuming.
  • How I tested the script:

    • A lot of the original sound filenames were corrupted with spaces in between, underscores in the beginning and/or end of the filename, uppercases, multiple underscores instead of one in between words, and typos. All of the spaces, multiple underscores, underscores before and after, and uppercases were fixed and visible on my local machine after running this test.
    • For the current filenames that are corrupted in the live site can be re-run by running this script on those files on a local machine. I am 95% confident that this script will fix corrupted filenames now.
  • What's next?

    • One file was found with spaces still and it is live on the site. Ryan noted one file is fine to drop the live sound and upload the new file with the correct filename with underscores. So I will submit a PR for that soon.
    • If the script doesn't pass the drone test again, then we will have to end up adding some comments in the generateSoundmetadata.rb regarding instructions/precautions for the next developer that updates the sound library without a script.

@kimj42 kimj42 force-pushed the soundlibrary-updating-script branch from 9e7d828 to 662bdce Compare March 20, 2019 22:56
Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Karis! This is a neat tool. I've got lots of feedback on this code, but you can consider most of it optional since this is a one-off script and we'll probably need to make some changes to it next time we use it one way or another. Thank you for the detailed PR description!

@@ -0,0 +1,123 @@
#!/usr/bin/env ruby

# FOR USE WHEN WORKIN WITH MULTIPLE SOUND FILES IN A FOLDER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# FOR USE WHEN WORKIN WITH MULTIPLE SOUND FILES IN A FOLDER
# FOR USE WHEN WORKING WITH MULTIPLE SOUND FILES IN A FOLDER

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops! I'll add the 'G' in there.

end

def parse_filenames_in_csv(csv)
filename = csv.by_col[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this script assumes it will receive a CSV in a particular format (that is, particular columns in a particular order). It might be good to document the expected format in your comment at the top of this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! I'll go ahead and add the comments at the top.


if name =~ /(^\_+|\_+$|_{2,}|\s|[A-Z])/
fix_corrupted_filename(name, "csv")
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like fix_corrupted_filename is a no-op anyway if this pattern isn't matched - you could probably remove the if-statement and just call it for every filename, and not need to worry about this regex getting out-of-sync with the implementation above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I'll remove the if statement and just keep the fix_corrupted_filename(name, "csv").

name << ".mp3" if type_of_file == "csv"
end

return name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits with this function:

  • In several places, the conditional is redundant.
    For example, name.downcase! if name =~ /[A-Z]/ could just be name.downcase! - the end result would be the same.
  • The function mutates the name parameter but also returns it, which is a little confusing - and later code depends on both behaviors. I'd either name the function with an exclamation point fix_corrupted_filename! (a Ruby convention for a function with potentially unexpected side-effects) and use it that way in both places, or make this a pure function and not modify the input parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! I removed redundancies by removing all conditionals for that regex in the fix_corrupted_filename function since they are mutating the name parameter anyway and I'll rename the function as fix_corrupted_filename!.

end

def create_json
filename = parse_filenames_in_csv(@csv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be filenames since parse_filenames_in_csv probably returns an array of filenames?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! Thanks for letting me know.

aliases_with_spaces_removed = csv.by_col[3].each do |aliase|
aliase.gsub!(/,\s+/, ',').downcase!
aliase.tr!(" ", ',')
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: I'd prefer map and non-mutating versions of these string functions.

aliases_with_spaces_removed = csv.by_col[3].map do |aliase|
  aliase.gsub(/,\s+/, ',').tr(' ', ',').downcase
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored as you suggested!

end

return aliases_with_spaces_removed.map do |aliase|
"\"#{aliase.split(/\,/).join('\', \'').gsub!(/,\s+/, ',')}\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly, you've got double-quotes on the outside and single-quotes on the inside, but no outer single-quotes... so I'm a bit confused by the format.

Trying this out in irb:

irb(main):001:0> aliase = 'air,fire,water'
=> "air,fire,water"
irb(main):002:0> "\"#{aliase.split(/\,/).join('\', \'').gsub!(/,\s+/, ',')}\""
=> "\"air','fire','water\""

Or am I making a bad assumption about the initial input? I suppose if the aliases are already single-quoted collectively you get this:

irb(main):003:0> aliase = "\'air,fire,water\'"
=> "'air,fire,water'"
irb(main):004:0> "\"#{aliase.split(/\,/).join('\', \'').gsub!(/,\s+/, ',')}\""
=> "\"'air','fire','water'\""

I also think the gsub! call here is redundant with the one a few lines above, and could be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I missed changing the 2nd double quote in "\"air' to a single quote after changing the double quotes in .join('\', \'') to single quotes when the rubocop linter indicated that it prefers single-quoted strings inside interpolations. The script continued to run and generate appropriate aliases so I carelessly missed it again!

In my attempt to change to double quote, I found out that generateSoundMetadata doesn't need the aliases to be written as "hello","world" which is why I formatted the them that way but "hello,world" is suffice to create 2 aliases "hello", "world" so I ended up removing the double quotes and just kept the aliases as "hello,world,yay".

Dir[folder_path + "/mp3_files/*"].each do |file_path|
regex = /^#{name}$/

if file_path.split('/')[-1].downcase =~ /(^\_+|\_+$|_{2,}|\s|[A-Z])/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex shows up several times in your code - maybe worth extracting to a constant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! Should have thought of that! I'll make it a constant.

…ta.rb for multiple sound files to update Sound Library Manifest
@kimj42 kimj42 force-pushed the soundlibrary-updating-script branch from 662bdce to 98d797e Compare March 26, 2019 20:23
Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cleanup! Nice work Karis!

@kimj42 kimj42 merged commit 3be7add into staging Apr 1, 2019
@kimj42 kimj42 deleted the soundlibrary-updating-script branch April 1, 2019 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants