Filenames, file organization and scalability #8

Open
craigsapp opened this issue Sep 13, 2020 · 0 comments

The filenames for the scores, such as the mensural score files, should include the source manuscripts.

Where to place the source will depend on how you want to view the list of files on GitHub and in the file browser on macOS or Windows. If you want to have the files sorted by source, the source would go first:

Mo-in_arboris-MENSURAL.mei

Or if you want to sort by title, they could go last:

in_arboris-Mo-MENSURAL.mei

I would also recommend using dashes (-) for separating data fields in the filename, and then keep underscores (_) for the spaces in the title.
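As a minimal sketch of this convention (the function name `make_filename` and its parameters are hypothetical, not part of any existing tool), a filename could be assembled like this:

```python
def make_filename(source, title, edition="MENSURAL", ext="mei", source_first=True):
    """Build a filename with dashes between fields and underscores inside the title.

    All names here are illustrative assumptions; adapt them to the project.
    """
    # Spaces inside the title become underscores.
    title_field = "_".join(title.strip().split())
    # Field order decides how file listings sort: by source or by title.
    if source_first:
        fields = [source, title_field, edition]
    else:
        fields = [title_field, source, edition]
    # Dashes separate the data fields.
    return "-".join(fields) + "." + ext


print(make_filename("Mo", "in arboris"))
print(make_filename("Mo", "in arboris", source_first=False))
```

The sketch assumes that the source siglum and the title themselves contain no dashes.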

Adding the sources to the filenames will make them unique, so you do not get into artificial situations such as naming files:

in_arboris_MENSURAL.mei
in_arboris_2_MENSURAL.mei

The problem is: is the "2" part of the title (which the dash system would help disambiguate)? And what is the significance of the "2"? (None whatsoever, other than that this was the second work encountered with the same title.) If there were more than one file, would the first one be renamed in_arboris_1_MENSURAL.mei? If you use the source system, then disambiguation would only be required within one source, and there would be no potential name clashes between sources:

in_arboris-Mo-MENSURAL.mei
in_arboris-DUs-MENSURAL.mei

Filenames such as 305-MENSURAL.mei are also not useful, particularly for scalability. This should instead be:

Mo305-li_dous_maus-MENSURAL.mei

or

li_dous_maus-Mo305-MENSURAL.mei

If sources can have numbers after them, then it would be better to use an underscore to separate the work number in the source from the source:

Mo_305-li_dous_maus-MENSURAL.mei
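One benefit of keeping dashes and underscores in separate roles is that the filename can be split back into its fields mechanically. A rough sketch, assuming the source-first layout shown above and a hypothetical helper name `parse_filename`:

```python
from pathlib import PurePosixPath


def parse_filename(filename):
    """Split a source-first filename such as Mo_305-li_dous_maus-MENSURAL.mei.

    Assumes exactly three dash-separated fields (source, title, edition)
    and an optional work number after an underscore in the source field.
    """
    stem = PurePosixPath(filename).stem  # drop the .mei extension
    fields = stem.split("-")
    if len(fields) != 3:
        raise ValueError(f"expected 3 dash-separated fields in {filename!r}")
    source_field, title_field, edition = fields
    # An underscore inside the source field separates source and work number.
    source, _, number = source_field.partition("_")
    return {
        "source": source,
        "number": int(number) if number else None,
        "title": title_field.replace("_", " "),
        "edition": edition,
    }


print(parse_filename("Mo_305-li_dous_maus-MENSURAL.mei"))
```

A title containing a dash would break this simple split, which is one more reason to keep dashes out of the title field.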

Somewhat related, there is a repository for the MusicXML and Sibelius files used to prepare the MEI files. These should also have their names match any updates made to the MEI files.

It would also be more logical to keep the MEI scores in this repository rather than in the website repository. This would be similar to the JRP and Tasso projects, where the data and the websites are in separate repositories:

https://github.com/TassoInMusicProject/tasso-scores

https://github.com/josquin-research-project/jrp-scores

I do not keep any source files used to prepare the final Humdrum scores in these repositories, because (1) they are much larger and would bloat the repositories, and (2) the Humdrum scores are corrected while the preparatory files are not. However, it is important to keep such files somewhere for archival purposes (I keep them in Dropbox/Box/Google Drive depending on a project's workflow and collaboration with other people on the project).

This point is less important than the other things mentioned, and keeping the MEI files in the assets folder is not a problem. This is particularly due to the fact that you probably edit the MEI files, and the other preparatory files are not updated in parallel (so they are not necessarily equivalent).

However, it is important to add archival PDF files generated from Sibelius to the mp-music-files repository. The MusicXML files exported from Sibelius will have bugs and/or will lack musical features that Sibelius does not export. Having a PDF of the data is a backup for these limitations. In addition, the Sibelius file will work with the current version of Sibelius but will become less dependable over time (such as 20 years), since later versions of Sibelius may not correctly read old versions of its data files.


We discussed grouping files by source, which is also less important but good to decide now for scalability purposes. If you plan to have more than 1000 works in your database, I would recommend splitting out the files into separate folders/directories based on their sources (but I would also still place the source in the filenames to create a unique filename across all sources). If you do split out the files by source, you could think about placing the modern and mensural editions in the same directory (otherwise a parallel source structure for modern and mensural scores is not a problem if you want to keep it that way).
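If the files are split out by source, the directory can be derived from the filename itself. This is only a sketch: the root folder name `scores` and the helper `destination` are assumptions, and it presumes source-first filenames.

```python
from pathlib import PurePosixPath


def destination(filename, root="scores"):
    """Return a per-source path for a source-first filename.

    The source is whatever precedes the first dash, minus any
    underscore-separated work number (Mo_305 -> Mo).
    """
    source_field = filename.split("-", 1)[0]
    source = source_field.split("_", 1)[0]
    # The full filename is kept, so it stays unique across all sources.
    return PurePosixPath(root) / source / filename


for name in ["Mo_305-li_dous_maus-MENSURAL.mei", "DUs-in_arboris-MENSURAL.mei"]:
    print(destination(name))
```

Because only the source field is inspected, modern and mensural editions of the same source end up in the same directory.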


For a larger number of files, it may be useful to include the composer's name as well. The Tasso project contains multiple pieces of information in each filename:

https://github.com/TassoInMusicProject/tasso-scores

https://github.com/TassoInMusicProject/tasso-scores/tree/master/Trm/kern

Such as the filename: Trm0047m-Non_e_questa_la_mano--Giovannelli_1588.krn

(1) The Tasso catalog number, such as Trm0047a, which carries sub-information: this is in the rime genre, it is based on poem number 47 according to Solerti, and it is the earliest known setting of this poem (the "a" gives the setting, sorted alphabetically by date of publication). Trm0047m is the 13th oldest setting of the poem. Such precise dating for your era is not possible, except perhaps with carbon dating :-)

(2) The title of the setting (not necessarily of the poem, if different).

(3) The composer

(4) The year of publication

The double dash (--) introduces a subtitle or other comments as necessary, and it is also useful for distinguishing field separators from dashes in the titles.
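To illustrate how the fields listed above can be recovered, here is a sketch of a parser for that one example filename. The regular expression is inferred from the single example only, so other files in the Tasso repository may not match it, and the name `parse_tasso_filename` is hypothetical.

```python
import re

# Pattern inferred from one example filename; treat it as an assumption.
TASSO_PATTERN = re.compile(
    r"^(?P<prefix>[A-Za-z]+)"   # catalog prefix, e.g. Trm
    r"(?P<poem>\d+)"            # poem number, e.g. 0047
    r"(?P<setting>[a-z])"       # setting letter, ordered by publication date
    r"-(?P<title>.+?)"          # title of the setting
    r"--(?P<composer>.+)"       # composer follows the double dash
    r"_(?P<year>\d{4})"         # year of publication
    r"\.krn$"
)


def parse_tasso_filename(filename):
    match = TASSO_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    info = match.groupdict()
    info["poem"] = int(info["poem"])
    info["year"] = int(info["year"])
    info["title"] = info["title"].replace("_", " ")
    # "a" is the earliest setting, so "m" is the 13th.
    info["setting_rank"] = ord(info["setting"]) - ord("a") + 1
    return info


print(parse_tasso_filename("Trm0047m-Non_e_questa_la_mano--Giovannelli_1588.krn"))
```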

Adding the composer may be more of a hassle in the long run if you re-assign a composer to a work, such as giving an attribution to a previously anonymous work.
