# User's Guide, Chapter 53: Advanced Corpus and Metadata Searching

We saw in :ref:`Chapter 11<usersGuide_11_corpusSearching>` some ways to work with and search through the "core" corpus.  Not everything is in the core corpus, of course, so the `converter.parse()` function is a great way of getting files from a local hard drive or the internet.  But the "core" corpus also has many great search functions, and these can be helpful for working with your own files and files on the web as well.

In this chapter, we'll introduce the other "Corpora" in addition to the "core" corpus.  These include a "virtual" corpus of internet pieces as well as one or more "local" corpora for storing pieces that you might have but which you can't or do not wish to share on the net.  We'll start with the one that everyone has access to, the "virtual" corpus.

## The Virtual Corpus

Music21 also has the notion of a *virtual* corpus: a collection of musical
works to be found at various locations online which, for reasons of licensing,
haven't been included in the *core* corpus.  It doesn't contain many files yet, but we hope to expand it.  Load the virtual corpus as follows:

In [1]:
from music21 import *

virtualCorpus = corpus.corpora.VirtualCorpus()
virtualCorpus

<music21.corpus.corpora.VirtualCorpus>

The virtual corpus can be searched just like the main corpus:

In [2]:
coltraneSearch = virtualCorpus.search('coltrane')
coltraneSearch

<music21.metadata.bundles.MetadataBundle {0 entries}>

In [3]:
coltraneSearch[0]

IndexError: list index out of range

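These pieces can be parsed just like files on your own computer; they just take a little longer and require an internet connection.  Because the files live on remote servers, parsing can also fail if a server is unreachable or a file has moved.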

In [4]:
coltraneEntry = coltraneSearch[0]
coltraneScore = coltraneEntry.parse()
coltraneScore.measures(1, 2).show()

CorpusException: Could not find an xml or mxl work that met this criterion: http://impromastering.com/uploads/transcription_file/file/196/Giant_Steps__John_Coltrane_C.xml; if you are searching for a file on disk, use "converter" instead of "corpus".

In [None]:
x = 'http://impromastering.com/uploads/transcription_file/file/196/Giant_Steps__John_Coltrane_C.xml'
virtualCorpus.getWorkList(x)

An entry's `sourcePath` records where its file lives; for the virtual corpus, that is a URL:

In [7]:
coltraneEntry.sourcePath

'http://impromastering.com/uploads/transcription_file/file/196/Giant_Steps__John_Coltrane_C.xml'

`getPaths()` lists every URL that the virtual corpus knows about:

In [4]:
virtualCorpus.getPaths()

['http://kern.ccarh.org/cgi-bin/ksdata?l=cc/bach/cello&file=bwv1007-01.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=cc/bach/cello&file=bwv1007-01.krn&f=kern',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=osu/classical/bach/inventions&file=inven01.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=osu/classical/bach/inventions&file=inven02.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=osu/classical/bach/inventions&file=inven02.krn&f=kern',
 'http://impromastering.com/uploads/transcription_file/file/196/Giant_Steps__John_Coltrane_C.xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=cc/pachelbel&file=canon.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=cc/schubert/piano/d0576&file=d0576-06.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=users/craig/classical/schubert/piano/d0576&file=d0576-02.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=users/craig/classical/schubert/piano/d0576&file=d0576-03.krn&f=xml',
 'http://kern.ccarh.org/cgi-bin/ksdata?l=users/craig/clas

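Inside a metadata bundle, each entry is stored under a key derived from its path by `corpusPathToKey()`:

In [5]: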
mdb = virtualCorpus.metadataBundle
mdb.corpusPathToKey('http://impromastering.com/uploads/transcription_file/file/196/Giant_Steps__John_Coltrane_C.xml')

'http:__impromastering_com_uploads_transcription_file_file_196_Giant_Steps__John_Coltrane_C_xml'

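The core corpus works the same way.  There, `getPaths()` returns full paths on disk, and the key keeps only the part of the path inside the corpus directory:

In [8]: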
core = corpus.corpora.CoreCorpus()
pal = core.getPaths()[1000]
pal

'/Users/cuthbert/git/music21base/music21/corpus/palestrina/Credo_40.krn'

In [10]:
core.metadataBundle.corpusPathToKey(pal)

'palestrina_Credo_40_krn'
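Judging from the outputs above, a key looks like the path with everything up to the corpus directory removed and every `/` and `.` replaced by `_`.  The following is a minimal sketch of that transformation, inferred only from those outputs; `pathToKey` is a hypothetical helper, not music21's own implementation, which may handle more cases (Windows paths, for example).

```python
def pathToKey(filePath: str) -> str:
    """Approximate how a corpus path becomes a metadata-bundle key.

    Hypothetical helper inferred from the outputs above; it is not
    music21's implementation.
    """
    marker = '/corpus/'
    # For files inside a corpus directory, keep only the part after it.
    if marker in filePath:
        filePath = filePath.split(marker)[-1]
    # Slashes and dots both become underscores.
    return filePath.replace('/', '_').replace('.', '_')


print(pathToKey('/Users/cuthbert/git/music21base/music21/corpus/palestrina/Credo_40.krn'))
# → palestrina_Credo_40_krn
print(pathToKey('palestrina/Agnus.krn'))
# → palestrina_Agnus_krn
```

Applied to the Coltrane URL, the same two replacements reproduce the virtual-corpus key shown earlier, since the URL contains no corpus directory to strip.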

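Search results are `MetadataEntry` objects; for files in the core corpus, `sourcePath` is relative to the corpus directory:

In [11]: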
pal1 = corpus.search('palestrina')[0]
pal1

<music21.metadata.bundles.MetadataEntry: palestrina_Agnus_krn>

In [14]:
pal1.sourcePath

'palestrina/Agnus.krn'

`corpus.search()` also looks through local corpora, such as one named `trecento` here:

In [16]:
trec = corpus.corpora.LocalCorpus('trecento')
jac1 = corpus.search('jacopo')[0]
jac1

<music21.metadata.bundles.MetadataEntry: trecento_PMFC_06-Jacopo-01-Aquila-Altera_xml>

In [19]:
jac1.corpusPath

'trecento_PMFC_06-Jacopo-01-Aquila-Altera_xml'

In [18]:
jac1.parse()

<music21.stream.Score 0x10e482080>

In [24]:
corpus.parse('coltrane')

<music21.stream.Score 0x10e482438>

## Creating multiple corpus repositories via local corpora

In addition to the default local corpus, music21 allows users to create
and save as many named local corpora as they like, which will persist from
session to session.

Let's create a new *local* corpus, give it a directory to find music files in,
and then save it:


In [None]:
from music21 import *

aNewLocalCorpus = corpus.corpora.LocalCorpus('newCorpus')
aNewLocalCorpus.existsInSettings

In [None]:
aNewLocalCorpus.addPath('~/Desktop')
aNewLocalCorpus.directoryPaths

('/Users/josiah/Desktop',)
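The stored path is absolute even though we passed `~/Desktop`.  This chapter doesn't show how `addPath()` does that; assuming it normalizes paths roughly the way the standard library's `os.path.expanduser()` and `os.path.abspath()` do, this hypothetical `normalizeCorpusPath()` sketch shows the effect:

```python
import os.path


def normalizeCorpusPath(path: str) -> str:
    # Hypothetical helper: expand '~' to the user's home directory,
    # then turn the result into an absolute path.
    return os.path.abspath(os.path.expanduser(path))


print(normalizeCorpusPath('~/Desktop'))
# e.g. /Users/<you>/Desktop on macOS; the home directory varies by machine
```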

In [None]:
aNewLocalCorpus.save()
aNewLocalCorpus.existsInSettings

We can confirm that our new *local* corpus was saved by listing the names of
all saved *local* corpora with `corpus.manager.listLocalCorporaNames()`:

In [None]:
corpus.manager.listLocalCorporaNames()

[None, 'funk', 'newCorpus', 'bach']

In this list, `None` stands for the default, unnamed local corpus.  Finally, we can delete the *local* corpus we created like this:

In [None]:
aNewLocalCorpus.delete()
aNewLocalCorpus.existsInSettings

## Inspecting metadata bundle search results

Let's take a closer look at some search results:

In [None]:
bachBundle = corpus.corpora.CoreCorpus().search('bach', 'composer')
bachBundle

In [None]:
bachBundle[0]

Each entry records where its file lives:

In [None]:
bachBundle[0].sourcePath

The metadata itself lives in the entry's `metadataPayload`:

In [None]:
bachBundle[0].metadataPayload

The payload includes summary information such as the number of notes:

In [None]:
mdpl = bachBundle[0].metadataPayload
mdpl.noteCount

Calling `parse()` on an entry loads the full score:

In [None]:
bachAnalysis0 = bachBundle[0].parse()
bachAnalysis0.show()

## Manipulating multiple metadata bundles

Another useful feature of `music21`'s metadata bundles is that they can be
operated on as though they were sets, allowing you to take the union,
intersection, and difference of multiple metadata bundles to build more
complex search results:

In [None]:
corelliBundle = corpus.search('corelli', field='composer')
corelliBundle

In [None]:
bachBundle.union(corelliBundle)

Consult the API documentation for :class:`~music21.metadata.bundles.MetadataBundle`
for a more in-depth look at how this works.
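Conceptually, you can picture each bundle as a set of entry keys (like the ones `corpusPathToKey()` produces), with the set operations acting on those keys.  The sketch below models that with plain Python sets, assuming `MetadataBundle`'s set-like methods follow the same semantics as Python's built-in set operators; the keys and the "6/8" bundle are hypothetical toy data, not music21 objects.

```python
# Hypothetical toy bundles: each one is just a set of entry keys.
bachBundle = {'bach_piece1', 'bach_piece2'}
sixEightBundle = {'bach_piece2', 'corelli_piece1'}  # say, every score in 6/8

print(sorted(bachBundle | sixEightBundle))  # union: in either bundle
# → ['bach_piece1', 'bach_piece2', 'corelli_piece1']
print(sorted(bachBundle & sixEightBundle))  # intersection: Bach pieces in 6/8
# → ['bach_piece2']
print(sorted(bachBundle - sixEightBundle))  # difference: Bach pieces not in 6/8
# → ['bach_piece1']
```

Intersection is the operation that narrows a search: combining a composer search with a time-signature search keeps only scores matching both.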

## Getting a metadata bundle

In music21, metadata is information *about* a score, such as its composer,
title, initial key signature or ambitus. A metadata *bundle* is a collection of
metadata pulled from an arbitrarily large group of different scores. Users can
search through metadata bundles to find scores with certain qualities, such as
all scores in a given corpus with a time signature of ``6/8``, or all scores
composed by Monteverdi.
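To make the idea concrete, here is a toy model of a bundle as a list of per-score metadata records that can be filtered by field.  `ToyEntry`, `search()`, and the three pieces are hypothetical illustrations, and the case-insensitive substring match is a simplifying assumption rather than a description of how music21 matches queries.

```python
from dataclasses import dataclass


@dataclass
class ToyEntry:
    # Hypothetical stand-in for the metadata of one score.
    title: str
    composer: str
    timeSignature: str


entries = [
    ToyEntry('Piece A', 'Monteverdi', '4/4'),
    ToyEntry('Piece B', 'Monteverdi', '6/8'),
    ToyEntry('Piece C', 'Bach', '6/8'),
]


def search(bundle, field, value):
    # Keep entries whose chosen field contains the value, ignoring case.
    value = value.lower()
    return [e for e in bundle if value in str(getattr(e, field)).lower()]


print([e.title for e in search(entries, 'timeSignature', '6/8')])
# → ['Piece B', 'Piece C']
print([e.title for e in search(entries, 'composer', 'monteverdi')])
# → ['Piece A', 'Piece B']
```

The point of a bundle is that this filtering runs over small metadata records, so no score has to be parsed to answer the query.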

There are a number of different ways to acquire a metadata bundle.  The easiest way
to get the metadataBundle for the core corpus is simply to download music21: we
include a pre-made metadataBundle (in ``corpus/metadataCache/core.json``) so
that this step is unnecessary for the core corpus unless you're contributing to
the project.  But you may want to create metadata bundles for your own local corpora.
Access the ``metadataBundle`` attribute of any ``Corpus`` instance to get its
corresponding metadata bundle:

In [None]:
coreCorpus = corpus.corpora.CoreCorpus()
coreCorpus.metadataBundle

The same ``metadataBundle`` attribute gives you the bundle for the *virtual*,
*local*, or *core* corpora, including named local corpora:

In [None]:
coreBundle = corpus.corpora.CoreCorpus().metadataBundle
localBundle = corpus.corpora.LocalCorpus().metadataBundle
otherLocalBundle = corpus.corpora.LocalCorpus('blah').metadataBundle
virtualBundle = corpus.corpora.VirtualCorpus().metadataBundle

Advanced users can also create metadata bundles manually by passing in the name of the
corpus the bundle should refer to or, equivalently, an actual ``Corpus`` instance
itself:

In [None]:
coreBundle = metadata.bundles.MetadataBundle('core')
coreBundle = metadata.bundles.MetadataBundle(corpus.corpora.CoreCorpus())

However, you'll need to read the bundle's saved data from disk before you can
do anything useful with the bundle. Bundles don't read their associated JSON
files automatically when they're manually instantiated.

In [None]:
coreBundle

In [None]:
coreBundle.read()

## Creating persistent metadata bundles

Metadata bundles can take a long time to create.  So it'd be nice if they could be written to and read from disk.  Unfortunately we never got around to...nah, just kidding.  Of course you can.  Just call `.write()` on one:

In [None]:
coreBundle = metadata.bundles.MetadataBundle('core')
coreBundle.read()

In [None]:
coreBundle.write()
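Writing and reading a bundle is what saves you from re-extracting metadata from every score each session.  The sketch below shows that round trip with the standard `json` module; `writeBundle()`, `readBundle()`, and the entry layout are hypothetical, since the exact format music21 stores in files such as ``core.json`` isn't shown in this chapter.

```python
import json
import os
import tempfile


def writeBundle(entries: dict, path: str) -> None:
    # Save the key -> metadata mapping as JSON.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(entries, f, indent=1)


def readBundle(path: str) -> dict:
    # Load the mapping back without re-parsing any scores.
    with open(path, encoding='utf-8') as f:
        return json.load(f)


# Hypothetical entries; real bundle entries hold much richer metadata.
entries = {
    'composerX_piece1_xml': {'composer': 'Composer X', 'timeSignature': '3/4'},
    'composerY_piece2_krn': {'composer': 'Composer Y', 'timeSignature': '6/8'},
}
cachePath = os.path.join(tempfile.mkdtemp(), 'toyBundle.json')
writeBundle(entries, cachePath)
print(readBundle(cachePath) == entries)  # → True
```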

They can also be completely rebuilt, as you will want to do for local
corpora. To add information to a bundle, use the ``addFromPaths()`` method:

In [None]:
newBundle = metadata.bundles.MetadataBundle()
paths = corpus.corpora.CoreCorpus().getBachChorales()
failedPaths = newBundle.addFromPaths(paths)
failedPaths

Checking the bundle shows how many entries it now holds; call ``.write()`` to save it to disk.

In [None]:
newBundle

<music21.metadata.bundles.MetadataBundle {402 entries}>
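Because ``addFromPaths()`` hands back the paths it could not process, a bad file doesn't abort the whole run.  Here is a minimal sketch of that pattern; `addFromPathsSketch()` and `toyExtract()` are hypothetical helpers, and the dictionary stands in for a real bundle.

```python
def addFromPathsSketch(bundle: dict, paths, extract):
    """Hypothetical sketch: add one entry per path and return the failures."""
    failed = []
    for p in paths:
        try:
            bundle[p] = extract(p)
        except Exception:  # any problem reading this file
            failed.append(p)
    return failed


def toyExtract(path):
    # Stand-in for parsing a score's metadata; rejects unknown extensions.
    if not path.endswith(('.xml', '.krn')):
        raise ValueError(f'cannot parse {path}')
    return {'format': path.rsplit('.', 1)[-1]}


bundle = {}
failed = addFromPathsSketch(bundle, ['a.xml', 'b.krn', 'notes.txt'], toyExtract)
print(sorted(bundle), failed)  # → ['a.xml', 'b.krn'] ['notes.txt']
```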

You can delete, rebuild and save a metadata bundle in one go with the
``rebuild()`` method:

In [None]:
virtualBundle = corpus.corpora.VirtualCorpus().metadataBundle
virtualBundle.rebuild()

Because rebuilding saves the file as it goes (for safety), there is no need to
call ``.write()`` at the end.
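The "save as you go" idea can be sketched as a loop that checkpoints the cache periodically and once more at the end.  Everything here is hypothetical (`rebuildSketch()`, `saveCache()`, the `checkpointEvery` parameter, and the JSON layout); it illustrates the design choice rather than music21's actual rebuild code.

```python
import json
import os
import tempfile


def saveCache(entries: dict, cachePath: str) -> None:
    # Write the whole key -> metadata mapping to disk.
    with open(cachePath, 'w', encoding='utf-8') as f:
        json.dump(entries, f)


def rebuildSketch(paths, extract, cachePath, checkpointEvery=2):
    """Hypothetical sketch: discard the old cache, re-extract every entry,
    and save periodically so an interruption loses little work."""
    if os.path.exists(cachePath):
        os.remove(cachePath)  # start over from nothing
    entries = {}
    for count, path in enumerate(paths, start=1):
        entries[path] = extract(path)
        if count % checkpointEvery == 0:
            saveCache(entries, cachePath)  # intermediate checkpoint
    saveCache(entries, cachePath)  # final save, so no separate write is needed
    return entries


cache = os.path.join(tempfile.mkdtemp(), 'toyCache.json')
result = rebuildSketch(['a.xml', 'b.krn', 'c.mxl'], lambda p: {'chars': len(p)}, cache)
with open(cache, encoding='utf-8') as f:
    print(json.load(f) == result)  # → True
```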

To delete a metadata bundle's cached-to-disk JSON file, use the ``delete()``
method:

In [None]:
virtualBundle.delete()

Deleting a metadata bundle's JSON file won't empty the in-memory contents of
that bundle. For that, use ``clear()``:

In [None]:
virtualBundle.clear()
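The difference between deleting the file and clearing the contents is easy to see in a toy model.  `ToyBundle` below is a hypothetical stand-in with an on-disk JSON cache, not music21's class; its `delete()` touches only the file and its `clear()` touches only memory, mirroring the behavior described above.

```python
import json
import os
import tempfile


class ToyBundle:
    """Hypothetical stand-in for a bundle with an on-disk JSON cache."""

    def __init__(self, path):
        self.path = path
        self.entries = {}

    def write(self):
        # Save the in-memory entries to disk.
        with open(self.path, 'w', encoding='utf-8') as f:
            json.dump(self.entries, f)

    def delete(self):
        # Remove the cached file only; in-memory entries are untouched.
        if os.path.exists(self.path):
            os.remove(self.path)

    def clear(self):
        # Empty the in-memory entries; any file on disk is untouched.
        self.entries.clear()


bundle = ToyBundle(os.path.join(tempfile.mkdtemp(), 'toy.json'))
bundle.entries['composerX_piece1_xml'] = {'composer': 'Composer X'}
bundle.write()
bundle.delete()
print(os.path.exists(bundle.path), len(bundle.entries))  # → False 1
bundle.clear()
print(len(bundle.entries))  # → 0
```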