# Dunya Beijing Opera Collection - Corpus

This notebook shows how to download and analyse the entire corpus of Jingju (Beijing opera) in Dunya. The corpus is composed of 1698 recordings. The source of the metadata is MusicBrainz; it is imported into Dunya and presented in a data model that represents the specific characteristics of this musical culture. You can select all or part of the corpus and download the data.

In [1]:
import collections
import os

import compmusic
from compmusic import dunya
from compmusic.dunya import jingju

In [2]:
dunya.set_token('0d8cd9be63c10c5dc67f70e1052acec836de29bd') # set your own token
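
Rather than pasting a token into the notebook, you may prefer to read it from the environment. The following is a minimal sketch, not part of the Dunya API: the helper `load_token` and the variable name `DUNYA_TOKEN` are assumptions made for illustration.

```python
import os


def load_token(env=os.environ, var_name="DUNYA_TOKEN"):
    """Return the API token stored under var_name (a hypothetical variable name)."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            "Set the %s environment variable to your Dunya API token" % var_name
        )
    return token


# In the notebook this would be followed by:
# dunya.set_token(load_token())
```

This keeps the token out of any copy of the notebook that you share.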

In [3]:
jingju.set_collections(['40d0978b-0796-4734-9fd4-2b3ebe0f664c'])

## Getting basic data from the collection
You can get all recordings in the collection you have set by calling ``jingju.get_recordings()``.

In [4]:
recs = jingju.get_recordings()
print('This collection has %s recordings' % len(recs))
first = recs[0]
print('The first recording: mbid - %s, title: %s' % (first['mbid'], first['title']))
recs[0:5]

This collection has 1698 recordings
The first recording: mbid - 4eb558fa-85a5-41cb-8ca3-816b7f6c2778, title: 目连救母 【听一言不由我喜之不尽】 （胡璇）


[{'mbid': '4eb558fa-85a5-41cb-8ca3-816b7f6c2778',
  'title': '目连救母 【听一言不由我喜之不尽】 （胡璇）'},
 {'mbid': '8dbb1a7e-1ade-41ad-acf1-4c45924304b2',
  'title': '铡美案 【包龙图打坐在开封府】 （安平）'},
 {'mbid': '58ddf00f-0a4c-4b1d-bac6-589f8e31970c',
  'title': '穆桂英挂帅 【小儿女探军情尚无音信】 （史敏）'},
 {'mbid': '3604d31b-fbd3-4040-8a6e-b426569cc8e9',
  'title': '郑板桥 【黑云翻墨滚长天】 （李军）'},
 {'mbid': 'ca119190-ed6f-442e-a9d0-d8c5a09b70ef', 'title': '《玉堂春》'}]

There are other methods available to retrieve additional information. Documentation for these methods is available at http://dunya.compmusic.upf.edu/docs/

To get detailed information about a recording, use `jingju.get_recording(mbid)`

In [5]:
rec = jingju.get_recording('4ae1d3e8-5c83-4567-965d-280aaac95ad0')
print('The release which this recording belongs to: %s' % rec['release'][0]['title'])
print('The performers of this recording:')
for perf in rec['performers']:
    print(' ', perf['name'])
print('Instruments played in this recording:')
for instrumentalist in rec['instrumentalists']:
    print(' ', instrumentalist['instrument']['name'])
rec

The release which this recording belongs to: 京剧名家名剧： 白蛇传
The performers of this recording:
  李炳淑
  焦宝宏
  沈雁西
  陆柏平
  方小亚
  张启洪
Instruments played in this recording:
  jinghu
  bangu


{'instrumentalists': [{'artist': {'alias': None,
    'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659',
    'name': '沈雁西'},
   'instrument': {'mbid': '89e4a2ef-172f-4f50-a507-316917a9b98a',
    'name': 'jinghu'}},
  {'artist': {'alias': None,
    'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43',
    'name': '焦宝宏'},
   'instrument': {'mbid': '42349583-c10d-4c6e-b553-28d916113856',
    'name': 'bangu'}}],
 'mbid': '4ae1d3e8-5c83-4567-965d-280aaac95ad0',
 'performers': [{'alias': 'Li Bingshu',
   'mbid': '80c8cd8b-db54-4c08-882d-8160192f1290',
   'name': '李炳淑'},
  {'alias': None,
   'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43',
   'name': '焦宝宏'},
  {'alias': None,
   'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659',
   'name': '沈雁西'},
  {'alias': 'Lu Baiping',
   'mbid': '12ee725e-ca6a-45d3-a4b4-e5a5969f062a',
   'name': '陆柏平'},
  {'alias': 'Fang Xiaoya',
   'mbid': '123c78b2-9444-492b-8b0f-406436a0c232',
   'name': '方小亚'},
  {'alias': 'Zhang Qihong',
   'mbid': 'f29163e5-bdac-48ff-92a5-08cf41

# Artist analysis

We have over 300 artists in the database, including both actors and instrumentalists. We can collect all of the metadata that we have for each artist.

Actors have a particular _role type_ in which they perform. We can group these artists by their role type and list the top few artists for each one, ordered by the number of their recordings in the collection.

In [6]:
artist_list = jingju.get_artists()
artists = {}
for artist in artist_list:
    artists[artist['mbid']] = jingju.get_artist(artist['mbid'])
    
actors = [a for a in artists.values() if a['role_type']]
instrumentalists = [a for a in artists.values() if a['instrument']]

## Count recordings by role type
For a brief introduction to role types in Jingju, see https://en.wikipedia.org/wiki/Peking_opera#Classification_of_performers_and_roles

These are the role types in the Jingju dataset of Dunya:

| name | transliteration |
| :----: | :--------------- |
| 旦   |  dan            |
| 老旦 |  laodan         |
| 小生 | xiaosheng       |
| 老生 | laosheng        |
| 净   | jing            |
| 末   | mo              |

We can get a list of all recordings and the actors who perform them, and find all recordings by all actors who perform a given role type.

In [None]:
# Detailed information for all recordings
recordings = {}
recording_list = jingju.get_recordings()
for r in recording_list:
    recordings[r['mbid']] = jingju.get_recording(r['mbid'])
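
The loop above makes one request per recording, so re-running the notebook repeats every request. One option is to keep the results in a local JSON file. This is a sketch under assumptions: `fetch_all_cached` and the cache file name are hypothetical, and the fetch function is passed in so that the helper does not depend on any particular API.

```python
import json
import os


def fetch_all_cached(mbids, fetch_one, cache_path):
    """Fetch details for every MBID, reusing a JSON cache file when present."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    else:
        cache = {}
    for mbid in mbids:
        if mbid not in cache:
            # Only MBIDs missing from the cache trigger a new request.
            cache[mbid] = fetch_one(mbid)
    with open(cache_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese titles readable in the file.
        json.dump(cache, f, ensure_ascii=False)
    return cache


# In the notebook this could be used as:
# recordings = fetch_all_cached(
#     [r['mbid'] for r in recording_list], jingju.get_recording, 'recordings.json')
```

Delete the cache file whenever you want fresh metadata from Dunya.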

In [None]:
# Map each role type to a Counter of {artist mbid: number of recordings by that artist}
rt_artist = collections.defaultdict(collections.Counter)
for a in actors:
    roletype = a['role_type']
    rt_artist[roletype['transliteration']][a['mbid']] = len(a['recordings'])

In [None]:
# Show all role types and the top-performing actors for each type
for rt, rtas in rt_artist.items():
    print('Roletype: {}'.format(rt))
    print('  Number of artists: {}'.format(len(rtas)))
    top_artists = rtas.most_common(2)
    print('  Top artists')
    for a, numrec in top_artists:
        artist = artists[a]
        print('    {} [ {}, https://musicbrainz.org/artist/{} ] ({} recordings)'.format(artist['name'], artist['alias'], a, numrec))
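
To see how the `defaultdict(Counter)` structure behaves, here is the same grouping applied to a few made-up artist records. The MBIDs and counts are placeholders, not data from the corpus.

```python
import collections

# Hypothetical artist records with only the fields that the grouping reads.
sample_actors = [
    {"mbid": "a1", "role_type": {"transliteration": "dan"}, "recordings": [{}, {}, {}]},
    {"mbid": "a2", "role_type": {"transliteration": "dan"}, "recordings": [{}]},
    {"mbid": "a3", "role_type": {"transliteration": "laosheng"}, "recordings": [{}, {}]},
]

# Outer key: role type; inner Counter: artist mbid -> number of recordings.
rt_counts = collections.defaultdict(collections.Counter)
for a in sample_actors:
    rt_counts[a["role_type"]["transliteration"]][a["mbid"]] = len(a["recordings"])

top_dan = rt_counts["dan"].most_common(1)
print(top_dan)  # [('a1', 3)]
```

`most_common(n)` returns the `n` entries with the highest counts, which is why the inner values are stored in a `Counter` rather than a plain dict.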

We can select all of the recordings in which at least one actor performs a given role type.

In [None]:
def get_recordings_by_role_type(recordings, artists, role_type):
    rec_set = set()
    for recid, recording in recordings.items():
        for perf in recording['performers']:
            artist_mbid = perf['mbid']
            artist = artists[artist_mbid]
            if artist['role_type'] and artist['role_type']['name'] == role_type:
                rec_set.add(recid)
    return rec_set

In [None]:
xiaosheng_recs = get_recordings_by_role_type(recordings, artists, u'小生')
print('Number of recordings with roletype xiaosheng/小生: {}'.format(len(xiaosheng_recs)))
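
The function above looks up every performer in `artists` and raises a `KeyError` if one is missing. If that happens with your data, a more tolerant variant might look like the sketch below. The name `recordings_by_role_type` and the sample records are illustrative only.

```python
def recordings_by_role_type(recordings, artists, role_type):
    """Return MBIDs of recordings with at least one performer of the given role type."""
    matches = set()
    for recid, recording in recordings.items():
        for perf in recording["performers"]:
            artist = artists.get(perf["mbid"])  # None if the performer is unknown
            if artist and artist.get("role_type") and artist["role_type"]["name"] == role_type:
                matches.add(recid)
                break  # one matching performer is enough for this recording
    return matches


# Made-up data in the same shape as the notebook's dictionaries.
sample_artists = {
    "a1": {"role_type": {"name": "小生"}},
    "a2": {"role_type": None},
}
sample_recordings = {
    "r1": {"performers": [{"mbid": "a1"}, {"mbid": "a2"}]},
    "r2": {"performers": [{"mbid": "a2"}, {"mbid": "unknown"}]},
}
result = recordings_by_role_type(sample_recordings, sample_artists, "小生")
print(result)  # {'r1'}
```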

## Instrumentalists

We can see which instrumentalists have played with the largest number of distinct actors.

In [None]:
instrumentalist_actor_count = collections.Counter()
for artist in instrumentalists:
    inst_actors = set()
    artist_recordings = artist['recordings']
    for rec in artist_recordings:
        recording = recordings[rec['mbid']]
        for performer in recording['performers']:
            inst_actors.add(performer['mbid'])
    instrumentalist_actor_count[artist['mbid']] = len(inst_actors)

In [None]:
print('Instrumentalists who have played with the most actors')
for instrumentalist_mbid, count in instrumentalist_actor_count.most_common(10):
    artist = artists[instrumentalist_mbid]
    print('{} [ {} https://musicbrainz.org/artist/{} ] ({} actors)'.format(artist['name'], artist['alias'], artist['mbid'], count))
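
In the recording shown earlier, the `performers` list also contains the two instrumentalists. The count above therefore includes the instrumentalist and any fellow instrumentalists. If you want to count actors only, one possible approach is to keep just the performers that have a role type. This is a sketch: `count_distinct_actors` is a hypothetical helper, and treating "has a `role_type`" as "is an actor" is an assumption that follows the filter used earlier in this notebook.

```python
import collections


def count_distinct_actors(instrumentalists, recordings, artists):
    """Count, per instrumentalist, the distinct co-performers that have a role type."""
    counts = collections.Counter()
    for inst in instrumentalists:
        seen = set()
        for rec in inst["recordings"]:
            recording = recordings.get(rec["mbid"])
            if recording is None:
                continue  # recording details were not fetched
            for performer in recording["performers"]:
                other = artists.get(performer["mbid"])
                if other and other.get("role_type"):
                    seen.add(performer["mbid"])
        seen.discard(inst["mbid"])  # never count the instrumentalist themself
        counts[inst["mbid"]] = len(seen)
    return counts


# Made-up data: one instrumentalist (i1) and two actors (a1, a2).
sample_artists = {
    "i1": {"mbid": "i1", "role_type": None, "recordings": [{"mbid": "r1"}, {"mbid": "r2"}]},
    "a1": {"mbid": "a1", "role_type": {"name": "旦"}, "recordings": []},
    "a2": {"mbid": "a2", "role_type": {"name": "老生"}, "recordings": []},
}
sample_recordings = {
    "r1": {"performers": [{"mbid": "a1"}, {"mbid": "i1"}]},
    "r2": {"performers": [{"mbid": "a1"}, {"mbid": "a2"}, {"mbid": "i1"}]},
}
actor_counts = count_distinct_actors([sample_artists["i1"]], sample_recordings, sample_artists)
print(actor_counts["i1"])  # 2
```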

# Recording analysis

Recordings are characterised by a shengqiang and a banshi. We can find the most common combinations in the collection of recordings that we have.

In [None]:
shb_count = collections.Counter()
for mbid, r in recordings.items():
    for s in r['shengqiangbanshi']:
        shb_count[s['name']] += 1

In [None]:
for shb, count in shb_count.most_common():
    print('{}: {} recording{}'.format(shb, count, '' if count == 1 else 's'))

In [None]:
def get_recordings_for_shengqiangbanshi(shengqiangbanshi):
    ret = []
    for mbid, r in recordings.items():
        for s in r['shengqiangbanshi']:
            if s['name'] == shengqiangbanshi:
                ret.append(r)
    return ret

In [None]:
xipi_duoban_recordings = get_recordings_for_shengqiangbanshi('xipi duoban')
xipi_duoban_recordings
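
If you need to look up several shengqiangbanshi values, scanning all recordings for each query can be replaced by building an index once. The following is a sketch only: `index_by_shengqiangbanshi` is a hypothetical helper, and the sample values are placeholders in the same shape as the `shengqiangbanshi` field.

```python
import collections


def index_by_shengqiangbanshi(recordings):
    """Map each shengqiangbanshi name to the set of recording MBIDs that use it."""
    index = collections.defaultdict(set)
    for mbid, r in recordings.items():
        for s in r["shengqiangbanshi"]:
            index[s["name"]].add(mbid)  # a set avoids listing a recording twice
    return index


# Made-up recordings with only the field that the index reads.
sample_recordings = {
    "r1": {"shengqiangbanshi": [{"name": "xipi duoban"}, {"name": "xipi yuanban"}]},
    "r2": {"shengqiangbanshi": [{"name": "xipi duoban"}]},
}
shb_index = index_by_shengqiangbanshi(sample_recordings)
print(sorted(shb_index["xipi duoban"]))  # ['r1', 'r2']
```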

## Download recordings by mbids

You can download recordings with ``jingju.download_mp3(recordingid, location)``. The parameter *recordingid* is the MBID of the recording, and the parameter *location* is the directory in which to save the MP3 file.


In [None]:
# makedirs with exist_ok=True does not fail if the directory already exists
os.makedirs('./xiaosheng_recordings', exist_ok=True)
for rec in xiaosheng_recs:
    jingju.download_mp3(rec, './xiaosheng_recordings')
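
A long download run can be interrupted by a single failed request. The sketch below keeps going and reports which MBIDs failed. It is not part of the Dunya API: `download_all` is a hypothetical helper, the download function is passed in as an argument, and catching `Exception` is a deliberately broad choice because the exceptions raised by `jingju.download_mp3` are not documented here.

```python
import os


def download_all(mbids, location, download_fn):
    """Download each recording into location and return the MBIDs that failed."""
    os.makedirs(location, exist_ok=True)
    failed = []
    for mbid in sorted(mbids):  # sorted so that runs are reproducible
        try:
            download_fn(mbid, location)
        except Exception as exc:
            print("Could not download %s: %s" % (mbid, exc))
            failed.append(mbid)
    return failed


# In the notebook this could be used as:
# failed = download_all(xiaosheng_recs, './xiaosheng_recordings', jingju.download_mp3)
```

The returned list can be passed back into `download_all` to retry only the failures.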