# The Jingju Music Corpus

This notebook shows how to download and analyse the Jingju Music Corpus in Dunya. The corpus comprises 1698 recordings. The metadata comes from MusicBrainz and is imported into Dunya, where it is stored in a data model that represents the specific characteristics of this musical culture. You can select all or part of the corpus and download the data.

To download audio from Dunya, you need a user account and an API authentication key (token). Create an account at https://dunya.compmusic.upf.edu/social/register/ and log in to Dunya; your API token is shown on your profile page.

In [1]:
import collections
import os

import compmusic
from compmusic import dunya
from compmusic.dunya import jingju

In [2]:
# Set your token here from https://dunya.compmusic.upf.edu/user/profile/
dunya.set_token('')
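
If you would rather not store the token in the notebook itself, one option is to read it from an environment variable. The sketch below assumes a variable named `DUNYA_TOKEN`; that name and the `read_token` helper are hypothetical and chosen only for illustration, they are not defined by Dunya or `compmusic`.

```python
import os


def read_token(env_var='DUNYA_TOKEN'):
    """Return the API token stored in an environment variable.

    The variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError(
            'Set the {} environment variable to your Dunya API token'.format(env_var))
    return token


# Usage in the notebook (requires the compmusic package):
# dunya.set_token(read_token())
```

Raising an error early when the variable is missing makes a forgotten token easier to spot than a failed request later on.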

In [3]:
jingju.set_collections(['40d0978b-0796-4734-9fd4-2b3ebe0f664c'])

## Getting basic data from the collection
You can get all recordings in the collection you have set by using ``jingju.get_recordings()``.

In [4]:
recs = jingju.get_recordings()
print('This collection has %s recordings' % len(recs))
first = recs[0]
print('The first recording: mbid - %s, title: %s' % (first['mbid'], first['title']))
recs[0:5]

This collection has 1698 recordings
The first recording: mbid - 4eb558fa-85a5-41cb-8ca3-816b7f6c2778, title: 目连救母 【听一言不由我喜之不尽】 （胡璇）


[{'mbid': '4eb558fa-85a5-41cb-8ca3-816b7f6c2778',
  'title': '目连救母 【听一言不由我喜之不尽】 （胡璇）'},
 {'mbid': '8dbb1a7e-1ade-41ad-acf1-4c45924304b2',
  'title': '铡美案 【包龙图打坐在开封府】 （安平）'},
 {'mbid': '58ddf00f-0a4c-4b1d-bac6-589f8e31970c',
  'title': '穆桂英挂帅 【小儿女探军情尚无音信】 （史敏）'},
 {'mbid': '3604d31b-fbd3-4040-8a6e-b426569cc8e9',
  'title': '郑板桥 【黑云翻墨滚长天】 （李军）'},
 {'mbid': 'ca119190-ed6f-442e-a9d0-d8c5a09b70ef', 'title': '《玉堂春》'}]
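
Each item in the returned list is a dictionary with `mbid` and `title` keys, as the output above shows. A minimal sketch of turning such a list into a lookup table, using two entries copied from that output in place of a live API call (`sample_recs` and `title_by_mbid` are names introduced here for illustration):

```python
# Two entries copied from the output above; in the notebook this would be `recs`.
sample_recs = [
    {'mbid': '4eb558fa-85a5-41cb-8ca3-816b7f6c2778',
     'title': '目连救母 【听一言不由我喜之不尽】 （胡璇）'},
    {'mbid': 'ca119190-ed6f-442e-a9d0-d8c5a09b70ef', 'title': '《玉堂春》'},
]

# Map each MBID to its title so that a title can be found without scanning the list.
title_by_mbid = {r['mbid']: r['title'] for r in sample_recs}

print(title_by_mbid['ca119190-ed6f-442e-a9d0-d8c5a09b70ef'])
```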

There are other methods available to retrieve additional information. Documentation for these methods is available at http://dunya.compmusic.upf.edu/docs/

To get detailed information about a recording, use `jingju.get_recording(mbid)`.

In [5]:
rec = jingju.get_recording('4ae1d3e8-5c83-4567-965d-280aaac95ad0')
print('The release which this recording belongs to: %s' % rec['release'][0]['title'])
print('The performers of this recording:')
for perf in rec['performers']:
    print('  {} [{}]'.format(perf['name'], perf['alias']))
print('Instruments played in this recording:')
for instrumentalist in rec['instrumentalists']:
    print(' ', instrumentalist['instrument']['name'])
rec

The release which this recording belongs to: 京剧名家名剧： 白蛇传
The performers of this recording:
  李炳淑 [Li Bingshu]
  焦宝宏 [None]
  沈雁西 [None]
  陆柏平 [Lu Baiping]
  方小亚 [Fang Xiaoya]
  张启洪 [Zhang Qihong]
Instruments played in this recording:
  jinghu
  bangu


{'instrumentalists': [{'artist': {'alias': None,
    'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659',
    'name': '沈雁西'},
   'instrument': {'mbid': '89e4a2ef-172f-4f50-a507-316917a9b98a',
    'name': 'jinghu'}},
  {'artist': {'alias': None,
    'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43',
    'name': '焦宝宏'},
   'instrument': {'mbid': '42349583-c10d-4c6e-b553-28d916113856',
    'name': 'bangu'}}],
 'mbid': '4ae1d3e8-5c83-4567-965d-280aaac95ad0',
 'performers': [{'alias': 'Li Bingshu',
   'mbid': '80c8cd8b-db54-4c08-882d-8160192f1290',
   'name': '李炳淑'},
  {'alias': None,
   'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43',
   'name': '焦宝宏'},
  {'alias': None,
   'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659',
   'name': '沈雁西'},
  {'alias': 'Lu Baiping',
   'mbid': '12ee725e-ca6a-45d3-a4b4-e5a5969f062a',
   'name': '陆柏平'},
  {'alias': 'Fang Xiaoya',
   'mbid': '123c78b2-9444-492b-8b0f-406436a0c232',
   'name': '方小亚'},
  {'alias': 'Zhang Qihong',
   'mbid': 'f29163e5-bdac-48ff-92a5-08cf41
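
In the recording above, the two instrumentalists also appear in the `performers` list. The following sketch works on a trimmed copy of that dictionary (only the fields it needs) and separates the performers who are not credited with an instrument; the variable names are illustrative.

```python
# A trimmed copy of the recording shown above (only the fields used here).
sample_rec = {
    'performers': [
        {'alias': 'Li Bingshu', 'mbid': '80c8cd8b-db54-4c08-882d-8160192f1290', 'name': '李炳淑'},
        {'alias': None, 'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43', 'name': '焦宝宏'},
        {'alias': None, 'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659', 'name': '沈雁西'},
    ],
    'instrumentalists': [
        {'artist': {'alias': None, 'mbid': 'c7e0d3f5-2379-427e-9807-fb97d7483659', 'name': '沈雁西'},
         'instrument': {'mbid': '89e4a2ef-172f-4f50-a507-316917a9b98a', 'name': 'jinghu'}},
        {'artist': {'alias': None, 'mbid': '15b4ef4b-ad03-41d6-864f-66b1c436db43', 'name': '焦宝宏'},
         'instrument': {'mbid': '42349583-c10d-4c6e-b553-28d916113856', 'name': 'bangu'}},
    ],
}

# MBID of each instrumentalist -> name of the instrument they play.
instrument_by_artist = {
    i['artist']['mbid']: i['instrument']['name']
    for i in sample_rec['instrumentalists']
}

# Performers who are not listed as instrumentalists in this recording.
non_instrumentalists = [
    p['name'] for p in sample_rec['performers']
    if p['mbid'] not in instrument_by_artist
]
print(non_instrumentalists)
```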

# Artist analysis

The database contains over 300 artists, including both actors and instrumentalists. We can collect all of the metadata available for each artist.

In [6]:
artist_list = jingju.get_artists(artist_detail=True)
artists = {}
for artist in artist_list:
    artists[artist['mbid']] = artist
    
actors = [a for a in artists.values() if a['role_type']]
instrumentalists = [a for a in artists.values() if a['instrument']]

## Count recordings by role type
A brief introduction to role types in jingju: https://en.wikipedia.org/wiki/Peking_opera#Classification_of_performers_and_roles

Actors and actresses in jingju specialize in the performance of a role type. These acting categories determine the set of conventions each artist should master, including those related to singing. Different role types emphasize different performance skills. The role types that focus on singing are 老生 laosheng, 旦 dan, 净 jing, 小生 xiaosheng, and 老旦 laodan.

We can get a list of all recordings and the actors who perform them, and find all recordings by all actors who perform a given role type.

In [7]:
# Detailed information for all recordings
recordings = {}
recording_list = jingju.get_recordings(recording_detail=True)
for r in recording_list:
    recordings[r['mbid']] = r

We can group these artists by role type and list the top few artists for each one, ordered by the number of their recordings in the collection.

In [8]:
# Map each role type to a Counter of {artist MBID: number of recordings by that artist}
rt_artist = collections.defaultdict(collections.Counter)
for a in actors:
    roletype = a['role_type']
    rt_artist[roletype['transliteration']][a['mbid']] = len(a['recordings'])
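
The nested `defaultdict(Counter)` structure used above can be tried on a small scale. The sketch below uses made-up artists in the same shape (the MBIDs, counts, and `toy_` names are hypothetical) to show how `most_common` ranks artists within a role type and how per-role-type totals can be derived.

```python
import collections

# Hypothetical artists in the shape used above; MBIDs and counts are made up.
toy_actors = [
    {'mbid': 'a1', 'role_type': {'transliteration': 'laosheng'}, 'recordings': [{}, {}, {}]},
    {'mbid': 'a2', 'role_type': {'transliteration': 'laosheng'}, 'recordings': [{}]},
    {'mbid': 'a3', 'role_type': {'transliteration': 'dan'}, 'recordings': [{}, {}]},
]

toy_rt_artist = collections.defaultdict(collections.Counter)
for a in toy_actors:
    toy_rt_artist[a['role_type']['transliteration']][a['mbid']] = len(a['recordings'])

# most_common(n) returns (key, count) pairs sorted by descending count.
top_laosheng = toy_rt_artist['laosheng'].most_common(1)
print(top_laosheng)

# Sum of per-artist counts for each role type. A recording shared by two
# actors of the same role type contributes to both of their counts.
totals = {rt: sum(counter.values()) for rt, counter in toy_rt_artist.items()}
print(totals)
```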

In [9]:
# Show all roletypes and the top performing actors for that type
for rt, rtas in rt_artist.items():
    print('Roletype: {}'.format(rt))
    print('  Number of artists: {}'.format(len(rtas)))
    top_artists = rtas.most_common(2)
    print('  Top artists')
    for a, numrec in top_artists:
        artist = artists[a]
        print('    {} [ {}, https://musicbrainz.org/artist/{} ] ({} recordings)'.format(artist['name'], artist['alias'], a, numrec))

Roletype: jing
  Number of artists: 8
  Top artists
    孟广禄 [ Meng Guanglu, https://musicbrainz.org/artist/3583396f-8a8c-4dd7-8deb-da79cf268c11 ] (64 recordings)
    安平 [ An Ping, https://musicbrainz.org/artist/24c43dbd-ef9b-47cd-bb6a-7118f7a78c7a ] (20 recordings)
Roletype: chou
  Number of artists: 4
  Top artists
    徐孟珂 [ Xu Mengke, https://musicbrainz.org/artist/c48b1b38-7555-40d1-b5ca-01a5e4edd834 ] (4 recordings)
    严庆谷 [ Yan Qinggu, https://musicbrainz.org/artist/a963bdbd-dfa0-493c-a5a5-31b4c2b690f4 ] (4 recordings)
Roletype: laosheng
  Number of artists: 25
  Top artists
    于魁智 [ Yu Kuizhi, https://musicbrainz.org/artist/ee527a26-dfce-48e3-851d-fb6c2d223c9a ] (93 recordings)
    朱宝光 [ Zhu Baoguang, https://musicbrainz.org/artist/dcb60e61-a1a6-4524-8b7a-2df4b0521e22 ] (82 recordings)
Roletype: laodan
  Number of artists: 11
  Top artists
    赵葆秀 [ Zhao Baoxiu, https://musicbrainz.org/artist/c192c1ad-fe47-4ae3-b88f-3aeb1bd72cc4 ] (147 recordings)
    吕昕 [ Lü Xin, https://music

We can select all of the recordings in which an actor performs a given role type.

In [10]:
def get_recordings_by_role_type(recordings, artists, role_type):
    """Return the MBIDs of recordings that have a performer of the given role type."""
    rec_set = set()
    for recid, recording in recordings.items():
        for perf in recording['performers']:
            # Performers missing from `artists` are skipped instead of raising KeyError
            artist = artists.get(perf['mbid'])
            if artist and artist['role_type'] and artist['role_type']['name'] == role_type:
                rec_set.add(recid)
    return rec_set

In [11]:
xiaosheng_recs = get_recordings_by_role_type(recordings, artists, u'小生')
print('Number of recordings with roletype xiaosheng/小生: {}'.format(len(xiaosheng_recs)))

Number of recordings with roletype xiaosheng/小生: 71
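
When several role types are of interest, it may be more convenient to build the whole mapping in a single pass rather than scanning the recordings once per role type. The following is a sketch of that idea on made-up data; `index_recordings_by_role_type` and the `toy_` dictionaries are hypothetical and only mirror the shape of `recordings` and `artists`.

```python
import collections


def index_recordings_by_role_type(recordings, artists):
    """Return a dict mapping role type name -> set of recording MBIDs."""
    index = collections.defaultdict(set)
    for recid, recording in recordings.items():
        for perf in recording['performers']:
            artist = artists.get(perf['mbid'])
            # Skip performers missing from the artist list or without a role type.
            if artist and artist.get('role_type'):
                index[artist['role_type']['name']].add(recid)
    return index


# Hypothetical data in the same shape as `recordings` and `artists`.
toy_artists = {
    'a1': {'role_type': {'name': '小生'}},
    'a2': {'role_type': {'name': '老旦'}},
    'a3': {'role_type': None},
}
toy_recordings = {
    'r1': {'performers': [{'mbid': 'a1'}, {'mbid': 'a3'}]},
    'r2': {'performers': [{'mbid': 'a1'}, {'mbid': 'a2'}]},
    'r3': {'performers': [{'mbid': 'unknown'}]},
}
toy_index = index_recordings_by_role_type(toy_recordings, toy_artists)
print(sorted(toy_index['小生']))
```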


## Instrumentalists

We can see which instrumentalists have played with the largest number of distinct actors.

In [12]:
instrumentalist_actor_count = collections.Counter()
for artist in instrumentalists:
    inst_actors = set()
    artist_recordings = artist['recordings']
    for rec in artist_recordings:
        recording = recordings[rec['mbid']]
        for performer in recording['performers']:
            inst_actors.add(performer['mbid'])
    instrumentalist_actor_count[artist['mbid']] = len(inst_actors)
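
Because a recording's `performers` list can include instrumentalists as well as actors (as in the sample recording earlier in this notebook), the count above covers every credited performer. A possible variation, sketched here on made-up data, counts only performers that have a role type; `count_distinct_actors` and the `toy_` dictionaries are hypothetical names.

```python
def count_distinct_actors(instrumentalist, recordings, artists):
    """Count distinct performers with a role type who share a recording with `instrumentalist`."""
    seen = set()
    for rec in instrumentalist['recordings']:
        recording = recordings.get(rec['mbid'])
        if recording is None:
            continue  # recording is not in the loaded collection
        for performer in recording['performers']:
            artist = artists.get(performer['mbid'])
            # Only performers with a role type are treated as actors.
            if artist and artist.get('role_type'):
                seen.add(performer['mbid'])
    return len(seen)


# Hypothetical data: one instrumentalist (i1) and two actors (a1, a2).
toy_artists = {
    'i1': {'mbid': 'i1', 'role_type': None,
           'recordings': [{'mbid': 'r1'}, {'mbid': 'r2'}, {'mbid': 'r9'}]},
    'a1': {'mbid': 'a1', 'role_type': {'name': '旦'}, 'recordings': []},
    'a2': {'mbid': 'a2', 'role_type': {'name': '净'}, 'recordings': []},
}
toy_recordings = {
    'r1': {'performers': [{'mbid': 'a1'}, {'mbid': 'i1'}]},
    'r2': {'performers': [{'mbid': 'a1'}, {'mbid': 'a2'}, {'mbid': 'i1'}]},
}
toy_count = count_distinct_actors(toy_artists['i1'], toy_recordings, toy_artists)
print(toy_count)
```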

In [13]:
print('Instrumentalists who have played with the most actors')
for instrumentalist_mbid, count in instrumentalist_actor_count.most_common(10):
    artist = artists[instrumentalist_mbid]
    print('{} [ {} https://musicbrainz.org/artist/{} ] ({} actors)'.format(artist['name'], artist['alias'], artist['mbid'], count))

Instrumentalists who have played with the most actors
胡希芳 [ Hu Xifang https://musicbrainz.org/artist/3b3aa74a-190d-46d5-af76-9230bbe9db3b ] (27 actors)
赵旭 [ Zhao Xu https://musicbrainz.org/artist/19c34660-fe86-4a80-8d1f-4c8cd5dae46c ] (26 actors)
尚东辉 [ Shang Donghui https://musicbrainz.org/artist/9962b7a5-796b-47b1-baf8-8bf7e17e0161 ] (25 actors)
赵琪 [ Zhao Qi https://musicbrainz.org/artist/45ca544b-360e-4f1f-96f9-d8a623bb2247 ] (25 actors)
叶铁森 [ Ye Tiesen https://musicbrainz.org/artist/26ac0b37-7004-4c3d-987b-c85b060eabb3 ] (25 actors)
霍建华 [ Huo Jianhua https://musicbrainz.org/artist/96269b3a-48bf-4533-a562-4499d6c9571a ] (25 actors)
崔玉坤 [ Cui Yukun https://musicbrainz.org/artist/678d3efd-b42c-4efe-a58e-af87c5dde89b ] (25 actors)
李金平 [ Li Jinping https://musicbrainz.org/artist/26e0046b-b82c-4470-9a02-c1f05406dcd8 ] (24 actors)
谢温之 [ Xie Wenzhi https://musicbrainz.org/artist/cda852a7-6776-403a-bf00-4607ad6f93f7 ] (23 actors)
张顺翔 [ Zhang Shunxiang https://musicbrainz.org/artist/5b19e243-

# Obtaining information about _shengqiang_ and _banshi_

Music in traditional jingju is arranged according to a series of orally transmitted creative principles. Among them, 声腔 shengqiang mainly accounts for the melodic material, and 板式 banshi mainly accounts for the metre and tempo in which this material is rendered. Recordings in the corpus are tagged with the corresponding information about shengqiang and banshi. Each tag combines a particular shengqiang with a particular banshi. Since an aria is commonly arranged in more than one banshi, most recordings have more than one shengqiangbanshi tag.
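
As an illustration of how a combined tag breaks down, the sketch below splits transliterated tag names such as `xipi liushui` into a shengqiang part and a banshi part. It assumes that the shengqiang comes first and is separated from the banshi by a single space, which matches the tag names appearing in this notebook but is not a documented guarantee; tags without a space, such as `sipingdiao`, are returned with no banshi. The `split_tag` helper is hypothetical.

```python
import collections


def split_tag(tag):
    """Split a transliterated tag into (shengqiang, banshi); banshi is None if absent."""
    shengqiang, _, banshi = tag.partition(' ')
    return shengqiang, banshi or None


sample_tags = ['xipi liushui', 'xipi yuanban', 'erhuang yuanban', 'sipingdiao']

# Group the banshi found for each shengqiang.
by_shengqiang = collections.defaultdict(list)
for tag in sample_tags:
    sq, bs = split_tag(tag)
    by_shengqiang[sq].append(bs)

print(dict(by_shengqiang))
```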

In [14]:
shb_count = collections.Counter()
for mbid, r in recordings.items():
    for s in r['shengqiangbanshi']:
        shb_count[s['name']] += 1

In [15]:
for shb, count in shb_count.most_common():
    print('{}: {} recording{}'.format(shb, count, '' if count == 1 else 's'))

二黄原板: 310 recordings
erhuang yuanban: 306 recordings
xipi liushui: 258 recordings
西皮流水: 258 recordings
xipi yuanban: 238 recordings
西皮原板: 238 recordings
西皮导板: 237 recordings
xipi daoban: 234 recordings
西皮散板: 198 recordings
xipi sanban: 197 recordings
西皮摇板: 187 recordings
xipi yaoban: 185 recordings
西皮二六: 182 recordings
xipi er'liu: 177 recordings
二黄散板: 176 recordings
erhuang sanban: 174 recordings
二黄慢板: 172 recordings
erhuang manban: 170 recordings
西皮快板: 147 recordings
xipi kuaiban: 147 recordings
二黄导板: 130 recordings
erhuang daoban: 129 recordings
西皮慢板: 108 recordings
xipi manban: 105 recordings
二黄回龙: 98 recordings
erhuang huilong: 96 recordings
fan'erhuang manban: 82 recordings
反二黄慢板: 82 recordings
反二黄原板: 79 recordings
fan'erhuang yuanban: 79 recordings
sipingdiao: 57 recordings
四平调: 57 recordings
二黄快三眼: 52 recordings
erhuang kuaisanyan: 52 recordings
nanbangzi: 50 recordings
西皮三眼: 49 recordings
南梆子: 49 recordings
xipi sanyan: 49 recordings
fan'erhuang sanban: 48 recordings
反二黄散板: 48
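
The listing above mixes tag names written in Chinese characters with tag names in pinyin transliteration. One way to look at each form separately is to test whether a name contains CJK characters. The sketch below does this for a few counts copied from the output; `is_cjk` is a hypothetical helper that only checks the basic CJK Unified Ideographs range.

```python
import collections


def is_cjk(text):
    """Return True if `text` contains at least one CJK Unified Ideograph."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)


# A few counts copied from the output above.
sample_counts = collections.Counter({
    '二黄原板': 310, 'erhuang yuanban': 306,
    '西皮流水': 258, 'xipi liushui': 258,
    '四平调': 57, 'sipingdiao': 57,
})

# Split the counter by the script of the tag name.
chinese = collections.Counter({k: v for k, v in sample_counts.items() if is_cjk(k)})
pinyin = collections.Counter({k: v for k, v in sample_counts.items() if not is_cjk(k)})

print(chinese.most_common(1))
print(pinyin.most_common(1))
```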

In [16]:
def get_recordings_for_shengqiangbanshi(shengqiangbanshi):
    ret = []
    for mbid, r in recordings.items():
        for s in r['shengqiangbanshi']:
            if s['name'] == shengqiangbanshi:
                ret.append(r)
    return ret
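
A possible extension of this lookup accepts several tags at once and lists each matching recording only once, even when it carries more than one of the requested tags. This is a sketch on made-up data; `recordings_with_any_tag` and `toy_recordings` are hypothetical names.

```python
def recordings_with_any_tag(recordings, tags):
    """Return recordings carrying at least one of `tags`, each recording listed once."""
    wanted = set(tags)
    return [
        r for r in recordings.values()
        if any(s['name'] in wanted for s in r['shengqiangbanshi'])
    ]


# Hypothetical data in the same shape as `recordings`.
toy_recordings = {
    'r1': {'title': 'A', 'shengqiangbanshi': [{'name': 'xipi yuanban'}, {'name': 'xipi liushui'}]},
    'r2': {'title': 'B', 'shengqiangbanshi': [{'name': 'erhuang yuanban'}]},
    'r3': {'title': 'C', 'shengqiangbanshi': []},
}
matches = recordings_with_any_tag(toy_recordings, ['xipi yuanban', 'xipi liushui'])
print([r['title'] for r in matches])
```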

In [18]:
xipi_duoban_recordings = get_recordings_for_shengqiangbanshi('xipi duoban')
for recording in xipi_duoban_recordings:
    print(recording['title'])
    for perf in recording['performers']:
        print('  {} [{}]'.format(perf['name'], perf['alias']))


金龟记 雪冤 7. 慢搬移，禀大人
  赵葆秀 [Zhao Baoxiu]
  张达谋 [None]
  王福隆 [None]
  金惠武 [None]
《西厢记》 （凄凉萧寺春将晚）
  阮宝利 [Ruan Baoli]
辕门射戟
  叶少兰 [Ye Shaolan]
【东郭先生】 见此狼
  刘勉宗 [Liu Mianzong]
洪母骂畴：大胆狂徒歹心肠
  郭瑶瑶 [Guo Yaoyao]
《诗文会》： 听兄言不由我花容惊变
  王蓉蓉 [Wang Rongrong]
  赵莹 [Zhao Ying]
  赵旭 [Zhao Xu]
  陈敏泽 [None]
  杨广同 [Yang Guangtong]
  王松涛 [Wang Songtao]
  王军 [Wang Jun]
  陈浩 [Chen Hao]
  于海琴 [Yu Haiqin]
  李佳力 [Li Jiali]
  郑煜 [Zheng Yu]
金龟记：慢搬移，禀大人
  赵葆秀 [Zhao Baoxiu]
捌．范进中举
  张建国 [Zhang Jianguo]
  柳东 [Liu Dong]
  陈熙凯 [Chen Xikai]
  汤振刚 [Tang Zhen'gang]
  李金平 [Li Jinping]
  李世英 [Li Shiying]
四郎探母
  赵葆秀 [Zhao Baoxiu]
对花枪 1. “我的家祖居南阳地” 反二黄慢板
  康静 [Kang Jing]
  封千 [Feng Qian]
  姚利 [None]
金龟记 雪冤 7. 慢搬移，禀大人
  赵葆秀 [Zhao Baoxiu]
  张达谋 [None]
  王福隆 [None]
  金惠武 [None]
《铡美案》 选段 （包龙图打坐在开封府）
  孟广禄 [Meng Guanglu]
范进中举（西皮导板、回龙、原板）
  张建国 [Zhang Jianguo]
  柳东 [Liu Dong]
  陈熙凯 [Chen Xikai]
  汤振刚 [Tang Zhen'gang]
  李金平 [Li Jinping]
  李世英 [Li Shiying]
赵氏孤儿
  薛亚萍 [Xue Yaping]
平原作战 【西皮导板】 青纱帐举红缨一望无际
  李维康 [Li Weikang]
金龟记 雪冤 7. 慢搬移，禀大

## Download recordings by mbids

You can download recordings by using ``jingju.download_mp3(recordingid, location)``. The parameter *recordingid* is the MBID of a recording, and the parameter *location* is the directory in which to save the MP3 files.


In [None]:
os.makedirs('./xiaosheng_recordings', exist_ok=True)  # do not fail if the directory already exists
for rec in xiaosheng_recs:
    jingju.download_mp3(rec, './xiaosheng_recordings')
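
For larger selections, a single failed download would stop the loop above. The sketch below wraps the loop in a hypothetical helper, `download_all`, which takes the download function as a parameter so that it can be tried without network access. It assumes the function passed in is called the same way as `jingju.download_mp3`, that is with a recording MBID and a target directory.

```python
import os


def download_all(mbids, location, download_fn):
    """Call download_fn(mbid, location) for each MBID and collect failures.

    `download_fn` is expected to be called like `jingju.download_mp3`.
    """
    os.makedirs(location, exist_ok=True)
    failed = {}
    for mbid in sorted(mbids):
        try:
            download_fn(mbid, location)
        except Exception as exc:  # keep going and report failures at the end
            failed[mbid] = str(exc)
    return failed


# Usage in the notebook (requires a valid token):
# failed = download_all(xiaosheng_recs, './xiaosheng_recordings', jingju.download_mp3)
```

The returned dictionary maps each MBID that could not be downloaded to its error message, so the failed items can be retried later.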