In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Crawling xml songs database:

We will start from a set of xml song files, that explicitly describe harmony, beat, melody, tempo and other musical features. After that, we use them in order to create a database with audio content, converting xml files to midi. It means that, initially, we are going to work with synthetically generated data, due to its facility to synchronize audio and chord labels. 

#### Jazz repo:

In [59]:
url_base = 'https://effendi.me/jazz/repo/'
iters = ['I/', 'II/', 'III/', 'IV_Part-1/', 'IV_Part-2/', 'V/']
url_list = []
for i in iters:
    url = url_base + i
    site = requests.get(url)
    soup = BeautifulSoup(site.text, 'lxml')
#     print(url, len(soup.find_all('a')))
    songs = soup.find_all('a')
    for link in songs:
        link = link.get('href')
#         print(links)
        if link.startswith('?'):
            pass
        else:
            url_list.append(url+link)

In [85]:
del_links = ['https://effendi.me/jazz/repo/I//jazz/repo/', 
             'https://effendi.me/jazz/repo/II//jazz/repo/', 
             'https://effendi.me/jazz/repo/III//jazz/repo/'
             'https://effendi.me/jazz/repo/IV_Part-1//jazz/repo/', 
             'https://effendi.me/jazz/repo/IV_Part-2//jazz/repo/', 
             'https://effendi.me/jazz/repo/V//jazz/repo/']

url_list = [l for l in url_list if l not in del_links]
url_list

['https://effendi.me/jazz/repo/I/630blues.xml',
 'https://effendi.me/jazz/repo/I/728.xml',
 'https://effendi.me/jazz/repo/I/1974%20Blues.xml',
 'https://effendi.me/jazz/repo/I/AllBlues%201.xml',
 'https://effendi.me/jazz/repo/I/Ambidextrous.xml',
 'https://effendi.me/jazz/repo/I/Amsterdam%20After%20DarkII.5.xml',
 'https://effendi.me/jazz/repo/I/A%20smooth%20One.xml',
 'https://effendi.me/jazz/repo/I/Ballad.xml',
 'https://effendi.me/jazz/repo/I/Blues%20For%20Junior.Pyramid.xml',
 'https://effendi.me/jazz/repo/I/Blues%20after%20Dark%20II.xml',
 'https://effendi.me/jazz/repo/I/Blues%20for%20PK.xml',
 'https://effendi.me/jazz/repo/I/Bluesinthebasement.xml',
 'https://effendi.me/jazz/repo/I/Bold%20and%20Black.xml',
 'https://effendi.me/jazz/repo/I/Boogie%20Stop%20Shuffle.xml',
 'https://effendi.me/jazz/repo/I/Boogie%20Woogie%20Bossa%20Nova.xml',
 'https://effendi.me/jazz/repo/I/Cantaloupe%20Island.xml',
 'https://effendi.me/jazz/repo/I/Chicago%20Seranade.xml',
 'https://effendi.me/jazz/re

Saving list of links in a txt file:

In [61]:
with open('corpus/xml_songs.txt', 'w') as f:
    for item in url_list:
        f.write("%s\n" % item)

## Music 21
Testing music21 to generate midi files:

In [63]:
import music21

In [87]:
# creating midi songs corpus
for url in url_list:
    song = music21.converter.parse(url)
    title = url.split('/')[-1].split('.xml')[0]
    song.write('midi', 'corpus/midi_music21/' + title + '.mid')
    print(title)

728
1974%20Blues
AllBlues%201
Ambidextrous
Amsterdam%20After%20DarkII.5
A%20smooth%20One
Ballad
Blues%20For%20Junior.Pyramid
Blues%20after%20Dark%20II
Blues%20for%20PK
Bluesinthebasement
Bold%20and%20Black
Boogie%20Stop%20Shuffle
Boogie%20Woogie%20Bossa%20Nova
Cantaloupe%20Island
Chicago%20Seranade
Cissy%20Strut%20III
Cissy%20Strut%20IV
Clockwise
Cold%20Duck%20B%20with%20bass
Cold%20duck%20Lead
Crisis
CrisisC%23
Cryin%20Blues
Dangerous%20Curves%20A
Dangerous%20Curves%20B
Greenonions
Groovy%20Samba
Hattie%20Wall%20-%20001%20Alto%20Sax%201
Hattie%20Wall%20-%20002%20Alto%20Sax%202
Hattie%20Wall%20-%20003%20Tenor%20Sax
Hattie%20Wall%20-%20004%20Mallets
Hattie%20Wall%20-%20005%20Baritone%20Sax
Hattie%20Wall%20-%20006%20Drum%20Set
Hattie%20Wall
IDWNOBY
IdleMoments
Instant%20Death
It's%20All%20Right%20Now
James%20Bond
Jean%20De%20Fleur
Jive%20Samba
Just%20the%20Two%20of%20Us
ListenHereNew
Love%20For%20Sale
Lovely%20is%20Today
Mating%20Call
Mean%20GreansII
Midnight%20Blue
MoBetterII
More%20Sou

ConverterException: cannot determine file format of url: https://effendi.me/jazz/repo/III//jazz/repo/

In [80]:
url_list[7].split('/')[-1].split('.xml')[0]

'A%20smooth%20One'

In [64]:
c = music21.converter.parse('https://effendi.me/jazz/repo/III/HerbsAndRoots.xml')
# c.show('musicxml.pdf')
c.show('midi')

In [72]:
c.write('midi', 'teste.mid')

'teste.mid'