In [8]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import requests
import os
from bs4 import BeautifulSoup

### Crawling the XML song database:

We start from a set of XML song files that explicitly describe harmony, beat, melody, tempo, and other musical features. We then convert these XML files to MIDI in order to build a database with audio content. This means that we initially work with synthetically generated data, because it makes synchronizing audio with chord labels easy.

#### Jazz repo:

In [3]:
url_base = 'https://effendi.me/jazz/repo/'
iters = ['I/', 'II/', 'III/', 'IV_Part-1/', 'IV_Part-2/', 'V/']
url_list = []
for i in iters:
    url = url_base + i
    site = requests.get(url)
    soup = BeautifulSoup(site.text, 'lxml')
    # every <a> tag in the directory listing is a candidate song link
    songs = soup.find_all('a')
    for link in songs:
        href = link.get('href')
        # skip anchors without an href and query-only links (they are not files)
        if href is None or href.startswith('?'):
            continue
        url_list.append(url + href)

In [4]:
url_list = [l for l in url_list if l.endswith('xml')]
#url_list
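
The link filtering above can also be sketched without network access, which makes it easier to check. The following is a minimal, dependency-free sketch that uses the standard-library `HTMLParser` in place of BeautifulSoup; the names `LinkCollector` and `xml_links`, as well as the sample listing, are hypothetical and only serve as illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.hrefs.append(href)


def xml_links(page_url, html):
    """Return absolute URLs of the .xml files linked from an HTML page."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(page_url, h) for h in collector.hrefs
            if not h.startswith('?') and h.lower().endswith('.xml')]


# hypothetical directory listing, made up for illustration
sample_html = '''
<a href="?C=N;O=D">Name</a>
<a href="630blues.xml">630blues.xml</a>
<a href="1974%20Blues.xml">1974 Blues.xml</a>
<a href="notes.txt">notes.txt</a>
'''
links = xml_links('https://effendi.me/jazz/repo/I/', sample_html)
print(links)
```

Using `urljoin` instead of string concatenation also handles listings whose links are absolute paths rather than bare file names.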

Saving the list of links to a text file:

In [5]:
with open('corpus/xml_songs.txt', 'w') as f:
    for item in url_list:
        f.write("%s\n" % item)

#### Downloading xml corpus

In [6]:
! wget -i corpus/xml_songs.txt --directory-prefix=corpus/xml

--2022-06-16 17:07:54--  https://effendi.me/jazz/repo/I/630blues.xml
Resolving effendi.me (effendi.me)... 165.227.6.59
Connecting to effendi.me (effendi.me)|165.227.6.59|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40868 (40K) [application/xml]
Saving to: “corpus/xml/test/630blues.xml”


2022-06-16 17:07:55 (192 KB/s) - “corpus/xml/test/630blues.xml” saved [40868/40868]

--2022-06-16 17:07:55--  https://effendi.me/jazz/repo/I/728.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 42919 (42K) [application/xml]
Saving to: “corpus/xml/test/728.xml”


2022-06-16 17:07:55 (18.0 MB/s) - “corpus/xml/test/728.xml” saved [42919/42919]

--2022-06-16 17:07:55--  https://effendi.me/jazz/repo/I/1974%20Blues.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 61441 (60K) [application/xml]
Salvan

HTTP request sent, awaiting response... 200 OK
Length: 49394 (48K) [application/xml]
Saving to: “corpus/xml/test/Cissy Strut IV.xml”


2022-06-16 17:08:05 (123 KB/s) - “corpus/xml/test/Cissy Strut IV.xml” saved [49394/49394]

--2022-06-16 17:08:05--  https://effendi.me/jazz/repo/I/Clockwise.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 51717 (51K) [application/xml]
Saving to: “corpus/xml/test/Clockwise.xml”


2022-06-16 17:08:05 (248 KB/s) - “corpus/xml/test/Clockwise.xml” saved [51717/51717]

--2022-06-16 17:08:05--  https://effendi.me/jazz/repo/I/Cold%20Duck%20B%20with%20bass.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 57599 (56K) [application/xml]
Saving to: “corpus/xml/test/Cold Duck B with bass.xml”


2022-06-16 17:08:06 (290 KB/s) - “corpus/xml/test/Cold Duck B with bass.xml” saved [57599/575


2022-06-16 17:08:14 (191 KB/s) - “corpus/xml/test/Hattie Wall.xml” saved [400082/400082]

--2022-06-16 17:08:14--  https://effendi.me/jazz/repo/I/IDWNOBY.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 56160 (55K) [application/xml]
Saving to: “corpus/xml/test/IDWNOBY.xml”


2022-06-16 17:08:15 (295 KB/s) - “corpus/xml/test/IDWNOBY.xml” saved [56160/56160]

--2022-06-16 17:08:15--  https://effendi.me/jazz/repo/I/IdleMoments.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 OK
Length: 35635 (35K) [application/xml]
Saving to: “corpus/xml/test/IdleMoments.xml”


2022-06-16 17:08:15 (70.2 MB/s) - “corpus/xml/test/IdleMoments.xml” saved [35635/35635]

--2022-06-16 17:08:15--  https://effendi.me/jazz/repo/I/Instant%20Death.xml
Reusing existing connection to effendi.me:443.
HTTP request sent, awaiting response... 200 

## music21
Testing music21 to generate MIDI files:

In [7]:
import music21

In [18]:
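corpus_folder = 'corpus/xml'
midi_folder = 'corpus/midi_music21'
os.makedirs(midi_folder, exist_ok=True)  # make sure the output folder exists
for file in os.listdir(corpus_folder):
    try:
        song = music21.converter.parse(os.path.join(corpus_folder, file))
        # NOTE: split('.')[0] cuts at the first dot, so file names that
        # contain extra dots yield shortened titles
        title = file.split('.')[0]
        song.write('midi', os.path.join(midi_folder, title + '.mid'))
        print(title)
    except Exception:
        # skip files that music21 cannot parse or convert
        continue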

Sham Time
Mr
630blues
WaltzInA,A
TheOddCouple
JumpinWithSymphonySid
BeiMir
CrossRiver
James Bond
MessAround
Blues A La Mode
LetItFlow
Atchafalaya
AtchFalaya
This Here
AloneIntheMorning
ThatIsWhyYouAreOverweight
HomeFires
Chill
WaynesThang
Afrodisia
stlouisblues
Number9
ItCouldHappenToYou
Ambidextrous
Tricotism
batida
VivaDeFunk
Blues for PK
JoeAvery'sBlues
ColdDuckTimeII
PIWYWIPT
Liberia
SourceThe
GrooveMerchant
Ease Back
Yeah You Right
1612
dorado 3
ToTheTop
dat dere
Song For M
Cold Duck B with bass
Back at the Chicken Shack
HighSeas
Four By Five
HeroTown
Harlem Nocturne
StickyJuly
BlueBossa
BrazosRiverBreakdown
Set Us Free
Song For My Lady
Isn't She Lovely
Its your thing
LullabyOfTheLeaves
Cool Struttin'
lyresto
Movin on Out
StormyMonday
sambop
Montara
Humanism
NoProblem
M
8CountsForRita
CrisisC#
Amsterdam After DarkII
Groovin'
TheInCrowd
Broski
OutOfNowhere
Blues on Sunday
Root Down
AllOfMe_F
Mamacita
HappyPeople
StolenMoments
InAllMy
PingPong
Work Song C
MysticBrew
It's All Right N
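
Some titles in the list above look shortened, which would be consistent with `file.split('.')[0]` cutting a file name at its first dot. A minimal sketch of an alternative, assuming the goal is to drop only the final extension, is shown below; the helper name `title_from_filename` and the sample file names are hypothetical.

```python
import os


def title_from_filename(filename):
    """Drop only the final extension, keeping any dots inside the title."""
    stem, _ext = os.path.splitext(filename)
    return stem


# hypothetical file names, made up for illustration
for name in ['630blues.xml', 'Mr. Example.xml']:
    # left: the split-based title, right: the splitext-based title
    print(repr(name.split('.')[0]), repr(title_from_filename(name)))
```

With `os.path.splitext`, a name such as `Mr. Example.xml` keeps its full title, whereas the split-based version keeps only the part before the first dot.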

In [31]:
# creating midi songs corpus
# for url in url_list:
#     song = music21.converter.parse(url)
#     title = song.metadata.title
#     song.write('midi', 'corpus/midi_music21/' + title + '.mid')
#     print(title)

6:30 Blues
728
1974 Blues
All Blues
Ambidextrous
Amsterdam After Dark (II)
A Smooth One
Ballad
Pyramid
Blues After Dark
Blues for P.K.
Blues In the Basement
Bold and Black
Boogie Stop Shuffle
Boogie Woogie Bossa Nova
Cantaloupe Island
Chicago Seranade
Cissy Strut
Cissy Strut
Clockwise
Cold Duck Time
Cold Duck Time
Crisis
Crisis
Cryin' Blues
Dangerous Curves
Dangerous Curves 
Green Onions
Groovy Samba
Hattie Wall
Hattie Wall
Hattie Wall
Hattie Wall
Hattie Wall
Hattie Wall
Hattie Wall
I Don't Want No One But You
Idle Moments
Instant Death
It's All Right Now
James Bond Theme
Jean De Fleur
Jive Samba
Just the Two of Us
Listen Here
Love For Sale
Lovely is Today
Mating Call
Mean Greens 
Midnight Blue
Mo Better Blues
More Soul than Soulful
Movin' On Out
My Groove, Your Move
Night And Day
Night Train 
Nothing Else to Do
Ornithology
Pass the Peas
Pink Panther
'Round Midnight
The Selma March
Set Us Free
Sham Time
Somethin' Else
Sookie Sookie
Spontaneous Combustion
Straight Street
Strasbourg St D

In [5]:
c = music21.converter.parse('https://effendi.me/jazz/repo/III/Gillette.xml')
# c.show('musicxml.pdf')
c.show('midi')

## Creating database

### mysql

In [42]:
import mysql.connector
from getpass import getpass
import pickle

In [33]:
password = getpass()

mydb = mysql.connector.connect(
  host='127.0.0.1',
  user="root",
  password=password
)

mycursor = mydb.cursor()

# listing existing databases
mycursor.execute("SHOW DATABASES")
dbs = [x for x in mycursor]

# creating "songs" only if it does not exist yet
if ('songs',) not in dbs:
    mycursor.execute("CREATE DATABASE songs")

········


Creating the first table, which will hold the raw signals of our songs:

In [34]:
# mycursor.execute('''
#           CREATE TABLE IF NOT EXISTS raw_signal
#           ([raw_signal_id] INTEGER PRIMARY KEY, [signal] BLOB)
#           ''')
                     
    
songs_db =  mysql.connector.connect(host="localhost",
                                    user="root",
                                    password=getpass(),
                                    database="songs"
)

songs_cursor = songs_db.cursor()
# dropping raw_signal table if already exists.
songs_cursor.execute("DROP TABLE IF EXISTS raw_signal")
songs_cursor.execute('''CREATE TABLE raw_signal (raw_signal_id INT AUTO_INCREMENT PRIMARY KEY,
                                                title VARCHAR(255) NOT NULL, 
                                                raw_signal BLOB,
                                                original_db VARCHAR(255) NOT NULL)''')

········


In [35]:
songs_cursor.execute("SHOW TABLES")

for x in songs_cursor:
    print(x)

('raw_signal',)


### Adding data to database

In [56]:
type(song)

music21.stream.Score

#### 28/03
Hi! Something went wrong here... think about a format for storing the songs (plain XML? an audio file such as WAV? pickle?)

In [39]:
for url in url_list:
    song = music21.converter.parse(url)
    title = song.metadata.title
#     song.write('midi', 'corpus/midi_music21/' + title + '.mid')



    sql = "INSERT INTO raw_signal (title, raw_signal, original_db) VALUES (%s, %s, %s)"
    val = (title, song, url)
    songs_cursor.execute(sql, val)

    songs_db.commit()

    print(songs_cursor.rowcount, "record inserted.")

MySQLInterfaceError: Python type Score cannot be converted
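
The error above arises because the connector can only bind basic Python types (strings, numbers, bytes, and so on) to query parameters. One of the options raised in the note, pickle, can be sketched as follows: serialize the object to `bytes` first and store those bytes in the BLOB column. This is a sketch under stated assumptions, not a tested solution for this corpus. An in-memory SQLite database stands in for the MySQL connection so the example runs anywhere, a plain dictionary stands in for the song object, and the helper names `to_blob` and `from_blob` are hypothetical. Whether a `music21.stream.Score` pickles cleanly is an assumption that would need to be verified.

```python
import pickle
import sqlite3


def to_blob(obj):
    """Serialize a picklable Python object to bytes."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def from_blob(blob):
    """Rebuild the object from bytes produced by to_blob."""
    return pickle.loads(blob)


# in-memory SQLite database, used only as a stand-in for MySQL
demo_db = sqlite3.connect(':memory:')
demo_db.execute('''CREATE TABLE raw_signal (raw_signal_id INTEGER PRIMARY KEY,
                                             title TEXT NOT NULL,
                                             raw_signal BLOB,
                                             original_db TEXT NOT NULL)''')

# hypothetical song data, made up for illustration
example_song = {'title': 'Example Blues', 'chords': ['C7', 'F7', 'G7']}

# sqlite3 uses '?' placeholders; mysql.connector uses '%s' instead
demo_db.execute(
    'INSERT INTO raw_signal (title, raw_signal, original_db) VALUES (?, ?, ?)',
    (example_song['title'], to_blob(example_song), 'hypothetical-source'))
demo_db.commit()

(blob,) = demo_db.execute('SELECT raw_signal FROM raw_signal').fetchone()
restored = from_blob(blob)
print(restored)
```

Two caveats apply when moving this to MySQL. A `BLOB` column holds at most 65,535 bytes, so larger payloads need `MEDIUMBLOB` or `LONGBLOB`. Pickled data should also only be loaded from a trusted database, since unpickling can execute arbitrary code.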