# Dataset

Information about the audio plugin dataset.

## Data sources

The data come from two sources:
* http://guitarpatches.com/patches.php?unit=G3
* An XML file describing the audio plugins (name, category, parameters, and parameter values)

In [1]:
import pandas as pd

### `pedalboard-info.csv`

Lists the pedalboards downloaded from http://guitarpatches.com/patches.php?unit=G3:
* `index`: index of the pedalboard on guitarpatches.com;
* `artist`: artist name;
* `date`: date the pedalboard was registered;
* `has_audio`: whether an audio sample of the pedalboard is available;
* `has_video`: whether a video of the pedalboard is available;
* `link`: link to the pedalboard;
* `rating`: average rating, $\frac{\sum_{r \in Ratings} r}{|Ratings|}$, where each $r \in \mathbb{N}$ satisfies $1 \leq r \leq 5$;
* `title`: title of the pedalboard;
* `total_downloads`: total number of downloads at the time of scraping;
* `uploader`: user who shared the pedalboard.

This list is produced by the scraper and is used to download the pedalboards.
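The `rating` formula is an arithmetic mean of the individual ratings. A minimal sketch of that computation follows; the function name `mean_rating` is hypothetical, and returning `None` for a pedalboard with no ratings is an assumption (it is consistent with the empty `rating` values that appear in the data, but the site's actual behavior is not documented here).

```python
def mean_rating(ratings):
    """Arithmetic mean of integer ratings in 1..5, or None when there are none."""
    if not ratings:
        # Assumption: an unrated pedalboard has no rating value at all.
        return None
    if any(not (1 <= r <= 5) for r in ratings):
        raise ValueError("each rating must be an integer between 1 and 5")
    return sum(ratings) / len(ratings)


print(mean_rating([5, 5, 5]))  # 5.0
print(mean_rating([4, 5]))     # 4.5
print(mean_rating([]))         # None
```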

In [2]:
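# Load the scraped list and index it by the guitarpatches.com pedalboard index
pedalboards_data = pd.read_json('data/pedalboard-info.json').sort_index()
pedalboards_data.index = pedalboards_data['index']

del pedalboards_data['index']
# Export the list as the CSV file described in this section
pedalboards_data.to_csv('data/pedalboard-info.csv')

pedalboards_data.head(5)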

| index | artist | date | has_audio | has_video | link | rating | title | total_downloads | uploader |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10870 | Pink Floyd | 2018-08-21 | False | True | patches.php?mode=show&unit=G3&ID=10870 | 5.0 | PinkR | 83 | J.D.S. |
| 10871 | Pink Floyd | 2018-08-21 | False | True | patches.php?mode=show&unit=G3&ID=10871 | 5.0 | PinkS | 42 | J.D.S. |
| 10872 | J.D.S. | 2018-08-21 | False | True | patches.php?mode=show&unit=G3&ID=10872 | NaN | Bass | 28 | J.D.S. |
| 10849 | Krail Anuar | 2018-08-07 | False | False | patches.php?mode=show&unit=G3&ID=10849 | NaN | T.S.O.8.L | 80 | NU - OFF |
| 10850 | Krail Anuar | 2018-08-07 | False | False | patches.php?mode=show&unit=G3&ID=10850 | NaN | T.S.O.8.L | 67 | 666 |


### `plugin-category.csv`

Lists information about each audio plugin:

* `id`: index of the audio plugin on the `Zoom G3 version 2.x` device;
* `category`: category the plugin belongs to; and
* `name`: name of the audio plugin.

In [3]:
plugins_categories = pd.read_csv('data/plugin-category.csv', index_col='id').sort_index()
plugins_categories.head(10)

| id | name | category |
| --- | --- | --- |
| 0 | M-Filter | Filter_EQ |
| 1 | TheVibe | Modulation |
| 2 | Z-Organ | SFX |
| 3 | Slicer | Modulation |
| 4 | PhaseDly | Delay |
| 5 | FilterDly | Delay |
| 6 | PitchDly | Delay |
| 7 | StereoDly | Delay |
| 8 | BitCrush | Modulation |
| 9 | Bomber | SFX |


### `pedalboard-plugin.csv`

Lists the 6 audio plugins used in a pedalboard.

* `id`: index of the pedalboard (matching `pedalboard-info.csv`);
* `name`: ASCII name of the pedalboard on the device, $10$ characters long;
* `plugin1`: audio plugin at position 1;
* `plugin2`: audio plugin at position 2;
* `plugin3`: audio plugin at position 3;
* `plugin4`: audio plugin at position 4;
* `plugin5`: audio plugin at position 5;
* `plugin6`: audio plugin at position 6.

`plugin4`, `plugin5`, and `plugin6` can all be `NaN` at the same time. This happens when the pedalboard comes from the previous version of the Zoom G3 (version 1), which supported at most 3 audio plugins.
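A minimal sketch of how such pedalboards could be flagged with pandas. The two sample rows and their ids (1 and 2) are hypothetical, and treating "all three last slots are `NaN`" as a marker for version 1 is an assumption derived from the description above, not a rule confirmed by the data.

```python
import numpy as np
import pandas as pd

slots = [f"plugin{i}" for i in range(1, 7)]

# Hypothetical rows: ids and plugin values are made up for illustration.
sample = pd.DataFrame(
    [
        [23, 27, 73, 109, 106, 61],             # all six slots filled
        [39, 30, 100, np.nan, np.nan, np.nan],  # only the first three slots filled
    ],
    columns=slots,
    index=pd.Index([1, 2], name="id"),
)

# True when plugin4, plugin5 and plugin6 are all missing in the same row.
is_version_1 = sample[["plugin4", "plugin5", "plugin6"]].isna().all(axis=1)

print(sample[is_version_1].index.tolist())  # [2]
```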

In [4]:
pd.read_csv('data/pedalboard-plugin.csv', index_col='id').sort_values('name').head(10)

| id | name | plugin1 | plugin2 | plugin3 | plugin4 | plugin5 | plugin6 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 9467 | `!!*Cuda'` | 23 | 27 | 73 | 109 | 106 | 61 |
| 8913 | `'70s*V.H**` | 39 | 30 | 100 | 60 | 107 | 107 |
| 7313 | `'70s*V.H**` | 39 | 30 | 100 | 60 | 107 | 107 |
| 9471 | `'70s*V.H**` | 39 | 30 | 100 | 60 | 107 | 107 |
| 7313 | `'90s*V.H**` | 23 | 99 | 31 | 42 | 53 | 60 |
| 9672 | `*!!*Wanted` | 23 | 27 | 85 | 49 | 60 | 107 |
| 9467 | `*!!*Wanted` | 23 | 27 | 85 | 49 | 60 | 107 |
| 9588 | `**********` | 107 | 107 | 107 | 27 | 107 | 31 |
| 10849 | `**********` | 107 | 107 | 107 | 107 | 107 | 84 |
| 9467 | `****EMG*81` | 23 | 27 | 72 | 101 | 105 | 107 |
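The `plugin1` to `plugin6` columns hold numeric ids. Assuming these ids refer to the `id` column of `plugin-category.csv`, they can be translated to plugin names. The sketch below inlines the first five plugins of that file; the pedalboard row (id `1`, name `Example`, three slots) is hypothetical.

```python
import pandas as pd

# First five rows of plugin-category.csv, inlined so the sketch runs on its own.
plugins_categories = pd.DataFrame(
    {
        "name": ["M-Filter", "TheVibe", "Z-Organ", "Slicer", "PhaseDly"],
        "category": ["Filter_EQ", "Modulation", "SFX", "Modulation", "Delay"],
    },
    index=pd.Index(range(5), name="id"),
)

# Hypothetical pedalboard: id, name and plugin choice are made up.
pedalboard = pd.DataFrame(
    {"name": ["Example"], "plugin1": [4], "plugin2": [1], "plugin3": [0]},
    index=pd.Index([1], name="id"),
)

slots = ["plugin1", "plugin2", "plugin3"]
id_to_name = plugins_categories["name"]  # Series indexed by plugin id

# Replace each plugin id with the corresponding plugin name.
named = pedalboard.assign(**{slot: pedalboard[slot].map(id_to_name) for slot in slots})

print(named.loc[1, slots].tolist())  # ['PhaseDly', 'TheVibe', 'M-Filter']
```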


### `pedalboard-plugin-error.csv`

Errors that occurred while:
* fetching the pedalboard files;
* extracting the data.

Columns:

* `id`: index of the pedalboard (matching `pedalboard-info.csv`);
* `error`: error that prevented the pedalboard from being processed.

In [5]:
pd.read_csv('data/pedalboard-plugin-error.csv', index_col='id').head()

| id | error |
| --- | --- |
| 6361 | 'content-disposition' |
| 7074 | Unknown format: rar |
| 7975 | Unknown format: jpg |
| 7976 | Unknown format: jpg |
| 8313 | Unknown format: rar |
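To see which kinds of failure dominate, the messages can be counted. This is a minimal sketch that inlines only the five rows shown above, so the counts describe this excerpt and not the full file.

```python
import io

import pandas as pd

# The five rows shown above, inlined so the sketch runs without the data files.
csv_text = """id,error
6361,'content-disposition'
7074,Unknown format: rar
7975,Unknown format: jpg
7976,Unknown format: jpg
8313,Unknown format: rar
"""

errors = pd.read_csv(io.StringIO(csv_text), index_col="id")

# Number of pedalboards affected by each distinct error message.
counts = errors["error"].value_counts()
print(counts)
```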


### `pedalboard-plugin-bag-of-words.csv`

At Professor Amauri's request, the audio plugins are also listed in a bag-of-words layout: one column per audio plugin, holding the number of times that plugin appears in the pedalboard.

* `index`: unique incremental index;
* `id`: index of the pedalboard (matching `pedalboard-info.csv`);
* `name`: ASCII name of the pedalboard on the device, $10$ characters long;
* `0 ... 116`: one column per audio plugin id.
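The layout can be illustrated with a short sketch that turns the plugin ids of one pedalboard into a vector of 117 counts. The function name `bag_of_words_row` is hypothetical, and skipping `None` for empty slots is an assumption about how version 1 pedalboards would be handled; the two plugin lists are taken from the `pedalboard-plugin.csv` rows for pedalboards 9467 and 8913.

```python
from collections import Counter

N_PLUGINS = 117  # plugin ids range from 0 to 116


def bag_of_words_row(plugin_ids, n_plugins=N_PLUGINS):
    """Return a list where position i is the number of times plugin i is used."""
    # Assumption: empty slots are given as None and are simply ignored.
    counts = Counter(pid for pid in plugin_ids if pid is not None)
    return [counts.get(i, 0) for i in range(n_plugins)]


cuda = bag_of_words_row([23, 27, 73, 109, 106, 61])       # pedalboard 9467
van_halen = bag_of_words_row([39, 30, 100, 60, 107, 107])  # pedalboard 8913

print(cuda[109], van_halen[107])  # 1 2
```

Plugin 107 appears twice in the second pedalboard, so its column holds `2`, which matches the value in the file.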

In [6]:
bag_of_words = pd.read_csv('data/pedalboard-plugin-bag-of-words.csv', index_col=['index', 'id'])
bag_of_words.head()

| index | id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 9467 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 8913 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7313 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 9471 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7313 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


Combining with `plugin-category.csv`:

In [7]:
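# Work on a copy so the extra columns do not alter plugins_categories
plugins_categories_copy = plugins_categories.copy()

# The Series names ('Plugin', 'Category') become the names of the column levels
plugins_categories_copy['Plugin'] = plugins_categories_copy.name
plugins_categories_copy['Category'] = plugins_categories_copy.category

# Replace the numeric plugin ids with a two-level (Category, Plugin) column index
bag_of_words.columns = [plugins_categories_copy.Category, plugins_categories_copy.Plugin]

bag_of_words.head()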

| Category | | Filter_EQ | Modulation | SFX | Modulation | Delay | Delay | Delay | Delay | Modulation | SFX | ... | None | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Plugin** | | M-Filter | TheVibe | Z-Organ | Slicer | PhaseDly | FilterDly | PitchDly | StereoDly | BitCrush | Bomber | ... | None | TONE CITY | B-BREAKER | BGN DRIVE | DELUXE-R | ALIEN | REVO-1 | CAR DRIVE | MS 1959 | VX JMI |
| **index** | **id** | | | | | | | | | | | | | | | | | | | | | |
| 0 | 9467 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 8913 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 7313 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 9471 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 7313 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
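With a two-level `(Category, Plugin)` column index, usage can also be aggregated per category rather than per plugin. The sketch below uses a hypothetical miniature table (three plugins, two pedalboards, made-up counts) rather than the real data, so the totals are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the bag of words: three plugins in two categories.
columns = pd.MultiIndex.from_arrays(
    [["Delay", "Delay", "Dynamics"], ["PhaseDly", "StereoDly", "Comp"]],
    names=["Category", "Plugin"],
)
mini_bag = pd.DataFrame([[1, 0, 1], [0, 2, 1]], columns=columns)

# Total uses of each plugin across all pedalboards.
per_plugin = mini_bag.sum()

# Collapse the Plugin level to get the total uses of each category.
per_category = per_plugin.groupby(level="Category").sum()

print(per_category)
```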


In [8]:
bag_of_words.sum()

Category        Plugin   
Filter_EQ       M-Filter       28
Modulation      TheVibe        50
SFX             Z-Organ        24
Modulation      Slicer         29
Delay           PhaseDly       47
                FilterDly      71
                PitchDly       46
                StereoDly     224
Modulation      BitCrush       41
SFX             Bomber          0
Modulation      DuoPhase       13
SFX             MonoSynth      34
Filter_EQ       SeqFLTR        25
                RndmFLTR       10
Modulation      WarpPhase      10
Delay           TrgHldDly       7
Combination Fx  Cho+Dly        47
                Cho+Rev        58
                Dly+Rev        97
                Comp+Phsr      34
                Comp+AWah       9
                FLG+VCho       47
                Comp+OD        27
Dynamics        Comp          479
                RackComp      300
                M Comp        151
                SlowATTCK      55
                ZNR          1776
                NoiseG