# Database

Information about the files in the `data/` directory.

## Data source

Data were obtained from two sources:
* http://guitarpatches.com/patches.php?unit=G3
* An XML file from the Zoom Edit&Share program containing information about the audio plugins (name, category, parameters and parameter values).

```python
import pandas as pd
```

### `data/pedalboard-info.csv`

Patches obtained from http://guitarpatches.com/patches.php?unit=G3: 
* `index`: guitarpatches.com patch identifier;
* `artist`: artist name;
* `date`: patch registration date;
* `has_audio`: whether an audio sample of the patch is available;
* `has_video`: whether a video of the patch is available;
* `link`: relative URL of the patch page;
* `rating`: average user rating, $\frac{\sum_{r \in \mathit{Ratings}} r}{|\mathit{Ratings}|}$, where $\mathit{Ratings}$ holds the ratings the patch received and each $r \in \mathbb{N}$ satisfies $1 \leq r \leq 5$;
* `title`: patch title;
* `total_downloads`: total downloads made up to the time of scraping;
* `uploader`: user who shared the pedalboard.
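
The `rating` formula is the arithmetic mean of the individual ratings. A minimal sketch, assuming the ratings of one patch are available as a list of integers (the CSV itself only stores the resulting mean); `average_rating` is a hypothetical helper:

```python
def average_rating(ratings):
    """Mean of a patch's ratings, each an integer between 1 and 5.

    Returns None when the patch has no ratings (hypothetical convention).
    """
    if not ratings:
        return None
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    return sum(ratings) / len(ratings)


print(average_rating([5, 4, 4, 3]))  # 4.0
```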

The `pedalboard-info.csv` list was generated by scraping guitarpatches.com and is used to download the patches.
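
Since `link` is stored relative to the site, a download step first needs absolute URLs. A minimal sketch using only the standard library, assuming the links are relative to the G3 listing page; `patch_url` and `patch_id` are hypothetical helpers, not part of the original scraper:

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# Assumption: `link` values are relative to the G3 listing page
BASE_URL = 'http://guitarpatches.com/patches.php?unit=G3'


def patch_url(link):
    """Absolute URL of a patch page built from its relative `link`."""
    return urljoin(BASE_URL, link)


def patch_id(link):
    """The `ID` query parameter; in the sample rows it equals `index`."""
    return int(parse_qs(urlsplit(link).query)['ID'][0])


link = 'patches.php?mode=show&unit=G3&ID=11066'
print(patch_url(link))  # http://guitarpatches.com/patches.php?mode=show&unit=G3&ID=11066
print(patch_id(link))   # 11066
```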

```python
# Use the guitarpatches.com identifier (`index` column) as the index
pedalboards_data = pd.read_csv('data/pedalboard-info.csv', index_col='index')

# Save the re-indexed table back to the same file
pedalboards_data.to_csv('data/pedalboard-info.csv')

pedalboards_data.head()
```

| index | link | title | artist | rating | has_audio | has_video | date | uploader | total_downloads |
|---|---|---|---|---|---|---|---|---|---|
| 11066 | patches.php?mode=show&unit=G3&ID=11066 | kizuna music | Poppin' Party | | False | True | 2019-02-02 | GreenHerb | 93 |
| 11037 | patches.php?mode=show&unit=G3&ID=11037 | koko1 | Aikatsu Friends! | | False | True | 2019-01-07 | GreenHerb | 140 |
| 11038 | patches.php?mode=show&unit=G3&ID=11038 | koko2 | Aikatsu Friends! | | False | True | 2019-01-07 | GreenHerb | 75 |
| 11025 | patches.php?mode=show&unit=G3&ID=11025 | Dream Load | Aikatsu Stars! | | False | True | 2018-12-31 | GreenHerb | 132 |
| 11001 | patches.php?mode=show&unit=G3&ID=11001 | 100 All In One patces | various | | False | False | 2018-12-29 | OneDime | 444 |


### `data/plugins-categories.csv`

Mapping between each audio plugin and its category.

* `id`: audio plugin identifier for the `Zoom G3 version 2.x`;
* `name`: audio plugin name;
* `category`: category to which the plugin belongs.
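
The `id` column is what the `plugin1` ... `plugin6` columns of the patch files refer to. A minimal sketch of resolving ids to names and counting plugins per category, using only the ten rows shown below (`sample_categories` is a hypothetical stand-in for the full file):

```python
import pandas as pd

# The first ten rows of data/plugins-categories.csv
sample_categories = pd.DataFrame(
    {
        'name': ['M-Filter', 'TheVibe', 'Z-Organ', 'Slicer', 'PhaseDly',
                 'FilterDly', 'PitchDly', 'StereoDly', 'BitCrush', 'Bomber'],
        'category': ['Filter_EQ', 'Modulation', 'SFX', 'Modulation', 'Delay',
                     'Delay', 'Delay', 'Delay', 'Modulation', 'SFX'],
    },
    index=pd.RangeIndex(10, name='id'),
)

# Resolve plugin ids (as stored in the patch files) to plugin names
print(sample_categories.loc[[4, 8], 'name'].tolist())  # ['PhaseDly', 'BitCrush']

# Number of plugins in each category
print(sample_categories['category'].value_counts())
```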

```python
plugins_categories = pd.read_csv('data/plugins-categories.csv', index_col='id').sort_index()
plugins_categories.head(10)
```

| id | name | category |
|---|---|---|
| 0 | M-Filter | Filter_EQ |
| 1 | TheVibe | Modulation |
| 2 | Z-Organ | SFX |
| 3 | Slicer | Modulation |
| 4 | PhaseDly | Delay |
| 5 | FilterDly | Delay |
| 6 | PitchDly | Delay |
| 7 | StereoDly | Delay |
| 8 | BitCrush | Modulation |
| 9 | Bomber | SFX |


### `data/patches.csv`

Patches extracted from the files listed in [data/pedalboard-info.csv](data/pedalboard-info.csv).

* `id`: guitarpatches.com patch identifier (the `index` column of `pedalboard-info.csv`). If an entry on guitarpatches.com contains more than one patch, all of its patches share the same `id`;
* `name`: patch name in ASCII, at most `10` characters;
* `plugin1`: `id` of the audio plugin in position 1 (see `plugins-categories.csv`);
* `plugin2`: `id` of the audio plugin in position 2;
* `plugin3`: `id` of the audio plugin in position 3;
* `plugin4`: `id` of the audio plugin in position 4;
* `plugin5`: `id` of the audio plugin in position 5;
* `plugin6`: `id` of the audio plugin in position 6.

`plugin4`, `plugin5` and `plugin6` can be `NaN` for patches created on the first version of the Zoom G3, which allowed at most 3 audio plugins. Because of these missing values, pandas reads those three columns as floats (e.g. `109.0`).
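
Because the plugin columns can contain `NaN`, first-version patches can be found with `isna`. A minimal sketch using two rows from the sample output below plus a hypothetical first-version patch (`FirstVer`, `id` 1) with only three plugins:

```python
import numpy as np
import pandas as pd

plugin_columns = [f'plugin{i}' for i in range(1, 7)]

patches = pd.DataFrame(
    [
        [9467, "!!*Cuda'", 23, 27, 73, 109.0, 106.0, 61.0],
        [7313, "'90s*V.H**", 23, 99, 31, 42.0, 53.0, 60.0],
        # Hypothetical first-version patch: positions 4 to 6 are empty
        [1, 'FirstVer', 23, 27, 73, np.nan, np.nan, np.nan],
    ],
    columns=['id', 'name'] + plugin_columns,
).set_index(['id', 'name'])

# Number of non-NaN plugin positions per patch
n_positions = patches[plugin_columns].notna().sum(axis=1)

# Patches whose last three positions are all NaN
first_version = patches[patches[['plugin4', 'plugin5', 'plugin6']].isna().all(axis=1)]

print(n_positions.tolist())  # [6, 6, 3]
```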

```python
pd.read_csv('data/patches.csv', index_col=['id', 'name']).sort_values('name').head(5)
```

| id | name | plugin1 | plugin2 | plugin3 | plugin4 | plugin5 | plugin6 |
|---|---|---|---|---|---|---|---|
| 9467 | `!!*Cuda'` | 23 | 27 | 73 | 109.0 | 106.0 | 61.0 |
| 9471 | `'70s*V.H**` | 39 | 30 | 100 | 60.0 | 107.0 | 107.0 |
| 7313 | `'70s*V.H**` | 39 | 30 | 100 | 60.0 | 107.0 | 107.0 |
| 8913 | `'70s*V.H**` | 39 | 30 | 100 | 60.0 | 107.0 | 107.0 |
| 7313 | `'90s*V.H**` | 23 | 99 | 31 | 42.0 | 53.0 | 60.0 |


### `data/patches-filtered.csv`

Similar to `data/patches.csv`, but with some columns reorganized and some patches filtered out. See [Processing_data_3_-_data_transformations.ipynb](Processing_data_3_-_data_transformations.ipynb) for details.

```python
pd.read_csv('data/patches-filtered.csv', index_col=['id', 'name']).sort_values('name').head(10)
```

| id | name | plugin1 | plugin2 | plugin3 | plugin4 | plugin5 | plugin6 |
|---|---|---|---|---|---|---|---|
| 9467 | `!!*Cuda'` | 23.0 | 27.0 | 73.0 | 109.0 | 106.0 | 61.0 |
| 7313 | `'70s*V.H**` | 39.0 | 30.0 | 100.0 | 60.0 | 107.0 | 107.0 |
| 7313 | `'90s*V.H**` | 23.0 | 99.0 | 31.0 | 42.0 | 53.0 | 60.0 |
| 9467 | `*!!*Wanted` | 23.0 | 27.0 | 85.0 | 49.0 | 60.0 | 107.0 |
| 9467 | `****EMG*81` | 23.0 | 27.0 | 72.0 | 101.0 | 105.0 | 107.0 |
| 10974 | `***E.V.H.*` | 23.0 | 27.0 | 80.0 | 110.0 | 37.0 | 105.0 |
| 9467 | `***MESA*BG` | 23.0 | 27.0 | 80.0 | 100.0 | 107.0 | 107.0 |
| 10974 | `***Sitar**` | 23.0 | 27.0 | 92.0 | 54.0 | 64.0 | 107.0 |
| 9467 | `**BIG*FOUR` | 23.0 | 27.0 | 72.0 | 101.0 | 45.0 | 61.0 |
| 9467 | `**CTX-Dave` | 23.0 | 27.0 | 80.0 | 99.0 | 65.0 | 107.0 |


### `data/patches-error.csv`

List of problems that occurred while attempting to download and process the patch files.

* `id`: patch identifier (the `index` column of `pedalboard-info.csv`);
* `error`: error that prevented the patches from being extracted.
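
The `error` messages share a common prefix per kind of problem. A minimal sketch that groups them by the text before the first `:`, using the rows shown below as a hypothetical stand-in for the full file:

```python
import pandas as pd

# The first rows of data/patches-error.csv
errors = pd.DataFrame(
    {'error': ['Directory does not have any data: 28.06.2013',
               'Unknown format: jpg', 'Unknown format: jpg',
               'Unknown format: g5p', 'Unknown format: g5a']},
    index=pd.Index([7074, 7975, 7976, 8447, 9416], name='id'),
)

# Keep only the text before the first ':' to get the kind of problem
error_kind = errors['error'].str.split(':', n=1).str[0]

# How often each kind of problem occurs
print(error_kind.value_counts())
```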

```python
pd.read_csv('data/patches-error.csv', index_col='id').head()
```

| id | error |
|---|---|
| 7074 | Directory does not have any data: 28.06.2013 |
| 7975 | Unknown format: jpg |
| 7976 | Unknown format: jpg |
| 8447 | Unknown format: g5p |
| 9416 | Unknown format: g5a |


### `data/patches-bag-of-words.csv`

Audio plugins of each patch represented as a bag of words.

* `index`: unique index;
* `id`: patch identifier (the `index` column of `pedalboard-info.csv`);
* `name`: patch name in ASCII, at most `10` characters;
* `0` ... `116`: one column per audio plugin `id`; the value is the number of times that plugin occurs in the patch.
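
A minimal sketch of how one row of plugin ids maps to a bag-of-words vector, assuming 117 plugin ids (0 to 116) and that `NaN` positions are simply skipped; `to_bag_of_words` is a hypothetical helper, not the code that generated the file. Positions are discarded and only counts are kept:

```python
import math
from collections import Counter

N_PLUGINS = 117  # plugin ids 0 ... 116


def to_bag_of_words(plugin_ids):
    """Count how many times each plugin id occurs in one patch.

    NaN positions (first-version patches) are skipped (assumption).
    """
    counts = Counter(
        int(p) for p in plugin_ids
        if not (isinstance(p, float) and math.isnan(p))
    )
    return [counts[i] for i in range(N_PLUGINS)]


# plugin1 ... plugin6 of '70s*V.H** in data/patches-filtered.csv
row = [39.0, 30.0, 100.0, 60.0, 107.0, 107.0]
vector = to_bag_of_words(row)
print(vector[107], vector[39], sum(vector))  # 2 1 6
```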

```python
bag_of_words = pd.read_csv('data/patches-bag-of-words.csv', index_col=['index', 'id'])
bag_of_words.head()
```

| index | id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5299 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 5300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5301 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 5303 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5304 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


Label the bag-of-words columns with the plugin categories and names from `plugins-categories.csv`:

```python
# Look up the plugin of each bag-of-words column ('0' ... '116') by its id,
# instead of relying on both tables having the same row/column order
plugins = plugins_categories.loc[bag_of_words.columns.astype(int)]

# Replace the plugin-id columns with a two-level (Category, Plugin) header
bag_of_words.columns = pd.MultiIndex.from_arrays(
    [plugins['category'], plugins['name']],
    names=['Category', 'Plugin'],
)

bag_of_words.head()
```

| | Category | Filter_EQ | Modulation | SFX | Modulation | Delay | Delay | Delay | Delay | Modulation | SFX | ... | None | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling | Amp Modeling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | **Plugin** | M-Filter | TheVibe | Z-Organ | Slicer | PhaseDly | FilterDly | PitchDly | StereoDly | BitCrush | Bomber | ... | None | TONE CITY | B-BREAKER | BGN DRIVE | DELUXE-R | ALIEN | REVO-1 | CAR DRIVE | MS 1959 | VX JMI |
| **index** | **id** | | | | | | | | | | | | | | | | | | | | | |
| 0 | 5299 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 5300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5301 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 5303 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5304 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


### `data/patches-one-hot-encoding.csv`

Audio plugins with [one-hot](https://en.wikipedia.org/wiki/One-hot) encoding.

* `index`: unique index;
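* `id`: patch identifier (the `index` column of `pedalboard-info.csv`);
* `name`: patch name in ASCII, at most `10` characters;
* `0` ... `116`: one block of audio plugin columns per plugin position (1 to 6). A value of 1 means that the audio plugin with that `id` is used in the patch at that position, and 0 that it is not. Because the column names repeat, pandas renames the duplicates in later blocks as `0.1` ... `116.5` when reading the file.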
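
A minimal sketch of this encoding, assuming the positions are laid out as consecutive blocks of 117 columns (consistent with the `0` ... `116.5` header below) and that a `NaN` position leaves its whole block at 0; `to_one_hot` is a hypothetical helper, not the code that generated the file:

```python
import math

N_PLUGINS = 117   # plugin ids 0 ... 116
N_POSITIONS = 6   # plugin1 ... plugin6


def to_one_hot(plugin_ids):
    """Concatenate one 117-wide one-hot block per plugin position.

    Column position * N_PLUGINS + plugin_id is set to 1; a NaN position
    (assumption) leaves its whole block at 0.
    """
    vector = [0] * (N_PLUGINS * N_POSITIONS)
    for position, p in enumerate(plugin_ids):
        if isinstance(p, float) and math.isnan(p):
            continue
        vector[position * N_PLUGINS + int(p)] = 1
    return vector


# plugin1 ... plugin6 of '70s*V.H** in data/patches-filtered.csv
row = [39.0, 30.0, 100.0, 60.0, 107.0, 107.0]
vector = to_one_hot(row)
print(len(vector), sum(vector))      # 702 6
print(vector[5 * N_PLUGINS + 107])   # 1 (column '107.5' once read by pandas)
```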

```python
one_hot_encoding = pd.read_csv('data/patches-one-hot-encoding.csv', index_col=['index', 'id'])
one_hot_encoding.head()
```

| index | id | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 107.5 | 108.5 | 109.5 | 110.5 | 111.5 | 112.5 | 113.5 | 114.5 | 115.5 | 116.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5299 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 5300 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 5301 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 5303 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5304 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
