
# Cosine distance

I want to find out how cosine distance and Euclidean distance relate to each other, and whether I can use them to compute the distance (or similarity) between two sequences of musical notes.

In [1]:
import numpy as np

In [2]:
a1 = np.array([1, 5, 9, 13, 17])
a2 = np.array([3, 7, 11, 15, 19])

b1 = np.array([1, 5, 7, 15, 13])

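I need to choose a metric $D$ for which $D(a1,a2) = 0$, $D(a1,b1) > D(a1,a2)$ and $D(a2,b1) > D(a1,a2)$. Here `a2` is `a1` shifted up by 2 in every component, so the two should count as identical, while `b1` has a different shape.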

By definition, the cosine similarity is:

https://en.wikipedia.org/wiki/Cosine_similarity

$$\text{cosine similarity} = S_c(A,B) := \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^2} \sqrt{\sum_{i=1}^{n}B_i^2}}$$

where $A_i$ and $B_i$ are the components (elements) of the vectors.

Incidentally, if the vectors are centered on their means, the value of the formula above equals Pearson's correlation coefficient.
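The remark about centering can be checked numerically. The following is a minimal sketch: the helper names `cosine_sim` and `centered` are introduced here for illustration only and do not appear in the original cells, while `np.corrcoef` is NumPy's built-in Pearson correlation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product divided by the product of the L2 norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def centered(v):
    # Subtract the mean so that the vector is centered on zero.
    return v - np.mean(v)

a1 = np.array([1, 5, 9, 13, 17])
b1 = np.array([1, 5, 7, 15, 13])

centered_cosine = cosine_sim(centered(a1), centered(b1))
pearson = np.corrcoef(a1, b1)[0, 1]

# The two values should agree up to floating-point error.
print(np.isclose(centered_cosine, pearson))
```

Without the centering step the two quantities generally differ, which is why the plain cosine similarity of `a1` and `a2` is not exactly 1.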

In [4]:
def euclidean_distance(a, b):
  return np.sqrt(np.sum((a -b)**2))

In [6]:
print('Euclidean distance a1,a2 \n {:.5f} \n'.format(euclidean_distance(a1, a2)))

Euclidean distance a1,a2 
 4.47214 



In [12]:
def cosine_distance(a, b):
  'Despite its name, this returns the cosine similarity (1 means same direction).'
  return np.sum(a * b) / (np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(b**2)))

In [24]:
def cosine_similarity(a, b):
  'Same as above, but using np.dot()'
  return np.dot(a, b) / (np.sqrt(np.dot(a, a)) * np.sqrt(np.dot(b, b)))

In [25]:
print('Cosine similarity a1,a2 \n {:.5f} \n'.format(cosine_distance(a1, a2)))

Cosine similarity a1,a2 
 0.99629 



In [26]:
print('Cosine similarity a1,a2 \n {:.5f} \n'.format(cosine_similarity(a1, a2)))

Cosine similarity a1,a2 
 0.99629 



In [14]:
print('Cosine similarity a1,a2 \n {:.5f} \n'.format(cosine_distance(a1, a2)))
print('Cosine similarity a1,b1 \n {:.5f} \n'.format(cosine_distance(a1, b1)))
print('Cosine similarity a2,b1 \n {:.5f} \n'.format(cosine_distance(a2, b1)))


Cosine similarity a1,a2 
 0.99629 

Cosine similarity a1,b1 
 0.98103 

Cosine similarity a2,b1 
 0.97999 



In [20]:
def pearsons_correlation(a, b):
  '''
  https://en.wikipedia.org/wiki/Correlation
  '''
  cov = np.sum((a-np.mean(a)) * (b-np.mean(b)))
  div = np.sqrt(np.sum((a-np.mean(a))**2) * np.sum((b-np.mean(b))**2))
  cor = cov / div
  return cor

In [23]:
print('Correlation a1,a2 \n {:.5f} \n'.format(pearsons_correlation(a1, a2)))

Correlation a1,a2 
 1.00000 



In [22]:
print('Correlation a1,a2 \n {:.5f} \n'.format(pearsons_correlation(a1, a2)))
print('Correlation a1,b1 \n {:.5f} \n'.format(pearsons_correlation(a1, b1)))
print('Correlation a2,b1 \n {:.5f} \n'.format(pearsons_correlation(a2, b1)))

Correlation a1,a2 
 1.00000 

Correlation a1,b1 
 0.93300 

Correlation a2,b1 
 0.93300 



## Addendum

### L1 and L2 norms

In [27]:
def l1_normalize(v):
    norm = np.sum(np.abs(v))  # L1 norm: sum of absolute values
    return v / norm

def l2_normalize(v):
    norm = np.sqrt(np.sum(np.square(v)))
    return v / norm

In [30]:
print('L1 norm a1 \n {} \n'.format(l1_normalize(a1)))
print('L1 norm a2 \n {} \n'.format(l1_normalize(a2)))
print('L1 norm b1 \n {} \n'.format(l1_normalize(b1)))

L1 norm a1 
 [0.02222222 0.11111111 0.2        0.28888889 0.37777778] 

L1 norm a2 
 [0.05454545 0.12727273 0.2        0.27272727 0.34545455] 

L1 norm b1 
 [0.02439024 0.12195122 0.17073171 0.36585366 0.31707317] 



In [31]:
print('L2 norm a1 \n {} \n'.format(l2_normalize(a1)))
print('L2 norm a2 \n {} \n'.format(l2_normalize(a2)))
print('L2 norm b1 \n {} \n'.format(l2_normalize(b1)))

L2 norm a1 
 [0.04207032 0.21035158 0.37863285 0.54691411 0.71519538] 

L2 norm a2 
 [0.10846523 0.25308553 0.39770584 0.54232614 0.68694645] 

L2 norm b1 
 [0.04617571 0.23087855 0.32322997 0.69263564 0.60028423] 
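L2 normalization also gives one concrete answer to the opening question about how the two distances relate: for unit-length vectors, the squared Euclidean distance equals $2(1 - \cos\theta)$. The sketch below checks this on `a1` and `b1`. The helper `l2_unit` is a name introduced here for illustration and does the same job as `l2_normalize`.

```python
import numpy as np

def l2_unit(v):
    # Scale v to unit Euclidean length.
    return v / np.linalg.norm(v)

a1 = np.array([1, 5, 9, 13, 17])
b1 = np.array([1, 5, 7, 15, 13])

u, w = l2_unit(a1), l2_unit(b1)
cos_sim = np.dot(u, w)             # cosine similarity of a1 and b1
sq_euclid = np.sum((u - w) ** 2)   # squared Euclidean distance of the unit vectors

# For unit vectors: ||u - w||^2 = 2 * (1 - cos(theta))
print(np.isclose(sq_euclid, 2 * (1 - cos_sim)))
```

So on L2-normalized data, ranking pairs by Euclidean distance and ranking them by cosine similarity give the same order.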



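## Solution

It looks like correlation is the best choice for this task: it is the only measure tried above that gives exactly 1 for `a1` and `a2` (so a distance of $1 - r$ is 0), while `b1` scores lower (0.93300) against both.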
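One way to turn the correlation into the distance $D$ required earlier is $D = 1 - r$. This is only a sketch: `correlation_distance` is a hypothetical helper, not something defined in the original notebook, and $1 - r$ is not necessarily a metric in the strict mathematical sense.

```python
import numpy as np

def correlation_distance(a, b):
    # Hypothetical helper: 1 - Pearson r, so perfectly correlated inputs give 0.
    return 1.0 - np.corrcoef(a, b)[0, 1]

a1 = np.array([1, 5, 9, 13, 17])
a2 = np.array([3, 7, 11, 15, 19])
b1 = np.array([1, 5, 7, 15, 13])

d_a1_a2 = correlation_distance(a1, a2)
d_a1_b1 = correlation_distance(a1, b1)
d_a2_b1 = correlation_distance(a2, b1)

# Check the three requirements: D(a1,a2) = 0, and both distances to b1 are larger.
print(np.isclose(d_a1_a2, 0.0), d_a1_b1 > d_a1_a2, d_a2_b1 > d_a1_a2)
```

Because the correlation ignores a constant offset, a transposed copy of a note sequence ends up at distance 0 from the original.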

## MIDI

https://freemidi.org



In [32]:
!pip install mido==1.2.9

Collecting mido==1.2.9
  Downloading mido-1.2.9-py2.py3-none-any.whl (52 kB)
Installing collected packages: mido
Successfully installed mido-1.2.9


In [34]:
%%capture
!wget https://github.com/JoDeMiro/Data/raw/main/Midi/1.mid

In [35]:
from mido import MidiFile

mid = MidiFile('1.mid', clip=True)
print(mid)

<midi file '1.mid' type 1, 16 tracks, 15995 messages>


- type 0 (single track): all messages are saved in one track
- type 1 (synchronous): all tracks start at the same time
- type 2 (asynchronous): each track is independent of the others

In [36]:
for track in mid.tracks:
    print(track)

<midi track 'Survivor' 8 messages>
<midi track 'MIDI MAN!A 3000!' 1125 messages>
<midi track '(c) 2001 MM3K!' 5215 messages>
<midi track 'midimania3k.terrashare.c' 783 messages>
<midi track 'om' 149 messages>
<midi track 'midimania3k@hotmail.com' 123 messages>
<midi track '-' 1325 messages>
<midi track '--' 2253 messages>
<midi track '---' 135 messages>
<midi track '----' 687 messages>
<midi track '------' 3077 messages>
<midi track '-------' 11 messages>
<midi track 'Enjoy' 633 messages>
<midi track 'This' 431 messages>
<midi track 'MIDI' 33 messages>
<midi track 'Tempo Track' 7 messages>


This allows you to see the track titles and how many messages are in each track. You can loop through the messages in a track:

In [37]:
for msg in mid.tracks[0]:
    print(msg)

<meta message track_name name='Survivor' time=0>
<meta message text text="By Destiny's Child" time=0>
<meta message copyright text='(c) 2001 MIDI MAN!A 3000!' time=0>
<meta message copyright text='midimania3k.terrashare.com / midimania3k@hotmail.com' time=0>
<meta message text text='Generated by NoteWorthy Composer' time=0>
<meta message set_tempo tempo=363636 time=0>
<meta message time_signature numerator=4 denominator=4 clocks_per_click=24 notated_32nd_notes_per_beat=8 time=0>
<meta message end_of_track time=0>


Explanation of the code below:

This code loops through the tracks in our MIDI file, searches for tracks that
have the same exact number of messages, and removes them from the overall MIDI
file to get rid of the duplicates.


In [38]:
import os

from mido import MidiFile

cv1 = MidiFile('1.mid', clip=True)

message_numbers = []
duplicates = []

for track in cv1.tracks:
    if len(track) in message_numbers:
        duplicates.append(track)
    else:
        message_numbers.append(len(track))

for track in duplicates:
    cv1.tracks.remove(track)

cv1.save('new_song.mid')

Explanation of the code below:

This code deletes the bass and drum tracks from the first file, and adds the bass and drum tracks from the second file. Notice that we are opening `new_song.mid` so that we have the version of the MIDI with no duplicate tracks, and saving the new tune to a file called `mashup.mid`.

Run this code and open `mashup.mid` and jam out to our new remix of Vampire Killer from Castlevania 1 and 3. Note that in this notebook both inputs derive from `1.mid`, so the Castlevania reference and the bass and drum track indices appear to be carried over from a different example and may not match this file.


In [39]:
import os

from mido import MidiFile

cv1 = MidiFile('new_song.mid', clip=True)
cv3 = MidiFile('1.mid', clip=True)

del cv1.tracks[4]
del cv1.tracks[4]

cv1.tracks.append(cv3.tracks[4])
cv1.tracks.append(cv3.tracks[5])

cv1.save('mashup.mid')

# Another approach

https://blog.ouseful.info/2016/09/13/making-music-and-embedding-sounds-in-jupyter-notebooks


In [6]:
!pip install git+https://github.com/kroger/pyknon

Collecting git+https://github.com/kroger/pyknon
  Cloning https://github.com/kroger/pyknon to /tmp/pip-req-build-gmya_9rz
  Running command git clone -q https://github.com/kroger/pyknon /tmp/pip-req-build-gmya_9rz
Building wheels for collected packages: pyknon
  Building wheel for pyknon (setup.py) ... done
  Created wheel for pyknon: filename=pyknon-1.2-py3-none-any.whl size=19384 sha256=ad8f988d4075993cf361f638e0fcccc79ec60e8f80b7acebaa7884060c3b876a
  Stored in directory: /tmp/pip-ephem-wheel-cache-zv0pb0ms/wheels/9c/70/e6/ffa8b490317517ad0d84ae97f165d772f5c38319711cf354e0
Successfully built pyknon
Installing collected packages: pyknon
Successfully installed pyknon-1.2


In [9]:
from pyknon.genmidi import Midi
from pyknon.music import NoteSeq, Note

melody = [10, 10, 11, 13, 13, 11, 10, 8, 4, 3]

def makeMidi(notes, name, filename='tunel.midi'):
  notes1 = map(Note, notes)
  midi = Midi(1, tempo = 90)
  midi.seq_notes(notes1, track = 0)
  midi.write(filename)

makeMidi(melody, 'Melody1')

In [10]:
!pip install music21



In [40]:
from music21 import midi

def playMidi(filename):
  mf = midi.MidiFile()
  mf.open(filename)
  mf.read()
  mf.close()
  s = midi.translate.midiFileToStream(mf)
  # Unfortunately this does not work under Google Colab:
  # s.show('midi')
  # The reason is that there is no audio device there.


In [41]:
# No audible playback under Google Colab (see playMidi above)

playMidi('tunel.midi')

In [50]:
!sudo apt-get update -y

!sudo apt-get install -y lilypond

Ign:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Hit:2 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:3 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu180

In [51]:
from music21 import converter
from IPython.display import Image

s = converter.parse('tunel.midi')
Image(filename=s.write('lily.png'))

AttributeError: ignored

# Sketch

How to write functions.

In [30]:
def minta(learning_rate: float = 0.1, solver: str = 'sgd', n_epoch: int = 1):
  """
  A sample function that demonstrates how to document a function
  so that the documentation is readable and usable.

  This is just a note.

  Parameters
  ----------
  learning_rate : float, default=0.1
      The initial learning rate used. It controls the step-size
      in updating the weights. Only used when solver='sgd' or 'adam'.

  solver : {'lbfgs', 'sgd', 'adam'}, default='sgd'
      The solver for weight optimization.

      - 'lbfgs' is an optimizer in the family of quasi-Newton methods.

      - 'sgd' refers to stochastic gradient descent.

      - 'adam' refers to a stochastic gradient-based optimizer proposed by
        Kingma, Diederik, and Jimmy Ba

      Note: 'adam' works pretty well on relatively
      large datasets (with thousands of training samples or more) in terms of
      both training time and validation score.
      For small datasets, however, 'lbfgs' can converge faster and perform
      better.

  n_epoch : int, default=1
      Maximum number of iterations.
  """

  from sklearn.neural_network import MLPRegressor

  model = MLPRegressor(hidden_layer_sizes=(2, ),
                       solver = solver,
                       max_iter = n_epoch,
                       learning_rate_init=learning_rate)
  
  return model

In [31]:
mlp = minta(learning_rate=0.1)
mlp = minta()