## Import
Import **numpy**, **pandas** and **matplotlib**.

In [50]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

-------------------------------------

### Data Representation

Phase 1: Loading the Dataset into a dataframe

Notes: <br>
<li> For this project, our group chose "Dataset2" as the dataset to work on. </li>
<li> The data is held in a dataframe named "df2". </li>

In [82]:
df2 = pd.read_csv('Dataset2.csv')
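
As a side note, an "Unnamed: 0" column usually appears when a CSV file was saved together with its row index. The sketch below assumes that is what happened here; the inline CSV text is made-up stand-in data, not the contents of Dataset2.csv. It compares the default read with a read that treats the first column as the index via `index_col=0`.

```python
import io

import pandas as pd

# Hypothetical stand-in for Dataset2.csv: the first, unnamed column is a saved row index.
csv_text = """,f1,f2,class
0,1.5,2.5,0
1,3.5,4.5,1
2,5.5,6.5,2
"""

# Default read: the unnamed first column surfaces as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# Assumption: the first column is the row index, so read it as such.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))
print(list(indexed.columns))
```

This notebook keeps the default read and removes the column later in Phase 2, so the `index_col=0` variant is only an alternative, not what the cells here do.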

-------------------------------------

Phase 2: Representation and Alteration of the variables (columns) of the dataset

Notes: <br>
<li> In this phase, we give the columns of the dataset an assumed meaning by naming them after variables from a chosen theme. </li>
<li> For this dataset, the theme is music, and the columns represent different aspects of a piece of music, namely: <br>
    tempo <br>
    rythm <br>
    pitch <br>
    acousticness <br>
    danceability <br>
    energy <br>
    instrumentalness <br>
    loudness <br>
    liveness <br>
    valence <br></li>
<li> The values of the "class" column, which are our clusters, will be represented as three genres of music: "Pop Music", "Alternative Rock", and "Jazz". </li>
<li> We will also remove the "Unnamed: 0" column, since it duplicates the row id column (the index). </li>
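
The statement that "Unnamed: 0" only repeats the row ids can be checked before the column is dropped. The following is a minimal sketch on a made-up three-row frame (the name `toy` and its values are hypothetical, not taken from Dataset2.csv): it compares the column with the index and drops the column only when the two match.

```python
import pandas as pd

# Hypothetical miniature of the loaded frame; values are illustrative only.
toy = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'f1': [10.0, 5.0, -12.0],
    'class': [0, 1, 2],
})

# True only if every value in the column equals its row label.
is_redundant = (toy['Unnamed: 0'].to_numpy() == toy.index.to_numpy()).all()

# Drop the column only when it carries no information beyond the index.
if is_redundant:
    toy = toy.drop(columns=['Unnamed: 0'])

print(is_redundant, list(toy.columns))
```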

In [83]:
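df2_copy = df2.copy()  # real copy; plain assignment would only add a second name for df2

new_columns1 = ['tempo','rythm','pitch','acousticness','danceability']
new_columns2 = ['energy','instrumentalness','loudness','liveness','valence']
new_columns = new_columns1 + new_columns2

# Map the original names f1..f10 to the themed names in a single rename call
df2_copy = df2_copy.rename(columns={'f' + str(i + 1): name for i, name in enumerate(new_columns)})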

In [84]:
df2_copy['class'] = df2_copy['class'].replace(0,'pop music')
df2_copy['class'] = df2_copy['class'].replace(1,'alternative rock')
df2_copy['class'] = df2_copy['class'].replace(2,'jazz')
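
The class relabelling can also be written as a single dictionary lookup. The sketch below is one possible variant, not the method used in this notebook: it uses `Series.map`, which turns any code missing from the dictionary into NaN, so an unexpected class code is caught instead of being left as a number. The `codes` series is hypothetical sample data.

```python
import pandas as pd

# Genre names as used in this notebook's theme.
genre_names = {0: 'pop music', 1: 'alternative rock', 2: 'jazz'}

# Hypothetical class codes; the real column comes from Dataset2.csv.
codes = pd.Series([0, 1, 2, 2, 0], name='class')

# map() yields NaN for any code missing from the dictionary, which makes gaps easy to spot.
labels = codes.map(genre_names)
assert labels.notna().all(), 'found a class code with no genre name'

print(labels.tolist())
```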

In [85]:
df2_copy = df2_copy.drop(['Unnamed: 0'], axis = 1)

In [86]:
df2 = df2_copy
df2

,tempo,rythm,pitch,acousticness,danceability,energy,instrumentalness,loudness,liveness,valence,class
0,10.652692,16.042400,24.565548,26.272129,45.647680,58.632068,45.045855,67.675526,80.165594,90.417547,pop music
1,5.722232,17.035886,15.896998,30.542138,45.783895,47.886595,54.478161,80.577335,77.942159,76.989651,pop music
2,-12.363202,8.080350,22.206259,28.204266,54.067783,55.704987,64.482570,61.861936,83.913536,93.484662,pop music
3,-0.067071,4.439547,16.885748,30.573906,35.575125,65.921801,64.191040,78.823812,71.224372,93.153639,pop music
4,4.970688,6.228799,25.188235,29.065907,35.151767,57.690875,58.618101,66.280558,102.556176,91.872643,pop music
...,...,...,...,...,...,...,...,...,...,...,...
895,-8.768710,3.987602,20.381588,25.381834,39.937148,64.779896,53.026529,60.877374,96.129077,111.075092,jazz
896,-4.393518,19.802175,15.762081,40.011495,34.117872,47.558485,53.307285,66.338300,76.216410,71.210365,jazz
897,7.874213,14.960790,27.058597,25.058451,36.062587,52.591125,53.276791,85.413482,70.506658,83.810104,jazz
898,-4.410361,-0.526504,15.592582,38.051297,41.533565,45.096312,67.538675,69.194555,71.118547,92.926016,jazz


In [87]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900 entries, 0 to 899
Data columns (total 11 columns):
tempo               900 non-null float64
rythm               900 non-null float64
pitch               900 non-null float64
acousticness        900 non-null float64
danceability        900 non-null float64
energy              900 non-null float64
instrumentalness    900 non-null float64
loudness            900 non-null float64
liveness            900 non-null float64
valence             900 non-null float64
class               900 non-null object
dtypes: float64(10), object(1)
memory usage: 77.5+ KB


-------------------------------------

Phase 3: Representation and Alteration of the observations (rows) of the dataset

Notes: <br>
<li>In this phase, we give the rows of the dataset an assumed meaning by renaming them after the type of observation they represent under the theme chosen in Phase 2. </li>
<li>In this case, each observation represents a song. </li>
<li>Each value in the row id column will be renamed to "Song" followed by the id number of the observation (e.g. Song 102). </li>

In [88]:
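df2_copy = df2.copy()  # copy first; assigning to .index through a plain alias would also change df2

# Label each row "Song <row number>"
rows = ['Song ' + str(x) for x in range(df2_copy.shape[0])]
df2_copy.index = rows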
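
Once the index carries these labels, a row can be selected by name with `.loc`. The sketch below uses a made-up three-row frame (the names `original` and `labelled` and the values are hypothetical). It also relabels a `.copy()` of the frame: assigning to `.index` changes the frame in place, so doing it through a plain alias would relabel the source frame as well.

```python
import pandas as pd

# Hypothetical three-row frame standing in for df2.
original = pd.DataFrame({'tempo': [10.6, 5.7, -12.4]})

# .copy() gives an independent frame; plain assignment would only add a second name.
labelled = original.copy()
labelled.index = [f'Song {i}' for i in range(len(labelled))]

# Rows can now be selected by their label.
print(labelled.loc['Song 2', 'tempo'])

# The original keeps its default integer index.
print(list(original.index))
```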

In [89]:
df2 = df2_copy
df2

,tempo,rythm,pitch,acousticness,danceability,energy,instrumentalness,loudness,liveness,valence,class
Song 0,10.652692,16.042400,24.565548,26.272129,45.647680,58.632068,45.045855,67.675526,80.165594,90.417547,pop music
Song 1,5.722232,17.035886,15.896998,30.542138,45.783895,47.886595,54.478161,80.577335,77.942159,76.989651,pop music
Song 2,-12.363202,8.080350,22.206259,28.204266,54.067783,55.704987,64.482570,61.861936,83.913536,93.484662,pop music
Song 3,-0.067071,4.439547,16.885748,30.573906,35.575125,65.921801,64.191040,78.823812,71.224372,93.153639,pop music
Song 4,4.970688,6.228799,25.188235,29.065907,35.151767,57.690875,58.618101,66.280558,102.556176,91.872643,pop music
...,...,...,...,...,...,...,...,...,...,...,...
Song 895,-8.768710,3.987602,20.381588,25.381834,39.937148,64.779896,53.026529,60.877374,96.129077,111.075092,jazz
Song 896,-4.393518,19.802175,15.762081,40.011495,34.117872,47.558485,53.307285,66.338300,76.216410,71.210365,jazz
Song 897,7.874213,14.960790,27.058597,25.058451,36.062587,52.591125,53.276791,85.413482,70.506658,83.810104,jazz
Song 898,-4.410361,-0.526504,15.592582,38.051297,41.533565,45.096312,67.538675,69.194555,71.118547,92.926016,jazz


In [90]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 900 entries, Song 0 to Song 899
Data columns (total 11 columns):
tempo               900 non-null float64
rythm               900 non-null float64
pitch               900 non-null float64
acousticness        900 non-null float64
danceability        900 non-null float64
energy              900 non-null float64
instrumentalness    900 non-null float64
loudness            900 non-null float64
liveness            900 non-null float64
valence             900 non-null float64
class               900 non-null object
dtypes: float64(10), object(1)
memory usage: 84.4+ KB
