#### Final Project (Artificial Intelligence)

In [1]:
### Importing Libraries

import pandas as pd
from matplotlib import pyplot as plt 

In [2]:
#### Reading files
dfcsv = pd.read_csv('./data/fma-rock-vs-hiphop.csv')
dfjson = pd.read_json('./data/echonest-metrics.json')

In [3]:
#### Merge the two DataFrames on track_id (inner join keeps only tracks present in both files)
df = dfcsv.merge(dfjson, on='track_id', how='inner')

In [4]:
#### Show original dimensions
print(dfcsv.shape)
print(dfjson.shape)

(17734, 21)
(13129, 9)


In [5]:
#### Merged Dataframe dimensions
df.shape

(4802, 29)
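The inner merge keeps 4802 rows out of 17734 and 13129, so most tracks appear in only one of the two files. A minimal sketch of how to audit that, using small made-up frames in place of `dfcsv` and `dfjson` (the values and the names `tracks`, `metrics`, and `audit` are illustrative assumptions, not the real data):

```python
import pandas as pd

# Toy stand-ins for dfcsv and dfjson (hypothetical values, not the real data)
tracks = pd.DataFrame({
    "track_id": [1, 2, 3, 4],
    "genre_top": ["Rock", "Rock", "Hip-Hop", "Rock"],
})
metrics = pd.DataFrame({
    "track_id": [2, 3, 5],
    "tempo": [120.0, 95.5, 140.2],
})

# An outer merge with indicator=True adds a '_merge' column that records
# whether each row came from the left frame, the right frame, or both
audit = tracks.merge(metrics, on="track_id", how="outer", indicator=True)
merge_counts = audit["_merge"].value_counts().to_dict()

# The inner merge keeps only the rows labelled 'both';
# validate="one_to_one" raises if track_id is duplicated on either side
inner = tracks.merge(metrics, on="track_id", how="inner", validate="one_to_one")

print(merge_counts)
print(inner.shape)
```

On the real frames, the `both` count should match the first number in `df.shape`, assuming `track_id` is unique in each file.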

In [6]:
#### Dataframe Columns
df.columns

Index(['track_id', 'bit_rate', 'comments', 'composer', 'date_created',
       'date_recorded', 'duration', 'favorites', 'genre_top', 'genres',
       'genres_all', 'information', 'interest', 'language_code', 'license',
       'listens', 'lyricist', 'number', 'publisher', 'tags', 'title',
       'acousticness', 'danceability', 'energy', 'instrumentalness',
       'liveness', 'speechiness', 'tempo', 'valence'],
      dtype='object')

In [7]:
#### Dataframe info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4802 entries, 0 to 4801
Data columns (total 29 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   track_id          4802 non-null   int64  
 1   bit_rate          4802 non-null   int64  
 2   comments          4802 non-null   int64  
 3   composer          106 non-null    object 
 4   date_created      4802 non-null   object 
 5   date_recorded     1234 non-null   object 
 6   duration          4802 non-null   int64  
 7   favorites         4802 non-null   int64  
 8   genre_top         4802 non-null   object 
 9   genres            4802 non-null   object 
 10  genres_all        4802 non-null   object 
 11  information       334 non-null    object 
 12  interest          4802 non-null   int64  
 13  language_code     2599 non-null   object 
 14  license           4789 non-null   object 
 15  listens           4802 non-null   int64  
 16  lyricist          13 non-null     object 


In [8]:
#### Null values in each column
df.isnull().sum()

track_id               0
bit_rate               0
comments               0
composer            4696
date_created           0
date_recorded       3568
duration               0
favorites              0
genre_top              0
genres                 0
genres_all             0
information         4468
interest               0
language_code       2203
license               13
listens                0
lyricist            4789
number                 0
publisher           4775
tags                   0
title                  0
acousticness           0
danceability           0
energy                 0
instrumentalness       0
liveness               0
speechiness            0
tempo                  0
valence                0
dtype: int64
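Several columns are almost entirely empty: `lyricist` is null in 4789 of 4802 rows (about 99.7%) and `composer` in 4696 (about 97.8%). One possible way to act on that is to compute the share of nulls per column and drop the columns above a cutoff. The sketch below uses a toy frame, and the 0.5 threshold is an arbitrary assumption rather than something this notebook prescribes:

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of sparsity (hypothetical values)
toy = pd.DataFrame({
    "track_id": [1, 2, 3, 4],
    "composer": ["A", np.nan, np.nan, np.nan],
    "license": ["cc", "cc", np.nan, "cc"],
    "tempo": [120.0, 95.5, 140.2, 101.0],
})

# isnull().mean() gives the fraction of missing values per column
null_share = toy.isnull().mean()

# Assumed cutoff: drop columns that are more than half empty
threshold = 0.5
sparse_cols = null_share[null_share > threshold].index.tolist()
trimmed = toy.drop(columns=sparse_cols)

print(null_share)
print(sparse_cols)
```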

##### **Track id:** Unique Identifier of the music track.
##### **Bit rate:** Amount of digital data used to encode the track's audio (values such as 256000 suggest bits per second). Higher values generally mean the sound is represented more faithfully.
##### **Comments:** Number of comments each track has received.
##### **Composer:** Composer of the track.
##### **Date created:** Track creation date.
##### **Date recorded:** The date the track was recorded.
##### **Duration:** Track duration in seconds.
##### **Favorites:** The number of times the track was selected as a favorite.
##### **Genre top:** The main genre of the music track.
##### **Genres:** Genres associated with a track.
##### **Genres all:** All genres associated with a track.
##### **Information:** Track information.
##### **Interest:** Number of times the track was of interest.
##### **Language code:** Code identifying the language of the track.
##### **License:** The license under which the music is distributed.
##### **Listens:** The number of times the track has been listened to.
##### **Lyricist:** The lyricist of the track.
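##### **Number:** Unclear from the data alone; probably the track's position on its album (not confirmed in this notebook).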
##### **Publisher:** The track publisher.
##### **Tags:** Tags or keywords associated with a track.
##### **Title:** Track title.
##### **Acousticness:** A measure of track acoustics.
##### **Danceability:** A measure of track danceability.
##### **Energy:** A measure of track energy.
##### **Instrumentalness:** A measure of how much the track is instrumental.
##### **Liveness:** A measure of the audience's presence at the recording.
##### **Speechiness:** A measure of the presence of spoken words in the musical track.
##### **Tempo:** The tempo or speed of the track, in beats per minute (BPM).
##### **Valence:** A measure of the positivity of the musical track.


In [9]:
#### Print the values of each column
for c in df.columns:
    print(c + " column")
    print()
    print(df[c])
    print()


track_id column

0          153
1          154
2          155
3          169
4          170
         ...  
4797    124718
4798    124719
4799    124720
4800    124721
4801    124722
Name: track_id, Length: 4802, dtype: int64

bit_rate column

0       256000
1       256000
2       192000
3       192000
4       192000
         ...  
4797    224206
4798    217951
4799    199442
4800    235940
4801    192418
Name: bit_rate, Length: 4802, dtype: int64

comments column

0       0
1       0
2       0
3       0
4       0
       ..
4797    0
4798    0
4799    0
4800    0
4801    0
Name: comments, Length: 4802, dtype: int64

composer column

0       Arc and Sender
1       Arc and Sender
2                  NaN
3        James Squeaky
4                  NaN
             ...      
4797               NaN
4798               NaN
4799               NaN
4800               NaN
4801               NaN
Name: composer, Length: 4802, dtype: object

date_created column

0       2008-11-26 01:45:00
1       2008-

##### Evaluate the 'genre_top' values

In [26]:
df['genre_top'].value_counts()

genre_top
Rock       3892
Hip-Hop     910
Name: count, dtype: int64
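The two classes are unbalanced: 3892 Rock tracks against 910 Hip-Hop tracks. A short sketch that turns the counts above into shares and a ratio (the variable names are illustrative; on the real frame, `df['genre_top'].value_counts(normalize=True)` gives the shares directly):

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({"Rock": 3892, "Hip-Hop": 910}, name="count")

# Share of each genre in the merged data
shares = (counts / counts.sum()).round(3)

# How many Rock tracks there are for each Hip-Hop track
imbalance_ratio = round(counts.max() / counts.min(), 2)

print(shares)
print(imbalance_ratio)
```

Roughly 81% of the tracks are Rock, so a model that always predicts Rock would already reach about 81% accuracy. That is worth keeping in mind when evaluating any classifier trained on this data.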

##### Separate by genre_top

In [13]:
rock = df[df['genre_top'] == 'Rock']
hiphop = df[df['genre_top'] == 'Hip-Hop']

In [36]:
#### Check whether any 'information' text appears in both Rock and Hip-Hop tracks
info = rock.merge(hiphop, on='information', how='inner').copy()

In [38]:
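#### value_counts() ignores NaN, so True means no non-null 'information' value is shared by both genres
info['information'].value_counts().empty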

True
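One caveat about the check above: pandas `merge` matches null keys with each other, and `information` is null in 4468 of the 4802 rows, so the merged frame likely holds a large number of NaN-to-NaN pairs that `value_counts()` then ignores. A lighter alternative, sketched here on toy frames (the names and values are hypothetical), is to intersect the non-null values directly:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the rock and hiphop frames (hypothetical values)
rock_toy = pd.DataFrame({
    "track_id": [1, 2, 3],
    "information": ["<p>Live set</p>", np.nan, np.nan],
})
hiphop_toy = pd.DataFrame({
    "track_id": [4, 5],
    "information": [np.nan, "<p>Mixtape</p>"],
})

# merge pairs every NaN key on the left with every NaN key on the right,
# so 2 null rows x 1 null row produce 2 merged rows with no real match
merged = rock_toy.merge(hiphop_toy, on="information", how="inner")

# Intersecting the non-null values answers the same question without the blow-up
shared = set(rock_toy["information"].dropna()) & set(hiphop_toy["information"].dropna())

print(len(merged))
print(shared)
```

An empty `shared` set corresponds to the `True` result above: no `information` text is common to both genres.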