## Learn about the data in a CSV file before importing into MySQL

We'll review the data in our CSV file so we know which tables and columns to create in our MySQL database.


In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
spotify_df = pd.read_csv('spotify-data/genres.csv', low_memory=False)

Now that we've loaded the data, let's look at its summary statistics and the data type of each column.

In [21]:
spotify_df.describe()

| | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | Unnamed: 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 42305.0 | 20780.0 |
| mean | 0.639364 | 0.762516 | 5.37024 | -6.465442 | 0.549462 | 0.136561 | 0.09616 | 0.283048 | 0.214079 | 0.357101 | 147.474056 | 250865.846685 | 3.97258 | 10483.970645 |
| std | 0.156617 | 0.183823 | 3.666145 | 2.941165 | 0.497553 | 0.126168 | 0.170827 | 0.370791 | 0.175576 | 0.2332 | 23.844623 | 102957.713571 | 0.268342 | 6052.359519 |
| min | 0.0651 | 0.000243 | 0.0 | -33.357 | 0.0 | 0.0227 | 1e-06 | 0.0 | 0.0107 | 0.0187 | 57.967 | 25600.0 | 1.0 | 0.0 |
| 25% | 0.524 | 0.632 | 1.0 | -8.161 | 0.0 | 0.0491 | 0.00173 | 0.0 | 0.0996 | 0.161 | 129.931 | 179840.0 | 4.0 | 5255.75 |
| 50% | 0.646 | 0.803 | 6.0 | -6.234 | 1.0 | 0.0755 | 0.0164 | 0.00594 | 0.135 | 0.322 | 144.973 | 224760.0 | 4.0 | 10479.5 |
| 75% | 0.766 | 0.923 | 9.0 | -4.513 | 1.0 | 0.193 | 0.107 | 0.722 | 0.294 | 0.522 | 161.464 | 301133.0 | 4.0 | 15709.25 |
| max | 0.988 | 1.0 | 11.0 | 3.148 | 1.0 | 0.946 | 0.988 | 0.989 | 0.988 | 0.988 | 220.29 | 913052.0 | 5.0 | 20999.0 |
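
The count row reports 42305 for most columns but only 20780 for "Unnamed: 0", which suggests that column has missing values. Below is a minimal sketch of how one might count missing values per column before deciding which MySQL columns need to allow NULL. It runs on a small toy DataFrame (toy_df, with invented values) rather than the real file, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for spotify_df; the values are invented for illustration.
toy_df = pd.DataFrame({
    "genre": ["Dark Trap", "Dark Trap", "hardstyle", "hardstyle"],
    "title": [None, None, "Euphoric Hardstyle", "Best of Hardstyle 2020"],
    "Unnamed: 0": [np.nan, np.nan, 0.0, 1.0],
})

# Number of missing values in each column.
missing_counts = toy_df.isna().sum()
print(missing_counts)

# Names of the columns that contain at least one missing value.
columns_with_nulls = missing_counts[missing_counts > 0].index.tolist()
print(columns_with_nulls)
```

Running the same two calls on spotify_df should show which columns are only partly filled.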


In [43]:
spotify_df.dtypes

danceability        float64
energy              float64
key                   int64
loudness            float64
mode                  int64
speechiness         float64
acousticness        float64
instrumentalness    float64
liveness            float64
valence             float64
tempo               float64
type                 object
id                   object
uri                  object
track_href           object
analysis_url         object
duration_ms           int64
time_signature        int64
genre                object
song_name            object
Unnamed: 0          float64
title                object
dtype: object

In [20]:
spotify_df.title

0                                NaN
1                                NaN
2                                NaN
3                                NaN
4                                NaN
                    ...             
42300             Euphoric Hardstyle
42301    Greatest Hardstyle Playlist
42302         Best of Hardstyle 2020
42303             Euphoric Hardstyle
42304         Best of Hardstyle 2020
Name: title, Length: 42305, dtype: object

In [18]:
spotify_df.genre

0        Dark Trap
1        Dark Trap
2        Dark Trap
3        Dark Trap
4        Dark Trap
           ...    
42300    hardstyle
42301    hardstyle
42302    hardstyle
42303    hardstyle
42304    hardstyle
Name: genre, Length: 42305, dtype: object
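
Before picking a string column size, it can help to know how long the text values actually are. The sketch below is one possible way to find the longest string in each object column. It uses a toy DataFrame (toy_df, invented for this example) and assumes missing values should be ignored when measuring length.

```python
import pandas as pd

# Toy stand-in for spotify_df; dtype="object" mirrors the dtypes shown above.
toy_df = pd.DataFrame({
    "genre": pd.Series(["Dark Trap", "hardstyle"], dtype="object"),
    "title": pd.Series([None, "Greatest Hardstyle Playlist"], dtype="object"),
    "tempo": [140.0, 150.0],
})

# Select only the object (text) columns.
text_columns = toy_df.select_dtypes(include="object").columns

# Longest string in each text column; missing values are dropped first.
max_lengths = {
    col: int(toy_df[col].dropna().astype(str).str.len().max())
    for col in text_columns
}
print(max_lengths)
```

If the longest value in a column is well under 255 characters, a VARCHAR(255) column is a comfortable fit; much longer values may call for TEXT instead.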

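pandas reports several columns as "object", and the ones we looked at (title, genre) hold text. MySQL has no "object" type, so we'll create tables that store those values as strings. As a reminder, a generic CREATE TABLE statement looks like this: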

CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);

## Getting data ready to create our "genres" table

First we'll build a dictionary that maps each column name to its data type, so we can check that every column gets the right type when we create the table.

In [111]:
# Map each column name to its pandas dtype.
sql_table_vars = dict(spotify_df.dtypes)

final_keys = list(sql_table_vars.keys())

# Convert each dtype to its string name, e.g. 'float64' or 'object'.
draft_values = [str(value) for value in sql_table_vars.values()]

# Change 'object' to 'str', since MySQL has no object data type.
final_values = ['str' if item == 'object' else item for item in draft_values]

final_dict = dict(zip(final_keys, final_values))

for k in final_dict:
    print(k, final_dict[k])



danceability float64
energy float64
key int64
loudness float64
mode int64
speechiness float64
acousticness float64
instrumentalness float64
liveness float64
valence float64
tempo float64
type str
id str
uri str
track_href str
analysis_url str
duration_ms int64
time_signature int64
genre str
song_name str
Unnamed: 0 float64
title str
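
With a dictionary like the one printed above, the CREATE TABLE statement can be assembled as a string. The following is a rough sketch, not a definitive implementation: the mapping from labels to MySQL types (DOUBLE, BIGINT, VARCHAR(255)) and the helper names quote_identifier and build_create_table are assumptions made for this example. Column names are wrapped in backticks because some of them, such as "Unnamed: 0" (which contains a space) and "key" (a reserved word in MySQL), are not valid unquoted identifiers.

```python
# Assumed mapping from the labels printed above to MySQL column types.
LABEL_TO_MYSQL = {"float64": "DOUBLE", "int64": "BIGINT", "str": "VARCHAR(255)"}


def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def build_create_table(table_name: str, column_labels: dict) -> str:
    """Assemble a CREATE TABLE statement from a {column: label} dictionary."""
    column_lines = [
        f"    {quote_identifier(col)} {LABEL_TO_MYSQL[label]}"
        for col, label in column_labels.items()
    ]
    return (
        f"CREATE TABLE {quote_identifier(table_name)} (\n"
        + ",\n".join(column_lines)
        + "\n);"
    )


# A few columns from the dictionary above, to keep the example short.
sample_columns = {
    "danceability": "float64",
    "key": "int64",
    "genre": "str",
    "Unnamed: 0": "float64",
}
statement = build_create_table("genres", sample_columns)
print(statement)
```

Passing final_dict instead of sample_columns would produce a statement covering every column in the dataset.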
