# Linguistic Description of Songs

Exploring how we can deduce song genres or narrow down artists from a user-inputted linguistic description.

Columns afforded by the Kaggle dataset:
 - acousticness
 - artists
 - danceability
 - duration_ms
 - energy
 - explicit
 - id
 - instrumentalness
 - key
 - liveness
 - loudness
 - mode
 - name
 - popularity
 - release_date
 - speechiness
 - tempo
 - valence
 - year

In [51]:
import nltk
import json
import numpy as np
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

In [35]:
def read_mard_json(input_filename):
    loaded_data_ = []
    with open(input_filename, 'r') as file_:
        loaded_string_ = file_.read()
        loaded_data_ = [json.loads(s) for s in loaded_string_.split('\n') if s is not None and len(s) > 0]
        file_.close()
    return loaded_data_

In [37]:
# Loading Metadata
mard_metadata = read_mard_json('raw/mard_reviews/mard_metadata.json')
    
# Loading Reviews
mard_reviews = read_mard_json('raw/mard_reviews/mard_reviews.json')

In [47]:
all_md_keys = set([])
all_rev_keys = set([])

for row in mard_metadata:
    all_md_keys.update(list(row.keys()))
for row in mard_reviews:
    all_rev_keys.update(list(row.keys()))

all_md_keys = list(all_md_keys)
all_rev_keys = list(all_rev_keys)

print('METADATA KEYS')
print('\n'.join(all_md_keys))
print()
print()
print('REVIEWS KEYS')
print('\n'.join(all_rev_keys))

METADATA KEYS
artist
label
brand
imUrl
root-genre
artist-mbid
salesRank
artist_url
related
release-mbid
title
amazon-id
first-release-year
songs
release-group-mbid
confidence
categories
price


REVIEWS KEYS
helpful
reviewerID
reviewText
reviewTime
summary
overall
unixReviewTime
amazon-id
reviewerName


In [49]:
for row in mard_reviews[:10]:
    print(row['reviewText'])
    print()

Buy this album. Now.  Don't worry about the reviews.  If you love pure, honest music buy this album....you will not be let down.

The Sudden Passion did a great job with this one. The lyrics are witty, with just the right amount of twang.  This is a whiskey-soaked indie rock jam that captures the soul of modern americana!

I received this CD as a gift a few weeks ago from a friend. I was a bit skeptical at first but decided to try it anyway. This CD is great! The first two tracks talk you through how to use the CD and give you great techniques for breathing. I like track 3 because it's only 10 minutes which helps a lot with my busy schedule. It's already helped me sleep better and feel better during the day. Highly recommended!

I am a beginner and have tried a couple of meditation CDs on the market but have disliked them due to poor audio quality and all the Far East philosophy they try to get you to buy into. I simply wanted a quick way to get started meditating so that I could relie

In [56]:
reviews_list = []
vectorizer = TfidfVectorizer(analyzer='word', stop_words={'english'})

for i,row in enumerate(mard_reviews):
    reviews_list.append(row['reviewText'])

tfidf_mtx = vectorizer.fit_transform(reviews_list)

In [57]:
tfidf_mtx.shape

(263525, 311696)