# Linguistic Description of Songs

This notebook explores how to infer a song's genre, or narrow down candidate artists, from a free-text description supplied by a user.

Columns available in the Kaggle dataset:
 - acousticness
 - artists
 - danceability
 - duration_ms
 - energy
 - explicit
 - id
 - instrumentalness
 - key
 - liveness
 - loudness
 - mode
 - name
 - popularity
 - release_date
 - speechiness
 - tempo
 - valence
 - year

In [1]:
import nltk
import json
import numpy as np
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

import pandas as pd
import sqlite3

## MARD Reviews Dataset

In [2]:
def read_mard_json(input_filename):
    """Read a JSON Lines file (one JSON object per line) into a list of dicts."""
    with open(input_filename, 'r') as file_:
        # Skip blank lines; the `with` block closes the file automatically.
        return [json.loads(line) for line in file_ if line.strip()]

In [3]:
# Loading Metadata
mard_metadata = read_mard_json('raw/mard_reviews/mard_metadata.json')
    
# Loading Reviews
mard_reviews = read_mard_json('raw/mard_reviews/mard_reviews.json')

In [4]:
all_md_keys = set()
all_rev_keys = set()

# Collect every key that appears in any record, in case rows differ in schema.
for row in mard_metadata:
    all_md_keys.update(row.keys())
for row in mard_reviews:
    all_rev_keys.update(row.keys())

all_md_keys = list(all_md_keys)
all_rev_keys = list(all_rev_keys)

print('METADATA KEYS')
print('\n'.join(all_md_keys))
print()
print()
print('REVIEWS KEYS')
print('\n'.join(all_rev_keys))

METADATA KEYS
first-release-year
salesRank
title
amazon-id
artist
artist_url
songs
root-genre
imUrl
categories
release-mbid
artist-mbid
related
brand
label
release-group-mbid
confidence
price


REVIEWS KEYS
helpful
amazon-id
reviewTime
overall
reviewerID
unixReviewTime
reviewerName
summary
reviewText
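
Both key lists contain `amazon-id`, and the metadata carries `root-genre`, so reviews can be linked to a genre label through that shared key. The following is a minimal sketch of that link on invented toy records; `group_reviews_by_genre` and the toy values are hypothetical and not part of the original notebook.

```python
from collections import defaultdict

# Toy records that reuse the key names printed above; the values are invented.
toy_metadata = [
    {"amazon-id": "A1", "root-genre": "Rock"},
    {"amazon-id": "A2", "root-genre": "New Age"},
]
toy_reviews = [
    {"amazon-id": "A1", "reviewText": "Whiskey-soaked indie rock jam."},
    {"amazon-id": "A2", "reviewText": "Great breathing techniques."},
    {"amazon-id": "A1", "reviewText": "Witty lyrics with some twang."},
    {"amazon-id": "A3", "reviewText": "No metadata for this one."},
]


def group_reviews_by_genre(metadata_rows, review_rows):
    """Map each root-genre to the review texts written about its albums."""
    genre_by_id = {
        row["amazon-id"]: row["root-genre"]
        for row in metadata_rows
        if "amazon-id" in row and "root-genre" in row
    }
    grouped = defaultdict(list)
    for row in review_rows:
        genre = genre_by_id.get(row.get("amazon-id"))
        # Reviews whose album has no metadata entry are skipped.
        if genre is not None and "reviewText" in row:
            grouped[genre].append(row["reviewText"])
    return dict(grouped)


reviews_by_genre = group_reviews_by_genre(toy_metadata, toy_reviews)
print({genre: len(texts) for genre, texts in reviews_by_genre.items()})
```

On the real data, the same function could be called with `mard_metadata` and `mard_reviews`, assuming those lists are loaded as above.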


In [5]:
for row in mard_reviews[:10]:
    print(row['reviewText'])
    print()

Buy this album. Now.  Don't worry about the reviews.  If you love pure, honest music buy this album....you will not be let down.

The Sudden Passion did a great job with this one. The lyrics are witty, with just the right amount of twang.  This is a whiskey-soaked indie rock jam that captures the soul of modern americana!

I received this CD as a gift a few weeks ago from a friend. I was a bit skeptical at first but decided to try it anyway. This CD is great! The first two tracks talk you through how to use the CD and give you great techniques for breathing. I like track 3 because it's only 10 minutes which helps a lot with my busy schedule. It's already helped me sleep better and feel better during the day. Highly recommended!

I am a beginner and have tried a couple of meditation CDs on the market but have disliked them due to poor audio quality and all the Far East philosophy they try to get you to buy into. I simply wanted a quick way to get started meditating so that I could relie

In [6]:
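# Use scikit-learn's built-in English stop-word list. The value must be the
# string 'english'; a set such as {'english'} is treated as a custom list
# that filters only the literal token "english".
# Note: the shape recorded in the next cell predates this setting, so its
# column count may differ slightly.
vectorizer = TfidfVectorizer(analyzer='word', stop_words='english')

# Gather the raw review texts in corpus order.
reviews_list = [row['reviewText'] for row in mard_reviews]

tfidf_mtx = vectorizer.fit_transform(reviews_list)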

In [7]:
tfidf_mtx.shape

(263525, 311696)
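
One way such a matrix could be used for the goal stated at the top is to project a user's description into the same TF-IDF space and rank reviews by cosine similarity. The sketch below does this on a three-document toy corpus; `rank_reviews`, `toy_vectorizer`, and the toy texts are hypothetical, and this is only one possible approach rather than the notebook's established method. Because `TfidfVectorizer` L2-normalizes rows by default, the dot product computed by `linear_kernel` equals cosine similarity here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Invented toy corpus standing in for the review texts.
toy_corpus = [
    "A whiskey-soaked indie rock jam with witty lyrics and twang.",
    "Calm meditation tracks with breathing techniques for better sleep.",
    "Heavy metal riffs and thunderous drums from start to finish.",
]

toy_vectorizer = TfidfVectorizer(analyzer="word", stop_words="english")
toy_mtx = toy_vectorizer.fit_transform(toy_corpus)


def rank_reviews(description, vectorizer, matrix, top_k=2):
    """Return (row_index, score) pairs for the top_k most similar documents."""
    # Reuse the fitted vocabulary; words unseen during fitting are ignored.
    query_vec = vectorizer.transform([description])
    scores = linear_kernel(query_vec, matrix).ravel()
    top = scores.argsort()[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in top]


ranked = rank_reviews("relaxing music for meditation and sleep",
                      toy_vectorizer, toy_mtx)
print(ranked)
```

With the full corpus, the returned row indices could be mapped back to `amazon-id` values and then to genres or artists through the metadata.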

## Pitchfork Reviews

In [14]:
def run_query_on_sqlite_db(input_query, input_filename):
    """
    Run a SQL query against a SQLite database and return the results
    as a pandas DataFrame.

    Input:
     - input_query: string representation of the SQL query to run on the sqlite db
     - input_filename: the file location of the sqlite database
    """
    conn_ = sqlite3.connect(input_filename)
    try:
        return pd.read_sql_query(input_query, conn_)
    finally:
        # Close the connection even if the query fails.
        conn_.close()

In [24]:
all_tables = ['artists', 'content', 'genres', 'labels', 'reviews', 'years']
pitchfork_db = {}

for table in all_tables:
    pitchfork_db[table] = run_query_on_sqlite_db("SELECT * FROM " + table, "./raw/pitchfork_reviews.sqlite")

In [25]:
print(pd.unique(pitchfork_db['genres']['genre']))

['electronic' 'metal' 'rock' None 'rap' 'experimental' 'pop/r&b'
 'folk/country' 'jazz' 'global']
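
Since some reviews have no genre (`None` above), a labelled text set would need to join the tables and drop unlabelled rows. The sketch below shows such a join on an in-memory toy database. It assumes the tables share a `reviewid` key column, which is not confirmed by the outputs in this notebook; the toy rows are invented.

```python
import sqlite3

import pandas as pd

# In-memory toy database; the `reviewid` join key is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (reviewid INTEGER, title TEXT);
    CREATE TABLE genres  (reviewid INTEGER, genre TEXT);
    CREATE TABLE content (reviewid INTEGER, content TEXT);
    INSERT INTO reviews VALUES (1, 'album one'), (2, 'album two');
    INSERT INTO genres  VALUES (1, 'electronic'), (2, NULL);
    INSERT INTO content VALUES (1, 'Review text one.'), (2, 'Review text two.');
""")

# Keep only reviews that carry a genre label.
query = """
    SELECT r.title, g.genre, c.content
    FROM reviews AS r
    JOIN genres  AS g ON g.reviewid = r.reviewid
    JOIN content AS c ON c.reviewid = r.reviewid
    WHERE g.genre IS NOT NULL
"""
labelled = pd.read_sql_query(query, conn)
conn.close()
print(labelled)
```

The same query string could be passed to `run_query_on_sqlite_db` with the real database path, provided the key column name matches.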


In [29]:
print(sorted(pd.unique(pitchfork_db['artists']['artist'])))



In [30]:
pitchfork_db['reviews']['title'][:20]

0                                             mezzanine
1                                          prelapsarian
2                                  all of them naturals
3                                           first songs
4                                             new start
5         insecure (music from the hbo original series)
6                               stillness in wonderland
7                                              tehillim
8                                            reflection
9                          filthy america its beautiful
10                                clear sounds/perfetta
11                                     run the jewels 3
12                                                nadir
13                                        december 99th
14                                     don't smoke rock
15    punk45: les punks: the french connection (the ...
16                                      brnshj (puncak)
17                             merry christmas l

In [31]:
pitchfork_db['content']['content'][:20]

0     “Trip-hop” eventually became a ’90s punchline,...
1     Eight years, five albums, and two EPs in, the ...
2     Minneapolis’ Uranium Club seem to revel in bei...
3     Kleenex began with a crash. It transpired one ...
4     It is impossible to consider a given release b...
5     In the pilot episode of “Insecure,” the critic...
6     Rapper Simbi Ajikawo, who records as Little Si...
7     For the last thirty years, Israel’s electronic...
8     Ambient music is a funny thing. As innocuous a...
9     There were innumerable cameos at the Bad Boy F...
10    Lots of drone musicians have been called sound...
11    On 2006’s “That’s Life,” Killer Mike boasted “...
12    “Why so sad?/Don’t feel so bad/Get out of bed,...
13    In January 2016, rapper/actor Yasiin Bey annou...
14    Don’t take your eyes off Pete Rock. The early-...
15    Soul Jazz’s Punk 45 series has made it its mis...
16    It’s safe to say there is no other band on the...
17    When Chance the Rapper performed “Sunday C