**This notebook covers**:
Preprocessing of the dataset:
+ remove punctuation and "chorus" markers, convert to lowercase
+ add a column with the song length / number of tokens
+ draw a sample of 25% of the original dataset
+ stem the song lyrics

In [1]:
import pandas as pd
import numpy as np
from collections import Counter

import matplotlib.pyplot as plt
import seaborn as sns

### Preprocessing

load raw data

In [2]:
songs = pd.read_csv("../songdata.csv")
songs

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"Look at her face, it's a wonderful face \nAnd..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"Take it easy with me, please \nTouch me gentl..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,I'll never know why I had to go \nWhy I had t...
3,ABBA,Bang,/a/abba/bang_20598415.html,Making somebody happy is a question of give an...
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,Making somebody happy is a question of give an...
...,...,...,...,...
57645,Ziggy Marley,Good Old Days,/z/ziggy+marley/good+old+days_10198588.html,Irie days come on play \nLet the angels fly l...
57646,Ziggy Marley,Hand To Mouth,/z/ziggy+marley/hand+to+mouth_20531167.html,Power to the workers \nMore power \nPower to...
57647,Zwan,Come With Me,/z/zwan/come+with+me_20148981.html,all you need \nis something i'll believe \nf...
57648,Zwan,Desire,/z/zwan/desire_20148986.html,northern star \nam i frightened \nwhere can ...


In [3]:
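# Drop commas, line breaks and "[Chorus]" markers, lowercase, then split into tokens.
# Assigning the whole column avoids in-place edits of iterrows() rows, which may be copies.
songs["text"] = [
    t.replace(",", " ").replace("\n", " ").replace("[Chorus]", "").lower().split()
    for t in songs.text
]
songs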

Unnamed: 0,artist,song,link,text
0,ABBA,Ahe's My Kind Of Girl,/a/abba/ahes+my+kind+of+girl_20598417.html,"[look, at, her, face, it's, a, wonderful, face..."
1,ABBA,"Andante, Andante",/a/abba/andante+andante_20002708.html,"[take, it, easy, with, me, please, touch, me, ..."
2,ABBA,As Good As New,/a/abba/as+good+as+new_20003033.html,"[i'll, never, know, why, i, had, to, go, why, ..."
3,ABBA,Bang,/a/abba/bang_20598415.html,"[making, somebody, happy, is, a, question, of,..."
4,ABBA,Bang-A-Boomerang,/a/abba/bang+a+boomerang_20002668.html,"[making, somebody, happy, is, a, question, of,..."
...,...,...,...,...
57645,Ziggy Marley,Good Old Days,/z/ziggy+marley/good+old+days_10198588.html,"[irie, days, come, on, play, let, the, angels,..."
57646,Ziggy Marley,Hand To Mouth,/z/ziggy+marley/hand+to+mouth_20531167.html,"[power, to, the, workers, more, power, power, ..."
57647,Zwan,Come With Me,/z/zwan/come+with+me_20148981.html,"[all, you, need, is, something, i'll, believe,..."
57648,Zwan,Desire,/z/zwan/desire_20148986.html,"[northern, star, am, i, frightened, where, can..."
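
The cleaning step above strips only commas, so tokens such as `remember?` or `(i` keep their punctuation (both appear in the sample output further down). A minimal sketch of a stricter cleaner follows. It assumes English lyrics in plain ASCII and that apostrophes inside words should be kept; `clean_lyrics` is a hypothetical helper, not part of the notebook.

```python
import re

def clean_lyrics(text):
    """Lowercase, drop bracketed section markers and punctuation, split into tokens."""
    text = re.sub(r"\[[^\]]*\]", " ", text)    # bracketed markers such as [Chorus]
    text = text.lower()
    # Assumption: ASCII lyrics. Anything that is not a letter, digit,
    # apostrophe or whitespace is replaced by a space.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return text.split()

print(clean_lyrics("Dear lover, do you remember? \n[Chorus] \n(I got army gunz)"))
```

Applied to the whole column this would be `songs["text"] = songs.text.apply(clean_lyrics)`. Note that accented characters would be dropped by this pattern.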


remove artists with fewer than 60 songs

In [4]:
c = Counter(songs.artist)

In [5]:
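# Keep only artists with at least 60 songs; c maps artist -> number of songs.
frequent = [artist for artist, n_songs in c.items() if n_songs >= 60]
songs = songs[songs.artist.isin(frequent)].copy()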
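
An alternative that skips the `Counter` entirely is to count songs per artist with `groupby` and filter on that count. A minimal sketch on a toy frame; `toy` and `min_songs` are names introduced here for illustration, and the threshold is lowered so the effect is visible.

```python
import pandas as pd

toy = pd.DataFrame({
    "artist": ["A", "A", "A", "B", "C", "C"],
    "song": ["a1", "a2", "a3", "b1", "c1", "c2"],
})
min_songs = 2  # the notebook uses 60

# Count songs per artist and broadcast the count back onto every row.
songs_per_artist = toy.groupby("artist")["song"].transform("size")
filtered = toy[songs_per_artist >= min_songs].copy()
print(filtered.artist.unique())
```

The `.copy()` makes `filtered` an independent frame, so adding columns to it later does not trigger a `SettingWithCopyWarning`.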

In [6]:
songs['number_of_tokens'] = songs.text.str.len() 

In [7]:
songs.drop('link', axis=1, inplace=True)
songs

Unnamed: 0,artist,song,text,number_of_tokens
0,ABBA,Ahe's My Kind Of Girl,"[look, at, her, face, it's, a, wonderful, face...",153
1,ABBA,"Andante, Andante","[take, it, easy, with, me, please, touch, me, ...",260
2,ABBA,As Good As New,"[i'll, never, know, why, i, had, to, go, why, ...",312
3,ABBA,Bang,"[making, somebody, happy, is, a, question, of,...",200
4,ABBA,Bang-A-Boomerang,"[making, somebody, happy, is, a, question, of,...",198
...,...,...,...,...
57642,Ziggy Marley,Friend,"[i, wanna, thank, you, for, the, things, you'v...",132
57643,Ziggy Marley,G7,"[seven, richest, countries, in, the, world, th...",283
57644,Ziggy Marley,Generation,"[many, generation, have, passed, away, fightin...",251
57645,Ziggy Marley,Good Old Days,"[irie, days, come, on, play, let, the, angels,...",175


In [None]:
songs.to_csv("songtexte_bereinigt_gekuerzt")
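
One caveat when saving at this stage: the `text` column holds Python lists, and `to_csv` writes each list as its string representation, so a plain `read_csv` returns strings rather than lists. A minimal sketch of restoring the lists with `ast.literal_eval`, using an in-memory buffer instead of a file (the toy frame is an assumption for illustration):

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({"artist": ["ABBA"], "text": [["look", "at", "her", "face"]]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Without the converter, the column would come back as the string "['look', 'at', ...]".
restored = pd.read_csv(buffer, converters={"text": ast.literal_eval})
print(restored.text[0])
```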

#### sample data

In [15]:
songs_sample = songs.copy()
songs_sample = songs_sample.sample(frac=.25, replace=False, random_state=42)
songs_sample

Unnamed: 0,artist,song,text,number_of_tokens
20468,Van Halen,Take Your Whiskey Home,"[well, my, baby, she, don't, want, me, around....",224
41605,Lil Wayne,Army Gunz,"[yeah, yeah, yeah, (i, got, army, gunz), yeah,...",578
52292,Stevie Wonder,Isn't She Lovely,"[isn't, she, lovely, isn't, she, wonderful, is...",108
42697,Mariah Carey,O Holy Night,"[o, holy, night, the, stars, are, brightly, sh...",78
27993,Clash,I'm So Bored With The U.S.A.,"[yankee, soldier, he, want, to, shoot, some, s...",113
...,...,...,...,...
25325,Billie Holiday,My Last Affair,"[can't, you, see, what, love, and, romance, ha...",97
28783,Dave Matthews Band,The Space Between,"[you, cannot, quit, me, so, quickly, is, no, h...",310
18645,Squeeze,Crying In My Sleep,"[breaking, up, is, breaking, my, heart, is, sh...",178
32684,Foo Fighters,Dear Lover,"[dear, lover, do, you, remember?, the, sound, ...",134
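
`sample(frac=.25)` draws uniformly over all rows, so the share of songs kept per artist varies. If a fixed share per artist is wanted, a grouped sample is one option. A minimal sketch on toy data, which is not what the notebook does; it assumes pandas 1.1 or newer for `groupby(...).sample`.

```python
import pandas as pd

toy = pd.DataFrame({
    "artist": ["A"] * 8 + ["B"] * 4,
    "song": [f"s{i}" for i in range(12)],
})

# Draw 25% of each artist's songs; random_state makes the draw repeatable.
stratified = toy.groupby("artist").sample(frac=0.25, random_state=42)
counts = stratified.artist.value_counts().to_dict()
print(counts)
```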


In [16]:
songs_sample.to_csv("sample_25percent.csv")

### Stemming

In [4]:
songs = pd.read_csv("../songs_25.csv")
songs.head(2)

Unnamed: 0.1,Unnamed: 0,artists,song,text,number_of_tokens,Genre1,Genre2,check,POS
0,0,ABBA,Me And I,sometimes when i'm mad there's a part of me t...,1551,Pop,Rock,True,mad little sad mean other rainy gloomy funny d...
1,1,ABBA,My Mama Said,tried to sneak out without saying with my lou...,1029,Pop,Rock,True,record la la la la red la la la la dead la la ...


In [27]:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # requires NLTK's Punkt tokenizer data

ps = PorterStemmer()

# Tokenize each song text, stem every token with the Porter stemmer,
# and store the space-joined stems in a new column.
songs["stemmed_text"] = songs.text.apply(
    lambda text: " ".join(ps.stem(word) for word in word_tokenize(text))
)
songs

Unnamed: 0.1,Unnamed: 0,artists,song,text,number_of_tokens,Genre1,Genre2,check,POS,stemmed_text
0,0,ABBA,Me And I,sometimes when i'm mad there's a part of me t...,1551,Pop,Rock,True,mad little sad mean other rainy gloomy funny d...,sometim when i 'm mad there 's a part of me t...
1,1,ABBA,My Mama Said,tried to sneak out without saying with my lou...,1029,Pop,Rock,True,record la la la la red la la la la dead la la ...,tri to sneak out without say with my loudest ...
2,2,ABBA,Hole In Your Soul,you feel bad let me tell you we all get the ...,1831,Pop,Rock,True,bad bad bad sad loose cool bright romantic bad...,you feel bad let me tell you we all get the b...
3,3,ABBA,Cassandra,down in the street they're all singing and sho...,1964,Pop,Rock,True,alive dead hollow smart sorry last sorry final...,down in the street they 're all sing and shou...
4,4,ABBA,Just A Notion,just a notion that's all just a feeling that ...,514,Pop,Rock,True,mistaking right wrong special notion feeling m...,just a notion that 's all just a feel that yo...
...,...,...,...,...,...,...,...,...,...,...
6795,6795,ZZ Top,36-22-36,what what what you want? hey my thing is a ...,646,Rock,Pop,True,real fine real fine round thing thing thing th...,what what what you want ? hey my thing is a r...
6796,6796,ZZ Top,Deal Goin' Down,when the deal goin' down and you gonna take yo...,746,Rock,Pop,True,deal chance mystery trance future deal deal no...,when the deal goin ' down and you gon na take...
6797,6797,ZZ Top,Can't Stop Rockin',i heard about the rock for sometime i know. ...,995,Rock,Pop,True,easy wrong right low loose high rock roll rock...,i heard about the rock for sometim i know . i...
6798,6798,ZZ Top,I Got The Message,i'm picking up on a signal that's in the air ...,495,Rock,Pop,True,straight electric about empty straight straigh...,i 'm pick up on a signal that 's in the air t...


In [29]:
songs.drop("Unnamed: 0", axis=1, inplace=True)

Unnamed: 0,artists,song,text,number_of_tokens,Genre1,Genre2,check,POS,stemmed_text
0,ABBA,Me And I,sometimes when i'm mad there's a part of me t...,1551,Pop,Rock,True,mad little sad mean other rainy gloomy funny d...,sometim when i 'm mad there 's a part of me t...
1,ABBA,My Mama Said,tried to sneak out without saying with my lou...,1029,Pop,Rock,True,record la la la la red la la la la dead la la ...,tri to sneak out without say with my loudest ...
2,ABBA,Hole In Your Soul,you feel bad let me tell you we all get the ...,1831,Pop,Rock,True,bad bad bad sad loose cool bright romantic bad...,you feel bad let me tell you we all get the b...
3,ABBA,Cassandra,down in the street they're all singing and sho...,1964,Pop,Rock,True,alive dead hollow smart sorry last sorry final...,down in the street they 're all sing and shou...
4,ABBA,Just A Notion,just a notion that's all just a feeling that ...,514,Pop,Rock,True,mistaking right wrong special notion feeling m...,just a notion that 's all just a feel that yo...
...,...,...,...,...,...,...,...,...,...
6795,ZZ Top,36-22-36,what what what you want? hey my thing is a ...,646,Rock,Pop,True,real fine real fine round thing thing thing th...,what what what you want ? hey my thing is a r...
6796,ZZ Top,Deal Goin' Down,when the deal goin' down and you gonna take yo...,746,Rock,Pop,True,deal chance mystery trance future deal deal no...,when the deal goin ' down and you gon na take...
6797,ZZ Top,Can't Stop Rockin',i heard about the rock for sometime i know. ...,995,Rock,Pop,True,easy wrong right low loose high rock roll rock...,i heard about the rock for sometim i know . i...
6798,ZZ Top,I Got The Message,i'm picking up on a signal that's in the air ...,495,Rock,Pop,True,straight electric about empty straight straigh...,i 'm pick up on a signal that 's in the air t...


In [30]:
songs.to_csv("../songs_25.csv")
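
Columns named `Unnamed: 0` or `Unnamed: 0.1`, as seen above, typically arise when a saved index is read back as a regular column; with the default settings, each write/read cycle adds another one. A minimal sketch of how `index=False` avoids this, using an in-memory buffer and a toy frame (`round_trip` is a hypothetical helper):

```python
import io

import pandas as pd

df = pd.DataFrame({"artist": ["ABBA", "ZZ Top"], "song": ["Me And I", "36-22-36"]})

def round_trip(frame, **to_csv_kwargs):
    """Write a frame to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    frame.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)

with_index = round_trip(df)                  # default: the index is written
without_index = round_trip(df, index=False)  # the index is left out

print(list(with_index.columns))
print(list(without_index.columns))
```

Alternatively, the written index can be kept and restored on load with `pd.read_csv(..., index_col=0)`.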