In [56]:
import pandas as pd

In [57]:
# death_metal.csv
url = 'https://drive.google.com/file/d/11HsCgxJL_PtJ8xxdT5VZbw6e0y0-VKag/view?usp=sharing' 
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
bands = pd.read_csv(path)

In [58]:
bands.sample(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active
13,**Nemesis?>,colombia,su,1986.0,Death/Thrash Metal,Destiny| Death,1986-?
10,**Total Carnage?>,switzerland,su,2010.0,Death Metal/Crust,Post-Apocalypse| Nuclear Warfare| WW3,2010-2011
18,**Desecrate?>,malaysia,ac,2014.0,Death/Thrash/Progressive Metal,Anti Nuclear Power,2014-present
3,**Misanthrope?>,mexico,oh,2010.0,Death Metal,Death| Destruction| War| Decadence,2010-present
23,**Stigmatizer?>,germany,su,2004.0,Death Metal,Life| Death| Society,2004-2007


In [59]:
bands.formed_in.describe()

count      26.000000
mean     2001.269231
std         8.042675
min      1986.000000
25%      1997.000000
50%      2001.500000
75%      2006.750000
max      2014.000000
Name: formed_in, dtype: float64

In [60]:
bands.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   name       26 non-null     object 
 1   country    26 non-null     object 
 2   status     26 non-null     object 
 3   formed_in  26 non-null     float64
 4   genre      26 non-null     object 
 5   theme      26 non-null     object 
 6   active     26 non-null     object 
dtypes: float64(1), object(6)
memory usage: 1.5+ KB


### **Exercise 1:** 
Cleaning the 'name' column

The names of the bands are messy. They have some extra characters like '\*\*' at the beginning and '?>' at the end. Remove them using any or all of these methods:  
- [pandas.Series.str.replace](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html)
- [pandas.Series.str.lstrip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.lstrip.html)
- [pandas.Series.str.rstrip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.rstrip.html)
- [pandas.Series.str.strip](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.strip.html)
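
As a minimal sketch of how the methods listed above behave, the snippet below uses a small hand-made Series (`names` is a hypothetical sample that mimics the messy values, not the real dataset). Note that `strip`, `lstrip` and `rstrip` treat their argument as a set of characters, not as a literal prefix or suffix.

```python
import pandas as pd

# Hypothetical sample that mimics the messy band names
names = pd.Series(['**Nemesis?>', '**Total Carnage?>', '**Nirvana 2002?>'])

# str.strip removes any of the characters '*', '?' and '>' from both ends
stripped = names.str.strip('*?>')

# The same result with one call per side
chained = names.str.lstrip('*').str.rstrip('?>')

# A regex alternative that only touches the exact leading and trailing markers
replaced = names.str.replace(r'^\*+|\?>$', '', regex=True)

print(stripped.tolist())  # ['Nemesis', 'Total Carnage', 'Nirvana 2002']
```

Because `strip` works on character sets, a name that genuinely ends in '?' would lose that character too; the anchored regex is the safer choice in that case.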

In [61]:
# Remove only the leading '**' and the trailing '?>'; characters inside the name stay untouched
bands['name'] = bands['name'].str.replace(r'^\*\*|\?>$', '', regex=True)
bands.head(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active
0,Act of Destruction,united states,ac,2005.0,Melodic Death/Thrash Metal,Death| Love| Life| Evil| Darker Tones,2005-present
1,Nirvana 2002,sweden,su,1988.0,Death Metal,Metaphysical Philosophy| Parapsychology,1988-1992
2,Olemus,austria,ac,1993.0,Death/Black/Gothic Metal,Sadness| Life| Death,1993-present
3,Misanthrope,mexico,oh,2010.0,Death Metal,Death| Destruction| War| Decadence,2010-present
4,Detonator,russia,su,1991.0,Technical Death/Thrash Metal,Loneliness philosophy| state of mind of the pe...,1991-2002


### **Exercise 2:** 
Cleaning the country column

The country column has all countries written in lowercase. Change them so that every word starts with a capital letter.  
A suitable method for this is: [pandas.Series.str.title](https://pandas.pydata.org/docs/reference/api/pandas.Series.str.title.html)
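
To illustrate why `str.title` fits here, the sketch below compares it with two related methods on a hypothetical sample Series (`countries` is made up for the example).

```python
import pandas as pd

# Hypothetical sample of lowercase country names
countries = pd.Series(['united states', 'sweden', 'korea| south'])

titled = countries.str.title()            # first letter of every word capitalized
capitalized = countries.str.capitalize()  # only the first letter of the string
upper = countries.str.upper()             # every letter capitalized

print(titled.tolist())       # ['United States', 'Sweden', 'Korea| South']
print(capitalized.tolist())  # ['United states', 'Sweden', 'Korea| south']
```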

In [62]:
bands['country'].head(5)

0    united states
1           sweden
2          austria
3           mexico
4           russia
Name: country, dtype: object

In [63]:
bands['country'] = bands.country.str.title()
bands.head(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active
0,Act of Destruction,United States,ac,2005.0,Melodic Death/Thrash Metal,Death| Love| Life| Evil| Darker Tones,2005-present
1,Nirvana 2002,Sweden,su,1988.0,Death Metal,Metaphysical Philosophy| Parapsychology,1988-1992
2,Olemus,Austria,ac,1993.0,Death/Black/Gothic Metal,Sadness| Life| Death,1993-present
3,Misanthrope,Mexico,oh,2010.0,Death Metal,Death| Destruction| War| Decadence,2010-present
4,Detonator,Russia,su,1991.0,Technical Death/Thrash Metal,Loneliness philosophy| state of mind of the pe...,1991-2002


### **Exercise 3:**
Cleaning the status column

The status column has some abbreviations instead of the real status.  
Change them in accordance with this:  
* ac = Active  
* su = Split-up  
* cn = Changed name  
* oh = On hold  
* un = Unknown

You can use  
pandas.Series.str.replace -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html  
or you can create a dictionary and remap values according to it with  
pandas.Series.replace -> https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html
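
A minimal sketch of the dictionary approach, on a hypothetical sample Series (`status` and the code 'xx' are invented for illustration). It also shows how `Series.replace` differs from `Series.map` when a value is missing from the dictionary.

```python
import pandas as pd

# Hypothetical sample; 'xx' is a code that is not in the mapping
status = pd.Series(['ac', 'su', 'oh', 'xx'])

codes = {'ac': 'Active', 'su': 'Split-up', 'cn': 'Changed name',
         'oh': 'On hold', 'un': 'Unknown'}

# replace leaves values without a dictionary entry unchanged
replaced = status.replace(codes)

# map turns values without a dictionary entry into NaN
mapped = status.map(codes)

print(replaced.tolist())  # ['Active', 'Split-up', 'On hold', 'xx']
```

`replace` is the more forgiving option when the column may contain codes you have not listed; `map` makes such gaps visible as missing values.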

In [64]:
bands['status'].head(5)

0    ac
1    su
2    ac
3    oh
4    su
Name: status, dtype: object

In [65]:
bands['status'] = bands.status.replace({'ac':'Active', 'su':'Split-up', 'cn':'Changed name', 'oh':'On hold', 'un':'Unknown'})
bands.sample(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active
17,Midian,Korea| South,Active,2009.0,Melodic Death Metal,Despair| Salvation| Duality,2009-present
21,Cast the Stone,United States,Unknown,2002.0,Death Metal,Death| Corruption,2002-?
12,Feu Gregeois,France,Changed name,2005.0,Black/Death Metal,Medieval themes,2005-?
25,Riverden,United States,On hold,2004.0,Thrash/Death Metal,Life| Experiences| Nature| History,2004-2012
16,Dark,Germany,Split-up,1991.0,Death/Thrash Metal (early)| Dark/Gothic Metal ...,Darkness| Depression| Nature| Love,1991-?


### **Exercise 4:**
Cleaning the genre column

The genre column stores each band's genres in a single string, separated by the character /  
1. First, transform the string into a list of strings  
(e.g. 'Avant-garde Black/Death Metal'    to     \[Avant-garde Black, Death Metal\])  
1. Then, create a new column 'number_of_genres' where you will store the number of genres in each list.  

Methods you can use are:  
pandas.Series.str.split -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.split.html  
pandas.Series.str.len -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.len.html
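
The two steps can be sketched on a hypothetical sample Series (`genre` below is made up from the example strings, not loaded from the dataset):

```python
import pandas as pd

# Hypothetical sample of genre strings
genre = pd.Series(['Avant-garde Black/Death Metal',
                   'Death Metal',
                   'Death/Black/Gothic Metal'])

# Step 1: split every string on '/' into a list of genres
genre_list = genre.str.split('/')

# Step 2: str.len also works on lists and returns the number of elements
number_of_genres = genre_list.str.len()

print(number_of_genres.tolist())  # [2, 1, 3]
```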

In [66]:
bands['genre'].head(5)

0      Melodic Death/Thrash Metal
1                     Death Metal
2        Death/Black/Gothic Metal
3                     Death Metal
4    Technical Death/Thrash Metal
Name: genre, dtype: object

In [67]:
# Split each genre string on '/' into a list of genres
bands['genre'] = bands['genre'].str.split('/')
# Count the genres per band (str.len returns the list length)
bands['number_of_genres'] = bands['genre'].str.len()
bands[['genre','number_of_genres']].head(5)

Unnamed: 0,genre,number_of_genres
0,"[Melodic Death, Thrash Metal]",2
1,[Death Metal],1
2,"[Death, Black, Gothic Metal]",3
3,[Death Metal],1
4,"[Technical Death, Thrash Metal]",2


### **Exercise 5:** 
Cleaning the active column

The column `active` contains information about the years when a band was active or '?' if the status or year is unknown.  

Create two new columns 
- `active_from`: when the band formed
- `active_to`: when the band broke up

and fill them with the information contained in the column `active`.  

One method you can use is: pandas.Series.str.extract -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html#
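
A minimal sketch of `str.extract` with named capture groups, on a hypothetical sample Series (`active` below is invented; the regular expression assumes every value has the form "four-digit year, dash, something"):

```python
import pandas as pd

# Hypothetical sample of 'active' values
active = pd.Series(['1988-1992', '2014-present', '1986-?'])

# Each named group becomes a column of the resulting DataFrame
parts = active.str.extract(r'^(?P<active_from>\d{4})-(?P<active_to>.+)$')

# Optional: convert the end year to a number; 'present' and '?' become NaN
active_to_year = pd.to_numeric(parts['active_to'], errors='coerce')

print(parts['active_from'].tolist())  # ['1988', '2014', '1986']
print(parts['active_to'].tolist())    # ['1992', 'present', '?']
```

The extracted values are strings; whether to keep 'present' and '?' as text or convert them to missing values depends on what you want to do with the columns afterwards.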

In [68]:
bands['active'].sample(5)

1        1988-1992
18    2014-present
13          1986-?
0     2005-present
8        2001-2006
Name: active, dtype: object

In [69]:
bands[['active_from','active_to']] = bands.active.str.split("-", expand=True)
bands.sample(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active,number_of_genres,active_from,active_to
9,Coldworker,Sweden,Split-up,2006.0,[Death Metal],Death,2006-2013,1,2006,2013
19,Mortum,Sweden,Split-up,1997.0,"[Melodic Death, Black Metal]",Love| Death| Fantasy,1997-1998,2,1997,1998
1,Nirvana 2002,Sweden,Split-up,1988.0,[Death Metal],Metaphysical Philosophy| Parapsychology,1988-1992,1,1988,1992
0,Act of Destruction,United States,Active,2005.0,"[Melodic Death, Thrash Metal]",Death| Love| Life| Evil| Darker Tones,2005-present,2,2005,present
10,Total Carnage,Switzerland,Split-up,2010.0,"[Death Metal, Crust]",Post-Apocalypse| Nuclear Warfare| WW3,2010-2011,2,2010,2011


### **Exercise 6:** 
Counting the themes

Count how many times the words Love, Life and Death appear in the theme column.  
One method you can use is:  
pandas.Series.str.count -> https://pandas.pydata.org/docs/reference/api/pandas.Series.str.count.html  
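
As a sketch of how `str.count` can be used here: it returns the number of matches per row, so summing the result gives the total for the column. The Series `theme` below is a hypothetical three-row sample, so its totals differ from those of the full dataset.

```python
import pandas as pd

# Hypothetical sample of theme strings
theme = pd.Series(['Death| Love| Life| Evil| Darker Tones',
                   'Metaphysical Philosophy| Parapsychology',
                   'Sadness| Life| Death'])

# Count the matches per row, then sum over all rows for each word
counts = {word: int(theme.str.count(word).sum())
          for word in ['Love', 'Life', 'Death']}

print(counts)  # {'Love': 1, 'Life': 2, 'Death': 2}
```

The pattern is treated as a regular expression and matches substrings, so 'Life' would also be counted inside a word such as 'Lifeless'; use `r'\bLife\b'` if only whole words should count.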

In [70]:
bands[['name','theme']].head(5)

Unnamed: 0,name,theme
0,Act of Destruction,Death| Love| Life| Evil| Darker Tones
1,Nirvana 2002,Metaphysical Philosophy| Parapsychology
2,Olemus,Sadness| Life| Death
3,Misanthrope,Death| Destruction| War| Decadence
4,Detonator,Loneliness philosophy| state of mind of the pe...


In [71]:
# Keep only the bands whose theme mentions Love, Life or Death at least once
bands.loc[bands.theme.str.count('Love|Life|Death') > 0, ['name','theme']].head(3)

Unnamed: 0,name,theme
0,Act of Destruction,Death| Love| Life| Evil| Darker Tones
2,Olemus,Sadness| Life| Death
3,Misanthrope,Death| Destruction| War| Decadence


In [72]:
print(f"Love: {bands.theme.str.findall('Love').str.len().sum()} \nDeath: {bands.theme.str.findall('Death').str.len().sum()} \nLife: {bands.theme.str.findall('Life').str.len().sum()}")

Love: 3 
Death: 12 
Life: 4


In [74]:
# Alternatively, evaluate each count on its own; the cell displays only the last expression (Life)
bands.theme.str.findall('Love').str.len().sum()
bands.theme.str.findall('Death').str.len().sum()
bands.theme.str.findall('Life').str.len().sum()

4

## Final result

In [76]:
bands.sample(5)

Unnamed: 0,name,country,status,formed_in,genre,theme,active,number_of_genres,active_from,active_to
5,Blood Agent,Germany,Active,2013.0,"[Death, Thrash Metal]",War| Death| Apocalypse,2013-present,2,2013,present
19,Mortum,Sweden,Split-up,1997.0,"[Melodic Death, Black Metal]",Love| Death| Fantasy,1997-1998,2,1997,1998
16,Dark,Germany,Split-up,1991.0,"[Death, Thrash Metal (early)| Dark, Gothic Met...",Darkness| Depression| Nature| Love,1991-?,3,1991,?
10,Total Carnage,Switzerland,Split-up,2010.0,"[Death Metal, Crust]",Post-Apocalypse| Nuclear Warfare| WW3,2010-2011,2,2010,2011
24,Ravenage,United Kingdom,Active,2007.0,"[Folk, Melodic Death Metal]",Battles| Mythology,2007-present,2,2007,present
