<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

## Lab: Cleaning Rock Song Data

_Authors: Dave Yerrington (SF)_

---


In [1]:
import pandas as pd
import numpy as np 
import seaborn as sns

%matplotlib inline

### 1. Load `rock.csv` and do an initial examination of its data columns.

In [2]:
rockfile = "../datasets/rock.csv"

In [3]:
# Load the data.

df = pd.read_csv(rockfile)

In [4]:
# Look at the information regarding its columns.
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2230 entries, 0 to 2229
Data columns (total 8 columns):
Song Clean      2230 non-null object
ARTIST CLEAN    2230 non-null object
Release Year    1653 non-null object
COMBINED        2230 non-null object
First?          2230 non-null int64
Year?           2230 non-null int64
PlayCount       2230 non-null int64
F*G             2230 non-null int64
dtypes: int64(4), object(4)
memory usage: 139.5+ KB


In [5]:
df.dtypes

Song Clean      object
ARTIST CLEAN    object
Release Year    object
COMBINED        object
First?           int64
Year?            int64
PlayCount        int64
F*G              int64
dtype: object

In [6]:
df.sample(10)

Unnamed: 0,Song Clean,ARTIST CLEAN,Release Year,COMBINED,First?,Year?,PlayCount,F*G
2097,Mysterious Ways,U2,1991,Mysterious Ways by U2,1,1,38,38
1144,I Melt With You,Modern English,1982,I Melt With You by Modern English,1,1,15,15
392,Southern Cross,"Crosby, Stills & Nash",1982,"Southern Cross by Crosby, Stills & Nash",1,1,9,9
1196,Come As You Are,Nirvana,1991,Come As You Are by Nirvana,1,1,39,39
1478,Let's Spend the Night Together,Rolling Stones,1967,Let's Spend the Night Together by Rolling Stones,1,1,6,6
934,Small Town,John Mellencamp,1985,Small Town by John Mellencamp,1,1,55,55
1357,Fat Bottomed Girls,Queen,1978,Fat Bottomed Girls by Queen,1,1,70,70
1766,Everybody's Got Something To Hide Except Me An...,The Beatles,1968,Everybody's Got Something To Hide Except Me An...,1,1,1,1
1349,Blurry,Puddle of Mudd,2001,Blurry by Puddle of Mudd,1,1,1,1
1129,The Memory Remains,Metallica,1997,The Memory Remains by Metallica,1,1,3,3


### 2.  Clean up the column names.

Let's clean up the column names. There are three ways we can accomplish this:

#### 2.A Change the column names when you import the data using `pd.read_csv()`.

When you pass `names=` a list of strings whose length matches the number of columns in the file, `pandas` uses those strings as the column names.

NOTE: The first row of this `.csv` is already a header. When you supply custom column names, you must tell `pandas` to skip that row; otherwise it is read in as a row of data. The `skiprows=1` keyword argument to `read_csv()` tells `pandas` to skip the first row.
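
As a minimal sketch of this approach, the example below reads a tiny in-memory CSV instead of `rock.csv`. The sample rows, the `csv_text` variable, and the new column names are illustrative assumptions, not part of the lab's data.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for rock.csv, including its original header row.
csv_text = (
    "Song Clean,ARTIST CLEAN,Release Year\n"
    "Caught Up in You,.38 Special,1982\n"
    "Hold On Loosely,.38 Special,1981\n"
)

# Illustrative names only; the lab does not require these exact strings.
new_names = ["song", "artist", "release"]

# skiprows=1 drops the original header so it is not read as a data row.
renamed = pd.read_csv(io.StringIO(csv_text), names=new_names, skiprows=1)
print(renamed.columns.tolist())
```

Passing `header=0` together with `names=` is another way to replace an existing header row.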

In [7]:
df.columns

Index(['Song Clean', 'ARTIST CLEAN', 'Release Year', 'COMBINED', 'First?',
       'Year?', 'PlayCount', 'F*G'],
      dtype='object')

In [8]:
import string
# the string library has default strings that contain all letters or numbers
uppercase = string.ascii_uppercase
lowercase = string.ascii_lowercase

df.columns = [''.join([c.lower() for c in column if c in uppercase+lowercase]) for column in df.columns]

In [9]:
df.columns

Index(['songclean', 'artistclean', 'releaseyear', 'combined', 'first', 'year',
       'playcount', 'fg'],
      dtype='object')

#### 2.B Change column names using the `.rename()` function.

The `.rename()` function takes an argument, `columns=`, which accepts either a dictionary (`name_dict`) containing the original column names as keys and the new column names as values, or a function that is applied to each column name. The cell below passes a function.
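
For comparison, here is a small sketch of the dictionary form. The `demo` DataFrame and the new names in `name_dict` are made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame that reuses a few of the original column names.
demo = pd.DataFrame({
    "Song Clean": ["Kryptonite"],
    "ARTIST CLEAN": ["3 Doors Down"],
    "F*G": [13],
})

# Only the columns listed in the dictionary are renamed; the rest are untouched.
name_dict = {"Song Clean": "song", "ARTIST CLEAN": "artist"}
demo = demo.rename(columns=name_dict)
print(demo.columns.tolist())
```

Unlike reassigning `.columns`, this form lets you rename a subset of the columns.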

In [10]:
# Change the column names using the `.rename()` function.
df = df.rename(columns = lambda column : ''.join([c.lower() for c in column if c in uppercase+lowercase]))

In [11]:
df.head(2)

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
0,Caught Up in You,.38 Special,1982.0,Caught Up in You by .38 Special,1,1,82,82
1,Fantasy Girl,.38 Special,,Fantasy Girl by .38 Special,1,0,3,0


#### 2.C Reassigning the `.columns` attribute of a DataFrame.

You can also just reassign the `.columns` attribute to a list of strings containing the new column names. 

The only caveat with reassigning `.columns` is that you have to reassign all of the column names at once. You can't partially replace a value by working on `.columns` directly. You have to reassign `.columns` with a list of equal length. 
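
The length requirement can be seen in a short sketch. The `demo` frame and its names are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame.
demo = pd.DataFrame({"Song Clean": ["Loser"], "ARTIST CLEAN": ["3 Doors Down"]})

# Reassigning with a list of the same length works.
demo.columns = ["song", "artist"]

# A list of a different length is rejected with a ValueError.
try:
    demo.columns = ["song"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(demo.columns.tolist(), mismatch_raised)
```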

In [12]:
# Replace the column names by reassigning the `.columns` attribute.
df.columns = [''.join([c.lower() for c in column if c in uppercase+lowercase]) for column in df.columns]

In [13]:
df.columns

Index(['songclean', 'artistclean', 'releaseyear', 'combined', 'first', 'year',
       'playcount', 'fg'],
      dtype='object')

### 3. Subsetting data where null values exist.

We have mixed `str` and `NaN` values in the `releaseyear` column. `NaN` stands for "not a number" and is how `pandas` represents null or missing data. We can use the `.isnull()` method of a Series to find null values.

Print the head of the subset of the data in which the `releaseyear` column is null.

In [14]:
df.columns

Index(['songclean', 'artistclean', 'releaseyear', 'combined', 'first', 'year',
       'playcount', 'fg'],
      dtype='object')

In [15]:
df['releaseyear'].isnull()

0       False
1        True
2       False
3       False
4       False
5       False
6       False
7       False
8       False
9       False
10       True
11      False
12      False
13       True
14      False
15      False
16       True
17      False
18      False
19      False
20      False
21      False
22      False
23      False
24       True
25       True
26       True
27      False
28      False
29       True
        ...  
2200    False
2201    False
2202    False
2203     True
2204    False
2205     True
2206     True
2207    False
2208     True
2209    False
2210    False
2211    False
2212    False
2213    False
2214    False
2215     True
2216     True
2217    False
2218     True
2219    False
2220    False
2221     True
2222     True
2223    False
2224    False
2225     True
2226    False
2227    False
2228    False
2229    False
Name: releaseyear, Length: 2230, dtype: bool

In [16]:
# Show records where df['releaseyear'] is null
df[df['releaseyear'].isnull()].head()

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
1,Fantasy Girl,.38 Special,,Fantasy Girl by .38 Special,1,0,3,0
10,"Baby, Please Don't Go",AC/DC,,"Baby, Please Don't Go by AC/DC",1,0,1,0
13,CAN'T STOP ROCK'N'ROLL,AC/DC,,CAN'T STOP ROCK'N'ROLL by AC/DC,1,0,5,0
16,Girls Got Rhythm,AC/DC,,Girls Got Rhythm by AC/DC,1,0,24,0
24,Let's Get It Up,AC/DC,,Let's Get It Up by AC/DC,1,0,4,0


In [17]:
df[~df['releaseyear'].isnull()].head()

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
0,Caught Up in You,.38 Special,1982,Caught Up in You by .38 Special,1,1,82,82
2,Hold On Loosely,.38 Special,1981,Hold On Loosely by .38 Special,1,1,85,85
3,Rockin' Into the Night,.38 Special,1980,Rockin' Into the Night by .38 Special,1,1,18,18
4,Art For Arts Sake,10cc,1975,Art For Arts Sake by 10cc,1,1,1,1
5,Kryptonite,3 Doors Down,2000,Kryptonite by 3 Doors Down,1,1,13,13
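
The two selections above follow the same pattern, which can be sketched on a toy frame. The three rows below are an illustrative stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the rock data.
demo = pd.DataFrame({
    "songclean": ["Caught Up in You", "Fantasy Girl", "Hold On Loosely"],
    "releaseyear": ["1982", np.nan, "1981"],
})

# A boolean Series: True where the year is missing.
null_mask = demo["releaseyear"].isnull()

missing = demo[null_mask]    # rows where the year is missing
present = demo[~null_mask]   # same rows as demo[demo["releaseyear"].notnull()]

print(null_mask.sum(), len(missing), len(present))
```

Because `True` counts as 1, `null_mask.sum()` gives the number of missing values directly.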


### 4. Update slices of your DataFrame based on mask selection/slices.

In many scenarios, we want to update values in our DataFrame according to some criteria. Let's say we wanted to set all of the null values in `releaseyear` to 0.

In current versions of `pandas`, to modify data in the original DataFrame we have to use `.loc`, passing the row mask and the column label in a single indexing operation.

For example, the following chained indexing won't always work, because the first selection may return a copy and the assignment is then applied to that copy rather than to `df`:
```python
df[row_mask]['column_name'] = new_value
```

The best way to accomplish the same task is:
```python
df.loc[row_mask, 'column_name'] = new_value
```

For multiple column assignment, you would use:
```python
df.loc[row_mask, ['col_1', 'col_2', 'col_3']] = new_value
```
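
A short worked sketch of both `.loc` forms follows. The `demo` frame is hypothetical, and the `-1` sentinel is an arbitrary value chosen only to make the change visible.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the rock data.
demo = pd.DataFrame({
    "releaseyear": ["1982", np.nan, "1981"],
    "year": [1, 0, 1],
    "fg": [82, 0, 85],
})

row_mask = demo["releaseyear"].isnull()

# Single column: rows and column are selected in one .loc call.
demo.loc[row_mask, "releaseyear"] = 0

# Multiple columns: the same scalar is broadcast across the selection.
demo.loc[row_mask, ["year", "fg"]] = -1

print(demo)
```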

#### 4.A Let's try it out. Make all of the null values in `releaseyear` 0.

In [18]:
# Replace releaseyear nulls with 0

In [19]:
df[df['releaseyear'].isnull()].index

Int64Index([   1,   10,   13,   16,   24,   25,   26,   29,   31,   35,
            ...
            2203, 2205, 2206, 2208, 2215, 2216, 2218, 2221, 2222, 2225],
           dtype='int64', length=577)

In [20]:
df.loc[df['releaseyear'].isnull(), 'releaseyear'] = 0

In [21]:
# Shortcut: df['releaseyear'] = df['releaseyear'].fillna(0)

#### 4.B Verify that `releaseyear` contains no null values.

In [22]:
df[df['releaseyear'].isnull()]

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg


### 5. Ensure that the data types of the columns make sense. 

Verifying column data types is a critical part of data munging. If columns have the wrong data type, then there is usually corrupted or incorrect data in some of the observations.

#### 5.A Look at the data types for the columns. Are any incorrect given what the data represents?

In [23]:
# A:

df.dtypes

songclean      object
artistclean    object
releaseyear    object
combined       object
first           int64
year            int64
playcount       int64
fg              int64
dtype: object

In [24]:
df.sample(5)

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
1350,Amie,Pure Prairie League,0,Amie by Pure Prairie League,1,0,4,0
1099,Sweet Home Alabama,Lynyrd Skynyrd,1974,Sweet Home Alabama by Lynyrd Skynyrd,1,1,95,95
166,Only The Good Die Young,Billy Joel,1977,Only The Good Die Young by Billy Joel,1,1,21,21
1111,You Took The Words Right Out Of My Mouth,Meat Loaf,1977,You Took The Words Right Out Of My Mouth by Me...,1,1,9,9
759,Patience,Guns N' Roses,1988,Patience by Guns N' Roses,1,1,21,21


##### COMMENT: `releaseyear` is stored as `object`, but a year should be numeric.

### 6. Investigate and clean up the `releaseyear` column.

The `releaseyear` column has the `object` data type because it holds strings, when it should hold integers.

#### 6.A Figure out what value(s) are causing the `releaseyear` column to be encoded as a string instead of an integer.

In [25]:
df['releaseyear'].unique()

array(['1982', 0, '1981', '1980', '1975', '2000', '2002', '1992', '1985',
       '1993', '1976', '1995', '1979', '1984', '1977', '1990', '1986',
       '1974', '2014', '1987', '1973', '2001', '1989', '1997', '1971',
       '1972', '1994', '1970', '1966', '1965', '1983', '1955', '1978',
       '1969', '1999', '1968', '1988', '1962', '2007', '1967', '1958',
       '1071', '1996', '1991', '2005', '2011', '2004', '2012', '2003',
       '1998', '2008', '1964', '2013', '2006', 'SONGFACTS.COM', '1963',
       '1961'], dtype=object)

In [26]:
for c in df['releaseyear']:
    try:
        float(c)
    except (TypeError, ValueError):
        # Print any value that cannot be converted to a number.
        print(c)

SONGFACTS.COM
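
An alternative to the loop, sketched here on a hypothetical four-value Series, is to let `pd.to_numeric` attempt the conversion and then look at what it could not parse. This is one possible approach, not the method the lab prescribes.

```python
import pandas as pd

# Hypothetical stand-in for the releaseyear column: strings, a filled-in 0, and a bad value.
years = pd.Series(["1982", 0, "1981", "SONGFACTS.COM"])

# errors="coerce" turns anything that cannot be parsed into NaN.
as_numbers = pd.to_numeric(years, errors="coerce")

# Values that were not null before but are null after coercion are the culprits.
offenders = years[as_numbers.isnull() & years.notnull()]
print(offenders.tolist())
```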


#### 6.B Look at the rows in which there is incorrect data in the `releaseyear` column.

In [27]:
df[df['releaseyear']=='SONGFACTS.COM']

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
1504,Bullfrog Blues,Rory Gallagher,SONGFACTS.COM,Bullfrog Blues by Rory Gallagher,1,1,1,1


#### 6.C Clean up the data.

Normally we would replace the offending value with a null (`np.nan`). We previously converted the nulls in this column to 0, so replacing it with 0 would also be consistent. Either choice allows the column to be converted to a numeric type; the cell below uses `np.nan`.

In [28]:
df.loc[df['releaseyear'] == 'SONGFACTS.COM', 'releaseyear'] = np.nan

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2230 entries, 0 to 2229
Data columns (total 8 columns):
songclean      2230 non-null object
artistclean    2230 non-null object
releaseyear    2229 non-null object
combined       2230 non-null object
first          2230 non-null int64
year           2230 non-null int64
playcount      2230 non-null int64
fg             2230 non-null int64
dtypes: int64(4), object(4)
memory usage: 139.5+ KB


In [30]:
df['releaseyear'] = df['releaseyear'].astype(float)

In [31]:
df.dtypes

songclean       object
artistclean     object
releaseyear    float64
combined        object
first            int64
year             int64
playcount        int64
fg               int64
dtype: object
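
The column ends up as `float64` rather than an integer type because `NaN` is itself a float. If integer years are wanted while keeping the null, one option is the pandas nullable integer dtype. This is a sketch on a hypothetical Series and assumes a pandas version that supports `"Int64"` (0.24 or later).

```python
import numpy as np
import pandas as pd

# Hypothetical float column with one missing year.
years = pd.Series([1982.0, np.nan, 1981.0])

# "Int64" (capital I) is the nullable integer dtype; the missing value stays missing.
years_int = years.astype("Int64")
print(years_int.dtype)
```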

### 7. Get summary statistics for the `releaseyear` column using the `.describe()` function.

Now that the `releaseyear` column is finally a numeric data type, we can apply the `.describe()` function.

#### 7.A Print out the summary statistics for the `releaseyear` column. What are the earliest and latest release years?

In [32]:
# A:
df['releaseyear'].describe()

count    2229.000000
mean     1465.988784
std       866.834789
min         0.000000
25%         0.000000
50%      1973.000000
75%      1981.000000
max      2014.000000
Name: releaseyear, dtype: float64

In [33]:
df[df['releaseyear']>0]['releaseyear'].describe()

count    1652.000000
mean     1978.019976
std        24.191247
min      1071.000000
25%      1971.000000
50%      1977.000000
75%      1984.000000
max      2014.000000
Name: releaseyear, dtype: float64

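#### 7.B Based on the summary statistics, is there anything else wrong with the `releaseyear` column?

_The minimum nonzero year is 1071, which cannot be a real release year for a rock song. The value was probably corrupted or mistyped and should be corrected, or replaced with a null, if possible. The zeros we filled in earlier also distort the mean and quartiles, which is why the second summary filters them out._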
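
One way to surface values like this is a simple range check. The sketch below uses a hypothetical Series, and the bounds `earliest` and `latest` are assumptions chosen for illustration rather than values taken from the data.

```python
import pandas as pd

# Hypothetical years: a valid one, a filled-in 0, the suspicious 1071, and the maximum.
years = pd.Series([1982.0, 0.0, 1071.0, 2014.0])

# Assumed plausibility window; adjust to whatever range makes sense for the data.
earliest, latest = 1900, 2020

# Ignore the zero placeholders, then keep anything outside the window.
suspicious = years[(years > 0) & ~years.between(earliest, latest)]
print(suspicious.tolist())
```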

### 8. Make changes and investigate using custom functions with `.apply()`.

Let's say we want to traverse every single row in our data set and apply a function to that row.

#### 8.A Write a function that will take a row of a DataFrame and print out the song, artist, and whether or not the release year is earlier than 1970.


In [34]:
# A:

def print_info(row):
    print(row['songclean'],row['artistclean'],row['releaseyear']<1970)

In [35]:
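# Alternative version: label each song instead of printing a boolean.
# Note: rows whose missing year was filled with 0 are also labeled "old".
def print_info(row):
    if row['releaseyear'] < 1970:
        print(row['songclean'], row['artistclean'], 'this is old')
    else:
        print(row['songclean'], row['artistclean'], 'this is recent')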

#### 8.B Using the `.apply()` function, apply the function you wrote to the first four rows of the DataFrame.

You will need to tell the `apply` function to operate row by row. Setting the keyword argument as `axis=1` indicates that the function should be applied to each row individually.
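
The difference between the two axes can be sketched on a tiny frame. The `demo` data and the lambdas are illustrative only.

```python
import pandas as pd

# Hypothetical two-row frame with two numeric columns.
demo = pd.DataFrame({"playcount": [82, 3], "fg": [82, 0]})

# axis=0: the function receives each column; its index holds the row labels.
col_result = demo.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row; its index holds the column names.
row_result = demo.apply(lambda row: row["playcount"] - row["fg"], axis=1)

print(col_result.to_dict(), row_result.tolist())
```

A function that looks up a column name, such as `row["playcount"]`, only works with `axis=1`, because only then does the Series it receives have column names as its index.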

In [36]:
df.apply(lambda x:type(x), axis=0)

songclean      <class 'pandas.core.series.Series'>
artistclean    <class 'pandas.core.series.Series'>
releaseyear    <class 'pandas.core.series.Series'>
combined       <class 'pandas.core.series.Series'>
first          <class 'pandas.core.series.Series'>
year           <class 'pandas.core.series.Series'>
playcount      <class 'pandas.core.series.Series'>
fg             <class 'pandas.core.series.Series'>
dtype: object

In [37]:
# THIS IS BY COLUMNS --> ERROR!
df.apply(print_info, axis=0)

KeyError: ('releaseyear', 'occurred at index songclean')

In [46]:
for column_name in df:
    print(column_name)

songclean
artistclean
releaseyear
combined
first
year
playcount
fg


In [50]:
for i in df.index.values[:3]:
    print(type(df.loc[i]))

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


In [51]:
for i in df.index.values[:3]:
    print_info(df.loc[i])

Caught Up in You .38 Special this is recent
Fantasy Girl .38 Special this is old
Hold On Loosely .38 Special this is recent


In [52]:
df.apply(lambda x:(type(x),x['songclean']), axis=1)

0       (<class 'pandas.core.series.Series'>, Caught U...
1       (<class 'pandas.core.series.Series'>, Fantasy ...
2       (<class 'pandas.core.series.Series'>, Hold On ...
3       (<class 'pandas.core.series.Series'>, Rockin' ...
4       (<class 'pandas.core.series.Series'>, Art For ...
5       (<class 'pandas.core.series.Series'>, Kryptonite)
6            (<class 'pandas.core.series.Series'>, Loser)
7       (<class 'pandas.core.series.Series'>, When I'm...
8       (<class 'pandas.core.series.Series'>, What's Up?)
9       (<class 'pandas.core.series.Series'>, Take On Me)
10      (<class 'pandas.core.series.Series'>, Baby, Pl...
11      (<class 'pandas.core.series.Series'>, Back In ...
12         (<class 'pandas.core.series.Series'>, Big Gun)
13      (<class 'pandas.core.series.Series'>, CAN'T ST...
14      (<class 'pandas.core.series.Series'>, Dirty De...
15      (<class 'pandas.core.series.Series'>, For Thos...
16      (<class 'pandas.core.series.Series'>, Girls Go...
...

In [53]:
# This is by rows. Use .head(4) to limit the call to the first four rows.
df.head(4).apply(print_info, axis=1)

Caught Up in You .38 Special this is recent
Fantasy Girl .38 Special this is old
Hold On Loosely .38 Special this is recent
Rockin' Into the Night .38 Special this is recent

0    None
1    None
2    None
3    None
dtype: object

You'll notice that there will be a final output Series of `None` values. The `.apply()` function, if a return value is not specified, will return a Series of `None` values (similar to how the default return for Python functions is `None` when a return statement is not specified).
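
If the goal is to keep the result rather than print it, the function can return a value, and `.apply()` collects the return values into a Series. The sketch below uses a hypothetical two-row frame and a made-up helper named `era`.

```python
import pandas as pd

# Hypothetical two-row stand-in for the rock data.
demo = pd.DataFrame({
    "songclean": ["Caught Up in You", "Love Me Do"],
    "releaseyear": [1982.0, 1962.0],
})

def era(row):
    # Returning a value (instead of printing) gives .apply() something to collect.
    return "old" if row["releaseyear"] < 1970 else "recent"

# The returned Series can be stored as a new column.
demo["era"] = demo.apply(era, axis=1)
print(demo["era"].tolist())
```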

### 9. Write a function that converts cells in a DataFrame to float and otherwise replaces them with `np.nan`.

If applied to our data, it would keep only the numeric information and insert null values everywhere else.

Recall that the try-except syntax in Python is a great way to try something and take another action if the initial step fails:

```python
try:
    value = float("abc")  # Perform some action that might raise an error.
except ValueError:
    value = np.nan        # Perform some other action if the first failed.
```

#### 9.A Write the function that takes a column and converts all of its values to float if possible and `np.nan` otherwise. The return value should be the converted Series.

In [55]:
# A:

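def float_or_nan(column):
    # Build the converted values one element at a time.
    new_col = []
    for el in column:
        try:
            new_col.append(float(el))
        except (TypeError, ValueError):
            # Not convertible to float (e.g. a song title or None).
            new_col.append(np.nan)
    # Return a Series (not a plain list) so the original index is kept.
    return pd.Series(new_col, index=column.index)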

In [56]:
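def element_converter(element):
    # Convert a single value to float, or return NaN if that is not possible.
    try:
        return float(element)
    except (TypeError, ValueError):
        return np.nan

def float_or_nan(column):
    # Equivalent to the loop above, using Series.map to convert each element.
    return column.map(element_converter)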
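
For comparison, `pandas` offers a built-in with similar behavior: `pd.to_numeric` with `errors="coerce"`. The sketch below applies both approaches to a hypothetical two-column frame; it is an alternative shown for reference, not the solution the exercise asks for.

```python
import numpy as np
import pandas as pd

def element_converter(element):
    # Same idea as the lab's helper: float if possible, NaN otherwise.
    try:
        return float(element)
    except (TypeError, ValueError):
        return np.nan

# Hypothetical frame: one text column (with a numeric-looking title) and one integer column.
demo = pd.DataFrame({
    "songclean": ["1979", "Loser"],
    "playcount": [3, 1],
})

by_hand = demo.apply(lambda column: column.map(element_converter))

# Extra keyword arguments to .apply() are forwarded to the function.
built_in = demo.apply(pd.to_numeric, errors="coerce")

print(by_hand.dtypes.to_dict())
print(built_in.dtypes.to_dict())
```

Both versions turn non-numeric text into `NaN`. One difference is that `float()` converts every value to a float, while `pd.to_numeric` leaves a column that is already integer as an integer column.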

#### 9.B Try your function out on the rock song data and ensure the output is what you expected.


In [57]:
# A:

df.apply(float_or_nan)

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
0,,,1982.0,,1.0,1.0,82.0,82.0
1,,,0.0,,1.0,0.0,3.0,0.0
2,,,1981.0,,1.0,1.0,85.0,85.0
3,,,1980.0,,1.0,1.0,18.0,18.0
4,,,1975.0,,1.0,1.0,1.0,1.0
5,,,2000.0,,1.0,1.0,13.0,13.0
6,,,2000.0,,1.0,1.0,1.0,1.0
7,,,2002.0,,1.0,1.0,6.0,6.0
8,,,1992.0,,1.0,1.0,3.0,3.0
9,,,1985.0,,1.0,1.0,1.0,1.0


#### 9.C Describe the new float-only DataFrame.

In [58]:
df.apply(float_or_nan).describe()

Unnamed: 0,songclean,artistclean,releaseyear,combined,first,year,playcount,fg
count,2.0,0.0,2229.0,0.0,2230.0,2230.0,2230.0,2230.0
mean,1012.0,,1465.988784,,1.0,0.741256,16.872646,15.04843
std,1367.544515,,866.834789,,0.0,0.438043,25.302972,25.288366
min,45.0,,0.0,,1.0,0.0,0.0,0.0
25%,528.5,,0.0,,1.0,0.0,1.0,0.0
50%,1012.0,,1973.0,,1.0,1.0,4.0,3.0
75%,1495.5,,1981.0,,1.0,1.0,21.0,18.0
max,1979.0,,2014.0,,1.0,1.0,142.0,142.0
