## 1.  Motivation.  Our Voyager Data Clean Up

In this notebook I would like to explore various ways of cleaning up *Our Messy Metadata*. I expect that some of our data, like **Most/Least Likely Contexts** and **Genres**, will involve several observations in the same cell, so I will leave these for a bit later.

First, however, I will explore some columns that are better suited to my level of experience: the **Personal Rank** and **Class Year**, since these contain only a single value per cell, albeit in what I expect to be different formats.

I want to try out different ways of grappling with the problems:

- **functions** (in which I pass in the value for each row in a column, then return various data based on certain tests)
- **dictionaries** (in which I build a complete list of _all_ the possible values, then map these to more regular ones)
- **str.lower()**, **str.replace()** and various ways of adjusting case, replacing punctuation marks or other substrings
- Is there a way to **reverse** the order of names, so we can sort by surname?

If everything works out, I would then like to try some **basic sorting** of my df according to a given column.

I also want to try **merging** Our Clean Metadata with the Beatles Golden Record, since that has the Spotify URLs.

I know that next week we will grapple with the problem of making Tidy Data from all of this, but it would be good to anticipate that process by splitting long strings in the **Genres** column (for example) into "lists" of strings. We will see!



### 1a. Import Libraries

In [161]:
# import libraries

import pandas as pd

# suppress warnings
import warnings
warnings.filterwarnings('ignore')

### 1b. Load the Messy Data CSV to Pandas DF

In [162]:
# the url
our_messy_metadata_csv = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vTgFKikka0dJL3HoDat6kCnSKfVHjOTectJqqNiKhCrByJ9ciVYhEwDt8WpyjrHgcd62IEUi20-L-eN/pub?output=csv'

# load it to a Pandas dataframe
our_messy_metadata = pd.read_csv(our_messy_metadata_csv)

# while we're at it, load the full Beatles Golden Record, which has the Spotify URLs! I will merge this later!

# url for the full golden record
our_beatles_golden_record_csv = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRsnaW79HoLBgZMiDWM-RyS5SxFljKwbrraYZO3PJXxdQnzvKYzr_gQYUAFFpnrFkogvj62joGvCc0W/pub?output=csv'


# use the Pandas `read_csv` method again to load that file as a dataframe, for later!
our_beatles_golden_record = pd.read_csv(our_beatles_golden_record_csv)


In [163]:
# view the first rows; head(n) shows the first n rows (the default is 5)

our_messy_metadata.head(7)

| | Timestamp | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank | Genre Tags | Most Likely Context(s) | Least Likely Contexts |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2/5/2026 16:23:46 | A Day in the Life | Grace Quasebarth | 2024 | Ohio Exists | 5.0 | Soft Rock, Alternative Rock | Giant on South 23rd Street, Cedar Point, Radio | Southwest Airlines |
| 1 | 2/5/2026 16:28:24 | Love You To | Grace Quasebarth | 2024 | Ohio Exists | NaN | Modern Bollywood, Exotica, Indian Electronic | Radio, Most Inconvenient Moment Imaginable | CVS, Cedar Point, Giant on South 23rd Street |
| 2 | 2/5/2026 16:31:48 | Hey Jude | Grace Quasebarth | 2024 | Ohio Exists | 6.0 | Slowcore, Piano Rock, | Bar in Spain, CVS, Cedar Point, Most Inconveni... | Southwest Airlines, Radio |
| 3 | 2/5/2026 16:35:11 | Ob-La-Di, Ob-La-Da | Grace Quasebarth | 2024 | Ohio Exists | 4.0 | Soft Rock, Indie Poptimism, Mellow Gold | Southwest Airlines, CVS, Giant on South 23rd S... | Bar in Spain, Radio, Cedar Point |
| 4 | 2/5/2026 16:39:49 | Yellow Submarine | Grace Quasebarth | 2024 | Ohio Exists | 1.0 | Shanty, Children's Music, Folk | Cedar Point, Southwest Airlines | Bar in Spain, Radio, CVS |
| 5 | 2/5/2026 16:43:55 | Back in the U.S.S.R. | Grace Quasebarth | 2024 | Ohio Exists | 2.0 | Comedy Rock, High Vibe, Doo-Wop | A Bar in Spain, CVS, Southwest Airlines | Giant on South 23rd Street, Most Inconvenient ... |
| 6 | 2/5/2026 16:45:41 | Love You To | Grace Quasebarth | 2024 | Ohio Exists | 3.0 | Modern Bollywood, Exotica, Indian Electronic | Radio, Most Inconvenient Moment Imaginable | CVS, Cedar Point, Giant on South 23rd Street |


## 2. Implementation

Let's get started with the clean up.  I am going to:

- a) Inspect the Columns, and Remove Any I don't Need
- b) Figure out all the values in Song Title (to make sure they are consistent), then convert them to lower case
- c) Figure out values in Graduating Class and regularize these
- d) Do the same for Personal Rank
- e) See if I can 'reverse' the names of users, so we get Surname, Given name
- f) See if I can regularize punctuation in Genres and Contexts, and perhaps split the strings into lists.
- g) Try to merge Our Clean Data with the original Beatles Golden Record (but I might need to convert those song titles to lower case first!)





### 2.a Inspect the Columns, and Remove Any I Don't Need

In [164]:
# basic column types
our_messy_metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 9 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Timestamp               37 non-null     object
 1   Song Title              37 non-null     object
 2   Your Name               37 non-null     object
 3   Your Graduating Class   37 non-null     int64 
 4   Team/Group Name         37 non-null     object
 5   Personal Rank           36 non-null     object
 6   Genre Tags              37 non-null     object
 7   Most Likely Context(s)  37 non-null     object
 8   Least Likely Contexts   35 non-null     object
dtypes: int64(1), object(8)
memory usage: 2.7+ KB


#### There are NAs in here!

- One in Personal Rank
- Two in Least Likely Contexts


I will either `drop` these, or `fill` them with something more meaningful!
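
A minimal sketch of both options on a small hypothetical frame (the values and the 'none given' label are invented for illustration), starting with a direct per-column count of missing values:

```python
import numpy as np
import pandas as pd

# hypothetical mini-frame with the same two kinds of gaps seen above
df = pd.DataFrame({
    "Personal Rank": ["5", np.nan, "6"],
    "Least Likely Contexts": ["Southwest Airlines", "Radio", np.nan],
})

# count missing values per column (the same information as info(), but direct)
print(df.isna().sum())

# option 1: fill each column with a label of our choosing
filled = df.fillna({"Personal Rank": "unranked", "Least Likely Contexts": "none given"})

# option 2: drop any row with a missing value in the listed column(s)
dropped = df.dropna(subset=["Least Likely Contexts"])

print(filled)
print(dropped)
```

Filling keeps every submission in the data, while dropping loses the whole row, so filling is usually the safer choice when only one field is blank.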

#### We don't need the Timestamp column, so let's drop it

- Note that the `inplace=True` argument updates the original df directly, so there is no need to assign the result back to a variable
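
The two styles below do the same thing on a tiny hypothetical frame. One pitfall worth knowing: with `inplace=True` the method returns `None`, so writing `df = df.drop(..., inplace=True)` would replace the DataFrame with `None`.

```python
import pandas as pd

df = pd.DataFrame({"Timestamp": ["2/5/2026 16:23:46"], "Song Title": ["A Day in the Life"]})

# style 1: reassign the returned copy (the original df is left unchanged)
df_a = df.drop(columns=["Timestamp"])

# style 2: modify a frame in place; the call itself returns None, so do NOT reassign it
df_b = df.copy()
result = df_b.drop(columns=["Timestamp"], inplace=True)

print(result)                  # None
print(df_a.columns.tolist())   # ['Song Title']
print(df_b.columns.tolist())   # ['Song Title']
```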



In [165]:
our_messy_metadata.drop(columns = ['Timestamp'], inplace=True)
our_messy_metadata.head()

| | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank | Genre Tags | Most Likely Context(s) | Least Likely Contexts |
|---|---|---|---|---|---|---|---|---|
| 0 | A Day in the Life | Grace Quasebarth | 2024 | Ohio Exists | 5.0 | Soft Rock, Alternative Rock | Giant on South 23rd Street, Cedar Point, Radio | Southwest Airlines |
| 1 | Love You To | Grace Quasebarth | 2024 | Ohio Exists | NaN | Modern Bollywood, Exotica, Indian Electronic | Radio, Most Inconvenient Moment Imaginable | CVS, Cedar Point, Giant on South 23rd Street |
| 2 | Hey Jude | Grace Quasebarth | 2024 | Ohio Exists | 6.0 | Slowcore, Piano Rock, | Bar in Spain, CVS, Cedar Point, Most Inconveni... | Southwest Airlines, Radio |
| 3 | Ob-La-Di, Ob-La-Da | Grace Quasebarth | 2024 | Ohio Exists | 4.0 | Soft Rock, Indie Poptimism, Mellow Gold | Southwest Airlines, CVS, Giant on South 23rd S... | Bar in Spain, Radio, Cedar Point |
| 4 | Yellow Submarine | Grace Quasebarth | 2024 | Ohio Exists | 1.0 | Shanty, Children's Music, Folk | Cedar Point, Southwest Airlines | Bar in Spain, Radio, CVS |


### 2.b Clean Up with Functions

- Some of my columns have fairly regular data, so I think it will be possible to clean these with simple functions.
- Let's look at the unique values for each column.

First the Names (these are all formatted the same way, so this should be easy)

```python
list(our_messy_metadata['Your Name'].unique())
['Grace Quasebarth',
 'Patricia Tang',
 'Ella Manning',
 'Elizabeth Garozzo',
 'Jacob Jahiel']
```

Now the Class Years: these vary, and I want to transform the two-digit years into four-digit ones

```python
our_messy_metadata['Your Graduating Class'].unique().tolist()
[2024, 2026, 25, 2030, 20, 2027]
```


- Now some functions that will regularize the years and reverse the names:


In [166]:
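# reverse names so we have surname first, e.g. 'Grace Quasebarth' -> 'Quasebarth, Grace'
# (treats the last word as the surname and keeps any earlier words together as the given name)
def reverse_name_without_comma(name):
    words = name.split()
    if len(words) < 2:
        return name  # a single word has nothing to reverse
    return words[-1] + ', ' + ' '.join(words[:-1])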

# check years and regularize
def year_clean(year):
    # checks for NA, and returns something more meaningful
    if pd.isna(year):
        return None  # or 0, or some default year
    # map the two-digit years we found to four-digit years
    if year == 20:
        year = 2020
    elif year == 25:
        year = 2025
    return year
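
The checks in `year_clean` cover only the two short years that actually occur (20 and 25). A slightly more general sketch, assuming every class year falls in the 2000s, adds 2000 to any value below 100; `year_clean_general` and its `century` parameter are hypothetical names, not part of the original notebook:

```python
import pandas as pd

def year_clean_general(year, century=2000):
    """Hypothetical generalization of year_clean: treat any value below 100
    as a two-digit year in the given century; return None for NaN."""
    if pd.isna(year):
        return None
    year = int(year)
    if year < 100:
        year += century
    return year

# the unique class years listed above
years = pd.Series([2024, 2026, 25, 2030, 20, 2027])
print(years.apply(year_clean_general).tolist())  # [2024, 2026, 2025, 2030, 2020, 2027]
```

The trade-off: the explicit version only changes values we have seen, while the general one would also quietly convert an unexpected entry such as `99` to `2099`.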


- Let's apply them to the relevant columns


In [167]:
# a quick check to see what the updated data will look like
our_messy_metadata['Your Name'].apply(reverse_name_without_comma)

0      Quasebarth, Grace
1      Quasebarth, Grace
2      Quasebarth, Grace
3      Quasebarth, Grace
4      Quasebarth, Grace
5      Quasebarth, Grace
6      Quasebarth, Grace
7         Tang, Patricia
8          Manning, Ella
9         Tang, Patricia
10    Garozzo, Elizabeth
11        Tang, Patricia
12        Tang, Patricia
13    Garozzo, Elizabeth
14         Jahiel, Jacob
15         Manning, Ella
16        Tang, Patricia
17     Quasebarth, Grace
18    Garozzo, Elizabeth
19        Tang, Patricia
20     Quasebarth, Grace
21    Garozzo, Elizabeth
22         Manning, Ella
23         Jahiel, Jacob
24    Garozzo, Elizabeth
25         Manning, Ella
26         Jahiel, Jacob
27     Quasebarth, Grace
28         Manning, Ella
29    Garozzo, Elizabeth
30     Quasebarth, Grace
31         Jahiel, Jacob
32         Manning, Ella
33    Garozzo, Elizabeth
34         Jahiel, Jacob
35     Quasebarth, Grace
36         Jahiel, Jacob
Name: Your Name, dtype: object

In [168]:
# update the column by assigning the transformed values back to it
our_messy_metadata['Your Name'] = our_messy_metadata['Your Name'].apply(reverse_name_without_comma)
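
With surnames first, sorting by surname (one of the questions raised at the start) becomes an ordinary alphabetical sort. A minimal sketch of `sort_values` on a small hypothetical frame, including a two-column sort:

```python
import pandas as pd

# a few reversed names like the ones produced above
df = pd.DataFrame({
    "Your Name": ["Tang, Patricia", "Garozzo, Elizabeth", "Manning, Ella"],
    "Your Graduating Class": [2026, 2030, 2025],
})

# sort alphabetically by the "Surname, Given" string
by_surname = df.sort_values(by="Your Name")

# or sort by more than one column: class year (newest first), then name
by_year_then_name = df.sort_values(by=["Your Graduating Class", "Your Name"],
                                   ascending=[False, True])

print(by_surname["Your Name"].tolist())
print(by_year_then_name["Your Name"].tolist())
```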

In [169]:

# a quick check to see what the updated data will look like
our_messy_metadata['Your Graduating Class'].apply(year_clean)

0     2024
1     2024
2     2024
3     2024
4     2024
5     2024
6     2024
7     2026
8     2025
9     2026
10    2030
11    2026
12    2026
13    2030
14    2020
15    2025
16    2026
17    2027
18    2030
19    2026
20    2027
21    2030
22    2025
23    2020
24    2030
25    2025
26    2020
27    2027
28    2025
29    2030
30    2027
31    2020
32    2025
33    2030
34    2020
35    2027
36    2020
Name: Your Graduating Class, dtype: int64

In [170]:
# now assign it back to the original column
our_messy_metadata['Your Graduating Class'] = our_messy_metadata['Your Graduating Class'].apply(year_clean)

In [171]:
# quick check to see the result
our_messy_metadata.head()

| | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank | Genre Tags | Most Likely Context(s) | Least Likely Contexts |
|---|---|---|---|---|---|---|---|---|
| 0 | A Day in the Life | Quasebarth, Grace | 2024 | Ohio Exists | 5.0 | Soft Rock, Alternative Rock | Giant on South 23rd Street, Cedar Point, Radio | Southwest Airlines |
| 1 | Love You To | Quasebarth, Grace | 2024 | Ohio Exists | NaN | Modern Bollywood, Exotica, Indian Electronic | Radio, Most Inconvenient Moment Imaginable | CVS, Cedar Point, Giant on South 23rd Street |
| 2 | Hey Jude | Quasebarth, Grace | 2024 | Ohio Exists | 6.0 | Slowcore, Piano Rock, | Bar in Spain, CVS, Cedar Point, Most Inconveni... | Southwest Airlines, Radio |
| 3 | Ob-La-Di, Ob-La-Da | Quasebarth, Grace | 2024 | Ohio Exists | 4.0 | Soft Rock, Indie Poptimism, Mellow Gold | Southwest Airlines, CVS, Giant on South 23rd S... | Bar in Spain, Radio, Cedar Point |
| 4 | Yellow Submarine | Quasebarth, Grace | 2024 | Ohio Exists | 1.0 | Shanty, Children's Music, Folk | Cedar Point, Southwest Airlines | Bar in Spain, Radio, CVS |


### 2.c Clean Personal Rank with a Dictionary

- First let's check all the unique values, as we did above

```python
our_messy_metadata['Personal Rank'].unique().tolist()
['5',
 nan,
 '6',
 '4',
 '1',
 '2',
 '3',
 'First ',
 '1. A Day in the Life',
 'Second',
 'III',
 'II',
 'Third',
 'Fourth ',
 'Fifth',
 'V',
 'Fifth ',
 'IV',
 'First',
 'Sixth',
 'I',
 'Fourth']
```

- The **NaN** will be a problem: `sorted()` cannot compare a float `NaN` with the strings (it raises a `TypeError`), so let's fill it with something more meaningful before finding the unique values and sorting: `our_messy_metadata['Personal Rank'].fillna('unranked')`

```python
# the ranks will be the sorted unique values, but with NaN as 'unranked'
ranks = sorted(our_messy_metadata['Personal Rank'].fillna('unranked').unique().tolist())

# results sorted
ranks
['1',
 '1. A Day in the Life',
 '2',
 '3',
 '4',
 '5',
 '6',
 'Fifth',
 'Fifth ',
 'First',
 'First ',
 'Fourth',
 'Fourth ',
 'I',
 'II',
 'III',
 'IV',
 'Second',
 'Sixth',
 'Third',
 'V',
 'unranked']

```
- I notice that there are two versions of some values, like `Fifth`: one with a trailing space and one without. I could fix this with `our_messy_metadata['Personal Rank'].str.strip()`, or just deal with it in a slightly tedious dictionary...
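
A minimal sketch of the `str.strip()` route, assuming we also lower-case the values: normalizing first collapses pairs like 'Fifth' / 'Fifth ' and 'First' / 'First ' into one key each, so the lookup table gets shorter. The sample values and the partial `rank_lookup` dict below are illustrative only; a full version would still need entries such as 'second', 'iv' and '1. a day in the life'.

```python
import pandas as pd

# a sample of the raw spellings seen above, plus one missing value
raw = pd.Series(["First ", "First", "Fifth ", "V", "III", "2", None])

# normalize first: fill the gap, strip stray spaces, lower-case
normalized = raw.fillna("unranked").str.strip().str.lower()

# hypothetical (partial) lookup table: one key per normalized spelling
rank_lookup = {"first": 1, "fifth": 5, "v": 5, "iii": 3, "2": 2, "unranked": 0}

print(normalized.map(rank_lookup).tolist())  # [1, 1, 5, 5, 3, 2, 0]
```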


```python
# get unique values sorted
ranks = sorted(our_messy_metadata['Personal Rank'].fillna('unranked').unique().tolist())

# make dict with those items as keys
rank_dict = dict.fromkeys(ranks)

# here is the blank dict with keys created--we just need to fill in the values
rank_dict
{'1': None,
 '1. A Day in the Life': None,
 '2': None,
 '3': None,
 '4': None,
 '5': None,
 '6': None,
 'Fifth': None,
 'Fifth ': None,
 'First': None,
 'First ': None,
 'Fourth': None,
 'Fourth ': None,
 'I': None,
 'II': None,
 'III': None,
 'IV': None,
 'Second': None,
 'Sixth': None,
 'Third': None,
 'V': None,
 'unranked': None}
```



In [172]:
# now check unique values in order
ranks = sorted(our_messy_metadata['Personal Rank'].fillna('unranked').unique().tolist())
rank_dict = dict.fromkeys(ranks)
rank_dict



In [173]:
# now with the values filled in by hand

rank_dict = {'1': 1,
 '1. A Day in the Life': 1,
 '2': 2,
 '3': 3,
 '4': 4,
 '5': 5,
 '6': 6,
 'Fifth': 5,
 'Fifth ': 5,
 'First': 1,
 'First ': 1,
 'Fourth': 4,
 'Fourth ': 4,
 'I': 1,
 'II': 2,
 'III': 3,
 'IV': 4,
 'Second': 2,
 'Sixth': 6,
 'Third': 3,
 'V': 5,
 'unranked': 0}

# 'map' the dictionary onto the ranks to create a cleaned column; fillna with 'unranked' first so the NaN has a key to match!
our_messy_metadata["Personal Rank Clean"] = our_messy_metadata["Personal Rank"].fillna('unranked').map(rank_dict)

our_messy_metadata.head()

| | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank | Genre Tags | Most Likely Context(s) | Least Likely Contexts | Personal Rank Clean |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A Day in the Life | Quasebarth, Grace | 2024 | Ohio Exists | 5.0 | Soft Rock, Alternative Rock | Giant on South 23rd Street, Cedar Point, Radio | Southwest Airlines | 5 |
| 1 | Love You To | Quasebarth, Grace | 2024 | Ohio Exists | NaN | Modern Bollywood, Exotica, Indian Electronic | Radio, Most Inconvenient Moment Imaginable | CVS, Cedar Point, Giant on South 23rd Street | 0 |
| 2 | Hey Jude | Quasebarth, Grace | 2024 | Ohio Exists | 6.0 | Slowcore, Piano Rock, | Bar in Spain, CVS, Cedar Point, Most Inconveni... | Southwest Airlines, Radio | 6 |
| 3 | Ob-La-Di, Ob-La-Da | Quasebarth, Grace | 2024 | Ohio Exists | 4.0 | Soft Rock, Indie Poptimism, Mellow Gold | Southwest Airlines, CVS, Giant on South 23rd S... | Bar in Spain, Radio, Cedar Point | 4 |
| 4 | Yellow Submarine | Quasebarth, Grace | 2024 | Ohio Exists | 1.0 | Shanty, Children's Music, Folk | Cedar Point, Southwest Airlines | Bar in Spain, Radio, CVS | 1 |
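
Because every spelling was listed by hand, the cleaned column came out as whole numbers. If a new spelling appeared later, `map` would quietly turn it into `NaN` (and the column into floats). A minimal sketch of a check for that, on a hypothetical three-value Series ('Seventh' is invented to trigger the gap):

```python
import pandas as pd

ranks = pd.Series(["1", "Second", "Seventh"])   # "Seventh" is a made-up value
rank_dict = {"1": 1, "Second": 2}

mapped = ranks.map(rank_dict)

# values missing from the dict become NaN, which also turns the column to float
unmapped = ranks[mapped.isna()].unique().tolist()
print(mapped.dtype, unmapped)   # float64 ['Seventh']
```

On the real data the same check would be `our_messy_metadata.loc[our_messy_metadata['Personal Rank Clean'].isna(), 'Personal Rank']`, which should be empty if every value was mapped.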


### 2.d Clean Genre and Context Columns

- The challenges here are slightly different from what we have seen above!
- First, each cell is a single long string. We want to `split` these strings into a `list` of strings (and later we will want to `explode` these for Tidy Data!)
- But before we can split them, we need to check the separators used by each team!


```python
# check some values to see what is going on
our_messy_metadata['Genre Tags'].unique().tolist()
['Soft Rock, Alternative Rock',
 'Modern Bollywood, Exotica, Indian Electronic',
 'Slowcore, Piano Rock, ',
 'Soft Rock, Indie Poptimism, Mellow Gold',
 "Shanty, Children's Music, Folk",
 'Comedy Rock, High Vibe, Doo-Wop',
 'psychedelic rock; rock; pop',
 'Indie;Soft Rock',
 'Rock, pop',
 'rock; pop',
 'pop, rock',
 'Penitential Pop; Lonesome Rock',
 'Psychedelic Rock;Melancholia',
 'Comrade Pop; Cold War Nostalgia Soft Rock; Big Trip Hype ',
 'Sea Shanties',
 'Pop, Rock',
 'Pop Rock; Easy Listening',
 'Produce Pop; Psychedelic Rock',
 'pop,rock',
 'Psychedelic Rock; Art Rock ',
 'Zoomba Tunes; Rock-N-Roll',
 'World Music Rock of the Past Century; Plagiarism Music',
 'Rock; Folk rock ',
 'Rock; Indie',
 'God-Tier; Art Rock; Groove-Music',
 'Folk rock; Pop rock ',
 'Garbage; Rubbish',
 'Alarm Clock Music; Indie',
 'Plaintive Pop; Rock-N-Roll']
 ```

- Some have `,` as separators, others have `;`
- There are also different approaches to capitalization, so let's force everything to `lower()`
- I can also see that there is leading and trailing **whitespace** in some substrings, but we will deal with that in a second step with `strip()`!



In [174]:
# a function to deal with separators--if we had other characters we could just add other lines of code here

def regular_separator(terms):
    # safeguard: only strings are processed; NaN values are passed through unchanged
    if isinstance(terms, str):
        # now the actual replacement
        terms = terms.replace(";", ", ")
    return terms

In [175]:

# apply regularize separator
our_messy_metadata['Genre Tags'] = our_messy_metadata['Genre Tags'].apply(regular_separator)

# lower case all genres
our_messy_metadata['Genre Tags'] = our_messy_metadata['Genre Tags'].str.lower()




We want to do the same with Most and Least Likely Contexts, but this is getting tedious!

Let's write a loop that applies our separator function, lower-casing, and a split on commas to every column in a list we supply!



In [176]:
# selected columns to process
column_list = ['Genre Tags', 'Most Likely Context(s)', 'Least Likely Contexts']

# loop over the columns: regularize separators, lower-case, then split on commas
for col in column_list:
    our_messy_metadata[col] = our_messy_metadata[col].apply(regular_separator).str.lower().str.split(',')
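
The loop above splits on `,` only, so items after the first keep a leading space (`' alternative rock'`), and a trailing separator leaves an empty-looking item (the `[slowcore, piano rock, ]` cell in the later preview). The `strip()` step mentioned earlier is therefore still needed. A minimal sketch that does the lower-casing, the split on either separator, and the stripping in one function; `split_terms` is a hypothetical helper, and the sample strings are copied from the unique Genre Tags values:

```python
import re
import pandas as pd

def split_terms(terms):
    """Lower-case a tag string, split it on commas or semicolons,
    strip whitespace from each piece, and drop empty pieces."""
    if not isinstance(terms, str):
        return terms  # leave NaN/None (or already-split lists) untouched
    pieces = re.split(r"[,;]", terms.lower())
    return [p.strip() for p in pieces if p.strip()]

# sample strings copied from the Genre Tags values listed above
genres = pd.Series(["Slowcore, Piano Rock, ", "psychedelic rock; rock; pop", "pop,rock", None])
cleaned = genres.apply(split_terms)
print(cleaned.tolist())
```

Applied to the original string columns (before the loop above has run), it could replace the loop body: `our_messy_metadata[col] = our_messy_metadata[col].apply(split_terms)`.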


In [177]:
our_messy_metadata.columns

Index(['Song Title', 'Your Name', 'Your Graduating Class', 'Team/Group Name',
       'Personal Rank', 'Genre Tags', 'Most Likely Context(s)',
       'Least Likely Contexts', 'Personal Rank Clean'],
      dtype='object')

### 2.e Column Cleanup and Merging with Our Beatles Golden Record

- Merging with Our Beatles Golden Record data will allow us to add the Spotify URLs
- Also good practice for me to understand how these merges work, which will be useful for Spotify and Billboard Data

We only need the `Song Title` and `Spotify URL` columns from the Beatles Golden Record:

```python
our_beatles_golden_brief = our_beatles_golden_record[['Song Title', 'Spotify URL']]
```

And we don't need the original ranking column from our_messy_metadata, so let's make a new df without that:

```python
cols_to_keep = ['Song Title', 'Your Name', 'Your Graduating Class', 'Team/Group Name',
       'Personal Rank Clean', 'Genre Tags', 'Most Likely Context(s)',
       'Least Likely Contexts']
our_clean_beatles_metadata = our_messy_metadata[cols_to_keep]
```


And finally we can merge:
```python
our_beatles_merged = pd.merge(
    left=our_clean_beatles_metadata,
    right=our_beatles_golden_brief,
    left_on='Song Title',
    right_on='Song Title',
    how='inner')
```
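
Since both frames name the key column `Song Title`, `on='Song Title'` is equivalent to passing the same name to `left_on` and `right_on`. Plan (g) also mentioned lower-casing titles before merging; a minimal sketch of that idea with a hypothetical `title_key` column, tiny made-up frames, and placeholder URLs (the lower-case 's' in 'Yellow submarine' is deliberate, to show an exact-match miss):

```python
import pandas as pd

left = pd.DataFrame({"Song Title": ["Hey Jude", "Yellow submarine"], "Rank": [6, 1]})
right = pd.DataFrame({"Song Title": ["Hey Jude", "Yellow Submarine"],
                      "Spotify URL": ["url-1", "url-2"]})   # placeholder URLs

# when the key column has the same name on both sides, on= is enough
exact = pd.merge(left, right, on="Song Title", how="inner")

# hypothetical normalized key: lower-case and strip titles before matching
left["title_key"] = left["Song Title"].str.lower().str.strip()
right["title_key"] = right["Song Title"].str.lower().str.strip()
normalized = pd.merge(left, right.drop(columns=["Song Title"]), on="title_key", how="inner")

print(len(exact), len(normalized))   # 1 2
```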


In [178]:
our_beatles_golden_brief = our_beatles_golden_record[['Song Title', 'Spotify URL']]
our_beatles_golden_brief.head()

| | Song Title | Spotify URL |
|---|---|---|
| 0 | Back in the U.S.S.R. | https://open.spotify.com/track/0j3p1p06deJ7f9x... |
| 1 | Yellow Submarine | https://open.spotify.com/track/50xwQXPtfNZFKFe... |
| 2 | Ob-La-Di, Ob-La-Da | https://open.spotify.com/track/1gFNm7cXfG1vSMc... |
| 3 | Eleanor Rigby | https://open.spotify.com/track/5GjPQ0eI7AgmOnA... |
| 4 | Strawberry Fields Forever | https://open.spotify.com/track/3Am0IbOxmvlSXro... |


In [179]:
cols_to_keep = ['Song Title', 'Your Name', 'Your Graduating Class', 'Team/Group Name',
       'Personal Rank Clean', 'Genre Tags', 'Most Likely Context(s)',
       'Least Likely Contexts']
our_clean_beatles_metadata = our_messy_metadata[cols_to_keep]
our_clean_beatles_metadata.head()

| | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank Clean | Genre Tags | Most Likely Context(s) | Least Likely Contexts |
|---|---|---|---|---|---|---|---|---|
| 0 | A Day in the Life | Quasebarth, Grace | 2024 | Ohio Exists | 5 | [soft rock, alternative rock] | [giant on south 23rd street, cedar point, ra... | [southwest airlines] |
| 1 | Love You To | Quasebarth, Grace | 2024 | Ohio Exists | 0 | [modern bollywood, exotica, indian electronic] | [radio, most inconvenient moment imaginable] | [cvs, cedar point, giant on south 23rd street] |
| 2 | Hey Jude | Quasebarth, Grace | 2024 | Ohio Exists | 6 | [slowcore, piano rock, ] | [bar in spain, cvs, cedar point, most incon... | [southwest airlines, radio] |
| 3 | Ob-La-Di, Ob-La-Da | Quasebarth, Grace | 2024 | Ohio Exists | 4 | [soft rock, indie poptimism, mellow gold] | [southwest airlines, cvs, giant on south 23r... | [bar in spain, radio, cedar point] |
| 4 | Yellow Submarine | Quasebarth, Grace | 2024 | Ohio Exists | 1 | [shanty, children's music, folk] | [cedar point, southwest airlines] | [bar in spain, radio, cvs] |


In [180]:
our_beatles_merged = pd.merge(
    left=our_clean_beatles_metadata,
    right=our_beatles_golden_brief,
    left_on='Song Title',
    right_on='Song Title',
    how='inner')

our_beatles_merged.head()

| | Song Title | Your Name | Your Graduating Class | Team/Group Name | Personal Rank Clean | Genre Tags | Most Likely Context(s) | Least Likely Contexts | Spotify URL |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A Day in the Life | Quasebarth, Grace | 2024 | Ohio Exists | 5 | [soft rock, alternative rock] | [giant on south 23rd street, cedar point, ra... | [southwest airlines] | https://open.spotify.com/track/0hKRSZhUGEhKU6a... |
| 1 | A Day in the Life | Quasebarth, Grace | 2024 | Ohio Exists | 5 | [soft rock, alternative rock] | [giant on south 23rd street, cedar point, ra... | [southwest airlines] | https://open.spotify.com/track/0hKRSZhUGEhKU6a... |
| 2 | Love You To | Quasebarth, Grace | 2024 | Ohio Exists | 0 | [modern bollywood, exotica, indian electronic] | [radio, most inconvenient moment imaginable] | [cvs, cedar point, giant on south 23rd street] | https://open.spotify.com/track/69c15XPo8sYUqmn... |
| 3 | Hey Jude | Quasebarth, Grace | 2024 | Ohio Exists | 6 | [slowcore, piano rock, ] | [bar in spain, cvs, cedar point, most incon... | [southwest airlines, radio] | https://open.spotify.com/track/3m7V717IKZqZLW5... |
| 4 | Ob-La-Di, Ob-La-Da | Quasebarth, Grace | 2024 | Ohio Exists | 4 | [soft rock, indie poptimism, mellow gold] | [southwest airlines, cvs, giant on south 23r... | [bar in spain, radio, cedar point] | https://open.spotify.com/track/1gFNm7cXfG1vSMc... |
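
In the preview above, rows 0 and 1 are identical, including the Spotify URL, which suggests that 'A Day in the Life' may appear more than once in the golden record: a merge repeats a survey row once for every matching record. An inner merge also silently drops survey rows whose titles do not match exactly. A minimal sketch of how one might check for both, using made-up titles and placeholder URLs ('Love Me Do' is invented to show an unmatched row):

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({"Song Title": ["A Day in the Life", "Hey Jude", "Love Me Do"]})
# a lookup table where one title appears twice (hypothetical)
right = pd.DataFrame({"Song Title": ["A Day in the Life", "A Day in the Life", "Hey Jude"],
                      "Spotify URL": ["url-a", "url-a", "url-b"]})

# 1) count repeated keys on the lookup side before merging
print(right["Song Title"].value_counts())

# 2) ask pandas to enforce "many survey rows -> one record per title"
validation_failed = False
try:
    pd.merge(left, right, on="Song Title", how="left", validate="many_to_one")
except MergeError as err:
    validation_failed = True
    print("validation failed:", err)

# 3) drop the duplicates, then use indicator=True to find unmatched titles
deduped = right.drop_duplicates(subset="Song Title")
checked = pd.merge(left, deduped, on="Song Title", how="left",
                   validate="many_to_one", indicator=True)
print(checked.loc[checked["_merge"] == "left_only", "Song Title"].tolist())
```

Using `how='left'` with `indicator=True` keeps every survey row and labels the ones with no match, instead of losing them as an inner merge would.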


### 2.f Save the Final Results as CSV!

Now that I have done all this work, let's save the result as CSV so I can just load it later

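`our_beatles_merged.to_csv('our_clean_beatles_data.csv', index=False)`

In [181]:
# index=False leaves the row index out of the file, so it does not come back as an 'Unnamed: 0' column
our_beatles_merged.to_csv('our_clean_beatles_data.csv', index=False)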

In [182]:
# and load that back into a df

our_beatles_reloaded = pd.read_csv('our_clean_beatles_data.csv')
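
Two things are worth knowing before relying on the reloaded file. By default `to_csv` also writes the row index, which comes back from `read_csv` as an extra `Unnamed: 0` column unless `index=False` is passed when writing (or `index_col=0` when reading). And a CSV has no list type, so the split Genre and Context cells are written as text like `"['slowcore', 'piano rock']"` and come back as plain strings. A minimal sketch using an in-memory buffer instead of a real file, with `ast.literal_eval` to turn the text back into lists:

```python
import ast
import io
import pandas as pd

df = pd.DataFrame({"Song Title": ["Hey Jude"],
                   "Genre Tags": [["slowcore", "piano rock"]]})

# write without the index so no "Unnamed: 0" column appears on reload
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(reloaded.loc[0, "Genre Tags"]))   # <class 'str'>: the list came back as text

# parse the text "['slowcore', 'piano rock']" back into a real list
reloaded["Genre Tags"] = reloaded["Genre Tags"].apply(ast.literal_eval)
print(reloaded.loc[0, "Genre Tags"])         # ['slowcore', 'piano rock']
```

For columns with missing values (such as Least Likely Contexts), `literal_eval` would fail on `NaN`, so one might apply it only to strings, e.g. `lambda v: ast.literal_eval(v) if isinstance(v, str) else v`.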

## 3. Interpretation: Takeaways and Next Steps

### What I learned:

- The importance of **checking unique values** for each col as a preparatory step, which helped me decide on the best approach
- How to **anticipate NaN values** and what to do with them--replace with things like 'unranked' or just leave them
- **Functions and Dictionaries**, and how to apply them
- How to **replace** values, **strip** whitespace and **split** on a given character
- How to **create new cols, or update old ones, then drop or select them**
- Adding **inline comments** to the code to help me extend or modify 
- **Merge** dfs and **save as CSV** for later use

### Next Steps

- Now that we have the genres and contexts split, we can figure out how to TIDY the data, and see how these map onto things like Billboard and Spotify data!