# Analyzing Spotify Music Data
In this assignment, you are a business analyst at an event management company that specializes in concerts. In an effort to identify potential performers, you've been tasked with analyzing Spotify data on songs that appeared in its Top 10 lists from 2010 through 2015. For each year in the dataset, you will need to complete three analysis problems. Rather than manually repeating each analysis for each individual year, you will write three distinct functions that can be reused for each year.

The CSV contains the following columns: 
- Song title
- Artist
- Year 
- Popularity; i.e. the number of times the song was played 
- Subgenre

You will begin by loading your CSV, organizing it into a `spreadsheet` dictionary, and performing simple data cleansing. Then, for each year from 2010 through 2015, you will need to find:
- How many songs appear in the top 10 
- The average popularity of songs that appear in the top 10 
- How many songs in each of the following genres - dance pop, hip hop - appear in the top 10 

### Deliverables
To receive credit for this assignment, you must submit the following files:
- Your completed Jupyter Notebook

Your completed Jupyter Notebook will be this file, but with all of the problems solved.

When you're done with the assignment, run all cells to verify that your code executes as expected. Then, save and submit this notebook.

Good luck!


## Part 1: Loading & Exploring the Data
In Part 1, you will:
- Load the CSV
- Organize the data with a dictionary (_new_)
- Convert improperly imported data types

### Problem 1: Loading the CSV
You have been given a variable, called `filename`, containing the path to the Spotify data set. Use it to load the lines of the CSV file into a variable, called `contents`. Then, follow the steps below:
- Split the first element of `contents`  on the pipe (`|`) character, and store the result in a variable called `headers`
- Extract the remaining elements of `contents` into a variable, called `data`

Then, print out the `headers`, and the first element of `data`.

---

Your code should print the following:

```
['Title', 'Artist', 'Year', 'Popularity', 'Subgenre']
Hey, Soul Sister|Train|2010|83|neo mellow
```

---

**Hints**
- Unlike other CSV files you've used, this one uses the pipe symbol (`|`) as a separator. Make sure your code accounts for this.
- Be sure to `close` your file before proceeding.
- Recall that `splitlines` breaks a file into its constituent lines.
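
The hints above can be tried on a small in-memory sample before touching the real file. The sketch below assumes a hypothetical string, `sample_text`, that mimics the file's pipe-separated layout; it is not the actual data set, and all `sample_*` names are made up for illustration.

```python
# Hypothetical sample mimicking the pipe-separated layout (not the real file)
sample_text = "Title|Artist|Year\nSong A|Artist A|2010\nSong B|Artist B|2011"

# `splitlines` breaks the text into one string per line
sample_lines = sample_text.splitlines()

# The first line holds the column names; split it on the pipe character
sample_headers = sample_lines[0].split('|')

# Everything after the first line is data
sample_data = sample_lines[1:]

print(sample_headers)   # ['Title', 'Artist', 'Year']
print(sample_data[0])   # Song A|Artist A|2010
```

Note that the data rows are still unsplit strings at this point; only the header row has been split.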

In [1]:
# Provided Code -- Do NOT Edit!
filename = 'SpotifyTop10.csv'

In [2]:
# TODO: `open` the CSV, then `read` it using `splitlines` into a variable called `contents`
file = open(filename, "r")
contents = file.read().splitlines()

# Close the file once its contents have been read
file.close()

In [3]:
# TODO: Extract first row of `contents` into a variable called `headers`
headers = contents[0]

In [4]:
headers = headers.split('|')

In [5]:
# TODO: Extract the rest of the rows from `contents` into a variable called `data`
data = contents[1:]

In [6]:
# TODO: Print headers and first element of `data`
print(headers)
print(data[0])

['Title', 'Artist', 'Year', 'Popularity', 'Subgenre']
Hey, Soul Sister|Train|2010|83|neo mellow


### Problem 2a: Splitting Data
Now, you will split and organize information in each data row according to its appropriate column. You have been provided with a set of empty list variables, corresponding to the columns of a spreadsheet.

To solve this problem, you must write a `for` loop that iterates over each row of `data`, splits each line into its constituent elements, and appends each element of the split to the appropriate column variable.

---

**Hints**
- Recall that you can "unpack" lists, e.g., after  `first_name, last_name = ["John", "Doe"]`, `first_name` contains `"John"`, and `last_name` contains `"Doe"`.
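
As an illustration of the unpacking hint applied to a split row, here is a minimal sketch. The value of `sample_row` is hypothetical and only imitates the format of the data set.

```python
# Hypothetical row in the same pipe-separated format as the data set
sample_row = "Song A|Artist A|2010|75|dance pop"

# Splitting yields five strings, which can be unpacked into five names at once
title, artist, year, popularity, subgenre = sample_row.split('|')

print(title)       # Song A
print(popularity)  # 75
```

Unpacking raises a `ValueError` if the number of fields differs from the number of names, so it also acts as a basic check that each row has exactly five columns.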

In [7]:
# Provided for your convenience -- DO NOT MODIFY
titles = []
artists = []
years = []
popularities = []
subgenres = []

In [8]:
# TODO: Iterate over each line in `data`
for row in data:
    
    # TODO: Split each line on the pipe (`|`) separator into a variable called `split_row`
    split_row = row.split("|")
    
    title = split_row[0]
    artist = split_row[1]
    year = split_row[2]
    popularity = split_row[3]
    subgenre = split_row[4]
    
    # TODO: Append each value to appropriate list
    titles.append(title)
    artists.append(artist)
    years.append(year)
    popularities.append(popularity)
    subgenres.append(subgenre)
    

### Problem 2b: Creating a `spreadsheet` Dictionary
Next, you will create a dictionary, called `spreadsheet`, whose keys are each of the column names in `headers`, and whose values are the corresponding lists you populated in the previous problem, viz., `titles`, `artists`, `years`, `popularities`, and `subgenres`.
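
One possible way to pair column names with column lists is `zip`, sketched below on toy data. The `sample_*` names and values are hypothetical and stand in for `headers` and the column lists; writing the dictionary out key by key works just as well.

```python
# Hypothetical stand-ins for `headers` and two of the column lists
sample_headers = ['Title', 'Year']
sample_titles = ['Song A', 'Song B']
sample_years = ['2010', '2011']

# `zip` pairs each header with the list in the same position
sample_spreadsheet = dict(zip(sample_headers, [sample_titles, sample_years]))

print(sample_spreadsheet['Year'])  # ['2010', '2011']
```

This relies on the lists being given in the same order as the headers.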

In [9]:
# TODO: Create `spreadsheet` dictionary
spreadsheet = {
    headers[0]: titles,
    headers[1]: artists,
    headers[2]: years,
    headers[3]: popularities,
    headers[4]: subgenres,
}

In [10]:
# Provided code -- Do NOT Edit!
for column_name, column_data in spreadsheet.items():
    print(column_name + ": " + str(column_data[:3]))

Title: ['Hey, Soul Sister', 'Love The Way You Lie', 'TiK ToK']
Artist: ['Train', 'Eminem', 'Kesha']
Year: ['2010', '2010', '2010']
Popularity: ['83', '82', '80']
Subgenre: ['neo mellow', 'detroit hip hop', 'dance pop']


### Problem 3: Converting Data Types
As usual, Python has imported all of your data as strings -- including the numeric columns, `Popularity` and `Year`. In this problem, you will convert the strings in these columns into integers.

You have been given a variable called `numerical_columns`. Use this list to implement the logic below:
- Iterate over each `column` in `numerical_columns`
- Within the loop:
  - Convert each `value` in `spreadsheet[column]` to an integer

---

**Hints**

- Use a list comprehension.
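
A sketch of the list-comprehension approach on toy data follows. `sample_spreadsheet` and `sample_numerical_columns` are hypothetical stand-ins for the real variables. The key point is that the converted list must be assigned back to the dictionary; reassigning a loop variable such as `value = int(value)` leaves the original list unchanged.

```python
# Hypothetical stand-in for the real `spreadsheet`
sample_spreadsheet = {
    'Title': ['Song A', 'Song B'],
    'Year': ['2010', '2011'],
    'Popularity': ['75', '60'],
}
sample_numerical_columns = ['Popularity', 'Year']

for column in sample_numerical_columns:
    # Build a new list of integers and store it back under the same key
    sample_spreadsheet[column] = [int(value) for value in sample_spreadsheet[column]]

print(sample_spreadsheet['Year'])        # [2010, 2011]
print(sample_spreadsheet['Popularity'])  # [75, 60]
```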

In [11]:
# Provided Code -- Do NOT Edit!
numerical_columns = ['Popularity', 'Year']

In [12]:
# TODO: Iterate over `numerical_columns`
for column in numerical_columns:

    # TODO: Convert each value in `column` to an integer
    spreadsheet[column] = [int(value) for value in spreadsheet[column]]

## Part 2
In Part 2, you will find:
- How many songs appear in the top 10 for each year, from 2010 through 2015
- The average popularity of songs that appear in the top 10 for each year, from 2010 through 2015
- How many songs in each of the following genres - dance pop and hip hop - appear in the top 10 for each year, from 2010 through 2015
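
Each of these questions is asked once per year, so the year values can be generated rather than typed out. A minimal sketch, using the hypothetical name `sample_years`: `range` excludes its stop value, so covering 2015 requires a stop of 2016.

```python
# `range(2010, 2016)` yields 2010 through 2015; the stop value is excluded
sample_years = list(range(2010, 2016))

print(sample_years)  # [2010, 2011, 2012, 2013, 2014, 2015]
```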

### Problem 1: Number of Songs in the Top 10 Lists Each Year
Write a function, called `count_songs_in_year`, that accepts a single numerical argument, called `year`, and returns the number of songs released in the given `year`.

Your function should behave as follows:
```
>>> count_songs_in_year(2010)
51
```
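
The counting pattern can be sketched on a toy list first. The helper `count_matches` and the list `sample_years` are hypothetical and not part of the assignment. The last call shows a common pitfall: a string never equals an integer, so passing `'2010'` against integer years counts nothing.

```python
# Hypothetical list of integer years
sample_years = [2010, 2010, 2011, 2012, 2010]

def count_matches(values, target):
    """Count how many elements of `values` equal `target`."""
    counter = 0
    for value in values:
        if value == target:
            counter += 1
    return counter

print(count_matches(sample_years, 2010))    # 3
print(sample_years.count(2010))             # 3 (built-in equivalent)
print(count_matches(sample_years, '2010'))  # 0, because '2010' != 2010
```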

In [13]:
# TODO: Declare `count_songs_in_year` function, accepting a single `year` argument
def count_songs_in_year(year):

    # TODO: Initialize a `counter` to `0`
    counter = 0

    # TODO: Iterate over each `release_year` in `spreadsheet['Year']`
    for release_year in spreadsheet['Year']:

        # TODO: If the `release_year` matches the `year` argument, increment your counter
        # Both sides are normalized with `int` so string and integer years compare equal
        if int(release_year) == int(year):
            counter += 1

    # TODO: Return your `counter`
    return counter


After implementing `count_songs_in_year`, invoke it for each year from 2010 to 2015, and print each result.

Your code should produce the following output:
```
51
53
35
71
58
95
```


In [14]:
# TODO: Call `count_songs_in_year` on the years: 2010, 2011, 2012, 2013, 2014, 2015 and print the results
for year in range(2010, 2016):
    print(count_songs_in_year(year))

51
53
35
71
58
95


### Problem 2: Average Popularity of Top 10 by Year
Write a function, called `average_popularity_in_year`, that accepts a single numerical `year` argument, and returns the average popularity of Top 10 songs in the given `year`. This indicates how many thousands of times someone listened to a Top 10 song each year.

Your function should behave as follows:

```
>>> average_popularity_in_year(2010)
64.25490196078431
```

**Hints**
- You can do this with a `for` loop _or_ a list comprehension.
- You must both count the number of songs released in the given `year` _and_ sum up their popularity values, because you must divide the sum by the count to return the average.
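
The list-comprehension variant mentioned in the hints could look like the sketch below, shown on toy data. `sample_years` and `sample_popularities` are hypothetical parallel lists, paired up with `zip` instead of an index.

```python
# Hypothetical parallel lists: one year and one popularity value per song
sample_years = [2010, 2010, 2011]
sample_popularities = [80, 60, 90]

# Keep only the popularity values whose year matches
matching = [
    popularity
    for year, popularity in zip(sample_years, sample_popularities)
    if year == 2010
]

# Divide the sum by the count to get the average
average = sum(matching) / len(matching)

print(average)  # 70.0
```

If no year matches, `len(matching)` is 0 and the division raises `ZeroDivisionError`, so a real implementation may want to handle that case explicitly.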

In [15]:
# TODO: Declare `average_popularity_in_year`, accepting a single `year` argument
def average_popularity_in_year(year):

    # TODO: Initialize `total_popularity` and `year_counter` variables to 0
    total_popularity = 0
    year_counter = 0

    # TODO: `enumerate` over each `index` and `release_year` of `spreadsheet['Year']`
    for index, release_year in enumerate(spreadsheet['Year']):

        # TODO: Check if `release_year` matches `year` argument
        # Both sides are normalized with `int` so string and integer years compare equal
        if int(release_year) == int(year):

            # TODO: If so, add current item's popularity to `total_popularity`, and increment `year_counter`
            total_popularity += int(spreadsheet['Popularity'][index])
            year_counter += 1

    # TODO: Use `total_popularity` and `year_counter` to return the average popularity
    # Note: this raises `ZeroDivisionError` if no song matches `year`
    return total_popularity / year_counter

In [16]:
# TODO: Print the average popularity of Top 10 songs in 2010
print(average_popularity_in_year(2010))

64.25490196078431


In [17]:
# TODO: Call `average_popularity_in_year` on: 2010, 2011, 2012, 2013, 2014, 2015 and print the results
for year in range(2010, 2016):
    print(average_popularity_in_year(year))

64.25490196078431
61.867924528301884
67.77142857142857
63.985915492957744
62.706896551724135
64.56842105263158


### Problem 3: Number of Songs in Interesting Genres by Year
Write a function, called `count_songs_in_genre_in_year`, that accepts two arguments: `year`, a number, and `genre`, a string. It should return the number of songs in `genre` that were released in the given `year`.

Your function should behave as follows:

```
>>> count_songs_in_genre_in_year(2010, 'hip hop')
4
```

**Hints**
- Be sure to lowercase the `genre` argument before you use it in your function.
- Use `in` to check if the given `genre` matches a value in `spreadsheet['Subgenre']`, instead of an `==`.
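
The difference between `in` and `==` can be seen on a single value. This is a minimal sketch: `sample_subgenre` is a hypothetical, mixed-case value chosen to show why lowercasing matters.

```python
# Hypothetical subgenre value with mixed case
sample_subgenre = 'Detroit Hip Hop'

# Lowercase the genre being searched for
genre = 'Hip Hop'.lower()

# `==` requires the whole string to match, so a more specific subgenre fails
print(genre == sample_subgenre.lower())  # False

# `in` checks whether `genre` appears anywhere inside the subgenre
print(genre in sample_subgenre.lower())  # True
```

Because `in` is a substring test, a short genre such as `'pop'` would also match `'dance pop'`, which may or may not be the intended behavior.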

In [18]:
# TODO: Declare `count_songs_in_genre_in_year`, accepting two arguments: `year` and `genre`
def count_songs_in_genre_in_year(year, genre):

    # Lowercase the `genre` argument so the comparison is case-insensitive
    genre = genre.lower()

    # TODO: Initialize `counter` to 0
    counter = 0

    # TODO: `enumerate` each `index` and `release_year` in `spreadsheet['Year']`
    for index, release_year in enumerate(spreadsheet['Year']):

        # TODO: Check if current element's `release_year` matches `year` argument
        # Both sides are normalized with `int` so string and integer years compare equal
        if int(release_year) == int(year):

            # TODO: Check if current element's subgenre matches `genre` argument
            if genre in spreadsheet['Subgenre'][index].lower():

                # TODO: If so, increment your `counter`
                counter += 1

    # TODO: Return `counter`
    return counter

In [19]:
# TODO: Call `count_songs_in_genre_in_year` with: `2010, 'hip hop'`
count_songs_in_genre_in_year(2010, 'hip hop')

4

You have been given a list, called `genres`, containing the following values:
- `'dance pop'`
- `'hip hop'`
- `'country'`

Invoke `count_songs_in_genre_in_year` on each genre in `genres`, for each year from 2010 to 2015, inclusive. Your code should produce output that starts with the following:

```
31
38
15
42
27
52

4
1
0
2
1
1
```

In [20]:
# TODO: Call `count_songs_in_genre_in_year` for 2010 - 2015, with `'dance pop'` as the `genre` argument for every year, and print the results
for year in range(2010, 2016):
    print(count_songs_in_genre_in_year(year, 'dance pop'))
print()

# TODO: Call `count_songs_in_genre_in_year` for 2010 - 2015, with `'hip hop'` as the `genre` argument for every year, and print the results
for year in range(2010, 2016):
    print(count_songs_in_genre_in_year(year, 'hip hop'))

31
38
15
42
27
52

4
1
0
2
1
1
