# Sorting Madonna Songs
This project puts Madonna's entire back catalogue of songs into a random order. It was prompted by a colleague's offhand remark that their favourite song was *Material Girl* by Madge, which led another colleague to set the challenge of ranking all of Madonna's songs.

In particular, there are two key stages to this random ordering:
1. Assign a random, distinct integer to each song to act as its preference ranking
1. Use a sorting algorithm to sort the list by the preference ranking column just created
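
The two stages can be sketched on a handful of titles. This is a minimal illustration rather than the notebook's own code: the helper `assign_random_ranks`, the seed value, and the use of Python's built-in `sorted` as a stand-in for a hand-written algorithm are all assumptions made for brevity.

```python
import random

def assign_random_ranks(songs, seed=None):
    """Pair each song with a distinct random integer rank in 0 .. n-1."""
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    # sampling without replacement guarantees every rank is distinct
    ranks = rng.sample(range(len(songs)), k=len(songs))
    return list(zip(songs, ranks))

# illustrative subset of titles
songs = ["4 Minutes", "American Life", "American Pie", "Angel"]

# Stage 1: attach a distinct random preference rank to every song
ranked = assign_random_ranks(songs, seed=123)

# Stage 2: sort by the rank (built-in sort used here as a placeholder)
ordered = sorted(ranked, key=lambda pair: pair[1])

for song, rank in ordered:
    print(rank, song)
```

Because the ranks are a permutation of `0 .. n-1`, the sorted list always has ranks in consecutive order, whatever the seed.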

The following sorting algorithms will be tried:
- **Quicksort**
- **Bubble sort**
- **Breadth-first search**
- **Heapsort**
- **Insertion sort**
- **Shell sort**
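
As an idea of what one of these looks like when sorting on the preference ranking, here is a rough sketch of insertion sort over `(song, rank)` pairs. The function name `insertion_sort_by_rank` and the sample pairs are illustrative assumptions, not the implementation the project ends up with.

```python
def insertion_sort_by_rank(pairs):
    """Return a new list of (song, rank) pairs sorted by ascending rank."""
    result = list(pairs)  # copy so the input list is left untouched
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # shift entries with a larger rank one slot right to open a gap
        while j >= 0 and result[j][1] > current[1]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result

# illustrative (song, rank) pairs
pairs = [("4 Minutes", 81), ("American Life", 5), ("American Pie", 37), ("Angel", 8)]
sorted_pairs = insertion_sort_by_rank(pairs)
print(sorted_pairs)
```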

## Set-up
We need to load the relevant packages, set up the right environment, and import the dataset.

In [2]:
# Export our environment, "NewEnv", and save it as "environment_anaconda.yml"
!conda env export -n NewEnv -f environment_anaconda.yml

# Check the working directory so the notebook is easy to transfer between users
import os
os.getcwd()

'C:\\Users\\a_vis\\Documents\\Data Science\\Projects\\Sorting-Madge-Songs'

In [3]:
# Import required libraries
import numpy as np
import pandas as pd
import xlrd
import csv

### Convert to CSV
Excel is not installed, so the file cannot be converted that way. Instead, the conversion is done with the *xlrd* and *csv* packages.
Note: the Excel file could be read in directly and worked with, but less would be learned that way!

The code for the function was taken from this [link](https://stackoverflow.com/questions/9884353/xls-to-csv-converter). The first issue encountered was with using subfolders, which was resolved via this [link](https://stackoverflow.com/questions/7165749/open-file-in-a-relative-location-in-python). The second was entries being read as `bytes` instead of `str`, which was resolved via this [link](https://stackoverflow.com/questions/33054527/typeerror-a-bytes-like-object-is-required-not-str-when-writing-to-a-file-in).
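
The `bytes` versus `str` issue arises because Python 3's `csv` module writes text, so the output file has to be opened in text mode rather than binary mode. Below is a minimal sketch of that pattern, assuming a throwaway temporary file and made-up rows; `newline=''` is the setting the `csv` documentation recommends so that no extra blank lines appear between rows on Windows.

```python
import csv
import os
import tempfile

# illustrative rows only
rows = [["Songs", "Avison"], ["4 Minutes", ""], ["Angel", ""]]

# hypothetical output path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# text mode ('w'), not binary ('wb'): csv.writer emits str in Python 3
with open(path, mode="w", newline="", encoding="utf-8") as f:
    csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

# read the file back to confirm the rows survive the round trip
with open(path, newline="", encoding="utf-8") as f:
    round_trip = list(csv.reader(f))

print(round_trip)
```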

In [4]:
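def csv_from_excel(file_input, file_output, sheet_index):
    """Write one sheet of an Excel workbook out as a CSV file."""
    # note: xlrd 2.0+ reads only .xls files, so opening .xlsx needs xlrd < 2.0
    wb = xlrd.open_workbook(filename = file_input)
    sh = wb.sheet_by_index(sheet_index)

    # text mode with newline = '' stops csv adding blank lines between rows on Windows;
    # the with-block closes the file even if a row fails to write
    with open(file = file_output, mode = 'wt', newline = '') as file_csv:
        wr = csv.writer(file_csv, quoting = csv.QUOTE_ALL)

        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))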

In [5]:
# run function to output .csv file
# note: forward slashes avoid the invalid '\s' escape sequence and also work on Windows
csv_from_excel(file_input = 'data/songs_madonna.xlsx', file_output = 'data/songs_madonna.csv', sheet_index = 0)

### Data Wrangle
Load in our .csv file so that we can add distinct random numbers as a column which we will use to sort on.

Note: the file is encoded as *ANSI*, which corresponds to `mbcs` in `pd.read_csv()`.
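
`mbcs` is a Windows-only codec that maps to the machine's ANSI code page, so it is unavailable on other platforms. On a Western European Windows install the ANSI code page is typically `cp1252`, and naming it explicitly is one way to keep the notebook portable. The sketch below assumes that code page and uses a made-up title with an accented character; neither comes from the dataset.

```python
# illustrative title, not from the dataset
title = "Café Song"

# the bytes an ANSI (cp1252) file would store for this title
raw = title.encode("cp1252")

# decoding with the matching codec recovers the text on any platform
decoded = raw.decode("cp1252")

# decoding with a mismatched codec fails on the accented byte
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(decoded, utf8_ok)
```

With pandas, the equivalent would be passing `encoding = 'cp1252'` to `pd.read_csv()`, assuming the file really was saved under that code page.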

In [12]:
# import data
data_madge = pd.read_csv(filepath_or_buffer = 'data/songs_madonna.csv', encoding = 'mbcs')

In [13]:
# display data
data_madge.head()

| | Songs | Avison |
|---|---|---|
| 0 | 4 Minutes | |
| 1 | American Life | |
| 2 | American Pie | |
| 3 | Angel | |
| 4 | Another Suitcase in Another Hall | |


The code below follows a naive method for creating a column of distinct random numbers, in three steps:

1. Store the number of rows in a variable.
1. Generate a random sample without replacement, using the integers from 0 up to the number of rows as the population.
1. Bind this random sample onto our `data_madge` dataframe.
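
The same three steps can also be written with NumPy's seeded generator, which keeps the result reproducible between runs. This is a sketch on a small stand-in dataframe, not the notebook's code: the four rows, the seed, and the column name `Preference` are assumptions.

```python
import numpy as np
import pandas as pd

# stand-in for data_madge with only a few rows
df = pd.DataFrame({"Songs": ["4 Minutes", "American Life", "American Pie", "Angel"]})

# seeded generator, so reruns give the same ranks
rng = np.random.RandomState(123)

# 1. store the number of rows
n_rows = len(df.index)

# 2. a random permutation of 0 .. n_rows-1 is a sample without replacement
ranks = rng.permutation(n_rows)

# 3. bind the ranks onto the dataframe as a new column
df["Preference"] = ranks

print(df)
```

A permutation cannot contain duplicates, so the distinctness requirement holds by construction.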


In [None]:
# import package for random-sampling
import random

In [16]:
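# set random seed
# note: random.sample uses Python's random module, so seed that module too;
# the NumPy generator below does not affect it
seed_random = np.random.RandomState(123)
random.seed(123)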

# 1. store number of rows in a variable
n_rows = len(data_madge.index)

# 2. generate random sample without replacement
# note: using try-except logic to ensure we generate a sample
try:
    sample_random = random.sample(population = range(0, n_rows), k = n_rows)
    print('Random sample generated is of object type: ', type(sample_random))
except ValueError:
    print('Sample size exceeded population size.')
    # re-raise so the next step does not run with sample_random undefined
    raise

# 3. bind random sample onto dataframe
data_madge['Preference_Avision'] = sample_random
data_madge = data_madge[['Songs', 'Preference_Avision']]

In [41]:
# check new dataframe
data_madge.head()

| | Songs | Preference_Avision |
|---|---|---|
| 0 | 4 Minutes | 81 |
| 1 | American Life | 5 |
| 2 | American Pie | 37 |
| 3 | Angel | 8 |
| 4 | Another Suitcase in Another Hall | 35 |
