# Musical instruments prices
### A study of the prices of musical instruments in Sri Lanka

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
sns.set_style("white")
sns.set_palette("mako_r")

In [2]:
df = pd.read_csv("music_instrument_prices.csv", encoding="utf-8")

## Cleaning the dataset

Let's take a look at the data that we imported from the CSV file.

In [3]:
df.head(3)

,Title,Sub_title,Price,Instrument_Type,Condition,Location,Description,Post_URL,Seller_name,Seller_type,published_date
0,Yamaha (SY-77) Music Synthesizer for sale,"Posted on 04 Oct 7:11 pm, Ja-Ela, Gampaha","Rs 39,000",Keyboard / Piano,Used,"Ja-Ela, Gampaha",Â°â¢Â°Sri Lanka's Largest Digital Piano Selle...,https://ikman.lk/en/ad/yamaha-sy-77-music-synt...,Seven Star International,Member,2021-10-04 19:11:00
1,SRX-718 BASS BIN (PAIR) for sale,"Posted on 10 Oct 7:54 pm, Kadawatha, Gampaha","Rs 77,500",Studio / Live Music Equipment,New,"Kadawatha, Gampaha",â¡Watts 3200â¡â¡Treated Plywoodâ¡,https://ikman.lk/en/ad/srx-718-bass-bin-pair-f...,Sasiru Super Sonics,Member,2021-10-10 19:54:00
2,Piano (Malcom Mendis Piano) for sale,"Posted on 13 Oct 12:43 pm, Kandana, Gampaha","Rs 130,000",Keyboard / Piano,Used,"Kandana, Gampaha","Sri Lanka's Biggest Piano Sale, Reasonable pri...",https://ikman.lk/en/ad/piano-malcom-mendis-pia...,Sell Fast | à¶à¶³à·à¶± | MCI Ikman à¶¯à·à¶±...,Member,2021-10-13 12:43:00


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5167 entries, 0 to 5166
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Title            5167 non-null   object
 1   Sub_title        5167 non-null   object
 2   Price            5167 non-null   object
 3   Instrument_Type  5167 non-null   object
 4   Condition        5167 non-null   object
 5   Location         5167 non-null   object
 6   Description      5167 non-null   object
 7   Post_URL         5167 non-null   object
 8   Seller_name      5167 non-null   object
 9   Seller_type      5167 non-null   object
 10  published_date   5167 non-null   object
dtypes: object(11)
memory usage: 444.2+ KB


We can see that there are no missing values in this dataset. However, all columns appear as "object", even though some of them hold numbers or dates. We also have two variables, "Condition" and "Seller_type", that seem to be binary. Let's check that before moving on.

In [5]:
print(df.Condition.unique())
print(df.Seller_type.unique())

['Used' 'New']
['Member' 'Premium-Member']


As suspected, both variables are binary. We will replace them with a new pair of variables that take 1 and 0 as their possible values.

**Binary variables**

In [6]:
# We convert the text columns to boolean variables encoded as 1s and 0s

def textToBoolean(condition, yesval, noval):
    # Return 1 for yesval, 0 for noval, and None for any other value
    if condition == yesval:
        return 1
    elif condition == noval:
        return 0
    else:
        return None

df["Is_new"] = df["Condition"].apply(lambda x: textToBoolean(x, "New", "Used"))
# A premium seller must map to 1, so "Premium-Member" is the yes-value
df["Premium_seller"] = df["Seller_type"].apply(lambda x: textToBoolean(x, "Premium-Member", "Member"))
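
The same kind of encoding can also be written without a helper function. The following is a minimal sketch, assuming a small made-up frame (`demo` is illustrative and not part of the dataset) and using `Series.map` with an explicit dictionary. Values missing from the dictionary become `NaN`, which mirrors the fallback branch of the function above.

```python
import pandas as pd

# Illustrative data only; the column name mirrors the dataset above.
demo = pd.DataFrame({"Condition": ["Used", "New", "Used"]})

# Each text label is looked up in the dictionary; unknown labels would become NaN.
demo["Is_new"] = demo["Condition"].map({"New": 1, "Used": 0})

print(demo["Is_new"].tolist())
```

The same pattern applies to any other two-valued text column, as long as the dictionary states explicitly which label maps to 1.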

**Prices**

In [7]:
# We want to convert prices to numbers:

def parsePrice(text):
    text = text.replace("Rs ","")
    text = text.replace(",","")
    return int(text)

df["Price_value"] = df["Price"].apply(lambda x: parsePrice(x))
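
As an alternative sketch, the parsing can be done on the whole column at once with pandas string methods. The `prices` series below is made-up sample data in the same "Rs 39,000" format, and the approach assumes that prices are whole rupees: stripping every non-digit character would silently mangle a decimal price.

```python
import pandas as pd

# Illustrative prices in the same format as the "Price" column.
prices = pd.Series(["Rs 39,000", "Rs 77,500", "Rs 130,000"])

# Drop every character that is not a digit, then convert to numbers.
# errors="coerce" turns entries that still cannot be parsed into NaN
# instead of raising an exception.
digits = prices.str.replace(r"[^\d]", "", regex=True)
price_values = pd.to_numeric(digits, errors="coerce")

print(price_values.tolist())
```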

**Dates**

In [8]:
# Now, let's convert the date strings in "published_date" to datetime objects

df["Published"] = pd.to_datetime(df["published_date"], format="%Y-%m-%d %H:%M:%S")


# Let's check that the new variables have been created correctly:
print("Is_new values:", df.Is_new.unique())
print("Premium_seller values:", df.Premium_seller.unique())


Is_new values: [0 1]
Premium_seller values: [0 1]


In [9]:
# If we take a look at the "Description" column, we will see some weird characters mixed with the text.
# Several encodings have been tried without success, so we are going to have to remove those characters.
# Let's see an example:
badtext = df.iloc[0].Description
badtext

"Â°â\x80¢Â°Sri Lanka's Largest Digital Piano SellerÂ°â\x80¢Â° Â°â\x80¢Â° Direct Imported Â°â\x80¢Â° Fully Functional and ready to Use Â°â\x80¢Â° Cosmetics : 10/10Â°â\x80¢Â° Ideal for an Hotelier or For an keen learner.Â°â\x80¢Â° 6 months of  WarrantyÂ°â\x80¢Â° Furnished to the OptimumÂ°â\x80¢Â° At Brand New Conditionâ\x80¢Â°â\x80¢ The Art of Honour Lasting Values Â® â\x80¢Â°â\x80¢"

In [10]:
# We are going to make a list of the characters we want to remove and then
# we will create a function that will replace those characters with an empty string

badchars = ["Â","\x80¢","°","â","®","¡","à", "¶","±", "ð"]

def cleanText(text, badchar_list):
    newtext = text
    for char in badchar_list:
        newtext = newtext.replace(char,"")
    return newtext

# In this example we see many of the characters disappearing, but most of
# the description entries are full of added substrings with seemingly random
# patterns, so it is difficult to clean them all with a simple script.

goodtext = cleanText(badtext, badchars)
goodtext

"Sri Lanka's Largest Digital Piano Seller  Direct Imported  Fully Functional and ready to Use  Cosmetics : 10/10 Ideal for an Hotelier or For an keen learner. 6 months of  Warranty Furnished to the Optimum At Brand New Condition The Art of Honour Lasting Values  "
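
The sample description looks like UTF-8 text that was decoded as Latin-1 at some point: for instance, "Â°" is what the two UTF-8 bytes of "°" look like when read as Latin-1. Under that assumption, which may not hold for every entry, re-encoding as Latin-1 and decoding as UTF-8 can recover the original characters instead of deleting them. The helper `fix_mojibake` below is a hypothetical sketch, not part of the notebook; when the round trip fails it returns the text unchanged, so the character removal above would still be needed for those entries.

```python
def fix_mojibake(text):
    """Try to undo UTF-8 text that was mis-decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The string does not fit the assumed pattern; leave it unchanged.
        return text

# A fragment of the first description, copied from the output above.
sample = "Â°â\x80¢Â°Sri Lanka's Largest Digital Piano SellerÂ°â\x80¢Â°"
repaired = fix_mojibake(sample)
print(repaired)
```

On this fragment the decorative sequence turns back into degree signs and bullets, which suggests the seller used them as ornaments around each line.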

In [11]:
# We apply the changes to the dataframe
df["Description"] = df["Description"].apply(lambda x: cleanText(x, badchars))

In [12]:
# We take a final look at the dataset
new_cols = ["Is_new", "Price_value", "Premium_seller","Published", "Description"]
df[new_cols].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5167 entries, 0 to 5166
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   Is_new          5167 non-null   int64         
 1   Price_value     5167 non-null   int64         
 2   Premium_seller  5167 non-null   int64         
 3   Published       5167 non-null   datetime64[ns]
 4   Description     5167 non-null   object        
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 202.0+ KB


Everything looks fine, so we can now proceed to analyze the dataset

## Exploratory Analysis

**Instrument types**

The dataset has several categories for musical instruments and accessories. Let's get some insights about these categories.

In [108]:
# Here we use pivot_table to group the data by instrument type and condition,
# and then we use different aggregation functions to obtain some info about prices

table = df.pivot_table(values=["Title", "Price_value"], index=["Instrument_Type"], columns=["Condition"],
                       aggfunc={"Title": "count", "Price_value": ["mean", "min", "max"]})

# This gives us a three level MultiIndex, let's lower it to two levels
new_index = [("Max Price", "New"),("Max Price", "Used"),("Mean Price", "New"),("Mean Price", "Used"),("Min Price", "New"),("Min Price", "Used"),("Items", "New"),("Items","Used")]
table.columns = pd.MultiIndex.from_tuples(new_index, names=["","Condition"])

# Now we are going to add a couple of columns to Items: we want the total number of items
# and the proportion of new ones within each instrument type

def getPercent(x,y):
    return(100*x/(x+y))

table["Items","Total"] = table.apply(lambda row: row["Items","New"] + row["Items","Used"], axis=1)
table["Items","% of New"] = table.apply(lambda row: getPercent(row["Items","New"],row["Items","Used"]), axis=1)


# Finally, we display the table, sorted with the most popular instrument types first
table.round(decimals=2).sort_values([('Items', 'Total')], ascending=False)

Instrument_Type,Max Price (New),Max Price (Used),Mean Price (New),Mean Price (Used),Min Price (New),Min Price (Used),Items (New),Items (Used),Items (Total),Items (% of New)
Studio / Live Music Equipment,1500000.0,6850000.0,41042.81,80052.79,275.0,500.0,718,1069,1787.0,40.18
String Instrument / Amplifier,348000.0,580000.0,20677.33,30674.88,80.0,1000.0,525,1221,1746.0,30.07
Keyboard / Piano,800000.0,770000.0,62931.36,75736.51,350.0,1500.0,99,557,656.0,15.09
Percussion / drums,230000.0,435000.0,25519.35,50877.08,450.0,750.0,248,384,632.0,39.24
Other Instrument,100000.0,1025000.0,7879.29,64356.04,350.0,1000.0,83,91,174.0,47.7
Woodwind / brass,165000.0,95000.0,44968.18,37045.89,250.0,500.0,11,73,84.0,13.1
Sheet Music,14000.0,150000.0,3732.63,29764.29,1000.0,1500.0,30,14,44.0,68.18
Vinyl,13500.0,95000.0,7180.0,12584.62,1000.0,500.0,5,39,44.0,11.36
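
A similar summary can be built with `groupby` and named aggregation, which sets the column labels directly and avoids renaming the MultiIndex by hand. The sketch below runs on a few made-up rows (`demo` is illustrative; only the column names come from the dataset) and computes the item totals with column arithmetic rather than a row-wise `apply`.

```python
import pandas as pd

# Illustrative rows only; column names mirror the dataset above.
demo = pd.DataFrame({
    "Instrument_Type": ["Vinyl", "Vinyl", "Vinyl", "Sheet Music"],
    "Condition": ["New", "Used", "Used", "New"],
    "Price_value": [1000, 500, 1500, 2000],
})

# Named aggregation: each key becomes an output column label.
stats = {"Max Price": "max", "Mean Price": "mean", "Min Price": "min", "Items": "count"}
summary = (
    demo.groupby(["Instrument_Type", "Condition"])["Price_value"]
    .agg(**stats)
    .unstack("Condition")  # move New/Used into a second column level
)

# Combinations with no listings come out as NaN; count them as zero items.
items = summary["Items"].fillna(0)
total = items.sum(axis=1)
summary[("Items", "Total")] = total
summary[("Items", "% of New")] = 100 * items["New"] / total

print(summary.sort_values(("Items", "Total"), ascending=False).round(2))
```

Filling missing counts with zero matters for instrument types that have only new or only used listings, since otherwise the total and the percentage would come out as `NaN`.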
