# Musical instrument prices
### A study of the prices of musical instruments in Sri Lanka

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
sns.set_style("white")
sns.set_palette("mako_r")

In [2]:
df = pd.read_csv("music_instrument_prices.csv", encoding="utf-8")

## Cleaning the dataset

Let's take a look at the data we imported from the CSV file.

In [15]:
df.head(3)

Unnamed: 0,Title,Sub_title,Price,Instrument_Type,Condition,Location,Description,Post_URL,Seller_name,Seller_type,published_date,Price_value,Is_new,Premium_seller,Published
0,Yamaha (SY-77) Music Synthesizer for sale,"Posted on 04 Oct 7:11 pm, Ja-Ela, Gampaha","Rs 39,000",Keyboard / Piano,Used,"Ja-Ela, Gampaha",Sri Lanka's Largest Digital Piano Seller Dire...,https://ikman.lk/en/ad/yamaha-sy-77-music-synt...,Seven Star International,Member,2021-10-04 19:11:00,39000,0,1,2021-10-04 19:11:00
1,SRX-718 BASS BIN (PAIR) for sale,"Posted on 10 Oct 7:54 pm, Kadawatha, Gampaha","Rs 77,500",Studio / Live Music Equipment,New,"Kadawatha, Gampaha",Watts 3200Treated Plywood,https://ikman.lk/en/ad/srx-718-bass-bin-pair-f...,Sasiru Super Sonics,Member,2021-10-10 19:54:00,77500,1,1,2021-10-10 19:54:00
2,Piano (Malcom Mendis Piano) for sale,"Posted on 13 Oct 12:43 pm, Kandana, Gampaha","Rs 130,000",Keyboard / Piano,Used,"Kandana, Gampaha","Sri Lanka's Biggest Piano Sale, Reasonable pri...",https://ikman.lk/en/ad/piano-malcom-mendis-pia...,Sell Fast | à¶à¶³à·à¶± | MCI Ikman à¶¯à·à¶±...,Member,2021-10-13 12:43:00,130000,0,1,2021-10-13 12:43:00


In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5167 entries, 0 to 5166
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Title            5167 non-null   object        
 1   Sub_title        5167 non-null   object        
 2   Price            5167 non-null   object        
 3   Instrument_Type  5167 non-null   object        
 4   Condition        5167 non-null   object        
 5   Location         5167 non-null   object        
 6   Description      5167 non-null   object        
 7   Post_URL         5167 non-null   object        
 8   Seller_name      5167 non-null   object        
 9   Seller_type      5167 non-null   object        
 10  published_date   5167 non-null   object        
 11  Price_value      5167 non-null   int64         
 12  Is_new           5167 non-null   int64         
 13  Premium_seller   5167 non-null   int64         
 14  Published        5167 non-null   datetime64[ns]

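We can see that there are no missing values in this dataset. However, the raw columns are all stored as "object" (strings), even though some of them hold numbers or dates. (The "Price_value", "Is_new", "Premium_seller" and "Published" columns at the end of this output were created by the cells below in an earlier run; the execution counts show the notebook was not run top to bottom.) We also have two variables, "Condition" and "Seller_type", that seem to be binary. Let's check that before moving on: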

In [5]:
print(df.Condition.unique())
print(df.Seller_type.unique())

['Used' 'New']
['Member' 'Premium-Member']


As suspected, both variables are binary. We will encode each of them as a new variable that takes 1 or 0 as its possible values.

**Binary variables**

In [14]:
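# We convert each text label to a binary variable with 1s and 0s

def textToBoolean(condition, yesval, noval):
    """Return 1 if condition == yesval, 0 if condition == noval, otherwise None."""
    if condition == yesval:
        return 1
    elif condition == noval:
        return 0
    else:
        return None  # unexpected label: leave the value missing

df["Is_new"] = df["Condition"].apply(lambda x: textToBoolean(x, "New", "Used"))
# Premium sellers are the "Premium-Member" accounts, so that label maps to 1
df["Premium_seller"] = df["Seller_type"].apply(lambda x: textToBoolean(x, "Premium-Member", "Member"))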
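The same encoding can also be written without a helper function. The sketch below is an alternative, not what this notebook ran: comparing a column with its "positive" label gives a boolean Series, and `astype(int)` turns True/False into 1/0. The small `toy` DataFrame is made up for illustration (named so it does not overwrite `df`).

```python
import pandas as pd

# Toy rows using the same labels as the "Condition" and "Seller_type" columns
toy = pd.DataFrame({
    "Condition": ["Used", "New", "Used"],
    "Seller_type": ["Member", "Member", "Premium-Member"],
})

# A comparison yields a boolean Series; astype(int) maps True/False to 1/0
toy["Is_new"] = (toy["Condition"] == "New").astype(int)
toy["Premium_seller"] = (toy["Seller_type"] == "Premium-Member").astype(int)

print(toy[["Is_new", "Premium_seller"]].values.tolist())
# [[0, 0], [1, 0], [0, 1]]
```

One difference from `textToBoolean`: an unexpected label would silently become 0 here instead of being left missing.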

**Prices**

In [18]:
# We want to convert prices to numbers:

def parsePrice(text):
    text = text.replace("Rs ","")
    text = text.replace(",","")
    return int(text)

df["Price_value"] = df["Price"].apply(lambda x: parsePrice(x))
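`parsePrice` raises a `ValueError` on any price that is not "Rs" followed by digits. The `info()` output above suggests every price in this scrape parsed, but if the data is re-scraped a more forgiving, vectorized version may help. A minimal sketch, assuming the same "Rs 39,000" format; the "Negotiable" entry is hypothetical and only there to show how unparseable values are handled:

```python
import pandas as pd

# Toy price strings in the "Rs 39,000" format of the "Price" column;
# "Negotiable" is a hypothetical non-numeric entry, not taken from the dataset
prices = pd.Series(["Rs 39,000", "Rs 77,500", "Negotiable"])

# Strip the currency prefix and thousands separators for the whole Series at once
digits = prices.str.replace("Rs ", "", regex=False).str.replace(",", "", regex=False)

# errors="coerce" turns anything that is still not a number into NaN
price_value = pd.to_numeric(digits, errors="coerce")

print(price_value.tolist())
# [39000.0, 77500.0, nan]
```

The result is a float column because NaN cannot be stored in a plain int64 column; `price_value.astype("Int64")` would give a nullable integer column instead.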

**Dates**

In [None]:
# Now, let's convert the date strings in "published_date" to datetime objects

df["Published"] = pd.to_datetime(df["published_date"], format="%Y-%m-%d %H:%M:%S")


# Let's check that the variables have been created correctly:
print("Is_new values:", df.Is_new.unique())
print("Premium_seller values:", df.Premium_seller.unique())
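Converting to datetime is what makes date components, arithmetic and time-based grouping available. A small sketch using two timestamps copied from the `published_date` column shown earlier (the `raw` and `published` names are just for illustration):

```python
import pandas as pd

# Two timestamps in the same "%Y-%m-%d %H:%M:%S" layout as "published_date"
raw = pd.Series(["2021-10-04 19:11:00", "2021-10-10 19:54:00"])
published = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

# Once the Series is datetime-typed, the .dt accessor exposes its components
print(published.dt.day.tolist())         # [4, 10]
print(published.dt.day_name().tolist())  # ['Monday', 'Sunday']
```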


In [7]:
# If we take a look at the "Description" column, we will see some weird characters mixed with the text.
# Several encodings have been tried without success, so we are going to remove those characters.
# Let's see an example:
badtext = df.iloc[0].Description
badtext

"Â°â\x80¢Â°Sri Lanka's Largest Digital Piano SellerÂ°â\x80¢Â° Â°â\x80¢Â° Direct Imported Â°â\x80¢Â° Fully Functional and ready to Use Â°â\x80¢Â° Cosmetics : 10/10Â°â\x80¢Â° Ideal for an Hotelier or For an keen learner.Â°â\x80¢Â° 6 months of  WarrantyÂ°â\x80¢Â° Furnished to the OptimumÂ°â\x80¢Â° At Brand New Conditionâ\x80¢Â°â\x80¢ The Art of Honour Lasting Values Â® â\x80¢Â°â\x80¢"
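Before stripping characters, one specific hypothesis may be worth testing: the pattern "Â°â\x80¢Â°" is exactly what the string "°•°" becomes when its UTF-8 bytes are decoded as Latin-1. If that is how the descriptions were garbled (an assumption we have not checked against the whole column), re-encoding as Latin-1 and decoding as UTF-8 restores the original text instead of deleting characters. The helper `repair_mojibake` below is hypothetical, not part of this notebook; the third-party `ftfy` library's `fix_text` covers many more of these cases.

```python
# Sketch: undo mojibake caused by decoding UTF-8 bytes as Latin-1.
# Assumption: the scraper did exactly that; not verified on the full column.

def repair_mojibake(text: str) -> str:
    """Re-encode as Latin-1 and decode as UTF-8; return text unchanged if that fails."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        # Raised for text that does not fit the hypothesis
        # (characters outside Latin-1, or bytes that are not valid UTF-8)
        return text

# Reproduce the garbling on a known string, then repair it
original = "°•° Direct Imported"
garbled = original.encode("utf-8").decode("latin-1")
print(repr(garbled))             # 'Â°â\x80¢Â° Direct Imported'
print(repair_mojibake(garbled))  # °•° Direct Imported
print(repair_mojibake("café"))   # café (left unchanged)
```

On the real data this would be `df["Description"].apply(repair_mojibake)`; plain ASCII passes through unchanged, and rows that do not fit the hypothesis are returned as they were.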

In [8]:
# We are going to make a list of the characters we want to remove and then
# we will create a function that will replace those characters with an empty string

badchars = ["Â","\x80¢","°","â","®","¡","à", "¶","±", "ð"]

def cleanText(text, badchar_list):
    newtext = text
    for char in badchar_list:
        newtext = newtext.replace(char,"")
    return newtext

# In this example we see many of the characters disappearing, but most of
# the description entries are full of extra substrings with seemingly random
# patterns, so it is difficult to clean them all easily with a simple script.
# Note that this also deletes legitimate occurrences of these characters (e.g. "à" or "°").

goodtext = cleanText(badtext, badchars)
goodtext

"Sri Lanka's Largest Digital Piano Seller  Direct Imported  Fully Functional and ready to Use  Cosmetics : 10/10 Ideal for an Hotelier or For an keen learner. 6 months of  Warranty Furnished to the Optimum At Brand New Condition The Art of Honour Lasting Values  "

In [9]:
# We apply the changes to the dataframe
df["Description"] = df["Description"].apply(lambda x: cleanText(x, badchars))
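The loop in `cleanText` scans each description once per bad character. A vectorized alternative, sketched below, builds a single regular expression from the same `badchars` list and lets pandas apply it to the whole column. Because it works in a single pass, it can differ from the loop in rare cases where deleting one character joins the pieces of another. The sample strings are shortened from the example above.

```python
import re
import pandas as pd

badchars = ["Â", "\x80¢", "°", "â", "®", "¡", "à", "¶", "±", "ð"]

# One alternation pattern; re.escape guards against regex metacharacters
pattern = "|".join(re.escape(c) for c in badchars)

descriptions = pd.Series(["Â°â\x80¢Â°Sri Lanka's Largest", "Warranty Â®"])
cleaned = descriptions.str.replace(pattern, "", regex=True)

print(cleaned.tolist())
# ["Sri Lanka's Largest", 'Warranty ']
```

On the notebook's DataFrame the equivalent call would be `df["Description"].str.replace(pattern, "", regex=True)`.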

In [10]:
# We take a final look at the dataset
new_cols = ["Is_new", "Price_value", "Premium_seller","Published", "Description"]
df[new_cols].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5167 entries, 0 to 5166
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   Is_new          5167 non-null   int64         
 1   Price_value     5167 non-null   int64         
 2   Premium_seller  5167 non-null   int64         
 3   Published       5167 non-null   datetime64[ns]
 4   Description     5167 non-null   object        
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 202.0+ KB


Everything looks fine, so we can now proceed to analyze the dataset.

## Exploratory Analysis

**Instrument types**

In the dataset we have several categories of musical instruments and accessories. Let's get some insights about these categories.

In [23]:
pd.DataFrame(df.groupby("Instrument_Type")["Title"].count()).sort_values("Title")

Instrument_Type,Title
Sheet Music,44
Vinyl,44
Woodwind / brass,84
Other Instrument,174
Percussion / drums,632
Keyboard / Piano,656
String Instrument / Amplifier,1746
Studio / Live Music Equipment,1787
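To put numbers on these counts, we can turn them into shares of all listings. The sketch below copies the counts from the output above into a Series; on the DataFrame itself, `df["Instrument_Type"].value_counts(normalize=True)` should give the same shares directly.

```python
import pandas as pd

# Listing counts per category, copied from the groupby output above
counts = pd.Series({
    "Sheet Music": 44, "Vinyl": 44, "Woodwind / brass": 84,
    "Other Instrument": 174, "Percussion / drums": 632, "Keyboard / Piano": 656,
    "String Instrument / Amplifier": 1746, "Studio / Live Music Equipment": 1787,
})

# Share of all listings per category, largest first
shares = (counts / counts.sum()).sort_values(ascending=False)
print(shares.round(3))

# Combined share of the two largest categories
print(round(shares.iloc[:2].sum(), 3))  # 0.684
```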


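We can see that most listings fall into two categories, string instruments/amplifiers (1,746) and studio/live music equipment (1,787), which together account for about 68% of the 5,167 listings. Since we have both new and used products, it would be interesting to further break the categories down by condition.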

In [37]:
pd.DataFrame(df.groupby(["Instrument_Type", "Condition"])["Title"].count()).sort_values(["Title","Instrument_Type","Condition",]).reset_index()

Unnamed: 0,Instrument_Type,Condition,Title
0,Vinyl,New,5
1,Woodwind / brass,New,11
2,Sheet Music,Used,14
3,Sheet Music,New,30
4,Vinyl,Used,39
5,Woodwind / brass,Used,73
6,Other Instrument,New,83
7,Other Instrument,Used,91
8,Keyboard / Piano,New,99
9,Percussion / drums,New,248


In [38]:
# I want a dataframe with the columns
# "Instrument_type", "Used (count)", "New (count)", "%Used (proportion)", "Total", "Average used price", "Average new price"
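The table described in the comment above can be sketched with `pd.crosstab` for the counts and `pivot_table` for the mean prices. This is one possible approach, not the notebook's own code: the `toy` rows and prices are made up, and the column names follow the comment. As the comment asks, "%Used" is a proportion between 0 and 1.

```python
import pandas as pd

# Toy listings with made-up prices; the real call would use df and the same column names
toy = pd.DataFrame({
    "Instrument_Type": ["Vinyl", "Vinyl", "Vinyl", "Keyboard / Piano", "Keyboard / Piano"],
    "Condition": ["Used", "Used", "New", "Used", "New"],
    "Price_value": [1000, 3000, 5000, 40000, 90000],
})

# Number of listings per category (rows) and condition (columns)
counts = pd.crosstab(toy["Instrument_Type"], toy["Condition"])

# Mean price per category and condition, in the same layout
mean_price = toy.pivot_table(index="Instrument_Type", columns="Condition",
                             values="Price_value", aggfunc="mean")

summary = pd.DataFrame({
    "Used": counts["Used"],
    "New": counts["New"],
    "Total": counts.sum(axis=1),
})
summary["%Used"] = summary["Used"] / summary["Total"]  # proportion, not percent
summary["Average used price"] = mean_price["Used"]
summary["Average new price"] = mean_price["New"]

print(summary.round(2))
```

A category with no used (or no new) listings would get NaN in the corresponding average-price column, which is probably the most honest value for it.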

In [42]:
df.iloc[:50]["Title"]

0             Yamaha (SY-77) Music Synthesizer for sale
1                      SRX-718 BASS BIN (PAIR) for sale
2                  Piano (Malcom Mendis Piano) for sale
3                 Yamaha Semi Acoustic Guitars for sale
4                                Yamaha Guitar for sale
5                                       Violin for sale
6                 Expnsoin Card Xp 30.50.60.80 for sale
7           à·à¶ºà·à¶©à· à¶©à·âà¶»à¶¸à· for sale
8                     semi acoustic box guitar for sale
9                    32 keys Melodica full set for sale
10                           Guitar wall stand for sale
11                       Fender 41" box guitar for sale
12                      Dilipsons Piano Center for sale
13                                     Sarpina for sale
14                             Electric Guitar for sale
15                   38 " Brand new box guitar for sale
16                             Crossover X Pro for sale
17                10" / 12" 14" cymbals plates f