# Laptop Price Estimation

# Business Understanding

Laptops are among the most commonly used tools for everyday work. They come in many types and models to suit different needs, and prices vary with the specifications required. That is the motivation for this project: to estimate the price of laptops on the market and so help prospective buyers find a laptop that fits their needs.

# Data Understanding

The dataset in laptop_price.csv has 11 columns of dtype float64 or object, each with 1303 non-null values:

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Company | 1303 | object |
| 1 | TypeName | 1303 | object |
| 2 | Inches | 1303 | float64 |
| 3 | ScreenResolution | 1303 | object |
| 4 | Cpu | 1303 | object |
| 5 | Ram | 1303 | object |
| 6 | Memory | 1303 | object |
| 7 | Gpu | 1303 | object |
| 8 | OpSys | 1303 | object |
| 9 | Weight | 1303 | object |
| 10 | Price_euros | 1303 | float64 |

# Data Preparation

In this stage the data is cleaned so that it can be used as input for the application to be built.

- Libraries used

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

- Loading the dataset

In [None]:
df = pd.read_csv('laptop_price.csv', encoding='latin-1')

- Dataset overview

In [None]:
df.head()

In [None]:
# Drop the Product column

df.drop(columns=['Product'],inplace=True)


In [None]:
# Show the data type of each column

df.info()

Several columns have the object dtype, so they are converted first.

In [None]:
# Rename the columns so that their names include the unit

df.rename(columns={'Ram': 'Ram_GB', 'Weight': 'Weight_KG'}, inplace=True)

In [None]:
# Remove the "kg" and "GB" suffixes, then convert the columns to float and int

df['Weight_KG'] = df['Weight_KG'].str.replace('kg', '', regex=False)
df['Ram_GB'] = df['Ram_GB'].str.replace('GB', '', regex=False)

df["Ram_GB"] = df["Ram_GB"].astype(int)
df["Weight_KG"] = df["Weight_KG"].astype(float)

df.head()

In [None]:
df.info()
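
The unit-stripping step above can be checked on a few hand-written rows before running it on the full dataset. This is a minimal sketch: the `toy` frame and its values are made up for illustration and only mimic the string format of the raw Ram and Weight columns.

```python
import pandas as pd

# Made-up rows in the same string format as the raw Ram and Weight columns
toy = pd.DataFrame({
    "Ram": ["8GB", "16GB", "4GB"],
    "Weight": ["1.37kg", "2.1kg", "1.86kg"],
})

# Strip the unit suffix as a literal string, then cast to a numeric dtype
toy["Ram_GB"] = toy["Ram"].str.replace("GB", "", regex=False).astype(int)
toy["Weight_KG"] = toy["Weight"].str.replace("kg", "", regex=False).astype(float)

print(toy.dtypes)
```

If any value carried an unexpected suffix, `astype` would raise a `ValueError`, which makes this a cheap sanity check on the column format.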

In [None]:
df['ScreenResolution'].value_counts()

The ScreenResolution column also records features such as touchscreen and IPS panels, so two new columns are created: TouchScreen and IPS.

In [None]:
# Create the TouchScreen column

df['TouchScreen']=df['ScreenResolution'].apply(lambda x:1 if 'Touchscreen' in x else 0)

# 1 = touchscreen
# 0 = no touchscreen

df.sample(5)

In [None]:
# Create the IPS column

df['IPS']=df['ScreenResolution'].apply(lambda x:1 if 'IPS' in x else 0)

# 1 = IPS panel
# 0 = no IPS panel

df.sample(5)

In [None]:
# Tidy the ScreenResolution column: keep only the last token (the resolution)

df["ScreenResolution"] = df.ScreenResolution.str.split(" ").apply(lambda x: x[-1])

df.sample(5)
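
The three screen-related steps above can also be written with pandas' vectorized string methods. The sketch below runs on a made-up `toy` frame; the sample strings only imitate the format of ScreenResolution, and the separate `Resolution` column is used here just to keep the original text visible.

```python
import pandas as pd

# Made-up examples in the format of the ScreenResolution column
toy = pd.DataFrame({"ScreenResolution": [
    "IPS Panel Retina Display 2560x1600",
    "Full HD 1920x1080",
    "IPS Panel Full HD / Touchscreen 1920x1080",
    "1366x768",
]})

# Vectorized equivalent of the apply/lambda flags
toy["TouchScreen"] = toy["ScreenResolution"].str.contains("Touchscreen", regex=False).astype(int)
toy["IPS"] = toy["ScreenResolution"].str.contains("IPS", regex=False).astype(int)

# The resolution is the last space-separated token
toy["Resolution"] = toy["ScreenResolution"].str.split(" ").str[-1]

print(toy[["TouchScreen", "IPS", "Resolution"]])
```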

In [None]:
df['Cpu'].value_counts()

Because there are many CPU models, two new columns are created to capture the CPU type: Cpu_Name and Cpu_Brand. Only Cpu_Brand is used from here on.

In [None]:
# Create the Cpu_Name column (the first three words of Cpu)

df['Cpu_Name']=df['Cpu'].apply(lambda x:" ".join(x.split()[0:3]))

In [None]:
# Function that groups CPUs into categories

def fetch_processor(text):
    # Keep the three main Intel Core lines as their own categories
    if text in ('Intel Core i7', 'Intel Core i5', 'Intel Core i3'):
        return text
    # Any other Intel CPU
    if text.split()[0] == 'Intel':
        return 'Other Intel Processor'
    # Everything that is not Intel is labelled as AMD
    return 'AMD Processor'

df['Cpu_Brand']=df['Cpu_Name'].apply(fetch_processor)

df.sample(7)
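
To see how the two CPU steps combine, the sketch below applies them to a few sample strings. `categorize_cpu` is a hypothetical helper that merges the Cpu_Name and Cpu_Brand logic into one function; the sample strings are illustrative.

```python
def categorize_cpu(cpu):
    """Map a raw Cpu string to one of the Cpu_Brand categories."""
    name = " ".join(cpu.split()[:3])  # same rule as Cpu_Name
    if name in {"Intel Core i7", "Intel Core i5", "Intel Core i3"}:
        return name
    if name.split()[0] == "Intel":
        return "Other Intel Processor"
    return "AMD Processor"  # note: every non-Intel CPU lands here


samples = [
    "Intel Core i5 2.3GHz",
    "Intel Core i7 7500U 2.7GHz",
    "Intel Celeron Dual Core N3350 1.1GHz",
    "AMD A9-Series 9420 3GHz",
]
categories = [categorize_cpu(s) for s in samples]
print(categories)
```

Because the final branch catches everything that does not start with "Intel", a CPU from any other vendor would also be labelled "AMD Processor".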

In [None]:
# Drop the Cpu and Cpu_Name columns

df.drop(columns=['Cpu','Cpu_Name'],inplace=True)

df.sample(5)

In [None]:
# Normalize the Memory strings: drop ".0", strip "GB", and express TB in GB (1 TB = 1000 GB)
df['Memory'] = df['Memory'].astype(str).replace(r'\.0', '', regex=True)
df["Memory"] = df["Memory"].str.replace('GB', '', regex=False)
df["Memory"] = df["Memory"].str.replace('TB', '000', regex=False)

# Split "first + second" storage entries into two parts
new = df["Memory"].str.split("+", n = 1, expand = True)

df["first"]= new[0]
df["first"]=df["first"].str.strip()

df["second"]= new[1]

# Flag the storage type of the first part
df["Layer1HDD"] = df["first"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer1SSD"] = df["first"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer1Hybrid"] = df["first"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer1Flash_Storage"] = df["first"].apply(lambda x: 1 if "Flash Storage" in x else 0)

# Keep only the digits (the capacity in GB)
df['first'] = df['first'].str.replace(r'\D', '', regex=True)

# Laptops with a single drive have no second part; treat it as capacity 0
df["second"] = df["second"].fillna("0")

# Flag the storage type of the second part
df["Layer2HDD"] = df["second"].apply(lambda x: 1 if "HDD" in x else 0)
df["Layer2SSD"] = df["second"].apply(lambda x: 1 if "SSD" in x else 0)
df["Layer2Hybrid"] = df["second"].apply(lambda x: 1 if "Hybrid" in x else 0)
df["Layer2Flash_Storage"] = df["second"].apply(lambda x: 1 if "Flash Storage" in x else 0)

df['second'] = df['second'].str.replace(r'\D', '', regex=True)

df["first"] = df["first"].astype(int)
df["second"] = df["second"].astype(int)

# Total capacity per storage type = capacity * type flag, summed over both parts
df["HDD"]=(df["first"]*df["Layer1HDD"]+df["second"]*df["Layer2HDD"])
df["SSD"]=(df["first"]*df["Layer1SSD"]+df["second"]*df["Layer2SSD"])
df["Hybrid"]=(df["first"]*df["Layer1Hybrid"]+df["second"]*df["Layer2Hybrid"])
df["Flash_Storage"]=(df["first"]*df["Layer1Flash_Storage"]+df["second"]*df["Layer2Flash_Storage"])

# Drop the helper columns
df.drop(columns=['first', 'second', 'Layer1HDD', 'Layer1SSD', 'Layer1Hybrid',
       'Layer1Flash_Storage', 'Layer2HDD', 'Layer2SSD', 'Layer2Hybrid',
       'Layer2Flash_Storage'],inplace=True)
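
The cell above is long, so it can help to see the same idea on single strings. The sketch below uses a hypothetical helper, `parse_storage`, that is not part of the notebook; it assumes the same conventions as the cell (parts separated by `+`, 1 TB = 1000 GB) and returns only the HDD and SSD totals that the model later uses.

```python
import re


def parse_storage(memory):
    """Return the total HDD and SSD capacity in GB for one Memory string."""
    totals = {"HDD": 0, "SSD": 0}
    for part in memory.split("+"):
        match = re.search(r"(\d+(?:\.\d+)?)\s*(GB|TB)", part)
        if match is None:
            continue  # no capacity found in this part
        size = float(match.group(1))
        if match.group(2) == "TB":
            size *= 1000  # same 1 TB = 1000 GB convention as the cell above
        for kind in totals:
            if kind in part:
                totals[kind] += int(size)
    return totals


print(parse_storage("128GB SSD +  1TB HDD"))
print(parse_storage("256GB SSD"))
print(parse_storage("1.0TB Hybrid"))
```

Applied row by row, for example with `df['Memory'].apply(parse_storage)`, a helper like this would be an alternative to the intermediate Layer1/Layer2 columns, at the cost of being slower than vectorized string operations.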

In [None]:
# Drop the Memory column

df.drop(columns=['Memory'],inplace=True)

In [None]:
# Drop the Hybrid and Flash_Storage columns

df.drop(columns=['Hybrid','Flash_Storage'],inplace=True)

In [None]:
df['Gpu'].value_counts()

Next, the Gpu column is reduced to three categories based on brand.

In [None]:
df['Gpu brand'] = df['Gpu'].apply(lambda x:x.split()[0])

In [None]:
df.sample(5)

In [None]:
df['Gpu brand'].value_counts()

In [None]:
# Drop the ARM GPU brand because it occurs only once
# (.copy() avoids a SettingWithCopyWarning when columns are dropped in place later)

df = df[df['Gpu brand'] != 'ARM'].copy()
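
A small illustration of the brand extraction and filtering steps, using a made-up `toy` frame whose strings imitate the Gpu column:

```python
import pandas as pd

# Made-up rows in the format of the Gpu column
toy = pd.DataFrame({"Gpu": [
    "Intel HD Graphics 620",
    "Nvidia GeForce GTX 1050",
    "AMD Radeon 530",
    "ARM Mali T860 MP4",
    "Intel UHD Graphics 620",
]})

# The brand is the first word of the GPU name
toy["Gpu brand"] = toy["Gpu"].str.split().str[0]

# Remove the ARM row; .copy() makes the result an independent DataFrame
filtered = toy[toy["Gpu brand"] != "ARM"].copy()

print(filtered["Gpu brand"].value_counts())
```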


In [None]:
# Drop the Gpu column

df.drop(columns=['Gpu'],inplace=True)

In [None]:
df['OpSys'].value_counts()

Next, a function is defined to group the operating system each laptop uses.

In [None]:
def cat_os(inp):
    # Group the raw OpSys values into three broad categories
    if inp in ('Windows 10', 'Windows 7', 'Windows 10 S'):
        return 'Windows'
    elif inp in ('macOS', 'Mac OS X'):
        return 'Mac'
    else:
        return 'Others/No OS/Linux'

In [None]:
df['OS'] = df['OpSys'].apply(cat_os)
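
The same grouping can be expressed as a lookup table, which some readers find easier to extend. This is a sketch on made-up values; `os_map` and `toy` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Lookup table with the same categories as cat_os
os_map = {
    "Windows 10": "Windows",
    "Windows 7": "Windows",
    "Windows 10 S": "Windows",
    "macOS": "Mac",
    "Mac OS X": "Mac",
}

toy = pd.Series(["Windows 10", "macOS", "Linux", "No OS", "Chrome OS"])

# Values missing from the table become NaN and then fall into the catch-all group
toy_os = toy.map(os_map).fillna("Others/No OS/Linux")

print(toy_os.tolist())
```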


In [None]:
# Drop the OpSys column (replaced by the OS column)

df.drop(columns=['OpSys'],inplace=True)

In [None]:
df.sample(5)

# Modelling

In [None]:
sns.heatmap(df.isnull())

In [None]:
df.describe()

- Data visualization

In [None]:
plt.figure(figsize=(10,8))
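# numeric_only=True skips the remaining object columns, which would otherwise raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True)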

- Number of laptops per brand

In [None]:
Merk = df.groupby('Company').count()[['Price_euros']].sort_values(by='Price_euros',ascending=True).reset_index()
Merk = Merk.rename(columns={'Price_euros':'numberOfLaptops'})

In [None]:
fig = plt.figure(figsize=(15,5))
sns.barplot(x=Merk['Company'], y=Merk['numberOfLaptops'], color='royalblue')
plt.xticks(rotation=60)

- Number of laptops by HDD capacity

In [None]:
Memori= df.groupby('HDD').count()[['Price_euros']].sort_values(by='Price_euros').reset_index()
Memori = Memori.rename(columns={'Price_euros':'count'})

In [None]:
fig = plt.figure(figsize=(15,5))
sns.barplot(x=Memori['HDD'], y=Memori['count'], color='royalblue')
plt.xticks(rotation=60)

- Number of laptops by SSD capacity

In [None]:
Memori= df.groupby('SSD').count()[['Price_euros']].sort_values(by='Price_euros').reset_index()
Memori = Memori.rename(columns={'Price_euros':'count'})

In [None]:
fig = plt.figure(figsize=(15,5))
sns.barplot(x=Memori['SSD'], y=Memori['count'], color='royalblue')
plt.xticks(rotation=60)

- RAM distribution

In [None]:
plt.figure(figsize=(15,5))
# sns.distplot is deprecated; histplot with kde=True draws a comparable plot
sns.histplot(df['Ram_GB'], kde=True)

- Price distribution

In [None]:
plt.figure(figsize=(15,5))
# sns.distplot is deprecated; histplot with kde=True draws a comparable plot
sns.histplot(df['Price_euros'], kde=True)

In [None]:
df.info()

- Feature selection

In [None]:
features = ['Ram_GB','Weight_KG','TouchScreen','IPS', 'HDD', 'SSD']
x = df[features]
y = df['Price_euros']
x.shape, y.shape

- Splitting into training and test data

In [None]:
from sklearn.model_selection import train_test_split
x_train, X_test, y_train, y_test = train_test_split(x,y,random_state=70)
y_test.shape
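
Since no `test_size` is passed, scikit-learn falls back to its default of holding out 25% of the rows for testing. The sketch below shows this on made-up arrays (`X_demo`, `y_demo`) and also checks that a fixed `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with 2 features each
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# No test_size given: 25% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=70)
print(X_tr.shape, X_te.shape)

# The same random_state yields the same split on every run
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_demo, y_demo, random_state=70)
same_split = np.array_equal(X_te, X_te2) and np.array_equal(y_te, y_te2)
print(same_split)
```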

- Building the linear regression model

In [None]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(x_train,y_train)
pred = lr.predict(X_test)


In [None]:
score = lr.score(X_test, y_test)
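# For a regressor, score() returns the coefficient of determination (R^2), not an accuracy
print('R^2 score of the linear regression model = ', score)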
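
R^2 alone does not say how large the errors are in euros, so error metrics such as MAE and RMSE are often reported alongside it. The sketch below is one way to compute them; it uses synthetic data and a separate `model` object, so the numbers it prints say nothing about the laptop model itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic regression data (not the laptop dataset)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 1.0, size=200)

# Fit on the first 150 rows, evaluate on the remaining 50
model = LinearRegression().fit(X_demo[:150], y_demo[:150])
pred_demo = model.predict(X_demo[150:])

r2 = r2_score(y_demo[150:], pred_demo)                       # same value as model.score(...)
mae = mean_absolute_error(y_demo[150:], pred_demo)           # average absolute error
rmse = np.sqrt(mean_squared_error(y_demo[150:], pred_demo))  # penalizes large errors more

print(r2, mae, rmse)
```

With the notebook's own objects, the equivalent calls would take `y_test` and `pred` as arguments.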

# Evaluation

- Preparing input for the linear regression model


In [None]:
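# Feature order: ['Ram_GB', 'Weight_KG', 'TouchScreen', 'IPS', 'HDD', 'SSD']
# Note: the second value is Weight_KG; 15.4 looks like a screen size in inches rather than a weight in kg
# A DataFrame with the training column names avoids scikit-learn's feature-name warning
input_data = pd.DataFrame([[8, 15.4, 0, 1, 0, 512]], columns=features)

prediction = lr.predict(input_data)
print('Estimated laptop price in euros:', prediction)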

- Save model

In [None]:
import pickle

filename = 'estimasi_harga_laptop.sav'
# The with-block closes the file once the model has been written
with open(filename, 'wb') as f:
    pickle.dump(lr, f)
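
To use the saved model in an application it has to be loaded back with `pickle.load`. The sketch below shows a full save-and-load round trip. It is only an illustration: `demo_model` is a stand-in fitted on made-up numbers, and the file is written to a temporary directory rather than to `estimasi_harga_laptop.sav`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for the fitted `lr` model, trained on made-up numbers
X_demo = np.array([[4.0, 1.2], [8.0, 1.5], [16.0, 2.0], [32.0, 2.4]])
y_demo = np.array([400.0, 700.0, 1200.0, 2100.0])
demo_model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.sav")
    with open(path, "wb") as f:
        pickle.dump(demo_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions
same = np.allclose(demo_model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that was used to save them.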