## Data Gathering & Business Understanding

# Business Objective

The main objective of this project is to understand the market dynamics of *fitness tracker* products in India in order to support a more effective business strategy, including pricing, feature selection, and marketing. This covers analyzing the factors that influence *fitness tracker* prices, customer preferences, and shopping behavior, so that product development fits market needs and improves customer satisfaction. The project also evaluates the effectiveness of discount strategies and marketing trends based on in-depth data analysis.

# Assess Situation

The business situation behind this analysis is the growth of the *fitness tracker* market in India, driven by rising public awareness of health and fitness. The rapid growth of wearable devices in India comes with tight competition between brands offering a wide range of prices and specifications, which are major factors in customers' purchase decisions. Customer reviews and ratings can provide insight into market trends and customer satisfaction. Data analysis is therefore needed to understand how market trends are developing and to identify the opportunities and challenges of competing in this industry.

# Data Mining Goals

The goal of this data analysis is to identify the types of *fitness tracker* that customers are most interested in and to understand the factors that contribute to purchase decisions. The analysis explores transaction patterns, prices, product specifications, and customer reviews to produce insights for product development and better-targeted marketing strategies.


# Project Plan

The project will be carried out in several stages:

1.   Data Understanding
- The dataset used in this project was obtained from Kaggle and covers prices, product specifications, customer ratings, and reviews.
2.   Data Preparation
- Check data quality by handling missing values, duplicates, and inconsistencies in data format.
- Understand the distribution of the main variables, such as price, device type, and customer rating, to get an initial picture of the patterns in the data.
3.   Visualization
- Identify the most popular fitness tracker brands and models based on the number of transactions and customer ratings.
- Analyze the distribution of prices and product specifications, such as battery life, display type, and other additional features.
- Visualize price trends and customer preferences with charts and diagrams to make the data easier to interpret.
- Prepare suitable analyses to see how particular factors affect customer satisfaction.
4.   Dashboard
- Develop a single-page dashboard that presents the data from multiple perspectives and surfaces key information such as price trends, best-selling products, rating distribution, and customer purchase patterns.
- Ensure the dashboard helps in understanding market dynamics and supports business decision-making.
5.   Insight and Action
- Summarize the main findings of the analysis, such as price trends, the features customers want most, and satisfaction levels based on ratings.
- Provide data-driven recommendations for manufacturers and sellers, such as more effective marketing strategies, price adjustments, and features to improve in future products.
- Draw up strategic steps to improve the competitiveness of fitness tracker products in the Indian market.






# Data Preparation

# Reading and Displaying the Dataset

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv("/content/Fitness_trackers.csv")

In [4]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days),Reviews
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14,
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14,
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14,
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14,
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7,
...,...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14,
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14,
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7,
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7,


# Data Cleaning

Data cleaning is the process of removing or modifying data that is incomplete, duplicated, inaccurate, or incorrectly formatted. This ensures that the data being processed is of good quality, so that it can support more accurate decisions.

# Missing Value

Missing values are entries where the data is absent or could not be read.

Check the percentage of missing values per column to support the decision on how to handle them.

In [5]:
print((df.isna().sum() / len(df)) * 100)

Brand Name                         0.000000
Device Type                        0.000000
Model Name                         0.000000
Color                              0.000000
Selling Price                      0.000000
Original Price                     0.000000
Display                            0.000000
Rating (Out of 5)                  9.026549
Strap Material                     0.000000
Average Battery Life (in days)     0.000000
Reviews                           86.194690
dtype: float64


Inspect the rows with missing values in the Rating (Out of 5) column:

In [6]:
df[df['Rating (Out of 5)'].isnull()]

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days),Reviews
455,GARMIN,Smartwatch,Fenix 6 Pro Solar,"Grey, Black",72990,88490,AMOLED Display,,Nylon,45,
456,GARMIN,Smartwatch,Fenix 6X,"Black, Orange, Red",79990,88490,AMOLED Display,,Nylon,45,
457,GARMIN,Smartwatch,FORERUNNER 745 Magma Red,Magma Red,46990,51990,AMOLED Display,,Silicone,14,
458,GARMIN,Smartwatch,Fenix 6,"Black, Red, Orange",76990,86690,LED Display,,Nylon,30,
462,GARMIN,Smartwatch,Vivomove 3,Black,19990,25990,OLED Display,,Silicone,7,
463,GARMIN,Smartwatch,Fenix 6X Solar,"Black, Grey",105990,119490,AMOLED Display,,Leather,45,
464,GARMIN,Smartwatch,Fenix 6,"Grey, Blue",77990,86690,OLED Display,,Nylon,45,
465,GARMIN,Smartwatch,Fenix 5X,Black,58490,58490,OLED Display,,Silicone,30,
466,GARMIN,Smartwatch,Vivomove Luxe,"Black, Silver, White",39990,52990,AMOLED Display,,Silicone,14,
467,GARMIN,Smartwatch,Fenix 6,"Black, Grey",74990,82990,OLED Display,,Leather,30,


Inspect the rows with missing values in the Reviews column:

In [7]:
df[df['Reviews'].isnull()]

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days),Reviews
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14,
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14,
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14,
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14,
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7,
...,...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14,
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14,
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7,
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7,


# Handling Missing Values

Missing values are handled in two ways, imputation and dropping, under the following rules:
- **Drop** the column if the share of missing or unreadable data is >= 70%
- **Drop or impute** if the share of missing data is < 70%; imputation fills the empty entries with the mean, median, or mode of the column
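The rule above can be sketched as a small helper. This is a minimal illustration, not the notebook's own code: the function name `split_columns_by_missing_ratio`, the `drop_threshold` parameter, and the toy data are hypothetical, and the sketch assumes a column at exactly 70% missing is dropped.

```python
import pandas as pd


def split_columns_by_missing_ratio(frame: pd.DataFrame, drop_threshold: float = 70.0):
    """Return (columns_to_drop, columns_to_impute) based on percent missing."""
    # isna().mean() gives the fraction of missing entries per column
    missing_pct = frame.isna().mean() * 100
    to_drop = missing_pct[missing_pct >= drop_threshold].index.tolist()
    to_impute = missing_pct[(missing_pct > 0) & (missing_pct < drop_threshold)].index.tolist()
    return to_drop, to_impute


# Toy data (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    "Rating": [4.1, None, 4.3, 4.0, 4.5],        # 20% missing
    "Reviews": [None, None, None, None, "120"],  # 80% missing
    "Brand": ["A", "B", "C", "D", "E"],          # nothing missing
})

to_drop, to_impute = split_columns_by_missing_ratio(toy)
print(to_drop)    # ['Reviews']
print(to_impute)  # ['Rating']
```

Columns in the second list still need a per-column decision (mean, median, or mode), so the helper only separates the two cases.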

# Dropping Missing Values

Here we drop the Reviews column, for the following reasons:
- Its share of missing data reaches 86.19%
- It therefore meets the condition for dropping (>= 70% missing)

In [8]:
df = df.drop('Reviews', axis=1)

In [9]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7
...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7


# Imputation

Here we impute the Rating (Out of 5) column, because only about 9.03% of its values are missing.

In [10]:
df['Rating (Out of 5)'].dropna().describe()

Unnamed: 0,Rating (Out of 5)
count,514.0
mean,4.229961
std,0.390827
min,2.0
25%,4.025
50%,4.3
75%,4.5
max,5.0


Based on the summary statistics above, where the mean (4.23) and the median (4.30) are close and the spread is small (std 0.39), we can use the mean of the Rating (Out of 5) column to impute the missing values.
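As a quick sanity check on that choice, the sketch below compares mean and median imputation on a toy series. The values are hypothetical and chosen so that one low rating pulls the mean below the median; the variable names are not from the notebook.

```python
import pandas as pd

# Hypothetical ratings with two missing entries and one low value (2.0)
ratings = pd.Series([4.1, 4.3, None, 4.5, 2.0, None, 4.4])

observed = ratings.dropna()
print(observed.mean())    # about 3.86, pulled down by the 2.0
print(observed.median())  # 4.3

# Series.mean() and Series.median() skip NaN by default,
# so ratings.mean() would give the same fill value as observed.mean()
filled_mean = ratings.fillna(observed.mean())
filled_median = ratings.fillna(observed.median())

print(filled_mean.isna().sum())    # 0
print(filled_median.isna().sum())  # 0
```

When the two statistics are close, as in the real column, either fill value gives a similar result; the median is the more robust option when a column is strongly skewed.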

In [11]:
df['Rating (Out of 5)'] = df['Rating (Out of 5)'].fillna(df['Rating (Out of 5)'].dropna().mean())

In [13]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7
...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7


In [14]:
pd.DataFrame(df.isna().sum() / len(df) * 100, columns=['Null Ratio %'])

Unnamed: 0,Null Ratio %
Brand Name,0.0
Device Type,0.0
Model Name,0.0
Color,0.0
Selling Price,0.0
Original Price,0.0
Display,0.0
Rating (Out of 5),0.0
Strap Material,0.0
Average Battery Life (in days),0.0


# Duplicated Values

Duplicated values are records that appear more than once in a dataset.



In [15]:
df[df.duplicated()]

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
50,FitBit,Smartwatch,Versa 3,"Black, Blue, Pink",17999,18999,AMOLED Display,4.3,Elastomer,7
383,APPLE,Smartwatch,Series 6 GPS + Cellular 40 mm Red Aluminium Case,Red,45690,49900,OLED Retina Display,4.5,Aluminium,1
391,APPLE,Smartwatch,Series 6 GPS 44 mm Blue Aluminium Case,Deep Navy,43900,43900,OLED Retina Display,4.5,Aluminium,1
512,GARMIN,Smartwatch,Vivomove 3S,Black,19990,25990,AMOLED Display,4.229961,Silicone,14


They are handled by dropping the duplicate rows, keeping the first occurrence of each.

In [16]:
df = df.drop_duplicates()

In [17]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7
...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7


# Outliers

Outliers are values that differ greatly from the other values in the dataset; they can be much lower or much higher. Outliers can arise for many reasons, such as errors or other unexpected events.

Checking for outliers:

In [18]:
results = []

cols = df.select_dtypes(include=['float64', 'int64'])

for col in cols:
  q1 = df[col].quantile(0.25)
  q3 = df[col].quantile(0.75)
  iqr = q3 - q1
  lower_bound = q1 - 1.5*iqr
  upper_bound = q3 + 1.5*iqr
  outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
  percent_outliers = (len(outliers)/len(df))*100
  results.append({'Kolom': col, 'Persentase Outliers': percent_outliers})

# Build a DataFrame from the list of results
results_df = pd.DataFrame(results)
results_df.set_index('Kolom', inplace=True)
results_df = results_df.rename_axis(None, axis=0).rename_axis('Kolom', axis=1)

# Display the DataFrame
display(results_df)

Kolom,Persentase Outliers
Rating (Out of 5),3.743316
Average Battery Life (in days),1.426025


Because the outlier percentages are low but these variables are important, the outliers are capped rather than removed. Winsorizing (here, clipping values to the IQR fences) is the most suitable method because it preserves the structure of the data without discarding the rows that hold extreme values.
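To make the capping concrete, here is a minimal worked example on toy data. The helper name `clip_to_iqr_fences`, the parameter `k`, and the battery values are hypothetical. Strictly speaking, classical winsorizing caps at chosen percentiles; this sketch caps at the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR, the same bounds used in the outlier check.

```python
import pandas as pd


def clip_to_iqr_fences(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at the nearest fence."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return values.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical battery-life values in days; 45 is the extreme one
battery_days = pd.Series([7, 7, 10, 14, 14, 14, 45])

# Q1 = 8.5, Q3 = 14.0, IQR = 5.5, so the fences are 0.25 and 22.25
capped = clip_to_iqr_fences(battery_days)
print(capped.tolist())  # [7.0, 7.0, 10.0, 14.0, 14.0, 14.0, 22.25]
```

The row count is unchanged; only the extreme value moves to the upper fence, which is why rechecking with the same fences afterwards should report no outliers.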

Applying the capping:

In [19]:
columns_to_impute = ["Rating (Out of 5)", "Average Battery Life (in days)"]

for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid a SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

Rechecking for outliers:

In [20]:
results = []

cols = df.select_dtypes(include=['float64', 'int64'])

for col in cols:
  q1 = df[col].quantile(0.25)
  q3 = df[col].quantile(0.75)
  iqr = q3 - q1
  lower_bound = q1 - 1.5*iqr
  upper_bound = q3 + 1.5*iqr
  outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
  percent_outliers = (len(outliers)/len(df))*100
  results.append({'Kolom': col, 'Persentase Outliers': percent_outliers})

# Build a DataFrame from the list of results
results_df = pd.DataFrame(results)
results_df.set_index('Kolom', inplace=True)
results_df = results_df.rename_axis(None, axis=0).rename_axis('Kolom', axis=1)

# Display the DataFrame
display(results_df)

Kolom,Persentase Outliers
Rating (Out of 5),0.0
Average Battery Life (in days),0.0


In [21]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099,2499,AMOLED Display,4.2,Thermoplastic polyurethane,14
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722,2099,LCD Display,3.5,Leather,14
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469,2999,AMOLED Display,4.1,Thermoplastic polyurethane,14
4,Xiaomi,FitnessBand,Band 3,Black,1799,2199,OLED Display,4.3,Plastic,7
...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000,55000,AMOLED Display,4.1,Silicone,14
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990,20990,AMOLED Display,4.1,Elastomer,14
562,GOQii,FitnessBand,HR,Black,1999,1999,OLED Display,3.8,Silicone,7
563,GOQii,FitnessBand,Vital,Black,3499,3499,OLED Display,3.7,Thermoplastic polyurethane,7


# Inconsistent Values

Inconsistent values could be handled manually, but here we show how pandas can be used to handle them.

In [22]:
df["Brand Name"] = df["Brand Name"].str.strip()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Brand Name"] = df["Brand Name"].str.strip()
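The `SettingWithCopyWarning` shown above probably appears because `df` came from an earlier filtering step, such as `drop_duplicates()`, which can leave pandas unsure whether `df` is a view or a copy. One common way to avoid it is to take an explicit copy before assigning columns. The sketch below shows this on toy data; the frame and variable names are hypothetical, and the exact cause depends on the pandas version.

```python
import pandas as pd

# Hypothetical brand names with stray whitespace
raw = pd.DataFrame({"Brand Name": [" Xiaomi", "Xiaomi ", " Xiaomi", "GOQii"]})

# .copy() makes the result an independent DataFrame,
# so later column assignments are unambiguous
deduped = raw.drop_duplicates().copy()
deduped["Brand Name"] = deduped["Brand Name"].str.strip()

print(deduped["Brand Name"].tolist())  # ['Xiaomi', 'Xiaomi', 'GOQii']
```

The toy output also shows that stripping whitespace after deduplication can reveal new duplicates, so it may be worth checking for duplicates again once the text columns are cleaned.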


In [23]:
df["Strap Material"] = df["Strap Material"].str.strip().str.capitalize()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Strap Material"] = df["Strap Material"].str.strip().str.capitalize()


In [24]:
df["Selling Price"] = df["Selling Price"].astype(str).str.replace(",", "").astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Selling Price"] = df["Selling Price"].astype(str).str.replace(",", "").astype(float)


In [25]:
df["Original Price"] = df["Original Price"].str.replace(",", "").astype(float)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Original Price"] = df["Original Price"].str.replace(",", "").astype(float)


In [26]:
df.head()

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days)
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099.0,2499.0,AMOLED Display,4.2,Thermoplastic polyurethane,14
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722.0,2099.0,LCD Display,3.5,Leather,14
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14
4,Xiaomi,FitnessBand,Band 3,Black,1799.0,2199.0,OLED Display,4.3,Plastic,7


# Construct Data

In [27]:
df["Discount Percentage"] = ((df["Original Price"] - df["Selling Price"]) / df["Original Price"]) * 100


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Discount Percentage"] = ((df["Original Price"] - df["Selling Price"]) / df["Original Price"]) * 100


In [None]:
df["Discount Percentage"] = df["Discount Percentage"].round(2)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Discount Percentage"] = df["Discount Percentage"].round(2)


In [None]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days),Discount Percentage
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14,16.67
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099.0,2499.0,AMOLED Display,4.2,Thermoplastic polyurethane,14,16.01
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722.0,2099.0,LCD Display,3.5,Leather,14,17.96
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14,17.67
4,Xiaomi,FitnessBand,Band 3,Black,1799.0,2199.0,OLED Display,4.3,Plastic,7,18.19
...,...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000.0,55000.0,AMOLED Display,4.1,Silicone,14,0.00
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990.0,20990.0,AMOLED Display,4.1,Elastomer,14,33.35
562,GOQii,FitnessBand,HR,Black,1999.0,1999.0,OLED Display,3.8,Silicone,7,0.00
563,GOQii,FitnessBand,Vital,Black,3499.0,3499.0,OLED Display,3.7,Thermoplastic polyurethane,7,0.00
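The derived `Discount Percentage` column is the price reduction relative to the original price, (Original Price - Selling Price) / Original Price × 100. The function below is a hypothetical standalone helper that reproduces the arithmetic for two rows of the table above.

```python
def discount_percentage(original_price: float, selling_price: float) -> float:
    """Discount as a percentage of the original price, rounded to 2 decimals."""
    return round((original_price - selling_price) / original_price * 100, 2)


# Row 0: Original Price 2999, Selling Price 2499
print(discount_percentage(2999, 2499))    # 16.67

# Row 560: selling price equals original price, so there is no discount
print(discount_percentage(55000, 55000))  # 0.0
```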


In [28]:
df["Price Per Feature"] = df["Selling Price"] / df["Average Battery Life (in days)"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Price Per Feature"] = df["Selling Price"] / df["Average Battery Life (in days)"]


In [29]:
df["Price Per Feature"] = df["Price Per Feature"].round(2)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Price Per Feature"] = df["Price Per Feature"].round(2)


In [30]:
df

Unnamed: 0,Brand Name,Device Type,Model Name,Color,Selling Price,Original Price,Display,Rating (Out of 5),Strap Material,Average Battery Life (in days),Discount Percentage,Price Per Feature
0,Xiaomi,FitnessBand,Smart Band 5,Black,2499.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14,16.672224,178.50
1,Xiaomi,FitnessBand,Smart Band 4,Black,2099.0,2499.0,AMOLED Display,4.2,Thermoplastic polyurethane,14,16.006403,149.93
2,Xiaomi,FitnessBand,HMSH01GE,Black,1722.0,2099.0,LCD Display,3.5,Leather,14,17.960934,123.00
3,Xiaomi,FitnessBand,Smart Band 5,Black,2469.0,2999.0,AMOLED Display,4.1,Thermoplastic polyurethane,14,17.672558,176.36
4,Xiaomi,FitnessBand,Band 3,Black,1799.0,2199.0,OLED Display,4.3,Plastic,7,18.190086,257.00
...,...,...,...,...,...,...,...,...,...,...,...,...
560,Huawei,Smartwatch,Watch 36456,Black,55000.0,55000.0,AMOLED Display,4.1,Silicone,14,0.000000,3928.57
561,Huawei,Smartwatch,GT Fortuna-B19S Sport,Black,13990.0,20990.0,AMOLED Display,4.1,Elastomer,14,33.349214,999.29
562,GOQii,FitnessBand,HR,Black,1999.0,1999.0,OLED Display,3.8,Silicone,7,0.000000,285.57
563,GOQii,FitnessBand,Vital,Black,3499.0,3499.0,OLED Display,3.7,Thermoplastic polyurethane,7,0.000000,499.86


In [32]:
df[["Selling Price", "Original Price", "Discount Percentage", "Price Per Feature"]].head()

Unnamed: 0,Selling Price,Original Price,Discount Percentage,Price Per Feature
0,2499.0,2999.0,16.672224,178.5
1,2099.0,2499.0,16.006403,149.93
2,1722.0,2099.0,17.960934,123.0
3,2469.0,2999.0,17.672558,176.36
4,1799.0,2199.0,18.190086,257.0
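Note that `Price Per Feature` as computed here is the selling price divided by battery life, so it is effectively a price per battery-day. The sketch below reproduces two values from the table and adds a guard against a battery life of zero. The guard is a hypothetical precaution, since nothing in the notebook shows whether such rows exist, and the column name `Price Per Battery Day` is illustrative only.

```python
import numpy as np
import pandas as pd

# First two rows mirror the table above; the third row is hypothetical
toy = pd.DataFrame({
    "Selling Price": [2499.0, 1799.0, 999.0],
    "Average Battery Life (in days)": [14, 7, 0],
})

# Treat a battery life of 0 as missing so the division yields NaN, not inf
battery = toy["Average Battery Life (in days)"].replace(0, np.nan)
toy["Price Per Battery Day"] = (toy["Selling Price"] / battery).round(2)

print(toy["Price Per Battery Day"].tolist())  # [178.5, 257.0, nan]
```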
