## Import Library
The library used in this project is Pandas, a Python library for data processing tasks such as analyzing, cleaning, exploring, and manipulating data. The name Pandas is derived from both "Panel Data" and "Python Data Analysis".

In [7]:
import pandas as pd
pd.__version__

'2.2.2'

## Load Data
After importing pandas, we load the CSV file that was previously saved to Google Drive into a dataframe so that we can work with the data in the following steps.

In [9]:
df = pd.read_csv('Mobile_phone_price.csv')

In [11]:
df

| | Brand | Model | Storage | RAM | Screen Size (inches) | Camera (MP) | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 Pro | 128 GB | 6 GB | 6.1 | 12 + 12 + 12 | 3095 | 999 |
| 1 | Samsung | Galaxy S21 Ultra | 256 GB | 12 GB | 6.8 | 108 + 10 + 10 + 12 | 5000 | 1199 |
| 2 | OnePlus | 9 Pro | 128 GB | 8 GB | 6.7 | 48 + 50 + 8 + 2 | 4500 | 899 |
| 3 | Xiaomi | Redmi Note 10 Pro | 128 GB | 6 GB | 6.67 | 64 + 8 + 5 + 2 | 5020 | 279 |
| 4 | Google | Pixel 6 | 128 GB | 8 GB | 6.4 | 50 + 12.2 | 4614 | 799 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 402 | Samsung | Galaxy Note20 5G | 128 | 8 | 6.7 | 12+64+12 | 4300 | 1049 |
| 403 | Xiaomi | Mi 10 Lite 5G | 128 | 6 | 6.57 | 48+8+2+2 | 4160 | 349 |
| 404 | Apple | iPhone 12 Pro Max | 128 | 6 | 6.7 | 12+12+12 | 3687 | 1099 |
| 405 | Oppo | Reno3 | 128 | 8 | 6.4 | 48+13+8+2 | 4025 | 429 |


## General Dataframe Information
To get general information about a dataframe (df), we use the following attributes and methods:
1. df.shape: an attribute that returns the number of rows and columns of the dataframe as a tuple
2. df.info(): a method that prints the number of entries, the column labels, the non-null count and data type (dtype) of each column, and the memory usage
3. df.describe(): a method that returns descriptive statistics for the numeric columns: count, mean, standard deviation (std), minimum (min), 25th percentile (Q1), median (50th percentile or Q2), 75th percentile (Q3) and maximum (max)
4. df.columns: an attribute that returns the column labels of the dataframe
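To make the difference between these attributes and methods concrete, here is a minimal sketch on a small, made-up DataFrame (the values are illustrative, not read from Mobile_phone_price.csv). It also shows why the first df.describe() result below contains only Battery Capacity (mAh): by default, describe() summarizes numeric columns only, and `include="all"` extends it to object columns.

```python
import pandas as pd

# A tiny illustrative frame: two text (object) columns and one integer column
sample = pd.DataFrame({
    "Brand": ["Apple", "Samsung", "Xiaomi"],
    "Storage": ["128 GB", "256GB", "128"],
    "Battery Capacity (mAh)": [3095, 5000, 5020],
})

print(sample.shape)          # attribute: (rows, columns) -> (3, 3)
print(list(sample.columns))  # attribute: the column labels
sample.info()                # method: prints dtypes and non-null counts

# By default describe() only summarizes numeric columns
numeric_summary = sample.describe()
print(numeric_summary.columns.tolist())  # ['Battery Capacity (mAh)']

# include="all" also summarizes object columns (count, unique, top, freq)
full_summary = sample.describe(include="all")
print(full_summary.columns.tolist())
```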

In [14]:
df.shape

(407, 8)

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 407 entries, 0 to 406
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Brand                   407 non-null    object
 1   Model                   407 non-null    object
 2   Storage                 407 non-null    object
 3   RAM                     407 non-null    object
 4   Screen Size (inches)    407 non-null    object
 5   Camera (MP)             407 non-null    object
 6   Battery Capacity (mAh)  407 non-null    int64 
 7   Price ($)               407 non-null    object
dtypes: int64(1), object(7)
memory usage: 25.6+ KB


In [18]:
df.describe()

| | Battery Capacity (mAh) |
|---|---|
| count | 407.0 |
| mean | 4676.476658 |
| std | 797.193713 |
| min | 1821.0 |
| 25% | 4300.0 |
| 50% | 5000.0 |
| 75% | 5000.0 |
| max | 7000.0 |


In [20]:
df.columns

Index(['Brand', 'Model', 'Storage ', 'RAM ', 'Screen Size (inches)',
       'Camera (MP)', 'Battery Capacity (mAh)', 'Price ($)'],
      dtype='object')
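This output shows that the labels 'Storage ' and 'RAM ' carry a trailing space, which is why the cells below index them as df['Storage '] and df['RAM ']. One possible alternative, not used in the rest of this notebook, is to strip whitespace from all column labels right after loading. A minimal sketch on a made-up, empty frame:

```python
import pandas as pd

# Illustrative frame whose labels mimic the trailing spaces seen above
sample = pd.DataFrame(columns=["Brand", "Model", "Storage ", "RAM ", "Price ($)"])

# Find labels that have leading or trailing whitespace
padded = [c for c in sample.columns if c != c.strip()]
print(padded)  # ['Storage ', 'RAM ']

# Strip whitespace from every column label (assigns a new Index)
sample.columns = sample.columns.str.strip()
print(list(sample.columns))  # ['Brand', 'Model', 'Storage', 'RAM', 'Price ($)']
```

If this were applied to df, the later cells would have to use df['Storage'] and df['RAM'] instead.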

## Analysis
In this section we analyze the following columns: Storage, RAM, Screen Size and Price.

### Storage Column Analysis

In [25]:
# First, we look at the unique values in the Storage column of the dataframe
df['Storage '].unique()

array(['128 GB', '256 GB', '64 GB', '32 GB', '128GB', '256GB', '64GB',
       '32GB', '256', '64', '128', '512', '32'], dtype=object)

The df['Storage '].unique() command above returns an array of the unique values in the Storage column. It shows that **besides every value in the Storage column still being of the object (string) type, the values are written inconsistently: some contain the unit GB, with or without a space before it, and some do not. We therefore need to remove GB from the values**. The following code is an example:

In [28]:
s = '128 GB'  # Define a variable s holding the string 128<space>GB
s = s.replace(' GB', '')  # Replace the substring '<space>GB' with an empty string and store the result back in s
s  # Display s; the result is just 128, without GB

'128'

In [30]:
type(s)  # Check the data type of s

str
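Note that replace(' GB', '') only handles values written with a space before GB. The unique values above also include forms such as '128GB', which this single replacement leaves untouched. A small sketch comparing it with the two-step approach used on the whole column (remove spaces first, then remove GB):

```python
values = ["128 GB", "128GB", "128"]

# One replacement: only matches the " GB" form
one_step = [v.replace(" GB", "") for v in values]
print(one_step)  # ['128', '128GB', '128']

# Two replacements: drop spaces first, then the "GB" suffix
two_step = [v.replace(" ", "").replace("GB", "") for v in values]
print(two_step)  # ['128', '128', '128']

# Only the two-step result can be converted to integers without errors
print([int(v) for v in two_step])  # [128, 128, 128]
```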

The following code applies this to the whole Storage column: it removes the spaces and the word GB from every value that still contains them, and then converts the column to the integer data type. **This conversion is needed so that describing the Storage column produces descriptive statistics such as count, mean, standard deviation (std), minimum (min), 25th percentile (Q1), median (50th percentile or Q2), 75th percentile (Q3) and maximum (max).**

In [33]:
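df['Storage '] = df['Storage '].str.replace(' ', '')   # remove spaces, e.g. '128 GB' -> '128GB'
df['Storage '] = df['Storage '].str.replace('GB', '')  # remove the unit, e.g. '128GB' -> '128'
df['Storage '] = df['Storage '].astype(int)            # convert the cleaned strings to integers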
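astype(int) raises an error as soon as one value is not a clean integer string. A more defensive variant, sketched below as an alternative technique (not what this notebook runs), removes an optional space plus GB with a regular expression and then uses pd.to_numeric with errors="coerce", so unexpected values become NaN and can be inspected. The value "1 TB" is a hypothetical example, not taken from the dataset.

```python
import pandas as pd

storage = pd.Series(["128 GB", "256GB", "64", "1 TB"])

# Remove an optional run of whitespace followed by "GB" at the end of each value
stripped = storage.str.replace(r"\s*GB$", "", regex=True)

# Convert to numbers; anything that still is not numeric becomes NaN
numeric = pd.to_numeric(stripped, errors="coerce")
print(numeric.tolist())  # [128.0, 256.0, 64.0, nan]

# Inspect the values that failed to convert before deciding how to handle them
print(storage[numeric.isna()].tolist())  # ['1 TB']
```

Because of the NaN, the result is float64; once the problem values are resolved, the column can be cast to int.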

In [35]:
df

| | Brand | Model | Storage | RAM | Screen Size (inches) | Camera (MP) | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 Pro | 128 | 6 GB | 6.1 | 12 + 12 + 12 | 3095 | 999 |
| 1 | Samsung | Galaxy S21 Ultra | 256 | 12 GB | 6.8 | 108 + 10 + 10 + 12 | 5000 | 1199 |
| 2 | OnePlus | 9 Pro | 128 | 8 GB | 6.7 | 48 + 50 + 8 + 2 | 4500 | 899 |
| 3 | Xiaomi | Redmi Note 10 Pro | 128 | 6 GB | 6.67 | 64 + 8 + 5 + 2 | 5020 | 279 |
| 4 | Google | Pixel 6 | 128 | 8 GB | 6.4 | 50 + 12.2 | 4614 | 799 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 402 | Samsung | Galaxy Note20 5G | 128 | 8 | 6.7 | 12+64+12 | 4300 | 1049 |
| 403 | Xiaomi | Mi 10 Lite 5G | 128 | 6 | 6.57 | 48+8+2+2 | 4160 | 349 |
| 404 | Apple | iPhone 12 Pro Max | 128 | 6 | 6.7 | 12+12+12 | 3687 | 1099 |
| 405 | Oppo | Reno3 | 128 | 8 | 6.4 | 48+13+8+2 | 4025 | 429 |


Now we check the data type of the Storage column with df.info().

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 407 entries, 0 to 406
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Brand                   407 non-null    object
 1   Model                   407 non-null    object
 2   Storage                 407 non-null    int64 
 3   RAM                     407 non-null    object
 4   Screen Size (inches)    407 non-null    object
 5   Camera (MP)             407 non-null    object
 6   Battery Capacity (mAh)  407 non-null    int64 
 7   Price ($)               407 non-null    object
dtypes: int64(2), object(6)
memory usage: 25.6+ KB


Now we look at the overview of the dataframe with df.describe(). Unlike the previous describe result, which only contained descriptive statistics for the Battery Capacity (mAh) column, the latest result also includes the Storage column.

In [40]:
df.describe()

| | Storage | Battery Capacity (mAh) |
|---|---|---|
| count | 407.0 | 407.0 |
| mean | 123.046683 | 4676.476658 |
| std | 64.96316 | 797.193713 |
| min | 32.0 | 1821.0 |
| 25% | 64.0 | 4300.0 |
| 50% | 128.0 | 5000.0 |
| 75% | 128.0 | 5000.0 |
| max | 512.0 | 7000.0 |
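The 25%, 50% and 75% rows of describe() are quantiles of the column, which pandas computes by default with linear interpolation between neighbouring values. A minimal sketch on made-up storage sizes (not the actual dataset) shows that describe() agrees with Series.quantile() and Series.median():

```python
import pandas as pd

# Illustrative storage sizes in GB (not taken from the dataset)
storage = pd.Series([32, 64, 64, 128, 128, 128, 256, 512])

summary = storage.describe()
print(summary)

# The percentile rows match quantile(), which interpolates linearly by default
print(summary["25%"] == storage.quantile(0.25))  # True
print(summary["50%"] == storage.median())        # True
print(summary["75%"] == storage.quantile(0.75))  # True (128 + 0.25 * (256 - 128) = 160.0)
print(summary["mean"])                           # 164.0
```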


### RAM Column Analysis

In [11]:
# First, we look at the unique values in the RAM column of the dataframe
df['RAM '].unique()

array(['6 GB', '12 GB', '8 GB', '4 GB', '3 GB', '2 GB', '4GB', '8GB',
       '6GB', '12GB', '3GB', '2GB', '5GB', '12', '3', '6', '8', '4', '16',
       '2'], dtype=object)

The df['RAM '].unique() command above returns an array of the unique values in the RAM column. As with Storage, every value in the RAM column is still of the object type, and the values are written inconsistently: some contain the unit GB and some do not. We therefore remove GB from every value in the RAM column that still contains it, and then convert the column to the integer data type.

In [14]:
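df['RAM '] = df['RAM '].str.replace(' ', '')   # remove spaces, e.g. '6 GB' -> '6GB'
df['RAM '] = df['RAM '].str.replace('GB', '')  # remove the unit, e.g. '6GB' -> '6'
df['RAM '] = df['RAM '].astype(int)            # convert the cleaned strings to integers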
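Storage and RAM are cleaned with the same three steps. One way to avoid repeating them is a small helper function; clean_gb_column below is a hypothetical name introduced only for this sketch (it is not part of pandas or of the original notebook), and it is applied to a made-up frame.

```python
import pandas as pd

def clean_gb_column(series: pd.Series) -> pd.Series:
    """Hypothetical helper: drop spaces and the 'GB' unit, then cast to int."""
    return (
        series.astype(str)
        .str.replace(" ", "", regex=False)
        .str.replace("GB", "", regex=False)
        .astype(int)
    )

# Illustrative frame using the same trailing-space labels as df
sample = pd.DataFrame({"Storage ": ["128 GB", "256GB", "64"], "RAM ": ["6 GB", "8GB", "4"]})

# Apply the same cleaning to every column that uses the GB unit
for col in ["Storage ", "RAM "]:
    sample[col] = clean_gb_column(sample[col])

print(sample.dtypes)
```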

Now we check the data type of the RAM column with df.info().

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 407 entries, 0 to 406
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Brand                   407 non-null    object
 1   Model                   407 non-null    object
 2   Storage                 407 non-null    object
 3   RAM                     407 non-null    int64 
 4   Screen Size (inches)    407 non-null    object
 5   Camera (MP)             407 non-null    object
 6   Battery Capacity (mAh)  407 non-null    int64 
 7   Price ($)               407 non-null    object
dtypes: int64(2), object(6)
memory usage: 25.6+ KB


Now we look at the general description of the dataframe with df.describe(). Compared with the earlier describe results, which covered the Battery Capacity (mAh) and Storage columns, the latest result includes the RAM column. Storage does not appear in this output because the df.info() result above lists Storage as object again in this run of the notebook.

In [20]:
df.describe()

| | RAM | Battery Capacity (mAh) |
|---|---|---|
| count | 407.0 | 407.0 |
| mean | 5.837838 | 4676.476658 |
| std | 2.43198 | 797.193713 |
| min | 2.0 | 1821.0 |
| 25% | 4.0 | 4300.0 |
| 50% | 6.0 | 5000.0 |
| 75% | 8.0 | 5000.0 |
| max | 16.0 | 7000.0 |


### Price Column Analysis

In [23]:
# First, we look at the unique values in the Price column of the dataframe
df['Price ($)'].unique()

array(['999', '1199', '899', '279', '799', '249', '699', '329', '449',
       '199', '299', '379', '179', '729', '599', '139', '189', '399',
       '259', '159', '229', '499', '129', '529', '369', '1099', '169',
       '99', '459', '239', '1299', '429', '659', '269', '359', '$799 ',
       '$399 ', '$699 ', '$329 ', '$999 ', '$549 ', '$1,299 ', '$899 ',
       '$449 ', '$319 ', '$269 ', '$349 ', '$279 ', '$249 ', '$299 ',
       '$969 ', '$1,199 ', '$149 ', '$139 ', '$99 ', '$199 ', '$169 ',
       '$499 ', '$179 ', '$219 ', '$229 ', '$239 ', '$109 ', '$189 ',
       '$389 ', '$309 ', '$369 ', '$129 ', '$849 ', '$469 ', '$209 ',
       '$119 ', '$339 ', '$429 ', '$159 ', '$379 ', '$289 ', '130', '749',
       '149', '969', '649', '349', '419', '1399', '1999', '119', '319',
       '1049'], dtype=object)

The df['Price ($)'].unique() command above returns an array of the unique values in the Price ($) column. Every value in the column is still of the object type, and the values are written inconsistently: some contain a dollar sign ($), a thousands separator (,) and a trailing space. We therefore remove these characters from every value in the Price ($) column that still contains them, and then convert the column to the integer data type.

In [26]:
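df['Price ($)'] = df['Price ($)'].str.replace('$', '', regex=False)  # remove the dollar sign (literal match, not a regex anchor)
df['Price ($)'] = df['Price ($)'].str.replace(',', '')               # remove the thousands separator, e.g. '1,299' -> '1299'
df['Price ($)'] = df['Price ($)'].str.replace(' ', '')               # remove the trailing space
df['Price ($)'] = df['Price ($)'].astype(int)                        # convert the cleaned strings to integers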
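The three replace calls can also be collapsed into one regular expression that removes every dollar sign, comma and whitespace character in a single pass. This is an alternative sketch on a few values copied from the unique() output above, not the notebook's own code. Inside a character class, `$` is a literal character, but `regex=True` must be passed because pandas 2.x treats the pattern as a plain string by default.

```python
import pandas as pd

prices = pd.Series(["999", "$799 ", "$1,299 ", "1049"])

# Character class: literal '$', literal ',' and any whitespace character
cleaned = prices.str.replace(r"[$,\s]", "", regex=True).astype(int)
print(cleaned.tolist())  # [999, 799, 1299, 1049]
```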

Now we check the data type of the Price ($) column with df.info().

In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 407 entries, 0 to 406
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Brand                   407 non-null    object
 1   Model                   407 non-null    object
 2   Storage                 407 non-null    object
 3   RAM                     407 non-null    int64 
 4   Screen Size (inches)    407 non-null    object
 5   Camera (MP)             407 non-null    object
 6   Battery Capacity (mAh)  407 non-null    int64 
 7   Price ($)               407 non-null    int64 
dtypes: int64(3), object(5)
memory usage: 25.6+ KB


Now we look at the general description of the dataframe with df.describe(). Compared with the previous describe result, which covered the RAM and Battery Capacity (mAh) columns, the latest result also includes the Price ($) column.

In [32]:
df.describe()

| | RAM | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|
| count | 407.0 | 407.0 | 407.0 |
| mean | 5.837838 | 4676.476658 | 408.314496 |
| std | 2.43198 | 797.193713 | 299.684768 |
| min | 2.0 | 1821.0 | 99.0 |
| 25% | 4.0 | 4300.0 | 199.0 |
| 50% | 6.0 | 5000.0 | 299.0 |
| 75% | 8.0 | 5000.0 | 499.0 |
| max | 16.0 | 7000.0 | 1999.0 |


## Sorting
Sorting arranges the rows of a dataframe by the values of one or more columns, in either ascending or descending order.

In [37]:
df

| | Brand | Model | Storage | RAM | Screen Size (inches) | Camera (MP) | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|---|---|---|---|---|
| 0 | Apple | iPhone 13 Pro | 128 GB | 6 | 6.1 | 12 + 12 + 12 | 3095 | 999 |
| 1 | Samsung | Galaxy S21 Ultra | 256 GB | 12 | 6.8 | 108 + 10 + 10 + 12 | 5000 | 1199 |
| 2 | OnePlus | 9 Pro | 128 GB | 8 | 6.7 | 48 + 50 + 8 + 2 | 4500 | 899 |
| 3 | Xiaomi | Redmi Note 10 Pro | 128 GB | 6 | 6.67 | 64 + 8 + 5 + 2 | 5020 | 279 |
| 4 | Google | Pixel 6 | 128 GB | 8 | 6.4 | 50 + 12.2 | 4614 | 799 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 402 | Samsung | Galaxy Note20 5G | 128 | 8 | 6.7 | 12+64+12 | 4300 | 1049 |
| 403 | Xiaomi | Mi 10 Lite 5G | 128 | 6 | 6.57 | 48+8+2+2 | 4160 | 349 |
| 404 | Apple | iPhone 12 Pro Max | 128 | 6 | 6.7 | 12+12+12 | 3687 | 1099 |
| 405 | Oppo | Reno3 | 128 | 8 | 6.4 | 48+13+8+2 | 4025 | 429 |


In [39]:
# Sort the rows by the Battery Capacity (mAh) column in descending order
df.sort_values(by='Battery Capacity (mAh)', ascending=False)

| | Brand | Model | Storage | RAM | Screen Size (inches) | Camera (MP) | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|---|---|---|---|---|
| 208 | Samsung | Galaxy M62 | 128GB | 8 | 6.7 | 64MP + 12MP + 5MP + 5MP | 7000 | 429 |
| 334 | Samsung | Galaxy M51 | 128 | 6 | 6.7 | 64+12+5+5 | 7000 | 449 |
| 205 | Motorola | Moto G60 | 128GB | 6 | 6.8 | 108MP + 8MP + 2MP | 6000 | 299 |
| 35 | Realme | C25s | 128 GB | 4 | 6.5 | 13 + 2 + 2 | 6000 | 159 |
| 372 | Motorola | Moto G9 Power | 128 | 4 | 6.8 | 64+2+2 | 6000 | 229 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32 | Apple | iPhone SE (2nd Gen) | 64 GB | 3 | 4.7 | 12 | 1821 | 399 |
| 62 | Apple | iPhone SE (2020) | 64 GB | 3 | 4.7 | 12 | 1821 | 399 |
| 289 | Apple | iPhone SE (2020) | 64 | 3 | 4.7 | 12 | 1821 | 399 |
| 333 | Apple | iPhone SE (2020) | 64 | 3 | 4.7 | 12 | 1821 | 399 |
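sort_values returns a new, sorted DataFrame and leaves df itself unchanged unless the result is assigned back. Rows with equal values in the sort column (such as the two 7000 mAh phones above) are not guaranteed to keep their original relative order under the default quicksort; passing kind="stable" does guarantee it for a single sort column. A sketch on a made-up frame, with nlargest shown as a shortcut when only the top rows are needed:

```python
import pandas as pd

# Illustrative phones (not taken from the dataset)
phones = pd.DataFrame({
    "Model": ["A", "B", "C", "D"],
    "Battery Capacity (mAh)": [5000, 7000, 4300, 7000],
    "Price ($)": [299, 449, 199, 429],
})

# Descending by battery; kind="stable" keeps the tied rows B and D in their original order
by_battery = phones.sort_values(by="Battery Capacity (mAh)", ascending=False, kind="stable")
print(by_battery["Model"].tolist())  # ['B', 'D', 'A', 'C']

# sort_values returned a new frame; the original row order is unchanged
print(phones["Model"].tolist())  # ['A', 'B', 'C', 'D']

# nlargest returns only the n rows with the largest values in the given column
top2 = phones.nlargest(2, "Battery Capacity (mAh)")
print(top2["Model"].tolist())
```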


In [41]:
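# Sort by Battery Capacity (mAh) in descending order; rows with equal capacity are then sorted by Price ($) in ascending order
df.sort_values(by=['Battery Capacity (mAh)', 'Price ($)'], ascending=[False, True])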

| | Brand | Model | Storage | RAM | Screen Size (inches) | Camera (MP) | Battery Capacity (mAh) | Price ($) |
|---|---|---|---|---|---|---|---|---|
| 208 | Samsung | Galaxy M62 | 128GB | 8 | 6.7 | 64MP + 12MP + 5MP + 5MP | 7000 | 429 |
| 334 | Samsung | Galaxy M51 | 128 | 6 | 6.7 | 64+12+5+5 | 7000 | 449 |
| 119 | Realme | Narzo 50A | 64GB | 4 | 6.5 | 50MP + 2MP | 6000 | 149 |
| 128 | Nokia | C30 | 64GB | 3 | 6.82 | 13MP + 2MP | 6000 | 149 |
| 175 | Realme | C25s | 128GB | 4 | 6.5 | 13MP + 2MP + 2MP | 6000 | 149 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32 | Apple | iPhone SE (2nd Gen) | 64 GB | 3 | 4.7 | 12 | 1821 | 399 |
| 62 | Apple | iPhone SE (2020) | 64 GB | 3 | 4.7 | 12 | 1821 | 399 |
| 289 | Apple | iPhone SE (2020) | 64 | 3 | 4.7 | 12 | 1821 | 399 |
| 333 | Apple | iPhone SE (2020) | 64 | 3 | 4.7 | 12 | 1821 | 399 |
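When a list is passed to `by`, pandas sorts by the first column and uses the next column only to break ties, and `ascending` takes one flag per column. That is why the 6000 mAh phones in the output above are listed starting from the cheapest. A sketch on made-up rows:

```python
import pandas as pd

# Illustrative phones (not taken from the dataset)
phones = pd.DataFrame({
    "Model": ["A", "B", "C", "D", "E"],
    "Battery Capacity (mAh)": [6000, 7000, 6000, 7000, 6000],
    "Price ($)": [229, 449, 149, 429, 299],
})

# Battery descending first; among equal batteries, price ascending
ranked = phones.sort_values(
    by=["Battery Capacity (mAh)", "Price ($)"], ascending=[False, True]
)
print(ranked["Model"].tolist())  # ['D', 'B', 'C', 'A', 'E']
```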
