# Which phone features most significantly influence its price?

## Introduction (300)

People use phones daily, and scholars argue that these devices have evolved from luxury items into necessities in recent years (Tanveer et al.). We use them for "calling and sending messages, capturing pictures, accessing the internet, playing games, socializing, and downloading applications." They have turned from mere communication tools into daily "multimedia machines" (Tanveer et al.).

However, buying a new phone can be challenging and frustrating because of the flood of features on offer (Kobie). To simplify the choice, consumers often look only at a device's advertised characteristics without checking whether its price matches them (K. Srujan Raju et al. 773). As a result, they risk making an uninformed decision and overpaying.

A predictive model that takes a phone's feature set and evaluates whether its proposed price is in line with comparable devices would help consumers judge whether the phone is worth paying for.

We intend to create a **model that predicts a phone's market price from the device's set of characteristics**. To build such a model, we first identify the **phone features that most effectively predict its price**. We deliberately select only the best predictors for the following reasons:

- We want our model to contain only relevant variables so that strongly related predictors (multicollinearity) do not distort the estimates
- We want to avoid overfitting so that the model performs well on both the training and the testing sets
- We want a small model that is easy for the general public to use
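
One way to look for related predictors is to screen pairwise correlations. The following is a minimal sketch in base R, not the method used in this report: the `toy` data frame holds made-up values (not the real data set), and the `threshold` of 0.8 is an assumed cut-off chosen only for illustration.

```r
# Toy data: hypothetical values, NOT the real data set
toy <- data.frame(
  screen_size      = c(5.0, 5.5, 6.0, 6.5, 6.7, 4.7),
  battery_capacity = c(2500, 3000, 3500, 4000, 4100, 2000),
  num_of_sims      = c(2, 1, 2, 1, 2, 1)
)

# Pairwise Pearson correlations between the candidate predictors
cor_matrix <- cor(toy)

# Assumed cut-off for a "strong" correlation (illustration only)
threshold <- 0.8

# Keep each pair once (upper triangle) and flag the strongly correlated ones
upper <- which(upper.tri(cor_matrix) & abs(cor_matrix) > threshold, arr.ind = TRUE)
flagged <- data.frame(
  var_1 = rownames(cor_matrix)[upper[, "row"]],
  var_2 = colnames(cor_matrix)[upper[, "col"]],
  r     = cor_matrix[upper]
)
print(flagged)
```

In this toy example only the `screen_size` and `battery_capacity` pair is flagged, which would suggest keeping just one of the two in a small model.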

Then, we fit several linear regression models **(!!! which ones)** using these variables and choose the one that performs best on the given data.
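
As a rough illustration of comparing candidate models, the sketch below fits two linear regressions on a training split and compares their error on a held-out test split. Everything in it is an assumption made for the example: the data are simulated (not the real data set), the two model formulas are hypothetical, the 70/30 split is arbitrary, and test RMSE is only one possible selection criterion.

```r
set.seed(1)

# Simulated data: hypothetical, NOT the real data set
n <- 200
toy <- data.frame(
  ram              = runif(n, 1000, 12000),
  internal_storage = runif(n, 16, 512),
  num_of_sims      = sample(1:2, n, replace = TRUE)
)
toy$price <- 20 + 0.03 * toy$ram + 0.5 * toy$internal_storage + rnorm(n, sd = 40)

# Assumed 70/30 train/test split
train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Two hypothetical candidate models fitted on the training set only
candidates <- list(
  small = lm(price ~ ram, data = train),
  full  = lm(price ~ ram + internal_storage + num_of_sims, data = train)
)

# Root mean squared prediction error of a model on a given data set
rmse <- function(model, data) {
  sqrt(mean((data$price - predict(model, newdata = data))^2))
}

# Evaluate every candidate on the held-out test set and pick the lowest error
test_rmse <- sapply(candidates, rmse, data = test)
print(test_rmse)
best <- names(which.min(test_rmse))
print(best)
```

Evaluating on data the models have not seen is what guards against picking a model that merely overfits the training set.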

Ultimately, we want our model to help consumers quickly yet critically evaluate a phone's price tag and make a rational purchasing decision without being misled by advertising.

## Data set and Variables (200)

We use a data set containing specifications and prices for **1321** unique phone models. Its authors collected the observations by scraping [gadgets360](https://www.gadgets360.com/mobiles/best-phones), a tech news website that lists and describes smartphones' features and prices, so we consider the data reasonably reliable. They scraped and wrangled the data in **August 2020**, so the phone models it describes are fairly recent.
Because the data comes from a dedicated source and is relatively up-to-date, we can use it to build a meaningful regression model.

Here's the list of variables we consider in our analysis. We renamed them from the original data set and derived one (`resolution`) from two others to ease the investigation.

Only some of these, however, will end up in the final predictive model described above.

| Variable Name      | Description                                         |
| ------------------ | --------------------------------------------------- |
| `battery_capacity` | Battery capacity in mAh                             |
| `screen_size`      | Screen size in inches, measured diagonally          |
| `touchscreen`      | Whether the phone has a touchscreen                 |
| `resolution`       | Total number of screen pixels (width × height)      |
| `processor`        | Number of processor cores                           |
| `ram`              | RAM available in phone in MB                        |
| `internal_storage` | Internal Storage of phone in GB                     |
| `rear_camera`      | Resolution of rear camera in MP (0 if unavailable)  |
| `front_camera`     | Resolution of front camera in MP (0 if unavailable) |
| `operating_system` | OS used in phone                                    |
| `gps`              | Whether phone has GPS functionality                 |
| `num_of_sims`      | Number of SIM card slots in phone                   |
| `x3g`              | Whether phone has 3G network functionality          |
| `x4g_lte`          | Whether phone has 4G/LTE network functionality      |


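The code below performs some minor wrangling, in which we:

- drop the columns we do not analyze (`X`, `Name`, `Brand`, `Model`, `Wi.Fi`, `Bluetooth`) and rows with missing values
- rename the variables
- derive `resolution` as the product of the horizontal and vertical resolutions
- convert the `price` from **Indian Rupee** to **US Dollar**
- convert the categorical variables to factors
- preview the data set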

We convert the price to USD to make the data preview and analysis more comprehensible.

In [23]:
# Installing missing packages
# https://stackoverflow.com/a/4090208/18184038
package_list <- "psych"
to_install <- package_list[!(package_list %in% installed.packages()[, "Package"])]
if (length(to_install)) install.packages(to_install)

In [35]:
library(tidyverse)
library(psych)

set.seed(1)

# Reading the data set from the web
url <- "https://raw.githubusercontent.com/Ihor16/stat-301-project/ih/rewrite_for_new_dataset/data/specs.csv"
data_raw <- read.csv(url) %>%
  as_tibble()

# Previewing the raw data set
data_raw %>%
  head(3)

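| X | Name | Brand | Model | Battery.capacity..mAh. | Screen.size..inches. | Touchscreen | Resolution.x | Resolution.y | Processor | ⋯ | Rear.camera | Front.camera | Operating.system | Wi.Fi | Bluetooth | GPS | Number.of.SIMs | X3G | X4G..LTE | Price |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<int>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<dbl>` | `<chr>` | `<int>` | `<int>` | `<int>` | ⋯ | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<int>` |
| 0 | OnePlus 7T Pro McLaren Edition | OnePlus | 7T Pro McLaren Edition | 4085 | 6.67 | Yes | 1440 | 3120 | 8 | ⋯ | 48 | 16 | Android | Yes | Yes | Yes | 2 | Yes | Yes | 58998 |
| 1 | Realme X2 Pro | Realme | X2 Pro | 4000 | 6.5 | Yes | 1080 | 2400 | 8 | ⋯ | 64 | 16 | Android | Yes | Yes | Yes | 2 | Yes | Yes | 27999 |
| 2 | iPhone 11 Pro Max | Apple | iPhone 11 Pro Max | 3969 | 6.5 | Yes | 1242 | 2688 | 6 | ⋯ | 12 | 12 | iOS | Yes | Yes | Yes | 2 | Yes | Yes | 106900 |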


In [36]:
# Conversion rate from INR to USD
# https://www.forbes.com/advisor/money-transfer/currency-converter/inr-usd/
rate <- 0.012282

# Renaming the variables and creating a derived variable for `resolution`
phone_data <- data_raw %>%
  select(-c(X, Name, Brand, Model, "Wi.Fi", Bluetooth)) %>%
  rename(
    battery_capacity = "Battery.capacity..mAh.",
    screen_size = "Screen.size..inches.",
    touchscreen = "Touchscreen",
    resolution_x = "Resolution.x",
    resolution_y = "Resolution.y",
    processor = "Processor",
    ram = "RAM..MB.",
    internal_storage = "Internal.storage..GB.",
    rear_camera = "Rear.camera",
    front_camera = "Front.camera",
    operating_system = "Operating.system",
    gps = "GPS",
    num_of_sims = "Number.of.SIMs",
    x3g = "X3G",
    x4g_lte = "X4G..LTE",
    price = "Price"
  ) %>%
  mutate(
    price = price * rate,
    resolution = resolution_x * resolution_y
  ) %>%
  relocate(resolution, .before = resolution_x) %>%
  select(-c(resolution_x, resolution_y)) %>%
  drop_na() %>%
  select(c(price, everything()))

# Changing categorical variables to factors
phone_data <- phone_data %>%
  mutate(
    touchscreen = as_factor(touchscreen),
    operating_system = as_factor(operating_system),
    gps = as_factor(gps),
    x3g = as_factor(x3g),
    x4g_lte = as_factor(x4g_lte)
  )

# Previewing the wrangled data set
phone_data %>%
  head()

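| price | battery_capacity | screen_size | touchscreen | resolution | processor | ram | internal_storage | rear_camera | front_camera | operating_system | gps | num_of_sims | x3g | x4g_lte |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<dbl>` | `<int>` | `<dbl>` | `<fct>` | `<int>` | `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<fct>` | `<int>` | `<fct>` | `<fct>` |
| 724.6134 | 4085 | 6.67 | Yes | 4492800 | 8 | 12000 | 256 | 48 | 16 | Android | Yes | 2 | Yes | Yes |
| 343.8837 | 4000 | 6.5 | Yes | 2592000 | 8 | 6000 | 64 | 64 | 16 | Android | Yes | 2 | Yes | Yes |
| 1312.9458 | 3969 | 6.5 | Yes | 3338496 | 6 | 4000 | 64 | 12 | 12 | iOS | Yes | 2 | Yes | Yes |
| 772.5378 | 3110 | 6.1 | Yes | 1483776 | 6 | 4000 | 64 | 12 | 12 | iOS | Yes | 2 | Yes | Yes |
| 613.9772 | 4000 | 6.4 | Yes | 2527200 | 8 | 6000 | 128 | 12 | 32 | Android | Yes | 1 | No | No |
| 429.0103 | 3800 | 6.55 | Yes | 2592000 | 8 | 8000 | 128 | 48 | 16 | Android | No | 2 | Yes | Yes |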
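
As a quick sanity check of the two computed columns, the sketch below redoes the arithmetic by hand for the first phone in the raw preview (OnePlus 7T Pro McLaren Edition). The conversion rate and the raw values are copied from the cells above; the standalone variable names are introduced here only for illustration.

```r
# Conversion rate from INR to USD, as defined in the wrangling cell
rate <- 0.012282

# Raw values of the first row of the raw data preview
price_inr    <- 58998
resolution_x <- 1440
resolution_y <- 3120

# Same arithmetic as in the mutate() step
price_usd  <- price_inr * rate
resolution <- resolution_x * resolution_y

print(round(price_usd, 4))  # 724.6134, the price in the first wrangled row
print(resolution)           # 4492800, the resolution in the first wrangled row
```

Both values agree with the first row of the wrangled preview, so the conversion and the derived `resolution` behave as described.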


In [37]:
# Disabling scientific notation
# https://stackoverflow.com/a/27318351/18184038
options(scipen = 999)

# Calculating summary statistics for the data set
phone_data %>%
  describe() %>%
  select(c(min, mean, median, max, sd))

| variable | min | mean | median | max | sd |
| --- | --- | --- | --- | --- | --- |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| price | 6.067308 | 140.82327 | 85.96172 | 2149.227 | 170.1977791 |
| battery_capacity | 1010.0 | 2938.48933 | 3000.0 | 6000.0 | 873.5141331 |
| screen_size | 2.4 | 5.29131 | 5.2 | 7.3 | 0.6713566 |
| touchscreen* | 1.0 | 1.012509 | 1.0 | 2.0 | 0.1111837 |
| resolution | 76800.0 | 1348761.262693 | 921600.0 | 8294400.0 | 954735.3441324 |
| processor | 1.0 | 5.551141 | 4.0 | 10.0 | 2.1965624 |
| ram | 64.0 | 2488.777778 | 2000.0 | 12000.0 | 1664.4403861 |
| internal_storage | 0.064 | 30.654864 | 16.0 | 512.0 | 36.9502412 |
| rear_camera | 0.0 | 12.070199 | 12.2 | 108.0 | 8.9483374 |
| front_camera | 0.0 | 7.037969 | 5.0 | 48.0 | 6.2954481 |
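
Note that `describe()` marks factor variables with `*` and summarizes their integer level codes, so a value such as the mean of `touchscreen*` is an average of codes rather than a directly meaningful quantity. A frequency table is usually easier to read for such variables. The sketch below shows the idea in base R on a made-up factor (hypothetical values, not the real data set).

```r
# Toy factor: hypothetical values, NOT the real data set
touchscreen <- factor(c("Yes", "Yes", "No", "Yes"), levels = c("Yes", "No"))

# Averaging the integer level codes (Yes = 1, No = 2) is hard to interpret
code_mean <- mean(as.integer(touchscreen))
print(code_mean)

# Counts and proportions per level describe the factor directly
counts <- table(touchscreen)
props  <- prop.table(counts)
print(counts)
print(props)
```

The same approach could be applied to the other factors in the data set (`operating_system`, `gps`, `x3g`, `x4g_lte`).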


## Analysis (1000)

## Evaluation (300)

## Conclusion (200)

## References

K. Srujan Raju, et al. *Data Engineering and Communication Technology*. Springer, 9 Jan. 2020, p. 773.

Kobie, Nicole. “Why Does Buying a New Phone Have to Be so.” *ProQuest*, Apr. 2017, www.proquest.com/docview/1985885659?accountid=14656&forcedol=true&pq-origsite=summon. Accessed 3 Dec. 2022.

Tanveer, Muhammad, et al. “Mobile Phone Buying Decisions among Young Adults: An Empirical Study of Influencing Factors.” *Sustainability*, vol. 13, no. 19, 27 Sept. 2021, p. 10705, 10.3390/su131910705. Accessed 8 Oct. 2021.