## Problem Statement:

A mobile phone manufacturing company wants to improve their pricing strategy by utilizing the power of AI and machine learning algorithms. The company has a large database of mobile phone models with various features such as screen size, camera quality, processor speed, battery life, and others. However, they currently lack an accurate and efficient method for predicting the prices of their mobile phones based on these features. The company is seeking a solution that can help them forecast prices for new phone models with a high degree of accuracy, allowing them to optimize their pricing strategy and stay competitive in the market.

## Solution:

The AI team proposes a hybrid solution. The plan is to first build a machine learning model trained on historical handset details (screen size, camera quality, processor speed, battery life, and others) along with the price. The model will be used to predict the potential price of a handset. The team also proposes a counterfactual generation engine that provides scenarios in which the predicted outcome moves into an expected price range. For example, if the price of a handset is predicted to be X and the manufacturer wants to sell it at a price 10%-15% higher, the engine will suggest changes to the phone's configuration that bring the prediction into that range. These changes to the input parameters can feed into product strategy. A third component, what-if analysis, lets decision makers tweak the counterfactual suggestions to respect certain constraints and check whether the tweaked configuration still reaches the expected price range.

The AI team suggests the following benefits of using the proposed hybrid solution for mobile pricing prediction:

- Accurate price predictions: AI algorithms can analyze vast amounts of data and identify patterns that humans may miss. This enables the tool to predict handset prices from their features more accurately.

- Real-time pricing: With an AI-driven tool, pricing decisions can be made in real time based on changes in market demand and supply. This allows the manufacturer to respond quickly to changing market conditions and adjust its pricing strategy accordingly.

- Improved profit margins: By accurately predicting prices, the manufacturer can optimize its pricing strategy to maximize profit margins. The tool can identify the optimal price point for a product, taking into account factors such as production costs, competition, and market demand.

- Increased customer satisfaction: By setting the right price for a product, the manufacturer can increase customer satisfaction. Customers are more likely to purchase a product if they feel that it is priced fairly and that the price reflects its value.

- Better decision-making: An AI-driven tool can provide the manufacturer with valuable insights into customer preferences and buying behavior. This can help it make informed decisions about product development, marketing, and pricing.

### Dataset Details:

- Product_id: ID of each cellphone

- Price: Price of each cellphone

- Sale: Sales number

- weight: Weight of each cellphone

- resolution: Listed as the resolution of each cellphone; the sample values (e.g. 4.0 to 6.0) suggest it is the screen size in inches

- ppi: Phone Pixel Density

- cpu core: CPU core count of each cellphone (values such as 2, 4, 8)

- cpu freq: CPU Frequency in each cellphone

- internal mem: Internal memory of each cellphone

- ram: RAM of each cellphone

- RearCam: Resolution of rear camera

- Front_Cam: Resolution of front camera

- battery: Battery capacity (in mAh)

- thickness: Thickness of the phone


Source: https://www.kaggle.com/datasets/mohannapd/mobile-price-prediction 

### Stage 1: AI Model Creation:

Note: The goal of this activity is to build a machine learning model that predicts mobile prices reasonably well. Although more advanced data processing techniques and ML algorithms are available, we use ones that give satisfactory results so that most of the effort goes towards the main goal, i.e. building a Decision Intelligence system.

#### Step 1: Import the required libraries

In [1]:
import pandas as pd
import dice_ml
from dice_ml import Dice
from sklearn.metrics import r2_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

#### Step 2: Load the Dataset

In [2]:
all_data = pd.read_csv('cellphone_price_data.csv')

In [3]:
all_data.head()

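| | Product_id | Price | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 203 | 2357 | 10 | 135.0 | 5.2 | 424 | 8 | 1.35 | 16.0 | 3.0 | 13.0 | 8.0 | 2610 | 7.4 |
| 1 | 880 | 1749 | 10 | 125.0 | 4.0 | 233 | 2 | 1.3 | 4.0 | 1.0 | 3.15 | 0.0 | 1700 | 9.9 |
| 2 | 40 | 1916 | 10 | 110.0 | 4.7 | 312 | 4 | 1.2 | 8.0 | 1.5 | 13.0 | 5.0 | 2000 | 7.6 |
| 3 | 99 | 1315 | 11 | 118.5 | 4.0 | 233 | 2 | 1.3 | 4.0 | 0.512 | 3.15 | 0.0 | 1400 | 11.0 |
| 4 | 880 | 1749 | 11 | 125.0 | 4.0 | 233 | 2 | 1.3 | 4.0 | 1.0 | 3.15 | 0.0 | 1700 | 9.9 |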


#### Step 3: Check the data characteristics and look for discrepancies (e.g. missing values, wrong data types)

In [4]:
all_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161 entries, 0 to 160
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Product_id    161 non-null    int64  
 1   Price         161 non-null    int64  
 2   Sale          161 non-null    int64  
 3   weight        161 non-null    float64
 4   resolution    161 non-null    float64
 5   ppi           161 non-null    int64  
 6   cpu core      161 non-null    int64  
 7   cpu freq      161 non-null    float64
 8   internal mem  161 non-null    float64
 9   ram           161 non-null    float64
 10  RearCam       161 non-null    float64
 11  Front_Cam     161 non-null    float64
 12  battery       161 non-null    int64  
 13  thickness     161 non-null    float64
dtypes: float64(8), int64(6)
memory usage: 17.7 KB


As the output above shows, the dataset has no missing values and every column is numeric. No data preprocessing is required here, so we move directly on to modeling.

#### Step 4: Modeling

First, we split the dataset into a training set and a held-out set used for inference.

In [5]:
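# Features: every column except the target (Price) and the identifier (Product_id)
x = all_data.drop(['Price', 'Product_id'], axis=1)
y = all_data['Price']
# Hold out 20% of the rows for evaluation; random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)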

Next, we build a Random Forest Regressor, use it to predict prices for the test data, and then save and evaluate those predictions.
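The evaluation in this step reports the R2 score (coefficient of determination). As a reminder of what that number measures, here is a minimal sketch that computes it by hand as 1 - SS_res / SS_tot. The function name `r2` and the toy prices are illustrative assumptions, not values from the dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean price
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy prices (made up for illustration)
actual = [1000.0, 2000.0, 3000.0]
predicted = [1100.0, 1900.0, 3000.0]
print(round(r2(actual, predicted), 4))
```

A score of 1.0 means the predictions match the actual prices exactly, while 0.0 means the model does no better than predicting the mean price for every handset.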

In [6]:
rf = RandomForestRegressor()  # no random_state is set, so scores vary slightly between runs
rf.fit(x_train,y_train)

RandomForestRegressor()

In [8]:
predictions = rf.predict(x_test)

In [9]:
print('R2 score for Training Data: ', rf.score(x_train, y_train))
print('R2 score for Inference Data: ', r2_score(y_test,predictions))

R2 score for Training Data:  0.9945696702046988
R2 score for Inference Data:  0.9678002631345382


In [10]:
x_test['predicted_price'] = predictions

In [11]:
x_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 128 entries, 80 to 47
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Sale          128 non-null    int64  
 1   weight        128 non-null    float64
 2   resolution    128 non-null    float64
 3   ppi           128 non-null    int64  
 4   cpu core      128 non-null    int64  
 5   cpu freq      128 non-null    float64
 6   internal mem  128 non-null    float64
 7   ram           128 non-null    float64
 8   RearCam       128 non-null    float64
 9   Front_Cam     128 non-null    float64
 10  battery       128 non-null    int64  
 11  thickness     128 non-null    float64
dtypes: float64(8), int64(4)
memory usage: 13.0 KB


In [12]:
x_test.head()

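| | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness | predicted_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111 | 302 | 149.0 | 5.5 | 534 | 8 | 1.6 | 32.0 | 3.0 | 16.0 | 8.0 | 3000 | 7.0 | 2941.57 |
| 113 | 308 | 77.9 | 2.4 | 167 | 0 | 0.0 | 0.004 | 0.004 | 0.0 | 0.0 | 850 | 12.4 | 800.82 |
| 144 | 1781 | 179.0 | 6.0 | 184 | 4 | 1.3 | 8.0 | 1.0 | 13.0 | 8.0 | 2580 | 8.0 | 1928.57 |
| 7 | 13 | 150.0 | 5.5 | 401 | 4 | 2.3 | 16.0 | 2.0 | 16.0 | 8.0 | 2500 | 9.5 | 2295.57 |
| 44 | 40 | 169.0 | 5.7 | 515 | 4 | 1.875 | 64.0 | 4.0 | 12.0 | 5.0 | 3500 | 7.9 | 2956.94 |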


In [13]:
x_train['Price'] = y_train  # DiCE expects the outcome column in the same dataframe as the features

### Stage 2: Generating Counterfactuals

We will use a combination of counterfactual and what-if analysis to drive decision intelligence. Because this is a regression use case, the desired outcome is an expected price range for the mobile, rather than a target class as in a classification use case.
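To make the idea concrete before setting up the library, the sketch below shows what a random counterfactual search does conceptually: it perturbs a few features at random and keeps the candidates whose predicted price lands in the desired range. This is not DiCE's implementation. The `toy_price` function, the `BOUNDS` table, and all numbers are hypothetical stand-ins chosen for illustration.

```python
import random

def toy_price(phone):
    """Hypothetical stand-in for a trained regressor (not the notebook's model)."""
    return 1000 + 20 * phone["internal mem"] + 300 * phone["ram"] + phone["battery"] / 5

# Assumed search bounds per feature (illustrative only)
BOUNDS = {"internal mem": (4, 128), "ram": (0.5, 6.0), "battery": (1000, 5000)}

def random_counterfactuals(phone, predict, desired_range, total=2, max_tries=5000, seed=0):
    """Randomly perturb features until `total` candidates hit the desired range."""
    rng = random.Random(seed)
    low, high = desired_range
    found = []
    for _ in range(max_tries):
        candidate = dict(phone)
        # Change one or two randomly chosen features to random values within bounds
        for name in rng.sample(sorted(BOUNDS), k=rng.randint(1, 2)):
            lower, upper = BOUNDS[name]
            candidate[name] = round(rng.uniform(lower, upper), 1)
        # Keep the candidate only if its predicted price is inside the target range
        if low <= predict(candidate) <= high:
            found.append(candidate)
            if len(found) == total:
                break
    return found

phone = {"internal mem": 32, "ram": 3.0, "battery": 3000}
base = toy_price(phone)
target = (base * 1.10, base * 1.15)  # 10%-15% above the current prediction
for cf in random_counterfactuals(phone, toy_price, target):
    changed = {k: v for k, v in cf.items() if v != phone[k]}
    print(changed, round(toy_price(cf), 1))
```

Each printed line shows only the features that changed, which mirrors how counterfactual results are usually read: as the smallest set of configuration changes that moves the prediction into the target range.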

In [16]:
d_mobile = dice_ml.Data(dataframe=x_train, continuous_features=['Sale', 'weight', 'resolution', 'ppi', 'cpu core', 'cpu freq', 'internal mem', 'ram', 'RearCam', 'Front_Cam', 'battery', 'thickness'], outcome_name='Price')
m_mobile = dice_ml.Model(model=rf, backend="sklearn", model_type='regressor')

In [17]:
explainer_mobile = Dice(d_mobile, m_mobile, method="random")

### Stage 3: What-If Analysis

#### Step 5: Using the explainer object to generate possible pricing strategies

We now have an explainer object that can produce counterfactual explanations, i.e. the changes to the inputs that would yield a desired output. In our case, the question is:

"For the given predicted price, what can be changed (mobile configurations like screen size, resolution etc.) so that the mobile can be sold at a higher price?" Let us understand this through an example.

In [18]:
input_data = x_test.drop(['predicted_price'], axis=1)

In [20]:
input_record = input_data[0:1]

In [21]:
input_record.head()

| | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111 | 302 | 149.0 | 5.5 | 534 | 8 | 1.6 | 32.0 | 3.0 | 16.0 | 8.0 | 3000 | 7.0 |


In [23]:
x_test[0:1].head()

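| | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness | predicted_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111 | 302 | 149.0 | 5.5 | 534 | 8 | 1.6 | 32.0 | 3.0 | 16.0 | 8.0 | 3000 | 7.0 | 2941.57 |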


For the given record, the predicted price is ~2.9K. Suppose the manufacturer wants to sell the mobile at a price 10%-15% higher than the predicted one. The explainer can then be used to recommend changes to the mobile configuration that achieve this objective.
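The target range is simple arithmetic on the predicted price. The sketch below wraps it in a small helper so the uplift percentages are explicit parameters. The helper name `uplift_range` and its parameter names are assumptions made for illustration.

```python
def uplift_range(predicted_price, min_uplift=0.10, max_uplift=0.15):
    """Return [low, high] target prices, rounded to whole currency units."""
    return [round(predicted_price * (1 + min_uplift), 0),
            round(predicted_price * (1 + max_uplift), 0)]

# Predicted price of the record above
print(uplift_range(2941.57))  # [3236.0, 3383.0]
```

Keeping the uplift as parameters makes it easy to try other targets, for example `uplift_range(2941.57, 0.05, 0.10)` for a more modest increase.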

In [25]:
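predicted_price = x_test['predicted_price'].iloc[0]  # scalar price of the first test record

In [42]:
# Target range: 10% to 15% above the predicted price, rounded to whole units
expected_price_range = [round(float(predicted_price) * 1.10, 0), round(float(predicted_price) * 1.15, 0)]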

In [43]:
expected_price_range

[3236.0, 3383.0]

In [44]:
random_mobile = explainer_mobile.generate_counterfactuals(
    input_record,
    total_CFs=2,                         # number of counterfactuals to return
    desired_range=expected_price_range)  # target price interval for the regressor
random_mobile.visualize_as_dataframe(show_only_changes=True)

100%|██████████| 1/1 [00:00<00:00,  1.40it/s]

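Query instance (original outcome : 2942)

| | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 302 | 149.0 | 5.5 | 534 | 8 | 1.6 | 32.0 | 3.0 | 16.0 | 8.0 | 3000 | 7.0 | 2942.0 |

Diverse Counterfactual set (new outcome: [3236.0, 3383.0])

| | Sale | weight | resolution | ppi | cpu core | cpu freq | internal mem | ram | RearCam | Front_Cam | battery | thickness | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | - | - | - | - | - | 1.6 | 84.8 | 5.7 | - | - | - | - | 3278.169921875 |
| 1 | - | - | - | - | - | 1.6 | - | 5.2 | - | - | 5116.0 | - | 3305.25 |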


From the above output, we see that the explainer has suggested 2 possible ways to sell the mobile within the expected price range (cpu freq is displayed but keeps its original value of 1.6):

1. Increase the internal memory from 32 to 84.8 GB and the RAM from 3 to 5.7 GB
2. Increase the RAM from 3 to 5.2 GB and the battery capacity from 3000 to 5116 mAh

However, the business may have constraints to take into consideration, so the suggestions from the counterfactual analysis cannot always be adopted directly. One might want to take the suggestions as a starting point, tweak them to fit the business constraints, and then check whether the tweaked inputs still produce the desired output. This is possible through what-if analysis.

#### Step 6: Using What-If analysis for getting the right set of pricing strategies for the given handset

Now suppose that, for the given counterfactual recommendations, we have the following constraints: expensive parts like RAM cannot be increased, cheaper components like internal memory can be increased to 2-4 times the present value, and battery capacity can be increased by at most 50%. So, for the given mobile, we cannot increase the RAM. Let us check which configuration can help us get the desired price.
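These constraints can be written down as data and checked mechanically. The sketch below encodes them as multipliers of the handset's current values and tests the two counterfactual suggestions against them. The helper names (`allowed_ranges`, `violations`) are assumptions made for illustration. The last lines derive values in the shape of the `features_to_vary` and `permitted_range` arguments that DiCE documents for `generate_counterfactuals`; passing them to the explainer is not tested here.

```python
# Baseline values of the handset under study (taken from the query record)
baseline = {"internal mem": 32.0, "ram": 3.0, "battery": 3000}

# Constraints as (min, max) multipliers of the baseline value
multipliers = {
    "ram": (1.0, 1.0),           # RAM cannot be increased
    "internal mem": (2.0, 4.0),  # 2x to 4x the present value
    "battery": (1.0, 1.5),       # at most 50% more
}

def allowed_ranges(baseline, multipliers):
    """Turn multipliers into absolute [low, high] bounds per feature."""
    return {name: [baseline[name] * lo, baseline[name] * hi]
            for name, (lo, hi) in multipliers.items()}

def violations(baseline, suggestion, allowed):
    """Return the changed features whose new value is outside its bounds."""
    broken = []
    for name, value in suggestion.items():
        if value == baseline[name]:
            continue  # an unchanged feature cannot break a constraint
        lo, hi = allowed.get(name, (baseline[name], baseline[name]))
        if not lo <= value <= hi:
            broken.append(name)
    return broken

allowed = allowed_ranges(baseline, multipliers)

# The two counterfactual suggestions from the output above (changed features only)
suggestions = {
    1: {"internal mem": 84.8, "ram": 5.7},
    2: {"ram": 5.2, "battery": 5116.0},
}
for number, suggestion in suggestions.items():
    print(number, violations(baseline, suggestion, allowed))

# Features that are free to move, with their bounds (assumed usage with DiCE)
features_to_vary = [name for name, (lo, hi) in allowed.items() if lo < hi]
permitted_range = {name: allowed[name] for name in features_to_vary}
print(features_to_vary, permitted_range)
```

Both suggestions change the RAM, and the second also exceeds the battery limit, which is why neither can be adopted as-is under these constraints.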

Options:

1. Increase internal memory by 2x and battery by 50%

In [73]:
expected_internal_memory = int(input_record['internal mem'].iloc[0] * 2)  # 2x the current internal memory

In [74]:
expected_battery = int(input_record['battery'].iloc[0] * 1.5)  # 50% more battery capacity

In [77]:
custom_inputs = input_record.copy()
# Assign directly: replace() only matched because the original values happen to be whole numbers
custom_inputs['internal mem'] = expected_internal_memory
custom_inputs['battery'] = expected_battery

In [78]:
new_price = rf.predict(custom_inputs)  # predict once and reuse the result
print('New price based on updated configuration: ', new_price)
if new_price[0] >= expected_price_range[0]:
    print("The given configuration helps in getting the minimum expected price")
else:
    print("The given configuration does not help in getting the minimum expected price of", expected_price_range[0])

New price based on updated configuration:  [3043.53]
The given configuration does not help in getting the minimum expected price of 3236.0


We see that increasing the internal memory by 2x and the battery capacity by 50% does not reach the expected price range. Let us see if we can achieve the objective with a different combination.

2. Increase internal memory by 4x and battery by 50%

In [79]:
expected_internal_memory2 = int(input_record['internal mem'].iloc[0] * 4)  # 4x the current internal memory

In [80]:
expected_battery2 = int(input_record['battery'].iloc[0] * 1.5)  # 50% more battery capacity

In [81]:
custom_inputs2 = input_record.copy()
# Assign directly: replace() only matched because the original values happen to be whole numbers
custom_inputs2['internal mem'] = expected_internal_memory2
custom_inputs2['battery'] = expected_battery2

In [82]:
new_price2 = rf.predict(custom_inputs2)  # predict once and reuse the result
print('New price based on updated configuration: ', new_price2)
if new_price2[0] >= expected_price_range[0]:
    print("The given configuration helps in getting the minimum expected price")
else:
    print("The given configuration does not help in getting the minimum expected price of", expected_price_range[0])

New price based on updated configuration:  [3247.54]
The given configuration helps in getting the minimum expected price


We see that option 2 gives a configuration that achieves the objective: the new predicted price of 3247.54 is at least 10% above the original prediction and falls within the expected range of 3236 to 3383.
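The two options follow the same pattern, so they could be evaluated in one loop. The sketch below is one possible way to do that. The `what_if` helper is hypothetical, and `toy_predict` is a made-up linear stand-in for the trained model, so its prices are illustrative and are not the Random Forest's predictions.

```python
def what_if(predict, record, options, price_range):
    """Apply each option's multipliers to a copy of `record` and report the outcome."""
    low, high = price_range
    results = []
    for label, factors in options.items():
        trial = dict(record)
        for name, factor in factors.items():
            trial[name] = record[name] * factor
        price = predict(trial)
        # Check both ends of the target range, not only the minimum price
        results.append((label, round(price, 2), low <= price <= high))
    return results

def toy_predict(phone):
    """Hypothetical stand-in model with made-up coefficients."""
    return 1796 + 3 * phone["internal mem"] + 300 * phone["ram"] + phone["battery"] / 20

record = {"internal mem": 32, "ram": 3.0, "battery": 3000}
options = {
    "2x memory, +50% battery": {"internal mem": 2, "battery": 1.5},
    "4x memory, +50% battery": {"internal mem": 4, "battery": 1.5},
}
for row in what_if(toy_predict, record, options, (3236.0, 3383.0)):
    print(row)
```

With the notebook's objects, `predict` could instead wrap `rf.predict` on a one-row dataframe built from the trial record. Adding a third option, such as 3x memory, then only needs one more dictionary entry.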