Performing a decision tree prediction on total sales using price and units sold is a common task in predictive analytics. Here's a general outline of the steps you would take:

1. **Data Collection**: Gather historical data on total sales, price, and units sold. Ensure that you have a sufficient amount of data to train your model effectively.

2. **Data Preprocessing**: This step involves cleaning the data and preparing it for analysis. You may need to handle missing values, outliers, and ensure that the data is in the right format for analysis.

3. **Feature Engineering**: Consider creating additional features from the existing ones if they add information. Take care not to leak the target: if total sales is simply price * units sold, adding that product as a feature hands the model the answer and makes the evaluation meaningless.

4. **Split Data**: Split your data into training and testing sets. The training set will be used to train the decision tree model, and the testing set will be used to evaluate its performance.

5. **Model Training**: Train a decision tree model using the training data. The model will learn the relationship between the features (price and units sold) and the target variable (total sales).

6. **Model Evaluation**: Evaluate the performance of the trained model using the testing data. Common metrics for regression tasks like this include Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared.

7. **Hyperparameter Tuning**: Tune the hyperparameters of the decision tree model to improve its performance. Hyperparameters control the behavior of the model, such as the maximum depth of the tree or the minimum number of samples required to split a node.

8. **Prediction**: Once you are satisfied with the performance of the model, you can use it to make predictions on new data. Given the price and units sold for a new product, the model will predict the total sales.

9. **Deployment**: Deploy the trained model into your production environment so that it can be used to make real-time predictions.

This is a high-level overview of the process. Each step may involve further sub-steps and considerations depending on the specifics of your data and the requirements of your problem.
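
Steps 6 and 7 can be sketched together as follows. This is a minimal illustration, not a definitive setup: the data is synthetic (generated with a fixed seed purely for demonstration), and the parameter grid values are arbitrary assumptions rather than recommended settings.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, illustrative data only (not taken from any real dataset)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Price": rng.uniform(5, 50, size=200),
    "Units_Sold": rng.integers(50, 150, size=200),
})
df["Total_Sales"] = df["Price"] * df["Units_Sold"]

X = df[["Price", "Units_Sold"]]
y = df["Total_Sales"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 7: search a small, arbitrary grid using 5-fold cross-validation
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# Step 6: evaluate the refit best model on the held-out test set
y_pred = search.best_estimator_.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Best parameters: {search.best_params_}")
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  R-squared: {r2:.3f}")
```

Tuning on cross-validation folds of the training set, and touching the test set only once at the end, keeps the reported metrics from being flattered by the search itself.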

In [1]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Sample data (replace this with your actual dataset)
data = {
    'Price': [10, 20, 30, 40, 50],
    'Units_Sold': [100, 90, 80, 70, 60],
    'Total_Sales': [1000, 1800, 2400, 2800, 3000]
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Splitting the data into features (X) and target variable (y)
X = df[['Price', 'Units_Sold']]
y = df['Total_Sales']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the decision tree regressor model
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Example of using the trained model to make predictions
new_data = {
    'Price': [25],
    'Units_Sold': [85]
}
new_df = pd.DataFrame(new_data)
predicted_total_sales = model.predict(new_df)
print(f"Predicted Total Sales: {predicted_total_sales}")


Mean Squared Error: 640000.0
Predicted Total Sales: [2400.]


This code demonstrates the following steps:

1. Data preparation: Define sample data and create a DataFrame.
2. Split the data into features (price and units sold) and the target variable (total sales).
3. Split the data into training and testing sets.
4. Create and train a decision tree regressor model using scikit-learn.
5. Make predictions on the testing set.
6. Evaluate the model using mean squared error.
7. Use the trained model to make predictions on new data (price and units sold).

In [2]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load your dataset
# The data here is in an Excel workbook, so it is read with pd.read_excel
# Adjust the file path as necessary

data = pd.read_excel('C:/Users/USER/Desktop/Pyth Cleaning/Even.xlsx')

data.head()




Unnamed: 0,Customer ID,First Name,Last Name,Brand,Product,City,Country,Fulfillment Percent,Price,Units Sold,Total Sales
0,1295,Bill,Smith,Nike,Magista Shoes,London,Uk,0,12279,1200,14734800
1,1298,Kennedi,Singh,Armani,V-Neck Tshirt,Madrid,Spain,0,20462,1000,20462000
2,1301,Harley,Fritz,Nike Inc.,Football Shirt,Tampa,Usa,30,12276,1000,12276000
3,1304,Nyla,Novak,Nike UK,Air Max,Tokyo,Japan,30,8160,850,6936000
4,1307,David,Rasmussen,Puma,Air Max,Tokyo,Japan,0,5467,900,4920300


In [6]:
# This dataset has columns named 'Price', 'Units Sold', and 'Total Sales' (with spaces)
# If your data has different column names, adjust accordingly
X = data[['Price', 'Units Sold']]
y = data['Total Sales']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the decision tree regressor model
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Example of using the trained model to make predictions
new_data = {
    'Price': [25],
    'Units Sold': [85]
}
new_df = pd.DataFrame(new_data)
predicted_total_sales = model.predict(new_df)
print(f"Predicted Total Sales: {predicted_total_sales}")

Mean Squared Error: 4065618582280.0
Predicted Total Sales: [4920300.]


Result
The Mean Squared Error (MSE) is the average squared difference between the actual and predicted values in the testing set. Because it is expressed in squared units of the target, the value of approximately 4.07 trillion cannot be compared directly with individual sales figures. Its square root (RMSE) is on the same scale as Total Sales and is easier to judge against the values in the data.
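
As a quick check on scale, the square root of the printed MSE can be computed directly. This is a minimal sketch that only reuses the number reported in the output above; the variable names are illustrative.

```python
import math

# MSE value copied from the cell output above
mse = 4065618582280.0

# RMSE is in the same units as Total Sales
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:,.0f}")
```

The result is roughly 2.02 million, which can be set against the Total Sales values in the preview (about 4.9 million to 20.5 million in the five rows shown).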

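Now, let's interpret the predicted total sales value of approximately 4.92 million. This value is the model's prediction for a price of 25 and 85 units sold.

Here's a breakdown:

- In every row of the preview, Total Sales equals Price multiplied by Units Sold, so the expected value for this input is 25 * 85 = 2,125. The prediction of 4,920,300 is off by several orders of magnitude.

- The input lies far outside the values shown in the preview (prices in the thousands, units sold around a thousand). A decision tree cannot extrapolate: it returns the value stored in the leaf the input falls into, so its prediction never leaves the range of total sales seen during training. The predicted value is identical to the Total Sales in row 4 of the preview.

- The MSE points to a separate problem. It is measured on test rows drawn from the same data as the training rows, and a large value there suggests the model is also inaccurate within the range it was trained on.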

To improve the model's performance, you might need to consider several factors such as feature selection, feature engineering, hyperparameter tuning, or even using a different algorithm altogether. Additionally, ensuring a sufficiently large and diverse dataset could also help in building a more accurate predictive model.
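
The point about a diverse dataset matters especially for decision trees, because a tree returns values stored in its leaves and so cannot predict outside the range of targets it was trained on. The sketch below illustrates this on synthetic data; the value ranges are assumptions chosen to loosely resemble the preview rows, not the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic training data; the ranges are assumptions for illustration only
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "Price": rng.uniform(5000, 21000, size=300),
    "Units Sold": rng.integers(800, 1300, size=300),
})
train["Total Sales"] = train["Price"] * train["Units Sold"]

features = ["Price", "Units Sold"]
model = DecisionTreeRegressor(random_state=42)
model.fit(train[features], train["Total Sales"])

# Query far below anything seen in training
query = pd.DataFrame({"Price": [25], "Units Sold": [85]})
tree_pred = model.predict(query)[0]
true_value = 25 * 85

print(f"Smallest training target: {train['Total Sales'].min():,.0f}")
print(f"Tree prediction:          {tree_pred:,.0f}")
print(f"Price * Units Sold:       {true_value:,}")
```

The tree's prediction is bounded below by the smallest training target, no matter how small the query values are. Predicting for inputs like this requires either training data that covers that range or a model that can extrapolate.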

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Load your dataset
# Assuming your data is in a CSV file named 'sales_data.csv'
# Adjust the file path as necessary
data = pd.read_csv('sales_data.csv')

# Assuming your data has columns: 'Price', 'Units Sold', and 'Total Sales'
# If your data has different column names, adjust accordingly
# Rename 'Units Sold' to 'Units_Sold' so the feature names match those used in new_data below
data.rename(columns={'Units Sold': 'Units_Sold'}, inplace=True)

X = data[['Price', 'Units_Sold']]
y = data['Total Sales']

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Creating and training the decision tree regressor model
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Making predictions on the testing set
y_pred = model.predict(X_test)

# Evaluating the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Example of using the trained model to make predictions
new_data = {
    'Price': [25],
    'Units_Sold': [85]
}
new_df = pd.DataFrame(new_data)
predicted_total_sales = model.predict(new_df)
print(f"Predicted Total Sales: {predicted_total_sales}")
