**Q1. Step 1: Model Building**

This Python script imports the `pandas` and `statsmodels` libraries and loads the training data from `Trainingdata2.csv` into a DataFrame. It then derives an `SKU` column from the `Product` column: products 1 to 5 map to SKU 1, products 6 to 10 to SKU 2, and so on, while product 0 is assigned SKU 0. The independent variables for the multinomial logistic regression model are `Cores`, `Frequency`, `TDP`, and `Price`, with a constant column added for the intercept, and `Chosen` is the dependent variable. The script fits the model with `sm.MNLogit` and displays its summary. The `SKU` column is not used as a regressor in this model.
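For reference, the fitted model can be written out explicitly. The summary below reports a single `Chosen=1` block, so `Chosen` has two outcomes and `MNLogit` reduces to a binary logit with `Chosen = 0` as the reference category. The predicted probability is then the logistic function of a linear index:

$$
P(\text{Chosen}=1 \mid x) = \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta_1\,\text{Cores} + \beta_2\,\text{Frequency} + \beta_3\,\text{TDP} + \beta_4\,\text{Price})\big)}
$$

Here $\beta_0$ is the `const` coefficient and $\beta_1$ to $\beta_4$ are the coefficients on `Cores`, `Frequency`, `TDP`, and `Price`.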


```python
import pandas as pd
import statsmodels.api as sm

# Load the training data
file_path = 'Trainingdata2.csv'
data = pd.read_csv(file_path)

# Group products into SKUs of five (products 1-5 -> SKU 1, 6-10 -> SKU 2, ...);
# product 0 is assigned SKU 0
data['SKU'] = ((data['Product'] - 1) // 5) + 1
data.loc[data['Product'] == 0, 'SKU'] = 0

# Independent variables (features), plus a constant column for the intercept
X_reduced = data[['Cores', 'Frequency', 'TDP', 'Price']]
X_reduced = sm.add_constant(X_reduced)

# The dependent variable is 'Chosen'
y = data['Chosen']

# Fit the multinomial logit model on the reduced feature set
mnl_model_reduced = sm.MNLogit(y, X_reduced).fit()

# Build the model summary; as the last expression in the cell, Jupyter displays it
model_summary = mnl_model_reduced.summary()
model_summary
```


```text
Optimization terminated successfully.
         Current function value: 0.325717
         Iterations 8
```


| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Chosen | No. Observations: | 10000 |
| Model: | MNLogit | Df Residuals: | 9995 |
| Method: | MLE | Df Model: | 4 |
| Date: | Sun, 19 Nov 2023 | Pseudo R-squ.: | 0.4208 |
| Time: | 17:20:34 | Log-Likelihood: | -3257.2 |
| converged: | True | LL-Null: | -5623.4 |
| Covariance Type: | nonrobust | LLR p-value: | 0.000 |

| Chosen=1 | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.9611 | 0.061 | -32.291 | 0.000 | -2.080 | -1.842 |
| Cores | 1.0765 | 0.025 | 43.561 | 0.000 | 1.028 | 1.125 |
| Frequency | 0.7898 | 0.080 | 9.869 | 0.000 | 0.633 | 0.947 |
| TDP | -0.0506 | 0.003 | -20.083 | 0.000 | -0.056 | -0.046 |
| Price | -0.0016 | 9.34e-05 | -16.719 | 0.000 | -0.002 | -0.001 |
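As a worked check on the coefficients, the sketch below plugs feature values into the logistic formula using the rounded `coef` column of the summary. With all features at zero only the intercept remains, giving 1 / (1 + e^1.9611) ≈ 0.1233, the same value Step 2 predicts for product 0. The second call uses the features that Step 2's `map_features` assigns to product 7 (Cores 8, Frequency 2.9, TDP 60, Price 2700). Because the rounded `Price` coefficient (-0.0016) multiplies prices in the thousands, this gives roughly 0.83 rather than the 0.844 the full-precision model predicts; the unrounded values are in `mnl_model_reduced.params`. The names `COEFS` and `choice_probability` are hypothetical, introduced only for this illustration.

```python
import math

# Rounded coefficients copied from the Chosen=1 block of the summary
COEFS = {
    'const': -1.9611,
    'Cores': 1.0765,
    'Frequency': 0.7898,
    'TDP': -0.0506,
    'Price': -0.0016,
}


def choice_probability(cores, frequency, tdp, price):
    """Logistic probability that Chosen = 1, computed from the rounded coefficients."""
    utility = (COEFS['const']
               + COEFS['Cores'] * cores
               + COEFS['Frequency'] * frequency
               + COEFS['TDP'] * tdp
               + COEFS['Price'] * price)
    return 1.0 / (1.0 + math.exp(-utility))


# All-zero features: only the intercept contributes
p_zero = choice_probability(0, 0, 0, 0)
print(round(p_zero, 4))

# Features that Step 2 assigns to product 7
p_example = choice_probability(8, 2.9, 60, 2700)
print(round(p_example, 3))
```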


**Step 2: Model Predictions on Testing Data**

```python
import ast

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Reuse the model fitted in Step 1 (the results object is still in memory)
trained_model = mnl_model_reduced

# Price levels; each product's price is picked by product_id % 5
prices_all_products = [1800, 3000, 2700, 2400, 2100]

# Feature order used in training (the constant is added separately)
FEATURE_COLUMNS = ['Cores', 'Frequency', 'TDP', 'Price']


def map_features(product_id):
    """Return [Cores, Frequency, TDP, Price] for a product ID.

    Product 0 gets all-zero features. Other IDs are grouped in blocks of five
    that share a hardware configuration; IDs of 26 and above use the last block.
    """
    if product_id == 0:
        return [0, 0, 0, 0]
    price = prices_all_products[product_id % 5]
    if product_id < 6:
        return [4, 3.2, 95, price]
    elif product_id < 11:
        return [8, 2.9, 60, price]
    elif product_id < 16:
        return [8, 2.9, 95, price]
    elif product_id < 21:
        return [4, 2.9, 60, price]
    elif product_id < 26:
        return [4, 3.2, 60, price]
    else:
        return [4, 2.2, 135, price]


# Read the assortment file; each non-empty line is a list of product IDs
with open('assortment_test.txt', 'r') as file:
    assortment_lists = [ast.literal_eval(line.strip()) for line in file if line.strip()]

# Collect one DataFrame per assortment and concatenate once at the end
prediction_frames = []

for assortment_index, assortment in enumerate(assortment_lists, start=1):
    # Build this assortment's design matrix with the training column names
    features_df = pd.DataFrame([map_features(pid) for pid in assortment],
                               columns=FEATURE_COLUMNS)
    # has_constant='add' keeps the intercept column even if a feature is constant
    features_df_with_const = sm.add_constant(features_df, has_constant='add')

    # Column 1 holds P(Chosen = 1); np.asarray works whether predict
    # returns a DataFrame or an ndarray
    probabilities = np.asarray(trained_model.predict(features_df_with_const))[:, 1]

    prediction_frames.append(pd.DataFrame({
        'Product_ID': assortment,
        'Predicted_Probability': probabilities,
        'Assortment_Group': assortment_index,
    }))

# One row per (assortment, product), tagged with its assortment group
master_predictions_df = pd.concat(prediction_frames, ignore_index=True)
print(master_predictions_df)
```

```text
      Product_ID  Predicted_Probability  Assortment_Group
0              0               0.123345                 1
1             27               0.000941                 1
2              7               0.844076                 1
3             21               0.054770                 1
4              0               0.123345                 2
...          ...                    ...               ...
4495           2               0.015483               900
4496          27               0.000941               900
4497          15               0.789383               900
4498          22               0.084706               900
4499          17               0.068051               900

[4500 rows x 3 columns]
```
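Because each row is scored on its own by the fitted model, the probabilities within one assortment are not constrained to sum to 1. A quick check with the four values printed for assortment group 1 (products 0, 27, 7, and 21); the name `group_1` is illustrative only:

```python
# Predicted probabilities for assortment group 1, copied from the printed output
group_1 = {0: 0.123345, 27: 0.000941, 7: 0.844076, 21: 0.054770}

# Sum the per-product probabilities within the assortment
total = sum(group_1.values())
print(f"Sum over assortment 1: {total:.6f}")  # prints 1.023132
```

If later steps need shares that sum to 1 across the products offered together, that would call for a model that normalizes over the offered set, which is a different specification from the one fitted in Step 1.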


```python
# Collect the predicted probabilities for each assortment, keeping product order
probabilities_by_assortment = {}

for _, row in master_predictions_df.iterrows():
    group = int(row['Assortment_Group'])
    # float() stores plain Python floats, so the printed lists show bare numbers
    probabilities_by_assortment.setdefault(group, []).append(float(row['Predicted_Probability']))

# Print one list of probabilities per assortment, in assortment order
for group in sorted(probabilities_by_assortment):
    print(probabilities_by_assortment[group])

# Write the same lists to a text file, one assortment per line
output_path = 'Group1.txt'
with open(output_path, 'w') as file:
    for group in sorted(probabilities_by_assortment):
        file.write(f"{probabilities_by_assortment[group]}\n")

# Report the file that was actually written
print(f"Predicted probabilities have been saved to {output_path}")
```
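As an aside, the `iterrows` loop above can be replaced by a single `groupby` call. The sketch below uses a five-row stand-in for `master_predictions_df` (values copied from the Step 2 output) so it runs on its own; applied to the real DataFrame, the same chained call should produce the dictionary the loop builds. `groupby` keeps the original row order within each group and, with `sort=True`, orders the group keys.

```python
import pandas as pd

# Small stand-in for master_predictions_df, using values from the Step 2 output
master_predictions_df = pd.DataFrame({
    'Product_ID': [0, 27, 7, 21, 0],
    'Predicted_Probability': [0.123345, 0.000941, 0.844076, 0.054770, 0.123345],
    'Assortment_Group': [1, 1, 1, 1, 2],
})

# One list of probabilities per assortment group, in row order within each group
probabilities_by_assortment = (
    master_predictions_df
    .groupby('Assortment_Group', sort=True)['Predicted_Probability']
    .apply(lambda s: s.tolist())
    .to_dict()
)

for group, probabilities in probabilities_by_assortment.items():
    print(probabilities)
```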


[0.12334547119726616, 0.0009408549652234181, 0.8440758845876036, 0.054769828834993664]
[0.12334547119726616, 0.06805126309741345, 0.1909865767122207, 0.0015018498511433264, 0.7011925602922374]
[0.12334547119726616, 0.7011925602922374, 0.2738080978247957, 0.8440758845876036, 0.06805126309741345, 0.009750582340707328]
[0.12334547119726616, 0.024502349226389983, 0.9324737152432635, 0.0023965415714711585]
[0.12334547119726616, 0.024502349226389983, 0.47914513278896675, 0.7721774613236568, 0.0038221845818380284]
[0.12334547119726616, 0.054769828834993664, 0.0023965415714711585, 0.9324737152432635, 0.5950200754891025, 0.015483064368309846]
[0.12334547119726616, 0.0023965415714711585, 0.1909865767122207, 0.47914513278896675]
[0.12334547119726616, 0.04372012673647161, 0.36547160790241645, 0.0015018498511433264, 0.009750582340707328]
[0.12334547119726616, 0.2738080978247957, 0.06805126309741345, 0.0023965415714711585, 0.8440758845876036, 0.5950200754891025]
[0.12334547119726616, 0.2292872961095