Symbolic Formula predicts worse than model itself (REOPEN) #206
Hi, since only a limited library of symbolic formulas is provided, it could be that the real symbolic formula is not supported by the library, or the function may not have a clean symbolic form at all. It might be helpful to stare at the learned KAN plot a bit by calling `model.plot()`.
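To illustrate this point with a standalone sketch (plain NumPy, not pykan API): if you sample a learned activation and the best fit within your symbolic library's family already has low R², the true function lies outside the library. Here `sin(5x)` is a hypothetical stand-in for a wiggly learned activation, and the candidate family is just affine `x`:

```python
import numpy as np

# Hypothetical learned activation: suppose the plot shows a wiggly shape like sin(5x).
x = np.linspace(-1, 1, 200)
y = np.sin(5 * x)

# Best fit within the ['x'] family: affine a*x + b via least squares.
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# Coefficient of determination of the best affine fit.
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"best affine fit R2 = {r2:.3f}")  # low: 'x' cannot represent this shape
```

A low R² here would confirm the activation cannot be snapped to anything in the chosen library without a large accuracy loss.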
It has been a long time since this issue was opened. I have rerun my code and I am here again. Here is the story: I have a regression case. My df contains 7 inputs and 1 output. To keep things simple, I took only the first 2 columns as features:

```python
target_column_name = "Chance of Admit "
X = df[list(df.columns.drop([target_column_name] + ["Serial No."]))[0:2]]
y = df[target_column_name]
```
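For clarity, this selection keeps the first two columns that remain after dropping the target and the ID column. A toy check with illustrative column names (taken from the graduate-admissions dataset linked below; only the drop/slice mechanics matter here):

```python
import pandas as pd

# Empty frame with illustrative column names in the dataset's order.
df = pd.DataFrame(columns=["Serial No.", "GRE Score", "TOEFL Score",
                           "University Rating", "Chance of Admit "])

target_column_name = "Chance of Admit "
# Drop target and ID from the column index, then keep the first two remaining names.
cols = list(df.columns.drop([target_column_name] + ["Serial No."]))[0:2]
print(cols)  # → ['GRE Score', 'TOEFL Score']
```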
```python
# Split the whole dataset into train and remainder
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into val and test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

# Convert data to torch tensors
train_input = torch.tensor(X_train.to_numpy(), dtype=torch.float32)
train_label = torch.tensor(y_train.to_numpy()[:, None], dtype=torch.float32)
val_input = torch.tensor(X_val.to_numpy(), dtype=torch.float32)
val_label = torch.tensor(y_val.to_numpy()[:, None], dtype=torch.float32)
test_input = torch.tensor(X_test.to_numpy(), dtype=torch.float32)
test_label = torch.tensor(y_test.to_numpy()[:, None], dtype=torch.float32)

dataset = {
    'train_input': train_input,
    'train_label': train_label,
    'val_input': val_input,
    'val_label': val_label,
    'test_input': test_input,
    'test_label': test_label
}
```

and I have trained my KAN model:
```python
# Create KAN
model = KAN(width=[len(X.columns), 3, 1], grid=5, k=11)
# Train KAN
results = model.train({'train_input': train_input, 'train_label': train_label,
                       'test_input': val_input, 'test_label': val_label},
                      opt="LBFGS", steps=50, loss_fn=torch.nn.MSELoss())
```

After training, I simply measured the performance of my model:
```python
# Predictions on the train, val and test datasets
test_preds = model.forward(test_input).detach()
test_labels = test_label
train_preds = model.forward(train_input).detach()
train_labels = train_label
val_preds = model.forward(val_input).detach()
val_labels = val_label

# Evaluate metrics
print("Train R2 Score:", r2_score(train_labels.numpy(), train_preds.numpy()))
print("Train MAE:", mean_absolute_error(train_labels.numpy(), train_preds.numpy()))
print("Val R2 Score:", r2_score(val_labels.numpy(), val_preds.numpy()))
print("Val MAE:", mean_absolute_error(val_labels.numpy(), val_preds.numpy()))
print("Test R2 Score:", r2_score(test_labels.numpy(), test_preds.numpy()))
print("Test MAE:", mean_absolute_error(test_labels.numpy(), test_preds.numpy()))
```

These are my performance metrics.
Now I plot the model:

```python
model.plot(scale=2)
```

As you can see, all of the activation functions look okay; nothing is suspicious. Then I try symbolic formulation using only `x` and `abs`:

```python
lib = ['x', 'abs']
model.auto_symbolic(lib=lib)
```

It's done. Here is my formula:

```python
model.symbolic_formula()
>>> ([0.02*Abs(0.56*x_2 - 8.33) - 2.09], [x_1, x_2])
```

Firstly: why do I see only `x_2` in this formula even though `x_1`'s activation functions look dark (not pale)? As I understand it, a pale activation function means that the feature is not important to the model. This is my first question.

Secondly:

```python
def kan_symbolic_formula_prediction(formula, X):
    batch = X.shape[0]
    predictions = []  # empty list for keeping predictions
    for i in range(batch):
        # Evaluate the symbolic formula on every single row
        expression = formula
        for j in range(X.shape[1]):
            expression = expression.subs(f'x_{j+1}', X[i, j])
        # Get the output of the formula
        predicted = float(expression.evalf())
        predictions.append(predicted)
    return predictions
```

I tested manually to check that this function works correctly.
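As an aside, the per-row `subs` loop can be slow for large arrays; SymPy's `lambdify` compiles the expression once and then evaluates it vectorized. A sketch, using the formula printed above as a stand-in (the variable names `x_1`, `x_2` follow the post's output):

```python
import numpy as np
import sympy as sp

# Stand-in for the extracted KAN formula, copied from the post's output.
x1, x2 = sp.symbols('x_1 x_2')
formula = 0.02 * sp.Abs(0.56 * x2 - 8.33) - 2.09

# Compile once; evaluation is then vectorized over whole columns.
f = sp.lambdify((x1, x2), formula, modules='numpy')

X = np.array([[1.0, 1.0], [2.0, 3.0]])
preds = f(X[:, 0], X[:, 1])
print(preds)  # first entry matches the per-row subs result: -1.9346
```

This gives the same numbers as the loop-based function but scales to thousands of rows without per-element Python overhead.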
```python
# Manual prediction (optional)
manuel_single_inputs = [1, 1]
kan_symbolic_formula_prediction(formula, pd.DataFrame([manuel_single_inputs]).to_numpy())
>>> [-1.9345999999999999]
```

The function works correctly (checked). Here is the remaining story:
```python
# Get results using the symbolic formula
preds_from_kan_formula = kan_symbolic_formula_prediction(formula, X_train.to_numpy())

print("MAE from formula on train data", mean_absolute_error(train_labels.numpy(), preds_from_kan_formula))
print("R2 from formula on train data", r2_score(train_labels.numpy(), preds_from_kan_formula))
>>> MAE from formula on train data 1.7811659989688393
>>> R2 from formula on train data -155.2165170931922

print("MAE from model.forward() on train data", mean_absolute_error(train_labels.numpy(), train_preds.numpy()))
print("R2 from model.forward() on train data", r2_score(train_labels.numpy(), train_preds.numpy()))
>>> MAE from model.forward() on train data 0.05722126
>>> R2 from model.forward() on train data 0.7171265138589901
```

What are your thoughts?
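One quick diagnostic for a gap this large (MAE ≈ 1.78 versus labels that are admission chances in roughly [0, 1]) is to compare summary statistics of the two prediction sets against the labels: if the formula's output range sits far from the label range, the symbolic snap itself fits poorly, rather than the evaluation code being wrong. A sketch with toy numbers shaped like the post's results (the helper name and the values are illustrative):

```python
import numpy as np

def compare_ranges(labels, net_preds, formula_preds):
    """Print min/mean/max for the labels and both prediction sets."""
    for name, arr in [("labels", labels), ("network", net_preds), ("formula", formula_preds)]:
        arr = np.asarray(arr, dtype=float).ravel()
        print(f"{name:8s} min={arr.min():+.3f} mean={arr.mean():+.3f} max={arr.max():+.3f}")

# Toy numbers mimicking the post: labels in [0, 1], formula output far below them.
labels = np.array([0.65, 0.72, 0.80, 0.91])
net = np.array([0.66, 0.70, 0.83, 0.88])
formula_out = np.array([-1.93, -1.95, -1.90, -1.96])
compare_ranges(labels, net, formula_out)
```

A constant offset like this between the formula's range and the label range would also explain the hugely negative R², since R² compares against the label mean.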
Hi
I have been practicing with KAN and made a regression implementation. My output layer has 1 node.
After the implementation I got pretty good R2 and MAE on my dataset (train, val and test). I wanted to get the symbolic formula, and I obtained it following https://kindxiaoming.github.io/pykan/Examples/Example_3_classfication.html
After that I wrote code that takes the symbolic formula of the KAN and evaluates the given inputs according to that formula.
The function is below:
Then I get predictions using the formula like this:
And here are the metrics from model.forward() and from the symbolic formula, respectively:
Here is my full code => https://www.kaggle.com/code/seyidcemkarakas/kan-regression-graduate-admissions