# Applying different model types
The previous two notebooks used linear models, which are about as simple as models get. In many situations, however, linear models do not give the best performance and are outperformed by more flexible models, such as support vector machines and tree-based models like LightGBM.

Because sklearn-compatible models share similarly named methods (such as `fit` and `predict`), it is straightforward to compare them and to plug in the model that best suits your case.
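
To make the shared interface concrete, here is a minimal sketch on synthetic data. The data, the `fit_and_score` helper, and the variable names are assumptions made for illustration and are not part of the tutorial dataset or of sklearn itself. The point is only that one function can drive any regressor that exposes `fit` and `predict`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the tutorial dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Simple hold-out split: first 150 rows to fit, last 50 to evaluate
X_fit, X_eval = X[:150], X[150:]
y_fit, y_eval = y[:150], y[150:]


def fit_and_score(model, X_fit, y_fit, X_eval, y_eval):
    """Hypothetical helper: fit any sklearn-style regressor, return its RMSE."""
    model.fit(X_fit, y_fit)
    y_pred = model.predict(X_eval)
    return float(np.sqrt(np.mean((y_eval - y_pred) ** 2)))


# The same helper works for every model because they share fit/predict
candidates = [LinearRegression(), SVR(), RandomForestRegressor(random_state=0)]
scores = {
    type(model).__name__: fit_and_score(model, X_fit, y_fit, X_eval, y_eval)
    for model in candidates
}

for name, rmse in scores.items():
    print(f"{name}: RMSE = {rmse:.3f}")
```

Swapping in another sklearn-compatible regressor should only require adding it to `candidates`; the helper itself does not need to change.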

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [3]:
df = pd.read_csv('data/chl_regression_tutorial.csv')
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

features = ['rho_443_a', 'rho_492_a', 'rho_560_a', 'rho_665_a', 'rho_704_a', 'rho_740_a', 'rho_783_a', 'rho_865_a']
target = 'CHL'

X_train = df_train[features]
y_train = df_train[target]

X_test = df_test[features]
y_test = df_test[target]

In [6]:
# Import models such as support vector machines, lightgbm, random forest, etc.
from sklearn.svm import SVR
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor

# Initialize models
svr = SVR()
lgbm = LGBMRegressor(verbose=-1)
rf = RandomForestRegressor()

model_dict = {'svr': svr, 'lgbm': lgbm, 'rf': rf}

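# Fit each model on the training set and report its RMSE on the test set
for model_name, model in model_dict.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(model.__class__.__name__)
    # Root mean squared error: square root of the mean squared residual
    print('RMSE:', ((y_test - y_pred) ** 2).mean() ** 0.5)
    print()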




SVR
RMSE: 2.5232505473571116

LGBMRegressor
RMSE: 1.7580886329549694

RandomForestRegressor
RMSE: 1.7027419952817828
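
The RMSE printed above is computed by hand as the square root of the mean squared residual. As a small sanity check, the sketch below uses made-up numbers (not values from the tutorial dataset) to show that this expression should agree with the square root of sklearn's `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.0, 4.5])

# Same expression as in the loop above
rmse_manual = ((y_true - y_hat) ** 2).mean() ** 0.5

# Equivalent computation via sklearn's mean squared error
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))

print(rmse_manual, rmse_sklearn)
```

Either form works; the hand-written version avoids an extra import, while the sklearn version makes the metric explicit by name.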

