<h1>What Is the Holdout Cross-Validation Technique?</h1>

The holdout method evaluates a model by splitting the dataset into two parts: a training set and a test set. The model is fitted on the training set and then evaluated on the test set, which it did not see during training. This gives an estimate of how well it performs on new data.
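The following is a minimal sketch of the mechanics. The hypothetical helper `holdout_split` shuffles the row indices with NumPy and slices off the first 30% as the test set. It assumes array-like inputs and rounds the test size up, which is how scikit-learn's `train_test_split` treats a fractional `test_size`. It only illustrates the idea and is not scikit-learn's actual implementation.

```python
import numpy as np

def holdout_split(X, y, test_fraction=0.3, seed=None):
    """Hypothetical helper: randomly split (X, y) into training and test parts."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))                # shuffled row indices
    n_test = int(np.ceil(test_fraction * len(X)))    # round the test size up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: 10 rows with 2 features each, and one target value per row
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = holdout_split(X, y, test_fraction=0.3, seed=42)
print(len(X_train), len(X_test))  # 7 3
```

In practice, `train_test_split(X, y, test_size=0.3, random_state=...)` handles this step, as in the notebook cells below.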

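This method is often used when the dataset is too small to set aside separate training, validation, and test sets. However, a small dataset also means a small test set, so the performance estimate can change noticeably from one split to another. The method is easy to implement, but the split must be random. If the rows are ordered (for example, by date or by target value), using a contiguous block as the test set gives a biased evaluation. The method is a useful starting point for evaluating machine learning models, provided its single-split result is interpreted with these caveats in mind.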
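To see why the split has to be random, consider a hypothetical dataset whose rows happen to be sorted by the target value. The sketch below uses scikit-learn's `train_test_split` on synthetic data. With `shuffle=False`, the test set is simply the last 30 rows, so every test target is larger than every training target. The default shuffled split spreads the targets across both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data, sorted by the target (as if a file had been exported in that order)
X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.arange(100, dtype=float)

# shuffle=False keeps the original row order: the last 30 rows become the test set
_, _, y_train_seq, y_test_seq = train_test_split(X, y, test_size=0.3, shuffle=False)

# The default (shuffle=True) draws the test rows at random
_, _, y_train_rnd, y_test_rnd = train_test_split(X, y, test_size=0.3, random_state=42)

print("Unshuffled: train range", y_train_seq.min(), "-", y_train_seq.max(),
      "| test range", y_test_seq.min(), "-", y_test_seq.max())
print("Shuffled:   train mean", round(y_train_rnd.mean(), 1),
      "| test mean", round(y_test_rnd.mean(), 1))
```

A model evaluated on the unshuffled split would be tested only on target values it never saw during training.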

A common choice is a 70/30 split, with 70% of the rows used for training and 30% for testing. Candidate models can then be compared by their performance on the test set (accuracy for classification, or an error measure such as mean squared error for regression), and the best one selected. However, if the same test set is used repeatedly to compare or tune models, there is a risk of overfitting to it. The selected model may be tuned to the quirks of that particular test set and generalize poorly to unseen or future data. To ensure that the reported performance reflects how the model will do beyond the test set, keep a separate validation set for model selection and use the test set only for the final evaluation.
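One common way to avoid tuning against the test set is a three-way split. The sketch below uses synthetic data and two stand-in models (`DummyRegressor` and `LinearRegression`, chosen only for illustration). It calls `train_test_split` twice to get roughly 60/20/20 training/validation/test sets. It then picks the model with the lower validation MSE and evaluates only that model on the test set, once.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data: 200 rows, 3 features, target = sum of the features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X.sum(axis=1)

# Step 1: hold out 20% as the test set; it is used only once, at the end
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: split the remaining 80% into training and validation sets
# (25% of 80% = 20% of the full data, so the overall split is 60/20/20)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 120 40 40

# Step 3: compare candidate models on the validation set only
candidates = {"mean baseline": DummyRegressor(), "linear": LinearRegression()}
val_mse = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_mse[name] = mean_squared_error(y_val, model.predict(X_val))

# Step 4: evaluate the selected model on the untouched test set, once
best_name = min(val_mse, key=val_mse.get)
test_mse = mean_squared_error(y_test, candidates[best_name].predict(X_test))
print("Selected:", best_name, "| test MSE:", test_mse)
```

Because the test set plays no part in choosing the model, its MSE is a fairer estimate of performance on new data than the validation MSE used for the selection.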


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error


In [None]:

# Load the wine dataset
df = pd.read_csv('/content/ChemicalContent in Wine.csv')

# Select the attributes and target variable
X = df[['Alcohol', 'Malic_Acid', 'Ash', 'Ash_Alcanity', 'Magnesium',
        'Total_Phenols', 'Flavanoids', 'Nonflavanoid_Phenols',
        'Proanthocyanins', 'Color_Intensity', 'Hue', 'OD280']]
y = df['Proline']

# Hold out 30% of the rows as the test set (70/30 split); random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
# Random Forest Regression
rf_model = RandomForestRegressor(random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
mse_rf = mean_squared_error(y_test, y_pred_rf)

In [None]:
# Support Vector Regression (default settings; SVR is sensitive to feature and target scale, and no scaling is applied here)
svr_model = SVR()
svr_model.fit(X_train, y_train)
y_pred_svr = svr_model.predict(X_test)
mse_svr = mean_squared_error(y_test, y_pred_svr)
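SVR is sensitive to scale. With the default RBF kernel, features with large numeric ranges dominate the distance computations. The defaults `C=1.0` and `epsilon=0.1` are also small relative to a target whose values are in the hundreds. The unscaled setup above therefore probably handicaps SVR, which may explain part of its larger error, although this has not been verified here.

The sketch below shows one common remedy. It standardizes the features inside a pipeline, so the scaler is fit on the training rows only, and standardizes the target with `TransformedTargetRegressor`. The CSV file is not available here, so the sketch assumes that scikit-learn's bundled copy of the UCI Wine data (`load_wine`) can stand in for `ChemicalContent in Wine.csv`. The row order, and therefore the split and the resulting MSE, will differ from the notebook's, so no numbers are claimed. Variable names are suffixed to avoid overwriting the notebook's `X`, `y`, and splits.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_wine
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: scikit-learn's bundled UCI Wine data stands in for the CSV file.
# Its column names differ from the CSV's (e.g. "proline" instead of "Proline").
wine_df = load_wine(as_frame=True).frame
X_wine = wine_df.drop(columns=["proline", "target"])  # 12 chemical features; "target" is the wine class
y_wine = wine_df["proline"]

Xw_train, Xw_test, yw_train, yw_test = train_test_split(
    X_wine, y_wine, test_size=0.3, random_state=42
)

# Features: the StandardScaler inside the pipeline is fit on the training rows only.
# Target: TransformedTargetRegressor standardizes y for fitting and maps
# predictions back to the original Proline units.
scaled_svr = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR()),
    transformer=StandardScaler(),
)
scaled_svr.fit(Xw_train, yw_train)
mse_scaled_svr = mean_squared_error(yw_test, scaled_svr.predict(Xw_test))
print("Scaled SVR - Mean Squared Error:", mse_scaled_svr)
```

Keeping the scalers inside the model object means the test rows never influence the scaling parameters, which preserves the holdout principle.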

In [None]:
# Print the results
print("Random Forest Regression - Mean Squared Error:", mse_rf)
print("Support Vector Regression - Mean Squared Error:", mse_svr)



Random Forest Regression - Mean Squared Error: 30773.9256537037
Support Vector Regression - Mean Squared Error: 122616.12493365054
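MSE is expressed in squared units of the target, which makes the two numbers above hard to read directly. Taking the square root gives the root mean squared error, RMSE = √MSE, in the same units as Proline. The quick calculation below uses the printed values:

```python
import math

# MSE values copied from the printed output above
mse_rf = 30773.9256537037
mse_svr = 122616.12493365054

# RMSE = sqrt(MSE), expressed in the same units as the target (Proline)
rmse_rf = math.sqrt(mse_rf)
rmse_svr = math.sqrt(mse_svr)

print(f"Random Forest RMSE: {rmse_rf:.1f}")  # 175.4
print(f"SVR RMSE: {rmse_svr:.1f}")           # 350.2
```

On this particular split, the Random Forest's typical error is roughly half that of the default SVR. Because the comparison rests on a single holdout split, a different `random_state` could give somewhat different numbers.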
