

# Demo Code for NLU Coursework SVM Model Inference

## Before we begin:
- Upload the following files directly to this Colab session before proceeding:
  - Pre-trained model: 'svm_model.joblib'
  - Vectorizer: 'tfidf_vectorizer.joblib'
  - The test data file (update the file name in step 2 to match it)
- The model file contains the SVM trained on the NLI task. The vectorizer file contains the TF-IDF vectorizer fitted during training, which is reused here for pre-processing.
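You can confirm that the uploads succeeded by checking for each file in the working directory before running any cells. The sketch below is an optional helper and is not part of the original pipeline. `missing_files` is a hypothetical name, and 'demo_input.csv' stands in for the real test-data file name, as it does in step 2.

```python
import os

# File names used in this notebook; 'demo_input.csv' is a placeholder
# for the actual test-data file name.
REQUIRED_FILES = ["svm_model.joblib", "tfidf_vectorizer.joblib", "demo_input.csv"]

def missing_files(paths, directory="."):
    """Return the entries of `paths` that are not regular files inside `directory`."""
    return [p for p in paths if not os.path.isfile(os.path.join(directory, p))]

missing = missing_files(REQUIRED_FILES)
if missing:
    print("Upload these files before continuing:", ", ".join(missing))
else:
    print("All required files are present.")
```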



In [None]:
# Install required packages (only needed if they are not already available in the Google Colab environment)
# Run the following lines if you encounter import errors

!pip install joblib  # Used for saving and loading models; included by default in Google Colab, install if missing
!pip install pandas  # Used for data manipulation; usually pre-installed in Google Colab, install if missing



1. Import necessary libraries


In [1]:
import pandas as pd
import joblib
import os
import re

2. Load the pre-trained model, vectorizer, and test dataset for prediction

In [2]:
# Load the pre-trained model and vectorizer, with error handling for missing files
try:
    svm_model = joblib.load('svm_model.joblib')
    tfidf_vectorizer = joblib.load('tfidf_vectorizer.joblib')
except FileNotFoundError as e:
    print(e)
    print("Please make sure the model and vectorizer files are uploaded to this session.")
    raise  # Re-raise to stop execution if the files are missing

In [3]:
# Load new input data for prediction
try:
    new_data = pd.read_csv('demo_input.csv')  # Placeholder: replace with the actual file name of the test data
except FileNotFoundError:
    print("Data file not found. Please upload the test dataset to this session.")
    raise  # Re-raise to stop execution if the file is missing
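The pre-processing step reads the `premise` and `hypothesis` columns, so the uploaded test file must contain both. The following is a minimal header check. It assumes a UTF-8 CSV with a header row, and `missing_columns` is a hypothetical helper rather than part of the original notebook.

```python
import csv

# Columns that the pre-processing step reads from the test data.
REQUIRED_COLUMNS = ("premise", "hypothesis")

def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    # 'utf-8-sig' also strips a byte-order mark if the file has one.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])
    return [c for c in required if c not in header]
```

In the notebook, you would call it as `missing_columns('demo_input.csv')`. Once the DataFrame is loaded, `set(REQUIRED_COLUMNS) - set(new_data.columns)` performs the same check.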

3. Pre-process test dataset

In [4]:
# Function to preprocess text data (takes a pandas Series of strings)
def preprocess_text(text):
    # Remove every character that is neither a word character (letter, digit, underscore) nor whitespace, then lowercase
    return text.apply(lambda t: re.sub(r'[^\w\s]', '', t).lower())

# Preprocess the new data: fill missing values, join premise and hypothesis with a space, then clean
new_data = new_data.fillna('')
new_data['text'] = new_data['premise'].str.cat(new_data['hypothesis'], sep=' ')
new_data['text'] = preprocess_text(new_data['text'])
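To see what the pre-processing does to a single example, here is a plain-Python sketch that applies the same steps to one premise/hypothesis pair. `preprocess_pair` is hypothetical and uses `None` to represent a missing value. The notebook itself applies these steps to the whole DataFrame.

```python
import re

def preprocess_pair(premise, hypothesis):
    """Single-example version of the notebook's pre-processing.

    Missing values (None) become empty strings, mirroring fillna('').
    The two sentences are joined with one space, mirroring str.cat(sep=' ').
    Characters that are neither word characters nor whitespace are removed,
    and the result is lower-cased.
    """
    text = f"{premise or ''} {hypothesis or ''}"
    return re.sub(r"[^\w\s]", "", text).lower()

print(preprocess_pair("A man is playing!", "Someone plays."))  # a man is playing someone plays
```

Note that `\w` keeps underscores and, because Python `str` patterns are Unicode-aware by default, also keeps non-ASCII letters such as accented characters.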


In [5]:
# Transform the new data using the loaded vectorizer
X_new = tfidf_vectorizer.transform(new_data['text'])
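`transform` does not refit anything. It reuses the vocabulary and IDF weights learned when the vectorizer was trained, so words never seen during training contribute nothing to the features. The sketch below illustrates this in plain Python under simplifying assumptions: whitespace tokenisation, raw term counts multiplied by fixed IDF weights, then L2 normalisation. The toy vocabulary and weights are invented, and the saved vectorizer's actual tokenizer and settings may differ.

```python
import math
from collections import Counter

def tfidf_transform(text, vocabulary, idf):
    """Map one pre-processed string to a dense TF-IDF vector.

    vocabulary: dict of term -> column index (learned at fit time; toy values here).
    idf: list of IDF weights, one per column (learned at fit time; toy values here).
    Terms missing from the vocabulary are ignored.
    """
    counts = Counter(tok for tok in text.split() if tok in vocabulary)
    vec = [0.0] * len(vocabulary)
    for term, count in counts.items():
        col = vocabulary[term]
        vec[col] = count * idf[col]
    # L2-normalise so that the vector length does not depend on text length.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

vocab = {"man": 0, "playing": 1, "guitar": 2}
idf = [1.0, 2.0, 2.0]
print(tfidf_transform("a man is playing guitar", vocab, idf))  # approximately [0.333, 0.667, 0.667]
```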

4. Generate predictions for the test data using the loaded model

In [6]:
# Make predictions using the loaded model
predictions = svm_model.predict(X_new)
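For a linear SVM, `predict` reduces to checking which side of the learned hyperplane each TF-IDF vector falls on. The decision value is `w · x + b`, and a positive value gives the second class. This sketch assumes a linear kernel and the labels 0 and 1 that appear in the saved prediction file. The source does not state which kernel the saved model uses, and the weights below are invented for illustration.

```python
def linear_svm_score(x, weights, bias):
    """Decision value w . x + b for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def linear_svm_predict(x, weights, bias, classes=(0, 1)):
    """Binary decision rule: a positive score gives classes[1], otherwise classes[0]."""
    return classes[1] if linear_svm_score(x, weights, bias) > 0 else classes[0]

# Toy weights and bias (invented); x stands in for one TF-IDF feature vector.
x = [1 / 3, 2 / 3, 2 / 3]
print(linear_svm_predict(x, weights=[0.5, -1.0, 1.5], bias=-0.2))  # 1 (score is about 0.3)
```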

In [7]:
# Save the predictions to a CSV file in the specified format (a header row 'prediction', then one label per line)
predictions_df = pd.DataFrame(predictions, columns=['prediction'])
predictions_df.to_csv('Group_3_A.csv', header=True, index=False)
# The file is saved directly in the session: click the folder icon on the left-hand side of Google Colab to view and download it.
# If the file does not appear immediately, right-click the file pane and select "Refresh".
print("Predictions have been saved successfully to 'Group_3_A.csv'")

# Output Verification
print(f"Checking if the output file exists: {'Found' if os.path.isfile('Group_3_A.csv') else 'Not Found'}")
if os.path.isfile('Group_3_A.csv'):
    print("\nFirst 5 lines of the prediction file:")
    with open('Group_3_A.csv', 'r') as file:
        for _ in range(5):
            print(file.readline().strip())
else:
    print("Output file not found. Please check the file path and write permissions.")

Predictions have been saved successfully to 'Group_3_A.csv'
Checking if the output file exists: Found

First 5 lines of the prediction file:
prediction
1
0
0
1
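The verification cell above only confirms that the output file exists. The sketch below performs a stricter check covering the header, the row count, and the label values. It assumes the labels are 0 and 1, as in the sample output above, and `validate_predictions` is a hypothetical helper. In the notebook, you would call it as `validate_predictions('Group_3_A.csv', len(new_data))`.

```python
import csv

def validate_predictions(path, expected_rows, allowed_labels=("0", "1")):
    """Return a list of problems found in a prediction CSV (an empty list means it looks valid)."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    problems = []
    if not rows or rows[0] != ["prediction"]:
        problems.append("header must be exactly 'prediction'")
    body = rows[1:]
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} predictions, found {len(body)}")
    # Line numbers start at 2 because line 1 is the header.
    bad = [i for i, r in enumerate(body, start=2) if len(r) != 1 or r[0] not in allowed_labels]
    if bad:
        problems.append(f"unexpected values on lines {bad[:5]}")
    return problems
```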
