# Multi-class classification using Linear SVM

## Install Libraries

In [None]:
!pip install libshorttext
!pip install pandas

## Preprocessing

Preprocessing is handled by the library, so we only need to write the data in the format it expects: one document per line, with the label and the text separated by a tab.
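A minimal sketch of one input line, with the label first, then a tab, then the text, as in the `to_csv` calls in this notebook. The helper name `to_line` is hypothetical and not part of the library. Collapsing whitespace is an added assumption: a tab or newline inside a title or abstract would otherwise break the one-document-per-line layout.

```python
import re

def to_line(label, title, abstract):
    """Return one 'label<TAB>text' line (hypothetical helper, not part of the library)."""
    text = f"{title} {abstract}"
    # Assumption: collapse tabs, newlines and repeated spaces into single spaces
    text = re.sub(r"\s+", " ", text).strip()
    return f"{label}\t{text}"

# Toy values for illustration only
print(to_line(1, "Deep nets", "We study\ndeep\tnetworks."))
```

With pandas, the same cleanup could be applied to the `text` column before calling `to_csv`.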

In [None]:
import pandas as pd


Setting up training data

In [None]:
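df = pd.read_csv('data/train.csv')
# One binary label column per subject area
cols = ['Computer Science', 'Physics', 'Mathematics', 'Statistics', 'Quantitative Biology', 'Quantitative Finance']
# Concatenate title and abstract into a single text field
df['text'] = df['TITLE'] + ' ' + df['ABSTRACT']
# Write one "label<TAB>text" file per subject into data/, where the training commands read them
for col in cols:
    df[[col, 'text']].to_csv('data/' + col + '.txt', sep='\t', index=False, header=False)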

Setting up test data

In [None]:
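df = pd.read_csv('data/test.csv')
df['text'] = df['TITLE'] + ' ' + df['ABSTRACT']
# The test set has no labels, so use 0 as a placeholder to keep the same two-column format
df['label'] = 0
# Write into data/, where the prediction commands read the file
df[['label', 'text']].to_csv('data/newtest.txt', sep='\t', index=False, header=False)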

## Training the models

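Train one L2-loss linear SVM per subject (`-L 2`) on TF-IDF features (`-F 3`) over bigrams with stemming and stopword removal (`-P 7`). Each model is a separate binary classifier for its subject.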

In [None]:
!python text-train.py -P 7 -F 3 -L 2 -f "data/Computer Science.txt" "cs"
!python text-train.py -P 7 -F 3 -L 2 -f "data/Mathematics.txt" "math"
!python text-train.py -P 7 -F 3 -L 2 -f "data/Physics.txt" "physics"
!python text-train.py -P 7 -F 3 -L 2 -f "data/Quantitative Biology.txt" "qb"
!python text-train.py -P 7 -F 3 -L 2 -f "data/Quantitative Finance.txt" "qf"
!python text-train.py -P 7 -F 3 -L 2 -f "data/Statistics.txt" "stats"
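The six training commands above differ only in the input file and the model name, so they can be generated from a mapping. The sketch below only builds the argument lists and prints them. The names `MODELS` and `build_train_commands` are hypothetical, and actually running the commands (for example with `subprocess.run`) assumes `text-train.py` is in the working directory, as the cell above does.

```python
import shlex

# Subject name -> model name, copied from the training commands
MODELS = {
    "Computer Science": "cs",
    "Mathematics": "math",
    "Physics": "physics",
    "Quantitative Biology": "qb",
    "Quantitative Finance": "qf",
    "Statistics": "stats",
}

def build_train_commands(models=MODELS, data_dir="data"):
    """Build one text-train.py argument list per subject (hypothetical helper)."""
    return [
        ["python", "text-train.py", "-P", "7", "-F", "3", "-L", "2", "-f",
         f"{data_dir}/{subject}.txt", model]
        for subject, model in models.items()
    ]

for cmd in build_train_commands():
    # shlex.join quotes arguments that contain spaces, such as the file paths
    print(shlex.join(cmd))
```

Keeping the arguments as lists avoids quoting problems with subject names that contain spaces.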

Validating on the training data to check for underfitting. The result files written here are overwritten by the test predictions in the next section.

In [None]:
!python text-predict.py -f -a 0 "data/Computer Science.txt" cs data/computer_result.txt
!python text-predict.py -f -a 0 "data/Mathematics.txt" math data/math_result.txt
!python text-predict.py -f -a 0 "data/Physics.txt" physics data/physics_result.txt
!python text-predict.py -f -a 0 "data/Quantitative Biology.txt" qb data/qb_result.txt
!python text-predict.py -f -a 0 "data/Quantitative Finance.txt" qf data/qf_result.txt
!python text-predict.py -f -a 0 "data/Statistics.txt" stats data/stats_result.txt

## Predicting results

In [None]:
!python text-predict.py -f -a 0 data/newtest.txt cs data/computer_result.txt
!python text-predict.py -f -a 0 data/newtest.txt math data/math_result.txt
!python text-predict.py -f -a 0 data/newtest.txt physics data/physics_result.txt
!python text-predict.py -f -a 0 data/newtest.txt qb data/qb_result.txt
!python text-predict.py -f -a 0 data/newtest.txt qf data/qf_result.txt
!python text-predict.py -f -a 0 data/newtest.txt stats data/stats_result.txt
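Each model produces a separate binary prediction per test document, so the six outputs still have to be merged into one row per document. The sketch below assumes the predicted labels have already been read into one list per model; how to parse the result files is not shown in this notebook and is left out here. The function name `combine_predictions` and the toy values are hypothetical.

```python
def combine_predictions(preds):
    """Merge per-model label lists into one dict per document (hypothetical helper)."""
    lengths = {len(labels) for labels in preds.values()}
    if len(lengths) > 1:
        raise ValueError("all models must predict the same number of documents")
    models = list(preds)
    # zip(*...) walks the lists in parallel, yielding one tuple of labels per document
    return [dict(zip(models, row)) for row in zip(*preds.values())]

# Toy values for illustration only, not real model output
rows = combine_predictions({"cs": [1, 0], "math": [0, 0], "physics": [0, 1]})
print(rows)
```

Because the classifiers are independent, a document can end up with several positive labels or none.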