Note: The following code is for reference only. I am using a dataset from Kaggle: https://www.kaggle.com/competitions/sentiment-analysis-company-reviews/data

I am using only a small part of the dataset, as the main focus is on converting a Python scikit-learn (sklearn) model to an ONNX model.

In [10]:
# Importing Required libraries

In [38]:
import re

import numpy as np
import pandas as pd

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# The stop-word list is a separate NLTK download
nltk.download('stopwords')

# Loading Data

In [39]:
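import os

path = "sentiment-analysis-company-reviews"
# os.path.join keeps the path portable instead of hard-coding a Windows "\\" separator
df = pd.read_csv(os.path.join(path, "train.csv"))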
df.head(5)

| Unnamed: 0 | Id | Review | Rating |
|---|---|---|---|
| 0 | 0 | Very good value and a great tv very happy and ... | 5 |
| 1 | 1 | After 6 month still can't access my account | 3 |
| 2 | 2 | I couldn't make an official review on a produc... | 1 |
| 3 | 3 | Fantastic! Extremely easy to use website, fant... | 5 |
| 4 | 4 | So far annoyed as hell with this bt monthly pa... | 1 |


In [40]:
# Sampling a smaller part; comment out the line below if you want to use the entire dataset

In [41]:
df = df.sample(frac=0.1, replace=False, random_state=1)
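Ratings 2, 3 and 4 are rare in this data, so a plain 10% `df.sample` can shift the class balance slightly. A minimal alternative, assuming pandas 1.1 or later (for `GroupBy.sample`), samples 10% within each rating. The toy frame and the `stratified_sample` helper are hypothetical, not part of the original notebook.

```python
import pandas as pd


def stratified_sample(df: pd.DataFrame, label_col: str, frac: float, seed: int = 1) -> pd.DataFrame:
    """Sample `frac` of the rows within each label so class proportions are preserved."""
    return df.groupby(label_col).sample(frac=frac, random_state=seed)


# Hypothetical toy frame standing in for the Kaggle reviews
toy = pd.DataFrame({
    "Review": [f"review {i}" for i in range(100)],
    "Rating": [5] * 60 + [1] * 30 + [3] * 10,
})
small = stratified_sample(toy, "Rating", frac=0.1)
print(small["Rating"].value_counts().sort_index().to_dict())  # 3 ones, 1 three, 6 fives
```

In the notebook this would replace the sampling line with `df = stratified_sample(df, "Rating", frac=0.1)`.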

# Preprocessing

In [42]:
stemmer = PorterStemmer()
# Named stop_words so the nltk `stopwords` module is not shadowed (re-running the cell would fail otherwise)
stop_words = set(stopwords.words('english'))
# stop_words.update(["we're", "i", 'if', 'this', "im", "cant", "i'm"])

def lower_text(text):
    return text.lower()

def remove_number(text):
    # Numbers, including signs, decimals and forms such as "10:30" or "1,000"
    num = re.compile(r'[-+]?[.\d]*[\d]+[:,.\d]*')
    return num.sub('', text)

def remove_punct(text):
    # Strip these ASCII punctuation characters; curly quotes such as ’ are not in the list,
    # which is why forms like "i’v" survive in the logs below
    punctuations = '@#!?+&*[]-%.:/();$=><|{}^,' + "'`"
    for p in punctuations:
        text = text.replace(p, '')
    return text

def remove_quotes(text):
    return text.replace('"', '')

def remove_stopwords(text):
    return ' '.join(word for word in text.split() if word not in stop_words)

def stem(utterance):
    # Remove single letters surrounded by whitespace
    utterance = re.sub(r'\s+[a-zA-Z]\s+', ' ', str(utterance))
    # Remove a single letter at the start
    utterance = re.sub(r'^[a-zA-Z]\s+', ' ', utterance)
    # Substitute multiple spaces with a single space
    utterance = re.sub(r'\s+', ' ', utterance)
    utterance = utterance.lower()
    # Stemming with the Porter stemmer (not lemmatization)
    return ' '.join(stemmer.stem(word) for word in utterance.split())

def clean_text(text):
    text = lower_text(text)
    text = remove_number(text)
    text = remove_quotes(text)
    # text = remove_stopwords(text)  # optional; disabled in this run
    text = remove_punct(text)
    text = stem(text)
    return text
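To see what the regex steps do before stemming, here is a stem-free replica of `remove_number`, `remove_quotes`, `remove_punct` and the single-letter rules from `stem`, run on a made-up review. It uses only `re`, so it runs without NLTK; `clean_no_stem` is a hypothetical helper, not part of the notebook.

```python
import re

NUM_RE = re.compile(r'[-+]?[.\d]*[\d]+[:,.\d]*')   # same pattern as remove_number
PUNCT = '@#!?+&*[]-%.:/();$=><|{}^,' + "'`"       # characters removed by remove_punct


def clean_no_stem(text: str) -> str:
    """Apply the notebook's cleaning steps except Porter stemming."""
    text = NUM_RE.sub('', text.lower())
    text = text.replace('"', '')
    for p in PUNCT:
        text = text.replace(p, '')
    text = re.sub(r'\s+[a-zA-Z]\s+', ' ', text)   # drop isolated single letters
    text = re.sub(r'^[a-zA-Z]\s+', ' ', text)     # drop a single letter at the start
    return re.sub(r'\s+', ' ', text).strip()      # collapse whitespace like split/join


print(clean_no_stem("After 6 months, I still can't access my account!"))
# after months still cant access my account
```

The number, the punctuation and the isolated "i" disappear and "can't" becomes "cant"; the Porter stemmer then shortens words further, which is why the logs below show forms such as "servic" and "deliveri".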

In [43]:
df["clean_input"] = df["Review"].apply(clean_text)

In [44]:
X = df['clean_input'].tolist()
y = df['Rating'].to_numpy()

In [45]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
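Because ratings 2 to 4 are rare, a random 80/20 split can leave the test set with a slightly different class mix than the training set. A sketch of the same split with scikit-learn's `stratify` parameter, on hypothetical toy labels; in the notebook this would be `train_test_split(X, y, test_size=0.20, random_state=42, stratify=y)`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels standing in for the 1-5 star ratings
texts = np.array([f"review {i}" for i in range(200)])
labels = np.array([5] * 110 + [1] * 60 + [4] * 15 + [2] * 10 + [3] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.20, random_state=42, stratify=labels
)

# With stratify, each rating keeps its share of the data in the test split
for rating in np.unique(labels):
    print(rating, round(float(np.mean(labels == rating)), 3), round(float(np.mean(y_te == rating)), 3))
```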

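Always bundle the components (e.g. vectorizer, transformer, classifier) into a single `Pipeline` instead of training them separately. A pipeline does not optimize the steps jointly (each step is fitted in turn on the output of the previous one), but it keeps preprocessing and model together, so the same transformations are applied at training and prediction time; cross-validation and grid search refit every step on each training fold without leaking test data; and here it lets the whole model be converted into a single ONNX model.
As this is sample code, I am not doing any preprocessing beyond the text cleaning above.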
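What the pipeline does enable is searching the parameters of all steps together. A minimal sketch with `GridSearchCV`, on a hypothetical toy corpus and a deliberately small grid: parameters are addressed as `<step name>__<parameter>`, and for each candidate the vectorizer is refitted on the training folds only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ("countVectorizer", CountVectorizer()),
    ("tfidfconverter", TfidfTransformer()),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# "<step name>__<parameter>" addresses a parameter of one step inside the pipeline
param_grid = {
    "countVectorizer__ngram_range": [(1, 1), (1, 2)],
    "classifier__max_depth": [None, 10],
}

# Hypothetical toy corpus standing in for the cleaned reviews
docs = ["great servic fast deliveri", "aw servic never again", "love it great product",
        "terribl custom servic", "quick easi great", "bad broken refund"] * 5
stars = [5, 1, 5, 1, 5, 1] * 5

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(docs, stars)
print(search.best_params_)
```

`search.best_estimator_` is itself a fitted `Pipeline`, so it could be passed to the ONNX converter in place of `model_pipeline`.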

In [46]:
model_pipeline = Pipeline(steps=[
    ('countVectorizer', CountVectorizer(max_features=1500, min_df=1, max_df=0.75, ngram_range=(1, 3))),
    ('tfidfconverter', TfidfTransformer()),
    ('classifier', RandomForestClassifier(n_estimators=1000, random_state=0)),
])
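scikit-learn documents `TfidfVectorizer` as equivalent to `CountVectorizer` followed by `TfidfTransformer`, so the first two steps could be collapsed into one. A quick check on three made-up documents, with the same settings on both sides (any `max_features` or `max_df` would also have to match):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer

# Hypothetical cleaned reviews
docs = ["great servic fast deliveri", "aw servic never again", "great product great price"]

# Two steps, as in model_pipeline
two_step = TfidfTransformer().fit_transform(
    CountVectorizer(ngram_range=(1, 3)).fit_transform(docs)
)
# One step
one_step = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(docs)

print(np.allclose(two_step.toarray(), one_step.toarray()))  # True
```

Keeping two steps, as above, is equally valid; the single-step form only makes the pipeline shorter.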

In [47]:
model_pipeline.fit(X_train, y_train)

In [48]:
y_pred = model_pipeline.predict(X_test)

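Don't mind the accuracy, as this is just sample modelling. Note from the confusion matrix that the model never predicts ratings 2, 3 or 4, which are under-represented in the sample (46, 24 and 88 of the 1,200 test reviews).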

In [49]:
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

[[346   0   0   0  35]
 [ 35   0   0   0  11]
 [ 14   0   0   0  10]
 [ 20   0   0   0  68]
 [ 29   0   0   0 632]]
              precision    recall  f1-score   support

           1       0.78      0.91      0.84       381
           2       0.00      0.00      0.00        46
           3       0.00      0.00      0.00        24
           4       0.00      0.00      0.00        88
           5       0.84      0.96      0.89       661

    accuracy                           0.81      1200
   macro avg       0.32      0.37      0.35      1200
weighted avg       0.71      0.81      0.76      1200

0.815


(scikit-learn emitted `UndefinedMetricWarning`s here: precision and F-score are undefined for ratings 2, 3 and 4 because the model never predicts them, so they are reported as 0.00. Pass `zero_division=0` to `classification_report` to silence the warning.)
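The headline numbers can be recomputed by hand from the confusion matrix printed above (rows are true ratings 1 to 5, columns are predicted ratings): accuracy is the diagonal over the total, recall divides the diagonal by row sums and precision divides it by column sums. The all-zero columns for ratings 2 to 4 are why precision is undefined there.

```python
import numpy as np

# Confusion matrix from the scikit-learn run above (rows = true, columns = predicted)
cm = np.array([
    [346, 0, 0, 0, 35],
    [35, 0, 0, 0, 11],
    [14, 0, 0, 0, 10],
    [20, 0, 0, 0, 68],
    [29, 0, 0, 0, 632],
])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)     # per true rating
predicted = cm.sum(axis=0)                # how often each rating was predicted
with np.errstate(divide="ignore", invalid="ignore"):
    precision = np.diag(cm) / predicted   # nan where a rating is never predicted

print(accuracy)  # 0.815
print(recall.round(2))
print(precision.round(2))
```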


# Convert the scikit-learn model to ONNX

In [50]:
from skl2onnx.common.data_types import StringTensorType
import onnx
import onnxmltools
import onnxruntime as rt

In [51]:
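# Define initial_types based on the type of your input data: one string (the cleaned review) per row,
# matching the [n_samples, 1] array built at inference time
input_type = [('input', StringTensorType([None, 1]))]

# Convert the whole pipeline (vectorizer, TF-IDF, classifier) to an ONNX model
onnx_model = onnxmltools.convert_sklearn(model_pipeline, initial_types=input_type)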

In [52]:
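import os

os.makedirs('models', exist_ok=True)  # onnx.save does not create missing folders
onnx_path = os.path.join('models', 'ReviewSentimentAnalysis.onnx')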

In [53]:
onnx.save(onnx_model, onnx_path)

# Inference with ONNX Runtime

Below is the inference code, which uses the saved ONNX model for prediction. ONNX Runtime supports a wide range of languages, such as Python, C++, C#, Java and JavaScript, so you can rewrite this inference code to suit your requirements; I am only giving reference code in Python.


In [54]:
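# Load the saved ONNX model into an ONNX Runtime session
sess = rt.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

def predictOnnxNew(text):
    # The model expects a 2-D string tensor of shape [n_samples, 1];
    # builtin str replaces np.str, which was removed in NumPy 1.24
    input_data = np.array([text], dtype=str).reshape(-1, 1)

    result = sess.run(None, {'input': input_data})

    label_num = result[0][0]                  # predicted rating
    probability = result[1][0][label_num]     # second output: one {label: probability} dict per row

    print("Onnx label predicted " + str(label_num) + "    " + str(probability) + "    " + text)
    return label_num, probability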
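Calling `run` once per review works but is slow for large inputs. A sketch of batched inference, assuming the converter's default classifier outputs as indexed in `predictOnnxNew` (first output: labels; second: a list of `{label: probability}` dicts). `predict_batch` and `FakeSession` are hypothetical; `FakeSession` only mimics `InferenceSession.run` so the sketch can be tried without a model file.

```python
import numpy as np


def predict_batch(session, texts, input_name="input"):
    """Run all texts through the session in one call; return labels and their probabilities."""
    batch = np.array(texts, dtype=str).reshape(-1, 1)   # shape [n_samples, 1]
    labels, prob_maps = session.run(None, {input_name: batch})[:2]
    # Each prob_map is a {label: probability} dict; keep the probability of the predicted label
    probs = [float(pm[int(lbl)]) for lbl, pm in zip(labels, prob_maps)]
    return np.asarray(labels), np.asarray(probs)


class FakeSession:
    """Hypothetical stand-in with the same run(output_names, feeds) call shape."""

    def run(self, output_names, feeds):
        batch = feeds["input"]
        labels = np.array([5 if "great" in t else 1 for t in batch[:, 0]], dtype=np.int64)
        maps = [{1: 0.2, 5: 0.8} if lbl == 5 else {1: 0.7, 5: 0.3} for lbl in labels]
        return [labels, maps]


labels, probs = predict_batch(FakeSession(), ["great value", "aw servic"])
print(labels, probs)
```

With the real model this would be `labels, probs = predict_batch(rt.InferenceSession(onnx_path), X_test)`.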

In [55]:
output_list = []
probability_list = []
for user_input in X_test:
    y_label, probability = predictOnnxNew(user_input)
    output_list.append(y_label)
    probability_list.append(probability)

(When this cell was run with `dtype=np.str`, NumPy emitted a `DeprecationWarning`: `np.str` was deprecated in NumPy 1.20 and removed in 1.24, and the builtin `str` is the replacement. Details: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations)


Onnx lable predicted 1    0.8860011100769043    aw compani order pixel on friday nd dec no commun despit the promis that you would be email at everi point in the process and no phone by the follow tuesday so had to call them they explain the handset had been delay and would arriv the next day it did but the sim had not been provis end up call them everi day for the follow few day be on hold for over an hour each time to be told just to wait turn the phone off and on again have put the sim in correctli leav the phone off for hour none of which work final call them on friday the th as the sim still did not work and decid want to cancel the contract and return the handset had an extrem difficult convers with the man on the phone who talk over me tri to transfer me refus to listen to what wa say and tri to make it out like it wa some horribl breakup which is ridicul and annoy and made me want to leav more then said he couldnt do it as the sim hadnt been connect eventu manag it but have not

Onnx lable predicted 1    0.8930010795593262    absolut aw couldnt get signal when on call couldnt hear the other person or they couldnt hear me contact three and they told me it would be best to end my contract and send the phone back they took £ from my account then refus to give me pac number after week wa final given one theyv refus to refund me theyv decid to credit an account no longer hold with them ive had to contact citizen advic as thi ha now been an ongo issu for month stay clear of three custom servic is aw ive put in complaint numer time to be told we will get back to you and surpris surpris dont heart anyth dont wast your time
Onnx lable predicted 5    0.9140011668205261    easi to use platform and love end product have use mani time
Onnx lable predicted 5    0.7730008363723755    although there wa an issu with the first product order mainli around the deliveri or lack thereof rais complaint and the probelm wa resolv with issu and new product wa deliv the next day
Onnx lable predicted 5    0.838001012802124    order on line for click and collect everyth happen exactli as the web site said it would perfect
Onnx lable predicted 1    0.5660004019737244    bought laptop onlin and it wa damag call to store with the laptop and wa told had to return onlin contact them and advis of damag and want it replac got refund instead of replac and had to call back to store to buy anoth one idid not think thi wa good custom experi at all
Onnx lable predicted 5    0.6930006742477417    got it for my wife so didnt use it but she veri happi with it
Onnx lable predicted 5    0.6010004878044128    got an email from shell energi reduc my monthli dd howev with the winter ahead want to maintain my current dd amount went onlin chat and spoke to alen who resolv it veri quickli and easili thank you
Onnx lable predicted 5    0.7820008993148804    deliv befor schedul date veri good servic
Onnx lable predicted 1    0.5020002722740173    i’v had lot and mean lot of thing from f

Onnx lable predicted 5    0.8950010538101196    product like new with all document and bit and refurbish price
Onnx lable predicted 5    0.7850008606910706    polit and help solv the problem and came back to check they had
Onnx lable predicted 5    0.9570012092590332    easi to deal with order part requir and deliv on time no prob
Onnx lable predicted 1    0.6720006465911865    order my broadband on the th of septemb with the understand that it would take week to set up and on the go live date the th could plug in the router and id be golden the engin didnt show up and to cut long stori short after mani call with numer complaint lodg we get our next updat on the th of octob but get £ in compens off the first bill my advic to anyon consid go with now tv is do it but the instant someth goe wrong just leav you wont get it fix or resolv and anoth compani will work harder to keep you
Onnx lable predicted 5    0.6880006790161133    when my deliveri didnt arriv they immedi sent out replac cus

Onnx lable predicted 5    0.942001223564148    prompt deliveri valu for money qualiti of product
Onnx lable predicted 5    0.442000150680542    birthday present so not yet use veri quick despatch but let down by herm courier fail to arriv on occas and when it did it wa just left on doorstep in the rain and an email sent state left in safe place with photo that can onli describ as wet concret disgrac servic
Onnx lable predicted 5    0.9950013160705566    the right part deliv on time
Onnx lable predicted 5    0.6630005836486816    great store alway got that one thing you haven’t seenplay in year price are fantast
Onnx lable predicted 1    0.5520004034042358    broken link hard to navig account or even access it no commun been tri for day anda hald to chang my password so can login but it not happen just crap
Onnx lable predicted 5    0.8920010924339294    absolut mint got an incred deal on phone and wa abl to chang network provid with eas
Onnx lable predicted 1    0.5710004568099976    t

Onnx lable predicted 1    0.8730010390281677    diswash now been broken dwn over two mth ago tri to arrang for teamknow to come and get it fix tri to arrang date and time wa like pull my teeth out told they dont give time slot explain that am home alway after pm and that would be ideal for me ha work on the commun with complex care dont get much time off they say no time slot could be morn could be afternoon eventu got for the th thi month have to book day off got the nw motor fit then anoth problem with the dishwash so onc again tri to get them out to suit my work is crazi explain that am not book yet anoth day off told not much they can do they blame the engin the engin blame the teamknow book staff it joke ask for mon th of june when it will be my day off told they dont book that in advanc when you phone anybodi els like plumber so on and so they do there upmost to fit around you then you get told that they onli do mon wed and friday never experienc anythink like thi but they dont m

Onnx lable predicted 5    0.872001051902771    ebrahiem an except custom rep he remind me of when wa stage technic support engin for microsoft he tick all my box my custom experi with him wa top notch thank again for the experi
Onnx lable predicted 1    0.7510008215904236    veri poor liter been with them for over year ask if can go sim onli wa advis there wa an earli upgrad fee thi earli upgrad fee stay the same everi singl month even though they continu to take monthli payment from me that asid from the fact that onc you take out contract they put your monthli payment up pretti much straight away no doubt ill be better off with differ provid with had done thi year ago tbh
Onnx lable predicted 1    0.8440009951591492    thi compani are so bad they take your money then when your dryer set on fire they take week to get an engin out do not buy their insur product as it not worth the paper it written on also spent hour on the phone tri to resolv without success still no dryer or engin
Onnx lable predicted 5    0.9850013256072998    so easi to use and great qualiti product with quick deliveri couldn’t ask for more 😊
Onnx lable predicted 5    0.8770010471343994    amina wa so help in get my issu dealt with custom servic wa spot on and veri effici thank you again amina
Onnx lable predicted 5    0.9860012531280518    quick deliveri good servic and veri cheap
Onnx lable predicted 5    0.9770012497901917    great wash machinedeliveri and instal wa fast and effici would recommend
Onnx lable predicted 5    0.8610010743141174    thi is my second purchas from envirofon and am pleas with both product which arriv just as describ servic is prompt and effici and easi to use and am especi glad to be use recycl product thank
Onnx lable predicted 5    0.8930010795593262    quick and easi to use mazuma keep you updat at each stage and payment is swift alway use to recycl item
Onnx lable predicted 5    0.7180007696151733    excel servic and they they let you know of what is go on from

Onnx lable predicted 1    0.4510001838207245    my wife said she would like to buy coffe machin for ourselv we look at web page and then decid to visit curri in plymouth we were serv by young man call nadeem who wa veri knowledg and told us which would suit our need he went further and explain variou option open to us should we have problem with the machin hi present of the particular machin convinc us and we purchas the machin
Onnx lable predicted 5    0.8400009870529175    excel sale ladi who made my choic easi fitter arriv at agre time and instal the oven with no mess or delay
Onnx lable predicted 1    0.9210011959075928    if could give minu star would both bt and openreach are complet incompet recent move home to semi detach hous the area live in ha had the benefit of most properti be on full fibr to premis upgrad as result the tenant befor me took advantag of thi and had fftp broadband prior to me move in as result have the fibr infrastructur run around the outsid of my hous the 

Onnx lable predicted 1    0.5220003724098206    it am am call custom servic to renew my month wifi contract thi is the onli way to renew when call thank you for reach now tv the renew depart is close at the moment our open time are to day week lol
Onnx lable predicted 5    0.5840004682540894    wa in urgent need of painter and amaz ladi came and did an fantast job in my hallway and land such fantast price aswel im over the moon with their work today spoke to rachel over the phone and we went through all the detail both ladi are incred and help me with odd touch up on my radiat and skirt board they gave me some bit to use to get mark off aswel they were realli love and chat to my daughter so happi with their work would definit use them again thank you x
Onnx lable predicted 5    0.7870008945465088    there were no problem and would do busi with them again
Onnx lable predicted 1    0.7970008850097656    absolut piec of junk constant buffer issu despit have fibr optic broadband that work 

Onnx lable predicted 5    0.5840004682540894    deliveri at the local depot for three day ask ebuy for help the item return to them enot system close down on thi purchas possibl due to me ask for refund
Onnx lable predicted 5    0.7090007066726685    purchas k it wa so easi to speak to one of their peopl had problem with an old and the ladi sort me out with patienc and great deal of help eventu decid to purchas new deliveri wa next day and cannot fault them at all love peopl and great help not to mention wonder vacuum cleaner
Onnx lable predicted 5    0.9820012450218201    quick effici excel defin recommend
Onnx lable predicted 1    0.771000862121582    veri disappoint as had taken an afternoon of work to take deliveri of an integr wash machin that wa then to be instal all paid for and confirm of thi receiv also paid for the deliveri time so could get time off work when the deliveri guy turn up they were just that deliveri guy need anoth team to instal so they left my wash machin and t

Onnx lable predicted 5    0.9760012626647949    easi websit to use good order and deliv next day excel all round
Onnx lable predicted 5    0.9820013046264648    place order on monday night deliv tuesday dinnertim
Onnx lable predicted 1    0.7550008296966553    had virgin instal on st dec when the engin came to instal it the cabl wa alreadi damag so the instal guy couldnt set it up the engin came out the same day and laid anoth cabl but the servic still didnt work so on rd dec call virgin who said anoth engin would need to come out to inspect the problem but the earliest appoint wa th januari so no internet over christma and new year when the engin did come out he again found it wa anoth damag cabl and would arrang an engin to come out the follow day to repeat the process but after wait in all day no one turn up again contact virgin who said the engin had not been book and the earliest they could get one out wa the th feb absolut disgust they want me to go without internet for week extr

Finally, I check that the scikit-learn model and the ONNX model give the same results on the test set.

In [56]:
testy_pred = np.array(output_list)

print(confusion_matrix(y_test,testy_pred))
print(classification_report(y_test,testy_pred))
print(accuracy_score(y_test, testy_pred))

[[345   0   0   0  36]
 [ 35   0   0   0  11]
 [ 14   0   0   0  10]
 [ 20   0   0   0  68]
 [ 29   0   0   0 632]]
              precision    recall  f1-score   support

           1       0.78      0.91      0.84       381
           2       0.00      0.00      0.00        46
           3       0.00      0.00      0.00        24
           4       0.00      0.00      0.00        88
           5       0.83      0.96      0.89       661

    accuracy                           0.81      1200
   macro avg       0.32      0.37      0.35      1200
weighted avg       0.71      0.81      0.76      1200

0.8141666666666667


(scikit-learn again emitted `UndefinedMetricWarning`s because ratings 2, 3 and 4 are never predicted.)
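The two confusion matrices differ for rating 1 (346 vs 345 correct), so at least one test review gets a different prediction from the ONNX model (accuracy 0.815 vs 0.8142). Possible causes include the converted model computing in 32-bit floats and differences between scikit-learn's tokenizer and the ONNX one; looking at the differing reviews is the quickest way to tell. A small hypothetical helper for that:

```python
import numpy as np


def compare_predictions(sk_pred, onnx_pred):
    """Return the agreement rate and the indices where the two models disagree."""
    sk_pred, onnx_pred = np.asarray(sk_pred), np.asarray(onnx_pred)
    mismatch = np.flatnonzero(sk_pred != onnx_pred)
    return 1.0 - mismatch.size / sk_pred.size, mismatch


# Toy predictions; in the notebook: compare_predictions(y_pred, testy_pred)
rate, idx = compare_predictions([1, 5, 5, 1], [1, 5, 1, 1])
print(rate, idx)  # 0.75 [2]
```

In the notebook, `[X_test[i] for i in idx]` then lists the cleaned reviews on which the two models disagree.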
