Feature extraction and a simple spam vs. ham classifier

In [14]:
import numpy as np
import pandas as pd

df = pd.read_csv('smsspamcollection.tsv', sep='\t')
df.head()

Unnamed: 0,label,message,length,punct
0,ham,"Go until jurong point, crazy.. Available only ...",111,9
1,ham,Ok lar... Joking wif u oni...,29,6
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,155,6
3,ham,U dun say so early hor... U c already then say...,49,6
4,ham,"Nah I don't think he goes to usf, he lives aro...",61,2


In [15]:
from sklearn.model_selection import train_test_split

X = df['message']  # this time we want to look at the text
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [37]:
# from sklearn.feature_extraction.text import CountVectorizer
# from sklearn.feature_extraction.text import TfidfTransformer
# TfidfVectorizer combines CountVectorizer and TfidfTransformer, so only TfidfVectorizer is imported

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Build a pipeline: TF-IDF vectorization followed by a linear SVM
text_clf = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', LinearSVC()),
])

# Feed the training data through the pipeline
text_clf.fit(X_train, y_train)

# Predict on the held-out test set and report the accuracy
predictions = text_clf.predict(X_test)
from sklearn import metrics
print(metrics.accuracy_score(y_test, predictions))

0.989668297988037
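
The commented-out imports in the cell above note that TfidfVectorizer combines CountVectorizer and TfidfTransformer. A minimal sketch of that equivalence with default settings, using a few made-up messages (not rows from the dataset):

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer, TfidfTransformer, TfidfVectorizer)

# Made-up toy messages, for illustration only
docs = [
    "free entry to win a prize",
    "are we still meeting for lunch",
    "win a free phone today",
]

# Two steps: raw token counts, then TF-IDF weighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# One step: TfidfVectorizer performs both
one_step = TfidfVectorizer().fit_transform(docs)

# Compare the two matrices element by element
same = np.allclose(two_step.toarray(), one_step.toarray())
print(same)
```

If non-default options are passed to one of the classes, the same options would need to be passed to its counterpart for the comparison to hold.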


Feature extraction using spaCy

In [6]:
import spacy

In [5]:
nlp = spacy.load('en_core_web_sm')  # load the small English model

# Create a Doc object

doc = nlp(u'I am an intern at Samsung Research Institute Bangalore')

# Print each token separately

for token in doc:
    print(token.text, token.pos_, token.dep_)

I PRON nsubj
am AUX ROOT
an DET det
intern NOUN attr
at ADP prep
Samsung PROPN compound
Research PROPN compound
Institute PROPN compound
Bangalore PROPN pobj


In [7]:
spacy.explain('PROPN')     # explaining the PROPN tag used above

'proper noun'

In [8]:
spacy.explain('nsubj')     # explaining the nsubj used above

'nominal subject'

In [9]:
doc2 = nlp(u'This is the first sentence. This is another sentence. This is the last sentence.')  # create a Doc containing three sentences

In [10]:
for sent in doc2.sents:            # spaCy separates each sentence
    print(sent)

This is the first sentence.
This is another sentence.
This is the last sentence.


In [12]:
from spacy import displacy        # displaCy renders the dependency parse
displacy.render(doc, jupyter=True)

Other token attributes
.text
.lemma_
.pos_
.tag_
.shape_
.is_alpha
.is_stop
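
Several of these attributes depend only on the tokenizer and vocabulary, so they can be tried without the full model. A small sketch, assuming a blank English pipeline from spacy.blank('en') and a made-up sentence; .lemma_, .pos_ and .tag_ are left out because they generally need a trained pipeline such as en_core_web_sm.

```python
import spacy

# Blank English pipeline: tokenizer only, no tagger or parser (assumption for this sketch)
nlp_blank = spacy.blank('en')
doc3 = nlp_blank('I joined Samsung in 2020')

# Collect (text, shape, is_alpha, is_stop) for every token
rows = [(t.text, t.shape_, t.is_alpha, t.is_stop) for t in doc3]
for row in rows:
    print(row)
```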

Using spaCy for feature extraction (for example, lemmas or part-of-speech tags) together with a classifier such as Naive Bayes can improve accuracy on NLP tasks.
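
One way to combine the two, as a hedged sketch: pass a spaCy-based tokenizer to CountVectorizer and train MultinomialNB on the resulting counts. The helper name spacy_tokens, the blank English pipeline, and the four toy messages are assumptions for illustration; whether this improves accuracy on the SMS data would need to be checked on the held-out test set.

```python
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tokenizer-only pipeline; a trained model could be swapped in to use lemmas
nlp_tok = spacy.blank('en')

def spacy_tokens(text):
    """Hypothetical helper: keep alphabetic, non-stop-word tokens."""
    return [t.text for t in nlp_tok(text) if t.is_alpha and not t.is_stop]

# Made-up toy data, for illustration only
texts = [
    'Win a free prize now',
    'Free entry to win cash',
    'Are we still meeting for lunch',
    'Please bring the report home tonight',
]
labels = ['spam', 'spam', 'ham', 'ham']

nb_clf = Pipeline([
    # token_pattern=None because the custom tokenizer replaces the default pattern
    ('bow', CountVectorizer(tokenizer=spacy_tokens, token_pattern=None)),
    ('nb', MultinomialNB()),
])
nb_clf.fit(texts, labels)

predicted = nb_clf.predict(['free prize'])
print(predicted[0])
```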

Using NLTK and its SentimentIntensityAnalyzer (VADER) tool for sentiment analysis

In [None]:
import nltk
nltk.download('vader_lexicon')

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
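
sid.polarity_scores(text) returns a dictionary with the keys 'neg', 'neu', 'pos' and 'compound'. A small sketch of turning that dictionary into a label; the helper name label_from_scores, the neutral_band parameter, and the hand-written score dictionaries are illustrative assumptions, not real VADER output.

```python
def label_from_scores(scores, neutral_band=0.0):
    """Map a VADER-style score dictionary to a sentiment label."""
    compound = scores['compound']
    if compound > neutral_band:
        return 'Positive'
    if compound < -neutral_band:
        return 'Negative'
    # Scores inside the band (including exactly 0) are treated as neutral
    return 'Neutral'

# Hand-written dictionaries in the shape polarity_scores returns (not real VADER output)
examples = [
    {'neg': 0.0, 'neu': 0.4, 'pos': 0.6, 'compound': 0.7},
    {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.6},
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0},
]
labels = [label_from_scores(s) for s in examples]
print(labels)
```

With the real analyzer the call would be label_from_scores(sid.polarity_scores('some text')). Unlike a plain compound > 0 check, this sketch reports a score of exactly 0 as Neutral.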

Using NLTK with a local server built on the Flask framework. The server can be used to communicate with Bixby.

In [None]:
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def me_api():
    def_user = {"Name": "Kartikey Singh", "Age": "22"}
    if request.method == 'POST':
        # Expect a JSON body with a "text" field
        user = request.get_json()
        # sid is the SentimentIntensityAnalyzer created in the previous cell
        scores = sid.polarity_scores(user['text'])
        # A compound score above 0 is labelled Positive; 0 or below is labelled Negative
        sentiment = 'Positive' if scores['compound'] > 0 else 'Negative'
        return jsonify({'predicted': sentiment}), 200
    # GET requests return the default user record
    return jsonify({"Name": def_user["Name"], "Age": def_user["Age"]})

if __name__ == '__main__':
    app.run()

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [11/Jul/2020 11:25:16] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [11/Jul/2020 12:15:07] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [11/Jul/2020 12:15:40] "POST / HTTP/1.1" 200 -
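
The POST requests in the log can be produced by any HTTP client that sends a JSON body with a "text" field. A minimal client sketch using only the standard library; the helper names build_request and post_text and the sample sentence are assumptions, and the URL is the address printed by the server above.

```python
import json
import urllib.request

SERVER_URL = 'http://127.0.0.1:5000/'

def build_request(text, url=SERVER_URL):
    """Build a POST request carrying {"text": ...} as JSON."""
    body = json.dumps({'text': text}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def post_text(text, url=SERVER_URL):
    """Send the request and return the decoded JSON reply (server must be running)."""
    with urllib.request.urlopen(build_request(text, url)) as resp:
        return json.loads(resp.read().decode('utf-8'))

# Building the request does not contact the server
req = build_request('I love this phone')
print(req.get_method(), req.full_url)
```

With the server running in another process, post_text('I love this phone') should return a dictionary with a 'predicted' key.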


To stop the server, restart the kernel.