#**Spam Comments Detection**
Spam comment detection is a text classification task in machine learning. On social media platforms, spam comments are comments posted to redirect the user to another social media account, a website, or some other piece of content.

#Importing the necessary Python libraries and the dataset:

In [26]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB


In [20]:
data = pd.read_csv('/content/Youtube01-Psy.csv')
data.sample(5)

Unnamed: 0,COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS
257,z12dddjgvrnvtxkni22gsfdzqmu2z1qmm,Angela Flemming,2014-11-08T03:46:33,Still a very fun music video to watch! ﻿,0
74,z13osfxhtkfmwpxue234z3wimzmcs1k2x,Stefano Albanese,2014-11-02T12:04:36,http://www.guardalo.org/best-of-funny-cats-gat...,1
313,z13pv52hkmf4jn23g22nzx5zqr2gen1gv04,LBEProductions,2014-11-12T01:40:22,Hey guys can you check my channel out plz. I d...,1
331,z13supiartrcdr4la22xc3aripu2x1z3a,Ink Video Shorts,2014-11-13T02:33:53,Hey come check us out were new on youtube let ...,1
345,z13th1q4yzihf1bll23qxzpjeujterydj,Carmen Racasanu,2014-11-14T13:27:52,How can this have 2 billion views when there's...,0


In [21]:
data.head()

Unnamed: 0,COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS
0,LZQPQhLyRh80UYxNuaDWhIGQYNQ96IuCg-AYWqNPjpU,Julius NM,2013-11-07T06:20:48,"Huh, anyway check out this you[tube] channel: ...",1
1,LZQPQhLyRh_C2cTtd9MvFRJedxydaVW-2sNg5Diuo4A,adam riyati,2013-11-07T12:37:15,Hey guys check out my new channel and our firs...,1
2,LZQPQhLyRh9MSZYnf8djyk0gEF9BHDPYrrK-qCczIY8,Evgeny Murashkin,2013-11-08T17:34:21,just for test I have to say murdev.com,1
3,z13jhp0bxqncu512g22wvzkasxmvvzjaz04,ElNino Melendez,2013-11-09T08:28:43,me shaking my sexy ass on my channel enjoy ^_^ ﻿,1
4,z13fwbwp1oujthgqj04chlngpvzmtt3r3dw,GsMega,2013-11-10T16:05:38,watch?v=vtaRGgvGtWQ Check this out .﻿,1


**We only need the CONTENT and CLASS columns from the dataset for the rest of the task:**

In [22]:
# Copy the selection so later column assignments do not trigger SettingWithCopyWarning
data = data[["CONTENT", "CLASS"]].copy()
data.sample(5)

Unnamed: 0,CONTENT,CLASS
236,this comment is wrong﻿,0
126,PSY - GANGNAM STYLE (강남스타일) M/V: http://youtu....,0
71,plz check out fablife / welcome to fablife for...,1
248,"Why the fuck this keeps updated? Comments :""5 ...",0
216,Lol...I dunno how this joke gets a lot of like...,0


**The CLASS column contains the values 0 and 1, where 0 indicates not spam and 1 indicates spam. To make the labels easier to read, I will replace 0 and 1 with "Not Spam" and "Spam Comment":**

In [23]:
data["CLASS"] = data["CLASS"].map({0: "Not Spam",
                                   1: "Spam Comment"})


In [24]:
data.sample(10)

Unnamed: 0,CONTENT,CLASS
42,SUBSCRIBE TO ME AND I'LL SUBSCRIBE TO YOU! (Mu...,Spam Comment
186,most viewed video in the world﻿,Not Spam
112,This song never gets old love it.﻿,Not Spam
311,Please check out my vidios guys﻿,Spam Comment
336,To everyone joking about how he hacked to get ...,Not Spam
26,Hey guys! Im a 12 yr old music producer. I mak...,Spam Comment
223,Can anyone sub to my channel? :D﻿,Spam Comment
181,Please check out my vidios﻿,Spam Comment
118,It is 0 zero﻿,Not Spam
264,If you pause at 1:39 at the last millisecond y...,Not Spam
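
After relabeling, it can be worth checking that every row received a label and how the two classes are distributed. The sketch below is only an illustration: `toy` is a hypothetical four-row frame standing in for `data`, and `label_names`, `unmapped` and `class_counts` are names introduced here, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "CONTENT": ["check out my channel", "great song",
                "subscribe to me", "love this video"],
    "CLASS": [1, 0, 1, 0],
})

label_names = {0: "Not Spam", 1: "Spam Comment"}
toy["CLASS"] = toy["CLASS"].map(label_names)

# Series.map returns NaN for any value missing from the dictionary,
# so a non-zero count here would signal an unexpected class code.
unmapped = int(toy["CLASS"].isna().sum())

# Number of comments per label.
class_counts = toy["CLASS"].value_counts()

print(unmapped)
print(class_counts)
```

Running the same two checks on `data` would show whether the spam and not spam classes are roughly balanced before training.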


#**Training a Classification Model**


**Now let’s train a Machine Learning model to classify comments as spam or not spam. This is a binary classification problem, and I will use the Bernoulli Naive Bayes algorithm, which models each word as either present in or absent from a comment, to train the model:**

In [27]:
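# Raw comment text and its labels
x = np.array(data["CONTENT"])
y = np.array(data["CLASS"])

# Convert each comment into a vector of word counts
cv = CountVectorizer()
x = cv.fit_transform(x)

# Hold out 20% of the comments for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.2,
                                                random_state=42)

# Train the classifier and print its accuracy on the test set
model = BernoulliNB()
model.fit(xtrain, ytrain)
print(model.score(xtest, ytest))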

0.9857142857142858
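
The cell above fits `CountVectorizer` on all comments before splitting, so the vocabulary also contains words that occur only in test comments. A possible alternative, sketched here under assumptions, is to split the raw text first and wrap both steps in a scikit-learn `Pipeline`, so the vocabulary is learned from the training comments alone. The `comments` and `labels` lists are hypothetical toy data that keep the sketch self-contained; they are not the YouTube dataset, and the printed accuracy says nothing about the real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; in the notebook this would be
# data["CONTENT"] and data["CLASS"].
comments = [
    "check out my channel",
    "subscribe to my channel please",
    "visit my website for free stuff",
    "please check out my videos",
    "this song never gets old",
    "most viewed video in the world",
    "still a very fun music video",
    "love this song so much",
]
labels = ["Spam Comment"] * 4 + ["Not Spam"] * 4

# Split the raw text first, keeping the class ratio in both parts.
train_text, test_text, train_y, test_y = train_test_split(
    comments, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training text only,
# then trains the classifier on the resulting counts.
pipeline = make_pipeline(CountVectorizer(), BernoulliNB())
pipeline.fit(train_text, train_y)

accuracy = pipeline.score(test_text, test_y)
vocabulary = pipeline.named_steps["countvectorizer"].vocabulary_
print(accuracy)
print(len(vocabulary))
```

A fitted pipeline also accepts raw strings directly in `predict`, so no separate `cv.transform` call is needed at prediction time.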


**Now let’s test the model on a few sample comments, both spam and not spam:**

In [28]:
sample = "Check this out: https://thecleverprogrammer.com/"
# Use a new name so the DataFrame stored in `data` is not overwritten
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Spam Comment']


In [29]:
sample = "lack of information"
# Use a new name so the DataFrame stored in `data` is not overwritten
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Not Spam']


In [30]:
sample = "hello my friends"
# Use a new name so the DataFrame stored in `data` is not overwritten
features = cv.transform([sample]).toarray()
print(model.predict(features))

['Not Spam']
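
Testing comments one at a time gets repetitive, and the predicted label alone does not show how confident the model is. The sketch below shows one way to classify several comments at once and report the estimated spam probability with `predict_proba`. The helper `classify_comments` and the toy training comments are hypothetical, introduced only so the example runs on its own; the probabilities it prints come from the toy data, not from the model trained above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data, only so the sketch runs on its own.
train_comments = [
    "check out my channel",
    "subscribe to my channel please",
    "visit my website for free stuff",
    "this song never gets old",
    "most viewed video in the world",
    "still a very fun music video",
]
train_labels = ["Spam Comment"] * 3 + ["Not Spam"] * 3

pipeline = make_pipeline(CountVectorizer(), BernoulliNB())
pipeline.fit(train_comments, train_labels)


def classify_comments(fitted_pipeline, comments):
    """Return (comment, predicted label, spam probability) for each comment."""
    labels = fitted_pipeline.predict(comments)
    # The columns of predict_proba follow the order of classes_.
    spam_column = list(fitted_pipeline.classes_).index("Spam Comment")
    spam_probability = fitted_pipeline.predict_proba(comments)[:, spam_column]
    return list(zip(comments, labels, spam_probability))


results = classify_comments(pipeline, ["check out my website", "fun song"])
for comment, label, probability in results:
    print(f"{label:>12}  {probability:.3f}  {comment}")
```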


#**Summary**
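**Spam comment detection means classifying comments as spam or not spam. Spam comments on social media platforms are comments posted to redirect the user to another social media account, a website, or some other piece of content. In this notebook, a Bernoulli Naive Bayes model trained on word counts from the comments reached an accuracy of about 0.986 on the held-out 20% test split.**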