# News Classification with Machine Learning

### import required libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as mp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

### load the dataset using pandas

In [2]:
dataset = pd.read_csv('news_classification.csv')

### just a glimpse of the data

In [3]:
dataset.head()

|   | news_headline | news_article | news_category |
|---|---|---|---|
| 0 | 50-year-old problem of biology solved by Artif... | DeepMind's AI system 'AlphaFold' has been reco... | technology |
| 1 | Microsoft Teams to stop working on Internet Ex... | Microsoft Teams will stop working on Internet ... | technology |
| 2 | Hope US won't erect barriers to cooperation: C... | China, in response to reports of US adding Chi... | technology |
| 3 | Global smartphone sales in Q3 falls 5.7% to 36... | The global smartphone sales in the third quart... | technology |
| 4 | EU hoping Biden will clarify US position on di... | The European Union (EU) is hoping that US Pres... | technology |


### check the total number of rows and columns

In [4]:
dataset.shape

(4817, 3)

### check whether any null values exist

In [5]:
dataset.isnull().sum()

news_headline    0
news_article     0
news_category    0
dtype: int64

There are no null values in this dataset.

In [6]:
dataset['news_category'].value_counts()

world            1021
entertainment     998
sports            856
technology        751
politics          546
science           389
automobile        256
Name: news_category, dtype: int64

### check for the data types

In [7]:
dataset.dtypes

news_headline    object
news_article     object
news_category    object
dtype: object

### drop the first column, news_headline, because it is not used as a feature

In [8]:
dataset = dataset.drop('news_headline',axis=1)

`CountVectorizer` converts raw text into a numerical vector of token counts (single words by default, or n-grams when `ngram_range` is set). These counts can be used directly as features (signals) in machine learning tasks such as text classification and clustering.

In [9]:
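# keep only the text column and the label column
dataset = dataset[["news_article","news_category"]]

x = np.array(dataset["news_article"])   # raw article text
Y = np.array(dataset["news_category"])  # category labels

# learn the vocabulary and build the document-term count matrix
CountV = CountVectorizer()
X = CountV.fit_transform(x)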
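
To make the count representation concrete, here is a minimal sketch on a two-sentence toy corpus. The sentences and the `toy_` names are invented for illustration and are not part of the dataset. With default settings, `CountVectorizer` lowercases the text, keeps tokens of two or more characters, and assigns column indices in alphabetical order.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration (not taken from the dataset)
toy_docs = ["The cat sat", "The cat sat on the mat"]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(toy_docs)  # sparse matrix: documents x vocabulary

# Learned vocabulary: each word is mapped to a column index
print(toy_vectorizer.vocabulary_)

# Dense view of the counts; fine for a toy corpus, costly for a large one
print(toy_counts.toarray())
```

Each row is one document and each column is one vocabulary word. The column for `the` holds 2 in the second row because the word occurs twice in that sentence.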

### Train & Test Dataset

In [10]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state=2)
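
The category counts shown earlier are uneven (1021 `world` articles against 256 `automobile`), so a purely random split can shift the class proportions between the training and test parts. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses invented toy labels and placeholder features to show its effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels, invented for illustration: 80 of one class, 20 of another
toy_y = np.array(["world"] * 80 + ["automobile"] * 20)
toy_X = np.arange(len(toy_y)).reshape(-1, 1)  # placeholder features

X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=2, stratify=toy_y
)

# The 80/20 class ratio is preserved in the test part (and in the training part)
labels, test_counts = np.unique(y_te, return_counts=True)
print(dict(zip(labels, test_counts)))
```

On the notebook's own variables, the equivalent change would be passing `stratify=Y` to the call above.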

In [11]:
print(X.shape)

(4817, 13167)


### Create a Multinomial Naive Bayes Model

In [12]:
model = MultinomialNB()

In [13]:
model.fit(X_train,Y_train)

MultinomialNB()

### Check Accuracy on the Training Set

In [16]:
model.score(X_train,Y_train)*100

96.67791331430054
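
The score above is computed on the same rows the model was fitted on, which tends to overstate how well it generalizes. Accuracy on held-out data is the more informative number. On the notebook's variables that would be `model.score(X_test, Y_test)`. The sketch below shows the same idea end to end on invented toy sentences, so its numbers say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy data, invented for illustration (not taken from the dataset)
train_texts = [
    "the team won the match",
    "player scored a goal in the match",
    "new smartphone released with faster chip",
    "software update fixes security bug",
]
train_labels = ["sports", "sports", "technology", "technology"]
heldout_texts = ["team scored goal", "smartphone software update"]
heldout_labels = ["sports", "technology"]

vec = CountVectorizer()
clf = MultinomialNB()
clf.fit(vec.fit_transform(train_texts), train_labels)  # fit on training data only

# transform (not fit_transform) reuses the vocabulary learned from training
heldout_X = vec.transform(heldout_texts)

train_acc = clf.score(vec.transform(train_texts), train_labels)
heldout_acc = clf.score(heldout_X, heldout_labels)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {heldout_acc:.2f}")
```

Unlike the notebook, the sketch fits the vectorizer on the training texts only, so no vocabulary from the held-out texts leaks into the model.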

### Let's test this Model

In [14]:
user_input = input("Enter News Article: ")
text = CountV.transform([user_input]).toarray()
output = model.predict(text)
output

Enter News Article: Anushka Sharma took to social media to share a throwback picture of herself doing shirshasana during her pregnancy with the support of her husband Virat Kohli. "[I] used the wall for support and also my very able husband supporting my balance, to be extra safe," she wrote. She added, "I'm so glad I could continue my practice through my pregnancy."


array(['entertainment'], dtype='<U13')
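
Prediction on new text only works if the input goes through the same fitted vectorizer as the training data. One way to keep the two steps together is scikit-learn's `make_pipeline`, sketched below on invented toy sentences. The names `news_pipeline` and `classify` are hypothetical helpers, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration (not taken from the dataset)
toy_texts = [
    "the team won the match",
    "player scored a goal in the match",
    "new smartphone released with faster chip",
    "software update fixes security bug",
]
toy_labels = ["sports", "sports", "technology", "technology"]

# The pipeline vectorizes raw strings and then classifies them
news_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
news_pipeline.fit(toy_texts, toy_labels)


def classify(article: str) -> str:
    """Return the predicted category for one raw article string."""
    return news_pipeline.predict([article])[0]


print(classify("the player scored in the match"))
```

The pipeline accepts raw strings directly, so the separate `transform` call and the `.toarray()` conversion are not needed. `MultinomialNB` works on the sparse matrix as it is.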

### Conclusion

- This dataset consists of news articles, each labeled with the kind of news it covers, such as sports or technology.
- I trained a Multinomial Naive Bayes model on it.
- The model takes a news article as input from the user and then predicts its category.