**Problem Statement**

The dataset used here contains headlines, URLs, and categories for 422,937 news stories collected by a web aggregator between March 10th, 2014 and August 10th, 2014.

The news categories in this dataset are business; science and technology; entertainment; and health. Different news articles that refer to the same news item (e.g., several articles about recently released employment statistics) are also grouped together.

Source: Kaggle

In [None]:
from google.colab import drive
drive.mount('/content/drive')


**Acknowledgments**


This dataset comes from the UCI Machine Learning Repository. Any publications that use this data should cite the repository as follows:

Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

This specific dataset can be found in the UCI ML Repository at this URL



In [None]:
#Import libraries

import pandas as pd
import numpy as np
import spacy
from tqdm import tqdm
import re
import time
import pickle
pd.set_option('display.max_colwidth', 200)

In [None]:
#Read the dataset

dataframe=pd.read_csv('uci-news-aggregator.csv')

In [None]:
dataframe.head()

Unnamed: 0,ID,TITLE,URL,PUBLISHER,CATEGORY,STORY,HOSTNAME,TIMESTAMP
0,1,"Fed official says weak data caused by weather, should not slow taper","http://www.latimes.com/business/money/la-fi-mo-federal-reserve-plosser-stimulus-economy-20140310,0,1312750.story?track=rss",Los Angeles Times,b,ddUyU0VZz0BRneMioxUPQVP6sIxvM,www.latimes.com,1394470370698
1,2,Fed's Charles Plosser sees high bar for change in pace of tapering,http://www.livemint.com/Politics/H2EvwJSK2VE6OF7iK1g3PP/Feds-Charles-Plosser-sees-high-bar-for-change-in-pace-of-ta.html,Livemint,b,ddUyU0VZz0BRneMioxUPQVP6sIxvM,www.livemint.com,1394470371207
2,3,US open: Stocks fall after Fed official hints at accelerated tapering,http://www.ifamagazine.com/news/us-open-stocks-fall-after-fed-official-hints-at-accelerated-tapering-294436,IFA Magazine,b,ddUyU0VZz0BRneMioxUPQVP6sIxvM,www.ifamagazine.com,1394470371550
3,4,"Fed risks falling 'behind the curve', Charles Plosser says",http://www.ifamagazine.com/news/fed-risks-falling-behind-the-curve-charles-plosser-says-294430,IFA Magazine,b,ddUyU0VZz0BRneMioxUPQVP6sIxvM,www.ifamagazine.com,1394470371793
4,5,Fed's Plosser: Nasty Weather Has Curbed Job Growth,http://www.moneynews.com/Economy/federal-reserve-charles-plosser-weather-job-growth/2014/03/10/id/557011,Moneynews,b,ddUyU0VZz0BRneMioxUPQVP6sIxvM,www.moneynews.com,1394470372027
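The `TIMESTAMP` values shown above look like Unix epoch times in milliseconds. That is an assumption based on their magnitude, not something the dataset description here confirms. Under that assumption, a minimal sketch for turning them into readable dates (and so checking the March to August 2014 range) could look like this. The `PUBLISHED_AT` column name is made up for illustration, and the two rows are copied from the output above.

```python
import pandas as pd

# Two rows copied from the head() output above
sample = pd.DataFrame({
    "ID": [1, 2],
    "TIMESTAMP": [1394470370698, 1394470371207],
})

# Assumption: TIMESTAMP is milliseconds since the Unix epoch (UTC).
# PUBLISHED_AT is a hypothetical column name used only in this sketch.
sample["PUBLISHED_AT"] = pd.to_datetime(sample["TIMESTAMP"], unit="ms", utc=True)

print(sample[["ID", "PUBLISHED_AT"]])
print("earliest:", sample["PUBLISHED_AT"].min())
print("latest:  ", sample["PUBLISHED_AT"].max())
```

On the full dataframe, the same `pd.to_datetime` call followed by `.min()` and `.max()` would show whether the collection window matches the problem statement.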


In [None]:
# Let's count the rows for each class

dataframe['CATEGORY'].value_counts()



e    152469
b    115967
t    108344
m     45639
Name: CATEGORY, dtype: int64
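The single-letter codes correspond to the four categories named in the problem statement (`b` business, `t` science and technology, `e` entertainment, `m` health). As a small sketch, the counts printed above can be turned into readable labels and class shares. The counts are typed in by hand here so the snippet runs without the CSV file.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({"e": 152469, "b": 115967, "t": 108344, "m": 45639}, name="CATEGORY")

# Mapping from category code to category name
category_names = {
    "b": "business",
    "t": "science and technology",
    "e": "entertainment",
    "m": "health",
}

summary = pd.DataFrame({
    "category": counts.index.map(category_names),
    "count": counts.values,
    "share": (counts / counts.sum()).round(3).values,
})
print(summary)

# Ratio between the largest and the smallest class
imbalance_ratio = counts.max() / counts.min()
print(f"largest / smallest class: {imbalance_ratio:.2f}")
```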

As we can see, the classes are imbalanced, so in the next step we balance them by keeping 45,639 rows (the size of the smallest class) from each category.

In [None]:
# Keep the first 45,639 rows of each category (the size of the smallest class, 'm')
e = dataframe[dataframe['CATEGORY']=='e'][:45639]
b = dataframe[dataframe['CATEGORY']=='b'][:45639]
t = dataframe[dataframe['CATEGORY']=='t'][:45639]
m = dataframe[dataframe['CATEGORY']=='m'][:45639]
new_df=pd.concat([e,b,t,m])
new_df.head()


Unnamed: 0,ID,TITLE,URL,PUBLISHER,CATEGORY,STORY,HOSTNAME,TIMESTAMP
2169,2170,George Zimmerman Has an Armed Life on the Move,http://www.wltx.com/story/news/nation/2014/03/10/george-zimmerman-life-trayvon-gun/6275113/,WLTX.com,e,d7RBEwyH92gFSrMjpl764nNfewB0M,www.wltx.com,1394517154092
2170,2171,George Zimmerman Signs Autographs At Florida Gun Show,http://www.huffingtonpost.co.uk/2014/03/10/george-zimmerman-signed-autographs_n_4938486.html?utm_hp_ref=uk,Huffington Post UK,e,d7RBEwyH92gFSrMjpl764nNfewB0M,www.huffingtonpost.co.uk,1394517154269
2171,2172,George Zimmerman Signed Autographs at an Orlando Gun Show — But Only 20 ...,http://www.blacknews.com/news/george-zimmerman-signed-autographs-at-an-orlando-gun-show-but-only-20-people-showed-up101101.html,BlackNews.com (press release),e,d7RBEwyH92gFSrMjpl764nNfewB0M,www.blacknews.com,1394517154479
2172,2173,George Zimmerman back in controversy,http://www.wtxl.com/news/florida_news/george-zimmerman-back-in-controversy/article_0ca03ea0-a8c3-11e3-938c-001a4bcf6878.html,WTXL ABC 27,e,d7RBEwyH92gFSrMjpl764nNfewB0M,www.wtxl.com,1394517154639
2173,2174,George Zimmerman signs autographs at a Florida gun show,http://www.msnbc.com/the-last-word/zimmerman-signs-autographs-gun-show,MSNBC,e,d7RBEwyH92gFSrMjpl764nNfewB0M,www.msnbc.com,1394517154831
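The cell above takes the first 45,639 rows of each class, which are the earliest stories in file order. A common alternative is to draw a random sample of the same size from each class, so the balanced set is not dominated by the first weeks of the collection period. The sketch below shows that idea on a toy dataframe. The toy data, the `balanced` name, and the `random_state` value are illustrative assumptions and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real dataframe: 5 'e', 4 'b', 3 't', 2 'm' rows
toy = pd.DataFrame({
    "TITLE": [f"headline {i}" for i in range(14)],
    "CATEGORY": list("eeeeebbbbtttmm"),
})

# Size of the smallest class decides how many rows to keep per class
n_min = toy["CATEGORY"].value_counts().min()

# Randomly sample n_min rows from every class, then shuffle the result
balanced = (
    pd.concat(
        [group.sample(n=n_min, random_state=42) for _, group in toy.groupby("CATEGORY")]
    )
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["CATEGORY"].value_counts())
```

Fixing `random_state` keeps the sample reproducible between runs. On the real data, `toy` would be replaced by `dataframe`.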


## Create sentence embeddings


In [None]:
!pip install tensorflow==1.15.0
!pip install tensorflow_hub



In [None]:
import tensorflow_hub as hub
import tensorflow as tf

In [None]:
x=list(new_df['TITLE'])

In [None]:
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

In [None]:
# Build the embeddings tensor for the first 100 headlines.
# The "elmo" output holds one 1024-dimensional vector per token.
embeddings_tensor = elmo(x[:100],
    signature="default",
    as_dict=True)["elmo"]

INFO:tensorflow:Saver not created because there are no variables in the graph to restore




In [None]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    embeddings = sess.run(embeddings_tensor)
    print(embeddings.shape)
    print(embeddings)


(100, 16, 1024)
[[[-0.6392205  -0.65110976 -0.34721714 ... -0.04088937 -0.10432049
   -0.4799287 ]
  [-0.02087877 -0.3912147   0.03618242 ... -0.12097799  0.19085339
   -0.23720494]
  [-0.4240775   0.20260103 -0.23037198 ...  0.11948571  0.53770876
    0.02501076]
  ...
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0.01429836
   -0.01650422]
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0.01429836
   -0.01650422]
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0.01429836
   -0.01650422]]

 [[-0.6392205  -0.65110976 -0.34721714 ... -0.16485338 -0.18690848
   -0.29432395]
  [-0.02087877 -0.3912147   0.03618242 ... -0.49658692  0.11658592
   -0.03891286]
  [-0.06946222  0.77510756  0.078308   ...  0.14057837  0.16227551
   -0.11987554]
  ...
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0.01429836
   -0.01650422]
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0.01429836
   -0.01650422]
  [-0.02840841 -0.04353216  0.04130162 ...  0.02583168 -0
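The array above has shape `(100, 16, 1024)`: one 1024-dimensional vector per token, with shorter headlines padded to 16 positions (the repeated identical rows at the end of each headline look like padding). To get a single vector per headline, one option is to average only over the real tokens. The sketch below illustrates this masked mean pooling in NumPy on a small random array standing in for `embeddings`. The function name is hypothetical, and computing lengths by splitting on whitespace is an assumption about how the module tokenizes its input.

```python
import numpy as np

def masked_mean_pool(token_embeddings, lengths):
    """Average token vectors over the first lengths[i] positions of each sentence."""
    token_embeddings = np.asarray(token_embeddings, dtype=np.float32)
    lengths = np.asarray(lengths)
    max_len = token_embeddings.shape[1]
    # mask[i, j] is True where position j is a real token of sentence i
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    summed = (token_embeddings * mask[:, :, None]).sum(axis=1)
    # Guard against empty sentences to avoid division by zero
    return summed / np.maximum(lengths, 1)[:, None]

titles = ["Fed official says weak data caused by weather", "Stocks fall"]
# Assumption: tokens are whitespace-separated words
lengths = [len(title.split()) for title in titles]

# Random stand-in for the real (batch, max_tokens, dim) ELMo output
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(2, 8, 4)).astype(np.float32)

sentence_vectors = masked_mean_pool(fake_embeddings, lengths)
print(sentence_vectors.shape)
```

The module's output dictionary may also include a ready-made mean-pooled entry under the key `"default"`. That is worth checking against the module documentation before relying on it.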

In [None]:
# The embeddings created above can be fed into neural network models to solve machine learning problems.


