### Install AutoKeras
Uncomment the following line if AutoKeras is not installed yet.

In [None]:
#!pip install autokeras

### Getting the Articles dataset
This notebook trains a model to estimate the popularity score of a news article on social media platforms,
using the [News Popularity](https://archive.ics.uci.edu/ml/datasets/News+Popularity+in+Multiple+Social+Media+Platforms) dataset, collected in 2015-2016.

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import autokeras as ak

news_df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00432/Data/News_Final.csv")

### Showing some samples

In [None]:
news_df

| | IDLink | Title | Headline | Source | Topic | PublishDate | SentimentTitle | SentimentHeadline | Facebook | GooglePlus | LinkedIn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 732 | 299.0 | Microsoft’s OneDrive debacle shows its cloud c... | When Microsoft announced earlier this week tha... | Digital Trends via Yahoo! News | microsoft | 2015-11-08 12:15:00 | -0.166139 | -0.259052 | 6 | 0 | 1 |
| 734 | 294.0 | ‘Economy to improve in next 2 quarters’ | In the coming six months, there seems to be gr... | The Hindu | economy | 2015-11-08 12:54:00 | 0.114820 | 0.256116 | 2 | 0 | 3 |
| 736 | 292.0 | Get ready for a ton of Fedspeak (DJIA, SPY, SP... | The US economy had a blockbuster October. US c... | Business Insider | economy | 2015-11-08 13:07:00 | -0.055902 | -0.378927 | 27 | 2 | 22 |
| 738 | 328.0 | Microsoft to play a big part in Digital India | Bhaskar Pramanik, Chairman, Microsoft India, s... | DNA India | microsoft | 2015-11-08 16:47:00 | -0.018326 | 0.062500 | 11 | 1 | 1 |
| 741 | 201.0 | Dollar Goes From Savior to Scapegoat as Zimbab... | Zimbabwe freed its economy from the nightmare ... | Bloomberg | economy | 2015-11-08 20:41:00 | -0.079057 | 0.000000 | 61 | 0 | 32 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 93222 | 61866.0 | Microsoft operating chief Kevin Turner is leav... | Kevin Turner, the former Walmart executive who... | Recode | microsoft | 2016-07-07 14:20:11 | 0.037689 | -0.052129 | -1 | 4 | 16 |
| 93224 | 61839.0 | Microsoft set a new record by storing an OK Go... | Microsoft announced on Thursday that it has se... | Business Insider | microsoft | 2016-07-07 14:27:11 | -0.122161 | 0.118732 | -1 | 3 | 27 |
| 93229 | 61849.0 | Read Microsoft's Cringeworthy Millennial-Bait ... | For any corporate recruiter thinking about add... | Fortune | microsoft | 2016-07-07 15:06:11 | 0.051031 | 0.178885 | -1 | 0 | 6 |
| 93234 | 61851.0 | Stocks rise as investors key in on US economy ... | The June employment report is viewed as a cruc... | MarketWatch | economy | 2016-07-07 15:31:05 | 0.104284 | 0.044943 | -1 | 3 | 5 |



### Data preprocessing
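Since we want to estimate the popularity score (a number) of each article from its title and headline, we will use a regressor. First, though, we have to put the text data into a suitable format: we join each title and headline into a single string, and use the article's LinkedIn popularity score as the target.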


In [None]:
# Join each title and headline into one string and convert the pandas columns to NumPy arrays
text_inputs = np.array(news_df.Title + ". " + news_df.Headline).astype("str")
# Target: the article's popularity score on LinkedIn
media_success_outputs = news_df.LinkedIn.to_numpy(dtype="int")
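The cell above builds each model input as `Title + ". " + Headline`. If a headline is missing, pandas propagates the missing value and `astype("str")` would turn it into the literal text `"nan"`. The plain-Python sketch below shows the same concatenation with one assumption of ours: a missing title or headline (`None`) becomes an empty string. The helper name `combine_title_headline` and the toy strings are hypothetical.

```python
def combine_title_headline(title, headline):
    """Join a title and a headline as "Title. Headline" (hypothetical helper).

    Assumption: a missing value (None) is replaced by an empty string
    instead of becoming the literal text "nan".
    """
    title = title or ""
    headline = headline or ""
    return f"{title}. {headline}"


# Toy inputs for illustration only
titles = ["Microsoft to play a big part in Digital India", "Untitled"]
headlines = ["Example headline text.", None]

text_inputs = [combine_title_headline(t, h) for t, h in zip(titles, headlines)]
print(text_inputs)
```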

### Creating the data sets

In [None]:
# Split the dataset into a training set (80%) and a test set (20%)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(text_inputs, media_success_outputs, test_size=0.2, random_state=10)
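Conceptually, `train_test_split` with `test_size=0.2` shuffles the rows and holds out a fifth of them for testing, and `random_state` makes the shuffle reproducible. The stdlib sketch below illustrates that idea on row indices; `train_test_split_indices` is a hypothetical helper, and it does not reproduce scikit-learn's exact shuffle, so the same seed will not select the same rows.

```python
import random


def train_test_split_indices(n_samples, test_size=0.2, seed=10):
    """Shuffle row indices and hold out a fraction of them as the test set.

    Illustrative only: scikit-learn uses its own random generator, so this
    will not select the same rows as train_test_split for the same seed.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(round(n_samples * test_size))
    return indices[n_test:], indices[:n_test]  # (train, test)


train_idx, test_idx = train_test_split_indices(100, test_size=0.2, seed=10)
print(len(train_idx), len(test_idx))  # 80 20
```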

### Creating and training the models

In [None]:
# Initialize the text regressor
reg = ak.TextRegressor(max_trials=2) # AutoKeras tries at most 2 different models.

# EarlyStopping stops training once the validation loss has not improved for 2 epochs, to avoid overfitting.
cbs = [
    tf.keras.callbacks.EarlyStopping(patience=2),
]

# Search for the best model.
reg.fit(
    x_train,
    y_train,
    callbacks=cbs
)

Trial 2 Complete [00h 03m 44s]
val_loss: 14726.8974609375

Best val_loss So Far: 14726.8974609375
Total elapsed time: 00h 07m 11s
INFO:tensorflow:Oracle triggered exit
Epoch 1/9
Epoch 2/9
Epoch 3/9
Epoch 4/9
Epoch 5/9
Epoch 6/9
Epoch 7/9
Epoch 8/9
Epoch 9/9
INFO:tensorflow:Assets written to: ./text_regressor/best_model/assets
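The `EarlyStopping(patience=2)` callback used above monitors the validation loss by default and stops training once it has failed to improve for two consecutive epochs. The sketch below is a simplified, stdlib-only model of that rule, not the Keras implementation; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch after which training would stop, or None.

    Simplified model of the patience rule behind
    tf.keras.callbacks.EarlyStopping when monitoring a loss (lower is better).
    """
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # never ran out of patience


# Hypothetical validation losses: the best value is reached at epoch 3
print(early_stopping_epoch([200.0, 150.0, 140.0, 145.0, 141.0, 139.0]))  # 5
```

Note that epoch 6 would have improved on the best loss, which is the trade-off a small `patience` makes: faster runs at the risk of stopping slightly early.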


### Evaluating the best model

In [None]:
reg.evaluate(x_test, y_test)



[13944.20703125, 13944.20703125]
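`evaluate` returns the loss followed by the tracked metrics. For this regressor both appear to be the mean squared error, which is why the two numbers coincide. Because MSE is in squared units, taking its square root (RMSE) puts the error back on the scale of the target (LinkedIn popularity). A minimal sketch of that arithmetic, reusing the value printed above:

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


test_mse = 13944.20703125  # value reported by reg.evaluate above
test_rmse = math.sqrt(test_mse)  # same units as the LinkedIn score
print(round(test_rmse, 1))  # 118.1
```

In other words, the typical prediction error is on the order of a hundred, while many of the sample targets shown earlier are in single or double digits.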

### Visualizing the model

In [None]:
# First we export the model to a keras model
model = reg.export_model()

# Now, we print the model summary:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None,)]                 0         
_________________________________________________________________
expand_last_dim (ExpandLastD (None, 1)                 0         
_________________________________________________________________
text_vectorization (TextVect (None, 64)                0         
_________________________________________________________________
embedding (Embedding)        (None, 64, 32)            160032    
_________________________________________________________________
dropout (Dropout)            (None, 64, 32)            0         
_________________________________________________________________
conv1d (Conv1D)              (None, 62, 32)            3104      
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 60, 32)            3104  
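The parameter counts in this (truncated) summary can be checked by hand. The numbers are consistent with an embedding table of 5,001 entries × 32 dimensions and with 1D convolutions using kernel size 3, 32 filters and no padding (64 → 62 → 60). Those hyperparameters are inferred from the shapes and counts, not read from the exported model, so treat them as assumptions.

```python
def embedding_params(vocab_size, embedding_dim):
    # One trainable vector of length embedding_dim per vocabulary entry
    return vocab_size * embedding_dim


def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter), plus one bias per filter
    return kernel_size * in_channels * filters + filters


def conv1d_valid_length(length, kernel_size, stride=1):
    # Output length of a 1D convolution with "valid" (no) padding
    return (length - kernel_size) // stride + 1


print(embedding_params(5001, 32))  # 160032
print(conv1d_params(3, 32, 32))  # 3104
print(conv1d_valid_length(64, 3))  # 62
```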

### Predicting some samples

In [None]:
y_predicted = reg.predict(x_test[0:20])
for p in list(zip(x_test[0:20], y_test[0:20], [i[0] for i in y_predicted])):
    print(p)

('Obama guidance, press schedule March 1, 2016. McConnell, Reid .... President Barack Obama in Rancho Mirage, California where Monday he is hosting an ASEAN meeting. MANDEL NGAN/AFP/Getty Images.', 0, 1.938301)
('Microsoft Donates $1 Billion in Cloud Services to Nonprofits .... Microsoft Philanthropies announces a three-year program to put analytics and cloud computing into the hands of 70,000 academic, nonprofit', 11, 55.369102)
("Douglas Rushkoff Professor of Media Theory and Digital Economics .... That's because the digital economy is hurting the real economy, says media theorist Rushkoff explains more surprising facts about our digital economy in his", 59, 11.272597)
("Googling China's Economy Shows Shifting Sentiment. To get a flavor of the changing sentiment on China's economy, look no further than web searches made on Google. ", 0, 24.156607)
('Obama presses moves against tax evasion. Washington (AFP) - President Barack Obama said Friday that proposed laws to end the use of US-b
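Each printed tuple is `(text, true value, predicted value)`. To compare them, look at the absolute error of each prediction. The sketch below does this for the four complete rows printed above. Four samples are far too few to judge the model; this only shows how to read the output.

```python
# (true LinkedIn score, predicted value) for the four complete rows printed above
pairs = [(0, 1.938301), (11, 55.369102), (59, 11.272597), (0, 24.156607)]

abs_errors = [abs(actual - predicted) for actual, predicted in pairs]
mae = sum(abs_errors) / len(abs_errors)  # mean absolute error
print([round(e, 2) for e in abs_errors])  # [1.94, 44.37, 47.73, 24.16]
print(round(mae, 2))  # 29.55
```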

### Improving the model search

To get better results from the same small number of trials, we can narrow the search with AutoKeras's more advanced `AutoModel` API, which lets us customize the search space. For instance, if the text source has a large vocabulary (number of distinct words), we may want to build a custom pipeline in AutoKeras that increases the `max_tokens` parameter.
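To make the role of `max_tokens` concrete: an n-gram text vectorizer keeps only the most frequent tokens up to a vocabulary limit, and anything outside that vocabulary is ignored. The stdlib sketch below illustrates the concept under simplifying assumptions (lowercasing, whitespace tokenization, count vectors). It is not AutoKeras's implementation, and the helper names are hypothetical.

```python
from collections import Counter


def extract_ngrams(text, ngram_range=(1, 2)):
    """Word n-grams after lowercasing and splitting on whitespace."""
    words = text.lower().split()
    low, high = ngram_range
    return [
        " ".join(words[i:i + n])
        for n in range(low, high + 1)
        for i in range(len(words) - n + 1)
    ]


def build_vocabulary(texts, max_tokens, ngram_range=(1, 2)):
    """Keep only the max_tokens most frequent n-grams (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        counts.update(extract_ngrams(text, ngram_range))
    kept = [gram for gram, _ in counts.most_common(max_tokens)]
    return {gram: index for index, gram in enumerate(kept)}


def vectorize(text, vocabulary, ngram_range=(1, 2)):
    """Count vector over the vocabulary; out-of-vocabulary n-grams are dropped."""
    vector = [0] * len(vocabulary)
    for gram in extract_ngrams(text, ngram_range):
        if gram in vocabulary:
            vector[vocabulary[gram]] += 1
    return vector


texts = ["news microsoft cloud", "news microsoft", "news economy"]
vocab = build_vocabulary(texts, max_tokens=2, ngram_range=(1, 1))
print(vocab)  # {'news': 0, 'microsoft': 1}
print(vectorize("news news microsoft cloud", vocab, ngram_range=(1, 1)))  # [2, 1]
```

With a small `max_tokens`, rarer words such as "cloud" are silently discarded, which is why a corpus with a large vocabulary may benefit from a higher limit.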

In [None]:
# EarlyStopping callback to avoid overfitting
cbs = [tf.keras.callbacks.EarlyStopping(patience=2)]

input_node = ak.TextInput()
# Use n-grams as the block type, with a vocabulary of up to 20,000 tokens
output_node = ak.TextBlock(block_type='ngram', max_tokens=20000)(input_node)
# Regression output
output_node = ak.RegressionHead()(output_node)
# Initialize AutoKeras and search for the best model
automodel = ak.AutoModel(inputs=input_node, outputs=output_node,
                         objective='val_mean_squared_error', max_trials=2)
automodel.fit(x_train, y_train, callbacks=cbs)


Trial 2 Complete [00h 03m 49s]
val_mean_squared_error: 21457.705078125

Best val_mean_squared_error So Far: 21457.705078125
Total elapsed time: 00h 09m 43s
INFO:tensorflow:Oracle triggered exit
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
INFO:tensorflow:Assets written to: ./auto_model/best_model/assets


### Evaluate the custom model

In [None]:
# Evaluate the custom model with testing data
automodel.evaluate(x_test, y_test)



[13508.931640625, 13508.931640625]
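Comparing the two test results: the custom search lowered the test MSE by about 3% relative to the default `TextRegressor`. However, its best validation MSE during the search (21457.7) was higher than the default's (14726.9). With only two trials and a single train/test split, the difference may not be meaningful. The arithmetic, using the values printed above:

```python
default_test_mse = 13944.20703125  # reg.evaluate(x_test, y_test)
custom_test_mse = 13508.931640625  # automodel.evaluate(x_test, y_test)

relative_improvement = (default_test_mse - custom_test_mse) / default_test_mse
print(f"{relative_improvement:.1%}")  # 3.1%
```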