## Predicting the Genre of Books from Summaries

This portfolio is about predicting the genre of a book from its plot summary. We'll use a set of book summaries from the [CMU Book Summaries Corpus](http://www.cs.cmu.edu/~dbamman/booksummaries.html) in this experiment. It contains a large number of summaries (16,559) and includes metadata about the genres of the books, taken from Freebase. Each book can have more than one genre, and there are 227 genres listed in total. To simplify the problem of genre prediction, we select a small number of target genres that occur frequently in the collection and keep only the books carrying one of these labels. This gives us one genre label per book.

Our goal in this portfolio is to take this data and build a predictive model that classifies each book into one of the five target genres. We will extract features from the summary text and train a classifier to predict the genre of the book.

In [1]:
import re

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from sklearn import preprocessing
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
%matplotlib inline

# The NLTK stopword corpus must be downloaded once before it can be loaded
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))  # a set gives fast membership tests

## Data Preparation

Our first task is to read the data. It is distributed in tab-separated format with no header row, so we call `read_csv` with `sep="\t"` and `header=None` and supply the column names ourselves; the field names come from the [ReadMe](data/booksummaries/README.txt) file. We also pass `keep_default_na=False` so that empty fields (for example a missing date or genre list) are read as empty strings rather than `NaN`.

In [2]:
names = ['wid', 'fid', 'title', 'author', 'date', 'genres', 'summary']

books = pd.read_csv("data/booksummaries/booksummaries.txt", sep="\t", header=None, names=names, keep_default_na=False)
books.head()

| | wid | fid | title | author | date | genres | summary |
|---|---|---|---|---|---|---|---|
| 0 | 620 | /m/0hhy | Animal Farm | George Orwell | 1945-08-17 | {"/m/016lj8": "Roman \u00e0 clef", "/m/06nbt":... | Old Major, the old boar on the Manor Farm, ca... |
| 1 | 843 | /m/0k36 | A Clockwork Orange | Anthony Burgess | 1962 | {"/m/06n90": "Science Fiction", "/m/0l67h": "N... | Alex, a teenager living in near-future Englan... |
| 2 | 986 | /m/0ldx | The Plague | Albert Camus | 1947 | {"/m/02m4t": "Existentialism", "/m/02xlf": "Fi... | The text of The Plague is divided into five p... |
| 3 | 1756 | /m/0sww | An Enquiry Concerning Human Understanding | David Hume | | | The argument of the Enquiry proceeds by a ser... |
| 4 | 2080 | /m/0wkt | A Fire Upon the Deep | Vernor Vinge | | {"/m/03lrw": "Hard science fiction", "/m/06n90... | The novel posits that space around the Milky ... |
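The `keep_default_na=False` argument matters for the genre filtering that follows. A minimal sketch on a hypothetical two-row string (not the real corpus file) shows the difference: with the default settings an empty field becomes `NaN`, and `str.contains` then returns `NaN` for that row, which pandas refuses to use as a boolean row mask. With `keep_default_na=False` the empty field stays an empty string and the mask is plain booleans.

```python
import io

import pandas as pd

# Hypothetical tab-separated data in the same style as the corpus; row 2 has an empty genres field
demo_raw = "1\tBook A\tFantasy\n2\tBook B\t\n"
demo_names = ["wid", "title", "genres"]

default = pd.read_csv(io.StringIO(demo_raw), sep="\t", header=None, names=demo_names)
kept = pd.read_csv(io.StringIO(demo_raw), sep="\t", header=None, names=demo_names,
                   keep_default_na=False)

print(default["genres"].isna().tolist())               # [False, True]: the empty field became NaN
print(kept["genres"].str.contains("Fantasy").tolist())  # [True, False]: safe to use as a row mask
```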


We next filter the data so that only books carrying one of our target genre labels are kept, and we assign each book exactly one of those labels. A book can carry two of these labels (e.g. Science Fiction and Fantasy); because the loop below overwrites earlier matches, the label that appears last in `target_genres` is the one that is kept.

In [3]:
target_genres = ["Children's literature",
                 'Science Fiction',
                 'Novel',
                 'Fantasy',
                 'Mystery']

# create a Series of empty strings the same length as the list of books
genre = pd.Series(np.repeat("", books.shape[0]))
# look for each target genre and set the corresponding entries in the genre series to the genre label
for g in target_genres:
    genre[books['genres'].str.contains(g)] = g

# add this to the book dataframe and then select only those rows that have a genre label
# drop some useless columns
books['genre'] = genre
genre_books = books[genre!=''].drop(['genres', 'fid', 'wid'], axis=1)

genre_books.shape

(8954, 5)
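To make the overwrite rule concrete, here is a sketch that runs the same loop on three hypothetical genre strings (the Freebase-style IDs are made up). The first row carries both Science Fiction and Fantasy and ends up labelled Fantasy, because Fantasy comes later in the list; a row with no target genre keeps the empty string and would be removed by the `genre!=''` filter.

```python
import numpy as np
import pandas as pd

demo_targets = ["Children's literature", 'Science Fiction', 'Novel', 'Fantasy', 'Mystery']
# Hypothetical genre strings in the same JSON-like style as the corpus
demo_books = pd.DataFrame({'genres': [
    '{"/m/a": "Science Fiction", "/m/b": "Fantasy"}',
    '{"/m/c": "Novel"}',
    '{"/m/d": "Poetry"}',
]})

demo_genre = pd.Series(np.repeat("", demo_books.shape[0]))
for g in demo_targets:
    # a later genre in the list overwrites an earlier match on the same row
    demo_genre[demo_books['genres'].str.contains(g)] = g

print(demo_genre.tolist())  # ['Fantasy', 'Novel', '']
```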

In [4]:
# check how many books we have in each genre category
genre_books.groupby('genre').count()

| genre | title | author | date | summary |
|---|---|---|---|---|
| Children's literature | 1092 | 1092 | 1092 | 1092 |
| Fantasy | 2311 | 2311 | 2311 | 2311 |
| Mystery | 1396 | 1396 | 1396 | 1396 |
| Novel | 2258 | 2258 | 2258 | 2258 |
| Science Fiction | 1897 | 1897 | 1897 | 1897 |


## Modelling

In [5]:
# List the distinct genre labels in genre_books
print("Unique genre in the books:",genre_books['genre'].unique())

Unique genre in the books: ["Children's literature" 'Novel' 'Fantasy' 'Science Fiction' 'Mystery']


Our first modelling task is to clean the data. In the genre labels we strip the apostrophe, so `Children's literature` becomes `Childrens literature`. In the summaries we replace every non-letter character (digits, punctuation and other symbols) with a space, convert the text to lower case, remove English stopwords, and write the cleaned text back to the `summary` column of the `genre_books` dataframe.

In [6]:
# Remove the apostrophe from the genre labels ("Children's literature" -> "Childrens literature")
genre_books['genre'] = genre_books['genre'].str.replace("'", "", regex=False)

In [7]:
print("Clean and unique genre in the books:",genre_books['genre'].unique())

Clean and unique genre in the books: ['Childrens literature' 'Novel' 'Fantasy' 'Science Fiction' 'Mystery']


In [8]:
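# Clean the text and remove stopwords from the summary column
def Clean_text(text):
    """Keep letters only, lower-case the text and drop English stopwords."""
    text = re.sub("[^a-zA-Z]", " ", text)  # replace every non-letter with a space
    text = text.lower()
    # split() also collapses the runs of whitespace left behind by re.sub
    return ' '.join(word for word in text.split() if word not in stop_words)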

In [9]:
genre_books['summary'] = genre_books['summary'].apply(lambda x: Clean_text(x))
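The effect of the cleaning step is easiest to see on a single sentence. This sketch copies the logic of `Clean_text` under a different name, `clean_text_demo`, and uses a tiny hypothetical stopword set instead of NLTK's list so it runs without a download. Note that hyphenated words such as "near-future" are split into two tokens and numbers disappear entirely.

```python
import re

# Hypothetical stand-in for NLTK's English stopword list, to keep the sketch self-contained
demo_stop_words = {"a", "in", "the", "of"}

def clean_text_demo(text):
    text = re.sub("[^a-zA-Z]", " ", text)  # digits, punctuation and hyphens all become spaces
    text = text.lower()
    return " ".join(w for w in text.split() if w not in demo_stop_words)

cleaned = clean_text_demo("Alex, a teenager living in near-future England (1962)!")
print(cleaned)  # alex teenager living near future england
```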

Now we start building the model by encoding the target. Because each book has a single genre label stored in one column, we use `LabelBinarizer`, which turns the genre names into a binary indicator matrix with one column per genre (in the sorted order given by `lb.classes_`). Its `transform` method produces this matrix, and `inverse_transform` maps model outputs back to genre names.

In [10]:
lb = preprocessing.LabelBinarizer()
lb.fit(genre_books['genre'])

# encode the genre labels as a one-hot indicator matrix (one column per genre)
y = lb.transform(genre_books['genre'])
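A small sketch of what `LabelBinarizer` produces, on a handful of made-up labels. The columns follow `classes_`, which is sorted alphabetically; in this notebook that means `lb.classes_[0]` is `Childrens literature`. `inverse_transform` recovers the original labels from the indicator rows.

```python
from sklearn.preprocessing import LabelBinarizer

toy_genres = ["Novel", "Fantasy", "Mystery", "Fantasy"]
lb_demo = LabelBinarizer()
onehot = lb_demo.fit_transform(toy_genres)

print(lb_demo.classes_)   # ['Fantasy' 'Mystery' 'Novel']
print(onehot)
# [[0 0 1]
#  [1 0 0]
#  [0 1 0]
#  [1 0 0]]
print(lb_demo.inverse_transform(onehot))  # ['Novel' 'Fantasy' 'Mystery' 'Fantasy']
```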

After that we use `TfidfVectorizer`, which tokenizes the text, counts the words, computes the IDF values and produces the tf-idf scores in a single step. Setting `max_df=0.8` ignores words that appear in more than 80% of the summaries, and `max_features=10000` keeps only the 10,000 most frequent remaining words (by total count across the corpus) as features for the model.

In [11]:
tfidf_vector = TfidfVectorizer(max_df=0.8, max_features=10000)
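As a check on what the vectorizer computes, this sketch recomputes one tf-idf row by hand for a made-up three-document corpus, using scikit-learn's default settings (`smooth_idf=True`, `norm='l2'`, raw counts as term frequency): `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, each count is multiplied by its idf, and the row is scaled to unit length. The last lines show `max_df=0.8` dropping a word that occurs in every document.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

demo_docs = ["dragon wizard dragon", "wizard spaceship", "spaceship alien planet"]
demo_vec = TfidfVectorizer()                  # defaults: smooth_idf=True, norm='l2'
demo_X = demo_vec.fit_transform(demo_docs).toarray()

# Recompute the first row by hand
vocab = demo_vec.vocabulary_                  # word -> column index
words = sorted(vocab, key=vocab.get)          # words in column order
n_docs = len(demo_docs)
tf = np.array([demo_docs[0].split().count(w) for w in words], dtype=float)
df = np.array([sum(w in d.split() for d in demo_docs) for w in words])
idf = np.log((1 + n_docs) / (1 + df)) + 1     # smoothed idf
row = tf * idf
row /= np.linalg.norm(row)                    # l2 normalisation
print(np.allclose(row, demo_X[0]))            # True

# max_df=0.8: "the" occurs in all 3 documents (3 > 0.8 * 3), so it is dropped
pruned = TfidfVectorizer(max_df=0.8).fit(["the dragon", "the wizard", "the ship"])
print(sorted(pruned.vocabulary_))             # ['dragon', 'ship', 'wizard']
```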

Next, we split the summaries and their genre labels into training (75%) and test (25%) sets, fixing `random_state` so the split is reproducible.

In [12]:
X_train, X_test, y_train, y_test = train_test_split(genre_books['summary'], y, test_size=0.25, random_state=142)

In [13]:
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((6715,), (2239,), (6715, 5), (2239, 5))

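Next we compute the tf-idf features. The vectorizer is fitted on `X_train` only, learning the vocabulary and IDF weights from the training summaries, and the same fitted vectorizer is then applied to `X_test`, so no information from the test summaries leaks into the features.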

In [14]:
Xtrain_tfidf = tfidf_vector.fit_transform(X_train)
Xtest_tfidf= tfidf_vector.transform(X_test)
print("TF_IDF for Train data set of summary:", Xtrain_tfidf.shape)
print("TF_IDF for Test data set of summary:", Xtest_tfidf.shape)

TF_IDF for Train data set of summary: (6715, 10000)
TF_IDF for Test data set of summary: (2239, 10000)
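The fit/transform split can be seen on a toy example with made-up sentences: the vocabulary is learned from the training text only, so the test matrix has the same columns, and a test sentence made entirely of unseen words becomes an all-zero row.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

demo_train = ["dragon wizard castle", "spaceship alien planet"]
demo_test = ["dragon spaceship", "detective murder"]

fit_vec = TfidfVectorizer()
X_tr = fit_vec.fit_transform(demo_train)  # learns vocabulary and idf from training text only
X_te = fit_vec.transform(demo_test)       # reuses them; unseen words are ignored

print(X_tr.shape, X_te.shape)  # (2, 6) (2, 6)
print(X_te[1].nnz)             # 0: "detective" and "murder" never appeared in training
```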


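Because the target has five columns, one per genre, we treat the task with binary relevance: `OneVsRestClassifier` fits five separate binary classifiers on the same tf-idf features, each deciding whether a summary belongs to its genre. We use logistic regression as the base classifier because it is a fast linear model that is usually a reasonable choice for sparse, high-dimensional tf-idf features, which keeps the cost of fitting five models low. Since the five classifiers decide independently, a summary can be assigned no genre or more than one.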

In [15]:
lr = LogisticRegression()
clf = OneVsRestClassifier(lr)

We train the model on the tf-idf features of the training summaries and their genre labels, and then predict the genres of the test summaries.

In [16]:
clf.fit(Xtrain_tfidf, y_train)

OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='auto',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=None,
                                                 solver='lbfgs', tol=0.0001,
                                                 verbose=0, warm_start=False),
                    n_jobs=None)
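Because `y` is an indicator matrix, `OneVsRestClassifier` fits one independent binary logistic regression per genre column, and its predictions can contain a row with no 1 at all. `LabelBinarizer.inverse_transform` decodes each row with an argmax, so an all-zero row falls back to the first class. The sketch below shows this on a made-up toy corpus; in the notebook the first class is `Childrens literature`, which may partly explain why that label appears so often among the test predictions. The last lines show one possible alternative, not used in this notebook: taking the genre with the highest `decision_function` score, which always yields exactly one genre.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer

# Hypothetical toy corpus with three genres
toy_docs = ["dragon wizard sword", "spaceship alien planet", "detective murder clue",
            "wizard magic quest", "robot galaxy laser", "murder suspect alibi"]
toy_labels = ["Fantasy", "Science Fiction", "Mystery",
              "Fantasy", "Science Fiction", "Mystery"]

toy_lb = LabelBinarizer()
toy_Y = toy_lb.fit_transform(toy_labels)   # indicator matrix, columns follow toy_lb.classes_
toy_vec = TfidfVectorizer()
toy_X = toy_vec.fit_transform(toy_docs)

# Binary relevance: one independent binary LogisticRegression per column of toy_Y
toy_clf = OneVsRestClassifier(LogisticRegression()).fit(toy_X, toy_Y)
print(len(toy_clf.estimators_))            # 3

# inverse_transform uses argmax, so an all-zero row (no genre predicted) maps to classes_[0]
decoded = toy_lb.inverse_transform(np.array([[0, 0, 0], [0, 1, 1]]))
print(decoded)                             # ['Fantasy' 'Mystery']

# Alternative decoding: pick the genre with the highest decision score
scores = toy_clf.decision_function(toy_vec.transform(["wizard dragon quest"]))
best = toy_lb.classes_[scores.argmax(axis=1)]
print(best)
```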

In [17]:
y_pred = clf.predict(Xtest_tfidf)

In [18]:
z=lb.inverse_transform(y_pred)

In [19]:
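def genre_Pred(summary):
    """Predict the genre of one cleaned summary string with the fitted vectorizer and model."""
    sum_vec = tfidf_vector.transform([summary])
    sum_pred = clf.predict(sum_vec)
    # map the 0/1 indicator row back to a genre name
    z = lb.inverse_transform(sum_pred)
    return z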

Next, we predict the genre of every summary in the test set and print it next to the book's actual genre.

In [20]:
for k in X_test.index:
    print("book: ", genre_books.loc[k, 'title'], "\nPredicted genre: ", genre_Pred(X_test[k]))
    print("Actual genre: ", genre_books.loc[k, 'genre'], "\n")

book:  Uncle Cleans Up 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Dear Enemy 
Predicted genre:  ['Novel']
Actual genre:  Mystery 

book:  A Bend in the River 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Murder by the Book 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  What Happened to Mr. Forster? 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Rising 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Rainbow Valley 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Threat 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Children of Magic Moon 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Warbreaker 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Worst Band In The Universe 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Time

book:  Queer 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Last Ship 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Son of the Tree 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  A Practical Man 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Idoru 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Flight of the Old Dog 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Mask 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Magic Pudding 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Inkdeath 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Curse of the Incredible Priceless Corncob 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Festival of Death 
Predicted genre:  ['Childrens lite

book:  Setting Free the Bears 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Rainbows End 
Predicted genre:  ['Novel']
Actual genre:  Science Fiction 

book:  The Doom Brigade 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Felix Holt, the Radical 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Acts of Faith,1985 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Bridesmaid 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Atlantis Prophecy 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Settling Accounts: In at the Death 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Worry Website 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  War Game 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Castle Roogna 
Predicted genre:  ['Fantasy']

book:  Waiting for The Rain 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Macdermots of Ballycloran 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Operation Thunder Child 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Magellanic Cloud 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Big Planet 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Daniel X: Watch the Skies 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Sail 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Grotesque 
Predicted genre:  ['Novel']
Actual genre:  Mystery 

book:  Sense and Sensibility 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Moderato Cantabile 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Until the Celebration 
Predicted genre:  ['Childrens li

book:  The Adventures of Super Diaper Baby 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Skin Tight 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Fire on the Mountain 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Dragons of Babel 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Bungalow 2 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Postcards from the Edge 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  All You Need Is Kill 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Ballad of Beta-2 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Santaroga Barrier 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Roar 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Bastard Prince 
Predicted genre:  ['Fan

book:  Snow Country 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Command & Conquer: Tiberium Wars 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Kiss Me, Judas 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Kid from Hell 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Secret Agent on Flight 101 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  The Chinese Maze Murders 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  The Serpent's Shadow 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Bell at Sealey Head 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Set This House on Fire 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Blue Light 
Predicted genre:  ['Fantasy']
Actual genre:  Science Fiction 

book:  Smart Women 
Predicted genre:  ['Novel']
Actual genre:  Novel 


book:  Snail Mail No More 
Predicted genre:  ['Novel']
Actual genre:  Childrens literature 

book:  Midnight Lamp 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Grunts! 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Sentenced to Prism 
Predicted genre:  ['Science Fiction']
Actual genre:  Fantasy 

book:  Girl in May 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Dies the Fire 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  On the Banks of Plum Creek 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Great Kapok Tree 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Algebra of Ice 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Crow 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Kalimantaan 
Predicted genre:  ['Childrens liter

Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Kinsmen of the Dragon 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Jewel In The Skull 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Eager 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Lust Lizard of Melancholy Cove 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Mr. Monk on Patrol 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Porno 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Sign of the Beaver 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Talkative Man 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Wren to the Rescue 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Man Who Folded Himself 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Beasts of No Na

book:  Specter of the Past 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  A Person of Interest 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Boba Fett: The Fight to Survive 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Kobayashi Maru 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Blue World 
Predicted genre:  ['Fantasy']
Actual genre:  Science Fiction 

book:  Decline and Fall 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Prophecy: Child of Earth 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Journey to Atlantis 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Sideways Arithmetic From Wayside School 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Anne of Ingleside 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  To

book:  Burning Chrome 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Laura 
Predicted genre:  ['Novel']
Actual genre:  Mystery 

book:  Pretties 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Tin Drum 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Given Day 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Whalestoe Letters 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Stop the Train 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Harpy Thyme 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Devil on My Back 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Household Gods 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Dark Journey 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Folding Star 

book:  Saturn Rukh 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Magician 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Wonders of a Godless World 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Black Bouquet 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Hunted 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Girl Who Loved Wild Horses 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Gravity Dreams 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Mike 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Journey Into Fear 
Predicted genre:  ['Novel']
Actual genre:  Mystery 

book:  Threshold 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Nightmare of Black Island 
Predicted genre:  ['Childrens liter

book:  Calculating God 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Hello America 
Predicted genre:  ['Novel']
Actual genre:  Science Fiction 

book:  Lair of the Lion 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Woven Path 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Rising Force 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  The Eleventh Tiger 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The Joy Luck Club 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  An Imaginative Experience 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Mr. Monk in Outer Space 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Pennterra 
Predicted genre:  ['Science Fiction']
Actual genre:  Fantasy 

book:  The Debt Collector 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 


book:  Empire Falls 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  And Both Were Young 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Silver Branch 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Baber's Apple 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Farewell to Manzanar 
Predicted genre:  ['Novel']
Actual genre:  Childrens literature 

book:  The Silver Mistress 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  In Winter's Shadow 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Out of This World Watt-Evans 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Undead and Unworthy 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Hundred and One Dalmatians 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Family of Pascual Duarte 

book:  Lost Light 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  Hideaway 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Twilight 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Three Witnesses 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  Mystery of the Desert Giant 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  The Little Fur Family 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Five Children and It 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Rabbit Is Rich 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Dread Brass Shadows 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Snow 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Curse of the Mistwraith 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Harry Potter and the 

book:  The Bonehunters 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Significant Others 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Castle of Llyr 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Rocket to Luna 
Predicted genre:  ['Mystery']
Actual genre:  Science Fiction 

book:  Wind from the Abyss 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Flag in Exile 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Shriek: An Afterword 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Other Wind 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Reaper Man 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Franny and Zooey 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Petersburg 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Flight of Eagles 
Predicted genre:  ['Childrens litera

book:  The Jagged Orbit 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Second Foundation 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  A Charmed Life 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Fifth Son of the Shoemaker 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Bullet Time 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Cat and Mouse 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Teutonic Knights 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Diary of a Young Girl 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Postman Always Rings Twice 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Crabwalk 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Secret of The Sirens 
Predicted genre:  ['Fantasy']
Actual genre:  

book:  Sten Adventures Book 5: Revenge of the Damned 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Renegade of Callisto 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Dossouye 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Ascension Factor 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Iceworld 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Hannibal Rising 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Vampireology: The True History of the Fallen 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  The Virgin's Lover 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Return of the King 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Warriors of Spider 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Beyond This Place 
Predicted genre:  ['C

book:  Lucky Jim 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  False Colours 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  The Edge 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The Miserable Mill 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Chocky 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Triptych 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Stormed Fortress 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Lord John and the Haunted Soldier 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Grendel 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Darkwing 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Doctor Dolittle's Return 
Predicted genre:  ['Science Fiction']
Actual genre:  Fantasy 

book:  The Lost World 

book:  Neanderthal 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Jack of Shadows 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  To Sail Beyond the Sunset 
Predicted genre:  ['Novel']
Actual genre:  Science Fiction 

book:  The Business 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Forgotten Planet 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Armageddon 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Sheep-Pig 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  Half-Life 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Lake of Tears 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Miss Hickory 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Losing You 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Aeneid 

book:  The Lamp Of God 
Predicted genre:  ['Childrens literature']
Actual genre:  Mystery 

book:  Dune: House Corrino 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  The moon riders 
Predicted genre:  ['Childrens literature']
Actual genre:  Childrens literature 

book:  The White Hart 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Higher Education 
Predicted genre:  ['Science Fiction']
Actual genre:  Science Fiction 

book:  Irish Love 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 

book:  Jarka Ruus 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  The Delivery Man 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  The Forever King 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Rumours of Rain 
Predicted genre:  ['Novel']
Actual genre:  Novel 

book:  Brothers Majere 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Brendon Chase 
Predicted genre:  ['Childrens literature']

book:  The Black Company 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Dawn 
Predicted genre:  ['Childrens literature']
Actual genre:  Science Fiction 

book:  Half a Team 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Son of a Witch 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Dread Mountain 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  Ourika 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  The Eye of the Heron 
Predicted genre:  ['Childrens literature']
Actual genre:  Fantasy 

book:  Falling 
Predicted genre:  ['Childrens literature']
Actual genre:  Novel 

book:  Night season 
Predicted genre:  ['Fantasy']
Actual genre:  Fantasy 

book:  A Murder of Quality 
Predicted genre:  ['Mystery']
Actual genre:  Mystery 



In [21]:
# Accuracy of the model on the training and test data
# With indicator-matrix targets, accuracy_score computes subset accuracy (every column must match)
y_pred_train = clf.predict(Xtrain_tfidf)
y_pred_test = clf.predict(Xtest_tfidf)

y_train_accscore = accuracy_score(y_train, y_pred_train)
y_test_accscore = accuracy_score(y_test, y_pred_test)

print('TrainDataSet accuracy score:', y_train_accscore)
print('TestDataset accuracy score:', y_test_accscore)

TrainDataSet accuracy score: 0.5213700670141475
TestDataset accuracy score: 0.4037516748548459


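So the model reaches about 52% accuracy on the training set and about 40% on the test set. Because `y` is an indicator matrix, `accuracy_score` here measures subset accuracy: a prediction only counts as correct when all five genre columns match exactly. This is not a high accuracy, and the gap between training and test scores suggests some overfitting, but it is clearly above simple baselines: guessing uniformly at random among the five genres would give 20%, and always predicting the most frequent genre (Fantasy, 2,311 of 8,954 books) would give roughly 26% if the test split follows the overall distribution. The model therefore does pick up genre information from the summaries and can be used to predict the genre, although there is room for improvement.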
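To put these numbers in context, here is a short sketch. First, because `y` is an indicator matrix, `accuracy_score` computes subset accuracy, where a row only counts as correct if every genre column matches; the made-up matrices below show this. Second, the chance and majority-class baselines can be computed from the per-genre counts in the `groupby` table.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up indicator matrices: only the first row matches in every column
y_true_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred_demo = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 1]])
subset_acc = accuracy_score(y_true_demo, y_pred_demo)
print(subset_acc)  # 0.333...

# Baselines from the per-genre counts in the groupby table
counts = {"Children's literature": 1092, "Fantasy": 2311, "Mystery": 1396,
          "Novel": 2258, "Science Fiction": 1897}
total = sum(counts.values())                      # 8954 books
uniform_baseline = 1 / len(counts)                # 0.2: guessing uniformly at random
majority_baseline = max(counts.values()) / total  # about 0.258: always predicting Fantasy
print(total, uniform_baseline, round(majority_baseline, 3))
```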