# CAI 2820C - AI Applications Solutions

## Spring 2025

## Instructor: Claudio S. Castillo 

## Setting up the environment

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import nltk

import joblib
import os

## Collecting Data

In [3]:
data = pd.read_excel("AllITBooks_DataSet.xlsx")
data.head()

Unnamed: 0.1,Unnamed: 0,Book_name,Sub_title,Author,Year,Pages,Language,Size,Format,Description,Category
0,0,"Pro ASP.NET Core 3, 8th. Edition",Develop Cloud-Ready Web Applications Using MVC...,Adam Freeman,2020,1400,English,38.3 MB,"PDF, ePub",\nBook Description:\nThis bestselling comprehe...,ASP.NET
1,1,Modern Data Mining Algorithms in C++ and CUDA C,Recent Developments in Feature Extraction and ...,Timothy Masters,2020,237,English,2.3 MB,"PDF, ePub",\nBook Description:\nDiscover a variety of dat...,C & C++
2,2,SAS Stored Processes,A Practical Guide to Developing Web Applications,Philip Mason,2020,338,English,11.2 MB,"PDF, ePub",\nBook Description:\nCustomize the SAS Stored ...,Software
3,3,Advanced Perl Programming,From Advanced to Expert,"William ""Bo"" Rothwell",2020,308,English,4.9 MB,"PDF, ePub",\nBook Description:\nWilliam “Bo” Rothwell’s A...,Perl
4,4,Articulate Storyline Essentials,Discover Articulate Storyline's ability to enh...,Ashley Chiasson,2015,180,English,8.8 MB,PDF,\nBook Description:\nStoryline is a powerful e...,Computers & Technology


In [4]:
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\cscas\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\cscas\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\cscas\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [6]:
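# Combine title, subtitle and description into one text field per book.
# Note: if any of the three columns is NaN for a row (e.g. a missing Sub_title),
# the concatenated value for that row is NaN as well.
data["ConsolidatedText"] = data.Book_name + " " + data.Sub_title + " " + data.Description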

In [9]:
data["ConsolidatedText"]

0       Pro ASP.NET Core 3, 8th. Edition Develop Cloud...
1       Modern Data Mining Algorithms in C++ and CUDA ...
2       SAS Stored Processes A Practical Guide to Deve...
3       Advanced Perl Programming From Advanced to Exp...
4       Articulate Storyline Essentials Discover Artic...
                              ...                        
8553    Dreamweaver CS6 Mobile and Web Development wit...
8554                                                  NaN
8555                                                  NaN
8556    MongoDB Cookbook Over 80 practical recipes to ...
8557    Foundation HTML5 with CSS3 A Modern Guide and ...
Name: ConsolidatedText, Length: 8558, dtype: object
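
Rows 8554 and 8555 come out as NaN because pandas string concatenation propagates missing values, and those two books have no `Sub_title`. Below is a minimal sketch of one way to avoid this, written in plain Python so it runs without the dataset. `is_missing` and `join_text` are hypothetical helpers introduced here for illustration; they are not part of the notebook.

```python
import math

def is_missing(value):
    """True for None and float NaN, the usual missing-value markers in an object column."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def join_text(*parts):
    """Join the non-missing parts with single spaces instead of propagating NaN."""
    return " ".join(str(p).strip() for p in parts if not is_missing(p))

# A book with all three fields, and one whose subtitle is missing (NaN)
full = join_text("MongoDB Cookbook", "Over 80 practical recipes", "MongoDB is powerful.")
no_subtitle = join_text("Pro Grunt.js", float("nan"), "Pro Grunt.js gets you started.")

print(full)
print(no_subtitle)
```

With the real DataFrame, a similar effect could probably be obtained by filling the text columns with empty strings (`fillna("")`) before concatenating, or by applying a helper like this one row by row.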

In [11]:
stop_words = set(stopwords.words('english'))
list(stop_words)[:5]

['myself', 'shan', 'did', 'd', 'doesn']

## Cleaning text

In [13]:
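def preprocess_text(text):
    # Cast to str so missing values (NaN) do not break tokenization;
    # note that str(NaN) is the literal text "nan".
    text = str(text).lower()

    tokens = word_tokenize(text)

    # Keep purely alphanumeric tokens that are not English stop words.
    # Tokens containing punctuation are dropped as well.
    filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]

    return " ".join(filtered_tokens)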
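
To see what the `isalnum()` filter and the `str()` cast in `preprocess_text` do to typical titles, here is a simplified stand-in that runs without NLTK. It is only a sketch: whitespace splitting replaces `word_tokenize`, and `STOP_WORDS` is a tiny hypothetical stop-word set, so the real function may tokenize some inputs differently.

```python
# Simplified stand-in for preprocess_text (assumptions: whitespace tokenization,
# a hand-picked stop-word set instead of NLTK's English list).
STOP_WORDS = {"in", "and", "to", "the", "a"}

def simple_preprocess(text):
    tokens = str(text).lower().split()
    # Same filter as the notebook: alphanumeric tokens that are not stop words
    kept = [t for t in tokens if t.isalnum() and t not in STOP_WORDS]
    return " ".join(kept)

print(simple_preprocess("Pro ASP.NET Core 3"))            # "asp.net" is dropped (contains ".")
print(simple_preprocess("Algorithms in C++ and CUDA C"))  # "c++" is dropped (contains "+")
print(simple_preprocess(float("nan")))                    # a missing value becomes the token "nan"
```

The last call matters later on: a missing description is turned into the word "nan", which can then show up as a term in the vocabulary.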

In [14]:
data["CleanedDescription"] = data["ConsolidatedText"].apply(preprocess_text)

In [16]:
data[["ConsolidatedText","CleanedDescription"]]

Unnamed: 0,ConsolidatedText,CleanedDescription
0,"Pro ASP.NET Core 3, 8th. Edition Develop Cloud...",pro core 3 8th edition develop web application...
1,Modern Data Mining Algorithms in C++ and CUDA ...,modern data mining algorithms cuda c recent de...
2,SAS Stored Processes A Practical Guide to Deve...,sas stored processes practical guide developin...
3,Advanced Perl Programming From Advanced to Exp...,advanced perl programming advanced expert book...
4,Articulate Storyline Essentials Discover Artic...,articulate storyline essentials discover artic...
...,...,...
8553,Dreamweaver CS6 Mobile and Web Development wit...,dreamweaver cs6 mobile web development html5 c...
8554,,
8555,,
8556,MongoDB Cookbook Over 80 practical recipes to ...,mongodb cookbook 80 practical recipes design d...


## Creating a TF-IDF document-term matrix

In [19]:
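# Keep only the 500 terms with the highest frequency across the corpus
tfidf_vectorizer = TfidfVectorizer(max_features=500)

# Rows are books, columns are terms; the result is a sparse matrix
tfidf_matrix = tfidf_vectorizer.fit_transform(data["CleanedDescription"])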
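
For intuition about the numbers in `tfidf_matrix`, the sketch below computes TF-IDF by hand on a three-document toy corpus. It follows what I understand to be scikit-learn's default weighting (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row); the toy corpus and the helper `tfidf_row` are made up for illustration.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already cleaned and whitespace-separated
docs = ["web web data", "data game", "game game unity"]
tokenized = [d.split() for d in docs]

vocab = sorted({t for doc in tokenized for t in doc})
n_docs = len(docs)

# Document frequency: in how many documents each term appears
df = Counter(t for doc in tokenized for t in set(doc))

# Smoothed inverse document frequency
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

def tfidf_row(tokens):
    """TF-IDF weights for one document, L2-normalized."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

matrix = [tfidf_row(doc) for doc in tokenized]

print(vocab)
for row in matrix:
    print([round(v, 3) for v in row])
```

Terms that are frequent in a document but rare in the corpus get the largest weights, which is why most entries in the real matrix are 0.0 and only a few are noticeably large.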

In [20]:
tfidf_vectorizer.get_feature_names_out()

array(['2012', '2d', '2nd', '3d', 'able', 'access', 'across', 'add',
       'administration', 'administrators', 'advanced', 'advantage',
       'algorithms', 'allows', 'along', 'also', 'analysis', 'analytics',
       'analyze', 'android', 'apache', 'api', 'apis', 'app', 'apple',
       'application', 'applications', 'apply', 'approach', 'apps',
       'architecture', 'arduino', 'around', 'aspects', 'author',
       'authors', 'automate', 'automation', 'available', 'azure', 'based',
       'basic', 'basics', 'become', 'beginner', 'beginning', 'best',
       'better', 'beyond', 'big', 'book', 'build', 'building', 'business',
       'capabilities', 'case', 'center', 'certification', 'challenges',
       'chapter', 'chapters', 'cisco', 'clear', 'cloud', 'code', 'coding',
       'common', 'community', 'complete', 'complex', 'components',
       'comprehensive', 'computer', 'computing', 'concepts', 'concise',
       'configuration', 'configure', 'content', 'control', 'cookbook',
       'core

In [21]:
# Dense DataFrame view of the sparse TF-IDF matrix, for inspection only
tfidfmatrix = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf_vectorizer.get_feature_names_out())

In [23]:
tfidfmatrix.head()

Unnamed: 0,2012,2d,2nd,3d,able,access,across,add,administration,administrators,...,without,wordpress,work,working,works,world,write,writing,written,years
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.04275,0.0,0.0,0.0,0.0,0.047731,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.049063,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.067672,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.063177,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.338099,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Modeling 

In [24]:
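# NMF for topic modeling: factorize the TF-IDF matrix into 10 topics
nmf_model = NMF(n_components=10, random_state=42)

# After fitting, nmf_model.components_ holds one row of term weights per topic
nmf_model.fit(tfidf_matrix)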
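
NMF approximates a non-negative matrix V (documents by terms) as a product W·H, where W holds document-topic weights and H holds topic-term weights. The sketch below illustrates the idea on a tiny matrix using the classic multiplicative update rule. This is a swapped-in technique for illustration: scikit-learn's `NMF` uses a different solver by default, and the helper names (`matmul`, `transpose`, `frob_error`, `nmf`) are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frob_error(v, w, h):
    """Frobenius norm of V - W.H (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, k, iters=200, seed=42, eps=1e-9):
    """Factorize v (n x m) into w (n x k) and h (k x m) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

# Toy matrix: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3
v = [[1, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 3, 3]]

w, h = nmf(v, k=2)
print("reconstruction error:", round(frob_error(v, w, h), 4))
```

In the notebook, H corresponds to `nmf_model.components_` and W to the output of `nmf_model.transform(tfidf_matrix)`.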

## Get topics 

In [25]:
feature_names = tfidf_vectorizer.get_feature_names_out()

topics = []

for topic_idx, topic in enumerate(nmf_model.components_):
    # argsort() is ascending, so [:-11:-1] walks back from the end
    # and yields the indices of the 10 highest-weighted terms
    top_words = [feature_names[i] for i in topic.argsort()[:-11:-1]]

    print(top_words)

    topics.append(", ".join(top_words))

['nan', 'drupal', 'site', 'wordpress', 'content', 'website', 'websites', 'sites', 'modules', 'social']
['web', 'applications', 'javascript', 'application', 'development', 'framework', 'book', 'build', 'php', 'using']
['data', 'analysis', 'big', 'analytics', 'book', 'learning', 'hadoop', 'science', 'machine', 'visualization']
['game', 'games', 'unity', 'development', 'create', '3d', '2d', 'engine', 'book', 'learn']
['book', 'security', 'network', 'system', 'guide', 'windows', 'management', 'learn', 'description', 'linux']
['python', 'programming', 'language', 'code', 'book', 'learning', 'learn', 'programs', 'computer', 'using']
['ios', 'apps', 'app', 'swift', 'iphone', 'apple', 'programming', 'development', 'book', 'learn']
['java', 'programming', 'spring', 'book', 'applications', 'language', 'enterprise', 'code', 'edition', 'application']
['android', 'mobile', 'apps', 'app', 'google', 'studio', 'development', 'applications', 'application', 'devices']
['oracle', 'database', 'sql', 'serv
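
The slice `argsort()[:-11:-1]` used in the loop is compact but easy to misread. The sketch below reproduces the same idiom in plain Python on five made-up weights, taking the top 3 instead of the top 10; the `sorted(...)` call stands in for NumPy's `argsort()`.

```python
# Made-up term weights for one topic (illustration only)
weights = [0.1, 0.7, 0.0, 0.4, 0.9]
names = ["api", "web", "game", "data", "java"]

# Indices ordered by ascending weight, like numpy's argsort()
ascending = sorted(range(len(weights)), key=weights.__getitem__)

# [:-4:-1] starts at the last index and steps backwards, stopping before
# position -4, so it returns the 3 largest in descending order
top3 = ascending[:-4:-1]

print(ascending)                  # [2, 0, 3, 1, 4]
print([names[i] for i in top3])   # ['java', 'web', 'data']
```

In general, `[:-(k + 1):-1]` returns the top `k` indices, which is why the notebook uses `-11` to get 10 keywords per topic.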

In [29]:
# Human-readable labels, chosen by inspecting each topic's top keywords
categories = {
    0: "Content Management Systems (CMS)",
    1: "Web Development and Frameworks",
    2: "Data Analysis and Big Data",
    3: "Game Development",
    4: "Network and Security Administration",
    5: "Programming Languages and Functional Programming",
    6: "iOS App Development",
    7: "Java and Enterprise Applications",
    8: "Android App Development",
    9: "Databases and SQL Administration"
}

In [26]:
# transform() returns one row of topic weights per book; argmax picks the dominant topic
topic_assignments = nmf_model.transform(tfidf_matrix).argmax(axis=1)

In [27]:
topic_assignments

array([1, 2, 1, ..., 0, 4, 1])
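
Each book is assigned the topic with the largest weight in its row of the document-topic matrix. A small plain-Python sketch of that rule follows; the weights are invented and `dominant_topic` is a hypothetical helper that mirrors `argmax(axis=1)`.

```python
# Invented document-topic weights: 3 documents, 3 topics
doc_topic = [
    [0.02, 0.31, 0.05],   # clearly topic 1
    [0.00, 0.00, 0.00],   # no topic signal at all
    [0.12, 0.11, 0.40],   # clearly topic 2
]

def dominant_topic(row):
    """Index of the largest weight; ties resolve to the lowest index."""
    return max(range(len(row)), key=row.__getitem__)

assignments = [dominant_topic(row) for row in doc_topic]
print(assignments)   # [1, 0, 2]
```

Note the second row: a document with no signal still receives a topic (index 0), so it can be worth checking the maximum weight itself before trusting an assignment.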

In [32]:
# Store the numeric topic id, its top keywords, and its readable label
data["AssignedTopic"] = topic_assignments
data["Topic_Keywords"] = [topics[i] for i in topic_assignments]
data["Topic"] = data["AssignedTopic"].map(categories)

In [33]:
data

Unnamed: 0.1,Unnamed: 0,Book_name,Sub_title,Author,Year,Pages,Language,Size,Format,Description,Category,ConsolidatedText,CleanedDescription,AssignedTopic,Topic_Keywords,Topic
0,0,"Pro ASP.NET Core 3, 8th. Edition",Develop Cloud-Ready Web Applications Using MVC...,Adam Freeman,2020,1400,English,38.3 MB,"PDF, ePub",\nBook Description:\nThis bestselling comprehe...,ASP.NET,"Pro ASP.NET Core 3, 8th. Edition Develop Cloud...",pro core 3 8th edition develop web application...,1,"web, applications, javascript, application, de...",Web Development and Frameworks
1,1,Modern Data Mining Algorithms in C++ and CUDA C,Recent Developments in Feature Extraction and ...,Timothy Masters,2020,237,English,2.3 MB,"PDF, ePub",\nBook Description:\nDiscover a variety of dat...,C & C++,Modern Data Mining Algorithms in C++ and CUDA ...,modern data mining algorithms cuda c recent de...,2,"data, analysis, big, analytics, book, learning...",Data Analysis and Big Data
2,2,SAS Stored Processes,A Practical Guide to Developing Web Applications,Philip Mason,2020,338,English,11.2 MB,"PDF, ePub",\nBook Description:\nCustomize the SAS Stored ...,Software,SAS Stored Processes A Practical Guide to Deve...,sas stored processes practical guide developin...,1,"web, applications, javascript, application, de...",Web Development and Frameworks
3,3,Advanced Perl Programming,From Advanced to Expert,"William ""Bo"" Rothwell",2020,308,English,4.9 MB,"PDF, ePub",\nBook Description:\nWilliam “Bo” Rothwell’s A...,Perl,Advanced Perl Programming From Advanced to Exp...,advanced perl programming advanced expert book...,5,"python, programming, language, code, book, lea...",Programming Languages and Functional Programming
4,4,Articulate Storyline Essentials,Discover Articulate Storyline's ability to enh...,Ashley Chiasson,2015,180,English,8.8 MB,PDF,\nBook Description:\nStoryline is a powerful e...,Computers & Technology,Articulate Storyline Essentials Discover Artic...,articulate storyline essentials discover artic...,4,"book, security, network, system, guide, window...",Network and Security Administration
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8553,8553,Dreamweaver CS6 Mobile and Web Development wit...,Harness the cutting-edge features of Dreamweav...,David Karlins,2013,268,English,8.98 MB,PDF,\nBook Description:\nDreamweaver is the most p...,"HTML, HTML5 & CSS Dreamweaver",Dreamweaver CS6 Mobile and Web Development wit...,dreamweaver cs6 mobile web development html5 c...,1,"web, applications, javascript, application, de...",Web Development and Frameworks
8554,8554,Beginning Amazon Web Services with Node.js,,Adam Shackelford,2015,260,English,11.07 MB,PDF,\nBook Description:\nBeginning Amazon Web Serv...,JavaScript,,,0,"nan, drupal, site, wordpress, content, website...",Content Management Systems (CMS)
8555,8555,Pro Grunt.js,,James Cryer,2015,176,English,4.92 MB,PDF,\nBook Description:\nPro Grunt.js gets you qui...,JavaScript,,,0,"nan, drupal, site, wordpress, content, website...",Content Management Systems (CMS)
8556,8556,MongoDB Cookbook,"Over 80 practical recipes to design, deploy, a...",Amol Nayak,2014,388,English,5.72 MB,PDF,\nBook Description:\nMongoDB is a powerful and...,MongoDB,MongoDB Cookbook Over 80 practical recipes to ...,mongodb cookbook 80 practical recipes design d...,4,"book, security, network, system, guide, window...",Network and Security Administration


In [35]:
print(data.Description[0])


Book Description:
This bestselling comprehensive guide to ASP.NET Core is the only book you need for ASP.NET Core development. Period.
Professional developers will produce leaner applications for the ASP.NET Core platform using the guidance in this full-color book, now in its 8th edition and updated for ASP.NET Core 3. It contains detailed explanations of the ASP.NET Core platform and the application frameworks it supports. This edition puts ASP.NET Core 3 into context and dives deep into the tools and techniques required to build modern, extensible, web applications. New features and capabilities such as MVC 3, Razor Pages, Blazor Server, and Blazor WebAssembly are covered, along with demonstrations of how they are applied.
ASP.NET Core 3 is the latest evolution of Microsoft’s ASP.NET web platform and provides a “host-agnostic” framework and a high-productivity programming model that promotes cleaner code architecture, test-driven development, and powerful extensibility.
Best-selling

In [36]:
print(data.Description[1])


Book Description:
Discover a variety of data-mining algorithms that are useful for selecting small sets of important features from among unwieldy masses of candidates, or extracting useful features from measured variables.
As a serious data miner you will often be faced with thousands of candidate features for your prediction or classification application, with most of the features being of little or no value. You’ll know that many of these features may be useful only in combination with certain other features while being practically worthless alone or in combination with most others. Some features may have enormous predictive power, but only within a small, specialized area of the feature space. The problems that plague modern data miners are endless. This book helps you solve this problem by presenting modern feature selection techniques and the code to implement them. Some of these techniques are:


Forward selection component analysis
Local feature selection
Linking features and a