# Build a recommendation system to recommend articles

# Objective:

The goal of this challenge is to build a recommendation system that recommends articles to readers.

Many websites today use a recommendation system to suggest articles to their readers. For example, Quora, LinkedIn, and Medium all use recommendation systems for this purpose.

# Step 1: Import all the required libraries

In [5]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')

# Step 2: Read the dataset and inspect its basic details

In [9]:
df=pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/articles.csv", encoding='latin1')
df.head()

Unnamed: 0,Article,Title
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms
2,You must have seen the news divided into categ...,News Classification with Machine Learning
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning


# Step 3: Creating Recommendation System using Cosine Similarity

In [10]:
articles=list(df['Article'])

In [25]:
uni_tfidf=text.TfidfVectorizer(stop_words='english')

This line creates a TF-IDF vectorizer (uni_tfidf) using scikit-learn's TfidfVectorizer class. The articles themselves are not passed to the constructor; they are supplied later, when fit_transform is called. The stop_words='english' parameter specifies that common English stop words are ignored during vectorization.
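To see what the stop_words setting changes, here is a minimal sketch that fits two vectorizers on a tiny made-up corpus and compares their vocabularies. The toy_docs list and the variable names are assumptions for illustration only and are not part of the articles dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical two-document corpus, used only for illustration.
toy_docs = [
    "the model is trained on the data",
    "the data is split into train and test sets",
]

# One vectorizer keeps every token, the other drops English stop words.
plain = TfidfVectorizer()
filtered = TfidfVectorizer(stop_words="english")

plain.fit(toy_docs)
filtered.fit(toy_docs)

# vocabulary_ maps each learned term to its column index.
plain_vocab = sorted(plain.vocabulary_)
filtered_vocab = sorted(filtered.vocabulary_)

print(plain_vocab)
print(filtered_vocab)
```

The filtered vocabulary is smaller because words such as "the" carry little information about what an article is about.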

In [13]:
uni_matrix=uni_tfidf.fit_transform(articles)

This line transforms the text data (articles) into a sparse matrix (uni_matrix) of TF-IDF features using the vectorizer (uni_tfidf). The resulting matrix represents the articles in the TF-IDF feature space.
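As a rough illustration of what fit_transform returns, the sketch below runs it on three made-up sentences and inspects the result. The toy_docs list is an assumption for demonstration; with the real data the matrix would have one row per article.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus, used only for illustration.
toy_docs = [
    "clustering groups similar data points",
    "classification assigns labels to data points",
    "python lists store items in order",
]

vec = TfidfVectorizer(stop_words="english")
toy_matrix = vec.fit_transform(toy_docs)

# One row per document, one column per term in the learned vocabulary.
print(toy_matrix.shape)
print(sparse.issparse(toy_matrix))

# With the default norm='l2', each non-empty row is scaled to unit length.
squared_norms = np.asarray(toy_matrix.multiply(toy_matrix).sum(axis=1)).ravel()
print(squared_norms)
```

Because the rows are already unit length by default, the cosine similarity between two rows reduces to their dot product.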

In [16]:
uni_sim=cosine_similarity(uni_matrix)

This line calculates the cosine similarity matrix (uni_sim) based on the TF-IDF features. Cosine similarity is a measure of similarity between two non-zero vectors in an inner product space. In this context, it is used to measure the similarity between pairs of articles based on their TF-IDF representations.
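The definition can be checked by hand on small vectors. The sketch below computes the cosine similarity as the dot product divided by the product of the vector lengths and compares it with scikit-learn's cosine_similarity. The vectors a, b, c and the helper cosine are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical vectors, used only for illustration.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # points in the same direction as a
c = np.array([0.0, 0.0, 3.0])  # orthogonal to a


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Passing a single matrix compares every row with every other row.
sim = cosine_similarity(np.vstack([a, b, c]))

print(cosine(a, b), sim[0, 1])
print(cosine(a, c), sim[0, 2])
```

Vectors pointing in the same direction score 1 regardless of their length, and orthogonal vectors score 0. Each row also has a similarity of 1 with itself, which is why an article's own entry must be excluded when picking recommendations.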

In [18]:
def recommend_articles(x):
    return ' '.join(df['Title'].loc[x.argsort()])


This cell defines a function named recommend_articles that takes a vector x (a row of the cosine similarity matrix) as input. x.argsort() returns the indices of x sorted by ascending similarity, so the function retrieves every title from the "Title" column of the DataFrame (df), ordered from least to most similar, and joins them into a single string. The article itself has the highest similarity and therefore normally comes last. To keep only the four closest matches and drop the article itself, slice the sorted indices with x.argsort()[-5:-1].
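A variant that returns only the closest matches could look like the sketch below. The function name recommend_top_n, its parameters, and the toy data are assumptions for illustration and are not part of the notebook. It removes the article's own index explicitly instead of relying on it being the last element, which may matter if the dataset contains duplicate articles that tie at the top.

```python
import numpy as np
import pandas as pd


def recommend_top_n(sim_row, titles, self_index, n=4):
    """Return the n titles most similar to the article at self_index."""
    # argsort is ascending, so reverse it to get the most similar first.
    order = np.argsort(sim_row)[::-1]
    # Drop the article itself, then keep the first n remaining indices.
    order = order[order != self_index][:n]
    return titles.iloc[order].tolist()


# Hypothetical titles and one similarity row, used only for illustration.
toy_titles = pd.Series(["A", "B", "C", "D"])
toy_row = np.array([1.0, 0.8, 0.1, 0.3])  # similarities of article 0 to all four

top_two = recommend_top_n(toy_row, toy_titles, self_index=0, n=2)
print(top_two)  # ['B', 'D']
```

Returning a list rather than a joined string keeps the individual titles separable, since the titles themselves contain spaces.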

In [19]:
df['Recommended Articles']=[recommend_articles(x) for x in uni_sim]

This line applies the recommend_articles function to each row of the cosine similarity matrix (uni_sim) and stores the recommended articles in a new column named "Recommended Articles" in the DataFrame (df).

In [20]:
df.head()

Unnamed: 0,Article,Title,Recommended Articles
0,Data analysis is the process of inspecting and...,Best Books to Learn Data Analysis,Naive Bayes Algorithm in Machine Learning K-Me...
1,The performance of a machine learning algorith...,Assumptions of Machine Learning Algorithms,Introduction to Recommendation Systems Best Py...
2,You must have seen the news divided into categ...,News Classification with Machine Learning,Best Books to Learn Computer Vision Best Books...
3,When there are only two classes in a classific...,Multiclass Classification Algorithms in Machin...,Send Instagram Messages using Python For Loop ...
4,The Multinomial Naive Bayes is one of the vari...,Multinomial Naive Bayes in Machine Learning,Squid Game Sentiment Analysis using Python Swa...


In [21]:
print(df['Recommended Articles'][22])

Best Books to Learn Computer Vision Best Books to Learn Data Analysis Best Python Frameworks to Build APIs Squid Game Sentiment Analysis using Python Tata Motors Stock Price Prediction with Machine Learning Health Insurance Premium Prediction with Machine Learning Pfizer Vaccine Sentiment Analysis using Python Swap Items of a Python List Animated Scatter Plot using Python Send Instagram Messages using Python Voice Recorder using Python Introduction to Recommendation Systems Best Resources to Learn Python Best Books to Learn NLP Language Detection with Machine Learning News Classification with Machine Learning News Classification with Machine Learning Apple Stock Price Prediction with Machine Learning Types of Neural Networks Multilayer Perceptron in Machine Learning Multiclass Classification Algorithms in Machine Learning For Loop Over Keys and Values in a Python Dictionary Best Books to Learn Deep Learning Multinomial Naive Bayes in Machine Learning Applications of Deep Learning Naive

In [22]:
# Index 22 contains an article on "agglomerative clustering". Because the titles are
# sorted by ascending similarity, the closest matches appear at the end of the printed
# string. When those titles also cover clustering, the recommendations are relevant.