# BBC News Recommender

Inspired by: https://github.com/DataBeast03/Portfolio/blob/master/NYT_Recommender/Content_Based_Recommendations.ipynb <br>
Data from: https://www.kaggle.com/pariza/bbc-news-summary

In [1]:
# import packages 
import numpy as np
import glob
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

In [2]:
article_list = glob.glob("data/News Articles/business/*.txt")

In [3]:
len(article_list)

510

In [4]:
article_list[:4]

['data/News Articles/business\\001.txt',
 'data/News Articles/business\\002.txt',
 'data/News Articles/business\\003.txt',
 'data/News Articles/business\\004.txt']
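
The backslashes in the paths above come from running on Windows, and `glob.glob` returns matches in arbitrary order. Because the held-out test articles are later taken from the end of the list, an explicitly sorted listing keeps that split reproducible across machines. A minimal sketch using `pathlib`; the helper name `list_articles` is an assumption for illustration and not part of the original notebook:

```python
from pathlib import Path


def list_articles(folder):
    """Return the .txt files in `folder` as strings, sorted by path."""
    # Sort explicitly: directory listing order is not guaranteed
    return sorted(str(p) for p in Path(folder).glob("*.txt"))
```

Calling `list_articles("data/News Articles/business")` would be a drop-in replacement for the `glob.glob` call in cell 2.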

In [5]:
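# Read as UTF-8 so the pound sign decodes correctly; the context manager closes the file
with open(article_list[0], "r", encoding="utf-8") as f:
    preview = f.read()[:200]
preview

'Ad sales boost Time Warner profit\n\nQuarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier.\n\nThe firm, which is now one o'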

In [6]:
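# Decode explicitly as UTF-8; errors="replace" is a precaution in case a file contains stray bytes
articles = [open(a, "r", encoding="utf-8", errors="replace").read() for a in article_list]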

In [8]:
n = 10  # number of articles held out for testing
base = articles[:-n]
test = articles[-n:]

In [59]:
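class Recommender:
    """Content-based recommender that ranks base articles by TF-IDF similarity."""

    def __init__(self, base):
        # Learn the vocabulary and IDF weights from the base articles and vectorize them
        self.tfidf_vectorizer = TfidfVectorizer(stop_words='english')
        self.base_tfidf = self.tfidf_vectorizer.fit_transform(base)

    def top_n_recommendation_ids(self, test_article, n):
        """Return the indices of the n base articles most similar to test_article."""
        # Project the new article onto the vocabulary learned from the base articles
        test_tfidf = self.tfidf_vectorizer.transform([test_article])

        # TfidfVectorizer L2-normalizes each row by default, so cosine similarity
        # gives the same scores as a plain dot product of the TF-IDF vectors
        similarity_scores = cosine_similarity(self.base_tfidf, test_tfidf).reshape(-1)
        sorted_indices = np.flip(np.argsort(similarity_scores))

        return sorted_indices[:n]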

In [60]:
recommender = Recommender(base)
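
The recommender ranks base articles by how similar their TF-IDF vectors are to the vector of a new article. Since `TfidfVectorizer` scales every vector to unit length by default, the dot product of two vectors is also their cosine similarity. The sketch below reproduces that idea with the standard library only, on a made-up three-sentence corpus. It is a simplified illustration, not scikit-learn's implementation: the whitespace tokenizer, the helper names and the toy texts are all assumptions, and no stop words are removed.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(doc, idf):
    # Weight term counts by IDF, then scale to unit L2 length;
    # terms that were not seen while fitting are dropped
    counts = Counter(t for t in tokenize(doc) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def dot(u, v):
    # Dot product of two sparse vectors stored as dicts
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def top_n(query, docs, n):
    # Score every base document against the query and return the n best indices
    idf = fit_idf(docs)
    base_vectors = [tfidf_vector(d, idf) for d in docs]
    query_vector = tfidf_vector(query, idf)
    scores = [dot(b, query_vector) for b in base_vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:n], scores


# Toy corpus, invented for this example
toy_base = [
    "bank holds interest rates",
    "oil prices rise again",
    "interest rates and housing market slow",
]
ids, scores = top_n("bank leaves interest rates unchanged", toy_base, n=2)
```

Here the first toy sentence shares the most weighted terms with the query and is ranked first, while the sentence about oil prices shares none and scores zero.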

In [81]:
test_id = 6
print(test[test_id][:500])

UK economy ends year with spurt

The UK economy grew by an estimated 3.1% in 2004 after accelerating in the last quarter of the year, says the Office for National Statistics (ONS).

The figure is in line with Treasury and Bank of England forecasts. The ONS says gross domestic product (GDP) rose by a strong 0.7% in the three months to 31 December, compared with 0.5% in the previous quarter. The rise came despite a further decline in production output and the worst Christmas for retailers in decad


In [78]:
rec_id = recommender.top_n_recommendation_ids(test_article=test[test_id], n=3)
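
Each returned index points into `base`, whose entries are full article texts with the headline on the first line. When comparing several recommendations side by side, a short preview can be easier to scan than the complete text. A small sketch, assuming a hypothetical helper named `preview` that is not part of the original notebook:

```python
import textwrap


def preview(article, width=200):
    """Return the headline plus a whitespace-collapsed excerpt of the body."""
    # The first line of each file is the headline; the rest is the body
    title, _, body = article.partition("\n")
    excerpt = textwrap.shorten(body, width=width, placeholder=" ...")
    return f"{title.strip()}\n{excerpt}"
```

For example, `preview(base[rec_id[0]])` would give the headline and roughly the first 200 characters of the top recommendation.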

In [92]:
for i,r in enumerate(rec_id):
    print(f"{i+1}) {base[r]} \n{'-'*125}\n")

1) UK interest rates held at 4.75%

The Bank of England has left interest rates on hold again at 4.75%, in a widely-predicted move.

Rates went up five times from November 2003 - as the bank sought to cool the housing market and consumer debt - but have remained unchanged since August. Recent data has indicated a slowdown in manufacturing and consumer spending, as well as in mortgage approvals. And retail sales disappointed over Christmas, with analysts putting the drop down to less consumer confidence.

Rising interest rates and the accompanying slowdown in the housing market have knocked consumers' optimism, causing a sharp fall in demand for expensive goods, according to a report earlier this week from the British Retail Consortium. The BRC said Britain's retailers had endured their worst Christmas in a decade.

"Today's no change decision is correct," said David Frost, Director General of the British Chambers of Commerce (BCC). "But, if there are clear signs that the economy slows,