<h1 style="background-color:#F18C8D; text-align:center;  color:white; padding:20px">News Recommendation Systems</h1>

<div>
<img src="images/newsimage.jpeg" width="600"/>
</div>

<h3 style="background-color:#F18C8D; color:white; padding:10px">Problem Statement</h3>

### iPrint is an upcoming media house in India that offers media and information services to the public. The company's business extends across a wide range of media, including news and information services on sports, weather, education, health, research and stocks. Over the years, iPrint has delivered news and information efficiently through its online application. However, with time and technological advances, several new competitors have emerged in the market. Hence, iPrint has decided to provide a more personalised experience to its customers.

 

To date, iPrint has managed its customer base by recommending only the most popular articles and articles similar to what the user has already read or watched. However, the recommended articles were often not relevant to the majority of users. The system could not recommend any new content, so the company gradually started losing such users, which eventually resulted in immense revenue loss.

 

iPrint, being a cutting-edge company, is trying to solve this revenue leakage by personalising recommendations to user tastes and introducing new content to its users at the start of the day on the home page of the application. iPrint plans to assess these recommendations by tracking whether the user clicks on the recommended items. Moreover, once the user clicks on a news item A, iPrint wants to recommend news similar to A at the bottom of item A's page.

 

### Now, let's understand what iPrint is trying to achieve here. By recommending new items to users at the start of the day, it is trying to find out whether the user has new interests. If the user clicks on a new item A and then also on subsequent items recommended at the bottom of item A's page, iPrint can infer that the topic or genre of the new item interests the user.

 

Now, you have been hired as a data scientist to help iPrint build a robust product to achieve its new objectives. Using the expertise acquired through the program, along with the learnings of the previous session, you need to build a system that would recommend relevant news articles to the customers based on their search history and preferences. 

 

So the problem statement can be divided into two parts discussed below. Of the numerous news articles available on its app about sports, politics, technology and many others, iPrint wants you as a data scientist to identify and build an appropriate recommendation system that would:

- Recommend the top 10 new, relevant articles to a user when they visit the app at the start of the day
- Recommend top 10 similar news articles that match the ones clicked by the user. Try different models for generating these recommendations and experiment with hybrid models for the same

### You have to ensure that the system does not recommend any news article that has been pulled out from the app or has already been seen by the user. In addition, only the articles that are written in the English language must be considered for content-based recommendations. The final generated list must contain the names of the recommended articles, along with their IDs.

<h3 style="background-color:#F18C8D; color:white; padding:10px">Solution</h3>

## 1) Data pre-processing

The data set provided to you does not contain user ratings that could be used to generate recommendations. You therefore need to impute the rating values from the feature ‘interaction type’, giving the highest weightage to content_followed, followed by content_commented_on, content_saved, content_liked and content_watched. The final data set ‘consumer_interactions’ that will be used to build the recommendation system should contain a feature called ‘ratings’, along with the other features required to build the ALS, user-based and item-based collaborative filtering models.
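The imputation described above can be sketched in plain Python. Only the ordering of the interaction types comes from the text; the numeric weights (5 down to 1), the `consumer_id` field and the rule of keeping the strongest interaction per user-item pair are assumptions made for illustration.

```python
# Hypothetical weights: only the ordering is given in the problem statement,
# the numeric values are an assumption.
INTERACTION_WEIGHTS = {
    "content_followed": 5,
    "content_commented_on": 4,
    "content_saved": 3,
    "content_liked": 2,
    "content_watched": 1,
}


def impute_ratings(interactions):
    """Return {(consumer_id, item_id): rating}.

    When a user interacted with the same item several times, the strongest
    interaction is kept (an assumed aggregation rule, not the only option).
    """
    ratings = {}
    for consumer_id, item_id, interaction_type in interactions:
        weight = INTERACTION_WEIGHTS[interaction_type]
        key = (consumer_id, item_id)
        ratings[key] = max(ratings.get(key, 0), weight)
    return ratings


# Made-up sample events: (consumer_id, item_id, interaction_type)
sample = [
    ("u1", "a", "content_watched"),
    ("u1", "a", "content_liked"),
    ("u2", "a", "content_followed"),
]
ratings = impute_ratings(sample)
print(ratings)
```

On a DataFrame, the same dictionary could be applied with `Series.map` on the interaction-type column to create the ‘ratings’ feature.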

Use the ‘platform_content’ data to extract the data set that contains the English articles present on the platform.

## 2) Exploratory data analysis

- Explore the various features present in the data set for their distribution and any meaningful inferences.
- Check the distribution of interaction type, consumer location/country, producer country/location, item type and so on.
- Check the most common language and most popular country that consumes the articles on the platform.

## 3) Recommendation techniques

You may want to use the ‘consumer_interaction’ data set for building the ALS, user-based and item-based collaborative filtering models and ‘platform_content’ data to build a content-based recommendation model.

#### 3.1) User-based collaborative filtering

- Create user-item matrix using the rating values.
- Find the user-similarity matrix based on a similarity measure.
- Generate predicted ratings for all the user-item pairs.
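
As a rough illustration of the three steps above, the following sketch uses a toy dictionary in place of the user-item matrix, cosine similarity as the similarity measure and a similarity-weighted average as the predicted rating. The data, the function names and the choice of cosine similarity are assumptions; the actual notebook may use a different measure.

```python
import math

# Toy user-item "matrix": {user: {item: rating}}; all values are made up.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 5},
    "u3": {"b": 4, "c": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts (missing = 0)."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else 0.0


predicted_u1_c = predict("u1", "c")
print(round(predicted_u1_c, 3))
```

User `u1` is more similar to `u2` than to `u3`, so the prediction for item `c` is pulled towards the rating given by `u2`.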

#### 3.2) Item-based collaborative filtering

- Find the item-similarity matrix based on a similarity measure.
- Generate the top 10 similar and relevant items based on the similarity scores.
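
Once an item-similarity matrix exists, producing the top-N list is mostly a matter of ranking and filtering. The sketch below assumes the similarity scores are already computed (the values are hypothetical) and shows one way to drop items the user has already seen or that were pulled out, as the problem statement requires.

```python
# Hypothetical item-item similarity scores for one source item "a".
item_similarity = {
    "a": {"b": 0.67, "c": 0.61, "d": 0.05, "e": 0.90},
}


def top_similar(item_id, seen, pulled_out, top_n=10):
    """Rank the items most similar to `item_id`, skipping excluded ones."""
    excluded = set(seen) | set(pulled_out) | {item_id}
    candidates = {
        other: score
        for other, score in item_similarity[item_id].items()
        if other not in excluded
    }
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]


# "b" was already read by the user and "e" was pulled out of the app.
recommended = top_similar("a", seen={"b"}, pulled_out={"e"}, top_n=2)
print(recommended)
```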

#### 3.3) Content-based filtering

- Use text processing to analyse the ‘keywords’ feature in the data set.
- Recommend similar items based on the TF-IDF scores.
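
To make the TF-IDF step concrete, here is a minimal pure-Python sketch: term frequency times log inverse document frequency, followed by cosine similarity between documents. The token lists are made up, and this particular weighting formula is an assumption; library implementations such as gensim's `TfidfModel` apply their own weighting and normalisation.

```python
import math
from collections import Counter

# Made-up, already preprocessed token lists keyed by item id.
docs = {
    "a": ["bitcoin", "ethereum", "blockchain"],
    "b": ["bitcoin", "blockchain", "bank"],
    "c": ["google", "cloud", "data"],
}


def tfidf_vectors(docs):
    """Return {doc_id: {term: tf * idf}} with idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


vectors = tfidf_vectors(docs)
sim_ab = cosine(vectors["a"], vectors["b"])
sim_ac = cosine(vectors["a"], vectors["c"])
print(round(sim_ab, 3), sim_ac)
```

Documents `a` and `b` share two terms and so score higher than `a` and `c`, which share none.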

#### 3.4) ALS

- Create Compressed Sparse user-item and item-user matrices. 
- Train the ALS model and generate recommendations for a user. Try experimenting with the hyperparameters.
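
The alternating idea behind ALS can be shown on a deliberately tiny case. The sketch below is a rank-1, explicit-feedback version in plain Python: it fixes the item factors and solves each user factor in closed form, then swaps roles. It is not the sparse-matrix implementation the steps above refer to; the data, `reg` and `n_iters` values are assumptions chosen for illustration.

```python
# Observed ratings: (user, item) -> rating; values are made up.
observed = {
    ("u1", "a"): 5.0,
    ("u1", "b"): 3.0,
    ("u2", "a"): 4.0,
    ("u2", "c"): 2.0,
}
users = sorted({u for u, _ in observed})
items = sorted({i for _, i in observed})


def squared_error(user_f, item_f):
    """Sum of squared errors over the observed ratings only."""
    return sum((r - user_f[u] * item_f[i]) ** 2 for (u, i), r in observed.items())


def als_rank1(n_iters=20, reg=0.1):
    """Rank-1 ALS: each update is a one-dimensional ridge regression."""
    user_f = {u: 1.0 for u in users}
    item_f = {i: 1.0 for i in items}
    for _ in range(n_iters):
        # Fix the item factors and solve for every user factor.
        for u in users:
            num = sum(r * item_f[i] for (uu, i), r in observed.items() if uu == u)
            den = reg + sum(item_f[i] ** 2 for (uu, i) in observed if uu == u)
            user_f[u] = num / den
        # Fix the user factors and solve for every item factor.
        for i in items:
            num = sum(r * user_f[u] for (u, ii), r in observed.items() if ii == i)
            den = reg + sum(user_f[u] ** 2 for (u, ii) in observed if ii == i)
            item_f[i] = num / den
    return user_f, item_f


initial_error = squared_error({u: 1.0 for u in users}, {i: 1.0 for i in items})
user_f, item_f = als_rank1()
final_error = squared_error(user_f, item_f)
# Predicted score for a pair that was never observed.
score_u1_c = user_f["u1"] * item_f["c"]
print(initial_error, round(final_error, 4), round(score_u1_c, 2))
```

The hyperparameters to experiment with in a full model play the same roles as here: the number of factors (fixed to 1 in this sketch), the regularisation strength and the number of iterations.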

#### 3.5) Hybrid recommendation system

- Normalise the scores for content and collaborative filtering and combine them with an appropriate weightage to build a hybrid model.
- Try out hybrids of different types of models that can help recommend items similar to a particular item. For example, Content+Item-based collaborative model, ALS+Item-based collaborative model, ALS+Content-based model, etc.
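
A minimal sketch of the normalise-and-combine step, assuming min-max normalisation and a single `content_weight` parameter (both are illustrative choices, and the scores are made up):

```python
def min_max(scores):
    """Rescale a {item: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}


def hybrid(content_scores, collab_scores, content_weight=0.5):
    """Weighted sum of the normalised content and collaborative scores."""
    content_n, collab_n = min_max(content_scores), min_max(collab_scores)
    all_items = content_n.keys() | collab_n.keys()
    return {
        item: content_weight * content_n.get(item, 0.0)
        + (1 - content_weight) * collab_n.get(item, 0.0)
        for item in all_items
    }


# Scores on different scales: TF-IDF similarity vs. predicted rating.
content_scores = {"a": 0.2, "b": 0.8, "c": 0.5}
collab_scores = {"a": 4.0, "b": 1.0, "c": 2.5}

combined = hybrid(content_scores, collab_scores, content_weight=0.4)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Normalising first matters because the two models score on different scales; without it, the model with the larger raw values would dominate regardless of the weight.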


## 4) Model evaluation

Use appropriate evaluation metrics, such as RMSE, MAE and precision@k, to evaluate the recommendations generated for a user as mentioned in the first part of the problem statement and use global precision@k to assess the overall performance of the recommendation system.
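
The metrics named above can be written down in a few lines. In this sketch, precision@k divides the number of relevant items in the top k by k, and "global" precision@k is taken to be the mean of the per-user values; that second definition is an assumption, since the text does not spell it out.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def global_precision_at_k(recs_by_user, relevant_by_user, k=10):
    """Mean precision@k over all users (assumed definition of 'global')."""
    values = [
        precision_at_k(recs, relevant_by_user.get(user, set()), k)
        for user, recs in recs_by_user.items()
    ]
    return sum(values) / len(values)


# Made-up recommendations and held-out relevant items.
recs_by_user = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z", "w"]}
relevant_by_user = {"u1": {"a", "c"}, "u2": {"x"}}

print(rmse([3, 4], [1, 4]), mae([3, 4], [1, 4]))
print(global_precision_at_k(recs_by_user, relevant_by_user, k=4))
```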

<h3 style="background-color:#F18C8D; padding:10px">Import Libraries</h3>

In [3]:
import pandas as pd
import re
import plotly.express as px
pd.set_option('display.max_colwidth', 400)
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
import numpy as np
from gensim.corpora.dictionary import Dictionary
import string
from gensim.models.tfidfmodel import TfidfModel
from gensim.similarities import MatrixSimilarity
from sklearn.model_selection import train_test_split
import math
import plotly.offline as py
py.init_notebook_mode(connected=True)

<h3 style="background-color:#F18C8D; padding:10px">Reading The Dataset</h3>

In [37]:
platform_content=pd.read_csv('data/platform_content.csv')

In [38]:
platform_content.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,producer_id,producer_session_id,producer_device_info,producer_location,producer_country,item_type,item_url,title,text_description,language
0,1459192779,content_pulled_out,-6451309518266745024,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en
1,1459193988,content_present,-4110354420726924665,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en
2,1459194146,content_present,-7292285110016212249,4340306774493623681,8940341205206233829,,,,HTML,http://cointelegraph.com/news/bitcoin-future-when-gbpcoin-of-branson-wins-over-usdcoin-of-trump,Bitcoin Future: When GBPcoin of Branson Wins Over USDcoin of Trump,"The alarm clock wakes me at 8:00 with stream of advert-free broadcasting, charged at one satoshi per second. The current BTC exchange rate makes that snooze button a costly proposition! So I get up, make coffee and go to my computer to check the overnight performance of my bots. TradeBot earns me on Trump and Branson TradeBot, which allocates funds between the main chain and various national c...",en
3,1459194474,content_present,-6151852268067518688,3891637997717104548,-1457532940883382585,,,,HTML,https://cloudplatform.googleblog.com/2016/03/Google-Data-Center-360-Tour.html,Google Data Center 360° Tour,We're excited to share the Google Data Center 360° Tour - a YouTube 360° video that gives you an unprecedented and immersive look inside one of our data centers. There are several ways to view this video: On desktop using Google Chrome use your mouse or trackpad to change your view while the video plays YouTube app on mobile - move your device around to look at all angles while the video plays...,en
4,1459194497,content_present,2448026894306402386,4340306774493623681,8940341205206233829,,,,HTML,https://bitcoinmagazine.com/articles/ibm-wants-to-evolve-the-internet-with-blockchain-technology-1459189322,"IBM Wants to ""Evolve the Internet"" With Blockchain Technology","The Aite Group projects the blockchain market could be valued at $400 million by 2019. For that reason, some of the biggest names in banking, industry and technology have entered into the space to evaluate how this technology could change the financial world. IBM and Linux, for instance, have brought together some of the brightest minds in the industry and technology to work on blockchain tech...",en


In [39]:
## Shape of dataset
platform_content.shape

(3122, 13)

In [40]:
## Information of each columns 
platform_content.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3122 entries, 0 to 3121
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   event_timestamp       3122 non-null   int64 
 1   interaction_type      3122 non-null   object
 2   item_id               3122 non-null   int64 
 3   producer_id           3122 non-null   int64 
 4   producer_session_id   3122 non-null   int64 
 5   producer_device_info  680 non-null    object
 6   producer_location     680 non-null    object
 7   producer_country      680 non-null    object
 8   item_type             3122 non-null   object
 9   item_url              3122 non-null   object
 10  title                 3122 non-null   object
 11  text_description      3122 non-null   object
 12  language              3122 non-null   object
dtypes: int64(4), object(9)
memory usage: 317.2+ KB


### Most values are missing for `producer_device_info`, `producer_location` and `producer_country` (only 680 of 3,122 rows are non-null)

In [41]:
## Count rows whose item_id duplicates an earlier row
platform_content['item_id'].duplicated().sum()

65

In [42]:
platform_content['title'].duplicated().sum()

111

In [43]:
## Extracting the host name (the third '/'-separated field) from each URL
platform_content['website_name']=platform_content['item_url'].str.split('/').str[2]
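
Splitting on `/` works when every URL has a scheme and a host. As a hedged alternative, the standard library's `urllib.parse.urlsplit` extracts the host directly and returns an empty string for strings that are not URLs instead of failing. The helper name `website_name_from_url` is hypothetical; the first two URLs are taken from the rows shown above.

```python
from urllib.parse import urlsplit


def website_name_from_url(url):
    """Return the host part of a URL, or '' when there is none."""
    return urlsplit(url).netloc


urls = [
    "http://cointelegraph.com/news/bitcoin-future-when-gbpcoin-of-branson-wins-over-usdcoin-of-trump",
    "https://cloudplatform.googleblog.com/2016/03/Google-Data-Center-360-Tour.html",
    "not a url",  # made-up malformed value
]
hosts = [website_name_from_url(u) for u in urls]
print(hosts)
```

On the DataFrame, the helper could be applied with `platform_content['item_url'].map(website_name_from_url)`.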

In [44]:
platform_content.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,producer_id,producer_session_id,producer_device_info,producer_location,producer_country,item_type,item_url,title,text_description,language,website_name
0,1459192779,content_pulled_out,-6451309518266745024,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en,www.nytimes.com
1,1459193988,content_present,-4110354420726924665,4340306774493623681,8940341205206233829,,,,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en,www.nytimes.com
2,1459194146,content_present,-7292285110016212249,4340306774493623681,8940341205206233829,,,,HTML,http://cointelegraph.com/news/bitcoin-future-when-gbpcoin-of-branson-wins-over-usdcoin-of-trump,Bitcoin Future: When GBPcoin of Branson Wins Over USDcoin of Trump,"The alarm clock wakes me at 8:00 with stream of advert-free broadcasting, charged at one satoshi per second. The current BTC exchange rate makes that snooze button a costly proposition! So I get up, make coffee and go to my computer to check the overnight performance of my bots. TradeBot earns me on Trump and Branson TradeBot, which allocates funds between the main chain and various national c...",en,cointelegraph.com
3,1459194474,content_present,-6151852268067518688,3891637997717104548,-1457532940883382585,,,,HTML,https://cloudplatform.googleblog.com/2016/03/Google-Data-Center-360-Tour.html,Google Data Center 360° Tour,We're excited to share the Google Data Center 360° Tour - a YouTube 360° video that gives you an unprecedented and immersive look inside one of our data centers. There are several ways to view this video: On desktop using Google Chrome use your mouse or trackpad to change your view while the video plays YouTube app on mobile - move your device around to look at all angles while the video plays...,en,cloudplatform.googleblog.com
4,1459194497,content_present,2448026894306402386,4340306774493623681,8940341205206233829,,,,HTML,https://bitcoinmagazine.com/articles/ibm-wants-to-evolve-the-internet-with-blockchain-technology-1459189322,"IBM Wants to ""Evolve the Internet"" With Blockchain Technology","The Aite Group projects the blockchain market could be valued at $400 million by 2019. For that reason, some of the biggest names in banking, industry and technology have entered into the space to evaluate how this technology could change the financial world. IBM and Linux, for instance, have brought together some of the brightest minds in the industry and technology to work on blockchain tech...",en,bitcoinmagazine.com


In [45]:
## Replacing missing values (NaN) with the placeholder 'Unknown'
platform_content['producer_country']=platform_content['producer_country'].fillna('Unknown')
platform_content['producer_location']=platform_content['producer_location'].fillna('Unknown')
platform_content['producer_device_info']=platform_content['producer_device_info'].fillna('Unknown')

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Top 10 websites whose articles are present on the platform</h3>

In [46]:
fig=px.bar(platform_content['website_name'].value_counts().head(10),
       labels={'index':'website name','value':'count'},
        color=['techcrunch.com', 'medium.com', 'cloudplatform.googleblog.com',
       'startupi.com.br', 'www.imdb.com', 'exame.abril.com.br',
       'googlediscovery.com', 'www.mckinsey.com', 'www.businessinsider.com',
       'www.wired.com'])
fig.show()

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Language of the content on the platform</h3>

In [47]:
fig=px.bar(platform_content['language'].value_counts(),
       labels={'index':'language','value':'count'},
        color=['en', 'pt', 'la', 'es', 'ja'])
fig.show()

- Most of the articles are in English (`en`) and Portuguese (`pt`). In this case study, only English articles are considered.

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Type of item on the platform</h3>

In [48]:
fig=px.bar(platform_content['item_type'].value_counts(),
       labels={'index':'Item','value':'count'},
        color=['HTML', 'VIDEO', 'RICH'])
fig.show()

- Most of the items on the platform are of type `HTML`

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Country of the producer</h3>

In [49]:
fig=px.bar(platform_content['producer_country'].value_counts(),
       labels={'index':'Country','value':'count'},
       color=['Unknown', 'BR', 'US', 'CA', 'AU', 'PT'])
fig.show()

- For most of the articles on the platform, the producer's country is `Unknown`, i.e. missing in the source data

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Location of the producer</h3>

In [50]:
fig=px.bar(platform_content['producer_location'].value_counts(),
       labels={'index':'Location','value':'count'},
        color_discrete_sequence=["red"])
fig.show()

 - For most of the articles on the platform, the producer's location is `Unknown`, i.e. missing in the source data

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Top 10 Producer_id</h3>

In [51]:
## head() without an argument returns only 5 rows; head(10) matches the "Top 10" heading
fig=px.bar(platform_content['producer_id'].value_counts().head(10),
       labels={'index':'Producer_id','value':'count'},
        color_discrete_sequence=["red"])
fig.show()

In [52]:
platform_content['producer_session_id'].nunique()

2017

In [53]:
platform_content['item_id'].nunique()

3057

In [54]:
platform_content.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,producer_id,producer_session_id,producer_device_info,producer_location,producer_country,item_type,item_url,title,text_description,language,website_name
0,1459192779,content_pulled_out,-6451309518266745024,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en,www.nytimes.com
1,1459193988,content_present,-4110354420726924665,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en,www.nytimes.com
2,1459194146,content_present,-7292285110016212249,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://cointelegraph.com/news/bitcoin-future-when-gbpcoin-of-branson-wins-over-usdcoin-of-trump,Bitcoin Future: When GBPcoin of Branson Wins Over USDcoin of Trump,"The alarm clock wakes me at 8:00 with stream of advert-free broadcasting, charged at one satoshi per second. The current BTC exchange rate makes that snooze button a costly proposition! So I get up, make coffee and go to my computer to check the overnight performance of my bots. TradeBot earns me on Trump and Branson TradeBot, which allocates funds between the main chain and various national c...",en,cointelegraph.com
3,1459194474,content_present,-6151852268067518688,3891637997717104548,-1457532940883382585,Unknown,Unknown,Unknown,HTML,https://cloudplatform.googleblog.com/2016/03/Google-Data-Center-360-Tour.html,Google Data Center 360° Tour,We're excited to share the Google Data Center 360° Tour - a YouTube 360° video that gives you an unprecedented and immersive look inside one of our data centers. There are several ways to view this video: On desktop using Google Chrome use your mouse or trackpad to change your view while the video plays YouTube app on mobile - move your device around to look at all angles while the video plays...,en,cloudplatform.googleblog.com
4,1459194497,content_present,2448026894306402386,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,https://bitcoinmagazine.com/articles/ibm-wants-to-evolve-the-internet-with-blockchain-technology-1459189322,"IBM Wants to ""Evolve the Internet"" With Blockchain Technology","The Aite Group projects the blockchain market could be valued at $400 million by 2019. For that reason, some of the biggest names in banking, industry and technology have entered into the space to evaluate how this technology could change the financial world. IBM and Linux, for instance, have brought together some of the brightest minds in the industry and technology to work on blockchain tech...",en,bitcoinmagazine.com


<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Interaction Type</h3>

In [55]:
fig=px.bar(platform_content['interaction_type'].value_counts(),
       labels={'index':'type','value':'count'},
        color=['content_present', 'content_pulled_out'])
fig.show()

### Only the articles written in English must be considered for content-based recommendations

In [56]:
platform_content=platform_content[platform_content['language']=='en']

In [57]:
platform_content.shape

(2264, 14)

## Removing the content_pulled_out Articles

In [58]:
platform_content=platform_content[~(platform_content['interaction_type']=='content_pulled_out')]
platform_content.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,producer_id,producer_session_id,producer_device_info,producer_location,producer_country,item_type,item_url,title,text_description,language,website_name
1,1459193988,content_present,-4110354420726924665,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://www.nytimes.com/2016/03/28/business/dealbook/ethereum-a-virtual-currency-enables-transactions-that-rival-bitcoins.html,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing...",en,www.nytimes.com
2,1459194146,content_present,-7292285110016212249,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://cointelegraph.com/news/bitcoin-future-when-gbpcoin-of-branson-wins-over-usdcoin-of-trump,Bitcoin Future: When GBPcoin of Branson Wins Over USDcoin of Trump,"The alarm clock wakes me at 8:00 with stream of advert-free broadcasting, charged at one satoshi per second. The current BTC exchange rate makes that snooze button a costly proposition! So I get up, make coffee and go to my computer to check the overnight performance of my bots. TradeBot earns me on Trump and Branson TradeBot, which allocates funds between the main chain and various national c...",en,cointelegraph.com
3,1459194474,content_present,-6151852268067518688,3891637997717104548,-1457532940883382585,Unknown,Unknown,Unknown,HTML,https://cloudplatform.googleblog.com/2016/03/Google-Data-Center-360-Tour.html,Google Data Center 360° Tour,We're excited to share the Google Data Center 360° Tour - a YouTube 360° video that gives you an unprecedented and immersive look inside one of our data centers. There are several ways to view this video: On desktop using Google Chrome use your mouse or trackpad to change your view while the video plays YouTube app on mobile - move your device around to look at all angles while the video plays...,en,cloudplatform.googleblog.com
4,1459194497,content_present,2448026894306402386,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,https://bitcoinmagazine.com/articles/ibm-wants-to-evolve-the-internet-with-blockchain-technology-1459189322,"IBM Wants to ""Evolve the Internet"" With Blockchain Technology","The Aite Group projects the blockchain market could be valued at $400 million by 2019. For that reason, some of the biggest names in banking, industry and technology have entered into the space to evaluate how this technology could change the financial world. IBM and Linux, for instance, have brought together some of the brightest minds in the industry and technology to work on blockchain tech...",en,bitcoinmagazine.com
5,1459194522,content_present,-2826566343807132236,4340306774493623681,8940341205206233829,Unknown,Unknown,Unknown,HTML,http://www.coindesk.com/ieee-blockchain-oxford-cloud-computing/,IEEE to Talk Blockchain at Cloud Computing Oxford-Con - CoinDesk,"One of the largest and oldest organizations for computing professionals will kick off its annual conference on the future of mobile cloud computing tomorrow, where blockchain is scheduled to be one of the attractions. With more than 421,000 members in 260 countries, the Institute of Electrical and Electronics Engineers (IEEE) holding such a high-profile event has the potential to accelerate th...",en,www.coindesk.com


In [59]:
platform_content.shape

(2211, 14)

<h1 style="background-color:#F18C8D; padding:10px"> Content-Based Recommendations</h1>

In [60]:
## Keeping only the columns needed for content-based recommendations
platform_content=platform_content[['item_id','title','text_description']]
platform_content.head()

Unnamed: 0,item_id,title,text_description
1,-4110354420726924665,"Ethereum, a Virtual Currency, Enables Transactions That Rival Bitcoin's","All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing..."
2,-7292285110016212249,Bitcoin Future: When GBPcoin of Branson Wins Over USDcoin of Trump,"The alarm clock wakes me at 8:00 with stream of advert-free broadcasting, charged at one satoshi per second. The current BTC exchange rate makes that snooze button a costly proposition! So I get up, make coffee and go to my computer to check the overnight performance of my bots. TradeBot earns me on Trump and Branson TradeBot, which allocates funds between the main chain and various national c..."
3,-6151852268067518688,Google Data Center 360° Tour,We're excited to share the Google Data Center 360° Tour - a YouTube 360° video that gives you an unprecedented and immersive look inside one of our data centers. There are several ways to view this video: On desktop using Google Chrome use your mouse or trackpad to change your view while the video plays YouTube app on mobile - move your device around to look at all angles while the video plays...
4,2448026894306402386,"IBM Wants to ""Evolve the Internet"" With Blockchain Technology","The Aite Group projects the blockchain market could be valued at $400 million by 2019. For that reason, some of the biggest names in banking, industry and technology have entered into the space to evaluate how this technology could change the financial world. IBM and Linux, for instance, have brought together some of the brightest minds in the industry and technology to work on blockchain tech..."
5,-2826566343807132236,IEEE to Talk Blockchain at Cloud Computing Oxford-Con - CoinDesk,"One of the largest and oldest organizations for computing professionals will kick off its annual conference on the future of mobile cloud computing tomorrow, where blockchain is scheduled to be one of the attractions. With more than 421,000 members in 260 countries, the Institute of Electrical and Electronics Engineers (IEEE) holding such a high-profile event has the potential to accelerate th..."


In [61]:
for i in platform_content['text_description'][:50]:
    print(i)
    print('------------------------------------------------------------------------------------------------------')

All of this work is still very early. The first full public version of the Ethereum software was recently released, and the system could face some of the same technical and legal problems that have tarnished Bitcoin. Many Bitcoin advocates say Ethereum will face more security problems than Bitcoin because of the greater complexity of the software. Thus far, Ethereum has faced much less testing, and many fewer attacks, than Bitcoin. The novel design of Ethereum may also invite intense scrutiny by authorities given that potentially fraudulent contracts, like the Ponzi schemes, can be written directly into the Ethereum system. But the sophisticated capabilities of the system have made it fascinating to some executives in corporate America. IBM said last year that it was experimenting with Ethereum as a way to control real world objects in the so-called Internet of things. Microsoft has been working on several projects that make it easier to use Ethereum on its computing cloud, Azure. "Eth

<h3 style="background-color:#F18C8D; padding:10px">Text Preprocessing</h3>

In [62]:
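def preprocess(column):
    word_list=[]
    ## Initialising the Porter stemmer
    stemmer = PorterStemmer()
    ## Building the stop-word set once; calling stopwords.words() for every token is very slow
    stop_words = set(stopwords.words('english'))
    for text in platform_content[column].to_numpy():
        ## lower-casing the text (named `text` so the imported `string` module is not shadowed)
        text=text.lower()
        ## replacing runs of non-word characters with a single space
        clean_text = re.sub(r'\W+',' ', text)
        ## converting the string into tokens
        tokens=word_tokenize(clean_text)
        ## removing the stop words and stemming the remaining tokens
        stems=[stemmer.stem(token) for token in tokens if token not in stop_words]
        word_list.append(stems)
    return word_list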

In [63]:
word_list=preprocess('text_description')
## Printing only a small sample: print(word_list) on all documents exceeds the notebook's IOPub data rate limit
print(word_list[0][:20])



In [64]:
## calculating the number of tokens in each preprocessed description
sent_len=[len(words) for words in word_list]
sent_len

[614,
 210,
 153,
 717,
 173,
 369,
 405,
 953,
 496,
 1187,
 2830,
 168,
 1250,
 581,
 485,
 128,
 339,
 578,
 298,
 513,
 154,
 190,
 462,
 306,
 380,
 100,
 784,
 220,
 272,
 213,
 1745,
 686,
 446,
 221,
 1181,
 288,
 1141,
 237,
 382,
 807,
 358,
 193,
 2432,
 431,
 233,
 503,
 340,
 171,
 618,
 761,
 332,
 169,
 311,
 392,
 758,
 169,
 369,
 492,
 268,
 784,
 211,
 355,
 730,
 110,
 125,
 154,
 116,
 412,
 540,
 1025,
 4017,
 552,
 4017,
 540,
 540,
 540,
 441,
 1629,
 80,
 476,
 971,
 145,
 396,
 203,
 254,
 722,
 383,
 422,
 35,
 46,
 112,
 458,
 388,
 361,
 151,
 543,
 467,
 410,
 399,
 602,
 36,
 239,
 59,
 611,
 176,
 259,
 516,
 320,
 471,
 521,
 148,
 290,
 284,
 1442,
 521,
 1353,
 92,
 586,
 888,
 121,
 200,
 723,
 261,
 312,
 397,
 143,
 167,
 115,
 1459,
 451,
 2627,
 161,
 659,
 206,
 491,
 574,
 411,
 576,
 967,
 305,
 411,
 487,
 178,
 534,
 385,
 1950,
 137,
 439,
 569,
 254,
 194,
 83,
 244,
 411,
 279,
 188,
 277,
 207,
 292,
 377,
 356,
 174,
 231,
 229,
 97,
 4

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Distribution of Text</h3>

In [65]:
fig=px.histogram(sent_len,labels={'value':'Length of Text Description'})
fig.show()

- Most text descriptions are between 200 and 1300 tokens long after preprocessing

In [66]:
len(word_list)

2211

In [67]:
## total number of words across all documents
number_words = sum(len(doc) for doc in word_list)
number_words

1266552

# Creating the dictionary

In [68]:
dictionary = Dictionary(word_list) 
dictionary

<gensim.corpora.dictionary.Dictionary at 0x2265cd7c940>

In [69]:
## Overall length of Dictionary
len(dictionary)

31934

In [70]:
## getting the word with id 15
dictionary.get(15)

'advoc'

## Generate Bag of Words

<div>
    <img src="images/bagofwords.png" width="600"/>
</div>

In [71]:
## creating Bag of Words
corpus = [dictionary.doc2bow(doc) for doc in word_list] 

In [72]:
##length of corpus[0]
len(corpus[0])

358
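
The bag-of-words step can be pictured with a plain-Python sketch. This is not gensim's implementation: `build_vocab` and `to_bow` are hypothetical helpers, and the ids they assign will generally differ from the ids in `dictionary`. Each document becomes a list of `(token_id, count)` pairs, which is the same shape that `doc2bow` returns.

```python
from collections import Counter

def build_vocab(docs):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs for the tokens of doc that are in vocab."""
    counts = Counter(vocab[tok] for tok in doc if tok in vocab)
    return sorted(counts.items())

docs = [["ethereum", "bitcoin", "ethereum"], ["bitcoin", "cloud"]]
vocab = build_vocab(docs)
bows = [to_bow(doc, vocab) for doc in docs]
print(vocab)
print(bows)
```

The length of a document's bag-of-words list is its number of distinct tokens, which is why `len(corpus[0])` (358) is smaller than the first entry of `sent_len` (614).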

In [73]:
## fitting the TF-IDF model on the bag-of-words corpus
tfidf = TfidfModel(corpus)

In [74]:
## getting the Tfidf for document 0
tfidf[corpus[0]]

[(0, 0.012826204397772935),
 (1, 0.009308866529592128),
 (2, 0.015268984127922056),
 (3, 0.01961950781177451),
 (4, 0.007325328909741712),
 (5, 0.02479405938352334),
 (6, 0.015740147161804597),
 (7, 0.022084153210483622),
 (8, 0.02394804451746195),
 (9, 0.009432512366595404),
 (10, 0.01400436490631321),
 (11, 0.012257465700623688),
 (12, 0.01212457704269326),
 (13, 0.026229680016990553),
 (14, 0.008002278126139309),
 (15, 0.023725048082720083),
 (16, 0.016646455955172775),
 (17, 0.024178866967217643),
 (18, 0.015208637238195322),
 (19, 0.019239306596659404),
 (20, 0.007162436656666022),
 (21, 0.03241924717823618),
 (22, 0.02023470022808021),
 (23, 0.006728119827443642),
 (24, 0.014943829387773127),
 (25, 0.03322948348139467),
 (26, 0.014578979358860314),
 (27, 0.02063347473409601),
 (28, 0.017920899555199758),
 (29, 0.024062443311233882),
 (30, 0.025635227476895256),
 (31, 0.01229593227132032),
 (32, 0.02252577975628523),
 (33, 0.04636025857167622),
 (34, 0.0054670879970489415),
 (35, 
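
As a rough guide to where these weights come from: with its default settings, `TfidfModel` is documented as multiplying the raw term count by `log2(N / df)` and then scaling each document vector to unit length. The sketch below reproduces that arithmetic in plain Python on a toy corpus. `tfidf_weights` is a hypothetical helper, the document frequencies are invented, and non-default settings are not covered.

```python
import math

def tfidf_weights(bow, doc_freq, n_docs):
    """bow: list of (token_id, count); doc_freq: token_id -> number of documents containing it."""
    raw = [(tid, count * math.log2(n_docs / doc_freq[tid])) for tid, count in bow]
    # Terms that occur in every document get weight 0 and are dropped.
    raw = [(tid, w) for tid, w in raw if w > 0.0]
    norm = math.sqrt(sum(w * w for _, w in raw))
    return [(tid, w / norm) for tid, w in raw] if norm else []

# Toy corpus of 4 documents: token 0 appears in 1 of them, token 1 in 2, token 2 in all 4.
doc_freq = {0: 1, 1: 2, 2: 4}
weights = tfidf_weights([(0, 1), (1, 2), (2, 5)], doc_freq, n_docs=4)
print(weights)
```

Unit-length scaling is why the weights for document 0 are all small: its 358 distinct terms share a total squared weight of 1.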

## Generate Similarity Matrix

<div>
<img src="images/cos_sim.png" width="600"/>
</div>

In [75]:
## similarity matrix (cosine similarity index over the TF-IDF vectors)
sims = MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

In [76]:
len(dictionary)

31934

In [77]:
print(sims)

MatrixSimilarity<2211 docs, 31934 features>


In [78]:
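## calculating the similarity of document 0 against every document
## note: corpus[0] is a raw bag-of-words vector while the index was built from TF-IDF vectors,
## which is why the first score below is about 0.876 rather than 1; sims[tfidf[corpus[0]]] would compare like with like
sims[corpus[0]]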

array([0.8759018 , 0.04416057, 0.04072737, ..., 0.08580103, 0.14212885,
       0.01994529], dtype=float32)

In [79]:
len(sims[corpus[0]])

2211
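
The scores returned by `sims[...]` are cosine similarities, one per document in the index. A minimal sketch of that computation on sparse `(token_id, weight)` vectors, assuming plain-Python dictionaries rather than gensim's internal matrix (`cosine` is a hypothetical helper and the vectors are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as (token_id, weight) lists."""
    du, dv = dict(u), dict(v)
    dot = sum(w * dv.get(tid, 0.0) for tid, w in du.items())
    norm_u = math.sqrt(sum(w * w for w in du.values()))
    norm_v = math.sqrt(sum(w * w for w in dv.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

doc_a = [(0, 0.6), (1, 0.8)]
doc_b = [(1, 0.8), (2, 0.6)]
same = cosine(doc_a, doc_a)    # identical vectors
shared = cosine(doc_a, doc_b)  # one token in common
print(same, shared)
```

Only tokens present in both documents contribute to the numerator, so documents with no vocabulary in common score 0.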

## Recommendation

<div>
<img src="images/recommendation.jpg" width="600"/>
</div>

In [80]:
def recommedation_on_content_based(text):
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words('english'))
    ## lowercasing the text
    word = text.lower()
    ## replacing special characters with spaces
    cleanString = re.sub(r'\W+', ' ', word)
    ## tokenization
    tokens = word_tokenize(cleanString)
    ## removing stopwords and stemming
    stems = [stemmer.stem(token) for token in tokens if token not in stop_words]
    ## creating the bag of words
    query_doc_bow = dictionary.doc2bow(stems)
    ## TF-IDF weights for the query
    query_doc_tfidf = tfidf[query_doc_bow]
    ## similarity of the query against every document in the index
    similarity_array = sims[query_doc_tfidf]
    ## working on a copy so that platform_content itself is not modified
    similarity_df = platform_content.copy()
    similarity_df['content_Score'] = similarity_array
    ## min-max normalization of the score
    score = similarity_df['content_Score']
    similarity_df['normalized_content_score'] = (score - score.min()) / (score.max() - score.min())
    ## sorting the score in descending order
    result = similarity_df.sort_values(by=['normalized_content_score'], ascending=False)
    ## mapping the original item_id to the dense Article_id
    result['Article_id'] = result['item_id'].map(mapping_to_ids)
    return result

In [81]:
recommedation_on_content_based(platform_content['text_description'][170])

Unnamed: 0,item_id,title,text_description,content_Score,normalized_content_score,Article_id
170,3394403706511230595,Scanning hyperspace: how to tune machine learning models,"Introduction When doing machine learning using Python's scikit-learn library , you can often get reasonable predictive performance by using out-of-the-box settings for your models. However, the payoff can be huge if you invest at least some time into tuning models to your specific problem and dataset. In the previous post , we explored the concepts of overfitting, cross-validation, and the bia...",1.000000,1.000000,2028.0
853,-4127059794203205931,TPOT: A Python tool for automating data science,"Machine learning is often touted as: A field of study that gives computers the ability to learn without being explicitly programmed. Despite this common claim, anyone who has worked in the field knows that designing effective machine learning systems is a tedious endeavor, and typically requires considerable experience with machine learning algorithms, expert knowledge of the problem domain, a...",0.297008,0.296677,832.0
839,-7959318068735027467,Auto-scaling scikit-learn with Spark,"Data scientists often spend hours or days tuning models to get the highest accuracy. This tuning typically involves running a large number of independent Machine Learning (ML) tasks coded in Python or R. Following some work presented at Spark Summit Europe 2015, we are excited to release Scikit-learn integration package for Spark that dramatically simplifies the life of data scientists using P...",0.249253,0.248900,229.0
1855,9208127165664287660,Approaching (Almost) Any Machine Learning Problem | Abhishek Thakur,"Abhishek Thakur , a Kaggle Grandmaster, originally published this post here on July 18th, 2016 and kindly gave us permission to cross-post on No Free Hunch An average data scientist deals with loads of data daily. Some say over 60-70% time is spent in data cleaning, munging and bringing data to a suitable format such that machine learning models can be applied on that data. This post focuses o...",0.212268,0.211898,2977.0
1853,8381798621267347902,Approaching (Almost) Any Machine Learning Problem,"An average data scientist deals with loads of data daily. Some say over 60-70% time is spent in data cleaning, munging and bringing data to a suitable format such that machine learning models can be applied on that data. This post focuses on the second part, i.e., applying machine learning models, including the preprocessing steps. The pipelines discussed in this post come as a result of over ...",0.208993,0.208622,2845.0
...,...,...,...,...,...,...
877,279771472506428952,5 Unique Features Of Google Compute Engine That No IaaS Provider Could Match,"Google Compute Engine (GCE), the infrastructure service of Google Cloud Platform, is a late entrant in the market. Amazon EC2 was announced in 2006 while Microsoft added VMs to Azure in 2012. Google announced the general availability of GCE only in late 2013. Source: Google Despite being the laggard in the IaaS [...]",0.000949,0.000480,1531.0
319,7707640607626518697,Linus Torvalds: The mind behind Linux,"Linus Torvalds transformed technology twice -- first with the Linux kernel, which helps power the Internet, and again with Git, the source code management system used by developers worldwide. In a rare interview with TED Curator Chris Anderson, Torvalds discusses with remarkable openness the personality traits that prompted his unique philosophy of work, engineering and life.",0.000670,0.000201,2740.0
451,5379671084978512851,Linus Torvalds: The mind behind Linux,"Linus Torvalds transformed technology twice -- first with the Linux kernel, which helps power the Internet, and again with Git, the source code management system used by developers worldwide. In a rare interview with TED Curator Chris Anderson, Torvalds discusses with remarkable openness the personality traits that prompted his unique philosophy of work, engineering and life.",0.000670,0.000201,2357.0
3114,-4132331404553626868,Gartner Reprint,"Gartner redesigned the Magic Quadrant for BI and analytics platforms in 2016, to reflect this more than decade-long shift. A year later, in 2017, there is significant evidence to suggest that the BI and analytics platform market's multiyear transition to modern agile business-led analytics is now mainstream.",0.000627,0.000157,830.0
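
The `normalized_content_score` column is a min-max rescaling of `content_Score`. A small sketch of that rescaling with invented scores, in plain Python (`min_max` is a hypothetical helper; the guard for identical scores is an addition that the notebook's formula does not have):

```python
def min_max(scores):
    """Rescale scores linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

scaled = min_max([0.2, 0.5, 1.0])
print(scaled)
```

Because the query article is itself in the index, its raw score is the maximum and it always maps to 1 after rescaling.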


<h3 style="background-color:#F18C8D; padding:10px">User Content Dataset</h3>

In [4]:
## Reading the dataset
user_content=pd.read_csv('data/consumer_transanctions.csv')
user_content.head()


Columns (4) have mixed types.Specify dtype option on import or set low_memory=False.



Unnamed: 0,event_timestamp,interaction_type,item_id,consumer_id,consumer_session_id,consumer_device_info,consumer_location,country
0,1465413032,content_watched,-3499919498720038879,-8845298781299428018,1264196770339959068,,,
1,1465412560,content_watched,8890720798209849691,-1032019229384696495,3621737643587579081,"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.24 Safari/537.36",NY,US
2,1465416190,content_watched,310515487419366995,-1130272294246983140,2631864456530402479,,,
3,1465413895,content_followed,310515487419366995,344280948527967603,-3167637573980064150,,,
4,1465412290,content_watched,-7820640624231356730,-445337111692715325,561148 1178424124714,,,


In [5]:
user_content[user_content.duplicated()].shape

(11, 8)

In [6]:
## shape of the user_content
user_content.shape

(72312, 8)

In [7]:
## checking each column's information
user_content.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72312 entries, 0 to 72311
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   event_timestamp       72312 non-null  int64 
 1   interaction_type      72312 non-null  object
 2   item_id               72312 non-null  int64 
 3   consumer_id           72312 non-null  int64 
 4   consumer_session_id   72312 non-null  object
 5   consumer_device_info  56918 non-null  object
 6   consumer_location     56907 non-null  object
 7   country               56918 non-null  object
dtypes: int64(3), object(5)
memory usage: 4.4+ MB


- `consumer_session_id` is stored as `object` instead of an integer type because row 4 holds a malformed value (`561148 1178424124714`)
- `consumer_device_info`, `consumer_location` and `country` have missing values

In [8]:
## dropping row 4, whose consumer_session_id is malformed
user_content.drop(4,inplace=True)

In [9]:
## filling the NaN values with 'unknown'
user_content['consumer_device_info']=user_content['consumer_device_info'].fillna('unknown')

In [10]:
## cleaning the consumer_device_info column and deriving a new column, Device
device=[]
for parts in user_content['consumer_device_info'].str.split(' '):
    if len(parts)==1 or parts[1]=='-':
        ## single token (e.g. 'unknown') or '<name> - ...': keep the first token
        device.append(parts[0])
    elif parts[1]=='(Macintosh;':
        device.append('Mac')
    elif parts[1]=='(X11;':
        device.append('Linux')
    else:
        ## every other user agent is labelled 'Window'
        device.append('Window')
user_content['Device']=device

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Information about the device used by the consumer</h3>

In [11]:
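## counting users per device; the bar colours are taken from the same index so that labels and bars always match
device_counts=user_content['Device'].value_counts()
fig=px.bar(x=device_counts.index,y=device_counts.values,
       labels={'x':'Device Type','y':'count','color':'Device Type'},
       color=device_counts.index)
fig.show()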

- Most users are on the Windows operating system
- The fewest users are on Apple phones (iOS)

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Top 10 Location of the consumer</h3>

In [12]:
user_content['consumer_location']=user_content['consumer_location'].fillna('unknown')

In [13]:
fig=px.bar(user_content['consumer_location'].value_counts().head(10),
       labels={'index':'location','value':'count'},
       color=['SP', 'unknown', 'MG', 'NY', 'TX', 'GA', 'RJ', 'NJ', '?', 'CA'])
fig.show()

- The most common consumer location is 'SP'

In [14]:
user_content['country']=user_content['country'].fillna('unknown')

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Top 10 Country of the consumer</h3>

In [15]:
fig=px.bar(user_content['country'].value_counts().head(10),
       title='Top 10 Country of the consumer',
       labels={'index':'Country','value':'count'},
       color=['BR', 'unknown', 'US', 'KR', 'CA', 'JP', 'AU', 'GB', 'DE', 'IE'])
fig.show()

- Most of the users who read the articles are from `BR` (Brazil)

<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Creating a Rating Column</h3>

###  The data set provided does not contain explicit user ratings that could be used to generate recommendations. The rating values are therefore imputed from the feature `interaction_type`, with the highest weight given to content_followed, followed by content_commented_on, content_saved, content_liked and content_watched.

<div>
<img src="images/Rating.jpg" width="600"/>
</div>

In [16]:
## weight of each interaction type, from weakest (watched) to strongest (followed)
d={'content_watched':1,'content_liked':2,'content_saved':3,'content_commented_on':4,'content_followed':5}
## mapping each interaction type to its weight
user_content['Rating']=user_content['interaction_type'].map(d)
user_content.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,consumer_id,consumer_session_id,consumer_device_info,consumer_location,country,Device,Rating
0,1465413032,content_watched,-3499919498720038879,-8845298781299428018,1264196770339959068,unknown,unknown,unknown,unknown,1
1,1465412560,content_watched,8890720798209849691,-1032019229384696495,3621737643587579081,"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.24 Safari/537.36",NY,US,Mac,1
2,1465416190,content_watched,310515487419366995,-1130272294246983140,2631864456530402479,unknown,unknown,unknown,unknown,1
3,1465413895,content_followed,310515487419366995,344280948527967603,-3167637573980064150,unknown,unknown,unknown,unknown,5
5,1465413742,content_watched,310515487419366995,-8763398617720485024,1395789369402380392,"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",MG,BR,Window,1


In [17]:
[user_content.duplicated()]

[0        False
 1        False
 2        False
 3        False
 5        False
          ...  
 72307    False
 72308    False
 72309    False
 72310    False
 72311    False
 Length: 72311, dtype: bool]

In [18]:
user_content.groupby(['consumer_id', 'item_id']).size()

consumer_id           item_id             
-9223121837663643404  -8949113594875411859    1
                      -8377626164558006982    1
                      -8208801367848627943    1
                      -8187220755213888616    1
                      -7423191370472335463    8
                                             ..
 9210530975708218054   8477804012624580461    4
                       8526042588044002101    1
                       8856169137131817223    1
                       8869347744613364434    1
                       9209886322932807692    1
Length: 40710, dtype: int64

In [19]:
### Number of distinct items each consumer interacted with
users_interactions_count_df = user_content.groupby(['consumer_id', 'item_id']).size().groupby('consumer_id').size().reset_index()
users_interactions_count_df.head()

Unnamed: 0,consumer_id,0
0,-9223121837663643404,43
1,-9212075797126931087,5
2,-9207251133131336884,7
3,-9199575329909162940,11
4,-9196668942822132778,7


<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Number of Items Each Customer Interacted With</h3>

In [20]:
fig=px.histogram(data_frame=users_interactions_count_df,x=0,labels={'0':'Number of items interacted with'})
fig.show()

## Recommender systems suffer from the user `cold-start` problem: it is hard to provide personalized recommendations for users who have consumed no items or only a few, because there is too little information to model their preferences. For this reason, we keep in the dataset only users who have interacted with at least 5 distinct items.
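
The filtering rule can be sketched without pandas. The helper below (`users_with_enough_items`, a hypothetical name) counts distinct items per user, so repeated interactions with the same item do not help a user reach the threshold. The event data is invented for illustration.

```python
from collections import Counter

def users_with_enough_items(interactions, min_items=5):
    """interactions: iterable of (consumer_id, item_id) pairs.
    Returns the set of users who touched at least min_items distinct items."""
    distinct_pairs = set(interactions)
    items_per_user = Counter(user for user, _ in distinct_pairs)
    return {user for user, n in items_per_user.items() if n >= min_items}

# u1 touches 5 distinct items; u2 has 7 events but only 4 distinct items.
events = [("u1", i) for i in range(5)] + [("u1", 0)]
events += [("u2", i) for i in range(4)] + [("u2", 0)] * 3
kept = users_with_enough_items(events)
print(kept)
```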

In [21]:
## keeping only the customers who interacted with at least 5 distinct items
users_with_enough_interactions_df=users_interactions_count_df[users_interactions_count_df[0]>=5]
users_with_enough_interactions_df.head()

Unnamed: 0,consumer_id,0
0,-9223121837663643404,43
1,-9212075797126931087,5
2,-9207251133131336884,7
3,-9199575329909162940,11
4,-9196668942822132778,7


In [22]:
## merging the two dataframes to keep only the interactions of customers with at least 5 distinct items
interactions_from_selected_users_df=user_content.merge(users_with_enough_interactions_df,
                                                       how='right',right_on='consumer_id',
                                                       left_on='consumer_id')
interactions_from_selected_users_df.head()

Unnamed: 0,event_timestamp,interaction_type,item_id,consumer_id,consumer_session_id,consumer_device_info,consumer_location,country,Device,Rating,0
0,1463138398,content_watched,7516228655554309785,-9223121837663643404,-4482197405545551645,"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36",SP,BR,Window,1,43
1,1463656314,content_watched,3041906492387035176,-9223121837663643404,-7824685088995468735,"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",SP,BR,Window,1,43
2,1464113091,content_watched,-3750879736572068916,-9223121837663643404,-2774275024909061125,"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36",SP,BR,Window,1,43
3,1462283851,content_watched,-730957269757756529,-9223121837663643404,2625340673871268120,"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36",SP,BR,Window,1,43
4,1462452127,content_watched,-8949113594875411859,-9223121837663643404,-3673331845456357462,"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36",SP,BR,Window,1,43


In [23]:
print('Total interactions')
print(user_content.shape[0])
print('Interactions from users with at least 5 interactions')
print(interactions_from_selected_users_df.shape[0])

Total interactions
72311
Interactions from users with at least 5 interactions
69867


In [24]:

def smooth_user_preference(x):
    return math.log(1+x, 2)

## Users can view an article many times and interact with it in different ways (e.g. watch, like, save, comment or follow). To model a user's interest in a given article, we sum the interaction weights (`Rating`) over all of that user's interactions with the article and apply a log transformation, `log2(1 + x)`, to smooth the distribution.
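
As a worked example of this aggregation, assume the interaction weights from the rating dictionary defined earlier (1 for `content_watched` up to 5 for `content_followed`). The event lists below are made up for illustration, and `interest` is a hypothetical helper.

```python
import math

# Interaction weights, as in the rating dictionary used earlier in the notebook.
WEIGHTS = {
    "content_watched": 1,
    "content_liked": 2,
    "content_saved": 3,
    "content_commented_on": 4,
    "content_followed": 5,
}

def interest(events):
    """Sum the weights of all events for one (user, item) pair, then apply log2(1 + x)."""
    total = sum(WEIGHTS[e] for e in events)
    return math.log2(1 + total)

single_view = interest(["content_watched"])
eight_views = interest(["content_watched"] * 8)
mixed = interest(["content_watched", "content_liked", "content_followed"])
print(single_view, eight_views, mixed)
```

A single view gives `log2(2) = 1` and eight views give `log2(9) ≈ 3.17`, the same values (1.000000 and 3.169925) that appear in `interactions_full_df`. One view, one like and one follow also sum to 8, so the log-smoothed score cannot distinguish that case from eight views.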

In [25]:
interactions_full_df=interactions_from_selected_users_df.groupby(['consumer_id', 'item_id'])['Rating'].sum().apply(smooth_user_preference).reset_index()
interactions_full_df

Unnamed: 0,consumer_id,item_id,Rating
0,-9223121837663643404,-8949113594875411859,1.000000
1,-9223121837663643404,-8377626164558006982,1.000000
2,-9223121837663643404,-8208801367848627943,1.000000
3,-9223121837663643404,-8187220755213888616,1.000000
4,-9223121837663643404,-7423191370472335463,3.169925
...,...,...,...
39101,9210530975708218054,8477804012624580461,3.584963
39102,9210530975708218054,8526042588044002101,1.000000
39103,9210530975708218054,8856169137131817223,1.000000
39104,9210530975708218054,8869347744613364434,1.000000


In [26]:
fig=px.histogram(interactions_full_df['Rating'])
fig.show()

<h1 style="background-color:#F18C8D; padding:10px"> User-Based Recommendations</h1>

In [27]:
## keeping the relevant columns; .copy() lets new columns be added later without a SettingWithCopyWarning
consumer_interaction=interactions_full_df[['consumer_id','item_id','Rating']].copy()
consumer_interaction

Unnamed: 0,consumer_id,item_id,Rating
0,-9223121837663643404,-8949113594875411859,1.000000
1,-9223121837663643404,-8377626164558006982,1.000000
2,-9223121837663643404,-8208801367848627943,1.000000
3,-9223121837663643404,-8187220755213888616,1.000000
4,-9223121837663643404,-7423191370472335463,3.169925
...,...,...,...
39101,9210530975708218054,8477804012624580461,3.584963
39102,9210530975708218054,8526042588044002101,1.000000
39103,9210530975708218054,8856169137131817223,1.000000
39104,9210530975708218054,8869347744613364434,1.000000


In [28]:
##shape of dataset
consumer_interaction.shape

(39106, 3)

In [29]:
consumer_interaction.describe()

Unnamed: 0,consumer_id,item_id,Rating
count,39106.0,39106.0,39106.0
mean,-3.550552e+16,-4.828865e+16,1.480363
std,5.107185e+18,5.373235e+18,0.72133
min,-9.223122e+18,-9.222795e+18,1.0
25%,-3.954277e+18,-4.754224e+18,1.0
50%,-7.450732e+16,-5924996000000000.0,1.0
75%,3.829785e+18,4.562045e+18,1.584963
max,9.210531e+18,9.222265e+18,8.071462


In [30]:
## number of users
n_users=consumer_interaction['consumer_id'].nunique()
print(n_users)
## number of items
n_items=consumer_interaction['item_id'].nunique()
print(n_items)

1140
2984


## Creating a new item_Id from the existing item_id

In [31]:
## using pandas rank function to create Ids
consumer_interaction['item_Id']=consumer_interaction['item_id'].rank(method='dense').astype('int')
consumer_interaction

Unnamed: 0,consumer_id,item_id,Rating,item_Id
0,-9223121837663643404,-8949113594875411859,1.000000,66
1,-9223121837663643404,-8377626164558006982,1.000000,161
2,-9223121837663643404,-8208801367848627943,1.000000,189
3,-9223121837663643404,-8187220755213888616,1.000000,197
4,-9223121837663643404,-7423191370472335463,3.169925,314
...,...,...,...,...
39101,9210530975708218054,8477804012624580461,3.584963,2859
39102,9210530975708218054,8526042588044002101,1.000000,2865
39103,9210530975708218054,8856169137131817223,1.000000,2920
39104,9210530975708218054,8869347744613364434,1.000000,2923
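
`rank(method='dense')` gives equal values the same rank and leaves no gaps, so the new ids run from 1 to the number of distinct items. A plain-Python sketch of the same idea with made-up ids (`dense_rank` is a hypothetical helper, not a pandas function):

```python
def dense_rank(values):
    """Map each value to its 1-based position among the sorted distinct values."""
    order = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [order[v] for v in values]

raw_ids = [9007, -5123, 42, -5123, 9007]
new_ids = dense_rank(raw_ids)
print(new_ids)
```

Contiguous ids are what allow `item_Id - 1` to be used directly as a column position in the user-item matrix later on.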


## Creating a new consumer_Id from the existing consumer_id

In [32]:
consumer_interaction['consumer_Id']=consumer_interaction['consumer_id'].rank(method='dense').astype('int')
consumer_interaction

Unnamed: 0,consumer_id,item_id,Rating,item_Id,consumer_Id
0,-9223121837663643404,-8949113594875411859,1.000000,66,1
1,-9223121837663643404,-8377626164558006982,1.000000,161,1
2,-9223121837663643404,-8208801367848627943,1.000000,189,1
3,-9223121837663643404,-8187220755213888616,1.000000,197,1
4,-9223121837663643404,-7423191370472335463,3.169925,314,1
...,...,...,...,...,...
39101,9210530975708218054,8477804012624580461,3.584963,2859,1140
39102,9210530975708218054,8526042588044002101,1.000000,2865,1140
39103,9210530975708218054,8856169137131817223,1.000000,2920,1140
39104,9210530975708218054,8869347744613364434,1.000000,2923,1140


In [33]:
## number of unique articles on the platform (original and new ids)
print(consumer_interaction['item_id'].nunique())
print(consumer_interaction['item_Id'].nunique())

2984
2984


In [34]:
## number of unique customers on the platform (original and new ids)
print(consumer_interaction['consumer_id'].nunique())
print(consumer_interaction['consumer_Id'].nunique())

1140
1140


In [35]:
## creating a new dataframe with the new item and customer ids
consumer=consumer_interaction[['consumer_Id','item_Id','Rating']]
consumer.head()

Unnamed: 0,consumer_Id,item_Id,Rating
0,1,66,1.0
1,1,161,1.0
2,1,189,1.0
3,1,197,1.0
4,1,314,3.169925


In [36]:
## creating a dictionary that maps each original item_id to its new item_Id
df=consumer_interaction[['item_id','item_Id']]
mapping_to_ids=dict(zip(df.item_id, df.item_Id))
mapping_to_ids

{-8949113594875411859: 66,
 -8377626164558006982: 161,
 -8208801367848627943: 189,
 -8187220755213888616: 197,
 -7423191370472335463: 314,
 -7331393944609614247: 328,
 -6872546942144599345: 386,
 -6728844082024523434: 417,
 -6590819806697898649: 443,
 -6558712014192834002: 451,
 -6545872007932025533: 453,
 -6484638837208285334: 465,
 -5781461435447152359: 570,
 -5002383425685129595: 702,
 -4541461982704074404: 769,
 -4233177915193302509: 813,
 -4205346868684833897: 819,
 -3912939678517879962: 864,
 -3750879736572068916: 890,
 -2402288292108892893: 1087,
 -730957269757756529: 1358,
 -559964548932224920: 1390,
 -447851796385928420: 1411,
 834896074125772354: 1617,
 921770761777842242: 1631,
 943818026930898372: 1632,
 1436883058900979473: 1706,
 1469580151036142903: 1714,
 3041906492387035176: 1963,
 3180828616327439381: 1991,
 3367026768872537336: 2021,
 4419562057180692966: 2187,
 4563606877148407012: 2206,
 5087084654882097891: 2302,
 5211673327552264703: 2322,
 5293701842202310496: 2

## Dividing the dataset into train and test

In [82]:
from sklearn.model_selection import train_test_split

In [83]:
## splitting the data into train and test sets (80/20)
train,test=train_test_split(consumer,train_size=.80,random_state=42)

In [84]:
## shape of train set
train.shape

(31284, 3)

In [85]:
## shape of test set
test.shape

(7822, 3)

In [86]:
train.consumer_Id.nunique()

1140

In [87]:
test.consumer_Id.nunique()

1039

## Create empty data matrix: user*Item

In [88]:
## creating Data_matrix
data_matrix=np.zeros((n_users,n_items))
data_matrix

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [89]:
data_matrix.shape

(1140, 2984)

## Fill User*Item Train matrix with rating values

In [90]:
## filling data_matrix with the training ratings; ids are 1-based, matrix positions are 0-based
for line in train.itertuples():
    data_matrix[line.consumer_Id-1,line.item_Id-1]=line.Rating

In [91]:
data_matrix[2][data_matrix[2]>0]

array([2.        , 2.        , 2.        , 2.80735492, 2.32192809])

In [92]:
train[train['consumer_Id']==3]

Unnamed: 0,consumer_Id,item_Id,Rating
49,3,99,2.0
54,3,2713,2.321928
50,3,785,2.0
51,3,847,2.807355
48,3,2,2.0


In [93]:
data_matrix[2][846]

2.807354922057604
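
The index arithmetic here is easy to get wrong: the ids start at 1 while matrix positions start at 0. A small sketch with nested lists and invented ratings (an assumption for illustration, since the notebook uses a NumPy array; `build_matrix` is a hypothetical helper):

```python
def build_matrix(rows, n_users, n_items):
    """rows: iterable of (consumer_Id, item_Id, rating) with 1-based ids."""
    matrix = [[0.0] * n_items for _ in range(n_users)]
    for consumer_id, item_id, rating in rows:
        # Shift both ids down by one to get 0-based positions.
        matrix[consumer_id - 1][item_id - 1] = rating
    return matrix

rows = [(1, 2, 1.0), (3, 1, 2.0), (3, 4, 2.807355)]
matrix = build_matrix(rows, n_users=3, n_items=4)
print(matrix[2])
```

This is why the rating of `consumer_Id` 3 for `item_Id` 847 is read back as `data_matrix[2][846]`.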

## Create Data Matrix with Test Data

In [94]:
data_matrix_test=np.zeros((n_users,n_items))
data_matrix_test

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [95]:
data_matrix_test.shape

(1140, 2984)

In [96]:
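## filling data_matrix_test with the test ratings; ids are 1-based, matrix positions are 0-based
for line in test.itertuples():
    data_matrix_test[line.consumer_Id-1,line.item_Id-1]=line.Rating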

In [97]:
data_matrix_test

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [98]:
test

Unnamed: 0,consumer_Id,item_Id,Rating
29053,789,489,2.807355
32165,872,1910,4.087463
38802,1128,597,1.000000
8824,316,2417,1.584963
14183,451,67,1.000000
...,...,...,...
8225,286,2658,3.169925
33414,918,2873,1.584963
16071,480,195,2.584963
33567,922,99,2.321928


In [99]:
test[test['consumer_Id']==789]

Unnamed: 0,consumer_Id,item_Id,Rating
29053,789,489,2.807355
29129,789,2525,1.0
29103,789,1739,1.0
29091,789,1492,2.321928
29115,789,2129,1.584963
29067,789,876,1.0
29099,789,1625,1.0
29078,789,1247,1.0
29035,789,139,1.0
29039,789,181,1.0


In [100]:
data_matrix_test[788][data_matrix_test[788]>0]

array([1.        , 1.        , 1.        , 2.32192809, 1.        ,
       2.80735492, 1.        , 1.        , 1.        , 1.        ,
       1.        , 2.32192809, 1.        , 1.        , 1.        ,
       1.        , 1.5849625 , 1.5849625 , 1.        , 2.        ,
       1.5849625 , 1.        , 1.        ])

In [101]:
data_matrix_test[788][2218]

1.5849625007211563

# Pairwise Distance

In [102]:
from sklearn.metrics.pairwise import pairwise_distances 

In [103]:
## calculating the cosine similarity between the users (1 - cosine distance)
user_similarity = 1- pairwise_distances(data_matrix, metric='cosine')

In [104]:
user_similarity

array([[1.        , 0.        , 0.        , ..., 0.0250279 , 0.        ,
        0.04564174],
       [0.        , 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.19981309, 0.19531254,
        0.        ],
       ...,
       [0.0250279 , 0.        , 0.19981309, ..., 1.        , 0.12514376,
        0.        ],
       [0.        , 0.        , 0.19531254, ..., 0.12514376, 1.        ,
        0.        ],
       [0.04564174, 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])

In [105]:
user_similarity.shape

(1140, 1140)

In [106]:
data_matrix.shape

(1140, 2984)

## Item pairwise similarity distance

In [107]:
## calculating the cosine similarity between the items (1 - cosine distance)
item_similarity = 1-pairwise_distances(data_matrix.T, metric='cosine')
item_similarity

array([[1.        , 0.        , 0.        , ..., 0.32824637, 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.10840762],
       ...,
       [0.32824637, 0.        , 0.        , ..., 1.        , 0.06477147,
        0.09716568],
       [0.        , 0.        , 0.        , ..., 0.06477147, 1.        ,
        0.06446985],
       [0.        , 0.        , 0.10840762, ..., 0.09716568, 0.06446985,
        1.        ]])

In [108]:
item_similarity.shape

(2984, 2984)

## Dot product of Data Matrix with similarity matrix

In [109]:
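## note: user_similarity was computed on the training matrix, but it is multiplied here by data_matrix_test.
## Since each user's similarity with themselves is 1, a user's own test rating enters its own prediction,
## so the MAE and MSE reported below are optimistic. Using data_matrix (train) here would avoid this leakage.
item_prediction = np.dot(user_similarity, data_matrix_test)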

In [110]:
## getting prediction
item_prediction

array([[0.        , 0.        , 0.4362074 , ..., 0.        , 0.14240918,
        0.        ],
       [0.        , 0.92294969, 0.09747097, ..., 0.        , 0.03447794,
        0.11280333],
       [0.        , 1.66853733, 0.09771126, ..., 0.        , 0.04360585,
        0.        ],
       ...,
       [0.        , 0.51687326, 0.12257919, ..., 0.        , 0.02793983,
        0.        ],
       [0.        , 0.64433818, 0.82274264, ..., 0.        , 0.09529441,
        0.05560664],
       [0.        , 0.01959323, 0.73797645, ..., 0.        , 0.19321   ,
        0.01971601]])

In [111]:
item_prediction.shape

(1140, 2984)
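
For comparison, a common formulation of user-based prediction scores each item from the training matrix only and divides by the summed similarities, so that scores stay on the rating scale. The sketch below is that textbook variant on a toy matrix, not the computation performed in the cell above. `cosine` and `predict` are hypothetical helpers written with plain lists, and unrated entries are treated as 0, as in the notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense rating rows."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict(train):
    """Similarity-weighted average of the other users' training ratings."""
    n_users, n_items = len(train), len(train[0])
    sim = [[cosine(train[a], train[b]) for b in range(n_users)] for a in range(n_users)]
    pred = [[0.0] * n_items for _ in range(n_users)]
    for a in range(n_users):
        # Exclude the user's own row so a rating never predicts itself.
        weight_sum = sum(sim[a][b] for b in range(n_users) if b != a)
        if weight_sum == 0.0:
            continue
        for j in range(n_items):
            weighted = sum(sim[a][b] * train[b][j] for b in range(n_users) if b != a)
            pred[a][j] = weighted / weight_sum
    return pred

train = [
    [2.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [0.0, 0.0, 3.0],
]
pred = predict(train)
print([round(x, 3) for x in pred[0]])
```

Dividing by the summed similarities keeps every prediction between the smallest and largest rating of the neighbours, which makes the error against held-out ratings easier to interpret.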

In [112]:
data_matrix_test[data_matrix_test.nonzero()]

array([3.169925  , 1.        , 1.        , ..., 1.        , 2.32192809,
       2.        ])

In [113]:
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error

## Evaluation metric

# MAE

In [114]:
def mae(prediction,groundtruth):
    ## evaluating only the positions where the ground truth holds a rating
    observed=groundtruth.nonzero()
    return mean_absolute_error(groundtruth[observed],prediction[observed])

In [115]:
mae(item_prediction,data_matrix_test)

0.5464233881933521

# MSE

In [116]:
def mse(prediction,groundtruth):
    ## evaluating only the positions where the ground truth holds a rating
    observed=groundtruth.nonzero()
    return mean_squared_error(groundtruth[observed],prediction[observed])

In [117]:
mse(item_prediction,data_matrix_test)

0.6802487692241399
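
Both metrics are computed only over the positions where the test matrix holds a rating. A plain-Python sketch of that masking with invented numbers (`masked_errors` is a hypothetical helper; the notebook uses scikit-learn's `mean_absolute_error` and `mean_squared_error`):

```python
def masked_errors(prediction, groundtruth):
    """Return (MAE, MSE) over the entries where groundtruth is non-zero."""
    diffs = [
        p - g
        for pred_row, true_row in zip(prediction, groundtruth)
        for p, g in zip(pred_row, true_row)
        if g != 0
    ]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    return mean_abs, mean_sq

truth = [[1.0, 0.0], [0.0, 2.0]]
preds = [[1.5, 0.7], [0.2, 1.0]]
mae_value, mse_value = masked_errors(preds, truth)
print(mae_value, mse_value)
```

The predictions 0.7 and 0.2 are ignored because no rating was observed at those positions, so only the errors 0.5 and 1.0 enter the averages.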

<h3 style="background-color:#F18C8D; padding:10px"> Get Recommended Item For Any User </h3>


In [121]:
def Recommendation_user_based(user_id,n):
    ## n is currently unused: the full ranked list is returned and callers slice it
    prediction_df = pd.DataFrame(item_prediction)
    ## take row user_id of the prediction matrix and sort it so the highest scores come first
    recommended_item_df = pd.DataFrame(prediction_df.iloc[user_id].sort_values(ascending=False))
    recommended_item_df.reset_index(inplace=True)
    recommended_item_df.columns = ['item_id', 'score']
    ## attach titles and descriptions to the scored items
    full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
    full_data[['item_Id','consumer_Id']]=full_data[['item_Id','consumer_Id']].astype('int')
    final_recommendation=pd.merge(recommended_item_df,full_data,left_on='item_id',right_on='item_Id')
    final_recommendation=final_recommendation[['item_Id','title','text_description','score']]
    ## min-max normalize the scores to the range [0, 1]
    final_recommendation['collaborative_score_normalized'] = (final_recommendation['score']-min(final_recommendation['score']))/(max(final_recommendation['score'])-min(final_recommendation['score']))
    ## after the merge an item appears once per interaction, so drop the repeated rows
    final_recommendation=final_recommendation[~final_recommendation.duplicated()]
    return final_recommendation

In [122]:
Recommendation_user_based(170,n_items)

Unnamed: 0,item_Id,title,text_description,score,collaborative_score_normalized
0,896,Platforms are the real powerhouses in Silicon Valley's business landscape,"One of the most important lessons that Silicon Valley learned, that gives it a strategic advantage, is to think bigger than products and business models: it builds platforms. The fastest growing and most disruptive companies in history - Google, Amazon, Uber, AirBnb, and eBay-aren't focused on selling products, they are building platforms. It goes beyond tech. Companies such as Walmart, Nike, ...",1.990190,1.000000
1,2901,Behind Facebook Messenger's plan to be an app platform,"The question is: Why? Why would you as a user want all this integration? Why not just download Uber and request a car that way? Meanwhile, why would a developer or a business want to bake their services into Messenger? Wouldn't they rather users get their apps instead? Lastly, why does Facebook want to add all of these features anyway, and potentially weigh it down with so many added complicat...",1.835141,0.922094
11,1796,[Free Course] Running Valuable Design Sprints - University of Virginia | Coursera,"About this course: Typically, clients and managers don't want to pay for design (or strategy) but want to rush toward results. Design sprints allow you to meet clients' desire for solutions--to develop sooner--and still adjust based on feedback. But not all sprints are alike. In this course, you'll learn how to run situation-appropriate sprints, whether it's testing user motivation, interface ...",1.612263,0.810105
16,405,Dial in with ease using the latest Google Calendar app for Android,Launch Details Release track: Launching to both Rapid release and Scheduled release Rollout pace: Full rollout (1-3 days for feature visibility) Impact: All end users Action: Change management suggested/FYI More Information Google Play Note: all launches are applicable to all Google Apps editions unless otherwise noted Launch release calendar Launch detail categories Get these product update a...,1.430228,0.718639
27,2818,Fooling The Machine,"Two groups, one at Berkeley University and another at Georgetown University, have successfully developed algorithms that can issue speech commands for digital personal assistants, like Siri and Google Now, in the form of bursts of sound unrecognizable to human ears. To a human, these commands just sound like random white noise, but they could be used to tell a voice-activated assistant like Am...",1.298524,0.652463
...,...,...,...,...,...
25275,1477,10 Free Screen Readers For Blind Or Visually Impaired Users - Usability Geek,"It is not difficult for a sighted person to imagine how being blind or visually impaired could make using a computer difficult. Just close your eyes and you will instantly experience that even processing text is impossible - or impossible without additional software at least. Now a range of software is available that can help to make using a computer an easier, more enjoyable and more producti...",0.000000,0.000000
25285,1493,"9 trends you need to watch at CES 2017, from 'AI' assistants to 'AR' devices","Great things happen rarely: Santa drops by only once a year, we eat stuffing on just a single day in November ... heck, we only have the opportunity to make America great again once every four years. And likewise, the gadget paradise CES comes to Las Vegas every January for just four short days. Fortunately, the stuff unveiled there continues to dazzle and delight us for the rest of the year. ...",0.000000,0.000000
25295,1488,Defining The Sharing Economy: What Is Collaborative Consumption--And What Isn't?,"This year, the term ""sharing economy"" was introduced into the Oxford English Dictionary, proof-not that we need it-that the sharing economy as an idea is here to stay. But what's happened along the way is a fracturing of the understanding of what the sharing economy actually is, and what it is not. The picture is growing increasingly confusing, and it's a problem. Many terms are being used to ...",0.000000,0.000000
25296,1480,Java 8 Streams - A Deeper Approach About Performance Improvement,"Introduction Java 8 was released almost three years ago, but it still lacks articles with deeper approach through Stream API. There are some good articles about it, but not a single one showing a real world example and comparing its performance against Java 7 style of coding. This article assumes that the reader already has some knowledge about Stream API, so many simple code will not be expla...",0.000000,0.000000


# Precision@k for Item Based

In [123]:
id=89

In [124]:
relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==id]['item_Id'].tolist()
relevant_set

[257, 795, 1306, 1358, 1453, 1631, 1639]

In [125]:
pred_set=Recommendation_user_based(id,n_items)['item_Id'][:10].tolist()
pred_set

[1689, 2818, 1618, 405, 1910, 846, 1637, 1692, 1357, 1452]

In [126]:
set(relevant_set) & set(pred_set)

set()

In [127]:
len(set(relevant_set) & set(pred_set))

0

In [128]:
precession_at_10=len(set(relevant_set) & set(pred_set))/10
precession_at_10

0.0
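
The cells above compute precision@10 by hand for one user. The same calculation can be wrapped in a small helper; this is a sketch with made-up item ids, and `precision_at_k` is a hypothetical name rather than a function defined in the notebook.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommended items that also appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    hits = len(set(top_k) & set(relevant))
    return hits / k

# Made-up ids: two of the five recommended items are relevant.
relevant_items = [10, 20, 30]
recommended_items = [99, 20, 77, 10, 55]
print(precision_at_k(recommended_items, relevant_items, k=5))
```

Passing `k` explicitly keeps the slice length and the denominator in sync.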

# Global Precision

In [127]:
precession=[]
gobal_precession=0
for user_index in range(0,n_users):
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    ## this loop scores the top 20 recommendations, so the per-user value is precision@20
    pred_set=Recommendation_user_based(user_index,n_items)['item_Id'][:20].tolist()
    precision_at_20=len(set(relevant_set) & set(pred_set))/20
    precession.append(precision_at_20)
    ## running sum over all users; divide by n_users to obtain the mean precision@20
    gobal_precession=gobal_precession+precision_at_20
    

In [128]:
gobal_precession

13.600000000000023
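
The value above is the sum of the per-user precisions, so it grows with the number of users. To compare models it is usually easier to report the mean. A minimal sketch on made-up dictionaries follows; `mean_precision_at_k`, `toy_recommendations` and `toy_interactions` are assumptions for illustration.

```python
def mean_precision_at_k(recommendations, interactions, k):
    """Average precision@k over the users present in `recommendations`."""
    per_user = []
    for user, ranked_items in recommendations.items():
        relevant = set(interactions.get(user, []))
        hits = len(set(ranked_items[:k]) & relevant)
        per_user.append(hits / k)
    return sum(per_user) / len(per_user) if per_user else 0.0

# Made-up data: user 1 has 2 hits in the top 4, user 2 has none.
toy_recommendations = {1: [5, 7, 9, 11], 2: [2, 4, 6, 8]}
toy_interactions = {1: [7, 11, 13], 2: [1, 3]}
print(mean_precision_at_k(toy_recommendations, toy_interactions, k=4))
```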

<h1 style="background-color:#F18C8D; padding:10px"> Alternating Least Squares</h1>

<div>
<img src="images/als.jpeg" width="600"/>
</div>
<p>Alternating Least Squares (ALS) is also a matrix factorization algorithm, and its updates lend themselves to parallel execution. ALS is built for large-scale collaborative filtering problems: it copes well with the sparsity of the ratings data, is simple to implement, and scales to very large datasets.</p>
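
To make the alternating idea concrete, here is a minimal NumPy sketch of ALS on a small dense matrix. It is the plain explicit-feedback variant in which every cell, including zeros, is treated as observed; the `implicit` library used below fits a confidence-weighted implicit-feedback model instead. The function name, toy ratings and hyperparameters are assumptions.

```python
import numpy as np

def als_explicit(ratings, n_factors=2, reg=0.1, n_iters=10, seed=0):
    """Factor `ratings` as user_f @ item_f.T by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    user_f = rng.normal(scale=0.1, size=(n_users, n_factors))
    item_f = rng.normal(scale=0.1, size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)
    losses = []
    for _ in range(n_iters):
        # Hold the item factors fixed and solve a ridge regression for every user.
        user_f = np.linalg.solve(item_f.T @ item_f + ridge, item_f.T @ ratings.T).T
        # Hold the user factors fixed and solve a ridge regression for every item.
        item_f = np.linalg.solve(user_f.T @ user_f + ridge, user_f.T @ ratings).T
        # Regularized squared error, recorded once per full iteration.
        error = np.sum((ratings - user_f @ item_f.T) ** 2)
        penalty = reg * (np.sum(user_f ** 2) + np.sum(item_f ** 2))
        losses.append(float(error + penalty))
    return user_f, item_f, losses

# Made-up 4 users x 3 items matrix.
toy_ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 1.0],
    [1.0, 1.0, 5.0],
    [0.0, 1.0, 4.0],
])
toy_user_f, toy_item_f, toy_losses = als_explicit(toy_ratings)
print(toy_losses[0], toy_losses[-1])
```

Each half-step has a closed-form solution, and the rows of each factor matrix can be solved independently, which is what makes the method easy to parallelize.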

In [129]:
consumer_interaction.head()

Unnamed: 0,consumer_id,item_id,Rating,item_Id,consumer_Id
0,-9223121837663643404,-8949113594875411859,1.0,66,1
1,-9223121837663643404,-8377626164558006982,1.0,161,1
2,-9223121837663643404,-8208801367848627943,1.0,189,1
3,-9223121837663643404,-8187220755213888616,1.0,197,1
4,-9223121837663643404,-7423191370472335463,3.169925,314,1


In [130]:
consumer_interaction.shape[0]

39106

## Create Sparse User Item Matrix

In [131]:
from scipy.sparse import csr_matrix

In [132]:
alpha=80

In [133]:
## creating the CSR matrix: every observed interaction gets the same confidence value alpha
sparse_user_item=csr_matrix(([alpha]*consumer_interaction.shape[0],(consumer_interaction['consumer_Id'],consumer_interaction['item_Id'])))

In [134]:
sparse_user_item

<1141x2985 sparse matrix of type '<class 'numpy.intc'>'
	with 39106 stored elements in Compressed Sparse Row format>

## Convert To Array

In [135]:
csr_user_array=sparse_user_item.toarray()
csr_user_array

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int32)

In [136]:
csr_user_array.shape

(1141, 2985)

In [137]:
csr_user_array[20]

array([ 0,  0,  0, ...,  0, 80,  0], dtype=int32)

### The CSR (Compressed Sparse Row) matrix stores only the non-zero entries, which here all have the value 80 (alpha).
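
A small sketch of the same construction on made-up indices shows how `csr_matrix((data, (rows, cols)))` places one value per interaction and how the transpose yields the item-user view. The toy indices and the explicit `shape` argument are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up interactions: (user 0, item 1), (user 0, item 3), (user 2, item 0).
users = np.array([0, 0, 2])
items = np.array([1, 3, 0])
alpha = 80
confidence = np.full(len(users), alpha)

toy_user_item = csr_matrix((confidence, (users, items)), shape=(3, 4))
toy_item_user = toy_user_item.T.tocsr()

print(toy_user_item.nnz)        # number of stored (non-zero) entries
print(toy_user_item.toarray())  # dense view, zeros filled in
print(toy_item_user.shape)
```

Only three values are stored, however large the dense shape is, which is why the sparse format suits interaction data.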

In [138]:
print(sparse_user_item)

  (1, 66)	80
  (1, 161)	80
  (1, 189)	80
  (1, 197)	80
  (1, 314)	80
  (1, 328)	80
  (1, 386)	80
  (1, 417)	80
  (1, 443)	80
  (1, 451)	80
  (1, 453)	80
  (1, 465)	80
  (1, 570)	80
  (1, 702)	80
  (1, 769)	80
  (1, 813)	80
  (1, 819)	80
  (1, 864)	80
  (1, 890)	80
  (1, 1087)	80
  (1, 1358)	80
  (1, 1390)	80
  (1, 1411)	80
  (1, 1617)	80
  (1, 1631)	80
  :	:
  (1140, 1975)	80
  (1140, 2021)	80
  (1140, 2025)	80
  (1140, 2069)	80
  (1140, 2078)	80
  (1140, 2188)	80
  (1140, 2251)	80
  (1140, 2332)	80
  (1140, 2347)	80
  (1140, 2449)	80
  (1140, 2468)	80
  (1140, 2536)	80
  (1140, 2606)	80
  (1140, 2674)	80
  (1140, 2694)	80
  (1140, 2717)	80
  (1140, 2770)	80
  (1140, 2807)	80
  (1140, 2812)	80
  (1140, 2843)	80
  (1140, 2859)	80
  (1140, 2865)	80
  (1140, 2920)	80
  (1140, 2923)	80
  (1140, 2979)	80


## Create Item-User Sparse Matrix

In [139]:
sparse_item_user=sparse_user_item.T.tocsr()
sparse_item_user

<2985x1141 sparse matrix of type '<class 'numpy.intc'>'
	with 39106 stored elements in Compressed Sparse Row format>

In [140]:
## converting into array
csr_item_array=sparse_item_user.toarray()

In [141]:
csr_item_array.shape

(2985, 1141)

In [142]:
print(sparse_item_user)

  (1, 146)	80
  (1, 208)	80
  (1, 497)	80
  (1, 669)	80
  (1, 1120)	80
  (2, 3)	80
  (2, 134)	80
  (2, 349)	80
  (2, 556)	80
  (2, 673)	80
  (2, 723)	80
  (2, 891)	80
  (2, 911)	80
  (3, 29)	80
  (3, 101)	80
  (3, 128)	80
  (3, 209)	80
  (3, 390)	80
  (3, 400)	80
  (3, 491)	80
  (3, 538)	80
  (3, 554)	80
  (3, 591)	80
  (3, 691)	80
  (3, 732)	80
  :	:
  (2983, 477)	80
  (2983, 540)	80
  (2983, 574)	80
  (2983, 632)	80
  (2983, 667)	80
  (2983, 715)	80
  (2983, 734)	80
  (2983, 744)	80
  (2983, 777)	80
  (2983, 788)	80
  (2983, 794)	80
  (2983, 821)	80
  (2983, 836)	80
  (2983, 837)	80
  (2983, 908)	80
  (2983, 944)	80
  (2983, 1032)	80
  (2983, 1095)	80
  (2983, 1123)	80
  (2983, 1135)	80
  (2984, 108)	80
  (2984, 363)	80
  (2984, 591)	80
  (2984, 724)	80
  (2984, 944)	80


## Create Test,Train Data

In [143]:
from implicit.evaluation import train_test_split

In [144]:
## splitting the data into train and test sets
train,test=train_test_split(sparse_user_item,
                train_percentage=0.8,
                random_state=42)

In [145]:
train

<1141x2985 sparse matrix of type '<class 'numpy.intc'>'
	with 31327 stored elements in Compressed Sparse Row format>

In [146]:
test

<1141x2985 sparse matrix of type '<class 'numpy.intc'>'
	with 7779 stored elements in Compressed Sparse Row format>
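
The two matrices above keep the original shape and together hold all 39106 stored interactions (31327 + 7779). The sketch below shows one way such a split can be done by hand with SciPy; it is not the `implicit` implementation, and `split_interactions` and the toy matrix are assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def split_interactions(matrix, train_fraction=0.8, seed=42):
    """Randomly assign each stored entry to a train or a test matrix of the same shape."""
    coo = coo_matrix(matrix)
    rng = np.random.default_rng(seed)
    in_train = rng.random(coo.nnz) < train_fraction

    def build(mask):
        # Rebuild a CSR matrix from the selected entries only.
        return csr_matrix((coo.data[mask], (coo.row[mask], coo.col[mask])), shape=coo.shape)

    return build(in_train), build(~in_train)

# Made-up 5 x 8 matrix with the value 80 wherever the cell index is divisible by 3.
toy_dense = (np.arange(40).reshape(5, 8) % 3 == 0).astype(np.int32) * 80
toy_matrix = csr_matrix(toy_dense)
toy_train, toy_test = split_interactions(toy_matrix)
print(toy_train.nnz, toy_test.nnz, toy_matrix.nnz)
```

Because every entry lands in exactly one of the two matrices, adding them back together reproduces the original.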

## Building ALS Model

In [147]:
import implicit

In [148]:
model = implicit.als.AlternatingLeastSquares(factors=100, regularization=0.1, iterations=20, calculate_training_loss=False)


Intel MKL BLAS detected. Its highly recommend to set the environment variable 'export MKL_NUM_THREADS=1' to disable its internal multithreading



In [149]:
model

<implicit.cpu.als.AlternatingLeastSquares at 0x2265cd4ccd0>

In [150]:
train

<1141x2985 sparse matrix of type '<class 'numpy.intc'>'
	with 31327 stored elements in Compressed Sparse Row format>

In [151]:
model.fit(train)


  0%|          | 0/20 [00:00<?, ?it/s]

In [152]:
model

<implicit.cpu.als.AlternatingLeastSquares at 0x2265cd4ccd0>

In [156]:
sparse_user_item.shape

(1141, 2985)

<div>
<img src="images/als.png" width="600"/>
</div>

<h1 style="background-color:#F18C8D; padding:10px"> Recommendation Using ALS</h1>

In [157]:
def Recommmendation_Als(user_id,top_recommendation):
    n_items=2984
    ## Ask the ALS model for its n_items highest-scoring articles for one user.
    ## NOTE: the rows of sparse_user_item are indexed by consumer_Id, so user_id-1 addresses the row of consumer user_id-1.
    similar = model.recommend(user_id-1,sparse_user_item[user_id-1], N=n_items)
    ## Convert the result into a DataFrame and transpose it so that ids and scores become columns
    similar_df = pd.DataFrame(similar).transpose()
    ## Rename the columns
    similar_df.columns=['Article_id', 'score']
    full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
    ## Attach titles and descriptions to the scored articles
    merged_similar = pd.merge(similar_df, full_data, how='left', left_on='Article_id',right_on='item_Id')[['Article_id','title','text_description','score']]
    ## Min-max normalize the scores.
    ## NOTE: in the output below every normalized score is 1.0, which suggests that the minimum score is an extreme value; inspect merged_similar['score'].min() before relying on this column.
    merged_similar['Als_score_normalized'] = (merged_similar['score']-min(merged_similar['score']))/(max(merged_similar['score'])-min(merged_similar['score']))
    ## Drop duplicate rows and rows without article metadata
    merged_similar=merged_similar.drop_duplicates().dropna()
    return merged_similar[:top_recommendation+1]
    


In [158]:
Recommmendation_Als(170,10)

Unnamed: 0,Article_id,title,text_description,score,Als_score_normalized
0,298.0,My favorite people and resources to learn Android programming from,"Twitter I've really enjoyed following these Android community members on Twitter. These folks aren't just knowledgeable teachers and key open-source contributors. They're also positive-minded, hopeful, and friendly. Those qualities are just as important to me as being an expert in the area. Chiu-Ki Chan - A devoted learner and teacher, Chiu-Ki does it all. She interviews folks , runs 360...",0.338424,1.0
60,545.0,Microservices: Real Architectural Patterns,"Microservices: Real Architectural Patterns A dissection of our favorite folk architecture Introduction I'm fascinated by the lore and mystery behind microservices. As a concept, microservices feels like one of the most interesting folk architectures of the modern era. It's useful enough to be applied widely across different usage patterns and also vague enough to mean many different things. I'...",0.330099,1.0
83,1279.0,PHP 7 Support in PhpStorm 2016.2,"PHP 7 is gaining traction, and we've been working hard to try and make PhpStorm 2016.2 the best tool around for working with PHP 7. This is something I'm really excited about. One of my biggest problems with PhpStorm's PHP 7 support was the lack of inspections around scalar type hints and return type hints. HHVM/Hack ships with a static analysis tool that makes working with its type hint syste...",0.310272,1.0
93,396.0,Best of CES 2017: The Show's 10 Coolest Designs,"For a design lover, CES can be a-how should we put this-challenging experience. Several hundred vendors from around the world fill not one, but two convention centers on the Las Vegas Strip with every gadget you can imagine-and a few you can't. (Like, say, a dental floss dispenser with a microchip. Seriously.) The sheer quantity of stuff on show can make the search for compelling designs feel ...",0.300053,1.0
108,1885.0,10 Modern Software Over-Engineering Mistakes,"10 Modern Software Over-Engineering Mistakes Few things are guaranteed to increase all the time: Distance between stars, Entropy in the visible universe, and Fucking business requirements . Many articles say Dont over-engineer but don't say why or how. Here are 10 clear examples. Important Note: Some points below like ""Don't abuse generics"" are being misunderstood as ""Don't use generics at all...",0.29172,1.0
223,1910.0,You don't talk about refactoring club,The first rule of Fight Club is: You do not talk about Fight Club. The second rule of Fight Club is: You do not talk about Fight Club. I guess the same could be said about refactoring. That would first requires to define what I mean by refactoring in the context of this post : Refactoring is any action on the codebase that improves quality. Which in turn requires to define what is quality. Eve...,0.270455,1.0
278,847.0,Former Google career coach shares a visual trick for figuring out what to do with your life,"If you want 2017 to be an exciting year, design it that way. That's the advice of former Google career coach and job strategist Jenny Blake , who has helped more than a thousand people improve their work lives. She recommends creating a ""mind map,"" a visual diagram of your interests and goals. Drawing one doesn't take long and could help you figure out the next project, hobby or career change ...",0.262629,1.0
488,2443.0,Visual Thinking and Learning 3.0 working together at Walmart.com | Happy Melly,"As usual, three Scrum Masters and I were drinking a coffee and talking about our daily grind at Walmart.com , when a common problem emerged during our dialogue: How to improve our grooming meetings? In my opinion the grooming meeting is the most important factor in a Sprint. Do it well and your team will know exactly what they have to do, with a low level of uncertainty. However, if the meetin...",0.248183,1.0
507,2256.0,"Netflix says Geography, Age, and Gender are ""Garbage"" for Predicting Taste","Netflix rolled out to 130 new countries earlier this year, and you might expect that it began carefully tailoring its offerings for each of them, or at least for various regions. But as a new Wired feature reveals, that couldn't be further from the truth-Netflix uses one predictive algorithm worldwide, and it treats demographic data as almost irrelevant. Get Data Sheet , Fortune 's technology ...",0.236153,1.0
537,2983.0,Angular 2,"Welcome to the Angular 2 Style Guide Purpose If you are looking for an opinionated style guide for syntax, conventions, and structuring Angular applications, then step right in. The purpose of this style guide is to provide guidance on building Angular applications by showing the conventions we use and, more importantly, why we choose them. Style Vocabulary Each guideline describes either a go...",0.234718,1.0
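
Every row in the table above has `Als_score_normalized` equal to 1.0 even though the raw scores differ. One possible explanation, which the notebook does not confirm, is that the score column contains an extremely negative value (for example a placeholder for filtered items), so the min-max range dwarfs the real differences. The sketch below reproduces that effect on made-up numbers; `sentinel` is a hypothetical value.

```python
import numpy as np

def min_max(values):
    """Scale values to [0, 1]; returns zeros when all values are equal."""
    values = np.asarray(values, dtype=np.float64)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span

scores = np.array([0.34, 0.33, 0.31, 0.30])  # made-up raw scores
sentinel = -3.4e38                           # hypothetical placeholder score
with_sentinel = np.append(scores, sentinel)

print(min_max(with_sentinel)[:4])  # real scores collapse to values at or near 1.0
print(min_max(scores))             # normalizing the real scores alone keeps the ranking visible
```

If this is what happens in the notebook, filtering out such placeholder rows before normalizing would restore a meaningful ALS score.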


# Global Precision for ALS

In [162]:
precession_als=[]
gobal_precession_als=0
for user_index in range(1,n_users):
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    pred_set=Recommmendation_Als(user_index,10)['Article_id'][:10].tolist()
    precession_at_10=len(set(relevant_set) & set(pred_set))/10
    precession_als.append(precession_at_10)
    gobal_precession_als=gobal_precession_als+precession_at_10
    

In [164]:
gobal_precession_als

37.100000000000136

# Precision@k

In [163]:
precession_als

[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.5,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.1,
 0.1,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.1,
 0.1,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.1,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.6,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.1,
 0.1,
 0.0

<div>
<img src="images/hybrid.png" width="600"/>
</div>

<h1 style="background-color:#F18C8D; padding:10px"> Hybrid Recommendation System using ALS+Item-Based </h1>

In [159]:
def Hybird_model_als_item_based(user_id,top_top_recommendation):
    n_items=2984

    ## Call the Recommendation_user_based function
    item_recommendation=Recommendation_user_based(user_id,n_items)
    ## Call the ALS-based function
    als_recommendation=Recommmendation_Als(user_id,n_items)
    ## Merge the two result sets on the article id
    als_item_based=pd.merge(item_recommendation,als_recommendation,left_on='item_Id',right_on='Article_id')

    ## The final score is the average of the two normalized scores
    als_item_based['Final_Score']=(als_item_based['collaborative_score_normalized']+als_item_based['Als_score_normalized'])/2
    ## Select the columns
    als_item_based=als_item_based[['Article_id','title_x','text_description_x','Final_Score']]

    ## Rename the columns (assigning the result avoids modifying a slice in place)
    als_item_based=als_item_based.rename(columns={'title_x':'Title','text_description_x':'Text Description'})
#     ## Optional: remove the articles the user has already seen, read or watched
#     full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
#     seen_articles_by_user=list(full_data[full_data['consumer_Id']==user_id]['item_Id'])
#     als_item_based=als_item_based[~(als_item_based['Article_id'].isin(seen_articles_by_user))]
    ## Sort by the final score; sort_values returns a new DataFrame, so the result must be assigned
    als_item_based=als_item_based.sort_values(by='Final_Score',ascending=False)
    return als_item_based[:top_top_recommendation+1]

In [166]:
Hybird_model_als_item_based(170,10)

Unnamed: 0,Article_id,Title,Text Description,Final_Score
0,896.0,Platforms are the real powerhouses in Silicon Valley's business landscape,"One of the most important lessons that Silicon Valley learned, that gives it a strategic advantage, is to think bigger than products and business models: it builds platforms. The fastest growing and most disruptive companies in history - Google, Amazon, Uber, AirBnb, and eBay-aren't focused on selling products, they are building platforms. It goes beyond tech. Companies such as Walmart, Nike, ...",1.0
1,2901.0,Behind Facebook Messenger's plan to be an app platform,"The question is: Why? Why would you as a user want all this integration? Why not just download Uber and request a car that way? Meanwhile, why would a developer or a business want to bake their services into Messenger? Wouldn't they rather users get their apps instead? Lastly, why does Facebook want to add all of these features anyway, and potentially weigh it down with so many added complicat...",0.961047
2,1796.0,[Free Course] Running Valuable Design Sprints - University of Virginia | Coursera,"About this course: Typically, clients and managers don't want to pay for design (or strategy) but want to rush toward results. Design sprints allow you to meet clients' desire for solutions--to develop sooner--and still adjust based on feedback. But not all sprints are alike. In this course, you'll learn how to run situation-appropriate sprints, whether it's testing user motivation, interface ...",0.905053
3,405.0,Dial in with ease using the latest Google Calendar app for Android,Launch Details Release track: Launching to both Rapid release and Scheduled release Rollout pace: Full rollout (1-3 days for feature visibility) Impact: All end users Action: Change management suggested/FYI More Information Google Play Note: all launches are applicable to all Google Apps editions unless otherwise noted Launch release calendar Launch detail categories Get these product update a...,0.85932
4,2818.0,Fooling The Machine,"Two groups, one at Berkeley University and another at Georgetown University, have successfully developed algorithms that can issue speech commands for digital personal assistants, like Siri and Google Now, in the form of bursts of sound unrecognizable to human ears. To a human, these commands just sound like random white noise, but they could be used to tell a voice-activated assistant like Am...",0.826231
5,846.0,Blaise Agüera y Arcas: How computers are learning to be creative,"You have JavaScript disabled We're on the edge of a new frontier in art and creativity - and it's not human. Blaise Agüera y Arcas, principal scientist at Google, works with deep neural networks for machine perception and distributed learning. In this captivating demo, he shows how neural nets trained to recognize images can be run in reverse, to generate them. The results: spectacular, halluc...",0.807004
6,2516.0,14 Cool Drupal 8 modules for site builders | August 2016,Here's what struck me last month about Drupal modules. This month I have chosen to focus on Drupal 8 as this is slowly becoming the go-to version - partly because many required modules have been migrated and grown up. 1. Require Login Make sure all your visitors on your Drupal website are obligated to log in. Handy for a Drupal intranet or social communities. It is possible to exclude certain ...,0.806585
7,2711.0,A digital crack in banking's business model,"Low-cost attackers are targeting customers in lucrative parts of the sector. The rise of digital innovators in financial services presents a significant threat to the traditional business models of retail banks. Historically, they have generated value by combining different businesses, such as financing, investing, and transactions, which serve their customers' broad financial needs over the l...",0.799227
8,1692.0,Making your business fit for the future,"How do you shift the working habits of thousands of employees to ensure they are fit for the future? That was the task facing PwC in 2015. Here, Chief Digital Officer John Riccio talks about the experience of implementing large-scale digital transformation. Six months ago, we began the process of moving PwC Australia's collaboration tools to the cloud. We would be using Google's Apps for Work ...",0.79151
9,2888.0,How to use Windows 10's Task View and virtual desktops,"Windows 10 brings a lot of great features to the PC, but one that power users are greeting with an exasperated "" finally "" is virtual desktops. This longstanding productivity powerhouse has long been standard on OS X and Linux distributions. Windows has actually supported the feature for a while despite not making virtual desktops available natively, but now the feature is going mainstream as ...",0.787347
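
The hybrid score above is the plain average of two normalized scores for the articles that both models return. A minimal pandas sketch of that blending step, with an optional weight, is shown below; `blend_scores`, the column names and the toy frames are assumptions, not the notebook's own code.

```python
import pandas as pd

def blend_scores(left, right, key, left_col, right_col, weight=0.5):
    """Inner-join two score tables and combine their scores as a weighted average."""
    merged = pd.merge(left[[key, left_col]], right[[key, right_col]], on=key, how="inner")
    merged["final_score"] = weight * merged[left_col] + (1 - weight) * merged[right_col]
    # sort_values returns a new frame, so the sorted result is what gets returned.
    return merged.sort_values("final_score", ascending=False).reset_index(drop=True)

# Made-up scores: only articles 2 and 3 appear in both tables.
collab = pd.DataFrame({"article_id": [1, 2, 3], "collab_score": [1.0, 0.5, 0.0]})
als = pd.DataFrame({"article_id": [2, 3, 4], "als_score": [0.2, 1.0, 0.6]})

blended = blend_scores(collab, als, "article_id", "collab_score", "als_score")
print(blended)
```

With `weight=0.5` this reduces to the simple average; other weights let one model dominate when it is known to be more reliable.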


# Global Precision for ALS-ITEM

In [167]:
precession_als_item=[]
gobal_precession_als_item=0
for user_index in range(1,n_users):
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    pred_set=Hybird_model_als_item_based(user_index,10)['Article_id'][:10].tolist()
    precession_at_10=len(set(relevant_set) & set(pred_set))/10
    precession_als_item.append(precession_at_10)
    gobal_precession_als_item=gobal_precession_als_item+precession_at_10


In [169]:
gobal_precession_als_item

12.39999999999998

# Precision@k

In [170]:
precession_als_item

[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.2,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.1,
 0.0,
 0.1,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0

<h1 style="background-color:#F18C8D; padding:10px"> Hybrid Recommendation System using ALS+Content-Based </h1>

In [160]:
def recommendation_read_by_user(user_id):
    ## Getting the full data
    full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')

    ## text descriptions of the articles the user has read, liked or watched
    texts=platform_content[platform_content['item_id'].isin(list(full_data[full_data['consumer_Id']==user_id]['item_id']))]['text_description']

    ## content-based recommendations for each description
    score_for_each_user=[]
    for text in texts:
        df=recommedation_on_content_based(text)
        score_for_each_user.append(df)
    ## suffix each frame's columns with its position so that the frames can be joined on the item id
    dfs=[score_for_each_user[i].add_suffix(str(i)).set_index('item_id'+str(i)) for i in range(len(score_for_each_user))]

    ## join the recommendations for all the articles the user has seen or liked
    combined_df=dfs[0].join(dfs[1:])
    combined_df.reset_index(inplace=True)
    total=0

    ## sum the scores over all the articles (after min-max normalization this gives the same values as the average)
    for i in range(len(score_for_each_user)):
        total=total+combined_df['normalized_content_score'+str(i)]
    combined_df['Score']=total

    ## normalize the final score
    combined_df['normalized_content_score']=(combined_df['Score']-min(combined_df['Score']))/(max(combined_df['Score'])-min(combined_df['Score']))
    return combined_df[['item_id0','Article_id0','title0','text_description0','normalized_content_score']]
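
The function above collects one content-based score table per article the user has read, adds the scores up per candidate item, and normalizes the total. The same aggregation can be sketched with `concat` and `groupby`, which avoids the suffixed columns; the helper name, column names and toy frames below are assumptions.

```python
import pandas as pd

def aggregate_content_scores(per_article_frames, key="item_id", score="score"):
    """Sum each candidate's score over all frames, then min-max normalize the totals."""
    stacked = pd.concat(per_article_frames, ignore_index=True)
    total = stacked.groupby(key, as_index=False)[score].sum()
    low, high = total[score].min(), total[score].max()
    span = high - low
    total["normalized_score"] = 0.0 if span == 0 else (total[score] - low) / span
    return total.sort_values("normalized_score", ascending=False).reset_index(drop=True)

# Made-up similarity scores of three candidates against two articles the user has read.
frame_a = pd.DataFrame({"item_id": [10, 20, 30], "score": [0.9, 0.1, 0.5]})
frame_b = pd.DataFrame({"item_id": [10, 20, 30], "score": [0.3, 0.1, 0.9]})

aggregated = aggregate_content_scores([frame_a, frame_b])
print(aggregated)
```

Dividing the totals by the number of frames before normalizing would not change the result, because min-max scaling is unaffected by a constant positive factor.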

In [161]:
def Hybrid_model_als_content_based(user_id,top_recommendation):
        n_items=2984

        content_recommendation=recommendation_read_by_user(user_id)
        als_recommendation=Recommmendation_Als(user_id,n_items)
        als_content_based=pd.merge(content_recommendation,als_recommendation,left_on='Article_id0',right_on='Article_id')
        als_content_based['Final_Score']=(als_content_based['normalized_content_score']+als_content_based['Als_score_normalized'])/2

        als_content_based=als_content_based[['Article_id','item_id0','title','text_description','Final_Score']]
        ## Getting the full data
    #     full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
    #     ## For a particular user you are recommending the articles w which  are seen or read,watched  by him 
    #     seen_articles_by_user=list(full_data[full_data['consumer_Id']==user_id]['item_Id'])
    #     ## Removing the articles which are already seen or read by th user
    #     als_content_based=als_content_based[~(als_content_based['Article_id'].isin(seen_articles_by_user))]


        als_content_based=als_content_based.sort_values(by='Final_Score',ascending=False)


        return als_content_based[:top_recommendation]
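
The blending step in this function (inner merge on the article id, then the mean of the two normalized scores) can be sketched in isolation. The names `blend_scores`, `content_score` and `cf_score` are hypothetical stand-ins for the notebook's columns, and the data is made up.

```python
import pandas as pd

def blend_scores(content_df, cf_df, key='Article_id'):
    """Average two normalized scores for items present in both frames."""
    merged = pd.merge(content_df, cf_df, on=key, how='inner')
    merged['Final_Score'] = (merged['content_score'] + merged['cf_score']) / 2
    return merged.sort_values('Final_Score', ascending=False).reset_index(drop=True)

# Toy inputs: article 1 has no collaborative score, article 4 has no content score
content = pd.DataFrame({'Article_id': [1, 2, 3], 'content_score': [1.0, 0.5, 0.0]})
collaborative = pd.DataFrame({'Article_id': [2, 3, 4], 'cf_score': [1.0, 0.5, 0.25]})
blended = blend_scores(content, collaborative)
```

An inner merge silently drops any article scored by only one of the two models, which is worth keeping in mind when the candidate sets differ.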

In [162]:
Hybrid_model_als_content_based(17,10)

Unnamed: 0,Article_id,item_id0,title,text_description,Final_Score
0,836.0,-4102297002729307038,Acquia's Dev Desktop - a Drupal server for beginners,"Web development can be complicated. When it comes to contributing to Drupal, either on one's own time or as part of a job, one of the biggest problems beginners have is getting all of the tools working correctly. It can also be a source of frustration to people with plenty of experience too - maybe they would prefer to put time into writing witty blog posts, designing the perfect rounded corne...",1.0
564,1343.0,-820343972901090172,Stop saying learning to code is easy.,"Stop saying learning to code is easy. I saw this tweet after the Apple WWDC keynote and had thought the same thing. Hang on, programming is hard. Rewarding, sure. Interesting, totally. But ""easy"" sets folks up for failure and a lifetime of self-doubt. When we tell folks - kids or otherwise - that programming is easy , what will they think when it gets difficult? And it will get difficult. That...",0.940545
1091,1430.0,-324861070284719256,12 Best Slack Communities for Every Professional,"If your company isn't using Slack, chances are, the company next door is. In February, the team communication tool had 500,000-plus daily users-making it the fastest-growing business app ever . But Slack isn't just dominating the cubicle. Recently, people have been using the platform to found and maintain ""digital communities: "" forums for collaborating and connecting with like-minded professi...",0.936486
1010,2727.0,7606744605433799135,Why the customer experience matters,"Truly understanding customer needs may help companies improve not only the buying experience but also their bottom line. A company's relationship with its customers is about much more than improving product ratings or decreasing wait times. Understanding the customer journey is about learning what customers experience from the moment they begin considering a purchase, and then working to make ...",0.917036
982,2355.0,5338677278233757627,How to Get a Job at Google,"MOUNTAIN VIEW, Calif. - LAST June, in an interview with Adam Bryant of The Times, Laszlo Bock, the senior vice president of people operations for Google - i.e., the guy in charge of hiring for one of the world's most successful companies - noted that Google had determined that ""G.P.A.'s are worthless as a criteria for hiring, and test scores are worthless. ... We found that they don't predict ...",0.914331
16,1208.0,-1630229587164086350,Dries Buytaert,"The one big question I get asked over and over these days is: ""How is Drupal 8 doing?"". It's understandable. Drupal 8 is the first new version of Drupal in five years and represents a significant rethinking of Drupal. So how is Drupal 8 doing? With less than half a year since Drupal 8 was released , I'm happy to answer: outstanding! As of late March, Drupal.org counted over 60,000 Drupal 8 sit...",0.913015
650,2438.0,5928346445655989915,[Tools] A Tool for tracking Kanban projects (that you can cut out and keep) - Emily Webber,"I have been looking around for the right digital tool that will support my physical project wall. I want something that allows me easily to back up information for reference, but more importantly outputs the useful metrics that are important for continuous improvement. A lot of people I know use Trello , which is not bad as a tool that creates a visual representation of the cards, but it doesn...",0.905601
1634,2873.0,8586403905004879205,How to augment your career with leadership coaching,"Burnout . Pipeline problems . Harassment . The laundry list of issues for women's leadership in tech is long and the list of solutions seems much shorter. One of the popular pieces of advice for women in tech looking to advance their careers is to find a mentor. Sheryl Sandberg, in her book Lean In , talks about her mentors who encouraged her, challenged her, and advocated for her throughout h...",0.899655
383,2914.0,8828175072897143018,"The code that took America to the moon was just published to GitHub, and it's like a 1960s time capsule","When programmers at the MIT Instrumentation Laboratory set out to develop the flight software for the Apollo 11 space program in the mid-1960s, the necessary technology did not exist. They had to invent it. They came up with a new way to store computer programs, called ""rope memory,"" and created a special version of the assembly programming language. Assembly itself is obscure to many of today...",0.891791
1124,2092.0,3772685586944428188,Can user-centered design steer you wrong?,"It's not unusual for us to operate under the assumption that different people perform different tasks... well... differently . The personas we craft to represent our users help us remember our audience - their preferences, their demographics, and their technical knowledge. But what if personas aren't the answer to everything? Recently, Comrade collaborated on a study that examined the adoption...",0.89111


In [164]:
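full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
## users in 1..n_users that have no interaction in the merged data
## (the range now includes n_users, matching the next cell)
seen_users=set(full_data['consumer_Id'])
user_ids=[i for i in range(1,n_users+1) if i not in seen_users]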

In [166]:
## users that have at least one interaction: all ids minus the ids collected in user_ids
total_user_id=list(range(1,n_users+1))
user_id=sorted(set(total_user_id)-set(user_ids))
user_id

[1,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 51,
 52,
 53,
 54,
 55,
 56,
 57,
 58,
 59,
 60,
 61,
 62,
 64,
 65,
 66,
 67,
 68,
 69,
 70,
 71,
 72,
 73,
 74,
 75,
 76,
 77,
 78,
 79,
 80,
 81,
 82,
 83,
 84,
 85,
 86,
 87,
 88,
 89,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 97,
 98,
 99,
 100,
 101,
 102,
 103,
 104,
 105,
 106,
 107,
 108,
 109,
 110,
 111,
 112,
 113,
 114,
 115,
 116,
 117,
 118,
 119,
 120,
 121,
 122,
 123,
 124,
 125,
 126,
 127,
 128,
 129,
 130,
 131,
 132,
 133,
 134,
 135,
 136,
 137,
 138,
 139,
 140,
 141,
 142,
 143,
 144,
 145,
 146,
 147,
 148,
 149,
 150,
 152,
 153,
 154,
 156,
 157,
 158,
 159,
 160,
 161,
 162,
 163,
 164,
 165,
 166,
 167,
 168,
 169,
 170,
 171,
 172,
 173,
 174,
 175,
 176,
 177,
 178,
 179,
 180,
 181,
 182,
 183,
 184,
 185,
 186,
 187,
 18

# Global Precision for ALS+Content

In [250]:
## precision@10 per user: share of the top-10 recommendations that the user has interacted with
precession_als_content_based=[]
## running sum of precision@10 over all users (divide by len(user_id) to get the mean)
gobal_precession_als_content_based=0
for user_index in user_id:
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    pred_set=Hybrid_model_als_content_based(user_index,10)['Article_id'].tolist()
    precession_at_10=len(set(relevant_set) & set(pred_set))/10
    precession_als_content_based.append(precession_at_10)
    gobal_precession_als_content_based=gobal_precession_als_content_based+precession_at_10

In [255]:
gobal_precession_als_content_based

569.2999999999997
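
The metric computed in the loop above can be written as a small helper. This is a sketch with hypothetical names (`precision_at_k`, `summarize`) and toy inputs. It mirrors the notebook's definition: hits in the top k divided by k, with the "global" figure being the sum over users.

```python
def precision_at_k(relevant_items, predicted_items, k=10):
    """Fraction of the top-k predictions that appear in the relevant set."""
    top_k = list(predicted_items)[:k]
    hits = len(set(relevant_items) & set(top_k))
    return hits / k

def summarize(per_user_precision):
    """Return (sum over users, mean over users) of the per-user precision values."""
    total = sum(per_user_precision)
    return total, total / len(per_user_precision)

# Toy example: 2 of the 4 recommended items are relevant
example_precision = precision_at_k({1, 2, 3}, [3, 9, 1, 7], k=4)
example_total, example_mean = summarize([0.5, 1.0, 0.0])
```

The sum grows with the number of evaluated users, so the mean is the easier figure to compare across runs.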

# Precision@10 per user (ALS+Content)

In [251]:
precession_als_content_based

[0.0,
 0.4,
 0.1,
 0.8,
 0.1,
 0.5,
 1.0,
 0.6,
 0.9,
 0.6,
 0.6,
 0.4,
 0.8,
 0.3,
 0.1,
 0.5,
 1.0,
 0.2,
 0.7,
 0.4,
 0.1,
 0.2,
 0.4,
 0.5,
 0.4,
 0.6,
 0.8,
 0.4,
 0.4,
 0.4,
 0.6,
 0.4,
 0.5,
 0.6,
 0.7,
 0.5,
 0.4,
 0.4,
 1.0,
 0.7,
 0.1,
 0.5,
 0.9,
 0.3,
 0.7,
 0.2,
 0.2,
 0.5,
 0.4,
 0.6,
 1.0,
 0.8,
 0.9,
 0.7,
 0.6,
 1.0,
 0.5,
 1.0,
 0.5,
 0.8,
 0.6,
 0.7,
 0.3,
 0.3,
 0.4,
 0.2,
 0.5,
 0.5,
 0.3,
 0.3,
 0.5,
 1.0,
 0.5,
 0.4,
 0.7,
 0.6,
 0.7,
 0.9,
 0.3,
 0.8,
 0.4,
 0.7,
 0.1,
 0.9,
 0.6,
 0.4,
 0.1,
 0.2,
 0.4,
 0.8,
 0.6,
 0.4,
 0.1,
 0.8,
 1.0,
 0.5,
 0.4,
 0.5,
 0.5,
 0.8,
 0.6,
 0.8,
 0.3,
 0.8,
 0.6,
 0.8,
 0.3,
 0.6,
 0.2,
 0.8,
 0.6,
 0.3,
 0.6,
 0.1,
 0.2,
 0.7,
 0.3,
 0.5,
 0.7,
 0.6,
 0.7,
 0.3,
 0.6,
 0.8,
 0.3,
 0.8,
 0.5,
 0.5,
 0.7,
 0.4,
 0.9,
 0.5,
 0.9,
 0.9,
 0.2,
 0.7,
 0.7,
 0.2,
 0.2,
 1.0,
 0.1,
 0.6,
 0.3,
 0.6,
 0.4,
 0.9,
 0.9,
 0.6,
 0.4,
 0.8,
 0.7,
 0.9,
 0.7,
 0.6,
 0.6,
 0.4,
 0.5,
 0.3,
 1.0,
 0.4,
 0.4,
 0.5,
 0.8,
 0.4,
 0.4,
 0.1,
 0.3

# Global Precision for Content Based

In [299]:
precession_content_based=[]
gobal_precession_content_based=0
for user_index in user_id:
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    pred_set=recommendation_read_by_user(user_index)['Article_id0'][:10].tolist()
    precession_at_10=len(set(relevant_set) & set(pred_set))/10
    precession_content_based.append(precession_at_10)
    gobal_precession_content_based=gobal_precession_content_based+precession_at_10

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
152
153
154
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
240
241
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283

In [301]:
gobal_precession_content_based

153.89999999999742

# Precision@10 per user (Content Based)

In [300]:
precession_content_based

[0.2,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.4,
 0.4,
 0.2,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.2,
 0.1,
 0.2,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.1,
 0.2,
 0.4,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.3,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.3,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.7,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.2,
 0.1,
 0.2,
 0.5,
 0.1,
 0.1,
 0.3,
 0.3,
 0.2,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.4,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.4,
 0.1,
 0.1,
 0.2,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1,
 0.2,
 0.1,
 0.2,
 0.1,
 0.1,
 0.1,
 0.1,
 0.1

<div>
<img src="images/content-item.png" width="600"/>
</div>

<h1 style="background-color:#F18C8D; padding:10px"> Hybrid Recommendation System using Content+Item-Based </h1>

In [175]:
def Hybrid_content_item_based(user_id,top_recommendation):
    n_items=2984
    content_recommendation=recommendation_read_by_user(user_id)
    ## Calling the Recommendation_user_based function
    item_recommendation=Recommendation_user_based(user_id,n_items)

    ## merging the two recommendation sets on the article id
    content_item_based=pd.merge(item_recommendation,content_recommendation,left_on='item_Id',right_on='Article_id0')
    
    ## Final score is the average of the two normalized scores
    content_item_based['Final_Score']=(content_item_based['collaborative_score_normalized']+content_item_based['normalized_content_score'])/2
    ## Selecting the columns
    content_item_based=content_item_based[['item_Id','title','text_description','Final_Score']]
   
#     ## Getting the full data
#     full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
#     ## Articles already seen, read or watched by this user
#     seen_articles_by_user=list(full_data[full_data['consumer_Id']==user_id]['item_Id'])
#     ## Removing the articles which are already seen or read by the user
#     content_item_based=content_item_based[~(content_item_based['item_Id'].isin(seen_articles_by_user))]
    ## sorting by the final score and keeping the top recommendations
    content_item_based=content_item_based.sort_values(by='Final_Score',ascending=False)[:top_recommendation]
    return content_item_based
    
    
    
    

In [176]:
Hybrid_content_item_based(6,10)

Unnamed: 0,item_Id,title,text_description,Final_Score
218,1713,What is WebAssembly? The Dawn of a New Era - JavaScript Scene,"It's much harder to get real work done when you're writing directly in assembly language. So why do we need this WebAssembly thing? We need WebAssembly because as flexible as JavaScript is, it's still too hard to express many of the things we may want to in JavaScript, and the features we'd need to make it easy might add complexity to a language that already confuses many users . WebAssembly g...",0.528876
1,1314,12 JavaScript Hacks,"In this post I will share 12 extremely useful hacks for JavaScript . These hacks reduce the code and will help you to run optimized code. So let's start hacking! 1) Converting to boolean using !! operator Sometimes we need to check if some variable exists or if it has a valid value, to consider them as true value . For do this kind of validation, you can use the !! (Double negation operator) a...",0.524785
0,494,"Walt Disney Now Loves Blockchain, Going Trustless in Seattle","Disney has come out with a wanted ad for an intern to work on its private blockchain . Companies in diverse sectors, not just financial services , are taking notice of the blockchain and everybody seems to love it! Trustless in Seattle The position advertised has been for an intern at the Disney Private Blockchain within the DTSS (Disney Technology Solutions and Services) business unit. The pr...",0.514512
1723,225,World's Fastest Growing Open Source Platform Pushes Out New Release,"New ""Current"" version line focuses on performance improvements, increased reliability and better security for its 3.5 million users SAN FRANCISCO, April, 26, 2016 - The Node.js Foundation , a community-led and industry-backed consortium to advance the development of the Node.js platform, today announced the release of Node.js version 6 (Node.js v6). This release provides major performance impr...",0.5
1158,1444,Comparing React.js to Vue.js for dynamic tabular data,"Intro The aim of this post is to observe the differences between React and Vue as view layers. The scenario is a page which has a table of nested, frequently-updating data with a fixed number of rows. This nicely represents some of the problems we face on the front-end at Football Radar. We don't need to know too much about React or Vue to accomplish this, although we are much more experienced...",0.495548
513,1906,Improving Angular performance with 1 line of code,"So I thought to myself, ""Genius dot com raised $50 million from Andreessen Horowitz didn't they? They must have lots of Angular-trained engineers. How didn't they catch that?!"" Then I began to wonder, if Genius didn't use this one line of code that improves the performance of their website... who else? So I ⌘+T my way to , the official gallery of Angular projects. I clicked the first example I...",0.467375
2105,1517,Bringing Pokémon GO to life on Google Cloud,"Throughout my career as an engineer, I've had a hand in numerous product launches that grew to millions of users. User adoption typically happens gradually over several months, with new features and architectural changes scheduled over relatively long periods of time. Never have I taken part in anything close to the growth that Google Cloud customer Niantic experienced with the launch of Pokém...",0.458149
82,1693,The brilliant mechanics of Pokémon Go,"A court ruled that it could be a federal crime to share your Netflix password If you haven't seen it already, you will soon walking down the street. Every person you pass who is fervently looking at their phone is likely playing the number one game in the country right now: Pokémon Go. You might think it's popular because of the brand. Nintendo, which refused to make a Pokémon game for the lon...",0.408484
1076,984,Welcome Google Cloud Platform!,"Google Cloud Platform joined the Node.js Foundation today. This news comes on the heels of the Node.js runtime going into beta on Google App Engine , a platform that makes it easy to build scalable web applications and mobile backends across a variety of programming languages. In the industry, there's been a lot of conversations around a third wave of cloud computing that focuses less on infra...",0.360207
371,2323,You SHOULD Learn Vanilla JavaScript Before JS Frameworks - Snipcart,"It's 2013. Our small dev team is on the verge of shipping one of its most impressive client projects to date. I'm at my stand-up desk, skimming through early morning emails. My partner bursts through the office door: ""Something's wrong with our Angular app, man. I've got a digest is already in progress error popping everywhere, and I can't figure out what's happening,"" he says, visibly nervous...",0.350693


# Global Precision for Content-Item Based

In [177]:
precession_content_item_based=[]
gobal_precession_content_item_based=0
for user_index in user_id:
    relevant_set=consumer_interaction[consumer_interaction['consumer_Id']==user_index]['item_Id'].tolist()
    pred_set=Hybrid_content_item_based(user_index,10)['item_Id'].tolist()
    precession_at_10=len(set(relevant_set) & set(pred_set))/10
    print(user_index,precession_at_10)
    precession_content_item_based.append(precession_at_10)
    gobal_precession_content_item_based=gobal_precession_content_item_based+precession_at_10

1 0.5
2 0.4
3 0.2
4 0.5
5 0.1
6 0.5
7 0.7
8 0.5
9 0.2
10 0.2
11 0.6
12 0.4
13 0.8
14 0.2
15 0.1
16 0.4
17 0.9
18 0.2
19 0.5
20 0.2
21 0.2
22 0.1
23 0.2
24 0.4
25 0.2
26 0.6
27 0.5
28 0.5
29 0.2
30 0.1
31 0.6
32 0.2
33 0.2
34 0.6
35 0.3
36 0.5
37 0.4
38 0.5
39 0.8
40 0.7
41 0.1
42 0.5
43 0.7
44 0.2
45 0.3
46 0.2
47 0.2
48 0.2
49 0.5
50 0.6
51 0.8
52 0.5
53 0.9
54 0.7
55 0.4
56 0.4
57 0.4
58 0.3
59 0.4
60 0.8
61 0.4
62 0.5
64 0.3
65 0.3
66 0.4
67 0.2
68 0.6
69 0.1
70 0.3
71 0.3
72 0.5
73 0.4
74 0.5
75 0.3
76 0.5
77 0.5
78 0.5
79 0.4
80 0.4
81 0.6
82 0.4
83 0.4
84 0.1
85 0.7
86 0.6
87 0.4
88 0.1
89 0.2
90 0.4
91 0.6
92 0.6
93 0.4
94 0.2
95 0.8
96 0.6
97 0.4
98 0.4
99 0.2
100 0.3
101 0.6
102 0.5
103 0.4
104 0.3
105 0.5
106 0.2
107 0.5
108 0.2
109 0.2
110 0.2
111 0.4
112 0.4
113 0.3
114 0.6
115 0.1
116 0.2
117 0.6
118 0.3
119 0.2
120 0.4
121 0.3
122 0.4
123 0.3
124 0.1
125 0.5
126 0.3
127 0.4
128 0.2
129 0.5
130 0.2
131 0.4
132 0.3
133 0.3
134 0.3
135 0.4
136 0.2
137 0.4
138 0.7
139 0.2
140

IndexError: single positional indexer is out-of-bounds
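
The traceback above stops the loop at the first user for whom the recommender fails. One possible way to keep the evaluation running is to catch the error, record the user, and continue. The sketch below uses hypothetical stand-ins (`toy_recommend`, `toy_relevant`) rather than the notebook's functions, and assumes the failure surfaces as an `IndexError`, as it does in the traceback.

```python
def evaluate_safely(user_ids, recommend, relevant_for, k=10):
    """Compute precision@k per user, skipping users whose recommendation call fails."""
    precision_by_user = {}
    failed_users = []
    for uid in user_ids:
        try:
            predicted = recommend(uid, k)
        except IndexError:
            failed_users.append(uid)   # keep track of who was skipped
            continue
        hits = len(set(relevant_for(uid)) & set(predicted))
        precision_by_user[uid] = hits / k
    return precision_by_user, failed_users

def toy_recommend(uid, k):
    """Hypothetical recommender that fails for user 3."""
    if uid == 3:
        return [][0]                   # raises IndexError, like an empty .iloc lookup
    return list(range(k))

def toy_relevant(uid):
    """Hypothetical relevant set, identical for every user."""
    return [0, 1, 100]

precision_by_user, failed_users = evaluate_safely([1, 2, 3], toy_recommend, toy_relevant, k=4)
```

Reporting the skipped users alongside the scores makes it clear how many users the final figure actually covers.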

In [178]:
gobal_precession_content_item_based

425.0

# Precision@10 per user (Content-Item Based)

In [179]:
precession_content_item_based

[0.5,
 0.4,
 0.2,
 0.5,
 0.1,
 0.5,
 0.7,
 0.5,
 0.2,
 0.2,
 0.6,
 0.4,
 0.8,
 0.2,
 0.1,
 0.4,
 0.9,
 0.2,
 0.5,
 0.2,
 0.2,
 0.1,
 0.2,
 0.4,
 0.2,
 0.6,
 0.5,
 0.5,
 0.2,
 0.1,
 0.6,
 0.2,
 0.2,
 0.6,
 0.3,
 0.5,
 0.4,
 0.5,
 0.8,
 0.7,
 0.1,
 0.5,
 0.7,
 0.2,
 0.3,
 0.2,
 0.2,
 0.2,
 0.5,
 0.6,
 0.8,
 0.5,
 0.9,
 0.7,
 0.4,
 0.4,
 0.4,
 0.3,
 0.4,
 0.8,
 0.4,
 0.5,
 0.3,
 0.3,
 0.4,
 0.2,
 0.6,
 0.1,
 0.3,
 0.3,
 0.5,
 0.4,
 0.5,
 0.3,
 0.5,
 0.5,
 0.5,
 0.4,
 0.4,
 0.6,
 0.4,
 0.4,
 0.1,
 0.7,
 0.6,
 0.4,
 0.1,
 0.2,
 0.4,
 0.6,
 0.6,
 0.4,
 0.2,
 0.8,
 0.6,
 0.4,
 0.4,
 0.2,
 0.3,
 0.6,
 0.5,
 0.4,
 0.3,
 0.5,
 0.2,
 0.5,
 0.2,
 0.2,
 0.2,
 0.4,
 0.4,
 0.3,
 0.6,
 0.1,
 0.2,
 0.6,
 0.3,
 0.2,
 0.4,
 0.3,
 0.4,
 0.3,
 0.1,
 0.5,
 0.3,
 0.4,
 0.2,
 0.5,
 0.2,
 0.4,
 0.3,
 0.3,
 0.3,
 0.4,
 0.2,
 0.4,
 0.7,
 0.2,
 0.2,
 0.5,
 0.1,
 0.6,
 0.3,
 0.3,
 0.3,
 0.6,
 0.6,
 0.1,
 0.1,
 0.5,
 0.6,
 0.6,
 0.3,
 0.7,
 0.2,
 0.4,
 0.4,
 0.3,
 0.4,
 0.2,
 0.5,
 0.3,
 0.3,
 0.4,
 0.4,
 0.1,
 0.3

In [220]:
df=pd.read_csv('metric.csv',usecols=['model_name','gobal_precession','Global_average'])
df

Unnamed: 0,model_name,gobal_precession,Global_average
0,Item-Based,13.6,0.01193
1,ALS,37.1,0.032544
2,ALS+Content,569.3,0.499386
3,ALS+Item,12.4,0.010877
4,Content-Based,153.9,0.135
5,Content+Item,425.0,0.372807
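
The two metric columns in this table appear to be related by a constant divisor: each `Global_average` matches `gobal_precession` divided by 1140. That user count is inferred from the table itself and is not stated in the notebook, so the check below is an assumption-laden sanity check rather than the author's calculation.

```python
# Summed precision@10 per model, copied from the table above
global_precision = {
    'Item-Based': 13.6,
    'ALS': 37.1,
    'ALS+Content': 569.3,
    'ALS+Item': 12.4,
    'Content-Based': 153.9,
    'Content+Item': 425.0,
}

# Assumption: number of evaluated users, inferred from the ratios in the table
n_evaluated_users = 1140

# Mean precision@10 per user for each model
global_average = {
    name: round(total / n_evaluated_users, 6)
    for name, total in global_precision.items()
}

best_model = max(global_average, key=global_average.get)
```

Under that assumption, ALS+Content places about five relevant articles in each user's top 10 on average.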


<h3 style="background-color:#78C9EF; text-align:center; color:white; padding:10px " >Comparison of All Models</h3>

In [194]:
px.bar(data_frame=df,x='model_name',y='gobal_precession',color='model_name')

## ALS+Content performs best among all the models (global precision 569.3, i.e. an average precision@10 of about 0.50)

<h1 style="background-color:#F18C8D; padding:10px"> Final Model</h1>

In [217]:
def Hybrid_model_als_content_based(user_id,top_recommendation):
    n_items=2984

    content_recommendation=recommendation_read_by_user(user_id)
    als_recommendation=Recommmendation_Als(user_id,n_items)
    als_content_based=pd.merge(content_recommendation,als_recommendation,left_on='Article_id0',right_on='Article_id')
    ## Final score is the average of the normalized content and ALS scores
    als_content_based['Final_Score']=(als_content_based['normalized_content_score']+als_content_based['Als_score_normalized'])/2

    als_content_based=als_content_based[['Article_id','item_id0','title','text_description','Final_Score']]
    ## Getting the full data
    full_data=pd.merge(consumer_interaction,platform_content,how='inner',on='item_id')
    ## Articles already seen, read or watched by this user
    seen_articles_by_user=list(full_data[full_data['consumer_Id']==user_id]['item_Id'])
    ## Removing the articles which are already seen or read by the user
    als_content_based=als_content_based[~(als_content_based['Article_id'].isin(seen_articles_by_user))]

    als_content_based=als_content_based.sort_values(by='Final_Score',ascending=False)

    return als_content_based[:top_recommendation][['item_id0','title']]
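
What sets the final model apart from the earlier version is the seen-article filter. A minimal sketch of that step on toy data follows. `drop_seen` is a hypothetical helper, and the column names are borrowed from the notebook on the assumption that `item_Id` in the interactions and `Article_id` in the recommendations share the same id space.

```python
import pandas as pd

def drop_seen(recommendations, interactions, user_id, item_col='Article_id'):
    """Remove recommendations the given user has already interacted with."""
    seen = set(interactions.loc[interactions['consumer_Id'] == user_id, 'item_Id'])
    return recommendations[~recommendations[item_col].isin(seen)]

# Toy data: user 10 has already seen articles 2 and 4
toy_recommendations = pd.DataFrame({
    'Article_id': [1, 2, 3, 4],
    'Final_Score': [0.9, 0.8, 0.7, 0.6],
})
toy_interactions = pd.DataFrame({
    'consumer_Id': [10, 10, 11],
    'item_Id': [2, 4, 1],
})
fresh_recommendations = drop_seen(toy_recommendations, toy_interactions, user_id=10)
```

Filtering before taking the top rows, as the function above does, keeps the returned list at full length whenever enough unseen candidates exist.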

In [218]:
Hybrid_model_als_content_based(10,10)

Unnamed: 0,item_id0,title
8,638282658987724754,Machine Learning for Designers
246,-7126520323752764957,"How Google is Remaking Itself as a ""Machine Learning First"" Company - Backchannel"
224,5250363310227021277,"How Google is Remaking Itself as a ""Machine Learning First"" Company - Backchannel"
103,-7002558676365724983,Building a digital-banking business
697,-6940659689413147290,An Exclusive Look at How AI and Machine Learning Work at Apple - Backchannel
95,8596997246990922861,This year's Founders' Letter
229,-7033990154815318757,The Conversational Economy Part 1: What's Causing the Bot Craze?
25,-8187220755213888616,Organizing for digital acceleration: Making a two-speed IT operating model work
37,4785499183287168509,The new tech talent you need to succeed in digital
42,1981046186743381313,State of the Digital Nation 2016
