# Article Recommender System
**About the data** - [Source](https://www.kaggle.com/gspmoreira/recommender-systems-in-python-101/data) The `shared_articles.csv` file contains information about the articles shared on the platform. 
Each article has its sharing date (timestamp), original URL, title, content in plain text, language (Portuguese, `pt`, or English, `en`), and information about the user who shared it (the author).

There are two possible event types at a given timestamp: 
- CONTENT SHARED: The article was shared in the platform and is available for users. 
- CONTENT REMOVED: The article was removed from the platform and is no longer available for recommendation.

**Aim** - Build recommender systems that suggest these articles to users

### Importing libraries

In [42]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math

In [43]:
# Loading data
articles_df = pd.read_csv('data/shared_articles.csv')
# Choosing only the shared articles
articles_df = articles_df[articles_df['eventType'] == 'CONTENT SHARED']

# Loading the log of user interactions with the articles
interactions_df = pd.read_csv('data/users_interactions.csv')


### Data cleaning

In [44]:
# Creating a new column to quantify the degree of interaction
# The weights are a subjective choice by the author, not derived from the data
event_type_strength = {
   'VIEW': 1.0,
   'LIKE': 2.0, 
   'BOOKMARK': 3.0, 
   'FOLLOW': 4.0,
   'COMMENT CREATED': 5.0,  
}

interactions_df['eventStrength'] = interactions_df['eventType'].apply(lambda x: event_type_strength[x])

# Smoothing the summed strengths with a log transform,
# so that many repeated interactions do not dominate
def smooth_user_preference(x):
    """Return log2(1 + x)."""
    return math.log(1 + x, 2)

interactions_df = interactions_df \
                    .groupby(['personId', 'contentId'])['eventStrength'].sum() \
                    .apply(smooth_user_preference).reset_index()
print('Unique user/item interactions: %d' % len(interactions_df))


Unique user/item interactions: 40710
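
To make the aggregation concrete, here is a minimal sketch on made-up data. The frame `toy_df` and its values are hypothetical. Only the column names and the weights come from the cells above. A VIEW plus a LIKE on the same article sums to 3, so the smoothed strength is log2(1 + 3) = 2.

```python
import math
import pandas as pd

# Hypothetical toy data: column names match interactions_df, values are made up
toy_df = pd.DataFrame({
    'personId':  [1, 1, 1, 2],
    'contentId': [10, 10, 20, 10],
    'eventType': ['VIEW', 'LIKE', 'VIEW', 'COMMENT CREATED'],
})

# Same weights as in the notebook
event_type_strength = {
    'VIEW': 1.0,
    'LIKE': 2.0,
    'BOOKMARK': 3.0,
    'FOLLOW': 4.0,
    'COMMENT CREATED': 5.0,
}
toy_df['eventStrength'] = toy_df['eventType'].map(event_type_strength)

# Sum the strengths per (user, article) pair, then apply log2(1 + x)
toy_strength = (
    toy_df.groupby(['personId', 'contentId'])['eventStrength']
    .sum()
    .apply(lambda x: math.log(1 + x, 2))
    .reset_index()
)
print(toy_strength)
```

Each (user, article) pair ends up as exactly one row, which is why the printed count above is the number of unique user/item interactions.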


In [45]:
# Keeping only users with at least 5 interactions,
# so that we have enough information to make recommendations
count_df = interactions_df.groupby(['personId', 'contentId']).size().groupby('personId').size()
print('Total users = %d' % len(count_df))
users_with_enough_interactions_df = count_df[count_df >= 5].reset_index()[['personId']]
print('Users after filtering = %d' % len(users_with_enough_interactions_df))

Total users = 1895
Users after filtering = 1140
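
The user filter can be illustrated on a small example. This is a sketch under assumptions: `toy_interactions`, `min_interactions`, `items_per_user` and `selected_users` are hypothetical names, and the data is made up. It counts distinct articles per user, which should match the cell above when the frame holds one row per (user, article) pair.

```python
import pandas as pd

# Hypothetical toy data: user 1 interacted with 5 articles, user 2 with 2
toy_interactions = pd.DataFrame({
    'personId':  [1, 1, 1, 1, 1, 2, 2],
    'contentId': [10, 11, 12, 13, 14, 10, 11],
})

min_interactions = 5  # threshold used in the notebook

# Number of distinct articles each user interacted with
items_per_user = toy_interactions.groupby('personId')['contentId'].nunique()

# Keep the users that reach the threshold
selected_users = items_per_user[items_per_user >= min_interactions].index
print(list(selected_users))
```

Only user 1 reaches the threshold here, so user 2 would be dropped before any recommendations are computed.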


In [46]:
# Keeping only the interactions of users with at least 5 interactions
print('Total interactions = %d' % len(interactions_df))
interactions_from_selected_users_df = interactions_df.merge(users_with_enough_interactions_df, 
               how = 'right',
               left_on = 'personId',
               right_on = 'personId')
print('Interactions after filtering = %d' % len(interactions_from_selected_users_df))

Total interactions = 40710
Interactions after filtering = 39106
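
As a cross-check of the merge-based filter, the sketch below compares it with a boolean mask built with `isin` on made-up data. The names `toy_interactions`, `toy_selected_users`, `via_merge` and `via_isin` are hypothetical. The `isin` form is shown only as an alternative, not as the approach the notebook takes.

```python
import pandas as pd

# Hypothetical toy data: three users, of which users 1 and 3 are selected
toy_interactions = pd.DataFrame({
    'personId':      [1, 1, 2, 3],
    'contentId':     [10, 11, 10, 12],
    'eventStrength': [1.0, 2.0, 1.0, 1.58],
})
toy_selected_users = pd.DataFrame({'personId': [1, 3]})

# Filter as in the notebook: a right merge keeps only the selected users' rows
via_merge = toy_interactions.merge(toy_selected_users, how='right', on='personId')

# Alternative: a boolean mask over the original frame
mask = toy_interactions['personId'].isin(toy_selected_users['personId'])
via_isin = toy_interactions[mask]

print(len(via_merge), len(via_isin))
```

Both forms keep the same rows when every selected user appears in the interaction data. A right merge would add a row of missing values for a selected user with no interactions, whereas the mask would not.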
