___
TEAM RUBY (MACHINE LEARNING TRACK)
___
# Recommender System for Lucid.blog

This is the code notebook for the recommender system built for Lucid.

In this notebook, we build a basic recommendation system that suggests the posts most similar to a particular user's interests. Here, those interests are inferred from the user's own posts and bio.

- Let's get started with data preparation.
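The matching idea above can be sketched as a bag-of-words cosine similarity between a user's profile text (bio plus their own posts) and each candidate post. This is a minimal, dependency-free sketch that assumes simple lowercase word tokens. The helpers `tokenize`, `cosine` and `recommend` are hypothetical, and the notebook may well use a different vectorizer, such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric runs
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(profile_text, posts, k=2):
    # Rank candidate posts (post_id -> content) by similarity to the profile text
    profile = Counter(tokenize(profile_text))
    scored = [(cosine(profile, Counter(tokenize(body))), pid) for pid, body in posts.items()]
    scored.sort(reverse=True)
    return [pid for score, pid in scored[:k]]


# Illustrative inputs only, not taken from the Lucid dataset
bio = "Web Developer who writes about html and css"
candidates = {
    10: "A beginner guide to html tables",
    11: "Staying healthy while working from home",
    12: "Styling pages with css grid",
}
print(recommend(bio, candidates))  # → [12, 10]
```

Post 11 shares no words with the bio, so it scores 0 and falls out of the top two.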

## Import Libraries

```python
import numpy as np
import pandas as pd
```

## Get the Data

### 1. User Data

```python
# Column names for users.csv, supplied via names=
columns = ['user_id', 'fullname', 'username', 'email', 'image', 'email_provider', 'provider_id', 'password', 'remember_token', 'created_at', 'updated_at', 'short_bio']
users = pd.read_csv('lucid_dataset/users.csv', names=columns)
# Keep only user_id, username and short_bio for the recommender
users.drop(['fullname', 'email', 'email_provider', 'image', 'provider_id', 'password', 'remember_token', 'created_at', 'updated_at'], axis=1, inplace=True)
```

```python
users.head(2)
```

|   | user_id | username | short_bio |
|---|---|---|---|
| 0 | 1 | eniayomi | Software Developer \| DevOPs Engineer |
| 1 | 2 | DMatrix | Web Developer |
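Instead of loading all twelve columns and then dropping nine, `pd.read_csv` can select the needed columns at parse time through its `usecols` parameter, which also keeps fields such as `password` out of memory. This is a minimal sketch that assumes, as the output above suggests, that `users.csv` has no header row. The single sample row is invented and only stands in for the real file.

```python
import io

import pandas as pd

columns = ['user_id', 'fullname', 'username', 'email', 'image', 'email_provider',
           'provider_id', 'password', 'remember_token', 'created_at', 'updated_at', 'short_bio']

# One made-up, headerless row standing in for lucid_dataset/users.csv
row = ['1', 'Ayo', 'eniayomi', 'ayo@example.com'] + [''] * 7 + ['Software Developer']
sample_csv = io.StringIO(','.join(row) + '\n')

# usecols keeps only these columns while parsing, so no drop() is needed afterwards
users = pd.read_csv(sample_csv, names=columns, usecols=['user_id', 'username', 'short_bio'])
print(users)
```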


### 2. Following/Follower Data

```python
# Column names for following.csv
fol_col = ['follow_t', 'follower_id', 'status']
```

```python
following = pd.read_csv('lucid_dataset/following.csv', names=fol_col)
# Only follow_t is kept; follower_id and status are dropped
following.drop(['status', 'follower_id'], axis=1, inplace=True)
following.head(5)
```

|   | follow_t |
|---|---|
| 0 | 3 |
| 1 | 6 |
| 2 | 3 |
| 3 | 3 |
| 4 | 7 |
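Counting followers requires follow edges keyed by user. The sketch below assumes, although the column names alone do not confirm it, that each row of `following.csv` records `follower_id` following the user `follow_t`. Unlike the cell above, it keeps `follower_id`, and the edge values are made up.

```python
import pandas as pd

# Hypothetical edges: follow_t is ASSUMED to be the id of the followed user
following = pd.DataFrame({
    'follow_t': [3, 6, 3, 3, 7],
    'follower_id': [1, 1, 2, 5, 2],
})
users = pd.DataFrame({'user_id': [1, 2, 3, 6, 7]})

# Number of distinct followers for each followed user
followers = following.groupby('follow_t')['follower_id'].nunique()

# Look the count up by user_id; users nobody follows get 0
users['no_of_followers'] = users['user_id'].map(followers).fillna(0).astype(int)
print(users)
```

Because the count is looked up by id, it no longer depends on how rows happen to line up between frames.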


### 3. Posts Data

```python
# Column names for posts.csv, supplied via names=
post_col = ['post_id', 'user_id', 'title', 'content', 'tags', 'slug', 'created_at', 'updated_at', 'image', 'status_id', 'action',
           'p_id']
posts = pd.read_csv('lucid_dataset/posts.csv', names=post_col)
# Keep post_id, user_id, content and tags
posts.drop(['title', 'slug', 'created_at', 'updated_at', 'image', 'status_id', 'action', 'p_id'], axis=1, inplace=True)
posts.head(2)
```

|   | post_id | user_id | content | tags |
|---|---|---|---|---|
| 0 | 1 | 2077 | I learnt how to use the table tag as i have us... | NaN |
| 1 | 2 | 1719 | I am on this journey with start.ng, and here ... | Technology |
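The `content` column mixes prose with markdown image links and raw HTML, as the combined frame further down shows. Before posts are compared by their words, that markup would likely need stripping. The following is a minimal regex-based sketch. `clean_content` is a hypothetical helper, and the regexes only roughly approximate a real HTML or markdown parser.

```python
import re

IMAGE_MD = re.compile(r"!\[[^\]]*\]\([^)]*\)")  # markdown images such as ![](/storage/...png)
HTML_TAG = re.compile(r"<[^>]+>")               # HTML tags such as <p> or </b>


def clean_content(text):
    # Hypothetical helper: replace images and tags with spaces, then collapse whitespace
    text = IMAGE_MD.sub(" ", text)
    text = HTML_TAG.sub(" ", text)
    return " ".join(text.split())


print(clean_content("second![](/storage/1/images/img-ga0dlp954u.png)"))  # → second
print(clean_content("<p>Hello <b>world</b></p>\n\nbye"))                  # → Hello world bye
```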


### 4. Notification Data

```python
noti_col = ['id', 'post_id', 'parent_comment_id', 'comment', 'sender_id', 'notification_freq', 'status', 'action', 'type', 'created_at', 'updated_at']
notifications = pd.read_csv('lucid_dataset/notifications.csv', names=noti_col)
# Only notification_freq is kept; no user or post id survives the drop
notifications.drop(['id', 'post_id', 'parent_comment_id', 'comment', 'sender_id', 'status', 'action', 'type', 'created_at', 'updated_at'], axis=1, inplace=True)
notifications.head(2)
```

|   | notification_freq |
|---|---|
| 0 | 7 |
| 1 | 4 |


### Combining All Four DataFrames

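```python
# Inner merge of users and posts on their shared user_id column
df = pd.merge(users, posts, on='user_id')
```

```python
# join aligns on the row index (not user_id); how='right' keeps every row of `following`
df = df.join(following, how='right')
```

```python
# Same index-based alignment for the notification counts
df = df.join(notifications, how='right')
```

```python
# Posts without a tag default to 'Technology' (assigned back rather than chained inplace fillna)
df['tags'] = df['tags'].fillna('Technology')
```

```python
# Drop rows with any missing value, including the NaN rows added by the right joins
df.dropna(inplace=True)
```

```python
df.info()
```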

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 154 entries, 0 to 189
Data columns (total 8 columns):
user_id              154 non-null float64
username             154 non-null object
short_bio            154 non-null object
post_id              154 non-null float64
content              154 non-null object
tags                 154 non-null object
follow_t             154 non-null int64
notification_freq    154 non-null int64
dtypes: float64(2), int64(2), object(4)
memory usage: 10.8+ KB
```
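The two `join` calls pair rows by their index label rather than by a shared user or post id. This also explains why `user_id` and `post_id` appear as `float64` in the output above: the right joins introduce NaN rows before `dropna` removes them. Here is a toy illustration with made-up values:

```python
import pandas as pd

left = pd.DataFrame({'user_id': [1, 1, 2]})        # 3 rows, index 0..2
right = pd.DataFrame({'follow_t': [3, 6, 3, 3]})   # 4 rows, index 0..3

# join pairs rows by index label: row i of left with row i of right
joined = left.join(right, how='right')
print(joined)

# Index 3 has no partner in `left`, so user_id becomes NaN and the column turns float
print(joined['user_id'].dtype)  # float64
print(len(joined.dropna()))     # 3
```

When both frames carry a common key column, `pd.merge(left, right, on='key')` matches on values instead.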


### Computing the Number of Followers and Total Notifications per User

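```python
# Rows per user_id with a non-null follow_t; since `following` was joined on the
# row index, this equals each user's row count in df, not a per-id follower count
df['no_of_followers'] = df.groupby('user_id')['follow_t'].transform('count')
df.drop('follow_t', axis=1, inplace=True)
```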
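The section heading also mentions total notifications per user, but the cell above does not compute them. A sketch using the same `groupby(...).transform` pattern might look like the following. The column name `total_notifications` is hypothetical, the rows are made up, and the per-row `notification_freq` values carry the same index-based alignment as `follow_t`.

```python
import pandas as pd

# Made-up rows using the same column names as df
df = pd.DataFrame({
    'user_id': [1.0, 1.0, 1.0, 2.0],
    'notification_freq': [7, 4, 45, 4],
})

# transform('sum') totals notification_freq per user and repeats the total on each row
df['total_notifications'] = df.groupby('user_id')['notification_freq'].transform('sum')
print(df)
```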

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 154 entries, 0 to 189
Data columns (total 8 columns):
user_id              154 non-null float64
username             154 non-null object
short_bio            154 non-null object
post_id              154 non-null float64
content              154 non-null object
tags                 154 non-null object
notification_freq    154 non-null int64
no_of_followers      154 non-null int64
dtypes: float64(2), int64(2), object(4)
memory usage: 10.8+ KB
```


```python
df.head()
```

|   | user_id | username | short_bio | post_id | content | tags | notification_freq | no_of_followers |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | eniayomi | Software Developer \| DevOPs Engineer | 280.0 | Yh it did | Technology | 7 | 4 |
| 1 | 1.0 | eniayomi | Software Developer \| DevOPs Engineer | 994.0 | First | Health | 4 | 4 |
| 2 | 1.0 | eniayomi | Software Developer \| DevOPs Engineer | 995.0 | `second![](/storage/1/images/img-ga0dlp954u.png)` | Technology | 45 | 4 |
| 3 | 1.0 | eniayomi | Software Developer \| DevOPs Engineer | 996.0 | `third![](/storage/1/images/img-hta9olnjij.png)...` | Technology | 4 | 4 |
| 4 | 2.0 | DMatrix | Web Developer | 981.0 | `` `html`\n\n \n\n`<!DOCTYPE html>\n\n<html l... `` | Technology | 4 | 12 |
