<a href="https://colab.research.google.com/github/Aditya0721/Recommender/blob/master/Recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **The main idea behind this recommendation system is to recommend posts using both collaborative and content-based filtering. For collaborative filtering, we will find all the categories and compute the cosine similarity using item-item collaborative filtering.**



# **DATA PREPROCESSING**

## Uploading the datasets and storing them in dataframes

In [0]:
import pandas as pd
from google.colab import files
import io

In [0]:
uploaded = files.upload()

In [0]:
users_df = pd.read_csv('users.csv')
users_df.head()

In [0]:
posts_df = pd.read_csv('posts.csv')
posts_df.head()

In [0]:
views_df = pd.read_csv('views.csv')
views_df.head()
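
In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, so a file can also be parsed straight from memory with `io.BytesIO` instead of by path. A minimal sketch, using a hypothetical stand-in dictionary with made-up contents so that it runs outside Colab:

```python
import io
import pandas as pd

# Hypothetical stand-in for the dict returned by files.upload():
# keys are file names, values are the raw bytes of each file.
# The column names and rows here are made up for illustration.
uploaded = {
    'users.csv': b'_id,name\nu1,Asha\nu2,Ravi\n',
}

# Wrap the bytes in a file-like object so read_csv can parse them.
users_demo = pd.read_csv(io.BytesIO(uploaded['users.csv']))
print(users_demo.shape)
```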

## DETAILS OF THE DATAFRAMES

In [0]:
users_df.describe()

In [0]:
views_df.describe()

In [0]:
posts_df.describe()

**Checking for null values**

In [0]:
users_df.isnull().any()


In [0]:
views_df.isnull().any()

In [0]:
posts_df.isnull().any()
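
`isnull().any()` only reports whether a column contains missing values. As a small sketch on a made-up frame, `isnull().sum()` gives the number of missing values per column in one call:

```python
import numpy as np
import pandas as pd

# Toy frame (made-up data) with one missing category.
toy = pd.DataFrame({
    '_id': ['p1', 'p2', 'p3'],
    'category': ['Art', np.nan, 'Music'],
})

has_null = toy.isnull().any()    # True/False per column
null_count = toy.isnull().sum()  # number of missing values per column
print(null_count)
```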

**We have null values in the category column of the posts dataset.**

**Since category is a categorical variable, we have to fill the null values by looking at the title and post_type.**

**Counting the total number of null values in posts_df**

In [0]:
posts_df['category'].isna().sum()

**Finding the title and post_type for all 28 null values**

In [0]:
posts_df[posts_df['category'].isna()]

**As we can see, most of the null values belong to posts whose post_type is project, so we will create a new category called project and assign it to the null values. Whenever a user leaves the category field empty, we will assign the post_type to the category field by default.**

In [0]:
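# Fill only the category column of the rows where category is missing;
# assigning to the whole row would overwrite _id, title and post_type too.
posts_df.loc[posts_df['category'].isna(), 'category'] = 'project'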
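
The default described above, where an empty category falls back to the post's own post_type, could be sketched with `fillna`. This is an illustration on made-up rows, not the notebook's data; the column name `' post_type'` keeps the leading space that appears in the dataset's header.

```python
import numpy as np
import pandas as pd

# Made-up rows: two posts have no category.
toy_posts = pd.DataFrame({
    '_id': ['p1', 'p2', 'p3'],
    'category': ['Art|Design', np.nan, np.nan],
    ' post_type': ['artwork', 'project', 'blog'],
})

# fillna with a Series aligns on the index, so each missing category
# takes the post_type value from the same row.
toy_posts['category'] = toy_posts['category'].fillna(toy_posts[' post_type'])
print(toy_posts)
```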

**Now that the null values have been imputed, let's check again**

In [0]:
posts_df.isna().any()

**Now let's see how many unique categories are present in the dataset**

In [0]:
posts_df['category'].unique()

**As we can see, some of the categories are combinations of multiple categories separated by `|`. We will add more rows to posts_df by splitting these categories, so that each row holds a single category.**

In [0]:
# Map each raw category string to the list of individual categories it contains
categories = {}
for raw_category in posts_df['category']:
    categories[raw_category] = raw_category.split("|")
print(categories)

**Now let's update posts_df**

In [0]:
updated_data = []
for raw_category, parts in categories.items():
    # Posts whose category string equals raw_category;
    # only the first matching post is carried over.
    dummy = posts_df[posts_df['category'] == raw_category]
    post_id = dummy['_id'].values[0]
    title = dummy['title'].values[0]
    post_type = dummy[' post_type'].values[0]  # the column name has a leading space
    # Emit one row per individual category
    for part in parts:
        updated_data.append({
            '_id': post_id,
            'title': title,
            'category': part,
            ' post_type': post_type,
        })

In [0]:
posts_df_updated = pd.DataFrame(updated_data)
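
As an alternative sketch, pandas can do the same split with `str.split` followed by `DataFrame.explode` (available from pandas 0.25). Unlike the loop above, which takes the first post for each distinct category string, this version keeps every post. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up posts; p2 and p3 share the same category string.
toy_posts = pd.DataFrame({
    '_id': ['p1', 'p2', 'p3'],
    'title': ['Save Earth.', 'Robots', 'More robots'],
    'category': ['Visual Arts|Illustration', 'Robotics', 'Robotics'],
    ' post_type': ['artwork', 'blog', 'blog'],
})

# Turn "A|B" into ["A", "B"], then give each list element its own row.
exploded = (
    toy_posts
    .assign(category=toy_posts['category'].str.split('|'))
    .explode('category')
    .reset_index(drop=True)
)
print(exploded)
```

Whether to keep every post or only one per category string is a design choice; the explode version changes the row counts compared with the loop.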


**Now let's check the updated posts_df**

In [0]:
posts_df_updated.loc[:, ['_id', 'category', ' post_type']].head()

In [0]:
posts_df.loc[:, ['_id', 'category', ' post_type']].head()

In [0]:
posts_df_updated.describe()

**As we can see, we have successfully created our posts dataframe. Now we will merge the dataframes.**

## MERGING OF THE DATAFRAMES

**We will merge the views_df and posts_df_updated dataframes, as they contain all the columns and data that we will need for our recommendation system.**

In [0]:
views_df.columns

In [0]:
posts_df_updated.columns

**To merge, we need a common column. Both dataframes contain the post id, but under different names, so we have to rename one of them before merging.**
**We will rename _id to post_id in posts_df_updated.**

In [0]:
posts_df_updated.rename(columns={'_id':'post_id'}, inplace=True)

In [35]:
posts_df_updated.columns

Index(['post_id', 'title', 'category', ' post_type'], dtype='object')

In [0]:
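# Inner join on the shared post_id column (the only column the two frames have in common)
main_df = pd.merge(views_df, posts_df_updated, on='post_id')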

In [38]:
main_df.columns

Index(['user_id', 'post_id', 'timestamp', 'title', 'category', ' post_type'], dtype='object')
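
Because `posts_df_updated` holds one row per post and category, the merge repeats each view once for every category of the viewed post. A small sketch on made-up rows, with the join key and join type written out explicitly (`how='inner'` is the pandas default):

```python
import pandas as pd

# Made-up views: two users viewed post p1.
toy_views = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'post_id': ['p1', 'p1'],
    'timestamp': ['2020-06-01', '2020-05-24'],
})
# Made-up posts, already split to one row per category.
toy_posts = pd.DataFrame({
    'post_id': ['p1', 'p1', 'p2'],
    'category': ['Visual Arts', 'Illustration', 'Robotics'],
})

# Inner join on post_id: 2 views x 2 categories of p1 = 4 rows.
# p2 has no views, so it does not appear in the result.
toy_main = pd.merge(toy_views, toy_posts, on='post_id', how='inner')
print(toy_main)
```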

In [39]:
main_df.head(10)

| | user_id | post_id | timestamp | title | category | post_type |
|---|---|---|---|---|---|---|
| 0 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z | Save Earth. | Visual Arts | artwork |
| 1 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z | Save Earth. | Graphic Design | artwork |
| 2 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z | Save Earth. | Artistic design | artwork |
| 3 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z | Save Earth. | Graphic | artwork |
| 4 | 5df49b32cc709107827fb3c7 | 5ec821ddec493f4a2655889e | 2020-06-01T10:46:45.131Z | Save Earth. | Illustration | artwork |
| 5 | 5ec3ba5374f7660d73aa1201 | 5ec821ddec493f4a2655889e | 2020-05-24T10:49:55.177Z | Save Earth. | Visual Arts | artwork |
| 6 | 5ec3ba5374f7660d73aa1201 | 5ec821ddec493f4a2655889e | 2020-05-24T10:49:55.177Z | Save Earth. | Graphic Design | artwork |
| 7 | 5ec3ba5374f7660d73aa1201 | 5ec821ddec493f4a2655889e | 2020-05-24T10:49:55.177Z | Save Earth. | Artistic design | artwork |
| 8 | 5ec3ba5374f7660d73aa1201 | 5ec821ddec493f4a2655889e | 2020-05-24T10:49:55.177Z | Save Earth. | Graphic | artwork |
| 9 | 5ec3ba5374f7660d73aa1201 | 5ec821ddec493f4a2655889e | 2020-05-24T10:49:55.177Z | Save Earth. | Illustration | artwork |


In [40]:
main_df.tail(20)

| | user_id | post_id | timestamp | title | category | post_type |
|---|---|---|---|---|---|---|
| 1730 | 5d610ae1653a331687083239 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-27T09:45:14.071Z | What sports will look like in the future | Computer Technology | blog |
| 1731 | 5d610ae1653a331687083239 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-27T09:45:14.071Z | What sports will look like in the future | Robotics | blog |
| 1732 | 5d610ae1653a331687083239 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-27T09:45:14.071Z | What sports will look like in the future | Data Science | blog |
| 1733 | 5d610ae1653a331687083239 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-27T09:45:14.071Z | What sports will look like in the future | Information Technology | blog |
| 1734 | 5d610ae1653a331687083239 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-27T09:45:14.071Z | What sports will look like in the future | Artificial Intelligence | blog |
| 1735 | 5e5855ced701ab08af792b51 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-26T21:39:16.764Z | What sports will look like in the future | Computer Technology | blog |
| 1736 | 5e5855ced701ab08af792b51 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-26T21:39:16.764Z | What sports will look like in the future | Robotics | blog |
| 1737 | 5e5855ced701ab08af792b51 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-26T21:39:16.764Z | What sports will look like in the future | Data Science | blog |
| 1738 | 5e5855ced701ab08af792b51 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-26T21:39:16.764Z | What sports will look like in the future | Information Technology | blog |
| 1739 | 5e5855ced701ab08af792b51 | 5e7bd922cfc8b713f5ac7da9 | 2020-03-26T21:39:16.764Z | What sports will look like in the future | Artificial Intelligence | blog |


In [41]:
main_df.describe()

| | user_id | post_id | timestamp | title | category | post_type |
|---|---|---|---|---|---|---|
| count | 1750 | 1750 | 1750 | 1750 | 1750 | 1750 |
| unique | 88 | 231 | 626 | 227 | 234 | 3 |
| top | 5d60098a653a331687083238 | 5ec7abfdec493f4a26558860 | 2020-05-22T20:11:47.721Z | FASHION ILLUSTRATION (OP ART) | Illustration | blog |
| freq | 261 | 63 | 7 | 63 | 78 | 846 |
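
Because every view is repeated once per category, the `freq` figures above count merged rows rather than distinct views. A hedged sketch on made-up rows of how the two counts differ, using `drop_duplicates` on the (user_id, post_id) pair to count distinct posts viewed per user:

```python
import pandas as pd

# Made-up merged rows: u1 viewed p1, which has two categories.
toy_main = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'post_id': ['p1', 'p1', 'p1'],
    'category': ['Visual Arts', 'Illustration', 'Visual Arts'],
})

# Rows per user, inflated by the number of categories per post.
rows_per_user = toy_main['user_id'].value_counts()

# Distinct posts viewed per user, after collapsing the category rows.
posts_per_user = (
    toy_main.drop_duplicates(['user_id', 'post_id'])['user_id'].value_counts()
)
print(rows_per_user['u1'], posts_per_user['u1'])
```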


**Now the merging is done and we have the actual dataframe to work on. Let's move to the next step, which is collaborative filtering.**

# **Collaborative Filtering**