# <font color='darkcyan'> Recommendation System for Connecting Like-Minded Users </font> 


We use a content-based approach to build a recommendation system that suggests other users with whom a given user can collaborate.

Recommendations are based on each user's profile, comments, and liked posts.

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import sqlite3

import warnings
warnings.filterwarnings("ignore")

# For Natural Language Processing
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer

# For scaling the data
from sklearn.preprocessing import StandardScaler

# For calculating cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

---


### <font color='darkcyan'> Selecting features for building the recommendation system </font> 


The features listed below are read from the `user_info` table of the SQLite database `user.db`.

| Column | Remarks                                            |
|:-------|:---------------------------------------------------|
|user_id|Unique identifier for the users|
|user_name|Name of the user|
|profile_interest|Free-text (non-numeric) column describing the user's profile and interests|
|visual_art|Dummy variable: 1 if the user has participated in this category, 0 otherwise|
|cooking|Dummy variable: 1 if the user has participated in this category, 0 otherwise|
|music|Dummy variable: 1 if the user has participated in this category, 0 otherwise|
|poetry|Dummy variable: 1 if the user has participated in this category, 0 otherwise|
|artsandcraft|Dummy variable: 1 if the user has participated in this category, 0 otherwise|
|post_shared|The number of posts shared by a user. This is a numeric column.|
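
To make the schema above concrete, here is a minimal sketch that builds a table with the same columns in an in-memory SQLite database and reads it back with pandas. The three sample rows and the names `sample_users`, `demo_conn` and `demo_df` are hypothetical and exist only for illustration; the real data is assumed to live in `user.db`.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the real rows come from the application's database.
sample_users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "user_name": ["alice", "bob", "carol"],
    "profile_interest": [
        "Watercolour painting and sketching landscapes",
        "Baking sourdough bread and playing guitar",
        "Writing poetry and painting portraits",
    ],
    "visual_art": [1, 0, 1],
    "cooking": [0, 1, 0],
    "music": [0, 1, 0],
    "poetry": [0, 0, 1],
    "artsandcraft": [1, 0, 0],
    "post_shared": [12, 5, 8],
})

# An in-memory database keeps the sketch from touching any file on disk.
demo_conn = sqlite3.connect(":memory:")
sample_users.to_sql("user_info", demo_conn, index=False)

# Read the table back the same way the notebook reads `user_info`.
demo_df = pd.read_sql("SELECT * FROM user_info", demo_conn)
print(demo_df.dtypes)
```

Separate `demo_` names are used so the sketch does not overwrite the notebook's own `conn` and `user_df`.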

In [None]:
conn = sqlite3.connect('user.db')

myquery = "SELECT * FROM user_info"
            
user_df = pd.read_sql(myquery, conn)

---
### <font color='darkcyan'> Extracting features from the interest column by using Natural Language Processing (NLP) </font> 



In [None]:
import string

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# A set gives fast membership tests
ENGLISH_STOP_WORDS = set(stopwords.words('english'))

# Extra words to exclude. The original list is not shown in this notebook,
# so an empty set is assumed here; add domain-specific words as needed.
REMOVE_WORDS = set()

stemmer = nltk.stem.PorterStemmer()

# Defining a custom function for tokenizing
def my_tokenizer(sentence):

    # Replace newlines with spaces so adjacent words are not merged, and set to lower case
    sentence = sentence.replace('\n', ' ').lower()

    # Remove numbers
    for digit in string.digits:
        sentence = sentence.replace(digit, '')

    # Remove punctuation
    for punctuation_mark in string.punctuation:
        sentence = sentence.replace(punctuation_mark, '')

    # Split on any whitespace; this also discards empty tokens
    listofwords = sentence.split()
    listofstemmed_words = []

    # Remove stopwords and excluded words, then stem what remains
    for word in listofwords:
        if word not in ENGLISH_STOP_WORDS and word not in REMOVE_WORDS:
            stemmed_word = stemmer.stem(word)
            listofstemmed_words.append(stemmed_word)

    return listofstemmed_words

In [None]:
# Building a basic tf-idf matrix using the tokenizer function created above

from sklearn.feature_extraction.text import TfidfVectorizer

# Instantiate the Vectorizer
tfidf = TfidfVectorizer(tokenizer=my_tokenizer)

# Fit the Vectorizer to the profile text of all users
tfidf.fit(user_df['profile_interest'])

# Transform the profile text into a sparse tf-idf matrix (one row per user)
reviews_tfidf = tfidf.transform(user_df['profile_interest'])

In [None]:
# Converting the tf-idf matrix into a dataframe with one column per stemmed term

tfidf_result = reviews_tfidf.toarray()
# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2
tfidf_df = pd.DataFrame(tfidf_result, columns=tfidf.get_feature_names_out())

# Combining the tf-idf features with the original user columns
combined_df = pd.concat([tfidf_df, user_df], axis=1)

# Dropping the identifier and non-numerical columns to compute the cosine similarity
final_df = combined_df.drop(['user_id', 'user_name', 'profile_interest'], axis=1)


We use the cosine similarity metric to measure how similar users are and, ultimately, to make recommendations. If we think of each user's features as a vector in a multi-dimensional space, this metric captures the orientation of the vectors rather than the distance between them. Mathematically, it is the cosine of the angle between the two vectors.

We will convert the similarity array into a dataframe whose index and columns are the user names, so that each cell holds the cosine similarity between the row user and the column user. The diagonal values, which represent each user's similarity with themselves, will be 1. Because all the features used here are non-negative, every value will lie between 0 (no similarity) and 1 (identical orientation).
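
As a small illustration of the metric described above, the sketch below computes the cosine similarity by hand as the dot product divided by the product of the vector lengths, and compares it with scikit-learn's `cosine_similarity`. The helper `manual_cosine` and the vectors `a`, `b` and `c` are hypothetical and are not part of the recommendation pipeline.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def manual_cosine(a, b):
    """Cosine of the angle between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 0.0, 2.0])
b = np.array([2.0, 0.0, 4.0])  # same direction as a, twice the length
c = np.array([0.0, 3.0, 0.0])  # shares no non-zero feature with a

# Same orientation gives a similarity of (approximately) 1, regardless of length
print(manual_cosine(a, b))

# No shared features gives a similarity of 0
print(manual_cosine(a, c))

# scikit-learn computes all pairwise similarities at once; rows are vectors
demo_matrix = cosine_similarity(np.vstack([a, b, c]))
print(demo_matrix.round(3))
```

Vector `b` is twice as long as `a` yet is treated as identical to it, which is what "orientation rather than distance" means in practice.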

In [None]:
# Calculating the pairwise cosine similarity between the users
similarity_score = cosine_similarity(final_df, final_df)

# Converting the above array to a dataframe indexed by user name
sim = pd.DataFrame(similarity_score, columns=user_df['user_name'], index=user_df['user_name'])


# Recommendation Function

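def user_recommendations(user_name):

    # `sim` is indexed by user name (assumed unique), so the user is looked up by name.
    # Drop the user's own entry (similarity 1) so they are not recommended to themselves.
    scores = sim[user_name].drop(user_name)

    # Making a dataframe to hold the list of recommendations sorted by cosine similarity in descending order
    recommended_users = pd.DataFrame(list(scores.sort_values(ascending=False).index))

    recommended_users.columns = ['Recommended Users']

    return recommended_users.head(3)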
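
The lookup step can be tried on a toy example. The sketch below is a self-contained illustration and not the notebook's own data: the four users, their feature rows, and the helper `top_similar_users` are hypothetical. It builds a small similarity dataframe and returns the most similar users for one name, leaving out the user themselves.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature rows (for example, category dummies) for four users.
toy_features = pd.DataFrame(
    [[1, 0, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 0],
     [1, 1, 0, 0]],
    index=["alice", "bob", "carol", "dave"],
)

# Pairwise similarity, labelled by user name on both axes.
toy_sim = pd.DataFrame(
    cosine_similarity(toy_features),
    index=toy_features.index,
    columns=toy_features.index,
)


def top_similar_users(sim_df, user_name, n=3):
    """Return the n users most similar to user_name, excluding the user."""
    scores = sim_df[user_name].drop(user_name)
    return scores.sort_values(ascending=False).head(n)


toy_result = top_similar_users(toy_sim, "alice", n=2)
print(toy_result)
```

Here `bob` shares two of alice's features and `dave` shares one, so they rank first and second, while `carol` shares none and is left out of the top two.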
        