<h1>A simple content-based recommender system</h1><br>
<h2>Boston housing dataset</h2>

Attribute Information:

    1. CRIM      per capita crime rate by town
    2. ZN        proportion of residential land zoned for lots over 
                 25,000 sq.ft.
    3. INDUS     proportion of non-retail business acres per town
    4. CHAS      Charles River dummy variable (= 1 if tract bounds 
                 river; 0 otherwise)
    5. NOX       nitric oxides concentration (parts per 10 million)
    6. RM        average number of rooms per dwelling
    7. AGE       proportion of owner-occupied units built prior to 1940
    8. DIS       weighted distances to five Boston employment centres
    9. RAD       index of accessibility to radial highways
    10. TAX      full-value property-tax rate per USD 10,000 
    11. PTRATIO  pupil-teacher ratio by town
    12. B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks 
                 by town
    13. LSTAT    % lower status of the population
    14. MEDV     Median value of owner-occupied homes in USD 1000's = Target

In [None]:
from sklearn import datasets
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from sklearn.preprocessing import normalize

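# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release.
boston = datasets.load_boston()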

Let's see what the data looks like

In [110]:
print("Feature Names:\n", boston.feature_names)
print("Feature 0 :\n", boston.data[0])
print("Median value of owner-occupied homes in USD 1000's = Target [0]\nUSD. ", boston.target[0]*1000)

Feature Names:
 ['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
 'B' 'LSTAT']
Feature 0 :
 [  6.32000000e-03   1.80000000e+01   2.31000000e+00   0.00000000e+00
   5.38000000e-01   6.57500000e+00   6.52000000e+01   4.09000000e+00
   1.00000000e+00   2.96000000e+02   1.53000000e+01   3.96900000e+02
   4.98000000e+00]
Median value of owner-occupied homes in USD 1000's = Target [0]
USD.  24000.0


<h1>Let's define our recommendation algorithm</h1>

In [125]:
def recommend_by_features(data, targets, sample_id, top_n):
    '''A minimal example of content-based recommendation'''

    # Initialize an array holding one similarity score per sample
    similarity_scores = np.zeros(len(data))

    # Calculate the cosine similarity between the chosen sample and every sample,
    # including the last row; cosine_similarity returns a 1x1 array, so take [0, 0]
    for row in range(len(data)):
        similarity_scores[row] = cosine_similarity([data[sample_id,:]],[data[row,:]])[0, 0]

    # Sort once, most similar first; position 0 is normally the sample itself, so skip it
    ranked = np.argsort(-similarity_scores)[1:top_n+1]
    # Select the features of the top n samples
    selected_features = data[ranked,:]
    # Select the target labels of the top n samples
    selected_targets = targets[ranked]

    # Print sorted similarities
    print("Closest similarity scores\n", similarity_scores[ranked])
    # Print sorted similarity indices
    print("Closest similarity indices\n", ranked,"\n\n")

    print("Recommendations: ")
    print("Median Value of similar houses\n", selected_targets*1000)
    #print("Features: ",selected_features)
    return
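
The row-by-row loop can also be written as a single matrix-vector product. The sketch below is one possible NumPy-only variant, not the notebook's original method: the function name `top_n_similar` and the toy matrix are made up for illustration, and it assumes no row is all zeros (a zero row would cause a division by zero).

```python
import numpy as np

def top_n_similar(data, sample_id, top_n):
    """Return (indices, scores) of the top_n rows most cosine-similar to data[sample_id]."""
    data = np.asarray(data, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    norms = np.linalg.norm(data, axis=1)
    scores = data @ data[sample_id] / (norms * norms[sample_id])
    # Exclude the query row explicitly instead of assuming it sorts first.
    scores[sample_id] = -np.inf
    order = np.argsort(-scores)[:top_n]
    return order, scores[order]

# Tiny made-up matrix (not the Boston data) to show the call.
toy = np.array([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [1.0, 1.0]])
indices, scores = top_n_similar(toy, sample_id=0, top_n=2)
print(indices, scores)
```

Masking the query row with `-np.inf` avoids relying on the sample ranking first, which can fail when another row is an exact duplicate of it.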

That's it! All we need to do is pick a sample ID and call our content-based recommender system. Our problem statement reads:<br> <b><em>Find me the median values for the top n houses similar to a given house</em></b>

In [126]:
sample_id = 43
top_n = 10
print("Median values of sample house:",boston.target[sample_id]*1000,"\n\n")
#find me the median values for top n houses similar to a given house
recommend_by_features(boston.data, boston.target, sample_id, top_n)

Median values of sample house: 24700.0 


Closest similarity scores
 [ 0.99991518  0.99990154  0.99856734  0.99835476  0.998345    0.99823438
  0.99821168  0.99789822  0.99751009  0.99742009]
Closest similarity indices
 [ 42  41 271  52  53  45  46 335 338 333] 


Recommendations: 
Median Value of similar houses
 [ 25300.  26600.  25200.  25000.  23400.  19300.  20000.  21100.  20600.
  22200.]
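
All ten scores above are larger than 0.997. A likely reason is that cosine similarity on raw features is dominated by the columns with the largest magnitudes (here TAX and B, which run into the hundreds), so most houses point in nearly the same direction. One common remedy, shown here as a sketch on made-up numbers rather than on the Boston data, is to standardize each column before computing similarities. The helper names `standardize` and `cosine` are hypothetical.

```python
import numpy as np

def standardize(data):
    """Scale every column to zero mean and unit variance (z-scores)."""
    data = np.asarray(data, dtype=float)
    std = data.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled rather than divide by zero
    return (data - data.mean(axis=0)) / std

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up rows: column 0 is on a much larger scale than column 1.
toy = np.array([[300.0, 1.0], [310.0, 9.0], [700.0, 2.0]])
scaled = standardize(toy)
print("raw:   ", cosine(toy[0], toy[1]), cosine(toy[0], toy[2]))
print("scaled:", cosine(scaled[0], scaled[1]), cosine(scaled[0], scaled[2]))
```

On the raw rows both similarities sit very close to 1, while the standardized rows give clearly different values, so the ranking starts to reflect the smaller-scale column as well.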



<h2>Use-case 1:</h2><br>
<h3>Title: Based on the topics you selected, here are some of the people you should talk to!</h3><br>
Domain Knowledge: People who share common topics are likely to be similar and hence are likely to form friendship bonds. <br>
Objective: Recommend friends for a given member. <em>(User based filtering, User-User recommendation)</em><br>
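
A minimal sketch of this use case, assuming each member is described by a binary vector of selected topics. The member names, the topic matrix, and the function `recommend_friends` are all hypothetical.

```python
import numpy as np

# Hypothetical data: rows are members, columns are topics (1 = member selected the topic).
members = ["ana", "ben", "chen", "dev"]
member_topics = np.array([
    [1, 1, 0, 0],   # ana
    [1, 1, 1, 0],   # ben
    [0, 0, 1, 1],   # chen
    [1, 0, 0, 0],   # dev
], dtype=float)

def recommend_friends(member_topics, member_id, top_n):
    """Rank the other members by cosine similarity of their topic vectors."""
    norms = np.linalg.norm(member_topics, axis=1)
    scores = member_topics @ member_topics[member_id] / (norms * norms[member_id])
    scores[member_id] = -np.inf  # never recommend a member to themselves
    return np.argsort(-scores)[:top_n]

friend_ids = recommend_friends(member_topics, member_id=0, top_n=2)
print([members[i] for i in friend_ids])
```

This is the same computation as the housing example: only the meaning of the rows (members instead of houses) and columns (topics instead of housing attributes) changes.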


<h2>Use-case 2:</h2><br>
<h3>Title:?</h3><br>
Domain Knowledge: People who share common topics of interest with most members of a group they are not a part of might be interested in that group.<br>
Objective: Recommend new groups for a member. <em>(User based filtering, User-Item recommendation)</em><br>
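
One possible reading of this use case, sketched on hypothetical data: score each group by the mean topic similarity between the member and that group's current members, then skip groups the member has already joined. Using the mean is an assumption; "most members" could equally be modelled with a threshold and a vote.

```python
import numpy as np

# Hypothetical data: member-by-topic and member-by-group membership matrices.
member_topics = np.array([
    [1, 1, 0, 0],   # member 0
    [1, 1, 1, 0],   # member 1
    [0, 0, 1, 1],   # member 2
    [1, 0, 0, 0],   # member 3
], dtype=float)
membership = np.array([
    [1, 0, 0],      # member 0 is in group 0
    [0, 1, 0],      # member 1 is in group 1
    [0, 0, 1],      # member 2 is in group 2
    [0, 1, 0],      # member 3 is in group 1
], dtype=bool)

def recommend_groups_by_members(member_topics, membership, member_id, top_n):
    """Score each group by the mean similarity between the member and the group's members."""
    unit = member_topics / np.linalg.norm(member_topics, axis=1, keepdims=True)
    sims = unit @ unit[member_id]                     # similarity to every member
    counts = membership.sum(axis=0)                   # number of members per group
    group_scores = (sims @ membership.astype(float)) / np.maximum(counts, 1)
    group_scores[membership[member_id]] = -np.inf     # skip groups already joined
    return np.argsort(-group_scores)[:top_n]

suggested = recommend_groups_by_members(member_topics, membership, member_id=0, top_n=2)
print(suggested)
```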


<h2>Use-case 3:</h2><br>
<h3>Title:?</h3><br>
Domain Knowledge: Groups that a member is not a part of, but that share common topics with the groups the member already belongs to, might be of interest to the member.<br>
Objective: Recommend new groups for a member. <em>(Item based filtering, User-Item recommendation)</em><br>
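
A sketch of the item-based variant, again on hypothetical data: each group is described by its own topic vector, and candidate groups are ranked by their best similarity to any group the member has already joined. Taking the maximum rather than, say, the mean is an assumption made for illustration.

```python
import numpy as np

# Hypothetical data: group-by-topic matrix and the groups one member already belongs to.
group_topics = np.array([
    [1, 1, 0, 0],   # group 0
    [1, 1, 1, 0],   # group 1
    [0, 0, 1, 1],   # group 2
    [0, 1, 0, 0],   # group 3
], dtype=float)
joined = [0]        # this member is in group 0 only

def recommend_groups_by_groups(group_topics, joined, top_n):
    """Rank groups by their highest cosine similarity to any already-joined group."""
    unit = group_topics / np.linalg.norm(group_topics, axis=1, keepdims=True)
    sims = unit @ unit.T                    # group-to-group cosine similarities
    scores = sims[:, joined].max(axis=1)    # best match against any joined group
    scores[joined] = -np.inf                # skip groups already joined
    return np.argsort(-scores)[:top_n]

candidates = recommend_groups_by_groups(group_topics, joined, top_n=2)
print(candidates)
```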


<h2>Question: What is common here?</h2><br>
Answer: Features are known, well defined, and densely packed.
