## Project Description

This project uses patient ratings from a dataset of drugs and medical conditions to generate treatment suggestions.

Consider a practical scenario in which multiple medical practitioners have treated patients with different medical conditions using the most suitable drugs available. For every prescribed drug, the patients are diagnosed and then given a treatment plan. Together, these records form our experiences.

The purpose of the recommendation system is to find patterns in the information patients provide during diagnosis, and then suggest the treatment plan that most closely matches the patterns it has identified.

Toward the end of this article, we go deeper into how these recommendations work and how to find, for any treatment, the five closest suggestions.


## Definition

A recommendation system suggests items to a user, or predicts the user's behaviour, by comparing patterns in their past behaviour with those of other users.

In simple terms, it is a filtering engine that picks the most relevant information for a specific user from all the available information. It is widely used by online services such as Amazon, Flipkart, YouTube, and Netflix, and in personalized products such as Alexa and Google Home Mini.

In the medical industry, suggestions must be as accurate as possible, so a recommendation system must also take past experiences into account. It should use every available piece of information about previous treatments.

Here, the recommendation system uses information such as patients' medical conditions and how each drug affected each patient. It then compares these patterns with each new treatment to find the most similar ones.

## Concepts and Technology

To design the recommendation system, we need a few concepts, which are listed below.

1. Concepts: pattern recognition, correlation, cosine similarity, and vector norms (L1, L2, L-infinity)

2. Language and libraries: Python, with NumPy, pandas, SciPy, and scikit-learn (sklearn)
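The distance ideas listed above fit in a few lines of NumPy. The sketch below uses two made-up rating vectors; the values are hypothetical and chosen only for illustration.

- The L1, L2, and L-infinity norms of the vectors' difference correspond to scikit-learn's `'l1'`/`'manhattan'`/`'cityblock'`, `'l2'`/`'euclidean'`, and `'chebyshev'` distance metrics.
- Cosine distance is one minus the cosine similarity.

```python
import numpy as np

# Two hypothetical rating vectors, e.g. two drugs' ratings across three conditions
a = np.array([10.0, 8.0, 0.0])
b = np.array([5.0, 4.0, 3.0])

diff = a - b
l1 = np.linalg.norm(diff, ord=1)         # sum of absolute differences (Manhattan / cityblock)
l2 = np.linalg.norm(diff)                # default 2-norm: Euclidean distance
linf = np.linalg.norm(diff, ord=np.inf)  # largest absolute difference (Chebyshev)

# Cosine similarity: dot product divided by the product of the two L2 norms
cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# Cosine distance, as reported by metric='cosine' in scikit-learn
cos_dist = 1 - cos_sim

print(l1, l2, linf, cos_sim, cos_dist)
```

Cosine similarity depends only on the angle between the vectors, not on their lengths. That property matters when comparing rating patterns.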

For prototype development, SciPy and scikit-learn already implement the algorithms we need. All we need is a little Python to call their functions.

## Different Approaches for Recommendation Systems

Below I have listed a few filtering approaches and examples:
<ol>
<li>Collaborative filtering: It is based on the reviews or responses (e.g., ratings) of many users. Here, an item is suggested because users with similar tastes rated it highly. E.g., movie or mobile suggestions.</li>
<li>Content-based filtering: It is based on each user's own past activity and the attributes of the items they liked. Here, the suggestion is an item similar to those the user preferred before. E.g., food suggestions.</li>
<li>Popularity-based filtering: It is based on the popularity of items among all users, so every user receives the same suggestions. E.g., trending YouTube videos.</li>
</ol>    
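Of the three approaches, popularity-based filtering is the simplest to show in code. The sketch below ranks items in a small hypothetical ratings table using pandas. The minimum-count threshold is an assumption, added so that items rated only once do not dominate the ranking.

```python
import pandas as pd

# Hypothetical ratings: one row per (user, item, rating)
ratings = pd.DataFrame({
    'user': ['u1', 'u2', 'u3', 'u1', 'u2', 'u3', 'u1'],
    'item': ['A', 'A', 'A', 'B', 'B', 'C', 'D'],
    'rating': [9, 8, 10, 7, 6, 10, 10],
})

# Popularity-based filtering: score each item by how often and how highly everyone rated it
stats = ratings.groupby('item')['rating'].agg(['count', 'mean'])
min_ratings = 2  # assumed threshold: ignore items with too few ratings
popular = (stats[stats['count'] >= min_ratings]
           .sort_values(['mean', 'count'], ascending=False))
print(popular)
```

Every user receives the same ranked list here. That is what distinguishes popularity-based filtering from the collaborative and content-based approaches, which tailor suggestions to each user.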

Building on these filtering approaches, there are several types of recommender systems:
<ul>
<li>Multi-criteria recommender systems: Several criteria, such as age, gender, location, likes, and dislikes, are used to categorize users, and items are then suggested accordingly. E.g., suggesting apparel based on age and gender.</li>
<li>Risk-aware recommender systems: There is always uncertainty when users use Internet applications (websites or mobile apps). Any recommendation made over the Internet, such as an advertisement, must therefore take into account the risk of it being unwanted by the user. E.g., deciding whether to display an advertisement in an Internet application.</li>
<li>Mobile recommender systems: These provide location-based suggestions using the user's current or expected future location. E.g., commonly used in travel and tourism.</li>
<li>Hybrid recommender systems: These combine multiple recommendation approaches. E.g., suggesting hotels and restaurants based on user preferences and travel information.</li>
<li>Collaborative and content recommender systems: These combine the collaborative and content-based approaches. E.g., suggesting the highest-rated movies that match a user's preferences and watch history.</li>
</ul>    

## Practical Example with Implementation

In this example, we use a dataset of drugs prescribed for various medical conditions, along with the ratings patients gave them. The goal is to suggest the most suitable prescribed drugs for treatment, based on how patients rated the drugs across medical conditions.

<b>Sample Dataset: </b>

<i>The example uses the publicly available medical drug dataset from the Winter 2018 Kaggle University Club Hackathon, read from the file drugsComTest_raw.csv.</i>

Sample Code: 

We will do this in 5 steps:

1. Importing required libraries

2. Reading the drugsComTest_raw.csv file and creating a pivot matrix. Its rows are drugs and its columns are medical conditions. Each cell holds the drug's mean patient rating for that condition, or 0 where there is no rating.

3. Creating a KNN model using the NearestNeighbors class with the distance metric 'cosine' and the algorithm 'brute'. Other possible distance metrics include 'cityblock', 'euclidean', 'l1', 'l2', and 'manhattan'. Possible values for the algorithm are 'auto', 'ball_tree', 'kd_tree', and 'brute'. The tree-based algorithms do not support cosine distance, so 'brute' is used here.

4. Selecting one drug (a row of the pivot matrix) at random. Its ratings across all medical conditions form the sample for which we suggest 5 similar drugs for treatment.

5. Finding the 6 nearest neighbors of the sample by calling the kneighbors method of the KNN model fitted in step 3. The first neighbor is normally the sample drug itself, with a distance of 0. The next 5 neighbors are the suggested drugs.

```python
# Step 1: import the required libraries
import pandas as pd
import numpy as np

from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import LabelEncoder

# Used in step 2 to turn each condition name into an integer id
encoder = LabelEncoder()
```


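```python
# Step 2: read the data and build a drug x condition matrix of ratings
# Assumes drugsComTest_raw.csv is in the working directory; adjust the path if needed
df = pd.read_csv('drugsComTest_raw.csv').fillna('NA')
# Encode each condition name as an integer id
df['condition_id'] = encoder.fit_transform(df['condition'])
df_medical = df.filter(['drugName', 'condition', 'rating', 'condition_id'], axis=1)
# Rows: drugs; columns: condition ids; values: mean patient rating (0 where there is none)
df_medical_ratings_pivot = df_medical.pivot_table(index='drugName', columns='condition_id', values='rating').fillna(0)
# Most drug/condition pairs are empty, so store the matrix in sparse (CSR) form
df_medical_ratings_pivot_matrix = csr_matrix(df_medical_ratings_pivot.values)
```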




```python
# Step 3: fit a k-nearest-neighbors model on the drug rating vectors
# Other metrics include 'cityblock', 'euclidean', 'l1', 'l2' and 'manhattan'
# algorithm can be 'auto', 'ball_tree', 'kd_tree' or 'brute'; cosine distance needs 'brute'
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_medical_ratings_pivot_matrix)
```
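The choice of metric in step 3 can change which drug counts as the nearest neighbor. The sketch below compares metrics with scikit-learn's `pairwise_distances` on three hypothetical rating vectors:

- drug 1 has the same rating pattern as drug 0, at half the scale;
- drug 2 has ratings of similar size to drug 0, in a slightly different pattern.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical rating vectors for three drugs across three conditions
X = np.array([
    [10.0, 8.0, 0.0],  # drug 0: the sample
    [5.0, 4.0, 0.0],   # drug 1: same pattern as drug 0, half the ratings
    [10.0, 6.0, 3.0],  # drug 2: similar ratings, slightly different pattern
])

cos = pairwise_distances(X, metric='cosine')
euc = pairwise_distances(X, metric='euclidean')

# Row 0 holds each drug's distance from drug 0 under each metric
print(np.round(cos[0], 3))
print(np.round(euc[0], 3))
```

Cosine distance ignores the overall size of the ratings and ranks drug 1 closest to drug 0. Euclidean distance ranks drug 2 closest. Which behavior is preferable depends on the data; this is only an illustration.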




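```python
# Step 4: pick one drug (a row of the pivot table) at random as the sample
sample_index = np.random.choice(df_medical_ratings_pivot.shape[0])
# The drug's ratings across all conditions, reshaped to (1, n_conditions) for kneighbors
sample_condition = df_medical_ratings_pivot.iloc[sample_index, :].values.reshape(1, -1)
```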
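Step 4 picks a drug at random, so the result changes on every run. If a reproducible or specific sample is wanted, there are two options:

- use a seeded NumPy generator;
- look the drug up by name with pandas' `Index.get_loc`.

The sketch below uses a tiny hypothetical stand-in for `df_medical_ratings_pivot` so that it runs on its own. The drug names `DrugA` to `DrugC` are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_medical_ratings_pivot: rows are drugs, columns are condition ids
df_medical_ratings_pivot = pd.DataFrame(
    [[9.0, 0.0], [8.0, 1.0], [0.0, 7.0]],
    index=['DrugA', 'DrugB', 'DrugC'],
    columns=[0, 1],
)

# Option 1: a random pick that is reproducible because the generator is seeded
rng = np.random.default_rng(seed=42)
random_index = int(rng.integers(len(df_medical_ratings_pivot)))

# Option 2: a specific drug, looked up by name
named_index = df_medical_ratings_pivot.index.get_loc('DrugB')

sample_index = named_index  # use whichever selection suits the analysis
sample_condition = df_medical_ratings_pivot.iloc[sample_index, :].values.reshape(1, -1)
print(df_medical_ratings_pivot.index[sample_index], sample_condition)
```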




```python
# Step 5: find the 6 nearest neighbors of the sample drug
distances, indices = model_knn.kneighbors(sample_condition, n_neighbors=6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        # Neighbor 0 is normally the sample itself (distance 0), so print its name as the heading
        print('Recommendations for {0}:\n'.format(df_medical_ratings_pivot.index[sample_index]))
    else:
        # Neighbors 1-5 are the suggested drugs, closest first
        recommendation = df_medical_ratings_pivot.index[indices.flatten()[i]]
        distanceFromSample = distances.flatten()[i]
        print('{0}: {1}, with distance of {2}:'.format(i, recommendation, distanceFromSample))
```

```
Recommendations for Orphenadrine:

1: Amerge, with distance of 0.13907347178568452:
2: Goody's Extra-Strength Headache Powders, with distance of 0.13907347178568452:
3: Zolmitriptan, with distance of 0.13907347178568452:
4: Maxalt-MLT, with distance of 0.13907347178568452:
5: Imitrex Statdose, with distance of 0.13907347178568452:
```
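To try steps 2 to 5 without downloading the dataset, here is a minimal end-to-end sketch on a few hypothetical reviews. The drug names, conditions, and ratings are invented.

One design difference from the code above: it removes the sample drug by its index instead of assuming it is neighbor 0. In the output above, all five suggestions share the same distance. When drugs tie like this, their order is arbitrary, and a drug with exactly the same vector as the sample could appear before the sample itself.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical reviews with the same columns used above
df_medical = pd.DataFrame({
    'drugName': ['DrugA', 'DrugA', 'DrugB', 'DrugB', 'DrugC', 'DrugD', 'DrugD'],
    'condition': ['Migraine', 'Pain', 'Migraine', 'Pain', 'Acne', 'Migraine', 'Acne'],
    'rating': [9, 7, 8, 6, 10, 2, 9],
})

# Step 2: drugs x conditions matrix of mean ratings, 0 where a drug has no rating
pivot = df_medical.pivot_table(index='drugName', columns='condition', values='rating').fillna(0)
matrix = csr_matrix(pivot.values)

# Step 3: brute-force k-NN with cosine distance
model_knn = NearestNeighbors(metric='cosine', algorithm='brute').fit(matrix)

# Steps 4-5: neighbors of one drug, excluding the drug itself by index rather than by position
sample_index = pivot.index.get_loc('DrugA')
k = 2
distances, indices = model_knn.kneighbors(matrix[sample_index], n_neighbors=k + 1)
recommendations = [
    (pivot.index[j], d)
    for j, d in zip(indices.flatten(), distances.flatten())
    if j != sample_index
][:k]
for name, dist in recommendations:
    print(f'{name}: distance {dist:.3f}')
```

With `k = 2`, this lists DrugB first, because its ratings follow almost the same pattern as DrugA's. DrugD comes second; DrugC, rated only for a condition DrugA was never rated for, is furthest away.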
