# Ranking algorithms
These algorithms will be implemented as part of the demo for our legal API. They help filter lawyers and generate recommendations as needed. To filter by attribute, each lawyer (instance) is assigned tags (attributes), stored as a dataframe of binary labels. Candidate classical ML algorithms include Random Forests/Decision Trees and KNN. All of these are supervised and need a label when used for classification, although a plain nearest-neighbour search can find similar lawyers without one. Other options that Adam considers inappropriate but that could still be explored are SVM and logistic regression. In Adam's opinion, more advanced deep learning techniques such as neural networks should be applied to the chatbot instead, because he sees them as suited to applications that replicate or simulate human behaviour.
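
As a rough illustration of the binary-label dataframe, here is a minimal sketch with pandas. The lawyer names, the tag names and the `filter_by_tags` helper are all hypothetical, and the pipe-separated `tags` column is only an assumption about how tags might be stored.

```python
import pandas as pd

# Hypothetical input: one row per lawyer, tags stored as a pipe-separated string
lawyers = pd.DataFrame({
    "lawyer": ["A", "B", "C"],
    "tags": ["family|mediation", "criminal", "family|immigration"],
})

# One column per tag, 1 if the lawyer has the tag and 0 otherwise
tag_matrix = lawyers.set_index("lawyer")["tags"].str.get_dummies(sep="|")

def filter_by_tags(matrix, required):
    """Return the lawyers that carry every tag in `required` (hypothetical helper)."""
    mask = matrix[required].all(axis=1)
    return matrix.index[mask].tolist()

print(tag_matrix)
print(filter_by_tags(tag_matrix, ["family"]))
```

Keeping the tags as 0/1 columns means the same matrix can later be fed to a tree-based model or a nearest-neighbour search without further encoding.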

## EDA of initial state

## Groupby rank
* Ranking relies on sorting methods
* Requires a CSV (or Excel) file storing the data for each lawyer along with the metrics (ratings, number of consumer visits to the chosen lawyer's page, location, recency, other possible factors TBD)
* Stopgap for missing data: push the lawyer to the bottom of the filtered ranking (e.g. if the rating is unknown or not entered, the lawyer is ranked last). Shortcoming: this biases the results against lawyers with incomplete data.
* Imputation solutions TBD

In [1]:
import pandas as pd
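
The sorting approach described above could look roughly like the sketch below. The column names and sample values are made up for illustration; the real metrics file is still TBD. Missing ratings are pushed to the bottom rather than imputed, which is the stopgap noted in the list.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the lawyer metrics file
lawyers = pd.DataFrame({
    "lawyer": ["A", "B", "C", "D"],
    "location": ["Sydney", "Sydney", "Melbourne", "Sydney"],
    "rating": [4.5, np.nan, 3.9, 4.8],
    "visits": [120, 300, 45, 80],
})

# Overall ordering: best rating first, visits as tie-breaker, unknown ratings last
ranked = lawyers.sort_values(
    ["rating", "visits"], ascending=False, na_position="last"
)

# Rank within each location; lawyers with no rating take the bottom rank
ranked["rank_in_location"] = ranked.groupby("location")["rating"].rank(
    method="first", ascending=False, na_option="bottom"
)

print(ranked)
```

Lawyer B has the most visits but no rating, so this ordering places B last, which shows the bias that a proper imputation strategy would need to address.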

## PageRank
* Fundamental algorithm that the founders of Google developed in the late 1990s to rank webpages by the links pointing to them. Today it is at most one of many ranking signals, in heavily modified form.
* Shortcomings and edge cases: more 'experienced' lawyers are likely to take the top rankings, while emerging lawyers may never get a chance to be viewed. Heavy bias towards incumbents.
* Easy to implement; weightings can be adjusted to account for poorly performing lawyers who are nonetheless viewed regularly

In [None]:
import networkx as nx
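
To make the bias concrete, here is a minimal PageRank sketch using power iteration in NumPy. networkx's `pagerank` function computes the same kind of score, but the version below has no dependency beyond NumPy. The graph is an assumption: each edge is a hypothetical "client viewed lawyer X, then moved on to lawyer Y" transition with a made-up weight, and the lawyer names are placeholders.

```python
import numpy as np

def pagerank(edges, nodes, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration over (source, target, weight) edges."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    weights = np.zeros((n, n))
    for src, dst, w in edges:
        weights[index[dst], index[src]] += w  # column = source, row = target

    out_weight = weights.sum(axis=0)
    dangling = out_weight == 0
    # Normalise each column to sum to 1; nodes with no outgoing edges
    # spread their score uniformly over all nodes
    transition = np.where(
        dangling, 1.0 / n, weights / np.where(dangling, 1.0, out_weight)
    )

    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = damping * transition @ scores + (1 - damping) / n
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return dict(zip(nodes, scores / scores.sum()))

# Hypothetical transitions between lawyer pages
nodes = ["ana", "ben", "cy", "dee"]
edges = [
    ("ben", "ana", 3), ("cy", "ana", 2), ("dee", "ana", 1),
    ("ana", "ben", 1), ("dee", "cy", 1),
]
scores = pagerank(edges, nodes)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

In this toy graph nobody links to `dee`, so `dee` only ever receives the baseline score from the damping term. That is the emerging-lawyer problem in miniature; adjusting edge weights or the damping factor is one place to counteract it.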

## Collaborative filtering (AdaKNN)

* Filters based on similar users' choices (a recommendation system implemented by streaming services)
* Factors that can be accounted for: rating (explicit reactions), time spent on lawyer's page (implicit reaction)
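* To use this technique, how do we address biases during the initialisation phase, when there is little or no interaction data to learn from (the cold-start problem)?
* Normalisation techniques?
* Issue: clients don't usually browse and rate lawyers the way they do films and restaurants, so interaction data is likely to be sparse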

### Procedure
* Find a set of users whose behaviour is similar to the target user's
* Use cosine similarity to find the neighbourhood of similar users
* To remove 'tough critic' biases, subtract the mean of a user's vector from each component.
* Measure the accuracy of the predicted ratings with RMSE (root mean squared error) against ratings that were held out

For more details on the technical procedure: https://realpython.com/build-recommendation-engine-collaborative-filtering/#:~:text=Collaborative%20filtering%20is%20a%20technique,similar%20to%20a%20particular%20user.
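
The procedure above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not a tuned recommender: the ratings matrix, the held-out rating of 4.5 and the helper names (`mean_centre`, `cosine_similarity`, `predict`, `rmse`) are all hypothetical, and unrated items are treated as 0 after mean-centring, which is a common shortcut.

```python
import numpy as np

# Rows = users, columns = lawyers, NaN = not rated (hypothetical data)
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],  # target user, has not rated lawyer 2
    [4.0, 3.0, 4.0, 0.5],     # similar taste, but a tougher critic
    [1.0, 2.0, 1.5, 5.0],     # opposite taste
])

def mean_centre(r):
    """Subtract each user's mean rating; unrated entries become 0."""
    means = np.nanmean(r, axis=1, keepdims=True)
    centred = np.where(np.isnan(r), 0.0, r - means)
    return centred, means.ravel()

def cosine_similarity(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def predict(r, user, item, k=1):
    """Predict a rating from the k most similar users who rated the item."""
    centred, means = mean_centre(r)
    candidates = [
        (cosine_similarity(centred[user], centred[other]), other)
        for other in range(len(r))
        if other != user and not np.isnan(r[other, item])
    ]
    top = sorted(candidates, reverse=True)[:k]
    weight = sum(abs(sim) for sim, _ in top)
    if weight == 0:
        return float(means[user])
    offset = sum(sim * centred[other, item] for sim, other in top) / weight
    return float(means[user] + offset)

def rmse(predicted, actual):
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

predicted = predict(ratings, user=0, item=2)
print(f"Predicted rating: {predicted:.3f}")
# 4.5 stands in for a held-out rating that the target user actually gave
print(f"RMSE against held-out rating: {rmse([predicted], [4.5]):.3f}")
```

Mean-centring is what lets the second user count as similar to the target even though their raw ratings are consistently lower.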

In [1]:
from scipy import spatial
# Demonstration on a 0-5 rating scale: each tuple is one rating vector
a = (2.5, 4, 4.5, 2.1)
b = (3, 4.7, 4.3, 2.4)
c = (3.4, 4.2, 3.9, 2.5)
# Euclidean distance: the smallest distance indicates the most similar pair
print(f'Distance between a and b: {spatial.distance.euclidean(a, b)}')
print(f'Distance between b and c: {spatial.distance.euclidean(c, b)}')
print(f'Distance between a and c: {spatial.distance.euclidean(a, c)}')
# Cosine distance (1 - cosine similarity) compares direction rather than magnitude.
# It is not the angle itself; values near 0 mean the vectors point the same way.
print(f'Cosine distance between a and b: {spatial.distance.cosine(a, b)}')
print(f'Cosine distance between c and b: {spatial.distance.cosine(c, b)}')
print(f'Cosine distance between a and c: {spatial.distance.cosine(a, c)}')

Distance between a and b: 0.9327379053088816
Distance between b and c: 0.7615773105863908
Distance between a and c: 1.1704699910719625
Cosine distance between a and b: 0.00512161046424553
Cosine distance between c and b: 0.0045009587866460254
Cosine distance between a and c: 0.013312179972831295


## AES explanation
* RSA, published in 1977, is a public-key (asymmetric) scheme for protecting messages sent from one party to another: anyone can encrypt with the recipient's public key, but only the holder of the matching private key can decrypt. It is slow and can only encrypt small amounts of data directly, so it is mostly used to exchange keys or sign messages rather than to encrypt the messages themselves. These are encryption keys, not hashes.
* The messages themselves are usually encrypted with a symmetric cipher, where both parties share one secret key. The current standard is AES (Advanced Encryption Standard), adopted by NIST in 2001. It is widely used, for example in WPA2 Wi-Fi and commonly in the TLS connections that protect traffic to services such as Discord, so messages are not readable without the key unless some serious breach occurs. We use 256-bit keys, the largest AES key size, which also runs the most rounds of processing (14, versus 10 for 128-bit keys).