# Assignment

1) Build a provider recommendation system based on a K-Nearest Neighbours model.

Since patient profiles differ markedly between specialties, build a separate recommendation model for each specialty.
Use the sklearn.neighbors.KNeighborsClassifier model. The target variable is id_prestador.
In line with the project's objective, the model's training and validation dataset consists of the providers with high ratings and a medium or high level of demand. High ratings ensure that the recommended providers deliver good service. Medium or high demand ensures that each provider has enough ratings for them to be statistically representative.
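The selection rule and the per-specialty models described above can be sketched as follows. This is only an illustration under stated assumptions: the thresholds `MIN_MEAN_RATING` and `MIN_N_RATINGS`, the column names (`especialidad`, `id_prestador`, `calificacion`, `edad`, `id_zona`) and the toy data are hypothetical and are not taken from the project's dataset.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical cut-offs: the assignment does not state the actual values.
MIN_MEAN_RATING = 0.7  # "high rating" on a 0-1 scaled score
MIN_N_RATINGS = 3      # "medium/high demand", measured as number of ratings


def select_eligible(df):
    """Keep only rows of providers with a high mean rating and enough ratings."""
    stats = df.groupby("id_prestador")["calificacion"].agg(["mean", "count"])
    keep = stats[(stats["mean"] >= MIN_MEAN_RATING)
                 & (stats["count"] >= MIN_N_RATINGS)].index
    return df[df["id_prestador"].isin(keep)]


def fit_per_specialty(df, features, k=3):
    """Fit one KNeighborsClassifier per specialty; the target is id_prestador."""
    models = {}
    for specialty, group in df.groupby("especialidad"):
        # n_neighbors cannot exceed the number of training rows.
        knn = KNeighborsClassifier(n_neighbors=min(k, len(group)))
        models[specialty] = knn.fit(group[features], group["id_prestador"])
    return models


# Toy data (invented for illustration only).
toy = pd.DataFrame({
    "especialidad": ["PEDIATRIA"] * 7 + ["CLINICA MEDICA"] * 4,
    "id_prestador": [1, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5],
    "calificacion": [0.9, 0.8, 1.0, 0.9, 0.9, 0.8, 1.0, 0.8, 0.9, 0.7, 0.2],
    "edad": [0.05, 0.06, 0.04, 0.15, 0.16, 0.14, 0.10, 0.50, 0.60, 0.55, 0.90],
    "id_zona": [1, 1, 1, 2, 2, 2, 1, 3, 3, 3, 3],
})
features = ["edad", "id_zona"]

eligible = select_eligible(toy)
models = fit_per_specialty(eligible, features)

# Recommend a provider for a new paediatric patient profile.
new_patient = pd.DataFrame([[0.05, 1]], columns=features)
prediction = models["PEDIATRIA"].predict(new_patient)
```

In this toy example providers 3 and 5 are dropped (too few ratings, or a low mean rating), and each specialty gets its own classifier. In a real run the features would need consistent scaling, and a categorical code such as `id_zona` would normally be encoded rather than used as a number.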


# Library imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pandas_profiling
import seaborn as sns
import ptitprince as pt
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV ,  train_test_split
from sklearn.metrics import confusion_matrix , classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Loading the datasets

In [2]:
df_procesado=pd.read_csv('../Data/df_procesado.csv')
df_procesado.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 31 columns):
 #   Column                                                                             Non-Null Count  Dtype  
---  ------                                                                             --------------  -----  
 0   id_consumo_encoded                                                                 2500 non-null   int64  
 1   id_socio_encoded                                                                   2500 non-null   int64  
 2   id_prestador_encoded                                                               2500 non-null   int64  
 3   calificacion_experiencia_encoded                                                   2500 non-null   float64
 4   id_zona_encoded                                                                    2500 non-null   int64  
 5   edad_encoded                                                                       2500 non-null   float

In [3]:
df_procesado.head(2)

Unnamed: 0,id_consumo_encoded,id_socio_encoded,id_prestador_encoded,calificacion_experiencia_encoded,id_zona_encoded,edad_encoded,antiguedad_encoded,sexo_encoded_F,sexo_encoded_M,plan_encoded_a,...,especialidad_prestador_encoded_CLINICA MEDICA,especialidad_prestador_encoded_NUTRICIONISTAS(LIC.),especialidad_prestador_encoded_PEDIATRIA,edad_cat_pediatrico,categoria_prestador_gold,categoria_prestador_silver,categoria_prestador_standard,categoria_socio_gold,categoria_socio_silver,categoria_socio_standard
0,1,100000,200000,0.666667,1,0.952941,0.530769,1,0,1,...,1,0,0,1,1,0,0,0,0,1
1,2,100000,200000,0.555556,1,0.952941,0.530769,1,0,1,...,1,0,0,1,1,0,0,0,0,1


## Specialty: Clínica Médica

We keep only the providers in the standard category to simplify the analysis.

In [4]:
columns = ['id_socio_encoded', 'id_prestador_encoded', 'calificacion_experiencia_encoded']
# Each comparison gets its own parentheses: `&` binds tighter than `==`.
df_filtrado = df_procesado[(df_procesado['especialidad_prestador_encoded_CLINICA MEDICA'] == 1) &
                           (df_procesado['categoria_prestador_standard'] == 1)]
df_filtrado = df_filtrado[columns]

In [5]:
df_filtrado

Unnamed: 0,id_socio_encoded,id_prestador_encoded,calificacion_experiencia_encoded
0,100000,200000,0.666667
1,100000,200000,0.555556
2,100002,200000,0.666667
3,100010,200000,0.666667
4,100017,200000,0.666667
...,...,...,...
2490,101482,200191,0.111111
2491,101698,200191,0.000000
2492,101698,200191,0.111111
2493,101861,200191,0.111111


The same member can have several ratings for the same provider, so we take the average of all of them.

In [8]:
df_filtrado[(df_filtrado.id_socio_encoded==100000) & (df_filtrado.id_prestador_encoded==200000)]

Unnamed: 0,id_socio_encoded,id_prestador_encoded,calificacion_experiencia_encoded
0,100000,200000,0.666667
1,100000,200000,0.555556


In [11]:
df_sr=df_filtrado.groupby(['id_socio_encoded','id_prestador_encoded'])['calificacion_experiencia_encoded'].mean().reset_index()
df_sr.head()

Unnamed: 0,id_socio_encoded,id_prestador_encoded,calificacion_experiencia_encoded
0,100000,200000,0.611111
1,100001,200001,0.777778
2,100002,200000,0.666667
3,100003,200002,0.888889
4,100004,200003,0.222222


In [12]:
df_pivot = df_sr.pivot_table(columns='id_socio_encoded', index='id_prestador_encoded', values="calificacion_experiencia_encoded")
df_pivot.fillna(0, inplace=True)

In [13]:
df_pivot

id_socio_encoded,100000,100001,100002,100003,100004,100006,100007,100008,100010,100011,...,101882,101884,101885,101893,101895,101902,101906,101909,101910,101919
id_prestador_encoded
200000,0.611111,0.000000,0.666667,0.000000,0.000000,0.000000,0.000000,0.0,0.666667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200001,0.000000,0.777778,0.000000,0.000000,0.000000,0.888889,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200002,0.000000,0.000000,0.000000,0.888889,0.000000,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200003,0.000000,0.000000,0.000000,0.000000,0.222222,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200005,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.666667,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200185,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200187,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200188,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
200191,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
from sklearn.neighbors import NearestNeighbors
model = NearestNeighbors(algorithm='brute')
model.fit(df_pivot)

NearestNeighbors(algorithm='brute')
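With its default settings, `NearestNeighbors` ranks neighbours by Euclidean (Minkowski, p=2) distance. On a rating matrix where missing ratings are filled with 0, a commonly used alternative is cosine distance, which compares the direction of the rating vectors rather than their magnitude. The sketch below contrasts the two on an invented toy matrix shaped like `df_pivot`. The names `toy_pivot`, `euclid_ids` and `cosine_ids` are illustrative assumptions, and the notebook itself uses the default metric.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy provider x member rating matrix (0 = no rating), mimicking df_pivot.
toy_pivot = pd.DataFrame(
    [[0.90, 0.80, 0.0, 0.0],   # provider 10
     [0.20, 0.15, 0.0, 0.0],   # provider 11: same members as 10, lower scores
     [0.00, 0.00, 0.7, 0.9],   # provider 12: disjoint set of members
     [0.90, 0.00, 0.0, 0.0]],  # provider 13: shares one member with 10
    index=pd.Index([10, 11, 12, 13], name="id_prestador_encoded"),
    columns=[100, 101, 102, 103],
)
query = toy_pivot.loc[[10]].values  # label-based lookup, shape (1, n_members)

# Default metric (Euclidean distance).
euclid = NearestNeighbors(n_neighbors=3, algorithm="brute").fit(toy_pivot.values)
_, euclid_pos = euclid.kneighbors(query)
euclid_ids = toy_pivot.index[euclid_pos[0]].tolist()

# Cosine distance: 1 - cosine similarity between rating vectors.
cosine = NearestNeighbors(n_neighbors=3, metric="cosine", algorithm="brute").fit(toy_pivot.values)
_, cosine_pos = cosine.kneighbors(query)
cosine_ids = toy_pivot.index[cosine_pos[0]].tolist()

print("euclidean:", euclid_ids)
print("cosine:   ", cosine_ids)
```

In this toy case the Euclidean ranking places provider 13 ahead of provider 11, while the cosine ranking does the opposite, because provider 11 was rated by the same members as provider 10 but with lower scores. Which behaviour is preferable depends on whether the rating level itself should count towards similarity.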

In [23]:
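# iloc is positional, so iloc[100000] is out of bounds for this pivot table.
# The rows are labelled by id_prestador_encoded, so select the query provider with loc.
distances, suggestions = model.kneighbors(df_pivot.loc[200000, :].values.reshape(1, -1))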

In [22]:
for i in range(len(suggestions)):
  print(df_pivot.index[suggestions[i]])

Int64Index([200000, 200188, 200184, 200167, 200137], dtype='int64', name='id_prestador_encoded')
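The first suggestion printed above is the query provider itself, since every row is at distance 0 from itself. A small helper can wrap the lookup and drop that self-match. This is a minimal sketch on invented data: `recommend_similar_providers` and `toy_pivot` are hypothetical names, not part of the notebook.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def recommend_similar_providers(pivot, provider_id, n=2):
    """Return the n providers whose rating vectors are closest to provider_id.

    `pivot` is a provider x member rating matrix indexed by provider id.
    """
    model = NearestNeighbors(algorithm="brute").fit(pivot.values)
    query = pivot.loc[[provider_id]].values  # KeyError if the id is unknown
    # Ask for one extra neighbour because the query provider matches itself.
    k = min(n + 1, len(pivot))
    distances, positions = model.kneighbors(query, n_neighbors=k)
    ranked = pd.Series(distances[0], index=pivot.index[positions[0]], name="distance")
    # Drop the self-match by label, then keep the n closest providers.
    return ranked.drop(provider_id, errors="ignore").head(n)


# Toy provider x member rating matrix (0 = no rating), invented for illustration.
toy_pivot = pd.DataFrame(
    [[0.90, 0.80, 0.0, 0.0],
     [0.20, 0.15, 0.0, 0.0],
     [0.00, 0.00, 0.7, 0.9],
     [0.90, 0.00, 0.0, 0.0]],
    index=pd.Index([10, 11, 12, 13], name="id_prestador_encoded"),
    columns=[100, 101, 102, 103],
)

recommendations = recommend_similar_providers(toy_pivot, provider_id=10, n=2)
print(recommendations)
```

Returning the distances together with the provider ids makes it easier to judge how close each suggestion really is, instead of relying on rank alone.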
