# Company analysis

This project starts from a dataset taken from Kaggle **(datasets/davidgauthier/glassdoor-job-reviews)**. The original idea, however, was to scrape Glassdoor or Indeed to extract this kind of information.

Given the terms and conditions of those sites, I chose to work with a Kaggle dataset that offers the same information without running into legal problems.

**Note:** There is no confirmation that the reviews were written by current or former employees of the company.

In [None]:
%pip install matplotlib

In [1]:
# Load libraries

import pandas as pd
import numpy as np
import os
# import matplotlib.pyplot as plt

In [93]:
# Load the CSV into a pandas DataFrame named df
ruta = os.path.join("glassdoor_reviews.csv")
df = pd.read_csv(ruta)
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,diversity_inclusion,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
0,AFH-Wealth-Management,2015-04-05,,Current Employee,,2,4.0,3.0,,2.0,3.0,3.0,x,o,r,"Young colleagues, poor micro management",Very friendly and welcoming to new staff. Easy...,"Poor salaries, poor training and communication."
1,AFH-Wealth-Management,2015-12-11,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",2,3.0,1.0,,2.0,1.0,4.0,x,o,r,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...
2,AFH-Wealth-Management,2016-01-28,Office Administrator,"Current Employee, less than 1 year","Bromsgrove, England, England",1,1.0,1.0,,1.0,1.0,1.0,x,o,x,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very..."
3,AFH-Wealth-Management,2016-04-16,,Current Employee,,5,2.0,3.0,,2.0,2.0,3.0,x,o,r,Over promised under delivered,Nice staff to work with,No career progression and salary is poor
4,AFH-Wealth-Management,2016-04-23,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",1,2.0,1.0,,2.0,1.0,1.0,x,o,x,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,"Current Employee, more than 5 years","München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,4.0,v,v,v,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations
838562,the-LEGO-Group,2021-06-03,Sales Associate,"Current Employee, less than 1 year","London, England, England",3,,,,,,,o,o,o,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,5.0,3.0,5.0,3.0,v,o,o,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,"Current Employee, less than 1 year",,5,,,,,,,o,o,o,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ..."


### Understanding the variables:

The variables `recommend`, `ceo_approv` and `outlook` each take one of the values `v/r/x/o`, meaning: v = Positive, r = Moderate, x = Negative and o = No opinion.

The variable `date_review` appears as type `object`, so we can convert it to `datetime64`. This will be useful both now and in later reporting phases.
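
The date conversion described above could look like the following minimal sketch. The toy frame `sample` and the derived column `review_year` are illustrative names that do not appear in the notebook; the dates follow the `YYYY-MM-DD` layout visible in the data preview.

```python
import pandas as pd

# Toy frame standing in for the real data (assumption: dates are ISO formatted).
sample = pd.DataFrame({"date_review": ["2015-04-05", "2016-01-28", "2021-06-04"]})

# Parse the strings into datetime64 values.
sample["date_review"] = pd.to_datetime(sample["date_review"], format="%Y-%m-%d")

# The .dt accessor then exposes date parts that are handy for reporting.
sample["review_year"] = sample["date_review"].dt.year

print(sample.dtypes)
```

On the full dataset the same call would be applied to `df["date_review"]`.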

One might think that `current` should be of type `bool`. Since it contains additional text, for now we will keep working with it as `object`.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 838566 entries, 0 to 838565
Data columns (total 18 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   firm                 838566 non-null  object 
 1   date_review          838566 non-null  object 
 2   job_title            838566 non-null  object 
 3   current              838566 non-null  object 
 4   location             541223 non-null  object 
 5   overall_rating       838566 non-null  int64  
 6   work_life_balance    688672 non-null  float64
 7   culture_values       647193 non-null  float64
 8   diversity_inclusion  136066 non-null  float64
 9   career_opp           691065 non-null  float64
 10  comp_benefits        688484 non-null  float64
 11  senior_mgmt          682690 non-null  float64
 12  recommend            838566 non-null  object 
 13  ceo_approv           838566 non-null  object 
 14  outlook              838566 non-null  object 
 15  headline         

As we can see, only 7 of the 18 variables are numeric.

-  `overall_rating` is the overall score given to the company.
-  `work_life_balance` is the degree of balance between work and personal life.
-  `culture_values` is the rating of the company's culture and values.
-  `diversity_inclusion` is the rating of the company's diversity and inclusion.
-  `career_opp` refers to career opportunities.
-  `comp_benefits` refers to compensation and benefits.
-  `senior_mgmt` refers to the company's senior management.

Since all the scores are whole numbers from 1 to 5, we could change the type of the `float` variables to `int` once their missing values are handled.
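
Because the rating columns contain missing values, a plain `int` cast would fail on them. One option, sketched here on a toy frame (`ratings` and `ratings_int` are illustrative names), is pandas' nullable `Int64` dtype, which keeps the gaps as `<NA>`:

```python
import numpy as np
import pandas as pd

# Toy ratings with gaps, mimicking two of the float columns in the dataset.
ratings = pd.DataFrame({
    "work_life_balance": [4.0, 3.0, np.nan],
    "senior_mgmt": [3.0, np.nan, 1.0],
})

# "Int64" (capital I) is the nullable integer dtype: NaN becomes <NA>.
ratings_int = ratings.astype("Int64")

print(ratings_int.dtypes)
print(ratings_int)
```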

### Exploratory data analysis

First we identify the missing values and, depending on how many there are and their impact, apply one treatment or another.



In [79]:
# Count the nulls in each column
df.isnull().sum()

firm                        0
date_review                 0
job_title                   0
current                     0
location               297343
overall_rating              0
work_life_balance      149894
culture_values         191373
diversity_inclusion    702500
career_opp             147501
comp_benefits          150082
senior_mgmt            155876
recommend                   0
ceo_approv                  0
outlook                     0
headline                 2590
pros                        2
cons                       13
dtype: int64
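
The counts above are easier to compare as percentages. A minimal sketch on a toy frame (`toy` and `null_pct` are illustrative names) uses the fact that the mean of a boolean mask is the fraction of `True` values:

```python
import numpy as np
import pandas as pd

# Toy frame with different amounts of missing data per column.
toy = pd.DataFrame({
    "overall_rating": [2, 1, 5, 4],
    "diversity_inclusion": [np.nan, np.nan, np.nan, 4.0],
    "location": ["London", None, "Vejle", None],
})

# isnull() gives a boolean mask; its column mean is the null fraction.
null_pct = toy.isnull().mean().mul(100).round(1).sort_values(ascending=False)

print(null_pct)
```

Applied to the counts shown above, `diversity_inclusion` comes out at 702500 / 838566, roughly 83.8 %.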

As we can see, **diversity_inclusion** is null in about 84 % of the rows (702,500 of 838,566), so it would be hard to impute new values reliably from the existing ones.

Instead, we will drop this column. This means giving up a variable that matters for some analyses we might run in the future, but it ensures we do not contaminate the data with made-up values.

In [94]:
# Drop the diversity_inclusion column
df.drop(columns=["diversity_inclusion"], inplace=True)


In [95]:
# Exploring the dataset, we can see blank values that are not reported as missing; we can call them "empty values".
df["job_title"].unique()

array([' ', ' Office Administrator', ' IFA', ...,
       ' Seasonal Ride Operator/Attendant', ' Service Employee',
       ' Senior Experience Designer'], shape=(62275,), dtype=object)

In [96]:
# One way to show the rows whose "job_title" is ' '
df[df["job_title"]== ' ']
# There are 79065 entries with no job title specified, about 9.4 %.

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
0,AFH-Wealth-Management,2015-04-05,,Current Employee,,2,4.0,3.0,2.0,3.0,3.0,x,o,r,"Young colleagues, poor micro management",Very friendly and welcoming to new staff. Easy...,"Poor salaries, poor training and communication."
3,AFH-Wealth-Management,2016-04-16,,Current Employee,,5,2.0,3.0,2.0,2.0,3.0,x,o,r,Over promised under delivered,Nice staff to work with,No career progression and salary is poor
66,AJ-Bell,2015-07-01,,"Former Employee, more than 3 years",,3,4.0,1.0,2.0,2.0,2.0,x,v,x,Average company,Good team work\r\nLife / work balance,No development\r\nLack of leadership\r\nPoor l...
71,AJ-Bell,2016-05-23,,Former Employee,,1,,,,,,o,o,o,Tunbridge Wells office - ONLY good for first-j...,Great Experience for 18months - 2 years if str...,Pay in the Tunbridge Wells office matches the ...
74,AJ-Bell,2016-08-04,,Current Employee,,3,3.0,3.0,3.0,2.0,2.0,x,v,v,Pensions Administrator,"People make the place, the best Manager I've h...","Management issues, salary is lower than averag..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838070,the-LEGO-Group,2017-11-23,,Current Employee,"San Mateo, CA",3,4.0,4.0,1.0,2.0,2.0,x,r,v,"Great team, cooperate inconsistent",People you work with are great. \nGreat produc...,District Management and higher tell one story ...
838115,the-LEGO-Group,2018-03-26,,Former Employee,,5,3.0,5.0,4.0,2.0,4.0,v,v,r,Associate,"Great environment, Team Friendly, Never a dull...","Long hours, sometimes bad management"
838171,the-LEGO-Group,2018-10-19,,Current Employee,,4,,,,,,o,o,o,Supply planning manager,"Flexible, good working environment","Work life balance, Manuel system"
838177,the-LEGO-Group,2018-11-12,,Current Employee,,5,4.0,5.0,3.0,4.0,4.0,v,v,v,Great Culture,Amazing culture & very enjoyable place to work.,Maybe a bit stressful to keep a smile for more...
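
Since `isnull()` does not count these blank titles, one possible approach is to turn whitespace-only strings into real missing values. The sketch below uses a toy series (`titles`, `stripped` and `cleaned` are illustrative names); it also trims the leading space visible in values such as `' Office Administrator'`.

```python
import pandas as pd

# Toy job titles, including the whitespace-only placeholder seen in the data.
titles = pd.Series([" ", " Office Administrator", " IFA", " "], name="job_title")

# Trim surrounding whitespace, then mask the now-empty strings as missing.
stripped = titles.str.strip()
cleaned = stripped.mask(stripped == "")

print(cleaned)
print(cleaned.isna().sum())
```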


At this point, several options open up for handling the missing data. With roughly 840,000 rows and missing values in several columns, we can:

1) Drop the rows with missing data
2) Interpolate values (if there is a temporal relationship)
3) Impute the missing data (this alters the variables' statistics)
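
The three options can be compared on a toy column. This is only a sketch: the frame `toy` is invented, and using the median for imputation is an assumption rather than a choice made in this notebook.

```python
import numpy as np
import pandas as pd

# Toy ratings ordered by review date, with one gap.
toy = pd.DataFrame({
    "date_review": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"]
    ),
    "work_life_balance": [4.0, np.nan, 2.0, 5.0],
})

# 1) Drop the rows where the rating is missing.
dropped = toy.dropna(subset=["work_life_balance"])

# 2) Interpolate linearly along the time-ordered series.
interpolated = toy.sort_values("date_review")["work_life_balance"].interpolate()

# 3) Impute with a summary statistic (here the median, as an assumption).
imputed = toy["work_life_balance"].fillna(toy["work_life_balance"].median())

print(len(dropped), interpolated.tolist(), imputed.tolist())
```

Option 1 shrinks the sample, while options 2 and 3 keep every row at the cost of introducing values nobody actually reported.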

In [97]:
df["location"].unique()

# We find 14487 distinct locations, including NaN.

array([nan, 'Bromsgrove, England, England', 'Century City, CA', ...,
       'Vejle', 'Lainate', 'Wijnegem, Antwerp'],
      shape=(14487,), dtype=object)

Looking at the unique locations, we find several ways of writing them:

- Empty location (NaN)
- Location as: District, Country, Country
- Location as: Municipality
- Location as: Municipality, Province
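
To see how often each format occurs, one could count the comma-separated parts of every location. The sketch below uses values taken from the output above; `locations` and `n_parts` are illustrative names, and treating `", "` as the separator is an assumption about the data.

```python
import pandas as pd

# One example of each format listed above.
locations = pd.Series(
    [None, "Bromsgrove, England, England", "Century City, CA", "Vejle"],
    name="location",
)

# Split on ", " and count the pieces; missing locations count as 0 parts.
n_parts = locations.str.split(", ").str.len().fillna(0).astype(int)

print(n_parts.value_counts().sort_index())
```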

In [98]:
df[df["location"].isna()]

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
0,AFH-Wealth-Management,2015-04-05,,Current Employee,,2,4.0,3.0,2.0,3.0,3.0,x,o,r,"Young colleagues, poor micro management",Very friendly and welcoming to new staff. Easy...,"Poor salaries, poor training and communication."
3,AFH-Wealth-Management,2016-04-16,,Current Employee,,5,2.0,3.0,2.0,2.0,3.0,x,o,r,Over promised under delivered,Nice staff to work with,No career progression and salary is poor
5,AFH-Wealth-Management,2016-05-26,Office Administrator,"Current Employee, less than 1 year",,3,4.0,2.0,2.0,3.0,2.0,o,r,r,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...
8,AFH-Wealth-Management,2016-11-03,Anonymous Employee,"Former Employee, more than 1 year",,4,4.0,4.0,4.0,4.0,4.0,v,o,v,I liked working for AFH,"Nice Staff, good HR Team.\r\nFeels vibrant and...",Can't really think of any obvious cons
12,AFH-Wealth-Management,2017-05-15,Quality Control Administrator,"Current Employee, less than 1 year",,4,5.0,4.0,5.0,4.0,4.0,v,v,v,Good Place To Work,Everyone is friendly and there is always a bri...,No cons as such.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838551,the-LEGO-Group,2021-05-18,Business Process Manager,Current Employee,,4,4.0,5.0,2.0,3.0,4.0,v,v,v,Working @LEGO,Conducive environment\r\nStrong and friendly c...,Too many meetings\r\nToo much democracy leadin...
838554,the-LEGO-Group,2021-05-19,Seasonal Ride Operator/Attendant,Former Employee,,3,2.0,3.0,2.0,2.0,2.0,o,o,o,Fun experience but poorly paid and poorly managed,Fun experience with great socials and I have m...,Low pay and short breaks considering the respo...
838557,the-LEGO-Group,2021-05-24,Service Employee,Current Employee,,1,,,,,,o,o,o,.,"Ok work, discount on Lego sets","Management is bad, and more"
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,v,o,o,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...


This shows that `location` is null in a little over a third of the rows (297,343 of 838,566, about 35 %).

In [99]:
df[(df["job_title"]== ' ') & (df["location"].isna())]

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
0,AFH-Wealth-Management,2015-04-05,,Current Employee,,2,4.0,3.0,2.0,3.0,3.0,x,o,r,"Young colleagues, poor micro management",Very friendly and welcoming to new staff. Easy...,"Poor salaries, poor training and communication."
3,AFH-Wealth-Management,2016-04-16,,Current Employee,,5,2.0,3.0,2.0,2.0,3.0,x,o,r,Over promised under delivered,Nice staff to work with,No career progression and salary is poor
66,AJ-Bell,2015-07-01,,"Former Employee, more than 3 years",,3,4.0,1.0,2.0,2.0,2.0,x,v,x,Average company,Good team work\r\nLife / work balance,No development\r\nLack of leadership\r\nPoor l...
71,AJ-Bell,2016-05-23,,Former Employee,,1,,,,,,o,o,o,Tunbridge Wells office - ONLY good for first-j...,Great Experience for 18months - 2 years if str...,Pay in the Tunbridge Wells office matches the ...
74,AJ-Bell,2016-08-04,,Current Employee,,3,3.0,3.0,3.0,2.0,2.0,x,v,v,Pensions Administrator,"People make the place, the best Manager I've h...","Management issues, salary is lower than averag..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838065,the-LEGO-Group,2017-11-13,,Former Employee,,5,,,,,,o,o,o,. . . . .,Excellent collegues and work culture,Difficult to progress career wise
838115,the-LEGO-Group,2018-03-26,,Former Employee,,5,3.0,5.0,4.0,2.0,4.0,v,v,r,Associate,"Great environment, Team Friendly, Never a dull...","Long hours, sometimes bad management"
838171,the-LEGO-Group,2018-10-19,,Current Employee,,4,,,,,,o,o,o,Supply planning manager,"Flexible, good working environment","Work life balance, Manuel system"
838177,the-LEGO-Group,2018-11-12,,Current Employee,,5,4.0,5.0,3.0,4.0,4.0,v,v,v,Great Culture,Amazing culture & very enjoyable place to work.,Maybe a bit stressful to keep a smile for more...


The condition above shows that roughly 1/5 of the rows with no `location` also have no `job_title`. That is, of the 297,343 rows with a null `location`, 63,647 also have an empty `job_title`.

We can therefore drop these specific rows and re-evaluate the missing values.
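
The overlap quoted above can be checked with boolean masks instead of reading the size of the displayed table. A minimal sketch on invented rows (`toy`, `both`, `share` and `kept` are illustrative names):

```python
import numpy as np
import pandas as pd

# Toy rows: ' ' marks an empty job title, NaN a missing location.
toy = pd.DataFrame({
    "job_title": [" ", " IFA", " ", " Strategist", " "],
    "location": [np.nan, "London", np.nan, np.nan, "Vejle"],
})

no_title = toy["job_title"] == " "
no_location = toy["location"].isna()
both = no_title & no_location

# Share of rows without location that also lack a job title.
share = both.sum() / no_location.sum()

# Keeping the complement of the mask is equivalent to dropping those rows.
kept = toy[~both]

print(both.sum(), round(share, 3), len(kept))
```

With the figures from the text, the same ratio is 63647 / 297343, about 21 %.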

In [100]:
# Drop the rows that meet the condition and store the resulting DataFrame

df = df.drop(df[(df["job_title"] == ' ') & (df["location"].isna())].index)
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
1,AFH-Wealth-Management,2015-12-11,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",2,3.0,1.0,2.0,1.0,4.0,x,o,r,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...
2,AFH-Wealth-Management,2016-01-28,Office Administrator,"Current Employee, less than 1 year","Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,x,o,x,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very..."
4,AFH-Wealth-Management,2016-04-23,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",1,2.0,1.0,2.0,1.0,1.0,x,o,x,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr..."
5,AFH-Wealth-Management,2016-05-26,Office Administrator,"Current Employee, less than 1 year",,3,4.0,2.0,2.0,3.0,2.0,o,r,r,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...
6,AFH-Wealth-Management,2016-09-23,IFA,Former Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,x,o,r,It horrible management,Good investment management strategy. Overall t...,The management and seniors are ruthless. No tr...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,"Current Employee, more than 5 years","München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,v,v,v,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations
838562,the-LEGO-Group,2021-06-03,Sales Associate,"Current Employee, less than 1 year","London, England, England",3,,,,,,o,o,o,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,v,o,o,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,"Current Employee, less than 1 year",,5,,,,,,o,o,o,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ..."


As we can see, we still have *missing* values and *empty* values. What we can do now is decompose `location` into additional columns. That way, based on the `firm` and the country part of `location`, we can impute the missing data.
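
One possible way to implement this idea is sketched below, under strong assumptions: the last comma-separated piece of `location` is treated as the country (which does not hold for every format, for example a bare municipality), and a missing country is filled with the most frequent one for the same `firm`. The columns `country` and `country_imputed` and the helper `most_frequent` are hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "location": [
        "Bromsgrove, England, England",
        "London, England, England",
        np.nan,
        "Vejle",
        np.nan,
    ],
})

# Assumption: the last piece of the location string is the country.
toy["country"] = toy["location"].str.split(", ").str[-1]


def most_frequent(values: pd.Series):
    """Return the most common non-null value, or NaN if there is none."""
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else np.nan


# Broadcast each firm's most frequent country to all of its rows.
firm_mode = toy.groupby("firm")["country"].transform(most_frequent)
toy["country_imputed"] = toy["country"].fillna(firm_mode)

print(toy[["firm", "country", "country_imputed"]])
```

This assigns every review without a location to the firm's dominant country, which may be wrong for multinational companies.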

We can also decompose the variable `current` into the length of employment and whether the person still works at the company.

- Whether the person still works there can be treated as a **binary categorical variable** where *current employee = 1* and *former employee = 0*.
- The time spent working at the company can be split into ranges. We must also consider that there are employees who still work at the company but whose length of employment is not given. For them we will create an **Unknown** label in place of the *missing* values.
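
The two bullets above can be sketched as follows. The example strings come from the data preview; the names `is_current` and `tenure` are illustrative and are not the column names used later in the notebook.

```python
import pandas as pd

current = pd.Series([
    "Current Employee, more than 1 year",
    "Former Employee",
    "Former Employee, more than 3 years",
    "Current Employee",
], name="current")

# Split once on the first comma: status on the left, tenure on the right.
parts = current.str.split(",", n=1, expand=True)

# 1 for current employees, 0 for former employees.
is_current = parts[0].str.strip().eq("Current Employee").astype(int)

# Rows without a tenure range get the "Unknown" label.
tenure = parts[1].str.strip().fillna("Unknown")

print(pd.DataFrame({"is_current": is_current, "tenure": tenure}))
```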

In [101]:
# Compute relevant summary statistics for the numeric variables
df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
overall_rating,774919.0,3.660586,1.178852,1.0,3.0,4.0,5.0,5.0
work_life_balance,644957.0,3.379162,1.309625,1.0,2.0,4.0,4.0,5.0
culture_values,610386.0,3.59429,1.325665,1.0,3.0,4.0,5.0,5.0
career_opp,647107.0,3.466394,1.27432,1.0,3.0,4.0,5.0,5.0
comp_benefits,644847.0,3.40266,1.221342,1.0,3.0,4.0,4.0,5.0
senior_mgmt,639730.0,3.178555,1.33468,1.0,2.0,3.0,4.0,5.0


### Feature engineering:

From the variables `recommend`, `ceo_approv` and `outlook` we can build a score, compute its mean, and use that mean to classify the opinions of the "respondents".

These variables refer, respectively, to whether the reviewer would recommend the company to a friend, their assessment of the CEO's job performance, and the company's outlook over the next 6 months.

To do this, we translate `v/r/x/o` into +2/+1/-1/0 points respectively. We then compute the mean score in a new variable, `avg_score_company` (the maximum possible score is 2 points and the minimum is -1).

In [102]:
# Convert the symbols to numeric values

df["recommend"] = df["recommend"].map({"v":2, "r":1, "x":-1, "o":0})
df["ceo_approv"] = df["ceo_approv"].map({"v":2, "r":1, "x":-1, "o":0})
df["outlook"] = df["outlook"].map({"v":2, "r":1, "x":-1, "o":0})
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons
1,AFH-Wealth-Management,2015-12-11,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",2,3.0,1.0,2.0,1.0,4.0,-1,0,1,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...
2,AFH-Wealth-Management,2016-01-28,Office Administrator,"Current Employee, less than 1 year","Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,-1,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very..."
4,AFH-Wealth-Management,2016-04-23,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",1,2.0,1.0,2.0,1.0,1.0,-1,0,-1,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr..."
5,AFH-Wealth-Management,2016-05-26,Office Administrator,"Current Employee, less than 1 year",,3,4.0,2.0,2.0,3.0,2.0,0,1,1,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...
6,AFH-Wealth-Management,2016-09-23,IFA,Former Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,1,It horrible management,Good investment management strategy. Overall t...,The management and seniors are ruthless. No tr...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,"Current Employee, more than 5 years","München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,2,2,2,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations
838562,the-LEGO-Group,2021-06-03,Sales Associate,"Current Employee, less than 1 year","London, England, England",3,,,,,,0,0,0,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,2,0,0,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,"Current Employee, less than 1 year",,5,,,,,,0,0,0,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ..."


In [103]:
# Create the variable holding the mean score for the company
df["avg_score_company"] = ((df["recommend"] + df["ceo_approv"] + df["outlook"])/3).round(2)
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons,avg_score_company
1,AFH-Wealth-Management,2015-12-11,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",2,3.0,1.0,2.0,1.0,4.0,-1,0,1,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...,0.00
2,AFH-Wealth-Management,2016-01-28,Office Administrator,"Current Employee, less than 1 year","Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,-1,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very...",-0.67
4,AFH-Wealth-Management,2016-04-23,Office Administrator,"Current Employee, more than 1 year","Bromsgrove, England, England",1,2.0,1.0,2.0,1.0,1.0,-1,0,-1,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr...",-0.67
5,AFH-Wealth-Management,2016-05-26,Office Administrator,"Current Employee, less than 1 year",,3,4.0,2.0,2.0,3.0,2.0,0,1,1,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...,0.67
6,AFH-Wealth-Management,2016-09-23,IFA,Former Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,1,It horrible management,Good investment management strategy. Overall t...,The management and seniors are ruthless. No tr...,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,"Current Employee, more than 5 years","München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,2,2,2,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations,2.00
838562,the-LEGO-Group,2021-06-03,Sales Associate,"Current Employee, less than 1 year","London, England, England",3,,,,,,0,0,0,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...,0.00
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,2,0,0,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...,0.67
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,"Current Employee, less than 1 year",,5,,,,,,0,0,0,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ...",0.00
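
The classification step mentioned earlier is not implemented in the notebook. A minimal sketch with `pd.cut` could look like this; the thresholds (negative below 0, neutral from 0 up to 1, positive from 1 upward) and the label names are purely hypothetical and would need to be chosen by the analyst.

```python
import pandas as pd

# Example scores in the range produced by the mapping (-1 to 2).
scores = pd.Series([-0.67, 0.0, 0.67, 2.0], name="avg_score_company")

# Hypothetical cut points: [-1, 0) negative, [0, 1) neutral, [1, inf) positive.
bins = [-1.0, 0.0, 1.0, float("inf")]
labels = ["negative", "neutral", "positive"]

opinion = pd.cut(scores, bins=bins, labels=labels, right=False)

print(opinion)
```

Note that `o` (no opinion) counts as 0 in the mean, so a reviewer with no opinion on any of the three questions lands in the neutral class under these thresholds.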


In [104]:
# Split the variable

df[["current", "current_time"]] = df["current"].str.split(",", expand=True)
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons,avg_score_company,current_time
1,AFH-Wealth-Management,2015-12-11,Office Administrator,Current Employee,"Bromsgrove, England, England",2,3.0,1.0,2.0,1.0,4.0,-1,0,1,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...,0.00,more than 1 year
2,AFH-Wealth-Management,2016-01-28,Office Administrator,Current Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,-1,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very...",-0.67,less than 1 year
4,AFH-Wealth-Management,2016-04-23,Office Administrator,Current Employee,"Bromsgrove, England, England",1,2.0,1.0,2.0,1.0,1.0,-1,0,-1,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr...",-0.67,more than 1 year
5,AFH-Wealth-Management,2016-05-26,Office Administrator,Current Employee,,3,4.0,2.0,2.0,3.0,2.0,0,1,1,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...,0.67,less than 1 year
6,AFH-Wealth-Management,2016-09-23,IFA,Former Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,1,It horrible management,Good investment management strategy. Overall t...,The management and seniors are ruthless. No tr...,0.00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,Current Employee,"München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,2,2,2,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations,2.00,more than 5 years
838562,the-LEGO-Group,2021-06-03,Sales Associate,Current Employee,"London, England, England",3,,,,,,0,0,0,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...,0.00,less than 1 year
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,2,0,0,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...,0.67,
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,Current Employee,,5,,,,,,0,0,0,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ...",0.00,less than 1 year


In [107]:
#df[["location", "country", "country_0"]] = df["location"].str.split(",", n=1, expand=True)
df

Unnamed: 0,firm,date_review,job_title,current,location,overall_rating,work_life_balance,culture_values,career_opp,comp_benefits,senior_mgmt,recommend,ceo_approv,outlook,headline,pros,cons,avg_score_company,current_time
1,AFH-Wealth-Management,2015-12-11,Office Administrator,Current Employee,"Bromsgrove, England, England",2,3.0,1.0,2.0,1.0,4.0,-1,0,1,"Excellent staff, poor salary","Friendly, helpful and hard-working colleagues",Poor salary which doesn't improve much with pr...,0.00,more than 1 year
2,AFH-Wealth-Management,2016-01-28,Office Administrator,Current Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,-1,"Low salary, bad micromanagement",Easy to get the job even without experience in...,"Very low salary, poor working conditions, very...",-0.67,less than 1 year
4,AFH-Wealth-Management,2016-04-23,Office Administrator,Current Employee,"Bromsgrove, England, England",1,2.0,1.0,2.0,1.0,1.0,-1,0,-1,client reporting admin,"Easy to get the job, Nice colleagues.","Abysmal pay, around minimum wage. No actual tr...",-0.67,more than 1 year
5,AFH-Wealth-Management,2016-05-26,Office Administrator,Current Employee,,3,4.0,2.0,2.0,3.0,2.0,0,1,1,Office administrator,Some good people to work with.\n\nFlexible wor...,Morale.\n\nLack of managerial structure.\n\nDo...,0.67,less than 1 year
6,AFH-Wealth-Management,2016-09-23,IFA,Former Employee,"Bromsgrove, England, England",1,1.0,1.0,1.0,1.0,1.0,-1,0,1,It horrible management,Good investment management strategy. Overall t...,The management and seniors are ruthless. No tr...,0.00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
838561,the-LEGO-Group,2021-06-02,Marketing Manager,Current Employee,"München, Bavaria, Bavaria",5,4.0,5.0,4.0,4.0,4.0,2,2,2,Just an awesome company to work for!!!,"Great company values, awesome product, smart c...",Not very easy to transfer to other locations,2.00,more than 5 years
838562,the-LEGO-Group,2021-06-03,Sales Associate,Current Employee,"London, England, England",3,,,,,,0,0,0,working at lego,staff discount is really nice,micro managing is a hassle\r\ncan become menta...,0.00,less than 1 year
838563,the-LEGO-Group,2021-06-03,Strategist,Current Employee,,4,5.0,5.0,3.0,5.0,3.0,2,0,0,not interested in growing their people,loved brand for a lot of people,you can spend 6-10 years without any promotion...,0.67,
838564,the-LEGO-Group,2021-06-04,Customer Service Representative,Current Employee,,5,,,,,,0,0,0,Great Place to Work,"Good wages, good hours, lots of resources","Working every other weekend, busy seasons can ...",0.00,less than 1 year


### Plots

### Modeling

Before modeling, we need to define the project's lines of work. Given the data, we can use the variables `headline`, `pros` and `cons` in an NLP task.
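
As a very first step in that direction, one could count the most frequent terms in `pros` and `cons`. The sketch below is a plain word count, not a full NLP pipeline; the texts are copied from the data preview, and `STOPWORDS`, `tokenize` and `top_terms` are hypothetical helpers.

```python
import re
from collections import Counter

pros = [
    "Nice staff to work with",
    "Friendly, helpful and hard-working colleagues",
]
cons = [
    "No career progression and salary is poor",
    "Poor salaries, poor training and communication.",
]

# Tiny illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"and", "to", "is", "with", "no", "the", "a"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(texts, n=3):
    """Return the n most common tokens across all texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return counts.most_common(n)


print(top_terms(pros))
print(top_terms(cons))
```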

If we have enough data on employees who are still at the company, can we predict the attrition rate?
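
Before building any model, a descriptive baseline could be the share of reviews written by former employees for each firm. This is only a sketch on invented rows: `is_former` and `former_share` are illustrative names, and the share of former employees among reviewers is not the company's true attrition rate.

```python
import pandas as pd

toy = pd.DataFrame({
    "firm": ["AJ-Bell", "AJ-Bell", "AJ-Bell", "the-LEGO-Group", "the-LEGO-Group"],
    "current": [
        "Former Employee",
        "Current Employee",
        "Former Employee",
        "Current Employee",
        "Current Employee",
    ],
})

# Flag reviews from former employees.
toy["is_former"] = toy["current"].str.startswith("Former").astype(int)

# Mean of the flag per firm = share of reviews from former employees.
former_share = toy.groupby("firm")["is_former"].mean().round(2)

print(former_share)
```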
