### TRATAMIENTO

A continuación, se va a hacer la gestión de nulos de nuestros datos una vez que se han eliminado las columnas redundantes en base al análisis hecho de outliers y nulos, y una vez que se han eliminado registros duplicados. También se va a llevar a cabo la normalización de las variables.

In [26]:
# importamos las librerías que necesitamos

# Tratamiento de datos
import pandas as pd
import numpy as np
from IPython.display import display


# Librerías de visualización
import seaborn as sns
import matplotlib.pyplot as plt

In [27]:
# ver todas las columnas
pd.set_option('display.max_columns', None)

In [28]:
# ver todas las filas
pd.set_option('display.max_rows', None)

In [38]:
# Cargamos el csv

df = pd.read_csv("datos_limpios.csv", index_col = 0)

#### GESTIÓN DE NULOS

### ✨ Plan de normalización y limpieza de columnas

🖥️ **Columna `remotework`**  
- 🔄 Normalizar valores a `yes` / `no`.

---

👫 **Columna `gender`**  
- 🔄 Normalizar valores a `M` / `F`.

---

🔤 **Columnas categóricas (texto)**  
- 📝 Corregir fallos tipográficos (ejemplo: combinación de mayúsculas y minúsculas).  
- 🔠 Capitalizar la primera letra de cada valor.  
- ⚙️ Implementar una función que recorra todas las columnas categóricas para normalizar texto.

---

🔢 **Columnas numéricas**  
- 🔧 Corregir fallos tipográficos:  
  - ➡️ Reemplazar `,` por `.` en valores numéricos.  
  - 🎯 Redondear decimales a 2 dígitos.  
- ⚙️ Implementar una función que recorra todas las columnas numéricas y normalice sus valores.

---

📏 **Columna `distancefromhome`**  
- 🔄 Convertir valores negativos a valor absoluto.

---

🔍 **Detección y corrección de valores mal escritos en categóricas**  
- 👀 Identificar valores con `unique()` durante el EDA.  
- ✏️ Reemplazar cada error con la palabra correctamente escrita.  
  - Ejemplo: `marreid` → `married`.

---

🏷️ **Nombres de columnas**  
- ✨ Renombrar y normalizar: todas las columnas con formato `.title`.  
  - Ejemplo: `distancefromhome` → `Distancefromhome`.

In [39]:
# Normalizar la columna 'remotework' a valores yes/no:

#1.Inspeccionamos primero los valores únicos
df['remotework'].unique()

array(['Yes', '1', 'False', '0', 'True'], dtype=object)

In [40]:
#2. Normalizamos 
df['remotework'] = (
    df['remotework']
    .astype(str)       # convertir a texto
    .str.strip()       # quitar espacios
    .str.lower()       # pasar a minúsculas
    .replace({         # mapear valores conocidos
        'yes': 'yes',
        'true': 'yes',
        '1': 'yes',
        'false': 'no',
        '0': 'no'
    })
)

#3.Verificamos el resultado final
print(df['remotework'].value_counts())


remotework
yes    1000
no      614
Name: count, dtype: int64


In [41]:
# Normalizar la columna gender a valores M/F:

#1.Inspeccionamos primero los valores únicos
print(df['gender'].unique())

[0 1]


In [42]:
#2. Normalizamos 
df['gender'] = (
    df['gender']
    .astype(str)        # aseguramos texto
    .str.strip()        # quitamos espacios
    .str.lower()        # pasamos a minúsculas
    .replace({
        'male': 'M',
        'm': 'M',
        '1': 'M',
        'hombre': 'M',
        'man': 'M',
        'female': 'F',
        'f': 'F',
        '0': 'F',
        'mujer': 'F',
        'woman': 'F'
    })
)

#3. Verificamos el resultado
print(df['gender'].value_counts())

gender
F    971
M    643
Name: count, dtype: int64


In [43]:
# Creamos una función para normalizar todas las columnas categóricas

def normalizar_categoricas(df):
    """
    Normaliza columnas categóricas de df:
      - elimina espacios al inicio y al final
      - reemplaza '_' por espacio
      - corrige combinación de mayúsculas/minúsculas
      - usa Title Case en las columnas que lo requieren
      - mantiene NaN intactos
    """
    import pandas as pd

    # Creamos un diccionario para las columnas que deben tener cada valor/palabra capitalizada (Title Case)
    cols_title = ['department', 'jobrole']

    for col in df.select_dtypes(include=['object', 'category']).columns:
        s = df[col]
        was_categorical = pd.api.types.is_categorical_dtype(s)

        # Máscara para no tocar valores nulos
        mask = s.notna()

        # Limpieza común
        temp = (
            s.loc[mask]
             .astype(str)
             .str.strip()           # quitar espacios al inicio/final
             .str.replace('_', ' ') # reemplazar '_' por espacio
             .str.lower()           # todo a minúsculas
        )

        # Ajuste de capitalización
        if col.lower() in cols_title:
            temp = temp.str.title()         # cada palabra con mayúscula
        else:
            temp = temp.str.capitalize()    # sólo la primera letra

        df.loc[mask, col] = temp

        if was_categorical:
            df[col] = df[col].astype('category')

    return df

# ✅ Uso: aplicar y guardar la función sobre df_limpio
df = normalizar_categoricas(df)

  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)
  was_categorical = pd.api.types.is_categorical_dtype(s)


In [44]:
df.head()

Unnamed: 0,attrition,businesstravel,dailyrate,department,distancefromhome,education,educationfield,employeenumber,environmentsatisfaction,gender,jobinvolvement,joblevel,jobrole,jobsatisfaction,maritalstatus,monthlyincome,monthlyrate,numcompaniesworked,overtime,percentsalaryhike,performancerating,relationshipsatisfaction,standardhours,stockoptionlevel,totalworkingyears,trainingtimeslastyear,worklifebalance,yearsatcompany,yearssincelastpromotion,yearswithcurrmanager,datebirth,salary,remotework
0,No,,2015.722222,,6,3,,1,1,F,3,5,Research Director,3,,"16280,83$","42330,17$",7,No,13,30,3,Full time,0,,5,30.0,20,15,15,1972,"195370,00$",Yes
1,No,,2063.388889,,1,4,Life sciences,2,3,F,2,5,Manager,3,,,"43331,17$",0,,14,30,1,,1,340.0,5,30.0,33,11,9,1971,"199990,00$",Yes
2,No,Travel rarely,1984.253968,Research & Development,4,2,Technical degree,3,3,F,3,5,Manager,4,Married,,"41669,33$",1,No,11,30,4,,0,220.0,3,,22,11,15,1981,"192320,00$",Yes
3,No,Travel rarely,1771.404762,,2,4,Medical,4,1,M,3,4,Research Director,3,Married,"14307,50$","37199,50$",3,,19,30,2,Full time,2,,2,,20,5,6,1976,"171690,00$",No
4,No,,1582.771346,,3,3,Technical degree,5,1,M,4,4,Sales Executive,1,Divorced,"12783,92$","33238,20$",2,No,12,30,4,,1,,5,30.0,19,2,8,1977,,No


In [45]:
# Creamos una función para renombrar todas las columnas
import re

def normalizar_nombres_columnas(df):
    """
    Normaliza nombres de columnas:
      - Inserta '_' entre palabras (detecta camelCase y letras/números)
      - Capitaliza la primera letra de cada palabra
    """
    nuevas_columnas = []

    for col in df.columns:
        # 1. Elimina espacios al inicio y final
        col = col.strip()

        # 2. Inserta '_' entre palabras (camelCase o letras/números)
        col = re.sub(r'(?<=[a-z])(?=[A-Z])', '_', col)     # minúscula→Mayúscula
        col = re.sub(r'(?<=[a-zA-Z])(?=[0-9])', '_', col)  # letra→dígito
        col = re.sub(r'(?<=[0-9])(?=[a-zA-Z])', '_', col)  # dígito→letra

        # 3. Pasar a minúsculas
        col = col.lower()

        # 4. Capitalizar la primera letra de cada palabra
        partes = col.split('_')
        partes = [p.capitalize() for p in partes]

        # 5. Volver a unir con '_'
        nuevo_nombre = '_'.join(partes)

        nuevas_columnas.append(nuevo_nombre)

    df.columns = nuevas_columnas
    return df


# Uso: aplicar y guardar la función sobre df_limpio
df = normalizar_nombres_columnas(df)


In [46]:
df.head()

Unnamed: 0,Attrition,Businesstravel,Dailyrate,Department,Distancefromhome,Education,Educationfield,Employeenumber,Environmentsatisfaction,Gender,Jobinvolvement,Joblevel,Jobrole,Jobsatisfaction,Maritalstatus,Monthlyincome,Monthlyrate,Numcompaniesworked,Overtime,Percentsalaryhike,Performancerating,Relationshipsatisfaction,Standardhours,Stockoptionlevel,Totalworkingyears,Trainingtimeslastyear,Worklifebalance,Yearsatcompany,Yearssincelastpromotion,Yearswithcurrmanager,Datebirth,Salary,Remotework
0,No,,2015.722222,,6,3,,1,1,F,3,5,Research Director,3,,"16280,83$","42330,17$",7,No,13,30,3,Full time,0,,5,30.0,20,15,15,1972,"195370,00$",Yes
1,No,,2063.388889,,1,4,Life sciences,2,3,F,2,5,Manager,3,,,"43331,17$",0,,14,30,1,,1,340.0,5,30.0,33,11,9,1971,"199990,00$",Yes
2,No,Travel rarely,1984.253968,Research & Development,4,2,Technical degree,3,3,F,3,5,Manager,4,Married,,"41669,33$",1,No,11,30,4,,0,220.0,3,,22,11,15,1981,"192320,00$",Yes
3,No,Travel rarely,1771.404762,,2,4,Medical,4,1,M,3,4,Research Director,3,Married,"14307,50$","37199,50$",3,,19,30,2,Full time,2,,2,,20,5,6,1976,"171690,00$",No
4,No,,1582.771346,,3,3,Technical degree,5,1,M,4,4,Sales Executive,1,Divorced,"12783,92$","33238,20$",2,No,12,30,4,,1,,5,30.0,19,2,8,1977,,No


Fallos tipográficos para columnas numéricas (redondeo de decimales a 2 dígitos, reemplazar ',' por '.') 

Función para normalizar el texto (bucle for para pasar por todas las columnas numéricas)

In [47]:
# 1. Identificamos columnas numéricas
columnas_numericas = df.select_dtypes(include="number").columns
print("Columnas numéricas:", list(columnas_numericas))

# 2. Definimos una función para normalizar números
def normalizar_numeros(columna):
    # Reemplazamos comas por puntos y convertimos a numérico
    columna = columna.astype(str).str.replace(",", ".", regex=False)
    columna = pd.to_numeric(columna, errors="coerce")  # convierte a número
    return columna.round(2)  # redondeamos a 2 decimales

# 3. Aplicamos la función a cada columna numérica
for col in columnas_numericas:
    df[col] = normalizar_numeros(df[col])

# 4. Revisamos las primeras filas
df[columnas_numericas].head()

Columnas numéricas: ['Dailyrate', 'Distancefromhome', 'Education', 'Employeenumber', 'Environmentsatisfaction', 'Jobinvolvement', 'Joblevel', 'Jobsatisfaction', 'Numcompaniesworked', 'Percentsalaryhike', 'Relationshipsatisfaction', 'Stockoptionlevel', 'Trainingtimeslastyear', 'Yearsatcompany', 'Yearssincelastpromotion', 'Yearswithcurrmanager', 'Datebirth']


Unnamed: 0,Dailyrate,Distancefromhome,Education,Employeenumber,Environmentsatisfaction,Jobinvolvement,Joblevel,Jobsatisfaction,Numcompaniesworked,Percentsalaryhike,Relationshipsatisfaction,Stockoptionlevel,Trainingtimeslastyear,Yearsatcompany,Yearssincelastpromotion,Yearswithcurrmanager,Datebirth
0,2015.72,6,3,1,1,3,5,3,7,13,3,0,5,20,15,15,1972
1,2063.39,1,4,2,3,2,5,3,0,14,1,1,5,33,11,9,1971
2,1984.25,4,2,3,3,3,5,4,1,11,4,0,3,22,11,15,1981
3,1771.4,2,4,4,1,3,4,3,3,19,2,2,2,20,5,6,1976
4,1582.77,3,3,5,1,4,4,1,2,12,4,1,5,19,2,8,1977


In [48]:

# Lista de columnas a limpiar 
columnas_a_limpiar = [
    "Totalworkingyears",
    "Performancerating",
    "Monthlyrate",
    "Monthlyincome",
    "Worklifebalance",
    "Salary"
]

# Recorremos cada columna y limpiamos
for col in columnas_a_limpiar:
    if col in df.columns:   # solo si la columna existe en el DataFrame
        # Convertimos a string, cambiamos coma por punto, quitamos símbolos raros
        df[col] = (df[col].astype(str)
                                        .str.replace(",", ".", regex=False)   # coma → punto
                                        .str.replace(r"[^\d\.\-]", "", regex=True))  # quitamos $ y letras
        
        # Convertimos a número y redondeamos a 2 decimales
        df[col] = pd.to_numeric(df[col], errors="coerce").round(2)

# Verificamos el resultado
df[columnas_a_limpiar].head()

Unnamed: 0,Totalworkingyears,Performancerating,Monthlyrate,Monthlyincome,Worklifebalance,Salary
0,,3.0,42330.17,16280.83,3.0,195370.0
1,34.0,3.0,43331.17,,3.0,199990.0
2,22.0,3.0,41669.33,,,192320.0
3,,3.0,37199.5,14307.5,,171690.0
4,,3.0,33238.2,12783.92,3.0,


In [49]:
df.head(30)

Unnamed: 0,Attrition,Businesstravel,Dailyrate,Department,Distancefromhome,Education,Educationfield,Employeenumber,Environmentsatisfaction,Gender,Jobinvolvement,Joblevel,Jobrole,Jobsatisfaction,Maritalstatus,Monthlyincome,Monthlyrate,Numcompaniesworked,Overtime,Percentsalaryhike,Performancerating,Relationshipsatisfaction,Standardhours,Stockoptionlevel,Totalworkingyears,Trainingtimeslastyear,Worklifebalance,Yearsatcompany,Yearssincelastpromotion,Yearswithcurrmanager,Datebirth,Salary,Remotework
0,No,,2015.72,,6,3,,1,1,F,3,5,Research Director,3,,16280.83,42330.17,7,No,13,3.0,3,Full time,0,,5,3.0,20,15,15,1972,195370.0,Yes
1,No,,2063.39,,1,4,Life sciences,2,3,F,2,5,Manager,3,,,43331.17,0,,14,3.0,1,,1,34.0,5,3.0,33,11,9,1971,199990.0,Yes
2,No,Travel rarely,1984.25,Research & Development,4,2,Technical degree,3,3,F,3,5,Manager,4,Married,,41669.33,1,No,11,3.0,4,,0,22.0,3,,22,11,15,1981,192320.0,Yes
3,No,Travel rarely,1771.4,,2,4,Medical,4,1,M,3,4,Research Director,3,Married,14307.5,37199.5,3,,19,3.0,2,Full time,2,,2,,20,5,6,1976,171690.0,No
4,No,,1582.77,,3,3,Technical degree,5,1,M,4,4,Sales Executive,1,Divorced,12783.92,33238.2,2,No,12,3.0,4,,1,,5,3.0,19,2,8,1977,,No
5,No,,1771.92,Research & Development,22,3,Medical,6,4,M,3,4,Manager,4,,14311.67,37210.33,3,No,11,3.0,2,,1,,3,3.0,22,4,7,1975,,Yes
6,No,,1032.49,,25,3,Life sciences,7,1,M,3,3,Sales Executive,1,,8339.32,21682.23,7,,11,3.0,4,Part time,0,28.0,3,2.0,21,7,9,1964,100071.84,Yes
7,No,Travel rarely,556.26,,1,1,,8,2,F,3,2,Sales Executive,3,Married,,11681.39,1,No,25,4.0,3,Part time,0,20.0,3,3.0,20,11,6,1981,53914.11,No
8,No,,1712.18,,2,5,,9,2,M,3,4,Manager,1,Married,13829.17,35955.83,7,No,16,3.0,2,Full time,1,22.0,2,3.0,18,11,8,1982,165950.0,Yes
9,No,Travel frequently,1973.98,,9,3,,10,1,F,3,5,Research Director,3,,15943.72,41453.67,2,No,17,3.0,2,,1,21.0,2,4.0,18,0,11,1982,,No


Columna 'distancefromhome' (viene con valores negativos) 

Función para convertir a valor absoluto

In [50]:
# 1. Definimos una función que convierte los valores a absolutos
def normalizar_distancia(columna):
    return columna.abs()   # abs() convierte -5 → 5, deja los positivos igual

# 2. Aplicamos la función a la columna 'distancefromhome'
df["Distancefromhome"] = normalizar_distancia(df["Distancefromhome"])

# 3. Revisamos las primeras filas para comprobar
print(df["Distancefromhome"].head(10))

0     6
1     1
2     4
3     2
4     3
5    22
6    25
7     1
8     2
9     9
Name: Distancefromhome, dtype: int64


In [51]:
df.head(30)

Unnamed: 0,Attrition,Businesstravel,Dailyrate,Department,Distancefromhome,Education,Educationfield,Employeenumber,Environmentsatisfaction,Gender,Jobinvolvement,Joblevel,Jobrole,Jobsatisfaction,Maritalstatus,Monthlyincome,Monthlyrate,Numcompaniesworked,Overtime,Percentsalaryhike,Performancerating,Relationshipsatisfaction,Standardhours,Stockoptionlevel,Totalworkingyears,Trainingtimeslastyear,Worklifebalance,Yearsatcompany,Yearssincelastpromotion,Yearswithcurrmanager,Datebirth,Salary,Remotework
0,No,,2015.72,,6,3,,1,1,F,3,5,Research Director,3,,16280.83,42330.17,7,No,13,3.0,3,Full time,0,,5,3.0,20,15,15,1972,195370.0,Yes
1,No,,2063.39,,1,4,Life sciences,2,3,F,2,5,Manager,3,,,43331.17,0,,14,3.0,1,,1,34.0,5,3.0,33,11,9,1971,199990.0,Yes
2,No,Travel rarely,1984.25,Research & Development,4,2,Technical degree,3,3,F,3,5,Manager,4,Married,,41669.33,1,No,11,3.0,4,,0,22.0,3,,22,11,15,1981,192320.0,Yes
3,No,Travel rarely,1771.4,,2,4,Medical,4,1,M,3,4,Research Director,3,Married,14307.5,37199.5,3,,19,3.0,2,Full time,2,,2,,20,5,6,1976,171690.0,No
4,No,,1582.77,,3,3,Technical degree,5,1,M,4,4,Sales Executive,1,Divorced,12783.92,33238.2,2,No,12,3.0,4,,1,,5,3.0,19,2,8,1977,,No
5,No,,1771.92,Research & Development,22,3,Medical,6,4,M,3,4,Manager,4,,14311.67,37210.33,3,No,11,3.0,2,,1,,3,3.0,22,4,7,1975,,Yes
6,No,,1032.49,,25,3,Life sciences,7,1,M,3,3,Sales Executive,1,,8339.32,21682.23,7,,11,3.0,4,Part time,0,28.0,3,2.0,21,7,9,1964,100071.84,Yes
7,No,Travel rarely,556.26,,1,1,,8,2,F,3,2,Sales Executive,3,Married,,11681.39,1,No,25,4.0,3,Part time,0,20.0,3,3.0,20,11,6,1981,53914.11,No
8,No,,1712.18,,2,5,,9,2,M,3,4,Manager,1,Married,13829.17,35955.83,7,No,16,3.0,2,Full time,1,22.0,2,3.0,18,11,8,1982,165950.0,Yes
9,No,Travel frequently,1973.98,,9,3,,10,1,F,3,5,Research Director,3,,15943.72,41453.67,2,No,17,3.0,2,,1,21.0,2,4.0,18,0,11,1982,,No


In [53]:
# Detectar valores únicos en columnas categóricas
columnas_categoricas = df.select_dtypes(include="object").columns

for col in columnas_categoricas:
    print(f"\nColumna: {col}")
    print(df[col].unique())


Columna: Attrition
['No' 'Yes']

Columna: Businesstravel
[nan 'Travel rarely' 'Travel frequently' 'Non-travel']

Columna: Department
[nan 'Research & Development' 'Sales' 'Human Resources']

Columna: Educationfield
[nan 'Life sciences' 'Technical degree' 'Medical' 'Other' 'Marketing'
 'Human resources']

Columna: Gender
['F' 'M']

Columna: Jobrole
['Research Director' 'Manager' 'Sales Executive' 'Manufacturing Director'
 'Research Scientist' 'Healthcare Representative' 'Laboratory Technician'
 'Sales Representative' 'Human Resources']

Columna: Maritalstatus
[nan 'Married' 'Divorced' 'Single' 'Marreid']

Columna: Overtime
['No' nan 'Yes']

Columna: Standardhours
['Full time' nan 'Part time']

Columna: Remotework
['Yes' 'No']


In [54]:

# 1. Seleccionamos solo columnas categóricas
columnas_categoricas = df.select_dtypes(include="object").columns

# 2. Mostramos valores únicos ANTES de corregir
print("=== VALORES ÚNICOS ANTES DE CORREGIR ===")
for col in columnas_categoricas:
    print(f"\nColumna: {col}")
    print(df[col].unique())

# 3. Correcciones manuales (ejemplos comunes, ajusta según tus datos)
correcciones = {
    "maritalstatus": {
        "Marreid": "Married",   # error tipográfico
        "Marrid": "Married"
    },
    "gender": {
        "Femal": "Female",
        "F": "Female",
        "M": "Male"
    },
    "jobrole": {
        "ManaGER": "Manager",
        "Sales Excecutive": "Sales Executive"
    },
    "department": {
        "Resarch & Development": "Research & Development"
    }
}

# 4. Aplicamos las correcciones
for col, reemplazos in correcciones.items():
    if col in df.columns:
        df[col] = df[col].replace(reemplazos)

# 5. Mostramos valores únicos DESPUÉS de corregir
print("\n=== VALORES ÚNICOS DESPUÉS DE CORREGIR ===")
for col in correcciones.keys():
    if col in df.columns:
        print(f"\nColumna: {col}")
        print(df[col].unique())

# 6. Vista rápida del DataFrame corregido
pd.set_option("display.max_columns", None)  # para ver todas las columnas
print("\n=== PRIMERAS FILAS DEL DF LIMPIO ===")
print(df.head(10))

=== VALORES ÚNICOS ANTES DE CORREGIR ===

Columna: Attrition
['No' 'Yes']

Columna: Businesstravel
[nan 'Travel rarely' 'Travel frequently' 'Non-travel']

Columna: Department
[nan 'Research & Development' 'Sales' 'Human Resources']

Columna: Educationfield
[nan 'Life sciences' 'Technical degree' 'Medical' 'Other' 'Marketing'
 'Human resources']

Columna: Gender
['F' 'M']

Columna: Jobrole
['Research Director' 'Manager' 'Sales Executive' 'Manufacturing Director'
 'Research Scientist' 'Healthcare Representative' 'Laboratory Technician'
 'Sales Representative' 'Human Resources']

Columna: Maritalstatus
[nan 'Married' 'Divorced' 'Single' 'Marreid']

Columna: Overtime
['No' nan 'Yes']

Columna: Standardhours
['Full time' nan 'Part time']

Columna: Remotework
['Yes' 'No']

=== VALORES ÚNICOS DESPUÉS DE CORREGIR ===

=== PRIMERAS FILAS DEL DF LIMPIO ===
  Attrition     Businesstravel  Dailyrate              Department  \
0        No                NaN    2015.72                     NaN   
1  

In [55]:
df.head(30)

Unnamed: 0,Attrition,Businesstravel,Dailyrate,Department,Distancefromhome,Education,Educationfield,Employeenumber,Environmentsatisfaction,Gender,Jobinvolvement,Joblevel,Jobrole,Jobsatisfaction,Maritalstatus,Monthlyincome,Monthlyrate,Numcompaniesworked,Overtime,Percentsalaryhike,Performancerating,Relationshipsatisfaction,Standardhours,Stockoptionlevel,Totalworkingyears,Trainingtimeslastyear,Worklifebalance,Yearsatcompany,Yearssincelastpromotion,Yearswithcurrmanager,Datebirth,Salary,Remotework
0,No,,2015.72,,6,3,,1,1,F,3,5,Research Director,3,,16280.83,42330.17,7,No,13,3.0,3,Full time,0,,5,3.0,20,15,15,1972,195370.0,Yes
1,No,,2063.39,,1,4,Life sciences,2,3,F,2,5,Manager,3,,,43331.17,0,,14,3.0,1,,1,34.0,5,3.0,33,11,9,1971,199990.0,Yes
2,No,Travel rarely,1984.25,Research & Development,4,2,Technical degree,3,3,F,3,5,Manager,4,Married,,41669.33,1,No,11,3.0,4,,0,22.0,3,,22,11,15,1981,192320.0,Yes
3,No,Travel rarely,1771.4,,2,4,Medical,4,1,M,3,4,Research Director,3,Married,14307.5,37199.5,3,,19,3.0,2,Full time,2,,2,,20,5,6,1976,171690.0,No
4,No,,1582.77,,3,3,Technical degree,5,1,M,4,4,Sales Executive,1,Divorced,12783.92,33238.2,2,No,12,3.0,4,,1,,5,3.0,19,2,8,1977,,No
5,No,,1771.92,Research & Development,22,3,Medical,6,4,M,3,4,Manager,4,,14311.67,37210.33,3,No,11,3.0,2,,1,,3,3.0,22,4,7,1975,,Yes
6,No,,1032.49,,25,3,Life sciences,7,1,M,3,3,Sales Executive,1,,8339.32,21682.23,7,,11,3.0,4,Part time,0,28.0,3,2.0,21,7,9,1964,100071.84,Yes
7,No,Travel rarely,556.26,,1,1,,8,2,F,3,2,Sales Executive,3,Married,,11681.39,1,No,25,4.0,3,Part time,0,20.0,3,3.0,20,11,6,1981,53914.11,No
8,No,,1712.18,,2,5,,9,2,M,3,4,Manager,1,Married,13829.17,35955.83,7,No,16,3.0,2,Full time,1,22.0,2,3.0,18,11,8,1982,165950.0,Yes
9,No,Travel frequently,1973.98,,9,3,,10,1,F,3,5,Research Director,3,,15943.72,41453.67,2,No,17,3.0,2,,1,21.0,2,4.0,18,0,11,1982,,No
