## Database Structure Design



| Relationship                          | Type  | Explanation                                                                 |
| ------------------------------------- | ----- | --------------------------------------------------------------------------- |
| `employee` → `employee_demographics`  | **1:1** | Each employee has a single row with their demographic information           |
| `employee` → `employee_professional`  | **1:1** | Each employee has a single registered professional profile                  |
| `employee` → `employee_financial`     | **1:1** | Each employee has a unique set of financial data                            |
| `employee` → `employee_satisfaction`  | **1:1** | Each employee has a single satisfaction and performance evaluation          |
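
One common way to enforce a 1:1 relationship like those in the table above is to make the child table's foreign key also its primary key, so a second row for the same employee is rejected. The sketch below illustrates the idea with Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption, so that it runs without a server). The tables are trimmed-down versions of `employee` and `employee_demographics`, and `try_insert` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE employee (employeenumber INTEGER PRIMARY KEY, attrition TEXT)")
conn.execute("""
    CREATE TABLE employee_demographics (
        employeenumber INTEGER PRIMARY KEY,
        age INTEGER,
        FOREIGN KEY (employeenumber) REFERENCES employee(employeenumber)
    )
""")

conn.execute("INSERT INTO employee VALUES (1, 'No')")
conn.execute("INSERT INTO employee_demographics VALUES (1, 41)")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second demographics row for employee 1 violates the primary key.
duplicate_accepted = try_insert("INSERT INTO employee_demographics VALUES (1, 42)")
# A demographics row for a non-existent employee violates the foreign key.
orphan_accepted = try_insert("INSERT INTO employee_demographics VALUES (99, 30)")
```

Both inserts are rejected: the primary key allows at most one child row per employee, and the foreign key requires the parent row to exist.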


## Table and Column Structure



### employee
- employeenumber (PK)
- attrition

### employee_demographics
- employeenumber (FK)
- age
- gender
- marital_status
- date_birth
- generation
- education
- educationfield
- distance_from_home

### employee_professional
- employeenumber (FK)
- job_role
- department
- years_at_company
- num_companies_worked
- over_time
- training_times_last_year
- years_in_current_role
- job_level
- business_travel
- standard_hours
- remotework

### employee_financial
- employeenumber (FK)
- monthly_income
- monthly_rate
- salary
- stock_option_level
- daily_rate
- percent_salary_hike

### employee_satisfaction
- employeenumber (FK)
- job_satisfaction
- performance_rating
- relationship_satisfaction
- work_life_balance
- environment_satisfaction
- job_involvement


In [None]:
import mysql.connector
from mysql.connector import errorcode

import pandas as pd

Create the database in MySQL

In [None]:
try:
  cnx = mysql.connector.connect(user='root', password='AlumnaAdalab',
                              host='127.0.0.1',
                              database='abc_corporation')
  print("ok")
# if the connection fails because of an error
except mysql.connector.Error as err:

  # if the credentials are rejected, report that access was denied
  if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
    print("Something is wrong with your user name or password")
  
  # if the problem is not the credentials, the database may not exist, so report that
  elif err.errno == errorcode.ER_BAD_DB_ERROR:
    print("Database does not exist")
  
  # for any other error, print the error returned by the connection attempt
  else:
    print(err)
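
The branching above can also be expressed as a lookup table, which keeps the messages in one place. This is only a sketch: `describe_connection_error` is a hypothetical helper, and the two constants are written out as the numeric MySQL server error codes (1045 and 1049) so the example runs without `mysql.connector` installed. In the notebook you would use the `errorcode` attributes instead.

```python
# Numeric MySQL server error codes, hard-coded here as an assumption so the
# sketch has no third-party dependency. In mysql.connector they are exposed
# as errorcode.ER_ACCESS_DENIED_ERROR and errorcode.ER_BAD_DB_ERROR.
ER_ACCESS_DENIED_ERROR = 1045
ER_BAD_DB_ERROR = 1049

# Map each known error number to the message printed in the cell above.
MESSAGES = {
    ER_ACCESS_DENIED_ERROR: "Something is wrong with your user name or password",
    ER_BAD_DB_ERROR: "Database does not exist",
}


def describe_connection_error(errno, fallback):
    """Return a friendly message for a known errno, else the fallback text."""
    return MESSAGES.get(errno, fallback)


denied = describe_connection_error(1045, "unknown error")
missing_db = describe_connection_error(1049, "unknown error")
other = describe_connection_error(2003, "unknown error")
```

Inside the `except` block this would be called as `print(describe_connection_error(err.errno, str(err)))`.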


## Creating the Database

In [None]:
import pandas as pd

# Read the CSV file
df = pd.read_csv('hr_data_cleaned.csv')

# Check the columns and data
df.head(1)


In [None]:
# In SQL we run CREATE SCHEMA abc_corporation
cursor = cnx.cursor()
cursor.execute("USE abc_corporation;")

### employee
- employeenumber (PK)
- attrition

In [None]:
# employee: main table
cursor.execute("""
CREATE TABLE IF NOT EXISTS employee (
    employeenumber INT PRIMARY KEY,
    attrition VARCHAR(10)
);
""")


In [None]:
employee_data = df[['employeenumber', 'attrition']].values.tolist()
cursor.executemany("""
    INSERT INTO employee (employeenumber, attrition)
    VALUES (%s, %s)
""", employee_data)



# 📌 What is .tolist() used for?
#
# .tolist() converts a pandas Series or a NumPy array into a Python list.
#
# Applied to a Series, it returns a flat list:
#     [value1, value2, value3, ...]
#
# A DataFrame has no .tolist() method of its own, so (as in our case) we call
# it on df.values, which returns a list of lists:
#     [[row1_col1, row1_col2, ...], [row2_col1, row2_col2, ...], ...]
#
# This is especially useful with cursor.executemany() in MySQL,
# which expects a sequence of tuples or lists, one per record to insert.
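
As a small illustration of the shapes described above, the sketch below builds a two-row DataFrame (made-up values, not taken from `hr_data_cleaned.csv`) and converts it in three ways. The `itertuples` variant is shown as an alternative that yields tuples instead of lists.

```python
import pandas as pd

# Tiny illustrative DataFrame; the values are invented for the example.
df_example = pd.DataFrame({
    "employeenumber": [1, 2],
    "attrition": ["Yes", "No"],
})

# Series -> flat list
flat = df_example["employeenumber"].tolist()

# DataFrame -> NumPy array -> list of lists (one inner list per row)
rows = df_example[["employeenumber", "attrition"]].values.tolist()

# Alternative: plain tuples, one per row, without the index
as_tuples = list(df_example.itertuples(index=False, name=None))

print(flat)
print(rows)
print(as_tuples)
```

Either `rows` or `as_tuples` has the row-per-record shape that `executemany()` expects.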





### employee_demographics
- employeenumber (FK)
- age
- gender
- marital_status
- date_birth
- generation
- education
- educationfield
- distance_from_home


In [None]:
# employee_demographics: demographic data
cursor.execute("""
CREATE TABLE IF NOT EXISTS employee_demographics (
    employeenumber INT PRIMARY KEY,
    age INT,
    gender VARCHAR(20),
    marital_status VARCHAR(20),
    date_birth INT,
    generation VARCHAR(20),
    education INT,
    educationfield VARCHAR(100),
    distance_from_home INT,
    FOREIGN KEY (employeenumber) REFERENCES employee(employeenumber)
);
""")


In [None]:
# Select the demographic columns; their order must match the INSERT below
demo_df = df[['employeenumber', 'age', 'gender', 'maritalstatus', 'datebirth',
              'generation', 'education', 'educationfield', 'distancefromhome']]

# Replace NaN with None, the null value MySQL accepts when inserting from a DataFrame.
# The mask is built from demo_df itself so its columns line up with the data, and the
# cast to object lets numeric columns hold None instead of coercing it back to NaN.
demo_data = demo_df.astype(object).where(pd.notnull(demo_df), None).values.tolist()

cursor.executemany("""
    INSERT INTO employee_demographics (
        employeenumber, age, gender, marital_status, date_birth,
        generation, education, educationfield, distance_from_home
    )
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
""", demo_data)

cnx.commit()
print("Data inserted successfully into the employee_demographics table.")


# What is .where(pd.notnull(df), None) used for?
#
# It replaces every missing value (NaN) in a DataFrame
# with Python's null value (`None`).
#
# Why is it needed?
# - MySQL does not treat `NaN` (Not a Number) as a null value.
# - When inserting data from pandas with `executemany()`, null values must be
#   represented as `None`, which the connector sends to MySQL as NULL.
#
# Practical example:
# df = df.astype(object).where(pd.notnull(df), None)
# This leaves valid values unchanged and turns only the NaN values into None.
# The cast to object matters for numeric columns, which in recent pandas
# versions may otherwise coerce None back to NaN.
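
The same replacement can also be done after `.values.tolist()`, on the plain Python rows, which avoids any dependence on how a given pandas version handles `None` in numeric columns. This is a minimal sketch: `nan_to_none` is a hypothetical helper and the rows are made-up values.

```python
import math


def nan_to_none(rows):
    """Return a copy of rows with every float NaN replaced by None."""
    return [
        [None if isinstance(value, float) and math.isnan(value) else value
         for value in row]
        for row in rows
    ]


# Rows shaped like the output of df.values.tolist(); values are invented.
rows = [
    [1, 41.0, "Male"],
    [2, float("nan"), "Female"],
]

clean_rows = nan_to_none(rows)
print(clean_rows)
```

The cleaned list can then be passed to `cursor.executemany()` in place of the original rows.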



### employee_professional
- employeenumber (FK)
- job_role
- department
- years_at_company
- num_companies_worked
- over_time
- training_times_last_year
- years_in_current_role
- job_level
- business_travel
- standard_hours
- remotework


In [None]:
# employee_professional: professional information
cursor.execute("""
CREATE TABLE IF NOT EXISTS employee_professional (
    employeenumber INT PRIMARY KEY,
    job_role VARCHAR(100),
    department VARCHAR(100),
    years_at_company INT,
    num_companies_worked INT,
    over_time VARCHAR(10),
    training_times_last_year INT,
    years_in_current_role FLOAT,
    job_level INT,
    business_travel VARCHAR(50),
    standard_hours VARCHAR(10),
    remotework BOOLEAN,
    FOREIGN KEY (employeenumber) REFERENCES employee(employeenumber)
);
""")


In [None]:
# 1. Select and rename the columns first
prof_df = df[['employeenumber', 'jobrole', 'department', 'yearsatcompany',
              'numcompaniesworked', 'overtime', 'trainingtimeslastyear',
              'yearsincurrentrole', 'joblevel', 'businesstravel',
              'standardhours', 'remotework']].rename(columns={
    'yearsatcompany': 'years_at_company',
    'numcompaniesworked': 'num_companies_worked',
    'overtime': 'over_time',
    'trainingtimeslastyear': 'training_times_last_year',
    'yearsincurrentrole': 'years_in_current_role',
    'joblevel': 'job_level',
    'businesstravel': 'business_travel',
    'standardhours': 'standard_hours'
})

# 2. Replace NaN with None in the reduced DataFrame
#    (cast to object so numeric columns can hold None)
prof_df = prof_df.astype(object).where(pd.notnull(prof_df), None)

# 3. Convert to a list of rows
prof_data = prof_df.values.tolist()

# 4. Insert into the database
cursor.executemany("""
    INSERT INTO employee_professional (
        employeenumber, job_role, department, years_at_company,
        num_companies_worked, over_time, training_times_last_year,
        years_in_current_role, job_level, business_travel,
        standard_hours, remotework
    )
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
""", prof_data)


print("Data inserted successfully into the employee_professional table.")



### employee_financial
- employeenumber (FK)
- monthly_income
- monthly_rate
- salary
- stock_option_level
- daily_rate
- percent_salary_hike


In [None]:
# employee_financial: financial data
cursor.execute("""
CREATE TABLE IF NOT EXISTS employee_financial (
    employeenumber INT PRIMARY KEY,
    monthly_income FLOAT,
    monthly_rate FLOAT,
    salary FLOAT,
    stock_option_level INT,
    daily_rate FLOAT,
    percent_salary_hike INT,
    FOREIGN KEY (employeenumber) REFERENCES employee(employeenumber)
);
""")


In [None]:
fin_data = df[['employeenumber', 'monthlyincome_$', 'monthlyrate_$', 'salary',
               'stockoptionlevel', 'dailyrate', 'percentsalaryhike']] \
               .rename(columns={
                   'monthlyincome_$': 'monthly_income',
                   'monthlyrate_$': 'monthly_rate',
                   'stockoptionlevel': 'stock_option_level',
                   'dailyrate': 'daily_rate',
                   'percentsalaryhike': 'percent_salary_hike'
               }).values.tolist()

cursor.executemany("""
    INSERT INTO employee_financial (
        employeenumber, monthly_income, monthly_rate, salary,
        stock_option_level, daily_rate, percent_salary_hike
    )
    VALUES (%s, %s, %s, %s, %s, %s, %s)
""", fin_data)



### employee_satisfaction
- employeenumber (FK)
- job_satisfaction
- performance_rating
- relationship_satisfaction
- work_life_balance
- environment_satisfaction
- job_involvement


In [None]:
# employee_satisfaction: satisfaction indicators
cursor.execute("""
CREATE TABLE IF NOT EXISTS employee_satisfaction (
    employeenumber INT PRIMARY KEY,
    job_satisfaction INT,
    performance_rating FLOAT,
    relationship_satisfaction INT,
    work_life_balance VARCHAR(50),
    environment_satisfaction INT,
    job_involvement INT,
    FOREIGN KEY (employeenumber) REFERENCES employee(employeenumber)
);
""")


In [None]:
sat_data = df[['employeenumber', 'jobsatisfaction', 'performancerating',
               'relationshipsatisfaction', 'worklifebalance',
               'environmentsatisfaction', 'jobinvolvement']] \
               .rename(columns={
                   'jobsatisfaction': 'job_satisfaction',
                   'performancerating': 'performance_rating',
                   'relationshipsatisfaction': 'relationship_satisfaction',
                   'worklifebalance': 'work_life_balance',
                   'environmentsatisfaction': 'environment_satisfaction',
                   'jobinvolvement': 'job_involvement'
               }).values.tolist()

cursor.executemany("""
    INSERT INTO employee_satisfaction (
        employeenumber, job_satisfaction, performance_rating,
        relationship_satisfaction, work_life_balance,
        environment_satisfaction, job_involvement
    )
    VALUES (%s, %s, %s, %s, %s, %s, %s)
""", sat_data)


In [None]:
cnx.commit()
print("Data inserted successfully into all tables.")
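
Because every relationship is 1:1, a quick sanity check after loading is that each table holds the same number of rows as `employee`. The sketch below assumes a hypothetical helper, `table_row_counts`, that works with any DB-API cursor. It is demonstrated against an in-memory `sqlite3` database as a stand-in for MySQL, with two simplified tables and made-up rows.

```python
import sqlite3


def table_row_counts(cursor, tables):
    """Return {table_name: row_count} for each table in `tables`."""
    counts = {}
    for table in tables:
        # Table names cannot be bound as parameters, so only pass trusted names.
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cursor.fetchone()[0]
    return counts


# Stand-in database with two simplified tables (assumption for the demo).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (employeenumber INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE employee_financial (employeenumber INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,)])
cur.executemany("INSERT INTO employee_financial VALUES (?)", [(1,), (2,)])

counts = table_row_counts(cur, ["employee", "employee_financial"])
all_match = len(set(counts.values())) == 1
print(counts, all_match)
```

In the notebook, the same function could be called with the existing `cursor` and the five table names.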


In [None]:
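# Connection settings collected in a dict
config = {
  'user': 'root',
  'password': 'AlumnaAdalab',
  'host': '127.0.0.1',
  'database': 'tienda',
  'raise_on_warnings': True
}

# Open a connection with the settings above, then close it
cnx = mysql.connector.connect(**config)

cnx.close()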