# Import Data - CSV Import
**Project 1: IBM HR Analytics - AiDAPT - Cegid Academy**

Imports the file WA_Fn-UseC_-HR-Employee-Attrition.csv via BULK INSERT
and verifies the imported data.

Database: Projeto1_IBM_HR

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
from urllib.parse import quote_plus
from sqlalchemy import create_engine
load_dotenv(find_dotenv())

%load_ext sql

host = os.getenv('MSSQL_HOST', 'localhost')
port = os.getenv('MSSQL_PORT', '1433')
user = os.getenv('MSSQL_USER', 'sa')
password = quote_plus(os.getenv('MSSQL_PASSWORD', 'your_password_here'))
engine = create_engine(f"mssql+pymssql://{user}:{password}@{host}:{port}/Projeto1_IBM_HR")
%sql engine --alias Projeto1_IBM_HR
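
The password is passed through `quote_plus` because characters such as `@`, `:` or `/` would otherwise be read as delimiters in the connection URL. A minimal sketch of the effect, using a made-up password and the default user, host and port from the cell above (the password is an assumption for illustration only):

```python
from urllib.parse import quote_plus

# Hypothetical password containing characters that act as URL delimiters.
raw_password = "p@ss:word/1"
encoded_password = quote_plus(raw_password)

# The encoded form can sit safely between "user:" and "@host".
url = f"mssql+pymssql://sa:{encoded_password}@localhost:1433/Projeto1_IBM_HR"
print(encoded_password)
print(url)
```

With the raw password, the URL would contain two `@` characters and the host portion would be ambiguous.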

---
**WARNING**: This notebook contains setup operations (CREATE/DROP/BULK INSERT).
Run it only if the database structure needs to be recreated.

## PREPARATION

**Preparation (terminal):**
```bash
# Copy the CSV into the Docker container:
docker cp "Projeto_1/enunciado/WA_Fn-UseC_-HR-Employee-Attrition.csv" sqlserver:/var/opt/mssql/data/
```
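
If the copy step should run from the notebook instead of a terminal, it could be wrapped in Python roughly as follows. This is only a sketch: `build_docker_cp` is a hypothetical helper, and the container name and paths are taken from the command above. The actual `subprocess.run` call is left commented out because it needs Docker and a running container.

```python
import subprocess

def build_docker_cp(local_path: str, container: str, dest_dir: str) -> list[str]:
    """Return the argument list for `docker cp <local_path> <container>:<dest_dir>`."""
    return ["docker", "cp", local_path, f"{container}:{dest_dir}"]

cmd = build_docker_cp(
    "Projeto_1/enunciado/WA_Fn-UseC_-HR-Employee-Attrition.csv",
    "sqlserver",
    "/var/opt/mssql/data/",
)
print(cmd)

# Uncomment to perform the copy (requires Docker and the running container):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the file name.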

## IMPORT VIA BULK INSERT

In [2]:
%%sql
BULK INSERT Colaboradores
FROM '/var/opt/mssql/data/WA_Fn-UseC_-HR-Employee-Attrition.csv'
WITH (
    FIRSTROW = 2,           -- Skip the header row
    FIELDTERMINATOR = ',',  -- Field separator
    ROWTERMINATOR = '\n',   -- Row separator
    TABLOCK
);
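
To make the three options concrete, the sketch below mimics them in plain Python on a tiny in-memory sample. It is a conceptual illustration only, not how SQL Server implements the import. `split_like_bulk_insert` is a hypothetical helper, and the sample is abbreviated to four columns (the real file has more).

```python
# Abbreviated, hypothetical sample; the real CSV has more columns.
sample = (
    "EmployeeNumber,Age,Gender,Department\n"
    "1,41,Female,Sales\n"
    "2,49,Male,Research & Development\n"
)

def split_like_bulk_insert(text, first_row=2, field_terminator=",", row_terminator="\n"):
    """Split text by literal terminators and skip the rows before first_row."""
    rows = [r for r in text.split(row_terminator) if r]
    data_rows = rows[first_row - 1:]  # first_row is 1-based, like FIRSTROW
    return [r.split(field_terminator) for r in data_rows]

records = split_like_bulk_insert(sample)
print(records)
```

A literal split like this one does not understand quoted fields, so a value containing a comma would be broken into two fields. It is worth confirming that the source file has no such values before relying on a plain `FIELDTERMINATOR = ','`.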

## POST-IMPORT VERIFICATION

### Count records (expected: 1470)

In [3]:
%%sql
SELECT COUNT(*) AS TotalRegistos FROM Colaboradores;

TotalRegistos
1470
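
The expected count can be cross-checked against the source file itself by counting its data rows locally. A minimal sketch, assuming the file is UTF-8 encoded and has exactly one header line; `count_data_rows` is a hypothetical helper, and the demo runs on a small temporary file rather than the real CSV.

```python
import csv
import tempfile
from pathlib import Path

def count_data_rows(csv_path) -> int:
    """Count the rows of a CSV file, excluding the header line."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header, like FIRSTROW = 2
        return sum(1 for row in reader if row)  # ignore empty trailing lines

# Self-contained demo on a tiny temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.csv"
    demo.write_text("EmployeeNumber,Age\n1,41\n2,49\n4,37\n", encoding="utf-8")
    demo_count = count_data_rows(demo)

print(demo_count)
```

For the real check, the function would be called with the local path of `WA_Fn-UseC_-HR-Employee-Attrition.csv` and the result compared with `TotalRegistos`.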


### Check the first 10 records

In [4]:
%%sql
SELECT TOP 10
    EmployeeNumber, Age, Gender, Department, JobRole, MonthlyIncome, Attrition
FROM Colaboradores
ORDER BY EmployeeNumber;

EmployeeNumber,Age,Gender,Department,JobRole,MonthlyIncome,Attrition
1,41,Female,Sales,Sales Executive,5993,Yes
2,49,Male,Research & Development,Research Scientist,5130,No
4,37,Male,Research & Development,Laboratory Technician,2090,Yes
5,33,Female,Research & Development,Research Scientist,2909,No
7,27,Male,Research & Development,Laboratory Technician,3468,No
8,32,Male,Research & Development,Laboratory Technician,3068,No
10,59,Female,Research & Development,Laboratory Technician,2670,No
11,30,Male,Research & Development,Laboratory Technician,2693,No
12,38,Male,Research & Development,Manufacturing Director,9526,No
13,36,Male,Research & Development,Healthcare Representative,5237,No


### Check basic statistics

In [5]:
%%sql
SELECT
    'Age' AS Coluna,
    MIN(Age) AS Min,
    MAX(Age) AS Max,
    AVG(CAST(Age AS FLOAT)) AS Media
FROM Colaboradores
UNION ALL
SELECT
    'MonthlyIncome',
    MIN(MonthlyIncome),
    MAX(MonthlyIncome),
    AVG(CAST(MonthlyIncome AS FLOAT))
FROM Colaboradores
UNION ALL
SELECT
    'YearsAtCompany',
    MIN(YearsAtCompany),
    MAX(YearsAtCompany),
    AVG(CAST(YearsAtCompany AS FLOAT))
FROM Colaboradores;

Coluna,Min,Max,Media
Age,18,60,36.92380952380952
MonthlyIncome,1009,19999,6502.931292517007
YearsAtCompany,0,40,7.0081632653061225
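
The `CAST(... AS FLOAT)` in the query matters when the columns are integer types (an assumption here, since the table definition is not part of this notebook): SQL Server's `AVG` over an integer column returns an integer and drops the fractional part. The same effect can be illustrated in Python with integer versus true division, using the ages of the first five rows listed earlier.

```python
# Ages of the first five rows from the TOP 10 listing.
ages = [41, 49, 37, 33, 27]

integer_avg = sum(ages) // len(ages)  # analogous to AVG(Age) on an INT column
float_avg = sum(ages) / len(ages)     # analogous to AVG(CAST(Age AS FLOAT))

print(integer_avg, float_avg)
```

Without the cast, the `Media` column would show whole numbers only, hiding the decimals that help distinguish a correct import from a slightly wrong one.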


### Check distinct values of the categorical columns

In [6]:
%%sql
SELECT 'Gender' AS Coluna, Gender AS Valor, COUNT(*) AS Total FROM Colaboradores GROUP BY Gender
UNION ALL
SELECT 'Department', Department, COUNT(*) FROM Colaboradores GROUP BY Department
UNION ALL
SELECT 'Attrition', Attrition, COUNT(*) FROM Colaboradores GROUP BY Attrition;

Coluna,Valor,Total
Gender,Female,588
Gender,Male,882
Department,Human Resources,63
Department,Research & Development,961
Department,Sales,446
Attrition,No,1233
Attrition,Yes,237
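
A quick consistency check on this output: the totals of each categorical column should add up to the 1470 imported records. The sketch below copies the counts from the output above and verifies the sums; the attrition rate is derived from the same numbers.

```python
EXPECTED_TOTAL = 1470

# Counts copied from the query output above.
counts = {
    "Gender": {"Female": 588, "Male": 882},
    "Department": {"Human Resources": 63, "Research & Development": 961, "Sales": 446},
    "Attrition": {"No": 1233, "Yes": 237},
}

# Sum the category counts per column and compare with the record count.
totals = {column: sum(values.values()) for column, values in counts.items()}
for column, total in totals.items():
    status = "OK" if total == EXPECTED_TOTAL else "MISMATCH"
    print(f"{column}: {total} {status}")

attrition_rate = counts["Attrition"]["Yes"] / EXPECTED_TOTAL
print(f"Attrition rate: {attrition_rate:.1%}")
```

A column whose total falls short of 1470 would point to NULLs or to rows lost during import.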


### Check for null values in the main columns

In [7]:
%%sql
SELECT
    SUM(CASE WHEN Age IS NULL THEN 1 ELSE 0 END) AS Age_Nulls,
    SUM(CASE WHEN Gender IS NULL THEN 1 ELSE 0 END) AS Gender_Nulls,
    SUM(CASE WHEN Department IS NULL THEN 1 ELSE 0 END) AS Department_Nulls,
    SUM(CASE WHEN MonthlyIncome IS NULL THEN 1 ELSE 0 END) AS MonthlyIncome_Nulls
FROM Colaboradores;

Age_Nulls,Gender_Nulls,Department_Nulls,MonthlyIncome_Nulls
0,0,0,0
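
The query above covers four columns. To extend the same check to other columns without writing each `SUM(CASE ...)` by hand, the statement could be generated. This is a sketch with a hypothetical helper, `build_null_check`; because it interpolates identifiers directly into SQL, it should only be used with trusted, hard-coded column names.

```python
def build_null_check(table: str, columns: list[str]) -> str:
    """Build a SELECT that counts NULLs per column using SUM(CASE ...)."""
    parts = [
        f"    SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) AS {col}_Nulls"
        for col in columns
    ]
    return "SELECT\n" + ",\n".join(parts) + f"\nFROM {table};"

# Column names taken from the queries earlier in this notebook.
query = build_null_check("Colaboradores", ["Age", "Gender", "Attrition", "JobRole"])
print(query)
```

The resulting string can be pasted into a `%%sql` cell or executed through the SQLAlchemy `engine` created at the top of the notebook.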


## Notes

- **1470 records** imported from the CSV
- **BULK INSERT** requires the file to be accessible inside the Docker container
- Command (adjust the local path to your working directory): `docker cp "enunciado/WA_Fn-UseC_-HR-Employee-Attrition.csv" sqlserver:/var/opt/mssql/data/`