# Import Data - CSV Import
**Project 1: IBM HR Analytics - AiDAPT - Cegid Academy**

Imports the file WA_Fn-UseC_-HR-Employee-Attrition.csv via BULK INSERT
and verifies the imported data.

Database: Projeto1_IBM_HR

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

%load_ext sql

password = os.getenv('MSSQL_PASSWORD', 'your_password_here')
connection_url = f"mssql+pymssql://sa:{password}@localhost:1433/Projeto1_IBM_HR"
%sql {connection_url}
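
If the password contains characters such as `@`, `:` or `/`, interpolating it directly into the URL can produce a string that does not parse as intended. A minimal sketch, assuming the same `sa` user, host, port and database as above; `build_connection_url` is a hypothetical helper, not part of the original notebook, and the password shown is made up:

```python
from urllib.parse import quote


def build_connection_url(password: str,
                         user: str = "sa",
                         host: str = "localhost",
                         port: int = 1433,
                         database: str = "Projeto1_IBM_HR") -> str:
    """Build the connection URL with the password percent-encoded."""
    # safe="" makes quote() escape every reserved character, including '/'
    encoded = quote(password, safe="")
    return f"mssql+pymssql://{user}:{encoded}@{host}:{port}/{database}"


# Illustrative password only; the notebook reads the real one from MSSQL_PASSWORD.
print(build_connection_url("p@ss:w/rd"))
```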

---
**WARNING**: This notebook contains setup operations (CREATE/DROP/BULK INSERT).
Run it only if the database structure needs to be recreated.

## PREPARATION

**Preparation (terminal):**
```bash
# Copy the CSV into the Docker container:
docker cp "Projeto_1/enunciado/WA_Fn-UseC_-HR-Employee-Attrition.csv" sqlserver:/var/opt/mssql/data/
```

## IMPORT VIA BULK INSERT

In [None]:
%%sql
BULK INSERT Colaboradores
FROM '/var/opt/mssql/data/WA_Fn-UseC_-HR-Employee-Attrition.csv'
WITH (
    FIRSTROW = 2,           -- Skip the header row
    FIELDTERMINATOR = ',',  -- Field separator
    ROWTERMINATOR = '\n',   -- Row separator
    TABLOCK
);
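
The `FIELDTERMINATOR` and `ROWTERMINATOR` options have to match the file as it actually is on disk, so it can help to inspect the raw bytes locally before importing. The sketch below is only an illustration: `describe_csv_bytes` is a hypothetical helper and the sample bytes are invented. It reports the line ending, the number of header columns, and whether any field is quoted, since a plain comma `FIELDTERMINATOR` treats quote characters as ordinary data.

```python
def describe_csv_bytes(raw: bytes) -> dict:
    """Summarise the CSV properties that matter for BULK INSERT."""
    if b"\r\n" in raw:
        line_ending = "CRLF"
    elif b"\n" in raw:
        line_ending = "LF"
    else:
        line_ending = "unknown"
    lines = raw.splitlines()
    # utf-8-sig drops a leading byte-order mark if one is present
    header = lines[0].decode("utf-8-sig") if lines else ""
    return {
        "line_ending": line_ending,
        "columns": len(header.split(",")) if header else 0,
        "has_quoted_fields": b'"' in raw,
    }


# Made-up sample; for the real file use open(path, "rb").read()
sample = b"Age,Attrition,Department\r\n41,Yes,Sales\r\n"
print(describe_csv_bytes(sample))
```

If the file turns out to use LF-only line endings and the import fails or loads too few rows, the hexadecimal form `ROWTERMINATOR = '0x0a'` is worth trying.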

## VERIFICATION AFTER IMPORT

### Count records (should be 1470)

In [None]:
%%sql
SELECT COUNT(*) AS TotalRegistos FROM Colaboradores;
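
To confirm the expected count independently of the database, the data rows in the source file can be counted locally and compared with `TotalRegistos`. A minimal sketch using only the standard library; `count_data_rows` is a hypothetical helper and the sample text is invented for illustration:

```python
import csv
import io


def count_data_rows(csv_text: str) -> int:
    """Count the records after the header, skipping blank lines."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    next(reader, None)  # discard the header row, if any
    return sum(1 for row in reader if row)


# Made-up sample with a header, two records and a trailing blank line
sample = "Age,Attrition\n41,Yes\n49,No\n\n"
print(count_data_rows(sample))
```

For the real file, the text can be read with `open(path, newline="", encoding="utf-8-sig").read()`; the result should match the 1470 records expected above.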

### Check the first 10 records

In [None]:
%%sql
SELECT TOP 10
    EmployeeNumber, Age, Gender, Department, JobRole, MonthlyIncome, Attrition
FROM Colaboradores
ORDER BY EmployeeNumber;

### Check basic statistics

In [None]:
%%sql
SELECT
    'Age' AS Coluna,
    MIN(Age) AS Min,
    MAX(Age) AS Max,
    AVG(CAST(Age AS FLOAT)) AS Media
FROM Colaboradores
UNION ALL
SELECT
    'MonthlyIncome',
    MIN(MonthlyIncome),
    MAX(MonthlyIncome),
    AVG(CAST(MonthlyIncome AS FLOAT))
FROM Colaboradores
UNION ALL
SELECT
    'YearsAtCompany',
    MIN(YearsAtCompany),
    MAX(YearsAtCompany),
    AVG(CAST(YearsAtCompany AS FLOAT))
FROM Colaboradores;

### Check the distinct values of the categorical columns

In [None]:
%%sql
SELECT 'Gender' AS Coluna, Gender AS Valor, COUNT(*) AS Total FROM Colaboradores GROUP BY Gender
UNION ALL
SELECT 'Department', Department, COUNT(*) FROM Colaboradores GROUP BY Department
UNION ALL
SELECT 'Attrition', Attrition, COUNT(*) FROM Colaboradores GROUP BY Attrition;

### Check for null values in the main columns

In [None]:
%%sql
SELECT
    SUM(CASE WHEN Age IS NULL THEN 1 ELSE 0 END) AS Age_Nulls,
    SUM(CASE WHEN Gender IS NULL THEN 1 ELSE 0 END) AS Gender_Nulls,
    SUM(CASE WHEN Department IS NULL THEN 1 ELSE 0 END) AS Department_Nulls,
    SUM(CASE WHEN MonthlyIncome IS NULL THEN 1 ELSE 0 END) AS MonthlyIncome_Nulls
FROM Colaboradores;
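
The same null check can be extended to more columns without repeating the `CASE` expression by hand. The sketch below generates the query text from a list of column names; `build_null_check_sql` is a hypothetical helper, and because it interpolates identifiers directly it should only be used with trusted, hard-coded names.

```python
from typing import Sequence


def build_null_check_sql(table: str, columns: Sequence[str]) -> str:
    """Generate one SUM(CASE ...) null counter per column."""
    checks = ",\n".join(
        f"    SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END) AS {col}_Nulls"
        for col in columns
    )
    return f"SELECT\n{checks}\nFROM {table};"


# Reproduces the query above for the same four columns
print(build_null_check_sql(
    "Colaboradores", ["Age", "Gender", "Department", "MonthlyIncome"]
))
```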

## Notes

- **1470 records** imported from the CSV
- **BULK INSERT** requires the file to be accessible from inside the Docker container
- Command (the source path depends on the current working directory): `docker cp "enunciado/WA_Fn-UseC_-HR-Employee-Attrition.csv" sqlserver:/var/opt/mssql/data/`