### 📋 **Notebook: Initial Data Import**
Welcome to the first notebook. Here we connect to a local PostgreSQL instance and create the necessary tables. We then apply a few brief transformations to our dataset, which starts out as a CSV file, to prepare it for insertion into those tables.

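Before proceeding, install the dependencies listed in `requirements.txt` by running the following command. The path is resolved relative to the notebook's working directory, so adjust it if the file lives elsewhere (for example, `../requirements.txt` if it sits one level up):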

---

In [2]:
%pip install -r requirements.txt


Note: you may need to restart the kernel to use updated packages.


ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'




---

## 1. 🗂️ Set Workdir
Make sure you have your own `.env` file that defines the environment variables this notebook reads, including `WORK_DIR`, the path to the project root that contains the `src` package.



In [2]:
import sys
import os
from dotenv import load_dotenv

In [3]:
# Load environment variables
load_dotenv()
work_dir = os.getenv('WORK_DIR')

# Ensure the working directory is in sys.path
sys.path.append(work_dir)

from src.db_connection import build_engine
from src.transform_data import DataTransformer


### 📝 **Explanation:**

- We load environment variables to access database credentials securely.
- The working directory is added to `sys.path` to ensure our project modules are easily accessible.
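
If `WORK_DIR` is missing from the `.env` file, `os.getenv` returns `None` and the later imports from `src` fail with a less obvious error. The sketch below shows one way to validate the value before touching `sys.path`. The helper name `add_work_dir_to_path` is hypothetical and is not part of the project's `src` package.

```python
import os
import sys
from pathlib import Path


def add_work_dir_to_path(env_var: str = "WORK_DIR") -> Path:
    """Validate the directory named by env_var and put it on sys.path.

    Hypothetical helper, shown for illustration only.
    """
    raw = os.getenv(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set; check your .env file")

    work_dir = Path(raw).expanduser().resolve()
    if not work_dir.is_dir():
        raise RuntimeError(f"{env_var} points to a missing directory: {work_dir}")

    # Avoid adding the same entry again when the cell is re-run.
    if str(work_dir) not in sys.path:
        sys.path.append(str(work_dir))
    return work_dir
```

Calling it right after `load_dotenv()` would fail fast with a clear message instead of a `ModuleNotFoundError` further down.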

---

## 2. 🔌 **Connect to the Database & 📦 Import Libraries**

With our environment set up, we'll now connect to the PostgreSQL database using SQLAlchemy.


In [4]:
from src.model import CandidatesRaw, CandidatesTransformed
from src.db_connection import build_engine
from sqlalchemy import inspect
from sqlalchemy.orm import sessionmaker
from sqlalchemy.exc import SQLAlchemyError
from src.transform_data import DataTransformer


In [5]:
# Connect to the database
engine = build_engine()
Session = sessionmaker(bind=engine)
session = Session()

Successfully connected to the database postgres!


### 📝 **Explanation:**

- We use the `build_engine` function to create the SQLAlchemy engine, then bind a session to it with `sessionmaker`. The engine and session are reused throughout the notebook to interact with the database.
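
The notebook does not show how `build_engine` reads its credentials. As a rough sketch, a connection URL could be assembled from environment variables like this. The `PG_*` variable names and the `build_postgres_url` helper are assumptions made for illustration and may not match what `src.db_connection` actually does.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(env=os.environ) -> str:
    """Assemble a PostgreSQL connection URL from environment variables.

    The PG_* names are hypothetical; use whatever your .env file defines.
    """
    user = env["PG_USER"]
    # Percent-encode the password so characters like '@' or '/' stay valid.
    password = quote_plus(env["PG_PASSWORD"])
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    database = env["PG_DATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

A string in this form is what `sqlalchemy.create_engine` accepts, so a function like `build_engine` could plausibly wrap a call of that kind.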

---

## 3. 🗄️ **Create the Database Table**

Here, we check if the CandidatesRaw table exists. If it does, it will be dropped and then recreated. This ensures that the table structure is always up to date. Note: Be cautious when running this in a production environment, as it will drop the existing table.
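
The drop-and-recreate pattern described above can be tried safely outside the project. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL. The `candidates_raw` table and its columns are illustrative assumptions, not the schema defined in `src.model`.

```python
import sqlite3


def recreate_table(conn: sqlite3.Connection) -> None:
    """Drop candidates_raw if present, then create it again."""
    # Illustrative columns only; the real schema lives in src.model.
    conn.execute("DROP TABLE IF EXISTS candidates_raw")
    conn.execute(
        "CREATE TABLE candidates_raw (id INTEGER PRIMARY KEY, first_name TEXT)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
recreate_table(conn)
conn.execute("INSERT INTO candidates_raw (first_name) VALUES ('Ada')")
recreate_table(conn)  # the second run wipes the row inserted above
remaining = conn.execute("SELECT COUNT(*) FROM candidates_raw").fetchone()[0]
print(remaining)  # prints 0
```

Running `recreate_table` a second time discards the inserted row, which is why this pattern is risky against a table that holds data you want to keep.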


In [12]:
try:
    # Drop the table if it already exists, then recreate it from the model
    if inspect(engine).has_table('CandidatesRaw'):
        CandidatesRaw.__table__.drop(engine)
    CandidatesRaw.__table__.create(engine)
    print("Table created successfully.")
except SQLAlchemyError as e:
    print(f"Error creating table: {e}")
finally:
    # Release pooled connections; the engine reconnects on next use
    engine.dispose()


Table created successfully.


### 📝 **Explanation:**

- **Check for the Table**: `inspect(engine).has_table(...)` tells us whether `CandidatesRaw` already exists.
- **Drop and Recreate**: An existing table is dropped and then created again from the `CandidatesRaw` model, so its structure always matches the model definition.
- **Error Handling**: Any `SQLAlchemyError` is caught and printed instead of stopping the notebook.

---


## 4. 📤 **Load Data into the Table**


Now we load the original dataset into the CandidatesRaw table. The dataset is already normalized, so the only preparation is standardizing the column names and generating a unique ID for each candidate. We also save a copy of the result as a CSV file.


In [13]:
try:
    # Initialize the transformer with the CSV file
    transformer = DataTransformer('../data/candidates.csv')

    # Standardize the column names
    transformer.standardize_column_names()

    # Generate unique IDs for each candidate
    transformer.generate_ids()

    # Save transformed data to a new CSV
    transformer.save_transformed_data('../data/candidates_transformed.csv')

    # Upload the data to the 'CandidatesRaw' table
    transformer.data.to_sql('CandidatesRaw', con=engine, if_exists='append', index=False)
    print("Data uploaded")
except SQLAlchemyError as e:
    print(f"Database error: {e}")
except Exception as e:
    print(f"Error: {e}")
finally:
    # Close the session first, then release pooled connections
    if 'session' in locals():
        session.close()
    engine.dispose()

Data uploaded


### 📝 **Explanation:**

- **DataTransformer**: A custom class designed to handle our data transformations.
- **Standardize Column Names**: We standardize column names to ensure consistency and avoid issues with naming conventions.
- **Generate IDs**: Each candidate is assigned a unique ID for easier reference and database operations.
- **Save to CSV**: We save the transformed data to a new CSV file as a backup and for future reference.
- **Insert to Database**: The transformed data is then appended to the `CandidatesRaw` table in our PostgreSQL database.
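
The internals of `DataTransformer` are not shown in this notebook. As a rough idea of what `standardize_column_names` and `generate_ids` might do, here is a standard-library sketch. The function names, the snake_case convention, the sequential 1-based IDs, and the sample columns are all assumptions, not the project's actual implementation.

```python
import csv
import io
import re


def standardize_name(name: str) -> str:
    """Lower-case a header and replace runs of other characters with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def load_rows(csv_text: str, delimiter: str = ",") -> list:
    """Read CSV text, standardize headers, and add a sequential 1-based id."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = [standardize_name(col) for col in next(reader)]
    return [
        {"id": i, **dict(zip(header, row))}
        for i, row in enumerate(reader, start=1)
    ]


# Made-up sample data, not taken from candidates.csv
sample = "First Name,Years of Experience\nAda,10\nLinus,7\n"
rows = load_rows(sample)
print(rows[0])
```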

---
# ✅ **Summary**

In this notebook, we successfully:

- Configured our environment and set up the necessary modules.
- Established a connection to our PostgreSQL database.
- Loaded raw candidate data, performed initial transformations, and saved it to the database.

🎉 **Next Steps**: We'll clean the data further and perform exploratory data analysis (EDA) in the next notebook. 

---

### 📊 **Notebook Flow**

| Step | Description |
|------|-------------|
| 🔧 | Configure the Environment |
| 🔌 | Connect to the Database |
| 📂 | Load and Transform Data |
| 🗄️ | Save Raw Data to Database |

### 🎯 **Objectives Met**

- Environment configured and secured 🔐
- Database connection established 🔗
- Raw data loaded and transformed 📈
- Data successfully saved to PostgreSQL 🗄️
