# Data Transformation

This notebook transforms the raw candidates data: it loads the source CSV, applies a series of transformations, and writes the result to a table in the PostgreSQL database.

The goal is to ensure that the data is in a structured and usable format, ready for any subsequent analysis or reporting.

Before running it, make sure the environment variables with your database credentials are defined _(in the .env file)_ and the requirements are installed _(via pip install -r requirements.txt)_.

---

We will load the environment variables from the .env file, which contains important configurations such as paths and credentials. Then, we will obtain the working directory from these variables and add it to the system path to ensure that the project's modules can be imported correctly.

In [7]:
import sys
import os
from dotenv import load_dotenv

load_dotenv()
work_dir = os.getenv('WORK_DIR')

sys.path.append(work_dir)

print('Workdir: ', work_dir)

Workdir:  /Users/carol/Documents/workshop01
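If `WORK_DIR` is missing from the .env file, `os.getenv` returns `None`, the `None` entry on `sys.path` is ignored, and the later `from src...` imports will typically fail with a `ModuleNotFoundError` that does not point back to the real cause. A minimal sketch of a guard that could run after `load_dotenv()`, assuming a hypothetical helper named `add_workdir_to_path` (it is not part of the project):

```python
import os
import sys


def add_workdir_to_path(env_var: str = "WORK_DIR") -> str:
    """Read a directory from the environment and put it on sys.path.

    Hypothetical helper for illustration; it is not part of the project.
    """
    work_dir = os.getenv(env_var)
    if not work_dir:
        raise RuntimeError(f"{env_var} is not set; check your .env file")
    if not os.path.isdir(work_dir):
        raise RuntimeError(f"{env_var} points to a missing directory: {work_dir}")
    # Avoid duplicate entries when the cell is re-run.
    if work_dir not in sys.path:
        sys.path.append(work_dir)
    return work_dir
```

Failing early here keeps the error message close to the misconfiguration instead of surfacing it in the import cell.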


Import the necessary modules and classes for the rest of the notebook.

In [8]:
from src.db_connection import build_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy import inspect
from src.model import Candidates_transformed
from sqlalchemy.exc import SQLAlchemyError
from src.transform import Transform

The build_engine function is called to configure and create a connection to the PostgreSQL database.

In [9]:
engine = build_engine()

Successfully connected to the database workshop1!
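`build_engine` lives in src/db_connection.py and its body is not shown in this notebook. As a rough sketch of the URL-building step such a function usually needs, the helper below assembles a PostgreSQL connection URL from environment variables. The variable names (`PG_USER`, `PG_PASSWORD`, `PG_HOST`, `PG_PORT`, `PG_DATABASE`), the defaults, and the function name `build_database_url` are all assumptions, not the project's actual configuration.

```python
import os
from urllib.parse import quote


def build_database_url() -> str:
    """Assemble a PostgreSQL URL from (assumed) environment variable names."""
    user = os.environ["PG_USER"]
    # Percent-encode the password so characters like '@' or '/' do not break the URL.
    password = quote(os.environ["PG_PASSWORD"], safe="")
    host = os.getenv("PG_HOST", "localhost")   # assumed default
    port = os.getenv("PG_PORT", "5432")        # assumed default
    database = os.environ["PG_DATABASE"]
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"
```

The resulting string could then be passed to `sqlalchemy.create_engine`. Using `os.environ[...]` for the required values makes a missing credential raise a `KeyError` immediately rather than producing a malformed URL.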


A SQLAlchemy session is created using the database engine established in the previous step. This session is necessary for performing read and write operations on the database.

In [10]:
Session = sessionmaker(bind=engine)
session = Session()

Check whether the Candidates_transformed table already exists in the database. If it does, the table is dropped. Then, a new, empty Candidates_transformed table is created, so each run starts from a clean table that matches the current model. If any error occurs during this process, an error message is printed.

In [11]:
try:
    inspector = inspect(engine)

    if inspector.has_table('Candidates_transformed'):
        try:
            Candidates_transformed.__table__.drop(engine)
        except SQLAlchemyError as e:
            print(f"Error dropping table: {e}")
            raise
    try:
        Candidates_transformed.__table__.create(engine)
        print("Table creation was successful.")
    except SQLAlchemyError as e:
        print(f"Error creating table: {e}")
        raise

except SQLAlchemyError as error:
    print(f"An error occurred: {error}")

Table creation was successful.
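The same reset pattern can be tried without a PostgreSQL server. The sketch below runs it against an in-memory SQLite database with a stand-in model; the `Demo` class, its columns, and the `reset_table` helper are hypothetical and do not reflect the real Candidates_transformed schema. It uses `checkfirst=True`, which makes `drop` skip a table that does not exist, in place of the explicit `has_table` test.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Demo(Base):
    """Stand-in model; the real schema lives in src/model.py."""
    __tablename__ = "demo_transformed"
    id = Column(Integer, primary_key=True)
    name = Column(String)


def reset_table(model, engine) -> None:
    """Drop the model's table if present, then create it empty."""
    model.__table__.drop(engine, checkfirst=True)
    model.__table__.create(engine)


# Separate engine name so the notebook's PostgreSQL engine is not shadowed.
demo_engine = create_engine("sqlite:///:memory:")
reset_table(Demo, demo_engine)  # first run: nothing to drop, table is created
reset_table(Demo, demo_engine)  # second run: table is dropped and recreated
print(inspect(demo_engine).has_table("demo_transformed"))
```

Because the helper is idempotent, re-running the cell leaves the database in the same state each time.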


It's time to perform the transformations.

These transformations will include:

- Generating an ID column.
- Renaming the columns (as they originally have spaces).
- Adding the 'Hired' column as requested, based on the Code Challenge Score and Technical Interview Score.
- Grouping the technologies into categories, as there are too many distinct values and they can be reduced.

In [12]:
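try:
    # Transform wraps the CSV; the underlying DataFrame is exposed as .df
    transformer = Transform('../data/candidates.csv')

    transformer.insert_ids()
    transformer.rename_columns()
    transformer.add_hired_column()
    transformer.technology_to_category()

    # Append the rows to the freshly created (empty) table
    transformer.df.to_sql('Candidates_transformed', engine, if_exists='append', index=False)

    print("Data uploaded successfully")

except Exception as e:
    print(f"An error occurred: {e}")

finally:
    if session:
        session.close()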

Data uploaded successfully
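`Transform` is defined in src/transform.py, so its methods are not visible in this notebook. As a rough illustration of the four steps listed earlier, the pandas sketch below applies them to a tiny made-up frame. The sample rows, the snake_case naming, the score threshold of 7, and the technology-to-category mapping are all assumptions chosen for the example, not the project's actual rules.

```python
import pandas as pd

# Made-up sample rows; only the two score columns are named in the notebook text.
raw = pd.DataFrame({
    "First Name": ["Ana", "Luis", "Mia"],
    "Technology": ["DevOps", "Data Engineer", "Game Development"],
    "Code Challenge Score": [8, 6, 9],
    "Technical Interview Score": [7, 9, 10],
})

# Assumed grouping; anything unmapped falls back to "Other".
CATEGORY_MAP = {"DevOps": "Infrastructure", "Data Engineer": "Data"}
HIRED_THRESHOLD = 7  # assumption: both scores must reach this value


def transform(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Generate a 1-based ID column as the first column.
    out.insert(0, "id", list(range(1, len(out) + 1)))
    # 2. Rename columns: lower-case, spaces replaced by underscores.
    out.columns = [c.lower().replace(" ", "_") for c in out.columns]
    # 3. Flag candidates whose two scores both meet the threshold.
    out["hired"] = (
        (out["code_challenge_score"] >= HIRED_THRESHOLD)
        & (out["technical_interview_score"] >= HIRED_THRESHOLD)
    )
    # 4. Collapse technologies into broader categories.
    out["category"] = out["technology"].map(CATEGORY_MAP).fillna("Other")
    return out


result = transform(raw)
print(result[["id", "hired", "category"]])
```

Working on a copy keeps the raw frame intact, which makes it easy to compare the data before and after the transformation.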
