# Data Processing by Jhonatan Steven Morales


In this section, we normalize the data before importing it into the database, while preserving the relationships between the tables. The data will be loaded into two tables, CandidatesProcessed and Countries, after applying the transformations described below.

Make sure you have your own .env file that defines the required environment variables, such as WORK_DIR (the project root, which is added to sys.path in the next cell).

In [1]:
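import os
import sys
from dotenv import load_dotenv

# Load the variables defined in .env into the process environment
load_dotenv()
work_dir = os.getenv('WORK_DIR')

# Make the project packages (src, python_code) importable
sys.path.append(work_dir)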
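If WORK_DIR is missing from .env, os.getenv returns None and the later imports of src and python_code are likely to fail with a less obvious import error. The following is a minimal sketch of a fail-fast check. require_env is a hypothetical helper and is not part of the project.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper: it is not part of the project code.
    """
    value = os.getenv(name)
    if not value:
        # Catch a missing or empty variable here rather than at import time
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

In the cell above, it would replace the plain lookup: work_dir = require_env('WORK_DIR').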

## Libraries and Data Loading

In [2]:
from sqlalchemy import inspect
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import sessionmaker

from src.database.dbconnection import getconnection
from src.model.models import CandidatesProcessed, Countries
from python_code.transform import DataTransform



Connect to the database with SQLAlchemy. If the connection fails, check that your .env file contains the correct environment variables and try again.

In [3]:
engine = getconnection()
Session = sessionmaker(bind=engine)
session = Session()

Conected successfully to database workshop1!
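The getconnection helper lives in the src.database.dbconnection module, which is not shown in this notebook. The function below is a rough sketch of what such a helper commonly does: it assembles a database URL from environment settings. The variable names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME), the defaults and the postgresql+psycopg2 dialect are all assumptions. The user name and password are escaped with urllib.parse.quote_plus so that characters such as @ or : cannot break the URL.

```python
from urllib.parse import quote_plus


def build_db_url(env: dict) -> str:
    """Build a SQLAlchemy-style database URL from a mapping of settings.

    Hypothetical sketch: the key names, defaults and dialect are assumptions.
    """
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])  # escape characters such as '@' or ':'
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env["DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"
```

The resulting string could be passed to sqlalchemy.create_engine. Building the URL with SQLAlchemy's URL.create instead avoids the manual escaping.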


Create the Countries table first. The CandidatesProcessed table has a foreign key that references Countries, so Countries must exist before the candidates table is created. If tables with the same names already exist, they are dropped before the new ones are created.

In [4]:
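try:
    # CandidatesProcessed references Countries, so drop it first (if present);
    # otherwise dropping Countries fails on databases that enforce foreign keys
    CandidatesProcessed.__table__.drop(engine, checkfirst=True)
    # Drop any existing Countries table, then recreate it
    Countries.__table__.drop(engine, checkfirst=True)
    Countries.__table__.create(engine)
    print("Table created successfully.")
except SQLAlchemyError as e:
    print(f"Error creating table: {e}")
finally:
    engine.dispose()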

Table created successfully.


In [5]:
try:
    if inspect(engine).has_table('CandidatesProcessed'):
        CandidatesProcessed.__table__.drop(engine)
    CandidatesProcessed.__table__.create(engine)
    print("Table created successfully.")
except SQLAlchemyError as e:
    print(f"Error creating table: {e}")
finally:
    engine.dispose()

Table created successfully.
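The following self-contained sketch shows why table order matters. It uses Python's built-in sqlite3 module in place of the project's database, with simplified, hypothetical tables named countries and candidates. The child table is dropped before the parent, and the parent is created before the child.

```python
import sqlite3


def recreate_tables(conn: sqlite3.Connection) -> None:
    """Drop and recreate a parent/child pair of tables in a safe order."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    # Drop the child first: it holds the foreign key that points at the parent.
    conn.execute("DROP TABLE IF EXISTS candidates")
    conn.execute("DROP TABLE IF EXISTS countries")
    # Create the parent first so the child's REFERENCES clause has a target.
    conn.execute(
        "CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE candidates ("
        " id INTEGER PRIMARY KEY,"
        " country_id INTEGER NOT NULL REFERENCES countries(id))"
    )
    conn.commit()
```

With SQLAlchemy models, MetaData.create_all and MetaData.drop_all sort tables by their foreign-key dependencies automatically, provided the tables share the same MetaData.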


We use the previously created DataTransform class to perform the necessary transformations. First, each record is assigned a unique id. Next, a new boolean column is added, where 0 means not hired and 1 means hired. Columns whose names contain spaces are then renamed. Finally, to simplify the technology field, the different job types are grouped into predefined categories, which makes data analysis and visualization easier.
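DataTransform's implementation is not shown in this notebook, so the sketch below illustrates the first three steps on plain Python dictionaries only. The hiring rule is passed in as a predicate because the actual criterion used by HiredOrNotHired is not stated here. The column names and the underscore convention for renaming are also assumptions.

```python
def transform_rows(rows, is_hired):
    """Assign ids, add a 0/1 hired flag and replace spaces in column names.

    `is_hired` is a caller-supplied predicate; the real hiring rule lives in
    DataTransform.HiredOrNotHired and is not reproduced here.
    """
    result = []
    for new_id, row in enumerate(rows, start=1):
        # Replace spaces in column names with underscores (assumed convention)
        clean = {key.strip().replace(" ", "_"): value for key, value in row.items()}
        clean["id"] = new_id  # unique, sequential id starting at 1
        clean["hired"] = 1 if is_hired(row) else 0  # 1 = hired, 0 = not hired
        result.append(clean)
    return result
```

In the notebook itself, these steps operate on the pandas DataFrame that DataTransform exposes as df.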

## Generalization of Categories

Software Development:

*   Game Development
*   Development - Backend
*   Development - FullStack
*   Adobe Experience Manager
*   Development - CMS Frontend
*   Development - Frontend
*   Development - CMS Backend

DevOps and System Administration:

*   DevOps
*   System Administration
*   Database Administration

Management and Support:

*   Social Media Community Management
*   Client Success
*   Sales

Other Areas:

*   Mulesoft
*   Technical Writing
*   Salesforce

Data Engineering and Analytics:

*   Data Engineer
*   Business Intelligence
*   Business Analytics / Project Management

Security:

*   Security
*   Security Compliance

Design and QA:

*   Design
*   QA Manual
*   QA Automation
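The grouping above can be expressed as a lookup table. This is a sketch of how technology_by_category might work, not its actual code. The fallback to "Other Areas" for job types missing from the list is an assumption.

```python
# Categories and job types exactly as listed above
CATEGORIES = {
    "Software Development": [
        "Game Development",
        "Development - Backend",
        "Development - FullStack",
        "Adobe Experience Manager",
        "Development - CMS Frontend",
        "Development - Frontend",
        "Development - CMS Backend",
    ],
    "DevOps and System Administration": [
        "DevOps", "System Administration", "Database Administration",
    ],
    "Management and Support": [
        "Social Media Community Management", "Client Success", "Sales",
    ],
    "Other Areas": ["Mulesoft", "Technical Writing", "Salesforce"],
    "Data Engineering and Analytics": [
        "Data Engineer",
        "Business Intelligence",
        "Business Analytics / Project Management",
    ],
    "Security": ["Security", "Security Compliance"],
    "Design and QA": ["Design", "QA Manual", "QA Automation"],
}

# Invert to a flat lookup: job type -> category
TECH_TO_CATEGORY = {
    tech: category for category, techs in CATEGORIES.items() for tech in techs
}


def categorize(technology: str, default: str = "Other Areas") -> str:
    """Map a job type to its category; unknown values fall back to `default`."""
    return TECH_TO_CATEGORY.get(technology.strip(), default)
```

With pandas, the same function could be applied to a whole column with Series.map, for example df['Technology'].map(categorize). The column name here is hypothetical.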




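After these changes, a few minor adjustments are applied to the data (FullNameReplace and ApplicationDateToDateType). Finally, the countries are extracted into their own table, each with a unique id (NormalizeCountry). The countries are then inserted into the Countries table, and the candidates are inserted into the CandidatesProcessed table.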
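The following is a minimal sketch of the country normalization step. It assumes each candidate row has a country column (a hypothetical name). Each distinct country receives an id in order of first appearance, and each candidate row keeps only the country_id foreign key. NormalizeCountry may differ in its details.

```python
def normalize_country(rows, column="country"):
    """Split a country column out into a lookup table with unique ids.

    Returns (countries, candidates): `countries` is a list of {"id", "name"}
    dicts, and each candidate row has `column` replaced by `country_id`.
    """
    ids = {}  # country name -> id, assigned in order of first appearance
    candidates = []
    for row in rows:
        name = row[column]
        if name not in ids:
            ids[name] = len(ids) + 1
        new_row = {k: v for k, v in row.items() if k != column}
        new_row["country_id"] = ids[name]  # foreign key into the countries list
        candidates.append(new_row)
    countries = [{"id": i, "name": n} for n, i in ids.items()]
    return countries, candidates
```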

In [7]:
try:
    # Load the raw CSV and apply the transformations described above
    transform_candidates = DataTransform('../data/candidates.csv')
    transform_candidates.insert_id()
    transform_candidates.HiredOrNotHired()
    transform_candidates.rename()
    transform_candidates.technology_by_category()
    transform_candidates.FullNameReplace()
    transform_candidates.ApplicationDateToDateType()

    # Split the countries into their own table with unique ids
    transform_countries = transform_candidates.NormalizeCountry()

    # Insert the parent table (Countries) before the child (CandidatesProcessed)
    transform_countries.to_sql('Countries', con=engine, if_exists='append', index=False)
    transform_candidates.df.to_sql('CandidatesProcessed', con=engine, if_exists='append', index=False)

    print("Data uploaded")

except SQLAlchemyError as e:
    print(f"Database error: {e}")

except Exception as e:
    print(f"Error: {e}")

finally:
    # Release pooled connections and close the session opened earlier
    engine.dispose()
    session.close()



Data uploaded
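In the cell above, each to_sql call runs against the engine separately. If the candidates insert fails, the countries may already have been written. One way to make the load all-or-nothing is to run both inserts in a single transaction. The sketch below shows the idea with sqlite3 and simplified, hypothetical countries and candidates tables. It also returns row counts so they can be compared against the source data.

```python
import sqlite3


def load_atomically(conn: sqlite3.Connection, countries, candidates):
    """Insert parents, then children, in one transaction; roll back on failure.

    Assumes hypothetical tables countries(id, name) and
    candidates(id, country_id) already exist.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO countries (id, name) VALUES (:id, :name)", countries
        )
        conn.executemany(
            "INSERT INTO candidates (id, country_id) VALUES (:id, :country_id)",
            candidates,
        )
    # Row counts let the caller check the load against the source data
    n_countries = conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]
    n_candidates = conn.execute("SELECT COUNT(*) FROM candidates").fetchone()[0]
    return n_countries, n_candidates
```

With SQLAlchemy and pandas, the equivalent is to open a transaction with "with engine.begin() as conn:" and pass con=conn to both to_sql calls.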
