# Lake House Simulation

This notebook simulates a data lakehouse by loading several semicolon-separated CSV files into a SQLite database (`fabric_sim.db`). The data covers absence types, absences, contract basis, postcodes, salary statements and work plans. The notebook shows how to create the database tables, inspect their structure and query the data for further analysis.
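The sketch below runs the whole flow on a small scale. It assumes one tiny semicolon-separated CSV held in memory instead of the real files on disk. It reads the CSV with `pandas.read_csv(..., sep=";")`, normalises the column names, and writes the result to a throwaway `:memory:` SQLite database. It then inspects the table structure and queries the data back. The CSV content is made up for illustration only.

```python
import io
import sqlite3

import pandas as pd

# Made-up, semicolon-separated sample standing in for one of the real CSV files
csv_text = "Type Absence ;Type Absence FR\nZ0;example A\nP1;example B\n"

df = pd.read_csv(io.StringIO(csv_text), sep=";")
# Same column clean-up as the notebook: strip, lowercase, spaces -> underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

conn = sqlite3.connect(":memory:")  # throwaway database, nothing written to disk
try:
    df.to_sql("absence_type", conn, if_exists="replace", index=False)
    # Inspect the structure: PRAGMA table_info returns one row per column
    structure = pd.read_sql_query("PRAGMA table_info(absence_type)", conn)
    # Query the data back into a DataFrame
    rows = pd.read_sql_query("SELECT * FROM absence_type", conn)
finally:
    conn.close()

print(structure["name"].tolist())  # ['type_absence', 'type_absence_fr']
print(rows.shape)  # (2, 2)
```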

In [None]:
"""
This cell imports CSV files into a SQLite database. It performs the following steps:
1. Reads multiple CSV files into a dictionary of pandas DataFrames.
2. Connects to a SQLite database (or creates one if it doesn't exist).
3. Cleans column names by stripping whitespace, converting to lowercase, and replacing spaces with underscores.
4. Writes each DataFrame to a corresponding table in the SQLite database, replacing the table if it already exists.
5. Closes the database connection after all tables are created.
"""
import sqlite3

import pandas as pd

# Read each CSV file into a DataFrame, keyed by the target table name
csv_files = {
    "absence_type": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\Absence_Type.csv",
        sep=";",
    ),
    "absence": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\ABSENCES.csv",
        sep=";",
    ),
    "contract_basis": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\CONTRACT_BASIS.csv",
        sep=";",
    ),
    "post_code": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\POSTCODES.csv",
        sep=";",
    ),
    "salary_statement": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\SALARY_STATEMENT.csv",
        sep=";",
    ),
    "work_plan": pd.read_csv(
        r"C:\Users\pieta\OneDrive\Bureau\Beyond Data Group\Beyond-Data-Group\csv\WORK_PLAN.csv",
        sep=";",
    ),
}

# Connect to the SQLite database (the file is created if it does not exist)
conn = sqlite3.connect("fabric_sim.db")

try:
    for table_name, df in csv_files.items():
        # Normalise column names: strip whitespace, lowercase, spaces -> underscores
        df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
        df.to_sql(table_name, conn, if_exists="replace", index=False)
        print(f"✅ Table '{table_name}' created successfully")
finally:
    # Close the connection even if one of the writes fails
    conn.close()

✅ Table 'absence_type' created successfully
✅ Table 'absence' created successfully
✅ Table 'contract_basis' created successfully
✅ Table 'post_code' created successfully
✅ Table 'salary_statement' created successfully
✅ Table 'work_plan' created successfully
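The cell above repeats the same absolute path six times. One way to avoid this is to keep a single base directory and a mapping from table names to file names. The sketch below does that. `load_csv_folder` and `FILE_MAP` are hypothetical names, not part of the original notebook. The helper also lists the tables found in `sqlite_master` afterwards, as a quick check that every load succeeded. To keep the demo self-contained, it writes two tiny made-up CSV files to a temporary directory.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical mapping of table names to file names (mirrors two of the notebook's files)
FILE_MAP = {
    "absence_type": "Absence_Type.csv",
    "absence": "ABSENCES.csv",
}


def load_csv_folder(csv_dir, file_map, db_path):
    """Load each CSV in `file_map` from `csv_dir` into a SQLite table.

    Returns the sorted list of table names present in the database afterwards.
    """
    conn = sqlite3.connect(db_path)
    try:
        for table_name, file_name in file_map.items():
            df = pd.read_csv(Path(csv_dir) / file_name, sep=";")
            df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
            df.to_sql(table_name, conn, if_exists="replace", index=False)
        # sqlite_master lists every table that now exists in the database
        tables = pd.read_sql_query(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
        )["name"].tolist()
    finally:
        conn.close()
    return tables


# Demo with throwaway files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Absence_Type.csv").write_text("Type Absence;Type Absence FR\nZ0;example A\n")
    (Path(tmp) / "ABSENCES.csv").write_text("Person Id;Qty Illness Days\n1;3\n")
    tables = load_csv_folder(tmp, FILE_MAP, Path(tmp) / "demo.db")

print(tables)  # ['absence', 'absence_type']
```

In the notebook itself, `csv_dir` would point at the existing `csv` folder and the mapping would list all six files.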


In [None]:
def print_table_columns(*table_names):
    """
    Prints the column names of the specified tables in the SQLite database.

    Parameters:
    *table_names (str): Names of the tables whose column names are to be printed.

    The function connects to the SQLite database 'fabric_sim.db', retrieves the column
    information for each specified table using the PRAGMA table_info command, and prints
    the column names for each table. The database connection is closed after the operation.
    """
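    conn = sqlite3.connect("fabric_sim.db")
    try:
        for table_name in table_names:
            # Quote the name so unusual table names cannot break the PRAGMA statement
            quoted = '"' + table_name.replace('"', '""') + '"'
            result = pd.read_sql_query(f"PRAGMA table_info({quoted})", conn)
            print(f"Columns in table '{table_name}':")
            print(result["name"].tolist())
    finally:
        # Release the database even if a query fails
        conn.close()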


# Example usage:
print_table_columns(
    "absence_type",
    "absence",
    "contract_basis",
    "post_code",
    "salary_statement",
    "work_plan",
)

Columns in table 'absence_type':
['type_absence', 'type_absence_fr']
Columns in table 'absence':
['firm_id', 'department_id', 'category_id', 'person_id', 'year', 'quarter', 'month', 'date', 'period', 'qty_illness_days', 'qty_z0_days', 'qty_z1_days', 'qty_z2_days', 'qty_z3_days', 'qty_p0_days', 'qty_p1_days', 'qty_p2_days', 'qty_p3_days', 'qty_a1_days', 'qty_a2_days', 'freq_z0_days', 'freq_z1_days', 'freq_z2_days', 'freq_z3_daqs', 'freq_p0_days', 'freq_p1_days', 'freq_p2_days', 'freq_p3_days', 'freq_a1_days', 'freq_a2_days', 'qty_days_worked', 'qty_working_days']
Columns in table 'contract_basis':
['contract_zip_code', 'firm_id', 'department_id', 'category_id', 'person_id', 'contract_start_date', 'contract_end_date', 'company_start_date', 'birth_date', 'contract_terminatio_reason', 'gender', 'nationality', 'contract_type']
Columns in table 'post_code':
['postcode', 'region_code', 'region']
Columns in table 'salary_statement':
['fdcp', 'gross_salary', 'net_salary', 'gross_salary_108', 'p
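`PRAGMA table_info` returns more than column names. It also includes the declared `type`, `notnull`, `dflt_value` and `pk` for each column. The sketch below collects the name and declared type of every column in every table. It discovers the table names from `sqlite_master` rather than passing them in by hand, and it quotes each identifier instead of interpolating it raw. `describe_tables` is a hypothetical helper, and the one-row table exists only for the demo.

```python
import sqlite3

import pandas as pd


def describe_tables(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    names = pd.read_sql_query(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name", conn
    )["name"]
    schema = {}
    for name in names:
        # Double quotes make the name a quoted identifier; embedded quotes are doubled
        quoted = '"' + name.replace('"', '""') + '"'
        info = pd.read_sql_query(f"PRAGMA table_info({quoted})", conn)
        schema[name] = list(zip(info["name"], info["type"]))
    return schema


conn = sqlite3.connect(":memory:")
try:
    # Tiny made-up table; pandas declares int64 columns as INTEGER and strings as TEXT
    pd.DataFrame({"postcode": [1000], "region": ["example"]}).to_sql(
        "post_code", conn, index=False
    )
    schema = describe_tables(conn)
finally:
    conn.close()

print(schema)  # {'post_code': [('postcode', 'INTEGER'), ('region', 'TEXT')]}
```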

In [None]:
"""
This cell demonstrates how to read a specific table from the SQLite database into a pandas DataFrame.

Steps:
1. Connects to the SQLite database 'fabric_sim.db'.
2. Reads the entire 'absence_type' table into a pandas DataFrame using an SQL SELECT query.
3. Prints the column names of the DataFrame as a list.

The connection is left open so that later cells can reuse it; call `conn.close()` when you are done.
"""

# Example of reading a specific table
conn = sqlite3.connect("fabric_sim.db")
df_abs_type = pd.read_sql("SELECT * FROM absence_type", conn)

print(df_abs_type.columns.tolist())

['type_absence', 'type_absence_fr']
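To go from reading single tables to the "further analysis" mentioned in the introduction, the tables can be joined in SQL. They can also be filtered with bound parameters. The sketch below counts people per region by joining `contract_basis` to `post_code`. It assumes that `contract_zip_code` and `postcode` hold the same kind of code, which the notebook does not confirm. The rows and the gender values are made up. The `params` argument of `pd.read_sql_query` binds the `?` placeholder, so the value is never formatted into the SQL string itself.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
try:
    # Made-up rows using column names that appear in the notebook's tables
    pd.DataFrame(
        {"person_id": [1, 2, 3], "contract_zip_code": [1000, 2000, 1000], "gender": ["F", "M", "F"]}
    ).to_sql("contract_basis", conn, index=False)
    pd.DataFrame(
        {"postcode": [1000, 2000], "region_code": ["R1", "R2"], "region": ["Region A", "Region B"]}
    ).to_sql("post_code", conn, index=False)

    # Assumption: contract_zip_code and postcode use the same coding
    query = """
        SELECT p.region, COUNT(DISTINCT c.person_id) AS n_persons
        FROM contract_basis AS c
        JOIN post_code AS p ON p.postcode = c.contract_zip_code
        WHERE c.gender = ?
        GROUP BY p.region
        ORDER BY p.region
    """
    # The placeholder value is passed separately and bound by sqlite3
    by_region = pd.read_sql_query(query, conn, params=("F",))
finally:
    conn.close()

print(by_region)
```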
