# Metadata

**L1 Taxonomy** - SQL in Python

**L2 Taxonomy** - Database Normalization

**Subtopic** - Integrating SQL Database Queries with Python for Normalized Data Storage

**Use Case** - Develop a Python application that uses SQLAlchemy to store user credentials in normalized tables with one-to-one relationships, incorporating two-factor authentication for secure access.

**Programming Language** - Python

**Target Model** - o1

# Setup

```requirements.txt
sqlalchemy==2.0.41
bcrypt==4.3.0
```


# Prompt

I am implementing a secure credential-storage microservice using SQLAlchemy and SQLite.

The service must persist user credentials in fully normalized tables (3NF) connected via one-to-one relationships and enforce strict validation and schema constraints for correctness, uniqueness, and security.

It is initialized by calling:
init_db() -> sqlalchemy.engine.Engine  
   Returns a SQLAlchemy engine bound to a fresh in-memory SQLite database with all models initialized.
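
SQLite ships with foreign-key enforcement switched off, and `PRAGMA foreign_keys = ON` only affects the connection that runs it, which is why the requirements carve out an exception for that one raw statement. The sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy, purely to show the effect; the helper `fk_violation_rejected` is hypothetical.

```python
import sqlite3


def fk_violation_rejected(enable_fk: bool) -> bool:
    """Return True if SQLite rejects a credential row pointing at a missing user."""
    conn = sqlite3.connect(":memory:")  # fresh, empty in-memory database
    try:
        if enable_fk:
            # Runs before any transaction is open; affects this connection only.
            conn.execute("PRAGMA foreign_keys = ON")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
        conn.execute(
            "CREATE TABLE credentials ("
            " id INTEGER PRIMARY KEY,"
            " user_id INTEGER NOT NULL UNIQUE"
            " REFERENCES users(id) ON DELETE CASCADE)"
        )
        try:
            # No user with id 42 exists, so this row would be an orphan.
            conn.execute("INSERT INTO credentials (user_id) VALUES (42)")
        except sqlite3.IntegrityError:
            return True
        return False
    finally:
        conn.close()


print(fk_violation_rejected(enable_fk=False))  # False: orphan row accepted
print(fk_violation_rejected(enable_fk=True))   # True: FOREIGN KEY constraint failed
```

With SQLAlchemy, a commonly documented alternative is to issue the PRAGMA from a `connect` event listener so every new connection receives it; issuing it once at startup relies on the in-memory engine handing back the same underlying connection (SQLAlchemy's default pool for `:memory:` keeps one connection per thread).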

**Input Format**

Users are registered via register_user(payload: dict, session: Session) -> dict | int  
with a payload containing:
     - "username": str (ASCII, 3–32 chars, non-empty, unique)  
     - "password": str (plaintext, at least 8 chars)  
     - "email": str (valid address, unique)  
     - "phone": str (E.164 format, unique)

Users log in via login(payload: dict, session: Session) -> bool  
with a payload containing:  
     - "username": str  
     - "password": str (plaintext)

Both functions receive a valid SQLAlchemy Session bound to the engine from init_db().
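
The input rules above can be checked with standard-library code before anything touches the database. A minimal sketch, assuming a common E.164 pattern (a `+`, a non-zero leading digit, at most 15 digits in total) and a deliberately loose email pattern; the helper `validate_registration` is hypothetical, and the exact patterns are a judgment call rather than part of the spec.

```python
import re

# Assumed pattern: E.164 allows at most 15 digits, with no leading zero after '+'.
E164_RE = re.compile(r"\+[1-9]\d{1,14}")
# Loose check: one '@', no whitespace, and a dot somewhere in the domain part.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_registration(payload: object) -> bool:
    """Return True only if payload has all four fields with valid values."""
    if not isinstance(payload, dict):
        return False
    fields = ("username", "password", "email", "phone")
    # Presence and type: every field must exist and be a str.
    if not all(isinstance(payload.get(f), str) for f in fields):
        return False
    username = payload["username"]
    if not (3 <= len(username) <= 32 and username.isascii()):
        return False
    if len(payload["password"]) < 8:  # spec: more than 7 characters
        return False
    if not EMAIL_RE.fullmatch(payload["email"]):
        return False
    return E164_RE.fullmatch(payload["phone"]) is not None


ok = {"username": "alice", "password": "hunter2!",
      "email": "alice@example.com", "phone": "+15551234567"}
print(validate_registration(ok))                             # True
print(validate_registration({**ok, "password": "123"}))      # False
print(validate_registration({**ok, "phone": "5551234567"}))  # False
```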


**Output Format**

- register_user(...)  
  -> {"user_id": int} on success, or -1 on failure

- login(...)  
  -> True only if the account exists and the password matches; otherwise, raises an Exception with the message "Invalid username or password"

**Examples**

```
Input:
register_user({
    "username": "alice",
    "password": "hunter2!",
    "email": "alice@example.com",
    "phone": "+15551234567"
}, session)

Output:
{"user_id": 1}

Input:
register_user({
    "username": "alice",  # Duplicate username
    "password": "newpass123",
    "email": "other@example.com",
    "phone": "+15550000000"
}, session)

Output:
-1

Input:
register_user({
    "username": "bob",
    "password": "123",  # Too short
    "email": "bob@example.com",
    "phone": "+15550000001"
}, session)

Output:
-1

Input:
login({
    "username": "alice",
    "password": "hunter2!"
}, session)

Output:
True

Input:
login({
    "username": "alice",
    "password": "wrong"
}, session)

Output:
raises Exception("Invalid username or password")

Input:
login({
    "username": "ghost",
    "password": "doesnotexist"
}, session)

Output:
raises Exception("Invalid username or password")
```




**Requirements**

- Use SQLAlchemy Declarative ORM only.  
  (Raw SQL allowed **only** for enabling foreign key constraints via PRAGMA foreign_keys = ON.)

- Your schema must be in 3rd Normal Form (3NF):  
  - Split data into users and credentials tables, linked one-to-one through a unique foreign key.
  - Store passwords only in the credentials table.

- Enforce schema constraints:
  - NOT NULL, UNIQUE, and FOREIGN KEY must be defined at the database level, not in Python logic only.

- Passwords must be stored as bcrypt hashes with a cost factor (rounds) >= 12.

- Input must be validated before insert:
  - Enforce presence, type, and minimum field lengths.
  - Reject any missing or malformed fields.

- No global state beyond the engine returned by init_db().  
- No file or disk I/O.
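
bcrypt embeds its cost factor in the hash itself, using the layout `$2b$<cost>$<22-char salt><31-char digest>` (60 characters in total), so the rounds >= 12 requirement can be audited from stored values. A minimal standard-library sketch; the helpers `bcrypt_cost` and `meets_policy` are hypothetical, and the sample values are synthetic placeholders with the right shape, not real hashes.

```python
import re

# $2a$/$2b$/$2y$ prefix, two-digit cost, then 53 chars of salt + digest.
BCRYPT_RE = re.compile(rb"\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}")


def bcrypt_cost(stored_hash: bytes) -> int:
    """Return the cost factor encoded in a bcrypt hash, or raise ValueError."""
    match = BCRYPT_RE.fullmatch(stored_hash)
    if match is None:
        raise ValueError("not a bcrypt hash")
    return int(match.group(1))


def meets_policy(stored_hash: bytes, min_rounds: int = 12) -> bool:
    """Check the requirement that the cost factor is at least 12."""
    return bcrypt_cost(stored_hash) >= min_rounds


# Synthetic values with the correct length and alphabet (not real hashes).
strong = b"$2b$12$" + b"A" * 53
weak = b"$2b$10$" + b"A" * 53
print(len(strong), bcrypt_cost(strong), meets_policy(strong))  # 60 12 True
print(meets_policy(weak))                                      # False
```

Working on `bytes` matches what `bcrypt.hashpw` returns, so the same check applies directly to a value read back from a binary column.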

**Solution Expectation**

- Use SQLAlchemy Declarative ORM only.
- Split data across two fully normalized tables:
   - users: stores metadata (username, email, phone)
   - credentials: stores password hash and links to a single user
- Enforce one-to-one relationship between users and credentials using a unique foreign key.
- Ensure all fields are validated against the criteria above, with UNIQUE constraints on username, email, and phone.
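
A unique, non-null foreign key on `credentials.user_id` is what turns a one-to-many link into one-to-one: the database itself refuses a second credential row for the same user. A minimal sketch of equivalent DDL using the standard-library `sqlite3` module; the column sizes and the helper `insert_ok` are assumptions for illustration, and in the actual solution SQLAlchemy's `create_all()` would emit comparable DDL from the declarative models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES on this connection
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username VARCHAR(32)  NOT NULL UNIQUE,
    email    VARCHAR(254) NOT NULL UNIQUE,
    phone    VARCHAR(16)  NOT NULL UNIQUE
);
CREATE TABLE credentials (
    id            INTEGER PRIMARY KEY,
    -- UNIQUE + NOT NULL foreign key: at most one credential row per user
    user_id       INTEGER NOT NULL UNIQUE REFERENCES users(id) ON DELETE CASCADE,
    password_hash BLOB    NOT NULL
);
""")


def insert_ok(sql: str, params: tuple) -> bool:
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_user = "INSERT INTO users (username, email, phone) VALUES (?, ?, ?)"
add_cred = "INSERT INTO credentials (user_id, password_hash) VALUES (?, ?)"

print(insert_ok(add_user, ("alice", "alice@example.com", "+15551234567")))  # True
print(insert_ok(add_user, ("alice", "other@example.com", "+15550000000")))  # False: duplicate username
print(insert_ok(add_user, (None, "x@example.com", "+15550000002")))         # False: NOT NULL
print(insert_ok(add_cred, (1, b"hash-1")))   # True
print(insert_ok(add_cred, (1, b"hash-2")))   # False: second credential for user 1
print(insert_ok(add_cred, (99, b"hash-3")))  # False: no such user
```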

**Function Signature**

```python
def init_db() -> sqlalchemy.engine.Engine:
  pass
def register_user(payload: dict, session: sqlalchemy.orm.Session) -> dict | int:
  pass
def login(payload: dict, session: sqlalchemy.orm.Session) -> bool:
  pass
```

**Edge Case Behavior**

- Duplicate username, email, or phone -> register_user returns -1.
- Missing or invalid field -> register_user returns -1.
- Password shorter than 8 chars -> register_user returns -1.
- Wrong password or nonexistent user -> login raises Exception("Invalid username or password").

**Constraints**

- Libraries allowed: sqlalchemy, bcrypt and Python standard library
- All input must be validated against the criteria above before register_user inserts it into the database.
- The solution should be testable by simply importing the signature functions into a test script.

# code

"""This module uses SQLAlchemy to store data in normalized tables."""
import re
import bcrypt
import sqlalchemy
from sqlalchemy import (
    create_engine, Column, Integer, String, ForeignKey, text, LargeBinary
)
# sqlalchemy.ext.declarative.declarative_base is deprecated in 2.0; use the ORM one.
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()


class User(Base):
    """Users table: username, email, and phone (no password data)."""

    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    username = Column(String(32), unique=True, nullable=False)
    email = Column(String(128), unique=True, nullable=False)
    # E.164 allows up to 15 digits, plus the leading "+".
    phone = Column(String(16), unique=True, nullable=False)

    credential = relationship(
        "Credential", uselist=False,
        back_populates="user", cascade="all, delete-orphan")


class Credential(Base):
    """Credentials table: the bcrypt hash for exactly one user."""

    __tablename__ = "credentials"

    id = Column(Integer, primary_key=True)
    user_id = Column(
        Integer, ForeignKey(
            "users.id", ondelete="CASCADE"), unique=True, nullable=False)
    password_hash = Column(LargeBinary, nullable=False)

    user = relationship("User", back_populates="credential")


def init_db() -> sqlalchemy.engine.Engine:
    """Create an in-memory SQLite DB with foreign keys on; return engine."""
    engine = create_engine("sqlite:///:memory:", echo=False, future=True)
    with engine.connect() as conn:
        conn.execute(text("PRAGMA foreign_keys = ON"))
    Base.metadata.create_all(engine)
    return engine


def register_user(payload: dict, session: Session) -> dict | int:
    """
    Register a new user after validating inputs and enforcing constraints.

    Returns {"user_id": int} on success or -1 on failure.
    """
    try:
        username = payload["username"]
        password = payload["password"]
        email = payload["email"]
        phone = payload["phone"]
    except (KeyError, TypeError):  # missing key or non-dict payload
        return -1

    # Validation
    if not (isinstance(username, str)
            and 3 <= len(username) <= 32 and username.isascii()):
        return -1
    if not (isinstance(password, str) and len(password) >= 8):
        return -1
    if not (isinstance(email, str)
            and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email)):
        return -1
    if not (isinstance(phone, str) and re.fullmatch(r"^\+\d{11,15}$", phone)):
        return -1

    try:
        hashed = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))

        user = User(username=username, email=email, phone=phone)
        user.credential = Credential(password_hash=hashed)

        session.add(user)
        session.commit()
        return {"user_id": user.id}
    except sqlalchemy.exc.IntegrityError:
        session.rollback()
        return -1


def login(payload: dict, session: Session) -> bool:
    """
    Authenticate user by validating (username, password).

    Returns True if match is successful, otherwise raises Exception.
    """
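    try:
        username = payload["username"]
        password = payload["password"]
    except (KeyError, TypeError):
        raise Exception("Invalid username or password") from None
    # Non-string input would crash password.encode() below; fail uniformly.
    if not (isinstance(username, str) and isinstance(password, str)):
        raise Exception("Invalid username or password")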

    user = session.query(User).filter_by(username=username).first()
    if not user or not user.credential:
        raise Exception("Invalid username or password")

    if bcrypt.checkpw(password.encode(), user.credential.password_hash):
        return True
    else:
        raise Exception("Invalid username or password")


ModuleNotFoundError: No module named 'bcrypt'

# tests

"""
Unit tests for the clean_and_impute function.
"""
import os
import tempfile
import unittest
import pandas as pd
from main import clean_and_impute


class TestCleanAndImpute(unittest.TestCase):
    """
    Test suite for the clean_and_impute function.
    """

    def setUp(self) -> None:
        """
        Create a temporary directory and switch to it before each test.
        """
        self.tempdir = tempfile.TemporaryDirectory()
        self.original_cwd = os.getcwd()
        os.chdir(self.tempdir.name)

    def tearDown(self) -> None:
        """
        Restore the original directory and clean up the temporary directory after each test.
        """
        os.chdir(self.original_cwd)
        self.tempdir.cleanup()

    def write_csv(self, filename: str, df: pd.DataFrame) -> None:
        """
        Write the given DataFrame to a CSV file without an index.

        Args:
            filename: The name of the CSV file to write.
            df: The DataFrame to save.
        """
        df.to_csv(filename, index=False)

    def test_threshold_out_of_bounds(self) -> None:
        """
        Threshold values outside [0, 1] should return an empty DataFrame and log an error.
        """
        df, log = clean_and_impute('dummy.csv', {}, -0.1)
        self.assertTrue(df.empty)
        self.assertIn('Invalid threshold value', log[0])

        df, log = clean_and_impute('dummy.csv', {}, 1.5)
        self.assertTrue(df.empty)
        self.assertIn('Invalid threshold value', log[0])

    def test_file_not_found(self) -> None:
        """
        A missing file should return an empty DataFrame and log a read failure.
        """
        df, log = clean_and_impute('nofile.csv', {'col': 'mean'}, 0.5)
        self.assertTrue(df.empty)
        self.assertIn('Failed to read CSV', log[0])

    def test_empty_csv(self) -> None:
        """
        A CSV containing only headers should return an empty DataFrame and log the condition.
        """
        df_input = pd.DataFrame(columns=['a', 'b'])
        self.write_csv('empty.csv', df_input)
        df, log = clean_and_impute('empty.csv', {}, 0.5)
        self.assertTrue(df.empty)
        self.assertIn('CSV is empty or contains only headers', log[0])

    def test_skip_nonexistent_column_rule(self) -> None:
        """
        Imputation rules for columns not in the CSV should be skipped with a warning log.
        """
        df_input = pd.DataFrame({'x': [1, None, 3]})
        self.write_csv('data.csv', df_input)
        df, log = clean_and_impute('data.csv', {'y': 'mean'}, 0.5)
        self.assertIn("Skipped non-existent column 'y'", log[0])

    def test_mean_and_median_imputation(self) -> None:
        """
        Numeric columns should be correctly imputed using mean and median strategies.
        """
        df_input = pd.DataFrame({
            'a': [1, None, 3],
            'b': [None, 2, 4]
        })
        self.write_csv('in.csv', df_input)
        df, log = clean_and_impute('in.csv', {'a': 'mean', 'b': 'median'}, 1.0)
        self.assertAlmostEqual(df.loc[1, 'a'], 2)
        self.assertAlmostEqual(df.loc[0, 'b'], 2)
        self.assertIn('Column a: filled 1 missing values with mean=', log[0])
        self.assertIn('Column b: filled 1 missing values with median=', log[1])

    def test_mode_imputation_and_high_cardinality(self) -> None:
        """
        Mode imputation should fill missing values, and high-cardinality columns should be dropped.
        """
        df_input = pd.DataFrame({
            'c': ['x', None, 'x', 'y', 'z',
                  'w', 'v', 'u', 't', 's']
        })
        self.write_csv('mode.csv', df_input)
        df, log = clean_and_impute('mode.csv', {'c': 'mode'}, 1.0)
        self.assertEqual(df.loc[1, 'c'], 'x')
        self.assertTrue(any('Dropped column' in entry for entry in log))

    def test_drop_entirely_nan_column(self) -> None:
        """
        Columns that are entirely NaN should be dropped and logged.
        """
        df_input = pd.DataFrame({
            'd': [None, None, None],
            'e': [1, 2, 3]
        })
        self.write_csv('nan.csv', df_input)
        df, log = clean_and_impute('nan.csv', {'d': 'mode', 'e': 'mean'}, 1.0)
        self.assertNotIn('d', df.columns)
        self.assertIn("Dropped column 'd' due to being entirely NaN", log)

    def test_drop_columns_over_threshold(self) -> None:
        """
        Columns with missing-value fraction above the threshold should be dropped.
        """
        df_input = pd.DataFrame({
            'f': [1, None, None, 4],
            'g': [1, 2, 3, 4]
        })
        self.write_csv('thr.csv', df_input)
        df, log = clean_and_impute('thr.csv', {'f': 'mean', 'g': 'mean'}, 0.25)
        self.assertNotIn('f', df.columns)
        self.assertIn(
            "Dropped column 'f' due to missing data > threshold=0.25",
            log
        )

    def test_drop_rows_over_threshold(self) -> None:
        """
        Rows with missing-value fraction above the threshold should be dropped.
        """
        df_input = pd.DataFrame({
            'h': [1, None, 3],
            'i': [None, None, None]
        })
        self.write_csv('rows.csv', df_input)
        df, log = clean_and_impute('rows.csv', {'h': 'mean', 'i': 'mode'}, 0.5)
        self.assertNotIn(1, df.index)
        self.assertIn(
            'Dropped row 1 due to missing data > threshold=0.5',
            log
        )

    def test_no_missing_values(self) -> None:
        """
        When no missing values exist, log that nothing was changed.
        """
        df_input = pd.DataFrame({
            'j': [1, 2, 3],
            'k': ['a', 'b', 'c']
        })
        self.write_csv('nomiss.csv', df_input)
        df, log = clean_and_impute('nomiss.csv', {}, 0.5)
        self.assertIn(
            'No missing values found, nothing was changed',
            log
        )

    def test_output_file_created(self) -> None:
        """
        Ensure the cleaned CSV file is saved and the final log entry is correct.
        """
        df_input = pd.DataFrame({
            'l': [1, None],
            'm': ['x', None]
        })
        self.write_csv('out.csv', df_input)
        _, log = clean_and_impute('out.csv', {'l': 'mean', 'm': 'mode'}, 1.0)
        self.assertTrue(os.path.exists('cleaned_out.csv'))
        self.assertIn(
            "Saved cleaned DataFrame to 'cleaned_out.csv'",
            log[-1]
        )


if __name__ == '__main__':
    unittest.main()


ImportError: cannot import name 'clean_and_impute' from '__main__' (unknown location)

# Model Breaking Proof

#### Model Breaking Task URL: <https://rlhf-v3.turing.com/prompt/cee2c57a-c782-4761-b3bc-7f584878b2f0?origin=https%3A%2F%2Flabeling-ms.turing.com&redirect_url=https%3A%2F%2Flabeling-ms.turing.com%2Fconversations%2F73283%2Fview>

#### Model code:

```python
import re
import bcrypt
import sqlalchemy
from sqlalchemy import (
    Column,
    Integer,
    String,
    ForeignKey,
    create_engine,
    text
)
from sqlalchemy.orm import (
    sessionmaker,
    declarative_base,
    relationship,
    Session
)

Base = declarative_base()


class User(Base):
    """
    Stores user metadata (username, email, phone).
    The credential relationship references the associated
    password hash stored in the Credential table.
    """
    __tablename__ = 'users'

    id = Column(Integer, primary_key=True, autoincrement=True)
    username = Column(String(32), unique=True, nullable=False)
    email = Column(String(255), unique=True, nullable=False)
    phone = Column(String(20), unique=True, nullable=False)

    # One-to-one relationship to Credential
    credential = relationship(
        "Credential",
        back_populates="user",
        uselist=False,
        cascade="all, delete-orphan"
    )


class Credential(Base):
    """
    Stores password hash for exactly one user.
    Maintains a one-to-one relationship with the User table
    via the unique foreign key user_id.
    """
    __tablename__ = 'credentials'

    id = Column(Integer, primary_key=True, autoincrement=True)
    user_id = Column(
        Integer,
        ForeignKey('users.id', ondelete='CASCADE'),
        unique=True,
        nullable=False
    )

# <Issue>: <password_hash is declared as String(60) although bcrypt.hashpw returns bytes; the reference schema uses LargeBinary>

    password_hash = Column(String(60), nullable=False)

    # Relationship back to User
    user = relationship("User", back_populates="credential")


def init_db() -> sqlalchemy.engine.Engine:
    """
    Returns a SQLAlchemy engine bound to a fresh in-memory SQLite database
    with all models initialized, enforcing foreign key constraints.
    """
    engine = create_engine('sqlite:///:memory:', echo=False, future=True)
    # Enable foreign key constraints
    with engine.begin() as conn:
        conn.execute(text("PRAGMA foreign_keys = ON"))
        Base.metadata.create_all(conn)
    return engine


def _is_valid_username(username: str) -> bool:
    """Validate username: ASCII only, length 3–32."""
    if not (3 <= len(username) <= 32):
        return False
    return all(ord(c) < 128 for c in username)  # ASCII check


def _is_valid_password(password: str) -> bool:
    """Validate password: length >= 8."""
    return len(password) >= 8


def _is_valid_email(email: str) -> bool:
    """
    Very basic check for email format.
    For production, consider a more robust validation or a specialized library.
    """
    pattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"
    return bool(re.match(pattern, email))


def _is_valid_phone(phone: str) -> bool:
    """
    Basic E.164 format check:
    Start with '+' followed by 6-15 digits (typical E.164 length).
    Adjust if your phone requirements vary.
    """
    pattern = r"^\+\d{6,15}$"
    return bool(re.match(pattern, phone))


def register_user(payload: dict, session: Session) -> dict | int:
    """
    Register a new user with validated fields and unique constraints.
    On success, returns {"user_id": <int>}, otherwise returns -1.
    """
    required_fields = ["username", "password", "email", "phone"]
    for field in required_fields:
        if field not in payload:
            return -1  # Missing required field

    username = payload["username"]
    password = payload["password"]
    email = payload["email"]
    phone = payload["phone"]

    # Validate input
    if not (isinstance(username, str) and _is_valid_username(username)):
        return -1
    if not (isinstance(password, str) and _is_valid_password(password)):
        return -1
    if not (isinstance(email, str) and _is_valid_email(email)):
        return -1
    if not (isinstance(phone, str) and _is_valid_phone(phone)):
        return -1

    # Check for uniqueness in the database
    existing_user = (
        session.query(User)
        .filter(
            (User.username == username)
            | (User.email == email)
            | (User.phone == phone)
        )
        .first()
    )
    if existing_user:
        # Duplicate found (username/email/phone)
        return -1

    # If all validations pass, create user
    new_user = User(username=username, email=email, phone=phone)

    # Hash the password (bcrypt with cost >= 12)
    hashed_pw = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt(rounds=12))
    new_credential = Credential(password_hash=hashed_pw)

    # Link the credential to the user
    new_user.credential = new_credential

    try:
        session.add(new_user)
        session.commit()
    except:
        session.rollback()
        return -1

    return {"user_id": new_user.id}


def login(payload: dict, session: Session) -> bool:
    """
    Attempt to log in with a username and plaintext password.
    Returns True on successful authentication,
    or "Invalid username or password" if check fails.
    """
    required_fields = ["username", "password"]
    for field in required_fields:
        if field not in payload:
            return "Invalid username or password"

    username = payload["username"]
    password = payload["password"]

    # Look up user
    user = session.query(User).filter(User.username == username).first()
    if not user or not user.credential:

# <Issue>: <Function login returns a string on failure instead of raising an exception>

        return "Invalid username or password"

    # Check password
    stored_hash = user.credential.password_hash
    if bcrypt.checkpw(password.encode('utf-8'), stored_hash):
        return True
    else:
# <Issue>: <Function login returns a string on failure instead of raising an exception>
        return "Invalid username or password"
```
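
The annotated `login` failure mode is more than a contract mismatch: a non-empty string is truthy in Python, so a caller written as `if login(...):` would treat a failed login as a success. A minimal standard-library sketch of the difference; `login_returns_string` and `login_raises` are hypothetical stand-ins that only model the two return conventions, with a plain dict standing in for the database and bcrypt.

```python
import hmac
from typing import Union

# Hypothetical stand-in for the database; real code compares bcrypt hashes.
FAKE_USERS = {"alice": "hunter2!"}


def _matches(username: str, password: str) -> bool:
    stored = FAKE_USERS.get(username)
    # Constant-time comparison of the encoded values.
    return stored is not None and hmac.compare_digest(
        stored.encode(), password.encode())


def login_returns_string(username: str, password: str) -> Union[bool, str]:
    """Mirrors the model's behaviour: returns a message string on failure."""
    if _matches(username, password):
        return True
    return "Invalid username or password"


def login_raises(username: str, password: str) -> bool:
    """Mirrors the spec: returns True on success, raises on failure."""
    if _matches(username, password):
        return True
    raise Exception("Invalid username or password")


# A naive caller treats any truthy result as "access granted".
granted = bool(login_returns_string("alice", "wrong"))
print(granted)  # True: the failure message is a non-empty, truthy string

try:
    login_raises("alice", "wrong")
    granted = True
except Exception as exc:
    granted = False
    print(exc)  # Invalid username or password
print(granted)  # False
```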