# Mixed Data Types Testing
This notebook tests `qutePandas` functionality on a realistic small DataFrame with mixed data types:
- **Strings** (name)
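- **Integers** (age; pandas stores this column as float64 because it contains a null)
- **Categorical** (gender; held as plain strings rather than a pandas `category` dtype)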
- **Datetime** (date_of_birth)
- **Null values** across all columns

Each test validates that qutePandas produces results equivalent to pandas.
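
The tests below rely on `verify_correctness` from `test_utils`, whose implementation is not shown in this notebook. As a rough illustration of what an equivalence check of this kind can look like, here is a pandas-only sketch. The name `frames_equivalent` is hypothetical, and the real helper may compare results differently.

```python
import pandas as pd

def frames_equivalent(expected, actual):
    """Hypothetical stand-in: True if both frames hold the same values."""
    try:
        pd.testing.assert_frame_equal(
            expected.reset_index(drop=True),  # ignore index labels
            actual.reset_index(drop=True),
            check_dtype=False,                # tolerate int vs float storage
        )
        return True
    except AssertionError:
        return False

left = pd.DataFrame({'age': [25, 30, None]})
right = pd.DataFrame({'age': [25.0, 30.0, float('nan')]})

same = frames_equivalent(left, right)                 # nulls in matching positions
different = frames_equivalent(left, right.fillna(0))  # null vs 0 is a mismatch
print(same, different)
```

Ignoring dtypes and index labels is a design choice in this sketch: it keeps the comparison focused on values, which is usually what matters when two backends store the same data differently.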

## Setup
Import libraries and configure the kdb+ license.

In [None]:
import os
import sys
sys.path.append(os.path.abspath('..'))
sys.path.append(os.path.abspath('.'))
import qutePandas as qpd
import pandas as pd
import numpy as np
import pykx as kx
from test_utils import verify_correctness

# Point QLIC at a local license path, if one exists
local_lic = os.path.abspath('../kdb_lic')
if os.path.exists(local_lic):
    os.environ['QLIC'] = local_lic
qpd.connect()

print('Setup Complete')

## Create Test DataFrame
Define a small DataFrame with realistic mixed-type data including null values.

In [None]:
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'age': [25, 30, 35, 40, None],
    'gender': ['F', 'M', 'M', 'M', None],
    'date_of_birth': [pd.Timestamp('1998-01-01'), pd.Timestamp('1993-02-15'), 
                      pd.Timestamp('1988-06-20'), pd.Timestamp('1983-11-05'), pd.NaT]
}
df = pd.DataFrame(data)
q_df = qpd.DataFrame(df)

print('DataFrame Created:')
df
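
The null entries affect how pandas stores some of these columns, which is worth keeping in mind when reading the later tests. The following pandas-only sketch rebuilds the same data under the hypothetical name `df_check` and inspects it; it does not involve qutePandas.

```python
import pandas as pd

# Same data as above, rebuilt under a hypothetical name for inspection.
df_check = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'age': [25, 30, 35, 40, None],
    'gender': ['F', 'M', 'M', 'M', None],
    'date_of_birth': [pd.Timestamp('1998-01-01'), pd.Timestamp('1993-02-15'),
                      pd.Timestamp('1988-06-20'), pd.Timestamp('1983-11-05'), pd.NaT],
})

# The None in 'age' forces a float column, because NaN has no integer form.
age_is_float = pd.api.types.is_float_dtype(df_check['age'])

# The NaT entry still leaves 'date_of_birth' as a datetime column.
dob_is_datetime = pd.api.types.is_datetime64_any_dtype(df_check['date_of_birth'])

# Count nulls per column; every null sits in the last row.
nulls_per_column = df_check.isna().sum()
print(df_check.dtypes)
print(nulls_per_column)
```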

## Cleaning Functions
Test data cleaning operations on the mixed-type DataFrame.

### Test 1: dropna
Remove all rows containing any null values.

In [None]:
pd_res = df.dropna()
q_res = qpd.dropna(q_df, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: dropna')
q_res

### Test 2: dropna_col
Remove rows with null values in a specific column (age).

In [None]:
pd_res = df.dropna(subset=['age'])
q_res = qpd.dropna_col(q_df, 'age', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: dropna_col')
q_res
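
In the test DataFrame the only nulls sit in the last row, which is null in every column, so Test 1 and Test 2 keep the same four rows. The pandas-only sketch below uses a hypothetical frame, `df_demo`, with a row that is null only in gender, to show where the two operations diverge.

```python
import pandas as pd

# Hypothetical data: row 1 is null only in 'gender', row 2 is null everywhere.
df_demo = pd.DataFrame({
    'name': ['Alice', 'Bob', None],
    'age': [25, 30, None],
    'gender': ['F', None, None],
})

# Drops every row that has a null in any column.
all_clean = df_demo.dropna()

# Drops only rows where 'age' is null, so Bob's row survives.
age_clean = df_demo.dropna(subset=['age'])

print(len(all_clean), len(age_clean))
```

Adding a row like this to the test data would let the two tests distinguish between the two behaviours.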

### Test 3: fillna
Fill null values in the age column with 0.

In [None]:
pd_res = df.fillna({'age': 0})
q_res = qpd.fillna(q_df, 'age', 0, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: fillna')
q_res
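
Passing a dict to pandas `fillna` restricts the fill to the named columns, so nulls elsewhere are left in place. A pandas-only sketch on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame with a null in each column.
df_demo = pd.DataFrame({'name': ['Alice', None], 'age': [25, None]})

# Only 'age' is filled; 'name' keeps its null.
filled = df_demo.fillna({'age': 0})

age_values = filled['age'].tolist()
name_nulls = int(filled['name'].isna().sum())
print(age_values, name_nulls)
```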

## Transformation Functions
Test structural and type transformations on the DataFrame.

### Test 4: rename
Rename the 'name' column to 'full_name'.

In [None]:
pd_res = df.rename(columns={'name': 'full_name'})
q_res = qpd.rename(q_df, {'name': 'full_name'}, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: rename')
q_res

### Test 5: cast
Cast the age column to float. In pandas the column is already float64 because of the null value.

In [None]:
pd_res = df.copy()
pd_res['age'] = pd_res['age'].astype(float)
q_res = qpd.cast(q_df, 'age', 'float', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: cast')
q_res
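
To check the pandas side of this test in isolation, the sketch below confirms that `astype(float)` leaves the age values unchanged. It also shows pandas' nullable `Int64` dtype as one way to keep integer storage alongside a null. The variable names are illustrative only.

```python
import pandas as pd

age = pd.Series([25, 30, 35, 40, None], name='age')

# Already float64: the null was stored as NaN at construction time.
original_dtype = str(age.dtype)

# Casting float to float changes nothing, NaN included.
cast_is_noop = age.astype(float).equals(age)

# Nullable integer dtype: values stay integers, the null becomes pd.NA.
age_nullable = age.astype('Int64')
print(original_dtype, cast_is_noop, age_nullable.dtype)
```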

### Test 6: drop_col
Drop the gender column from the DataFrame.

In [None]:
pd_res = df.drop(columns=['gender'])
q_res = qpd.drop_col(q_df, 'gender', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: drop_col')
q_res

## Grouping & Aggregation
Test grouping operations on categorical data.

### Test 7: groupby_sum
Group by gender and sum the age values.

In [None]:
pd_res = df.groupby('gender', dropna=False)['age'].sum()
q_res = qpd.groupby_sum(q_df, 'gender', 'age', return_type='p').set_index('gender')['age']
assert verify_correctness(pd_res, q_res)
print('✓ Passed: groupby_sum')
pd.DataFrame({'gender': q_res.index, 'age_sum': q_res.values})

### Test 8: groupby_avg
Group by gender and calculate the average age.

In [None]:
pd_res = df.groupby('gender', dropna=False)['age'].mean()
q_res = qpd.groupby_avg(q_df, 'gender', 'age', return_type='p').set_index('gender')['age']
assert verify_correctness(pd_res, q_res)
print('✓ Passed: groupby_avg')
pd.DataFrame({'gender': q_res.index, 'age_avg': q_res.values})
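
With `dropna=False`, pandas keeps the null gender as its own group, and the two aggregations treat that all-null group differently: `sum` returns 0.0 while `mean` returns NaN. A pandas-only sketch using a hypothetical copy of the two relevant columns:

```python
import pandas as pd

# Hypothetical copy of the 'gender' and 'age' columns from the test data.
df_demo = pd.DataFrame({
    'gender': ['F', 'M', 'M', 'M', None],
    'age': [25, 30, 35, 40, None],
})

grouped = df_demo.groupby('gender', dropna=False)['age']
sums = grouped.sum()    # the null group sums to 0.0 (NaN values are skipped)
means = grouped.mean()  # the null group has no values, so its mean is NaN

# Select the null-keyed group without relying on its position.
null_group_sum = sums[sums.index.isna()].iloc[0]
null_group_mean = means[means.index.isna()].iloc[0]
print(sums)
print(means)
```

Any backend compared against these results has to agree on both conventions for the null group, not only on the F and M totals.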

## Custom Function Application
Test the apply mechanism for row-wise operations.

### Test 9: apply (sum)
Apply the sum function across each row (`axis=1`) of a numeric subset.

In [None]:
df_num = df[['age']]
q_num = kx.toq(df_num)
pd_res = df_num.sum(axis=1)
q_res = qpd.apply(q_num, 'sum', axis=1, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: apply (sum)')
q_res
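
Because pandas skips nulls when summing, the all-null row in this test sums to 0.0 rather than to a null. The pandas-only sketch below shows the default behaviour and, for contrast, the `min_count=1` option that yields NaN for such rows. The frame name is hypothetical.

```python
import pandas as pd

# Hypothetical copy of the single numeric column used in Test 9.
df_num_demo = pd.DataFrame({'age': [25, 30, 35, 40, None]})

# Default: NaN values are skipped, so the last row sums to 0.0.
row_sums = df_num_demo.sum(axis=1)

# Require at least one non-null value per row; otherwise the result is NaN.
strict_sums = df_num_demo.sum(axis=1, min_count=1)

print(row_sums.tolist())
print(strict_sums.tolist())
```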

## Introspection Functions
Test metadata retrieval capabilities.

### Test 10: dtypes
Retrieve data type information for all columns.

In [None]:
pd_res = df.dtypes
q_res = qpd.dtypes(q_df, return_type='p')
assert isinstance(q_res, pd.DataFrame)
assert len(q_res) == len(df.columns)  # one metadata row per column
print('✓ Passed: dtypes')
print('\nPandas dtypes:')
print(pd_res)
print('\nqutePandas dtypes (kdb+ meta):')
q_res
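
This test checks only the shape of the result: a DataFrame with one row per column. For comparison, pandas `df.dtypes` returns a Series. The sketch below reshapes it into a similar table; the column labels `column` and `dtype` are illustrative and are not the labels qutePandas returns.

```python
import pandas as pd

# Hypothetical copy of the test data, shortened to two rows.
df_demo = pd.DataFrame({
    'name': ['Alice', None],
    'age': [25, None],
    'gender': ['F', None],
    'date_of_birth': [pd.Timestamp('1998-01-01'), pd.NaT],
})

# Turn the dtypes Series into a table with one row per column.
dtype_table = (
    df_demo.dtypes.astype(str)
    .rename_axis('column')
    .reset_index(name='dtype')
)
print(dtype_table)
```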

## Summary
If every cell above runs without an `AssertionError`, qutePandas matches pandas on this mixed-type DataFrame, which includes:
- String columns
- Integer columns with nulls
- Categorical data
- Datetime values, including `NaT`

The library aims to stay compatible with pandas while running on kdb+. This notebook checks correctness only and does not measure performance.