# Mixed Data Types Testing
This notebook tests `qutePandas` functionality on a realistic small DataFrame with mixed data types:
- **Strings** (name)
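- **Integers** (age), stored as `float64` because the column contains a null
- **Categorical** (gender), stored as plain strings rather than a pandas `category` dtype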
- **Datetime** (date_of_birth)
- **Null values** across all columns

Each test validates that qutePandas produces results equivalent to pandas.
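
The `verify_correctness` helper comes from `test_utils` and its implementation is not shown in this notebook. As a rough sketch of what such an equivalence check could look like, the hypothetical `frames_equivalent` below compares two DataFrames with `pandas.testing.assert_frame_equal`. It assumes index labels and exact numeric dtypes should be ignored, and it relies on pandas treating nulls in the same position as equal. The real helper may behave differently, and it also accepts Series, which this sketch does not.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equivalent(expected: pd.DataFrame, actual: pd.DataFrame) -> bool:
    """Hypothetical stand-in for test_utils.verify_correctness (an assumption)."""
    try:
        assert_frame_equal(
            expected.reset_index(drop=True),  # compare by position, not by label
            actual.reset_index(drop=True),
            check_dtype=False,  # tolerate int64 vs float64 differences
        )
    except AssertionError:
        return False
    return True


left = pd.DataFrame({"age": [25.0, None]}, index=[3, 7])
right = pd.DataFrame({"age": [25.0, None]})

same = frames_equivalent(left, right)                # nulls in the same place match
different = frames_equivalent(left, right.fillna(0))  # 0.0 is not a null
```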

## Setup
Import libraries and configure the kdb+ license.

In [1]:
import os
import sys
sys.path.append(os.path.abspath('..'))
sys.path.append(os.path.abspath('.'))
import qutePandas as qpd
import pandas as pd
import numpy as np
import pykx as kx
from test_utils import verify_correctness

# Point kdb+ at a local license directory if one exists
local_lic = os.path.abspath('../kdb_lic')
if os.path.exists(local_lic):
    os.environ['QLIC'] = local_lic
qpd.connect()

print('Setup Complete')

Setup Complete


## Create Test DataFrame
Define a small DataFrame with realistic mixed-type data including null values.

In [2]:
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'age': [25, 30, 35, 40, None],
    'gender': ['F', 'M', 'M', 'M', None],
    'date_of_birth': [pd.Timestamp('1998-01-01'), pd.Timestamp('1993-02-15'), 
                      pd.Timestamp('1988-06-20'), pd.Timestamp('1983-11-05'), pd.NaT]
}
df = pd.DataFrame(data)
q_df = qpd.DataFrame(df)

print('DataFrame Created:')
df

DataFrame Created:


,name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05
4,,,,NaT
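
The age column is displayed as `25.0`, `30.0`, and so on because a missing value in a list of integers makes pandas infer `float64`. The sketch below shows that promotion next to the pandas nullable `Int64` dtype, which keeps the values integral. Whether qutePandas accepts `Int64` columns is not covered by this notebook, so treat the second half purely as a pandas illustration.

```python
import pandas as pd

ages = [25, 30, 35, 40, None]

# Default inference: the missing value forces float64 and is stored as NaN.
default_age = pd.Series(ages)

# Nullable integer dtype: values stay integral and the gap is stored as pd.NA.
nullable_age = pd.Series(ages, dtype="Int64")

print(default_age.dtype)
print(nullable_age.dtype)
```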


## Cleaning Functions
Test data cleaning operations on the mixed-type DataFrame.

### Test 1: dropna
Remove all rows containing any null values.

In [3]:
pd_res = df.dropna()
q_res = qpd.dropna(q_df, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: dropna')
q_res

✓ Passed: dropna


,name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05


### Test 2: dropna_col
Remove rows with null values in a specific column (age).

In [4]:
pd_res = df.dropna(subset=['age'])
q_res = qpd.dropna_col(q_df, 'age', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: dropna_col')
q_res

✓ Passed: dropna_col


,name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05
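
In the test DataFrame the only row with nulls is null in every column, so Tests 1 and 2 return the same four rows. The pandas-only sketch below uses a small illustrative frame (not the notebook's data) in which the two operations differ, which is the kind of input that would separate `dropna` from `dropna_col`.

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["Alice", None, None],
    "age": [25.0, 30.0, None],
})

# Drops every row that has a null in any column (rows 1 and 2).
any_null_dropped = frame.dropna()

# Drops only rows whose age is null (row 2); row 1 keeps its null name.
age_null_dropped = frame.dropna(subset=["age"])
```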


### Test 3: fillna
Fill null values in the age column with 0.

In [5]:
pd_res = df.fillna({'age': 0})
q_res = qpd.fillna(q_df, 'age', 0, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: fillna')
q_res

✓ Passed: fillna


,name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05
4,,0.0,,NaT
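
Passing a dict to pandas `fillna` fills only the listed columns, which is why name, gender, and date_of_birth are still null in row 4 above. A minimal pandas-only sketch on an illustrative two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({"name": ["Alice", None], "age": [25.0, None]})

# Only the age column is filled; the null in name is left untouched.
filled = frame.fillna({"age": 0})
remaining_nulls = filled.isna().sum()
```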


## Transformation Functions
Test structural and type transformations on the DataFrame.

### Test 4: rename
Rename the 'name' column to 'full_name'.

In [6]:
pd_res = df.rename(columns={'name': 'full_name'})
q_res = qpd.rename(q_df, {'name': 'full_name'}, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: rename')
q_res

✓ Passed: rename


,full_name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05
4,,,,NaT


### Test 5: cast
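Cast the age column to float. Because the null already makes pandas store age as `float64`, the pandas side of this test is a no-op, and the check confirms that qutePandas returns the same float column.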

In [7]:
pd_res = df.copy()
pd_res['age'] = pd_res['age'].astype(float)
q_res = qpd.cast(q_df, 'age', 'float', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: cast')
q_res

✓ Passed: cast


,name,age,gender,date_of_birth
0,Alice,25.0,F,1998-01-01
1,Bob,30.0,M,1993-02-15
2,Charlie,35.0,M,1988-06-20
3,David,40.0,M,1983-11-05
4,,,,NaT


### Test 6: drop_col
Drop the gender column from the DataFrame.

In [8]:
pd_res = df.drop(columns=['gender'])
q_res = qpd.drop_col(q_df, 'gender', return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: drop_col')
q_res

✓ Passed: drop_col


,name,age,date_of_birth
0,Alice,25.0,1998-01-01
1,Bob,30.0,1993-02-15
2,Charlie,35.0,1988-06-20
3,David,40.0,1983-11-05
4,,,NaT


## Grouping & Aggregation
Test grouping operations on categorical data.

### Test 7: groupby_sum
Group by gender and sum the age values, keeping the null gender as its own group (`dropna=False`).

In [9]:
pd_res = df.groupby('gender', dropna=False)['age'].sum()
q_res = qpd.groupby_sum(q_df, 'gender', 'age', return_type='p').set_index('gender')['age']
assert verify_correctness(pd_res, q_res)
print('✓ Passed: groupby_sum')
pd.DataFrame({'gender': q_res.index, 'age_sum': q_res.values})

✓ Passed: groupby_sum


,gender,age_sum
0,,0.0
1,F,25.0
2,M,105.0


### Test 8: groupby_avg
Group by gender and calculate the average age, again keeping the null gender as its own group (`dropna=False`).

In [10]:
pd_res = df.groupby('gender', dropna=False)['age'].mean()
q_res = qpd.groupby_avg(q_df, 'gender', 'age', return_type='p').set_index('gender')['age']
assert verify_correctness(pd_res, q_res)
print('✓ Passed: groupby_avg')
pd.DataFrame({'gender': q_res.index, 'age_avg': q_res.values})

✓ Passed: groupby_avg


,gender,age_avg
0,,
1,F,25.0
2,M,35.0
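
The null gender group behaves differently in Tests 7 and 8: its sum is 0.0 but its average is empty. On the pandas side this follows from `sum` skipping nulls and returning 0 for an empty set, while `mean` of no values is NaN. The sketch below reproduces this on an illustrative frame and shows the pandas `min_count` option, which makes an all-null group sum to NaN instead. Whether `qpd.groupby_sum` offers a comparable option is not shown in this notebook.

```python
import pandas as pd

frame = pd.DataFrame({
    "gender": ["F", "M", "M", None],
    "age": [25.0, 30.0, 40.0, None],
})
grouped = frame.groupby("gender", dropna=False)["age"]

sums = grouped.sum()                    # all-null group sums to 0.0
means = grouped.mean()                  # all-null group averages to NaN
strict_sums = grouped.sum(min_count=1)  # require at least one non-null value

# Pick out the entry for the null gender group in each result.
null_group_sum = sums[sums.index.isna()].iloc[0]
null_group_mean = means[means.index.isna()].iloc[0]
null_group_strict = strict_sums[strict_sums.index.isna()].iloc[0]
```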


## Custom Function Application
Test the apply mechanism for row-wise operations.

### Test 9: apply (sum)
Apply the sum function across rows (`axis=1`) of a numeric subset that contains only the age column. The q table here is built directly with `kx.toq`.

In [11]:
df_num = df[['age']]
q_num = kx.toq(df_num)
pd_res = df_num.sum(axis=1)
q_res = qpd.apply(q_num, 'sum', axis=1, return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: apply (sum)')
q_res

✓ Passed: apply (sum)


0    25.0
1    30.0
2    35.0
3    40.0
4     0.0
dtype: float64
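
Row 4 in the output above is 0.0 rather than null because a row-wise pandas `sum` skips NaN by default, so a row with no valid values adds up to zero. A minimal pandas-only sketch of that default, alongside `min_count=1`, which keeps such rows as NaN:

```python
import pandas as pd

ages = pd.DataFrame({"age": [25.0, None]})

# Default: NaN is skipped, so the all-null row sums to 0.0.
row_sums = ages.sum(axis=1)

# With min_count=1 a row needs at least one valid value, otherwise it stays NaN.
strict_row_sums = ages.sum(axis=1, min_count=1)
```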

## Introspection Functions
Test metadata retrieval capabilities.

### Test 10: dtypes
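Retrieve data type information for all columns. Unlike the other tests, this one only checks that the result is a DataFrame with one row per column, since kdb+ reports its own type codes rather than pandas dtypes.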

In [12]:
pd_res = df.dtypes
q_res = qpd.dtypes(q_df, return_type='p')
assert isinstance(q_res, pd.DataFrame)
assert len(q_res) == 4
print('✓ Passed: dtypes')
print('\nPandas dtypes:')
print(pd_res)
print('\nqutePandas dtypes (kdb+ meta):')
q_res

✓ Passed: dtypes

Pandas dtypes:
name                     object
age                     float64
gender                   object
date_of_birth    datetime64[ns]
dtype: object

qutePandas dtypes (kdb+ meta):


c,t,f,a
name,b's',,
age,b'f',,
gender,b's',,
date_of_birth,b'p',,
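
The `t` column holds kdb+ type characters as bytes: `s` is symbol, `f` is float, and `p` is timestamp. As a reading aid, the sketch below maps a few of these characters to names. `KDB_TYPE_NAMES` and `describe_kdb_type` are hypothetical names introduced for illustration and are not part of qutePandas or PyKX, and the mapping covers only a subset of kdb+ types.

```python
# Subset of kdb+ type characters as reported in the `t` column of `meta`.
KDB_TYPE_NAMES = {
    "b": "boolean",
    "j": "long",
    "f": "float",
    "s": "symbol",
    "p": "timestamp",
    "d": "date",
}


def describe_kdb_type(code) -> str:
    """Hypothetical helper: turn b's' or 's' into a readable kdb+ type name."""
    text = code.decode() if isinstance(code, bytes) else str(code)
    return KDB_TYPE_NAMES.get(text, f"unknown ({text})")


# Values copied from the meta output above.
meta_t = {"name": b"s", "age": b"f", "gender": b"s", "date_of_birth": b"p"}
readable = {column: describe_kdb_type(code) for column, code in meta_t.items()}
```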


## Indexing & Selection
Test `loc` and `iloc` on mixed-type data.

### Test 11: loc
Select the rows where age > 30, restricted to the name and age columns.

In [13]:
mask = df['age'] > 30
q_mask = kx.toq(list(mask.values))
pd_res = df.loc[mask, ['name', 'age']]
q_res = qpd.loc(q_df, rows=q_mask, cols=['name', 'age'], return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: loc')
q_res

✓ Passed: loc


,name,age
0,Charlie,35.0
1,David,40.0
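
The qutePandas result above is indexed 0 and 1, whereas pandas `loc` with a boolean mask keeps the original row labels (2 and 3 for Charlie and David). The test still passes, so `verify_correctness` presumably compares by position rather than by label; that is an inference, since the helper's code is not shown. The pandas-only sketch below shows the label difference and one way to normalize it before comparing:

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "David"],
    "age": [25.0, 30.0, 35.0, 40.0],
})

# Boolean-mask selection keeps the labels of the matching rows.
selected = frame.loc[frame["age"] > 30, ["name", "age"]]
original_labels = selected.index.tolist()

# Resetting the index gives a positional 0..n-1 index for comparison.
positional = selected.reset_index(drop=True)
```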


### Test 12: iloc
Select the first two rows and the first two columns by integer position.

In [14]:
pd_res = df.iloc[0:2, 0:2]
q_res = qpd.iloc(q_df, rows=slice(0, 2), cols=slice(0, 2), return_type='p')
assert verify_correctness(pd_res, q_res)
print('✓ Passed: iloc')
q_res

✓ Passed: iloc


,name,age
0,Alice,25.0
1,Bob,30.0


## Summary
All 12 tests above passed. On this dataset, qutePandas handles mixed data types including:
- String columns
- Integer-valued columns with nulls (stored as float)
- Categorical data stored as strings
- Datetime objects
- Indexing operations (loc/iloc)

The library matches pandas output for these operations while running them on kdb+. Performance is not measured in this notebook.

## Test print
Test the print function to display tables without pandas conversion.

In [None]:
print('Testing qpd.print with full table:')
q_table = qpd.DataFrame(df)
qpd.print(q_table)

print('\nTesting qpd.print with n=3:')
qpd.print(q_table, n=3)

print('\nTest passed: print executed successfully')