# Neuralk NICLClassifier: Housing Price Classification

This notebook demonstrates the **Neuralk In-Context Learning (NICL)** classifier on a real-world housing dataset. The `NICLClassifier` follows the standard scikit-learn API, so it can be dropped into existing ML pipelines.

---

## Getting Access

To use the Neuralk Cloud API, you need an API key. Run the following command to get started:

```bash
neuralk login
```

This will display instructions and a link to create your account at: https://prediction.neuralk-ai.com/register

Your API key will be generated upon registration. It looks like `nk_live_xxxxxxxxxxxx`.

## Setting up your API key

You can provide your API key in two ways:

**Option 1: Environment variable (recommended)**

```bash
# Linux/macOS
export NEURALK_API_KEY=nk_live_your_api_key_here

# Windows (Command Prompt)
set NEURALK_API_KEY=nk_live_your_api_key_here

# Windows (PowerShell)
$env:NEURALK_API_KEY = "nk_live_your_api_key_here"
```

**Option 2: Pass directly to the classifier**

```python
from neuralk import NICLClassifier
clf = NICLClassifier(api_key="nk_live_your_api_key_here")
```
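
Whichever option you choose, it can help to fail early with a clear message when no key is available. The sketch below shows one way to do that; `resolve_api_key` is a hypothetical helper written for this notebook, not part of the `neuralk` package.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key from an explicit argument or NEURALK_API_KEY.

    Hypothetical helper for illustration; not part of the neuralk package.
    """
    env = os.environ if env is None else env
    # An explicitly passed key takes precedence over the environment variable.
    key = explicit_key or env.get("NEURALK_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set NEURALK_API_KEY or pass api_key explicitly."
        )
    return key
```

The result could then be passed on as `NICLClassifier(api_key=resolve_api_key())`.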

## Setup



In [None]:
%%capture
%pip install -q neuralk scikit-learn skrub

In [None]:
import os
import warnings

import skrub
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline

from neuralk import NICLClassifier, datasets

skrub.set_config(use_table_report=False)
warnings.filterwarnings("ignore", message="Found unknown categories.*during transform")

# Read the key from the environment; this is None if NEURALK_API_KEY is not set.
API_KEY = os.environ.get("NEURALK_API_KEY")

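---

## 1. Quick Start: Synthetic Data

We first check the setup on a small synthetic dataset generated with `make_classification`.

In [None]: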
X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

print(f"Train: {X_train.shape[0]} samples, {X_train.shape[1]} features")
print(f"Test:  {X_test.shape[0]} samples")

In [None]:
# Note: In-context learning models are pretrained.
# The fit() call stores training data for context but does not perform traditional model fitting.
classifier = NICLClassifier(api_key=API_KEY).fit(X_train, y_train)

predictions = classifier.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy: {accuracy:.2%}")
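
To make the note about `fit()` more concrete, here is a toy estimator whose `fit()` only stores the training data and whose `predict()` consults that stored context. It is a 1-nearest-neighbour classifier written purely as an analogy for the fit/predict contract. The class name is hypothetical, and nothing here describes how NICL computes its predictions.

```python
import math


class ContextStoringClassifier:
    """Toy 1-nearest-neighbour classifier: fit() only stores the data.

    Hypothetical illustration of the fit/predict contract; it says nothing
    about how NICL computes predictions internally.
    """

    def fit(self, X, y):
        # No parameters are learned here: the training set is kept as context.
        self.context_X_ = [tuple(row) for row in X]
        self.context_y_ = list(y)
        return self  # returning self allows Estimator().fit(X, y) chaining

    def predict(self, X):
        predictions = []
        for row in X:
            # Find the index of the stored example closest to the query row.
            nearest = min(
                range(len(self.context_X_)),
                key=lambda i: math.dist(row, self.context_X_[i]),
            )
            predictions.append(self.context_y_[nearest])
        return predictions


toy = ContextStoringClassifier().fit([[0.0, 0.0], [1.0, 1.0]], ["low", "high"])
toy_predictions = toy.predict([[0.1, 0.2], [0.9, 0.8]])
```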

---

## 2. Real-World Dataset: Housing Price Classification

Now we apply the classifier to a real dataset containing house descriptions and sale prices. The target is the sale price binned into categories.
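
As a rough idea of what "binned into categories" can mean, the sketch below turns continuous prices into quantile-based class labels. The prices, the number of bins, and the quantile rule are all assumptions for illustration; the bin edges actually used by `datasets.housing()` are not shown in this notebook.

```python
import bisect
import statistics

# Hypothetical sale prices; the real dataset's values and bin edges may differ.
prices = [95_000, 120_000, 150_000, 180_000, 210_000, 260_000, 320_000, 450_000]

# Three cut points split the prices into four quantile-based bins.
edges = statistics.quantiles(prices, n=4)


def price_to_class(price, edges):
    """Map a price to a bin index between 0 and len(edges)."""
    return bisect.bisect_right(edges, price)


labels = [price_to_class(p, edges) for p in prices]
```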

In [None]:
X, y = datasets.housing()

print(f"Dataset: {X.shape[0]} samples, {X.shape[1]} features")
print(f"Classes: {y.nunique()}")
X.assign(Sale_Price=y).head()

### Preprocessing Pipeline

The dataset contains mixed types (numeric, categorical, text). The `NICLClassifier` accepts only numeric input, so we build a preprocessing pipeline:

1. **TableVectorizer** - encodes non-numeric columns as numeric features
2. **SquashingScaler** - normalizes feature scales
3. **SimpleImputer** - fills in missing values (with the column mean by default)

In [None]:
nicl_pipeline = make_pipeline(
    skrub.TableVectorizer(),
    skrub.SquashingScaler(),
    SimpleImputer(),
    NICLClassifier(api_key=API_KEY),
)
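
To give an intuition for the scaling step, the sketch below applies robust scaling (median and interquartile range) followed by soft clipping to a single column. This is a simplified illustration of the general idea. Whether skrub's `SquashingScaler` uses exactly this formula and these defaults is an assumption here, and `squash_column` is a hypothetical function.

```python
import math
import statistics


def squash_column(values, max_abs=3.0):
    """Robustly scale a numeric column, then softly clip it to (-max_abs, max_abs).

    Simplified, hypothetical re-implementation for illustration only.
    """
    center = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    scale = (q3 - q1) or 1.0  # fall back to 1.0 when the quartiles coincide
    scaled = [(v - center) / scale for v in values]
    # Soft clipping: close to the identity near 0, saturating towards +/- max_abs.
    return [z / math.sqrt(1.0 + (z / max_abs) ** 2) for z in scaled]


# Nineteen ordinary values plus one extreme outlier.
column = [float(v) for v in range(1, 20)] + [1000.0]
squashed = squash_column(column)
```

The outlier ends up just below the bound instead of dominating the column, while the ordering of the values is preserved.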

In [None]:
cv_results = cross_validate(nicl_pipeline, X, y, error_score="raise", scoring="accuracy")

print(f"NICL Accuracy: {cv_results['test_score'].mean():.2%} (+/- {cv_results['test_score'].std():.2%})")
print(f"Fold scores: {[f'{s:.2%}' for s in cv_results['test_score']]}")
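
The summary line above reports the mean and standard deviation of the per-fold scores (5 folds with the default `cv` in recent scikit-learn versions). The sketch below reproduces that arithmetic with the standard library on hypothetical scores. One detail worth knowing: NumPy's `.std()` defaults to the population standard deviation (`ddof=0`), which is slightly smaller than the sample standard deviation.

```python
import statistics

# Hypothetical fold scores, standing in for cv_results["test_score"].
fold_scores = [0.80, 0.84, 0.78, 0.82, 0.81]

mean_score = statistics.fmean(fold_scores)
# Population standard deviation, matching NumPy's default .std() (ddof=0).
population_std = statistics.pstdev(fold_scores)
# Sample standard deviation (ddof=1), shown only for comparison.
sample_std = statistics.stdev(fold_scores)

summary = f"{mean_score:.2%} (+/- {population_std:.2%})"
```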

---

## 3. Baseline Comparison: Gradient Boosting

For reference, we compare against `HistGradientBoostingClassifier` using the same preprocessing steps, so the two pipelines differ only in the final estimator.

In [None]:
baseline_pipeline = make_pipeline(
    skrub.TableVectorizer(),
    skrub.SquashingScaler(),
    SimpleImputer(),
    HistGradientBoostingClassifier(),
)

cv_results_baseline = cross_validate(baseline_pipeline, X, y, error_score="raise", scoring="accuracy")

print(f"Gradient Boosting Accuracy: {cv_results_baseline['test_score'].mean():.2%} (+/- {cv_results_baseline['test_score'].std():.2%})")
print(f"Fold scores: {[f'{s:.2%}' for s in cv_results_baseline['test_score']]}")
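
Because both `cross_validate` calls use the default `cv` setting, which does not shuffle, the two pipelines should be scored on the same splits, so a fold-by-fold comparison is meaningful. The sketch below shows that comparison on hypothetical scores; substitute `cv_results["test_score"]` and `cv_results_baseline["test_score"]` to apply it to the real results.

```python
import statistics

# Hypothetical per-fold accuracies, not results from this notebook.
nicl_scores = [0.80, 0.84, 0.78, 0.82, 0.81]
baseline_scores = [0.79, 0.85, 0.76, 0.80, 0.81]

if len(nicl_scores) != len(baseline_scores):
    raise ValueError("Both models must be scored on the same number of folds.")

# Positive differences mean the first model scored higher on that fold.
differences = [a - b for a, b in zip(nicl_scores, baseline_scores)]
mean_difference = statistics.fmean(differences)
wins = sum(d > 0 for d in differences)
losses = sum(d < 0 for d in differences)
ties = len(differences) - wins - losses
```

With only a handful of folds, a small mean difference is weak evidence either way, so it is best read alongside the per-fold spread.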

---

## Summary

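The `NICLClassifier` is used here like any other scikit-learn estimator, and its cross-validated accuracy can be compared directly with the gradient-boosting baseline above. Its main characteristics: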

- **No training required** - pretrained model uses context directly
- **scikit-learn compatible** - integrates seamlessly with existing pipelines
- **Handles large datasets** - automatic subsampling when needed

For advanced usage including custom context selection, see the [Neuralk documentation](https://docs.neuralk-ai.com/).