# LoRA Fine-Tuning Demo

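This notebook shows how to generate a LoRA (Low-Rank Adaptation) configuration for fine-tuning a sentiment classifier on movie-review-style text. It asks an NPC, an LLM-backed agent from the `npcpy` library, to suggest hyperparameters from basic dataset characteristics (the number of samples and the task type). The notebook stops at configuration: it prepares the training data and config files but does not run any fine-tuning.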
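As background for the `r` and `lora_alpha` values the NPC is asked to produce: LoRA keeps a pretrained weight matrix `W` frozen and learns a low-rank update `B @ A` of rank `r`, scaled by `lora_alpha / r`. The NumPy sketch below is purely illustrative (random matrices, and a layer size `d_out = d_in = 512` chosen as an assumption); it shows the adapted forward pass and why LoRA trains far fewer parameters than full fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 512, 512   # hypothetical layer size, chosen for illustration
r, lora_alpha = 16, 32   # same values as the example config used in this notebook

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init
scaling = lora_alpha / r

def lora_forward(x):
    """Frozen path plus scaled low-rank update: W x + (alpha / r) * B A x."""
    return W @ x + scaling * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted layer initially matches the frozen layer
print(np.allclose(lora_forward(x), W @ x))

full_params = W.size
lora_params = A.size + B.size
print(f"full: {full_params}, LoRA: {lora_params}, ratio: {lora_params / full_params:.4f}")
```

Raising `r` increases the adapter's capacity (and its parameter count, which grows linearly in `r`), while `lora_alpha` only rescales the update.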

## Imports

In [None]:
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from npcpy.npc_compiler import NPC

## Data Preparation

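Define a function that builds a small stand-in dataset from 20 Newsgroups, used here as a proxy for movie reviews. The `positive`/`negative` labels come from a post-length heuristic (more than 50 words counts as `positive`), not from the actual sentiment of the text.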

In [None]:
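def prepare_movie_review_data():
    # 20 Newsgroups contains no movie groups (requesting a missing category such as
    # 'rec.arts.movies.current-films' raises a ValueError), so two existing
    # rec.* groups stand in as placeholder review text.
    categories = ['rec.autos', 'rec.sport.baseball']
    newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
    
    # Skip posts left empty after header/footer/quote removal, then keep 100
    texts = [t for t in newsgroups.data if t.strip()][:100]
    
    reviews = []
    for text in texts:
        # Placeholder label, not real sentiment: posts longer than 50 words count as 'positive'
        sentiment = 'positive' if len(text.split()) > 50 else 'negative'
        reviews.append({'text': text[:200], 'sentiment': sentiment})
    
    return pd.DataFrame(reviews)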
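The labelling rule inside `prepare_movie_review_data` (a post with more than 50 words is `positive`, otherwise `negative`) can be checked in isolation. The sketch below applies the same threshold to a few made-up strings (hypothetical examples, so no download is needed); `length_label` is a hypothetical helper that just mirrors the rule.

```python
import pandas as pd

def length_label(text, threshold=50):
    """Hypothetical helper mirroring the notebook's rule: word count > threshold -> 'positive'."""
    return 'positive' if len(text.split()) > threshold else 'negative'

# Made-up example texts, not drawn from 20 Newsgroups
texts = [
    "Great film.",
    "word " * 60,
    "I did not enjoy the pacing at all, " * 10,
    "",
]
toy = pd.DataFrame({'text': texts})
toy['sentiment'] = toy['text'].map(length_label)
print(toy)
print(toy['sentiment'].value_counts())
```

The third example is clearly negative in tone yet is labelled `positive` because it is long, which is why these labels should be treated as placeholders rather than sentiment ground truth.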

### Test Data Preparation

Generate the dataset and examine the first few rows.

In [None]:
df = prepare_movie_review_data()
print(f"Dataset size: {len(df)} samples")
print("\nSentiment distribution:")
print(df['sentiment'].value_counts())
print("\nFirst few rows:")
display(df.head())

## LoRA Configuration Demo

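Define a function that asks the NPC to suggest LoRA hyperparameters for this dataset, then saves the training data and the returned configuration to CSV files. The suggestion is an LLM's output, not the result of a hyperparameter search, so treat it as a starting point rather than an optimum.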

In [None]:
def movie_review_lora_demo():
    df = prepare_movie_review_data()
    
    trainer_npc = NPC(
        name='LoRA Trainer',
        primary_directive='Configure LoRA parameters for sentiment analysis fine-tuning',
        model='llama3.2',
        provider='ollama'
    )
    
    config_format = '''
    {"lora_config": {
        "r": 16,
        "lora_alpha": 32, 
        "target_modules": ["q_proj", "v_proj"],
        "lora_dropout": 0.05,
        "learning_rate": 2e-4,
        "epochs": 3
    }}
    '''
    
    prompt = f"""Configure LoRA parameters for sentiment analysis on movie reviews.
Dataset size: {len(df)} samples
Task: Binary sentiment classification

Recommend optimal LoRA hyperparameters.
Format as: {config_format}"""
    
    response = trainer_npc.get_llm_response(prompt, format='json')
    config = response['response']['lora_config']
    
    # Save training data and config
    df.to_csv('movie_reviews_training.csv', index=False)
    config_df = pd.DataFrame([config])
    config_df.to_csv('lora_config.csv', index=False)
    
    print("LoRA configuration and training data prepared")
    print(f"Config: {config}")
    return df, config
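Because the configuration comes back from a language model, it is worth sanity-checking before use. The example format also mixes two kinds of settings: `r`, `lora_alpha`, `target_modules` and `lora_dropout` correspond to parameters of `LoraConfig` in Hugging Face PEFT, while `learning_rate` and `epochs` belong to the training loop. The helper below, `split_lora_config`, is a hypothetical sketch (not part of `npcpy` or PEFT) that checks the keys and value ranges and separates the two groups; the example values are copied from the notebook's format string.

```python
ADAPTER_KEYS = ('r', 'lora_alpha', 'target_modules', 'lora_dropout')
TRAINING_KEYS = ('learning_rate', 'epochs')

def split_lora_config(config):
    """Hypothetical helper: sanity-check an LLM-suggested config and split it
    into adapter settings and training-loop settings."""
    missing = [k for k in ADAPTER_KEYS + TRAINING_KEYS if k not in config]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not (isinstance(config['r'], int) and config['r'] > 0):
        raise ValueError("r must be a positive integer")
    if not 0.0 <= float(config['lora_dropout']) < 1.0:
        raise ValueError("lora_dropout must be in [0, 1)")
    if not (isinstance(config['target_modules'], list) and config['target_modules']):
        raise ValueError("target_modules must be a non-empty list")
    if float(config['learning_rate']) <= 0:
        raise ValueError("learning_rate must be positive")
    if not (isinstance(config['epochs'], int) and config['epochs'] > 0):
        raise ValueError("epochs must be a positive integer")

    adapter = {k: config[k] for k in ADAPTER_KEYS}
    training = {k: config[k] for k in TRAINING_KEYS}
    return adapter, training

# Values copied from the example format in movie_review_lora_demo
example = {
    "r": 16, "lora_alpha": 32, "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.05, "learning_rate": 2e-4, "epochs": 3,
}
adapter_cfg, training_cfg = split_lora_config(example)
print(adapter_cfg)
print(training_cfg)
```

In the notebook, the same check could be applied to the dictionary returned by `movie_review_lora_demo()`; a `ValueError` then flags a malformed suggestion before it reaches any training code.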

## Run the Demo

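Execute the full LoRA configuration demo and display the results. The NPC is created with `provider='ollama'` and `model='llama3.2'`, so this cell needs a running local Ollama server with that model available (for example, after `ollama pull llama3.2`).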

In [None]:
movie_data, lora_config = movie_review_lora_demo()

print("\n" + "="*50)
print("DEMO RESULTS")
print("="*50)
print(f"\nGenerated {len(movie_data)} training samples")
print(f"\nRecommended LoRA Configuration:")
for key, value in lora_config.items():
    print(f"  {key}: {value}")

## Inspect Generated Files

Examine the CSV files that were created.

In [None]:
# Load and display the training data CSV
training_data = pd.read_csv('movie_reviews_training.csv')
print("Training Data CSV:")
display(training_data.head(10))

# Load and display the config CSV
config_data = pd.read_csv('lora_config.csv')
print("\nLoRA Config CSV:")
display(config_data)
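One caveat when reading `lora_config.csv` back: CSV has no list type, so `target_modules` is written as the text of the Python list and `pd.read_csv` returns it as a string rather than a list. The sketch below demonstrates this with an in-memory buffer instead of a file, and shows that storing the config as JSON (a suggested alternative, not what the demo does) preserves the types.

```python
import io
import json
import pandas as pd

config = {"r": 16, "lora_alpha": 32, "target_modules": ["q_proj", "v_proj"],
          "lora_dropout": 0.05, "learning_rate": 2e-4, "epochs": 3}

# Round trip through CSV, as the demo does (in memory instead of on disk)
buf = io.StringIO()
pd.DataFrame([config]).to_csv(buf, index=False)
buf.seek(0)
from_csv = pd.read_csv(buf).iloc[0].to_dict()
print(type(from_csv["target_modules"]), from_csv["target_modules"])

# Round trip through JSON keeps lists, ints and floats intact
from_json = json.loads(json.dumps(config))
print(type(from_json["target_modules"]), from_json["target_modules"])
```

JSON fits a single nested configuration object naturally, while CSV remains a reasonable choice for the tabular training data.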