# Dataset Recommendation Example 📊

This notebook demonstrates how to use SciSynth's dataset recommendation system to find datasets relevant to research hypotheses. The system analyzes each hypothesis and suggests datasets that could help test it.

## Setup
First, let's import the required modules:

In [None]:
from app.data_recommender import recommend_datasets, extract_keywords
import pandas as pd

## 1. Extract Keywords from Hypotheses

Let's start by extracting keywords from a single example hypothesis:

In [None]:
hypothesis = "Neural networks perform better with larger training datasets"
keywords = extract_keywords(hypothesis)
print("Extracted Keywords:")
print(keywords)
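
To make the idea concrete, here is a minimal sketch of one common way to pull keywords out of a sentence: lowercase the text, split it into words, and drop stopwords. This is not SciSynth's `extract_keywords` implementation, whose internals are not shown in this notebook. `simple_keywords` and `STOPWORDS` are hypothetical names used only for illustration.

```python
import re

# Hypothetical stand-in for illustration; NOT SciSynth's extract_keywords.
# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "with", "at", "of", "for", "and", "in", "on", "to"}


def simple_keywords(text):
    """Return unique non-stopword tokens in order of first appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    seen = set()
    keywords = []
    for token in tokens:
        if token in STOPWORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
    return keywords


print(simple_keywords("Neural networks perform better with larger training datasets"))
```

The output of the real `extract_keywords` may differ, for example if it ranks terms or keeps multi-word phrases.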

## 2. Get Dataset Recommendations

Now let's get dataset recommendations for multiple hypotheses:

In [None]:
hypotheses = [
    "Neural networks perform better with larger training datasets",
    "Transformer models excel at natural language understanding tasks",
    "Deep learning models require significant computational resources"
]

recommendations = recommend_datasets(hypotheses)

print("Dataset Recommendations:")
for rec in recommendations:
    print(f"\nHypothesis: {rec['hypothesis']}")
    print("Recommended Datasets:")
    for dataset in rec['datasets']:
        print(f"- {dataset}")
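
The loop above assumes that `recommend_datasets` returns a list of dictionaries, each with a `'hypothesis'` string and a `'datasets'` list. The sketch below produces the same shape by ranking a toy catalog on word overlap. It is only an illustration of the idea, not SciSynth's matching logic; `TOY_CATALOG`, its tags, and `toy_recommend` are assumptions made up for this example.

```python
import re

# Toy catalog for illustration only: dataset name -> descriptive tags.
TOY_CATALOG = {
    "ImageNet": {"image", "classification", "vision"},
    "GLUE": {"natural", "language", "understanding"},
    "SQuAD": {"question", "answering", "language"},
}


def toy_recommend(hypotheses, catalog=TOY_CATALOG):
    """Rank catalog entries by how many tags appear in each hypothesis."""
    results = []
    for hypothesis in hypotheses:
        words = set(re.findall(r"[a-z]+", hypothesis.lower()))
        scored = [(len(words & tags), name) for name, tags in catalog.items()]
        # Highest overlap first; ties broken alphabetically; drop zero-overlap entries.
        ranked = [
            name
            for score, name in sorted(scored, key=lambda pair: (-pair[0], pair[1]))
            if score > 0
        ]
        results.append({"hypothesis": hypothesis, "datasets": ranked})
    return results


for rec in toy_recommend(["Transformer models excel at natural language understanding tasks"]):
    print(rec["hypothesis"], "->", rec["datasets"])
```

Note that a hypothesis with no overlapping tags gets an empty `'datasets'` list here, so any code that reads the first dataset should handle that case.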

## 3. Analyze Recommendations

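Let's summarize the recommendations in a DataFrame with one row per hypothesis, showing how many datasets were suggested and the first dataset in the list: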

In [None]:
# Build one summary row per hypothesis
summary_data = []
for rec in recommendations:
    datasets = rec['datasets']
    summary_data.append({
        'hypothesis': rec['hypothesis'],
        'num_datasets': len(datasets),
        # Use None (a missing value) rather than the string 'None' when nothing was recommended
        'top_dataset': datasets[0] if datasets else None
    })

summary_df = pd.DataFrame(summary_data)
print("\nRecommendation Summary:")
print(summary_df)
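
Another useful view is how often each dataset is suggested across hypotheses, and which hypotheses received no suggestions at all. The sketch below uses only the standard library and placeholder data in the shape assumed throughout this notebook. The names `dataset_frequency`, `unmatched_hypotheses`, and the sample entries are hypothetical and are not part of SciSynth.

```python
from collections import Counter

# Placeholder data in the assumed shape; not real SciSynth output.
sample_recommendations = [
    {"hypothesis": "hypothesis one", "datasets": ["dataset_a", "dataset_b"]},
    {"hypothesis": "hypothesis two", "datasets": ["dataset_a"]},
    {"hypothesis": "hypothesis three", "datasets": []},
]


def dataset_frequency(recs):
    """Count how many hypotheses each dataset was recommended for."""
    counts = Counter()
    for rec in recs:
        # set() guards against a dataset listed twice for one hypothesis
        counts.update(set(rec["datasets"]))
    return counts


def unmatched_hypotheses(recs):
    """Return the hypotheses that received no recommendations."""
    return [rec["hypothesis"] for rec in recs if not rec["datasets"]]


frequency = dataset_frequency(sample_recommendations)
for name, count in frequency.most_common():
    print(f"{name}: recommended for {count} hypotheses")
print("No recommendations:", unmatched_hypotheses(sample_recommendations))
```

To apply this to real results, pass the `recommendations` list from step 2 in place of `sample_recommendations`, assuming it has the same shape.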