# SciSynth: Basic Research Pipeline Example 🧪

This notebook demonstrates the basic workflow of using SciSynth to analyze a research paper and generate insights. The pipeline includes:

1. Loading and processing a research paper
2. Generating insights from the paper
3. Generating hypotheses based on the insights
4. Testing a simple model on sample data
5. Visualizing results and citation networks

## Setup
First, let's import the required modules:

In [None]:
# Standard library
from pathlib import Path

# Third-party
import matplotlib.pyplot as plt

# SciSynth modules
from app.citation_network import CitationNetwork
from app.hypothesis_gen import generate_hypotheses
from app.ingest_paper import extract_text_from_pdf, chunk_and_index
from app.insight_agent import generate_insights
from app.model_tester import run_basic_model

## 1. Load and Process a Research Paper

First, we'll load a sample research paper and process it into manageable chunks:

In [None]:
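# Example with a sample paper (replace the path with your own PDF)
paper_path = "data/sample_paper.pdf"
# Extract the raw text, then split it into chunks and index them
text = extract_text_from_pdf(paper_path)
index = chunk_and_index(text)
print(f"Processed {len(text.split())} words from the paper")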
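
The internals of `chunk_and_index` are not shown in this notebook. As a rough illustration of what "manageable chunks" can mean, here is a minimal sketch that splits text into overlapping word windows. The helper name `chunk_words` and its `chunk_size` and `overlap` parameters are hypothetical and are not part of SciSynth.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows (hypothetical helper, not SciSynth's API)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny demonstration with a made-up sentence
for chunk in chunk_words("a b c d e f g", chunk_size=3, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text.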

## 2. Generate Insights

Next, we'll use our insight generation system to extract key points from the paper:

In [None]:
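insights = generate_insights(index)
print("Key Insights:")
# Drop blank lines before numbering so the printed list has no gaps
insight_lines = [line.strip() for line in insights.split('\n') if line.strip()]
for i, insight in enumerate(insight_lines, 1):
    print(f"{i}. {insight}")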

## 3. Generate Hypotheses

Based on the insights, we'll generate testable research hypotheses:

In [None]:
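hypotheses = generate_hypotheses(insights)
print("Generated Hypotheses:")
# Drop blank lines before numbering so the printed list has no gaps
hypothesis_lines = [line.strip() for line in hypotheses.split('\n') if line.strip()]
for i, hypothesis in enumerate(hypothesis_lines, 1):
    print(f"{i}. {hypothesis}")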
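
Both loops above treat the generated text as one item per line. If the generator already prefixes its lines with bullets or numbers (an assumption; the exact output format is not shown here), printing your own numbering yields output like `1. 1. ...`. The following sketch strips such prefixes first. The helper name `clean_items` is hypothetical.

```python
import re

# Leading bullet ("-", "*", "•") or number ("1.", "2)") followed by whitespace
_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def clean_items(raw: str) -> list[str]:
    """Return non-empty lines with any leading bullet or number removed."""
    items = []
    for line in raw.splitlines():
        item = _PREFIX.sub("", line).strip()
        if item:
            items.append(item)
    return items


# Made-up sample text standing in for generated output
sample = "1. First point\n\n- Second point\n  3) Third point\nPlain line"
for i, item in enumerate(clean_items(sample), 1):
    print(f"{i}. {item}")
```

The pattern requires whitespace after the marker, so a line that starts with a decimal such as `3.5 percent` is left untouched.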

## 4. Test a Simple Model

Let's test a simple model on our sample dataset to evaluate our hypotheses:

In [None]:
# Example with a sample dataset
dataset_path = "data/sample_dataset.csv"
results = run_basic_model(dataset_path, target_column="growth_rate")
print("Model Results:")
for metric, value in results.items():
    if metric == "feature_importance":
        print("\nFeature Importance:")
        for feature, importance in value.items():
            print(f"{feature}: {importance:.3f}")
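    else:
        try:
            print(f"\n{metric}: {value:.3f}")
        except (TypeError, ValueError):
            # Fall back to plain printing for non-numeric entries
            print(f"\n{metric}: {value}")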
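
When a dataset has many columns, the feature importances are easier to read in ranked order. The sketch below sorts a name-to-score mapping and keeps the top `k` entries. The helper `top_features` and the example scores are hypothetical; they are not produced by `run_basic_model`.

```python
def top_features(importance: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k highest-scoring (feature, score) pairs, best first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Made-up scores for illustration only
example = {"temperature": 0.42, "ph": 0.13, "light": 0.31, "nutrients": 0.14}
for name, score in top_features(example, k=3):
    print(f"{name}: {score:.3f}")
```

The same ranked list can be passed to a bar chart so the bars appear in descending order.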

## 5. Visualize Results

Finally, let's create visualizations of our results and the citation network:

In [None]:
# Visualize feature importance
plt.figure(figsize=(10, 6))
features = list(results['feature_importance'].keys())
importances = list(results['feature_importance'].values())

plt.bar(features, importances)
plt.title('Feature Importance')
plt.xticks(rotation=45)
plt.ylabel('Importance Score')
plt.tight_layout()
plt.show()

# Visualize citation network
network = CitationNetwork()
graph, relationships = network.analyze_papers(Path("results"))

plt.figure(figsize=(12, 8))
network.visualize_network()
plt.show()
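
How `CitationNetwork` builds its graph is not shown in this notebook. To give a feel for the kind of question a citation network answers, here is a minimal sketch that counts incoming citations in a plain dictionary. The function `most_cited` and the sample data are hypothetical and do not reflect SciSynth's API.

```python
from collections import Counter


def most_cited(citations: dict[str, list[str]], top_n: int = 3) -> list[tuple[str, int]]:
    """Rank papers by how many distinct other papers cite them."""
    counts: Counter[str] = Counter()
    for citing, cited_list in citations.items():
        # Count each citing paper at most once per target and ignore self-citations
        for cited in set(cited_list):
            if cited != citing:
                counts[cited] += 1
    # Sort by count (descending), then by name for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up network: each key cites the papers in its list
sample = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "B", "B"]}
for paper, n in most_cited(sample):
    print(f"{paper}: cited by {n} paper(s)")
```

In graph terms this is the in-degree of each node, one of the simplest ways to highlight influential papers.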