Biotechnology Study Copilot
Project Overview

The Biotechnology Study Copilot is a practical LLM-powered study assistant built using the OpenAI Chat Completions API.

It demonstrates how large language models can be applied to real-world academic workflows, specifically graduate-level biotechnology and bioinformatics studies.

The assistant helps transform complex scientific topics and lecture notes into:

Clear conceptual explanations (simple + technical)

Exam-style questions

Structured flashcards

Concise summaries of lecture material

Instead of passively reading notes, this tool enables active, structured learning powered by LLMs.

Real-World Use Case

Graduate biotechnology programs involve dense material such as:

Gene editing mechanisms (e.g., CRISPR-Cas9)

RNA sequencing workflows

Genome assembly algorithms

Differential gene expression analysis

Bioinformatics pipelines

This project shows how LLMs can:

Break down complex molecular biology concepts

Generate revision materials automatically

Simulate exam preparation

Convert lecture notes into structured study assets

It demonstrates a practical and personal application of AI for higher education.

Concepts Covered

This project implements several core LLM engineering principles:

1. OpenAI Chat Completions API

Using the Python client to send structured system and user messages.

2. System vs User Prompts

Clear role separation to guide model behavior and maintain domain expertise.

3. Structured JSON Outputs

The model is instructed to return valid JSON for predictable parsing and downstream use.

4. Prompt Engineering

Carefully designed prompts to:

Control response format

Maintain scientific accuracy

Generate exam-level content

5. Prompt Chaining

Multiple GPT calls are chained together to:

Generate explanations

Transform explanations into flashcards

Build layered outputs

6. Environment & API Key Management

The API key stays out of the source code: it is read from a .env file with python-dotenv.

7. Reusable LLM Abstraction

A clean, reusable call_gpt() function prevents duplication and follows DRY principles.

Step 1: Install & Imports

In [None]:
# Install the dependencies first if needed:
#   pip install openai python-dotenv
import os
import json
from dotenv import load_dotenv
from openai import OpenAI

Step 2: Load API Key from .env

In [None]:
# Load environment variables
load_dotenv()

api_key = os.getenv("OPENAI_API_KEY")

if not api_key:
    raise ValueError("OPENAI_API_KEY not found in .env file")

client = OpenAI(api_key=api_key)

print("✅ API key loaded successfully.")

Step 3: Core LLM Function
This function is the single, reusable entry point for every LLM call in the notebook.

In [None]:
def call_gpt(system_prompt: str, user_prompt: str, temperature: float = 0.3) -> str:
    """
    Send one system + user message pair to the Chat Completions API
    and return the text of the first choice.
    """
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )

    # The reply text lives on the first choice's message.
    return response.choices[0].message.content
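
Asking for JSON in the prompt does not by itself guarantee JSON in the reply. As a hedged sketch, the Chat Completions API also accepts a `response_format={"type": "json_object"}` argument (JSON mode) on models that support it. The function name `call_gpt_json`, the injected `client` parameter, and the `gpt-4o-mini` default are assumptions made for illustration and are not part of this notebook. JSON mode also expects the word "JSON" to appear somewhere in the messages.

```python
from typing import Any


def call_gpt_json(
    client: Any,
    system_prompt: str,
    user_prompt: str,
    model: str = "gpt-4o-mini",  # assumption: any model that supports JSON mode
    temperature: float = 0.3,
) -> str:
    """Hypothetical variant of call_gpt() that requests JSON mode."""
    response = client.chat.completions.create(
        model=model,
        temperature=temperature,
        # Ask the API to constrain the reply to a single JSON object.
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object, without a network call or an API key.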

Step 4: Study Topic Generator

In [None]:
def generate_study_pack(topic: str):
    system_prompt = """
    You are a biotechnology university professor and expert tutor.
    You explain clearly but maintain scientific accuracy.
    Always respond in valid JSON format.
    """

    user_prompt = f"""
    Create a complete study pack for the topic: {topic}

    Return in this JSON format:
    {{
        "simple_explanation": "...",
        "technical_explanation": "...",
        "exam_questions": ["...", "...", "...", "...", "..."],
        "flashcards": [
            {{"question": "...", "answer": "..."}},
            {{"question": "...", "answer": "..."}}
        ]
    }}
    """

    response = call_gpt(system_prompt, user_prompt)

    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(response)
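
Models sometimes wrap JSON in a Markdown code fence or leave out a field, and a bare `json.loads` call then fails or returns an incomplete pack. The following is a minimal sketch of a more tolerant parser, assuming the four-key format requested above. `parse_study_pack`, `REQUIRED_KEYS`, and `FENCE` are hypothetical names introduced here.

```python
import json

# Keys requested in the generate_study_pack() prompt.
REQUIRED_KEYS = {
    "simple_explanation",
    "technical_explanation",
    "exam_questions",
    "flashcards",
}

# Three backticks, built programmatically to keep this example easy to embed.
FENCE = "`" * 3


def parse_study_pack(raw: str) -> dict:
    """Parse a model reply into a study pack, tolerating a Markdown code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, then the closing fence if present.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    try:
        pack = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model reply was not valid JSON: {exc}") from exc
    if not isinstance(pack, dict):
        raise ValueError("Model reply was valid JSON but not an object")
    missing = REQUIRED_KEYS - pack.keys()
    if missing:
        raise ValueError(f"Study pack is missing keys: {sorted(missing)}")
    return pack
```

With this helper, `generate_study_pack` could end with `return parse_study_pack(response)` and surface one consistent `ValueError` for every malformed reply.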

Step 5: Test It

In [None]:
topic = "CRISPR-Cas9 gene editing mechanism"

study_pack = generate_study_pack(topic)

print("📘 SIMPLE EXPLANATION:\n")
print(study_pack["simple_explanation"])

print("\n📚 TECHNICAL EXPLANATION:\n")
print(study_pack["technical_explanation"])

print("\n📝 EXAM QUESTIONS:\n")
for q in study_pack["exam_questions"]:
    print("-", q)

Step 6: Lecture Notes Summarizer

In [None]:
def summarize_lecture_notes(notes: str):
    system_prompt = """
    You are a bioinformatics teaching assistant.
    Your job is to summarize lecture notes clearly and extract key learning points.
    """

    user_prompt = f"""
    Summarize the following lecture notes.
    Then extract:
    - 5 key concepts
    - 3 potential exam questions

    Lecture notes:
    {notes}
    """

    return call_gpt(system_prompt, user_prompt)

Step 7: Test with Sample Notes

In [None]:
sample_notes = """
RNA-Seq is a next-generation sequencing technique used to analyze the transcriptome.
It involves reverse transcription of RNA into cDNA, fragmentation, sequencing, 
and alignment to a reference genome. Differential gene expression analysis 
is performed to compare biological conditions.
"""

summary = summarize_lecture_notes(sample_notes)

print(summary)

Step 8: Prompt Chaining
The output of one call becomes the input of the next: first generate an explanation, then turn that explanation into flashcards.

In [None]:
def explanation_to_flashcards(explanation: str):
    system_prompt = "You are a biotech exam preparation assistant."
    
    user_prompt = f"""
    Convert the following explanation into 5 high-quality flashcards:

    {explanation}

    Format:
    Q: ...
    A: ...
    """

    return call_gpt(system_prompt, user_prompt)
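
Because this function returns free text in a `Q: ... / A: ...` layout rather than JSON, downstream code needs to split it into cards. Below is a rough sketch of such a parser, assuming the model follows the requested format exactly. Variants such as `Q1:` or bolded labels would need a looser pattern. `parse_flashcards` is a hypothetical helper, not part of the notebook.

```python
import re

# A card starts at a line beginning with "Q:", continues through a line
# beginning with "A:", and ends just before the next "Q:" line or end of text.
_CARD_PATTERN = re.compile(
    r"^\s*Q:\s*(.*?)\s*^\s*A:\s*(.*?)\s*(?=^\s*Q:|\Z)",
    re.DOTALL | re.MULTILINE,
)


def parse_flashcards(text: str) -> list[dict[str, str]]:
    """Split 'Q: ... / A: ...' text into question/answer dictionaries."""
    return [
        {"question": question, "answer": answer}
        for question, answer in _CARD_PATTERN.findall(text)
    ]
```

The result has the same shape as the `flashcards` list in the study pack from Step 4, so both sources can be handled by the same code.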

Step 9: Test

In [None]:
flashcards = explanation_to_flashcards(study_pack["technical_explanation"])
print(flashcards)
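
The two chained calls can also be wrapped in one function. This is a minimal sketch that takes the LLM caller as a parameter, so the chain can be run with `call_gpt` or with a stub. `build_flashcards_from_topic` and the `LLMCall` alias are hypothetical names, and the prompts are shortened versions of the ones used above.

```python
from typing import Callable

# Any function that maps (system_prompt, user_prompt) to reply text.
LLMCall = Callable[[str, str], str]


def build_flashcards_from_topic(topic: str, llm: LLMCall) -> dict[str, str]:
    """Two-step chain: explain the topic, then convert the explanation to flashcards."""
    # Step 1: generate the explanation.
    explanation = llm(
        "You are a biotechnology university professor and expert tutor.",
        f"Give a technical explanation of: {topic}",
    )
    # Step 2: feed the explanation from step 1 into the flashcard prompt.
    flashcards = llm(
        "You are a biotech exam preparation assistant.",
        "Convert the following explanation into 5 high-quality flashcards:\n\n"
        f"{explanation}\n\nFormat:\nQ: ...\nA: ...",
    )
    return {"explanation": explanation, "flashcards": flashcards}
```

In the notebook this would be called as `build_flashcards_from_topic("RNA-Seq", call_gpt)`, since `call_gpt` accepts the two prompts positionally.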