### **Description**

This notebook converts medical school lecture content from an Excel file into a JSONL file of chat-style examples (system, user, and assistant messages) for an AI model that creates Anki flashcards. The process involves the following steps:

1. **Library Imports**: We import the necessary libraries: `pandas` for data manipulation and `json` for serializing each entry as JSON.

2. **Loading Data**: The Excel file containing the lecture slides and the corresponding Anki cards is loaded into a pandas DataFrame. The code reads the slide text from the `input_lecture_slide` column and the card text from the `output_anki_cards` column.

3. **System Message Definition**: We define a system message that sets the context for an AI model. This message describes the model's role as an expert medical professor, emphasizing the importance of high-yield information that frequently appears on medical exams.

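4. **Creating JSONL Structure**: We iterate through each row of the DataFrame to create a structured JSON object for each entry. Each entry holds a `messages` list with three items:
   - The system message (role `system`),
   - The lecture slide content (role `user`, from `input_lecture_slide`),
   - The corresponding Anki card content (role `assistant`, from `output_anki_cards`).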

5. **Writing to JSONL**: The structured data is then written to a JSONL file, with one JSON object per line so that entries can be read back one at a time.

6. **Completion Message**: Finally, a message is printed to confirm that the JSONL file has been successfully created.
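
Steps 4 and 5 can be sketched on a couple of in-memory records, without the Excel file. This is a minimal sketch, not the notebook's own code: `build_entry`, `is_usable`, the sample rows, and the shortened system message are hypothetical names and data introduced for illustration. It also skips rows whose cells are not non-empty strings, on the assumption that some cells may be blank: pandas loads blank Excel cells as `NaN`, and `json.dumps` would write that as the bare token `NaN`, which is not valid JSON.

```python
import json

# Shortened placeholder; the notebook uses a much longer system message.
SYSTEM_CONTENT = "You are an expert medical school professor."


def build_entry(system_content, slide_text, card_text):
    """Return one chat-style example as a dict (hypothetical helper)."""
    return {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": slide_text},
            {"role": "assistant", "content": card_text},
        ]
    }


def is_usable(value):
    """True only for non-empty strings; blank Excel cells arrive as NaN floats."""
    return isinstance(value, str) and value.strip() != ""


# Hypothetical sample rows standing in for the DataFrame rows.
rows = [
    {"input_lecture_slide": "Slide text A", "output_anki_cards": "Card text A"},
    {"input_lecture_slide": float("nan"), "output_anki_cards": "Card text B"},
]

# Keep only rows where both cells hold usable text.
entries = [
    build_entry(SYSTEM_CONTENT, r["input_lecture_slide"], r["output_anki_cards"])
    for r in rows
    if is_usable(r["input_lecture_slide"]) and is_usable(r["output_anki_cards"])
]

# One JSON object per line, as in step 5.
jsonl_text = "".join(json.dumps(e) + "\n" for e in entries)
print(jsonl_text)
```

Whether to skip incomplete rows or fix them in the spreadsheet is a judgment call; the sketch skips them only to keep every written line valid JSON.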


In [2]:
import pandas as pd
import json

# Load the Excel file
file_path = '/content/Copy of Examples for LectureAgent.xlsx'  # Update with your file path
df = pd.read_excel(file_path)

# Define the system content for all entries
system_content = (
    "You are an expert medical school professor with a deep understanding of what is most high-yield for medical school exams. "
    "Your task is to review content from medical school lectures and create Anki cards to facilitate learning for medical school students. "
    "Not everything from the lecture will require conversion into anki cards and it is very important that you focus only on the most essential "
    "information that frequently appears on medical school exams and the USMLE. Use your expertise to decide what is worth memorizing, and structure "
    "the cards in a way that maximizes active recall. Prioritize clarity and relevance. You will be condensing high yield concepts into the Anki cards."
)

# Create the JSONL content
jsonl_data = []

# Iterate through each row of the dataframe and create the JSON structure
for _, row in df.iterrows():
    entry = {
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": row["input_lecture_slide"]},
            {"role": "assistant", "content": row["output_anki_cards"]}
        ]
    }
    jsonl_data.append(entry)

# Specify the output JSONL file path
output_jsonl_path = 'output_anki_cards.jsonl'

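# Write the JSONL file, one JSON object per line.
# UTF-8 with ensure_ascii=False keeps non-ASCII symbols (e.g. Greek letters) readable in the file.
with open(output_jsonl_path, 'w', encoding='utf-8') as jsonl_file:
    for entry in jsonl_data:
        jsonl_file.write(json.dumps(entry, ensure_ascii=False) + '\n')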

print(f"JSONL file has been created at {output_jsonl_path}")


JSONL file has been created at output_anki_cards.jsonl
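
The completion message only reports the output path, so it can be worth reading the file back to confirm that every line parses and has the expected three-message shape. The following is a minimal sketch of such a check, not part of the original notebook: `validate_jsonl` and `EXPECTED_ROLES` are hypothetical names, and the demo writes a small temporary file rather than touching `output_anki_cards.jsonl`.

```python
import json
import os
import tempfile

EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    with open(path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            messages = entry.get("messages") if isinstance(entry, dict) else None
            if not isinstance(messages, list):
                problems.append((line_number, "missing messages list"))
                continue
            roles = [m.get("role") if isinstance(m, dict) else None for m in messages]
            if roles != EXPECTED_ROLES:
                problems.append((line_number, f"unexpected roles: {roles}"))
            elif not all(isinstance(m.get("content"), str) for m in messages):
                # Catches values such as NaN that came from blank cells.
                problems.append((line_number, "non-string content"))
    return problems


# Demo on a temporary file: one well-formed line and one with a missing role.
good = {"messages": [{"role": r, "content": "text"} for r in EXPECTED_ROLES]}
bad = {"messages": [{"role": "user", "content": "text"}]}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(good) + "\n")
        f.write(json.dumps(bad) + "\n")
    problems = validate_jsonl(demo_path)

print(problems)
```

To check the notebook's real output, call `validate_jsonl(output_jsonl_path)` after the file has been written.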
