<a href="https://colab.research.google.com/github/ancestor9/2025_Fall_AI-Model-Operations-MLOps/blob/main/week09_Chavrusa_04/gemini_pydantic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pydantic to structure output from LLMs
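- This notebook explains how to use Pydantic to structure the output of a Gemini (LLM) model and how to work with that structured data effectively in Python.
- It asks the Gemini model to generate data in JSON format, then **deserializes** that JSON string into a robust, easy-to-handle Python object using a Pydantic model.
- It is a worked example of converting an LLM's unstructured output into structured data through Pydantic's type system, so that downstream work (analysis, storage, processing) can be done safely and easily.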

In [2]:
from google import genai
from google.colab import userdata

# Read the API key from Colab's secret storage (saved under the name 'gemini-key').
GEMINI_API_KEY = userdata.get('gemini-key')

client = genai.Client(api_key=GEMINI_API_KEY)

response = client.models.generate_content(
    model="gemini-2.5-flash", contents="Explain how AI works in a few words"
)
print(response.text)

AI lets computers **learn from data to make intelligent decisions.**


In [3]:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="""You are a helpful assistant. I need you to create a JSON object representing a library.
    The library's name should be 'Coolu Libraru' and have the fields name and books that contains a list of book.
    Each book should have a 'title', 'author', and 'year' field. Make sure the output is a single, valid JSON object. Give me 10 books.
    Remove ```json and ``` """,
)

response.text

'{\n  "name": "Coolu Libraru",\n  "books": [\n    {\n      "title": "The Great Gatsby",\n      "author": "F. Scott Fitzgerald",\n      "year": 1925\n    },\n    {\n      "title": "To Kill a Mockingbird",\n      "author": "Harper Lee",\n      "year": 1960\n    },\n    {\n      "title": "1984",\n      "author": "George Orwell",\n      "year": 1949\n    },\n    {\n      "title": "Pride and Prejudice",\n      "author": "Jane Austen",\n      "year": 1813\n    },\n    {\n      "title": "The Catcher in the Rye",\n      "author": "J.D. Salinger",\n      "year": 1951\n    },\n    {\n      "title": "Moby Dick",\n      "author": "Herman Melville",\n      "year": 1851\n    },\n    {\n      "title": "War and Peace",\n      "author": "Leo Tolstoy",\n      "year": 1869\n    },\n    {\n      "title": "The Hobbit",\n      "author": "J.R.R. Tolkien",\n      "year": 1937\n    },\n    {\n      "title": "Brave New World",\n      "author": "Aldous Huxley",\n      "year": 1932\n    },\n    {\n      "title": 
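The prompt asks the model to leave out the Markdown code fence, but a prompt instruction is not a guarantee. The sketch below shows one defensive way to strip a surrounding fence before parsing. `strip_code_fence` is a hypothetical helper written for this illustration, not part of google-genai or Pydantic, and the sample string is hand-written rather than real model output.

```python
import re

# Build the fence marker programmatically so this example stays readable
# inside a fenced block.
FENCE = "`" * 3

# Matches an optional language tag after the opening fence and captures
# everything up to the closing fence.
_FENCE_RE = re.compile(r"^`{3}[a-zA-Z]*\s*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the text inside one surrounding code fence, or the trimmed text as-is."""
    stripped = text.strip()
    match = _FENCE_RE.match(stripped)
    return match.group(1).strip() if match else stripped


# Hand-written sample, standing in for a fenced model reply.
fenced = FENCE + 'json\n{"name": "Coolu Libraru", "books": []}\n' + FENCE
print(strip_code_fence(fenced))
```

Text without a fence passes through unchanged apart from trimming, so the helper can be applied to every response before validation.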

In [4]:
print(response.text)

{
  "name": "Coolu Libraru",
  "books": [
    {
      "title": "The Great Gatsby",
      "author": "F. Scott Fitzgerald",
      "year": 1925
    },
    {
      "title": "To Kill a Mockingbird",
      "author": "Harper Lee",
      "year": 1960
    },
    {
      "title": "1984",
      "author": "George Orwell",
      "year": 1949
    },
    {
      "title": "Pride and Prejudice",
      "author": "Jane Austen",
      "year": 1813
    },
    {
      "title": "The Catcher in the Rye",
      "author": "J.D. Salinger",
      "year": 1951
    },
    {
      "title": "Moby Dick",
      "author": "Herman Melville",
      "year": 1851
    },
    {
      "title": "War and Peace",
      "author": "Leo Tolstoy",
      "year": 1869
    },
    {
      "title": "The Hobbit",
      "author": "J.R.R. Tolkien",
      "year": 1937
    },
    {
      "title": "Brave New World",
      "author": "Aldous Huxley",
      "year": 1932
    },
    {
      "title": "Crime and Punishment",
      "author": "Fyodor Do

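- model_validate_json(): converts the JSON string returned by Gemini directly into a Library model object and validates it in the same step
- The JSON string becomes a library object that can be handled in an object-oriented way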
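To make the validation step concrete, here is a minimal sketch of what happens when the JSON does not satisfy the model. The models mirror the Book and Library classes used in this notebook; the payload is hand-written for illustration (not real Gemini output) and deliberately contains a year that violates the `gt=1000` constraint.

```python
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Book(BaseModel):
    title: str
    author: str
    year: int = Field(gt=1000, lt=datetime.now().year)


class Library(BaseModel):
    name: str
    books: List[Book]


# Hand-written payload for illustration: year 975 is not greater than 1000.
bad_json = (
    '{"name": "Coolu Libraru", "books": '
    '[{"title": "Beowulf", "author": "Unknown", "year": 975}]}'
)

errors = []
try:
    Library.model_validate_json(bad_json)
except ValidationError as exc:
    # Each error records where it occurred (loc) and which rule failed (type).
    errors = exc.errors()
    for err in errors:
        print(err["loc"], err["type"], err["msg"])
```

Catching `ValidationError` like this gives a natural place to log the problem or re-prompt the model instead of letting malformed data flow downstream.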

In [5]:
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field

class Book(BaseModel):
    title: str
    author: str
    # The year must be greater than 1000 and earlier than the current year.
    year: int = Field(gt=1000, lt=datetime.now().year)

class Library(BaseModel):
    name: str
    books: List[Book]


library = Library.model_validate_json(response.text)
library

Library(name='Coolu Libraru', books=[Book(title='The Great Gatsby', author='F. Scott Fitzgerald', year=1925), Book(title='To Kill a Mockingbird', author='Harper Lee', year=1960), Book(title='1984', author='George Orwell', year=1949), Book(title='Pride and Prejudice', author='Jane Austen', year=1813), Book(title='The Catcher in the Rye', author='J.D. Salinger', year=1951), Book(title='Moby Dick', author='Herman Melville', year=1851), Book(title='War and Peace', author='Leo Tolstoy', year=1869), Book(title='The Hobbit', author='J.R.R. Tolkien', year=1937), Book(title='Brave New World', author='Aldous Huxley', year=1932), Book(title='Crime and Punishment', author='Fyodor Dostoevsky', year=1866)])
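The same models can also describe the expected shape to the LLM. As a sketch, `model_json_schema()` produces a JSON Schema that could be embedded in the prompt in place of a prose description of the fields. The prompt wording below is an assumption made for illustration, and no API call is made here.

```python
import json
from datetime import datetime
from typing import List

from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str
    author: str
    year: int = Field(gt=1000, lt=datetime.now().year)


class Library(BaseModel):
    name: str
    books: List[Book]


# JSON Schema derived from the model, including the numeric bounds on year.
schema = Library.model_json_schema()

# Illustrative prompt text (an assumption, not the notebook's original prompt).
prompt = (
    "Create a JSON object representing a library with 10 books. "
    "Return only JSON that conforms to this JSON Schema:\n"
    + json.dumps(schema, indent=2)
)
print(prompt)
```

Deriving the prompt from the model keeps the request and the validation rules in one place, so they are less likely to drift apart.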

In [6]:
# __dict__ is a special attribute that exposes an object's attributes and their values as a dictionary
library.__dict__

{'name': 'Coolu Libraru',
 'books': [Book(title='The Great Gatsby', author='F. Scott Fitzgerald', year=1925),
  Book(title='To Kill a Mockingbird', author='Harper Lee', year=1960),
  Book(title='1984', author='George Orwell', year=1949),
  Book(title='Pride and Prejudice', author='Jane Austen', year=1813),
  Book(title='The Catcher in the Rye', author='J.D. Salinger', year=1951),
  Book(title='Moby Dick', author='Herman Melville', year=1851),
  Book(title='War and Peace', author='Leo Tolstoy', year=1869),
  Book(title='The Hobbit', author='J.R.R. Tolkien', year=1937),
  Book(title='Brave New World', author='Aldous Huxley', year=1932),
  Book(title='Crime and Punishment', author='Fyodor Dostoevsky', year=1866)]}

In [7]:
type(library)

In [8]:
isinstance(library, BaseModel)

True

In [16]:
# Access the library name
library.name

'Coolu Libraru'

In [17]:
# Access the list of books
library.books

[Book(title='The Great Gatsby', author='F. Scott Fitzgerald', year=1925),
 Book(title='To Kill a Mockingbird', author='Harper Lee', year=1960),
 Book(title='1984', author='George Orwell', year=1949),
 Book(title='Pride and Prejudice', author='Jane Austen', year=1813),
 Book(title='The Catcher in the Rye', author='J.D. Salinger', year=1951),
 Book(title='Moby Dick', author='Herman Melville', year=1851),
 Book(title='War and Peace', author='Leo Tolstoy', year=1869),
 Book(title='The Hobbit', author='J.R.R. Tolkien', year=1937),
 Book(title='Brave New World', author='Aldous Huxley', year=1932),
 Book(title='Crime and Punishment', author='Fyodor Dostoevsky', year=1866)]

Extract the titles into a list

In [18]:
# Extract data with a list comprehension
titles = [book.title for book in library.books]
titles

['The Great Gatsby',
 'To Kill a Mockingbird',
 '1984',
 'Pride and Prejudice',
 'The Catcher in the Rye',
 'Moby Dick',
 'War and Peace',
 'The Hobbit',
 'Brave New World',
 'Crime and Punishment']

Extract the titles and years of books published after a given year

In [12]:
newer_books = [(book.title, book.year) for book in library.books if book.year > 1950]
newer_books

[('To Kill a Mockingbird', 1960), ('The Catcher in the Rye', 1951)]

To get the data back out, use model_dump() or model_dump_json()
- **Serialization**: library.model_dump() converts the Pydantic object back into a Python dictionary, while library.model_dump_json() returns a JSON string that can be saved to a file

In [19]:
library.model_dump()

{'name': 'Coolu Libraru',
 'books': [{'title': 'The Great Gatsby',
   'author': 'F. Scott Fitzgerald',
   'year': 1925},
  {'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': 1960},
  {'title': '1984', 'author': 'George Orwell', 'year': 1949},
  {'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'year': 1813},
  {'title': 'The Catcher in the Rye', 'author': 'J.D. Salinger', 'year': 1951},
  {'title': 'Moby Dick', 'author': 'Herman Melville', 'year': 1851},
  {'title': 'War and Peace', 'author': 'Leo Tolstoy', 'year': 1869},
  {'title': 'The Hobbit', 'author': 'J.R.R. Tolkien', 'year': 1937},
  {'title': 'Brave New World', 'author': 'Aldous Huxley', 'year': 1932},
  {'title': 'Crime and Punishment',
   'author': 'Fyodor Dostoevsky',
   'year': 1866}]}

In [20]:
library.model_dump_json()

'{"name":"Coolu Libraru","books":[{"title":"The Great Gatsby","author":"F. Scott Fitzgerald","year":1925},{"title":"To Kill a Mockingbird","author":"Harper Lee","year":1960},{"title":"1984","author":"George Orwell","year":1949},{"title":"Pride and Prejudice","author":"Jane Austen","year":1813},{"title":"The Catcher in the Rye","author":"J.D. Salinger","year":1951},{"title":"Moby Dick","author":"Herman Melville","year":1851},{"title":"War and Peace","author":"Leo Tolstoy","year":1869},{"title":"The Hobbit","author":"J.R.R. Tolkien","year":1937},{"title":"Brave New World","author":"Aldous Huxley","year":1932},{"title":"Crime and Punishment","author":"Fyodor Dostoevsky","year":1866}]}'

Write the JSON to a file

In [14]:
with open("library.json", "w") as json_file:
    json_file.write(library.model_dump_json())
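A saved file can be loaded back through the same model, which re-validates the data on the way in. The round trip below is a minimal sketch: it uses simplified models without the year bounds, a single hand-written book, and a temporary directory so that it does not touch the notebook's library.json.

```python
import tempfile
from pathlib import Path
from typing import List

from pydantic import BaseModel


# Simplified stand-ins for the notebook's models (no Field constraints).
class Book(BaseModel):
    title: str
    author: str
    year: int


class Library(BaseModel):
    name: str
    books: List[Book]


original = Library(
    name="Coolu Libraru",
    books=[Book(title="1984", author="George Orwell", year=1949)],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "library.json"
    # Serialize to a JSON string and write it out.
    path.write_text(original.model_dump_json(), encoding="utf-8")
    # Read the file back and deserialize it into a validated Library object.
    restored = Library.model_validate_json(path.read_text(encoding="utf-8"))

print(restored == original)
```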

Create a pandas DataFrame
- DataFrame creation: build a pandas DataFrame from the extracted data, a common way to put structured data to use in data analysis

In [15]:
import pandas as pd

titles = [book.title for book in library.books]
years = [book.year for book in library.books]
authors = [book.author for book in library.books]

pd.DataFrame({"title": titles, "year": years, "author": authors})

| | title | year | author |
|---|---|---|---|
| 0 | The Great Gatsby | 1925 | F. Scott Fitzgerald |
| 1 | To Kill a Mockingbird | 1960 | Harper Lee |
| 2 | 1984 | 1949 | George Orwell |
| 3 | Pride and Prejudice | 1813 | Jane Austen |
| 4 | The Catcher in the Rye | 1951 | J.D. Salinger |
| 5 | Moby Dick | 1851 | Herman Melville |
| 6 | War and Peace | 1869 | Leo Tolstoy |
| 7 | The Hobbit | 1937 | J.R.R. Tolkien |
| 8 | Brave New World | 1932 | Aldous Huxley |
| 9 | Crime and Punishment | 1866 | Fyodor Dostoevsky |
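As an alternative sketch, the DataFrame can be built from `model_dump()` output directly, without assembling one list per column. The models and the two sample books below are simplified stand-ins for the notebook's data; note that the column order follows the field order of the model (title, author, year).

```python
from typing import List

import pandas as pd
from pydantic import BaseModel


# Simplified stand-ins for the notebook's models (no Field constraints).
class Book(BaseModel):
    title: str
    author: str
    year: int


class Library(BaseModel):
    name: str
    books: List[Book]


library = Library(
    name="Coolu Libraru",
    books=[
        Book(title="To Kill a Mockingbird", author="Harper Lee", year=1960),
        Book(title="1984", author="George Orwell", year=1949),
    ],
)

# One dictionary per book becomes one row; the keys become the columns.
df = pd.DataFrame([book.model_dump() for book in library.books])

# The same "published after 1950" filter, expressed as a DataFrame query.
print(df[df["year"] > 1950])
```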
