
# **[Groq API](https://wikidocs.net/259655)**

- https://console.groq.com/playground
- https://python.langchain.com/docs/how_to/sequence/ **(LangChain tutorial)**
- https://wikidocs.net/book/14314 **(Korean-language LangChain tutorial)**

In [None]:
from google.colab import userdata
groq_key = userdata.get('groq')

In [None]:
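%%capture
# langchain-groq: the package that connects LangChain to the Groq API.
# Groq is an AI company offering very fast LLM inference; it can run models such as LLaMA, Mixtral and Gemma quickly.
# With this package, Groq-hosted LLMs can be used directly through LangChain.
# (%%capture must be the first line of the cell, otherwise IPython rejects the cell magic.)
!pip install langchain-groq --quiet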

In [None]:
from langchain_groq import ChatGroq

# ChatGroq 모델 초기화
llm = ChatGroq(
    model="gemma2-9b-it", # google/gemma-2-9b-it (Groq retires model IDs over time; swap in a current one if this errors)
    temperature=0.7,
    max_tokens=300,
    api_key=groq_key
)

In [None]:
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7d7af4d12d50>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7d7af50a3f10>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=300)

In [None]:
llm.invoke("안녕하세요?").content  # predict() is deprecated; invoke() returns a message, .content holds the text


'안녕하세요! 👋  무엇을 도와드릴까요? 😊\n'

In [None]:
# Define the prompt template
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "당신은 친절하고 유익한 AI 조수입니다. 한국의 역사와 문화에 대해 잘 알고 있습니다."),
    ("human", "{question}")
])

prompt

ChatPromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='당신은 친절하고 유익한 AI 조수입니다. 한국의 역사와 문화에 대해 잘 알고 있습니다.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], input_types={}, partial_variables={}, template='{question}'), additional_kwargs={})])
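To make the template's job concrete, here is a toy, stdlib-only sketch of what a chat prompt template does at invoke time: it fills each `{placeholder}` in every `(role, template)` pair. The helper `fill_chat_template` is hypothetical and returns plain dicts; LangChain's own `prompt.format_messages(question=...)` returns message objects (`SystemMessage`, `HumanMessage`) instead.

```python
# Toy illustration of a chat prompt template: fill {placeholders} in each
# (role, template) pair. Not LangChain's implementation.
from typing import Dict, List, Tuple

def fill_chat_template(templates: List[Tuple[str, str]], **variables: str) -> List[Dict[str, str]]:
    """Return role/content dicts with every {placeholder} filled from `variables`."""
    # str.format raises KeyError if a placeholder has no matching variable
    return [{"role": role, "content": template.format(**variables)}
            for role, template in templates]

chat_templates = [
    ("system", "You are a friendly, helpful AI assistant who knows Korean history and culture."),
    ("human", "{question}"),
]
messages = fill_chat_template(chat_templates, question="What principles underlie the creation of Hangul?")
for m in messages:
    print(m["role"], "->", m["content"])
```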

In [None]:
# Build the chain: the formatted prompt is piped into the LLM
chain = prompt | llm
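The `|` operator composes steps left to right, so each step's output becomes the next step's input. The toy `Step` class below is a hypothetical stand-in, not LangChain's `Runnable` (which also handles batching, streaming and async); it only illustrates the composition idea with fake prompt, model and parser steps, so it runs without an API key.

```python
# Toy version of pipe-style composition: `a | b` builds a new step that
# feeds a's output into b. Not LangChain's Runnable class.
from typing import Any, Callable

class Step:
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: run self, then pass its result to other
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)

# Hypothetical stand-ins for prompt, model and parser
fake_prompt = Step(lambda d: f"Q: {d['question']}")
fake_llm = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda msg: msg["content"])

toy_chain = fake_prompt | fake_llm | fake_parser
print(toy_chain.invoke({"question": "hello"}))  # Q: HELLO
```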

In [None]:
# List of questions
questions = [
    "한글의 창제 원리는 무엇인가요?",
    "김치의 역사와 문화적 중요성에 대해 설명해주세요.",
    "조선시대의 과거 제도에 대해 간단히 설명해주세요."
]

# Generate an answer for each question
for question in questions:
    response = chain.invoke({"question": question})
    print(f"질문: {question}")
    print(f"답변: {response.content}\n") # Use response.content to access the text
    print("*" * 150)

질문: 한글의 창제 원리는 무엇인가요?
답변: 한글의 창제 원리는 **자연의 소리를 따라 글자를 만들어내는 것**에 있습니다. 

세종대왕이 백성들이 쉽게 배우고 쓰기 쉽도록 만들고자 했던 것입니다.  

좀 더 자세히 설명드리자면:

* **자음**: 자음은 입술, 혀, 기침, 찌르는 힘 등 **음성을 내는 기관의 움직임**을 본떠 만들어졌습니다. 
* **모음**: 모음은 **음성을 낼 때 입술과 혀가 만드는 모양**을 본떠 만들어졌습니다.

이러한 자연의 원리를 바탕으로 한글은 매우 체계적이고 논리적인 글자 체계를 가지고 있습니다.  한글의 창제 원리는 단순하지만 매우 혁신적이었으며, 세계적으로도 칭찬받는 사례입니다. 


궁금한 점이 있으시면 언제든지 물어보세요! 😊


******************************************************************************************************************************************************
질문: 김치의 역사와 문화적 중요성에 대해 설명해주세요.
답변: ## 김치: 한국의 맛과 문화를 담은 전통 음식

김치는 단순한 음식을 넘어 한국인의 삶과 문화를 대변하는 중요한 요소입니다. 

**역사**:

* **기원:** 김치의 역사는 정확히 언급되지는 않았지만, 기원전 300년경부터 발달되었다는 추측이 있습니다. 초기에는 염장, 발효 등을 이용하여 음식을 보관하는 방식이었고, 곧 맛의 변화와 영양적 가치를 높이는 과정으로 발전했습니다.
* **고려 시대:** 김치가 보다 다양하게 만들어지기 시작했으며,  "김치"라는 명칭이 처음 등장합니다. 
* **조선 시대:** 김치는 궁중 음식부터 백성들의 식탁까지 널리 섭취되기 시작했습니다. 다양한 재료와 레시피가 개발되면서 한국의 지역 특성을 반영하는 김치 문화가 형성되었습니다.

**문화적 중요성**:

* **음식 문화:** 김치는 한국인의 식탁에서 빠질 수 없는 필
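The second answer above stops mid-word, most likely because `max_tokens=300` caps the reply length. A minimal check, assuming the response metadata carries a `finish_reason` key set to `"length"` on truncation (the OpenAI-compatible convention Groq's API follows); inspect a real `response.response_metadata` before relying on it.

```python
# Minimal sketch: flag replies that stopped because they hit the token limit.
# Assumes a "finish_reason" key whose value is "length" on truncation.
from typing import Any, Mapping

def is_truncated(metadata: Mapping[str, Any]) -> bool:
    """Return True when the model stopped because of max_tokens."""
    return metadata.get("finish_reason") == "length"

print(is_truncated({"finish_reason": "length"}))  # True
print(is_truncated({"finish_reason": "stop"}))    # False
```

In the question loop this could be used as `if is_truncated(response.response_metadata): ...` to warn, or to retry with a larger `max_tokens`.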

### **Building a UI**

In [None]:
%%capture
# prompt: build this with gradio
!pip install gradio --quiet

### **gradio**

In [None]:
from google.colab import userdata
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
import gradio as gr

groq_key = userdata.get('groq')

# ChatGroq 모델 초기화
llm = ChatGroq(
    model="gemma2-9b-it",
    temperature=0.7,
    max_tokens=300,
    api_key=groq_key
)

# 프롬프트 템플릿 정의
prompt = ChatPromptTemplate.from_messages([
    ("system", "당신은 친절하고 유익한 AI 조수입니다. 한국의 역사와 문화에 대해 잘 알고 있습니다."),
    ("human", "{question}")
])

# Chain 생성
chain = prompt | llm

def predict(message):
    response = chain.invoke({"question": message})
    return response.content

iface = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="text",
    title="Korean History & Culture Q&A",
    description="Ask me anything about Korean history and culture!",
)

iface.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a2bff48681be50660a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
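Because `predict` only needs something with an `.invoke()` method, it can be built from the chain and tested offline before launching Gradio. A sketch, where `make_predict` and `EchoChain` are hypothetical names and the empty-input guard is an addition not present in the original cell:

```python
# Sketch: build the Gradio callback from any object with .invoke(), so it can
# be tested offline with a stand-in instead of the Groq-backed chain.
from types import SimpleNamespace
from typing import Any, Callable

def make_predict(chain: Any) -> Callable[[str], str]:
    def predict_fn(message: str) -> str:
        # Added guard: avoid an API call for blank input
        if not message.strip():
            return "Please enter a question."
        return chain.invoke({"question": message}).content
    return predict_fn

class EchoChain:
    """Hypothetical stand-in: echoes the question back as .content."""
    def invoke(self, inputs: dict) -> SimpleNamespace:
        return SimpleNamespace(content=f"echo: {inputs['question']}")

predict_fn = make_predict(EchoChain())
print(predict_fn("What is kimchi?"))  # echo: What is kimchi?
print(predict_fn("   "))              # Please enter a question.
```

With the real chain this would be wired up as `gr.Interface(fn=make_predict(chain), ...)`.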




In [None]:
from langchain_core.prompts import PromptTemplate

# Define the template. {country} is a variable: a placeholder filled in later
template = "{country}의 수도는 어디인가요?"

# Create a PromptTemplate object with the from_template method
prompt_template = PromptTemplate.from_template(template)

# Render the prompt as a plain string with format(), filling in the variable
prompt_string = prompt_template.format(country="대한민국")  # "대한민국의 수도는 어디인가요?"

# Build the chain from the PromptTemplate object (not the formatted string)
chain = prompt_template | llm

In [None]:
chain

PromptTemplate(input_variables=['country'], input_types={}, partial_variables={}, template='{country}의 수도는 어디인가요?')
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x7985d3d710d0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x7985d45ae090>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=300)

In [None]:
# With a single input variable, a bare string is substituted for {country} automatically
chain.invoke("대한민국").content

'대한민국의 수도는 **서울**입니다. 😊  \n'

In [None]:
chain.invoke("프랑스").content

'프랑스의 수도는 **파리**입니다. 🇫🇷  \n'
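The `input_variables=['country']` shown above, and the fact that `chain.invoke("프랑스")` works without a dict, can be illustrated with a stdlib sketch: collect the `{...}` field names from the template and, when there is exactly one, accept a bare value in place of a dict. The helpers `input_variables` and `fill` are hypothetical approximations, not LangChain's code.

```python
# Sketch: discover a template's variables and map a bare value onto the
# single variable. Approximates the behaviour above; not LangChain's code.
from string import Formatter
from typing import Any, List

def input_variables(template: str) -> List[str]:
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion)
    return sorted({field for _, field, _, _ in Formatter().parse(template) if field})

def fill(template: str, value: Any) -> str:
    names = input_variables(template)
    if not isinstance(value, dict):
        if len(names) != 1:
            raise TypeError("A bare value only works for single-variable templates")
        value = {names[0]: value}
    return template.format(**value)

tmpl = "What is the capital of {country}?"
print(input_variables(tmpl))  # ['country']
print(fill(tmpl, "France"))   # What is the capital of France?
```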

# **Output Parsers**
### **In LangChain, an output parser is a key component that converts a language model's (LLM's) output into a more useful, structured form**
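At its simplest, an output parser is a function from the model's raw text to a Python value. A stdlib sketch in the spirit of `CommaSeparatedListOutputParser` (one of the classes in the `dir()` listing at the end of this notebook); `parse_comma_list` is a hypothetical name, and the real class's handling of edge cases may differ:

```python
# Minimal output-parser sketch: raw LLM text in, structured Python value out.
from typing import List

def parse_comma_list(text: str) -> List[str]:
    """Split model text like 'a, b, c' into ['a', 'b', 'c'], dropping empty items."""
    return [item.strip() for item in text.strip().split(",") if item.strip()]

raw = "Gyeongbokgung, Changdeokgung, Deoksugung\n"
print(parse_comma_list(raw))  # ['Gyeongbokgung', 'Changdeokgung', 'Deoksugung']
```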

## **1. Few-Shot Prompt**

In [None]:
from google.colab import userdata
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
import gradio as gr

# Retrieve Groq API key
groq_key = userdata.get('groq')

# Initialize ChatGroq model
llm = ChatGroq(
    model="gemma2-9b-it",
    temperature=0.7,
    max_tokens=300,
    api_key=groq_key
)

# Few-Shot Prompting Template for Korean Historical Figures
few_shot_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert in providing concise, informative descriptions of Korean historical figures.
    Always provide a response in the following format:
    - Name: [Full Name]
    - Era: [Historical Period]
    - Key Achievements: [3-4 most significant contributions]
    - Impact: [Lasting influence on Korean history]"""),

    # Few-shot examples to guide the model's response
    ("human", "Tell me about King Sejong"),
    ("ai", """- Name: 세종대왕 (King Sejong the Great)
            - Era: Joseon Dynasty (1418-1450)
            - Key Achievements:
            1. Created Hangul (Korean alphabet)
            2. Advanced scientific and cultural development
            3. Expanded agricultural techniques
            4. Promoted education and scholarship
            - Impact: Considered one of the most important monarchs in Korean history, revolutionized communication and cultural understanding"""),

    ("human", "Tell me about Admiral Yi Sun-sin"),
    ("ai", """- Name: 이순신 (Admiral Yi Sun-sin)
            - Era: Joseon Dynasty (Late 16th century)
            - Key Achievements:
            1. Defended Korea against Japanese invasions
            2. Rebuilt and deployed the Turtle Ship (Geobukseon)
            3. Won 23 consecutive naval battles
            4. Exemplified military strategy and leadership
            - Impact: National hero who prevented Japanese conquest and saved Korea during the Imjin War"""),

    # The actual query will be added dynamically
    ("human", "{historical_figure}")
])

# Create the chain
chain = few_shot_prompt | llm

# Gradio interface function
def predict(historical_figure):
    response = chain.invoke({"historical_figure": historical_figure})
    return response.content

predict('경복궁')

'- Name: 경복궁 (Gyeongbokgung)\n- Era: Joseon Dynasty (Built in 1395)\n- Key Achievements:\n    1.  Largest and most magnificent royal palace in Seoul\n    2. Served as the main residence of Joseon Dynasty kings\n    3.  Home to numerous architectural marvels and cultural treasures\n    4.  Symbol of Korean history and traditional aesthetics\n- Impact:  Iconic landmark and UNESCO World Heritage site, representing the grandeur and cultural heritage of Korea.  A popular tourist destination and important site for historical and cultural study. \n\n\n'
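The few-shot messages above are hard-coded tuples. A sketch of assembling the same `(role, text)` list from example pairs, so examples can be added or swapped without editing the template; `build_few_shot` is a hypothetical helper, and the example answers are shortened versions of the ones above:

```python
# Sketch: assemble few-shot (role, text) messages from example pairs.
from typing import List, Tuple

def build_few_shot(system: str, examples: List[Tuple[str, str]],
                   final_slot: str = "{historical_figure}") -> List[Tuple[str, str]]:
    messages = [("system", system)]
    for question, answer in examples:
        messages.append(("human", question))  # example input
        messages.append(("ai", answer))       # example of the desired output format
    messages.append(("human", final_slot))    # the real query is filled in at invoke time
    return messages

examples = [
    ("Tell me about King Sejong", "- Name: 세종대왕 (King Sejong the Great)\n- Era: Joseon Dynasty (1418-1450)"),
    ("Tell me about Admiral Yi Sun-sin", "- Name: 이순신 (Admiral Yi Sun-sin)\n- Era: Joseon Dynasty (Late 16th century)"),
]
few_shot_messages = build_few_shot("Answer in the Name/Era format.", examples)
print([role for role, _ in few_shot_messages])
# ['system', 'human', 'ai', 'human', 'ai', 'human']
```

The result could be passed to `ChatPromptTemplate.from_messages(few_shot_messages)`. If an example answer contains literal curly braces, they would need to be doubled (`{{` and `}}`) so the template does not treat them as variables.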

In [None]:
# Create Gradio interface
iface = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=1, placeholder="Enter a Korean historical figure's name..."),
    outputs="text",
    title="Korean Historical Figures Insights",
    description="Get structured information about significant people in Korean history"
)

# Launch the interface
iface.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://c5d08cdafdedafcec1.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## **2. PydanticOutputParser**
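- **PydanticOutputParser is a class that helps convert a language model's output into more structured information**
- **Instead of a plain-text response, it delivers the information the user needs in a clear, well-defined structure**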
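Conceptually, the parser reads the model's reply as JSON and then validates it against the model class, failing if a field is missing or has the wrong type. A stdlib-only sketch of that idea, with a dataclass (`FigureRecord`, hypothetical) standing in for the Pydantic model. Unlike Pydantic, which by default coerces some values (for example the string `"1545"` to the integer `1545`), this sketch is strict:

```python
# Stdlib sketch of "parse, then validate against a schema".
# A dataclass stands in for the Pydantic model; checks are deliberately strict.
import json
from dataclasses import dataclass, fields
from typing import List, get_origin, get_type_hints

@dataclass
class FigureRecord:
    name: str
    birth_year: int
    death_year: int
    key_achievements: List[str]

def parse_and_validate(text: str) -> FigureRecord:
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    hints = get_type_hints(FigureRecord)
    for f in fields(FigureRecord):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        expected = get_origin(hints[f.name]) or hints[f.name]  # List[str] -> list
        if not isinstance(data[f.name], expected):
            raise ValueError(f"{f.name} should be {expected.__name__}")
    return FigureRecord(**{f.name: data[f.name] for f in fields(FigureRecord)})

figure_reply = ('{"name": "Yi Sun-sin", "birth_year": 1545, "death_year": 1598, '
                '"key_achievements": ["Battle of Myeongnyang"]}')
print(parse_and_validate(figure_reply).birth_year)  # 1545
```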

In [None]:
from typing import List
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from langchain_groq import ChatGroq
from google.colab import userdata
import gradio as gr

# Retrieve Groq API key
groq_key = userdata.get('groq')

# Define a Pydantic model for structured historical figure information
class HistoricalFigure(BaseModel):
    name: str = Field(description="Full name of the historical figure")
    korean_name: str = Field(description="Name in Korean characters")
    birth_year: int = Field(description="Year of birth")
    death_year: int = Field(description="Year of death")
    era: str = Field(description="Historical period")
    key_achievements: List[str] = Field(description="3-4 most significant contributions")
    impact: str = Field(description="Lasting influence on Korean history")
    interesting_fact: str = Field(description="A unique or surprising detail about the figure")

# Initialize ChatGroq model
llm = ChatGroq(
    model="gemma2-9b-it",
    temperature=0.7,
    max_tokens=500,
    api_key=groq_key
)

# Create a PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=HistoricalFigure)

# Create a prompt template that includes output instructions
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert historian specializing in Korean history.
    Provide detailed, accurate information about historical figures.

    Only respond with information about **Korean historical people**, not places or buildings.

    {format_instructions}

    Please provide comprehensive information about the requested historical figure."""),
    ("human", "{historical_figure}")
])



# Combine the prompt, parsing instructions, and model
chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

# Gradio interface function
def get_historical_figure_info(figure_name):
    try:
        result = chain.invoke({"historical_figure": figure_name})
        # Convert Pydantic model to a formatted string
        return "\n".join([
            f"**Name:** {result.name} ({result.korean_name})",
            f"**Lived:** {result.birth_year} - {result.death_year}",
            f"**Era:** {result.era}",
            "**Key Achievements:**",
            *[f"- {achievement}" for achievement in result.key_achievements],
            f"\n**Historical Impact:** {result.impact}",
            f"\n**Interesting Fact:** {result.interesting_fact}"
        ])
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Create Gradio interface
iface = gr.Interface(
    fn=get_historical_figure_info,
    inputs=gr.Textbox(lines=1, placeholder="Enter a Korean historical figure's name..."),
    outputs="markdown",
    title="Structured Korean Historical Figures",
    description="Get detailed, structured information about significant people in Korean history"
)

# Launch the interface
iface.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1ff05f3fe5f8e84ed4.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## **3. JsonOutputParser**
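- **This output parser lets you specify the JSON schema you want and has the LLM produce results that conform to that schema (in the example below, the schema is described in the system prompt)**
- **Note that for the LLM to generate the desired JSON accurately and efficiently, the model needs sufficient capacity (here meaning intelligence; e.g., llama-70B has more capacity than llama-8B)**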
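Smaller models often wrap their JSON in a Markdown code fence or surround it with extra sentences. A stdlib sketch of a tolerant extraction step that prefers the fenced content and otherwise falls back to the outermost `{...}` span before calling `json.loads`; `extract_json` is a hypothetical helper, and `JsonOutputParser` has its own, more complete handling:

```python
# Tolerant JSON extraction: strip a json code fence or surrounding chatter,
# then parse. A sketch, not LangChain's implementation.
import json
import re
from typing import Any

FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)

def extract_json(text: str) -> Any:
    match = FENCE.search(text)
    if match:
        candidate = match.group(1)
    else:
        # Fall back to the outermost {...} span
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("no JSON object found")
        candidate = text[start:end + 1]
    return json.loads(candidate)

json_reply = 'Sure! Here is the data:\n```json\n{"name": "Kim Gu", "birth_year": 1876}\n```'
print(extract_json(json_reply))  # {'name': 'Kim Gu', 'birth_year': 1876}
```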

In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_groq import ChatGroq
from google.colab import userdata
import gradio as gr
import json

# Retrieve Groq API key
groq_key = userdata.get('groq')

# Initialize ChatGroq model
llm = ChatGroq(
    model="gemma2-9b-it",
    temperature=0.7,
    max_tokens=500,
    api_key=groq_key
)

# Create a JsonOutputParser
parser = JsonOutputParser()

# Create a prompt template that includes JSON output instructions
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert historian specializing in Korean history.
    Provide detailed, accurate information about historical figures in a strict JSON format.

    Respond with a JSON object containing the following keys:
    - name: Full name of the historical figure
    - korean_name: Name in Korean characters
    - birth_year: Year of birth (integer)
    - death_year: Year of death (integer)
    - era: Historical period
    - key_achievements: List of most significant contributions
    - impact: Lasting influence on Korean history
    - interesting_fact: A unique or surprising detail about the figure

    {format_instructions}"""),
    ("human", "Tell me about {historical_figure}")
])

# Combine the prompt, parsing instructions, and model
chain = prompt.partial(format_instructions=parser.get_format_instructions()) | llm | parser

# Gradio interface function
def get_historical_figure_info(figure_name):
    try:
        result = chain.invoke({"historical_figure": figure_name})

        # Format the JSON result as a readable markdown string
        formatted_output = "\n".join([
            f"**Name:** {result.get('name', 'N/A')} ({result.get('korean_name', 'N/A')})",
            f"**Lived:** {result.get('birth_year', 'N/A')} - {result.get('death_year', 'N/A')}",
            f"**Era:** {result.get('era', 'N/A')}",
            "**Key Achievements:**",
            *[f"- {achievement}" for achievement in result.get('key_achievements', [])],
            f"\n**Historical Impact:** {result.get('impact', 'N/A')}",
            f"\n**Interesting Fact:** {result.get('interesting_fact', 'N/A')}"
        ])

        # Also return the raw JSON for reference
        return formatted_output + f"\n\n**Raw JSON:**\n```json\n{json.dumps(result, indent=2, ensure_ascii=False)}\n```"  # closing fence on its own line
    except Exception as e:
        return f"An error occurred: {str(e)}"

# Create Gradio interface
iface = gr.Interface(
    fn=get_historical_figure_info,
    inputs=gr.Textbox(lines=1, placeholder="Enter a Korean historical figure's name..."),
    outputs="markdown",
    title="Structured Korean Historical Figures (JSON)",
    description="Get detailed, structured JSON information about significant people in Korean history"
)

# Launch the interface
iface.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7d150e5c9acfc512cf.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## **4. PandasDataFrameOutputParser**
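- **A pandas DataFrame is a data structure widely used in Python for data manipulation and analysis. It provides a comprehensive toolset for working with structured data and is used for tasks such as data cleaning, transformation, and analysis**
- **Note: the cell below does not call an LLM or a LangChain parser; it builds the DataFrame from hard-coded sample records to show how tabular output is displayed in Gradio**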

In [None]:
import pandas as pd
import gradio as gr

# Define a function that builds a DataFrame from sample data
def get_historical_figures(figure_name=None):
    # Sample data about Korean historical figures
    figures_list = [
        {
            "name": "Yi Sun-sin",
            "korean_name": "이순신",
            "birth_year": 1545,
            "death_year": 1598,
            "era": "Joseon Dynasty",
            "primary_role": "Admiral",
            "key_achievement": "Defeated Japanese Navy during the Imjin War",
            "historical_significance": "National hero known for his naval victories against Japan"
        },
        {
            "name": "Sejong the Great",
            "korean_name": "세종대왕",
            "birth_year": 1397,
            "death_year": 1450,
            "era": "Joseon Dynasty",
            "primary_role": "King of Joseon",
            "key_achievement": "Created the Korean alphabet Hangul",
            "historical_significance": "One of the most respected kings, greatly improved Korean culture and literacy"
        },
        {
            "name": "Kim Gu",
            "korean_name": "김구",
            "birth_year": 1876,
            "death_year": 1949,
            "era": "Korean Empire / Japanese Occupation",
            "primary_role": "Politician",
            "key_achievement": "Leader of the Korean independence movement",
            "historical_significance": "Major figure in the movement for Korean independence from Japan"
        }
    ]

    # Convert the records to a pandas DataFrame
    df = pd.DataFrame(figures_list)

    # If a specific name was given, filter by Korean or English name
    if figure_name:
        df = df[(df['korean_name'] == figure_name) | (df['name'] == figure_name)]

    # If nothing matched, return a DataFrame holding a message
    if df.empty:
        return pd.DataFrame([{"Message": f"No information found for {figure_name}"}])

    return df

# Define the Gradio interface
iface = gr.Interface(
    fn=get_historical_figures,
    inputs="text",  # Text input for a specific historical figure's name
    outputs="dataframe",  # Display the result as a DataFrame
    title="Structured Korean Historical Figures (Pandas DataFrame)",
    description="Get detailed, structured information about significant people in Korean history"
)

# Launch the interface
iface.launch()


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://077dc05f4c18c5aa97.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
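The filter in `get_historical_figures` matches names exactly, so an input like `yi sun-sin` or a name with a trailing space falls through to the "No information found" row. A sketch of a more forgiving lookup over the same kind of list-of-dicts records; `find_figures` and its normalization rule (case-folding, dropping spaces and hyphens) are assumptions, not part of the original notebook:

```python
# Forgiving lookup over list-of-dict records before building a DataFrame.
from typing import Dict, List

def normalize(s: str) -> str:
    # Case-fold and drop spaces/hyphens so "Yi Sun-sin" matches "yi sunsin"
    return "".join(ch for ch in s.casefold() if ch not in " -")

def find_figures(records: List[Dict], query: str) -> List[Dict]:
    q = normalize(query)
    if not q:
        return records  # empty query: return everything, like the original
    return [r for r in records if q in (normalize(r["name"]), normalize(r["korean_name"]))]

sample_records = [
    {"name": "Yi Sun-sin", "korean_name": "이순신"},
    {"name": "Sejong the Great", "korean_name": "세종대왕"},
]
print(find_figures(sample_records, "  yi sun-sin "))  # [{'name': 'Yi Sun-sin', 'korean_name': '이순신'}]
```

Inside `get_historical_figures`, the result could be turned into a table with `pd.DataFrame(find_figures(figures_list, figure_name or ""))`.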




In [None]:
import langchain_core.output_parsers as parsers
print(dir(parsers))

['BaseCumulativeTransformOutputParser', 'BaseGenerationOutputParser', 'BaseLLMOutputParser', 'BaseOutputParser', 'BaseTransformOutputParser', 'CommaSeparatedListOutputParser', 'JsonOutputKeyToolsParser', 'JsonOutputParser', 'JsonOutputToolsParser', 'ListOutputParser', 'MarkdownListOutputParser', 'NumberedListOutputParser', 'PydanticOutputParser', 'PydanticToolsParser', 'SimpleJsonOutputParser', 'StrOutputParser', 'XMLOutputParser', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'base', 'format_instructions', 'json', 'list', 'openai_tools', 'pydantic', 'string', 'transform', 'xml']
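The `dir()` listing above mixes parser classes with submodules and dunder names. A small sketch that keeps only the public names ending in `Parser`; `parser_names` is a hypothetical helper, demonstrated here on a stand-in module so it runs without LangChain installed:

```python
# Keep only public parser class names from a module's dir() listing.
import types
from typing import List

def parser_names(module: types.ModuleType) -> List[str]:
    return sorted(n for n in dir(module) if n.endswith("Parser") and not n.startswith("_"))

# Offline demo with a stand-in module (hypothetical contents)
demo = types.ModuleType("demo_parsers")
demo.StrOutputParser = object
demo.JsonOutputParser = object
demo.json = None  # a submodule-like name that should be filtered out
print(parser_names(demo))  # ['JsonOutputParser', 'StrOutputParser']
```

With LangChain installed, `parser_names(parsers)` would apply the same filter to the module imported above. Note that `PandasDataFrameOutputParser` does not appear in this `langchain_core` listing.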


In [None]:
# from langchain.chat_models import init_chat_model

# # Now you can initialize the model with the retrieved API key
# model = init_chat_model("llama3-8b-8192", model_provider="groq", api_key=groq_key)