In [4]:
from openai import OpenAI
import httpx
client = OpenAI(
    base_url="https://openai-proxy.jhyun.net/v1",
    http_client=httpx.Client()
)

response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Tell me about this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        },
    ],
)
print(response.choices[0].message)


ChatCompletionMessage(content='The image depicts a peaceful outdoor scene featuring a narrow wooden pathway that runs through a lush, green grassy field. The sky above is bright with scattered clouds, suggesting a sunny day with pleasant weather. In the distance, there are trees and shrubs, adding to the natural and serene environment. The overall atmosphere is calm and inviting, ideal for outdoor walks or enjoying nature.', role='assistant', function_call=None, tool_calls=None, refusal=None, annotations=[])


In [12]:
def query_image_description(image_url: str, prompt : str = "Tell me about this immage") -> str:
    import os
    client = OpenAI(
        base_url="https://openai-proxy.jhyun.net/v1",
        api_key=os.getenv("OPENAI_API_KEY"),
        http_client=httpx.Client(),
    )

    response = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url,
                        },
                   },
                ]
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content

In [13]:
image_url = "https://p6.itc.cn/q_70/images03/20200602/0c267a0d3d814c9783659eb956969ba1.jpeg"
content = query_image_description(image_url)
print(content)

This image is a humorous comparison between the speaker's feelings or characteristics at age 16 and after working. It features two cartoon-style dogs with different appearances and accompanying text in Chinese.

On the left side, labeled "16岁的我" (me at 16), the dog is depicted as a strong, muscular, and confident figure, representing youth and physical vitality. The text says:
- "我前途一片光明" (My future is bright)
- "身体素质高" (My physical fitness is good)
- "未来可期" (The future is promising)
- "八九点钟的太阳" (The sun at 8 or 9 o'clock) — implying a youthful, energetic period.

On the right side, labeled "工作后的我" (me after working), the dog looks tired, less vigorous, and somewhat slouched, reflecting fatigue or burnout from work. The text says:
- "好累好困" (Very tired, very sleepy)
- "好想睡懒觉" (Really want to sleep in)
- "重伤伤不要" (Heavy injuries, don't want to)
- "靠近我啊啊啊" (Come closer to me, ahh) — showing a desire for comfort or solitude.
- "我好弱啊" (I am so weak)
- "我就是普通人" (I'm just an ordinary person)



In [56]:
def query_image_description_from_local_file(image_path, prompt="Describe the image, and summarize the content in details."):
    import base64 
    import os
    client = OpenAI(
        base_url="https://openai-proxy.jhyun.net/v1",
        api_key=os.getenv("OPENAI_API_KEY"),
        http_client=httpx.Client(),
    )
    image_data = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
    mime_type = "image/jpeg"  # ← 注意这里根据实际文件类型设置
    response = client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:{mime_type};base64,{image_data}"
                        }
                    },
                ]
            }
        ],
        max_tokens=1000,
    )

    print(response.choices[0].message.content)

In [54]:
from IPython.display import display, Markdown
display(Markdown(query_image_description_from_local_file("./images/gdp_1980_2020.jpg")))


The image is a line chart titled "GDP Comparison from 1980 to 2020." It displays the gross domestic product (GDP) of four countries—USA, China, Japan, and Germany—over the years 1980 to 2020. 

- The USA's GDP (blue line) shows a steady increase, reaching over 21 trillion USD by 2020.
- China's GDP (red line) begins at a low level but experiences rapid growth after around 2005, surpassing Japan around 2010 and approaching 15 trillion USD by 2020.
- Japan's GDP (purple line) has fluctuations, peaking around 1995 and then gradually declining or leveling off.
- Germany's GDP (green line) remains relatively stable throughout the period, with slow growth, hovering around 4 trillion USD.

Overall, the chart illustrates the significant economic growth of China and the steady increase of the USA, compared to Japan and Germany's relatively stable trends.


<IPython.core.display.Markdown object>

In [57]:
content = query_image_description_from_local_file("./images/handwriting_1.jpg")
display(Markdown(content))

The image shows an open notebook with handwritten notes on both pages, primarily related to topics in machine learning, neural networks, and computer vision.

**Left Page:**
- The main heading appears to be related to **"Image and Tools"** or a similar topic.
- It discusses **1-IF Transformers** with subcategories including **Model**, **Data**, and **Benchmark**.
- The term **PEFT** (Parameter-Efficient Fine-Tuning) is extensively mentioned, including methods like **SOTA** (State Of The Art), **PEFT Methods**, and specific techniques such as **Prompt Tuning**.
- Under Prompt Tuning, there are notes on **Adaptaor (2019, Google)**, **Prefix (2021, Stanford)**, **Prompt (2021, Google)**, and **P-tuning V1 and V2 (2021, 2022)**.
- Additional notes mention **Soft Prompts**, **Hard Prompts**, and templates used.
- The bottom of the page includes some technical abbreviations and tools like **Gaojie**, **ChatGLM2**, **Bloom**, and **Alpaca**.

**Right Page:**
- The focus shifts towards **"Mult

<IPython.core.display.Markdown object>