## Hands-on summary of some newly released OpenAI features
#### 1.Image-generation
```text
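Models: DALL·E
Feature: text-to-image, done with the dall-e-3 model; results are good
Feature: image-to-image (variations), currently supported only by the dall-e-2 model; results are mediocre, and the output ends up as an odd hybrid that resembles nothing in particular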
```
#### 2.Vision (Use GPT-4 to understand images)
```text
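Model: gpt-4
Feature: image-to-text, done with the gpt-4-vision-preview model; results are good
Characteristics: accepts multiple images per request, each given either as a base64-encoded string or as an image URL. The model processes every image and uses information from all of them to answer the question.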
```
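The multi-image input described above can be sketched as a small payload builder. This is a minimal illustration, not code from the notebook: the helper names `image_part` and `build_vision_messages`, the placeholder URLs, and the assumption that local files are JPEGs are all made up for the example.

```python
import base64
from typing import Dict, List


def image_part(source: str) -> Dict:
    """Build one image_url content part from an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        url = source
    else:
        # Local file: embed it as a base64 data URL (JPEG is assumed here).
        with open(source, "rb") as f:
            encoded = base64.b64encode(f.read()).decode("utf-8")
        url = f"data:image/jpeg;base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}


def build_vision_messages(question: str, sources: List[str]) -> List[Dict]:
    """Return a single user message holding the question plus every image."""
    content = [{"type": "text", "text": question}]
    content.extend(image_part(s) for s in sources)
    return [{"role": "user", "content": content}]


messages = build_vision_messages(
    "What do these images have in common?",
    ["https://example.com/a.jpg", "https://example.com/b.jpg"],  # placeholder URLs
)
print(len(messages[0]["content"]))  # one text part plus two image parts
```

The resulting list has the same shape as the `messages` argument used in the Vision cells further down, so URL and base64 images can be mixed in one request.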
#### 3.Text to speech (Turn text into lifelike spoken audio)

```text
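Models: tts-1 or tts-1-hd
Feature: text-to-audio, done with the tts-1 model; results are good
Characteristics: six built-in voices to choose from (alloy, echo, fable, onyx, nova, shimmer); produces Chinese and English audio (other languages were not tested but should also work)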
```
#### 4.Speech to text (turn audio into text)
```text
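Model: whisper-1
Feature: audio-to-text, done with the whisper-1 model; results are acceptable
Characteristics: in these tests, which call the translation endpoint, the output is English text only, even when the input is in another language; speakers in a dialogue are not distinguished; the text itself is highly accurate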
```

In [2]:
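import os
import configparser

# This notebook targets the pre-1.0 openai Python SDK (openai.Image, openai.Audio, openai.ChatCompletion).
import openai
from openai import Audio, Image

# Load the API key and proxy settings from config.ini, two directories above the working directory.
conf = configparser.ConfigParser()
# Notebooks do not define __file__; the quoted '__file__' makes this resolve to the current working directory.
current_directory = os.path.dirname(os.path.realpath('__file__'))
config_file_path = os.path.join(current_directory, '..', '..', 'config.ini')
conf.read(config_file_path)
openai.api_key = conf.get("Openai", "api_key")

os.environ["HTTP_PROXY"] = conf.get("Proxy", "HTTP_PROXY")  # set your own proxy
os.environ["HTTPS_PROXY"] = conf.get("Proxy", "HTTPS_PROXY")
os.environ["OPENAI_API_KEY"] = conf.get("Openai", "api_key")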
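The setup cell expects a `config.ini` with an `Openai` section and a `Proxy` section. A minimal sketch of that layout, parsed from a string so it runs without a file on disk; the key value and proxy address are placeholders, and the names `SAMPLE_CONFIG` and `sample` are illustrative:

```python
import configparser

# Hypothetical config.ini contents; the key and proxy address are placeholders.
SAMPLE_CONFIG = """
[Openai]
api_key = sk-your-key-here

[Proxy]
HTTP_PROXY = http://127.0.0.1:7890
HTTPS_PROXY = http://127.0.0.1:7890
"""

sample = configparser.ConfigParser()
sample.read_string(SAMPLE_CONFIG)

# Section names are case-sensitive; option names are matched case-insensitively.
print(sample.get("Openai", "api_key"))
print(sample.get("Proxy", "HTTP_PROXY"))
```

Because section names are case-sensitive, the section must be spelled `Openai` exactly as the setup cell requests it.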


### 1.Image-generation
#### https://platform.openai.com/docs/guides/images/image-generation

In [9]:
from openai import Image

# Generate an image from text with the dall-e-3 model
image_model = "dall-e-3"
# image_prompt = "A lady is having an elegant afternoon tea in the courtyard"
image_prompt = """
性别：女性
形象特点：淡蓝色的长发，飘逸柔顺，给人以清新脱俗的感觉；淡蓝色的眼睛，明亮清澈，透露出坚定和智慧；白色的骑士裙装，优雅高贵，展现出勇敢正直的性格。
场景设定：宫殿背景，华丽的宫殿在虚化处理下更具神秘感，仿佛一座梦幻城堡；长泽野美身在其中，宛如一位尊贵的公主。"""
# image_size = "1024x1024"
# image_size = "1024x1792"
image_size = "1792x1024"
# Note: the prompt must not contain a person's name

In [10]:
response = Image.create(
    model=image_model,
    prompt=image_prompt,
    size=image_size,
    quality="standard",
    n=1,
)
print(response)

{
  "created": 1699614124,
  "data": [
    {
      "revised_prompt": "Gender: Female. Visual Features: Effortless, flowing, light blue hair that gives off a fresh and refined vibe; light blue eyes that are bright and clear, reflecting determination and wisdom; a white knight dress that is elegant and noble, showing a brave and honest personality. Scene Settings: A blurred palace background that is beautifully lavish, evoking a sense of mystery as if it were a dreamy castle; the woman with long, wild beauty stands among it, like a noble princess.",
      "url": "https://oaidalleapiprodscus.blob.core.windows.net/private/org-aWBgdOaBz6DGJz6gxbzY0Cgs/user-lpVODrzi1CTR9zsuJgaHC1zV/img-UxONH8mxhhRd9ABKmYnJnQ2d.png?st=2023-11-10T10%3A02%3A04Z&se=2023-11-10T12%3A02%3A04Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-10T07%3A37%3A56Z&ske=2023-11-11T07%3A37%3A56Z&sks=b&skv=2021-08-06&sig=%2B2Iq

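The `url` in the output above carries `st` and `se` query parameters that are two hours apart. Assuming these are the usual signed start and expiry fields of an Azure Blob Storage SAS URL, the link is temporary and the image should be downloaded promptly. A minimal sketch for reading the expiry; the helper name `sas_expiry` and the shortened sample URL are illustrative, not part of the original notebook:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(url: str) -> datetime:
    """Return the expiry time encoded in the `se` query parameter of a signed URL."""
    query = parse_qs(urlparse(url).query)
    # `se` looks like 2023-11-10T12:02:04Z once percent-decoding is applied.
    raw = query["se"][0]
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Shortened, hypothetical URL that keeps only the two timestamp parameters.
sample_url = (
    "https://example.blob.core.windows.net/private/img.png"
    "?st=2023-11-10T10%3A02%3A04Z&se=2023-11-10T12%3A02%3A04Z"
)
expiry = sas_expiry(sample_url)
print(expiry.isoformat())
```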
In [None]:
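# Image-to-image: generate variations of an existing image (dall-e-2 only)
with open("img.png", "rb") as image_file:
    response = Image.create_variation(
        image=image_file,
        n=2,
        size="1024x1024"
    )
print(response)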

In [None]:
# Image-to-image: variations of a second test image
with open("img_1.png", "rb") as image_file:
    response = Image.create_variation(
        image=image_file,
        n=2,
        size="1024x1024"
    )

image_url = response.data[0].url
print(image_url)


In [None]:
# Image-to-image: variations of a previously generated image
with open("img-CWs11MH3BnkwqSRzoUWrtITD.png", "rb") as image_file:
    response = Image.create_variation(
        image=image_file,
        n=2,
        size="1024x1024"
    )

image_url = response.data[0].url
print(image_url)
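The variations endpoint is documented as accepting only a square PNG smaller than 4 MB, so a local check before uploading can save a failed request. A minimal sketch that reads the width and height from the PNG header; the function name `check_variation_input` is hypothetical and the check covers only the header, not the whole file:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # documented upload limit for variations


def check_variation_input(data: bytes) -> None:
    """Raise ValueError if the bytes are not a square PNG under 4 MB."""
    if len(data) >= MAX_BYTES:
        raise ValueError("image must be smaller than 4 MB")
    # A PNG starts with an 8-byte signature followed by the IHDR chunk.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("image must be a PNG file")
    # IHDR stores width and height as big-endian 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    if width != height:
        raise ValueError(f"image must be square, got {width}x{height}")


# Synthetic 24-byte header describing a 1024x1024 PNG; passes silently.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 1024)
check_variation_input(header)
```

In practice the function would be called on the bytes of the file before they are passed to `Image.create_variation`.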


### 2.Vision (Use GPT-4 to understand images)
### https://platform.openai.com/docs/guides/vision/vision

In [None]:
# Image-to-text: describe an image given by URL
response = openai.ChatCompletion.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://pics0.baidu.com/feed/3b292df5e0fe9925ff51e136f36f8cd38cb1714f.jpeg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(response)
# print(response.choices[0])

In [4]:
# Image-to-text: describe a local image sent as a base64 data URL

import base64
import requests


# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')


# Path to your image
image_path = "files/work_10818_20231031_jxj3wSJ_preview.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {openai.api_key}"
}

payload = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What’s in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())

{'id': 'chatcmpl-8JJN2EcPmlWNyq14F3zXrNn75oYAA', 'object': 'chat.completion', 'created': 1699613384, 'model': 'gpt-4-1106-vision-preview', 'usage': {'prompt_tokens': 1118, 'completion_tokens': 224, 'total_tokens': 1342}, 'choices': [{'message': {'role': 'assistant', 'content': 'This image appears to be promotional artwork for a game. It features an anime-style character with light blue hair and blue eyes, dressed in a fantasy-themed outfit with a white, blue, and gold color scheme. The character is depicted in front of a background that includes what seems to be ruins or ancient pillars, with a magical or twilight atmosphere suggested by the colors and lighting.\n\nThe top left of the image contains a logo with stylized text, which is likely the name of the game, although I cannot provide the game\'s name itself. There is also Japanese text present which includes the name of a voice actor (indicated by "CV") and hints of the genre or the theme of the game, signified by the presence of 
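The raw dictionary printed above is hard to read. A minimal sketch for pulling out just the assistant text and the token count from the JSON body returned by `requests`; the helper name `summarize_completion` is illustrative, and the sample body is an abbreviated stand-in with placeholder text rather than the real response:

```python
from typing import Dict, Tuple


def summarize_completion(body: Dict) -> Tuple[str, int]:
    """Return (assistant text, total tokens) from a chat completion JSON body."""
    if "error" in body:
        # Failed requests return an `error` object instead of `choices`.
        raise RuntimeError(body["error"].get("message", "unknown API error"))
    text = body["choices"][0]["message"]["content"]
    total = body.get("usage", {}).get("total_tokens", 0)
    return text, total


# Abbreviated stand-in for the response printed above; the text is a placeholder.
sample_body = {
    "model": "gpt-4-1106-vision-preview",
    "usage": {"prompt_tokens": 1118, "completion_tokens": 224, "total_tokens": 1342},
    "choices": [{"message": {"role": "assistant", "content": "This image appears to be ..."}}],
}
text, total = summarize_completion(sample_body)
print(total)
print(text)
```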

### 3.Text-to-speech (turn text into audio)
### https://platform.openai.com/docs/guides/text-to-speech/text-to-speech


In [11]:
input = """第 一 卷

    今天，我开始新的生活。

    今天，我爬出满是失败创伤的老茧。

    今天，我从新来到这个世上，我出生在葡萄园中，园内的葡萄任人享用。

    今天我要从最高最密的藤上摘下智慧的果实，这葡萄藤是好几代前的智者种下的。

    今天，我要品尝葡萄的美味，还要吞下每一粒成功的种子，让新生命在我心里萌芽。

    我选择的道路充满机遇，也有辛酸与绝望。失败的同伴数不胜数，叠在一起，比金字塔还高。

    然而，我不会像他们一样失败，因为我手中持有航海图，可以领我越过汹涌的大海，抵达梦中的彼岸。

    失败不再是我奋斗的代价。它和痛苦都将从我的生命中消失。失败和我，就像水火一样，互不相容。我不再像过去一样接受它们。我要在智慧的指引下，走出失败的阴影，步入富足、健康、快乐的乐园，这些都超出了我以往的梦想。

    我要是能长生不老，就可以学到一切，但我不能永生，所以，在有限的人生里，我必须学会忍耐的艺术，因为大自然的行为一向是从容不迫的。造物主创造树中之王橄榄树需要一百年的时间，而洋葱经过短短的九个星期就会枯老。我不留恋从前那种洋葱式的生活，我要成为万树之王——橄榄树，成为现实生活中最伟大的推销员。"""

In [12]:
import requests
import json

# Request URL and output path
url = "https://api.openai.com/v1/audio/speech"
audio_file = "files/speech.mp3"

# Request headers
headers = {
    "Authorization": f"Bearer {openai.api_key}",
    "Content-Type": "application/json"
}

# Request body
data = {
    "model": "tts-1",
    "input": input,
    "voice": "echo"
}

# Send the POST request
response = requests.post(url, headers=headers, data=json.dumps(data))

# Handle the response
if response.status_code == 200:
    # Save the returned audio bytes to files/speech.mp3
    with open(audio_file, "wb") as file:
        file.write(response.content)
    print("Audio saved successfully.")
else:
    print(f"Error: {response.status_code}, {response.text}")


Audio saved successfully.
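The passage used in this test is short, but the speech endpoint is documented as accepting at most 4096 characters of input per request, so longer texts have to be sent in pieces. A minimal sketch of one way to split on blank lines; the function name `split_for_tts` and the paragraph-based strategy are assumptions for illustration, not something the notebook does:

```python
from typing import List

MAX_TTS_CHARS = 4096  # documented per-request input limit for /v1/audio/speech


def split_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking on blank lines."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-wrap any single paragraph that is itself over the limit.
        while len(paragraph) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:limit])
            paragraph = paragraph[limit:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("first paragraph\n\nsecond paragraph", limit=20))
```

Each chunk would then be sent as the `input` of its own request, and the resulting audio files played or concatenated in order.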


### 4.Speech to text (turn audio into text)
### https://platform.openai.com/docs/guides/speech-to-text


In [4]:
# Input: English audio, single speaker
# Output: English text
audio_file = open("files/英文单人.MP4", "rb")
filename = "英文单人.MP4"
transcript = Audio.translate_raw(
    model="whisper-1",
    filename=filename,
    file=audio_file,
    response_format="text")
# transcript = client.audio.translations.create(
#     model="whisper-1",
#     file=audio_file,
#     response_format="text"
#
# )
transcript

"Hello, friends. It's Rosie. Welcome to Radio Headspace and to Wednesday. Yesterday I was talking to my mom about how busy I am and the pressure that comes with that. And I was feeling a bit guilty because at times this makes me agitated and even sharp with people. As a suggestion, my mom asked me how my meditation practice was going, and that made me think about how meditation has helped me be more compassionate.\n"

In [5]:
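# Input: English dialogue audio
# Output: English text
# Goal: separate the speakers (not achieved: the result below is one undivided block of text)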
audio_file = open("files/英文对话.MP4", "rb")

filename = "英文对话.MP4"
transcript = Audio.translate_raw(
    model="whisper-1",
    filename=filename,
    file=audio_file,
    response_format="json")

# prompt = """
# This is a dialogue, please distinguish the content of different people according to different timings, the name of the character is replaced by capital letters, the output format is as follows:
# A: My people take the business of who sees them and who doesn't very seriously, which means of course that there's a lot of paperwork involved.
# B: What kind of paperwork?""",

transcript

<OpenAIObject at 0x11a3b04a0> JSON: {
  "text": "My people take the business of who sees them and who doesn't very seriously, which means, of course, that there's a lot of paperwork involved. What kind of paperwork? If you'll just go over to your desk, please. I took the liberty of filling out the relevant papers for you. I think you'll find the forgery quite impressive. Indeed. Now, if you'll sign this last page yourself, we'll be all set. This best not be some kind of trick. There. Now, where are you? Look in the mirror."
}
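The output above has no speaker labels, and whisper-1 does not provide them. What the API can return is per-segment timing: requesting `response_format="verbose_json"` yields a `segments` list with `start`, `end`, and `text` fields. A minimal sketch for printing those segments one per line, which at least makes turn boundaries easier to spot by hand; the helper name `format_segments` is illustrative and the timings in the sample are made up:

```python
from typing import Dict, List


def format_segments(segments: List[Dict]) -> str:
    """Render whisper `verbose_json` segments as one timestamped line each."""
    lines = []
    for seg in segments:
        start, end = float(seg["start"]), float(seg["end"])
        lines.append(f"[{start:6.1f}s - {end:6.1f}s] {seg['text'].strip()}")
    return "\n".join(lines)


# Made-up timings for illustration; the text is taken from the output above.
sample_segments = [
    {"start": 0.0, "end": 6.4,
     "text": " My people take the business of who sees them and who doesn't very seriously,"},
    {"start": 9.8, "end": 11.0, "text": " What kind of paperwork?"},
]
print(format_segments(sample_segments))
```

This is not diarization: assigning the lines to speakers A and B would still need a separate tool or manual labeling.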

In [6]:
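# Input: Chinese audio
# Output: English text
# Goal: Chinese output (not achieved: translate_raw calls the translation endpoint, which always returns English)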

audio_file = open("files/中文.MP4", "rb")
filename = "中文.MP4"
transcript = Audio.translate_raw(
    model="whisper-1",
    filename=filename,
    file=audio_file,
    response_format="json")
transcript

<OpenAIObject at 0x11a36b560> JSON: {
  "text": "These names should be able to think of if there is a result in the future may want to sell them can draw a tea on it It's pretty interesting"
}
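The English output here follows from the endpoint rather than from the model: `/v1/audio/translations` always produces English, while `/v1/audio/transcriptions` keeps the spoken language and accepts an optional ISO-639-1 `language` hint. In the pre-1.0 SDK used in this notebook, the transcription counterpart of `Audio.translate_raw` is `Audio.transcribe_raw`. A minimal sketch of choosing between the two endpoints; the helper name `build_audio_request` is hypothetical and the code only builds the request parameters without sending anything:

```python
from typing import Dict, Optional, Tuple

AUDIO_BASE_URL = "https://api.openai.com/v1/audio"


def build_audio_request(keep_source_language: bool,
                        language: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Pick the endpoint and form fields for a whisper-1 request."""
    fields = {"model": "whisper-1", "response_format": "json"}
    if keep_source_language:
        # /transcriptions returns text in the spoken language; `language` is an
        # optional ISO-639-1 hint such as "zh".
        if language:
            fields["language"] = language
        return f"{AUDIO_BASE_URL}/transcriptions", fields
    # /translations always returns English text.
    return f"{AUDIO_BASE_URL}/translations", fields


endpoint_url, form_fields = build_audio_request(keep_source_language=True, language="zh")
print(endpoint_url)
print(form_fields)
```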