# Look at a Picture, Hear a Story

```
# Upload an image and get back an audio file that tells a story
```

```
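# This example uses three components:
1. A transformers image-to-text model converts the image into text
2. An app-builder large language model expands the text into a short story
3. app-builder text-to-speech converts the story into audio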
```
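The three steps above compose into a single pipeline. Below is a minimal sketch of that composition, assuming each stage is passed in as a plain callable. The function `picture_to_story_audio` and its parameter names are hypothetical, and the lambdas are stand-ins for illustration, not the real models.

```python
from typing import Callable


def picture_to_story_audio(
    image_path: str,
    caption: Callable[[str], str],
    expand: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages: image -> caption -> story -> audio bytes."""
    text = caption(image_path)   # step 1: image to text
    story = expand(text)         # step 2: text to story
    return synthesize(story)     # step 3: story to audio


# Stand-in stages for illustration only (assumption): the real stages would
# call transformers and appbuilder instead.
audio = picture_to_story_audio(
    "1.png",
    caption=lambda path: "a child holding a kite",
    expand=lambda text: "Once upon a time, " + text + ".",
    synthesize=lambda story: story.encode("utf-8"),
)
print(audio)
```

Keeping the stages as separate callables makes it possible to swap any one of them (a different captioning model, for instance) without touching the other two.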


```
# Install the transformers library
```

In [1]:
pip install transformers



```
# Run a sentiment-classification pipeline to check that transformers installed correctly
```

In [2]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
results = classifier("Today is a nice day!")
print(results)

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


[{'label': 'POSITIVE', 'score': 0.9998782873153687}]
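The classifier returns a list with one dictionary per input, each holding a `label` and a `score`. As a rough sketch of how that structure could be consumed, the hypothetical helper `top_label` below (not part of transformers) reads the first prediction and applies an assumed score threshold.

```python
def top_label(results, min_score=0.5):
    """Return the label of the first prediction if its score is high enough, else None."""
    if not results:
        return None
    first = results[0]
    return first["label"] if first["score"] >= min_score else None


# The value printed by the cell above.
results = [{"label": "POSITIVE", "score": 0.9998782873153687}]
print(top_label(results))
```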



```
# Install appbuilder-sdk
```

In [4]:
pip install appbuilder-sdk
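After installing both packages, it can be useful to confirm which versions ended up in the environment. Here is a small sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the names passed to `pip install`.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions(["transformers", "appbuilder-sdk"]))
```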




```
# Set the authentication credentials
```

In [5]:
import os
os.environ["APPBUILDER_TOKEN"] = "..."
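A missing or placeholder token tends to surface only later as a failed request, so a fail-fast check can help. The sketch below is one possible guard. `require_env` is a hypothetical helper, and treating the literal `"..."` as a placeholder is an assumption based on the cell above.

```python
import os


def require_env(name="APPBUILDER_TOKEN"):
    """Return the variable's value, or raise if it is unset, empty, or a placeholder."""
    value = os.environ.get(name, "").strip()
    # Assumption: "..." is the unfilled placeholder used in this notebook.
    if not value or value == "...":
        raise RuntimeError(f"Set {name} to a real token before calling appbuilder")
    return value
```

Calling `require_env()` right after setting the variable would stop the notebook early if the token was never filled in.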


```
# Test the appbuilder connection by printing the list of currently supported models
```

In [6]:
import appbuilder

models = appbuilder.get_model_list(api_type_filter=["chat"], is_available=True)
print(", ".join(models))

ERNIE-Bot 4.0, ERNIE-Bot-8K, ERNIE-Bot, ERNIE-Bot-turbo, EB-turbo-AppBuilder专用版, Qianfan-Chinese-Llama-2-7B, Yi-34B-Chat, Llama-2-7B-Chat, Llama-2-13B-Chat, Llama-2-70B-Chat, ChatGLM2-6B-32K, ChatLaw, BLOOMZ-7B, Qianfan-BLOOMZ-7B-compressed, AquilaChat-7B
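The names in this list use mixed case, so it can help to search it case-insensitively when checking for a particular model. The sketch below does only that. `matching_models` is a hypothetical helper, and whether the SDK itself resolves names this way is not shown in this notebook.

```python
def matching_models(models, query):
    """Return the listed model names that contain `query`, ignoring case."""
    needle = query.casefold()
    return [name for name in models if needle in name.casefold()]


# A few names copied from the output above.
models = ["ERNIE-Bot 4.0", "ERNIE-Bot-turbo", "EB-turbo-AppBuilder专用版", "Llama-2-7B-Chat"]
print(matching_models(models, "eb-turbo-appbuilder"))
```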



```
# Now for the main part!
```

In [7]:
import os
import appbuilder

from transformers import pipeline


def img2text(url):
    # Caption the image with the BLIP image-captioning model.
    pipe = pipeline("image-to-text", model="Salesforce/blip-image-captioning-large")
    result = pipe(url)
    print(result)
    text = result[0]["generated_text"]
    return text


def generate_story(scenario):
    # The prompt is kept in Chinese because it asks the model for a Chinese story.
    # In English: "You are an old man who tells stories very well. The content in
    # the Context below is an English sentence spoken by a foreigner; please extend
    # it into a Chinese story, ideally with a little humor, within 100 characters."
    template = """
      你是一位很会讲故事的老人，下面Context中的内容是一个外国人说的一句英文，请你根据这句话延伸出一个中文的故事。最好还能有一点小幽默，字数在100个字以内。
      CONTEXT: {scenario}
      STORY:
    """
    playground = appbuilder.Playground(prompt_template=template, model="eb-turbo-appbuilder")
    prompt = appbuilder.Message({"scenario": scenario})
    result = playground(prompt, stream=False)
    print(result)
    story = result.content
    return story


def text2speech(message):
    tts = appbuilder.TTS()
    cwd = os.getcwd()
    # The paddlespeech-tts model currently only supports returning WAV format.
    wav_sample_path = os.path.join(cwd, "story.wav")
    inp = appbuilder.Message({"text": message})
    out = tts.run(inp, model="paddlespeech-tts", audio_type="wav")
    with open(wav_sample_path, "wb") as f:
        f.write(out.content["audio_binary"])
    # The message reads: "Text converted to speech; WAV file written to: {}"
    print("成功将文本转语音，wav格式文件已写入：{}".format(wav_sample_path))


# Upload an image first (named 1.png in this example); the run produces an audio file, story.wav.
scenario = img2text("1.png")
story = generate_story(scenario)
text2speech(story)



[{'generated_text': 'there is a small child holding a red and blue kite'}]
Message(name=msg, content=有一个小孩子拿着一个红蓝相间的风筝。

这个小孩子拿着风筝，跑来跑去，非常开心。他的妈妈告诉他，这个风筝很漂亮，但是要小心，不要让它飞走了。小孩子回答说：“没关系，我有魔法，可以把它变回来！”于是他妈妈笑着说：“哦，看来你真的有一颗魔法般的心呢！”, mtype=dict, extra={})
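成功将文本转语音，wav格式文件已写入：/content/story.wav

In English, the generated story reads roughly: "A little child is holding a red and blue kite. The child runs around with the kite, very happy. His mother tells him the kite is beautiful, but he should be careful not to let it fly away. The child replies, 'It's fine, I have magic and can bring it back!' His mother laughs and says, 'Oh, it seems you really do have a magical heart!'" The final line reports that the text was converted to speech and the WAV file was written to /content/story.wav.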
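Once the file is written, a quick sanity check is to read its header and compute the playing time. The sketch below uses the standard-library `wave` module. `wav_duration_seconds` is a hypothetical helper, and it assumes the file is uncompressed PCM, which this notebook does not confirm for the TTS output. The demonstration runs on a generated file rather than on `story.wav`.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


# Demonstration on a generated file: one second of 16 kHz mono silence.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(wav_duration_seconds(demo_path))
```

Pointing the same helper at the path printed by the cell above would show whether the synthesized audio has a plausible length for a story of about 100 characters.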
