# Pororo
- [Repository](https://github.com/kakaobrain/pororo)
- [Documentation](https://kakaobrain.github.io/pororo/)

In [1]:
from pororo import Pororo

# Misc.
## [Automatic Speech Recognition](https://kakaobrain.github.io/pororo/miscs/asr.html)
#### Speech Synthesis
- To use the **speech synthesis** module, the following packages must also be installed:
```
requirements = [
    "editdistance==0.5.3",
    "epitran==1.2",
    "fastdtw==0.3.4",
    "future",
    "jieba==0.42.1",
    "librosa==0.7.0",
    "phonemizer==2.1",
    "Pillow==7.1.0",
    "pinyin==0.4.0",
    "scipy",
    "SoundFile==0.10.2",
    "numba==0.48",
    "ko_pron",
]
```
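
Before installing anything, it can help to see which of these packages are already present. The following is a minimal sketch using only the standard library; `split_requirement` and `check_requirements` are hypothetical helper names introduced here, not part of Pororo, and the version comparison is a plain string match against the pins listed above.

```python
from importlib import metadata

# Copied from the requirements list above.
REQUIREMENTS = [
    "editdistance==0.5.3", "epitran==1.2", "fastdtw==0.3.4", "future",
    "jieba==0.42.1", "librosa==0.7.0", "phonemizer==2.1", "Pillow==7.1.0",
    "pinyin==0.4.0", "scipy", "SoundFile==0.10.2", "numba==0.48", "ko_pron",
]


def split_requirement(requirement):
    """Split 'name==version' into (name, version); version is None if unpinned."""
    name, _, version = requirement.partition("==")
    return name, version or None


def check_requirements(requirements):
    """Map each requirement to 'ok', 'missing', or the version actually installed."""
    report = {}
    for requirement in requirements:
        name, wanted = split_requirement(requirement)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[requirement] = "missing"
            continue
        if wanted is None or installed == wanted:
            report[requirement] = "ok"
        else:
            report[requirement] = f"installed {installed}"
    return report


for requirement, status in check_requirements(REQUIREMENTS).items():
    print(f"{requirement:22} {status}")
```

Anything reported as `missing` or with a different installed version is a candidate for `pip install` with the pinned version string.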

### 1. Check the available models

In [2]:
Pororo.available_models("asr")

'Available models for asr are ([lang]: en, [model]: wav2vec.en), ([lang]: ko, [model]: wav2vec.ko), ([lang]: zh, [model]: wav2vec.zh)'
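
`Pororo.available_models` returns a single string, as the output above shows. If a language-to-model mapping is easier to work with, the string can be parsed. This is a sketch that assumes the exact format printed above; `parse_available_models` is a hypothetical helper, not a Pororo function.

```python
import re

# Matches one "([lang]: xx, [model]: name[, name...])" group.
ENTRY = re.compile(r"\(\[lang\]: ([^,]+), \[model\]: ([^)]+)\)")


def parse_available_models(text):
    """Turn the available_models string into {lang: [model, ...]}."""
    models_by_lang = {}
    for lang, models in ENTRY.findall(text):
        models_by_lang[lang.strip()] = [m.strip() for m in models.split(",")]
    return models_by_lang


# The string shown in the output above.
sample = (
    "Available models for asr are ([lang]: en, [model]: wav2vec.en), "
    "([lang]: ko, [model]: wav2vec.ko), ([lang]: zh, [model]: wav2vec.zh)"
)
parsed = parse_available_models(sample)
print(parsed)
# {'en': ['wav2vec.en'], 'ko': ['wav2vec.ko'], 'zh': ['wav2vec.zh']}
```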

### 2. Create the objects

In [None]:
asr_ko = Pororo(task="asr", lang="ko")
asr_en = Pororo(task="asr", lang="en")

In [None]:
asr_ko('korean_speech.wav')

In [None]:
asr_en('english_speech.wav')
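
The two cells above were not executed, so no transcript is shown. When a recording gives poor results, a first thing to look at is the audio file itself. The sketch below inspects a WAV file with the standard `wave` module. It rests on an assumption: wav2vec-style models are commonly trained on 16 kHz audio, and this notebook does not say whether Pororo resamples input internally. `describe_wav` and `write_silence` are hypothetical helpers, and the silent file only exists to make the example runnable.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": wav.getnframes() / rate,
        }


def write_silence(path, seconds=1, sample_rate=16000):
    """Write a mono 16-bit WAV file containing only silence (demo input)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (seconds * sample_rate))


demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silence(demo_path)
info = describe_wav(demo_path)
print(info)
# {'channels': 1, 'sample_rate': 16000, 'seconds': 1.0}
```

The same `describe_wav` call can be pointed at `korean_speech.wav` or `english_speech.wav` before passing them to `asr_ko` or `asr_en`.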

## [Image Captioning](https://kakaobrain.github.io/pororo/miscs/caption.html)
### 1. Check the available models

In [3]:
Pororo.available_models("caption")

'Available models for caption are ([lang]: en, [model]: transformer.base.en.caption), ([lang]: ko, [model]: transformer.base.en.caption), ([lang]: zh, [model]: transformer.base.en.caption), ([lang]: ja, [model]: transformer.base.en.caption)'

### 2. Create the objects

In [4]:
caption_ko = Pororo(task="caption", lang="ko")
caption_en = Pororo(task="caption", lang="en")

Using cache found in /Users/benny/.cache/torch/hub/facebookresearch_detr_master
Using cache found in /Users/benny/.cache/torch/hub/facebookresearch_detr_master


### 3. Caption images

![pancakes](https://images.unsplash.com/photo-1628083578371-e210a991d713?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=334&q=80)

In [5]:
caption_ko("https://images.unsplash.com/photo-1628083578371-e210a991d713?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=334&q=80")

'식사 한 접시가 있는 식탁에 음식 한 그릇을 얹었다.'

![working-cat](https://images.unsplash.com/photo-1547960450-2ea08b931270?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=890&q=80)

In [6]:
caption_ko("https://images.unsplash.com/photo-1547960450-2ea08b931270?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=890&q=80")

'고양이가 노트북 키보드에 앉아 있다'

![book](https://images.unsplash.com/photo-1517849325426-6eac321919a0?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=750&q=80)

In [7]:
caption_en("https://images.unsplash.com/photo-1517849325426-6eac321919a0?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=750&q=80")

'A book that is sitting on a table.'

![watermelon-orange](https://images.unsplash.com/photo-1621659911279-b08ce9ff421f?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=400&q=80)

In [8]:
caption_en("https://images.unsplash.com/photo-1621659911279-b08ce9ff421f?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=400&q=80")

'A vase with a picture of a bird on it.'

![apple-items](https://images.unsplash.com/photo-1550029402-8ea9bfe19f04?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=400&q=80)

In [9]:
caption_en("https://images.unsplash.com/photo-1550029402-8ea9bfe19f04?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&ixlib=rb-1.2.1&auto=format&fit=crop&w=400&q=80")

'A laptop computer sitting on top of a table.'
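
The cells above caption one image per call. To compare captions across languages for several images at once, the calls can be wrapped in a small loop. This is a sketch under stated assumptions: `caption_all` is a hypothetical helper, and `fake_captioner` is a stand-in so the example runs without downloading any model. Any callable that takes an image URL and returns a string fits, which is how `caption_ko` and `caption_en` are used above.

```python
def caption_all(captioners, image_urls):
    """Run every captioner on every image and collect one row per image.

    captioners: dict mapping a label (e.g. "ko", "en") to a callable(url) -> str.
    """
    rows = []
    for url in image_urls:
        row = {"image": url}
        for label, captioner in captioners.items():
            try:
                row[label] = captioner(url)
            except Exception as exc:  # one bad URL should not stop the batch
                row[label] = f"<failed: {exc}>"
        rows.append(row)
    return rows


def fake_captioner(url):
    """Stand-in for a real captioner; only echoes its input."""
    return f"caption for {url}"


rows = caption_all({"en": fake_captioner}, ["a.jpg", "b.jpg"])
for row in rows:
    print(row)

# With the objects created earlier in this notebook, the call would look like:
# caption_all({"ko": caption_ko, "en": caption_en}, [url_1, url_2])
```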