Update README.md #54

Open
wants to merge 102 commits into main

Changes from all commits (102 commits)
0e781fe
Update Pretrain & Downsteam Tasks
Dec 27, 2022
321b1ba
Update Pretrain & Downsteam Tasks
Dec 27, 2022
7d87b6f
Update Pretrain & Downsteam Tasks
Dec 27, 2022
b83bda2
Update README.md
yinanhe Dec 27, 2022
9e03dfe
Update README.md
yinanhe Dec 27, 2022
23607f2
Update README.md
yinanhe Dec 27, 2022
7ea8e98
Rename README_zh-CN.md to README.md
shepnerd Dec 27, 2022
2ab50d3
Update README.md
shepnerd Dec 27, 2022
e9bc0b5
Update README.md
shepnerd Dec 27, 2022
a212a49
Update README.md
shepnerd Dec 27, 2022
e87eafb
Update ensemble.py
shepnerd Dec 27, 2022
e546267
Update README.md
shepnerd Dec 27, 2022
3bf8ac0
Merge branch 'main' of https://github.com/OpenGVLab/InternVideo
shepnerd Dec 27, 2022
e4b5b1b
add_stal
xings19 Jan 1, 2023
dedb591
Update README.md
shepnerd Jan 2, 2023
71c82d9
Update README.md
Richard-61 Jan 9, 2023
680809f
Merge pull request #4 from Richard-61/main
Jan 13, 2023
912cb5f
Add Multi-Modal Tasks Downstream
Jan 16, 2023
ef812ac
Update README.md
shepnerd Jan 16, 2023
58a551e
Delete MODEL_ZOO.md
shepnerd Jan 17, 2023
51c1178
Create LICENSE
Jan 17, 2023
dff898a
update
shepnerd Jan 17, 2023
c11fb35
Merge branch 'main' of https://github.com/OpenGVLab/InternVideo
shepnerd Jan 17, 2023
7cc40ce
Update README.md
shepnerd Jan 18, 2023
507fc34
Add VLN-CE Downstream
wz0919 Jan 18, 2023
3abc8aa
Update README.md
shepnerd Jan 18, 2023
88dbdbc
add modelzoo for videomae
congee524 Feb 2, 2023
db2cc81
refine readme
congee524 Feb 2, 2023
1055551
Merge pull request #10 from congee524/add_videomae_modelzoo
yinanhe Feb 2, 2023
341c856
newpr
Jazzcharles Feb 3, 2023
ed3a4a8
Update README.md
shepnerd Feb 4, 2023
7d57534
Merge pull request #11 from Jazzcharles/newpr
shepnerd Feb 4, 2023
9d38597
Update README.md
shepnerd Feb 4, 2023
5b14157
Update README.md
shepnerd Feb 4, 2023
9ee0834
Update README.md
shepnerd Feb 5, 2023
4c3df43
Add multi-modalities pre-training and B/16 model.
Feb 5, 2023
571f171
Merge pull request #12 from liyz15/add_multi_modalities_pretraining
shepnerd Feb 5, 2023
6664d9d
fixed bug, batch_nms
Richard-61 Feb 6, 2023
12b45b2
Merge pull request #13 from Richard-61/main
shepnerd Feb 6, 2023
91aaebc
Update README.md
shepnerd Feb 6, 2023
5dfd3f2
add link of vit_b_k710_ft
congee524 Feb 14, 2023
65815bb
Merge pull request #15 from congee524/release_vit_b_k710_ft
shepnerd Feb 14, 2023
54e0cab
model weights
shepnerd Feb 20, 2023
f1ae8f5
Update README.md
shepnerd Feb 20, 2023
edaf54c
Update README.md
shepnerd Feb 20, 2023
7431fc9
Update README.md
shepnerd Feb 20, 2023
2733374
Update README.md
shepnerd Feb 20, 2023
635c058
Update README.md
shepnerd Feb 20, 2023
15018f0
Update README.md
JerryFlymi Mar 9, 2023
4542069
Update README.md
shepnerd Mar 9, 2023
cab65c3
Update README.md
shepnerd Mar 9, 2023
06b6a05
update pretrained weight for vln-ce
wz0919 Mar 24, 2023
3c4718d
Merge branch 'main' of https://github.com/OpenGVLab/InternVideo into …
wz0919 Mar 24, 2023
a1e6c7f
Update zero-shot k400 readme.
liyz15 Mar 28, 2023
13ed0b8
Merge pull request #28 from liyz15/update_k400_readme
shepnerd Mar 28, 2023
7de671f
Update README.md
shepnerd Apr 23, 2023
69e037d
Update README.md
shepnerd Apr 23, 2023
6244731
Update README.md
shepnerd Apr 23, 2023
3de36c9
Update README.md
shepnerd Apr 23, 2023
97aaec6
Update README.md
shepnerd Apr 23, 2023
a064b85
Update README.md
shepnerd Apr 25, 2023
56348dc
Update README.md
shepnerd Apr 27, 2023
dc05eb1
add data
Andy1621 May 10, 2023
efc5a17
add data
Andy1621 May 10, 2023
438af9e
Update instruction_data.md
yinanhe May 11, 2023
56132eb
remove unnecessary files
shepnerd May 11, 2023
6d3e8d3
remove unnecessary files
shepnerd May 11, 2023
98b0798
remove unnecessary files
shepnerd May 11, 2023
d2de1cb
readme_cn
shepnerd May 11, 2023
ecab9b8
update
shepnerd May 11, 2023
56aa384
refactor
shepnerd May 11, 2023
a356cf3
Update README.md
yinanhe May 11, 2023
f059145
update
shepnerd May 11, 2023
408a775
Update README.md
yinanhe May 30, 2023
380cb95
Update README_cn.md
yinanhe May 30, 2023
8f0c8f3
np.int has deprecated and caused error
MasoudKaviani Jul 2, 2023
0df66a6
Create README.md
shepnerd Jul 12, 2023
7af166f
Update README.md
shepnerd Jul 14, 2023
fceee33
internvid subset download link
shepnerd Jul 17, 2023
d6aff57
Update README.md
shepnerd Jul 17, 2023
fb9160d
Update README.md
yinanhe Jul 17, 2023
21ca7bf
Update README.md
shepnerd Jul 17, 2023
daf6066
Merge branch 'main' of https://github.com/OpenGVLab/InternVideo
shepnerd Jul 17, 2023
3dd7062
Update README.md
shepnerd Jul 18, 2023
3f37d43
freeze layer name rectified
GoatWang Jul 20, 2023
4f7ae6c
Merge pull request #45 from GoatWang/main
shepnerd Jul 23, 2023
de1e082
Merge pull request #41 from MasoudKaviani/patch-1
shepnerd Jul 23, 2023
cf85634
Update README.md
yinanhe Sep 9, 2023
0a1ff45
Update README.md
yinanhe Sep 9, 2023
d96cfd7
Update README.md
yinanhe Sep 9, 2023
c7e97ca
Update README.md
yinanhe Sep 11, 2023
0239e6c
Update tgif.py
yinanhe Oct 24, 2023
65f465a
Update video_base_dataset.py
yinanhe Oct 24, 2023
a6e8a2d
More feasible downloading entries of InternVid and ViCLIP; Demo of Vi…
shepnerd Oct 25, 2023
f620f4c
Merge branch 'main' of https://github.com/OpenGVLab/InternVideo
shepnerd Oct 25, 2023
dc72fd7
Update README.md
shepnerd Oct 25, 2023
349c6f5
Update README.md
shepnerd Oct 25, 2023
0b51944
Update README.md
shepnerd Oct 25, 2023
5cf58e2
Update demo.ipynb
shepnerd Oct 25, 2023
92cf0f5
Update demo.ipynb
shepnerd Oct 25, 2023
1ea86ea
Delete Ego-Tasks
shepnerd Oct 25, 2023
fe2a963
Update README.md
hjzhang-forward Oct 27, 2023
6 changes: 6 additions & 0 deletions .gitmodules
@@ -0,0 +1,6 @@
[submodule "Pretrain/UniFormerV2"]
path = Pretrain/UniFormerV2
url = https://github.com/OpenGVLab/UniFormerV2.git
[submodule "Downstream/Ego-Tasks"]
path = Downstream/Ego-Tasks
url = https://github.com/OpenGVLab/ego4d-eccv2022-solutions.git
60 changes: 60 additions & 0 deletions Data/InternVid/README.md
@@ -0,0 +1,60 @@
# InternVid \[[Paper](https://arxiv.org/pdf/2307.06942.pdf)\]

[![Dataset meta](https://img.shields.io/badge/%F0%9F%A4%97%20InternVid-Dataset-blue)](https://huggingface.co/datasets/OpenGVLab/InternVid) | [![Model Checkpoint](https://img.shields.io/badge/%F0%9F%A4%97%20ViCLIP-Model-purple)](https://huggingface.co/OpenGVLab/ViCLIP)

# :fire: News
We are excited to announce the partial release of a large-scale video-text dataset aimed at facilitating multimodal understanding and generation. As part of this release, we are making available a [subset](https://huggingface.co/datasets/OpenGVLab/InternVid) of the dataset, which comprises 10 million video clips. Additionally, we have provided a [ViCLIP](https://huggingface.co/OpenGVLab/ViCLIP) model trained on this subset, using the ViT-L architecture. It achieves SOTA zero-shot action recognition performance on Kinetics.

We give step-by-step instructions for accessing and using ViCLIP in [demo.ipynb](https://github.com/OpenGVLab/InternVideo/blob/main/Data/InternVid/demo.ipynb).

Stay tuned for updates!

# Introduction

**Data**

We collected videos from 16 popular categories with varying percentages. We ensured diversity by selecting videos from countries with different languages instead of relying on a dominant language environment. The countries we sampled from include the UK, USA, Australia, Japan, Korea, China, Russia, and France, among others. In terms of duration, each video lasts 351.9 seconds on average. Almost half (49%) of the videos are five minutes or less, while about a quarter (26%) fall between five and ten minutes. Only 8% of the videos are over 20 minutes long. Among the curated videos, 85% are high-resolution (720P), while the remaining 15% have lower resolutions ranging from 360P to 720P. Although the lower-resolution videos may not perform as well as the high-resolution ones in content generation tasks, they can still be useful in video-language representation learning, provided that they have appropriate captions.

![b469e00b43d46a6b3f89899483abcf6](https://github.com/OpenGVLab/InternVideo/assets/43169235/7d6aca7d-362a-425d-9ef2-ec0189491b52)

InternVid exhibits diverse clip durations and caption lengths at the segmented-clip level. The aesthetic scores and clip-caption similarities are distributed uniformly. The majority of clips are 0-10 seconds in length, accounting for 85% of all clips. Approximately half of the clips have captions with 10-20 words, while one-third of the clip captions have fewer than 10 words. About 11% of clips have long captions with more than 20 words.

![429af4993adb77478c000c865ae5a1b](https://github.com/OpenGVLab/InternVideo/assets/43169235/f64588c3-81e8-43de-b771-46500474d2ff)

**ViCLIP: a simple video CLIP for transferable video-text representation**

Built upon <a href="https://github.com/openai/CLIP">CLIP</a>, we develop a simple video-text pretraining baseline, ViCLIP. It consists of a video encoder (ViT) and a text encoder, as shown below. Both modules are initialized from the corresponding CLIP components. We replace the native attention in the video encoder with spatiotemporal attention while keeping the other design elements unchanged. For efficient learning, we apply masking to videos during pre-training.

<img width="633" alt="87c6263cc4aceee72cc8e37085a8109" src="https://github.com/OpenGVLab/InternVideo/assets/43169235/1e540a2b-f503-4036-b2a8-ba99401fc5b0">


# Data & Model Zoo

### Pretrained Data & Model
<div>

| Model | Training Data | Descriptions |
| :-----------------: | :----------------------: | :---------------------------------------------------------------------------------------------------: |
| ViCLIP-L-14 \[[HuggingFace](https://huggingface.co/OpenGVLab/ViCLIP) \| [Aliyun](https://pjlab-gvm-data.oss-cn-shanghai.aliyuncs.com/internvideo/viclip/ViClip-InternVid-10M-FLT.pth )\] | InternVid-10M-FLT \[[HuggingFace](https://huggingface.co/datasets/OpenGVLab/InternVid) \| [OpenDataLab](https://opendatalab.com/shepshep/InternVid)\] | |
</div>
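
The commands below are a hedged sketch for pulling the checkpoint and annotations from HuggingFace with `huggingface_hub`; the checkpoint filename is an assumption inferred from the Aliyun link above, so verify it against the repository listing first.

```python
from huggingface_hub import hf_hub_download, snapshot_download

# ViCLIP-L-14 weights (filename assumed from the Aliyun mirror; check the repo).
ckpt_path = hf_hub_download(repo_id="OpenGVLab/ViCLIP",
                            filename="ViClip-InternVid-10M-FLT.pth")

# InternVid-10M-FLT annotations: download the whole dataset repository.
data_dir = snapshot_download(repo_id="OpenGVLab/InternVid", repo_type="dataset")

print(ckpt_path, data_dir)
```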


## Citation

If you find this work useful for your research, please consider citing InternVid. Your acknowledgement would greatly help us in continuing to contribute resources to the research community.

```
@article{wang2023internvid,
title={InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation},
author={Wang, Yi and He, Yinan and Li, Yizhuo and Li, Kunchang and Yu, Jiashuo and Ma, Xin and Chen, Xinyuan and Wang, Yaohui and Luo, Ping and Liu, Ziwei and Wang, Yali and Wang, Limin and Qiao, Yu},
journal={arXiv preprint arXiv:2307.06942},
year={2023}
}

@article{wang2022internvideo,
title={InternVideo: General Video Foundation Models via Generative and Discriminative Learning},
author={Wang, Yi and Li, Kunchang and Li, Yizhuo and He, Yinan and Huang, Bingkun and Zhao, Zhiyu and Zhang, Hongjie and Xu, Jilan and Liu, Yi and Wang, Zun and Xing, Sen and Chen, Guo and Pan, Junting and Yu, Jiashuo and Wang, Yali and Wang, Limin and Qiao, Yu},
journal={arXiv preprint arXiv:2212.03191},
year={2022}
}
```
102 changes: 102 additions & 0 deletions Data/InternVid/demo.ipynb
@@ -0,0 +1,102 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f86bc499",
"metadata": {},
"source": [
"## download ViCILP weights and put its pth file in viclip folder. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7a90379-d9ee-45d9-9073-7ed5132fa6b1",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import os\n",
"import cv2\n",
"\n",
"from viclip import retrieve_text, _frame_from_video"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a425a5da-ceaf-4b89-9845-c8ba576902d8",
"metadata": {},
"outputs": [],
"source": [
"video = cv2.VideoCapture('example1.mp4')\n",
"frames = [x for x in _frame_from_video(video)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "3fb7397a-02ef-41b5-9ffe-f2363b277778",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"text: A man in a gray sweater plays fetch with his dog in the snowy yard, throwing a toy and watching it run. ~ prob: 0.8264\n",
"text: A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon. ~ prob: 0.1587\n",
"text: A pet dog excitedly runs through the snowy yard, chasing a toy thrown by its owner. ~ prob: 0.0141\n",
"text: A person dressed in a blue jacket shovels the snow-covered pavement outside their house. ~ prob: 0.0006\n",
"text: A playful dog slides down a snowy hill, wagging its tail with delight. ~ prob: 0.0002\n"
]
}
],
"source": [
"text_candidates = [\"A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.\",\n",
" \"A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.\",\n",
" \"A person dressed in a blue jacket shovels the snow-covered pavement outside their house.\",\n",
" \"A pet dog excitedly runs through the snowy yard, chasing a toy thrown by its owner.\",\n",
" \"A person stands on the snowy floor, pushing a sled loaded with blankets, preparing for a fun-filled ride.\",\n",
" \"A man in a gray hat and coat walks through the snowy yard, carefully navigating around the trees.\",\n",
" \"A playful dog slides down a snowy hill, wagging its tail with delight.\",\n",
" \"A person in a blue jacket walks their pet on a leash, enjoying a peaceful winter walk among the trees.\",\n",
" \"A man in a gray sweater plays fetch with his dog in the snowy yard, throwing a toy and watching it run.\",\n",
" \"A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery.\"]\n",
"\n",
"texts, probs = retrieve_text(frames, text_candidates, name='viclip', topk=5)\n",
"\n",
"for t, p in zip(texts, probs):\n",
" print(f'text: {t} ~ prob: {p:.4f}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2969ba6-19d0-4893-b071-b82fa046c312",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Binary file added Data/InternVid/example1.mp4
Binary file not shown.
3 changes: 3 additions & 0 deletions Data/InternVid/viclip/README.md
@@ -0,0 +1,3 @@
---
license: mit
---
71 changes: 71 additions & 0 deletions Data/InternVid/viclip/__init__.py
@@ -0,0 +1,71 @@
from .simple_tokenizer import SimpleTokenizer as _Tokenizer
from .viclip import ViCLIP
import torch
import numpy as np
import cv2

clip_candidates = {'viclip': None, 'clip': None}

def get_clip(name='viclip'):
    global clip_candidates
    m = clip_candidates[name]
    if m is None:
        if name == 'viclip':
            tokenizer = _Tokenizer()
            vclip = ViCLIP(tokenizer)
            # m = vclip
            m = (vclip, tokenizer)
        else:
            raise Exception('the target clip model is not found.')
    return m

def get_text_feat_dict(texts, clip, tokenizer, text_feat_d={}):
    for t in texts:
        feat = clip.get_text_features(t, tokenizer, text_feat_d)
        text_feat_d[t] = feat
    return text_feat_d

def get_vid_feat(frames, clip):
    return clip.get_vid_features(frames)

def _frame_from_video(video):
    while video.isOpened():
        success, frame = video.read()
        if success:
            yield frame
        else:
            break

v_mean = np.array([0.485, 0.456, 0.406]).reshape(1, 1, 3)
v_std = np.array([0.229, 0.224, 0.225]).reshape(1, 1, 3)
def normalize(data):
    return (data / 255.0 - v_mean) / v_std

def frames2tensor(vid_list, fnum=8, target_size=(224, 224), device=torch.device('cuda')):
    assert(len(vid_list) >= fnum)
    step = len(vid_list) // fnum
    vid_list = vid_list[::step][:fnum]
    vid_list = [cv2.resize(x[:, :, ::-1], target_size) for x in vid_list]
    vid_tube = [np.expand_dims(normalize(x), axis=(0, 1)) for x in vid_list]
    vid_tube = np.concatenate(vid_tube, axis=1)
    vid_tube = np.transpose(vid_tube, (0, 1, 4, 2, 3))
    vid_tube = torch.from_numpy(vid_tube).to(device, non_blocking=True).float()
    return vid_tube

def retrieve_text(frames, texts, name='viclip', topk=5, device=torch.device('cuda')):
    clip, tokenizer = get_clip(name)
    clip = clip.to(device)
    frames_tensor = frames2tensor(frames, device=device)
    vid_feat = get_vid_feat(frames_tensor, clip)

    text_feat_d = {}
    text_feat_d = get_text_feat_dict(texts, clip, tokenizer, text_feat_d)
    text_feats = [text_feat_d[t] for t in texts]
    text_feats_tensor = torch.cat(text_feats, 0)

    probs, idxs = clip.get_predict_label(vid_feat, text_feats_tensor, top=topk)

    ret_texts = [texts[i] for i in idxs.numpy()[0].tolist()]
    return ret_texts, probs.numpy()[0]

Binary file not shown.
135 changes: 135 additions & 0 deletions Data/InternVid/viclip/simple_tokenizer.py
@@ -0,0 +1,135 @@
import gzip
import html
import os
from functools import lru_cache

import ftfy
import regex as re


@lru_cache()
def default_bpe():
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz")
# @lru_cache()
# def default_bpe():
#     return "bpe_simple_vocab_16e6.txt.gz"


@lru_cache()
def bytes_to_unicode():
    """
    Returns list of utf-8 byte and a corresponding list of unicode strings.
    The reversible bpe codes work on unicode strings.
    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
    This is a significant percentage of your normal, say, 32K bpe vocab.
    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
    And avoids mapping to whitespace/control characters the bpe code barfs on.
    """
    bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8+n)
            n += 1
    cs = [chr(n) for n in cs]
    return dict(zip(bs, cs))


def get_pairs(word):
    """Return set of symbol pairs in a word.
    Word is represented as tuple of symbols (symbols being variable-length strings).
    """
    pairs = set()
    prev_char = word[0]
    for char in word[1:]:
        pairs.add((prev_char, char))
        prev_char = char
    return pairs


def basic_clean(text):
    text = ftfy.fix_text(text)
    text = html.unescape(html.unescape(text))
    return text.strip()


def whitespace_clean(text):
    text = re.sub(r'\s+', ' ', text)
    text = text.strip()
    return text


class SimpleTokenizer(object):
    def __init__(self, bpe_path: str = default_bpe()):
        self.byte_encoder = bytes_to_unicode()
        self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
        merges = gzip.open(bpe_path).read().decode("utf-8").split('\n')
        merges = merges[1:49152-256-2+1]
        merges = [tuple(merge.split()) for merge in merges]
        vocab = list(bytes_to_unicode().values())
        vocab = vocab + [v+'</w>' for v in vocab]
        for merge in merges:
            vocab.append(''.join(merge))
        vocab.extend(['<|startoftext|>', '<|endoftext|>'])
        self.encoder = dict(zip(vocab, range(len(vocab))))
        self.decoder = {v: k for k, v in self.encoder.items()}
        self.bpe_ranks = dict(zip(merges, range(len(merges))))
        self.cache = {'<|startoftext|>': '<|startoftext|>', '<|endoftext|>': '<|endoftext|>'}
        self.pat = re.compile(r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", re.IGNORECASE)

    def bpe(self, token):
        if token in self.cache:
            return self.cache[token]
        word = tuple(token[:-1]) + (token[-1] + '</w>',)
        pairs = get_pairs(word)

        if not pairs:
            return token+'</w>'

        while True:
            bigram = min(pairs, key=lambda pair: self.bpe_ranks.get(pair, float('inf')))
            if bigram not in self.bpe_ranks:
                break
            first, second = bigram
            new_word = []
            i = 0
            while i < len(word):
                try:
                    j = word.index(first, i)
                    new_word.extend(word[i:j])
                    i = j
                except:
                    new_word.extend(word[i:])
                    break

                if word[i] == first and i < len(word)-1 and word[i+1] == second:
                    new_word.append(first+second)
                    i += 2
                else:
                    new_word.append(word[i])
                    i += 1
            new_word = tuple(new_word)
            word = new_word
            if len(word) == 1:
                break
            else:
                pairs = get_pairs(word)
        word = ' '.join(word)
        self.cache[token] = word
        return word

    def encode(self, text):
        bpe_tokens = []
        text = whitespace_clean(basic_clean(text)).lower()
        for token in re.findall(self.pat, text):
            token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
            bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
        return bpe_tokens

    def decode(self, tokens):
        text = ''.join([self.decoder[token] for token in tokens])
        text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors="replace").replace('</w>', ' ')
        return text
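
As a quick sanity check of the tokenizer above, the sketch below encodes and decodes a caption; it assumes `bpe_simple_vocab_16e6.txt.gz` sits next to `simple_tokenizer.py`, which is what `default_bpe()` expects.

```python
from viclip.simple_tokenizer import SimpleTokenizer

# default_bpe() resolves bpe_simple_vocab_16e6.txt.gz relative to this module.
tokenizer = SimpleTokenizer()

ids = tokenizer.encode("a dog runs through the snow")
print(ids)                    # BPE token ids
print(tokenizer.decode(ids))  # roughly round-trips: "a dog runs through the snow "
```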