The full derivation of the Transformer attention gradient, comparing the analytically derived attention gradient with the gradient computed by PyTorch autograd.

If you find this open-source release useful, please cite it in your paper:
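As a minimal sketch of what such a comparison looks like (this is an illustrative example, not necessarily the repository's own script; the tensor sizes and the sum-loss are arbitrary choices), the hand-derived gradient of single-head attention with respect to the query matrix can be checked against PyTorch autograd:

```python
import torch

torch.manual_seed(0)
n, d = 4, 8  # sequence length and head dimension (hypothetical sizes)
Q = torch.randn(n, d, requires_grad=True)
K = torch.randn(n, d)
V = torch.randn(n, d)

# Gradient via PyTorch autograd
S = Q @ K.T / d**0.5          # scaled dot-product scores
A = torch.softmax(S, dim=-1)  # attention weights
O = A @ V                     # attention output
O.sum().backward()            # upstream gradient dL/dO is all-ones
autograd_dQ = Q.grad

# Gradient via the chain rule, derived by hand
G = torch.ones_like(O)                           # dL/dO
dA = G @ V.T                                     # dL/dA
dS = A * (dA - (dA * A).sum(-1, keepdim=True))   # softmax Jacobian-vector product
manual_dQ = dS @ K / d**0.5                      # dL/dQ

print(torch.allclose(autograd_dQ, manual_dQ, atol=1e-6))
```

The two gradients should agree up to floating-point tolerance, confirming the analytic derivation.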
```bibtex
@software{He_The_full_derivation_2022,
  author  = {He, Longxiang},
  month   = may,
  title   = {{The full derivation of Transformer gradient}},
  url     = {https://github.com/Say-Hello2y/Transformer-attention.git},
  version = {0.0.0},
  year    = {2022}
}
```