
Commit 0f401b9

Commit message: bak
1 parent 6fe58c2 commit 0f401b9

8 files changed, +262 -42 lines changed

artificial-intelligence.bigb

Lines changed: 78 additions & 32 deletions
@@ -1229,13 +1229,19 @@ Bibliography:

 = LLM KV Caching
 {c}
-{parent=Large language model}
+{parent=LLM inference optimization}
 {wiki=Transformer_(deep_learning_architecture)\#KV_caching}

 Bibliography:
 * https://huggingface.co/blog/not-lain/kv-caching
 * https://medium.com/@joaolages/kv-caching-explained-276520203249

+= Grouped-Query attention
+{parent=LLM inference optimization}
+
+Bibliography:
+* https://aliissa99.medium.com/-a596e4d86f79
+
 = Generative pre-trained transformer
 {c}
 {parent=Large language model}
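
The KV caching and grouped-query attention entries above are just headings plus bibliography. As a reading aid, here is a minimal single-head sketch of what KV caching does during decoding, in NumPy; the dimensions, weights and function names are made up for illustration and are not tied to any particular library or to the models discussed below:

``
# Minimal KV-cache sketch: per decoded token we only compute the new token's
# Q, K and V, append K and V to a cache, and take n_ctx dot products of the
# new Q against every cached K.
import numpy as np

d_model, d_head = 16, 8
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))

K_cache, V_cache = [], []

def decode_step(x):
    """Attend one new token embedding x of shape (d_model,) against the cache."""
    q = x @ W_q                       # only the new token's query is needed
    K_cache.append(x @ W_k)           # K and V of past tokens are never recomputed
    V_cache.append(x @ W_v)
    K = np.stack(K_cache)             # (n_ctx, d_head)
    V = np.stack(V_cache)             # (n_ctx, d_head)
    scores = K @ q / np.sqrt(d_head)  # n_ctx dot products: new Q vs every cached K
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                # weighted sum over cached values

for _ in range(4):                    # autoregressive decoding of 4 tokens
    out = decode_step(rng.standard_normal(d_model))
print(out.shape)                      # (d_head,)
``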
@@ -1277,6 +1283,7 @@ It is however possible to make fuller utilization of the GPU's compute power by
 Bibliography:
 * https://www.reddit.com/r/LocalLLaMA/comments/1brcnps/is_inferencing_memory_bandwidth_limited/
 * https://zeux.io/2024/03/15/llm-inference-sol/
+* https://jax-ml.github.io/scaling-book/

 = Number of multiplications per token in a <GPT> model
 {parent=Theoretical peak performance of GPT inference}
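
The bibliography above is about why single-stream decoding is memory-bandwidth limited rather than compute limited. A back-of-the-envelope sketch of that argument, with assumed numbers (70B parameters, fp16 weights, roughly H100-class ~3.35 TB/s bandwidth; none of these values come from this commit):

``
# With KV caching, generating one token has to stream essentially all model
# weights from memory once, so memory bandwidth bounds single-stream decode speed.
params = 70e9          # assumed parameter count
bytes_per_param = 2    # fp16/bf16 weights
bandwidth = 3.35e12    # assumed memory bandwidth in bytes/s
print(bandwidth / (params * bytes_per_param), 'tokens/s upper bound')  # ~24
``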
@@ -1285,23 +1292,37 @@ The following is for a "classic" <GPT-2>-style model, the following estimates th

 For each layer (L):
 * for each attention head (h):
-  * K = d_model * d_head (takes embedding and converts to vector of length d_head)
+  * K = d_model * d_head (takes embedding of one token and converts to vector of length d_head)
   * Q = d_model * d_head (same)
-  * K Q dot product: d_head (element-wise multiplication of two vectors of size d_head)
-  * multiply that by V: d_head * d_model (goes back to a vector of size d_model)
+  * K Q dot product for attention pattern: n_ctx * d_head (n_ctx dot products of vectors of size d_head: the new Q against every cached K; the new K against older Qs is zeroed out by causality)
+  * new value vector for new token: d_model * d_model
+  * new updates: n_ctx * d_model (multiply each value vector by the new attention column scalar)
 * fully connected: d_model * d_ff + d_ff * d_model (converts the embedding to the hidden layer size and then back)
-So the total sum is: L * ( h * ( 2 * d_model * d_head + d_head + d_head * d_model ) + 2 * d_model * d_ff )
+So the total sum is:
+``
+L * (
+  h * (
+    2 * d_model * d_head +
+    n_ctx * d_head +
+    d_model * d_model +
+    n_ctx * d_model
+  ) +
+  2 * d_model * d_ff
+)
+``

-Putting in the numbers for
-* <GPT-2>:
+This is coded at: \a[llm_count_mults.py].

 Bibliography:
 * https://www.reddit.com/r/theydidthemath/comments/1fzrs1k/request_how_many_individual/
 * https://www.gaohongnan.com/playbook/training/how_to_calculate_flops_in_transformer_based_models.html#sanity-check-with-palm-paper-s-flops-calculation

-= GPT model by <OpenAI>
+= List of GPT models
 {parent=GPT model}

+= GPT model by <OpenAI>
+{parent=List of GPT models}
+
 = ChatGPT model
 {c}
 {synonym}
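
As a quick sanity check of the per-token multiplication formula added in this hunk, here it is evaluated with GPT-2-sized hyperparameters (the same values that llm_count_mults.py below uses for 'gpt-2'; note the script itself uses a slightly different breakdown, including the output projection, so its output will not match this exactly):

``
L, d_model, d_ff, h, d_head, n_ctx = 12, 768, 3072, 12, 64, 1024
mults_per_token = L * (
    h * (
        2 * d_model * d_head +
        n_ctx * d_head +
        d_model * d_model +
        n_ctx * d_model
    ) +
    2 * d_model * d_ff
)
print(f'{mults_per_token:,}')  # 278,396,928, i.e. on the order of 0.3 billion multiplications per token
``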
@@ -1406,29 +1427,10 @@ https://github.com/karpathy/nanoGPT

 https://platform.openai.com/docs/models/gpt-4-turbo

-= Open source LLM
-{parent=Large language model}
-{tag=Open source software}
-
-= LLM model with open training data
-{c}
-{parent=Open source LLM}
-
-= The Pile
-{disambiguate=dataset}
-{parent=LLM model with open training data}
-{wiki}
-
-= LLM360
-{parent=LLM model with open training data}
-
-= Open weight LLM model
-{c}
-{parent=Open source LLM}
-
 = Llama
 {disambiguate=language model}
-{parent=Open weight LLM model}
+{parent=List of GPT models}
+{tag=Open weight LLM model}
 {tag=Software developed by Facebook}
 {wiki}

@@ -1437,21 +1439,65 @@ Homepage: https://www.llama.com/
 = Llama
 {synonym}

-= Llama2
+= Llama 2
 {parent=Llama (language model)}
 {title2=2023}

+= Llama2
+{synonym}
+
 Page: https://www.llama.com/llama2/

+= Llama 2 7B
+{parent=Llama 2}
+
 = Llama2 7B
-{parent=Llama2}
+{synonym}

-= Llama3
+= Llama 3
 {parent=Llama (language model)}
 {title2=2024}

+= Llama3
+{synonym}
+
 https://www.llama.com/models/llama-3/

+= Llama 3.1
+{c}
+{parent=Llama 3}
+
+= Llama 3.1 8B
+{parent=Llama 3.1}
+
+= Llama 3.1 70B
+{c}
+{parent=Llama 3.1}
+
+= Llama 3.1 405B
+{c}
+{parent=Llama 3.1}
+
+= Open source LLM
+{parent=Large language model}
+{tag=Open source software}
+
+= LLM model with open training data
+{c}
+{parent=Open source LLM}
+
+= The Pile
+{disambiguate=dataset}
+{parent=LLM model with open training data}
+{wiki}
+
+= LLM360
+{parent=LLM model with open training data}
+
+= Open weight LLM model
+{c}
+{parent=Open source LLM}
+
 = Ollama
 {c}
 {parent=Open source LLM}

cia-2010-covert-communication-websites.bigb

Lines changed: 19 additions & 5 deletions
@@ -517,15 +517,19 @@ It is worth noting that democracies represent just a small minority of the websi
 <Snowden>'s 2013 revelations particularly shocked USA "allies" with the fact that they were being spied upon, and as of the 2020's, everybody knows this and has "stopped caring", and or moved to <end-to-end encryption> by default. This is beautifully illustrated in the <2016 film "Snowden"> when <Snowden> talks about his time in <Japan> working for <Dell> as an undercover <NSA> operative:
 > <NSA> wanted to impress the <Japanese>. Show them our reach. They loved the live video from drones. This is <Pakistan> right now \[video shows American agents demonstrating drone footage to Japanese officials\]. They were not as excited about that we wanted their help to spy on the Japanese population. They said it was against their laws.

-We bugged the country anyway, of course.
+Of course we tapped the entire country anyway.

-And we did not stop there. Once we had their communications we continued with the physical infrastructure. We sneaked into small programs in their power grids, dams, hospitals. The idea was that if Japan one day was not our allies we could turn off the lights.
+And we did not stop there. Once we owned their communications systems, we started going after the physical infrastructure.
+
+We'd slip these little sleeper programs into power grids, dams, hospitals. The idea was that if the day came when Japan was no longer an ally, it would be "lights out".

-And it was not just Japan. We planted software in <Mexico>, <Germany>, <Brazil>, Austria.
+And it wasn't just the Japanese. We were planting malware in <Mexico>, <Germany>, <Brazil>, Austria.

-<China>, I can understand. Or <Russia> or <Iran>. Venezuela, okay.
+I mean, <China>, I can understand. <Russia>. <Iran>. Venezuela, okay.

-But Austria?! \[shows footage of <cow> on an idyllic <Alpine> mountain grazing field, suggesting that there is nothing in Austria to spy on\]
+But Austria?!
+
+\[shows footage of <cow> on an idyllic <Alpine> mountain grazing field, suggesting that there is nothing in Austria to spy on\]

 \Video[https://www.youtube.com/watch?v=thqdMjKUjWI]
 {title=But Austria?! scene from <Snowden (2016)>}
@@ -9081,6 +9085,7 @@ Reactions by others:
 {description=2025-06-26. 3M subs.}
 * other voice media:
   * https://www.smashingsecurity.com/419-star-wars-the-cia-and-a-whatsapp-malware-mirage/
+  * Meneame, a Spanish Reddit: 2025-05-27 https://www.meneame.net/m/tecnolog%C3%ADa/increible-web-star-wars-uso-cia-espiar-espana-mexico-otros/standard
 Other media that picked it up:
 * "mainstream":
   * https://www.dailymail.co.uk/news/article-14752155/CIA-fake-websites-Star-Wars-communicate-spies.html Also announcing that:
@@ -9109,6 +9114,15 @@ Reactions by others:
 * https://www.darkhorizons.com/how-u-s-spies-used-a-star-wars-fan-page/
 * https://gigazine.net/news/20250527-starwars-fan-sites-made-by-cia/
 Starting on that same day someone made starwarsweb.net redirect to cia.gov at 2025-05-26T13:28:02Z: https://www.whois.com/whois/starwarsweb.net
+* 2025-08-01 saw another mini-trend due to "The CIA Built Hundreds of Covert Websites" by Alan Macleod: https://www.mintpressnews.com/cia-secret-network-885-fake-websites/290325/
+  This then spawned some syndicated posts:
+  * https://www.sott.net/article/500997-The-CIA-built-hundreds-of-covert-websites-Heres-what-they-were-hiding
+  * https://scheerpost.com/2025/08/02/the-cia-built-hundreds-of-covert-websites-heres-what-they-were-hiding/
+  * 2025-08-02 https://alethonews.com/2025/08/02/the-cia-built-hundreds-of-covert-websites-heres-what-they-were-hiding/ (Greek)
+  * 2025-08-02 https://popularresistance.org/the-cia-built-hundreds-of-covert-websites/
+  * 2025-08-04 https://cz24.news/alan-macleod-cia-vytvorila-stovky-tajnych-webu-globalni-spionazni-terminaly-co-vlastne-skryvaly/ (Czech)
+  and forum threads:
+  * 2025-08-04 https://www.meneame.net/story/cia-creo-cientos-sitios-web-encubiertos-esto-ocultaban-eng/c01#c-1

 Notable reactions to the websites themselves:
 * 2022-09-29 https://www.reddit.com/r/soccer/comments/xrgua4/the_cia_used_a_message_board_on_a_fake_soccer/ "The CIA used a message board on a fake soccer website called "Iraniangoals.com" to communicate with Iranian spies, dozens of whom were arrested after the website was discovered." by user Carlos-Dangerzone

ciro-santilli.bigb

Lines changed: 1 addition & 1 deletion
@@ -1055,7 +1055,7 @@ Previously, updates were being done with more focus to <sponsor>[sponsors] in th
 > I'm a internal / external documentation master as you've never seen before.
 * Random deeptech:
   * <Oxford Instruments> 2025-07-21 https://jobs.oxinst.com/job/Oxford-Senior-Software-Engineer/819253802/ 70k salary. Cover: I just want to do something sciency. Rejected one day later.
-  * <Proxima Fusion> 2025-07-15 Research Software Engineer https://jobs.lever.co/proximafusion/23aab9a8-34ec-40d2-bb14-440f1130021c Got screening interview.
+  * <Proxima Fusion> 2025-07-15 Research Software Engineer https://jobs.lever.co/proximafusion/23aab9a8-34ec-40d2-bb14-440f1130021c Got screening interview, but rejected afterwards on 2025-08-08 before technical.
 * biotech maybe
   * <DNAScript>: just one front-end job, I don't think I will...
 * <optical compute>

economy.bigb

Lines changed: 3 additions & 0 deletions
@@ -290,6 +290,9 @@ Only being. Being, in the exact fraction of a moment where bid meets ask.
 {parent=Financial crime}
 {wiki}

+= Launder the money
+{synonym}
+
 = Know your customer
 {parent=Money laundering}
 {wiki}

electronics.bigb

Lines changed: 2 additions & 0 deletions
@@ -674,6 +674,8 @@ It resists to change in <electric current>. Well seen at: <video LC circuit by E
 {parent=Resistor}
 {wiki}

+\Image[https://upload.wikimedia.org/wikipedia/commons/3/3b/NTC_bead.jpg]
+
 = Potentiometer
 {parent=Resistor}
 {wiki}

llm_count_mults.py

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
#!/usr/bin/env python

def metrics(
    # Number of layers
    L,
    # Embedding dimension
    d_model,
    # Dimension of hidden layer of fully connected layer
    d_ff,
    # Number of heads
    h,
    # Dimension of K and Q
    d_head,
    # Context length
    n_ctx,
    vocab_size,
    # Grouped query attention. TODO implement.
    kv_heads=None,
):
    return {
        'mults_per_token':
            # My limited brain
            L * (
                h * (
                    # 1x K, 1x Q, and 2x for V rank decomposed
                    4 * d_model * d_head +
                    # Right-most column of KQ product
                    n_ctx * d_head +
                    # All values times newly calculated right-most column
                    n_ctx * d_model
                ) +
                # MLP for latest token only
                2 * d_model * d_ff
            ) +
            # Output projection
            d_model * vocab_size

            ## ChatGPT
            #(
            #    L * (
            #        4 * d_model**2 +
            #        h * d_head * n_ctx +
            #        # MLP for latest token only
            #        2 * d_model * d_ff
            #    ) +
            #    # Output projection
            #    d_model * vocab_size
            #)
        ,

        # I think that with KV caching we are basically just doing matrix-vector multiplication.
        # So the number of params equals the number of FLOPs for the most part, and it is memory
        # bottle-necked, unless we do some query batching.
        'n_params': (
            L * (
                h * (
                    # 1x K, 1x Q, and 2x for V rank decomposed
                    4 * d_model * d_head
                ) +
                # Fully connected layer, rank decomposed
                2 * d_ff * d_model
            ) +
            # Output projection
            d_model * vocab_size
        )
    }

# https://stackoverflow.com/questions/579310/formatting-long-numbers-as-strings
def human_format(num):
    num = float('{:.3g}'.format(num))
    magnitude = 0
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    return '{} {}'.format('{:f}'.format(num).rstrip('0').rstrip('.'), ['', 'K', 'M', 'G', 'T'][magnitude])

models = {
    'gpt-2': {
        "L": 12,
        "d_model": 768,
        "d_ff": 3072,
        "h": 12,
        "d_head": 64,
        "n_ctx": 1024,
        "vocab_size": 50257,
    },
    'gpt-3': {
        "L": 96,
        "d_model": 12288,
        "d_ff": 49152,
        "h": 96,
        "d_head": 128,
        "n_ctx": 2048,
        "vocab_size": 50257,
    },
    # https://arxiv.org/pdf/2407.21783
    'llama-3-1-70b': {
        "L": 80,
        "d_model": 8192,
        "d_ff": 28672,
        "h": 64,
        # TODO source
        "d_head": 128,
        "kv_heads": 8,
        "n_ctx": 8192,
        "vocab_size": 128000,
    },
    #'deepseek-v2-67b': {
    #    "L": 80,
    #    "d_model": 8192,
    #    "d_ff": 28672,
    #    "h": 64,
    #    "n_ctx": 8192,
    #    "vocab_size": 130000,
    #},
}
for name, params in models.items():
    res = metrics(**params)
    print(name)
    print(f"mults_per_token: {res['mults_per_token']:,} (~{human_format(res['mults_per_token'])})")
    print(f"n_params: {res['n_params']:,} (~{human_format(res['n_params'])})")
    print()
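
The kv_heads argument is marked TODO above. As a hedged sketch of what grouped-query attention would change (my illustration with a hypothetical helper, not the author's planned implementation): only kv_heads of the h heads get their own K and V projections, which the query heads share in groups, so the K/V weights and the KV cache shrink by a factor of h / kv_heads:

``
def kv_proj_params(d_model, d_head, h, kv_heads=None):
    # Hypothetical helper: parameter count of the Q/K/V projections of one layer.
    # Queries keep one projection per head; K and V only have kv_heads projections.
    if kv_heads is None:
        kv_heads = h
    q_params = h * d_model * d_head
    kv_params = 2 * kv_heads * d_model * d_head
    return q_params + kv_params

# Llama 3.1 70B-style numbers from the table above: 64 heads, 8 KV heads.
print(kv_proj_params(8192, 128, 64, 8))  # grouped-query attention
print(kv_proj_params(8192, 128, 64))     # full multi-head attention, for comparison
``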
