# Misc 2 - Musical Talent

> Can you appreciate the hidden message in my song? This is only for the musically & mathematically talented.
> 
> - Junhua
>
> Files: [result.mid](result.mid)

Alright, an outdated MID file. My laptop doesn't play it, so let's try some random [online MIDI editor](https://signal.vercel.app/).

![](signal1.png)

From looking around a bit, it just plays a single piano note at a time, for some fixed amount of time (something like 2 seconds?).

Let's just look inside a hex editor to see if we're missing anything else:

![](hex1.png)

Nope, it's the same pattern all the way down, and the only thing that changes is the note value. Each value appears twice per row, presumably once in the MIDI NOTE ON message and once in the matching NOTE OFF.
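
For reference, this is roughly what a NOTE ON / NOTE OFF pair for a single note looks like as raw bytes. It is a minimal sketch of the standard MIDI channel messages, not a dump of `result.mid`: the channel (0) and velocity (64) are assumed values, and the helper names are made up for illustration.

```python
def note_on(note: int, velocity: int = 64, channel: int = 0) -> bytes:
    """Build a NOTE ON message: status 0x9n, note number, velocity."""
    return bytes([0x90 | channel, note, velocity])


def note_off(note: int, velocity: int = 0, channel: int = 0) -> bytes:
    """Build a NOTE OFF message: status 0x8n, note number, release velocity."""
    return bytes([0x80 | channel, note, velocity])


# The same note number sits in the second byte of both messages, which would
# explain why each value shows up twice in the hex dump.
on, off = note_on(77), note_off(77)
print(on.hex(" "), "|", off.hex(" "))
```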

Anyhow, let's start by collecting some statistics on these.

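```python
# Start at byte 57 and take every 23rd byte: one note value per repetition of the pattern.
notes = open('result.mid', 'rb').read()[57::23]
len(notes), list(notes[:10])
```

```
(26583, [77, 67, 12, 52, 12, 76, 69, 41, 12, 76])
```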

OK, we have 26.6k bytes of actual useful information, which is way more than a flag needs. A quick visual check also confirms that the first 10 notes roughly correspond to the MIDI editor screenshot.
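
The slice `[57::23]` reads one byte every 23 bytes, starting at offset 57. Here is a small sketch of the same idea on synthetic data. The buffer is made up (a zero-filled header followed by fixed-size records with the note in the first byte); only the offset and the stride come from the cell above.

```python
HEADER_LEN = 57   # offset of the first note byte (taken from the slice above)
RECORD_LEN = 23   # distance between consecutive note bytes (also from the slice)


def fake_midi(notes):
    """Build a synthetic buffer: filler header, then one fixed-size record per note."""
    header = bytes(HEADER_LEN)
    records = b"".join(bytes([n]) + bytes(RECORD_LEN - 1) for n in notes)
    return header + records


data = fake_midi([77, 67, 12, 52])
extracted = data[HEADER_LEN::RECORD_LEN]
print(list(extracted))  # [77, 67, 12, 52]
```

This only works because every record has the same length. A file with variable-length events would need a real MIDI parser instead.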

Now how about an actual frequency table?

```python
from collections import Counter
sorted(Counter(notes).items())
```

```
[(12, 4439),
 (13, 3),
 (14, 9),
 (16, 6),
 (17, 758),
 (19, 21),
 (21, 547),
 (22, 1),
 (23, 190),
 (24, 26),
 (25, 3),
 (26, 4),
 (28, 3),
 (29, 2496),
 (31, 186),
 (33, 24),
 (34, 3),
 (35, 277),
 (36, 266),
 (38, 11),
 (40, 8),
 (41, 730),
 (43, 987),
 (45, 1248),
 (47, 130),
 (48, 206),
 (50, 8),
 (52, 1837),
 (53, 439),
 (55, 633),
 (57, 1458),
 (59, 473),
 (60, 24),
 (62, 14),
 (64, 284),
 (65, 733),
 (67, 1381),
 (69, 1907),
 (71, 42),
 (72, 16),
 (74, 9),
 (76, 878),
 (77, 1665),
 (79, 1644),
 (81, 555),
 (83, 1)]
```

Ooh, a few patterns jump out immediately:
1. The value `12` appears very often (4439 of 26583 notes, about 17%).
2. There are gaps between the values that seem to repeat every 12. Let's see what it looks like mod 12.

```python
sorted(Counter(n%12 for n in notes).items())
```

```
[(0, 4977),
 (1, 6),
 (2, 55),
 (4, 3016),
 (5, 6821),
 (7, 4852),
 (9, 5739),
 (10, 4),
 (11, 1113)]
```

This is very telling. Ignoring the very rare values of 1 and 10, we get the well-known major scale: [0,2,4,5,7,9,11].

![](major_scale.gif)
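
The "ignore the rare ones" step can be checked mechanically. This is a small sketch using the residue counts from the table above; the cutoff of 10 for "very rare" is an assumption picked by eye, not something the data dictates.

```python
# Residue counts copied from the mod-12 table above.
residue_counts = {0: 4977, 1: 6, 2: 55, 4: 3016, 5: 6821,
                  7: 4852, 9: 5739, 10: 4, 11: 1113}

MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets of a major scale from its root
THRESHOLD = 10                        # assumed cutoff for "very rare"

# Keep only the residues that occur more often than the cutoff.
common = {r for r, count in residue_counts.items() if count > THRESHOLD}
print(sorted(common))
print(common == MAJOR_SCALE)
```

Since MIDI note numbers that are multiples of 12 are all C, residue 0 is C and the scale here is C major, i.e. the white keys.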

Looking back at the earlier (full) frequency table, we notice that the counts themselves follow this cycle as well: they peak around E-F-G and trough around B-C. Why is this? It's almost as if the note name is more significant than the octave.

Let's look again at our note value table, which goes from 12 to 83 and ignores the sharps:

![](note_values.png)

The working theory now is that instead of going by the MIDI order C1,D1,...,B1,C2,D2,...,B6, we want to read in the transposed order, i.e. C1,C2,...,C6,D1,D2,...,B6. This seems doable, though let's use `H` and `I` instead of `A` and `B`, so that the seven letters C..I are consecutive and easy to do arithmetic on.
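
To make the two orderings concrete, here is a short sketch that enumerates both. It assumes, as in the text, that the number is simply the MIDI value divided by 12 (so 1 to 6) and that A and B are renamed H and I; the variable names are made up.

```python
from itertools import product

LETTERS = "CDEFGHI"     # naturals, with A and B renamed to H and I
OCTAVES = range(1, 7)   # the number part, i.e. MIDI value // 12

# MIDI order: the octave changes slowest, the letter fastest.
midi_order = [f"{l}{o}" for o, l in product(OCTAVES, LETTERS)]
# Transposed order: the letter changes slowest, the octave fastest.
transposed = [f"{l}{o}" for l, o in product(LETTERS, OCTAVES)]

print(midi_order[:8])
print(transposed[:8])

# In transposed order the 1-based position is letter_index * 6 + octave,
# e.g. D1 sits at 1 * 6 + 1 = 7.
print(transposed.index("D1") + 1)
```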

```python
def val_to_C1(val):
    # Sharps are folded onto the natural below them; A and B are renamed H and I.
    letter = 'CCDDEFFGGHHI'[val % 12]
    number = val // 12
    return f'{letter}{number}'

sorted(Counter(val_to_C1(n) for n in notes).items())
```

```
[('C1', 4442),
 ('C2', 29),
 ('C3', 266),
 ('C4', 206),
 ('C5', 24),
 ('C6', 16),
 ('D1', 9),
 ('D2', 4),
 ('D3', 11),
 ('D4', 8),
 ('D5', 14),
 ('D6', 9),
 ('E1', 6),
 ('E2', 3),
 ('E3', 8),
 ('E4', 1837),
 ('E5', 284),
 ('E6', 878),
 ('F1', 758),
 ('F2', 2496),
 ('F3', 730),
 ('F4', 439),
 ('F5', 733),
 ('F6', 1665),
 ('G1', 21),
 ('G2', 186),
 ('G3', 987),
 ('G4', 633),
 ('G5', 1381),
 ('G6', 1644),
 ('H1', 548),
 ('H2', 27),
 ('H3', 1248),
 ('H4', 1458),
 ('H5', 1907),
 ('H6', 555),
 ('I1', 190),
 ('I2', 277),
 ('I3', 130),
 ('I4', 473),
 ('I5', 42),
 ('I6', 1)]
```

Immediately we spot that E4-I5 forms the normal [English letter frequency distribution](https://en.wikipedia.org/wiki/Letter_frequency). Maybe we can just offset it so that E4 takes ASCII 'A' or 'a' and work from there.
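
As a quick sanity check on that hunch, the sketch below tentatively pairs the 26 slots E4..I5 with a..z and ranks them by count. The counts are copied from the table above; the pairing itself is still only a guess at this point.

```python
import string

# Counts for E4..I5 in transposed order, copied from the table above.
counts = [1837, 284, 878, 758, 2496, 730, 439, 733, 1665, 21, 186, 987, 633,
          1381, 1644, 548, 27, 1248, 1458, 1907, 555, 190, 277, 130, 473, 42]

# Tentatively pair the 26 slots with a..z and sort by descending count.
by_letter = dict(zip(string.ascii_lowercase, counts))
ranked = sorted(by_letter, key=by_letter.get, reverse=True)

print("".join(ranked[:6]))   # most frequent letters
print("".join(ranked[-4:]))  # least frequent letters
```

English text usually has e, t, a, o, i, n near the top and j, q, x, z at the bottom, so a ranking that starts with `etaios` and ends with `xzqj` fits well.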

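```python
def val_to_ascii(val):
    letter = 'CCDDEFFGGHHI'[val % 12]
    number = val // 12
    # Position in transposed order is letter_index * 6 + number; +49 puts E4 (position 16) at 'A' (65).
    return chr((ord(letter) - ord('C')) * 6 + number + 49)

''.join(map(val_to_ascii, notes))
```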

'IN2A2CTF2CONTEXT42FORENSICS2CHALLENGES2CAN2INCLUDE2FILE2FORMAT2ANALYSIS42STEGANOGRAPHY42MEMORY2DUMP2ANALYSIS42OR2NETWORK2PACKET2CAPTURE2ANALYSIS52ANY2CHALLENGE2TO2EXAMINE2AND2PROCESS2A2HIDDEN2PIECE2OF2INFORMATION2OUT2OF2STATIC2DATA2FILES2AS2OPPOSED2TO2EXECUTABLE2PROGRAMS2OR2REMOTE2SERVERS2COULD2BE2CONSIDERED2A2FORENSICS2CHALLENGE2UNLESS2IT2INVOLVES2CRYPTOGRAPHY42IN2WHICH2CASE2IT2PROBABLY2BELONGS2IN2THE2CRYPTO2CATEGORY52FORENSICS2IS2A2BROAD2CTF2CATEGORY2THAT2DOES2NOT2MAP2WELL2TO2ANY2PARTICULAR2JOB2ROLE2IN2THE2SECURITY2INDUSTRY42ALTHOUGH2SOME2CHALLENGES2MODEL2THE2KINDS2OF2TASKS2SEEN2IN2INCIDENT2RESPONSE2IR52EVEN2IN2IR2WORK42COMPUTER2FORENSICS2IS2USUALLY2THE2DOMAIN2OF2LAW2ENFORCEMENT2SEEKING2EVIDENTIARY2DATA2AND2ATTRIBUTION42RATHER2THAN2THE2COMMERCIAL2INCIDENT2RESPONDER2WHO2MAY2JUST2BE2INTERESTED2IN2EXPELLING2AN2ATTACKER2ANDOR2RESTORING2SYSTEM2INTEGRITY52UNLIKE2MOST2CTF2FORENSICS2CHALLENGES42A2REAL2WORLD2COMPUTER2FORENSICS2TASK2WOULD2HARDLY2EVER2INVOLVE2UNRAVELING2A2SCHEME2OF2CLEVERLY2EN
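
To see the arithmetic at work without the full file, here is a sketch that decodes just the first ten note values printed earlier. The function `decode` is a hypothetical rewrite of `val_to_ascii` that uses a numeric lookup instead of letters; the arithmetic is meant to be the same.

```python
first_notes = [77, 67, 12, 52, 12, 76, 69, 41, 12, 76]  # from the first cell's output


def decode(val: int) -> str:
    """Same arithmetic as val_to_ascii, written with a natural-note index."""
    # C=0, D=1, ..., I=6; sharps are folded onto the natural below them.
    natural = [0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6][val % 12]
    octave = val // 12
    # 49 puts E4 (2 * 6 + 4 = 16) at 65, which is ASCII 'A'.
    return chr(natural * 6 + octave + 49)


# Show the intermediate values for the first three notes.
for val in first_notes[:3]:
    print(val, val % 12, val // 12, decode(val))

print("".join(decode(v) for v in first_notes))  # IN2A2CTF2C
```

For example, 77 is `77 % 12 = 5` (F, index 3) and `77 // 12 = 6`, giving `3 * 6 + 6 + 49 = 73`, which is `I`.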

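That looks good in general: the digits `2`, `4` and `5` are clearly standing in for space, comma and period. There's also a GREY in the middle, which marks the flag. Since the flag prefix should be lowercase, maybe the whole text is lowercase, so let's try to guess what the remaining symbols map to.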

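```python
def val_to_ascii2(val):
    letter = 'CCDDEFFGGHHI'[val % 12]
    number = val // 12
    tmp = (ord(letter) - ord('C')) * 6 + number  # 1-based position in transposed order
    # Guessed 42-symbol alphabet: C1 is space, E4..I5 are a..z, I6 is '{'.
    return " ',.0123456789_abcdefghijklmnopqrstuvwxyz{"[tmp-1]

''.join(map(val_to_ascii2, notes))
```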

"in a ctf context, forensics challenges can include file format analysis, steganography, memory dump analysis, or network packet capture analysis. any challenge to examine and process a hidden piece of information out of static data files as opposed to executable programs or remote servers could be considered a forensics challenge unless it involves cryptography, in which case it probably belongs in the crypto category. forensics is a broad ctf category that does not map well to any particular job role in the security industry, although some challenges model the kinds of tasks seen in incident response ir. even in ir work, computer forensics is usually the domain of law enforcement seeking evidentiary data and attribution, rather than the commercial incident responder who may just be interested in expelling an attacker andor restoring system integrity. unlike most ctf forensics challenges, a real world computer forensics task would hardly ever involve unraveling a scheme of cleverly en

The closing brace is missing from the decoded text, but the flag is `grey{y0u_h4v3_4_mus1c4l_t4l3nt_t00_c207584111e9a5bcd5d27813b719dee8}`.
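
Rather than scrolling through 26k characters, the flag can be pulled out with a regular expression. This is a minimal sketch: the `decoded` string is a hypothetical excerpt in which only the flag comes from this write-up and the surrounding words are made up.

```python
import re

# Hypothetical excerpt: only the flag body is real, the rest is filler.
decoded = ("some text before "
           "grey{y0u_h4v3_4_mus1c4l_t4l3nt_t00_c207584111e9a5bcd5d27813b719dee8"
           " some text after")

# The decoding alphabet has '{' but no '}', so match up to the first
# character that cannot be part of the flag body.
match = re.search(r"grey\{[0-9a-z_]+", decoded)
flag = match.group(0) + "}"  # restore the closing brace by hand
print(flag)
```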