Non-English characters

Jeremy Clark edited this page May 7, 2021 · 3 revisions

TypeRip can now scrape what is believed to be the entire character/glyph set for each font. This is possible because of the way the use.typekit.net API handles character set requests. A normal request looks like the following:

https://use.typekit.net/pf/tk/hdwv/n4/m?unicode=<unicode character set>&features=<font-features>&v=3&token=<request token>

The unicode parameter in the request tells the server which Unicode characters to include in the font being requested. Previous versions of the tool overlooked it because they used the similar primer parameter, which also appears to request a specific set of characters. By omitting primer and using only unicode in the URL, we are able to request a specific charset.

The next challenge was to figure out which value to set unicode to in order to get the full charset. Adobe serves a large font file on some of their pages, and its request URL includes the string unicode=AAAAAQAAAAEAAAAB&features=ALL&v=3. The resulting font file is over 18 MB in size, indicating that a very large number of characters were requested. When this string is attached to font request URLs, a much larger set of characters is included in every font, leading me to believe that this is very likely the entire character set. This works for all languages, including Japanese, which has been specifically problematic in the past.
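Attaching the string to a captured request URL can be sketched as follows. This is only an illustration, not TypeRip's actual code: the helper name with_full_charset is hypothetical, the example URL uses shortened placeholder values for unicode and token, and the assumption that every other parameter (such as token) should be passed through unchanged has not been verified against the server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query values taken from the string Adobe uses for its large font file.
FULL_CHARSET_PARAMS = {"unicode": "AAAAAQAAAAEAAAAB", "features": "ALL", "v": "3"}


def with_full_charset(font_url: str) -> str:
    """Return font_url with its charset parameters replaced by the full-charset string.

    Hypothetical helper: drops `primer`, overrides `unicode`, `features` and `v`,
    and keeps every other parameter (for example `token`) as it was.
    """
    parts = urlsplit(font_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FULL_CHARSET_PARAMS and key != "primer"
    ]
    query = urlencode(list(FULL_CHARSET_PARAMS.items()) + kept)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Placeholder unicode/token values, shortened for readability.
print(with_full_charset(
    "https://use.typekit.net/pf/tk/hdwv/n4/m?unicode=AAAH8Q&features=calt%2Cliga&v=3&token=abc%2B123"
))
```

Parsing and re-encoding the query, rather than editing the string by hand, keeps percent-encoded characters in the token intact.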

As of writing, I haven't been able to decode what AAAAAQAAAAEAAAAB actually means, but in testing I haven't found any missing glyphs. Further testing and research need to be performed to determine:

  1. If the charsets are in fact complete, and
  2. What the AAAAAQAAAAEAAAAB string actually means.
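One possible starting point for the second question: the string happens to be well-formed URL-safe base64 (the longer unicode values further down use the - and _ characters of that alphabet). The sketch below only decodes the bytes; the helper name decode_unicode_param is hypothetical, and reading the result as three big-endian 32-bit integers is an assumption about the layout, not a confirmed format.

```python
import base64
import struct


def decode_unicode_param(value: str) -> bytes:
    """Decode a `unicode` query value as URL-safe base64, adding any missing padding."""
    padding = "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(value + padding)


raw = decode_unicode_param("AAAAAQAAAAEAAAAB")
print(raw.hex(" ", 4))            # 00000001 00000001 00000001
# Assumption: treat the 12 bytes as three big-endian unsigned 32-bit integers.
print(struct.unpack(">3I", raw))  # (1, 1, 1)
```

What the three values signify (a count, a range, a flag, or something else) is still unknown; the decode only narrows down what to look for.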

The text below is now deprecated, as the issue has been resolved. It is left here only for archival purposes.

As of the time of writing, TypeRip is only able to rip a basic English character set. The reason is that the Adobe Fonts preview page loads font glyphs dynamically based on the Sample Text input. For example, viewing the network requests from the page https://fonts.adobe.com/fonts/mvb-verdigris-pro reveals the following:

The browser loads the six basic English font family OTF preview files. (These are what TypeRip currently scrapes, converts and serves.)

  1. https://use.typekit.net/pf/tk/hdwv/n4/m?unicode=AAAH8QAAAAfaCGXr3N2vdS239zd4rBQQtk-VoUBHvxnemC4kVM5aXCFLvdJp8X_mNf9ABgj_NJLd51f255Dnqpc33RSejXuTJ6YUMAzZpNNz-so9tCWEUhBMM_1k9UsSgUe05OX32n4RZX7O2YfiKRrCpUpyGNbyPURxRDm8D7SalEDyI-EmAdvFF67v687d0V6JaT67APrG_XBJCwMRJWwxHmEPg9Cuh4umwf_R4WPl0DAa2sdwrlgu2cobgxC6IYHcuoHdexskDjnsdLVOWr_bEpidj9jAYSfrpY2HwPDD-cHazrvMnTx6xX_hDt92tukupjhBx52_TlvvyIWWSm2mEuoAAcuh&features=calt%2Cclig%2Cliga%2Clocl%2Crlig&v=3&token=UWiewzNGqfaXgl50sfTX597VqiNVO%2BVsJvzyIM5SabTjZinSwqaidRKypEX3LMfHdBUibNmD%2BPCRx%2F5yCXr4NA%3D%3D
  2. etc, etc...

Additionally, the browser loads three files with the HTTP header response type x-typekit-augmentation. These files are much larger than the basic font files. They appear to be OTF files, but attempts to open them in OTF viewers have failed:

  1. https://use.typekit.net/af/83922b/00000000000000003b9b39a8/27/m?unicode=AAAAtwAAAAckFn6lptijy0sZnb9Yh-oAdY7aewBG5qc&gdyn=eJztWj-PHTUQt_eWuwWdyIIorojAiuhoTlBwSEg4VBQUSHyBFy5AEi7kFA4CDViIIkUKSrrkI_ANeIDE1yASHVBQBhFxPK_Xu_PPXu8RIYr9JbfP9oxnxuPx2M_7VKWU0krVb9e_PffXKYDSjfIwqhAVbXCg_KZ6Y0e5wFLL_dvu2aC2htED9lTrVKs70yNBGzeoxSpMxuogrdP1avdhu-f9i0jfBt4nO3_ffe3309NbP__ZO0lPSKbYGkp1sN1jPaqdwm4suJmKczBjsY5li1lcsLXxei-Hpio0cFi5pchVozw5SLhsDBSDRqVrHuscUXXh1eiglenV1B4j-oIbNWLk3yWUVcXGnxSegcbDGioVZuIMJPIj08jq_EOaoYYV-lpN9QYkUsEZmPpF7PFt93Sbv9slHdFozRT3elzCqyyj16_2UVPZQKIiMDNFHfcUTguh_xZk-SPXf3tahcNVGreeY6tow1jJzeeTQ91mMZlAYrF1cCgHOf8QhuDHIMDCyp7EwXCcoR2gGtAoWVSI8-cyxP0MTc3aUJ4mPbVcWeOMLzpVwpPMmiCo7QpG7LMVk3nZxMwCSGXAh06aqU38qh1ZynneVHZ4yAX0PJQuIZMmTQRRCi451HXOCdpNi16FD8Hrq-nOKWysmtHbnF3RI0Yu4TCY_tPP6YrQkkloy8Ga2xoCQlx4Z49dSZwB5bb7X7Iv3sdVl-P1w47bYC5fwa1yRYkgomHchwSwTurtkV-kJksVkVyycZvT6ykRE0lVXMLbeINbF6V_4x8uz7MuEZSFXyWlu5HpnivaLGRzAVXYzGSklph5hBkf4WCaRX0K1gtLJmV7K3UNC2jg-oF3O1agV-4U6Rv6jxiDjQShDm3ZgbjEBv5okI6HLCaPCv2Q1rlDPMoEMMgK0mh-4z_LseCBf8SZ2FIgLo4LL0MI_NCdRBBPH49ika2lxllf8jz-tSXWP3zUZg9TajQNm9gUffVLzEn5iaMzzgdKaeqN65AOKhprUh3dxEG39BwcMDir02cRLWmCBLfDA5HNfWhYd0_rJGWpNH6AhVnOsWey9snIXhlkkAnF--Fjb-pWqyQqZ6J86tlaWsGKRRnPItmOybIW1jwdz7th02VJnaK_tZdQq9Ra5f6GdXxX1hJqreJK8GoNoObTl2NcjpbGBg8bWWvBXqXomkN3D2StDmjmXdM7BVOkDmZgK9uOgNvUrFtaK2md6jL_Fjg5O0RUdmE8QcmPo1qNJQwq9ahm9lWHY0TTO917TvBD7GCLNAVrTazlzBuNMqBVnoncmy2mY_KgQFeAheIYc2MSk9iqpoY1CS7OKLJSw88WVgSHOf_wzmoEFclcdVsyiAWthc6qB20-KUvj-fzZX_Udr_Lh-DrzC0F5FlXCFhAGVkVTmolDHCYnljFrrmxQlxZuaQPK4mc8WZqxKEaLk7g3jVVY8CRNXUTcNq8a6kPz07-Y7Oj7hF2U-eD07g8_PjyFr7R_Of0ur30WSPhQI0rILtZWwOl7jA2CxQiboU6Q2e0v6eNLcd_KF-F4uNuEygs_ncLlojPOlcCDGobfU54eTEkuvM4-h5ocoyuJnjorEDR9n9Z2BatmLZFe10pFp8ivfInEeV82hiNvi7PvlBQXC2d5TyzBDKXhQEb83_RPE3_WgbdUyghguTAJMUzkIWX7V4my4MdKZgy4sBlQZwb_BYAmxpn8z1oIojQ0NlcTLbOvEiyRKLnQ8KIjYlwUFqGV6PAW0gdUVklxULIQ8gMOMt9SnYEXEkxWtAlIp-1NoswQByAbuTFJjetgGKyReOWkE1lr2CR5DXd33KKvRPlcrxv1DCPXQxOsUeogrB550yD0NkuNtrFvjA5wO1lRS8701eYf0l
b52qxlVSN9DokzxEFrROJwUmNSp4QqWy0SRuqt410MqgkMRMREQP9XqLq_YXxwoLa35fuuMra7jDiBTSdWJIwDTm-7NqwqGDfmm0itehEjtyWcVEYKhrV4Sc6yZkcbUEbsp5aOynKT8ju65oq9kEE5H4xOfhnoMO89Ge8f3mTLV-JquKWGgzIqREDXkf4Khsmxw0cFhRSeB3POdIkO90hTgzTmjj5q4py_29_4-R7Jq8lm6vQnUHOA7Bl3pMwZpLDIa4eH92UuiuTzQJvIAs0oNlyEQOLreTMn9iQ3CM3Ypfrvego5xcSqA14cJLUVl-icyiBLpIbVmYNx4aVgrkNviwkf4FYSH_W4u2pkE49c_gPcejrbJpa2HQ1icJm6w2ZrTLVRclBbjVSTNLJDw7gaWsKK42_CrUpMfvobVmNlpqk3dBQNjOi4OzL3tHym3Qwl03ctvMsc-QEmRaCicqvjHiV_jWoWSxhUgmNI2QQYUJbud4LTN55zQufYofC4rYPIafOAUdkXBh42o4zpSDEjBqQnDk2aK-uCgyTt6BQm83zTfyIrW_SpQSW1HxmFEz-VRGB91uHchrXUFnaK2lQfFGp9jhyIHwM2sXxDLHqe2wXNTS9Lw5mhhWPJM36ZFNPBZWrp5rYLiSqTgy2tw-Vxxts7MGBRgjub2AULFixYsGDBggULFixYsGDBggULFixY8P_CM5cuHZ3ow-5xeP1YXz66-r5-79bVy_rK1Y9O9BVfunb88sHmcfDS5vHKvt5wXNJH167f0Ec3Do_0h0eHH-hjz3fz43c-0yee8Mm7N0_0J13p5smL-h-60g68&v=3
  2. etc, etc...

When text containing non-English characters is entered into the Sample Text box on the Adobe Fonts preview page, for example the character ñ, the page requests three new x-typekit-augmentation files. Again, these appear to be OTF files, but opening them in the usual way does not work:

  1. https://use.typekit.net/af/83922b/00000000000000003b9b39a8/27/m?primer=7946fcb094cf66847997f3cf3b82042f17f182120adff976378a185e1cb56fec
  2. https://use.typekit.net/af/d0f941/00000000000000003b9b39a5/27/m?primer=095060056b103f478d97f1273b02cbda0ea3cd9536b31c1fbdab05d3c9997f42
  3. https://use.typekit.net/af/3ea9c2/00000000000000003b9b39a7/27/m?primer=755e96339fe1ac8bf86a81020a54f0c21cda5beeffb61a842aab0fd8200a9664
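When going through captured network requests, it can help to see at a glance which parameter each one carries. The following is a minimal sketch; the helper name selection_params is hypothetical, and the list of parameter names is simply those seen in the requests on this page (what gdyn does is not known).

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names observed in the requests listed on this page.
OBSERVED_PARAMS = ("unicode", "primer", "gdyn")


def selection_params(font_url: str) -> dict:
    """Return the observed parameters present in font_url, mapped to their first value."""
    query = parse_qs(urlsplit(font_url).query)
    return {name: query[name][0] for name in OBSERVED_PARAMS if name in query}


url = (
    "https://use.typekit.net/af/83922b/00000000000000003b9b39a8/27/m"
    "?primer=7946fcb094cf66847997f3cf3b82042f17f182120adff976378a185e1cb56fec"
)
print(selection_params(url))
```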

More reverse-engineering is needed to determine exactly what these files are and how Adobe uses them to create regular font glyphs in the browser.
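A simple first step in that direction is to check the first four bytes of a downloaded file against the well-known font container signatures. The sketch below is an assumption about how one might begin, not a finding: the helper name sniff_font_container is hypothetical, and an "unknown" result would only suggest that the augmentation files use some other wrapper or encoding.

```python
# Leading four-byte signatures of common font containers.
SIGNATURES = {
    b"OTTO": "OpenType (CFF outlines)",
    b"\x00\x01\x00\x00": "TrueType / OpenType (TrueType outlines)",
    b"true": "TrueType (Apple)",
    b"ttcf": "TrueType/OpenType collection",
    b"wOFF": "WOFF",
    b"wOF2": "WOFF2",
}


def sniff_font_container(data: bytes) -> str:
    """Guess the font container from the first four bytes of data."""
    return SIGNATURES.get(data[:4], "unknown")


# In-memory stand-ins for downloaded files; real use would read the saved response body.
print(sniff_font_container(b"OTTO" + bytes(8)))
print(sniff_font_container(b"\x1f\x8b\x08\x00"))
```

If a file reports as "unknown", dumping its first few dozen bytes in hex would be the natural next thing to inspect.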