bikashpadhikari/nepali-brihat-sabdakosh-json

Structured JSON of Nepali Brihat Sabdakosh

This repository contains a structured JSON dump of all 122,000 words of the Nepali Brihat Sabdakosh (नेपाली बृहत् शब्दकोश, Unabridged Nepali Dictionary) published by Nepal Academy. It also contains the tools necessary to generate the JSON.

Data Source

Data was extracted from the version 19 APK of the np.com.naya.sabdakosh Android app. The app uses an embedded Realm database encrypted with the following key:

7feb8c9cd654567106e867956802fd25609d58624539e10f02eaf6aef5facda9d5b9f5024fe4234c1f08c01ed875976719369dfa94b645a1212fdd968e00b6f3

You'll have to use Realm Studio version 13 to open the Realm database, which is embedded in the app under assets/db.realm; newer versions aren't backward compatible with the Realm version the app uses. extract.go assumes that its input was exported using Realm Studio's "export to JSON" feature.

Each meaning in the Realm database is additionally encrypted with AES-256-CBC using the passphrase 058aa5325d7d2e7. You can decrypt individual rows with a command like the following:

echo "U2FsdGVkX19pmyuNuE6X1Cne+Qc2mEhxBXrawcMdh/tkkZnj7Dj2Z0HYGPCQl27RM30pTEvYM6VuAK/WZtlJh07YkLaM6CRJI6XjrL4egaHF3ijpm/kuyT7hzQjHOU2gRtJNLCFXTbLP/RHUPj1+sHNylAmsbnI8zHSO7CPU61A=" | openssl enc -aes-256-cbc -d -a -A -md md5 -pass pass:058aa5325d7d2e7
<span▥>चिकामारी</span><br/><br/><a◳>ना.</a><p▦>चिकीखेल।</p> 
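For readers who want to do the same outside the shell, here is a minimal Python sketch of the key derivation that `openssl enc -md md5 -pass` performs on a salted, base64-encoded row. It is not taken from extract.go; the function names `split_salted` and `evp_bytes_to_key` are made up for illustration. It only recovers the salt, key, and IV using the standard library. The AES-256-CBC decryption itself (plus PKCS#7 unpadding) would need a third-party library, which is left out here.

```python
import base64
import hashlib


def split_salted(b64_text: str):
    """Split an OpenSSL 'Salted__' blob into (salt, ciphertext)."""
    raw = base64.b64decode(b64_text)
    if raw[:8] != b"Salted__":
        raise ValueError("missing OpenSSL 'Salted__' header")
    # Bytes 8..15 are the salt; everything after is the AES ciphertext.
    return raw[8:16], raw[16:]


def evp_bytes_to_key(passphrase: bytes, salt: bytes, key_len: int = 32, iv_len: int = 16):
    """Derive (key, iv) as OpenSSL's EVP_BytesToKey does with MD5 and one iteration."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        # Each round hashes the previous digest + passphrase + salt.
        block = hashlib.md5(block + passphrase + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]
```

Given a row from the database, `salt, ciphertext = split_salted(row)` followed by `key, iv = evp_bytes_to_key(b"058aa5325d7d2e7", salt)` yields the inputs an AES-256-CBC implementation needs.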

Data Schema

Each entry is a JSON object. For example, for अ, the object is:

{
  "word": "अ",
  "definitions": [
    {
      "grammar": "ना.",
      "senses": [
        "१. देवनागरी वर्णमालाको स्वर वर्णमध्ये पहिलो स्वर वर्ण; परम्परागत रूपमा कण्ठस्थानबाट उच्चारण हुने ह्रस्व स्वर वर्ण र भाषाविज्ञानअनुसार आधा खुला; केन्द्रीय स्वर वर्ण; लेख्य रूपमा सो स्वर वर्णको प्रतिनिधित्व गर्ने लिपिचिह्न।",
        "२. लेखाइका क्रममा विषयको विभाजन उपविभाजनका निम्ति स्वर वर्णको प्रयोग गरिँदा दिइने क्रमबोधक पहिलो चिह्न।"
      ]
    },
    {
      "grammar": "ना.",
      "etymology": "[सं.]",
      "senses": [
        "१. संस्कृत एकाक्षरी कोशअनुसार मूलतः विष्णुलाई जनाउने मङ्गलवाची शब्द।",
        "२. ॐ भित्र निहित अ+उ+म् तीन ध्वनिमध्ये विष्णुलाई बुझाउने पहिलो ध्वनि (उ तथा म् ध्वनि क्रमशः शिव तथा ब्रह्मालाई बुझाउने मानिन्छन्)।"
      ]
    },
    {
      "grammar": "नि.",
      "senses": [
        "झर्को, गाली, बेवास्ता, अस्वीकार आदि बुझाउन आवेगका अवस्थामा प्रयोग गरिने विस्मयादिबोधक शब्द; आ।"
      ]
    },
    {
      "grammar": "पूस.",
      "senses": [
        "शब्दका अगाडि लागेर अभाव, भिन्नता, विपरीतता आदि बुझाउने पूर्वसर्ग।"
      ]
    },
    {
      "grammar": "नि.",
      "senses": [
        "दिक्क लागेको अवस्थामा व्यक्त गरिने उपेक्षा भाव।"
      ]
    }
  ]
}

Each definition can have grammar, etymology, and senses fields. Each sense is an HTML string and may include examples wrapped in a <span class="example">.
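Because senses are HTML strings, consumers may want to pull the examples out separately. The following is a rough Python sketch using only the standard library; `ExampleExtractor` and `extract_examples` are hypothetical names, and the sketch assumes examples appear only inside `<span class="example">` elements as described above.

```python
from html.parser import HTMLParser


class ExampleExtractor(HTMLParser):
    """Collect the text content of every <span class="example"> element."""

    def __init__(self):
        super().__init__()
        self.examples = []
        self._depth = 0  # > 0 while inside an example span (tracks nested spans)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "span":
                self._depth += 1
        elif tag == "span" and "example" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                self.examples.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_examples(sense_html: str) -> list:
    """Return the example strings found in one sense."""
    parser = ExampleExtractor()
    parser.feed(sense_html)
    parser.close()
    return parser.examples
```

Calling `extract_examples(sense)` on each string in a definition's `senses` list returns an empty list for senses without examples.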

Example

$ curl --no-progress-meter 'https://raw.githubusercontent.com/bikashpadhikari/nepali-brihat-sabdakosh-json/main/sabdakosh.json.gz' | gunzip | jq '.[] | select(.word=="किरण")'
{
  "word": "किरण",
  "definitions": [
    {
      "grammar": "ना.",
      "etymology": "[सं.]",
      "senses": [
        "सूर्य, चन्द्र, बत्ती आदिबाट निस्किने चम्किलो मिहिन रेखा; प्रकाशबाट चारैतिर फिँजिने उज्यालो रूप; ज्योति; रश्मि; प्रभा; तेज।"
      ]
    }
  ]
}
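The same lookup can be done in Python once sabdakosh.json.gz has been downloaded. This is a small sketch, not part of the repository; `parse_dump` and `find_word` are illustrative names, and it assumes (as the `.[]` in the jq filter above suggests) that the file holds a single top-level JSON array of entries.

```python
import gzip
import json


def parse_dump(gz_bytes: bytes) -> list:
    """Decompress the gzipped dump and parse it into a list of entries."""
    return json.loads(gzip.decompress(gz_bytes).decode("utf-8"))


def find_word(entries: list, word: str) -> list:
    """Return every entry whose "word" field equals the given word."""
    return [entry for entry in entries if entry.get("word") == word]
```

For example, after `with open("sabdakosh.json.gz", "rb") as f: entries = parse_dump(f.read())`, calling `find_word(entries, "किरण")` returns the matching entries as Python dictionaries.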

License

All code is under the MIT license. I'm not sure what the license for the actual dictionary would be. Use at your own risk.
