# Surahs Dataset

**This notebook converts ```holy_quran.json``` into one pandas ```DataFrame()``` (DF) per surah and exports each DF to CSV.**

In this notebook, some variable names are *Arabic* words transliterated into *English*. Here's a glossary of those names:

- ```Surah```: Chapter (the Quran has a total of 114 surahs)
- ```Ayah```: Verse (plural: ayahs)
- ```Juz```: Part (the Quran is divided into 30 juz)

## Setup

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import mode

import json

# notebook configurations
pd.options.display.max_colwidth = 1000

import warnings
warnings.filterwarnings('ignore')

### 1. Read in ```holy_quran.json```

In [2]:
f = open("data/holy_quran.json", "r", encoding = "utf-8")

In [3]:
data = json.load(f)

In [4]:
data;
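
The cells above leave the file handle `f` open. As a minimal sketch, the same load can be written with a context manager, followed by a quick look at the nesting. The `sample` payload, the temporary file, and the placeholder texts are hypothetical stand-ins so the snippet runs on its own; only the key names (`data`, `surahs`, `ayahs`, `text`, `sajda`, `englishName`) are taken from this notebook.

```python
import json
import os
import tempfile

# Hypothetical miniature payload mirroring the keys this notebook reads
# from holy_quran.json; the ayah texts are placeholders, not real data.
sample = {
    "data": {
        "surahs": [
            {
                "englishName": "Example",
                "ayahs": [
                    {"text": "placeholder ayah 1", "sajda": False},
                    {"text": "placeholder ayah 2", "sajda": False},
                ],
            }
        ]
    }
}

# Write the sample to a temporary file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_quran.json")
with open(sample_path, "w", encoding="utf-8") as out:
    json.dump(sample, out, ensure_ascii=False)

# The context manager closes the file as soon as the block ends.
with open(sample_path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

# Inspect the nesting: top-level keys, number of surahs, ayahs in the first one.
surahs = loaded["data"]["surahs"]
top_level_keys = list(loaded.keys())
ayah_count = len(surahs[0]["ayahs"])
print(top_level_keys, len(surahs), ayah_count)
```

With the real file, replacing `sample_path` with `"data/holy_quran.json"` should give the same kind of overview without printing the whole document.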

### 2. Load the JSON file into a pandas object

In [5]:
df = pd.read_json("data/holy_quran.json")
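
Judging by how `df` is indexed later in this notebook, `pd.read_json` keeps the whole `surahs` list nested inside `df["data"]["surahs"]` rather than flattening it. As an alternative sketch (not what this notebook does), `pd.json_normalize` can flatten the nested ayahs into one row per ayah. The `sample` payload and its placeholder texts are hypothetical; only the key names come from this notebook.

```python
import pandas as pd

# Hypothetical miniature payload; texts and names are placeholders.
sample = {
    "data": {
        "surahs": [
            {
                "englishName": "Example-One",
                "ayahs": [
                    {"text": "placeholder 1", "sajda": False},
                    {"text": "placeholder 2", "sajda": False},
                ],
            },
            {
                "englishName": "Example-Two",
                "ayahs": [{"text": "placeholder 3", "sajda": False}],
            },
        ]
    }
}

# record_path picks the nested list to expand into rows;
# meta copies the parent surah's name onto each ayah row.
flat = pd.json_normalize(
    sample["data"]["surahs"],
    record_path="ayahs",
    meta=["englishName"],
)
print(flat)
```

A flat table like this could then be split per surah with `groupby("englishName")`, which is one possible design; the per-surah function used in this notebook works directly on the nested structure instead.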

## ```DataFrame()``` for each Surah

In [6]:
def surah_df(surah_no):
    """Build a DataFrame for one surah (1-based number) and export it to CSV."""
    # convert the 1-based surah number to a 0-based list index
    surah_no = int(surah_no) - 1
    surah = df["data"]["surahs"][surah_no]

    # ayah texts
    surah_ayahs_lst = [ayah["text"] for ayah in surah["ayahs"]]

    # whether there's a sajda in the respective ayah
    sajda_ayahs_lst = [ayah["sajda"] for ayah in surah["ayahs"]]

    # named `result` so the local variable does not shadow the function name
    result = pd.DataFrame({"ayahs": surah_ayahs_lst,
                           "ayah_no": list(range(1, len(surah_ayahs_lst) + 1)),
                           "sajda": sajda_ayahs_lst})

    # rename values
    result["sajda"] = result["sajda"].replace({True: "Yes", False: "No"})

    # export df to csv (the data/surahs/ directory must already exist)
    result.to_csv(f"data/surahs/{surah_no + 1}_{surah['englishName']}.csv", index = False)

    return result

In [7]:
for i in range(1, 115):
    surah_df(i)
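
`DataFrame.to_csv` does not create missing directories, so the loop above fails if `data/surahs/` is absent. The following is a minimal sketch of the same export pattern that creates the output directory first and reads one file back as a check. It runs on a hypothetical two-surah stand-in for `df["data"]["surahs"]` and writes to a temporary directory. It also assumes `sajda` is a plain boolean, which may not hold for every record in the real file.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for df["data"]["surahs"]; texts are placeholders.
surahs = [
    {"englishName": "Example-One",
     "ayahs": [{"text": "placeholder 1", "sajda": False},
               {"text": "placeholder 2", "sajda": False}]},
    {"englishName": "Example-Two",
     "ayahs": [{"text": "placeholder 3", "sajda": True}]},
]

# Create the output directory (and parents) before writing any CSV.
out_dir = Path(tempfile.mkdtemp()) / "data" / "surahs"
out_dir.mkdir(parents=True, exist_ok=True)

written = []
for surah_no, surah in enumerate(surahs, start=1):
    ayahs = surah["ayahs"]
    frame = pd.DataFrame({
        "ayahs": [a["text"] for a in ayahs],
        "ayah_no": list(range(1, len(ayahs) + 1)),
        # assumption: sajda is a boolean flag
        "sajda": ["Yes" if a["sajda"] else "No" for a in ayahs],
    })
    path = out_dir / f"{surah_no}_{surah['englishName']}.csv"
    frame.to_csv(path, index=False)
    written.append(path)

# Read the first file back to confirm its shape and columns.
check = pd.read_csv(written[0])
print(check.shape, list(check.columns))
```

In the notebook itself, the equivalent one-off step would be to call `Path("data/surahs").mkdir(parents=True, exist_ok=True)` once before the loop.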