# Quran Dataset

**This notebook converts ```holy_quran.json``` into pandas ```DataFrame()``` objects:**

1. One ```DataFrame()``` for the whole Quran
2. A separate ```DataFrame()``` for each Surah

In this notebook, some variable names are *Arabic* words transliterated into *English*. For readability, here's a glossary of those names:

- ```Surah```: Chapter (the Quran has a total of 114 surahs)
- ```Ayah```: Verse (plural: ayahs)
- ```Juz```: Part (the Quran is divided into 30 juz)

In [1]:
import pandas as pd
import json

### 1. Read in ```holy_quran.json```

In [2]:
f = open("data/holy_quran.json", "r", encoding = "utf-8")

In [3]:
data = json.load(f)
f.close()  # release the file handle once the JSON is parsed

In [4]:
data;
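
Because `data;` suppresses the cell output, the shape of the parsed JSON is never shown. One way to summarise it is sketched below. `describe` is a hypothetical helper, and `sample_json` is a made-up miniature that keeps only the keys this notebook reads, so the real file may well contain more.

```python
import json

# Hypothetical miniature of the file: only the keys used in this notebook.
sample_json = """
{"data": {"surahs": [
  {"name": "arabic-name-1", "englishName": "Al-Faatiha",
   "englishNameTranslation": "The Opening", "revelationType": "Meccan",
   "ayahs": [{"text": "ayah 1", "juz": 1}, {"text": "ayah 2", "juz": 1}]}
]}}
"""

def describe(obj, depth=0, max_depth=4):
    """Return indented lines naming the keys and list lengths of nested JSON."""
    pad = "  " * depth
    lines = []
    if depth >= max_depth:
        return lines  # stop before the output gets too deep to read
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only the first element is inspected; the rest are assumed similar.
        lines.append(f"{pad}[{len(obj)} items, first shown]")
        lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines

sample = json.loads(sample_json)
summary = describe(sample)
print("\n".join(summary))
```

In the notebook itself, `describe(data)` would be the equivalent call on the real file.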

### 2. Load the JSON file into a pandas object

In [5]:
df = pd.read_json("data/holy_quran.json")

## 1. One ```DataFrame()``` for the whole Quran
1. Create a list for each feature needed
2. Pass those lists as the column values of the DataFrame
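
The two steps above can also be collapsed into a single pass over the surahs. The following is a minimal sketch, not the notebook's own method: `surah_rows` is a hypothetical helper, and `sample` is a toy stand-in with made-up ayah counts that mirrors only the keys read in the cells below.

```python
import pandas as pd

# Hypothetical two-surah stand-in for the parsed file; ayah counts are toy values.
sample = {"data": {"surahs": [
    {"name": "arabic-name-1", "englishName": "Al-Faatiha",
     "englishNameTranslation": "The Opening", "revelationType": "Meccan",
     "ayahs": [{"text": "t", "juz": 1} for _ in range(7)]},
    {"name": "arabic-name-2", "englishName": "Al-Baqara",
     "englishNameTranslation": "The Cow", "revelationType": "Medinan",
     "ayahs": [{"text": "t", "juz": 1} for _ in range(3)]},
]}}

def surah_rows(surahs):
    """Build one dict per surah; each dict becomes one DataFrame row."""
    return [
        {
            "surah": s["name"],
            "surah_english": s["englishName"],
            "surah_english_translation": s["englishNameTranslation"],
            "revelation_type": s["revelationType"],
            "num_ayahs": len(s["ayahs"]),
        }
        for s in surahs
    ]

sketch_df = pd.DataFrame(surah_rows(sample["data"]["surahs"]))
print(sketch_df)
```

Iterating over the surahs directly avoids hard-coding the count of 114, so the same code keeps working if only part of the file is loaded.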

In [6]:
# Arabic name of every surah
surah_lst = [surah["name"] for surah in df["data"]["surahs"]]

In [7]:
# transliterated name of every surah
surah_english_lst = [surah["englishName"] for surah in df["data"]["surahs"]]

In [8]:
# English meaning of every surah name
surah_english_translation_lst = [surah["englishNameTranslation"] for surah in df["data"]["surahs"]]

In [9]:
# revelation type (Meccan or Medinan) of every surah
revelation_type_lst = [surah["revelationType"] for surah in df["data"]["surahs"]]

In [10]:
# number of ayahs in every surah
num_ayahs_lst = [len(surah["ayahs"]) for surah in df["data"]["surahs"]]

In [11]:
# flat list: one juz number per ayah across the whole Quran
juz_lst = [ayah["juz"]
           for surah in df["data"]["surahs"]
           for ayah in surah["ayahs"]]

# TODO: this list has one entry per ayah, so it is longer than the per-surah
# lists above; group it per surah (2-D list) before adding it to the DataFrame

In [12]:
quran_df = pd.DataFrame({
    "surah": surah_lst,
    "surah_english": surah_english_lst,
    "surah_english_translation": surah_english_translation_lst,
    "revelation_type": revelation_type_lst,
    "num_ayahs": num_ayahs_lst,
})

In [13]:
quran_df

| | surah | surah_english | surah_english_translation | revelation_type | num_ayahs |
|---|---|---|---|---|---|
| 0 | سُورَةُ ٱلْفَاتِحَةِ | Al-Faatiha | The Opening | Meccan | 7 |
| 1 | سُورَةُ البَقَرَةِ | Al-Baqara | The Cow | Medinan | 286 |
| 2 | سُورَةُ آلِ عِمۡرَانَ | Aal-i-Imraan | The Family of Imraan | Medinan | 200 |
| 3 | سُورَةُ النِّسَاءِ | An-Nisaa | The Women | Medinan | 176 |
| 4 | سُورَةُ المَائـِدَةِ | Al-Maaida | The Table | Medinan | 120 |
| ... | ... | ... | ... | ... | ... |
| 109 | سُورَةُ النَّصۡرِ | An-Nasr | Divine Support | Medinan | 3 |
| 110 | سُورَةُ المَسَدِ | Al-Masad | The Palm Fibre | Meccan | 5 |
| 111 | سُورَةُ الإِخۡلَاصِ | Al-Ikhlaas | Sincerity | Meccan | 4 |
| 112 | سُورَةُ الفَلَقِ | Al-Falaq | The Dawn | Meccan | 5 |


#### To-Do's:
- Compile a list of all of the verses for each surah
- Turn ```juz_lst``` into a 2-D list with one sub-list per surah, then take the mode of each sub-list (a new juz can start partway through a surah, which is why some surahs have several different juz numbers)
- Output to CSV
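
The juz item in this list could be approached roughly as follows. This is a sketch under assumptions: `juz_per_surah` is a hypothetical helper, the toy `surahs` list only mimics the `ayahs` and `juz` keys, and choosing the most frequent juz per surah is just one reading of "the mode".

```python
from collections import Counter

def juz_per_surah(surahs):
    """Return (2-D list of juz numbers per surah, most frequent juz per surah)."""
    # One sub-list per surah, one juz number per ayah.
    juz_2d = [[ayah["juz"] for ayah in s["ayahs"]] for s in surahs]
    # most_common(1) yields [(value, count)]; keep only the value.
    juz_mode = [Counter(row).most_common(1)[0][0] for row in juz_2d]
    return juz_2d, juz_mode

# Toy data: the second surah straddles two juz, as some real surahs do.
surahs = [
    {"ayahs": [{"juz": 1}, {"juz": 1}, {"juz": 2}]},
    {"ayahs": [{"juz": 2}, {"juz": 3}, {"juz": 3}, {"juz": 3}]},
]

juz_2d, juz_mode = juz_per_surah(surahs)
print(juz_2d)    # nested list, one row per surah
print(juz_mode)  # one juz number per surah
```

Since `juz_mode` has one value per surah, it has the same length as the other per-surah lists and could be added as a `"juz"` column. The mode discards the fact that a surah spans several juz, so keeping the first and last juz of each sub-list may be a better fit for some uses.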

In [14]:
# first surah only
verses = []
for ayah in df["data"]["surahs"][0]["ayahs"]:
    verses.append(ayah["text"])

In [15]:
verses

['\ufeffبِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ',
 'ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ',
 'ٱلرَّحْمَٰنِ ٱلرَّحِيمِ',
 'مَٰلِكِ يَوْمِ ٱلدِّينِ',
 'إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ',
 'ٱهْدِنَا ٱلصِّرَٰطَ ٱلْمُسْتَقِيمَ',
 'صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُوبِ عَلَيْهِمْ وَلَا ٱلضَّآلِّينَ']
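
The first verse in this output starts with `\ufeff`, a byte order mark that came along with the text. Extending the loop to every surah and stripping that character could look like the sketch below. `ayah_rows` is a hypothetical helper, the toy `surahs` data is invented, and ayah numbers are derived with `enumerate` because the notebook does not show whether the JSON stores them.

```python
import pandas as pd

def ayah_rows(surahs):
    """Build one dict per ayah (long format), numbering surahs and ayahs from 1."""
    rows = []
    for surah_number, s in enumerate(surahs, start=1):
        for ayah_number, ayah in enumerate(s["ayahs"], start=1):
            rows.append({
                "surah_number": surah_number,
                "surah_english": s["englishName"],
                "ayah_number": ayah_number,
                "juz": ayah["juz"],
                # Drop a leading byte order mark if the text carries one.
                "text": ayah["text"].lstrip("\ufeff"),
            })
    return rows

# Toy data with placeholder verse text; the first verse has a leading BOM.
surahs = [
    {"englishName": "Al-Faatiha",
     "ayahs": [{"text": "\ufefffirst", "juz": 1}, {"text": "second", "juz": 1}]},
    {"englishName": "Al-Baqara",
     "ayahs": [{"text": "third", "juz": 1}]},
]

ayah_df = pd.DataFrame(ayah_rows(surahs))
print(ayah_df)
```

A long table like this keeps the juz of every ayah, so no per-surah summary is needed at this level.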

## 2. A separate ```DataFrame()``` for each Surah
#### To-Do's:
- Look over the JSON to extract useful features
- Follow a similar approach to the whole-Quran DataFrame above, with minor adjustments
- Output to CSV
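
One possible shape for this section is a dictionary of DataFrames keyed by surah name. This is only a sketch under assumptions: `frames_by_surah` is a hypothetical helper, the toy `surahs` data is invented, and only the `text` and `juz` keys already used in this notebook are read.

```python
import pandas as pd

def frames_by_surah(surahs):
    """Return {englishName: DataFrame} with one row per ayah of that surah."""
    frames = {}
    for s in surahs:
        ayahs = s["ayahs"]
        frames[s["englishName"]] = pd.DataFrame({
            "ayah_number": list(range(1, len(ayahs) + 1)),
            "juz": [a["juz"] for a in ayahs],
            "text": [a["text"] for a in ayahs],
        })
    return frames

# Toy data with placeholder verse text.
surahs = [
    {"englishName": "Al-Faatiha",
     "ayahs": [{"text": "a", "juz": 1}, {"text": "b", "juz": 1}]},
    {"englishName": "Al-Baqara",
     "ayahs": [{"text": "c", "juz": 1}]},
]

frames = frames_by_surah(surahs)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = frames["Al-Faatiha"].to_csv(index=False)
print(csv_text)
```

Passing a file path to `to_csv` writes the file instead. For Arabic text that should open cleanly in spreadsheet software, `encoding="utf-8-sig"` is worth considering.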