## Notebook Description
This Jupyter Notebook reads a csv file of hadith narrators with information on their students/teachers, cleans it up, turns it into a directed graph, and uploads that graph to GraphSpace.

Useful references: 
- https://graphspace-python-library.readthedocs.io/en/develop/tutorial/tutorial.html 
- https://manual.graphspace.org/projects/graphspace-python/en/latest/reference

## Imports

In [1]:
from graphspace_python.graphs.classes.gsgraph import GSGraph
import plotly.express as px
import json
import pandas as pd

## Functions

### clean_index_list(column_name)
- **input**: column_name, a string. The name of the column that stores the strings of comma-separated digits (indices of the scholars - either students or teachers)
- **output**: list of numeric indices built from that column

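In short: a cell holding a string such as `'53, 11455'` becomes the list `[53, 11455]`, and a null cell stays null. Note that the function reads the global dataframe `df`, so `df` must be loaded before calling it.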

In [70]:
def clean_index_list(column_name): 
    inds_corrected = []
    for indx, inds_original in df[column_name].items():
        # inds_original is either NaN or a string of numbers separated by commas

        # if it's null, append it to corrected list: null students = no students.
        if pd.isna(inds_original):
            inds_corrected.append(inds_original)

        # if it's a string, split by commas and turn the strings of digits into ints.
        elif isinstance(inds_original, str):
            temp = []
            for item in inds_original.split(','):
                if item.strip().isdigit():
                    temp.append(int(item.strip()))
                else:
                    print("Non-numeric character found in what is supposed to be a string of comma-separated digits of teachers or students at id="+str(indx)+", value: "+item.strip())
            inds_corrected.append(temp)
        else:
            raise TypeError("index value at indx "+str(indx)+" is neither str nor NaN")
        
    return inds_corrected
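
The per-cell rule that `clean_index_list` applies can be tried out without a dataframe. The sketch below uses a hypothetical helper, `parse_index_string`, that is not part of the notebook; unlike the function above, it silently drops non-numeric tokens instead of printing a warning.

```python
import math

def parse_index_string(cell):
    """Hypothetical helper: parse one cell such as '53, 11455' into [53, 11455]."""
    # Missing values (None or a float NaN) pass through unchanged.
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return cell
    if not isinstance(cell, str):
        raise TypeError("cell is neither str nor NaN: " + repr(cell))
    # Split on commas, trim whitespace, and keep only all-digit tokens.
    tokens = (tok.strip() for tok in cell.split(','))
    return [int(tok) for tok in tokens if tok.isdigit()]

print(parse_index_string('53, 11455'))   # [53, 11455]
print(parse_index_string('1, 2, x, 6'))  # [1, 2, 6]
print(parse_index_string(None))          # None
```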

### makegraph(G, df, teacherIDs, studentIDs):
- **G**: a GraphSpace graph 
- **df**: the dataframe that contains the nodes as row entries (scholars) and teacher/student IDs in their columns
- **teacherIDs**: string. The name of the column that has the indices of each node's teachers we want to include in the graph.
- **studentIDs**: string. The name of the column that has the indices of each node's students we want to include in the graph.

This function takes a graph as input, adds nodes and edges to it, then returns it. It lets you specify which graph, dataframe, and teacher/student columns to use. Currently in my csv files, each row/scholar has multiple columns for teachers and students: the original teachers/students columns that came from muslimscholars.info, and additional teachers/students columns that I made, which may list more or fewer teachers/students for each scholar. So with this function I get to choose whether to show only the edges I have specified between each node and its teachers/students, or to keep the edges as the muslimscholars.info data had them.

#### Examples: 
- **makegraph(G, df, 'teachers_inds', 'students_inds'):**
    - 'teachers_inds' and 'students_inds' are the names of the original columns of teacher/student IDs as taken from muslimscholars.info. This makes an edge between any two nodes of the csv that have a teacher/student relationship, regardless of whether it traces back to the node of interest (e.g. Aishah) or not. So if I wanted a graph of just Aishah, the hadiths she transmitted to her students, and the hadiths her students transmitted from HER to their students, this graph would not work: it would show edges between her teachers/students if they transmitted ANY hadiths to each other, whether or not the hadith was transmitted from Aishah.
- **makegraph(G, df, 'specified_teachers', 'specified_students')**
    - 'specified_teachers' and 'specified_students' are the names of the columns where I've specified which teachers/students to include for each scholar. For example, if our specified scholar is Aishah r.a., then the graph should have edges from her to all her students (the ones listed in the csv) and from her students to her students' students. It is essentially a subgraph of interrelationships(): interrelationships() should include all the edges in onescholar(), but not necessarily the other way around. interrelationships() might show connections between students and students' students whether the hadiths they transmitted to each other were originally narrated from Aishah r.a. or from someone else, whereas onescholar() should only show the edges that trace back to Aishah r.a.
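
The subgraph relationship between the two choices of columns can be pictured with plain sets of (teacher, student) pairs. This is only a sketch: the IDs below are made up for illustration and are not taken from the csv.

```python
# Made-up (teacher, student) pairs; scholar 1 plays the role of the scholar of interest.
all_edges = {(1, 2), (1, 3), (2, 4), (3, 4), (5, 3)}   # every teacher/student relationship
specified_edges = {(1, 2), (1, 3), (2, 4)}             # only the edges that trace back to scholar 1

# Every specified edge should also appear in the full graph...
print(specified_edges <= all_edges)          # True

# ...while the full graph may contain extra edges that do not trace back to scholar 1.
print(sorted(all_edges - specified_edges))   # [(3, 4), (5, 3)]
```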

In [46]:
def makegraph(G, df, teacherIDs, studentIDs):
    # add nodes to G
    for indx, data in df.iterrows():
        # The node's id is its scholar_indx, NOT its row number as in previous versions.
        #G.add_node(int(df['scholar_indx'][indx]),name=data['name'], label=data['name'],gender=data['gender'])
        G.add_node(int(df['scholar_indx'][indx]), label=data['simplename'], fullname=data['fullname'],
                   gender=data['gender'], info=data['info'], generation=data['generation'])

    # keep track of all the scholar indices who have nodes in the graph/entries in the data
    # this is because I only want to include nodes of scholars who have their own entries in the data set,
    # and not necessarily any scholar that may be listed as a student/teacher of another.
    scholars_with_entries = set(int(i) for i in df['scholar_indx'])

    # add edges (teacher -> student) from the students column
    for row, students in df[studentIDs].items():
        # cells that do not hold a list (NaN, or '' after fillna) mean "no students"
        if not isinstance(students, list):
            continue
        teacher = int(df['scholar_indx'][row])  # always has its own entry: it comes from df itself
        for student in students:
            # only connect scholars that have their own entries in the data, and skip existing edges
            # note: once the teacher/student lists only contain nodes that exist in the data,
            # the membership check is only needed as a sanity check
            if student in scholars_with_entries and not G.has_edge(teacher, student):
                G.add_edge(teacher, student, directed=True)

    # add edges (teacher -> student) from the teachers column
    for row, teachers in df[teacherIDs].items():
        if not isinstance(teachers, list):
            continue
        student = int(df['scholar_indx'][row])
        for teacher in teachers:
            if teacher in scholars_with_entries and not G.has_edge(teacher, student):
                G.add_edge(teacher, student, directed=True)

    return G
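
To see the edge rules in isolation (edges always point teacher -> student, both endpoints must have their own entry, and an edge listed from both sides is added once), here is a minimal sketch that uses plain dictionaries instead of a dataframe and a GraphSpace graph. The `records` data and the `build_edges` helper are hypothetical and exist only for this illustration.

```python
# Hypothetical toy records keyed by scholar index; not the real csv contents.
records = {
    1: {"teachers": [],      "students": [2, 3]},
    2: {"teachers": [1],     "students": [4]},   # 4 has no entry of its own
    3: {"teachers": [1, 99], "students": []},    # 99 has no entry of its own
}

def build_edges(records):
    """Return directed (teacher, student) edges following the rules described above."""
    with_entries = set(records)
    edges, seen = [], set()

    def add(teacher, student):
        # Keep the edge only if both scholars have entries and it is not a duplicate.
        if teacher in with_entries and student in with_entries and (teacher, student) not in seen:
            seen.add((teacher, student))
            edges.append((teacher, student))

    # First pass: edges listed from the teacher's side (students lists).
    for scholar, rec in records.items():
        for student in rec["students"]:
            add(scholar, student)
    # Second pass: edges listed from the student's side (teachers lists).
    for scholar, rec in records.items():
        for teacher in rec["teachers"]:
            add(teacher, scholar)
    return edges

print(build_edges(records))  # [(1, 2), (1, 3)]
```

The edges to 4 and from 99 are dropped because those scholars have no entry, and (1, 2) and (1, 3) appear once even though both the teacher and the student list them.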

## Making the graph

### Clean the data

In [71]:
# Read the csv as a dataframe
#df = pd.read_csv('sourcedata/narratorsTESTING.csv')
df = pd.read_csv('sourcedata/aishah.csv')

In [72]:
df

Unnamed: 0,scholar_indx,name,grade,parents,spouse,siblings,children,birth_date_place,places_of_stay,death_date_place,...,gender,simplename,fullname,relationship,specified_teachers,specified_students,info,hadiths,generation,notes
0,1,Prophet Muhammad(saw) ( محمّد صلّی اللہ علیہ و...,Rasool Allah,'Abdullah ibn 'Abd al-Muttalib [9991] / Amina ...,"Khadijah [51] , Sawda bint Zam'a [52] , 'Aisha...",,"al-Qasim bin Muhammad [516] , Zaynab bint Muha...",53 BH/570 CE (9th Rabi' awwal) (Makkah),"Makkah, Medina",11 AH/632 CE (12th Rabi' awwal (Medina)[ Natur...,...,male,Prophet Muhammad,"Prophet Muhammad, peace and blessings be upon him",teacher,,53,,,,
1,13,Abu Hurairah ( أبو هريرة - عبد الرحمن بن صخر ا...,Comp.(RA) [1st Generation],/ Umaima/Maymuna,Basra bint Ghazwan [3656],,Daughter,"12 BH/603 CE (Baha, Yemen)","Makkah, Medina, Yemen, Bahrain",59 AH/681 CE (Medinah)[ Natural ],...,male,Abu Hurayrah,Abu Hurayrah Abdur-Rahman ibn Sakhr,student,53,,,,Companion,
2,17,ibn Abbas ( عبد الله بن العباس بن عبد المطلب ...,Comp.(RA) [1st Generation],al-'Abbas ibn 'Abd al-Muttalib [100] / Umm Fad...,Zar'ah bint Mishrah b. Madi-Karib,"Fadl ibn al-'Abbas [135] , 'Ubaydallah bin al-...","'Ali bin 'Abdullah bin 'Abbas [10949] , 'Abbas...",3 BH/619 CE (Makkah),"Makkah, Medina",68 AH/687 CE (Ta'if)[ Natural ],...,male,`Abdullah ibn `Abbas,`Abdullah ibn `Abbas,student,53,,,,Companion,
3,18,ibn Umar ( عبد الله بن عمر بن الخطاب ( رضي الل...,Comp.(RA) [1st Generation],'Umar ibn al-Khattab [3] / Zaynab bint Maz'un ...,"Safiyya bint Abi 'Ubaid al-Thaqafi [11811] , U...","Hafsa bint Umar [54] , 'Abdur Rahman bin 'Umar...","Abu Bakr, Abu 'Ubaida, Waqid bin 'Abdullah ibn...",10 BH/613 CE (Makkah),"Makkah, Medina",74 AH/693 CE (Makkah)[ Natural ],...,male,`Abdullah ibn `Umar,`Abdullah ibn `Umar ibn al-Khattab,student,53,,,,Companion,
4,28,'Amr bin al-'Aas ( عمرو بن العاص بن وائل ( رضي...,Comp.(RA) [1st Generation],al-'As ibn Wa'il / Layla bint Harmalah,"Rayta bint Munabbih bin al-Hajjaj [417] , Umm ...",Hisham ibn al-'Aas [144],'Abdullah bin 'Amr bin al-'Aas [29],50 BH/573 CE (Makkah),"Makkah, Medinah, Egypt, Syria",~43 AH/664 CE or 51 AH (Egypt)[ Natural ],...,male,`Amr ibn al-`As,`Amr ibn al-`As,student,53,,,,Companion,
5,41,Abu Musa al-Asha'ari ( أبو موسى الأشعري ( رضي ...,Comp.(RA) [1st Generation],'Abdullah ibn Qays bin Saleem / Zabiya bint Wa...,"Umm Khultum bint Fadl ibn al-'Abbas [534] , Um...","Abu Burda al-Asha'ari [407] , Abu Rahm bin Qay...","Ibrahim bin Abi Musa al-Asha'ari [409] , Abu B...",(Yemen),"Yemen, Makkah, Medina, Basra, Kufa",~43 or 52 AH /662 or 672 CE (Makkah)[ Natural ],...,male,Abu Musa al-Ash`ari,Abu Musa al-Ash`ari,student,53,,,,Companion,
6,53,Aisha bint Abi Bakr ( أمّ المؤمنين عائشة بنت أ...,Comp.(RA) [1st Generation],Abu Bakr As-Siddique [2] / Umm Ruman [91],Prophet Muhammad(saw) [1],'Abdur Rahman bin Abi Bakr [107],,9 BH/614 CE (Makkah),"Makkah, Medina",57 AH/678 CE (17 Ramadan) (Medina)[ Natural ],...,female,`Aishah bint Abi Bakr,`Aishah bint Abi Bakr,,"1, 2, 3, 6, 9, 63, 961","70, 106, 13, 17, 18, 41, 28, 10535, 10511, 105...",,,Companion,"10504, 10567, 11455 were not originally listed..."
7,70,Asma' bint Abi Bakr ( أسماء بنت أبي بكر الصديق...,Comp.(RA) [1st Generation],Abu Bakr As-Siddique [2] / Qutaylah bint 'Abdu...,Zubayr ibn al-Awwam [7],'Abdullah bin Abi Bakr [110],"'Abdullah ibn al-Zubayr [106] , al-Mundhir bin...",27 BH (Makkah),"Makkah, Medina",73 AH/692 CE (Medinah)[ Natural ],...,female,Asma bint Abi Bakr,Asma bint Abi Bakr,student,53,,Sister of `Aishah bint Abi Bakr,,Companion,
8,106,'Abdullah ibn al-Zubayr ( عبد الله بن الزبير ب...,Comp.(RA) [1st Generation],Zubayr ibn al-Awwam [7] / Asma' bint Abi Bakr ...,,"al-Mundhir bin al-Zubayr [10510] , 'Urwa ibn a...","Bakr, Thabit bin 'Abdullah bin al-Zubair [1376...",1 AH/624 CE (Medina),"Medinah, Makkah",73 AH (Makkah)[ Martyred ],...,male,`Abdullah ibn al-Zubayr,`Abdullah ibn al-Zubayr ibn al-`Awwam,student,53,,Nephew of Aishah bint Abi Bakr,,Companion,
9,10511,'Urwa ibn al-Zubayr عروة بن الزبير,Follower(Tabi') [3rd Generation],Zubayr ibn al-Awwam [7] / Asma' bint Abi Bakr ...,"Umm Yahya bint al-Hakam b. Abi al-'Aas, Umm Wa...","'Abdullah ibn al-Zubayr [106] , al-Mundhir bin...","Hisham bin 'Urwa [11065] , 'Abdullah bin 'Urwa...",23 AH/643 CE (Medina),Medina,93 AH/713 CE (Medina)[ Natural ],...,male,`Urwah ibn al-Zubayr,`Urwah ibn al-Zubayr ibn al-`Awwam,student,"53, 11455","11013, 11065",Nephew of Aishah bint Abi Bakr,,Follower,https://isnad.io/hadith/177


In [73]:
# Clean the columns with the teacher/student indices


specified_teachers_corrected = clean_index_list('specified_teachers')
specified_students_corrected = clean_index_list('specified_students')

# students_inds_corrected = clean_index_list('students_inds')
# teachers_inds_corrected = clean_index_list('teachers_inds')

# remove the old columns before assigning the corrected ones
del df['specified_teachers']
del df['specified_students']
#del df['students_inds']
#del df['teachers_inds']

# assign corrected columns to the dataset
#df = df.assign(students_inds=students_inds_corrected, teachers_inds=teachers_inds_corrected)
df = df.assign(specified_teachers=specified_teachers_corrected, specified_students=specified_students_corrected)

df = df.fillna('')
df

Unnamed: 0,scholar_indx,name,grade,parents,spouse,siblings,children,birth_date_place,places_of_stay,death_date_place,...,gender,simplename,fullname,relationship,info,hadiths,generation,notes,specified_teachers,specified_students
0,1,Prophet Muhammad(saw) ( محمّد صلّی اللہ علیہ و...,Rasool Allah,'Abdullah ibn 'Abd al-Muttalib [9991] / Amina ...,"Khadijah [51] , Sawda bint Zam'a [52] , 'Aisha...",,"al-Qasim bin Muhammad [516] , Zaynab bint Muha...",53 BH/570 CE (9th Rabi' awwal) (Makkah),"Makkah, Medina",11 AH/632 CE (12th Rabi' awwal (Medina)[ Natur...,...,male,Prophet Muhammad,"Prophet Muhammad, peace and blessings be upon him",teacher,,,,,,[53]
1,13,Abu Hurairah ( أبو هريرة - عبد الرحمن بن صخر ا...,Comp.(RA) [1st Generation],/ Umaima/Maymuna,Basra bint Ghazwan [3656],,Daughter,"12 BH/603 CE (Baha, Yemen)","Makkah, Medina, Yemen, Bahrain",59 AH/681 CE (Medinah)[ Natural ],...,male,Abu Hurayrah,Abu Hurayrah Abdur-Rahman ibn Sakhr,student,,,Companion,,[53],
2,17,ibn Abbas ( عبد الله بن العباس بن عبد المطلب ...,Comp.(RA) [1st Generation],al-'Abbas ibn 'Abd al-Muttalib [100] / Umm Fad...,Zar'ah bint Mishrah b. Madi-Karib,"Fadl ibn al-'Abbas [135] , 'Ubaydallah bin al-...","'Ali bin 'Abdullah bin 'Abbas [10949] , 'Abbas...",3 BH/619 CE (Makkah),"Makkah, Medina",68 AH/687 CE (Ta'if)[ Natural ],...,male,`Abdullah ibn `Abbas,`Abdullah ibn `Abbas,student,,,Companion,,[53],
3,18,ibn Umar ( عبد الله بن عمر بن الخطاب ( رضي الل...,Comp.(RA) [1st Generation],'Umar ibn al-Khattab [3] / Zaynab bint Maz'un ...,"Safiyya bint Abi 'Ubaid al-Thaqafi [11811] , U...","Hafsa bint Umar [54] , 'Abdur Rahman bin 'Umar...","Abu Bakr, Abu 'Ubaida, Waqid bin 'Abdullah ibn...",10 BH/613 CE (Makkah),"Makkah, Medina",74 AH/693 CE (Makkah)[ Natural ],...,male,`Abdullah ibn `Umar,`Abdullah ibn `Umar ibn al-Khattab,student,,,Companion,,[53],
4,28,'Amr bin al-'Aas ( عمرو بن العاص بن وائل ( رضي...,Comp.(RA) [1st Generation],al-'As ibn Wa'il / Layla bint Harmalah,"Rayta bint Munabbih bin al-Hajjaj [417] , Umm ...",Hisham ibn al-'Aas [144],'Abdullah bin 'Amr bin al-'Aas [29],50 BH/573 CE (Makkah),"Makkah, Medinah, Egypt, Syria",~43 AH/664 CE or 51 AH (Egypt)[ Natural ],...,male,`Amr ibn al-`As,`Amr ibn al-`As,student,,,Companion,,[53],
5,41,Abu Musa al-Asha'ari ( أبو موسى الأشعري ( رضي ...,Comp.(RA) [1st Generation],'Abdullah ibn Qays bin Saleem / Zabiya bint Wa...,"Umm Khultum bint Fadl ibn al-'Abbas [534] , Um...","Abu Burda al-Asha'ari [407] , Abu Rahm bin Qay...","Ibrahim bin Abi Musa al-Asha'ari [409] , Abu B...",(Yemen),"Yemen, Makkah, Medina, Basra, Kufa",~43 or 52 AH /662 or 672 CE (Makkah)[ Natural ],...,male,Abu Musa al-Ash`ari,Abu Musa al-Ash`ari,student,,,Companion,,[53],
6,53,Aisha bint Abi Bakr ( أمّ المؤمنين عائشة بنت أ...,Comp.(RA) [1st Generation],Abu Bakr As-Siddique [2] / Umm Ruman [91],Prophet Muhammad(saw) [1],'Abdur Rahman bin Abi Bakr [107],,9 BH/614 CE (Makkah),"Makkah, Medina",57 AH/678 CE (17 Ramadan) (Medina)[ Natural ],...,female,`Aishah bint Abi Bakr,`Aishah bint Abi Bakr,,,,Companion,"10504, 10567, 11455 were not originally listed...","[1, 2, 3, 6, 9, 63, 961]","[70, 106, 13, 17, 18, 41, 28, 10535, 10511, 10..."
7,70,Asma' bint Abi Bakr ( أسماء بنت أبي بكر الصديق...,Comp.(RA) [1st Generation],Abu Bakr As-Siddique [2] / Qutaylah bint 'Abdu...,Zubayr ibn al-Awwam [7],'Abdullah bin Abi Bakr [110],"'Abdullah ibn al-Zubayr [106] , al-Mundhir bin...",27 BH (Makkah),"Makkah, Medina",73 AH/692 CE (Medinah)[ Natural ],...,female,Asma bint Abi Bakr,Asma bint Abi Bakr,student,Sister of `Aishah bint Abi Bakr,,Companion,,[53],
8,106,'Abdullah ibn al-Zubayr ( عبد الله بن الزبير ب...,Comp.(RA) [1st Generation],Zubayr ibn al-Awwam [7] / Asma' bint Abi Bakr ...,,"al-Mundhir bin al-Zubayr [10510] , 'Urwa ibn a...","Bakr, Thabit bin 'Abdullah bin al-Zubair [1376...",1 AH/624 CE (Medina),"Medinah, Makkah",73 AH (Makkah)[ Martyred ],...,male,`Abdullah ibn al-Zubayr,`Abdullah ibn al-Zubayr ibn al-`Awwam,student,Nephew of Aishah bint Abi Bakr,,Companion,,[53],
9,10511,'Urwa ibn al-Zubayr عروة بن الزبير,Follower(Tabi') [3rd Generation],Zubayr ibn al-Awwam [7] / Asma' bint Abi Bakr ...,"Umm Yahya bint al-Hakam b. Abi al-'Aas, Umm Wa...","'Abdullah ibn al-Zubayr [106] , al-Mundhir bin...","Hisham bin 'Urwa [11065] , 'Abdullah bin 'Urwa...",23 AH/643 CE (Medina),Medina,93 AH/713 CE (Medina)[ Natural ],...,male,`Urwah ibn al-Zubayr,`Urwah ibn al-Zubayr ibn al-`Awwam,student,Nephew of Aishah bint Abi Bakr,,Follower,https://isnad.io/hadith/177,"[53, 11455]","[11013, 11065]"


### Make the graph

In [51]:
# Set up connection to GraphSpace

from graphspace_python.api.client import GraphSpace
graphspace = GraphSpace('USERNAME', 'PASSWORD')

In [64]:
# Create a variable and initialize it as a GraphSpace graph
narratorsgraph = GSGraph()

# set metadata for the graph
metadata = {
     'description': 'This is a graph of hadith narrators - work in progress',
     'directed': True
}
narratorsgraph.set_data(metadata)

# make a graph using 'specified_teachers' and 'specified_students' as the names of the columns we want to use for teacher/student info
narratorsgraph = makegraph(narratorsgraph, df, 'specified_teachers', 'specified_students')
# narratorsgraph = makegraph(narratorsgraph, df, 'teachers_inds', 'students_inds')

print('There are '+str(len(narratorsgraph.nodes))+' nodes and '+
      str(len(narratorsgraph.edges))+' edges in the original graph.')
narratorsgraph.nodes()
narratorsgraph.edges

There are 28 nodes and 34 edges in the original graph.


OutEdgeView([(1, 53), (1, 2), (1, 3), (1, 6), (1, 9), (1, 63), (1, 961), (53, 70), (53, 106), (53, 13), (53, 17), (53, 18), (53, 41), (53, 28), (53, 10535), (53, 10511), (53, 10520), (53, 10522), (53, 11002), (53, 10504), (53, 10567), (53, 11455), (53, 10737), (10511, 11013), (10511, 11065), (10535, 11555), (10535, 10530), (2, 53), (3, 53), (6, 53), (9, 53), (63, 53), (961, 53), (11455, 10511)])
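
As a quick sanity check on the output above, the edge list can be copied into plain Python and counted. This sketch assumes each pair reads (teacher, student), as in makegraph; it only re-counts the printed edges and does not query GraphSpace.

```python
from collections import Counter

# Edge list copied from the OutEdgeView output above.
edges = [(1, 53), (1, 2), (1, 3), (1, 6), (1, 9), (1, 63), (1, 961),
         (53, 70), (53, 106), (53, 13), (53, 17), (53, 18), (53, 41), (53, 28),
         (53, 10535), (53, 10511), (53, 10520), (53, 10522), (53, 11002),
         (53, 10504), (53, 10567), (53, 11455), (53, 10737),
         (10511, 11013), (10511, 11065), (10535, 11555), (10535, 10530),
         (2, 53), (3, 53), (6, 53), (9, 53), (63, 53), (961, 53), (11455, 10511)]

# Every node that appears as an endpoint of at least one edge.
nodes_in_edges = {node for edge in edges for node in edge}

# Out-degree = number of students listed; in-degree = number of teachers listed.
out_degree = Counter(teacher for teacher, _ in edges)
in_degree = Counter(student for _, student in edges)

print(len(edges), len(nodes_in_edges))  # 34 28
print(out_degree[53], in_degree[53])    # 16 7
```

All 28 nodes appear in at least one edge, so this graph has no isolated scholars.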

In [65]:
graph = graphspace.post_graph(narratorsgraph)
graph.get_name()
graph.id

34136