## Notebook Description
This Jupyter Notebook opens a CSV file of hadith narrators with information on their students and teachers, cleans it up, turns it into a directed graph, and uploads that graph to GraphSpace.

Useful references: 
- https://graphspace-python-library.readthedocs.io/en/develop/tutorial/tutorial.html 
- https://manual.graphspace.org/projects/graphspace-python/en/latest/reference

## Imports

In [1]:
from graphspace_python.graphs.classes.gsgraph import GSGraph
import plotly.express as px
import json
import pandas as pd

## Functions

### clean_index_list(column_name)
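- **input**: column_name, a string. The name of the column that stores the strings of comma-separated digits (indices of the scholars, either students or teachers). The column is read from the global dataframe `df`.
- **output**: a list with one entry per row: either a list of numeric indices built from that row's string, or the original null value if the row had no entry.

In short: each string of comma-separated numbers is turned into a list of numeric indices.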

In [2]:
def clean_index_list(column_name):
    # Note: this reads the global dataframe `df`.
    inds_corrected = []
    for indx, inds_original in df[column_name].items():
        # inds_original is currently a string of numbers separated by commas, or NaN

        # if it's null, append it to corrected list: null students = no students.
        if pd.isna(inds_original):
            inds_corrected.append(inds_original)

        # if it's a string, split by commas and turn the strings of digits into ints.
        elif isinstance(inds_original, str):
            temp = []
            for item in inds_original.split(','):
                item = item.strip()
                if item.isdigit():
                    temp.append(int(item))
                else:
                    print("Non-numeric character found in what is supposed to be a string of comma-separated digits of teachers or students at id="+str(indx)+", value: "+item)
            inds_corrected.append(temp)
        else:
            raise TypeError("index value at indx "+str(indx)+" is neither str nor NaN")

    return inds_corrected
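
The parsing step can also be tried on single values, outside the dataframe. The following is a minimal sketch, not the notebook's own code: `parse_index_string` is a hypothetical helper name, and unlike `clean_index_list` it returns the rejected tokens instead of printing them.

```python
import math

def parse_index_string(value):
    """Parse a string such as '53, 11455' into a list of ints.

    Returns a tuple (indices, rejected). For a missing value (None or NaN)
    indices is None; rejected holds any tokens that were not plain digits.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None, []
    if not isinstance(value, str):
        raise TypeError("expected str or NaN, got " + type(value).__name__)
    indices, rejected = [], []
    for token in value.split(','):
        token = token.strip()
        if token.isdigit():
            indices.append(int(token))
        else:
            rejected.append(token)
    return indices, rejected
```

For example, `parse_index_string("53, 11455")` gives `([53, 11455], [])`, while a stray token such as `"x"` ends up in the second list rather than being silently dropped.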

### makegraph(G, df, teacherIDs, studentIDs):
- **G**: a GraphSpace graph 
- **df**: the dataframe that contains the nodes as row entries (scholars) and teacher/student IDs in their columns
- **teacherIDs**: string. The name of the column that has the indices of each node's teachers we want to include in the graph.
- **studentIDs**: string. The name of the column that has the indices of each node's students we want to include in the graph.

This function takes a graph as input, adds nodes and edges to it, then returns it. It lets you specify which graph, which dataframe, and which teacher/student columns to use. Currently in my csv files, each row (scholar) has multiple columns for teachers and students: the original teachers/students columns that came from muslimscholars.info, and additional teachers/students columns that I made, which may list more or fewer teachers/students for each scholar. So with this function I can choose whether to show only the edges I have specified between each node and its teachers/students, or to keep the edges as the muslimscholars.info data had them.

#### Examples: 
- **makegraph(G, df, 'teachers_inds', 'students_inds')**
    - 'teachers_inds' and 'students_inds' are the names of the original columns with teacher/student IDs in them as taken from muslimscholars.info. This makes edges between nodes of the csv if they had any teacher/student relationship, regardless of whether it traces back to the node of interest (e.g. Aishah) or not. For example, if I wanted a graph of just Aishah, the hadiths she transmitted to her students, and the hadiths her students transmitted from HER to their students, this graph would not work: it would show edges between her teachers/students if they transmitted ANY hadiths to each other, regardless of whether the hadith was transmitted from Aishah or not.
- **makegraph(G, df, 'specified_teachers', 'specified_students')**
    - 'specified_teachers' and 'specified_students' are the names of the columns where I've specified which students/teachers to include for each scholar. For example, if our specified scholar is Aishah r.a., then the graph should have edges from her to all her students (the ones listed in the csv) and from her students to her students' students. It is essentially a subgraph of interrelationships() (the graph from the first example): interrelationships() should include all the edges in onescholar() (this graph), but not necessarily the other way around. interrelationships() might show connections between students and students' students whether the hadiths they transmitted to each other were narrated originally from Aishah r.a. or someone else, whereas onescholar() should only show the edges that trace back to Aishah r.a.
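
The subgraph relationship between the two examples can be checked on a toy case. This is only an illustrative sketch: the scholar names ('A', 'S1', ...) and the `edge_set` helper are placeholders, not data or code from the csv or from GraphSpace.

```python
# Toy teacher -> students mappings. All names are placeholders.
all_students = {
    'A': ['S1', 'S2'],
    'S1': ['S2', 'T1'],   # S1 -> S2 did not come via A
    'X': ['S1'],          # X is unrelated to A
}
specified_students = {
    'A': ['S1', 'S2'],
    'S1': ['T1'],         # only the transmission that traces back to A
}

def edge_set(students_by_teacher):
    """Return the set of directed (teacher, student) edges."""
    return {(teacher, student)
            for teacher, students in students_by_teacher.items()
            for student in students}

all_edges = edge_set(all_students)
specified_edges = edge_set(specified_students)

# Every specified edge also appears in the full graph, but not vice versa.
is_subgraph = specified_edges <= all_edges
extra_edges = sorted(all_edges - specified_edges)
```

Here `is_subgraph` is `True`, and `extra_edges` lists the two edges that exist only in the full graph.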

In [3]:
def makegraph(G, df, teacherIDs, studentIDs):
    # add nodes to G
    for indx, data in df.iterrows():
        # The node's id is the scholar's 'id' value, NOT its row number as in previous versions.
        #G.add_node(int(df['scholar_indx'][indx]),name=data['name'], label=data['name'],gender=data['gender'])
        #G.add_node(int(df['scholar_indx'][indx]),label=data['simplename'], fullname=data['fullname'],gender=data['gender'], info=data['info'], generation=data['generation'])
        G.add_node(int(data['id']), label=data['displayname'], fullname=data['fullname'], searchname=data['searchname'], gender=data['gender'], info=data['info'], generation=data['generation'])

    # keep track of all the scholar ids that have nodes in the graph/entries in the data
    # this is because I only want to include nodes of scholars who have their own entries in the data set, and not necessarily any scholar that may be listed as a student/teacher of another.
    scholars_with_entries = set(int(scholar_id) for scholar_id in df['id'])

    # add edges from the students column
    for teacher, students in df[studentIDs].items():
        # rows without a student list (NaN, or '' after fillna) have no edges to add
        if not isinstance(students, list):
            continue
        teacher_id = int(df['id'][teacher])
        for student in students:
            # check to make sure the student has their own entry in the data
            # (the teacher always does, because teacher_id comes from df['id'])
            # note: once I make it so that the teacher/student lists are only for nodes that exist in the data, I might not need this except for as a sanity check
            if student not in scholars_with_entries:
                continue
            if not G.has_edge(teacher_id, student):
                G.add_edge(teacher_id, student, directed=True)

    # add edges from the teachers column
    for student, teachers in df[teacherIDs].items():
        if not isinstance(teachers, list):
            continue
        student_id = int(df['id'][student])
        for teacher in teachers:
            # check to make sure the teacher has their own entry in the data
            if teacher not in scholars_with_entries:
                continue
            if not G.has_edge(teacher, student_id):
                G.add_edge(teacher, student_id, directed=True)

    return G
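
The edge rules used by makegraph (only connect scholars that have their own entries, and never add the same directed edge twice) can be illustrated without GraphSpace. The sketch below is an assumption-laden stand-in: `build_edges` is a hypothetical helper that works on plain dictionaries instead of a dataframe and a GSGraph, and the ids form a small toy subset.

```python
def build_edges(ids, teachers_of, students_of):
    """Collect directed (teacher, student) edges between known ids only.

    ids         -- ids of scholars that have their own entries
    teachers_of -- dict: student id -> list of teacher ids
    students_of -- dict: teacher id -> list of student ids
    """
    known = set(ids)
    seen = set()
    edges = []

    def add(teacher, student):
        # Skip endpoints without an entry, and skip duplicate edges.
        if teacher in known and student in known and (teacher, student) not in seen:
            seen.add((teacher, student))
            edges.append((teacher, student))

    for teacher, students in students_of.items():
        for student in students:
            add(teacher, student)
    for student, teachers in teachers_of.items():
        for teacher in teachers:
            add(teacher, student)
    return edges

# Toy subset: only three scholars have entries here.
example_edges = build_edges(
    ids=[1, 53, 10511],
    teachers_of={53: [1], 10511: [53, 11455]},
    students_of={1: [53], 53: [10511, 70]},
)
```

In this toy run, the edges to 70 and from 11455 are dropped because those ids have no entry in `ids`, and the edges already found through `students_of` are not added a second time from `teachers_of`.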

## Making the graph

### Clean the data

In [4]:
# Read the csv as a dataframe
#df = pd.read_csv('sourcedata/narratorsTESTING.csv')
#df = pd.read_csv('sourcedata/aishah.csv', encoding='utf-8')
df = pd.read_csv('../../hadith-narrators/aishah_53.csv', encoding='utf-8')

In [5]:
df

Unnamed: 0,id,displayname,fullname,searchname,gender,generation,teachers,students,specifiedteachers,specifiedstudents,info,hadiths,notes
0,1,Prophet Muḥammad,"Prophet Muḥammad, peace and blessings be upon him",Prophet Muhammad,male,,,,,53,,,
1,13,Abu Hurayrah,Abu Hurayrah Abdur-Rahman ibn Sakhr,Abu Hurayrah Abdur-Rahman ibn Sakhr,male,,,,53,,,,
2,17,ʿAbdullāh ibn ʿAbbās,ʿAbdullāh ibn ʿAbbās,Abdullah ibn Abbas,male,,,,53,,,,
3,18,ʿAbdullāh ibn ʿUmar,ʿAbdullāh ibn ʿUmar ibn al-Khaṭṭāb,Abdullah ibn Umar ibn al-Khattab,male,,,,53,,,,
4,28,ʿAmr ibn al-ʿĀṣ,ʿAmr ibn al-ʿĀṣ,Amr ibn al-As,male,,,,53,,,,
5,41,Abu Mūsā al-Ashʿari,Abu Mūsā al-Ashʿari,Abu Musa al-Ashari,male,,,,53,,,,
6,53,ʿAʾishah bint Abi Bakr,ʿAʾishah bint Abi Bakr,Aishah bint Abi Bakr,female,,,,"1, 2, 3, 6, 9, 63, 961","70, 106, 13, 17, 18, 41, 28, 10535, 10511, 105...",,,"10504, 10567, 11455 were not originally listed..."
7,70,Asmāʾ bint Abi Bakr,Asmāʾ bint Abi Bakr,Asma bint Abi Bakr,female,,,,53,,Sister of ʿAʾishah bint Abi Bakr,,
8,106,ʿAbdullāh ibn al-Zubayr,ʿAbdullāh ibn al-Zubayr ibn al-ʿAwwām,Abdullah ibn al-Zubayr ibn al-Awwam,male,,,,53,,Nephew of ʿAʾishah bint Abi Bakr,,
9,10511,ʿUrwah ibn al-Zubayr,ʿUrwah ibn al-Zubayr ibn al-ʿAwwām,Urwah ibn al-Zubayr ibn al-Awwam,male,,,,"53, 11455","11013, 11065",Nephew of ʿAʾishah bint Abi Bakr,,https://isnad.io/hadith/177


In [6]:
# Clean the columns with the teacher/student indices


specified_teachers_corrected = clean_index_list('specifiedteachers')
specified_students_corrected = clean_index_list('specifiedstudents')

# students_inds_corrected = clean_index_list('students_inds')
# teachers_inds_corrected = clean_index_list('teachers_inds')

# remove the old columns
del df['specifiedteachers']
del df['specifiedstudents']
#del df['students_inds']
#del df['teachers_inds']

# assign corrected columns to the dataset
#df = df.assign(students_inds=students_inds_corrected, teachers_inds=teachers_inds_corrected)
df = df.assign(specified_teachers=specified_teachers_corrected, specified_students=specified_students_corrected)

df = df.fillna('')
df

Unnamed: 0,id,displayname,fullname,searchname,gender,generation,teachers,students,info,hadiths,notes,specified_teachers,specified_students
0,1,Prophet Muḥammad,"Prophet Muḥammad, peace and blessings be upon him",Prophet Muhammad,male,,,,,,,,[53]
1,13,Abu Hurayrah,Abu Hurayrah Abdur-Rahman ibn Sakhr,Abu Hurayrah Abdur-Rahman ibn Sakhr,male,,,,,,,[53],
2,17,ʿAbdullāh ibn ʿAbbās,ʿAbdullāh ibn ʿAbbās,Abdullah ibn Abbas,male,,,,,,,[53],
3,18,ʿAbdullāh ibn ʿUmar,ʿAbdullāh ibn ʿUmar ibn al-Khaṭṭāb,Abdullah ibn Umar ibn al-Khattab,male,,,,,,,[53],
4,28,ʿAmr ibn al-ʿĀṣ,ʿAmr ibn al-ʿĀṣ,Amr ibn al-As,male,,,,,,,[53],
5,41,Abu Mūsā al-Ashʿari,Abu Mūsā al-Ashʿari,Abu Musa al-Ashari,male,,,,,,,[53],
6,53,ʿAʾishah bint Abi Bakr,ʿAʾishah bint Abi Bakr,Aishah bint Abi Bakr,female,,,,,,"10504, 10567, 11455 were not originally listed...","[1, 2, 3, 6, 9, 63, 961]","[70, 106, 13, 17, 18, 41, 28, 10535, 10511, 10..."
7,70,Asmāʾ bint Abi Bakr,Asmāʾ bint Abi Bakr,Asma bint Abi Bakr,female,,,,Sister of ʿAʾishah bint Abi Bakr,,,[53],
8,106,ʿAbdullāh ibn al-Zubayr,ʿAbdullāh ibn al-Zubayr ibn al-ʿAwwām,Abdullah ibn al-Zubayr ibn al-Awwam,male,,,,Nephew of ʿAʾishah bint Abi Bakr,,,[53],
9,10511,ʿUrwah ibn al-Zubayr,ʿUrwah ibn al-Zubayr ibn al-ʿAwwām,Urwah ibn al-Zubayr ibn al-Awwam,male,,,,Nephew of ʿAʾishah bint Abi Bakr,,https://isnad.io/hadith/177,"[53, 11455]","[11013, 11065]"


### Make the graph

In [7]:
# Set up connection to GraphSpace

from graphspace_python.api.client import GraphSpace
graphspace = GraphSpace('USERNAME', 'PASSWORD')
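
To avoid typing real credentials into the notebook, the username and password could be read from environment variables instead. This is only a sketch: the helper `load_graphspace_credentials` and the variable names `GRAPHSPACE_USER` and `GRAPHSPACE_PASSWORD` are assumptions made for illustration, not part of the GraphSpace library.

```python
import os

def load_graphspace_credentials(env=os.environ):
    """Return (username, password) read from the given environment mapping.

    Raises RuntimeError if either (hypothetical) variable is missing.
    """
    try:
        return env['GRAPHSPACE_USER'], env['GRAPHSPACE_PASSWORD']
    except KeyError as missing:
        raise RuntimeError("environment variable " + str(missing) + " is not set") from None
```

The returned pair could then be passed on as `GraphSpace(username, password)`, in place of the literal strings above.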

In [8]:
# Create a variable and initialize it as a GraphSpace graph
narratorsgraph = GSGraph()

# set metadata for the graph
metadata = {
     'description': 'This is a graph of hadith narrators - work in progress',
     'directed': True
}
narratorsgraph.set_data(metadata)

# make a graph using 'specified_teachers' and 'specified_students' as the names of the columns we want to use for teacher/student info
narratorsgraph = makegraph(narratorsgraph, df, 'specified_teachers', 'specified_students')
# narratorsgraph = makegraph(narratorsgraph, df, 'teachers_inds', 'students_inds')

print('There are '+str(len(narratorsgraph.nodes))+' nodes and '+
      str(len(narratorsgraph.edges))+' edges in the original graph.')
narratorsgraph.nodes()
narratorsgraph.edges

There are 29 nodes and 35 edges in the original graph.


OutEdgeView([(1, 53), (1, 2), (1, 3), (1, 6), (1, 9), (1, 63), (1, 961), (53, 70), (53, 106), (53, 13), (53, 17), (53, 18), (53, 41), (53, 28), (53, 10535), (53, 10511), (53, 10520), (53, 10522), (53, 11002), (53, 10504), (53, 10567), (53, 11455), (53, 10737), (10511, 11013), (10511, 11065), (10535, 11555), (10535, 10530), (2, 53), (3, 53), (6, 53), (9, 53), (63, 53), (961, 53), (11455, 10511), (10737, 11016)])

In [9]:
graph = graphspace.post_graph(narratorsgraph)
graph.get_name()
graph.id

34181