# P7 Chapter 3 HDDT Using SQLite database standard 'views' #
## Index of views, dataframe info and rendering corrections ## 

## Thesis Chapter 6.19.6 ##

 jnb_ceda_database_views 

### Generic code block used to set up every notebook ###

In [1]:
# First we call up the python packages we need to perform the analysis:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from operator import itemgetter
import networkx as nx
from networkx.algorithms import community #This part of networkx, for community detection, needs to be imported separately.
import nbconvert
import csv

# to add an image <img src="xxxx.png">

# to jump to another paragraph <a id='another_cell'></a>

# to Insert a hyperlink [https://github.com/KelvinBeerJones](https://github.com/KelvinBeerJones)

# Convert float64 to INT64;   table['column'] = table['column'].fillna(0).astype(np.int64)


### Generic code block used to make gexf file from dataframes ###

In [2]:
with open('vw_3_bipartite_names.csv', 'r') as nodecsv: # Open the file
    nodereader = csv.reader(nodecsv) # Read the csv
    nodes = [n for n in nodereader][1:]  # Retrieve the data
    # Using a Python list comprehension and list slicing to remove the header row.
    
node_names = [n[0] for n in nodes] # Get a list of only the node names    

with open('vw_3_bipartite_nodes.csv', 'r') as edgecsv: # Open the file
    edgereader = csv.reader(edgecsv) # Read the csv
    edges = [tuple(e) for e in edgereader][1:]  # Retrieve the data

In [3]:
#nodes

In [4]:
#edges

In [5]:
print(len(node_names))
print(len(edges))

3094
514


In [6]:
G = nx.Graph()
G.add_nodes_from(node_names)
G.add_edges_from(edges)
print(nx.info(G))

Name: 
Type: Graph
Number of nodes: 3869
Number of edges: 514
Average degree:   0.2657


In [7]:
nx.write_gexf(G, 'project_name')
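
The graph summary above reports more nodes (3869) than the names list supplied (3094). One likely explanation, sketched here on made-up data, is that `add_edges_from` creates any edge endpoint that is not already present as a node. The sample names below are hypothetical and are not taken from the HDDT csv files.

```python
import networkx as nx

# Hypothetical sample data, not taken from the HDDT csv files.
node_names = ["Ann Example", "Ben Example"]
edges = [("Ann Example", "Club A"), ("Cal Example", "Club A")]

G = nx.Graph()
G.add_nodes_from(node_names)
G.add_edges_from(edges)  # endpoints missing from node_names are added automatically

# Nodes that arrived only through the edge list.
implicit_nodes = sorted(set(G.nodes) - set(node_names))
print(G.number_of_nodes(), G.number_of_edges())
print(implicit_nodes)
```

Comparing `implicit_nodes` against the names file is a quick way to spot spelling differences between the two csv files.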

# 3.1 Introduction and explanation #

## 3.1.1 Introduction ##

In the HDDT methodology Jupyter Notebooks (JNB) are used to visualise dataframes, each of which is generated from a SQLite 'view'. JNB is used to generate charts and graphs using Pyplot and Seaborn libraries and also to generate GexF files for Gephi.

SQLite database views have been built to comprehensively 'map' the structure of the database as shown in the Entity Relationship Diagram below. Views capture:

1. Individual Name data tables - such as persons, occupations, locations, clubs and societies. Note: religion here is solely an attribute of persons (because the HDDT currently only captures Quakers). Religion would become a meaningful data table if other religious affiliations were also captured. Data tables form 'Name' tables (and Names are also known as Nodes depending on which technology is open - SQLite, JNB or Gephi). 
2. Tuples tables that show many-to-many relationships. These are person Name(s) and their relationship to other Name(s) (occupation, location, club and society). They are made of pairs of nodes, which combined are called a tuple in the form 'person Name (is associated with) other Name'. Persons here are also known as 'Source' and the associated Name(s) as 'Target'.
3. Both Name and Tuple tables can have attributes attached. In Gephi attributes attached to records in a Names table will allow filtering based on Nodes whereas attributes attached to individual records in Tuples tables will allow filtering of edges based on attributes. A GexF file can contain attributes for both Names and Tuples.  


Note - First letter capitalisation in the HDDT must be followed. The Gephi dictionary requires Name, Source and Target to be in this form.
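
The capitalisation rule can be checked before a file is passed on to Gephi. The following is a minimal sketch only: the helper name `check_gephi_headers` and the labels "names" and "tuples" are hypothetical and are not part of the HDDT.

```python
import pandas as pd

def check_gephi_headers(df, kind):
    """Return the required column names that are missing from df.

    kind is "names" or "tuples" (hypothetical labels used for this sketch).
    """
    required = {"names": ["Name"], "tuples": ["Source", "Target"]}[kind]
    return [col for col in required if col not in df.columns]

# Hypothetical tuples table with a lower-case 'source' header.
tuples = pd.DataFrame({"source": ["Ann Example"], "Target": ["Club A"]})
missing = check_gephi_headers(tuples, "tuples")
print(missing)  # 'Source' is reported because only 'source' exists
```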

### The process: ###

1. Devise a set of comprehensive database views (Using DBeaver).
1. Export the views as csv files to the container 'jnd_ceda_database_sql_views'. 
1. This container is also a GitHub repo to enable version control. 
1. This Jupyter Notebook is located here (all resources necessary for a JNB must be in the same container). 
1. Use JNB to make a dataframe for each csv file and display the first 10 records.
1. Check the dataframe info to ensure that no tables have columns rendered as float64. (Apply the method: table['column'] = table['column'].fillna(0).astype(np.int64) to convert float64 to int64 if necessary.)
2. Make subsets of dataframes to slice the data. (Use SQLite database select queries to make INNER and LEFT joins between tables.)
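
The difference between the INNER and LEFT joins mentioned in the last step can be sketched with an in-memory SQLite database. The table and column names (`person`, `member`, `person_id`, `society`) are hypothetical and do not reproduce the HDDT schema.

```python
import sqlite3
import pandas as pd

# In-memory database with hypothetical tables; not the HDDT schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE member (person_id INTEGER, society TEXT);
INSERT INTO person VALUES (1, 'Ann Example'), (2, 'Ben Example');
INSERT INTO member VALUES (1, 'Club A');
""")

# INNER JOIN keeps only persons that have at least one membership.
inner = pd.read_sql_query(
    "SELECT p.Name AS Source, m.society AS Target "
    "FROM person p INNER JOIN member m ON m.person_id = p.id", conn)

# LEFT JOIN keeps every person; Target is empty where no membership exists.
left = pd.read_sql_query(
    "SELECT p.Name AS Source, m.society AS Target "
    "FROM person p LEFT JOIN member m ON m.person_id = p.id", conn)
conn.close()

print(len(inner), len(left))
```

A LEFT join is the one to use when persons with no associated tuple should still appear in the result.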





## 3.1.2  Explanation ##

This workbook resides in a GitHub container facilitating version control. Each time this workbook is amended a record is made in the corresponding GitHub repo.

Gephi requires a Names file (generated by Networkx), which comprises all Nodes irrespective of which side of an Edge they will later be attached to. In the Edges file in Gephi the Names become Nodes paired as Tuples (Source and Target), and Gephi infers an Edge between them. Person names and bipartite group names are variously called Names, Nodes, Source and Target, depending on where they are being used.

Names in Gephi format = The column must be "Name"; additional columns are 'attributes' of each Name and can be used in Gephi to style Nodes.

Tuples in Gephi format = First column must be "Source", second column must be "Target", additional columns are 'attributes' of each tuple. (Note: attributes will be used in Gephi to style edges and not nodes.)
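
A minimal sketch of how a Names dataframe and a Tuples dataframe with attributes might be combined into one GexF file. The rows, the attribute values and the file name `example.gexf` are hypothetical; `from_pandas_edgelist` and `set_node_attributes` are used here as one possible route and are not the routine used elsewhere in this notebook.

```python
import os
import tempfile
import networkx as nx
import pandas as pd

# Hypothetical Names and Tuples tables in Gephi format.
names = pd.DataFrame({"Name": ["Ann Example", "Club A"], "birth_year": [1820, 0]})
tuples = pd.DataFrame({"Source": ["Ann Example"], "Target": ["Club A"], "first_year": [1844]})

# Edge attributes: every column other than Source and Target.
G = nx.from_pandas_edgelist(tuples, source="Source", target="Target", edge_attr=True)

# Node attributes: every column other than Name.
G.add_nodes_from(names["Name"])
nx.set_node_attributes(G, names.set_index("Name").to_dict("index"))

# Write to a temporary folder so the sketch leaves the container untouched.
path = os.path.join(tempfile.mkdtemp(), "example.gexf")
nx.write_gexf(G, path)
print(G.nodes(data=True))
print(G.edges(data=True))
```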

All dataframes consist of data types OBJECT and INT64 only. CSV files occasionally produce columns that are read as FLOAT64. Where this occurs a fix is applied immediately after the pd.read_csv command.
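
A short sketch of why the float64 columns appear and how the fix behaves. A blank cell is read by pandas as NaN, which forces an otherwise integer column to float64. The csv text below is made up for illustration, and looping over every float64 column is an assumption; the notebook itself fixes columns one at a time.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical csv content with one blank birth_year cell.
csv_text = "Name,birth_year\nAnn Example,1820\nBen Example,\n"
df = pd.read_csv(io.StringIO(csv_text))
dtype_before = str(df["birth_year"].dtype)

# Apply the notebook's fix to every float64 column.
for col in df.select_dtypes(include="float64").columns:
    df[col] = df[col].fillna(0).astype(np.int64)

dtype_after = str(df["birth_year"].dtype)
print(dtype_before, dtype_after)
```

Note that after the fix a value of 0 stands for 'not recorded', so it should be excluded from any calculation on years.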

# 3.2 To make a new project #

1. Make a new project repo in Github.
2. Clone the new project repo to a new container in the HDDT workspace.
2. Create a JNB notebook in the new container workspace.
3. Copy selected csv files from this container to the new container.
4. In the new JNB use these routines to validate selected data:
	1. pd.read_csv, 
	2. df.iloc [0:10]
	3. df.info () 
4. Use this routine to correct float64 columns. df['column'] = df['column'].fillna(0).astype(np.int64)
5. Slice data as needed and make a new df of the subset data.
6. Use pandas to make graphs of the data.
7. If wanted use the routine df.to_csv ('vw_hddt_newdataframe.csv') to put a csv of the subset of a dataframe in the new workspace container.
7. If wanted make a GexF file (in the new workspace container) to use Gephi for data visualisation.
8. Don't forget to use VSC to update GitHub whenever the csv files in this container or this JNB are changed, both for version control and to make the latest version available to all users.
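
A sketch of the slice-and-export steps above, using made-up rows. By default `to_csv` also writes the dataframe index, which comes back as a column called 'Unnamed: 0' when the file is read again; passing `index=False` is one way to avoid this and is offered here as an option, not as the notebook's own routine.

```python
import io
import pandas as pd

# Hypothetical tuples table and a subset of it.
df = pd.DataFrame({"Source": ["Ann Example", "Ben Example"],
                   "Target": ["Club A", "Club B"]})
subset = df[df["Target"] == "Club B"]

# Export with and without the index (to memory rather than to disk).
with_index = io.StringIO()
subset.to_csv(with_index)
without_index = io.StringIO()
subset.to_csv(without_index, index=False)

# Read both back and compare the column headers.
cols_with = pd.read_csv(io.StringIO(with_index.getvalue())).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index.getvalue())).columns.tolist()
print(cols_with)
print(cols_without)
```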

### Links to sections in this workbook ###

| Table |
|---|
|[Person Table](#person_table)|
|[Person Attributes](#person_attributes)|
|[Person names and other nodes combined](#all_names_and_nodes)|
|[Other Nodes](#bigraph_nodes)|
|[Bigraph all tuples](#bigraph_all_tuples)|
|[Religion tuples](#religion_tuples)|
|[Location tuples](#location_tuples)|
|[Occupation tuples](#occupation_tuples)|
|[Society tuples](#society_tuples)|
|[Club tuples](#club_tuples)|
|[CEDA Name attributes](#ceda_name_attributes)|
|[CEDA tuples](#ceda_tuples)|
|[CEDA tuples attributes](#ceda_tuples_attributes)|
|[Quakers](#Quakers)|
|[Quaker immediate](#quaker_immediate)|
|[Quaker close](#quaker_close)|
|[Quaker distant](#quaker_distant)|
|[Quaker CEDA tuples](#quaker_ceda_tuples)|

# 3.3 Entity Relationship Diagram #

<img src="ERD (3).png">

# 3.4 Person table  (3094 records) #

### Dataframe ###

<a id='person_table'></a>

In [8]:
person_table = pd.read_csv ('vw_hddt_person_table.csv')
person_table ['gender_id'] = person_table ['gender_id'].fillna(0).astype(np.int64)
person_table ['birth_year'] = person_table ['birth_year'].fillna(0).astype(np.int64)
person_table['death_year'] = person_table['death_year'].fillna(0).astype(np.int64)

In [9]:
person_table.iloc [0:10]

Unnamed: 0,Name,title,gender_id,birth_year,death_year,data_source_id,notes
0,Arthur William A Beckett,,1,1844,1909,1,"17 King Street, S. James's, S.W. 88 St James's..."
1,Andrew Mercer Adam,,1,0,0,1,"Boston, Lincolnshire"
2,H R Adam,,1,0,0,1,"Old Calabar, W. Africa"
3,William Adam,,1,0,0,1,
4,Henry John Adams,,1,0,0,1,"14 Thornhill Square, N."
5,William (1) Adams,,1,0,0,1,
6,William (2) Adams,,1,1820,1900,1,5 Henrietta Street Cavendish Square [1862]7 L...
7,William Adlam,,1,0,0,1,"9 Brook Street, Bath [1863]Manor House, Chew ...."
8,Louis Agassiz,,1,1807,1873,1,Cambridge Mass
9,Anastasius Agathides,,1,1805,1881,1,"28 Kildare Terr. Westbourne Park, W [A3]1861A..."


In [10]:
person_table.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3094 entries, 0 to 3093
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Name            3094 non-null   object
 1   title           776 non-null    object
 2   gender_id       3094 non-null   int64 
 3   birth_year      3094 non-null   int64 
 4   death_year      3094 non-null   int64 
 5   data_source_id  3094 non-null   int64 
 6   notes           1770 non-null   object
dtypes: int64(4), object(3)
memory usage: 169.3+ KB


# 3.5 Person Names (3094 records) #

### Datatable ###

<a id='person_name'></a>

In [11]:
person_name = pd.read_csv ('vw_hddt_person_name.csv')

In [12]:
person_name.iloc [0:10]

Unnamed: 0,Name
0,Arthur William A Beckett
1,Andrew Mercer Adam
2,H R Adam
3,William Adam
4,Henry John Adams
5,William (1) Adams
6,William (2) Adams
7,William Adlam
8,Louis Agassiz
9,Anastasius Agathides


In [13]:
person_name.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3094 entries, 0 to 3093
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3094 non-null   object
dtypes: object(1)
memory usage: 24.3+ KB


# 3.6 Persons with attributes (Names file) (3094 records) #

### Datatable ###

<a id='person_attributes'></a>

In [14]:
person_attributes = pd.read_csv ('vw_hddt_person_attributes_religion.csv')
person_attributes ['religion_1_quaker'] = person_attributes ['religion_1_quaker'].fillna(0).astype(np.int64)
person_attributes ['birth_year'] = person_attributes ['birth_year'].fillna(0).astype(np.int64)
person_attributes['death_year'] = person_attributes['death_year'].fillna(0).astype(np.int64)

In [15]:
person_attributes.iloc [0:10]

Unnamed: 0,Name,birth_year,death_year,religion_1_quaker
0,Arthur William A Beckett,1844,1909,0
1,Andrew Mercer Adam,0,0,0
2,H R Adam,0,0,0
3,William Adam,0,0,0
4,Henry John Adams,0,0,0
5,William (1) Adams,0,0,0
6,William (2) Adams,1820,1900,0
7,William Adlam,0,0,0
8,Louis Agassiz,1807,1873,0
9,Anastasius Agathides,1805,1881,0


In [16]:
person_attributes.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3094 entries, 0 to 3093
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Name               3094 non-null   object
 1   birth_year         3094 non-null   int64 
 2   death_year         3094 non-null   int64 
 3   religion_1_quaker  3094 non-null   int64 
dtypes: int64(3), object(1)
memory usage: 96.8+ KB


# 3.7 All Names (Nodes) (3608 records) #

### Dataframe ###

<a id='all_names_and_nodes'></a>

All person names (3094) and bipartite nodes (514) in Gephi 'Names' format. Can be used with a 'tuples' file to generate a GexF file for Gephi where all possible nodes would appear on the visualisation, including nodes with no associated tuple.

In [17]:
all_names_and_nodes = pd.read_csv('vw_hddt_all_names_and_nodes.csv')

In [18]:
all_names_and_nodes.iloc [0:10]

Unnamed: 0,Name
0,Joseph Storrs
1,A Mackintosh Shaw
2,A de Fullner
3,"A , jun Ramsay"
4,A A Stewart
5,A Ambrose
6,A B Stark
7,A B Wright
8,A Bell
9,A C Brebner


In [19]:
all_names_and_nodes.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3608 entries, 0 to 3607
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3608 non-null   object
dtypes: object(1)
memory usage: 28.3+ KB


# 3.8 All bipartite Names (514 records) #

### Dataframe ###

Bigraph nodes in Gephi Name format. This is a subset of 'all_names_and_nodes'

<a id='bigraph_nodes'></a>

In [20]:
bigraph_nodes = pd.read_csv ('vw_hddt_bigraph_nodes.csv')

In [21]:
bigraph_nodes.iloc [0:10]

Unnamed: 0,Name
0,AI
1,APS
2,ASL
3,Aberdeen Horticultural Society
4,Academia Quirurgia of Madrid
5,Academie Hongroise de Pest
6,Academy of Anatolia
7,Academy of Medicine and Surgery of Madrid and ...
8,Academy of Natural Sciences Philadelphia
9,Academy of Natural Sciences of Spain


In [22]:
bigraph_nodes.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 514 entries, 0 to 513
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    514 non-null    object
dtypes: object(1)
memory usage: 4.1+ KB


# 3.9 All Names (Nodes) as Tuples  (9989 records) #

### dataframe ###

All tuples from the HDDT in Gephi format. (There are 9989 edges between the 3094 persons and the 514 bipartite nodes.)

<a id='bigraph_all_tuples'></a>

In [23]:
bigraph_all_tuples = pd.read_csv ('vw_hddt_all_bigraph_tuples.csv')

In [24]:
bigraph_all_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,Joseph Storrs,QCA
1,Joseph Storrs,Quaker
2,A Mackintosh Shaw,ASL
3,A Mackintosh Shaw,country
4,A de Fullner,AI
5,"A , jun Ramsay",AI
6,"A , jun Ramsay",ASL
7,"A , jun Ramsay",Geological Society
8,"A , jun Ramsay",London
9,A A Stewart,ASL


In [25]:
bigraph_all_tuples.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9989 entries, 0 to 9988
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  9989 non-null   object
 1   Target  9989 non-null   object
dtypes: object(2)
memory usage: 156.2+ KB


# 3.10 Religion tuples (592 records) #

### Dataframe ###

Quakers

<a id='religion_tuples'></a>

In [26]:
religion_tuples = pd.read_csv ('vw_hddt_religion_tuples.csv')

In [27]:
religion_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,William Spicer Wood,Quaker
1,William Wilson,Quaker
2,James Wilson,Quaker
3,E T Wakefield,Quaker
4,John Ross,Quaker
5,J Robinson,Quaker
6,William Horton Lloyd,Quaker
7,Joseph Lister,Quaker
8,Jonathan Hutchinson,Quaker
9,William Holmes,Quaker


In [28]:
religion_tuples.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  592 non-null    object
 1   Target  592 non-null    object
dtypes: object(2)
memory usage: 9.4+ KB


# 3.11 Location tuples (2061 records) #

### Dataframe ###

Location (UK but not London)

<a id='location_tuples'></a>

In [29]:
location_tuples = pd.read_csv ('vw_hddt_location_tuples.csv')

In [30]:
location_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,Arthur William A Beckett,London
1,Andrew Mercer Adam,country
2,H R Adam,Africa
3,Henry John Adams,London
4,William (2) Adams,London
5,William Adlam,country
6,Louis Agassiz,America
7,Anastasius Agathides,London
8,Joseph Agnew,Scotland
9,William Francis Harrison Ainsworth,London


In [31]:
location_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2061 entries, 0 to 2060
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  2061 non-null   object
 1   Target  2061 non-null   object
dtypes: object(2)
memory usage: 32.3+ KB


# 3.12 Occupation tuples (1883 records) #

### Dataframe ###

Occupations

<a id='occupation_tuples'></a>

In [32]:
occupation_tuples = pd.read_csv ('vw_hddt_occupation_tuples.csv')

In [33]:
occupation_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,Arthur William A Beckett,literary
1,Andrew Mercer Adam,medical
2,Andrew Mercer Adam,armed services
3,William Adam,political
4,William (2) Adams,medical
5,Louis Agassiz,academic
6,Louis Agassiz,biologist
7,Louis Agassiz,geologist
8,Anastasius Agathides,academic
9,Augustine Aglio,artist


In [34]:
occupation_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1883 entries, 0 to 1882
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  1883 non-null   object
 1   Target  1883 non-null   object
dtypes: object(2)
memory usage: 29.5+ KB


# 3.13 Society tuples (1238 records) #

### Dataframe ###

Society memberships

<a id='society_tuples'></a>

In [35]:
society_tuples = pd.read_csv ('vw_hddt_society_tuples.csv')

In [36]:
society_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,William (2) Adams,Royal College of Surgeons
1,William (2) Adams,Pathological Society of London
2,William (2) Adams,Medical Society of London
3,William (2) Adams,Medical and Chirurgical Society of London
4,William Adlam,Somersetshire Archaeological and Natural Histo...
5,William Francis Harrison Ainsworth,Royal Geographical Society
6,William Francis Harrison Ainsworth,Society of Antiquaries
7,William Francis Harrison Ainsworth,Syro Egyptian Society
8,William Francis Harrison Ainsworth,Geological Society
9,William Baird Airston,Royal College of Surgeons


In [37]:
society_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1238 entries, 0 to 1237
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  1238 non-null   object
 1   Target  1238 non-null   object
dtypes: object(2)
memory usage: 19.5+ KB


# 3.14 Club tuples (323 records) #

### Dataframe ###

Club memberships

<a id='club_tuples'></a>

In [38]:
club_tuples = pd.read_csv ('vw_hddt_club_tuples.csv')

In [39]:
club_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,William (1) Adams,Athenaeum Club
1,Rutherford Alcock,Athenaeum Club
2,William Amhurst Tyssen Amhurst,Athenaeum Club
3,William Amhurst Tyssen Amhurst,Marlborough Club
4,William Amhurst Tyssen Amhurst,Carlton Club
5,William Arbuthnot,Oriental Club
6,Richard Edward Arden,National Club
7,Richard Edward Arden,Junior Athenaeum Club
8,William Armstrong,Athenaeum Club
9,William Henry Ashurst,Reform Club


In [40]:
club_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 323 entries, 0 to 322
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  323 non-null    object
 1   Target  323 non-null    object
dtypes: object(2)
memory usage: 5.2+ KB


# 3.15 CEDA tuples (3892 records) #

### Dataframe ###

Tuples in Gephi format to graph the memberships of CEDA

<a id='ceda_tuples'></a>

In [41]:
ceda_tuples = pd.read_csv('vw_hddt_ceda_tuples.csv')

In [42]:
ceda_tuples.iloc [0:10]

Unnamed: 0,Source,Target
0,William Adam,ESL
1,William (1) Adams,ESL
2,William (2) Adams,ESL
3,Louis Agassiz,ESL
4,Augustine Aglio,ESL
5,William Francis Harrison Ainsworth,ESL
6,Alexander Muirhead Aitken,ESL
7,Rutherford Alcock,ESL
8,William Aldam,ESL
9,William Allen,ESL


In [43]:
ceda_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3892 entries, 0 to 3891
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Source  3892 non-null   object
 1   Target  3892 non-null   object
dtypes: object(2)
memory usage: 60.9+ KB


# 3.16 Quaker Committee on the Aborigines (QCA) #

In [44]:
qca = ceda_tuples[ceda_tuples["Target"] == "QCA"]

In [45]:
qca.iloc [0:10]

Unnamed: 0,Source,Target
2732,Thomas (1) Hodgkin,QCA
2998,James Bowden,QCA
2999,William Nash,QCA
3000,Joseph Sturge,QCA
3001,William Jun Grimshaw,QCA
3003,Henry Knight,QCA
3004,Edward Paull,QCA
3005,Robert Jun Alsop,QCA
3006,Abram Rawlinson Barclay,QCA
3007,John Barclay,QCA


In [46]:
qca.to_csv ('vw_hddt_ceda_qca.csv')

# 3.17 Aborigines Protection Society (APS) #

In [47]:
aps = ceda_tuples[ceda_tuples["Target"] == "APS"]

In [48]:
aps.iloc [0:10]

Unnamed: 0,Source,Target
2692,William Aldam,APS
2693,Samuel C Baker,APS
2694,James Bell,APS
2695,John Bell (2),APS
2696,John Brown,APS
2697,Henry Christy,APS
2698,Thomas junior Christy,APS
2699,William Clay,APS
2700,Richard King,APS
2701,John James Sturz,APS


In [49]:
aps.to_csv ('vw_hddt_ceda_aps.csv')

# 3.18 Ethnological Society of London (ESL) #

In [50]:
esl = ceda_tuples[ceda_tuples["Target"] == "ESL"]

In [51]:
esl.iloc [0:10]

Unnamed: 0,Source,Target
0,William Adam,ESL
1,William (1) Adams,ESL
2,William (2) Adams,ESL
3,Louis Agassiz,ESL
4,Augustine Aglio,ESL
5,William Francis Harrison Ainsworth,ESL
6,Alexander Muirhead Aitken,ESL
7,Rutherford Alcock,ESL
8,William Aldam,ESL
9,William Allen,ESL


In [52]:
esl.to_csv ('vw_hddt_ceda_esl.csv')

# 3.19 Anthropological Society of London (ASL) #

In [53]:
asl = ceda_tuples[ceda_tuples["Target"] == "ASL"]

In [54]:
asl.iloc [0:10]


In [55]:
asl.to_csv ('vw_hddt_ceda_asl.csv')

# 3.20 Anthropological Institute (AI) #

In [56]:
ai = ceda_tuples[ceda_tuples["Target"] == "AI"]

In [57]:
ai.iloc [0:10]


In [58]:
ai.to_csv ('vw_hddt_ceda_ai.csv')
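
The five subsets in sections 3.16 to 3.20 follow the same pattern, so they could also be derived in a single pass by grouping on the Target column. This is a sketch under that assumption; the rows below are hypothetical stand-ins for `ceda_tuples`, and the `to_csv` call is left commented out so that nothing is written.

```python
import pandas as pd

# Hypothetical rows standing in for ceda_tuples; not taken from the HDDT data.
ceda_tuples = pd.DataFrame({
    "Source": ["Ann Example", "Ben Example", "Cal Example", "Ann Example", "Dee Example"],
    "Target": ["QCA", "APS", "ESL", "ASL", "AI"],
})

# One dataframe per society, keyed by the value in the Target column.
subsets = {target: frame for target, frame in ceda_tuples.groupby("Target")}

# File names follow the pattern used in this notebook, e.g. vw_hddt_ceda_qca.csv.
file_names = {target: f"vw_hddt_ceda_{target.lower()}.csv" for target in subsets}

for target, frame in subsets.items():
    print(target, len(frame), file_names[target])
    # frame.to_csv(file_names[target])  # uncomment to write the files
```

Deriving every subset from the Target value itself means the filter and the file name cannot drift apart.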

# 3.21 CEDA Name with attributes (3892 records) #

### Dataframe ###

Datatable of all people and their memberships of CEDA (some people are in more than one). Attaches attributes to Nodes in Gephi. (Note: there are more records than the 3094 persons because of multiple memberships.) Gephi will disregard duplicate Names (but not tuples).

<a id='ceda_name_attributes'></a>

In [59]:
ceda_name_attributes = pd.read_csv ('vw_hddt_ceda_name_attributes.csv')
ceda_name_attributes ['quaker'] = ceda_name_attributes ['quaker'].fillna(0).astype(np.int64)

In [60]:
ceda_name_attributes.iloc [0:10]

Unnamed: 0,Name,quaker,first_year,last_year,birth_year,death_year
0,William Adam,0,1844,1844,,
1,William (1) Adams,0,1844,1844,,
2,William (2) Adams,0,1858,1871,1820.0,1900.0
3,Louis Agassiz,0,1860,1871,1807.0,1873.0
4,Augustine Aglio,0,1843,1845,1777.0,1857.0
5,William Francis Harrison Ainsworth,0,1856,1860,1807.0,1896.0
6,Alexander Muirhead Aitken,0,1864,1871,,
7,Rutherford Alcock,0,1862,1871,1809.0,1897.0
8,William Aldam,1,1844,1848,1813.0,1890.0
9,William Allen,0,1858,1858,,


In [61]:
ceda_name_attributes.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3892 entries, 0 to 3891
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        3892 non-null   object
 1   quaker      3892 non-null   int64 
 2   first_year  3892 non-null   int64 
 3   last_year   3892 non-null   int64 
 4   birth_year  1528 non-null   object
 5   death_year  1639 non-null   object
dtypes: int64(3), object(3)
memory usage: 182.6+ KB
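
In the info above `birth_year` and `death_year` are reported as object columns with missing values. The `fillna(0)` fix used elsewhere in this notebook would apply here too; an alternative, sketched below on made-up rows, is pandas' nullable `Int64` dtype, which keeps unknown years as missing rather than 0. This is an assumption offered for comparison and is not the routine used in the HDDT.

```python
import pandas as pd

# Hypothetical rows; birth_year arrives as text with a missing value.
df = pd.DataFrame({"Name": ["Ann Example", "Ben Example"],
                   "birth_year": ["1820.0", None]})

# Convert to numbers first, then to the nullable integer dtype.
df["birth_year"] = pd.to_numeric(df["birth_year"], errors="coerce").astype("Int64")

print(df["birth_year"].dtype)
print(df)
```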


# 3.22 CEDA tuples with attributes (3892 records) #

### Dataframe ###

CEDA tuples with attributes attaches attributes to edges in Gephi.

<a id='ceda_tuples_attributes'></a>

In [62]:
ceda_tuples_attributes = pd.read_csv ('vw_hddt_ceda_tuples_attributes.csv')

In [63]:
ceda_tuples_attributes.iloc [0:10]

Unnamed: 0,Source,Target,first_year,last_year,birth_year,death_year
0,William Adam,ESL,1844,1844,,
1,William (1) Adams,ESL,1844,1844,,
2,William (2) Adams,ESL,1858,1871,1820.0,1900.0
3,Louis Agassiz,ESL,1860,1871,1807.0,1873.0
4,Augustine Aglio,ESL,1843,1845,1777.0,1857.0
5,William Francis Harrison Ainsworth,ESL,1856,1860,1807.0,1896.0
6,Alexander Muirhead Aitken,ESL,1864,1871,,
7,Rutherford Alcock,ESL,1862,1871,1809.0,1897.0
8,William Aldam,ESL,1844,1848,1813.0,1890.0
9,William Allen,ESL,1858,1858,,


In [64]:
ceda_tuples_attributes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3892 entries, 0 to 3891
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Source      3892 non-null   object
 1   Target      3892 non-null   object
 2   first_year  3892 non-null   int64 
 3   last_year   3892 non-null   int64 
 4   birth_year  1528 non-null   object
 5   death_year  1639 non-null   object
dtypes: int64(2), object(4)
memory usage: 182.6+ KB


# 3.23 Quakers (592 records) #

### Dataframe ###

<a id='Quakers'></a>

In [65]:
quakers = pd.read_csv ('vw_hddt_quakers.csv')
quakers ['birth_year'] = quakers ['birth_year'].fillna(0).astype(np.int64)
quakers ['death_year'] = quakers ['death_year'].fillna(0).astype(np.int64)

In [66]:
quakers.iloc [0:10]

Unnamed: 0,Name,birth_year,death_year
0,William Aldam,1813,1890
1,S Stafford Allen,1840,1870
2,Edward Backhouse,1808,1879
3,James (1) Backhouse,1794,1869
4,James Bell,1818,1872
5,Antonio Brady,1811,1881
6,William Bull,1828,1902
7,Charles Buxton,1823,1871
8,Henry Christy,1810,1865
9,William Clay,1791,1869


In [67]:
quakers.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Name        592 non-null    object
 1   birth_year  592 non-null    int64 
 2   death_year  592 non-null    int64 
dtypes: int64(2), object(1)
memory usage: 14.0+ KB


# 3.24 Quaker family relationships (2086 records) #

### Dataframe ###

<a id='person_person'></a>

In [68]:
person_relationships = pd.read_csv ('vw_hddt_person1_person2.csv')

In [69]:
person_relationships.iloc [0:10]

Unnamed: 0,Source,Target,relationship_type_id
0,William Aldam,x Fox,1
1,William Jun Aldam,x Fox,1
2,Frederick Alexander,R D Alexander,1
3,G W Alexander,R D Alexander,1
4,Henry Alexander,R D Alexander,1
5,R D Alexander,John M Candler,1
6,Thomas Allis,James Jun Backhouse,1
7,Thomas Allis,Francis Brown,1
8,Thomas Allis,Septimus Warner,1
9,R Arthington,J Gurney Barclay,1


In [70]:
person_relationships.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2086 entries, 0 to 2085
Data columns (total 3 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Source                2086 non-null   object
 1   Target                2086 non-null   object
 2   relationship_type_id  2086 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 49.0+ KB


# 3.25 Quaker immediate relationships (246 records) #

### Datatable ###

<a id='quaker_immediate'></a>

In [98]:
quaker_immediate_relationships = pd.read_csv ('vw_hddt_person_person_immediate.csv')

In [101]:
quaker_immediate_relationships.iloc [0:10]

Unnamed: 0,'Source',Target,immediate
0,Source,John M Albright,3
1,Source,Rachel Albright,3
2,Source,William Albright,3
3,Source,John M Albright,3
4,Source,William Albright,3
5,Source,John M Albright,3
6,Source,William Jun Aldam,3
7,Source,G W Alexander,3
8,Source,Henry Alexander,3
9,Source,Henry Alexander,3


In [102]:
quaker_immediate_relationships.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   'Source'   246 non-null    object
 1   Target     246 non-null    object
 2   immediate  246 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 5.9+ KB


# 3.26 Quaker close relationships (519 records) #

### Datatable ###

<a id='quaker_close'></a>

In [103]:
quaker_close_relationships = pd.read_csv ('vw_hddt_person_person_close.csv')

In [104]:
quaker_close_relationships.iloc [0:10]

Unnamed: 0,'Source',Target,close
0,Source,Christopher Bowley,2
1,Source,Robert Charleton,2
2,Source,Frederick H Fox,2
3,Source,Thomas Maw,2
4,Source,William Norton,2
5,Source,Algernon Peckover,2
6,Source,Cornelius Hanbury,2
7,Source,Edward Beck,2
8,Source,Cornelius Hanbury,2
9,Source,Martha Lucas,2


In [105]:
quaker_close_relationships.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 519 entries, 0 to 518
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   'Source'  519 non-null    object
 1   Target    519 non-null    object
 2   close     519 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 12.3+ KB


# 3.27 Quaker distant relationships (1321 records) #

### Datatable ###

<a id='quaker_distant'></a>

In [110]:
quaker_distant_relationships = pd.read_csv ('vw_hddt_person_person_distant.csv')

In [109]:
quaker_distant_relationships.iloc [0:10]

Unnamed: 0,'Source',Target,distant
0,Source,x Fox,1
1,Source,x Fox,1
2,Source,R D Alexander,1
3,Source,R D Alexander,1
4,Source,R D Alexander,1
5,Source,John M Candler,1
6,Source,James Jun Backhouse,1
7,Source,Francis Brown,1
8,Source,Septimus Warner,1
9,Source,J Gurney Barclay,1


In [108]:
quaker_distant_relationships.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1321 entries, 0 to 1320
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   'Source'  1321 non-null   object
 1   Target    1321 non-null   object
 2   distant   1321 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 31.1+ KB
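
The record counts of the three relationship tables (246 + 519 + 1321) add up to the 2086 records of section 3.24, which suggests that they partition `person_relationships` by `relationship_type_id`. The sketch below derives the three subsets in that way, which also keeps the person name in the Source column. The mapping 3 = immediate, 2 = close, 1 = distant is an assumption inferred from the values displayed above, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows standing in for person_relationships.
person_relationships = pd.DataFrame({
    "Source": ["Ann Example", "Ann Example", "Ben Example", "Cal Example"],
    "Target": ["Ben Example", "Cal Example", "Dee Example", "Dee Example"],
    "relationship_type_id": [1, 2, 3, 1],
})

# Assumed mapping, inferred from the immediate/close/distant columns shown above.
type_labels = {3: "immediate", 2: "close", 1: "distant"}

# One dataframe per relationship type.
subsets = {
    label: person_relationships[person_relationships["relationship_type_id"] == type_id]
    for type_id, label in type_labels.items()
}

for label, frame in subsets.items():
    print(label, len(frame))
```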


# 3.28 Quaker CEDA membership (tuples) (643 records) #

### Dataframe ###

<a id='quaker_ceda_tuples'></a>

In [80]:
quakers_ceda_tuples = pd.read_csv ('vw_hddt_quakers_ceda_tuples.csv')

In [81]:
quakers_ceda_tuples.iloc [0:10]

Unnamed: 0,Source,Target,religion_name,first_year,last_year
0,William Spicer Wood,APS,Quaker,1864,1867
1,William Spicer Wood,ASL,Quaker,1863,1871
2,William Spicer Wood,AI,Quaker,1863,1871
3,William Wilson,APS,Quaker,1838,1865
4,William Wilson,ASL,Quaker,1865,1866
5,James Wilson,APS,Quaker,1862,1867
6,James Wilson,ASL,Quaker,1865,1865
7,E T Wakefield,APS,Quaker,1853,1864
8,E T Wakefield,ASL,Quaker,1865,1868
9,John Ross,APS,Quaker,1839,1852


In [82]:
quakers_ceda_tuples.info ()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 643 entries, 0 to 642
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Source         643 non-null    object
 1   Target         643 non-null    object
 2   religion_name  643 non-null    object
 3   first_year     643 non-null    int64 
 4   last_year      643 non-null    int64 
dtypes: int64(2), object(3)
memory usage: 25.2+ KB


# 3.29 Quakers in the QCA #

In [83]:
quakers_qca = quakers_ceda_tuples[quakers_ceda_tuples["Target"] == "QCA"]

In [84]:
quakers_qca

Unnamed: 0,Source,Target,religion_name,first_year,last_year
21,Thomas (1) Hodgkin,QCA,Quaker,1839,1847
287,James Bowden,QCA,Quaker,1842,1847
288,William Nash,QCA,Quaker,1842,1847
289,Joseph Sturge,QCA,Quaker,1842,1847
290,William Jun Grimshaw,QCA,Quaker,1840,1847
291,Henry Knight,QCA,Quaker,1840,1847
293,Edward Paull,QCA,Quaker,1840,1847
294,Robert Jun Alsop,QCA,Quaker,1837,1847
295,Abram Rawlinson Barclay,QCA,Quaker,1837,1839
296,John Barclay,QCA,Quaker,1837,1839


In [85]:
quakers_qca.to_csv ('vw_hddt_ceda_quaker_qca.csv')

# 3.30 Quakers in the APS #

In [86]:
quakers_aps = quakers_ceda_tuples[quakers_ceda_tuples["Target"] == "APS"]

In [87]:
quakers_aps.iloc [0:10] 

Unnamed: 0,Source,Target,religion_name,first_year,last_year
0,William Spicer Wood,APS,Quaker,1864,1867
3,William Wilson,APS,Quaker,1838,1865
5,James Wilson,APS,Quaker,1862,1867
7,E T Wakefield,APS,Quaker,1853,1864
9,John Ross,APS,Quaker,1839,1852
10,J Robinson,APS,Quaker,1839,1840
12,William Horton Lloyd,APS,Quaker,1862,1862
14,Joseph Lister,APS,Quaker,1851,1855
16,Jonathan Hutchinson,APS,Quaker,1857,1866
19,William Holmes,APS,Quaker,1840,1867


In [88]:
quakers_aps.to_csv ('vw_hddt_ceda_quaker_aps.csv')

# 3.31 Quakers in the ESL #

In [89]:
quakers_esl = quakers_ceda_tuples[quakers_ceda_tuples["Target"] == "ESL"]

In [90]:
quakers_esl

Unnamed: 0,Source,Target,religion_name,first_year,last_year
13,William Horton Lloyd,ESL,Quaker,1844,1847
15,Joseph Lister,ESL,Quaker,1844,1847
23,Thomas (1) Hodgkin,ESL,Quaker,1844,1862
25,John Henry Gurney,ESL,Quaker,1860,1867
29,Charles Henry Fox,ESL,Quaker,1861,1871
32,William Fowler,ESL,Quaker,1851,1851
34,Robert Nicholas Fowler,ESL,Quaker,1851,1871
39,David Dale,ESL,Quaker,1860,1863
44,x Collier,ESL,Quaker,1844,1844
46,William Clay,ESL,Quaker,1861,1868


In [91]:
quakers_esl.to_csv ('vw_hddt_ceda_quaker_esl.csv')

# 3.32 Quakers in the ASL #

In [92]:
quakers_asl = quakers_ceda_tuples[quakers_ceda_tuples["Target"] == "ASL"]

In [93]:
quakers_asl

Unnamed: 0,Source,Target,religion_name,first_year,last_year
1,William Spicer Wood,ASL,Quaker,1863,1871
4,William Wilson,ASL,Quaker,1865,1866
6,James Wilson,ASL,Quaker,1865,1865
8,E T Wakefield,ASL,Quaker,1865,1868
11,J Robinson,ASL,Quaker,1865,1865
17,Jonathan Hutchinson,ASL,Quaker,1863,1871
20,William Holmes,ASL,Quaker,1865,1869
27,George Stacey Gibson,ASL,Quaker,1864,1866
37,James T J Doyle,ASL,Quaker,1865,1868
41,Henry Crowley,ASL,Quaker,1864,1871


In [94]:
quakers_asl.to_csv ('vw_hddt_ceda_quaker_asl.csv')

# 3.33 Quakers in the AI #

In [95]:
quakers_ai = quakers_ceda_tuples[quakers_ceda_tuples["Target"] == "AI"]

In [96]:
quakers_ai

Unnamed: 0,Source,Target,religion_name,first_year,last_year
2,William Spicer Wood,AI,Quaker,1863,1871
18,Jonathan Hutchinson,AI,Quaker,1863,1871
30,Charles Henry Fox,AI,Quaker,1861,1871
35,Robert Nicholas Fowler,AI,Quaker,1851,1871
42,Henry Crowley,AI,Quaker,1864,1871
53,William Bull,AI,Quaker,1867,1871
56,Antonio Brady,AI,Quaker,1864,1871
63,Edward Backhouse,AI,Quaker,1870,1871


In [97]:
quakers_ai.to_csv ('vw_hddt_ceda_quaker_ai.csv')