# Overview
This Python notebook answers a few high-level questions about the genealogy data you import: who is in it, where they were born and died, what they were named, how long they lived, and how many children they had.

## Getting Started
To use this notebook, run the first cell and enter the path to your GEDCOM (`.ged`) file when prompted.


In [3]:
from parser import parse_file
import pandas as pd

# Path to your `.ged` file
file_path = input()  # Prompts for the path to your GEDCOM file
pgraph = parse_file(file_path)

print("Data imported")

df = pd.DataFrame([p.df() for p in pgraph.values()], columns=[
    'name',
    'birth_date',
    'birth_place',
    'death_date',
    'death_place',
    'sex',
    'child_count',
    'years_lived',
    'mother',
    'father'])

print("Pandas dataframe created")


Parsing data...
All data parsed
Data imported
Pandas dataframe created
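
Because the path comes from `input()`, a stray quote or a wrong extension only surfaces once parsing starts. The sketch below shows one way to validate the path first. The helper name `check_gedcom_path` is hypothetical and is not part of this notebook's `parser` module; it assumes the file uses the `.ged` extension.

```python
from pathlib import Path


def check_gedcom_path(raw: str) -> Path:
    """Hypothetical helper: clean up and validate a user-supplied path."""
    # Remove surrounding whitespace and quotes left over from copy-pasting.
    path = Path(raw.strip().strip('"').strip("'")).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path}")
    # Assumption: GEDCOM exports use the .ged extension.
    if path.suffix.lower() != ".ged":
        raise ValueError(f"Expected a .ged file, got {path.suffix!r}")
    return path
```

With this in place, the first cell could call `parse_file(check_gedcom_path(input()))` instead of passing the raw string.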


## High-Level Metrics
This section reports headline counts for the people in your database: the total, and the breakdown by recorded sex.

In [4]:
print("How many people in this database?")
print(len(df.index))

print("How many men in this database?")
print(df[df['sex']=='M'].shape[0])

print("How many women in this database?")
print(df[df['sex']=='F'].shape[0])

print("How many non-specified sex in this database?")
print(df[~df['sex'].isin(['F','M'])].shape[0])

How many people in this database?
4070
How many men in this database?
2092
How many women in this database?
1930
How many non-specified sex in this database?
48
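
The three filtered counts above can also come from a single `value_counts()` call. This is a minimal sketch on a toy frame with made-up values; the `Unspecified` label is an illustrative choice, not something the notebook defines.

```python
import pandas as pd

# Toy stand-in for the notebook's dataframe (hypothetical values).
toy = pd.DataFrame({"sex": ["M", "F", "F", None, "U"]})

# Keep 'M' and 'F'; group everything else (including missing values)
# under a single 'Unspecified' label.
sex_clean = toy["sex"].where(toy["sex"].isin(["M", "F"]), "Unspecified")

sex_counts = sex_clean.value_counts()
print(sex_counts)
```

Running the same two lines against `df['sex']` should reproduce the male, female, and non-specified totals in one table.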


## Places
This section shows the most frequently recorded birth and death places.

In [5]:
print("Where were people born?")
print(df[~df['birth_place'].str.endswith('alogy')]['birth_place'].value_counts().head(20))

Where were people born?
                                           3318
the Ulu-Hema Genealogy (Big Island)          62
the Ulu-Hema Genealogy (Maui)                40
Honolulu, Hawaii                             27
Honolulu, Hi                                 27
Lahaina, Maui                                24
the Ulu Genealogy estimated timeline         17
Hamakua District, Hawaii, Hawaii             16
Hawaii                                       16
Niulii, Haw, Hi                              14
Honolulu, Oahu                               13
Niu Hawaii                                   11
Waipio Valley, Haumakua, Hawaii, Hawaii      11
Waipio Valley, Hamakua, Hawaii                9
Kailua, Kona, Hawaii, Hawaii                  8
Lahaina, Maui, Hawaii                         8
Pololu, Kohala, Hawaii, Hawaii                8
Hilo, Hawaii                                  7
Kalopa, Hawaii                                7
Paauilo, Hamakua, Hawaii, Hawaii              6
Name: birth_place, dtype: int64

In [6]:
print("Where did people die?")
print(df[~df['death_place'].str.endswith('alogy')]['death_place'].value_counts().head(20))


Where did people die?
                                           3850
Honolulu, Hawaii                             31
Honolulu, Oahu                               22
Lahaina, Maui                                11
Honolulu, Hi                                 11
Waipio Valley, Hamakua, Hawaii                5
Hilo, Hawaii                                  5
Honolulu, Honolulu, Hawaii                    4
Niu Hawaii                                    4
Kailua, Kona, Hawaii Island                   3
Honolulu, Oahu,                               3
Honolulu, Oahu, Hawaii                        3
Waipio Valley, Haumakua, Hawaii, Hawaii       2
Kukihaela, Hamakua, Hawaii                    2
Queen's Hospital, Honolulu, Oahu              2
Honolulu                                      2
Paauilo, Hawaii                               2
Honokaa, Hawaii                               2
Lapahoehoe, Hawaii, Hawaii                    2
Maui                                          2
Name: death_place, dtype: int64
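
Two things stand out in the outputs above. The blank place is by far the largest bucket (3318 births, 3850 deaths), and one town is split across several spellings, such as `Honolulu, Hawaii`, `Honolulu, Hi`, and `Honolulu, Oahu`. The sketch below drops blanks and groups by the text before the first comma. That key is a crude assumption: it merges the Honolulu variants, but it would also treat a bare `Hawaii` as a town.

```python
import pandas as pd

# Toy stand-in for df['birth_place'] (values copied from the output above).
places = pd.Series([
    "", "Honolulu, Hawaii", "Honolulu, Hi", "Honolulu, Oahu",
    "Lahaina, Maui", "Lahaina, Maui, Hawaii", "",
])

# Treat missing values as blank, then drop all blank entries.
places = places.fillna("")
known = places[places.str.strip() != ""]

# Use the first comma-separated component as a rough town key.
town = known.str.split(",").str.get(0).str.strip()

town_counts = town.value_counts()
print(town_counts)
```

A proper fix would map each spelling to a canonical place name, but even this rough key makes the ranking easier to read.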

## Names
This section shows the most and least common first names in your genealogy, where the first name is taken to be the first word of each person's recorded name.

In [7]:
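# What are the most common names?
# Use the first whitespace-separated token of each name as the first name
name_df = df['name'].str.split().str.get(0)
# Exclude records whose name is the placeholder 'LIVING'
mask = ~name_df.eq('LIVING')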
print("What are the most common names?")
print(name_df[mask].value_counts().head(20))

What are the most common names?
John         42
William      31
Elizabeth    21
Mary         21
George       20
Charles      19
Joseph       13
Maria        13
Shaw         13
Sarah        12
Henry        12
David        12
Annie        11
Robert       11
James        11
Samuel       10
Edward       10
Isaac        10
Albert        8
Alexander     7
Name: name, dtype: int64


In [8]:
print("What are the least common names?")
print(name_df[mask].value_counts().tail(20))

What are the least common names?
Kumalae               1
Moku-a-Hualeiakea     1
Akahiilikapu          1
Haua                  1
Kahakumakaliua        1
Kawaihalaniwailuau    1
Kapohelemai           1
Makuakaumanamana      1
Kanakeawe             1
Kapukamola            1
Laieaku               1
Keakalaulani          1
Ho'opiliahae          1
Umiokalani            1
Hoolaaikaiwi          1
Kukukalani-o-Pae      1
Hakaukalalapuakea     1
Iliilikikuahine       1
Pupuakea              1
Kaakau                1
Name: name, dtype: int64
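
Every name in the tail above occurs once, so `tail(20)` shows an arbitrary 20 of the singletons and not a ranking. A more informative figure is how many distinct first names occur exactly once. This is a minimal sketch on toy data; the variable names are illustrative.

```python
import pandas as pd

# Toy stand-in for name_df (first names, with the 'LIVING' placeholder).
first_names = pd.Series(["John", "John", "Mary", "Kumalae", "Haua", "LIVING"])
first_names = first_names[first_names != "LIVING"]

counts = first_names.value_counts()

# Names that appear exactly once in the data.
singletons = counts[counts == 1]

print(f"{len(singletons)} of {len(counts)} distinct names occur once")
```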


## Health
This section looks at how long your ancestors lived and how many children they had.

In [9]:
print("What is the average life expectancy?")
print(df['years_lived'].mean())

print("Who lived the longest?")
df.sort_values(by=['years_lived'],ascending=False).head(20)

# TODO: How many people born/died in the same place?

# TODO: Of those who moved, what were the common migration patterns?

# TODO: How did migration patterns change over time?

# TODO: Count of people by century


What is the average life expectancy?
54.646408839779006
Who lived the longest?


Unnamed: 0,name,birth_date,birth_place,death_date,death_place,sex,child_count,years_lived,mother,father
2146,Ruth Cecilia Mitchell,31 Aug 1891,"Kola, Kauai",19 Feb 1993,,M,0,102.0,Ruth Aulani Mahoe,Louis Murie Mitchell
857,Meli Kahiwa Swinton,24 Jun 1823,"Kalae, Moloka’i",21 Apr 1925,"Lahaina, Maui",F,14,102.0,Kaumeaha'ulewaliekahakawai,Harry Swinton
3762,Kamakahei (Kamakihei),ABT 1760,"Hamakua District, Hawaii, Hawaii",1860,,F,5,100.0,,
2647,Naili Pohakukahi,17 Mar 1819,"Palaueka, Holualoa, North Kona",12 Nov 1919,,M,0,100.0,"Kaupekamoku (Kaupekamoku II, Kaupe-ka-moku, Ka...",Pohakukahi
1122,Moses Kahue Kupukaa (Paakahili),6 Feb 1830,"Waiapuke, Kohala, Hawaii, HI",10 Nov 1928,"Kalaupapa, Molokai, HI",M,13,98.0,Hiiakaikapoliopele,Paakahili
258,Abbie Bannister,17 Sep 1866,,1 Jun 1962,"Honolulu, Hi",F,3,96.0,Victoria Kaailama Adams,Andrew Bannister
301,Louisa Kekahilinaniopauahi Adams,26 Nov 1888,"Heeia, Oahu, Hi",19 Jan 1983,"St. Francis Hospital, Oahu, Hi",F,0,95.0,Anna Kalili Akona,Isaac Kapulealii Loakealii Jr Adams
452,Abigail Kawahinepoaimoku Kaaeae,9 Apr 1875,"Kapulena, Hamakua District, Hawaii",15 Sep 1968,"Paauilo, Hamakua, Hawaii, Hawaii",F,0,93.0,Nawahinelua Mailou,Timothy Paaluhi Kaaeae (Kaaeoe)
1939,Charles Reed Bishop,25 Jan 1822,"Glen Falls, Warren, New York, USA",7 Jun 1915,"Berkeley, California, USA",M,0,93.0,,
1219,Mukoi (Makaoi),ABT 1806,"Keokea, Hawaii, Hawaii",ABT 1898,,M,8,92.0,,
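
One of the TODOs in the cell above is a count of people by century. The birth dates in the table come in several shapes (`31 Aug 1891`, `ABT 1760`, `1737`, `Jul 1794`), so the sketch below takes the first four-digit run as the year. This is an assumption: it ignores date ranges and any year with fewer than four digits.

```python
import pandas as pd

# Toy stand-in for df['birth_date'] (formats copied from the table above).
birth_dates = pd.Series(["31 Aug 1891", "ABT 1760", "1737", "Jul 1794", "", None])

# Extract the first 4-digit run; rows without one become NaN.
year_text = birth_dates.str.extract(r"(\d{4})", expand=False)
year = pd.to_numeric(year_text, errors="coerce")

# Label each year by the start of its hundred-year block, e.g. 1891 -> 1800.
century = (year // 100 * 100).dropna().astype(int)

century_counts = century.value_counts().sort_index()
print(century_counts)
```

Here `1700` means the years 1700 to 1799. Rows with no usable year are dropped, so the counts will sum to less than the number of people.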


In [10]:
print("How many children on average did people have?")
print(df['child_count'].mean())

print("Who had the most children?")
df.sort_values(by=['child_count'],ascending=False).head(20)

How many children on average did people have?
1.2891891891891891
Who had the most children?


Unnamed: 0,name,birth_date,birth_place,death_date,death_place,sex,child_count,years_lived,mother,father
488,Kekaulike Kalani-kui-hono-i-ka-moku,,,,,M,18,,LIVING,Kaulahea II (Kaulaheanuiokamoku II)
909,Alexander Pollard Hussey,4 Apr 1825,"Nantucket, Massachusetts",10 May 1896,"Kohala, Hawaii",M,16,71.0,Lydia Emmet Pollard,Joseph Starbuck Hussey
576,Ellarene Kapapaihaleonaalii (Papaihaleonaalii)...,12 Nov 1881,"Waiohinu, Ka'u, Hawaii",28 Feb 1946,"Honolulu, Hawaii",F,16,65.0,Jeanette Ka'umekekoi Keawepooole,Wahinekona Kaawa (Moi Kaawa)
227,Alexander Adams,27 Dec 1780,"Abroath, Angus, Scotland",27 Oct 1871,Niu Hawaii,M,15,91.0,Jean Adams,John Fyfe
492,Kahekili (II) (Kahekilinuiahumanu III),1737,,Jul 1794,"Ulukou, Waikiki, Oahu",M,15,57.0,"Kekuiapoiwa (Kekuaipoiwa I, Kekuiapoiwa Nui , ...",Kekaulike Kalani-kui-hono-i-ka-moku
655,Keawenui-a-Umi,ABT 1648,the Ulu-Hema Genealogy,,,M,15,,"Kapulani-Nui (Kapukini I, Kapulani-o-Liloa)",Umi (Umi-a-Liloa I)
940,Kaai-Kaula-Kalei-Kau-Welaha-Makanoe Naweluokek...,ABT 1838,"Niulii, Kohala, Hawaii",29 Apr 1916,"Niulii, Kohala, Hawaii",F,14,78.0,Naaiokauahi Opalaau,Naweluokekikipaa
856,Abel Keliionuuanu (Keliimakekauonuuanu) Makekau,6 Oct 1819,"Kailua, Kona, Hawaii, Hawaii",16 Oct 1907,"Kapulena, Hamakua, Hawaii",M,14,88.0,"Kumiaakea (Kunuiakea, Kekuhemahemaanaaialii)",Naohuleolua
857,Meli Kahiwa Swinton,24 Jun 1823,"Kalae, Moloka’i",21 Apr 1925,"Lahaina, Maui",F,14,102.0,Kaumeaha'ulewaliekahakawai,Harry Swinton
1123,Sarah Kalai Keawehawaii,14 Jun 1857,"Pololu, Kohala, Hawaii, Hawaii",9 Oct 1932,"Makanikahio, Kohala, Hawaii, Hawaii",F,13,75.0,Apualaulu Kai-a,Keawehawaii
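
The average above is taken over everyone in the database, including people with no recorded children, which pulls it down. The sketch below compares it with the mean and median among people who have at least one recorded child. The numbers are toy values, not results from this dataset.

```python
import pandas as pd

# Toy stand-in for df['child_count'] (hypothetical values).
child_count = pd.Series([0, 0, 0, 2, 4, 18])

# Mean over everyone, as in the cell above.
overall_mean = child_count.mean()

# Restrict to people with at least one recorded child.
parents = child_count[child_count > 0]
parent_mean = parents.mean()
parent_median = parents.median()

print(overall_mean, parent_mean, parent_median)
```

The median is worth reporting alongside the mean because a few very large families, like those at the top of the table, skew the mean upward.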
