# End-of-Workshop Project
<br>

# Project 3: Family Trees
---
<br>

##### Programming Workshop for Scientists in Africa <<a href="https://pwsafrica.org">www.pwsafrica.org</a>>. 
##### Supported by the School of Computing Science, University of Glasgow.
##### Funded by the Global Challenges Research Fund and Scottish Funding Council.
<br>

##### PWSA2021. Python 3.x
---

<div class="alert alert-danger">
    
**This project is only suitable for teams that took part in track 2 in both weeks 1 and 2.** If you took part in track 1 in week 2, you should choose either project 1 or project 2.</div>

## Background

A family tree can be used to systematically describe the relationships in a family. Each person is represented by a small box, containing their name and possibly other information, like their date of birth. Each child is connected to their two parents (in most cases, their mother and father). Other connections can show marriages.

Usually only part of the family tree is shown, due to missing information. It would be impossible to show someone's whole family tree!

Here is part of the Simpsons' family tree:

<img src="images/simpsons.png" width=80%/>

## Input files

The `data` directory contains a data file named `family_tree.txt`. This file contains a list of important events in a fictitious town. The file is a chronological log, where each line records either a birth or a death.

Lines describing a birth are formatted like this:
```
<year>,<id>,born,<name>,<mother_id>,<father_id>
```

Lines describing a death are formatted like this:
```
<year>,<id>,died
```

The town decided to assign individual IDs to all people in order to avoid confusion relating to multiple people with the same name.

When information is not known (e.g., the `id` of the mother or father for a birth), this is recorded as `---` (three hyphens).
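
To make the two line formats concrete, here is a minimal sketch of parsing a single line. The names `Birth`, `Death`, and `parse_line` are hypothetical and not part of the project; the sketch also assumes that names never contain commas, which you should check against the data file.

```python
from typing import NamedTuple, Optional, Union

UNKNOWN = "---"  # marker the file uses for missing information


class Birth(NamedTuple):
    year: int
    person_id: str
    name: str
    mother_id: Optional[str]  # None when recorded as '---'
    father_id: Optional[str]  # None when recorded as '---'


class Death(NamedTuple):
    year: int
    person_id: str


def parse_line(line: str) -> Union[Birth, Death]:
    """Parse one log line into a Birth or Death record (hypothetical helper)."""
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) < 3:
        raise ValueError(f"malformed line: {line!r}")
    # int() raises ValueError itself if the year is not a number.
    year, person_id, event = int(fields[0]), fields[1], fields[2]
    if event == "born" and len(fields) == 6:
        name, mother_id, father_id = fields[3:]
        return Birth(
            year,
            person_id,
            name,
            None if mother_id == UNKNOWN else mother_id,
            None if father_id == UNKNOWN else father_id,
        )
    if event == "died" and len(fields) == 3:
        return Death(year, person_id)
    raise ValueError(f"malformed line: {line!r}")
```

Converting `---` to `None` at this point means the rest of the program never has to know about the file's marker for unknown values.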

## Your task

Your task is to write a program that allows for the management of family tree data. In particular, your program should:
- read in a family tree from an input data file, formatted as described above;
- store the family tree in an appropriate data structure;
- allow for the family tree to be queried; and
- display the family tree.

Your program should implement the basic management features first, allowing users to read in a family tree and display it, before adding queries. This notebook contains a cell for each of these stages.

You are expected to perform extensive error checking for your program, and to package your program as a set of classes, using object-oriented techniques. If you aren't yet comfortable with object-oriented programming, then you can use the procedural style.

Finally, you should try to make use of functional programming techniques in your solution, using, for example, `map`, `filter`, and `reduce` where these make sense.
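
As a reminder of how these three functions fit together, here is a small sketch on made-up records. The tuple layout `(name, birth_year, death_year)` is an assumption for illustration only; your own data structure will probably look different.

```python
from functools import reduce

# Hypothetical sample data: (name, birth_year, death_year or None if still alive).
people = [
    ("Debby Boon", 1650, 1694),
    ("Yatzil McGrue", 1604, 1678),
    ("Imhotep Jones", 1660, None),
]

# filter: keep only the people with a recorded death.
deceased = list(filter(lambda p: p[2] is not None, people))

# map: turn each remaining record into a lifespan in years.
lifespans = list(map(lambda p: p[2] - p[1], deceased))

# reduce: fold the lifespans into a single total.
total_years = reduce(lambda total, years: total + years, lifespans, 0)

print(lifespans)    # [44, 74]
print(total_years)  # 118
```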

## Your solution

You should tackle this problem in three broad stages:
- first, you need to write a program plan, and have this checked by one of the volunteers;
- next, you need to implement the basic structure of the program, following your plan;
- and finally, you can start to add various queries for your family tree data structure.

We have provided three sets of query tasks -- simple, advanced, and complex -- below. You don't need to implement all of these: use the different categories as a guide to how difficult they might be to implement.

You should consider which queries you might want to implement before thinking about your program plan: this will let you think about the data structures you need.

### Program plan

In the cell below, please write out your program plan. The following should be considered when thinking about your plan:
- describe any complex data structures you think you'll need, and explain why they are suitable;
- break the problem down into sub-problems: explain what each sub-problem is, and how they are organised within the wider problem;
- are there any parts of the problem that are particularly tricky? If so, explain those in detail;
- what are the possible errors that might occur? If you're going to handle them, how? Use examples as needed.

In [None]:
# write your program plan here

<div class="alert alert-danger">
    
Before proceeding, you need to have your program plan checked by a volunteer, via Gathertown. If you're having trouble finding a volunteer, reach out via Slack on the #pwsa2021 channel.
    
**It is important to have your plan checked: it'll save you time and effort later on.**    
</div>

### Implementation

Enter your basic family tree management program here. It should:
- read in the data file;
- store the family tree in a suitable data structure; and
- display the family tree.

You'll implement the separate queries in their own cells below.

In [1]:
# write your solution here

## Queries

Your implementation should first allow the user to read in the family tree from a data file, and display the family tree that is read in. Once you've done this, consider adding support for the queries that we've described here. We've loosely divided these queries into three sets, based on their difficulty.

### Simple

#### Census

Write a function, `census`, that takes a year, and returns a list of people alive in that year.

In [2]:
# write your solution here

#### Roll

Write a function, `roll`, that takes a year, and returns a string of the people alive in that year, along with their ages. The list should be ordered by last name.

For example, this function would return:
```
Imhotep Jones, age 69
Yatzil McGrue, age 34
Ad-Habip Smith, age 19
...
```
`roll` should make use of `census`.
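
Ordering by last name can be done with a sort key. The sketch below assumes that the last word of a name is the last name, and uses hypothetical `(name, age)` pairs in place of whatever your `census` function returns.

```python
def last_name(full_name: str) -> str:
    """Return the final word of a name, assumed here to be the last name."""
    return full_name.split()[-1]


# Hypothetical (name, age) pairs, standing in for the result of `census` plus ages.
alive = [("Yatzil McGrue", 34), ("Imhotep Jones", 69), ("Ad-Habip Smith", 19)]

# sorted() leaves `alive` untouched and returns a new, ordered list.
ordered = sorted(alive, key=lambda entry: last_name(entry[0]))
report = "\n".join(f"{name}, age {age}" for name, age in ordered)
print(report)
```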

In [3]:
# write your solution here

#### Finding by name

Write a function, `find_by_name`, that takes a name, and finds all of the people with the given name. The function should take an optional parameter, `year`, that, if specified, means that the function returns only those people who were alive in that year. The function should return a list of people, ordered by birth date, with the earliest first.

In [4]:
# write your solution here

What is the worst-case time complexity of your `find_by_name` function, in terms of $n$, where $n$ is the number of people in the family tree?

Could you use a different data structure to achieve $O(1)$ performance for this function? If not, why not? (_Hint: what if a number of people have the same name?_)
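
One structure worth considering is a dictionary from each name to the list of people who share it. The sketch below is only an illustration: the record layout, the ID `u000001`, and the function `find_ids_by_name` are assumptions. Note that the dictionary lookup is $O(1)$ on average, but returning and sorting $k$ matches still costs time that grows with $k$.

```python
from collections import defaultdict

# Hypothetical (id, name, birth_year) records.
records = [
    ("u340359", "Debby Boon", 1832),
    ("u296995", "Debby Boon", 1650),
    ("u000001", "Imhotep Jones", 1660),
]

# Build the index once: O(n) time, one list of matches per distinct name.
by_name = defaultdict(list)
for person_id, name, birth_year in records:
    by_name[name].append((birth_year, person_id))


def find_ids_by_name(name):
    """Look up a name in the index, then sort the k matches by birth year."""
    return [person_id for _, person_id in sorted(by_name.get(name, []))]


print(find_ids_by_name("Debby Boon"))  # ['u296995', 'u340359']
```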

In [5]:
# time complexity analysis here

#### Memorial

Write a function, `memorial`, that takes a year, and returns a string listing everyone who died prior to the year specified. The listing should be ordered by date of death, with the oldest date first. The string should be in the following format:

```
Yatzil McGrue died 1678, age 74
Tzecan Barston died 1679, age 56
...
```

In [6]:
# write your solution here

### Advanced

#### Parents

Write a function, `parents`, that returns the parents of a given person. If your solution uses object-oriented programming, this should be a method, and return `Person` objects, rather than IDs. Return `None` if the data is not known.

In [7]:
# write your solution here

#### Siblings

Write a function, `siblings`, that returns a list of the siblings of a given person, or `None` if the person is an only child. Remember to include half-siblings, who might, for example, have the same mother but a different father.

In [8]:
# write your solution here

#### Children

Write a function, `children`, that returns a list of children of a given person.

In [9]:
# write your solution here

#### Time complexity

Once you have implemented the `parents`, `siblings`, and `children` functions, try to determine the worst-case time complexity of each, in terms of $n$, where $n$ is the number of people in the family tree.

In [10]:
# time complexity analysis here

### Complex

For the purposes of this set of queries, we define the following terms:
- an **ancestor** is a parent, grandparent, great-grandparent, great-great-grandparent, and so on, of a given person;
- a **descendent** is a child, grandchild, great-grandchild, great-great-grandchild, and so on, of a given person;
- a **nearest common ancestor** of two people is the closest ancestor that two people share. For example, the nearest common ancestor of a niece and uncle is the niece's grandparent. The nearest common ancestor of two siblings is their parent.

Notice that, from these definitions, "ancestor" and "descendent" can be defined recursively. For example, an ancestor of person $x$ is either their parent, or an ancestor of their parent.
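
The recursive definition translates almost directly into code. The following is a sketch on toy data, not a solution to the queries: the `PARENTS` dictionary and the function `ancestors` are hypothetical, and the sketch assumes the data contains no cycles (nobody is their own ancestor).

```python
# Hypothetical toy data: each ID maps to the IDs of that person's known parents.
PARENTS = {
    "bart": ["homer", "marge"],
    "homer": ["abe", "mona"],
    "marge": [],
    "abe": [],
    "mona": [],
}


def ancestors(person_id):
    """Return the set of all known ancestors of person_id."""
    found = set()
    for parent_id in PARENTS.get(person_id, []):
        found.add(parent_id)           # a parent is an ancestor
        found |= ancestors(parent_id)  # so is every ancestor of that parent
    return found


print(sorted(ancestors("bart")))  # ['abe', 'homer', 'marge', 'mona']
```

The recursion stops on its own when it reaches a person with no known parents, because the loop body never runs for them.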

From these definitions, implement the following queries:
- `is_ancestor` that takes two people, `a` and `b`, and returns `True` if person `a` is an ancestor of person `b`

In [11]:
# write your solution here

- `is_descendent` that takes two people, `a` and `b`, and returns `True` if person `a` is a descendent of person `b`

In [12]:
# write your solution here

- `nearest_common_ancestor` that takes two people, `a` and `b` and returns the nearest common ancestor of `a` and `b`. There may be more than one nearest common ancestor (e.g., for siblings, both parents are nearest common ancestors). If so, return only one.

In [13]:
# write your solution here

- `related` that takes two people, `a` and `b`, and returns `True` if `a` and `b` have a known common ancestor.

In [14]:
# write your solution here

#### Time complexity

Once you have implemented the methods `is_ancestor`, `is_descendent`, `nearest_common_ancestor`, and `related`, try to calculate the worst-case time complexity of each, in terms of $n$, where $n$ is the number of people in the family tree.

_Hint: calculating the time complexity of `nearest_common_ancestor` might be a challenge. To think about the worst-case running time of your code, think about the worst possible structure of the family trees that might connect persons `a` and `b`. For example, what if the two people only have one parent, who only has one parent, and so on? Which structure of families would make your code take the longest time?_

In [15]:
# time complexity analysis here

### Additional tasks

If you are looking for more challenges, here are some ideas to get you started. Feel free to extend your program in other ways: discuss all of the possibilities with your volunteer.

#### Statistical information

Write some functions to calculate statistical information about the family tree. For example:
- who is the person who lived the longest?
- who is the person with the most children?
- what is the average (mean) lifespan of people in the town?

In [16]:
# write your solution here

#### Visualisation

Printing out the entire family tree will probably be messy. Write a method that, given a person in the family tree, draws a diagram of only their children and grandchildren. You can use `print` to do this.
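
One simple way to draw such a diagram with `print` is to indent each generation. This is a sketch under assumptions: the `CHILDREN` dictionary and the function `draw_descendants` are hypothetical, and your own version would use your `children` query instead.

```python
# Hypothetical toy data: each name maps to a list of that person's children.
CHILDREN = {
    "Abe": ["Homer"],
    "Homer": ["Bart", "Lisa", "Maggie"],
}


def draw_descendants(person, depth=0, max_depth=2):
    """Return indented lines for a person and their descendants down to max_depth."""
    lines = ["    " * depth + person]
    if depth < max_depth:
        for child in CHILDREN.get(person, []):
            lines.extend(draw_descendants(child, depth + 1, max_depth))
    return lines


print("\n".join(draw_descendants("Abe")))
```

With the default `max_depth=2`, the output stops at grandchildren, which matches the task described above.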

In [17]:
# write your solution here

#### Interactive interface

Write an interactive, command-line interface that lets users query the people in the family tree. Commands will correspond to the methods you have already written. This could look something like this:

  ```
  Family Tree Software v1.0
  Enter command, then press enter.
  
  > CENSUS 1900
  ...
  
  > FINDBYNAME Debby Boon
     u296995 Debby Boon born 1650 died 1694,
     u340359 Debby Boon born 1832 died 1835,
     u340583 Debby Boon born 1833 died 1835,
     u353724 Debby Boon born 1894 died 1898
    
  > CHILDREN u296995
  ...
  
  > PARENTS u296995
  ...
  
  > IS_ANCESTOR u297081 u297081
  TRUE
  ```
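
A common way to structure such an interface is a dictionary that maps command names to handler functions. The sketch below is one possible shape, not a required design: `run_command`, `census_command`, `COMMANDS`, and the `QUIT` command are all assumptions, and the handler is a placeholder for your own `census` function.

```python
def run_command(line, commands):
    """Split a command line into a name and arguments, then dispatch it."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0].upper(), parts[1:]
    handler = commands.get(name)
    if handler is None:
        return f"Unknown command: {name}"
    try:
        return handler(*args)
    except (TypeError, ValueError) as error:
        # Wrong number of arguments, or an argument that could not be converted.
        return f"Error in {name}: {error}"


# Placeholder handler; a real one would call your own `census` function.
def census_command(year):
    return f"(census for {int(year)} goes here)"


COMMANDS = {"CENSUS": census_command}


def main():
    print("Family Tree Software v1.0")
    print("Enter command, then press enter.")
    while True:
        try:
            line = input("> ")
        except EOFError:
            break
        if line.strip().upper() == "QUIT":  # QUIT is an assumed extra command
            break
        print(run_command(line, COMMANDS))
```

Call `main()` to start the loop. Keeping the dispatch logic in `run_command` separate from `input` makes it possible to test each command without typing at the prompt. A command such as `FINDBYNAME Debby Boon` arrives as two arguments, so its handler would need to join them back into one name.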

In [1]:
# write your solution here