## Assignment 3 (Group)

For this task, first download the `data.txt` file from MyLearn and place it in your Jupyter environment, in the same directory as this notebook.

The following lines of code create a class `Person`. A person is described by a name, a sex, and a list that stores the children of this person (again as `Person` objects).

In [2]:
class Person:
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex
        self.children = []

Here, a series of people and their relationships (`parentOf`) are read from the file `data.txt`. They are stored in the dictionary `familyTrees`, which maps each name to the corresponding `Person` object:

$ familyTrees := \{jan \rightarrow \{name \rightarrow jan, sex \rightarrow male, children \rightarrow [...]\}, ...\} $
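
The loading cell slices each row between `(` and `)`, which suggests lines of the form `male(jan)` or `parentOf(jan,marko)`. That line format is an assumption inferred from the code, not taken from `data.txt` itself. Under that assumption, a minimal sketch of the parsing step could look like this; `parse_fact` is a hypothetical helper, not part of the assignment:

```python
def parse_fact(row):
    """Split a line such as 'parentOf(jan,marko)' into its predicate and arguments.

    Assumes the hypothetical format 'predicate(arg1,arg2,...)'.
    """
    row = row.strip()
    predicate = row[:row.index('(')]
    inner = row[row.index('(') + 1:row.index(')')]
    # Strip whitespace around each argument in case the file uses 'a, b'.
    args = [arg.strip() for arg in inner.split(',')]
    return predicate, args


print(parse_fact('male(jan)'))             # ('male', ['jan'])
print(parse_fact('parentOf(jan, marko)'))  # ('parentOf', ['jan', 'marko'])
```

Stripping each argument is a defensive choice: a plain `split(',')` would keep a leading space in the child's name and the dictionary lookup would then fail.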


In [3]:
familyTrees = {}
names = []

# In a first iteration, we add all people
with open('data.txt') as f:
    for row in f:
        if row.startswith('female'):
            name = row[row.index('(')+1:row.index(')')]
            familyTrees[name] = Person(name, 'female')
        elif row.startswith('male'):
            name = row[row.index('(')+1:row.index(')')]
            familyTrees[name] = Person(name, 'male')
            

# In another iteration, we add all children 
with open('data.txt') as f:
    for row in f:
        if row.startswith('parentOf'):
            line = row[row.index('(')+1:row.index(')')]
            names = line.split(',')
            familyTrees[names[0]].children.append(familyTrees[names[1]])

In [4]:
# Check the number of recorded people
assert len(familyTrees) == 2000

In [5]:
# access a single person in familyTrees via the name as key
p1 = familyTrees['jan']

print("Name: ", p1.name)
print("Sex: ", p1.sex)
print("Children:")
for p in p1.children:
    print(p.name)

Name:  jan
Sex:  male
Children:
marko
anna15


## Task 1

* Calculate the maximum and average number of children in this list of people and the male/female ratio.
* Complete the function `describe`: This function returns a list containing three `float` entries. 
    - Entry 0 contains the maximum number of children across all people in familyTrees.
    - Entry 1 contains the average number of children over all people in familyTrees.
    - Entry 2 contains the male/female ratio of all entries (i.e. the number of males divided by the number of females).
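
To make the three entries concrete, here is a small sketch on made-up data. The four people and their relations are hypothetical and only serve to show what each entry measures:

```python
class Person:
    def __init__(self, name, sex):
        self.name = name
        self.sex = sex
        self.children = []


# Hypothetical toy data: two parents who share two children.
toy = {name: Person(name, sex) for name, sex in [
    ('ann', 'female'), ('bob', 'male'), ('cy', 'male'), ('dee', 'female')]}
toy['ann'].children = [toy['cy'], toy['dee']]
toy['bob'].children = [toy['cy'], toy['dee']]

child_counts = [len(p.children) for p in toy.values()]
males = sum(p.sex == 'male' for p in toy.values())
females = sum(p.sex == 'female' for p in toy.values())

stats = [
    float(max(child_counts)),               # entry 0: maximum number of children
    sum(child_counts) / len(child_counts),  # entry 1: average number of children
    males / females,                        # entry 2: male/female ratio
]
print(stats)  # [2.0, 1.0, 1.0]
```

Note that the ratio divides by the number of females, so data without any female entries would raise a `ZeroDivisionError`.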

In [6]:
def describe(persons):
    total_children = 0
    max_children = 0
    male_count = 0
    female_count = 0
    
    for person in persons.values():
        num_children = len(person.children)
        total_children += num_children
        
        if num_children > max_children:
            max_children = num_children
        
        if person.sex == 'male':
            male_count += 1
        elif person.sex == 'female':
            female_count += 1
    
    # Compute the aggregates once, after the loop has seen every person.
    average_children = total_children / len(persons)
    male_female_ratio = male_count / female_count
        
    return [max_children, average_children, male_female_ratio]

description = describe(familyTrees)
print(f"Maximum number of children: {description[0]}")
print(f"Average number of children: {description[1]}")
print(f"Male/Female ratio: {description[2]}")



Maximum number of children: 5
Average number of children: 1.354
Male/Female ratio: 1.034587995930824


In [108]:
# Do not edit this cell.

## Task 2

* What is the total number of occurrences of people with a given first name (e.g.: `"stefan"`) in the data?
* Complete the function `count` below: populate a dictionary with the unique first name as the key and the number of occurrences in `familyTrees` as the value.
* Note that the first names have numerical suffixes (e.g. `"stefan3"`), which you should ignore in the count (`"stefan3"` counts as an occurrence of `"stefan"`).
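
As an illustration of ignoring the suffixes, the sketch below strips trailing digits with `str.rstrip` and tallies the results with `collections.Counter`. The sample names and the helper `base_name` are made up for this example:

```python
from collections import Counter


def base_name(name):
    # rstrip removes any trailing characters from the given set, i.e. all trailing digits.
    return name.rstrip('0123456789')


sample_names = ['stefan', 'stefan3', 'anna15', 'anna', 'jan']
counts = Counter(base_name(name) for name in sample_names)

print(counts['stefan'], counts['anna'], counts['jan'])  # 2 2 1
```

`Counter` is a `dict` subclass, so it still satisfies the requirement of returning a dictionary keyed by first name.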

In [7]:
def count(persons):
    name_counts = {}

    for person in persons.values():
        name = person.name.rstrip('1234567890') 

        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1
    
    return name_counts

In [8]:
assert count(familyTrees)['stefan'] == 20


## Task 3

* How many people are there without ancestors (*root nodes*)?
  - Complete `findRoots`: The function returns a list of all people who have no ancestors (as `Person` objects).
* Which family tree is the largest one?
  - Complete `findLargestFamily`: The function returns the name (*string*) of the person who spans the largest family tree, i.e. has the most descendants.
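
Both ideas can be tried on a small, made-up family. In the sketch below the `Person` class is a simplified stand-in (name and children only), and `unique_descendants` is a hypothetical helper that tracks visited names, so a person who is reachable along two paths is only counted once:

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.children = []


# Hypothetical toy data: g1 and g2 are the parents of p, p has two children,
# and 'solo' is unrelated to everyone else.
people = {name: Person(name) for name in ['g1', 'g2', 'p', 'c1', 'c2', 'solo']}
people['g1'].children.append(people['p'])
people['g2'].children.append(people['p'])
people['p'].children.extend([people['c1'], people['c2']])


def unique_descendants(person):
    """Return the names of all descendants, counting each person once."""
    seen = set()
    stack = list(person.children)
    while stack:
        current = stack.pop()
        if current.name in seen:
            continue
        seen.add(current.name)
        stack.extend(current.children)
    return seen


# A root is anyone who never appears in another person's children list.
has_parent = {child.name for p in people.values() for child in p.children}
roots = sorted(name for name in people if name not in has_parent)

print(roots)                                  # ['g1', 'g2', 'solo']
print(len(unique_descendants(people['g1'])))  # 3
```

Whether the visited set changes any result depends on the data: it only matters if some person can be reached from the same ancestor along more than one path.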

In [9]:
def findRoots(persons):
    all_people = set(persons.keys())
    children = set()
    for person in persons.values():
        for child in person.children:
            children.add(child.name)
    
    roots = all_people - children
    return [persons[name] for name in roots]

In [112]:
# DO NOT ALTER OR DELETE THIS CELL! 

In [10]:
def findLargestFamily(persons):
    def count_descendants(person):
        count = 0
        stack = [person]
        while stack:
            current_person = stack.pop()
            count += 1
            stack.extend(current_person.children)
        return count

    largest_family = None
    max_descendants = 0
    
    for person in persons.values():
        num_descendants = count_descendants(person)
        if num_descendants > max_descendants:
            max_descendants = num_descendants
            largest_family = person.name
    
    return largest_family

In [11]:
findLargestFamily(familyTrees)

'selina'

In [12]:
# DO NOT ALTER OR DELETE THIS CELL! 

## Task 4

* Determine the longest generation path.
* Complete `findSinglePath` and `findLongestPath`.
* `findSinglePath` should return the number of generations spanned by a person and their descendants.
  * If the person has no descendants, 1 is returned.
  * If the person has descendants, a value > 1 is returned: the result increases by 1 for each further generation (no children: 1; children: 2; grandchildren: 3; etc.).
* `findLongestPath` returns the length of the longest generation path. 
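
The counting rule can be checked on a tiny, made-up tree. The sketch below is one possible variant, not the required solution: the helper `depth` and its cache are assumptions added here so that each person's depth is computed only once when every person is checked.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.children = []


# Hypothetical toy data: a -> b -> c, and a -> d.
people = {name: Person(name) for name in 'abcd'}
people['a'].children = [people['b'], people['d']]
people['b'].children = [people['c']]


def depth(person, cache):
    """Generations spanned by `person`: 1 for a leaf, plus 1 per further generation."""
    if person.name not in cache:
        # default=0 covers people without children, so a leaf yields 1.
        cache[person.name] = 1 + max(
            (depth(child, cache) for child in person.children), default=0)
    return cache[person.name]


cache = {}
longest = max(depth(p, cache) for p in people.values())

print(cache['c'], cache['b'], cache['a'])  # 1 2 3
print(longest)                             # 3
```

Because the function is recursive, a very deep tree could hit Python's recursion limit; for the ten generations reported in this notebook that is not a concern.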

In [20]:
def findSinglePath(p):
    if not p.children:
        return 1
    return 1 + max(findSinglePath(child) for child in p.children)

def findLongestPath(persons):
    return max(findSinglePath(person) for person in persons.values())

In [21]:
# DO NOT ALTER OR DELETE THIS CELL!
assert findSinglePath(p1) == 2

In [22]:
print(findSinglePath(familyTrees['selina']))
print(findLongestPath(familyTrees))

10
10
