# CDCS Summer School
## Additional Examples: Collections, List Comprehensions and Loops

Once upon a time there was an ancient civilisation...the land of the penguins. The land spanned far across the world and was vast and icy, with majestic glaciers, snow-capped mountains and regal penguins. Sadly, over the decades the land was lost, but many led expeditions to recover its ancient texts. Explorers navigated icy terrain, examined ancient artifacts and climbed the icy cliffs. Finally, one day they found a grand temple which hid the treasures they sought. The great explorer Christopher Pingu discovered the ancient texts inside and, with deep concentration, examined them and wrote down information in the icy caves...

Even to this day, the texts are still examined and taught in every part of penguin school. In particular, computing classes in every school use the data to teach penguins around the world how to bridge the past and present through data analysis and coding. One day a portal opened up between the penguin and human worlds and only one thing passed through it to the human world...the documents of the ancient civilisation of the land of the penguins!

In [1]:
documents = [
    ("Letter from Emperor Icebeak", "Letter", "Emperor Icebeak", 1203),
    ("Diary of Explorer Frostfeather", "Diary", "Explorer Frostfeather", 1302),
    ("Proclamation of the Great Snowfall", "Edict", "Council of Penguins", 1150),
    ("Chronicler's Tale of the Frozen Flock", "Chronicle", "Chronicler Flipper", 1401),
    ("Letter from Admiral Snowclaw", "Letter", "Admiral Snowclaw", 1256),
    ("Edict of the Eternal Winter", "Edict", "Empress Frostwing", 1189),
    ("Diary of Fisher Penguino", "Diary", "Fisher Penguino", 1320),
    ("Speech of Unity at Iceberg Summit", "Speech", "Chancellor Icybeak", 1433),
    ("Chronicler's Account of the Great Migration", "Chronicle", "Chronicler Flipper", 1398),
    ("Treaty of the Icy Shores", "Edict", "Council of Penguins", 1280),
    ("Diary of Scholar Snowfoot", "Diary", "Scholar Snowfoot", 1277),
    ("Speech at the Glacial Gathering", "Speech", "Emperor Icebeak", 1225),
    ("Letter from Healer Frostbill", "Letter", "Healer Frostbill", 1345),
    ("Chronicler's Record of the Frost Wars", "Chronicle", "Chronicler Flipper", 1403),
    ("Proclamation of the Icebound Pact", "Edict", "Empress Frostwing", 1195)
]

First, we may want to group the documents in the collection by their type. To do that we loop through the entire list of documents and add each one to a dictionary, using the document type as the key and a list of the matching documents as the value...

In [2]:
documents_by_type = {}

for title, doc_type, author, year in documents:
    if doc_type not in documents_by_type:
        documents_by_type[doc_type] = []
    documents_by_type[doc_type].append((title, author, year))

print(documents_by_type)

{'Letter': [('Letter from Emperor Icebeak', 'Emperor Icebeak', 1203), ('Letter from Admiral Snowclaw', 'Admiral Snowclaw', 1256), ('Letter from Healer Frostbill', 'Healer Frostbill', 1345)], 'Diary': [('Diary of Explorer Frostfeather', 'Explorer Frostfeather', 1302), ('Diary of Fisher Penguino', 'Fisher Penguino', 1320), ('Diary of Scholar Snowfoot', 'Scholar Snowfoot', 1277)], 'Edict': [('Proclamation of the Great Snowfall', 'Council of Penguins', 1150), ('Edict of the Eternal Winter', 'Empress Frostwing', 1189), ('Treaty of the Icy Shores', 'Council of Penguins', 1280), ('Proclamation of the Icebound Pact', 'Empress Frostwing', 1195)], 'Chronicle': [("Chronicler's Tale of the Frozen Flock", 'Chronicler Flipper', 1401), ("Chronicler's Account of the Great Migration", 'Chronicler Flipper', 1398), ("Chronicler's Record of the Frost Wars", 'Chronicler Flipper', 1403)], 'Speech': [('Speech of Unity at Iceberg Summit', 'Chancellor Icybeak', 1433), ('Speech at the Glacial Gathering', 'Emper
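The grouping loop above checks for a missing key before every append. As an alternative sketch, `collections.defaultdict` from the standard library can create the empty list automatically. The name `sample_documents` is an assumption made here: it is a three-document subset of the collection, used only to keep the example short and self-contained.

```python
from collections import defaultdict

# Hypothetical three-document subset of the collection, for illustration only.
sample_documents = [
    ("Letter from Emperor Icebeak", "Letter", "Emperor Icebeak", 1203),
    ("Diary of Explorer Frostfeather", "Diary", "Explorer Frostfeather", 1302),
    ("Letter from Admiral Snowclaw", "Letter", "Admiral Snowclaw", 1256),
]

# defaultdict(list) creates an empty list the first time a key is used,
# so the explicit "if doc_type not in ..." check is not needed.
grouped = defaultdict(list)
for title, doc_type, author, year in sample_documents:
    grouped[doc_type].append((title, author, year))

# Convert back to a plain dict for a tidier printout.
print(dict(grouped))
```

Both versions build the same dictionary. The explicit check is arguably easier to read when first learning loops, while `defaultdict` removes a little repetition once the pattern is familiar.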

It is important that our data is as clean as possible, so next we make sure every field is in the form we want: here, author names in title case and years stored as strings. For this we can use a list comprehension...

In [3]:
cleaned_documents = [(title, doc_type, author.title(), str(year)) for title, doc_type, author, year in documents]

print(cleaned_documents)

[('Letter from Emperor Icebeak', 'Letter', 'Emperor Icebeak', '1203'), ('Diary of Explorer Frostfeather', 'Diary', 'Explorer Frostfeather', '1302'), ('Proclamation of the Great Snowfall', 'Edict', 'Council Of Penguins', '1150'), ("Chronicler's Tale of the Frozen Flock", 'Chronicle', 'Chronicler Flipper', '1401'), ('Letter from Admiral Snowclaw', 'Letter', 'Admiral Snowclaw', '1256'), ('Edict of the Eternal Winter', 'Edict', 'Empress Frostwing', '1189'), ('Diary of Fisher Penguino', 'Diary', 'Fisher Penguino', '1320'), ('Speech of Unity at Iceberg Summit', 'Speech', 'Chancellor Icybeak', '1433'), ("Chronicler's Account of the Great Migration", 'Chronicle', 'Chronicler Flipper', '1398'), ('Treaty of the Icy Shores', 'Edict', 'Council Of Penguins', '1280'), ('Diary of Scholar Snowfoot', 'Diary', 'Scholar Snowfoot', '1277'), ('Speech at the Glacial Gathering', 'Speech', 'Emperor Icebeak', '1225'), ('Letter from Healer Frostbill', 'Letter', 'Healer Frostbill', '1345'), ("Chronicler's Record
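To see what the two cleaning steps do on their own, here is a small sketch applied to a single record taken from the collection. It shows why 'Council of Penguins' appears as 'Council Of Penguins' in the output above: `str.title()` capitalises the first letter of every word, including small words such as "of". The variable names are chosen here for illustration.

```python
# One record from the collection, unpacked into its four fields.
title, doc_type, author, year = (
    "Proclamation of the Great Snowfall", "Edict", "Council of Penguins", 1150
)

# str.title() capitalises the first letter of every word.
cleaned_author = author.title()
print(cleaned_author)  # Council Of Penguins

# str() turns the integer year into text.
cleaned_year = str(year)
print(cleaned_year, type(cleaned_year).__name__)  # 1150 str
```

Because the year is now text, it has to be converted back with `int()` before it can be compared numerically.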

Now with the clean data, we may be interested in investigating the collection as a whole. For example, finding how many of each type of document we have, or perhaps what the earliest document is in the collection...

In [4]:
# We start with an empty dictionary for the counts, and set earliest_document
# to None as a placeholder until the first document has been seen.
document_counts = {}
earliest_document = None

for title, doc_type, author, year in cleaned_documents:
    # This bit counts the documents.
    if doc_type not in document_counts:
        document_counts[doc_type] = 0
    document_counts[doc_type] += 1
    # This bit does the check for the earliest document.
    if earliest_document is None or int(year) < int(earliest_document[3]):
        earliest_document = (title, doc_type, author, year)

print("Document Counts by Type:", document_counts)
print("Earliest Document:", earliest_document)

Document Counts by Type: {'Letter': 3, 'Diary': 3, 'Edict': 4, 'Chronicle': 3, 'Speech': 2}
Earliest Document: ('Proclamation of the Great Snowfall', 'Edict', 'Council Of Penguins', '1150')
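The same two questions can also be answered without writing the loop by hand. The sketch below uses `collections.Counter` for the counts and the built-in `min` with a `key` function for the earliest document. `sample_cleaned` is an assumed three-document subset of the cleaned collection, so the counts differ from the full output above.

```python
from collections import Counter

# Hypothetical subset of the cleaned collection (years are strings).
sample_cleaned = [
    ("Letter from Emperor Icebeak", "Letter", "Emperor Icebeak", "1203"),
    ("Proclamation of the Great Snowfall", "Edict", "Council Of Penguins", "1150"),
    ("Letter from Admiral Snowclaw", "Letter", "Admiral Snowclaw", "1256"),
]

# Counter tallies how often each document type appears.
type_counts = Counter(doc_type for _, doc_type, _, _ in sample_cleaned)

# min() compares documents by the integer value of their year (index 3).
earliest = min(sample_cleaned, key=lambda doc: int(doc[3]))

print("Document Counts by Type:", dict(type_counts))
print("Earliest Document:", earliest)
```

The `int()` conversion in the key matters because the years are strings. Comparing them as text happens to work for four-digit years, but it would give the wrong order if the years had different numbers of digits.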


Perhaps in examining the collection we are more interested in a very specific ancient penguin author. We can do a deep dive with what we have already learnt...

In [5]:
# Extract documents authored by Emperor Icebeak using list comprehension
icebeak_documents = [(title, doc_type, year) for title, doc_type, author, year in cleaned_documents if author == "Emperor Icebeak"]

print("Documents by Emperor Icebeak:", icebeak_documents)

# Analyze the types of documents authored by Emperor Icebeak
icebeak_document_types = [doc_type for title, doc_type, year in icebeak_documents]

# Manually count the occurrences of each document type
icebeak_document_counts = {}
for doc_type in icebeak_document_types:
    if doc_type not in icebeak_document_counts:
        icebeak_document_counts[doc_type] = 0
    icebeak_document_counts[doc_type] += 1

print("Emperor Icebeak Document Counts by Type:", icebeak_document_counts)

Documents by Emperor Icebeak: [('Letter from Emperor Icebeak', 'Letter', '1203'), ('Speech at the Glacial Gathering', 'Speech', '1225')]
Emperor Icebeak Document Counts by Type: {'Letter': 1, 'Speech': 1}
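If we want to repeat this deep dive for other authors, one option is to wrap the steps in a function. The sketch below is one possible way to do that. The function name `count_types_for_author` and the subset `sample_cleaned` are assumptions made for illustration, not part of the original notebook.

```python
from collections import Counter


def count_types_for_author(records, author_name):
    """Return one author's documents and a count of their document types."""
    # Keep only the records written by the requested author.
    matches = [
        (title, doc_type, year)
        for title, doc_type, author, year in records
        if author == author_name
    ]
    # Tally the document types among the matches.
    type_counts = Counter(doc_type for _, doc_type, _ in matches)
    return matches, dict(type_counts)


# Hypothetical subset of the cleaned collection (years are strings).
sample_cleaned = [
    ("Letter from Emperor Icebeak", "Letter", "Emperor Icebeak", "1203"),
    ("Edict of the Eternal Winter", "Edict", "Empress Frostwing", "1189"),
    ("Speech at the Glacial Gathering", "Speech", "Emperor Icebeak", "1225"),
]

icebeak_docs, icebeak_counts = count_types_for_author(sample_cleaned, "Emperor Icebeak")
print("Documents by Emperor Icebeak:", icebeak_docs)
print("Emperor Icebeak Document Counts by Type:", icebeak_counts)
```

Calling the function with a different name, for example "Empress Frostwing", repeats the analysis without copying any code.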
