<a href="https://colab.research.google.com/github/powderflask/cap-comp215/blob/main/examples/week2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sequence and Map data structures - Strings, Tuples, Lists, Dictionaries
This is our week 2 examples notebook; it is available on GitHub in the powderflask/cap-comp215 repository.

As usual, the first code block just imports the modules we will use.

In [None]:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pprint import pprint

## f-strings
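A `string` is a sequence of characters / symbols.
This familiar data structure is quite powerful, and format strings (f-strings) take it further: each `{expression}` inside an f-string is evaluated, and an optional format spec after a colon controls how the value is displayed.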

In [None]:
today = datetime.date.today()
the_answer = 42
PI = 3.1415926535

f'{today:%dth of %m month, %Y} is not special, but {the_answer} and {PI:0.3} are!'
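The text after the colon in `{value:spec}` is a format specification: `datetime.date` values interpret it as `strftime` codes, while numbers use Python's format-spec mini-language. A few more specs worth knowing, sketched with a fixed date so the output is predictable (the `=` form needs Python 3.8+, and `%B` assumes the default English/C locale):

```python
import datetime

d = datetime.date(2024, 3, 5)   # fixed date, so the results do not depend on today
the_answer = 42
PI = 3.1415926535

examples = [
    f'{d:%B %d, %Y}',       # strftime codes: full month name, zero-padded day, year
    f'{PI:0.3}',            # 3 significant digits (general format), as in the cell above
    f'{PI:.3f}',            # 3 digits after the decimal point
    f'[{the_answer:>6}]',   # right-align in a field 6 characters wide
    f'{1234567:,}',         # thousands separator
    f'{the_answer=}',       # self-documenting: shows the expression and its value
]
for e in examples:
    print(e)
```

This prints `March 05, 2024`, `3.14`, `3.142`, `[    42]`, `1,234,567` and `the_answer=42`, one per line. Note the difference between `0.3` (significant digits) and `.3f` (fixed decimal places).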

## List Comprehension
List comprehensions provide a compact syntax for two very common sequence-processing algorithms: *map* and *filter*.

Basic syntax:

In [None]:
[i for i in range(10)]
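A comprehension is shorthand for the familiar "accumulator" loop. This sketch (the variable names are illustrative) builds the same list both ways:

```python
# Accumulator loop: start with an empty list, append one item per iteration
numbers_loop = []
for i in range(10):
    numbers_loop.append(i)

# The same list as a comprehension: [expression for item in iterable]
numbers_comp = [i for i in range(10)]

print(numbers_loop == numbers_comp)   # True
```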

### Map Algorithm
Apply the same function to every item in another sequence (i.e., provide a "mapping" from the source sequence to the target sequence).

In [None]:
# Problem:  compute the first 10 natural squares
[(i+1)**2 for i in range(10)]
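The expression part of a comprehension can be any function call, so the map pattern works with named functions too. Python's built-in `map()` does the same job lazily, returning an iterator that `list()` collects. `square` is a hypothetical helper used only for illustration:

```python
def square(n):
    """Hypothetical helper: return n squared."""
    return n ** 2

# Map with a comprehension: apply square() to every item of range(1, 11)
squares_comp = [square(n) for n in range(1, 11)]

# Map with the built-in map(); it returns an iterator, so list() materializes it
squares_map = list(map(square, range(1, 11)))

print(squares_comp)   # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

Using `range(1, 11)` instead of `(i+1)` over `range(10)` is just a stylistic choice; both produce the first 10 natural squares.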

### Filter
Select a subset of the elements from another sequence based on some criterion.

In [None]:
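VOWELS = 'aeiou'
lorem = '''
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
'''
# Problem:  create a string with just the vowels from the text, in order.
# Lower-case the text first so capital vowels (e.g. the 'U' in "Ut") are kept too.
text_vowels = ''.join([c for c in lorem.lower() if c in VOWELS])
print(text_vowels)

# Problem:  list each distinct vowel once, in order of first appearance.
# Note: is_seen() has a side effect; it records every character it is asked about.
seen = set()
def is_seen(c):
    if c in seen:
        return True
    else:
        seen.add(c)
        return False

unique = []
for v in text_vowels:
    if not is_seen(v):
        unique.append(v)
# Comprehension equivalent; clear `seen` first, because the loop above already filled it:
# seen.clear()
# unique = [v for v in text_vowels if not is_seen(v)]
unique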
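The `is_seen` helper works, but because it mutates `seen`, the same comprehension gives a different answer the second time it runs. Two side-effect-free alternatives, sketched on a short sample string: the built-in `filter()` for the filtering step, and `dict.fromkeys()` for order-preserving de-duplication, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7):

```python
VOWELS = 'aeiou'
sample = 'Lorem ipsum dolor sit amet'

# Filter: filter() keeps the items for which the predicate returns True
text_vowels = ''.join(filter(lambda c: c in VOWELS, sample.lower()))
print(text_vowels)      # oeiuooiae

# De-duplicate, keeping first-appearance order: dict keys are unique and ordered
unique_vowels = list(dict.fromkeys(text_vowels))
print(unique_vowels)    # ['o', 'e', 'i', 'u', 'a']
```

A plain set comprehension (`{c for c in text_vowels}`) also removes duplicates, but it does not preserve the order of first appearance.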

## Data Wrangling with List Comprehension
E-learn's Live Quiz module does track quiz scores for each student, but does not store them in the gradebook,
and it reports on them in the most useless way.

Let's do some "data wrangling" to make sense out of this mess!

### The Problem: Unstructured Data!
Notice it is just a single large string!  The real data set has 36 students, and I need to do this every week!

In [None]:
text = """
  1.                 Ali Oop scored  7/ 8 = 87%


  2.          Alison Ralison scored  8/ 8 = 100%


  3.         Ambily Piturbed scored  8/ 8 = 100%


  4.  Arshan Risnot Farquared scored  5/ 8 = 62%


  5.       Ayushma Jugernaugh scored  5/ 8 = 62%


  6.       Brayden Labaguette scored  7/ 8 = 87%
"""

### Goal
Turn this into structured data: a list of 2-tuples, each student's full name and their integer score.

In [None]:
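# Split into lines ('\n', not '/n'), skip blank lines, and drop the rank ('1.')
# plus the trailing '8', '=', '87%' tokens from each line
lines = [line.split()[1:-3] for line in text.split('\n') if line.strip()]
# Each line is now [name tokens..., 'scored', '7/']: join the name, strip the '/' and convert to int
scores = [(' '.join(line[:-2]), int(line[-1][:-1])) for line in lines]
scores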
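Slicing tokens by position works for this report, but it silently returns wrong results if the layout ever changes. A regular expression (an alternative technique, not the approach used above) spells out the expected line shape instead. This is a sketch that assumes every line looks like `N.  <name> scored  <score>/ <total> = <pct>%`; `report` is a trimmed copy of the quiz text and `LINE_PATTERN` is an illustrative name:

```python
import re

# Trimmed copy of the quiz report (same layout as `text` above)
report = """
  1.                 Ali Oop scored  7/ 8 = 87%


  4.  Arshan Risnot Farquared scored  5/ 8 = 62%
"""

# With re.MULTILINE, ^ matches at the start of every line.
# Group 1: the name, matched non-greedily up to ' scored'; group 2: the raw score before '/'
LINE_PATTERN = re.compile(r'^\s*\d+\.\s+(.+?)\s+scored\s+(\d+)\s*/', re.MULTILINE)

scores = [(name, int(score)) for name, score in LINE_PATTERN.findall(report)]
print(scores)   # [('Ali Oop', 7), ('Arshan Risnot Farquared', 5)]
```

Because `findall()` returns one tuple of groups per match, blank lines never need to be filtered out explicitly.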


## Records
A *record* is a compound data value: a collection of simpler data values (fields) that all describe a single entity. Common Python representations for a record:

 * tuple
 * dictionary
 * object

Problem: develop the data representation for a `student` in a student record system,
where a `student` has a first and last name, a student ID, and a date of birth.

In [None]:
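# Tuple: fields are identified only by their position
tuple_students = [
    ('Bob', 'Squarepants', 123456789, datetime.date(year=1994, month=2, day=25)),
    ('Dora', 'Explorer', 123456545, datetime.date(year=2000, month=8, day=14)),
]
s = tuple_students[-1]
age = datetime.date.today() - s[3]   # subtracting two dates gives a datetime.timedelta
print(age.days // 365)               # approximate age in years (ignores leap days)

# Dictionary: fields are identified by name
dict_students = [
    {
        'first': 'Bob',
        'last': 'Squarepants',
        'sn': 123456789,
        'DoB': datetime.date(year=1994, month=2, day=25)
    },
    {
        'first': 'Dora',
        'last': 'Explorer',
        'sn': 123456545,
        'DoB': datetime.date(year=2000, month=8, day=14)
    }
]

s = dict_students[-1]
print(s['DoB'])

# Convert the tuple records into dictionary records, using the same keys as dict_students
students = [
    {'first': first, 'last': last, 'sn': sn, 'DoB': dob}
    for first, last, sn, dob in tuple_students
]
students == dict_students   # True: both lists now hold the same records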
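A middle ground between the tuple and dictionary representations is `collections.namedtuple` from the standard library: records stay immutable tuples, but fields can also be read by name. The sketch below also replaces the `days // 365` approximation with an exact age calculation; `StudentRecord` and `age_on` are hypothetical names, not part of the notebook:

```python
import datetime
from collections import namedtuple

# A tuple subclass whose fields can be accessed by name or by position
StudentRecord = namedtuple('StudentRecord', ['first', 'last', 'sn', 'DoB'])

dora = StudentRecord('Dora', 'Explorer', 123456545, datetime.date(year=2000, month=8, day=14))

def age_on(dob, day):
    """Hypothetical helper: whole years between dob and day."""
    # Subtract one year if the birthday has not happened yet in day's year
    had_birthday = (day.month, day.day) >= (dob.month, dob.day)
    return day.year - dob.year - (0 if had_birthday else 1)

print(dora.first, dora[1])                             # Dora Explorer
print(age_on(dora.DoB, datetime.date(2024, 8, 13)))    # 23
print(age_on(dora.DoB, datetime.date(2024, 8, 14)))    # 24
print(dora._asdict())   # the same record as a dict, keyed by field name
```

Comparing `(month, day)` tuples works because tuples compare element by element, so the birthday check needs no special cases.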

In [13]:
# Object
from dataclasses import dataclass

@dataclass
class Student:
    first: str
    last: str
    sn: int
    DoB: datetime.date

    def full_name(self):
        return f'{self.first} {self.last}'

@dataclass
class SkilledStudent(Student):
    """A Student with one extra field; the inherited fields come first in __init__."""
    skill: str

students = [
    Student('Bob', 'Squarepants', 123456789, datetime.date(year=1994, month=2, day=25)),
    SkilledStudent('Dora', 'Explorer', 123456545, datetime.date(year=2000, month=8, day=14), 'Spanish')
]
# next() returns the first match from the generator (and raises StopIteration if there is none)
dora = next(s for s in students if s.first == 'Dora')
dora.skill

'Spanish'
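Because `@dataclass` generates `__init__`, `__repr__` and `__eq__` from the field annotations, dataclass records compare by value and print readably, and `dataclasses.asdict()` converts them back into the dictionary representation. A short sketch that re-declares a trimmed `Student` so it runs on its own; `frozen=True` is an optional extra (not used in the notebook) that makes instances read-only, like tuples:

```python
import dataclasses
import datetime
from dataclasses import dataclass

@dataclass(frozen=True)   # frozen=True: assigning to a field raises FrozenInstanceError
class Student:
    first: str
    last: str
    sn: int
    DoB: datetime.date

bob = Student('Bob', 'Squarepants', 123456789, datetime.date(year=1994, month=2, day=25))
bob_again = Student('Bob', 'Squarepants', 123456789, datetime.date(year=1994, month=2, day=25))

print(bob == bob_again)            # True: the generated __eq__ compares field values
print(dataclasses.asdict(bob))     # a dict keyed by field name

try:
    bob.first = 'Robert'
except dataclasses.FrozenInstanceError:
    print('Student records are read-only')
```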