# M02 Notes

# Review Assignment

See [M01 Notebook](http://localhost:8888/lab/tree/Dropbox/Courses/DS/DS5100/DS5100-2023-08-O/repo/notebooks/M01_GettingStarted/M01_ImportHello.ipynb) for results.

# Completion Rates

In [85]:
import pandas as pd
import numpy as np

In [86]:
df = pd.DataFrame(columns=['key','val']).set_index('key')

In [87]:
df.loc['n_students'] = 47
df.loc['n_forks'] = 42
df.loc['n_pull_requests'] = 34
df.loc['n_hello_texts'] = 34
df.loc['n_correct_texts'] = 33

In [88]:
df['completion'] = np.round(df['val'] / df.loc['n_students', 'val'], 2)

In [89]:
df['missing'] = df.loc['n_students', 'val'] - df['val']

In [90]:
df

| key | val | completion | missing |
|-----|-----|------------|---------|
| n_students | 47 | 1.0 | 0 |
| n_forks | 42 | 0.89 | 5 |
| n_pull_requests | 34 | 0.72 | 13 |
| n_hello_texts | 34 | 0.72 | 13 |
| n_correct_texts | 33 | 0.7 | 14 |


In [7]:
df.T.n_students

val           47.0
completion     1.0
missing        0.0
Name: n_students, dtype: float64

# Concepts

## Data / Code

Data vs algorithm (code). How are they related?

## Data Types and Structures

Data types and data structures. What are the differences?

## Strings

Strings are data types, but internally they are like data structures.

Internally, a string is a sequence of Unicode code points.

- A code point is a numerical value that maps to a specific character.
- Unicode is an international standard of code points that map onto the alphabets of many languages.
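
The mapping between characters and code points can be inspected with the built-in `ord()` and `chr()` functions. The sketch below is only illustrative; the sample word and the variable names are assumptions, not part of the original notes.

```python
# Map each character of a string to its Unicode code point and back.
word = "caf\u00e9"  # "café", written with an escape for the accented e

code_points = [ord(ch) for ch in word]            # character -> integer
rebuilt = "".join(chr(cp) for cp in code_points)  # integer -> character

print(code_points)
print(rebuilt == word)
```

Note that the accented character has a code point above 127, which is outside the ASCII range but still a single element of the string.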

Each character is an element in an immutable list-like structure.

You can access its elements as if it were a tuple of characters:

In [95]:
my_string = "This is a string"
print(my_string[0])
print(my_string[-1])
print(my_string[1:4])

But also like a tuple, you can't change its values:

In [97]:
my_string[2] = 'a'
my_string[3] = 't'

TypeError: 'str' object does not support item assignment
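
Since a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. A minimal sketch, assuming we want the result the failed assignments above were aiming for (`new_string` is a name introduced here for illustration):

```python
my_string = "This is a string"

# Slicing and concatenation create a brand-new string object;
# the original string is left untouched.
new_string = my_string[:2] + "at" + my_string[4:]

print(new_string)
print(my_string)
```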

Note that some languages, like Java, have a separate data type for individual characters, e.g. `'A'`. Python does not; a single character is just a string of length 1.

## Mutability

A mutable object is one whose internal value can be changed. 

This property applies to data structures.

Tuples are immutable, lists are not.

**Demonstration**

Here, we mutate a list by appending a value to it.

In [98]:
a = [1,2,3,4,5]
a.append(10)
print(a)

[1, 2, 3, 4, 5, 10]


In [99]:
a[0] = 5
print(a)

[5, 2, 3, 4, 5, 10]


If we try the same operations with a tuple, we get errors.

In [20]:
b = (1,2,3,4,5)
b.append(10)
print(b)

AttributeError: 'tuple' object has no attribute 'append'

In [22]:
b[0] = 5
print(b)

TypeError: 'tuple' object does not support item assignment

This, on the other hand, is not mutation:

In [67]:
a = [1,2,3,4,5,10] # A list
b = (1,2,3,4,5,10) # A tuple
print(a)
print(b)

[1, 2, 3, 4, 5, 10]
(1, 2, 3, 4, 5, 10)


We are just **re-assigning** a new value to the variable. 

The new value just **replaces** the old one.

In mutation, the same data structure remains in place and its contents are changed.
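
One way to see the difference is to check object identity. The sketch below uses hypothetical names (`nums`, `old`) and the `is` operator to test whether two names refer to the same object.

```python
nums = [1, 2, 3]
old = nums                 # keep a second reference to the original list

nums.append(4)             # mutation: the same object is modified
same_after_mutation = nums is old

nums = [1, 2, 3, 4]        # re-assignment: nums now names a new object
same_after_reassignment = nums is old

print(same_after_mutation)
print(same_after_reassignment)
print(nums == old)         # equal contents, but different objects
```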

Note, however, that this works with tuples:

In [59]:
b += (11,)
print(b)

(1, 2, 3, 4, 5, 10, 11)


It looks like mutation, but it's not.

This is because we are replacing `b` with a new tuple value.

> Notice that we write a single-valued tuple with a trailing comma. Why?
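
A short experiment suggests the answer: parentheses alone only group an expression, so it is the comma that makes the tuple. The variable names below are illustrative.

```python
not_a_tuple = (11)   # just the integer 11 in grouping parentheses
a_tuple = (11,)      # the trailing comma makes a one-element tuple
also_a_tuple = 11,   # the parentheses are optional here

print(type(not_a_tuple).__name__)
print(type(a_tuple).__name__)
print(a_tuple == also_a_tuple)
```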

Relatedly, mutable and immutable objects behave differently when they are shared.

When you assign one variable to another, no copy is made. The new variable is just an alias for the same object.

If the object is mutable, any in-place change to the data is visible through both variables.

If the object is immutable, it cannot be changed in place, so the aliasing has no visible effect.

In [68]:
a1 = a # Not a copy: a1 is an alias for the same list

In [69]:
id(a), id(a1)

(4635760768, 4635760768)

In [62]:
b1 = b # Not a copy: b1 is an alias for the same tuple

In [63]:
id(b), id(b1)

(4635509280, 4635509280)
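
To get an independent list rather than an alias, you have to copy it explicitly. The following is a minimal sketch using `list.copy()`, which makes a shallow copy; the names `alias` and `copy_of_a` are introduced here for illustration.

```python
a = [1, 2, 3, 4, 5, 10]
alias = a              # second name for the same list
copy_of_a = a.copy()   # new list with the same elements (shallow copy)

alias.append(11)       # in-place change made through the alias

print(a)               # the change is visible through a
print(copy_of_a)       # the copy is unaffected
print(alias is a, copy_of_a is a)
```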

The `+` operator, by contrast, builds a new object for tuples and lists alike, so rebinding `b1` or `a1` leaves the originals unchanged:

In [77]:
b1 = b1 + (12,)
print(b)

(1, 2, 3, 4, 5, 10)


In [78]:
a1 = a1 + [12]
print(a)

[1, 2, 3, 4, 5, 10]
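
The augmented form `+=` is where lists and tuples part ways. A minimal sketch, using fresh variables so that it runs on its own:

```python
a = [1, 2, 3]
a1 = a
a1 += [12]     # list: extended in place, so a sees the change too

b = (1, 2, 3)
b1 = b
b1 += (12,)    # tuple: a new tuple is built and only b1 is rebound

print(a, a1 is a)
print(b, b1 is b)
```

For the list, `+=` mutates the shared object. For the tuple, it is shorthand for `b1 = b1 + (12,)`.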


## Comparing Floats

Let's do an experiment:

In [None]:
f1 = 0.1 + 0.2
f2 = 0.3

In [None]:
f1 == f2

In the above case, `f1` and `f2` don't hold precisely the same value because of the limitations of representing base-10 fractions in base-2 (binary).

Inspecting their values, we find minor differences in the lower significant digits:

In [None]:
f1, f2

Note that sometimes floating point comparisons do work:

In [None]:
f3 = 4.0
f4 = 3.5 + .5

In [None]:
f3 == f4

See the [Wikipedia](https://en.wikipedia.org/wiki/Floating-point_arithmetic#:~:text=In%20computing%2C%20floating%2Dpoint%20arithmetic,are%20called%20floating%2Dpoint%20numbers.) article on floating point arithmetic to learn more about how this arises. 

It will provide you with insight into how computers actually work as machines that process numbers.
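
In practice, floats are usually compared with a tolerance rather than with `==`. One option in the standard library is `math.isclose()`; the sketch below uses its default relative tolerance, and the explicit threshold `1e-9` in the last line is an arbitrary choice for illustration.

```python
import math

f1 = 0.1 + 0.2
f2 = 0.3

print(f1 == f2)               # exact comparison
print(math.isclose(f1, f2))   # comparison within a relative tolerance
print(abs(f1 - f2) < 1e-9)    # hand-rolled absolute tolerance
```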

## The Word "scalar"

Sometimes you will see the word "scalar" in the literature to refer to certain kinds of values.

Scalars are **single values** as opposed to structures or collections of values. 

> Strings as data types sometimes behave as scalars and sometimes as sequential structures.
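
The dual nature of strings can be seen by asking Python whether a value is a sequence. This is only a sketch: it uses the abstract base class `collections.abc.Sequence` as a stand-in for "sequential structure", and the sample values are illustrative.

```python
from collections.abc import Sequence

values = [42, 3.14, "hello", (1, 2)]

# True for values that support indexing and len(), False for plain scalars.
is_sequence = [isinstance(v, Sequence) for v in values]

for v, flag in zip(values, is_sequence):
    print(repr(v), flag)

# A string can be compared as one value, yet also indexed like a structure.
print("hello" == "hello", "hello"[0], len("hello"))
```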

# Summary

In [100]:
%%html
<style>table {float: left; clear: right;}</style>

**Types**
| name | type | literal |
|------|------|---------|
| `int` | integer | `1` |
| `str` | string | `"1"`, `'1'` |
| `float` | floating point (real) | `1.` |
| `complex` | complex | `1j` (imaginary component) |
| `bool` | boolean | `True` |

**Structures**
| name | mutable | constructor |
|------|---------|-------------|
| `tuple` | no | `()` |
| `list` | yes | `[]` |
| `dict` | yes | `{}` with key/value pairs |
| `set` | yes | `{}` with single values (`set()` when empty) |
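
The entries in both tables can be checked with `type()`. A minimal sketch, with list names introduced here for illustration:

```python
# Literals from the Types table.
literals = [1, "1", 1., 1j, True]
type_names = [type(x).__name__ for x in literals]

# Empty constructors from the Structures table.
# Note that empty braces give a dict, so an empty set needs set().
empties = [(), [], {}, set()]
structure_names = [type(x).__name__ for x in empties]

print(type_names)
print(structure_names)
print(type({1, 2}).__name__)   # braces with single values give a set
```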