 ## Sp25 CS477/577: Python for Machine Learning
 
 Lecture 2: Python data types and structures
   - [Overview](#section1)
   - [Numeric types](#section2)
   - [Strings](#section3)
   - [Lists](#section4)
   - [Dictionaries](#section5)
   - [Tuples](#section6)
   - [Sets](#section7)
   
In an informal sense, the purpose of programming is to solve problems by defining and performing operations on **data**.

Data have different forms and units, e.g., grades (100 points), body weight (150.5 lbs), and temperature (80 $^{\circ}$F).

In [1]:
!python --version

Python 3.12.7


## 1. Overview of Python core data types and structures <a id="section1"/>

#### Numbers: integer, float, complex, decimal (fixed-precision), boolean, fraction

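```python
from decimal import Decimal
from fractions import Fraction

i = 2200000            # positive integer
i1 = -1234             # negative integer
f1 = 3.1415926         # floating-point number
Decimal('1.0')         # fixed-precision decimal floating point
Fraction(1, 3)         # fraction (rational number)
bool(1), True, False   # booleans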

```

#### Strings
```python
name = 'Bruce Lee'
```

#### Lists
```python
nums = [1, 2, 3.0]
mixed = [1, 2.5, 'spam']
```

#### Dictionaries
```python
grades = {'Bruce': 100, 'Henry': 80}
```

#### Tuples
```python
t = (100, 200)
```
#### Sets
```python
s = {100, 200, 300}
```

#### Files

```python
f = open('hello.txt')
```

#### Python is a **dynamically-typed** programming language
- Variables need no type declarations: Python determines the type of each object at run time from the value assigned to it.

```C++
//C++ example
int a; // type declaration
a = 5;
```

```python
#python example
a = 5
```

- Dynamically typed languages
    - Python, Perl, PHP, and JavaScript
- Statically typed languages
    - Java, C, and C++
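
A minimal sketch of what dynamic typing means in practice: the type belongs to the object, not to the name, so the same name can later be bound to an object of a different type. The name `a` follows the example above; `type_before` and `type_after` are illustrative names.

```python
# The type is attached to the object, not to the variable name.
a = 5
type_before = type(a)   # int

a = 'five'              # rebinding the same name to a str is allowed
type_after = type(a)    # str

print(type_before, type_after)
```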

## 2. Numerical types and operations <a id="section2"/>
- [Numbers in Python](#numbers) 
- [Basic operations](#ops)
- [Math functions](#math)

#### Numbers in Python <a id="numbers"/>

In [1]:
#int


In [2]:
#float


In [4]:
#float, 100, 1e2
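
One possible way to fill in the three cells above, shown as a sketch. The variable names are illustrative; scientific notation such as `1e2` always produces a float.

```python
i = 100      # int
f = 100.0    # float
g = 1e2      # float in scientific notation: 1 x 10^2
h = 1e-2     # float in scientific notation: 1 x 10^-2

print(type(i), type(f), type(g))
print(f == g)   # both are the float 100.0
print(i == f)   # an int and a float with the same value compare equal
print(h)
```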


In [5]:
# issue with float numbers: many decimal fractions (e.g., 0.1) have no exact representation in binary floating point
f1 = .1 + .1 + .1
print(f1)

print(f1-.3)

0.30000000000000004
5.551115123125783e-17
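
Because of this rounding error, comparing floats with `==` can give surprising results. A common workaround, sketched here, is to compare within a tolerance using `math.isclose` from the standard library (the variable names are illustrative).

```python
import math

f1 = 0.1 + 0.1 + 0.1

exact_equal = (f1 == 0.3)               # exact comparison fails
close_enough = math.isclose(f1, 0.3)    # comparison within a relative tolerance

print(exact_equal, close_enough)
```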


In [10]:
# decimal floating point: Decimal stores decimal fractions such as 1.1 exactly
from decimal import Decimal
f = Decimal('1.1') + Decimal('1.1')  + Decimal('1.1') 
type(f)
f

Decimal('3.3')

In [11]:
# mixed types
i = 1 
j = 1.1
i + j

2.1

In [13]:
## convert between types
print(int(f))


3


#### Basic operations <a id="ops"/>
- Basic operations: +, -, \*, /, // (floor division), ** , %(remainder), <, >, ==, etc.
- Built-in mathematical functions: pow, abs, round, int, etc.
- Other modules: math, random, etc. 


In [12]:
# practice 
d = 5 / 3 # true division (always returns a float in Python 3)
print(d)
d1 = 5 // 3 # floor division
print(d1)

a = 2 **4 # 2 to the power of 4
print(a)

b = pow(2, 4)
print(b)

price = 101
tax_rate = 0.06

pay = price + price * tax_rate
print(pay, round(pay, 1))

1.6666666666666667
1
16
16
107.06 107.1


#### Math functions <a id="math"/>

Built-in mathematical functions: pow, abs, round, sum, etc.
- We have already used some built-in functions, e.g., **round** and **type**. These functions are built into the Python interpreter, so we can call them without importing any package. See the list of all built-in functions in Python:
https://docs.python.org/3.11/library/functions.html

In [28]:
help(pow) # check document to know how to use a function

Help on built-in function pow in module builtins:

pow(x, y, z=None, /)
    Equivalent to x**y (with two arguments) or x**y % z (with three arguments)
    
    Some types, such as ints, are able to use a more efficient algorithm when
    invoked using the three argument form.



In [9]:
#abs, pow, sum
print(pow(2, 3))

8


Math functions from standard-library modules (not built in, so they must be imported)

In [2]:
# use the math module. https://docs.python.org/3.12/library/math.html#module-math
import math
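
A few commonly used members of the `math` module, as a short sketch. The selection of functions is illustrative, not exhaustive.

```python
import math

print(math.pi)             # the constant pi
print(math.sqrt(16))       # square root, returns a float
print(math.floor(2.7))     # largest integer <= 2.7
print(math.ceil(2.2))      # smallest integer >= 2.2
print(math.factorial(5))   # 5 * 4 * 3 * 2 * 1
```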



In [3]:
# use the random module
import random

r = random.random() #random floating point number in the range 0.0 <= X < 1.0
print(r)

0.8043302770303506


In-class Practice: Calculate the square and square root of a number

In [47]:
a = 9
import math
math.sqrt(a)


3.0

## 3. Strings and operations <a id="section3"/>
Python strings (type str) are enclosed in single quotes ('...') or double quotes ("...").

In [15]:
#str type
s1 = "Bruce"
type(s1)

str

In [48]:
#single and double quotes, and the escape character '\'
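
A sketch of how the two quote styles and the backslash escape interact. The strings are illustrative.

```python
a = 'single quotes'
b = "double quotes"

c = "It's easy"        # double quotes allow a single quote inside
d = 'It\'s easy'       # or escape the single quote with a backslash
e = 'line1\nline2'     # \n is one character: a newline

print(c == d)          # the two spellings give the same string
print(len(e))
print(e)
```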


In [49]:
#concatenates two strings
s1 = 'Python'
s2 = 'programming'
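
Three common ways to combine the two strings, shown as a sketch. The names `joined`, `formatted`, and `via_join` are illustrative.

```python
s1 = 'Python'
s2 = 'programming'

joined = s1 + ' ' + s2          # + concatenates strings
formatted = f'{s1} {s2}'        # f-string formatting
via_join = ' '.join([s1, s2])   # join a list of strings with a separator

print(joined)
```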



In [34]:
s1 * 2

'PythonPython'

In [35]:
len(s1*2)

12

In [50]:
# check all operations for string


In [51]:
# number to str


In [52]:
# str to number
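
A sketch of converting between numbers and strings with the built-in functions `str`, `int`, and `float`. The values are illustrative.

```python
n = 42

as_text = str(n)           # number to str
as_int = int('42')         # str to int
as_float = float('3.14')   # str to float

# Python does not mix str and int automatically:
# 'value: ' + n would raise a TypeError, so convert first.
label = 'value: ' + str(n)

print(as_text, as_int, as_float, label)
```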


#### Practice
Given a string "s,pa,m", extract the two characters ('pa') in the middle 

In [55]:
s = "s,pa,m"
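
Two possible solutions, given as a sketch: slicing by offset, or splitting on the comma. The names `by_slice` and `by_split` are illustrative.

```python
s = "s,pa,m"

by_slice = s[2:4]            # characters at offsets 2 and 3
by_split = s.split(',')[1]   # split on ',' and take the middle piece

print(by_slice, by_split)
```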


## 4. Lists <a id="section4"/>
In Python, lists are the most flexible **ordered** built-in collection object type.
```python
    L = [1, 2, 3, 5] # a list example
```
#### 1. Major properties:
(1) Ordered collections of arbitrary objects: a left-to-right positional ordering of the items.

(2) Access by **positional offset**; fetch an item by indexing the list on the object's offset

(3) Variable length: can grow and shrink in place

(4) **Mutable**

(5) Lists are implemented as arrays of object references, not linked lists (in CPython, the standard Python interpreter)

#### 2. List Operations

(1) length: len(L)

(2) concatenation: +, L.extend(another list)

(3) repetition: *

(4) membership: 1 in L
 
(5) deletion: del L[0] or L[0:1] = [] (note that L[0] = [] does not delete; it replaces the first item with an empty list)

(6) insertion: L.insert(i, x), L.append(x)

(7) replacement: L[i] = x

(8) iteration and comprehensions

(9) matrix: nested lists, indexed as L[i][j]
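
The operations above can be sketched in one short example. The values and the names `joined`, `repeated`, `squares`, and `M` are illustrative; the comments give the operation number from the list.

```python
L = [1, 2, 3, 5]

n = len(L)                      # (1) length
joined = L + [8, 13]            # (2) + builds a new list
L.extend([8])                   # (2) extend changes L in place
repeated = [0] * 3              # (3) repetition
has_one = 1 in L                # (4) membership
del L[0]                        # (5) deletion
L.insert(0, 1)                  # (6) insert 1 back at offset 0
L.append(13)                    # (6) insert at the end
L[2] = 30                       # (7) replacement
squares = [x ** 2 for x in L]   # (8) comprehension
M = [[1, 2, 3], [4, 5, 6]]      # (9) a 2 x 3 matrix as nested lists

print(L)
print(M[1][2])                  # row 1, column 2
```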

In [58]:
#(1) Ordered collections of arbitrary objects: a left-to-right positional ordering of the items.
L = []
L = [1,2, 3.5, 'str']
L = [1,2, 3.5, 'str', [2, 3,4]] # arbitrary nesting

print(L)
print(L[0])

[1, 2, 3.5, 'str', [2, 3, 4]]
1


In [59]:
#(2) Access by offset; fetch an item by indexing the list on the object's offset
#index: L[i] for 1d, L[i][j] for 2d
L = [1,2, 3.5, 'str', [2, 3,4]] # arbitrary nesting
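
A sketch of indexing and slicing on the list above. The names on the left are illustrative.

```python
L = [1, 2, 3.5, 'str', [2, 3, 4]]

first = L[0]        # first item
last = L[-1]        # negative offsets count from the end
nested = L[4][1]    # index twice to reach inside the nested list
part = L[1:3]       # slice: offsets 1 and 2 (offset 3 is excluded)

print(first, last, nested, part)
```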



In [60]:
#(3) Variable length: can grow and shrink in place
L = [1,2, 3.5]


In [61]:
#(4) Mutable
L = [1,2, 3.5]


In [63]:
#(5)-(7) deletion (shrink), insertion, and replacement
L = [1, 2, 3, 5]


In [65]:
# (8) List iteration and comprehensions
#List iteration
L = [1, 2, 3.5]
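
A sketch of iterating over a list and of building new lists with comprehensions. The names `doubled` and `larger` are illustrative.

```python
L = [1, 2, 3.5]

# Plain iteration over the items
for item in L:
    print(item)

# enumerate() yields (offset, item) pairs
for offset, item in enumerate(L):
    print(offset, item)

# Comprehensions build a new list from an existing one
doubled = [2 * x for x in L]        # apply an expression to every item
larger = [x for x in L if x > 1]    # keep only items that pass a test

print(doubled, larger)
```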


In [67]:
# extra: convert to list object type
L = list(range(0, 10))
print(L)
L = list('python')
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


['p', 'y', 't', 'h', 'o', 'n']

#### Practice: Shallow copy and deep copy

What do you get from the following practices?

In [39]:
# practice 1
L = [1, 2, 3.5]
L1 = L
print('Practice 1: before change\nL is: {} \nL1 is: {}'.format(L, L1))
L1[2] = '3'
print('Practice 1: after change\nL is: {} \nL1 is: {}'.format(L, L1))

Practice 1: before change
L is: [1, 2, 3.5] 
L1 is: [1, 2, 3.5]
Practice 1: after change
L is: [1, 2, '3'] 
L1 is: [1, 2, '3']


In [68]:
#practice 2 
L = [1, 2, 3.5]
L1 = L.copy()


In [70]:
# extra practice 3 
L = [1, 2, 3.5]
L1 = L[:] # slicing will create a new list
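
The difference between an alias, a shallow copy, and a deep copy only shows up with nested objects. The sketch below uses a nested list and `copy.deepcopy` from the standard library; the names `alias`, `shallow`, and `deep` are illustrative.

```python
import copy

L = [1, 2, [3, 4]]

alias = L                  # no copy: both names refer to the same list
shallow = L.copy()         # new outer list, but the inner list is shared
deep = copy.deepcopy(L)    # fully independent copy, inner list included

L[0] = 'changed'           # top-level change: seen only through the alias
L[2].append(5)             # inner change: seen by alias and shallow copy

print(alias)
print(shallow)
print(deep)
```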


## 5. Dictionary creation and operations <a id="section5"/>

Flexible built-in collection object type with no positional ordering (since Python 3.7, dictionaries do preserve insertion order). Items are stored and fetched by **key** instead of by positional offset.
```python
    D = {'name': 'Tim', 'age': 22}
    print(D)

    print(D['name']) # (1) accessed by key
    D['age'] = 23 # (4) mutable
    print(D)
```
#### Main properties
(1) accessed by key

(2) collections of arbitrary objects without positional offsets (insertion order is preserved since Python 3.7)

(3) variable-length

(4) **mutable**

(5) implemented as **hash tables**

#### Dict operations
(1) length: len(D)

(2) membership: in

(3) get keys: list(D.keys())

(4) deletion: D.pop(key), del D[key]

(5) comprehensions
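
The operations above can be sketched as follows. The key `'major'`, the default `'n/a'`, and the name `squares` are illustrative additions; the comments give the operation number from the list.

```python
D = {'name': 'Tim', 'age': 22}

n = len(D)                      # (1) number of key-value pairs
has_name = 'name' in D          # (2) membership tests the keys
keys = list(D.keys())           # (3) keys as a list

D['major'] = 'CS'               # assigning to a new key adds an item
age = D.pop('age')              # (4) removes the key and returns its value
del D['major']                  # (4) removes the key without returning it
phone = D.get('phone', 'n/a')   # get() returns a default instead of raising KeyError

squares = {x: x ** 2 for x in range(4)}   # (5) dict comprehension

print(D, squares)
```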

In [39]:
# define dict using {}
wordcnt = {'cat': 100, 'dog': 90} # represent key-value pairs
print(wordcnt)

wordcnt['cat'] # access by key is very fast

{'cat': 100, 'dog': 90}


100

In [40]:
dir(wordcnt)

['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [41]:
wordcnt.keys()

dict_keys(['cat', 'dog'])

## 6. Tuples <a id="section6" />
>A tuple is a collection of objects which is **ordered and immutable**, and it is commonly written as a series of items in **parentheses** ()

#### Main properties
(1) Ordered collections of arbitrary objects: a left-to-right positional ordering of the items.

(2) Access by offset; fetch an item by indexing the tuple on the object's offset

(3) fixed-length

(4) **immutable**

(5) tuples are implemented as fixed-size arrays of object references (in CPython)

In [42]:
# Creating a tuple
t = (1, 2, 3)
t

(1, 2, 3)

In [43]:
# Show the created tuple
t
print(t)

(1, 2, 3)


In [44]:
# Check the length of the tuple using len(), just like a list
len(t)

3

In [47]:
# We can also mix object types: e.g., strings, integer numbers, floating-point numbers
t = ('one', 2, 490.2)

# Show
t

('one', 2, 490.2)

In [46]:
# Tuples, lists or dictionaries can be nested into other tuples
w = ('one', 'two', (4, 5), 6, ['r', 100])
w

('one', 'two', (4, 5), 6, ['r', 100])

In [48]:
# An empty tuple
u = () 
u

()

In [49]:
# A 1-item tuple
v = ('thing', )    
v

('thing',)

In [50]:
#The parentheses () can be omitted in the syntax, and tuples can be created just by listing items separated with commas
t = 'one', 2, 490.2
t

('one', 2, 490.2)

In [71]:
# Use indexing just like in lists and strings


In [72]:
# Slicing
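
A sketch of indexing, slicing, and unpacking on the tuple defined above. The names on the left are illustrative.

```python
t = ('one', 2, 490.2)

first = t[0]     # indexing works as in lists and strings
last = t[-1]     # negative offsets count from the end
part = t[0:2]    # slicing returns a new tuple
a, b, c = t      # unpacking assigns each item to a name

print(first, last, part, b)
```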


In [53]:
# Concatenation
(1, 'book') + ('notes', 4) 

(1, 'book', 'notes', 4)

In [54]:
# Repetition
(1, 'thing') * 4 

(1, 'thing', 1, 'thing', 1, 'thing', 1, 'thing')

In [55]:
# Immutability
# If we try to change the first element, we will get an error message
t[0] = 'four'

TypeError: 'tuple' object does not support item assignment

In [57]:
# instead we have to create a new tuple to update contents
t = (t[0], 7, t[2])
t

('one', 7, 490.2)

## 7. Sets <a id="section7" />
>  A ***set*** is a collection of **unique objects** which is **unordered and mutable** (the items themselves must be hashable, e.g., numbers and strings), and is constructed by using the set() function or a `{...}` literal.

Sets support operations corresponding to mathematical set theory (intersection, union, etc.). By definition, an item appears only once in a set, no matter how many times it is added. 

Because sets are collections of objects, they share some behavior with lists and dictionaries. For example, sets are iterable, can grow and shrink on demand, and may contain a variety of (hashable) object types. 

However, since sets are unordered and do not map keys to values, they are neither a sequence type (like a list) nor a mapping type (like a dict).

Sets have a variety of applications, especially in numeric and database-focused work.

In [59]:
x = set('abcde')
y = set('bdxyz')
# Difference
x - y

{'a', 'c', 'e'}

In [60]:
# Union
x | y

{'a', 'b', 'c', 'd', 'e', 'x', 'y', 'z'}

In [61]:
# Intersection
x & y

{'b', 'd'}

In [62]:
# Superset, subset
x > y, x < y

(False, False)

In [63]:
# Membership of a set
'e' in x

True

In [None]:
z = set()
# Add to set with the add() method
z.add(1)
print(z)
# Add a different element
z.add(2)
print(z)
# Try to add the same element
z.add(1)
print(z)

In [None]:
# Delete one item (remove() raises KeyError if the item is not in the set)
z.remove(1) 
z
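
A sketch contrasting `remove()` with `discard()`, and showing `frozenset`, the built-in immutable variant of a set. The names `raised`, `fs`, and `can_add` are illustrative.

```python
z = {1, 2}

z.discard(3)        # discard() silently ignores a missing item

try:
    z.remove(3)     # remove() raises KeyError for a missing item
    raised = False
except KeyError:
    raised = True

z.remove(1)         # removing an existing item shrinks the set in place

fs = frozenset([1, 2, 2])       # duplicates collapse; the result cannot be changed
can_add = hasattr(fs, 'add')    # frozenset has no add() method

print(z, raised, fs, can_add)
```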