# Chapter 1: Python Collections

## Strings
A string is a fundamental data type used to handle and manipulate textual data. It is represented by a sequence of characters that might include letters, numbers, symbols, and whitespace. A string is an ordered and immutable collection.

### Basic Syntax and Concepts

#### print()

In [16]:
# print a string
s = "Hello World!"
print(s)

Hello World!


#### quotes to use

In [17]:
single_quotes = 'hi'
double_quotes = "Hey"
triple_quotes = '''Hello'''

print(single_quotes, double_quotes, triple_quotes)

hi Hey Hello


#### len()

In [18]:
# number of characters in the string
len(s)

12

#### lower() vs upper()
Strings are immutable, so methods like lower() and upper() return a new string instead of modifying the original (split() returns a new list of strings).

In [19]:
s = 'hello world!'

# convert to lowercase
print(s.lower())

# convert to uppercase
s.upper()

hello world!


'HELLO WORLD!'
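
As a small sketch of what immutability means in practice (the variable names here are illustrative, not from the notebook): the method call hands back a new object, the original string stays as it was, and item assignment fails.

```python
greeting = "Hello World!"
shouted = greeting.upper()  # upper() returns a new string

print(greeting)  # Hello World!  (the original is unchanged)
print(shouted)   # HELLO WORLD!

# split() also leaves the original alone, but returns a list of strings
parts = greeting.split()
print(parts)     # ['Hello', 'World!']

try:
    greeting[0] = "J"  # strings do not support item assignment
except TypeError as e:
    error_message = str(e)
    print(error_message)
```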

#### ord() vs chr()
- ord(c) returns the Unicode code point of the character c; for ASCII characters this equals the ASCII value.
- chr(i) converts the integer code point i back to the corresponding character.

In [20]:
print(ord('A'))  # Prints: 65
print(chr(65))  # Prints: 'A'
print(chr(ord('A') + 1)) # Prints: 'B'

65
A
B
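
One common use of this pair is letter arithmetic. The sketch below assumes lowercase ASCII input only, and `shift_letter` is a hypothetical helper written for illustration.

```python
def shift_letter(ch, k):
    """Shift a lowercase ASCII letter by k positions, wrapping around after 'z'."""
    offset = (ord(ch) - ord('a') + k) % 26  # position in the alphabet, 0..25
    return chr(ord('a') + offset)

print(shift_letter('a', 1))  # b
print(shift_letter('z', 1))  # a  (wraps around)
print(''.join(shift_letter(c, 3) for c in "abc"))  # def
```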


#### isalpha() vs isdigit() vs isalnum()

In [21]:
print("C".isalpha()) # Prints: True
print("C++".isalpha()) # Prints: False
print("239".isdigit()) # Prints: True
print("C239".isdigit()) # Prints: False
print("C98".isalnum()) # Prints: True
print("C98++".isalnum()) # Prints: False

True
False
True
False
True
False


#### string.split() vs string.split(some_delimiter, 1)

In [22]:
# Split along whitespace
s.split()

['hello', 'world!']

In [23]:
s = "A1=foo=bar,C1=hello"
s.split("=")

['A1', 'foo', 'bar,C1', 'hello']

In [24]:
s = "A1=foo=bar,C1=hello"
s.split("=", 1)

['A1', 'foo=bar,C1=hello']
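
Limiting the number of splits is handy when only the first delimiter separates a key from its value. The following sketch assumes the string is a comma-separated list of key=value items; the `pairs` dictionary is illustrative.

```python
s = "A1=foo=bar,C1=hello"

pairs = {}
for item in s.split(","):
    # split only on the first "=", so any later "=" stays in the value
    key, value = item.split("=", 1)
    pairs[key] = value

print(pairs)  # {'A1': 'foo=bar', 'C1': 'hello'}

# str.partition() does a similar single split and also returns the separator
head, sep, tail = "A1=foo=bar".partition("=")
print(head, tail)  # A1 foo=bar
```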

#### ' '.join(lst)

In [25]:
# turn an iterable into a string
lst = ['Let', 'us', 'go', 'on', 'a', 'hike']
print(' '.join(lst))

Let us go on a hike


#### string.strip()

In [26]:
# strip whitespace
s = '       Let us go on a hike       '
print(s)
print(s.strip())

       Let us go on a hike       
Let us go on a hike


In [27]:
# left and right strip()
name = '    John Doe    '
print(name.lstrip())  # Output: 'John Doe    '
print(name.rstrip())  # Output: '    John Doe'

John Doe    
    John Doe


#### string.replace()

In [28]:
s = 'tiger'
print(s.replace('t', ''))

iger


#### s.find(x) 
Returns the start index of the first occurrence of substring x in the given string. Returns -1 if x is not in the string.

In [29]:
s = 'tigeress'
print(s.find('s'))

6
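
A short sketch of the not-found case, contrasted with the related methods rfind() and index() (index() raises an exception instead of returning -1):

```python
s = 'tigeress'

print(s.find('z'))     # -1, substring not present
print(s.find('s', 7))  # 7, search starts at index 7
print(s.rfind('s'))    # 7, last occurrence

try:
    s.index('z')       # index() raises ValueError when nothing is found
except ValueError:
    index_failed = True
    print("index() raised ValueError")
```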


#### s.count(x) 
Returns the number of non-overlapping occurrences of the substring x in the given string.

In [30]:
s = 'tigeress'

print(s.count('s'))

print(s.count('ss'))

2
1


#### inequality operators

In [31]:
s = "I am a unicorn"

for c in s:
    if 'a' <= c <= 'z':
        print("I am a small case letter")
    elif 'A' <= c <= 'Z':
        print("I am a capital letter")
    elif '0' <= c <= '9':
        print("I am a digit")
    else:
        print("I am not sure, who I am (philosophically speaking)")

I am a capital letter
I am not sure, who I am (philosophically speaking)
I am a small case letter
I am a small case letter
I am not sure, who I am (philosophically speaking)
I am a small case letter
I am not sure, who I am (philosophically speaking)
I am a small case letter
I am a small case letter
I am a small case letter
I am a small case letter
I am a small case letter
I am a small case letter
I am a small case letter


#### reverse a string

In [32]:
s = "hello"
print(s[::-1])

olleh


### Practice Questions

````{tab-set}
```{tab-item} Q1 Replace characters/substrings
Write a function tiggerfy() that accepts a string word and returns a new string that removes any substrings t, i, gg, and er from word. The function should be case insensitive.
```

```{tab-item} Solution

    
    def tiggerfy(word):
        word_lower = word.lower()
        
        word_lower = word_lower.replace('t', '')
        word_lower = word_lower.replace('i', '')
        word_lower = word_lower.replace('gg', '')
        word_lower = word_lower.replace('er', '')
        
        return word_lower
    

```
````

## Lists
Lists are mutable. They let us organize data so that each item holds a definite position.

### Basic Syntax and Concepts

#### sum(lst)

In [33]:
numbers = [1,2,3,4]
sum(numbers)

10

#### min(lst)

In [34]:
min(numbers)

1

#### max(lst)

In [35]:
max(numbers)

4

#### lst.append()

In [36]:
# appends new items at the end
numbers.append(5)
print(numbers)

[1, 2, 3, 4, 5]


#### lst.extend(other_lst)

In [37]:
# Modifies the list in-place by appending all elements of the given iterable.
small_list = [11,12,13]
big_list = [1,2,3,4,5,6,7]
big_list.extend(small_list)
print(big_list)

[1, 2, 3, 4, 5, 6, 7, 11, 12, 13]


#### lst.sort()

In [38]:
# sorts in-place and does not return anything
numbers.sort(reverse=True)
print(numbers)

[5, 4, 3, 2, 1]


#### sorted(lst)

In [39]:
# doesn't modify the original list but does return a sorted list
new_list = sorted(numbers)
print(new_list)

[1, 2, 3, 4, 5]
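
Because sort() works in-place and returns None, assigning its result is a common slip. The sketch below shows that, plus the `key` argument that both sort() and sorted() accept; the sample lists are made up for illustration.

```python
numbers = [3, 1, 2]
result = numbers.sort()  # sorts in-place
print(result)            # None
print(numbers)           # [1, 2, 3]

# sorted() returns a new list; key=len orders the words by length
words = ["banana", "kiwi", "apple"]
by_length = sorted(words, key=len)
print(by_length)         # ['kiwi', 'apple', 'banana']
print(words)             # ['banana', 'kiwi', 'apple']  (unchanged)
```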


#### lst.insert()

In [40]:
# insert an element at a particular index
alphabets = ["a", "b", "c", "d"]
alphabets.insert(3, "!")
print(alphabets)

['a', 'b', 'c', '!', 'd']


#### lst.remove() vs lst.pop()

In [41]:
alphabets = ["a", "b", "c"]
print(alphabets)

alphabets.insert(3, "d")
print(alphabets)

# removes just the first occurrence of the element passed
alphabets.remove("d")
print(alphabets)

# Removes element at index x from list
alphabets.pop(2)
print(alphabets)

['a', 'b', 'c']
['a', 'b', 'c', 'd']
['a', 'b', 'c']
['a', 'b']


#### reversal

In [42]:
list1 = [1, 2, 3]
print(list1[::-1])
print(list1)

list2 = [10, 20, 30]
# reverses in-place
list2.reverse()
print(list2)

[3, 2, 1]
[1, 2, 3]
[30, 20, 10]


#### slicing[:]

In [43]:
# Slicing: my_list[start:end], `start` inclusive, `end` exclusive
print(alphabets[1:4])

['b']


#### concatenation (+)

In [44]:
# Concatenation: my_list + another_list
animals = ["Panda", "Cat"]
print(alphabets + animals)

['a', 'b', 'Panda', 'Cat']


#### repetition (*)

In [45]:
# Repetition: my_list * n
print(animals * 2)

['Panda', 'Cat', 'Panda', 'Cat']


#### lst.index()

In [46]:
# returns the index of given element
animals.index("Cat")

1

#### item in lst

In [47]:
print("Dog" in animals)

False


In [48]:
# fun example
x =  animals[0].lower().index('n') if 2>0 else -1
print(x)

2


#### bisect_left(lst, item)

In [49]:
from bisect import bisect_left

# pass a sorted list
Y = [1, 2, 3, 4, 5]
print(bisect_left(Y, 0))
print(bisect_left(Y, 6))
print(bisect_left(Y, 2.5))

0
5
2
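
bisect_left returns the leftmost index at which the item could be inserted while keeping the list sorted. A sketch of two typical uses follows; `contains` is a hypothetical helper, and the list is assumed to be sorted already.

```python
from bisect import bisect_left, insort

def contains(sorted_list, target):
    """Binary-search membership test on an already sorted list."""
    i = bisect_left(sorted_list, target)
    return i < len(sorted_list) and sorted_list[i] == target

Y = [1, 2, 2, 2, 5]
print(bisect_left(Y, 2))  # 1, the leftmost position among the duplicates
print(contains(Y, 5))     # True
print(contains(Y, 3))     # False

insort(Y, 3)              # insert while keeping the list sorted
print(Y)                  # [1, 2, 2, 2, 3, 5]
```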


#### lst.copy() 
Creates a shallow copy of the list.

In [50]:
list1 = [1,2,3]
list2 = list1.copy()
list2[0] = 9

print(list1)
print(list2)

[1, 2, 3]
[9, 2, 3]
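
Note that copy() copies only the outer list. With nested lists, the inner lists are still shared, which the sketch below illustrates alongside copy.deepcopy() from the standard library.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()       # new outer list, same inner lists
deep = copy.deepcopy(nested)  # inner lists are copied too

nested[0].append(99)

print(shallow)  # [[1, 2, 99], [3, 4]]  (shares the inner list)
print(deep)     # [[1, 2], [3, 4]]      (fully independent)
```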


## Tuples
A tuple is an ordered and immutable collection of items. Tuples are commonly used to store pairs of data together or to return multiple values from a function.

### Basic Syntax and Concepts

#### tuple creation and update

In [54]:
my_tuple = (10, 20)

try:
    my_tuple[0] = 30 # Results in TypeError: 'tuple' object does not support item assignment, since tuples are immutable
except TypeError as e:
    print(e)

'tuple' object does not support item assignment
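
A sketch of the "return multiple values" use case: the function below (`min_max`, a hypothetical helper) returns a tuple, which the caller can unpack into separate names.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)  # the comma builds a tuple

result = min_max([4, 1, 7])
print(result)           # (1, 7)

lowest, highest = result  # tuple unpacking
print(lowest, highest)  # 1 7
```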


## Sets

Important Considerations:
1. Uniqueness
2. Unordered
3. Hashable (elements must be hashable, e.g. numbers, strings, and tuples of hashable items; mutable objects like lists, dictionaries, or other sets are not allowed)
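
The hashability requirement can be sketched as follows. The values are illustrative, and frozenset is shown as one way to nest a set inside another set.

```python
valid = {1, "apple", (2, 3)}  # numbers, strings, and tuples are hashable
print(len(valid))             # 3

try:
    invalid = {1, [2, 3]}     # a list is mutable and therefore unhashable
except TypeError as e:
    error_message = str(e)
    print(error_message)

# frozenset is an immutable, hashable variant of set
set_of_sets = {frozenset({1, 2}), frozenset({3})}
print(frozenset({1, 2}) in set_of_sets)  # True
```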

### Basic Syntax and Concepts

#### Creation

In [None]:
my_set = {1, 2, 3, "apple", "banana"}
print(my_set)

{1, 2, 3, 'banana', 'apple'}


In [None]:
# Creating an empty set
empty_set = set()
print(empty_set)

# Creating a set from a list
list_data = [1, 2, 2, 3, 4, 4, 5]
set_from_list = set(list_data)
print(set_from_list)  # Output: {1, 2, 3, 4, 5}

# Creating a set from a string
string_data = "hello"
set_from_string = set(string_data)
print(set_from_string)  # Output: {'h', 'e', 'l', 'o'} (order may vary)

set()
{1, 2, 3, 4, 5}
{'e', 'l', 'h', 'o'}


#### add()

In [None]:
my_set = {1, 2, 3}
my_set.add(4)
print(my_set)  # Output: {1, 2, 3, 4}

{1, 2, 3, 4}


#### remove() vs discard()

In [None]:
my_set = {1, 2, 3}
my_set.remove(2)
print(my_set)  # Output: {1, 3}
# my_set.remove(4) # This would raise a KeyError

{1, 3}


In [None]:
my_set = {1, 2, 3}
my_set.discard(2)
print(my_set)  # Output: {1, 3}
my_set.discard(4) # No error, set remains {1, 3}
print(my_set)

{1, 3}
{1, 3}


#### union() vs update()
- union returns a new set without modifying the original sets
- update returns None as it alters the original set in-place

In [None]:
set1 = {1, 2, 3}
set2 = {3, 4, 5}
new_set = set1.union(set2)
print(set1)  # Output: {1, 2, 3} (original set1 unchanged)
print(new_set) # Output: {1, 2, 3, 4, 5} (a new set is created)

{1, 2, 3}
{1, 2, 3, 4, 5}


In [None]:
set1 = {1, 2, 3}
set2 = {3, 4, 5}
set1.update(set2)
print(set1)  # Output: {1, 2, 3, 4, 5} (original set1 is modified)

{1, 2, 3, 4, 5}
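
Since update() returns None, assigning its result back to the variable loses the set. A minimal sketch of that pitfall follows; it also shows that both methods accept any iterable, not just sets.

```python
set1 = {1, 2, 3}
result = set1.update([3, 4, 5])  # a list works as the argument too
print(result)                    # None
print(set1 == {1, 2, 3, 4, 5})   # True, set1 was modified in-place

# union() leaves the original alone and can take several iterables
base = {1, 2}
merged = base.union([2, 3], "ab")
print(merged == {1, 2, 3, "a", "b"})  # True
print(base == {1, 2})                 # True
```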


#### clear()
Removes all elements from the set.

In [None]:
myset = {1,2,3}
myset.clear()
print(myset)

set()


#### union, intersection, difference, and symmetric difference

- Union: a | b: Returns the set of elements contained in either set a or set b.
- Intersection: a & b: Returns the set of elements contained in both set a and set b.
- Difference: a - b: Returns the set of elements contained in set a but not in set b.
- Symmetric Difference: a ^ b: Returns the set of elements contained in either set a or set b but not in both (i.e. elements left after removing the intersecting elements).

In [None]:
set1 = {1, 2, 3}
set2 = {3, 4, 5}

# Union: Elements in either set
union_set = set1 | set2           # {1, 2, 3, 4, 5}
print(union_set)

# Intersection: Elements common to both sets
intersection_set = set1 & set2    # {3}
print(intersection_set)

# Difference: Elements in set1 but not in set2
difference_set = set1 - set2      # {1, 2}
print(difference_set)

# Symmetric Difference: Elements in either set, but not both
symmetric_difference_set = set1 ^ set2  # {1, 2, 4, 5}
print(symmetric_difference_set)

{1, 2, 3, 4, 5}
{3}
{1, 2}
{1, 2, 4, 5}


In [None]:
sorted(symmetric_difference_set, reverse=True)

[5, 4, 2, 1]

## Dictionaries
A dictionary stores key-value pairs. It is a mutable collection: keys must be unique and hashable, values can be of any type, and since Python 3.7 dictionaries preserve insertion order.

### Basic Syntax and Concepts

#### dict.get(key, default_val)
Returns the value for key, or default_val (None if omitted) when the key is missing, so no KeyError is raised.

In [None]:
d = {'a': 1, 'b': 2, 'c': 3}
print(d.get('a'))       # Outputs: 1
print(d.get('z'))       # Outputs: None
print(d.get('z', 'Not Found'))  # Outputs: 'Not Found'

1
None
Not Found


In [None]:
# using d.get() to create a counter
freq = {}
s = "aabbbcccc"
for c in s:
    freq[c] = freq.get(c, 0)+1

print(freq)

{'a': 2, 'b': 3, 'c': 4}
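
As an alternative to the manual loop, the standard library's collections.Counter builds the same frequency table. This is a different tool from d.get(), shown here only for comparison.

```python
from collections import Counter

s = "aabbbcccc"
freq = Counter(s)  # counts each character

print(freq["c"])            # 4
print(freq["z"])            # 0, missing keys count as zero
print(freq.most_common(1))  # [('c', 4)]
```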


#### defaultdict
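defaultdict is a dict subclass from the collections module that takes a default factory function. Unlike a regular dictionary, which raises KeyError for a missing key, defaultdict calls the factory to create, store, and return a default value for that key.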

In [None]:
from collections import defaultdict

def default_value_function():
    return "N/A"

# Create a defaultdict using a custom function as the default factory
info = defaultdict(default_value_function)

info['name'] = 'Alice'
print(info['name'])
print(info['age']) # Accessing a non-existent key

print(info)
# Output:
# Alice
# N/A
# defaultdict(<function default_value_function at 0x...>, {'name': 'Alice', 'age': 'N/A'})

Alice
N/A
defaultdict(<function default_value_function at 0x11b942340>, {'name': 'Alice', 'age': 'N/A'})


In the example below, when word_counts[word] is accessed for the first time for a new word, int() is called, which returns 0. This 0 is then assigned as the value for that key, and the += 1 operation increments it.

In [None]:
from collections import defaultdict

# Create a defaultdict where the default value for new keys is 0 (int())
word_counts = defaultdict(int)

text = "apple banana apple orange banana apple"
words = text.split()

for word in words:
    word_counts[word] += 1

print(word_counts)
# Output: defaultdict(<class 'int'>, {'apple': 3, 'banana': 2, 'orange': 1})
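
Another frequent pattern is grouping with defaultdict(list). The sketch below uses made-up sample words; it also shows that merely reading a missing key inserts it, whereas a membership test with `in` does not.

```python
from collections import defaultdict

groups = defaultdict(list)  # missing keys start as an empty list
for word in ["apple", "avocado", "banana", "blueberry", "cherry"]:
    groups[word[0]].append(word)  # group words by first letter

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

print('z' in groups)  # False, `in` does not create the key
groups['z']           # reading a missing key inserts the default value
print('z' in groups)  # True
```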

#### del dict["key"]

In [None]:
my_dict = {"apple": 1, "banana": 2, "cherry": 3}
del my_dict["banana"]
print(my_dict)
# Output: {'apple': 1, 'cherry': 3}

{'apple': 1, 'cherry': 3}


#### dict.pop(key, default_val)
Removes the key and returns its value. If the key is missing, default_val is returned instead of raising a KeyError.

In [None]:
d = {'a': 1, 'b': 2, 'c': 3}
print(d.pop('a', None)) # Returns 1
print(d) # Prints {'b': 2, 'c': 3}

print(d.pop('e', None)) # Returns None
print(d) # Prints {'b': 2, 'c': 3}

1
{'b': 2, 'c': 3}
None
{'b': 2, 'c': 3}
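
For contrast, a small sketch of what happens when the default is left out and the key is missing:

```python
d = {'a': 1}

try:
    d.pop('z')  # no default given, so a missing key raises KeyError
except KeyError as e:
    missing_key = e.args[0]
    print("KeyError for key:", missing_key)

value = d.pop('z', 0)  # with a default, no exception is raised
print(value)           # 0
print(d)               # {'a': 1}
```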


#### isinstance() vs type()

In [None]:
my_dict = {'name': 'Alice', 'age': 30}
my_list = [1, 2, 3]
my_string = "hello"

print(isinstance(my_dict, dict))  # Output: True
print(isinstance(my_list, dict))  # Output: False
print(isinstance(my_string, dict)) # Output: False

True
False
False


In [None]:
type(my_dict) is dict

True
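
The two checks differ for subclasses: isinstance() accepts instances of subclasses, while comparing type() matches only the exact class. A brief sketch, using defaultdict as the example subclass of dict:

```python
from collections import defaultdict

counts = defaultdict(int)

print(isinstance(counts, dict))    # True, defaultdict is a subclass of dict
print(type(counts) is dict)        # False, the exact type is defaultdict

# isinstance() also accepts a tuple of types
print(isinstance(3, (int, float)))  # True
```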