
In addition to simple data types such as integers, floats, and strings, Python can work with **compound data structures**, such as **lists, tuples, dictionaries, and sets**. These compound data structures typically contain multiple items or elements.

Compound data can be categorized as either sequenced (ordered) or unsequenced (unordered):

- **Sequenced (ordered) data structures** store multiple elements in a specific order. Each element is assigned an index based on its position, which we use to access the element. Examples include **lists, tuples, and strings**.
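- **Unsequenced (unordered) data structures** do not assign a positional index to their elements, so elements cannot be accessed by position. Examples include **dictionaries and sets**. (Since Python 3.7, dictionaries do preserve insertion order, but their values are still accessed by key rather than by position.)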

Additionally, compound data structures can be classified as mutable or immutable:

- **Mutable data structures** can be changed, modified, or updated after creation. Examples include **lists, dictionaries, and sets**.
- **Immutable data structures** cannot be edited once created. Examples include **strings and tuples**.
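
As a rough illustration of the two classifications above, the sketch below checks a few values against the abstract base classes in Python's standard `collections.abc` module. The `samples` dictionary and its contents are made up for this example.

```python
from collections.abc import MutableMapping, MutableSequence, MutableSet, Sequence

# Hypothetical sample values, one per compound data structure
samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "string": "abc",
    "dictionary": {"a": 1},
    "set": {1, 2, 3},
}

summary = {}
for name, value in samples.items():
    # Sequences support access by integer position (indexing)
    is_sequence = isinstance(value, Sequence)
    # Mutable containers can be changed after creation
    is_mutable = isinstance(value, (MutableSequence, MutableMapping, MutableSet))
    summary[name] = (is_sequence, is_mutable)
    print(f"{name:<10} sequenced={is_sequence!s:<5} mutable={is_mutable}")
```

Lists come out as both sequenced and mutable, tuples and strings as sequenced but immutable, and dictionaries and sets as mutable but not sequenced.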



In this lecture, we will explore **compound, sequenced data structures** and learn how to manipulate them effectively.

# Lists

A list is a group of elements/items separated by commas and enclosed in square brackets [ ]. These items can be of various data types (e.g., integers, strings, or even other compound data). In practice, items in a list are often of the same data type.

**Lists are ordered**, thus the elements retain the sequence in which they were placed.

**Lists are mutable**, meaning that items can be edited as necessary, e.g., using built-in list methods such as append(), extend(), insert(), remove(), sort(), and reverse(). The related built-in function sorted() leaves the original list unchanged and returns a new sorted list. See here: https://docs.python.org/3/tutorial/datastructures.html

The key use case is storing elements in a sequenced order.
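
A minimal sketch of the list operations named above, using a made-up `elevations` list (the values are hypothetical). Note that `sorted()` is a built-in function that returns a new list, while the other calls are list methods that change the list in place.

```python
# Hypothetical elevation readings in meters
elevations = [350, 120, 560]

elevations.append(90)          # add one item to the end
elevations.extend([410, 75])   # add every item from another list
elevations.insert(0, 200)      # insert 200 at index 0
print(elevations)              # [200, 350, 120, 560, 90, 410, 75]

ascending_copy = sorted(elevations)  # new sorted list; elevations is unchanged
print(ascending_copy)                # [75, 90, 120, 200, 350, 410, 560]

elevations.sort()      # sort the list in place (ascending)
elevations.reverse()   # reverse the list in place
print(elevations)      # [560, 410, 350, 200, 120, 90, 75]
```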

In [50]:
# A list of integers
prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

# A list of floats
temperature = [72.5, 68.4, 75.0, 70.2, 69.8]

# A list of strings
VA_cities = ["Fairfax", "Arlington", "Bedford", "Blumont", "Richmond", "Woodburn", "Dover"]

# A list of mixed data types
random = [4, 9.1, "abc"]

The items in a list do not have to be unique; the same value can appear more than once.

In [51]:
# A list of students' ages in a class
age = [20, 24, 22, 22, 24, 21, 20, 19, 20]

To get the total number of items in a list, you can use the **len() function**. The len() function also works on other compound data structures, such as tuples, strings, dictionaries, and sets, some of which we will cover later in this lecture.

In [52]:
len(age)

9

In [53]:
len(VA_cities)

7
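
As a quick sketch of `len()` on other compound data structures, the example below uses made-up variables (`point`, `city`, `layer_counts`, and `land_use` are hypothetical names and values).

```python
point = (-77.31, 38.83)                    # tuple with two items
city = "Fairfax"                           # string with seven characters
layer_counts = {"roads": 3, "rivers": 2}   # dictionary with two key-value pairs
land_use = {"forest", "urban", "forest"}   # set: the duplicate "forest" is stored once

print(len(point))         # 2
print(len(city))          # 7
print(len(layer_counts))  # 2
print(len(land_use))      # 2
```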

### Indexing with lists

Since lists are sequenced/ordered data, all items in a list are indexed. **Index refers to the position of an element within the list.**

**List indexing** starts at 0 and increases in ascending order. For example, if a list contains five elements, the first element (i.e., the leftmost element) is at index 0, followed by elements at indices 1, 2, 3, and 4.

You can also use **negative indexing** to access elements from the end of the list. In this case, the last element is indexed as -1, and counting continues backward in descending order of negative integers (e.g., -2, -3, and so on).

!["Image"](https://www.programiz.com/sites/tutorial2program/files/numpy-array-negative-index.png)



In [54]:
# examples of positive and negative indexing
print(VA_cities[0]) # printing the first element of VA_cities list
print(VA_cities[3]) # printing the fourth element of VA_cities list

# say you want to know the last element of this list
# if you already know the length of the list, then you can calculate index of the last element as list_length - 1
# for instance VA_cities list has 7 items, in other words the length of this VA_cities list is 7 and the last item index will be 7-1 = 6
print(VA_cities[6])

# or to avoid complexity, you can just use the negative indexing
print(VA_cities[-1]) # printing the last element of VA_cities list

Fairfax
Blumont
Dover
Dover
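
The relationship between positive and negative indexes can be checked with a short sketch. The `cities` list here is a shortened, hypothetical stand-in for `VA_cities`.

```python
cities = ["Fairfax", "Arlington", "Bedford"]  # hypothetical example list
n = len(cities)

# Negative index -k points at the same item as positive index n - k
pairs_match = all(cities[-k] == cities[n - k] for k in range(1, n + 1))
print(pairs_match)  # True

# Valid positions run from 0 to n - 1, so index n does not exist
try:
    cities[n]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```

Requesting a position outside the valid range raises an `IndexError`, which is why negative indexing is a convenient way to reach the last items without computing the length first.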


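If you want to know the index of a specific item within a list, you can use the **.index()** method. It returns the position of the first matching item and raises a ValueError if the item is not in the list. Here's an example: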

In [55]:
# Printing the index of value "Bedford" within the list VA_cities
VA_cities.index("Bedford")

2
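
Two details of `.index()` are worth a small sketch: with duplicate values it reports only the first match, and it raises a `ValueError` for a value that is not in the list. The `ages` list and the use of `None` for "not found" are assumptions made for this example.

```python
ages = [20, 24, 22, 22, 24, 21]  # hypothetical list with repeated values

first_24 = ages.index(24)  # only the first match is reported
print(first_24)            # 1

# Check membership with `in` before calling .index() on a value that may be absent
target = 30
if target in ages:
    position = ages.index(target)
else:
    position = None  # hypothetical convention for "not found" in this sketch
print(position)      # None
```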

Indexes are used to manipulate lists in various ways. For example, using index numbers, you can:

- Create a subset of the list
- Replace or remove an item at a specific position
- Iterate through the list elements
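
As one possible sketch of the last point, the example below iterates through a list by index in two ways. The `parks` list is made up for illustration, and `enumerate()` is a standard Python built-in.

```python
parks = ["Yellowstone", "Grand Teton", "Arches"]  # hypothetical example list

# Option 1: loop over index positions 0, 1, 2
for i in range(len(parks)):
    print(i, parks[i])

# Option 2: enumerate() yields (index, item) pairs directly
labeled = []
for i, park in enumerate(parks):
    labeled.append(f"{i}: {park}")
print(labeled)  # ['0: Yellowstone', '1: Grand Teton', '2: Arches']
```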

### Slicing with lists

Any sequenced data can be sliced using its indexes. For lists, **slicing means creating a subset of the original list**. All slice operations return a new list containing the specified elements.

To create a slice, use index positions by writing the list name followed by square brackets containing the start and end index values, separated by a colon (:).

When slicing, Python **includes the element at the start index but excludes the element at the end index**.

In [56]:
# Just to remind you what the entire VA_cities list looks like
print(VA_cities)

VA_cities_slice1 = VA_cities[1:4] # items from index 1 (included) to 4 (excluded)
print(VA_cities_slice1)

['Fairfax', 'Arlington', 'Bedford', 'Blumont', 'Richmond', 'Woodburn', 'Dover']
['Arlington', 'Bedford', 'Blumont']
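
Two consequences of these rules can be sketched briefly: a slice from `start` to `end` holds `end - start` items, and because a slice is a new list, editing it leaves the original untouched. The `cities` list and the replacement value are hypothetical.

```python
cities = ["Fairfax", "Arlington", "Bedford", "Bluemont", "Richmond"]
start, end = 1, 4

subset = cities[start:end]
print(subset)                      # ['Arlington', 'Bedford', 'Bluemont']
print(len(subset) == end - start)  # True

# The slice is an independent list, so this edit does not reach `cities`
subset[0] = "Alexandria"
print(cities[1])                   # Arlington
```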


If you do not provide a start or end index value, Python uses defaults: an **omitted start index defaults to zero**, and **an omitted end index defaults to the size of the list being sliced**.

In [57]:
VA_cities_slice2 = VA_cities[:4] # items from the beginning to index 4 (excluded)
print(VA_cities_slice2)

VA_cities_slice3 = VA_cities[3:] # items from index 3 (included) to the end
print(VA_cities_slice3)

VA_cities_slice4 = VA_cities[-2:] # items from the second-last (included) to the end
print(VA_cities_slice4)

['Fairfax', 'Arlington', 'Bedford', 'Blumont']
['Blumont', 'Richmond', 'Woodburn', 'Dover']
['Woodburn', 'Dover']
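
The default values can be verified by comparing each shorthand slice with its explicit form. This is a small sketch on a hypothetical `cities` list.

```python
cities = ["Fairfax", "Arlington", "Bedford", "Richmond", "Dover"]
n = len(cities)

same_start = cities[:2] == cities[0:2]         # omitted start behaves like 0
same_end = cities[2:] == cities[2:n]           # omitted end behaves like len(cities)
same_tail = cities[-2:] == cities[n - 2:n]     # -2 counts back from the end
rejoined = cities[:2] + cities[2:] == cities   # the two halves cover the whole list

print(same_start, same_end, same_tail, rejoined)  # True True True True
```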


### Append/Replace/Remove items

**Lists are mutable.** So, you can edit them.

You can append a value to a list using the **.append()** method. The new value is added at the end of the list.

In [58]:
# Example of appending a value to the VA_cities list
VA_cities.append("Bloomfield")
print(VA_cities)

['Fairfax', 'Arlington', 'Bedford', 'Blumont', 'Richmond', 'Woodburn', 'Dover', 'Bloomfield']


You can **replace items** at specific positions without changing their placement in the list.

In [59]:
# for instance, this is what we have for index 3 on VA_cities list
print(VA_cities[3])

Blumont


In [60]:
# however, the spelling is incorrect, and we want to replace it with the correct spelling
VA_cities[3] = "Bluemont"

# printing the list to make sure it's updated
print(VA_cities)

['Fairfax', 'Arlington', 'Bedford', 'Bluemont', 'Richmond', 'Woodburn', 'Dover', 'Bloomfield']
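
When you do not know the position in advance, one option is to look it up with `.index()` and then assign to that position. This is a sketch on a hypothetical `cities` list, not the only way to do it.

```python
cities = ["Fairfax", "Arlington", "Blumont", "Richmond"]  # hypothetical list with a typo
length_before = len(cities)

position = cities.index("Blumont")  # find where the misspelled value sits
cities[position] = "Bluemont"       # overwrite the item at that position

print(position)                      # 2
print(cities)                        # ['Fairfax', 'Arlington', 'Bluemont', 'Richmond']
print(len(cities) == length_before)  # True: replacing does not change the length
```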


You can remove an element by value using the **.remove()** method. Inside the parentheses, put the value that you want removed; if the value appears more than once, only the first occurrence is removed. To remove the element at a specific index, use the **.pop()** method instead.

In [61]:
VA_cities.remove("Bloomfield") # asking to remove a specific value
print(VA_cities)

VA_cities.pop(3) # asking to remove the value at a specific index
print(VA_cities)

['Fairfax', 'Arlington', 'Bedford', 'Bluemont', 'Richmond', 'Woodburn', 'Dover']
['Fairfax', 'Arlington', 'Bedford', 'Richmond', 'Woodburn', 'Dover']
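
The difference between removing by value and removing by position matters most when a list contains duplicates. The sketch below uses a hypothetical `ages` list and the standard `remove()`, `pop()`, and `del` operations.

```python
ages = [20, 24, 22, 22, 24, 21]  # hypothetical list with repeated values

ages.remove(24)        # removes only the first 24
print(ages)            # [20, 22, 22, 24, 21]

removed = ages.pop(3)  # removes and returns the item at index 3
print(removed, ages)   # 24 [20, 22, 22, 21]

del ages[0]            # deletes the item at index 0 without returning it
print(ages)            # [22, 22, 21]
```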


### Concatenate lists

You can concatenate multiple lists using the plus (+) operator.

In [62]:
NP_list1 = ["Yellowstone", "Grand Teton", "Arches"]
NP_list2 = ["Smokey mountain", "Shenandoah"]
NP_list3 = NP_list1 + NP_list2
print(NP_list3)

['Yellowstone', 'Grand Teton', 'Arches', 'Smokey mountain', 'Shenandoah']
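
Concatenation with `+` builds a new list and leaves both inputs unchanged, whereas `.extend()` modifies the list it is called on. The sketch below contrasts the two using made-up `east` and `west` lists.

```python
east = ["Shenandoah", "Acadia"]     # hypothetical example lists
west = ["Yellowstone", "Arches"]

combined = east + west              # a new list; east and west are unchanged
print(combined)                     # ['Shenandoah', 'Acadia', 'Yellowstone', 'Arches']
print(east)                         # ['Shenandoah', 'Acadia']

east.extend(west)                   # by contrast, extend() changes east in place
print(east == combined)             # True
```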


# Tuples

Tuples are similar to lists: **a group of items/elements enclosed in parentheses** instead of square brackets.

Like lists, **tuples are also ordered** (so they can also be indexed and sliced).

A key use for tuples is storing a fixed collection of elements which you do not want re-ordered, such as geographic coordinates.

In [63]:
coord = (-77.31, 38.83)
print(coord)

(-77.31, 38.83)


Because tuples are also ordered, you can access their values by index position.

In [64]:
print(coord[0]) # print value at index 0
print(coord[1]) # print value at index 1

-77.31
38.83
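
Besides indexing, tuples can be sliced, and their items can be assigned to separate names in one step (often called unpacking). The sketch below assumes the coordinate is stored as (longitude, latitude); the names `lon` and `lat` are chosen for illustration.

```python
coord = (-77.31, 38.83)

# Unpacking: assign each item of the tuple to its own name
lon, lat = coord
print(lon)  # -77.31
print(lat)  # 38.83

# Slicing a tuple returns another tuple
first_only = coord[:1]
print(first_only)  # (-77.31,)
```

The trailing comma in `(-77.31,)` is how Python displays a tuple with a single item.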


As opposed to lists, **tuples are immutable**. You cannot change the elements of a tuple once it is created.

In [65]:
coord[1] = 39.02 # trying to assign a new value to a tuple element raises an error

TypeError: 'tuple' object does not support item assignment

If we want to update one element of a tuple, we need to create a new tuple that contains the updated value.

In [66]:
coord_updated = (coord[0], 39.02)
print(coord_updated)

(-77.31, 39.02)
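
Two other ways to build the updated tuple are sketched below: slicing plus concatenation, and a round trip through a list. The variable names are hypothetical, and in both cases the original `coord` stays unchanged.

```python
coord = (-77.31, 38.83)

# Option 1: keep the first item via slicing and concatenate a one-item tuple
via_slice = coord[:1] + (39.02,)

# Option 2: convert to a mutable list, edit it, then convert back to a tuple
as_list = list(coord)
as_list[1] = 39.02
via_list = tuple(as_list)

print(via_slice)  # (-77.31, 39.02)
print(via_list)   # (-77.31, 39.02)
print(coord)      # (-77.31, 38.83)
```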


# Strings

You can also think of a string as a compound data type—a sequence of characters enclosed in quotes. Given this characteristic, **strings can be treated as sequenced or ordered data**, meaning they **can be indexed and sliced** just like lists and tuples.

However, similar to tuples, **strings are immutable**.

In [67]:
# Defining a string
text = "Strings can be indexed and sliced!"

# Indexing: Accessing individual characters
print(text[0])   # Output: S (first character)
print(text[8])   # Output: c (character at index 8)
print(text[-1])  # Output: ! (last character using negative indexing)

# Slicing: Creating a substring
print(text[0:7])    # Output: Strings (characters from index 0 up to, but not including, index 7)
print(text[15:])    # Output: indexed and sliced! (slicing from index 15 to the end)

S
c
!
Strings
indexed and sliced!
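
To illustrate string immutability, the sketch below tries to change one character in place, which fails, and then builds a corrected string from slices instead. The `place` string is a made-up example.

```python
place = "Blumont, VA"  # hypothetical string with a missing letter

# Assigning to a character position is not allowed for strings
try:
    place[3] = "e"
    assignment_failed = False
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

# Instead, build a new string from slices of the old one
corrected = place[:3] + "e" + place[3:]
print(corrected)  # Bluemont, VA
print(place)      # Blumont, VA
```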
