# Tutorial 3: Collection Data Types in Python


Sections of this notebook come from several other tutorial notebooks and references, including but not limited to the Python documentation (https://docs.python.org/3/library/collections.html), the GeeksforGeeks website (https://www.geeksforgeeks.org/python-data-types/), the Krittika Tutorials, and additional sources.

This tutorial was compiled for the PAARE project at South Carolina State University in partnership with Clemson University and the University of the Virgin Islands and funded by NSF (NSF grant AST 2319415).

* Original posting:
  * JCash 06-05-2025
* Last modification:
  * JCash 06-09-2025

##  Overview

In this tutorial, we cover some special data types that are collections of data objects. These are also referred to as compound data types, used to group other values together. 

Collections in Python are essentially container data types. They hold a group of data objects of various types in a way that allows those items to be kept together. They have different characteristics based on the declaration and the usage.

This tutorial will cover:
* Common Types of Data Collections.
* How to declare and access each type.

Specifically, we will cover:
* Lists
* Tuples
* NumPy Arrays
* Strings
* Dictionaries

### Python uses 0 indexing

Programming languages, including Python, need a standard way to indicate where an item is within the collection. This process is called **Indexing**.

Programming languages are split between 0-index and 1-index standards. We could debate which is better, but you just need to know that Python uses the 0-index standard. 

In this standard, the index can be thought of as the displacement from the start of the collection. 
* The index 0 is the start of the collection.
* The index 1 shifts over to the first one after the start.
* The index 2 shifts over to the item two spaces after the start.
* The index 13 would be 13 spots after the start of the collection.

In the index-0 standard, we can also use a negative index to count backwards from the end of the collection.
* The index -1 is the last item in the collection.
* The index -2 is the second to last item from the end.
* The index -5 would be the fifth item from the end.


The Python.org documentation uses the following description to help explain indexing:


One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

The first row of numbers gives the position of the indices 0…6 in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.

https://docs.python.org/3/tutorial/introduction.html#text
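As a small sketch of the idea described in that passage, the cell below indexes the six-character string `'Python'` from both directions. The variable names are chosen here for illustration only.

```python
# Illustrative sketch: positive and negative indices reach the same characters.
word = "Python"

first_positive = word[0]     # counted from the start
first_negative = word[-6]    # the same character, counted from the end
last_positive = word[5]
last_negative = word[-1]

print(first_positive, first_negative)
print(last_positive, last_negative)

# A slice uses the edge positions 0 through 6, so [0:6] spans the whole string.
whole = word[0:6]
print(whole)
```

Slices are covered in more detail in the sections on lists and strings.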

###  Imports needed

It is always best practice to import the needed packages at the very top of the notebook. As a notebook is developed, you can always add to or take away from that cell. 

The cell should always be run before any other cells in the notebook and again right after any changes are made to the import list. 

For this tutorial we will use NumPy.

In [None]:
import numpy as np

## 1.0) Lists

In Python, a list is an ordered collection of items stored within one variable. In many ways, lists are the most versatile of the container types. The items in a list can be any data type or a combination of different data types (integers, floats, strings, booleans, etc). An item in a list can even be a more complicated object.

Lists are like a row of boxes. You can put anything into a box and then get it back out of the same box. You NEED to know which box your item is stored in to be able to efficiently get it back out. 

- You create a list by enclosing the items in square brackets `[]` and separating the items with commas.
- You can print a list.
- The length of a list is the number of items stored in the list.
- You can access individual items in the list.
- You can modify individual items in the list.
- You can select a slice of the list.
- You can create nested lists (or 2D lists).

### 1.1) Declaring and assigning lists

- List values are placed in between square brackets [ ], separated by commas.

- It is good practice to put a space between the comma and the next value for easier readability. 

- The values in a list do not need to be unique (the same value can be repeated).

- Empty lists do not contain any values within the square brackets.

Below is an example of a list assigned to the variable `a`. It contains 5 items of different types. 
Shown is how to manually declare the list and print the list. 

We will use this `a` list in a lot of our examples since the mixed data types will make some aspects more obvious.

In [None]:
# Create the list and assign it to the variable name a
a = [1, "apple", True, 35.6, "banana"]

# Print the list
print("list 'a' contains the items: ",a)

# See what type of object the list is
print(type(a))

Notice that the data type of `a` is just `list`.

This doesn't tell us what types of data are stored inside the items in the list, just that the variable name `a` is assigned to an object that is a list.

Below are a few more examples of lists. Even though each of these lists only contains one type of data, the list still has a DataType `list`.

In [None]:
list1 = [0,1,2,3,4,5,6,7,8,9,10]
list2 = ["a","b","c","d"]
list3 = [6.3, 1.45, 8.6]


In [None]:
type(list1)

**Caution**

`list` is a built-in class in Python and you should avoid using just `list` as a variable name. 
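The sketch below illustrates what can go wrong. It assumes the code is run at the top level of a notebook cell or script; the variable names `letters`, `message`, and `restored` are made up for this example.

```python
# Illustrative sketch: reusing the name `list` hides the built-in class.
letters = list("abc")        # the built-in list() builds a list from a string
print(letters)

list = [1, 2, 3]             # now the name `list` refers to our own data

try:
    list("abc")              # the built-in is hidden, so this call fails
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

del list                     # delete our variable so the built-in is visible again
restored = list("abc")
print(restored)
```

If this happens by accident in a notebook, deleting the variable with `del list` (or restarting the kernel) restores the built-in.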

Consider how you are using the list and what it holds:
- If your code only has a few lists that are similar in nature, you could use variable names such as list1, list2, etc.
- If a list in your code represents a group of similar items, you can give it a name that clearly relates to that group.
- Consider how often you need to refer to the list and make sure it has a name that will not easily be misspelled.
- Consider if you will often need to reference individual items in the list.
- You can also add comment lines when you declare a list to explain what it contains.

Below are some different examples to show you the variety that is possible.

In [None]:
names = ['Bert', 'Ernie', 'Gonzo']

prices = [93,24,67,142]

# The following list contains the high temperature for each day last week.
htemps = [92,94,95.3,97,91,87,88.5]
# The following list contains the low temperatures for each day last week.
ltemps = [74,69,72,75,76,71,69]



### 1.2) Length of a list

You can find out how many items are in a list using the built-in `len()` function.

The result you get will be an integer value for the number of items in the list. 


In [None]:
a = [1, "apple", True, 35.6, "banana"]
print(a)
print(len(a))

You can also save that integer value to a variable name, just be careful that you know what each variable name is assigned to.

In [None]:
b = len(a)
print(b)
print("list ", a, "contains ", b, "items")

### 1.3) Accessing list items using indexing

As discussed at the top of this tutorial, Python uses a 0-index system.

So the starting item is position 0, not 1.

We can access a single item from a list using the list variable name followed by square brackets with the index inside the brackets as shown below. 

In [None]:
a = [1, "apple",True, 35.6,"banana"]

print(a[0])  # Print the initial 0 position item in the list.
print(a[1])  # Print the item 1 position to the right of the starting position.
print(a[2])  # Print the item 2 position to the right of the starting position.
print(a[3])  # Print the item 3 position to the right of the starting position.
print(a[4])  # Print the item 4 position to the right of the starting position.

Once you are accessing a single item in the list, you can use `type()` to find out the data type for the item itself.

In [None]:
type(a[0])

In [None]:
type(a[1])

In [None]:
# Try looking at the data type for each item in our list a.



**Allowed index range**

When indexing lists, you cannot use an index that falls outside the bounds of the list.

Since we use 0 as the starting position, the last item has an index of `len(listname) - 1`.

For our example of list2, the length is 4 items, which is:

* Positive positions 0 through 3.
* Backwards -1 through -4.


Attempting to call an index outside of that range will cause an IndexError.

In [None]:
print(list2)

In [None]:
# This works.
list2[3]

In [None]:
# This also works.
list2[-2]

In [None]:
# This doesn't work and will give an IndexError.
list2[4]

### 1.4) Slicing a list

In addition to indexing, slicing is also supported.

Slicing allows you to get a section of the list (often called a sublist).

With slices, we can access multiple items by creating a range of index numbers separated by a colon `[x:y]`.

When constructing a slice, as in `[x:y]`, the first index number is where the slice starts (inclusive), and the second index number is where the slice ends (exclusive). This is similar to selecting indices within the mathematical range of `x <= i < y`.


This can be seen with the example `list1` defined above which lists the integers from zero to ten. For this case, the index and the value stored in that location are the same. 

In [None]:
print(list1)

In [None]:
# Select the range where 0 <= index < 3; which would be positions 0, 1, and 2.
list1[0:3] 

In [None]:
# Select the range where 3 <= index < 7; which would be positions 3, 4, 5, and 6.
list1[3:7]

If either x or y is left blank, Python uses the start or the end of the list as that boundary, as shown in the examples below.

In [None]:
# Select the range from the start of the list to right below the index of 4.
list1[:4]

In [None]:
# Select the range from index 5 to the end of the list.
list1[5:]

You can also use a negative index to specify the range.

In [None]:
# Get the last three values.
list1[-3:] 

When you have a mixed list such as the list `a` we have been using, it takes a little more counting to figure out what to use for the range. 

In [None]:
# Start with one position to the right of the start and stop before you get to the position 3 spaces to the right of the start.
a[1:3] 

### 1.5) Lists are "Mutable"

Once you create a list and assign it to a variable name, you can still go in and change items in the list without having to recreate the list. 

The term "mutable" means that the object can be changed (mutated) after it has been created.

You can:
* Modify individual items of a list.
* Add new items to the end of a list.
* Remove items from the end of a list.

####  Modifying elements of a list

To change a single item in a list, you can assign the new value to the specific index in the list.

This will have the general format of `listname[index] = value`.

In [None]:
# List example.
my_list = [1, 2, 3]  # Declaring the list.
print(my_list)   # Output of the original list.
my_list[0] = 100  # Modifying the item 0 value.
print(my_list)    # Output of modified list.
my_list[1] = 34   # Modifying the item 1 value.
print(my_list)

####  Adding to a list (Appending)

You can add a single item to the end of a list using the append method.

This will have the general format of `listname.append(value)`.

In [None]:
my_list.append(15)
print(my_list)
my_list.append(23)
print(my_list)
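The list of mutable operations above also mentions removing items from the end of a list. A minimal sketch of that uses the built-in list method `pop()`; the list name `demo_list` and its values are made up for this example.

```python
# Illustrative sketch: removing the last item of a list with pop().
demo_list = [100, 34, 3, 15, 23]

last_item = demo_list.pop()   # removes the last item and returns it
print(last_item)
print(demo_list)              # the list is now one item shorter
```

Because `pop()` returns the removed item, you can save it to a variable name if you still need it.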

#### Combining lists (Concatenation)

If you want to take two lists and create one longer list, you can use the addition operator as shown below.

In [None]:
alltemps=htemps+ltemps
print(alltemps)

### 1.6) Comparing lists

**Equality**

To be equal, two lists must have the same items in the same order.


In [None]:
print([1,2,3] == [1,2,3])
print([1,2,3] == [2,1,3])
print([1,2,3] == [1,2,3,4])

**Ranking lists**

Using the `>` or `<` operators on lists is a little more complicated.

Python looks at the first element in each list and compares how they would be sorted.

If the two first elements are equal, it then looks at the second element.

See some examples below.

In [None]:
print([1,2,3] < [2,1,3])

In [None]:
print([1,2,3] < [1,1,3])

If you just want to see if one list is longer, shorter, or the same length as another list, you can compare the lengths.

In [None]:
len(list1) >= len(list2)

### 1.7) Nested Lists

You can even store lists inside of lists.

The length of the nested list is the number of inner lists stored in the outer list. 

The lengths and data types of the inner lists do not need to match to place them inside the outer list. 

A list of lists is sometimes also referred to as a 2D list.

In [None]:
nlist = [list1, list2, list3]

print(nlist)
print(type(nlist))
print(len(nlist))

**Accessing nested lists**

For these sorts of nested or 2D lists, you can access one of the inner lists using indexing. 

Similar to accessing items in a single list, the first inner list has an index of 0, the next inner list would have an index of 1, etc.  

The code cells below show how to access each inner list. 

In [None]:
nlist[0]

In [None]:
nlist[1]

In [None]:
nlist[2]

**Accessing elements inside inner list**

If you want to access a single value within one of the inner lists, you need both the index of the inner list and the index of the value inside that inner list. 

You give the index for the inner list first and then the index for the position inside that inner list.

For example, you can get the very first element of the very first list using `[0][0]`.

If you want to get the 4th element inside the second list, you need to use `[1][3]`.

In [None]:
nlist[0][0] 

In [None]:
nlist[1][3]
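The same indexing can be combined with `len()` to see that the inner lists have different lengths. This is a small sketch that redefines the lists from above so it can run on its own; `outer_length` and `inner_lengths` are names made up for this example.

```python
# Illustrative sketch: lengths of the outer list and of each inner list.
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list2 = ["a", "b", "c", "d"]
list3 = [6.3, 1.45, 8.6]
nlist = [list1, list2, list3]

outer_length = len(nlist)     # number of inner lists
inner_lengths = [len(nlist[0]), len(nlist[1]), len(nlist[2])]

print(outer_length)
print(inner_lengths)
```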

## 2.0) Tuples

Tuples and lists are very similar but with a few important differences. 

- Tuples are defined using `( )`.
- Tuples are accessed with the same indexing system.
- Tuples are immutable.

This last property is the most important.

### 2.1) Creating a tuple

Similar to creating a list, you can create a tuple by listing the values separated by commas, but now you need to use the round brackets instead of square brackets.

The naming of tuples has the same cautions of readability as those for list variable names.

In [None]:
lista = [1, "apple",True, 35.6,"banana"]
tuplea =(1, "apple",True, 35.6,"banana")
print(type(lista))
print(type(tuplea))

In [None]:
b = (120,300)
coords = (33.493317, -80.855415)

In [None]:
len(coords)
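One detail that is easy to miss: it is the commas, not the round brackets, that make a tuple. A tuple with a single item therefore needs a trailing comma. The sketch below uses made-up variable names to show the difference.

```python
# Illustrative sketch: a one-item tuple needs a trailing comma.
not_a_tuple = (5)     # just the integer 5 written inside round brackets
one_item = (5,)       # the trailing comma makes this a tuple
empty = ()            # empty round brackets give an empty tuple

print(type(not_a_tuple))
print(type(one_item), len(one_item))
print(type(empty), len(empty))
```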

### 2.2) Accessing a tuple

The same indexing system is used for tuples as was used for lists. 

Only one example is shown below; notice that the indices are still given in square brackets `[]` for tuples and lists.

In [None]:
tuplea[0:3]

### 2.3) Tuples are immutable

Unlike lists, you cannot change individual values inside a tuple.

To change a tuple, you would need to assign a completely new tuple to the variable name.

This can actually be very useful if you have information that you want to ensure is not accidentally changed as code is run. 

In the cells below we see what happens when you try to change a value in a tuple, followed by an attempt to append to a tuple, then reassigning a tuple.

In [None]:
# Here we declare a 3-element tuple.
my_tuple = (1, 2, 3)

# Now we try to change the initial value.
my_tuple[0] = 100

# This will give a TypeError since you can't change the tuple by assigning a new value.

In [None]:
# Here we declare a 3-element tuple.
my_tuple = (1, 2, 3)

# Now we try to add another value.
my_tuple.append(100)

# This will give an AttributeError since you can't append to a tuple.

In [None]:
# Here we declare a 3-element tuple.
my_tuple = (1, 2, 3)

# Now we redefine the tuple (which does work).
my_tuple = (100,2,3,100)
print(my_tuple)

## 3.0) NumPy Arrays

Lists and tuples are a standard part of Python, but sometimes we want a collection object that has a little more power. We can use extra packages like NumPy to add that functionality.

NumPy arrays are a very powerful data collection object.

<div class="alert alert-block alert-warning">
<b>Import reminder:</b> If you get an error message such as "name 'np' is not defined", you did not import the NumPy library. Return to the top of the notebook and run or rerun the import cell before continuing. 
</div>


### 3.1) Creating NumPy Arrays

You can manually declare a NumPy Array using the function `np.array()` similar to how we used `np.sqrt()` in the last tutorial.

Inside the `()`, you include a list either using the `[]` with comma-separated values or putting in the name for a list you already created. 


See the examples below.

#### Manual creation

In [None]:
# Here we manually create a NumPy Array by listing all the values.
my_arr = np.array([1, 2, 3])

In [None]:
type(my_arr)

In [None]:
print(my_arr)

In [None]:
my_arr

#### Create from a list

Here we have a list declared, and then create an array from that list variable.

In [None]:
print("list1 information")
list1 = [0,1,2,3,4,5,6,7,8,9,10]
print(list1)
print(type(list1))

print("array1 information")
array1 = np.array(list1)
print(array1)
print(type(array1))

#### 2D arrays

If we use the same method as for creating a nested list, the nested arrays are now called 2D arrays and print out with each inner array as a row, instead of a list of lists all on one line. See the difference below.

In [None]:
print("nested list")
list_nested = [[1,2,3], [4,5,6], [7,8,9], [10,11,12]]
print(list_nested)

print("2D array")
array2D = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
print(array2D)

#### DataTypes in NumPy arrays

So far, all of the arrays we have created contained only numerical values. 
NumPy Arrays work best with numerical values (it is the Numerical Python package after all). 

Unlike lists, NumPy Arrays **cannot** have mixed data types in the same array. 

The `np.array` function will look at the values you want to put into the array and try to determine the common data type. 

Look at what happens if we take the mixed list that we used before and try to put it into an array.

In [None]:
array_mixa = np.array([1, "apple",True, 35.6,"banana"])
print(array_mixa)
print(type(array_mixa[0]))

Every value in the list we gave it can be represented as a string, so that is the data type NumPy used.

The array below starts with integers and floats.

Since integers convert to floats without losing information, but floats would lose their decimal part if converted to integers, all elements are converted to floats.

In [None]:
array_mix2 = np.array([1, 2, 3.1])
print(array_mix2)
print(type(array_mix2[0]))

There are ways to force mixed data types into a NumPy Array, but it is generally considered better to use a different structure than a NumPy Array for those situations.

A later tutorial on data files will examine some of those choices.

### 3.2) Array information



#### Dimensions and ND arrays

You may have noticed that the DataType for these arrays is `numpy.ndarray` 

The `nd` stands for n-dimensional arrays. 

This means that a NumPy Array can represent data with any number of dimensions. 

* A 1D array is a single row.
* A 2D array is a table with rows and columns.
* A 3D array can be thought of:
    * as a cube with each table stacked behind another.
    * as a grid of regular x,y,z positions in space with values at each spot.
 
Higher dimensions are easily manipulated with arrays, although it becomes hard to visualize what these mean.

#### Length

You find the length for an array in the same way as for lists and tuples. 

**1D arrays:**
For a 1D array, the length is the number of elements in the single row.

In [None]:
print(array1)
print(len(array1))

**2D arrays:**
For a 2D array, the length is the number of rows.

In [None]:
print(array2D)
print(len(array2D))

#### Shape

For these NumPy Arrays, we now have some additional attributes that give information about the size and shape of the array.

The syntax for this is to list the variable name for the array followed by a dot followed by the attribute name. Unlike methods, attributes are written without parentheses. (You will learn more about methods in the next tutorial.)
* `array.ndim` gives the number of dimensions in the array.
* `array.shape` gives you the number of elements in each dimension.

In [None]:
# Number of dimensions and shape of the 1D array (shape is just the number of elements).
print('For the array1:')
print(array1.ndim, 'dimensions')
print(array1.shape, 'shape')

In [None]:
# Number of dimensions and shape of the 2D array (number of rows, number of columns).
print('For the array2D:')
print(array2D.ndim, 'dimensions')
print(array2D.shape, 'shape')

So we see above that the shape of our 2D array was 4 rows and 3 columns. 


#### DataType

Since a NumPy Array can only have one datatype, it is possible to get that datatype without accessing each element to determine its type.

The `array.dtype` attribute provides this.

In [None]:
# Shape of the array (number of rows, number of columns).
print("Shape:", array2D.shape)

# Number of dimensions.
print("Dimensions:", array2D.ndim)

# Data type of the elements.
print("Data Type:", array2D.dtype)


### 3.3) Indexing for Arrays

The indexing for arrays works the same way as the indexing for lists and tuples.

With the added information for the shape of a 2D array, we start to see some of the power of arrays compared to lists for data that naturally falls into columns.

See the examples below for indexing on both 1D and 2D arrays.

#### 1D arrays

- 1D arrays will use indexing along a single dimension just like lists. 
- You can pull out single values along the array.
- You can pull out slices of the array.

In [None]:
print(array1)

In [None]:
array1[0]

In [None]:
array1[5]

In [None]:
array1[2:8]

#### 2D arrays

- 2D arrays will use a pair of indexes separated by a comma; the first represents the row and the second one is the column.
- The syntax is `array[rows,cols]`.
- If only one index is given, it is treated as the row, and you get all columns for that row.

In [None]:
print(array2D)

In [None]:
# Here we get the 0 index row and the 0 index column.
array2D[0,0]

In [None]:
# Here we get the zero index row and the 2 index column.
array2D[0,2]

In [None]:
# Here we get the 2 index row and the 0 index column.
array2D[2,0]

In [None]:
# Here we get rows with index 0:3 and columns with index 0:2.
# Remember that slices go from the first number up to and not including the next number.
array2D[0:3,0:2]

In [None]:
array2D[0:3,1:3] # Select  0<=row<3 and  1<=column<3.

In [None]:
# Here we get every row but only the last two columns.
array2D[:,-2:]

In [None]:
# If only 1 dimension is given, it assumes that it is a row, so it takes the full row with index = 0.
array2D[0]
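Selecting a whole column is where the `array[rows,cols]` syntax is especially convenient. The sketch below rebuilds the nested list and 2D array from above so it can run on its own; `column_from_array` and `column_from_list` are names made up for this example.

```python
import numpy as np

# Illustrative sketch: pulling out one column from a 2D array and from a nested list.
list_nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
array2D = np.array(list_nested)

# With the array: every row (:) and only the column with index 1.
column_from_array = array2D[:, 1]
print(column_from_array)

# With the nested list, each row has to be indexed separately.
column_from_list = [list_nested[0][1], list_nested[1][1],
                    list_nested[2][1], list_nested[3][1]]
print(column_from_list)
```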

#### ND arrays

The following example is taken from w3schools. Again, it is harder to imagine a 3D array, but it is like slices of a cube.

In [None]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print('The full 3d array.')
print(arr)
print('')
print('The first dimension tells which slice of the cube.')
print('So arr[0] gives:')
print(arr[0])
print(' ')
print('The second dimension tells which row of that slice.')
print('So arr[0,1] gives:')
print(arr[0,1])
print(' ')
print('The third dimension tells which column of that row.')
print('So arr[0,1,2] gives:')
print(arr[0, 1, 2])
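The `ndim` and `shape` attributes introduced earlier can help keep track of a 3D array. A short sketch, using the same `arr` values as the cell above:

```python
import numpy as np

# Illustrative sketch: size information for the 3D example array.
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr.ndim)    # number of dimensions
print(arr.shape)   # (slices, rows per slice, columns per row)
print(len(arr))    # len() only counts along the first dimension
```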


### 3.4) Calculations with arrays

One of the biggest advantages of using NumPy Arrays instead of lists is the power to easily do calculations on a whole array at a time. 

When we covered lists, we noted that adding two lists creates a single longer list. Element-by-element arithmetic cannot be done directly on lists, even if the lists contain numbers. 

If we have NumPy Arrays with numbers, we can do much more on these arrays.

Using the `list1` and `array1` defined above with numbers 0-10, we can demonstrate some of these operations.

In [None]:
# Here we add lists.
print(list1+list1)

In [None]:
# Here we add arrays.
print(array1+array1)
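Multiplication shows the same contrast. In the sketch below, `*` with an integer repeats a list, while the same operator on a NumPy Array multiplies each value. The names `small_list`, `small_array`, `repeated`, and `doubled` are made up for this example.

```python
import numpy as np

# Illustrative sketch: the same operator behaves differently on lists and arrays.
small_list = [1, 2, 3]
small_array = np.array(small_list)

repeated = small_list * 2     # the list is repeated, not multiplied
doubled = small_array * 2     # each element is multiplied by 2

print(repeated)
print(doubled)
```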

#### Vectorization

For NumPy, mathematical operations operate on each element in the array. So adding two arrays adds each element of one array to the matching element of the other. 

You can also do each of the following:
- Use `+`, `-`, `*`, `/` between every element of an array and a single number.
- Use `+`, `-`, `*`, `/` between one array and another array, element by element.

In [None]:
# Here we add a constant value to each element in the array.
a2 = array1 + 2
print(a2)


In [None]:
# Here we multiply each value by 3.
a3 = a2*3
print(a3)

# Here we subtract a constant value of 10.
a4 = a3 - 10
print(a4)

In [None]:
# Here we subtract one array from another.
a5 = a4-a2
print(a5)

In [None]:
# For the temperature lists we had defined above:
# Now we convert those to NumPy Arrays.
ht = np.array(htemps)
lt = np.array(ltemps)

# Then we can get an array of the temperature differences on each day.
print( ht-lt)

In [None]:
# Here we take the sqrt of each value in the array.
print(np.sqrt(a5))

# Notice that we get the warning, and some of the results are the NumPy Not a Number 'nan'.

#### Modifying NumPy Arrays

Similar to lists, you can add elements to an array, but the syntax is a little different.
You use the `np.append()` function with the array you want to add to and what you want to add, assigned to a new array name.

In [None]:
a6 = np.append(a5, 99)
print(a6)
print(a5)

In [None]:
print(a6)
a6[1] = 0
print(a6)

**Caution**

When a mathematical operation involves two or more arrays, the arrays generally need to have the same shape (for 1D arrays, the same length). 

Below is an example that doesn't work because one array is a different size.

In [None]:
# This will give a ValueError and let you know the shapes are different.
print(a5+a6)
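One way to avoid this error is to compare the shapes before combining arrays. The sketch below is self-contained, so it rebuilds example arrays with the same names `a5` and `a6`; the names `same_shape` and `total` are made up for this example.

```python
import numpy as np

# Illustrative sketch: checking shapes before doing arithmetic on two arrays.
a5 = np.array([-6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14])
a6 = np.append(a5, 99)        # one element longer than a5

print(a5.shape, a6.shape)
same_shape = a5.shape == a6.shape
print(same_shape)

# Slicing off the extra last element makes the shapes match again.
total = a5 + a6[:-1]
print(total)
```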

**2D arrays**

You can do all of the above mathematical operations with 2D arrays, but also matrix operations. 
These will be explored in more detail in later tutorials.


## 4.0) Strings

In Python, strings are a collection of individual characters. 

There is not a *character* data type. A single character is just a string of length 1.

We looked at many of the properties of strings in the previous tutorial, and will review those basic properties here.


### 4.1) Declaring a string

- You declare a string by enclosing the text in quotation marks.
- You can use double quotes `"` or single quotes `'`.
- You can create an empty string by putting the quotation marks right next to each other `""`.

See our earlier tutorial for more information about string declarations. 

You can also refer to https://docs.python.org/3/tutorial/introduction.html#text for more information.

In [None]:
string1 = "Hello "
string2 = 'Hello world!'
string3 = ""
string4 = "string \n with a newline"

### 4.2) Printing a string

The `print()` function produces the most readable form of the string, by omitting the enclosing quotes and by printing escaped and special characters such as newline `\n`.


In [None]:
print(string1)
print(string2)
print(string3)
print(string4)

Sometimes you may want to see exactly what is in a string. 

In a notebook cell or at an interactive Python console, you can also view the contents of a string by just putting in the variable name and executing. The output will show the string enclosed in quotes.

For strings that are empty or contain special characters, this can be more informative. 

For strings with space characters at the ends, it will also clarify where the string ends.

In [None]:
string1

In [None]:
string2

In [None]:
string3

In [None]:
string4

### 4.3) String length
- You can find out how many characters are in a string using the built-in `len()` function.
- The length includes all visible characters, spaces, and punctuation.
- Special characters such as the newline `\n` count as a single item.

In [None]:
print(len(string1))
string1

In [None]:
print(len(string2))
string2

In [None]:
print(len(string3))
string3

In [None]:
print(len(string4))
string4

### 4.4) Comparing strings 
We can compare the equality of two strings using the `==` operator.

To be equal they must be exactly the same (capital letters and spaces matter).

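Using the comparison operators `>`, `<`, `>=`, and `<=` on strings compares them character by character in alphabetical order. Uppercase letters sort before lowercase letters, so capitalization affects the result.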

In [None]:
print("a" == "a")
print("a" == "A")
print("a " == "a")

In [None]:
'apple' < 'banana'

In [None]:
'monkey' > 'gorilla'
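Capital letters can give surprising results in these comparisons, because every uppercase letter sorts before every lowercase letter. The sketch below uses made-up words and the built-in string method `lower()` to compare without regard to case.

```python
# Illustrative sketch: capitalization changes how strings are ordered.
upper_first = "Zebra" < "apple"     # uppercase 'Z' sorts before lowercase 'a'
same_case = "zebra" < "apple"       # both lowercase: plain alphabetical order
ignoring_case = "Zebra".lower() < "apple".lower()

print(upper_first)
print(same_case)
print(ignoring_case)
```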

### 4.5) Indexing for strings

As discussed at the top of this tutorial, Python uses a 0-index system. 

So the starting character is 0, not 1.

We can access a single character from a string using the string variable name followed by square brackets with the index inside the brackets as shown below.

In [None]:
word = 'Python'
print(word[0])   # Character in the 0 position.
print(word[1])   # Character in position 1.
print(word[3])   # Character in position 3.
print(word[-1])  # Last character.
print(word[-2])  # Second to last character.

When indexing strings, you cannot use a value of the index that would be outside of the bounds of the collection. 

Since we use 0 as the starting position, the last character has an index of `len(word) - 1`.

For our example of `word = 'Python'`, the length is 6 characters which is:
* positive positions 0 through 5.
* backwards -1 through -6.

Attempting to call an index outside of that range will cause an IndexError.

In [None]:
# You will get an IndexError for this one, since the word only has 6 characters.
word[8]

### 4.6) Slicing strings

In addition to indexing, slicing is also supported. 

Slicing allows you to get a section of the string (often called a substring).

With slices, we can call multiple character values by creating a range of index numbers separated by a colon `[x:y]`.

When constructing a slice, as in `[x:y]`, the first index number is where the slice starts (inclusive), and the second index number is where the slice ends (exclusive). To include the last character of the string, the end index must therefore be one past the last character's index (or be left blank). This is similar to selecting indices within the mathematical range of `x <= i < y`.


In [None]:
a = 'Python'
print(a[1:4])   # Select starting at position 1 and ending before position 4.

If either x or y is left blank, Python uses the start or the end of the string as that boundary, as shown in the examples below:

In [None]:
print(a[:4])     # Select starting at the beginning and ending before position 4.
print(a[1:])     # Select starting at position 1 and going to the end of the string.

You can also use a negative index in a slice, which can be a convenient way to get the last n characters of the string. 

In [None]:
print(a[-3:])

### 4.7) Other string methods

In Data Science, a string may contain useful information that we want to extract. 

`str.split` is a way to split a long string into smaller strings whenever a specific character appears.

The general syntax is `stringvariablename.split(delimiterstring)`.

Common delimiter strings might be:
- `','` for comma-separated strings.
- `' '` for a single space separator.
- `'_'` for underscore separators.
- If you don't specify the separator, it will split on any run of white space (single space, multiple spaces, tabs, newlines).


If you have a string saved into a variable name such as `str1` with white space, you can use `str1.split()`.

If you have a string saved into a variable name such as `txt` with commas, then it would be `txt.split(',')`.

The result is a list of substrings, and you can access the substrings using list indexing. 

In [None]:
# Here we split a string into words, assuming it is white space that separates them.
txt = "welcome to the jungle"
x = txt.split()
print(x)

In [None]:
# Here we have to specify the character separating the parts of the string is a comma.
txt = 'Moon,Earth,1737.1'
x = txt.split(',')
print(x)

In [None]:
# Then you can get one of these values with indexing.
x[1]
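Every piece returned by `split` is still a string, even when it looks like a number. A minimal sketch of converting one piece with the built-in `float()` function is shown below; the names `parts`, `number_text`, and `number` are made up for this example.

```python
# Illustrative sketch: pieces from split() are strings until they are converted.
txt = "Moon,Earth,1737.1"
parts = txt.split(",")

number_text = parts[2]          # still a string
number = float(number_text)     # converted to a float

print(type(number_text), type(number))
print(number * 2)               # arithmetic now works on the converted value
```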

## 5.0) Dictionaries

Much of this section is taken from the tutorials at: 
- https://www.dataquest.io/blog/python-dictionaries/
and
- https://realpython.com/python-dicts/

A Python dictionary is a data structure that allows us to easily write very efficient code. A Python dictionary is a collection of `key:value` pairs. You can think about them as words and their meaning in an ordinary dictionary. Values are said to be mapped to keys. For example, in a physical dictionary, the definition "science that searches for patterns in complex data using computer methods" is mapped to the key "Data Science".

Python dictionaries allow us to associate a value with a unique key, and then to quickly access this value. It's a good idea to use them whenever we want to look up a certain Python object by a name or label. We can also use lists for this purpose, but searching a list for a matching item is usually much slower than a dictionary lookup, especially for large collections.
It is best to think of a dictionary as a set of `key:value` pairs, with the requirement that the keys are unique (within one dictionary).


Dictionaries and lists share the following characteristics:

- Both are mutable.
- Both are dynamic. They can grow and shrink as needed.
- Both can be nested. A list can contain another list. A dictionary can contain another dictionary. A dictionary can also contain a list, and vice versa.

Dictionaries differ from lists primarily in how elements are accessed:
- List elements are accessed by their position in the list, via indexing.
- Dictionary elements are accessed via keys.
- Looking up a value by its key in a dictionary can be much faster than searching through a list for a matching item.
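
To make the difference in access concrete, here is a small sketch that stores the same information both ways and retrieves one entry. The names `pairs`, `lookup`, `position_from_list`, and `position_from_dict` are made up for this illustration.

```python
# The same information stored as a list of (letter, position) tuples and as a dictionary.
pairs = [('a', 1), ('b', 2), ('z', 26)]
lookup = {'a': 1, 'b': 2, 'z': 26}

# With a list, we have to walk through the items until the letter matches.
position_from_list = None
for letter, position in pairs:
    if letter == 'z':
        position_from_list = position
        break

# With a dictionary, the key takes us straight to the value.
position_from_dict = lookup['z']

print(position_from_list, position_from_dict)
```

Both approaches give the same answer, but the list version has to check items one at a time, which becomes slow when the collection is large.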




### 5.1) Creating a dictionary

There are two main ways to create a dictionary:
- With curly braces `{}`.
    - A pair of empty braces creates an empty dictionary.
    - Placing a comma-separated list of `key:value` pairs within the braces adds initial `key:value` pairs to the dictionary.
- With the `dict()` function.
    - Calling `dict()` with no arguments creates an empty dictionary.
    - Placing a comma-separated list of `key=value` assignments within the parentheses adds initial `key:value` pairs to the dictionary (the keys become strings).

We'll create two empty dictionaries to show these methods and see the data types associated with them.

In [None]:
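# Create an empty dictionary.
dictionary = {} # The curly braces method.
another_dictionary = dict() # The dict() method.

# Both methods produce the same type of (empty) object.
print('DataType ',type(dictionary), type(another_dictionary))
print('Contents ',dictionary, another_dictionary)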

You can create a dictionary and insert contents in one step using several different methods. 

In [None]:
# Create the same dictionary with pre-inserted keys/values using different methods.

# Using {} with a list of key:value pairs.
dict1 = {"key1": "value1","key2":"value2"}

# Using dict() with a list of key=value assignments.
dict2 = dict(key1="value1", key2="value2")

# Using dict() with a list of tuples.
dict3 = dict([("key1", "value1"), ("key2", "value2")])

print('dict1' , dict1)
print('dict2' , dict2)
print('dict3' , dict3)
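
Because `dict()` accepts a sequence of `(key, value)` tuples, one more pattern you may run into is building a dictionary from two separate lists by pairing them up with the built-in `zip` function. This is only a sketch; the lists `names` and `moons` and the dictionary `moon_count` are hypothetical examples.

```python
# Two parallel lists: the first holds the keys, the second holds the values.
names = ['Mercury', 'Venus', 'Earth']
moons = [0, 0, 1]

# zip pairs the items up position by position, and dict() turns the pairs into key:value entries.
moon_count = dict(zip(names, moons))

print(moon_count)
print(moon_count['Earth'])
```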

### 5.2) Accessing a Dictionary

When accessing values in a dictionary, you don't need to know their position in the dictionary, just the key.

This property also means that the order in which the `key:value` pairs are placed inside the dictionary doesn't matter when looking them up.

See the examples below.

In [None]:
# Accessing the value associated with key1.
dict1['key1']

In [None]:
# Accessing the value associated with key2.
dict1['key2']

In [None]:
# Creating a dictionary that matches the letter with the position in the alphabet.
# Note that the order in the dictionary is not alphabetical.
letters = {'a':1,"b":2, "z":26,"d":4,"c":3}
print(letters)

# Accessing a few of the entries.
print(letters['c'])
print(letters['z'])
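
Square-bracket access only works for keys that are actually in the dictionary. The sketch below shows what happens when a key is missing and how to check for a key first with the `in` operator. It uses a small stand-alone dictionary called `positions` (a name chosen just for this example).

```python
positions = {'a': 1, 'b': 2, 'c': 3}

# The in operator checks whether a key is present in the dictionary.
print('b' in positions)
print('k' in positions)

# Asking for a missing key with square brackets raises a KeyError.
try:
    positions['k']
except KeyError as err:
    missing_key = err.args[0]
    print('No entry for key:', missing_key)
```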

### 5.3) Modifying a dictionary

Dictionaries are mutable, and we can add new entries and change entries dynamically. This can be very useful when you have a program where the values you want to put into the dictionary are discovered/created at different points in the program, or when those values may change.

You can add a single `key:value` pair to the dictionary using an assignment statement with the new key.

You can add multiple `key:value` pairs using the `update` method, which accepts another dictionary or a list of `(key, value)` tuples.

In [None]:
# Adding a single new letter to the letters dictionary.
letters['y'] = 25
print(letters)

In [None]:
# Adding a list of tuples to the letters dictionary.
letters.update([("e",5),('f',6)])
print(letters)

In [None]:
# Adding a dictionary to another dictionary.
newletters = {"m":13,"p":16,"t":19}
letters.update(newletters)
print(letters)

In [None]:
# Changing a single value in a dictionary by assignment.
# We had the wrong value stored for the letter t.
letters['t'] = 20
print(letters)

In [None]:
# If a key passed to update() already exists in the dictionary,
# its value is replaced instead of the key being added a second time.
# Here 'y' is already present (and keeps the value 25), while 'i' is new.
letters.update([("y",25),("i",9)])
print(letters)
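
Dictionaries can shrink as well as grow. As a sketch of two common ways to remove a single entry, the example below uses the `del` statement and the `pop` method on a throwaway dictionary named `scratch` (so the `letters` dictionary above is left untouched).

```python
scratch = {'a': 1, 'b': 2, 'y': 25, 'z': 26}

# del removes the key:value pair with the given key.
del scratch['y']

# pop also removes the pair, and hands back the value that was stored.
removed = scratch.pop('z')

print(removed)
print(scratch)
```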

### 5.4) Limitations on Keys and Values

So far we showed examples where the `key` was a string. We also saw examples where the `value` was either a string or an integer. 

Dictionaries are much more flexible than this.

Keys:
- Must be unique in the dictionary.
- Can be strings, floats, integers, or booleans.
- Can be tuples (as long as the tuple itself contains only valid key types).
- Can NOT be lists, arrays, or dictionaries, because these objects are mutable.


Values: 
- Can be strings, floats, integers, or booleans.
- Can be lists, tuples, dictionaries, or arrays.
- Can be repeated in the dictionary.


Below are some additional examples showing dictionaries of different forms.

In [None]:
# Here is a dictionary with a mix of key types.
foo = {42: 'aaa', 2.78: 'bbb', True: 'ccc'}
print(foo)
print(foo[True])

In [None]:
# Here is a dictionary with integers for keys and strings for values.
# Notice that the value a is associated with several keys.
d = {0: 'a', 1: 'a', 2: 'a', 3: 'a'}

In [None]:
# Here is a dictionary with tuples as keys.
d = {(1, 1): 'a', (1, 2): 'b', (2, 1): 'c', (2, 2): 'd'}
print(d[(1,2)])
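
The rules for keys can be checked directly. The sketch below (with made-up names `grid` and `flags`) shows that a tuple works as a key while a list raises a `TypeError`. It also shows a consequence of the uniqueness rule: Python treats `True` and `1` as equal, so they count as the same key.

```python
grid = {}

# A tuple is allowed as a key.
grid[(1, 2)] = 'tuple key works'

# A list is not allowed as a key, so this assignment raises a TypeError.
try:
    grid[[1, 2]] = 'list key fails'
except TypeError as err:
    print('TypeError:', err)

print(grid)

# True and 1 compare as equal, so the second entry replaces the value of the first.
flags = {1: 'one', True: 'true'}
print(len(flags))
print(flags[1])
```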

### 5.5) Dictionary Functions and Methods

**Length**:
- `len(d)` will tell you how many `key:value` pairs are in dictionary d.

In [None]:
len(letters)

**Keys** and **Values**:
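- `d.keys()` returns a view of all keys in d.
- `d.values()` returns a view of all values in d (if a value is associated with more than one key, it will be repeated).
- These views behave much like lists when you loop over them; wrap them in `list()` if you need an actual list.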



In [None]:
letters.keys()

In [None]:
letters.values()
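
A common use of these methods is looping over a dictionary. The sketch below converts the keys and values to lists and then uses the related `items()` method, which gives each key together with its value. The dictionary `symbols` and the loop variable names are illustrative only.

```python
symbols = {'u': 'o', 'g': '^', 'r': 'v'}

# Convert the keys and values to ordinary lists so they can be indexed.
key_list = list(symbols.keys())
value_list = list(symbols.values())
print(key_list)
print(value_list)

# items() provides each key together with its value, which is convenient in a for loop.
for band, marker in symbols.items():
    print(band, '->', marker)
```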

**Get**:
- `d.get(<key>)` searches dictionary d for `<key>` and returns the associated value if it is found.
    - If `<key>` is not found, it returns `None` instead of raising an error.

In [None]:
print(letters.get('z'))
print(letters.get('k'))
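
The `get` method also accepts an optional second argument, which is returned in place of `None` when the key is missing. Here is a short sketch using a stand-alone dictionary called `positions`; the fallback value of 0 is an arbitrary choice for this example.

```python
positions = {'a': 1, 'z': 26}

found = positions.get('z', 0)      # The key exists, so its value is returned.
fallback = positions.get('k', 0)   # The key is missing, so the default 0 is returned.
nothing = positions.get('k')       # With no default given, None is returned.

print(found, fallback, nothing)
```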

**Clear**:
- `d.clear()` empties dictionary d of all `key:value` pairs.

In [None]:
print(dict1)

dict1.clear()
print(dict1)

### 5.6) Other Examples

The examples above were very generic. Below, we look at a few examples where dictionaries might be useful and show how they are used.

You will also notice that it is often easier to format a dictionary by splitting the lines at a comma; just make sure to indent properly.

In [None]:
# Example of a dictionary with planet properties.
Earth = {"name":"Earth",
         "mass": 5.9722E24,'mass_unit':"kg",
         'radius':6371.0,'radius_unit': "km",
         'grav':9.82,'grav_unit':"m/s^2",
         'year':365.24,'year_unit':"days"
         }
print('the mass is',Earth['mass'], Earth['mass_unit'])
print('the year is ',Earth['year'],Earth['year_unit'],'long')

In [None]:
# Example of filter band info from RSP tutorial.
# Here they first create a list of the possible filter band names.
flabel = ['u', 'g', 'r', 'i', 'z', 'y']

# Then they create dictionaries to hold colors and symbols that will be associated with each filter.
fcolor = {'u': '#56b4e9', 'g': '#008060', 'r': '#ff4000',
            'i': '#850000', 'z': '#6600cc', 'y': '#000000'}
fsymbol = {'u': 'o', 'g': '^', 'r': 'v', 'i': 's', 'z': '*', 'y': 'p'}

# Then you can get all the values for whichever filter you want.
a = flabel[1]
print(a,fcolor[a],fsymbol[a])
b = flabel[2]
print(b,fcolor[b],fsymbol[b])

In [None]:
# You can store a set of arrays in a dictionary, and you can create a list of dictionaries.

# Creating arrays to hold data.
t1 = np.array([0,1,2,3,4,5,6,7,8,9,10])
f1 = np.array([0,1,2,1,0,-1,-2,-1,0,1,2])
t2 = t1+50
f2 = f1**2

lcs = [{'t':t1,'f':f1},{'t':t2,'f':f2}]


print(lcs[1]['t'])
print(lcs[1]['f'])
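
Dictionaries can also be nested inside another dictionary, so that each entry is looked up by name rather than by position. The sketch below is one possible layout, not the only one. The names `bodies`, `props`, and `ratio` are hypothetical, and the radii are the mean radii in km used earlier in this tutorial.

```python
# A dictionary of dictionaries: the outer key is the body name,
# and the inner dictionary holds the properties of that body.
bodies = {
    'Earth': {'radius': 6371.0, 'radius_unit': 'km'},
    'Moon': {'radius': 1737.1, 'radius_unit': 'km'},
}

# Use two keys in a row: first the body, then the property.
ratio = bodies['Earth']['radius'] / bodies['Moon']['radius']
print('Earth is about', round(ratio, 2), 'times the radius of the Moon')

# Loop over every body and print its radius with units.
for name, props in bodies.items():
    print(name, props['radius'], props['radius_unit'])
```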

# Assignment



## Exercise 1)

1) Manually create a list containing the names of the eight major planets. 

2) Use the append method to add the names of at least three dwarf planets to your list. 

In [None]:
# Create a list of the names of the planets.

# Use an append to add the names of at least three dwarf planets.

# Print out the full list at the end.


## Exercise 2)

1) Use the internet to look up information on the eight major planets in our solar system.
2) Manually create a NumPy Array with the values for the mass of each major planet in units of kg.
3) Manually create a NumPy Array with the values for the radius of each major planet in units of m.
    - (You will want to enter the radii in scientific notation rather than as long integers.)
4) Using mathematical operations, create a new NumPy Array that holds the calculated values for the average density of each planet.
5) Create a 2D array with rows for the mass, radius, and density, where each column is for a planet.
6) Use slicing on the 2D array to get all of the values for the inner terrestrial planets.
7) Use slicing on the 2D array to get just the density values for the outer Jovian planets.

In [None]:
# Enter your code below.

## Exercise 3)

In this exercise, you are given a filename as a string. The filename contains information about the data stored in the file (a common practice in data science).

You are told that the filename has three parts separated by underscore characters `_`.

You need to:
- Split the string using the delimiter.
- Save the first part of the filename to a variable called `id` and convert it into an integer value.
- Save the second part of the filename to a variable called `period` and convert it into a float.
- Save the third part of the filename into a string variable called `ftype`.

In [None]:
fname = '450119293_0.522044_lc.dat'

# Put your code below.


## Exercise 4)

Create a dictionary representing the stellar characteristics of the Sun.

1) Use the internet to look up the values for the Sun.
2) Think about which values are important for an astronomer.
3) Manually create a dictionary of the values.
4) Use the dictionary values to calculate the average density of the Sun from the radius and mass.



In [None]:
# Insert your code here.
