
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>



# Collection Types and Methods

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lesson you:<br>

- Introduce objects and methods
- Create lists
- Use methods on new collection data types



## Objects

In this lesson we are first going to look at some new functionality provided by data types, and then see how we can use that in some new data types. But before we do that, we need to look at some terminology.

An [**object**](https://www.w3schools.com/python/python_classes.asp) is an instance of a specific data type. 

For example, **`1`** is an Integer, so we would call it an Integer object. **`"Hello"`** is a String, so we would call it a String object.



## Methods: More Functionality

As a reminder, data types provide **data** of some kind and **operations** we can do on that kind of data. So far, we have actually only looked at a small fraction of the operations provided by each type. 

Data types provide special functions called [**methods**](https://www.w3schools.com/python/gloss_python_object_methods.asp) which provide more functionality. Methods are exactly like normal functions except we call them on objects and they can edit the object they are called on. We invoke a method like this:

**`object.method_name(arguments)`**

This is a little tricky and we have a whole lesson on these coming up, but right now all you need to know is:

**Methods are functions provided by a data type that we can call on objects of that type. They act on the object we call them on and allow us to use more functionality provided by that data type.**
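As a small illustration of the **`object.method_name(arguments)`** pattern, the sketch below calls two built-in string methods, **`replace()`** and **`count()`**, on a String object. The variable names are made up for this example and are not used elsewhere in the lesson.

```python
# A String object (the variable name is only for illustration)
sentence = "I like pancakes"

# object.method_name(arguments): replace() takes two arguments here
new_sentence = sentence.replace("pancakes", "waffles")
print(new_sentence)

# count() takes one argument and returns how many times it appears
print(sentence.count("a"))
```

In both calls, the object comes first, then a dot, then the method name with its arguments in parentheses.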



### String Methods

Let's take a look at an example of a method on a type we already know well: Strings. Strings provide a method called [**upper()**](https://www.w3schools.com/python/ref_string_upper.asp), which returns a copy of the String with every letter converted to uppercase.

In [0]:
greeting = "hello"
print(greeting.upper())
print(greeting)

HELLO
hello




### In-place methods

Methods are functions that act on objects. A method can either perform its operation in-place (modifying the object it was called on) or return a new object.

Notice that **`upper()`** is not an in-place method: it returned a new string and left the string stored in **`greeting`** unchanged. In fact, strings in Python are immutable, so no string method can modify the original string. <a href="https://www.w3schools.com/python/python_ref_string.asp" target="_blank">W3Schools</a> provides information on other string methods in Python.
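Because a method like **`upper()`** returns a new string, its result is lost unless we store it somewhere. The sketch below, which uses a hypothetical variable named **`shout`**, shows one way to keep the result: assign the returned value to a variable.

```python
shout = "hello"

# upper() returns a new string; the original is left as it was
loud = shout.upper()
print(shout)
print(loud)

# To "update" the variable, reassign it to the returned value
shout = shout.upper()
print(shout)
```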



### Tab Completion

If you want to see a list of the methods you can call on an object, type **`.`** after the object, then press the Tab key to see a drop-down menu of the methods available on that object.

Try it below on the **`greeting`** string object. Type **`greeting.`** then hit the Tab key.

In [0]:
# Type . and hit Tab
greeting.capitalize

Out[8]: <function str.capitalize()>
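In environments where tab completion is not available, Python's built-in **`dir()`** function offers a similar listing. The sketch below is only an illustration; the variable name **`word`** is an assumption for this example.

```python
word = "hello"

# dir() returns the names of the attributes and methods available on an object.
# Names surrounded by double underscores are special methods we can ignore for now.
print(dir(word))
```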



### `help()`

While using tab completion is extremely helpful, if we use it to look through all possible methods for a given object, we might still not be certain how those methods work.

We can look up their documentation, or we can use the [**help()**](https://www.geeksforgeeks.org/help-function-in-python/) function we saw last lesson.

As a reminder, the **`help()`** function displays some of the documentation for the item passed into it.

For example, when using tab completion above, we see the [**capitalize()**](https://www.w3schools.com/python/ref_string_capitalize.asp) string method, but we are not certain how it works.

In [0]:
help(greeting.capitalize)

Help on built-in function capitalize:

capitalize() method of builtins.str instance
    Return a capitalized version of the string.
    
    More specifically, make the first character have upper case and the rest lower
    case.



In [0]:
greeting.capitalize()

Out[9]: 'Hello'



## Methods with Collection Types

Now that we have a brief understanding of methods, let's look at some more advanced data types and the methods they provide.

We are going to look at **collection data types** next. Like the name suggests, the data in these data types is a collection of other data types.



### Collection Type 1: Lists

A list is just an ordered sequence of items. 

It is defined as a sequence of comma separated items inside square brackets like this: **`[item1, item2, item3...]`**

The items may be of any type, though in practice you'll usually create lists where all of the values are of the same type.

Let's make a <a href="https://www.w3schools.com/python/python_lists.asp" target="_blank">list</a> of what everyone ate for breakfast this morning.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/20/Scrambed_eggs.jpg/1280px-Scrambed_eggs.jpg" width="20%" height="10%">

In [0]:
breakfast_list = ["pancakes", "eggs", "waffles"]
breakfast_list

Out[10]: ['pancakes', 'eggs', 'waffles']

In [0]:
# Python can tell us breakfast_list's type
type(breakfast_list)

Out[11]: list



We'll use our **`breakfast_list`** as the running example, but note that the values in a list can be of any type, as shown below.

In [0]:
# any type works
["hello", True, 1, 1.5]

Out[16]: ['hello', True, 1, 1.5]



#### List Methods

Now that we understand the **data** a list data type provides, let's look at some of its **functionality**.

Something you will frequently want to do is add a new item to an existing list. 

Lists provide a method called [**append()**](https://www.w3schools.com/python/ref_list_append.asp) to do just that. 

**`append()`** takes in an argument of any type and edits the list it is called on so that the argument is stuck onto the end of the list. 

Let's say after we ate our pancakes, eggs, and waffles, we also had yogurt.

Here, we can use **`append()`** to add yogurt to our **`breakfast_list`**.

In [0]:
breakfast_list.append("yogurt")
breakfast_list

Out[18]: ['pancakes', 'eggs', 'waffles', 'yogurt']



**Note:** Notice here that **`append()`** is an in-place method.
The method does not return a new list, but rather edits the original **`breakfast_list`** object. 
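One consequence of **`append()`** being in-place is that it returns **`None`** rather than the updated list, so assigning its result to a variable is a common mistake. A minimal sketch, using a hypothetical **`fruits`** list:

```python
fruits = ["apple", "banana"]

# append() edits the list in place and returns None
result = fruits.append("cherry")

print(fruits)   # the original list now has three items
print(result)   # None
```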

**`+`** is also defined as concatenation for lists as shown below.

In [0]:
["pancakes", "eggs"] + ["waffles", "yogurt"]



While we typically use **`append()`**, it is possible to append elements to a list using **`+`**.

In [0]:
breakfast_list = ["pancakes", "eggs", "waffles"]
breakfast_list = breakfast_list + ["yogurt"]
breakfast_list



A useful shortcut operation for this is **`+=`**.

**`breakfast_list += ["yogurt"]`** gives the same result here as **`breakfast_list = breakfast_list + ["yogurt"]`**.

The **`+=`** operator works for other types as well, using their respective **`+`** operator.

In [0]:
breakfast_list = ["pancakes", "eggs", "waffles"]
breakfast_list += ["yogurt"]
breakfast_list
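To illustrate that **`+=`** also works for other types, here is a short sketch with an Integer and a String. The variable names are made up for this example.

```python
# With Integers, += adds
count = 1
count += 2          # same result as count = count + 2
print(count)

# With Strings, += concatenates
message = "good"
message += " morning"
print(message)
```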



#### List indexing

Often, we want to reference a specific item or items in a list. This is called [list indexing](https://www.w3schools.com/python/python_lists_access.asp).

Lists provide an operation to get the item at a certain index in the list like this:

      list_name[index]

In Python, indices start from 0, so the first element of the list is at index 0, the second is at index 1, etc.

In [0]:
breakfast_list[0]

Out[19]: 'pancakes'



We can also use negative indexing, which starts counting from right to left, starting from -1. 

Thus, the last element of the list is -1, the second to last is -2, etc.

In [0]:
breakfast_list[-1]

Out[20]: 'yogurt'
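One way to think about negative indexing is that index **`-1`** refers to the same position as the length of the list minus 1. The sketch below checks this with the built-in **`len()`** function on a hypothetical **`drinks`** list.

```python
drinks = ["coffee", "tea", "juice"]

# Both expressions refer to the last element
print(drinks[-1])
print(drinks[len(drinks) - 1])

# -2 is the second to last element
print(drinks[-2])
```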



We can also provide a range of indices we want to access like this:

**`list_name[start:stop]`**

This returns a list of the values starting at **`start`** and up to but not including **`stop`**.

In [0]:
# Note the stop index is exclusive
breakfast_list[0:2]

Out[21]: ['pancakes', 'eggs']



If we don't provide a start index, Python assumes we start at the beginning.

If we don't provide a stop index, Python assumes we stop at the end.

In [0]:
print(breakfast_list[:2])
print(breakfast_list[1:])

['pancakes', 'eggs']
['eggs', 'waffles', 'yogurt']
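Slices can also use negative indices, and a slice always produces a new list rather than a view of the original. The following sketch, with a hypothetical **`drinks`** list, shows both points.

```python
drinks = ["coffee", "tea", "juice", "milk"]

# A negative start index counts from the end: this takes the last two items
last_two = drinks[-2:]
print(last_two)

# Omitting both start and stop copies the whole list
everything = drinks[:]
everything[0] = "water"

# Changing the copy does not change the original list
print(everything)
print(drinks)
```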




We can also replace the item at a given index with a new value like this:

In [0]:
print(breakfast_list)
breakfast_list[0] = "sausage"

print(breakfast_list)

['pancakes', 'eggs', 'waffles', 'yogurt']
['sausage', 'eggs', 'waffles', 'yogurt']




We can also use **`in`** to check if an element is in a given list. This is a boolean operation:

In [0]:
"waffles" in breakfast_list

Out[24]: True


####  Filtering Lists

Sometimes we need to extract specific items from a list based on certain criteria.

You can achieve this by filtering a list, which allows us to create a new list containing only the elements that meet our defined conditions.

Let's start with a list of breakfast items that we had this morning.

In [0]:
breakfast_list = ["pancakes", "eggs", "waffles", "milk", "yogurt", "bacon", "fruit", "cereal"]


Now, suppose you want to create a new list containing breakfast items made from dairy products. We'll define an empty list, **`milk_products`**, to hold the results, and a list, **`milk_items`**, of the dairy items that we want to filter for.

In [0]:
milk_products = []
milk_items = ["milk", "yogurt"]


Let's get the milk products from the breakfast we just ate.

We are going to use a loop to do this. Looping is a programming construct that we will cover in more detail in a different discussion. For now, just know that it provides a way to iterate over each item in a list.

In [0]:
for item in breakfast_list:
    if item in milk_items:
        milk_products.append(item)

print(milk_products)

['milk', 'yogurt']



This works, but it's not very concise. You can also achieve this using the built-in **`filter()`** function, which extracts the elements of a list that meet a given condition.

Now, let's get the milk products from the breakfast we just ate, but this time, we'll use **`filter()`**.

In [0]:
milk_products = list(filter(milk_items.count, breakfast_list))

print(milk_products)

['milk', 'yogurt']



In this example, **`filter()`** calls **`milk_items.count()`** on each item in **`breakfast_list`** and keeps the item only if the result is truthy. **`milk_items.count()`** returns the number of times an element is found in **`milk_items`**: a count of 1 or more is treated as **`True`**, while a count of 0 is treated as **`False`**. In other words, we keep the items that can be found in the **`milk_items`** list and exclude all the others. Because **`filter()`** returns an iterator rather than a list, we wrap it in **`list()`** to build the new list.

As you can see, we achieved the same result as the loop-based approach, obtaining a list of milk products from the breakfast list.
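The first argument to **`filter()`** can be any function that takes one item and returns a value that is treated as **`True`** or **`False`**. As an alternative sketch, the example below passes a small named function instead of **`milk_items.count`**. The names **`foods`**, **`dairy`**, and **`is_dairy`** are hypothetical and chosen only for this illustration.

```python
foods = ["pancakes", "milk", "eggs", "yogurt"]
dairy = ["milk", "yogurt"]

def is_dairy(item):
    """Return True if the item appears in the dairy list."""
    return item in dairy

# filter() keeps each item for which is_dairy(item) is True
dairy_foods = list(filter(is_dairy, foods))
print(dairy_foods)
```

A named function like this can make the filtering condition easier to read than relying on the number returned by **`count()`**.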


#### List Comprehensions 

List comprehensions are a concise pattern for creating new lists by applying an inline expression to each item in an existing list. They are often preferred for transforming or filtering list elements because they offer shorter, more compact, and readable code.

Now, we already have our breakfast list. Let's use [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp) to filter milk products from **`breakfast_list`**.

In [0]:
milk_products = [item for item in breakfast_list if item in milk_items]
print(milk_products)

['milk', 'yogurt']


You can see here that the list comprehension keeps only the items from **`breakfast_list`** that are present in the **`milk_items`** list.
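List comprehensions can transform items as well as filter them. The sketch below, using a hypothetical **`foods`** list, first applies **`upper()`** to every item and then combines the transformation with a condition.

```python
foods = ["pancakes", "eggs", "waffles"]

# Transform: apply upper() to every item
shouted = [item.upper() for item in foods]
print(shouted)

# Transform and filter: only keep items longer than 4 characters
long_names = [item.upper() for item in foods if len(item) > 4]
print(long_names)
```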



### Collection Type 2: Dictionaries

A [Dictionary](https://www.w3schools.com/python/python_dictionaries.asp) is a collection of key-value pairs. We define a dictionary as follows:

`{key_1: value_1, key_2: value_2, ...}`

The values can be of any type, and the keys can be of any hashable type (most commonly immutable types such as Strings and Integers). However, because each key maps to a value, it is important that *all keys are unique*.

Let's create a breakfast dictionary, where the keys are the type of food and the values are the number of those foods we ate for breakfast.

In [0]:
breakfast_dict = {"pancakes": 1, "eggs": 2, "waffles": 3}
breakfast_dict

Out[30]: {'pancakes': 1, 'eggs': 2, 'waffles': 3}
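To see why unique keys matter, consider what happens when the same key is written twice in a dictionary definition: Python keeps only one entry for that key, holding the last value given. A minimal sketch with a hypothetical **`order`** dictionary:

```python
# The key "toast" appears twice; only the last value is kept
order = {"toast": 1, "toast": 5}

print(order)
print(len(order))
```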



#### Dictionary Methods

Dictionaries provide the method [**dict_object.get()**](https://www.w3schools.com/python/ref_dictionary_get.asp) to get the value in the dictionary for the given argument. 

Let's see how many waffles we ate.

In [0]:
breakfast_dict.get("waffles")

Out[31]: 3



Alternatively, you can use the syntax **`dict_object[key]`**.

In [0]:
breakfast_dict["waffles"]

Out[32]: 3
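The two ways of looking up a value behave differently when the key is missing: **`get()`** returns **`None`** (or a default value, if one is passed as a second argument), while square brackets raise a **`KeyError`**. The sketch below uses a hypothetical **`menu`** dictionary and a **`try`**/**`except`** block, which this lesson does not cover, only to show the error without stopping the program.

```python
menu = {"pancakes": 1, "eggs": 2}

# get() returns None when the key is missing
print(menu.get("bacon"))

# An optional second argument is returned instead of None
print(menu.get("bacon", 0))

# Square brackets raise a KeyError when the key is missing
try:
    menu["bacon"]
except KeyError as error:
    print("KeyError:", error)
```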




You can update a dictionary similarly to a list by assigning a new value to **`breakfast_dict[key]`**.

If the key is present, it overwrites the current value. If not, it creates a new key-value pair. 

Let's say we ate another waffle, bringing our total up to 4 waffles, and then ate a yogurt.

In [0]:
print(breakfast_dict)
breakfast_dict["waffles"] += 1
breakfast_dict["yogurt"] = 1
print(breakfast_dict)

{'pancakes': 1, 'eggs': 2, 'waffles': 3}
{'pancakes': 1, 'eggs': 2, 'waffles': 4, 'yogurt': 1}




Notice the use of **`+=`** to increment the count of waffles.

**Question**: Why did we not use **`+=`** to increment the yogurt count?



In order to determine if a key is in a dictionary, we can use the method [**dict_name.keys()**](https://www.w3schools.com/python/ref_dictionary_keys.asp). This returns a view of the keys in the dictionary, which behaves much like a list of keys.

Similar to lists, we can use **`in`** to see if our key is in the dictionary. Let's see if we ate bacon for breakfast.

In [0]:
print(breakfast_dict.keys())
print("bacon" in breakfast_dict.keys())

dict_keys(['pancakes', 'eggs', 'waffles', 'yogurt'])
False
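As a shorter alternative, **`in`** can be applied to the dictionary itself, in which case it checks the keys. Dictionaries also provide **`values()`** and **`items()`** methods for looking at the values and the key-value pairs. The sketch below uses a hypothetical **`menu`** dictionary.

```python
menu = {"pancakes": 1, "eggs": 2}

# `in` on a dictionary checks its keys
print("eggs" in menu)
print("bacon" not in menu)

# values() gives the values, items() gives (key, value) pairs
print(list(menu.values()))
print(list(menu.items()))
```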



### Collection Type 3: Tuples

A tuple is an ordered sequence of items, just like a list.

[Tuples](https://www.w3schools.com/python/python_tuples.asp), unlike lists and dictionaries, are immutable, meaning they cannot be changed once they are created. We define a tuple as follows: **`(item1, item2, item3, ...)`**.

Tuples can contain items of various types, and they maintain the order of the elements.

Let's create a breakfast tuple to keep track of the items we had for breakfast.

In [0]:
breakfast_tuple = ("pancakes", "eggs", "waffles")
breakfast_tuple

Out[35]: ('pancakes', 'eggs', 'waffles')


#### Tuple Methods 

Tuples are simple and do not have many built-in methods compared to lists and dictionaries. Tuples don't have methods like **`append()`**, for example, since they are immutable. Once a tuple is created, you cannot change, add, or remove elements from it.

However, you can perform operations like indexing, slicing, and checking for the presence of an element, similar to lists. Let's first use indexing to look at the first item we had for breakfast.

In [0]:
breakfast_tuple[0]  

Out[36]: 'pancakes'


Slicing can also be used with tuples, which allows you to extract a range of elements.

In [0]:
breakfast_tuple[1:3]

Out[37]: ('eggs', 'waffles')


You can use the **`in`** operator to check if an element exists in a tuple. Let's see if we had pancakes for breakfast.

In [0]:
print("pancakes" in breakfast_tuple)

True
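To round out the tuple operations, the sketch below shows the two methods tuples do provide, **`count()`** and **`index()`**, and what happens when we try to change an element. The **`meal`** tuple is hypothetical, and the **`try`**/**`except`** block, which this lesson does not cover, is used only to show the error without stopping the program.

```python
meal = ("pancakes", "eggs", "pancakes")

# count() returns how many times a value appears
print(meal.count("pancakes"))

# index() returns the position of the first match
print(meal.index("eggs"))

# Assigning to an index raises a TypeError because tuples are immutable
try:
    meal[0] = "sausage"
except TypeError as error:
    print("TypeError:", error)

# The tuple is unchanged
print(meal)
```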


&copy; 2023 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="https://help.databricks.com/">Support</a>