# Python Fundamentals: Data Types and Structures

* * * 

<div class="alert alert-success">  
    
### Learning Objectives 
    
* Distinguish between the main data types in Python.
* Distinguish between the main data structures in Python.     
* Use methods on different data types.
* Recognize arguments when using methods.
</div>


### Icons Used in This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
⚠️ **Warning:** Heads-up about tricky stuff or common mistakes.<br>

### Sections
1. [Data Types in Python](#dtypes) 
2. [Functions](#func)
3. [Methods](#methods)
4. [Lists: Ordered Data Structures](#lists)
5. [Dictionaries: Key-Value Pairs](#dicts)

<a id='dtypes'></a>

# Data Types in Python

**Data types** are categories of data items. Programming languages need separate data types, because operations or functions might behave differently depending on what data types they're working with. 

There are a variety of data types in Python (see the picture below). Some of the most important types include integers, floats, strings, and Booleans. 

There are also data types such as lists and dictionaries. They offer ways of organizing data and are sometimes called **data structures**.

<img src="../img/Python-data-structure.jpeg" alt="data_types" width="700"/>

We use the `type()` **function** to identify the type of a variable. Functions are signified by the parentheses that follow them, which contain any inputs to the function.

🔔 **Question:** Let's check the types of two variables below. What do you think the type for each variable will be?

In [None]:
life_exp = 28.021
type(life_exp)

In [None]:
continent = 'Asia'
type(continent)

Here are some of the most common data types you'll encounter while using Python (and programming languages in general):

* **int**: Integers (e.g., `a = 2`).
* **float**: Decimal numbers (e.g., `a = 2.01`).
* **str**: Strings, which denote text (e.g., `a = "2"` or `a = '2'`).
* **bool**: Booleans, which are either `True` or `False` (e.g., `a = True`).

Operations and functions work differently for different types. For example, subtraction works with numeric types like floats, but not with strings.

In [None]:
# Subtraction with floats
life_exp - 2.0

In [None]:
# Subtraction with strings?
continent - 2

In contrast, addition works for both strings and numbers:

In [None]:
# Addition with floats
life_exp + 2.0

In [None]:
# Addition with strings
'South-' + continent
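The `+` operator does different things depending on the type: it adds numbers, but it joins (concatenates) strings. A minimal sketch, using illustrative variable names that are not part of this notebook:

```python
# With numbers, + performs arithmetic addition
number_sum = 3 + 3
print(number_sum)   # 6

# With strings, + joins the two pieces of text together
text_sum = '3' + '3'
print(text_sum)     # 33
```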

### 💡 Tip: Guidelines for Variable Names

- Python is case-sensitive (`life_exp` and `Life_exp` are two separate variables).
- Use meaningful variable names (e.g. `continent` is more informative than `a_variable`). Ideally, you should be able to tell what is going on in the code and variables without having to run it.
- There are different styles of writing variables, like **snake case** (`life_exp`) and **camel case** (`lifeExp`). You're free to choose, but be consistent. 
- Don't reuse the names of Python's built-in functions and types as variable names (e.g., `print`, `sum`, `str`). Doing so hides the original function for the rest of your session.
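To illustrate the first and third guidelines, here is a small sketch. The variable names and values are made up for this example:

```python
# Names that differ only in capitalization are separate variables
pop_total = 8425333
Pop_total = 9240934
print(pop_total == Pop_total)   # False

# The same idea written in two naming styles; pick one and stick with it
gdp_per_cap = 779.45   # snake case
gdpPerCap = 779.45     # camel case
```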

## Type Conversion

Types can get confusing. For instance, we can write a number as either an integer or a string. Python treats these differently, even if to us the value is the same:

In [None]:
a = '3'
b = 3

b - a

Even though our intention is to do numeric subtraction, the type of `a` is a string, which results in an error. 

🔔 **Question:** Let's check the type of each variable using `type()`. What do you predict the type of each variable will be?

In [None]:
type(a)

In [None]:
type(b)

As we can predict from the line where we assigned the variable, `a` is a string. If we convert it to an integer, the operation will work. 

We can do this with **type conversion**. The `int()` function will convert the input to an integer:

In [None]:
int(a)

In [None]:
type(int(a))

In [None]:
b - int(a)

There are other type conversion functions.

- `str()` converts a variable to a string.
- `float()` converts a variable to a float. 
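These two functions can be sketched as follows. The variable names and values here are illustrative only:

```python
year = 1952

year_as_text = str(year)      # int -> str
print(year_as_text)           # 1952
print(type(year_as_text))     # <class 'str'>

year_as_float = float(year)   # int -> float
print(year_as_float)          # 1952.0

gdp = float('779.45')         # str -> float
print(gdp)                    # 779.45
```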

If the value cannot be converted to that type, the function will raise a `ValueError`. Run the cell below.

In [None]:
int('Netherlands')

In the above case, the error means that **non-numeric characters** cannot be interpreted as a number.

## 🥊 Challenge 1: String to Integer

Try converting `pi` to an int type. Do you run into an error? How do you fix it?

💡 **Tip**: consider using multiple conversion functions.

In [None]:
pi = '3.14'

# YOUR CODE HERE


<a id='func'></a>

# Functions

A **function** is a reusable block of code that performs a specific task. Functions allow us to run the same operation many times without rewriting the code each time. 

Functions can be recognized by their trailing parentheses `()`. The data you want to apply the function to goes inside those parentheses!
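As a quick sketch, here are two of Python's built-in functions, `round()` and `max()`, being called. The variable names are chosen just for this example:

```python
life_exp_value = 28.021

# round() takes the number and how many decimal digits to keep
rounded = round(life_exp_value, 1)
print(rounded)    # 28.0

# max() returns the largest of its inputs
largest = max(3, 7, 5)
print(largest)    # 7
```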

## 🥊 Challenge 2: Using a Function

Google "Python len()" and find a resource that tells you how to use it. Come back here and use `len()` on the variable we have set below.

In [None]:
country = 'Zimbabwe'

# YOUR CODE HERE



<a id='methods'></a>

# Methods

A **method** is a special type of function: one that executes on a **particular type of object**, like a string or an integer. They allow you to do different things with different objects.

For instance, we can use a method to turn a string variable into lowercase or uppercase. The lowercase and uppercase methods don't exist for `int`s, though. That's why we call them methods instead of functions.

You can access methods with **dot notation**. It looks like this: `variable.method()`

Let's look at the built-in method [`upper()`](https://python-reference.readthedocs.io/en/latest/docs/str/upper.html), which can be applied to strings:

In [None]:
country = 'Greece'
country.upper()

🔔 **Question**: What do you think the cell below does?

In [None]:
country.lower()

Note that you can run methods on variables that hold a data value, or on the data values directly!
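For example, the sketch below calls `upper()` once on a variable and once on a string value directly. The variable `city` is made up for this illustration. Note that the method returns a new string and leaves the original unchanged:

```python
city = 'Athens'

# Method called on a variable
upper_from_variable = city.upper()
print(upper_from_variable)   # ATHENS

# The same method called directly on a string value
upper_from_value = 'Athens'.upper()
print(upper_from_value)      # ATHENS

# The original variable still holds the original text
print(city)                  # Athens
```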

## 🥊 Challenge 3: Chaining Methods

Methods can be **chained** in a single line. This is fine, as long as the output of one method directly feeds into the input of the next. These lines can be read sequentially left to right. 

Don't run the next code cell yet! Use your search engine to look up the two methods that `country` goes through first. What do you think the final output will be?

In [None]:
country.lower().startswith('g')

## Adding Arguments

Notice that in the previous cell, `lower()` doesn't take any values in between the parentheses, but `startswith()` does.

Methods (and functions in general) can often take values in order to alter their behavior. These values, which go in between the parentheses, are called **arguments**.
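Here is a short sketch of string methods that take zero, one, and two arguments. It uses the string methods `count()` and `replace()`, which this notebook does not otherwise cover, and an illustrative variable called `word`:

```python
word = 'Greece'

# No argument: upper() needs nothing extra to do its job
print(word.upper())    # GREECE

# One argument: count() needs to know what to count
e_count = word.count('e')
print(e_count)         # 3

# Two arguments: replace() needs the old text and the new text
replaced = word.replace('G', 'g')
print(replaced)        # greece
```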

## 🥊 Challenge 4: Time to Split

First, let's save a string in a variable.

In [None]:
sentence = 'The capital of Brazil is Brasília. It has a tropical savanna climate.'

Let's look at the [documentation](https://docs.python.org/3/library/stdtypes.html#str.split) for the `.split()` method. Try to use this method on `sentence`. What does it look like this method does?

In [None]:
# YOUR CODE HERE


Finally, try using `sep='.'` in between the parentheses of `.split()`, when applying it to `sentence`. What is the output?

In [None]:
# YOUR CODE HERE


<a id='lists'></a>

# Lists: Ordered Data Structures

A **data structure** is a specialized format for organizing, processing, retrieving and storing data.

Lists are a collection of **ordered** items. Lists have a length, and the items inside can be **indexed**, or accessed based on their positions.

A list is an **iterable**: an object with multiple values that can be iterated through. Specifically, we can proceed through each value of the list, one by one. Other examples of iterables in Python are tuples, and even strings.

One nice thing about lists is that they can contain different types of data. For example, the entries of a list can be integers, floats, strings, and even other lists!

We specify a list with square brackets: `[]` and commas separating each entry in the list.

In [None]:
country_list = ['Ethiopia', 'Canada', 'Thailand', 'Denmark', 'Japan']
type(country_list)
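To make the ideas of mixed types and iteration concrete, here is a sketch with a made-up list, `mixed_list`. It uses a `for` loop, which this notebook has not introduced, only to show what "proceeding through each value, one by one" looks like:

```python
# A list holding an integer, a float, a string, and another list
mixed_list = [1952, 28.801, 'Asia', ['Kabul', 'Herat']]

# Visit each item in order and show its type
for item in mixed_list:
    print(item, type(item))
# 1952 <class 'int'>
# 28.801 <class 'float'>
# Asia <class 'str'>
# ['Kabul', 'Herat'] <class 'list'>
```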

🔔 **Question:** `len()` gives the number of items in a list. What is the output of the line below?

In [None]:
len(country_list)

## Indexing and Slicing Lists

Let's say we only want one element, or a portion of a list. We can do so by telling Python which elements we want (e.g., we want the first, second, and third entries). This is called **indexing** the list.

🔔 **Question:** Look at the index we create for `country_list` below. What do you think will be printed?

In [None]:
country_list[1]

⚠️ **Warning:** Python is **zero**-indexed, meaning the first entry has index zero, not one!

Getting multiple items from a list can be done with **slicing**. We write square brackets after the list name and put the start index and the stop index inside, separated by a colon: `some_list[start:stop]`. Note we're using square brackets instead of parentheses! 

The slice includes every entry from `start` up to, but not including, `stop`, so the item at `some_list[stop]` is left out. If one side of the colon is empty, the slice runs to that end of the list: an empty `start` begins at the first entry, and an empty `stop` continues through the last entry. 
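The rules for indexing and slicing can be sketched with a small list of letters, made up for this illustration:

```python
letters = ['a', 'b', 'c', 'd', 'e']

first = letters[0]         # index 0 is the first item
print(first)               # a

middle = letters[1:3]      # items at index 1 and 2; index 3 is excluded
print(middle)              # ['b', 'c']

start_open = letters[:2]   # empty start: begin at the first item
print(start_open)          # ['a', 'b']

end_open = letters[3:]     # empty stop: continue through the last item
print(end_open)            # ['d', 'e']
```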

🔔 **Question:** Can you guess what the output of these statements will be?

In [None]:
country_list[1:3]

In [None]:
country_list[2:5]

In [None]:
country_list[3:]

## 🥊 Challenge 5: Slicing Lists

Say we have a list and we want to get rid of the string values. Slice the list so that only the numeric elements remain.

In [None]:
years = [1990, 1994, 2002, 'Missing', 'NaN']

# YOUR CODE HERE


## List Methods

Recall that methods are functions that operate specifically on objects with a particular data type. They are accessed via dot notation (`object.method()`). 

Lists have their own methods that perform operations specific to lists. One of the most commonly used is `append()`, which adds an item to the end of a list. 

The code below adds a country to `country_list` using `append()`:

In [None]:
print(country_list)

In [None]:
country_list.append('USA')

In [None]:
print(country_list)
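One detail worth noticing: `append()` changes the existing list in place and returns `None`, so there is no need to assign its result to a variable. A small sketch with a made-up list, `years_seen`:

```python
years_seen = [1990, 1994]

# append() modifies the list itself and hands back None
result = years_seen.append(2002)

print(years_seen)   # [1990, 1994, 2002]
print(result)       # None
```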

<a id='dicts'></a>

# Dictionaries: Key-Value Pairs

Dictionaries are organized in pairs of keys and values. The **keys** can be used to access the **values**. They're most useful when you have unordered data organized in pairs. This occurs, for example, in storing metadata (data describing other data).

Dictionaries are specified in Python using curly braces. **Colons separate the keys and values**. 

Let's take a look at an example dictionary:

In [None]:
# An example dictionary
example_dict = {
    'country': 'Afghanistan',
    'year': 1952,
    'population': 8425333}

We can access a value in a dictionary by putting its key inside square brackets after the dictionary name. 

In [None]:
example_dict['year']
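The same square-bracket notation can also be used to add a new key-value pair. Here is a sketch with a made-up dictionary, `city_info`:

```python
city_info = {'city': 'Athens', 'country': 'Greece'}

# Look up a value by its key
capital = city_info['city']
print(capital)     # Athens

# Assigning to a key that doesn't exist yet adds a new pair
city_info['continent'] = 'Europe'
print(city_info)   # {'city': 'Athens', 'country': 'Greece', 'continent': 'Europe'}
```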

## 🥊 Challenge 6: Creating a Dictionary

Let's create a dictionary called `country_dict` whose values are lists.

1) Use the variables we're defining below - `country`, `continent` and `life_exp` - as **values**. 
2) Choose an appropriately named string for each of the **keys**.
3) Finally, print the keys in the dictionary.

In [None]:
country = ['Afghanistan', 'Greece', 'Liberia']
continent = ['Asia', 'Europe', 'Africa']
life_exp = [28.801, 76.670, 46.027]

# YOUR CODE HERE
country_dict = ...

<div class="alert alert-success">

## ❗ Key Points

* Methods are functions that only work on certain data types.
* Lists are a collection of ordered items, which can contain different data types.
* List indices start at 0, not 1.
* The `.append()` method adds an item to a list.
* Lists can be indexed using square brackets - e.g. `some_list[0]` indexes the first item of `some_list`. 
* Dictionaries are mappings of key-value pairs. 
* Dictionary values can be accessed using square brackets – e.g. `some_dict['name']` accesses the value corresponding to the 'name' key.
    
</div>