# Python Foundations: Data types and structures

Python syntax allows its users to quickly define a range of different data types and structures. At an abstract level, each object in Python will have a data type, and they can be grouped into data structures.

Each data type/structure comes with its own selection of useful features and "methods", which makes choosing a data type an important thing to consider!

It is also possible to convert an object from one type to another. Sometimes you may be retrieving data from an external source and not necessarily know how the data types have been defined. If you need to check a data type in Python there's a very easy way to do so: the `type()` function. The following code block prints the type of each kind of object covered in the notebook.

In [1]:
print(type('a string'))
print(type(10))
print(type(10.01))
print(type(True))
print(type(['a string','another string']))
print(type({"key":"value"}))
print(type({1,2,3}))
print(type((1,2,3)))

<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>
<class 'list'>
<class 'dict'>
<class 'set'>
<class 'tuple'>


**N.B.** All the types returned are wrapped in `<class '...'>`. That's because under the hood in Python, every object is an instance of a 'class'. For the moment it doesn't matter exactly what that means.

## Strings

Strings in Python are defined by surrounding characters with either `'` or `"`. Both can be used throughout a script.

However, you cannot mix them as part of the same string e.g. both `"a string"` and `'a string'` are valid whereas `"a string'` is not.
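
The quoting rule above can be sketched as follows. The variable names are made up for illustration; the point is that one quote character may appear inside a string delimited by the other, or be escaped with a backslash.

```python
# Hypothetical examples of quote characters inside strings
quote_in_double = "It's a string"        # apostrophe inside double quotes
quote_in_single = 'She said "hello"'     # double quotes inside single quotes
escaped = 'It\'s a string'               # a backslash escapes the matching quote

print(quote_in_double)             # It's a string
print(quote_in_single)             # She said "hello"
print(quote_in_double == escaped)  # True: both hold the same characters
```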

Strings have their own methods that can be used to manipulate and interrogate them: [https://www.w3schools.com/python/python_ref_string.asp](https://www.w3schools.com/python/python_ref_string.asp)
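
As a brief illustration of calling string methods, here is a minimal sketch. The variable `raw_name` and its value are invented for this example; `strip`, `upper`, `split` and `startswith` are standard `str` methods.

```python
# A hypothetical string with stray whitespace around it
raw_name = "  ada lovelace  "

clean_name = raw_name.strip()                   # remove leading/trailing whitespace
shouted = clean_name.upper()                    # returns a new, upper-case string
words = clean_name.split(" ")                   # split into a list of words
starts_with_ada = clean_name.startswith("ada")  # True or False

print(clean_name)       # ada lovelace
print(shouted)          # ADA LOVELACE
print(words)            # ['ada', 'lovelace']
print(starts_with_ada)  # True
```

Note that each method returns a new value; `raw_name` itself is left unchanged.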

As well as built-in methods, Python also provides some more options to manipulate strings e.g. mathematical operators

In [4]:
concatenated_string = 'string one' + 'string 2'
print(concatenated_string)

daft_punk_lyric = "Around the world \n" # \n indicates a newline
daft_punk_song = daft_punk_lyric * 5     
print(daft_punk_song)

string onestring 2
Around the world 
Around the world 
Around the world 
Around the world 
Around the world 



### String exercises

Using the string methods listed here [https://www.w3schools.com/python/python_ref_string.asp](https://www.w3schools.com/python/python_ref_string.asp) or mathematical operators, come up with solutions to the following:

1. Capitalise the **first** word in the following string `"the man who stole the world"`
2. Capitalise **each** word in the following string `"the man who stole the world"`
3. Count the number of times "a" appears in the string `"I am the eggman. They are the eggmen. I am the walrus"`
4. Count the number of times "a" appears in `"So, bye, bye Miss American pie"` ensuring that both lowercase and uppercase instances are counted.
5. In the following string replace `"<your_name>"` with your name and store it in a variable called `all_work`: `"All work and no play makes <your_name> a dull coder"`
6. Create a variable called `no_play` that repeats the `all_work` variable ten times.
7. What is the **len**gth of `no_play`? 

## Numbers: ints & floats

Integers and floats are two of the simplest and most commonly used number data types. Although they have some characteristics unique to each of them, they are also compatible in many ways.

  - Integers are whole numbers and can be positive or negative
  - Floating points are numbers with decimal points

### Casting

Casting allows you to change the data type of an object. For example, a number may have been read from a spreadsheet as a string and you want to do some maths on it. To cast an object use the type name (e.g. float, int, list etc) and pass the object into parentheses afterwards e.g. `int(7.0)`
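
The spreadsheet scenario above might look like the following sketch. The variable names and values are hypothetical and chosen only to show a string being cast before doing arithmetic.

```python
# Hypothetical values as they might arrive from a spreadsheet: both are strings
price_text = "19.99"
quantity_text = "3"

price = float(price_text)      # str -> float
quantity = int(quantity_text)  # str -> int

total = price * quantity
print(type(price_text), type(price))  # <class 'str'> <class 'float'>
print(round(total, 2))                # 59.97
```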

### Number exercises

1. Add the following together:
    - `7 + 3`
    - `6.7 + 3.3`
    - `7.0 + 3`
    - `"7" + "3"`
    - what do you observe?
2. Try the following casts
    - `int("7")`
    - `int("Seven")`
    - `int(7.4)`
    - `int(7.7)`
    - `float("7")`
    - `float(7)`
    - `str(7.34)`
    - `str(7)`
    - is there anything that didn't behave as you might expect?

## Booleans

A Boolean value is always either `True` or `False` (note the capitalisation). In Python, comparison *expressions* evaluate to a Boolean.

e.g. `10 > 5` is `True` and `5 > 10` is `False`

If you would like to check whether two values are equal, use `==`; to check that they are not equal, use `!=`.

e.g. `10 == (5 + 5)` is `True` and `(2 + 2) != 5` is `True`

### `and`, `or` and `not`

Python uses the `and` and `or` logical operators to combine conditional statements. 

- `and` will return `True` if both statements are true e.g.  `10 == (5 + 5) and (2 + 2) != 5`
- `or` will return `True` if either statement is true e.g. `10 == (5 + 5) or (2 + 2) == 5`

The `not` operator is used to reverse the Boolean value e.g. `not(10 == (5 + 5) or (2 + 2) == 5)` will return `False`
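
The three operators can be combined with named values, as in this small sketch. The variables `temperature` and `is_raining` are hypothetical and exist only for illustration.

```python
# Hypothetical values for illustration
temperature = 22
is_raining = False

is_warm = temperature > 18                    # comparison gives a Boolean
good_for_walk = is_warm and not is_raining    # both conditions must hold
stay_inside = is_raining or temperature < 5   # either condition is enough

print(is_warm, good_for_walk, stay_inside)    # True True False
```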

### "Truthy" and "falsey" data types

By using `bool()` to cast other data types to a Boolean, you can check whether any value counts as true or false. Most values are *truthy* e.g. `bool("My string")` will return `True`, whereas `bool("")` returns `False`

Values that evaluate as `False` or *falsey values* include the following:

- String: `""`
- Int: `0`
- Float: `0.0`
- List: `[]`
- Dictionary: `{}`
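
A quick way to see the pattern in the list above is to pass empty and non-empty values to `bool()`. The values below are hypothetical examples; note that `[0]` is truthy because the list itself is not empty.

```python
# Hypothetical values chosen to contrast empty and non-empty objects
values = ["", "text", 0, 42, 0.0, [], [0], {}, {"key": "value"}]

for value in values:
    # repr() shows quotes and brackets so that empty values are visible
    print(repr(value), "->", bool(value))
```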

### Boolean exercises

What Boolean values would you expect the following to evaluate to?
1. `10 > 100`
2. `10 >= (5 + 5)`
3. `10.0 == "10"`
4. `10.0 == 10`
5. `"dog" != "cat"`
6. `(4 < 5) and (5 < 6)`
7. `(4 < 5) and (9 < 6)`
8. `(1 == 2) or (2 == 2)`
9. `not (5 > 4)`
10. `(not bool(0)) or (not bool("Zero"))`

Hint: use the code block below to check and interrogate the answers

## Lists

In Python, lists are the imaginative name for lists of values. Here are their key characteristics:

- they are defined by square brackets e.g. `my_list = [1,2,3,4]`
- they are **ordered** and therefore indexed
- they can hold duplicate values
- they are mutable (i.e. they can be changed)
- they can contain all other data types, including other lists e.g. `another_list = ['1','2','3',1,2,3,[1,2,3]]`

### List indexes

Because lists are ordered it is possible to access items by using their index. The index begins at 0 and increments by 1 for each list item. You can access an item by appending square brackets to the list and passing an integer index.

You can also use negative integers to retrieve items based on their position relative to the end of the list i.e. `-1` is the final entry, `-2` is the penultimate etc.

Indexes also allow you to *slice* lists, or retrieve parts of the list by using `[:]`

Here's an example:

In [18]:
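animals = ["Monkey", "Bear", "Tiger", "Fox", "Giraffe", "Penguin"]
animals[0]   # 'Monkey' (the first item)
animals[1]   # 'Bear'
animals[-1]  # 'Penguin' (the last item)

animals[1:3] # ['Bear', 'Tiger']; note that the second integer is not inclusive
animals[1:]  # everything from index 1 onwards
animals[:-1] # everything except the last item (only this final result is displayed)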

['Monkey', 'Bear', 'Tiger', 'Fox', 'Giraffe']
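
Indexes can also be used to change a list in place and to reach into nested lists. The sketch below uses a hypothetical list called `mixed` to show both.

```python
# Hypothetical list mixing several data types, including a nested list
mixed = ["a", 1, 2.5, True, [10, 20, 30]]

mixed[0] = "z"       # lists are mutable: replace the first item in place
nested = mixed[4]    # the fifth item is itself a list

print(mixed)         # ['z', 1, 2.5, True, [10, 20, 30]]
print(nested[1])     # 20
print(mixed[4][1])   # 20 (the same lookup in one step)
print(2.5 in mixed)  # True: membership test
```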

### List exercises

Lists have a great deal of useful methods that can be used to update and manipulate them, and these should be useful when tackling the following questions. Read about them here: [https://www.w3schools.com/python/python_lists_methods.asp](https://www.w3schools.com/python/python_lists_methods.asp)

1. Create a list, with a minimum of 5 items in it.
2. *append* a new item to the list
3. *insert* a duplicate of the second element at the fourth index in the list
4. *count* how many times the second element appears in the list
5. Create a variable to store the 3rd, 4th and 5th elements
6. Extend the original list with the items in the variable created for 5.
7. What is the first item in the list alphabetically? 
8. What is the **len**gth of the list?
9. *append* a new item to the list with the value `""`
10. Remove the item from the list that has the value `""`

**n.b.** Strings are also indexed and can be sliced and diced in similar ways to lists e.g.

In [21]:
my_str = 'this is a str'
my_str[0]
my_str[1:5]


'his '

## Dictionaries

Dictionaries are used to store data values in *key:value* pairs. They are:

- ordered by insertion ("interestingly", they used to be unordered; the ordering is guaranteed from Python 3.7)
- mutable/changeable
- do not allow duplicate keys

They are written using curly brackets e.g.

In [25]:
my_dict = {
    "f_name":"Sotirios",
    "s_name":"Alpanis",
    "hair_style": False,
    "job_title": "Senior Data Engineer"
}
print(my_dict.keys())
print(my_dict.values())

dict_keys(['f_name', 's_name', 'hair_style', 'job_title'])
dict_values(['Sotirios', 'Alpanis', False, 'Senior Data Engineer'])


There are different methods for accessing values in a dictionary, but the most straightforward are to either use the key name in square brackets, or the `get()` method e.g.

In [24]:
my_dict["f_name"] # Note: if the key doesn't exist a KeyError is raised
my_dict.get("f_name") # Note: if the key doesn't exist it returns None

'Sotirios'
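
The difference between the two access styles shows up when a key is missing. The following is a minimal sketch using a hypothetical dictionary called `person`; it also shows the optional default argument of `get()` and the `in` operator for checking keys.

```python
# Hypothetical dictionary used only for this illustration
person = {"f_name": "Ada", "job_title": "Analyst"}

print(person.get("f_name"))             # Ada
print(person.get("s_name"))             # None: the key is missing
print(person.get("s_name", "unknown"))  # unknown: an optional default value
print("s_name" in person)               # False: check for a key before using it

try:
    person["s_name"]                    # square brackets raise KeyError
except KeyError as error:
    print("Missing key:", error)        # Missing key: 's_name'
```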

### Dictionary exercises

Here are some potentially useful dictionary methods [https://www.w3schools.com/python/python_dictionaries_methods.asp](https://www.w3schools.com/python/python_dictionaries_methods.asp)

1. Create a dictionary with the following keys (and add your own values): `first_name` (str), `lucky_number` (int), `pets` (list)
2. Add a new key/value pair to your dictionary for `surname`
3. `print()` the first name and surname together from your dictionary
4. Congratulations! You just got a new pet lobster called `"Mr Pinchy"`, update the `pets` list to include him.
5. Create a variable and store an alphabetic list of the keys in it.  


## Sets

Sets are one of the less common data collections, but they can be very useful. Their characteristics are:

- unordered and not indexed
- stored items are immutable (although you can add/remove items)
- they do not allow duplicates

They are written with curly brackets

e.g. `animal_set = {"cat","dog","lobster"}`
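
The characteristics above can be seen with a few standard set methods. This sketch reuses the `animal_set` example; the extra item `"parrot"` is made up for illustration.

```python
animal_set = {"cat", "dog", "lobster"}

animal_set.add("dog")      # already present, so nothing changes
animal_set.add("parrot")   # a new item is added
animal_set.discard("cat")  # remove an item (no error if it is absent)

print(len(animal_set))         # 3
print("parrot" in animal_set)  # True
print("cat" in animal_set)     # False
print(sorted(animal_set))      # ['dog', 'lobster', 'parrot']
```

Because sets are unordered, `sorted()` is used here to get a predictable order for printing.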

One of the things that makes sets so useful is their associated methods which allow for quick comparisons of sets e.g.

In [34]:
curry_ingredients = {"ginger", "garlic", "tomatoes", "chicken", "coriander", "cumin", "salt", "pepper","turmeric"}

pasta_sauce_ingredients = {"tomatoes","garlic","basil", "salt", "pepper", "oregano", "eggplant"}

curry_ingredients.difference(pasta_sauce_ingredients)  # items only in the curry
curry_ingredients - pasta_sauce_ingredients            # operator form of the same comparison

pasta_sauce_ingredients.difference(curry_ingredients)  # items only in the pasta sauce
pasta_sauce_ingredients - curry_ingredients

curry_ingredients.intersection(pasta_sauce_ingredients)  # items in both sets

shopping_list = curry_ingredients.union(pasta_sauce_ingredients)  # items in either set
shopping_list

common_ingredients = {"salt", "pepper", "garlic"}
curry_ingredients.issuperset(common_ingredients)  # True
common_ingredients.issubset(curry_ingredients)    # True (only this final result is displayed)


True

Another useful trick/tip is to use casting to remove duplicates from lists e.g. 

In [36]:
my_list = [1,1,2,2,2,4,1,5,7,8,3,4,10]
my_set = set(my_list)
len(my_set)

8
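
If you need a list again after removing the duplicates, one option is to cast back. Sets are unordered, so this sketch uses `sorted()`, which returns a new list in ascending order; the name `unique_sorted` is just an example.

```python
my_list = [1, 1, 2, 2, 2, 4, 1, 5, 7, 8, 3, 4, 10]

# set() drops the duplicates; sorted() returns a new list in ascending order
unique_sorted = sorted(set(my_list))

print(unique_sorted)       # [1, 2, 3, 4, 5, 7, 8, 10]
print(len(unique_sorted))  # 8
```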

## Tuples

Tuples are pretty uncommon, but every now and then you might encounter one so it's good to know how they behave.

Here are their characteristics:

- they are ordered (and therefore indexed)
- they're immutable or unchangeable
- they're defined using `()`

Because they're immutable, they tend to be used to store information that should not change once it has been created e.g. passport numbers.

Another consequence of their immutability is that they cannot be directly updated. The indirect method is to convert to and from a list and add/remove/update items in between e.g.

In [38]:
fruit_tuple = ("apple","banana","blueberry")
print(fruit_tuple[-1])

fruit_list = list(fruit_tuple)
fruit_list.append("cherry")
fruit_tuple = tuple(fruit_list)
print(fruit_tuple[-1])

blueberry
cherry
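
To see what "cannot be directly updated" means in practice, the sketch below tries to assign to a tuple item and catches the resulting error. The tuple `colour_tuple` is a hypothetical example; it also shows that concatenation builds a new tuple rather than changing the original.

```python
# Hypothetical tuple used only for this illustration
colour_tuple = ("red", "green", "blue")

try:
    colour_tuple[0] = "orange"   # tuples do not support item assignment
except TypeError as error:
    print("Cannot update a tuple in place:", error)

# Concatenation builds a new tuple instead of changing the old one
longer_tuple = colour_tuple + ("yellow",)
print(colour_tuple)   # ('red', 'green', 'blue')
print(longer_tuple)   # ('red', 'green', 'blue', 'yellow')
```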


### Unpack a tuple

Another feature often used with tuples is "unpacking", which assigns each item to its own variable (it also works with other sequences such as lists). This is fairly commonly used when working with dictionaries e.g.

In [41]:
# Unpacking a tuple
(green, yellow, *berries) = fruit_tuple
print(green)
print(yellow)
print(berries)

# In a dictionary context
my_dict = {
    "f_name":"Sotirios",
    "s_name":"Alpanis",
    "hair_style": False,
    "job_title": "Senior Data Engineer"
}

for key, value in my_dict.items(): # the items() method returns a view of (key, value) tuples
    print(key)
    print(value)
    print("")

apple
banana
['blueberry', 'cherry']
f_name
Sotirios

s_name
Alpanis

hair_style
False

job_title
Senior Data Engineer
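
A few more unpacking patterns, as a small sketch. The names `point`, `first` and `rest` are hypothetical and used only for illustration.

```python
# Hypothetical (x, y) coordinate stored as a tuple
point = (3, 4)
x, y = point          # the brackets around the names are optional
print(x, y)           # 3 4

# Unpacking also works with other sequences such as lists
first, *rest = [10, 20, 30]
print(first)          # 10
print(rest)           # [20, 30]

# A common idiom: swap two variables without a temporary one
x, y = y, x
print(x, y)           # 4 3
```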

