[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FCNepomuceno/Python_Course/blob/main/02_Variables_Data_Types_and_Structures.ipynb)


# Variables & Data Types & Data Structures

## Overview

Four data types will be relevant to us: ints, floats, booleans, and strings.
Four main data structures will also be relevant to us: lists, sets, tuples, and dictionaries.

#### Ints and floats:

Ints and floats are the two main numerical data types in Python. Ints are integers, while floats are numbers with decimal values.

In [None]:
# An int
4

# Another int
-17

# A float
3.9

# Another float (if you run this cell, the output will be -17.0, because only the final line's output is written)
-17.0

You can perform mathematical operations on ints and floats. Try running the following:

In [None]:
4 + 3

7

In [None]:
16.8 - 14.3

2.5

In [None]:
# It's okay to mix ints and floats
6.5 * 5 

32.5

In [None]:
# An example with numerical variables
numerator = 42
denominator = 7

numerator/denominator

6.0

In the cell below, compute (4 * (3 + 11.7))/(1800 - 46)

In [93]:
# EXERCISE

# compute (4 * (3 + 11.7))/(1800 - 46)
(4 * (3 + 11.7))/(1800 - 46)

0.033523375142531356

### Keep in mind that if any operand in an operation is a float, the result will be a float as well. True division (`/`) always returns a float, even when both operands are ints.
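
A quick way to check how ints and floats combine is to inspect the type of each result with the built-in `type` function. The sketch below is illustrative only; the variable names `int_sum`, `mixed_sum`, and `quotient` are not part of the course material.

```python
# Illustrative names; inspect the type of each result with type()
int_sum = 4 + 3        # int + int stays an int
mixed_sum = 4 + 3.0    # one float operand makes the result a float
quotient = 42 / 7      # true division returns a float even for two ints

print(type(int_sum))    # <class 'int'>
print(type(mixed_sum))  # <class 'float'>
print(type(quotient))   # <class 'float'>
```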

#### Booleans:

There are only 2 possible values for a Boolean variable: True or False. Boolean variables are the outputs of logical statements like the following:

In [None]:
# Returns True
4 + 4 == 5 + 3

True

In [None]:
# Returns False
1 + 7 == 8 + 2

#### Notice that, as shown in the cells above, "==" and "=" are different in Python. "=" is used to assign a value to a variable, while "==" is used to check whether two things are equal but does not change the value of either of those things. The following cells explore this further; feel free to try your own examples to get a better intuition of how these operators work.

In [None]:
# Gives no output
variable = 3

In [None]:
# Checks whether the variable is 0; gives an output
variable == 0

In [None]:
# The cell above did not change the value of the variable to 0
print(variable)

In [None]:
# But now this will change the value of the variable
variable = 0
print(variable)

You can create more complicated logical expressions using "and", "or", and "not".

In [None]:
not ((4 + 3 == 7) and (9 == 8))

True

#### Parentheses matter! We've copied and pasted the expression from the cell above into the cell below; this expression evaluates as True. Modify this expression by only changing the positions of the parentheses so that it now evaluates as False.

In [96]:
# EXERCISE


# modify the positions of parentheses to make this expression False instead of True
not ((4 + 3 == 7) and (9 == 8))

False

#### Strings:

Strings are sequences of characters. To represent a string in Python, put the characters inside quotation marks. You can use single quotes or double quotes; it doesn't matter which, but it is best to pick one and stick to it. Here are some strings:

In [None]:
# A string using single quotes
'cat'

# A string using double quotes
"dog"

'dog'

In [36]:
x = 'a string'
y = "a string"
x == y

True

In [37]:
multiline = """
one
two
three
"""
multiline

'\none\ntwo\nthree\n'

In [None]:
# This will generate an error because "4.0" is a string, not a float. Thus,
# you cannot add 3 to it.
"4.0" + 3

In [None]:
# But you can add two strings

"cat" + "dog"

'catdog'

#### Python has many extremely useful string functions and methods; here are a few of them:

In [42]:
message = 'what do you like?'
response = 'spam'

In [40]:
# length of string
len(response)

4

In [43]:
# Capitalize. See also str.title()
message.capitalize()

'What do you like?'

In [44]:
# concatenation with +
message + response

'what do you like?spam'

In [45]:
# multiplication is multiple concatenation
5 * response

'spamspamspamspamspam'

In [46]:
# Access individual characters (zero-based indexing)
message[0]

'w'

In [47]:
message[:5]

'what '

In [49]:
message[::-1]

'?ekil uoy od tahw'

## Simple String Manipulation in Python

For basic manipulation of strings, Python's built-in string methods can be extremely convenient.
If you have a background working in C or another low-level language, you will likely find the simplicity of Python's methods extremely refreshing.
We introduced Python's string type and a few of these methods earlier; here we'll dive a bit deeper.

### Formatting strings: Adjusting case

Python makes it quite easy to adjust the case of a string.
Here we'll look at the ``upper()``, ``lower()``, ``capitalize()``, ``title()``, and ``swapcase()`` methods, using the following messy string as an example:

In [53]:
fox = "tHe qUICk bROWn fOx."

In [54]:
fox.upper()

'THE QUICK BROWN FOX.'

In [55]:
fox.lower()

'the quick brown fox.'

In [56]:
fox.title()

'The Quick Brown Fox.'

In [57]:
fox.capitalize()

'The quick brown fox.'

In [58]:
fox.swapcase()

'ThE QuicK BrowN FoX.'

## Note: Python Variables Are Pointers

Assigning variables in Python is as easy as putting a variable name to the left of the equals (``=``) sign:

```python
# assign 4 to the variable x
x = 4
```

This may seem straightforward, but if you have the wrong mental model of what this operation does, the way Python works may seem confusing.
We'll briefly dig into that here.

In many programming languages, variables are best thought of as containers or buckets into which you put data.
So in C, for example, when you write

```C
// C code
int x = 4;
```

you are essentially defining a "memory bucket" named ``x``, and putting the value ``4`` into it.
In Python, by contrast, variables are best thought of not as containers but as pointers.
So in Python, when you write

```python
x = 4
```

you are essentially defining a *pointer* named ``x`` that points to some other bucket containing the value ``4``.

#### Note one consequence of this: because Python variables just point to various objects, there is no need to "declare" a variable, or even to require that it always point to information of the same type!
This is the sense in which people say Python is *dynamically-typed*: variable names can point to objects of any type.
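
The pointer model can be seen in a few lines. This is a minimal sketch with arbitrary values: one name is rebound to objects of different types, and two names are made to point to the same list.

```python
# The same name can point to objects of different types over time
x = 1          # x points to an int
x = 'hello'    # now x points to a str
x = [1, 2, 3]  # now x points to a list

# Two names can point to the same object
y = x          # y points to the same list as x
x.append(4)    # modify the list through x
print(y)       # [1, 2, 3, 4]: the change is visible through y too

# Rebinding x makes it point elsewhere; the list y points to is unaffected
x = 'something else entirely'
print(y)       # [1, 2, 3, 4]
```

Operations that modify an object in place (such as `append`) are visible through every name that points to it, while assignment with `=` only changes what a single name points to.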


## Everything Is an Object

Python is an object-oriented programming language, and in Python everything is an object. Call the `type` function to get an object's type.

In [None]:
x = 4
type(x)

int

In [None]:
x = 'hello'
type(x)

str

In [None]:
x = 3.14159
type(x)

float

# Basic Python Semantics: Operators

## Arithmetic Operations

| Operator     | Name           | Description                                            |
|--------------|----------------|--------------------------------------------------------|
| ``a + b``    | Addition       | Sum of ``a`` and ``b``                                 |
| ``a - b``    | Subtraction    | Difference of ``a`` and ``b``                          |
| ``a * b``    | Multiplication | Product of ``a`` and ``b``                             |
| ``a / b``    | True division  | Quotient of ``a`` and ``b``                            |
| ``a // b``   | Floor division | Quotient of ``a`` and ``b``, rounded down              |
| ``a % b``    | Modulus        | Remainder after division of ``a`` by ``b``             |
| ``a ** b``   | Exponentiation | ``a`` raised to the power of ``b``                     |
| ``-a``       | Negation       | The negative of ``a``                                  |
| ``+a``       | Unary plus     | ``a`` unchanged (rarely used)                          |

These operators can be used and combined in intuitive ways, using standard parentheses to group operations.
For example:

In [None]:
# addition, subtraction, multiplication
(4 + 8) * (6.5 - 3)

42.0

Floor division is true division with the result rounded down to the nearest whole number:

In [None]:
# True division
print(11 / 2)

5.5


In [None]:
# Floor division
print(11 // 2)

5
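
For negative operands, floor division rounds toward negative infinity, which is not the same as dropping the fractional part. The sketch below contrasts the two; the names `n`, `d`, and the result variables are illustrative.

```python
# Floor division rounds toward negative infinity
positive_floor = 11 // 2    # 5
negative_floor = -11 // 2   # -6, not -5

# int() truncates toward zero instead
truncated = int(-11 / 2)    # -5

# Floor division and modulus fit together: n == (n // d) * d + (n % d)
n, d = -11, 2
reconstructed = (n // d) * d + (n % d)

print(positive_floor, negative_floor, truncated, reconstructed)  # 5 -6 -5 -11
```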


In [98]:
# EXERCISE

# multiply 5 by 2 to power of 4

80

In [99]:
# EXERCISE

# get the remainder of the division 27/4

3

## Comparison Operations

Another type of operation which can be very useful is comparison of different values.
For this, Python implements standard comparison operators, which return Boolean values ``True`` and ``False``.
The comparison operations are listed in the following table:

| Operation     | Description                       || Operation     | Description                          |
|---------------|-----------------------------------||---------------|--------------------------------------|
| ``a == b``    | ``a`` equal to ``b``              || ``a != b``    | ``a`` not equal to ``b``             |
| ``a < b``     | ``a`` less than ``b``             || ``a > b``     | ``a`` greater than ``b``             |
| ``a <= b``    | ``a`` less than or equal to ``b`` || ``a >= b``    | ``a`` greater than or equal to ``b`` |

These comparison operators can be combined with the arithmetic operators to express a virtually limitless range of tests on numbers.
For example, we can check if a number is odd by checking that the modulus with 2 returns 1:

In [None]:
# 25 is odd
25 % 2 == 1

True

In [None]:
# 66 is even, so this test for oddness returns False
66 % 2 == 1

False

We can string together multiple comparisons to check more complicated relationships:

In [None]:
# check if a is between 15 and 30
a = 25
15 < a < 30

True

In [100]:
# EXERCISE

# return True if a variable is equal to another

var1 = 
var2 = 

True

## Boolean Operations
When working with Boolean values, Python provides operators to combine the values using the standard concepts of "and", "or", and "not".
Predictably, these operators are expressed using the words ``and``, ``or``, and ``not``:

In [None]:
x = 4
(x < 6) and (x > 2)

True

In [None]:
(x > 10) or (x % 2 == 0)

True

In [None]:
not (x < 6)

False

In [102]:
# EXERCISE

# return True if a variable is different than another using 'not'

var1 = 
var2 = 


True

In [105]:
# EXERCISE

# Return True if two out of three variables are equal

var1 = 
var2 = 
var3 = 

True

## Identity and Membership Operators

Like ``and``, ``or``, and ``not``, Python also contains prose-like operators to check for identity and membership.
They are the following:

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a is b``    | True if ``a`` and ``b`` are identical objects     |
| ``a is not b``| True if ``a`` and ``b`` are not identical objects |
| ``a in b``    | True if ``a`` is a member of ``b``                |
| ``a not in b``| True if ``a`` is not a member of ``b``            |

### Identity Operators: "``is``" and "``is not``"

The identity operators, "``is``" and "``is not``", check for *object identity*.
Object identity is different from equality, as we can see here:

In [None]:
a = [1, 2, 3]
b = [1, 2, 3]

In [None]:
a == b

True

In [None]:
a is b

False

In [None]:
a is not b

True

What do identical objects look like? Here is an example:

In [None]:
a = [1, 2, 3]
b = a
a is b

True

#### The difference between the two cases here is that in the first, ``a`` and ``b`` point to *different objects*, while in the second they point to the *same object*.
#### As we saw in the previous section, Python variables are pointers. The "``is``" operator checks whether the two variables are pointing to the same container (object), rather than referring to what the container contains.
#### With this in mind, in most cases where a beginner is tempted to use "``is``", what they really mean is ``==``.
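
One place where ``is`` is the conventional choice is testing against ``None``, since there is only one ``None`` object. The sketch below also uses the built-in `id` function, which returns an integer identifying an object, to make the identity comparison visible. The variable names are illustrative.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate object
alias = first        # another name for the same object

# "is" compares identities, which id() exposes as integers
same_ids = id(first) == id(alias)        # True
different_ids = id(first) != id(second)  # True

# Testing for None is a common, legitimate use of "is"
result = None
print(same_ids, different_ids, result is None)  # True True True
```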

### Membership operators
Membership operators check for membership within compound objects.
So, for example, we can write:

In [None]:
1 in [1, 2, 3]

True

In [None]:
2 not in [1, 2, 3]

False

These membership operations are an example of what makes Python so easy to use compared to lower-level languages such as C.
In C, membership would generally be determined by manually constructing a loop over the list and checking for equality of each value.
In Python, you just type what you want to know, in a manner reminiscent of straightforward English prose.
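
The ``in`` operator is not limited to lists. As a sketch, here it is applied to a string, a tuple, and a dictionary (types that are covered in more detail elsewhere in this notebook). The `ages` dictionary is made-up example data.

```python
# "in" works on other containers as well
has_substring = 'fox' in 'the quick brown fox'   # substring test on a string
in_tuple = 3 in (1, 2, 3)                        # element test on a tuple

ages = {'alice': 30, 'bob': 25}                  # hypothetical example data
key_present = 'alice' in ages                    # dictionaries test their keys
value_as_key = 30 in ages                        # values are not searched

print(has_substring, in_tuple, key_present, value_as_key)  # True True True False
```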

## Built-In Types

When discussing Python variables and objects, we mentioned the fact that all Python objects have type information attached. Here we'll briefly walk through the built-in simple types offered by Python.
We say "simple types" to contrast with several compound types, which will be discussed in the following section.

Python's simple types are summarized in the following table:

<center>**Python Scalar Types**</center>

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                  |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |

We'll take a quick look at each of these in turn.

## Integers
The most basic numerical type is the integer.
Any number without a decimal point is an integer:

In [None]:
x = 1
type(x)

int

Python integers are actually quite a bit more sophisticated than integers in languages like ``C``.
C integers are fixed-precision, and usually overflow at some value (often near $2^{31}$ or $2^{63}$, depending on your system).
Python integers are variable-precision, so you can do computations that would overflow in other languages:

In [None]:
2 ** 200

1606938044258990275541962092341162602522202993782792835301376

## Floating-Point Numbers
The floating-point type can store fractional numbers.
They can be defined either in standard decimal notation, or in exponential notation:

In [None]:
x = 0.000005
y = 5e-6
print(x == y)

True


In [None]:
x = 1400000.00
y = 1.4e6
print(x == y)

True


In the exponential notation, the ``e`` or ``E`` can be read "...times ten to the...",
so that ``1.4e6`` is interpreted as $1.4 \times 10^6$.

An integer can be explicitly converted to a float with the ``float`` constructor:

In [None]:
float(1)

1.0

### Aside: Floating-point precision
One thing to be aware of with floating point arithmetic is that its precision is limited, which can cause equality tests to be unstable. For example:

In [None]:
0.1 + 0.2 == 0.3

False

Why is this the case? It turns out that it is not a behavior unique to Python, but is due to the fixed-precision format of the binary floating-point storage used by most, if not all, scientific computing platforms.
All programming languages using floating-point numbers store them in a fixed number of bits, and this leads some numbers to be represented only approximately.
We can see this by printing the three values to high precision:

In [None]:
print("0.1 = {0:.17f}".format(0.1))
print("0.2 = {0:.17f}".format(0.2))
print("0.3 = {0:.17f}".format(0.3))

0.1 = 0.10000000000000001
0.2 = 0.20000000000000001
0.3 = 0.29999999999999999


We're accustomed to thinking of numbers in decimal (base-10) notation, so that each fraction must be expressed as a sum of powers of 10:
$$
1/8 = 1\cdot 10^{-1} + 2\cdot 10^{-2} + 5\cdot 10^{-3}
$$
In base 10, we write this as the familiar decimal expression $0.125$.

Computers usually store values in binary notation, so that each number is expressed as a sum of powers of 2:
$$
1/8 = 0\cdot 2^{-1} + 0\cdot 2^{-2} + 1\cdot 2^{-3}
$$
In a base-2 representation, we can write this $0.001_2$, where the subscript 2 indicates binary notation.
The value $0.125 = 0.001_2$ happens to be one number which both binary and decimal notation can represent in a finite number of digits.

In the familiar base-10 representation of numbers, you are probably familiar with numbers that can't be expressed in a finite number of digits.
For example, dividing $1$ by $3$ gives, in standard decimal notation:
$$
1 / 3 = 0.333333333\cdots
$$
The 3s go on forever: that is, to truly represent this quotient, the number of required digits is infinite!

Similarly, there are numbers for which binary representations require an infinite number of digits.
For example:
$$
1 / 10 = 0.00011001100110011\cdots_2
$$
Just as decimal notation requires an infinite number of digits to perfectly represent $1/3$, binary notation requires an infinite number of digits to represent $1/10$.
Python internally rounds these representations to 52 bits beyond the first nonzero bit on most systems.

This rounding error for floating-point values is a necessary evil of working with floating-point numbers.
The best way to deal with it is to always keep in mind that floating-point arithmetic is approximate, and *never* rely on exact equality tests with floating-point values.
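
One way to compare floats approximately is the standard library's `math.isclose`, which checks whether two values agree within a tolerance. This is a minimal sketch; the tolerance `1e-9` passed as `abs_tol` is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)              # False: exact comparison fails
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance

# The default tolerance is relative, so comparisons against zero
# need an explicit absolute tolerance (abs_tol)
near_zero = 1e-12
print(math.isclose(near_zero, 0.0))                # False
print(math.isclose(near_zero, 0.0, abs_tol=1e-9))  # True
```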

## None Type
Python includes a special type, the ``NoneType``, which has only a single possible value: ``None``. For example:

In [None]:
type(None)

NoneType

You'll see ``None`` used in many places, but perhaps most commonly it is used as the default return value of a function.
For example, the ``print()`` function in Python 3 does not return anything, but we can still catch its value:

In [None]:
return_value = print('abc')

abc


In [None]:
print(return_value)

None


Likewise, any function in Python with no return value is, in reality, returning ``None``.
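
As a small illustration, here is a function with no ``return`` statement. The function `greet` is a made-up example, not something defined elsewhere in the course.

```python
def greet(name):
    # Prints a greeting but has no return statement
    print('Hello, ' + name)

greeting_result = greet('world')   # prints: Hello, world
print(greeting_result is None)     # True
```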

# Lists, Tuples, Dictionaries and Sets


We have seen Python's simple types: ``int``, ``float``, ``complex``, ``bool``, ``str``, and so on.
Python also has several built-in compound types, which act as containers for other types.
These compound types are:

| Type Name | Example                   |Description                            |
|-----------|---------------------------|---------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection          |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | Mapping of keys to values             |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values |

As you can see, round, square, and curly brackets have distinct meanings when it comes to the type of collection produced.
We'll take a quick tour of these data structures here.

## Lists
Lists are the basic *ordered* and *mutable* data collection type in Python.
They can be defined with comma-separated values between square brackets; for example, here is a list of the first several prime numbers:

In [2]:
L = [2, 3, 5, 7]

Lists have a number of useful properties and methods available to them.
Here we'll take a quick look at some of the more common and useful ones:

In [3]:
# Length of a list
len(L)

4

In [4]:
# Append a value to the end
L.append(11)
L

[2, 3, 5, 7, 11]

In [None]:
# Addition concatenates lists
L + [13, 17, 19]

In [5]:
# sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

[1, 2, 3, 4, 5, 6]

In addition, there are many more built-in list methods; they are well-covered in Python's [online documentation](https://docs.python.org/3/tutorial/datastructures.html).

While we've been demonstrating lists containing values of a single type, one of the powerful features of Python's compound objects is that they can contain objects of *any* type, or even a mix of types. For example:

In [6]:
L = [1, 'two', 3.14, [0, 3, 5]]

This flexibility is a consequence of Python's dynamic type system.
Creating such a mixed sequence in a statically-typed language like C can be much more of a headache!
We see that lists can even contain other lists as elements.
Such type flexibility is an essential piece of what makes Python code relatively quick and easy to write.

So far we've been considering manipulations of lists as a whole; another essential piece is the accessing of individual elements.
This is done in Python via *indexing* and *slicing*, which we'll explore next.

In [92]:
# EXERCISE

# Add a fruit to the list and sort the following list of fruits

fruits = ['Apple', 'Mango', 'Banana', 'Guava', 'Melon']

['Apple', 'Banana', 'Blueberry', 'Guava', 'Mango', 'Melon']

### List indexing and slicing
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.
As we'll see, both are indicated by a square-bracket syntax.
Suppose we return to our list of the first several primes:

In [7]:
L = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [8]:
L[0]

2

In [9]:
L[1]

3

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [11]:
L[-1]

11

In [12]:
L[-2]

7

You can visualize this indexing scheme this way:

![List Indexing Figure](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/fig/list-indexing.png?raw=1)

Here values in the list are represented by large numbers in the squares; list indices are represented by small numbers above and below.
In this case, ``L[2]`` returns ``5``, because that is the next value at index ``2``.

Where *indexing* is a means of fetching a single value from the list, *slicing* is a means of accessing multiple values in sub-lists.
It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list.
For example, to get the first three elements of the list, we can write:

In [14]:
L[0:3]

[2, 3, 5]

Notice where ``0`` and ``3`` lie in the preceding diagram, and how the slice takes just the values between the indices.
If we leave out the first index, ``0`` is assumed, so we can equivalently write:

In [15]:
L[:3]

[2, 3, 5]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last three elements can be accessed as follows:

In [16]:
L[-3:]

[5, 7, 11]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [17]:
L[::2]  # equivalent to L[0:len(L):2]

[2, 5, 11]

A particularly useful version of this is to specify a negative step, which will reverse the list:

In [18]:
L[::-1]

[11, 7, 5, 3, 2]

Both indexing and slicing can be used to set elements as well as access them.
The syntax is as you would expect:

In [19]:
L[0] = 100
print(L)

[100, 3, 5, 7, 11]


In [20]:
L[1:3] = [55, 56]
print(L)

[100, 55, 56, 7, 11]


A very similar slicing syntax is also used in many data science-oriented packages, including NumPy and Pandas (mentioned in the introduction).

Now that we have seen Python lists and how to access elements in ordered compound types, let's take a look at the other three standard compound data types mentioned earlier.

In [21]:
# EXERCISE 

# select the 2nd, 3rd and the last element of the list L 

L = ['Red', 'Green', 'White', 'Black', 'Pink', 'Yellow']

# expected output 'Green', 'White', 'Yellow'

In [22]:
# EXERCISE

# print every second element of the list in reverse order

L = [8, 4, 3, 6, 7]

# expected output [7, 3, 8]

# List Comprehensions

If you read enough Python code, you'll eventually come across the terse and efficient construction known as a *list comprehension*.
This is one feature of Python I expect you will fall in love with if you've not used it before; it looks something like this:

In [84]:
[i for i in range(20) if i % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

The result of this is a list of numbers which excludes multiples of 3.
While this example may seem a bit confusing at first, as familiarity with Python grows, reading and writing list comprehensions will become second nature.

## Basic List Comprehensions
List comprehensions are simply a way to compress a list-building for-loop into a single short, readable line.
For example, here is a loop that constructs a list of the first 12 square integers:

In [85]:
L = []
for n in range(12):
    L.append(n ** 2)
L

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

The list comprehension equivalent of this is the following:

In [86]:
[n ** 2 for n in range(12)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

#### As with many Python statements, you can almost read off the meaning of this statement in plain English: "construct a list consisting of the square of ``n`` for each ``n`` up to 12".

#### This basic syntax, then, is ``[``*``expr``* ``for`` *``var``* ``in`` *``iterable``*``]``, where *``expr``* is any valid expression, *``var``* is a variable name, and *``iterable``* is any iterable Python object.
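
To make the general pattern concrete, the sketch below applies it to three different kinds of iterable. The list `words` and the other names are made-up example data.

```python
words = ['apple', 'kiwi', 'banana']      # hypothetical example data

lengths = [len(w) for w in words]        # expr = len(w), iterable = a list
shouted = [ch.upper() for ch in 'abc']   # iterable = a string
doubled = [2 * n for n in (1, 2, 3)]     # iterable = a tuple

print(lengths)   # [5, 4, 6]
print(shouted)   # ['A', 'B', 'C']
print(doubled)   # [2, 4, 6]
```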

## Conditionals on List Comprehension
You can further control the list comprehension by adding a conditional to the end of the expression.
In the first example of the section, we iterated over all numbers from 0 to 19, but left out multiples of 3.
Look at this again, and notice the construction:

In [87]:
[val for val in range(20) if val % 3 > 0]

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

The expression ``val % 3 > 0`` evaluates to ``True`` unless ``val`` is divisible by 3.
Again, the English language meaning can be immediately read off: "Construct a list of values for each value below 20, but only if the value is not divisible by 3".
Once you are comfortable with it, this is much easier to write – and to understand at a glance – than the equivalent loop syntax:

In [88]:
L = []
for val in range(20):
    if val % 3 > 0:
        L.append(val)
L

[1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19]

In [107]:
# EXERCISE

# given a list of numbers, return the same list but without even numbers

L = [8, 3, 5, 4, 7]


## Tuples
Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets:

In [23]:
t = (1, 2, 3)

They can also be defined without any brackets at all:

In [24]:
t = 1, 2, 3
print(t)

(1, 2, 3)


Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:

In [26]:
len(t)

3

In [27]:
t[0]

1

#### The main distinguishing feature of tuples is that they are *immutable*: this means that once they are created, their size and contents cannot be changed:

In [28]:
t[1] = 4

TypeError: 'tuple' object does not support item assignment

In [29]:
t.append(4)

AttributeError: 'tuple' object has no attribute 'append'
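
Because a tuple cannot be modified in place, a common workaround is to build a new tuple from pieces of the old one. A minimal sketch (the names `extended` and `replaced` are illustrative):

```python
t = (1, 2, 3)

# Concatenation creates a new tuple; note the trailing comma in (4,)
extended = t + (4,)

# Slicing plus concatenation gives a new tuple with one element replaced
replaced = t[:1] + (4,) + t[2:]

print(t)         # (1, 2, 3): the original is unchanged
print(extended)  # (1, 2, 3, 4)
print(replaced)  # (1, 4, 3)
```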

## Dictionaries
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation.
They can be created via a comma-separated list of ``key:value`` pairs within curly braces:

In [30]:
numbers = {'one':1, 'two':2, 'three':3}

Items are accessed and set via the indexing syntax used for lists and tuples, except here the index is not a zero-based position but a valid key in the dictionary:

In [31]:
# Access a value via the key
numbers['two']

2

New items can be added to the dictionary using indexing as well:

In [32]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}
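
A few other dictionary methods are often useful alongside indexing. The sketch below recreates the dictionary from above so that it is self-contained; `get`, `keys`, `values`, and `items` are standard `dict` methods, while the result names are illustrative.

```python
numbers = {'one': 1, 'two': 2, 'three': 3, 'ninety': 90}

# get() returns None (or a chosen default) instead of raising a KeyError
missing = numbers.get('hundred')
fallback = numbers.get('hundred', 0)

# keys(), values(), and items() expose the contents in insertion order
keys = list(numbers.keys())
values = list(numbers.values())
pairs = list(numbers.items())

print(missing, fallback)  # None 0
print(keys)               # ['one', 'two', 'three', 'ninety']
print(values)             # [1, 2, 3, 90]
print(pairs[0])           # ('one', 1)
```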


In [108]:
# EXERCISE

# test if 'twelve' is in dictionary `numbers` (use the operator `in`)


False

In [109]:
# now, add `twelve` to the dictionary so that the statement is True

numbers['twelve'] = 12
'twelve' in numbers

True

## Sets

The fourth basic collection is the set, which is an unordered collection of unique items.
Sets are defined much like lists and tuples, except they use the curly brackets of dictionaries:

In [35]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

#### If you're familiar with the mathematics of sets, you'll be familiar with operations like the union (`|`), intersection (`&`), difference (`-`), symmetric difference (`^`), and others.
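
The four operators named above can be tried directly on the two sets. In this sketch the results are passed through `sorted` only so that the printed order is predictable, since sets themselves are unordered.

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

union = primes | odds          # items in either set
intersection = primes & odds   # items in both sets
difference = primes - odds     # items in primes but not in odds
symmetric = primes ^ odds      # items in exactly one of the two sets

print(sorted(union))         # [1, 2, 3, 5, 7, 9]
print(sorted(intersection))  # [3, 5, 7]
print(sorted(difference))    # [2]
print(sorted(symmetric))     # [1, 2, 9]
```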

## Some More String Manipulation in Python

### Formatting strings: Adding and removing spaces

Another common need is to remove spaces (or other characters) from the beginning or end of the string.
The basic method of removing characters is the ``strip()`` method, which strips whitespace from the beginning and end of the line:

In [60]:
line = '         this is the content         '
line.strip()

'this is the content'

To remove whitespace from just the right or left, use ``rstrip()`` or ``lstrip()`` respectively:

In [61]:
line.rstrip()

'         this is the content'

In [62]:
line.lstrip()

'this is the content         '

To remove characters other than spaces, you can pass the desired character to the ``strip()`` method:

In [63]:
num = "000000000000435"
num.strip('0')

'435'
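
Two details are worth keeping in mind here: ``strip()`` works on both ends of the string, and its argument is treated as a set of characters to remove rather than as a single substring. A short sketch with made-up strings:

```python
num = '000435000'
both_ends = num.strip('0')    # zeros removed from both ends
left_only = num.lstrip('0')   # only leading zeros removed

decorated = '--==hello==--'
cleaned = decorated.strip('-=')  # any mix of '-' and '=' is stripped

print(both_ends)  # 435
print(left_only)  # 435000
print(cleaned)    # hello
```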

In [113]:
# EXERCISE

# fix the following string so it has no spaces in it

string = '                wow'

'wow'

### Finding and replacing substrings

If you want to find occurrences of a certain character in a string, the ``find()``/``rfind()``, ``index()``/``rindex()``, and ``replace()`` methods are the best built-in methods.

``find()`` and ``index()`` are very similar, in that they search for the first occurrence of a character or substring within a string, and return the index of the substring:

In [64]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [65]:
line.index('fox')

16

The only difference between ``find()`` and ``index()`` is their behavior when the search string is not found; ``find()`` returns ``-1``, while ``index()`` raises a ``ValueError``:

In [66]:
line.find('bear')

-1

In [67]:
line.index('bear')

ValueError: substring not found

The related ``rfind()`` and ``rindex()`` work similarly, except they search for the first occurrence from the end rather than the beginning of the string:

In [68]:
line.rfind('a')

35

For the special case of checking for a substring at the beginning or end of a string, Python provides the ``startswith()`` and ``endswith()`` methods:

In [69]:
line.endswith('dog')

True

In [70]:
line.startswith('fox')

False

To go one step further and replace a given substring with a new string, you can use the ``replace()`` method.
Here, let's replace ``'brown'`` with ``'red'``:

In [71]:
line.replace('brown', 'red')

'the quick red fox jumped over a lazy dog'

The ``replace()`` method returns a new string, and will replace all occurrences of the input:

In [72]:
line.replace('o', '--')

'the quick br--wn f--x jumped --ver a lazy d--g'

In [114]:
# EXERCISE

# make a new string from `line` in which every 'a' and 'e' is in upper case


'thE quick brown fox jumpEd ovEr A lAzy dog'

### Splitting strings

The ``split()`` method finds *all* instances of the split-point and returns the substrings in between.
The default is to split on any whitespace, returning a list of the individual words in a string:

In [73]:
line.split()

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']

A related method is ``splitlines()``, which splits on newline characters.
Let's do this with a Haiku, popularly attributed to the 17th-century poet Matsuo Bashō:

In [74]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']

Note that if you would like to undo a ``split()``, you can use the ``join()`` method, which returns a string built from a splitpoint and an iterable:

In [75]:
'--'.join(['1', '2', '3'])

'1--2--3'

A common pattern is to use the special character ``"\n"`` (newline) to join together lines that have been previously split, and recover the input:

In [76]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))

matsushima-ya
aah matsushima-ya
matsushima-ya


In [125]:
# EXERCISE

# split the following string into a list, add a new element to the list and join it using '_'

s = 'the little piggy went to the'


'the_little_piggy_went_to_the_shop'

## Format Strings

In the preceding methods, we have learned how to extract values from strings, and to manipulate strings themselves into desired formats.
Another use of string methods is to manipulate string *representations* of values of other types.
Of course, string representations can always be found using the ``str()`` function; for example:

In [77]:
pi = 3.14159
str(pi)

'3.14159'

In [78]:
"The value of pi is " + str(pi)

'The value of pi is 3.14159'

A more flexible way to do this is to use *format strings*, which are strings with special markers (noted by curly braces) into which string-formatted values will be inserted.
Here is a basic example:

In [80]:
"The value of pi is {}".format(pi)

'The value of pi is 3.14159'

In [81]:
f"The value of pi is {pi}"

'The value of pi is 3.14159'
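
Format strings can also control how a value is rendered and where it is placed. The sketch below shows a precision specifier in an f-string, plus positional and named fields with ``format()``. The strings and field names are made up for illustration.

```python
pi = 3.14159

# A format specifier after the colon: three digits after the decimal point
rounded = f"pi is approximately {pi:.3f}"

# Numbered fields refer to positional arguments and can be reused
positional = "{0} and {1}, then {0} again".format('a', 'b')

# Named fields refer to keyword arguments
named = "{first} before {last}".format(first='A', last='Z')

print(rounded)     # pi is approximately 3.142
print(positional)  # a and b, then a again
print(named)       # A before Z
```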

In [82]:
# EXERCISE

s = 'The quick brown fox.'

# Guess - then try to slice the string 's' to get
# a) 'quick'
# b) 'The'
# c) '.'

In [126]:
# EXERCISE

# test if strings are palindromes (= reads the same backwards as forwards)
# hint: to reverse the string x, use x[::-1] 

string1 = 'dad'
string2 = 'bear'
string3 = 'madam'

True
False
True
