<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 10px;"> 

#  Intro to Python: Data Types

_Authors: Tim Book, Noelle Brown_

---

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Use basic Jupyter Notebook features
- Define integers, strings, tuples, lists, and dictionaries
- Demonstrate arithmetic operations and string operations

## Some Intro Concepts

- The biggest advantage of Jupyter Notebook for Python coding is cell-by-cell execution. This helps even at an expert level: when a large chunk of code throws an error, breaking it into smaller cells makes the problem easier to locate, and it keeps the notebook organized and readable. As a good practice, keep each cell focused on one task.
- Before we get to the code, it is worth a look at how well Jupyter Notebook presents content in general, not just code. Double-click the first cell and you will see that it is actually a *Markdown* cell, which supports links, styled text, and so on. `#` characters define headings: the more `#`s, the smaller the heading. A single `*` on each side *italicizes* text, while a double `**` makes it **bold**. Underscores are an alternative way to italicize. Another styling feature is font color, added with HTML: `<span style="color:blue">some blue text</span>` renders as <span style="color:blue">some blue text</span>. Try other colors later to add more life to your notebook.
- Many of us have used Microsoft Excel. I bring it up because, in my own coding journey, it has always helped to think about how I would do something in Excel and then replicate it in code. Python can handle the calculations Excel does and much more, which is a strong incentive to gradually move toward Python programming.

## First and Foremost: Python is a Calculator
_(...just like every other programming language)_

Let's learn some common mathematical operations:

In [1]:
# Addition --> same as spreadsheet
2+2

4

In [2]:
# Subtraction (note we can have negative numbers!) --> same as spreadsheet
3 - 7

-4

In [3]:
# Multiplication --> same as spreadsheet
5 * 2

10

In [4]:
# Division --> returns the quotient of the dividend (numerator) divided by the divisor (denominator); the result is always a float
5 / 2

2.5

In [5]:
# Exponentiation (do NOT use ^) --> unlike a spreadsheet, where ^ means exponentiation; in Python ^ is bitwise XOR
5**2

25

In [6]:
# Modular division ("mod" for short) 
# --> (modulo division) returns the remainder after dividing the dividend by the divisor
5 % 2

1

In [7]:
# Floor division (ie "round down" division) 
# --> returns the quotient rounded down to the nearest integer
5 // 2

2

In [8]:
# Quick poll: what is `5 + 2 * 3`? 21, 11, or "idk"?
# So far we have only done simple two-number operations. Now BODMAS!
# Brackets > Orders (powers) > Division and Multiplication (same level, left to right)
# > Addition and Subtraction (same level, left to right)
5 + 2 * 3

11

In [9]:
# Quick poll: what is `(5 + 2) * 3`? 21, 11, or "idk"?
(5 + 2) * 3

21
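
The precedence rules can be checked with a few more expressions. This is a small sketch using only the operators introduced above; the variable names are illustrative.

```python
# Multiplication and division share one precedence level,
# so they are evaluated left to right.
left_to_right = 8 / 4 * 2      # same as (8 / 4) * 2
grouped = 8 / (4 * 2)          # brackets force the product first

# Exponentiation binds tighter than multiplication.
power_first = 2 * 3**2         # same as 2 * (3**2)

print(left_to_right, grouped, power_first)  # 4.0 1.0 18
```

When in doubt, adding brackets costs nothing and makes the intent obvious to the next reader.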

## Variables
Great - Python is a fancy calculator. To make it more useful, we can save numbers as **variables** so we can reference them later without memorizing their values.

- When we need to store values for later use in our code, variables come in handy. Think of them as 'containers' that hold values.
- Variable assignment happens with `=`. Whatever is on the right gets assigned to the name on the left.

In [10]:
x = 3
y = 4
z = 2

In [11]:
(x + y) / z

3.5

## Naming Rules --> make them representative

You can _pretty much_ name variables whatever you want. But, there are a few rules we should follow. Some are strict, some are just good manners.

### Variable naming rules (mandatory)
- Names can only consist of numbers, letters and underscores.
- Names can't begin with numbers.
- You can't name a variable after a built-in Python keyword (eg `if`).

### Variable naming rules (good manners)
- Names should _**always**_ be descriptive (ie, don't name variables `x` and `df`)
- No capital letters! Except while naming a Python class (more on classes in a few weeks)
- Variables should not begin with an underscore (by convention, a leading underscore marks a name as internal or private)
- Multi-word variables should be in `snake_case`. All lower case separated by underscores.
- Technically, you _can_ name variables after built-in Python _functions_ (like `print`), but it's an _extremely_ bad idea to do so.
    - Rule of thumb: If a variable name turns green, don't use it!

For more details on recommended Python coding practices, please read the official Style Guide for Python Code (PEP 8): https://www.python.org/dev/peps/pep-0008/. It's not mandatory to follow these recommendations, but it is highly recommended!
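
The mandatory rules can be checked from Python itself. The sketch below uses two standard-library tools, `str.isidentifier()` and `keyword.iskeyword()`; the example names are made up for illustration.

```python
import keyword

# isidentifier() checks the character rules:
# letters, digits and underscores only, and no leading digit.
print("total_sales".isidentifier())   # True
print("2nd_place".isidentifier())     # False: starts with a digit
print("first-name".isidentifier())    # False: hyphens are not allowed

# A keyword passes the character rules but is still reserved,
# so it needs a separate check.
print("if".isidentifier())            # True
print(keyword.iskeyword("if"))        # True: cannot be used as a name
```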
    
### Math exercise (sorry): --> variable understanding check
Recall the quadratic formula for solving a quadratic equation $ax^2 + bx + c = 0$ with coefficients $a$, $b$, $c$:

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

In [12]:
a = 1
b = -8
c = 15

In [13]:
discrim = b**2 - 4*a*c

In [14]:
# Slack thread: Give me the code to produce one of the two roots!
(-b + discrim**0.5) / (2*a)

5.0
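
For completeness, here is a sketch that produces both roots and checks one of them by substituting it back into the equation. It assumes the discriminant is non-negative; with a negative discriminant, `discrim**0.5` would produce a complex number instead.

```python
a = 1
b = -8
c = 15

discrim = b**2 - 4*a*c                     # 64 - 60 = 4

root_plus = (-b + discrim**0.5) / (2*a)    # the "+" branch of the formula
root_minus = (-b - discrim**0.5) / (2*a)   # the "-" branch of the formula

print(root_plus, root_minus)               # 5.0 3.0

# Substituting a root back into a*x**2 + b*x + c should give zero.
print(a*root_minus**2 + b*root_minus + c)  # 0.0
```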

## So, what is a "data type"?
When you hear the word "data", you probably think of a spreadsheet. Actually, **data is a synonym for information!** Anything that represents "information" is data. Including any and all Python variables. If I run `x = 3`, then `x` is data!

Data can come in various **types.** We've already seen two types!

1. The `int` type: Integers with no decimal part (eg `2`, `-30`, `14`)
1. The `float` type: Numbers with a decimal part, even if that part is zero (eg `2.5`, `3.141`, `2**0.5`, `-3.0`)

Curious about what an object's data type is? Simply use the `type()` function to ask! `type()` is a built-in Python function for checking data types. We can pass it either a value directly or a variable.

```python
type(3) # int
type(4.2) # float
```

In [15]:
type(3)

int

In [16]:
type(4.2)

float
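
`type()` also works on variables and on the results of calculations. A short sketch with illustrative variable names, showing which division operator gives which type:

```python
whole = 7
ratio = 7 / 2        # true division always gives a float
floored = 7 // 2     # floor division of two ints gives an int

print(type(whole))    # <class 'int'>
print(type(ratio))    # <class 'float'>
print(type(floored))  # <class 'int'>
print(type(4 / 2))    # <class 'float'>, even though the value is 2.0
```

Outside a notebook cell's last line, `print(type(...))` shows the longer `<class 'int'>` form rather than the bare `int` seen above.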

## Strings

---

Strings are how we store text data in Python. Strings are _strings of characters_ between either double quotes (`"`) or single quotes (`'`). Python doesn't care which as long as they match.

In [17]:
"The pen is mightier than the sword!"

'The pen is mightier than the sword!'

In [18]:
'Single quotes work just fine too.'

'Single quotes work just fine too.'

In [19]:
# Multi-line string --> notice the code color change if the triple quotes are replaced with single or double quotes: an ordinary string cannot span lines, so Python raises a SyntaxError!
multi_line_string = """If you have three
quotes in a row,
you can even have a string
that spans multiple lines!"""

In [20]:
print(multi_line_string) # --> print() is one of the most widely used Python built-in functions; it displays output

If you have three
quotes in a row,
you can even have a string
that spans multiple lines!


In [5]:
# Escape characters
"Backslashes allow you to have \"quotes\" inside your quotes!"

'Backslashes allow you to have "quotes" inside your quotes!'

The **print** function displays a value on the screen, for example the value assigned to a variable `x`.

**print** shows a string without its quotation marks, whereas running a Jupyter cell with `x` on the last line leaves the quotation marks in.

If you need to display the value of a variable in the middle of a cell, use print. Just typing `x` only displays the value if it is on the last line of the cell.

You can use 'single' or "double" quotation marks to create a string.

In [22]:
print("Testing the print command")

Testing the print command


In [7]:
x = 42

x # This x WILL NOT be displayed, as it is not on the last line of the cell
print("x =", x) # This x will be printed, as it is inside a print call
print(f"x = {x}") # This x will also be printed, using an f-string to embed a value within a string
x # This x WILL be displayed, as it is on the last line of the cell

x = 42
x = 42


42
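
The difference in quotation marks comes from two ways of turning a value into text: `print()` uses the `str()` form, while a notebook's last-line display uses the `repr()` form. A minimal sketch, with `greeting` as an illustrative name:

```python
greeting = "hello"

# print() shows the str() form: no quotation marks.
print(greeting)                        # hello

# The repr() form is what a notebook shows for the last line of a cell.
print(repr(greeting))                  # 'hello'

# Inside an f-string, !r asks for the repr() form.
print(f"{greeting} vs {greeting!r}")   # hello vs 'hello'
```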

## String Math!
Besides simply storing text, we can also operate on strings. Everything in Python has a **type**, and each type comes with its own **methods**. Methods are actions we can perform on a value of that type using the following syntax:

```python
variable.method(parameters)
```
- `variable` holds a value of a certain data type, like a string.
- `method` is the action we want to perform on that value.
- Many methods take parameters that control the output.

In [9]:
s1 = "Be quiet"
s2 = "this is a library!"

In [10]:
s1 + s2 # the same + operator that added numbers earlier concatenates strings

'Be quietthis is a library!'

In [14]:
reprimand = s1 + ', ' + s2

In [15]:
str(reprimand)

'Be quiet, this is a library!'

In [28]:
# Uppercasing is a method in Python
reprimand.upper()

'BE QUIET, THIS IS A LIBRARY!'

In [29]:
# Also lowercase
reprimand.lower()

'be quiet, this is a library!'

In [30]:
# There are plenty of string methods. Let's try out Jupyter's autocomplete
# feature to see what we can do! --> pressing Tab lists various options to explore!
#reprimand.

In [31]:
# Let's have some fun with .replace()! --> press shift+tab to reveal documentation to help with method call
reprimand.replace("quiet", "loud").replace("library", "party").upper()

'BE LOUD, THIS IS A PARTY!'

In [32]:
# Also: An extremely useful method is .split()
reprimand.split(' ')

['Be', 'quiet,', 'this', 'is', 'a', 'library!']
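
One detail worth seeing in code: string methods never change the original string, they return a new one. The sketch below uses its own variable, `shout`, so that it runs on its own.

```python
shout = "Be quiet, this is a library!"

loud = shout.upper()     # returns a new string
print(loud)              # BE QUIET, THIS IS A LIBRARY!
print(shout)             # Be quiet, this is a library!  (unchanged)

# To keep a change, assign the result back to a name.
shout = shout.replace("library", "party")
print(shout)             # Be quiet, this is a party!

# split() with no argument splits on any run of whitespace;
# split(' ') splits on every single space.
print("a  b   c".split())      # ['a', 'b', 'c']
print("a  b   c".split(' '))   # ['a', '', 'b', '', '', 'c']
```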

## Slicin' Strings
We may also want to pick apart our strings. We can do this by **indexing** or **slicing**. In fact, you can index or slice several different sequence types in Python. For example:

- Strings
- Lists
- Tuples

---

All of the above types can be accessed using [] brackets in the following ways:

- **`s[0]`** References the first element. Python indices start at 0!
- **`s[0:4]`** References the first **4** elements, from index **`0`** up to (but not including) index **`4`**.
- **`s[-1]`** References the _first_ item in reverse order (or the last item).
- **`s[-2]`** References the _second_ item in reverse order (second to last item).
- **`s[0:-3]`** References everything _except the last 3_ elements.


In [16]:
s = "Python programming is really fun"

In [34]:
# getting length of string
len(s)

32

##### Whiteboard idea: use the word Python to draw out positive and negative indexing, to set context for the tasks that follow
- General syntax: `[start:stop:step]`. The slice runs up to index `stop - 1`; `stop` itself is excluded.

In [35]:
# First letter
s[0]

'P'

In [36]:
# Second letter
s[1]

'y'

In [37]:
# Second through fourth letter
s[1:4]

'yth'

In [18]:
# First 5 letters
s[:5]

'Pytho'

In [39]:
# Last letter
s[-1]

'n'

In [40]:
# Last 5 letters
s[-5:]

'y fun'

In [41]:
# THREAD: Get me the word "programming" from the string s.
# I want it two ways: Using slicing and using .split()

In [42]:
s[7:18]

'programming'

In [43]:
s.split(' ')[1]

'programming'
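
The whiteboard picture of positive and negative indices can also be written down as code. This sketch uses the word "Python" and adds the `step` part of the slice syntax.

```python
word = "Python"

# Positive indices:  P=0   y=1   t=2   h=3   o=4   n=5
# Negative indices:  P=-6  y=-5  t=-4  h=-3  o=-2  n=-1
print(word[2], word[-4])   # t t  (same letter, two ways to reach it)

# The stop index is exclusive: this gives indices 1, 2 and 3.
print(word[1:4])           # yth

# The third number is the step.
print(word[::2])           # Pto  (every other letter)
print(word[::-1])          # nohtyP  (reversed)
print(word[-3:])           # hon  (last three letters)

# A slice that runs past the end does not raise an error.
print(word[2:100])         # thon
```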

## Collection Types!

![](imgs/skittles.jpg)

We often want to store many values in one variable. A _collection_. There are several collection types in Python. The first and most common is...

### Lists
Lists are mutable, heterogeneous collections.

- **Mutable** = They can be changed
- **Heterogeneous** = They can hold values of different data types

In [44]:
names = ['Albert', 'Brenda', 'Carlos', 'Daenerys', 'Elon', 'Farnsworth'] # try replacing some strings to numbers, still works!
type(names)

list

In [45]:
# Reference 1st item
names[0]

'Albert'

In [46]:
# Reference last item
names[-1]

'Farnsworth'

In [47]:
# Every other name, starting with the third
names[2::2] # [start:stop:step]; stop is omitted here, so the slice runs to the end

['Carlos', 'Elon']

In [48]:
# Backwards!
names[::-1]

['Farnsworth', 'Elon', 'Daenerys', 'Carlos', 'Brenda', 'Albert']

### List Operations

In [49]:
# Append --> an alternative is the + operator. The difference: append changes the list 'in-place',
# whereas + creates a new list that must be assigned back to the variable to take effect
names.append('Gary')

In [50]:
names

['Albert', 'Brenda', 'Carlos', 'Daenerys', 'Elon', 'Farnsworth', 'Gary']

In [51]:
# Remove
names.remove('Daenerys')

In [52]:
names

['Albert', 'Brenda', 'Carlos', 'Elon', 'Farnsworth', 'Gary']

In [53]:
# Join: a string method that glues the list items together, using the string as the separator
'_'.join(names)

'Albert_Brenda_Carlos_Elon_Farnsworth_Gary'
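
The difference between `.append()` and `+` is easiest to see side by side. A small sketch with an illustrative list called `letters`:

```python
letters = ['a', 'b']

# + builds a new list; the original is untouched until reassigned.
combined = letters + ['c']
print(letters)     # ['a', 'b']
print(combined)    # ['a', 'b', 'c']

# append() changes the list in place and returns None.
result = letters.append('c')
print(letters)     # ['a', 'b', 'c']
print(result)      # None

# join() is a string method: the separator comes first,
# and the list is passed in as the argument.
print(', '.join(letters))   # a, b, c
```

Because `append()` returns `None`, writing `letters = letters.append('c')` is a common mistake that replaces the list with `None`.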

### Tuples
Tuples are less used than lists, but very similar. They are immutable and heterogeneous

- **Immutable** = Once made, they can never be changed in place. (No .append() type functions)
- **Heterogeneous** = They can contain values of different types

For our purposes, you can just think of tuples as immutable lists. They are typically used to hold short, fixed sequences of values, such as a pair of coordinates or several values returned together from a function.

In [21]:
family = ('Ken', 'Tina', 'Jeremy')

In [55]:
# Can slice and index like normal
family[0]

'Ken'

In [23]:
# Bzzzt! Illegal. Tuples are immutable.
# family.append('Chloe')

### Slight aside: Tuple unpacking
Tuples can be "unpacked". So can lists, but this is most common with tuples. This means that you can assign a tuple's elements to separate variables by listing the variable names separated by commas, like this:

In [57]:
instructor = ("Tim", "Book")
first, last = instructor

In [58]:
first

'Tim'

In [59]:
last

'Book'

We'll see tuple unpacking a few times throughout the course.
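
Two places where unpacking tends to show up are swapping values and receiving several results from one function call. A sketch with illustrative names; `divmod` is a Python built-in that returns a `(quotient, remainder)` tuple.

```python
point = (3, 4)
x_coord, y_coord = point
print(x_coord, y_coord)      # 3 4

# Swapping two values: the right-hand side is packed into a tuple,
# then unpacked into the names on the left.
x_coord, y_coord = y_coord, x_coord
print(x_coord, y_coord)      # 4 3

# Unpacking a tuple returned by a function.
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2
```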

## Sets
We'll see sets pretty much never, but they're worth mentioning very briefly. They're **unordered, unique collections**. Just like traditional sets in a math class. Sets are pretty much only used when you need the unique elements from a list and you don’t care about the order in which the elements appear.
- One implication of being unordered --> you cannot use indexing

In [24]:
my_grades = {'A', 'B+', 'A', 'C+', 'B-', 'B+'}
my_grades

{'A', 'B+', 'B-', 'C+'}

In [61]:
my_grades.add('A-')
my_grades

{'A', 'A-', 'B+', 'B-', 'C+'}

In [62]:
my_grades.remove('A')
my_grades

{'A-', 'B+', 'B-', 'C+'}

In [25]:
'Z' in my_grades

False

In [64]:
your_grades = {'B+', 'B-', 'F-'}

In [65]:
my_grades.intersection(your_grades)

{'B+', 'B-'}

In [66]:
my_grades.union(your_grades) # .update() is the in-place alternative: it modifies my_grades itself and returns None

{'A-', 'B+', 'B-', 'C+', 'F-'}
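
The typical use mentioned above, getting the unique elements of a list, looks like this. The sketch also contrasts `.union()` with `.update()`. The names are illustrative, and `sorted()` is used only to print the elements in a predictable order.

```python
grades_list = ['A', 'B+', 'A', 'C+', 'B-', 'B+']

# Converting a list to a set drops the duplicates.
unique_grades = set(grades_list)
print(len(grades_list), len(unique_grades))   # 6 4
print(sorted(unique_grades))                  # ['A', 'B+', 'B-', 'C+']

mine = {'A-', 'B+'}
yours = {'B+', 'F-'}

# union() returns a new set and leaves both originals alone.
combined = mine.union(yours)
print(sorted(combined))        # ['A-', 'B+', 'F-']
print(sorted(mine))            # ['A-', 'B+']

# update() changes the set in place and returns None.
returned = mine.update(yours)
print(returned)                # None
print(sorted(mine))            # ['A-', 'B+', 'F-']
```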

## Dictionaries!

![](imgs/phonebook.jpeg)

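Dictionaries are very common. They're **mutable collections of key-value pairs**, looked up by key rather than by position. (Since Python 3.7 they remember insertion order, but they still cannot be indexed by position.) Think of them like an actual dictionary. The key is the "word" and the value is the "definition".
- Defined within `{}`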

In [26]:
music = {'doe': 'A deer, a female deer', 'ray': 'A drop of golden sun'}

In [27]:
# Indexing --> happens by calling dict_variable[key], not by passing positions like 0, 1 and so on, because dictionaries are looked up by key
music['doe']

'A deer, a female deer'

In [28]:
# Bzzt! Remember, dictionaries are looked up by key, not position. There is no key 0 here, so this raises a KeyError
# music[0]

In [29]:
music['me'] = 'A name I call myself'
music

{'doe': 'A deer, a female deer',
 'ray': 'A drop of golden sun',
 'me': 'A name I call myself'}

In [30]:
# This is how you can delete a key. But keep in mind, if you need to do this, you're
# better off with a different data type. Perhaps a custom class.
# (We'll learn about classes and OOP in a few weeks).
del music['doe']

In [31]:
music

{'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

In [32]:
# What happens when we attempt to access a missing entry? (Uncommenting the line below raises the KeyError shown.)
# music['doe']

KeyError: 'doe'

In [33]:
# You often want to have a "default" value for keys that don't exist.
# We can do this with the .get() method.
# Fun fact: some people prefer to access dictionary keys ONLY with .get(),
# since it never raises a KeyError.
music.get('doe', 'MISSING SOLFEGE!')

'MISSING SOLFEGE!'
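
A few related ways to inspect a dictionary, sketched on a copy of `music` as it stands at this point in the lesson:

```python
music = {'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

# `in` checks whether a key exists.
print('doe' in music)           # False

# Without a default, get() returns None for a missing key.
print(music.get('doe'))         # None
print(music.get('doe', 'n/a'))  # n/a

# get() only reads; it never adds the missing key.
print(len(music))               # 2

# keys() and values() come back in insertion order (Python 3.7+).
print(list(music.keys()))       # ['ray', 'me']
print(list(music.values()))     # ['A drop of golden sun', 'A name I call myself']
```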

## Dictionaries are a big deal!

Dictionaries can get really big and really complicated, like the one below. You might think this is excessive, but it's very common. This is a very efficient way to store complicated data that don't fit neatly in a spreadsheet. In fact, most web APIs return JSON, which maps directly onto Python dictionaries and lists! We'll need to parse big dictionaries to get data from the internet!

In [34]:
authors = {
    "J.R.R. Tolkien": {
        "genre": "fantasy",
        "books": [
            "The Fellowship of the Ring",
            "The Two Towers",
            "The Return of the King"
        ],
        "active": False
    },
    "Brandon Sanderson": {
        "genre": "fantasy",
        "books": [
            "The Way of Kings",
            "Words of Radiance",
            "Oathbringer"
        ],
        "active": True,
        "phone": {
            "home": "(281) 330-8004",
            "work": "(877) CASH-NOW"
        }
    },
    "Frank Herbert": {
        "genre": "science fiction",
        "books": ["Dune"],
        "phone": None,
        "active": False
    }
}

In [76]:
# THREAD: Get me Tolkien's second book

# method 1 
print(authors['J.R.R. Tolkien']['books'][1])

# method 2
print(authors.get('J.R.R. Tolkien').get('books')[1])

The Two Towers
The Two Towers


In [38]:
authors['J.R.R. Tolkien']

{'genre': 'fantasy',
 'books': ['The Fellowship of the Ring',
  'The Two Towers',
  'The Return of the King'],
 'active': False}

In [42]:
authors['J.R.R. Tolkien']['genre']

'fantasy'

In [77]:
# THREAD: I need to call Brandon Sanderson about an idea for a screenplay.
# Can you get me his work number?
authors['Brandon Sanderson']['phone']['work']

'(877) CASH-NOW'
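
Real nested data is often incomplete: in `authors`, Frank Herbert's `"phone"` is `None` and Tolkien has no `"phone"` key at all, so the same chained lookup would fail for them. One possible way to handle this, sketched on a trimmed copy of the dictionary (the fallback text is made up):

```python
# Trimmed copy of the `authors` dictionary, keeping only what this example needs.
authors = {
    "Brandon Sanderson": {"phone": {"home": "(281) 330-8004", "work": "(877) CASH-NOW"}},
    "Frank Herbert": {"phone": None},
    "J.R.R. Tolkien": {"genre": "fantasy"},   # no "phone" key at all
}

# .get("phone") returns None for Herbert (stored None) and for Tolkien (missing key).
# `or {}` swaps that None for an empty dict so the next .get() is safe.
herbert_phone = authors["Frank Herbert"].get("phone") or {}
tolkien_phone = authors["J.R.R. Tolkien"].get("phone") or {}
sanderson_phone = authors["Brandon Sanderson"].get("phone") or {}

print(herbert_phone.get("work", "no number on file"))     # no number on file
print(tolkien_phone.get("work", "no number on file"))     # no number on file
print(sanderson_phone.get("work", "no number on file"))   # (877) CASH-NOW
```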

## Booleans

![](imgs/boole.jpg)

Booleans are values that can only be `True` or `False`. They're named after **George Boole**, the mathematician who developed this algebra of logic, and will come in really handy when we discuss control flow this afternoon.

Booleans really only have three operations you can perform on them: `not` (gives the opposite), `and` (all conditions must be met), and `or` (at least one condition must be met).

NOT Table

|  Input  | Output |
|:-------:|:------:|
|   True  |  False |
|  False  |   True |

AND Table

| Input 1 | Input 2 | Output |
|:-------:|:-------:|:------:|
|   True  |   True  |  True  |
|   True  |  False  |  False |
|  False  |   True  |  False |
|  False  |  False  |  False |

OR Table

| Input 1 | Input 2 | Output |
|:-------:|:-------:|:------:|
|   True  |   True  |  True  |
|   True  |  False  |  True  |
|  False  |   True  |  True  |
|  False  |  False  | False  |

In [78]:
# not: Simply gives the opposite
not True

False

In [79]:
not False

True

In [80]:
# and: output only yields True if both input 1 and input 2 are true
sky_blue = True
grass_green = True
pigs_fly = False

In [81]:
sky_blue and pigs_fly

False

In [82]:
sky_blue and grass_green

True

In [83]:
# or: output only yields false if both input 1 and input 2 are false
matt_cool = False

In [84]:
sky_blue or pigs_fly

True

In [85]:
pigs_fly or matt_cool

False

## Cool story, Boole
So what? We rarely actually define variables to be `True` or `False`. More often, we get them from asking Python math problems.

In [86]:
# Greater than
5 > 3

True

In [87]:
# Less than
5 < 3

False

In [88]:
# Greater than or equal to
3 >= 3

True

In [89]:
# THREAD: Fun stuff --> recall BODMAS!
(3 > 2) and ((5 <= 5) or (10 < 3))

True

In [90]:
# Not equals to
3 != 4

True

In [91]:
# Equals to
5 == 4

False
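
Comparisons and the boolean operators are usually combined, for example to check whether a number falls inside a range. A sketch with a made-up `temperature` value:

```python
temperature = 72

# Each comparison gives a boolean; `and` combines them.
is_comfortable = (temperature >= 65) and (temperature <= 80)
print(is_comfortable)            # True

# `or` needs only one side to be true.
is_extreme = (temperature < 32) or (temperature > 100)
print(is_extreme)                # False

# `not` flips the result of a comparison.
print(not temperature > 80)      # True

# == works on strings too, and it is case-sensitive.
print("apple" == "Apple")        # False
```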

## Food for thought
- Why does `0.2 + 0.1 == 0.3` yield the answer it does?
- Why does `True == 1` yield the answer it does?
- Why does `"3" + 3` yield an error?
- What happens when you add two lists?
- What happens when you multiply a list (or a string!) by an integer? Why does this happen?
    - e.g. `"*" * 20` or `[1, 2, 3, 4] * 2`

#1: Floating-point numbers typically behave this way. Most decimal fractions, such as 0.1, cannot be represented exactly in binary, so tiny rounding errors creep in. Rounding with the round() function before comparing takes care of this and returns the expected result, True.

In [92]:
0.2 + 0.1

0.30000000000000004

In [93]:
0.2 + 0.1 == 0.3

False

In [94]:
round((0.2 + 0.1),1) == 0.3

True
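
Besides `round()`, the standard library offers `math.isclose()`, which compares two floats within a tolerance. A brief sketch (the `1e-9` threshold in the last line is an arbitrary choice for illustration):

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                # False: exact comparison fails
print(math.isclose(total, 0.3))    # True: equal within a small tolerance

# The same idea written by hand: is the difference tiny?
print(abs(total - 0.3) < 1e-9)     # True
```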

#2: `bool` is a subtype of `int`, so True equals 1 and False equals 0

In [95]:
int(True)

1
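
A practical consequence is that booleans can be added up, which gives a quick way to count how many are `True`. A sketch with a made-up list of answers:

```python
print(True == 1)               # True
print(False == 0)              # True
print(isinstance(True, int))   # True: bool is a subtype of int
print(True + True)             # 2

# Summing booleans counts the True values.
answers = [True, False, True, True]
print(sum(answers))            # 3
```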

#3: Python will not add a string and an integer. Make the types match first: '3' + '3' concatenates, 3 + 3 adds

In [96]:
'3' + 3

TypeError: can only concatenate str (not "int") to str
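
The fix is to convert one side so both have the same type. Which side to convert depends on the result you want:

```python
# Convert the string to a number to do arithmetic.
print(int("3") + 3)     # 6

# Convert the number to a string to concatenate.
print("3" + str(3))     # 33
```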

#4: What happens when you add two lists? --> creates one combined list
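
A short sketch of list addition, with illustrative names. Note that the original lists are left unchanged:

```python
first_half = [1, 2, 3]
second_half = [4, 5]

# + concatenates the two lists into a new list.
whole = first_half + second_half
print(whole)         # [1, 2, 3, 4, 5]
print(first_half)    # [1, 2, 3]
```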

#5: Multiplying a list or a string by an integer repeats it that many times

In [45]:
[1,2,3,4] * 2

[1, 2, 3, 4, 1, 2, 3, 4]
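
The same repetition works on strings, which is handy for drawing separators or building a list of starting values. A few illustrative cases:

```python
print("*" * 20)     # ********************
print("ab" * 3)     # ababab
print([0] * 5)      # [0, 0, 0, 0, 0]

# Multiplying by zero gives an empty string or list.
print(len("ab" * 0), len([1, 2] * 0))   # 0 0
```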

## Practice Problems
1. The sum of two numbers is 104 and their difference is 32. What is the value of the larger number?

In [98]:
# a + b = 104
# a - b = 32
# Add these together: 2a = (104 + 32)
# Divide by 2 to get a
a = (104 + 32) / 2
b = 104 - a
print('a =', a, ', b =', b)

a = 68.0 , b = 36.0


2. You are given a list of numbers: `numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]`. Print out the positive even numbers in the list in reverse order.

In [99]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
# method 1 --> all-in-one solution
numbers[-1::-2]

[20, 18, 16, 14, 12, 10, 8, 6, 4, 2]

In [100]:
# method 2 --> break down into smaller steps
even_numbers = numbers[1::2]
even_numbers[::-1]

[20, 18, 16, 14, 12, 10, 8, 6, 4, 2]

In [101]:
# method 3 --> same as #2, but with the reverse() method, which reverses the list in place
even_numbers.reverse()
even_numbers

[20, 18, 16, 14, 12, 10, 8, 6, 4, 2]
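
The slicing solutions rely on the list being the consecutive numbers 1 to 20, so that every second item is even. For a list without that pattern, one possible approach is to test each number with `%`. This sketch uses a list comprehension, which has not been covered yet, and a made-up list:

```python
numbers = [3, 8, -4, 10, 7, 2]   # hypothetical list with no regular pattern

# Walk the list backwards and keep numbers that are positive and even
# (a number is even when dividing by 2 leaves a remainder of 0).
evens_reversed = [n for n in reversed(numbers) if n > 0 and n % 2 == 0]

print(evens_reversed)   # [2, 10, 8]
```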

3. Given the following dictionary `my_dict`, construct a new dictionary called `names` where the keys are the names of the people and the values are cities those people live in.

```python
my_dict = {0: {'name': 'Noelle',
             'city': 'DEN',
             'state': 'CO'},
          1: {'name': 'Dan',
             'city': 'LA',
             'state': 'CA'},
          2: {'name': 'Riley',
             'city': 'AUS',
             'state': 'TX'}
        }
```

In [46]:
my_dict = {0: {'name': 'Noelle',
             'city': 'DEN',
             'state': 'CO'},
          1: {'name': 'Dan',
             'city': 'LA',
             'state': 'CA'},
          2: {'name': 'Riley',
             'city': 'AUS',
             'state': 'TX'}
        }

In [103]:
# obtaining the new dictionary's keys: note - print is a MUST when we want multiple outputs,
# else only the last line is displayed
print(my_dict[0]['name'])
print(my_dict[1]['name'])
print(my_dict[2]['name'])

Noelle
Dan
Riley


In [104]:
# obtaining the new dictionary's values
print(my_dict[0]['city'])
print(my_dict[1]['city'])
print(my_dict[2]['city'])

DEN
LA
AUS


In [105]:
names = {}
names[my_dict[0]['name']] = my_dict[0]['city']
names[my_dict[1]['name']] = my_dict[1]['city']
names[my_dict[2]['name']] = my_dict[2]['city']

names

{'Noelle': 'DEN', 'Dan': 'LA', 'Riley': 'AUS'}
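
Writing one line per person works for three entries but does not scale. Once loops are covered, the same result can be built as sketched below. This is one possible approach, not the only one; `names_again` is an illustrative name.

```python
my_dict = {0: {'name': 'Noelle', 'city': 'DEN', 'state': 'CO'},
           1: {'name': 'Dan', 'city': 'LA', 'state': 'CA'},
           2: {'name': 'Riley', 'city': 'AUS', 'state': 'TX'}}

# Loop over the inner dictionaries and add one name -> city pair each time.
names = {}
for person in my_dict.values():
    names[person['name']] = person['city']

print(names)   # {'Noelle': 'DEN', 'Dan': 'LA', 'Riley': 'AUS'}

# The same thing as a one-line dictionary comprehension.
names_again = {person['name']: person['city'] for person in my_dict.values()}
print(names_again == names)   # True
```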

## Today we covered...
- Basic Jupyter Notebook use
- Basic math in Python
- String manipulation in Python
- Collection data types in Python
- Booleans in Python