<img src="https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png" style="float: left; margin: 10px;"> 
#  Intro to Python: Data Types

_Authors: Tim Book, Noelle Brown_

---

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Use basic Jupyter Notebook features
- Define integers, strings, tuples, lists, and dictionaries
- Demonstrate arithmetic operations and string operations

## First and Foremost: Python is a Calculator
_(...just like every other programming language)_

Let's learn some common mathematical operations:

In [14]:
2 + 2.5 # Addition

4.5

In [2]:
10 - 4.5 # Subtraction (note we can have negative numbers!)

5.5

In [3]:
5 * 10.25 # Multiplication

51.25

In [4]:
40 / 10 # Division

4.0

In [5]:
2**256 # Exponentiation (do NOT use ^) ---> Different from Excel where we use ^ for exponentiation

115792089237316195423570985008687907853269984665640564039457584007913129639936

In [6]:
4 % 2 # Modular division ("mod" for short) --> Finding the remainder of the division operation
## Very useful in loops when we need to determine whether numbers are odd or even

0
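As a quick illustration of the odd/even use mentioned above, here is a minimal sketch. The variable names (`remainder_even`, `is_even`, and so on) are made up for this example.

```python
# The remainder after dividing by 2 tells us whether a whole number is even or odd
remainder_even = 10 % 2   # 0 -> 10 is even
remainder_odd = 7 % 2     # 1 -> 7 is odd

# Comparing the remainder to 0 gives a True/False answer
is_even = (10 % 2 == 0)
is_odd = (7 % 2 == 1)

print(remainder_even, remainder_odd)  # 0 1
print(is_even, is_odd)                # True True
```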

In [7]:
5 // 2 
# Floor division (ie "round down" division) --> Rounded down from division operation's quotient

2
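Floor division and mod fit together: the quotient times the divisor, plus the remainder, gives back the original number. The sketch below (variable names are illustrative only) also shows that "round down" really means rounding toward negative infinity, which matters for negative numbers.

```python
quotient = 17 // 5     # 3
remainder = 17 % 5     # 2

# Putting the two pieces back together recovers the original number
rebuilt = quotient * 5 + remainder   # 17

# With a negative number, floor division rounds DOWN, not toward zero
negative_floor = -7 // 2             # -4, not -3

print(quotient, remainder, rebuilt)  # 3 2 17
print(negative_floor)                # -4
```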

In [None]:
# Brackets > Orders > Division / Multiplication > Addition / Subtraction
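To see that order of operations in action, here is a small sketch. The variable names are invented for illustration; the arithmetic itself is standard Python behaviour.

```python
# Exponent first (4**2 = 16), then multiplication (3*16 = 48), then addition
no_brackets = 2 + 3 * 4 ** 2        # 50

# Brackets are evaluated first, so the sum happens before the multiplication
with_brackets = (2 + 3) * 4 ** 2    # 80

# A common surprise: the exponent is applied before the minus sign
negative_square = -3 ** 2           # -9
bracketed_square = (-3) ** 2        # 9

print(no_brackets, with_brackets)          # 50 80
print(negative_square, bracketed_square)   # -9 9
```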

## Variables
Great - Python is just a fancy calculator. It's also important for us to be able to save numbers as **variables** so we can reference them later without memorizing their value.

In [23]:
# Mathematics
## "=" means assignment; "==" means equality check
x = 3
y = 4
z = 2

ans = (x+y) / 2

# Text
message = "Hello, I am really excited to be embarking on my Data Science journey"

In [28]:
print(f'The answer: {ans}')
print(message)

The answer: 3.5
Hello, I am really excited to be embarking on my Data Science journey


## Naming Rules

> _There are only two hard things in Computer Science: cache invalidation and naming things._ - Phil Karlton

You can _pretty much_ name variables whatever you want. But, there are a few rules we should follow. Some are strict, some are just good manners.

### Variable naming rules (mandatory)
- Names can only consist of numbers, letters and underscores.
- Names can't begin with numbers.
- You can't name a variable after a built-in Python keyword (eg `if`).

### Variable naming rules (good manners)
- Names should _**always**_ be descriptive (ie, don't name variables `x` and `df`)
- No capital letters!
- Variables should not begin with an underscore (this means something special)
- Multi-word variables should be in `snake_case`. All lower case separated by underscores.
- Technically, you _can_ name variables after built-in Python _functions_ (like `print`), but it's an _extremely_ bad idea to do so.
    - Rule of thumb: If a variable name turns green, don't use it!
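If you are unsure whether a name is allowed, Python can tell you. The sketch below uses the standard-library `keyword` module and the string method `.isidentifier()`; the candidate names are just examples.

```python
import keyword

# Keywords such as `if` are reserved and can't be used as variable names
if_is_keyword = keyword.iskeyword("if")        # True

# `print` is a built-in function, not a keyword, so Python allows it (still a bad idea!)
print_is_keyword = keyword.iskeyword("print")  # False

# .isidentifier() checks the "letters, numbers, underscores, no leading number" rules
good_name = "total_sales".isidentifier()       # True
bad_name = "2nd_place".isidentifier()          # False

print(if_is_keyword, print_is_keyword)  # True False
print(good_name, bad_name)              # True False
```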
    
### Math exercise (sorry):
Recall the quadratic formula for solving a quadratic equation $ax^2 + bx + c = 0$ with coefficients $a$, $b$, $c$:

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

In [18]:
a = 1
b = -8
c = 15

#Variables need to be stored first

In [20]:
x1 = (-b + (b**2 - 4*a*c)**(0.5))/(2*a)
x2 = (-b - (b**2 - 4*a*c)**(0.5))/(2*a)
print(x1, x2)

5.0 3.0


## So, what is a "data type"?
When you hear the word "data", you probably think of a spreadsheet. Actually, **data is a synonym for information!** Anything that represents "information" is data. Including any and all Python variables. If I run `x = 3`, then `x` is data!

Data can come in various **types.** We've already seen two types!

1. The `int` type: Integers with no decimal part (eg `2`, `-30`, `14`)
1. The `float` type: Numbers with a decimal part, even if that part is zero (eg `2.5`, `3.141`, `2**0.5`, `-3.0`)

Curious about what an object's data type is? Simply use the `type()` function to ask!

```python
type(3) # int
type(4.2) # float
```

In [30]:
type(3)

int

In [31]:
type(4.2)

float
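The type of a result depends on the operation, not just on how the number looks. A short sketch (variable names are illustrative) using the operators from earlier in this lesson:

```python
whole = 40 // 10    # floor division of two ints gives an int
divided = 40 / 10   # regular division always gives a float, even when it divides evenly
mixed = 2 + 2.5     # mixing an int and a float gives a float

print(type(whole), type(divided), type(mixed))
# <class 'int'> <class 'float'> <class 'float'>

# isinstance() asks a yes/no question about a value's type
divided_is_float = isinstance(divided, float)
print(divided_is_float)  # True
```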

## Strings

---

Strings are how we store text data in Python. Strings are _strings of characters_ between either double quotes (`"`) or single quotes (`'`). Python doesn't care which as long as they match.


In [36]:
"The pen is mightier than the sword!"

'The pen is mightier than the sword!'

In [37]:
'Single quotes work just fine too.'

'Single quotes work just fine too.'

In [38]:
# Multi-line string --> Notice the code colour change: if the 3 quotes are replaced with 1 or 2, Python no longer treats both lines as one string
multi_line_string = '''When you walk through the storm,
Hold your head up high'''

In [39]:
print(multi_line_string)

When you walk through the storm,
Hold your head up high


In [None]:
# Escape characters --> Backslashes allow you to have quotes inside your quotes
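The cell above was left empty, so here is a minimal sketch of what escape characters look like. The example sentences are made up.

```python
# A backslash before a quote tells Python "this quote is part of the text"
quote = "She said \"hello\" and left."
apostrophe = 'It\'s a nice day.'

# Often simpler: wrap the string in the OTHER kind of quote
no_escape_needed = "It's a nice day."

# \n is another escape character: it starts a new line
two_lines = "Line one\nLine two"

print(quote)
print(apostrophe)
print(two_lines)
```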

The **print** function displays the value of a variable (say, `x`) on the screen.

**print** removes the quotation marks, whereas just running the Jupyter cell with `x` on the last line leaves the quotation marks in.

You can use 'single' or "double" quotation marks to create a string variable.
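You can see the difference outside of Jupyter too. As a rough sketch: the built-in `repr()` produces the quoted form that Jupyter shows for the last line of a cell, while `print()` shows the bare text.

```python
x = "Hello"

print(x)         # Hello    (no quotation marks)
print(repr(x))   # 'Hello'  (quotation marks kept, like a bare `x` at the end of a cell)

# repr() returns a string that includes the quotation marks
shown = repr(x)
```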

## String Math!
Besides simply storing text, we can also operate on strings. Everything in Python has a **type**, and types can be operated on with their respective **methods**. Methods are actions we can perform on a type using the following syntax:

```python
variable.method(parameters)
```

In [41]:
s1 = "Be quiet"
s2 = "this is a library!"

In [42]:
s1 + s2

'Be quietthis is a library!'

In [48]:
reprimand = s1 + " " + s2
type(reprimand)

str

In [44]:
str(reprimand)

'Be quiet this is a library!'

In [45]:
# Uppercasing is a method in Python
reprimand.upper()

'BE QUIET THIS IS A LIBRARY!'

In [46]:
# Also lowercase
reprimand.lower()

'be quiet this is a library!'

In [None]:
# There are plenty of string methods. Let's try out Jupyter's autocomplete
# feature to see what we can do!

In [60]:
# Let's have some fun with .replace()!
reprimand.replace('quiet', 'loud').replace('library', 'party').upper()

'BE LOUD THIS IS A PARTY!'

In [64]:
# Also: An extremely useful method is .split()
reprimand.split(' ')

['Be', 'quiet', 'this', 'is', 'a', 'library!']

## Slicin' Strings
We may also want to pick apart our strings. We can do this by **indexing** or **slicing**. In fact, you can index or slice several different types in Python. For example:

- Strings
- Lists
- Tuples

---

All of the above types can be accessed using brackets in the following ways:

- **`s[0]`** References the first element.
- **`s[0:4]`** References the first **4** elements, starting from index **`0`** (index `4` itself is excluded).
- **`s[-1]`** References the _first_ item in reverse order (the last item).
- **`s[-2]`** References the _second_ item in reverse order (the second-to-last item).
- **`s[0:-3]`** References everything _except the last 3_ elements.


In [50]:
s = "Python programming is really fun"

In [51]:
len(s)

32

In [53]:
# First letter
s[0]

'P'

In [55]:
# Second letter
s[1]

'y'

In [56]:
# Second through fourth letter
s[1:4]

'yth'

In [59]:
# First 5 letters
s[0:5]

'Pytho'

In [65]:
# Last letter
s[-1]

'n'

In [82]:
# Last 5 letters
s[-5:]

'y fun'

In [86]:
#Retrieve the word "programming" from the string s
s.split()[1]

'programming'

## Collection Types!

![](imgs/skittles.jpg)

We often want to store many values in one variable. A _collection_. There are several collection types in Python. The first and most common is...

### Lists
Lists are mutable, heterogeneous collections.

- **Mutable** = They can be changed
- **Heterogeneous** = They can hold values of different data types

In [88]:
names = ['Adi', 'Boom', 'Charlie', 'Daenerys', 'Elon', 'Francine']
type(names)

list

In [89]:
# Reference 1st item
names[0]

'Adi'

In [90]:
# Reference 2nd item
names[1]

'Boom'

In [92]:
# Every other name, starting with the third --> start(from):end(to):step(by)
names[2::2]

['Charlie', 'Elon']

In [78]:
# Backwards!
names[::-1]

['Francine', 'Elon', 'Daenerys', 'Charlie', 'Boom', 'Adi']

### List Operations

In [79]:
# Append
names.append('Gary')

In [80]:
names

['Adi', 'Boom', 'Charlie', 'Daenerys', 'Elon', 'Francine', 'Gary']

In [None]:
# Remove

In [81]:
names.remove('Elon')

In [93]:
# Join: glue the list items into one string, separated by "_"
"_".join(names)

'Adi_Boom_Charlie_Daenerys_Francine_Gary'

### Tuples
Tuples are used less often than lists, but are very similar. They are immutable and heterogeneous.

- **Immutable** = Once made, they can never be changed.
- **Heterogeneous** = They can contain values of different types

For our purposes, you can just think of tuples as immutable lists. Their existence is partly legacy from a time when they were more useful. Traditionally they're only used to hold short sequences of variables.

In [95]:
family = ('Ken', 'Tina', 'Jeremy')

In [96]:
# Can slice and index like normal
family[0]

'Ken'

In [110]:
# Bzzzt! Illegal. Tuples are immutable --> Usually used for values that are fixed.
# family.append('Chloe')
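Here is a hedged sketch of what "immutable" means in practice. It uses `try`/`except` (not covered yet in this lesson) only so the example runs without stopping on the error; the `outcome` and `bigger_family` names are invented.

```python
family = ('Ken', 'Tina', 'Jeremy')

# Trying to change an element of a tuple raises a TypeError
try:
    family[0] = 'Chloe'
    outcome = "changed"
except TypeError as err:
    outcome = "TypeError"
    print(err)

# We can't modify the tuple, but we CAN build a brand-new one from it
bigger_family = family + ('Chloe',)

print(family)         # ('Ken', 'Tina', 'Jeremy')  (unchanged)
print(bigger_family)  # ('Ken', 'Tina', 'Jeremy', 'Chloe')
```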

### Slight aside: Tuple unpacking
Tuples can be "unpacked". So can lists, but this is most common with tuples. This means that you can assign a tuple's elements to variables if you separate the variable names with commas, like this:

In [98]:
instructor = ("Noelle", "Brown")
first, last = instructor  # Unpacking both elements

In [99]:
first

'Noelle'

In [100]:
last

'Brown'

We'll see tuple unpacking a few times throughout the course.
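Two places unpacking tends to show up, sketched with made-up variable names: functions that return more than one value (the built-in `divmod()` returns a tuple), and swapping two variables.

```python
# divmod() returns a tuple: (quotient, remainder)
quotient, remainder = divmod(17, 5)
print(quotient, remainder)   # 3 2

# Swapping two variables without a temporary variable
first = "Noelle"
last = "Brown"
first, last = last, first
print(first, last)           # Brown Noelle
```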

## Sets
We'll see sets pretty much never, but they're worth mentioning very briefly. They're unordered, unique collections. Just like traditional sets in a math class.

In [101]:
# Sets are seldom used these days; Unable to index because they are unordered
my_grades = {'A', 'B+', 'A', 'C+', 'B-', 'B+', 'F'}
my_grades

{'A', 'B+', 'B-', 'C+', 'F'}

In [102]:
my_grades.add('A-')
my_grades

{'A', 'A-', 'B+', 'B-', 'C+', 'F'}

In [103]:
my_grades.remove('A')
my_grades

{'A-', 'B+', 'B-', 'C+', 'F'}

In [104]:
'B+' in my_grades

True

In [105]:
your_grades = {'B+', 'B-', 'F-'}

In [106]:
my_grades.intersection(your_grades)

{'B+', 'B-'}

In [107]:
my_grades.union(your_grades)

{'A-', 'B+', 'B-', 'C+', 'F', 'F-'}
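One place sets do come in handy is removing duplicates from a list. A minimal sketch (the `grades` list is illustrative):

```python
grades = ['A', 'B+', 'A', 'C+', 'B-', 'B+', 'F']

# Converting to a set drops the duplicates (and the original order)
unique_grades = set(grades)
print(len(grades), len(unique_grades))   # 7 5

# sorted() turns the set back into a list in a predictable order
sorted_unique = sorted(unique_grades)
print(sorted_unique)   # ['A', 'B+', 'B-', 'C+', 'F']
```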

## Dictionaries!

![](imgs/phonebook.jpeg)

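Dictionaries are very common. They're **mutable** collections of key-value pairs, looked up by key rather than by position (since Python 3.7 they do remember the order in which keys were added). Think of them like an actual dictionary. The key is the "word" and the value is the "definition".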

In [115]:
music = {'doe': 'A deer, a female deer', 'ray': 'A drop of golden sun'}

In [116]:
# Indexing
music['doe']

'A deer, a female deer'

In [117]:
# Bzzt! Dictionaries are looked up by key, not by position, so there is no element "0" here (this raises a KeyError)
# music[0]

In [118]:
music['me'] = 'A name I call myself'

In [119]:
# This is how you can delete a key. But keep in mind, if you need to do this, you're
# better off with a different data type. Perhaps a custom class.
# (We'll learn about classes and OOP in a few weeks).

In [120]:
del music['doe']
music

{'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

In [113]:
# What happens when we attempt to access a missing entry?
# music['doe']
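To answer that question without crashing the notebook, here is a sketch that catches the error. It re-creates a small `music` dictionary so it can run on its own, and it uses `try`/`except`, which this lesson has not covered yet.

```python
music = {'ray': 'A drop of golden sun', 'me': 'A name I call myself'}

# Asking for a key that isn't there raises a KeyError
try:
    definition = music['doe']
except KeyError:
    definition = None
    print("KeyError: 'doe' is not in the dictionary")

# The `in` operator lets us check before we look
has_doe = 'doe' in music
has_ray = 'ray' in music
print(has_doe, has_ray)   # False True
```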

In [121]:
# You often want to have a "default" value for keys that don't exist.
# We can do this with the .get() method.
# Fun fact: some people ONLY access dictionary keys with the .get().
# This is starting to gain some traction and is thought to be a pretty good idea.
music.get('doe', 'MISSING ENTRY')

'MISSING ENTRY'

## Dictionaries are a big deal!

Dictionaries can get really big and really complicated, like the one below. You might think this is excessive, but it's very common. This is a very efficient way to store complicated data that don't fit neatly in a spreadsheet. In fact, most web APIs return JSON, which Python reads as nested dictionaries and lists! We'll need to parse big dictionaries to get data from the internet!

In [122]:
authors = {
    "J.R.R. Tolkien": {
        "genre": "fantasy",
        "books": [
            "The Fellowship of the Ring",
            "The Two Towers",
            "The Return of the King"
        ],
        "active": False
    },
    "Brandon Sanderson": {
        "genre": "fantasy",
        "books": [
            "The Way of Kings",
            "Words of Radiance",
            "Oathbringer"
        ],
        "active": True,
        "phone": {
            "home": "(281) 330-8004",
            "work": "(877) CASH-NOW"
        }
    },
    "Frank Herbert": {
        "genre": "science fiction",
        "books": ["Dune"],
        "phone": None,
        "active": False
    }
}

In [123]:
# Get Tolkien's second book
print(authors['J.R.R. Tolkien']['books'][1])

The Two Towers


In [124]:
# Get Brandon's work number please
print(authors['Brandon Sanderson']['phone']['work'])

(877) CASH-NOW
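Not every author has a phone entry, so direct indexing can fail partway through. The sketch below uses a trimmed-down copy of the dictionary (so it runs on its own) and combines a loop with `.get()`. The `work_numbers` name and the `'MISSING ENTRY'` default are illustrative choices.

```python
# A trimmed-down copy of the authors dictionary, for illustration only
authors = {
    "J.R.R. Tolkien": {"books": ["The Fellowship of the Ring", "The Two Towers", "The Return of the King"]},
    "Brandon Sanderson": {
        "books": ["The Way of Kings", "Words of Radiance", "Oathbringer"],
        "phone": {"home": "(281) 330-8004", "work": "(877) CASH-NOW"},
    },
    "Frank Herbert": {"books": ["Dune"], "phone": None},
}

work_numbers = {}
for name, info in authors.items():
    # "phone" may be missing (Tolkien) or None (Herbert); fall back to an empty dict
    phone = info.get("phone") or {}
    work_numbers[name] = phone.get("work", "MISSING ENTRY")

print(work_numbers)
```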


## Booleans

![](imgs/boole.jpg)

Booleans are variables that only have two different values: `True` and `False`. They're named after their founder, **George Boole** and will come in real handy when we discuss control flow this afternoon.

Booleans really only have three operations you can perform on them: `not`, `and`, and `or`.

In [128]:
# not: Simply gives the opposite; Reverses polarity
not True

False

In [130]:
not False

True

In [132]:
# and: A and B only yields True if both A and B are true
sky_blue = True
grass_green = True
pigs_fly = False

In [134]:
sky_blue and pigs_fly

False

In [135]:
sky_blue and grass_green

True

In [136]:
# or: A or B only yields False if both A and B are False
matt_cool = False

In [137]:
sky_blue or pigs_fly

True

In [138]:
pigs_fly or matt_cool

False
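When `not`, `and`, and `or` appear together, Python evaluates `not` first, then `and`, then `or`. A small sketch with throwaway variable names; brackets make the intent explicit and are usually the safer choice.

```python
# `and` is evaluated before `or`: True or (False and False)
a = True or False and False     # True

# Brackets change the grouping
b = (True or False) and False   # False

# `not` is evaluated before `or`: (not True) or True
c = not True or True            # True
d = not (True or True)          # False

print(a, b, c, d)   # True False True False
```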

## Cool story, Boole
So what? We rarely actually define variables to be `True` or `False`. More often, we get them from asking Python math problems.

In [139]:
# Greater than
5 > 3

True

In [141]:
# Less than
5 < 3

False

In [142]:
# Greater than or equal to
3 >= 3

True

In [148]:
# Fun stuff: combining comparisons with and/or
(3 > 2) and ((5 <= 5) or (10 < 3))

True

In [145]:
# Not equals to
2 != 3

True

In [149]:
## Equals to
5 == 4

False

## Food for thought
- Why does `0.2 + 0.1 == 0.3` yield the answer it does?
- Why does `True == 1` yield the answer it does?
- Why does `"3" + 3` yield an error?
- What happens when you add two lists?
- What happens when you multiply a list (or a string!) by an integer? Why does this happen?
    - e.g. `"*" * 20` or `[1, 2, 3, 4] * 2`

In [154]:
0.2 + 0.1 == 0.3
# Floating-point behaviour: 0.2 + 0.1 actually evaluates to 0.30000000000000004

False

In [155]:
# Round before comparing floats to avoid this floating-point surprise
round((0.2 + 0.1), 1) == 0.3

True
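Rounding is one way to compare floats. Another common option, sketched here, is `math.isclose()` from the standard library, which checks whether two numbers are equal within a small tolerance.

```python
import math

total = 0.1 + 0.2
print(total)   # 0.30000000000000004

# An exact comparison fails because of the tiny floating-point error
exact_match = (total == 0.3)             # False

# math.isclose() allows for a small tolerance
close_match = math.isclose(total, 0.3)   # True

print(exact_match, close_match)   # False True
```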

In [156]:
True == 1 #True is mapped to 1 and False is to 0

True
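Because `True` counts as 1 and `False` as 0, you can do arithmetic with booleans. One handy use, sketched with an invented `passed` list, is counting how many values are `True`.

```python
passed = [True, False, True, True]

# sum() adds up the 1s and 0s, so it counts the True values
num_passed = sum(passed)
print(num_passed)   # 3

# In Python, bool is a special kind of int
bool_is_int = isinstance(True, int)
print(bool_is_int)  # True
```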

In [None]:
# "3" + 3 raises a TypeError: Python won't add a str and an int without an explicit conversion
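A sketch of the error and two ways around it, depending on whether you want a number or text. The `try`/`except` is only there so the example keeps running; variable names are illustrative.

```python
# Mixing a string and an integer with + raises a TypeError
try:
    result = "3" + 3
except TypeError:
    result = None
    print("TypeError: can't add a str and an int")

# Convert explicitly, depending on what you want
as_number = int("3") + 3   # 6    (numeric addition)
as_text = "3" + str(3)     # '33' (string concatenation)

print(as_number, as_text)  # 6 33
```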

In [None]:
# Adding two lists creates one combined list

In [157]:
# Multiply list, string by integer
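The last two questions can be checked directly. In this sketch (variable names are made up), `+` glues sequences together and `*` repeats them, which is the same behaviour we saw when "adding" strings earlier.

```python
# Adding two lists creates one combined list
combined = [1, 2] + [3, 4]
print(combined)        # [1, 2, 3, 4]

# Multiplying a list by an integer repeats its contents
repeated_list = [1, 2, 3, 4] * 2
print(repeated_list)   # [1, 2, 3, 4, 1, 2, 3, 4]

# Strings behave the same way
divider = "*" * 20
print(divider)         # ********************
```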

# Practice Problems
1. The sum of two numbers is 104 and their difference is 32. What is the value of the larger number?

In [173]:
#x + y == 104 and x - y == 32
# simplifying 2y == 104 -32
y = (104-32)/2
x = y + 32
print(x, y)

68.0 36.0


2. You are given the list of numbers below. Print out the positive even numbers in the list in reverse order.

In [169]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [185]:
print(numbers[::-2])

# Can also use reverse() method

even_numbers = numbers[1::2]
even_numbers.reverse()
print(even_numbers)

[20, 18, 16, 14, 12, 10, 8, 6, 4, 2]
[20, 18, 16, 14, 12, 10, 8, 6, 4, 2]


3. Given the following dictionary `my_dict`, construct a new dictionary called `names` where the keys are the names of the people and the values are cities those people live in.

```python
my_dict = {0: {'name': 'Noelle',
             'city': 'DEN',
             'state': 'CO'},
          1: {'name': 'Dan',
             'city': 'LA',
             'state': 'CA'},
          2: {'name': 'Riley',
             'city': 'AUS',
             'state': 'TX'}
        }
```

In [196]:
my_dict = {0: {'name': 'Noelle',
             'city': 'DEN',
             'state': 'CO'},
          1: {'name': 'Dan',
             'city': 'LA',
             'state': 'CA'},
          2: {'name': 'Riley',
             'city': 'AUS',
             'state': 'TX'}
        }

In [199]:
names = {}
for dict_key, dict_value in my_dict.items():
    names[dict_value.get('name')] = dict_value.get('city')

print(names)

{'Noelle': 'DEN', 'Dan': 'LA', 'Riley': 'AUS'}


## Today we covered...
- Basic Jupyter Notebook use
- Basic math in Python
- String manipulation in Python
- Collection data types in Python
- Booleans in Python