# Basic Python (And Coding) - Data Types

Let's start in earnest now.

We're assuming that you're completely unfamiliar with python/coding so we'll start at the most basic levels and build from there.

Up first are data types, a key concept for data science!

## Comments vs Code

Like most coding languages, python allows you to write comments to help explain code. When you execute the code, comments are ignored and have no effect on what the computer is told to do. In python, comments are denoted with a `#` symbol.

In [1]:
## This is a comment!
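A comment doesn't need its own line. Everything after a `#` on a line is ignored, so you can also put a short comment after some code. A small sketch (variables and `print()` are covered further down; the name `total` is just for illustration):

```python
## A comment on its own line is ignored entirely
total = 4 + 3   # everything after the # is ignored, so only 4 + 3 runs
print(total)    # prints 7
```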

## Numeric Data Types

Numeric data types are exactly what they sound like: data that are numbers.

### `int`

An `int` is how python represents an integer (like $1$, $2$, $-71$, etc.). If you're familiar with other programming languages, you might be wondering whether there are bounds on how large (or small) an integer can be. In python 3 the only limit is your computer's memory.

Here is a link to the python 3 documentation on `int`s, <a href="https://docs.python.org/3/library/functions.html#int">https://docs.python.org/3/library/functions.html#int</a>.
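As a quick illustration of that claim, the sketch below computes $2^{100}$, which is far too large for a 64-bit integer, and python still returns an exact `int`. (`**` is python's exponent operator; the name `big` is just for illustration.)

```python
## 2 to the 100th power: too big for a 64-bit integer,
## but python's int grows as needed
big = 2**100
print(big)        # 1267650600228229401496703205376
print(type(big))  # <class 'int'>
```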

In [2]:
## This is an int
4

4

In [3]:
## We can check the type of an object in python
## using type()
type(4)

int

In [4]:
## You code
## check the type of two ints added together
type(7+1)


int

In [5]:
## You code 
## check the type of two ints multiplied

type(8*9)


int

In [6]:
## You code
## check the type of one int divided by another
## Is it necessarily an int?
type(9/5)


float

### `float`

A `float` is a floating point number. For non-computer people, this just means a real number stored with limited precision (determined by your computer's hardware). You may have encountered one in the previous code chunk: in python 3, dividing one `int` by another with `/` always gives a `float`, even when the numerator is divisible by the denominator.

Here is a link to the python documentation for `float`s, <a href="https://docs.python.org/3/library/functions.html#float">https://docs.python.org/3/library/functions.html#float</a>.
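The sketch below checks two things: what `/` returns when a division comes out even, and how limited precision can show up in `float` arithmetic.

```python
## / returns a float even when the division comes out even
print(10 / 5)        # 2.0
print(type(10 / 5))  # <class 'float'>

## 0.1 and 0.2 can't be stored exactly in binary,
## so their sum is very close to, but not exactly, 0.3
print(0.1 + 0.2)     # 0.30000000000000004
```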

In [7]:
## This is a float
3.2

3.2

In [8]:
## You code
## Check the type of 4/3
type(4/3)



float

In [10]:
## You code
## what happens if you put 2.3
## in int()?
type(int(2.3))


int
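`int()` doesn't round a `float`; it drops the fractional part, which means it truncates toward zero. If you want the nearest whole number instead, python's built-in `round()` does that:

```python
print(int(2.9))    # 2  (fractional part dropped)
print(int(-2.9))   # -2 (truncates toward zero, not down)
print(round(2.9))  # 3  (nearest whole number)
```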

#### Variables

We won't always want to type numbers in by hand, and sometimes we'll want to store a value for reuse. This is where variables come in.

In [11]:
## you store a value in a variable like so
## here we've named the variable x
x = 4.3

In [12]:
x

4.3
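Once a value is stored in a variable, you can use the variable anywhere you'd use the value, and you can reassign it later. A short sketch (the name `y` is just an example):

```python
x = 4.3
y = x * 2     # use x in a calculation; y is now 8.6
x = 10        # reassign x; the old value 4.3 is replaced
print(x, y)   # 10 8.6  (y keeps the value it was given)
```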

### `bool`

A `bool` is a boolean or logical object, meaning a `True` or a `False`. These are incredibly useful in programming as we'll see in the next notebook.

Here's the python documentation on `bool` objects, <a href="https://docs.python.org/3/library/functions.html#bool">https://docs.python.org/3/library/functions.html#bool</a>.

In [13]:
## This is the boolean value for something that is True
True

True

In [14]:
type(False)

bool

In [15]:
## You code
## what happens when you put False inside of int()?

int(False)


0

In [16]:
## You code
## What happens when you put 1 inside of bool()?

bool(1)



True

In [20]:
## You code
## What happens when you put 0 inside of bool()?
bool(0)




False

In [18]:
## You code
## What about putting 3.14 inside of bool()?

bool(3.14)




True

Note that in python `bool()` of any non-zero `int` or `float` results in a `True`.
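A few more cases of the same rule, plus a related fact you may have spotted from `int(False)`: in python, `True` and `False` behave like the `int`s `1` and `0` in arithmetic, because `bool` is a subtype of `int` (checked here with the built-in `isinstance()`).

```python
print(bool(-2))               # True  (non-zero, even though negative)
print(bool(0.0))              # False (zero as a float)
print(True + True)            # 2     (True acts like 1)
print(isinstance(True, int))  # True
```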

### `str`

An `str` is a python string, aka a piece of text like a word, sentence, or paragraph.

Here is the python documentation on `str`s, <a href="https://docs.python.org/3/library/stdtypes.html#textseq">https://docs.python.org/3/library/stdtypes.html#textseq</a>.

In [23]:
"This is an 'str' object"

"This is an 'str' object"

In [24]:
## THINKING TIME
## What do you think will happen when you run the following code?
"Line 1"
"Line 2"

'Line 2'

In [25]:
## You code
## copy and paste the code chunk from above
## now put "Line 1" and "Line 2" inside of print()
## and rerun the code
## What happens?
print("Line 1")
print("Line 2")



Line 1
Line 2


`print()` takes in a series of python objects (separated by commas) and prints them to the screen. Note that in a `jupyter notebook` only the value of the last line in a code chunk is displayed by default, so if you want something displayed from the middle of a code chunk, use `print()`.
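When you pass `print()` several objects, it puts a single space between them by default. The optional `sep` argument changes that separator, as this sketch shows:

```python
print("Line", 1, "of", 2.0)          # Line 1 of 2.0
print("2021", "06", "15", sep="-")   # 2021-06-15
```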

In [26]:
## Note strs can also be denoted with single-quotation marks
print('This is an str too!')

This is an str too!


In [27]:
## You can "concatenate" multiple strs with a + symbol
print("thing 1" + " and " + "thing 2")

thing 1 and thing 2
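`+` only joins a `str` to another `str`; trying `"thing " + 2` raises a `TypeError`. One way around that is to convert the number with `str()` first. Relatedly, `*` repeats a `str`. A sketch (the names `count` and `label` are just for illustration):

```python
count = 2
label = "thing " + str(count)   # convert the int to a str before joining
print(label)                    # thing 2
print("ha" * 3)                 # hahaha
```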


#### Built-in `str` Methods

Python `str`s have a number of useful built-in methods, functions attached to the `str` that you call with a `.` after it. Let's look at a couple of them here.

In [28]:
## Put your name in here
name = "Matt Osborne"

In [29]:
print("name is an", type(name))

name is an <class 'str'>


In [30]:
## .lower() converts all the characters in the string to lowercase
print(name.lower())

matt osborne


In [31]:
## .upper() capitalizes all the characters in the string
print(name.upper())

MATT OSBORNE


In [32]:
sentence = "This, is, a, comma, heavy, sentence."

In [33]:
## .replace() will replace substrings with other substrings
print(sentence.replace(",","!"))

This! is! a! comma! heavy! sentence.
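Note that `.replace()`, like `.lower()` and `.upper()`, doesn't change the original string; `str`s can't be modified in place. Each of these methods returns a new `str`, so if you want to keep the result, store it in a variable (the name `exclaimed` is just for illustration):

```python
sentence = "This, is, a, comma, heavy, sentence."
exclaimed = sentence.replace(",", "!")   # a new str
print(sentence)    # This, is, a, comma, heavy, sentence.  (unchanged)
print(exclaimed)   # This! is! a! comma! heavy! sentence.
```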


In [36]:
## You code
## put the string "2.5" in float()
## What happens?
float("2.5")

2.5
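The same kind of conversion works with `int()`, which is handy when numbers arrive as text (for example, read from a file). Converting text that doesn't look like a number, such as `float("two")`, raises a `ValueError` instead. A small sketch:

```python
print(int("42") + 1)     # 43
print(float("2.5") * 2)  # 5.0
```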

### `list`s

A list is a collection of python objects (for example, `int`s, `float`s and `str`s).

They are <i>mutable</i>, meaning that you can change them after they are created.

Here's a link to the python documentation on lists, <a href="https://docs.python.org/3/library/stdtypes.html#lists">https://docs.python.org/3/library/stdtypes.html#lists</a>.

In [37]:
## Lists are made by placing objects
## within square brackets, separated by commas
["This", "is", "a", "list", "of", "strs"]

['This', 'is', 'a', 'list', 'of', 'strs']

In [38]:
## Lists can consist of more than one kind of
## python object, even other lists
["This",15,"a",["list"],True]

['This', 15, 'a', ['list'], True]

`list`s can be indexed, meaning you can access specific entries using numerical indices. In python a `list`'s indices begin at `0`, so the first entry is the `0`th entry. You can also index from the right using negative numbers: `-1` is the last entry, `-2` the second to last, and so on.

In [39]:
## run this code
##                0       1         2       3       4        5
##                -6      -5        -4      -3      -2      -1
fruit_basket = ["apple","banana","grapes","kiwi","lemon","grapefruit"]

In [40]:
## You can index a list like so
fruit_basket[0]

'apple'

In [41]:
## Here's an example with the reverse indexing
fruit_basket[-4]

'grapes'

In [42]:
## You code
## What is the 3rd entry of the fruit_basket?
## What about the 2nd entry from the right?

print(fruit_basket[2])
print(fruit_basket[-2])


grapes
lemon


In [44]:
## You code
## Index the list with 1:4, what is returned?
fruit_basket[1:4]



['banana', 'grapes', 'kiwi']
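Slicing with `start:stop` returns the entries from index `start` up to, but not including, index `stop`, which is why `1:4` gives three fruits. Either end can be left off, and negative indices work here too. A sketch using the original basket:

```python
fruit_basket = ["apple","banana","grapes","kiwi","lemon","grapefruit"]
print(fruit_basket[:2])    # ['apple', 'banana']  (from the start)
print(fruit_basket[3:])    # ['kiwi', 'lemon', 'grapefruit']  (to the end)
print(fruit_basket[-2:])   # ['lemon', 'grapefruit']  (the last two)
```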

##### `list`s are mutable

Remember that I said `list`s are mutable, meaning you can change the entries at will.

In [46]:
## You code
## overwrite the "lemon" in the fruit_basket with a "lime"
fruit_basket[-2] = "lime"

fruit_basket

['apple', 'banana', 'grapes', 'kiwi', 'lime', 'grapefruit']

#### Built-in `list` Methods

Python also has a number of handy built-in `list` methods. Let's look at a couple now.

In [47]:
## We can add an entry to the end of any list
## with .append()
fruit_basket.append("plum")

In [48]:
fruit_basket

['apple', 'banana', 'grapes', 'kiwi', 'lime', 'grapefruit', 'plum']

In [49]:
## You can add a bunch of entries all at once
## using .extend(another_list)
fruit_basket.extend(["pineapple","mango","coconut"])

In [50]:
fruit_basket

['apple',
 'banana',
 'grapes',
 'kiwi',
 'lime',
 'grapefruit',
 'plum',
 'pineapple',
 'mango',
 'coconut']
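The difference between `.append()` and `.extend()` matters when the thing you add is itself a `list`: `.append()` adds the whole list as a single entry, while `.extend()` adds each of its entries. A sketch with two small hypothetical baskets:

```python
basket_a = ["apple", "kiwi"]
basket_b = ["apple", "kiwi"]
basket_a.append(["mango", "plum"])   # one new entry: a nested list
basket_b.extend(["mango", "plum"])   # two new entries
print(basket_a)   # ['apple', 'kiwi', ['mango', 'plum']]
print(basket_b)   # ['apple', 'kiwi', 'mango', 'plum']
```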

In [51]:
## You code
## Run the following code and see what .sort() does
fruit_basket.sort()

In [52]:
fruit_basket

['apple',
 'banana',
 'coconut',
 'grapefruit',
 'grapes',
 'kiwi',
 'lime',
 'mango',
 'pineapple',
 'plum']
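`.sort()` rearranges the list in place (alphabetically for `str`s) and returns `None`, which is why the `.sort()` cell above displayed nothing. If you'd rather keep the original order and get a sorted copy, the built-in `sorted()` does that; both accept `reverse=True`. The lists below are hypothetical:

```python
numbers = [3, 1, 2]
result = numbers.sort()               # sorts numbers in place
print(numbers, result)                # [1, 2, 3] None

letters = ["b", "c", "a"]
print(sorted(letters))                # ['a', 'b', 'c']  (a new list)
print(letters)                        # ['b', 'c', 'a']  (unchanged)
print(sorted(letters, reverse=True))  # ['c', 'b', 'a']
```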

##### Indexing `str`s

Similar to `list`s, `str`s can also be indexed.

In [54]:
## You code
## Find the 4th and 7th letter in this string
test_string = "apple bottom jeans"

print("4th letter of", test_string, "is", test_string[3])
print("7th letter of", test_string, "is", test_string[6])



4th letter of apple bottom jeans is l
7th letter of apple bottom jeans is b
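Slicing works on `str`s the same way it does on `list`s, and the built-in `len()` gives the number of characters (spaces included). A sketch:

```python
test_string = "apple bottom jeans"
print(test_string[0:5])   # apple
print(test_string[-5:])   # jeans
print(len(test_string))   # 18
```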


### `tuple`s

A `tuple` is also a collection of general python objects (like a `list`), but unlike a `list` it is not mutable. We won't spend as much time on these.

Here's a link to the documentation on `tuple`s <a href="https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences">https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences</a>.

In [55]:
## A tuple is made by placing objects
## between parentheses, separated by commas
veggie_tray = ("carrots", "corn", "brussels sprout", "brocoli")

In [58]:
## You Code
## What is the first entry in the veggie tray from the left? From the right?
print(veggie_tray[0])
print(veggie_tray[-1])

carrots
brocoli

##### `tuple`s are not mutable

In [59]:
## You code
## Try to overwrite the "corn" with "peas"
## What happens?
veggie_tray[1] = "peas"



TypeError: 'tuple' object does not support item assignment

In [61]:
## You code
## Put veggie_tray inside of list()
## store it in veggie_tray_list
## Can you overwrite "corn" now?
veggie_tray_list = list(veggie_tray)

veggie_tray_list[1] = "peas"

veggie_tray_list

['carrots', 'peas', 'brussels sprout', 'brocoli']
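One quirk of `tuple` syntax: it's really the commas, not the parentheses, that make a `tuple`. So a `tuple` with a single entry needs a trailing comma. A sketch (the variable names are just for illustration):

```python
not_a_tuple = ("carrots")   # just a str inside parentheses
one_veggie = ("carrots",)   # a tuple with one entry
print(type(not_a_tuple))    # <class 'str'>
print(type(one_veggie))     # <class 'tuple'>
```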

### `set`s

Python also has a `set` object that is very similar to the mathematical notion of a set.

Here's a link to the python documentation on `set`s, <a href="https://docs.python.org/3/tutorial/datastructures.html#sets">https://docs.python.org/3/tutorial/datastructures.html#sets</a>.

In [62]:
## A set can be made by placing distinct
## python objects between {}, separated by commas
{"this","is","a","set"}

{'a', 'is', 'set', 'this'}

In [63]:
type({"this","is","a","set"})

set

In [64]:
## You can also use the command set()
## This can be used to remove repeats from lists or tuples

## you and your pal made a grocery list
## but didn't check if the other person
## already put an item on their list
## use set() to make a more succinct grocery_list
grocery_list = ["apples","bananas","beef","onion powder",
                "apples","mushrooms","milk","almond milk",
                "bread","beef","chicken"]

## You code here
set(grocery_list)



{'almond milk',
 'apples',
 'bananas',
 'beef',
 'bread',
 'chicken',
 'milk',
 'mushrooms',
 'onion powder'}
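Two quick checks on the result: the `set` is shorter than the list because the repeated `"apples"` and `"beef"` were dropped, and `in` tells you whether an item is in the `set`. Sets are unordered, so unlike `list`s they can't be indexed. The name `grocery_set` is just for illustration:

```python
grocery_list = ["apples","bananas","beef","onion powder",
                "apples","mushrooms","milk","almond milk",
                "bread","beef","chicken"]
grocery_set = set(grocery_list)
print(len(grocery_list), len(grocery_set))  # 11 9
print("milk" in grocery_set)                # True
print("eggs" in grocery_set)                # False
```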

#### `set` Functions

`set`s also have a number of useful methods. Perhaps the most important are those that mirror the operations on mathematical sets.

In [65]:
## Intersection
set_1 = {'apple','bottom','jeans'}
set_2 = {'as','american','apple','pie'}

## You can see what elements are in both sets using .intersection()
set_1.intersection(set_2)

{'apple'}

In [66]:
## Union

## You can see what elements are in either of your sets using .union()
set_1.union(set_2)

{'american', 'apple', 'as', 'bottom', 'jeans', 'pie'}

In [67]:
## minus

## You can find what is in one set, but not another, using the minus operator -
set_1 - set_2

{'bottom', 'jeans'}
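The same operations are available both as operators and as methods: `&` matches `.intersection()`, `|` matches `.union()`, and `-` matches `.difference()`. A sketch using the sets from above (wrapped in `sorted()` so the printed order is predictable):

```python
set_1 = {'apple','bottom','jeans'}
set_2 = {'as','american','apple','pie'}
print(set_1 & set_2)                       # {'apple'}
print(sorted(set_1 | set_2))               # ['american', 'apple', 'as', 'bottom', 'jeans', 'pie']
print(sorted(set_1.difference(set_2)))     # ['bottom', 'jeans']
```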

In [68]:
## You code

## The empty set?

## find the type of {}
## Is it what you think it would be?
type({})


dict
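Since `{}` makes an empty `dict`, python uses `set()` with nothing inside it for the empty set:

```python
empty = set()
print(type(empty))   # <class 'set'>
print(len(empty))    # 0
```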

### `dict`

A python dictionary, or `dict`, is a way to store information in the form of keys and values. Each key is linked to a particular value, which you access by indexing with the key rather than with an integer. Each key is <i>unique</i>: a `dict` can't hold two entries with the same key, and assigning a new value to an existing key replaces the old value. This can be quite useful when you want to keep a tally of unique things in a list, for example, unique words in a sentence or book.

Here is the python documentation on `dict`s <a href="https://docs.python.org/3/tutorial/datastructures.html#dictionaries">https://docs.python.org/3/tutorial/datastructures.html#dictionaries</a>.
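A quick demonstration of key uniqueness, using a hypothetical `word_counts` dictionary:

```python
word_counts = {"the": 1, "cat": 1, "the": 2}   # "the" is listed twice
print(word_counts)        # {'the': 2, 'cat': 1}  (one entry, latest value kept)
word_counts["cat"] = 5    # assigning to an existing key overwrites its value
print(word_counts)        # {'the': 2, 'cat': 5}
```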

In [69]:
## Dictionaries

## It is best to think of a dictionary as a set of key: value pairs, 
## with the requirement that the keys are unique (within one dictionary). 
## A pair of braces creates an empty dictionary: {}
print({})
print(type({}))

{}
<class 'dict'>


In [70]:
## To define a dictionary you do the following
practice_dict = {'apple':1, 'as':2, 'american':1, 'pie':1}

print(practice_dict)
print()

## these are the dictionary keys
print("practice_dict.keys():", practice_dict.keys())
print()

## these are the dictionary values
print("practice_dict.values():", practice_dict.values())
print()

{'apple': 1, 'as': 2, 'american': 1, 'pie': 1}

practice_dict.keys(): dict_keys(['apple', 'as', 'american', 'pie'])

practice_dict.values(): dict_values([1, 2, 1, 1])



In [71]:
## You can access the values for particular keys with this syntax
## dictionary[key]
practice_dict['apple']

1
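Indexing with a key that isn't in the `dict`, like `practice_dict['banana']`, raises a `KeyError`. The `.get()` method returns a default value instead, and assigning to a new key adds it. Together these give a common way to keep the kind of tally mentioned above:

```python
practice_dict = {'apple':1, 'as':2, 'american':1, 'pie':1}
print(practice_dict.get('banana', 0))   # 0 (the default; no KeyError)

## add one to a tally, starting from 0 if the key is new
practice_dict['banana'] = practice_dict.get('banana', 0) + 1
practice_dict['apple'] = practice_dict.get('apple', 0) + 1
print(practice_dict)
## {'apple': 2, 'as': 2, 'american': 1, 'pie': 1, 'banana': 1}
```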

In [72]:
## You code
## what is the output of practice_dict.items()?
## what does .items() do in general?
practice_dict.items()


dict_items([('apple', 1), ('as', 2), ('american', 1), ('pie', 1)])
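Each entry in `.items()` is a `(key, value)` `tuple`, so you can convert the result to a `list` and index it like any other `list`. The name `pairs` is just for illustration:

```python
practice_dict = {'apple':1, 'as':2, 'american':1, 'pie':1}
pairs = list(practice_dict.items())
print(pairs[0])         # ('apple', 1)
print(type(pairs[0]))   # <class 'tuple'>
```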

In [73]:
## You code
## Use a dictionary to store the ingredient amounts you 
## need for an imaginary recipe
ingredients = {"milk (C)":1, "flour (C)":3, "baking soda (TB)":2, 
              "salt (TB)":1, "eggs":2}

ingredients

{'milk (C)': 1,
 'flour (C)': 3,
 'baking soda (TB)': 2,
 'salt (TB)': 1,
 'eggs': 2}

#### Hard Copy vs. Soft Copy

Run the following code and see what happens.

In [74]:
test_list = [1,2,3,4]
other_list = test_list

In [75]:
print("test_list =",test_list)
print("other_list =",other_list)

test_list = [1, 2, 3, 4]
other_list = [1, 2, 3, 4]


In [76]:
## let's replace the entry at index 2 of test_list
## with a 7
test_list[2] = 7

In [77]:
print("test_list =",test_list)
print("other_list =",other_list)

test_list = [1, 2, 7, 4]
other_list = [1, 2, 7, 4]


##### Say what!?!

That's right: when we changed `test_list`, `other_list` changed too. When we ran `test_list = [1,2,3,4]`, python had your computer create the `list` `[1,2,3,4]` in memory and had `test_list` point to it. When we then ran `other_list = test_list`, python didn't create a new `list`; it just had `other_list` point to the same `list` that `test_list` points to. Because both variables point to the same object in your computer's memory, changing `test_list` also changes `other_list`. Running `other_list = test_list` performs what we'll call a <i>soft copy</i> of `test_list`.
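You can check whether two variables point to the same object with python's `is` operator, which compares identity rather than contents. In this sketch, `fresh_list` is a hypothetical separate list that happens to hold equal entries:

```python
test_list = [1,2,3,4]
other_list = test_list        # other_list points to the same list
fresh_list = [1,2,3,4]        # a separate list with equal entries

print(other_list is test_list)   # True  (same object in memory)
print(fresh_list is test_list)   # False (different objects)
print(fresh_list == test_list)   # True  (but equal contents)
```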

Soft copies don't actually copy the python object in your computer's memory. If you want to do that, you need to make a <i>hard copy</i>. For `list`s this is done with the `.copy()` method, as in `test_list.copy()`. Let's see it in action.

In [78]:
test_list = [1,2,3,4]

## Here is the hard copy
other_list = test_list.copy()

In [79]:
print("test_list =",test_list)
print("other_list =",other_list)

test_list = [1, 2, 3, 4]
other_list = [1, 2, 3, 4]


In [80]:
## let's replace the entry at index 2 of test_list
## with a 7
test_list[2] = 7

In [81]:
print("test_list =",test_list)
print("other_list =",other_list)

test_list = [1, 2, 7, 4]
other_list = [1, 2, 3, 4]


Boom!

This may not come up too much with these basic data structures, but once we get into actual data science algorithms, soft copies and hard copies can become an issue. That's why I wanted to introduce them now, to prime you for what's to come.
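One caveat worth knowing: `.copy()` only copies the outer `list` (python's documentation calls this a <i>shallow</i> copy). If the list contains other lists, those inner lists are still shared. The standard library's `copy.deepcopy()` copies the nested objects too. A sketch with a hypothetical nested list:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 99              # change an entry of an inner list
print(shallow)   # [[99, 2], [3, 4]]  (inner list is shared)
print(deep)      # [[1, 2], [3, 4]]   (fully independent)
```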

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph. D., 2021.

Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).