# Data Types

Duncan Callaway

This notebook is a review of variable types.  My objective is to review enough of the basics to put you in a position to understand how a pandas dataframe works, which is coming up next.

## Object, method, function?

These terms come up repeatedly in Python.  

### Q: What are they? 

Go [here for more](https://www.geeksforgeeks.org/difference-method-function-python/)

1. Object: virtually anything with attributes in Python.  
    1. Objects usually belong to classes and have attributes.
    1. For example, an object might belong to a bike class.  The attributes of the class would be material, wheel size, number of gears, etc.
2. Method: a function associated with an object. 
    1. For example, `object.method()` applies the method to the object. 
3. Function: performs an action using some set of input parameters.
    1. For example, `function(object)` applies the function to the object.
    
    
We'll talk about each of these in class today.
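
To make the vocabulary concrete, here is a minimal sketch built on the bike example. The class `Bike`, its method `describe`, and the function `count_gears` are hypothetical names made up for illustration; they are not part of any library.

```python
# Hypothetical Bike class, used only to illustrate object / method / function.
class Bike:
    def __init__(self, material, wheel_size, gears):
        # Attributes: data stored on each Bike object
        self.material = material
        self.wheel_size = wheel_size
        self.gears = gears

    def describe(self):
        # A method: a function defined on the class, called as object.method()
        return f"{self.material} bike with {self.gears} gears"


def count_gears(bike):
    # A plain function: called as function(object)
    return bike.gears


my_bike = Bike("steel", 700, 21)    # my_bike is an object of class Bike
description = my_bike.describe()    # method call
n_gears = count_gears(my_bike)      # function call
print(description)
print(n_gears)
```

Both calls read data from the same object; the difference is only in where the code lives (inside the class, or outside it).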

## Basic data, or variable, types

### Q: what are some variable types that get used in Python? 

1. `str` (string)
2. numbers
    1. `int`
    2. `float`
3. `bool`

**Strings**

In [None]:
name = 'Imogen'
type(name)

In [None]:
name[2]

### Q: In class action: Change the e in Imogen to y.  

What you'll find: though we can index their contents, strings are *immutable*, meaning you can't change individual elements.  Instead you need to do a wholesale reassignment.

In [None]:
name[4] = 'y'  # raises TypeError: strings do not support item assignment

In [None]:
name = name[0:4] + 'y' + name[5:]  # build a new string from slices around index 4
name
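
Because strings are immutable, string methods also return a *new* string rather than changing the original. A short sketch using the built-in `str.replace` (the variable names `word` and `new_word` are just for illustration):

```python
word = 'Imogen'
new_word = word.replace('e', 'y')  # returns a new string with every 'e' replaced
print(word)      # the original string is unchanged
print(new_word)
```

Note that `replace` changes every matching character, so slicing is the safer choice when you want to change one specific position.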

**Floats**

In [None]:
x = 0.34
type(x)

**Int**

In [None]:
i = 1
type(i)

**Boolean**

In [None]:
ans = True
type(ans)

Note that Python is 'dynamically typed', meaning a variable takes its type from whatever value you assign to it.

You don't need to define the type ahead of time.

Note this can cause problems: you can mistakenly assign a variable a value of a type that won't play nicely later on.
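
Here is a small sketch of how that can go wrong. The names `count` and `total` are made up for this example.

```python
count = 3            # count refers to an int
print(type(count))

count = '3'          # the same name now refers to a str; Python does not complain
print(type(count))

try:
    total = count + 1        # adding a str and an int is not defined
except TypeError as err:
    total = None
    print('TypeError:', err)

total = int(count) + 1       # an explicit conversion fixes it
print(total)
```

The reassignment itself is silent; the error only shows up later, when the value is used in a way its type does not support.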

## Data structures or "containers"

### Q: what are some data structures native to Python? 

1. `dict`
2. `list`
3. `tuple`
4. `set`

Let's explore these a little further.  

## Lists

In [None]:
squares = [1, 5, 9, 16, 25] # do this in lecture
squares

We can index elements with the usual process:

In [None]:
print(squares[0]) # do this in lecture; zero is the first element
print(squares[-1]) # do this in lecture; -1 gives the last
print(squares[2:]) # do this in lecture; you can slice...

### Q: What does "mutable" mean?

...that you can change individual entries of the data structure.

Lists are mutable:

In [None]:
squares[1] = 4
squares

That's better! 

### Q: In class, append 36 to your list.

We can also append:

In [None]:
squares.append(6**2)
squares

We can also add elements by concatenating another list with `+`:

In [None]:
squares = squares + [49]
squares
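
The two approaches are not interchangeable. The sketch below, using made-up lists `evens` and `odds`, shows that `append` adds its argument as a single element, while `+` joins two lists into a new one.

```python
evens = [2, 4]
evens.append(6)          # adds one element, in place
evens.append([8, 10])    # adds the whole list as ONE element (a nested list)
print(evens)             # [2, 4, 6, [8, 10]]

odds = [1, 3]
more_odds = odds + [5, 7]    # builds a new list; odds itself is unchanged
print(odds)                  # [1, 3]
print(more_odds)             # [1, 3, 5, 7]

odds.extend([5, 7])          # extend adds each element, in place
print(odds)                  # [1, 3, 5, 7]
```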

Finally, we can nest lists:

In [None]:
x = [1,2,3]
y = [4,6]

In [None]:
A = [x,y]
A

Note that the number of elements in the two lists within the list did not need to be equal.  

Note also that the nested lists can hold different variable types:

In [None]:
a = ['abc', 'def']

In [None]:
A = [x,a]
A

### Q: Index the nested list A to get out 'def'

It's tough to index these things.

Think about how you might get the entry 'def' from the list, by integer indexes.

In [None]:
A[1,1]  # raises TypeError: list indices must be integers or slices, not tuple

That didn't work!  (We have to wait for numpy and pandas to be able to do that.)

In [None]:
A[1][1]

Note, pandas data frames are a lot like lists in this way.  You first index which "sublist" you want, then you index into elements of that.  Numpy arrays (as we'll see) are much easier to index.  Frankly, the indexing with pandas is pretty annoying, but there are some workarounds that we'll discuss.  
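
To see why the chained form works on a nested list, it can help to split it into its two steps. This is a minimal sketch with a stand-in list called `nested`:

```python
nested = [[1, 2, 3], ['abc', 'def']]

inner = nested[1]    # step 1: pick out the second sublist
item = inner[1]      # step 2: pick out the second element of that sublist
print(inner)         # ['abc', 'def']
print(item)          # def

print(nested[1][1] == item)   # True: the chained form does both steps at once
```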

## Tuples

Tuples are like lists -- they hold multiple elements and you can index them.  

In [None]:
B = (1,2,3)
B

In [None]:
B[1]

### Q: Are tuples mutable? 

A: nope!

In [None]:
B[1] = 3  # raises TypeError: tuples do not support item assignment

Changing an individual element throws an error.

So why bother with tuples?
1. They are harder to work with, but
2. They prevent you from doing things you don't want to, and
3. They are more memory-efficient.  
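
A rough way to check the last two points is sketched below. It uses `sys.getsizeof` from the standard library; the exact byte counts depend on your Python implementation and version, so treat the comparison as illustrative for CPython rather than a guarantee.

```python
import sys

point_list = [1, 2, 3]
point_tuple = (1, 2, 3)

# Size of the container itself, in bytes (not counting the ints it refers to)
list_bytes = sys.getsizeof(point_list)
tuple_bytes = sys.getsizeof(point_tuple)
print(list_bytes, tuple_bytes)

# The tuple protects its contents from accidental changes
try:
    point_tuple[0] = 99
except TypeError as err:
    print('TypeError:', err)
```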

## Sets

Whereas lists are defined with square brackets, sets use curlies:

In [None]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'}

What happened there? 

...I wrote orange and apple twice.  Let's look at the set:

In [None]:
print(basket)

Pretty smart -- no duplicates!

### Q: What might you do if you have a variable called basket but you don't know what's in it? 

Now we can query the set for membership:

In [None]:
'orange' in basket

In [None]:
'kiwi' in basket

### Q: Try indexing the set

In [None]:
basket[1]  # raises TypeError: 'set' object is not subscriptable

### Q: Why can't you index a set?
Ans: because the elements of a set have no defined *order*, so a position like `basket[1]` is ambiguous.  The set only keeps track of which elements are present, which is also why duplicates collapse into one entry.
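
If you do need positions, one option is to convert the set to a list first. The sketch below also shows a common use of sets: removing duplicates from a list. The variable names are made up for this example.

```python
fruits = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']

unique_fruits = set(fruits)               # duplicates are dropped
print(len(fruits), len(unique_fruits))    # 6 4

ordered = sorted(unique_fruits)   # sorted() returns a list, here in alphabetical order
print(ordered[0])                 # apple
```

Sorting is used here because it gives a predictable order; a plain `list(unique_fruits)` would also be indexable, but its order is not something to rely on.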

## Dict

Note, pandas data frames are like lists in that there are a number of entries that we can very quickly query.  But there are even more similarities with `dict` data structures.

With dicts, we are inching closer to the kind of structure we have in a pandas data frame.  

And we get a way to index into a set-like collection, by associating each unique key with a value.

In [None]:
cars = {'Prius':'Toyota', 'Volt':'Chevy', 'Model 3': 'Tesla'}
cars

### Q: after typing in your own dict, try indexing it to get information back out.

In [None]:
cars['Prius']

In this case 
1. 'Prius' is the **key** of the dict.
2. 'Toyota' is the associated **value**.

We can add new entries very easily:

In [None]:
cars['Leaf'] = 'Nissan'

In [None]:
cars

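Important note here:  you don't index a dict by position; you rely on the key to pull data out.  (Since Python 3.7 a dict does remember the order in which entries were inserted, but older versions made no such guarantee.)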
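
A few other key-based operations are worth knowing. This sketch uses a separate dict named `makers` so it does not disturb `cars`; the model `'Bolt'` is just an example of a key that is missing.

```python
makers = {'Prius': 'Toyota', 'Volt': 'Chevy', 'Model 3': 'Tesla'}

print('Volt' in makers)                 # membership tests look at the keys: True
print(makers.get('Bolt', 'unknown'))    # .get returns a default instead of raising KeyError

# Loop over key/value pairs
for model, maker in makers.items():
    print(model, '->', maker)
```

Using `makers['Bolt']` directly would raise a `KeyError`, so `.get` is handy when you are not sure a key exists.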