This notebook is an introduction to some basic data structures. Using the right data structure makes your code easier to read, faster to run, and easier to write.

In [1]:
import pandas as pd
from copy import copy, deepcopy

In [2]:
foo = {"a", "b", "c"} # the curly brackets denote a set. you can also use set(["a", "b", "c"])
for _ in range(5):
    foo.update({"c", "d", "e"})

How many letters total are in foo now? How many times did we try to add the letters c, d, and e? 
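
One way to check your answer is the small sketch below, which rebuilds the same set under the illustrative name `letters` (not a name used elsewhere in this notebook). A set stores each element only once, so repeating the same update should change nothing after the first pass.

```python
# Rebuild the set from the cell above under a new, illustrative name.
letters = {"a", "b", "c"}
for _ in range(5):
    # update() adds every element of its argument; elements already present are ignored.
    letters.update({"c", "d", "e"})

print(len(letters))     # number of distinct letters
print(sorted(letters))  # sets are unordered, so sort for a stable display
```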

In [3]:
bar = ["a", "b", "c"] # this is a list; it keeps its items in the order you specified
baz = ["a", "b", "c"]
bay = baz
for _ in range(2):
    bar += ["c", "d", "e"]
    baz += ["cde"]

What about these lists? How many items are in each one? We assigned bay to baz at the beginning, before changing anything. How many items are in bay? Note that this is not specific to lists: assigning one variable to another does not make a copy, it gives the same object a second name, so an in-place update (like += on a list) shows up under both names. Let's show how to fix it:

In [4]:
boo = copy(baz)
baz += ["z"]
boo # is "z" in boo?

['a', 'b', 'c', 'cde', 'cde']
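
To tell a second name apart from a real copy, you can compare the two with `is`. The sketch below also tries `deepcopy`, which is imported at the top of the notebook but not used yet: `copy` only copies the outer list, so lists nested inside it are still shared. The names `original`, `alias`, `shallow`, and `deep` are made up for this illustration.

```python
from copy import copy, deepcopy

original = [["a", "b"], "c"]   # a list that contains another list
alias = original               # a second name for the same object
shallow = copy(original)       # new outer list, but the inner list is shared
deep = deepcopy(original)      # new outer list and new inner list

original += ["z"]              # in-place change to the outer list
original[0].append("x")        # in-place change to the inner list

print(alias is original)       # are these the same object?
print(alias)                   # follows every change
print(shallow)                 # misses "z" but sees the inner change
print(deep)                    # sees neither change
```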

In [10]:
alphabet = ["a", "b", "c", "d", "e"]
alphabet[3] # is this the 3rd letter of the alphabet? why not?

'd'

In [11]:
print("The zeroth letter of the alphabet is", alphabet[0])

The zeroth letter of the alphabet is a


In [13]:
# here are some slices you should try (a notebook cell only displays the last result):
alphabet[:2]
alphabet[0:2] # this is the same as the above
alphabet[-2:]
alphabet[::-1]

['e', 'd', 'c', 'b', 'a']
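
Because the cell above only displays its last expression, here is a small sketch that stores each slice under an illustrative name and prints all of them. A slice is written `start:stop:step`, the item at `stop` is left out, and slicing returns a new list rather than changing the original.

```python
alphabet = ["a", "b", "c", "d", "e"]

first_two = alphabet[:2]        # start defaults to 0; index 2 is excluded
last_two = alphabet[-2:]        # negative indexes count from the end
reversed_copy = alphabet[::-1]  # a step of -1 walks the list backwards

for label, value in [("[:2]", first_two), ("[-2:]", last_two), ("[::-1]", reversed_copy)]:
    print(label, value)

print(alphabet)  # the original list is unchanged
```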

Dictionaries are useful because you can "look up" a value by its key, without going through the entries in order. Let's say I send you to the grocery store with a list: do you need to go down the list in order?

In [6]:
groceries = {"apples":3, "bananas":7, "watermelons":1}
print(f"You need to buy {groceries['apples']} apples")

You need to buy 3 apples


In [7]:
print("Now here's the whole list:")
for key, value in groceries.items():
    print(f"You need to buy {value} {key}")

Now here's the whole list:
You need to buy 3 apples
You need to buy 7 bananas
You need to buy 1 watermelons
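
Looking up a key that is not in the dictionary raises a `KeyError`, so it helps to know a few ways around that. The sketch below uses a separate dictionary named `shopping` and a made-up item, "kiwis", purely for illustration.

```python
shopping = {"apples": 3, "bananas": 7, "watermelons": 1}

# `in` checks the keys, not the values.
has_kiwis = "kiwis" in shopping

# get() returns a fallback value instead of raising KeyError for a missing key.
kiwi_count = shopping.get("kiwis", 0)

# Assigning to a new key adds an entry; assigning to an existing key overwrites it.
shopping["kiwis"] = 4
shopping["apples"] = 5

print(has_kiwis, kiwi_count)
print(shopping)
```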


DataFrames are very useful objects that act a lot like Excel tables. They're nice to format and easy to filter, and they can handle really large data sets.

In [8]:
# The format is {"column header":[column, data, here]}.
groceries = pd.DataFrame({"produce":["apples", "banana", "oranges"], "cost":[2.00, 0.10, 0.50], "lbs":[4,10,6], "is_favorite":[True, True, False]})

groceries

   produce  cost  lbs  is_favorite
0   apples   2.0    4         True
1   banana   0.1   10         True
2  oranges   0.5    6        False


In [9]:
# now we can do math with the columns -- store the answer in a new col.
groceries["total_cost"] = groceries["cost"]*groceries["lbs"]

# and add the total of our new column "total cost"
print("The total cost of produce is $ ", groceries["total_cost"].sum())

# and now we can filter rows with a True/False (boolean) column
fil = groceries["is_favorite"]
print("If we only buy our favorite produce $", groceries.loc[fil, "total_cost"].sum())

The total cost of produce is $  12.0
If we only buy our favorite produce $ 9.0
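
Filters can also be combined. The sketch below rebuilds the same table under the illustrative name `produce_table` and assumes a made-up price cutoff of $1.00 per lb. In pandas, use `&` for "and", `|` for "or", and `~` for "not", and wrap each comparison in parentheses.

```python
import pandas as pd

produce_table = pd.DataFrame({
    "produce": ["apples", "banana", "oranges"],
    "cost": [2.00, 0.10, 0.50],
    "lbs": [4, 10, 6],
    "is_favorite": [True, True, False],
})
produce_table["total_cost"] = produce_table["cost"] * produce_table["lbs"]

# Favorites that cost less than the (assumed) $1.00 cutoff.
cheap_favorites = produce_table["is_favorite"] & (produce_table["cost"] < 1.00)
# Everything that is not a favorite.
not_favorites = ~produce_table["is_favorite"]

cheap_names = produce_table.loc[cheap_favorites, "produce"].tolist()
other_total = produce_table.loc[not_favorites, "total_cost"].sum()

print(cheap_names)
print(other_total)
```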
