# Introduction
This is a guide to dictionaries and DataFrames, which are used to store data and organize it into tables. Basically, they're more sophisticated versions of arrays and lists.

# Creating Dictionaries
First up are dictionaries. A dictionary maps each key to a value, so looking up a key by name returns its value:

In [1]:
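# Note: the name `dict` shadows Python's built-in dict type; it is kept here because later cells reuse it.
dict = {'a':'apple','b':'banana'}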
print(dict)        
print(dict['a'])

{'a': 'apple', 'b': 'banana'}
apple
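
Beyond looking up a key, a few other everyday operations are worth knowing. The following is a minimal sketch; the variable name `letters` and the extra entries are made up for illustration and are not part of the cells above.

```python
# Hypothetical example data, modeled on the fruit dictionary above.
letters = {'a': 'apple', 'b': 'banana'}

letters['c'] = 'coconut'      # add a new key/value pair
letters['a'] = 'apricot'      # overwrite the value of an existing key

print('b' in letters)         # membership tests check the keys, not the values
print(letters.get('z'))       # .get returns None instead of raising KeyError
print(list(letters.keys()))   # keys keep their insertion order (Python 3.7+)
```

Using `.get` is handy when a key might be missing, since `letters['z']` would raise a `KeyError`.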


# Finding Elements in Dictionaries
A value can even be a list! Index the dictionary by key to get the list, then index the list by position to get a single element:

In [32]:
dict = {'fruit':['apple','banana','coconut'],'vegetable':['spinach','celery','broccolli'],'soda':['coke','pepsi','dr. pepper']}
print(dict['fruit'])
print(dict['soda'][2])

['apple', 'banana', 'coconut']
dr. pepper
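
Since each value here is an ordinary list, the usual list operations still apply. Here is a small sketch, assuming a trimmed-down copy of the dictionary called `foods` (a hypothetical name) and an extra item added purely for illustration:

```python
# Hypothetical, shortened copy of the dictionary above.
foods = {
    'fruit': ['apple', 'banana', 'coconut'],
    'soda': ['coke', 'pepsi', 'dr. pepper'],
}

foods['fruit'].append('durian')   # changes the list stored under 'fruit' in place

# .items() yields each key together with its list
for category, items in foods.items():
    print(category, len(items), items[0])
```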


# Creating DataFrames
Now it's time for DataFrames! This data type requires you to import pandas like so:

In [2]:
import pandas as pd

Then, create a DataFrame by passing in a dictionary of lists. Each key becomes a column name, and each list becomes that column's values:

In [35]:
food = pd.DataFrame(dict)
print(food)

     fruit        soda  vegetable
0    apple        coke    spinach
1   banana       pepsi     celery
2  coconut  dr. pepper  broccolli
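
The output above lists the columns alphabetically, which is what older pandas versions did; newer versions keep the order of the dictionary keys. If the order matters, one option is to state it explicitly with the `columns` argument. This is a sketch using a copy of the data under the hypothetical names `data` and `food_ordered`:

```python
import pandas as pd

# Hypothetical copy of the dictionary used above.
data = {
    'fruit': ['apple', 'banana', 'coconut'],
    'vegetable': ['spinach', 'celery', 'broccolli'],
    'soda': ['coke', 'pepsi', 'dr. pepper'],
}

# columns= picks which keys to use and the order they appear in
food_ordered = pd.DataFrame(data, columns=['soda', 'fruit', 'vegetable'])

print(food_ordered.columns.tolist())
print(food_ordered.shape)   # (rows, columns)
```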


To replace the row numbers on the left with string labels, assign a list of labels to the index:

In [30]:
food.index = ['First Food', 'Second Food', 'Third Food']
print(food)

               fruit        soda  vegetable
First Food     apple        coke    spinach
Second Food   banana       pepsi     celery
Third Food   coconut  dr. pepper  broccolli
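
The labels can also be supplied when the DataFrame is created, through the `index` argument. A minimal sketch, assuming a two-column copy of the data (`data`, `labels`, and `labeled_food` are hypothetical names):

```python
import pandas as pd

# Hypothetical two-column version of the food data.
data = {
    'fruit': ['apple', 'banana', 'coconut'],
    'soda': ['coke', 'pepsi', 'dr. pepper'],
}
labels = ['First Food', 'Second Food', 'Third Food']

# index= sets the row labels at creation time; it needs one label per row
labeled_food = pd.DataFrame(data, index=labels)

print(labeled_food.index.tolist())
```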


# Getting the Data
To actually get the data, you usually read CSV files, where CSV stands for "comma-separated values". By default, the rows are labeled 0, 1, 2, and so on. If the first column of the file holds the row labels, pass index_col=0 to use that column as the index instead:

In [None]:
food = pd.read_csv('food.csv')               # rows are labeled 0, 1, 2, ...
food = pd.read_csv('food.csv', index_col=0)  # first column becomes the row labels
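
Because food.csv itself is not shown in this guide, here is a self-contained sketch that reads made-up CSV text from memory instead of a file. The contents of `csv_text` and the column name `label` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of food.csv.
csv_text = """label,fruit,soda
First Food,apple,coke
Second Food,banana,pepsi
"""

# io.StringIO lets read_csv treat a string like an open file
default_index = pd.read_csv(io.StringIO(csv_text))
labeled_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.index.tolist())   # row numbers
print(labeled_index.index.tolist())   # labels taken from the first column
```

With `index_col=0`, the first column stops being a regular column and becomes the index.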

# Accessing Elements using Square Brackets
There are many ways to access the elements. One is to use square brackets:

In [13]:
print(food['fruit'])
print(food[['fruit']])

First Food       apple
Second Food     banana
Third Food     coconut
Name: fruit, dtype: object
               fruit
First Food     apple
Second Food   banana
Third Food   coconut


As you can see, single square brackets return a Series, while double square brackets return a DataFrame. To select two or more columns, separate the column names with commas inside the double brackets:

In [45]:
print(food[['fruit','vegetable']])

               fruit  vegetable
First Food     apple    spinach
Second Food   banana     celery
Third Food   coconut  broccolli
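
To confirm the difference between single and double brackets, a quick type check can help. This sketch builds its own small DataFrame (`food_demo` is a hypothetical name) so it runs on its own:

```python
import pandas as pd

# Hypothetical small DataFrame for the check.
food_demo = pd.DataFrame({
    'fruit': ['apple', 'banana'],
    'soda': ['coke', 'pepsi'],
})

single = food_demo['fruit']     # one column name   -> Series
double = food_demo[['fruit']]   # list of names     -> DataFrame

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

The Series has one dimension, while the DataFrame keeps two dimensions even with a single column.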


To select rows instead of columns, use single square brackets with a colon to give a range of row positions. As with list slicing, the start is included and the end is not:

In [41]:
print(food[1:3])

               fruit        soda  vegetable
Second Food   banana       pepsi     celery
Third Food   coconut  dr. pepper  broccolli
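
Note that this kind of slice counts positions even when the rows have string labels. The sketch below, using a hypothetical `food_demo` DataFrame, compares it with the equivalent `iloc` slice:

```python
import pandas as pd

# Hypothetical labeled DataFrame, modeled on the food example.
food_demo = pd.DataFrame(
    {'fruit': ['apple', 'banana', 'coconut'],
     'soda': ['coke', 'pepsi', 'dr. pepper']},
    index=['First Food', 'Second Food', 'Third Food'],
)

by_brackets = food_demo[1:3]    # positions 1 and 2; position 3 is excluded
by_iloc = food_demo.iloc[1:3]   # the same rows, selected explicitly by position

print(by_brackets.index.tolist())
print(by_brackets.equals(by_iloc))
```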


# Accessing Elements using iloc and loc
Another, more flexible way to select data is with loc and iloc. loc takes the row and column labels, while iloc takes their integer positions:

In [25]:
print(food.loc[["First Food"]])
print()
print(food.iloc[[0]])
print()
print(food.loc[["First Food", "Third Food"], ["vegetable", "fruit"]])

            fruit  soda  vegetable
First Food  apple  coke    spinach

            fruit  soda  vegetable
First Food  apple  coke    spinach

            vegetable    fruit
First Food    spinach    apple
Third Food  broccolli  coconut
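
Two details of loc and iloc are easy to miss: without the inner brackets they return a single value, and their slices treat the end point differently. A minimal sketch, again assuming a hypothetical `food_demo` DataFrame:

```python
import pandas as pd

# Hypothetical labeled DataFrame, modeled on the food example.
food_demo = pd.DataFrame(
    {'fruit': ['apple', 'banana', 'coconut'],
     'soda': ['coke', 'pepsi', 'dr. pepper']},
    index=['First Food', 'Second Food', 'Third Food'],
)

# A single row and a single column give back one value
by_label = food_demo.loc['Second Food', 'soda']
by_position = food_demo.iloc[1, 1]
print(by_label, by_position)

# Label slices include the end label; position slices exclude the end position
label_slice = food_demo.loc['First Food':'Second Food']
position_slice = food_demo.iloc[0:1]
print(len(label_slice), len(position_slice))
```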


To look at all of the rows and only certain columns, put a colon for the row argument:

In [36]:
print(food.loc[:,["fruit","soda"]])

               fruit        soda
First Food     apple        coke
Second Food   banana       pepsi
Third Food   coconut  dr. pepper


# Filtering Data Frames

Say you want to find the rows that fulfill a certain requirement.
For example,

In [5]:
dict2 = {"Flavor":["apple","strawberry","cherry"], "Rating":[9,7,10]}
lolipops = pd.DataFrame(dict2)
print(lolipops)

       Flavor  Rating
0       apple       9
1  strawberry       7
2      cherry      10


Here, we created a new DataFrame called lolipops, which holds lollipop flavors and their ratings from 1 to 10. To see all of the rows with a rating above 8, we first extract the "Rating" column:

In [8]:
rating = lolipops["Rating"]
print(rating)

0     9
1     7
2    10
Name: Rating, dtype: int64


It's fine that this is a Series rather than a DataFrame. Now, we check which ratings are above 8:

In [10]:
above_8 = rating > 8
print(above_8)

0     True
1    False
2     True
Name: Rating, dtype: bool


Yay! Now we have a boolean Series that says whether each row's rating is above 8. Next, we use this Series as an index for the DataFrame to get only the rows where it is True:

In [11]:
good_ratings = lolipops[above_8]
print(good_ratings)

   Flavor  Rating
0   apple       9
2  cherry      10
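
The three steps above (extract the column, compare, index) are often written as a single line. Here is a sketch under that assumption, using a fresh copy of the data under the hypothetical name `lollipops_demo`:

```python
import pandas as pd

# Hypothetical copy of the lollipop data.
lollipops_demo = pd.DataFrame({
    "Flavor": ["apple", "strawberry", "cherry"],
    "Rating": [9, 7, 10],
})

# The comparison inside the brackets builds the boolean Series on the fly
good = lollipops_demo[lollipops_demo["Rating"] > 8]

# loc accepts the same boolean Series for rows, plus a column to keep
good_flavors = lollipops_demo.loc[lollipops_demo["Rating"] > 8, "Flavor"]

print(good)
print(good_flavors.tolist())
```

Keeping the intermediate variables, as in the cells above, is just as valid and can be easier to read.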


As we can see, the new DataFrame only contains the rows that have a rating above 8. To check whether the rating is above 8 but below 10, we need a logical operator, namely "and". However, Python's "and" does not work element by element on a Series and raises an error instead. Thus, you need to use the numpy functions logical_and(), logical_or(), or logical_not(), which compare the values one element at a time:

In [14]:
import numpy as np
between_8_and_10 = np.logical_and(rating > 8, rating < 10)  # True only where both conditions hold
print(lolipops[between_8_and_10])

  Flavor  Rating
0  apple       9
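
As an alternative to the numpy functions, pandas Series also support the element-wise operators `&` (and), `|` (or), and `~` (not). The sketch below compares both approaches on a hypothetical copy of the ratings; note that each comparison needs its own parentheses because of operator precedence.

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the "Rating" column.
rating = pd.Series([9, 7, 10], name="Rating")

# Plain `and` tries to turn each whole Series into one True/False and fails
and_error = None
try:
    rating > 8 and rating < 10
except ValueError as err:
    and_error = type(err).__name__
print(and_error)

mask_np = np.logical_and(rating > 8, rating < 10)   # numpy function
mask_op = (rating > 8) & (rating < 10)              # element-wise operator

print(mask_np.tolist())
print(mask_op.tolist())
print(((rating < 8) | (rating == 10)).tolist())     # element-wise "or"
print((~(rating > 8)).tolist())                     # element-wise "not"
```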


Huzzah! Now we can pick out specific rows in the data.

# Summary
So in short, plain square brackets can only select entire rows or entire columns.

In [None]:
food[number:number] # gives the rows by position (the end number is excluded), while

food[[column_name, column_name]] # gives the columns you want (note the double brackets)

Thus, if you want the most flexibility when finding elements in the
DataFrame, use iloc or loc. The general idea is that, within the outer brackets,
the first nested bracket lists the rows while the second nested bracket
lists the columns (a colon for all rows doesn't need its own brackets).

In [None]:
food.loc[[<row name>, <row name>], [<column name>, <column name>]]
food.loc[:, [<column name>, <column name>]]

And that's basically dictionaries and DataFrames.
Remember to import pandas if you want to use DataFrames.