# Week 9 Pandas Part 3

This week we will introduce another data structure in Python: dictionaries. Dictionaries are a useful way of storing key:value relationships and can help us manipulate dataframes more efficiently. We will look at how we can use dictionaries to load data more efficiently, do advanced aggregations, rename columns, and create new columns.

## Dictionaries

Dictionaries are a helpful way to store information when one element (the key) is attached to another specific element (the value). If you were to look up "apple" (key) in Webster's dictionary, you would find its definition, "the round fruit of a tree of the rose family" (value).

In [None]:
import pandas as pd
import numpy as np

# dictionary syntax
# {key:value}

my_dict = {'apple':['the round fruit of a tree of the rose family']}

In [None]:
# we can call the value of any given key of the dictionary

my_dict['apple']

In [None]:
# we can't access information positionally like we would with a list;
# 0 is not a key in my_dict, so this raises a KeyError

my_dict[0]
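
Because `my_dict[0]` looks for a key equal to `0` rather than the first item, it raises a `KeyError`. Here is a minimal sketch of two common ways to handle a key that may be missing. The `'pear'` key and the fallback text are made up for illustration.

```python
my_dict = {'apple': ['the round fruit of a tree of the rose family']}

# check whether a key exists before using it
has_zero = 0 in my_dict

# .get() returns a default (None unless one is given) instead of raising KeyError
missing = my_dict.get(0)
fallback = my_dict.get('pear', 'no definition found')

# plain indexing with a missing key raises KeyError
try:
    my_dict[0]
    raised = False
except KeyError:
    raised = True

print(has_zero, missing, fallback, raised)
```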

In [None]:
# dictionaries can store many different types of structures as values

my_dict = {
    
    'my_string':'round',
    'my_int':1,
    'my_list':[1,2,3,4],
    'my_tuple':(3,5),
    'my_df':pd.DataFrame(),
}


In [None]:
# only immutable (hashable) objects can be used as keys (strings, integers, tuples of immutable items)

my_dict = {
    
    1:143,
    'one':5653,
    (9,3):34534
}
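
As a sketch of what happens with a mutable key: a list cannot be hashed, so Python raises a `TypeError`, while the equivalent tuple is accepted. The variable names below are hypothetical.

```python
# a tuple is hashable, so it is accepted as a key
coords = {(9, 3): 34534}

# a list is mutable and unhashable, so it is rejected
try:
    bad = {[9, 3]: 34534}
    error_message = None
except TypeError as err:
    error_message = str(err)

print(coords[(9, 3)])
print(error_message)
```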

## Loading in Columns as Specific Data Types

To be efficient with memory, we can load our data with specific data types that take up less space. Passing a dictionary that maps column names to data types to the `dtype` argument of `pd.read_csv` does this.

In [None]:
cereal_data = pd.read_csv('data//cereal.csv')

cereal_data.info()

In [None]:
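# int8 only holds values from -128 to 127, so columns that can exceed
# that range (calories, sodium, potass) use int16 instead
dtype_dict = {
    
    'calories':'int16',
    'protein':'int8',
    'fat':'int8',
    'sodium':'int16',
    'sugars':'int8',
    'potass':'int16',
    'vitamins':'int8',
    'shelf':'int8',
    'fiber':'float16',
    'carbo':'float16'
}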

cereal_small = pd.read_csv('data//cereal.csv', dtype = dtype_dict)

cereal_small.info()
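
To see how much a smaller dtype saves, `memory_usage` can be compared before and after. The sketch below builds a small made-up frame rather than reading `cereal.csv`, so the column values are assumptions for illustration only. It also uses `astype`, which accepts the same kind of column-to-dtype dictionary.

```python
import numpy as np
import pandas as pd

# made-up values standing in for two cereal columns
demo = pd.DataFrame({
    'protein': np.arange(1000) % 7,          # small whole numbers
    'fiber': (np.arange(1000) % 15) * 0.5,   # multiples of 0.5 up to 7.0
})
demo = demo.astype({'protein': 'int64', 'fiber': 'float64'})

# convert with a dictionary of column name -> dtype
small = demo.astype({'protein': 'int8', 'fiber': 'float16'})

# total bytes used by the column data (index excluded)
bytes_before = demo.memory_usage(index=False).sum()
bytes_after = small.memory_usage(index=False).sum()
print(bytes_before, bytes_after)

# np.iinfo reports the range a dtype can hold, which is worth checking first
print(np.iinfo('int8').min, np.iinfo('int8').max)
```

Checking the range first matters because a value outside it cannot be stored correctly in the smaller type.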

## Advanced Group by

Dictionaries come in handy when a groupby needs a different operation for each column.

In [None]:
cereal_data.head()

In [None]:
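# what if we want the sum of calories and the mean of fat?
# passing a list applies every function to every selected column,
# so we get both the sum and the mean for calories and for fat

cereal_data.groupby('mfr')[['calories','fat']].agg(['sum','mean'])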

In [None]:
# a dictionary maps each column to its own aggregation
f = {
    
    'calories':'sum',
    'protein':'median',
    'fiber':'count'
    
}

cereal_grp = cereal_data.groupby('mfr').agg(f).reset_index()

cereal_grp
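
A dictionary value can also be a list when one column needs more than one aggregation. Here is a minimal sketch on a made-up frame; the manufacturers and numbers are invented, not taken from `cereal.csv`.

```python
import pandas as pd

toy = pd.DataFrame({
    'mfr': ['K', 'K', 'G', 'G', 'G'],
    'calories': [100, 120, 110, 90, 100],
    'fat': [1, 3, 2, 0, 1],
})

# each key is a column; each value is one function name or a list of them
agg_spec = {
    'calories': ['sum', 'max'],
    'fat': 'mean',
}

toy_grp = toy.groupby('mfr').agg(agg_spec)

# a list value produces two-level column labels such as ('calories', 'sum')
print(toy_grp)
```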

## Renaming columns

Now that we have our aggregated data, we might want to rename the columns so we don't confuse them with the columns in our original dataset. We can use another dictionary to do this!

In [None]:
rename_dict = {
    
    'calories':'calories_sum',
    'protein':'protein_median',
    'fiber':'mfr_item_count'
}

cereal_grp = cereal_grp.rename(columns = rename_dict)

cereal_grp
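
One thing to watch for: by default `rename` silently skips dictionary keys that do not match any column, so a typo goes unnoticed. The sketch below uses a made-up frame and passes `errors='raise'` to surface the mismatch.

```python
import pandas as pd

toy_grp = pd.DataFrame({'mfr': ['G', 'K'], 'calories': [300, 220]})

# 'calorie' is a deliberate typo: no such column exists
typo_dict = {'calorie': 'calories_sum'}

# default behaviour: nothing is renamed and no error is raised
unchanged = toy_grp.rename(columns=typo_dict)
print(list(unchanged.columns))

# errors='raise' turns the silent miss into a KeyError
try:
    toy_grp.rename(columns=typo_dict, errors='raise')
    caught = False
except KeyError:
    caught = True
print(caught)
```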

## Mapping values to create columns