# Python Data Storage Objects

#### PHYS 200: Modeling and Simulation, Mike Augspuger

Storing and manipulating data is a large part of the coding work in simulations. Python is particularly flexible here: many storage objects treat data of every type simply as `objects`, so a single data storage object can hold many different kinds of "things".

This document is a reference sheet for different approaches to storing and manipulating data as you read and write code.

## Foundational Data Types

The "things" (that is, `objects`) that can be stored within data storage objects range widely.  Here is a list of some of the most common:

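* `str`:  A string is a sequence of characters stored together as text (like a word). It has no numerical value, even if its characters are digits: `"42"` is text, not a number.
* `int`:  An integer (whole number), such as `3` or `-7`.
* `float`:  A floating-point number, that is, a value with a decimal part, such as `2.5`.
* `bool`: A Boolean value is either `True` or `False` (in arithmetic, these behave as `1` and `0`).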
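A quick way to check which of these types a value has is the built-in `type()` function. The short sketch below (the variable names and values are arbitrary, chosen only for illustration) also shows that a `bool` acts like `1` or `0` in arithmetic, and that a string of digits stays text until you convert it.

```python
# Check the type of a few example values with the built-in type() function
word = "oranges"    # str
count = 3           # int
mass = 2.5          # float
is_moving = True    # bool

print(type(word), type(count), type(mass), type(is_moving))
# <class 'str'> <class 'int'> <class 'float'> <class 'bool'>

# Booleans behave like 1 and 0 in arithmetic
print(True + True)   # 2
print(False * 10)    # 0

# A string of digits is text, not a number, until you convert it
digits = "42"
print(digits + digits)             # 4242  (string concatenation)
print(int(digits) + int(digits))   # 84    (integer addition)
```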

Beyond these basic data types, data storage objects can also store *other* data storage objects. For example, a `list` can hold other `lists`, and a single list can mix different data types.

*As you read this, remember that a `data object` can be not only one of the common data types listed above (`str`, `int`, etc.), but also a data storage object itself.*
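For example, the list below (a made-up "record" used only for illustration) holds a string, a float, a tuple, and another list. Indexing twice reaches inside a nested object, and a list of lists can serve as a simple table.

```python
# A list can hold objects of different types, including other storage objects
record = ["run_1", 0.01, (0.0, 1.0), [2.0, 3.5, 4.1]]

print(type(record[2]))   # <class 'tuple'>
print(record[3][1])      # 3.5  (second item of the inner list)

# A list of lists works like a simple table: one inner list per row
table = [[1, 2, 3],
         [4, 5, 6]]
print(table[1][2])       # 6  (row 1, column 2)
```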

## Tuple

* A `tuple` is a simple sequence of data objects.  
* It is a built-in Python type (that is, it is always available)
* It is *immutable*, which means you cannot change its values.  If you create a `tuple` that is `(5,6,7)`, it will remain that unless you create a completely new `tuple` and give it the same name.
* Tuples are useful to store simple, unchanging values.  Because they are immutable, they offer protection against accidental over-writing (your program will give you an error message if you try to change a value).
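* A tuple is written with parentheses `( )`. Strictly, it is the commas that make a tuple, so the parentheses can often be left out, as in the second creation example below.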
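The immutability described above can be seen directly. In this minimal sketch (the `position` values are illustrative), assigning to a tuple item raises a `TypeError`, while "changing" the tuple really means building a new one and reusing the name. It also shows why a one-item tuple needs a trailing comma.

```python
# Trying to change an item in a tuple raises a TypeError
position = (0.0, 1.5, 3.2)

try:
    position[0] = 9.9
except TypeError as err:
    print("Error:", err)   # Error: 'tuple' object does not support item assignment

# "Changing" a tuple really means building a new one and reusing the name
position = (9.9,) + position[1:]
print(position)            # (9.9, 1.5, 3.2)

# A one-item tuple needs a trailing comma; without it you just get a number
single = (5,)
not_a_tuple = (5)
print(type(single), type(not_a_tuple))   # <class 'tuple'> <class 'int'>
```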

```python
# CREATING tuples

# This tuple has 4 items: two integers, a string, and a tuple
tuple1 = (5,6,"oranges",(8,9,10))
# OR
tuple1 = 5,6,"oranges",(8,9,10)
```

```python
# ACCESSING data in tuples

# Important: the rules here work for most
# sequence-type data storage objects

tuple1[0]      # returns the first item in the tuple --> 5
tuple1[2]      # returns the third item --> "oranges"
tuple1[-1]     # returns the last item --> (8,9,10)
tuple1[3][0]   # returns the first item in the fourth item --> 8

# Accessing more than one value at a time is called 'slicing':
tuple1[:2]     # returns items at indexes 0 and 1 --> (5,6)
tuple1[1:3]    # returns items at indexes 1 and 2 --> (6,"oranges")

# Assigning all the values to separate variables in one line is called 'unpacking'
var1, var2, var3, var4 = tuple1   # var1=5, var3="oranges", etc.

# CHANGING a value in a tuple
# tuple1[2] = "apples"   # raises a TypeError! Tuple values are immutable
```

```python
# MANIPULATING data with tuples

# Tuples support very few operations: they are not meant to be
# used for data manipulation.
# The functions and methods here also work with other sequence-like
# data storage objects

len(tuple1)     # returns the length of the tuple --> 4
type(tuple1)    # returns the data type --> <class 'tuple'>
type(tuple1[2]) # returns the data type of the indexed item --> <class 'str'>
tuple1.index('oranges')  # returns the index of that item --> 2
tuple2 = tuple1 + (4,5)  # creates a new tuple (5,6,"oranges",(8,9,10),4,5)
```

## Lists

* A `list` is a simple sequence that is a built-in type.
* Lists are mutable: their items can be changed, added, and removed.
* Lists are more flexible than tuples for data manipulation.
* A list is represented by square brackets: `[ ]`
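Because lists are mutable, a common pattern in simulations is to start with an empty list and `append` one value per time step. The sketch below assumes a simple constant-velocity update; the names `dt`, `v`, and `positions` and their values are illustrative, not taken from the course code.

```python
# Record positions of an object moving at constant velocity
dt = 0.1          # time step (s), illustrative value
v = 2.0           # velocity (m/s), illustrative value
x = 0.0           # starting position (m)

positions = []    # empty list to fill as the simulation runs
for step in range(5):
    positions.append(x)   # store the current position
    x = x + v * dt        # update position for the next step

print(len(positions))     # 5
positions[0] = -1.0       # list items can be overwritten (a tuple would raise TypeError)
```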

```python
# CREATING lists
list1 = [5,6,"oranges",[8,9,10]]
```

```python
# ACCESSING and CHANGING data in lists
# Use the same indexing rules that you would for a tuple,
# but changing values is possible
list1[2] = "apples"   # replaces "oranges" with "apples"
```

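```python
# MANIPULATING data in lists
# These are some common tools, but look up "Python Lists" to find more
list1 = [5,6,"oranges",[8,9,10]]   # start fresh for this example
list2 = [12,13]                    # example second list
list1.append(11)         # adds item --> [5,6,"oranges",[8,9,10],11]
lastitem = list1.pop()   # removes last item and returns it --> 11
list1.extend(list2)      # appends members of list2 to end of list1
list1.remove("oranges")  # removes the first matching item from the list
list1.insert(1,"apples") # inserts item at given index: insert(index, item)
# sort() sorts a list in place, alphabetically or numerically; all items
# must be comparable, so a list mixing numbers, strings, and lists
# (like list1) raises a TypeError
numbers = [3,1,2]
numbers.sort()           # --> [1,2,3]
```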

## Dictionaries

* A `dictionary` (type `dict`) is a Python built-in type that maps a set of 'keys' to a set of 'values'.
* Keys must be immutable types (such as strings, numbers, or tuples), but the values can be changed. Values can be any data type (including `dictionaries`).
* In newer versions of Python (3.7+), dictionaries are *ordered*: they keep their keys in the order they were inserted. But they are not generally used when order is important: they are more like a, um, dictionary, that lets you look up the 'definition' of a key 'word'.
* Dictionaries are used primarily to store sets of mostly unchanging variables. They are very useful for storing system parameters, but limited in usefulness for storing data.
* Dictionaries are represented by curly braces: `{ }`
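As an example of the "system parameters" use mentioned above, the sketch below stores a few simulation parameters by name. The parameter names and values are illustrative assumptions. It also shows checking for a key with `in` and using `.get()` to supply a default instead of raising an error for a missing key.

```python
# Store simulation parameters by name (values are illustrative)
params = {'g': 9.81, 'mass': 0.5, 'dt': 0.01}

weight = params['mass'] * params['g']   # look up values by key
print(weight)                           # 4.905

params['dt'] = 0.001                    # values can be changed
print('drag' in params)                 # False: check whether a key exists
drag = params.get('drag', 0.0)          # .get() returns a default instead of raising KeyError
print(drag)                             # 0.0
```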

```python
# CREATING dictionaries
dict1 = {'cheese': 3.45, 'milk': 2.95}
dict1 = dict(cheese=3.45,milk=2.95)

# This creates a dictionary whose values are a tuple and a list
dict2 = dict(oranges=(4,5,6),pears=[7,8,9])
```

```python
# ACCESSING and CHANGING data in dictionaries
dict1['butter'] = 4.65     # adds a new key-value pair
item = dict1['milk']       # returns the value for that key --> 2.95
# item = dict1[1]          # raises a KeyError --> dictionaries are looked up
                           # by key, not by numerical position
```

```python
# MANIPULATING data in dictionaries
list_keys = dict1.keys()   # returns a view of the dictionary keys
list_vals = dict1.values() # returns a view of the dictionary values
# Use list(dict1.keys()) if you need an actual list
```
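`keys()` and `values()` return *view* objects rather than lists; wrapping them in `list()` gives an ordinary list. The sketch below rebuilds `dict1` with the three items used in the cells above so it runs on its own, and also shows `.items()`, which yields key-value pairs for looping.

```python
dict1 = {'cheese': 3.45, 'milk': 2.95, 'butter': 4.65}

print(list(dict1.keys()))     # ['cheese', 'milk', 'butter']
print(list(dict1.values()))   # [3.45, 2.95, 4.65]

# .items() gives (key, value) pairs, handy for looping
for item, price in dict1.items():
    print(item, price)

total = sum(dict1.values())   # views can be passed straight to functions like sum()
```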

## Pandas Series

* A `pd.Series` is a one-dimensional array with index labels. It is much like a two-column table, where one column holds the labels and the other holds the values.
* In a sense, a Series is similar to a `dictionary`: every value in the series is associated with a key/label. But Series are easier to manipulate than `dictionaries` and come with a host of useful methods and attributes, so they are better suited to storing data sets.
* The values in a series are accessible using both the label and the integer position (index number).
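The two ways of reaching a value correspond to two pandas accessors: `.loc` looks up by label and `.iloc` by integer position. In the sketch below the trial names and values are illustrative. Note that label slices include their end label, while position slices exclude the end position, as with ordinary Python sequences.

```python
import pandas as pd

# A Series of illustrative measurements labeled by trial name
s = pd.Series([10.0, 20.5, 30.5], index=['trial_a', 'trial_b', 'trial_c'])

print(s.loc['trial_b'])    # 20.5  (look up by label)
print(s.iloc[1])           # 20.5  (look up by integer position)
print(s.iloc[-1])          # 30.5  (negative positions count from the end)

# Slicing by label includes the end label; slicing by position excludes it
print(s.loc['trial_a':'trial_b'].tolist())   # [10.0, 20.5]
print(s.iloc[0:1].tolist())                  # [10.0]
```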

```python
import numpy as np
import pandas as pd

# CREATING a pd.Series
# To create an empty series, insert an empty list as the data.
# Assign the dtype 'object' unless you know that you want to use
# float values, in which case, assign np.float64
series1 = pd.Series([],dtype=object)

# Creating a Series from a dictionary
series1 = pd.Series(dict1)
series1 = pd.Series(dict(para1=10.0, para2=20.5))
series1 = pd.Series({'para1':10.0, 'para2': 20.5})

# Create a Series from two arrays or lists
# The 'index' is the list of labels.  Both arrays should be the same length.
array1 = np.array([1.5, 2.5])   # example values
array2 = np.array(['a', 'b'])   # example labels
series2 = pd.Series(data=array1,index=array2)

# Create a Series from the values of 2 Series (both must be the same length)
series3 = pd.Series(data=series1.values,index=series2.values)
```

```python
# ACCESSING or CHANGING data in a Series

var1 = series1['para1']     # returns the value for that label --> 10.0
var1 = series1.para1        # same as above --> 10.0
var1 = series1.iloc[1]      # finds a value by position --> 20.5
# series1[1] also used to work, but integer positions inside [ ] are
# deprecated in recent pandas versions: use .iloc instead

series3 = pd.concat([series1, series2])  # combines s1 and s2 into a new Series
# (Series.append() did this in older pandas but was removed in pandas 2.0)

series1.loc['para3'] = 30.5 # assigning to a new label adds it to the series

array1 = series1.values     # returns an np.ndarray of the values
labels = series1.index      # returns a pandas Index object holding the labels
```

```python
import numpy as np
import pandas as pd

series1 = pd.Series(dict(para1=10.0, para2=20.5))
series2 = pd.Series(dict(para3=80.0, para4=30.5))
series3 = pd.concat([series1, series2])   # Series.append() was removed in pandas 2.0
series3
```

```python
# MANIPULATING data in a Series
# There are a ton of associated methods.  But here are some
# interesting and/or helpful ones.  Look them up to find details.

series1.mean()     # finds the mean of the values. Also max(), median()...
series1.plot()     # plots the series with the index on the x-axis (needs matplotlib)
series1.head()     # returns the first rows of the Series: also tail()
series1.to_excel('series1.xlsx')  # writes the Series to an Excel sheet
                                  # (example filename; needs the openpyxl package)
series1.to_dict()  # converts to a dictionary.  Also to_list()
series1.interpolate()  # uses interpolation to fill in NaN values
```
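To see two of these methods working together, the sketch below uses a made-up Series with one missing value (`NaN`). By default `mean()` skips the missing value, and `interpolate()` fills the gap linearly from its neighbors.

```python
import numpy as np
import pandas as pd

# Illustrative readings with a missing value (NaN) in the middle
readings = pd.Series([1.0, np.nan, 3.0, 4.0])

print(readings.mean())            # about 2.667: NaN is skipped by default
filled = readings.interpolate()   # linear interpolation fills the gap
print(filled.tolist())            # [1.0, 2.0, 3.0, 4.0]
print(filled.max(), filled.median())   # 4.0 2.5
```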

## Pandas DataFrame

* A `DataFrame` is much like a `Series`, but it has more than one column of values.  In fact, each column in a `DataFrame` is a `Series`.
* A `DataFrame` represents a table of data, with one index column of labels, and multiple columns of values.
* Many of the methods for `Series` are applicable to `DataFrames`.  Like `Series`, they are very useful for storing and manipulating data.
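A quick check confirms that pulling one column out of a `DataFrame` gives a `Series`, and that Series-style methods like `mean()` work column by column. The table below (time, position, velocity) is illustrative.

```python
import pandas as pd

# Illustrative table: time, position, and velocity columns
frame = pd.DataFrame({'time': [0.0, 0.1, 0.2],
                      'position': [0.0, 0.2, 0.4],
                      'velocity': [2.0, 2.0, 2.0]})

col = frame['position']
print(type(col))              # a pandas Series
print(frame.shape)            # (3, 3): 3 rows, 3 columns

# Series methods such as mean() work column by column on a DataFrame
column_means = frame.mean()   # a Series: one mean per column label
print(column_means)
```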

```python
# CREATING a pd.DataFrame

# An empty DataFrame
empty_frame = pd.DataFrame([],columns=['position','velocity'])

# A populated DataFrame
frame1 = pd.DataFrame({'col1': [1,2], 'col2': [3,4]})  # from dictionary
frame1 = pd.DataFrame(dict(col1=[1,2],col2=[3,4]))     # from dictionary
# from np.ndarray
frame1 = pd.DataFrame(np.array([[1,2],[3,4],[5,6]]),columns=['a','b'])
```

```python
# ACCESSING and CHANGING data in a pd.DataFrame
series1 = frame1.iloc[1]   # returns the second row of frame1 as a Series
series1 = frame1['a']      # returns the column named 'a' as a Series
                           # (frame1.a also works for simple column names)

# Adding a row to a DataFrame that was created as empty:
# assigning to a new row label with .loc adds that row
empty_frame.loc[0] = [0.0, 1.5]   # example row: position=0.0, velocity=1.5

# Generally the rules for accessing and manipulating DataFrames
# are similar to those for Series.  Check the DataFrame data
# sheet for particulars (search for "Pandas DataFrame")
```
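As an illustration of filling an empty `DataFrame` row by row, the sketch below records a falling object's position and velocity at each time step. The update rule (velocity first, then position with the new velocity) and the values of `g` and `dt` are assumptions made for the example. Adding rows one at a time with `.loc` is simple but gets slow for large tables; collecting rows in a list and building the `DataFrame` once at the end is usually faster, so both options are shown.

```python
import pandas as pd

g = -9.81          # acceleration (m/s^2), illustrative
dt = 0.1           # time step (s), illustrative
x, v = 10.0, 0.0   # starting position (m) and velocity (m/s)

# Option 1: start empty and add one row per step with .loc
motion = pd.DataFrame([], columns=['position', 'velocity'])
for i in range(3):
    motion.loc[i] = [x, v]    # new row labeled i
    v = v + g * dt            # update velocity
    x = x + v * dt            # update position with the new velocity

# Option 2: collect rows in a list, then build the DataFrame once
x, v = 10.0, 0.0
rows = []
for i in range(3):
    rows.append([x, v])
    v = v + g * dt
    x = x + v * dt
motion2 = pd.DataFrame(rows, columns=['position', 'velocity'])

print(motion.shape, motion2.shape)   # (3, 2) (3, 2)
```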

## NumPy ndarrays

* NumPy ndarrays are a data storage object found in the NumPy library. They are the core building block of NumPy, which is the premier library for numerical work in Python.
* ndarrays ('n-dimensional arrays') are highly manipulable, efficient (they use little memory and run computations quickly), and support a wide range of computational methods.
* We don't use these a lot in this course, but they are good to know about (and they also underlie much of the Pandas library).
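Much of that efficiency and computational support comes from *vectorized* operations: arithmetic on an ndarray acts on every element at once, whereas the same operator on a list can do something quite different. The sketch below contrasts the two; the free-fall formula is only an illustrative use.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

print(values_list * 2)     # [1, 2, 3, 1, 2, 3]  (list repetition)
print(values_array * 2)    # [2 4 6]  (element-by-element multiplication)

# Math applies to every element at once, with no loop needed
t = np.linspace(0.0, 1.0, 5)       # 5 evenly spaced times from 0 to 1 (s)
y = 0.5 * 9.81 * t**2              # fall distance at each time (m)
print(t)                           # [0.   0.25 0.5  0.75 1.  ]
print(values_array.sum(), values_array.mean())   # 6 2.0
```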

```python
# Creating an np.array
array1 = np.array([1,2,3,4,5,6])     # 1D array
array2 = np.array([[1,2,3],[4,5,6]])   # 2D array
```

```python
# ACCESSING data in an np.array
# Indexing works, but you have to get used to the conventions
grades = np.array([[93,95],[84,100],[99,87]])
grades.shape     # returns (3,2) --> 3 rows, 2 columns
grades[1,0]      # returns value in 2nd row, first column --> 84
grades[-1,1]     # returns value in last row, 2nd column --> 87
grades[0]        # returns first row --> [93,95]

# There are a bunch of mathematical tools that you can use with np.ndarrays
# but I will not go into them here.  Just be aware that this is a
# powerful computational tool that underlies a lot of scientific
# work in Python.
```
