# Jupyter Notebook Example using Pandas

This early notebook has far less Markdown commentary than later notebooks; the intention is to show how to use Pandas through examples. Read the comments in the code itself for guidance.
### You can skip the block below; it was covered in lesson 002.

In [1]:
#Loading all the Series commands again, since DataFrame builds on Series
import numpy as np
import pandas as pd

#Series is a one-dimensional labeled array capable of holding any data type 
series = pd.Series([1,2,3,4,5,'red','green','blue',6,7,8,9])
print(series)
#If data is an ndarray, index must be the same length as data. If no index is passed, one will be created
series=pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(series)
print(series.index)
print(series.iloc[0])  #explicit positional access; series[0] is deprecated when the index holds labels
print(series[:])
#notice that a series can be created from a classic (key=value pair) dictionary
d = {'b': 1, 'a': 0, 'c': 2}
series=pd.Series(d)
print(series)
print(series["b"])
#Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
#If data is an ndarray, index must be the same length as data. If no index is passed, one will be created
series=pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print("Full array")
print(series)
print ("Just the first index")
print("    When directly indexed the 'index' is not included.")
print(series.iloc[0])  #explicit positional access; series[0] is deprecated when the index holds labels
print(" All values up to element #3")
print(series[:3])
print ("Only those values greater than the median")
print(series[series > series.median()])
print("Integrate with numpy and calculate the exponent, notice Numpy integration")
print(np.exp(series))
#Series data type operations
print(series.dtype)
#Get the actual array in a series, maybe for direct manipulation
print(series.array)
print("My series dimensions are: ",series.ndim)
print("My series size is:", series.size)
print("My series shape is:", series.shape)
print("--------------------------------------")
#.values returns the underlying NumPy array; .to_numpy() (last line) is the recommended way to get it
my_array=series.values
print("My array dimensions are: ",my_array.ndim)
print("My array size is:", my_array.size)
print("My array shape is:", my_array.shape)
#iterate over the values directly rather than by position
for value in my_array:
    print(value)
series.to_numpy()
#dictionary type structure example
print("Key 'a' access:",series['a'])
print("Example of a bad key request for 'z' with a check:", 'z' in series)
print ("or")
print ("Key 'z' access with a .get:", series.get('z'))
print ("or perhaps more elegant")
print("Key 'z' access with a .get and return for failure:", series.get('z','Not found'))
#vector manipulations
add_series=series+series
print("Series added to itself:\n", add_series)
multiply_series=series * 2
print("Series multiplied by 2:\n", multiply_series)
#Series attribution
print(series.name)
#rename returns a new Series rather than modifying in place, so reassign the result
series = series.rename("My Example Series")
print(series.name)

0         1
1         2
2         3
3         4
4         5
5       red
6     green
7      blue
8         6
9         7
10        8
11        9
dtype: object
a   -0.933593
b    0.353207
c    0.176636
d    1.337113
e   -0.837506
dtype: float64
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
-0.9335933305385431
a   -0.933593
b    0.353207
c    0.176636
d    1.337113
e   -0.837506
dtype: float64
b    1
a    0
c    2
dtype: int64
1
Full array
a    0.526326
b    0.578714
c   -0.170463
d   -1.434767
e   -0.733055
dtype: float64
Just the first index
    When directly indexed the 'index' is not included.
0.5263256026830322
 All values up to element #3
a    0.526326
b    0.578714
c   -0.170463
dtype: float64
Only those values greater than the median
a    0.526326
b    0.578714
dtype: float64
Integrate with numpy and calculate the exponent, notice Numpy integration
a    1.692701
b    1.783743
c    0.843274
d    0.238171
e    0.480439
dtype: float64
float64
<PandasArray>
[  0.5263256026830322,  
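
A few Series behaviours from the cell above are easy to trip over, so here is a minimal sketch of them in isolation. The values are fixed (not random) purely so the results are reproducible, and the variable names are illustrative rather than taken from the lesson.

```python
import pandas as pd

# Fixed illustrative values so the results are reproducible.
series = pd.Series([0.5, -1.0, 2.0], index=["a", "b", "c"])

# Be explicit about what kind of lookup you want.
first_by_position = series.iloc[0]   # position-based: the first element
first_by_label = series.loc["a"]     # label-based: the element labeled 'a'

# .get returns a default instead of raising KeyError for a missing label.
missing = series.get("z", "Not found")

# rename returns a new Series; the original keeps its name (None here).
renamed = series.rename("My Example Series")

print(first_by_position, first_by_label, missing)
print(series.name, "->", renamed.name)
```

Using `.iloc` and `.loc` makes the intent clear even when the index labels happen to be integers.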

# DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.
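
As a quick illustration of "columns of potentially different types", the sketch below builds a small DataFrame from a plain dictionary of lists and inspects each column's dtype. The column names and values are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical columns, one per data type, for illustration only.
df = pd.DataFrame({
    "count": [1, 2, 3],             # integers
    "ratio": [0.5, 1.5, 2.5],       # floats
    "is_odd": [True, False, True],  # booleans
})

# Each column keeps its own dtype, much like typed columns in a SQL table.
print(df.dtypes)
print(df.shape)
```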

In [2]:
#DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object.

print("Define a dictionary of Series, one holding integers and one holding floats.")
print("################################################################################################################")
my_dictionary = {'array_one': pd.Series([1,2,3,4,5,6,7,8,9]),
                 'array_two': pd.Series([1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0])}
print("")
print("Transform that dictionary of Series into a DataFrame")
my_dataframe=pd.DataFrame(my_dictionary)

print("################################################################################################################")
print("Simply calling print on the data frame shows the contents as two columns, one for each dictionary element")
print(my_dataframe)
print("")
print("Note that if the Series are not the same length, their indexes are unioned and missing values are padded with NaN.")


Define a dictionary of Series, one holding integers and one holding floats.
################################################################################################################

Transform that dictionary of Series into a DataFrame
################################################################################################################
Simply calling print on the data frame shows the contents as two columns, one for each dictionary element
   array_one  array_two
0          1        1.0
1          2        2.0
2          3        3.0
3          4        4.0
4          5        5.0
5          6        6.0
6          7        7.0
7          8        8.0
8          9        9.0

Note that if the Series are not the same length, their indexes are unioned and missing values are padded with NaN.
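
The note about unequal lengths is not demonstrated by the cell itself, so here is a minimal sketch of it. The names `short` and `long` are made up for this example.

```python
import pandas as pd

short = pd.Series([1, 2, 3])                  # index 0..2
long = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # index 0..4

# The resulting index is the union of both indexes (0..4);
# rows that 'short' has no value for are filled with NaN.
df = pd.DataFrame({"short": short, "long": long})
print(df)

# Because NaN is a float, the integer column is converted to float64.
print(df.dtypes)
```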


In [3]:
#column selection, addition, deletion
print(my_dataframe['array_one'])

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
Name: array_one, dtype: int64
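
Selecting with a single label, as above, gives back a Series. A small sketch of the difference between single and double brackets follows; it rebuilds a shortened copy of the frame so it can run on its own.

```python
import pandas as pd

# Shortened stand-in for my_dataframe so this runs on its own.
df = pd.DataFrame({"array_one": [1, 2, 3], "array_two": [1.0, 2.0, 3.0]})

as_series = df["array_one"]    # single label -> Series
as_frame = df[["array_one"]]   # list of labels -> one-column DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
```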


In [4]:
#pretty display
my_dataframe

   array_one  array_two
0          1        1.0
1          2        2.0
2          3        3.0
3          4        4.0
4          5        5.0
5          6        6.0
6          7        7.0
7          8        8.0
8          9        9.0


In [5]:
print("Adding a column is as simple as declaring another dictionary element")
print("################################################################################################################")
my_dataframe['array_three']= my_dataframe['array_one'] * my_dataframe['array_two']
print("")
print("After multiplying array_one by array_two, you should see a squared floating-point result")
print("################################################################################################################")
print(my_dataframe)
print("")
print("Now we show algorithmic decisions for a 'flag' column for values that aren't even")
print("################################################################################################################")
my_dataframe['flag']=my_dataframe['array_three'] % 2 != 0
print(my_dataframe)
print("")
print("Now we remove the 'flag' column")
print("################################################################################################################")
my_dataframe.pop('flag')

Adding a column is as simple as declaring another dictionary element
################################################################################################################

After multiplying array_one by array_two, you should see a squared floating-point result
################################################################################################################
   array_one  array_two  array_three
0          1        1.0          1.0
1          2        2.0          4.0
2          3        3.0          9.0
3          4        4.0         16.0
4          5        5.0         25.0
5          6        6.0         36.0
6          7        7.0         49.0
7          8        8.0         64.0
8          9        9.0         81.0

Now we show algorithmic decisions for a 'flag' column for values that aren't even
################################################################################################################
   array_one  array_two  arra

0     True
1    False
2     True
3    False
4     True
5    False
6     True
7    False
8     True
Name: flag, dtype: bool
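
The output above is the removed column itself, because `pop` returns what it deletes. Here is a short sketch comparing three common ways to remove a column, using a small made-up frame.

```python
import pandas as pd

# Small made-up frame for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# pop removes the column in place and returns it as a Series.
popped = df.pop("c")

# drop returns a new DataFrame; the original is left unchanged.
without_b = df.drop(columns=["b"])
columns_after_drop = list(df.columns)

# del removes the column in place and returns nothing.
del df["b"]

print(list(popped), list(without_b.columns), columns_after_drop, list(df.columns))
```

Reach for `pop` when you still need the removed values, and for `drop` when you want to keep the original frame intact.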

### Subsequent notebooks will use DataFrame to load data, merge data, plot data, output data, and perform queries on your data.