This notebook is about pandas. It starts with a gentle introduction to the functions I found most useful, then moves on to window-function-like operations, and goes into a little more depth on indexes (including multi-indexes).

In [3]:
import pandas as pd
import numpy as np

## Creating a pandas dataframe.  

This can be done in different ways and from several different objects.
Also, a dataframe can have a simple or a multi index.

In [20]:
df = pd.DataFrame(np.random.normal(0,1,1000))

Here the result is not ideal, because the single column is labelled with the integer 0.  
So the column can only be selected with the integer label 0, and not with a string name such as "0".

In [21]:
df.head(1)

Unnamed: 0,0
0,0.746261


In [22]:
df.head(1)[0]

0    0.746261
Name: 0, dtype: float64

In [23]:
df.head(1)["0"]

KeyError: '0'
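
One way around this, as a minimal sketch, is to name the column when the dataframe is created, or to rename it afterwards. The column name "value" and the variable names below are only illustrative choices of mine.

```python
import numpy as np
import pandas as pd

# Name the column at construction time with the `columns` argument.
named_df = pd.DataFrame(np.random.normal(0, 1, 1000), columns=["value"])

# The column can now be selected by its string name.
first_value = named_df.head(1)["value"]

# Alternatively, rename the default integer label 0 after the fact.
renamed_df = pd.DataFrame(np.random.normal(0, 1, 1000)).rename(columns={0: "value"})
```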

We can see what happens if we pass a tuple, a dictionary or a list to define the variables.

In [30]:
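# Note: parentheses around a single list do not create a tuple,
# so this passes the plain list [10, 20, 12].
one_df_tuple = pd.DataFrame(
    ([10, 20, 12]
    )
)
one_df_tuple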

Unnamed: 0,0
0,10
1,20
2,12


In [33]:
# A real tuple holding two lists.
two_df_tuple = pd.DataFrame(
    ([10, 20, 12],
     [1, 2, 3]
    )
)
two_df_tuple

PandasError: DataFrame constructor not properly called!
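
If the intent was to get one row per inner list (this is my assumption), a small sketch that avoids the error above is to convert the outer tuple into a list before handing it to the constructor. The variable names are illustrative.

```python
import pandas as pd

rows = ([10, 20, 12], [1, 2, 3])

# A list of lists is read row by row: two rows, three columns.
two_rows_df = pd.DataFrame(list(rows))

# If the two lists were meant to be columns instead, a dictionary makes that explicit.
two_cols_df = pd.DataFrame({"first": rows[0], "second": rows[1]})
```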

In [46]:
one_df2_dict = pd.DataFrame(
    {"first":[10,20,12],
     "second":(1,2,3),
     "third":np.array([1,4,8]),
     "fourth":[("a", "b"),("c","d"),("e","f")],
     "fifth": pd.Series({"a":1,"b":2,"c":3}),      
     "sixth": pd.Series({"c":1,"b":2,"a":45})
    }
)
one_df2_dict

Unnamed: 0,fifth,first,fourth,second,sixth,third
a,1,10,"(a, b)",1,45,1
b,2,20,"(c, d)",2,2,4
c,3,12,"(e, f)",3,1,8


From the example above, a few things are worth noting:  
1. When a dictionary is passed to the pandas.DataFrame constructor, its keys become the column names. In the pandas version used here they are displayed in alphabetical order; more recent versions keep the insertion order of the dictionary instead.  
2. Almost any type of sequence can be used as a value of the dictionary, as long as all of them have the same length.  
3. Each value is converted into a pd.Series, and the series are then put together into a pd.DataFrame.  
4. If one of the values of the dictionary is a pd.Series with an index, that index becomes the index of the whole newly created dataframe.  
5. When several pd.Series are passed, pandas aligns them on their index labels rather than on position: sixth is declared in the order c, b, a, yet its values land on rows a, b, c. Because fifth and sixth share exactly the same labels, no row is lost or added.  
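
The way the two series are matched up can be checked in isolation. This is a small sketch that reuses only the fifth and sixth series from the cell above; the name aligned is mine.

```python
import pandas as pd

fifth = pd.Series({"a": 1, "b": 2, "c": 3})
sixth = pd.Series({"c": 1, "b": 2, "a": 45})  # same labels, different order

aligned = pd.DataFrame({"fifth": fifth, "sixth": sixth})

# Values are matched by label, so row "a" holds 1 and 45,
# regardless of the order in which sixth was declared.
row_a = aligned.loc["a"]
```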

In [45]:
one_df2_dict = pd.DataFrame(
    {"first":[10,20,12],
     "second":(1,2,3),
     "third":np.array([1,4,8]),
     "fourth":[("a", "b"),("c","d"),("e","f")],
     "fifth": pd.Series({"a":1,"b":2,"c":3}),
     "sixth": pd.Series({"c":1,"b":2,"z":3})
    }
)
one_df2_dict

ValueError: array length 3 does not match index length 4

As we can see above, if the pd.Series do not share exactly the same index labels, the constructor raises an error. Pandas builds the dataframe index from the union of the series indexes, which here is [a,b,c,z]. The dataframe would then have length 4, while the lists, the tuple and the array only have length 3 and carry no labels, so pandas does not know where to place their values.
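
To see that the unlabelled values are what trigger the error, here is a sketch that passes only the two series with mismatched labels. In that case pandas has a label for every value and fills the gaps with NaN. The name union_df is illustrative.

```python
import pandas as pd

fifth = pd.Series({"a": 1, "b": 2, "c": 3})
sixth = pd.Series({"c": 1, "b": 2, "z": 3})

# Only labelled data: the index is the union of the labels (a, b, c, z)
# and missing combinations become NaN.
union_df = pd.DataFrame({"fifth": fifth, "sixth": sixth})
```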

In [50]:
one_df2_list = pd.DataFrame(
    [1,2,3,4]
)
one_df2_list

Unnamed: 0,0
0,1
1,2
2,3
3,4


In [49]:
two_df2_list = pd.DataFrame(
    [[1,2,3,4], [2,4,6,8]]
)
two_df2_list

Unnamed: 0,0,1,2,3
0,1,2,3,4
1,2,4,6,8


A pandas dataframe also accepts an explicit declaration of the index. Let's take the example of one_df2_dict:

In [51]:
one_df2_dict = pd.DataFrame(
    {"first":[10,20,12],
     "second":(1,2,3),
     "third":np.array([1,4,8]),
     "fourth":[("a", "b"),("c","d"),("e","f")],
     "fifth": pd.Series({"a":1,"b":2,"c":3}),
     "sixth": pd.Series({"c":1,"b":2,"z":3})
    },
    index=["q", "w", "e"]
)
one_df2_dict

Unnamed: 0,fifth,first,fourth,second,sixth,third
q,,10,"(a, b)",1,,1
w,,20,"(c, d)",2,,4
e,,12,"(e, f)",3,,8


There are two interesting behaviours here:
1. The index is created first, and the data are populated afterwards.  
  i. This means that when the pandas.Series fifth and sixth are inserted, none of their index labels (a, b, c and c, b, z) matches a label in the new index. Their columns are therefore filled with NaNs (not a number).
  ii. We will see below how to avoid this.  
2. The index values are now [q,w,e].
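
As a quick check of point 1, the sketch below contrasts a labelled series with the same values stripped of their labels. Dropping the labels with to_numpy() is just one possible workaround that I am assuming for illustration, and it is not necessarily the approach used later in this notebook.

```python
import pandas as pd

fifth = pd.Series({"a": 1, "b": 2, "c": 3})
new_index = ["q", "w", "e"]

# Labelled series: none of a, b, c is in the new index, so the column is all NaN.
with_labels = pd.DataFrame({"first": [10, 20, 12], "fifth": fifth}, index=new_index)

# Bare values: they are placed by position, like the list in "first".
by_position = pd.DataFrame(
    {"first": [10, 20, 12], "fifth": fifth.to_numpy()}, index=new_index
)
```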