This notebook uses a small example dataset to build a basic understanding of how Series and DataFrame arithmetic works in pandas.


In [2]:
%matplotlib inline
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

First, we create two Series.

In [9]:
np.random.seed(8)
s1 = pd.Series(np.random.randn(5))
s1

0    0.091205
1    1.091283
2   -1.946970
3   -1.386350
4   -2.296492
dtype: float64

In [10]:
s2 = pd.Series(np.random.randn(5))
s2

0    2.409834
1    1.727836
2    2.204556
3    0.794828
4    0.976421
dtype: float64

In [11]:
combine = pd.concat([s1, s2])
combine

0    0.091205
1    1.091283
2   -1.946970
3   -1.386350
4   -2.296492
0    2.409834
1    1.727836
2    2.204556
3    0.794828
4    0.976421
dtype: float64

As we can see, this is not the cleanest way to combine the two Series: the index labels repeat, so looking up label 0 returns both values stored under 0, which could be problematic for analysis.

In [12]:
combine[0]

0    0.091205
0    2.409834
dtype: float64
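A short sketch of why the lookup above returns two rows. It rebuilds the same Series and contrasts label-based access (`.loc`) with position-based access (`.iloc`); the variable names `has_unique_labels`, `rows_labelled_zero`, and `first_element` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the two Series and their concatenation from the cells above.
np.random.seed(8)
s1 = pd.Series(np.random.randn(5))
s2 = pd.Series(np.random.randn(5))
combine = pd.concat([s1, s2])

# Each label 0..4 now appears twice, so the index is no longer unique.
has_unique_labels = combine.index.is_unique

# .loc selects by label and returns every row labelled 0 (two rows here).
rows_labelled_zero = combine.loc[0]

# .iloc selects by position and always returns a single element.
first_element = combine.iloc[0]

print(has_unique_labels)
print(len(rows_labelled_zero))
print(first_element)
```

Positional access sidesteps the ambiguity, but it leaves the duplicate labels in place.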

Instead, we can replace the index with a fresh run of consecutive integers:

In [14]:
# len() counts every row; count() skips NaN and would give the wrong length if any were present
combine.index = range(len(combine))
combine

0    0.091205
1    1.091283
2   -1.946970
3   -1.386350
4   -2.296492
5    2.409834
6    1.727836
7    2.204556
8    0.794828
9    0.976421
dtype: float64
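The same renumbering can also be requested at concatenation time. The sketch below assumes the two Series from the earlier cells and uses the `ignore_index` argument of `pd.concat`, with `reset_index(drop=True)` as the equivalent for a Series that already exists; the names `combined` and `renumbered` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(8)
s1 = pd.Series(np.random.randn(5))
s2 = pd.Series(np.random.randn(5))

# ignore_index=True discards the original labels and numbers the result 0..n-1.
combined = pd.concat([s1, s2], ignore_index=True)

# reset_index(drop=True) does the same for an already concatenated Series.
renumbered = pd.concat([s1, s2]).reset_index(drop=True)

print(combined.index.tolist())
```

Either form avoids assigning to `.index` by hand, so the new index cannot end up with the wrong length.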

We can also add our two Series:

In [15]:
s1 + s2

0    2.501039
1    2.819119
2    0.257586
3   -0.591522
4   -1.320070
dtype: float64

Or subtract them:

In [16]:
s1 - s2

0   -2.318630
1   -0.636553
2   -4.151527
3   -2.181177
4   -3.272913
dtype: float64

Both results are what we would expect. However, things change when the Series or DataFrames have different indexes.

In [18]:
s2.index = list(range(3, 8))
s2

3    2.409834
4    1.727836
5    2.204556
6    0.794828
7    0.976421
dtype: float64

In [19]:
s1 + s2

0         NaN
1         NaN
2         NaN
3    1.023485
4   -0.568655
5         NaN
6         NaN
7         NaN
dtype: float64

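pandas aligns on index labels, so only labels 3 and 4, which exist in both Series, produce a value; every other label yields NaN. To treat the missing entries as 0 instead, we reindex both Series onto a common index and fill the gaps with 0: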

In [21]:
s1.reindex(range(10), fill_value=0) + s2.reindex(range(10), fill_value=0)


0    0.091205
1    1.091283
2   -1.946970
3    1.023485
4   -0.568655
5    2.204556
6    0.794828
7    0.976421
8    0.000000
9    0.000000
dtype: float64
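A more compact alternative to reindexing both sides is the `fill_value` argument of `Series.add`. The sketch below uses small hand-picked values (not the random data above) so the arithmetic is easy to check; the names `left`, `right`, and `total` are illustrative.

```python
import pandas as pd

# Illustrative values: left is labelled 0..4, right is labelled 3..7.
left = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
right = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=range(3, 8))

# A label missing from one side is treated as 0 before adding.
total = left.add(right, fill_value=0)

print(total)
```

Unlike `reindex(range(10), fill_value=0)`, this only covers labels that appear in at least one Series, so the result here stops at label 7 and has no rows for 8 and 9.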

Great, now let's try some multiplication:

In [22]:
s1 = pd.Series(range(1, 4), index=['a', 'a', 'c'])
s1

a    1
a    2
c    3
dtype: int64

In [23]:
s2 = pd.Series(range(1, 4), index=['a', 'a', 'b'])
s2

a    1
a    2
b    3
dtype: int64

In [24]:
s1 * s2

a    1.0
a    2.0
a    2.0
a    4.0
b    NaN
c    NaN
dtype: float64

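When labels are duplicated, multiplication pairs every 'a' row of s1 with every 'a' row of s2, which amounts to a Cartesian product of the rows that share a label (2 x 2 = 4 rows for 'a'). Labels that appear in only one Series ('b' and 'c') yield NaN, and the result becomes float64 because NaN cannot be stored in an integer Series.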
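The pairing on duplicated labels can be checked directly. This sketch rebuilds the two Series under the illustrative names `left` and `right` and counts the rows produced for each label.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "a", "c"])
right = pd.Series([1, 2, 3], index=["a", "a", "b"])

product = left * right

# Label "a": 2 rows on the left x 2 rows on the right = 4 result rows.
rows_for_a = product[product.index == "a"]

# Labels "b" and "c" each exist on one side only, so they become NaN.
unmatched = int(product.isna().sum())

print(len(rows_for_a), unmatched)
```

If one row per label is what you need, it is usually safer to aggregate or deduplicate the index before doing arithmetic.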

We can also do math with DataFrames:

In [79]:
df1 = pd.DataFrame(data=np.arange(1, 50, 5))
df1

    0
0   1
1   6
2  11
3  16
4  21
5  26
6  31
7  36
8  41
9  46


In [80]:
df2 = pd.DataFrame(data=np.arange(1, 100, 10))
df2

    0
0   1
1  11
2  21
3  31
4  41
5  51
6  61
7  71
8  81
9  91


In [81]:
df1 + df2

     0
0    2
1   17
2   32
3   47
4   62
5   77
6   92
7  107
8  122
9  137


In [83]:
df1 * df2

      0
0     1
1    66
2   231
3   496
4   861
5  1326
6  1891
7  2556
8  3321
9  4186

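The two DataFrames above share the same index and the same single column, so every cell lines up. As a hedged sketch of what happens otherwise, the example below uses two small hypothetical frames, `df_a` and `df_b` (not part of the original notebook), whose rows and columns only partly overlap. DataFrame arithmetic aligns on both axes, just as Series arithmetic aligns on the index.

```python
import pandas as pd

# Hypothetical frames: they share column "y" and row labels 1 and 2 only.
df_a = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=[0, 1, 2])
df_b = pd.DataFrame({"y": [10, 20, 30], "z": [7, 8, 9]}, index=[1, 2, 3])

# Plain addition: a cell has a value only if both frames contain it.
plain_sum = df_a + df_b

# With fill_value=0, a cell missing from one frame is treated as 0.
# Cells missing from both frames still come out as NaN.
filled_sum = df_a.add(df_b, fill_value=0)

print(plain_sum)
print(filled_sum)
```

In `plain_sum` only column "y" at rows 1 and 2 holds numbers. In `filled_sum` the only NaN cells left are the ones neither frame defines ("x" at row 3 and "z" at row 0).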