<a href="https://colab.research.google.com/github/EmoreiraV/DPIP/blob/main/colab_pandas_add.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

As part of the Q+A, a question was asked about the ```add``` syntax, which can be found on page 17 of the week 7 notes.

This colab is a further demonstration of the syntax, based on the example in the notes.

Let's make two data frames, similar to those in the notes:

In [None]:
import pandas as pd

In [None]:
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]},
                   index=["x", "y", "z"])

In [None]:
print(df1)

   a  b
x  1  4
y  2  5
z  3  6


In [None]:
df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])
print(df2)

     a    b    c
x  0.1  0.3  0.5
y  0.2  0.4  0.6


As we can see, there are three columns, named:

- a
- b
- c (only in df2)

and three rows named:

- x
- y
- z (only in df1)
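As a quick check, these label sets can be computed with the ```Index.intersection``` and ```Index.union``` methods. This small sketch rebuilds ```df1``` and ```df2``` from the cells above: the intersections are the only labels where a sum can have a value, and the unions describe the shape of ```df1 + df2```.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])

# Labels found in both frames: only these (row, column) pairs can get a value
shared_rows = df1.index.intersection(df2.index)
shared_cols = df1.columns.intersection(df2.columns)

# Labels found in either frame: these define the shape of df1 + df2
all_rows = df1.index.union(df2.index)
all_cols = df1.columns.union(df2.columns)

print(list(shared_rows), list(shared_cols))
print(list(all_rows), list(all_cols))
```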


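When we add them, pandas first aligns the two data frames on their row and column labels. The result has every row and every column that appears in either data frame, but only the (row, column) pairs that exist in both get a value; everything else is NaN: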

In [None]:
df1+df2

     a    b   c
x  1.1  4.3 NaN
y  2.2  5.4 NaN
z  NaN  NaN NaN


So we don't get a value for row "x", column "c", because column "c" doesn't exist in ```df1```, and we don't get a value for row "z", column "a", because row "z" doesn't exist in ```df2```. The same reasoning explains the remaining NaN entries.
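One way to see this rule (a sketch of the idea, not a description of how pandas implements it internally) is to do the alignment by hand: reindex both data frames onto the union of their labels, which fills the new slots with NaN, and then add element by element. Since NaN plus anything is NaN, we recover the same table:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])

# Put both frames on the union of row and column labels;
# any slot a frame did not have becomes NaN
rows = df1.index.union(df2.index)
cols = df1.columns.union(df2.columns)
left = df1.reindex(index=rows, columns=cols)
right = df2.reindex(index=rows, columns=cols)

# Element-wise addition on identical labels: NaN + anything is NaN
manual = left + right
print(manual.equals(df1 + df2))  # True
```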

The same rule applies to multiplication and division:

In [None]:
print(df1*df2)
print(df1/df2)

     a    b   c
x  0.1  1.2 NaN
y  0.4  2.0 NaN
z  NaN  NaN NaN
      a          b   c
x  10.0  13.333333 NaN
y  10.0  12.500000 NaN
z   NaN        NaN NaN


It is very natural to write ```df1 + df2```, but we can also call the equivalent ```add``` method:

In [None]:
df1.add(df2)

     a    b   c
x  1.1  4.3 NaN
y  2.2  5.4 NaN
z  NaN  NaN NaN
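Each arithmetic operator has a method counterpart: ```add``` for ```+```, ```sub``` for ```-```, ```mul``` for ```*``` and ```truediv``` (also available as ```div```) for ```/```. This sketch checks that each pair gives the same table, using ```DataFrame.equals```, which treats NaNs in the same positions as equal:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])

# Each operator next to its equivalent DataFrame method
pairs = {
    "+": (df1 + df2, df1.add(df2)),
    "-": (df1 - df2, df1.sub(df2)),
    "*": (df1 * df2, df1.mul(df2)),
    "/": (df1 / df2, df1.truediv(df2)),
}

for symbol, (operator_result, method_result) in pairs.items():
    # equals() treats NaNs in matching positions as equal
    print(symbol, operator_result.equals(method_result))
```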


The method form also lets us specify some extra options.

If we have a ***really good reason*** to think that entries missing from one of the data frames should take a default value (and therefore should not be NaN in the combined table), we can pass a ```fill_value```:

In [None]:
df1.add(df2, fill_value=0)

     a    b    c
x  1.1  4.3  0.5
y  2.2  5.4  0.6
z  3.0  6.0  NaN


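We see that many of the missing entries now have values: wherever an entry exists in only one of the data frames, pandas substitutes our ```fill_value``` (here 0) for the missing side before adding.

However, if the entry doesn't exist in either data frame (here row "z", column "c"), the result is still NaN.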
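To make the ```fill_value``` rule concrete, here is a sketch that reproduces ```df1.add(df2, fill_value=0)``` by hand, assuming the rule is "fill a missing side with 0, unless both sides are missing". It uses ```DataFrame.align``` to put both frames on the same labels, ```fillna``` to substitute 0, and ```mask``` to put NaN back where both sides were missing:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])

filled = df1.add(df2, fill_value=0)

# By hand: align both frames on the union of their row and column labels
left, right = df1.align(df2, join="outer")

# Substitute 0 for missing entries on each side, then add
manual = left.fillna(0) + right.fillna(0)

# Entries missing from BOTH sides go back to NaN, as with fill_value
manual = manual.mask(left.isna() & right.isna())

print(manual.equals(filled))  # True
```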

#### Adding a series to a data frame

As in the notes, the behaviour is a little different when we add a Series object (rather than a data frame).

Let's make a Series object:

In [None]:
series_obj1 = pd.Series({"a":100,"b":200,"c":300})

Let's now add it to our data frames, first to ```df2```:

In [None]:
df2 + series_obj1

       a      b      c
x  100.1  200.3  300.5
y  100.2  200.4  300.6


We can see that the elements of ```series_obj1``` have been added to each row: pandas matches the Series index ("a", "b", "c") against the column labels of ```df2```.
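Adding a Series to each row is the same broadcasting idea as in NumPy. This sketch checks two equivalent spellings: ```axis="columns"``` (the default, which matches the Series index against the column labels), and plain NumPy addition of a (2, 3) array and a length-3 vector, after lining the Series up with the columns of ```df2``` via ```reindex```:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]},
                   index=["x", "y"])
series_obj1 = pd.Series({"a": 100, "b": 200, "c": 300})

result = df2 + series_obj1

# axis="columns" is the default: the Series index is matched to column labels
print(result.equals(df2.add(series_obj1, axis="columns")))  # True

# Same numbers with NumPy broadcasting: (2, 3) array + (3,) vector,
# after ordering the Series values like df2's columns
by_row = df2.to_numpy() + series_obj1.reindex(df2.columns).to_numpy()
print(np.allclose(result.to_numpy(), by_row))  # True
```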

If instead we add to ```df1```:

In [None]:
df1 + series_obj1

     a    b   c
x  101  204 NaN
y  102  205 NaN
z  103  206 NaN


Again, the Series is added to each row. As when adding two data frames, any column that is not shared is filled with NaN: here column "c" exists in ```series_obj1``` but not in ```df1```. Note that the ```fill_value``` option does not work here (pandas raises a ```NotImplementedError``` if we try).
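The sketch below shows what happens if we try ```fill_value``` with a Series (in the pandas versions we know of, this raises ```NotImplementedError```), followed by one possible workaround. The workaround assumes that a column missing from ```df1``` should count as 0: we reindex ```df1``` onto the combined column labels with ```fill_value=0``` before adding. Whether 0 is a sensible default is exactly the "really good reason" question from above.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
series_obj1 = pd.Series({"a": 100, "b": 200, "c": 300})

try:
    df1.add(series_obj1, fill_value=0)
except NotImplementedError as err:
    print("NotImplementedError:", err)

# Possible workaround (assumption: a missing column should count as 0):
# give df1 the Series' extra columns, filled with 0, before adding
cols = df1.columns.union(series_obj1.index)
filled = df1.reindex(columns=cols, fill_value=0) + series_obj1
print(filled)
```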

Suppose that instead of adding a Series to each row, we want to add one to each column. We make a new Series whose index matches the row labels of ```df1```:

In [None]:
series_obj2 = pd.Series({"x":1000,"y":2000,"z":3000})

If we add this directly, pandas still matches the Series index against the column labels of ```df1```. None of "x", "y" and "z" is a column label, so every entry is NaN:

In [None]:
df1 + series_obj2

    a   b   x   y   z
x NaN NaN NaN NaN NaN
y NaN NaN NaN NaN NaN
z NaN NaN NaN NaN NaN


So we need to tell pandas to add the Series to each column instead, by passing ```axis=0``` so that the Series index is matched against the row labels:

In [None]:
df1.add(series_obj2, axis=0)

      a     b
x  1001  1004
y  2002  2005
z  3003  3006
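Two ways to see what ```axis=0``` does, as a sketch: ```axis="index"``` is a named alias for ```axis=0```, and adding a Series to each column gives the same table as transposing ```df1```, adding the Series to each row, and transposing back:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
series_obj2 = pd.Series({"x": 1000, "y": 2000, "z": 3000})

by_column = df1.add(series_obj2, axis=0)

# axis="index" is a named alias for axis=0
print(by_column.equals(df1.add(series_obj2, axis="index")))  # True

# Adding to each column == transpose, add to each row, transpose back
print(by_column.equals((df1.T + series_obj2).T))  # True
```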


# Task 4 week 7

In [None]:
import pandas as pd
import numpy as np
# 4 x 3 frame of random integers from -5 to 5 (a new draw on every run)
x = pd.DataFrame(np.random.choice(range(-5, 6), (4, 3), replace=True),
                 index=["ra", "rb", "rc", "rd"],
                 columns=["ca", "cb", "cc"])

In [None]:
x

    ca  cb  cc
ra   3   4  -3
rb   4   4   1
rc  -4   4  -1
rd  -3   2  -2
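Because ```np.random.choice``` draws new numbers on every run, re-running the cell above gives a different ```x```. To follow the remaining outputs exactly, one option is to build ```x``` from the printed values instead (the numbers below are simply copied from the output above):

```python
import pandas as pd

# Values copied from the printed x above, so later outputs can be reproduced
x = pd.DataFrame([[3, 4, -3],
                  [4, 4, 1],
                  [-4, 4, -1],
                  [-3, 2, -2]],
                 index=["ra", "rb", "rc", "rd"],
                 columns=["ca", "cb", "cc"])

cmeans = x.mean()  # one mean per column, indexed by "ca", "cb", "cc"
print(cmeans)
```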


In [None]:
cmeans = x.mean()          # Compute column-wise means
x1 = x - cmeans            # Subtract from columns
print(x1)
print("\n"*5)
print(x1.sum())

     ca   cb    cc
ra  3.0  0.5 -1.75
rb  4.0  0.5  2.25
rc -4.0  0.5  0.25
rd -3.0 -1.5 -0.75






ca    0.0
cb    0.0
cc    0.0
dtype: float64


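With ```axis=0```, pandas matches the index of ```cmeans``` ("ca", "cb", "cc") against the row labels of ```x``` rather than its columns. No labels match, so the result's index is the union of both sets of labels and every entry is NaN. The zeros printed by ```x1.sum()``` are misleading: they appear only because ```sum``` skips NaN values.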

In [None]:
cmeans = x.mean()          # Compute column-wise means
x1 = x.sub(cmeans, axis=0)         # axis=0 matches cmeans to the row labels
print(x1)
print("\n"*5)
print(x1.sum())

    ca  cb  cc
ca NaN NaN NaN
cb NaN NaN NaN
cc NaN NaN NaN
ra NaN NaN NaN
rb NaN NaN NaN
rc NaN NaN NaN
rd NaN NaN NaN






ca    0.0
cb    0.0
cc    0.0
dtype: float64
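A quick way to diagnose this case, as a sketch: check which labels ```cmeans``` and ```x``` actually share along the rows, and sum with ```min_count=1```, which returns NaN for a column with no non-missing values instead of 0. The fixed ```x``` here is copied from the printed output above.

```python
import pandas as pd

x = pd.DataFrame([[3, 4, -3], [4, 4, 1], [-4, 4, -1], [-3, 2, -2]],
                 index=["ra", "rb", "rc", "rd"],
                 columns=["ca", "cb", "cc"])
cmeans = x.mean()

# cmeans is indexed by column labels, but axis=0 matches it to the row labels
print(x.index.intersection(cmeans.index))  # no shared labels: an empty Index

x1 = x.sub(cmeans, axis=0)
print(x1.isna().all().all())  # every entry is NaN

# Plain sum() skips NaN, so an all-NaN column sums to 0.0;
# min_count=1 demands at least one real value and gives NaN otherwise
print(x1.sum(min_count=1))
```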


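With ```axis=1``` (the default), the index of ```cmeans``` is matched against the column labels, so this gives the same result as ```x - cmeans```: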

In [None]:
cmeans = x.mean()          # Compute column-wise means
x1 = x.sub(cmeans,axis=1)          # Subtract from columns
print(x1)
print("\n"*5)
print(x1.sum())

     ca   cb    cc
ra  3.0  0.5 -1.75
rb  4.0  0.5  2.25
rc -4.0  0.5  0.25
rd -3.0 -1.5 -0.75






ca    0.0
cb    0.0
cc    0.0
dtype: float64
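As a sketch, we can confirm both claims: the method call with ```axis=1``` matches the operator form, and every centred column sums to zero. ```np.allclose``` is used rather than ```== 0``` so the check tolerates floating-point rounding. As before, ```x``` is copied from the printed output above.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[3, 4, -3], [4, 4, 1], [-4, 4, -1], [-3, 2, -2]],
                 index=["ra", "rb", "rc", "rd"],
                 columns=["ca", "cb", "cc"])
cmeans = x.mean()

x1 = x.sub(cmeans, axis=1)

# axis=1 ("columns") is the default, so this matches the operator form
print(x1.equals(x - cmeans))  # True

# Each centred column should sum to zero, up to floating-point rounding
print(np.allclose(x1.sum(), 0))  # True
```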


### Second part

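The code for the second part has a typo: ```rmeans``` is indexed by the row labels, so it must be subtracted with ```axis=0```, not ```axis=1```: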

In [None]:
rmeans = x.mean(axis=1)    # Compute row-wise means
x2 = x.sub(rmeans, axis=1) # Typo: should be axis=0

print(x2)
print("\n"*5)
print(x2.sum(axis=1))      # Check rows sum to (almost) 0

    ca  cb  cc  ra  rb  rc  rd
ra NaN NaN NaN NaN NaN NaN NaN
rb NaN NaN NaN NaN NaN NaN NaN
rc NaN NaN NaN NaN NaN NaN NaN
rd NaN NaN NaN NaN NaN NaN NaN






ra    0.0
rb    0.0
rc    0.0
rd    0.0
dtype: float64


Without the typo (```axis=0```), each row's mean is subtracted from that row:

In [None]:
rmeans = x.mean(axis=1)    # Compute row-wise means
x2 = x.sub(rmeans, axis=0) # Subtract from row

print(x2)
print("\n"*5)
print(x2.sum(axis=1))      # Check rows sum to (almost) 0

          ca        cb        cc
ra  1.666667  2.666667 -4.333333
rb  1.000000  1.000000 -2.000000
rc -3.666667  4.333333 -0.666667
rd -2.000000  3.000000 -1.000000






ra    8.881784e-16
rb    0.000000e+00
rc   -2.220446e-16
rd    0.000000e+00
dtype: float64
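The row sums above are not exactly zero (for example 8.881784e-16) because of floating-point rounding. As a sketch, a tolerant check with ```np.allclose``` is more robust than comparing with ```== 0```; it also helps to confirm that ```rmeans``` is indexed by the row labels, which is why ```axis=0``` is the right choice. As before, ```x``` is copied from the printed output above.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([[3, 4, -3], [4, 4, 1], [-4, 4, -1], [-3, 2, -2]],
                 index=["ra", "rb", "rc", "rd"],
                 columns=["ca", "cb", "cc"])

rmeans = x.mean(axis=1)  # one mean per row, indexed by "ra", ..., "rd"

# rmeans shares its labels with the rows of x, so it is matched with axis=0
print(rmeans.index.equals(x.index))  # True

x2 = x.sub(rmeans, axis=0)

# Row sums are zero only up to rounding, so compare with a tolerance
print(np.allclose(x2.sum(axis=1), 0))  # True
```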
