# EXTRA COLUMNS
---

You can extend an existing DataFrame with new columns whose values are computed from the columns it already contains.

In [1]:
import pandas as pd

Suppose the data contains the results of two tests taken by several students.

In [2]:
test_results = {'Test1':[84,37,55], 'Test2':[91,62,74]}
tests = pd.DataFrame(test_results)
tests.index = ['John Brown','Sarah Green','Peter Black']
tests

|             | Test1 | Test2 |
|-------------|-------|-------|
| John Brown  | 84    | 91    |
| Sarah Green | 37    | 62    |
| Peter Black | 55    | 74    |
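
The row labels can also be supplied when the DataFrame is created, through the `index` parameter of `pd.DataFrame`. A minimal sketch of that variant with the same data (the name `tests_alt` is only illustrative and does not appear in the original notebook):

```python
import pandas as pd

test_results = {'Test1': [84, 37, 55], 'Test2': [91, 62, 74]}

# Pass the row labels at construction time instead of
# assigning to tests.index in a separate step.
tests_alt = pd.DataFrame(
    test_results,
    index=['John Brown', 'Sarah Green', 'Peter Black'],
)
print(tests_alt)
```

Both approaches should give the same table; passing `index` up front just saves one statement.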


You want to add a column that contains the sum of the points obtained.

In [3]:
tests['Total'] = tests['Test1'] + tests['Test2']
tests

|             | Test1 | Test2 | Total |
|-------------|-------|-------|-------|
| John Brown  | 84    | 91    | 175   |
| Sarah Green | 37    | 62    | 99    |
| Peter Black | 55    | 74    | 129   |
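
Assigning to `tests['Total']` changes `tests` in place. If the original DataFrame should stay untouched, `DataFrame.assign` is one possible alternative, because it returns a new DataFrame with the extra column. A small sketch, where `with_total` is a name made up for this example:

```python
import pandas as pd

tests = pd.DataFrame(
    {'Test1': [84, 37, 55], 'Test2': [91, 62, 74]},
    index=['John Brown', 'Sarah Green', 'Peter Black'],
)

# assign() returns a new DataFrame; `tests` itself keeps only Test1 and Test2.
with_total = tests.assign(Total=tests['Test1'] + tests['Test2'])
print(with_total)
```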


### Tasks

To the test results, add a column containing the arithmetic mean of Test1 and Test2.

In [4]:
tests['Mean'] = tests['Total'] / 2
tests

|             | Test1 | Test2 | Total | Mean |
|-------------|-------|-------|-------|------|
| John Brown  | 84    | 91    | 175   | 87.5 |
| Sarah Green | 37    | 62    | 99    | 49.5 |
| Peter Black | 55    | 74    | 129   | 64.5 |
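
The solution above relies on the `Total` column already being present. As a sketch of a variant that does not depend on it, the mean can be taken directly across the two test columns with `DataFrame.mean(axis=1)`:

```python
import pandas as pd

tests = pd.DataFrame(
    {'Test1': [84, 37, 55], 'Test2': [91, 62, 74]},
    index=['John Brown', 'Sarah Green', 'Peter Black'],
)

# axis=1 averages across the selected columns, giving one value per row.
tests['Mean'] = tests[['Test1', 'Test2']].mean(axis=1)
print(tests)
```

Selecting the columns explicitly keeps any other numeric columns out of the average.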


A speed camera recorded vehicle speeds in km/h. The table below lists each vehicle's registration number (column Day) and its recorded speed (column KMH).

Day    | KMH
-------|---------
BW3941 | 58
GM2309 | 76
WX1515 | 47
BB0099 | 50

1. Create a DataFrame containing the speed camera data.
1. Display the contents of the DataFrame.
1. Add a 'Limit' column containing the permitted vehicle speed, i.e. 50 km/h.
1. Display the contents of the DataFrame.
1. Add a column showing by how many km/h each vehicle exceeded the speed limit.
1. Display the contents of the DataFrame.

In [9]:
# Registration numbers (column 'Day') and recorded speeds in km/h (column 'KMH')
data = {
    'Day': ['BW3941', 'GM2309', 'WX1515', 'BB0099'],
    'KMH': [58, 76, 47, 50],
}
casp = pd.DataFrame(data)
casp


|   | Day    | KMH |
|---|--------|-----|
| 0 | BW3941 | 58  |
| 1 | GM2309 | 76  |
| 2 | WX1515 | 47  |
| 3 | BB0099 | 50  |


In [10]:
casp['Limit'] = [50,50,70,90]
casp

|   | Day    | KMH | Limit |
|---|--------|-----|-------|
| 0 | BW3941 | 58  | 50    |
| 1 | GM2309 | 76  | 50    |
| 2 | WX1515 | 47  | 70    |
| 3 | BB0099 | 50  | 90    |
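
The cell above assigns a separate limit to each vehicle, whereas the task statement mentions a single limit of 50 km/h. For that uniform case, a scalar can be assigned to the column and pandas broadcasts it to every row. A minimal sketch under that assumption:

```python
import pandas as pd

casp = pd.DataFrame({
    'Day': ['BW3941', 'GM2309', 'WX1515', 'BB0099'],
    'KMH': [58, 76, 47, 50],
})

# A scalar on the right-hand side is repeated for every row of the DataFrame.
casp['Limit'] = 50
print(casp)
```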


In [17]:
# Recorded speed minus limit; a negative value means the vehicle was under the limit
excess = casp['KMH'] - casp['Limit']
# Keep the positive differences and replace everything else with 0
casp['OverLimit'] = excess.where(excess > 0, other=0)
casp

|   | Day    | KMH | Limit | OverLimit |
|---|--------|-----|-------|-----------|
| 0 | BW3941 | 58  | 50    | 8         |
| 1 | GM2309 | 76  | 50    | 26        |
| 2 | WX1515 | 47  | 70    | 0         |
| 3 | BB0099 | 50  | 90    | 0         |
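
Replacing negative differences with zero is the same as putting a lower bound of 0 on the result. As a sketch of an equivalent formulation, `Series.clip(lower=0)` does this in one step, using the same per-vehicle limits as the cells above:

```python
import pandas as pd

casp = pd.DataFrame({
    'Day': ['BW3941', 'GM2309', 'WX1515', 'BB0099'],
    'KMH': [58, 76, 47, 50],
    'Limit': [50, 50, 70, 90],
})

# Differences below 0 (vehicles under their limit) are raised to 0.
casp['OverLimit'] = (casp['KMH'] - casp['Limit']).clip(lower=0)
print(casp)
```

Whether `where` or `clip` reads better is largely a matter of taste; both leave the positive differences unchanged.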


Alternatively, compute the signed difference first and then set the negative values to zero. The assignment has to go through a single `.loc` call. Chained indexing such as `casp[casp.OverLimit < 0].loc[:,'OverLimit'] = 0` sets the values on an intermediate copy, leaves `casp` unchanged, and makes pandas warn about chained assignment. With Copy-on-Write (which will become the default behaviour in pandas 3.0) it never updates the original DataFrame.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In [7]:
# Recorded speed minus limit (negative for vehicles under the limit)
casp['OverLimit'] = casp['KMH'] - casp['Limit']
# Select rows and column in one .loc call so that casp itself is updated
casp.loc[casp['OverLimit'] < 0, 'OverLimit'] = 0
casp

|   | Day    | KMH | Limit | OverLimit |
|---|--------|-----|-------|-----------|
| 0 | BW3941 | 58  | 50    | 8         |
| 1 | GM2309 | 76  | 50    | 26        |
| 2 | WX1515 | 47  | 70    | 0         |
| 3 | BB0099 | 50  | 90    | 0         |
