## Exercise 3: Data selection and the .loc method

Let's select some data with bracket (NumPy-style) indexing and with the `.loc` method. We will use the RadNet Laboratory Analysis dataset from the U.S. Environmental Protection Agency, available [here](https://opendata.socrata.com/api/views/cf4r-dfwe/). This dataset reports the measured radioactivity of samples (such as drinking water and precipitation) collected by the EPA in different US cities, for several radionuclides.

In [1]:
import numpy as np
import pandas as pd

In [3]:
url = "https://raw.githubusercontent.com/TrainingByPackt/Big-Data-Analysis-with-Python/master/Lesson01/Dataset/RadNet_Laboratory_Analysis.csv"

In [4]:
df = pd.read_csv(url)

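`pd.read_csv` accepts a local path, a URL, or any file-like object. If the GitHub URL is unreachable, a minimal offline sketch is to feed a few rows through `io.StringIO`. The rows below are copied from outputs later in this exercise and keep only a subset of the real columns, so the shape printed here is not the shape of the full dataset:

```python
import io

import pandas as pd

# A few rows copied from the outputs shown in this exercise, written as CSV text.
# Only a subset of the real columns is kept; this is an offline stand-in for the URL.
csv_text = """State,Location,Sample Type,Unit,I-131,I-132
MN,St. Paul,Drinking Water,pCi/l,Non-detect,Non-detect
MN,St. Paul,Drinking Water,pCi/l,0.16,Non-detect
MN,St. Paul,Precipitation,pCi/l,32.3,Non-detect
CA,Los Angeles,Drinking Water,pCi/l,0.39,Non-detect
CA,Richmond,Drinking Water,pCi/l,Non-detect,Non-detect
"""

# read_csv reads from any file-like object, not only from paths or URLs
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (5, 6): rows, columns of this small sample
print(df.dtypes)  # "Non-detect" strings keep I-131 and I-132 as text, not numbers
```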
First, select the column "State":

In [15]:
df['State'].head()

0    ID
1    ID
2    AK
3    AK
4    AK
Name: State, dtype: object

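Selecting one column returns a pandas Series. The sketch below uses a small hand-built stand-in for `df` (a few values copied from this exercise's outputs, with the original row labels as the index) to compare bracket access with attribute access. Attribute access such as `df.State`, used in the next cells, only works when the column name is a valid Python identifier, so a name containing a space, like `Sample Type`, needs brackets:

```python
import pandas as pd

# Hypothetical miniature of the RadNet table (values copied from outputs in this exercise)
df = pd.DataFrame(
    {
        "State": ["MN", "MN", "MN", "CA", "CA"],
        "Location": ["St. Paul", "St. Paul", "St. Paul", "Los Angeles", "Richmond"],
        "Sample Type": ["Drinking Water", "Drinking Water", "Precipitation",
                        "Drinking Water", "Drinking Water"],
        "I-131": ["Non-detect", "0.16", "32.3", "0.39", "Non-detect"],
    },
    index=[367, 368, 555, 305, 356],
)

by_key = df["State"]  # bracket notation works for any column name
by_attr = df.State    # attribute access works only for valid Python identifiers

print(type(by_key).__name__)   # Series
print(by_key.equals(by_attr))  # True

# "Sample Type" contains a space, so only the bracket form can reach it
print(df["Sample Type"].head(2))
```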
To filter rows by value, put a boolean comparison inside the brackets:

In [8]:
df[df.State == "MN"]

| | State | Location | Date Posted | Date Collected | Sample Type | Unit | Ba-140 | Co-60 | Cs-134 | Cs-136 | Cs-137 | I-131 | I-132 | I-133 | Te-129 | Te-129m | Te-132 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 367 | MN | St. Paul | 04/08/2011 | 03/28/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 368 | MN | St. Paul | 04/22/2011 | 04/13/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | 0.16 | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 380 | MN | Welch | 04/08/2011 | 03/29/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 381 | MN | Welch | 06/01/2011 | 04/14/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 555 | MN | St. Paul | 04/04/2011 | 03/22/2011 | Precipitation | pCi/l | Non-detect | Non-detect | Non-detect | | Non-detect | 32.3 | Non-detect | Non-detect | | | Non-detect |
| 556 | MN | St. Paul | 04/10/2011 | 03/29/2011 | Precipitation | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | 16 | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 557 | MN | Welch | 04/04/2011 | 03/17/2011 | Precipitation | pCi/l | Non-detect | Non-detect | Non-detect | | Non-detect | Non-detect | Non-detect | Non-detect | | | Non-detect |
| 558 | MN | Welch/510 | 04/13/2011 | 04/04/2011 | Precipitation | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | 9.1 | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |

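The comparison inside the brackets does not filter anything by itself: it produces a boolean Series with one value per row, and indexing the DataFrame with that Series keeps only the rows marked `True`. A minimal sketch on a hypothetical four-row stand-in (values copied from the outputs in this exercise):

```python
import pandas as pd

# Hypothetical miniature of the RadNet table (values copied from outputs in this exercise)
df = pd.DataFrame(
    {"State": ["MN", "MN", "CA", "CA"],
     "I-131": ["0.16", "32.3", "0.39", "Non-detect"]},
    index=[368, 555, 305, 356],
)

# The comparison is evaluated element-wise and yields a boolean Series
mask = df.State == "MN"
print(mask.tolist())  # [True, True, False, False]

# Indexing with the mask keeps only the rows where it is True
mn_rows = df[mask]
print(mn_rows.index.tolist())  # [368, 555]
```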

More than one condition can be applied at the same time:

In [9]:
df[(df.State == 'CA') & (df['Sample Type'] == 'Drinking Water')]

| | State | Location | Date Posted | Date Collected | Sample Type | Unit | Ba-140 | Co-60 | Cs-134 | Cs-136 | Cs-137 | I-131 | I-132 | I-133 | Te-129 | Te-129m | Te-132 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 305 | CA | Los Angeles | 04/10/2011 | 04/04/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | 0.39 | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 306 | CA | Los Angeles | 06/01/2011 | 04/12/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | 0.18 | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 356 | CA | Richmond | 04/09/2011 | 03/29/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |
| 357 | CA | Richmond | 06/01/2011 | 04/13/2011 | Drinking Water | pCi/l | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect | Non-detect |

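Conditions are combined element-wise with `&` (and), `|` (or) and `~` (not); Python's `and`/`or` keywords do not work on a Series. Because `&` and `|` bind more tightly than `==`, each comparison needs its own parentheses, as in the cell above. A sketch on a hypothetical stand-in, also showing `Series.isin` as a shorthand for several `|` conditions on one column:

```python
import pandas as pd

# Hypothetical miniature of the RadNet table (values copied from outputs in this exercise)
df = pd.DataFrame(
    {
        "State": ["MN", "MN", "CA", "CA"],
        "Sample Type": ["Drinking Water", "Precipitation",
                        "Drinking Water", "Drinking Water"],
    },
    index=[368, 555, 305, 356],
)

# Each comparison is wrapped in parentheses because & and | bind
# more tightly than == in Python.
ca_water = df[(df.State == "CA") & (df["Sample Type"] == "Drinking Water")]
mn_or_rain = df[(df.State == "MN") | (df["Sample Type"] == "Precipitation")]
not_water = df[~(df["Sample Type"] == "Drinking Water")]

# .isin() replaces several | conditions on the same column
two_states = df[df.State.isin(["MN", "CA"])]

print(ca_water.index.tolist())    # [305, 356]
print(mn_or_rain.index.tolist())  # [368, 555]
print(not_water.index.tolist())   # [555]
print(len(two_states))            # 4
```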

Now select the rows for Minnesota (MN) and, from those, only the I-131 column:

In [11]:
df[df.State == "MN"]["I-131"]

367    Non-detect
368          0.16
380    Non-detect
381    Non-detect
555          32.3
556            16
557    Non-detect
558           9.1
Name: I-131, dtype: object

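The output above has `dtype: object`: the I-131 column mixes the text `Non-detect` with numbers, so pandas keeps every value as a string. If numeric work is needed, one option (not part of the original exercise) is `pd.to_numeric` with `errors="coerce"`, which turns entries it cannot parse into `NaN`. Treating non-detects as missing values is an analysis choice, not something the dataset prescribes. A sketch using the MN values printed above:

```python
import pandas as pd

# I-131 values for MN as printed above (index = original row labels)
i131 = pd.Series(
    ["Non-detect", "0.16", "Non-detect", "Non-detect", "32.3", "16", "Non-detect", "9.1"],
    index=[367, 368, 380, 381, 555, 556, 557, 558],
    name="I-131",
)
print(pd.api.types.is_numeric_dtype(i131))  # False: the values are stored as text

# errors="coerce" turns anything that cannot be parsed (here "Non-detect") into NaN
numeric = pd.to_numeric(i131, errors="coerce")
print(numeric.max())          # 32.3
print(numeric.notna().sum())  # 4 detected values
```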
Another common approach is the `.loc` indexer, which selects rows and columns in a single call:

```
df.loc[<row selection>, <column selection>]
```
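For a read-only selection, the chained form `df[mask]["I-131"]` and `df.loc[mask, "I-131"]` give the same values; the difference is that `.loc` does it in one indexing step. That matters when assigning: `df.loc[mask, col] = value` writes into `df`, while the chained form assigns to an intermediate object and is not guaranteed to update `df`. A minimal sketch on a hypothetical three-row stand-in (values copied from the outputs in this exercise):

```python
import pandas as pd

# Hypothetical miniature of the RadNet table
df = pd.DataFrame(
    {"State": ["MN", "MN", "CA"],
     "I-131": ["0.16", "32.3", "0.39"]},
    index=[368, 555, 305],
)

mask = df.State == "MN"

# Chained indexing: first filter rows, then pick a column (two separate steps)
chained = df[mask]["I-131"]

# .loc: rows and columns selected in a single indexing call
with_loc = df.loc[mask, "I-131"]

print(chained.equals(with_loc))  # True

# Assigning through a single .loc call updates df itself
df.loc[mask, "I-131"] = "checked"
print(df["I-131"].tolist())  # ['checked', 'checked', '0.39']
```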

In [None]:
df.loc[df.State == "MN", "I-131"]

Note that, unlike the row filters applied earlier, which returned DataFrames, this `.loc` selection returns a Series (just like `df[df.State == "MN"]["I-131"]`). This depends on what is selected, not on `.loc` itself: a DataFrame can be seen as a 2D collection of Series, one per column, so selecting a single column label returns a Series. Passing a list of column labels, even a list with a single label, returns a DataFrame instead:

In [13]:
df[['I-132']].head()

| | I-132 |
|---|---|
| 0 | Non-detect |
| 1 | Non-detect |
| 2 | Non-detect |
| 3 | Non-detect |
| 4 | Non-detect |

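Putting the two cases together, the column selector decides what `.loc` returns: a single label gives a Series, a list of labels gives a DataFrame, and a single row label plus a single column label gives one scalar value. A sketch on a hypothetical stand-in whose row labels and I-131 values are copied from the outputs above:

```python
import pandas as pd

# Hypothetical miniature of the RadNet table
df = pd.DataFrame(
    {"State": ["MN", "MN", "CA"],
     "I-131": ["0.16", "32.3", "0.39"],
     "I-132": ["Non-detect", "Non-detect", "Non-detect"]},
    index=[368, 555, 305],
)

mask = df.State == "MN"

one_label = df.loc[mask, "I-131"]            # single column label -> Series
label_list = df.loc[mask, ["I-131"]]         # list with one label -> DataFrame
two_cols = df.loc[mask, ["I-131", "I-132"]]  # list of labels -> DataFrame
single_cell = df.loc[555, "I-131"]           # row label + column label -> scalar

print(type(one_label).__name__)   # Series
print(type(label_list).__name__)  # DataFrame
print(two_cols.shape)             # (2, 2)
print(single_cell)                # 32.3
```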