# Extracting a single value

## By square bracket notation

We can extract a single value by using square bracket notation twice.  For example, I can get the value at index 11000 of the rainfall amount column like this.  This is a simple consequence of the fact that square bracket notation works on both data frames _and_ series.  The left-most pair of brackets works on the data frame and returns a series; the second pair works on that series.


In [4]:
import pandas as pd

wentworth = pd.read_csv("data/rainfall/IDCJAC0009_047045_1800_Data.csv")
print(wentworth["Rainfall amount (millimetres)"][11000])

0.5
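
To make the two steps visible, here is a minimal sketch that splits the double brackets into separate statements.  It uses a small made-up data frame (`toy`, with invented values) rather than the real rainfall file, so the numbers are purely illustrative.

```python
import pandas as pd

# Toy data standing in for the rainfall file (values are made up).
toy = pd.DataFrame({
    "Year": [2019, 2019, 2019],
    "Rainfall amount (millimetres)": [0.0, 0.5, 12.4],
})

# First pair of brackets: data frame -> series (one whole column).
rain = toy["Rainfall amount (millimetres)"]
print(type(rain).__name__)

# Second pair of brackets: series -> single value, looked up by index label.
value = rain[1]
print(value)

# The chained form does exactly the same thing in one line.
chained = toy["Rainfall amount (millimetres)"][1]
```

Both `value` and `chained` hold the same number, because the chained version is just these two lookups written back to back.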


## By summarising

Pandas provides some "magic" when it comes to summarising columns.  Series have a set of "methods" attached to them that you can call any time you like to get summaries.  Note that these summaries work on Series, so you should extract the column you want first.  Examples are:
  * add up all elements (`sum`)
  * calculate the average (`mean`) or mode (`mode`)
  * find the largest (`max`) or smallest (`min`).

In [10]:
wentworth["Rainfall amount (millimetres)"].sum()
# Exercise, try out mean, mode, min, and max

22346.199999999997
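
As a rough sketch of how these methods behave, here they are on a tiny made-up series (the values are invented, not taken from any rainfall file).  One thing worth noticing is that `mode` gives back a Series rather than a single number, since several values can be tied for most common.

```python
import pandas as pd

# A tiny made-up series standing in for a rainfall column.
rain = pd.Series([0.0, 0.0, 2.5, 10.0, 0.0], name="Rainfall amount (millimetres)")

total = rain.sum()       # add up all elements
average = rain.mean()    # arithmetic mean
largest = rain.max()     # largest value
smallest = rain.min()    # smallest value

# mode() returns a Series because there can be more than one mode.
modes = rain.mode()
most_common = modes[0]   # first (here, the only) mode

print(total, average, largest, smallest, most_common)
```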

# Example

What is the largest rainfall day for Richmond RAAF base (which is in the file `data/rainfall/IDCJAC0009_067105_1800_Data.csv`)?

Which of our rainfall files has the highest average rainfall?

# Exercise

What is the total rainfall recorded for Meriwagga (rainfall file 075167)?  What are the maximum and minimum rainfall on any one day?  I am sure you can guess the minimum, but what code will give it to you?

## By `loc` and `iloc`

We've seen how to recover a Series from a DataFrame - i.e. how to extract a column.

Let's see how to extract a row.

It is important to realise that, since DataFrames are built from Series, it is somewhat awkward to pull out a single row.  In effect, we are asking for pandas to visit each Series and grab the value at a particular index.
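
To see what "visiting each Series" would mean if we did it by hand, here is a minimal sketch on a made-up data frame (the name `toy` and its values are invented for illustration).  It loops over the columns and applies the double-bracket lookup to each one.

```python
import pandas as pd

# Made-up data with the same flavour as the rainfall file.
toy = pd.DataFrame({
    "Year": [1936, 1936, 1936],
    "Month": [1, 1, 1],
    "Day": [15, 16, 17],
    "Rainfall amount (millimetres)": [1.2, 0.0, 3.4],
})

row_label = 1

# Visit every column (a Series) and grab the value stored at row_label.
row_by_hand = {column: toy[column][row_label] for column in toy.columns}
print(row_by_hand)
```

This works, but it is clumsy and it throws away the pandas structure by ending up with a plain dictionary.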

Instead of doing this though, we will use the `loc` functionality of pandas.

`loc` and `iloc` are indexers (you use them with square brackets rather than calling them like functions) that can get columns _or rows_.  `loc` goes by column name when getting columns and by index label when getting rows.  `iloc` goes by the position of the column when getting columns and the position of the row when getting rows.

`loc` and `iloc` actually take two parameters to look up both axes at once.

In [19]:
wentworth.loc[1110, "Rainfall amount (millimetres)"]

0.0

As you can see, they take the _row first_.  This means that if we give only one value, they will look it up by row and give you back a series for that row.  It looks like the table was "flipped", but that is not really what happens.

In [21]:
wentworth.loc[1110]

Product code                                      IDCJAC0009
Bureau of Meteorology station number                   47045
Year                                                    1936
Month                                                      1
Day                                                       16
Rainfall amount (millimetres)                            0.0
Period over which rainfall was measured (days)           NaN
Quality                                                    Y
Name: 1110, dtype: object
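
With the default index, the row labels and the row positions are the same numbers, so `loc` and `iloc` can look interchangeable.  The sketch below uses a made-up data frame whose index labels (100, 200, 300) were chosen deliberately so that they differ from the positions (0, 1, 2); the names and values are invented for illustration.

```python
import pandas as pd

# Made-up data; the index labels deliberately differ from the row positions.
toy = pd.DataFrame(
    {"Day": [15, 16, 17], "Rainfall amount (millimetres)": [1.2, 0.0, 3.4]},
    index=[100, 200, 300],
)

# loc: row by index label, column by name.
by_label = toy.loc[200, "Rainfall amount (millimetres)"]

# iloc: row by position, column by position (both counted from 0).
by_position = toy.iloc[1, 1]

# Giving only one value selects a whole row, returned as a series.
row = toy.loc[200]

print(by_label, by_position)
print(row)
```

Here `toy.loc[200]` and `toy.iloc[1]` pick out the same row, one by its label and the other by where it sits in the table.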

# Example

What was the rainfall on 1st May 2019 at Richmond RAAF?

# Exercise

What is the title of the 6th row in the `workouts.csv` file?

# Using `loc`/`iloc` for everything?

Many pandas programmers just use `loc` and `iloc` for everything, but I will not.  Using them "hides" the underlying workings of pandas and, since we are here to learn, that doesn't suit us.  We will use them when we need to, but stick to square bracket notation as much as possible.  If you post a question on Stack Overflow you will probably get a `loc`/`iloc` based answer though, so we want to make sure you really know how they work.