# Transforming and Combining Data

In the previous module you worked on a dataset that combined two different World Health
Organization datasets: population and the number of deaths due to tuberculosis.
They could be combined because they share a common attribute: the countries. This
week you will learn the techniques behind the creation of such a combined dataset.

In [None]:
import warnings
warnings.simplefilter('ignore', FutureWarning)

import pandas as pd

In [None]:
table = [
  ['UK', 2678454886796.7],    # 1st row
  ['USA', 16768100000000.0],  # 2nd row
  ['China', 9240270452047.0], # and so on...
  ['Brazil', 2245673032353.8],
  ['South Africa', 366057913367.1]
]

In [None]:
headings = ['Country', 'GDP (US$)']
gdp = pd.DataFrame(columns=headings, data=table)
gdp

In [None]:
headings = ['Country name', 'Life expectancy (years)']
table = [
  ['China', 75],
  ['Russia', 71],  
  ['United States', 79],
  ['India', 66],
  ['United Kingdom', 81]
]
life = pd.DataFrame(columns=headings, data=table)
life

## Defining functions
To make the GDP values easier to read, I wish to convert US dollars to millions of US
dollars.

I have to be precise about what I mean. For example, if the GDP is 4,567,890.1 (using
commas to separate the thousands, millions, etc.), what do I want to obtain? 

Do I always want to round down to the nearest million, making it 4 million, round to the nearest
million, making it 5 million, or round to one decimal place, making it 4.6 million?

Since the aim is to simplify the numbers and not introduce a false sense of precision, let’s round to the
nearest million.
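
As a quick illustration of the three options, the sketch below applies each of them to the example value. The variable names are only illustrative, and `math.floor()` is used here as one possible way to round down.

```python
import math

gdp_value = 4567890.1           # example value from the text
millions = gdp_value / 1000000  # 4.5678901

rounded_down = math.floor(millions)       # always round down
rounded_nearest = round(millions)         # round to the nearest million
rounded_one_decimal = round(millions, 1)  # round to one decimal place

print(rounded_down, rounded_nearest, rounded_one_decimal)  # 4 5 4.6
```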

The following function, written in two different ways, rounds a number to the nearest million. It calls the Python function `round()` which rounds a decimal number to the nearest integer. If two integers are equally near, it rounds to the even integer.

In [None]:
def roundToMillions (value):
    result = round(value / 1000000)
    return result

A function definition always starts with `def`, which is a reserved word in Python.
After it come the function’s name and its arguments, surrounded by parentheses, and finally
a colon (`:`). This function takes just one argument. If there’s more than one argument, use
commas to separate them.

Next comes the function’s body, where the calculations are done, using the arguments
like any other variables. The body must be indented, conventionally by four spaces.

For this function, the calculation is simple. I take the value, divide it by one million, and call
the built-in Python function `round()` to convert that number to the nearest integer. If the
number is exactly midway between two integers, `round()` will pick the even integer,
i.e. `round(2.5)` is 2 but `round(3.5)` is 4.

Finally, I write a `return` statement to pass the
result back to the code that called the function. The word `return` is also reserved in
Python.

The `result` variable just stores the rounded value temporarily and has no other purpose.
It’s better to write the body as a single line of code:

In [None]:
def roundToMillions (value):
    return round(value / 1000000)
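
The round-to-even behaviour of `round()` can be checked directly. The following is a small sketch, with illustrative variable names, that rounds a few values lying exactly midway between two integers.

```python
# Values exactly midway between two integers go to the even neighbour.
halfway_values = [0.5, 1.5, 2.5, 3.5]
rounded = [round(number) for number in halfway_values]
print(rounded)  # [0, 2, 2, 4]

# Values that are not exactly midway round to the nearest integer as usual.
print(round(2.4), round(2.6))  # 2 3
```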

To test a function, write expressions that check, for various argument values, whether the function returns the expected value.

In [None]:
roundToMillions(4567890.1) == 5

The art of testing is to find as few test cases as possible that cover all bases. And I mean
all. Prepare for the worst and hope for the best.
So here are some more tests, even for the unlikely cases of the GDP being zero or
negative, and you can probably think of others.

In [None]:
roundToMillions(0) == 0  # always test with zero...

In [None]:
roundToMillions(-1) == 0 # ...and negative numbers

In [None]:
roundToMillions(1499999) == 1 # test rounding to the nearest
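
One possible way to keep such checks together, shown here only as a sketch, is to list each argument with its expected result and compare them in one go. The name `test_cases` and the two midway cases are additions for illustration; they are not among the tests above.

```python
def roundToMillions (value):
    return round(value / 1000000)

# Each pair is (argument, expected result).
test_cases = [
    (4567890.1, 5),
    (0, 0),
    (-1, 0),
    (1499999, 1),
    (1500000, 2),  # midway between 1 and 2: rounds to the even integer 2
    (2500000, 2),  # midway between 2 and 3: rounds to the even integer 2
]

# One boolean per test case: True if the function returned the expected value.
results = [roundToMillions(value) == expected for value, expected in test_cases]
print(all(results))
```

If any case fails, printing `results` shows which position in the list is `False`.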

Now for the next conversion, from US dollars to a local currency, for example British
pounds. I searched the internet for ‘average yearly USD to GBP rate’, chose a conversion
service and took the value for 2013. Here’s the function that converts US dollars to
British pounds, followed by some tests.

In [None]:
def usdToGBP (usd):
    return usd / 1.564768  # average 2013 rate, in US dollars per British pound

usdToGBP(0) == 0

In [None]:
usdToGBP(1.564768) == 1

In [None]:
usdToGBP(-1) < 0
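
Because the conversion uses floating-point division, comparing results with `==` can fail for amounts that cannot be represented exactly. A hedged alternative, assuming the standard library function `math.isclose()` is acceptable for these tests, is to check that converting to pounds and back gives approximately the starting amount.

```python
import math

def usdToGBP (usd):
    return usd / 1.564768 # average rate during 2013

# Converting to pounds and back should give (almost exactly) the starting amount.
usd = 2678454886796.7  # UK GDP from the table above
round_trip = usdToGBP(usd) * 1.564768
print(math.isclose(round_trip, usd))
```

By default `math.isclose()` uses a relative tolerance of 1e-09, which is far finer than the precision needed for GDP figures.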

### Tasks

1. Define a few more test cases for both functions.
2. Why can't you use `roundToMillions()` to round the population to millions of inhabitants? Write a new function and test it. **You need to write this function in preparation for the next task.**
3. Write a function to convert US dollars to your local currency. If your local currency is USD or GBP, convert to euros. Look up online what the average exchange rate was in 2013.

In [9]:
# Why can't you use roundToMillions() to round the population to millions of inhabitants? Write a new function and test it. You need to write this function in preparation for next task.

In [10]:
def roundToMillions (value):
    return round(value / 1000000)

In [7]:
# Write a function to convert US dollars to your local currency. If your local currency is USD or GBP, convert to Euros. Look up online what was the average exchange rate in 2013.

In [8]:
def usdToEUR (usd):
    # 1 USD = 0.7268 EUR on 31-12-2013 (a single day's rate, not the 2013 average)
    return usd * 0.7268