# Manipulating data

OK, we've seen how to create lists and arrays. How do we work with or change the values they hold?

Let's say that we have a list of 9 values from 0 to 1 and want to multiply each one by $\pi$ (approximated here as 3.14).

In [None]:
x_list = [0., 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.]

In [None]:
x_list * 3.14
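
The cell above does not scale the values: Python lists do not support multiplication by a float, so the expression raises a `TypeError`. Here is a minimal sketch of what the `*` operator does with lists (the names `short_list`, `repeated` and `error_message` are made up for this illustration):

```python
short_list = [0.0, 0.5, 1.0]

# Multiplying a list by an integer repeats the list; it does not scale the values.
repeated = short_list * 2
print(repeated)

# Multiplying a list by a float is not defined, so Python raises a TypeError.
try:
    short_list * 3.14
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)
```
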

## Loops!
A loop is a way to step through each element of a list or array.

Python has two types of loops:
* `for` - steps through the items of a sequence, so the number of iterations is fixed in advance ([more on for loops](https://swcarpentry.github.io/python-novice-gapminder/12-for-loops/index.html))
* `while` - continues until some condition happens
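
The rest of this notebook only uses `for`, so here is a small sketch of a `while` loop for comparison. The variable names and the halving example are illustrative assumptions, not part of the lesson's data:

```python
value = 1.0
n_halvings = 0

# Keep halving for as long as the value is at least 0.1.
# The number of iterations is not written down anywhere in advance.
while value >= 0.1:
    value = value / 2
    n_halvings += 1

print(n_halvings, value)
```
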

In Python, we iterate over the values using a `for` loop, which we can show as

In [None]:
for value in x_list:
    print(value)

Note that the first line of the `for` loop ends with a colon, and the body must be indented consistently (4 spaces by convention).

The above is the same as doing the following for every element of the list (only the first three are shown):

In [None]:
print(x_list[0])
print(x_list[1])
print(x_list[2])

How do we use the loop to update the data?

In [None]:
for i in range(len(x_list)):
    print('----------')
    print('index value', i)
    print('original value', x_list[i])
    x_list[i] = x_list[i] * 3.14
    print('updated value', x_list[i])
print(x_list)

If you think this is cumbersome, you are right! There must be a better way!

In [None]:
import numpy as np
x_array = np.linspace(0, 1, 9)
print('original values', x_array)

In [None]:
x_array *= 3.14
print('updated values', x_array)
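
The cells above use 3.14 as a stand-in for $\pi$. As a sketch of a more precise variant, NumPy provides the constant `np.pi`; the names `x_fresh` and `x_scaled` below are chosen for this example only:

```python
import numpy as np

x_fresh = np.linspace(0, 1, 9)

# Multiplying creates a new array and leaves x_fresh unchanged.
x_scaled = x_fresh * np.pi
print(x_scaled)

# The in-place form (*=) overwrites the existing array instead.
x_fresh *= np.pi
print(x_fresh)
```
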

The same goes for trig functions. NumPy provides them, so you don't need a loop to calculate

$$
y = \sin(x).
$$

In [None]:
y_array = np.sin(x_array)
print(y_array)
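
To see what `np.sin` saves you, here is a sketch of the equivalent loop using the standard-library `math.sin`, which handles one number at a time. The names `x_values`, `y_loop` and `y_vectorized` are assumptions for this illustration:

```python
import math

import numpy as np

x_values = np.linspace(0, np.pi, 5)

# Loop version: call math.sin on one element at a time.
y_loop = []
for x_val in x_values:
    y_loop.append(math.sin(x_val))

# Vectorized version: NumPy applies sin to every element in one call.
y_vectorized = np.sin(x_values)

# Both approaches give the same numbers (up to floating-point rounding).
print(np.allclose(y_loop, y_vectorized))
```
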

NumPy has a plethora of [mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html) you can use.

Er, that's great if you want to update all the values. What if you only want to update some of them?

## Conditional statements

An `if` statement (also called a conditional statement) controls whether some code is executed based on a condition (surprise!).

It is written as
```
if <condition>:
    <some action is taken>
```

For the condition, you use a relational (comparison) operator. In the examples below, `a = 2.0` and `b = 3.0`.

<table>
<thead>
<tr>
<th>Operator</th>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr><td>&lt;</td><td>is less than</td><td>a &lt; b is True</td></tr>
<tr><td>&lt;=</td><td>is less than or equal to</td><td>a &lt;= b is True</td></tr>
<tr><td>&gt;</td><td>is greater than</td><td>a &gt; b is False</td></tr>
<tr><td>&gt;=</td><td>is greater than or equal to</td><td>a &gt;= b is False</td></tr>
<tr><td>==</td><td>is equal to</td><td>a == b is False</td></tr>
<tr><td>!=</td><td>is not equal to</td><td>a != b is True</td></tr>
</tbody>
</table>
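
Each row of the table can be checked directly in Python, since every comparison evaluates to a Boolean (`True` or `False`). A quick sketch using the same `a` and `b` as the table:

```python
a = 2.0
b = 3.0

# Each comparison produces a bool that an `if` statement can act on.
print(a < b)   # True
print(a <= b)  # True
print(a > b)   # False
print(a >= b)  # False
print(a == b)  # False
print(a != b)  # True
```
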


In [None]:
speed = 50  # What Chris's speed on Laporte normally is
speed_limit = 40  # What the speed limit on Laporte is
if speed > speed_limit:
    print("Chris could get a ticket")

What if you want something `else` to happen?

In [None]:
speed = 40
if speed > speed_limit:
    print("Chris could get a ticket")
else:
    print("Chris doesn't have to worry about a ticket")

Can I get more than two options? You sure can!

In [None]:
speed = 20
if speed > speed_limit:
    print("Chris could get a ticket")
elif speed < speed_limit:
    print("Chris might make the drivers behind him angry")
else:
    print("Chris doesn't have to worry")

## Combining loops and conditionals

Someone gives you longitude values ranging from -180 to 540. You need them to range from 0 to 360 to work with your code. What do you do?


In [None]:
lon_list = [-180, -75, 30, 45, 127, 280, 360, 480, 538]
new_lon_list = []

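for lon_val in lon_list:
    if lon_val < 0:
        # Negative longitudes wrap around by a full turn (360 degrees)
        new_lon_list.append(lon_val + 360)
    elif lon_val >= 360:
        # Values of 360 or more wrap back by a full turn
        new_lon_list.append(lon_val - 360)
    else:
        new_lon_list.append(lon_val)

print(new_lon_list)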

## Does NumPy have a better way?

In [None]:
lon_array = np.array(lon_list)

# Build both masks before changing anything, then shift by a full turn (360 degrees)
condition_less = lon_array < 0
condition_greater = lon_array >= 360
lon_array[condition_less] += 360
lon_array[condition_greater] -= 360
print(lon_array)
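
Wrapping a longitude into the 0 to 360 range means adding or subtracting whole turns of 360 degrees, which is what the modulo operation does. As an alternative sketch (not the approach used in this notebook), `np.mod` does the wrap in one call; the names `lon_values` and `wrapped` are assumptions for this example:

```python
import numpy as np

lon_values = np.array([-180, -75, 30, 45, 127, 280, 360, 480, 538])

# np.mod returns the remainder after division by 360; for a positive
# divisor the result always falls in the half-open range [0, 360).
wrapped = np.mod(lon_values, 360)
print(wrapped)
```

Unlike the masked version, this also handles values that are more than one full turn out of range, such as 900 or -400.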

### How does that work?

NumPy arrays can be indexed with a `start:stop:step` slice or with an array of Booleans (a mask) that is `True` wherever a value should be selected.

In [None]:
start = 1
stop = 3  # up to, but not including
step = 1
print(lon_array[start:stop:step])

In [None]:
print(condition_less)
print(lon_array[condition_less])
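
Boolean masks can also be combined element by element with `&` (and), `|` (or) and `~` (not). A small sketch with made-up values; the name `sample` and the two masks are assumptions for this illustration:

```python
import numpy as np

sample = np.array([-180, -75, 30, 45, 360, 480])

# Parentheses are required because & binds more tightly than the comparisons.
in_range = (sample >= 0) & (sample < 360)
out_of_range = ~in_range

print(in_range)
print(sample[in_range])
print(sample[out_of_range])
```
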

## My friend gave me lots of arrays to fix. What do I do?

To make a piece of code reusable, wrap it in a function.

In [None]:
def awesome_longitude_fix(lon_array):
    """
    Fixes longitude values to be between 0 and 360.

    Note that the input array is modified in place.

    Parameters
    ----------
    lon_array : numpy.ndarray
        array of longitude values that needs to be fixed

    Returns
    -------
    lon_array : numpy.ndarray
        longitude array between 0 and 360
    """
    # Build both masks first, then shift by a full turn (360 degrees)
    condition_less = lon_array < 0
    condition_greater = lon_array >= 360
    lon_array[condition_less] += 360
    lon_array[condition_greater] -= 360
    return lon_array

In [None]:
lon1 = np.array([-172, -90])
lon2 = np.array([30, 0])
lon3 = np.array([380, 420])
lon1 = awesome_longitude_fix(lon1)
lon2 = awesome_longitude_fix(lon2)
lon3 = awesome_longitude_fix(lon3)
print(lon1, lon2, lon3)

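### What if I want to share my awesome function?

Save the function in a Python file and import it from there. The cell below assumes the file is named `awesome_code.py` and sits in the same directory as this notebook.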

In [None]:
from awesome_code import awesome_longitude_fix as fix_longitude

lon1 = np.array([-172, -90])
lon2 = np.array([30, 0])
lon3 = np.array([380, 420])
lon1 = fix_longitude(lon1)
lon2 = fix_longitude(lon2)
lon3 = fix_longitude(lon3)
print(lon1, lon2, lon3)

### But, wait, we are still repeating code

Whenever you realize that you've repeated code, you should say to yourself "there must be a better way!"

In [None]:
lon1 = np.array([-172, -90])
lon2 = np.array([30, 0])
lon3 = np.array([380, 420])
for lon in [lon1, lon2, lon3]:
    lon = fix_longitude(lon)
print(lon1, lon2, lon3)


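   <b>Why does the loop over <code>lon</code> change <code>lon1</code>, <code>lon2</code>, and <code>lon3</code>?</b><br>

   <code>lon</code> is a temporary <code>identifier</code> that, on each pass through the loop, <b>points</b> to the same array in memory as <code>lon1</code>, <code>lon2</code>, or <code>lon3</code>. Because the function modifies the array it is given in place, the original arrays change; the assignment back to <code>lon</code> only rebinds the temporary name. This relates to a gotcha that you need to watch out for.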
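
The difference between changing an array in place and rebinding a name can be seen in a short sketch. The arrays `a1` and `a2` and the loop variable `arr` are made up for this illustration:

```python
import numpy as np

a1 = np.array([1, 2])
a2 = np.array([3, 4])

# In-place operation: `arr` refers to the same object as a1 (then a2),
# so the original arrays are modified.
for arr in [a1, a2]:
    arr += 10
print(a1, a2)

# Rebinding: `arr + 10` builds a new array and `arr` is pointed at it;
# the original arrays are left untouched.
for arr in [a1, a2]:
    arr = arr + 10
print(a1, a2)
```
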


### But, wait, each step of the loop repeats

Can't we do this at the same time?

In this example, we have what's called an embarrassingly parallel task. We can use task parallelism to do this work. Check out more details on [joblib](https://joblib.readthedocs.io/en/stable/parallel.html).


In [None]:
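from joblib import Parallel, delayed
# Workers may run in separate processes and operate on copies of the arrays,
# so keep the returned values rather than relying on in-place changes.
lon1, lon2, lon3 = Parallel(n_jobs=3)(delayed(fix_longitude)(lon) for lon in [lon1, lon2, lon3])
print(lon1, lon2, lon3)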

<div class="alert alert-block alert-info"><b>Hint</b> Be a thoughtful machine user. Check how many cores your machine has with <code>lscpu</code> and use <code>top</code> to see what resources other users are using.</div>

<div class="alert alert-block alert-danger">
    <h3>Example on a big gotcha!</h3>
</div>
An important gotcha with lists &amp; arrays.

In [None]:
x = np.array([1, 2, 3])
y = x
y[0] = 9

What are the values of `y[0]` and `x[0]`?

In [None]:
print(y[0], x[0])

<div class="alert alert-block alert-info"><b>Hint</b> This is similar to <code>lon</code> in the loop in the last section.</div>
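
If you need an independent array, ask for a copy explicitly. Here is a minimal sketch using the array method `copy()`; the names `original`, `alias` and `duplicate` are chosen for this example:

```python
import numpy as np

original = np.array([1, 2, 3])
alias = original             # a second name for the same array
duplicate = original.copy()  # a new array with its own data

alias[0] = 9      # also visible through `original`
duplicate[1] = 7  # only changes the copy

print(original)
print(duplicate)
print(alias is original, duplicate is original)
```
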

## Additional resources for data manipulation
* [Software Carpentry: Python novice - for loops](https://swcarpentry.github.io/python-novice-gapminder/12-for-loops/index.html)
* [Software Carpentry: Python novice - conditionals](https://swcarpentry.github.io/python-novice-gapminder/13-conditionals/index.html)
* [Unidata: Loops](https://unidata.github.io/python-training/python/loops/)
* [Unidata: Conditionals](https://unidata.github.io/python-training/python/conditionals/)
* [Unidata: Functions](https://unidata.github.io/python-training/python/functions/)
* [Unidata: NumPy Basics](https://unidata.github.io/python-training/workshop/NumPy/numpy-basics/)