# Manipulating data

OK, we've seen how to create lists and arrays. How do we work with or change the values they hold?

Let's say that we have a list of 9 values from 0 to 1 and want to multiply each one by $\pi$ (approximated here as 3.14).

In [1]:
x_list = [0., 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.]

In [2]:
x_list * 3.14

TypeError: can't multiply sequence by non-int of type 'float'
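
The error arises because `*` on a list means repetition, not arithmetic: multiplying by an integer repeats the list, and multiplying by a float is not defined. A minimal sketch of both cases (the variable names here are illustrative, not from the lesson):

```python
short_list = [0.0, 0.5, 1.0]

# Multiplying a list by an integer repeats its contents
repeated = short_list * 2
print(repeated)  # [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]

# Multiplying a list by a float raises a TypeError
try:
    short_list * 3.14
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```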

## Loops!
A loop is a way to step through each element of a list or array.

Python has two types of loops:
* `for` - steps through each item of a sequence, so the number of iterations is known in advance ([more on for loops](https://swcarpentry.github.io/python-novice-gapminder/12-for-loops/index.html))
* `while` - keeps repeating as long as some condition remains true
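
The rest of this section uses `for` loops. For comparison, here is a small sketch of a `while` loop; the threshold of 100 and the variable names are made up for illustration:

```python
# Keep doubling a value until it reaches or exceeds a threshold
value = 1
n_doublings = 0
while value < 100:
    value = value * 2
    n_doublings += 1

print(value, n_doublings)  # 128 7
```

Unlike a `for` loop, the number of iterations is not fixed up front; the loop stops as soon as the condition `value < 100` becomes false.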

To iterate over the values of a list, we use a `for` loop:

In [3]:
for value in x_list:
    print(value)

0.0
0.125
0.25
0.375
0.5
0.625
0.75
0.875
1.0


Note that the first line of the `for` loop ends with a colon, and the body must be indented consistently (4 spaces by convention).

The loop above is equivalent to printing each element by index, one line at a time (only the first three are shown here):

In [4]:
print(x_list[0])
print(x_list[1])
print(x_list[2])

0.0
0.125
0.25


How do we use the loop to update the data?

In [5]:
for i in range(len(x_list)):
    print('----------')
    print('index value', i)
    print('original value', x_list[i])
    x_list[i] = x_list[i] * 3.14
    print('updated value', x_list[i])
print(x_list)

----------
index value 0
original value 0.0
updated value 0.0
----------
index value 1
original value 0.125
updated value 0.3925
----------
index value 2
original value 0.25
updated value 0.785
----------
index value 3
original value 0.375
updated value 1.1775
----------
index value 4
original value 0.5
updated value 1.57
----------
index value 5
original value 0.625
updated value 1.9625000000000001
----------
index value 6
original value 0.75
updated value 2.355
----------
index value 7
original value 0.875
updated value 2.7475
----------
index value 8
original value 1.0
updated value 3.14
[0.0, 0.3925, 0.785, 1.1775, 1.57, 1.9625000000000001, 2.355, 2.7475, 3.14]


If you think this is cumbersome, you are right! There must be a better way!

In [6]:
import numpy as np
x_array = np.linspace(0, 1, 9)
print('original values', x_array)

original values [0.    0.125 0.25  0.375 0.5   0.625 0.75  0.875 1.   ]


In [19]:
x_array *= 3.14
print('updated values', x_array)

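updated values [0.      1.23245 2.4649  3.69735 4.9298  6.16225 7.3947  8.62715 9.8596 ]

Note: these values are 3.14² ≈ 9.8596 times the originals, not 3.14 times. Because `*=` modifies `x_array` in place, running the cell a second time multiplies the array again, which appears to be what happened here (note the execution count). A single run gives values from 0 to 3.14.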
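
Since `*=` changes the array in place, the result depends on how many times the cell has been run. A sketch of an alternative that stores the result under a new name and uses NumPy's built-in `np.pi` instead of 3.14 (the name `x_scaled` is illustrative):

```python
import numpy as np

x_array = np.linspace(0, 1, 9)

# x_array * np.pi builds a new array; x_array itself is left unchanged,
# so re-running this cell always gives the same answer
x_scaled = x_array * np.pi

print(x_array[-1])   # 1.0
print(x_scaled[-1])  # 3.141592653589793
```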


The same goes for trig functions. NumPy provides its own versions that operate on a whole array at once, so no loop is needed to calculate

$$
y = \sin(x).
$$

In [21]:
y_array = np.sin(x_array)
print(y_array)

[ 0.          0.94330485  0.62621789 -0.52758682 -0.97645917 -0.12064074
  0.89637118  0.71570146 -0.42124901]
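
To see that the single NumPy call does the same work as a loop, the sketch below compares `np.sin` against an explicit loop over the standard-library `math.sin`. The sample points are chosen only for illustration:

```python
import math
import numpy as np

x_values = np.linspace(0, np.pi, 5)

# Vectorized: one call handles every element
y_vectorized = np.sin(x_values)

# Equivalent explicit loop using the standard-library math module
y_looped = [math.sin(value) for value in x_values]

print(np.allclose(y_vectorized, y_looped))  # True
```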


NumPy has a plethora of [mathematical functions](https://numpy.org/doc/stable/reference/routines.math.html) you can use.

Er, that's great if you want to update all the values. What if you only want to update some values?

## Conditional statements

An `if` statement (also called a conditional statement) controls whether some code is executed based on a condition (surprise!).

It is written as
```
if <condition>:
    <some action is taken>
```

For the condition, you use a relational (comparison) operator.

In the examples below, `a = 2.0` and `b = 3.0`.

<table>
<thead>
<tr>
<th>Operator</th>
<th>Name</th>
<th>Example</th>
</tr>
</thead>
<tbody>
<tr><td>&lt;</td><td>is less than</td><td>a &lt; b is True</td></tr>
<tr><td>&lt;=</td><td>is less than or equal to</td><td>a &lt;= b is True</td></tr>
<tr><td>&gt;</td><td>is greater than</td><td>a &gt; b is False</td></tr>
<tr><td>&gt;=</td><td>is greater than or equal to</td><td>a &gt;= b is False</td></tr>
<tr><td>==</td><td>is equal to</td><td>a == b is False</td></tr>
<tr><td>!=</td><td>is not equal to</td><td>a != b is True</td></tr>
</tbody>
</table>
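
The table entries can be checked directly. A short sketch that evaluates each operator with the same `a` and `b` (the `comparisons` dictionary is just a convenient way to collect the results):

```python
a = 2.0
b = 3.0

# Each comparison evaluates to a Boolean: True or False
comparisons = {
    "a < b": a < b,
    "a <= b": a <= b,
    "a > b": a > b,
    "a >= b": a >= b,
    "a == b": a == b,
    "a != b": a != b,
}

for expression, result in comparisons.items():
    print(expression, "is", result)
```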


In [8]:
speed = 50  # What Chris's speed on Laporte normally is
speed_limit = 40  # What the speed limit on Laporte is
if speed > speed_limit:
    print("Chris could get a ticket")

Chris could get a ticket


What if you want something `else` to happen?

In [9]:
speed = 40
if speed > speed_limit:
    print("Chris could get a ticket")
else:
    print("Chris doesn't have to worry about a ticket")

Chris doesn't have to worry about a ticket


Can I get more than two options? You sure can!

In [10]:
speed = 20
if speed > speed_limit:
    print("Chris could get a ticket")
elif speed < speed_limit:
    print("Chris might make the drivers behind him angry")
else:
    print("Chris doesn't have to worry")

Chris might make the drivers behind him angry


## Combining loops and conditionals

Someone gives you longitude values ranging from -180 to 540. You need them to range from 0 to 360 to work with your code. What do you do?


In [11]:
lon_list = [-180, -75, 30, 45, 127, 280, 360, 480, 538]
new_lon_list = []

# Longitude repeats every 360 degrees, so wrap by a full turn
for lon_val in lon_list:
    if lon_val < 0:
        new_lon_list.append(lon_val + 360)
    elif lon_val >= 360:
        new_lon_list.append(lon_val - 360)
    else:
        new_lon_list.append(lon_val)

print(new_lon_list)

[180, 285, 30, 45, 127, 280, 0, 120, 178]
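
Wrapping by a full turn of 360 degrees can also be written with the modulo operator `%`, which in Python returns a non-negative remainder when the divisor is positive. A sketch, assuming the target range is 0 up to (but not including) 360; the name `wrapped_lon_list` is illustrative:

```python
lon_list = [-180, -75, 30, 45, 127, 280, 360, 480, 538]

# lon_val % 360 shifts each value by a whole number of turns into [0, 360)
wrapped_lon_list = [lon_val % 360 for lon_val in lon_list]

print(wrapped_lon_list)  # [180, 285, 30, 45, 127, 280, 0, 120, 178]
```

Unlike a single `+ 360` or `- 360`, the modulo form also handles values that are more than one full turn out of range.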


## Does NumPy have a better way?

In [12]:
lon_array = np.array(lon_list)

# Boolean masks computed from the original values
condition_less = lon_array < 0
condition_greater = lon_array >= 360
lon_array[condition_less] += 360  # wrap negative longitudes up by a full turn
lon_array[condition_greater] -= 360  # wrap longitudes of 360 or more down by a full turn
print(lon_array)

[180 285  30  45 127 280   0 120 178]


### How does that work?

NumPy arrays can be indexed with a `start:stop:step` slice or with an array of Booleans (a mask) that selects the elements where the mask is `True`.

In [13]:
start = 1
stop = 3  # up to, but not including
step = 1
print(lon_array[start:stop:step])

[285  30]


In [14]:
print(condition_less)
print(lon_array[condition_less])

[ True  True False False False False False False False]
[180 285]
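
Boolean masks can also be combined. The sketch below uses the element-wise operators `&` (and), `|` (or), and `~` (not); the array and mask names are illustrative:

```python
import numpy as np

lon_values = np.array([-180, -75, 30, 45, 127, 280, 360, 480, 538])

# & is element-wise "and"; each comparison needs its own parentheses
in_range = (lon_values >= 0) & (lon_values < 360)
# | is element-wise "or"
out_of_range = (lon_values < 0) | (lon_values >= 360)

print(lon_values[in_range])      # [ 30  45 127 280]
print(lon_values[out_of_range])  # [-180  -75  360  480  538]

# ~ inverts a mask, so the two masks above are complements of each other
print(np.array_equal(out_of_range, ~in_range))  # True
```

Python's `and` and `or` keywords do not work element by element on arrays, which is why the symbol operators are used here.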


## My friend gave me lots of arrays to fix. What do I do?

To create a piece of reusable code, you can define a function.

In [15]:
def awesome_longitude_fix(input_lon_array):
    """Wrap longitudes into the range 0 to 360 (modifies the input array in place)."""
    condition_less = input_lon_array < 0
    condition_greater = input_lon_array >= 360
    input_lon_array[condition_less] += 360
    input_lon_array[condition_greater] -= 360
    return input_lon_array

In [17]:
lon1 = np.array([-172, -90])
lon2 = np.array([30, 0])
lon3 = np.array([380, 420])
lon1 = awesome_longitude_fix(lon1)
lon2 = awesome_longitude_fix(lon2)
lon3 = awesome_longitude_fix(lon3)
print(lon1, lon2, lon3)

[188 270] [30  0] [20 60]


### What if I want to share my awesome function?

Save the function in a file that Python can find (the import below assumes it is named `awesome_code.py` and sits in the same directory as the notebook), then import it:

In [18]:
from awesome_code import awesome_longitude_fix as fix_longitude

lon1 = np.array([-172, -90])
lon2 = np.array([30, 0])
lon3 = np.array([380, 420])
lon1 = fix_longitude(lon1)
lon2 = fix_longitude(lon2)
lon3 = fix_longitude(lon3)
print(lon1, lon2, lon3)

[188 270] [30  0] [20 60]


## Another important gotcha with lists & arrays

In [None]:
x = np.array([1, 2, 3])
y = x
y[0] = 9

What are the values of `y[0]` and `x[0]`?

In [None]:
print(y[0], x[0])
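
Both print 9: `y = x` does not copy the array, it only gives the same array a second name. A sketch of the difference between plain assignment and an explicit copy, for both arrays and lists (the names `z`, `a_list`, `b_list`, and `c_list` are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3])

y = x          # y is another name for the same array
y[0] = 9
print(x[0], y[0])  # 9 9

z = x.copy()   # z is an independent copy
z[1] = -5
print(x[1], z[1])  # 2 -5

# The same applies to lists
a_list = [1, 2, 3]
b_list = a_list           # same list, second name
c_list = a_list.copy()    # independent copy
b_list[0] = 9
print(a_list, c_list)  # [9, 2, 3] [1, 2, 3]
```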

## Additional resources for data manipulation
* [Software Carpentry: Python novice - for loops](https://swcarpentry.github.io/python-novice-gapminder/12-for-loops/index.html)
* [Software Carpentry: Python novice - conditionals](https://swcarpentry.github.io/python-novice-gapminder/13-conditionals/index.html)