# Sorting Arrays and Lists

## Sorting Arrays

In the last mission, we learned about arrays and linked lists. In this mission, we'll cover a more concrete use of both data structures: sorting. If you've used a Python list, you're probably familiar with the `list.sort()` method for ordering values. For example, given the list below:

```python
>>> values = [5, 3, 2, 10]
```

We can sort it:

```python
>>> values.sort()
>>> values
[2, 3, 5, 10]
```

However, there are many cases where the default sorting behavior isn't ideal:
* We have a **custom data structure**, and we want to sort it. For example, we want to sort a set of `JSON` files.
* We're working with data that's **too large to fit into memory**, but we still want to ensure that everything is sorted. This **may require splitting the data across multiple machines to sort**, and then combining the sorted results.
* We want a **custom ordering** -- for example, we want to sort locations based on their proximity to one or more cities. We can't sort by simple distance to the closest city, since we want to take distance to multiple cities into account.
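
For the custom-ordering case, Python's built-in `sorted()` (like `list.sort()`) accepts a `key` function, which can sometimes cover the need without writing a new sort. The sketch below is illustrative only: the city coordinates, the location names, and the `total_distance` helper are hypothetical, and summing straight-line distances is just one assumed way to combine distances to several cities. Note how it produces a different order than sorting by distance to the closest city.

```python
import math

# Hypothetical city coordinates (x, y); illustrative values only.
cities = [(0.0, 0.0), (10.0, 0.0)]

# Hypothetical locations to order, keyed by name.
locations = {"A": (5.0, 0.0), "B": (1.0, 3.0), "C": (0.0, 8.0)}


def total_distance(point):
    """Assumed scoring rule: sum of straight-line distances to every city."""
    return sum(math.dist(point, city) for city in cities)


# Order by combined distance to all cities, closest first.
ordered = sorted(locations, key=lambda name: total_distance(locations[name]))

# For comparison: order by distance to the single closest city.
by_closest = sorted(
    locations, key=lambda name: min(math.dist(locations[name], c) for c in cities)
)

print(ordered)     # ['A', 'B', 'C']
print(by_closest)  # ['B', 'A', 'C']
```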

In cases like the above, you'll need to either find a different sorting mechanism in Python or code your own sort. Even when you're not coding your own sort, understanding how sorting works can help you pick the sorting mechanism that will complete the fastest.
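
As one example of a sorting mechanism other than `list.sort()`, the split-and-combine idea from the memory-constrained case above can be sketched with the standard library: sort each chunk on its own, then combine the sorted chunks with `heapq.merge()`. This is a minimal in-memory sketch with a hypothetical `chunked_sort` helper; a real out-of-memory sort would write each sorted chunk to disk, or sort it on a separate machine, before merging.

```python
import heapq


def chunked_sort(values, chunk_size):
    """Sort fixed-size chunks independently, then merge the sorted chunks.

    Hypothetical helper: the chunks are in-memory lists here so the sketch
    stays self-contained.
    """
    chunks = [
        sorted(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    # heapq.merge lazily combines already-sorted iterables into one sorted stream.
    return list(heapq.merge(*chunks))


data = [5, 3, 2, 10, 8, 1, 7]
print(chunked_sort(data, chunk_size=3))  # [1, 2, 3, 5, 7, 8, 10]
```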

There are many sorting techniques, each with different time and space complexity tradeoffs. We'll start with some simple but inefficient sorting algorithms, then dive into more complex algorithms in a later course.

Before we try out some sorting algorithms, let's explore the dataset we'll be using: a set of bank transaction data that originally came from here. Each row is a single credit card transaction, and each column contains an attribute of that transaction. A credit card transaction happens any time a credit card is used, for example to buy coffee at a coffee shop. We've removed a number of extraneous columns, which leaves us with just two -- `Time` and `Amount`:
* `Time` -- the number of times the credit card used for this transaction has been used.
* `Amount` -- the dollar amount of the transaction.

Here are the first few rows of the data:

 |Time|Amount
---|---|---
0|0.0|149.62
1|0.0|2.69
2|1.0|378.66
3|1.0|123.50
4|2.0|69.99

Let's explore the data to get a handle on it before moving forward.

* Load the data in from `amounts.csv` using `pandas.read_csv()`.
* Create a list called `amounts` that contains all of the items in the `Amount` column.
* Create a list called `times` that contains all of the items in the `Time` column.
* Print the average of all the items in `amounts`.

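```python
import pandas as pd

amounts_csv = pd.read_csv('../data/amounts.csv')
# Each column is a pandas Series, which supports the list-style
# indexing and slicing used in the rest of this mission.
amounts = amounts_csv['Amount']
times = amounts_csv['Time']

print(amounts.mean())
```

```
88.3496192509
```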


## Swapping Elements

The basic idea unifying many sorting algorithms is the swap. We take one element from one part of an array and another element from another part of the array, and switch them. If we make enough swaps, we eventually get the array into the right order. Many sorting algorithms differ mainly in the order in which they swap items. Here's how a single swap looks:

![array_swaps-1](https://s3.amazonaws.com/dq-content/173/array_swaps_1.svg)

Note that we can't simply write `5` into the position of `2` and then `2` into the position of `5`: once we make the first assignment, the value `2` is overwritten. We need to store that value somewhere else while we make the first assignment. Here are the steps in making a swap:
* Select the two elements you want to swap.
* Copy the first element to an external variable.
* Replace the first element with the value of the second.
* Replace the second element with the value of the external variable.
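
The four steps above translate directly into code. This small sketch swaps the `2` and `5` from the diagram in a plain two-element list (the list itself is a made-up example), then shows Python's tuple-unpacking idiom, which performs the same swap with the temporary storage handled implicitly.

```python
values = [2, 5]

# Steps 1-2: select the two positions and copy the first element out.
temp = values[0]
# Step 3: overwrite the first element with the second.
values[0] = values[1]
# Step 4: put the saved value into the second position.
values[1] = temp
print(values)  # [5, 2]

# Tuple unpacking builds the right-hand tuple first, so nothing is lost.
values[0], values[1] = values[1], values[0]
print(values)  # [2, 5]
```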

Let's code our own swap function -- this will be important later on, when we're creating sorting algorithms.



* Write a `swap` function that takes in 3 arguments -- `array`, `pos1`, and `pos2`. The function should swap the array elements in position `pos1` and position `pos2`.
* Assign the first `10` elements from `amounts` to `first_amounts`.
* Swap the items in position `1` and `2` in `first_amounts`.

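```python
def swap(array, pos1, pos2):
    # Copy the first element to a temporary variable so it isn't lost.
    pos1_val = array[pos1]
    # Overwrite the first position with the second element.
    array[pos1] = array[pos2]
    # Put the saved first element into the second position.
    array[pos2] = pos1_val
```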

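```python
# .copy() gives first_amounts its own data, so the swaps below
# don't also modify the original amounts Series.
first_amounts = amounts[:10].copy()
first_amounts
```

```
0    149.62
1      2.69
2    378.66
3    123.50
4     69.99
5      3.67
6      4.99
7     40.80
8     93.20
9      3.68
Name: Amount, dtype: float64
```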

```python
swap(first_amounts, 1, 2)
first_amounts
```

```
0    149.62
1    378.66
2      2.69
3    123.50
4     69.99
5      3.67
6      4.99
7     40.80
8     93.20
9      3.68
Name: Amount, dtype: float64
```

## Selection Sort

Let's say you're handed a list of numbers to sort in order:

```python
numbers = [5,4,3,1]
```

A simple way to sort them is to find the lowest number and swap it with the number at the front of the list:

```python
numbers = [1,4,3,5]
```
Then find the next lowest number and swap it with the number in the second position:

```python
numbers = [1,3,4,5]
```

After repeating the process, the list is sorted. What we just did was a selection sort! Here's a diagram that makes it clearer how the sort works:

![selection-sort](https://s3.amazonaws.com/dq-content/173/selection_sort.svg)

As you can see, we:
* Loop across all of the elements in the array
  * Loop across all the elements from the outer loop index to the end of the array
    * This loops across all of the unsorted elements
    * Find the lowest value
  * Swap the lowest value with the first unsorted value
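
To see the passes one at a time, here is a sketch that runs the steps above on `numbers = [5,4,3,1]` and prints the list after each pass of the outer loop. The first two passes reproduce the intermediate lists shown earlier; in the remaining passes the lowest unsorted value is already in place, so the swap leaves the list unchanged.

```python
numbers = [5, 4, 3, 1]

for i in range(len(numbers)):
    # Find the index of the lowest value among the unsorted elements numbers[i:].
    lowest_index = i
    for z in range(i + 1, len(numbers)):
        if numbers[z] < numbers[lowest_index]:
            lowest_index = z
    # Swap the lowest value with the first unsorted value.
    numbers[i], numbers[lowest_index] = numbers[lowest_index], numbers[i]
    print(f"after pass {i}: {numbers}")

# after pass 0: [1, 4, 3, 5]
# after pass 1: [1, 3, 4, 5]
# after pass 2: [1, 3, 4, 5]
# after pass 3: [1, 3, 4, 5]
```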

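Once there aren't any unsorted elements left, our sort is done. This is a fairly straightforward type of sort, but it isn't efficient: for an array of n elements it always makes n(n-1)/2 comparisons, even when the array is already sorted, so its running time grows quadratically with the length of the array.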
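
The comparison count follows from the loop structure. On pass $i$ (counting from 0) of an $n$-element array, the current lowest value is compared against each of the remaining $n-1-i$ unsorted elements, whatever their order:

$$
C(n) = \sum_{i=0}^{n-1} (n-1-i) = \sum_{k=0}^{n-1} k = \frac{n(n-1)}{2}
$$

For the 10-element `first_amounts`, that is $10 \cdot 9 / 2 = 45$ comparisons; doubling the array to 20 elements raises it to $20 \cdot 19 / 2 = 190$.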

Here's an animation of a selection sort, by user Joestape89 on Wikimedia:

![selection-sort-animated](https://s3.amazonaws.com/dq-content/173/Selection-Sort-Animation.gif)

* Create a function called `selection_sort` that takes in an argument `array`.
  * Loop across `array` from the first element to the last (loop variable `i`).
    * Loop across `array` from `i` to the end of the array with the loop variable `z` (this loops across the unsorted values)
    * Track the lowest value and its index.
  * Swap the lowest element with the element at position `i`.
* Assign the first 10 elements from `amounts` to `first_amounts`.
* Sort `first_amounts` with your function.

```python
def selection_sort(array):
    pass
```
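
One possible solution, offered as a sketch rather than the course's official answer: `selection_sort` follows the steps listed above and reuses the temporary-variable `swap` from earlier, redefined here so the block runs on its own. To stay self-contained, it sorts a plain Python list holding the first 10 `Amount` values from the output above instead of reading `amounts.csv`.

```python
def swap(array, pos1, pos2):
    # Same temporary-variable swap as the swap function defined earlier.
    pos1_val = array[pos1]
    array[pos1] = array[pos2]
    array[pos2] = pos1_val


def selection_sort(array):
    """Sort array in place, in ascending order, using selection sort."""
    for i in range(len(array)):
        # Track the index of the lowest value in the unsorted part array[i:].
        # Starting z at i + 1 skips comparing array[i] with itself.
        lowest_index = i
        for z in range(i + 1, len(array)):
            if array[z] < array[lowest_index]:
                lowest_index = z
        # Move the lowest unsorted value into position i.
        swap(array, i, lowest_index)


# First 10 values of the Amount column, copied from the output above.
first_amounts = [149.62, 2.69, 378.66, 123.50, 69.99, 3.67, 4.99, 40.80, 93.20, 3.68]
selection_sort(first_amounts)
print(first_amounts)
# [2.69, 3.67, 3.68, 4.99, 40.8, 69.99, 93.2, 123.5, 149.62, 378.66]
```

Because `selection_sort` only relies on `len()`, integer indexing, and `swap`, it should also work on the notebook's `first_amounts` Series, whose index runs from 0 to 9.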