## 1. Enumerate

There are many situations where we'll need to iterate over multiple lists in tandem, such as this one:

>```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for animal in animals:
    print("Animal")
    print(animal)
    print("Viciousness")
    # No index is available here, so we can't look up the matching value in viciousness
```

In the example above, we have two lists. The second list describes the viciousness of the animals in the first list: a <span style="background-color: #F9EBEA; color:##C0392B">Dog</span> has a viciousness level of <span style="background-color: #F9EBEA; color:##C0392B">1</span>, and a SuperLion has a viciousness level of <span style="background-color: #F9EBEA; color:##C0392B">10</span>. Inside the loop, we want to know the position of the item in animals that the loop is currently on, so we can use it to look up the corresponding value in the <span style="background-color: #F9EBEA; color:##C0392B">viciousness</span> list.

Unfortunately, looping over <span style="background-color: #F9EBEA; color:##C0392B">animals</span> only gives us each value, not its position, so we have no index to use with the second list. Python has an [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) function that can help us with this, though. On each iteration, [enumerate()](https://docs.python.org/3/library/functions.html#enumerate) produces an index and the value at that index, which we unpack into two variables in the for loop.

>```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]
for i, animal in enumerate(animals):
    print("Animal")
    print(animal)
    print("Viciousness")
    print(viciousness[i])
```
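To see what enumerate() actually hands to the loop, we can turn its output into a list. This short sketch reuses the animals and viciousness lists from above: each item is an (index, value) tuple, which is why the loop can unpack it into i and animal.

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
viciousness = [1, 5, 10, 10, 1]

# enumerate() yields (index, value) pairs
pairs = list(enumerate(animals))
print(pairs)
# [(0, 'Dog'), (1, 'Tiger'), (2, 'SuperLion'), (3, 'Cow'), (4, 'Panda')]

# Unpacking each pair gives us the index needed to read the second list
for i, animal in enumerate(animals):
    print(animal, viciousness[i])
```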

```python
ships = ["Andrea Doria", "Titanic", "Lusitania"]
cars = ["Ford Edsel", "Ford Pinto", "Yugo"]

# Use the index from enumerate() to look up the matching item in cars
for i, item in enumerate(ships):
    print(item)
    print(cars[i])
```

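```python
# Add a column: append the matching tree to each inner list
things = [["apple", "monkey"], ["orange", "dog"], ["banana", "cat"]]
trees = ["cedar", "maple", "fig"]

for i, item in enumerate(things):
    item.append(trees[i])

print(things)
# [['apple', 'monkey', 'cedar'], ['orange', 'dog', 'maple'], ['banana', 'cat', 'fig']]
```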

## 2. List Comprehensions

We've written many short for loops to manipulate lists. Here's an example:

>```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]
animal_lengths = []
for animal in animals:
    animal_lengths.append(len(animal))
```

It takes three lines to calculate the length of each string in <span style="background-color: #F9EBEA; color:##C0392B">animals</span> this way. However, we can condense this down to one line with a list comprehension:

>```python
animal_lengths = [len(animal) for animal in animals]
```
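Both versions build exactly the same list. A quick check, using the animals list from above:

```python
animals = ["Dog", "Tiger", "SuperLion", "Cow", "Panda"]

# For loop version
animal_lengths_loop = []
for animal in animals:
    animal_lengths_loop.append(len(animal))

# List comprehension version
animal_lengths = [len(animal) for animal in animals]

print(animal_lengths)                         # [3, 5, 9, 3, 5]
print(animal_lengths == animal_lengths_loop)  # True
```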

List comprehensions are a much more compact notation, and can save space when you need to write many short for loops like this one.

<br>
<div class="alert alert-info">
<b>Exercise Start.</b>
</div>

**Description**: 

1. Use a list comprehension to create a new list called <span style="background-color: #F9EBEA; color:##C0392B">apple_prices_doubled</span>, where you multiply each item in <span style="background-color: #F9EBEA; color:##C0392B">apple_prices</span> by <span style="background-color: #F9EBEA; color:##C0392B">2</span>.
2. Use a list comprehension to create a new list called <span style="background-color: #F9EBEA; color:##C0392B">apple_prices_lowered</span>, where you subtract <span style="background-color: #F9EBEA; color:##C0392B">100</span> from each item in <span style="background-color: #F9EBEA; color:##C0392B">apple_prices</span>.

>```python
apple_prices = [100, 101, 102, 105]
```

## 3. NumPy

In the previous notebooks, we used nested lists in Python to represent datasets. Python lists offer a few advantages when representing data:

- lists can contain mixed types
- lists can shrink and grow dynamically

Using Python lists to represent and work with data also has a few key disadvantages:

- to support their flexibility, lists tend to consume lots of memory
- they can be slow when working with medium and large datasets
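A rough way to see the memory difference is to compare a list of 1,000 integers with an equivalent NumPy array. This is a sketch assuming CPython: the list total below adds sys.getsizeof() for the list object to sys.getsizeof() for each int object it points to (an approximation, since small ints are shared), while ndarray.nbytes reports the array's raw data buffer. Exact numbers vary by Python version and platform.

```python
import sys
import numpy as np

numbers = list(range(1000))
# The list stores pointers to separate int objects, so count both (approximate)
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(n) for n in numbers)

# The array stores the values themselves in one contiguous buffer
# (dtype fixed to int64, i.e. 8 bytes per value)
array = np.array(numbers, dtype=np.int64)
array_bytes = array.nbytes

print(list_bytes, array_bytes)
```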

There are many ways to classify programming languages, and one that matters for performance is the distinction between **low-level** and **high-level** languages. Python is a high-level programming language that lets us quickly write, prototype, and test our logic. The C programming language, on the other hand, is a low-level programming language that is highly performant but makes for a much slower human workflow.

<span style="background-color: #F9EBEA; color:##C0392B">NumPy</span> is a library that combines the flexibility and ease of use of Python with the speed of C. In this mission, we'll start by getting familiar with the core NumPy data structure and then build up to using NumPy to work with the dataset <span style="background-color: #F9EBEA; color:##C0392B">olympics.csv</span>, which contains Olympic track and field medal results over the years.


### 3.1 Creating Arrays

The core data structure in NumPy is the <span style="background-color: #F9EBEA; color:##C0392B">ndarray</span> object, which stands for **N-dimensional array**. An **array** is a collection of values, similar to a list. **N-dimensional** refers to the number of indices needed to select individual values from the object.

<img width="500" alt="creating a repo" src="https://drive.google.com/uc?export=view&id=0BxhVm1REqwr0X0VuT3NoZGZ0UlU">

A **1-dimensional** array is often referred to as a **vector**, while a **2-dimensional** array is often referred to as a **matrix**. Both of these terms are borrowed from a branch of mathematics called linear algebra. They're also common in data science literature, so we'll use them throughout this course.
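The number of dimensions is available as the ndim attribute, and it matches the number of indices needed to select a single value. A small sketch with a vector and a matrix:

```python
import numpy as np

vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(vector.ndim)              # 1: one index selects a value, e.g. vector[2]
print(matrix.ndim)              # 2: two indices select a value, e.g. matrix[2, 0]
print(vector[2], matrix[2, 0])  # 30 35
```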

To use <span style="background-color: #F9EBEA; color:##C0392B">NumPy</span>, we first need to import it into our environment. NumPy is commonly imported using the alias <span style="background-color: #F9EBEA; color:##C0392B">np</span>:

>```python
import numpy as np
```

We can directly construct arrays from lists using the <span style="background-color: #F9EBEA; color:##C0392B">numpy.array()</span> function. To construct a vector, we pass in a single list (with no nesting). To construct a matrix, we pass in a list of lists, where each inner list becomes one row:

>```python
vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])
```

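```python
import numpy as np

vector = np.array([10, 20, 30])
matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(vector[0])     # first element of the vector: 10
print(matrix[0])     # first row of the matrix: [ 5 10 15]
print(matrix[0][1])  # row 0, then element 1 of that row: 10
```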
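NumPy also accepts both indices inside a single pair of brackets, separated by a comma. In this sketch, matrix[0, 1] selects the same value as matrix[0][1], but in one indexing operation rather than two:

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(matrix[0][1])  # two steps: take row 0, then element 1 -> 10
print(matrix[0, 1])  # one step: row 0, column 1 -> 10
print(matrix[2, 2])  # last row, last column -> 45
```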

### 3.2 Array shape

It's often useful to know how many elements an array contains. The [ndarray.shape](http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ndarray.shape.html) property returns a tuple with the number of elements along each dimension of the array: a vector's shape has a single entry (its length), while a matrix's shape is (rows, columns).



```python
vector = np.array([1, 2, 3, 4])
print(vector.shape)  # (4,)

matrix = np.array([[5, 10, 15], [20, 25, 30]])
print(matrix.shape)  # (2, 3): 2 rows, 3 columns
```
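If we want the total number of elements rather than the length of each dimension, the size attribute gives it directly; it equals the product of the entries in shape. Note that len() on a matrix returns only the number of rows:

```python
import numpy as np

matrix = np.array([[5, 10, 15], [20, 25, 30]])

rows, columns = matrix.shape  # unpack the shape tuple
print(rows, columns)          # 2 3
print(matrix.size)            # 6 elements in total (2 * 3)
print(len(matrix))            # 2: len() counts rows only
```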

### 3.3 Using NumPy

We can read in datasets using the [numpy.genfromtxt()](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.genfromtxt.html) function. Our dataset, <span style="background-color: #F9EBEA; color:##C0392B">olympics.csv</span>, is a comma-separated values (CSV) file, so we specify the delimiter using the delimiter parameter:

>```python
import numpy
data = numpy.genfromtxt("data.csv", delimiter=",")
```

**olympics.csv**

Here's what each column represents:

- Name -- the name of the medalist
- Country -- the medalist's three-letter country code (for example, JAM or USA)
- Medal -- the type of medal won (GOLD, SILVER or BRONZE)
- Time -- the medalist's finishing time, in seconds
- Year -- the year of the Olympic Games the row is for
- Color -- a color name that matches the medal type (for example, goldenrod for GOLD)


```python
import numpy as np
olympics = np.genfromtxt("data/olympics.csv", delimiter=",")
print(type(olympics))
```

```
<class 'numpy.ndarray'>
```


Each value in a NumPy array has to have the same data type, which we can check with the dtype property. NumPy data types are similar to Python data types, but have slight differences. You can find a full list of NumPy data types [in the NumPy documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html).

```python
print(olympics.dtype)
```

```
float64
```
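Since every value must share one data type, NumPy picks a common type when we mix values. In this sketch, mixing integers and floats produces a float64 array, and adding a string turns every value into a Unicode string (dtype kind "U"):

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])
print(ints_and_floats.dtype)  # float64: the ints were upcast to floats

with_string = np.array([1, 2.5, "gold"])
print(with_string.dtype.kind)  # U: every value is now a Unicode string
print(with_string[0] == "1")   # True: the integer 1 became the string "1"
```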


### 3.4 Inspecting the data

Displaying the array shows how NumPy represents the dataset:

```python
olympics
```

```
array([[    nan,     nan,     nan,     nan,     nan,     nan],
       [    nan,     nan,     nan,    9.63, 2012.  ,     nan],
       [    nan,     nan,     nan,    9.75, 2012.  ,     nan],
       [    nan,     nan,     nan,    9.79, 2012.  ,     nan],
       [    nan,     nan,     nan,    9.69, 2008.  ,     nan],
       [    nan,     nan,     nan,    9.89, 2008.  ,     nan],
       [    nan,     nan,     nan,    9.91, 2008.  ,     nan],
       [    nan,     nan,     nan,    9.85, 2004.  ,     nan],
       [    nan,     nan,     nan,    9.86, 2004.  ,     nan],
       [    nan,     nan,     nan,    9.87, 2004.  ,     nan],
       [    nan,     nan,     nan,    9.87, 2000.  ,     nan],
       [    nan,     nan,     nan,    9.99, 2000.  ,     nan],
       [    nan,     nan,     nan,   10.04, 2000.  ,     nan],
       [    nan,     nan,     nan,    9.84, 1996.  ,     nan],
       [    nan,     nan,     nan,    9.89, 1996.  ,     nan],
       [    nan,     nan,     nan,    9.9 , 1996.  ,
```

There are a few concepts here that we haven't been introduced to yet, which we'll dive into:

- Many items in <span style="background-color: #F9EBEA; color:##C0392B">olympics</span> are <span style="background-color: #F9EBEA; color:##C0392B">nan</span>, including the entire first row. <span style="background-color: #F9EBEA; color:##C0392B">nan</span>, which stands for **"not a number"**, is a special floating-point value used to represent missing values.
- The numbers are displayed with a trailing decimal point, like <span style="background-color: #F9EBEA; color:##C0392B">2012.</span>, because every value in the array is stored as a float.

The data type of <span style="background-color: #F9EBEA; color:##C0392B">olympics</span> is float. Because all of the values in a **NumPy array have to have the same data type**, NumPy converted all of the columns to floats when they were read in. By default, the <span style="background-color: #F9EBEA; color:##C0392B">numpy.genfromtxt()</span> function reads every value as a float; it only tries to guess a data type for each column if we pass dtype=None.

In this case, the **Name**, **Country**, **Medal** and **Color** columns actually contain <span style="background-color: #F9EBEA; color:##C0392B">strings</span>, which couldn't be converted to <span style="background-color: #F9EBEA; color:##C0392B">floats</span>. When NumPy can't convert a value to a numeric data type like float or integer, it uses the special nan value, which stands for **"not a number"**. You'll also see the term na, which stands for "not available", used for values that don't exist at all. <span style="background-color: #F9EBEA; color:##C0392B">nan</span> and <span style="background-color: #F9EBEA; color:##C0392B">na</span> values are both types of missing data. We'll dive more into how to deal with missing data in later missions.

The whole first row of <span style="background-color: #F9EBEA; color:##C0392B">olympics.csv</span> is a header row that contains the names of each column. This is not actually part of the data, and consists entirely of strings. Since the strings couldn't be converted to floats, NumPy uses nan values to represent them.
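To reproduce this without the data file, we can hand numpy.genfromtxt() an in-memory file. The sketch below assumes the header row uses the column names listed earlier (Name, Country, Medal, Time, Year, Color) and copies two rows from the dataset; with the default float dtype, the header row and the string columns become nan, while Time and Year are read as numbers.

```python
import io
import numpy as np

# Assumed header plus two rows taken from olympics.csv
csv_text = """Name,Country,Medal,Time,Year,Color
Usain Bolt,JAM,GOLD,9.63,2012,goldenrod
Yohan Blake,JAM,SILVER,9.75,2012,silver
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(data.dtype)  # float64 (the default)
print(data)

# The header row contains only strings, so every value in it is nan
print(np.isnan(data[0]).all())  # True
# Name (column 0) is a string, so it is nan; Time (column 3) is a number
print(np.isnan(data[1, 0]), data[1, 3])  # True 9.63
```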

### 3.5 Reading the data correctly

When reading in the data using the [numpy.genfromtxt()](http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.genfromtxt.html) function, we can use parameters to customize how the data is read in. While we're at it, we can also specify that we want to skip the header row of <span style="background-color: #F9EBEA; color:##C0392B">olympics.csv</span>.

To specify the data type for the entire NumPy array, we use the keyword argument dtype and set it to <span style="background-color: #F9EBEA; color:##C0392B">"U75"</span>. This tells NumPy to read in each value as a Unicode string of up to 75 characters. We'll dive more into Unicode and bytes later on, but for now, it's enough to know that this will read in our data properly.
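The number in "U75" counts characters, not bytes. A quick check, as a sketch: NumPy stores each Unicode character in 4 bytes, so one U75 value takes 300 bytes, and a string longer than the limit is silently cut off when converted (shown here with a deliberately small U5):

```python
import numpy as np

print(np.dtype("U75").itemsize)  # 300 bytes: 75 characters * 4 bytes each

short = np.array(["Usain Bolt"], dtype="U5")
print(short[0])  # Usain: the string was truncated to 5 characters
```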

To skip the header when reading in the data, we use the <span style="background-color: #F9EBEA; color:##C0392B">skip_header</span> parameter. It accepts an integer value specifying the number of lines from the top of the file we want NumPy to ignore.


```python
olympics = np.genfromtxt("data/olympics.csv", delimiter=",", dtype="U75", skip_header=1)
print(olympics)
```

```
[['Usain Bolt' 'JAM' 'GOLD' '9.63' '2012' 'goldenrod']
 ['Yohan Blake' 'JAM' 'SILVER' '9.75' '2012' 'silver']
 ['Justin Gatlin' 'USA' 'BRONZE' '9.79' '2012' 'saddlebrown']
 ['Usain Bolt' 'JAM' 'GOLD' '9.69' '2008' 'goldenrod']
 ['Richard Thompson' 'TRI' 'SILVER' '9.89' '2008' 'silver']
 ['Walter Dix' 'USA' 'BRONZE' '9.91' '2008' 'saddlebrown']
 ['Justin Gatlin' 'USA' 'GOLD' '9.85' '2004' 'goldenrod']
 ['Francis Obikwelu' 'POR' 'SILVER' '9.86' '2004' 'silver']
 ['Maurice Greene' 'USA' 'BRONZE' '9.87' '2004' 'saddlebrown']
 ['Maurice Greene' 'USA' 'GOLD' '9.87' '2000' 'goldenrod']
 ['Ato Boldon' 'TRI' 'SILVER' '9.99' '2000' 'silver']
 ['Obadele Thompson' 'BAR' 'BRONZE' '10.04' '2000' 'saddlebrown']
 ['Donovan Bailey' 'CAN' 'GOLD' '9.84' '1996' 'goldenrod']
 ['Frankie Fredericks' 'NAM' 'SILVER' '9.89' '1996' 'silver']
 ['Ato Boldon' 'TRI' 'BRONZE' '9.9' '1996' 'saddlebrown']
 ['Linford Christie' 'GBR' 'GOLD' '9.96' '1992' 'goldenrod']
 ['Frankie Fredericks' 'NAM' 'SILVER' '10.02' '1992' '
```

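```python
# Slicing: row 1, all columns
print(olympics[1, :])
# Slicing: row 1, columns 0 and 1 (the end index 2 is excluded)
print(olympics[1, 0:2])
```

```
['Yohan Blake' 'JAM' 'SILVER' '9.75' '2012' 'silver']
['Yohan Blake' 'JAM']
```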
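Because we read everything in as "U75" strings, numeric columns such as Time are still text at this point. Here is a sketch of pulling out one column with slicing and converting it with astype(); the three rows are copied from the output above so the example runs without the CSV file:

```python
import numpy as np

# Three rows copied from the olympics output above
olympics = np.array([
    ["Usain Bolt", "JAM", "GOLD", "9.63", "2012", "goldenrod"],
    ["Yohan Blake", "JAM", "SILVER", "9.75", "2012", "silver"],
    ["Justin Gatlin", "USA", "BRONZE", "9.79", "2012", "saddlebrown"],
], dtype="U75")

# All rows, column 3 (Time): still strings
time_strings = olympics[:, 3]
print(time_strings)

# Convert the column to floats so we can compute with it
times = time_strings.astype(float)
print(times.dtype)  # float64
print(times.min())  # 9.63
```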


### 3.5 Array Comparisons

One of the most powerful aspects of NumPy is the ability to make comparisons across an entire array. Comparing an array with a value checks every element and returns a new array of Boolean values, one per element.




```python
vector = np.array([5, 10, 15, 20])
vector == 10
```

```
array([False,  True, False, False])
```

```python
matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])
matrix == 25
```

```
array([[False, False, False],
       [False,  True, False],
       [False, False, False]])
```
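Comparisons work with the other comparison operators too, and with string arrays. A sketch on the vector above and on the Medal column of the first four rows of the olympics output:

```python
import numpy as np

vector = np.array([5, 10, 15, 20])
print(vector > 10)  # [False False  True  True]

# Medal column (index 2) from the first four rows of olympics.csv
medals = np.array(["GOLD", "SILVER", "BRONZE", "GOLD"])
print(medals == "GOLD")  # [ True False False  True]
```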

### 3.6 Selecting elements

We mentioned that comparisons are very powerful, but it may not have been obvious why in the previous section. Comparisons give us the power to select elements in arrays using Boolean vectors. This allows us to conditionally select certain elements in vectors, or certain rows in matrices.



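```python
vector = np.array([5, 10, 15, 20])
# Boolean vector: True only where the value equals 10
equal_to_ten = (vector == 10)

# Keep only the elements where the Boolean vector is True
print(vector[equal_to_ten])  # [10]
```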
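The same idea selects rows of a matrix: build a Boolean vector from one column, then use it as the row index. In this sketch we keep only the rows whose second column equals 25:

```python
import numpy as np

matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

# True for each row whose column 1 equals 25
second_column_25 = matrix[:, 1] == 25
print(second_column_25)  # [False  True False]

# Use the Boolean vector to select rows, keeping all columns
print(matrix[second_column_25, :])  # [[20 25 30]]
```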

```python
vector = np.array([5, 10, 15, 20])
# | means "or": True where the value is 10 or 5
equal_to_ten_or_five = (vector == 10) | (vector == 5)

print(equal_to_ten_or_five)  # [ True  True False False]
```

```python
vector = np.array([5, 10, 15, 20])
equal_to_ten_or_five = (vector == 10) | (vector == 5)
# Assign 50 to every selected element
vector[equal_to_ten_or_five] = 50
print(vector)  # [50 50 15 20]
```
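To require both conditions at once, use & ("and") instead of |. Each comparison needs its own parentheses, because in Python & and | bind more tightly than comparison operators such as ==, > and <. A sketch:

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

# True only where both conditions hold
between_five_and_twenty = (vector > 5) & (vector < 20)
print(between_five_and_twenty)          # [False  True  True False]
print(vector[between_five_and_twenty])  # [10 15]
```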

### 3.7 Computing with NumPy

Once an array consists of numeric values, we can perform computations on it. NumPy has a few built-in methods that operate on arrays; you can view all of them in the documentation. For now, here are a few important ones:

- sum() -- Computes the sum of all the elements in a vector, or the sum along a dimension in a matrix
- mean() -- Computes the average of all the elements in a vector, or the average along a dimension in a matrix
- max() -- Identifies the maximum value among all the elements in a vector, or the maximum along a dimension in a matrix

Here's an example of how we'd use one of these methods on a vector:

```python
vector = np.array([5, 10, 15, 20])
vector.sum()  # sum of all elements: 50
```
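The other methods in the list work the same way on a vector. A quick sketch with the same values:

```python
import numpy as np

vector = np.array([5, 10, 15, 20])

print(vector.sum())   # 50
print(vector.mean())  # 12.5
print(vector.max())   # 20
```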

With a matrix, we can also specify an additional keyword argument, axis, which dictates the dimension we perform the operation along. 1 means that we want to perform the operation on each row, and 0 means on each column; if we leave axis out, the operation covers every element of the matrix. The example below computes the sum of each row:




```python
matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])
# One sum per row: 30, 75 and 120
matrix.sum(axis=1)
```
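To compare the options, this sketch runs sum() on the same matrix with no axis (every element), axis=0 (one result per column) and axis=1 (one result per row), and shows that mean() accepts axis the same way:

```python
import numpy as np

matrix = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

print(matrix.sum())         # 225: every element
print(matrix.sum(axis=0))   # [60 75 90]: one sum per column
print(matrix.sum(axis=1))   # [ 30  75 120]: one sum per row
print(matrix.mean(axis=0))  # [20. 25. 30.]: column averages
```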

### 3.8 NumPy Strengths And Weaknesses
You should now have a good foundation in NumPy, and in handling issues with your data. NumPy is much easier to work with than lists of lists, because:

- It's easy to perform computations on data.
- Data indexing and slicing is faster and easier.
- We can convert data types quickly.

Overall, NumPy makes working with data in Python much more efficient. It's widely used for this reason, especially for machine learning.

You may have noticed some limitations of NumPy as you worked through this notebook, though. For example:

- All of the items in an array must have the same data type. For many datasets, this can make arrays cumbersome to work with.
- Columns and rows must be referred to by number, which gets confusing when you go back and forth from column name to column number.

In the next few missions, we'll learn about the Pandas library, one of the most popular data analysis libraries. Pandas builds on NumPy, but does a better job of addressing the limitations of NumPy.

