# Lists

Lists are among the simplest data structures in Python, and you'll use them constantly. Let's start with a 1D list.

In [48]:
myNumbers = [1123, 875, 3.146, 8.72, 188, 0.99]

This list stores six numbers, and we can get each entry in the list by using its index. Remember that Python starts counting at 0! Negative indices count backwards from the end of the list.

In [52]:
print myNumbers[2]
print myNumbers[-2]

3.146
188
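Here is a small sketch of how positive and negative indices line up. The list `temperatures` is made up for illustration, and the code uses `print()` as a function so that it also runs on Python 3.

```python
# Hypothetical list, used only for illustration.
temperatures = [12.5, 14.0, 9.8, 11.2]

first = temperatures[0]            # index 0 is the first entry
last = temperatures[-1]            # index -1 is the last entry
second_to_last = temperatures[-2]  # index -2 is the one before it

# A negative index -k refers to the same entry as len(list) - k.
same_entry = temperatures[len(temperatures) - 2]

print(first)           # 12.5
print(last)            # 11.2
print(second_to_last)  # 9.8
print(same_entry)      # 9.8
```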


A list can be used to store any kind of data. It can even store other lists!

In [50]:
dogNames = ['Fido', 'Spot', 'Romulus']
fours = [4, 'four', [1, 2, 3, 4], [[[[4]]]]]

In [53]:
print fours[2][2]
print fours[3][0][0]

3
[[4]]


---------

**Problem: In the following list, access the number 6.**

In [22]:
numbers = [1, [4], ['cat', [5, [[8, 6]]]], 13]

In [29]:
# answer
numbers[2][1][1][0][1]

6

---------

There are several alternative ways of making a list. Note that the first number in the range function is inclusive, but the second is exclusive.

In [45]:
ones = [1]*7
twos = [2]*10
answers = ['yes', 'no']*5
maybes = [['yes', 'no']]*5
count = range(5, 16)
countByTwo = range(5, 16, 2)
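One thing to watch out for with the `*` shortcut: repeating a list that contains another list repeats a reference to the same inner list, not a fresh copy. The sketch below shows the difference, assuming you want independent inner lists; it wraps `range()` in `list()` so that it also runs on Python 3.

```python
# Repeating a nested list copies the reference, not the inner list.
maybes = [['yes', 'no']] * 3
maybes[0][0] = 'maybe'
print(maybes)  # all three inner lists change together

# A list comprehension builds a fresh inner list on every pass.
independent = [['yes', 'no'] for _ in range(3)]
independent[0][0] = 'maybe'
print(independent)  # only the first inner list changes

# range() includes the start value and excludes the stop value.
count = list(range(5, 16))            # 5, 6, ..., 15
count_by_two = list(range(5, 16, 2))  # 5, 7, ..., 15
print(count_by_two)  # [5, 7, 9, 11, 13, 15]
```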

------

**Problem: Create a list of length 100 where every element is a 5.**


In [54]:
# answer
l = [5]*100

-----

One of the most useful ways of creating lists is via something called "list comprehension".

In [51]:
evens = [i*2 for i in range(10)]
print evens

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]


In [52]:
x = [1, 6, 2, 9, 5]
y = [i*2+6 for i in x]
print y

[8, 18, 10, 24, 16]


In [67]:
z = [i if i>3 else 0 for i in x]
print z

[0, 6, 0, 9, 5]
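The position of the `if` matters. In the example above, the `if ... else` comes before the `for`, so every element is kept and some are replaced. If you put a plain `if` after the loop instead, elements that fail the test are dropped. A quick sketch of both, reusing the same `x`:

```python
x = [1, 6, 2, 9, 5]

# Conditional expression before `for`: same length, some values replaced.
replaced = [i if i > 3 else 0 for i in x]

# `if` after the loop: elements that fail the test are left out.
filtered = [i for i in x if i > 3]

print(replaced)  # [0, 6, 0, 9, 5]
print(filtered)  # [6, 9, 5]
```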


------------

**Problem: Create a list of odd numbers from 1 to 57.**


In [56]:
# answer
l = [i*2+1 for i in range(29)]

**Problem: Make a list of squares from 1 to 10,000.**

In [62]:
# answer
l = [i*i for i in range(1, 101)]

**Problem: Create a list of lists [[1], [1, 2], [1, 2, 3], ...] going all the way to a list of size 15.**


In [1]:
# answer
l = [range(1, i+2) for i in range(15)]

**Problem: Create a list of numbers and their squares [[1, 1], [2, 4], [3,9]... [100, 10000]] going all the way to 100.**


In [3]:
# answer
l = [[i, i*i] for i in range(1, 101)]

**Problem: Alter the list below, so that every even value is left the same and every odd value is multiplied by -1. (You might have to consult Stack Exchange.)**


In [64]:
a = [12, 56, 23, 35, 123, 32, 65, 131, 54]

In [65]:
# answer
b = [i*-1 if i%2==1 else i for i in a]

-------------

You can choose subsets of a list. This is called slicing. Note that the first index in the slice is inclusive, but the second is exclusive, just like in the range function. An optional third number sets the step size.

In [258]:
a = range(50)
b = a[:20]
c = a[0:20]
d = a[10:20]
e = a[0:20:2]
f = a[::3]
g = a[2::3]
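To see what each kind of slice returns, here is a sketch on a shorter list of ten numbers (the variable names are just for illustration). It uses `list(range(10))` so that it also runs on Python 3.

```python
a = list(range(10))  # [0, 1, 2, ..., 9]

middle = a[2:5]         # start is included, stop is excluded
head = a[:3]            # a missing start means "from the beginning"
tail = a[7:]            # a missing stop means "to the end"
every_third = a[::3]    # the third number is the step size
offset_third = a[2::3]  # start at index 2, then take every third entry

print(middle)        # [2, 3, 4]
print(head)          # [0, 1, 2]
print(tail)          # [7, 8, 9]
print(every_third)   # [0, 3, 6, 9]
print(offset_third)  # [2, 5, 8]
```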

There are simple commands for adding and removing elements from a list.

In [34]:
b.pop(3)
b.append(36)
b.insert(2, 80)
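Here is the same sequence of commands on a short, made-up list, so you can follow what each one does. Note that `pop()` returns the element it removes.

```python
b = [10, 20, 30, 40, 50]

removed = b.pop(3)  # remove and return the element at index 3
print(removed)      # 40
print(b)            # [10, 20, 30, 50]

b.append(36)        # add 36 to the end of the list
print(b)            # [10, 20, 30, 50, 36]

b.insert(2, 80)     # put 80 at index 2, shifting later entries right
print(b)            # [10, 20, 80, 30, 50, 36]
```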

You can also concatenate two lists with a simple addition operator.

In [37]:
ab = a+b

------

**Problem: Reorder an out-of-order list like a = [1, 2, 3, 6, 7, 8, 4, 5, 9, 10] using slicing and concatenation.**


In [55]:
a = [1, 2, 3, 6, 7, 8, 4, 5, 9, 10]

In [56]:
# answer
a = a[:3] + a[6:8] + a[3:6] + a[8:10]

**Problem: Make a Fibonacci sequence of length 50.**

In [60]:
# answer
a = [1, 1]
for _ in range(48):
    a.append(a[-1]+a[-2])

**Problem: Make a list of squares from 1-10k, except without any numbers from 5k-7k.**


In [62]:
# answer
a = []
for i in range(1, 101):
    square = i*i
    if square<5000 or square>7000:
        a.append(square)

------

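You can pair up the elements of two lists into a single list of tuples via the zip() operation. The lists should be the same size: if one is longer, zip() stops at the end of the shorter one and silently ignores the extra elements.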

In [72]:
a = range(30)
b = [i*i for i in a]
c = zip(a,b)
d, e = c[0]

Unzipping a list is the same operation, but with an additional asterisk, which passes each pair in the list to zip() as a separate argument.

In [73]:
d, e = zip(*c)
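A sketch of zipping, unzipping, and what happens with lists of unequal length. The `ages` values are made up for illustration, and `zip()` is wrapped in `list()` so that the code also runs on Python 3, where `zip()` returns an iterator.

```python
names = ['Fido', 'Spot', 'Romulus']
ages = [3, 7, 5]  # hypothetical values

# Zipping pairs up entries that share the same position.
pairs = list(zip(names, ages))
print(pairs)  # [('Fido', 3), ('Spot', 7), ('Romulus', 5)]

# Unzipping: the asterisk passes each pair to zip() as its own argument.
# The results come back as tuples, not lists.
unzipped_names, unzipped_ages = zip(*pairs)
print(unzipped_names)  # ('Fido', 'Spot', 'Romulus')

# With unequal lengths, zip() stops at the end of the shorter list.
short = list(zip([1, 2, 3], ['a', 'b']))
print(short)  # [(1, 'a'), (2, 'b')]
```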

-------

**Problem: Make a list c = a^2 + b, computed element by element.**


In [71]:
a = range(30)
b = [i*i for i in a]

In [74]:
# answer
c = [i*i+j for i,j in zip(a,b)]

--------

Important note! Variable names in Python are references to objects, much like pointers. If you point two variables at a single list, changing the list through one name changes what you see through the other!

In [80]:
a = range(5)
b = a
b.pop(3)
print a

[0, 1, 2, 4]


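If you want an independent copy of a list, you have to make one explicitly. For a flat list like this one, a slice (a[:]) or list(a) is enough. The copy.deepcopy() function also copies any lists nested inside, so it is the safe choice when you aren't sure.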

In [83]:
import copy
a = range(5)
b = copy.deepcopy(a)
b.pop(3)
print a

[0, 1, 2, 3, 4]
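The difference between an alias, a shallow copy, and a deep copy only shows up once lists are nested. The sketch below uses made-up lists to illustrate all three cases.

```python
import copy

flat = [0, 1, 2, 3, 4]
alias = flat       # a second name for the same list
shallow = flat[:]  # a new list holding the same elements

alias.pop(3)
print(flat)     # [0, 1, 2, 4], changed through the alias
print(shallow)  # [0, 1, 2, 3, 4], unaffected

nested = [[1, 2], [3, 4]]
shallow_nested = nested[:]           # new outer list, shared inner lists
deep_nested = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)
print(shallow_nested[0])  # [1, 2, 99], the inner list is shared
print(deep_nested[0])     # [1, 2], fully independent
```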


-----------

# Numpy Arrays

Arrays work like lists, but add a lot of extra features. They're not in basic Python, so you'll have to import the numpy (numeric Python) package to use them. If you don't already have it installed on your computer, you can get numpy by typing "pip install numpy" in the command line.

Because we use numpy so much for so many different things, it's common to just call numpy "np" when you import it. You can turn a list into a numpy array with the np.array() function.

In [88]:
import numpy as np
a = np.array([2, 3, 5])
b = range(6)
b = np.array(b)
print a
print b

[2 3 5]
[0 1 2 3 4 5]


Here are some examples of numpy features.

In [112]:
a = np.random.rand(2, 5)
b = np.ones(6)
c = np.zeros((3, 2, 4))
print a
print b
print c.shape

[[ 0.58976459  0.63534604  0.85449989  0.25100042  0.46435569]
 [ 0.33161429  0.93313368  0.5136131   0.17775698  0.41539377]]
[ 1.  1.  1.  1.  1.  1.]
(3, 2, 4)


However, there are some things lists can do which numpy arrays cannot do. For example, numpy has a hard time dealing with non-rectangular arrays.

In [102]:
a = [1, [4], ['cat', [5, [[8, 6]]]], 13]
a = np.array(a)

ValueError: setting an array element with a sequence
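If you really do need to hold ragged data in an array, one possible workaround is an array of generic Python objects. The sketch below assumes you are happy to give up fast numeric operations on it, since each slot just holds an ordinary Python object; the name `boxed` is made up for illustration.

```python
import numpy as np

ragged = [1, [4], ['cat', [5, [[8, 6]]]], 13]

# Allocate a 1D array of generic Python objects, then fill it slot by slot.
boxed = np.empty(len(ragged), dtype=object)
for i, item in enumerate(ragged):
    boxed[i] = item

print(boxed.shape)           # (4,)
print(boxed[2][1][1][0][1])  # 6, indexing into the stored list as usual
```

In most cases, though, a plain list is the better container for data like this.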

--------

**Problem: Create a 3x5 array of random numbers drawn from a Gaussian distribution with a mean of 5 and a standard deviation of 1.5. You'll have to search online.**


In [119]:
# answer
a = np.random.normal(5, 1.5, (3, 5))

---------

Numpy is great for linear algebra. You can do vector and matrix addition and multiplication, as well as transposes, inverses, etc. very easily. Note that the command for concatenating arrays is different from the command for concatenating lists: the + operator adds arrays element by element, so you join arrays with np.concatenate() instead.

In [157]:
a = np.array([[3, 2, 8, 1], [12, 4, 0, 6]])
b = np.array([[1, 6, 3, 9], [2, 5, 1, 6]])
print "Matrix 1"
print a
print "Matrix 2"
print b
print "Sum"
print a+b
print "Difference"
print a-b
print "Element-wise multiplication"
print a*b
print "Transpose of matrix 1"
print a.transpose()
print "Dot product"
print a.transpose().dot(b)

Matrix 1
[[ 3  2  8  1]
 [12  4  0  6]]
Matrix 2
[[1 6 3 9]
 [2 5 1 6]]
Sum
[[ 4  8 11 10]
 [14  9  1 12]]
Difference
[[ 2 -4  5 -8]
 [10 -1 -1  0]]
Element-wise multiplication
[[ 3 12 24  9]
 [24 20  0 36]]
Transpose of matrix 1
[[ 3 12]
 [ 2  4]
 [ 8  0]
 [ 1  6]]
Dot product
[[27 78 21 99]
 [10 32 10 42]
 [ 8 48 24 72]
 [13 36  9 45]]
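Since `+` means element-wise addition for arrays, joining two arrays needs its own function. A minimal sketch using `np.concatenate()` on the same two matrices, where the `axis` argument picks the direction in which they are joined:

```python
import numpy as np

a = np.array([[3, 2, 8, 1], [12, 4, 0, 6]])
b = np.array([[1, 6, 3, 9], [2, 5, 1, 6]])

# axis=0 stacks b underneath a; axis=1 puts b to the right of a.
stacked_rows = np.concatenate((a, b), axis=0)
stacked_cols = np.concatenate((a, b), axis=1)

print(stacked_rows.shape)  # (4, 4)
print(stacked_cols.shape)  # (2, 8)

# For comparison, + concatenates lists but adds arrays element by element.
list_concat = [1, 2] + [3, 4]
print(list_concat)  # [1, 2, 3, 4]
```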


Here are some more matrix operations. Whenever you need one that isn't shown here, search online for the corresponding numpy function.

In [156]:
a = np.random.rand(4, 4)
print "Matrix"
print a
print "Determinant of matrix"
print np.linalg.det(a)
print "Inverse of matrix"
print np.linalg.inv(a)
print "Trace of matrix"
print np.trace(a)

Matrix
[[  4.96718471e-01   9.06596419e-04   4.80087314e-01   4.77729629e-01]
 [  2.51531668e-01   3.95407503e-01   5.97658420e-01   7.19762273e-01]
 [  4.14799930e-01   8.68355307e-02   9.56492040e-01   2.64416367e-01]
 [  5.12390630e-01   7.97231812e-01   2.22395915e-01   7.11669950e-01]]
Determinant of matrix
-0.1161108846
Inverse of matrix
[[ 1.72244348 -2.43703848  0.38745967  1.16455021]
 [-1.94110648 -0.3631753   0.88970623  1.33976581]
 [-0.90743079  0.53886393  1.25800493 -0.40325497]
 [ 1.21791968  1.99307199 -1.66876229 -0.80813625]]
Trace of matrix
2.56028796328
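A quick way to convince yourself that these functions do what you expect is to check them on a small matrix you can work out by hand. The sketch below uses a made-up 2x2 matrix with determinant 2*3 - 1*1 = 5, and compares with `np.isclose()` and `np.allclose()` because floating-point results are rarely exact.

```python
import numpy as np

# Hypothetical 2x2 matrix, small enough to check by hand.
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])

det = np.linalg.det(a)
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity matrix.
product = a.dot(a_inv)
is_identity = np.allclose(product, np.eye(2))

# The trace is the sum of the diagonal entries.
trace = np.trace(a)

print(np.isclose(det, 5.0))  # True
print(is_identity)           # True
print(trace)                 # 5.0
```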


----------

**Problem: In the code below, let data be 100 observations on a 5x5 grid. Find the mean and standard deviation of each grid cell across the observations. The result should be two 5x5 matrices.**

In [164]:
np.random.seed(42)
data = np.random.rand(100, 5, 5)
data[0]

array([[ 0.37454012,  0.95071431,  0.73199394,  0.59865848,  0.15601864],
       [ 0.15599452,  0.05808361,  0.86617615,  0.60111501,  0.70807258],
       [ 0.02058449,  0.96990985,  0.83244264,  0.21233911,  0.18182497],
       [ 0.18340451,  0.30424224,  0.52475643,  0.43194502,  0.29122914],
       [ 0.61185289,  0.13949386,  0.29214465,  0.36636184,  0.45606998]])

In [166]:
# answer
mean = data.mean(axis=0)
std = data.std(axis=0)

**Problem: Find the difference between the sum of the squares of the first one hundred natural numbers (starting from 1) and the square of the sum (https://projecteuler.net/problem=6).**


In [178]:
# answer
a = np.arange(1, 101)
difference = sum(a*a) - pow(sum(a), 2)

**Problem: Given two arrays of numbers, determine which numbers are in both arrays.**


In [187]:
np.random.seed(42)
a = (np.random.rand(10, 10)*100).astype(int)
b = (np.random.rand(10, 10)*100).astype(int)
print a
print b

[[37 95 73 59 15 15  5 86 60 70]
 [ 2 96 83 21 18 18 30 52 43 29]
 [61 13 29 36 45 78 19 51 59  4]
 [60 17  6 94 96 80 30  9 68 44]
 [12 49  3 90 25 66 31 52 54 18]
 [96 77 93 89 59 92  8 19  4 32]
 [38 27 82 35 28 54 14 80  7 98]
 [77 19  0 81 70 72 77  7 35 11]
 [86 62 33  6 31 32 72 63 88 47]
 [11 71 76 56 77 49 52 42  2 10]]
[[ 3 63 31 50 90 24 41 75 22  7]
 [28 16 92 80 63 87 80 18 89 53]
 [80 89 31 11 22 42 81 86  0 51]
 [41 22 11 33 94 32 51 70 36 97]
 [96 25 49 30 28  3 60 50  5 27]
 [90 23 14 48 98 24 67 76 23 72]
 [36 63 63 53  9 83 32 18  4 59]
 [67  1 51 22 64 17 69 38 93 13]
 [34 11 92 87 25 65 81 55 52 24]
 [ 9 89 90 63 33 34 72 89 88 77]]


In [189]:
# answer
intersection = np.intersect1d(a, b)

-------

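Numpy also contains advanced searching and selection tools: you can index an array with a list of positions or with a boolean mask, which makes it easy to pick out the elements that meet a condition.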

In [301]:
np.random.seed(42)
a = np.random.randint(0, 10, 10)
print a
print a[np.array([1, 5, 3, 3])]

[6 3 7 4 6 9 2 6 7 4]
[3 9 4 4]


In [302]:
print a>5
print np.where(a>5)

[ True False  True False  True  True False  True  True False]
(array([0, 2, 4, 5, 7, 8]),)


In [307]:
print a[a>5]
print a[np.where(a>5)]
print a[~(a>5)]

[6 7 6 9 6 7]
[6 7 6 9 6 7]
[3 4 2 4]
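Boolean masks can also be combined. Here is a sketch on the same values as above, typed in by hand so that it doesn't depend on the random seed. Note that numpy uses `&` and `|` for element-wise "and" and "or", and that each condition needs its own parentheses.

```python
import numpy as np

a = np.array([6, 3, 7, 4, 6, 9, 2, 6, 7, 4])

# Element-wise "and": greater than 3 and less than 7.
in_range = a[(a > 3) & (a < 7)]

# Element-wise "or": less than 3 or greater than 8.
extremes = a[(a < 3) | (a > 8)]

print(in_range)  # [6 4 6 6 4]
print(extremes)  # [9 2]
```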


In [304]:
b = np.random.rand(5,5)
print b
print b[[1,2],[2,3]]

[[ 0.60111501  0.70807258  0.02058449  0.96990985  0.83244264]
 [ 0.21233911  0.18182497  0.18340451  0.30424224  0.52475643]
 [ 0.43194502  0.29122914  0.61185289  0.13949386  0.29214465]
 [ 0.36636184  0.45606998  0.78517596  0.19967378  0.51423444]
 [ 0.59241457  0.04645041  0.60754485  0.17052412  0.06505159]]
[ 0.18340451  0.13949386]
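The last line above can be surprising: indexing with two lists pairs them up position by position, so `b[[1,2],[2,3]]` picks the elements at (row 1, column 2) and (row 2, column 3), not a 2x2 block. A sketch on a made-up grid of integers, where the positions are easier to read off:

```python
import numpy as np

# Hypothetical 5x5 grid holding 0..24, so entry (r, c) equals 5*r + c.
grid = np.arange(25).reshape(5, 5)

# Two index lists are paired up: this picks (1, 2) and (2, 3).
picked = grid[[1, 2], [2, 3]]

# Slices, by contrast, select the whole block of rows 1-2 and columns 2-3.
block = grid[1:3, 2:4]

print(picked)  # [ 7 13]
print(block)
```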


-----------

**Problem: Split the even and odd indices of mixedData into two new arrays.**

In [254]:
np.random.seed(42)
mixedData = np.random.randint(0, 20, 100)

In [259]:
# answer
even = mixedData[::2]
odd = mixedData[1::2]

**Problem: Now split the even and odd _elements_ of mixedData into two new arrays.**

In [276]:
# answer
evens = mixedData[mixedData%2 == 0]
odds = mixedData[mixedData%2 == 1]

**Problem: Remove all data points which are more than 2 standard deviations away from the mean.**


In [277]:
np.random.seed(42)
data = np.random.normal(5, 1, 1000)

In [291]:
# answer
mean = np.mean(data)
std = np.std(data)
outliers = np.logical_or(data<mean-2*std, data>mean+2*std)
data = data[~outliers]

---------

# More Problems

There's a lot more you can do with lists and arrays. As always in programming, there are many different ways to solve a single problem. Try to find the most efficient ones!

----------

**Problem: Turn an array of numbers [0, 1, 2... 99] into an array of pairs [[0, 1], [2, 3]... [98, 99]].**


In [190]:
a = np.arange(100)

**Problem: Take an array of size 500 and reverse the order of numbers 100 through 300.**

In [191]:
a = np.arange(500)

**Problem: Replace the negative numbers in badData with zeros.**

In [205]:
np.random.seed(42)
badData = (np.random.rand(10, 10)*20-3).astype(int)

**Problem: Now replace the negative numbers in badData with the numbers in the same position in replacementData.**

In [206]:
np.random.seed(42)
badData = (np.random.rand(10, 10)*20-3).astype(int)
replacementData = (np.random.rand(10, 10)*20).astype(int)

**Problem: Each entry in data is in the form [time, measurement]. How much higher is the average measurement from t=10 to t=20 than it is from t=0 to t=10?**

In [217]:
np.random.seed(42)
times = np.sort(np.random.rand(100)*20)
measurements = np.arange(100)/20.0 + np.random.normal(0, 2.0, 100)
data = np.array(zip(times, measurements))

**Problem: The data below has entries of all different sizes. Make an array stating how long each entry is.**

In [225]:
np.random.seed(42)
data = [np.random.rand(np.random.randint(10)).tolist() for _ in range(50)]