# For Loops

## This is based on Python version 2.7, but is easily translatable to Python 3.x

Before talking about for loops, we'll first talk about Python lists.
## Lists
In Python, a [list](https://docs.python.org/2.7/tutorial/datastructures.html "Python documentation for lists") is an ordered collection of elements, separated by commas and enclosed in square brackets. Lists are very general in that different types of data can be contained in the same list, e.g. floats, integers, strings, tuples, other lists, etc. Making a list is very simple, for example:

In [None]:
##### With characters
my_list_chars = [ 'a' , 'b' , 'c' ]
print my_list_chars

# Using indexing
print my_list_chars[0]
print my_list_chars[1]
print my_list_chars[2]

# Using slicing (NOTE: this prints elements 0 and 1. So the number after the colon is NOT included)
print my_list_chars[0:2]

##### With integers
my_list_ints = [ 1 , 2 , 3 ]
print my_list_ints

# Using indexing
print my_list_ints[0]
print my_list_ints[1]
print my_list_ints[2]

# Using slicing
print my_list_ints[0:2]

# Mixed data types
my_list = [ 'a' , 'b' , 'c' , 1 , 2 , 3 , 'do' , 're' , 'me' ]

print '\n' , my_list[:3] , ", it's easy as" , my_list[3:6] , ', as simple as' , my_list[6:]

By default, `[:N]` means "start from the element at index 0 and go through the element at index *N-1*", and `[N:]` means "start from the element at index N and go until the end of the list". So lists are very versatile, but in practice when I work with lists they almost always contain numbers.
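A quick way to convince yourself of the slicing rule is to split a list at some index `N` and glue the two pieces back together. This is only a small sketch; the names `letters`, `head`, and `tail` are placeholders, and `print` is written with parentheses so the cell should run under both Python 2.7 and Python 3.

```python
letters = ['a', 'b', 'c', 'd', 'e']
N = 2

head = letters[:N]   # elements at indices 0 .. N-1
tail = letters[N:]   # elements at indices N .. end

print(head)          # ['a', 'b']
print(tail)          # ['c', 'd', 'e']

# The two slices never overlap and never skip an element
print(head + tail == letters)  # True
```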

## For loops

I think of a [for loop](https://docs.python.org/2/tutorial/controlflow.html "Python documentation for for loops") this way: I want to **loop** through each element **in** a **list**, and **for** each element I want to do something. For example, say we just wanted to print every element in a list, but on a separate line. In other words, `for` `each_element` `in` `my_list`, we want to `print` `each_element`.

In [None]:
for each_element in my_list:
    print each_element

So, the variable `each_element` is iteratively assigned each value of the list. Alternatively we could use indexing to print each value:

In [None]:
print my_list[0]
print my_list[1]
print my_list[2]
print my_list[3]
print my_list[4]
print my_list[5]
print my_list[6]
print my_list[7]
print my_list[8]

But clearly this is tedious (a good rule of thumb in coding is when something seems tedious, it can probably be accomplished with a loop, a function, or something else). So, instead of looping through each element in the list we could loop through a list of integers (created via the [range](https://docs.python.org/2/library/functions.html#range) function) that is the same size as our list, and use indexing to print each value:

In [None]:
for i in range( len( my_list ) ):
    print my_list[i]
    
print '\nrange( len( my_list ) ):' , range( len( my_list ) )

In order to get the correct number of integers I used the [len](https://docs.python.org/2/library/functions.html#len) function. This simply returns the length of the given list. A tidier way is to store the length of the list in a variable, so it can be reused:

In [None]:
N = len( my_list )

for i in range( N ):
    print my_list[i]
    
print '\nlen( my_list ):' , len( my_list )
print 'range( N ):' , range( N )
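If you need both the index and the element inside the loop, Python's built-in `enumerate` function hands you the two together, so you don't have to index into the list yourself. A minimal sketch (the list `notes` and the helper list `pairs` are made up for illustration):

```python
notes = ['do', 're', 'mi']

pairs = []
for i, note in enumerate(notes):
    # i is the index, note is the element stored at that index
    pairs.append((i, note))
    print(str(i) + ' ' + note)
```

This prints one index/element pair per line, in the same order as the list.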

## Avoid this common mistake

A common mistake (in my experience) is to pass the list itself to the range function instead of the length of the list, i.e.

numbers = [ 1 , 2 , 3 , 4 , 5 , 6 ]

range( numbers ) WRONG

range( len( numbers ) ) RIGHT

(The variable is called `numbers` rather than `list` so that it doesn't overwrite Python's built-in `list` type.) The error you will get is a `TypeError`:

In [None]:
numbers = [ 1 , 2 , 3 , 4 , 5 , 6 ]

range( numbers )
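To see the mistake and the fix side by side without stopping the notebook, the bad call can be wrapped in `try`/`except`. This is just a sketch: the exact wording of the error message differs between Python versions, so the code only relies on it being a `TypeError`.

```python
numbers = [1, 2, 3, 4, 5, 6]

try:
    indices = range(numbers)          # WRONG: a list is not an integer
except TypeError as err:
    print('TypeError: ' + str(err))
    indices = range(len(numbers))     # RIGHT: pass the length instead

print(list(indices))  # [0, 1, 2, 3, 4, 5]
```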

The range function is pretty versatile:

In [None]:
N = 10

# Default behavior
R1 = range( N )
print R1

# Specifying the start point
R2 = range( 1 , N )
print R2

# Specifying the step
R3 = range( 1 , N , 2 )
print R3 

# Going backwards (must specify the step)
R4 = range( 10 , 0 , -1 )
print R4

## xrange

Instead of using the range function, which creates a list, you can use the xrange function, which doesn't create a list but an `xrange` object that produces the integers one at a time as the loop asks for them. The difference is that you don't have to use memory to store the whole list, so if you need a lot of integers (millions or billions), xrange is better.

NOTE: In Python 3, the range function behaves like Python 2's xrange, and xrange itself no longer exists. So, in terms of using them in for loops like this, there is no difference between Python 2's xrange and Python 3's range.
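One practical consequence: in Python 3, printing `range( N )` shows a range object rather than the numbers. Wrapping it in `list(...)` gives the actual values in either version. A small sketch (the names `lazy`, `values`, and `total` are placeholders):

```python
N = 5

lazy = range(N)       # Python 3: a lazy range object; Python 2.7: already a list
values = list(lazy)   # force the values into an actual list in either version

print(values)         # [0, 1, 2, 3, 4]

# Looping works the same way whether or not the full list was ever built
total = 0
for i in range(N):
    total += i
print(total)          # 10
```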

In [None]:
N = len( my_list )

for i in xrange( N ):
    print my_list[i]
    
print '\nxrange( N ):' , xrange( N )

from time import time
# Setting this to 1.0e8 crashed my computer, just fyi
N = int( 1.0e7 )

print '\n--------------'
print 'Timing info'

x = 0
start = time()
for i in range( N ):
    x = i
    
print 'Using range: ' , time() - start

x = 0
start = time()
for i in xrange( N ):
    x = i
    
print 'Using xrange:' , time() - start

Not a huge difference in run time here, but the memory that range needs grows with N, so for larger loops it adds up.
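A single stopwatch measurement like this can be noisy, since anything else running on the machine affects it. One option (not what I did above) is the standard-library `timeit` module, which can repeat a measurement several times. A rough sketch, with a hypothetical helper `loop_over_range` and a smaller `n` so it finishes quickly:

```python
import timeit

def loop_over_range(n):
    """Assign each integer below n to x, one at a time."""
    x = 0
    for i in range(n):
        x = i
    return x

# Run the loop 5 separate times and keep the fastest run, which is the one
# least disturbed by other activity on the machine
n = 100000
timings = timeit.repeat(lambda: loop_over_range(n), number=1, repeat=5)
best = min(timings)

print('Best of 5 runs: ' + str(best) + ' s')
```

In Python 2.7 you could swap `range` for `xrange` inside the helper and compare the two best times.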

## List comprehension

An alternative (and often preferable) way to build a list with a for loop is to use a **list comprehension**. It does the same job as a for loop that appends to a list, but it is more compact and usually somewhat faster. I don't use it much (but I should), but it goes something like this:

In [None]:
N = 10
nums = range( N )

# Old way
squares = []
for i in range( N ):
    squares.append( nums[i]**2 )
    
print squares

# New way
squares = [ nums[i]**2 for i in range( N )]

print squares

In [None]:
# Setting this to 1.0e8 crashed my computer, just fyi
N = int( 1.0e7 )

nums = range( N )

print '\n--------------'
print 'Timing info'

# Start from an empty list so both methods do the same amount of work
squares = []

start = time()
for i in range( N ):
    squares.append( nums[i]**2 )
print time() - start

start = time()
squares = [ nums[i]**2 for i in range( N ) ]
print time() - start

Again, not a huge difference, but with more data it adds up.
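A list comprehension can also loop over the elements themselves rather than over their indices, and it can take an optional `if` to keep only some of them. A short sketch (the name `even_squares` is made up for this example):

```python
nums = list(range(10))

# Loop over the elements themselves instead of over their indices
squares = [n**2 for n in nums]

# An optional trailing "if" keeps only the elements that pass a test
even_squares = [n**2 for n in nums if n % 2 == 0]

print(squares)       # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(even_squares)  # [0, 4, 16, 36, 64]
```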

We'll end this with a slightly more difficult example, specifically from Andreas' HW 2 from Large-Scale Structure.

## Example: Part (f) from Andreas' HW2, weighting by 1/Vmax

For this we need the data set he gave us, `SDSS_DR7.dat`.

In [None]:
# Import numpy and matplotlib
import numpy            as np
import matplotlib.pyplot as plt

# Make plots appear within notebook
%matplotlib inline

# Define constants
c     = 3.0e10  # [ cm / s ]
h     = 1.0     # [ dimensionless ]
H0    = 100 * h # [ km / s / Mpc ]
OMEGA = 2.295   # [steradians ]

# Convert H0 from km/s/Mpc to 1/s by dividing by km/Mpc
H0 /= 3.086e19 # [ 1 / s ]

##### Compute the mean volume of the sample
zmean  = 0.1                          # [dimensionless ]
dmean  = c * zmean / H0               # [ h^(-1) * cm ] (small redshift approximation)
Vmean  = 1.0 / 3.0 * dmean**3 * OMEGA # [ h^(-3) * cm^3 ]
Vmean *= ( 1.0 / 3.086e24 )**3        # [ h^(-3) * Mpc^3 ]

# Read in the data and load it into arrays
RA , DEC , z , Mg , Mr = np.loadtxt( 'SDSS_DR7.dat' , unpack = True )
# unpack = True transposes the array. So, if the datafile structure is in columns (as we have here)
# we can assign each column (transposed into a row) 
# to its own array with one line

##### Bin the data
# Define bin-width
dMr = 0.1 # [ magnitudes ]

# Define array holding left and right edges of each bin
BinEdges = np.arange( np.min( Mr ) , np.max( Mr ) + dMr , dMr ) # [ magnitudes ]

# To create bins, copy the bin-edges array but remove the last entry because for N bin-edges there are (N-1) bins
# Be careful with copying numpy arrays. You MUST use the copy function, otherwise you won't get a copy, just
# a new pointer to the same array, so changes to your "copy" will also apply to the original
bins = np.copy( BinEdges[ 0 : -1 ] ) # [ magnitudes ]

# Shift the bins over by 1/2 of the bin-width to align x-axis ticks with centers of bins
bins += dMr / 2.0 # [ magnitudes ]

# Get total number of bins
Nbins = len( bins )

# Define empty array of length Nbins to hold the counts
counts = np.empty( Nbins )

# Loop through each bin
for i in range( Nbins ):
    
    # Get the indices of all the galaxies in the magnitude bin
    # (left edge included, right edge excluded, so no galaxy is missed or counted twice)
    x = np.where( ( Mr >= BinEdges[i] ) & ( Mr < BinEdges[i+1] ) )[0]
    
    # If the bin is empty, set its value to nan and skip the rest of the loop
    if ( len(x) == 0 ):
        counts[i] = np.nan
        continue

    # Compute the maximum volume in this bin
    zmax = np.max( z[x] )              # [ dimensionless ]
    dmax = c * zmax / H0               # [ h^(-1) * cm ]
    Vmax = 1.0 / 3.0 * dmax**3 * OMEGA # [ h^(-3) * cm^3 ]
    Vmax *= ( 1.0 / 3.086e24 )**3      # [ h^(-3) * Mpc^3 ]

    # Add the inverse-volume-weighted counts to the array
    # (x holds indices, so the number of galaxies is len( x ), not np.sum( x ))
    counts[i] = len( x ) * 1.0 / Vmax

# Normalize by total number of galaxies and binwidth
counts /= ( len( Mr ) * dMr )

# Compute number density
dn = counts / Vmean

plt.step( bins , np.log10( dn ) , 'k' )

plt.xlabel( r'$M_{r}=-2.5\,\ell og_{10}\left(L_{r}/L_{\odot,r}\right)+M_{\odot,r}$' )
plt.ylabel( \
  r'$\ell og_{10}\left(d\left(\frac{n}{h^{3}\,Mpc^{-3}}\times\frac{1}{V_{max}}\right)/dM_{r}\right)$' )

# Invert the x-axis so that brighter objects are to the right
plt.xlim( np.max( Mr ) , np.min( Mr ) )

plt.show()
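Stripped of the astronomy, the heart of the loop above is counting how many values fall into each bin. Here is a minimal pure-Python sketch of just that step. The numbers are made up for illustration (they are not from `SDSS_DR7.dat`), the names `values`, `edges`, and `counts` are placeholders, and each bin is treated as including its left edge but not its right edge.

```python
# Made-up values for illustration only (NOT taken from SDSS_DR7.dat)
values = [0.05, 0.10, 0.12, 0.31, 0.33, 0.38]
edges  = [0.0, 0.1, 0.2, 0.3, 0.4]   # 5 bin edges -> 4 bins

nbins  = len(edges) - 1
counts = [0] * nbins

for i in range(nbins):
    # Half-open interval [left, right): a value sitting exactly on an edge
    # is counted once, in the bin to its right
    in_bin = [v for v in values if edges[i] <= v < edges[i + 1]]
    counts[i] = len(in_bin)

print(counts)  # [1, 2, 0, 3]
```

Note that the third bin is empty, which is the case the `continue` statement handles in the full example.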

Finally, [here](https://docs.python.org/2.7/tutorial/datastructures.html#looping-techniques) is Python's looping tutorial. 