# Lecture 4: List comprehension & NumPy
ENVR 890-010: Python for Environmental Research, Fall 2022

September 10, 2022

By Andrew Hamilton, modified by Rosa Cuppari. Some material adapted from Greg Characklis, David Gorelick and H.B. Zeff.

## Summary
In this lecture, we will first learn about using **list comprehensions** to write for loops in a more efficient and compact way. We will then move beyond the standard data structures (list, tuple, dictionary) to a more advanced data structure called a **NumPy array**. Along the way, we will learn how **logical indexing** can be used with NumPy as a powerful tool to retrieve and manipulate particular subsets of data.

## List comprehensions
As we learned last week, **loops** can be used to execute a code block for every item in a list. For example, to create a list of all integer degrees between 0 and 90 converted to radians, we can write:

In [1]:
import math

def degrees_to_radians(degrees):
    return degrees * (2 * math.pi / 360)

In [None]:
### create list of degrees
degrees = list(range(0, 91))

### use for loop to create list of radians
radians = []
for d in degrees:
    radians.append( degrees_to_radians(d) )
    
print(degrees)
print()
print(radians)
print()
print( len(radians) )

[**List comprehensions**](https://docs.python.org/3/tutorial/datastructures.html) are a way to write such for loops more compactly, and they often run somewhat faster than the equivalent loop with `append`:

In [None]:
### do the same thing with list comprehension instead of for loop
radians_lc = [degrees_to_radians(d) for d in degrees]

print(radians_lc)
print()
### check that the two lists are equivalent
print(radians == radians_lc)
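To get a rough sense of the speed difference, one option is the standard-library `timeit` module. The sketch below is an illustration rather than part of the original lecture: the helper names `with_loop` and `with_comprehension` are made up for this example, and the timings will differ from machine to machine.

```python
import math
import timeit

def degrees_to_radians(degrees):
    return degrees * (2 * math.pi / 360)

degrees = list(range(0, 91))

### hypothetical helper: build the list with a for loop and append
def with_loop():
    radians = []
    for d in degrees:
        radians.append(degrees_to_radians(d))
    return radians

### hypothetical helper: build the same list with a list comprehension
def with_comprehension():
    return [degrees_to_radians(d) for d in degrees]

### time 2000 repetitions of each approach (results vary by machine)
t_loop = timeit.timeit(with_loop, number=2000)
t_lc = timeit.timeit(with_comprehension, number=2000)
print(f"for loop:           {t_loop:.4f} s")
print(f"list comprehension: {t_lc:.4f} s")
```

For a list this small the difference is usually minor; the main benefit here is readability.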

The list comprehension syntax can be thought of as an **expression** (i.e., what to do to each element) followed by a **for clause**, all surrounded by square brackets.
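As a small illustration of how the two parts line up with an ordinary loop, here is a sketch using made-up temperature values (the list `temps_c` is hypothetical and not part of the lecture data):

```python
### made-up temperatures in degrees Celsius
temps_c = [12.5, 18.0, 21.3]

### for-loop version: convert each value to Fahrenheit and append it
temps_f_loop = []
for t in temps_c:
    temps_f_loop.append(t * 9 / 5 + 32)

### list comprehension version
###   expression: t * 9 / 5 + 32
###   for clause: for t in temps_c
temps_f = [t * 9 / 5 + 32 for t in temps_c]

### check that the two lists are equivalent
print(temps_f == temps_f_loop)
```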

We can include an **if clause** to include only certain elements (for example, those whose degrees are divisible by 10) in the final list:

In [None]:
### first use for loop combined with if statement
radians_divisible10 = []
for d in degrees:
    if d % 10 == 0:
        radians_divisible10.append( degrees_to_radians(d) )
print(radians_divisible10)
print()
print(len(radians_divisible10))

In [None]:
### now repeat with list comprehension
radians_divisible10_lc = [degrees_to_radians(d) for d in degrees if d % 10 == 0]

print(radians_divisible10_lc)
print()

print(radians_divisible10 == radians_divisible10_lc)

### In class exercise
Assume we work for the county office of environmental quality, and are administering a program that will provide subsidized water quality assessments for all households that rely on private groundwater wells and have incomes less than \$30,000 per year. 

First we will create random demographic data for 1000 households (but pretend for the exercise that this was retrieved from a county database).

In [2]:
import random
## ID string ('H0', 'H1', ...) for each household in county
household = ['H' + str(i) for i in range(1000)]
# print(household)

In [3]:
## randomly assign water source for each household
water = [random.choices(['municipal', 'private'], weights = [0.6, 0.4], k=1)[0] for h in household]
# print(water)

In [4]:
## randomly assign income for each household from Normal/Gaussian distribution
income = [max(random.gauss(50000, 20000), 0) for h in household]
# print(income)
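Because these lists are generated randomly, counts such as the number of households with private wells will change every time the cells are rerun. If reproducible results are wanted, one option is to set a seed with `random.seed()` before generating the data. The sketch below is an assumption about how this could be done, not part of the original exercise; the seed value 890 and the names `water_a` and `water_b` are arbitrary.

```python
import random

### set an arbitrary seed, then draw 1000 random water sources
random.seed(890)
water_a = [random.choices(['municipal', 'private'], weights=[0.6, 0.4], k=1)[0] for i in range(1000)]

### resetting the same seed replays the same sequence of random draws
random.seed(890)
water_b = [random.choices(['municipal', 'private'], weights=[0.6, 0.4], k=1)[0] for i in range(1000)]

### check that the two runs produced identical lists
print(water_a == water_b)
```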

1. Use a list comprehension to find the list of households with a private water source. How many are there?

In [5]:
private_hh = [household[i] for i in range(1000) if water[i] == 'private']
len(private_hh)

394

In [None]:
# private_hh = []
# for i,hh in enumerate(household):
#     if water[i] == 'private':
#         private_hh.append(hh)
# print(private_hh)

In [None]:
### enumerate example
l = ['r','g']
# for e in l:
#     print(e)
    
# for i in range(len(l)):
#     print(i)
    
# for i in range(len(l)):
#     print(l[i])

# for i,e in enumerate(l):
#     print(i, e)

for i in range(len(l)):
    print(i, l[i])
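`enumerate` is most useful when the index is needed to look up a matching value in a second list of the same length, as with `household` and `water` above. A minimal sketch with made-up lists (`colors` and `codes` are hypothetical):

```python
### two made-up parallel lists: codes[i] belongs to colors[i]
colors = ['r', 'g', 'b']
codes = [10, 20, 30]

### enumerate gives the index and the element at the same time
for i, c in enumerate(colors):
    print(i, c, codes[i])

### the same pattern works inside a list comprehension
pairs = [(c, codes[i]) for i, c in enumerate(colors)]
print(pairs)
```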

2. How many households meet both criteria for testing?

In [None]:
# eligible = [household[i] for i in range(len(household)) if (water[i] == 'private') and (income[i] < 30000) ]
# print(len(eligible))

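# ### income is indexed by position in household, so build a matching income list for private_hh first
# private_income = [income[i] for i in range(len(household)) if water[i] == 'private']
# eligible = [private_hh[i] for i in range(len(private_hh)) if (private_income[i] < 30000) ]
# print(len(eligible))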
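One pitfall when filtering parallel lists is that positions in a filtered list no longer match positions in the original lists. The sketch below uses four made-up households (all `demo_` names are hypothetical) to show how indexing the original income list with positions from the filtered list gives the wrong answer:

```python
### made-up data for four households
demo_household = ['H0', 'H1', 'H2', 'H3']
demo_water = ['municipal', 'private', 'municipal', 'private']
demo_income = [20000, 60000, 25000, 28000]

### households on private wells: ['H1', 'H3']
demo_private = [demo_household[i] for i in range(len(demo_household)) if demo_water[i] == 'private']

### misaligned: demo_private[0] is H1, but demo_income[0] belongs to H0
wrong = [demo_private[i] for i in range(len(demo_private)) if demo_income[i] < 30000]

### aligned: test both criteria using indices of the original lists
right = [demo_household[i] for i in range(len(demo_household))
         if demo_water[i] == 'private' and demo_income[i] < 30000]

print(wrong)
print(right)
```

Here `wrong` contains H1 (income 60000), while `right` contains H3, the only private-well household with income below 30000.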

3. What are the average incomes for households with municipal and private water, respectively? Use the statistics.mean() function.

In [None]:
import statistics
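muni_income = [income[i] for i in range(len(household)) if (water[i] == 'municipal') ]
private_income = [income[i] for i in range(len(household)) if (water[i] == 'private') ]
print(statistics.mean(muni_income))
print(statistics.mean(private_income))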