# Section 2.2 | Loops and List Comprehension

Let's go a bit further with Python lists and see how we can use them in for loops. You may remember "for loops" from [Module 1: Section 4](https://github.com/bueno646/CIERA-HS-Program-2021/blob/master/IDEASpy-Mike-Updates/Module_1/Section_4_conditional_statements_and_loops.ipynb). We will also introduce "list comprehensions", show how they can simplify your life, and show how we can read data from files into lists.

__why should you care about list comprehensions__: They allow you to quickly access (or "parse") data stored in lists and to write more condensed code!

## For Loops Refresher

In Module 1 ([Module 1: Section 4](https://github.com/bueno646/CIERA-HS-Program-2021/blob/master/IDEASpy-Mike-Updates/Module_1/Section_4_conditional_statements_and_loops.ipynb)), you learned to construct loops that repeat a series of steps a predefined number of times. A for loop repeats (or "iterates") a sequence of steps over a predefined sequence, such as a Python list: for example, it can repeat the steps once for each item x in example_list, where example_list contains 4 items. As a reminder, here is the general structure of a for loop:

> example_list = [2, 4, 6, 8]<br>
> for temporary_variable in example_list:<br>
> &nbsp; &nbsp; &nbsp;    print(temporary_variable)

Try running this code in the cell below!

In [None]:
example_list = [2,4,6,8] 

for temporary_variable in example_list:
    
    print(temporary_variable)

__Explanation__:<br>
"temporary_variable" in the cell above is a variable the for loop uses as it goes through the items in the sequence "example_list". Starting with the 0th element of example_list, the int 2, the for loop saves this element as "temporary_variable". The for loop then does what we tell it to: print "temporary_variable" (line 5). Because there are no other commands after the print statement, the for loop then starts over with the next element in example_list, the int 4, saved as "temporary_variable". Everything indented under line 3 in the cell above is repeated for each item in "example_list".


### Quick note on variable names
It is important to be careful with the names you use in your code! Naming your variables descriptively helps prevent you from accidentally reusing a variable later on, and unknowingly reusing a variable from earlier can lead to bugs that are difficult to find. In our example above, the loop variable "temporary_variable" remains assigned to the last element of the Python list it iterated over (the int 8) even after the loop ends. Run the cell below to see for yourself!

In [None]:
print(temporary_variable)
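Here is a short sketch of how this can turn into a bug. The name star_count and its value are hypothetical, chosen just for illustration:

```python
# Hypothetical: a value we computed earlier and still want to use
star_count = 1000

# Later, we accidentally reuse the same name as the loop variable
for star_count in [2, 4, 6, 8]:
    print(star_count)

# The loop overwrote our earlier value: this prints 8, not 1000
print("star_count is now:", star_count)
```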

## Grabbing indices for an iterable object

We can also iterate over the indices of a list (e.g., for a list of 4 elements, or length 4, the indices are 0, 1, 2, and 3). Inside the loop, we access each item using the list name followed by the index variable in square brackets, e.g., example_list[index]. You may find it helpful to revisit our discussion on indexing in 1.3 of [Module 2: Section 1](https://github.com/bueno646/CIERA-HS-Program-2021/blob/master/IDEASpy-Mike-Updates/Module_2/Section_2_List_Comprehensions.ipynb).
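Here is a small sketch of looping over the indices of example_list instead of over its items. The indices are typed in by hand here; range( ) and len( ), covered below, can generate them automatically:

```python
example_list = [2, 4, 6, 8]

# Loop over the indices 0 to 3 (typed by hand here) instead of over the items
for index in [0, 1, 2, 3]:
    # the list name plus the index in square brackets gives the item
    print("index", index, "holds", example_list[index])
```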

### Accessing (or 'parsing', 'iterating over', etc.) your data
__why should I care?__

Lists are powerful objects because they provide an easily accessible place to store your data. A key feature of lists is that you can write a few lines of code to access all of the data stored in them, instead of writing one line per item. Using range( ), len( ), and for loops allows you to access your data quickly. It is also often helpful to keep track of the indices while iterating over the data.

## Walkthrough: Using range( ), len( ), and for loops to access (or 'parse', 'iterate over', etc.) your data

__Context__:

You, an astronomer, are given a data set of 1000 stellar temperatures. From those stellar temperatures, you want to make a subset of sun-like stars: stars with temperatures between 5300 K and 6000 K. Before you try your code on the entire data set, you want to try it on a smaller portion of the data (a 'subset') of 10 data points.

__Situation__: 

You are going to use range( ) and len( ) with a for loop to iterate over the data in your subset. You will also use conditional operators (from [Module 1: Section 4](https://github.com/bueno646/CIERA-HS-Program-2021/blob/master/IDEASpy-Mike-Updates/Module_1/Section_4_conditional_statements_and_loops.ipynb)) to check whether each data point is between 5300 K and 6000 K. If a data point is within this temperature range, we will add its index to a list (this list will be empty to start). We will then use that list of indices _to make a copy of the subset that contains __only__ sun-like stars_.

### Using range( ) and len( ) to grab indices
We can quickly grab the indices of a list using two functions: len( ) and range( ).

Here is a brief explanation of the range function and the len function.

__Brief Explanation__:<br>
> range(max_number) : returns an __iterable__ object consisting of numbers from 0 up to max_number minus 1<br>
> len(iterable_object) : returns the number of items in an __iterable__ object (like a __list__)

__Note:__ folks often refer to the output of the len( ) function as "the length of an object" instead of "the number of items".

Let's get more familiar with both functions through some examples!

#### range( ) example

In [None]:
print(range(10))  # prints the iterable object itself, range(0, 10), not its contents

# Let's use a for loop on this object to see what is in it

for ii in range(10):
    print(ii)

__Range Example Explanation__: <br>
As noted in the brief explanation above, the range function returns an iterable object consisting of the numbers from 0 up to max_number (10 in this case) minus 1. The output of the cell above confirms this: the numbers 0 through 9 are printed.



#### len( ) example

In [None]:
# length of a list with three objects in it
fruit_list = ['apple','orange','cherry']

print("our fruit list has length:",len(fruit_list))

# it even works if your list is empty
empty_list = []

print()
print("the length of my empty list is:", len(empty_list))

#### Where does len( ) fit in?
As we reviewed earlier, the len function tells you the length of an iterable object. If we pass that length to the range function, we get the indices of the elements in the object: for a list of length n, range(n) produces 0, 1, ..., n - 1, which are exactly the valid indices of the list.
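Here is a quick sketch with fruit_list from the len( ) example. list( ) is used only so that we can see the numbers a range( ) object produces:

```python
fruit_list = ['apple', 'orange', 'cherry']

number_of_items = len(fruit_list)        # 3
indices = list(range(number_of_items))   # turn the range into a list so we can see it
print("indices:", indices)               # indices: [0, 1, 2]

# The last valid index is always len( ) - 1
last_index = number_of_items - 1
print("last item:", fruit_list[last_index])  # last item: cherry
```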

__Why should I care?__

With the indices of an iterable object like a list, you can change the elements of that list! This could be helpful in a variety of situations. For example, imagine you were given a data subset of 10 stellar temperatures. You find out that the instrument used to collect the data accidentally reports negative data points for some stars. You can use a for loop to iterate over the list of stellar temperatures while also using range and len (as we will show with an example) to have the indices on hand. Then, you can use a conditional statement to check if an element is negative. If it is, you can use code like this to clean up your data set!

Example:
> if star_temp_list[element_index] < 0: <br>
> &nbsp; &nbsp; &nbsp; star_temp_list.pop(element_index)    # this removes the negative temperature!
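A word of caution about the pop( ) snippet above: removing items from a list while looping forward over its indices shifts every later item one position to the left, so the loop can skip a value and then run past the end of the shortened list. The sketch below (with a made-up four-value list) shows this, followed by one common workaround: looping over the indices in reverse with reversed( ). The full example later in this section takes a different approach, collecting the negative values first and removing them after the loop.

```python
temps = [5600, -6500, -5708, 5900]

# Popping inside a forward loop over the indices: each pop shifts later items left
try:
    for element_index in range(len(temps)):
        if temps[element_index] < 0:
            temps.pop(element_index)
except IndexError as err:
    print("IndexError:", err)        # the loop ran past the end of the shrunken list
print("after forward loop:", temps)  # -5708 was skipped and is still there

# One workaround: visit the indices from last to first, so a pop only shifts
# items the loop has already checked
cleaned = [5600, -6500, -5708, 5900]
for element_index in reversed(range(len(cleaned))):
    if cleaned[element_index] < 0:
        cleaned.pop(element_index)
print("after reversed loop:", cleaned)  # [5600, 5900]
```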


Let's go through an example __with just three elements__ to show how range( ), len( ), and for loops work together in this way!

__note:__ Pay close attention to the comments in the following code blocks!

In [None]:
# Below is the data subset
stellar_data_kelvin_subset = [5600,5000,-6500,6600,3000,-5708,7000,6300,-5200,5900]


for element_index in range(len(stellar_data_kelvin_subset[:3])):
    print("the loop is on index #:", element_index)
    print("The element at this index is:",stellar_data_kelvin_subset[element_index])
    print()

#### Example explanation
Let's take a close look at __line 5__ of the example:

> for element_index in range(len(stellar_data_kelvin_subset[:3])):

Let's break down what is going on here:

> - First, len(stellar_data_kelvin_subset[:3]) returns the length of the slice stellar_data_kelvin_subset[:3], which contains the first three elements of the list. In this case, the length is 3
> - Next, range(len(stellar_data_kelvin_subset[:3])) works as if the code were written as range(3) __because__ len(stellar_data_kelvin_subset[:3]) returns the integer 3
>> - The fact that range( ) does not include its upper bound is crucial here: range(3) is an iterable object that produces the integers 0, 1, and 2

Let's pause a second to simplify the original for loop statement "for element_index in range(len(stellar_data_kelvin_subset[:3])):"

> original: for element_index in range(len(stellar_data_kelvin_subset[:3])): <br>
> simplified: for element_index in range(3):

Now let's continue to break down the simplified version!

> - range(3) is an iterable object that produces the following integers: 0, 1, and 2
> - Therefore, 0 is the first integer that will be saved to the temporary variable "element_index", followed by 1 and 2
>> - __important__: the integers 0, 1, and 2 are __also__ the indices for a list containing 3 objects
>> - For example, fruit_list = ['cherry', 'apple', 'orange'] is a list containing 3 objects and therefore has indices 0 (the 0th element is 'cherry'), 1 (the 1st element is 'apple'), and lastly 2 (the 2nd element is 'orange')

Let's move on to the next stage: what happens after the first integer, 0, is stored in the temporary variable "element_index"? Let's look at the next two print statements, reproduced below, to close out this explanation.

> print("the loop is on index #:", element_index) <br> 
> print("The element at this index is:",stellar_data_kelvin_subset[element_index])

The first print statement just prints the contents of the temporary variable "element_index", which is the integer 0 on the first pass through the loop.

The second print statement __uses__ element_index to access an element in the list "stellar_data_kelvin_subset", because element_index is in square brackets next to the list, i.e., stellar_data_kelvin_subset[element_index]. On the first pass, this is stellar_data_kelvin_subset[0], which is 5600.



##### Now that we can use range( ), len( ), and for loops to access list elements, let's take it a step further!
In the "why should I care" for this section, we talked about an example where you would have to remove negative star temperatures. Now let's actually remove the negative values, using the data subset from above and building on the for loop we used earlier.

See the code below and pay attention to the comments for an explanation!

In [None]:
# Below is the data subset
stellar_data_kelvin_subset = [5600,5000,-6500,6600,3000,-5708,7000,6300,-5200,5900]

# Let's print the original list for reference

print("The unaltered list is:", stellar_data_kelvin_subset)
print()

# we will use this list later to store the negative values we find
storage_list = []

for element_index in range(len(stellar_data_kelvin_subset)):

    # let's use a conditional statement to check if the temperature is negative
    if stellar_data_kelvin_subset[element_index] < 0:

        # let's use an informative print statement for learning purposes
        print("I found a negative number at index", element_index)
        print()

        # now that we have an index, we can save the value we want to get rid of later
        # (we remove it after the loop, so the list does not shift while we loop over it)
        storage_list.append(stellar_data_kelvin_subset[element_index])

# now let's iterate over our list of stored negative values and remove each one!
# note: remove( ) deletes the first element equal to the value it is given
for negative_value in storage_list:
    stellar_data_kelvin_subset.remove(negative_value)

# Let's print our updated list now
print("after removing negatives, our list is:", stellar_data_kelvin_subset)

##### Recap

We wanted only positive stellar temperatures. We used lists, for loops, range( ), and len( ) to arrive at the code outputs above.

Let's look at the outputs above to check that everything worked as expected. Comparing the unaltered list with the updated list, the three negative values found at indices 2, 5, and 8 (-6500, -5708, and -5200) should be gone, and every other value should still be there, in its original order.
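As an extra check, here is a short sketch that compares the original subset with the cleaned list by counting negative values. The cleaned list is typed in by hand from what the cell above should print, and the variable names are just for this check:

```python
original_subset = [5600, 5000, -6500, 6600, 3000, -5708, 7000, 6300, -5200, 5900]
# The list the cell above should print after removing negatives
cleaned_subset = [5600, 5000, 6600, 3000, 7000, 6300, 5900]

# count the negative values in the original data
negatives_in_original = 0
for element_index in range(len(original_subset)):
    if original_subset[element_index] < 0:
        negatives_in_original += 1

# count any negative values left in the cleaned data
negatives_left = 0
for temp in cleaned_subset:
    if temp < 0:
        negatives_left += 1

print("negatives in original:", negatives_in_original)                # 3
print("negatives left:", negatives_left)                              # 0
print("points removed:", len(original_subset) - len(cleaned_subset))  # 3
```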

In this example, we walked through how to use the following to conveniently select data points from larger data sets:

> - lists
> - for loops
> - len( ) and range( )
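Putting these pieces together, here is a sketch of the sun-like star task from the start of this walkthrough: store the indices of stars between 5300 K and 6000 K, then use those indices to copy only the sun-like stars into a new list. It reuses the 10-point subset from above and assumes that 5300 K and 6000 K themselves count as sun-like:

```python
# The 10-point data subset from this section (temperatures in kelvin)
stellar_data_kelvin_subset = [5600, 5000, -6500, 6600, 3000, -5708, 7000, 6300, -5200, 5900]

sun_like_indices = []   # starts empty

for element_index in range(len(stellar_data_kelvin_subset)):
    temp = stellar_data_kelvin_subset[element_index]
    # assumption: both endpoints, 5300 K and 6000 K, count as sun-like
    if 5300 <= temp <= 6000:
        sun_like_indices.append(element_index)

# use the stored indices to copy only the sun-like stars into a new list
sun_like_subset = []
for element_index in sun_like_indices:
    sun_like_subset.append(stellar_data_kelvin_subset[element_index])

print("sun-like indices:", sun_like_indices)      # [0, 9]
print("sun-like temperatures:", sun_like_subset)  # [5600, 5900]
```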


## List comprehension

List comprehension is a way to write code with fewer lines (or "leaner" code). Comprehensions are a powerful tool for doing operations on lists or for creating new lists. A list comprehension has the general form [expression for item in sequence]: it runs a for loop over the sequence and collects the value of the expression for each item into a new list, all in a single line!
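To see that a list comprehension really is a for loop in compact form, here is a sketch that builds the same list both ways, doubling each item of example_list from earlier in this section:

```python
example_list = [2, 4, 6, 8]

# Version 1: a for loop that builds a new list with append( )
doubled_loop = []
for item in example_list:
    doubled_loop.append(item * 2)

# Version 2: the same thing as a list comprehension
# general form: [expression for item in sequence]
doubled_comprehension = [item * 2 for item in example_list]

print(doubled_loop)           # [4, 8, 12, 16]
print(doubled_comprehension)  # [4, 8, 12, 16]
```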

__Why you should care__

Often, code with fewer lines is easier to read, and a list comprehension can also run somewhat faster than an equivalent for loop that builds the list with append( ).


### Walkthrough: Unit Conversions

Let's say you are given a list of the masses of 4 objects in kilograms. You find out that you need the masses in grams.

__Note__: we can convert from kilograms to grams by multiplying by 1000!

Let's see how we can write a for loop via list comprehension to help us out here!

In [None]:
data_kilograms_ = [2, 4, 6, 8]

# Prints the items in the list; the for loop is written inside the square brackets
print("in kilograms", [item for item in data_kilograms_])
print()

# This will print each item multiplied by 1000
print("in grams", [item*1000 for item in data_kilograms_])

Here is another way to do exactly the same thing, this time by iterating over the indices of the list:

In [None]:
data_kilograms_ = [2, 4, 6, 8]
print("in kilograms", [data_kilograms_[i] for i in range(len(data_kilograms_))])
print("in grams", [data_kilograms_[i]*1000 for i in range(len(data_kilograms_))])

List comprehension can be a bit trickier to understand because it's compressing the code down to the bare minimum; that's also the power of list comprehension, and why it's worth understanding it well.
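A list comprehension can also include an optional if clause, [expression for item in sequence if condition], which keeps only the items for which the condition is True. As a sketch, here is how that could shorten two tasks from this section: dropping the negative temperatures and selecting sun-like stars (assuming 5300 K and 6000 K themselves count as sun-like):

```python
stellar_data_kelvin_subset = [5600, 5000, -6500, 6600, 3000, -5708, 7000, 6300, -5200, 5900]

# Keep only the non-negative temperatures (same result as the remove( ) loop earlier)
positive_temps = [temp for temp in stellar_data_kelvin_subset if temp >= 0]

# Keep only sun-like stars; assumption: 5300 K and 6000 K count as sun-like
sun_like_temps = [temp for temp in stellar_data_kelvin_subset if 5300 <= temp <= 6000]

print(positive_temps)  # [5600, 5000, 6600, 3000, 7000, 6300, 5900]
print(sun_like_temps)  # [5600, 5900]
```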

## Your turn! Using Lists and List Comprehensions

__Situation:__ 
You, an astronomer, have been given a list of distances to the nearest stars (excluding the Sun) in __light years (ly)__.

__Task:__
You will use loops, list comprehensions, and mathematical operations to convert the distances from __light years (ly)__ to __parsecs (pc)__.

__Hint__: 1 ly = 0.306601 pc
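As a worked example of the hint, here is the conversion for the first distance in the list, Proxima Centauri at 4.2 ly. You can use it to check the first value your code prints (Python may show a few extra digits because of floating-point rounding):

$$4.2\ \text{ly} \times 0.306601\ \frac{\text{pc}}{\text{ly}} = 1.2877242\ \text{pc} \approx 1.29\ \text{pc}$$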

__Note__: Pay close attention to the comments for detailed instructions

In [None]:
# Distances to the nearest stars, in light years
distances_ly = [4.2, 4.2, 4.4, 5.9, 7.8]

# By the way, the names of these nearest stars are:
# (1) Proxima Centauri, (2) Alpha Centauri A, (3) Alpha Centauri B, 
# (4) Barnard's Star and (5) Wolf 359
# Alpha Cen A and B are technically part of a single star system

# Conversion factor - from ly to pc
ly_to_pc = 0.306601



In [None]:
# Write a FOR LOOP to print each star distance in units of pc using the conversion factor above
for FILL IN CODE
    FILL IN CODE 
    

In [None]:
# Now, print off the distances in units of pc again, but this time using LIST COMPREHENSION
print('distances in pc:', FILL IN CODE)



##### In the above code, we printed the converted values, but we didn't SAVE them


In [None]:
# Using a FOR LOOP, create a NEW list to store the distances in pc, called distances_pc
distances_pc = FILL IN CODE    # must first create the new empty list
for i in FILL IN CODE:   # here we set up temp variable i for iterating
    FILL IN CODE    # hint: use append



In [None]:
# Alternatively, we can change the values of our original list so that all values are in pc
for i in FILL IN CODE:
    distances_ly[i] = FILL IN CODE

##### Print out the new list distances_pc and the modified distances_ly list
##### They SHOULD be identical!

In [None]:
print('distances_pc =', FILL IN CODE)
print('distances in pc =', FILL IN CODE)

##### Now, perform the same two tasks using LIST COMPREHENSION

In [None]:
# First, let's reset back to our original list in light years
distances_ly = [4.2, 4.2, 4.4, 5.9, 7.8]

# Using LIST COMPREHENSION, create a NEW list to store the distances in pc, called distances_pc
distances_pc = FILL IN CODE

In [None]:
# Using LIST COMPREHENSION, change the values of our original list so that all values are in pc
distances_ly = FILL IN CODE

##### Lastly, print out the new list distances_pc and the modified distances_ly list
##### Again, they SHOULD be identical!

In [None]:
print('distances_pc =', FILL IN CODE)
print('distances in pc =', FILL IN CODE)

## Takeaways
> - You can use for loops, range( ), and len( ) to grab the indices of a list. This can be helpful for a variety of tasks, such as cleaning up data!
> - List comprehensions are a powerful tool for writing cleaner and more concise code