# Problem Set 13: Programming basics in Python

Author: Greg Wray

## Instructions

Create a markdown document within JupyterLab and answer the questions below using code blocks that generate the correct outputs. We encourage you to include explanatory text in your markdown document. 

Write "robust" solutions wherever possible. A good rule of thumb for judging whether your solution is appropriately "robust" is to ask yourself "If I added additional observations or variables to this data set, or if the order of variables changed, would my code still compute the right solution?"

Make sure your markdown is nicely formatted: use headers, bullets, numbering, etc., so that the structure of the document is clear.

When you are done, name your markdown file as follows (replace `XX` with the assignment number, e.g., `01`, `02`, etc.):

-   `netid-assignment_XX-Spring2024.qmd`

Submit both your markdown file and the generated HTML document via the Assignments submission section on Sakai.


1. Sets can be a useful data structure when you anticipate needing to enforce unique values and don't care about the order of items. They are also useful when you want to compare membership among groups. This can be done using operators and set logic.

   * Create some sets:   
   `iterables = {'string', 'list', 'tuple', 'set', 'dictionary', 'frozen set', 'range'}`   
   `immutables = {'frozen set', 'string'}`  
   `collections = {'set', 'frozen set'}`  
   `sequences = {'list', 'tuple', 'dictionary', 'range'}`  

     Write code that uses set logic to solve the following: 

   * How many different data structures are represented in the four sets?

   * How many iterable data structures are not collections? 

   * How many data structures are mutable? (You can assume that the set `immutables` is complete.)

   * How many iterables are neither collections nor sequences? Which one(s)?   
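
As a reminder of how Python's set operators behave, here is a short sketch. It uses small hypothetical sets (fruits and colors) rather than the sets above, so the exercise is still yours to solve. `sorted()` is used only because printed sets have no guaranteed order.

```python
# Hypothetical example sets, unrelated to the sets in this problem
fruits = {'apple', 'banana', 'cherry', 'tomato', 'kiwi'}
red_things = {'apple', 'cherry', 'tomato', 'brick'}
yellow_things = {'banana', 'lemon'}

# Union (|): everything that appears in at least one set
everything = fruits | red_things | yellow_things
print(len(everything))      # prints 7

# Difference (-): fruits that are not red
not_red = fruits - red_things
print(sorted(not_red))      # prints ['banana', 'kiwi']

# Intersection (&): items that are both fruits and red
red_fruits = fruits & red_things
print(sorted(red_fruits))   # prints ['apple', 'cherry', 'tomato']

# Fruits that are neither red nor yellow: subtract the union of the other two sets
neither = fruits - (red_things | yellow_things)
print(neither)              # prints {'kiwi'}
```

The same operators are also available as methods (`.union()`, `.difference()`, `.intersection()`) if you find those easier to read.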

2.  Dictionaries are common in Python programming because they are highly optimized for "look up" operations: cases where you know the key and want to retrieve the value(s) associated with it. A data frame constructed from nested lists can provide the same functionality, but will typically be much slower because it is built on a different data structure. Let's have the best of both worlds and construct a data frame using a dictionary. 

    Construct a dictionary called `mammals` by copying and pasting the following code:   
    `keys = ['human', 'red fox', 'horseshoe bat', 'opossum']`  
    `B = ['primate', 'canine', 'microbat', 'marsupial']`   
    `C = [3.2, 2.9, 1.9, 4.0]`   
    `D = [2, 4, 2]`   
    `E = [0, 0, 2, 0]`    
    `vals = list(zip(B, C, D, E))`     
    `mammals = dict(zip(keys, vals))`    

    * Retrieve the values for `'human'` and for `'opossum'`. What worked and what went wrong? (Hint: look carefully at the input lists.)

    * Notice that `zip()` doesn't care if the lists you give it are not of equal length: it doesn't raise an exception or error at run time. How does it handle this situation? Note that `zip()`'s behavior is different from R, where "recycling" is typically used to make vectors the same length. 

    * Notice also how easy it is to make this kind of mistake! In **words** describe how you could "trap" this type of error prior to constructing the dictionary. **Bonus:** write code to actually do this.

    * List D contains the number of legs of the mammal in question. Fix list D and re-create the dictionary. Check that you can retrieve the correct values for `'opossum'`.

    * The other lists are as follows: B gives the kind of mammal (e.g., `'primate'`), C gives genome size, and E gives the number of wings. Add your favorite mammal to the dictionary. You can look up genome sizes [here](http://www.genomesize.com).

    * Dictionaries have an order, which is set by the order in which items are added when the dictionary is created or updated. Unlike lists, you can't easily alter this order. What happens when you try to apply `sort()`? If you want to re-order a dictionary, the best approach is to make a copy. We won't do that right now as it's a bit awkward. (All data structures have strengths and weaknesses!) But we can access the keys with `.keys()` and copy them into a list with `list()`. Write code to return the name of the mammal that is alphabetically last.     

    * A similar approach can be used to copy the values with `.values()`. Write code to return the value of the largest genome.
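
The dictionary operations used in the last few questions are shown below on a small hypothetical fruit dictionary, not the `mammals` data. The sketch builds the dictionary with `zip()` and looks up a single key. It then copies the keys and values out through `.keys()` and `.values()`, which return view objects rather than lists.

```python
# Hypothetical toy data (not the mammals data set): fruit name -> (color, mass in grams)
names = ['kiwi', 'apple', 'banana']
colors = ['brown', 'red', 'yellow']
masses = [75, 180, 120]

# zip() pairs items position by position; dict() turns (key, value) pairs into a dictionary
fruit = dict(zip(names, zip(colors, masses)))

print(fruit['apple'])          # look up one entry by key; prints ('red', 180)

# sorted() returns a new, sorted list of keys; the dictionary's own order is unchanged
print(sorted(fruit))           # prints ['apple', 'banana', 'kiwi']

# .keys() returns a view object; wrap it in list() to get an independent list copy
name_list = list(fruit.keys())
print(max(name_list))          # alphabetically last key; prints kiwi

# each value is a (color, mass) tuple, so index into it to pick out the mass
heaviest_mass = max(value[1] for value in fruit.values())
print(heaviest_mass)           # prints 180
```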

3. This problem and the next two build on the [Boolean network simulation model](https://github.com/Bio724D/Bio724D_2023_2024/blob/main/python_notebooks/boolean_networks_example.ipynb) we examined previously. Starting with the function definitions and the basic Boolean network model ("core of the simulation"), your task is to extend the code to check every possible starting condition in a single run. The first step is to explicitly enumerate all the possible starting conditions. With 3 nodes and 2 possible states (`True` or `False`) for each, there are 2 × 2 × 2 = 8 combinations. These need to be placed into a nested list of lists so that your program can step through each unique combination of starting conditions, one at a time:     
`[[True, True, True], [True, True, False], ...]`  
There are 3 ways you could construct this nested list: (1) write it out by hand; (2) use nested `for` loops (one loop per node, each iterating over `True` and `False`); or (3) use a function from the `itertools` module, which is part of the Python Standard Library. Any of these options is okay for solving this problem. However, note that the first option is simple with a small number of starting conditions but does **not** scale well if there are more nodes! 
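
Options (2) and (3) are sketched here for a hypothetical network with only 2 nodes. `itertools.product()` with `repeat` set to the number of nodes yields every True/False combination as tuples, and each tuple is converted to a list. The nested-loop version produces the same nested list, but it needs one more loop for every additional node, whereas `product()` only needs a different `repeat` value.

```python
from itertools import product

# Option (3): every True/False combination for 2 nodes (repeat = number of nodes).
# product() yields tuples, so convert each one to a list.
conditions = [list(combo) for combo in product([True, False], repeat=2)]
print(conditions)   # prints [[True, True], [True, False], [False, True], [False, False]]
print(len(conditions))  # 2 ** 2 combinations; prints 4

# Option (2): the same result with one nested for loop per node
nested = []
for a in [True, False]:
    for b in [True, False]:
        nested.append([a, b])
print(nested == conditions)  # prints True
```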

4. Once you have enumerated all the possible starting conditions, the next step is to place the simulation inside a `for` loop so that it carries out a simulation for each set of starting conditions. During each iteration, you will need to extract the next item in the nested list and assign its contents to the initial values of `V1`, `V2`, and `V3` before running the simulation. 
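
The loop structure might look something like the sketch below. It is a minimal example with loud assumptions: `update()` uses **made-up** rules and `n_steps` is an arbitrary choice. Neither comes from the Boolean network notebook, so substitute the notebook's own function definitions. Each starting condition is unpacked into the first entry of the `V1`, `V2`, and `V3` histories before the simulation runs.

```python
from itertools import product

def update(v1, v2, v3):
    # HYPOTHETICAL update rules for illustration only; replace them with the
    # rules from the Boolean network notebook you are extending.
    return (not v3, v1, v1 and v2)

n_steps = 20  # hypothetical number of iterations per simulation

all_runs = []
for start in product([True, False], repeat=3):
    v1, v2, v3 = start                 # unpack this starting condition
    V1, V2, V3 = [v1], [v2], [v3]      # each history begins with the starting value
    for _ in range(n_steps):
        new1, new2, new3 = update(V1[-1], V2[-1], V3[-1])
        V1.append(new1)
        V2.append(new2)
        V3.append(new3)
    # keep the starting condition together with the full history of each node
    all_runs.append((list(start), V1, V2, V3))

print(len(all_runs))  # one simulation per starting condition; prints 8
```

Storing the full histories in `all_runs` keeps `V1`, `V2`, and `V3` available for every run once the loop has finished.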

5. Testing every possible set of starting conditions allows us to evaluate their impact on the outcome of the simulation. Let's now automate the process of comparing those outputs. One way to think about the behavior of Boolean networks is that the system sometimes oscillates between 2 or more states and sometimes becomes fixed in a single state. Let's focus on the simpler case and count how many starting conditions lead to behavior that becomes fixed. We can define "fixed" as a situation where the last 10 iterations have precisely the same state. When the simulation finishes looping, the lists `V1`, `V2`, and `V3` contain the information you need to determine whether fixed is `True` or `False`. Store the results in a list in which each item is itself a list containing two items: (1) the starting condition and (2) the truth value for fixed. You can either represent the starting condition as another list (e.g., `[True, False, False]`) or encode it as a string representation (e.g., `'TFF'`). **Bonus:** are you able to draw any conclusion about the Boolean model from your measurement? 
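
The "fixed" test could be implemented as in the sketch below, which checks the last 10 recorded states and runs on hand-built toy histories rather than real simulation output. The helper names `is_fixed()` and `encode()` are hypothetical. The sketch assumes each history list starts with the starting condition and has one entry per iteration. Zipping the three histories gives one `(v1, v2, v3)` state per iteration, and the state is fixed if the set of the last `window` states collapses to a single element.

```python
def is_fixed(V1, V2, V3, window=10):
    """Return True if the last `window` states (v1, v2, v3) are all identical."""
    if len(V1) < window:
        return False  # not enough iterations recorded to judge
    last_states = list(zip(V1, V2, V3))[-window:]
    # identical tuples collapse to one element in a set
    return len(set(last_states)) == 1

def encode(state):
    """Encode a starting condition such as [True, False, False] as 'TFF'."""
    return ''.join('T' if value else 'F' for value in state)

# Hypothetical toy histories, 12 iterations each
settled = ([True] * 12, [False] * 12, [False] * 12)
flipping = ([True, False] * 6, [False] * 12, [True] * 12)

# the starting condition is the first entry of each node's history
results = [[encode([h[0] for h in run]), is_fixed(*run)] for run in (settled, flipping)]
print(results)  # prints [['TFF', True], ['TFT', False]]

n_fixed = sum(1 for _, fixed in results if fixed)
print(n_fixed)  # prints 1
```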

6. **Notebook:** Choose something that you learned from lecture, the hands-on coding in class, or your own investigation that you think will be valuable for your future programming endeavors. Using text or a mix of text and code, create an entry for your notebook. Add this to your notebook and include it here. 

7. **Thursday lunch:** Identify something that you learned from the presentation or discussion on Thursday that you found valuable. Provide a brief reflection here (1-5 sentences) and include code or pseudo-code if useful. (Hint: consider adding this to your notebook as well.)