# ReprodICUbility: Open Science and Coding Guidelines
Welcome to ReprodICUbility! It is a pleasure to have you with us.
In this Notebook, we want to provide a guide for implementing best practices in two critical areas: **Open Science** and **Coding Guidelines**.

<u>**Open Science**</u>  
As members of the Student Network of Open Science, adhering to open science principles is one of our highest priorities and this Notebook will help operationalize these principles. It aims to offer steps to integrate the at times abstract and lofty ideals of open science into our workflows in a concrete, pragmatic manner.

<u>**Coding Guidelines**</u>  
Collaborative work is at the core of our project, making standardized coding practices essential. This section will outline guidelines designed to streamline collaboration by ensuring uniformity in code across teams and throughout the project. Additionally, it will provide recommendations for enhancing code readability and structure.


Together, these guidelines will support our shared goal of developing a robust and reproducible open reproduction pipeline as a team effort.


## <u>**[I] Open Science**</u>

### <u>0. Open Ignorance</u>
- **Principle**:
    - This is a note for our collaboration rather than for the reproduction process itself.
    - ReprodICUbility is a large, fast-paced project where diverse backgrounds meet. Naturally, this will lead to questions and uncertainties.
- **Practice:**
    - _Openly ask every question you have_: a willingness to learn is far more valuable here than trying to appear omniscient!
        - If you don’t understand something after paying attention, chances are that half of your collaborators are in the same boat.
    - _Share helpful resources_ or explanations whenever you find answers to your questions.
    - Encourage a culture where asking for clarification is seen as a contribution to collective understanding.

### <u>1. Accountability</u>
- **Principle**:
    - Effective collaboration and appropriate crediting require documentation of your contributions to the project.
    - Open science starts with transparency among ourselves, not just with what we present to the public at the end.
    - It is remarkable how much it speeds up team efforts to know precisely whom to ask about a specific decision that was made.
        - “What does this variable mean? What statistical assumption is being tested here? Who decided on this or that approach? . . .”
- **Practice**:
    - _Annotate_ scripts and documents when you make significant alterations or decisions, along with the rationales behind them.
    - Maintain a changelog in our shared documents to track who made which changes and why.
        - Suggestion: Set up brief but regular _documentation sprints_ where teams review and update project records.
    - Regular _check-ins with other teams_: Engage in mutual reviews and exchanges of ideas.
        - Articulating your approaches helps identify inconsistencies, often even before others point them out.
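
One way to keep such annotations uniform is a small helper that formats changelog entries. The following is only a sketch: the function name `format_changelog_entry`, its fields, and the entry layout are assumptions made for illustration, not an agreed project standard, and the example entry is made up.

```python
from datetime import date


def format_changelog_entry(author, change_summary, rationale, entry_date=None):
    """Return one markdown bullet recording who changed what, when, and why.

    Hypothetical helper: the layout is an illustration, not a project standard.
    """
    if not author or not change_summary or not rationale:
        raise ValueError("author, change_summary, and rationale must all be non-empty.")

    # Default to today's date; ISO format (YYYY-MM-DD) keeps entries sortable.
    entry_date = entry_date or date.today()
    return f"- {entry_date.isoformat()} | {author} | {change_summary} | Rationale: {rationale}"


if __name__ == "__main__":
    # Made-up entry, purely to show the resulting format.
    print(format_changelog_entry(
        author="Team A",
        change_summary="Changed the outlier exclusion rule",
        rationale="The previous rule removed valid extreme values",
        entry_date=date(2025, 1, 15),
    ))
```

Requiring a rationale for every entry is the point of the design: the "why" is usually the part that gets lost first.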

### <u>2. Transparency</u>
- **Principle:**
    - This may be the heart of classic open science: making all aspects of research accessible, including methods, data, protocols, and results, to enable scrutiny and reuse.
- **Practice:**
    - _Citation_: Provide full references for all external information, whether from papers, textbooks, or other resources.
    - _AI_: Nowadays, most of us use AI in some way. Documenting exactly how and where it was used is, thus, also part of open science practice.
    - _Findings_: Report all findings, regardless of whether they align with expectations or are appealing — do not filter evidence.
    - _Alternatives_: Acknowledge that there is no single correct way to build the kind of pipeline we are planning, and record any statistical disputes, alternative methods, and assumptions discussed within the team. Be transparent about which decisions were made, which alternatives were rejected, and why.
    - _Platforms_: Use the shared, open-access platforms we provide for storing and sharing data, code, and documentation, ensuring that all team members and external collaborators have access.


<u>**Beyond**</u>  
Fortunately, much of the ReprodICUbility infrastructure inherently supports open science. Using the resources we provide integrates many of the more basic open science practices by design. Here, too, _always ask questions_ if anything about the digital architecture is unclear, unintuitive, or poses a hurdle to your work.  
And after all: Reproducibility itself is one of the major principles of open science :) 

Thank you for your contribution! <3

Interesting Resources:
- Open Science Collaboration. PSYCHOLOGY. Estimating the reproducibility of psychological science. Science. 2015 Aug 28;349(6251):aac4716. doi: 10.1126/science.aac4716. PMID: 26315443.
- National Academies of Sciences, Engineering, and Medicine; Policy and Global Affairs; Board on Research Data and Information; Committee on Toward an Open Science Enterprise. Open Science by Design: Realizing a Vision for 21st Century Research. Washington (DC): National Academies Press (US); 2018 Jul 17. PMID: 30212065.
- Grant S, Mayo-Wilson E, Kianersi S, Naaman K, Henschel B. Open Science Standards at Journals that Inform Evidence-Based Policy. Prev Sci. 2023 Oct;24(7):1275-1291. doi: 10.1007/s11121-023-01543-z. Epub 2023 May 13. PMID: 37178346.

## <u>**[II] Coding Guidelines**</u>

The Coding Guidelines will follow two principles: Standardization and Readability. These two obviously go hand in hand in many regards.

**Standardization** simply refers to norms we want to lay down at the beginning of the project. These concern variable and function names, code structure, logging, etc. These standards may seem arbitrary at times, and they are! But we need to standardize the format somehow to ensure uniformity across teams and across time; otherwise, this whole thing will become a mess very quickly. Thus, we kindly ask you to adhere to these standards as best as you can.

**Readability** involves general recommendations for streamlining teamwork throughout the project. We all sometimes approach coding nonchalantly, with an attitude of “Who else needs to read this?” But for our project, readability is key. As a shorthand, you can ask yourself: If you lost your memory of the past few days, could you understand your own code from scratch? How much time and effort would that take?

Let’s dive in!


### <u>1. **Naming Variables and Functions**</u>

- _ReprodICUbility_: We have noticed in the preparation for this project that the name “ReprodICUbility” is quickly misspelled, which is why we will consistently use its short form _**repro**_ when we need to refer to it in code.
- _Necessity_: It is a common mistake to optimize something that shouldn’t even exist → Do you really need this new variable/function?
    - Whenever a new name or operation is introduced in code, explain its purpose and background in the larger context (see commenting). 
- _Case Type_: We've decided to exclusively use _**snake_case**_ (i.e., sub-names separated by underscores) instead of camelCase or PascalCase. This is one of the arbitrary standards mentioned above: it is purely an aesthetic formality that we want to lay down for consistency.
- _Clarity_: Try to be *too* explicit (because you can actually never be too explicit). Make names longer rather than shorter and a bit more obvious than you think necessary at the time. For example:
    - Avoid abbreviations.
    - Put units in variable names where useful (e.g., `missing_variables_percentage_decimal_notation` or `missing_variables_percentage_percentage_notation`; `delay_time_seconds` or `delay_time_minutes`).
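
To make the point about units concrete, here is a minimal sketch. The two conversion functions and the constant are hypothetical names chosen for illustration; only the variable names come from the bullet above.

```python
SECONDS_PER_MINUTE = 60


def convert_seconds_to_minutes(delay_time_seconds):
    """Convert a delay from seconds to minutes."""
    return delay_time_seconds / SECONDS_PER_MINUTE


def convert_decimal_to_percentage_notation(proportion_decimal_notation):
    """Convert a proportion such as 0.25 into percentage notation (25.0)."""
    return proportion_decimal_notation * 100


delay_time_seconds = 90
delay_time_minutes = convert_seconds_to_minutes(delay_time_seconds)

missing_variables_percentage_decimal_notation = 0.25
missing_variables_percentage_percentage_notation = convert_decimal_to_percentage_notation(
    missing_variables_percentage_decimal_notation
)

print(delay_time_minutes)                                # 1.5
print(missing_variables_percentage_percentage_notation)  # 25.0
```

Because the unit is part of the name, a line that mixes `delay_time_seconds` with `delay_time_minutes` looks wrong at first glance, before any test has to catch it.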

As an example, check out this snippet of code. Can you figure out what it is for? 

In [41]:
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.power import TTestPower

def pdt(data_b, data_a, a=0.05):
    t_s, p_v = stats.ttest_rel(data_b, data_a)
    m_d = np.mean(data_b - data_a)
    s_d = np.std(data_b - data_a, ddof=1)
    c_d = m_d / s_d
    e_s = np.abs(c_d)
    n = len(data_b)
    pw_a = TTestPower()
    pw = pw_a.solve_power(effect_size=e_s, nobs=n, alpha=a, alternative='two-sided')
    r = pd.DataFrame({
        'M': ['T-s', 'P-v', 'C-d', 'Pw'],
        'V': [t_s, p_v, c_d, pw]
    })
    return r

def mn():
    np.random.seed(42)
    d_b = np.random.normal(100, 10, 30)
    d_a = d_b + np.random.normal(-3, 5, 30)
    r = pdt(d_b, d_a)
    print(r.to_string(index=False))

if __name__ == "__main__":
    mn()

  M        V
T-s 4.242251
P-v 0.000206
C-d 0.774526
 Pw 0.983744


I don't know about you, but it would take me much longer than necessary to understand what is going on here, although this is actually a very basic statistical test. This is simply a dependent-samples (paired) t-test, together with an effect size and a power estimate, run on some randomly generated data in the main function. The abbreviations, however, make it hard to see exactly what is happening.
We suggest that in the context of ReprodICUbility you be as explicit in your naming of variables and functions as possible. 
Look at this second example:

In [43]:
def perform_dependent_sample_ttest(data_pre_intervention, data_post_intervention, alpha=0.05):

    t_statistic, p_value = stats.ttest_rel(data_pre_intervention, data_post_intervention)

    mean_difference = np.mean(data_pre_intervention - data_post_intervention)
    standard_deviation_of_differences = np.std(data_pre_intervention - data_post_intervention, ddof=1)
    cohen_d = mean_difference / standard_deviation_of_differences

    effect_size = np.abs(cohen_d)
    n = len(data_pre_intervention)
    power_analysis = TTestPower()
    power = power_analysis.solve_power(effect_size=effect_size, nobs=n, alpha=alpha, alternative='two-sided')

    results = pd.DataFrame({
        'Metric': ['T-statistic', 'P-value', 'Effect Size (Cohen\'s d)', 'Statistical Power'],
        'Value': [t_statistic, p_value, cohen_d, power]
    })

    return results

def main():
    np.random.seed(42)
    data_pre_intervention = np.random.normal(100, 10, 30)
    data_post_intervention = data_pre_intervention + np.random.normal(-3, 5, 30)

    results = perform_dependent_sample_ttest(data_pre_intervention, data_post_intervention)

    print(results.to_string(index=False))

if __name__ == "__main__":
    main()

                 Metric    Value
            T-statistic 4.242251
                P-value 0.000206
Effect Size (Cohen's d) 0.774526
      Statistical Power 0.983744


Note that this code does not even contain a single comment or any additional explanations. Simply from the way the variables were named, we can understand what it does much more quickly, almost like reading a text.
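
For reference, the quantities computed in this function relate as follows. This is the standard paired-samples formulation, added here as background rather than taken from the notebook; $D_i$ denotes the pre-minus-post difference for pair $i$, and $n$ is the number of pairs.

$$
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad
s_D = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(D_i - \bar{D}\right)^2}, \qquad
d = \frac{\bar{D}}{s_D}, \qquad
t = \frac{\bar{D}}{s_D/\sqrt{n}} = d\,\sqrt{n}
$$

As a quick consistency check on the output above: $0.774526 \times \sqrt{30} \approx 4.2423$, which agrees with the reported t-statistic.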

### <u>2. **Code Structure**</u>

- _Modularity_: Prioritize modular over monolithic structure.
    - E.g., modular functions each perform a single task, making the code easier to read, test, and debug.
- *Avoiding Nesting*: De-nesting code supports modularity and generally improves readability.
    - Why? Nesting necessarily requires more working memory, as more information has to be held in mind before a chunk of operations comes to an end.
    - How?
        - *Extraction*: Pull out parts from the nested structure and make them into their own operations (e.g., two modular functions instead of one nesting the other).
        - *Inversion*: De-nest conditional statements or loops by handling exceptional cases *at the beginning* instead of at the end. This allows the main logic later to be less nested and more easily readable.
- A simple mental shorthand here is: Whenever *you yourself* feel like you have to hold too much in your mind when nesting operations *while working on the problem*, imagine how others will feel. That feeling in yourself is a valuable guide towards readability for others.
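
Inversion on its own can be sketched in a few lines. Both functions below are hypothetical illustrations (the names and the validation rules are assumptions, not project code); they behave identically, but the second handles the exceptional cases first so that the main logic is not nested.

```python
def compute_mean_nested(values):
    """Nested version: the main logic sits two levels deep."""
    if values is not None:
        if len(values) > 0:
            return sum(values) / len(values)
        else:
            raise ValueError("values is empty, cannot compute a mean.")
    else:
        raise ValueError("values is None, cannot compute a mean.")


def compute_mean_inverted(values):
    """Inverted version: exceptional cases come first, then the main logic."""
    if values is None:
        raise ValueError("values is None, cannot compute a mean.")
    if len(values) == 0:
        raise ValueError("values is empty, cannot compute a mean.")

    return sum(values) / len(values)


print(compute_mean_nested([1.0, 2.0, 3.0]))    # 2.0
print(compute_mean_inverted([1.0, 2.0, 3.0]))  # 2.0
```

In the inverted version, each check can be read and forgotten before moving on, which is exactly the working-memory relief described above.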

For example, look at this code. I tried to make this as horrible as possible (maybe a bit too much so haha), but it illustrates the point:

In [12]:
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(0.5, 1, 100)

def t_test(data1, data2):
    t_statistic = 0
    p_value = 0
    if len(data1) > 0:
        if len(data2) > 0:
            mean1 = 0
            mean2 = 0
            var1 = 0
            var2 = 0
            n1 = len(data1)
            n2 = len(data2)
            for i in range(n1):
                for j in range(n2):
                    if i == 0:
                        mean1 += data1[i] / n1
                    else:
                        if j == 0:
                            mean2 += data2[j] / n2
                        else:
                            var1 += (data1[i] - mean1) ** 2 / (n1 - 1)
                            var2 += (data2[j] - mean2) ** 2 / (n2 - 1)
                            if i == n1 - 1 and j == n2 - 1:
                                pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
                                if pooled_var != 0:
                                    t_statistic = (mean1 - mean2) / np.sqrt(pooled_var * (1/n1 + 1/n2))
                                    if t_statistic != 0:
                                        p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=n1 + n2 - 2))
                                        if p_value < 0.05:
                                            if t_statistic > 0:
                                                print("Data1 mean is significantly greater than Data2 mean.")
                                            elif t_statistic < 0:
                                                print("Data1 mean is significantly less than Data2 mean.")
                                            else:
                                                print("Data1 mean is equal to Data2 mean.")
                                        else:
                                            print("No significant difference between Data1 and Data2.")
                                    else:
                                        print("t-statistic is zero, cannot determine significance.")
                                else:
                                    print("Pooled variance is zero, cannot perform t-test.")
                            else:
                                continue
                        continue
                continue
        else:
            print("Data2 is empty, cannot perform t-test.")
    else:
        print("Data1 is empty, cannot perform t-test.")
    return t_statistic, p_value

t_test(data1, data2)

No significant difference between Data1 and Data2.


(0.131054905179291, 0.8958649361982147)

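**Alternatively**, we could run the intended test by _extracting_ the calculations and the interpretation into separate modular functions and _inverting_ the if-statements. Note that the nested version above does not even compute the means and variances correctly (for example, `mean1` only ever accumulates `data1[0]`), a bug that is easy to overlook precisely because of the nesting. The outputs of the two cells also differ because the data are regenerated without a fixed seed.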

In [18]:
data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(0.5, 1, 100)

def calculate_mean(data):
    return np.mean(data)

def calculate_variance(data):
    # Sample variance (ddof=1); np.var computes the mean internally
    return np.var(data, ddof=1)

def perform_t_test(data1, data2):
    if len(data1) == 0 or len(data2) == 0:
        raise ValueError("Both data1 and data2 must contain data.")

    mean1 = calculate_mean(data1)
    mean2 = calculate_mean(data2)

    var1 = calculate_variance(data1)
    var2 = calculate_variance(data2)

    n1, n2 = len(data1), len(data2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

    if pooled_var == 0:
        raise ValueError("Pooled variance is zero, cannot perform t-test.")

    t_statistic = (mean1 - mean2) / np.sqrt(pooled_var * (1/n1 + 1/n2))
    p_value = 2 * (1 - stats.t.cdf(abs(t_statistic), df=n1 + n2 - 2))

    return t_statistic, p_value

def interpret_results(t_statistic, p_value, alpha=0.05):
    if p_value < alpha:
        if t_statistic > 0:
            return "Data1 mean is significantly greater than Data2 mean."
        elif t_statistic < 0:
            return "Data1 mean is significantly less than Data2 mean."
        else:
            return "Data1 mean is equal to Data2 mean."
    return "No significant difference between Data1 and Data2."

try:
    t_stat, p_val = perform_t_test(data1, data2)
    result = interpret_results(t_stat, p_val)
    print(result)
    print((t_stat, p_val))
except ValueError as e:
    print(e)

Data1 mean is significantly less than Data2 mean.
(-2.4046746299230826, 0.017107691525127144)


Also remember that the architecture of Jupyter Notebook allows chunks of code to be executed out of order (similar to R Markdown, for those who know that better). This can lead to the trap of never actually running one's code from top to bottom. Thus, as a note on structure, try to _run the notebook end-to-end once in a while_.
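
One way to run a notebook from top to bottom in a fresh kernel is `jupyter nbconvert` with its `--execute` flag. The sketch below assumes that `jupyter` and `nbconvert` are installed and on the path; the helper function names and the file name `repro_pipeline.ipynb` are placeholders, not project conventions. The restart-and-run-all option in the Jupyter interface serves the same purpose interactively.

```python
import subprocess


def build_notebook_execution_command(notebook_path):
    """Build the command that executes every cell of a notebook in order."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",   # run all cells top to bottom in a fresh kernel
        "--inplace",   # write the executed result back to the same file
        notebook_path,
    ]


def execute_notebook_end_to_end(notebook_path):
    """Run the notebook; raises CalledProcessError if the command fails."""
    subprocess.run(build_notebook_execution_command(notebook_path), check=True)


if __name__ == "__main__":
    # Placeholder file name; only the command is printed here, nothing is executed.
    print(" ".join(build_notebook_execution_command("repro_pipeline.ipynb")))
```

Since a fresh kernel has no leftover variables, this catches cells that only work because of state created by out-of-order execution.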

### <u>3. **Commenting and Documentation**</u>

- In Jupyter Notebook, you have two levers for commenting and documenting the project: as comments within the code chunks or as markdown blocks between the code. We recommend dividing these two like so:
    - The *markdown sections* should include primarily *conceptual* explanations for what is being done in the code. They should serve as “word equations,” explaining the ideas behind what is then implemented in code form.
    - *Comments inside the code* itself should instead be *syntactical*, i.e., how *exactly* were the conceptual ideas of each section implemented as code? You should not feel the need to write novels in your comments. Instead, see if you can make your variable and function names, or your code structure in general, more readable.

Let's look at the example from the dependent sample t-test from before (the one we used to specify variable naming). 
A synergy of markdown and code comments might look like this:

_"Since lack of power analysis was a major criticism that was made against the study regarding open science practice and reproducibility, we decided to compute effect size and subsequently power alongside the conventional t-statistic and p-value. 
Additionally, as in the study, we used a two-sided test with an alpha threshold of 0.05 and did not correct for multiple testing. We are aware of this statistical inadequacy, but attempted to replicate the procedure from the study accurately in this regard. 
Power, on the other hand, only provides an additional statistical measure and does not directly influence any of the results, which is why we felt free to add it."_

In [81]:
def perform_dependent_sample_ttest(data_pre_intervention, data_post_intervention):

    t_statistic, p_value = stats.ttest_rel(data_pre_intervention, data_post_intervention)

    mean_difference = np.mean(data_pre_intervention - data_post_intervention) #Using the data from (source)
    standard_deviation_of_differences = np.std(data_pre_intervention - data_post_intervention, ddof=1)
    cohen_d = mean_difference / standard_deviation_of_differences

    effect_size = np.abs(cohen_d) #Effect size for power-analysis
    n = len(data_pre_intervention)
    power_analysis = TTestPower() #Power: its absence was a major criticism of the main study

    #Decision to use a two-sided test to better approximate approach in study
    #alpha of 0.05, not corrected for multiple testing, as in the study
    power = power_analysis.solve_power(effect_size=effect_size, nobs=n, alpha=0.05, alternative='two-sided')

    results = pd.DataFrame({
        'Metric': ['T-statistic', 'P-value', 'Effect Size (Cohen\'s d)', 'Statistical Power'],
        'Value': [t_statistic, p_value, cohen_d, power]
    })

    return results
    
def main():
    np.random.seed(42)
    data_pre_intervention = np.random.normal(100, 10, 30)
    data_post_intervention = data_pre_intervention + np.random.normal(-3, 5, 30)

    results = perform_dependent_sample_ttest(data_pre_intervention, data_post_intervention)

    print(results.to_string(index=False))

if __name__ == "__main__":
    main()

                 Metric    Value
            T-statistic 4.242251
                P-value 0.000206
Effect Size (Cohen's d) 0.774526
      Statistical Power 0.983744


### <u>4. **Error Handling, Logging, Print Statements**</u>

- If an _error_ is raised, where to make a fix should be instantly identifiable in the code.
    - Implement error handling by _anticipating possible failure points_. If you implement an idea in code intentionally, it should be intuitive to you where things may go wrong. Use exceptions judiciously and ensure they are informative.
- _Logging/print statements_ are particularly important in our use case since we are trying to build a long-form pipeline where frequent updates on the processing stage are useful. However, since we are mainly working with Jupyter Notebook, i.e., small, self-contained chunks, this should also not become too verbose. Logs or print statements should be _targeted and intentional_, providing the detail needed to diagnose an issue quickly or to figure out where one is in the process.
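
As a minimal sketch of an anticipated failure point with an informative exception: the function below is hypothetical (its name and its treatment of `None` as missing are assumptions for illustration). The one place where it can obviously fail is an empty input, so that case gets a message that says what went wrong and where to look.

```python
import math


def compute_missing_values_percentage(measurements):
    """Return the share of missing (None or NaN) values in percentage notation."""
    measurements = list(measurements)

    # Anticipated failure point: an empty input would cause a division by zero.
    if len(measurements) == 0:
        raise ValueError(
            "compute_missing_values_percentage received an empty input; "
            "check the upstream loading or filtering step."
        )

    missing_count = sum(
        1 for value in measurements if value is None or math.isnan(value)
    )
    return missing_count / len(measurements) * 100


print(compute_missing_values_percentage([1.0, float("nan"), 3.0, 4.0]))  # 25.0

try:
    compute_missing_values_percentage([])
except ValueError as error:
    print(f"ValueError: {error}")
```

Without the explicit check, the same input would surface as a bare `ZeroDivisionError`, which says nothing about which step produced the empty data.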

Again, let us look at two examples. The first one uses logging and print statements verbosely and unnecessarily. This just clutters the script without providing the user with much information for potential fixes or useful knowledge on the overall process. 
Note that each of these logs and print statements, if used with intent, is potentially meaningful. But just throwing them in arbitrarily should be avoided.

In [89]:
import logging
from scipy.stats import ttest_ind

# Setting up logging
logging.basicConfig(filename='app.log', level=logging.INFO)

def perform_t_test(data1, data2):
    try:
        if not isinstance(data1, list) or not isinstance(data2, list):
            logging.error("One of the inputs is not a list")
            print("Error: Inputs should be lists.")
            return
        
        if len(data1) == 0 or len(data2) == 0:
            logging.warning("One of the data lists is empty")
            print("Warning: One of the data lists is empty.")
            return
        
        print("Data received. Lengths:", len(data1), "and", len(data2))
        
        # Perform the t-test
        t_stat, p_value = ttest_ind(data1, data2)
        
        logging.info(f"t_stat: {t_stat}, p_value: {p_value}")
        
        if p_value is not None:
            print(f"p-value is not None: {p_value}")
        else:
            print("p-value is None. This should not happen.")

        print(f"t-statistic: {t_stat}")
        print(f"p-value: {p_value}")

        if p_value < 0.05:
            print("The result is statistically significant!")
        else:
            print("The result is not statistically significant.")
        
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")
        print(f"An error occurred: {e}")

data1 = [1.2, 2.3, 3.1, 4.4, 5.5]
data2 = [2.3, 3.4, 4.1, 5.2, 6.5]
perform_t_test(data1, data2)


Data received. Lengths: 5 and 5
p-value is not None: 0.36827250552188723
t-statistic: -0.9534625892455922
p-value: 0.36827250552188723
The result is not statistically significant.


In this example, however, we may be worried, for some reason, that our inputs are not lists, as the function requires, and we are only interested in the significance of the test:

In [93]:
# Setting up logging
logging.basicConfig(filename='app.log', level=logging.INFO)

def perform_t_test(data1, data2):
    try:
        if not isinstance(data1, list) or not isinstance(data2, list):
            logging.error("One of the inputs is not a list")
            return
        
        # Perform the t-test
        t_stat, p_value = ttest_ind(data1, data2)
        
        if p_value < 0.05:
            print("The result is statistically significant!")
        else:
            print("The result is not statistically significant.")
        
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

data1 = [1.2, 2.3, 3.1, 4.4, 5.5]
data2 = [2.3, 3.4, 4.1, 5.2, 6.5]
perform_t_test(data1, data2)


The result is not statistically significant.


This fourth point is perhaps the most controversial in our guidelines. Some people are very fond of extensive output to the user in their scripts. However, what we want to avoid with this is simply the generation of _too much unintentional code_. Please use logging or print statements whenever you feel it necessary. But most of us have probably been in the situation where we wished we had asked ourselves if a particular implementation was actually necessary _before_ writing it. 
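
For a longer pipeline, one targeted pattern is a single log line when a stage starts and one when it finishes, plus a traceback if it fails. The sketch below is one possible approach under stated assumptions, not a project requirement: `run_pipeline_stage`, the toy stage `remove_missing_values`, and the logger name `"repro"` are all hypothetical. Unlike the example above, it re-raises the exception after logging so that a failed stage cannot go unnoticed.

```python
import logging

logger = logging.getLogger("repro")


def run_pipeline_stage(stage_name, stage_function, data):
    """Run one pipeline stage and log one line at its start and one at its end."""
    logger.info("Stage '%s' started with %d records", stage_name, len(data))
    try:
        result = stage_function(data)
    except Exception:
        # logger.exception records the traceback, so the failing stage is identifiable.
        logger.exception("Stage '%s' failed", stage_name)
        raise
    logger.info("Stage '%s' finished with %d records", stage_name, len(result))
    return result


def remove_missing_values(records):
    """Toy stage: drop None entries."""
    return [record for record in records if record is not None]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
    cleaned_records = run_pipeline_stage(
        "remove_missing_values", remove_missing_values, [1, None, 3]
    )
    print(cleaned_records)  # [1, 3]
```

The record counts before and after each stage are often the quickest hint as to where data went missing, which is why they are the only detail logged here.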

## Summary and Conclusion
Let’s end with a brief breakdown of the concrete actionables:
  
### Open Science:
- Open Science starts with Open Ignorance: _Feel free to ask every question you may have!_
- Indicate where _you_ have contributed to the project to facilitate effective communication and credit where it is due.
- Freely share ideas and findings within and across teams.
- Transparently refer to external resources that you use, such as papers or AI.
- Report _everything_ you find and document your decisions for or against certain approaches comprehensibly.
- Try to use the digital ReprodICUbility infrastructure, which is organized to support open science by design.

### Coding Guidelines:
- Being able to quickly understand one another’s code is essential in this project: try to follow the _standards_ and aim for _readability_.
- Name variables and functions _explicitly_. Try not to use abbreviations and provide more detail rather than less (err on the side of “too obvious”).
- Prioritize modularity and _avoid nesting_. Implement extraction and inversion to separate the code into easily readable and debuggable chunks.
- Use the markdown sections to explain your approach _conceptually_ and comments in code blocks to localize the concepts in concrete _syntax_.
- Generally: try coding with _intent_. Code is, after all, a _tool_ — there to implement specific ideas. This applies, for example, to _error handling and logging/print statements_. Use them with purpose to bolster specific, potential weak points or highlight important steps in the process.
- Our project runs the risk of producing a lot of code very quickly. This is natural in such a large multi-team endeavour, but ideally we avoid losing the thread in tangled thickets of untargeted, untraceable code.

Interesting Resource:  
Jackson Z. Code Is for Humans: A Guide to Human-Centric Software Engineering. Independently published; 2024 Mar 19. ISBN-13: 9798861816489.
