# Project design in the digital age


## Before we start

**This chapter is an introduction to the topic of project design and connects most of the course content. We'll therefore concentrate on the larger picture and only provide short introductions to most topics, while linking to the in-depth sections. Following the steps in this chapter allows you to set up a professional research project that is up to modern standards.**


It's a good idea to skim the lesson first, noting the structure and the separate steps of project design, before diving in.



## Introduction


While the scientific method has remained largely the same over the years, the tools we employ have changed drastically in recent times. It is of course still possible to maintain a single Word document with all your ideas, notes, citations, code snippets and statistics, or to store your data and writing in whatever way feels comfortable to you. However, a number of new tools and standards will make your life much easier and allow you to be a more organized, more efficient scientist.

These new tools can be implemented in every phase of a scientific process, but are especially valuable in the initial stages of a project: Project Design.

--------------------------------

## Goals


The specific goals of this first session are to

* gain a general understanding of project design
* get familiar with the process of setting up a project
* provide a checklist to follow
* understand why project design is an essential step

## Roadmap

Each of the following topics has a separate notebook:

- **Goals**
- Introduction to project design/setup
- General considerations
- Project design checklist
    - 1. Set up a local folder structure (BIDS)
    - 2. Create an online folder structure (GitHub repo)
    - 3. Set up a project presence (OSF)
    - 4. Create a reference library (Zotero)
    - 5. Set up a programming environment
    - 6. Data management plan
    - 7. Pre-registration
- Additional materials


## General considerations: How to get started

### Find a research question

At the start of each project there should be an initial question. Such questions may arise from observations of our surroundings or, e.g., by expanding on already existing literature. The first step of every project should therefore involve turning our observations/thoughts into a well-defined research question. Start off by [scouring the literature](lit_review) to find out more about the topic.

**Next, `define your research question` based on:**

- what (topic, data modality)?
- how (method)?
- why (motivation)?
    
--------------------------------------------
    
### Types of research questions

It's further necessary to evaluate whether your question relates to something quantifiable or whether you are more interested in `discovering` or `exploring` the `quality` of a certain phenomenon.

A `quantitative research question` can be answered using numerical data and statistical analysis. Such questions generally appear in 3 different forms:

- `descriptive`: describe a phenomenon or group of people in detail (e.g. "What is the prevalence of depression in hamsters?")

- `comparative`: compare two or more groups or phenomena (e.g. "What are the differences in academic performance between students from academic and blue collar families?")

- `relationship`: how do changes in one variable affect changes in another (e.g. "What is the relationship between social media use and the prevalence of eating disorders in teenagers?")

    
    
A `qualitative research question` focuses more on exploring and understanding e.g. social phenomena, experiences, and perspectives in depth.

- They may be `open-ended`, allowing the researcher to explore the phenomenon in depth. For example, "What factors contribute to the success of mentoring programs in the workplace?"

- They may also `seek to explore experiences or perspectives`, e.g. "What is the experience of living with chronic illness?", or `seek to explore processes`, such as how something happens or how people make meaning of their experiences, e.g. "How do parents navigate the challenges of raising a child with autism?" or "What are the processes involved in the formation of a group identity among adolescents?"

- They may further be `concerned with a description/exploration of the nature of a specific context`, such as a culture, community, organization, or group of individuals. For example, "What are the attitudes of young adults towards marriage in a conservative society?" or "How do employees experience the culture of a large multinational corporation?"


--------------------------------------------
### Find out what you need to answer your research question    

Next we'll have to find out how to work out a compelling, convincing answer to this research question.

`Therefore we have to define the methods we want/need to employ`. Ask yourself the following questions:

- Do you need to `collect new data or can you work with preexisting data`?

- Which `data modality` is necessary to answer your question, e.g. do you need survey data, reaction times or neuroimaging?

- `What needs to be done with the data before you can analyse it?` (e.g. quality control, data preprocessing (cleaning, descriptive statistics etc.))

- What `statistical analysis` is necessary to answer your question (e.g. t-test vs. regression models vs. machine learning etc.)?

- `Which tools` are you going to need (software or hardware, e.g. for data collection)?

- Is this something you can manage `solo, in cooperation with your local group/researchers, or does the project involve international collaboration`?

- What about `data protection, licensing and costs`?

--------------------------

### How do I find the information informing research questions and methods?

Both finding a solid research question and finding the necessary information on how to answer it involve a significant amount of research. For guidance on this, check out our lesson on [finding and evaluating data sources, published research & Literature review strategies]().


## Project design

Project design is mostly concerned with the initial planning of a project and with organizing the framework in which the research is conducted. This might seem trivial, but it is essential. It not only allows us to engage more deeply with our research question, but is also relevant for the reproducibility of a project.

**An individual project will survive longer than the research that is done on it and will (potentially) influence the scientific consensus on a given topic down the line by providing data, insights or new methods. Therefore managing its presence and meticulously documenting what was done, what decisions were made and where the gathered information and data are stored is essential to the long-term impact of your work.**

For this purpose the following chapters will explore how to set up a project in the "digital age", what tools to use, what to look out for and where you may find more information. Don't be discouraged if most of the tools used are unfamiliar to you, as we will do our best to provide a short introduction and guide on their usage in the linked chapters.


-----------------


The following checklist can be used as a guide on how to organize, store and connect all information and data relevant to your project. We'll elaborate on each point in the lesson below.


### Project design checklist
1. Set up a local folder structure (BIDS)
2. Create an online folder structure (GitHub repo)
3. Set up a project presence (OSF)
4. Create a reference library (Zotero)
5. Set up a programming environment
6. Data management plan
7. Pre-registration



---------------------------------------



We're starting out local, by simply setting up our folder system to have an organized, dedicated space to store all data relevant for our project.


### 1. Set up local folder structure (BIDS)


It is recommended to adopt a standardized approach to structuring your data, as this not only helps you stay consistent, but also allows you and possible collaborators to easily identify where specific data is located.


Your folder structure depends on your project's specific needs (e.g. folders for data, documents, images etc.) and should be as clear and consistent as possible. The easiest way to achieve this is to copy and adapt an already existing folder hierarchy template for research projects.


One example (including a template) is the [Transparent project management template for the OSF platform](https://osf.io/4sdn3/) by [C.H.J. Hartgerink](https://osf.io/5fukm/).

   
The contained folder structure would then look like this:

```
project_name/
    ├── archive/
    ├── analyses/
    ├── bibliography/
    ├── data/
    ├── figure/
    ├── functions/
    ├── materials/
    ├── preregister/
    ├── submission/
    └── supplement/
```   


Here, `project_name` is of course the name of the folder containing your project information/data. One level lower, we then have dedicated folders to store e.g. your paperwork, data or figures.
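
If you prefer not to create these folders by hand, the structure above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `create_project_skeleton` is made up for illustration, and the folder list simply repeats the template above, so adapt it to your own needs.

```python
import tempfile
from pathlib import Path

# Folder names taken from the template above; adapt them to your project.
PROJECT_FOLDERS = [
    "archive", "analyses", "bibliography", "data", "figure",
    "functions", "materials", "preregister", "submission", "supplement",
]


def create_project_skeleton(root, folders=PROJECT_FOLDERS):
    """Create `root` with one sub-folder per entry in `folders`.

    Folders that already exist are left untouched, so re-running is safe.
    Returns the names of all sub-folders found in `root`, sorted alphabetically.
    """
    root = Path(root)
    for name in folders:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(path.name for path in root.iterdir() if path.is_dir())


# Demonstration in a temporary directory, so nothing is left on your disk.
# "project_name" is a placeholder for your own project folder.
with tempfile.TemporaryDirectory() as tmp:
    print(create_project_skeleton(Path(tmp) / "project_name"))
```

To create the folders for real, call the function with the path where your project should live instead of the temporary directory.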


----------------------------------------------

#### Adapting to specific datatypes: Neuroimaging

Working with neuroimaging data makes the setup of your system a little more complicated.  
The most promising/popular approach to structuring your data is the [BIDS](https://bids.neuroimaging.io/) (Brain Imaging Data Structure) standard. 

The BIDS standard is a community-driven specification that aims to facilitate the organization and sharing of neuroimaging data. It specifies a common format for storing and organizing neuroimaging data, including MRI, EEG, MEG, and iEEG. The standard can of course additionally be used to store behavioral data.

The BIDS standard defines a specific folder hierarchy for organizing neuroimaging data. This hierarchy is organized into several separate folders, each with a specific purpose. As BIDS is mostly concerned with our data, it provides a standardized way to organize the `data` folder in the diagram above. The `data` folder would then be structured in the following way:



```
data/
    ├── derivatives/
    └── subject/
        └── session/
            └── datatype/
```     

   `/derivatives`: contains processed data, such as the results of statistical analyses
    
   `/sub-<label>`: contains the data from one subject. Each subject is identified by a unique code that starts with "sub-". This folder contains one subfolder for each imaging session, which in turn contains a separate folder for each type of data (`datatype` in the diagram above, e.g. `anat` or `func`) recorded for this specific subject.
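
The hierarchy described above can be sketched as follows. This is a rough illustration, not an official BIDS tool: the function name `make_subject_folders`, the number of subjects and the default session and datatype labels are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path


def make_subject_folders(data_root, n_subjects, sessions=("01",), datatypes=("anat", "func")):
    """Create data_root/sub-XX/ses-XX/<datatype> folders and a derivatives folder.

    Returns the created datatype folders as paths relative to `data_root`.
    """
    data_root = Path(data_root)
    (data_root / "derivatives").mkdir(parents=True, exist_ok=True)
    created = []
    for subject in range(1, n_subjects + 1):
        for session in sessions:
            for datatype in datatypes:
                folder = data_root / f"sub-{subject:02d}" / f"ses-{session}" / datatype
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder.relative_to(data_root))
    return created


# Demonstration in a temporary directory with two hypothetical subjects.
with tempfile.TemporaryDirectory() as tmp:
    for folder in make_subject_folders(Path(tmp) / "data", n_subjects=2):
        print(folder.as_posix())
```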
   
   
Neuroimaging datasets mostly contain data from more than one subject; the data folder will therefore necessarily contain multiple subject folders, named `sub-01, sub-02 ... sub-0n`. This could look something like this:


    project_data
        ├── dataset_description.json
        ├── participants.tsv
        ├── derivatives
        ├── sub-01
        │   ├── anat
        │   │   ├── sub-01_inplaneT2.nii.gz
        │   │   └── sub-01_T1w.nii.gz
        │   └── func
        │       ├── sub-01_task-X_run-01_bold.nii.gz
        │       ├── sub-01_task-X_run-01_events.tsv
        │       ├── sub-01_task-X_run-02_bold.nii.gz
        │       ├── sub-01_task-X_run-02_events.tsv
        │       ├── sub-01_task-X_run-03_bold.nii.gz
        │       └── sub-01_task-X_run-03_events.tsv
        ├── sub-02
        │   ├── anat
        │   │   ├── sub-02_inplaneT2.nii.gz
        │   │   └── sub-02_T1w.nii.gz
        │   └── func
        │       ├── sub-02_task-X_run-01_bold.nii.gz
        │       ├── sub-02_task-X_run-01_events.tsv
        │       ├── sub-02_task-X_run-02_bold.nii.gz
        │       └── sub-02_task-X_run-02_events.tsv

        ...
        ...

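We'll not go into detail about the different neuroimaging files here (the `.nii.gz` files), but there is another thing we can learn from this standard: the inclusion of `metadata`.

In this case the metadata are stored in two files at the top level of the dataset: `dataset_description.json` describes the dataset as a whole (e.g. its name and the BIDS version it follows), while `participants.tsv` is a tab-separated table with one row per participant (e.g. participant ID, age and sex).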
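
To make the two metadata files from the example tree more concrete, here is a minimal sketch that writes both of them with the Python standard library. The function name `write_minimal_metadata`, the dataset name, the BIDS version string and all participant values are hypothetical and only serve as an illustration; `age` and `sex` are example columns, not a complete list of what you may want to record.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_minimal_metadata(data_root, dataset_name, participants, bids_version="1.8.0"):
    """Write dataset_description.json and participants.tsv into `data_root`.

    `participants` is a list of dicts with the keys participant_id, age and sex.
    """
    data_root = Path(data_root)
    data_root.mkdir(parents=True, exist_ok=True)

    # "Name" and "BIDSVersion" are the two required fields of this file.
    description = {"Name": dataset_name, "BIDSVersion": bids_version}
    description_file = data_root / "dataset_description.json"
    description_file.write_text(json.dumps(description, indent=4), encoding="utf-8")

    # One row per participant, columns separated by tabs.
    participants_file = data_root / "participants.tsv"
    with open(participants_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["participant_id", "age", "sex"],
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(participants)
    return description_file, participants_file


# Hypothetical example values, for illustration only.
example_participants = [
    {"participant_id": "sub-01", "age": 25, "sex": "F"},
    {"participant_id": "sub-02", "age": 31, "sex": "M"},
]

with tempfile.TemporaryDirectory() as tmp:
    _, participants_file = write_minimal_metadata(tmp, "My example dataset", example_participants)
    print(participants_file.read_text(encoding="utf-8"))
```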


The BIDS standard further prescribes a very specific file naming system:

     key1-value1_key2-value2_suffix.extension
 
Here, `key-value` pairs are separated by underscores (e.g. `sub-01_task-01`), followed by an underscore and a suffix describing the datatype (e.g. `_events`), which is followed by the file extension (e.g. `.tsv`), resulting in:

    sub-01_task-01_events.tsv
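
The naming pattern can also be written as a small helper function. This is a sketch under the assumption that keys and values only contain letters and digits; the function name `bids_style_filename` is made up for this example, and the helper does not check whether a key is actually defined by the BIDS standard.

```python
def bids_style_filename(entities, suffix, extension):
    """Join key-value pairs, a suffix and an extension into one file name.

    `entities` maps keys to values, e.g. {"sub": "01", "task": "01"}; the
    pairs are written in the order in which they appear in the dictionary.
    """
    for key, value in entities.items():
        # Hyphens and underscores act as separators, so they must not
        # appear inside a key or a value.
        if not (str(key).isalnum() and str(value).isalnum()):
            raise ValueError(f"key and value must be alphanumeric: {key!r}, {value!r}")
    pairs = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join(pairs + [suffix]) + extension


print(bids_style_filename({"sub": "01", "task": "01"}, "events", ".tsv"))
# prints: sub-01_task-01_events.tsv
```

Generating names through one function like this helps to keep them consistent across a whole project.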
    
It's recommended that you adopt this file naming system and apply it to all of your files, e.g. your project report could be called:
    
    firstname-lastname_project-report.txt

You may also want to add a date to non-data files, ideally in the year-month-day format (YYYYMMDD), e.g. 

    YYYYMMDD_firstname-lastname_project-report.txt
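
One practical benefit of the YYYYMMDD format is that sorting file names alphabetically also sorts them chronologically. The following sketch illustrates this; the function name `dated_filename` and the two dates are hypothetical.

```python
from datetime import date


def dated_filename(name, extension, day=None):
    """Prefix `name` with a date in YYYYMMDD format (today if `day` is None)."""
    day = day if day is not None else date.today()
    return f"{day.strftime('%Y%m%d')}_{name}{extension}"


# Hypothetical dates, for illustration only.
reports = [
    dated_filename("firstname-lastname_project-report", ".txt", date(2024, 11, 2)),
    dated_filename("firstname-lastname_project-report", ".txt", date(2024, 3, 5)),
]

# Because the year comes first, alphabetical order equals chronological order.
print(sorted(reports))
```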
    
    
Avoid adding descriptions such as `version_01` or `final_version` etc. Instead, rely on digital tools with version history functionality such as `Google Docs`. In the next section we'll further introduce the concept of a `version control system` to avoid this issue altogether. 



### To learn more about how to set up your data system:

[Chapter: Data-management](#datamanagement)

[BIDS starter-kit](https://bids-standard.github.io/bids-starter-kit/index.html)

### 2. GitHub repo

Next we'll set up a `"GitHub repository"`. This is basically an online folder system mirroring your local file system, where you can store all your code, figures etc.

GitHub is built around the `version control system "Git"`, which allows you to keep track of every change in your files/file system and to revert changes when necessary. You can also use `Git` to keep track of local files.

You'll find out how to do this in detail in the lesson: [Version: GitHub](https://m-earnest.github.io/diler_dgitial_literacy_course/introduction/github.html)

To learn more about the accompanying `version control system Git`, check out the following lesson: [Intro to GitHub](https://m-earnest.github.io/diler_dgitial_literacy_course/content/excercise_intro_git.html)

### 3. OSF presence




Next up we create an Open Science Framework (OSF) repository.

The OSF platform is designed to facilitate the collaborative and transparent sharing of research materials, data, and workflows. It boosts your work's visibility and findability and makes your code and data accessible to others, therefore making your work more reproducible. An OSF repository can be cited using OSF's automatic citation generator, which makes sure that your projects (even when no publication is in sight or sought) can reliably be shared, e.g. on your CV.


https://d33v4339jhl8k0.cloudfront.net/docs/assets/6197cc3a0042a2708a127718/images/61d85ed4a0e8d7327cfa9d9c/89aac82c-897e-44a9-a6cf-6a8b56a48bf9.png

OSF further offers storage on servers in e.g. Germany, which makes it much easier to adhere to data protection regulations when storing data on OSF.

The chapter [OSF - Manage your Research]() illustrates how to set up an OSF repository and how to link this repository to the GitHub repository we've created in the previous step.



### 4. Zotero Library


[Zotero](https://www.zotero.org/) is a free and open-source reference management software that helps you organize, store, and share your research materials. It further allows you to generate reference lists on the fly, has integrations for most word processing software, such as Google Docs, and interacts seamlessly with literature search tools such as [ResearchRabbit](https://www.researchrabbit.ai/).

Zotero allows you to:

- organize literature based on collections, libraries, groups, etc.
- add notes, tags, annotations, etc. to your literature
- collect references from various sources: articles, books, websites, etc.
- use browser & text editor plugins
- manage citations
- share group/organization-level libraries online
- look up information for every paper in your collection regarding DOIs, PDF versions online, etc.


-----------


### Set up a [Zotero](https://www.zotero.org/) library

- Follow the official [Zotero installation guide](https://www.zotero.org/support/installation) to get a local version for your system
- Open it, click on the `yellow folder symbol` in the top left corner


zotero_start_collection.png


- give your collection a meaningful name

zotero_new_collection.png

- now add articles, books etc. to your collection via the `Add Item(s) by Identifier` button, using DOIs, ISBNs, PMIDs (PubMed identifiers) and so on

zotero_import.png


Simple enough, right? 

- Now, to export your collection e.g. as a BibTeX file (for use with e.g. Overleaf, or to share with others), simply right-click on your `collection folder` and select `export collection`

zotero_export_bibtex.png


- to create an APA formatted reference list, right-click on your `collection folder` and select `Create Bibliography from collection`, select the APA format, a language and an output mode, e.g. `copy it to your clipboard`, and simply paste it into a document of your choice


zotero_create_references.png



**Next up we want to sync our local collection with an online presence to easily access our literature from everywhere in the world or to share it with collaborators.**

To do this:

- create a [Zotero account](https://www.zotero.org/user/register/)

- next, in your local Zotero installation, open the preferences by clicking `“Edit → Preferences”` for Windows/Linux or `“Zotero → Preferences”` for Mac
- select the `Sync` panel, enter the account details you've used to create your online account and click `synch account`


zotero_sync.png


- Now simply select which libraries, files etc. to sync


zotero_synch_library.png


- if you now log into your Zotero online account and click on `web library`, your synced libraries will appear in a window with the same capabilities as your local installation

zotero_web_library.png

-----------

### Zotero Connector

If you want to make your life even easier, you can further [install the Zotero Connector](https://www.zotero.org/download/connectors) for your respective browser.

Click on the Zotero symbol in your browser (usually on the top right) and a menu will pop up, allowing you to save either a citable snapshot of a website or, when you're viewing a journal article online, the respective reference (including a PDF download when possible) into one of your collections.

https://pbs.twimg.com/media/DcDlMsgW0AEdWFA?format=jpg&name=900x900




## 5. Setting up a programming environment (i.e. "virtualization")

If you're doing research you will most likely need to learn how to program, e.g. to organize and clean your data or to do statistical analyses. 

To make sure that you've got all the prerequisites installed for your respective needs, it's customary to use a [`package and environment management system`](https://towardsdatascience.com/environment-package-management-55168c56b77). This is commonly done using [conda](https://docs.conda.io/en/latest/miniconda.html).


### Conda 

Conda is an open-source package and environment management system that simplifies the process of installing, managing, and organizing packages and dependencies. Conda supports a multitude of programming languages such as Python and R.

One of the most important functions of Conda is creating and managing isolated environments. An environment in this case specifies the packages and their versions installed for a specific project. Programming packages are comparable to other software such as `Word` or `Excel`. Just as every new `Excel` version adds new functionalities or slight changes to how the program works, there are changes in the packages of programming languages like `Python`. `Python` uses packages such as `Numpy` for mathematical operations, but with new `Numpy` versions the way certain calculations are done may change, e.g. to make the program more efficient or stable. 

Should changes like this occur, it becomes hard to reproduce the results of earlier research. Therefore `conda` allows us to specify which `packages` and what `version` of these packages were used for a certain project. This information is stored in an `environment file`. Other researchers can then use this `environment file` to reproduce the exact same `analyses`, thus making the original `work` reproducible.

It's recommended that one creates a separate environment for each project or application, with its own set of dependencies and packages, so as not to interfere with other environments or the system itself.
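
To see why versions matter in practice, it can help to check which versions are actually installed in the environment you are currently working in. The following is a minimal sketch using the Python standard library (Python 3.8 or newer); the function name `installed_versions` and the package names are examples only, and a package that is not installed is simply reported as `None`.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example package names; the output depends on your own environment.
print(installed_versions(["numpy", "pandas"]))
```

This is not a replacement for an environment file, but a quick way to document the versions used for a specific analysis, e.g. at the top of a notebook.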


### Working with Conda

There are multiple ways to install a working conda version.

### `Anaconda`:

The full `Anaconda` installation comes with a web-like `Graphical User Interface`. This unfortunately means that the installation is fairly large (larger than 2 GB).

https://docs.anaconda.com/_images/nav-application-dropdown.png


Anaconda can be installed following the [official installation guide](https://docs.anaconda.com/anaconda/install/)


To create an environment, open the `Anaconda` application and click on the `environments` view on the left side, then click the `create` button in the bottom left, give your environment a descriptive name matching your project and select a `Python` or `R` version.


https://docs.anaconda.com/_images/nav-env-create.png



To use an environment, click the `arrow button` next to the `environment name` to open the activation options dropdown and select `Terminal`, `Python interpreter`, `IPython Console`, or `Jupyter Notebook`.

https://docs.anaconda.com/_images/nav-env-use.png


This will open the `interpreter` of your choice, which will allow you to start writing code.

To find out more about `interpreters` check out the [Learn to code - the basics chapter](https://m-earnest.github.io/diler_dgitial_literacy_course/content/learn_to_code.html). To learn how to start programming in Python visit the [Intro to Python chapter](https://m-earnest.github.io/diler_dgitial_literacy_course/content/intro_python_I.html).


To clone or back up and share an environment, use the respective buttons at the bottom of the environments list.


https://docs.anaconda.com/_images/nav-env-backup.png


Find out more using the following resources:

[Official Anaconda User Guide](https://docs.anaconda.com/anaconda/user-guide/)

[(Youtube) Python Simplified - Anaconda Beginners Guide for Linux and Windows - Python Working Environments Tutorial](https://www.youtube.com/watch?v=MUZtVEDKXsk)


---------------

### `Miniconda`

`Miniconda` is a much more lightweight `conda` installation, which requires you to use the `Command-Line-Interface` or `Shell` of your system. Unix-like systems such as Linux and Mac already come with a `Shell` preinstalled. 

To learn more about `Command-Line-Interfaces` check out the [Learn to code - the basics chapter](https://m-earnest.github.io/diler_dgitial_literacy_course/content/learn_to_code.html). To learn how to work with the `Shell` you can also follow the ["Intro - The (unix) command line: BASH" chapter](https://m-earnest.github.io/diler_dgitial_literacy_course/content/intro_to_shell.html).


You can install `conda` by following the [Conda installation instruction](https://conda.io/projects/conda/en/stable/user-guide/install/index.html) for your System.

**Next open a shell:**

`Ubuntu`:
To open a Shell in `Ubuntu`, click on Activities in the top left of your screen, then type the first few letters of “terminal”, “command”, “prompt” or “shell” and select the `terminal` application.

`Mac`:
On Mac click the Launchpad icon in the Dock, type Terminal in the search field, then click Terminal.

`Windows`:
Windows users will have to install the `Windows Subsystem for Linux (WSL)`, which provides a Linux environment that allows you to work with the bash shell. Simply follow e.g. the [Foss Guide: Install bash on windows](https://itsfoss.com/install-bash-on-windows/). Afterwards search for and open the newly installed `Ubuntu` application.



### Generate an environment file using conda

To create an environment and install packages in it, copy the following into your `Shell` and hit `Enter`:

`conda create --name my_project_environment python`


Here `conda create` tells your `shell` to create a new environment, the `--name` specifier allows us to name our environment, and the trailing `python` asks conda to install Python into it. In this case an environment using the most recent available `Python` version with the name `my_project_environment` would be created.

Conda will then prompt you to confirm the creation of the environment: `Proceed ([y]/n)?` 

Type "y" and press enter to proceed.


-----------------


To use the new environment, copy the following into your shell and hit enter:

`conda activate my_project_environment` 



----------------------



To install a new package use the `conda install` command, e.g. to install the `jupyter package`:

`conda install -c anaconda jupyter`


--------------------



To install multiple packages, just add them one after another to your command, e.g.:

`conda install -c conda-forge -y numpy pandas matplotlib scipy scikit-learn jupyter jupyterlab`


---------------------

Now you can export your environment as an `environment file` by using the following command in your shell:

`conda env export > environment.yml`


This will create a `YAML` file, a text document that contains data formatted using [YAML (YAML Ain't Markup Language)](https://yaml.org/), in the location that your terminal is opened in (e.g. ~/Desktop). If you open it with a text editor of your choice it will look something like this:

conda_yml.png


In this case `mne` is the environment name, `conda-forge` under `channels:` indicates that the environment was created by downloading packages from the `conda-forge` server (such servers are called [channels](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html)), and under `dependencies:` you'll find the list of packages and, for some, the specific package version (e.g. python>=3.8).


-------------------

This file can be shared with others or used to recreate the same environment on another machine. So if someone wanted to reproduce this environment, they would simply have to copy this environment file and use the following command in a terminal:


`conda env create -f environment.yml`
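
To make the layout of such a file more tangible, the sketch below builds the text of a small, hand-written environment file in Python. The function name `environment_file_text`, the environment name, the channel and the listed packages (including the pinned Python version) are all placeholders chosen for illustration. An exported file will usually be much longer, as it lists every installed package.

```python
def environment_file_text(name, channels, dependencies):
    """Return the text of a minimal conda environment file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with the needs of your own project.
text = environment_file_text(
    name="my_project_environment",
    channels=["conda-forge"],
    dependencies=["python=3.11", "numpy", "pandas"],
)
print(text)
```

Saving this text under a name ending in `.yml` gives you a file with the same three sections (`name`, `channels`, `dependencies`) described above.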



----------------

To learn more about conda and how to work with environments check out the following resources:


[Conda: User Guide](https://conda.io/projects/conda/en/latest/user-guide/index.html)

[(Youtube) Coding Professor - The only CONDA tutorial you'll need to watch to get started](https://www.youtube.com/watch?v=sDCtY9Z1bqE)


--------------------


## 6. Data management plan

- consider how the data will be used, archived and shared
- documentation


You’ll be asked to start drafting a data management plan that you can later add to the online presence of your project. This is again a reminder for yourself of what exactly you're planning to do and how to go about it, and it will be the main piece of your methods section should you aim for publication. It also allows readers/collaborators to gain a quick understanding of what data you'll be collecting, where this data can be found and what exactly you're planning on doing with it.


 
Following [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/rdm.html#rr-rdm), a data management plan should cover, among other things:

1. Roles and responsibilities
2. Type and size of data collected and documentation/metadata generated
3. Type of data storage used and backup procedures that are in place



## 7. Pre-registration

Pre-registration is defined as:

"The specification of a research design, hypotheses, and analysis plan prior to observing the outcomes of a study" (Nosek & Lindsay, 2018)

In practice, this is a time-stamped record of your experiment design, methods and planned analysis that is publicly available. It can be created either before you collect data or before you begin your analysis.

Platforms and templates for this include:

- AsPredicted.org
- OSF Preregistration Template

**Advantages**

- separation of confirmatory & exploratory research (you can do both, just be honest & state everything)
- you cannot fool yourself due to decreased flexibility in data management & analyses
- closer to the scientific approach: make predictions about the world and investigate if they hold up


The important distinction(s) between confirmatory & exploratory research:

**confirmatory research**

- analyze hypotheses
- number of hypothesis tests known
- error rate control possible


**exploratory research**

- browse through data
- number of hypothesis tests unknown
- error rate control impossible
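
Why does a known number of hypothesis tests make error rate control possible? The sketch below illustrates one common way to think about it: with more tests, the chance of at least one false positive grows, and knowing the number of tests allows you to correct for it. It assumes independent tests and uses the Bonferroni correction as one example of such a correction; neither the significance level of 0.05 nor the choice of correction comes from this lesson.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across `n_tests`
    independent tests, each run at significance level `alpha`."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate
    at or below `alpha` (Bonferroni correction)."""
    return alpha / n_tests


for n_tests in (1, 5, 20):
    uncorrected = familywise_error_rate(0.05, n_tests)
    corrected = familywise_error_rate(bonferroni_alpha(0.05, n_tests), n_tests)
    print(f"{n_tests:>2} tests: uncorrected {uncorrected:.3f}, corrected {corrected:.3f}")
```

In exploratory research the number of tests is not fixed in advance, so there is no `n_tests` to plug into a correction like this.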


#### Exercise
Here, we’ll use the AsPredicted platform to get some experience.

https://aspredicted.org/
https://www.cos.io/initiatives/registered-reports

#### Further reading

https://www.youtube.com/watch?v=8QK2-udwoK8

### Where are we now?


Please take a moment to reflect on if and how your understanding of the concept of digital literacy has possibly changed following this lecture.

**2.1) How well informed were you with regard to the current topic? (i.e.,
did you know what to expect?)**

**2.2) What was the most surprising part of the training/lecture for you? What information in particular stood out to you?**

**2.3) What was one of the mentioned reasons why you should care about digital literacy that seemed especially important to you?**

**2.4) Could you think of other impacts that a better understanding of digital literacy may have on your life?**



### Homework, exercises and resources

### Additional materials

#### Further reading

The go-to resource for creating and maintaining scientific projects was created by the [Turing Way project](https://the-turing-way.netlify.app/welcome.html).

We'll break down the resource here very quickly and provide a handy checklist, but once you get started with a project or if you're interested in a deep-dive on project design, check out the following:

https://the-turing-way.netlify.app/project-design/project-design.html


## References



## TLDR

We will necessarily be increasingly relying on computational support to deal with information, data and analysis as a direct result of the increase in complexity, volume and speed in the creation of scientific works. Therefore it is crucial to be equipped with the understanding and skill to know how and when to rely on digital assets. If you wanna hang with the best, get with the times!

We've further learned about

- digital literacy, its definition and its components
- the role of digital literacy in the research workflow
- ethical implications and why we should care