# Approach to a Use Case

## A. What Do We Want?
The first step in the design of a study is the explicit clarification of the goal of the study.

### 1. Hypothesis Test
Compare two or more groups, or one group to a fixed value?
       
### 2. Screening Investigation
Screen the observed responses to identify the factors/effects that are important? We must make sure that the factors in the model are varied independently of one another, so that their effects are not confounded.
       
### 3. Optimization Problem
Maximize or minimize a response (variability, distance to target, robustness)?

### 4. Statistical Modeling
Develop a regression model to quantify the dependence of a response variable on the process inputs?




## B. How Do We Do It? (Obtaining Data)

### 1. Observational or Experimental
 
- In **observational studies** we only collect the sample data but do not interact with the study population.
- In a **controlled experiment** the researcher deliberately influences events (e.g., treats the patient with a new type of medication) and investigates the effects of these interventions.

### 2. Prospective or Retrospective
- In a **prospective study**, the data are collected from the beginning of the study.
- In contrast, a **retrospective study** takes data acquired from previous events, e.g., routine tests done at a hospital.

### 3. Longitudinal or Cross-Sectional
- In **longitudinal investigations**, the researcher collects information over a period of time, maybe multiple times from each patient. 
- In contrast, in **cross-sectional** studies individuals are observed only once (like _feedback forms_). For example, most surveys are cross-sectional, but experiments are usually longitudinal.

### 4. Case–Control and Cohort studies
- In a case–control study, the patients are first treated, and then they are selected for inclusion in the study based on certain criteria (e.g., whether they responded to a certain medication).
- In contrast, in a cohort study, the subjects of interest are selected first, and then these subjects are studied over time, e.g., for their response to a treatment.

### 5. Randomized Controlled Trial

The gold standard for experimental scientific clinical trials, and the basis for the approval of new medications, is the randomized controlled trial. Here bias is avoided by splitting the subjects to be tested into an intervention group and a control group, with the group allocation made at random.

In a designed experiment, there may be several conditions, called factors, that are controlled by the experimenter. By having the groups differ in only one aspect, the factor treatment, one should be able to detect the effect of the treatment on the patients. Through randomization, confounding factors should be balanced across the groups.
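
A minimal sketch of random group allocation for such a trial, assuming the subjects are identified by (hypothetical) IDs and that equally sized groups are wanted: the list of subjects is shuffled and cut in half. `allocate_rct` is an illustrative name, not a standard library function.

```python
import random

def allocate_rct(subject_ids, seed=None):
    """Randomly split subjects into an intervention group and a control group.

    The subject list is shuffled and cut in half, so the allocation is random
    while the two group sizes differ by at most one.
    """
    rng = random.Random(seed)  # seeded generator -> reproducible allocation list
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"intervention": ids[:half], "control": ids[half:]}

groups = allocate_rct(range(1, 11), seed=42)
print(groups)
```

Storing the seed makes it possible to document and reproduce the allocation list later.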
    
### 6. Crossover Studies
An alternative to randomly allocating subjects to separate groups is the crossover design. A crossover study is a longitudinal study in which subjects receive a sequence of different treatments. Every subject receives every treatment. (The subject “crosses over” from one treatment to the next.) To prevent the order of the treatments from biasing the results, the sequence of the treatment allocation should be randomized.

For example, in an investigation that tests the effect of standing and sitting on the concentration of subjects, each subject performs tasks both while standing and while sitting. The sequence of standing/sitting is randomized to cancel out any sequence effects.
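
The standing/sitting example can be sketched as follows: every subject gets every condition, and the order is drawn independently for each subject. The subject labels and the function name `crossover_sequences` are hypothetical.

```python
import random

CONDITIONS = ("standing", "sitting")

def crossover_sequences(subject_ids, conditions=CONDITIONS, seed=None):
    """Give every subject every condition, in an independently randomized order."""
    rng = random.Random(seed)
    # rng.sample with k=len(conditions) returns a random permutation of the conditions
    return {sid: rng.sample(conditions, k=len(conditions)) for sid in subject_ids}

for sid, order in crossover_sequences(["S1", "S2", "S3", "S4"], seed=7).items():
    print(sid, " -> ".join(order))
```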
    


## C. Designing the model

#### "Block whatever you can; and randomize the rest!"

- **Block whatever you can** - Assume, for example, that we have an experiment where the results depend on the person who performs the experiment (e.g., the nurse who tests the subject), and on the time of the day. In that case we can block the factor nurse, by having all tests performed by the same nurse. 
- **Randomize the rest** - But it won’t be possible to test all subjects at the same time. So we try to average out time effects by randomly assigning the subjects to the available testing times.

If, in contrast, we measure our patients in the morning and our healthy subjects in the afternoon, we will invariably bring some bias into our data.
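
A minimal sketch of "block the nurse, randomize the time", assuming a single nurse performs all tests and a fixed list of time slots is available. Patients and healthy subjects are shuffled into the slots together, so neither group ends up concentrated in the morning. The subject names, slots, and `random_schedule` are hypothetical.

```python
import random

def random_schedule(subjects, time_slots, seed=None):
    """Assign each subject to a distinct, randomly chosen time slot.

    All tests are assumed to be run by the same nurse (the blocked factor);
    random slot assignment spreads any time-of-day effect over both groups
    instead of confounding it with group membership.
    """
    if len(time_slots) < len(subjects):
        raise ValueError("need at least one time slot per subject")
    rng = random.Random(seed)
    slots = rng.sample(list(time_slots), k=len(subjects))  # distinct random slots
    return dict(zip(subjects, slots))

subjects = ["patient_1", "patient_2", "patient_3", "healthy_1", "healthy_2", "healthy_3"]
slots = ["08:00", "09:30", "11:00", "13:00", "14:30", "16:00"]
schedule = random_schedule(subjects, slots, seed=3)
for subject, slot in sorted(schedule.items(), key=lambda item: item[1]):
    print(slot, subject)
```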

### 1. Data Selection
- The samples should be representative of the group to be studied.
- In comparative studies, groups must be similar with respect to known sources of variation (e.g., age).
- **Note:** Make sure that your selection of samples (or subjects) sufficiently **covers all of the parameters** that you need! For example, if age is a nuisance factor, make sure you have enough young, middle-aged, and elderly subjects.
  - **Type 1** - For example, randomly selecting subjects from the patients at a hospital automatically biases the sample towards subjects with health problems (the sample should also include healthy people).
  - **Type 2** - For example, tests of the efficacy of a new rehabilitation therapy for stroke patients should not simply include any patients who have had a stroke: make sure that there are equal numbers of patients with mild, medium, and severe symptoms.
  
*Many surveys and studies fall short on these criteria. The field of **“matching by propensity scores”** (Rosenbaum and Rubin 1983) attempts to correct these problems.*

### 2. Data Size
Many studies also fail because the sample size is too small to observe an effect of the desired magnitude. In determining the sample size, one has to know:
- What is the variance of the parameter under investigation?
- What is the magnitude of the expected effect, relative to the standard deviation of the parameter?

This is known as **_power analysis_**. It is especially important in behavioral research, where research plans are not approved without careful sample size calculations.
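
As an illustration of such a calculation, a common normal-approximation formula for comparing two group means with a two-sided test is n per group = 2 · ((z₁₋α/₂ + z_power) · σ / δ)², where σ is the standard deviation of the parameter and δ the expected effect. The sketch below assumes this approximation (an exact t-test calculation gives a slightly larger n); `sample_size_per_group` is a hypothetical helper, and only `statistics.NormalDist` from the standard library is used.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means.

    delta: expected difference between the group means
    sd:    standard deviation of the parameter under investigation
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return ceil(n)  # round up to whole subjects

# Expected effect of half a standard deviation
print(sample_size_per_group(delta=5, sd=10))  # 63
```

Halving the expected effect roughly quadruples the required sample size, which is why small effects need large studies.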

### 3. Bias
"_Large sample size alone does not guarantee a representative response. One has to watch out for __selection bias__ and __non-response bias__._"

During the 1936 US presidential election, the magazine _Literary Digest_ asked ten million Americans whom they would vote for. **2.4 million** responded, and the _Literary Digest_ **predicted that _Landon_ would win 57 % of the vote, compared with 41 % for _Roosevelt_**.

However, the actual election results were **62 % for Roosevelt and 38 % for Landon**. In other words, despite the huge sample size, the prediction for Landon was a whopping 19 percentage points off!

**Reasons**:
- __selection bias__ - First, the sample was poorly chosen and not representative of the American voter: the mailing lists for the survey were taken from telephone directories, club membership lists, and lists of magazine subscribers. Thus, they were strongly biased towards the American middle and upper class.
- **non-response bias** - Second, only about one-fourth of the people asked responded, and people who respond to surveys are different from people who don’t: the so-called **non-response bias**.

Bias can have a number of sources:
- The selection of subjects.
- The structure of the experiment.
- The measurement device.
- The analysis of the data.

### 4. Randomization
Randomization is used to avoid bias as much as possible, and there are different ways to randomize an experiment. 
- **Simple Randomization** - This procedure is robust against selection and accidental bias. The disadvantage is that the resulting group sizes can differ significantly.
- **Block Randomization** - This is used to keep the number of subjects in the different groups closely balanced at all times. For example, with two types of treatment, A and B, and a block size of four, one can allocate the two treatments to the blocks of four subjects in the following sequences:
    1. AABB
    2. ABAB
    3. ABBA
    4. BBAA
    5. BABA
    6. BAAB
    
  Based on this, one can use a random number generator to generate random integers between 1 and 6 and use the corresponding blocks to allocate the respective treatments. This keeps the numbers of subjects in the two groups nearly equal at all times: with a block size of four, they never differ by more than two.
- **Minimization** - A closely related, but not completely random way to allocate a treatment is minimization. Here one takes whichever treatment has the smallest number of subjects, and allocates this treatment with a probability greater than 0.5 to the next patient.

    - Assume, for example, that you are conducting a randomized controlled trial of a new medication, with a “placebo group” and a “real medication group.” Halfway through the trial you realize that your placebo group already contains 60 subjects, while your medication group only has 40. You can now reduce this imbalance by giving each remaining subject the medication instead of the placebo with a probability of 60 % (instead of the previously used 50 %).

- **Stratified Randomization** - Sometimes one may want to include a wider variety of subjects with different characteristics. For example, one may choose to have younger as well as older subjects. In this case, one should try to keep the number of subjects within each stratum balanced. To do this, a separate list of random numbers should be kept for each stratum.
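
The block, minimization, and stratified schemes above can be sketched in Python as follows. This is a minimal illustration, not a validated randomization system: all function names are hypothetical, `minimization_choice` implements only the simplified two-group rule described in the text, and the stratified version assumes each stratum simply gets its own block-randomized list.

```python
import random
from itertools import permutations

def block_randomization(n_subjects, treatments=("A", "B"), block_size=4, seed=None):
    """Allocate treatments in randomly chosen balanced blocks (AABB, ABAB, ...)."""
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of treatments")
    per_block = block_size // len(treatments)
    base = [t for t in treatments for _ in range(per_block)]  # e.g. A, A, B, B
    blocks = sorted(set(permutations(base)))  # the 6 distinct blocks for A/B and size 4
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        allocation.extend(rng.choice(blocks))  # like drawing a random integer 1..6
    return allocation[:n_subjects]

def minimization_choice(counts, p_favor=0.6, rng=random):
    """Two-group rule from the text: favour the smaller group with probability p_favor > 0.5."""
    (t1, n1), (t2, n2) = counts.items()
    if n1 == n2:
        return rng.choice([t1, t2])  # groups balanced: plain 50/50 allocation
    smaller, larger = (t1, t2) if n1 < n2 else (t2, t1)
    return smaller if rng.random() < p_favor else larger

def stratified_randomization(subject_strata, seed=None):
    """Keep a separate block-randomization list for each stratum.

    subject_strata: list of (subject_id, stratum) pairs in order of enrolment.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for subject, stratum in subject_strata:
        by_stratum.setdefault(stratum, []).append(subject)
    allocation = {}
    for stratum, subjects in by_stratum.items():
        # an independent list per stratum keeps every stratum balanced on its own
        arms = block_randomization(len(subjects), seed=rng.randrange(2**32))
        allocation.update(zip(subjects, arms))
    return allocation

print("".join(block_randomization(12, seed=5)))
print(minimization_choice({"placebo": 60, "medication": 40}))
```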

### 5. Blinding
Consciously or not, the experimenter can significantly influence the outcome of an experiment. For example, a young researcher with a _new “brilliant” idea for a treatment will be biased in the execution of the experiment_, as well as in the analysis of the data, towards seeing the hypothesis confirmed.

To avoid such subjective influence, ideally both the experimenter and the subject should be blinded to the therapy. This is referred to as **double blinding**. When, in addition, the person who analyzes the data does not know which group each subject has been allocated to, we speak of **triple blinding**.
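
One practical way to support triple blinding, assumed here for illustration, is to give the analyst data in which the real group names are replaced by neutral codes, while the key is kept by someone else until the analysis is finished. `blind_labels` and the "Group A/B" codes are hypothetical.

```python
import random
from string import ascii_uppercase

def blind_labels(allocation, seed=None):
    """Replace real group names by neutral codes ("Group A", "Group B", ...).

    Returns (coded, key): `coded` maps subject -> code and goes to the analyst;
    `key` maps code -> real group and is kept by a third person until the
    analysis is finished.
    """
    groups = sorted(set(allocation.values()))
    codes = [f"Group {letter}" for letter in ascii_uppercase[:len(groups)]]
    random.Random(seed).shuffle(codes)  # code letters don't reveal the group order
    real_to_code = dict(zip(groups, codes))
    coded = {subject: real_to_code[group] for subject, group in allocation.items()}
    key = {code: group for group, code in real_to_code.items()}
    return coded, key

allocation = {"S1": "placebo", "S2": "medication", "S3": "placebo", "S4": "medication"}
coded, key = blind_labels(allocation, seed=11)
print(coded)  # what the analyst sees
```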

### 6. Factorial Design
When each combination of factors is tested, we speak of a _full factorial design_ of the experiment. In planning the analysis, one must distinguish between **_within-subject_** comparisons and **_between-subject_** comparisons.
- The former, within-subject comparisons, allow one to detect smaller differences with the same number of subjects than between-subject comparisons.
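
Enumerating the conditions of a full factorial design amounts to taking the Cartesian product of all factor levels, which `itertools.product` does directly. The factors and levels below are hypothetical, chosen only to illustrate a 2 × 2 × 2 design.

```python
from itertools import product

# Hypothetical factors and levels, for illustration only
FACTORS = {
    "treatment": ["placebo", "drug"],
    "time_of_day": ["morning", "afternoon"],
    "posture": ["sitting", "standing"],
}

def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

design = full_factorial(FACTORS)
print(len(design))  # 2 * 2 * 2 = 8 conditions
for condition in design:
    print(condition)
```

In a within-subject design every subject would run all eight conditions (in randomized order); in a between-subject design each subject would be assigned to only one of them.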