# Setup

In [None]:
# This R environment comes with many helpful analytics packages installed
# It is defined by the kaggle/rstats Docker image: https://github.com/kaggle/docker-rstats
# For example, here's a helpful package to load

library(tidyverse) # metapackage of all tidyverse packages

# Input data files are available in the read-only "../input/" directory

# Helper to set the width and height of subsequent plots, e.g. fig(10, 6)
fig <- function(x, y) {
    options(repr.plot.width = x, repr.plot.height = y)
}

df <- read_csv("/kaggle/input/fake-clinical-trial/Clinical trial made up data.csv")

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Basic info

In [None]:
head(df)

In [None]:
# Checking the data types of the data frame 
str(df)

In [None]:
summary(df)

# Data cleaning

In [None]:
# Renaming the columns 
colnames(df) <- c(
    "Patient", "Group",
    "Pre_Intervention_Depression", "Pre_Intervention_Anxiety", "Pre_Intervention_Other_Issues",
    "Post_Intervention_Depression", "Post_Intervention_Anxiety", "Post_Intervention_Other_Issues"
)


In [None]:
# Converting the post-intervention columns of df to numeric
# Values that cannot be parsed as numbers become NA
df$Post_Intervention_Depression <- as.numeric(df$Post_Intervention_Depression)
df$Post_Intervention_Anxiety <- as.numeric(df$Post_Intervention_Anxiety)
df$Post_Intervention_Other_Issues <- as.numeric(df$Post_Intervention_Other_Issues)
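
Because `as.numeric()` returns `NA` (with a warning) for any value it cannot parse, it can be worth checking which entries were lost in the conversion. The following is a minimal sketch on a made-up vector; `raw_scores` and its values are hypothetical and are not taken from the data set.

```r
# Hypothetical raw values; "n/a" stands in for any entry that is not a number
raw_scores <- c("12", "7", "n/a", "15")

# suppressWarnings() hides the "NAs introduced by coercion" warning for this demo
scores <- suppressWarnings(as.numeric(raw_scores))

# Entries that were present before the conversion but are NA afterwards
failed <- !is.na(raw_scores) & is.na(scores)

raw_scores[failed]  # the original text of the entries that could not be parsed
sum(failed)         # how many entries were lost
```

The same comparison could be applied to a column of `df` by keeping a copy of the column before converting it.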

In [None]:
df

# Analyzing the data

In [None]:
summary(df)

In [None]:
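# Correlation matrix of the six score columns (columns 3 to 8);
# each pairwise correlation uses the rows where both columns are non-missing
cor_matrix <- cor(df[, 3:8], use = "pairwise.complete.obs")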

# print correlation matrix
#cor_matrix

library(corrplot)

corrplot(cor_matrix, method = "circle") # the other methods are described below

### Other methods for the corrplot()

The `corrplot()` function in R provides seven visualization methods for creating correlation matrix plots[1][2][3][4][5]. These methods are:

1. "circle": The default. Each cell holds a circle whose area reflects the absolute value of the correlation coefficient and whose color reflects its sign and strength.

2. "square": Like "circle", but drawn with squares.

3. "ellipse": Each cell holds an ellipse whose eccentricity is scaled to the correlation value.

4. "number": Each cell shows the correlation coefficient as a number.

5. "pie": Each cell holds a pie chart filled in proportion to the correlation coefficient.

6. "shade": Each cell is a colored rectangle, with shading lines that by default mark the negative coefficients.

7. "color": Each cell is a colored rectangle with no additional glyph.

Select a method by passing one of these values as the `method` argument of `corrplot()`. For example, "number" is convenient when the exact values matter, while "circle" or "color" makes the overall pattern easier to see at a glance.

Citations:
[1] https://rdrr.io/cran/corrplot/man/corrplot.html
[2] https://rdrr.io/cran/corrplot/f/vignettes/corrplot-intro.Rmd
[3] https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html
[4] https://stackoverflow.com/questions/24298793/how-do-i-interpret-the-output-of-corrplot
[5] https://statisticsglobe.com/correlation-matrix-in-r
[6] https://www.statology.org/correlation-matrix-in-r/
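
To compare the seven methods listed above side by side, one option is to draw them in a single grid. This is only a sketch: it uses the built-in `mtcars` data set as a stand-in so that it runs on its own, and `cor_demo` and `methods` are names made up for the illustration. In this notebook, `cor_matrix` could be passed instead of `cor_demo`.

```r
library(corrplot)

# Stand-in data: mtcars ships with R and is used here only for illustration
cor_demo <- cor(mtcars[, 1:6], use = "pairwise.complete.obs")

# The seven values accepted by the `method` argument
methods <- c("circle", "square", "ellipse", "number", "shade", "color", "pie")

# Draw one small plot per method in a 2 x 4 grid (the last slot stays empty)
old_par <- par(mfrow = c(2, 4))
for (m in methods) {
  # `title` labels each panel; the top margin leaves room for it
  corrplot(cor_demo, method = m, title = m, mar = c(0, 0, 2, 0))
}
par(old_par)  # restore the previous graphics settings
```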

# Plots

In [None]:
ggplot(df, aes(x = Pre_Intervention_Depression, y = Pre_Intervention_Anxiety)) +
    geom_point(color = "blue", size = 3)

In [None]:
# Making a subset with the intervention group
intervention_subset <- subset(df, Group == "Intervention")


In [None]:
ggplot(intervention_subset, aes(x = Pre_Intervention_Depression, y = Post_Intervention_Depression)) +
  geom_point(color = "blue", size = 3) +
  labs(title = "Scatter Plot of Pre vs Post Intervention Depression",
       x = "Pre-Intervention Depression", y = "Post-Intervention Depression")
