
**Create a file that contains 1000 lines of random strings.**

In [None]:
# Base R is sufficient here, so no extra package needs to be installed.

# Build a random alphanumeric string of length n
generate_random_string <- function(n) {
  charset <- c(letters, LETTERS, 0:9)
  paste(sample(charset, n, replace = TRUE), collapse = "")
}

# Set the number of lines and file path
num_lines <- 1000
file_path <- "/content/random_strings.txt"

# Generate one 10-character string per line and write them to the file
random_strings <- vapply(seq_len(num_lines), function(i) generate_random_string(10), character(1))
writeLines(random_strings, file_path)
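
A quick way to check the result is to read the file back and confirm the line count and string lengths. The sketch below is self-contained and writes to a temporary file rather than `/content/random_strings.txt`; the helper name `make_string` and the use of `tempfile()` are assumptions made for illustration, not part of the original cell.

```r
# Illustrative check (assumed helper names; writes to a temp file, not /content)
set.seed(42)
charset <- c(letters, LETTERS, 0:9)
make_string <- function(n) paste(sample(charset, n, replace = TRUE), collapse = "")

tmp_path <- tempfile(fileext = ".txt")
writeLines(vapply(seq_len(1000), function(i) make_string(10), character(1)), tmp_path)

# Read the file back and verify its shape
lines_read <- readLines(tmp_path)
n_lines <- length(lines_read)
all_len_10 <- all(nchar(lines_read) == 10)
only_alnum <- all(grepl("^[A-Za-z0-9]+$", lines_read))

cat("Lines:", n_lines, "\n")
cat("All strings have 10 characters:", all_len_10, "\n")
cat("Only letters and digits:", only_alnum, "\n")
```

`vapply()` is used here instead of `sapply()` because it guarantees a character vector even in edge cases such as zero lines.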


**Create a random dataset of 100 rows and 30 columns. All values lie in the range [1, 200].**

Perform the following operations:


(i) Replace every value in the range [10, 60] with NA. Print the number of rows that contain missing values.


(ii) Replace each NA value with the mean of its column.


(iii) Compute the Pearson correlation between all columns and plot it as a heat map. Also select the columns whose correlation is <= 0.7.


(iv) Normalize all the values in the dataset between 0 and 10.


(v) Replace each value in the dataset with 1 if it is <= 0.5, otherwise with 0.

In [None]:
# Install corrplot if it is missing, then load it
# (tidyverse is not used below, so it is not installed)
if (!requireNamespace("corrplot", quietly = TRUE)) {
  install.packages("corrplot")
}
library(corrplot)

# Set the seed for reproducibility
set.seed(123)

# Generate random dataset
dataset <- as.data.frame(matrix(sample(1:200, 100*30, replace = TRUE), nrow = 100))

# (i) Replace values in [10, 60] with NA and count rows with missing values
# (mask by value; indexing with dataset[10:60, ] would blank out rows 10 to 60 instead)
dataset[dataset >= 10 & dataset <= 60] <- NA
missing_rows <- sum(rowSums(is.na(dataset)) > 0)
print(paste("Number of rows with missing values:", missing_rows))

# (ii) Replace NA values with column averages
# (assigning into dataset[] keeps the data frame structure; apply() would return a matrix)
dataset[] <- lapply(dataset, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

# (iii) Find Pearson correlation and plot heat map
cor_matrix <- cor(dataset)
corrplot(cor_matrix, method = "color")

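# Select columns whose absolute correlation with every other column is <= 0.7
# (one reading of the task; the diagonal is excluded because it is always 1)
off_diag <- abs(cor_matrix)
diag(off_diag) <- 0
low_corr_cols <- colnames(cor_matrix)[apply(off_diag, 2, max) <= 0.7]
print(paste("Columns with correlation <= 0.7:", paste(low_corr_cols, collapse = ", ")))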

# (iv) Normalize values between 0 and 10
normalize <- function(x) {
  min_val <- min(x, na.rm = TRUE)
  max_val <- max(x, na.rm = TRUE)
  normalized <- (x - min_val) / (max_val - min_val) * 10
  return(normalized)
}
dataset <- apply(dataset, 2, normalize)

# (v) Replace values with 1 if <= 0.5, else with 0
dataset <- ifelse(dataset <= 0.5, 1, 0)
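
Masking by value, as step (i) asks, is different from indexing rows 10 to 60. The following is a minimal sketch on a made-up 3 x 3 data frame (the name `toy` and its values are assumptions for illustration only) that shows the value mask, the row count, and the column-mean imputation from step (ii).

```r
# Toy data frame (made-up values) to illustrate masking cells by value
toy <- data.frame(a = c(5, 20, 100), b = c(60, 150, 9), c = c(10, 61, 200))

# Logical matrix: TRUE wherever a cell lies in [10, 60]
mask <- toy >= 10 & toy <= 60
toy[mask] <- NA

# Number of rows that contain at least one NA
rows_with_na <- sum(rowSums(is.na(toy)) > 0)

# Column-mean imputation, keeping the data frame structure
toy[] <- lapply(toy, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

print(rows_with_na)
print(toy)
```

In this toy case rows 1 and 2 contain masked cells, so the count is 2, and each NA is filled with the mean of the remaining values in its column (for example, 52.5 in column `a`).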


**Create a random dataset of 600 rows and 15 columns.**

All values lie in the range [-100, 100].


Perform the following operations:


(i) Plot a scatter graph of Column 5 against Column 6.


(ii) Plot a histogram of each column in a single graph.


(iii) Plot a box plot of each column in a single graph.

In [1]:
# Set the seed for reproducibility
set.seed(123)

# Generate random dataset
dataset <- as.data.frame(matrix(runif(600*15, min = -100, max = 100), nrow = 600))

# (i) Plot scatter graph between Column 5 and Column 6
plot(dataset$V5, dataset$V6, xlab = "Column 5", ylab = "Column 6", main = "Scatter Plot")

# (ii) Plot histogram of each column in a single graph
par(mfrow = c(3, 5))
for (i in 1:15) {
  hist(dataset[, i], main = paste("Histogram of Column", i), xlab = paste("Column", i))
}

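# (iii) Plot the box plot of each column in a single graph
par(mfrow = c(1, 1))  # reset the 3 x 5 grid used for the histograms
# Passing the whole data frame draws one box per column on shared axes
boxplot(dataset, main = "Box Plot of Each Column", xlab = "Column", ylab = "Value", las = 2)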
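
Because `par(mfrow = ...)` stays in effect until it is changed, it can help to save and restore the previous settings. The sketch below is one possible arrangement, not the notebook's own method: it draws to a temporary PDF so it also runs without a display, and the names `demo`, `out_file`, and `bp` are assumptions for illustration.

```r
# Illustrative sketch: draw to a temporary PDF so it runs without a display
set.seed(123)
demo <- as.data.frame(matrix(runif(600 * 15, min = -100, max = 100), nrow = 600))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 12, height = 8)

# Page 1: 3 x 5 grid of histograms; par() returns the previous settings
old_par <- par(mfrow = c(3, 5), mar = c(4, 4, 2, 1))
for (i in seq_along(demo)) {
  hist(demo[[i]], main = paste("Column", i), xlab = "Value")
}
par(old_par)  # restore the single-panel layout

# Page 2: all 15 box plots on one shared axis
bp <- boxplot(demo, las = 2, main = "Box plots of all columns", ylab = "Value")

invisible(dev.off())
```

`boxplot()` invisibly returns the statistics it drew, so `bp$stats` holds the five-number summary for each column.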


**Create a random dataset of 500 rows and 5 columns:**


All values lie in the range [5, 10].


Perform the following operations:


(i) Perform t-Test on each column.


(ii) Perform Wilcoxon Signed Rank Test on each column.


(iii) Perform a two-sample t-test and a Wilcoxon rank sum test on Column 3 and Column 4.

In [None]:
# Set the seed for reproducibility
set.seed(123)

# Generate random dataset
dataset <- as.data.frame(matrix(runif(500*5, min = 5, max = 10), nrow = 500))

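# (i) Perform a one-sample t-test on each column
# (t.test() tests against a null mean of mu = 0 by default)
t_test_results <- lapply(dataset, function(column) {
  t.test(column)
})

# (ii) Perform a Wilcoxon signed rank test on each column
# (wilcox.test() with one sample tests against a null location of mu = 0 by default)
wilcoxon_test_results <- lapply(dataset, function(column) {
  wilcox.test(column)
})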

# (iii) Perform Two Sample t-Test and Wilcoxon Rank Sum Test on Column 3 and Column 4
column3 <- dataset$V3
column4 <- dataset$V4

two_sample_t_test <- t.test(column3, column4)
wilcoxon_rank_sum_test <- wilcox.test(column3, column4)

# Print the results
print("T-Test Results:")
print(t_test_results)

print("Wilcoxon Signed Rank Test Results:")
print(wilcoxon_test_results)

print("Two Sample T-Test:")
print(two_sample_t_test)

print("Wilcoxon Rank Sum Test:")
print(wilcoxon_rank_sum_test)
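
Printing the full `htest` objects is verbose, so the key numbers can be collected into one table. This is a minimal sketch that regenerates the data under the same seed; the names `demo`, `t_res`, `w_res`, and `summary_table` are assumptions for illustration, and both tests use the default null value of 0.

```r
# Regenerate the data so the sketch is self-contained
set.seed(123)
demo <- as.data.frame(matrix(runif(500 * 5, min = 5, max = 10), nrow = 500))

# One-sample tests against the default null value of 0
t_res <- lapply(demo, t.test)
w_res <- lapply(demo, wilcox.test)

# Collect the main statistics into a single data frame
summary_table <- data.frame(
  column    = names(demo),
  mean      = vapply(demo, mean, numeric(1)),
  t_stat    = vapply(t_res, function(r) unname(r$statistic), numeric(1)),
  t_p_value = vapply(t_res, function(r) r$p.value, numeric(1)),
  w_p_value = vapply(w_res, function(r) r$p.value, numeric(1)),
  row.names = NULL
)
print(summary_table)
```

Since every value lies in [5, 10], a test against 0 rejects the null hypothesis for every column; passing `mu = 7.5` to `t.test()` and `wilcox.test()` would test against the midpoint of the range instead.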
