
#**R-Box plots**

1. A boxplot summarizes how the values in a data set are distributed. It uses three quartiles to divide the data into four parts of roughly equal size. The graph shows the minimum, first quartile, median, third quartile and maximum of the data set. It is also useful for comparing the distribution of data across data sets by drawing a boxplot for each of them.

2. Boxplots are created in R by using the boxplot() function.

Syntax:

The basic syntax to create a boxplot in R is −

```
boxplot(x, data, notch, varwidth, names, main)
```
Following is the description of the parameters used −
*  **x** is a vector or a formula.
*  **data** is the data frame.
*  **notch** is a logical value. Set as TRUE to draw a notch.
*  **varwidth** is a logical value. Set as TRUE to draw the width of each box proportional to the square root of the sample size.
*  **names** are the group labels which will be printed under each boxplot.
*  **main** is used to give a title to the graph.
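
As a rough illustration of the five values a boxplot draws, the sketch below computes them for the mpg column of the built-in mtcars data set with fivenum() and boxplot.stats() from base R. The variable names are illustrative only. boxplot() uses hinges, which are close to the first and third quartiles, and its whiskers stop at the most extreme points within 1.5 times the box height, so they equal the minimum and maximum only when there are no outliers.

```r
# Miles per gallon from the built-in mtcars data set.
mpg <- mtcars$mpg

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(mpg)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# boxplot.stats() returns the values boxplot() would actually draw:
# $stats holds the whisker ends, hinges and median; $out holds any outliers.
bs <- boxplot.stats(mpg)
print(bs$stats)
print(bs$out)
```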

3. Example:

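We use the data set "mtcars", which ships with R, to create a basic boxplot. The cell below reads a CSV copy of it (from the source linked below) to look at its columns, including "mpg" and "cyl". The plotting code that follows uses the built-in mtcars directly.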

Source: https://gist.github.com/seankross/a412dfbd88b3db70b74b

In [None]:
input <- read.csv("/content/mtcars.csv")
print(head(input))

              model  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


4. Creating the Boxplot

The script below creates a boxplot of mpg (miles per gallon) grouped by cyl (number of cylinders).

In [None]:
# Give the chart file a name.
png(file = "boxplot.png")

# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")

# Save the file.
dev.off()
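
To see the numbers behind a grouped boxplot like the one above, one option is to call boxplot() with plot = FALSE, which returns the statistics instead of drawing them. A minimal sketch using the built-in mtcars data set (the name bp is illustrative):

```r
# Compute, but do not draw, the boxplot of mpg grouped by cyl.
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
colnames(bp$stats) <- bp$names
print(bp$stats)

# Number of observations in each group.
print(setNames(bp$n, bp$names))

# Outliers (if any) and the index of the group each one belongs to.
print(bp$out)
print(bp$group)
```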

5. Boxplot with Notch

We can draw a boxplot with notches to compare the medians of different groups: if the notches of two boxes do not overlap, their medians are likely to differ. The script below creates a notched boxplot for each group.

In [None]:
# Give the chart file a name.
png(file = "boxplot_with_notch.png")

# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","purple"),
names = c("High","Medium","Low")
)

# Save the file.
dev.off()

“some notches went outside hinges ('box'): maybe set notch=FALSE”
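
The warning above appears when a notch is wider than its box. According to the boxplot.stats() documentation, notches extend to median ± 1.58 × IQR / sqrt(n), so a small group with a wide spread can produce a notch that passes the hinges. A hedged sketch of how one might check this for the mtcars groups (variable names are illustrative):

```r
# Get the per-group statistics without drawing the plot.
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# bp$conf holds the lower (row 1) and upper (row 2) notch limits per group;
# bp$stats rows 2 and 4 hold the lower and upper hinges of each box.
notch_outside <- bp$conf[1, ] < bp$stats[2, ] | bp$conf[2, ] > bp$stats[4, ]
names(notch_outside) <- bp$names
print(notch_outside)

# Recompute the notch limits by hand from the documented formula.
iqr <- bp$stats[4, ] - bp$stats[2, ]
manual_conf <- rbind(bp$stats[3, ] - 1.58 * iqr / sqrt(bp$n),
                     bp$stats[3, ] + 1.58 * iqr / sqrt(bp$n))
print(manual_conf)
```

Setting notch = FALSE, as the warning suggests, avoids the issue when the groups are too small for the notches to be meaningful.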


#**R- Histogram**

1. A histogram represents the frequencies of values of a variable bucketed into ranges. A histogram is similar to a bar chart, but it groups the values into continuous ranges. The height of each bar represents the number of values that fall in that range.
2. R creates a histogram with the hist() function. This function takes a vector as input, along with optional parameters that control the plot.
3. Syntax

The basic syntax for creating a histogram using R is −

```
hist(v,main,xlab,xlim,ylim,breaks,col,border)
```

Following is the description of the parameters used −
*  **v** is a vector containing numeric values used in histogram.
*  **main** indicates title of the chart.
*  **col** is used to set color of the bars.
*  **border** is used to set border color of each bar.
*  **xlab** is used to give description of x-axis.
*  **xlim** is used to specify the range of values on the x-axis.
*  **ylim** is used to specify the range of values on the y-axis.
*  **breaks** is used to specify the breakpoints between bars, either as a vector of breakpoints or as a single number suggesting how many bars to draw.

4. Example

A simple histogram is created using the input vector and the xlab, col and border parameters. The script given below creates the histogram and saves it in the current R working directory.

In [None]:
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")

# Save the file.
dev.off()
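
To make the idea of bucketing concrete, the sketch below counts the values of v by hand with cut() and table(). The breakpoints 0, 10, ..., 50 are an assumption chosen for illustration; hist() picks its own breakpoints unless told otherwise.

```r
# Same data as in the histogram example.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Assumed breakpoints, chosen here for illustration.
breaks <- seq(0, 50, by = 10)

# cut() assigns each value to an interval such as (0,10] or (10,20];
# table() then counts how many values fall into each interval.
buckets <- cut(v, breaks = breaks)
counts <- table(buckets)
print(counts)
```

Each count corresponds to the height of one bar in a histogram drawn with the same breakpoints.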

Range of X and Y values

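To specify the range of values shown on the X axis and Y axis, we can use the xlim and ylim parameters.

The number of bars, and therefore their width, can be controlled with breaks. A single number is treated as a suggestion only, and R picks round breakpoints close to it.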

In [None]:
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram_lim_breaks.png")

# Create the histogram.
hist(v,xlab = "Weight",col = "green",border = "red", xlim =c(0,40), ylim = c(0,5),breaks = 5)

# Save the file.
dev.off()
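
When breaks is a single number, R treats it as a suggestion, so it can help to inspect the breakpoints it actually chose. The sketch below uses plot = FALSE to get the histogram object without drawing it, then passes an explicit vector of breakpoints to fix the bar width. The 5-unit width and the names h_suggested and h_fixed are illustrative assumptions.

```r
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# A single number is a suggested number of bars; R rounds to "pretty" values.
h_suggested <- hist(v, breaks = 5, plot = FALSE)
print(h_suggested$breaks)
print(h_suggested$counts)

# A vector gives the exact breakpoints, here bars that are 5 units wide.
# The breakpoints must span the full range of the data.
h_fixed <- hist(v, breaks = seq(0, 45, by = 5), plot = FALSE)
print(h_fixed$breaks)
print(h_fixed$counts)
```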

#**R- Mean, Median, Mode**

1. Statistical analysis in R is performed using many built-in functions, most of which are part of the R base package. These functions take an R vector as input, along with optional arguments, and return the result. The functions discussed in this chapter are mean, median and mode.

2. Mean

It is calculated by taking the sum of the values and dividing by the number of values in a data series.
The function mean() is used to calculate this in R.

3. Syntax

The basic syntax for calculating mean in R is −


```
mean(x, trim = 0, na.rm = FALSE, ...)
```

Following is the description of the parameters used −
*  x is the input vector.
*  trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector.
*  na.rm is a logical value. Set as TRUE to remove the missing values from the input vector.

4. Example

In [None]:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)

[1] 8.22
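
The definition above can be checked by hand. The sketch below divides the sum by the number of values and compares the result with mean(); the name manual.mean is illustrative.

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Mean by definition: sum of the values divided by how many there are.
manual.mean <- sum(x) / length(x)
print(manual.mean)

# Compare with the built-in function, allowing for floating-point rounding.
print(isTRUE(all.equal(manual.mean, mean(x))))
```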


5. Applying Trim Option

When the trim parameter is supplied, the values in the vector are sorted and the given fraction of observations is dropped from each end before calculating the mean.
When trim = 0.3, 3 values from each end are dropped, because the vector has 10 values and 0.3 × 10 = 3.

In this case the sorted vector is (−21, −5, 2, 3, 4.2, 7, 8, 12, 18, 54) and the values removed from the vector for calculating the mean are (−21, −5, 2) from the left and (12, 18, 54) from the right.

In [None]:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x,trim = 0.3)
print(result.mean)

[1] 5.55
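
The trimming step can be reproduced by hand, which may make the result easier to follow. This is a minimal sketch with illustrative variable names; it assumes the number of values dropped from each end is floor(n * trim).

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
trim <- 0.3

# Number of observations to drop from each end (assumed: floor(n * trim)).
k <- floor(length(x) * trim)

# Sort, then keep only the middle values.
sorted <- sort(x)
kept <- sorted[(k + 1):(length(x) - k)]
print(kept)

# The mean of the remaining values should match mean(x, trim = 0.3).
manual.trimmed <- mean(kept)
print(manual.trimmed)
```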


6. Applying NA Option

If there are missing values, the mean function returns NA.
To drop the missing values from the calculation, use na.rm = TRUE, which removes the NA values before the mean is computed.

In [None]:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)

# Find mean.
result.mean <- mean(x)
print(result.mean)

# Find mean dropping NA values.
result.mean <- mean(x,na.rm = TRUE)
print(result.mean)

[1] NA
[1] 8.22
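
Before dropping missing values it is often worth checking how many there are. The sketch below counts them with is.na() and removes them by hand, which should give the same result as na.rm = TRUE. The variable names are illustrative.

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)

# is.na() flags missing values; sum() counts the TRUE entries.
n.missing <- sum(is.na(x))
print(n.missing)

# Dropping the NA values by hand, then taking the mean.
manual.mean <- mean(x[!is.na(x)])
print(manual.mean)
```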


7. Median

The middle value of a sorted data series is called the median. When the series has an even number of values, the median is the average of the two middle values. The median() function is used in R to calculate this value.

8. Syntax

The basic syntax for calculating median in R is − median(x, na.rm = FALSE)

Following is the description of the parameters used −
*  x is the input vector.
*  na.rm is used to remove the missing values from the input vector.

Example

In [None]:
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find the median.
median.result <- median(x)
print(median.result)

[1] 5.6
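
The result 5.6 does not appear in the vector because the vector has an even number of values, so the two middle values (4.2 and 7) are averaged. A minimal sketch of that calculation, with illustrative variable names:

```r
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

sorted <- sort(x)
n <- length(sorted)

# Even n: average the two middle values; odd n: take the single middle value.
if (n %% 2 == 0) {
  manual.median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2
} else {
  manual.median <- sorted[(n + 1) / 2]
}
print(manual.median)
```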


9. Mode

The mode is the value that has the highest number of occurrences in a set of data. Unlike mean and median, the mode can be found for both numeric and character data.
R does not have a standard built-in function to calculate the statistical mode (the base function mode() returns the storage mode of an object instead). So we create a user function to calculate the mode of a data set in R. This function takes the vector as input and gives the mode value as output.

10. Example

In [None]:
# Create the function.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)

# Calculate the mode using the user function.
result <- getmode(v)
print(result)

# Create the vector with characters.
charv <- c("o","it","the","it","it")

# Calculate the mode using the user function.
result <- getmode(charv)
print(result)

[1] 2
[1] "it"
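
In the numeric example, 2 and 3 each occur four times; getmode() reports only 2 because which.max() returns the first maximum. If all tied values are wanted, one possible variant is sketched below. Here getmodes is a hypothetical name, not a base R function.

```r
# Hypothetical helper: return every value that reaches the highest count.
getmodes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))
  uniqv[counts == max(counts)]
}

# Numeric vector with a tie between 2 and 3.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
print(getmodes(v))

# Character vector with a single mode.
charv <- c("o","it","the","it","it")
print(getmodes(charv))
```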
