# Import Library

In [None]:
library(plyr)      # load before the tidyverse so dplyr's verbs are not masked by plyr
library(tidyverse) # metapackage of all tidyverse packages (includes ggplot2 and dplyr)
library(reshape2)  # melt

library(scales) # visualisation
library(GGally) # visualisation
library(ggthemes) # visualisation

# Interactivity
library(crosstalk)
library(plotly)

# Date (scales is already loaded above)
library(zoo)
library(lubridate)

# Data

In [None]:
university<-read_csv('../input/world-university-rankings/cwurData.csv')
health<-read_csv('../input/av-healthcare-analytics-ii/healthcare/train_data.csv')
netflix<-read_csv('../input/netflix-shows/netflix_titles.csv')
playstore<-read_csv('../input/google-play-store-apps/googleplaystore.csv')
campus<-read_csv('../input/factors-affecting-campus-placement/Placement_Data_Full_Class.csv')
nifty<-read_csv('../input/nifty50-stock-market-data/NIFTY50_all.csv')


# Function to set Height & Width

In [None]:
# Helper to set the width and height of subsequent plots (Jupyter repr options)
fig<-function(x,y){
    options(repr.plot.width = x, repr.plot.height = y)
}

# Basic Starters

<font color="darkblue"><b>Basic elements in layout:</b></font>
* **ggtitle** - Plot title
* **xlab** - x-axis title
* **ylab** - y-axis title
* **labs(title=__,x=__,y=__)** - All titles in a single call
* **scale_x_continuous(limits=c())** - Control a continuous x-axis variable
* **scale_y_continuous(limits=c())** - Control a continuous y-axis variable
* **scale_x_discrete()** - Control a discrete x-axis variable
* **scale_y_discrete()** - Control a discrete y-axis variable
* **scale_x_reverse()** - Reverse the direction of the x axis
* **scale_x_log10()** - Plot x on a log10 scale
* **scale_x_date(labels = date_format(""),breaks = date_breaks(""))** - Treat x values as dates
* **xlim** - Limit the x axis
* **ylim** - Limit the y axis
* **theme_bw()** - White background with grid lines
* **theme_grey()** - Grey background (default theme)
* **theme_classic()** - White background, no grid lines
* **theme_minimal()** - Minimal theme
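
As a minimal sketch of how several of these layout elements combine, the snippet below uses ggplot2's built-in `mpg` data set. The data set, the labels and the axis limits are assumptions chosen for illustration; they are not part of the data loaded above.

```r
library(ggplot2)

# Built-in mpg data: engine displacement vs highway mileage
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine size vs highway mileage",  # title and axis titles in one call
       x = "Displacement (litres)",
       y = "Highway miles per gallon") +
  scale_x_continuous(limits = c(1, 7)) +          # control the continuous x axis
  scale_y_continuous(limits = c(10, 45)) +        # control the continuous y axis
  theme_minimal()                                 # replace the default grey theme

print(p)
```

Swapping `theme_minimal()` for `theme_bw()` or `theme_classic()` changes only the background and grid lines; the data layers stay the same.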

<a id="35"></a>
<font color="olive" size=+2.5><b>1. Basic Bar chart</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** :  Displays quantitative representation of a variable.

**Question** : How many universities in each country have a good score? (Filtered for universities with a score greater than 64)

In [None]:
fig(12,8)
ggplot(university[university$score>64,], aes(country))+
geom_bar(stat="count", width = 0.5, fill="darkblue")+
 labs(x="Country",
         y="Number of universities",
       title="Universities with a score above 64, by country")+
theme_grey()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))

<a id="11"></a>
<font color="magenta" size=+2.5><b>1.1 Bar chart - Gradient & Text</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** :  Displays the counts of a variable, highlighting the largest counts with a color gradient and labelling every bar with its value.

**Question** : Which genres do most Google Play Store apps fall into? Highlight from the highest count to the lowest.

In [None]:
genre_data<-as.data.frame(table(playstore$Genres))
# Sort by descending frequency and keep the 10 most common genres
genre_data<-head(genre_data[order(-genre_data$Freq),], 10)
colnames(genre_data)<-c('Genre','Count')

In [None]:
fig(12,8)
ggplot(genre_data, aes(Genre,Count,fill=Count))+
geom_bar(stat="identity", width = 0.5)+
geom_text(aes(label=Count), vjust=2) +
scale_fill_gradient(low = "green", high = "red")+
 labs(x="Genre",
         y="Count",
       title="Top 10 Genres of Google Play Store Apps")+
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15,angle=90),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))
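
Because `Genre` is a factor, ggplot2 draws the bars in the order of the factor levels rather than by count. One way to sort the bars, sketched here on a small hypothetical data frame (`toy` and its values are made up for illustration), is to reorder the levels with `reorder()`:

```r
library(ggplot2)

# Hypothetical genre counts standing in for genre_data
toy <- data.frame(Genre = c("Tools", "Education", "Entertainment", "Medical"),
                  Count = c(120, 90, 150, 60))

# reorder() sorts the factor levels of Genre by descending Count
toy$Genre <- reorder(toy$Genre, -toy$Count)

p_sorted <- ggplot(toy, aes(Genre, Count, fill = Count)) +
  geom_col(width = 0.5) +                       # geom_col() is geom_bar(stat = "identity")
  geom_text(aes(label = Count), vjust = -0.5)   # place each label just above its bar

print(p_sorted)
```

The same `reorder()` call could be applied to `genre_data$Genre` before plotting.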

<a id="12"></a>
<font color="magenta" size=+2.5><b>1.2 Bar chart - Stacked & Group</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Displays the counts of a variable with the bars stacked or grouped by a second variable.

**Question** : How many shows/movies were released on Netflix by India and the United States over the past 5 years? (Stacking or grouping by country)

In [None]:
ind_us_shows<-netflix%>%filter(country %in% c("United States","India") & release_year>2015)
head(ind_us_shows)


In [None]:
fig(12,8)
ggplot(ind_us_shows, aes(release_year,fill=country))+
geom_bar(stat="count",position='stack', width = 0.5)+  # Stack for stacked chart
 labs(x="Year",
         y="Count", 
       title="Distribution of Netflix Shows in India & US ")+ 
theme_minimal()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=16,angle=90),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))

In [None]:
fig(12,8)
ggplot(ind_us_shows, aes(release_year,fill=country))+
geom_bar(stat="count", position='dodge',width = 0.5)+  # Dodge for group
 labs(x="Year",
         y="Count", 
       title="Distribution of Netflix Shows in India & US (Group)")+ 
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15,angle=90),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))
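
The difference between the two positions can be checked on a tiny example. The sketch below uses a hypothetical five-row data frame (`toy_shows` is made up for illustration) and reads the computed bar heights back with `layer_data()`:

```r
library(ggplot2)

# Hypothetical titles: three from 2016 and two from 2017
toy_shows <- data.frame(
  release_year = c(2016, 2016, 2016, 2017, 2017),
  country      = c("India", "United States", "United States",
                   "India", "United States")
)

p_stack <- ggplot(toy_shows, aes(factor(release_year), fill = country)) +
  geom_bar(position = "stack")   # one bar per year, countries piled on top of each other

p_dodge <- ggplot(toy_shows, aes(factor(release_year), fill = country)) +
  geom_bar(position = "dodge")   # one bar per country, side by side within each year

# Tallest stacked bar = largest yearly total; tallest dodged bar = largest single group
max(layer_data(p_stack)$ymax)
max(layer_data(p_dodge)$ymax)
```

Stacking makes yearly totals easy to read, while dodging makes it easier to compare the countries within a year.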

<a id="13"></a>
<font color="magenta" size=+2.5><b>1.3. Facet Bar</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Displays the counts of a variable split by category into multiple panels.

**Question** : How many shows/movies were released on Netflix by India, the US, the UK and Australia over the past 5 years? (Facet)

In [None]:
four_shows<-netflix%>%filter(country %in% c("United States","India","United Kingdom","Australia") & release_year>2015)

In [None]:
fig(12,8)
ggplot(four_shows, aes(release_year))+
geom_bar(stat="count", width = 0.5,aes(fill=country))+
 labs(x="Year",
         y="Count", 
       title="Distribution of Netflix Shows in India,US,UK & Australia")+ 
facet_wrap(~country)+
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15,angle=90),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))

<a id="14"></a>
<font color="magenta" size=+2.5><b>1.4. Horizontal Bar</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Displays quantitative representation of a variable in a horizontal manner.

**Question** : How many hospitals fall in each type?

In [None]:
hosp_code<-as.data.frame(table(health$Hospital_type_code))
colnames(hosp_code)<-c('hospital_code','count')

In [None]:
head(hosp_code)

In [None]:
fig(12,8)
ggplot(hosp_code, aes(x=hospital_code,y=count))+
geom_bar(stat="identity",width = 0.5,aes(fill=count))+ 
scale_fill_gradient(low = "red", high = "darkgreen")+
coord_flip()+
 labs(x="Hospital Code",
         y="Count", 
       title="Distribution of Hospital Type Code")+ 
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15,angle=90),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))

<a id="35"></a>
<font color="olive" size=+2.5><b>2. Basic Histogram</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

In [None]:
# Create a histogram of carat

ggplot(data=diamonds, aes(x=carat)) +      # Initialize plot 

       geom_histogram(fill="darkblue",      # Create histogram with dark blue bars
                      col="darkred",         # Set bar outline color to dark red
                      binwidth = 0.05) +   # Set bin width

       xlim(0,3)                           # Add x-axis limits

In [None]:
ggplot(data=diamonds, aes(x=clarity)) +        # Initialize plot 

       geom_bar(aes(fill=color),   # Create bar plot, fill based on diamond color
                color="black",                 # Set bar outline color
                position="dodge") +            # Place bars side by side

       scale_fill_manual(values=c("#FFFFFF","#F5FCC2",     # Use custom colors
        "#E0ED87","#CCDE57", "#B3C732","#94A813","#718200"))

## Example

**Purpose** : Display the distribution of a continuous variable.

**Question** : What is the salary distribution of Commerce & Management (Comm&Mgmt) graduates?

In [None]:
com_df<-campus%>%filter(degree_t=='Comm&Mgmt')
com_df<-com_df[complete.cases(com_df), ]

In [None]:
fig(12,8)
ggplot(com_df, aes(x=salary)) + 
geom_histogram(binwidth=10000,fill="magenta", color = "black")+
labs(x="Salary",
         y="Count", 
       title="Distribution of Salaries for Comm&Mgmt")+  
theme_bw()+
theme(plot.title = element_text(size=22)
      ,axis.text.x= element_text(size=15),
       axis.text.y= element_text(size=15),
        axis.title=element_text(size=18))

<a id="36"></a>
<font color="olive" size=+2.5><b>2.1. Histogram Stacked</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Display the distribution of a continuous variable for multiple categories.

**Question** : What is the salary distribution of candidates with different specialisations?

In [None]:
campus<-campus[complete.cases(campus), ]

In [None]:
fig(12,8)
ggplot(campus, aes(x=salary, color=specialisation,fill=specialisation)) +
  geom_histogram(alpha=0.2,position="identity",binwidth=10000)+ # "identity" overlays the groups; use "stack" to pile them up
labs(x="Salary",
         y="Count", 
       title="Distribution of Salaries for different Specialisation ")+  
theme_bw()+
theme(plot.title = element_text(size=22)
      ,axis.text.x= element_text(size=15),
       axis.text.y= element_text(size=15),
        axis.title=element_text(size=18))

<a id="37"></a>
<font color="olive" size=+2.5><b>2.1. Histogram - Mean & Line type</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Display the distribution of a continuous variable with a line marking the mean.

**Question** : What is the distribution of open price of CIPLA stock?

In [None]:
cipla<-read_csv('../input/nifty50-stock-market-data/CIPLA.csv')

In [None]:
fig(12,8)
ggplot(cipla, aes(x=Open)) + 
geom_histogram(binwidth=50,fill="lightblue",linetype="dashed",color="black",size=2)+
geom_vline(aes(xintercept=mean(Open)),
            color="blue", linetype="dashed", size=2)+
labs(x="Open Price",
         y="Count", 
       title="Distribution of CIPLA Open Price ")+  
theme_bw()+
theme(plot.title = element_text(size=22)
      ,axis.text.x= element_text(size=15),
       axis.text.y= element_text(size=15),
        axis.title=element_text(size=18))
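
A dashed line shows where the mean lies but not its value. As a sketch of one way to label it, the example below uses R's built-in `faithful` data set in place of the CIPLA prices (an assumption made so that the snippet runs without the Kaggle input files) and adds the value with `annotate()`:

```r
library(ggplot2)

# Compute the mean once so the line and the label use the same value
mean_wait <- mean(faithful$waiting, na.rm = TRUE)

p_mean <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, fill = "lightblue", color = "black") +
  geom_vline(xintercept = mean_wait, color = "blue", linetype = "dashed") +
  annotate("text", x = mean_wait, y = Inf, vjust = 2, hjust = -0.1,
           label = paste("Mean:", round(mean_wait, 1)))   # label near the top of the line

print(p_mean)
```

Passing `na.rm = TRUE` keeps the mean defined even if the column contains missing values.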

<a id="38"></a>
<font color="olive" size=+2.5><b>2.2. Facet Histogram</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** : Display the distribution of a continuous variable in one panel per category (facets).

**Question** : What is the distribution of open price of pharma stocks?

In [None]:
med_stock<-nifty%>%filter(Symbol %in% c('CIPLA','DRREDDY','SUNPHARMA'))

In [None]:
fig(12,8)
ggplot(med_stock, aes(x=Open)) + 
geom_histogram(binwidth=50,aes(fill=Symbol))+
facet_grid(Symbol ~ .)+
labs(x="Open Price",
         y="Count", 
       title="Distribution of Pharma Stocks Open Price ")+  
theme_bw()+
theme(plot.title = element_text(size=22)
      ,axis.text.x= element_text(size=15),
       axis.text.y= element_text(size=15),
        axis.title=element_text(size=18))

<a id="7"></a>
<font color="steelblue" size=+2.5><b>3. Basic Bubble Plot</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

In [None]:
ggplot(data=diamonds, aes(x=carat, y=price)) +  # Initialize plot 

        geom_point(aes(color=color),            # Color based on diamond color
                        alpha=0.5)     +

        facet_wrap(~clarity)           +        # Facet on clarity

        geom_smooth()                  +        # Add an estimated fit line

        theme(legend.position=c(0.85,0.16))     # Set legend position

In [None]:
ggplot(data=diamonds, aes(x=carat, y=price)) +  # Initialize plot 
  
  geom_point(aes(size = carat,          # Size points based on carat
                 color = color,         # Color based on diamond color
                 alpha = clarity)) +    # Set transparency based on clarity
                           
  scale_color_manual( values=c("#FFFFFF","#F5FCC2",   # Use manual color values
                               "#E0ED87","#CCDE57", 
                               "#B3C732","#94A813",
                               "#718200")) +
  
  scale_alpha_manual(values = c(0.1,0.15,0.2,         # Use manual alpha values
                                0.3,0.4,0.6,
                                0.8,1)) + 
  
  scale_size_identity() +       # Set size values to the actual values of carat
  
  xlim(0,2.5) +                 # Limit x-axis
  
  theme(panel.background = element_rect(fill = "#7FB2B8")) +   # Change background color
  
  theme(legend.key = element_rect(fill = '#7FB2B8'))    # Change legend background color

## Example

**Purpose** :  Displays the relationship between two quantitative variables, with the bubble size encoding a third.

**Question** : How are patient age and length of stay related? Show the deposit amount as bubble size.

In [None]:
age1 <- data.frame(do.call('rbind', strsplit(as.character(health$Age),'-')))
stay1<-data.frame(do.call('rbind', strsplit(as.character(health$Stay),'-')))
health_df<-cbind(age1, stay1,health$Admission_Deposit,health$'Severity of Illness')
colnames(health_df) <- c("Age_Start","Age_End","Stay_Start","Stay_End","Deposit","Severity")

health_df<-health_df[complete.cases(health_df),]
health_df$Age_Start<-as.numeric(as.character(health_df$Age_Start))
health_df$Age_End<-as.numeric(as.character(health_df$Age_End))
health_df$Stay_Start<-as.numeric(as.character(health_df$Stay_Start))
health_df$Stay_End<-as.numeric(as.character(health_df$Stay_End))

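set.seed(42) # assumed seed so that the sampled ages and stays are reproducible
health_df<-health_df[complete.cases(health_df),]
# Draw one value from each range; mapply keeps the inputs numeric (apply would coerce each row to character)
health_df$age<-mapply(function(lo, hi) lo + sample.int(hi - lo + 1, 1) - 1, health_df$Age_Start, health_df$Age_End)
health_df$stay<-mapply(function(lo, hi) lo + sample.int(hi - lo + 1, 1) - 1, health_df$Stay_Start, health_df$Stay_End)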
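
The range-splitting step above can also be sketched with `tidyr::separate()`. The data frame below is a hypothetical stand-in for the `Age` column (its values are made up for illustration), and taking the midpoint of each range is shown as a deterministic alternative to random sampling, not as the method used in this notebook.

```r
library(tidyr)

# Hypothetical bucket labels in the same "low-high" form as the Age column
toy_age <- data.frame(Age = c("0-10", "11-20", "21-30"))

# separate() splits on "-" and convert = TRUE turns the pieces into integers
toy_split <- separate(toy_age, Age, into = c("Age_Start", "Age_End"),
                      sep = "-", convert = TRUE)

# Midpoint of each bucket: the same value on every run
toy_split$age_mid <- (toy_split$Age_Start + toy_split$Age_End) / 2
toy_split
```

With midpoints the points fall on a few fixed values, so `geom_jitter()` is still useful to reduce overplotting.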
                     

In [None]:
fig(12,8)
ggplot(sample_n(health_df,100), aes(x=age,y=stay)) +
  geom_jitter(aes(size=Deposit),color="blue")+
    labs(x="Age",
         y="Stay", 
       title=" Age vs Stay against Deposits ")+ 
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))


<a id="8"></a>
<font color="steelblue" size=+2.5><b>3.1. Bubble Plot with Color gradient</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** :  Displays the relationship between two quantitative variables, with the bubble size encoding a third and a color gradient encoding a continuous value.

**Question** : How are patient age and length of stay related? Show the deposit amount as bubble size and the number of stay days as a color gradient.

In [None]:
fig(12,8)
ggplot(sample_n(health_df,200), aes(x=age,y=stay)) +
  geom_jitter(aes(size=Deposit,color=stay))+
    labs(x="Age",
         y="Stay", 
       title="Age vs Stay against Deposits")+ 
theme_bw()+
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))+
scale_color_gradient(low = "blue", high = "red")

<a id="9"></a>
<font color="steelblue" size=+2.5><b>3.2. Bubble Color</b></font>

<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

**Purpose** :  Displays the relationship between two quantitative variables, with the bubble size encoding a third and the bubble color encoding a category.

**Question** : How are patient age and length of stay related? Show the deposit amount as bubble size and the severity of illness as bubble color.

In [None]:
fig(12,8)
ggplot(sample_n(health_df,200), aes(x=age,y=stay)) +
  geom_jitter(aes(size=Deposit,color=Severity))+
    labs(x="Age",
         y="Stay", 
       title="Age vs Stay against Severity ")+ 
theme_bw()+ 
theme(plot.title = element_text(size=22),axis.text.x= element_text(size=15),
                            axis.text.y= element_text(size=15), axis.title=element_text(size=18))