## In this notebook we will look at the most commonly used plots in data science, each with a toy case and a practical example

![viz](https://sm.pcmag.com/t/pcmag_in/feature/1/10-free-da/10-free-data-visualization-tools_5un4.1920.jpg)

## We can make a plot easier to read with the following calls

`plt.title("My Title")` adds the title "My Title" to the plot.

`plt.xlabel("Year")` adds the label "Year" to the x-axis.

`plt.ylabel("Population")` adds the label "Population" to the y-axis.

`plt.xticks([1, 2, 3, 4, 5])` sets the tick positions on the x-axis to 1, 2, 3, 4, 5. We can also pass tick labels as a second argument.

For example, `plt.xticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"])` sets the labels 1M, 2M, 3M, 4M, 5M at those positions on the x-axis.

`plt.yticks()` works the same as `plt.xticks()`, but for the y-axis.

`plt.figure(figsize=(12,10))` sets the figure size (width, height) in inches. Call it before plotting.
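As a small sketch of how these calls combine on one figure, the cell below plots a few made-up points. The values in `years` and `population` are illustrative assumptions, not real data.

```python
import matplotlib.pyplot as plt

# Illustrative values only (not real population data)
years = [1, 2, 3, 4, 5]
population = [1.2, 1.9, 3.1, 3.8, 5.0]

# Set the figure size first, then draw and decorate the plot
fig = plt.figure(figsize=(8, 5))
plt.plot(years, population, marker="o")
plt.title("My Title")
plt.xlabel("Year")
plt.ylabel("Population")
plt.xticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"])

# Handle to the current axes, useful for inspecting what was set
ax = plt.gca()
```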


### Importing libraries

In [None]:
# Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')

### Read dataset

In [None]:
mobile = pd.read_csv('../input/mobile-price-classification/train.csv')
titanic = pd.read_csv('../input/c/titanic/train.csv')
house = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')

# Datetime dataset
comp = pd.read_csv('../input/time-series-starter-dataset/Month_Value_1.csv',parse_dates=['Period'])

# Bar Chart

## Toy Example

In [None]:
ds = {'Day':['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], 'Hours':[8,7,11,9,6]}
dx = pd.DataFrame(ds)
sns.barplot(x='Day', y='Hours', data=dx)

`sns.barplot(x='Day', y='Hours', data=dx)`

`x`: column name for the x-axis

`y`: column name for the y-axis

`data`: the dataframe that contains those columns


## Practical Example

In [None]:
sns.barplot(y='Survived', x='Sex', hue='Pclass', data=titanic)

* For both males and females, passengers in Pclass 1 have the highest survival rate

In [None]:
sns.barplot(y='Survived', x='Embarked', hue='Sex', data=titanic)

* For both males and females, passengers who embarked at C have the highest survival rate

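The dark line on top of each bar is an error bar: it shows the confidence interval (95% by default) for the mean that the bar represents. It is not caused by NaN values in the dataset, and it can be ignored for a quick comparison.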
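As a rough sketch of what `sns.barplot` draws: the height of each bar is the mean of `y` within a category, and the line on top is a confidence interval for that mean. The cell below reproduces both by hand on a tiny made-up dataframe. The data and the helper `bootstrap_ci` are hypothetical, and seaborn's internal computation may differ in its details.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above
toy = pd.DataFrame({
    "Sex": ["male", "male", "male", "male", "female", "female", "female", "female"],
    "Survived": [0, 0, 1, 0, 1, 1, 0, 1],
})

# Bar height: the mean of y within each x category
bar_heights = toy.groupby("Sex")["Survived"].mean()

rng = np.random.default_rng(0)

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap interval for the mean (illustrative helper)."""
    values = np.asarray(values)
    # Resample with replacement and record the mean of each resample
    means = [rng.choice(values, size=len(values), replace=True).mean()
             for _ in range(n_boot)]
    tail = (100 - level) / 2
    return np.percentile(means, [tail, 100 - tail])

# One interval per category, analogous to one error bar per bar
intervals = {sex: bootstrap_ci(group["Survived"]) for sex, group in toy.groupby("Sex")}

print(bar_heights)
print(intervals)
```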

# Line Chart

In [None]:
sns.lineplot(x='Day',y='Hours', data=dx)

In [None]:
sns.lineplot(x='Period',y='Sales_quantity', data=comp)

* Sales quantity dips at the start of each year
* Overall, sales quantity increases over time
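One way to make the overall trend easier to see behind a seasonal dip is to smooth the monthly series with a rolling mean. The sketch below uses synthetic monthly data: the column names `Period` and `Sales_quantity` mirror the dataset above, but the values are made up, and the 12-month window is an assumption that suits yearly seasonality.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: upward trend plus a yearly seasonal pattern (made-up values)
periods = pd.date_range("2015-01-01", periods=36, freq="MS")
months = np.arange(36)
sales = 1000 + 20 * months + 150 * np.sin(2 * np.pi * months / 12)
toy = pd.DataFrame({"Period": periods, "Sales_quantity": sales})

# 12-month rolling mean; the first 11 rows are NaN because the window is incomplete
toy["Trend"] = toy["Sales_quantity"].rolling(window=12).mean()

# Plot the raw series and the smoothed trend on the same axes
ax = toy.plot(x="Period", y=["Sales_quantity", "Trend"], figsize=(10, 5))
```

The raw line keeps the seasonal wiggle, while the `Trend` line shows only the long-term direction.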

# Scatterplot

In [None]:
# To get insight from dataset
def missing_values_table(df):
        # Total missing values
        mis_val = df.isnull().sum()
        
        # Percentage of missing values
        mis_val_percent = 100 * df.isnull().sum() / len(df)
        
                
        # Column for dtypes
        dtype = df.dtypes
        
        # Column for Unique values
        num_unique = []
        for col in df.columns:
            num_unique.append(df[col].nunique())
        num_unique = pd.Series(num_unique)
        num_unique.index = df.columns
        
        # Make a table with the results
        mis_val_table = pd.concat([mis_val, mis_val_percent,dtype,num_unique], axis=1)
        
        # Rename the columns
        mis_val_table_ren_columns = mis_val_table.rename(
        columns = {0 : 'Missing Values', 1 : '% of Total Values', 2:'Data Types', 3:'Unique_values'})
        
        # Sort the table by percentage of missing descending
        mis_val_table_ren_columns = mis_val_table_ren_columns.sort_values('% of Total Values', ascending=False).round(1)
        
        # Print some summary information
        print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"      
            "There are " + str(mis_val_table_ren_columns[
            mis_val_table_ren_columns.iloc[:,1] != 0].shape[0]) + 
              " columns that have missing values.")
        
        # Return the dataframe with missing information
        return mis_val_table_ren_columns,num_unique
a, k = missing_values_table(house)
k.index = house.columns

In [None]:
x = np.arange(50)
y = x + np.random.randint(0, 20, 50)

In [None]:
sns.scatterplot(x=x, y=y)

In [None]:
comp.head()

In [None]:
sns.scatterplot(x='Revenue', y='Sales_quantity', data=comp)

* Sales quantity and revenue have a positive relationship
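A relationship seen in a scatterplot can be backed up with a number, for example the Pearson correlation coefficient. The sketch below uses synthetic data: the column names match the dataset above, but the values and the noise level are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic data: sales quantity grows with revenue, plus random noise (made-up values)
revenue = rng.uniform(1e6, 5e6, size=50)
sales_quantity = revenue / 2000 + rng.normal(0, 100, size=50)
toy = pd.DataFrame({"Revenue": revenue, "Sales_quantity": sales_quantity})

# Pearson correlation: close to +1 means a strong positive linear relationship
r = toy["Revenue"].corr(toy["Sales_quantity"])
print(f"Pearson r = {r:.2f}")
```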

# Pie Chart

In [None]:
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.legend()
plt.show() 

In [None]:
titanic.head(5)

In [None]:
# We need the count of each Embarked value, so we group titanic by Embarked
Emtitanic = titanic.groupby('Embarked').size()
Emtitanic

In [None]:
# Use its index as the labels
labels = Emtitanic.index

In [None]:
plt.pie(Emtitanic, labels=labels)
plt.legend()
plt.show() 

In [None]:
ptitanic = titanic.groupby('Pclass').size()
labels = ptitanic.index
plt.pie(ptitanic, labels=labels)
plt.legend()
plt.show() 
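A pie chart is easier to read when each slice shows its percentage. The sketch below uses a small made-up series in place of the `Embarked` column, `value_counts()` as an alternative to `groupby(...).size()` (it returns the same counts, sorted largest first), and the `autopct` argument of `pie` to print the share on each slice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Embarked column
embarked = pd.Series(["S", "S", "S", "S", "S", "S", "C", "C", "C", "Q"], name="Embarked")

# Count each category; missing values are excluded by default
counts = embarked.value_counts()

# The same shares that autopct will print on the slices, in percent
shares = counts / counts.sum() * 100

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(counts, labels=counts.index, autopct="%1.1f%%")
ax.set_title("Share of passengers by port")
```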

# Heat Map

In [None]:
# generating a 2-D 10x10 matrix of random integers
# from 1 to 99 (the upper bound `high` is exclusive)
data = np.random.randint(low=1, high=100, size=(10, 10))
# plotting the heatmap
hm = sns.heatmap(data=data)
plt.show()


In [None]:
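# Create a dataframe that contains the correlation between numeric features
# numeric_only=True skips text columns such as Name and Sex
# (needed on pandas 2.0+; the argument itself requires pandas 1.5+)
corrtitanic = titanic.corr(numeric_only=True)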

# Plot heatmap of correlation map
plt.figure(figsize=(12,10))
sns.heatmap(corrtitanic, annot=True)

* Fare and Pclass are negatively correlated
* SibSp and Parch are positively correlated
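A correlation matrix is symmetric, so half of the heatmap repeats the other half. A common way to reduce the clutter is to mask the cells above the diagonal. The sketch below does this on a made-up dataframe; the column names `a` to `d` and the data are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)

# Made-up numeric data; column b is built to correlate with column a
toy = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
toy["b"] = toy["a"] * 0.8 + toy["b"] * 0.2

corr = toy.corr()

# True marks a cell to hide: everything strictly above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

plt.figure(figsize=(6, 5))
# Fixing vmin/vmax to -1 and 1 keeps the colour scale centred on zero
ax = sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
                 cmap="coolwarm", vmin=-1, vmax=1)
```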

# Histogram

In [None]:
x = np.random.normal(size=60)
sns.histplot(x,bins=6)

In [None]:
sns.histplot(comp.Sales_quantity, bins=15)
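To see what a histogram actually counts, it can help to compute the bins directly. The sketch below uses `np.histogram` on random data with 6 equal-width bins. It is meant as an illustration of the binning idea and is not guaranteed to match the bin edges seaborn picks in every case.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=60)

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(x, bins=6)

# Print each bin; the last bin also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:6.2f}, {right:6.2f}): {n}")
```

Six bins need seven edges, and the counts always add up to the number of values.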

# Box Plot

In [None]:
# Creating dataset
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]
fig = plt.figure(figsize =(12, 7))
plt.boxplot(data)
# show plot
plt.show()


In [None]:
plt.boxplot(house.MSSubClass)
plt.show()

* We have a few outliers in MSSubClass

In [None]:
mobile.head()

In [None]:
# Plot multiple boxplots together
# Create a list of features
features = [mobile.n_cores, mobile.fc]
plt.boxplot(features)
plt.show()

* The fc feature has some outliers
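The points drawn beyond the whiskers follow a simple rule. With matplotlib's default setting (`whis=1.5`), a value is shown as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies that rule by hand to a small made-up array.

```python
import numpy as np

# Made-up values with one obvious extreme point
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

# First and third quartiles, and the interquartile range
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences: anything outside them is treated as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)
```

The same filter can be applied to a dataframe column, for example `mobile.fc`, to list the outlying rows instead of only seeing them as dots.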

# Tree Map

In [None]:
import squarify # pip install squarify
sizes = [50, 25, 12, 6]
squarify.plot(sizes)
plt.show()

In [None]:
plt.figure(figsize=(14,12))
# Count the houses in each Neighborhood
nhouse = house.groupby('Neighborhood').size()
labels = nhouse.index
squarify.plot(nhouse, label=labels)
plt.show() 

* We can compare the relative size of each Neighborhood: a larger rectangle means more houses

## I hope this is helpful 😃