# Data Visualization

Data visualization is the presentation of data in pictorial form. It is important for data analysis because it helps in understanding the data, however complex it may be: a large amount of data can be summarized in a simple, easy-to-understand format that communicates information clearly and effectively. Python's data-centric packages provide a rich ecosystem that supports this.

Examples of such Python packages are pandas and seaborn. Pandas makes importing and analyzing data much easier, and seaborn draws statistical plots directly from pandas data structures.

# Pandas

Pandas offers tools for cleaning and processing your data. It is one of the most popular Python libraries for data analysis.

## 1. Let’s start with creating a pandas DataFrame:
In pandas, a data table is called a DataFrame.

In [19]:
# Python code demonstrating how to create a DataFrame

import pandas as pd

# Initialise the data as a dictionary of lists.
data = {'Name of city': ['Vaasa', 'Helsinki', 'Oulu', 'Lapland'],
        'Population': [66960, 1100000, 199526, 178522]}

# Create the DataFrame.
df = pd.DataFrame(data)

# Display the output (display() is available in Jupyter/IPython).
display(df)

## 2. Load a CSV file
You can also load CSV data from disk and display it with pandas.

In [20]:
# import module
import pandas
 
# load the csv
data = pandas.read_csv("../input/gnss-kf-testdata/cys_rtk_wls2fix_hw_clock_error_test11.csv")
 
# show the first 7 rows
data.head(7)
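
The CSV path above exists only in the original notebook environment, so here is a minimal sketch that reads a small in-memory CSV instead. The column names and values are made up for illustration; only `read_csv` and `head` come from the text above.

```python
import io

import pandas as pd

# Hypothetical CSV content, used only so the example runs without a file on disk.
csv_text = """epoch,clock_error
0,0.12
1,0.15
2,0.11
3,0.18
4,0.14
5,0.16
6,0.13
7,0.17
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (5 if n is omitted), not columns.
first_rows = sample.head(3)
print(first_rows)
```

Calling `sample.head()` without an argument would return the first five rows.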

# Seaborn

Seaborn is a Python visualization library for plotting statistical graphics. It is built on top of the matplotlib library and is closely integrated with the data structures from pandas.

## Using seaborn.pairplot()

To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. It draws a matrix of plots showing the relationship for each pairwise combination of the numeric variables in a DataFrame (n choose 2 distinct pairs for n variables); the plots on the diagonal are univariate plots of each variable.

In [21]:
# importing packages
import seaborn
import matplotlib.pyplot as plt
 
# loading dataset using seaborn
df = seaborn.load_dataset('tips')
 
# pairplot with points colored by the 'size' column
seaborn.pairplot(df, hue ='size')
plt.show()
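
As a rough illustration of the "n choose 2" count, the sketch below enumerates the distinct variable pairs for the three numeric columns of the `tips` dataset. The column names are written out by hand (an assumption made so the example runs offline), and the code only counts pairs; it does not draw anything.

```python
from itertools import combinations

# Numeric columns of seaborn's 'tips' dataset, listed by hand for this sketch.
numeric_columns = ["total_bill", "tip", "size"]

# Each unordered pair of distinct variables corresponds to one bivariate relationship.
pairs = list(combinations(numeric_columns, 2))

print(pairs)
print(len(pairs))  # n * (n - 1) / 2 pairs for n variables
```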

## Seaborn: statistical data visualization
Seaborn helps to visualize statistical relationships. Statistical analysis is the process of understanding how variables in a dataset are related to one another and how those relationships depend on other variables. Visualizing the data makes it easier to see trends and identify patterns in the dataset.

The following plot types help with this:

- Line Plot
- Scatter Plot
- Box plot
- Point plot
- Count plot
- Violin plot
- Swarm plot
- Bar plot
- KDE Plot

### A. Line plot:
A line plot is one of the most common ways to draw the relationship between x and y, with the possibility of several semantic groupings.

In [22]:
# import module
import seaborn as sns
import pandas
 
# loading csv
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
# plotting lineplot
sns.lineplot(x="Age", y="Weight", data=data)

### Using the hue parameter for plotting the graph.

In [23]:
# import module
import seaborn as sns
import pandas
 
# read the csv data
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
# plot
sns.lineplot(x="Age", y="Weight", hue="Position", data=data)

### B. Scatter Plot:
A scatter plot draws a two-dimensional graph of one variable against another. It can be used with several semantic groupings, which helps in understanding how continuous or categorical data are related.

In [24]:
# import module
import seaborn
import pandas
 
# load csv
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
# plotting
seaborn.scatterplot(x="Age", y="Weight", data=data)

### Using the hue parameter for plotting the graph.

In [25]:

import seaborn
import pandas
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
seaborn.scatterplot(x="Age", y="Weight", hue="Position", data=data)

### C. Box plot:
A box plot (or box-and-whisker plot) is a visual representation of groups of numerical data through their quartiles, drawn for a single variable or against a categorical variable.

A box plot summarizes five values:

- Minimum
- First Quartile or 25%
- Median (Second Quartile) or 50%
- Third Quartile or 75%
- Maximum
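
To make the five values concrete, here is a minimal sketch that computes them with pandas. The ages are made-up numbers used only for illustration.

```python
import pandas as pd

# Made-up ages, used only to illustrate the calculation.
ages = pd.Series([22, 25, 25, 27, 28, 30, 31, 33, 36])

summary = {
    "minimum": ages.min(),
    "first_quartile": ages.quantile(0.25),
    "median": ages.median(),
    "third_quartile": ages.quantile(0.75),
    "maximum": ages.max(),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that seaborn's box plot by default draws the whiskers only as far as the furthest points within 1.5 times the interquartile range and shows anything beyond that as individual outlier points, so the whisker ends do not always coincide with the minimum and maximum.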

**Draw the box plot with seaborn and pandas:**

**Example 1:**

In [26]:
# import module
import seaborn as sns
import pandas
 
# read csv and plotting
data = pandas.read_csv( "../input/datavisualizationsampledata/nba.csv" )
sns.boxplot(x=data["Age"])

**Example 2:**


In [27]:
# import module
import seaborn as sns
import pandas
 
# read csv and plotting
data = pandas.read_csv( "../input/datavisualizationsampledata/nba.csv" )
sns.boxplot(x="Age", y="Weight", data=data)

### D. Violin Plot:
A violin plot is similar to a box plot. It shows the distribution of quantitative data across several levels of one or more categorical variables so that those distributions can be compared.

**Draw the violin plot with seaborn and pandas:**

**Example 1:**

In [28]:
# import module
import seaborn as sns
import pandas
 
# read csv and plot
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
sns.violinplot(x=data["Age"])

**Example 2:**

In [29]:
# import modules
import seaborn
import pandas
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.violinplot(x ="Age", y ="Weight",data = data)

### E. Swarm plot:
A swarm plot is similar to a strip plot, but the points are adjusted so that they do not overlap. It is drawn against categorical data.

**Draw the swarm plot with seaborn and pandas:**

**Example 1:**

In [30]:
# import modules
import seaborn
import pandas
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.swarmplot(x = data["Age"])

**Example 2:**

In [31]:
# import modules
import seaborn
import pandas
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.swarmplot(x ="Age", y ="Weight",data = data)

### F. Bar plot:
A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars.

**Draw the bar plot with seaborn and pandas:**

**Example 1:**

In [32]:
# import module
import seaborn
import pandas as pd
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pd.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.barplot(x =data["Age"])

**Example 2:**

In [33]:
# import module
import seaborn
import pandas as pd
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pd.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.barplot(x ="Age", y ="Weight", data = data)
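
The numbers behind such a plot can be approximated by hand. The sketch below uses made-up data and computes the per-group mean (the bar height) together with the standard error of the mean as a simple stand-in for the error bar. This is an illustrative substitute: seaborn's default error bar is a bootstrapped 95% confidence interval, which this code does not reproduce.

```python
import pandas as pd

# Made-up data, used only to illustrate the calculation.
players = pd.DataFrame({
    "Age": [22, 22, 22, 25, 25, 25],
    "Weight": [180, 190, 200, 210, 220, 230],
})

# Bar height: mean of the numeric variable within each category.
# Uncertainty: standard error of the mean (a simplification, see above).
stats = players.groupby("Age")["Weight"].agg(["mean", "sem"])
print(stats)
```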

### G. Point plot:
A point plot shows point estimates and confidence intervals using scatter plot glyphs. It represents an estimate of central tendency for a numeric variable by the position of the scatter plot points and provides some indication of the uncertainty around that estimate using error bars.

**Draw the point plot with seaborn and pandas:**

**Example:**

In [34]:
# import module
import seaborn
import pandas as pd
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pd.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.pointplot(x = "Age", y = "Weight", data = data)

### H. Count plot:
A count plot shows the number of observations in each categorical bin using bars.

**Draw the count plot with seaborn and pandas:**

**Example:**

In [35]:
# import module
import seaborn
import pandas as pd
 
seaborn.set(style = 'whitegrid')
 
# read csv and plot
data = pd.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.countplot(x="Age", data=data)
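
The bar heights in a count plot are simply the frequencies of each value. A minimal sketch with made-up ages shows the same numbers computed directly in pandas:

```python
import pandas as pd

# Made-up ages, used only to illustrate the calculation.
ages = pd.Series([22, 25, 22, 30, 25, 22], name="Age")

# One count per distinct value, ordered by the value itself.
counts = ages.value_counts().sort_index()
print(counts)
```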

### I. KDE Plot:
A KDE (kernel density estimate) plot is used for visualizing the probability density of a continuous variable. It depicts the estimated probability density at different values of that variable. Several samples can also be drawn on a single graph, which makes them easier to compare.

**Draw the KDE plot with seaborn and pandas:**

**Example:**

In [36]:
# import module
import seaborn as sns
import pandas as pd
 
# read the first 5 rows
data = pd.read_csv("../input/datavisualizationsampledata/nba.csv").head()

# bivariate KDE of Age against Number
sns.kdeplot(x=data["Age"], y=data["Number"])
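
To show what a kernel density estimate computes, here is a minimal one-dimensional sketch in NumPy. The helper `gaussian_kde_1d`, the sample values, and the fixed bandwidth of 2.0 are all assumptions made for illustration; seaborn selects the bandwidth automatically and this is not its implementation.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average a Gaussian bump centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardised distances between grid points and samples,
    # shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


ages = [22, 25, 25, 27, 30]  # made-up sample
grid = np.linspace(0, 60, 601)
density = gaussian_kde_1d(ages, grid, bandwidth=2.0)

# A density should integrate to roughly 1 over a wide enough range.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual samples, while a larger one gives a smoother curve.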

# Data visualization with seaborn pairplot and pandas

## Example 1: 

In this example, we draw a pairplot from a pandas DataFrame. We load the nba.csv data into a DataFrame and pass it as the argument to pairplot.

In [37]:
# importing packages
import seaborn
import pandas
 
# load the csv
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
# pairplot
seaborn.pairplot(data)

## Example 2:

In this example, we use the hue parameter to color the points by the values of a specific column.

In [38]:
# importing packages
import seaborn
import pandas
 
# load the csv
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
seaborn.pairplot(data.head(), hue = 'Age')

## Example 3:

In this example, we pass a dictionary of keyword arguments to the bivariate plotting function (plot_kws) and choose the kind of plot drawn on the diagonal (diag_kind).

In [39]:
# importing packages
import seaborn
import pandas

# load the csv
data = pandas.read_csv("../input/datavisualizationsampledata/nba.csv")
 
# 'height' replaces the older 'size' parameter of pairplot
seaborn.pairplot(data, hue='Age', diag_kind='kde',
                 plot_kws={'edgecolor': 'k'}, height=4)

# Summary

Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data. These tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions. The concept of using pictures to understand data has been used for centuries. Common types of data visualization are charts, tables, graphs, maps, and dashboards.