# Data Visualization using Python



Data visualization is a field in data analysis that deals with visual representation of data. It graphically plots data and is an effective way to communicate inferences from data.

Using data visualization, we can get a visual summary of our data. With pictures, maps and graphs, the human mind has an easier time processing and understanding any given data. Data visualization plays a significant role in the representation of both small and large data sets, but it is especially useful when we have large data sets, in which it is impossible to see all of our data, let alone process and understand it manually.


Python offers several plotting libraries, including Matplotlib and Seaborn, each with different features for creating informative, customized, and appealing plots that present data simply and effectively.

#### Using Colab

## Basic plotting in Matplotlib

In [None]:
# Import Matplotlib for 2D visualization
import matplotlib.pyplot as plt


In [None]:
import seaborn as sns
#plt.style.use('seaborn-whitegrid')

import numpy as np
import pandas as pd


### Histograms

In [None]:
data = np.random.randn(1000)


In [None]:
data

In [None]:
plt.hist(data);

The hist() function has many options to tune both the calculation and the display.

In [None]:
var=dict(bins=15,
         alpha=0.6,
         histtype='bar',
         color='black',
         edgecolor='red')

In [None]:
plt.hist(data, **var);
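
The calculation half of `hist()` can also be run on its own. As a minimal sketch, assuming NumPy's `np.histogram` with the same `bins=15` setting used above, the following computes the bin counts and edges without drawing anything. The names `sample`, `counts`, and `edges` are illustrative and not part of the original notebook.

```python
import numpy as np

# Illustrative sample; seeded so the result is reproducible
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# np.histogram returns the per-bin counts and the bin edges
counts, edges = np.histogram(sample, bins=15)

print(counts.sum())   # every point falls in exactly one bin
print(len(edges))     # one more edge than there are bins
```

Inspecting the counts this way can help when choosing a bin number before settling on the display options.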

Comparing histograms of several distributions

In [None]:
x1 = np.random.normal(0, 0.8, 1000)
x2 = np.random.normal(-2, 1, 1000)
x3 = np.random.normal(3, 2, 1000)

In [None]:
kwargs = dict(histtype='stepfilled',
              alpha=0.5,
              density=True,
              bins=20)

plt.hist(x1, **kwargs)
plt.hist(x2, **kwargs)
plt.hist(x3, **kwargs);
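
With `density=True`, bar heights are rescaled so that the total area of each histogram is 1, which puts distributions with different spreads on a comparable vertical scale. The sketch below checks that property numerically with `np.histogram`; the sample `wide` is a stand-in chosen for illustration, not one of the arrays above.

```python
import numpy as np

rng = np.random.default_rng(1)
wide = rng.normal(3, 2, 1000)   # illustrative sample, similar in spirit to x3

# density=True rescales counts so that sum(height * bin_width) equals 1
heights, edges = np.histogram(wide, bins=20, density=True)
area = np.sum(heights * np.diff(edges))

print(round(area, 6))
```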

## Visualizing some datasets with seaborn

#### Iris dataset

In [None]:
import sklearn
from sklearn import datasets

In [None]:
# Load the iris dataset
iris = datasets.load_iris()

In [None]:
# Load it to a pandas dataframe:
iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['label'])

iris_df['label_names'] = iris_df['label'].apply(lambda x: iris.target_names[int(x)])



In [None]:
iris_df.head(3)

In [None]:
iris_df.tail()

In [None]:
import seaborn as sns
df=sns.load_dataset('iris')
df.head()

In [None]:
# Let's have a look at what the data looks like
display(iris_df)

In [None]:
iris_df.shape

In [None]:
iris_df.head()

In [None]:
iris_df.tail()

In [None]:
iris_df.shape

In [None]:
iris_df.info()

In [None]:
iris_df.head(3)

In [None]:
plt.scatter(iris_df['petal length (cm)'], iris_df['sepal length (cm)'])

In [None]:
# Visualize the data using Matplotlib
import matplotlib.pyplot as plt

colours = ['orange', 'blue', 'green']
species = iris.target_names

f = plt.figure(figsize=(12,6))

for i in range(0, 3):
    species_df = iris_df[iris_df['label'] == i]
    plt.scatter(
        species_df['petal length (cm)'],
        species_df['sepal length (cm)'],
        color=colours[i],
        alpha=0.5,
        label=species[i]
    )
plt.xlabel('petal length (cm)')
plt.ylabel('sepal length (cm)')
plt.title('Iris dataset: petal length vs sepal length')
plt.legend(loc='upper left')
plt.show()

In [None]:
iris_df.head()

In [None]:
# Visualize the data using Seaborn
# Matrix scatter plot.

import seaborn as sns

sns.set_style("darkgrid")
sns.pairplot(iris_df[iris.feature_names + ['label_names']], hue="label_names",height=2);

#### Cars dataset

## Data Preparation and Cleaning

Upload the dataset *Car_data.csv* before running the following code snippets.

#### Import all dependencies  

In [None]:
%matplotlib inline
sns.set(color_codes=True)

In [None]:
# load dataset from directory
cardf = pd.read_csv("Car_data.csv")


In [None]:
cardf

In [None]:
# display the top 5 rows
cardf.head()

In [None]:
cardf.tail(5)

In [None]:
cardf.shape

#### Check the data type of every column in the dataset

In [None]:
cardf.dtypes

In [None]:
cardf.info()

#### Checking the number of rows and columns present

In [None]:
cardf.shape

In [None]:
cardf.describe()   # summary statistics for the numeric columns: count, mean, std, min, quartiles, max

In [None]:
cardf.Volume.mean()

#### Remove duplicate rows, if any, and verify using the shape attribute

In [None]:
# Reassign to cardf so that the de-duplicated data is used in the following cells
cardf = cardf.drop_duplicates()

In [None]:
cardf.shape
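
As a small sketch of how duplicate removal behaves, the toy table below (hypothetical data, not the car dataset) contains one repeated row. `duplicated()` flags rows that are identical to an earlier row, and `drop_duplicates()` returns a new DataFrame rather than modifying the original.

```python
import pandas as pd

# Hypothetical toy table, not the car dataset
toy = pd.DataFrame({
    "Make": ["BMW", "BMW", "Audi", "Audi"],
    "Year": [2011, 2011, 2015, 2016],
})

n_dupes = toy.duplicated().sum()   # rows identical to an earlier row
deduped = toy.drop_duplicates()    # returns a new DataFrame

print(n_dupes)
print(toy.shape, deduped.shape)
```

Because the original is left untouched, the result has to be assigned to a name for the removal to take effect in later cells.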

In [None]:
cardf.count()

In [None]:
cardf.shape

In [None]:
cardf.head(5)

#### Rename long column names to shorter ones

In [None]:
cardf.head(5)

In [None]:
cardf = cardf.rename(columns={"Engine Fuel Type":"Fuel Type","Engine HP": "HP", "Engine Cylinders": "Cylinders", "Transmission Type": "Transmission", "Driven_Wheels":
                              "Drive Mode","highway MPG": "MPG-H", "city mpg": "MPG-C", "MSRP": "Price" })
cardf.head(10)

#### Remove rows with null (missing) values from the dataset

In [None]:
cardf.info()

In [None]:
cardf.shape

In [None]:
cardf.count()

In [None]:
print(cardf.isnull().sum())

In [None]:
cardf = cardf.dropna()
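
To see what `dropna()` does row by row, here is a minimal sketch on a made-up table (the values are illustrative only). By default a row is dropped if any of its columns is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table with two missing values
toy_na = pd.DataFrame({
    "HP": [335.0, np.nan, 300.0],
    "Cylinders": [6.0, 6.0, np.nan],
    "Price": [46135, 40650, 36350],
})

missing_per_column = toy_na.isnull().sum()   # count of nulls in each column
complete = toy_na.dropna()                   # keeps only fully populated rows

print(missing_per_column.to_dict())
print(len(toy_na), len(complete))
```

If only some columns matter for the analysis, the `subset` argument of `dropna()` restricts the check to those columns and usually discards fewer rows.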


In [None]:
cardf.count()  # Now every field has a value, so all column counts are the same.

In [None]:
cardf.info()

In [None]:
import matplotlib
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
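
Assigning to `matplotlib.rcParams` changes the defaults for every later figure in the session. As a hedged alternative, the sketch below uses `plt.rc_context` to apply the same kind of settings to a single figure only; the plotted values are placeholders.

```python
import matplotlib
import matplotlib.pyplot as plt

# Remember the current default so we can confirm it is restored
default_size = tuple(matplotlib.rcParams['figure.figsize'])

# Settings inside the with-block are reverted when the block exits
with plt.rc_context({'font.size': 14, 'figure.figsize': (9, 5)}):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])   # placeholder data

print(list(fig.get_size_inches()))
print(tuple(matplotlib.rcParams['figure.figsize']) == default_size)
```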

## Exploratory Analysis and Visualization

In this part we explore the data for insights. In the previous part we cleaned the data; the cleaned data now needs to be analysed to gather useful information such as the mean, the median, and the relationships between variables.


Find the top 5 companies by number of cars in the dataset and visualize them using a bar plot.

In [None]:
cardf.Make.value_counts().nlargest(5).plot(kind='bar', figsize=(10,5))
plt.title("Number of cars by Company (Make)")
plt.ylabel('Number of cars')
plt.xlabel('Company (Make)');
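
The counting step behind this bar plot can be checked on a small example. The sketch below uses a made-up Series of makes: `value_counts()` tallies each distinct value and `nlargest(n)` keeps the `n` most frequent.

```python
import pandas as pd

# Hypothetical list of makes, for illustration only
makes = pd.Series(["Ford", "BMW", "Ford", "Audi", "Ford", "BMW"], name="Make")

# Tally each make, then keep the two most frequent
top2 = makes.value_counts().nlargest(2)

print(top2.to_dict())
```

Since `value_counts()` already sorts in descending order, `.head(n)` would select the same rows here.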

In [None]:
plt.figure(figsize=(16,8))
plt1 = cardf.Make.value_counts().plot(kind='bar')
plt.title('Companies Histogram')
plt1.set(xlabel = 'Make', ylabel='Number of cars')

In [None]:
cardf.head()

Explore the relationship between engine power (HP) and the price of the cars

In [None]:
plt.figure(figsize=(20,8))
plt.subplot(1,2,1)
plt.title('Car Price vs Engine HP')
sns.regplot(x='HP',y='Price',data=cardf)

In [None]:
plt.figure(figsize=(30,12))
plt.title('Car Price vs Highway MPG')
sns.regplot(x='MPG-H',y='Price',data=cardf)
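
A regression plot shows the trend visually; the strength of that trend can also be summarized with a number. The following is a minimal sketch on synthetic data (the `HP` and `Price` values are generated, not taken from the car dataset) that computes the Pearson correlation and the slope of a least-squares line, which is the kind of linear fit `regplot` draws by default.

```python
import numpy as np
import pandas as pd

# Synthetic data: price rises roughly linearly with horsepower, plus noise
rng = np.random.default_rng(42)
hp = rng.uniform(100, 500, 200)
price = 100 * hp + rng.normal(0, 5000, 200)
toy_cars = pd.DataFrame({"HP": hp, "Price": price})

# Pearson correlation between the two columns
r = toy_cars["HP"].corr(toy_cars["Price"])

# Slope and intercept of the least-squares line
slope, intercept = np.polyfit(hp, price, deg=1)

print(round(r, 2), round(slope, 1))
```

A correlation near 1 or -1 indicates a strong linear relationship, while a value near 0 suggests the fitted line explains little of the variation.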

Explore the dataset with respect to Company name, Vehicle size and Price of the car

Visualize price by make for the first 100 rows of the dataset

In [None]:
# Take the first 100 rows (iloc[1:100] would skip row 0 and return only 99 rows)
df=cardf.iloc[:100]

In [None]:
df4=cardf.iloc[100:]

In [None]:
df4.head()

In [None]:
plt.figure(figsize=(10,5))
sns.boxplot(x='Make',y='Price',hue='Vehicle Size',data=df)

#### Describe the relationship between Price and Vehicle Size

In [None]:
sns.boxplot(x='Vehicle Size',y="Price",data=df) # box plot of price for each vehicle size
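
Each box in a box plot spans the first to the third quartile, with a line at the median. As a sketch of the numbers behind the picture, the example below computes those quartiles per group on a made-up table; the prices are illustrative and not from the car dataset.

```python
import pandas as pd

# Hypothetical prices for two vehicle sizes
toy_sizes = pd.DataFrame({
    "Vehicle Size": ["Compact"] * 5 + ["Large"] * 5,
    "Price": [10, 20, 30, 40, 50, 30, 40, 50, 60, 70],
})

# One row per group, one column per quantile
quartiles = (
    toy_sizes.groupby("Vehicle Size")["Price"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Interquartile range: the height of the box
quartiles["IQR"] = quartiles[0.75] - quartiles[0.25]

print(quartiles)
```

Comparing medians and interquartile ranges across groups gives the same reading as comparing the boxes, but in a form that can be reported in a table.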

We plot multiple graphs from the dataset to explore the relationship between fuel type, price, and average price.

In [None]:
plt.figure(figsize=(20,8))

plt.subplot(1,2,1)
plt.title('Fuel Type Histogram')
sns.countplot(y=cardf['Fuel Type'], palette=("Blues_d"))

plt.subplot(1,2,2)
plt.title('Fuel Type vs Price')
sns.boxplot(x=df['Fuel Type'], y=df.Price, palette=("PuBuGn"))

plt.show()

df2 = pd.DataFrame(df.groupby(['Fuel Type'])['Price'].mean().sort_values(ascending = False))
df2.plot.bar(figsize=(8,6))
plt.title('Fuel Type vs Average Price')
plt.show()

#### Derive a relationship between the price of car and number of doors present in the car.

In [None]:
plt.subplot(1,2,1)
plt.title('Door Number Histogram')
sns.countplot(x=df['Number of Doors'], palette=("plasma"))  # pass the column as x; recent seaborn versions treat the first positional argument as data

plt.subplot(1,2,2)
plt.title('Door Number vs Price')
sns.boxplot(x=df['Number of Doors'], y=df.Price, palette=("plasma"))
plt.tight_layout(pad=2)

plt.show()