# Exploratory Data Analysis

The Pandas library is imported to load the CSV file into a dataframe and to manipulate that dataframe for the analysis.
The Pyplot submodule of the Matplotlib library is imported for creating simple plots.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

Here the `read_csv` function of the Pandas library reads the dataset from a CSV file into a pandas dataframe, which is stored in the variable `grd`.

In [2]:
grd = pd.read_csv('../data/graphene_data_final.csv')

A glimpse of the imported dataset is shown below.

In [3]:
grd

Unnamed: 0,Graphene_percentage,FEED,RPM,DOC,MRR_gm_per_sec,Ra,Unnamed: 6
0,0.0,100,1000,0.10,0.012697,0.821300,
1,0.0,100,1000,0.15,0.020327,1.065450,
2,0.0,100,1000,0.20,0.031002,0.751600,
3,0.0,100,2000,0.10,0.012720,1.520600,
4,0.0,100,2000,0.15,0.019914,1.502150,
...,...,...,...,...,...,...,...
130,3.0,200,2000,0.15,0.041111,1.747625,
131,3.0,200,2000,0.20,0.050602,1.736600,
132,3.0,200,3000,0.10,0.021910,3.272950,
133,3.0,200,3000,0.15,0.038930,3.283325,
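
The preview contains a column named `Unnamed: 6` with no visible values, which typically appears when each line of the CSV file ends with a trailing comma. A minimal sketch of removing such columns is given below. It assumes the column is entirely empty, and it uses a small hand-made dataframe (`sample`, a hypothetical stand-in built from the first three rows above) instead of the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `grd`: the first three rows shown above,
# plus an all-NaN column like `Unnamed: 6`.
sample = pd.DataFrame({
    'Graphene_percentage': [0.0, 0.0, 0.0],
    'FEED': [100, 100, 100],
    'RPM': [1000, 1000, 1000],
    'DOC': [0.10, 0.15, 0.20],
    'MRR_gm_per_sec': [0.012697, 0.020327, 0.031002],
    'Ra': [0.821300, 1.065450, 0.751600],
    'Unnamed: 6': [np.nan, np.nan, np.nan],
})

# Drop every column whose values are all missing.
cleaned = sample.dropna(axis=1, how='all')
print(list(cleaned.columns))
```

Applying the same `dropna(axis=1, how='all')` call to `grd` would only remove `Unnamed: 6` if that column really holds no values.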


The table below lists the variable names in the dataset and their corresponding full names.

  Variable name in dataset   | Full name of the variable
  ---------|----------------
  Graphene_percentage | Percentage of Graphene     
  FEED | Feed value of the machine
  RPM | Rotations per Minute
  DOC | Depth of Cut
  MRR_gm_per_sec | Material Removal Rate (in grams per second)
  Ra | Surface Roughness

The following cell shows the pairwise correlation coefficients (Pearson, the default of `corr`) among all the variables in the dataset.

In [4]:
grd.corr()

Unnamed: 0,Graphene_percentage,FEED,RPM,DOC,MRR_gm_per_sec,Ra,Unnamed: 6
Graphene_percentage,1.0,-7.684898e-16,-2.547268e-16,1.613178e-16,-0.040645,0.015693,
FEED,-7.684898e-16,1.0,-4.203887e-17,-5.329071000000001e-17,0.590197,0.127211,
RPM,-2.547268e-16,-4.203887e-17,1.0,-2.0526790000000003e-17,0.022629,0.846449,
DOC,1.613178e-16,-5.329071000000001e-17,-2.0526790000000003e-17,1.0,0.716658,-0.015264,
MRR_gm_per_sec,-0.04064458,0.5901971,0.02262917,0.7166582,1.0,0.113602,
Ra,0.01569263,0.1272111,0.8464493,-0.01526407,0.113602,1.0,
Unnamed: 6,,,,,,,
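
To make clear what each entry of this table measures, the sketch below computes the Pearson coefficient by hand and compares it with `Series.corr`. It uses only the first five `DOC` and `MRR_gm_per_sec` values from the preview above, so the number it prints is for illustration and will not match the full-dataset value in the table. The names `doc`, `mrr`, `r_manual` and `r_pandas` are hypothetical.

```python
import numpy as np
import pandas as pd

# First five DOC and MRR_gm_per_sec values from the dataset preview
# (illustration only; not the whole dataset).
doc = pd.Series([0.10, 0.15, 0.20, 0.10, 0.15])
mrr = pd.Series([0.012697, 0.020327, 0.031002, 0.012720, 0.019914])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = doc - doc.mean()
dy = mrr - mrr.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same quantity with its default method.
r_pandas = doc.corr(mrr)
print(r_manual, r_pandas)
```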


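The cell below keeps only the correlation coefficients that are greater than or equal to 0.1 and sets the rest to 0. Because the comparison uses the signed value, negative coefficients are also set to 0.
The threshold of 0.1 is a simple cutoff that hides negligible correlations; it is not a test of statistical significance.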

In [5]:
grd.corr()*(grd.corr()>=0.1).astype('float')

Unnamed: 0,Graphene_percentage,FEED,RPM,DOC,MRR_gm_per_sec,Ra,Unnamed: 6
Graphene_percentage,1.0,-0.0,-0.0,0.0,-0.0,0.0,
FEED,-0.0,1.0,-0.0,-0.0,0.590197,0.127211,
RPM,-0.0,-0.0,1.0,-0.0,0.0,0.846449,
DOC,0.0,-0.0,-0.0,1.0,0.716658,-0.0,
MRR_gm_per_sec,-0.0,0.590197,0.0,0.716658,1.0,0.113602,
Ra,0.0,0.127211,0.846449,-0.0,0.113602,1.0,
Unnamed: 6,,,,,,,
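
The expression above compares signed values, so a strong negative correlation would also be zeroed out. That makes no difference for this dataset, where all negative coefficients are close to 0, but a variant that thresholds on the absolute value may be safer in general. The sketch below contrasts the two on a toy matrix; the matrix `toy` and the labels `a`, `b`, `c` are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Toy correlation matrix (hypothetical values) with one strong
# negative coefficient between `a` and `b`.
labels = ['a', 'b', 'c']
toy = pd.DataFrame(
    [[1.0, -0.8, 0.05],
     [-0.8, 1.0, 0.3],
     [0.05, 0.3, 1.0]],
    index=labels, columns=labels)

# Signed threshold, as in the cell above: -0.8 is lost.
signed = toy * (toy >= 0.1).astype('float')

# Absolute-value threshold: entries with |r| < 0.1 become 0,
# strong negative coefficients are kept.
by_abs = toy.where(toy.abs() >= 0.1, 0.0)

print(signed)
print(by_abs)
```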


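It can be seen that the `RPM` and `Ra` pair has the highest correlation (0.85), followed by the `DOC` and `MRR_gm_per_sec` pair (0.72) and the `FEED` and `MRR_gm_per_sec` pair (0.59). The `FEED` and `Ra` pair (0.13) and the `MRR_gm_per_sec` and `Ra` pair (0.11) are only weakly correlated.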

The correlation tables also show that **graphene percentage** has almost no linear relationship with either **material removal rate** (-0.04)
or **surface roughness** (0.02).
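
Reading the ranking of pairs off a matrix by eye is error-prone, so a sketch of sorting the pairs by absolute correlation is given below. It is not part of the original notebook. The matrix `corr` is typed in by hand from the rounded values printed above (with the input-to-input coefficients, which are of order 1e-16, entered as 0.0), and the names `upper_mask`, `pairs` and `ranked` are hypothetical.

```python
import numpy as np
import pandas as pd

# Rounded coefficients copied from the correlation output above.
cols = ['FEED', 'RPM', 'DOC', 'MRR_gm_per_sec', 'Ra']
corr = pd.DataFrame(
    [[1.0, 0.0, 0.0, 0.590197, 0.127211],
     [0.0, 1.0, 0.0, 0.022629, 0.846449],
     [0.0, 0.0, 1.0, 0.716658, -0.015264],
     [0.590197, 0.022629, 0.716658, 1.0, 0.113602],
     [0.127211, 0.846449, -0.015264, 0.113602, 1.0]],
    index=cols, columns=cols)

# Keep only the entries above the diagonal so each pair appears once.
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper_mask).stack().dropna()

# Order the pairs by the magnitude of their correlation.
order = pairs.abs().sort_values(ascending=False).index
ranked = pairs.reindex(order)
print(ranked.head(3))
```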

In [6]:
grd.columns[:4]

Index(['Graphene_percentage', 'FEED', 'RPM', 'DOC'], dtype='object')

Using the above 4 variables as inputs, **material removal rate** and **surface roughness** will be predicted by machine learning models in the next notebooks.
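
As a rough sketch of how the inputs and targets could be separated for that purpose, the snippet below splits a small stand-in dataframe into a feature table and a target table. The names `sample`, `feature_cols`, `target_cols`, `X` and `y` are hypothetical, and the next notebooks may organise this step differently.

```python
import pandas as pd

# Hypothetical stand-in for `grd`: the first three rows of the preview.
sample = pd.DataFrame({
    'Graphene_percentage': [0.0, 0.0, 0.0],
    'FEED': [100, 100, 100],
    'RPM': [1000, 1000, 1000],
    'DOC': [0.10, 0.15, 0.20],
    'MRR_gm_per_sec': [0.012697, 0.020327, 0.031002],
    'Ra': [0.821300, 1.065450, 0.751600],
})

# The first four columns are the inputs; the last two are the targets.
feature_cols = list(sample.columns[:4])
target_cols = ['MRR_gm_per_sec', 'Ra']

X = sample[feature_cols]
y = sample[target_cols]
print(X.shape, y.shape)
```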