In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
data = pd.read_csv("Campus Placement.csv")
data.head()

| Unnamed: 0 | gender | ssc_p | ssc_b | hsc_p | hsc_b | hsc_s | degree_p | degree_t | workex | etest_p | specialisation | mba_p | status | salary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M | 67.0 | Others | 91.0 | Others | Commerce | 58.0 | Sci&Tech | No | 55.0 | Mkt&HR | 58.8 | Placed | 270000.0 |
| 1 | M | 79.33 | Central | 78.33 | Others | Science | 77.48 | Sci&Tech | Yes | 86.5 | Mkt&Fin | 66.28 | Placed | 200000.0 |
| 2 | M | 65.0 | Central | 68.0 | Central | Arts | 64.0 | Comm&Mgmt | No | 75.0 | Mkt&Fin | 57.8 | Placed | 250000.0 |
| 3 | M | 56.0 | Central | 52.0 | Central | Science | 52.0 | Sci&Tech | No | 66.0 | Mkt&HR | 59.43 | Not Placed | |
| 4 | M | 85.8 | Central | 73.6 | Central | Commerce | 73.3 | Comm&Mgmt | No | 96.8 | Mkt&Fin | 55.5 | Placed | 425000.0 |


## Column Description

-   gender
-   ssc_p -> 10th class percentage
-   ssc_b -> 10th class board
-   hsc_p -> 12th class percentage
-   hsc_b -> 12th class board
-   hsc_s -> 12th class branch
-   degree_p -> undergraduate degree percentage
-   degree_t -> degree branch
-   workex -> work experience
-   etest_p -> Entrance test percentage
-   specialisation -> MBA branch
-   mba_p -> MBA percentage
-   status -> placement status
-   salary -> quoted salary

## Univariate Analysis

### Gender

gender
M    139
F     76
- The two categories [Male, Female] are moderately imbalanced: males outnumber females by 63 (139 vs. 76, roughly 65% to 35%).
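
A raw difference is easier to judge next to the percentage share of each category. The sketch below rebuilds the gender column from the two counts reported above; the helper name `category_share` is made up for illustration and is not part of the original notebook.

```python
import pandas as pd

def category_share(series: pd.Series) -> pd.DataFrame:
    """Return the count and percentage share of every category."""
    counts = series.value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_pct": share})

# Rebuild the column from the counts printed above (139 M, 76 F).
gender = pd.Series(["M"] * 139 + ["F"] * 76, name="gender")
summary = category_share(gender)
print(summary)
```

On the real data the same helper could be called as `category_share(data["gender"])`, and likewise for the other categorical columns.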

### ssc_p(10th percentage)

**DIST PLOT**
> - Every student in the data scored above 40 in ssc, which suggests that only students above that mark were admitted to the college.

**MEAN AND MEDIAN PLOTTING**
- According to the above graphs
    - the students are split almost evenly around the mean (about 67), because the median (67.0) nearly coincides with it
> - A large part of the group consists of students with good scores (2nd graph: the area of the region between the median and the 75th percentile is significantly high)

> - For a normal distribution, the center value of the graph (x-axis) is supposed to coincide with the **mean, median, mode and peak**. Here the **mode and peak** are almost at the center, but the **mean and median** have drifted to the right, which means the values are denser on the right side
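
The plotting cell itself is not shown, so the following is only a sketch of how such a chart could be drawn with matplotlib: a histogram with vertical lines for the mean, the median and the center value (the midpoint between minimum and maximum). The function name, the bin count and the red and green colors are assumptions; only the gray center line is mentioned in the text. The five sample values are the `ssc_p` entries from `data.head()`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_location_markers(values, bins=10):
    """Histogram with dashed lines at the mean, median and center value."""
    x = np.asarray(values, dtype=float)
    fig, ax = plt.subplots()
    ax.hist(x, bins=bins, color="lightsteelblue", edgecolor="white")
    markers = {
        "mean": (x.mean(), "red"),
        "median": (np.median(x), "green"),
        "center value": ((x.min() + x.max()) / 2, "gray"),
    }
    for label, (position, color) in markers.items():
        ax.axvline(position, color=color, linestyle="--",
                   label=f"{label}: {position:.2f}")
    ax.legend()
    return ax

# ssc_p values of the five rows shown by data.head()
ax = plot_location_markers([67.0, 79.33, 65.0, 56.0, 85.8])
```

Passing `data["ssc_p"]` instead of the short list would give the full-column version of the chart.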

min:           40.89
max:           89.4
center value:  65.14500000000001
mean:          67.30339534883721
median:        67.0
- The last point of the above cell is verified numerically
    - the mean and median are almost the same, and the center value (65.15) is less than the mean (67.30)
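
The "center value" in these summaries matches the midpoint between the minimum and the maximum, for example (40.89 + 89.4) / 2 = 65.145. A minimal sketch of such a summary follows; `location_summary` is a hypothetical helper, and the sample is limited to the five `ssc_p` values visible in `data.head()`.

```python
import pandas as pd

def location_summary(values: pd.Series) -> dict:
    """Collect the location statistics printed throughout this analysis."""
    lowest, highest = values.min(), values.max()
    return {
        "min": lowest,
        "max": highest,
        "center value": (lowest + highest) / 2,  # midpoint of the range
        "mean": values.mean(),
        "median": values.median(),
    }

# ssc_p values of the five rows shown by data.head()
sample = pd.Series([67.0, 79.33, 65.0, 56.0, 85.8])
stats = location_summary(sample)
for name, value in stats.items():
    print(f"{name + ':':<15}{value}")
```

When the mean and median sit to the right of the center value, as they do for the full `ssc_p` column, most of the scores lie in the upper half of the observed range.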

median:  67.0
mean:  67.30339534883721
- This is an indication that the values are close to a normal distribution
    - because the mean and median are almost the same, which is what a symmetric distribution would show

skewedness =  -0.13
kurtosis =  2.38
#### Norms for normal distribution
- skewness = 0
- kurtosis = 3
#### Our graph result
- As -0.13 < 0, slightly left skewed (the longer tail is on the low side)
- As 2.38 < 3, flatter peak and thinner tails than a normal distribution (platykurtic)
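
The cell that produced these two numbers is not shown. Since the reference value for kurtosis is 3, it presumably used the non-excess (Pearson) definition, which is an assumption here. Note that pandas `Series.kurt()` reports excess kurtosis, where a normal distribution scores 0 instead. The sketch below computes both statistics from central moments with NumPy and encodes the usual reading of their signs; the function names are illustrative.

```python
import numpy as np

def skewness_and_kurtosis(values):
    """Moment-based skewness and Pearson kurtosis (normal distribution = 3)."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # variance (population form)
    m3 = np.mean(deviations ** 3)
    m4 = np.mean(deviations ** 4)
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def interpret(skewness, kurtosis):
    """Skew direction (side of the longer tail) and tail weight vs. normal."""
    direction = "left" if skewness < 0 else "right" if skewness > 0 else "none"
    tails = "thinner" if kurtosis < 3 else "heavier" if kurtosis > 3 else "same"
    return direction, tails

print(interpret(-0.13, 2.38))  # values reported for ssc_p -> ('left', 'thinner')
```

A negative skewness means the longer tail points to the left, a positive one to the right. A kurtosis below 3 means thinner tails than a normal curve, above 3 heavier tails.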

**BOX PLOT**
- As noted earlier, the graph looks close to a normal distribution, and the box plot shows no outliers

### ssc_b(10th board)

ssc_b
Central    116
Others      99
- The two categories are reasonably balanced, because the difference between them is only 17

### hsc_p(12th percentage)

**DISTPLOT**
> - The peak is caused by a sudden spike of values around 60 to 65

**MEAN AND MEDIAN PLOTTING**
- The frequency around the peak seems high, i.e. around 55 to 77
- The standard deviation seems very high, so let's check it numerically

std:  10.89750915750298
- For a normal distribution, about 99.7% of the values lie within 3 standard deviations of the mean, so the graph should practically end there. So, let's calculate mean + 3 × std and compare it with the maximum

99.02569026320663
97.7
- The maximum (97.7) is below mean + 3 × std (99.03), so the values stay within the range expected for a normal distribution
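
The two numbers above are consistent with mean + 3 × std followed by the column maximum. A small sketch of that check, using the mean, standard deviation, minimum and maximum reported for `hsc_p` (the function name is hypothetical):

```python
def three_sigma_range(mean, std, k=3.0):
    """Return the interval [mean - k*std, mean + k*std]."""
    return mean - k * std, mean + k * std

# Mean and standard deviation reported for hsc_p.
lower, upper = three_sigma_range(66.33316279069768, 10.89750915750298)
print(round(upper, 2))  # 99.03

# Observed extremes of hsc_p: min 37.0, max 97.7.
inside = lower <= 37.0 and 97.7 <= upper
print(inside)  # True
```

Both extremes fall inside the interval, which is the numeric version of "the graph ends within three standard deviations".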

median:  65.0
mean:  66.33316279069768
- The mean and median differ slightly (by about 1.3), which hints at a mild departure from a symmetric, normal shape

min:           37.0
max:           97.7
center value:  67.35
mean:          66.33316279069768
median:        65.0

skewedness =  0.16
kurtosis =  3.41
#### Norms for normal distribution
- skewness = 0
- kurtosis = 3
#### Our graph result
- As 0.16 > 0, right skewed (slightly).
- As 3.41 > 3, sharper peak and heavier tails than a normal distribution (leptokurtic).

**BOX PLOT**
- The box plot shows outliers: the dense peak keeps the interquartile range narrow, so scores far from it fall outside the whiskers
    - we shall check the values of the peak and of the outliers

#### First, we'll check the peak
- The limits of the peak are [60, 67]; let's check the table.
> - Around 40% of the records lie between 60 and 67 (a span of only 7 percentage points out of 100).
- This is the reason for the peak.
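
A share like this can be computed directly with `Series.between`, which includes both endpoints by default. The helper name and the five toy scores below are made up for illustration; they are not taken from the dataset.

```python
import pandas as pd

def share_in_interval(values: pd.Series, low: float, high: float) -> float:
    """Percentage of records with low <= value <= high."""
    return values.between(low, high).mean() * 100

# Toy scores, not taken from the dataset.
scores = pd.Series([55.0, 60.0, 62.0, 67.0, 70.0])
share = share_in_interval(scores, 60, 67)
print(share)  # 60.0
```

On the real column the call would be `share_in_interval(data["hsc_p"], 60, 67)`.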

#### Now, we'll check the values of outliers

- The indexes of the outliers are:
> - [ 42, 49, 120, 169, 206, 24, 134, 177]
- Observing this table with respect to the status,
> - Students below the **lower bound are not placed** and those **above the upper bound are placed**.
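
The notebook does not show how the outliers were extracted. A common choice, and the rule behind box-plot whiskers, is 1.5 × IQR beyond the quartiles, so the sketch below assumes that rule. The function name and the toy frame are hypothetical.

```python
import pandas as pd

def outliers_with_status(df, column, status_col="status", k=1.5):
    """Rows outside the k*IQR whiskers, labelled by the side they fall on."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    below = df[df[column] < lower].assign(side="below lower bound")
    above = df[df[column] > upper].assign(side="above upper bound")
    return pd.concat([below, above])[[column, status_col, "side"]]

# Toy frame, not taken from the dataset.
toy = pd.DataFrame({
    "hsc_p": [20.0, 60.0, 62.0, 63.0, 65.0, 66.0, 68.0, 99.0],
    "status": ["Not Placed", "Placed", "Placed", "Not Placed",
               "Placed", "Placed", "Placed", "Placed"],
})
result = outliers_with_status(toy, "hsc_p")
print(result)
```

Putting the side and the placement status in one table makes a pattern like the one described above easy to spot at a glance.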

### hsc_b(12th board)

hsc_b
- Others     131
- Central     84

- The two 12th boards differ by 47 records (131 vs. 84).

### hsc_s(12th branch)

hsc_s
- Commerce    113
- Science      91
- Arts         11

### degree_p(undergraduate degree percentage)

**MEAN AND MEDIAN PLOTTING**

In both graphs, the gray line, which marks the center point of the number line (x-axis), lies to the right of the peak.
- In other words, the bulk of the graph sits to the left of the center, because the flat right tail stretches the range up to the maximum of about 91.
For instance, if the tail wasn't there, the graph would have been symmetric and the center value (gray line) would have been at the peak.

In conclusion, it can be stated that,
> ##### The values at the far end might be outliers (entries made by mistake), or they might belong to students strong enough to genuinely lie outside the box.

Several observations suggest that the graph is close to a normal distribution.
- The mean and median are very close.
- The median almost touches the peak.
- The areas around the 1st quartile region and the 3rd quartile region seem to be equal.
We should also look closely at the sharp peak that the graph produces.

**DISTPLOT**

This graph explains the sharp peak:
- The values around 65 in degree_p are solely responsible for the sharp peak.

Also,

The assumed presence of an outlier can be confirmed with this graph:
- There is a small spike at 90 in degree_p, which is the reason for the long tail.

and finally,

We can also see that the graph starts rising only from a score of 50%.
- This may be because of the pass mark set by the undergraduate university. That is, those who scored below 50% are considered not to have completed their degree yet.

A subtle note here,
- the graph has a small bump both before and after the peak.
- It is possible that these bumps correspond to different groups of students.
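
One way to probe the group idea is to summarise the score per group and see whether the group centers line up with the bumps. Which column defines the groups is not stated, so the sketch below assumes `degree_t`; the helper name is made up. It uses only the five rows visible in `data.head()`.

```python
import pandas as pd

def score_by_group(df, group_col, score_col):
    """Count, mean and median of a score column within each group."""
    return df.groupby(group_col)[score_col].agg(["count", "mean", "median"])

# degree_t and degree_p of the five rows shown by data.head()
head = pd.DataFrame({
    "degree_t": ["Sci&Tech", "Sci&Tech", "Comm&Mgmt", "Sci&Tech", "Comm&Mgmt"],
    "degree_p": [58.0, 77.48, 64.0, 52.0, 73.3],
})
groups = score_by_group(head, "degree_t", "degree_p")
print(groups)
```

Five rows are far too few to draw a conclusion; the point is only the shape of the check, which would be run on the full `data` frame.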

**degree percentage > 85**
> Let's note this record: it is either a credible outlier or a mistaken entry.

- Finally, we get the limits of the peak: [63, 67].
> Around 32% of the records lie between 63 and 67 (a span of only 4 percentage points out of 100).
- This is the reason for the peak
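
How the limits [63, 67] were found is not shown. One possible approach, given here purely as an assumption, is to slide a fixed-width window along the sorted scores and keep the window that holds the most records. The function name and the toy values are illustrative.

```python
import numpy as np

def densest_window(values, width):
    """Find the window [start, start + width] that contains the most values.

    Only windows starting at an observed value are considered.
    Returns (start, end, share of records in percent).
    """
    x = np.sort(np.asarray(values, dtype=float))
    # For every start x[i], count the values <= x[i] + width at or after i.
    right = np.searchsorted(x, x + width, side="right")
    counts = right - np.arange(len(x))
    best = int(np.argmax(counts))
    return x[best], x[best] + width, counts[best] / len(x) * 100

# Toy scores, not taken from the dataset.
low, high, share = densest_window([50, 63, 64, 65, 65, 66, 67, 80, 91], width=4)
print(low, high, round(share, 1))  # 63.0 67.0 66.7
```

The window width is a free choice; a width of 4 mirrors the interval quoted above.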

- Standard Deviation
    - 7.35874328733944

The maximum is 91, while mean + 3 × std is roughly 88.4, so only the far end of the tail lies beyond three standard deviations.
- Apart from that tail, this is consistent with a near-normal distribution
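
Beyond checking the maximum, the same idea can be applied to all three bands of the empirical rule: compare the observed share of records within 1, 2 and 3 standard deviations with the share a normal distribution would give. This is a sketch with a made-up function name. It uses the sample standard deviation (`ddof=1`), which is what pandas `std()` returns by default.

```python
import math
import numpy as np

def sigma_coverage(values, ks=(1, 2, 3)):
    """Observed vs. normal-theory share (in %) within k standard deviations."""
    x = np.asarray(values, dtype=float)
    mean, std = x.mean(), x.std(ddof=1)
    rows = []
    for k in ks:
        observed = np.mean(np.abs(x - mean) <= k * std) * 100
        expected = math.erf(k / math.sqrt(2)) * 100  # P(|Z| <= k) for a normal
        rows.append((k, observed, expected))
    return rows

# Toy values 1..9, not taken from the dataset.
rows = sigma_coverage(range(1, 10))
for k, observed, expected in rows:
    print(f"within {k} std: observed {observed:.1f}%, normal {expected:.2f}%")
```

Large gaps between the observed and expected columns point away from normality more reliably than the position of the maximum alone.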

min:           50.0
max:           91.0
center value:  70.5
mean:          66.37018604651163
median:        66.0
mode:        0    65.0

Two of the earlier statements are confirmed by these numbers:
- The minimum score is 50
- The mean and median are very close

skewedness =  0.24
kurtosis =  3.02
#### Norms for normal distribution
- skewness = 0
- kurtosis = 3
#### Our graph result
- As 0.24 > 0, right skewed
- As 3.02 ≈ 3, the peak and tails are almost those of a normal distribution (marginally sharper peak and heavier tails)

**BOX PLOT**
- As we assumed earlier, the entry (91%) might be an outlier or might belong to a brilliant student

### degree_t(undergraduate degree title)

degree_t
- Comm&Mgmt    145
- Sci&Tech      59
- Others        11

### workex(Work Experience)

workex
- No     141
- Yes     74

### etest_p(Entrance Test)

**MEAN AND MEDIAN PLOTTING**

Looking at the chart, it's obvious that this is not a normal distribution.

- The graph spikes up around 50 and falls off after 95, and
    - from 50 to 95 there is a gradual decline

- We also see small bumps after the peak, which might be caused by some groups.
    - we'll check them later

Fortunately, the mean and median are close to each other.

First graph:
- The standard deviation seems high: 3 times the std, measured from the mean in either direction, goes beyond the graph. This is clearly not a sign of a normal distribution.

Second graph:
- The students below the 2nd quartile (median) are packed into a narrower score range (50 to 71) than the students above the median (71 to 98).

**DIST PLOT**

From here, we can confirm that the entrance test scores strictly start from 50%.

- This might be because the MBA institute admitted students based on a cutoff (50%) in the entrance test.

As for the difficulty of the test compared to the other exams (ssc, hsc, degree) that the students have completed, this test seems to be the easiest among them.

- This is because the graph should have tapered off smoothly towards the end (100), whereas it shows a sudden fall after ~98.
    - That means a lot of students scored more than 80 or 90 percent here, even though their other exam scores followed an almost normal distribution.

To conclude,

> - 50% is the minimum cutoff set for admission to the MBA
> - The entrance test appears to be the easiest among the exams (ssc, hsc, degree) that the students have taken earlier
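
The "easiest exam" reading can be checked by comparing, exam by exam, the share of students who scored above a high threshold. The sketch below uses a threshold of 80, taken from the "more than 80 or 90" remark above, and only the five rows visible in `data.head()`. The function name is hypothetical.

```python
import pandas as pd

def share_above(df, columns, threshold):
    """Percentage of rows above the threshold, for each listed column."""
    return (df[columns] > threshold).mean().mul(100).round(1)

# Score columns of the five rows shown by data.head()
head = pd.DataFrame({
    "ssc_p": [67.0, 79.33, 65.0, 56.0, 85.8],
    "hsc_p": [91.0, 78.33, 68.0, 52.0, 73.6],
    "degree_p": [58.0, 77.48, 64.0, 52.0, 73.3],
    "etest_p": [55.0, 86.5, 75.0, 66.0, 96.8],
    "mba_p": [58.8, 66.28, 57.8, 59.43, 55.5],
})
result = share_above(head, list(head.columns), 80)
print(result)
```

Five rows cannot confirm or refute the claim; running the same call on the full `data` frame would show whether `etest_p` really has the largest share of high scores.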

min:           50.0
max:           98.0
center value:  74.0
mean:          72.10055813953488
median:        71.0
mode:        0    60.0

skewedness =  0.28
kurtosis =  1.91
#### Norms for normal distribution
- skewness = 0
- kurtosis = 3
#### Our graph result
- As 0.28 > 0, right skewed
- As 1.91 << 3, much flatter peak and thinner tails than a normal distribution (strongly platykurtic)

### mba_p(MBA degree percentage)

**MEAN AND MEDIAN PLOTTING**

First of all, the graph is clean and seems close to a normal distribution.
- The center value (gray line) lies well to the right of the peak, which means the bulk of the scores sits on the left with a longer tail to the right (right skew).

Observing both graphs, we can make a general inference

- The graph mostly starts at 50%, rises all the way to the peak at 60%, then declines smoothly and ends at 80%, which can be explained as follows:
    - **The postgraduate university might also have set the pass percentage at 50%**, like the undergraduate degree scores and entrance test scores, and
    - **This exam is expected to be the most difficult exam in a student's academic history**, because the graph ends just before 80%, which means no student has crossed a score of ~80%.

First graph:

- The mean points almost exactly at the peak, which is a positive sign that the graph is close to a normal distribution.
- The standard deviation also looks appropriate for the graph. We'll check it numerically too.

Second graph:

- Both the 2nd and 3rd quartile regions seem to be equal.
- Both the mean and median are close

**DIST PLOT**

The points that were discussed tentatively above can be supported with evidence here:
- The minimum value does in fact start after 50%, and the maximum ends before 80%.

- Standard Deviation
    - 5.833384580683801

The maximum is 77.89, which is below mean + 3 × std (about 79.78), so every score lies within three standard deviations on the upper side.
- This is consistent with a near-normal distribution

min:           51.21
max:           77.89
center value:  64.55
mean:          62.278186046511635
median:        62.0
mode:        0    56.7


- Mean and Median are close.
- Minimum and Maximum are as expected.

skewedness =  0.31
kurtosis =  2.51
#### Norms for normal distribution
- skewness = 0
- kurtosis = 3
#### Our graph result
- As 0.31 > 0, right skewed
- As 2.51 < 3, slightly flatter peak and thinner tails than a normal distribution

### mba_t(MBA degree title)

mba_t
- Mkt&Fin    120
- Mkt&HR      95