# An analysis of the 2015 NET December results


The data comes from [this PDF](http://cbsenet.nic.in/CMS/Handler/FileHandler.ashx?i=File&ii=7&iii=Y), titled **Complete Result of candidates qualified UGC NET December 2015**.
To get started we import some tools.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
%matplotlib inline

# Seeing the Data
The data is in PDF form, so some cleaning is required to get it into CSV format.
After that is done, we load it into the notebook and see which columns need a bit of change.

In [None]:
df = pd.read_csv('net.csv')
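# Serial and roll numbers carry no information for this analysis
df.drop(columns=['SNo', 'RollNo'], inplace=True)
# Sex becomes a boolean: True for 'M', False otherwise
df.Sex = df.Sex == 'M'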

# Marks as a fraction of the maximum for each paper (multiply by 100 for percentages)
df['P1p'] = df.P1 / 100
df['P2p'] = df.P2 / 100
df['P3p'] = df.P3 / 150
df['Gtp'] = df.Gtotal / 350

df.info()
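
As a quick sanity check on the scaling above, here is a minimal sketch on a tiny made-up frame. The marks are invented for illustration, the maximum marks are simply the divisors used in the cell above, and treating `Gtotal` as the sum of the three papers is an assumption rather than something the PDF is known to state.

```python
import pandas as pd

# Invented marks for illustration only; not taken from net.csv.
toy = pd.DataFrame({
    'P1': [60, 72],
    'P2': [58, 80],
    'P3': [90, 120],
})
toy['Gtotal'] = toy.P1 + toy.P2 + toy.P3  # assumption: total is the sum of the papers

# Maximum marks implied by the divisors used in the cell above.
max_marks = {'P1': 100, 'P2': 100, 'P3': 150, 'Gtotal': 350}
short = {'P1': 'P1p', 'P2': 'P2p', 'P3': 'P3p', 'Gtotal': 'Gtp'}

for col, top in max_marks.items():
    toy[short[col]] = toy[col] / top

# Each scaled column is a fraction in [0, 1]; multiply by 100 for a percentage.
scaled = toy[['P1p', 'P2p', 'P3p', 'Gtp']]
print(scaled)
```

If any scaled value in the real data fell outside 0 to 1, that would suggest a wrong divisor or a cleaning error in the CSV.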

In [None]:
plt.figure(figsize=(15, 5))
plt.subplot(121)
df.Sex.hist()
plt.subplot(122)
sns.heatmap(pd.crosstab(df.Sex, df.Status).apply(lambda x:x/x.sum(), axis=1), annot=True)

- There are more males than females among the listed candidates.
- Females have a slightly higher success rate.
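
The heatmap shows row-wise shares, so each row sums to 1. A minimal sketch of the same computation on invented rows is below; it also shows that `pd.crosstab` can normalise rows itself through `normalize='index'`. The `Status` labels and the toy rows are assumptions, and `Sex` is kept as `'M'`/`'F'` here for readability, whereas the notebook stores it as a boolean.

```python
import pandas as pd

# Invented rows for illustration; the Status labels are assumptions.
toy = pd.DataFrame({
    'Sex': ['M', 'M', 'M', 'F', 'F'],
    'Status': ['NET', 'NET', 'JRF', 'NET', 'JRF'],
})

# Row-wise shares, computed the same way as in the heatmap cell above.
manual = pd.crosstab(toy.Sex, toy.Status).apply(lambda x: x / x.sum(), axis=1)

# The same table via crosstab's built-in row normalisation.
builtin = pd.crosstab(toy.Sex, toy.Status, normalize='index')

print(builtin)
```

Both tables hold the share of each `Status` within a sex, which is what makes the two rows comparable even though the groups differ in size.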

In [None]:
mapp = {'GEN': 0,
        'OBC': 1,
        'SC': 2,
        'ST': 3,
       }
plt.figure(figsize=(15, 5))
plt.subplot(121)
df.Caty.map(mapp).hist()
plt.subplot(122)
sns.heatmap(pd.crosstab(df.Caty, df.Status).apply(lambda x:x/x.sum(), axis=1), annot=True)

- The OBC category has the highest number of candidates.
- It also has the lowest JRF allocation rate.
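
The histogram over integer-coded categories is effectively a count per category. A labelled table may be easier to read; the sketch below builds one with `value_counts` on invented labels. The toy values are assumptions and say nothing about the real counts.

```python
import pandas as pd

# Invented category labels for illustration only.
caty = pd.Series(['GEN', 'OBC', 'OBC', 'SC', 'GEN', 'OBC', 'ST'], name='Caty')

# Number of candidates per category, in a fixed display order.
order = ['GEN', 'OBC', 'SC', 'ST']
counts = caty.value_counts().reindex(order, fill_value=0)
print(counts)

# Share of all candidates in each category.
shares = counts / counts.sum()
print(shares.round(3))
```

On the notebook's frame the same idea would start from `df.Caty`, and `counts.plot.bar()` would draw the counts with the category names on the axis.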

In [None]:
def plot_marks_by_x(df, x):
    """Overlay histograms of the four score columns, one colour per value of column x."""
    plt.figure(figsize=(15, 10))
    uq = sorted(df[x].unique())
    colors = ['red', 'green', 'blue', 'yellow', 'black', 'white']
    # The printed value-to-colour pairs serve as the legend
    print(list(zip(uq, colors)))
    for i, c in enumerate(['P1p', 'P2p', 'P3p', 'Gtp']):
        plt.subplot(2, 2, i + 1)
        for sb, col in zip(uq, colors):
            df.loc[df[x] == sb, c].hist(alpha=0.5, color=col)
        plt.title(c)

plot_marks_by_x(df, 'Caty')

- Marks are distributed in much the same way across all categories.
- The General category appears to have a higher mean than the reserved categories.
- We check this in the next cell.

In [None]:
df['notReserved'] = df.Caty == 'GEN'
plot_marks_by_x(df, 'notReserved')

- As expected, reserved-category status and a lower mean are correlated. This is a correlation, not a causal relation.
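
Overlaid histograms make a shift in the mean hard to judge by eye, so a numeric summary can back up the reading. The following is a minimal sketch on invented scores; the values and the `'reserved'` label are assumptions made for illustration.

```python
import pandas as pd

# Invented scores for illustration only.
toy = pd.DataFrame({
    'Caty': ['GEN', 'GEN', 'OBC', 'OBC', 'SC', 'ST'],
    'Gtp': [0.70, 0.66, 0.62, 0.60, 0.58, 0.56],
})

# Two groups: GEN versus every other category.
toy['group'] = toy.Caty.eq('GEN').map({True: 'GEN', False: 'reserved'})

# Mean, median and group size of the grand-total fraction per group.
summary = toy.groupby('group')['Gtp'].agg(['mean', 'median', 'count'])
print(summary)

# Difference between the group means.
gap = summary.loc['GEN', 'mean'] - summary.loc['reserved', 'mean']
print(round(gap, 3))
```

The same `groupby` on `df` with `'notReserved'` as the key would give the real figures; the group sizes are worth printing alongside, since a mean over few candidates is noisy.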

In [None]:
plot_marks_by_x(df, 'Sex')

Sex does not seem to have an effect on how people perform on the exams.

# Now for Computer Science

In [None]:
cdf = df.loc[df.Subect == 87]
plt.figure(figsize=(15, 5))
plt.subplot(121)
cdf.Sex.hist()
plt.subplot(122)
sns.heatmap(pd.crosstab(cdf.Sex, cdf.Status).apply(lambda x:x/x.sum(), axis=1), annot=True)

- Again, males outnumber females.
- Males and females have about the same JRF conversion rate.

In [None]:
plot_marks_by_x(cdf, 'Caty')

- This is interesting: the Paper 1 and Paper 2 distributions look almost the same across categories, but in the aggregate the GEN category pulls ahead.
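
One way to reason about the aggregate: if `Gtotal` is the plain sum of the three papers (an assumption, though the divisors 100, 100 and 150 do add up to 350), then `Gtp` is a weighted average of the per-paper fractions in which Paper 3 carries the largest weight. A gap in Paper 3 therefore shows up in the total even when Papers 1 and 2 look alike. The sketch below checks that identity on invented marks.

```python
import numpy as np

# Invented marks for two hypothetical candidates.
p1 = np.array([60.0, 58.0])
p2 = np.array([62.0, 60.0])
p3 = np.array([105.0, 90.0])

# Assumption: the grand total is the plain sum of the three papers.
gtotal = p1 + p2 + p3
gtp = gtotal / 350

# The same number as a weighted average of the per-paper fractions.
weights = np.array([100, 100, 150]) / 350
fractions = np.vstack([p1 / 100, p2 / 100, p3 / 150])
gtp_weighted = weights @ fractions

print(gtp, gtp_weighted)
```

This only illustrates the arithmetic; whether Paper 3 is what separates the categories in the real data would need a look at the `P3p` panel.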

In [None]:
plot_marks_by_x(cdf, 'Sex')

- Again as expected, Sex has no effect.

In [None]:
plot_marks_by_x(cdf, 'Status')

- This is misleading because it pools all categories, and each category has a different cutoff.
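
A rough way to see the different cutoffs is to take the lowest score among JRF candidates in each category. The sketch below does this on invented rows; the `Status` labels `'JRF'` and `'NET'` and all the scores are assumptions. Note that the lowest observed qualifying score only bounds the real cutoff from above.

```python
import pandas as pd

# Invented rows; the Status labels 'JRF' and 'NET' are assumptions.
toy = pd.DataFrame({
    'Caty': ['GEN', 'GEN', 'GEN', 'OBC', 'OBC', 'SC'],
    'Status': ['JRF', 'NET', 'JRF', 'JRF', 'NET', 'JRF'],
    'Gtp': [0.70, 0.60, 0.66, 0.61, 0.55, 0.57],
})

# Lowest grand-total fraction among JRF candidates in each category.
# The actual cutoff can be no higher than this observed minimum.
jrf_floor = toy.loc[toy.Status == 'JRF'].groupby('Caty')['Gtp'].min()
print(jrf_floor)
```

Applied to `cdf`, this would give one observed floor per category for a single subject, which is the comparison the pooled plot hides.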

In [None]:
plot_marks_by_x(cdf.loc[cdf.notReserved], 'Status')

- Here we clearly see the demarcation in the grand total percentage: you need at least 65% to qualify for JRF.
- Let us look at the reserved categories next. Because the SC and ST categories have few candidates, we will club them together.
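
Clubbing two categories amounts to mapping both labels to one. A minimal sketch, assuming a merged label of `'SC/ST'` (a hypothetical name chosen here, not one from the data):

```python
import pandas as pd

# Invented category labels for illustration only.
caty = pd.Series(['GEN', 'OBC', 'SC', 'ST', 'SC', 'OBC'], name='Caty')

# Merge SC and ST under one hypothetical label; other labels are kept as they are.
clubbed = caty.replace({'SC': 'SC/ST', 'ST': 'SC/ST'})
print(clubbed.value_counts())
```

A column built this way could be added to the frame and passed to `plot_marks_by_x` like any other grouping column.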