**The Lok Sabha, or House of the People, is the lower house of India’s bicameral Parliament; the upper house is the Rajya Sabha. Members of the Lok Sabha are elected by adult universal suffrage under a first-past-the-post system to represent their respective constituencies, and they hold their seats for five years or until the house is dissolved by the President on the advice of the Council of Ministers. The house meets in the Lok Sabha Chambers of the Sansad Bhavan, New Delhi.**


**The maximum membership of the House allotted by the Constitution of India is 552 (initially, in 1950, it was 500). Currently, the house has 543 seats, all filled by elected members. Between 1952 and 2020, two additional members of the Anglo-Indian community were also nominated by the President of India on the advice of the Government of India; this provision was abolished in January 2020 by the 104th Constitutional Amendment Act, 2019. The Lok Sabha has a seating capacity of 550.**


**A total of 131 seats (24.03%) are reserved for representatives of Scheduled Castes (84) and Scheduled Tribes (47). The quorum for the House is 10% of the total membership. Unless dissolved sooner, the Lok Sabha continues for five years from the date appointed for its first meeting. However, while a proclamation of emergency is in operation, Parliament may extend this period by law.**
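The seat figures above can be checked with a little arithmetic. This is a minimal sketch: the seat counts come from the text, while rounding the quorum up to a whole member is an assumption for illustration, not a rule quoted from the Constitution.

```python
import math

ELECTED_SEATS = 543          # current strength, from the text
SEATS_BEFORE_2020 = 545      # 543 elected + 2 nominated Anglo-Indian members
SC_SEATS, ST_SEATS = 84, 47  # reserved seats, from the text


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


reserved = SC_SEATS + ST_SEATS
# Assumption: "10% of the total membership" is rounded up to a whole member
quorum = math.ceil(ELECTED_SEATS / 10)

print(reserved)                            # 131
print(share(reserved, ELECTED_SEATS))      # 24.13
print(share(reserved, SEATS_BEFORE_2020))  # 24.04
print(quorum)                              # 55
```

The 24.03% quoted in the text is close to 131 out of 545 seats, the total that still included the two nominated members; measured against the current 543 elected seats, the reserved share is about 24.1%.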

![](https://camo.githubusercontent.com/ad1e70dd399bb9d12db5cca1668d35bdac86caf84d8ebc529da37af577ed9485/68747470733a2f2f75706c6f61642e77696b696d656469612e6f72672f77696b6970656469612f636f6d6d6f6e732f7468756d622f632f63322f496e6469616e5f47656e6572616c5f456c656374696f6e5f323031392e7376672f38303070782d496e6469616e5f47656e6572616c5f456c656374696f6e5f323031392e7376672e706e67)

# **Importing the Libraries**

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

# **Loading the Files**

```python
df2 = pd.read_csv('../input/indian-candidates-for-general-election-2019/LS_2.0.csv')
```

# **Displaying the Data**

```python
df2.head()
```

```python
# Give the multi-line column name a short, easy-to-type alias
df2.rename(columns={'CRIMINAL\nCASES': 'criminal'}, inplace=True)
```

```python
df2.shape
```

# **Information about all the Columns in the Dataset**

```python
df2.info()
```

This dataset covers the candidates of the 2019 Lok Sabha general election in India. It has 2263 rows and 19 columns, and this exploratory data analysis project is built on it.
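Several column names in this file contain embedded newlines, such as `'CRIMINAL\nCASES'` (renamed to `criminal` in this notebook) and `'TOTAL\nVOTES'`. If you would rather normalise every name at once than rename columns one by one, a minimal sketch follows. `clean_columns` is a hypothetical helper, and it is not applied to `df2` here because the rest of the notebook uses the original names.

```python
import pandas as pd


def clean_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` whose column names have every run of whitespace
    (including embedded newlines) collapsed into a single space."""
    cleaned = frame.copy()
    cleaned.columns = [" ".join(str(name).split()) for name in frame.columns]
    return cleaned


# Toy frame using column names that appear in this notebook
toy = pd.DataFrame(columns=["STATE", "CRIMINAL\nCASES", "TOTAL\nVOTES"])
print(list(clean_columns(toy).columns))
```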

# **Description of Dataset**

```python
df2.describe()
```

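# **Correlation between the Data**

```python
# numeric_only=True restricts the correlation to numeric columns
# (recent pandas versions raise an error on text columns otherwise)
df2.corr(numeric_only=True)
```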

# **Checking for Null Values in the Dataset**

```python
# True if any cell in the DataFrame is missing
df2.isnull().values.any()
```

# **Criminal Case Count**

Here we count how many candidates have each number of criminal cases listed against them.

```python
df2['criminal'].value_counts()
```

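```python
# Treat 'Not Available' as zero cases (an assumption that can pull averages down),
# then convert the column to numbers; anything that cannot be parsed becomes NaN
df2['criminal'] = df2['criminal'].replace(['Not Available'], '0')
df2['criminal'] = pd.to_numeric(df2['criminal'], errors='coerce')
df2['criminal'].value_counts()
```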
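Replacing `'Not Available'` with `'0'` before the conversion is a modelling choice: it counts unknown records as candidates with no cases. The toy sketch below uses made-up values, not data from this dataset, to compare that choice with letting `pd.to_numeric(..., errors='coerce')` turn those entries into `NaN`, which pandas then skips in statistics such as the mean.

```python
import pandas as pd

# Made-up raw values in the same style as the 'criminal' column
raw = pd.Series(["0", "2", "Not Available", "5", None])

# Option 1 (used in this notebook): treat 'Not Available' as zero cases
as_zero = pd.to_numeric(raw.replace("Not Available", "0"), errors="coerce")
# Option 2: leave it unknown; errors='coerce' turns unparseable text into NaN
as_missing = pd.to_numeric(raw, errors="coerce")

print(as_zero.mean())     # NaN is skipped: (0 + 2 + 0 + 5) / 4
print(as_missing.mean())  # NaN is skipped: (0 + 2 + 5) / 3
```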

# **Counting the Null Values in the criminal Column**

```python
# Number of rows whose 'criminal' value is still missing after the conversion
df2['criminal'].isnull().sum()
```

# **Displaying the Data Again**

```python
df2.head()
```

# **Distribution of Criminal Case Counts**

Here we plot how many candidates have each number of criminal cases.

```python
# Seaborn count plot with figure size 18 x 6
plt.figure(figsize=(18,6))
sns.countplot(x='criminal',data=df2);
```

From the summary statistics below, the mean number of criminal cases per contestant is 1.45. The minimum, the 25th percentile and the median are all 0, so at least half of the contestants have no criminal case, while the 75th percentile is 1.0. More strikingly, the maximum is 240 cases against a single person.

```python
df2['criminal'].describe()
```
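Because a single extreme value (the 240 cases mentioned above) can pull the mean far away from a typical candidate, it helps to look at the median and at the share of candidates with any case at all. A toy sketch with made-up numbers:

```python
import pandas as pd

# Made-up case counts: most candidates have none, one has an extreme value
cases = pd.Series([0, 0, 0, 0, 0, 0, 1, 1, 2, 240])

print(cases.mean())                       # pulled up by the single outlier
print(cases.median())                     # the typical candidate has 0 cases
print((cases > 0).mean())                 # share with at least one case
print(cases[cases < cases.max()].mean())  # mean without the largest value
```

On the real column the equivalent lines would be `df2['criminal'].median()` and `(df2['criminal'] > 0).mean()`.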

# **The Educational Qualification of the Candidates:**

We can see that Post Graduate is the most common qualification declared by the candidates, which is a positive sign from an educational point of view.

```python
df2.EDUCATION.value_counts()
```

```python
# Author's choice: group 'Not Available' and 'Others' under 'Illiterate'
df2['EDUCATION'] = df2['EDUCATION'].replace(['Not Available','Others'],'Illiterate')
# Merge the variant label that carries a trailing newline
df2['EDUCATION'] = df2['EDUCATION'].replace(['Post Graduate\n'],'Post Graduate')
df2['EDUCATION'].value_counts()
```

# **Educational Qualification Count Graph**


After analysing the graph, we can see that 8th Pass and 5th Pass appear as two separate categories. However, we consider 10th Pass to be the minimum qualification for a candidate to be called literate, so we relabel all 5th Pass and 8th Pass candidates as Illiterate.

```python
# Seaborn count plot with figure size 20 x 6
plt.figure(figsize=(20,6))
sns.countplot(x='EDUCATION',data=df2);
```

```python
# Apply the 10th Pass threshold: relabel 5th Pass and 8th Pass as Illiterate
df2['EDUCATION'] = df2['EDUCATION'].replace(['5th Pass','8th Pass'],'Illiterate')
df2['EDUCATION'].value_counts()
```

```python
# Seaborn count plot with figure size 20 x 6, after the relabelling
plt.figure(figsize=(20,6))
sns.countplot(x='EDUCATION',data=df2);
```
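By default, seaborn draws string categories in the order it first meets them, so the education bars above are not sorted by level. One option is to define an explicit order. The sketch below assumes the listed labels are the ones left after the relabelling (check `df2['EDUCATION'].unique()` before relying on it); `EDUCATION_ORDER` and `order_levels` are hypothetical names.

```python
import pandas as pd

# Assumed set of labels after relabelling, from lowest to highest qualification
EDUCATION_ORDER = ["Illiterate", "Literate", "10th Pass", "12th Pass", "Graduate",
                   "Graduate Professional", "Post Graduate", "Doctorate"]


def order_levels(education: pd.Series) -> pd.Series:
    """Convert labels to an ordered categorical; labels not in EDUCATION_ORDER become NaN."""
    return education.astype(pd.CategoricalDtype(EDUCATION_ORDER, ordered=True))


toy = pd.Series(["Graduate", "10th Pass", "Doctorate", "Illiterate"])
print(order_levels(toy).sort_values().tolist())
```

For the plot alone, seaborn's `order` parameter does the same job without changing the data, e.g. `sns.countplot(x='EDUCATION', data=df2, order=EDUCATION_ORDER)`.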

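# **Education vs Criminal Barplot**

```python
# Bar height = mean of 'criminal' per education level (seaborn's default estimator),
# with error bars showing a confidence interval
sns.set_theme(style="whitegrid")
plt.figure(figsize=(20,6))
ax = sns.barplot(x="EDUCATION", y="criminal", data=df2)
```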

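Because `sns.barplot` plots the mean of each group by default, the graph shows that Graduate and 12th Pass candidates have the highest average number of criminal cases. In particular, a single Graduate candidate has 240 criminal cases, which on its own raises the Graduate average.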
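Since `sns.barplot` plots a mean per education level, one very large value can dominate a bar. A minimal sketch on made-up rows shows how to print the numbers behind such bars with `groupby(...).agg(...)`, so that the mean can be compared with the median and the group size:

```python
import pandas as pd

# Made-up rows in the same shape as df2[['EDUCATION', 'criminal']]
toy = pd.DataFrame({
    "EDUCATION": ["Graduate", "Graduate", "Graduate", "12th Pass", "12th Pass"],
    "criminal": [0, 1, 240, 2, 4],
})

summary = toy.groupby("EDUCATION")["criminal"].agg(["mean", "median", "max", "count"])
print(summary)
```

To plot medians instead of means, `sns.barplot` accepts an `estimator` argument, e.g. `estimator=np.median`.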

```python
# Count male and female candidates; any other value (including missing ones) is not counted
cn1 = int((df2['GENDER'] == 'MALE').sum())
cn2 = int((df2['GENDER'] == 'FEMALE').sum())
print(cn1)
print(cn2)
```

# **Pie Chart of Male vs Female Candidates**



```python
y = np.array([cn1,cn2])
mylabels = ["MALE","FEMALE"]
plt.pie(y, labels = mylabels, startangle = 90)
plt.show()
```

From this pie chart, we can see that the number of male candidates is much greater than the number of female candidates.
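To put a number on "much greater", the pie can be complemented with percentage shares. A minimal sketch, using a hypothetical helper `gender_shares` on made-up labels:

```python
import pandas as pd


def gender_shares(gender: pd.Series) -> pd.Series:
    """Percentage of candidates per gender label; missing values are ignored."""
    return (gender.value_counts(normalize=True) * 100).round(1)


# Made-up labels; None stands in for a missing value
toy = pd.Series(["MALE", "MALE", "MALE", "FEMALE", None])
print(gender_shares(toy))
```

On the real data this would be `gender_shares(df2['GENDER'])`; alternatively, `plt.pie(..., autopct='%1.1f%%')` prints the percentages on the chart itself.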

# **State-wise Criminal Cases of Candidates**


The total number of criminal cases is highest among candidates from Bihar, Kerala, Maharashtra, West Bengal and Uttar Pradesh.

```python
# Top 15 states by total criminal cases, for all contestants and for winners only
state_criminal = df2.groupby('STATE')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['STATE'])

state_criminal_winner = df2[df2['WINNER']>0].groupby('STATE')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['STATE'])
state_criminal
```

```python
# Two bar plots side by side
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

# Left: all contestants; pass the x and y data together with the subplot to draw on
sns.barplot(x = state_criminal.index , y = state_criminal['criminal'] , ax=axes[0] , palette='YlOrBr');
axes[0].tick_params(axis='x' , rotation=45); # rotate the x tick labels so they are easier to read
axes[0].set_title('STATE WISE CRIMINAL CASE OF CONTESTANTS');

# Right: winners only, drawn with a different palette
sns.barplot(x = state_criminal_winner.index , y = state_criminal_winner['criminal'] , ax=axes[1] , palette='viridis');
axes[1].set_title('STATE WISE CRIMINAL CASE OF WINNERS');
plt.xticks(rotation=45);
```

These plots show the total criminal cases by state, for all contestants on the left and for winners only on the right. Kerala shows the tallest bar.
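These bars add up cases, so one candidate with many cases can make a state's bar tall. A minimal sketch with made-up states and values shows how to put the total next to the number of candidates who have at least one case:

```python
import pandas as pd

# Made-up rows in the same shape as df2[['STATE', 'criminal']]
toy = pd.DataFrame({
    "STATE": ["State A", "State A", "State B", "State B", "State B"],
    "criminal": [9, 0, 1, 2, 1],
})

grouped = toy.assign(has_case=toy["criminal"] > 0).groupby("STATE")
by_state = pd.DataFrame({
    "total_cases": grouped["criminal"].sum(),
    "candidates_with_cases": grouped["has_case"].sum(),
})
print(by_state)
```

State A leads on total cases while State B has more candidates with a case, so the two measures can rank states differently.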

```python
# Count candidates per category; anything else (including missing values) goes to cn4
cn1 = int((df2['CATEGORY'] == 'SC').sum())
cn2 = int((df2['CATEGORY'] == 'ST').sum())
cn3 = int((df2['CATEGORY'] == 'GENERAL').sum())
cn4 = len(df2) - cn1 - cn2 - cn3
print(cn1)
print(cn2)
print(cn3)
print(cn4)
```

# **Barplot of Candidate Categories:**


Here we plot the number of SC, ST, GENERAL and remaining candidates counted above for the 2019 Lok Sabha election.

```python
category = ['SC','ST','GENERAL','OTHERS']
candidates = [cn1,cn2,cn3,cn4]

# Create a pandas DataFrame and sort it so the largest group comes first
df = pd.DataFrame({"category": category,
                   "candidates": candidates})
df_sorted_desc = df.sort_values('candidates', ascending=False)
plt.figure(figsize=(14,10))
# Make the bar plot with matplotlib, reading columns from the DataFrame by name
plt.bar('category', 'candidates', data=df_sorted_desc, color='blue',
        width=0.4)
plt.xlabel("Category", size=15)
plt.ylabel("Number of Candidates", size=15)
plt.title("Barplot of Category in the Lok Sabha Election Candidates", size=18)
```

From the graph we can see that GENERAL category candidates form the largest group.

# **Barplot of Candidate Allocation in Lok Sabha Election 2019**


Here we count how many candidates each party fielded across all constituencies in India. NOTA (None of the Above) also appears in the PARTY column, so it gets its own count.

```python
# Count candidates per party label (IND = independent candidates);
# every other label is lumped into cn6
cn1 = int((df2['PARTY'] == 'BJP').sum())
cn2 = int((df2['PARTY'] == 'INC').sum())
cn3 = int((df2['PARTY'] == 'NOTA').sum())
cn4 = int((df2['PARTY'] == 'IND').sum())
cn5 = int((df2['PARTY'] == 'BSP').sum())
cn6 = len(df2) - cn1 - cn2 - cn3 - cn4 - cn5
print(cn1)
print(cn2)
print(cn3)
print(cn4)
print(cn5)
print(cn6)
```

```python
party = ['BJP','INC','NOTA','IND','BSP']
candidates = [cn1,cn2,cn3,cn4,cn5]

# Create a pandas DataFrame and sort it so the label with most candidates comes first
df = pd.DataFrame({"party": party,
                   "candidates": candidates})
df_sorted_desc = df.sort_values('candidates', ascending=False)
plt.figure(figsize=(14,10))
# Make the bar plot with matplotlib, reading columns from the DataFrame by name
plt.bar('party', 'candidates', data=df_sorted_desc, color='orange',
        width=0.4)
plt.xlabel("Party Name", size=15)
plt.ylabel("Total Candidates", size=15)
plt.title("Barplot of Candidate Allocation in Lok Sabha Election 2019", size=18)
```

BJP fielded the largest number of candidates.
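The counting cell above hard-codes five labels and lumps the rest into `cn6`. A more general sketch, using a hypothetical helper `top_n_with_others`, keeps the `n` most frequent labels and groups everything else:

```python
import pandas as pd


def top_n_with_others(labels: pd.Series, n: int) -> pd.Series:
    """Counts of the n most frequent labels, plus one 'OTHERS' entry for the rest."""
    counts = labels.value_counts()
    others = counts.iloc[n:].sum()
    return pd.concat([counts.head(n), pd.Series({"OTHERS": others})])


# Made-up party labels
toy = pd.Series(["BJP", "BJP", "BJP", "INC", "INC", "BSP", "IND", "NOTA"])
print(top_n_with_others(toy, 2))
```

With the real data, `top_n_with_others(df2['PARTY'], 5)` returns the five most frequent labels, which may differ from the hand-picked list.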

```python
# The column is already numeric after the earlier conversion; this is a harmless safeguard
df2['criminal'] = pd.to_numeric(df2['criminal'] , errors='coerce')
```

# **Bar Plot of Party vs Criminal Cases:**

Here we add up the criminal cases of the candidates in each party, for all contestants and for winners only, and keep the 15 parties with the highest totals. This shows which parties field candidates facing the most criminal cases.


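```python
# Top 15 parties by total criminal cases among contestants with at least one case
# (despite its name, party_criminal_winner is not limited to winners)
party_criminal_winner = df2[df2['criminal']>0].groupby('PARTY')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['PARTY'])
# The same, restricted to winners
party_winner = df2[(df2['criminal']>0) & (df2['WINNER']>0)].groupby('PARTY')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['PARTY'])

party_winner
```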

```python
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

# Left: all contestants; pass the x and y data together with the subplot to draw on
sns.barplot(x = party_criminal_winner.index , y = party_criminal_winner['criminal'] , ax=axes[0] , palette='icefire');
axes[0].tick_params(axis='x' , rotation=45); # rotate the x tick labels so they are easier to read
axes[0].set_title('PARTY WISE CRIMINAL CASE OF CONTESTANTS');

# Right: winners only, drawn with a different palette
sns.barplot(x = party_winner.index , y = party_winner['criminal'] , ax=axes[1] , palette='viridis');
axes[1].set_title('PARTY WISE CRIMINAL CASE OF WINNERS');
plt.xticks(rotation=45);
```

From the plots above, BJP and INC (Congress) have the highest totals of criminal cases. This is likely because these two are national parties that field candidates all over India, whereas most other parties are regional and contest fewer seats.
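The explanation above (national parties field more candidates, so their totals are larger) can be checked by dividing each party's total by its number of candidates. A minimal sketch on made-up parties `P1` and `P2`:

```python
import pandas as pd

# Made-up rows in the same shape as df2[['PARTY', 'criminal']]
toy = pd.DataFrame({
    "PARTY": ["P1", "P1", "P1", "P1", "P2", "P2"],
    "criminal": [1, 0, 2, 1, 3, 1],
})

# Named aggregation: total cases and number of candidates per party
per_party = toy.groupby("PARTY")["criminal"].agg(total="sum", candidates="count")
per_party["cases_per_candidate"] = per_party["total"] / per_party["candidates"]
print(per_party)
```

Both made-up parties have the same total, but P2 has twice as many cases per candidate, which a total alone would hide.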

```python
# Top 15 ages by total criminal cases, sorted by age for plotting
age_criminal = df2[df2['criminal']>0].groupby('AGE')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['AGE'])
age_criminal
```

```python
plt.figure(figsize=(14,10))
# Bar plot of the top 15 ages by total criminal cases
sns.barplot(x = age_criminal.index , y = age_criminal['criminal'] , palette='icefire');
plt.title('AGE WISE CRIMINAL CASE OF CONTESTANTS');
```

```python
# Top 15 states by total votes polled, sorted by state name
total_voter1 = df2[df2['TOTAL\nVOTES']>0].groupby('STATE')[['TOTAL\nVOTES']].sum().sort_values(by=
                        ['TOTAL\nVOTES']).tail(15).sort_values(by=['STATE'])

total_voter1
```

```python
plt.figure(figsize=(25,10))
# Bar plot of total votes for the states selected above
sns.barplot(x = total_voter1.index , y = total_voter1['TOTAL\nVOTES'] , palette='icefire');
```

The plot above shows the total votes polled in the 15 states with the highest totals. Returning to the age plot before it, the total number of criminal cases is highest at the ages of 49, 37 and 51.
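Totals for single ages can be noisy, since each age may have only a few candidates. A minimal sketch groups ages into bands with `pd.cut` before summing; the band edges are an arbitrary choice for illustration and the values are made up:

```python
import pandas as pd

# Made-up ages and case counts
toy = pd.DataFrame({
    "AGE": [25, 37, 37, 49, 51, 64, 70],
    "criminal": [0, 3, 1, 5, 2, 0, 4],
})

# Hypothetical bands; intervals are right-closed, e.g. (34, 44] covers ages 35-44
bins = [24, 34, 44, 54, 64, 100]
labels = ["25-34", "35-44", "45-54", "55-64", "65+"]
toy["age_band"] = pd.cut(toy["AGE"], bins=bins, labels=labels)

# observed=False keeps empty bands in the result (with a total of 0)
totals = toy.groupby("age_band", observed=False)["criminal"].sum()
print(totals)
```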

```python
# Total criminal cases for each combination of gender and result (WINNER 1 = won)
fm = df2.groupby(['GENDER','WINNER'])[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['GENDER'])

fm
```

```python
# Total criminal cases of winners by gender (only winners with at least one case)
party_winner1 = df2[(df2['criminal']>0) & (df2['WINNER']>0)].groupby('GENDER')[['criminal']].sum().sort_values(by=
                        ['criminal']).tail(15).sort_values(by=['GENDER'])
party_winner1
```

```python
plt.figure(figsize=(9,4))
# Bar plot of the gender totals computed above
sns.barplot(x = party_winner1.index , y = party_winner1['criminal']  , palette='icefire');
```

This bar plot compares the total criminal cases of winning candidates by gender, showing whether male or female winners account for more of the cases.
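Because there are far more male than female candidates, totals by gender mostly reflect group size. A minimal sketch on made-up rows instead shows the share of each gender with at least one case, using `pd.crosstab` with `normalize='index'`:

```python
import pandas as pd

# Made-up rows in the same shape as df2[['GENDER', 'criminal']]
toy = pd.DataFrame({
    "GENDER": ["MALE", "MALE", "MALE", "MALE", "FEMALE", "FEMALE"],
    "criminal": [0, 2, 1, 5, 0, 3],
})

# Each row of the table sums to 1: the share of that gender with / without cases
shares = pd.crosstab(toy["GENDER"], toy["criminal"] > 0, normalize="index")
shares = shares.rename(columns={False: "no_case", True: "has_case"})
print(shares)
```

On the real data, `toy` would be replaced by the winners subset, e.g. `df2[df2['WINNER'] == 1]`.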

```python
import plotly.express as px
import matplotlib
# plt and sns were already imported at the top of the notebook
%matplotlib inline
```

```python
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
```

```python
fig = px.scatter(df2,
                 x='AGE',
                 y='criminal',
                 color='WINNER',
                 opacity=0.8,
                 hover_data=['GENDER','CATEGORY','STATE','PARTY','NAME','EDUCATION'],
                 title='Age vs Crime vs Winner vs Gender vs Category vs State vs Party vs Education')
fig.update_traces(marker_size=5)
fig.show()
```

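```python
# Candidates whose education is Doctorate
df=df2[df2.EDUCATION=='Doctorate']
df.shape
```

```python
# ...of whom won their seat
df=df[df.WINNER==1]
df.shape
```

```python
# ...and belong to BJP (df already contains only winners)
df1=df[(df.PARTY=='BJP') & (df.WINNER==1)]
df1.shape
```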

```python
Female_winners = df2[(df2['WINNER']==1) & (df2['GENDER']=='FEMALE')]
ax = px.histogram(Female_winners, 'STATE', title = 'Female Winners from different States',width=1150,height=700)
ax.show()
```

```python
fig = px.violin(df2,
                x='AGE',
                y='criminal',
                color='WINNER',
                hover_data=['GENDER','CATEGORY','STATE','PARTY','NAME','EDUCATION'],
                title='Age vs Crime vs Winner vs Gender vs Category vs State vs Party vs Education')
fig.update_traces(marker_size=5)
fig.show()
```

```python
fig = px.scatter(df2,
                 x="AGE",
                 y="EDUCATION",
                 animation_frame="STATE",
                 animation_group="PARTY",
                 color="PARTY",
                 hover_name="CONSTITUENCY",
                 log_x=True,
                 size_max=80,
                 range_x=[20,90],
                 range_y=[0,7])

fig.show()
```

# Blog:

Check out my Medium article, where I explain this analysis in detail.

https://medium.com/nerd-for-tech/exploratory-data-analysis-of-lok-sabha-election-2019-in-india-f73762268bd8

# GitHub Link of the Project:

https://github.com/soham2707/Exploratory-Data-Analysis-of-Lok-Sabha-Election-2019-in-India..git
