# How the world's biggest democracy voted in 2019!

# Introduction:

India is a union of states and union territories governed by the Constitution of India, which divides power between the central government and the states. The Election Commission of India is a body established under the Constitution that oversees and regulates all election procedures in the country and is responsible for ensuring that elections are free, fair, and impartial.
The Lok Sabha, or House of the People, is the lower house of India's parliament. Its members are chosen by adult Indian citizens from the candidates standing in their respective constituencies, and citizens can vote only in their own constituency. The analysis presented here is based on the 2019 Lok Sabha election (the General Election). Winners of the Lok Sabha elections are referred to as "Members of Parliament," and they retain their positions for five years or until the President, acting on the recommendation of the cabinet, dissolves the body. Members deliberate on the creation of new laws and on removing or improving existing laws that affect all citizens of India. The house meets in the Lok Sabha Chambers of the Sansad Bhavan in New Delhi.

### Objective:

Using data from the 2019 Lok Sabha election, we aim to perform data cleaning and data wrangling. We will also carry out exploratory data analysis (EDA) using matplotlib, seaborn, plotly, etc.
Our end goal is to predict the winners of the election using EDA and machine learning (ML) algorithms.

### About the Dataset:

The dataset consists of 2263 rows and 19 columns.

It has 11 “object” type columns, 3 “float” type columns and 5 “integer” type columns. The main columns are:
1) State - State in which the candidate contested the election

2) Constituency - Constituency in which the candidate contested the election

3) Name - Name of the candidate

4) Winner - Result as a binary flag: 1 if the candidate won, 0 otherwise

5) Party - Name of the candidate's party

6) Symbol - Election symbol of the party

7) Gender - Gender of the candidate

8) Criminal Cases - Number of criminal cases (if any) against the candidate

9) Age - Age of the candidate

10) Category - Category of the candidate, such as GENERAL, SC, or ST

11) Education - Education level of the candidate

12) Assets - Value of the assets owned by the candidate

13) Liabilities - Value of the liabilities owed by the candidate

14) Votes - Total number of votes cast for the candidate


## Read and Clean
Let's follow the usual data science process: gather, assess, clean, analyze, visualize, and model. In this section we deal with the first three stages. We first load the data and check attributes such as the number of rows and columns and the types of the variables. After that, we look for missing values and clean up any values that are not represented correctly.
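As a rough illustration of the first three stages (loading, inspecting, and cleaning the data), here is a minimal sketch on a tiny inline sample rather than the real `LS_2.0.csv`. The sample rows and the names `sample_csv`, `sample`, and `cleaned` are made up for this example, and dropping rows is only one possible cleaning choice.

```python
import io
import pandas as pd

# A tiny stand-in for the real file (hypothetical rows, not taken from LS_2.0.csv).
sample_csv = io.StringIO(
    "STATE,NAME,PARTY,AGE\n"
    "Telangana,Candidate A,BJP,52\n"
    "Telangana,NOTA,NOTA,\n"
    "Uttar Pradesh,Candidate B,BSP,47\n"
)

# Gather: load the data.
sample = pd.read_csv(sample_csv)

# Assess: size of the table and missing values per column.
n_rows, n_cols = sample.shape
missing_per_column = sample.isna().sum()

# Clean: here we simply drop the rows that have no AGE.
cleaned = sample.dropna(subset=["AGE"])

print(n_rows, n_cols)
print(missing_per_column)
print(cleaned)
```

The cells that follow apply the same idea to the full dataset, with imputation instead of a plain `dropna`.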

## Data Preprocessing


## Importing required libraries and modules. 

In [2]:
import pandas as pd
from sklearn.impute import KNNImputer


## Reading the dataset!

In [3]:
df=pd.read_csv("LS_2.0.csv")

In [4]:
df

Unnamed: 0,STATE,CONSTITUENCY,NAME,WINNER,PARTY,SYMBOL,GENDER,CRIMINAL\nCASES,AGE,CATEGORY,EDUCATION,ASSETS,LIABILITIES,GENERAL\nVOTES,POSTAL\nVOTES,TOTAL\nVOTES,OVER TOTAL ELECTORS \nIN CONSTITUENCY,OVER TOTAL VOTES POLLED \nIN CONSTITUENCY,TOTAL ELECTORS
0,Telangana,ADILABAD,SOYAM BAPU RAO,1,BJP,Lotus,MALE,52,52.0,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+",376892,482,377374,25.330684,35.468248,1489790
1,Telangana,ADILABAD,Godam Nagesh,0,TRS,Car,MALE,0,54.0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+",318665,149,318814,21.399929,29.964370,1489790
2,Telangana,ADILABAD,RATHOD RAMESH,0,INC,Hand,MALE,3,52.0,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+",314057,181,314238,21.092771,29.534285,1489790
3,Telangana,ADILABAD,NOTA,0,NOTA,,,,,,,,,13030,6,13036,0.875023,1.225214,1489790
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,1,BJP,Lotus,MALE,5,58.0,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+",644459,2416,646875,33.383823,56.464615,1937690
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,0,IND,SHIP,MALE,0,43.0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+",14661,25,14686,0.766419,1.250060,1916185
2259,Telangana,ZAHIRABAD,B.B.PATIL,1,TRS,Car,MALE,18,63.0,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+",434066,178,434244,28.975369,41.574183,1498666
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,0,INC,Hand,MALE,0,49.0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~,427900,115,428015,28.559732,40.977823,1498666
2261,Telangana,ZAHIRABAD,BANALA LAXMA REDDY,0,BJP,Lotus,MALE,3,47.0,GENERAL,12th Pass,"Rs 5,85,77,327\n ~ 5 Crore+","Rs 52,50,000\n ~ 52 Lacs+",138731,216,138947,9.271379,13.302678,1498666


## Checking for the shape of dataframe

In [5]:
df.shape

(2263, 19)

## Describing the dataset

In [9]:
df.describe()

Unnamed: 0,WINNER,AGE,GENERAL\nVOTES,POSTAL\nVOTES,TOTAL\nVOTES,OVER TOTAL ELECTORS \nIN CONSTITUENCY,OVER TOTAL VOTES POLLED \nIN CONSTITUENCY,TOTAL ELECTORS
count,2263.0,2018.0,2263.0,2263.0,2263.0,2263.0,2263.0,2263.0
mean,0.238179,52.273538,261599.1,990.710561,262589.8,15.811412,23.190525,1658016.0
std,0.426064,11.869373,254990.6,1602.839174,255982.2,14.962861,21.564758,314518.7
min,0.0,25.0,1339.0,0.0,1342.0,0.097941,1.000039,55189.0
25%,0.0,43.25,21034.5,57.0,21162.5,1.296518,1.899502,1530014.0
50%,0.0,52.0,153934.0,316.0,154489.0,10.510553,16.221721,1679030.0
75%,0.0,61.0,485804.0,1385.0,487231.5,29.468185,42.590233,1816857.0
max,1.0,86.0,1066824.0,19367.0,1068569.0,51.951012,74.411856,3150313.0


## Getting some info about the dataframe

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2263 entries, 0 to 2262
Data columns (total 19 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   STATE                                     2263 non-null   object 
 1   CONSTITUENCY                              2263 non-null   object 
 2   NAME                                      2263 non-null   object 
 3   WINNER                                    2263 non-null   int64  
 4   PARTY                                     2263 non-null   object 
 5   SYMBOL                                    2018 non-null   object 
 6   GENDER                                    2018 non-null   object 
 7   CRIMINAL
CASES                            2018 non-null   object 
 8   AGE                                       2018 non-null   float64
 9   CATEGORY                                  2018 non-null   object 
 10  EDUCATION                           

## Renaming specific columns 

In [5]:
# Replace the line breaks embedded in some column names with plain spaces.
df.rename(columns={
    "CRIMINAL\nCASES": "CRIMINAL CASES",
    "GENERAL\nVOTES": "GENERAL VOTES",
    "POSTAL\nVOTES": "POSTAL VOTES",
    "TOTAL\nVOTES": "TOTAL VOTES",
    "OVER TOTAL ELECTORS \nIN CONSTITUENCY": "OVER TOTAL ELECTORS IN CONSTITUENCY",
    "OVER TOTAL VOTES POLLED \nIN CONSTITUENCY": "OVER TOTAL VOTES POLLED IN CONSTITUENCY",
}, inplace=True)
df.head()

Unnamed: 0,STATE,CONSTITUENCY,NAME,WINNER,PARTY,SYMBOL,GENDER,CRIMINAL CASES,AGE,CATEGORY,EDUCATION,ASSETS,LIABILITIES,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS
0,Telangana,ADILABAD,SOYAM BAPU RAO,1,BJP,Lotus,MALE,52.0,52.0,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+",376892,482,377374,25.330684,35.468248,1489790
1,Telangana,ADILABAD,Godam Nagesh,0,TRS,Car,MALE,0.0,54.0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+",318665,149,318814,21.399929,29.96437,1489790
2,Telangana,ADILABAD,RATHOD RAMESH,0,INC,Hand,MALE,3.0,52.0,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+",314057,181,314238,21.092771,29.534285,1489790
3,Telangana,ADILABAD,NOTA,0,NOTA,,,,,,,,,13030,6,13036,0.875023,1.225214,1489790
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,1,BJP,Lotus,MALE,5.0,58.0,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+",644459,2416,646875,33.383823,56.464615,1937690
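Listing every affected column by hand works, but the same clean-up can also be done generically. The sketch below is one possible alternative, not the approach used in this notebook: it replaces any line break (and the spaces around it) in the column names with a single space. The frame `toy` is hypothetical and only carries headers.

```python
import pandas as pd

# Hypothetical empty frame whose headers contain embedded line breaks.
toy = pd.DataFrame(
    columns=["CRIMINAL\nCASES", "OVER TOTAL ELECTORS \nIN CONSTITUENCY", "AGE"]
)

# Replace a line break, plus any whitespace around it, with one space.
toy.columns = toy.columns.str.replace(r"\s*\n\s*", " ", regex=True)

print(list(toy.columns))
```

The regular expression leaves names without a line break, such as `AGE`, untouched.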


## Checking for datatypes of columns 

In [6]:
df.dtypes

STATE                                       object
CONSTITUENCY                                object
NAME                                        object
WINNER                                       int64
PARTY                                       object
SYMBOL                                      object
GENDER                                      object
CRIMINAL CASES                              object
AGE                                        float64
CATEGORY                                    object
EDUCATION                                   object
ASSETS                                      object
LIABILITIES                                 object
GENERAL VOTES                                int64
POSTAL VOTES                                 int64
TOTAL VOTES                                  int64
OVER TOTAL ELECTORS IN CONSTITUENCY        float64
OVER TOTAL VOTES POLLED IN CONSTITUENCY    float64
TOTAL ELECTORS                               int64
dtype: object
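To tally how many columns fall under each dtype without counting by eye, `value_counts()` can be called on the `dtypes` Series. A small sketch on a made-up three-column frame (`toy` and its values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "PARTY": ["BJP", "TRS"],           # text column
    "AGE": [52.0, 54.0],               # float column
    "TOTAL VOTES": [377374, 318814],   # integer column
})

# Count how many columns share each dtype.
dtype_counts = toy.dtypes.value_counts()
print(dtype_counts)
```

Running the same call on `df` should give the per-dtype totals for the 19 columns listed above.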

## Cleaning the dataset!


## Finding which column has null values

In [8]:
df.isna().sum()

STATE                                        0
CONSTITUENCY                                 0
NAME                                         0
WINNER                                       0
PARTY                                        0
SYMBOL                                     245
GENDER                                     245
CRIMINAL CASES                             245
AGE                                        245
CATEGORY                                   245
EDUCATION                                  245
ASSETS                                     245
LIABILITIES                                245
GENERAL VOTES                                0
POSTAL VOTES                                 0
TOTAL VOTES                                  0
OVER TOTAL ELECTORS IN CONSTITUENCY          0
OVER TOTAL VOTES POLLED IN CONSTITUENCY      0
TOTAL ELECTORS                               0
dtype: int64

## Finding the total number of null values present 

In [9]:
df.isna().sum().sum()

1960
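Since 1960 = 245 × 8, each of the eight affected columns is missing the same number of values, which suggests that the gaps sit in the same 245 rows. One way to check whether those rows are exactly the NOTA entries is sketched below on a toy frame. The data and the names `toy`, `has_missing`, and `is_nota` are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "PARTY": ["BJP", "NOTA", "INC", "NOTA"],
    "GENDER": ["MALE", None, "FEMALE", None],
    "AGE": [52.0, None, 49.0, None],
})

# Rows that have at least one missing value.
has_missing = toy.isna().any(axis=1)

# Rows that belong to NOTA.
is_nota = toy["PARTY"] == "NOTA"

# True when both masks select exactly the same rows.
missing_only_in_nota = bool((has_missing == is_nota).all())
print(missing_only_in_nota)
```

If the same check holds on `df`, the missing values are structural (NOTA has no age, gender, or assets) rather than gaps in candidate records.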

In [10]:
# The stray label ends with a newline character ('\n'), not the two characters '/n'.
df["EDUCATION"] = df["EDUCATION"].replace({'Post Graduate\n': 'Post Graduate'})
df.EDUCATION.unique()

array(['12th Pass', 'Post Graduate', nan, 'Doctorate', 'Graduate',
       'Others', '10th Pass', '8th Pass', 'Graduate Professional',
       'Literate', 'Illiterate', '5th Pass', 'Not Available'],
      dtype=object)
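A more general way to guard against stray whitespace in labels such as `'Post Graduate\n'` is to strip every value in the column instead of replacing one specific label. This is a sketch of that alternative on a hypothetical Series named `education`:

```python
import pandas as pd

education = pd.Series(["Post Graduate", "Post Graduate\n", " Graduate ", None])

# .str.strip() removes leading and trailing whitespace, including newlines;
# missing values stay missing.
cleaned = education.str.strip()

print(cleaned.dropna().unique())
```

After stripping, the two spellings of "Post Graduate" collapse into a single category.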

## Mapping the GENDER column to numeric codes so its NaN values can be imputed

In [11]:
df["GENDER_map"]=df.GENDER.map({'MALE':0,'FEMALE':1})
df

Unnamed: 0,STATE,CONSTITUENCY,NAME,WINNER,PARTY,SYMBOL,GENDER,CRIMINAL CASES,AGE,CATEGORY,EDUCATION,ASSETS,LIABILITIES,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER_map
0,Telangana,ADILABAD,SOYAM BAPU RAO,1,BJP,Lotus,MALE,52,52.0,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+",376892,482,377374,25.330684,35.468248,1489790,0.0
1,Telangana,ADILABAD,Godam Nagesh,0,TRS,Car,MALE,0,54.0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+",318665,149,318814,21.399929,29.964370,1489790,0.0
2,Telangana,ADILABAD,RATHOD RAMESH,0,INC,Hand,MALE,3,52.0,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+",314057,181,314238,21.092771,29.534285,1489790,0.0
3,Telangana,ADILABAD,NOTA,0,NOTA,,,,,,,,,13030,6,13036,0.875023,1.225214,1489790,
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,1,BJP,Lotus,MALE,5,58.0,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+",644459,2416,646875,33.383823,56.464615,1937690,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,0,IND,SHIP,MALE,0,43.0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+",14661,25,14686,0.766419,1.250060,1916185,0.0
2259,Telangana,ZAHIRABAD,B.B.PATIL,1,TRS,Car,MALE,18,63.0,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+",434066,178,434244,28.975369,41.574183,1498666,0.0
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,0,INC,Hand,MALE,0,49.0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~,427900,115,428015,28.559732,40.977823,1498666,0.0
2261,Telangana,ZAHIRABAD,BANALA LAXMA REDDY,0,BJP,Lotus,MALE,3,47.0,GENERAL,12th Pass,"Rs 5,85,77,327\n ~ 5 Crore+","Rs 52,50,000\n ~ 52 Lacs+",138731,216,138947,9.271379,13.302678,1498666,0.0
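The behaviour of `map` with a dictionary explains why `GENDER_map` holds floats and why the NOTA rows stay empty: any value that is missing (or not a key of the dictionary) becomes NaN. A minimal sketch with made-up values, where `codes` and `labels` are hypothetical names:

```python
import pandas as pd

gender = pd.Series(["MALE", "FEMALE", None])
codes = {"MALE": 0, "FEMALE": 1}

# Missing values come out as NaN, so the result is float rather than int.
encoded = gender.map(codes)

# Inverting the dictionary maps the codes back to the labels.
labels = {code: name for name, code in codes.items()}
decoded = encoded.map(labels)

print(encoded.tolist())
print(decoded.tolist())
```

The same round trip is what allows the column to be imputed numerically and then turned back into `MALE` and `FEMALE`.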


## Dataframe consisting of GENDER_map and columns with float and int as dtypes

In [12]:
df2=df[["GENDER_map","WINNER","AGE","GENERAL VOTES","POSTAL VOTES","TOTAL VOTES","OVER TOTAL ELECTORS IN CONSTITUENCY","OVER TOTAL VOTES POLLED IN CONSTITUENCY","TOTAL ELECTORS"]]
df2

Unnamed: 0,GENDER_map,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS
0,0.0,1,52.0,376892,482,377374,25.330684,35.468248,1489790
1,0.0,0,54.0,318665,149,318814,21.399929,29.964370,1489790
2,0.0,0,52.0,314057,181,314238,21.092771,29.534285,1489790
3,,0,,13030,6,13036,0.875023,1.225214,1489790
4,0.0,1,58.0,644459,2416,646875,33.383823,56.464615,1937690
...,...,...,...,...,...,...,...,...,...
2258,0.0,0,43.0,14661,25,14686,0.766419,1.250060,1916185
2259,0.0,1,63.0,434066,178,434244,28.975369,41.574183,1498666
2260,0.0,0,49.0,427900,115,428015,28.559732,40.977823,1498666
2261,0.0,0,47.0,138731,216,138947,9.271379,13.302678,1498666


In [None]:
# Alternative: fill null values with the column means using fillna

In [None]:
#df2.mean()
#df2.fillna(df2.mean())

## Using KNN Imputer to replace the NaN values 

KNNImputer returns the values as a NumPy array, so we will have to convert the result back into a DataFrame.

In [13]:
imputer=KNNImputer(n_neighbors=1)
After_imputation=imputer.fit_transform(df2)
After_imputation 

array([[0.00000000e+00, 1.00000000e+00, 5.20000000e+01, ...,
        2.53306842e+01, 3.54682479e+01, 1.48979000e+06],
       [0.00000000e+00, 0.00000000e+00, 5.40000000e+01, ...,
        2.13999288e+01, 2.99643695e+01, 1.48979000e+06],
       [0.00000000e+00, 0.00000000e+00, 5.20000000e+01, ...,
        2.10927715e+01, 2.95342851e+01, 1.48979000e+06],
       ...,
       [0.00000000e+00, 0.00000000e+00, 4.90000000e+01, ...,
        2.85597325e+01, 4.09778230e+01, 1.49866600e+06],
       [0.00000000e+00, 0.00000000e+00, 4.70000000e+01, ...,
        9.27137868e+00, 1.33026776e+01, 1.49866600e+06],
       [0.00000000e+00, 0.00000000e+00, 5.50000000e+01, ...,
        7.43327733e-01, 1.06653493e+00, 1.49866600e+06]])
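To make the effect of `n_neighbors=1` concrete, here is a small sketch on a made-up array. With a single neighbour, each missing value is copied from the one row that is closest on the features both rows have. `KNNImputer` does not rescale the features, so a column with large values (such as vote counts) can dominate the distance. The numbers below are hypothetical.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Columns: AGE, TOTAL VOTES (made-up numbers). The third row has no AGE.
X = np.array([
    [30.0, 1000.0],
    [60.0, 9000.0],
    [np.nan, 1200.0],
])

imputer = KNNImputer(n_neighbors=1)
X_filled = imputer.fit_transform(X)

# The third row is closest to the first one (1200 vs 1000 votes),
# so it receives that row's AGE.
print(X_filled)
```

Here the third row inherits an age of 30 purely because its vote count resembles the first row's, which is worth keeping in mind when interpreting imputed ages.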

In [14]:
col=df2.columns
col

Index(['GENDER_map', 'WINNER', 'AGE', 'GENERAL VOTES', 'POSTAL VOTES',
       'TOTAL VOTES', 'OVER TOTAL ELECTORS IN CONSTITUENCY',
       'OVER TOTAL VOTES POLLED IN CONSTITUENCY', 'TOTAL ELECTORS'],
      dtype='object')

## Building df3, which contains no NaN values

In [15]:
df3=pd.DataFrame(After_imputation,columns=col)
df3

Unnamed: 0,GENDER_map,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS
0,0.0,1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0
1,0.0,0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0
2,0.0,0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0
3,1.0,0.0,47.0,13030.0,6.0,13036.0,0.875023,1.225214,1489790.0
4,0.0,1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0
...,...,...,...,...,...,...,...,...,...
2258,0.0,0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0
2259,0.0,1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0
2260,0.0,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0
2261,0.0,0.0,47.0,138731.0,216.0,138947.0,9.271379,13.302678,1498666.0


In [16]:
df3.isna().sum().sum()

0

## Mapping the GENDER codes back to their original labels

In [17]:
df3["GENDER"]=df3.GENDER_map.map({0:"MALE",1:"FEMALE"})
df3

Unnamed: 0,GENDER_map,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,0.0,1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,0.0,0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0,MALE
2,0.0,0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
3,1.0,0.0,47.0,13030.0,6.0,13036.0,0.875023,1.225214,1489790.0,FEMALE
4,0.0,1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
...,...,...,...,...,...,...,...,...,...,...
2258,0.0,0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0,MALE
2259,0.0,1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0,MALE
2260,0.0,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0,MALE
2261,0.0,0.0,47.0,138731.0,216.0,138947.0,9.271379,13.302678,1498666.0,MALE


## Dropping GENDER_map column which is not required now. 

In [18]:
df3.drop("GENDER_map",axis=1,inplace=True)

In [19]:
df3

Unnamed: 0,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0,MALE
2,0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
3,0.0,47.0,13030.0,6.0,13036.0,0.875023,1.225214,1489790.0,FEMALE
4,1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
...,...,...,...,...,...,...,...,...,...
2258,0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0,MALE
2259,1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0,MALE
2260,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0,MALE
2261,0.0,47.0,138731.0,216.0,138947.0,9.271379,13.302678,1498666.0,MALE


## Creating a dataframe consisting of only the "object" type columns

In [20]:
df_obj=df.select_dtypes(include=["object"])
df_obj

Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,GENDER,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,MALE,52,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+"
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,MALE,0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+"
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,MALE,3,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+"
3,Telangana,ADILABAD,NOTA,NOTA,,,,,,,
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,MALE,5,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+"
...,...,...,...,...,...,...,...,...,...,...,...
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,IND,SHIP,MALE,0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+"
2259,Telangana,ZAHIRABAD,B.B.PATIL,TRS,Car,MALE,18,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+"
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,INC,Hand,MALE,0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~
2261,Telangana,ZAHIRABAD,BANALA LAXMA REDDY,BJP,Lotus,MALE,3,GENERAL,12th Pass,"Rs 5,85,77,327\n ~ 5 Crore+","Rs 52,50,000\n ~ 52 Lacs+"


In [21]:
# Reassign instead of dropping in place, which avoids the SettingWithCopyWarning.
df_obj = df_obj.drop(columns="GENDER")


In [22]:
df_obj

Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,52,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+"
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+"
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,3,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+"
3,Telangana,ADILABAD,NOTA,NOTA,,,,,,
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,5,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+"
...,...,...,...,...,...,...,...,...,...,...
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,IND,SHIP,0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+"
2259,Telangana,ZAHIRABAD,B.B.PATIL,TRS,Car,18,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+"
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,INC,Hand,0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~
2261,Telangana,ZAHIRABAD,BANALA LAXMA REDDY,BJP,Lotus,3,GENERAL,12th Pass,"Rs 5,85,77,327\n ~ 5 Crore+","Rs 52,50,000\n ~ 52 Lacs+"


## Creating new_df by joining df_obj and df3 using pd.concat

In [23]:
new_df=pd.concat([df_obj,df3],axis=1)
new_df

Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,52,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+",1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+",0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0,MALE
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,3,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+",0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
3,Telangana,ADILABAD,NOTA,NOTA,,,,,,,0.0,47.0,13030.0,6.0,13036.0,0.875023,1.225214,1489790.0,FEMALE
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,5,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+",1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,IND,SHIP,0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+",0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0,MALE
2259,Telangana,ZAHIRABAD,B.B.PATIL,TRS,Car,18,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+",1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0,MALE
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,INC,Hand,0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0,MALE
2261,Telangana,ZAHIRABAD,BANALA LAXMA REDDY,BJP,Lotus,3,GENERAL,12th Pass,"Rs 5,85,77,327\n ~ 5 Crore+","Rs 52,50,000\n ~ 52 Lacs+",0.0,47.0,138731.0,216.0,138947.0,9.271379,13.302678,1498666.0,MALE
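`pd.concat` with `axis=1` lines rows up by index label, not by position. That works here because `df_obj` and `df3` both carry the default 0 to 2262 index. The sketch below, on hypothetical frames `left` and `right`, shows what happens when the indexes match and when one side has been filtered first.

```python
import pandas as pd

left = pd.DataFrame({"NAME": ["A", "B", "C"]}, index=[0, 1, 2])
right = pd.DataFrame({"AGE": [52.0, 54.0, 47.0]}, index=[0, 1, 2])

# Same index on both sides: rows line up one to one.
aligned = pd.concat([left, right], axis=1)

# If one side has been filtered, the absent label yields NaN
# instead of shifting the remaining rows.
filtered = pd.concat([left[left["NAME"] != "B"], right], axis=1)

print(aligned)
print(filtered)
```

This is why concatenating before filtering out any rows, as done above, keeps the two halves consistent.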


## Removing all the records where PARTY is "NOTA". 

In [24]:
new_df=new_df[new_df['PARTY']!='NOTA']
new_df

Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,52,ST,12th Pass,"Rs 30,99,414\n ~ 30 Lacs+","Rs 2,31,450\n ~ 2 Lacs+",1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,0,ST,Post Graduate,"Rs 1,84,77,888\n ~ 1 Crore+","Rs 8,47,000\n ~ 8 Lacs+",0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0,MALE
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,3,ST,12th Pass,"Rs 3,64,91,000\n ~ 3 Crore+","Rs 1,53,00,000\n ~ 1 Crore+",0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,5,SC,Doctorate,"Rs 7,42,74,036\n ~ 7 Crore+","Rs 86,06,522\n ~ 86 Lacs+",1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
5,Uttar Pradesh,AGRA,Manoj Kumar Soni,BSP,Elephant,0,SC,Post Graduate,"Rs 13,37,84,385\n ~ 13 Crore+","Rs 2,22,51,891\n ~ 2 Crore+",0.0,47.0,434199.0,1130.0,435329.0,22.466390,37.999125,1937690.0,MALE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2257,Maharashtra,YAVATMAL-WASHIM,Vaishali Sudhakar Yede,PHJSP,Whistle,0,GENERAL,10th Pass,"Rs 11,68,500\n ~ 11 Lacs+","Rs 9,000\n ~ 9 Thou+",0.0,28.0,20563.0,57.0,20620.0,1.076097,1.755157,1916185.0,FEMALE
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,IND,SHIP,0,GENERAL,Post Graduate,"Rs 48,90,000\n ~ 48 Lacs+","Rs 10,20,000\n ~ 10 Lacs+",0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0,MALE
2259,Telangana,ZAHIRABAD,B.B.PATIL,TRS,Car,18,GENERAL,Graduate,"Rs 1,28,78,51,556\n ~ 128 Crore+","Rs 1,15,35,000\n ~ 1 Crore+",1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0,MALE
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,INC,Hand,0,GENERAL,Post Graduate,"Rs 90,36,63,001\n ~ 90 Crore+",Rs 0\n ~,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0,MALE


## Defining a function to change the format of values in columns ASSETS and LIABILITIES

In [25]:
def value_cleaner(x):
    # Turn a string such as 'Rs 30,99,414\n ~ 30 Lacs+' into the integer 3099414.
    try:
        # Keep the text between 'Rs' and the line break, then drop the commas.
        return int(x.split('Rs')[1].split('\n')[0].strip().replace(',', ''))
    except (AttributeError, IndexError, ValueError):
        # Missing or unparseable entries (e.g. NaN) become 0.
        return 0

# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning.
new_df = new_df.copy()
new_df['ASSETS'] = new_df['ASSETS'].apply(value_cleaner)
new_df['LIABILITIES'] = new_df['LIABILITIES'].apply(value_cleaner)
new_df.head()


Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,52,ST,12th Pass,3099414,231450,1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,0,ST,Post Graduate,18477888,847000,0.0,54.0,318665.0,149.0,318814.0,21.399929,29.96437,1489790.0,MALE
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,3,ST,12th Pass,36491000,15300000,0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,5,SC,Doctorate,74274036,8606522,1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
5,Uttar Pradesh,AGRA,Manoj Kumar Soni,BSP,Elephant,0,SC,Post Graduate,133784385,22251891,0.0,47.0,434199.0,1130.0,435329.0,22.46639,37.999125,1937690.0,MALE
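The same conversion can also be written with vectorised string methods instead of a row-by-row function. The following is a sketch of that alternative, not the method used in this notebook. The Series `raw` holds made-up examples, and the `"Not Available"` entry is an assumed kind of unparseable value.

```python
import pandas as pd

raw = pd.Series([
    "Rs 30,99,414\n ~ 30 Lacs+",
    "Rs 0\n ~",
    "Not Available",   # assumed example of an unparseable entry
    None,
])

# Capture the run of digits and commas that follows 'Rs', then drop the commas.
digits = (
    raw.str.extract(r"Rs\s*([\d,]+)", expand=False)
       .str.replace(",", "", regex=False)
)

# Anything that did not match becomes 0, and the result is stored as integers.
amounts = pd.to_numeric(digits, errors="coerce").fillna(0).astype("int64")

print(amounts.tolist())
```

Storing the amounts as integers makes later steps such as sorting or summing behave numerically rather than alphabetically.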


## Creating a dataframe for "NOTA" records 

In [26]:
nota_df=df[df["PARTY"]=="NOTA"]
nota_df

Unnamed: 0,STATE,CONSTITUENCY,NAME,WINNER,PARTY,SYMBOL,GENDER,CRIMINAL CASES,AGE,CATEGORY,EDUCATION,ASSETS,LIABILITIES,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER_map
3,Telangana,ADILABAD,NOTA,0,NOTA,,,,,,,,,13030,6,13036,0.875023,1.225214,1489790,
14,Gujarat,AHMEDABAD WEST,NOTA,0,NOTA,,,,,,,,,14580,139,14719,0.895688,1.473030,1643317,
39,West Bengal,ALIPURDUARS,NOTA,0,NOTA,,,,,,,,,21147,28,21175,1.284592,1.533114,1648383,
46,Uttarakhand,ALMORA,NOTA,0,NOTA,,,,,,,,,15311,194,15505,1.158985,2.215611,1337808,
54,Andhra Pradesh,AMALAPURAM,NOTA,0,NOTA,,,,,,,,,16427,41,16468,1.128288,1.333044,1459556,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2225,Tamil Nadu,VIRUDHUNAGAR,NOTA,0,NOTA,,,,,,,,,17087,205,17292,1.165028,1.607174,1484256,
2230,Andhra Pradesh,VISAKHAPATNAM,NOTA,0,NOTA,,,,,,,,,16626,20,16646,0.909966,1.342505,1829300,
2235,Andhra Pradesh,VIZIANAGARAM,NOTA,0,NOTA,,,,,,,,,29468,33,29501,1.961529,2.413302,1503980,
2241,Telangana,WARANGAL,NOTA,0,NOTA,,,,,,,,,18764,37,18801,1.127990,1.770886,1666770,


In [27]:
new_df.shape

(2018, 19)

## Saving the clean dataframe as a CSV file

In [29]:
new_df.to_csv("Dataset_for_EDA.csv")

## Saving the 'NOTA' records as a CSV file

In [30]:
nota_df.to_csv("Nota_dataset_for_EDA.csv")
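By default `to_csv` also writes the index, which comes back as an extra `Unnamed: 0` column when the file is read again. Whether to keep it is a design choice: the index preserves the original row numbers, while `index=False` gives a leaner file. The sketch below compares both on a hypothetical frame, using in-memory buffers instead of real files.

```python
import io
import pandas as pd

toy = pd.DataFrame({"NAME": ["A", "B"], "AGE": [52, 47]}, index=[0, 4])

# Default: the index is written as an unnamed first column.
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

# index=False: only the data columns are written.
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

If the index is kept, passing `index_col=0` to `read_csv` restores it as the index instead of a data column.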

In [28]:
new_df

Unnamed: 0,STATE,CONSTITUENCY,NAME,PARTY,SYMBOL,CRIMINAL CASES,CATEGORY,EDUCATION,ASSETS,LIABILITIES,WINNER,AGE,GENERAL VOTES,POSTAL VOTES,TOTAL VOTES,OVER TOTAL ELECTORS IN CONSTITUENCY,OVER TOTAL VOTES POLLED IN CONSTITUENCY,TOTAL ELECTORS,GENDER
0,Telangana,ADILABAD,SOYAM BAPU RAO,BJP,Lotus,52,ST,12th Pass,3099414,231450,1.0,52.0,376892.0,482.0,377374.0,25.330684,35.468248,1489790.0,MALE
1,Telangana,ADILABAD,Godam Nagesh,TRS,Car,0,ST,Post Graduate,18477888,847000,0.0,54.0,318665.0,149.0,318814.0,21.399929,29.964370,1489790.0,MALE
2,Telangana,ADILABAD,RATHOD RAMESH,INC,Hand,3,ST,12th Pass,36491000,15300000,0.0,52.0,314057.0,181.0,314238.0,21.092771,29.534285,1489790.0,MALE
4,Uttar Pradesh,AGRA,Satyapal Singh Baghel,BJP,Lotus,5,SC,Doctorate,74274036,8606522,1.0,58.0,644459.0,2416.0,646875.0,33.383823,56.464615,1937690.0,MALE
5,Uttar Pradesh,AGRA,Manoj Kumar Soni,BSP,Elephant,0,SC,Post Graduate,133784385,22251891,0.0,47.0,434199.0,1130.0,435329.0,22.466390,37.999125,1937690.0,MALE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2257,Maharashtra,YAVATMAL-WASHIM,Vaishali Sudhakar Yede,PHJSP,Whistle,0,GENERAL,10th Pass,1168500,9000,0.0,28.0,20563.0,57.0,20620.0,1.076097,1.755157,1916185.0,FEMALE
2258,Maharashtra,YAVATMAL-WASHIM,Anil Jayram Rathod,IND,SHIP,0,GENERAL,Post Graduate,4890000,1020000,0.0,43.0,14661.0,25.0,14686.0,0.766419,1.250060,1916185.0,MALE
2259,Telangana,ZAHIRABAD,B.B.PATIL,TRS,Car,18,GENERAL,Graduate,1287851556,11535000,1.0,63.0,434066.0,178.0,434244.0,28.975369,41.574183,1498666.0,MALE
2260,Telangana,ZAHIRABAD,MADAN MOHAN RAO,INC,Hand,0,GENERAL,Post Graduate,903663001,0,0.0,49.0,427900.0,115.0,428015.0,28.559732,40.977823,1498666.0,MALE
