Predict-party-affiliation-Democrat-or-Republican

Task project:

Predict the party affiliation ('Democrat' or 'Republican') of US House of Representatives Congressmen based on how they voted.

Introduction:

I'll be working with a dataset obtained from the UCI Machine Learning Repository consisting of votes made by US House of Representatives Congressmen. My goal will be to predict their party affiliation ('Democrat' or 'Republican') based on how they voted on certain key issues.

Dataset structure:

Class Name: 2 (democrat, republican)
handicapped-infants: 2 (y,n)
water-project-cost-sharing: 2 (y,n)
adoption-of-the-budget-resolution: 2 (y,n)
physician-fee-freeze: 2 (y,n)
el-salvador-aid: 2 (y,n)
religious-groups-in-schools: 2 (y,n)
anti-satellite-test-ban: 2 (y,n)
aid-to-nicaraguan-contras: 2 (y,n)
mx-missile: 2 (y,n)
immigration: 2 (y,n)
synfuels-corporation-cutback: 2 (y,n)
education-spending: 2 (y,n)
superfund-right-to-sue: 2 (y,n)
crime: 2 (y,n)
duty-free-exports: 2 (y,n)
export-administration-act-south-africa: 2 (y,n)

(*) Missing Attribute Values: Denoted by "?"
(**) 'y' means the congressman voted for the key issue; 'n' means the opposite.
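The layout above can be loaded along these lines. This is a minimal sketch, not the notebook's exact code: the three rows are made-up samples in the same layout as the raw file (no header row, "?" for a missing vote), the `COLUMNS` name is hypothetical, and the target column is called `party` as in the guidelines.

```python
import io

import pandas as pd

# Target column first, then the 16 key issues in the order listed above.
COLUMNS = [
    "party",
    "handicapped-infants",
    "water-project-cost-sharing",
    "adoption-of-the-budget-resolution",
    "physician-fee-freeze",
    "el-salvador-aid",
    "religious-groups-in-schools",
    "anti-satellite-test-ban",
    "aid-to-nicaraguan-contras",
    "mx-missile",
    "immigration",
    "synfuels-corporation-cutback",
    "education-spending",
    "superfund-right-to-sue",
    "crime",
    "duty-free-exports",
    "export-administration-act-south-africa",
]

# Made-up sample rows standing in for the real CSV file (assumption).
sample_csv = io.StringIO(
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y\n"
    "democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n\n"
    "democrat,n,y,y,n,?,y,n,n,n,n,y,n,y,n,n,y\n"
)

# header=None: the file has no header row; na_values="?" turns "?" into NaN.
df = pd.read_csv(sample_csv, header=None, names=COLUMNS, na_values="?")
print(df.isna().sum().sum())  # number of missing votes in the sample
```

With the real data, the `io.StringIO` object would be replaced by the path to the CSV file.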

Guidelines:

(I recommend you read my guideline along with Democrat or Republican.ipynb)

A. Preprocessing data

The data has no header row, so I import it and create the header myself. There are 16 subjects (key issues) that were voted on, as listed above.
Based on the missing-value count, only 392 of the 6960 data points are missing (around 5.6%). However, calling .dropna() would remove nearly half of all observations, because it deletes every row that contains a missing value. Dropping them all is unacceptable, so the missing values are imputed with SimpleImputer instead.
Before filling in the missing values, I converted every 'y' and 'n' to 1 and 0, because KNeighborsClassifier in sklearn is designed to work with numeric data only.
Impute the missing data. (Note: I ran into some problems using SimpleImputer with pandas DataFrames, so I converted the data to NumPy arrays, which drops the header again.) Afterwards I convert the result back to a pandas DataFrame for visualization.
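The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions: the four-row frame is a hypothetical toy stand-in for the real data, and `strategy="most_frequent"` is my assumption, since this README does not say which imputation strategy the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: three of the 16 issues, NaN marks a missing vote.
votes = pd.DataFrame({
    "party": ["republican", "democrat", "democrat", "republican"],
    "crime": ["y", "n", np.nan, "y"],
    "immigration": ["y", np.nan, "n", "n"],
    "mx-missile": ["n", "y", "y", np.nan],
})
features = votes.drop(columns="party")

# Few cells are missing, yet dropna() discards every row containing one.
missing_cells = int(features.isna().sum().sum())
total_cells = features.size
rows_after_dropna = len(votes.dropna())
print(f"{missing_cells}/{total_cells} cells missing, "
      f"{rows_after_dropna}/{len(votes)} rows survive dropna()")

# Convert 'y'/'n' to 1/0; missing votes stay NaN.
encoded = features.apply(lambda col: col.map({"y": 1, "n": 0}))

# Impute on a NumPy array, then rebuild the DataFrame with its header.
imputer = SimpleImputer(strategy="most_frequent")  # strategy is an assumption
imputed_array = imputer.fit_transform(encoded.to_numpy())
imputed = pd.DataFrame(
    imputed_array, columns=encoded.columns, index=encoded.index
).astype(int)
print(imputed)
```

Passing the column names back in when rebuilding the DataFrame restores the header that was lost in the conversion to a NumPy array.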

B. Fit/train model

As noted above, KNeighborsClassifier works with numeric data, so the target 'party' is encoded with LabelEncoder() (1: republican, 0: democrat).
The model's accuracy is 94.83% on the training dataset and 97.7% on the testing dataset, which is quite good.
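A rough sketch of the fitting step is shown below. Everything specific in it is an assumption: the votes are synthetic random data rather than the real dataset, and `n_neighbors=6`, `test_size=0.3`, and `random_state=42` are illustrative values, not the notebook's settings. It will therefore not reproduce the accuracies reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 200 members, 16 yes/no votes each.
party = rng.choice(["democrat", "republican"], size=200)
is_republican = (party == "republican")[:, None]
yes_probability = np.where(is_republican, 0.8, 0.2)
X = (rng.random((200, 16)) < yes_probability).astype(int)

# LabelEncoder sorts classes alphabetically: democrat -> 0, republican -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(party)

# Hold out part of the data to measure accuracy on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=6)  # hyperparameter is an assumption
knn.fit(X_train, y_train)

train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print(f"train accuracy: {train_acc:.4f}, test accuracy: {test_acc:.4f}")
```

Comparing the two scores is a quick check for overfitting: a training score far above the testing score would suggest the model memorizes rather than generalizes.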

C. Make a prediction

A random unlabeled data point has been generated for prediction and is available as X_new (unseen data).

You can still call the .predict() method on the X that was used to fit the model (think of it as doing an exercise whose answer has already been shown to you), but that is not a good indicator of performance.
You will use the classifier to predict the label for this new data point, as well as for the training data X that the model has already seen.

The predicted result needs to be converted from a numeric value (1 or 0) to 'republican' or 'democrat' using a for-in loop with if-else.
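The prediction and label conversion can be sketched like this. The training data here is a deliberately trivial, hypothetical toy set (republicans vote yes on everything, democrats vote no), and this `X_new` holds two made-up rows rather than the point stored in x_new.csv; `n_neighbors=3` is likewise an assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy training set: 4 republicans (all 1s), 4 democrats (all 0s).
X = np.array([[1] * 16] * 4 + [[0] * 16] * 4)
y = np.array([1] * 4 + [0] * 4)  # 1: republican, 0: democrat

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Made-up unseen data points with the same 16 features.
X_new = np.array([[1] * 16, [0] * 16])
predictions = knn.predict(X_new)

# Convert numeric predictions back to party names with for-in and if-else.
labels = []
for value in predictions:
    if value == 1:
        labels.append("republican")
    else:
        labels.append("democrat")

print(labels)
```

If the LabelEncoder fitted on the target is still available, its `inverse_transform` method performs the same conversion without an explicit loop.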

LinhNguyen-MyLi/Predict-Democrat-or-Republican
