Built classification models to predict the player (speaker) of each line in the Shakespeare plays dataset, using exploratory data analysis and feature engineering.
- Set up a data science project structure in a new git repository in your GitHub account
- Download the Shakespeare plays dataset from https://www.kaggle.com/kingburrito666/shakespeare-plays
- Load the dataset into pandas DataFrames
- Formulate one or two ideas, based on exploratory data analysis, for how feature engineering could add value to the dataset
- Build one or more classification models to determine the player using the other columns as features
- Document your process and results
- Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub
- The first column is the Data-Line, which simply numbers all the rows.
- The second column is the play that the lines are from.
- The third column is the number of the line being spoken at any given time.
- The fourth column is the Act-Scene-Line, which gives the act, scene and line position of any given line.
- The fifth column is the player who is saying any given line.
- The sixth column is the text of the line being spoken.
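As a minimal sketch of loading data with this layout into a pandas DataFrame, the snippet below reads a tiny inline sample instead of the real Kaggle file. The column names (`Dataline`, `Play`, `PlayerLinenumber`, `ActSceneLine`, `Player`, `PlayerLine`), the placeholder rows and the helper `load_plays` are assumptions made for illustration, not taken from the notebook.

```python
import io

import pandas as pd

# Tiny inline stand-in for the Kaggle CSV. The column names and every row
# below are hypothetical placeholders, chosen to match the six columns
# described above.
SAMPLE_CSV = """Dataline,Play,PlayerLinenumber,ActSceneLine,Player,PlayerLine
1,Play A,,,,ACT I
2,Play A,1,1.1.1,PLAYER X,placeholder line one
3,Play A,1,1.1.2,PLAYER X,placeholder line two
4,Play A,2,1.1.3,PLAYER Y,placeholder line three
"""


def load_plays(source):
    """Read the CSV and drop rows that have no player (e.g. act headings)."""
    df = pd.read_csv(source)
    return df.dropna(subset=["Player"]).reset_index(drop=True)


df = load_plays(io.StringIO(SAMPLE_CSV))
print(df[["Play", "Player", "ActSceneLine"]])
```

With the real dataset, `load_plays` would be called with the path to the downloaded CSV file instead of the in-memory sample.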
- Train classification models to predict the Player from the Shakespeare plays dataset.
- Explore the Shakespeare plays dataset.
- Use feature engineering to transform the data so that the classification models achieve the best possible accuracy.
- Compare different models to identify the best one for this dataset.
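One feature engineering idea that fits these goals can be sketched as follows: split the Act-Scene-Line value into separate numeric columns, one-hot encode the play, and fit a classifier on the result. This is an illustration under stated assumptions, not the notebook's actual pipeline: the column names, the toy rows, the helper `add_act_scene_line` and the choice of a decision tree are all hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def add_act_scene_line(df):
    """Split 'ActSceneLine' (e.g. '1.2.15') into integer Act/Scene/Line columns."""
    parts = df["ActSceneLine"].str.split(".", expand=True).astype(int)
    parts.columns = ["Act", "Scene", "Line"]
    return pd.concat([df, parts], axis=1)


# Hypothetical toy rows standing in for the real dataset.
toy = pd.DataFrame({
    "Play": ["Play A"] * 4 + ["Play B"] * 4,
    "PlayerLinenumber": [1, 1, 2, 2, 1, 1, 2, 2],
    "ActSceneLine": ["1.1.1", "1.1.2", "1.1.3", "1.1.4"] * 2,
    "Player": ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"],
})

features = add_act_scene_line(toy)

# One-hot encode the play name; keep the numeric columns as they are.
X = pd.get_dummies(
    features[["Play", "PlayerLinenumber", "Act", "Scene", "Line"]],
    columns=["Play"],
)
y = features["Player"]

# The decision tree is an assumed example model, not a documented choice.
model = DecisionTreeClassifier(random_state=0).fit(X, y)
train_accuracy = model.score(X, y)
print(f"training accuracy on the toy rows: {train_accuracy:.2f}")
```

The score printed here is training accuracy on a handful of made-up rows, so it says nothing about real performance; a meaningful evaluation would hold out part of the real data for testing.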
After exploring the dataset, we found that the large amount of data available for each play helps build a more accurate model.
- performed pretty well compared to the others.
- not too bad performance.
- worst performance overall.
- best performing model.
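A comparison of this kind can be run by scoring several classifiers with the same cross-validation setup. The sketch below is only an assumed illustration: the three classifiers are example choices (not necessarily the models evaluated in this project), and `make_classification` generates synthetic stand-in data rather than features from the Shakespeare dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the real features would come from the notebook.
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=4,
    n_classes=3,
    random_state=0,
)

# Hypothetical candidate models, chosen only for illustration.
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
print("best:", best_name)
```

Using the same folds and metric for every candidate keeps the ranking comparable across models.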
- Project description
- Contains raw data in csv format
- Jupyter Notebook for Exploratory data analysis, Visualization, Feature Engineering and Classification.
- Plot: number of players vs. play
- Plot: play count
- Plot: player count
- Info about the tools, frameworks and libraries required to reproduce the workflow
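The counts behind the three plots listed above can be computed with a few pandas aggregations. This is a rough sketch on hypothetical toy rows; the column names `Play` and `Player` and the variable names are assumptions, and the notebook may compute the plots differently.

```python
import pandas as pd

# Hypothetical toy rows: one row per spoken line.
lines = pd.DataFrame({
    "Play": ["Play A", "Play A", "Play A", "Play B", "Play B"],
    "Player": ["P1", "P2", "P1", "P3", "P3"],
})

# Number of distinct players in each play (players vs. play).
players_per_play = lines.groupby("Play")["Player"].nunique()

# Number of lines in each play (play count).
lines_per_play = lines["Play"].value_counts()

# Number of lines spoken by each player (player count).
lines_per_player = lines["Player"].value_counts()

print(players_per_play)
print(lines_per_play)
print(lines_per_player)
```

Each result is a pandas Series, so with matplotlib installed it can be drawn as a bar chart via `players_per_play.plot(kind="bar")`.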