ML project: an experiment in predicting a book's success (measured by whether it receives an award).

chapost1/books-success-prediction-experiment

Books Success Prediction Experiment

Suppose we were a large book publisher and a writer came to us with a book: how could we know whether it will be successful? And if we were the authors of the book, could we ever know whether it will win the audience's sympathy or even reach the cinema? To answer these questions, we set a goal for our research: to see whether we can build a model that predicts, from a list of a book's features, whether the book is successful enough to receive an award.
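The goal above can be read as a binary classification problem: each book is a row of features, and the target is whether it received at least one award. The snippet below is only a minimal sketch of that framing, not the notebook's implementation. The records, the field names (`rating`, `pages`, `editions`, `awards`), and the threshold baseline are all hypothetical and chosen for illustration; the F1 score is used as the metric because awarded books are typically a minority class.

```python
from typing import Dict, List

# Hypothetical book records; field names and values are illustrative only
# and are not taken from the scraped dataset.
books: List[Dict] = [
    {"title": "A", "rating": 4.3, "pages": 320, "editions": 40, "awards": 2},
    {"title": "B", "rating": 3.6, "pages": 210, "editions": 3, "awards": 0},
    {"title": "C", "rating": 4.1, "pages": 540, "editions": 25, "awards": 1},
    {"title": "D", "rating": 3.9, "pages": 180, "editions": 12, "awards": 0},
]


def to_label(book: Dict) -> int:
    """Binary target: 1 if the book received at least one award, else 0."""
    return 1 if book["awards"] > 0 else 0


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [to_label(b) for b in books]

# Naive baseline (an assumption for illustration): predict "awarded"
# whenever a book has at least 10 editions.
y_pred = [1 if b["editions"] >= 10 else 0 for b in books]

score = f1_score(y_true, y_pred)
print(f"Baseline F1: {score:.2f}")
```

A trained model such as a decision tree or random forest would replace the threshold rule while keeping the same labels and metric, so a baseline like this gives a reference point for judging whether the model adds anything.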

Table of contents

  1. Introduction
  2. Imports
  3. Data acquisition
    3.1 Scraping challenges
    3.2 Scraping clean data
    3.3 Authentication process
    3.4 Authentication class
    3.5 Scraping Process
    3.6 Book Spider Class
    3.7 Scraping route creation
    3.8 Genre spider
  4. Scraping and threading
    4.1 First crawl
    4.2 Concatenating data
    4.3 Total data scraped
  5. Data cleaning
    5.1 Corrupted data cleaning
    5.2 Replace missing data - original title
    5.3 None values - discussion and strategy
  6. Pre outliers cleaning EDA
    6.1 Genre distribution
    6.2 Mean rating by genre
    6.3 Language distribution
    6.4 Edition count to rating
    6.5 Rating to award
    6.6 Pages count to books count
  7. Dealing with outliers
    7.1 Outliers detection
    7.2 Outliers cleaning
    7.3 Outliers cleaning results
  8. EDA after outliers cleaning
    8.1 Thoughts on the results
    8.2 Aggregation metrics
    8.3 Original title correlation with awards
    8.4 Awards count per genre
    8.5 Awards percentage by genre
  9. Machine learning preparation
  10. Machine learning - Decision tree
    10.1 Single decision tree
    10.2 First prediction
    10.3 New dimension - The ace in the sleeve
    10.4 Depth optimization
  11. Machine learning - Random forest
    11.1 Overfitting?
    11.2 Model improvement
    11.3 Adjusting features
    11.4 Grid search many forests
    11.5 F-score accuracy addition
    11.6 Random states tests
  12. Conclusion and credits

For the implementation, see the hosted notebook:

https://chapost1.github.io/books-success-prediction-experiment/
