The repository makes use of two files:
- 'sales_pipeline.csv' - Contains all sales-related data.
- 'interactions.csv' - Contains all interactions between the salesperson and the customer.
There are 5 Jupyter notebooks:
- EDA + ML - Builds a prediction model using only the numeric features. Also contains EDA of the entire dataset.
- NLP Model - Uses text and numeric features for classification. Also contains EDA of all text features. Model used: Multinomial Naive Bayes with TF-IDF and bag-of-words (BoW) vectorizers.
- Logistic Model - Uses a Logistic Regression model for classification, with hyperparameter tuning, on TF-IDF and BoW vectorizers.
- All Numeric Training - Builds prediction models on the numeric features. Algorithms used: KNN, GBM, Ridge, Lasso, SVR, Random Forests.
- All Predictions - Contains all the models used for classification, with the results documented for each.
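
For readers who want a feel for the text-classification approach named in the NLP Model and Logistic Model notebooks, here is a minimal sketch using scikit-learn. It is not the notebooks' actual code: the sample texts, the labels, the `build_pipeline` helper, and the grid of `C` values are all assumptions made up for illustration, and the real column names in 'interactions.csv' are not shown here.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up interaction notes and outcomes (1 = won, 0 = lost).
# These are illustrative only and are NOT taken from interactions.csv.
texts = [
    "customer asked for a quote and a demo",
    "customer signed the contract after the demo",
    "customer requested pricing and agreed to terms",
    "customer confirmed the order and asked for an invoice",
    "no reply after several follow up emails",
    "customer said the budget was cut this year",
    "customer chose a competitor product",
    "call was cancelled and not rescheduled",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_pipeline(vectorizer, classifier):
    """Hypothetical helper: chain a text vectorizer and a classifier."""
    return Pipeline([("vec", vectorizer), ("clf", classifier)])


# The vectorizer/classifier pairings mentioned in the notebook descriptions.
candidates = {
    "bow + naive bayes": build_pipeline(CountVectorizer(), MultinomialNB()),
    "tfidf + naive bayes": build_pipeline(TfidfVectorizer(), MultinomialNB()),
    "tfidf + logistic": build_pipeline(
        TfidfVectorizer(), LogisticRegression(max_iter=1000)
    ),
}

# Fit every candidate on the toy data.
fitted = {name: pipe.fit(texts, labels) for name, pipe in candidates.items()}

# Hyperparameter tuning for the logistic model: grid search over the
# regularization strength C (the grid values are an assumption).
search = GridSearchCV(
    build_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    cv=2,
    scoring="accuracy",
)
search.fit(texts, labels)

print("best C:", search.best_params_["clf__C"])
print(fitted["tfidf + naive bayes"].predict(["customer signed the contract"]))
```

Wrapping the vectorizer and classifier in one `Pipeline` keeps the vocabulary fitted only on the training folds during cross-validation, which avoids leaking test text into the features.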
Thanks