Sales Forecast Engine

The repository uses two data files:

'sales_pipeline.csv' - All sales-related data
'interactions.csv' - All interactions between the salesperson and the customer
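
As a rough illustration of how the two files could be combined, the sketch below joins per-deal interaction text onto the sales records with pandas. The column names (`deal_id`, `amount`, `stage`, `note`) and the sample rows are hypothetical stand-ins, not the actual schema of the files; in real use the in-memory buffers would be replaced by the paths 'sales_pipeline.csv' and 'interactions.csv'.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the two files.
# All column names and values here are hypothetical.
pipeline_csv = io.StringIO(
    "deal_id,amount,stage\n"
    "1,1200.0,won\n"
    "2,800.0,lost\n"
)
interactions_csv = io.StringIO(
    "deal_id,note\n"
    "1,Customer asked for a demo\n"
    "1,Sent pricing details\n"
    "2,No reply after follow-up\n"
)

pipeline = pd.read_csv(pipeline_csv)
interactions = pd.read_csv(interactions_csv)

# Collapse the interactions to one text field per deal,
# then attach that text to the matching sales record.
notes = interactions.groupby("deal_id")["note"].agg(" ".join).reset_index()
combined = pipeline.merge(notes, on="deal_id", how="left")
```

A left join keeps every sales record even when no interaction was logged for it; whether the notebooks join the files this way is an assumption.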

There are 5 Jupyter notebooks:

EDA + ML - Exploratory data analysis (EDA) of the entire dataset, plus a prediction model built on the numeric features only.
NLP Model - Classification using text and numeric features, with EDA of all text features. Model used: Multinomial Naive Bayes with TF-IDF and bag-of-words (BoW) vectorizers.
Logistic Model - Classification with a Logistic Regression model, including hyperparameter tuning, using TF-IDF and BoW vectorizers.
All Numeric Training - Model building on the numeric features for prediction. Algorithms used: KNN, GBM, Ridge, Lasso, SVR, Random Forests.
All Predictions - Contains all the models used for classification, with their results documented.
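
The text-classification setup described for the NLP and Logistic notebooks can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the notebooks' actual code: the sample notes, the labels (1 = deal won, 0 = deal lost), and the value `C=10.0` are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical interaction notes and outcome labels (1 = won, 0 = lost).
texts = [
    "customer requested a demo and pricing",
    "signed contract after demo",
    "customer happy with pricing and signed",
    "no reply after several calls",
    "customer cancelled the meeting",
    "no budget, customer cancelled",
]
labels = [1, 1, 1, 0, 0, 0]

# One pipeline per vectorizer/classifier combination named in the list above.
models = {
    "bow_nb": make_pipeline(CountVectorizer(), MultinomialNB()),
    "tfidf_nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "tfidf_logreg": make_pipeline(
        TfidfVectorizer(), LogisticRegression(C=10.0, max_iter=1000)
    ),
}

# Fit each pipeline on the raw text and classify one unseen note.
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(["customer signed after pricing demo"])[0]
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted only on the training text. For the hyperparameter tuning mentioned for the Logistic notebook, a pipeline like `tfidf_logreg` could be passed to `GridSearchCV` to search over `C`; which parameters the notebook actually tunes is not stated here.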

Thanks

2000siddharth/Sales_pred_plus_NLP