Time_Series_ML_Project

Context

This project is a time series forecasting task: predicting store sales using data from Corporation Favorita, a major grocery retailer based in Ecuador. The goal is to develop a model that predicts the unit sales of the items sold across Favorita stores more accurately.

Procedure

This document describes the steps followed at each stage of the project to meet its objectives.

Steps

Data Collection: The time series sales data used in this project comes from several provided sources: a SQL Server database consisting of tables 1, 2, and 3, and CSV files from designated zip files and OneDrive. The data includes columns such as store_nbr, family, sales, and onpromotion, along with the files test.csv, transaction.csv, sample_submission.csv, stores.csv, oil.csv, and holidays_events.csv.
Data Loading: The collected data is loaded into the code and transformed into a suitable format for analysis. The pyodbc package is used to connect to the SQL Server database and fetch data from a given table. The data from the CSV files is read using the pandas library and concatenated with the SQL data to create a comprehensive dataset.
Data Evaluation (EDA): Exploratory data analysis is performed to gain insights into the dataset. This includes summarizing the data, checking for duplicates, handling missing values, and using visual analyses together with SARIMA and the Augmented Dickey-Fuller (ADF) test to spot patterns and trends in the data. The pandas, numpy, matplotlib, and seaborn libraries are used for data manipulation and visualization.
Data Processing and Engineering: The dataset undergoes data processing steps to cleanse and preprocess it. These steps involve addressing missing values, transforming categorical variables, and potentially generating new features. Techniques from the pandas library are applied to prepare the dataset for subsequent analysis.
Hypothesis Testing: Time series-related hypotheses are formulated and subjected to statistical testing using methods from the scipy library. Hypothesis tests, such as the Chi-Square Test, Independence Test, and t-test, are employed to assess the significance of various factors.
Answering Questions with Visualizations: Essential inquiries concerning time series are addressed through informative visualizations. Utilizing the matplotlib and seaborn libraries, we create meaningful plots and charts that effectively illustrate the relationships between variables and time series data.
Power BI Deployment: The analysis and visualizations were deployed in Power BI, enabling interactive exploration and sharing with stakeholders. The insights obtained from the analysis were presented effectively using Power BI's dashboarding and reporting features.
Train and Evaluate Four Models: Four machine learning models, namely ARIMA, SARIMA, XGBoost Regressor, and CatBoost Regressor, are trained and evaluated using both the imbalanced and balanced datasets. The evaluation metrics used to assess model performance are Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Root Mean Squared Logarithmic Error (RMSLE).
Evaluate Chosen Model (Advanced Model Improvement): For selected models, GridSearchCV is used for hyperparameter tuning. This yields the best-tuned models and their optimized parameters, and predictions are then made with these refined models.
Future Predictions: The trained and validated time series model can be utilized to make predictions on new, unseen data. This enables businesses to forecast various time-dependent outcomes and take proactive measures accordingly. The model can be deployed in production to continuously monitor and predict future events or trends.
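
The evaluation metrics named in the model training step can be sketched in plain Python as follows. This is a minimal illustration, not the project's own code: the function names and the sample values are assumptions made for this example, and the log-based metric is taken here to be the root mean squared logarithmic error computed with log(1 + x), which requires non-negative inputs.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error.

    Uses log1p(x) = log(1 + x) so that zero sales are handled.
    Assumes every value is non-negative.
    """
    squared = sum(
        (math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)
    )
    return math.sqrt(squared / len(actual))


# Hypothetical unit sales, for illustration only (not project data).
actual = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 12.0, 18.0, 33.0]

print("MAE:  ", mae(actual, predicted))
print("RMSE: ", rmse(actual, predicted))
print("RMSLE:", rmsle(actual, predicted))
```

Because the logarithmic metric compares log-transformed values, it reflects relative rather than absolute differences, which can be useful when sales volumes vary widely between items and stores.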

Installation

Packages

Authors and Acknowledgement

The table below lists the initial contributors to the project, with their GitHub IDs and the articles each wrote to document their individual perspective on the project.

Project LP3	Contributors	GitHub Profile
1.	Israel Anaba Ayamga	Israel-Anaba
2.	Isaac Sarpong	IsaacSarpong
3.	Peter Mutwiri	PETERMUTWIRI
4.	Emmanuel Morkeh	Ekmorkeh

Conclusion

In conclusion, this project tackles a time series forecasting problem. Using time-dependent data and the modeling techniques described above, we produced sales forecasts and gained insight into the temporal patterns in the dataset. Time series analysis of this kind supports informed decisions and planning for the future.

License

This project is released under the MIT License (see MIT-LICENSE.txt), an open-source license widely used for distributing and sharing software and code.

IsaacSarpong/Sales-Forecasting-Project-at-Favorita
