# End-to-end Data Science project

This is the repo with the notebooks, code, and additional material used in the ITI workshop. The goal of the sessions was to illustrate the end-to-end process of a real data science project.

## Additional material

In addition to the notebooks and code, the following material is also available:

## Problem statement

Our (fictional) client is an IT educational institute. They have reached out to us with the following: “IT jobs and technologies keep evolving quickly. This makes our field one of the most interesting out there. But on the other hand, such fast development confuses our students. They do not know which skills they need to learn for which job. ‘Do I need to learn C++ to be a Data Scientist?’ ‘Do DevOps engineers and system admins use the same technologies?’ ‘I really like JavaScript; can I use it in data analytics?’ These are some of the questions that our students ask. Could you please develop a data-driven solution that answers such questions for our students? They mostly want to understand the relationships between the jobs and the technologies.”


## Level guide

| | Basic | Intermediate | Advanced |
| --- | --- | --- | --- |
| Business case | Decide on the KPIs that you will positively influence | Calculate the expected financial returns | |
| Data collection | Decide on and collect a suitable data source for your business case | Decide on, collect and connect multiple data sources for better performance | |
| Legal review | Get basic information about the local data privacy law | Study the local data privacy law | |
| Cookie Cutter | Create the standard directory structure | | |
| Git | Use Git's GUI to track on the master branch | Use Git's CLI to track on a dev branch and merge back to master | Decide on a branching strategy and solve merge conflicts |
| Environments | Install Python packages using conda | Create a dedicated conda environment | Share your environment and install it on a different machine |
| Data cleaning | Use basic statistics to filter out nonsense entries (see sketch below) | Use advanced statistics and unsupervised learning to filter out nonsense entries | Calculate a 'sanity probability' for each data point and later use it as a weight |
| Descriptive analytics | Calculate summary statistics to provide data insights (see sketch below) | Produce visualizations to provide deeper understanding | Apply unsupervised learning to provide even deeper understanding |
| Predictive analytics | Create a single baseline model | Create multiple hyper-tuned models and benchmark their performance | Combine the chosen models via an ensemble and provide prediction confidence (see sketch below) |
| Prescriptive analytics | Recommend the action that the user should take | | |
| Software Engineering | Refactor your notebooks into simple Python scripts | Create a production OOP class for predictions | Expose your model using an API (see sketch below) |
| MLOps | Export and load models from pickle files | Track your models using MLflow (see sketch below) | Create and run a Docker image for your project |
| Product | Create a Web App / GUI to expose prediction functionality | Add the relevant historical insights, predictions and optimization results | Collect users' feedback and retrain your model accordingly |
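As a minimal illustration of the basic data-cleaning level, the sketch below filters out impossible survey entries with simple range checks. The file name and columns (`survey.csv`, `Age`, `YearsCode`) are hypothetical, not the workshop's actual dataset:

```python
import pandas as pd

# Hypothetical survey file and column names, for illustration only.
df = pd.read_csv("survey.csv")

# Basic-statistics sanity filter: keep entries whose reported age and
# coding experience fall in plausible ranges and are mutually consistent.
mask = (
    df["Age"].between(15, 90)
    & df["YearsCode"].between(0, 60)
    & (df["YearsCode"] <= df["Age"] - 10)  # hardly anyone codes before ~10
)
df_clean = df[mask]
print(f"Dropped {len(df) - len(df_clean)} nonsense entries")
```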
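For the descriptive-analytics level, a simple cross-tabulation already answers questions like “which technologies go with which jobs”, which is exactly what the client asks for. The column names and toy rows below are illustrative:

```python
import pandas as pd

# Hypothetical long-format table of (job, technology) mentions.
df = pd.DataFrame({
    "job": ["Data Scientist", "Data Scientist", "DevOps", "DevOps"],
    "technology": ["Python", "SQL", "Docker", "Python"],
})

# Share of respondents in each job who mention each technology.
usage = pd.crosstab(df["job"], df["technology"], normalize="index")
print(usage.round(2))
```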
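For the advanced predictive-analytics level, a soft-voting ensemble is one way to combine models and get a confidence score from the averaged class probabilities. The sketch below runs on synthetic skill indicators and job titles; every name in it is a stand-in, not the workshop's actual models or data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one-hot skill indicators and job titles.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 5))
y = rng.choice(["Data Scientist", "DevOps"], size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Soft voting averages predict_proba across models, so the maximum
# class probability doubles as a simple prediction-confidence score.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
labels = ensemble.predict(X_test)
confidence = ensemble.predict_proba(X_test).max(axis=1)
```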
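For the software-engineering levels, one common pattern is a small OOP wrapper around a pickled pipeline, exposed through a web framework such as FastAPI. The class name, model path, and endpoint below are assumptions for the sketch, not the repo's actual interface:

```python
import pickle

from fastapi import FastAPI

app = FastAPI()

class JobPredictor:
    """Production wrapper around a fitted, pickled sklearn pipeline."""

    def __init__(self, model_path: str):
        # Assumes the pickle holds a full pipeline (encoding included).
        with open(model_path, "rb") as f:
            self.pipeline = pickle.load(f)

    def predict(self, skills: list[str]) -> str:
        return self.pipeline.predict([skills])[0]

predictor = JobPredictor("models/job_model.pkl")  # hypothetical path

@app.get("/predict")
def predict(skills: str):
    # e.g. GET /predict?skills=python,sql,docker
    return {"job": predictor.predict(skills.split(","))}
```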
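For the basic and intermediate MLOps levels, the sketch below shows the pickle export/load round trip and a minimal MLflow tracking run. The toy model and the logged parameter/metric names are illustrative, not the project's actual run setup:

```python
import pickle

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy model so the example is self-contained.
X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Basic level: export and reload via pickle.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    reloaded = pickle.load(f)

# Intermediate level: track params, metrics, and the model with MLflow.
with mlflow.start_run():
    mlflow.log_param("model_type", "LogisticRegression")
    mlflow.log_metric("train_accuracy", reloaded.score(X, y))
    mlflow.sklearn.log_model(model, "model")
```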