# Data Science Workflow with Python Cheatsheet

-----

## 1. Import Data

- [Pandas](http://pandas.pydata.org/pandas-docs/stable/index.html) [CS](https://github.com/pandas-dev/pandas/raw/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)
- [sqlite3](https://docs.python.org/3/library/sqlite3.html)
- [SQLAlchemy](https://docs.sqlalchemy.org/en/14/dialects/index.html)
- [Context Manager](https://book.pythontips.com/en/latest/context_managers.html)
- [Glob](https://docs.python.org/3/library/glob.html)
--------
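
The import tools from section 1 (glob, context managers, and sqlite3) can be combined into one small loader. The sketch below is only an illustration: the `sales` table, the `region`/`amount` columns, and the `jan.csv`/`feb.csv` file names are hypothetical, and the standard-library `csv` module stands in for a pandas reader to keep the example dependency-free.

```python
import csv
import glob
import os
import sqlite3
import tempfile
from contextlib import closing


def load_csvs_into_sqlite(pattern, conn, table="sales"):
    """Load every CSV matching `pattern` into `table` and return the row count."""
    # Hypothetical schema: each CSV is assumed to have `region` and `amount` columns.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (region TEXT, amount REAL)")
    for path in sorted(glob.glob(pattern)):
        # The `with` block closes the file even if parsing raises.
        with open(path, newline="") as fh:
            rows = [(r["region"], float(r["amount"])) for r in csv.DictReader(fh)]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def demo():
    # Write two small, made-up CSV files into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        samples = [
            ("jan.csv", [("east", 10.0), ("west", 5.5)]),
            ("feb.csv", [("east", 7.0)]),
        ]
        for name, rows in samples:
            with open(os.path.join(tmp, name), "w", newline="") as fh:
                writer = csv.writer(fh)
                writer.writerow(["region", "amount"])
                writer.writerows(rows)
        # closing() ensures the connection is closed when the block exits.
        with closing(sqlite3.connect(":memory:")) as conn:
            count = load_csvs_into_sqlite(os.path.join(tmp, "*.csv"), conn)
            total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
    return count, total
```

Calling `demo()` loads both files into an in-memory database and returns the row count and the summed amount. With pandas installed, `pandas.read_sql_query` could read the same table straight into a DataFrame.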

## 2. Wrangle Data

- [NumPy](https://numpy.org/) - data manipulation

- [data structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html)
- [groupby](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)
- [joins & merge](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)
- [reshape (pivot)](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html)
-------- 
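
The three pandas operations linked in section 2 (groupby, merge, and pivot) are often chained together. A minimal sketch, assuming pandas is installed; the `orders` and `customers` frames and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "month": ["jan", "feb", "jan", "feb"],
    "amount": [10.0, 20.0, 5.0, 8.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["east", "west", "east"],
})

# groupby: total amount per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# merge: attach each customer's region to their orders (left join on the key).
merged = orders.merge(customers, on="customer_id", how="left")

# pivot: one row per region, one column per month, summed amounts.
by_region = merged.pivot_table(
    index="region", columns="month", values="amount",
    aggfunc="sum", fill_value=0,
)
```

Here `how="left"` keeps every order even if a customer record is missing, and `fill_value=0` replaces region/month combinations that have no orders.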

## 3. Transformations

- [text](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html)
- [time series](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)
- [categorical](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html)
- [missing values](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html)
------ 
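
Each transformation type linked in section 3 maps to a pandas accessor or method. The sketch below touches all four, assuming pandas is installed; the column names and values are invented for the example.

```python
import pandas as pd

# Hypothetical raw data with messy text, string dates, and a missing score.
df = pd.DataFrame({
    "name": ["  Alice ", "BOB", "carol"],
    "signup": ["2021-01-05", "2021-02-10", "2021-02-20"],
    "plan": ["free", "pro", "free"],
    "score": [1.0, None, 3.0],
})

# text: trim whitespace and normalize capitalization via the .str accessor.
df["name"] = df["name"].str.strip().str.title()

# time series: parse strings to datetimes, then extract the month via .dt.
df["signup"] = pd.to_datetime(df["signup"])
df["signup_month"] = df["signup"].dt.month

# categorical: store a low-cardinality column as a category dtype.
df["plan"] = df["plan"].astype("category")

# missing values: fill the gap with the column mean (one of several options).
df["score"] = df["score"].fillna(df["score"].mean())
```

Mean imputation is used only for brevity; whether to fill, drop, or flag missing values depends on the dataset.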

## 4. Visualization

- [matplotlib](https://matplotlib.org/index.html)
- [plotly](https://plotly.com/python/) [CS](https://images.plot.ly/plotly-documentation/images/plotly_js_cheat_sheet.pdf)
- [plotnine](https://plotnine.readthedocs.io/en/stable/)
- [seaborn](http://seaborn.pydata.org/)
- [bokeh](http://bokeh.org/)
-------

## 5. Modeling

- [scikit-learn](https://scikit-learn.org/stable/documentation.html)
- [statsmodels](https://www.statsmodels.org/devel/)
- [TensorFlow](https://www.tensorflow.org/)
- [Keras](https://keras.io/)
- [PyCaret](https://pycaret.org/)
------- 

## 6. Results Communication

- [JupyterLab](https://jupyterlab.readthedocs.io/)
- [Dash](https://plotly.com/dash/)
- [Streamlit](https://www.streamlit.io/)
- [Flask](http://flask.pocoo.org/)
- [FastAPI](https://fastapi.tiangolo.com/)
------------

# Special Topics
------

## 1. Machine Learning

- [Scikit-learn](https://scikit-learn.org/stable/) - ML in Python
- [H2O](https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/intro.html) - AutoML & Scalable ML
- [PyCaret](https://pycaret.org/) - Low Code ML
- ML Packages: [XGBoost](https://xgboost.readthedocs.io/en/latest/), [CatBoost](https://catboost.ai/), [LightGBM](https://lightgbm.readthedocs.io/en/latest/)
-------

## 2. Feature Engineering

- [Sklearn Data Transformations](https://scikit-learn.org/stable/data_transforms.html)
- [category_encoders](http://contrib.scikit-learn.org/category_encoders/) - Categorical Encoding
- [imbalanced-learn](https://imbalanced-learn.org/stable/) - Resampling for Imbalanced Datasets
--------

## 3. Text Analysis and NLP

- [NLTK](http://www.nltk.org/) - Text Tokenization & Modeling
- [spaCy](https://spacy.io/) - NLP using Cython for Speed
------- 

## 4. Recommendation Systems

- [Annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors
- [LightFM](https://making.lyst.com/lightfm/docs/home.html) - Popular recommendation algorithms
-------- 

## 5. Deep Learning

- [TensorFlow](https://www.tensorflow.org/) & [Keras](https://keras.io/)
- [PyTorch](https://github.com/pytorch/pytorch)
- [MXNet](https://github.com/dmlc/mxnet), [Gluon](http://gluon.ai/) & [GluonTS](https://ts.gluon.ai/index.html)
--------  

## 6. Image Processing & Computer Vision

- [OpenCV](https://opencv.org/) - Open Source Computer Vision
- [scikit-image](http://scikit-image.org/) - Image Processing
- [Pillow](https://python-pillow.org/) - Python Imaging Library
------- 

## 7. Time Series Forecasting

- [statsmodels](https://www.statsmodels.org/devel/user-guide.html#time-series-analysis) - Time Series Analysis
- [sktime](https://www.sktime.org/en/latest/) - Scikit-Learn Extension for Time Series
- [GluonTS](https://ts.gluon.ai/) - MXNet/Gluon DL for Time Series

### Time Series Features

- [pytimetk](https://business-science.github.io/pytimetk/) - Simplifying Time Series Analysis
- [TSFresh](https://tsfresh.readthedocs.io/en/latest/) - Time Series Feature Engineering
- [tslearn](https://tslearn.readthedocs.io/en/stable/) - Time Series Features
- [Pandas](https://pandas.pydata.org/docs/user_guide/timeseries.html) - Time Series
- [Arrow](https://arrow.readthedocs.io/en/latest/) - Human-Friendly Time
------- 

## 8. Exploratory Data Analysis

- [skimpy](https://pypi.org/project/skimpy/) - data profiling
- [pandas-profiling](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/)
- [SweetViz](https://github.com/fbdesignpro/sweetviz)
- [lux](https://github.com/lux-org/lux)
-------- 

## 9. Web Scraping

- [beautifulsoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) - Extract data from HTML
- [requests-html](https://github.com/psf/requests-html) - HTML Parsing
------- 

## 10. Web Apps & API

- [FastAPI](https://fastapi.tiangolo.com/) - Web Framework for building APIs in Python
- [Dash](https://plotly.com/dash/) & [Streamlit](https://www.streamlit.io/) - Data Science Web Frameworks
- [Flask](https://flask.palletsprojects.com/en/1.1.x/) - Web Development
--------

## 11. MLOps

- [PyCaret MLflow Integration](https://pycaret.org/mlflow/)
- [MLflow](https://www.mlflow.org/docs/latest/index.html) - ML Lifecycle, Tracking, Deployment
- [Metaflow](https://metaflow.org/) - Scalable AWS Jobs for DS
--------

## 12. ETL & Automations

- [joblib](https://joblib.readthedocs.io/en/latest/) - Lightweight pipelining, caching, and parallel Python jobs
- [Airflow](https://github.com/apache/airflow), [Prefect](https://docs.prefect.io/latest/) - Workflow Scheduling & Monitoring
- [Ansible](https://www.ansible.com/) - Deployment Automation
-------- 

## 13. Speed & Scale

- [datatable](https://datatable.readthedocs.io/en/latest/) - C++ Speed Up
- [Dask](https://dask.org/)[(CS)](https://rapids.ai/assets/files/cheatsheet.pdf) - Parallel Pandas & Scikit Learn
- [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) - Spark Clusters
---------

## 14. Libraries Ported from R to Python

- [datatable](https://datatable.readthedocs.io/en/latest/) - data.table port
- [plotnine](https://plotnine.readthedocs.io/en/stable/) - ggplot2 port
- [siuba](https://siuba.readthedocs.io/en/latest/) & [plydata](https://plydata.readthedocs.io/en/stable/index.html) - dplyr/tidyr ports
-------- 

## 15. Cloud

- [Azure](https://docs.microsoft.com/en-us/azure/developer/python/) - Azure Python SDK
- [Google Cloud](https://github.com/googleapis/google-cloud-python) - GCP Python SDK
- [boto3](https://github.com/boto/boto3) - AWS Python SDK
---------

## 16. Reporting

- [python-pptx](https://github.com/scanny/python-pptx) - PowerPoint Documents 
- [python-docx](https://github.com/scanny/python-docx) - Word Documents
- [pdfminer](https://github.com/euske/pdfminer) - Text extraction from PDF
- [textract](https://textract.readthedocs.io/en/stable/) - Extract text from any document
- [PyPDF2](https://github.com/mstamy2/PyPDF2) - Read, split, and merge PDF documents
------- 