# Data Science Workflow with Python Cheatsheet

-----

## 1. Import Data

- Pandas
- sqlite3
- SQLAlchemy
- Context Manager
- Glob
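
Here is a minimal sketch of how several of these tools fit together. It assumes pandas is installed. `glob` collects files that match a pattern, `pd.concat` stacks them, and `contextlib.closing` wraps a `sqlite3` connection so that it is guaranteed to close. The helper names `load_csv_folder` and `load_sql` are hypothetical, and so is the sample data. pandas' `read_sql_query` also accepts a SQLAlchemy engine or connection in place of the raw `sqlite3` connection.

```python
import glob
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd


def load_csv_folder(folder: str) -> pd.DataFrame:
    """Read every *.csv file in `folder` (sorted by name) into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no CSV files in {folder!r}")
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)


def load_sql(db_path: str, query: str) -> pd.DataFrame:
    """Run `query` against a SQLite file and return the result as a DataFrame."""
    # `with sqlite3.connect(...)` alone only commits or rolls back; it does NOT
    # close the connection, so wrap it in contextlib.closing().
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(query, conn)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample files, created only for this demo.
        pd.DataFrame({"id": [1, 2], "sales": [10, 20]}).to_csv(
            os.path.join(tmp, "jan.csv"), index=False)
        pd.DataFrame({"id": [3], "sales": [30]}).to_csv(
            os.path.join(tmp, "feb.csv"), index=False)
        combined = load_csv_folder(tmp)

        db = os.path.join(tmp, "demo.db")
        with closing(sqlite3.connect(db)) as conn:
            combined.to_sql("sales", conn, index=False)
        print(load_sql(db, "SELECT SUM(sales) AS total FROM sales"))
```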
--------

## 2. Wrangle Data

__Pandas__

- data structures
- groupby
- joins and merge
- reshape
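
This short pandas sketch covers the wrangling verbs above: a join, a group-by, and a reshape to wide and back. The `orders` and `customers` tables are hypothetical. Passing `validate="many_to_one"` makes `merge` raise an error if a customer ID is duplicated, which catches accidental row explosion early.

```python
import pandas as pd

# Hypothetical toy tables, for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 10, 20, 30],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "amount": [100.0, 50.0, 70.0, 30.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 20, 30],
    "region": ["East", "West", "East"],
})

# Join: attach each customer's region to their orders (left join keeps every order).
merged = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")

# Group by: total amount and number of orders per region (named aggregation).
by_region = merged.groupby("region", as_index=False).agg(
    total=("amount", "sum"), n_orders=("order_id", "count")
)

# Reshape wide: one row per region, one column per month.
wide = merged.pivot_table(index="region", columns="month", values="amount",
                          aggfunc="sum", fill_value=0)

# Reshape long again: back to one (region, month, amount) row per cell.
long = wide.reset_index().melt(id_vars="region", var_name="month", value_name="amount")

print(by_region)
print(wide)
```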
-------- 

## 3. Transformations

__Pandas__

- text
- time series
- categorical
- missing values
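
This pandas sketch applies each transformation above to one hypothetical messy table. It uses the `.str` accessor for text, `pd.to_datetime` plus the `.dt` accessor for dates, `pd.Categorical` for a fixed set of levels, and `isna`/`fillna` for gaps. Median imputation is only an illustrative choice, not a general recommendation.

```python
import pandas as pd

# Hypothetical raw data with messy text, string dates, and gaps.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", None, "carol"],
    "signup": ["2024-01-15", "2024-02-03", "2024-02-20", None],
    "plan": ["basic", "pro", "basic", "pro"],
    "spend": [10.0, None, 5.0, 20.0],
})

df = raw.copy()

# Text: trim whitespace and normalize case via the .str accessor (missing stays missing).
df["name"] = df["name"].str.strip().str.title()

# Time series: parse to datetime (unparseable or missing -> NaT), then pull parts via .dt.
df["signup"] = pd.to_datetime(df["signup"], errors="coerce")
df["signup_month"] = df["signup"].dt.month

# Categorical: a fixed, ordered set of levels (memory-efficient, supports ordered comparisons).
df["plan"] = pd.Categorical(df["plan"], categories=["basic", "pro"], ordered=True)

# Missing values: count per column, then fill numeric gaps with the column median.
missing_counts = df.isna().sum()
df["spend"] = df["spend"].fillna(df["spend"].median())

print(df)
print(missing_counts)
```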
------ 

## 4. Visualization

- matplotlib
- plotly
- plotnine
- seaborn
- bokeh
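
Below is a minimal matplotlib sketch that uses the object-oriented `fig, ax` interface. It assumes the non-interactive `Agg` backend so that it runs without a display. It writes the PNG to an in-memory buffer rather than a file. The revenue figures are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue, for illustration only.
sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                      "revenue": [120, 135, 128, 150]})

# Object-oriented interface: explicit Figure and Axes objects.
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(sales["month"], sales["revenue"], marker="o")
ax.set_title("Monthly revenue (sample data)")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
fig.tight_layout()

# Save to an in-memory PNG; pass a filename such as "revenue.png" to write to disk instead.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
plt.close(fig)  # free the figure's memory
```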
-------

## 5. Modeling

- scikit-learn
- statsmodels
- TensorFlow
- Keras
- PyCaret
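
Here is a minimal scikit-learn sketch of the split, fit, and score loop. The bundled iris dataset stands in for real data. Because `StandardScaler` sits inside the pipeline, it is fit on the training split only, so no test-set statistics leak into training.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled toy dataset used as stand-in data (no download needed).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Preprocessing + model in one estimator; scaling is learned from X_train only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```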
------- 

## 6. Results Communication

- JupyterLab
- Dash
- Streamlit
- Flask
- FastAPI
------------

# Special Topics
------

## 1. Machine Learning

- Scikit-learn - ML in Python
- H2O - Scalable ML & AutoML
- PyCaret - Low-Code ML
- Boosting: XGBoost, CatBoost, LightGBM (packages); AdaBoost (algorithm, available in scikit-learn)
-------

## 2. Feature Engineering

- scikit-learn - Data Transformations (scaling, imputation, encoding)
- category_encoders - Categorical Encoding
- imbalanced-learn - Resampling for Imbalanced Datasets
--------

## 3. Text Analysis and NLP

- NLTK - Text Tokenization & Modeling
- spaCy - NLP using Cython for Speed
------- 

## 4. Recommendation Systems

- Annoy - Approximate Nearest Neighbors
- LightFM - Hybrid Recommendation Models (Collaborative + Content-Based)
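
Annoy's job is fast nearest-neighbor lookup over item vectors, such as embeddings learned by a model like LightFM. As a baseline, this is an exact brute-force version of that lookup in NumPy. Annoy approximates the same query with a forest of random-projection trees so that it scales to very large catalogs. The helper `top_k_similar` and the 2-D vectors are hypothetical.

```python
import numpy as np


def top_k_similar(item_vectors: np.ndarray, query_idx: int, k: int = 2) -> list[int]:
    """Exact top-k neighbors of one item by cosine similarity, excluding the item itself."""
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    unit = item_vectors / np.clip(norms, 1e-12, None)  # avoid division by zero
    sims = unit @ unit[query_idx]                       # cosine similarity to every item
    sims[query_idx] = -np.inf                           # never recommend the query item
    return np.argsort(-sims)[:k].tolist()


# Hypothetical 2-D item embeddings (e.g. latent factors from a recommender).
items = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.7, 0.7],
])
print(top_k_similar(items, query_idx=0, k=2))
```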
-------- 

## 5. Deep Learning

- TensorFlow & Keras
- PyTorch
- MXNet, Gluon & GluonTS
--------  

## 6. Image Processing & Computer Vision

- OpenCV - Open Source Computer Vision
- scikit-image - Image Processing
- Pillow - Python Imaging Library (PIL) Fork
------- 

## 7. Time Series Forecasting

- statsmodels - Time Series Analysis
- sktime - Scikit-Learn Extension for Time Series
- pytimetk - Time Series Analysis
- GluonTS - MXNet/Gluon DL for Time Series

### Time Series Features

- pytimetk - Time Series Feature Engineering
- TSFresh - Time Series Feature Engineering
- tslearn - Machine Learning for Time Series (clustering, classification, DTW)
- Pandas - Time Series
- Arrow - Human-Friendly Dates & Times
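
Here is a pandas-only sketch of common hand-built time series features (lags, rolling windows, calendar parts) on a hypothetical daily series. pytimetk offers helpers for this kind of feature engineering. The rolling mean is computed on the shifted series, so each row sees only past values and the current target does not leak into its own features.

```python
import pandas as pd

# Hypothetical daily series, for illustration only.
ts = pd.DataFrame(
    {"value": [10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0]},
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
)

features = ts.assign(
    lag_1=ts["value"].shift(1),                             # previous day's value
    rolling_mean_3=ts["value"].shift(1).rolling(3).mean(),  # mean of the 3 previous days
    day_of_week=ts.index.dayofweek,                         # Monday=0 ... Sunday=6
    month=ts.index.month,
)

# The first rows lack a full history; drop them before modeling.
model_ready = features.dropna()
print(model_ready)
```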
------- 

## 8. Exploratory Data Analysis

- skimpy (`skim`)
- pandas-profiling (now ydata-profiling)
- Sweetviz
- Lux
-------- 

## 9. Web Scraping

- Beautiful Soup (beautifulsoup4) - Extract Data from HTML
- requests-html - HTML Parsing
------- 

## 10. Web Apps & API

- FastAPI - Web Framework for Building APIs in Python
- Dash & Streamlit - Data Science Web Frameworks
- Flask - Web Development
--------

## 11. MLOps

- PyCaret - MLflow Integration
- MLflow - ML Lifecycle: Tracking, Deployment
- Metaflow - Scalable Data Science Workflows on AWS
--------

## 12. ETL & Automations

- joblib - Lightweight Parallelism & Caching for Python Functions
- Airflow, Prefect - Workflow Scheduling & Monitoring
- Ansible - Deployment Automation
-------- 

## 13. Speed & Scale

- datatable - Fast Data Frames with a C++ Engine
- Dask - Parallel Pandas & Scikit-Learn
- PySpark - Spark Clusters
---------

## 14. Libraries Ported from R to Python

- datatable - data.table port
- plotnine - ggplot2 port
- siuba & plydata - dplyr/tidyr ports
-------- 

## 15. Cloud

- Azure - Azure Python SDK
- Google Cloud - GCP Python SDK
- boto3 (AWS) - AWS Python SDK
---------

## 16. Reporting

- python-pptx - PowerPoint Documents 
- python-docx - Word Documents
- pdfminer - Text extraction from PDF
- textract - Extract text from any document
- PyPDF2 (now pypdf) - Split, Merge & Transform PDF Pages
------- 