<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/MLPG-Book-Cover-Small.png"><br>

This notebook contains an excerpt from the **`Machine Learning Project Guidelines - For Beginners`** book written by *Balasubramanian Chandran*; the content is available [on GitHub](https://github.com/BalaChandranGH/Books/ML-Project-Guidelines).

<br>
<!--NAVIGATION-->

<[ [Other Considerations - Databases](18.10-mlpg-Other-Considerations-Databases.ipynb) | [Contents and Acronyms](00.00-mlpg-Contents-and-Acronyms.ipynb) | [Other Considerations - Testing](18.12-mlpg-Other-Considerations-Testing.ipynb) ]>

# 18. Other Considerations

## 18.11. Python Libraries

### 18.11.1. PyCaret library
* PyCaret is a Python version of the popular and widely used caret machine learning package in R
* The goal of the caret package is to automate the major steps for evaluating and comparing machine learning algorithms for classification and regression. The main benefit of the library is that a lot can be achieved with very few lines of code and little manual configuration. The PyCaret library brings these capabilities to Python
* It is well suited both to seasoned data scientists who want to increase the productivity of their ML experiments and to citizen data scientists and those new to data science with little or no background in coding

**PyCaret for Comparing Machine Learning Models (Example given below):**
![](figures/MLPG-OC-PyCaret1.png)

**Tuning Machine Learning Models (Example given below):**
![](figures/MLPG-OC-PyCaret2.png)<br>
&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; Images credit [ (Source) ](https://machinelearningmastery.com/pycaret-for-machine-learning/)

### 18.11.2. Python packages for ML & Data Science – A must-know
Top 24 Python packages that are most commonly used in Data Science, Data Engineering, and ML projects.

#### `18.11.2.1. Data processing, analysis, and manipulation packages`
#### 18.11.2.1.1. **`NumPy`**
* Fundamental package for scientific computing
* Powerful N-dimensional arrays
  - Fast, versatile, vectorization, indexing, broadcasting, etc.
* Numerical computing tools
  - Offers comprehensive mathematical functions, random number generators, linear algebra routines, basic statistical operations, selecting, sorting, I/O, data transforms, data manipulation, and more
* Interoperable 
  - Supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries
* Performance 
  - The core of NumPy is well-optimized C code that runs very fast
* Easy to use
  - The high-level syntax makes it accessible and productive for programmers from any background or experience level
* Open-source
  - Developed and maintained [publicly on GitHub](https://github.com/numpy/numpy) by a vibrant, responsive, and diverse [community](https://numpy.org/community)
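
The array features listed above (vectorization, broadcasting, indexing, linear algebra, random number generation) can be sketched in a few lines. This is a minimal illustration, not a tour of the library; the array contents and variable names are made up for the example.

```python
import numpy as np

# A 3x4 array of hypothetical measurements (rows = samples, columns = features)
data = np.arange(12, dtype=float).reshape(3, 4)

# Vectorization: one expression operates on every element, with no Python loop
scaled = data * 2.0

# Broadcasting: the per-column mean (shape (4,)) is stretched across all rows
col_means = data.mean(axis=0)
centered = data - col_means

# Boolean indexing: select the elements above the overall mean (5.5)
above_avg = data[data > data.mean()]

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Random number generation with a seeded generator for reproducibility
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=5)
```

Subtracting `col_means` from `data` works without any explicit loop or tiling because NumPy aligns the trailing dimensions of the two arrays.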

#### 18.11.2.1.2. **`Pandas`**
* An open-source, fast, powerful, flexible, and easy-to-use data analysis and manipulation tool, built on top of the Python programming language
* It offers data structures and operations for manipulating numerical tables and time-series data
* It provides one of the most useful sets of tools to explore, clean, and analyze data
* With Pandas, you can load, prepare, manipulate, and analyze all kinds of structured data
* Many ML libraries also accept Pandas DataFrames as input
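
As a rough sketch of the load, prepare, manipulate, and analyze workflow described above, the example below builds a small DataFrame in memory. The column names and values are hypothetical; in a real project the data would typically come from a file or database.

```python
import pandas as pd

# Hypothetical sales records, including one missing value
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 4, None, 6, 8],
    "price": [2.5, 3.0, 2.5, 3.0, 4.0],
})

# Clean: fill the missing unit count with the column median (7.0 here)
df["units"] = df["units"].fillna(df["units"].median())

# Manipulate: derive a new column from existing ones
df["revenue"] = df["units"] * df["price"]

# Analyze: aggregate revenue per region
revenue_by_region = df.groupby("region")["revenue"].sum()
```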

#### 18.11.2.1.3. **`SciPy`**
* An open-source library and one of the core packages that make up the SciPy stack/ecosystem
* It provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics
* SciPy is mainly used for its scientific and mathematical functions, which are built on NumPy arrays
* Some useful functions that this library provides are stats functions, optimization functions, and signal processing functions
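
The routines mentioned above (integration, optimization, statistics) might be used as follows. This is a small sketch with made-up sample data, chosen only so that the results are easy to check by hand.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Numerical integration: the integral of x^2 over [0, 3] is exactly 9
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 3)

# Optimization: find the minimum of (x - 2)^2 + 1, which lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

# Statistics: two-sample t-test on two small hypothetical samples
sample_a = np.array([5.1, 4.9, 5.3, 5.0, 5.2])
sample_b = np.array([5.8, 6.1, 5.9, 6.0, 6.2])
t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
```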

#### 18.11.2.1.4. **`Statsmodels`** module
* A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration
* It is a great library for doing hardcore statistics
* This multifunctional library builds on several other Python libraries: it takes its graphical features from Matplotlib, uses Pandas for data handling and Patsy for R-like formulas, and is built on NumPy and SciPy
* Specifically, it’s useful for creating stats models, like OLS, and also for performing statistical tests
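
A minimal OLS sketch using the R-like formula interface is shown below. The data is synthetic (generated with a known intercept of 3 and slope of 2) so that the fitted coefficients can be compared with the true values; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: y depends linearly on x plus a little Gaussian noise
rng = np.random.default_rng(seed=42)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Fit ordinary least squares using an R-style formula
model = smf.ols("y ~ x", data=df).fit()

# The fitted parameters should be close to the true values 3 and 2
intercept = model.params["Intercept"]
slope = model.params["x"]
```

Calling `model.summary()` prints the usual regression table, including standard errors, t-statistics, and p-values for each coefficient.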

#### 18.11.2.1.5. **`Python-dateutil`** module
* A module that provides powerful extensions to the standard datetime module, available in Python
* Most importantly it handles the time zone issues 
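
The sketch below shows three common uses of the module: parsing a free-form date string, converting between time zones, and calendar-aware arithmetic. The dates are arbitrary examples, and the time zone lookup assumes that a time zone database is available to `dateutil`.

```python
from datetime import datetime

from dateutil import parser, tz
from dateutil.relativedelta import relativedelta

# Parse a free-form date string without specifying a format
parsed = parser.parse("March 5, 2021 14:30")

# Attach a time zone, then convert to another one (Tokyo is UTC+9, no DST)
utc_time = parsed.replace(tzinfo=tz.UTC)
tokyo_time = utc_time.astimezone(tz.gettz("Asia/Tokyo"))

# Calendar-aware arithmetic: one month after January 31 is clipped
# to the last day of February
next_month = datetime(2021, 1, 31) + relativedelta(months=1)
```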

#### `18.11.2.2. Visualization packages`

#### 18.11.2.2.1. **`Matplotlib`**
* An open-source comprehensive library for creating static, animated, and interactive visualizations 

#### 18.11.2.2.2. **`Seaborn`**
* An open-source Python data visualization library for making statistical graphics; it is built on top of Matplotlib and integrates closely with Pandas data structures
* One of the most important features of Seaborn is that it produces rich statistical visuals with little code. Correlations that are not obvious initially can be displayed in a visual context, helping Data Scientists understand the data and the models better

#### 18.11.2.2.3. **`Plotly`**
* An open-source Python graphing library that makes interactive, publication-quality graphs
* Supports over 40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and 3-dimensional use-cases
* A must-know tool for building visualizations since it is extremely powerful, easy to use, and has a big benefit of being able to interact with the visualizations

#### `18.11.2.3. Machine Learning packages`

#### 18.11.2.3.1. **`Scikit-Learn`**
* An open-source ML library for Python and arguably the most important library for ML
* Simple and efficient tools for predictive data analysis
* Built on NumPy, SciPy, and Matplotlib
* Used to build ML models as it has lots of tools for predictive modeling and analysis
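
A typical predictive-modeling workflow with Scikit-Learn might look like the sketch below. The bundled Iris dataset and the choice of logistic regression are assumptions made for the example, not recommendations from the text.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small bundled dataset and hold out 25% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing and the model so both are fit on training data only
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Evaluate on the held-out data
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

Wrapping the scaler and the model in one pipeline keeps the test data out of the scaling statistics, which avoids a common source of data leakage.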

#### 18.11.2.3.2. **`XGBoost`**
* An open-source optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable
* Implements ML algorithms under the Gradient Boosting framework
* Provides a parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way
* The same code runs on the major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples
* One of the most popular ML algorithms for any type of prediction task at hand; regression or classification
* It is well known for often providing better solutions than other ML algorithms, and it has become a "state-of-the-art" ML algorithm for dealing with structured data

#### `18.11.2.4. Text processing/Natural Language Processing packages`

#### 18.11.2.4.1. **`NLTK`** (Natural Language Toolkit)
* An open-source Python package for Natural Language Processing (NLP)
* A leading platform for building Python programs to work with human language data
* It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning
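
Tokenization and stemming, two of the text processing steps listed above, can be sketched as follows. The example deliberately uses only components that, to my knowledge, need no separately downloaded corpora (a regex-based tokenizer and the Porter stemmer); the sentence is made up.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The runners were running quickly through the parks."

# Tokenization: split the text into word and punctuation tokens
tokens = wordpunct_tokenize(text)

# Stemming: reduce each token to a crude root form,
# for example "running" becomes "run" and "parks" becomes "park"
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
```

Other tokenizers and the corpora such as WordNet require a one-time download through `nltk.download`.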

#### `18.11.2.5. Networks & Graphs packages`

#### 18.11.2.5.1. **`NetworkX`**
* An open-source Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It provides:
  - Data structures for graphs, digraphs, and multigraphs 
  - Many standard graph algorithms 
  - Network structure and analysis measures 
  - Generators for classic graphs, random graphs, and synthetic networks 
  - Nodes can be "anything" (e.g., text, images, XML records) 
  - Edges can hold arbitrary data (e.g., weights, time series) 
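
The sketch below builds a small weighted graph and runs a standard algorithm on it. The node names, attributes, and weights are invented for illustration.

```python
import networkx as nx

G = nx.Graph()

# Nodes can be any hashable object and can carry attributes
G.add_node("a", kind="text")

# Edges can hold arbitrary data; here each edge has a numeric weight
G.add_weighted_edges_from([
    ("a", "b", 1.0),
    ("b", "c", 2.0),
    ("a", "c", 5.0),
])
G.add_edge("c", "d", weight=1.0, label="bridge")

# A standard graph algorithm: weighted shortest path from "a" to "d"
path = nx.shortest_path(G, "a", "d", weight="weight")
length = nx.shortest_path_length(G, "a", "d", weight="weight")

# A simple structure measure: the degree of every node
degrees = dict(G.degree())
```

Here the route through `b` (total weight 4.0) beats the direct `a`-`c` edge followed by `c`-`d` (total weight 6.0).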

#### `18.11.2.6. Geographic Processing (Maps) packages`

#### 18.11.2.6.1. **`Leaflet`**
* Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps. Weighing just about 39 KB of JS, it has all the mapping features most developers ever need
* Leaflet is designed with simplicity, performance, and usability in mind
* It works efficiently across all major desktop and mobile platforms, can be extended with lots of plugins, and has a beautiful, easy-to-use, and well-documented API

#### 18.11.2.6.2. **`Folium`**
* An open-source Python library used for visualizing geospatial data
* A Python wrapper for `Leaflet.js` which is a leading open-source JavaScript library for plotting interactive maps
* Folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the `Leaflet.js` library
* Makes it easy to visualize data that’s been manipulated in Python on an interactive `Leaflet` map
* The library has several built-in tilesets from OpenStreetMap, MapQuest Open, MapQuest Open Aerial, Mapbox, and Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys
* Folium supports both GeoJSON and TopoJSON overlays, as well as the binding of data to those overlays to create choropleth maps with color-brewer color schemes.

#### `18.11.2.7. Image and video processing packages`

#### 18.11.2.7.1. **`OpenCV`** (Open-Source Computer Vision Library)
* An open-source computer vision and ML library (basically a library for image and video processing)
* The library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and ML algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. 
* Cross-platform support: It has C++, Python, and Java interfaces and supports Linux, macOS, Windows, iOS, and Android

#### `18.11.2.8. Deep Learning packages`

#### 18.11.2.8.1. **`TensorFlow`**
* An end-to-end open-source platform for ML
* It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML, and developers easily build and deploy ML applications
* It helps to easily train and deploy models in the cloud, on-premises, in the browser, or on-device no matter what language you use
* Because it is highly parallel, it can train multiple neural networks across multiple GPUs for highly efficient and scalable models
* Multi-dimensional arrays are called `tensors`

#### 18.11.2.8.2. **`Keras`**
* Keras is a powerful and easy-to-use free open-source Python library for developing and evaluating deep learning models
* Earlier multi-backend releases wrapped the efficient numerical computation libraries `Theano` and `TensorFlow`; Keras allows you to define and train neural network models in just a few lines of code
* Keras is a deep learning API written in Python, running on top of the ML platform TensorFlow
* It was developed with a focus on enabling fast experimentation: _`Being able to go from idea to result as fast as possible`_
* It is simple, flexible, and powerful
* TensorFlow 2 is an end-to-end, open-source ML platform
* _`Keras is the high-level API of TensorFlow 2`_: an approachable, highly-productive interface for solving ML problems, with a focus on modern deep learning. It provides essential abstractions and building blocks for developing and shipping ML solutions with high iteration velocity
* _`Keras empowers engineers and researchers to take full advantage of the scalability and cross-platform capabilities of TensorFlow 2`_: you can run Keras on TPU or large clusters of GPUs, or CPUs, and you can export your Keras models to run in browsers or on mobile devices or servers

#### 18.11.2.8.3. **`PyTorch`**
* An open-source ML framework that accelerates the path from research prototyping to production deployment
* A Python ML package based on Torch (a library written in the Lua programming language), originally developed by Facebook (FB)
* An optimized tensor library primarily used for Deep Learning applications using GPUs and CPUs
* It is one of the most widely used ML libraries, alongside TensorFlow and Keras
* PyTorch has become highly popular relative to TensorFlow and Keras, particularly for research work
* Features:
  - Tensor computation (like NumPy) with strong GPU acceleration
  - Automatic differentiation for building and training neural networks
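
The two features above can be illustrated with a short sketch: tensors behave much like NumPy arrays, and gradients are computed automatically. The values are arbitrary, and the example falls back to the CPU when no GPU is available.

```python
import torch

# Automatic differentiation: track operations on x, then backpropagate
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()

# For y = sum(x^2), the gradient dy/dx is 2x
grad = x.grad

# Tensor computation on the GPU when one is available, otherwise on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.ones(2, 3, device=device)
b = a @ a.T  # matrix product with shape (2, 2)
```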

#### `18.11.2.9. Deployment packages`

#### 18.11.2.9.1. **`Gradio`**
* Gradio is an open-source Python library for rapidly building simple, customizable UI components for ML models, any API, or any arbitrary function in only a couple of lines of code
* It makes it easier to play with models in web browsers by just dragging and dropping images, entering text, or recording your voice, and seeing the output live in an interactive way
* Gradio works with a wide range of media: text, pictures, video, and sound
* `Uses of Gradio:`
  - Used to generate an easy-to-use demo for ML models or functions with only a few lines of code
  - Lets you quickly create customizable UI components around TensorFlow or PyTorch models, or even arbitrary Python functions, and mix and match components to support any combination of inputs and outputs
  - Fast, easy setup
    - Gradio can be installed directly through pip. Creating a Gradio interface only requires adding a couple of lines of code to the project
  - Present and share
    - Gradio can be embedded in Python notebooks or presented as a webpage
    - A Gradio interface can automatically generate a public link you can share with colleagues that lets them interact with the model on your computer remotely from their own devices
  - Permanent hosting
    - Once you've created an interface, you can point Gradio towards the GitHub repository where it is contained
    - Gradio will host the interface on its servers and provide you with a link you can share

#### `18.11.2.10. Web Scraping packages`

#### 18.11.2.10.1. **`Scrapy`**
* An open-source and collaborative framework for extracting the data from websites, in a fast, simple, yet extensible way
* Scrapy is a Python framework for large scale web scraping, maintained by Zyte
* It gives you all the tools to efficiently **extract** data from websites, **process** them as you want, and **store** them in your preferred structure and format
* Features: 
  - Fast and powerful - write the rules to extract the data and let Scrapy do the rest
  - Easily extensible - extensible by design, plug in new functionality easily without touching the core
  - Portable, Python - written in Python and runs on Linux, Windows, Mac

#### 18.11.2.10.2. **`Beautiful Soup`**
* A Python library for pulling data out of HTML and XML files
* It creates a parse tree for parsed pages that can be used to extract data from HTML, which is useful for web scraping
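
A minimal sketch of navigating the parse tree is shown below. The HTML snippet is hard-coded and hypothetical; in a scraping job it would come from an HTTP response.

```python
from bs4 import BeautifulSoup

# A hypothetical page fragment
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Widget</a></li>
    <li class="item"><a href="/p/2">Gadget</a></li>
  </ul>
</body></html>
"""

# Build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: read the heading text
title = soup.h1.get_text()

# Search the tree: collect the text and target of every product link
links = [
    (li.a.get_text(), li.a["href"])
    for li in soup.find_all("li", class_="item")
]
```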

#### `18.11.2.11. Distributed Environment packages`

#### 18.11.2.11.1. **`PySpark`**
* A Python API for Spark that lets Python programs work with Resilient Distributed Datasets (RDDs) in Apache Spark
* PySpark is an interface for Apache Spark in Python
* It not only allows you to write Spark applications using Python APIs but also provides the PySpark shell for interactively analyzing your data in a distributed environment
* PySpark supports most of Spark’s features such as Spark SQL, DataFrame, Streaming, MLlib (Machine Learning), and Spark Core

#### `18.11.2.12. HTTP client packages`

#### 18.11.2.12.1. **`urllib3`**
* urllib3 is a powerful, user-friendly HTTP client for Python
* Much of the Python ecosystem already uses urllib3
* urllib3 brings many critical features that are missing from the Python standard libraries:
  - Thread safety
  - Connection pooling
  - Client-side TLS/SSL verification
  - File uploads with multipart encoding
  - Helpers for retrying requests and dealing with HTTP redirects
  - Support for gzip, deflate, and brotli encoding
  - Proxy support for HTTP and SOCKS
  - 100% test coverage
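
Connection pooling and retry helpers, two of the features listed above, might be configured as in the sketch below. The retry counts, timeouts, and the `fetch_text` helper are hypothetical choices for illustration; the helper is defined but not called, so the example makes no network request.

```python
import urllib3
from urllib3.util import Retry

# Retry policy: up to 3 attempts with backoff, also retrying on these statuses
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])

# A PoolManager reuses connections across requests and is thread-safe
http = urllib3.PoolManager(
    retries=retries,
    timeout=urllib3.Timeout(connect=2.0, read=5.0),
)


def fetch_text(url):
    """Hypothetical helper: GET a URL and return (status, decoded body)."""
    response = http.request("GET", url)
    return response.status, response.data.decode("utf-8")
```

Creating one `PoolManager` and sharing it across the application, rather than creating one per request, is what allows connections to be reused.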

#### `18.11.2.13. Package Manager package`

#### 18.11.2.13.1. **`Pip`**
* pip is a package manager for Python packages/libraries or modules
  - A package contains all the files you need for a module
  - Modules are Python code libraries you can include in your project
* pip is the standard package manager for Python that allows you to install and manage additional packages that are not part of the Python standard library
* pip is written in Python
