#  AI Glossary 


##  Table of Contents 
* [Weak AI](#weak_ai)
* [Strong AI](#strong_ai)
* [Turing test](#turing_test)
* [Data](#data)
    * [Signal](#signal)
    * [Noise](#noise)
    * [Structured Data](#structured_data)
    * [Unstructured Data](#unstructured_data)
    * [Database](#database)
    * [Data Warehouse](#data_warehouse)
    * [Data Lake](#data_lake)
* [Brute Force Search](#brute_force_search)
* [Heuristics](#heuristics)
* [Machine Learning](#machine_learning)
* [Data mining](#data_mining)
* [Model](#model)
* [Parameter](#parameter)
    * [Hyperparameter](#hyperparameter)
* [Feature Extraction](#feature_extraction)
* [Supervised Learning](#supervised_learning)
* [Label](#label)
* [Unsupervised Learning](#unsupervised_learning)
* [Regression](#regression)
* [Classification](#classification)
* [Clustering](#clustering)
* [Reinforcement Learning](#reinforcement_learning)
* [Deep learning](#deep_learning)
    * [Deep Neural Network](#deep_neural_network)
    * [Weights](#weights)
* [Data Set Split](#data_set_split)
    * [Test Data Set](#test_data_set)
    * [Training Data Set](#training_data_set)
    * [Validation Data Set](#validation_data_set)
* [GPU (Graphics Processing Unit)](#gpu_(graphics_processing_unit))
* [Natural language processing (NLP)](#natural_language_processing_(nlp))
    * [Natural language understanding (NLU)](#natural_language_understanding_(nlu))
    * [Natural language generation (NLG)](#natural_language_generation_(nlg))
* [Chatbot](#chatbot)
* [Image Recognition](#image_recognition)
* [Image Segmentation](#image_segmentation)
* [ImageNet](#imagenet)
* [Convolutional Neural Network](#convolutional_neural_network)
* [Transfer Learning](#transfer_learning)
* [Overfitting](#overfitting)
* [Underfitting](#underfitting)
* [Bias-Variance Tradeoff](#bias-variance_tradeoff)
    * [Bias](#bias)
    * [Variance](#variance)
    * [Regularization](#regularization)
* [Evaluation](#evaluation)
    * [False Positives (Type I Error)](#false_positives_(type_i_error))
    * [False Negatives (Type II Error)](#false_negatives_(type_ii_error))
    * [True Negatives](#true_negatives)
    * [True Positives](#true_positives)
    * [Accuracy](#accuracy)
    * [A/B Testing](#a/b_testing)
* [ML-Ops](#ml-ops)

---

Artificial intelligence as a discipline consists of hundreds of individual technologies, concepts, and applications. It is important to have the big picture before doing a deep dive. So let's have a review.

Artificial intelligence (AI) is a blanket term that often refers to a suite of technologies and concepts. When we talk about self-learning machines in any form, it’s usually tagged with “AI.” What AI usually refers to in practice is **"weak AI"**: the simulation of human intelligence to complete a very narrow task. This is what we are describing when we talk about chatbots, voice systems, or medical bots. The broader ability to perform any task within the realm of human intelligence is **Artificial General Intelligence (AGI)**, which remains a research goal rather than a deployed technology.

<a class="anchor" id="weak_ai"></a>


## Weak AI

Also known as narrow AI, weak AI refers to a non-sentient computer system that operates within a predetermined range of skills and usually focuses on a singular task or small set of tasks. Most AI in use today is weak AI.

<a class="anchor" id="strong_ai"></a>


## Strong AI

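An area of AI development that works toward systems as capable and versatile as the human mind across a wide range of tasks, rather than a single narrow one. Strong AI is often used interchangeably with Artificial General Intelligence (AGI).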

<a class="anchor" id="turing_test"></a>


## Turing test

A test proposed by Alan Turing that assesses a machine's ability to exhibit conversational behavior indistinguishable from a human's. A human evaluator holds natural language conversations with another human and a machine without knowing which is which; the machine passes if the evaluator cannot reliably tell them apart.

<a class="anchor" id="data"></a>


## Data

Any collection of information converted into a digital form.

<a class="anchor" id="signal"></a>


### Signal

A signal is the carrier used to communicate: AM/FM radio, smoke, a hand gesture, etc.

Data denotes the information conveyed in a signal, but data may also be stored. So a signal may convey information (data), but not all data is signaled. Data is usually associated with digital storage of information. A signal, on the other hand, need not contain any information. It is possible to derive information from some signals and store it for further processing. For example, white noise from any source (thermal noise from electronic devices, sound from traffic, etc.) is a signal, but it may or may not be converted into useful data, depending on the subject of interest.

<a class="anchor" id="noise"></a>


### Noise

Signals with no causal relation to the target function.

<a class="anchor" id="structured_data"></a>


### Structured Data

Data that is generated, captured, and analyzed in a linear, tabular, organized format is considered structured data. For example, business data generated within organizations by traditional applications is typically structured.

<a class="anchor" id="unstructured_data"></a>


### Unstructured Data

Unstructured data is data that does not fit a traditional row, column, or table format. It can originate from many sources: online digital files, text documents, SMS, video, images, voice, sensors, pings, etc. Most of the data being generated today is unstructured, and its growth is one of the driving forces behind the rise of AI.

<a class="anchor" id="database"></a>

### Database

A database is a storage location that houses structured data. We usually think of a database on a computer—holding data, easily accessible in a number of ways. Arguably, you could consider your smartphone a database on its own, thanks to all the data it stores about you.

Popular databases are:

- Oracle
- PostgreSQL
- MongoDB
- Redis
- Elasticsearch
- Apache Cassandra

<a class="anchor" id="data_warehouse"></a>

### Data Warehouse

The next step up from a database is a data warehouse. Data warehouses are large storage locations for data that you accumulate from a wide range of sources. For decades, the foundation for business intelligence and data discovery/storage rested on data warehouses. Their specific, static structures dictate what data analysis you could perform.

Data warehouses are popular with mid- and large-size businesses as a way of sharing data and content across the team- or department-siloed databases. Data warehouses help organizations become more efficient. Organizations that use data warehouses often do so to guide management decisions—all those “data-driven” decisions you always hear about.

Popular companies that offer data warehouses include:

- Snowflake
- Yellowbrick
- Teradata

<a class="anchor" id="data_lake"></a>


### Data Lake

A data lake is a large storage repository that holds a huge amount of raw data in its original format until you need it. Data lakes address the biggest limitation of data warehouses: their lack of flexibility.

The use cases for data lakes are generally limited to data science research and testing, so the primary users of data lakes are data scientists and engineers. For a company that actually builds data warehouses, for instance, the data lake is a place to dump and temporarily store all the data until the data warehouse is up and running. Small and medium-sized organizations likely have little to no reason to use a data lake.

Popular data lake platforms include:

- Hadoop
- Azure
- Amazon S3

<a class="anchor" id="brute_force_search"></a>


## Brute Force Search

A search that isn’t limited by clustering or approximations; it searches across all inputs. It is often more time-consuming and expensive, but more thorough.
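As a minimal sketch of the idea, the function below finds the stored point nearest to a query by comparing the query against every candidate. The function name `brute_force_nearest` and the sample points are illustrative assumptions, not taken from any library.

```python
import math

def brute_force_nearest(points, query):
    """Return the point closest to `query` by checking every candidate."""
    best_point = None
    best_dist = math.inf
    for p in points:  # no index, no pruning: every input is examined
        d = math.dist(p, query)
        if d < best_dist:
            best_point, best_dist = p, d
    return best_point, best_dist

# Hypothetical 2-D points, given as tuples.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (5.0, 5.0)]
nearest, distance = brute_force_nearest(points, (1.2, 0.9))
print(nearest, round(distance, 3))
```

The cost grows linearly with the number of stored points, which is why approximate or index-based methods are usually preferred for large collections.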

<a class="anchor" id="heuristics"></a>


## Heuristics

These are rules drawn from experience used to solve a problem more quickly than traditional problem-solving methods in AI. While faster, a heuristic approach typically is less optimal than the classic methods it replaces.

In other words, a heuristic, or heuristic technique, is any approach to problem-solving that uses a practical method or shortcut to produce solutions that may not be optimal but are sufficient given a limited timeframe or deadline.
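The trade-off can be sketched with a coin-change toy problem, chosen here purely for illustration. `greedy_change` is a heuristic (always take the largest coin that fits), while `exhaustive_change` tries every combination; both names and the coin set are assumptions made for this example.

```python
from itertools import combinations_with_replacement

def greedy_change(coins, amount):
    """Heuristic: always take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            used.append(coin)
            amount -= coin
    return used if amount == 0 else None

def exhaustive_change(coins, amount):
    """Brute force: try every combination, smallest coin count first."""
    for count in range(1, amount // min(coins) + 1):
        for combo in combinations_with_replacement(coins, count):
            if sum(combo) == amount:
                return list(combo)
    return None

coins = [1, 3, 4]
greedy_result = greedy_change(coins, 6)         # fast, but uses three coins
optimal_result = exhaustive_change(coins, 6)    # slower, finds a two-coin answer
print(greedy_result, optimal_result)
```

The heuristic answers immediately but returns a longer solution for this coin set, which is the "sufficient but not optimal" behavior described above.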

<a class="anchor" id="machine_learning"></a>


## Machine Learning

A field of AI focused on getting machines to act without being explicitly programmed to do so. Machines “learn” from patterns they recognize in data and adjust their behavior accordingly.

<a class="anchor" id="data_mining"></a>


## Data mining

The process by which patterns are discovered within large sets of data with the goal of extracting useful information from them.

Data mining differs from machine learning. Data mining is designed to extract the rules from large quantities of data, while machine learning teaches a computer how to learn and comprehend the given parameters. Or to put it another way, data mining is simply a method of researching to determine a particular outcome based on the total of the gathered data. On the other side of the coin, we have machine learning, which trains a system to perform complex tasks and uses harvested data and experience to become smarter.

Broadly speaking, data mining is the computer-driven process of exploring data sets, pinpointing key trends and anomalies, and subsequently analyzing these findings to form conclusions and make better decisions. Data mining is used in countless industries as a means of improving efficiency, developing crucial consumer insights, and innovating on existing business models. 

<a class="anchor" id="model"></a>


## Model

A processing block that takes inputs, such as images or videos, and returns predicted concepts.

<a class="anchor" id="parameter"></a>


## Parameter

Any characteristic that can be used to help define or classify a system. In machine learning, parameters are the internal values of a model, such as the weights of a neural network, that are learned from training data and determine how the model maps inputs to outputs.

<a class="anchor" id="hyperparameter"></a>


### Hyperparameter

Occasionally used interchangeably with parameter, although the two differ: parameters are learned from data, whereas hyperparameters are values that affect the way your model learns, such as the learning rate or the number of training passes. They are usually set manually outside the model, before training begins.
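To make the distinction concrete, here is a minimal sketch of fitting a straight line by gradient descent. In it, `w` and `b` are parameters (learned from the data), while `learning_rate` and `epochs` are hyperparameters (chosen by hand). The function name and the toy data are assumptions for illustration.

```python
def train_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: updated from the data
    n = len(xs)
    for _ in range(epochs):  # epochs and learning_rate: hyperparameters
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated from y = 2x + 1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Changing `learning_rate` or `epochs` changes how the fit proceeds, but the values of `w` and `b` are never set directly by the user.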

<a class="anchor" id="feature_extraction"></a>


## Feature Extraction

The process by which data that is too large to be processed directly is transformed into a reduced set of representative features. For example, a car might be described by its speed, number of wheels, and price.
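As a small sketch, the function below reduces a raw text string to a handful of numeric features. The particular features chosen (`num_chars`, `num_words`, and so on) are illustrative assumptions; real pipelines pick features suited to the task.

```python
def extract_features(text):
    """Reduce a raw string to a small, fixed-size feature dictionary."""
    words = text.split()
    return {
        "num_chars": len(text),
        "num_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        "num_exclamations": text.count("!"),
    }

features = extract_features("Great car! Fast, cheap, reliable!")
print(features)
```

Whatever the length of the input, the output always has the same four entries, which is what makes it usable as model input.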

<a class="anchor" id="supervised_learning"></a>


## Supervised Learning

A type of machine learning in which the model learns from labelled examples: each training input is paired with the desired output, typically supplied by humans. In supervised learning, there is a clear target outcome, and the model's objective is to reproduce it on new inputs, nothing more.

<a class="anchor" id="label"></a>


## Label

A part of training data that identifies the desired output for that particular piece of data.

<a class="anchor" id="unsupervised_learning"></a>


## Unsupervised Learning

A class of machine learning algorithms that learns patterns in data without knowing outcomes. Here, the machine is presented with totally unlabelled data, then asked to find the intrinsic patterns in or draw its own conclusions from the data.

<a class="anchor" id="regression"></a>


## Regression

A statistical method used to estimate the relationship between a dependent variable and one or more independent variables, typically in order to predict a continuous quantity.

<a class="anchor" id="classification"></a>


## Classification

Classification is the task of predicting a discrete class label. Regression is the task of predicting a continuous quantity.
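A discrete-label prediction can be sketched with a nearest-centroid classifier, one of the simplest classification methods. It is used here only as an illustration; the function names, the two-feature samples, and the labels `"small"` and `"large"` are all assumptions.

```python
import math

def nearest_centroid_fit(samples, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }

def nearest_centroid_predict(centroids, x):
    """Return the discrete label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["small", "small", "large", "large"]
centroids = nearest_centroid_fit(samples, labels)
first = nearest_centroid_predict(centroids, (2.0, 1.0))
second = nearest_centroid_predict(centroids, (7.0, 9.0))
print(first, second)
```

The output is always one of the known labels, never a value in between, which is what separates classification from regression.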

<a class="anchor" id="clustering"></a>


## Clustering

In Machine Learning, the unsupervised task of grouping a set of objects so that objects within the same group (called a cluster) are more “similar” to each other than they are to those in other groups.
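One common way to do this is k-means, sketched below for one-dimensional data. The choice of k-means, the function name `kmeans_1d`, the starting centers, and the sample values are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers: assign to nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers, clusters)
```

No labels are supplied; the two groups emerge only from how close the values are to one another.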

<a class="anchor" id="reinforcement_learning"></a>


## Reinforcement Learning

A type of machine learning in which machines are “taught” to achieve their target function through a process of experimentation and reward, receiving positive reinforcement when their actions produce the desired result and negative reinforcement when they do not. This differs from supervised learning, which would require an annotation for every individual action the algorithm takes.
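The experiment-and-reward loop can be sketched with an epsilon-greedy agent on a multi-armed bandit, a standard toy setting. The reward probabilities, `epsilon`, the seed, and the function name `run_bandit` are assumptions for illustration only.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which action pays out most often."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # running average reward per action
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
print(best_action, counts)
```

The agent is never told which action is correct; it only sees the reward that follows each choice and gradually favors the action with the highest estimated payoff.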

<a class="anchor" id="deep_learning"></a>


## Deep learning

A subset of machine learning that uses neural networks with many layers to model and understand complex structures and relationships among data and datasets.

<a class="anchor" id="deep_neural_network"></a>


### Deep Neural Network

An artificial neural network (ANN) with multiple layers between the input and output layers. It uses sophisticated mathematical modelling to process data in complex ways.

<a class="anchor" id="weights"></a>


### Weights

The connection strength between units, or nodes, in a neural network. These weights can be adjusted in a process called learning.
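A minimal sketch of weight adjustment is the classic perceptron rule applied to a single unit learning the logical AND function. The task, learning rate, and function names are assumptions for illustration; deep networks use gradient-based updates across many layers rather than this rule.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of inputs plus bias is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, targets, learning_rate=0.1, epochs=20):
    """Adjust weights and bias whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]  # logical AND
weights, bias = train_perceptron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

Each wrong prediction shifts the weights slightly toward the correct answer, which is the "learning" referred to above.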

<a class="anchor" id="data_set_split"></a>


## Data Set Split

<a class="anchor" id="test_data_set"></a>


### Test Data Set

In machine learning, the test data set is the data given to the machine after the training and validation phases have been completed. This data set is used to check the performance characteristics of the algorithms produced after the completion of the first two phases when presented with unknown data. This will give a good indication of the accuracy, sensitivity, and specificity of the algorithm’s predictive powers.

<a class="anchor" id="training_data_set"></a>


### Training Data Set

In machine learning, the training data set is the data given to the machine during the initial “learning” or “training” phase. From this data set the machine is meant to gain some insight into options for the efficient completion of its assigned task by identifying relationships within the data.

<a class="anchor" id="validation_data_set"></a>


### Validation Data Set

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
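The three-way split described in this section can be sketched as follows. The 70/15/15 proportions, the fixed seed, and the function name `split_dataset` are assumptions; appropriate proportions depend on how much data is available.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then cut into training, validation, and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out
    return train, validation, test

train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))
```

Shuffling before cutting keeps any ordering in the original data from leaking into one split, and no item appears in more than one portion.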

<a class="anchor" id="gpu_(graphics_processing_unit)"></a>


## GPU (Graphics Processing Unit)

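A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Because they perform many arithmetic operations in parallel, GPUs are also widely used to accelerate the training of machine learning models.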

<a class="anchor" id="natural_language_processing_(nlp)"></a>


## Natural language processing (NLP)

The umbrella term for any machine’s ability to perform conversational tasks, such as recognizing what is said to it, understanding the intended meaning and responding intelligibly.

<a class="anchor" id="natural_language_understanding_(nlu)"></a>


### Natural language understanding (NLU)

As a subset of natural language processing, natural language understanding deals with helping machines to recognize the intended meaning of language — taking into account its subtle nuances and any grammatical errors.

<a class="anchor" id="natural_language_generation_(nlg)"></a>


### Natural language generation (NLG)

This refers to the process by which a machine turns structured data into text or speech that humans can understand. Essentially, NLG is concerned with what a machine writes or says as the end part of the communication process.

<a class="anchor" id="chatbot"></a>


## Chatbot

A chatbot is a program that is designed to communicate with people through text or voice commands in a way that mimics human-to-human conversation.

<a class="anchor" id="image_recognition"></a>


## Image Recognition

The ability of software to identify objects, places, people, writing, and actions in images.

<a class="anchor" id="image_segmentation"></a>


## Image Segmentation

The process of dividing a digital image into multiple segments/fragments, with the goal of simplifying or changing the representation of an image into something that is easier to analyze. Segmentation divides whole images into pixel groupings, which can then be labelled and classified. Put simply, where object detection puts a bounding box around the desired object, segmentation produces a pixel-by-pixel outline of that object, separating it from the background.
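The simplest form of pixel grouping is thresholding, sketched below on a tiny grayscale image. The image values, the threshold of 128, and the function name are assumptions for illustration; practical segmentation models are far more sophisticated.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (object) or 0 (background) by brightness."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Hypothetical 4x4 grayscale image: a bright 2x2 object on a dark background.
image = [
    [10, 12, 11, 9],
    [13, 200, 210, 10],
    [12, 205, 198, 11],
    [9, 10, 12, 10],
]
mask = threshold_segment(image, threshold=128)
object_pixels = sum(sum(row) for row in mask)
for row in mask:
    print(row)
```

The resulting mask has the same shape as the image and assigns every pixel to a group, which is what distinguishes segmentation from a bounding box.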

<a class="anchor" id="imagenet"></a>


## ImageNet

A large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the ImageNet project to indicate what objects are pictured; in at least one million of the images, bounding boxes are also provided.

<a class="anchor" id="convolutional_neural_network"></a>


## Convolutional Neural Network

Convolutional neural networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes.
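The operation that gives these networks their name can be sketched in plain Python. The function below slides a small kernel over an image with no padding and a stride of 1 (strictly a cross-correlation, which is what deep learning libraries typically compute under the name convolution). The image and the `vertical_edge` kernel are assumptions for illustration; in a real network the kernel values are learned.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds where brightness rises left to right
feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is largest exactly where the dark-to-bright edge sits, showing how a single kernel detects one local pattern wherever it occurs in the image.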

<a class="anchor" id="transfer_learning"></a>


## Transfer Learning

This method of learning involves reusing a model trained on one task as the starting point for a different but related task, so that it needs less data and training time to reach good accuracy on the new task. One potential example of this is taking a model that analyzes sentiment in product reviews and adapting it to analyze sentiment in tweets.

<a class="anchor" id="overfitting"></a>


## Overfitting

A machine learning problem where an algorithm is unable to discern information that is relevant to its assigned task from information which is irrelevant within training data. Overfitting inhibits the algorithm’s predictive performance when dealing with new data.

<a class="anchor" id="underfitting"></a>


## Underfitting

A machine learning problem where an algorithm fails to capture the underlying structure of the data properly, typically because the model is either not sophisticated enough or not appropriate for the task at hand. It is the opposite of overfitting.

<a class="anchor" id="bias-variance_tradeoff"></a>


## Bias-Variance Tradeoff

The tension that arises when data scientists try to minimize bias and variance simultaneously: reducing one tends to increase the other, and an imbalance in either direction prevents supervised algorithms from generalizing beyond their training set.

<a class="anchor" id="bias"></a>


### Bias

Assumptions made by a model that simplify the process of learning to do its assigned task. Most supervised machine learning models perform better with low bias, as these assumptions can negatively affect results.

<a class="anchor" id="variance"></a>


### Variance

The amount by which the function learned by a machine learning model changes when it is trained on different training data. Despite being flexible, models with high variance are prone to overfitting and low predictive accuracy because they are overly reliant on their training data.

<a class="anchor" id="regularization"></a>


### Regularization

The process of introducing additional information, such as a penalty on model complexity, in order to prevent overfitting.
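One widely used form is an L2 (ridge) penalty, sketched here for a one-variable model with no intercept. Minimizing the sum of squared errors plus `penalty * w**2` gives the closed form used below. The data and penalty values are assumptions for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form slope for y ~ w * x with an L2 penalty on w."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + penalty
    return numerator / denominator

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unregularized = ridge_slope(xs, ys, penalty=0.0)
regularized = ridge_slope(xs, ys, penalty=14.0)
print(unregularized, regularized)
```

A larger penalty pulls the slope toward zero, trading a slightly worse fit on the training data for a simpler model that is less likely to overfit.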

<a class="anchor" id="evaluation"></a>


## Evaluation

<a class="anchor" id="false_positives_(type_i_error)"></a>


### False Positives (Type I Error)

An error where a model falsely predicts the presence of the desired outcome in an input, when in reality it is not present (Actual No, Predicted Yes).

<a class="anchor" id="false_negatives_(type_ii_error)"></a>


### False Negatives (Type II Error)

An error where a model falsely predicts an input as not having a desired outcome, when one is actually present (Actual Yes, Predicted No).

<a class="anchor" id="true_negatives"></a>


### True Negatives

Actual negatives that are correctly identified as such (Actual No, Predicted No).

<a class="anchor" id="true_positives"></a>


### True Positives

Actual positives that are correctly identified as such (Actual Yes, Predicted Yes).

<a class="anchor" id="accuracy"></a>


### Accuracy

Refers to the percentage of correct predictions the classifier made: true positives plus true negatives, divided by the total number of predictions.
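The four outcome types and accuracy can be computed together, as in this minimal sketch. The label lists are made-up illustration data, and the function names are assumptions.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = yes, 0 = no)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1  # Type I error
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # Type II error
    return counts

def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts, accuracy(counts))
```

Accuracy alone hides which kind of error dominates, so the individual counts are usually reported alongside it.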

<a class="anchor" id="a/b_testing"></a>


### A/B Testing

A controlled, real-life experiment designed to compare two variants of a system or a model, A and B.
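Whether the difference between A and B is more than chance is often checked with a two-proportion z-test, sketched below. The choice of this test, the conversion counts, and the function name are assumptions for illustration; other tests may suit other metrics.

```python
import math

def two_proportion_z(success_a, total_a, success_b, total_b):
    """Return the z statistic and two-sided p-value for rate B vs. rate A."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 users per variant.
z, p_value = two_proportion_z(success_a=100, total_a=1000,
                              success_b=130, total_b=1000)
print(round(z, 2), round(p_value, 3))
```

A small p-value suggests the observed difference between the variants is unlikely to be due to random variation alone.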

<a class="anchor" id="ml-ops"></a>


## ML-Ops

ML-Ops, or machine learning operations, is the process of taking an experimental machine learning model into a production web system. Machine learning models are developed and tested in isolated experimental systems. When a model is ready to be launched, ML-Ops is practiced jointly by data scientists, DevOps, and machine learning engineers to transition it to production systems.