# Automated Feature Engineering

* Often a predictive model's performance is limited by its features — you can tune the model for the best parameters and still not have the best performing model. 
* Identifying and engineering features that clearly demonstrate the predictive signal is paramount to model performance.
* The single biggest technical hurdle that machine learning algorithms must overcome is their need for processed data in order to work — they can only make predictions from numeric data. 
* The process of extracting these numeric features is called “feature engineering.”

## Deep Feature Synthesis (DFS)
* Developed at MIT in 2014
* DFS generates many of the same features that a human data scientist would create.

* There are three key concepts in understanding Deep Feature Synthesis:

* **1) Features are derived from relationships between the data points in a dataset.**
    * DFS performs feature engineering for multi-table and transactional datasets commonly found in databases or log files.

* **2) Across datasets, many features are derived by using similar mathematical operations.**
    * Dataset-agnostic operations are called “primitives.”
    
* **3) New features are often composed from previously derived features.**
    * Primitives are the building blocks of DFS. 
    * Because primitives define their input and output types, we can stack them to construct complex features that mimic the ones that humans create today.
    * DFS can apply primitives across relationships between entities, so features can be created from datasets with many tables. 
    * We can control the complexity of the features we create by setting a maximum depth for our search.
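
The stacking idea in concept 3 can be sketched in plain Python. This is only an illustration, not the Featuretools API: the tables, the column names, and the `aggregate` helper are all hypothetical. It applies one aggregation primitive across the transactions-to-sessions relationship (depth 1), then a second primitive on top of that derived feature across the sessions-to-customers relationship (depth 2).

```python
from statistics import mean

# Hypothetical toy data: customers -> sessions -> transactions.
sessions = [
    {"session_id": 1, "customer_id": "a"},
    {"session_id": 2, "customer_id": "a"},
    {"session_id": 3, "customer_id": "b"},
]
transactions = [
    {"session_id": 1, "amount": 10.0},
    {"session_id": 1, "amount": 30.0},
    {"session_id": 2, "amount": 20.0},
    {"session_id": 3, "amount": 5.0},
]

def aggregate(children, key, column, primitive):
    """Apply an aggregation primitive to `column`, grouped by the parent `key`."""
    groups = {}
    for row in children:
        groups.setdefault(row[key], []).append(row[column])
    return {parent: primitive(values) for parent, values in groups.items()}

# Depth 1: SUM(transactions.amount) per session.
session_total = aggregate(transactions, "session_id", "amount", sum)

# Attach the derived feature to the sessions table so it can be reused.
for row in sessions:
    row["SUM(transactions.amount)"] = session_total.get(row["session_id"], 0.0)

# Depth 2: MEAN(sessions.SUM(transactions.amount)) per customer.
customer_avg_total = aggregate(
    sessions, "customer_id", "SUM(transactions.amount)", mean
)
```

Stopping after the first `aggregate` call corresponds to a maximum depth of 1. Allowing the second call, which consumes a derived feature, corresponds to a maximum depth of 2.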

* A second advantage of primitives: they can be used to quickly enumerate many interesting features in a parameterized fashion.
    
* Since primitives are defined independently of a specific dataset, any new primitive added to Featuretools can be applied to any other dataset that contains the same variable data types. This might be a dataset in the same domain, or one for a completely different use case.
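
A rough sketch of why type-based primitives transfer between datasets is shown below. It does not use Featuretools; the `IS_WEEKEND` primitive, the `apply_primitives` helper, and both toy datasets are assumptions made for illustration. The primitive declares only the input type it accepts, so the same definition applies to any column of that type, whatever the domain.

```python
from datetime import datetime

def is_weekend(value: datetime) -> bool:
    """Transform primitive: datetime in, boolean out."""
    return value.weekday() >= 5

# A primitive is described by the input type it accepts and its function.
PRIMITIVES = {"IS_WEEKEND": (datetime, is_weekend)}

def apply_primitives(rows, primitives):
    """Add one feature per (primitive, column) pair whose types match."""
    out = []
    for row in rows:
        new_row = dict(row)
        for name, (input_type, func) in primitives.items():
            for column, value in row.items():
                if isinstance(value, input_type):
                    new_row[f"{name}({column})"] = func(value)
        out.append(new_row)
    return out

# Two unrelated (hypothetical) datasets that both contain a datetime column.
purchases = [{"amount": 12.5, "purchased_at": datetime(2024, 1, 6)}]
sensor_logs = [{"temperature": 21.0, "recorded_at": datetime(2024, 1, 8)}]

purchase_features = apply_primitives(purchases, PRIMITIVES)
sensor_features = apply_primitives(sensor_logs, PRIMITIVES)
```

The numeric columns are skipped because their type does not match the primitive's declared input, which is the same mechanism that lets one primitive definition serve both datasets.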

* It’s easy to accidentally leak information about what you’re trying to predict into a model.
* DFS can be used to develop baseline models with little human intervention.
* The automation of feature engineering should be thought of as a complement to critical human expertise: it enables data scientists to be more precise and productive.
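
To make the leakage risk concrete, here is a small hypothetical example. It assumes the label describes what a customer does after 1 February 2024, so any feature computed from rows on or after that date would leak the answer. The data, the `total_spend` helper, and the cutoff value are illustrative assumptions. Filtering by a cutoff time is one common guard, not necessarily the only one.

```python
from datetime import datetime

# Hypothetical transaction log for a single customer.
transactions = [
    {"customer_id": "a", "amount": 10.0, "time": datetime(2024, 1, 1)},
    {"customer_id": "a", "amount": 20.0, "time": datetime(2024, 1, 15)},
    {"customer_id": "a", "amount": 70.0, "time": datetime(2024, 2, 10)},
]

def total_spend(rows, customer_id, cutoff=None):
    """SUM(amount) for one customer, optionally using only rows before `cutoff`."""
    return sum(
        r["amount"]
        for r in rows
        if r["customer_id"] == customer_id
        and (cutoff is None or r["time"] < cutoff)
    )

cutoff = datetime(2024, 2, 1)  # assumed prediction time

# Leaky: includes the February transaction, which happens after prediction time.
leaky = total_spend(transactions, "a")

# Safe: only uses information available before the cutoff.
safe = total_spend(transactions, "a", cutoff)
```

The two values differ only because of the post-cutoff row, which is exactly the information a model would not have at prediction time.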

## Deep Feature Synthesis vs. Deep Learning
* Deep Learning automates feature engineering for images, text, and audio where a large training set is typically required, whereas DFS targets the structured transactional and relational datasets that companies work with.
* The features that DFS generates are more explainable to humans because they are based on combinations of primitives that are easily described in natural language. 
* The transformations in deep learning must be possible through matrix multiplication, while the primitives in DFS can be mapped to any function that a domain expert can describe.
* This increases the accessibility of the technology and offers more opportunities for those who are not experienced machine learning professionals to contribute their own expertise.
* Additionally, while deep learning often requires many training examples to train its complex architectures, DFS can start creating potential features based only on the schema of a dataset.
* For many enterprise use cases, enough training examples for deep learning are not available.
* DFS offers a way to begin creating interpretable features for smaller datasets that humans can manually validate.
* Automating feature engineering offers the potential to accelerate the process of applying machine learning to the valuable datasets collected by data science teams today. 
* It will help data scientists to quickly address new problems as they arise and, more importantly, make it easier for those new to data science to develop the skills necessary to apply their own domain expertise.
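
The earlier point that DFS can propose features from the schema alone can be sketched as follows. This is a simplified illustration, not how Featuretools is implemented: the schema layout, the primitive tables, and `enumerate_features` are all hypothetical. Candidate features are listed by matching each primitive's accepted type against column types, without reading a single row of data.

```python
# Hypothetical schema: column names and logical types only, no rows of data.
schema = {
    "customers": {
        "columns": {"join_date": "datetime"},
        "children": ["transactions"],
    },
    "transactions": {
        "columns": {"amount": "numeric", "time": "datetime"},
        "children": [],
    },
}

# Primitive name -> logical type it accepts (illustrative subset).
transform_primitives = {"MONTH": "datetime", "WEEKDAY": "datetime"}
aggregation_primitives = {"SUM": "numeric", "MEAN": "numeric", "MAX": "numeric"}

def enumerate_features(schema, target):
    """List candidate feature names for `target` using only type information."""
    features = []
    # Transform primitives apply to the target table's own columns.
    for column, logical_type in schema[target]["columns"].items():
        for name, accepted in transform_primitives.items():
            if accepted == logical_type:
                features.append(f"{name}({column})")
    # Aggregation primitives apply across the relationship to child tables.
    for child in schema[target]["children"]:
        for column, logical_type in schema[child]["columns"].items():
            for name, accepted in aggregation_primitives.items():
                if accepted == logical_type:
                    features.append(f"{name}({child}.{column})")
    return features

candidates = enumerate_features(schema, "customers")
```

Adding a primitive to either table enlarges the candidate list automatically, which is the parameterized enumeration described in the primitives section.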