# Feature Extraction

## Definition
Feature extraction transforms raw data into informative, meaningful features while preserving the information in the original data. These features can then be used as inputs to models for data analysis and prediction. Feature extraction is an important step in time series analysis when dealing with large volumes of raw data, and it can improve the results of the analysis.

## Description
Feature extraction is a powerful tool for understanding and predicting trends in fields such as finance and economics. In real financial markets, time series data are complicated by fluctuating trends and volatility. Feature extraction transforms such data into a more meaningful representation that is easier to analyze. Several feature extraction techniques are available for time series analysis:
1. Statistical measures such as mean, standard deviation, skewness, and kurtosis.
2. Technical indicators such as the Exponential Moving Average (EMA), Moving Average Convergence Divergence (MACD), and Relative Strength Index (RSI). These indicators help analyze price trend, price movement, momentum, and overbought or oversold conditions.
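
As a rough illustration of the statistical measures in item 1, the sketch below computes rolling features with pandas. The price series is a synthetic random walk, and the 20-day `window` is an assumed value chosen only for the example; neither comes from the dataset used in this workbook.

```python
import numpy as np
import pandas as pd

# Synthetic daily closing prices (a random walk); a stand-in for real data.
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-02", periods=120, freq="B")
close = pd.Series(100 + rng.normal(0, 1, size=120).cumsum(),
                  index=dates, name="close")

window = 20  # assumed look-back window, in trading days
returns = close.pct_change()

# Rolling statistics: level and spread of prices, shape of the return distribution.
stat_features = pd.DataFrame({
    "roll_mean": close.rolling(window).mean(),
    "roll_std": close.rolling(window).std(),
    "roll_skew": returns.rolling(window).skew(),
    "roll_kurt": returns.rolling(window).kurt(),
})

print(stat_features.dropna().tail())
```

The first rows of each column are `NaN` because a full window of observations is not yet available; they are usually dropped before modeling.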

## Demonstration and Diagram

The figures below demonstrate feature extraction on a time series dataset of NVIDIA historical stock prices, covering both statistical measures and technical indicators.

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)
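
For readers who want to reproduce indicators like those listed in the Description, here is a minimal sketch using pandas. It runs on a synthetic price series rather than the NVIDIA data, and it assumes the conventional 12/26/9 MACD spans and a 14-period RSI with Wilder-style smoothing; other variants of these indicators exist, and the code behind the figures may differ.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices; a stand-in for real historical data.
rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, size=200).cumsum(), name="close")


def ema(series, span):
    """Exponential moving average with the recursive (adjust=False) form."""
    return series.ewm(span=span, adjust=False).mean()


def rsi(series, period=14):
    """RSI using Wilder-style smoothing (an EMA with alpha = 1 / period)."""
    delta = series.diff()
    gain = delta.clip(lower=0)       # positive moves only
    loss = -delta.clip(upper=0)      # negative moves, as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


# MACD: fast EMA minus slow EMA, plus a signal line and histogram.
macd_line = ema(close, 12) - ema(close, 26)
signal_line = ema(macd_line, 9)
macd_hist = macd_line - signal_line

rsi_values = rsi(close)

indicators = pd.DataFrame({
    "ema_12": ema(close, 12),
    "macd": macd_line,
    "macd_signal": signal_line,
    "macd_hist": macd_hist,
    "rsi_14": rsi_values,
})
print(indicators.tail())
```

RSI readings above 70 are commonly read as overbought and readings below 30 as oversold, although these thresholds are conventions rather than fixed rules.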

## Diagnosis


We may need to do **feature extraction** on our dataset if one or more of the following conditions exist:
* If the current features are difficult to understand or interpret, extracting new features can sometimes provide more intuitive or meaningful representations.
* If a model is overfitting the training data, reducing the feature space can help the model generalize better to unseen data.
* If the dataset includes irrelevant or redundant features, they can degrade the performance of your model. Feature extraction helps in identifying and retaining only the relevant features.
* If your dataset has a large number of features (high-dimensional space), it can lead to issues like overfitting and increased computational complexity. Feature extraction can reduce dimensionality, making the dataset more manageable.
* When features have complex, non-linear relationships, traditional models might struggle to capture these patterns. Feature extraction techniques can transform the data into a form where these relationships are more apparent.
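
One quick way to check for the redundancy condition above is to look for highly correlated feature pairs. The sketch below uses synthetic data in which `f2` is deliberately built as a near copy of `f1`; the column names and the 0.9 `threshold` are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic features: f2 is almost a scaled copy of f1, f3 and f4 are independent.
rng = np.random.default_rng(2)
n = 500
base = rng.normal(size=n)
df = pd.DataFrame({
    "f1": base,
    "f2": base * 2 + rng.normal(scale=0.05, size=n),
    "f3": rng.normal(size=n),
    "f4": rng.normal(size=n),
})

threshold = 0.9  # assumed cut-off for "highly correlated"
corr = df.corr()

# Check each unordered pair of columns once.
redundant_pairs = [
    (a, b)
    for a, b in combinations(df.columns, 2)
    if abs(corr.loc[a, b]) > threshold
]
print(redundant_pairs)
```

Pairs flagged this way only capture linear redundancy; non-linear dependence between features would need a different check.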

## Damage

If feature extraction is needed but not performed, several issues can arise that reduce the effectiveness and efficiency of data analysis or predictive modeling. First, high dimensionality without proper feature extraction can lead to the curse of dimensionality, where models become overly complex and prone to overfitting, resulting in poor generalization to new data. Additionally, irrelevant or redundant features can degrade model performance, as the model might learn noise instead of the underlying patterns in the data. They also increase computational cost and complexity, making the model less efficient and harder to interpret.

## Directions

If you're considering performing **feature extraction** on a dataset, you have various techniques at your disposal, with **Principal Component Analysis (PCA)** and **Factor Analysis** being among the most popular. In the **Multicollinearity** section of this workbook, we have thoroughly explored how to implement **PCA**. In this **Directions** section, we'll outline the preparatory steps you need to take before applying PCA to your dataset.

**1) Define the Objective:** Clearly define what you want to achieve with **PCA**. In our case, we want to extract meaningful **features** for data analysis.

**2) Data Collection:** Acquire the relevant data. This could involve collecting new data or using an existing dataset that fits your objective.

**3) Data Cleaning:** Address missing values, outliers, and remove duplicates. Ensure your data is accurate and consistent. Convert all categorical data to a numerical format if necessary, as PCA requires numerical input.
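
The cleaning step might look like the following sketch on a small made-up table. The column names, median imputation, and IQR-based clipping are illustrative assumptions; the right treatment of missing values and outliers depends on the dataset.

```python
import numpy as np
import pandas as pd

# Small made-up dataset with a missing value, a duplicate row, and an outlier.
raw = pd.DataFrame({
    "price": [10.0, 10.5, np.nan, 11.0, 11.0, 250.0],
    "volume": [100, 110, 105, 120, 120, 115],
    "sector": ["tech", "tech", "energy", "energy", "energy", "tech"],
})

# 1. Remove exact duplicate rows.
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Fill missing prices with the median (one of several possible strategies).
clean["price"] = clean["price"].fillna(clean["price"].median())

# 3. Clip outliers to 1.5 * IQR beyond the quartiles.
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["price"] = clean["price"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 4. One-hot encode the categorical column so every feature is numeric.
clean = pd.get_dummies(clean, columns=["sector"], dtype=float)

print(clean)
```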

**4) Data Preprocessing:** Standardize or normalize the data. PCA is affected by the scale of the features, so it’s crucial to scale features to have a mean of 0 and a standard deviation of 1.
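
A minimal sketch of standardization, written with pandas so the arithmetic is visible. The feature names and value ranges are made up for the example. scikit-learn's `StandardScaler` performs the same computation (it also uses the population standard deviation, `ddof=0`).

```python
import numpy as np
import pandas as pd

# Made-up features on very different scales.
rng = np.random.default_rng(3)
features = pd.DataFrame({
    "price": rng.normal(loc=150.0, scale=20.0, size=250),
    "volume": rng.normal(loc=1e6, scale=2e5, size=250),
    "rsi": rng.uniform(20, 80, size=250),
})

# z-score: subtract the column mean, divide by the column standard deviation.
means = features.mean()
stds = features.std(ddof=0)
standardized = (features - means) / stds

# Note: describe() reports the sample std (ddof=1), so it shows values slightly above 1.
print(standardized.describe().loc[["mean", "std"]].round(3))
```

Without this step, a feature such as `volume` would dominate the principal components simply because its values are numerically larger.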

**5) Exploratory Data Analysis:**  Conduct an exploratory analysis to understand the distributions, relationships, and structure of your data. Use statistical summaries and visualizations to get insights.

**6) Assess Suitability for PCA:** Check if PCA is suitable for your data. PCA works well when the features have **linear relationships** and the dataset has **high dimensionality**.
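
One informal way to gauge suitability is to check how much variance the leading eigenvalues of the correlation matrix capture. The sketch below uses synthetic data built from two hidden factors, so a sharp concentration of variance is expected; the `mixing` matrix and noise level are assumptions made for the example.

```python
import numpy as np

# Synthetic data: six observed features driven by two latent factors plus noise.
rng = np.random.default_rng(4)
n_samples = 300
latent = rng.normal(size=(n_samples, 2))
mixing = np.array([
    [1.0, 0.8, 0.6, 0.0, 0.2, 0.5],
    [0.0, 0.3, 0.5, 1.0, 0.9, 0.6],
])
X = latent @ mixing + rng.normal(scale=0.1, size=(n_samples, 6))

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Share of total variance captured by the first k components.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)
print(np.round(cumulative, 3))
```

If the first few entries of `cumulative` are already close to 1, the features are strongly linearly related and PCA is likely to compress them well. If the curve rises slowly, PCA will offer little reduction.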

**7) Choose the PCA Model:** Depending on your data and software, choose the appropriate PCA model. This could include standard PCA, incremental PCA, randomized PCA, or kernel PCA for non-linear relationships.
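
Assuming scikit-learn is the software in use, the variants named above map onto the classes below. This is only a sketch of how each variant is selected, run on random data with arbitrary settings (`n_components=2`, `batch_size=50`, an RBF kernel); it is not a recommendation for any particular dataset.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA, KernelPCA
from sklearn.preprocessing import StandardScaler

# Random data standing in for a cleaned, numeric feature matrix.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))
X_scaled = StandardScaler().fit_transform(X)

models = {
    # Full SVD; a reasonable default for small to medium datasets.
    "standard": PCA(n_components=2),
    # Approximate solver; faster when there are many features.
    "randomized": PCA(n_components=2, svd_solver="randomized", random_state=0),
    # Processes the data in batches; useful when it does not fit in memory.
    "incremental": IncrementalPCA(n_components=2, batch_size=50),
    # Non-linear variant using an RBF kernel.
    "kernel_rbf": KernelPCA(n_components=2, kernel="rbf"),
}

components = {name: model.fit_transform(X_scaled) for name, model in models.items()}
for name, Z in components.items():
    print(name, Z.shape)
```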

**8) Implement PCA:** Once the data is ready and you have a clear understanding of its structure and your goals, apply PCA to transform your features into principal components. This step is beyond the scope of these preparatory steps and is covered in the **Multicollinearity** section.

## References
1. Cole Hagen, "Time Series Feature Extraction with Python and Pandas: Techniques and Examples", 28 Mar, https://medium.com/geekculture/time-series-feature-extraction-with-python-and-pandas-techniques-and-examples-2e2158de5356
2. Marty MK, "Relative Strength Index (RSI) in Python", https://www.qmr.ai/relative-strength-index-rsi-in-python/
3. Financial Python, "How to code an EMA crossover in Python", 25 Mar, https://medium.com/@financial_python/how-to-code-an-ema-crossover-in-python-96ecd22ae252
4. Financial Python, "Building a MACD Indicator in Python", 23 Sep, https://medium.com/@financial_python/building-a-macd-indicator-in-python-190b2a4c1777