#  Data Science Interview questions
## Part 1 : Data cleaning, pre-processing, EDA
This Jupyter notebook serves as a comprehensive resource for data science interview questions, focusing specifically on topics related to data cleaning, preprocessing, and analysis. Here, you'll find a curated collection of questions that span various aspects of data preparation, transformation, and exploratory data analysis. Whether you're preparing for an interview or seeking to deepen your understanding of essential data science concepts, this notebook aims to provide a structured and informative guide to help you navigate through key challenges in the field. Explore the questions, test your knowledge, and enhance your proficiency in the fundamental stages of data science workflows.

### 0- What are the main tasks of data cleaning in Data Science?
Here are the main tasks to perform in the cleaning phase :
- Finding and handling missing data
- Finding and handling duplicates
- Finding and handling outliers

Note: encoding categorical data can be done in the feature engineering phase.

### 1- How to deal with missing values ?

Handling missing values is a crucial step in data preprocessing to ensure accurate and unbiased analysis. Here are three main methods to deal with missing values:
- Remove missing values 
- Impute missing values 
- Forward or Backward Fill


**Note:** to make better predictions, we can add an extension to imputation: an indicator column that records which values were originally missing (see 1.4).

Choosing the right method is based on:
- The characteristics of the data
- The percentage of missing values
- The goals of the performed analysis.

No single method is suitable for all situations, so it's essential to understand the context and implications of each approach.

#### 1. 1- How to detect or identify missing values? 
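- Identifying missing values is the first step to perform when dealing with them.
- In pandas, `df.isnull()` (alias `df.isna()`) flags each missing cell, `df.isnull().sum()` counts the missing values per column, and `df.info()` reports the non-null count of each column.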
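
As a minimal sketch of the detection step, the snippet below counts missing values per column and lists the affected rows. The DataFrame and its column names (`age`, `city`) are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Paris", "Lyon", None, "Nice"],
})

missing_per_column = df.isnull().sum()            # number of missing values per column
missing_ratio = df.isnull().mean()                # fraction of missing values per column
rows_with_missing = df[df.isnull().any(axis=1)]   # rows with at least one missing value

print(missing_per_column)
print(rows_with_missing)
```

The per-column ratio is often the most useful of the three, since the percentage of missing values is one of the criteria for choosing between removal and imputation.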

#### 1. 2- How to remove missing values? 

Here is how to remove missing values :
- Remove Rows with nan/null values using `df = df.dropna()`
- Remove Columns with nan/null values using `df = df.dropna(axis=1)`

Dropping rows or columns is rarely the best option: every removed row or column also discards its non-missing values, which may carry important information.
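
The following sketch compares several `dropna` variants on a hypothetical DataFrame (columns `a`, `b`, `c` are made up for the example). It assumes that limiting the drop with `thresh` or `subset` is acceptable for the analysis at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" is mostly empty, column "a" has one gap
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1, 2, 3, 4],
})

rows_dropped = df.dropna()                      # keep only fully complete rows
cols_dropped = df.dropna(axis=1)                # keep only fully complete columns
mostly_complete = df.dropna(axis=1, thresh=3)   # keep columns with at least 3 non-missing values
subset_dropped = df.dropna(subset=["a"])        # drop rows only when "a" is missing

print(rows_dropped.shape, list(cols_dropped.columns))
```

Here `df.dropna()` keeps a single row out of four, which illustrates how much information a blanket drop can remove.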

#### 1. 3- How to impute missing values? 

We have five main methods:

- Impute with statistical measures
- Impute with a placeholder
- Impute with machine learning algorithms
- Impute using interpolation
- Multiple imputation

An imputed value won't be exactly right in most cases, but imputation usually leads to more accurate models than dropping the column entirely.
    
##### a. What does imputing with a statistical measure mean ? 
- Fill missing values with statistical measures (mean, median, mode) or using more advanced imputation methods.
- Example: `df['column'] = df['column'].fillna(df['column'].mean())`
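
A small sketch of mean, median and mode imputation on hypothetical data (`income` and `color` are made-up columns). The `income` column deliberately contains one extreme value to show how differently the mean and the median behave.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one extreme income value
df = pd.DataFrame({
    "income": [30.0, 35.0, np.nan, 40.0, 1000.0],
    "color": ["red", "blue", "red", None, "red"],
})

mean_filled = df["income"].fillna(df["income"].mean())       # mean is pulled up by the extreme value
median_filled = df["income"].fillna(df["income"].median())   # median stays close to typical values
mode_filled = df["color"].fillna(df["color"].mode()[0])      # most frequent category

print(mean_filled[2], median_filled[2], mode_filled[3])
```

The mean-imputed value is far from every typical observation, while the median-imputed value sits between them.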
    
##### b. What does impute with a Placeholder mean ? 
- Replace with a specific value that does not occur naturally in the dataset. 
- Example: `df = df.fillna(-1)`

##### c. What does impute with a Machine Learning Algorithm mean ?    
- **Solution 1:**  use `KNNImputer()` class from the scikit-learn Python library.
- **Solution 2:**
    - Train a machine learning model to predict missing values based on other features in the dataset.
    - Example : Random Forest 
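
As a hedged illustration of Solution 1, the sketch below applies scikit-learn's `KNNImputer` to a tiny hypothetical array. The choice of `n_neighbors=2` is an assumption made for the example, not a recommended setting.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy data: the second feature is missing in the second row
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [4.0, 8.0],
])

# Each missing value is replaced by the average of that feature
# over the 2 nearest rows (distance computed on the observed features)
imputer = KNNImputer(n_neighbors=2)
X_imputed = imputer.fit_transform(X)

print(X_imputed)
```

In this example the two nearest rows are the first and the third, so the gap is filled with the average of their second feature.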
##### d. What does Impute using Interpolation mean ?
- Interpolation is a technique used to estimate missing values based on the observed values in a dataset.
- It works by filling in the gaps between known data points, assuming some underlying pattern or relationship.
- Here are some interpolation techniques:
    - Linear Interpolation 
    - Polynomial Interpolation
    - Quadratic 
    - Etc.

Note : the choice of the right interpolation method depends on:
- The nature of the data.
- The assumptions about its behavior
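
A minimal sketch of linear interpolation with pandas, assuming the values are ordered and evenly spaced (for example a regularly sampled time series). The series is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered series with gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0])

# Linear interpolation draws a straight line between the known neighbours
linear = s.interpolate(method="linear")

print(linear.tolist())
```

Other `method` values such as `"polynomial"` (with an `order` argument) are also accepted by `interpolate`, but they rely on SciPy being installed.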
##### e. What does multiple imputation mean ? 
- It is a statistical technique used to handle missing data via creating multiple imputed datasets. 
- Multiple datasets are created by imputing missing values using a chosen imputation method. 
- Examples : mean imputation, regression imputation, k-Nearest Neighbors imputation, or more sophisticated methods.
- Each dataset represents a set of values for the missing entries.
- Instead of imputing a single value for each missing observation, multiple imputation illustrates the uncertainty associated with missing data by generating several imputed datasets. 
- The results from the analyses conducted on the imputed datasets are combined, or "pooled," to obtain an overall estimate of the parameter of interest.
- The combined results provide not only a point estimate but also an estimate of the uncertainty associated with the missing data. This incorporates both the imputation variability and the variability due to analyzing different imputed datasets.
- The `fancyimpute` Python library can be employed to implement multiple imputation.
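
The idea can be sketched without `fancyimpute` by using scikit-learn's experimental `IterativeImputer` instead (a swapped-in tool, not the library named above). With `sample_posterior=True`, each run draws different imputed values, so several runs yield several completed datasets. The data, the number of imputations (5) and the pooled statistic (the mean of one column) are assumptions for illustration only.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the experimental API)
from sklearn.impute import IterativeImputer

# Hypothetical data: y depends on x, and 20 values of y are hidden
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([x, y])
X[rng.choice(100, size=20, replace=False), 1] = np.nan

# Create several imputed datasets and analyse each one separately
estimates = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    X_completed = imputer.fit_transform(X)
    estimates.append(X_completed[:, 1].mean())   # the analysis: mean of y

# Pool the results: point estimate plus between-imputation variability
pooled_mean = float(np.mean(estimates))
between_variance = float(np.var(estimates, ddof=1))

print(pooled_mean, between_variance)
```

The between-imputation variance is the part of the uncertainty that a single imputation would hide.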

#### 1. 4- Why do we need an extension to imputation? 

- Sometimes, missing values themselves can be indicative. Create a new binary column indicating whether a value is missing. 
- For each column with missing entries in the original dataset, we add a new column that shows the location of imputed entries. 
- Models would make better predictions by considering which values were originally missing.   
- Example:  `df['column_missing'] = df['column'].isnull().astype(int)` 
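
A short sketch of the extension on a hypothetical `age` column. The indicator must be created before the imputation, otherwise the information about which entries were missing is already lost.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"age": [25.0, np.nan, 31.0, np.nan]})

# 1) Record which entries are missing, 2) then impute
df["age_missing"] = df["age"].isnull().astype(int)
df["age"] = df["age"].fillna(df["age"].median())

print(df)
```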

#### 1. 5- Why it is better to use the median value for imputation in the case of outliers?
- Using the median for imputation in case of outliers is often considered a better solution compared to the mean.
- The median is a measure of central tendency that has: 
    - **Robustness to Outliers:** it is less influenced by extreme values because it is not affected by the actual values of data points but rather their order. Outliers have a minimal impact on the median.
    - **Resilient to Skewness:** in a skewed distribution, where the tail is longer in one direction, the mean can be heavily influenced by the skewness. The median, being the middle value, is less affected by the skewness and provides a more representative measure in such situations.
    - **Ability to avoid Biased Estimates:** in the presence of outliers, using the mean for imputation might lead to biased estimates, especially when the distribution is not symmetric. The median provides a more balanced estimate in skewed or asymmetric distributions.
    - **Ability to maintain Robustness in Non-Normal Distributions:** when the data does not follow a normal distribution, the median is often a more reliable measure of central tendency, which helps produce more accurate imputations.
    
#### 1. 6-  How to perform Forward or Backward Fill   ? 
Propagate the last valid observation forward or use the next valid observation to fill missing values: 

- Forward fill using: `df = df.ffill()` (the older form `df.fillna(method='ffill')` is deprecated since pandas 2.1)
- Backward fill using: `df = df.bfill()` (the older form `df.fillna(method='bfill')` is deprecated since pandas 2.1)
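
The behaviour of both fills can be checked on a small hypothetical series. Note that a leading gap cannot be forward-filled and a trailing gap cannot be backward-filled; the `limit` argument caps how far a value is propagated.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered series with gaps at the start, middle and end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

forward = s.ffill()            # propagate the last valid observation forward
backward = s.bfill()           # use the next valid observation
limited = s.ffill(limit=1)     # fill at most one consecutive gap

print(forward.tolist())
print(backward.tolist())
print(limited.tolist())
```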

### 2- How to handle duplicates ? 
Handling duplicates in data science is an essential step to ensure data quality and avoid biases or inaccuracies in analysis. Here are common methods to handle duplicates:
- 1- Identifying duplicates with pandas: `df.duplicated()`
- 2- Removing duplicates, keeping the first occurrence (the default): `df = df.drop_duplicates()`, which is equivalent to `df = df.drop_duplicates(keep='first')`
- 3- Removing duplicates, keeping the last occurrence: `df = df.drop_duplicates(keep='last')`
- 4- Removing every occurrence of a duplicated row: `df = df.drop_duplicates(keep=False)`
- 5- Handling duplicates based on specific columns: `df = df.drop_duplicates(subset=['column'])`
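
The sketch below shows how the `keep` and `subset` arguments change the result. The DataFrame and its columns (`id`, `value`) are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 1 are identical, ids 1 and 3 are repeated
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "value": ["a", "a", "b", "c", "d"],
})

dup_mask = df.duplicated()                                # True for repeated full rows
first_kept = df.drop_duplicates()                         # keep the first occurrence (default)
none_kept = df.drop_duplicates(keep=False)                # drop every occurrence of a duplicate
by_id = df.drop_duplicates(subset=["id"], keep="last")    # compare on "id" only, keep the last row

print(dup_mask.tolist())
print(by_id)
```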

### 3- How to find outliers?
To find outliers, only numerical columns are considered in our analysis. Here are the common methods to do that :
- Visualization technique :  Box Plot, Scatter Plot and Histogram Plot (the most used ones).
- Mathematical approach :
    - Z-score
    - Interquartile range : IQR score 
- Machine Learning Models :
    - Clustering Algorithms
    - Isolation Forest
- Domain-Specific Knowledge

#### 3. 1-  How to handle outliers in a dataset ? 
Here are some methods to handle outliers:

- **Deleting the values:** if we are sure that a value is wrong and will never occur again, we remove it. Candidate values can be identified with either the interquartile range or the Z-score.
- **Replacing the values:** change the values if we know the reason for the outliers (example: capping values at the 99th percentile).
- **Data transformation:** sometimes a data transformation such as the natural log reduces the variation caused by extreme values. It is mostly used for highly skewed datasets.

#### a. What does Z-Score mean?
- It calculates the Z-score for each data point.
- Z-score measures how many standard deviations a data point is from the mean.
- Typically, a threshold of 2 to 3 standard deviations is used to identify outliers.
- Formula: $Z ={ X - \mu \over\sigma}$
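
A minimal sketch of Z-score based detection with NumPy, using hypothetical values and a threshold of 2 (an assumption for this example). In small samples, a single extreme value inflates the standard deviation, so a threshold of 3 may fail to flag it.

```python
import numpy as np

# Hypothetical measurements with one suspicious value
values = np.array([10, 12, 11, 13, 12, 11, 10, 12, 11, 50], dtype=float)

# Z = (X - mean) / standard deviation
z_scores = (values - values.mean()) / values.std()

threshold = 2.0  # assumed cut-off, to be adapted to the data
outliers = values[np.abs(z_scores) > threshold]

print(outliers)
```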

#### b. What does IQR : interquartile range mean? 
- The IQR is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1
- Q1: the median of the lower 50% of the data. It corresponds to the 25th percentile (0.25 quantile).
- Q3: the median of the upper 50% of the data. It corresponds to the 75th percentile (0.75 quantile).

To calculate percentiles or quantiles, we sort the data in ascending order and find the value below which a certain percentage of the data falls.
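
The IQR can be turned into an outlier rule, sketched below with pandas. The multiplier 1.5 is the usual box-plot convention and is an assumption here, as are the sample values. The last line shows the "replace" strategy by capping values at the fences.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 11, 50], dtype=float)

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
capped = values.clip(lower=lower_fence, upper=upper_fence)  # replace instead of delete

print(q1, q3, iqr)
print(outliers.tolist())
```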

![title](images/boxplot.png) 
#### c. How are ML algorithms used for outlier detection ?
We have two main methods: 
- **Clustering Algorithms:** for example, with k-means, points that lie far from their cluster centroid or that fall in very small clusters can be identified as outliers. Density-based algorithms such as DBSCAN directly label points that do not belong to any cluster as noise.
- **Isolation Forest:** designed specifically for outlier detection. It isolates outliers by recursively partitioning the data.
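
As a hedged sketch of the second method, scikit-learn's `IsolationForest` can be applied as follows. The synthetic data and the `contamination` value (the expected share of outliers) are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: 200 ordinary points plus 2 extreme ones
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
extremes = np.array([[8.0, 8.0], [-9.0, 7.0]])
X = np.vstack([inliers, extremes])

# contamination is the assumed proportion of outliers in the data
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)        # 1 = inlier, -1 = outlier

outlier_rows = X[labels == -1]
print(outlier_rows)
```

Points that are isolated after only a few random splits receive the label -1.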

Notes: 
- Box plot is considered as Uni-variate analysis
- Scatter plot is considered as Bi-variate analysis

#### 3. 2-  What do percentile and quantile mean ?
 Percentile and quantile are statistical concepts used to describe the relative standing or distribution of a particular value within a dataset. Both concepts help understand the position of a data point in relation to other values.
#### a- Percentile
- A percentile is a measure that indicates the relative standing of a particular value within a dataset.
- The pth percentile is the value below which p percent of the data falls.
- The dataset is divided into 100 equal parts
#### b- Quantile 
- A quantile is a generic term for dividing the data into intervals or groups of equal probability.
- Percentiles are a specific type of quantile, where the division is based on percentages.
- The term "quantile" is often used more broadly to refer to any division of the data, not necessarily in increments of 1% (as in percentiles).
- For example, **quartiles** divide the dataset into four equal parts using three cut points (Q1, Q2, Q3):
    - Q1: the 25th percentile is the value below which 25% of the data falls.
    - Q2: the 50th percentile is the value below which 50% of the data falls
    - Q3: the 75th percentile is the value below which 75% of the data falls
#### c- Calculation : 
- The calculation of percentiles and quantiles involves sorting the data in ascending order and finding the value below which a certain percentage of the data falls.
- **Example:** if a student scores in the 90th percentile on a standardized test, it means they performed better than 90% of the students who took the test.
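
A small sketch of the calculation with NumPy on hypothetical test scores. `np.percentile` takes values between 0 and 100 while `np.quantile` takes values between 0 and 1; both use linear interpolation between sorted values by default.

```python
import numpy as np

# Hypothetical test scores
scores = np.array([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])

# Quartiles expressed as percentiles (0-100)
q1, q2, q3 = np.percentile(scores, [25, 50, 75])

# The same first quartile expressed as a quantile (0-1)
q1_as_quantile = np.quantile(scores, 0.25)

# Share of students scoring strictly below 90
percent_below_90 = (scores < 90).mean() * 100

print(q1, q2, q3, percent_below_90)
```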

### 4- What does Exploratory Data Analysis (EDA) mean? 
It is a critical step in the data analysis process and is often the second step after cleaning the provided dataset. The primary goal of EDA is to summarize the main characteristics of a dataset, gain insights into the underlying structure, identify patterns, detect anomalies, and formulate hypotheses for further analysis.

Key aspects of Exploratory Data Analysis include:
- Summary statistics using the pandas `.describe()` method.
- Data Visualization
- Distribution Analysis
- Correlation Analysis

Effective EDA aims to perform more targeted and informed analyses, leading to better decision-making and valuable insights from the data.

#### 4. 1- What does Distribution Analysis mean?
- This analysis aims to examine the distribution of values within a dataset.
- Understanding the distribution of data is essential for gaining insights into its underlying characteristics, identifying patterns, and making informed decisions about subsequent analyses or modeling.
- Here are some examples of distribution analysis: 
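    - Frequency Distribution: it provides a summary of how often each value appears. We can use the pandas `.value_counts()` method.
    - Univariate and Bivariate Analysis: histograms or density plots for a single variable (e.g. seaborn `histplot`; the older `distplot` is deprecated), and X versus Y plots for two variables.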
    - Probability Distribution
    - Spread or Dispersion analysis
    - Skewness and Kurtosis analysis
    
- Understanding the data distribution is very important in many tasks, including identifying outliers, assessing the appropriateness of statistical models, and making decisions about data transformations.
- Different types of distributions may require different approaches in data analysis and modeling, and distribution analysis helps inform these decisions.

#### a. What does normal distribution mean ?
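The normal distribution is very useful in machine learning because its statistical properties are well known and many statistical methods (for example linear models and tests on Pearson correlation) assume normally distributed data or errors. In a normal distribution, the mean, the median and the mode are equal (mode = mean = median):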

- Mean: called also average of a data set and it is found by summing all numbers in the data set and then dividing by the number of values in the set.
- Mode : it is the value that appears most often in a set of data values.
- Median : the middle number; found by ordering all data points and picking out the one in the middle (or if there are two middle numbers, taking the mean of those two numbers).

#### b. What do Skewness and Kurtosis mean ?
**Skewness:**
- It is a measure of the asymmetry of a distribution.
- A distribution is asymmetrical when its left and right side are not mirror images.
- Skewed data does not follow a normal distribution; a transformation (such as a log transformation) may be needed to bring it closer to normal.
- It provides insights into the shape of a distribution.
- The three types of skewness are:
    - **Skewness > 0 :** right (or positive) skewness. This indicates that the tail on the right side is longer or fatter than the left side, and the majority of the data points are concentrated on the left side.
    - **Skewness < 0 :** left (or negative) skewness. It means the tail on the left side is longer or fatter than the right side, and the majority of the data points are concentrated on the right side.
    - **Skewness=0, Zero skewness :** the distribution is perfectly symmetrical.
    
<img src="images/Skewness.png" width="400">
    
**Kurtosis:**
- A statistical measure that describes the shape or "tailedness" of a distribution. 
- It provides information about the concentration of data points in the tails relative to the center of the distribution. 
- The three types of kurtosis, expressed as excess kurtosis (kurtosis minus 3, so that a normal distribution scores 0), are:
    - **Kurtosis=0 (Mesokurtic) :** the distribution has the same tail behavior as a normal distribution.
    - **Kurtosis>0 (Leptokurtic):** the distribution has fatter tails (heavier tails) and a sharper peak than a normal distribution. This indicates a higher probability of extreme values.
    - **Kurtosis<0 (Platykurtic):** the distribution has thinner tails (lighter tails) and a flatter peak than a normal distribution. This suggests a lower probability of extreme values.
   
Kurtosis measures whether the data is heavy-tailed (more extreme values than a normal distribution) or light-tailed (fewer extreme values than a normal distribution).
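
Both measures can be computed with pandas, as sketched below on synthetic samples (a normal sample and an exponential sample, both assumptions for the example). `Series.kurt()` returns the excess kurtosis, so a normal sample gives a value close to 0.

```python
import numpy as np
import pandas as pd

# Synthetic samples: one symmetric, one right-skewed with a heavy tail
rng = np.random.default_rng(0)
symmetric = pd.Series(rng.normal(size=10_000))
right_skewed = pd.Series(rng.exponential(size=10_000))

# Skewness: close to 0 for the normal sample, positive for the exponential one
print(symmetric.skew(), right_skewed.skew())

# Excess kurtosis: close to 0 for the normal sample, positive for the exponential one
print(symmetric.kurt(), right_skewed.kurt())
```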


<img src="images/Kurtosis.png" width="400">


#### c. What does Spread or Dispersion mean ?
- Data spread: 
    - It provides information about the range of values in a dataset.
    - It provides information about how dispersed or scattered the individual data points are around a measure of central tendency, such as the mean or median.
    - Spread measures help to understand the variability or dispersion of the data.
    - **Examples: IQR, range, variance, standard deviation** 
    - It is crucial to understand the spread of data for better outlier detection, risk assessment, decision-making, etc.
- Dispersion:
    - It explains how individual data points in a dataset deviate or spread out from a central measure of tendency, such as the mean or median. 
    - Dispersion measures provide insights into the variability or spread of the data and are crucial for understanding the overall distribution.
    - **Examples: IQR, range, variance, standard deviation, Mean Absolute Deviation (MAD), Coefficient of Variation (CV)**

#### d. How to get Summary Statistics ? 
- In the statistical description, we compute the following values for each numerical feature:
    - Maximum
    - Minimum
    - Mean (average)
    - Standard deviation
    - Median
- Code: `df.describe().transpose()`

#### 4. 2- What does Correlation Analysis mean?
- Correlation analysis is a statistical method used to evaluate the strength and direction of the linear relationship between two quantitative variables.
- The result of a correlation analysis is a correlation coefficient, which quantifies the degree to which changes in one variable correspond to changes in another.
- Correlation analysis is widely used in various fields, including economics, biology, psychology, and data science, to understand relationships between variables and make predictions based on observed patterns.
##### a. What are the plots used to illustrate correlation?
- Correlation matrix and heatmap 
- Scatter Plot : it provides a visual representation of the relationship between two variables. X versus Y
##### b. How to interpret Correlation Coefficient ? 
- It is a numerical measure that ranges from -1 to 1.
- A positive correlation >0 : indicates that if one variable increases, the other variable tends to increase as well.
- A negative correlation <0 : indicates that if one variable increases, the other variable tends to decrease.
- A correlation coefficient equal to 0: indicates no linear relationship between the variables.
- **Strength of Correlation:**
    - The closer the correlation coefficient is to -1 or 1, the stronger the correlation. A coefficient of -1 or 1 implies a perfect linear relationship.
    - A coefficient closer to 0 indicates a weaker linear relationship.
##### c. What does correlation matrix mean? 
- It is a table that displays the correlation coefficients between many variables. 
- Each cell corresponds to the correlation coefficient between two variables. 
- This matrix helps detect the presence of any positive or negative correlation between variables.
- By default, the correlation is calculated using the Pearson correlation coefficient, so values vary from -1 to 1.

##### d. Pearson Correlation vs. Spearman Correlation? 
- If the variables have a linear relationship and follow a normal distribution, use the Pearson correlation. 
- Spearman correlation is a non-parametric measure that assesses the strength and direction of monotonic relationships (whether the variables tend to increase or decrease together, but not necessarily at a constant rate)
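
The difference can be seen on a monotonic but non-linear relationship, sketched below with hypothetical data. Spearman correlation is computed here as the Pearson correlation of the ranks, which is its definition; this avoids relying on any optional dependency.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y always increases with x, but not at a constant rate
x = pd.Series(np.arange(1, 11), dtype=float)
y = x ** 3

pearson = x.corr(y)                    # measures linear association only
spearman = x.rank().corr(y.rank())     # Pearson on ranks = Spearman

# The same idea for a whole DataFrame gives correlation matrices
df = pd.DataFrame({"x": x, "y": y})
pearson_matrix = df.corr()
spearman_matrix = df.rank().corr()

print(pearson, spearman)
```

Spearman reports a perfect monotonic relationship, while Pearson stays below 1 because the relationship is not a straight line.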
##### e. Cautions and Considerations when analysing correlation?
- Correlation does not imply causation : even if two variables are correlated, it does not necessarily mean that one causes the other.
- Outliers can have a significant impact on correlation results, so it's important to check for their presence.
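- The Pearson coefficient does not change under linear rescaling of the variables, so standardization (e.g. z-scores) leaves it unchanged; it is, however, sensitive to non-linear changes of scale.
- Non-linear transformations such as a log transformation can make a relationship more linear and therefore change (and often strengthen) the measured correlation.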

#### 4.3- What else can we perform in EDA ? 
Here are more analyses to perform during the EDA phase:
- Data frame dimension `df.shape`
- Data frame columns: `df.columns`
- Count values: `df['SaleCondition'].value_counts().to_frame()`
- Data sampling: sometimes, it is required to perform over/undersampling in case we have Imbalanced datasets
- Data grouping using `groupby`: `df_group = df[['YearRemodAdd', 'SalePrice']].groupby(by=['YearRemodAdd']).max()`
- Data filtering :
    - `df_filter =df[df.column>200000]` 
    - `df_filter =df[(df.column1>150000) & (df.column2==2008)]`
    - `df_filter =df[(df.column1>2011) | (df.column2==2008)]`
- Data analysis: 
    - Univariate Analysis: `histplot` and `displot` (the older `distplot` is deprecated in recent seaborn versions)
    - Bivariate Analysis: `pairplot`, `FacetGrid`, `jointplot` etc.
    - Multivariate Analysis: correlation matrix or heatmap

Notes:
- Multivariate analysis involves analyzing the relationship between three or more variables. We can use scatter matrix plots to visualize the relationship between each pair of features, along with the distribution of each feature.
- Bivariate analysis involves analyzing the relationship between two variables. We can use a scatter plot to visualize the relationship between a pair of features.

## Part 2: Feature Engineering

### 1- What does feature engineering mean? 

Feature engineering refers to the process of raw data manipulation such as addition, deletion, combination, mutation etc. It encompasses the process of creating new features or modifying existing ones to improve the performance of a machine learning model. 

Here is a range of significant activities used in Feature Engineering :

- Feature Selection
- Data Transformation
- Text Data Processing
- Time-Series Feature Engineering

### 2- What does data transformation mean?

Data transformation is one subtask within the broader field of feature engineering in machine learning. It involves modifying the raw data to make it more suitable for the learning algorithm.
It includes : 
- Feature Scaling
- Feature encoding
- Feature extraction
- Binning or Discretization
- Creating Interaction Terms

### 3- What does feature scaling mean ?
Feature scaling is a preprocessing step in machine learning that involves transforming the numerical features of a dataset to a common scale. Feature scaling is particularly important for algorithms that rely on distance metrics or gradient descent optimization.

Here are common techniques for feature scaling:
- Normalization
- Standard scaling: converts features to standardized variables (by subtracting the mean and dividing by the standard deviation)
- Log scaling or Log transformation
- Polynomial transformation
- Robust scaling

#### 3. 1- Why do we need to perform feature scaling ? 
The goal is to ensure that all features contribute equally to the learning process and to prevent certain features from dominating due to differences in their magnitudes.

#### 3. 2- Normalization - Min-Max Scaling
- Scales the feature values to a specific range, usually between 0 and 1
- Formula : $X_{normalized}= {X-X_{min}\over X_{max}-X_{min}}$

#### 3. 3- Standard scaling - Z-score normalization
- Centers the feature values around zero with a standard deviation of 1.
- Suitable for algorithms that assume a normal distribution of features.
- Formula: $X_{standardized} ={ X - mean(X) \over std(X)}$

#### 3. 4- Robust Scaling
- Scales the features based on the interquartile range (IQR) to handle outliers.
- Formula: $X_{robust} = {X - median(X)\over IQR(X)}$
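
The three formulas above (min-max, standard and robust scaling) can be sketched directly with pandas. The feature values are hypothetical and include one outlier to show how each scaler reacts to it.

```python
import numpy as np
import pandas as pd

# Hypothetical feature with one outlier
x = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: (X - X_min) / (X_max - X_min)
min_max = (x - x.min()) / (x.max() - x.min())

# Standard scaling: (X - mean) / std (population standard deviation)
standardized = (x - x.mean()) / x.std(ddof=0)

# Robust scaling: (X - median) / IQR
iqr = x.quantile(0.75) - x.quantile(0.25)
robust = (x - x.median()) / iqr

print(min_max.round(3).tolist())
print(robust.tolist())
```

With min-max scaling the outlier squeezes the four ordinary values into a narrow band near 0, whereas robust scaling keeps them evenly spread around 0 and leaves the outlier clearly apart.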

#### a. IQR : interquartile range
- The IQR is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1
- Q1: the median of the lower 50% of the data (the 25th percentile).
- Q3: the median of the upper 50% of the data (the 75th percentile).

![title](images/boxplot.png)


Here's how you calculate the IQR: 
- 1. Order the dataset: arrange the values in the dataset in ascending order
- 2. Determine the median (Q2): which is the middle value of the dataset. If the dataset has an odd number of observations, the median is the middle value. If it has an even number, the median is the average of the two middle values.
- 3. Find the First Quartile (Q1)
- 4. Find the Third Quartile (Q3)
- 5. Calculate the IQR

The IQR provides a robust measure of the spread of the middle 50% of the data, making it less sensitive to extreme values or outliers. It is commonly used in box plots to visually represent the dispersion of data.

#### 3. 5- Log Transformation

- The log transformation is one of the most popular transformations used in machine learning.
- It aims to make highly skewed distributions less skewed by compressing large values.
- The logarithm used is usually the natural logarithm (base e), although the common logarithm (base 10) is also used.
- It requires strictly positive values; when zeros are present, a variant such as log(1 + x) is commonly used.
- If the original data follows a log-normal distribution or approximately so, then the log-transformed data follows a normal or near normal distribution.
- Real raw data do not always follow a normal distribution. They are often so skewed that the results of statistical analyses assuming normality become unreliable. That’s where the log transformation comes in.
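
A minimal sketch of the effect on skewness, using a synthetic log-normal sample (an assumption standing in for a feature such as income). `np.log1p` computes log(1 + x), which also tolerates zeros.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature drawn from a log-normal distribution
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=10_000))

# Natural log transformation (log(1 + x) variant)
log_income = np.log1p(income)

print(income.skew())       # strongly positive before the transformation
print(log_income.skew())   # close to 0 afterwards
```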

#### 3. 6- Polynomial transformation
- It is a feature engineering technique used in machine learning and statistics to capture non-linear relationships between variables.
- It involves transforming input features by raising them to the power of an integer, creating polynomial terms. The most common form is the quadratic transformation (squared terms), but higher-order polynomials can also be used.
- Such transformations are often beneficial for machine learning algorithms, particularly in tasks involving numerical input variables, improving predictive accuracy, especially in regression tasks.
- If X is one input feature, then $X^2$ is one of its polynomial features.
- The “degree” of the polynomial is used to control the number of features added, e.g. a degree of 3 adds two new variables ($X^2$ and $X^3$) for each input variable, before counting any interaction terms. Typically a small degree such as 2 or 3 is used. Choosing the polynomial degree matters because it determines the number of input features created. 
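
For a single input feature, the expansion can be sketched with NumPy alone, as below (the values are hypothetical). Dedicated tools such as scikit-learn's `PolynomialFeatures` additionally generate interaction terms when there are several input features.

```python
import numpy as np

# Hypothetical single input feature
x = np.array([1.0, 2.0, 3.0])
degree = 3

# Columns: x, x^2, x^3 (two new variables for a degree of 3)
poly = np.column_stack([x ** d for d in range(1, degree + 1)])

print(poly)
```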

**More notes :** 

- Higher-degree polynomials (Degree > 2) can lead to overfitting, capturing noise in the data rather than true underlying patterns. Regularization techniques may be needed to mitigate this.
- It's important to scale features before applying polynomial transformations to prevent features with larger scales from dominating the transformed values.

### 4- How to deal with categorical values ?
- Drop categorical variables
- Perform feature encoding

#### 4.1- What does feature encoding mean? 

Feature encoding is the process of converting categorical data or text data into a numerical format that can be easily used for machine learning algorithms. In many machine learning models, the input features are expected to be numerical, and encoding is necessary when dealing with non-numeric data.

Here are some common encoding methods: 
- Ordinal encoding: Assign numerical values based on the inherent order of categories
- One-hot encoding : Create binary columns for each category, indicating its presence (1) or absence (0)
- Label Encoding : Assign a unique numerical label to each category in a categorical variable
- Binary Encoding : Convert each category into an integer code, then split the binary digits of that code across several columns.
- Frequency (Count) Encoding: Replace each category with its frequency or count in the dataset


**!! Notes :**
- Ordinal encoding is a good choice when the categories have a natural ranking (low, medium, high); it is often used with decision trees and random forests.
- One-hot encoding is more commonly used when there is no ranking among the categories.
- If a categorical variable has many distinct values (high cardinality), one-hot encoding can greatly increase the number of columns in the dataset.
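
Three of the encodings above can be sketched with pandas alone. The DataFrame, its columns (`size`, `color`) and the chosen category order are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: "size" is ranked, "color" is not
df = pd.DataFrame({
    "size": ["low", "high", "medium", "low"],
    "color": ["red", "blue", "red", "green"],
})

# Ordinal encoding: explicit mapping that respects the ranking
size_order = {"low": 0, "medium": 1, "high": 2}
df["size_ordinal"] = df["size"].map(size_order)

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Frequency (count) encoding: replace each category by its count
df["color_freq"] = df["color"].map(df["color"].value_counts())

print(df)
print(one_hot)
```

One-hot encoding creates as many columns as there are distinct colors, which is why it becomes costly for high-cardinality variables.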

### 5- What does Feature extraction mean?
One of the primary goals of feature extraction is to reduce the dimensionality of the dataset. High-dimensional data can lead to the curse of dimensionality, making it challenging for models to generalize well.

Feature extraction aims to retain the most relevant information from the original data. This involves identifying features that contribute significantly to the variability and patterns within the dataset while discarding redundant or irrelevant information.

Here are some common feature extraction techniques:

- Principal Component Analysis (PCA)
- Singular Value Decomposition (SVD)
- Independent Component Analysis (ICA)
- Bag-of-Words (BoW)

#### 5. 1- What does Principal Component Analysis (PCA) mean ? 
- It is an unsupervised dimensionality reduction technique that aims to transform input data into a new set of uncorrelated features while keeping the maximum variance in the data.
- It can be applied to both supervised and unsupervised machine learning tasks
- To calculate it, we can use various python libraries such as `NumPy`, `SciPy`, and `scikit-learn`
- It has two main use cases :
    - Data Visualization: it aids in visualizing complex datasets, providing valuable insights into the underlying patterns.
    - Algorithm Optimization: it can significantly accelerate the learning process of algorithms that may otherwise exhibit slow training speeds.
    
Here are the steps to compute PCA using the covariance matrix and its eigenvalue decomposition:
 - 1. Standardise the data
 - 2. Compute the covariance matrix and use eigenvalue decomposition to obtain the eigenvectors and eigenvalues.
 - 3. Select the k largest eigenvalues and their associated eigenvectors.
 - 4. Transform the data into a k dimensional subspace using those k eigenvectors.
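
The four steps above can be sketched with NumPy as follows. The synthetic data and the choice of `k = 2` are assumptions for illustration; in practice a library implementation such as scikit-learn's `PCA` is usually preferred.

```python
import numpy as np

# Synthetic data: 200 samples, 3 features, two of them strongly correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

# 1. Standardise the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix and its eigenvalue decomposition
cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order

# 3. Select the k largest eigenvalues and their eigenvectors
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
W = eigvecs[:, :k]

# 4. Project the data onto the k-dimensional subspace
X_proj = X_std @ W

# Share of the total variance carried by each component
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

The projected features are uncorrelated, and the explained variance ratios show how much information each component retains.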
    
#### a. How to choose the correct number of PCA Components ?

#### b. Why do we need to find eigenvalues and eigenvectors?
Because the principal component directions are given by the eigenvectors of the covariance matrix, and the variance captured along each direction is given by the corresponding eigenvalue.



#### 5. 2- What does Singular Value Decomposition (SVD) mean ? 
Singular Value Decomposition (SVD) is a mathematical technique widely used in linear algebra and numerical analysis. 

It is a method for decomposing a matrix into three other matrices, which can be helpful in various applications, including signal processing, data analysis, and machine learning. The SVD of a matrix A is represented as:

$A = U \Sigma V^T$
 

Here's a breakdown of the terms:

- A: The original matrix that we want to decompose.
- U: The left singular vectors matrix. Columns of U are the eigenvectors of $AA^T$.
- Σ: The diagonal matrix of singular values. The singular values are the square roots of the non-zero eigenvalues of $AA^T$ (equivalently, of $A^TA$). They represent how strongly A stretches along each singular direction.
- $V^T$: The transpose of the right singular vectors matrix. Columns of V are the eigenvectors of $A^TA$.
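
The decomposition and its link to eigenvalues can be checked numerically with NumPy, as in the sketch below. The matrix is a hypothetical example; the last lines build a rank-1 approximation, which is the idea behind using SVD for dimensionality reduction and compression.

```python
import numpy as np

# Hypothetical 3x2 matrix
A = np.array([
    [3.0, 1.0],
    [1.0, 3.0],
    [0.0, 2.0],
])

# Reduced SVD: U is 3x2, s holds the singular values, Vt is V transposed
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its three factors
A_rebuilt = U @ np.diag(s) @ Vt

# Squared singular values equal the eigenvalues of A^T A
eigvals_desc = np.linalg.eigvalsh(A.T @ A)[::-1]

# Rank-1 approximation: keep only the largest singular value
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])

print(s)
print(np.allclose(A, A_rebuilt))
```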
 
SVD has several applications and implications:

- **Dimensionality Reduction** 
- **Image Compression**
- **Pseudo-Inverse** 
- **Collaborative Filtering**
- **Latent Semantic Analysis (LSA)** 

#### 5. 3 - What are the different dimensionality reduction techniques?

#### 5. 4- What does Independent Component Analysis mean ? 

### 6- How to perform Text Data Processing?

### 7- How to perform Time-Series Feature Engineering?