1. Feature Engineering
Definition: Feature engineering is the process of creating new input features from existing data to improve model performance.
Key Aspects:
Transformation: Scaling, normalization, and encoding categorical variables.
Interaction: Creating new features by combining existing ones, such as multiplying or dividing features.
Aggregation: Summarizing information, such as calculating averages, counts, or other statistics.
Domain-Specific: Crafting features based on domain knowledge, like converting dates to “days since a specific event.”
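
The four aspects above can be illustrated with a minimal sketch. The records, field names (income, debt, signup), and the reference date are hypothetical and chosen only for illustration; they are not from any real dataset.

```python
from datetime import date

# Hypothetical toy records; field names are illustrative only.
rows = [
    {"income": 40000.0, "debt": 10000.0, "signup": date(2023, 1, 1)},
    {"income": 60000.0, "debt": 30000.0, "signup": date(2023, 3, 2)},
    {"income": 100000.0, "debt": 20000.0, "signup": date(2023, 6, 30)},
]
reference_day = date(2024, 1, 1)  # assumed "specific event" date


def min_max_scale(values):
    """Transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


income_scaled = min_max_scale([r["income"] for r in rows])

for r, scaled in zip(rows, income_scaled):
    r["income_scaled"] = scaled
    # Interaction: ratio of two existing features.
    r["debt_to_income"] = r["debt"] / r["income"]
    # Domain-specific: days elapsed between signup and the reference date.
    r["days_since_signup"] = (reference_day - r["signup"]).days

# Aggregation: a summary statistic computed over all rows.
mean_debt = sum(r["debt"] for r in rows) / len(rows)
```

Min-max scaling is only one possible transformation; standardization or categorical encoding would slot into the same place.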

2. Feature Selection
Definition: Feature selection is the process of identifying and selecting the most relevant features for model building.
Aim: To reduce overfitting, enhance model interpretability, and improve computational efficiency.
Methods:
Filter Methods: Rank features using statistical tests, independent of the model (e.g., chi-square, mutual information).
Wrapper Methods: Use a model to evaluate feature subsets iteratively (e.g., forward selection, backward elimination).
Embedded Methods: Feature selection occurs within the model training process (e.g., Lasso regularization).

3. Filter and Wrapper Approaches in Feature Selection
Filter Approach:
How it Works: Features are selected based on statistical criteria without relying on a model.
Pros: Computationally fast and scales well with high-dimensional data.
Cons: Typically scores each feature in isolation, so it does not account for feature interactions or for the model that will eventually be trained.
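
A filter can be sketched in a few lines. The example below assumes absolute Pearson correlation with the target as the scoring statistic, which is a swapped-in choice for brevity (the notes mention chi-square and mutual information); the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def filter_rank(features, target):
    """Score each feature on its own, without training any model."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical toy data.
features = {
    "f_strong": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f_weak": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "f_noise": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

ranking, scores = filter_rank(features, target)
top_k = ranking[:2]  # keep the two highest-scoring features
```

Each feature is scored exactly once, which is why this approach stays cheap as the number of features grows.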

Wrapper Approach:
How it Works: Evaluates subsets of features using a specific model and iteratively selects the best combination.
Pros: Accounts for feature interactions and may yield better model performance.
Cons: Computationally expensive, especially on large datasets.
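
The wrapper idea can be sketched as greedy forward selection. The sketch assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy as the evaluating model; that choice, the function names, and the toy data are all illustrative assumptions, and any model and validation scheme could take their place.

```python
def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `cols`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_dist = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            dist = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_selection(X, y, max_features):
    """Greedily add the column that most improves the score; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, col = max(
            ((loo_accuracy(X, y, selected + [c]), c) for c in remaining),
            key=lambda pair: pair[0],
        )
        if score <= best_score:
            break
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


# Hypothetical toy data: column 0 is noise; columns 1 and 2 separate the
# classes only when used together.
X = [
    [3.0, 0.0, 0.0],
    [7.0, 1.0, 1.0],
    [3.1, 10.0, 0.5],
    [7.1, 11.0, 1.5],
    [2.9, 0.5, 10.0],
    [6.9, 1.5, 11.0],
]
y = [0, 0, 1, 1, 1, 1]

selected, score = forward_selection(X, y, max_features=3)
print(selected, score)  # [1, 2] 1.0
```

Every candidate subset requires re-evaluating the model, which is where the computational cost noted above comes from.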

4. Feature Selection and Extraction
i. Overall Feature Selection Process:
Data preprocessing.
Ranking or scoring features based on relevance.
Choosing the top-ranked features for model building.
ii. Principle of Feature Extraction: Reduces data dimensions by creating new features, often using techniques like PCA, which combines correlated features to form principal components that explain most of the variance in the data.
Example: In text data, feature extraction can convert a document into a set of topic scores.
Popular Algorithms: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and t-SNE.
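
To make the PCA idea concrete, here is a minimal sketch that finds only the first principal component using power iteration on the sample covariance matrix. Power iteration is an assumption made to keep the example dependency-free; libraries usually compute all components via an eigendecomposition or SVD. The data are hypothetical.

```python
import math


def first_principal_component(X, iters=200):
    """Return (direction, explained-variance ratio, projected scores)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v, relative to the total variance.
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    total_variance = sum(cov[a][a] for a in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, eigenvalue / total_variance, scores


# Two strongly correlated features (the second is roughly twice the first).
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
direction, explained_ratio, scores = first_principal_component(X)
```

Because the two features are nearly collinear, a single component captures almost all of the variance, so the two columns can be replaced by the one-dimensional scores.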

5. Feature Engineering for Text Categorization
For text categorization, feature engineering involves:
Tokenization: Breaking text into words or phrases.
Normalization: Converting to lowercase, removing punctuation.
Vectorization: Representing text as vectors (e.g., using TF-IDF).
Dimensionality Reduction: Reducing the feature space using methods like PCA or LDA.
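
The first three steps can be sketched as follows. The documents are hypothetical, and the weighting assumes the plain form tf × log(N / df); library implementations typically use smoothed or normalized variants, so their numbers will differ. Dimensionality reduction is left out of this sketch.

```python
import math
import string
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log!",
    "Cats and dogs are pets.",
]


def tokenize(text):
    """Normalization and tokenization: lowercase, strip punctuation, split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


tokenized = [tokenize(d) for d in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})
n_docs = len(docs)
# Document frequency: number of documents containing each token.
df = Counter(tok for doc in tokenized for tok in set(doc))


def tfidf_vector(tokens):
    """Vectorization: one weight per vocabulary term, tf * log(N / df)."""
    counts = Counter(tokens)
    return [
        (counts[term] / len(tokens)) * math.log(n_docs / df[term])
        for term in vocab
    ]


vectors = [tfidf_vector(doc) for doc in tokenized]
```

Each document becomes a vector with one entry per vocabulary term, so the feature space grows with the vocabulary, which is what motivates the reduction step.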

6. Cosine Similarity in Text Categorization
Cosine similarity measures the cosine of the angle between two vectors, making it well-suited for text since it considers the direction rather than magnitude.
Formula:
$$\text{Cosine Similarity} = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \times \sqrt{\sum_i b_i^2}}$$
Calculation: For vectors a = (2, 3, 2, 0, 2, 3, 3, 0, 1) and b = (2, 1, 0, 0, 3, 2, 1, 3, 1),
$$\text{Cosine Similarity} = \frac{2\cdot 2 + 3\cdot 1 + 2\cdot 0 + \cdots + 1\cdot 1}{\sqrt{2^2 + 3^2 + \cdots + 1^2} \times \sqrt{2^2 + 1^2 + \cdots + 1^2}}$$
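
The calculation can be checked numerically with a short sketch. For these two vectors the dot product is 23 and the sums of squares are 40 and 29, so the similarity is 23 / sqrt(40 × 29). The function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = (2, 3, 2, 0, 2, 3, 3, 0, 1)
b = (2, 1, 0, 0, 3, 2, 1, 3, 1)

similarity = cosine_similarity(a, b)
print(round(similarity, 4))  # 0.6753
```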
 
7. Hamming Distance and Similarity Measures
i. Hamming Distance: Counts the number of positions at which corresponding bits differ.
Example: The Hamming distance between 10001011 and 11001111 is 2, because the strings differ only in the second and sixth positions.
ii. Jaccard Index vs. Simple Matching Coefficient (SMC):
Jaccard Index: Measures similarity as the size of the intersection divided by the size of the union of two sets, J(A, B) = |A ∩ B| / |A ∪ B|.
SMC: Calculates similarity as the number of matching attributes (both 1-1 and 0-0 matches) divided by the total number of attributes.
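
The three measures can be compared on the same pair of bit strings with a minimal sketch. The function names are illustrative, and returning 0.0 from the Jaccard index when neither vector contains a 1 is an assumed convention, since the ratio is undefined in that case.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("inputs must have equal length")
    return sum(a != b for a, b in zip(s, t))


def jaccard_index(x, y):
    """Shared 1s divided by positions where at least one vector has a 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0  # assumed convention for all-zero input


def simple_matching_coefficient(x, y):
    """Matching positions (1-1 and 0-0) divided by the total number of positions."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


x = [int(bit) for bit in "10001011"]
y = [int(bit) for bit in "11001111"]

print(hamming_distance(x, y))             # 2
print(round(jaccard_index(x, y), 4))      # 0.6667
print(simple_matching_coefficient(x, y))  # 0.75
```

For binary vectors of length n, SMC equals 1 minus the Hamming distance divided by n, whereas the Jaccard index differs because it leaves the 0-0 matches out of both numerator and denominator.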

8. High-Dimensional Dataset
A high-dimensional dataset has a large number of features relative to the number of observations, making it challenging for machine learning.
Examples: Text data with thousands of words as features, genetic data with many genes.
Challenges: Increased computational cost, a higher risk of overfitting, and reduced interpretability.
Solutions: Use dimensionality reduction techniques (PCA, t-SNE), feature selection, and regularization.

9. Quick Notes
PCA (Principal Component Analysis): Reduces dimensions by transforming features into a set of orthogonal components explaining maximum variance.
Vectors: In machine learning, vectors are ordered arrays of data values, often representing features of data points in a multidimensional space.
Embedded Technique: Integrates feature selection within the model training process, common in decision trees and regularization techniques.

10. Comparisons
Sequential Backward Exclusion vs. Sequential Forward Selection:
Backward Exclusion: Starts with all features and iteratively removes the least relevant.
Forward Selection: Starts with no features and adds the most relevant one by one.
Filter vs. Wrapper Method:
Filter: Independent of the model, faster, and used for high-dimensional data.
Wrapper: Evaluates feature subsets with a model, slower, but may capture feature interactions.
SMC vs. Jaccard Coefficient:
SMC: Considers both matching presence and absence of features.
Jaccard Coefficient: Counts only shared presences; joint absences (0-0 matches) are ignored.