In [None]:
"""
Q1) What is a parameter?
Answer-
    A parameter is a variable used to pass information to a function, method, or procedure. It acts as a placeholder that gets a value when the
    function is called. Parameters allow functions to operate on different data without modifying their internal code.

    Why Parameters Are Useful:
    1. They make functions more flexible and reusable.
    2. They help functions to operate on different inputs and return corresponding results.
"""

In [None]:
# Types of Parameters:
# 1. Formal Parameters - Defined in the function signature.

def greet(name):   
    print(f'Hello {name}!')

# name is the formal parameter

In [None]:
# 2. Actual Parameters (Arguments) - The values provided to the function when it is called.

greet('Gopal')

# 'Gopal' is the actual parameter (or argument)

Hello Gopal!


In [None]:
"""
Q2) What is correlation? What does negative correlation mean?
Answer-
    Correlation is a statistical measure that describes the strength and direction of the relationship between two variables. It indicates how
    changes in one variable are associated with changes in another. The correlation coefficient ranges from -1 to +1.

    Types of Correlation:
    1. Positive Correlation - When one variable increases, the other also increases.
       Example: Height and weight tend to have a positive correlation.

    2. Zero Correlation - No linear relationship exists between the two variables.
       Example: Shoe size and intelligence.

    
    3. Negative Correlation - When one variable increases, the other decreases.
       Example: Speed of a car and time taken to reach a destination.
       Interpreting Negative Correlation:
       If the correlation between two variables is negative, it suggests an inverse relationship:
       * Close to -1: A strong inverse relationship.
       * Close to 0: A weak or negligible inverse relationship.
"""

![image.png](attachment:image.png)

In [None]:
"""
Q3) Define Machine Learning. What are the main components in Machine Learning?
Answer-
    Machine Learning (ML) is a subset of artificial intelligence (AI) that focuses on building systems that can learn from and make decisions
    based on data without being explicitly programmed. It involves creating algorithms that can identify patterns, make predictions, or take
    actions based on input data.
    Key Features of Machine Learning:
    1. Data-Driven: Uses data to train models.
    2. Predictive Capability: Can predict outcomes based on learned patterns.
    3. Self-Improvement: Improves performance as it is exposed to more data.

    Main Components of Machine Learning:
    1. Data: The foundation of any ML system. ML models learn patterns and relationships from data.
       Types:
       1. Structured Data: Tabular data (e.g., CSV files).
       2. Unstructured Data: Images, text, audio.
       Key Activities: Data collection, cleaning, preprocessing.
    
    2. Features: Individual measurable properties or characteristics used as input for the model.
       Feature Engineering:
       1. Selecting the right features.
       2. Transforming data into suitable formats.

    3. Model: A mathematical representation of a real-world process based on input data.
       Types:
       1. Supervised Learning Models: Regression, classification.
       2. Unsupervised Learning Models: Clustering, dimensionality reduction.
       3. Reinforcement Learning Models: Decision-making systems.
    
    4. Algorithm: The method or procedure used to build the model by learning patterns from data.
       Examples: Linear Regression, Decision Trees, Neural Networks, Support Vector Machines (SVM).
    
    5. Training: The process of feeding data to the model and allowing it to learn the underlying patterns.
       Objective: Minimize error and improve accuracy.
    
    6. Evaluation: Assessing the model's performance using metrics.
       Key Metrics:
       Accuracy, Precision, Recall, F1-Score (for classification).
       Mean Squared Error, R² (for regression).
    
    7. Hyperparameters: Configurable parameters external to the model that affect its training process.
       Examples: Learning rate, number of layers in a neural network, maximum depth of a tree.
    
    8. Prediction: Using the trained model to make predictions on new, unseen data.
       Applications: Spam detection, stock price prediction, image classification.
    
    9. Optimization: Adjusting model parameters to minimize a loss function.
       Techniques: Gradient Descent, Stochastic Gradient Descent (SGD).
"""

In [None]:
"""
Q4) How does loss value help in determining whether the model is good or not?
Answer-
    The loss value is a numerical measure of how well a machine learning model's predictions match the actual target values. It plays a critical role in determining the model's 
    performance during training and evaluation.

    How Loss Value Determines Model Quality -
    1. Indicates Model's Accuracy:
       * A lower loss value indicates that the model's predictions are closer to the actual targets, meaning the model is performing well.
       * A higher loss value suggests the model is making significant errors and may need further improvement.
    
    2. Guides Optimization:
       * During training, the model uses the gradient of the loss function to update its parameters (weights and biases) in a direction that minimizes the loss.
       * A consistently decreasing loss during training implies the model is learning effectively.
    
    3. Helps Detect Overfitting or Underfitting:
       * If the loss is low on the training set but high on the validation set, the model might be overfitting.
       * A persistently high loss on both training and validation sets indicates underfitting, meaning the model is too simple to capture the patterns in the data.
    
    4. Comparison Between Models:
       * The loss value is a key metric for comparing the performance of different models or configurations.
       * For example, after testing several architectures, the one with the lowest validation loss might be selected.
    
    5. Identifies Convergence:
       * When the loss plateaus, the model has usually converged: further training yields little or no improvement.
       * If the loss stops decreasing while still high, adjusting the learning rate may help; if it has levelled off at an acceptable value, early stopping avoids unnecessary training.
"""

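The first point above (lower loss means predictions closer to the targets) can be illustrated with a minimal sketch. It assumes mean squared error as the loss and uses made-up numbers; the helper name `mean_squared_error` is defined here for illustration and is not taken from the notebook.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up targets and two hypothetical sets of predictions
y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.0, 7.5]   # near the targets
far_preds = [0.0, 9.0, 3.0]     # far from the targets

loss_close = mean_squared_error(y_true, close_preds)   # about 0.167
loss_far = mean_squared_error(y_true, far_preds)       # about 13.667

print(f"loss (close predictions): {loss_close:.3f}")
print(f"loss (far predictions):   {loss_far:.3f}")
```

Computing the same quantity on a training set and on a validation set gives the two numbers compared in point 3.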
In [None]:
"""
Q5) What are continuous and categorical variables?
Answer-
    In data analysis and statistics, variables are classified based on the type of data they represent. Two common types are continuous variables and categorical variables:
    1. Continuous Variables - Numerical values that can take any value within a range, often representing measurements.
       Characteristics:
       1. Can have fractions or decimals (e.g., 1.5, 3.14).
       2. Have an infinite number of possible values between two points.
       3. Typically used in quantitative analysis.
       Examples:
       * Height (e.g., 5.6 feet)
       * Weight (e.g., 70.5 kg)
       * Temperature (e.g., 98.6°F)
       * Time (e.g., 2.35 seconds)

    2. Categorical Variables - Data that can be divided into groups or categories. These categories are often labels or names.
       Characteristics:
       1. Do not have numerical meaning.
       2. Can be nominal (no inherent order) or ordinal (inherent order).
       3. Used in qualitative analysis.
       Examples:
       Nominal:
       * Gender (e.g., Male, Female, Other)
       * Colors (e.g., Red, Green, Blue)
       * Marital Status (e.g., Single, Married, Divorced)
       Ordinal:
       * Education Level (e.g., High School, Bachelor’s, Master’s)
       * Satisfaction Rating (e.g., Low, Medium, High)

"""

In [None]:
"""
Q6) How do we handle categorical variables in Machine Learning? What are the common techniques?
Answer-
    To handle categorical variables in machine learning, common techniques include:

    1. **Label Encoding**: Assigns a unique integer to each category (e.g., `Red = 0, Green = 1`).
    2. **One-Hot Encoding**: Creates binary columns for each category (e.g., `Red = [1, 0, 0], Green = [0, 1, 0]`).
    3. **Ordinal Encoding**: Assigns ordered integers based on category ranking.
    4. **Target/Mean Encoding**: Replaces categories with their corresponding target variable mean.
    5. **Frequency Encoding**: Replaces categories with their frequency in the dataset.
    6. **Embedding**: Maps categories into dense vectors, typically used in deep learning.

    The choice depends on the dataset, model, and problem type.
"""

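Three of the techniques listed above (one-hot, ordinal, and frequency encoding) can be sketched with pandas alone. The DataFrame, its column names, and the `size_order` mapping are made up for illustration; in practice the ordering for ordinal encoding has to come from domain knowledge.

```python
import pandas as pd

# Made-up data: one nominal column and one ordinal column
df = pd.DataFrame({
    "color": ["Red", "Green", "Blue", "Green"],
    "size": ["Low", "High", "Medium", "Low"],
})

# One-hot encoding: one indicator column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: integers that follow an explicit, assumed ranking
size_order = {"Low": 0, "Medium": 1, "High": 2}
df["size_encoded"] = df["size"].map(size_order)

# Frequency encoding: replace each category with how often it occurs
df["color_freq"] = df["color"].map(df["color"].value_counts())

print(one_hot)
print(df)
```

scikit-learn's `OneHotEncoder` and `OrdinalEncoder` do the same job and are usually preferable inside a model pipeline, because they remember the categories seen during fitting.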
In [None]:
"""
Q7) What do you mean by training and testing a dataset?
Answer-
    Training means using one portion of the dataset to fit the machine learning model so that it learns patterns and relationships.
    Testing means using a separate, held-out portion of the dataset to evaluate the model's performance on unseen data, checking that it
    generalizes well.
    Typically, the data is split into training (e.g., 70-80%) and testing (e.g., 20-30%) sets.

"""

In [None]:
"""
Q8) What is sklearn.preprocessing?
Answer-
    sklearn.preprocessing is a module in scikit-learn that provides tools to preprocess data, such as scaling, normalizing, encoding, 
    and transforming features, to make it suitable for machine learning models. Common tools include StandardScaler, MinMaxScaler, 
    LabelEncoder, and OneHotEncoder.

"""

In [None]:
"""
Q9) What is a Test set?
Answer-
    A test set is a portion of the dataset used to evaluate the performance of a trained machine learning model. It contains unseen data
    (not used during training) to measure how well the model generalizes to new, real-world data.

"""

In [None]:
"""
Q10) How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?
Answer-
    In Python, data can be split using `train_test_split()` from `sklearn.model_selection`, where you specify the dataset and the test 
    size (e.g., 80% training, 20% testing). The typical approach involves: understanding the problem, collecting and preparing data 
    (including feature engineering), splitting the data into training and test sets, selecting a model, training it, evaluating performance, and fine-tuning.


"""

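A minimal sketch of the split described above, using `train_test_split` with an 80/20 ratio. The arrays `X` and `y` are made-up placeholders, and `random_state=42` is an arbitrary choice that only makes the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up dataset: 10 samples with 2 features each, plus 10 targets
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("train:", X_train.shape, "test:", X_test.shape)
```

For classification problems with imbalanced classes, passing `stratify=y` keeps the class proportions similar in both sets.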
In [None]:
"""
Q11) Why do we have to perform EDA before fitting a model to the data?
Answer-
    EDA (Exploratory Data Analysis) helps understand the data, detect patterns, identify outliers, and guide feature selection, ensuring the model is built on a solid understanding of the data.

"""

In [None]:
"""
Q12) What is correlation?
Answer-
    Correlation is a statistical measure that describes the relationship between two variables, indicating whether they move together or in opposite directions.

"""

In [None]:
"""
Q13) What does negative correlation mean?
Answer-
    Negative correlation means that as one variable increases, the other tends to decrease. For example, as outdoor temperature rises, the use of heating tends to fall.
"""

In [None]:

"""
Q14) How can you find correlation between variables in Python?
Answer-
    You can compute pairwise correlations with `pandas.DataFrame.corr()` and visualize the resulting correlation matrix by passing it to `seaborn.heatmap()`.
"""

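As a small illustration, the sketch below computes a correlation matrix with `DataFrame.corr()`. The data is made up, reusing the speed versus travel-time example from Q2, and the column names are hypothetical.

```python
import pandas as pd

# Made-up data: travel time for a fixed 120 km trip at different speeds
df = pd.DataFrame({
    "speed_kmh": [40, 50, 60, 80, 100],
    "travel_time_h": [3.0, 2.4, 2.0, 1.5, 1.2],
})

# Pairwise Pearson correlation (the default method)
corr_matrix = df.corr()
r = corr_matrix.loc["speed_kmh", "travel_time_h"]

print(corr_matrix)
print(f"speed vs. time: r = {r:.3f}")   # strongly negative

# Optional, if seaborn is installed:
#   import seaborn as sns
#   sns.heatmap(corr_matrix, annot=True)
```

`corr()` also accepts `method="spearman"` or `method="kendall"` for rank-based correlation.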

In [None]:
"""
Q15) What is causation? Explain difference between correlation and causation with an example.
Answer-
    Causation implies that one event directly causes another. Correlation shows a relationship but doesn’t imply causality. For example, ice cream sales and drowning incidents are correlated, but ice cream does not cause drowning; both are related to warm weather.
"""


In [None]:
"""
Q16) What is an Optimizer? What are different types of optimizers? Explain each with an example.
Answer-
    An optimizer adjusts model parameters to minimize the loss function. Common types are:
    - **Gradient Descent**: Iteratively updates weights by following the gradient of the loss function.
    - **SGD (Stochastic Gradient Descent)**: A variant of gradient descent that computes each update from a single random sample or a small random mini-batch instead of the full dataset, giving faster but noisier updates.
    - **Adam**: Combines the advantages of both AdaGrad and RMSProp, adapting learning rates for each parameter.
"""

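To make the first optimizer concrete, here is a minimal sketch of plain (full-batch) gradient descent fitting a line by minimizing mean squared error. The data, learning rate, and step count are made-up choices for illustration; SGD and Adam are not implemented here.

```python
import numpy as np

# Made-up, noise-free data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0        # parameters to learn (slope and intercept)
learning_rate = 0.05   # assumed step size

for step in range(2000):
    error = (w * x + b) - y            # prediction error per sample
    grad_w = 2.0 * np.mean(error * x)  # d(MSE)/dw
    grad_b = 2.0 * np.mean(error)      # d(MSE)/db
    w -= learning_rate * grad_w        # step against the gradient
    b -= learning_rate * grad_b

final_loss = float(np.mean((w * x + b - y) ** 2))
print(f"w = {w:.4f}, b = {b:.4f}, loss = {final_loss:.2e}")
```

Turning this into SGD would mean computing `grad_w` and `grad_b` from a randomly chosen sample or mini-batch at each step rather than from all of `x`.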
In [None]:
"""
Q17) What is sklearn.linear_model?
Answer-
    `sklearn.linear_model` is a module in scikit-learn that contains linear models, such as **Linear Regression**, **Logistic Regression**, and **Ridge Regression**, used for regression and classification tasks.
"""

In [None]:
"""
Q18) What does model.fit() do? What arguments must be given?
Answer-
    `model.fit()` trains the model on the provided training data. For supervised estimators it takes `X_train` (the input features) and `y_train` (the target labels); unsupervised estimators, such as clustering models, need only the features.
"""

In [None]:
"""
Q19) What does model.predict() do? What arguments must be given?
Answer-
    `model.predict()` makes predictions using the trained model on new input data (`X_test`), where `X_test` consists of feature values for which you want predictions.
"""

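The `fit()` and `predict()` calls from Q18 and Q19 can be seen together in a short sketch. It assumes `LinearRegression` from `sklearn.linear_model` as the model and uses made-up data that follows y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: one feature, targets follow y = 2x + 1
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])   # shape (n_samples, n_features)
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)       # learn the slope and intercept

# New, unseen inputs
X_test = np.array([[5.0], [6.0]])
predictions = model.predict(X_test)

print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("predictions:", predictions)   # approximately [11. 13.]
```

Note that `X` must be two-dimensional even with a single feature, which is why each sample is wrapped in its own list.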

In [None]:
"""
Q20) What are continuous and categorical variables?
Answer-
    Continuous variables are numeric values that can take any value within a range (e.g., height), while categorical variables represent discrete groups or categories (e.g., gender).
"""

In [None]:
"""
Q21) What is feature scaling? How does it help in Machine Learning?
Answer-
    Feature scaling standardizes or normalizes feature values so that features with large numeric ranges do not dominate those with small ranges. It helps gradient-based optimizers converge faster and keeps distance-based models (e.g., k-NN, SVM) from being biased by differing feature scales.
"""

In [None]:
"""
Q22) How do we perform scaling in Python?
Answer-
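    Scaling can be performed using `sklearn.preprocessing.StandardScaler` for standardization (zero mean, unit variance per feature) or `sklearn.preprocessing.MinMaxScaler` for normalization (rescaling each feature to the range [0, 1] by default).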
"""

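A minimal sketch of both scalers on a made-up array whose two columns differ in scale by a factor of 100:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Standardization: each column gets mean 0 and standard deviation 1
standardized = StandardScaler().fit_transform(X)

# Normalization: each column is rescaled to the range [0, 1]
normalized = MinMaxScaler().fit_transform(X)

print(standardized)
print(normalized)
```

With a train/test split, the scaler should be fitted on the training set only (`scaler.fit(X_train)`) and then applied to both sets with `scaler.transform(...)`, so that no information from the test set leaks into training.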
In [None]:

"""
Q23) What is sklearn.preprocessing?
Answer-
    `sklearn.preprocessing` is a module in scikit-learn providing utilities for scaling, encoding, and transforming data, such as **StandardScaler**, **OneHotEncoder**, and **LabelEncoder**.
"""

In [None]:
"""
Q24) How do we split data for model fitting (training and testing) in Python?
Answer-
    Data is split using `train_test_split()` from `sklearn.model_selection`, where you provide the dataset and specify the `test_size` to split the data into training and testing sets.
"""

In [None]:

"""
Q25) Explain data encoding?
Answer-
    Data encoding transforms categorical variables into numeric representations, such as **Label Encoding** (assigning integers) or **One-Hot Encoding** (creating binary columns for each category).
"""