In [None]:
"""
#Q1: What is a parameter?
- A parameter is a numerical value that defines a model’s behavior.
- In statistics, it describes a population characteristic (e.g., mean, variance).
- In machine learning, parameters are values learned during training (like weights in linear regression or neural networks).
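
The statistical sense of "parameter" can be illustrated with a small sketch. The population below (the integers 1 to 1000) and the sample size are assumptions chosen only for illustration: the population mean is the parameter, and a sample mean is an estimate of it.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 (chosen only for illustration)
population = list(range(1, 1001))

# The population mean is a parameter: a fixed property of the whole population
population_mean = statistics.mean(population)

# A sample mean is an estimate of that parameter, computed from a subset
rng = random.Random(0)  # fixed seed so the sample is reproducible
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (estimate):", sample_mean)
```

The estimate changes with the sample drawn, while the parameter itself stays fixed.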

==================================================================================================================================
#Q2: What is correlation?
- Correlation measures the strength and direction of a linear relationship between two variables.
- It ranges from -1 to +1.
- +1 → perfect positive correlation
- -1 → perfect negative correlation
- 0 → no linear correlation (a nonlinear relationship may still exist)
- What does negative correlation mean?
- When one variable increases, the other tends to decrease.
- Example: Number of hours spent gaming vs. exam scores (more gaming → lower scores).
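
As a rough sketch, the Pearson correlation coefficient can be computed by hand as the covariance divided by the product of the standard deviations. The function name pearson_r and the gaming/exam numbers are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours_gaming = [1, 2, 3, 4, 5]
exam_scores = [90, 85, 70, 65, 50]  # made-up scores that fall as gaming rises

r_negative = pearson_r(hours_gaming, exam_scores)      # strongly negative
r_positive = pearson_r(hours_gaming, [2, 4, 6, 8, 10])  # perfect positive
# y = x^2 on a symmetric range: clearly related, but not linearly
r_nonlinear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_negative, r_positive, r_nonlinear)
```

The last case shows why a value of 0 only rules out a linear relationship, not every relationship.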


==================================================================================================================================
#Q3: Define Machine Learning. What are the main components?
- Machine Learning (ML) is a field of AI where systems learn patterns from data to make predictions or decisions without being explicitly programmed.
- Main components:
- Data (training + testing sets)
- Features (input variables)
- Model/Algorithm (e.g., decision tree, neural network)
- Training process (optimizing parameters)
- Evaluation (metrics, loss functions)

==================================================================================================================================
#Q4: How does loss value help in determining whether the model is good or not?
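- A loss function quantifies the difference between predicted and actual values.
- Lower loss → better fit to the data it is measured on.
- High loss → poor predictions.
- A low training loss alone is not enough: if the loss on held-out data is much higher, the model is overfitting.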
- Example: Mean Squared Error (MSE) in regression.
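
A minimal sketch of how MSE ranks two models. The target values and both sets of predictions are hypothetical numbers picked so the arithmetic is easy to follow.

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
preds_a = [2.5, 5.5, 7.0, 8.5]  # hypothetical model A: close to the targets
preds_b = [5.0, 5.0, 5.0, 5.0]  # hypothetical model B: always predicts 5

mse_a = mean_squared_error(y_true, preds_a)
mse_b = mean_squared_error(y_true, preds_b)

print("MSE of model A:", mse_a)
print("MSE of model B:", mse_b)
```

Model A has the lower loss, so by this metric it fits these targets better than model B.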

==================================================================================================================================
#Q5: What are continuous and categorical variables?
- Continuous variables: Numeric values that can take any value within a range (e.g., height, weight).
- Categorical variables: Values drawn from a fixed set of categories, either unordered or ordered (e.g., gender, color, city).


==================================================================================================================================
#Q6: How do we handle categorical variables in Machine Learning?
- Most ML algorithms require numeric input, so categorical variables must be encoded.
- Common techniques:
- Label Encoding (assigns integers to categories)
- One-Hot Encoding (binary columns for each category)
- Ordinal Encoding (for ordered categories)
- Target Encoding (replace category with mean target value)
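
The four techniques can be sketched by hand on toy data. The color, size, and target lists below are invented for illustration, and real projects would normally use a library encoder instead.

```python
colors = ["red", "green", "blue", "green", "red"]
target = [1, 0, 0, 1, 1]  # hypothetical binary target, one value per row

categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Label encoding: one integer per category (the order here is alphabetical)
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary column per category
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

# Ordinal encoding: integers that follow a meaningful, explicitly given order
size_order = ["small", "medium", "large"]
sizes = ["small", "large", "medium"]
ordinal_encoded = [size_order.index(s) for s in sizes]

# Target encoding: replace each category with the mean target of its rows
target_mean = {
    cat: sum(t for c, t in zip(colors, target) if c == cat) / colors.count(cat)
    for cat in categories
}
target_encoded = [target_mean[c] for c in colors]

print(label_encoded, one_hot, ordinal_encoded, target_encoded)
```

Label encoding imposes an arbitrary order on the categories, which is one reason one-hot encoding is often preferred for unordered ones.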

==================================================================================================================================
#Q7: What do you mean by training and testing a dataset?
- Training dataset: Used to teach the model (adjust parameters).
- Testing dataset: Used to evaluate performance on unseen data.
- Evaluating on data the model has not seen checks whether it generalizes rather than memorizes the training set.

==================================================================================================================================
#Q8: What is sklearn.preprocessing?
- A module in scikit-learn that provides tools for data preprocessing.
- Examples:
- StandardScaler (standardizes features to zero mean and unit variance)
- MinMaxScaler (scales values to a given range, 0–1 by default)
- LabelEncoder, OneHotEncoder (handle categorical variables)

==================================================================================================================================
#Q9: What is a Test set?
- A subset of data reserved for evaluating model performance.
- It simulates how the model will perform on real-world unseen data.

==================================================================================================================================
#Q10: How do we split data for model fitting in Python?
- Using scikit-learn’s train_test_split:
from sklearn.model_selection import train_test_split

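# X (features) and y (target) are assumed to be defined already
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

- Here, 80% of the data is used for training and 20% for testing; random_state fixes the shuffle so the split is reproducible.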

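==================================================================================================================================
#How do you approach a Machine Learning problem?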
- Define the problem (classification, regression, clustering).
- Collect and clean data (handle missing values, outliers).
- Feature engineering (select/transform variables).
- Split data (train/test sets).
- Choose algorithm (based on problem type).
- Train model (fit data).
- Evaluate model (metrics like accuracy, precision, recall, RMSE).
- Tune hyperparameters (GridSearchCV, RandomizedSearchCV).
- Deploy model (real-world usage).
- Monitor performance (update with new data).
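
The middle of this workflow (data, split, train, evaluate) can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the synthetic data (y ≈ 2x + 1 plus noise), the helper fit_line, and the choice of a one-feature linear regression.

```python
import math
import random

# Problem and data: regression on synthetic data, y = 2x + 1 plus Gaussian noise
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

# Split: shuffle the row indices, then keep 80% for training and 20% for testing
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = len(indices) * 8 // 10
train_idx, test_idx = indices[:cut], indices[cut:]
x_train = [xs[i] for i in train_idx]
y_train = [ys[i] for i in train_idx]
x_test = [xs[i] for i in test_idx]
y_test = [ys[i] for i in test_idx]

def fit_line(x, y):
    # Train: closed-form least squares for a single feature
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(x_train, y_train)

# Evaluate: RMSE on the held-out test rows only
preds = [slope * x + intercept for x in x_test]
rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y_test)) / len(y_test))

print("slope:", slope, "intercept:", intercept, "test RMSE:", rmse)
```

The learned slope and intercept should land near the values used to generate the data, and the test RMSE should be close to the noise level.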
"""

In [None]:
"""
#Q11: Why do we have to perform EDA before fitting a model to the data?
- EDA (Exploratory Data Analysis) helps us:
- Understand the structure, distribution, and relationships in data.
- Detect missing values, outliers, and anomalies.
- Identify patterns and correlations.
- Decide preprocessing steps (scaling, encoding, feature selection).
- Without EDA, models may be inaccurate or misleading because hidden issues remain unchecked.
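
Two of these checks, missing values and outliers, can be sketched with the standard library. The raw column below is invented, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import statistics

# Hypothetical raw column with missing entries (None) and one suspicious value
raw = [52, 48, None, 50, 47, 53, None, 49, 51, 250]

# Missing values: count them, then set them aside for the summary statistics
missing_count = sum(1 for v in raw if v is None)
values = [v for v in raw if v is not None]

# Quartiles and the interquartile range (IQR)
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag points more than 1.5 * IQR outside the quartiles
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lower or v > upper]

print("missing:", missing_count)
print("mean:", statistics.mean(values), "median:", median)
print("outliers:", outliers)
```

The gap between the mean and the median is itself a hint that an extreme value is distorting the data.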

==================================================================================================================================
#Q12: What is correlation?
- Correlation measures the strength and direction of a linear relationship between two variables.
- Values range from -1 to +1:
- +1 → perfect positive correlation.
- -1 → perfect negative correlation.
- 0 → no linear correlation.

==================================================================================================================================
#Q13: What does negative correlation mean?
- Negative correlation means that when one variable increases, the other tends to decrease.
- Example: Number of hours spent watching TV vs. exam scores (more TV → lower scores).

==================================================================================================================================
#Q14: How can you find correlation between variables in Python?
import pandas as pd

# Example dataset

df = pd.DataFrame({
    'x': [1,2,3,4,5],
    'y': [5,4,3,2,1]
})

# Correlation matrix
print(df.corr())

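- df.corr() returns the pairwise Pearson correlation between numerical columns; here x and y have a correlation of -1.
- If the DataFrame also contains non-numeric columns, pass numeric_only=True.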

==================================================================================================================================
#Q15: What is causation? Explain difference between correlation and causation with an example.
- Causation: One variable directly affects another.
- Correlation: Two variables move together but may not have a cause-effect relationship.
- Example:
- Ice cream sales and drowning cases are correlated (both rise in summer).
- But ice cream does not cause drowning → hot weather is the common cause (a confounder) that drives both.
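
The ice cream example can be simulated as a rough sketch. All numbers are invented: temperature drives both series, and neither series influences the other. Because the data is simulated, the temperature effect can be subtracted exactly, which is not possible with real data.

```python
import math
import random

def pearson_r(xs, ys):
    # Pearson correlation: covariance scaled by both standard deviations
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)

# Temperature is the common cause; the coefficients 20 and 0.3 are made up
temperature = [rng.uniform(10, 35) for _ in range(200)]
ice_cream_sales = [20 * t + rng.gauss(0, 30) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temperature]

# The two effects are strongly correlated with each other
r_raw = pearson_r(ice_cream_sales, drownings)

# Subtract the known temperature effect: only independent noise is left
ice_resid = [s - 20 * t for s, t in zip(ice_cream_sales, temperature)]
drown_resid = [d - 0.3 * t for d, t in zip(drownings, temperature)]
r_resid = pearson_r(ice_resid, drown_resid)

print("raw correlation:", r_raw)
print("after removing temperature:", r_resid)
```

Once the confounder is accounted for, the correlation between sales and drownings falls to roughly zero.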

==================================================================================================================================
#Q16: What is an Optimizer? What are different types of optimizers? Explain each with an example.
- Optimizer: Algorithm that updates model parameters to minimize loss.
- Common types:
- Gradient Descent: Updates weights in direction of negative gradient.
# Conceptual example
w = w - learning_rate * gradient
- SGD (Stochastic Gradient Descent): Updates using one sample (or a small mini-batch) at a time instead of the full dataset.
- Momentum: Adds a velocity term that accumulates past gradients to accelerate convergence.
- Adam (Adaptive Moment Estimation): Combines momentum with a per-parameter adaptive learning rate.
from tensorflow.keras.optimizers import Adam
optimizer = Adam(learning_rate=0.001)
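
To compare the update rules side by side, here is a minimal sketch on a toy loss f(w) = (w - 3)^2, whose minimum is at w = 3. The loss, the learning rate, and the step count are assumptions for illustration; the Adam defaults (beta1 = 0.9, beta2 = 0.999) follow common practice.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2
    return 2 * (w - 3)

def gradient_descent(w=0.0, lr=0.1, steps=1000):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def momentum(w=0.0, lr=0.1, beta=0.9, steps=1000):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)  # accumulate past gradients
        w -= lr * velocity
    return w

def adam(w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_gd, w_momentum, w_adam = gradient_descent(), momentum(), adam()
print(w_gd, w_momentum, w_adam)
```

All three should end close to 3; they differ in the path taken and in how sensitive they are to the learning rate. SGD uses the same update as gradient_descent, but with the gradient estimated from one sample or mini-batch per step.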


==================================================================================================================================

#Q17: What is sklearn.linear_model?
- A Scikit-learn module that provides linear models:
- LinearRegression, LogisticRegression, Ridge, Lasso, etc.
- Used for regression and classification tasks.

==================================================================================================================================
#Q18: What does model.fit() do? What arguments must be given?
- model.fit(X, y):
- Trains the model using features X and target y.
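- Arguments:
- X: Training data (features), typically a 2D array of shape (n_samples, n_features).
- y: Labels/target values (required for supervised models; unsupervised estimators such as KMeans need only X).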


==================================================================================================================================

#Q19: What does model.predict() do? What arguments must be given?
- model.predict(X):
- Uses the trained model to predict outputs for new data.
- Arguments:
- X: Input features for prediction, with the same feature columns (and preprocessing) as the training data.
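
The fit/predict convention can be sketched with a toy estimator. MeanRegressor is a hypothetical class written for illustration, not part of scikit-learn: fit learns one parameter from X and y, and predict uses that parameter on new X without needing y.

```python
class MeanRegressor:
    # Hypothetical toy estimator that follows the fit/predict convention

    def fit(self, X, y):
        # "Training" stores the mean of the targets as the learned parameter
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows model = MeanRegressor().fit(X, y)

    def predict(self, X):
        # One prediction per input row; this toy model ignores the features
        return [self.mean_ for _ in X]

X_train = [[1.0], [2.0], [3.0]]  # 2D: one row per sample, one column per feature
y_train = [10.0, 20.0, 30.0]

model = MeanRegressor().fit(X_train, y_train)
predictions = model.predict([[4.0], [5.0]])

print(model.mean_, predictions)
```

A predictor this simple is also a useful baseline: a real model should at least beat it on the test set.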


==================================================================================================================================
#Q20: What are continuous and categorical variables?
- Continuous variables: Numeric values that can take any value within a range (e.g., height, weight).
- Categorical variables: Values that represent categories or labels (e.g., gender, color).


==================================================================================================================================
#Q21: What is feature scaling? How does it help in Machine Learning?
- Feature scaling: Standardizing or normalizing features to a similar range.
- Helps:
- Prevents features with large numeric ranges from dominating the others.
- Improves convergence speed in gradient-based algorithms.
- Important for distance-based models (KNN, SVM); tree-based models are largely unaffected.
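
The two most common rescalings can be written out by hand as a sketch. The heights are made-up values; standardization here divides by the population standard deviation, which is also what StandardScaler does.

```python
import statistics

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # made-up feature values

# Standardization: subtract the mean, divide by the standard deviation
mean = statistics.fmean(heights_cm)
std = statistics.pstdev(heights_cm)  # population standard deviation
standardized = [(h - mean) / std for h in heights_cm]

# Min-max scaling: map the smallest value to 0 and the largest to 1
lowest, highest = min(heights_cm), max(heights_cm)
min_max = [(h - lowest) / (highest - lowest) for h in heights_cm]

print("standardized:", standardized)
print("min-max:", min_max)
```

After standardization the feature has mean 0 and standard deviation 1; after min-max scaling it lies in the range 0 to 1.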



==================================================================================================================================
#Q22: How do we perform scaling in Python?
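from sklearn.preprocessing import StandardScaler, MinMaxScaler

# X is assumed to be a 2D numeric array or DataFrame
# In practice, fit the scaler on the training split only, then transform the test split
# Standardization: zero mean and unit variance per feature
std_scaler = StandardScaler()
X_standardized = std_scaler.fit_transform(X)

# Min-max scaling: each feature rescaled to the range 0-1
minmax_scaler = MinMaxScaler()
X_minmax = minmax_scaler.fit_transform(X)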



==================================================================================================================================
#Q23: What is sklearn.preprocessing?
- A Scikit-learn module for preprocessing tasks:
- Scaling (StandardScaler, MinMaxScaler).
- Encoding (OneHotEncoder, LabelEncoder).
- Normalization, binarization, polynomial features.


==================================================================================================================================
#Q24: How do we split data for model fitting (training and testing) in Python?
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


- Splits dataset into training and testing sets.


==================================================================================================================================
#Q25: Explain data encoding?
- Data encoding: Converting categorical variables into numerical format.
- Types:
- Label Encoding: Assigns integer values to categories.
- One-Hot Encoding: Creates binary columns for each category.
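from sklearn.preprocessing import OneHotEncoder

# X is assumed to be a pandas DataFrame with a 'Category' column
encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; call .toarray() for a dense array
X_encoded = encoder.fit_transform(X[['Category']])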


"""

