// 1. What is a parameter?
In machine learning, a parameter is a value inside the model that the learning algorithm estimates from the training data. Parameters define the model’s behavior. For example, in a linear regression model, the coefficients (weights) and intercept are the parameters learned during training. This distinguishes them from hyperparameters, such as the learning rate, which are set before training.
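
As a minimal sketch of what "learning parameters" means, the snippet below fits a one-feature linear regression by ordinary least squares in plain Python. The toy data and the helper name `fit_line` are assumptions made for illustration; they do not come from any library.

```python
def fit_line(xs, ys):
    """Return (weight, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates for a single feature.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    weight = cov_xy / var_x
    intercept = mean_y - weight * mean_x
    return weight, intercept


# Toy training data generated from y = 2x + 1 (an assumption for the demo).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

weight, intercept = fit_line(xs, ys)
print(weight, intercept)  # the two learned parameters
```

The weight and intercept are not typed in by hand; they are computed from the data, which is what makes them parameters.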

// 2. What is correlation?
Correlation is a statistical measure of the extent to which two variables change together. It is usually summarized by a correlation coefficient between -1 and 1. A high absolute value means that when one variable changes, the other tends to change in a predictable way. Correlation can be positive (both increase together), negative (one increases while the other decreases), or zero (no linear relationship).
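
To make the definition concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The function name `pearson` and the study-hours data are hypothetical, chosen only to show one positive and one negative relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: exam scores rise with study hours, mistakes fall.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
mistakes = [9, 8, 6, 5, 2]

r_pos = pearson(hours, scores)    # close to +1: strong positive correlation
r_neg = pearson(hours, mistakes)  # close to -1: strong negative correlation
print(round(r_pos, 3), round(r_neg, 3))
```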

// 3. What does negative correlation mean?
Negative correlation means that as the value of one variable increases, the value of the other variable tends to decrease. For example, as the speed of a car increases, the time taken to reach a destination decreases. A correlation coefficient close to -1 indicates a strong negative correlation.

// 4. Define Machine Learning. What are the main components in Machine Learning?
Machine Learning (ML) is a branch of artificial intelligence that enables systems to learn patterns from data and improve their performance without being explicitly programmed. The main components of ML include:
- Data: The input used for learning.
- Model: The mathematical function that maps inputs to predictions.
- Algorithm: The method used to train the model.
- Loss Function: Measures the error between predicted and actual values.
- Optimizer: Minimizes the loss function and improves the model's performance.

// 5. How does loss value help in determining whether the model is good or not?
The loss value quantifies the difference between the predicted values and the actual values. A lower loss means the predictions are closer to the targets, while a high loss means the model is making large prediction errors. Loss is the quantity minimized during training, but a low training loss alone does not prove the model is good: the loss on held-out data should be low as well, otherwise the model may be overfitting.
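
As an illustration, the sketch below computes mean squared error (one common loss) for two sets of predictions against the same targets. The numbers and the helper name `mean_squared_error` are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.0, 7.5]  # predictions near the targets
far_preds = [0.0, 9.0, 3.0]    # predictions far from the targets

loss_close = mean_squared_error(y_true, close_preds)
loss_far = mean_squared_error(y_true, far_preds)
print(loss_close, loss_far)  # the closer predictions give the smaller loss
```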

// 6. What are continuous and categorical variables?
Continuous variables are numerical values that can take any value within a range, such as height, weight, or temperature. Categorical variables represent groups or categories, such as gender, color, or product type. Categorical variables usually must be encoded as numbers before a machine learning model can use them.

// 7. How do we handle categorical variables in Machine Learning? What are the common techniques?
Categorical variables need to be converted into numerical format before being used in machine learning algorithms. Common techniques include:
- Label Encoding: Assigns each category a unique integer.
- One-Hot Encoding: Creates binary columns for each category.
- Ordinal Encoding: Used for categorical data with a meaningful order.
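
The three techniques above can be sketched in plain Python as follows. The category values are invented for the example; in practice a library encoder would usually do this work.

```python
colors = ["red", "green", "blue", "green"]

# Label encoding: map each category to an integer (sorted for a stable mapping).
categories = sorted(set(colors))  # ['blue', 'green', 'red']
label_map = {category: index for index, category in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one binary column per category, exactly one 1 per row.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]

# Ordinal encoding: the integers follow an order we state explicitly.
size_order = ["small", "medium", "large"]
sizes = ["medium", "small", "large"]
ordinal_encoded = [size_order.index(s) for s in sizes]

print(label_encoded)
print(one_hot)
print(ordinal_encoded)
```

Note that label encoding assigns integers without any meaningful order, so a model may read an ordering into them that does not exist; one-hot encoding avoids this at the cost of more columns.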

// 8. What do you mean by training and testing a dataset?
Training means fitting the model on one portion of the dataset so that it learns patterns. Testing means evaluating the trained model on a separate portion that it has not seen. Good performance on the training set alone is not enough; the model must also perform well on the test set to show that it generalizes to new data.
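
A rough sketch of what a random hold-out split does, written in plain Python. The helper `holdout_split` is hypothetical and only meant to show the idea of shuffling and then cutting the data into two disjoint parts.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten data records
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

Every record ends up in exactly one of the two parts, so the test rows are never seen during training.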

// 9. What is sklearn.preprocessing?
`sklearn.preprocessing` is a module in the scikit-learn library that provides functions to prepare data for machine learning models. It includes tools for scaling, encoding categorical variables, normalization, binarization, and more. These steps ensure the model gets clean, consistent input.

// 10. What is a Test set?
A test set is the subset of the dataset used to evaluate the model after it has been trained. It acts like new, unseen data and helps in checking the model's performance and generalization ability. The test set is not involved during training to ensure unbiased evaluation.

// 11. How do we split data for model fitting (training and testing) in Python?
In Python, we can use `train_test_split()` from the `sklearn.model_selection` module to split the data. It randomly divides the data into training and testing sets. For example:
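```python
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be loaded already; 20% of the rows go to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
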
// 12. How do you approach a Machine Learning problem?
A structured approach includes the following steps:
- Define the problem clearly.
- Collect and understand the data.
- Clean and preprocess the data (handling null values, encoding, scaling).
- Perform Exploratory Data Analysis (EDA).
- Select appropriate features.
- Choose a suitable algorithm.
- Train the model.
- Evaluate the model using metrics.
- Tune hyperparameters if needed.
- Deploy the model.
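
The core of these steps (preprocess, train, evaluate) can be sketched with scikit-learn as below. This is only one possible arrangement: the Iris dataset, the choice of `LogisticRegression`, and the 80/20 split are assumptions for the demo, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect the data (a small built-in dataset stands in for a real one).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess (scaling) and choose an algorithm, chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train, then evaluate on the held-out data.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in a pipeline means the scaler is fitted on the training rows only, which avoids leaking information from the test set.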

// 13. Why do we have to perform EDA before fitting a model to the data?
EDA (Exploratory Data Analysis) helps us understand the dataset better by summarizing its main characteristics using visual and statistical techniques. It allows us to identify patterns, detect outliers, and find correlations. This step is crucial before building models, as it guides data cleaning, feature selection, and model choice.

// 14. What is correlation?
Correlation is a statistical method used to measure the strength and direction of a relationship between two variables. (Same as Q2)

// 15. What does negative correlation mean?
Negative correlation indicates an inverse relationship between two variables. (Same as Q3)

// 16. How can you find correlation between variables in Python?
We can use the `corr()` method of a pandas DataFrame:

```python
import pandas as pd

df = pd.read_csv('data.csv')
# numeric_only=True skips non-numeric columns, which would otherwise raise an error in pandas 2.x.
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
```

This prints the pairwise Pearson correlation coefficients between all numerical columns.

// 17. What is causation? Explain difference between correlation and causation with an example.
Causation means that one variable directly affects the other. Correlation just shows that two variables move together but doesn't prove one causes the other.
Example: Ice cream sales and drowning incidents may both increase in summer (correlation), but eating ice cream doesn’t cause drowning (no causation). The real cause is hot weather.
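
The ice cream example can be simulated with a short sketch. All numbers here are invented: temperature is drawn at random, and both ice cream sales and drowning incidents are generated from temperature plus noise, with no link between the two themselves.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # seeded for reproducibility

# Hot weather is the common cause (confounder) in this made-up model.
temperature = [rng.uniform(10, 35) for _ in range(200)]
ice_cream_sales = [5.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

# Sales never enter the formula for drownings, yet the two are correlated.
r = pearson(ice_cream_sales, drownings)
print(round(r, 2))
```

The correlation comes out clearly positive even though, by construction, neither variable influences the other.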

// 18. What is an Optimizer? What are different types of optimizers? Explain each with an example.
An optimizer is an algorithm that adjusts the model’s weights to reduce the loss during training.
Types include:
- SGD (Stochastic Gradient Descent): Updates the weights using the gradient from a single example or a small mini-batch; simple, but its updates are noisy and it can converge slowly.
- Adam: Combines momentum with per-parameter adaptive learning rates; a very popular default.
- RMSProp: Scales each update by a moving average of squared gradients; works well on non-stationary objectives.

Example:
```python
from tensorflow.keras.optimizers import Adam

# `model` is assumed to be an existing Keras model.
model.compile(optimizer=Adam(), loss='mse')
```

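
To show what these optimizers do to a single weight, here is a hedged plain-Python sketch of the standard update rules. The function names, the default hyperparameters, and the toy loss `(w - 3)^2` are assumptions for illustration; library implementations add many details not shown here.

```python
import math


def sgd_step(w, grad, lr=0.1):
    """Plain gradient step: move against the gradient."""
    return w - lr * grad


def rmsprop_step(w, grad, v, lr=0.1, rho=0.9, eps=1e-8):
    """Scale the step by a moving average of squared gradients."""
    v = rho * v + (1 - rho) * grad ** 2
    return w - lr * grad / (math.sqrt(v) + eps), v


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Momentum (m) plus adaptive scaling (v), both bias-corrected at step t >= 1."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def grad_fn(w):
    """Gradient of the toy loss (w - 3)^2, which is minimized at w = 3."""
    return 2 * (w - 3)


# Run the plain gradient step repeatedly; the weight approaches 3.
w_sgd = 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd))
print(w_sgd)
```

On this toy loss the gradient is exact, so the "stochastic" part of SGD (sampling examples or mini-batches) does not appear; only the update rule is shown.
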
// 19. What is sklearn.linear_model?
`sklearn.linear_model` is a module in scikit-learn that contains linear models for regression and classification tasks. Examples include `LinearRegression`, `LogisticRegression`, and `Ridge`. These models predict from a linear combination of the input features; `LogisticRegression`, for instance, passes that combination through a logistic function to produce class probabilities.

// 20. What does model.fit() do? What arguments must be given?
The `fit()` method trains the machine learning model on the given data.
Arguments:
- `X_train`: Feature data.
- `y_train`: Target labels.

```python
model.fit(X_train, y_train)
```

// 21. What does model.predict() do? What arguments must be given?
The `predict()` method uses the trained model to make predictions on new data.
Argument:
- `X_test`: The data to make predictions on.

```python
y_pred = model.predict(X_test)
```

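
A small end-to-end sketch of `fit()` followed by `predict()`, using `LinearRegression` from scikit-learn. The toy data follows `y = 2x + 1` and the name `X_new` is chosen just for this example.

```python
from sklearn.linear_model import LinearRegression

# Toy training data: one feature per row, targets follow y = 2x + 1.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]

model = LinearRegression()
model.fit(X_train, y_train)  # learns the coefficient and intercept

# Predict for inputs the model has not seen.
X_new = [[4.0], [5.0]]
y_pred = model.predict(X_new)

print(model.coef_, model.intercept_)
print(y_pred)
```

After `fit()`, the learned parameters are available as `coef_` and `intercept_`, and `predict()` applies them to the new rows.
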
// 22. What are continuous and categorical variables?
(Already covered in Q6)

// 23. What is feature scaling? How does it help in Machine Learning?
Feature scaling is the process of normalizing or standardizing the independent variables so that they are on comparable scales. It matters because distance-based models such as SVM and KNN are sensitive to feature magnitude, and gradient-based training usually converges faster on scaled inputs. Tree-based models, by contrast, are largely unaffected by scaling.
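
The two most common scaling formulas can be sketched by hand as follows. The helper names and the `ages` values are made up for the example; the standardization here divides by the population standard deviation.

```python
import math


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ages = [20.0, 30.0, 40.0, 50.0, 60.0]
z_scores = standardize(ages)
scaled = min_max_scale(ages)

print([round(z, 3) for z in z_scores])
print(scaled)
```

Standardized values have mean 0 and unit variance, while min-max scaled values always fall between 0 and 1.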

// 24. How do we perform scaling in Python?
We use classes like `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`.
Example:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```

// 25. What is sklearn.preprocessing?
(Already covered in Q9)

// 26. How do we split data for model fitting (training and testing) in Python?
(Already covered in Q11)

// 27. Explain data encoding?
Data encoding is the process of converting categorical data into numerical form so that it can be used by machine learning models. Techniques include:
- Label Encoding: Assigns each label a unique number.
- One-Hot Encoding: Converts each category into a binary column (0 or 1).

These techniques ensure that models can process non-numeric data properly.
