# Machine Learning and Data Science Concepts


1. **What is a parameter?**  
A parameter is an internal value of a model that is learned from data during training, such as the weights and biases of a neural network or the coefficients of a linear regression. Parameters differ from hyperparameters (for example, the learning rate), which are set before training rather than learned.

2. **What is correlation?**  
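Correlation measures the strength and direction of the linear relationship between two variables. The Pearson correlation coefficient ranges from -1 to +1: values near -1 or +1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.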

3. **What does negative correlation mean?**  
Negative correlation indicates that as one variable increases, the other variable decreases. A correlation value close to -1 signifies a strong negative relationship.
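
To make the sign and magnitude concrete, here is a minimal sketch that computes the Pearson coefficient with NumPy. The two arrays are made-up illustrative numbers, not real measurements; `x` rises while `y` falls, so the coefficient should come out close to -1.

```python
import numpy as np

# Made-up toy data: y falls almost linearly as x rises.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([80.0, 78.0, 75.0, 73.0, 70.0, 68.0])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r(x, y).
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

Swapping `y` for an increasing sequence would flip the sign of `r`.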

4. **Define Machine Learning.**  
Machine learning is a subset of artificial intelligence that enables systems to learn from data, identify patterns, and make decisions without being explicitly programmed.

5. **What are the main components in Machine Learning?**  
- **Dataset:** Input data for training and testing
- **Features:** Attributes or variables used for training
- **Model:** Algorithm used to learn from data
- **Loss Function:** Measures model error
- **Optimizer:** Updates model parameters to reduce loss

6. **How does loss value help in determining whether the model is good or not?**  
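The loss value quantifies the difference between predicted and actual values, so a lower loss means the predictions are closer to the targets. The loss should be judged on held-out data as well as on training data: a training loss that keeps falling while the validation loss rises is a sign of overfitting, not of a better model.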
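
As an illustration, the sketch below computes mean squared error (one common loss for regression) for two sets of predictions against the same targets. The function name and the numbers are illustrative assumptions, chosen only to show that predictions closer to the targets give a smaller loss.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.0, 7.5]  # small errors
far_preds = [0.0, 9.0, 3.0]    # large errors

loss_close = mean_squared_error(y_true, close_preds)
loss_far = mean_squared_error(y_true, far_preds)
print(loss_close, loss_far)  # the first value is the smaller one
```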

7. **What are continuous and categorical variables?**  
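- **Continuous Variables:** Numeric values that can take any value within a range, including fractions (e.g., height, temperature).
- **Categorical Variables:** Values drawn from a fixed set of discrete categories, with or without a natural order (e.g., gender, color).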

8. **How do we handle categorical variables in Machine Learning? What are the common techniques?**  
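- **Label Encoding:** Assigns an integer to each category. This implies an order, so it suits ordinal categories or tree-based models best.
- **One-Hot Encoding:** Creates one binary column per category. It avoids implying an order but adds many columns when there are many categories.
- **Target Encoding:** Replaces each category with a statistic of the target (usually the mean) for that category. The statistic should be computed on training data only to avoid leakage.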
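
The three techniques above can be sketched with pandas alone. The DataFrame, its column names (`color`, `bought`), and its values are hypothetical; scikit-learn also offers dedicated encoder classes, but plain pandas keeps the mechanics visible.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature and a binary target.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red"],
    "bought": [1, 0, 1, 1, 0],
})

# Label encoding: map each category to an integer code
# (pandas orders string categories alphabetically).
df["color_label"] = df["color"].astype("category").cat.codes

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Target encoding: replace each category with the mean target for that category.
# In a real project, compute these means on the training split only.
target_means = df.groupby("color")["bought"].mean()
df["color_target"] = df["color"].map(target_means)

print(df)
print(one_hot)
```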

9. **What do you mean by training and testing a dataset?**  
Training means fitting a model to one portion of the data (the training set) so that it learns patterns from it. Testing means evaluating the trained model on a separate portion it has never seen (the test set) to estimate how well it generalizes.

10. **What is sklearn.preprocessing?**  
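`sklearn.preprocessing` is the scikit-learn module for transforming input data before modeling. It provides classes and functions for scaling (e.g., `StandardScaler`, `MinMaxScaler`), encoding (e.g., `OneHotEncoder`, `LabelEncoder`), and normalization.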

11. **What is a Test set?**  
A test set is a subset of the data that is held out from training and used only to evaluate the performance of the trained model.

12. **How do we split data for model fitting (training and testing) in Python?**  
Using `train_test_split()` from `sklearn.model_selection`, data is split into training and testing sets.
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

13. **How do you approach a Machine Learning problem?**  
- Define the problem
- Collect and clean data
- Perform Exploratory Data Analysis (EDA)
- Select features and preprocess data
- Choose a model and train it
- Evaluate and optimize the model
- Deploy and monitor the model
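
The middle steps of this workflow (split, preprocess, train, evaluate) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the Iris dataset bundled with scikit-learn and a logistic regression classifier, neither of which is prescribed by the steps above, and it skips problem definition, EDA, and deployment.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect data: a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess and model in one pipeline so scaling is learned from training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate on the unseen test rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```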

14. **Why do we have to perform EDA before fitting a model to the data?**  
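EDA helps you understand data distributions, identify missing values, detect outliers, and reveal relationships between variables. These findings guide cleaning, feature selection, and model choice, so problems in the data are caught before they degrade the model.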

15. **What is correlation?**  
It measures the linear relationship between variables.

16. **What does negative correlation mean?**  
It signifies an inverse relationship between two variables.

17. **How can you find correlation between variables in Python?**  
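Use `df.corr()` from pandas, which computes pairwise Pearson correlations between columns by default.
```python
# numeric_only=True skips non-numeric columns (parameter available in pandas >= 1.5)
correlation_matrix = df.corr(numeric_only=True)
print(correlation_matrix)
```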

18. **What is causation? Explain the difference between correlation and causation with an example.**  
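- **Correlation:** Two variables tend to change together. This alone does not show that one drives the other.
- **Causation:** A change in one variable directly produces a change in another.
- **Example:** Ice cream sales and temperature are positively correlated. Here the relationship is also causal, but only in one direction: higher temperatures increase ice cream sales, while selling more ice cream does not raise the temperature. The correlation value by itself cannot tell you which direction, if any, the effect runs.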

19. **What is an Optimizer?**  
An optimizer updates model parameters to minimize loss during training.

20. **What are different types of optimizers? Explain each with an example.**  
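- **SGD (Stochastic Gradient Descent):** Updates weights using the gradient from a single sample or a small mini-batch, scaled by a fixed learning rate
- **RMSprop:** Adapts the step size per parameter by dividing the gradient by a running average of its recent squared values
- **Adam:** Combines momentum (a running average of gradients) with RMSprop-style adaptive learning rates, which often gives faster convergence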
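
The three optimizers differ only in how they turn a gradient into a step. The sketch below applies each update rule to a toy one-dimensional loss, f(w) = (w - 3)^2. The loss, function names, and hyperparameter values are illustrative assumptions. The gradient here is exact, whereas real SGD would estimate it from a sample or mini-batch, so the `run_sgd` loop shows the update rule rather than the sampling.

```python
import math

def grad(w):
    """Exact gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w=0.0, lr=0.1, steps=1000):
    # Plain update: step against the gradient with a fixed learning rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_rmsprop(w=0.0, lr=0.01, decay=0.9, eps=1e-8, steps=1000):
    avg_sq = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1 - decay) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w

def run_adam(w=0.0, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    m = 0.0  # running average of gradients (momentum)
    v = 0.0  # running average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd()
w_rmsprop = run_rmsprop()
w_adam = run_adam()
print(w_sgd, w_rmsprop, w_adam)  # each should end near the minimum at w = 3
```

In practice these rules come from a framework rather than being written by hand; the loop form is only meant to expose what each optimizer tracks between steps.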

21. **What is sklearn.linear_model?**  
This module provides linear models such as `LinearRegression`, `Ridge`, and `Lasso`.

22. **What does model.fit() do? What arguments must be given?**  
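`model.fit()` trains the model on input data by estimating its parameters. Supervised estimators require a feature matrix `X` of shape `(n_samples, n_features)` and a target vector `y`; unsupervised estimators such as clustering models need only `X`.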

23. **What does model.predict() do? What arguments must be given?**  
`model.predict()` generates predictions based on the trained model. It requires input features `X`.
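
A minimal sketch of the `fit()` / `predict()` pair, using `LinearRegression` from `sklearn.linear_model`. The data are made-up noiseless points on the line y = 2x + 1, chosen so the learned parameters are easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up noiseless data following y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)  # estimates the slope and intercept from X and y

slope = model.coef_[0]
intercept = model.intercept_

# predict() expects the same 2D layout as the training features.
prediction = model.predict(np.array([[5.0]]))[0]
print(slope, intercept, prediction)
```

The fitted `coef_` and `intercept_` attributes are the model's learned parameters.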

24. **What are continuous and categorical variables?**  
Continuous variables can take any numeric value within a range, while categorical variables take values from a fixed set of discrete categories.

25. **What is feature scaling? How does it help in Machine Learning?**  
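Feature scaling brings input features onto a common scale so that no feature dominates simply because of its units. It speeds up convergence for gradient-based models and matters for distance-based methods such as k-nearest neighbors; tree-based models are largely insensitive to it.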

26. **How do we perform scaling in Python?**  
Use `StandardScaler` from `sklearn.preprocessing`, which rescales each feature to zero mean and unit variance.
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
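
When the data are split into training and test sets, a common practice is to fit the scaler on the training rows only and reuse those statistics for the test rows, so that no information from the test set leaks into preprocessing. A minimal sketch, assuming synthetic data with two features on very different scales:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(200, 2))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(scaler.mean_, scaler.scale_)
```

The test features will not have exactly zero mean after this step, which is expected: they are scaled with the training statistics, not their own.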

27. **What is sklearn.preprocessing?**  
It contains utilities for data transformation, including scaling, encoding, and normalization.

28. **How do we split data for model fitting (training and testing) in Python?**  
Using `train_test_split()` as shown previously.

29. **Explain data encoding.**  
Data encoding converts categorical values into numerical formats for model compatibility. Common techniques are:
- **One-Hot Encoding:** Binary columns for each category
- **Label Encoding:** Integer values for categories
- **Target Encoding:** Target mean for each category
