# Machine Learning Basics Assignment

### What is a parameter?

A parameter is a value inside a model that is learned from the data during training, such as the weights and biases of a neural network.

### What is correlation?

Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 to 1.

### What does negative correlation mean?

Negative correlation means that when one variable rises, the other tends to fall, showing an inverse relationship (e.g., a correlation of -0.7).

### Define Machine Learning. What are the key elements in Machine Learning?

Machine Learning (ML) refers to an approach where computers learn patterns in data in order to make predictions or decisions. Key elements: data, model, loss function, and optimization algorithm.

### How can loss value assist in establishing if the model is good or bad?

The loss value measures how far the model's predictions are from the real values. A lower loss means the predictions are closer to the real values, so the model performs better; a high loss means the model performs poorly.
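To make this concrete, here is a minimal sketch that compares two sets of predictions using mean squared error (MSE), one common loss for regression. The helper `mean_squared_error` and the numbers are made up for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
close_preds = [3.0, 5.0, 8.0]  # hypothetical model A: predictions near the targets
far_preds = [0.0, 9.0, 2.0]    # hypothetical model B: predictions far from the targets

loss_a = mean_squared_error(y_true, close_preds)
loss_b = mean_squared_error(y_true, far_preds)

# Model A has the lower loss, so it fits these targets better.
print(f"loss A = {loss_a:.2f}, loss B = {loss_b:.2f}")
```

A loss value is easiest to interpret relative to another model or a baseline on the same data, since its scale depends on the units of the target.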

### What are categorical and continuous variables?

Continuous variables: numerical values that can take any value within a range (e.g., height). Categorical variables: values drawn from a fixed set of categories (e.g., color).

### How do we treat categorical variables in Machine Learning? What are the typical methods?

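Categorical variables are converted to numbers by encoding. Typical methods include one-hot encoding (produces one binary column per category) and label encoding (assigns an integer to each category). Label encoding implies an order between categories, so it suits ordinal data better than unordered categories.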
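The two methods can be sketched as follows, assuming a small made-up `color` column. `pd.get_dummies` is one of several ways to one-hot encode. Note that scikit-learn documents `LabelEncoder` as intended for target labels; it is used here only to show the integer mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data for illustration.
df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Label encoding: each category is mapped to an integer
# (classes are sorted, so blue=0, green=1, red=2).
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(df["color"])

print(one_hot)
print(labels)
```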

### What do you refer to when you talk about training and testing a dataset?

The training set is the part of the data used to fit the model (learn patterns), whereas the test set is the part used to check the model's performance on unseen data.

### What is sklearn.preprocessing?

`sklearn.preprocessing` is a Scikit-learn (Python) module that offers tools for data preprocessing, such as scaling, encoding, and normalization.

### What is a Test set?

A test set is a portion of the data, usually the smaller one, that is kept separate from the training set and used to check the model's performance on new data.

### How to split data for fitting the model (training and testing) in Python?

```python
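from sklearn.model_selection import train_test_split
# X (features) and y (target) are assumed to be defined already.
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)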
```

### How do you solve a Machine Learning problem?

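Steps: 1) Frame the problem, 2) Data collection and exploration (EDA), 3) Data preprocessing (handling missing values, variable encoding), 4) Split data into training and test sets, 5) Model selection, 6) Train the model, 7) Hyperparameter tuning (using a validation set or cross-validation on the training data), 8) Test on the test set, 9) Deploy if acceptable. Tuning comes before the final test so that the test set stays unseen and gives an honest estimate of performance.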
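The workflow above can be sketched end to end with scikit-learn. This is only an illustration under assumptions: synthetic data from `make_classification` stands in for a real, explored dataset, and logistic regression with a small grid of `C` values is an arbitrary model choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data: synthetic stand-in for a collected and explored dataset.
X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Split: hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing and model selection: scaling followed by logistic regression.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Training and tuning: 5-fold cross-validation on the training data only.
search = GridSearchCV(pipeline, {"logisticregression__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation: score the tuned model once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from validation rows leaks into preprocessing.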

### Why must we do EDA prior to fitting the model on the data?

Exploratory Data Analysis (EDA) helps us understand the data before modeling: it reveals patterns, outliers, and missing values, and it shows which preprocessing steps the features need. Fixing these issues before fitting usually leads to better model performance.

### What is correlation?

Correlation gauges the linear relationship between two variables from -1 (perfect negative) to 1 (perfect positive). 0 indicates no linear relationship.

### What is negative correlation?

Negative correlation indicates that as one variable increases, the other tends to decrease (e.g., more study hours tend to go with fewer mistakes, which might show up as a correlation of about -0.6).
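As a small illustration, the sketch below computes the Pearson correlation coefficient for made-up study-hours and mistakes data using NumPy. The numbers are invented, so the exact coefficient has no meaning beyond its negative sign.

```python
import numpy as np

# Hypothetical data: more study hours, fewer mistakes.
study_hours = np.array([1, 2, 3, 4, 5, 6])
mistakes = np.array([9, 8, 8, 5, 4, 2])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
r = np.corrcoef(study_hours, mistakes)[0, 1]

print(f"r = {r:.2f}")  # negative: the variables move in opposite directions
```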

### How can you calculate correlation between variables in Python?

```python
import pandas as pd
data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
correlation = data.corr()  # Returns the pairwise correlation matrix (Pearson by default)
```

### What is causation? Define difference between correlation and causation using an example.

Causation means that a change in one variable directly produces a change in another. Correlation only shows that two variables move together; it does not prove that one causes the other.
Example: Sales of ice cream and drowning rates are correlated (both increase during summer), but ice cream does not cause drowning. Summer heat drives both.
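The ice cream example can be simulated. In this sketch, all data is randomly generated under the assumption that temperature drives both quantities and neither affects the other; the coefficients and noise levels are arbitrary. The raw correlation is strong, but it largely disappears once the linear effect of temperature is removed from both variables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: temperature influences both quantities independently.
temperature = rng.uniform(10, 35, size=n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, size=n)
drownings = 0.5 * temperature + rng.normal(0, 2, size=n)

raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Remove the linear effect of x from y using a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation that remains after accounting for temperature.
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw: {raw_corr:.2f}, after removing temperature: {partial_corr:.2f}")
```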

### What is an Optimizer? What are various types of optimizers? Define each using an example.

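An optimizer is an algorithm that updates the model's parameters to minimize the loss function. There are several types:
- **Gradient Descent (GD)**: Moves each parameter a small step in the direction opposite to the gradient of the loss. Stochastic Gradient Descent (SGD) does this using one sample or a mini-batch at a time (e.g., `optimizer='sgd'` in Keras).
- **Adam**: Uses an adaptive learning rate per parameter, combining momentum and RMSProp (e.g., `optimizer='adam'` in Keras).
- **RMSProp**: Scales the learning rate of each parameter by a running average of its recent squared gradients, which often speeds up convergence in deep learning (e.g., `optimizer='rmsprop'` in Keras).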
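To show what an optimizer does at its simplest, here is a sketch of plain (full-batch) gradient descent written from scratch with NumPy. It fits a line to toy data generated from y = 2x + 1; the learning rate and iteration count are arbitrary choices that happen to work for this data, not recommended defaults.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.05   # step size (assumed value for this toy problem)

for _ in range(2000):
    error = (w * x + b) - y            # prediction minus target
    grad_w = 2.0 * np.mean(error * x)  # derivative of MSE with respect to w
    grad_b = 2.0 * np.mean(error)      # derivative of MSE with respect to b
    w -= learning_rate * grad_w        # step against the gradient
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

Adam and RMSProp follow the same loop but replace the fixed `learning_rate` step with one adapted per parameter from the history of gradients.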

### What is sklearn.linear_model?

`sklearn.linear_model` is a Scikit-learn module for linear models such as Linear Regression, Logistic Regression, etc.

```python
from sklearn.linear_model import LinearRegression
model = LinearRegression()
```

### What does model.fit() do? What arguments must be given?

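`model.fit(X, y)` trains the model on input features `X` and target `y`. For supervised scikit-learn models, `X` is a 2D array-like of shape `(n_samples, n_features)` and `y` holds one target value per sample.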

```python
model.fit(X_train, y_train)
```

### What does model.predict() do? What arguments must be given?

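`model.predict(X)` returns predictions for input features `X`, using the parameters learned in `fit`. `X` must have the same number of features as the data the model was trained on.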

```python
predictions = model.predict(X_test)
```
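A self-contained sketch of `fit` followed by `predict`, using made-up data that follows y = 2x + 1 exactly. The variable name `X_new` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features must be 2D: one row per sample, one column per feature.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])  # y = 2x + 1

model = LinearRegression()
model.fit(X_train, y_train)  # learns the slope and intercept

X_new = np.array([[5.0], [6.0]])  # unseen inputs with the same single feature
predictions = model.predict(X_new)

print(model.coef_, model.intercept_, predictions)
```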

### What are continuous and categorical variables?

Continuous variables are numeric and can take any value within a range (e.g., age). Categorical variables take values from a fixed set of categories (e.g., gender).

### What is feature scaling? How does it help in Machine Learning?

Feature scaling brings features onto a comparable range (e.g., 0 to 1, or zero mean and unit variance) so that they contribute on a similar scale to the model. It helps gradient-based algorithms converge more quickly and keeps features with wider ranges from dominating the others, which matters especially for distance-based methods.
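The two most common scaling formulas can be written out by hand. This sketch uses NumPy and a made-up feature matrix with age and income columns on very different scales.

```python
import numpy as np

# Hypothetical features: age (years) and income (currency units).
X = np.array([[25.0, 30000.0],
              [35.0, 60000.0],
              [45.0, 90000.0]])

# Min-max scaling: (x - min) / (max - min), maps each column to [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization: (x - mean) / std, gives each column zero mean and unit variance.
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_standard)
```

After either transformation, income no longer outweighs age simply because its raw numbers are larger.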

### How do we scale in Python?

```python
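from sklearn.preprocessing import StandardScaler

# X is assumed to be a numeric feature matrix that is already defined.
scaler = StandardScaler()  # rescales each feature to zero mean and unit variance
X_scaled = scaler.fit_transform(X)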
```
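When the data is split into training and test sets, a common practice is to fit the scaler on the training rows only and reuse its statistics for the test rows, so that nothing about the test set leaks into preprocessing. A minimal sketch with a made-up feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 10 samples, 2 features.
X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows only
X_test_scaled = scaler.transform(X_test)        # apply the same mean and std to test rows

print(X_train_scaled.mean(axis=0), X_test_scaled.shape)
```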

### What is sklearn.preprocessing?

`sklearn.preprocessing` is a Scikit-learn module for preprocessing operations such as scaling, encoding, and normalizing.

### How do we split data for model fitting (training and testing) in Python?

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### Explain data encoding?

Data encoding transforms categorical variables to numeric form for ML models. Example: One-hot encoding converts "color" (red, blue) into binary columns: red=[1,0], blue=[0,1].