**Question 1:** Write a Python function to implement logistic regression using gradient descent.

**Answer:** Let $\phi(z)$ be the logistic function, defined as

$$
\phi(z) = \frac{1}{1 + e^{-z}}
$$

This function maps any real value $z$ into the open interval $(0, 1)$, so its output can be read as a probability. This is why it is widely used in binary classification problems.
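The definition can be checked numerically. The NumPy sketch below uses one common numerically stable formulation, which is an assumption and not part of the original answer. The naive `1 / (1 + np.exp(-z))` emits an overflow warning for large negative $z$, even though the result still rounds to 0. This version instead works with $e^{-|z|}$, which never exceeds 1. It also confirms the symmetry $\phi(-z) = 1 - \phi(z)$. The name `stable_sigmoid` is illustrative.

```python
import numpy as np

def stable_sigmoid(z):
    """Logistic function phi(z) = 1 / (1 + exp(-z)), computed without overflow.

    Uses e = exp(-|z|), which always lies in (0, 1]:
      z >= 0: phi(z) = 1 / (1 + e)
      z <  0: phi(z) = e / (1 + e)   (numerator and denominator multiplied by exp(z))
    """
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
p = stable_sigmoid(z)
print(p)
print(np.allclose(stable_sigmoid(-z), 1 - p))  # symmetry: phi(-z) = 1 - phi(z)
```

In floating point, the extreme inputs round to exactly 0.0 and 1.0, even though the mathematical range is open.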

### Gradient Descent

The logistic regression model aims to find the parameters $\theta$ that minimize a cost function $J(\theta)$. This cost measures the mismatch between the predictions $\phi(z)$ and the actual labels $y$. Gradient descent minimizes $J$ iteratively.

The parameter vector $\theta$ is updated iteratively using the formula

$$
\theta := \theta - \alpha \nabla J(\theta)
$$

where
- $\alpha$ is the learning rate, a scalar value that controls the step size in the gradient direction.
- $\nabla J(\theta)$ is the gradient of the cost function $J$ with respect to $\theta$. It points in the direction of steepest increase of the cost, so subtracting it moves $\theta$ downhill.

### Gradient of Logistic Regression

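The cost of logistic regression is the average negative log-likelihood, also called log loss or binary cross-entropy. Here $x_i^T$ denotes the $i$-th row of $X$:

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \phi(x_i^T\theta) + (1 - y_i) \log\left(1 - \phi(x_i^T\theta)\right) \right]
$$

Its gradient with respect to $\theta$, which drives the parameter updates, is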

$$
\nabla J(\theta) = \frac{1}{m} X^T (\phi(X\theta) - y)
$$

where
- $m$ is the number of examples in the dataset.
- $X$ is the feature matrix with dimensions $(m \times n)$, where $n$ is the number of features including the intercept term (a leading column of ones).
- $y$ is the vector of actual labels.
- $\phi(X\theta)$ is the vector of predicted probabilities for all examples, with $\phi$ applied element-wise.
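One way to confirm that this gradient formula matches the log-likelihood loss is to compare it with a finite-difference approximation. The sketch below is a hypothetical check: the helper names `cost`, `gradient`, and `numerical_gradient` are illustrative. It runs on a small random dataset whose first column of ones plays the role of the intercept.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Average negative log-likelihood J(theta) for labels y in {0, 1}."""
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(theta, X, y):
    """Analytic gradient (1/m) X^T (phi(X theta) - y)."""
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # intercept column + 2 features
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
theta = rng.normal(size=3)

# Maximum absolute difference between analytic and numerical gradients (should be tiny)
print(np.max(np.abs(gradient(theta, X, y) - numerical_gradient(theta, X, y))))
```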

### Parameter Update Steps

For each iteration, the steps are as follows:
1. Compute the linear scores $z = X\theta$ (the log-odds), i.e., the model's outputs before the logistic function is applied.
2. Obtain the predictions $\phi(z)$ by applying the logistic function to $z$:
   $$
   \phi(z) = \frac{1}{1 + e^{-z}}
   $$
3. Calculate the error as the difference between predictions and actual values:
   $$
   \text{error} = \phi(z) - y
   $$
4. Compute the gradient:
   $$
   \nabla J(\theta) = \frac{1}{m} X^T (\phi(z) - y)
   $$
5. Update the parameters $\theta$:
   $$
   \theta := \theta - \alpha \nabla J(\theta)
   $$

### Considerations about $\alpha$
The learning rate $\alpha$ must be chosen carefully. Large values may prevent convergence, while small values can slow down the process.

### Final Result
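After enough iterations with a suitable $\alpha$, $\theta$ approaches the values that minimize the cost. Because the logistic regression cost is convex, gradient descent does not get stuck in spurious local minima. If the classes are perfectly linearly separable, however, no finite minimizer exists and $\|\theta\|$ keeps growing as training continues.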
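Once $\theta$ has been fitted, class labels are usually obtained by thresholding the predicted probability. The following is a minimal sketch. It assumes the same parameter layout as the `logistic_regression` function in this answer, with `theta[0]` as the intercept, and uses a hypothetical, hand-picked $\theta$ rather than trained values. The names `predict_proba` and `predict` are illustrative.

```python
import numpy as np

def predict_proba(X, theta):
    """P(y = 1 | x) for each row of X; theta[0] is the intercept (assumed layout)."""
    X = np.asarray(X, dtype=float)
    z = theta[0] + X @ theta[1:]
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    """Hard 0/1 labels obtained by thresholding the predicted probability."""
    return (predict_proba(X, theta) >= threshold).astype(int)

theta = np.array([-1.0, 2.0])          # hypothetical parameters: z = -1 + 2x
X_new = np.array([[0.0], [0.5], [2.0]])
print(predict_proba(X_new, theta))     # z = -1, 0, 3
print(predict(X_new, theta))           # [0 1 1]
```

The 0.5 threshold is a common default. It can be moved when false positives and false negatives have different costs.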


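```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, learning_rate=0.01, iterations=1000):
    """Fit logistic regression with batch gradient descent.

    X: array of shape (m, n_features), *without* the intercept column (it is added here).
    y: array of shape (m,) with 0/1 labels.
    Returns theta of length n_features + 1; theta[0] is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape
    X = np.insert(X, 0, 1, axis=1)  # prepend a column of ones for the intercept term
    theta = np.zeros(n + 1)

    for _ in range(iterations):
        z = X.dot(theta)                    # step 1: linear scores
        predictions = sigmoid(z)            # step 2: predicted probabilities
        errors = predictions - y            # step 3: error
        gradient = X.T.dot(errors) / m      # step 4: gradient of the cost
        theta -= learning_rate * gradient   # step 5: parameter update

    return theta
```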

**Question 2:** What is the purpose of feature selection in machine learning?

**Answer:** Feature selection is the process of identifying and keeping the input variables (features) that contribute most to a model's predictive power, while discarding irrelevant or redundant ones. This can improve model performance, reduce overfitting, and shorten training time.
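One simple filter-style approach scores each feature independently against the target and keeps the top $k$. The sketch below ranks features by absolute Pearson correlation on synthetic data. The function name `rank_features_by_correlation` and the data are illustrative assumptions. Correlation only captures linear, one-feature-at-a-time relationships, so this is a starting point rather than a complete method.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices sorted by |Pearson correlation| with y (highest first),
    plus the absolute correlations. Constant columns would divide by zero."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    scores = np.abs(corr)
    return np.argsort(scores)[::-1], scores

# Synthetic data: only the middle column actually drives y
rng = np.random.default_rng(42)
n = 200
informative = rng.normal(size=n)
noise_1 = rng.normal(size=n)
noise_2 = rng.normal(size=n)
y = 3.0 * informative + 0.1 * rng.normal(size=n)
X = np.column_stack([noise_1, informative, noise_2])

order, scores = rank_features_by_correlation(X, y)
k = 1
selected = order[:k]  # indices of the k highest-scoring features
print(order, scores.round(3))
```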

**Question 3:** Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model.

**Answer:** To validate a multiple regression model, I would focus on these two methods:

1. **Train-Test Split:**
   - Split the data into training (e.g., 80%) and test (e.g., 20%) sets.
   - Train the model on the training data and evaluate its performance on the test data using metrics like $R^2$, RMSE, or MAE to assess predictive accuracy.

2. **Residual Analysis:**
   - Examine residual plots to check the assumptions of linearity, homoscedasticity (constant variance of the residuals), and independence of the errors.
   - Check for patterns or systematic deviations, which might indicate issues with the model's fit.

Together, these checks assess both out-of-sample predictive performance and whether the model's underlying assumptions hold.
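Both checks can be sketched with plain NumPy. The example below makes the following assumptions:

- It uses synthetic data.
- It fits ordinary least squares with `np.linalg.lstsq` on an 80/20 split.
- It reports $R^2$, RMSE, and MAE on the held-out test set.
- It replaces the residual plot with two numeric summaries.

In practice you would still plot residuals against fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 2 + 1.5*x1 - 0.5*x2 + noise
m = 500
X = rng.normal(size=(m, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=m)

# 80/20 train-test split on shuffled indices
idx = rng.permutation(m)
split = int(0.8 * m)
train, test = idx[:split], idx[split:]

# Fit ordinary least squares (with an intercept column) on the training set only
A_train = np.column_stack([np.ones(split), X[train]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Evaluate on the held-out test set
A_test = np.column_stack([np.ones(m - split), X[test]])
y_pred = A_test @ coef
residuals = y[test] - y_pred

rmse = np.sqrt(np.mean(residuals ** 2))
mae = np.mean(np.abs(residuals))
r2 = 1 - np.sum(residuals ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
print(f"R^2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")

# Crude numeric residual checks: residuals should be centred on zero and show
# no linear trend against the fitted values (a plot is the usual tool)
print("mean residual:", residuals.mean())
print("corr(fitted, residual):", np.corrcoef(y_pred, residuals)[0, 1])
```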

## Questions about Neural Networks

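**Question 1:** What are recurrent neural networks (RNNs)?

**Answer:** Recurrent neural networks (RNNs) are a class of neural networks designed for sequential data. They process a sequence one step at a time and maintain a hidden state that is updated from the current input and the previous hidden state. This lets information from earlier steps, including previous outputs in some architectures, influence later ones. The same weights are reused at every time step.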
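To make the hidden-state idea concrete, here is a minimal forward pass of a vanilla (Elman-style) RNN in NumPy. It assumes a $\tanh$ activation and randomly initialized weights, with no output layer and no training. The names `rnn_forward`, `W_xh`, `W_hh`, and `b_h` are illustrative.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence.

    inputs: array of shape (T, input_dim); returns hidden states of shape (T, hidden_dim).
    Update rule: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h), starting from h_0 = 0.
    """
    hidden_dim = W_hh.shape[0]
    h = np.zeros(hidden_dim)
    states = []
    for x_t in inputs:                        # the same weights are reused at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 4, 3, 5
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

sequence = rng.normal(size=(T, input_dim))
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(states.shape)  # (4, 5)
```

Because `h` is carried through the loop, changing the first input also changes the final hidden state.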

---

**Question 2:** What is the role of the activation function?

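**Answer:** The activation function introduces non-linearity into a neuron's output. A neuron first computes a weighted sum of its inputs and adds a bias. The activation function is then applied to this value to produce the neuron's output, which, for functions such as ReLU, also determines whether the neuron is effectively "active". Without non-linear activations, any stack of layers would reduce to a single linear transformation.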
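A quick way to see why non-linearity matters is to check that two linear layers without an activation collapse into one. The NumPy sketch below uses random weights and ReLU as the example activation; both are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# Two "layers" with weights and biases but no activation function
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping as ONE linear layer: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_linear_layer = W @ x + b
print(np.allclose(two_linear_layers, one_linear_layer))  # True

def relu(v):
    return np.maximum(v, 0.0)

# With ReLU between the layers, the output generally differs from the collapsed
# linear map (it does whenever some pre-activation W1 @ x + b1 is negative)
with_activation = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(with_activation, one_linear_layer))
```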

---

**Question 3:** What happens if the learning rate is set too high or too low?

**Answer:** If the learning rate is too low, training is very slow. Each iteration makes only a tiny update to the weights, so many updates are needed to reach a minimum.
If the learning rate is too high, the large weight updates can overshoot the minimum, causing the loss to oscillate or diverge so that training fails to converge.
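A one-dimensional toy problem shows all three regimes. The sketch below is a hypothetical illustration, not a neural network. It runs plain gradient descent on $f(x) = x^2$, whose gradient is $2x$, so each step multiplies $x$ by $1 - 2\alpha$. A tiny rate barely moves, a moderate one converges quickly, and a rate above 1 makes the iterates grow in magnitude. The function name `descend_parabola` is illustrative.

```python
def descend_parabola(learning_rate, x0=1.0, steps=50):
    """Minimise f(x) = x**2 (gradient 2x) with gradient descent; return the final x.

    Each update is x <- x - lr * 2x = (1 - 2*lr) * x, so the iterates shrink
    towards the minimum at 0 only when |1 - 2*lr| < 1, i.e. 0 < lr < 1.
    """
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2 * x
    return x

for lr in (0.001, 0.4, 1.1):
    print(f"lr={lr}: final x = {descend_parabola(lr):.3g}")
```

After 50 steps, the small rate leaves $x$ near 0.9, the moderate rate drives it essentially to 0, and the large rate pushes it to the thousands.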

---

## Questions about LLMs

**Question 1:** What is Tokenization?

**Answer:** Tokenization is the process of breaking text into smaller units, called **tokens**, for natural language processing (NLP). Tokens can be words, subwords, characters, or sentences. 

### Types:
1. **Word Tokenization:** Splits text into words (e.g., `"I love NLP!" → ["I", "love", "NLP", "!"]`).
2. **Subword Tokenization:** Breaks words into subunits (e.g., `"unbelievable" → ["un", "believ", "able"]`).
3. **Character Tokenization:** Splits text into characters.
4. **Sentence Tokenization:** Splits text into sentences.
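The subword example above can be reproduced with a greedy longest-match lookup, similar in spirit to how WordPiece tokenizes a word at inference time. Real subword tokenizers such as WordPiece or BPE learn their vocabularies from data. This sketch skips that step and uses a small hand-written vocabulary that is purely hypothetical. It also omits details such as WordPiece's `##` continuation markers. The name `greedy_subword_tokenize` is illustrative.

```python
def greedy_subword_tokenize(word, vocab):
    """Split `word` into the longest vocabulary pieces, scanning left to right.

    Simplified longest-match-first matching: no '##' continuation markers and
    no learned merges. Returns None if part of the word cannot be matched.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate from the right until it is in the vocabulary
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return None  # no vocabulary piece matches at this position
        tokens.append(word[start:end])
        start = end
    return tokens

vocab = {"un", "believ", "able", "a", "b", "e", "l"}   # hypothetical toy vocabulary
print(greedy_subword_tokenize("unbelievable", vocab))  # ['un', 'believ', 'able']
```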

### Importance:
- Prepares text for models.
- Handles multilingual data.
- Maps tokens to numerical representations.

### Applications:
Used in tasks like sentiment analysis, machine translation, and text classification.

```python
import nltk
nltk.download('punkt')  # recent NLTK releases (3.9+) load 'punkt_tab' instead; download it if you get a LookupError
from nltk.tokenize import word_tokenize

text = "I love NLP!"
tokens = word_tokenize(text)
print(tokens)  # ['I', 'love', 'NLP', '!']
```


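```python
from gensim.utils import simple_preprocess

text = "I love NLP!"
tokens = simple_preprocess(text)  # lowercases, strips punctuation, drops tokens shorter than min_len=2
print(tokens)  # ['love', 'nlp']  ("I" is dropped because it has only one character)
```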


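```python
# Legacy Keras 2.x API: keras.preprocessing.text is not available in Keras 3
from keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()  # lowercases and filters punctuation by default
tokenizer.fit_on_texts(["I love NLP!"])
print(tokenizer.word_index)  # {'i': 1, 'love': 2, 'nlp': 3}
```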


---

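**Question 2:** What is normalization?

**Answer:** Normalization is a preprocessing step that standardizes text before tokenization, for example by lowercasing, removing unnecessary whitespace, and handling special characters. This ensures that superficially different forms of the same text map to the same tokens.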
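A minimal normalization pipeline can be sketched with the standard library alone. The exact steps are a design choice and an assumption here. For instance, this version replaces all punctuation with spaces, which also splits contractions like "don't", and that may not suit every task. The name `normalize_text` is illustrative.

```python
import re
import unicodedata

def normalize_text(text):
    """One possible normalization pipeline; the choice of steps is a design decision."""
    text = unicodedata.normalize("NFKC", text)   # unify equivalent Unicode forms (e.g. full-width letters)
    text = text.lower()                          # lowercase
    text = re.sub(r"[^\w\s]", " ", text)         # replace punctuation/special characters with spaces
    text = re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace, trim the ends
    return text

print(normalize_text("  I   love\tNLP!!  "))  # i love nlp
```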

---

**Question 3:** What is the role of embeddings in LLMs?

**Answer:** Embeddings in LLMs map each token to a dense vector in a high-dimensional continuous space. These vectors are learned during training, so tokens used in similar ways tend to end up close together. The model's subsequent layers combine them with positional information to build context-dependent representations, which let the model understand and generate language.
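Mechanically, an embedding layer is a lookup table: each token ID selects one row of a matrix. The NumPy sketch below assumes a hypothetical four-token vocabulary and random (untrained) vectors. It therefore shows the shapes and the lookup, not meaningful similarities. Cosine similarity is included because it is a common way to compare embedding vectors. The names `vocab`, `embed`, and `cosine_similarity` are illustrative.

```python
import numpy as np

# Hypothetical toy vocabulary; real LLMs use vocabularies of many thousands of tokens
vocab = {"i": 0, "love": 1, "nlp": 2, "<unk>": 3}
embedding_dim = 8

# Embedding matrix: one row per token ID. Random here; in an LLM these values are learned.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

def embed(tokens):
    """Map tokens to IDs (unknown tokens -> <unk>), then look up their rows."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens]
    return embedding_matrix[ids]  # shape: (len(tokens), embedding_dim)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vectors = embed(["i", "love", "nlp"])
print(vectors.shape)  # (3, 8)
print(cosine_similarity(vectors[1], vectors[2]))  # not meaningful until the matrix is trained
```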
 
---