## <a id='toc1_1_'></a>[Gradient Descent Methods](#toc0_)

In this notebook, we explore several regression loss functions and use gradient descent to optimize them. We work with the Diabetes dataset bundled with scikit-learn, a standard regression benchmark. It contains ten baseline measurements (age, sex, body mass index, average blood pressure, and six blood serum measurements) for 442 diabetes patients. The target is a quantitative measure of disease progression one year after baseline. Predicting this target lets us practice regression analysis in a medical context.

## <a id='toc1_2_'></a>[Authors](#toc0_)
* **Alireza Arbabi**
* **Hadi Babalou**
* **Ali Padyav**
* **Kasra Hajiheidari**

## <a id='toc1_3_'></a>[Table of Contents](#toc0_)

- [Gradient Descent Methods](#toc1_1_)    
  - [Authors](#toc1_2_)    
  - [Table of Contents](#toc1_3_)    
  - [Setting Up the Environment](#toc1_4_)    
  - [Data Preparation](#toc1_5_)    
    - [Dataset Description](#toc1_5_1_)    
    - [Loading the Dataset](#toc1_5_2_)    
    - [Preprocessing](#toc1_5_3_)    
      - [Missing Values](#toc1_5_3_1_)    
      - [Duplicates](#toc1_5_3_2_)    
      - [Type Conversion](#toc1_5_3_3_)    
      - [Normalization](#toc1_5_3_4_)    
      - [Train-Test Split](#toc1_5_3_5_)    
  - [Loss Functions](#toc1_6_)    
    - [Mean Squared Error (MSE)](#toc1_6_1_)    
    - [Mean Absolute Error (MAE)](#toc1_6_2_)    
    - [Root Mean Squared Error (RMSE)](#toc1_6_3_)    
    - [R² Score (Coefficient of Determination)](#toc1_6_4_)    
    - [Ordinary Least Squares (OLS)](#toc1_6_5_)    
  - [Regression Model](#toc1_7_)    
    - [Model](#toc1_7_1_)    
    - [Training](#toc1_7_2_)    
  - [Evaluation](#toc1_8_)    
    - [Results Summary Table](#toc1_8_1_)    
  - [Questions](#toc1_9_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_4_'></a>[Setting Up the Environment](#toc0_)

In [None]:
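# Install the third-party packages used in this notebook.
# scikit-learn is published on PyPI as "scikit-learn"; the old "sklearn" package name is deprecated and fails to install.
!pip install numpy pandas seaborn matplotlib tqdm scipy scikit-learn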

In [None]:
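import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tqdm                                     # progress bars for training loops
from scipy import stats                         # statistical tests (e.g. p-values)
from sklearn.datasets import load_diabetes      # loader for the Diabetes dataset
from sklearn.impute import KNNImputer           # fills missing values from nearest neighbours
from sklearn.preprocessing import LabelEncoder  # encodes categorical columns as integers

# Silence library warnings (e.g. deprecation notices) to keep cell output readable.
warnings.filterwarnings("ignore")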

## <a id='toc1_5_'></a>[Data Preparation](#toc0_)

### <a id='toc1_5_1_'></a>[Dataset Description](#toc0_)

### <a id='toc1_5_2_'></a>[Loading the Dataset](#toc0_)
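A minimal sketch of loading the data, assuming a recent scikit-learn (the `as_frame` argument of `sklearn.datasets.load_diabetes` needs version 0.23 or later). With `as_frame=True` the loader returns pandas objects, and its `frame` attribute combines the ten features with a `target` column. The variable name `df` is an illustrative choice, not taken from the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the bundled Diabetes dataset; no download is needed.
# By default (scaled=True) each feature column is mean-centred and scaled so
# that its sum of squares is 1; pass scaled=False (scikit-learn >= 1.1) for raw units.
diabetes = load_diabetes(as_frame=True)

# `frame` holds the ten features (age, sex, bmi, bp, s1..s6) plus a "target"
# column: disease progression one year after baseline.
df: pd.DataFrame = diabetes.frame

print(df.shape)  # (442, 11)
print(df.head())
```

Features and target can then be separated with `X = df.drop(columns="target").to_numpy()` and `y = df["target"].to_numpy()`.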

### <a id='toc1_5_3_'></a>[Preprocessing](#toc0_)

#### <a id='toc1_5_3_1_'></a>[Missing Values](#toc0_)

#### <a id='toc1_5_3_2_'></a>[Duplicates](#toc0_)

#### <a id='toc1_5_3_3_'></a>[Type Conversion](#toc0_)

#### <a id='toc1_5_3_4_'></a>[Normalization](#toc0_)
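A minimal z-score standardization sketch, assuming NumPy arrays with one sample per row. The mean and standard deviation are computed on the training split only and reused for the test split, so no test information leaks into preprocessing. `standardize` is a hypothetical helper; scikit-learn's `StandardScaler` applies the same transform. The default `load_diabetes` features are already centred and scaled, so this step matters mainly with `scaled=False` or when rescaling features to unit variance to help gradient descent converge.

```python
import numpy as np

def standardize(X_train, X_test):
    """Z-score both splits using statistics of the training split only."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)   # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0         # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Toy values for illustration (not the Diabetes data).
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])
Xtr, Xte = standardize(X_train, X_test)
print(Xtr.mean(axis=0))  # approximately [0, 0]
print(Xtr.std(axis=0))   # approximately [1, 1]
```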

#### <a id='toc1_5_3_5_'></a>[Train-Test Split](#toc0_)
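A minimal NumPy sketch of a shuffled hold-out split. `train_test_split_np` is a hypothetical helper; `sklearn.model_selection.train_test_split` provides the same behaviour with more options. The fixed seed makes the split reproducible, and the 80/20 ratio is an assumption, not a value from the original notebook.

```python
import numpy as np

def train_test_split_np(X, y, test_size=0.2, seed=42):
    """Shuffle row indices once, take the first `n_test` of them as the test
    set and the rest as the training set, keeping X and y rows aligned."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i of X is [2i, 2i + 1] and y[i] = i.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
X_train, X_test, y_train, y_test = train_test_split_np(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```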

## <a id='toc1_6_'></a>[Loss Functions](#toc0_)

### <a id='toc1_6_1_'></a>[Mean Squared Error (MSE)](#toc0_)
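For $n$ samples with targets $y_i$ and predictions $\hat{y}_i$, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. For a linear model $\hat{y} = Xw + b$, the gradients that gradient descent needs are $\nabla_w \mathrm{MSE} = -\frac{2}{n}X^\top(y - \hat{y})$ and $\frac{\partial \mathrm{MSE}}{\partial b} = -\frac{2}{n}\sum_{i}(y_i - \hat{y}_i)$. A minimal NumPy sketch, assuming a linear model; the function names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared residual."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(r ** 2))

def mse_gradients(X, y, w, b):
    """Gradients of the MSE of y_hat = X @ w + b with respect to w and b."""
    X = np.asarray(X, dtype=float)
    residual = np.asarray(y, dtype=float) - (X @ w + b)  # y - y_hat
    n = len(residual)
    grad_w = -2.0 / n * (X.T @ residual)
    grad_b = -2.0 / n * residual.sum()
    return grad_w, float(grad_b)

print(mse([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # 0.375
```

Squaring makes large residuals dominate the loss, so MSE is sensitive to outliers.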

### <a id='toc1_6_2_'></a>[Mean Absolute Error (MAE)](#toc0_)
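$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$. Each residual contributes linearly, so MAE is less sensitive to outliers than MSE. It is not differentiable where a residual is exactly zero, so gradient descent uses a subgradient, e.g. $\nabla_w = -\frac{1}{n}X^\top \operatorname{sign}(y - \hat{y})$ for a linear model. A minimal NumPy sketch under that assumption; the function names are illustrative.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average absolute residual."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(r)))

def mae_subgradients(X, y, w, b):
    """A subgradient of the MAE of y_hat = X @ w + b. np.sign returns 0 for a
    zero residual, which is a valid choice where |r| is not differentiable."""
    X = np.asarray(X, dtype=float)
    s = np.sign(np.asarray(y, dtype=float) - (X @ w + b))
    n = len(s)
    return -(X.T @ s) / n, float(-s.sum() / n)

print(mae([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))  # 0.5
```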

### <a id='toc1_6_3_'></a>[Root Mean Squared Error (RMSE)](#toc0_)
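$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$, which is expressed in the same units as the target and is therefore easier to interpret. Because the square root is monotonic, minimizing RMSE and minimizing MSE lead to the same parameters. A minimal NumPy sketch; the function name is illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the target's units."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))

print(rmse([0, 0, 0, 0], [2, 2, 2, 2]))  # 2.0
```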

### <a id='toc1_6_4_'></a>[R² Score (Coefficient of Determination)](#toc0_)
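$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $\bar{y}$ is the mean target. It equals 1 for perfect predictions, 0 for always predicting $\bar{y}$, and is negative for a model worse than that baseline. On a fixed dataset the denominator is constant, so maximizing $R^2$ is equivalent to minimizing MSE. A minimal NumPy sketch; `r2_score_np` is an illustrative name chosen to avoid clashing with `sklearn.metrics.r2_score`.

```python
import numpy as np

def r2_score_np(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Undefined (division by zero) when y_true is constant."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y = [3, -0.5, 2, 7]
print(r2_score_np(y, [2.5, 0.0, 2, 8]))  # about 0.9486
print(r2_score_np(y, [2.875] * 4))       # 0.0 (always predicting the mean)
```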

### <a id='toc1_6_5_'></a>[Ordinary Least Squares (OLS)](#toc0_)
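Ordinary least squares fits a linear model by minimizing the residual sum of squares $\sum_i (y_i - \hat{y}_i)^2$ (the MSE without the $\frac{1}{n}$ factor). When $X^\top X$ is invertible it has the closed-form solution $w = (X^\top X)^{-1}X^\top y$, where $X$ includes a leading column of ones for the intercept. This sketch assumes the OLS column of the results table means this residual sum of squares; the closed-form fit also serves as a reference for what gradient descent should converge to. It uses `numpy.linalg.lstsq` rather than an explicit inverse, which is numerically safer. The helper names are illustrative.

```python
import numpy as np

def ols_loss(y_true, y_pred):
    """Residual sum of squares, the objective ordinary least squares minimizes."""
    r = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(r @ r)

def ols_fit(X, y):
    """Closed-form OLS fit with an intercept, returned as (weights, bias)."""
    X = np.asarray(X, dtype=float)
    X1 = np.column_stack([np.ones(len(X)), X])  # leading column of ones for the bias
    coef, *_ = np.linalg.lstsq(X1, np.asarray(y, dtype=float), rcond=None)
    return coef[1:], coef[0]

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2x + 1
w, b = ols_fit(X, y)
print(w, b)                     # approximately [2.] and 1.0
print(ols_loss(y, X @ w + b))   # approximately 0
```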

## <a id='toc1_7_'></a>[Regression Model](#toc0_)

### <a id='toc1_7_1_'></a>[Model](#toc0_)
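A minimal sketch of the model, assuming plain linear regression $\hat{y} = Xw + b$ with one weight per feature and a scalar intercept. `LinearRegressionModel` and its attributes are hypothetical names, not taken from the original notebook.

```python
import numpy as np

class LinearRegressionModel:
    """Hypothetical linear model y_hat = X @ w + b."""

    def __init__(self, n_features: int):
        self.w = np.zeros(n_features)  # one weight per feature, initialised to zero
        self.b = 0.0                   # scalar intercept

    def predict(self, X) -> np.ndarray:
        """Return one prediction per row of X."""
        return np.asarray(X, dtype=float) @ self.w + self.b

model = LinearRegressionModel(n_features=2)
model.w = np.array([1.0, -1.0])
model.b = 0.5
print(model.predict([[2.0, 1.0], [0.0, 3.0]]))  # predictions 1.5 and -2.5
```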

### <a id='toc1_7_2_'></a>[Training](#toc0_)
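A minimal sketch of full-batch gradient descent on the MSE of a linear model $\hat{y} = Xw + b$, assuming zero initialization, a fixed learning rate, and a fixed number of epochs (all illustrative choices). Each step moves $w$ and $b$ against the MSE gradient; training with MAE would instead use the sign-based subgradient in the two gradient lines. `gradient_descent` is a hypothetical helper, and the data here are synthetic with known parameters so convergence can be checked.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_epochs=1000):
    """Full-batch gradient descent on the MSE of y_hat = X @ w + b.
    Returns the weights, the bias, and the MSE recorded after each update."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    history = []
    for _ in range(n_epochs):
        residual = y - (X @ w + b)            # y - y_hat
        grad_w = -2.0 / n * (X.T @ residual)  # dMSE/dw
        grad_b = -2.0 / n * residual.sum()    # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(float(np.mean((y - (X @ w + b)) ** 2)))
    return w, b, history

# Synthetic, noise-free data (not the Diabetes data): y = X @ true_w + 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 3.0
w, b, history = gradient_descent(X, y, lr=0.1, n_epochs=500)
print(w, b, history[-1])
```

With the default `load_diabetes` scaling, each feature column has unit Euclidean norm (variance of about 1/442), so the weight gradients are far smaller than the bias gradient. Standardizing the features to unit variance first, or using a much larger learning rate and more epochs, is likely needed for the weights to converge in a reasonable number of steps.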

## <a id='toc1_8_'></a>[Evaluation](#toc0_)

### <a id='toc1_8_1_'></a>[Results Summary Table](#toc0_)

In [None]:
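# Table 1: each metric evaluated on the training and test sets (fill in after training).
results_summary_data = [
    ['', 'MSE', 'MAE', 'RMSE', 'R2 score', 'OLS'],
    ['Train Set', None, None, None, None, None],
    ['Test Set', None, None, None, None, None]
]
# Use the first row as the column header and only the remaining rows as data.
results_summary_df = pd.DataFrame(results_summary_data[1:], columns=results_summary_data[0])
results_summary_df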


## <a id='toc1_9_'></a>[Questions](#toc0_)

**Analyze and evaluate the values in Table 1 (the Results Summary Table).**

**Review the R² and Adjusted R² values obtained in part 4. Explain what these values indicate and what the implications of high or low values might be.  
Also, discuss the differences between these two metrics.**

**Review the p-values obtained in part 4 for each column of the data and explain what they indicate. Discuss what threshold (significance level) is appropriate for judging p-values and which columns currently meet it.**

**Based on the results obtained in part 4, assess and analyze the importance of each feature in the dataset for predicting an individual's diabetes progression.**