# Introduction

Today, you will reimplement a linear regression algorithm from scratch. Linear regression can be solved with the *normal equation* (a closed-form solution) or with gradient descent (which can be faster when the dataset has many features). Here are the gradient descent steps we will implement:

- 1. Start with some random parameter values: we call this *random initialization*. *Zero initialization* is also possible.
- 2. Calculate the output of the algorithm with these parameters: this is the equivalent of *forward propagation* in neural networks.
- 3. Calculate the error with these parameters: we compare the output of the algorithm with the ground truth (this is a supervised algorithm, so we need the ground truth).
- 4. Evaluate how to change each parameter to reduce the error, then update the parameters: this is the equivalent of *backward propagation* in neural networks, and gradient descent is the core of this step.
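
To make the four steps concrete, here is a minimal sketch on made-up data with a single feature. Everything in it is an assumption for illustration only: the toy arrays `x` and `y`, the mean squared error as the error measure, and the values of `learning_rate` and `n_iterations`. It is not the solution expected in the exercises, only one way the loop could look. The last lines compare the result with the normal equation mentioned above.

```python
import numpy as np

# Hypothetical toy data (not from the Ciqual dataset): y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Step 1: initialization (zero initialization here).
w, b = 0.0, 0.0
learning_rate = 0.05  # assumed value, picked for this toy data
n_iterations = 2000   # assumed value

for _ in range(n_iterations):
    # Step 2: compute the output with the current parameters.
    y_pred = w * x + b
    # Step 3: compare the output with the ground truth.
    error = y_pred - y
    # Step 4: gradients of the mean squared error, then parameter update.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Mean squared error with the final parameters.
mse = np.mean((w * x + b - y) ** 2)

# Closed-form check with the normal equation: solve (X^T X) theta = X^T y.
X = np.column_stack([x, np.ones_like(x)])
theta = np.linalg.solve(X.T @ X, X.T @ y)  # theta = [slope, intercept]
```

On this noise-free toy data, both approaches should recover a slope close to 2 and an intercept close to 1. With real data, the learning rate and the number of iterations usually need tuning.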

For these exercises, we'll use the [Ciqual dataset](https://ciqual.anses.fr/#), which describes the nutritional composition of foods. 🌽

# 01 - Data Exploration

Let's start by importing the libraries and loading a simplified version of the Ciqual dataset.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
data = pd.read_csv("../data/ciqual_small.csv")

We will use this dataset to estimate the amount of phosphorus in food from the amount of zinc.

Your first task is to familiarize yourself with the data. You can:

- 1. Display the first rows of the table with Pandas
- 2. Get some description of the features (columns)
- 3. Check that there are no missing values
- 4. Visualize the relation between the variables `Phosphorus (mg/100g)` and `Zinc (mg/100g)`.


In [3]:
# Your code here
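
If you are unsure where to start, the sketch below shows one possible way to approach the four tasks. It runs on a small hypothetical DataFrame called `toy`, whose values are made up; only the two column names come from the text above. The same calls should work on `data`, assuming it has these columns.

```python
import pandas as pd

# Hypothetical values for illustration only (not taken from the Ciqual dataset).
toy = pd.DataFrame({
    "Zinc (mg/100g)": [0.2, 0.5, 1.1, 2.4, 3.0],
    "Phosphorus (mg/100g)": [30.0, 55.0, 120.0, 210.0, 260.0],
})

# 1. First rows of the table.
first_rows = toy.head()

# 2. Summary statistics for each numeric column.
summary = toy.describe()

# 3. Number of missing values per column.
missing_per_column = toy.isna().sum()

# 4. Scatter plot of phosphorus against zinc (uses matplotlib under the hood).
ax = toy.plot.scatter(x="Zinc (mg/100g)", y="Phosphorus (mg/100g)")
```

In a notebook, putting `first_rows`, `summary`, or `missing_per_column` on the last line of a cell displays it. `sns.scatterplot` would be a reasonable alternative for the plot.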
