# Linear Regression

I decided to learn the main ideas behind
some of the most widely used machine learning models.

Starting simple, the first model I want to learn about is
**Linear Regression**, which is simpler than most other models.

## What is Linear Regression?

Linear regression is a statistical method used to model
the relationship between a dependent variable and one or
more independent variables by fitting a linear equation
to the observed data.
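
To make "fitting a linear equation" concrete, here is a minimal sketch, assuming a single independent variable and made-up toy numbers (they are not data from this post). It finds the intercept and slope of `y = intercept + slope * x` with ordinary least squares using NumPy's `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical toy data that follows y = 1 + 2x exactly (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones so the model can learn an intercept.
A = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: find the coefficients minimizing ||A @ coef - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With noisy data the recovered values would only approximate the true ones, which is the situation the rest of this post deals with.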

## Modeling a Linear Regressor

First things first: let's import the libraries we're going to use.
We also need data in order to train a data-driven model.

### Obtaining data

For data, I couldn't settle on an existing dataset, so I decided to
just generate some **synthetic data**, as this is only a learning
experiment. Also, *Synthetic Data Generation* is an area
that has caught my attention recently, so I'll probably write more
about it soon.

The cells below are responsible for generating a dataset
that can be modeled by a linear regressor. The class takes some
useful parameters, such as how many feature columns we want in the dataset,
in case you want to generate a dataset for multiple linear regression.
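
As a quick illustration of the "how many columns" parameter, here is a small sketch that calls scikit-learn's `make_regression` directly with three features. The specific values (5 samples, 3 features, `noise=1.0`, `random_state=0`) are arbitrary choices for the example, not settings used elsewhere in this post:

```python
from sklearn.datasets import make_regression

# Arbitrary illustrative values: 5 rows, 3 feature columns.
X, y = make_regression(n_samples=5, n_features=3, noise=1.0, random_state=0)

# X holds one column per feature; y is a 1-D target vector.
print(X.shape, y.shape)
```

Setting `n_features` to a value greater than 1 is what turns the problem into multiple linear regression.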

In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

In [5]:
from sklearn.datasets import make_regression

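class SyntheticDataGenerator:
    """Generate a synthetic regression dataset with scikit-learn's make_regression."""

    def __init__(self, n_samples=100, n_features=1, noise=0.0, random_state=None):
        self.n_samples = n_samples
        self.n_features = n_features
        self.noise = noise  # standard deviation of the Gaussian noise added to y
        self.random_state = random_state  # set an int for reproducible data
        self.X = None
        self.y = None

    def generate_data(self):
        """Populate self.X (n_samples x n_features) and self.y (n_samples,)."""
        self.X, self.y = make_regression(
            n_samples=self.n_samples,
            n_features=self.n_features,
            noise=self.noise,
            random_state=self.random_state
        )

    def save_data_to_file(self, file_path, append=False):
        """Write the features and the target to a CSV file (columns X0, X1, ..., y)."""
        if self.X is None or self.y is None:
            raise ValueError("Call generate_data() before saving.")
        # Stack the feature columns and the target into a single table.
        data = pd.DataFrame(
            np.hstack((self.X, self.y.reshape(-1, 1))),
            columns=[f"X{i}" for i in range(self.n_features)] + ['y']
        )
        mode = 'a' if append else 'w'
        header = not append  # only write the header row when creating the file
        data.to_csv(file_path, mode=mode, header=header, index=False)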


I organized the generation parameters in a dictionary so we can easily modify them if needed.

⚠️ Note that this assumes the `datasets` directory already exists.
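
If the directory is missing, saving the CSV will fail. A minimal sketch for creating it first, assuming the same relative `datasets` folder used in the file path of this post:

```python
from pathlib import Path

# Create the folder (and any missing parents); do nothing if it already exists.
dataset_dir = Path("datasets")
dataset_dir.mkdir(parents=True, exist_ok=True)
```

Because of `exist_ok=True`, this is safe to re-run with "run all cells".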

In [10]:
dataset1_parameters = {
    'n_samples': 10000,
    'n_features': 1,
    'noise': 21,
    'file_path': 'datasets/synthetic_data_1.csv',
}

In [11]:
data_generator = SyntheticDataGenerator(
    n_samples=dataset1_parameters['n_samples'], 
    n_features=dataset1_parameters['n_features'],
    noise=dataset1_parameters['noise'])

data_generator.generate_data()

## This line writes the data to a file.
## Leave it commented so you don't overwrite your data accidentally
## by running "run all cells"

# data_generator.save_data_to_file(dataset1_parameters['file_path'])

Now we can view some random samples from this synthetic dataset:

In [12]:
df = pd.read_csv(dataset1_parameters['file_path'])

df.sample(5)

| | X0 | y |
|---|---|---|
| 6661 | -1.452403 | -23.94111 |
| 8230 | -0.259056 | 2.54556 |
| 8494 | -0.953766 | -11.375511 |
| 1110 | -1.045699 | -55.604432 |
| 4056 | 1.341622 | 82.445879 |


Excellent. For now, we can focus on discussing how linear regression works.

## Training your first linear regressor
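
As a starting point, here is a hedged sketch of what training could look like. It is not necessarily the approach this post will end up using: it assumes scikit-learn's `LinearRegression`, regenerates data in memory with the same sizes as above instead of reading the CSV, and picks `random_state=0` and an 80/20 train/test split arbitrarily. Passing `coef=True` to `make_regression` also returns the true slope, so the estimate can be compared against it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Regenerate data like the dataset above; coef=True also returns the true slope.
X, y, true_coef = make_regression(
    n_samples=10000, n_features=1, noise=21, coef=True, random_state=0
)
true_coef = float(np.ravel(true_coef)[0])

# Hold out 20% of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit y = intercept + slope * X0 by ordinary least squares.
model = LinearRegression()
model.fit(X_train, y_train)

print("true slope:     ", true_coef)
print("estimated slope:", model.coef_[0])
print("intercept:      ", model.intercept_)
print("R^2 on test set:", model.score(X_test, y_test))
```

The estimated slope should land close to the true one, and the intercept close to zero, since `make_regression` uses a bias of 0 by default.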

# References

* [scribbr: Simple Linear Regression | An Easy Introduction & Examples](https://www.scribbr.com/statistics/simple-linear-regression/)

* [scribbr: Multiple Linear Regression | A Quick Guide (Examples)](https://www.scribbr.com/statistics/multiple-linear-regression/)

