- In **late September 2016**, **scikit-learn version 0.18** was released.  
  A small change was made:
   * The `train_test_split` function is now imported from `model_selection` instead of `cross_validation`.

- Old code: `from sklearn.cross_validation import train_test_split`

- New code: `from sklearn.model_selection import train_test_split`

<br>

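- **We'll use the new import**
  - In version 0.18 the old import still works but emits a **deprecation warning**; the `cross_validation` module was removed in version 0.20.
  - `GridSearchCV` (covered later) moved in the same release, from `sklearn.grid_search` to `sklearn.model_selection`.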
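
As a minimal sketch of the new import in use, the snippet below splits a small array into training and test sets. The toy data, the 25% test size, and the `random_state` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split  # scikit-learn >= 0.18

# Hypothetical toy data: 20 samples, one feature, target = 2 * feature.
X = np.arange(20).reshape(-1, 1)
y = 2 * X.ravel()

# Hold out 25% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 20 rows and `test_size=0.25`, 15 rows go to the training set and 5 to the test set.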




## Introduction to Linear Regression

This introduction provides a light theoretical background and historical context for the idea of **linear regression** before exploring its implementation using Python and the scikit-learn library. For readers interested in a deeper mathematical treatment, Chapters 2 and 3 of *An Introduction to Statistical Learning* offer a comprehensive explanation.

## Historical Background

The concept of regression dates back to the 1800s, when **Francis Galton** studied how traits carry over from parents to their children, specifically the relationship between fathers' heights and their sons' heights. Galton observed that while tall fathers tended to have tall sons, those sons' heights tended to be closer to the average height of the general population, rather than exactly matching their fathers' stature.

For example, NBA player **Shaquille O'Neal**, who is 7 feet 1 inch (about 2.16 meters) tall, has a son who is also tall at 6 feet 7 inches (about 2.01 meters), but not as tall as his father. Galton termed this phenomenon **"regression"**: an extreme characteristic in one generation tends to move closer to the mean in subsequent generations.

## Concept of Linear Regression

The basic idea behind linear regression is to model the relationship between a dependent variable and one or more independent variables by fitting a straight line to observed data points. The goal is to find a line that is as close as possible to all the data points, minimizing the differences between the actual data values and the predicted values made by the line.

A simple example involves only two data points: one at (x = 2, y = 4) and another at (x = 5, y = 10). With just these two points, a perfect line can be drawn through both. However, the real value of regression lies in applying this technique to larger datasets, where predictions can be made for new, unlabeled data based on learned patterns.
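
The two-point case can be worked by hand. The sketch below computes the slope and intercept of the line through (2, 4) and (5, 10) and uses it to predict a new value; the helper name `predict` and the query point x = 7 are illustrative choices, not part of the original example.

```python
# The two points from the example above.
x1, y1 = 2, 4
x2, y2 = 5, 10

slope = (y2 - y1) / (x2 - x1)   # rise over run
intercept = y1 - slope * x1     # solve y1 = slope * x1 + intercept

def predict(x):
    """Return the y value on the line for a given x."""
    return slope * x + intercept

print(slope, intercept)  # 2.0 0.0
print(predict(7))        # 14.0
```

With only two points the line passes through both exactly, so every residual is zero. With more points that is generally impossible, which is why a best-fit criterion is needed.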

## The Role in Supervised Learning

In supervised learning, a model is built from labeled data. In the context of regression, the model learns the relationship between the input features (such as a father's height) and the target variable (the son's height). Once trained, the model can then predict outcomes for new data points. The objective is to minimize the **vertical distance (errors or residuals)** between the predicted values (on the regression line) and the actual data points.
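
The fit-then-predict workflow might look like the following sketch using scikit-learn's `LinearRegression`. The height values are invented for illustration and are not real measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical heights in centimeters, invented for illustration only.
father_heights = np.array([[165], [170], [175], [180], [185]])  # features, shape (n, 1)
son_heights = np.array([168, 171, 175, 178, 182])               # labels (targets)

# Learn the line son = intercept + slope * father from the labeled data.
model = LinearRegression()
model.fit(father_heights, son_heights)

# Predict the target for a new, unlabeled input.
new_father = np.array([[190]])
predicted_son = model.predict(new_father)[0]

print(model.coef_[0], model.intercept_)
print(predicted_son)
```

For this made-up data the fitted slope is about 0.7 and the intercept about 52.3, so a 190 cm father maps to a predicted son's height of about 185.3 cm. A slope below 1 is the pattern Galton described: the prediction is pulled toward the average.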

## Minimizing Error: Least Squares Method

Various methods exist for measuring and minimizing these differences, including the **sum of squared errors** and **sum of absolute errors**. One of the most widely used techniques is the **least squares method**, which minimizes the sum of the squares of the residuals — the differences between actual observed values and the values predicted by the regression line.
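
In symbols (the notation here is an assumption, not taken from the original text): for $n$ observations $(x_i, y_i)$ and a candidate line $\hat{y}_i = \beta_0 + \beta_1 x_i$, the two error measures can be written as

$$
\text{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\text{SAE} = \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
$$

The least squares method chooses the intercept $\beta_0$ and slope $\beta_1$ that make SSE as small as possible.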

In a regression plot, data points are displayed along the x and y axes. The regression line is drawn through the data, and the residuals are the vertical distances from each data point to the line. These residuals are squared and summed, and the line that produces the smallest total is selected as the best fit.
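
The selection step can be sketched in plain Python. The example below computes the least squares slope and intercept from the standard closed-form formulas for a single feature, then compares the sum of squared residuals against an arbitrary alternative line. The function names and the data are hypothetical, chosen only for illustration.

```python
def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def fit_least_squares(xs, ys):
    """Closed-form least squares fit for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

best_slope, best_intercept = fit_least_squares(xs, ys)
best_sse = sum_squared_errors(xs, ys, best_slope, best_intercept)
other_sse = sum_squared_errors(xs, ys, 1.0, 1.0)  # an arbitrary competing line

print(best_slope, best_intercept)
print(best_sse, other_sse)
```

For these points the fitted line has a slope of about 0.6 and an intercept of about 2.2, giving a squared-error total of about 2.4, compared with 4.0 for the competing line y = x + 1.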

