# Introduction
This dataset gives the average mass of women as a function of their height, for a sample of American women aged 30–39.

<font color = 'red'>
    
Content:
1. [Import Libraries](#1)
2. [Import the Dataset](#2)
3. [Taking Care of Missing Data](#3)
4. [Visualization of the Dataset](#4)
5. [Split the Dataset into Training Set and Test Set](#5)
6. [Training the Simple Linear Regression Model on the Training Set](#6)
7. [Visualization of the Models](#7)
    * [Training Set Visualization](#8)
    * [Test Set Visualization](#9)

<a id = "1"></a><br>
# Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

<a id = "2"></a><br>
# Import the Dataset

In [None]:
df = pd.read_csv('/kaggle/input/heights-and-weights/data.csv')
x = df.iloc[:, 0] #heights
y = df.iloc[:, 1] #weights

In [None]:
df

There are 2 columns, height and weight, and 15 entries.  
Because the dataset is this small, a quick look at the table shows that it contains no null values.  
We will still confirm this with code in the Taking Care of Missing Data section.

In [None]:
df.describe()

Statistical results are shown above.  
Maximum height is 183 cm. Minimum height is 147 cm. Mean height is 165 cm.  
Maximum weight is 74.46 kg. Minimum weight is 52.21 kg. Mean weight is 62 kg.


<a id = "3"></a><br>
# Taking Care of Missing Data
### Check whether there are any null entries
`.notnull()` returns True or False for each entry.  
`.all()` checks whether all of them are True.

In [None]:
x.notnull().all()

In [None]:
y.notnull().all()
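
As a complementary check, `isnull().sum()` counts the missing entries per column in a single call. The sketch below uses a small made-up frame, so the `demo` name and its values are illustrative assumptions and not part of the dataset:

```python
import numpy as np
import pandas as pd

# Illustrative frame only; these values are not taken from the dataset.
demo = pd.DataFrame({
    "Height": [150.0, 160.0, np.nan, 170.0],
    "Weight": [53.0, 58.0, 62.0, 66.0],
})

# Number of missing entries in each column.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# True only if no column contains a missing value.
has_no_nulls = bool(demo.notnull().all().all())
print(has_no_nulls)
```

On the real data, the same two calls on `df` should report zero missing entries per column if the checks above return True.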

<a id = "4"></a><br>
# Visualization of the Dataset
With more entries, a histogram could reveal the shape of the distribution.  
Here each value occurs at most once, so the histograms are not very informative.

In [None]:
df.iloc[:, 0].plot(kind = 'hist', bins = 25, figsize = (6,6)) # first column holds the heights
plt.xlabel('Height')
plt.title('Frequency of Heights')
plt.show()

In [None]:
df.Weight.plot(kind = 'hist', bins = 25, figsize = (6,6))
plt.xlabel('Weight')
plt.title('Frequency of Weights')
plt.show()

<a id = "5"></a><br>
# Split the Dataset into Training Set and Test Set

In [None]:
from sklearn.model_selection import train_test_split
x = x.values.reshape(-1, 1) # 2D feature array; -1 infers the number of rows
y = y.values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)
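
With 15 rows and `test_size = 0.2`, the split should leave 12 rows for training and 3 for testing. A minimal sketch that checks this on placeholder arrays of the same length (`x_demo` and `y_demo` are hypothetical stand-ins, not the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same length as the dataset (15 rows);
# the values are arbitrary and only the shapes matter here.
x_demo = np.arange(15, dtype=float).reshape(-1, 1)
y_demo = np.arange(15, dtype=float)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=0
)

# Shapes of the training and test feature arrays.
print(x_tr.shape, x_te.shape)
```

A test set of only 3 points gives a very rough picture of model quality, so any conclusions drawn from it should be treated with caution.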

<a id = "6"></a><br>
# Training the Simple Linear Regression Model on the Training Set

In [None]:
from sklearn.linear_model import LinearRegression
linear_reg = LinearRegression()
linear_reg.fit(x_train, y_train)
y_pred = linear_reg.predict(x_test)
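
Once the model is fitted, its slope, intercept, and goodness of fit can be inspected numerically. The sketch below does this on synthetic, exactly linear data; the `heights` and `weights` arrays and the 0.5 / -20 coefficients are assumptions chosen for illustration and are not results from this dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic, exactly linear data (weight = 0.5 * height - 20); illustrative only.
heights = np.array([150.0, 155.0, 160.0, 165.0, 170.0, 175.0]).reshape(-1, 1)
weights = 0.5 * heights.ravel() - 20.0

model = LinearRegression().fit(heights, weights)

slope = model.coef_[0]          # change in weight per unit of height
intercept = model.intercept_    # predicted weight at height 0
r2 = model.score(heights, weights)  # coefficient of determination
mae = mean_absolute_error(weights, model.predict(heights))

print(slope, intercept, r2, mae)
```

In this notebook, the same attributes and calls would apply to `linear_reg`, with `x_test`, `y_test`, and `y_pred` in place of the synthetic arrays.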

<a id = "7"></a><br>
# Visualization of the Models

1. [Training Set Visualization](#8)
2. [Test Set Visualization](#9)

<a id = "8"></a><br>
## 1. Training Set Visualization

In [None]:
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, linear_reg.predict(x_train))
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height vs Weight')
plt.show()

<a id = "9"></a><br>
## 2. Test Set Visualization

In [None]:
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, linear_reg.predict(x_train)) # same fitted line as in the training plot
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height vs Weight')
plt.show()