# Linear models

In this chapter, you will learn how to build, solve, and make predictions with models in TensorFlow 2. You will focus on a simple class of models – the linear regression model – and will try to predict housing prices. By the end of the chapter, you will know how to load and manipulate data, construct loss functions, perform minimization, make predictions, and reduce resource use with batch training.

# (1) Input data

<img src="image/Screenshot 2021-01-23 180358.png">

## Importing data for use in TensorFlow

- **Data can be imported using TensorFlow**
    - Useful for managing complex pipelines
    - Not necessary for this chapter
- **Simpler option used in this chapter**
    - Import data using `pandas`
    - Convert data to `numpy`
    - Use in `tensorflow` without modification

How to import and convert data

```
# Import numpy and pandas
import numpy as np
import pandas as pd

# Load data from csv
housing = pd.read_csv('kc_housing.csv')

# Convert to numpy array
housing = np.array(housing)
```

- We will focus on data stored in CSV format in this chapter
- Pandas also has methods for handling data in other formats
    - E.g. `read_json`, `read_html`, `read_excel`
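As a small illustration of one of these readers, the sketch below parses an in-memory JSON string with `read_json`. The data is made up, and wrapping the text in `io.StringIO` is an assumption about how you would supply a string instead of a file path; recent pandas versions deprecate passing a literal JSON string directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical records standing in for a JSON file; the column names mirror
# the housing data, but the values are illustrative only.
json_text = '[{"price": 221900, "bedrooms": 3}, {"price": 538000, "bedrooms": 3}]'

# read_json accepts a path, URL, or file-like object; StringIO wraps the string.
housing_json = pd.read_json(StringIO(json_text), orient="records")
print(housing_json)
```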

## Parameters of `read_csv()`

| **Parameter** | **Description** | **Default** |
| :- | :- | :- |
| `filepath_or_buffer` | File path, URL, or file-like object to read. | None (required) |
| `sep` | Delimiter between columns. | `,` |
| `delim_whitespace` | Whether to treat whitespace as the delimiter. | `False` |
| `encoding` | Text encoding used to decode the file, e.g. `'utf-8'`. | `None` (treated as `'utf-8'`) |
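The parameters in this table can be tried without a file on disk. This minimal sketch assumes hypothetical semicolon-separated bytes encoded as Latin-1 and feeds them through `io.BytesIO`. It also uses `sep=r"\s+"`, which the pandas documentation gives as the replacement for `delim_whitespace=True` (deprecated since pandas 2.2).

```python
from io import BytesIO

import pandas as pd

# Hypothetical semicolon-delimited data containing a non-ASCII character,
# encoded as Latin-1 bytes to mimic a file saved with that encoding.
raw = "street;price\nCafé Way;221900\nMain St;538000\n".encode("latin-1")

# sep overrides the default comma; encoding tells pandas how to decode the bytes.
df = pd.read_csv(BytesIO(raw), sep=";", encoding="latin-1")
print(df)

# Whitespace-separated data (any run of spaces counts as one delimiter).
ws = pd.read_csv(BytesIO(b"price bedrooms\n221900  3\n538000 3\n"), sep=r"\s+")
print(ws)
```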

## Using mixed-type datasets

<img src='image/Screenshot 2021-01-23 181841.png'>
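A quick way to see why mixed-type data needs care: converting a whole DataFrame with `np.array` forces a single common dtype. The sketch below uses a hypothetical three-column frame; with a string column present the result is an `object` array, while selecting only the numeric columns first keeps a numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mixing a string date, an integer price, and a 0/1 flag.
housing = pd.DataFrame({
    "date": ["20141013T000000", "20141209T000000"],
    "price": [221900, 538000],
    "waterfront": [0, 1],
})

# Converting the whole frame forces one common dtype for all columns.
mixed = np.array(housing)
print(mixed.dtype)  # object

# Selecting the numeric columns first lets us request a numeric dtype.
numeric = np.array(housing[["price", "waterfront"]], np.float32)
print(numeric.dtype)  # float32
```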

## Setting the data type

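```
# Import numpy and pandas
import numpy as np
import pandas as pd

# Load KC dataset
housing = pd.read_csv('kc_housing.csv')

# Convert price column to float32
price = np.array(housing['price'], np.float32)

# Convert waterfront column to Boolean
# (np.bool_ works across NumPy versions, unlike the old np.bool alias)
waterfront = np.array(housing['waterfront'], np.bool_)
```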
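One practical reason to set the dtype explicitly is memory: `float32` uses half the bytes of the `float64` that pandas uses by default for floating-point columns. This self-contained sketch swaps the CSV for a small hypothetical DataFrame so the resulting dtypes and sizes can be checked directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for two columns of the housing data.
housing = pd.DataFrame({"price": [221900.0, 538000.0, 180000.0],
                        "waterfront": [0, 1, 0]})

price64 = np.array(housing["price"])                    # default float64
price32 = np.array(housing["price"], np.float32)        # explicit 32-bit float
waterfront = np.array(housing["waterfront"], np.bool_)  # 0 -> False, nonzero -> True

print(price64.dtype, price64.nbytes)  # float64 24
print(price32.dtype, price32.nbytes)  # float32 12
print(waterfront)
```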

### Alternatively, with `tf.cast()`

```
# Import pandas and tensorflow
import pandas as pd
import tensorflow as tf

# Load KC dataset
housing = pd.read_csv('kc_housing.csv')

# Convert price column to float32
price = tf.cast(housing['price'], tf.float32)

# Convert waterfront column to Boolean
waterfront = tf.cast(housing['waterfront'], tf.bool)
```
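`tf.cast()` converts its input to a tensor and then changes the dtype, so it helps to know how values map. In this sketch, which uses made-up values, nonzero integers become `True` when cast to `tf.bool`, and casting floats to an integer type drops the fractional part.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical columns standing in for the housing data.
housing = pd.DataFrame({"price": [221900.7, 538000.2], "waterfront": [0, 3]})

# Nonzero values become True when cast to tf.bool.
waterfront = tf.cast(housing["waterfront"], tf.bool)

# Casting float to int drops the fractional part.
price_int = tf.cast(housing["price"], tf.int32)

print(waterfront.numpy())
print(price_int.numpy())
```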

# Exercise I: Load data using pandas

Before you can train a machine learning model, you must first import data. There are several valid ways to do this, but for now, we will use a simple one-liner from `pandas`: `pd.read_csv()`. Recall from the video that the first argument specifies the path or URL. All other arguments are optional.

In this exercise, you will import the King County housing dataset, which we will use to train a linear model later in the chapter.

### Instructions

- Import `pandas` under the alias `pd`.
- Assign the path to a string variable with the name `data_path`.
- Load the dataset as a pandas dataframe named `housing`.
- Print the `price` column of `housing`.


```
# Import pandas under the alias pd
import pandas as pd

# Assign the path to a string variable named data_path
data_path = 'kc_house_data.csv'

# Load the dataset as a dataframe named housing
housing = pd.read_csv(data_path)

# Print the price column of housing
print(housing['price'])
```

# Exercise II: Setting the data type

In this exercise, you will both load data and set its type. Note that `housing` is available and `pandas` has been imported as `pd`. You will import `numpy` and `tensorflow`, and define tensors that are usable in `tensorflow` using columns in `housing` with a given data type. Recall that you can select the `price` column, for instance, from `housing` using `housing['price']`.

### Instructions

- Import `numpy` and `tensorflow` under their standard aliases.
- Use a `numpy` array to give `price` a 32-bit floating-point data type.
- Use the `tensorflow` function `cast()` to set the tensor `waterfront` to have a Boolean data type.
- Print `price` and then `waterfront`. Did you notice any important differences?

```
# Import numpy and tensorflow with their standard aliases
import numpy as np
import tensorflow as tf

# Use a numpy array to define price as a 32-bit float
price = np.array(housing['price'], np.float32)

# Define waterfront as a Boolean using cast
waterfront = tf.cast(housing['waterfront'], tf.bool)

# Print price and waterfront
print(price)
print(waterfront)
```
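One difference worth noticing: `price` is a NumPy `ndarray`, while `waterfront` is a TensorFlow tensor, so printing the tensor also shows its shape and dtype. The sketch below uses a hypothetical two-row stand-in for `housing`, checks both types, and converts in each direction.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical two-row stand-in for the housing DataFrame.
housing = pd.DataFrame({"price": [221900.0, 538000.0], "waterfront": [0, 1]})

price = np.array(housing["price"], np.float32)
waterfront = tf.cast(housing["waterfront"], tf.bool)

# The two objects have different types, which is why they print differently.
print(type(price).__name__)               # ndarray
print(isinstance(waterfront, tf.Tensor))  # True

# Converting between them in eager mode:
price_tensor = tf.constant(price)       # NumPy -> Tensor, keeps float32
waterfront_array = waterfront.numpy()   # Tensor -> NumPy bool array
```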