#### Copyright 2018 Google LLC.

In [2]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Intro to Modeling


**Learning Objectives:**
* Become familiar with pandas for handling small datasets
* Use the tf.Estimator and Feature Column API to experiment with feature transformations
* Use visualizations and run experiments to understand the value of feature transformations

Please **make a copy** of this Colab notebook before starting this lab. To do so, choose **File**->**Save a copy in Drive**.

## Setup

Let's start by importing our dependencies.

In [1]:
%reset -f
import numpy as np
import pandas as pd
import math

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

Instructions for updating:
non-resource variables are not supported in the long term


## Pandas, a helpful data analysis library for in-memory datasets

We use a package called [Pandas](http://pandas.pydata.org/) for reading in our data, exploring our data and doing some basic processing. It is really helpful for datasets that fit in memory! And it has some nice integrations, as you will see.

First we set up some options to control how items are displayed and the maximum number of rows to show when displaying a table.  Feel free to change this setup to whatever you'd like.

In [2]:
# Set pandas output display to show two decimal places and limit it to
# printing 15 rows.
pd.options.display.float_format = '{:.2f}'.format
pd.options.display.max_rows = 15

### Load the dataset with pandas
The car data set we will be using in this lab is provided as a comma separated file without a header row.  In order for each column to have a meaningful header name we must provide it.  We get the information about the columns from the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile).

We will use the features of each car to try to predict its price.


In [3]:
# Provide the names for the columns since the CSV file with the data does
# not have a header row.
feature_names = ['symboling', 'normalized-losses', 'make', 'fuel-type',
        'aspiration', 'num-doors', 'body-style', 'drive-wheels',
        'engine-location', 'wheel-base', 'length', 'width', 'height', 'weight',
        'engine-type', 'num-cylinders', 'engine-size', 'fuel-system', 'bore',
        'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
        'highway-mpg', 'price']


# Load in the data from a CSV file that is comma separated.
car_data = pd.read_csv('https://storage.googleapis.com/mledu-datasets/cars_data.csv',
                        sep=',', names=feature_names, header=None, encoding='latin-1')


# We'll then randomize the data, just to be sure not to get any pathological
# ordering effects that might harm the performance of Stochastic Gradient
# Descent.
car_data = car_data.reindex(np.random.permutation(car_data.index))

print("Data set loaded. Num examples: ", len(car_data))

Data set loaded. Num examples:  205


This is a really small dataset! Only 205 examples.

For simplicity in this codelab, we do not split the data further into training and validation. But you MUST do this on real datasets, or else you will overfit to your single dataset.
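As a rough illustration of the split that real projects need, the sketch below holds out 20% of a toy dataframe for validation. The dataframe, its values, the 80/20 ratio and the random seed are assumptions made for this example; none of them come from this lab.

```python
import numpy as np
import pandas as pd

# Toy stand-in for car_data; the column values are made up.
toy = pd.DataFrame({
    'engine-size': np.arange(100, 120),
    'price': np.arange(5000, 5020),
})

# Shuffle once with a fixed seed, then hold out the last 20% for validation.
shuffled = toy.sample(frac=1.0, random_state=0)
num_train = int(0.8 * len(shuffled))
train_df = shuffled.iloc[:num_train]
validation_df = shuffled.iloc[num_train:]
```

Training would then read only `train_df`, while evaluation would read only `validation_df`.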

## Task 0: Use pandas to explore and prepare the data

- Use Pandas to inspect the data and manually curate a list of numeric_feature_names and categorical_feature_names.


Useful functions:
- `type()` called on any Python object describes the type of the object
- `dataframe[4:7]` pulls out rows 4, 5, 6 in a Pandas dataframe
- `dataframe[['mycol1', 'mycol2']]` pulls out the two requested columns into a new Pandas dataframe
- `dataframe['mycol1']` returns a Pandas series -- not a dataframe!
- `dataframe.describe()` returns summary statistics for each numeric dataframe column
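The functions listed above can be tried on a small dataframe first. This is a minimal sketch; the `toy` dataframe and its values are made up for illustration.

```python
import pandas as pd

# Made-up rows, only loosely resembling the car data.
toy = pd.DataFrame({
    'make': ['audi', 'bmw', 'honda', 'mazda', 'saab', 'volvo', 'dodge', 'isuzu'],
    'horsepower': [102, 101, 76, 68, 110, 114, 68, 78],
    'price': [13950, 16430, 6855, 6695, 11850, 12940, 5572, 6785],
})

rows = toy[4:7]                        # rows at positions 4, 5, 6
two_columns = toy[['make', 'price']]   # list of names -> DataFrame
one_column = toy['price']              # single name -> Series
summary = toy.describe()               # statistics for the numeric columns

print(type(two_columns).__name__, type(one_column).__name__)
```

Note that the slice is positional: after `car_data` has been shuffled, `car_data[4:7]` returns the rows currently in positions 4 to 6, whatever their index labels are.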

In [6]:
car_data[4:7]

Unnamed: 0,symboling,normalized-losses,make,fuel-type,aspiration,num-doors,body-style,drive-wheels,engine-location,wheel-base,...,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price
31,2,137,honda,gas,std,two,hatchback,fwd,front,86.6,...,92,1bbl,2.91,3.41,9.2,76,6000,31,38,6855
53,1,113,mazda,gas,std,four,sedan,fwd,front,93.1,...,91,2bbl,3.03,3.15,9.0,68,5000,31,38,6695
101,0,128,nissan,gas,std,four,sedan,fwd,front,100.4,...,181,mpfi,3.43,3.27,9.0,152,5200,17,22,13499


In [6]:
#@title Solution (to view code, from cell's menu, select Form -> Show Code)
numeric_feature_names = ['symboling', 'normalized-losses', 'wheel-base',
        'length', 'width', 'height', 'weight', 'engine-size', 'horsepower',
        'peak-rpm', 'city-mpg', 'highway-mpg', 'bore', 'stroke',
         'compression-ratio']

LABEL = 'price'

categorical_feature_names = list(set(feature_names) - set(numeric_feature_names) - set([LABEL]))

assert len(numeric_feature_names) == 15
assert len(categorical_feature_names) == 10

In [9]:
# Run to inspect numeric features.
car_data[numeric_feature_names]

Unnamed: 0,symboling,normalized-losses,wheel-base,length,width,height,weight,engine-size,horsepower,peak-rpm,city-mpg,highway-mpg,bore,stroke,compression-ratio
57,3,150,95.30,169.00,65.70,49.60,2385,70,101,6000,17,23,?,?,9.40
47,0,145,113.00,199.60,69.60,52.80,4066,258,176,4750,15,19,3.63,4.17,8.10
22,1,118,93.70,157.30,63.80,50.80,1876,90,68,5500,31,38,2.97,3.23,9.40
144,0,102,97.00,172.00,65.40,54.30,2385,108,82,4800,24,25,3.62,2.64,9.00
31,2,137,86.60,144.60,63.90,50.80,1819,92,76,6000,31,38,2.91,3.41,9.20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
87,1,125,96.30,172.40,65.40,51.60,2403,110,116,5500,23,30,3.17,3.46,7.50
153,0,77,95.70,169.70,63.60,59.10,2280,92,62,4800,31,37,3.05,3.03,9.00
168,2,134,98.40,176.20,65.60,52.00,2536,146,116,4800,24,30,3.62,3.50,9.30
64,0,115,98.80,177.80,66.50,55.50,2425,122,84,4800,26,32,3.39,3.39,8.60


In [10]:
# Run to inspect categorical features.
car_data[categorical_feature_names]

Unnamed: 0,engine-location,num-cylinders,fuel-system,body-style,drive-wheels,aspiration,make,engine-type,fuel-type,num-doors
57,front,two,4bbl,hatchback,rwd,std,mazda,rotor,gas,two
47,front,six,mpfi,sedan,rwd,std,jaguar,dohc,gas,four
22,front,four,2bbl,hatchback,fwd,std,dodge,ohc,gas,two
144,front,four,2bbl,sedan,4wd,std,subaru,ohcf,gas,four
31,front,four,1bbl,hatchback,fwd,std,honda,ohc,gas,two
...,...,...,...,...,...,...,...,...,...,...
87,front,four,spdi,sedan,fwd,turbo,mitsubishi,ohc,gas,four
153,front,four,2bbl,wagon,fwd,std,toyota,ohc,gas,four
168,front,four,mpfi,hardtop,rwd,std,toyota,ohc,gas,two
64,front,four,2bbl,hatchback,fwd,std,mazda,ohc,gas,four


In [7]:
# Coerce the numeric features to numbers. This is necessary because some values
# are not numeric (for example '?'), and the model crashes on such values.
for feature_name in numeric_feature_names + [LABEL]:
  car_data[feature_name] = pd.to_numeric(car_data[feature_name], errors='coerce')

# Fill missing values with 0.
# Is this an OK thing to do? You may want to come back and revisit this decision later.
car_data.fillna(0, inplace=True)
car_data[4:7]

for feature_name in numeric_feature_names:
    print(feature_name, car_data[feature_name].dtype)


symboling int64
normalized-losses float64
wheel-base float64
length float64
width float64
height float64
weight int64
engine-size int64
horsepower float64
peak-rpm float64
city-mpg int64
highway-mpg int64
bore float64
stroke float64
compression-ratio float64
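To see why filling missing values with 0 deserves a second look, here is a small sketch on made-up values. It coerces a `'?'` entry to `NaN` and compares zero fill with median fill. Median fill is only one possible alternative and is an assumption of this example, not something the lab prescribes.

```python
import pandas as pd

# Made-up column with one unparseable entry, similar to the '?' values in the data.
raw = pd.Series(['3.0', '?', '2.0', '4.0'])

# errors='coerce' turns anything that is not a number into NaN.
coerced = pd.to_numeric(raw, errors='coerce')

zero_filled = coerced.fillna(0)                   # pulls the column mean down
median_filled = coerced.fillna(coerced.median())  # keeps the fill value in range

print(zero_filled.mean(), median_filled.mean())
```

A fill value of 0 lies far outside the observed range of a feature such as `bore`, so the model sees those rows as extreme outliers.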


## Task 1: Make your best model with numeric features. No normalization allowed.

Modify the model provided below to achieve the lowest eval loss. You may want to change various hyperparameters:
- learning rate
- choice of optimizer
- hidden layer dimensions -- make sure your choice here makes sense given the number of training examples
- batch size
- num training steps
- (anything else you can think of changing)

Do not use the `normalizer_fn` arg on `numeric_column`.
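One way to judge whether the hidden layer dimensions make sense for 205 examples is to count trainable parameters. The helper below is a hypothetical sketch that assumes plain fully connected layers with biases and a single output; it is not part of the Estimator API.

```python
def count_dnn_parameters(num_inputs, hidden_units, num_outputs=1):
    """Counts weights and biases in a fully connected network."""
    total = 0
    previous = num_inputs
    for width in list(hidden_units) + [num_outputs]:
        total += previous * width + width  # weight matrix plus bias vector
        previous = width
    return total


num_examples = 205
params_small = count_dnn_parameters(15, [64])
params_wide = count_dnn_parameters(15, [1024, 512])

print(params_small, params_wide, num_examples)
```

Even the single 64-unit layer has several times more parameters than there are examples, which is one reason to read eval loss on the training data with caution.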

In [8]:
# Defining linear features
linear_features = []

# tolerance to select linear features
tol = 0.35

for feature_name in numeric_feature_names:
  correlation = car_data[feature_name].corr(car_data['price'])
  print("Correlation of %s with price is %f." % (feature_name, correlation))

  if 1.0 - abs(correlation) < tol:
    linear_features.append(feature_name)

print("Linear features: ")
print(linear_features)

non_linear_features = list(set(numeric_feature_names) - set(linear_features) - set([LABEL]))
print("Non linear features: ")
print(non_linear_features)


Correlation of symboling with price is -0.071461.
Correlation of normalized-losses with price is -0.237939.
Correlation of wheel-base with price is 0.578804.
Correlation of length with price is 0.685019.
Correlation of width with price is 0.695654.
Correlation of height with price is 0.158436.
Correlation of weight with price is 0.799773.
Correlation of engine-size with price is 0.838097.
Correlation of horsepower with price is 0.691288.
Correlation of peak-rpm with price is -0.055278.
Correlation of city-mpg with price is -0.660026.
Correlation of highway-mpg with price is -0.687675.
Correlation of bore with price is 0.264096.
Correlation of stroke with price is 0.048860.
Correlation of compression-ratio with price is 0.077959.
Linear features: 
['length', 'width', 'weight', 'engine-size', 'horsepower', 'city-mpg', 'highway-mpg']
Non linear features: 
['stroke', 'peak-rpm', 'height', 'normalized-losses', 'bore', 'wheel-base', 'symboling', 'compression-ratio']


In [9]:
# This code "works", but because of bad hyperparameter choices it gets NaN loss
# during training. Try fixing this.
batch_size = 128

print(numeric_feature_names)
x_df = car_data[numeric_feature_names]
y_series = car_data['price']

# Create input_fn's so that the estimator knows how to read in your data.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

# Feature columns allow the model to parse the data, perform common
# preprocessing, and automatically generate an input layer for the tf.Estimator.
linear_feature_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in linear_features
]

non_linear_feature_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in non_linear_features
]

est = tf.estimator.DNNLinearCombinedRegressor(
    linear_feature_columns=linear_feature_columns,
    linear_optimizer = tf.compat.v1.train.FtrlOptimizer(learning_rate=0.1),

    dnn_feature_columns=non_linear_feature_columns,
    #dnn_hidden_units=[len(non_linear_features)],
    dnn_hidden_units=[32, 32],
    #dnn_optimizer=tf.train.GradientDescentOptimizer(learning_rate=1e-5),
    #dnn_optimizer=tf.train.AdagradOptimizer(learning_rate=0.1)
    #dnn_optimizer=tf.train.AdamOptimizer(learning_rate=0.001)
    dnn_optimizer=tf.train.RMSPropOptimizer(learning_rate=0.001)
)

# TRAIN
num_print_statements = 10
num_training_steps = 10000


for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps)
  scores = est.evaluate(eval_input_fn)

  # The `scores` dictionary has several metrics automatically generated by the
  # canned Estimator.
  # `average_loss` is the average loss for an individual example.
  # `loss` is the summed loss for the batch.
  # In addition to these scalar losses, you may find the visualization functions
  # in the next cell helpful for debugging model quality.
  print('scores', scores)



['symboling', 'normalized-losses', 'wheel-base', 'length', 'width', 'height', 'weight', 'engine-size', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'bore', 'stroke', 'compression-ratio']



Instructions for updating:
Use tf.keras instead.
Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor

scores {'average_loss': 24519178.0, 'label/mean': 12949.43, 'loss': 2513215700.0, 'prediction/mean': 13045.236, 'global_step': 10000}


Instructions for updating:
Use standard file utilities to get mtimes.


KeyboardInterrupt: 

In [14]:
#@title Possible solution
# Here is one possible solution:
# The only necessary change to fix the NaN training loss was the choice of optimizer.

# Changing other parameters could improve model quality, but take it with a
# grain of salt. The dataset is very small.

batch_size = 16

print(numeric_feature_names)
x_df = car_data[numeric_feature_names]
y_series = car_data['price']

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

# Feature columns allow the model to parse the data, perform common
# preprocessing, and automatically generate an input layer for the tf.Estimator.
model_feature_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in numeric_feature_names
]
print('model_feature_columns', model_feature_columns)

est = tf.estimator.DNNRegressor(
    feature_columns=model_feature_columns,
    hidden_units=[64],
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.01),
  )

# TRAIN
num_print_statements = 10
num_training_steps = 10000
for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps // num_print_statements)
  scores = est.evaluate(eval_input_fn)

  # The `scores` dictionary has several metrics automatically generated by the
  # canned Estimator.
  # `average_loss` is the average loss for an individual example.
  # `loss` is the summed loss for the batch.
  # In addition to these scalar losses, you may find the visualization functions
  # in the next cell helpful for debugging model quality.
  print('scores', scores)



Instructions for updating:
Use tf.keras instead.


['symboling', 'normalized-losses', 'wheel-base', 'length', 'width', 'height', 'weight', 'engine-size', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'bore', 'stroke', 'compression-ratio']
model_feature_columns [NumericColumn(key='symboling', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='normalized-losses', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='wheel-base', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='height', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='weight', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='engine-size', shape=(1,), default_value=None, dtype=t

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


scores {'average_loss': 41030036.0, 'label/mean': 12949.43, 'loss': 647012100.0, 'prediction/mean': 13340.2, 'global_step': 1000}
scores {'average_loss': 32519006.0, 'label/mean': 12949.43, 'loss': 512799700.0, 'prediction/mean': 13095.935, 'global_step': 2000}
scores {'average_loss': 28358744.0, 'label/mean': 12949.43, 'loss': 447195600.0, 'prediction/mean': 13045.876, 'global_step': 3000}


KeyboardInterrupt: 
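The two loss values in each `scores` line are related by simple arithmetic. The sketch below uses the figures from the `global_step` 3000 line above and assumes that `loss` is the per-batch summed loss averaged over the evaluation batches; the printed numbers are consistent with that reading, but it is an interpretation rather than a documented guarantee.

```python
import math

num_examples = 205
batch_size = 16
average_loss = 28358744.0  # from the global_step 3000 line
reported_loss = 447195600.0

# 205 examples in batches of 16 gives 13 evaluation batches (the last one partial).
num_batches = math.ceil(num_examples / batch_size)

# Total squared error spread evenly over the batches.
approx_loss = average_loss * num_examples / num_batches

# Root mean squared error, in the same units as price.
rmse = math.sqrt(average_loss)

print(num_batches, round(approx_loss), round(rmse))
```

The square root of `average_loss` is often easier to interpret than the raw value: here the typical prediction error is a little over 5,000 price units against a mean label of about 12,949.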

### Visualize your model's predictions

After you have a trained model, it may be helpful to understand how your model's inference differs from the actual data.

This helper function `scatter_plot_inference_grid` does that for you. Real data is in grey. Your model's predictions are in orange.


In [None]:
from matplotlib import pyplot as plt


def scatter_plot_inference_grid(est, x_df, feature_names):
  """Plots the predictions of the model against each feature.

  Args:
    est: The trained tf.Estimator.
    x_df: The pandas dataframe with the input data (used to create
      predict_input_fn).
    feature_names: An iterable of string feature names to plot.
  """
  def scatter_plot_inference(axis,
                             x_axis_feature_name,
                             y_axis_feature_name,
                             predictions):
    """Generate one subplot."""
    # Plot the real data in grey.
    axis.set_ylabel(y_axis_feature_name)
    axis.set_xlabel(x_axis_feature_name)
    axis.scatter(car_data[x_axis_feature_name],
                 car_data[y_axis_feature_name],
                 c='grey')

    # Plot the predicted data in orange.
    axis.scatter(car_data[x_axis_feature_name], predictions, c='orange')

  predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

  predictions = [
    x['predictions'][0]
    for x in est.predict(predict_input_fn)
  ]

  num_cols = 3
  num_rows = int(math.ceil(len(feature_names)/float(num_cols)))
  # squeeze=False keeps axarr two-dimensional even when there is a single row.
  f, axarr = plt.subplots(num_rows, num_cols, squeeze=False)
  size = 4.5
  f.set_size_inches(num_cols*size, num_rows*size)

  # Iterate over the feature_names argument rather than the global list.
  for i, feature_name in enumerate(feature_names):
    axis = axarr[i // num_cols, i % num_cols]
    scatter_plot_inference(axis, feature_name, 'price', predictions)
  plt.show()

scatter_plot_inference_grid(est, x_df, numeric_feature_names)

## Task 2: Take your best numeric model from earlier. Add normalization.

### Add normalization to your best numeric model from earlier

- You decide what type of normalization to add, and for which features
- You will need to use the `normalizer_fn` arg on [`numeric_column`](https://www.tensorflow.org/api_docs/python/tf/feature_column/numeric_column)
    - An example of a silly normalizer_fn that shifts inputs down by 1, and then negates the value:
    
         normalizer_fn = lambda x: tf.negative(tf.subtract(x, 1))

- You may find these pandas functions helpful:
    - dataframe.mean()['your_feature_name']
    - dataframe.std()['your_feature_name']
- You will need to retune the hyperparameters from earlier.


**Does normalization improve model quality on this dataset? Why or why not?**
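One way to build intuition for this question is a toy experiment with plain gradient descent. The sketch below fits a single slope to made-up weight and price values, once on the raw feature and once on the z-scored feature, using the same learning rate. The data, the helper names `z_score` and `fit_slope`, and the omission of a bias term are all simplifying assumptions of this example.

```python
import math


def z_score(values):
    """Standardizes values to mean 0 and sample standard deviation 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


def fit_slope(xs, ys, learning_rate, steps):
    """Fits y ~ w * x (no bias term) by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


weights = [2000.0, 2500.0, 3000.0]    # made-up car weights
prices = [10000.0, 12500.0, 15000.0]  # made-up prices

w_raw = fit_slope(weights, prices, learning_rate=0.1, steps=200)
w_scaled = fit_slope(z_score(weights), prices, learning_rate=0.1, steps=200)

print(w_raw, w_scaled)
```

On the raw feature each update overshoots by a factor proportional to the squared feature magnitude, so the slope blows up to a non-finite value. On the z-scored feature the same learning rate converges.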

In [None]:
# This 1D visualization of each numeric feature might inform your normalization
# decisions.
for feature_name in numeric_feature_names:
  car_data.hist(column=feature_name)

### Train your model with numeric features + normalization

In [None]:
## Your code goes here

def standard_normalizer(features, from_data):
  to = from_data[features].copy()

  for column in features:
    to[column] = (from_data[column] - from_data[column].mean()) / from_data[column].std()

  return to

def min_max_normalizer(features, from_data):
  to = from_data[features].copy()

  df_min = from_data[features].min()
  df_max = from_data[features].max()

  for column in features:
    to[column] = (from_data[column] - df_min[column]) / (df_max[column] - df_min[column])

  return to

#
batch_size = 128

print(numeric_feature_names)
x_df = standard_normalizer(numeric_feature_names, car_data)
y_series = car_data['price']

# Create input_fn's so that the estimator knows how to read in your data.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

linear_feature_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in linear_features
]

non_linear_feature_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in non_linear_features
]

print(linear_features)
print(non_linear_features)

est = tf.estimator.DNNLinearCombinedRegressor(
     linear_feature_columns=linear_feature_columns,
     linear_optimizer = tf.compat.v1.train.FtrlOptimizer(learning_rate=0.1),

     dnn_feature_columns=non_linear_feature_columns,
     dnn_hidden_units=[32, 32],
     dnn_optimizer=tf.train.RMSPropOptimizer(learning_rate=0.001),
     #dnn_optimizer=tf.train.GradientDescentOptimizer(learning_rate=1e-5)  # Also fails to train with learning rates above 1e-4; at or below that, training is very slow.
 )

# TRAIN
num_print_statements = 10
num_training_steps = 10000

for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps)
  scores = est.evaluate(eval_input_fn)

  print('scores', scores)

In [None]:
#@title Possible solution
# This does Z-score normalization since the distributions for most features looked
# roughly normally distributed.

# Z-score normalization subtracts the mean and divides by the standard deviation,
# to give a roughly standard normal distribution (mean = 0, std = 1) under a
# normal distribution assumption. Epsilon prevents divide by zero.

# With normalization, are you able to get the model working with
# GradientDescentOptimizer? Z-score normalization doesn't seem to be able to get
# SGD working. Maybe a different type of normalization would?

batch_size = 16

print(numeric_feature_names)
x_df = car_data[numeric_feature_names]
y_series = car_data['price']

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

# Epsilon prevents divide by zero.
epsilon = 0.000001
model_feature_columns = [
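    tf.feature_column.numeric_column(
        feature_name,
        # Default arguments bind this feature's statistics now. A bare closure
        # would see only the last feature_name when the lambda is finally called.
        normalizer_fn=lambda val, mean=x_df[feature_name].mean(), std=x_df[feature_name].std(): (val - mean) / (epsilon + std))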
    for feature_name in numeric_feature_names
]
print('model_feature_columns', model_feature_columns)

est = tf.estimator.DNNRegressor(
    feature_columns=model_feature_columns,
    hidden_units=[64],
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.01),
  )

# TRAIN
num_print_statements = 10
num_training_steps = 10000
for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps // num_print_statements)
  scores = est.evaluate(eval_input_fn)

  # The `scores` dictionary has several metrics automatically generated by the
  # canned Estimator.
  # `average_loss` is the average loss for an individual example.
  # `loss` is the summed loss for the batch.
  # In addition to these scalar losses, you may find the visualization functions
  # in the next cell helpful for debugging model quality.
  print('scores', scores)

scatter_plot_inference_grid(est, x_df, numeric_feature_names)
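When a `normalizer_fn` lambda is built per feature inside a loop or comprehension, Python's late-binding closures are worth keeping in mind. The following sketch uses plain Python with made-up means (no TensorFlow) to show the difference between a lambda that reads the loop variable when called and one that captures the value through a default argument. `build_shifters` is a hypothetical helper for this example.

```python
def build_shifters(means):
    """Returns (late, early) lists of per-feature centering functions."""
    # Late binding: each lambda looks up `name` only when it is called,
    # by which time the comprehension has finished and `name` is the last key.
    late = [lambda x: x - means[name] for name in means]
    # Early binding: the default argument captures this feature's mean now.
    early = [lambda x, mean=means[name]: x - mean for name in means]
    return late, early


toy_means = {'length': 174.0, 'width': 65.9}  # made-up statistics
late, early = build_shifters(toy_means)

late_result = late[0](174.0)    # centered with the mean of 'width'
early_result = early[0](174.0)  # centered with the mean of 'length'

print(late_result, early_result)
```

With late binding, every feature ends up normalized by the statistics of the last feature in the list, which is easy to miss because nothing crashes.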

## Task 3: Make your best model using only categorical features

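- Look at the possible feature columns for categorical features. Their names begin with `categorical_column_with_` in the `tf.feature_column` API documentation.
- You may find `dataframe['your_feature_name'].unique()` helpful; `unique()` is a Series method, so call it on one column at a time.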
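As a small sketch of the idea behind a vocabulary list, the code below collects the distinct values of each column in a made-up dataframe and one-hot encodes one column with `pd.get_dummies`. The pandas one-hot encoding is used here only as an analogy for what an indicator column feeds to the model; it is not the Estimator code path.

```python
import pandas as pd

# Made-up categorical data with two of the lab's column names.
toy = pd.DataFrame({
    'fuel-type': ['gas', 'diesel', 'gas'],
    'num-doors': ['two', 'four', 'four'],
})

# One vocabulary (list of distinct values) per categorical column.
vocabularies = {}
for column in toy.columns:
    vocabularies[column] = list(toy[column].unique())

# One column per vocabulary entry, with a single "hot" entry per row.
one_hot = pd.get_dummies(toy['fuel-type'])

print(vocabularies)
print(one_hot)
```

A feature such as `make`, with 22 distinct values, would expand to 22 input columns, which matters when sizing the hidden layers for a dataset this small.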


In [41]:
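## Your code goes here (failed attempt to use GradientBoostedTreesModel; the
## error below is likely caused by tf.disable_v2_behavior(), which turns off the
## eager execution that this model expects)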
#!pip install tensorflow_decision_forests
import tensorflow_decision_forests as tfdf

df = car_data.copy()

for col in categorical_feature_names:
    df[col] = df[col].astype(str)

target = 'price'
features = categorical_feature_names

df_with_target = df[features + [target]]
print(df_with_target)

# Gradient boosting model
model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.REGRESSION)

# Explicitly set batch size for the dataset
batch_size = 32  # Choose an appropriate batch size
ds = tfdf.keras.pd_dataframe_to_tf_dataset(
    df_with_target,
    label='price',
    max_num_classes=200,
    task=tfdf.keras.Task.REGRESSION,
    batch_size=batch_size  # Add batch_size here
)

# TRAIN
num_print_statements = 10
for _ in range(num_print_statements):
  model.fit(ds.take(100))
  evaluation = model.evaluate(ds)
  print('Evaluation:', evaluation) # Print the evaluation results

    fuel-system num-doors fuel-type aspiration engine-location num-cylinders  \
57         4bbl       two       gas        std           front           two   
99         2bbl      four       gas        std           front          four   
134        mpfi       two       gas        std           front          four   
182         idi       two    diesel        std           front          four   
13         mpfi      four       gas        std           front           six   
..          ...       ...       ...        ...             ...           ...   
124        spdi       two       gas      turbo           front          four   
87         spdi      four       gas      turbo           front          four   
35         1bbl      four       gas        std           front          four   
68          idi      four    diesel      turbo           front          five   
180        mpfi      four       gas        std           front           six   

    body-style engine-type           ma

OperatorNotAllowedInGraphError: Using a symbolic `tf.Tensor` as a Python `bool` is not allowed. You can attempt the following resolutions to the problem: If you are running in Graph mode, use Eager execution mode or decorate this function with @tf.function. If you are using AutoGraph, you can try decorating this function with @tf.function. If that does not work, then you may be using an unsupported feature or your source code may not be visible to AutoGraph. See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/autograph/g3doc/reference/limitations.md#access-to-source-code for more information.

In [23]:
## Your code goes here using DNNLinearCombinedRegressor

batch_size = 128

print(categorical_feature_names)
x_df = car_data[categorical_feature_names]
y_series = car_data['price']

# Create input_fn's so that the estimator knows how to read in your data.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

model_feature_linear_columns = [tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_vocabulary_list('num-cylinders', vocabulary_list=car_data['num-cylinders'].unique()))]

model_feature_dnn_columns = [
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            feature_name, vocabulary_list=car_data[feature_name].unique()))
    for feature_name in categorical_feature_names
]

est = tf.estimator.DNNLinearCombinedRegressor(
    linear_feature_columns=model_feature_linear_columns,
    linear_optimizer = tf.compat.v1.train.FtrlOptimizer(learning_rate=0.1),

    dnn_feature_columns=model_feature_dnn_columns,
    dnn_hidden_units=[8, 8],
    dnn_optimizer=tf.train.RMSPropOptimizer(learning_rate=0.001)
)

# TRAIN
num_print_statements = 10
num_training_steps = 10000


for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps)
  scores = est.evaluate(eval_input_fn)

  print('scores', scores)



['engine-location', 'drive-wheels', 'fuel-system', 'num-doors', 'body-style', 'engine-type', 'make', 'num-cylinders', 'aspiration', 'fuel-type']
scores {'average_loss': 11760895.0, 'label/mean': 12949.43, 'loss': 1205491700.0, 'prediction/mean': 13013.1045, 'global_step': 10000}
scores {'average_loss': 7112836.5, 'label/mean': 12949.43, 'loss': 729065700.0, 'prediction/mean': 12970.265, 'global_step': 20000}
scores {'average_loss': 6536831.5, 'label/mean': 12949.43, 'loss': 670025200.0, 'prediction/mean': 12945.619, 'global_step': 30000}
scores {'average_loss': 6294180.0, 'label/mean': 12949.43, 'loss': 645153500.0, 'prediction/mean': 12957.507, 'global_step': 40000}


Instructions for updating:
Use standard file APIs to delete files with this prefix.


scores {'average_loss': 6218423.5, 'label/mean': 12949.43, 'loss': 637388400.0, 'prediction/mean': 12972.939, 'global_step': 50000}
scores {'average_loss': 6194956.5, 'label/mean': 12949.43, 'loss': 634983040.0, 'prediction/mean': 12955.502, 'global_step': 60000}


KeyboardInterrupt: 

In [100]:
#@title Possible solution
# We have the full list of values that each feature takes on, and the list is
# relatively small so we use categorical_column_with_vocabulary_list.

batch_size = 16

x_df = car_data[categorical_feature_names]
y_series = car_data['price']

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

model_feature_columns = [
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            feature_name, vocabulary_list=car_data[feature_name].unique()))
    for feature_name in categorical_feature_names
]
print('model_feature_columns', model_feature_columns)

est = tf.estimator.DNNRegressor(
    feature_columns=model_feature_columns,
    hidden_units=[64],
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.01),
  )

# TRAIN
num_print_statements = 10
num_training_steps = 10000
for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps // num_print_statements)
  scores = est.evaluate(eval_input_fn)

  # The `scores` dictionary has several metrics automatically generated by the
  # canned Estimator.
  # `average_loss` is the average loss for an individual example.
  # `loss` is the summed loss for the batch.
  # In addition to these scalar losses, you may find the visualization functions
  # in the next cell helpful for debugging model quality.
  print('scores', scores)



Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


model_feature_columns [IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='engine-type', vocabulary_list=('ohc', 'dohc', 'ohcf', 'l', 'ohcv', 'rotor', 'dohcv'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='aspiration', vocabulary_list=('turbo', 'std'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='make', vocabulary_list=('volvo', 'toyota', 'subaru', 'nissan', 'plymouth', 'peugot', 'mitsubishi', 'mazda', 'honda', 'mercedes-benz', 'volkswagen', 'saab', 'renault', 'bmw', 'isuzu', 'audi', 'porsche', 'mercury', 'jaguar', 'dodge', 'alfa-romero', 'chevrolet'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='fuel-system', vocabulary_list=('mpfi', '2bbl', 'idi', '1bbl', 'spdi', '4bbl', 'spfi', 'mfi'), dtype=tf.string, default_value=-1

KeyboardInterrupt: 
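
As a rough illustration of what the printed columns represent: each `IndicatorColumn` wraps a vocabulary list and turns a string value into a one-hot vector over that vocabulary. The sketch below mimics this in plain Python; it is not TensorFlow's implementation, and the helper `one_hot` and the unseen value `'supercharged'` are hypothetical names used only for this example. It assumes that, with `default_value=-1` and `num_oov_buckets=0` as printed above, a value outside the vocabulary is encoded as all zeros.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list for `value`; all zeros if it is not in `vocabulary`."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]

# Vocabulary copied from the 'aspiration' column printed above.
aspiration_vocabulary = ('turbo', 'std')

encoded_std = one_hot('std', aspiration_vocabulary)
# Hypothetical value that never appears in the training data.
encoded_unknown = one_hot('supercharged', aspiration_vocabulary)

print(encoded_std)
print(encoded_unknown)
```

Because the vocabulary is taken from `car_data[feature_name].unique()`, the position of each value in the vector follows the order in which pandas first encounters it, which is why the printed vocabularies are not sorted.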

## Task 4: Using all the features, make the best model that you can make

With all the features combined, your model should perform better than your earlier models that used numerical features or categorical features alone. Tune your model until that is the case.

In [None]:
## Your code goes here

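def standard_normalizer(features, from_data):
  """Returns a copy of `from_data` with each column in `features` z-scored.

  The categorical columns are carried over unchanged.
  """
  normalized = from_data[features + categorical_feature_names].copy()

  for column in features:
    normalized[column] = (
        from_data[column] - from_data[column].mean()) / from_data[column].std()

  return normalized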

# Normalize the numeric features, then build the input functions.
batch_size = 128

print(numeric_feature_names)
x_df = standard_normalizer(numeric_feature_names, car_data)

y_series = car_data['price']

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

model_feature_linear_columns = [
    tf.feature_column.numeric_column(feature_name) for feature_name in linear_features
] + [
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            'num-cylinders',
            vocabulary_list=car_data['num-cylinders'].unique()))
]

model_feature_dnn_columns = [
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            feature_name, vocabulary_list=car_data[feature_name].unique()))
    for feature_name in categorical_feature_names
] + [
    feature_column
    for feature_column in non_linear_feature_columns
]

est = tf.estimator.DNNLinearCombinedRegressor(
    linear_feature_columns=model_feature_linear_columns,
    linear_optimizer=tf.train.FtrlOptimizer(learning_rate=0.1),

    dnn_feature_columns=model_feature_dnn_columns,
    dnn_hidden_units=[8, 8],
    dnn_optimizer=tf.train.RMSPropOptimizer(learning_rate=0.001)
)

# TRAIN
num_print_statements = 10
num_training_steps = 10000


for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps)
  scores = est.evaluate(eval_input_fn)

  print('scores', scores)

Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


['symboling', 'normalized-losses', 'wheel-base', 'length', 'width', 'height', 'weight', 'engine-size', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'bore', 'stroke', 'compression-ratio']
scores {'average_loss': 12491967.0, 'label/mean': 12949.43, 'loss': 1280426600.0, 'prediction/mean': 12749.896, 'global_step': 10000}
scores {'average_loss': 5579199.0, 'label/mean': 12949.43, 'loss': 571867900.0, 'prediction/mean': 12945.6045, 'global_step': 20000}
scores {'average_loss': 4675510.5, 'label/mean': 12949.43, 'loss': 479239800.0, 'prediction/mean': 12968.024, 'global_step': 30000}
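
The two loss values in each `scores` line can be related with a little arithmetic. The sketch below assumes the data set has 205 rows (the size of the UCI Automobile data set) and that `loss` is the mean over evaluation batches of the per-batch summed loss, which is consistent with the numbers printed above. It also takes the square root of `average_loss` to get an error in the same units as `price`.

```python
import math

num_examples = 205   # assumption: size of the UCI Automobile data set
batch_size = 128     # batch size used in the cell above

# Evaluation batches: full batches followed by the remainder, if any.
full_batches, remainder = divmod(num_examples, batch_size)
batch_sizes = [batch_size] * full_batches + ([remainder] if remainder else [])

average_loss = 4675510.5  # last `average_loss` printed above

# Mean of the per-batch summed losses = total loss / number of batches
#                                     = average_loss * mean batch size.
mean_batch_size = sum(batch_sizes) / len(batch_sizes)
approx_loss = average_loss * mean_batch_size

# Root mean squared error, in the units of `price`.
rmse = math.sqrt(average_loss)

print(batch_sizes, mean_batch_size)
print(approx_loss, rmse)
```

Since `loss` depends on the batch size, `average_loss` (or its square root) is the more convenient number for comparing runs that use different batch sizes.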


In [27]:
#@title Possible solution
# This is a first pass at a model that uses all the features.
# Do you have any improvements?

batch_size = 16

x_df = car_data[numeric_feature_names + categorical_feature_names]
y_series = car_data['price']

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    num_epochs=None,
    shuffle=True)

eval_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    y=y_series,
    batch_size=batch_size,
    shuffle=False)

predict_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=x_df,
    batch_size=batch_size,
    shuffle=False)

epsilon = 0.000001
model_feature_columns = [
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            feature_name, vocabulary_list=car_data[feature_name].unique()))
    for feature_name in categorical_feature_names
] + [
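    tf.feature_column.numeric_column(
        feature_name,
        # Bind this column's statistics now: the lambda runs later, and
        # x_df.mean() over the whole frame fails on the string columns.
        normalizer_fn=lambda val, mean=x_df[feature_name].mean(), std=x_df[feature_name].std(): (val - mean) / (epsilon + std))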
    for feature_name in numeric_feature_names
]


print('model_feature_columns', model_feature_columns)

est = tf.estimator.DNNRegressor(
    feature_columns=model_feature_columns,
    hidden_units=[64],
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.01),
  )

# TRAIN
num_print_statements = 10
num_training_steps = 10000
for _ in range(num_print_statements):
  est.train(train_input_fn, steps=num_training_steps // num_print_statements)
  scores = est.evaluate(eval_input_fn)

  # The `scores` dictionary has several metrics automatically generated by the
  # canned Estimator.
  # `average_loss` is the average loss for an individual example.
  # `loss` is the summed loss for the batch.
  # In addition to these scalar losses, you may find the visualization functions
  # in the next cell helpful for debugging model quality.
  print('scores', scores)



Instructions for updating:
Use tf.keras instead.


model_feature_columns [IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='engine-location', vocabulary_list=('front', 'rear'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='drive-wheels', vocabulary_list=('rwd', 'fwd', '4wd'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='fuel-system', vocabulary_list=('idi', '2bbl', 'mpfi', '1bbl', 'spdi', '4bbl', 'spfi', 'mfi'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='num-doors', vocabulary_list=('four', 'two', '?'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='body-style', vocabulary_list=('wagon', 'sedan', 'hatchback', 'hardtop', 'convertible'), dtype=tf.string, default_value=-1, num_oov_buckets=0))

TypeError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/tensorflow_estimator/python/estimator/canned/dnn.py", line 240, in call  *
        net = self._input_layer(features)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer_v1.py", line 838, in __call__  **
        outputs = call_fn(cast_inputs, *args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/feature_column/dense_features.py", line 184, in call  **
        tensor = column.get_dense_tensor(
    File "<ipython-input-27-433b5950776b>", line 36, in <lambda>
        normalizer_fn=lambda val: (val - x_df.mean()[feature_name]) / (epsilon + x_df.std()[feature_name]))
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py", line 11556, in mean
        return NDFrame.mean(self, axis, skipna, numeric_only, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py", line 11201, in mean
        return self._stat_function(
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py", line 11158, in _stat_function
        return self._reduce(
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 10519, in _reduce
        res = df._mgr.reduce(blk_func)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/internals/managers.py", line 1534, in reduce
        nbs = blk.reduce(func)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/internals/blocks.py", line 339, in reduce
        result = func(self.values)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py", line 10482, in blk_func
        return op(values, axis=axis, skipna=skipna, **kwds)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/nanops.py", line 96, in _f
        return f(*args, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/nanops.py", line 158, in f
        result = alt(values, axis=axis, skipna=skipna, **kwds)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/nanops.py", line 421, in new_func
        result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/nanops.py", line 727, in nanmean
        the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    File "/usr/local/lib/python3.10/dist-packages/pandas/core/nanops.py", line 1686, in _ensure_numeric
        raise TypeError(f"Could not convert {x} to numeric") from err

    TypeError: Could not convert ['frontfrontfront...' 'rwdfwdfwdfwd...' 'idiidi2bbl...' ... 'dieseldieselgasgas...'] to numeric
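
The traceback points at the lambda passed as `normalizer_fn`: it calls `x_df.mean()` on the whole frame, and `x_df` also holds the categorical string columns, which the pandas version shown in the traceback refuses to average. A second, quieter issue is that a lambda created inside a comprehension looks up `feature_name` when it is called, not when it is created, so every column would end up using the last feature's statistics. One possible way around both, sketched here in plain Python, is to compute each column's statistics up front and close over the resulting numbers. The `columns` dictionary and `make_normalizer` helper are hypothetical stand-ins for this example, not part of the lab.

```python
import statistics

# Hypothetical stand-in for two numeric columns of `car_data`.
columns = {
    'width': [60.0, 65.0, 70.0],
    'horsepower': [50.0, 100.0, 150.0],
}
epsilon = 0.000001

def make_normalizer(values):
    """Return a z-score function that uses statistics of one column only."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample std, like pandas' default ddof=1
    # `mean` and `std` are fixed here, so each lambda keeps its own values.
    return lambda val: (val - mean) / (epsilon + std)

normalizers = {name: make_normalizer(values) for name, values in columns.items()}

print(normalizers['width'](65.0))
print(normalizers['horsepower'](150.0))
```

In the notebook itself, the per-column equivalents would be `x_df[feature_name].mean()` and `x_df[feature_name].std()`, evaluated once per feature before the lambda is built.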
