# Creating a Sampled Dataset

**Learning Objectives**
- Sample the natality dataset to create train/eval/test sets
- Preprocess the data in a Pandas dataframe

## Introduction

In this notebook we'll read data from BigQuery into our notebook to preprocess the data within a Pandas dataframe. 

In [3]:
PROJECT = "qwiklabs-gcp-636667ae83e902b6"  # Replace with your PROJECT
BUCKET =  "qwiklabs-gcp-636667ae83e902b6_al"  # Replace with your BUCKET
REGION = "us-east1"            # Choose an available region for AI Platform  
TFVERSION = "1.13"                # TF version for AI Platform

In [4]:
import os
os.environ["BUCKET"] = BUCKET
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["TFVERSION"] = TFVERSION

In [5]:
%%bash
if ! gsutil ls | grep -q gs://${BUCKET}/; then
    gsutil mb -l ${REGION} gs://${BUCKET}
fi

## Create ML datasets by sampling using BigQuery

We'll begin by sampling the BigQuery data to create smaller datasets.

In [6]:
# Create SQL query using natality data after the year 2000
query_string = """
WITH
  CTE_hash_cols_fixed AS (
  SELECT
    weight_pounds,
    is_male,
    mother_age,
    mother_race,
    father_race,
    cigarette_use,
    mother_married,
    ever_born,
    plurality,
    gestation_weeks,
    weight_gain_pounds,
    year,
    month,
    CASE
      WHEN day IS NULL AND wday IS NULL THEN 0
    ELSE
    CASE
      WHEN day IS NULL THEN wday
    ELSE
    wday
  END
  END
    AS date,
    IFNULL(state,
      "Unknown") AS state,
    IFNULL(mother_birth_state,
      "Unknown") AS mother_birth_state
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000)

SELECT
  weight_pounds,
  is_male,
  mother_age,
  mother_race,
  father_race,
  cigarette_use,
  mother_married,
  ever_born,
  weight_gain_pounds,
  plurality,
  gestation_weeks,
  ABS(FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING), CAST(date AS STRING), CAST(state AS STRING), CAST(mother_birth_state AS STRING)))) AS hashvalues
FROM
  CTE_hash_cols_fixed
"""

There are only a limited number of years, months, days, and states in the dataset. Let's see what the hash values are.

We'll call BigQuery, group by the `hashvalues` column, and count the number of records in each group. This will let us choose buckets that give the correct train/eval/test percentages.

In [7]:
from google.cloud import bigquery
bq = bigquery.Client(project = PROJECT)

df = bq.query("SELECT hashvalues, COUNT(weight_pounds) AS num_babies FROM (" 
              + query_string + 
              ") GROUP BY hashvalues").to_dataframe()

print("There are {} unique hashvalues.".format(len(df)))
df.head()

There are 658107 unique hashvalues.


| | hashvalues | num_babies |
|---|---|---|
| 0 | 9184965280050727138 | 2640 |
| 1 | 1164502582443349792 | 852 |
| 2 | 7439415367079602718 | 1793 |
| 3 | 2574830646122322867 | 854 |
| 4 | 1893017458511098033 | 1305 |
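
To build intuition for why hashing a combination of columns gives a repeatable split, here is a minimal pure-Python sketch. It is not the notebook's method: `hashlib.md5` is only a stand-in for BigQuery's `FARM_FINGERPRINT` (the actual hash values will differ), and the helper name `bucket_for` and the sample key values are made up for illustration.

```python
import hashlib

def bucket_for(year, month, date, state, mother_birth_state, num_buckets=100):
    """Map a group of key columns to a bucket index in [0, num_buckets).

    Assumption: md5 stands in for FARM_FINGERPRINT. The point is only that
    the same key always hashes to the same bucket.
    """
    key = "".join(str(part) for part in (year, month, date, state, mother_birth_state))
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# Two records that share the same key columns land in the same bucket,
# so they can never end up on different sides of a train/eval split.
a = bucket_for(2005, 7, 3, "CA", "Unknown")
b = bucket_for(2005, 7, 3, "CA", "Unknown")
print(a == b)
```

Because the bucket depends only on the key columns, rerunning the split reproduces the same assignment without storing any state.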


We can run a query to check whether our bucketing values produce the intended size for each dataset split, and then adjust them if needed.

In [8]:
sampling_percentages_query = """
WITH
  -- Get label, features, and column that we are going to use to split into buckets on
  CTE_hash_cols_fixed AS (
  SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks,
    mother_race,
    father_race,
    mother_married,
    ever_born,
    weight_gain_pounds,
    cigarettes_per_day,
    year,
    month,
    CASE
      WHEN day IS NULL AND wday IS NULL THEN 0
    ELSE
    CASE
      WHEN day IS NULL THEN wday
    ELSE
    wday
  END
  END
    AS date,
    IFNULL(state,
      "Unknown") AS state,
    IFNULL(mother_birth_state,
      "Unknown") AS mother_birth_state
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000),
  CTE_data AS (
  SELECT
    weight_pounds,
    is_male,
    mother_age,
    plurality,
    gestation_weeks,
    mother_race,
    father_race,
    mother_married,
    weight_gain_pounds,
    ever_born,
    cigarettes_per_day,
    ABS(FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING), CAST(date AS STRING), CAST(state AS STRING), CAST(mother_birth_state AS STRING)))) AS hashvalues
  FROM
    CTE_hash_cols_fixed),
  -- Get the counts of each of the unique hashes of our splitting column
  CTE_first_bucketing AS (
  SELECT
    hashvalues,
    COUNT(*) AS num_records
  FROM
    CTE_data
  GROUP BY
    hashvalues ),
  -- Get the number of records in each of the hash buckets
  CTE_second_bucketing AS (
  SELECT
    MOD(hashvalues, {0}) AS bucket_index,
    SUM(num_records) AS num_records
  FROM
    CTE_first_bucketing
  GROUP BY
    MOD(hashvalues, {0})),
  -- Calculate the overall percentages
  CTE_percentages AS (
  SELECT
    bucket_index,
    num_records,
    CAST(num_records AS FLOAT64) / (
    SELECT
      SUM(num_records)
    FROM
      CTE_second_bucketing) AS percent_records
  FROM
    CTE_second_bucketing ),
  -- Choose which of the hash buckets will be used for training and pull in their statistics
  CTE_train AS (
  SELECT
    *,
    "train" AS dataset_name
  FROM
    CTE_percentages
  WHERE
    bucket_index >= 0
    AND bucket_index < {1}),
  -- Choose which of the hash buckets will be used for validation and pull in their statistics
  CTE_eval AS (
  SELECT
    *,
    "eval" AS dataset_name
  FROM
    CTE_percentages
  WHERE
    bucket_index >= {1}
    AND bucket_index < {2}),
  -- Choose which of the hash buckets will be used for testing and pull in their statistics
  CTE_test AS (
  SELECT
    *,
    "test" AS dataset_name
  FROM
    CTE_percentages
  WHERE
    bucket_index >= {2}
    AND bucket_index < {0}),
  -- Union the training, validation, and testing dataset statistics
  CTE_union AS (
  SELECT
    0 AS dataset_id,
    *
  FROM
    CTE_train
  UNION ALL
  SELECT
    1 AS dataset_id,
    *
  FROM
    CTE_eval
  UNION ALL
  SELECT
    2 AS dataset_id,
    *
  FROM
    CTE_test ),
  -- Show final splitting and associated statistics
  CTE_split AS (
  SELECT
    dataset_id,
    dataset_name,
    SUM(num_records) AS num_records,
    SUM(percent_records) AS percent_records
  FROM
    CTE_union
  GROUP BY
    dataset_id,
    dataset_name )
SELECT
  *
FROM
  CTE_split
ORDER BY
    dataset_id
"""

modulo_divisor = 100
train_percent = 80.0
eval_percent = 10.0

train_buckets = int(modulo_divisor * train_percent / 100.0)
eval_buckets = int(modulo_divisor * eval_percent / 100.0)

df = bq.query(sampling_percentages_query.format(modulo_divisor, train_buckets, train_buckets + eval_buckets)).to_dataframe()
df.head()

| | dataset_id | dataset_name | num_records | percent_records |
|---|---|---|---|---|
| 0 | 0 | train | 26080035 | 0.783845 |
| 1 | 1 | eval | 3639721 | 0.109393 |
| 2 | 2 | test | 3552158 | 0.106761 |
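
The bucket arithmetic used to fill in `{0}`, `{1}`, and `{2}` in the query can be sketched in plain Python. The helper name `split_for_bucket` is hypothetical; it simply mirrors the `bucket_index` ranges in the `CTE_train`, `CTE_eval`, and `CTE_test` clauses.

```python
def split_for_bucket(bucket_index, modulo_divisor=100, train_percent=80.0, eval_percent=10.0):
    """Return 'train', 'eval', or 'test' for a bucket index in [0, modulo_divisor)."""
    train_buckets = int(modulo_divisor * train_percent / 100.0)
    eval_buckets = int(modulo_divisor * eval_percent / 100.0)
    if bucket_index < train_buckets:
        return "train"
    if bucket_index < train_buckets + eval_buckets:
        return "eval"
    return "test"

# Count how many of the 100 bucket indices fall into each split.
counts = {"train": 0, "eval": 0, "test": 0}
for bucket_index in range(100):
    counts[split_for_bucket(bucket_index)] += 1
print(counts)
```

The buckets divide exactly 80/10/10, but the record shares in the table above are roughly 78.4%/10.9%/10.7%, because buckets do not all hold the same number of records.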


#### **Exercise 1**

Modify the `query_string` above to produce an 80/10/10 split for the train/valid/test sets. Apply an appropriate `MOD()` to `hashvalues`.

**Hint**: You can use `every_n` in the SQL query to create a smaller subset of the data.

In [9]:
# Added every_n so that we can now subsample from each of the hash values to get approximately the record counts we want
every_n = 100

train_query = "SELECT * FROM ({0}) WHERE MOD(hashvalues, {1} * 100) < 80".format(query_string, every_n)
eval_query = "SELECT * FROM ({0}) WHERE MOD(hashvalues, {1} * 100) >= 80 AND MOD(hashvalues, {1} * 100) < 90".format(query_string, every_n)
test_query = "SELECT * FROM ({0}) WHERE MOD(hashvalues, {1} * 100) >= 90 AND MOD(hashvalues, {1} * 100) < 100".format(query_string, every_n)

In [8]:
print(train_query)

SELECT * FROM (
WITH
  CTE_hash_cols_fixed AS (
  SELECT
    weight_pounds,
    is_male,
    mother_age,
    mother_race,
    father_race,
    cigarette_use,
    mother_married,
    ever_born,
    plurality,
    gestation_weeks,
    weight_gain_pounds,
    year,
    month,
    CASE
      WHEN day IS NULL AND wday IS NULL THEN 0
    ELSE
    CASE
      WHEN day IS NULL THEN wday
    ELSE
    wday
  END
  END
    AS date,
    IFNULL(state,
      "Unknown") AS state,
    IFNULL(mother_birth_state,
      "Unknown") AS mother_birth_state
  FROM
    publicdata.samples.natality
  WHERE
    year > 2000)

SELECT
  weight_pounds,
  is_male,
  mother_age,
  mother_race,
  father_race,
  cigarette_use,
  mother_married,
  ever_born,
  weight_gain_pounds,
  plurality,
  gestation_weeks,
  ABS(FARM_FINGERPRINT(CONCAT(CAST(year AS STRING), CAST(month AS STRING), CAST(date AS STRING), CAST(state AS STRING), CAST(mother_birth_state AS STRING)))) AS hashvalues
FROM
  CTE_hash_cols_fixed
) WHERE MOD(hash

In [13]:
train_df = bq.query(train_query).to_dataframe()

In [14]:
eval_df = bq.query(eval_query).to_dataframe()

In [15]:
test_df = bq.query(test_query).to_dataframe()

In [16]:
print("There are {} examples in the train dataset.".format(len(train_df)))
print("There are {} examples in the validation dataset.".format(len(eval_df)))
print("There are {} examples in the test dataset.".format(len(test_df)))

There are 281921 examples in the train dataset.
There are 14265 examples in the validation dataset.
There are 12700 examples in the test dataset.
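
As a quick sanity check, the sketch below works out what fraction of bucket indices each filter keeps when the modulus is `every_n * 100`, and what share of the sampled records ended up in each split according to the counts printed above. The dictionary names are made up for illustration.

```python
every_n = 100
modulus = every_n * 100  # same modulus as in the three queries

# Number of bucket indices (out of `modulus`) that each WHERE clause keeps.
kept_buckets = {"train": 80, "eval": 10, "test": 10}
expected_fraction = {name: n / modulus for name, n in kept_buckets.items()}

# Record counts copied from the printed output above.
observed = {"train": 281921, "eval": 14265, "test": 12700}
total = sum(observed.values())
observed_share = {name: n / total for name, n in observed.items()}

for name in observed:
    print(name, expected_fraction[name], round(observed_share[name], 4))
```

The observed shares need not match the 80/10/10 ratio of bucket indices, since the number of records per hash value varies considerably.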


## Preprocess data using Pandas

We'll perform a few preprocessing steps on the data. First, we'll add extra rows to simulate the lack of an ultrasound: we'll duplicate the rows and set the `is_male` field to `Unknown`. In those duplicated rows, if there is more than one child, we'll change `plurality` to `Multiple(2+)`. While we're at it, we'll also convert the `plurality` column to a string. We'll perform these operations below.

Let's start by examining the training dataset as is.

In [17]:
train_df.head()

Unnamed: 0,weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,hashvalues
0,3.688334,False,32,28.0,28.0,,True,1.0,28.0,2,35.0,6108646794532530006
1,9.210913,True,27,28.0,28.0,,True,1.0,99.0,1,38.0,6509849307615120027
2,7.500126,False,36,68.0,68.0,False,True,2.0,33.0,1,39.0,8717259940738900003
3,8.126239,True,32,28.0,28.0,,True,1.0,40.0,1,39.0,564625834676230020
4,6.437498,False,21,1.0,68.0,,True,1.0,99.0,1,38.0,639641401909930058


Also, notice that some very important numeric fields are missing in some rows (the `count` row in the Pandas summary excludes missing values).

In [18]:
train_df.describe(include='all')

Unnamed: 0,weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,hashvalues
count,281699.0,281921,281921.0,215437.0,215437.0,129043,281921,280777.0,277722.0,281921.0,279414.0,281921.0
unique,,2,,,,2,2,,,,,
top,,True,,,,False,True,,,,,
freq,,144844,,,,115962,181525,,,,,
mean,7.241341,,27.436243,2.794998,13.210999,,,2.067071,43.424086,1.035464,38.582065,4.668618e+18
std,1.324503,,6.160549,9.368746,30.77687,,,1.236902,28.618292,0.195522,2.585737,2.941035e+18
min,0.500449,,11.0,1.0,1.0,,,1.0,1.0,1.0,17.0,272090200000000.0
25%,6.563162,,23.0,1.0,1.0,,,1.0,25.0,1.0,38.0,1.441975e+18
50%,7.319347,,27.0,1.0,1.0,,,2.0,34.0,1.0,39.0,5.786564e+18
75%,8.062305,,32.0,1.0,2.0,,,3.0,50.0,1.0,40.0,6.514283e+18


In [19]:
train_df.isnull().sum(axis=0)

weight_pounds            222
is_male                    0
mother_age                 0
mother_race            66484
father_race            66484
cigarette_use         152878
mother_married             0
ever_born               1144
weight_gain_pounds      4199
plurality                  0
gestation_weeks         2507
hashvalues                 0
dtype: int64
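
The relationship between the `count` row of `describe()` and the output of `isnull().sum()` can be seen on a toy dataframe. This is a minimal sketch with made-up values, not the natality data.

```python
import numpy as np
import pandas as pd

# Toy data with one missing weight (values are invented for illustration).
toy = pd.DataFrame({
    "weight_pounds": [7.5, np.nan, 6.1],
    "mother_age": [27, 31, 22],
})

non_missing = toy.count()             # non-null entries per column
missing = toy.isnull().sum(axis=0)    # null entries per column

# For every column, non-missing plus missing equals the number of rows.
print(non_missing + missing == len(toy))
```

The same identity holds in the training data above: 281699 non-missing `weight_pounds` values plus 222 missing ones gives the 281921 rows in `train_df`.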

It is always crucial to clean raw data before using it in machine learning, so we have a preprocessing step. We'll define a `preprocess` function below. Note that the mother's age is an input to our model, so users will have to provide it; otherwise, our service won't work. The features we use for our model were chosen because they are good predictors and are easy to collect.

#### **Exercise 2**

The code cell below has some TODOs for you to complete.

In the first block of TODOs, we'll clean the data so that 
- `weight_pounds` is always positive
- `mother_age` is always positive
- `gestation_weeks` is always positive
- `plurality` is always positive

The next block of TODOs will create extra rows to simulate lack of ultrasound information. That is, we'll make a copy of the dataframe and call it `no_ultrasound`. Then, use Pandas functionality to make two changes in place to `no_ultrasound`:
- set the `plurality` value of `no_ultrasound` to be 'Multiple(2+)' whenever the plurality is not 'Single(1)'
- set the `is_male` value of `no_ultrasound` to be 'Unknown'

In [20]:
train_df['mother_age'].value_counts()

28    15496
26    15334
29    15271
27    15264
25    15034
24    14998
30    14885
23    14771
22    14421
31    14232
21    13619
32    13367
20    12861
33    12246
34    11320
19    11277
35     9669
18     8143
36     7693
37     6233
17     5039
38     4984
39     3941
16     2839
40     2726
41     1867
15     1246
42     1220
43      669
14      398
44      354
45      201
46       93
13       73
47       58
48       25
50       20
12       13
49       11
51        9
11        1
Name: mother_age, dtype: int64

In [21]:
train_df['gestation_weeks'].value_counts()

39.0    70002
40.0    54076
38.0    52644
37.0    25399
41.0    25283
36.0    13064
42.0     8141
35.0     7372
34.0     4742
43.0     4322
33.0     2709
44.0     2185
32.0     1803
31.0     1208
45.0     1134
30.0      950
29.0      676
46.0      624
28.0      588
27.0      433
26.0      384
47.0      343
25.0      326
24.0      264
23.0      233
22.0      173
21.0      132
20.0       90
19.0       61
18.0       31
17.0       22
Name: gestation_weeks, dtype: int64

In [22]:
train_df['plurality'].value_counts()

1    272441
2      8998
3       459
5        13
4        10
Name: plurality, dtype: int64

In [23]:
import pandas as pd

def preprocess(df):
    # Clean up data
    # Remove what we don't want to use for training.
    # Comparisons against NaN evaluate to False, so these filters also drop
    # rows where the filtered column is missing.
    df = df[(df['weight_pounds'] > 0) & (df['weight_pounds'] < 20)]   # We don't have crazy-high values but discard anyway in case we retrain in future
    df = df[(df['mother_age'] >= 14) & (df['mother_age'] <= 45)]
    df = df[df['gestation_weeks'] >= 22]
    df = df[df['plurality'] > 0]
    # Drop any remaining rows with missing key fields, and take an explicit copy
    # so the assignments below don't operate on a slice of the caller's dataframe
    df = df.dropna(subset=['weight_pounds', 'gestation_weeks', 'mother_age']).copy()
    # Fill missing values by assigning the result back (in-place calls on a
    # single column are not guaranteed to update the dataframe)
    df['cigarette_use'] = df['cigarette_use'].fillna(False)
    df['mother_race'] = df['mother_race'].fillna(0.0).astype(str)
    df['father_race'] = df['father_race'].fillna(0.0).astype(str)
    # Modify plurality field to be a string
    twins_etc = dict(zip([1,2,3,4,5],
                   ["Single(1)", "Twins(2)", "Triplets(3)", "Quadruplets(4)", "Quintuplets(5)"]))
    df['plurality'] = df['plurality'].replace(twins_etc)
    df['had_ultrasound'] = True

    # Now create extra rows to simulate lack of ultrasound
    no_ultrasound = df.copy(deep=True)
    no_ultrasound['is_male'] = 'Unknown'
    no_ultrasound['plurality'] = no_ultrasound['plurality'].map(lambda x: 'Multiple(2+)' if x != 'Single(1)' else x)
    no_ultrasound['had_ultrasound'] = False
    # Concatenate both datasets together and shuffle
    return pd.concat([df, no_ultrasound]).sample(frac=1).reset_index(drop=True)
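
To see the "no ultrasound" duplication step in isolation, here is a toy sketch on a two-row dataframe with made-up values. It repeats the same three assignments as `preprocess`, but skips the cleaning and the shuffle so the result is easy to read.

```python
import pandas as pd

# Two invented rows: one single birth and one set of twins.
toy = pd.DataFrame({
    "is_male": [True, False],
    "plurality": ["Single(1)", "Twins(2)"],
    "had_ultrasound": [True, True],
})

# Copy the rows and hide what an ultrasound would have revealed.
no_ultrasound = toy.copy(deep=True)
no_ultrasound["is_male"] = "Unknown"
no_ultrasound["plurality"] = no_ultrasound["plurality"].map(
    lambda x: "Multiple(2+)" if x != "Single(1)" else x)
no_ultrasound["had_ultrasound"] = False

# Stack originals and copies; the dataset doubles in size.
combined = pd.concat([toy, no_ultrasound]).reset_index(drop=True)
print(combined)
```

Each original row now appears twice: once with full detail and once with the sex unknown and any multiple birth collapsed to `Multiple(2+)`.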

Let's process the train/eval/test set and see a small sample of the training data after our preprocessing:

In [24]:
train_df = preprocess(train_df)
eval_df = preprocess(eval_df)
test_df = preprocess(test_df)

In [25]:
eval_df.head()

Unnamed: 0,weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,hashvalues,had_ultrasound
0,4.312242,Unknown,24,1.0,1.0,True,False,2.0,35.0,Single(1),31.0,64970426453540084,False
1,8.375361,True,22,1.0,1.0,False,True,1.0,50.0,Single(1),41.0,6457405458860680082,True
2,6.01862,True,36,1.0,1.0,False,True,2.0,30.0,Single(1),40.0,3578808539220210083,True
3,6.375769,False,33,1.0,1.0,False,True,2.0,26.0,Single(1),36.0,3371685991680330081,True
4,7.18707,Unknown,20,1.0,1.0,False,True,2.0,5.0,Single(1),38.0,8947997276250310086,False


In [22]:
train_df.tail()

Unnamed: 0,weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,hashvalues,had_ultrasound
557359,7.687519,Unknown,21,1.0,1.0,False,True,2.0,34.0,Single(1),41.0,8026788062750630025,False
557360,6.558752,False,18,2.0,1.0,False,False,1.0,42.0,Single(1),40.0,8183076475299060040,True
557361,6.124442,Unknown,22,1.0,1.0,False,True,1.0,30.0,Single(1),38.0,9207087581648290004,False
557362,8.750147,Unknown,32,1.0,1.0,False,True,3.0,99.0,Single(1),39.0,5786564370389690066,False
557363,6.847558,Unknown,23,0.0,0.0,False,True,2.0,39.0,Single(1),39.0,851529813781310044,False


Let's look again at a summary of the dataset. Because we pass `include='all'`, the summary also covers non-numeric columns, so `plurality` (now a string) shows up with its `unique`, `top`, and `freq` values.

In [23]:
train_df.describe(include='all')

Unnamed: 0,weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,hashvalues,had_ultrasound
count,557364.0,557364,557364.0,557364.0,557364.0,557364,557364,555384.0,549076.0,557364,557364.0,557364.0,557364
unique,,3,,16.0,17.0,2,2,,,6,,,2
top,,Unknown,,1.0,1.0,False,True,,,Single(1),,,True
freq,,278682,,332720.0,293296.0,531298,359520,,,538880,,,278682
mean,7.248311,,27.429299,,,,,2.064604,42.958308,,38.607004,4.667194e+18,
std,1.307728,,6.130326,,,,,1.233405,28.260017,,2.499858,2.944734e+18,
min,0.500449,,14.0,,,,,1.0,1.0,,22.0,272090200000000.0,
25%,6.569775,,23.0,,,,,1.0,25.0,,38.0,1.418299e+18,
50%,7.319347,,27.0,,,,,2.0,34.0,,39.0,5.786564e+18,
75%,8.062305,,32.0,,,,,3.0,50.0,,40.0,6.514283e+18,


In [24]:
train_df['mother_race'].unique()

array(['0.0', '2.0', '1.0', '7.0', '28.0', '9.0', '68.0', '78.0', '18.0',
       '4.0', '3.0', '5.0', '48.0', '58.0', '6.0', '38.0'], dtype=object)

In [25]:
train_df['plurality'].unique()

array(['Single(1)', 'Twins(2)', 'Multiple(2+)', 'Triplets(3)',
       'Quadruplets(4)', 'Quintuplets(5)'], dtype=object)

## Write to .csv files 

In the final versions, we want to read from files, not Pandas dataframes. So, we write the Pandas dataframes out as csv files. Using csv files gives us the advantage of shuffling during read. This is important for distributed training because some workers might be slower than others, and shuffling the data helps prevent the same data from being assigned to the slow workers.

#### **Exercise 3**

Complete the code in the cell below to write the three Pandas dataframes you made above to csv files. Have a look at [the documentation for `.to_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) to remind yourself of its usage. Remove `hashvalues` from the data: we will not use it in training, so there is no need to move the extra data around.

In [26]:
del train_df['hashvalues']
del test_df['hashvalues']
del eval_df['hashvalues']

In [27]:
train_df['plurality'].value_counts()

Single(1)         538880
Multiple(2+)        9242
Twins(2)            8786
Triplets(3)          434
Quintuplets(5)        13
Quadruplets(4)         9
Name: plurality, dtype: int64

In [28]:
train_df.to_csv("babyweight_train.csv", index=False)
test_df.to_csv("babyweight_test.csv", index=False)
eval_df.to_csv("babyweight_valid.csv", index=False)

Check your work above by inspecting the files you made. 

In [29]:
%%bash
wc -l *.csv

   25255 babyweight_test.csv
  557365 babyweight_train.csv
   28307 babyweight_valid.csv
  610927 total
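
When reading the `wc -l` output, keep in mind that each file includes a header row, so the line count is one more than the number of data rows (557365 lines for the 557364 training rows). The sketch below shows the same relationship on a small temporary file using only the standard library; the file name `toy.csv` and the row values are made up.

```python
import csv
import os
import tempfile

# Two invented data rows.
rows = [
    {"weight_pounds": 7.5, "plurality": "Single(1)"},
    {"weight_pounds": 6.1, "plurality": "Twins(2)"},
]

# Write a header plus the data rows to a temporary CSV file.
path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["weight_pounds", "plurality"])
    writer.writeheader()
    writer.writerows(rows)

# Count lines the way `wc -l` would.
with open(path) as f:
    line_count = sum(1 for _ in f)

print(line_count, len(rows))
```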


In [30]:
%%bash
head *.csv

==> babyweight_test.csv <==
weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_pounds,plurality,gestation_weeks,had_ultrasound
6.87621795178,Unknown,30,1.0,1.0,False,True,1.0,25.0,Single(1),39.0,False
8.7633749145,Unknown,17,0.0,0.0,False,False,1.0,40.0,Single(1),41.0,False
6.2501051276999995,False,23,1.0,1.0,False,True,3.0,23.0,Single(1),39.0,True
5.37486994756,False,24,0.0,0.0,False,False,3.0,10.0,Single(1),36.0,True
7.5398093604,Unknown,33,1.0,1.0,False,True,1.0,21.0,Single(1),40.0,False
8.000575487979999,Unknown,34,0.0,0.0,False,True,1.0,42.0,Single(1),40.0,False
7.31273323054,Unknown,27,0.0,0.0,False,False,5.0,12.0,Single(1),36.0,False
7.3744626639,Unknown,37,1.0,1.0,False,True,1.0,40.0,Single(1),40.0,False
6.7020527647999995,Unknown,22,1.0,1.0,False,True,2.0,30.0,Single(1),41.0,False

==> babyweight_train.csv <==
weight_pounds,is_male,mother_age,mother_race,father_race,cigarette_use,mother_married,ever_born,weight_gain_poun

In [31]:
%%bash
tail *.csv

==> babyweight_test.csv <==
7.5618555866,True,29,1.0,1.0,False,True,1.0,40.0,Single(1),41.0,True
6.87621795178,Unknown,35,1.0,1.0,False,True,3.0,32.0,Single(1),37.0,False
8.3004041643,True,27,1.0,1.0,False,True,1.0,45.0,Single(1),40.0,True
6.8122838958,Unknown,20,1.0,1.0,False,False,1.0,25.0,Single(1),39.0,False
7.76909011288,Unknown,29,1.0,1.0,False,True,1.0,44.0,Single(1),40.0,False
9.37405538024,Unknown,33,18.0,18.0,False,True,2.0,42.0,Single(1),41.0,False
6.4374980503999994,Unknown,40,1.0,1.0,False,True,1.0,38.0,Single(1),42.0,False
8.50102482272,True,30,9.0,1.0,False,True,2.0,60.0,Single(1),39.0,True
8.344496616699999,Unknown,37,7.0,7.0,False,True,1.0,35.0,Single(1),40.0,False
7.5618555866,False,31,0.0,0.0,False,True,2.0,25.0,Single(1),38.0,True

==> babyweight_train.csv <==
6.87621795178,Unknown,44,9.0,9.0,False,True,5.0,23.0,Single(1),40.0,False
2.43831261772,Unknown,36,1.0,2.0,False,True,2.0,24.0,Single(1),34.0,False
4.68702769012,False,27,0.0,0.0,False,False,2.0,25.0,Single(1)

Copyright 2017-2018 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License