This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

Chapter 5: Continued Evaluation: Dataset Access, EarlyStopping, Evaluation #10

Closed
mshearer0 opened this issue Jul 28, 2020 · 1 comment

mshearer0 commented Jul 28, 2020

  1. The munn-sandbox is not publicly available, so txtcls cannot be accessed.

I recreated the data using code from: https://datalab.office.datisan.com.au/notebooks/training-data-analyst/blogs/textclassification/txtcls.ipynb

as follows:

query="""
SELECT source, REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title FROM
(SELECT
  ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,
  title
FROM
  `bigquery-public-data.hacker_news.stories`
WHERE
  REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')
  AND LENGTH(title) > 10
)
WHERE (source = 'github' OR source = 'nytimes' OR source = 'techcrunch')
"""

from google.cloud import bigquery
client = bigquery.Client()
df = client.query(query).to_dataframe()
df.to_csv('titles_full.csv', header=False, index=False, encoding='utf-8', sep=',')
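
For anyone checking what the query does to a single row, here is a rough Python equivalent of its regular-expression logic. It is only a sketch: it assumes the URL pattern is `.*://(.[^/]+)/`, the helper names `source_of` and `clean_title` are made up for illustration, and Python's `re` module is not guaranteed to match BigQuery's regex engine in every edge case.

```python
import re

def source_of(url):
    """Approximate the query's source extraction for one URL (hypothetical helper)."""
    # Capture the host part; note the pattern requires a '/' after the host.
    match = re.search(r'.*://(.[^/]+)/', url)
    if match is None:
        return None
    host = match.group(1)
    # Mirror REGEXP_CONTAINS(host, '.com$'): keep only hosts ending in "com".
    if not re.search(r'.com$', host):
        return None
    # Mirror ARRAY_REVERSE(SPLIT(host, '.'))[OFFSET(1)]: the label just before "com".
    parts = host.split('.')[::-1]
    return parts[1] if len(parts) > 1 else None

def clean_title(title):
    """Replace every character outside the allowed set with a space (hypothetical helper)."""
    return re.sub(r'[^a-zA-Z0-9 $.-]', ' ', title)
```

Under these assumptions, `source_of('https://www.nytimes.com/section/technology')` gives `'nytimes'`, while a URL with no slash after the host, such as `'https://techcrunch.com'`, does not match the pattern at all.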

I had to swap the column order:
COLUMNS = ['source', 'title']
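
As a quick sanity check on the column order, a headerless CSV like this can be read back with the names supplied explicitly. This is a minimal sketch using only the standard library; the two sample rows are made up for illustration and stand in for titles_full.csv.

```python
import csv
import io

# Column order matches the SELECT in the query: source first, then title.
COLUMNS = ['source', 'title']

# Hypothetical sample rows standing in for titles_full.csv (written with header=False).
sample = io.StringIO(
    "github,a made up project title\n"
    "nytimes,a made up news headline\n"
)

# With no header row in the file, the field names must be passed in explicitly.
rows = list(csv.DictReader(sample, fieldnames=COLUMNS))
labels = [row['source'] for row in rows]
titles = [row['title'] for row in rows]
```

If the names were given in the opposite order, the labels and titles would silently swap, which is the kind of mismatch the change above avoids.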

  2. With EarlyStopping enabled, training finished after just 2 epochs:

callbacks=[EarlyStopping(), TensorBoard(model_dir)],

Without it, loss was minimised after 20 epochs.
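
A likely explanation is that `EarlyStopping()` with no arguments monitors `val_loss` with `patience=0`, so a single epoch without improvement ends training. The sketch below is a simplified pure-Python model of patience-based stopping, not Keras's actual implementation; the function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=0, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    Simplified model: stop once the number of consecutive epochs without
    improvement reaches `patience` (any non-improving epoch stops when patience is 0).
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember the new best and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses with a small bump at epoch 2.
losses = [0.90, 0.92, 0.80, 0.70, 0.71, 0.69, 0.60]

early = stopping_epoch(losses)                 # default patience of 0
tolerant = stopping_epoch(losses, patience=3)  # tolerate short plateaus
```

With these made-up losses, the default setting stops at epoch 2 while `patience=3` runs through all seven epochs. In Keras the corresponding change would be something like `EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)`; the right patience value depends on how noisy the validation loss is.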

  3. The evaluation job section is still a to-do:

"some stuff here about setting up Eval jobs"

mshearer0 changed the title from "Chapter 5 - Continued Evaluation: Dataset Access, EarlyStopping, Evaluation Job" to "Chapter 5 - Continued Evaluation: Dataset Access, EarlyStopping, Evaluation" on Jul 28, 2020
mshearer0 changed the title from "Chapter 5 - Continued Evaluation: Dataset Access, EarlyStopping, Evaluation" to "Chapter 5: Continued Evaluation: Dataset Access, EarlyStopping, Evaluation" on Jul 28, 2020
mshearer0 (Author) commented:

Closed, as a new version has been published since I downloaded this one.
