This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Use Timestream #702

Closed

wants to merge 6 commits into from

Conversation

coderbyheart
Member

@coderbyheart coderbyheart commented Nov 26, 2020

See #394

If you need to migrate existing historical data, see this comment.

@coderbyheart coderbyheart added the enhancement and BREAKING CHANGE labels Nov 26, 2020
@coderbyheart coderbyheart self-assigned this Nov 26, 2020
@coderbyheart coderbyheart added this to Doing in Development Nov 26, 2020
@coderbyheart coderbyheart marked this pull request as ready for review November 27, 2020 14:41
@coderbyheart
Member Author

coderbyheart commented Nov 27, 2020

Historical data can be imported from S3 (the data source for Athena) if needed. However, this means that the memory store retention of Timestream has to be at least as long as the age of the oldest entry to be imported, which increases costs: memory store pricing is 720 times higher than magnetic storage, so we can't simply configure a long memory store retention for every new user.

After importing the data, the memory store retention can be reduced again.

Note: The rate at which Timestream moves data from the memory store to the magnetic store depends on a number of factors, such as the volume of data; AWS does not provide exact numbers. Timestream writes data to the magnetic store in an asynchronous process and tries to do this as quickly as possible. Writing historical data and then immediately reducing the memory store retention is nevertheless a safe operation: the data remains available while it is being moved from the memory store to the magnetic store.
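
As a rough sketch (not part of this PR), the memory store retention could be adjusted around the import with the UpdateTable API of the aws-sdk TimestreamWrite client; the helper name and the concrete retention values below are only assumptions:

```ts
import { TimestreamWrite } from 'aws-sdk'

const timestream = new TimestreamWrite({
	region: process.env.AWS_DEFAULT_REGION,
})

// Hypothetical helper: raise MemoryStoreRetentionPeriodInHours so that it
// covers the oldest entry to import (e.g. 8760 hours = 1 year), run the
// import, then call it again with a short retention (e.g. 24 hours).
// UpdateTable requires both retention properties to be provided.
const setMemoryStoreRetention = async ({
	DatabaseName,
	TableName,
	hours,
}: {
	DatabaseName: string
	TableName: string
	hours: number
}): Promise<void> => {
	await timestream
		.updateTable({
			DatabaseName,
			TableName,
			RetentionProperties: {
				MemoryStoreRetentionPeriodInHours: hours,
				MagneticStoreRetentionPeriodInDays: 365, // example value
			},
		})
		.promise()
}
```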

For reference, here is an import script:

```ts
import { stackOutput } from '@bifravst/cloudformation-helpers'
import { CloudFormation, S3, TimestreamWrite } from 'aws-sdk'
import { promises as fs } from 'fs'
import * as path from 'path'
import { v4 } from 'uuid'
import { CORE_STACK_NAME } from './cdk/stacks/stackName'
import { messageToTimestreamRecords } from './historicalData/messageToTimestreamRecords'
import { shadowUpdateToTimestreamRecords } from './historicalData/shadowUpdateToTimestreamRecords'
import { storeRecordsInTimeseries } from './historicalData/storeRecordsInTimeseries'

const s3 = new S3({
	region: process.env.AWS_DEFAULT_REGION,
})

const timestream = new TimestreamWrite({
	region: process.env.AWS_DEFAULT_REGION,
})

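// Split an array into chunks of `size` elements, used below to batch the writes to Timestream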
const chunk = (arr: any[], size: number): any[][] =>
	arr.reduce(
		(chunks, el, i) =>
			(i % size ? chunks[chunks.length - 1].push(el) : chunks.push([el])) &&
			chunks,
		[],
	)

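// Download all stored updates from the historical data S3 bucket into a local
// 'messages' directory. Note that listObjects returns at most 1000 keys (no
// pagination here), and if the 'messages' directory already exists, mkdir
// throws and the empty catch silently skips the download.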
const fetchFiles = async () => {
	try {
		await fs.mkdir(path.resolve(process.cwd(), 'messages'))
		const files = (await s3
			.listObjects({
				Bucket: process.env.HISTORICAL_DATA_BUCKET as string,
				Prefix: 'updates',
			})
			.promise()
			.then(({ Contents }) => Contents?.map(({ Key }) => Key))) as string[]

		await Promise.all(
			files?.map((file) =>
				s3
					.getObject({
						Bucket: process.env.HISTORICAL_DATA_BUCKET as string,
						Key: file,
					})
					.promise()
					.then(async (res) => {
						await fs.writeFile(
							path.resolve(process.cwd(), 'messages', v4()),
							res.Body as string,
							'utf-8',
						)
						console.log(file, 'written')
					}),
			),
		)
	} catch {}
}

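// Read the Timestream database and table name from the CloudFormation stack
// outputs, download the raw messages, convert them to Timestream records and
// write them to the table in batches.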
const main = async () => {
	const { historicaldataTableInfo } = await stackOutput(
		new CloudFormation({ region: process.env.AWS_DEFAULT_REGION }),
	)(CORE_STACK_NAME)

	console.log(historicaldataTableInfo)

	const [DatabaseName, TableName] = historicaldataTableInfo.split('|')

	const store = storeRecordsInTimeseries({
		timestream,
		DatabaseName,
		TableName,
	})

	await fetchFiles()

	const files = await fs.readdir(path.resolve(process.cwd(), 'messages'))
	const records = (
		await Promise.all(
			files.map((file) =>
				fs
					.readFile(path.resolve(process.cwd(), 'messages', file), 'utf-8')
					.then((s) =>
						s
							.split('\n')
							.map((l) => {
								const event = JSON.parse(l)
								if ('reported' in event) {
									return shadowUpdateToTimestreamRecords(event)
								}
								if ('message' in event) {
									return messageToTimestreamRecords(event)
								}
								console.error({
									error: 'Event ignored',
									event,
								})
								return []
							})
							.flat()
							.filter((s) => s !== undefined),
					),
			),
		)
	).flat() as TimestreamWrite.Records

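	// Write the records sequentially in chunks of 100 (the maximum number of
	// records accepted by a single Timestream WriteRecords call), logging progress.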
	await chunk(records, 100).reduce(
		(p: Promise<void>, c: Record<string, any>[], k) =>
			p.then(async () => {
				console.log(`${(k + 1) * 100} / ${records.length}`)
				return store(c)
			}),
		Promise.resolve(),
	)
}

main()
```

coderbyheart added a commit that referenced this pull request Nov 30, 2020
BREAKING CHANGE: This changes the historical data storage to Timestream

See [this comment](#702 (comment))
in case you want to migrate existing device data.
Development automation moved this from Doing to Done Nov 30, 2020
coderbyheart added a commit to bifravst/bifravst that referenced this pull request Nov 30, 2020
BREAKING CHANGE: This switches the historical data storage to Timestream

If you need to migrate existing historical data, [see this comment](bifravst/aws#702 (comment)).
@coderbyheart coderbyheart deleted the use-timestream branch December 14, 2020 14:05