This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Use Timestream #702

Closed

wants to merge 6 commits into from

Conversation

coderbyheart
Member

@coderbyheart coderbyheart commented Nov 26, 2020

See #394

If you need to migrate existing historical data, see this comment.

@coderbyheart coderbyheart added the enhancement and BREAKING CHANGE labels Nov 26, 2020
@coderbyheart coderbyheart self-assigned this Nov 26, 2020
@coderbyheart coderbyheart added this to Doing in Development Nov 26, 2020
@coderbyheart coderbyheart marked this pull request as ready for review November 27, 2020 14:41
@coderbyheart
Member Author

coderbyheart commented Nov 27, 2020

Historical data can be imported from S3 (the data source for Athena) if needed. However, this means that the memory store retention of Timestream has to be at least as long as the age of the oldest entry to be imported, which increases costs: memory store pricing is 720 times higher than magnetic storage, so we can't simply configure a long memory store retention for every new user.

After importing the data, the memory store retention can be reduced again.

Note: The rate at which Timestream moves data from the memory store to the magnetic store depends on a number of factors, such as the volume of data; AWS does not provide exact numbers. Timestream writes data to the magnetic store in an asynchronous process and tries to do this as quickly as possible. Writing historical data and then immediately reducing the memory store retention is nevertheless a safe operation: the data remains available while it is being moved from the memory store to the magnetic store.
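
As a rough sketch (not part of this PR), the memory store retention could be adjusted around the import with the UpdateTable API of the aws-sdk TimestreamWrite client; the helper name and the concrete retention values below are only assumptions:

```ts
import { TimestreamWrite } from 'aws-sdk'

const timestream = new TimestreamWrite({
	region: process.env.AWS_DEFAULT_REGION,
})

// Hypothetical helper: raise MemoryStoreRetentionPeriodInHours so that it
// covers the oldest entry to import (e.g. 8760 hours = 1 year), run the
// import, then call it again with a short retention (e.g. 24 hours).
// UpdateTable requires both retention properties to be provided.
const setMemoryStoreRetention = async ({
	DatabaseName,
	TableName,
	hours,
}: {
	DatabaseName: string
	TableName: string
	hours: number
}): Promise<void> => {
	await timestream
		.updateTable({
			DatabaseName,
			TableName,
			RetentionProperties: {
				MemoryStoreRetentionPeriodInHours: hours,
				MagneticStoreRetentionPeriodInDays: 365, // example value
			},
		})
		.promise()
}
```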

For reference, here is an import script:

```ts
import { stackOutput } from '@bifravst/cloudformation-helpers'
import { CloudFormation, S3, TimestreamWrite } from 'aws-sdk'
import { promises as fs } from 'fs'
import * as path from 'path'
import { v4 } from 'uuid'
import { CORE_STACK_NAME } from './cdk/stacks/stackName'
import { messageToTimestreamRecords } from './historicalData/messageToTimestreamRecords'
import { shadowUpdateToTimestreamRecords } from './historicalData/shadowUpdateToTimestreamRecords'
import { storeRecordsInTimeseries } from './historicalData/storeRecordsInTimeseries'

const s3 = new S3({
	region: process.env.AWS_DEFAULT_REGION,
})

const timestream = new TimestreamWrite({
	region: process.env.AWS_DEFAULT_REGION,
})

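// Split an array into chunks of `size` elements, used below to batch the writes to Timestream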
const chunk = (arr: any[], size: number): any[][] =>
	arr.reduce(
		(chunks, el, i) =>
			(i % size ? chunks[chunks.length - 1].push(el) : chunks.push([el])) &&
			chunks,
		[],
	)

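// Download all stored updates from the historical data S3 bucket into a local
// 'messages' directory. Note that listObjects returns at most 1000 keys (no
// pagination here), and if the 'messages' directory already exists, mkdir
// throws and the empty catch silently skips the download.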
const fetchFiles = async () => {
	try {
		await fs.mkdir(path.resolve(process.cwd(), 'messages'))
		const files = (await s3
			.listObjects({
				Bucket: process.env.HISTORICAL_DATA_BUCKET as string,
				Prefix: 'updates',
			})
			.promise()
			.then(({ Contents }) => Contents?.map(({ Key }) => Key))) as string[]

		await Promise.all(
			files?.map((file) =>
				s3
					.getObject({
						Bucket: process.env.HISTORICAL_DATA_BUCKET as string,
						Key: file,
					})
					.promise()
					.then(async (res) => {
						await fs.writeFile(
							path.resolve(process.cwd(), 'messages', v4()),
							res.Body as string,
							'utf-8',
						)
						console.log(file, 'written')
					}),
			),
		)
	} catch {}
}

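// Read the Timestream database and table name from the CloudFormation stack
// outputs, download the raw messages, convert them to Timestream records and
// write them to the table in batches.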
const main = async () => {
	const { historicaldataTableInfo } = await stackOutput(
		new CloudFormation({ region: process.env.AWS_DEFAULT_REGION }),
	)(CORE_STACK_NAME)

	console.log(historicaldataTableInfo)

	const [DatabaseName, TableName] = historicaldataTableInfo.split('|')

	const store = storeRecordsInTimeseries({
		timestream,
		DatabaseName,
		TableName,
	})

	await fetchFiles()

	const files = await fs.readdir(path.resolve(process.cwd(), 'messages'))
	const records = (
		await Promise.all(
			files.map((file) =>
				fs
					.readFile(path.resolve(process.cwd(), 'messages', file), 'utf-8')
					.then((s) =>
						s
							.split('\n')
							.map((l) => {
								const event = JSON.parse(l)
								if ('reported' in event) {
									return shadowUpdateToTimestreamRecords(event)
								}
								if ('message' in event) {
									return messageToTimestreamRecords(event)
								}
								console.error({
									error: 'Event ignored',
									event,
								})
								return []
							})
							.flat()
							.filter((s) => s !== undefined),
					),
			),
		)
	).flat() as TimestreamWrite.Records

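	// Write the records sequentially in chunks of 100 (the maximum number of
	// records accepted by a single Timestream WriteRecords call), logging progress.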
	await chunk(records, 100).reduce(
		(p: Promise<void>, c: Record<string, any>[], k) =>
			p.then(async () => {
				console.log(`${(k + 1) * 100} / ${records.length}`)
				return store(c)
			}),
		Promise.resolve(),
	)
}

main()
```

coderbyheart added a commit that referenced this pull request Nov 30, 2020
BREAKING CHANGE: This changes the historical data storage to Timestream

See [this comment](#702 (comment))
in case you want to migrate existing device data.
Development automation moved this from Doing to Done Nov 30, 2020
coderbyheart added a commit to bifravst/bifravst that referenced this pull request Nov 30, 2020
BREAKING CHANGE: This switches the historical data storage to Timestream

If you need to migrate existing historical data, [see this comment](bifravst/aws#702 (comment)).
@coderbyheart coderbyheart deleted the use-timestream branch December 14, 2020 14:05