Is the pgml.digits dataset storing images in psql? #488
I see one of the examples you present in the docs is a digits dataset.

SELECT pgml.train(
    'My First PostgresML Project',
    task => 'regression',
    relation_name => 'pgml.digits',
    y_column_name => 'target',
    algorithm => 'xgboost'
);

Now I am guessing this is the classical MNIST dataset. Btw, super interesting project you have going here! Keep it going 🚀
Replies: 1 comment
In terms of image format, ML algos require images to be either 2D arrays for black and white or 3D arrays for color. The MNIST data is stored as an 8x8 pixel grayscale image with 16 shades of gray, i.e. a Postgres SMALLINT[][]. https://github.com/postgresml/postgresml/blob/master/pgml-extension/src/orm/dataset.rs#L371

You definitely can store images and other binary data in a database, but I think the question is: should you? A CDN fronting something like an S3 bucket is a better way to store and serve image content for a web application than serving it directly out of a database. Here are a few reasons you should consider vertically sharding your binary data (image, audio, large text...) into a storage and distribution mechanism other than your primary database.

These same reasons may not hold up for an ML application with a dedicated ML database.
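To make the SMALLINT[][] layout mentioned above a bit more concrete, here is a minimal sketch in plain Postgres. The table name `example_images` and its columns are hypothetical, not part of PostgresML, and the pixel values are made up for illustration rather than copied from pgml.digits.

```sql
-- Hypothetical table for illustration only; not the pgml.digits schema.
CREATE TABLE example_images (
    id     SERIAL PRIMARY KEY,
    image  SMALLINT[][],  -- 2D array; Postgres does not enforce the declared dimensions
    target SMALLINT       -- the label, e.g. which digit the image shows
);

-- One 8x8 grayscale image; the intensities are made-up values.
INSERT INTO example_images (image, target) VALUES (
    ARRAY[
        [0, 0,  5, 13,  9,  1, 0, 0],
        [0, 0, 13, 15, 10, 15, 5, 0],
        [0, 3, 15,  2,  0, 11, 8, 0],
        [0, 4, 12,  0,  0,  8, 8, 0],
        [0, 5,  8,  0,  0,  9, 8, 0],
        [0, 4, 11,  0,  1, 12, 7, 0],
        [0, 2, 14,  5, 10, 12, 0, 0],
        [0, 0,  6, 13, 10,  0, 0, 0]
    ]::SMALLINT[][],
    0
);

-- Inspect the shape and read a single pixel (Postgres arrays are 1-indexed).
SELECT
    array_ndims(image) AS ndims,   -- number of dimensions
    array_dims(image)  AS dims,    -- bounds of each dimension
    image[1][3]        AS pixel    -- row 1, column 3
FROM example_images;
```

For the row above, the query should report 2 dimensions with bounds `[1:8][1:8]`. A color image would need a third dimension for the channels, which is the 3D case described in the reply.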