Merged
77 commits
cdd0416
Sort by Taken Date UI Changes
AndyKilmory Jun 17, 2025
ec8b5d0
JS syntax corrections
AndyKilmory Jun 17, 2025
335e25e
add in new control files
AndyKilmory Jun 17, 2025
be5d9a5
js syntax errors
AndyKilmory Jun 17, 2025
5fa73da
correcting sort by taken date ui defects
AndyKilmory Jun 23, 2025
b9828e1
revised build of sort by taken date ui
AndyKilmory Aug 13, 2025
908cbbc
js syntax corrections
AndyKilmory Aug 13, 2025
5da1bd4
further js corrections
AndyKilmory Aug 13, 2025
09c17ca
further js corrections - removing unused vars
AndyKilmory Aug 13, 2025
929de2d
single sort control operation (no tab control)
AndyKilmory Aug 21, 2025
3db4001
resolve css and collections issues in single control config
AndyKilmory Aug 21, 2025
d234588
fixing dual control issues
AndyKilmory Aug 23, 2025
324caac
fixing BBC config bugs
AndyKilmory Aug 27, 2025
824aa54
correcting GNM config css
AndyKilmory Aug 28, 2025
de3a5e7
correcting bug in order by after filter change
AndyKilmory Sep 1, 2025
6a7e827
enabling correct order by reset on collection change
AndyKilmory Sep 2, 2025
5eed78c
correcting issue when page refresh mis-sets order by
AndyKilmory Sep 2, 2025
bf0c6f4
correcting css error
AndyKilmory Sep 3, 2025
1300a3e
correcting js syntax errors
AndyKilmory Sep 3, 2025
8d4c7e7
correcting css errors and adding use of has:dateTaken chip to basic s…
AndyKilmory Sep 11, 2025
6aebd1c
removing use of taken date chip in basic operation
AndyKilmory Sep 11, 2025
a9a6c6a
removing unused vars
AndyKilmory Sep 11, 2025
bbe606f
add bedrock class
ellenmuller Aug 27, 2025
4c90556
copy over useful bits from old branch
ellenmuller Oct 1, 2025
1987034
start to s3 vectors class
ellenmuller Oct 1, 2025
f2fa09e
create putVector function
ellenmuller Oct 1, 2025
e9459bf
wip commit
ellenmuller Oct 7, 2025
9e51229
successful write to s3 vector store
ellenmuller Oct 7, 2025
721a6be
rename and tidy up
ellenmuller Oct 7, 2025
bff866c
tidy up more logging
ellenmuller Oct 7, 2025
333854c
tidy up error handling
ellenmuller Oct 8, 2025
d92802e
pull in main
ellenmuller Oct 8, 2025
a617989
update tests
ellenmuller Oct 8, 2025
7da7c0d
sort out mock (hopefully?)
ellenmuller Oct 9, 2025
828f8ca
Fix an issue where typing e.g. -sport{Enter} would not-dechip the inp…
jonathonherbert Oct 9, 2025
d1a3401
Correct name -> fieldName
jonathonherbert Oct 9, 2025
be7f6ba
Merge pull request #4540 from guardian/jsh/fix-filter-fields
jonathonherbert Oct 9, 2025
7f91ce9
Persist state of Apply circular mask square crop checkbox in localSto…
paperboyo Oct 10, 2025
d0ad13e
Catch error in console
paperboyo Oct 10, 2025
342235c
Clearer code
paperboyo Oct 10, 2025
a0c86a8
Don’t trust anyone
paperboyo Oct 10, 2025
35e252b
Bump CQL to 1.8.2
jonathonherbert Oct 13, 2025
74d1d1d
Merge pull request #4543 from guardian/jsh/bump-cql
jonathonherbert Oct 13, 2025
9c3f6dd
Merge pull request #4541 from guardian/mk-store-cropMask
paperboyo Oct 13, 2025
4484f21
add image embed config setting
ellenmuller Oct 14, 2025
1878a4f
refactor into embedding class and rename fetch to create
ellenmuller Oct 14, 2025
da90978
Modifications in response to PR comments
AndyKilmory Oct 14, 2025
fa69542
update tests
ellenmuller Oct 15, 2025
b05985e
refactor to embedder
ellenmuller Oct 15, 2025
7bafc0e
update script
ellenmuller Oct 16, 2025
e232b07
update pacakge lock
ellenmuller Oct 16, 2025
4d80752
merge in main
ellenmuller Oct 16, 2025
a3ba79c
Propagate pgUp and pgDown when caret in searchbox
paperboyo Oct 17, 2025
a32ef3b
Merge pull request #4544 from guardian/mk-propagate-pgUpDown-CQL
andrew-nowak Oct 17, 2025
e6e16bf
action comments
ellenmuller Oct 20, 2025
9c91481
Merge branch 'main' of github.com:guardian/grid into em-embed-to-s3-v…
ellenmuller Oct 20, 2025
3e9842e
only add sdk to common lib
ellenmuller Oct 20, 2025
3355eba
update elasticsearch container version
ellenmuller Oct 20, 2025
23457ff
change ci elasticsearch image version
ellenmuller Oct 20, 2025
2ad5d77
add check if > 5MB or not JPEG
ellenmuller Oct 21, 2025
706d882
Spotted some more useless byline values
paperboyo Oct 21, 2025
eb464e6
add shouldembed to image loader config script
ellenmuller Oct 22, 2025
f210ffd
check if image is compatible with cohere
ellenmuller Oct 22, 2025
d7e37a6
Merge pull request #4547 from guardian/mk-more-str-cruft
andrew-nowak Oct 24, 2025
449cb5d
Add link to docs in comment
joelochlann Oct 24, 2025
2886dc1
Indent comment to match surrounding indentation
joelochlann Oct 24, 2025
c6bdf29
Rename input => request to match type and function name
joelochlann Oct 24, 2025
97c2768
Back out formatting change to match existing style
joelochlann Oct 24, 2025
6597603
Indent comment to match surrounding indentation
joelochlann Oct 24, 2025
3aad809
embedder => maybeEmbedder to reflect optional type
joelochlann Oct 24, 2025
f76135e
maybeEmbed => maybeEmbedder
Oct 24, 2025
ed7c5d6
Indent comment to match surrounding indentation
Oct 24, 2025
c7d09d5
Mention image embedding in all log message so we can find them easily
Oct 24, 2025
1322691
Merge pull request #4475 from bbc/t2347-taken-date-ui
andrew-nowak Oct 27, 2025
47656fa
Merge pull request #4539 from guardian/em-embed-to-s3-vector-store
ellenmuller Oct 30, 2025
a7a5d7f
Add to PhotographerRenamer dictionary #36
paperboyo Oct 31, 2025
97e0de9
Merge pull request #4551 from guardian/mk-dict-add-36
andrew-nowak Oct 31, 2025
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -15,7 +15,7 @@ jobs:
       pull-requests: write
     services:
       elasticsearch:
-        image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
+        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.29
         # Wait for elasticsearch to report healthy before continuing.
         # see https://github.com/actions/example-services/blob/master/.github/workflows/postgres-service.yml#L28
         options: -e "discovery.type=single-node" --expose 9200 --health-cmd "curl localhost:9200/_cluster/health" --health-interval 10s --health-timeout 5s --health-retries 10
5 changes: 4 additions & 1 deletion build.sbt
@@ -63,7 +63,7 @@ Global / concurrentRestrictions := Seq(
 )

 val awsSdkVersion = "1.12.470"
-val awsSdkV2Version = "2.31.12"
+val awsSdkV2Version = "2.32.33"
 val elastic4sVersion = "8.18.2"
 val okHttpVersion = "3.12.1"

@@ -107,6 +107,9 @@ lazy val commonLib = project("common-lib").settings(
     "org.scanamo" %% "scanamo" % "2.0.0",
     // declare explicit dependency on desired version of aws sdk v2 dynamo
     "software.amazon.awssdk" % "dynamodb" % awsSdkV2Version,
+    // declare explicit dependency on desired version of aws sdk v2 bedrock runtime
+    "software.amazon.awssdk" % "bedrockruntime" % awsSdkV2Version,
+    "software.amazon.awssdk" % "s3vectors" % awsSdkV2Version,
     ws,
     "org.testcontainers" % "elasticsearch" % "1.19.2" % Test
   ),
@@ -0,0 +1,95 @@
+package com.gu.mediaservice.lib.aws
+
+import software.amazon.awssdk.services.bedrockruntime.model._
+import software.amazon.awssdk.services.bedrockruntime._
+import com.gu.mediaservice.lib.config.CommonConfig
+import play.api.libs.json.Json
+import software.amazon.awssdk.core.SdkBytes
+
+import java.net.URI
+import com.gu.mediaservice.lib.logging.LogMarker
+import play.api.libs.json.OFormat.oFormatFromReadsAndOWrites
+import play.api.libs.json._
+
+object Bedrock {
+  private case class BedrockRequest(
+    input_type: String,
+    embedding_types: List[String],
+    images: List[String]
+  )
+  private implicit val bedrockRequestFormat: OFormat[BedrockRequest] = Json.format[BedrockRequest]
+}
+
+import scala.concurrent.{ExecutionContext, Future}
+
+class Bedrock(config: CommonConfig)
+  extends AwsClientV2BuilderUtils {
+
+  // TODO: figure out what the more usual pattern for turning off localstack behaviour is
+  override def awsLocalEndpointUri: Option[URI] = None
+
+  override def isDev: Boolean = config.isDev
+
+  val client: BedrockRuntimeClient = {
+    withAWSCredentialsV2(BedrockRuntimeClient.builder())
+      .build()
+  }
+
+  private def createRequestBody(base64EncodedImage: String, fileType: CohereCompatibleMimeType): InvokeModelRequest = {
+    val images = fileType match {
+      case CohereJpeg => List(s"data:image/jpg;base64,$base64EncodedImage")
+      case CoherePng => List(s"data:image/png;base64,$base64EncodedImage")
+    }
+
+    val body = Bedrock.BedrockRequest(
+      input_type = "image",
+      embedding_types = List("float"),
+      images = images
+    )
+    val jsonBody = Json.toJson(body).toString()
+
+    val request: InvokeModelRequest = {
+      InvokeModelRequest
+        .builder()
+        .accept("*/*")
+        .body(SdkBytes.fromUtf8String(jsonBody))
+        .contentType("application/json")
+        .modelId("cohere.embed-english-v3")
+        .build()
+    }
+    request
+  }
+
+  private def sendBedrockEmbeddingRequest(base64EncodedImage: String, fileType: CohereCompatibleMimeType)(
+    implicit logMarker: LogMarker
+  ): InvokeModelResponse = {
+    try {
+      val response = client.invokeModel(createRequestBody(base64EncodedImage, fileType))
+      logger.info(
+        logMarker,
+        s"Bedrock API call to create image embedding completed with status: ${response.sdkHttpResponse().statusCode()}"
+      )
+      response
+    }
+    catch {
+      case e: Exception =>
+        logger.error(logMarker, "Exception during Bedrock API call to create image embedding", e)
+        throw e
+    }
+  }
+
+  def createImageEmbedding(base64EncodedImage: String, fileType: CohereCompatibleMimeType)(implicit ec: ExecutionContext, logMarker: LogMarker): Future[List[Float]] = {
+    val bedrockFuture = Future { sendBedrockEmbeddingRequest(base64EncodedImage, fileType) }
+    bedrockFuture.map { response =>
+      val responseBody = response.body().asUtf8String()
+      val json = Json.parse(responseBody)
+      // Extract the embedding array (first element since it's an array of arrays)
+      val embedding = (json \ "embeddings" \ "float")(0).as[List[Float]]
+      logger.info(
+        logMarker,
+        s"Successfully extracted image embedding. Vector size: ${embedding.size}"
+      )
+      embedding
+    }
+  }
+}
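The request built in `createRequestBody` above passes the image to the model as a base64 data URI. The following is a minimal sketch of how such a URI can be assembled from raw bytes using only the standard library. `DataUriSketch` and `toDataUri` are hypothetical names chosen for illustration and are not part of this PR; the media type is taken as a parameter rather than assumed.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper, not part of the PR: wraps raw bytes in a base64 data URI.
object DataUriSketch {
  def toDataUri(mediaType: String, bytes: Array[Byte]): String = {
    // Standard (non-URL-safe) base64, the same encoder the Embedder class uses
    val encoded = Base64.getEncoder.encodeToString(bytes)
    s"data:$mediaType;base64,$encoded"
  }

  // Example: the two bytes of "hi" encode to "aGk="
  val example: String = toDataUri("image/png", "hi".getBytes(StandardCharsets.UTF_8))
}
```

Here `DataUriSketch.example` evaluates to `data:image/png;base64,aGk=`. A real image payload is built the same way, only with a much longer encoded part.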
@@ -0,0 +1,44 @@
+package com.gu.mediaservice.lib.aws
+import com.gu.mediaservice.lib.logging.{GridLogging, LogMarker}
+import com.gu.mediaservice.model.{Jpeg, MimeType, Png, Tiff}
+import software.amazon.awssdk.services.s3vectors.model.PutVectorsResponse
+
+import java.nio.file.{Files, Path}
+import java.util.Base64
+import scala.concurrent.{ExecutionContext, Future}
+
+sealed trait CohereCompatibleMimeType
+case object CohereJpeg extends CohereCompatibleMimeType
+case object CoherePng extends CohereCompatibleMimeType
+
+class Embedder(s3vectors: S3Vectors, bedrock: Bedrock) extends GridLogging {
+  // https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-embed-v3.html#:~:text=The%20image%20must%20be%20in%20either%20image/jpeg%20or%20image/png%20format%20and%20has%20a%20maximum%20size%20of%205MB
+  def meetsCohereRequirements(fileType: MimeType, imageFilePath: Path)(implicit logMarker: LogMarker): Either[String, CohereCompatibleMimeType]= {
+    val fileSize = Files.size(imageFilePath)
+    val fiveMB = 5_000_000
+
+    fileType match {
+      case _ if fileSize > fiveMB => Left(s"Image file is >5MB. File size: $fileSize")
+      case Jpeg => Right(CohereJpeg)
+      case Png => Right(CoherePng)
+      case Tiff => Left("Image file type is not supported. File type: Tiff")
+    }
+  }
+
+  def createEmbeddingAndStore(fileType: MimeType, imageFilePath: Path, imageId: String)(implicit ec: ExecutionContext, logMarker: LogMarker
+  ): Future[Option[PutVectorsResponse]] = {
+    meetsCohereRequirements(fileType, imageFilePath)(logMarker) match {
+      case Left(error) => {
+        logger.info(logMarker, s"Skipping image embedding for $imageId as it does not meet the requirements: $error")
+        Future.successful(None)
+      }
+      case Right(imageType) => {
+        val base64EncodedString: String = Base64.getEncoder().encodeToString(Files.readAllBytes(imageFilePath))
+        val embeddingFuture = bedrock.createImageEmbedding(base64EncodedString, imageType)
+        embeddingFuture.map { embedding =>
+          Some(s3vectors.storeEmbeddingInS3VectorStore(embedding, imageId))
+        }
+      }
+    }
+  }
+}
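The eligibility check in `meetsCohereRequirements` depends on the order of its `match` cases: the size guard comes first, so an oversized JPEG or PNG is rejected before its type is looked at. The sketch below isolates that logic as a pure function so the ordering can be exercised without touching the filesystem. All names (`SketchMimeType`, `EligibilitySketch`, and so on) are hypothetical stand-ins, not the Grid's real `MimeType` hierarchy, and the file size is passed in rather than read with `Files.size`.

```scala
// Hypothetical stand-ins for the Grid's MimeType values and the PR's Cohere types.
sealed trait SketchMimeType
case object SketchJpeg extends SketchMimeType
case object SketchPng extends SketchMimeType
case object SketchTiff extends SketchMimeType

sealed trait SketchCompatible
case object CompatibleJpeg extends SketchCompatible
case object CompatiblePng extends SketchCompatible

object EligibilitySketch {
  // Same threshold as the diff: 5,000,000 bytes
  val MaxBytes: Long = 5000000L

  // The guarded wildcard case runs first, so size is checked before type.
  def check(mimeType: SketchMimeType, sizeInBytes: Long): Either[String, SketchCompatible] =
    mimeType match {
      case _ if sizeInBytes > MaxBytes => Left(s"Image file is >5MB. File size: $sizeInBytes")
      case SketchJpeg => Right(CompatibleJpeg)
      case SketchPng => Right(CompatiblePng)
      case SketchTiff => Left("Image file type is not supported. File type: Tiff")
    }
}
```

With this sketch, `EligibilitySketch.check(SketchPng, 5000000L)` is a `Right` because the comparison is strictly greater-than, while `EligibilitySketch.check(SketchPng, 5000001L)` is a `Left`.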
@@ -0,0 +1,69 @@
+package com.gu.mediaservice.lib.aws
+import com.gu.mediaservice.lib.config.CommonConfig
+import com.gu.mediaservice.lib.logging.LogMarker
+
+import software.amazon.awssdk.regions.Region
+import software.amazon.awssdk.services.s3vectors._
+import software.amazon.awssdk.services.s3vectors.model.{PutInputVector, PutVectorsRequest, PutVectorsResponse, VectorData}
+
+import java.net.URI
+import scala.concurrent.{ExecutionContext, Future}
+import scala.jdk.CollectionConverters._
+
+class S3Vectors(config: CommonConfig)
+  extends AwsClientV2BuilderUtils {
+
+  // TODO: figure out what the more usual pattern for turning off localstack behaviour is
+  override def awsLocalEndpointUri: Option[URI] = None
+
+  override def isDev: Boolean = config.isDev
+
+  // The S3 Vector Store is not yet available in eu-west-1, so we are using eu-central-1 because it's closest to us.
+  override def awsRegionV2: Region = Region.EU_CENTRAL_1
+
+  val client: S3VectorsClient = {
+    withAWSCredentialsV2(S3VectorsClient.builder())
+      .build()
+  }
+
+  private def createRequestBody(embedding: List[Float], imageId: String): PutVectorsRequest = {
+    val vectorData: VectorData = VectorData
+      .builder()
+      .float32(embedding.map(float2Float).asJava)
+      .build()
+
+    val inputVector: PutInputVector = PutInputVector
+      .builder()
+      .data(vectorData)
+      .key(imageId)
+      .build()
+
+    val request: PutVectorsRequest = PutVectorsRequest
+      .builder()
+      .indexName("cohere-embed-english-v3")
+      .vectorBucketName(s"image-embeddings-${config.stage.toLowerCase}")
+      .vectors(inputVector)
+      .build()
+
+    request
+  }
+
+  def storeEmbeddingInS3VectorStore(bedrockEmbedding: List[Float], imageId: String)(implicit logMarker: LogMarker
+  ): PutVectorsResponse = {
+    try {
+      val request = createRequestBody(bedrockEmbedding, imageId)
+      val response = client.putVectors(request)
+      logger.info(
+        logMarker,
+        s"S3 Vector Store API call to store image embedding completed with status: ${response.sdkHttpResponse().statusCode()}"
+      )
+      response
+    }
+    catch {
+      case e: Exception =>
+        logger.error(logMarker, s"Exception during S3 Vector Store API call to store image embedding for $imageId: ", e)
+        throw e
+    }
+  }
+
+}
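Two small transformations in `createRequestBody` above are easy to miss: the Scala `List[Float]` is boxed into a `java.util.List[java.lang.Float]` before being handed to the SDK builder, and the bucket name is derived from the lower-cased stage. A minimal sketch of both, using only the standard library (Scala 2.13 or later for `scala.jdk.CollectionConverters`), follows. `VectorPayloadSketch` and its method names are hypothetical, and the stage value `"PROD"` is an assumed example rather than something shown in the diff.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical helpers, not part of the PR.
object VectorPayloadSketch {
  // Boxes each Scala Float via Predef.float2Float, then exposes the list as a java.util.List
  def toJavaFloats(embedding: List[Float]): java.util.List[java.lang.Float] =
    embedding.map(float2Float).asJava

  // Mirrors the naming scheme in the diff: "image-embeddings-" plus the lower-cased stage
  def bucketNameFor(stage: String): String =
    s"image-embeddings-${stage.toLowerCase}"
}
```

Under the assumption that the stage is `"PROD"`, `VectorPayloadSketch.bucketNameFor("PROD")` yields `image-embeddings-prod`, which matches the bucket name pattern that `dev/script/get-s3-vector-store-records.sh` queries.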
@@ -234,6 +234,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Huseyin Demirci" -> "Hüseyin Demirci",
     "Huseyin Yildiz" -> "Hüseyin Yıldız",
     "Ian Macnicol" -> "Ian MacNicol",
+    "Igor Pavicevic" -> "Igor Pavićević",
     "Inti Ocon" -> "Inti Ocón",
     "Ints Kalnins" -> "Ints Kalniņš",
     "Irek Dorozanski" -> "Irek Dorożański",
@@ -274,6 +275,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Jerome Prebois" -> "Jérôme Prébois",
     "Jerome Prevost" -> "Jérôme Prévost",
     "Jerome Sessini" -> "Jérôme Sessini",
+    "Jerzy Muszynski" -> "Jerzy Muszyński",
     "Jesus Bustamante" -> "Jesús Bustamante",
     "Jesus Diges" -> "Jesús Diges",
     "Jesus Merida" -> "Jesús Mérida",
@@ -338,6 +340,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Klebher Vasquez" -> "Klebher Vásquez",
     "Koca Sulejmanovic" -> "Koca Sulejmanović",
     "Krisztian Elek" -> "Krisztián Elek",
+    "Krzysztof Cwik" -> "Krzysztof Ćwik",
     "Krzysztof Swiderski" -> "Krzysztof Świderski",
     "Kuba Stezycki" -> "Kuba Stężycki",
     "Laszlo Balogh" -> "László Balogh",
@@ -378,6 +381,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Manu Fernandez" -> "Manu Fernández",
     "Manuel Vazquez" -> "Manuel Vázquez",
     "Manuel Velazquez" -> "Manuel Velázquez",
+    "Marek Antoni Iwanczuk" -> "Marek Antoni Iwańczuk",
     "Marc Mccormack" -> "Marc McCormack",
     "Marcelo Del Pozo" -> "Marcelo del Pozo",
     "Marcelo Hernandez" -> "Marcelo Hernández",
@@ -419,6 +423,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Milos Bicanski" -> "Miloš Bičanski",
     "Milos Vujovic" -> "Miloš Vujović",
     "Miro Kuzmanovic" -> "Miro Kuzmanović",
+    "Mitar Mitrovic" -> "Mitar Mitrović",
     "Moises Castillo" -> "Moisés Castillo",
     "Morne de Klerk" -> "Morné de Klerk",
     "Murat Ozgur Guvendik" -> "Murat Özgür Güvendik",
@@ -482,6 +487,7 @@ object PhotographerRenamer extends MetadataCleaner {
     "Radoslaw Jozwiak" -> "Radosław Jóźwiak",
     "Rafal Gaglewski" -> "Rafał Gąglewski",
     "Rafal Guz" -> "Rafał Guz",
+    "Raimonda Kulikauskiene" -> "Raimonda Kulikauskienė",
     "Ramon Buxo Martinez" -> "Ramon Buxó Martínez",
     "Ramon Costa" -> "Ramón Costa",
     "Ramon de la Rocha" -> "Ramón de la Rocha",
@@ -21,6 +21,8 @@ object RedundantTokenRemover extends MetadataCleaner {
     "Stringer",
     "Stringer .",
     "STR",
+    "STR New",
+    "-STR",
     "supplied",
     "Supplied",
     "SUPPLIED",
1 change: 1 addition & 0 deletions dev/script/generate-config/service-config.js
@@ -74,6 +74,7 @@ function getImageLoaderConfig(config) {
     |metrics.request.enabled=false
     |transcoded.mime.types="image/tiff"
     |upload.quarantine.enabled=false
+    |s3.vectors.shouldEmbed=false
     |`;
 }

21 changes: 21 additions & 0 deletions dev/script/get-s3-vector-store-records.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+# Check if environment argument is provided
+if [ $# -ne 1 ]; then
+  echo "Usage: $0 <environment>"
+  echo "Environment options: dev, test, prod"
+  exit 1
+fi
+
+# Validate environment argument
+env=$1
+if [[ ! "$env" =~ ^(dev|test|prod)$ ]]; then
+  echo "Invalid environment. Please use dev, test, or prod"
+  exit 1
+fi
+
+aws s3vectors list-vectors \
+  --vector-bucket-name "image-embeddings-$env" \
+  --index-name cohere-embed-english-v3 \
+  --profile media-service \
+  --region eu-central-1
9 changes: 6 additions & 3 deletions image-loader/app/ImageLoaderComponents.scala
@@ -1,5 +1,5 @@
 import com.gu.mediaservice.GridClient
-import com.gu.mediaservice.lib.aws.SimpleSqsMessageConsumer
+import com.gu.mediaservice.lib.aws.{Bedrock, S3Vectors, SimpleSqsMessageConsumer, Embedder}
 import com.gu.mediaservice.lib.config.Services
 import com.gu.mediaservice.lib.imaging.ImageOperations
 import com.gu.mediaservice.lib.logging.GridLogging
@@ -27,8 +27,11 @@ class ImageLoaderComponents(context: Context) extends GridComponents(context, ne
   val imageOperations = new ImageOperations(context.environment.rootPath.getAbsolutePath)
   val notifications = new Notifications(config)
   val downloader = new Downloader()(ec,wsClient)
-  val uploader = new Uploader(store, config, imageOperations, notifications, imageProcessor)
-  val projector = Projector(config, imageOperations, imageProcessor, auth)
+
+  val maybeEmbedder: Option[Embedder] = if (config.shouldEmbed) Some(new Embedder(new S3Vectors(config), new Bedrock(config))) else None
+
+  val uploader = new Uploader(store, config, imageOperations, notifications, maybeEmbedder, imageProcessor)
+  val projector = Projector(config, imageOperations, imageProcessor, auth, maybeEmbedder)
   val quarantineUploader: Option[QuarantineUploader] = (config.uploadToQuarantineEnabled, config.quarantineBucket) match {
     case (true, Some(bucketName)) =>{
       val quarantineStore = new QuarantineStore(config)
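The `maybeEmbedder` wiring above builds the embedder only when `config.shouldEmbed` is true and passes the resulting `Option` to `Uploader` and `Projector`. How those classes consume it is not shown in this diff, so the sketch below is an assumption about one common way to use such a value: callers map over the `Option`, and a disabled feature becomes a no-op. `SketchEmbedder` and `OptionalComponentSketch` are hypothetical names, and the `embed` method is a placeholder that only returns a string.

```scala
// Hypothetical stand-in for the PR's Embedder; the real class calls Bedrock and S3 Vectors.
final class SketchEmbedder {
  def embed(imageId: String): String = s"embedded:$imageId"
}

object OptionalComponentSketch {
  // Build the component only when the flag is on, mirroring the if/else in the diff.
  def buildEmbedder(shouldEmbed: Boolean): Option[SketchEmbedder] =
    if (shouldEmbed) Some(new SketchEmbedder) else None

  // Assumed usage: mapping over the Option skips the work entirely when it is None.
  def embedIfEnabled(maybeEmbedder: Option[SketchEmbedder], imageId: String): Option[String] =
    maybeEmbedder.map(_.embed(imageId))
}
```

One design consequence of this pattern is that the AWS clients are never constructed in environments where the flag is off, such as the local config generated by `service-config.js`, which sets `s3.vectors.shouldEmbed=false`.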
2 changes: 2 additions & 0 deletions image-loader/app/lib/ImageLoaderConfig.scala
@@ -33,6 +33,8 @@ class ImageLoaderConfig(resources: GridConfigResources) extends CommonConfig(res
   val uploadStatusTable: String = string("dynamo.table.upload.status")
   val uploadStatusExpiry: FiniteDuration = configuration.get[FiniteDuration]("uploadStatus.recordExpiry")

+  val shouldEmbed: Boolean = boolean("s3.vectors.shouldEmbed")
+
   /**
     * Load in the chain of image processors from config. This can be a list of
     * companion objects, class names, both with and without config.