1. Describe the bug
The query `world * on(myname) group_left(hair) tpmetric` fails with an unreasonable `duplicate time series` error.
The right-hand expression of the binary operator returns two time series, but one of them is already stale: querying it directly returns an empty result. The binary operation still fails with `duplicate time series`.
In terms of code, the value of the stale time series is `NaN`, and `mergeNonOverlappingTimeseries` has ignored this case since git commit b473c21915d27bbf1b64d485ab0c757fc76f494d, titled "app/vmselect/promql: do not merge time series during requests to /api/v1/query".
{
"status": "error",
"errorType": "422",
"error": "error when executing query=\"world * on(myname) group_left(hair) tpmetric\" for (time=1626401785000, step=300000): cannot evaluate \"world * on (myname) group_left (hair) tpmetric\": duplicate time series on the right side of `* on (myname) group_left (hair)`: {address=\"beijing\", hair=\"black\", myname=\"tp0\"} and {address=\"shenzhen\", hair=\"black\", myname=\"tp0\"}"
}
The request returns the `duplicate time series` error between 1626401455 and 1626402025.
However, the response to http://localhost:18481/select/5/prometheus/api/v1/query?query=tpmetric{address="beijing", hair="black", myname="tp0"} &time=1626401725&nocache=1 has been empty since 1626401725.
4.3. log
2021-07-27T14:01:52.876Z warn app/vmselect/main.go:523 error in "/select/5/prometheus/api/v1/query?query=world%20*%20on(myname)%20group_left(hair)%20tpmetric&time=1626401785&nocache=1": error when executing query="world * on(myname) group_left(hair) tpmetric" for (time=1626401785000, step=300000): cannot evaluate "world * on (myname) group_left (hair) tpmetric": duplicate time series on the right side of `* on (myname) group_left (hair)`: {address="shenzhen", hair="black", myname="tp0"} and {address="beijing", hair="black", myname="tp0"}
4.4. time summary
`tpmetric{address="beijing", hair="black", myname="tp0"}` has been empty since 1626401725.
| series / event | from | to |
|---|---|---|
| world | 1626401155 | 1626401785 |
| tpmetric{"beijing"} | 1626401155 | 1626401425 |
| tpmetric{"shenzhen"} | 1626401455 | 1626401785 |
| duplicate timeseries error | 1626401455 | 1626402025 |
| tpmetric{"beijing"} empty | since 1626401725 | |
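The timestamps in the summary can be related to the `step=300000` (300 s) reported in the error message. The sketch below only does that arithmetic. The constant names are made up for illustration, and the reading of the result is an observation about the numbers, not a statement about vmselect internals.

```go
package main

import "fmt"

const (
	lastBeijingSample = 1626401425 // last sample of tpmetric{address="beijing"} in the source data
	beijingEmptySince = 1626401725 // direct query for the beijing series is empty from here
	errorEnds         = 1626402025 // last timestamp at which the duplicate error is returned
	stepSeconds       = 300        // step=300000 ms, taken from the error message
)

// stepsAfterLastSample reports how many whole steps ts lies after the last
// beijing sample.
func stepsAfterLastSample(ts int64) int64 {
	return (ts - lastBeijingSample) / stepSeconds
}

func main() {
	fmt.Println(stepsAfterLastSample(beijingEmptySince)) // 1
	fmt.Println(stepsAfterLastSample(errorEnds))         // 2
}
```

So the beijing series disappears from direct queries one step after its last sample, while the duplicate error is still reported up to two steps after it.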
5. guess
5.1. NaN timeseries cannot be deleted in doInternal()
app/vmselect/promql.rollupLast at rollup.go: doInternal
5.2. the change in mergeNonOverlappingTimeseries logic
`mergeNonOverlappingTimeseries` returns early when the series being merged have no more than 2 values each, as happens with instant queries. In that case it never reaches the `if math.IsNaN(v) { continue }` branch, so the NaN value of the stale series is not merged away.
func mergeNonOverlappingTimeseries(dst, src *timeseries) bool {
	...
	// Do not merge time series with too small number of datapoints.
	// This can be the case during evaluation of instant queries (alerting or recording rules).
	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/1141
	if len(srcValues) <= 2 && len(dstValues) <= 2 {
		return false
	}
	// Time series can be merged. Merge them.
	for i, v := range srcValues {
		if math.IsNaN(v) {
			continue
		}
		dstValues[i] = v
	}
	return true
}
5.3. the right-side time series whose value is NaN is added to tsExisting in groupJoin()
I added some code to fix it temporarily by skipping right-side time series whose value is NaN.
app/vmselect/promql.groupJoin at binary_op.go groupJoin
func groupJoin(singleTimeseriesSide string, be *metricsql.BinaryOpExpr, rvsLeft, rvsRight, tssLeft, tssRight []*timeseries) ([]*timeseries, []*timeseries, error) {
	...
	for _, tsLeft := range tssLeft {
		...
		bb := bbPool.Get()
		for _, tsRight := range tssRight {
			//>>>>>>>>>>>>>>>>>>>>
			// I added the code to fix it temporarily.
			if len(tsRight.Values) == 1 && math.IsNaN(tsRight.Values[0]) {
				continue
			}
			//<<<<<<<<<<<<<<<<<<<<<
			...
LiuPacific changed the title from "unreasonable duplicate time series error" to "vmselect unreasonable duplicate time series error" on Jul 28, 2021
@LiuPacific , could you check whether the issue is fixed in the latest commits of master and cluster branches? VictoriaMetrics and vmagent gained support for Prometheus staleness markers - see this comment. Now VictoriaMetrics should handle stale time series for disappeared scrape targets in the same way as Prometheus does. Note that the stale time series handling works only for newly ingested samples after the upgrade of VictoriaMetrics and vmagent to the latest commits in master and cluster branches.
2. Version
tag v1.63.0-cluster
3. Used command-line flags
vmselect program arguments
4. Reproduce
4.1. source data
world{myname="tp0"}: 1626401155 -> 1626401785
tpmetric{address="beijing", hair="black", myname="tp0"}: 1626401155 -> 1626401425
tpmetric{address="shenzhen", hair="black", myname="tp0"}: 1626401455 -> 1626401785