Add ASCII check, distance check, visual distance check #274
I went with a major version bump because this will flag packages that were previously allowed, but at least the public method signatures don't need to change.
bors try

try

Build failed:
The actual example was Websocket vs WebSockets. So maybe the cutoff should be DL distance <= 2? Edit: I just saw Stefan already said this in his review above.

bors try
I think we should probably measure edit distance after lowercasing. In particular, since some file systems are (unfortunately) case insensitive, you can cause problems by registering something with the same name, capitalized differently, which could have a large edit distance. |
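To make the case-sensitivity point concrete, here is a minimal Julia sketch. It assumes the optimal-string-alignment variant of Damerau-Levenshtein distance (the thread does not say which variant AutoMerge uses), and `osa_distance` with its `ignore_case` keyword is a hypothetical name for this illustration, not part of RegistryCI. With it, Websocket vs WebSockets is at distance 2 as typed but 1 after lowercasing, and PgfPlots vs PGFPlotsX drops from 3 to 1.

```julia
# Hypothetical helper (not RegistryCI API): optimal-string-alignment
# Damerau-Levenshtein distance, where insertions, deletions, substitutions
# and transpositions of adjacent characters each cost 1.
function osa_distance(a::AbstractString, b::AbstractString; ignore_case::Bool = false)
    x = ignore_case ? lowercase(a) : a
    y = ignore_case ? lowercase(b) : b
    s, t = collect(x), collect(y)        # compare Chars, not byte indices
    m, n = length(s), length(t)
    d = zeros(Int, m + 1, n + 1)         # d[i+1, j+1] = distance of s[1:i] to t[1:j]
    for i in 0:m
        d[i + 1, 1] = i                  # delete all i characters
    end
    for j in 0:n
        d[1, j + 1] = j                  # insert all j characters
    end
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i + 1, j + 1] = min(d[i, j + 1] + 1,   # deletion
                              d[i + 1, j] + 1,   # insertion
                              d[i, j] + cost)    # substitution (or match)
        if i > 1 && j > 1 && s[i] == t[j - 1] && s[i - 1] == t[j]
            # transposition of two adjacent characters
            d[i + 1, j + 1] = min(d[i + 1, j + 1], d[i - 1, j - 1] + 1)
        end
    end
    return d[m + 1, n + 1]
end
```

A name that differs only in case, such as "ABC" vs "abc", goes from distance 3 to 0 once both sides are lowercased, which is exactly the case-insensitive file-system collision that a raw edit distance would miss.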
I think we can be pretty conservative here; this is just for automatic merging, after all. If the limits become a problem, we can always lower them later on. What things would be caught with, e.g., 3?

It would be good to have a list of existing packages that are within this distance. PgfPlots vs PGFPlotsX, for example.

try

Build failed:
Thanks for all the quick feedback!
Ah, good point, done: names are now lowercased before the edit distances are computed.
Oops, my mistake. It would be caught with a limit of 1 plus lowercasing the names before comparison, so we don't actually need 2 for this, but we could make that change anyway to be more conservative. I've left it at 1 for now.
For your example specifically:

("PGFPlots", "PGFPlotsX") => "Too similar to existing package name PGFPlotsX. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."

So it would be caught (and differences in case no longer matter for the edit distances). There are 242 other clashes with the current settings:

julia> @time clashes = all_name_clashes(sort!(AutoMerge.get_all_package_names(expanduser("~/.julia/registries/General"))))
 95.335998 seconds (165.32 M allocations: 12.287 GiB, 2.16% gc time)

where I've defined

using RegistryCI
using RegistryCI.AutoMerge

# Run the distance check on every unordered pair of package names and
# record the failure message for each pair that gets flagged.
function all_name_clashes(packages; kwargs...)
    n = length(packages)
    clashes = Dict{Tuple{String, String}, String}()
    for i = 1:n, j = i+1:n
        name1 = packages[i]
        name2 = packages[j]
        # Check name1 as if name2 were the only existing package.
        pass, message = AutoMerge.meets_distance_check(name1, tuple(name2); kwargs...)
        if !pass
            clashes[(name1, name2)] = message
        end
    end
    return clashes
end

Full results:

Dict{Tuple{String,String},String} with 243 entries:
("DBInterface", "ODEInterface") => "Too similar to existing package name ODEInterface. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Exfiltrator", "Infiltrator") => "Too similar to existing package name Infiltrator. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("SMM", "SOM") => "Too similar to existing package name SOM. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.82 is at or below cutoff 2.50."
("NaiveGAflux", "NaiveNASflux") => "Too similar to existing package name NaiveNASflux. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("LSL", "VSL") => "Too similar to existing package name VSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.49 is at or below cutoff 2.50."
("MPIReco", "MRIReco") => "Too similar to existing package name MRIReco. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 0.59 is at or below cutoff 2.50."
("UAParser", "URIParser") => "Too similar to existing package name URIParser. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("GAP", "GCP") => "Too similar to existing package name GCP. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.00 is at or below cutoff 2.50."
("AMD", "Amb") => "Too similar to existing package name Amb. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("JuLIP", "julia") => "Too similar to existing package name julia. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("MLJModels", "NLPModels") => "Too similar to existing package name NLPModels. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Hive", "Jive") => "Too similar to existing package name Jive. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("EchoviewEcs", "EchoviewEvr") => "Too similar to existing package name EchoviewEvr. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("CSV", "uCSV") => "Too similar to existing package name uCSV. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Bijections", "Bijectors") => "Too similar to existing package name Bijectors. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("IRTools", "IterTools") => "Too similar to existing package name IterTools. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("CUDA", "Cuba") => "Too similar to existing package name Cuba. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Conda", "Onda") => "Too similar to existing package name Onda. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("MLDatasets", "RDatasets") => "Too similar to existing package name RDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("StanSample", "StanSamples") => "Too similar to existing package name StanSamples. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25. Normalized visual distance 2.23 is at or below cutoff 2.50."
("Match", "Matcha") => "Too similar to existing package name Matcha. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("ExportAll", "ImportAll") => "Too similar to existing package name ImportAll. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("ITensors", "Tensors") => "Too similar to existing package name Tensors. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("NTFk", "NTNk") => "Too similar to existing package name NTNk. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 2.17 is at or below cutoff 2.50."
("StanModels", "StatsModels") => "Too similar to existing package name StatsModels. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Glo", "Glob") => "Too similar to existing package name Glob. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Dhall", "Shell") => "Too similar to existing package name Shell. Normalized visual distance 1.80 is at or below cutoff 2.50."
("BIDSTools", "BioTools") => "Too similar to existing package name BioTools. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("NFLTables", "Nullables") => "Too similar to existing package name Nullables. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("GracePlot", "GraphPlot") => "Too similar to existing package name GraphPlot. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 2.26 is at or below cutoff 2.50."
("MLDatasets", "NLIDatasets") => "Too similar to existing package name NLIDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Plots", "Pluto") => "Too similar to existing package name Pluto. Normalized visual distance 1.38 is at or below cutoff 2.50."
("Mocking", "Packing") => "Too similar to existing package name Packing. Normalized visual distance 2.20 is at or below cutoff 2.50."
("MIDI", "MIRT") => "Too similar to existing package name MIRT. Normalized visual distance 2.16 is at or below cutoff 2.50."
("Mads", "Mods") => "Too similar to existing package name Mods. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 0.70 is at or below cutoff 2.50."
("TreeView", "TreeViews") => "Too similar to existing package name TreeViews. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 2.25 is at or below cutoff 2.50."
("LSHFunctions", "LossFunctions") => "Too similar to existing package name LossFunctions. Sqrt-normalized Damerau-Levenshtein distance 0.23 is at or below cutoff 0.25."
("Media", "Modia") => "Too similar to existing package name Modia. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 0.55 is at or below cutoff 2.50."
("SCIP", "SciPy") => "Too similar to existing package name SciPy. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Calculus", "ZXCalculus") => "Too similar to existing package name ZXCalculus. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("MCMCChain", "MCMCChains") => "Too similar to existing package name MCMCChains. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25. Normalized visual distance 2.27 is at or below cutoff 2.50."
("GLMakie", "WGLMakie") => "Too similar to existing package name WGLMakie. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("CUDA", "CoDa") => "Too similar to existing package name CoDa. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 1.96 is at or below cutoff 2.50."
("HAML", "YAML") => "Too similar to existing package name YAML. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 2.30 is at or below cutoff 2.50."
("RSCG", "Rsvg") => "Too similar to existing package name Rsvg. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("CuArrays", "GPUArrays") => "Too similar to existing package name GPUArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("YAJL", "YAML") => "Too similar to existing package name YAML. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("COSMA", "COSMO") => "Too similar to existing package name COSMO. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 1.43 is at or below cutoff 2.50."
("ITensors", "NDTensors") => "Too similar to existing package name NDTensors. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Devito", "Pavito") => "Too similar to existing package name Pavito. Normalized visual distance 1.87 is at or below cutoff 2.50."
("PGFPlots", "PGFPlotsX") => "Too similar to existing package name PGFPlotsX. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("KernelDensity", "KernelDensitySJ") => "Too similar to existing package name KernelDensitySJ. Sqrt-normalized Damerau-Levenshtein distance 0.23 is at or below cutoff 0.25."
("BDF", "JDF") => "Too similar to existing package name JDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("Knet", "UNet") => "Too similar to existing package name UNet. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("UnROOT", "UpROOT") => "Too similar to existing package name UpROOT. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 0.63 is at or below cutoff 2.50."
("ROMEO", "RoME") => "Too similar to existing package name RoME. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Altro", "Attrs") => "Too similar to existing package name Attrs. Normalized visual distance 2.47 is at or below cutoff 2.50."
("HSL", "LSL") => "Too similar to existing package name LSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.14 is at or below cutoff 2.50."
("FileTrees", "Filetimes") => "Too similar to existing package name Filetimes. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("POMDPFiles", "POMDPXFiles") => "Too similar to existing package name POMDPXFiles. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25."
("SMM", "SPH") => "Too similar to existing package name SPH. Normalized visual distance 2.37 is at or below cutoff 2.50."
("Bcrypt", "Scrypt") => "Too similar to existing package name Scrypt. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 0.79 is at or below cutoff 2.50."
("NRRD", "Nord") => "Too similar to existing package name Nord. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("BDF", "ODE") => "Too similar to existing package name ODE. Normalized visual distance 1.32 is at or below cutoff 2.50."
("SCS", "WCS") => "Too similar to existing package name WCS. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.11 is at or below cutoff 2.50."
("Clp", "Glo") => "Too similar to existing package name Glo. Normalized visual distance 1.32 is at or below cutoff 2.50."
("NCDatasets", "NLIDatasets") => "Too similar to existing package name NLIDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("BandedMatrices", "PaddedMatrices") => "Too similar to existing package name PaddedMatrices. Sqrt-normalized Damerau-Levenshtein distance 0.23 is at or below cutoff 0.25. Normalized visual distance 1.76 is at or below cutoff 2.50."
("Bio", "Glo") => "Too similar to existing package name Glo. Normalized visual distance 1.13 is at or below cutoff 2.50."
("CAOS", "COBS") => "Too similar to existing package name COBS. Normalized visual distance 2.09 is at or below cutoff 2.50."
("CoDa", "Conda") => "Too similar to existing package name Conda. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("GSL", "HSL") => "Too similar to existing package name HSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.54 is at or below cutoff 2.50."
("UnitfulMR", "UnitfulUS") => "Too similar to existing package name UnitfulUS. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 2.33 is at or below cutoff 2.50."
("COBS", "ECOS") => "Too similar to existing package name ECOS. Normalized visual distance 2.11 is at or below cutoff 2.50."
("NMFk", "NTFk") => "Too similar to existing package name NTFk. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("MLDatasets", "NCDatasets") => "Too similar to existing package name NCDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 1.62 is at or below cutoff 2.50."
("MortarContact2D", "MortarContact2DAD") => "Too similar to existing package name MortarContact2DAD. Sqrt-normalized Damerau-Levenshtein distance 0.22 is at or below cutoff 0.25."
("GLM", "SOM") => "Too similar to existing package name SOM. Normalized visual distance 2.41 is at or below cutoff 2.50."
("AIControl", "DFControl") => "Too similar to existing package name DFControl. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("AMD", "AWS") => "Too similar to existing package name AWS. Normalized visual distance 2.13 is at or below cutoff 2.50."
("Stopping", "Strapping") => "Too similar to existing package name Strapping. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("FCA", "FDM") => "Too similar to existing package name FDM. Normalized visual distance 2.27 is at or below cutoff 2.50."
("JDBC", "ODBC") => "Too similar to existing package name ODBC. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 2.01 is at or below cutoff 2.50."
("GBIF", "GRIB") => "Too similar to existing package name GRIB. Normalized visual distance 2.23 is at or below cutoff 2.50."
("CSDP", "OSQP") => "Too similar to existing package name OSQP. Normalized visual distance 1.52 is at or below cutoff 2.50."
("Tar", "Taro") => "Too similar to existing package name Taro. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("GlobalApproximationValueIteration", "LocalApproximationValueIteration") => "Too similar to existing package name LocalApproximationValueIteration. Sqrt-normalized Damerau-Levenshtein distance 0.19 is at or below cutoff 0.25."
("IJulia", "julia") => "Too similar to existing package name julia. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("NCDatasets", "RDatasets") => "Too similar to existing package name RDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("LasIO", "Lasso") => "Too similar to existing package name Lasso. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Distributions", "DistributionsAD") => "Too similar to existing package name DistributionsAD. Sqrt-normalized Damerau-Levenshtein distance 0.23 is at or below cutoff 0.25."
("JuLIP", "Tulip") => "Too similar to existing package name Tulip. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("StrFs", "Strs") => "Too similar to existing package name Strs. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Gtk", "ITK") => "Too similar to existing package name ITK. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("XSim", "Xsum") => "Too similar to existing package name Xsum. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("BigArrays", "DimArrays") => "Too similar to existing package name DimArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("StanBase", "StatsBase") => "Too similar to existing package name StatsBase. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("TeXTable", "TexTables") => "Too similar to existing package name TexTables. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("BED", "PEG") => "Too similar to existing package name PEG. Normalized visual distance 2.00 is at or below cutoff 2.50."
("NES", "WCS") => "Too similar to existing package name WCS. Normalized visual distance 1.91 is at or below cutoff 2.50."
("MDInclude", "NBInclude") => "Too similar to existing package name NBInclude. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 1.40 is at or below cutoff 2.50."
("StatPlots", "StatsPlots") => "Too similar to existing package name StatsPlots. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25."
("ECC", "SCS") => "Too similar to existing package name SCS. Normalized visual distance 2.09 is at or below cutoff 2.50."
("COBRA", "COESA") => "Too similar to existing package name COESA. Normalized visual distance 2.08 is at or below cutoff 2.50."
("IPMeasures", "Measures") => "Too similar to existing package name Measures. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("EMIRT", "MIRT") => "Too similar to existing package name MIRT. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("GSL", "LSL") => "Too similar to existing package name LSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.83 is at or below cutoff 2.50."
("CDCS", "COBS") => "Too similar to existing package name COBS. Normalized visual distance 1.51 is at or below cutoff 2.50."
("AdvancedHMC", "AdvancedMH") => "Too similar to existing package name AdvancedMH. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("LCPsolve", "LapSolve") => "Too similar to existing package name LapSolve. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("JDF", "XDF") => "Too similar to existing package name XDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.19 is at or below cutoff 2.50."
("Chess", "Loess") => "Too similar to existing package name Loess. Normalized visual distance 2.05 is at or below cutoff 2.50."
("SHA", "SMM") => "Too similar to existing package name SMM. Normalized visual distance 1.87 is at or below cutoff 2.50."
("EDF", "JDF") => "Too similar to existing package name JDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.33 is at or below cutoff 2.50."
("MDDatasets", "NCDatasets") => "Too similar to existing package name NCDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 1.50 is at or below cutoff 2.50."
("OPFSampler", "PDSampler") => "Too similar to existing package name PDSampler. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("COBRA", "COSMA") => "Too similar to existing package name COSMA. Normalized visual distance 2.05 is at or below cutoff 2.50."
("HSL", "VSL") => "Too similar to existing package name VSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.20 is at or below cutoff 2.50."
("Nord", "Serd") => "Too similar to existing package name Serd. Normalized visual distance 2.48 is at or below cutoff 2.50."
("Fire", "Pipe") => "Too similar to existing package name Pipe. Normalized visual distance 1.57 is at or below cutoff 2.50."
("MPI", "NPZ") => "Too similar to existing package name NPZ. Normalized visual distance 2.25 is at or below cutoff 2.50."
("SDPA", "SOFA") => "Too similar to existing package name SOFA. Normalized visual distance 0.87 is at or below cutoff 2.50."
("ASDF", "CSDP") => "Too similar to existing package name CSDP. Normalized visual distance 2.26 is at or below cutoff 2.50."
("NDTensors", "Tensors") => "Too similar to existing package name Tensors. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("AES", "ASE") => "Too similar to existing package name ASE. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("BitFlags", "BitFloats") => "Too similar to existing package name BitFloats. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("CMPlot", "ImPlot") => "Too similar to existing package name ImPlot. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("ITK", "Tk") => "Too similar to existing package name Tk. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("Jets", "Jute") => "Too similar to existing package name Jute. Normalized visual distance 1.99 is at or below cutoff 2.50."
("HealthBase", "HealthMLBase") => "Too similar to existing package name HealthMLBase. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("JSON", "JSON2") => "Too similar to existing package name JSON2. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Ogg", "Org") => "Too similar to existing package name Org. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("CUDA", "CUDD") => "Too similar to existing package name CUDD. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 1.45 is at or below cutoff 2.50."
("AEMS", "AES") => "Too similar to existing package name AES. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("H3", "Z3") => "Too similar to existing package name Z3. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.16 is at or below cutoff 0.25."
("Hose", "Soss") => "Too similar to existing package name Soss. Normalized visual distance 2.47 is at or below cutoff 2.50."
("GDAL", "SEAL") => "Too similar to existing package name SEAL. Normalized visual distance 1.98 is at or below cutoff 2.50."
("GMT", "Git") => "Too similar to existing package name Git. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("JWAS", "JWTs") => "Too similar to existing package name JWTs. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("KDEstimation", "MEstimation") => "Too similar to existing package name MEstimation. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Persa", "Porta") => "Too similar to existing package name Porta. Normalized visual distance 2.26 is at or below cutoff 2.50."
("JLD", "JLD2") => "Too similar to existing package name JLD2. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("StanMCMCChain", "StanMCMCChains") => "Too similar to existing package name StanMCMCChains. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.11 is at or below cutoff 0.25. Normalized visual distance 2.12 is at or below cutoff 2.50."
("BeaData", "GeoData") => "Too similar to existing package name GeoData. Normalized visual distance 1.42 is at or below cutoff 2.50."
("BDF", "EDF") => "Too similar to existing package name EDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 0.84 is at or below cutoff 2.50."
("ACME", "AEMS") => "Too similar to existing package name AEMS. Normalized visual distance 2.31 is at or below cutoff 2.50."
("JWTs", "Jets") => "Too similar to existing package name Jets. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("SMC", "SMM") => "Too similar to existing package name SMM. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.33 is at or below cutoff 2.50."
("CRC", "SMC") => "Too similar to existing package name SMC. Normalized visual distance 2.10 is at or below cutoff 2.50."
("GCP", "SCS") => "Too similar to existing package name SCS. Normalized visual distance 2.35 is at or below cutoff 2.50."
("BAT", "GMT") => "Too similar to existing package name GMT. Normalized visual distance 2.16 is at or below cutoff 2.50."
("EDF", "XDF") => "Too similar to existing package name XDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.28 is at or below cutoff 2.50."
("GPUArrays", "GeoArrays") => "Too similar to existing package name GeoArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("AES", "NES") => "Too similar to existing package name NES. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.45 is at or below cutoff 2.50."
("DataArrays", "MetaArrays") => "Too similar to existing package name MetaArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 2.04 is at or below cutoff 2.50."
("AdvancedMH", "AdvancedVI") => "Too similar to existing package name AdvancedVI. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("QDates", "RDates") => "Too similar to existing package name RDates. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 1.78 is at or below cutoff 2.50."
("GLM", "Glo") => "Too similar to existing package name Glo. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("PosDefManifold", "PosDefManifoldML") => "Too similar to existing package name PosDefManifoldML. Sqrt-normalized Damerau-Levenshtein distance 0.22 is at or below cutoff 0.25."
("Spec", "Spot") => "Too similar to existing package name Spot. Normalized visual distance 2.40 is at or below cutoff 2.50."
("Debugger", "Rebugger") => "Too similar to existing package name Rebugger. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 1.36 is at or below cutoff 2.50."
("LiBr", "Libz") => "Too similar to existing package name Libz. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("HTTPClient", "SMTPClient") => "Too similar to existing package name SMTPClient. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Unitful", "UnitfulUS") => "Too similar to existing package name UnitfulUS. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("JSON2", "JSON3") => "Too similar to existing package name JSON3. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 0.75 is at or below cutoff 2.50."
("ForwardDiff", "ForwardDiff2") => "Too similar to existing package name ForwardDiff2. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25."
("LinearMaps", "LinearMapsAA") => "Too similar to existing package name LinearMapsAA. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("CRC", "Cbc") => "Too similar to existing package name Cbc. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("CAOS", "GAMS") => "Too similar to existing package name GAMS. Normalized visual distance 2.42 is at or below cutoff 2.50."
("FDM", "SOM") => "Too similar to existing package name SOM. Normalized visual distance 2.34 is at or below cutoff 2.50."
("BEAST", "FFAST") => "Too similar to existing package name FFAST. Normalized visual distance 1.95 is at or below cutoff 2.50."
("Blobs", "Blosc") => "Too similar to existing package name Blosc. Normalized visual distance 2.19 is at or below cutoff 2.50."
("ACME", "ADCME") => "Too similar to existing package name ADCME. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("GMAT", "MAT") => "Too similar to existing package name MAT. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Pandas", "Pandoc") => "Too similar to existing package name Pandoc. Normalized visual distance 1.36 is at or below cutoff 2.50."
("LasIO", "LazIO") => "Too similar to existing package name LazIO. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 1.16 is at or below cutoff 2.50."
("CSDP", "SDDP") => "Too similar to existing package name SDDP. Normalized visual distance 2.06 is at or below cutoff 2.50."
("Dolo", "YOLO") => "Too similar to existing package name YOLO. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("BTCParser", "BibParser") => "Too similar to existing package name BibParser. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25. Normalized visual distance 2.47 is at or below cutoff 2.50."
("CGAL", "GDAL") => "Too similar to existing package name GDAL. Normalized visual distance 1.57 is at or below cutoff 2.50."
("Bits", "LITS") => "Too similar to existing package name LITS. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Catlab", "MATLAB") => "Too similar to existing package name MATLAB. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("CodeCosts", "CodecZstd") => "Too similar to existing package name CodecZstd. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("StructuredGrids", "UnstructuredGrids") => "Too similar to existing package name UnstructuredGrids. Sqrt-normalized Damerau-Levenshtein distance 0.22 is at or below cutoff 0.25."
("ClimateBase", "ClimateEasy") => "Too similar to existing package name ClimateEasy. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("AMQPClient", "SMTPClient") => "Too similar to existing package name SMTPClient. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("BAT", "BBI") => "Too similar to existing package name BBI. Normalized visual distance 2.37 is at or below cutoff 2.50."
("SimplePlots", "SimpleRoots") => "Too similar to existing package name SimpleRoots. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("FTPClient", "HTTPClient") => "Too similar to existing package name HTTPClient. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("FixedEffectModels", "GLFixedEffectModels") => "Too similar to existing package name GLFixedEffectModels. Sqrt-normalized Damerau-Levenshtein distance 0.21 is at or below cutoff 0.25."
("Fire", "Jfire") => "Too similar to existing package name Jfire. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("CGAL", "SEAL") => "Too similar to existing package name SEAL. Normalized visual distance 2.09 is at or below cutoff 2.50."
("ResumableFunctions", "ReusableFunctions") => "Too similar to existing package name ReusableFunctions. Sqrt-normalized Damerau-Levenshtein distance 0.22 is at or below cutoff 0.25."
("MDDatasets", "MLDatasets") => "Too similar to existing package name MLDatasets. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25. Normalized visual distance 1.28 is at or below cutoff 2.50."
("Sass", "Soss") => "Too similar to existing package name Soss. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 0.70 is at or below cutoff 2.50."
("NMF", "NMFk") => "Too similar to existing package name NMFk. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("NumericIO", "Numerics") => "Too similar to existing package name Numerics. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Atmosphere", "ISAtmosphere") => "Too similar to existing package name ISAtmosphere. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Config", "Configs") => "Too similar to existing package name Configs. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25. Normalized visual distance 2.44 is at or below cutoff 2.50."
("LightGBM", "LightOSM") => "Too similar to existing package name LightOSM. Normalized visual distance 1.23 is at or below cutoff 2.50."
("MDDatasets", "RDatasets") => "Too similar to existing package name RDatasets. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Gtk", "Tk") => "Too similar to existing package name Tk. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("EDF", "ODE") => "Too similar to existing package name ODE. Normalized visual distance 2.09 is at or below cutoff 2.50."
("MIDI", "Mimi") => "Too similar to existing package name Mimi. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("ThreadPools", "ThreadTools") => "Too similar to existing package name ThreadTools. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.12 is at or below cutoff 0.25. Normalized visual distance 2.09 is at or below cutoff 2.50."
("JSON", "JSON3") => "Too similar to existing package name JSON3. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Clustering", "ClusteringGA") => "Too similar to existing package name ClusteringGA. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("CoDa", "Cuba") => "Too similar to existing package name Cuba. Normalized visual distance 1.70 is at or below cutoff 2.50."
("DimArrays", "SymArrays") => "Too similar to existing package name SymArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("DimArrays", "DiskArrays") => "Too similar to existing package name DiskArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("GLFW", "GLPK") => "Too similar to existing package name GLPK. Normalized visual distance 2.17 is at or below cutoff 2.50."
("Media", "Redis") => "Too similar to existing package name Redis. Normalized visual distance 2.14 is at or below cutoff 2.50."
("BLPData", "BlsData") => "Too similar to existing package name BlsData. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("Clustering", "DPClustering") => "Too similar to existing package name DPClustering. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("CAOS", "CDCS") => "Too similar to existing package name CDCS. Normalized visual distance 1.90 is at or below cutoff 2.50."
("MLBase", "MLJBase") => "Too similar to existing package name MLJBase. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("BAT", "MAT") => "Too similar to existing package name MAT. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.56 is at or below cutoff 2.50."
("DiffEqBase", "DiffEqBayes") => "Too similar to existing package name DiffEqBayes. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("Tar", "Tau") => "Too similar to existing package name Tau. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 1.67 is at or below cutoff 2.50."
("MeshArrays", "MetaArrays") => "Too similar to existing package name MetaArrays. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("DBFTables", "FWFTables") => "Too similar to existing package name FWFTables. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("DSP", "GCP") => "Too similar to existing package name GCP. Normalized visual distance 1.71 is at or below cutoff 2.50."
("Tau", "Yao") => "Too similar to existing package name Yao. Normalized visual distance 2.16 is at or below cutoff 2.50."
("CFITSIO", "FITSIO") => "Too similar to existing package name FITSIO. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("EzXML", "MzXML") => "Too similar to existing package name MzXML. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 2.07 is at or below cutoff 2.50."
("BDF", "XDF") => "Too similar to existing package name XDF. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.43 is at or below cutoff 2.50."
("LITS", "Lints") => "Too similar to existing package name Lints. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("Porta", "XPORTA") => "Too similar to existing package name XPORTA. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("DCCA", "ORCA") => "Too similar to existing package name ORCA. Normalized visual distance 1.99 is at or below cutoff 2.50."
("Stipple", "StippleUI") => "Too similar to existing package name StippleUI. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("Unitful", "UnitfulMR") => "Too similar to existing package name UnitfulMR. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("GLM", "Gym") => "Too similar to existing package name Gym. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("GMAT", "GMT") => "Too similar to existing package name GMT. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25."
("TOML", "TSML") => "Too similar to existing package name TSML. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 0.79 is at or below cutoff 2.50."
("BSON", "JSON") => "Too similar to existing package name JSON. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.14 is at or below cutoff 0.25. Normalized visual distance 2.45 is at or below cutoff 2.50."
("Cubature", "HCubature") => "Too similar to existing package name HCubature. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("SimpleRoots", "SimpleTools") => "Too similar to existing package name SimpleTools. Sqrt-normalized Damerau-Levenshtein distance 0.24 is at or below cutoff 0.25."
("SOM", "SPH") => "Too similar to existing package name SPH. Normalized visual distance 2.16 is at or below cutoff 2.50."
("Caching", "Packing") => "Too similar to existing package name Packing. Normalized visual distance 2.47 is at or below cutoff 2.50."
("GSL", "VSL") => "Too similar to existing package name VSL. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25."
("FTPClient", "SMTPClient") => "Too similar to existing package name SMTPClient. Sqrt-normalized Damerau-Levenshtein distance 0.25 is at or below cutoff 0.25."
("PANDA", "Pandas") => "Too similar to existing package name Pandas. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.13 is at or below cutoff 0.25."
("AES", "AWS") => "Too similar to existing package name AWS. Damerau-Levenshtein distance 1 is at or below cutoff 1. Sqrt-normalized Damerau-Levenshtein distance 0.15 is at or below cutoff 0.25. Normalized visual distance 2.30 is at or below cutoff 2.50."
Agreed that we can be conservative, although I think lowercasing already helps a lot. I think 3 may be too high, especially for short packages; any two four-letter packages would clash if they shared a single common letter (independently of case). Just checking for pairs with DL-distance <= 3, we get 15404 clashes (using |
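A concrete instance of the four-letter concern: `JSON` and `JuMP` share only their first letter, and their lowercased edit distance is exactly 3. This sketch uses a from-scratch restricted Damerau-Levenshtein (optimal string alignment) distance written for illustration. It is an assumption and is not the StringDistances.jl code the PR relies on.

```julia
# Restricted Damerau-Levenshtein (optimal string alignment) distance.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m                      # delete all of s[1:i]
    d[1, :] .= 0:n                      # insert all of t[1:j]
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1,       # deletion
                          d[i+1, j] + 1,       # insertion
                          d[i, j] + cost)      # substitution
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)  # transposition
        end
    end
    return d[m+1, n+1]
end

# Two real four-letter names that share only the leading "J".
dist = osa_distance(lowercase("JSON"), lowercase("JuMP"))
println("lowercased distance: ", dist, ", flagged at cutoff 3: ", dist <= 3)
```

At a cutoff of 3 this pair would be flagged, even though the names have little in common.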
|
So anything that is three letters and shares one letter will be marked? I think we should check casing when it comes to disallowing a new package whose name is identical to an existing one apart from casing, but not when it comes to string distance. When you install something, case matters everywhere, so an A and an a are just as different as any two other letters. |
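A minimal sketch of the case-only rule suggested here: a new name is flagged if it equals an existing name after lowercasing but differs in case. `case_only_clashes` is a hypothetical helper written for illustration, not part of AutoMerge.

```julia
# Existing names that differ from `new` only in letter case.
case_only_clashes(new::AbstractString, existing) =
    [n for n in existing if n != new && lowercase(n) == lowercase(new)]

println(case_only_clashes("Jump", ["JuMP", "JSON", "Jump"]))
```

Exact duplicates are deliberately excluded here, on the assumption that they are handled by a separate check.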
Keep in mind, any new package with a name that is less than five letters long will require manual merging anyway. That's just the regular "new package name length" check that we already have. |
|
The tests keep failing; will need to figure out why. |
|
Also, can you rebase on master and squash? |
|
How about this:
Keep in mind that this is just a trigger that requires manual review, not a block, so if there's some case where the name is reasonable but these "rules" aren't satisfied, that's still ok, we can just merge manually. |
|
It would be great to have some more examples of things that should be flagged, especially long names (because I wonder about length normalization). If anyone wants to contribute pairs of names that are qualitatively “too close”, I can add them to the tests and try to tune the cutoffs a bit. We are at least spared some of the bad examples that URLs face, since we don't allow Unicode, and some from other software ecosystems, where there have been examples involving a missing “-”, so there’s that. |
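One way to reason about length normalization: the clash list above is consistent with dividing the edit distance by `5 + sqrt(n)`, where `n` is the length of the longer name (e.g. 1/(5 + 3) ≈ 0.13 for PGFPlots vs PGFPlotsX). Under that assumption, this sketch computes how many edits the 0.25 cutoff tolerates at each length. `max_edits` is a hypothetical helper.

```julia
# Largest integer distance d with d / (5 + sqrt(n)) <= cutoff,
# where n is the length of the longer of the two names.
max_edits(n::Integer; cutoff = 0.25) = floor(Int, cutoff * (5 + sqrt(n)))

for n in (3, 4, 9, 16, 25, 49, 100)
    println(rpad(n, 4), " characters -> up to ", max_edits(n), " edits")
end
```

The allowance grows slowly: 1 edit for short names, 2 from 9 characters, and 3 only from 49 characters. This is why long names are not flagged much more often than short ones.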
|
Note that we don't have to get this right all at once: we can adjust the rules as we encounter more cases. Having some checks in place is already much, much better than having no checks in place. I do suspect that a single cutoff regardless of name length may not be right, but let's see how it goes. |
|
Ok, then I'll just apply those rules you mentioned and Dilum's review suggestions. I am also concerned about communication, i.e. when someone's new package PR isn't automerged due to these rules and they don't understand why. I'll try to add a bit more to the README to help with that. My dream would be for when the automerge check fails due to the visual distance, if the automerge comment could include a gif showing the two names merging into each other, to show how close they look :). That should definitely be followup work though, if we even want to take on the hefty plotting dependencies that would be needed. |
|
I applied the review suggestions from @DilumAluthge and updated the name checks as @StefanKarpinski suggested. I also tweaked the short-circuiting and the resulting message so that we only short-circuit if the package name is already in the registry. Previously, if you tried "Flux" (with the ell), you would get an error that it's too similar to "Mux" instead of the more appropriate error that the name is already in the registry. Additionally, if the name is not in the registry but is too close to existing names, you now get a numbered list of all the packages it is too similar to. E.g. with "FIux" (uppercase eye), you get:
julia> println(AutoMerge.meets_distance_check("FIux", all_pkg_names)[2])
Package name too similar to 3 existing packages.
1. Too similar to Mux. Damerau-Levenshtein distance 2 is at or below cutoff 2.
2. Too similar to FIB. Damerau-Levenshtein distance 2 is at or below cutoff 2.
3. Too similar to Flux. Damerau-Levenshtein distance 1 is at or below cutoff 2. Damerau-Levenshtein distance 1 between lowercased names is at or below cutoff 1. Normalized visual distance 0.46 is at or below cutoff 2.50.
I did not squash as @DilumAluthge requested because I thought it would make reviewing harder, but I can do that when we're ready to merge. (I guess bors can't squash-merge yet? I see bors-ng/bors-ng#718 was merged, though.) P.S. Not short-circuiting makes the (already quadratic scaling) |
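The following is a rough sketch of the non-short-circuiting behaviour described above. It is not AutoMerge's `meets_distance_check`. `distance_check`, its messages, and the from-scratch `osa_distance` (the restricted, optimal-string-alignment variant of Damerau-Levenshtein) are hypothetical, and the visual-distance rule is omitted because it needs VisualStringDistances.jl. It returns an `(ok, message)` tuple, matching the way the REPL call above indexes `[2]` to get the message.

```julia
# Restricted Damerau-Levenshtein (optimal string alignment) distance.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1, d[i+1, j] + 1, d[i, j] + cost)
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)  # transposition
        end
    end
    return d[m+1, n+1]
end

function distance_check(name::AbstractString, existing::AbstractVector{<:AbstractString};
                        dl_cutoff = 2, dl_lower_cutoff = 1)
    # Only short-circuit when the exact name is already registered.
    name in existing && return (false, "Package name $name is already in the registry.")
    problems = String[]
    for other in existing   # no early exit: collect every similar name
        reasons = String[]
        d = osa_distance(name, other)
        d <= dl_cutoff &&
            push!(reasons, "Damerau-Levenshtein distance $d is at or below cutoff $dl_cutoff.")
        dl = osa_distance(lowercase(name), lowercase(other))
        dl <= dl_lower_cutoff &&
            push!(reasons, "Damerau-Levenshtein distance $dl between lowercased names is at or below cutoff $dl_lower_cutoff.")
        isempty(reasons) || push!(problems, "Similar to $other. " * join(reasons, " "))
    end
    isempty(problems) && return (true, "")
    header = "Package name similar to $(length(problems)) existing packages.\n"
    return (false, header * join(["$i. $p" for (i, p) in enumerate(problems)], "\n"))
end

println(distance_check("FIux", ["Mux", "FIB", "Flux", "JSON"])[2])
```

Because every existing name is visited, the cost is one distance computation per registered package. This is the price of reporting all matches instead of the first one.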
|
Re-
I think they are not quite as different as two other letters. Failing to press Shift is, I think, a more common typing error than most two-letter swaps (though not all), so if we are worried about typo-squatting, that can be a factor. I also think people tend to remember the letters more easily than the case, i.e. you might remember "jump" but not whether it's capitalized as "Jump" or "JuMP". So from the perspective of routing people to the right package, I think it is relevant. I don't have evidence handy for these claims, though. |
Yeah, I guess. But then |
Agreed :). IMO the right solution for typo-related concerns is a weighted DL distance, where the weights are empirically determined from typos produced by QWERTY[1] typists (mentioned in #10 somewhere). But StringDistances.jl doesn't do weighted DL yet, and I haven't found a weight matrix yet; these are actually the reasons I hadn't submitted a PR sooner. My hope is that with the right combination of weighted DL for typos and visual distances for malicious websites giving example code with tricksy package names (ref #10 (comment)), one could get a very low false-positive rate and still a good false-negative rate by determining fairly precisely whether a package name could cause trouble. I see special-casing lowercase letters as a half-step towards that world. [1]: I am actually not a QWERTY typist (I use Colemak), but I know we are in the vast minority, so I think special-casing QWERTY would still be a good step. |
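The weighted DL idea can be sketched as a restricted Damerau-Levenshtein distance with a pluggable substitution cost. No empirical typo matrix is included. The `case_cheap` weights (a case-only difference, i.e. a missed Shift, costs 0.5) are a hypothetical stand-in, and `weighted_osa` is an illustrative helper, not a StringDistances.jl API.

```julia
# Weighted restricted Damerau-Levenshtein (optimal string alignment) distance.
# `sub_cost(x, y)` is the cost of substituting character x with y.
function weighted_osa(a::AbstractString, b::AbstractString;
                      sub_cost = (x, y) -> x == y ? 0.0 : 1.0,
                      indel = 1.0, swap = 1.0)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    d = zeros(Float64, m + 1, n + 1)
    d[:, 1] .= (0:m) .* indel
    d[1, :] .= (0:n) .* indel
    for i in 1:m, j in 1:n
        d[i+1, j+1] = min(d[i, j+1] + indel,                 # deletion
                          d[i+1, j] + indel,                 # insertion
                          d[i, j] + sub_cost(s[i], t[j]))    # substitution
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + swap)  # transposition
        end
    end
    return d[m+1, n+1]
end

# Hypothetical weights: a case-only difference costs half a normal substitution.
case_cheap(x::Char, y::Char) = x == y ? 0.0 : lowercase(x) == lowercase(y) ? 0.5 : 1.0

println(weighted_osa("Jump", "JuMP"), " vs ", weighted_osa("Jump", "JuMP"; sub_cost = case_cheap))
```

Replacing `case_cheap` with costs derived from real keyboard-typo data would give the QWERTY-aware version described above.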
I'm not actually sure where the logs are; can anyone point me to them? Or do I just need to try to reproduce locally? |
|
Can we change the wording from "too similar" to just "similar"? Otherwise people are going to take this automated feedback as telling them they strictly may not call something this—which I can guarantee will cause some people to get unhappy—whereas all we're doing is requiring a manual review if a name is similar enough to an existing package. |
Good call, just made that change. By the way, in the README I also added a reminder that this is deliberately conservative guidance, not a requirement; let me know if that can be phrased better. If this PR is merged, we should likely also update the General registry README. |
|
bors try- |
|
bors try- |
tryBuild failed: |
|
Aha, an interesting failure: https://travis-ci.com/github/JuliaRegistries/RegistryCI.jl/jobs/398081414#L403-L405 I think what happens is that we run the code on the branch of the registry with the update committed, so the new package name is always already in the registry. I wonder if the case of exactly duplicate names is already covered by #255? |
No. You could have two packages with exactly the same name but different paths. |
|
@ericphanson In a new package PR, where are you getting the list of existing package names? |
|
AutoMerge specifically clones the master branch of the registry. You should refer to that clone to get the list of existing package names. |
|
In other words, AutoMerge has two copies of the registry:
Make sure that you are using the correct copy of the registry for each task. For some tasks you need to look at the PR branch, and for others you need to look at the master branch. |
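The sketch below reads existing names from the master clone rather than the PR branch. It assumes the clone's directory is known and that, as in General, `Registry.toml` has a `[packages]` table mapping each UUID to a `name` and `path`. `package_names` is a hypothetical helper, and the `TOML` standard library requires Julia 1.6 or later. Names are not deduplicated, because two packages can share a name while having different paths.

```julia
using TOML  # standard library in Julia 1.6+

# Package names listed in a registry checkout's Registry.toml.
function package_names(registry_dir::AbstractString)
    reg = TOML.parsefile(joinpath(registry_dir, "Registry.toml"))
    return sort!(String[pkg["name"] for pkg in values(reg["packages"])])
end
```

Run against the master clone, a new package's name is absent from this list, so the "already in the registry" branch only fires for genuine duplicates.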
|
ah, thanks @DilumAluthge! I did not understand that, and was using bors try |
tryBuild succeeded: |
|
This is good to go from my end :) |
|
bors r+ |
|
bors r- |
274: Add ASCII check, distance check, visual distance check r=fredrikekre a=ericphanson My attempt to fix #10 and close #273. There are three semi-arbitrarily chosen cutoffs that might need more tuning, and if any are hit, then the package is flagged. 1. DL distance is <= 1 (which would catch Websocket vs Websockets) 2. A normalized DL distance catches long package names with more than 1 edit but only a few. I ended up going with a weird `5 + sqrt(max(len1, len2))` normalization just because it seemed like just dividing by the length made long packages get flagged too much. 3. Finally, there's the visual distance check, which can catch package names with more edits than allowed by the other checks if the edits are hard to distinguish visually, like `Jill` vs `JiII` (that's 2 edits of lowercase-ell to uppercase-eye, so the straight DL doesn't catch it, the name is short so the normalized one doesn't catch it, but the letters look very similar, so the visual one catches it). I put a `DL <= 2` guard on the calculation so we don't have to perform the expensive visual check too often. I also added an ASCII check; I saw that it's in the guidelines but doesn't appear to be implemented. I added some unit tests but not an integration test (out of laziness / time constraints). P.S. https://ericphanson.github.io/VisualStringDistances.jl/dev/packagenames/ has some short discussion of VisualStringDistances for this problem, and https://github.com/ericphanson/VisualStringDistances.jl/tree/master/scripts/packagenames has some messy/exploratory code for playing around with distances and cutoffs. --- DL = Damerau–Levenshtein distance Co-authored-by: ericphanson <5846501+ericphanson@users.noreply.github.com> Co-authored-by: Eric Hanson <5846501+ericphanson@users.noreply.github.com>
|
Canceled. |
|
Does bors support squashing nowadays? Can you squash otherwise? |
|
Ah right, it does not (I looked into it a bit, and you can configure it to always squash or not, but not per PR, and we don’t have the “always squash” option set). I’ll squash it now. |
Bump version
Lowercase edit distances, clean up code
Fixes from review; update checks
Add more details about name checks to README
Include all clashes in error message
Cleanup code
Update Project.toml
Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Tweak wording
Fix order in comment
Add some logging in the tests for Travis
Remove outer testset
Restore outer testset, add more detailed logging around distance checks
Allow inlining
add unused keyword argument to fix call signature
Always check ascii names
Fix typo
Fix another typo
another swap
`registry_head` -> `registry_master`
`of` -> `for`
|
bors r+ |
|
Build failed:
|
|
Nightly is failing on commit 8c03bf7 (https://travis-ci.com/github/JuliaRegistries/RegistryCI.jl/jobs/398306656#L170), which has the latest Pkg version bump from JuliaLang/julia#37992, and it looks like something is going wrong with that. It seems we're using Pkg internals in RegistryCI.jl/src/registry_testing.jl (line 51 in 61e9d46).
|
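My attempt to fix #10 and close #273. There are three semi-arbitrarily chosen cutoffs that might need more tuning, and if any are hit, then the package is flagged.
1. DL distance is <= 1 (which would catch Websocket vs Websockets).
2. A normalized DL distance catches long package names with more than 1 edit but only a few. I ended up going with a weird `5 + sqrt(max(len1, len2))` normalization just because it seemed like simply dividing by the length made long packages get flagged too much.
3. Finally, there's the visual distance check, which can catch package names with more edits than the other checks allow if the edits are hard to distinguish visually, like `Jill` vs `JiII` (that's 2 edits of lowercase-ell to uppercase-eye, so the straight DL doesn't catch it, the name is short so the normalized one doesn't catch it, but the letters look very similar, so the visual one catches it). I put a `DL <= 2` guard on the calculation so we don't have to perform the expensive visual check too often.
I also added an ASCII check; I saw that it's in the guidelines but doesn't appear to be implemented.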
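The first two cutoffs and the ASCII check can be sketched in plain Julia. The normalization is read as dividing the DL distance by `5 + sqrt(max(len1, len2))`, which matches values such as 0.13 for PGFPlots vs PGFPlotsX in the clash list. `osa_distance` is a from-scratch restricted (optimal string alignment) variant of Damerau-Levenshtein, and `flag_name` and `normalized_dl` are hypothetical helpers. The visual-distance rule is left out because it needs VisualStringDistances.jl.

```julia
# Restricted Damerau-Levenshtein (optimal string alignment) distance.
function osa_distance(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    m, n = length(s), length(t)
    d = zeros(Int, m + 1, n + 1)
    d[:, 1] .= 0:m
    d[1, :] .= 0:n
    for i in 1:m, j in 1:n
        cost = s[i] == t[j] ? 0 : 1
        d[i+1, j+1] = min(d[i, j+1] + 1, d[i+1, j] + 1, d[i, j] + cost)
        if i > 1 && j > 1 && s[i] == t[j-1] && s[i-1] == t[j]
            d[i+1, j+1] = min(d[i+1, j+1], d[i-1, j-1] + 1)  # transposition
        end
    end
    return d[m+1, n+1]
end

# DL distance divided by 5 + sqrt(length of the longer name).
normalized_dl(a, b) = osa_distance(a, b) / (5 + sqrt(max(length(a), length(b))))

# Cutoffs 1 and 2 from the description (the visual rule is omitted).
flag_name(new, existing; dl_cutoff = 1, normalized_cutoff = 0.25) =
    osa_distance(new, existing) <= dl_cutoff ||
    normalized_dl(new, existing) <= normalized_cutoff

# ASCII check: non-ASCII names need manual review.
meets_ascii(name::AbstractString) = isascii(name)

println(flag_name("Websocket", "Websockets"), " ", flag_name("Jill", "JiII"))
```

`Jill` vs `JiII` is not flagged here (DL 2, normalized 2/7 ≈ 0.29). This is the gap the guarded visual check is meant to cover.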
I added some unit tests but not an integration test (out of laziness / time constraints).
P.S. https://ericphanson.github.io/VisualStringDistances.jl/dev/packagenames/ has some short discussion of VisualStringDistances for this problem, and https://github.com/ericphanson/VisualStringDistances.jl/tree/master/scripts/packagenames has some messy/exploratory code for playing around with distances and cutoffs.
DL = Damerau–Levenshtein distance