
Add support for writing HTML literals using UTF-8 strings #83457

Open
chsienki wants to merge 14 commits into dotnet:main from chsienki:chsienki/utf8-html-literals-refactor

Conversation

Member

@chsienki chsienki commented Apr 28, 2026

Ports dotnet/razor#13052 onto dotnet/roslyn after the razor → roslyn repo merge (dotnet/roslyn#83444).
Original 9 commits preserved; paths rewritten to src/Razor/... and the source-generator test project folder renamed to Microsoft.NET.Sdk.Razor.SourceGenerators.UnitTests.

Summary

Builds on #12848 by @DamianEdwards, refactoring the UTF-8 HTML literal detection to use a pipeline-friendly pre-computed map approach.

When a .cshtml page's @inherits base class has a callable WriteLiteral(ReadOnlySpan<byte>) overload, HTML literals are emitted as C# UTF-8 string literals ("..."u8), enabling direct binding to the byte-span overload and avoiding UTF-16→UTF-8 transcoding at runtime.
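
To illustrate the binding described above, here is a minimal sketch. `SketchPageBase` and `Utf8LiteralSketch` are hypothetical names that are not part of this PR, and the sketch assumes C# 11 or later so that the `u8` suffix is available.

```csharp
using System;

// Hypothetical stand-in for a page base class; not taken from the PR.
public class SketchPageBase
{
    public string LastOverload { get; private set; } = "";

    // UTF-16 overload: the text has to be encoded to UTF-8 before it reaches the response.
    public void WriteLiteral(string value) => LastOverload = "string";

    // Byte-span overload: the argument is already UTF-8 encoded.
    public void WriteLiteral(ReadOnlySpan<byte> value) => LastOverload = "utf8";
}

public static class Utf8LiteralSketch
{
    public static string Run()
    {
        var page = new SketchPageBase();
        // A "..."u8 literal has type ReadOnlySpan<byte>, so it binds to the byte-span overload.
        page.WriteLiteral("<p>Hello</p>"u8);
        return page.LastOverload;
    }
}
```

Calling `Utf8LiteralSketch.Run()` reports which overload was selected; with the `u8` suffix removed, the same call would bind to the `string` overload instead.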

Key changes from the original PR

  • Pre-computed Utf8SupportMap instead of per-file probe compilations -- the source generator extracts @inherits base type names from parsed syntax trees, combines with the declaration compilation to build a value-comparable map, and passes it to ProcessRemaining
  • No compilation reference in the project engine -- the map flows through the incremental pipeline as pure data with value equality, so downstream stages only re-run when UTF-8 support results actually change
  • IUtf8WriteLiteralFeature engine feature with DefaultUtf8WriteLiteralFeature implementation backed by the map
  • Moved Utf8WriteLiteralDetectionPass to CSharp namespace
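
The value-equality idea behind the pre-computed map can be sketched as follows. This is an illustration only: `SupportMapSketch` is a hypothetical type, and the PR's actual `Utf8SupportMap` (with its per-file and per-type lookups) is not reproduced here. The point is that two maps built from the same entries compare equal, which is what allows an incremental pipeline to skip downstream work when nothing changed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a value-comparable map; not the PR's Utf8SupportMap.
public sealed class SupportMapSketch : IEquatable<SupportMapSketch>
{
    // Sorted so that equality and hashing do not depend on insertion order.
    private readonly SortedDictionary<string, bool> _supportByType =
        new SortedDictionary<string, bool>(StringComparer.Ordinal);

    public SupportMapSketch(IEnumerable<KeyValuePair<string, bool>> entries)
    {
        foreach (var entry in entries)
        {
            _supportByType[entry.Key] = entry.Value;
        }
    }

    public bool IsSupported(string typeName) =>
        _supportByType.TryGetValue(typeName, out var supported) && supported;

    // Value equality: same keys with the same values means the same map.
    public bool Equals(SupportMapSketch? other) =>
        other is not null && _supportByType.SequenceEqual(other._supportByType);

    public override bool Equals(object? obj) => Equals(obj as SupportMapSketch);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        foreach (var (key, value) in _supportByType)
        {
            hash.Add(key, StringComparer.Ordinal);
            hash.Add(value);
        }
        return hash.ToHashCode();
    }
}
```

Because equality is defined over the contents rather than object identity, rebuilding the map after an unrelated edit yields an equal value, and a cache keyed on it stays valid.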

Tests

  • End-to-end source generator tests with baseline verification (u8 vs string literals)
  • Mixed files: two .cshtml files with different @inherits, only one uses UTF-8
  • Incremental switching: overload added → u8, removed → string, in a single test
  • No @inherits directive → string literals (default base class)
  • MVC integration tests for @inherits detection

Closes dotnet/razor#8429


chsienki and others added 9 commits April 28, 2026 11:59
Implement auto-detection of UTF-8 WriteLiteral support for legacy .cshtml
code generation. When a page's @inherits base class has a callable
WriteLiteral(ReadOnlySpan<byte>) overload, HTML literals are emitted as
C# UTF-8 string literals ("..."u8).
- FullyQualifiedInherits: namespaced type with fully-qualified @inherits
- ShortNameInherits_WithUsing: documents that short names don't resolve
  for UTF-8 detection (GetTypeByMetadataName requires full qualification)
- PartiallyQualifiedInherits: documents partial names don't resolve
- SwitchesWhenOverloadAddedOrRemoved: uses fresh drivers per edit step

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add GetInheritsDirectiveContent and GetUsingDirectives extension methods
  on RazorCodeDocument for extracting @inherits and @using directives
- Resolve short/aliased type names via augmented compilation with the
  document's @using directives when GetTypeByMetadataName fails
- Dual-lookup Utf8SupportMap: per-file (filePath -> FQN) + per-type
  (FQN -> bool) to handle same @inherits text resolving differently
- Use GetFullName() for metadata name formatting
- Call HasCallableUtf8WriteLiteralOverload via string overload to avoid
  cross-compilation symbol issues
- Add InheritsInfo nested record on DefaultUtf8WriteLiteralFeature
- Tests: short name with @using, alias via _ViewImports, file-level
  alias, alias shadowing (CS0576 graceful fallback), fully-qualified

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build one probe syntax tree with namespace-scoped usings for all entries
that need resolution, instead of creating a separate augmented compilation
per entry. This reduces O(N) AddSyntaxTrees calls to O(1).

- Two-pass Create: fast path via GetTypeByMetadataName, then batch slow path
- ResolveTypeNamesWithUsings takes CSharpCompilation directly
- Split pipeline: extract @inherits first, then usings only for files that need it
- Rename GetInheritsDirectiveContent to GetInheritsDirectiveValue
- Make InheritsInfo fields non-nullable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GetInheritsDirectiveValue() now searches import syntax trees when the
main document has no @inherits directive. The most specific _ViewImports
wins, and the page's own @inherits overrides everything.

Added tests for @inherits in _ViewImports (global and namespaced types)
and cascading _ViewImports with override precedence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The slow path for resolving @inherits type names previously skipped
entries with no Razor @using directives. Since .cshtml files always
have default MVC imports, this filter was ineffective. Removing it
ensures types resolvable via C# global usings or the compilation's
existing context are not missed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The slow path now uses GetFullMetadataName() which builds a proper CLR
metadata name (with backtick arity for generics and + for nested types)
instead of GetFullName() which produces C# display syntax that cannot
be resolved by GetTypeByMetadataName.

Added tests for generic base classes (single and multiple type params),
generics in namespaces, nested generics, and generics from metadata
references.
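
For readers unfamiliar with the metadata-name format mentioned above, the following sketch shows the backtick arity and `+` nesting conventions. It uses reflection rather than Roslyn so that it runs without extra packages; `Type.FullName` follows the same conventions. `MetadataNameSketch`, `Outer<T>`, and `Inner` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class MetadataNameSketch
{
    // Hypothetical generic type with a nested type, for illustration only.
    public class Outer<T>
    {
        public class Inner { }
    }

    // C# display syntax would be "Dictionary<TKey, TValue>"; the metadata name uses `2 for the arity.
    public static string GenericName() => typeof(Dictionary<,>).FullName!;

    // Nested types are joined to their containing type with '+', not '.'.
    public static string NestedName() => typeof(Outer<>.Inner).FullName!;
}
```

Here `GenericName()` yields ``System.Collections.Generic.Dictionary`2`` and `NestedName()` yields ``MetadataNameSketch+Outer`1+Inner``, neither of which matches the C# display form of the same types.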

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member

@davidwengier davidwengier left a comment


This looks how I remember

- Use HashCodeCombiner in Utf8SupportMap.GetHashCode

- Replace DescendantNodes() with ChildNodes() over the shallow probe tree

- Drop the .ToArray() and iterate the namespace declarations with a foreach + explicit entryIndex

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 6, 2026 00:16
@chsienki chsienki enabled auto-merge (squash) May 6, 2026 00:25
Contributor

Copilot AI left a comment


Pull request overview

Adds Razor support for emitting HTML literals as C# UTF-8 literals when a page base type exposes WriteLiteral(ReadOnlySpan<byte>), wiring that through the source-generator pipeline, codegen options, and tests.

Changes:

  • Precomputes per-file UTF-8 support from @inherits and passes it into code generation via a new IUtf8WriteLiteralFeature.
  • Threads a new WriteHtmlUtf8StringLiterals option through lowering and RuntimeNodeWriter so HTML literals can render as "..."u8.
  • Adds source-generator, integration, and node-writer coverage plus updated baselines.

Notable review findings:

  • UTF-8 literal emission is not gated on C# 11+, so projects pinned to older language versions will get invalid generated code.
  • The precomputed support map is built from raw @inherits text before TModel substitution, so common MVC @inherits MyBase<TModel> patterns will miss the new behavior.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/Razor/src/Shared/Microsoft.AspNetCore.Razor.Test.Common/Language/CodeGeneration/TestCodeRenderingContext.cs | Adds test hook for UTF-8 literal option. |
| src/Razor/src/Compiler/test/Microsoft.NET.Sdk.Razor.SourceGenerators.UnitTests/TestFiles/RazorSourceGeneratorCshtmlTests/Utf8HtmlLiterals_WithoutOverload_UsesStringLiterals/Pages/Index_cshtml.g.cs | New baseline for non-UTF-8 emission. |
| src/Razor/src/Compiler/test/Microsoft.NET.Sdk.Razor.SourceGenerators.UnitTests/TestFiles/RazorSourceGeneratorCshtmlTests/Utf8HtmlLiterals_AutoDetectedFromInherits/Pages/Index_cshtml.g.cs | New baseline for UTF-8 emission. |
| src/Razor/src/Compiler/test/Microsoft.NET.Sdk.Razor.SourceGenerators.UnitTests/RazorSourceGeneratorCshtmlTests.cs | Adds source-generator scenario coverage. |
| src/Razor/src/Compiler/perf/Microsoft.AspNetCore.Razor.Microbenchmarks.Generator/Microsoft.AspNetCore.Razor.Microbenchmarks.Generator.csproj | Updates perf harness transport package version. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/SourceGenerators/SourceGeneratorProjectEngine.cs | Passes UTF-8 support map into final generation phase. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/SourceGenerators/RazorSourceGenerator.Helpers.cs | Registers UTF-8 feature on generation engine. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/SourceGenerators/RazorSourceGenerator.cs | Builds and wires the precomputed support map. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorProjectEngine.cs | Adds UTF-8 detection pass to default features. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorCodeGenerationOptions.Flags.cs | Adds option flag bit. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorCodeGenerationOptions.cs | Exposes option and flag plumbing. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorCodeGenerationOptions.Builder.cs | Adds builder surface for the new option. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorCodeDocumentExtensions.cs | Extracts @inherits and @using data from syntax trees. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/DefaultRazorCSharpLoweringPhase.cs | Uses per-document options during lowering. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/CodeGeneration/RuntimeNodeWriter.cs | Emits HTML literals using UTF-8 option. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/CodeGeneration/CodeWriterExtensions.cs | Adds UTF-8 suffix support to string literal writer. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/CSharp/Utf8WriteLiteralDetectionPass.cs | New pass that enables UTF-8 literals per file. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/CSharp/IUtf8WriteLiteralFeature.cs | New feature contract for support checks. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/CSharp/DefaultUtf8WriteLiteralFeature.cs | Implements support-map based detection logic. |
| src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/CSharp/CompilationExtensions.cs | Adds overload detection helpers on compilations. |
| src/Razor/src/Compiler/Microsoft.AspNetCore.Razor.Language/test/RazorProjectEngineTest.cs | Verifies default feature set includes new pass. |
| src/Razor/src/Compiler/Microsoft.AspNetCore.Razor.Language/test/CodeGeneration/RuntimeNodeWriterTest.cs | Adds node-writer UTF-8 output tests. |
| src/Razor/src/Compiler/Microsoft.AspNetCore.Mvc.Razor.Extensions/test/IntegrationTests/CodeGenerationIntegrationTest.cs | Adds integration coverage for UTF-8 literal emission. |

Comment on lines +43 to +46
var baseTypeName = baseType.BaseType.Content;
if (_utf8Feature.IsSupported(codeDocument.Source.FilePath, baseTypeName))
{
    documentNode.Options = documentNode.Options.WithFlags(writeHtmlUtf8StringLiterals: true);
Comment on lines +271 to +285
var utf8SupportMap = parsedDocuments
    .Select(static (item, _) =>
    {
        var codeDocument = item.Item3.CodeDocument;
        return (codeDocument, InheritsValue: codeDocument.GetInheritsDirectiveValue());
    })
    .Where(static item => item.InheritsValue is not null)
    .Select(static (item, _) => new DefaultUtf8WriteLiteralFeature.InheritsInfo(
        item.codeDocument.Source.FilePath ?? string.Empty, item.InheritsValue!, item.codeDocument.GetUsingDirectives()))
    .Collect()
    .Combine(declCompilation)
    .Select(static (pair, _) =>
    {
        var (inheritsInfos, compilation) = pair;
        return DefaultUtf8WriteLiteralFeature.Utf8SupportMap.Create(inheritsInfos, compilation);
chsienki and others added 2 commits May 5, 2026 18:23
- Utf8SupportMap.Create returns Empty when the consuming compilation is not C# 11+ so older projects don't get invalid 'u8' literals.

- Add 'using TModel = global::System.Object;' to each probe namespace so '@inherits Base<TModel>' (paired with @model) still resolves; WriteLiteral overloads don't depend on the model type argument.

- Add tests covering both fixes.
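
The alias trick in the second bullet can be sketched in plain C#. `SketchBase<T>`, `ProbeSketch`, and `TModelAliasSketch` are hypothetical names, and the real probe tree built by `Utf8SupportMap.Create` is not shown in this conversation; the sketch only demonstrates that a `using` alias lets `Base<TModel>` bind even though no type named `TModel` exists.

```csharp
using System;
// Alias standing in for the model type, mirroring the 'using TModel = global::System.Object;' line described above.
using TModel = System.Object;

// Hypothetical generic base class exposing the UTF-8 overload.
public class SketchBase<T>
{
    public void WriteLiteral(ReadOnlySpan<byte> value) { }
}

// With the alias in scope, 'SketchBase<TModel>' binds to SketchBase<object>.
public class ProbeSketch : SketchBase<TModel> { }

public static class TModelAliasSketch
{
    public static Type ProbeBaseType() => typeof(ProbeSketch).BaseType!;
}
```

Since `WriteLiteral` does not mention `T`, the overload is present whichever type argument is substituted, which is why an arbitrary placeholder such as `object` is sufficient for detection.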

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…literals-refactor

# Conflicts:
#	src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/CodeGeneration/RuntimeNodeWriter.cs
#	src/Razor/src/Compiler/Microsoft.CodeAnalysis.Razor.Compiler/src/Language/RazorProjectEngine.cs
Copilot AI review requested due to automatic review settings May 6, 2026 01:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Comment on lines +50 to +61
for (var currentType = type; currentType is not null; currentType = currentType.BaseType)
{
    foreach (var member in currentType.GetMembers("WriteLiteral"))
    {
        if (member is IMethodSymbol
            {
                IsStatic: false,
                ReturnsVoid: true,
                Parameters: [{ Type: var paramType }]
            } method &&
            SymbolEqualityComparer.Default.Equals(paramType, readOnlySpanOfByte) &&
            compilation.IsSymbolAccessibleWithin(method, type))
Comment on lines +118 to +126
var usings = new List<string>();
CollectUsings(syntaxTree, usings);

if (codeDocument.TryGetImportSyntaxTrees(out var importSyntaxTrees))
{
    foreach (var importTree in importSyntaxTrees)
    {
        CollectUsings(importTree, usings);
    }
chsienki and others added 2 commits May 6, 2026 17:04
- HasCallableUtf8WriteLiteralOverload no longer treats the base type itself as the lookup context. Private (and assembly-restricted internal) overloads on a referenced base now correctly fall back to string literals instead of producing inaccessible 'u8' calls.

- GetUsingDirectives now returns import usings before page usings, matching DefaultRazorIntermediateNodeLoweringPhase. Probe compilations now bind aliases that originate in _ViewImports.cshtml the same way the final generated code does.

- Add tests for private/internal-on-referenced-assembly fallback, protected detection, and import-defined alias resolution.
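
The accessibility rule described in this commit can be approximated with reflection. This is a rough analog and not the PR's Roslyn-based `HasCallableUtf8WriteLiteralOverload`: `OverloadProbeSketch` and its nested base classes are hypothetical, and the sketch assumes the generated page lives in a different assembly from the base type, so only public, protected, and protected internal overloads count as callable.

```csharp
using System;
using System.Reflection;

public static class OverloadProbeSketch
{
    // Hypothetical base classes with differently accessible overloads.
    public class PublicBase { public void WriteLiteral(ReadOnlySpan<byte> value) { } }
    public class ProtectedBase { protected void WriteLiteral(ReadOnlySpan<byte> value) { } }
    public class PrivateBase { private void WriteLiteral(ReadOnlySpan<byte> value) { } }
    public class InternalBase { internal void WriteLiteral(ReadOnlySpan<byte> value) { } }

    public static bool HasCallableOverload(Type baseType)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        // Walk the inheritance chain, as the Roslyn version does with BaseType.
        for (var current = baseType; current is not null; current = current.BaseType)
        {
            foreach (var method in current.GetMethods(flags))
            {
                if (method.Name != "WriteLiteral" || method.ReturnType != typeof(void))
                {
                    continue;
                }

                var parameters = method.GetParameters();
                if (parameters.Length != 1 || parameters[0].ParameterType != typeof(ReadOnlySpan<byte>))
                {
                    continue;
                }

                // Assumption: the caller is a subclass in another assembly.
                if (method.IsPublic || method.IsFamily || method.IsFamilyOrAssembly)
                {
                    return true;
                }
            }
        }

        return false;
    }
}
```

Under that assumption, `PublicBase` and `ProtectedBase` are detected, while `PrivateBase` and `InternalBase` fall back to string literals, matching the behavior the commit message describes for referenced base types.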

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Roslyn-wide BannedSymbols.txt forbids the (string, ...) overload in favor of the (SourceText, ...) overload. The Correctness_Analyzers CI leg builds with --runanalyzers --warnaserror so this fails the build.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 22:34
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 3 comments.

Comment on lines +120 to +132
var usings = new List<string>();

if (codeDocument.TryGetImportSyntaxTrees(out var importSyntaxTrees))
{
    foreach (var importTree in importSyntaxTrees)
    {
        CollectUsings(importTree, usings);
    }
}

CollectUsings(syntaxTree, usings);

return [.. usings];
Comment on lines +19 to +31
/// <summary>
/// Determines whether the type identified by <paramref name="typeMetadataName"/> has a callable
/// instance <c>WriteLiteral(ReadOnlySpan&lt;byte&gt;)</c> overload accessible from that type.
/// </summary>
public static bool HasCallableUtf8WriteLiteralOverload(this Compilation compilation, string typeMetadataName)
{
    var type = compilation.GetTypeByMetadataName(typeMetadataName);
    if (type is null || type.TypeKind == TypeKind.Error)
    {
        return false;
    }

    return compilation.HasCallableUtf8WriteLiteralOverload(type);
Comment on lines +59 to +63
/// <item>Per-file: maps <c>(filePath, rawInheritsText)</c> to a fully-qualified type name</item>
/// <item>Per-type: maps fully-qualified type name to <see langword="bool"/></item>
/// </list>
/// This handles cases where the same <c>@inherits</c> text resolves to different types
/// in different files (e.g., via <c>@using</c> aliases).


Development

Successfully merging this pull request may close these issues.

Add ability to opt-in to HTML literals being written as UTF8 string literals in generated class files

4 participants