Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

url/url: Add URL parser #476

Merged
merged 1 commit into from Jun 21, 2023
Merged

url/url: Add URL parser #476

merged 1 commit into from Jun 21, 2023

Conversation

Zer0-One
Copy link
Collaborator

@Zer0-One Zer0-One commented Feb 9, 2023

WIP. Comments, questions, concerns, death threats?

@codecov
Copy link

codecov bot commented Feb 9, 2023

Codecov Report

Base: 86.91% // Head: 80.49% // Decreases project coverage by -6.43% ⚠️

Coverage data is based on head (cf9a2fb) compared to base (4e87c53).
Patch coverage: 21.21% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #476      +/-   ##
==========================================
- Coverage   86.91%   80.49%   -6.43%     
==========================================
  Files          84       85       +1     
  Lines        4066     4521     +455     
==========================================
+ Hits         3534     3639     +105     
- Misses        532      882     +350     
Impacted Files Coverage Δ
url/url.cpp 24.11% <20.00%> (-71.54%) ⬇️
util/string.h 80.00% <21.42%> (-16.71%) ⬇️
url/url.h 50.00% <50.00%> (ø)
util/base_parser.h 95.65% <100.00%> (-4.35%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Zer0-One Zer0-One force-pushed the master branch 9 times, most recently from 4e62335 to 0044280 Compare February 10, 2023 18:41
Copy link
Sponsor Contributor

@GrayHatter GrayHatter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I started write a lot of extra comments that I've deleted, but then talked with @robinlinden and he thought that it was still reasonable to add some comments, if for no other reason than to start a conversation. Perhaps I'll readd the deleted comments at some point, depending on how this first set evolves.

I have a few qualms to how the spec defines URI parsing behavior. But I'm not sure which option is better, between trying to improve/reduce the implementation, or conforming exactly to the spec as written.

url/url.cpp Outdated Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
url/url.h Outdated Show resolved Hide resolved
@Zer0-One Zer0-One force-pushed the master branch 4 times, most recently from f7bf389 to cf9a2fb Compare February 13, 2023 04:06
@Zer0-One Zer0-One force-pushed the master branch 11 times, most recently from 5c48612 to bfb2280 Compare February 27, 2023 17:47
@Zer0-One Zer0-One force-pushed the master branch 2 times, most recently from 9c3959e to fc924cc Compare March 3, 2023 04:11
@Zer0-One Zer0-One force-pushed the master branch 3 times, most recently from f33bc79 to f055091 Compare March 26, 2023 18:53
url/url.h Outdated Show resolved Hide resolved
url/BUILD Outdated Show resolved Hide resolved
.bazelrc Show resolved Hide resolved
url/url.h Outdated Show resolved Hide resolved
url/url.h Outdated Show resolved Hide resolved
url/url_test.cpp Outdated Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
url/url.cpp Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
url/url.cpp Outdated Show resolved Hide resolved
@Zer0-One Zer0-One force-pushed the master branch 2 times, most recently from d5a50ef to 277e14f Compare April 27, 2023 15:52
@anonrig
Copy link

anonrig commented May 7, 2023

You could use https://github.com/ada-url/ada instead of implementing from scratch.

@robinlinden
Copy link
Owner

robinlinden commented May 8, 2023

@anonrig

You could use https://github.com/ada-url/ada instead of implementing from scratch.

Technically yes, but we won't for a few different reasons:

  • Relying on 3rd-party libraries for what's "core" in what we're doing doesn't seem like a good plan
  • ada-url didn't exist when Zer0-One started on this
  • I like that I can read our implementation side-by-side with the spec top-to-bottom and it all lines up (which is exactly what I did when reviewing for spec-compliance)
  • And most importantly, this is more fun

I'm assuming you had reasons to implement ada-url from scratch rather than using something like Boost.url as well. :P

@anonrig
Copy link

anonrig commented May 8, 2023

Yes, definitely makes sense.

I'm assuming you had reasons to implement ada-url from scratch rather than using something like Boost.url as well.

Yes, performance. Ada is 50% faster than Boost. https://www.yagiz.co/announcing-ada-url-parser-v2-0

And Boost follows RFC 3986 which is a lot different than WHATWG url spec

@Zer0-One
Copy link
Collaborator Author

Disabling bugprone-unchecked-optional-access from the clang-tidy config definitely stops clang-tidy from hanging. I disabled it as part of this PR.

@robinlinden robinlinden merged commit ded99b1 into robinlinden:master Jun 21, 2023
17 of 19 checks passed
@robinlinden
Copy link
Owner

I was able to reproduce the hang locally and got a call stack from clang-tidy-15:

#0  0x00007f1161df16a4 in ?? () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#1  0x00007f1161df0cef in clang::dataflow::WatchedLiteralsSolver::solve(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#2  0x00007f1161dded94 in clang::dataflow::DataflowAnalysisContext::querySolver(llvm::DenseSet<clang::dataflow::BoolValue*, llvm::DenseMapInfo<clang::dataflow::BoolValue*, void> >) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#3  0x00007f1161ddf010 in clang::dataflow::DataflowAnalysisContext::flowConditionImplies(clang::dataflow::AtomicBoolValue&, clang::dataflow::BoolValue&) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#4  0x00007f1161dfc06f in clang::dataflow::UncheckedOptionalAccessModel::merge(clang::QualType, clang::dataflow::Value const&, clang::dataflow::Environment const&, clang::dataflow::Value const&, clang::dataflow::Environment const&, clang::dataflow::Value&, clang::dataflow::Environment&) ()
   from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#5  0x00007f1161de5c15 in clang::dataflow::Environment::join(clang::dataflow::Environment const&, clang::dataflow::Environment::ValueModel&) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#6  0x00007f1161dec999 in clang::dataflow::transferBlock(clang::dataflow::ControlFlowContext const&, std::vector<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState>, std::allocator<llvm::Optional<clang::dataflow::TypeErasedDataflowAnalysisState> > >&, clang::CFGBlock const&, clang::dataflow::Environment const&, clang::dataflow::TypeErasedDataflowAnalysis&, std::function<void (clang::CFGStmt const&, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#7  0x00007f1161ded9f8 in clang::dataflow::runTypeErasedDataflowAnalysis(clang::dataflow::ControlFlowContext const&, clang::dataflow::TypeErasedDataflowAnalysis&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::TypeErasedDataflowAnalysisState const&)>) ()
   from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#8  0x000055cb35ab8624 in llvm::Expected<std::vector<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> >, std::allocator<llvm::Optional<clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> > > > > clang::dataflow::runDataflowAnalysis<clang::dataflow::UncheckedOptionalAccessModel>(clang::dataflow::ControlFlowContext const&, clang::dataflow::UncheckedOptionalAccessModel&, clang::dataflow::Environment const&, std::function<void (clang::Stmt const*, clang::dataflow::DataflowAnalysisState<clang::dataflow::UncheckedOptionalAccessModel::Lattice> const&)>) ()
#9  0x000055cb35ab8283 in clang::tidy::bugprone::UncheckedOptionalAccessCheck::check(clang::ast_matchers::MatchFinder::MatchResult const&) ()
#10 0x000055cb35f8afdc in ?? ()
#11 0x000055cb35fbeb1c in clang::ast_matchers::internal::BoundNodesTreeBuilder::visitMatches(clang::ast_matchers::internal::BoundNodesTreeBuilder::Visitor*) ()
#12 0x000055cb35f8aa0e in ?? ()
#13 0x000055cb35f8d4d1 in ?? ()
#14 0x000055cb35f8f44b in ?? ()
#15 0x000055cb35f8d628 in ?? ()
#16 0x000055cb35f95d5b in ?? ()
#17 0x000055cb35f8d509 in ?? ()
#18 0x000055cb35f5d461 in clang::ast_matchers::MatchFinder::matchAST(clang::ASTContext&) ()
#19 0x00007f116266dfdc in clang::MultiplexConsumer::HandleTranslationUnit(clang::ASTContext&) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#20 0x00007f1160a07e9b in clang::ParseAST(clang::Sema&, bool, bool) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#21 0x00007f116262fea7 in clang::FrontendAction::Execute() () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#22 0x00007f11625a2fb6 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#23 0x00007f1162859b11 in clang::tooling::FrontendActionFactory::runInvocation(std::shared_ptr<clang::CompilerInvocation>, clang::FileManager*, std::shared_ptr<clang::PCHContainerOperations>, clang::DiagnosticConsumer*) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#24 0x000055cb364957f6 in ?? ()
#25 0x00007f116285987f in clang::tooling::ToolInvocation::runInvocation(char const*, clang::driver::Compilation*, std::shared_ptr<clang::CompilerInvocation>, std::shared_ptr<clang::PCHContainerOperations>) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#26 0x00007f11628588df in clang::tooling::ToolInvocation::run() () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#27 0x00007f116285b3ae in clang::tooling::ClangTool::run(clang::tooling::ToolAction*) () from /usr/lib/llvm-15/bin/../lib/libclang-cpp.so.15
#28 0x000055cb36490a09 in clang::tidy::runClangTidy(clang::tidy::ClangTidyContext&, clang::tooling::CompilationDatabase const&, llvm::ArrayRef<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, llvm::IntrusiveRefCntPtr<llvm::vfs::OverlayFileSystem>, bool, bool, llvm::StringRef) ()
#29 0x000055cb3592dbb2 in clang::tidy::clangTidyMain(int, char const**) ()
#30 0x00007f1158a4bd90 in __libc_start_call_main (main=main@entry=0x55cb359289b0 <main>, argc=argc@entry=4, argv=argv@entry=0x7ffd366a60e8) at ../sysdeps/nptl/libc_start_call_main.h:58
#31 0x00007f1158a4be40 in __libc_start_main_impl (main=0x55cb359289b0 <main>, argc=4, argv=0x7ffd366a60e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd366a60d8) at ../csu/libc-start.c:392
#32 0x000055cb359288e5 in _start ()
(gdb) f 31
#31 0x00007f1158a4be40 in __libc_start_main_impl (main=0x55cb359289b0 <main>, argc=4, argv=0x7ffd366a60e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd366a60d8) at ../csu/libc-start.c:392
392     ../csu/libc-start.c: No such file or directory.
(gdb) p argv[3]
$1 = 0x7ffd366a7112 "/home/robin/code/hastur/url/url.cpp"

Looks like other people are hitting the same issue: llvm/llvm-project#55530, so I guess it'll stay disabled for a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants