Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Page fault in media_server_addon and media_server #3

Closed
diegoroux opened this issue Jul 13, 2024 · 10 comments
Closed

Page fault in media_server_addon and media_server #3

diegoroux opened this issue Jul 13, 2024 · 10 comments
Assignees
Labels
bug Something isn't working

Comments

@diegoroux
Copy link
Owner

media_server_addon and media_server sometimes stops due to a page_fault.

vm_page_fault: vm_soft_fault returned error 'Bad address' on fault at 0x24, ip 0x21f8ce74391, write 0, user 1, exec 0, thread 0x13a
debug_server: Thread 314 entered the debugger: Segment violation
stack trace, current PC 0x21f8ce74391  </boot/system/lib/libmedia.so> _ZN11BTimeSource11RealTimeForEll + 0x11:
  (0x7fbba958c3e0)  0x21f8ce5dde3  </boot/system/lib/libmedia.so> _ZN17BMediaEventLooper11ControlLoopEv + 0x183
  (0x7fbba958c4c0)  0x21f8ce5d9ef  </boot/system/lib/libmedia.so> _ZN17BMediaEventLooper19_ControlThreadStartEPv + 0x1f
  (0x7fbba958c4e0)  0x72a3668dd9  </boot/system/lib/libroot.so> thread_entry + 0x19

debug_server: Thread 329 entered the debugger: General protection fault
stack trace, current PC 0x46aa7cf11b  </boot/system/lib/libmedia.so> _ZN12BMediaRoster12GetLiveNodesEP14live_node_infoPiPK12media_formatS5_PKcm + 0x17b:
  (0x7f87769c41e0)  0x161b8c9378c  </boot/system/servers/media_server> _ZN14DefaultManager13_FindPhysicalEPVijb10media_type + 0x18c
  (0x7f87769c4f50)  0x161b8c93e64  </boot/system/servers/media_server> _ZN14DefaultManager13_RescanThreadEv + 0x1c4
  (0x7f87769c5000)  0x161b8c93f09  </boot/system/servers/media_server> _ZN14DefaultManager13rescan_threadEPv + 0x9
  (0x7f87769c5010)  0x15d8b45add9  </boot/system/lib/libroot.so> thread_entry + 0x19

Thread 329 seems to fail with this error message:

SendToPort: write_port failed, msg code 0x703, port 233: Bad port ID
NodeManager::_AcquireNodeReference: node 1 not found
NodeManager::GetCloneForID: couldn't increment ref count, node 1 team 213
NodeManager::_AcquireNodeReference: node 1 not found
@diegoroux
Copy link
Owner Author

diegoroux commented Jul 18, 2024

Thread 329 seems to crash as a result of thread 314's segment violation.

@diegoroux
Copy link
Owner Author

Screenshot 2024-07-16 at 23 53 31

@diegoroux
Copy link
Owner Author

BMediaEventLooper::ControlLoop calls BTimeSource::RealTimeFor here:

/* virtual */ void
BMediaEventLooper::ControlLoop()
{
		// [...]
		hasRealtime = fRealTimeQueue.HasEvents();
		hasEvent = fEventQueue.HasEvents();

		if (hasEvent) {
			waitUntil = TimeSource()->RealTimeFor(
				fEventQueue.FirstEventTime(),
				fEventLatency + fSchedulingLatency);
		} 
		// [...]
}

A BTimeSource instance is obtained with TimeSource:

BTimeSource*
BMediaNode::TimeSource() const
{
	// [...]
	BMediaNode* self = const_cast<BMediaNode*>(this);
	self->fTimeSource = MediaRosterEx(
		BMediaRoster::Roster())->MakeTimeSourceObject(fTimeSourceID);

	ASSERT(fTimeSource == self->fTimeSource);

	if (fTimeSource == 0) {
		ERROR("BMediaNode::TimeSource: MakeTimeSourceFor failed\n");
	} else {
		ASSERT(fTimeSourceID == fTimeSource->ID());
		fTimeSource->AddMe(self);
	}

	TRACE("BMediaNode::TimeSource node %" B_PRId32 " leave\n", ID());

	return fTimeSource;
}

TimeSource() allows for a null pointer to be returned, allowing RealTimeFor method to possibly be called with this == NULL.

@diegoroux
Copy link
Owner Author

BMediaRosterEx::MakeTimeSourceObject: GetNodeFor failed: Operation timed out

@diegoroux
Copy link
Owner Author

309: DEBUGGER: BufferCache::GetBuffer: IDs mismatch
debug_server: Thread 309 entered the debugger: Debugger call: `BufferCache::GetBuffer: IDs mismatch'
stack trace, current PC 0x10078bd70a9  </boot/system/lib/libroot.so> _kern_debugger + 0x9:
  (0x7f47be767160)  0x17491d8b16f  </boot/system/lib/libmedia.so> _ZN15BBufferConsumer13HandleMessageEiPKvm + 0x37f
  (0x7f47be767360)  0x17491d97bc3  </boot/system/lib/libmedia.so> _ZN10BMediaNode14WaitForMessageEljPv + 0x163
  (0x7f47be76b3c0)  0x17491d93e25  </boot/system/lib/libmedia.so> _ZN17BMediaEventLooper11ControlLoopEv + 0x1c5
  (0x7f47be76b4a0)  0x17491d939ef  </boot/system/lib/libmedia.so> _ZN17BMediaEventLooper19_ControlThreadStartEPv + 0x1f
  (0x7f47be76b4c0)  0x10078bd5dd9  </boot/system/lib/libroot.so> thread_entry + 0x19

@diegoroux
Copy link
Owner Author

Weeks went into fixing this, but after quite some testing, it seems like I've managed to fix this.

Matched with unreliable appearance of this page fault, it seems to be related to the recorded real_time and speed of the buffer_exchange returns.

diegoroux added a commit that referenced this issue Jul 24, 2024
Fixes #3, implements #2 and #1.

Implements suggestions from korli in mentioned issues.

Recording functionality was added, there's still
work to do, performace is not where it should be.

Signed-off-by: Diego Roux <diegoroux04@protonmail.com>
@korli
Copy link

korli commented Jul 24, 2024

I'd recommend against using threads, it's not needed. It can be directly implemented directly in buffer_exchange(), something like. Having the next timestamp in each stream would make it easier to wait on the next one. The usedLength might be needed for the record stream.

	void* cookie;
	uint32 usedLength;
	do {
		snooze_until(stream->real_time + 1000000L / stream->rate * FRAMES_PER_BUFFER, CLOCK_REALTIME);
		acquire_sem(info->txSem);
		if (info->virtio->queue_dequeue(info->txQueue, &cookie, &usedLength)) {
			stream->buffer_cycle = (stream->buffer_cycle + 1) % BUFFERS;
			stream->real_time = system_time();
			stream->frames_count += FRAMES_PER_BUFFER;
		}
		
		status = send_playback_buffer(info, stream);
	} while (status == B_BUSY);
	if (status != B_OK) {
		ERROR("playback failed (%s)\n", strerror(status));
		return status;
	}

	buf_info.playback_buffer_cycle = stream->buffer_cycle;
	buf_info.played_real_time = stream->real_time;
	buf_info.played_frames_count = stream->frames_count;

@diegoroux
Copy link
Owner Author

diegoroux commented Jul 24, 2024

I was trying to maintain a thread-free approach in the buffer-exchange, #3 proved to be quite the tricky bug to catch, I had to restart qemu 5000 times, to only have the bug appear 100 times. A nightmare.

If I did solve this issue, it seemed related two to things. One being I shouldn't have gone playback-first then recording, one of the issues stemmed from returning:

buf_info.recorded_real_time = 0; // or whatever garbage value we had there.
buf_info.recorded_frames_count = 0; // same goes here.

even though we were signalling: buf_info.flags = B_MULTI_BUFFER_PLAYBACK; exclusively.

The second part, seemed to be related to buffer_exchange being a bit slow at return times.

We differed, compared to other audio drivers, in not using interrupts nor threads to update:

stream->buffer_cycle = (stream->buffer_cycle + 1) % BUFFERS;
stream->real_time = system_time();
stream->frames_count += FRAMES_PER_BUFFER;

at 3AM trying this approach did not sound farfetched, and showed effects issue #3 seemed to be fixed and could finally get back into the desktop.

I decided to push this commit, with two goals in mind: to see if issue #3 wasn't just solved in my machine, and to show progress in fixing this issue (it did slow me down, quite a lot).

But I do agree with you, I'd rather have this in buffer_exchange(), which is what I'll be working up to as of now.

While maybe I shouldn't have pushed a half-fix, it worked but its probably not the best approach, I appreciate the feedback.

@korli
Copy link

korli commented Jul 25, 2024

Well, if you wait in buffer_exchange() then update buf_info only when queue_dequeue succeeds with a status OK, everything should line up.

@diegoroux
Copy link
Owner Author

diegoroux commented Jul 25, 2024

Thanks for all your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants