Add multiprocessor support #154

Closed
ClawSeven opened this issue Mar 16, 2023 · 1 comment

Comments

@ClawSeven (Contributor)

No description provided.

@ClawSeven ClawSeven assigned ClawSeven and sdww0 and unassigned ClawSeven Mar 16, 2023
@tatetian tatetian changed the title Support multi-core Add multiprocessor support Jun 2, 2023
@tatetian tatetian mentioned this issue Jun 2, 2023
@LclclcdIsACat (Contributor) commented Sep 25, 2023

I think that adding multiprocessor support primarily requires completing the following four sub-tasks:

  • Improve the boot process for multiple processors.
  • Implement CPU-local (per-CPU) variables.
  • Refactor the scheduling mechanism to support multi-processor execution.
  • Implement load balancing across multiple processors.

Improve the boot process for multi-processors

At system startup, the hardware designates one processor as the Bootstrap Processor (BSP) and the remaining processors as Application Processors (APs). The BSP executes the bootloader to initialize and bootstrap the system, while the APs remain halted, waiting to be awakened by the BSP. They are still waiting even after control reaches the operating system entry point. During initialization, the operating system wakes the APs by sending interrupts through the Local APIC on the BSP.

Therefore, the operating system needs to obtain basic information about the other cores, typically through the Advanced Configuration and Power Interface (ACPI) table, before they can be awakened.

The process for waking up the processors is as follows:

  • Send an INIT IPI (write 0x000C4500 to the Interrupt Command Register (ICR)).
  • Wait for 10 ms.
  • Send the first SIPI (write 0x000C46__ to the ICR, where __ is the 4 KiB page number of the AP startup code).
  • Wait for 200 µs.
  • If the AP is still not running, send a second SIPI (Linux does not perform this step, as it considers one SIPI to be sufficient).
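The ICR values in the steps above can be sketched as follows. This is an illustrative helper, not Asterinas code; the trampoline address in the test is an assumption.

```rust
/// INIT IPI: delivery mode INIT, level assert, destination shorthand
/// "all excluding self" -> the 0x000C4500 value from the steps above.
const ICR_INIT: u32 = 0x000C4500;

/// SIPI (Start-Up IPI): same destination shorthand, delivery mode
/// Start-Up; the low byte is the vector, i.e. the 4 KiB page number
/// of the AP startup (trampoline) code.
fn icr_sipi(trampoline_paddr: u32) -> u32 {
    let vector = (trampoline_paddr >> 12) as u8; // page number below 1 MB
    0x000C4600 | vector as u32
}
```

For example, a trampoline placed at physical address 0x8000 yields the SIPI value 0x000C4608.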

There are some key points to keep in mind:

  • When the APs are awakened, they start in real mode and must transition to protected mode.
  • In real mode they can only access the first 1 MB of physical memory, so the AP initialization code must be placed below 1 MB and aligned on a 4 KB boundary (so its page number fits in the SIPI vector byte).
  • Finally, after setting up the GDT and enabling paging, the system can transition into long mode.
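The placement constraints above can be checked mechanically. A minimal sketch (a hypothetical helper, not part of any existing codebase):

```rust
/// Returns true if `paddr` can hold the AP trampoline: it must lie
/// below 1 MB (real-mode addressable) and be 4 KB-aligned (so the
/// page number can be encoded in the SIPI vector byte).
fn is_valid_trampoline(paddr: u64) -> bool {
    paddr < 0x10_0000 && paddr & 0xFFF == 0
}
```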

Refactor the scheduling mechanism to support multi-processor execution.

After initialization, the APs can start processing tasks by pulling them from the global scheduler. However, only part of the existing design can be reused in a multi-core environment, as other parts were not designed with multi-core support in mind. For example, while GLOBAL_SCHEDULER can already handle parallel requests from multiple processors, PROCESSOR is a single global variable and may need to become a CPU-local variable to support multi-processor architectures.
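A CPU-local variable of the kind needed for PROCESSOR might be sketched as one independent slot per processor. This is a simplified model under assumptions (a fixed CPU count, and a caller-supplied `cpu_id` that a real kernel would read from hardware, e.g. the Local APIC ID):

```rust
use std::sync::Mutex;

/// One independent value per processor, so each core can track its own
/// state without contending on a single global variable.
struct CpuLocal<T> {
    slots: Vec<Mutex<T>>,
}

impl<T: Default> CpuLocal<T> {
    fn new(num_cpus: usize) -> Self {
        Self { slots: (0..num_cpus).map(|_| Mutex::new(T::default())).collect() }
    }

    /// Run `f` on this CPU's slot. In a real kernel, `cpu_id` would be
    /// derived from a per-CPU hardware register rather than passed in.
    fn with<R>(&self, cpu_id: usize, f: impl FnOnce(&mut T) -> R) -> R {
        f(&mut *self.slots[cpu_id].lock().unwrap())
    }
}
```

Each CPU then sees its own value: writing through one `cpu_id` does not affect the slots of the others.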

Furthermore, the design of the global scheduler also requires consideration. Generally, there are two types:

  • The single global queue model (e.g., BFS), where all processors pull tasks from one shared queue in parallel, which can lead to significant lock contention and poor cache utilization;
  • The multiple-queues model (e.g., the O(1) scheduler and CFS), where each processor has an independent queue and a task is assigned to one processor for its entire execution, in order to preserve cache affinity.
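The multiple-queues model can be sketched as one run queue per processor, each behind its own lock, so enqueue/dequeue on different CPUs never contend. A minimal sketch with task IDs standing in for real task structures:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Multiple-queues model: each processor owns an independent run queue.
struct PerCpuScheduler {
    queues: Vec<Mutex<VecDeque<u32>>>, // task ids; real task state elided
}

impl PerCpuScheduler {
    fn new(num_cpus: usize) -> Self {
        Self { queues: (0..num_cpus).map(|_| Mutex::new(VecDeque::new())).collect() }
    }

    /// Assign `task` to `cpu`; under this model it stays there for its
    /// whole execution, preserving cache affinity.
    fn enqueue(&self, cpu: usize, task: u32) {
        self.queues[cpu].lock().unwrap().push_back(task);
    }

    fn pick_next(&self, cpu: usize) -> Option<u32> {
        self.queues[cpu].lock().unwrap().pop_front()
    }
}
```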

Implement load balancing across multiple processors.

Scheduling is particularly crucial for multi-processor systems: proper task placement ensures that every processor is fully utilized. When the distribution of tasks becomes uneven, a load balancing mechanism is needed to restore balance. Overall, we need to consider two aspects:

  • How to achieve load balancing.
  • When to perform load balancing.

How to achieve load balancing

Load balancing in a multi-processor system mainly relies on task migration between processors, which means transferring a task from a heavily loaded processor to a relatively lightly loaded one for execution. An idle processor taking tasks from another processor's run queue is called "pulling", while a busy processor placing tasks onto another processor's run queue is called "pushing".

Migration has a cost, and this cost varies: the more cache two processors share, the cheaper migration between them is. For example, migrating between two logical processors on the same physical core incurs a relatively small cost because they share the L1 and L2 caches.

Therefore, Linux organizes processors into a hierarchy called sched domains. For example, at the hyper-threading level, each logical processor forms a sched group within the physical core's sched domain; at the multi-socket level, each physical processor forms a sched group within the node's sched domain.
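The topology-dependent migration cost can be modeled in a sched-domain-like way. The hierarchy levels and cost numbers below are illustrative assumptions, not Linux's actual values:

```rust
/// Position of a logical CPU in an assumed two-level topology.
struct CpuTopo {
    core: u32,   // physical core id
    socket: u32, // physical package id
}

/// Relative migration cost: the fewer cache levels two CPUs share,
/// the more expensive moving a task between them is.
fn migration_cost(a: &CpuTopo, b: &CpuTopo) -> u32 {
    if a.socket != b.socket {
        100 // cross-socket: no shared cache, costliest
    } else if a.core != b.core {
        10 // same socket, different core: shared last-level cache only
    } else {
        1 // SMT siblings on one core: shared L1/L2, cheapest
    }
}
```

A balancer would prefer migration targets with the lowest cost, only crossing sockets when the imbalance justifies it.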

When to perform load balancing

One straightforward idea is to perform load balancing when a new task starts executing, when a task is awakened, or when a processor enters the idle state. These situations fall into two categories. When a task becomes ready to run (newly started or awakened), we need to find the relatively most idle processor for it. When a processor enters the idle state, we need to find the busiest processor and migrate tasks from it. Whether a processor is busy or idle can be estimated from a combination of the ticks elapsed since its last idle state and the number of tasks in its ready queue.
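The two balancing triggers above can be sketched with queue length as a (deliberately simplified) load metric; a real kernel would combine it with idle-time accounting as described:

```rust
/// Target for a waking task: the CPU with the shortest run queue.
fn idlest_cpu(queue_lens: &[usize]) -> usize {
    (0..queue_lens.len()).min_by_key(|&i| queue_lens[i]).unwrap()
}

/// Source for an idle CPU to pull from: the CPU with the longest run queue.
fn busiest_cpu(queue_lens: &[usize]) -> usize {
    (0..queue_lens.len()).max_by_key(|&i| queue_lens[i]).unwrap()
}
```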
