JerryScript source code study notes #479

cisen · 2019-06-14T14:31:20Z

Description

https://github.com/pando-project/jerryscript
https://github.com/cisen/sourcecode-jerryscript-

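Build process

Requirements

Currently, only Ubuntu 14.04+ is officially supported as the primary development environment.
The build itself mainly needs:
gcc or any C99-compliant compiler (native or cross, e.g., arm-none-eabi)
cmake >= 2.8.12.2
The build and check scripts additionally need:
bash >= 4.3.11
cppcheck >= 1.61
vera++ >= 1.2.1
python >= 2.7.6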

# Once the dependencies are installed, the project can be built
sudo apt-get install gcc gcc-arm-none-eabi make cmake cppcheck vera++ python python-pip

For VS Code, install the Python, C/C++ and CMake extensions.

For the build scripts to run correctly, make sure the system provides the following tools:
awk
bc
find
sed

Build steps

Generate the configure command (note that cmake is used here):

configure_cmake_cmd ['cmake', '-B/home/cisen/桌面/develop/jerryscript/sourcecode-jerryscript/build', '-H/home/cisen/桌面/develop/jerryscript/sourcecode-jerryscript']

Generate the make command:

cmake_cmd ['make', '--no-print-directory', '-j', '9', '-C', '/home/cisen/桌面/develop/jerryscript/sourcecode-jerryscript/build']

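Notes:

The build uses a single file, build.py, to generate the build commands.
The Python script only configures the environment, generates the corresponding make command, and runs it.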

Running

Demo build log

Notes

Path of jerryscript.h: jerry-core/include/jerryscript.h
All JS token types are defined in the lexer_token_type_t enum in jerry-core/parser/js/js-lexer.h

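Summary

The engine itself is C code compiled to machine code by gcc. JS source is not translated to C: it is compiled to byte code, which the engine then interprets.
The JerryScript VM is an execution environment for JS; every operation is ultimately carried out by C code.
Only UTF-8 encoded JS source is supported.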

cisen · 2019-06-16T14:14:51Z

Source code

Data structures

JerryScript context (global environment)

/**
 * JerryScript context
 * (jerry-core/jcontext/jcontext.h)
 *
 * The purpose of this header is storing
 * all global variables for Jerry
 */
struct jerry_context_t
{
  /* The value of external context members must be preserved across initializations and cleanups. */
#ifdef JERRY_ENABLE_EXTERNAL_CONTEXT
#ifndef JERRY_SYSTEM_ALLOCATOR
  jmem_heap_t *heap_p; /**< point to the heap aligned to JMEM_ALIGNMENT. */
  uint32_t heap_size; /**< size of the heap */
#endif /* !JERRY_SYSTEM_ALLOCATOR */
#endif /* JERRY_ENABLE_EXTERNAL_CONTEXT */

  /* Update JERRY_CONTEXT_FIRST_MEMBER if the first non-external member changes */
  ecma_object_t *ecma_builtin_objects[ECMA_BUILTIN_ID__COUNT]; /**< pointer to instances of built-in objects */
#if ENABLED (JERRY_BUILTIN_REGEXP)
  const re_compiled_code_t *re_cache[RE_CACHE_SIZE]; /**< regex cache */
#endif /* ENABLED (JERRY_BUILTIN_REGEXP) */
  ecma_object_t *ecma_gc_objects_p; /**< List of currently alive objects. */
  jmem_heap_free_t *jmem_heap_list_skip_p; /**< This is used to speed up deallocation. */
  jmem_pools_chunk_t *jmem_free_8_byte_chunk_p; /**< list of free eight byte pool chunks */
#ifdef JERRY_CPOINTER_32_BIT
  jmem_pools_chunk_t *jmem_free_16_byte_chunk_p; /**< list of free sixteen byte pool chunks */
#endif /* JERRY_CPOINTER_32_BIT */
  jmem_free_unused_memory_callback_t jmem_free_unused_memory_callback; /**< Callback for freeing up memory. */
  const lit_utf8_byte_t * const *lit_magic_string_ex_array; /**< array of external magic strings */
  const lit_utf8_size_t *lit_magic_string_ex_sizes; /**< external magic string lengths */
  ecma_lit_storage_item_t *string_list_first_p; /**< first item of the literal string list */
#if ENABLED (JERRY_ES2015_BUILTIN_SYMBOL)
  ecma_lit_storage_item_t *symbol_list_first_p; /**< first item of the global symbol list */
#endif /* ENABLED (JERRY_ES2015_BUILTIN_SYMBOL) */
  ecma_lit_storage_item_t *number_list_first_p; /**< first item of the literal number list */
  ecma_object_t *ecma_global_lex_env_p; /**< global lexical environment */

#if ENABLED (JERRY_ES2015_MODULE_SYSTEM)
  ecma_module_t *ecma_modules_p; /**< list of referenced modules */
  ecma_module_context_t *module_top_context_p; /**< top (current) module parser context */
#endif /* ENABLED (JERRY_ES2015_MODULE_SYSTEM) */

  vm_frame_ctx_t *vm_top_context_p; /**< top (current) interpreter context */
  jerry_context_data_header_t *context_data_p; /**< linked list of user-provided context-specific pointers */
  size_t ecma_gc_objects_number; /**< number of currently allocated objects */
  size_t ecma_gc_new_objects; /**< number of newly allocated objects since last GC session */
  size_t jmem_heap_allocated_size; /**< size of allocated regions */
  size_t jmem_heap_limit; /**< current limit of heap usage, that is upon being reached,
                           *   causes call of "try give memory back" callbacks */
  ecma_value_t error_value; /**< currently thrown error value */
  uint32_t lit_magic_string_ex_count; /**< external magic strings count */
  uint32_t jerry_init_flags; /**< run-time configuration flags */
  uint32_t status_flags; /**< run-time flags (the top 8 bits are used for passing class parsing options) */

#ifndef CONFIG_ECMA_PROPERTY_HASHMAP_DISABLE
  uint8_t ecma_prop_hashmap_alloc_state; /**< property hashmap allocation state: 0-4,
                                          *   if !0 property hashmap allocation is disabled */
#endif /* !CONFIG_ECMA_PROPERTY_HASHMAP_DISABLE */

#if ENABLED (JERRY_BUILTIN_REGEXP)
  uint8_t re_cache_idx; /**< evicted item index when regex cache is full (round-robin) */
#endif /* ENABLED (JERRY_BUILTIN_REGEXP) */

#if ENABLED (JERRY_ES2015_BUILTIN_PROMISE)
  ecma_job_queueitem_t *job_queue_head_p; /**< points to the head item of the jobqueue */
  ecma_job_queueitem_t *job_queue_tail_p; /**< points to the tail item of the jobqueue */
#endif /* ENABLED (JERRY_ES2015_BUILTIN_PROMISE) */

#ifdef JERRY_VM_EXEC_STOP
  uint32_t vm_exec_stop_frequency; /**< reset value for vm_exec_stop_counter */
  uint32_t vm_exec_stop_counter; /**< down counter for reducing the calls of vm_exec_stop_cb */
  void *vm_exec_stop_user_p; /**< user pointer for vm_exec_stop_cb */
  ecma_vm_exec_stop_callback_t vm_exec_stop_cb; /**< user function which returns whether the
                                                 *   ECMAScript execution should be stopped */
#endif /* JERRY_VM_EXEC_STOP */

#ifdef VM_RECURSION_LIMIT
  uint32_t vm_recursion_counter;  /**< VM recursion counter */
#endif /* VM_RECURSION_LIMIT */

#ifdef JERRY_DEBUGGER
  uint8_t debugger_send_buffer[JERRY_DEBUGGER_TRANSPORT_MAX_BUFFER_SIZE]; /**< buffer for sending messages */
  uint8_t debugger_receive_buffer[JERRY_DEBUGGER_TRANSPORT_MAX_BUFFER_SIZE]; /**< buffer for receiving messages */
  jerry_debugger_transport_header_t *debugger_transport_header_p; /**< head of transport protocol chain */
  uint8_t *debugger_send_buffer_payload_p; /**< start where the outgoing message can be written */
  vm_frame_ctx_t *debugger_stop_context; /**< stop only if the current context is equal to this context */
  uint8_t *debugger_exception_byte_code_p; /**< Location of the currently executed byte code if an
                                            *   error occours while the vm_loop is suspended */
  jmem_cpointer_t debugger_byte_code_free_head; /**< head of byte code free linked list */
  jmem_cpointer_t debugger_byte_code_free_tail; /**< tail of byte code free linked list */
  uint32_t debugger_flags; /**< debugger flags */
  uint16_t debugger_received_length; /**< length of currently received bytes */
  uint8_t debugger_message_delay; /**< call receive message when reaches zero */
  uint8_t debugger_max_send_size; /**< maximum amount of data that can be sent */
  uint8_t debugger_max_receive_size; /**< maximum amount of data that can be received */
#endif /* JERRY_DEBUGGER */

#ifdef JERRY_ENABLE_LINE_INFO
  ecma_value_t resource_name; /**< resource name (usually a file name) */
#endif /* JERRY_ENABLE_LINE_INFO */

#ifdef JMEM_STATS
  jmem_heap_stats_t jmem_heap_stats; /**< heap's memory usage statistics */
#endif /* JMEM_STATS */

  /* This must be at the end of the context for performance reasons */
#ifndef CONFIG_ECMA_LCACHE_DISABLE
  /** hash table for caching the last access of properties */
  ecma_lcache_hash_entry_t lcache[ECMA_LCACHE_HASH_ROWS_COUNT][ECMA_LCACHE_HASH_ROW_LENGTH];
#endif /* !CONFIG_ECMA_LCACHE_DISABLE */
};

context_p

/**
 * Those members of a context which needs
 * to be saved when a sub-function is parsed.
 * (jerry-core/parser/js/js-parser-internal.h)
 */
typedef struct parser_saved_context_t
{
  /* Parser members. */
  uint32_t status_flags;                      /**< parsing options */
  uint16_t stack_depth;                       /**< current stack depth */
  uint16_t stack_limit;                       /**< maximum stack depth */
  struct parser_saved_context_t *prev_context_p; /**< last saved context */
  parser_stack_iterator_t last_statement;     /**< last statement position */

  /* Literal types */
  uint16_t argument_count;                    /**< number of function arguments */
  uint16_t register_count;                    /**< number of registers */
  uint16_t literal_count;                     /**< number of literals */

  /* Memory storage members. */
  parser_mem_data_t byte_code;                /**< byte code buffer */
  uint32_t byte_code_size;                    /**< byte code size for branches */
  parser_mem_data_t literal_pool_data;        /**< literal list */

#ifndef JERRY_NDEBUG
  uint16_t context_stack_depth;               /**< current context stack depth */
#endif /* !JERRY_NDEBUG */
} parser_saved_context_t;

/**
 * Shared parser context.
 * (one instance is shared by the whole parse; this is the context_p seen everywhere)
 */
typedef struct
{
  PARSER_TRY_CONTEXT (try_buffer);            /**< try_buffer */
  parser_error_t error;                       /**< error code */
  void *allocated_buffer_p;                   /**< dynamically allocated buffer
                                               *   which needs to be freed on error */
  uint32_t allocated_buffer_size;             /**< size of the dynamically allocated buffer */

  /* Parser members. */
  uint32_t status_flags;                      /**< status flags */
  uint16_t stack_depth;                       /**< current stack depth */
  uint16_t stack_limit;                       /**< maximum stack depth */
  parser_saved_context_t *last_context_p;     /**< last saved context */
  parser_stack_iterator_t last_statement;     /**< last statement position */

#if ENABLED (JERRY_ES2015_MODULE_SYSTEM)
  ecma_module_node_t *module_current_node_p;  /**< import / export node that is being processed */
  lexer_literal_t *module_identifier_lit_p;   /**< the literal for the identifier of the current element */
#endif /* ENABLED (JERRY_ES2015_MODULE_SYSTEM) */

  /* Lexer members. */
  lexer_token_t token;                        /**< current token */
  lexer_lit_object_t lit_object;              /**< current literal object */
  const uint8_t *source_p;                    /**< next source byte */
  const uint8_t *source_end_p;                /**< last source byte */
  parser_line_counter_t line;                 /**< current line */
  parser_line_counter_t column;               /**< current column */

  /* Compact byte code members. */
  cbc_argument_t last_cbc;                    /**< argument of the last cbc */
  uint16_t last_cbc_opcode;                   /**< opcode of the last cbc */

  /* Literal types */
  uint16_t argument_count;                    /**< number of function arguments */
  uint16_t register_count;                    /**< number of registers */
  uint16_t literal_count;                     /**< number of literals */

  /* Memory storage members. */
  parser_mem_data_t byte_code;                /**< byte code buffer */
  uint32_t byte_code_size;                    /**< current byte code size for branches */
  parser_list_t literal_pool;                 /**< literal list */
  parser_mem_data_t stack;                    /**< storage space */
  parser_mem_page_t *free_page_p;             /**< space for fast allocation */
  uint8_t stack_top_uint8;                    /**< top byte stored on the stack */

#ifndef JERRY_NDEBUG
  /* Variables for debugging / logging. */
  uint16_t context_stack_depth;               /**< current context stack depth */
#endif /* !JERRY_NDEBUG */

#ifdef PARSER_DUMP_BYTE_CODE
  int is_show_opcodes;                        /**< show opcodes */
  uint32_t total_byte_code_size;              /**< total byte code size */
#endif /* PARSER_DUMP_BYTE_CODE */

#ifdef JERRY_DEBUGGER
  parser_breakpoint_info_t breakpoint_info[PARSER_MAX_BREAKPOINT_INFO_COUNT]; /**< breakpoint info list */
  uint16_t breakpoint_info_count; /**< current breakpoint index */
  parser_line_counter_t last_breakpoint_line; /**< last line where breakpoint has been inserted */
#endif /* JERRY_DEBUGGER */

#ifdef JERRY_ENABLE_LINE_INFO
  parser_line_counter_t last_line_info_line; /**< last line where line info has been inserted */
#endif /* JERRY_ENABLE_LINE_INFO */
} parser_context_t;

/**
 * Context of interpreter, related to a JS stack frame
 * (the stack frame; jerry-core/vm/vm-defines.h)
 */
typedef struct vm_frame_ctx_t
{
  const ecma_compiled_code_t *bytecode_header_p;      /**< currently executed byte-code data */
  uint8_t *byte_code_p;                               /**< current byte code pointer */
  uint8_t *byte_code_start_p;                         /**< byte code start pointer */
  ecma_value_t *registers_p;                          /**< register start pointer (each slot is a uint32_t, 4 bytes) */
  ecma_value_t *stack_top_p;                          /**< stack top pointer (each slot is a uint32_t, 4 bytes) */
  ecma_value_t *literal_start_p;                      /**< literal list start pointer */
  ecma_object_t *lex_env_p;                           /**< current lexical environment */
  struct vm_frame_ctx_t *prev_context_p;              /**< previous context */
  ecma_value_t this_binding;                          /**< this binding */
  ecma_value_t block_result;                          /**< block result */
#ifdef JERRY_ENABLE_LINE_INFO
  ecma_value_t resource_name;                         /**< current resource name (usually a file name) */
  uint32_t current_line;                              /**< currently executed line */
#endif /* JERRY_ENABLE_LINE_INFO */
  uint16_t context_depth;                             /**< current context depth */
  uint8_t is_eval_code;                               /**< eval mode flag */
  uint8_t call_operation;                             /**< perform a call or construct operation */
} vm_frame_ctx_t;

Data structure of each literal / token

/**
 * Literal data.
 * (the per-token literal record)
 */
typedef struct
{
  union
  {
    ecma_value_t value;                  /**< literal value (not processed by the parser) */
    const uint8_t *char_p;               /**< character value */
    ecma_compiled_code_t *bytecode_p;    /**< compiled function or regexp pointer */
    uint32_t source_data;                /**< encoded source literal */
  } u;

#ifdef PARSER_DUMP_BYTE_CODE
  struct
#else /* !PARSER_DUMP_BYTE_CODE */
  union
#endif /* PARSER_DUMP_BYTE_CODE */
  {
    prop_length_t length;                /**< length of ident / string literal */
    uint16_t index;                      /**< real index during post processing */
  } prop;

  uint8_t type;                          /**< type of the literal */
  uint8_t status_flags;                  /**< status flags */
} lexer_literal_t;

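var  aa = {
    token:  {
        type: 0, // token type, one member of lexer_token_type_t (a one-byte value, not a pointer)
        literal_is_reserved: 0, // one-byte flag
        extra_value: 255, // one-byte value, usually shown by the debugger as '\377'
        flags: 0, // one-byte flags
        line: 1, // number, the source line
        column: 9, // number, the position within the line (the column)
    },
    // `source_p[0]` is the first byte of the fragment that `lexer_next_token` cuts out to decide the token type.
    // It is a pointer; the actual word can be located from line, column and length.
    source_p,
    // This seems to hold the result of lexing; it is walked with an iterator.
    // page_p belongs to the stack.
    stack: {
        // this seems…
        first_p,
        last_p,
        last_position
    },
    byte_code: {},
    lit_object: {},
    free_page_p: {}
}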

jerry_global_heap

/**
 *  Free region node
 *  (jerry-core/jmem/jmem.h)
 */
typedef struct
{
  uint32_t next_offset; /**< Offset of next region in list */
  uint32_t size; /**< Size of region */
} jmem_heap_free_t;

// jerry-core/jcontext/jcontext.h
struct jmem_heap_t
{
  jmem_heap_free_t first; /**< first node in free region list */
  uint8_t area[JMEM_HEAP_AREA_SIZE]; /**< heap area */
};

parser

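The parser's output is byte code.
In other words, once lexing is done the tokens are turned directly into byte code? According to the design notes translated further down in this thread, yes: the parser emits byte code without building an AST.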

lexer

parser_parse_statements

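Parsing distinguishes statements from words (tokens). This function only drives the statement-level loop: when one statement ends, it loops again and dispatches to the next statement through the switch, until the end of the source.
The switch in this function routes each kind of statement to its own handler. The statement handlers take care of splitting words and emitting byte code (JerryScript builds no AST nodes).
The loop over the source lives mainly in parser_parse_statements; each statement handler then has its own loop over the words.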

lexer_next_token

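What this function does:

First it splits off a word with lexer_parse_identifier (tracked by line, column, length).
Based on the token type found for that word, it updates context_p->token in the shared parser context, then advances source_p so that it points at the next token.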

lexer_skip_spaces

Checks which kind of token the upcoming fragment belongs to and skips it by updating type, flags, line and column. It simply skips whitespace tokens.

lexer_parse_identifier

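This function finds keywords and identifiers (such as parameter names). Three fields under context_p locate the word: line, column, and length. The source text itself is reached through source_p.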
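
The bookkeeping described for lexer_skip_spaces and lexer_parse_identifier can be pictured with a toy scanner. This is a minimal sketch, not JerryScript code: `sketch_ctx_t`, `sketch_skip_spaces` and `sketch_parse_identifier` are hypothetical names, only ASCII spaces, tabs and newlines are treated as whitespace, and the column is assumed to advance by one per byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the lexer fields of the parser context. */
typedef struct
{
  const char *source_p;     /* next source byte */
  const char *source_end_p; /* one past the last source byte */
  unsigned line;            /* current line, starting at 1 */
  unsigned column;          /* current column, starting at 1 */
} sketch_ctx_t;

/* Skips spaces, tabs and newlines while keeping line / column up to date. */
static void
sketch_skip_spaces (sketch_ctx_t *ctx_p)
{
  while (ctx_p->source_p < ctx_p->source_end_p)
  {
    char c = *ctx_p->source_p;

    if (c == '\n')
    {
      ctx_p->line++;
      ctx_p->column = 1;
    }
    else if (c == ' ' || c == '\t')
    {
      ctx_p->column++;
    }
    else
    {
      break;
    }
    ctx_p->source_p++;
  }
}

/* Consumes one identifier-like word and returns its length in bytes. */
static size_t
sketch_parse_identifier (sketch_ctx_t *ctx_p)
{
  const char *start_p = ctx_p->source_p;

  while (ctx_p->source_p < ctx_p->source_end_p
         && (isalnum ((unsigned char) *ctx_p->source_p)
             || *ctx_p->source_p == '_'
             || *ctx_p->source_p == '$'))
  {
    ctx_p->source_p++;
  }

  size_t length = (size_t) (ctx_p->source_p - start_p);
  ctx_p->column += (unsigned) length;
  return length;
}
```

Calling the two functions alternately on `var  aa` yields a word of length 3, then a skip of two spaces, then a word of length 2, which is the same line / column / length triple the notes above refer to.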

lexer_parse_string

This one is also key: it loops over the characters to work out where the name or literal ends.

Byte code

The byte-related limits are defined in jerry-core/parser/js/js-parser-limits.h

ecma_value_t

Description:

/**
 * Description of an ecma value
 *
 * Bit-field structure: type (3) | value (29)
 * Example: 72 = 64 + 8 = 0b1001000, so the three type bits are 000.
 */
typedef uint32_t ecma_value_t;

The types are:

/**
 * Type of ecma value
 */
typedef enum
{
  ECMA_TYPE_DIRECT = 0, /**< directly encoded value, a 28 bit signed integer or a simple value */
  ECMA_TYPE_STRING = 1, /**< pointer to description of a string */
  ECMA_TYPE_FLOAT = 2, /**< pointer to a 64 or 32 bit floating point number */
  ECMA_TYPE_OBJECT = 3, /**< pointer to description of an object */
  ECMA_TYPE_SYMBOL = 4, /**< pointer to description of a symbol */
  ECMA_TYPE_DIRECT_STRING = 5, /**< directly encoded string values */
  ECMA_TYPE_ERROR = 7, /**< pointer to description of an error reference (only supported by C API) */
  ECMA_TYPE_POINTER = ECMA_TYPE_ERROR, /**< a generic aligned pointer */
  ECMA_TYPE_SNAPSHOT_OFFSET = ECMA_TYPE_ERROR, /**< offset to a snapshot number/string */
  ECMA_TYPE___MAX = ECMA_TYPE_ERROR /** highest value for ecma types */
} ecma_type_t;
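
The `type (3) | value (29)` layout can be checked with a few lines of C. This is a minimal sketch that assumes the type tag sits in the three least significant bits, which is what the worked example (72 has type bits 000) implies; `SKETCH_TYPE_MASK`, `sketch_get_type` and `sketch_get_payload` are hypothetical names, not JerryScript API.

```c
#include <stdint.h>

typedef uint32_t ecma_value_t;

/* Hypothetical mask for this sketch: the three lowest bits hold the type tag. */
#define SKETCH_TYPE_MASK 0x7u

/* Returns the 3-bit type tag (0 = direct, 1 = string, 2 = float, 3 = object, ...). */
static uint32_t
sketch_get_type (ecma_value_t value)
{
  return value & SKETCH_TYPE_MASK;
}

/* Returns the remaining 29 bits that sit above the type tag. */
static uint32_t
sketch_get_payload (ecma_value_t value)
{
  return value >> 3;
}
```

For the value 72 this gives type 0 (ECMA_TYPE_DIRECT) and payload 9.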

Memory structures of the parser stage

/**
 * All data allocated by the parser is
 * stored in parser_data_pages in the memory.
 */
typedef struct parser_mem_page_t
{
  struct parser_mem_page_t *next_p;           /**< next page */
  uint8_t bytes[1];                           /**< memory bytes (the stored data) */
} parser_mem_page_t;

/**
 * Structure for managing parser memory.
 * (in effect a linked list of pages)
 */
typedef struct
{
  parser_mem_page_t *first_p;                 /**< first allocated page */
  parser_mem_page_t *last_p;                  /**< last allocated page */
  uint32_t last_position;                     /**< position of the last allocated byte */
} parser_mem_data_t;

/**
 * Parser memory list.
 */
typedef struct
{
  parser_mem_data_t data;                     /**< storage space */
  uint32_t page_size;                         /**< size of each page */
  uint32_t item_size;                         /**< size of each item */
  uint32_t item_count;                        /**< number of items on each page */
} parser_list_t;

/**
 * Iterator for parser memory list.
 */
typedef struct
{
  parser_list_t *list_p;                      /**< parser list */
  parser_mem_page_t *current_p;               /**< currently processed page */
  size_t current_position;                    /**< current position on the page */
} parser_list_iterator_t;

/**
 * Parser memory stack.
 */
typedef struct
{
  parser_mem_data_t data;                     /**< storage space */
  parser_mem_page_t *free_page_p;             /**< space for fast allocation */
} parser_stack_t;

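Notes

Type definitions live mainly in two places:

JERRY_STATIC_ASSERT is defined in jerry-core/jrt/jrt.h
jerry-core/config.h also defines many types
The jerry-core/lit directory holds many global macros, for example LIT_CHAR_LF
Reserved-character tokens are matched in a fixed order, for example * comes before +; that order is the precedence
jerry-core/parser/js/js-lexer.c and jerry-core/parser/js/js-lexer.h are where the JS keywords are stored
Macro that converts a lexer token to an opcode: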

#define LEXER_BINARY_OP_TOKEN_TO_OPCODE(token_type) \
   ((cbc_opcode_t) ((((token_type) - LEXER_BIT_OR) * 3) + CBC_BIT_OR))
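
The factor of 3 in this macro presumably reflects that each binary operator comes as three consecutive opcodes (operands on the stack, right operand as a literal, both operands as literals), as the CBC_ADD / CBC_ADD_RIGHT_LITERAL / CBC_ADD_TWO_LITERALS rows quoted later in this thread suggest. The sketch below reproduces the arithmetic with hypothetical, shortened enums; the names and numeric values are made up for illustration and do not match the real lexer_token_type_t or cbc_opcode_t.

```c
/* Hypothetical token enum: binary operator tokens are numbered consecutively. */
typedef enum
{
  SKETCH_TOKEN_BIT_OR = 10,
  SKETCH_TOKEN_BIT_XOR,
  SKETCH_TOKEN_BIT_AND
} sketch_token_t;

/* Hypothetical opcode enum: three opcodes per operator, in the same operator order. */
typedef enum
{
  SKETCH_OP_BIT_OR = 40,
  SKETCH_OP_BIT_OR_RIGHT_LITERAL,
  SKETCH_OP_BIT_OR_TWO_LITERALS,
  SKETCH_OP_BIT_XOR,
  SKETCH_OP_BIT_XOR_RIGHT_LITERAL,
  SKETCH_OP_BIT_XOR_TWO_LITERALS,
  SKETCH_OP_BIT_AND,
  SKETCH_OP_BIT_AND_RIGHT_LITERAL,
  SKETCH_OP_BIT_AND_TWO_LITERALS
} sketch_opcode_t;

/* Same arithmetic as LEXER_BINARY_OP_TOKEN_TO_OPCODE, applied to the toy enums. */
static sketch_opcode_t
sketch_token_to_opcode (sketch_token_t token_type)
{
  return (sketch_opcode_t) (((token_type - SKETCH_TOKEN_BIT_OR) * 3) + SKETCH_OP_BIT_OR);
}
```

The mapping only works while both enums list the operators in the same order, so the conversion costs one subtraction, one multiplication and one addition instead of a lookup table.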

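Questions

How is a function's scope chain implemented?
And what about the scope chain of anonymous functions?
How do data and assembly instructions come together during computation?
Both data and instructions are stored in memory. Only at execution time are the instructions fetched from memory, after which the data is loaded from memory into registers and operated on.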

cisen · 2019-06-16T15:44:00Z

Progress

Day 1, 6.15: built the official getting-started example; learned the Python build script and the cmake build process

#include "jerryscript.h"

int
main (void)
{
  const jerry_char_t script[] = "var str = 'Hello, World!';";

  bool ret_value = jerry_run_simple (script, sizeof (script) - 1, JERRY_INIT_EMPTY);

  return (ret_value ? 0 : 1);
}

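Day 2, 6.16: got the first demo debugging in VS Code, breakpoints work; picked up a lot of C syntax
Day 3, 6.17: located token parsing; learned C structs and the & (address-of) operator; worked out that tokens are parsed in a loop
Day 4, 6.18: finished the lexer; located the byte code and the VM; looked for a simplified C compiler and VM
Day 5, 6.19: translated JerryScript's byte-code documentation; studied how virtual machines are implemented
Day 6, 6.20: studied token generation in c4 and the c4 source; learned to tell byte code from assembly
Day 7, 6.21: got through c4's pointers and part of its source; came back to locate the vm_loop function in JerryScript's vm.c
Day 8, 6.22: got familiar with C pointers and memory, debugging with gdb, and the while (0) idiom
Day 9, 6.23: dug into vm_loop
Day 10, 6.24: worked out how opcodes work; started analyzing memory
Day 11, 6.25: found the definitions of the three global structures (JerryScript context, context_p, parser_saved_context_t) and translated their comments
Day 12, 6.26: found that JS execution ultimately runs as C operations, e.g. addition in case VM_OC_ADD:; worked out the definition of ecma_value
Day 13, 6.27: studied several functions that encode and decode byte codes and opcodes; tracked down how the values in vm_decode_table line up with the opcodes
Day 14, 6.28: worked out the relationship between CBC and VM_CODE, and between the jerry_context stack and the jerry_global_heap heap
Day 15, 6.29: worked through token splitting and how the lexer produces token types, plus the global variables this depends on, such as jerry_context
Day 16, 6.30: worked out how parameter names are lexed; analyzed the byte_code object and the literal_pool list, especially how the literal_pool list (stack) relates to lit_object (heap)
Day 17, 7.1: worked out how function objects are created after tokenizing; concluded that function scope comes down to running vm_run repeatedly
Day 18, 7.2: studied the jmem wrappers and found that by default it uses the stack rather than the heap; studied the stack frame layout; worked out GC reference counting via type_flags_refs. The basic flow now makes sense, so from here on the source is read problem by problem
Day 19, 7.3: confirmed that source characters are matched to a token.type, which is then turned into CBC plus literal_pool entries; at run time the CBC is converted to OC operations. Operator precedence is the complicated part: traced 1+2*3 to VM_OC_GET_LITERAL_LITERAL and went back to the compilers textbook
Day 20, 7.4: read the first three chapters of a compilers textbook; learned how recursive descent LL(1) parsing turns source into an AST; thought about how the flattened CBC and literals execute inside the VM
Day 21, 7.5: day off
Day 22, 7.6: rest
Day 23, 7.7: resolved how == and === are evaluated, plus this_binding, new, now, undefined and related questions. Source analysis paused for now; estimated progress 50%.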

cisen · 2019-06-19T02:16:23Z

parser

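The figure above shows the interactions between the major components of JerryScript: the Parser and the Virtual Machine (VM). The parser translates the input ECMAScript application into byte-code with a specified format (see the byte-code and parser pages for details). The prepared byte-code is executed by the virtual machine, which performs the interpretation (see the virtual machine and ECMA pages for details).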

Parser

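The parser is implemented as a recursive descent parser. It converts the JavaScript source code directly into byte-code without building an abstract syntax tree. The parser depends on the following subcomponents.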

lexer

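The lexer splits the input string (the ECMAScript program) into a sequence of tokens. It can not only scan the input string forward, it can also move to an arbitrary position. The token structure is described by lexer_token_t in ./jerry-core/parser/js/js-lexer.h.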

scanner

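The scanner (./jerry-core/parser/js/js-parser-scanner.c) pre-scans the input string to find certain tokens. For example, the scanner determines whether the keyword for starts a general for loop or a for-in loop. Reading tokens in a while loop is not enough, because a slash (/) can be either the start of a regular expression or a division operator.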

Expression Parser

The expression parser is responsible for parsing JavaScript expressions. It is implemented in ./jerry-core/parser/js/js-parser-expr.c.

Statement Parser

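JavaScript statements are parsed by this component. It uses the expression parser to parse the constituent expressions. The statement parser is implemented in ./jerry-core/parser/js/js-parser-statm.c.

The function parser_parse_source carries out the parsing and compiling of the input ECMAScript source code. When a function appears in the source, parser_parse_source calls parser_parse_function, which is responsible for processing the source code of functions recursively, including argument parsing and context handling. After parsing, the function parser_post_processing dumps the created opcodes and returns an ecma_compiled_code_t * that points to the compiled byte-code sequence.

The interactions between the major components are shown in the following figure.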

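Compact byte-code (CBC)

Compiled Code Format

Structure

It consists of three parts:
- header: a cbc_compiled_code structure with several fields; these fields hold the key properties of the compiled code
- literals: an array of ecma values that can hold any ECMAScript value type, e.g. strings, numbers, functions and regexp templates. The number of literals is stored in the literal_end field of the header
- CBC instruction list: a sequence of byte-code instructions that represents the compiled code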

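Byte-code Format

Structure

It consists of two parts:
- opcode: every byte-code starts with an opcode. The opcode is one byte long for frequent instructions and two bytes long for rare instructions. The first byte of a rare instruction is always zero (CBC_EXT_OPCODE) and the second byte is the extended opcode. The names of frequent and rare instructions start with the CBC_ and CBC_EXT_ prefixes respectively. The maximum number of opcodes is 511, since 255 frequent (zero is excluded) and 256 rare instructions can be defined. Currently around 230 frequent and 120 rare instructions are available.
- arguments: there are three types of argument:
- - byte argument: a value between 0 and 255, which often represents the argument count of call-like opcodes (function call, new, eval, etc.).
- - literal argument: an integer index that is greater than or equal to zero and less than the literal_end field of the header.
- - relative branch: a 1-3 byte long offset. The branch argument may also mark the end of an instruction range. For example, the branch argument of CBC_EXT_WITH_CREATE_CONTEXT shows the end of the with statement, more precisely the position after the last instruction.
- - argument combinations are limited to the following seven forms:
- - - no arguments
- - - a literal argument
- - - a byte argument
- - - a branch argument
- - - a byte and a literal arguments
- - - two literal arguments
- - - three literal arguments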
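
The one-byte / two-byte opcode rule above can be sketched as a small decoder. This is an illustration under stated assumptions, not the engine's code: `sketch_read_opcode` is a hypothetical name, and numbering the rare instructions as 256 plus the extended opcode is a choice made for this sketch so that every instruction gets a distinct number below 512.

```c
#include <stddef.h>
#include <stdint.h>

/* First byte of every two-byte (rare) instruction, as described in the text. */
#define SKETCH_EXT_OPCODE 0x00u

/* Reads one opcode at code_p[*pos_p] and advances *pos_p past it.
 * Frequent opcodes are returned as 1..255, rare ones as 256..511. */
static uint16_t
sketch_read_opcode (const uint8_t *code_p, size_t *pos_p)
{
  uint8_t first = code_p[(*pos_p)++];

  if (first != SKETCH_EXT_OPCODE)
  {
    return first;
  }

  /* Rare instruction: the second byte is the extended opcode. */
  return (uint16_t) (256u + code_p[(*pos_p)++]);
}
```

With the byte sequence 0x05, 0x00, 0x07 the first call returns 5 and consumes one byte, and the second call returns 263 and consumes two.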

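Literals

Literals are organized into groups that represent the various literal types. Grouping them this way takes less space than tagging every literal individually.

There are two main groups of literals:

identifiers: references to variable names. The literals between zero and ident_end of the header belong here. All of these literals must be a string or undefined. Undefined can only be used for literals that cannot be accessed by a literal name. For example, function (arg, arg) has two arguments, but the arg identifier only refers to the second argument, so the name of the first argument is undefined. Optimizations such as CSE may also introduce literals without a name. Identifiers have two further sub-groups:
- Registers: identifiers that are stored in the function call stack
- Arguments: identifiers that are passed in by the calling function
values: references to values. The literals between ident_end and const_literal_end are constant values, such as numbers or strings. These literals can be used directly by the virtual machine. The literals between const_literal_end and literal_end are template literals: a new object has to be constructed each time their value is accessed. These literals are functions and regular expressions.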

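CBC has two kinds of literal encoding. Both are variable length: one or two bytes.

small: at most 511 literals can be encoded
- one-byte encoding for literals 0-254: byte[0] = literal_index
- two-byte encoding for literals 255-510:

byte[0] = 0xff
byte[1] = literal_index - 0xff

full: at most 32767 literals can be encoded
- one-byte encoding for literals 0-127: byte[0] = literal_index
- two-byte encoding for literals 128-32767:

byte[0] = (literal_index >> 8) | 0x80
byte[1] = (literal_index & 0xff)

Since most functions need fewer than 255 literals, the small encoding gives every literal a single-byte index. The small encoding takes less space than the full encoding, but its range is limited.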
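
The two encodings can be written out as encode / decode pairs. A minimal sketch with hypothetical function names; for the full encoding it assumes that the one-byte form covers indices 0-127, which follows from the top bit of byte[0] (0x80) being the marker for the two-byte form.

```c
#include <stddef.h>
#include <stdint.h>

/* Small encoding, indices 0..510. Returns the number of bytes written. */
static size_t
sketch_encode_small (uint16_t index, uint8_t *out_p)
{
  if (index < 0xff)
  {
    out_p[0] = (uint8_t) index;
    return 1;
  }
  out_p[0] = 0xff; /* marker: a second byte follows */
  out_p[1] = (uint8_t) (index - 0xff);
  return 2;
}

static uint16_t
sketch_decode_small (const uint8_t *in_p)
{
  if (in_p[0] != 0xff)
  {
    return in_p[0];
  }
  return (uint16_t) (0xff + in_p[1]);
}

/* Full encoding, indices 0..32767. Returns the number of bytes written. */
static size_t
sketch_encode_full (uint16_t index, uint8_t *out_p)
{
  if (index < 0x80)
  {
    out_p[0] = (uint8_t) index;
    return 1;
  }
  out_p[0] = (uint8_t) ((index >> 8) | 0x80); /* top bit marks the two-byte form */
  out_p[1] = (uint8_t) (index & 0xff);
  return 2;
}

static uint16_t
sketch_decode_full (const uint8_t *in_p)
{
  if ((in_p[0] & 0x80) == 0)
  {
    return in_p[0];
  }
  return (uint16_t) (((in_p[0] & 0x7f) << 8) | in_p[1]);
}
```

Index 300, for example, becomes 0xff 0x2d in the small encoding and 0x81 0x2c in the full encoding; both decode back to 300.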

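Literal Store

JerryScript does not have a global string table for literals; it stores them in the Literal Store instead. During the parsing phase, when a new literal appears whose identifier has already occurred before, the string is not stored again and the identifier already in the Literal Store is used. If the new literal is not in the Literal Store yet, it is inserted.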

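Byte-code Categories

Byte-codes can be placed into four main categories:

Push byte-codes

Byte-codes of this category place objects onto the stack. Since CBC has many compound instructions, there are also many push instructions, differing in the number and type of their arguments. The table below lists a few of these opcodes with a brief description.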

byte-code	description
CBC_PUSH_LITERAL	Pushes the value of the given literal argument.
CBC_PUSH_TWO_LITERALS	Pushes the value of the given two literal arguments.
CBC_PUSH_UNDEFINED	Pushes an undefined value.
CBC_PUSH_TRUE	Pushes a logical true.
CBC_PUSH_PROP_LITERAL	Pushes a property whose base object is popped from the stack, and the property name is passed as a literal argument.

Call byte-codes

byte-code	description
CBC_CALL0	Calls a function without arguments. The return value won't be pushed onto the stack.
CBC_CALL1	Calls a function with one argument. The return value won't be pushed onto the stack.
CBC_CALL	Calls a function with n arguments. n is passed as a byte argument. The return value won't be pushed onto the stack.
CBC_CALL0_PUSH_RESULT	Calls a function without arguments. The return value will be pushed onto the stack.
CBC_CALL1_PUSH_RESULT	Calls a function with one argument. The return value will be pushed onto the stack.
CBC_CALL2_PROP	Calls a property function with two arguments. The base object, the property name, and the two arguments are on the stack.

Arithmetic, logical, bitwise and assignment byte-codes

byte-code	description
CBC_LOGICAL_NOT	Negates the logical value that popped from the stack. The result is pushed onto the stack.
CBC_LOGICAL_NOT_LITERAL	Negates the logical value that given in literal argument. The result is pushed onto the stack.
CBC_ADD	Adds two values that are popped from the stack. The result is pushed onto the stack.
CBC_ADD_RIGHT_LITERAL	Adds two values. The left one popped from the stack, the right one is given as literal argument.
CBC_ADD_TWO_LITERALS	Adds two values. Both are given as literal arguments.
CBC_ASSIGN	Assigns a value to a property. It has three arguments: base object, property name, value to assign.
CBC_ASSIGN_PUSH_RESULT	Assigns a value to a property. It has three arguments: base object, property name, value to assign. The result will be pushed onto the stack.

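Branch byte-codes

A branch is a jump: branch byte-codes perform conditional and unconditional jumps within the byte-code. The argument of these instructions is a 1-3 byte long relative offset. The number of offset bytes is part of the opcode, so every byte-code with a branch argument has three forms. Because the offset is an unsigned value, the direction (forward, backward) is also defined by the opcode. Certain branch instructions therefore have six forms. Some examples can be found in the table below.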

byte-code	description
CBC_JUMP_FORWARD	Jumps forward by the 1 byte long relative offset argument.
CBC_JUMP_FORWARD_2	Jumps forward by the 2 byte long relative offset argument.
CBC_JUMP_FORWARD_3	Jumps forward by the 3 byte long relative offset argument.
CBC_JUMP_BACKWARD	Jumps backward by the 1 byte long relative offset argument.
CBC_JUMP_BACKWARD_2	Jumps backward by the 2 byte long relative offset argument.
CBC_JUMP_BACKWARD_3	Jumps backward by the 3 byte long relative offset argument.
CBC_BRANCH_IF_TRUE_FORWARD	Jumps if the value on the top of the stack is true by the 1 byte long relative offset argument.
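
How a 1-3 byte unsigned offset plus an opcode-defined direction turns into a jump target can be sketched as follows. The names are hypothetical, and two details are assumptions of this sketch rather than facts from the text: the offset bytes are read most significant byte first, and the offset is applied to whatever base position the caller passes in.

```c
#include <stddef.h>
#include <stdint.h>

/* Assembles an unsigned offset from `length` (1..3) argument bytes,
 * most significant byte first (an assumption of this sketch). */
static uint32_t
sketch_read_branch_offset (const uint8_t *arg_p, unsigned length)
{
  uint32_t offset = 0;

  for (unsigned i = 0; i < length; i++)
  {
    offset = (offset << 8) | arg_p[i];
  }
  return offset;
}

/* Applies the offset in the direction selected by the opcode. */
static size_t
sketch_jump_target (size_t base_pos, uint32_t offset, int is_backward)
{
  return is_backward ? base_pos - offset : base_pos + offset;
}
```

Because the direction lives in the opcode, the offset never needs a sign bit, which is why a jump such as CBC_JUMP_FORWARD has a separate CBC_JUMP_BACKWARD counterpart for each offset length.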

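Snapshot

The compiled byte-code can be saved into a snapshot, which can be loaded and executed again later. Executing a snapshot directly saves the cost of parsing the source, in terms of both memory consumption and performance. The snapshot can also be executed from ROM, in which case the overhead of loading it into the memory can also be saved.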

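Virtual machine

The virtual machine is an interpreter that executes byte-code instructions one by one. The function that starts the interpretation is vm_run in ./jerry-core/vm/vm.c. vm_loop is the main loop of the virtual machine, and its peculiarity is that it is non-recursive (it does not call itself). On a function call it does not call itself recursively but returns instead, which has the benefit of not burdening the stack the way a recursive implementation would.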

ECMA

The ECMA component of the engine is responsible for:

Data representation
Runtime representation
Garbage collection (GC)

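Data Representation

The major structure for data representation is ECMA_value. Its lowest two bits encode the type of the value, e.g. simple, number, string, object. The layout is:
| value | err | type |

value: 29 bits
err: 1 bit
type: 2 bits

This describes the layout in the translated design notes; the ecma_value_t comment quoted earlier in this thread gives type (3) | value (29) instead.

For numbers, strings and objects, the value can be a pointer. Simple values are predefined constants: undefined, null, true, false, and empty (an uninitialized value).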

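Compressed Pointers

Compressed pointers were introduced to save heap space.

These pointers are 8-byte aligned, 16-bit long pointers that can address 512 KB of memory, which is also the maximum size of the JerryScript heap. To support more memory, the size of compressed pointers can be extended to 32 bits, covering the entire address space of a 32-bit system, by passing "--cpointer_32_bit on" to the build system. These "uncompressed pointers" increase memory consumption by around 20%.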
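
The 512 KB figure follows from the arithmetic: a 16-bit value can name 2^16 slots, and with 8-byte alignment each slot is 8 bytes, so 65536 * 8 = 524288 bytes. The sketch below shows the idea on heap offsets rather than real pointers; all names are hypothetical and the real engine's handling of the heap base and of null pointers is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGNMENT_LOG 3u                            /* 8-byte alignment */
#define SKETCH_HEAP_SIZE (65536u << SKETCH_ALIGNMENT_LOG)  /* 2^16 slots of 8 bytes */

/* Compresses an 8-byte aligned offset from the heap start into 16 bits. */
static uint16_t
sketch_compress (size_t heap_offset)
{
  return (uint16_t) (heap_offset >> SKETCH_ALIGNMENT_LOG);
}

/* Expands a compressed pointer back into a byte offset from the heap start. */
static size_t
sketch_decompress (uint16_t compressed)
{
  return (size_t) compressed << SKETCH_ALIGNMENT_LOG;
}
```

The three low bits of an aligned offset are always zero, so dropping them loses nothing and the round trip is exact for every aligned offset below SKETCH_HEAP_SIZE.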

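Number

Following the IEEE 754 standard, there are two possible number representations: the default is 8 bytes (double precision), but the engine also supports the 4-byte (single precision) representation by setting CONFIG_ECMA_NUMBER_TYPE.

Several references to a single allocated number are not supported. Each reference holds its own copy of the number.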

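String

Strings in JerryScript are not just character sequences; they can also hold numbers and so-called magic ids. For common character sequences (defined in ./jerry-core/lit/lit-magic-strings.ini) there is a table in read-only memory that contains pairs of magic id and character sequence. If a string is already in this table, its magic id is stored instead of the character sequence itself. Using numbers speeds up property access. These techniques save memory.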

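Object / Lexical Environment

An object can be a conventional data object or a lexical environment object. Unlike other data types, an object can hold references (called properties) to other data types. Because of circular references, reference counting is not always enough to determine dead objects. Therefore a chain list is formed from all existing objects, which is used to find unreferenced objects during garbage collection. The gc-next pointer of each object points to the next allocated object in the chain list.

Lexical environments are implemented as objects in JerryScript, since lexical environments contain key-value pairs (called bindings) just like objects do. This simplifies the implementation and reduces code size.

Object / Lexical environment structure

Objects are represented by the following structure:
Reference counter: the number of hard (non-property) references
Next object pointer for the garbage collector
GC's visited flag
Type (function object, lexical environment, etc.)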

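Properties of Objects

Objects have a linked list that contains their properties. This list actually contains property pairs, to save memory as follows: a property is 7 bits long and its type field is 2 bits long, which makes 9 bits; that does not fit into 1 byte and would take 2 bytes. Placing two properties (14 bits) together with the 2-bit type field therefore fits into 2 bytes.

Property Hashmap
If the number of property pairs reaches a limit (currently defined as 16), a hash map (called the Property Hashmap) is inserted at the first position of the property pair list, so that properties can be found through it instead of by iterating over the property pairs.

The property hashmap contains 2^n elements, where 2^n is larger than the number of properties of the object. Each element can hold one of three types of value:

null, indicating an empty element
deleted, indicating a deleted property
a reference to an existing property

This hashmap is a must-return cache, meaning that every property the object has can be found through it.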
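
The sizing rule "2^n elements, where 2^n is larger than the number of properties" can be sketched in a few lines. `sketch_hashmap_size` is a hypothetical helper that returns the smallest power of two strictly greater than the property count; the engine may well choose a larger power of two, so this is only a lower bound on the rule as stated.

```c
#include <stdint.h>

/* Returns the smallest power of two that is strictly greater than
 * property_count. Assumes property_count < 2^31 so the shift cannot overflow. */
static uint32_t
sketch_hashmap_size (uint32_t property_count)
{
  uint32_t size = 1;

  while (size <= property_count)
  {
    size <<= 1;
  }
  return size;
}
```

A power-of-two size lets the slot be computed with a bit mask (`hash & (size - 1)`) instead of a division, and keeping the table strictly larger than the property count guarantees at least one null element, so a probe for a missing property can always terminate.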

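Internal properties
Internal properties are special properties that carry meta-information which cannot be accessed by JavaScript code but is important for the engine itself. Some examples of internal properties are listed below:

[[Class]]: class (type) of the object (ECMA-defined)
[[Code]]: points to where the byte-code of the function can be found
native code: points to where the code of a native function can be found
[[PrimitiveValue]] for Boolean: stores the boolean value of a Boolean object
[[PrimitiveValue]] for Number: stores the numeric value of a Number object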

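LCache

LCache is a hash map for finding a property specified by an object and a property name. Each row of the LCache holds several object-name-property entries, as shown in the figure below.

When a property is accessed, a hash value is derived from the name of the requested property and used to index the LCache. The indexed row is then searched for the given object and property name.

Note that if the requested property is not found in the LCache, this does not mean that it does not exist (i.e. LCache is a may-return cache). If the property is not found in the LCache, it is searched for in the object's property list, and if it is found there, it is placed into the LCache.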
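
The may-return behaviour can be sketched with a tiny row-based cache. Everything here is hypothetical: the table dimensions, the use of `name_id % SKETCH_ROWS` as the hash, and the "newest entry first, oldest entry dropped" replacement policy are assumptions chosen for illustration, not the engine's actual scheme.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ROWS 4u       /* hypothetical number of rows */
#define SKETCH_ROW_LENGTH 2u /* hypothetical entries per row */

typedef struct
{
  const void *object_p; /* NULL marks an unused entry */
  uint32_t name_id;     /* stand-in for the property name */
  int *prop_p;          /* stand-in for the cached property */
} sketch_entry_t;

static sketch_entry_t sketch_cache[SKETCH_ROWS][SKETCH_ROW_LENGTH];

/* Returns the cached property, or NULL on a miss. A miss only means
 * "not cached": the caller still has to search the object's property list. */
static int *
sketch_lookup (const void *object_p, uint32_t name_id)
{
  sketch_entry_t *row_p = sketch_cache[name_id % SKETCH_ROWS];

  for (size_t i = 0; i < SKETCH_ROW_LENGTH; i++)
  {
    if (row_p[i].object_p == object_p && row_p[i].name_id == name_id)
    {
      return row_p[i].prop_p;
    }
  }
  return NULL;
}

/* Puts a property found in the property list at the front of its row,
 * pushing older entries back and dropping the last one. */
static void
sketch_insert (const void *object_p, uint32_t name_id, int *prop_p)
{
  sketch_entry_t *row_p = sketch_cache[name_id % SKETCH_ROWS];

  for (size_t i = SKETCH_ROW_LENGTH - 1; i > 0; i--)
  {
    row_p[i] = row_p[i - 1];
  }
  row_p[0] = (sketch_entry_t) { object_p, name_id, prop_p };
}
```

Because entries can be evicted at any time, a NULL from `sketch_lookup` says nothing about whether the property exists, which is exactly the difference between this cache and the must-return Property Hashmap.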

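Collections

Collections are array-like data structures designed to save memory. A collection is actually a linked list whose elements are not single items but arrays that can each hold several items.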

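Exception Handling

To implement a sufficiently capable form of exception handling, JerryScript functions report errors and "exceptional" outcomes through their return value. The return value is actually an ECMA value (see the Data Representation section) in which the error bit is set if an erroneous operation occurred.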

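Value Management and Ownership

Every ECMA value stored by the engine is associated with a virtual "ownership", which defines how the value is managed: when to free it once it is no longer needed, and how to pass it to other functions.

Initially, a value is allocated by its owner (i.e. with ownership). The owner is responsible for freeing the allocated value. When the value is passed to a function as an argument, ownership is not passed along; the called function has to make its own copy of the value. However, whenever a function returns a value, ownership is passed on, so the caller becomes responsible for freeing it.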

cisen · 2019-06-20T07:17:58Z

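JerryScript byte-code notes

Notes

Byte-code emission starts in jerry-core/parser/js/js-parser-util.c, for example in the parser_emit_cbc_literal_from_token function
Byte-code execution happens inside the VM, in the vm_loop function of jerry-core/vm/vm.c
vm_loop is the key to code execution and deserves a lot of time
Note that the VM is only entered when the jerry_run function is used
The call chain leading to vm_loop is roughly: jerry_run -> vm_run_global -> vm_run -> vm_execute -> vm_loop_init -> vm_loop
All byte-code definitions are in jerry-core/parser/js/byte-code.h
jerry-core/jcontext/jcontext.c is where JerryScript's global variables live
How the built-in simple values are generated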

/**
 * Type of ecma value
 */
typedef enum
{
  ECMA_TYPE_DIRECT = 0, /**< directly encoded value, a 28 bit signed integer or a simple value */
  ECMA_TYPE_STRING = 1, /**< pointer to description of a string */
  ECMA_TYPE_FLOAT = 2, /**< pointer to a 64 or 32 bit floating point number */
  ECMA_TYPE_OBJECT = 3, /**< pointer to description of an object */
  ECMA_TYPE_SYMBOL = 4, /**< pointer to description of a symbol */
  ECMA_TYPE_DIRECT_STRING = 5, /**< directly encoded string values */
  ECMA_TYPE_ERROR = 7, /**< pointer to description of an error reference (only supported by C API) */
  ECMA_TYPE_POINTER = ECMA_TYPE_ERROR, /**< a generic aligned pointer */
  ECMA_TYPE_SNAPSHOT_OFFSET = ECMA_TYPE_ERROR, /**< offset to a snapshot number/string */
  ECMA_TYPE___MAX = ECMA_TYPE_ERROR /** highest value for ecma types */
} ecma_type_t;
/**
 * Shift for value part in ecma_value_t
 */
#define ECMA_VALUE_SHIFT 3
/**
 * Ecma simple value type
 */
#define ECMA_DIRECT_TYPE_SIMPLE_VALUE ((1u << ECMA_VALUE_SHIFT) | ECMA_TYPE_DIRECT)

/**
 * Shift for directly encoded values in ecma_value_t
 */
#define ECMA_DIRECT_SHIFT 4
/**
 * ECMA make simple value
 */
#define ECMA_MAKE_VALUE(value) \
  ((((ecma_value_t) (value)) << ECMA_DIRECT_SHIFT) | ECMA_DIRECT_TYPE_SIMPLE_VALUE)

/**
 * Simple ecma values
 */
enum
{
  /**
   * Empty value is implementation defined value, used for representing:
   *   - empty (uninitialized) values
   *   - immutable binding values
   *   - special register or stack values for vm
   */
  ECMA_VALUE_EMPTY = ECMA_MAKE_VALUE (0), /**< uninitialized value 8=1000  */
  ECMA_VALUE_ERROR = ECMA_MAKE_VALUE (1), /**< an error is currently thrown 24=11000 */
  ECMA_VALUE_FALSE = ECMA_MAKE_VALUE (2), /**< boolean false 40=101000 */
  ECMA_VALUE_TRUE = ECMA_MAKE_VALUE (3), /**< boolean true 56=111000 */
  ECMA_VALUE_UNDEFINED = ECMA_MAKE_VALUE (4), /**< undefined value 72=1001000 */
  ECMA_VALUE_NULL = ECMA_MAKE_VALUE (5), /**< null value 88=1011000*/
  ECMA_VALUE_ARRAY_HOLE = ECMA_MAKE_VALUE (6), /**< array hole, used for
                                                *   initialization of an array literal 104=1101000 */
  ECMA_VALUE_NOT_FOUND = ECMA_MAKE_VALUE (7), /**< a special value returned by
                                               *   ecma_op_object_find 120=1111000 */
  ECMA_VALUE_REGISTER_REF = ECMA_MAKE_VALUE (8), /**< register reference,
                                                  *   a special "base" value for vm 136=10001000 */
  ECMA_VALUE_IMPLICIT_CONSTRUCTOR = ECMA_MAKE_VALUE (9), /**< special value for bound class constructors 152=10011000 */
};
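
The numbers in the comments above follow directly from the two shifts. The sketch below repeats the macros from the listing and adds a few helpers to take a value apart again; the `sketch_` helpers are hypothetical, and `ecma_value_t` is assumed here to be a 32-bit unsigned integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ecma_value_t;  /* assumption: 32-bit unsigned */

/* Macros copied from the listing above. */
#define ECMA_TYPE_DIRECT 0
#define ECMA_VALUE_SHIFT 3
#define ECMA_DIRECT_TYPE_SIMPLE_VALUE ((1u << ECMA_VALUE_SHIFT) | ECMA_TYPE_DIRECT)
#define ECMA_DIRECT_SHIFT 4
#define ECMA_MAKE_VALUE(value) \
  ((((ecma_value_t) (value)) << ECMA_DIRECT_SHIFT) | ECMA_DIRECT_TYPE_SIMPLE_VALUE)

/* Hypothetical helper: the low 3 bits hold the ecma_type_t tag. */
static unsigned
sketch_type_of (ecma_value_t value)
{
  return value & ((1u << ECMA_VALUE_SHIFT) - 1u);
}

/* Hypothetical helper: a simple value has the low 4 bits equal to 1000 in binary. */
static bool
sketch_is_simple_value (ecma_value_t value)
{
  return (value & ((1u << ECMA_DIRECT_SHIFT) - 1u)) == ECMA_DIRECT_TYPE_SIMPLE_VALUE;
}

/* Hypothetical helper: recovers n from ECMA_MAKE_VALUE (n). */
static uint32_t
sketch_simple_index (ecma_value_t value)
{
  assert (sketch_is_simple_value (value));
  return value >> ECMA_DIRECT_SHIFT;
}
```

For example, ECMA_MAKE_VALUE (3) is (3 << 4) | 8 = 48 + 8 = 56, which is the value listed for ECMA_VALUE_TRUE, and its low three bits are 000, the ECMA_TYPE_DIRECT tag.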

About the CBC_OPCODE macro: it is defined in two places, once in byte-code.c and once in vm.c

/**
 * byte-code.c
 * Compact bytecode definition
 */
#define CBC_OPCODE(arg1, arg2, arg3, arg4) \
  ((arg2) | (((arg3) + CBC_STACK_ADJUST_BASE) << CBC_STACK_ADJUST_SHIFT)),

// vm.c
/** Compact bytecode define */
#define CBC_OPCODE(arg1, arg2, arg3, arg4) arg4,

/**
 * Compact byte code (CBC) is a byte code representation
 * of EcmaScript which is designed for low memory
 * environments. Most opcodes are only one or sometimes
 * two byte long so the CBC provides a small binary size.
 *
 * The execution engine of CBC is a stack machine, where
 * the maximum stack size is known in advance for each
 * function.
 */

/**
 * Byte code flags. Only the lower 5 bit can be used
 * since the stack change is encoded in the upper
 * three bits for each instruction between -4 and 3
 * (except for call / construct opcodes).
 */
#define CBC_STACK_ADJUST_BASE          4
#define CBC_STACK_ADJUST_SHIFT         5
#define CBC_STACK_ADJUST_VALUE(value)  \
  (((value) >> CBC_STACK_ADJUST_SHIFT) - CBC_STACK_ADJUST_BASE)

// the u suffix marks the constant as unsigned
#define CBC_NO_FLAG                    0x00u
#define CBC_HAS_LITERAL_ARG            0x01u
#define CBC_HAS_LITERAL_ARG2           0x02u
#define CBC_HAS_BYTE_ARG               0x04u
#define CBC_HAS_BRANCH_ARG             0x08u

/* These flags are shared */
#define CBC_FORWARD_BRANCH_ARG         0x10u
#define CBC_POP_STACK_BYTE_ARG         0x10u
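
As a worked example of this encoding, the sketch below packs the flags and a stack change into one byte and reads them back. The macros are copied from the listing above; `sketch_encode` and `sketch_flags_of` are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Macros copied from the listing above. */
#define CBC_STACK_ADJUST_BASE          4
#define CBC_STACK_ADJUST_SHIFT         5
#define CBC_STACK_ADJUST_VALUE(value)  \
  (((value) >> CBC_STACK_ADJUST_SHIFT) - CBC_STACK_ADJUST_BASE)

#define CBC_NO_FLAG                    0x00u
#define CBC_HAS_LITERAL_ARG            0x01u
#define CBC_HAS_BRANCH_ARG             0x08u
#define CBC_FORWARD_BRANCH_ARG         0x10u

/* Hypothetical helper: same formula as the byte-code.c version of CBC_OPCODE. */
static uint8_t
sketch_encode (unsigned flags, int stack_adjust)
{
  assert (flags <= 0x1fu);                           /* only the lower 5 bits */
  assert (stack_adjust >= -4 && stack_adjust <= 3);  /* must fit in 3 bits */
  return (uint8_t) (flags | ((unsigned) (stack_adjust + CBC_STACK_ADJUST_BASE) << CBC_STACK_ADJUST_SHIFT));
}

/* Hypothetical helper: masks out the three stack-adjust bits. */
static unsigned
sketch_flags_of (uint8_t encoded)
{
  return encoded & 0x1fu;
}
```

An opcode with a literal argument that pushes one value is stored as 0x01 | ((1 + 4) << 5) = 161, and CBC_STACK_ADJUST_VALUE (161) gives (161 >> 5) - 4 = 1. Adding the base of 4 is what lets the range -4 to 3 be stored as the unsigned numbers 0 to 7.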

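The value of each opcode in CBC_OPCODE_LIST, for example CBC_EXT_OPCODE, is simply its position in the list. Note that each CBC_BACKWARD_BRANCH entry takes up 3 numbers, while each CBC_OPCODE entry takes up 1. For example, the first part, from CBC_EXT_OPCODE to CBC_BRANCH_IF_STRICT_EQUAL, covers 0 to (20 - 1 + 2*10), i.e. 0-39. The basic opcodes that follow cover 40-74
Opcodes are emitted through the parser_emit_cbc function
parser_parse_var_statement, the function that handles var declarations, is in: jerryscript
The context is created in the parser_parse_source function
An opcode is the operation code that a JS operation is translated into for the C implementation, so it is the key to how JS is evaluated. The opcode comes from frame_ctx_p->byte_code_p; it is one operation taken from the contents of byte_code_p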

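Converting between CBC_CODE and VM_OC_CODE

The opcode is a CBC_CODE. Its numeric value is looked up in vm_decode_table to get opcode_data, and applying VM_OC_GROUP_GET_INDEX to that yields the VM_OC_CODE

VM_OC_CODE = vm_decode_table[CBC_CODE] & 0x7f; /* 0x7f is 111 1111 in binary */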

#define VM_OC_GROUP_MASK 0x7f

/**
 * Extract the "group" opcode.
 * VM_OC_GROUP_MASK is 0x7f, i.e. the low 7 bits
 * 50 & 0x7f = 0110010 & 1111111 (binary) = 0110010 = 50
 * 0 & 0x7f = 0
 */
#define VM_OC_GROUP_GET_INDEX(O) ((O) & VM_OC_GROUP_MASK)
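
One way to read the two CBC_OPCODE definitions is as the "X-macro" pattern: a single opcode list is expanded several times with different definitions of the same macro, which produces parallel tables that are all indexed by the opcode's position. The sketch below shows the pattern with a made-up three-entry list; every name and number in it is hypothetical and none of it is copied from byte-code.h or vm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical list: name, flags, stack change, VM data. */
#define SKETCH_OPCODE_LIST \
  SKETCH_OPCODE (SKETCH_PUSH, 0x01u, 1, 0x105u) \
  SKETCH_OPCODE (SKETCH_ADD, 0x00u, -1, 0x20au) \
  SKETCH_OPCODE (SKETCH_POP, 0x00u, -1, 0x003u)

/* Expansion 1: the enum; each opcode's value is its position in the list. */
#define SKETCH_OPCODE(name, flags, stack, vm_data) name,
typedef enum
{
  SKETCH_OPCODE_LIST
  SKETCH_END
} sketch_opcode_t;
#undef SKETCH_OPCODE

/* Expansion 2 (parser side): flags plus stack change packed into one byte. */
#define SKETCH_STACK_ADJUST_BASE  4
#define SKETCH_STACK_ADJUST_SHIFT 5
#define SKETCH_OPCODE(name, flags, stack, vm_data) \
  (uint8_t) ((flags) | (((stack) + SKETCH_STACK_ADJUST_BASE) << SKETCH_STACK_ADJUST_SHIFT)),
static const uint8_t sketch_flags[] =
{
  SKETCH_OPCODE_LIST
};
#undef SKETCH_OPCODE

/* Expansion 3 (VM side): only the fourth column is kept. */
#define SKETCH_OPCODE(name, flags, stack, vm_data) vm_data,
static const uint16_t sketch_decode_table[] =
{
  SKETCH_OPCODE_LIST
};
#undef SKETCH_OPCODE

#define SKETCH_GROUP_MASK 0x7fu

/* Looks up the VM data by opcode value and keeps the low 7 bits. */
static unsigned
sketch_vm_group (sketch_opcode_t opcode)
{
  assert (opcode < SKETCH_END);
  return sketch_decode_table[opcode] & SKETCH_GROUP_MASK;
}
```

Because the enum and both tables come from the same list, entry i of each table always belongs to the opcode whose value is i, with no separate key-value matching needed.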

Bytecode collection

CBC_DEFINE_VARS: defines var variables
CBC_INITIALIZE_VAR
CBC_INITIALIZE_VARS

frame_ctx is defined in the vm_run function.

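Questions

How can instructions that run one at a time implement if/else?
A: Several instructions can be fetched in one go, until the instruction sequence ends
What is context_p? One stack frame, or several?
A: Probably a single stack frame; stack is the one that holds multiple frames
How exactly do the values in vm_decode_table line up with the opcodes? Presumably CBC_OPCODE_LIST provides some kind of key-value matching, but the principle and the rules are not clear yet, because CBC_OPCODE keeps being redefined.
Why are there both CBC_CODE and VM_CODE?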
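
On the first question: in a bytecode interpreter, if/else is commonly expressed with branch instructions that move the instruction pointer, so the loop still executes one instruction at a time. The toy stack machine below illustrates the idea; its opcodes, encoding, and names are invented for this example and are much simpler than the real CBC branch opcodes.

```c
#include <assert.h>
#include <stdint.h>

typedef enum
{
  SKETCH_OP_PUSH,             /* operand: byte value to push */
  SKETCH_OP_BRANCH_IF_FALSE,  /* operand: forward offset, taken if popped value is 0 */
  SKETCH_OP_JUMP,             /* operand: forward offset, always taken */
  SKETCH_OP_HALT              /* stop and return the top of the stack */
} sketch_op_t;

/* Runs a program one instruction at a time; returns -1 on an unknown opcode. */
static int
sketch_run (const uint8_t *code_p)
{
  int stack[8];
  int sp = 0;
  const uint8_t *pc_p = code_p;

  for (;;)
  {
    switch (*pc_p++)
    {
      case SKETCH_OP_PUSH:
      {
        assert (sp < 8);
        stack[sp++] = *pc_p++;
        break;
      }
      case SKETCH_OP_BRANCH_IF_FALSE:
      {
        uint8_t offset = *pc_p++;
        assert (sp > 0);
        if (stack[--sp] == 0)
        {
          pc_p += offset;  /* skip the "then" part */
        }
        break;
      }
      case SKETCH_OP_JUMP:
      {
        uint8_t offset = *pc_p++;
        pc_p += offset;    /* skip the "else" part */
        break;
      }
      case SKETCH_OP_HALT:
      {
        assert (sp > 0);
        return stack[sp - 1];
      }
      default:
      {
        return -1;
      }
    }
  }
}

/* Bytecode for: if (condition) { result = 10; } else { result = 20; } */
static int
sketch_if_else (int condition)
{
  const uint8_t code[] =
  {
    SKETCH_OP_PUSH, (uint8_t) (condition != 0),
    SKETCH_OP_BRANCH_IF_FALSE, 4,  /* jump over the next two instructions */
    SKETCH_OP_PUSH, 10,            /* then branch */
    SKETCH_OP_JUMP, 2,             /* jump over the else branch */
    SKETCH_OP_PUSH, 20,            /* else branch */
    SKETCH_OP_HALT
  };
  return sketch_run (code);
}
```

Both branches are present in the byte stream; the branch and jump offsets decide which of them the instruction pointer actually passes through.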

Notes

branch means a jump, for example
The parser stage has no VM_CODE at all, only CBC_CODE. So CBC is the translation code and VM is the execution code

cisen · 2019-06-22T09:14:47Z

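Memory address notes

Introduction

https://blog.csdn.net/qq_23079443/article/details/81143877

A 32-bit system can address at most 2^32 bytes = 2^10 * 2^10 * 2^10 * 2^2 bytes = 4 GB. 1024 bytes = 1 KB, 1024 KB = 1 MB, 1024 MB = 1 GB, so 2^30 = 1 GB and 2^40 = 1 TB. The highest 32-bit address is 1111 1111 1111 ... 1111 in binary: 32 ones, i.e. 4 bytes (1 byte = 8 bits)
On a 64-bit operating system, a program only needs the low 48 bits of an address, i.e. addresses at or below 0x7fffffffffff (in binary: a 0 followed by 47 ones).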
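
The arithmetic can be checked with a few lines of C. The helper names are hypothetical; the sketch only restates the powers of two used above.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits (1..63). */
static uint64_t
sketch_addressable_bytes (unsigned bits)
{
  assert (bits >= 1 && bits <= 63);
  return UINT64_C (1) << bits;
}

/* Highest address expressible with the given number of address bits. */
static uint64_t
sketch_highest_address (unsigned bits)
{
  return sketch_addressable_bytes (bits) - 1u;
}
```

With 32 bits this gives 4 * 1024 * 1024 * 1024 bytes, and the highest 47-bit address is 0x7fffffffffff.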
A JavaScript stack frame:
https://github.com/stacktracejs/stackframe/blob/0d7a328bf05f6867e62245e3c9574aab2c97fd3b/stackframe.js#L29

var stackFrame = new StackFrame({
    functionName: 'funName',
    args: ['args'],
    fileName: 'http://localhost:3000/file.js',
    lineNumber: 1,
    columnNumber: 3288, 
    isEval: true,
    isNative: false,
    source: 'ORIGINAL_STACK_LINE'
});
var booleanProps = ['isConstructor', 'isEval', 'isNative', 'isToplevel'];
var numericProps = ['columnNumber', 'lineNumber'];
var stringProps = ['fileName', 'functionName', 'source'];
var arrayProps = ['args'];

Call stack of ecma_init()

jmem_heap_alloc_block_internal(const size_t size) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/jmem/jmem-heap.c:163)
jmem_heap_gc_and_alloc_block(const size_t size, _Bool ret_null_on_error) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/jmem/jmem-heap.c:327)
jmem_heap_alloc_block(const size_t size) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/jmem/jmem-heap.c:373)
ecma_alloc_extended_object(size_t size) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/base/ecma-alloc.c:113)
ecma_create_object(ecma_object_t * prototype_object_p, size_t ext_object_size, ecma_object_type_t type) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/base/ecma-helpers.c:81)
ecma_instantiate_builtin(ecma_builtin_id_t obj_builtin_id) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/builtin-objects/ecma-builtins.c:385)
ecma_instantiate_builtin(ecma_builtin_id_t obj_builtin_id) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/builtin-objects/ecma-builtins.c:360)
ecma_builtin_get(ecma_builtin_id_t builtin_id) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/builtin-objects/ecma-builtins.c:297)
ecma_init_global_lex_env() (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/operations/ecma-lex-env.c:42)
ecma_init() (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/ecma/base/ecma-init-finalize.c:38)
jerry_init(jerry_init_flag_t flags) (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/jerry-core/api/jerry.c:185)
main() (/home/cisen/桌面/develop/jerryscript/demo-jerryscript/main.c:13)

jmem_heap_alloc_block_internal is the lowest-level call for memory allocation. It allocates directly from the global jerry_global_heap, which has a fixed size of 512 KB. The structure of jerry_global_heap is as follows:

/**
 * Free region node
 */
typedef struct
{
  uint32_t next_offset; /**< Offset of next region in list */
  uint32_t size; /**< Size of region */
} jmem_heap_free_t;

struct jmem_heap_t
{
  jmem_heap_free_t first; /**< first node in free region list */
  uint8_t area[JMEM_HEAP_AREA_SIZE]; /**< heap area; JMEM_HEAP_AREA_SIZE is 512*1024 - 8 (8 bytes are reserved) */
};
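
To show how a free list of {next_offset, size} nodes stored inside the heap area can serve allocations, here is a minimal first-fit sketch. It is not the code of jmem_heap_alloc_block_internal: all names are hypothetical, the heap is only 256 bytes, there is no free function, and the end-of-list marker is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HEAP_AREA_SIZE 256u
#define SKETCH_ALIGNMENT      8u
#define SKETCH_END_OF_LIST    0xffffffffu

typedef struct
{
  uint32_t next_offset;  /* offset of the next free region, or SKETCH_END_OF_LIST */
  uint32_t size;         /* size of this free region in bytes */
} sketch_free_t;

typedef struct
{
  sketch_free_t first;   /* list head; its size stays 0 */
  uint8_t area[SKETCH_HEAP_AREA_SIZE];
} sketch_heap_t;

static sketch_heap_t sketch_heap;

/* Free-region nodes live inside the area itself; memcpy avoids aliasing issues. */
static sketch_free_t
sketch_node_get (uint32_t offset)
{
  sketch_free_t node;
  memcpy (&node, sketch_heap.area + offset, sizeof (node));
  return node;
}

static void
sketch_node_set (uint32_t offset, sketch_free_t node)
{
  memcpy (sketch_heap.area + offset, &node, sizeof (node));
}

/* Initially the whole area is one free region. */
static void
sketch_heap_init (void)
{
  sketch_free_t whole = { SKETCH_END_OF_LIST, SKETCH_HEAP_AREA_SIZE };
  sketch_node_set (0, whole);
  sketch_heap.first.next_offset = 0;
  sketch_heap.first.size = 0;
}

/* First-fit allocation; returns NULL when no free region is large enough. */
static void *
sketch_heap_alloc (size_t size)
{
  assert (size > 0);
  if (size > SKETCH_HEAP_AREA_SIZE)
  {
    return NULL;
  }

  uint32_t needed = (uint32_t) ((size + SKETCH_ALIGNMENT - 1u) & ~(size_t) (SKETCH_ALIGNMENT - 1u));
  uint32_t prev = SKETCH_END_OF_LIST;  /* here this means "the list head" */
  uint32_t current = sketch_heap.first.next_offset;

  while (current != SKETCH_END_OF_LIST)
  {
    sketch_free_t node = sketch_node_get (current);

    if (node.size >= needed)
    {
      uint32_t next = node.next_offset;

      if (node.size > needed)
      {
        /* Split: the tail of the region stays on the free list. */
        sketch_free_t rest = { node.next_offset, node.size - needed };
        next = current + needed;
        sketch_node_set (next, rest);
      }

      if (prev == SKETCH_END_OF_LIST)
      {
        sketch_heap.first.next_offset = next;
      }
      else
      {
        sketch_free_t prev_node = sketch_node_get (prev);
        prev_node.next_offset = next;
        sketch_node_set (prev, prev_node);
      }
      return sketch_heap.area + current;
    }

    prev = current;
    current = node.next_offset;
  }

  return NULL;
}
```

Since every size is rounded up to a multiple of 8, any leftover piece after a split is at least 8 bytes, which is exactly enough room for the node that describes it.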

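jmem stands for jerry-memory
jerry_context_t is stored separately; it does not live inside the jerry_global_heap shown above

Internals

The address range in use appears to be up to 0x7FFFFFFFFFFF

Summary

Almost every data change is a change to global_heap, so it is essential to understand how the global heap is operated; see the jmem_heap_alloc_block_internal function for details
Each item of a parser_list is called a page
The literal_pool of parser_context is where the many literal_p entries are stored; see:

literal_p = (lexer_literal_t *) parser_list_append (context_p, &context_p->literal_pool);

parser_context.lit_object only stores the current literal_object. Each literal_object actually lives in global_heap; the function called above goes through parser_malloc to request heap memory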

cisen · 2019-06-26T10:08:19Z

Variable management notes

Global variables

jerry_global_context is declared in jerry-core\jcontext\jcontext.h
parser_context is declared in jerry-core/parser/js/js-parser.c
The vm_frame_ctx_t stack frame is in jerry-core/vm/vm-defines.h

nacyzhouw · 2019-07-13T07:16:41Z

Hello, as it happens I have also been studying JerryScript recently, and I have a question.
The code requires jerry_global_heap to be 8-byte aligned. Do you know why?
JERRY_ASSERT ((uintptr_t) JERRY_HEAP_CONTEXT (area) % JMEM_ALIGNMENT == 0);

cisen · 2019-07-13T09:23:19Z

Probably because its storage unit is jmem_heap_free_t, which is exactly 8 bytes. (I suggest following issue #535; future analysis will mostly happen there.)
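
The size argument can be checked directly, and the sketch below also shows a second effect of 8-byte alignment that may be relevant: an aligned heap offset has three zero low bits, so it can be stored in fewer bits. This is an illustration under assumptions, not a statement about the real sources; the names, the 16-bit width, and the offset-based scheme are all hypothetical here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGNMENT_LOG 3u
#define SKETCH_ALIGNMENT     (1u << SKETCH_ALIGNMENT_LOG)  /* 8 bytes */

/* Same two fields as jmem_heap_free_t in the listing above. */
typedef struct
{
  uint32_t next_offset;
  uint32_t size;
} sketch_free_t;

/* Rounds a requested size up to the next multiple of the alignment. */
static size_t
sketch_align_up (size_t size)
{
  return (size + SKETCH_ALIGNMENT - 1u) & ~(size_t) (SKETCH_ALIGNMENT - 1u);
}

/* Stores an aligned heap offset in 16 bits by dropping the three zero bits. */
static uint16_t
sketch_compress (uint32_t offset)
{
  assert (offset % SKETCH_ALIGNMENT == 0);
  assert ((offset >> SKETCH_ALIGNMENT_LOG) <= UINT16_MAX);
  return (uint16_t) (offset >> SKETCH_ALIGNMENT_LOG);
}

/* Restores the original offset from its compressed form. */
static uint32_t
sketch_decompress (uint16_t compressed)
{
  return (uint32_t) compressed << SKETCH_ALIGNMENT_LOG;
}
```

If every block is a multiple of 8 bytes, any freed block can hold a free-region node, and a 16-bit compressed offset can reach up to 65535 * 8 = 524280 bytes, which is 512 KB minus 8.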

kimyLee · 2021-09-01T03:47:30Z

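@cisen hello, I'd like to ask:

What does a JerryScript build produce? C code? Can it be used on Windows as a dependency of another embedded project, for example one targeting esp32?
How is it mainly used together with hardware?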

cisen · 2021-09-01T11:21:24Z

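1. The build produces a binary executable, which effectively gives you one more command, used like node app.js. Mainstream JS engines use JIT; JerryScript just turns the JS source string into logic that is executed by C code
2. Jerry has dependencies, and those in turn depend on an operating system, so you can install an OS on the hardware. Alternatively, you could try working out the program's memory layout yourself with ids and see whether it can run without an OS (I don't know this area; my guess is that it cannot)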

cisen added the javascript and JS引擎 (JS engine) labels Jun 14, 2019

cisen added the jerryscript label Jul 3, 2019

nacyzhouw mentioned this issue Jul 15, 2019

内存地址相关 (Memory address notes) #560

Closed
